Every couple weeks I dig into the code first support in Entity Framework and continue to be surprised at all the different options you have to control the generated database.
Quick background: Entity Framework 4.1 was released earlier this month and includes “code first” support. This enables you to write plain old c# objects (POCO) that model your domain and then EF will generate a database schema for you the first time you run your code.
On my current project, we were noodling around the other day with ideas on the best way to represent an inheritance tree in SQL schema. I decided to look at how the code-first guys are doing it, and opened my eyes to 3 common patterns. I won’t go into them in super gory detail, Google can help you find plenty of articles on them. Here’s just a quick rundown.
Table per Hierarchy (TPH)
In this pattern (the default in EF code first), there is a single table that represents all classes in the inheritance hierarchy. This makes for very straight forward SQL to access the data, but relies on using NULLs in many columns since the table ends up having the complete set of all properties for all the classes involved. EF adds a “discriminator” column to keep track of what class type is represented by a particular row.
Table per Type (TPT)
This pattern uses a different table for each class (including abstract base classes) in the hierarchy. This seems the most “natural” to me. The inherited properties are in the “base class” table, and the properties for subclasses are in their own tables. These “derived” tables have ID primary key columns that are a foreign key to the base class table’s primary key.
The SQL needed to retrieve the requested class is not too crazy – it includes an INNER JOIN between the subclass and the base class. I see warnings all over the place about being careful not to use this if the hierarchy is very deep. This approach is capable of handling polymorphism just fine.
To achieve this in EF code first, you just use an attribute on the subclasses:
[Table("Student")] public class Student : Person { ... }
Table per Concrete Type (TPC)
This final pattern supported by code first uses a database table for each non-abstract (concrete) class in the hierarchy. Each table is self-contained, having all the properties required by a class, including its inherited properties. There are no relationships between the tables in the database (they just seem related because they have common column names).
The SQL used to get data for a particular class *can* be pretty straight forward, you just hit the table that represents that class. But there is no good way to achieve polymorphism in this pattern, since there is no relationship between the tables. You have to know what table to use for a given class reference.
To use this pattern, you have to use the “fluent API” in code first – I couldn’t find an attribute related to TPC.
Bottom Line
As with many questions about patterns – which one I should use in a particular case really “just depends”. I know I like TPT the best, seems the most natural to me, and I just have to keep an eye on the SQL and performance.
Let me know which one YOU like best.
I like TPT as well. Its neat and more intuitive.
Hi there,
Great little article. Gave me all the information I required to make my decision. I’m really enjoying working with EF since they introduced CodeFirst properly.