bluebay2014 - Fotolia
Star schema vs snowflake schema: Which is better?
Yahoo’s data and business intelligence architect, Rohit Chatter, answers the latest debate, Star versus Snowflake schema, by breaking down the differences, one aspect at a time
What are the key differences in Snowflake and Star schema and where should they be applied?
Comparing the Star schema and Snowflake schema reveals four fundamental differences:
1. Data optimisation
The Snowflake model uses normalised data, which means that the data is organised inside the database in order to eliminate redundancy and thus helps to reduce the amount of data. The hierarchy of the business and its dimensions are preserved in the data model through referential integrity.
Figure 1: Snowflake model
The Star model, on the other hand, uses de-normalised data. In this model, dimensions directly refer to fact table, and business hierarchy is not implemented via referential integrity between dimensions.
Figure 2: Star model
2. Business model
Primary key is a single unique key (data attribute) that is selected for particular data. In the previous “advertiser” example, the Advertiser_ID will be the primary key (business key) of a dimension table. The foreign key (referential attribute) is just a field in one table that matches a primary key of another dimension table. In our example, the Advertiser_ID could be a foreign key in Account_dimension.
In the Snowflake model, the business hierarchy of data model is represented in a primary key-foreign key relationship between the various dimension tables.
In the Star model, all required dimension tables have only foreign keys in the fact tables.
The third differentiator in this Star schema vs Snowflake schema face-off is the performance of these models. The Snowflake model has more joins between the dimension table and the fact table, so the performance is slower. For instance, if you want to know the advertiser details, this model will ask for a lot of information, such as the advertiser name, ID and address, for which the advertiser and account table need to be joined up with each other and then joined with the fact table.
The Star model, on the other hand, has fewer joins between dimension tables and the facts table. In this model, if you need information on the advertiser, you just have to join the advertiser dimension table with the fact table.
Star schema explained
The Star schema provides a fast response to queries and is the ideal source for cube structures. Learn all about Star schema in this article.
The Snowflake model loads the data marts and so the ETL job is more complex in design and cannot be parallelised because the dependency model restricts it.
The Star model loads the dimension table without dependency between dimensions and so the ETL job is simpler and can achieve higher parallelism.
This brings us to the end of the Star schema vs Snowflake schema debate – but how do you know which one to use?
Where do the two methods fit in?
With the Snowflake model, dimension analysis is easier. For example, “How many accounts or campaigns are online for a given advertiser?”
The Star schema model is useful for metrics analysis, such as “What is the revenue for a given customer?”
(As told to Mitchelle R Jansen)