Shaban Ibrahim

Posted on Feb 1

DATA MODELLING IN POWER BI

#powerbi #starschema #snowflakeschema #datamodelling

Data Modelling in Power BI

For any successful solution when it comes to visualisation and analysis using Power BI, the user has to be well versed with Data Modelling and understand how to navigate around it because that is the core foundation. The model determines a lot, especially when it comes to the outcome of the analysis in terms of performance, usability, DAX complexity and correctness of the results.

Facts and Dimensional Tables

Fact Tables
These can be referred to as master tables; they store measurable and quantitative data. They contain numeric columns and mostly have the largest number of rows compared to the dimension tables, and have foreign keys that connect to the dimension tables. In cases where there are no dimension tables and an analyst wants to do a detailed analysis, they can use the fact table to generate a dimension table or tables.

Examples of fact tables are the sales table, the transaction table, the inventory movement, and the customer table.

Dimension Tables
These tables provide context and detailed descriptions for the fact tables; they can contain textual or categorical data. The primary keys of dimension tables are the ones that form foreign keys in the fact table. They have fewer rows compared to the fact tables, and most of their attributes are used for filtering, grouping and slicing.

An example of a dimension table includes: a date table, a geographic table, and a products table.

Star Schema

A start schema can be described as a data modelling pattern where a fact table is connected to multiple dimension tables, and the dimension tables are NOT connected to each other. A start schema is characterised by a single fact table, a one-to-many relationship from the dimension tables to the fact tables and denormalised dimension tables.

For Examples a model might have FactSales, DimDate, DimCustomer, DimProduct e.t.c

The start schema is commonly used in Power BI because it simplifies the model understanding, can predict filter behaviour, has simple and more reliable DAX calculations and reduces the risks of ambiguity.

This is a recommended and preferred modelling approach.

Snowflake Schema

This is a modelling pattern that is the mutation of the star schema and has its variations, where dimension tables are normalised into multiple related tables.

The snowflake schema is characterised by the fact that, apart from the dimension tables connecting to the fact table alone, they are connected to other sub-dimension tables, making the structure hierarchical in nature.

FactSales --> DimProducts
DimProducts --> DimProductsCategory
DimProductCategory --> DimProductGroup

The snowflake schema is less preferred because it has a more complex relationship chain, making the model more complex, harder to understand and maintain, might degrade performance and makes the DAX calculation complex as a result of indirect filter paths.

The snowflake schema dimensions are usually flattened into a single denormalised dimension, i.e., start schema, to improve performance and usability.

Relationship in Power BI

A relationship connects tables so Power BI knows how data in one table relates to data in another. Without relationships, Power BI treats tables as unrelated. Power Bi supports several types of relationships

1. Cardinality
It defines how rows match between tables:
The Include :

One-to-Many (most common and recommended)
Many-to-One
One-to-One
Many-to-Many

2. Active and Inactive Relationships
Only one active relationship can exist between the two tables at a time, and one can activate an inactive relationship by a DAX syntax USEREALTIONSHIP
Excessive use of inactive relationships can increase the risk of errors and the complexity of the model.

Active → solid line
Inactive → dotted line

3. Cross Filter Direction
This one controls the flow of the filter

Single (Recommended)
Filters go from dimension → fact
Example: Customers → Sales
Both
Filters flow both ways, and this can cause ambiguity

Importance of a good Data Model

A good data model goes a long way to ensure optimal performance and critical reporting, especially when decision-making is needed. The good model will first of all reduce complexity, especially when joining using start schema, having fewer and necessary tables would lead to faster queries, and denormalisation improves compression.

A good data model also goes along way to ensure accurate reporting by ensuring clear relationships and dimensions are established, which helps to ensure filter propagation and consistent slicing and grouping. This also helps to reduce DAX complexities and clean measures that are easy to read, maintain and understand.

The downside of a poor model is slow visuals, timeouts and excessive resource utilisation, misleading calculations and inconsistent results across visuals.

To achieve better and more accurate results, always stick to these principles:

Prefer star schema over snowflake schema
Separate facts and dimension tables clearly.
Use a one-to-many relationship
Keep filter direction singular where possible.
Flat dimension for reporting
Design the model before writing DAX

When it comes to Power BI, Data Modelling is a primary task since it forms the foundation of successful analytics. Being conversant with the schemas, properly defining facts and dimension tables, and proper establishment of relationships goes a great length to impact performance, accuracy and usability of the model. BI Developers can build robust, scalable and reliable reports that can deliver trustworthy insights if they stick to some of the principles we have mentioned.

DEV Community