Edmund Eryuba

Posted on Feb 1

Data Modelling for High Performance and Accurate Analytics in Power BI

#datascience #powerfuldevs #datastructures #powerbi

This article explores data modelling in Power BI with a focus on different schema types and explains how proper modelling enhances performance and ensures accurate reporting.

What is Data Modelling

Data modelling is one of the most critical steps in building effective business intelligence (BI) solutions. In Power BI, data modelling refers to the process of structuring data into related tables, defining relationships and creating a logical framework that supports efficient querying, accurate calculations and meaningful reporting.

A well-designed data model is not just about organizing tables; it directly impacts report performance, usability, scalability, and the correctness of insights derived from data. Poor data modelling leads to slow reports, incorrect aggregations, complex DAX expressions, and ultimately unreliable business decisions.

In Power BI, data modelling involves:

Identifying business entities (facts and dimensions)
Structuring tables logically
Defining relationships between tables
Setting cardinality and filter direction
Creating calculated columns and measures
Ensuring data granularity and consistency

What is a Schema in Power BI?

In Power BI, a schema refers to the structure and organization of data within a data model. Schemas define how data is connected and related within the model, influencing the efficiency and performance of data queries and reports. Understanding schemas requires modelers to classify their model tables as either dimension or fact.

Fact tables

Fact tables store quantitative, transactional data that can be sales orders, quantity sold, revenue, profit and more. A fact table contains dimension key columns that relate to dimension tables, and numeric measure columns. The dimension key columns determine the dimensionality of a fact table, while the dimension key values determine the granularity of a fact table.

Example of a fact table: Consider a simple sales analytics scenario in Power BI.

This table stores transactional (measurable) data.

SalesID	DateKey	ProductKey	CustomerKey	Quantity	SalesAmount
1001	20240101	501	301	2	200
1002	20240101	502	302	1	150
1003	20240102	501	303	3	300
1004	20240103	503	301	1	120

Characteristics:

Contains foreign keys to dimensions.
Contains numeric measures.
Has many rows (high volume).

Dimension tables

Dimension tables describe the business entities that are modelled. Entities can include products, people, places, and concepts including time itself. A dimension table contains a key column (or columns) that acts as a unique identifier, and other columns. Other columns support filtering and grouping your data.

This table provides descriptive attributes about products.

ProductKey	ProductName	Category	Brand
501	Laptop	Electronics	Dell
502	Headphones	Accessories	Sony
503	Mouse	Accessories	Logitech

Types of Schemas in Power BI:

1. Star Schema

The star schema consists of a central fact table connected directly to multiple dimension tables, much like the appearance of a star.
The central fact table contains quantitative data (e.g., sales), while the dimension tables hold descriptive attributes related to the facts (e.g. Employee, Date, Territory). Dimension tables are not connected to each other.

Star schemas are ideal for straightforward reporting and querying. They are efficient for read-heavy operations, making them suitable for dashboards and summary reports.

2. Snowflake Schema

The snowflake schema is a normalized version of the star schema. In this design, dimension tables are further divided into related tables, resulting in a more complex structure.
The normalization process eliminates redundancy by splitting dimension tables into multiple related tables. This results in a web-like structure, resembling a snowflake.

Snowflake schemas are used in scenarios requiring detailed data models and efficient storage. They are beneficial when dealing with large datasets where data redundancy needs to be minimized.

3. Galaxies Schema (Fact Constellation Schema)

The galaxies schema involves multiple fact tables that share dimension tables, creating a complex, interconnected data model.

This schema consists of multiple fact tables linked to shared dimension tables, enabling the analysis of different business processes within a single model.

Galaxies schemas are suitable for large-scale enterprise environments where multiple related business processes need to be analysed. They support complex queries and detailed reporting across various domains.

Implementing schemas in Power BI

a. Creating a Star Schema

Set Up Fact and Dimension Tables: Identify and create the central fact table and surrounding dimension tables.
Link Tables: Establish relationships between the fact table and dimension tables using foreign keys.
Optimize for Performance: Index key columns and use efficient data types to enhance query performance.

b. Implementing a Snowflake Schema

Normalize Dimension Tables: Split dimension tables into related sub-tables to reduce redundancy.
Create Relationships: Define relationships between sub-tables and the main dimension tables, ensuring referential integrity.
Optimize Storage: Use appropriate storage and indexing strategies to manage complex joins efficiently.

c. Setting Up a Galaxies Schema

Identify Fact Tables: Determine the various fact tables needed for different business processes.
Share Dimension Tables: Create shared dimension tables to link multiple fact tables.
Ensure Efficient Querying: Design the schema to support complex queries and optimize performance through indexing and data partitioning.

Role of data modelling in performance and reliable reporting

a. Query and Engine Performance

Query and engine performance refers to how efficiently Power BI processes data when users interact with reports. Power BI’s VertiPaq engine performs best when data is organised using a star schema, where fact tables store numerical data and dimension tables store descriptive attributes.

This structure improves compression, reduces the number of joins required during query execution, and simplifies DAX calculations. As a result, reports load faster, visuals respond more quickly, and overall system performance improves.

b. Memory and Scalability

Memory and scalability describe the ability of a Power BI model to handle large and growing datasets. Proper data modelling controls dataset size by reducing column cardinality and removing unnecessary fields. Low-cardinality columns compress efficiently, while column pruning helps minimise memory usage and refresh time.

By structuring data into lean fact and dimension tables rather than wide flat tables, Power BI models become more scalable and capable of supporting high data volumes without performance degradation.

c. Correct Aggregations and Metrics

Correct aggregations ensure that reported values accurately reflect business operations. This depends on defining clear data granularity and using proper relationship structures. Each fact table must represent a consistent level of detail, and filters should flow from dimensions to facts.

Poor modelling can result in double counting, ambiguous totals, or misleading KPIs. A well-designed model prevents these issues by enforcing one-to-many relationships and maintaining logical data structure.

d. Filter Propagation and User Trust

Filter propagation determines how slicers and filters affect report visuals. In a properly modelled system, filters behave predictably and consistently across all visuals, allowing users to explore data intuitively.

When modelling is poor, filters may behave inconsistently, leading to confusing or contradictory results. Reliable filter behavior builds user trust and ensures that insights derived from reports are credible and easy to interpret.

e. Maintainability and Governance

Maintainability refers to how easy the model is to manage and extend over time. A strong data model supports reusable measures, consistent dimensions, and standard business definitions across reports. This creates a single source of truth for the organisation, reduces duplication of logic, and simplifies governance.

As a result, the reporting environment becomes easier to maintain, more consistent, and more reliable for long-term decision-making.

Conclusion

Understanding different schemas in Power BI is crucial for designing efficient data models. Each schema has unique advantages: the star schema is ideal for straightforward reporting and querying, offering simplicity and ease of use; the snowflake schema provides detailed, normalized structures, reducing redundancy and optimizing storage; and the galaxies schema supports complex, large-scale data models with multiple fact tables sharing dimension tables.

Choosing the right schema improves query performance, data storage efficiency, and data refresh operations. By mastering these schemas, you can create robust and scalable data models, enabling your organization to make data-driven decisions effectively.

Understanding and implementing different schemas in Power BI is crucial for designing efficient and effective data models. Each schema type; star, snowflake, and galaxies, offers unique benefits and use cases. By mastering these schemas, you can create robust data models that support comprehensive analysis and insightful reporting. Experiment with different designs based on your data needs and continue refining your skills to become a Power BI expert.

DEV Community