How to choose the right data modelling technique for your project.
Steven Wallace
Introduction
I have seen a lot of chatter about various data modelling techniques and which might be the best, but the truth is.. It depends!
In this article I'll describe some of the OLAP modelling methods and architecture I know and when you might need to use them. Hopefully giving you a bit of a more holistic picture.
OBT (One Big Table)
OBT simplifies data modelling by consolidating all data into a single table. Recommended for its simplicity and speed, it's ideal for small-scale projects or those with straightforward data structures. Some people say that with modern compute power and cheap storage, this is the ‘One size fits all’. I disagree.
When to Use OBT:
Prototyping and Rapid Development: OBT is ideal for quickly prototyping ideas or building proof of concept projects where speed is important.
Simple Data Structures: When dealing with minimal relationships and data complexity, OBT provides a straightforward solution.
Star Schema (Kimball model)
Created by Ralph Kimball; a Star schema organises data into a central fact table surrounded by dimension tables, resembling a star. This approach optimises analytical queries and reporting, making it popular in data warehousing projects.
When to Use Star Schema (Kimball approach):
Analytical Reporting: Star schema excels in environments requiring complex analytical queries and generating reports for business intelligence.
Data Warehousing: If you're building a data warehouse or data mart, star schema offers proven performance and scalability.
Snowflake Schema
Snowflake schema further normalises dimension tables from star schema, enhancing data integrity and scalability at the expense of increased complexity. It's suitable for projects prioritising these things.
When to Use Snowflake Schema:
(Never! - I strongly believe the high maintenance of this outweighs any benefits compared to other methods)
Data Integrity: Snowflake schema ensures data integrity by reducing redundancy and maintaining consistency.
Scalability: Projects expecting significant growth benefit from snowflake schema's normalised structure, simplifying data management as they scale.
Inmon (3NF + Data marts)
Created by Bill Inmon; This approach emphasises building an enterprise data warehouse as a central repository for all data, integrating data from disparate sources into a normalised structure.
When to Use Inmon Approach:
Centralised Data Storage: If your project requires a centralised repository for all organisational data, the Inmon approach provides a robust solution.
Data Consistency: Inmon's emphasis on normalisation ensures data consistency and integrity across the organisation.
Long-term Analytics: Projects focused on long-term analytics and historical trend analysis benefit from the Inmon approach's comprehensive data storage and integration.
Data Vault 2.0
Data Vault methodology addresses the challenges of large-scale data integration and warehousing projects by organising data into Hubs, Links, and Satellites. It excels in environments with complex integration needs and historical data tracking requirements.
When to Use a Data Vault model:
Large-scale Integration: Data Vault accommodates multiple data sources and evolving integration requirements, enabling scalable and incremental updates.
Historical Data Tracking: Projects needing comprehensive historical data tracking for auditing or regulatory compliance benefit from Data Vault's Satellite tables.
Agile Development: Data Vault's modular design and scalability support agile development methodologies, facilitating iterative enhancements and changes.
Hybrid
Hybrid data modelling approaches combine multiple techniques to leverage the strengths of each while mitigating any weaknesses. These approaches are often bespoke to specific project requirements and may involve a combination of relational, NoSQL, or other specialised data models.
I have used a hybrid approach before. Inmon's approach to 3NF combined with elements of Data Vault (link tables) and then Kimball Star Schemas for a data mart layer. In some cases there were OBT/views off of that too.
When to Use Hybrid Approaches:
Custom Requirements: Hybrid approaches are suitable for projects with unique or specific data modelling requirements that cannot be fully addressed by any single technique.
Optimising Performance: By combining the strengths of different data modelling techniques, hybrid approaches can optimise performance for specific use cases or query patterns.
A few more data modelling examples
There are also several other types of database, including document-oriented, key-value, and graph databases, each with its own data modelling considerations. Here's a brief overview of each:
Document-Oriented Databases
These store data as documents, typically in formats like JSON or BSON. Each document can have a different structure, and they are often grouped into collections. Data modelling in document-oriented databases involves designing the structure of these documents and deciding how to organise them within collections.
Key-Value Stores
These databases store data as key-value pairs, where each value is associated with a unique key. Data modelling in key-value stores involves determining the keys and values for each piece of data and how they are organised.
Graph Databases
These databases store data as nodes, edges, and properties, making them ideal for modelling relationships between data points. Data modelling in graph databases involves defining nodes and edges and how they are connected.
Conclusion
We should carefully evaluate the requirements, constraints, and objectives of our projects to select the most appropriate data modelling technique.
By understanding when to use each approach and matching them to the specific needs of the project, we can build robust, efficient, and scalable data models that drive actionable insights and business value.
Remember, there is no one-size-fits-all approach to data modelling, but with a bit of careful consideration and strategic planning, the right solution can be found for any project.