Data Modeling Best Practices
In today’s world, businesses are continuously evolving. With changing times a wave of transformation be it digital, be it predictive forecasting or building the agile ways of operations that will sustain in crisis, having your data structured in the right format and accurate with respect to changing needs of your business is of prime importance. Data modeling plays a crucial role in enabling business users to make more informed, and data-driven decisions are key to success. Data Modeling is a way to structure your data to be stored in a database, and the approaches define agility of the database, query performance and value add to the information being extracted.
Data Modeling History
The origin of the concept Data warehouse and data modeling concepts is considered during the 1980s, but even before the architecture was put together, many people were working on concepts like dimensions and facts to support some business decisions.
Before the time of the Enterprise DW, there were many databases storing redundant information, and there was no link between them. At times for global companies, even the language and currency used to be local.
It wasn’t easy to even with the data being available; to be useful to make a consensus organizational decision and define business strategies for future roadmap.
During the same time, many companies were coming up with large databases to support decision making.
In the 1990s, both Bill Imnon and Ralph Kimball published their recommendations of approaches in their books. Bill Imnon’s Building the data warehouse and Ralph Kimball the data warehouse toolkit.
Kimball’s approach focuses on business needs and suggests storing the business data that is then structured for querying and analytics. His approach uses the dimensional model such as star schemas or snowflakes enabling usage of slow-changing dimensions as well to organize the data in dimensional data warehouse making this set up faster to implement but a bit difficult to maintain. Kimball’s approach is a bottom-up design where first the analytics data marts are created first, and then they are used to join further to set up a whole data warehouse.
Inmon uses the ER model in which each department will have a separate data mart.
Inmon’s approach is a top-down design where the first step is setting up a normalized data model, and the next step is to set up dimensional data marts for specific business processes.
Choosing a model needs consideration of the type of business, scope of data, available cost and time. The approaches are suitable for specific use cases and which fast-changing world adopting to suitable hybrid models is a usual practice.
Diagram: Inmon’s Approach:
Evolution Of Data Modeling In Data Warehouses:
As the enterprise data warehouse architecture started taking shape, the beginning of this journey was the databases which were more integrated and could enable the user to report on the data.
The data was updated with specific frequencies such as daily, weekly or monthly from the business operational databases.
Due to the limitation of query performance and reporting, now companies were looking for more optimized structures along with frequent updates to the reporting databases from transactional databases.
Next phase included more like real-time data update and improved performance databases mainly focussed on reporting and aggregating the data to get summaries.
The organizations grew the data integration to collaborate between different processes of business became essential, and that began today’s harmonized enterprise architecture. This architecture provides all business processes related data can be accessed and joined based on the requirement to derive required intended output of a KPI.
Key Considerations In Data Modeling:
- We are beginning with a clear set of goals or expectations to achieve with excellent business knowhow.
- Keeping in mind the database’s growth rate is enabling scalability and keeping it as simple as possible.
- Beginning with a limited scope and structuring data in facts, dimensions attributes helps
- as the data warehouse grows, keep a tap on growth rate. The archival processes, as well as other optimization techniques, come in handy.
- We are categorizing and segregation wherever necessary.
Recommended Data Modeling Best Practices Well Suited Data Model:
- List all the involved entity types in scope.
- Map the complete list of Dimensions, Attributes and relevant Facts or Measures.
- Define the grain or granularity at which the data is expected to be stored; the most recommended grain would be the lowest grain in most cases, to begin with, data modelling and then transforming and aggregating data to achieve summarised information.
- Mapping of all entities along with their relationships
- define the logic for customized Facts or Measures to be derived.
- Ensuring the optimal number of tables to be created and defining the joins/views wherever necessary.
- Refer and use recommended naming conventions.
- Choose a data model design method to apply.
- Data redundancy to be checked and removed using normalization.
- If the performance is not optimal, apply various denormalization methods to achieve improved performance.
- The essential part of defining a data model is how your access roles and security is passed down to your grain and tables, including access to dimension and its attributes.