Print   Add To Favorites     Email To Friend   Rate This Article SQL Server 2005 Data Modeling - Part 1
 page 3 of 4
by
Feedback
Views (Total / Last 10 Days): 22356/ 63
Article Contents:

Features

This section defines basic components of the dimensional model, facts and dimensions, along with some of the key concepts involved in handling changes over time.

Facts

Each fact table contains the measurements associated with a specific business process, like taking an order. A record in a fact table is a measurement, and a measurement event can always produce a fact table record. These events usually have numeric measurements that quantify the magnitude of the event, such as quantity ordered, sale amount, or call duration. These numbers are called facts.

Most facts are numeric and each fact value can vary widely depending on the business process being measured. Not all numeric data are facts. Exceptions include discrete descriptive information like package size or weight.

The Grain

The level of detail contained in the fact table is called the grain. It is strongly recommended to build fact tables with the lowest level of detail that is possible from the original source – generally this is known as atomic level.

Each fact table must be kept at a single grain.

For example, it would be confusing and dangerous to have individual sales order line items in the same fact table as the monthly forecast.

Dimensions

Dimensions are the foundation of the dimensional model, describing the objects of the business, such as employee and customer.

Each dimension table links to all the business processes in which it participates. For example, the products dimension will be involved in supplier orders and inventory. Usually, a dimension includes all instances of its entity – all the products the company sells, for example.

Bringing Facts and Dimensions Together:

The completed dimensional model has a characteristic appearance, with the fact table in the middle surrounded by the dimensions.

The figure below shows a simple dimensional model for the classic example: the retail grocery sales business process.

Figure 1

This model allows users across the business to analyze retail sales activity from various perspectives. Category managers can look at sales by product for different stores and different dates. Store planners can look at sales by store format or location. Store managers can look at sales by date or cashier. While this model is reasonably robust, a large retail grocer would have a few more dimensions, customers in particular, and many more attributes.

The Bus Matrix, Conformed Dimensions, and Drill Across:

The idea of reusing dimensions across multiple business processes is the foundation of the enterprise data warehouse system and the heart of the Enterprise Data Warehouse Bus Matrix concept.

In the retail grocery example, a dimension such as Product will be used in both the retail sales and the store inventory dimensional models. This works only if the two business processes use the exact same product dimension with the same keys; that is, they use a conformed dimension. Conformed dimensions are the cornerstone of the enterprise-enabled data warehouse system. This kind of analysis involving data from more than one business process is called drill-across.

Surrogate Keys

For the data warehouse system, we need to create a whole new set of keys in the data warehouse database, separate from the keys in the transaction source systems. These are known as Surrogate Keys. A Surrogate key is a unique value, usually an integer, assigned to each row in the dimension. This surrogate key becomes the primary key of the dimension table and is used to join the dimension to the associated foreign key field in the fact table.

Surrogate keys protect the DW system from changes in the source system.

Surrogate keys allow the DW system to integrate data from multiple source systems.

Surrogate keys enable you to add rows to dimensions that do not exist in the source system.

Surrogate keys provide the means for tracking change in dimension attributes over time.

Integer surrogate keys are an efficient key in the relational database and Analysis Services. Using them improves query and processing performance, and because of their compactness, substantially reduces the size of the fact tables in the relational database.

Slowly Changing Dimensions:

Although we like to think of the attribute values in a dimension to be fixed, it turns out that some change over time. In an employee dimension, the date of birth should not change over time. However, other fields, such as the employee's department, might change several times over the length of a person's employment. Many of these changes are critical to understanding the dynamics of the business. The ability to track these changes over time is one of the fundamental reasons for the existence of the DW system.

If the value of an attribute can change, you need to be prepared to deal with that change. We call dimensions that have changeable attribute values slowly changing dimensions.

Dates

Date is the fundamental business dimension across all organizations and industries, although many times a date table does not exist in the operational environment. Analyses that trend across dates or make comparisons between periods are best supported by creating and maintaining a robust Date dimension table.

Every dimensional DW system has a Date dimension, typically with one row for every day for which you expect to have data in a fact table.

The date dimension is a good example of a role-playing dimension. It is common for a date dimension to be used to represent different dates, such as order date, due date and ship date. If the data is accessed through Analysis Services, we do not need to bother with views or synonyms to handle multiple roles. Analysis Services understands the concept of role-playing dimensions.

Snowflaking

In simple terms, snowflaking is the practice of connecting lookup tables to fields in the dimension tables. At the extreme, snowflaking involves re-normalizing the dimensions to the third normal form level, usually under the misguided belief that this will improve maintainability, increase flexibility or save space. We discourage snowflaking. It makes the model more complex and less usable.

In few cases, we support the idea of lookup or grouping tables to the dimensions. One of these cases involves rarely used lookups, as in the example of joining the Date table to the DateOfBirth field in the Customer dimension so we can count customers grouped by their month of birth.

 « (Page 2) View Entire Article (Page 4) »