This section defines basic components of the dimensional
model, facts and dimensions, along with some of the key concepts involved in
handling changes over time.
Facts
Each fact table contains the measurements associated with a
specific business process, like taking an order. A record in a fact table is a
measurement, and a measurement event can always produce a fact table record.
These events usually have numeric measurements that quantify the magnitude of
the event, such as quantity ordered, sale amount, or call duration. These
numbers are called facts.
Most facts are numeric and each fact value can vary widely
depending on the business process being measured. Not all numeric data are
facts. Exceptions include discrete descriptive information like package size or
weight.
The Grain
The level of detail contained in the fact table is called
the grain. It is strongly recommended to build fact tables with the lowest
level of detail that is possible from the original source – generally this is
known as atomic level.
Each fact table must be kept at a single grain.
For example, it would be confusing and dangerous to have
individual sales order line items in the same fact table as the monthly
forecast.
Dimensions
Dimensions are the foundation of the dimensional model,
describing the objects of the business, such as employee and customer.
Each dimension table links to all the business processes in
which it participates. For example, the products dimension will be involved in
supplier orders and inventory. Usually, a dimension includes all instances of
its entity – all the products the company sells, for example.
Bringing Facts and Dimensions Together:
The completed dimensional model has a characteristic
appearance, with the fact table in the middle surrounded by the dimensions.
The figure below shows a simple dimensional model for the
classic example: the retail grocery sales business process.
Figure 1
This model allows users across the business to analyze
retail sales activity from various perspectives. Category managers can look at
sales by product for different stores and different dates. Store planners can
look at sales by store format or location. Store managers can look at sales by
date or cashier. While this model is reasonably robust, a large retail grocer would
have a few more dimensions, customers in particular, and many more attributes.
The Bus Matrix, Conformed Dimensions, and
Drill Across:
The idea of reusing dimensions across multiple business
processes is the foundation of the enterprise data warehouse system and the
heart of the Enterprise Data Warehouse Bus Matrix concept.
In the retail grocery example, a dimension such as Product
will be used in both the retail sales and the store inventory dimensional
models. This works only if the two business processes use the exact same
product dimension with the same keys; that is, they use a conformed dimension.
Conformed dimensions are the cornerstone of the enterprise-enabled data
warehouse system. This kind of analysis involving data from more than one business
process is called drill-across.
Surrogate Keys
For the data warehouse system, we need to create a whole new
set of keys in the data warehouse database, separate from the keys in the
transaction source systems. These are known as Surrogate Keys. A Surrogate key
is a unique value, usually an integer, assigned to each row in the dimension.
This surrogate key becomes the primary key of the dimension table and is used
to join the dimension to the associated foreign key field in the fact table.
Surrogate keys protect the DW system from changes in the
source system.
Surrogate keys allow the DW system to integrate data from
multiple source systems.
Surrogate keys enable you to add rows to dimensions that do
not exist in the source system.
Surrogate keys provide the means for tracking change in
dimension attributes over time.
Integer surrogate keys are an efficient key in the
relational database and Analysis Services. Using them improves query and
processing performance, and because of their compactness, substantially reduces
the size of the fact tables in the relational database.
Slowly Changing Dimensions:
Although we like to think of the attribute values in a
dimension to be fixed, it turns out that some change over time. In an employee
dimension, the date of birth should not change over time. However, other
fields, such as the employee's department, might change several times over the
length of a person's employment. Many of these changes are critical to
understanding the dynamics of the business. The ability to track these changes
over time is one of the fundamental reasons for the existence of the DW system.
If the value of an attribute can change, you need to be
prepared to deal with that change. We call dimensions that have changeable
attribute values slowly changing dimensions.
Dates
Date is the fundamental business dimension across all
organizations and industries, although many times a date table does not exist
in the operational environment. Analyses that trend across dates or make
comparisons between periods are best supported by creating and maintaining a
robust Date dimension table.
Every dimensional DW system has a Date dimension, typically
with one row for every day for which you expect to have data in a fact table.
The date dimension is a good example of a role-playing
dimension. It is common for a date dimension to be used to represent different
dates, such as order date, due date and ship date. If the data is accessed
through Analysis Services, we do not need to bother with views or synonyms to
handle multiple roles. Analysis Services understands the concept of
role-playing dimensions.
Snowflaking
In simple terms, snowflaking is the practice of connecting
lookup tables to fields in the dimension tables. At the extreme, snowflaking
involves re-normalizing the dimensions to the third normal form level, usually
under the misguided belief that this will improve maintainability, increase
flexibility or save space. We discourage snowflaking. It makes the model more
complex and less usable.
In few cases, we support the idea of lookup or grouping
tables to the dimensions. One of these cases involves rarely used lookups, as
in the example of joining the Date table to the DateOfBirth field in the
Customer dimension so we can count customers grouped by their month of birth.