Creating and Using Data Warehouses

Dimension Tables

Dimension tables contain attributes that describe fact records in the fact table. Some of these attributes provide descriptive information; others are used to specify how fact table data should be summarized to provide useful information to the analyst. Dimension tables contain hierarchies of attributes that aid in summarization. For example, a dimension containing product information would often contain a hierarchy that separates products into categories such as food, drink, and nonconsumable items, with each of these categories further subdivided a number of times until the individual product SKU is reached at the lowest level.

Dimensional modeling produces dimension tables in which each table contains fact attributes that are independent of those in other dimensions. For example, a customer dimension table contains data about customers, a product dimension table contains information about products, and a store dimension table contains information about stores. Queries use attributes in dimensions to specify a view into the fact information. For example, a query might use the product, store, and time dimensions to ask the question "What was the cost of nonconsumable goods sold in the northeast region in 1999?" Subsequent queries might drill down along one or more dimensions to examine more detailed data, such as "What was the cost of kitchen products in New York City in the third quarter of 1999?" In these examples, the dimension tables are used to specify how a measure (cost) in the fact table is to be summarized.

Columns in a dimension table can be used to categorize the information into hierarchical levels. For example, a dimension table for stores in the FoodMart 2000 sample database includes the following columns that specify the hierarchy levels.

Column Description
store_country Specifies the country in which the store is located. This is the country level of the hierarchy.
store_state Specifies the state in which the store is located. This is the state level of the hierarchy.
store_city Specifies the city in which the store is located. This is the city level of the hierarchy.
store_id Specifies the individual store. This is the lowest level of the hierarchy. This field contains the primary key of the store dimension table and is used to join the dimension table to the fact table.
store_name Specifies the name of the store. The values in this column are used to identify the store to users in a readable form.

Other columns not shown provide additional attribute information. For information about how dimension tables are used in OLAP cubes built using Microsoft® SQL Server™ 2000 Analysis Services, see Dimensions.

Varieties of Dimension Tables

The preceding example illustrates a dimension table that contains a balanced hierarchy that is separated into regular levels. Other types of dimension tables contain less balanced information, such as part-breakdown structures or organization charts in which the hierarchy is represented by parent-child relationships instead of an array of levels.

Surrogate Keys

It is important that primary keys of dimension tables remain stable. It is strongly recommended that surrogate keys be created and used for primary keys for all dimension tables. Surrogate keys are keys that are maintained within the data warehouse instead of keys taken from source data systems. There are several reasons for the use of surrogate keys:

The implementation and management of surrogate keys is the responsibility of the data warehouse. OLTP systems are rarely affected by these situations, and the purpose of these keys is to accurately track history in the data warehouse. Surrogate keys are maintained in the data preparation area during the data transformation process.

Referential Integrity

Referential integrity must be maintained between all dimension tables and the fact table. Each fact record contains foreign keys that relate to primary keys in the dimension tables. Every fact record must have a related record in every dimension table used with that fact table. Missing records in a dimension table can cause facts to be ignored when the dimension table is joined to the fact table to respond to queries or for the population of OLAP cubes. Queries can return inconsistent results if records are missing in one or more dimension tables. Queries that join a defective dimension table to the fact table will exclude facts whereas queries that do not join the defective dimension table will include those facts.

Shared Dimensions

A data warehouse must provide consistent information for similar queries. One method to maintain consistency is to create dimension tables that are shared and used by all components and data marts in the data warehouse. Candidates for shared dimensions include customers, time, products, and geographical dimensions such as the store dimension in the example earlier in this topic. For example, requiring that all OLAP cubes and data marts use the same shared time dimension enforces consistency of results summarized by time.