Understanding Data Warehousing
Published: 02 Mar 2007
Abstract
In this article, Arindam examines the concept of data warehousing in detail.
by Arindam Ghosh

Introduction

People have always provided reports and information to the masses. Whether in the form of smoke signals, stone tablets, or hand-written records, information sharing has been around for ages. Today we call this collecting and sharing of information data warehousing. Data warehousing technology comprises a set of new concepts and tools that support the knowledge worker (executive, manager, and analyst) with informational material for decision-making. The fundamental reason for building a data warehouse is to improve the quality of information in an organization. The key issue is providing access to a company-wide view of data wherever it resides. Data coming from internal and external sources, in forms ranging from traditional structured data to unstructured data such as text files or multimedia, is cleaned and integrated into a single repository. A data warehouse (DWH) is the consistent store of this data, made available to end users in a way they can understand and use in a business context.

A data warehouse is a data repository designed to support the decision-making process of an organization. Unlike in its operational system counterpart, the same information can be stored many times in many different locations. Its primary purpose is to provide management with the information it needs to make intelligent business decisions. In data warehousing, data is integrated from various heterogeneous operational systems (such as database systems and flat files) and from further external data sources (such as demographic and statistical databases). Before integration, structural and semantic differences have to be reconciled; that is, data have to be “homogenized” according to a uniform data model. Furthermore, data values from operational systems have to be cleaned so that only correct data enters the data warehouse.
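The homogenization and cleaning step described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Customer` model, the two source formats, the field names, and the `COUNTRY_CODES` lookup are hypothetical and not part of any particular warehouse product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Customer:
    """Hypothetical uniform data model shared by all sources."""
    customer_id: str
    name: str
    country: str                  # two-letter code, or "UNKNOWN"
    signup_date: Optional[date]


# Hypothetical lookup used to reconcile semantic differences between sources.
COUNTRY_CODES = {"germany": "DE", "de": "DE", "united states": "US", "us": "US", "usa": "US"}


def clean_country(raw: str) -> str:
    """Map free-text country values onto one code; unknown values become 'UNKNOWN'."""
    return COUNTRY_CODES.get(raw.strip().lower(), "UNKNOWN")


def from_crm(row: dict) -> Customer:
    """Source A (assumed): database rows with ISO dates and split name fields."""
    return Customer(
        customer_id=f"crm-{row['id']}",
        name=f"{row['first_name'].strip()} {row['last_name'].strip()}",
        country=clean_country(row["country"]),
        signup_date=date.fromisoformat(row["created"]),
    )


def from_flat_file(line: str) -> Customer:
    """Source B (assumed): semicolon-separated text with DD.MM.YYYY dates."""
    cust_no, name, country, signup = [part.strip() for part in line.split(";")]
    signup_date = None
    if signup:
        day, month, year = (int(part) for part in signup.split("."))
        signup_date = date(year, month, day)
    return Customer(f"file-{cust_no}", name, clean_country(country), signup_date)


crm_rows = [{"id": 7, "first_name": " Ada", "last_name": "Lovelace",
             "country": "usa", "created": "2006-11-30"}]
file_lines = ["42; Konrad Zuse ;Germany;22.06.2005", "43;Unknown Person;Atlantis;"]

# Both sources end up in the same structure with the same value conventions.
warehouse_rows = [from_crm(r) for r in crm_rows] + [from_flat_file(l) for l in file_lines]
```

Structural differences (split versus combined name fields, different date formats) are handled in the per-source functions, while semantic differences (several spellings for one country) are handled by the shared lookup. Values that cannot be reconciled are marked explicitly rather than guessed.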

Data warehouses have six distinct characteristics that differentiate them from operational systems. Data warehouses are:

Subject oriented: The warehouse is organized around a specific business process such as purchasing.

Integrated: The warehouse is integrated so that we can relate one subject against another, so that, for instance, we could perform purchasing versus sales analysis.

Nonvolatile: The warehouse is static; it does not change the way an operational system does. We load data into the warehouse on a regular basis, but we do not change data that already exists in it.

Time based: The basic power of the warehouse is that the information contained in it is based on specific point-in-time loads. The loads may be daily, weekly, monthly, or based on some other time period, but whatever that period is, we store the load time with the data in the warehouse. The warehouse shows your business information at many points in time, whereas your operational system shows only the state at the moment you look at it.

Accessible: The primary purpose of a data warehouse is to provide readily accessible information to end-users.

Process oriented: It is important to view data warehousing as a process for the delivery of information. The maintenance of a data warehouse is ongoing and iterative in nature.
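The nonvolatile and time-based characteristics in the list above can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the `SnapshotWarehouse` class, its method names, and the stock figures are hypothetical, and a real warehouse would of course use a database rather than an in-memory list.

```python
from datetime import date


class SnapshotWarehouse:
    """Append-only store: each load is stamped with its load date and never modified."""

    def __init__(self):
        self._rows = []               # (load_date, product, stock) tuples
        self._loaded_dates = set()

    def load(self, load_date: date, stock_levels: dict) -> None:
        # Nonvolatile: refuse to overwrite a load that already exists.
        if load_date in self._loaded_dates:
            raise ValueError(f"snapshot for {load_date} already loaded")
        self._loaded_dates.add(load_date)
        for product, stock in stock_levels.items():
            self._rows.append((load_date, product, stock))

    def history(self, product: str) -> list:
        # Time based: return the value at every load date, oldest first.
        return sorted((d, s) for d, p, s in self._rows if p == product)


operational_stock = {"widget": 120}   # operational system: current state only
warehouse = SnapshotWarehouse()

warehouse.load(date(2007, 1, 31), operational_stock)
operational_stock["widget"] = 95      # the operational value is overwritten in place
warehouse.load(date(2007, 2, 28), operational_stock)

widget_history = warehouse.history("widget")
```

After the two loads, the operational dictionary only knows the current stock level, while `widget_history` holds one entry per load date, which is the difference between the two kinds of system that the list describes.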

A data warehouse system (DWS) comprises the data warehouse and all components used for building, accessing, and maintaining the DWH. The center of a data warehouse system is the data warehouse itself. The data import and preparation component is responsible for data acquisition. It includes all programs, applications, and legacy system interfaces that are responsible for extracting data from operational sources, preparing it, and loading it into the warehouse. The access component includes all the applications (such as OLAP or data mining applications) that make use of the information stored in the warehouse.
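As a rough sketch of how these components relate, the following Python example uses the standard-library `sqlite3` module as a stand-in for the warehouse. The `extract`, `prepare`, `load`, and `sales_by_region` functions, the `sales` table, and the sample records are all assumptions chosen for illustration; they do not describe any specific product.

```python
import sqlite3


def extract():
    """Stand-in for an operational source (hypothetical order records)."""
    return [
        {"region": "north", "amount": "100.50"},
        {"region": "North ", "amount": "49.50"},
        {"region": "south", "amount": "200"},
    ]


def prepare(records):
    """Normalize region names and convert amounts from text to numbers."""
    return [(r["region"].strip().lower(), float(r["amount"])) for r in records]


def load(conn, rows):
    """Write the prepared rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)
    conn.commit()


def sales_by_region(conn):
    """Access component: a simple aggregate query of the kind an OLAP tool might issue."""
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")    # in-memory database standing in for the DWH
load(conn, prepare(extract()))        # data import and preparation component
totals = sales_by_region(conn)        # access component
```

The import and preparation side only writes to the warehouse, and the access side only reads from it, so either side can be replaced without touching the other.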

In data warehousing, there are various types of metadata (metadata is defined as data about data, or data describing the meaning of data), for example information about the operational sources, the structure and semantics of the DWH data, and the tasks performed during the construction, maintenance, and access of a DWH. The need for metadata is well known. Statements like “A data warehouse without adequate metadata is like a filing cabinet stuffed with papers, but without any folders or labels” characterize this situation. Thus, the quality of metadata and the resulting quality of information gained using a data warehouse solution are tightly linked.
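To make the idea of metadata more concrete, here is a minimal Python sketch of a catalog that records, for each warehouse column, its meaning, its operational source, and the transformation applied during loading. The `ColumnMetadata` and `TableMetadata` classes and the sample entry are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMetadata:
    name: str
    meaning: str          # semantics of the warehouse data
    source: str           # operational source the value comes from
    transformation: str   # task performed while building the warehouse


@dataclass
class TableMetadata:
    name: str
    columns: dict = field(default_factory=dict)

    def add(self, column: ColumnMetadata) -> None:
        self.columns[column.name] = column

    def describe(self, column_name: str) -> str:
        col = self.columns.get(column_name)
        if col is None:
            # The "filing cabinet without labels" case: data exists, meaning is unknown.
            return f"{self.name}.{column_name}: no metadata recorded"
        return (f"{self.name}.{column_name}: {col.meaning} "
                f"(from {col.source}; {col.transformation})")


sales = TableMetadata("sales")
sales.add(ColumnMetadata("amount", "net order value in EUR",
                         "orders.total", "converted from cents"))

print(sales.describe("amount"))
print(sales.describe("region"))
```

A user querying the `amount` column can look up what the number means and where it came from, while the undocumented `region` column has to be interpreted by guesswork.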



