Reports and information have been provided to the masses since the beginning
of time. Whether that information took the form of smoke signals, stone
tablets, or hand-written records, information sharing has been around for
ages. Today we call this collecting and sharing of information data
warehousing. Data warehousing technology comprises a set of new concepts and
tools which support the knowledge worker (executive, manager, and analyst) with
informational material for decision-making. The fundamental reason for building
a data warehouse is to improve the quality of information in an organization.
The key issue is the provision of access to a company-wide view of data
wherever it resides. Data coming from internal and external sources, existing
in a variety of forms from traditional structured data to unstructured data
like text files or multimedia, is cleaned and integrated into a single
repository. A data warehouse (DWH) is the consistent store of this data, which
is made available to end users in a way they can understand and use in a
business context.
A data warehouse is a data repository designed to support
the decision-making process for an organization. Unlike in its operational
system counterpart, the same information may be stored many times in many
different locations. Its primary purpose is to provide management with the information it
needs in order to make intelligent business decisions. In data warehousing,
data is integrated from various heterogeneous operational systems (such as
database systems and flat files) and from further external data sources (such
as demographic and statistical databases). Before the integration, structural
and semantic differences have to be reconciled, i.e., data has to be “homogenized”
according to a uniform data model. Furthermore, data values from operational
systems have to be cleaned in order to get correct data into the data
warehouse.
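The homogenization and cleaning step described above can be sketched roughly as follows. This is only a minimal illustration, not a prescribed method: the two sources (a CRM system and a billing flat file), their field names, the Customer model and the country lookup table are all hypothetical and chosen just to show two differently structured inputs being mapped onto one uniform data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Customer:
    """Uniform (hypothetical) warehouse model for a customer record."""
    customer_id: str
    name: str
    country: str


def clean_text(value: Optional[str]) -> str:
    """Trim whitespace and collapse internal runs of spaces."""
    return " ".join((value or "").split())


def from_crm(row: dict) -> Customer:
    # Hypothetical CRM source with keys 'id', 'full_name', 'country_code'.
    return Customer(
        customer_id=f"crm-{row['id']}",
        name=clean_text(row["full_name"]).title(),
        country=clean_text(row["country_code"]).upper(),
    )


# Assumed lookup table; a real one would cover every country in use.
COUNTRY_NAMES_TO_CODES = {"germany": "DE", "austria": "AT"}


def from_billing_file(line: str) -> Customer:
    # Hypothetical flat file layout: 'number;surname;given name;country name'.
    number, surname, given, country = line.split(";")
    code = COUNTRY_NAMES_TO_CODES.get(clean_text(country).lower(), "UNKNOWN")
    return Customer(
        customer_id=f"bill-{clean_text(number)}",
        name=clean_text(f"{given} {surname}").title(),
        country=code,
    )


def integrate(crm_rows, billing_lines):
    """Homogenize both sources into one list of Customer records."""
    customers = [from_crm(row) for row in crm_rows]
    customers += [from_billing_file(line) for line in billing_lines]
    return customers
```

Here the structural difference (dictionary versus delimited line) and the semantic difference (country code versus country name) are both resolved before anything reaches the warehouse, so every record arrives in the same shape.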
Data warehouses have six distinct characteristics that
differentiate them from operational systems. Data warehouses are:
Subject oriented: The warehouse is
organized around a specific business process such as purchasing.
Integrated: The warehouse is
integrated so that we can relate one subject against another, so that, for
instance, we could perform purchasing versus sales analysis.
Nonvolatile: The warehouse is static;
it does not change like an operational system. We load data on a regular basis
into the warehouse, but we do not change data that already exists in the
database.
Time based: The basic power of the
warehouse is that the information contained in it is based on specific
point-in-time loads. The loads may be daily, weekly, monthly, or based on some
other time period, but whatever that period is, we record it with the data in
the warehouse. The warehouse shows your business information at many points in
time, whereas your operational system shows you only the information as it is
at the moment you look at it.
Accessible: The primary purpose of a
data warehouse is to provide readily accessible information to end-users.
Process-Oriented: It is important to
view data warehousing as a process for the delivery of information. The
maintenance of a data warehouse is ongoing and iterative in nature.
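The nonvolatile and time-based characteristics above can be illustrated with a small sketch. It assumes an in-memory SQLite database and a hypothetical stock_snapshot table; the table name, columns and sample figures are invented for illustration. Each load is appended together with its load date, existing rows are never updated, and a query can then ask what the business looked like at a given point in time.

```python
import sqlite3

# In-memory database standing in for the warehouse (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_snapshot ("
    " load_date TEXT NOT NULL,"      # ISO date of the point-in-time load
    " product TEXT NOT NULL,"
    " quantity INTEGER NOT NULL,"
    " PRIMARY KEY (load_date, product))"
)


def load_snapshot(load_date, rows):
    """Append one point-in-time load; existing rows are never updated."""
    conn.executemany(
        "INSERT INTO stock_snapshot (load_date, product, quantity)"
        " VALUES (?, ?, ?)",
        [(load_date, product, quantity) for product, quantity in rows],
    )
    conn.commit()


def quantity_as_of(product, as_of_date):
    """Return the quantity from the latest load on or before as_of_date."""
    row = conn.execute(
        "SELECT quantity FROM stock_snapshot"
        " WHERE product = ? AND load_date <= ?"
        " ORDER BY load_date DESC LIMIT 1",
        (product, as_of_date),
    ).fetchone()
    return row[0] if row else None


# Two weekly loads; the second adds a row rather than overwriting the first.
load_snapshot("2024-01-01", [("widget", 100)])
load_snapshot("2024-01-08", [("widget", 80)])
```

Because both loads are kept, quantity_as_of("widget", "2024-01-05") still answers from the first load, whereas an operational system would only hold the current value.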
A data warehouse system (DWS) comprises the data warehouse
and all components used for building, accessing and maintaining the DWH. The
center of a data warehouse system is the data warehouse itself. The data import
and preparation component is responsible for data acquisition. It includes all
programs, applications and legacy system interfaces that are responsible for
extracting data from operational sources, preparing it, and loading it into the
warehouse. The access component includes all the applications (such as OLAP or
data mining applications) that make use of the information stored in the
warehouse.
In data warehousing, there are various types of metadata
(metadata is defined as data about data, or data describing the meaning of
data), e.g., information about the operational sources, the structure and
semantics of the DWH data, and the tasks performed during the construction,
maintenance and access of a DWH. The need for metadata is well known.
Statements such as “A data warehouse without adequate metadata is like a filing
cabinet stuffed with papers, but without any folders or labels” characterize
this situation.
Thus, the quality of metadata and the resulting quality of information gained
using a data warehouse solution are tightly linked.
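As a rough illustration of the kinds of metadata listed above, the following sketch keeps a small catalog entry per warehouse table. The class names, fields and the sample table are hypothetical; real metadata repositories are considerably richer. The point is only that the meaning of a column, its operational source, and the load tasks performed are recorded next to the data they describe.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ColumnMetadata:
    name: str
    meaning: str   # business definition of the column
    source: str    # operational system the values come from


@dataclass
class TableMetadata:
    table: str
    columns: list = field(default_factory=list)
    load_history: list = field(default_factory=list)  # (load date, row count)

    def record_load(self, load_date: date, row_count: int) -> None:
        """Note a load task that was performed on this table."""
        self.load_history.append((load_date, row_count))

    def describe(self, column_name: str) -> str:
        """Return a human-readable description of one column."""
        for col in self.columns:
            if col.name == column_name:
                return (
                    f"{self.table}.{col.name}: {col.meaning} "
                    f"(source: {col.source})"
                )
        raise KeyError(f"no metadata recorded for column {column_name!r}")


# Hypothetical catalog entry for a sales table.
sales = TableMetadata(
    table="sales",
    columns=[ColumnMetadata("net_amount", "Invoice total excluding tax", "billing")],
)
sales.record_load(date(2024, 1, 8), 1250)
```

A call such as sales.describe("net_amount") then plays the role of the label on the folder: it tells an end user what the figure means and where it came from.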