The extraction of data from the operational environment to
the data warehouse environment requires a change in technology.
The selection of data from the operational environment may
be very complex.
Data is reformatted.
Data is cleansed.
Multiple input sources of data exist.
Default values need to be supplied.
Summarization of data often needs to be done.
The input records that must be read have “exotic” or
nonstandard formats.
Data format conversion must be done.
Massive volumes of input must be accounted for.
Perhaps worst of all, data relationships that have been
built into old legacy program logic must be understood and unraveled before
those files can be used as input.
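Several of the tasks listed above (multiple input sources, reformatting, cleansing, default values, format conversion, and summarization) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two source layouts, the field names, the `UNKNOWN` default, and the monthly summary are hypothetical and not taken from any particular system.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw records from two operational sources with different layouts.
SOURCE_A = [
    {"cust": " 0042 ", "date": "03/15/1999", "amount": "120.50", "region": "EU"},
    {"cust": "0042", "date": "03/20/1999", "amount": "79.50", "region": ""},
]
SOURCE_B = [
    {"customer_id": "7", "day": "1999-03-02", "amt_cents": "1000"},
]

DEFAULT_REGION = "UNKNOWN"  # assumed default for missing values


def from_source_a(rec):
    """Reformat and cleanse one source-A record into the common layout."""
    return {
        "customer_id": int(rec["cust"].strip()),                    # cleanse padding
        "date": datetime.strptime(rec["date"], "%m/%d/%Y").date(),  # convert format
        "amount": float(rec["amount"]),
        "region": rec["region"].strip() or DEFAULT_REGION,          # supply default
    }


def from_source_b(rec):
    """Source B stores cents and ISO dates and has no region field at all."""
    return {
        "customer_id": int(rec["customer_id"]),
        "date": datetime.strptime(rec["day"], "%Y-%m-%d").date(),
        "amount": int(rec["amt_cents"]) / 100,                      # cents to units
        "region": DEFAULT_REGION,
    }


def summarize_by_month(rows):
    """Summarize amounts per (customer_id, 'YYYY-MM')."""
    totals = defaultdict(float)
    for row in rows:
        key = (row["customer_id"], row["date"].strftime("%Y-%m"))
        totals[key] += row["amount"]
    return dict(totals)


# Merge both sources into one layout, then summarize.
rows = [from_source_a(r) for r in SOURCE_A] + [from_source_b(r) for r in SOURCE_B]
monthly = summarize_by_month(rows)
print(monthly)
```

Each source gets its own small conversion function so that source-specific quirks stay isolated from the shared summarization step. Real extraction logic would also need to handle malformed records and large input volumes, which this sketch ignores.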
Implementing a concrete data warehouse system (DWS) is a
complex task comprising two major phases. In the DWS configuration phase, a
conceptual view of the warehouse is first specified according to user
requirements (data warehouse design). Then, the data sources involved and the methods
by which data will be extracted and loaded into the warehouse (data
acquisition) are determined.
Finally, decisions are made about persistent storage of the warehouse
using database technology and about the various ways data will be accessed during
analysis.
After the initial load, during the DWS operation phase,
warehouse data must be regularly refreshed: modifications made to operational
data since the last data warehouse (DWH) refreshment must be propagated into the warehouse so
that the data stored in the DWH reflects the state of the underlying operational
systems. Besides DWH refreshment, DWS operation includes further tasks such as
archiving and purging DWH data and DWH monitoring.
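The refreshment step described above can be sketched as a simple incremental update. This is only one possible approach, assuming each operational row carries a `modified_at` timestamp. The row layout, the `refresh` function, and the sample data are hypothetical, and timestamp-based change detection is a swapped-in technique that the text itself does not prescribe.

```python
from datetime import datetime


def refresh(warehouse, operational_rows, last_refresh):
    """Propagate rows modified after last_refresh into the warehouse.

    Rows are inserted or overwritten by key. Returns the number of rows
    propagated. Deletions in the source are NOT detected by this
    timestamp-based approach.
    """
    changed = [r for r in operational_rows if r["modified_at"] > last_refresh]
    for row in changed:
        warehouse[row["id"]] = {"id": row["id"], "balance": row["balance"]}
    return len(changed)


# Hypothetical operational data; row 2 was updated and row 3 was added
# after the previous refresh.
operational = [
    {"id": 1, "balance": 100, "modified_at": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "balance": 250, "modified_at": datetime(2024, 1, 2, 14, 30)},
    {"id": 3, "balance": 75, "modified_at": datetime(2024, 1, 3, 8, 15)},
]

# Warehouse state as of the previous refresh.
warehouse = {1: {"id": 1, "balance": 100}, 2: {"id": 2, "balance": 200}}
last_refresh = datetime(2024, 1, 2, 0, 0)

propagated = refresh(warehouse, operational, last_refresh)
print(propagated, warehouse)
```

Only rows changed since `last_refresh` are touched, so the cost of a refresh depends on the volume of changes rather than on the size of the source. A production refresh would also have to record the new refresh time and deal with deleted source rows, for example through change logs.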