The Case for Data Warehousing

The following is a list of the basic reasons why organizations implement data warehousing. This list was put together because too much of the data warehousing literature confuses "next order" benefits with these basic reasons. For example, spend a little time reading data warehouse trade material and you will read about using a data warehouse to "convert data into business intelligence", "make management decision making based on facts not intuition", "get closer to the customers", and the seemingly ubiquitously used phrase "gain competitive advantage". In probably 99% of the data warehousing implementations, data warehousing is only one step out of many in the long road toward the ultimate goal of accomplishing these highfalutin objectives.

The basic reasons organizations implement data warehouses are:

To perform server/disk bound tasks associated with querying and reporting on servers/disks not used by transaction processing systems

Most firms want to set up transaction processing systems so there is a high probability that transactions will be completed in what is judged to be an acceptable amount of time. Reports and queries, which can require a much greater range of limited server/disk resources than transaction processing, run on the servers/disks used by transaction processing systems can lower the probability that transactions complete in an acceptable amount of time. Or, running queries and reports, with their variable resource requirements, on the servers/disks used by transaction processing systems can make it quite complex to manage servers/disks so there is a high enough probability that acceptable response time can be achieved. Firms therefore may find that the least expensive and/or most organizationally expeditious way to obtain high probability of acceptable transaction processing response time is to implement a data warehousing architecture that uses separate servers/disks for some querying and reporting.

To use data models and/or server technologies that speed up querying and reporting and that are not appropriate for transaction processing

There are ways of modeling data that usually speed up querying and reporting (e.g., a star schema) and may not be appropriate for transaction processing because the modeling technique will slow down and complicate transaction processing. Also, there are server technologies that that may speed up query and reporting processing but may slow down transaction processing (e.g., bit-mapped indexing) and server technologies that may speed up transaction processing but slow down query and report processing (e.g., technology for transaction recovery.) – Do note that whether and by how much a modeling technique or server technology is a help or hindrance to querying/reporting and transaction processing varies across vendors' products and according to the situation in which the technique or technology is used.

To provide an environment where a relatively small amount of knowledge of the technical aspects of database technology is required to write and maintain queries and reports and/or to provide a means to speed up the writing and maintaining of queries and reports by technical personnel

Often a data warehouse can be set up so that simpler queries and reports can be written by less technically knowledgeable personnel. Nevertheless, less technically knowledgeable personnel often "hit a complexity wall" and need IS help. IS, however, may also be able to more quickly write and maintain queries and reports written against data warehouse data. It should be noted, however, that much of the improved IS productivity probably comes from the lack of bureaucracy usually associated with establishing reports and queries in the data warehouse.

To provide a repository of "cleaned up" transaction processing systems data that can be reported against and that does not necessarily require fixing the transaction processing systems

Please read my essay on what data errors you may find when building a data warehouse for an explanation of the type of "errors" that need cleaning up. The data warehouse provides an opportunity to clean up the data without changing the transaction processing systems. Note, however, that some data warehousing implementations provide a means to capture corrections made to the data warehouse data and feed the corrections back into transaction processing systems. Sometimes it makes more sense to handle corrections this way than to apply changes directly to the transaction processing system.

To make it easier, on a regular basis, to query and report data from multiple transaction processing systems and/or from external data sources and/or from data that must be stored for query/report purposes only

For a long time firms that need reports with data from multiple systems have been writing data extracts and then running sort/merge logic to combine the extracted data and then running reports against the sort/merged data. In many cases this is a perfectly adequate strategy. However, if a company has large amounts of data that need to be sort/merged frequently, if data purged from transaction processing systems needs to be reported upon, and most importantly, if the data need to be "cleaned", data warehousing may be appropriate.

To provide a repository of transaction processing system data that contains data from a longer span of time than can efficiently be held in a transaction processing system and/or to be able to generate reports "as was" as of a previous point in time

Older data are often purged from transaction processing systems so the expected response time can be better controlled. For querying and reporting, this purged data and the current data may be stored in the data warehouse where there presumably is less of a need to control expected response time or the expected response time is at a much higher level. – As for "as was" reporting, some times it is difficult, if not impossible, to generate a report based on some characteristic at a previous point in time. For example, if you want a report of the salaries of employees at grade Level 3 as of the beginning of each month in 1997, you may not be able to do this because you only have a record of current employee grade level. To be able to handle this type of reporting problem, firms may implement data warehouses that handle what is called the "slowly changing dimension" issue.

To prevent persons who only need to query and report transaction processing system data from having any access whatsoever to transaction processing system databases and logic used to maintain those databases

The concern here is security. For example, data warehousing may be interesting to firms that want to allow report and querying only over the Internet.


Some firms implement data warehousing for all the reasons cited. Some firm implement data warehousing for only one of the reasons cited.

By the way, I am not saying that a data warehouse has no "business" objectives. (I grit my teeth when I say that because I am not one to assume that an IT objective is not a business objective. We IT people are businesspeople too.) I do believe that the achievement of a "business" objective for a data warehouse necessarily comes about because of the achievement of one or many of the above objectives.

If you examine the list you may be struck that need for data warehousing is mainly caused by the limitations of transaction processing systems. These limitations of transaction processing systems are not, however, inherent. That is, the limitations will not be in every implementation of a transaction processing system. Also, the limitations of transaction processing systems will vary in how crippling they are.

Finally, to repeat the point I made initially, a firm that expects to get business intelligence, better decision making, closeness to its customers, and competitive advantage simply by plopping down a data warehouse is in for a surprise. Obtaining these next order benefits requires firms to figure out, usually by trial and error, how to change business practices to best use the data warehouse and then to change their business practices. And that can be harder than implementing a data warehouse.

Comments? Contact Larry Greenfield at larryg@lgisystems.com