This paper is a list of political issues that frequently come up in data warehousing projects. People often get blindsided by politics. My hope is that this paper might give readers some advance warning of these issues. Though what is done about these issues varies by organization, I believe the best advice to data warehouse implementers is to do your best to spot these issues early and then pick your battles wisely.
I recommend that you read Marc Demarest’s The Politics of Data Warehousing in conjunction with this paper. In his June 1997 paper, Marc comments on how little extended discussion of politics there is in the data warehousing literature. As of the writing of this paper, to the best of my knowledge, that situation still has not changed. This is unfortunate because ambitious data warehousing projects are rife with political issues.
My working definition of a data warehousing "political issue" is a situation where the equally valid and reasonable goals and interests of two or more parties collide with each other. That is, these are situations where there is great potential for conflict. Though these issues can appear minor and even petty, they can account for a good portion of the mental wear and tear experienced by data warehouse developers.
In this paper, I have classified the political issues into those that are within the IS organization (IS to IS), those that are between IS and the users (IS to Users), and those that are between users (User to User).
Finally, in this paper I try to list the political issues that are peculiar to data warehousing. Data warehousing also experiences all the usual political problems (e.g., resources and deadlines) that occur in complex technology projects. Just check the literature on IS project management and you will find a wealth of material on these issues.
IS to IS issues
Internecine conflicts in IS projects can be the most difficult to deal with. Data warehousing projects probably are typical in this respect.
Where does the data warehousing development group report to?
The issue is whether the data warehousing development group should be a
free standing development organization or whether it should be part of
a group that traditionally has concentrated its efforts on transaction
processing development. Often transaction processing development
organizations have been driven by their work order backlogs and the
need to react to whatever is the crisis on hand. Some persons believe
that data warehousing, however, best flourishes when done with an
entrepreneurial orientation rather than with a reactive orientation. On
the other hand, many organizations quickly come to depend on data
warehousing systems for day–to–day work. These data warehousing systems
need to be as "industrial safe" as some of the transaction processing
systems. Placing the data warehousing effort in a separate development
group can lessen knowledge transfer and appreciation of how to make
data warehouses industrial safe.
Who should administer the data warehousing databases – the DBA group or the data warehousing development group?
The need to make data warehouse database structure changes can be
relatively frequent. Proliferating data marts, uncertainty about usage
patterns, and the "I’ll know what I want when I see it" nature of data
warehouse development can necessitate table and index changes. Data
warehouse developers, concerned about losing the favor and interest of
data warehouse users, want changes made quickly and get quite
frustrated being put on the DBA backlog. On the other hand, DBAs often
have knowledge about how to make database processing industrial safe.
Cutting the DBA organization out of the data warehousing support loop
can deprive the data warehousing effort of some valuable wisdom.
How to gain the cooperation of feeder system developers who
appear to have much more to lose than to gain in the data warehouse
development effort
Data warehousing efforts often bring to light problems in feeder
transaction processing systems that may have been "hidden" for years.
The developers of these systems, whose knowledge is often crucial to
the data warehousing effort, may be reluctant to help if they feel that
the data warehousing effort is going to be an audit of their work.
Should feeder system problems be corrected in the data warehouse or in the feeder system?
Actually, the question often becomes whether: 1) The feeder system
should be fixed or 2) The feeder system should be left alone and the
data in the warehouse should be fixed or 3) Data should be fixed in the
data warehouse with the fixes fed back to the feeder system. And to
further complicate matters, usually there are multiple problems with
different groups suggesting different combinations of actions.
Against what data should reports be written?
Often an organization quickly discovers that quite a few reports can be
written against data in the data warehouse or against data in the
transaction processing systems. This can be quite perplexing to
organizations where there is no agreement as to what the data
warehouse is for.
How big is the data warehousing batch processing window?
Often there is a need for a time period where transaction processing
systems are kept stable so changes made to the systems can be captured
and fed into the data warehouse. When changes cannot be easily
identified, a typical course of action is to compare a previous copy of
the transaction system database with the current database. After the
changes are identified, a copy of the current database is made for
comparison in the next processing cycle. In some firms, the need to
"freeze" transaction processing system databases can cause
inconveniences to other processing. How much time should be allotted to
the window in which transaction processing system databases are frozen
can be a source of contention.
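As a rough illustration of the snapshot comparison described above, here is a minimal Python sketch. It assumes each snapshot fits in memory as a dictionary keyed by primary key; the function name diff_snapshots and the sample rows are hypothetical and are not drawn from any particular system.

```python
def diff_snapshots(previous, current):
    """Compare two snapshots keyed by primary key.

    Returns three dicts: rows inserted, rows updated, and rows deleted
    since the previous snapshot was taken.
    """
    inserted = {k: row for k, row in current.items() if k not in previous}
    deleted = {k: row for k, row in previous.items() if k not in current}
    updated = {
        k: row
        for k, row in current.items()
        if k in previous and previous[k] != row
    }
    return inserted, updated, deleted


# Hypothetical customer rows: key -> (name, state)
previous = {1: ("Acme", "NY"), 2: ("Globex", "CA"), 3: ("Initech", "TX")}
current = {1: ("Acme", "NY"), 2: ("Globex", "WA"), 4: ("Umbrella", "IL")}

inserted, updated, deleted = diff_snapshots(previous, current)

# The current snapshot then becomes "previous" for the next processing cycle.
previous = dict(current)
```

The comparison only gives a consistent answer if the source database does not change while the snapshot is taken, which is why the length of the freeze matters to everyone who shares that database.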
Who has ongoing responsibility for data quality monitoring?
Data quality is not a one-time concern to many firms that implement
data warehouses. In a firm with complex feeder systems, it is not
uncommon for previously undiscovered data quality problems to surface after
the big push to clean data for the initial load of the data warehouse
is done. Firms find it necessary to install procedures to regularly
audit data quality. And in most firms it is unclear who should have
responsibility for executing these procedures.
How are requests to make feeder transaction processing system
changes approved and how is knowledge about the changes communicated?
Small changes in feeder transaction processing systems can have major
impacts on the feed to a data warehouse. Conflicts arise when
transaction processing system developers, under pressure from their
users to make changes, now have to work with data warehouse developers
to assess the impact on downstream systems. Even more vexing situations
come when a change is made in the feeder transaction processing system
and is not communicated to the data warehouse developers.
IS to User issues
User issues can be especially thorny with data warehouses because, unlike with transaction processing systems, use of data warehousing systems is often optional. Unless data warehouses are tailored to their preferences, users may quickly decide not to use the data warehouse.
Why should users give up control of user managed databases?
Many user departments have, on their own, developed databases that meet
some of their key reporting needs. Often these systems were built by
user organizations on their own because the IS organization was
unwilling or unable to help the users or the users were skeptical about
the level of support they would receive if they were to work with IS.
When a data warehouse that will subsume the functions of these user
managed databases is proposed, these users are likely to
be skeptical about whether the IS organization can do as good a job
supporting the user reporting needs as the users did on their own.
How to gain the cooperation of a user whose spreadsheet is being automated
Often part of the goal of a data warehouse is to automate the
production of a spreadsheet or series of spreadsheets that have been
manually created by a user. Sometimes the user’s corporate identity is
tied to the spreadsheets and he or she feels (rightfully) threatened by
the prospect of automation. This user’s cooperation will be needed in
the data warehouse development. Though dealing with this sensitive
personnel issue probably should be the responsibility of user
management, often the IS organization has the burden of figuring out
how to gain cooperation.
Should design be for the needs of the masses or for the needs of the most demanding user?
In many data warehousing projects it is not uncommon for the IS
organization to find one to a handful of users whose "needs" go way
beyond those of most of the data warehouse users. Usually, the need is
for a far greater level of detail and/or for far more history and/or
for a series of reports with a high degree of both technical and business
complexity. It can be quite expensive and time consuming to satisfy the
needs of these far more demanding users. On the other hand, these users
can have a peculiar need that is especially beneficial to the business
and/or can be people whose support is vital to the success of the
project.
What requirements should be frozen; When should requirements be frozen (and unfrozen)?
Data warehousing development is iterative. This does not mean that
requirements never get frozen. Rather, there can be many start–stop
cycles in data warehousing requirements definition. Also, some
requirements may be frozen while some are always loose. Managing
requirements definition in a data warehouse effort can require a deft
political touch.
How many data marts should there be?
Users want their own data marts for a variety of reasons. Some of the
reasons are: 1) The desire to put their data on different hardware
platforms so their reporting needs are less impacted by other people’s
processing 2) The desire to modify data at their own discretion (though
this may strike terror in a data warehousing purist) 3) The desire not
to have to work with other groups on resolving data definition issues.
Some of these reasons do make good business sense. Unfortunately, it
can get quite expensive to support a proliferating number of data marts.
In how timely a manner are data corrected?
Sometimes users are used to being able to make a correction to data and
then immediately run reports against corrected data. Perhaps the users
have been running reports against a transaction system database which
could immediately be adjusted. Perhaps the users had their own database
or spreadsheets which they could adjust at their will and then generate
reports. Problems come if data warehouse developers design systems so
that corrections are incorporated into the data warehouse only during a
batch feed at the end of the day or at the end of the week or at the
end of the month.
Who should have responsibility for maintaining data warehouse data not fed by transaction processing systems?
Often as part of a data warehouse it is necessary to manually maintain
dimension tables and conversion tables that contain data not in any
transaction processing system. Also, sometimes budget, forecast, or
quota data must be manually maintained. This maintenance can be quite
involved. Determining whether users and/or IS should bear the
maintenance burden can be a major issue.
Who is in charge of ongoing audit of data quality?
As mentioned before, data errors pop up after the data warehouse is
implemented. For example, problems occur because sometimes data is not
fed from the transaction processing systems or fed multiple times. Many
times it is necessary to make someone explicitly responsible for
regularly auditing data. However, it often is not clear who this person
should be.
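To make the kind of audit described above concrete, here is a minimal Python sketch of a check for feeds that were missed or loaded more than once. It assumes one feed is expected per calendar day and that a log of loaded business dates is available; the function name audit_feed_log and the sample dates are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def audit_feed_log(loaded_dates, start, end):
    """Check a feed log against the expected one-feed-per-day schedule.

    Returns two lists of dates between start and end (inclusive):
    dates with no feed at all, and dates that were fed more than once.
    """
    counts = Counter(loaded_dates)
    expected = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    missing = [d for d in expected if counts[d] == 0]
    duplicated = [d for d in expected if counts[d] > 1]
    return missing, duplicated


# Hypothetical log: January 2 was loaded twice, January 3 never arrived.
log = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 2), date(2024, 1, 4)]

missing, duplicated = audit_feed_log(log, date(2024, 1, 1), date(2024, 1, 4))
```

A check like this is simple to write. The political question is who runs it regularly and who acts on what it finds.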
How to pass responsibility for running and maintaining a report from the users to IS
Users write reports that the business comes to depend on for day–to–day
functioning. Here is what often happens: 1) The reports become too
technically difficult for the users to change and/or 2) The report
"code" becomes lost or corrupted and/or 3) The user leaves the
organization (usually without documenting the report). In these cases,
IS usually gets called in. This need to obtain IS involvement can
create great consternation in an IS organization that thought that
building a data warehouse was going to get it out of the report writing
business.
User to User issues
These are issues that involve potential conflicts among the users of a data warehouse. This does not mean that IS is not involved. Rather, IS can be right in the middle between users.
Who has access to what data?
As can be imagined, one business group may not want another business
group to see its data and one location may not want another location to
see its data. Also common is for division personnel not to want
corporate personnel to see detail division data. Perhaps more
complicated to deal with are concerns of one user group that another
user group may misinterpret data. Often one functional area thinks
another won’t understand certain data, e.g., Sales says Finance won’t
understand "its" numbers and Finance says Sales won’t understand "its"
numbers. Often people whose formal job it is to analyze information
question whether people whose formal job is not to analyze information
will interpret data correctly, e.g., financial and market analysts question
whether line accountants and sales people can understand certain data.
What dimensions, attributes, calculations should be defined similarly?
You may have seen some data warehousing literature that talks about how
the data warehouse should create a "common view" (or some similar term)
of all the data. To put this in what I believe are more concrete
terms, I believe that this refers to making sure that dimensions
conform, that attributes are used consistently, and that calculations
are always calculated the same way. Though this is a nice ideal, I
believe that most firms do not have the patience to do this. Rather,
through a great deal of give and take, firms implementing data
warehouses decide on a subset of dimensions, attributes, and calculations
that are worth the effort of defining consistently.
How to define a customer; How is profitability calculated?
Most firms end up wanting to determine similar definitions of customers
and profitability. It is my opinion that these definition tasks
probably cause more political issues than any other definition tasks.
Note that a common use of a data warehouse is to report profitability
for internal purposes in a way more meaningful than profitability as
calculated per generally accepted accounting principles. It is very
common to want to report profitability by customer and/or by product.
If so, the firm may have issues as to what a customer is. A customer
may be a legal entity, it may be a location, or it may be the people
performing a function for a legal entity or a location, etc. To
determine profitability, it may be necessary to include expense
allocations, the determination of which can be politically contentious.
Finally, another common major issue regarding profitability is when a
sale should be recognized.
Who has final say over the correctness of data?
If multiple user organizations are going to be accessing the same data,
there will be ongoing disagreements about the "correctness" of data
added to the data warehouse. These debates about
correctness will not be about which items are in error. Rather, these will be
debates regarding interpretation of data. Note that an unexpected
consequence of data warehousing is that while before users might be
able to reconcile their differences by making adjustments to summarized
numbers, data warehousing may force them to agree on how the detail
should be interpreted.
Conclusion
If you go through these issues, I believe you will see three common threads regarding why data warehousing projects engender political issues: 1) Data warehousing imposes new obligations whose responsibilities are unclear; 2) Data warehousing requires changes in processes that an organization is comfortable with; 3) Data warehousing requires agreement on some, but not all, definitions of data.