A Campbell McCracken is a writer and independent consultant based in Fife.
It has been estimated that, on average,
more than two thirds of a company's IT budget is spent on
the maintenance of legacy databases. This money is used for a
variety of things, including the computers that the databases reside
on, and the patching of software to cater for circumstances that
were unforeseen when the software was originally written. One
topical example of the latter, of course, is the millennium bug,
caused by software systems that use a two-digit representation for
the year remaining in service for much longer than was originally
intended.
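As a rough sketch of the failure mode (in Python, with made-up dates rather than any real system's code), consider an application that stores only the last two digits of the year and assumes the 1900s:

    # Minimal illustration of the millennium bug: the year is held as
    # two digits and the century is silently assumed to be 1900.
    def years_elapsed(year_then, year_now):
        # Correct only while both dates fall within the same century.
        return year_now - year_then

    print(years_elapsed(65, 99))  # 34: right for 1965 to 1999
    print(years_elapsed(65, 0))   # -65: wrong for 1965 to 2000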
It has also been estimated by the
Gartner Group that more than 35 per cent of all programming effort
is taken up in developing and maintaining programs that link databases
and transfer information between them. So, if legacy databases
are such a huge drain on financial resources, what benefits
do they give back in return and what are the problems that you'll
encounter on the way to reaping these benefits?
Legacy databases for operational use
Although relational databases
have been around for a number of years, the majority of companies,
especially large companies, still run their operations using flat
databases known as legacy databases. The term legacy is apt for two
reasons. Many of the databases are passed on, like an inheritance,
from one management regime to the next. Others are inherited when
one company takes over another, or when two companies
merge.
One problem with legacy
databases is that they are usually geared up for operational
business and transactional processing, handling short transactions
hundreds or thousands of times a day. This means that although they
work well on the operational side of a business, in manufacturing
systems, order processing and billing systems and so on, they are
not suited to the decision-support processing required for business
intelligence applications such as data warehousing.
Although many companies would
like to replace their legacy databases and systems, simply moving
the data to a more modern structure does not add value in itself.
Many have used the need to do something about the millennium bug as
an opportunity to upgrade to modern enterprise resource planning
(ERP) systems. However, there are alternatives: consultants such as
SEEC and Progeni offer Year 2000 conversion solutions, especially
for COBOL applications.
Legacy databases can contain
valuable information either on their own, or when several databases
are brought together. One reason for linking the disparate databases
from separate business units of a company or from newly merged
companies into a data warehouse is that hitherto unseen customer
buying trends can be identified and used as a marketing
tool.
For example, Somerfield (one of the UK's largest supermarket chains)
has built an information warehouse based on Oracle's Database Server
which lets them know what a customer bought, what time they bought
it, how they paid for it, and what other items they bought with it.
It helps them to understand their customers, their habits, what they
don't buy and so on (see the PMPL at Norwich Direct case study).
Hidden problems
There are a number of problems that come to light when
you try to merge or migrate several legacy databases to a new database.
The first problem is that legacy databases tend to be fragmented
within a company's structure and rarely talk to each
other. Because of this they are rarely co-ordinated. You would generally
find that the same customer is represented differently in each
database. For example, a well-known company could appear under a
different name in each of three databases, perhaps as 'Digital
Equipment Corporation' in one and simply as 'DEC' in another.
This means that, before the
data from the various databases can be brought together, it has to
be cleaned to improve the quality: it is extremely hard to look for
interesting elements within your data unless you clean up the
inconsistencies. There are tools available that will help tie these
records together. For example, in addition to uncovering information
buried in free-form fields, the INTEGRITY Data Re-engineering
Environment from Vality identifies relationships between data
values, and discovers metadata. For cases where the match is not
exact, INTEGRITY includes data analysis, cleansing, and
probabilistic and fuzzy matching capabilities: it might indicate,
say, a 95 per cent probability that two records refer to the same
customer.
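As a minimal sketch of the principle (this is not Vality's INTEGRITY, just an illustration built on Python's standard library), a string-similarity score can stand in for a match probability:

    from difflib import SequenceMatcher

    def normalise(name):
        # Strip case and punctuation so trivial differences don't count.
        return name.lower().replace(".", "").replace(",", "").strip()

    def match_score(name_a, name_b):
        # Similarity between 0 and 1, read here as a crude match probability.
        return SequenceMatcher(None, normalise(name_a), normalise(name_b)).ratio()

    score = match_score("Digital Equipment Corp.", "Digital Equipment Corporation")
    print(f"{score:.0%} likely to be the same company")

A real data re-engineering tool does far more, weighing evidence across many fields, but the output is the same kind of graded confidence rather than a yes/no answer.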
A second problem arises because some
of the inconsistencies within the database could have been put
there deliberately. These are known as 'planned inconsistencies'.
Although IT departments frequently have to update
their database systems because their business needs have changed or
because they need to store a different kind of information about
their customers, they rarely want to rebuild the database when they
make the changes. So they trick the database into thinking that
nothing has happened. For example they might have declared that a
department record is 200 bytes long. By switching different
copy-books in and out, they can make these 200 bytes mean different
things. However, these tricks are rarely documented adequately and
generally only come to light when, for example, the legacy database
is loaded into a data warehouse and bogus data appears.
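The trick itself is easy to demonstrate. In the sketch below (Python, with a 20-byte miniature of the 200-byte record and invented layouts), the same bytes are read through two different copy-books:

    import struct

    # The database believes every department record is a fixed-length
    # block; which fields those bytes hold depends on which copy-book
    # the program was compiled with.
    record = struct.pack("=10shii", b"SALES", 12, 250000, 4100)

    # Copy-book A: name, headcount, budget, cost-centre code.
    print(struct.unpack("=10shii", record))

    # Copy-book B overlays the same bytes as a name plus five
    # half-words; read with the wrong layout, bogus values appear.
    print(struct.unpack("=10s5h", record))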
Other problems arise from the use
of keys to represent information in databases. If two or more databases
are being merged into a data warehouse then it is likely that
different keys have been used to represent, say, vendors. So, for
example, there could be several different vendors each with key 100.
The default approach is that new keys are created for everything.
If the data warehouse is updated on a batch basis from the
legacy databases then the key correspondence will have to be kept
around to allow the conversion to take place each time. "The whole
issue of getting the data in the databases sorted and consistent
is extremely computer-intensive if you are dealing with anything
but a really simple environment," says ETI's Kay
Hammer.
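A minimal sketch of that default approach (Python, with hypothetical source and vendor names): the warehouse issues fresh surrogate keys and keeps the correspondence table for later batch loads:

    from itertools import count

    # Vendor key 100 means a different vendor in each legacy source.
    legacy_rows = {
        ("ledger_db", 100): "Acme Supplies",
        ("orders_db", 100): "Bolt & Sons",
    }

    next_key = count(1)
    correspondence = {}  # (source, old key) -> new warehouse key
    vendors = {}         # warehouse key -> vendor name

    for (source, old_key), name in legacy_rows.items():
        new_key = next(next_key)
        correspondence[(source, old_key)] = new_key
        vendors[new_key] = name

    # On the next batch load, an incoming ("orders_db", 100) row is
    # routed to the right vendor via the correspondence table.
    key = correspondence[("orders_db", 100)]
    print(key, vendors[key])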
What are the pitfalls?
Most consultants and data
migration specialist organisations predictably warn of the dangers
of attempting to perform a migration project either manually or
unaided. But perhaps they have good reason to. Both the Gartner
Group and the Standish Group put the rate of failure in large data
migration projects at around 30 per cent. Some of the pitfalls
already mentioned could account for some of the failures. Other
reasons include companies rushing into the projects with poor
planning. Steve Moran of IT Consultancy CITL says: "Far too many IT
managers are simply unaware that migrating data is not just another
development project. It demands specific skills, processes and
methods." Simon Jennings, Managing Director of ETI in the UK says:
"Data migration projects tend to be very complex and, even if a
company is aware of automated migration techniques, managers often
underestimate costs and risks." However, even a project that began
with the wrong approach can sometimes be made successful by applying
the right tools.
Danny Lonighan, Senior Sales
Manager at ETI, also suggests that if a customer is planning to
handle the project in-house, then the training of the
implementation team should not be overlooked. "To avoid failure of a
data warehouse, you should get the project started off in the right
environment," says Lonighan. "This typically means 20 to 30 days
training plus the services of a consultant for six months to a
year." This is a small investment when you consider that large
migration projects can have a budget of several hundreds of thousand
of pounds, if not millions.
Conclusion
Although legacy databases may appear
to be just a massive drain on financial resources, they should
not be regarded as something to get rid of. They can, either on
their own or when merged with other databases, be made to yield
valuable information. If you are contemplating a migration or
merging project, don't let the apparent simplicity of the underlying
database structure fool you. A large percentage of such projects
fail because they are not treated with the seriousness they deserve.
Make sure that either you employ skilled consultants or that your
implementation team gets the necessary training.
Metadata
Integrating or migrating a database
will be a difficult process unless you have some sense
of the context of the information,
known as the metadata. A database's
metadata describes, for example, how fields A and B are
related to field C. It has been described as being like
the addressing and postmark information on an envelope,
with the envelope contents being the data which is
generated by an application or tool. Usually a
programming language such as SQL, C++, COBOL or
VBScript is used to describe the metadata. However, if
different databases and tools use different languages,
there is no standard way of representing this form of
functional logic.
This lack of a consistent approach makes
database tool inter-operability hard. "Unless
you have some sense of the context, just
grabbing a piece of code and understanding what it
is all about is not that easy," says ETI's
CEO Kay Hammer. "If you used an ETI tool to build a
data warehouse then it would give you all the information about
the source-to-target mapping, and all the business
rules about what's been done to the data. You
don't want to have to re-enter that data into your
end-user query tools. You'd really like them to suck the
metadata into their system."
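As a minimal sketch of what such exchangeable metadata might look like (the structure here is invented for illustration; it is not ETI's format), one warehouse field carries its sources and the business rule that produced it:

    # Hypothetical metadata for one warehouse field: where it came from
    # and what was done to it, i.e. the audit trail described above.
    field_metadata = {
        "target": "net_price",
        "sources": ["unit_price", "discount_rate"],
        "rule": "net_price = unit_price * (1 - discount_rate)",
        "origin": "orders_db.order_lines",
    }

    # A query tool that can read this never needs the source-to-target
    # mapping re-entered by hand.
    print(field_metadata["target"], "is derived from", field_metadata["sources"])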
With this aim in mind,
a group of approximately 50 vendors and users got together
in 1995 and formed the Meta Data Coalition (MDC)
to push for a metadata inter-operability standard. ETI's
Kay Hammer is also the co-chair of the MDC. In
July of this year, the MDC accepted the Open Information
Model, a technology-independent and vendor-neutral
metadata standard based on technology developed by
Microsoft with over 20 leading industry
partners.
In addition, ETI have
made the grammars of their software tools available to
the Meta Data Coalition, and have given it ownership of
extending them. The grammars consist of a menu
system that sets out the legal choices in defining
the metadata and, because they lead a programmer
through a set of legal choices, s/he can only choose
the legal options. It also means that s/he doesn't
need to know any particular programming
language.
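A rough sketch of the idea (the grammar below is invented; it is not ETI's actual tool): each state lists the only legal next choices, so the user can never pick an illegal option:

    # Hypothetical grammar: each state maps to its legal choices.
    GRAMMAR = {
        "start": ["define field", "define mapping"],
        "define field": ["name", "type", "length"],
        "type": ["CHAR", "INTEGER", "DATE"],
    }

    def legal_choices(state):
        # A menu built from this can never offer an illegal option.
        return GRAMMAR.get(state, [])

    print(legal_choices("type"))  # ['CHAR', 'INTEGER', 'DATE']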
An added benefit of this
approach is that the tools
will create a metadata audit trail that is 'English-like',
so that an average person can read it
and see what has been done to the data values. "If you
don't learn from history, you're destined to
repeat it," says Hammer, "and metadata is your
history." She advocates that when you are choosing an
infrastructure product to solve one of your problems you
should look for the vendors' strategies for metadata
exchange, even if they are not the MDC standard.
"You've got to have a strategy because you'll
need an audit trail of what you've
done."
Contact the author at freelance@madscotsman.freeserve.co.uk
Case Studies
PMPL at Norwich Direct
ETI and The Complexities of Database Mapping
Cognos and Using Business Intelligence to Improve Customer Service & Profitability
Oracle at Conoco
Top Ten Tips
How To Run A Successful Integration Project
Contacts