Knowledge Management Magazine


Cover Story


Product-centric databases are dead. They now need to be knowledge-management enabled. Campbell McCracken finds out why.

John Thomson, WhiteCross’s vice president of worldwide marketing

Case Study - WhiteCross at Freeserve
WhiteCross Systems has developed database technology especially suited to rapid analysis of large volumes of data. It has been working with UK service provider and portal Freeserve to help it maximise customers’ experience of the site and improve the effectiveness of advertising. Working with WhiteCross, Freeserve has been able to gain an understanding of how, when and why customers are using the site.

"Within three days of looking at their data we said, ‘Here are two areas that are completely unrelated but the same people visit them with a high degree of frequency," says WhiteCross’s John Thomson. One was horse racing, and the other was astrology. "So they reoriented those two areas and cross-linked them and they saw an increase of about 20 per cent in traffic."

Another important aspect of the analysis is speed. For instance, it is possible for an affiliate to run a one-hour promotion and check the effectiveness of the promotion in order to run it again later in the day. Data can be cleaned and analysed at a rate of 5 billion rows per second, so that Freeserve can see usage patterns in a matter of minutes. This allows for ad-hoc, ‘real time’ analysis.
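The cross-visiting finding in this case study can be illustrated with a small sketch. It is not WhiteCross's method, which the article does not describe; it simply counts, for each pair of site areas, how many distinct visitors used both. The log format, the visitor ids and the name `co_visit_counts` are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def co_visit_counts(visits):
    """Count, for each pair of site areas, how many distinct visitors used both."""
    areas_by_visitor = defaultdict(set)
    for visitor, area in visits:
        areas_by_visitor[visitor].add(area)

    pair_counts = defaultdict(int)
    for areas in areas_by_visitor.values():
        # Sort so that each pair is always recorded in the same order.
        for pair in combinations(sorted(areas), 2):
            pair_counts[pair] += 1
    return dict(pair_counts)


# Hypothetical visit log: (visitor id, site area).
visits = [
    ("u1", "horse racing"), ("u1", "astrology"),
    ("u2", "horse racing"), ("u2", "astrology"), ("u2", "news"),
    ("u3", "news"),
]

counts = co_visit_counts(visits)
top_pair = max(counts, key=counts.get)  # the pair shared by the most visitors
```

A pair of seemingly unrelated areas with a high shared count is a candidate for cross-linking, as in the horse racing and astrology example.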

Traditionally, databases have been used to store structured data, such as customer names, addresses and phone numbers. And until recently, corporate databases tended to be product-centric. "Everybody wanted to build databases to make decisions on products, sales reps, territories and so on," says WhiteCross’s vice president of worldwide marketing John Thomson. But that has changed.

For the past few years the emphasis for databases has moved away from being product-centric to being customer-centric. "No-one’s come to us in the last three or four years and said ‘Build me a large database around my product set,’" says Thomson. "Products have become a dimension of the customer view of the world."

Multi-channel data collection
One of the most recent changes in the way customers do business, and consequently the way organisations need to gather information, is the web. "When you consider that the channel for retail isn’t just the store," says Avellino’s vice president of marketing, Ed Wrazen, "but it’s also through the Internet, it’s through catalogue shopping, mobile WAP shopping as well, you can get a lot of consumer information."

"Probably 90% of what we’re doing right now is based on web data," says WhiteCross’s Thomson. (See Case Study - WhiteCross at Freeserve.) "If you look at that web data you discover your customers are telling you how they want to do business with you."

"There’s an analysis that you can do that is called ClickStream," says Communicata’s head of solutions development, Greg Rouchotas. This gives access to every click that a customer makes on your website. "If you are a big catalogue site you might see that a lot of people are coming down to a specific product category but they’re not actually buying. It could be that there’s interest for that product but your price is too high. So you can play around with price sensitivity and fine tune your price."

The single view
The analysis of customers across all channels allows organisations to identify which products are selling through which channels. "This is particularly so within the retail sector," says Wrazen, "and within the banking, finance and insurance sectors where a lot of information from different products is being consolidated under one single view of the customer."

"But as always, you’ve got to get your data right in the first place," warns Wrazen. "Companies are finding a lot of inconsistencies and a lot of problems in their source data, largely because the data has come from disparate sources, from perhaps 10 or 20 years ago. I know of one company that has something like 20 analysts just going through all this data and just trying to find all the inconsistencies."

One common cause of these inconsistencies is ‘duplicate’ records in the database: records that relate to the same person but are held separately because some of the details are slightly different. It could be that the name has been input differently (for example John Smith one time and J. B. Smith another time) or it could be that the person has moved house.
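A minimal sketch of how such duplicates might be flagged follows. It assumes a deliberately crude rule (same first initial, same surname, same postcode) and hypothetical record fields; commercial matching software is far more sophisticated, and this rule would miss a customer who has moved house.

```python
def name_key(full_name):
    """Reduce a name to (first initial, surname), ignoring case and full stops."""
    parts = full_name.replace(".", " ").lower().split()
    return parts[0][0], parts[-1]


def likely_duplicates(a, b):
    """Flag two records as probable duplicates under the crude rule above."""
    return name_key(a["name"]) == name_key(b["name"]) and a["postcode"] == b["postcode"]


# Hypothetical customer records.
first = {"name": "John Smith", "postcode": "AB1 2CD"}
second = {"name": "J. B. Smith", "postcode": "AB1 2CD"}
third = {"name": "Jane Brown", "postcode": "AB1 2CD"}

same_person = likely_duplicates(first, second)
different_person = likely_duplicates(first, third)
```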

"12% of the population move every year," says Data Discoveries’ marketing manager Claire Breslin. One of the products Data Discoveries offers is Fastrac. "If you need to find somebody, that’s what Fastrac will do," claims Breslin.For example, it automatically verifies when a person was last at the address given. Then using co-habitee and date information, it searches forward on the current year to find the new address of the person you are looking for.

In addition to cleansing the data, you can augment it to create a profile of your best customers. Going by the 80:20 rule, you’ll want to find more customers like the top 20% who generate 80% of your income. Data Discoveries claims its product Realiser can help you do that by adding external demographic and lifestyle information to your existing data. This allows you to create a comprehensive profile of your customers and shows where you can find more like them.
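To make the 80:20 rule concrete, the sketch below picks the smallest group of top-spending customers that accounts for a given share of income. The figures and the name `top_customers` are invented for illustration and have nothing to do with how Realiser works.

```python
def top_customers(revenue_by_customer, share=0.8):
    """Return the highest-revenue customers who together reach `share` of the total."""
    total = sum(revenue_by_customer.values())
    ranked = sorted(revenue_by_customer.items(), key=lambda item: item[1], reverse=True)

    selected, running = [], 0
    for customer, revenue in ranked:
        if running >= share * total:
            break  # the customers already selected cover the target share
        selected.append(customer)
        running += revenue
    return selected


# Hypothetical annual revenue for ten customers (total 1000).
revenue = {
    "A": 500, "B": 320, "C": 30, "D": 30, "E": 25,
    "F": 25, "G": 20, "H": 20, "I": 15, "J": 15,
}

best = top_customers(revenue)
```

In this made-up data set two customers out of ten, or 20%, bring in 82% of the income; these are the customers whose profile you would try to match elsewhere.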

Chris Ward, Oracle’s business intelligence marketing manager

Profile - Oracle 9i
Oracle launched Oracle 9i, the latest version of its database, in June with several new features aimed at KM and Business Intelligence. For example, it boasts integrated Online Analytical Processing (OLAP) and an integrated Scoring Engine, rather than bits of technology that you have to install and maintain separately.

Another feature is the Oracle Internet File System (IFS), which can be used to replace a standard file system. "The advantage is that everything you have is stored in one central place which is then managed properly by your database administrator," says Communicata’s Rouchotas. "Everything is backed up and it’s completely searchable. Oracle has a fairly complex database searching mechanism and you can do all sorts of things, like search for a word that sounds like something, or an image that looks like an image you provide."

"File systems have been pretty good when they’re small scale," says Oracle’s business intelligence marketing manager, Chris Ward. "If you’re moving into an area where you’re looking for versioning, where you’re looking for checking in and checking out of a file if you make changes, where you’re trying to build things around the files, such as XML or metadata, it’s clearly much more preferential to store it inside a single kernel."

"As soon as it’s in the database you have so many more things that you can do with that dataset. It can still be a Word or Excel document, but you can serve it up over the web. You can allow multiple people to work on it. It’s excellent for workgroups, it’s excellent for working in teams, whereas I think the traditional file system is excellent for working individually."

Oracle 9i also includes a much greater support for XML than previous versions. "XML features that are delivered straight away with the product are things like parsing," says Ward. "So you can take a file and run it using the IFS and it will parse the file and encode it in such a way that it has an XML structure, with metadata and everything else around it. And of course you can render a file out to an XML standard. Also we deliver a software development kit so you can extend the structure."

Clustered Computing
The other important new feature in Oracle 9i comes into play when your volume of data is increasing exponentially, or when you’re working off the web and looking at clicks and datasets that produce massive volumes of data in real time. One way of coping with that load is to cluster computers together and share the processing between them.

Clustering technology has been around for a few years, but it has only really been available on mainframe systems. On a UNIX or NT system, as soon as you extended the cluster you had to repartition your data and change your application to recognise the new part of the cluster. "The big breakthrough technology in 9i is something called Real Application Clusters," says Chris Ward.

"Suppose you were running a small company and you had an NT box with a single processor. You can put Oracle 9i on there, with Real Application Clusters, and if you wanted to extend your hardware you could just plug another box into it and it would look as if it was one instance, one database. You would not have to repartition your data or change your applications. It’s automatic. And that’s the big change."

Where the single view stops working
However, there is a problem with the single view concept. "Our Matching Engine software will look for duplications across an organisation’s database," says Innovative Systems’s senior vice president of European operations, Mike Healy. "When we do that analysis and get the marketing and e-commerce people in a room and say ‘Which of these are duplicates?’ they will all argue about it. Typically the e-commerce people are going to be incredibly cautious about bringing the records for John Smith and J. B. Smith together, because maybe that would give John Smith access to J. B. Smith’s transactional information."

"And although they are being the most cautious, the view that they would build would be the least accurate because really 99.9% of the time those two people are the same person. If you don’t bring them together you leave the database riddled with duplicate customer views. When you then go to do KM, most of which is derived from transactional information, those are incredibly wrong." In one example, a top bank thought they had 17 million clients. But after analysis they had 12.5 million clients, the rest were duplicates. So 9 million customer profiles were wrong (the 4.5 million duplicates and the 4.5 million originals), in other words more than 50% of their database.

"The single view of the customer across the enterprise is now hampering people’s ability to do effective KM and targeting," says Healy. "We would argue that you should have one place where customer views are maintained and you should have more than one view of the customer - Purpose Views - for example an ecommerce view (where the duplicates are not brought together) and an operational view (where they are)."


Ed Wrazen, Avellino’s vice president of marketing

Greg Rouchotas, Communicata’s head of solutions development

Mike Healy, Innovative Systems’s senior vice president of European operations

Database technology
Database technology itself continues to evolve. There are now tools to hold different types of data: not just structured textual or numeric data, but also graphics, images, bills of materials, parts explosions, video, multimedia, voice and so on. You can hold the data in a data warehouse format, or a customer view, or a product view and so on. (See Profile - Oracle 9i.)

Volume and speed of access are no longer issues. The amount of data that databases can hold is vast and response times are generally good too. "However it still boils down to design," says Wrazen. "That’s one thing that does not go away. You still have to ensure that the data is in a format that gives you optimal response time and efficiencies."

Increasingly, database technology spans a vast array of computing devices, from PDAs (you can run DB2 on Compaq iPAQs) right up to an IBM System/390 mainframe and every platform in between. "So there is a very nice migration path should you need to move your database to a faster device," says Wrazen. Of course you wouldn’t run a mission critical application with millions of rows on a PDA. "But it’s a nice technology for downloading, say, a salesman’s view of particular clients. We’re going to see more use of hand-held computing technology, and database vendors are certainly moving in that area."