Campbell McCracken reports from the frontier of
a new internet
A new technique has been developed that could
revolutionise the way searches are carried out on the
internet. The technology, dubbed InfraSearch, changes
the way searches are carried out by taking the power of
the search away from huge centralised search engines and
putting it in the hands of the information owners.
Currently search engines work by ‘spidering’ sites.
They start with the set of web pages that they know
about and then ‘click’ on all the links on these pages
to try to find more pages. They take a snapshot of each
of the new pages they find and add them to gigantic
databases. When you perform a search, you actually
search the database of the snapshots, not the original
sites.
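The spidering process described above can be sketched in a few lines of Python. The tiny in-memory 'web', the URLs and the page contents here are all hypothetical stand-ins; a real engine fetches pages over HTTP and stores its snapshots in a vastly larger database:

```python
import re
from collections import deque

# A tiny in-memory 'web': URL -> page HTML (hypothetical pages for illustration).
WEB = {
    "http://a.example": '<a href="http://b.example">B</a> apples',
    "http://b.example": '<a href="http://c.example">C</a> bananas',
    "http://c.example": 'cherries',
}

def spider(seed_urls, fetch):
    """Breadth-first crawl: snapshot each page found, then follow its links."""
    index = {}                      # URL -> snapshot of the page
    queue = deque(seed_urls)
    while queue:
        url = queue.popleft()
        if url in index:            # already snapshotted
            continue
        page = fetch(url)
        if page is None:            # unreachable page
            continue
        index[url] = page           # the 'snapshot' kept in the engine's database
        # 'Click' every link on the page to discover more pages.
        queue.extend(re.findall(r'href="([^"]+)"', page))
    return index

index = spider(["http://a.example"], WEB.get)
# Searches then run against 'index', not against the live sites.
```

Note that once the crawl finishes, the index is frozen: any page that changes or disappears afterwards is misrepresented until the next crawl, which is exactly the staleness problem described below.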
There are many drawbacks to this approach. First,
what you are searching is never really up to date.
Search engines typically re-spider each site only every
few weeks or months, so the index is potentially weeks
or months out of date.
Second, search engines can only spider static web
pages. They cannot take snapshots of dynamic pages (e.g.
pages containing a ‘?’ in their URL) that are created on
the fly by ecommerce or other web sites in response to
customer visits. This means that they cannot index, say,
the contents of a database.
It also means that one of the prime means of
attracting visitors to your site is completely out of
your control. You cannot force a search engine to index
your site. When you create your site or make a change to
it, you can ask the engine to index your site, but you
have no control over how or when it does this.
By contrast, InfraSearch puts the search decision and
management in the hands of the information owner. Each
computer participating in the InfraSearch network runs a
small piece of software that links it to a few other
computers, each of which in turn links to a few more.
This mesh of interconnected computers mirrors the
original architecture of the internet and provides a
level of immunity from outages and denial-of-service
attacks.
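That resilience can be illustrated with a short, hypothetical Python sketch (this is not the InfraSearch wire protocol): link each peer to a few random others, knock one peer offline, and see which peers the rest can still reach:

```python
import random
from collections import deque

def build_mesh(names, links_per_node=3, seed=0):
    """Link each peer to a few random others, forming a decentralised mesh."""
    rng = random.Random(seed)
    neighbours = {name: set() for name in names}
    for name in names:
        others = [n for n in names if n != name]
        for peer in rng.sample(others, links_per_node):
            neighbours[name].add(peer)
            neighbours[peer].add(name)      # links run both ways
    return neighbours

def reachable(neighbours, start, dead=frozenset()):
    """Peers reachable from 'start' when the peers in 'dead' are offline."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for peer in neighbours[node]:
            if peer not in dead and peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen

mesh = build_mesh([f"peer{i}" for i in range(20)])
# Knock one peer out: the rest of the mesh usually remains connected,
# because there is no single central server to take down.
survivors = reachable(mesh, "peer0", dead={"peer7"})
```

Because every peer has several independent routes to the rest of the network, losing any one machine rarely partitions the mesh.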
When a search request is made, it is passed from
computer to computer to see if any of them can respond
to it. The responsibility for how the request is
interpreted is up to each computer. More importantly,
each can decide at the time of the request what
information it has available to fulfil the request. This
means that the response should always be up to date. Bad
links, caused by search engines still indexing web pages
that no longer exist, will be a thing of the past.
Because the decision on how to respond to each
request is in the hands of the information holder, it is
up to the holder to determine where the response comes
from. It could be sourced from a static web page or it
could be created dynamically, using all the information
that is available to the holder, including the full
contents of databases.
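A minimal Python sketch of this request-passing, assuming invented peer names and data and a simple time-to-live to stop requests circulating forever (the actual InfraSearch protocol details are not described here):

```python
class Peer:
    """A peer that interprets search requests however it likes: here, one
    serves static pages and another answers from a live 'database'."""
    def __init__(self, name, answer):
        self.name = name
        self.neighbours = []
        self.answer = answer            # callable: query -> list of results

    def search(self, query, ttl=4, seen=None):
        """Pass the request from computer to computer, collecting responses.
        Each peer decides at request time what it can offer, so results
        reflect what exists right now, not a weeks-old snapshot."""
        seen = seen if seen is not None else set()
        if ttl == 0 or self.name in seen:
            return []
        seen.add(self.name)
        results = list(self.answer(query))      # this peer's own decision
        for peer in self.neighbours:            # then pass the request on
            results.extend(peer.search(query, ttl - 1, seen))
        return results

# One peer serves static pages; another queries its in-memory 'database'.
static_pages = {"gnutella": ["http://peers.example/gnutella.html"]}
inventory = {"widget": 12, "gadget": 0}

a = Peer("a", lambda q: static_pages.get(q, []))
b = Peer("b", lambda q: [f"{q}: {inventory[q]} in stock"] if q in inventory else [])
c = Peer("c", lambda q: [])
a.neighbours, b.neighbours, c.neighbours = [b], [a, c], [b]

hits = a.search("widget")
```

The key design point is that the answering logic lives entirely inside each peer's `answer` function: one peer returns static URLs, another consults a database at the moment the request arrives, and neither needs to tell a central index anything in advance.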
This next step in the evolution of the internet was
born out of a revolt against authorities trying
to stifle the sharing of MP3 music files. The highly
popular Napster software allows anyone to announce the
availability of their collection of MP3 files on a
central server. Others looking for a particular MP3
file can query the server to find out where to go to
get the file. However, the existence of a central server
made it easy for the music industry lawyers to identify
those who were sharing the files, potentially robbing
the industry of royalties, and shut them down.
So programmers at Nullsoft developed Gnutella,
modelled on Napster but requiring no central server.
Instead individual computers with MP3 files to share
were linked, and requests for particular MP3 files were
passed between them.
The wider potential of this approach was spotted by
Gene Kan, one of the lead programmers in the InfraSearch
project. "I realised this wasn’t about swapping MP3
music files, but a cool new technology," said Kan. "The
whole distributed real-time search domain is something
that’s going to change the internet. This is a whole new
technological frontier, ripe for exploitation."
Netscape co-founder Marc Andreessen said, "It’s a big
deal. [InfraSearch] will do for search what the internet
did for communications." He added: "Most of what we’ve
been doing on the web for the past five or six years has
been pretty centralised. It’s ironic it’s taken so long
for this to happen."