Campbell McCracken reports from the frontier of
a new internet
A new technique has been developed that could
revolutionise the way searches are carried out on the
internet. The technology, dubbed InfraSearch, changes
the way searches are carried out by taking the power of
the search away from huge centralised search engines and
putting it in the hands of the information owners.
Currently search engines work by ‘spidering’ sites.
They start with the set of web pages that they know
about and then ‘click’ on all the links on these pages
to try to find more pages. They take a snapshot of each
of the new pages they find and add them to gigantic
databases. When you perform a search, you actually
search the database of the snapshots, not the original
sites.
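The spidering process described above can be sketched in a few lines of Python. The tiny in-memory 'web', the URLs and the page contents here are all hypothetical stand-ins; a real engine fetches pages over HTTP and stores its snapshots in a vastly larger database:

```python
import re
from collections import deque

# A tiny in-memory 'web': URL -> page HTML (hypothetical pages for illustration).
WEB = {
    "http://a.example": '<a href="http://b.example">B</a> apples',
    "http://b.example": '<a href="http://c.example">C</a> bananas',
    "http://c.example": 'cherries',
}

def spider(seed_urls, fetch):
    """Breadth-first crawl: snapshot each page found, then follow its links."""
    index = {}                      # URL -> snapshot of the page
    queue = deque(seed_urls)
    while queue:
        url = queue.popleft()
        if url in index:            # already snapshotted
            continue
        page = fetch(url)
        if page is None:            # unreachable page
            continue
        index[url] = page           # the 'snapshot' kept in the engine's database
        # 'Click' every link on the page to discover more pages.
        queue.extend(re.findall(r'href="([^"]+)"', page))
    return index

index = spider(["http://a.example"], WEB.get)
# Searches then run against 'index', not against the live sites.
```

Note that once the crawl finishes, the index is frozen: any page that changes or disappears afterwards is misrepresented until the next crawl, which is exactly the staleness problem described below.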
There are many drawbacks to this approach. First,
what you are searching is never really up to date.
Search engines typically re-spider each site only every
few weeks or months, so the index is potentially weeks
or months out of date.
Second, search engines can only spider static web
pages. They cannot take snapshots of dynamic pages (e.g.
pages containing a ‘?’ in their URL) that are created on
the fly by ecommerce or other web sites in response to
customer visits. This means that they cannot index, say,
the contents of a database.
It also means that one of the prime means of
attracting visitors to your site is completely out of
your control. You cannot force a search engine to index
your site. When you create your site or make a change to
it, you can ask the engine to index your site, but you
have no control over how or when it does this.
By contrast, InfraSearch puts the search decision and
management in the hands of the information owner. Each
computer participating in the InfraSearch network runs a
small piece of software that links it to a few other
computers, each of which in turn links to a few more.
This mesh of interconnected computers mirrors the
original architecture of the internet and provides a
level of immunity from outages and denial-of-service
attacks.
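That resilience can be illustrated with a short, hypothetical Python sketch (this is not the InfraSearch wire protocol): link each peer to a few random others, knock one peer offline, and see which peers the rest can still reach:

```python
import random
from collections import deque

def build_mesh(names, links_per_node=3, seed=0):
    """Link each peer to a few random others, forming a decentralised mesh."""
    rng = random.Random(seed)
    neighbours = {name: set() for name in names}
    for name in names:
        others = [n for n in names if n != name]
        for peer in rng.sample(others, links_per_node):
            neighbours[name].add(peer)
            neighbours[peer].add(name)      # links run both ways
    return neighbours

def reachable(neighbours, start, dead=frozenset()):
    """Peers reachable from 'start' when the peers in 'dead' are offline."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for peer in neighbours[node]:
            if peer not in dead and peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen

mesh = build_mesh([f"peer{i}" for i in range(20)])
# Knock one peer out: the rest of the mesh usually remains connected,
# because there is no single central server to take down.
survivors = reachable(mesh, "peer0", dead={"peer7"})
```

Because every peer has several independent routes to the rest of the network, losing any one machine rarely partitions the mesh.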
When a search request is made, it is passed from
computer to computer to see if any of them can respond
to it. The responsibility for how the request is
interpreted is up to each computer. More importantly,
each can decide at the time of the request what
information it has available to fulfil the request. This
means that the response should always be up to date. Bad
links, caused by search engines still indexing web pages
that no longer exist, will be a thing of the past.
Because the decision on how to respond to each
request is in the hands of the information holder, it is
up to the holder to determine where the response comes
from. It could be sourced from a static web page or it
could be created dynamically, using all the information
that is available to the holder, including the full
contents of databases.
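A minimal Python sketch of this request-passing, assuming invented peer names and data and a simple time-to-live to stop requests circulating forever (the actual InfraSearch protocol details are not described here):

```python
class Peer:
    """A peer that interprets search requests however it likes: here, one
    serves static pages and another answers from a live 'database'."""
    def __init__(self, name, answer):
        self.name = name
        self.neighbours = []
        self.answer = answer            # callable: query -> list of results

    def search(self, query, ttl=4, seen=None):
        """Pass the request from computer to computer, collecting responses.
        Each peer decides at request time what it can offer, so results
        reflect what exists right now, not a weeks-old snapshot."""
        seen = seen if seen is not None else set()
        if ttl == 0 or self.name in seen:
            return []
        seen.add(self.name)
        results = list(self.answer(query))      # this peer's own decision
        for peer in self.neighbours:            # then pass the request on
            results.extend(peer.search(query, ttl - 1, seen))
        return results

# One peer serves static pages; another queries its in-memory 'database'.
static_pages = {"gnutella": ["http://peers.example/gnutella.html"]}
inventory = {"widget": 12, "gadget": 0}

a = Peer("a", lambda q: static_pages.get(q, []))
b = Peer("b", lambda q: [f"{q}: {inventory[q]} in stock"] if q in inventory else [])
c = Peer("c", lambda q: [])
a.neighbours, b.neighbours, c.neighbours = [b], [a, c], [b]

hits = a.search("widget")
```

The key design point is that the answering logic lives entirely inside each peer's `answer` function: one peer returns static URLs, another consults a database at the moment the request arrives, and neither needs to tell a central index anything in advance.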
This next step in the evolution of the internet was
born out of a revolt against authorities trying
to stifle the sharing of MP3 music files. The highly
popular Napster software allows anyone to announce the
availability of their collection of MP3 files on a
central server. Others looking for a particular MP3
file can query the server to find out where to go to
get the file. However, the existence of a central server
made it easy for the music industry lawyers to identify
those who were sharing the files, potentially robbing
the industry of royalties, and shut them down.
So programmers at Nullsoft developed Gnutella,
modelled on Napster but requiring no central server.
Instead individual computers with MP3 files to share
were linked, and requests for particular MP3 files were
passed between them.
The wider potential of this approach was spotted by
Gene Kan, one of the lead programmers in the InfraSearch
project. "I realised this wasn’t about swapping MP3
music files, but a cool new technology," said Kan. "The
whole distributed real-time search domain is something
that’s going to change the internet. This is a whole new
technological frontier, ripe for exploitation."
Netscape co-founder Marc Andreessen said, "It’s a big
deal. [InfraSearch] will do for search what the internet
did for communications." He added: "Most of what we’ve
been doing on the web for the past five or six years has
been pretty centralised. It’s ironic it’s taken so long
for this to happen."