The Invisible Web

 

Back in 2000, Danny Sullivan of SearchEngineWatch wrote about the growing “invisible web”. Since then, search engines have changed their algorithms a great deal to index as much of the invisible web as possible. In this post we look at where the invisible web, also called the deep web, stands today. For those who are not familiar with the term, let's start with the basics.


What is the invisible web?
When you want to find something on the web, you usually turn to a search engine. You type a keyword into the search box and get relevant results. The search engine draws these results from a database of pages it has already crawled, and that database is the search engine's “index”. In short, the results are pages that match your query and that the search engine already knows about.


What about the pages the search engine doesn't know about? It may sound like a strange question, but it is not. Search engine bots fail to index some pages for many reasons, such as:

Dynamic URLs with long parameter strings, which search engine bots may not follow completely; a single missing parameter can even return a 404 error.
Password-protected areas of a website.
Form-controlled entry, where content appears only after a manual action, for example a search form that displays content dynamically.
Pages that are not linked from any other page, so search bots do not know they exist and cannot reach them.

These are just a few. Several other factors also prevent search bots from reaching pages, some of them intentionally.
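The last reason above, unlinked pages, follows from how a link-following bot discovers content: it starts from pages it already knows and follows their links. The sketch below models a site as a hypothetical in-memory link graph (the page paths are made up for illustration, and real crawlers also discover URLs through sitemaps and links from other sites) and runs a breadth-first traversal from the home page. Any page the traversal never reaches is an “orphan” that such a bot cannot find.

```python
from __future__ import annotations

from collections import deque


def reachable_pages(links: dict[str, list[str]], start: str) -> set[str]:
    """Breadth-first traversal of a link graph, the way a link-following bot
    discovers pages: only pages linked (directly or indirectly) from `start`
    are ever visited."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def orphan_pages(links: dict[str, list[str]], start: str) -> set[str]:
    """Pages that exist on the site but cannot be reached by following links."""
    all_pages = set(links)
    for targets in links.values():
        all_pages.update(targets)
    return all_pages - reachable_pages(links, start)


# Hypothetical site: each page maps to the pages it links to.
site = {
    "/": ["/about", "/blog"],
    "/blog": ["/blog/post-1"],
    "/about": [],
    "/blog/post-1": [],
    "/archive/old-report": [],  # exists, but nothing links to it
}

print(sorted(orphan_pages(site, "/")))  # ['/archive/old-report']
```

In this model, adding a link to `/archive/old-report` from any reachable page is enough to make it discoverable.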


How deep is the “invisible web” today?
In December 2006, Computer World reported that search engines index only 16-20% of the web; the remaining 80% or more makes up the invisible web. In 2004, Bill Coughran, Google's VP of Engineering, wrote on Google's official blog that Google had more than 8 billion web pages in its index. Even if we pair that older figure with Computer World's 2006 estimate (if 8 billion pages were 20% of the web, the web would hold about 40 billion pages), more than 32 billion pages would still be waiting to be indexed by search engines. Wow!
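The estimate is simple proportion: divide the index size by the fraction of the web it covers to get the total, then subtract the indexed pages. The sketch below runs the numbers for both ends of Computer World's 16-20% range. The function name is hypothetical, and treating a 2004 index size and a 2006 coverage estimate as describing the same web is a rough assumption.

```python
def unindexed_estimate(indexed_pages: int, coverage_percent: int) -> int:
    """Estimate how many pages are missing from an index, assuming the index
    covers `coverage_percent` percent of all pages (integer math, no rounding
    surprises for these inputs)."""
    total_pages = indexed_pages * 100 // coverage_percent
    return total_pages - indexed_pages


GOOGLE_INDEX_2004 = 8_000_000_000  # "more than 8 billion" pages (2004 figure)

for pct in (20, 16):
    missing = unindexed_estimate(GOOGLE_INDEX_2004, pct)
    print(f"{pct}% coverage -> about {missing / 1e9:.0f} billion unindexed pages")
```

At 20% coverage this gives 32 billion unindexed pages; at 16% it gives 42 billion, which is why “more than 32 billion” is the conservative end of the range.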

Isn't it likely, then, that if you fail to find something with a search engine, it is part of this deeper, invisible web? I am sure you will agree. Let's look at how to search this invisible web.


How do you search the invisible web?
The first step is to understand the problem. Once you know why search engines miss these pages, you can work around it. If the problem is not yet clear, I recommend rereading the sections above. For everyone else, here are the options:

1. Use the tools the website itself provides, such as archives and site search.
2. Use deep web search engines such as Complete Planet.
3. Search subject directories.
4. In regular search engines, add words such as “database” or “list” to your keyword(s).

If you are wondering whether your own website has a similar problem with un-indexed pages, read on.


How can you improve your site's visibility?
If you have read the article carefully and understood the problem with the invisible web, you have probably guessed half of these points already. Still, here they are:

Deep and internal linking of web pages helps search engine bots reach pages through pages that are already indexed.
Bookmark your pages on social bookmarking sites. The idea is the same: help bots reach the pages and learn that they exist.
Publish HTML sitemaps.
Remove password restrictions and avoid them wherever possible.
Publish supplementary content for audio, video and other formats that search engines cannot read.
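An HTML sitemap is simply a page that links to every page you want crawled, so a bot that reaches the sitemap can reach everything listed on it. A minimal sketch, assuming you already have a list of (URL, title) pairs for your site; the function name and the example pages are hypothetical. It uses only the standard library and escapes URLs and titles with `html.escape`.

```python
from __future__ import annotations

from html import escape


def html_sitemap(pages: list[tuple[str, str]], title: str = "Site map") -> str:
    """Build a plain HTML page with one link per (url, page_title) pair."""
    items = "\n".join(
        f'    <li><a href="{escape(url)}">{escape(name)}</a></li>'
        for url, name in pages
    )
    return (
        "<!DOCTYPE html>\n"
        f"<html>\n<head><title>{escape(title)}</title></head>\n<body>\n"
        f"  <h1>{escape(title)}</h1>\n  <ul>\n{items}\n  </ul>\n"
        "</body>\n</html>\n"
    )


# Hypothetical pages, including one that nothing else links to.
pages = [
    ("/", "Home"),
    ("/blog/post-1", "First post"),
    ("/archive/old-report", "Old report & notes"),
]
print(html_sitemap(pages))
```

Linking the generated page from the home page gives bots a path to every URL it lists, including pages that are otherwise orphaned.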

Remember one thing: the content really exists. Either the search engine bot could not find a way to reach and index it, or its format, as with video and audio files, does not let bots read it. Keep this in mind and you will come up with a long list of ways to make your website more visible to search engines.

That's the easiest way out.
