Thursday, October 6, 2011

Trends (#4): Non-Google World

In their latest review of the 'Invisible Web' the website provides its estimates of the size of the Internet. The Google World (everything found using Google) represents only 8% of the total pages hosted on the Internet. That 8% is large, though, with some 26.5 billion public web pages being indexed by Google. But the Non-Google World is much, much larger, with some 300 billion web pages.
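As a quick check on those figures, here is a minimal sketch of the arithmetic. It assumes the 26.5 billion and 300 billion estimates are separate counts that add up to the total; the variable names are my own, for illustration only.

```python
# Estimates quoted in the post (assumed here to be non-overlapping counts).
google_pages = 26.5e9       # pages indexed by Google
non_google_pages = 300e9    # pages in the Non-Google World

# Total pages under that assumption, and Google's share of them.
total_pages = google_pages + non_google_pages
google_share = google_pages / total_pages

print(f"Google's share of all pages: {google_share:.1%}")
```

Under that assumption the share comes out at roughly 8%, in line with the figure quoted.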

When I teach, I call it the non-Google world. Others refer to it as the "invisible Web," the "deep web," the "grey web," or the "cloaked web." Essentially, it is any content that is not findable by Google. Examples: product reviews; full-text articles; academic archives; tax documents...

We forget that this non-Google world exists. Its content can be found if we look for the most logical site (the one that would contain the information we need) rather than for the actual data we need. These sites are:

  1. Pages that are called up only after a separate search is done within that site (not at Google).
  2. Corporate intranet pages that are private and require a password to access.
  3. On-demand databases where the data or content pages only appear if you run a search and the page is then created. These pages are stored but are not indexed by public search engines.
  4. Library databases, which all fall in this category because they are subscribed to (paid for) by libraries, institutions, or consortiums and then made available for easy access on the Internet.

When you think Google is the answer - or the only answer - think again.

