Web 'Visitor' Statistics
I found this post dissecting site 'users' at adblockplus.org to be rather validating. I use AWStats and Google Analytics at the same time, and in the past I had come close to reconciling the two visitor figures (GA is always lower). I know a lot of these 'missing visits and visitors' were from search engine crawlers... but could it be that nearly 10% of a website's 'traffic' is from crawlers?
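One rough way to check that 10% figure against your own logs is to count how many requests carry a crawler-looking user agent. The snippet below is only a sketch under some assumptions: it expects Apache "combined" format log lines (where the user agent is the last quoted field), the marker list in BOT_MARKERS is a tiny hypothetical sample rather than a real crawler list, the sample log lines are made up, and it counts raw hits rather than visits or visitors, so the number won't line up exactly with what AWStats or GA report.

```python
import re

# Hypothetical, deliberately short list of substrings that suggest a crawler.
# A real list (such as the one AWStats ships with) is far longer.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")

# In the combined log format the user agent is the last double-quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def is_crawler(log_line):
    """Return True if the line's user agent contains a crawler marker."""
    match = UA_PATTERN.search(log_line)
    if not match:
        return False
    user_agent = match.group(1).lower()
    return any(marker in user_agent for marker in BOT_MARKERS)


def crawler_share(log_lines):
    """Fraction of non-empty log lines (hits) that look like crawler requests."""
    lines = [line for line in log_lines if line.strip()]
    if not lines:
        return 0.0
    crawler_hits = sum(1 for line in lines if is_crawler(line))
    return crawler_hits / len(lines)


# Made-up sample lines for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2008:13:55:36 -0700] "GET / HTTP/1.1" 200 2326 "-" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Gecko/2008070208 Firefox/3.0.1"',
    '192.0.2.11 - - [10/Oct/2008:13:56:02 -0700] "GET /about HTTP/1.1" 200 1120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.12 - - [10/Oct/2008:13:57:40 -0700] "GET / HTTP/1.1" 200 2326 "-" '
    '"Opera/9.51 (Windows NT 5.1; U; en)"',
    '192.0.2.13 - - [10/Oct/2008:13:58:15 -0700] "GET /feed HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_4) AppleWebKit/525.18 Safari/525.20.1"',
]

if __name__ == "__main__":
    print("Crawler share of hits: {:.0%}".format(crawler_share(SAMPLE_LOG)))
```

To try it on a real log, read the file's lines and pass them to crawler_share. Crawlers that present a normal browser user agent will slip through, so treat the result as a lower bound.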
I've often thought there is a market for a company that 'caches' the web and offers cloud computing capacity at a premium in order to access the cache. Yahoo has started to do something similar with the introduction of BOSS, but as far as I can tell you still need to 'move' the search results onto your own computing capacity (think Amazon's EC2), and you are limited to what is in Yahoo's index (vs. the entire HTML). Amazon has a service that can get you pretty close, but it too doesn't provide access to the entire page. It's called Grep the Web and is an option on the Alexa web service.
I'm waiting for a company like Cuill.com or maybe Metaweb to create this offer... or a variation that is 'free' if your output is placed into the public domain and made accessible on the cache provider's site.
Makes me wonder... how big is this market? Are there only 100 companies willing to pay for such a cache, or would others come out of the ether?