Slashdotted! Daved! Guess the Cyveillancebot rant touched a nerve... good news on the log analysis front: the greater activity and inbound linking mean that some patterns are easier to discern.
I spent some time reading the Cyveillance Web site, and then reading cached versions of previous press releases it has since deleted, in the hopes of understanding just what their technology does, the better to see if patterns in my Web access logs fit various theories.
One theory that's emerged is that Cyveillancebot is particularly sensitive to certain hubs, one of which is almost certainly www.scripting.com. A quick eyeball of the logs seems to show a correlation (and one other blogger noted this) between being linked from Dave's site and being crawled by Cyveillance.
It's as if Cyveillancebot regards certain sites as 'evil' (or maybe it's 'good') and hurries to see what they're linking to. It would be informative to analyze logs at scripting.com and a couple of other high-flow zeitgeist sites and see if there's a discernible pattern.
Cyveillance touts its proprietary Extraction Agents and Data Transformation technology, claiming that it delivers 100% relevant results (call a salesperson for a demo). However, knowledgeable people who hang out at Webmasterworld and Slashdot describe Cyveillancebot as 'stupid' and 'badly behaved'. Not only is it unusually aggressive (it routinely saturates my modest Net connection), it also gets caught in loops on database-driven sites, a trap that most other crawlers have long since been programmed to avoid.
While a 2000 press release bragged of indexing every one of the Web's then 2 billion pages, I note that, as far as I can tell, there are only 4 or 5 Cyveillancebots versus the dozens (or more) that Google runs. Napkin math would seem to indicate that, with so few crawlers, you'd be hard pressed to crawl 3 billion Web pages - including the 7 million new ones every day - often enough to provide Cyveillance clients with the sort of immediate warning of misappropriation of assets or brand that Cyveillance marketing seems to promise.
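For anyone who wants to redo the napkin math, here is a rough sketch. The page counts are the ones quoted above; treating each Cyveillancebot as a single crawler, taking 5 as the bot count, and the choice of refresh intervals are all my assumptions, and the function name is made up for illustration.

```python
# Back-of-envelope crawl-rate estimate. Page counts come from the post;
# the bot count and refresh intervals are assumptions for illustration.
SECONDS_PER_DAY = 24 * 60 * 60


def pages_per_second_per_bot(total_pages, new_pages_per_day, bots, refresh_days):
    """Fetch rate each bot must sustain to revisit every existing page once
    per refresh_days while also fetching each day's new pages."""
    pages_per_day = total_pages / refresh_days + new_pages_per_day
    return pages_per_day / bots / SECONDS_PER_DAY


if __name__ == "__main__":
    for refresh_days in (1, 7, 30):
        rate = pages_per_second_per_bot(3_000_000_000, 7_000_000, 5, refresh_days)
        print(f"refresh every {refresh_days:>2} days: {rate:,.0f} pages/sec per bot")
```

Under these assumptions, even a leisurely monthly refresh works out to roughly 250 pages per second from each bot, around the clock, which is hard to square with any promise of immediate warning.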
Anyway, I've set a couple more 'bot experiments in place: it'll be fun to see what data, if any, can be gleaned. Despite the Slashdotting, the presence of the names of many of Cyveillance's major customers in close proximity to hot-button terms in this document, and a word list likely to be very interesting to the RIAA, Cyveillancebot has not visited since it came in on a link on Monday. And please do continue to send me 'GREP sightings' of Cyveillancebot: you can find it in your access logs by typing at the command line:
grep 63.148.99 access_log
...if you're equipped and inclined to do the analysis, I'd be interested in correlations between its activity (especially the intense sessions where it downloads every page of a site) and inbound links. Do links from certain sites seem to trigger it?
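If you'd like a starting point for that analysis, here is a minimal sketch. It assumes your server writes the Apache "combined" log format, treats any gap of more than 30 minutes between bot hits as the start of a new session, and looks back 24 hours for external referrers; those thresholds, the function names, and the `own_host` parameter are my inventions, not anything Cyveillance documents.

```python
import re
from collections import Counter
from datetime import datetime, timedelta
from urllib.parse import urlparse

# Assumes Apache "combined" format; adjust the pattern for other layouts.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "[^"]*" \d{3} \S+ "(?P<ref>[^"]*)"'
)
BOT_PREFIX = "63.148.99."  # address block from the grep command above


def parse(lines):
    """Yield (ip, timestamp, referrer) for each line that matches the format."""
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
            yield m["ip"], ts, m["ref"]


def referrers_before_bot_visits(lines, own_host,
                                window=timedelta(hours=24),
                                gap=timedelta(minutes=30)):
    """Count, per external referrer host, how many bot sessions began
    within `window` after a visitor arrived from that host."""
    recent = []          # (timestamp, referrer host) for non-bot hits
    counts = Counter()
    last_bot_hit = None
    for ip, ts, ref in parse(lines):
        if ip.startswith(BOT_PREFIX):
            # A hit after a long quiet spell counts as a new session.
            if last_bot_hit is None or ts - last_bot_hit > gap:
                counts.update({h for t, h in recent if ts - t <= window})
            last_bot_hit = ts
        else:
            host = urlparse(ref).hostname  # None for "-" or empty referrers
            if host and host != own_host:
                recent.append((ts, host))
    return counts
```

To try it, open your access log, pass the file object and your own hostname to `referrers_before_bot_visits`, and print `.most_common(10)` on the result. Bear in mind that this only shows which referrers tend to precede a crawl; a busy referrer will score high regardless, so compare against how often the same hosts show up without a bot visit before reading anything into it.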
9:14:19 AM