Wednesday, June 4, 2003

My note back to Brian at Cyveillance:

Brian,

Thank you very much for responding. I think the issue is that your 'bot assumes a wide pipe on the server: for those of us with meager resources, it's a nuisance. We get particularly cranky when the 'bot downloads a directory of 5 years' worth of copyrighted columns, which it has done many hundreds of times now.

Which brings me to my other 2 concerns, and thank you for suggesting I bring them up directly with you:

Is Cyveillance not concerned that you are in at least technical violation of the DMCA when your 'bot ignores robots.txt? Can't this be construed as circumventing a protection mechanism?

Lastly, what do you do with my copyrighted materials? Your site suggests that you provide Internet content to clients (part of your 'early warning system'?) for a fee, which, in the case of my stuff, is pretty clearly a copyright violation, lacking prior authorization. Giving your clients a pointer is one thing, actually giving them the material is quite another.

Anyway, just curious...

Best, thanks again for looking into the bandwidth issue...

Chris

Somebody's got to ask... are the rules the same for corporations and individual citizens?
4:37:47 PM

Cyveillance's Brian Murray responds:
Hi Chris,

Thank you very much for bringing this to my attention. I will speak with our IT folks tomorrow and ask them to investigate and also to exclude your domain from future crawls, just to be sure. We try very hard to prevent this type of thing from happening. It is true that our technology retrieves data by creating a single connection to your Web site and then downloading HTML files across this one connection, but you should know that this is done with the intention of eliminating the work of constantly building and dropping connections. It is also meant to minimize the impact on other users, though it leaves a distinct imprint on typical log files, especially if the server was not loaded at the time, allowing it to fill these serial requests very quickly. Since we do not download images (typically the largest files on a site), minimal bandwidth is required. We believe this low-impact, high-visibility technique is the most responsible technique for us to use under the circumstances. We are also working on ways to further minimize the impact. Based on your message, it appears the approach may not have worked as intended, and I will be sure to look into it. Hopefully, the data you have provided will help us identify what happened here.
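For readers curious what "serial requests over one connection, no images" looks like, and where a pause would fit, here is a minimal sketch. It is not Cyveillance's code: `fetch_serially`, `IMAGE_EXTENSIONS`, and the 5-second default delay are hypothetical. `fetch` stands in for one request sent over a reused connection, and `sleep` is injectable so the pacing can be checked without a network.

```python
import time
from typing import Callable, Iterable, List

# Hypothetical list of extensions treated as images; the email says only
# that images are skipped, not how they are recognized.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")


def fetch_serially(
    paths: Iterable[str],
    fetch: Callable[[str], bytes],
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch non-image paths one at a time, pausing between requests.

    `fetch` represents a single request over one reused (keep-alive)
    connection; `sleep` can be swapped out for testing.
    """
    fetched: List[str] = []
    for path in paths:
        if path.lower().endswith(IMAGE_EXTENSIONS):
            continue  # skip images, as the email describes
        if fetched:
            sleep(delay_seconds)  # the pause between requests that keeps a small pipe usable
        fetch(path)
        fetched.append(path)
    return fetched
```

In practice `fetch` could wrap `http.client.HTTPConnection`, whose `request()` and `getresponse()` calls reuse one connection when the server allows keep-alive. Without the `sleep` call, the loop would do exactly what the logs describe: one request after another as fast as the link allows.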

Thanks again, and please feel free to contact me directly should you have any further issues.

Sincerely,

Brian Murray
Vice President of Client Services
Cyveillance, Inc.

Now I just have 2 more questions, Brian...
4:09:10 PM


"Army looking for a few good blogs?" asks Roger, who spots some unusual visitors to his blog, one of which has also been to gulker.com. It would seem that something called Land Information Warfare Activity, US Army Intelligence and Security Command (INSCOM), Fort Belvoir, Virginia came in on a Google search on 'Cyveillance'. Land Information Warfare Activity? Well, I guess they got a browserful of my opinion about the quality of Cyveillance technology...
12:14:51 PM

Cyveillance, redux: a number of readers have pointed to this comment by Brian Murray, VP of client services at Cyveillance:

http://news.spamcop.net/pipermail/spamcop-help/2003-June/034004.html

Brian writes:

"In terms of the other concerns expressed in this Forum relating to how Cyveillance gathers information from the Internet, we set the highest standards for our online activities... and we take great pains to ensure that our crawlers minimize the load on other sites’ servers"

My response:

Brian-

I read your comments on 'setting high standards' and 'taking pains to minimize your crawlers' load on servers' with great interest.

If the 'bot that usually identifies itself as some flavor of IE operating on IP addresses 63.148.99.xxx is indeed yours (as is widely reported on the Net), I would be grateful for your comments on the behavior I and others have noted in our servers' logs. Here's a grep of my Apache Web server's access log, Dec. 15 to present:

http://www.gulker.com/music_industry/63_148_99_log.txt
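Webmasters who want to pull the same lines out of their own logs without `grep` could use a small script like the one below. It is a sketch that assumes Apache's Common Log Format (client address first, request line in double quotes). The function names are hypothetical, and the default prefix is the 63.148.99 range mentioned above.

```python
from collections import Counter
from typing import Iterable, List


def lines_from_prefix(lines: Iterable[str], prefix: str = "63.148.99.") -> List[str]:
    """Keep log lines whose client address (first field) starts with prefix."""
    return [line for line in lines if line.split(" ", 1)[0].startswith(prefix)]


def requests_per_path(lines: Iterable[str]) -> Counter:
    """Count requested paths in Common Log Format lines."""
    counts: Counter = Counter()
    for line in lines:
        try:
            request = line.split('"')[1]  # e.g. 'GET /path HTTP/1.0'
            path = request.split()[1]
        except IndexError:
            continue  # skip malformed lines
        counts[path] += 1
    return counts
```

For example, `with open("access_log") as f: hits = lines_from_prefix(f)` (the file name is an assumption) is roughly `grep '^63\.148\.99\.' access_log`, and `requests_per_path(hits).most_common(10)` shows which files were fetched most often.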

Please note that the 'bot in question connects repeatedly to long directories and downloads files sequentially without pausing - sometimes more than a hundred in a row - as fast as my relatively modest 144K net connection will allow. A number of other Webmasters have written me with similar experiences.
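The "without pausing" pattern can be measured from the same log. The sketch below reads the bracketed timestamps of Common Log Format lines (assumed to be in file order) and reports the longest run of requests that each arrived within a chosen gap of the previous one. The 2-second default threshold and the function names are hypothetical choices.

```python
from datetime import datetime
from typing import Iterable, List


def timestamps(lines: Iterable[str]) -> List[datetime]:
    """Parse the [dd/Mon/yyyy:HH:MM:SS zone] field of Common Log Format lines."""
    stamps: List[datetime] = []
    for line in lines:
        start, end = line.find("["), line.find("]")
        if start == -1 or end < start:
            continue  # no timestamp on this line
        # %b expects English month abbreviations (C/English locale assumed)
        stamps.append(datetime.strptime(line[start + 1:end], "%d/%b/%Y:%H:%M:%S %z"))
    return stamps


def longest_burst(stamps: List[datetime], max_gap_seconds: float = 2.0) -> int:
    """Length of the longest run of requests, each within max_gap_seconds of the last."""
    if not stamps:
        return 0
    best = run = 1
    for prev, cur in zip(stamps, stamps[1:]):
        if (cur - prev).total_seconds() <= max_gap_seconds:
            run += 1
        else:
            run = 1
        best = max(best, run)
    return best
```

Fed the filtered lines from the previous step, a run of more than a hundred would match the behavior described here.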

While this 'bot is connected, my server is all but inaccessible to others, and we at gulker.com are unable to access external sites easily. This 'bot is not well-behaved: it also ignores robots.txt.
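Honoring robots.txt is not hard: Python's standard library ships a parser for it. The sketch below checks URLs against a robots.txt body. The rules shown are hypothetical (the post does not reproduce gulker.com's actual file), and `allowed` is an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; not gulker.com's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /stories/
"""


def allowed(url: str, user_agent: str = "*", robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)
```

Because this 'bot identifies itself as a flavor of IE rather than by a crawler name, only a `User-agent: *` rule like the one above would apply to it. Either way, compliance is up to the crawler: a well-behaved one calls a check like this before every request.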

So is it yours? If so, when will you apply your stated policy, and fix the darn thing?

Chris Gulker

http://www.gulker.com/

PS. This is probably redundant, since your firm specializes in knowing what's happening on the Net, but there is a category, complete with RSS feed, of information about the behavior of this 'bot:

http://www.gulker.com/categories/cyveillancebot/

Here's an essay I wrote describing my experience with this creature:

http://www.gulker.com/stories/2003/05/06/whatToThinkAboutCyveillanc

And the column I wrote for London's Independent 2 weeks ago:

http://news.independent.co.uk/digital/features/story.jsp?story=408191

And the article about same on Slashdot:

http://yro.slashdot.org/article.pl?sid=03/05/07/0120237

Awaiting a response with great interest...
9:07:18 AM





Updated 4/16/04; 1:19:50 PM

Chris Gulker's view from Silicon Valley - in words and pictures





Features & Categories:
Columns (soon)
Dotcom Garden
Lone Genius Hackers
Picture Weblog
Theory & Strategy
Weblogging

gulker.com Cam

Interesting blogs et al.:

AlwaysOn Network
Natalie d'Arbeloff
Azeem Azhar
Ken Bereskin
Blogging Ecosystem
Blogging Network
BlogStreet
Boing Boing
Tim Bray
Matt Croydon
DaveNet
Rael Dornfest
Esther Dyson
Dave Farber's IP
Dave Fitch
David Galbraith
John Getze
William Gibson
Dan Gillmor
James Gleick
Bernie Goldbach
Meg Hourihan
Joi Ito
Xeni Jardin
Jeff Jarvis
Linux Journal
Mitch Kapor
Kuro5hin
Gunnar Langemark
Joshua Levy
Scott Loftesness
Macintouch
Ross Mayfield
Hans Moravec
Rafe Needleman
Nonsense Verse
OS Opinion
Tim Porter
Recommended Reading
Reverse Cowgirl
Glenn Reynolds
Roger Ridey
Phil Ringnalda
John Robb
Scott Rosenberg
Anita Rowland
Brent Simmons
Robert Scoble
Doc Searls
Jessica Shea
Gavin Sheridan
Shifted Librarian
Stefan Smalla
Bruce Sterling
Scripting News
Slashdot
Dan Shafer
John Tringham
Jon Udell
Mochio Umeda
Philipp Weltentummler
Kevin Werbach
Amy Wohl
