Bad Robot, get off my Web site!

By Chris Gulker

I'm a geek: I read my Web server access logs along with the morning papers. The logs record visits to my Web site: lately there's been a creepy one. The visitor in question is a 'bot, short for 'robot', but it's really software.

Bots, aka 'crawlers' aka 'spiders', are programs that crawl the Web: they follow links from page to page, endlessly searching for new and changed pages. And they have a lot of work to do: some 7 million new pages are added to the Web daily.

One of the most famous 'bots is Googlebot, the crawler that works for Google and has managed to place 3,083,324,652 pages in the search engine's index. There are so many bots (Google has hundreds, as do Yahoo, Hotbot, Internet Archive and dozens of other organizations) that some informal rules have been put in place to help manage the traffic they generate.

One rule is that the bot identify itself in the Web server's access log, including a line that tells you where to go if you have a problem; another is that the bot will respect the rules for access to the site contained in a file called robots.txt.
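For readers who want to see the second rule in action, here is a minimal sketch of how a well-behaved bot might consult robots.txt before fetching a page, using Python's standard urllib.robotparser module. The robots.txt contents, the bot names (FriendlyBot, ExampleBot) and the paths are all invented for illustration; they are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, made up for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A polite crawler asks before every fetch.
print(parser.can_fetch("FriendlyBot", "/columns/badrobot.html"))  # True
print(parser.can_fetch("FriendlyBot", "/private/notes.html"))     # False
print(parser.can_fetch("ExampleBot", "/columns/badrobot.html"))   # False
```

Nothing enforces any of this: robots.txt is a request, and a bot that never reads the file is never told "no".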

The bot in question is like a stranger who comes to your door and, by way of introduction, lies to you. Indeed, it's a kind of hyperkinetic liar: it forges the names of different versions of Microsoft's Internet Explorer in the space of a few seconds. Imagine meeting someone who told you "Hi, I'm Bob Jones, Hi, I'm Roger Smith, Hi, I'm Elaine MacPherson" in one gush.

If Googlebot is like the well-known delivery person who comes to the door in a uniform, shows ID and leaves a business card, this bot is like an unshaven, twitchy guy with his hat pulled down, lurking by your half-open window.

My visitor is known to work for record companies, motion picture studios and large corporations. Its job is to find out what people think about them and to check, while it's around, that no stray intellectual property (mp3 files, movies, trademarks) happens to be on my server's hard drive.

The bot has another message: it often shows up shortly after I write about bad ideas like digital rights management, the Digital Millennium Copyright Act, or the folly of entertainment companies who seek to punish and criminalize their best customers. Frequently, it hits my Web site and Internet connection so hard that I can't use them, and neither can anyone else.

It has shown up more than 1,000 times since mid-December (I write a lot), and not once has it checked robots.txt, my digital 'keep off the lawn' sign.
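The kind of log reading described above can be sketched in a few lines of Python. This is an illustration only, assuming the server writes the common "combined" access log format; the sample lines, addresses and bot name are invented, and the helper names (summarize, suspicious) are hypothetical. It counts hits per client, collects the browser names each client claimed, and notes whether the client ever asked for robots.txt.

```python
import re
from collections import defaultdict

# Combined log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)

# Invented sample entries; the addresses come from documentation-only ranges.
SAMPLE = [
    '192.0.2.10 - - [16/Apr/2004:13:00:01 -0700] "GET /robots.txt HTTP/1.0" 200 68 "-" "FriendlyBot/1.0 (+http://example.com/bot.html)"',
    '192.0.2.10 - - [16/Apr/2004:13:00:02 -0700] "GET /index.html HTTP/1.0" 200 5120 "-" "FriendlyBot/1.0 (+http://example.com/bot.html)"',
    '198.51.100.7 - - [16/Apr/2004:13:05:10 -0700] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"',
    '198.51.100.7 - - [16/Apr/2004:13:05:11 -0700] "GET /essay.html HTTP/1.1" 200 9000 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '198.51.100.7 - - [16/Apr/2004:13:05:12 -0700] "GET /talk.html HTTP/1.1" 200 7000 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)"',
]


def summarize(lines):
    """Per client: hit count, distinct user agents, and whether robots.txt was requested."""
    stats = defaultdict(lambda: {"hits": 0, "agents": set(), "read_robots": False})
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that are not in the expected format
        entry = stats[match["host"]]
        entry["hits"] += 1
        entry["agents"].add(match["agent"])
        if match["path"] == "/robots.txt":
            entry["read_robots"] = True
    return stats


def suspicious(stats, max_agents=1):
    """Clients that claimed more than max_agents identities and never read robots.txt."""
    return sorted(
        host
        for host, entry in stats.items()
        if len(entry["agents"]) > max_agents and not entry["read_robots"]
    )


print(suspicious(summarize(SAMPLE)))  # ['198.51.100.7']
```

The threshold of one identity per address is a deliberately crude assumption: several real people behind a shared proxy would trip it too, so the result is a list of candidates to look at, not a verdict.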

And, interestingly, for an agent of the defenders of copyright, it has downloaded hundreds of copyrighted articles, essays and presentations from my site. If a scholarly journal's description of this bot is correct, it is sharing these copyrighted files with its clients, without my permission. Isn't sharing copyrighted files the very act that record companies find so objectionable? At least when it's their copyrighted material?

Recently the record industry trade group RIAA apologized for sending dozens of erroneous cease-and-desist orders to people who had been identified by its robots, including one to a professor emeritus of astronomy: it had sought to shut down his department's server in the middle of exam week when it mistook him, and a chorus of astronomers, for the singer Usher. He got a CD and T-shirt for the bother.

But no T-shirt for me, nor, I suspect, for thousands of others. For us, it's just a rude visitor, anytime we dare criticize its masters.


Updated 4/16/04; 1:56:36 PM



