Bad Robot, get off my Web site!

By Chris Gulker

I'm a geek: I read my Web server access logs along with the morning papers. The logs record visits to my Web site, and lately there's been a creepy one.

The visitor in question is a 'bot, short for 'robot', but it's really software. Bots, aka 'crawlers' aka 'spiders', are programs that crawl the Web: they follow links from page to page, endlessly searching for new and changed pages. And they have a lot of work to do: some 7 million new pages are added to the Web daily. One of the most famous 'bots is Googlebot, the crawler that works for Google, which has managed to place 3,083,324,652 pages in the search engine's index.

There are so many bots (Google has hundreds, as do Yahoo, Hotbot, Internet Archive and dozens of other organizations) that some informal rules have been put in place to help manage the traffic they generate. One rule is that the bot identify itself in the Web server's access log, including a line that tells you where to go if you have a problem; another is that the bot will respect the rules for access to the site contained in a file called robots.txt.

The bot in question is like a stranger who comes to your door and, by way of introduction, lies to you. Indeed, it's a kind of hyperkinetic liar: it forges the names of different versions of Microsoft's Internet Explorer, all in the space of a few seconds. Imagine meeting someone who told you "Hi, I'm Bob Jones, Hi, I'm Roger Smith, Hi, I'm Elaine MacPherson" in one gush. If Googlebot is like the well-known delivery person who comes to the door in a uniform, shows ID and leaves a business card, this bot is like an unshaven, twitchy guy with his hat pulled down, lurking by your half-open window.

My visitor is known to work for record companies, motion picture studios and large corporations.
Its job is to find out what people think about them and to check, while it's around, that no stray intellectual property (mp3 files, movies, trademarks) happens to be on my server's hard drive.

The bot has another message: it often shows up shortly after I write about bad ideas like digital rights management, the Digital Millennium Copyright Act, or the folly of entertainment companies that seek to punish and criminalize their best customers. Frequently, it hits my Web site and Internet connection so hard that I can't use them, and neither can anyone else.

It has shown up more than 1000 times since mid-December (I write a lot) and not once has it checked robots.txt, my digital 'keep off the lawn' sign. And, interestingly for an agent of the defenders of copyright, it has downloaded hundreds of copyrighted articles, essays and presentations from my site. If a scholarly journal's description of this bot is correct, it is sharing these copyrighted files with its clients, without my permission. Isn't sharing copyrighted files the very act that record companies find so objectionable? At least when it's their copyrighted material?

Recently the record industry trade group RIAA apologized for sending dozens of erroneous cease-and-desist orders to people who had been identified by its robots, including one to a professor emeritus of astronomy: it had sought to shut down his department's server in the middle of exam week when it mistook him and a chorus of astronomers for the singer Usher. He got a CD and T-shirt for the bother.
But no T-shirt for me and, I suspect, thousands of others. For us, it's just a rude visitor, anytime we dare criticize its masters.
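
For fellow log readers, here is a minimal sketch of how one might spot a visitor like the one described above: a client that never asks for robots.txt and presents several different browser names within a few seconds. It assumes the server writes the common Apache-style "combined" log format; the function name `rude_visitors`, the thresholds (three names within ten seconds) and every sample log line are invented for illustration and are not taken from the actual logs.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed layout: Apache-style "combined" access log lines.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def parse(line):
    """Return (ip, time, path, agent), or None if the line does not match."""
    m = LOG_RE.match(line)
    if not m:
        return None
    when = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return m["ip"], when, m["path"], m["agent"]


def rude_visitors(lines, window_seconds=10, min_agents=3):
    """List IPs that never fetched /robots.txt and showed at least
    min_agents distinct User-Agent strings within window_seconds."""
    hits = defaultdict(list)   # ip -> [(time, agent), ...]
    read_robots = set()        # ips that requested /robots.txt
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue           # skip lines in some other format
        ip, when, path, agent = rec
        hits[ip].append((when, agent))
        if path.split("?")[0] == "/robots.txt":
            read_robots.add(ip)

    window = timedelta(seconds=window_seconds)
    flagged = []
    for ip, visits in hits.items():
        if ip in read_robots:
            continue           # it at least read the sign on the lawn
        visits.sort(key=lambda v: v[0])
        start = 0
        for end in range(len(visits)):
            # Shrink the window until it spans at most window_seconds.
            while visits[end][0] - visits[start][0] > window:
                start += 1
            agents = {agent for _, agent in visits[start:end + 1]}
            if len(agents) >= min_agents:
                flagged.append(ip)
                break
    return sorted(flagged)


# Invented sample lines: 10.0.0.1 changes its name every second,
# 10.0.0.2 identifies itself and reads robots.txt first.
SAMPLE = [
    '10.0.0.1 - - [16/Apr/2004:13:00:00 -0700] "GET /a.html HTTP/1.0" 200 100 "-" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"',
    '10.0.0.1 - - [16/Apr/2004:13:00:01 -0700] "GET /b.html HTTP/1.0" 200 100 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"',
    '10.0.0.1 - - [16/Apr/2004:13:00:02 -0700] "GET /c.html HTTP/1.0" 200 100 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '10.0.0.2 - - [16/Apr/2004:13:00:00 -0700] "GET /robots.txt HTTP/1.0" 200 50 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '10.0.0.2 - - [16/Apr/2004:13:00:05 -0700] "GET /a.html HTTP/1.0" 200 100 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]
```

Calling `rude_visitors(SAMPLE)` flags only the first address. A check like this is a heuristic, not proof of bad intent: several people behind one shared proxy can also show up as one address with many browser names, so the thresholds would need tuning against real traffic.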
Updated 4/16/04; 1:56:36 PM
AlwaysOn Network