<?xml version="1.0"?><!-- RSS generated by Radio UserLand v8.0 on Sun, 08 Jun 2003 00:08:28 GMT --><rss version="2.0">	<channel>		<title>Chris Gulker: Cyveillancebot</title>		<link>http://www.gulker.com/categories/cyveillancebot/</link>		<description>Information and rants about Cyveillance Inc.&apos;s very bad bot</description>		<language>en-us</language>		<copyright>Copyright 2003 Chris Gulker</copyright>		<lastBuildDate>Sun, 08 Jun 2003 00:08:28 GMT</lastBuildDate>		<docs>http://backend.userland.com/rss</docs>		<generator>Radio UserLand v8.0</generator>		<managingEditor>cg@gulker.com</managingEditor>		<webMaster>cg@gulker.com</webMaster>		<category domain="http://www.weblogs.com/rssUpdates/changes.xml">rssUpdates</category> 		<skipHours>			<hour>0</hour>			<hour>1</hour>			<hour>2</hour>			<hour>3</hour>			<hour>4</hour>			<hour>5</hour>			<hour>23</hour>			<hour>22</hour>			</skipHours>		<ttl>60</ttl>		<item>			<description>&lt;strong&gt;Brian Murray, Cyveillance VP of client services called&lt;/strong&gt; this morning. We had a long, amicable chat. Brian made a couple points that I think are fair to point out: &lt;blockquote&gt;- his firm does not currently work for RIAA&lt;/p&gt;&lt;p&gt;- the &apos;bot is not meant to harass sites that publish opinions that its clients may not favor. They are looking at ways to ameliorate the &apos;bot&apos;s &apos;DoS mode&apos;&lt;/p&gt;&lt;p&gt;- their &apos;bots crawl either randomly or in an A to Z fashion, not in response to postings&lt;/p&gt;&lt;p&gt;- Cyveillance does not store  or distribute materials downloaded from Web sites, except for materials that belong to their clients&lt;/blockquote&gt;&lt;/p&gt;&lt;p&gt;I expressed my concerns that the behavior of the technology and their public messages are out of sync. 
It seems to me that a firm that wanted to be a good net citizen would fix the &apos;bot, observe robots.txt and otherwise be straightforward and forthcoming, insofar as that is consistent with their mission. Brian made reasonable responses, but said he would leave it to others to decide whether circumventing robots.txt amounts to &apos;circumventing a protection mechanism&apos; per the DMCA.&lt;/p&gt;&lt;p&gt;Lastly, I asked what would happen if their detection mechanism misfired, and included my content along with some of the creepy stuff they sift through. I was concerned that a bug in their software would land my essays and articles in some sort of &apos;bad content line-up&apos; that had the potential to sully my good name (such as it is). His response was that, while no one can guarantee that software won&apos;t misfire, everything that Cyveillance software flags is checked by humans (unlike &lt;a href=&quot;http://rss.com.com/2100-1025_3-1001319.html?type=pt&amp;part=rss&amp;tag=feed&amp;subj=news&quot;&gt;RIAA&apos;s last batch of Cease-and-Desist orders&lt;/a&gt;).  &lt;em&gt;Brian was concerned that I had published his email.  BTW my email to him was a public response to his public note on the spamcop-help list. We are now in sync about expectations.  I may go drop in on these guys in D.C. later on this year...&lt;/em&gt;</description>			<guid>http://www.gulker.com/categories/cyveillancebot/2003/06/05.html#a1421</guid>			<pubDate>Thu, 05 Jun 2003 17:19:27 GMT</pubDate>			<comments>http://radiocomments.userland.com/comments?u=100924&amp;amp;p=1421&amp;amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F06%2F05.html%23a1421</comments>			</item>		<item>			<description>&lt;strong&gt;My note back to Brian at Cyveillance&lt;/strong&gt;:&lt;/p&gt;&lt;p&gt;&lt;blockquote&gt;Brian,&lt;/p&gt;&lt;p&gt;Thank you very much for responding. 
I think the issue is that your &apos;bot assumes a wide pipe on the server: for those of us with meager resources, it&apos;s a nuisance. We get particularly cranky when the &apos;bot downloads a &lt;a href=&quot;http://www.gulker.com/ra/&quot;&gt;directory&lt;/a&gt; of 5 years worth of copyrighted columns, which it has done many hundreds of times now.&lt;/p&gt;&lt;p&gt;Which brings me to my other 2 concerns, and thank you for suggesting I bring them up directly with you: &lt;/p&gt;&lt;p&gt;Is Cyveillance not concerned that you are in at least technical violation of the DMCA when your &apos;bot ignores robots.txt? Can&apos;t this be construed as circumventing a protection mechanism?&lt;/p&gt;&lt;p&gt;Lastly, what do you do with my copyrighted materials? Your site suggests that you provide Internet content to clients (part of your &apos;&lt;a href=&quot;http://www.cyveillance.com/web/solutions/corp_security.htm&quot;&gt;early warning system&lt;/a&gt;&apos;?) for a fee, which, in the case of my stuff, is pretty clearly a copyright violation, lacking prior authorization.  Giving your clients a pointer is one thing, actually giving them the material is quite another.&lt;/p&gt;&lt;p&gt;Anyway, just curious...&lt;/p&gt;&lt;p&gt;Best, thanks again for looking into the bandwidth issue...&lt;/p&gt;&lt;p&gt;Chris&lt;/p&gt;&lt;p&gt;&lt;/blockquote&gt;&lt;em&gt;Somebody&apos;s got to ask... 
are the rules the same for corporations and individual citizens?&lt;/em&gt;</description>			<guid>http://www.gulker.com/categories/cyveillancebot/2003/06/04.html#a1416</guid>			<pubDate>Wed, 04 Jun 2003 23:37:47 GMT</pubDate>			<comments>http://radiocomments.userland.com/comments?u=100924&amp;amp;p=1416&amp;amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F06%2F04.html%23a1416</comments>			</item>		<item>			<description>&lt;strong&gt;Cyveillance&apos;s Brian Murray responds&lt;/strong&gt;:&lt;blockquote&gt;Hi Chris,&lt;/p&gt;&lt;p&gt;Thank you very much for bringing this to my attention.  I will speak with our IT folks tomorrow and ask them to investigate and also to exclude your domain from future crawls, just to be sure.  We try very hard to prevent this type of thing from happening.  It is true that our technology retrieves data by creating a single connection to your Web site and then downloading html files across this one connection, but you should know that this is done with the intention of eliminating the work of constantly building and dropping connections.  It is also meant to minimize the impact on other users, though it leaves a distinct imprint on typical log files, especially if the server was not loaded at the time, allowing it to fill these serial requests very quickly. Since we do not download images-typically the largest files on a site-minimal bandwidth is required. We believe this low impact, high visibility technique is the most responsible technique for us to use under the circumstances.  We are also working on ways to further minimize the impact.  Based on your message, it appears the approach may not have worked as intended, and I will be sure to look into it.  Hopefully, the data you have provided will help us identify what happened here.&lt;/p&gt;&lt;p&gt;Thanks again, and please feel free to contact me directly should you have any further issues.  
&lt;/p&gt;&lt;p&gt;Sincerely,&lt;/p&gt;&lt;p&gt;Brian Murray&lt;br&gt;Vice President of Client Services&lt;br&gt;Cyveillance, Inc.&lt;/blockquote&gt;&lt;/p&gt;&lt;p&gt;&lt;em&gt;Now I just have 2 more questions, Brian...   &lt;/em&gt;</description>			<guid>http://www.gulker.com/categories/cyveillancebot/2003/06/04.html#a1415</guid>			<pubDate>Wed, 04 Jun 2003 23:09:10 GMT</pubDate>			<comments>http://radiocomments.userland.com/comments?u=100924&amp;amp;p=1415&amp;amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F06%2F04.html%23a1415</comments>			</item>		<item>			<description>&lt;strong&gt;&quot;Army looking for a few good blogs?&quot;&lt;/strong&gt; asks &quot;Roger&quot;, who spots some &lt;a href=&quot;http://www.ridey.net/blog/2003/06/04.html#a244&quot;&gt;unusual visitors&lt;/a&gt; to his blog, one of which has  also been to gulker.com. It would seem that something called Land Information Warfare Activity, US Army Intelligence and Security Command (INSCOM), Fort Belvoir, Virginia came in on a Google search on &apos;Cyveillance&apos;.  &lt;em&gt;Land Information Warfare Activity? Well, I guess they got a browserfull of my opinion about the quality of Cyveillance technology... 
&lt;/em&gt;</description>			<guid>http://www.gulker.com/categories/cyveillancebot/2003/06/04.html#a1412</guid>			<pubDate>Wed, 04 Jun 2003 19:14:51 GMT</pubDate>			<comments>http://radiocomments.userland.com/comments?u=100924&amp;amp;p=1412&amp;amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F06%2F04.html%23a1412</comments>			</item>		<item>			<description>&lt;strong&gt;Cyveillance, redux&lt;/strong&gt;: a number of readers have pointed to this comment by Brian Murray, VP of client services at Cyveillance:&lt;/p&gt;&lt;p&gt;&lt;font size=-1&gt;&lt;a href=&quot;http://news.spamcop.net/pipermail/spamcop-help/2003-June/034004.html&quot;&gt;http://news.spamcop.net/pipermail/spamcop-help/2003-June/034004.html&lt;/a&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;Brian writes:&lt;/p&gt;&lt;p&gt;&lt;blockquote&gt;&quot;In terms of the other concerns expressed in this Forum relating to how Cyveillance gathers information from the Internet, we set the highest standards for our online activities... and we take great pains to ensure that our crawlers minimize the load on other sites&apos; servers&quot;&lt;/blockquote&gt;&lt;/p&gt;&lt;p&gt;My response:&lt;/p&gt;&lt;p&gt;Brian-&lt;/p&gt;&lt;p&gt;I read your comments on &apos;setting high standards&apos; and &apos;taking pains to minimize your crawlers&apos; load on servers&apos; with great interest.&lt;/p&gt;&lt;p&gt;If the &apos;bot that usually identifies itself as some flavor of IE operating on IP addresses 63.148.99.xxx is indeed yours (as is widely reported on the Net), I would be grateful for your comments on the behavior I and others have noted in our server&apos;s logs. Here&apos;s a GREP of my Apache Web server&apos;s access log, Dec. 
15 to present:&lt;/p&gt;&lt;p&gt;   &lt;font size=-1&gt;&lt;a href=&quot;http://www.gulker.com/music_industry/63_148_99_log.txt&quot;&gt;http://www.gulker.com/music_industry/63_148_99_log.txt&lt;/a&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;Please note that the &apos;bot in question connects repeatedly to long directories and downloads files sequentially without pausing - sometimes more than a hundred in a row - as fast as my relatively modest 144K net connection will allow. A number of other Webmasters have written me with similar experiences.&lt;/p&gt;&lt;p&gt;While this &apos;bot is connected, my server is all but inaccessible to others, and we at gulker.com are unable to access external sites easily.  This &apos;bot is not well-behaved: it also ignores robots.txt.&lt;/p&gt;&lt;p&gt;So is it yours? If so, when will you apply your stated policy, and fix the darn thing?&lt;/p&gt;&lt;p&gt;Chris Gulker&lt;br&gt;&lt;/p&gt;&lt;p&gt;&lt;a href=&quot;http://www.gulker.com/&quot;&gt;http://www.gulker.com/&lt;/a&gt;&lt;/p&gt;&lt;p&gt;PS. 
This is probably redundant, since your firm specializes in knowing what&apos;s happening on the Net, but there is a category, complete with RSS feed, of information about the behavior of this &apos;bot:&lt;/p&gt;&lt;p&gt;   &lt;font size=-1&gt;&lt;a href=&quot;http://www.gulker.com/categories/cyveillancebot/&quot;&gt;http://www.gulker.com/categories/cyveillancebot/&lt;/a&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;Here&apos;s an essay I wrote describing my experience with this creature:&lt;/p&gt;&lt;p&gt;   &lt;font size=-1&gt;&lt;a href=&quot;http://www.gulker.com/stories/2003/05/06/whatToThinkAboutCyveillanc&quot;&gt;http://www.gulker.com/stories/2003/05/06/whatToThinkAboutCyveillanc&lt;/a&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;And the column I wrote for London&apos;s Independent 2 weeks ago:&lt;/p&gt;&lt;p&gt;   &lt;font size=-1&gt;&lt;a href=&quot;http://news.independent.co.uk/digital/features/story.jsp?story=408191&quot;&gt;http://news.independent.co.uk/digital/features/story.jsp?story=408191&lt;/a&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;And the article about same on Slashdot:&lt;/p&gt;&lt;p&gt;   &lt;font size=-1&gt;&lt;a href=&quot;http://yro.slashdot.org/article.pl?sid=03/05/07/0120237&amp;mode=thread&amp;tid=158&amp;tid=99&quot;&gt;&lt;a href=&quot;http://yro.slashdot.org/article.pl?sid=03/05/07/0120237&quot;&gt;http://yro.slashdot.org/article.pl?sid=03/05/07/0120237&lt;/a&gt;&lt;/a&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;em&gt;Awaiting a response with great interest...&lt;/em&gt;</description>			<guid>http://www.gulker.com/categories/cyveillancebot/2003/06/04.html#a1411</guid>			<pubDate>Wed, 04 Jun 2003 16:07:18 GMT</pubDate>			<comments>http://radiocomments.userland.com/comments?u=100924&amp;amp;p=1411&amp;amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F06%2F04.html%23a1411</comments>			</item>		<item>			<description>&lt;strong&gt;&quot;The business of popular music&lt;/strong&gt;, today, is now, in some peculiarly new way, entirely about promotion. 
&lt;em&gt;William Gibson, in a speech to the &lt;a href=&quot;http://www.williamgibsonbooks.com/archive/2003_05_01_archive.asp#200322370&quot;&gt;Director&apos;s Guild of America&lt;/a&gt;...&lt;/em&gt;</description>			<guid>http://www.gulker.com/categories/cyveillancebot/2003/05/27.html#a1365</guid>			<pubDate>Wed, 28 May 2003 02:47:23 GMT</pubDate>			<comments>http://radiocomments.userland.com/comments?u=100924&amp;amp;p=1365&amp;amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F05%2F27.html%23a1365</comments>			</item>		<item>			<description>&lt;strong&gt;More &apos;bots o&apos; interest&lt;/strong&gt;: &lt;a href=&quot;http://www.blogpulse.com/&quot;&gt;Blog Pulse&lt;/a&gt;, &lt;a href=&quot;http://www.breakingblogs.com/timbo_bot.html&quot;&gt;Breaking Blogs&lt;/a&gt;, &lt;a href=&quot;http://alme1.almaden.ibm.com/cs/crawler/&quot;&gt;IBM Almaden Research Center&lt;/a&gt;, Kototoi &lt;a href=&quot;http://www.kototoi.org/zao/&quot;&gt;Zao&lt;/a&gt;, &lt;a href=&quot;http://www.deepindex.com/&quot;&gt;DeepIndex&lt;/a&gt;, NCSA &lt;a href=&quot;http://vias.ncsa.uiuc.edu/viasarchivinginformation.html&quot;&gt;VIAS&lt;/a&gt;, &lt;a href=&quot;http://216.239.37.100/search?q=cache:tXvCFcrY3DcJ:www.blogbot.com/+blogbot&amp;hl=en&amp;ie=UTF-8&quot;&gt;BlogBot&lt;/a&gt;... 
&lt;em&gt;It&apos;s a zoo out there...&lt;/em&gt;</description>			<guid>http://www.gulker.com/categories/cyveillancebot/2003/05/20.html#a1333</guid>			<pubDate>Wed, 21 May 2003 00:06:12 GMT</pubDate>			<comments>http://radiocomments.userland.com/comments?u=100924&amp;amp;p=1333&amp;amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F05%2F20.html%23a1333</comments>			</item>		<item>			<description>&lt;strong&gt;Interesting &apos;bots&lt;/strong&gt;: &lt;a href=&quot;http://grub.org/&quot;&gt;Grub&lt;/a&gt; is a distributed bot, whose crawlers run as a background process on many machines; &lt;a href=&quot;http://www.turnitin.com/robot/crawlerinfo.html&quot;&gt;TurnItIn&lt;/a&gt; indexes your pages so teachers can see if students are plagiarizing them, &lt;a href=&quot;http://www.nutch.org/docs/bot.html&quot;&gt;Nutch&lt;/a&gt; is an Open Source &apos;bot. &lt;em&gt;They&apos;re not all &lt;a href=&quot;http://www.gulker.com/music_industry/cyveillancebot.html&quot;&gt;idiots&lt;/a&gt;...&lt;/em&gt;</description>			<guid>http://www.gulker.com/categories/cyveillancebot/2003/05/20.html#a1332</guid>			<pubDate>Tue, 20 May 2003 23:18:26 GMT</pubDate>			<comments>http://radiocomments.userland.com/comments?u=100924&amp;amp;p=1332&amp;amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F05%2F20.html%23a1332</comments>			</item>		<item>			<description>&lt;strong&gt;RIAA spider &apos;tarpit&apos;&lt;/strong&gt; &lt;a href=&quot;http://www.kuro5hin.org/story/2003/5/16/163447/493&quot;&gt;randomly generates&lt;/a&gt; hundreds of &apos;&lt;a href=&quot;http://madonna.ricky.music.stodge.org/&quot;&gt;sites&lt;/a&gt;&apos; just loaded with &apos;mp3&apos; files, the better to keep &lt;a href=&quot;http://www.gulker.com/music_industry/cyveillancebot.html&quot;&gt;stupid robot&lt;/a&gt; (and &lt;a href=&quot;http://www.gulker.com/categories/cyveillancebot/2003/05/13.html&quot;&gt;human&lt;/a&gt;) behavior in check. 
&lt;em&gt;Complete with source code...&lt;/em&gt;</description>			<guid>http://www.gulker.com/categories/cyveillancebot/2003/05/20.html#a1330</guid>			<pubDate>Tue, 20 May 2003 20:45:10 GMT</pubDate>			<comments>http://radiocomments.userland.com/comments?u=100924&amp;amp;p=1330&amp;amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F05%2F20.html%23a1330</comments>			</item>		<item>			<description>&lt;strong&gt;Darn, I thought &lt;a href=&quot;http://www.cyveillance.com/&quot;&gt;Cyveillance&lt;/a&gt; had taken me off&lt;/strong&gt; their list... but their idiot bot&apos;s been back, and woe be to those who try to access this site while &lt;a href=&quot;http://www.gulker.com/music_industry/cyveillancebot.html&quot;&gt;Cyveillancebot&lt;/a&gt; is crawling the !@#$%!@ calendar links (of all things) you see on the right side of this page. &lt;/p&gt;&lt;p&gt;And, oh, yeah, we can&apos;t use our net connection while their stupid bot is hammering us, either.  I can&apos;t believe a responsible company (they have &lt;a href=&quot;http://www.nea.com/&quot;&gt;good investors&lt;/a&gt;) lets technology that is this poor loose on the Web. &lt;em&gt;You&apos;re embarrassing yourselves... and the people who think highly enough of you to actually pay your bills...&lt;/em&gt;</description>			<guid>http://www.gulker.com/categories/cyveillancebot/2003/05/17.html#a1319</guid>			<pubDate>Sun, 18 May 2003 06:30:15 GMT</pubDate>			<comments>http://radiocomments.userland.com/comments?u=100924&amp;amp;p=1319&amp;amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F05%2F17.html%23a1319</comments>			</item>		<item>			<description>&lt;strong&gt;I&apos;m a geek, I read referrer logs&lt;/strong&gt;, along with my copy of the NY Times in the morning. 
And maybe that data point explains why &lt;a href=&quot;http://www.gulker.com/music_industry/cyveillancebot.html&quot;&gt;Cyveillancebot&lt;/a&gt; hasn&apos;t visited since an apparently real, live human at Cyveillance read this Weblog a couple days ago.&lt;/p&gt;&lt;p&gt;Why would that cause the apparent (and welcome) cessation of Cyveillancebot activity? Well, one theory would be that Cyveillance is aware that they are in the business of, technically at least, infringing copyright, and are staying low, now that they know that I know.&lt;/p&gt;&lt;p&gt;One of the things I know is that access logs show a different pattern when a page is opened from my server than when a copy of that page is opened from a file saved to a hard drive. So here&apos;s what it looks like when the page is opened from the server:&lt;/p&gt;&lt;p&gt;63.148.99.229 - - [13/May/2003:12:24:28 -0700] &quot;GET / HTTP/1.0&quot; 304 - &quot;-&quot; &quot;Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)&quot;&lt;/p&gt;&lt;p&gt;63.148.99.229 - - [13/May/2003:12:24:28 -0700] &quot;GET /graphics/logo_blu_bg_shado_116.png HTTP/1.0&quot; 304 - &quot;http://www.gulker.com/&quot; &quot;Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)&quot;&lt;/p&gt;&lt;p&gt;[snip - the full log sequence is &lt;a href=&quot;http://www.gulker.com/music_industry/server_sequence.txt&quot;&gt;here&lt;/a&gt; - snip]&lt;/p&gt;&lt;p&gt;63.148.99.229 - - [13/May/2003:12:24:29 -0700] &quot;GET /graphics/right_bg.jpg HTTP/1.0&quot; 304 - &quot;http://www.gulker.com/&quot; &quot;Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)&quot;&lt;/p&gt;&lt;p&gt;The first item in the sequence is a request for &quot;/&quot;, the short way to request gulker.com&apos;s default home page, and all the rest are requests for the graphical bits and pieces that make up the page. 
So what happens if someone saves my page to their hard drive, and then opens it in a browser?&lt;/p&gt;&lt;p&gt;What happens is that you get the same sequence in the access log, with one exception: there is no request for &quot;/&quot; - the browser already has it, it only needs the graphics and other &apos;furniture&apos; to draw the page.&lt;/p&gt;&lt;p&gt;In a recent 2-day period, I noticed that the graphics-only sequence was requested 554 more times than the full sequence. So, more than 250 times a day, a browser somewhere in the world was pulling my page from a local file, rather than from my server.&lt;/p&gt;&lt;p&gt;One reason that this would happen is that the browser has cached my page, but not the graphics files associated with it. I&apos;ve done some experiments with this (Mozilla, IE, Safari), and the behavior depends on the type of browser and how its cache prefs are set. I&apos;m sure that some of the time, particular browser versions,  set in just the right fashion, are causing this behavior, when people come back to my page before it&apos;s expired from their cache.&lt;/p&gt;&lt;p&gt;But there is another reason this could happen. If someone were to download my page, store it on their own hard drive or local server, and then open it from that server, you would see the same sequence - no &quot;/&quot;.&lt;/p&gt;&lt;p&gt;You might see this if, for example, Cyveillance - who have pulled more than a thousand files from my server including essays, articles, research, presos etc. without &lt;em&gt;ever&lt;/em&gt; (until the Tuesday human visit) downloading the attendant graphics files -  were to post my files to their internal, private network, and allow access to them by employees and clients.&lt;/p&gt;&lt;p&gt;If that&apos;s what they&apos;re doing, I think that is copyright infringement - many of the files they have pulled (and continue to pull down, over and over again) are copyrighted. 
It seems to me that if Cyveillance were to post something like &quot;This imbecile is spouting anti-DMCA blasphemy at http://www.gulker.com/ &quot; on their private or public servers, that would probably not be a copyright violation - their clients would be reading my opinions from my server. In this case, they are being paid to find inimical opinion on the Web.&lt;/p&gt;&lt;p&gt;But if they place my original work on their server, and then distribute it to the clients who pay them (large) fees, that probably is a copyright violation - they  are being paid for distributing my copyrighted material without permission, which, of course, is exactly what they and their clients object to so strenuously. &lt;em&gt;So, next step is a little detective work to figure out who owns the IP addresses that are pulling my stuff in this fashion... 63.148.99.229, BTW, is registered to Cyveillance according to arin.net, and is almost certainly one of their firewall machines...&lt;/em&gt;</description>			<guid>http://www.gulker.com/categories/cyveillancebot/2003/05/15.html#a1304</guid>			<pubDate>Thu, 15 May 2003 17:13:50 GMT</pubDate>			<comments>http://radiocomments.userland.com/comments?u=100924&amp;amp;p=1304&amp;amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F05%2F15.html%23a1304</comments>			</item>		<item>			<description>&lt;strong&gt;RIAA now &lt;a href=&quot;http://rss.com.com/2100-1025_3-1001319.html?type=pt&amp;part=rss&amp;tag=feed&amp;subj=news&quot;&gt;admits&lt;/a&gt; to sending dozens&lt;/strong&gt; of erroneous cease-and-desist notices. &quot;The errors represent a black eye for the RIAA&apos;s latest efforts against piracy, which rely on automated crawlers to scour the Internet in an attempt to find material that is being distributed in a way that violates federal copyright law.&quot; &lt;i&gt;Ahem... 
this was &lt;a href=&quot;http://www.gulker.com/2003/05/07.html#a1256&quot;&gt;predicted&lt;/a&gt; here...&lt;/i&gt;</description>			<guid>http://www.gulker.com/categories/cyveillancebot/2003/05/13.html#a1294</guid>			<pubDate>Tue, 13 May 2003 22:01:08 GMT</pubDate>			<comments>http://radiocomments.userland.com/comments?u=100924&amp;amp;p=1294&amp;amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F05%2F13.html%23a1294</comments>			</item>		<item>			<description>&lt;strong&gt;Cyveillance humans have arrived&lt;/strong&gt;... for the first time, &lt;a href=&quot;http://www.gulker.com/music_industry/cyveillance_human_05_13_03.txt&quot;&gt;requests incoming&lt;/a&gt; from the Cyveillance IP block are pulling down graphics files linked to pages... a sign that a human-operated browser rather than a bot is looking. &lt;i&gt;Enjoy the content&amp;#8482;... and please respect the copyrights...&lt;/i&gt;</description>			<guid>http://www.gulker.com/categories/cyveillancebot/2003/05/13.html#a1293</guid>			<pubDate>Tue, 13 May 2003 21:13:53 GMT</pubDate>			<comments>http://radiocomments.userland.com/comments?u=100924&amp;amp;p=1293&amp;amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F05%2F13.html%23a1293</comments>			</item>		<item>			<description>&lt;strong&gt;My &apos;interesting referrer&apos; comes from AOL&lt;/strong&gt;: &quot;Roger&quot; &lt;a href=&quot;http://www.dnsstuff.com/tools/whois.ch?ip=172.174.227.122&quot;&gt;looked it up&lt;/a&gt;.  Hmmmm... could it be? AOL-TW is well acquainted with copyright. They charge for their service, and they frequently cache content, including copyrighted material owned by others, on their servers to minimize bandwidth costs... so the claim could be made that they are selling copyrighted materials to others. Shows how loopy old-school copyright is in a networked world when an ISP is guilty of copyright infringement. &lt;i&gt;So the referrer does indeed have a future with RIAA... 
heck, they &lt;u&gt;are&lt;/u&gt; RIAA...&lt;/i&gt;</description>			<guid>http://www.gulker.com/categories/cyveillancebot/2003/05/13.html#a1292</guid>			<pubDate>Tue, 13 May 2003 19:20:03 GMT</pubDate>			<comments>http://radiocomments.userland.com/comments?u=100924&amp;amp;p=1292&amp;amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F05%2F13.html%23a1292</comments>			</item>		<item>			<description>&lt;strong&gt;More referrer log amusement&lt;/strong&gt;: this one from someone who followed links in robots.txt intended to catch bad &apos;bots:&lt;/p&gt;&lt;p&gt;172.174.227.122 - - [11/May/2003:01:46:34 -0700] &quot;GET ***** HTTP/1.0&quot; 200 4695 &quot;-&quot; &quot;By Allowing Me Access You Waive All Rules Associated With It.&quot;&lt;/p&gt;&lt;p&gt;&lt;i&gt;A unilateral, after-the-fact contract... Someone has a future with RIAA... You could probably write a script that would put the lines of a haiku or poem in successive referrers... I want to set my referrer to &quot;All your Web are belong to us&quot;...&lt;/i&gt;</description>			<guid>http://www.gulker.com/categories/cyveillancebot/2003/05/13.html#a1291</guid>			<pubDate>Tue, 13 May 2003 17:16:57 GMT</pubDate>			<comments>http://radiocomments.userland.com/comments?u=100924&amp;amp;p=1291&amp;amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F05%2F13.html%23a1291</comments>			</item>		<item>			<description>&lt;strong&gt;Americans enjoy a political democracy&lt;/strong&gt;, but not an economic democracy. Big law firms and corporations know this: they are well aware that  almost no one has the resources - time and money - to prevail against a wealthy corporation and its phalanx of lawyers. 
Erin Brockovich notwithstanding, it is far more likely that corporations, not individuals will prevail in civil law matters.&lt;/p&gt;&lt;p&gt;&lt;a href=&quot;http://www.cyveillance.com/&quot;&gt;Cyveillance&lt;/a&gt;, a company that is reported to be an agent of the RIAA, MPAA and dozens of large corporations with a mission to protect their copyrighted materials has downloaded more than a thousand files from my Web site in the last 5 months.  Many hundreds of these files are copyrighted original essays, reports and presentations. &lt;/p&gt;&lt;p&gt;Responsible, law abiding publishers and corporations normally pay me from a few hundred to several thousand dollars to use the materials I have developed to formulate strategy and tactics, gauge markets, etc.&lt;/p&gt;&lt;p&gt;Cyveillance&apos;  Web site offers a &lt;a href=&quot;http://www.cyveillance.com/web/solutions/mkt_comment.htm&quot;&gt;fee-based service&lt;/a&gt; providing companies with information gleaned from the Web. So, since www.gulker.com is so amazingly popular with them, it is reasonable to suspect that they are providing my copyrighted materials (along with god knows how many others) to corporations for fees that are reported to be hundreds of thousands of dollars a year.&lt;/p&gt;&lt;p&gt;True, I place some of these files on the Web site so others can read them: it is my hope that they will spur debate and commentary, and advance the cause and utility of a global network. I would be delighted if Hilary Rosen logged on, read my work, and it helped her to find a way to serve, rather than sue, her customers. &lt;/p&gt;&lt;p&gt;However,  I do not authorize anyone to resell my copyrighted materials for a profit without my permission. It is unreasonable, unethical and illegal for others to do this. 
The &lt;em&gt;way&lt;/em&gt; in which my copyrighted material is delivered to RIAA executives or anybody else is important.&lt;/p&gt;&lt;p&gt;No one makes this point more loudly than RIAA and its lawyers: they recently sued 4 students for billions of dollars for distributing what they claimed were their members&apos; copyrighted materials &lt;em&gt;for free&lt;/em&gt; on private networks.  Yet, here is at least one of their agents doing exactly the same thing (Cyveillance has a &lt;a href=&quot;http://www.cyveillance.com/web/secure/partners/login.asp&quot;&gt;private network&lt;/a&gt; for its clients) &lt;em&gt;for profit&lt;/em&gt;.&lt;/p&gt;&lt;p&gt;Sure I can sue. That involves paying an attorney or team of attorneys several hundred dollars an hour for god knows how long - likely months or years. RIAA, while arguably just as guilty as kids in college dorms, would likely easily outlast my resources.&lt;/p&gt;&lt;p&gt;The &apos;lawyer door&apos; to RIAA&apos;s castle is thick, heavy and defended by large and hideous brutes. However, that doesn&apos;t mean that there is no recourse.  RIAA&apos;s members are publicly-traded corporations whose executives are (or should be) highly sensitive to the profitability of those companies.  When RIAA uses its resources to hurt  customers by way of making an example of those whose listening and viewing preferences they don&apos;t like, it is playing with fire.&lt;/p&gt;&lt;p&gt;So let&apos;s approach via (or, better, stay away from) the &apos;customer door&apos;: if angry teenagers and young people and other good customers decided that it would be very cool not to buy any CDs or DVDs or go to a movie for, say, 90 days, the same corporations would likely undergo a rapid attitude, and tactics,  adjustment. &lt;/p&gt;&lt;p&gt;Even better if the same customers took their disposable entertainment income and invested it in supporting local musicians, Indies et al. - or sent it along to responsible  charities.  
&lt;i&gt;I have refused to buy CDs or DVDs for 2 years now: the $1000 average that I spent annually on each of those products in the past now goes elsewhere... $1000 probably doesn&apos;t make an RIAA lawyer&apos;s car payment, but if enough of us choose this path, maybe RIAA will get the message...&lt;/i&gt;</description>			<guid>http://www.gulker.com/categories/cyveillancebot/2003/05/13.html#a1290</guid>			<pubDate>Tue, 13 May 2003 17:02:24 GMT</pubDate>			<comments>http://radiocomments.userland.com/comments?u=100924&amp;amp;p=1290&amp;amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F05%2F13.html%23a1290</comments>			</item>		<item>			<description>&lt;strong&gt;&lt;a href=&quot;http://www.riaa.org/&quot;&gt;RIAA&lt;/a&gt; apologizes for cease-and-desist order&lt;/strong&gt; after &lt;a href=&quot;http://news.com.com/2100-1025_3-1001095.html&quot;&gt;threatening an emeritus astronomy professor&lt;/a&gt;, &lt;a href=&quot;http://www.astro.psu.edu/users/usher/&quot;&gt;Peter Usher&lt;/a&gt;, at Penn State. The report says that a cease-and-desist was issued after the prof&apos;s name &quot;Usher&quot; and an mp3 file were discovered on the Penn State astronomy department&apos;s FTP server. The mp3 turned out to be an &lt;a href=&quot;ftp://ftp.swift.psu.edu/pub/Swift/Documents/swift_song.mp3&quot;&gt;a cappella ode&lt;/a&gt; to a satellite the department helped design, sung by the astronomers. &lt;i&gt;RIAA apparently sends the orders without actually listening to the songs... A spider is supposed to have uncovered the file... I&apos;m having a hard time believing it could be the work of the underpowered Cyveillance &apos;bot (or maybe it was and &lt;a href=&quot;http://www.gulker.com/2003/05/07.html#a1256&quot;&gt;my prediction&lt;/a&gt; is coming true)... Naaaa... 
So which critter is doing the crawling for RIAA?&lt;/i&gt;</description>			<guid>http://www.gulker.com/categories/cyveillancebot/2003/05/12.html#a1284</guid>			<pubDate>Tue, 13 May 2003 04:09:19 GMT</pubDate>			<comments>http://radiocomments.userland.com/comments?u=100924&amp;amp;p=1284&amp;amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F05%2F12.html%23a1284</comments>			</item>		<item>			<description>&lt;strong&gt;&lt;a href=&quot;http://www.gulker.com/music_industry/cyveillancebot.html&quot;&gt;Cyveillancebot&lt;/a&gt; is quite busy on www.gulker.com&lt;/strong&gt; today: 5:00 AM, 9:00 AM visits and more &apos;mini thrashes&apos; at 11:00 and noon. &lt;i&gt;Still hasn&apos;t crawled &lt;a href=&quot;http://www.gulker.com/categories/cyveillancebot/&quot;&gt;this&lt;/a&gt;, though blogging ecosystem &lt;a href=&quot;http://organica.us/sources?url_id=233471&quot;&gt;has&lt;/a&gt;...&lt;/i&gt;</description>			<guid>http://www.gulker.com/categories/cyveillancebot/2003/05/12.html#a1283</guid>			<pubDate>Mon, 12 May 2003 20:22:29 GMT</pubDate>			<comments>http://radiocomments.userland.com/comments?u=100924&amp;amp;p=1283&amp;amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F05%2F12.html%23a1283</comments>			</item>		<item>			<description>&lt;strong&gt;Cyveillance says the term &lt;a href=&quot;http://www.riaa.org/&quot;&gt;RIAA&lt;/a&gt; shows up on 1620 &apos;home pages&apos;&lt;/strong&gt;, Googlebot &lt;a href=&quot;http://www.google.com/search?q=RIAA&amp;ie=UTF-8&amp;oe=UTF-8&quot;&gt;says&lt;/a&gt; it shows up on 442,000. 
I&apos;m beginning to think that Cyveillance&apos;s &lt;a href=&quot;http://www.cyveillance.com/web/solutions/corp_security.htm&quot;&gt;claim&lt;/a&gt; that it provides an Internet content &quot;early warning system&quot; for corporations is without basis in observed results.&lt;/p&gt;&lt;p&gt;When the RIAA &lt;a href=&quot;http://www.wired.com/news/technology/0,1282,57048,00.html&quot;&gt;failed to notice&lt;/a&gt; that their site had been hacked, by a relatively trivial mechanism, observers pointed out that RIAA seemed to be all but clueless about technology, particularly Internet technology.&lt;/p&gt;&lt;p&gt;I think that this jibes with my observations about Cyveillance: its technology does not appear to be very good (in fact, it is visibly very bad in the instance of its thrashing &apos;bot). The reason that it can command large fees from RIAA and others is that they are even more clueless than Cyveillance is. &lt;i&gt;In the kingdom of the blind, a guy with one severely near-sighted eye can apparently do quite well selling Web services...&lt;/i&gt;</description>			<guid>http://www.gulker.com/categories/cyveillancebot/2003/05/12.html#a1282</guid>			<pubDate>Mon, 12 May 2003 17:31:11 GMT</pubDate>			<comments>http://radiocomments.userland.com/comments?u=100924&amp;amp;p=1282&amp;amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F05%2F12.html%23a1282</comments>			</item>		<item>			<description>&lt;strong&gt;Looks like Cyveillancebot has thrashed&lt;/strong&gt; &lt;a href=&quot;http://www.devin.com/cruft/cyveillance.html&quot;&gt;someone else&apos;s site&lt;/a&gt;.
&lt;em&gt;But, how do you really feel...&lt;/em&gt;</description>			<guid>http://www.gulker.com/categories/cyveillancebot/2003/05/12.html#a1281</guid>			<pubDate>Mon, 12 May 2003 17:08:38 GMT</pubDate>			<comments>http://radiocomments.userland.com/comments?u=100924&amp;amp;p=1281&amp;amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F05%2F12.html%23a1281</comments>			</item>		<item>			<description>&lt;strong&gt;&lt;a href=&quot;http://www.hitsongscience.com/&quot;&gt;Hit Song Science&lt;/a&gt; is another Web service&lt;/strong&gt; that all the major music companies subscribe to. &lt;i&gt;Interesting... record companies relying on robots to pick hits...&lt;/i&gt;</description>			<guid>http://www.gulker.com/categories/cyveillancebot/2003/05/12.html#a1279</guid>			<pubDate>Mon, 12 May 2003 16:44:08 GMT</pubDate>			<comments>http://radiocomments.userland.com/comments?u=100924&amp;amp;p=1279&amp;amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F05%2F12.html%23a1279</comments>			</item>		<item>			<description>&lt;strong&gt;Cyveillancebot thrashed &lt;a href=&quot;http://www.gulker.com/&quot;&gt;www.gulker.com&lt;/a&gt;&lt;/strong&gt; (actually a &lt;a href=&quot;http://www.gulker.com/music_industry/access_log_may_11.txt&quot;&gt;mini-thrash&lt;/a&gt; this time) at around 4:30 AM on May 11, but it was snooping &lt;a href=&quot;http://www.gulker.com/ra/&quot;&gt;old directories&lt;/a&gt;, not the &lt;a href=&quot;http://www.gulker.com/categories/cyveillancebot/index.html&quot;&gt;new directories&lt;/a&gt; loaded with commentary about it, its owners, its major clients and hot-button terms from DRM to file trading to terrorism. &lt;i&gt;It does seem to have a predilection for downloading copyrighted material... 
in this case &lt;/i&gt;my&lt;i&gt; copyrighted material...&lt;/i&gt;</description>			<guid>http://www.gulker.com/categories/cyveillancebot/2003/05/12.html#a1277</guid>			<pubDate>Mon, 12 May 2003 16:24:09 GMT</pubDate>			<comments>http://radiocomments.userland.com/comments?u=100924&amp;amp;p=1277&amp;amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F05%2F12.html%23a1277</comments>			</item>		<item>			<description>&lt;strong&gt;Interesting: Cyveillance&apos;s &apos;bot ignores robots.txt&lt;/strong&gt; (it hasn&apos;t accessed that file on gulker.com in nearly 1000 visits), but &lt;a href=&quot;http://www.cyveillance.com/&quot;&gt;Cyveillance&lt;/a&gt; publishes a &lt;a href=&quot;http://www.cyveillance.com/robots.txt&quot;&gt;robots.txt&lt;/a&gt; on their site. &lt;i&gt;Hmmm... so what&apos;s good for the goose is not good for us ganders?&lt;/i&gt;</description>			<guid>http://www.gulker.com/categories/cyveillancebot/2003/05/10.html#a1274</guid>			<pubDate>Sun, 11 May 2003 06:55:16 GMT</pubDate>			<comments>http://radiocomments.userland.com/comments?u=100924&amp;amp;p=1274&amp;amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F05%2F10.html%23a1274</comments>			</item>		<item>			<description>&lt;strong&gt;&lt;a href=&quot;http://www.cyveillance.com/&quot;&gt;Cyveillance&lt;/a&gt; will analyze your online brand for free&lt;/strong&gt;: go to &lt;a href=&quot;http://www.cyveillance.com/web/secure/partners/sizer/index.asp&quot;&gt;this page&lt;/a&gt; and enter a brand that you want to analyze (I tried my name) and it will tell you how many domains you have, how many pages you show up on (and breaks them out by conventional, violence, porn etc.), how often you turn up in links, etc.&lt;/p&gt;&lt;p&gt;Interesting... but I sense something&apos;s missing... 
here are the results of 3 searches on names in both Cyveillance and Google and the number of hits they return:&lt;/p&gt;&lt;table border=&quot;0&quot; cellspacing=&quot;4&quot; cellpadding=&quot;4&quot; align=&quot;center&quot;&gt;&lt;tr&gt;&lt;td&gt;&lt;u&gt;Search Name&lt;/u&gt;&lt;/td&gt;&lt;td&gt;Dave Winer&lt;/td&gt;&lt;td&gt;Hilary Rosen&lt;/td&gt;&lt;td&gt;Chris Gulker&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;u&gt;Cyveillance&lt;/u&gt;&lt;/td&gt;&lt;td&gt;252&lt;/td&gt;&lt;td&gt;60&lt;/td&gt;&lt;td&gt;22&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;u&gt;Googlebot&lt;/u&gt;&lt;/td&gt;&lt;td&gt;130,000&lt;/td&gt;&lt;td&gt;31,000&lt;/td&gt;&lt;td&gt;30,500&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;p&gt;Hilary Rosen is CEO of RIAA, and much in the news lately. Dave Winer is a Berkman Fellow at Harvard, and a much-read observer of technology. So, I&apos;m either misunderstanding what Cyveillance means by &apos;home pages&apos; or the universe that &lt;a href=&quot;http://www.gulker.com/music_industry/cyveillancebot.html&quot;&gt;Cyveillancebot&lt;/a&gt; crawls is minuscule compared to &lt;a href=&quot;http://www.google.com/bot.html&quot;&gt;Googlebot&lt;/a&gt;&apos;s.
&lt;i&gt;Googlebot touches gulker.com about 1000 times more often than Cyveillancebot: if that&apos;s any measure of the relative crawl-power of the two &apos;bots, then you&apos;d expect to see 1000x fewer hits on a given name on Cyveillance, which is very roughly the case...&lt;/i&gt;</description>			<guid>http://www.gulker.com/categories/cyveillancebot/2003/05/10.html#a1273</guid>			<pubDate>Sun, 11 May 2003 02:05:56 GMT</pubDate>			<comments>http://radiocomments.userland.com/comments?u=100924&amp;amp;p=1273&amp;amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F05%2F10.html%23a1273</comments>			</item>		<item>			<description>&lt;strong&gt;&lt;a href=&quot;http://philringnalda.com/&quot;&gt;Phil Ringnalda&lt;/a&gt; &lt;a href=&quot;http://radiocomments.userland.com/comments?u=100924&amp;p=1268&amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F05%2F09.html%23a1268&quot;&gt;comments&lt;/a&gt;&lt;/strong&gt; that some &apos;bots treat robots.txt as a menu, not a fence. &lt;i&gt;I&apos;ve set gulker.com&apos;s &lt;a href=&quot;http://www.gulker.com/robots.txt&quot;&gt;version&lt;/a&gt; to help study this behavior...&lt;/i&gt;</description>			<guid>http://www.gulker.com/categories/cyveillancebot/2003/05/10.html#a1270</guid>			<pubDate>Sat, 10 May 2003 21:09:05 GMT</pubDate>			<comments>http://radiocomments.userland.com/comments?u=100924&amp;amp;p=1270&amp;amp;link=http%3A%2F%2Fwww.gulker.com%2F2003%2F05%2F10.html%23a1270</comments>			</item>		</channel>	</rss>