Thursday, June 5, 2003
Brian Murray, Cyveillance's VP of client services, called this morning. We had a long, amicable chat. Brian made a couple of points that I think are fair to pass along:
- his firm does not currently work for RIAA
- the 'bot is not meant to harass sites that publish opinions that its clients may not favor. They are looking at ways to ameliorate the 'bot's 'DoS mode'
- their 'bots crawl either randomly or in an A to Z fashion, not in response to postings
- Cyveillance does not store or distribute materials downloaded from Web sites, except for materials that belong to their clients
I expressed my concerns that the behavior of the technology and their public messages are out of sync. It seems to me that a firm that wanted to be a good net citizen would fix the 'bot, observe robots.txt and otherwise be straightforward and forthcoming, insofar as that is consistent with their mission. Brian made reasonable responses, but said he would leave it to others to decide whether circumventing robots.txt amounts to 'circumventing a protection mechanism' per the DMCA.
Lastly, I asked what would happen if their detection mechanism misfired and included my content along with some of the creepy stuff they sift through. I was concerned that a bug in their software would land my essays and articles in some sort of 'bad content line-up' that had the potential to sully my good name (such as it is). His response was that, while no one can guarantee that software won't misfire, everything that Cyveillance software flags is checked by humans (unlike RIAA's last batch of Cease-and-Desist orders). Brian was concerned that I had published his email. BTW, my email to him was a public response to his public note on the spamcop-help list. We are now in sync about expectations. I may go drop in on these guys in D.C. later on this year...
Comments
10:19:27 AM
Wednesday, June 4, 2003
My note back to Brian at Cyveillance:
Brian,
Thank you very much for responding. I think the issue is that your 'bot assumes a wide pipe on the server: for those of us with meager resources, it's a nuisance. We get particularly cranky when the 'bot downloads a directory of 5 years worth of copyrighted columns, which it has done many hundreds of times now.
Which brings me to my other 2 concerns, and thank you for suggesting I bring them up directly with you:
Is Cyveillance not concerned that you are in at least technical violation of the DMCA when your 'bot ignores robots.txt? Can't this be construed as circumventing a protection mechanism?
Lastly, what do you do with my copyrighted materials? Your site suggests that you provide Internet content to clients (part of your 'early warning system'?) for a fee, which, in the case of my stuff, is pretty clearly a copyright violation, lacking prior authorization. Giving your clients a pointer is one thing, actually giving them the material is quite another.
Anyway, just curious...
Best, thanks again for looking into the bandwidth issue...
Chris
Somebody's got to ask... are the rules the same for corporations and individual citizens?
Comments
4:37:47 PM
Cyveillance's Brian Murray responds:
Hi Chris,
Thank you very much for bringing this to my attention. I will speak with our IT folks tomorrow and ask them to investigate and also to exclude your domain from future crawls, just to be sure. We try very hard to prevent this type of thing from happening. It is true that our technology retrieves data by creating a single connection to your Web site and then downloading html files across this one connection, but you should know that this is done with the intention of eliminating the work of constantly building and dropping connections. It is also meant to minimize the impact on other users, though it leaves a distinct imprint on typical log files, especially if the server was not loaded at the time, allowing it to fill these serial requests very quickly. Since we do not download images-typically the largest files on a site-minimal bandwidth is required. We believe this low impact, high visibility technique is the most responsible technique for us to use under the circumstances. We are also working on ways to further minimize the impact. Based on your message, it appears the approach may not have worked as intended, and I will be sure to look into it. Hopefully, the data you have provided will help us identify what happened here.
Thanks again, and please feel free to contact me directly should you have any further issues.
Sincerely,
Brian Murray Vice President of Client Services Cyveillance, Inc.
Now I just have 2 more questions, Brian...
Comments
4:09:10 PM
"Army looking for a few good blogs?" asks Roger, who spots some unusual visitors to his blog, one of which has also been to gulker.com. It would seem that something called Land Information Warfare Activity, US Army Intelligence and Security Command (INSCOM), Fort Belvoir, Virginia came in on a Google search on 'Cyveillance'. Land Information Warfare Activity? Well, I guess they got a browserfull of my opinion about the quality of Cyveillance technology...
Comments
12:14:51 PM
Cyveillance, redux: a number of readers have pointed to this comment by Brian Murray, VP of client services at Cyveillance:
http://news.spamcop.net/pipermail/spamcop-help/2003-June/034004.html
Brian writes:
"In terms of the other concerns expressed in this Forum relating to how Cyveillance gathers information from the Internet, we set the highest standards for our online activities... and we take great pains to ensure that our crawlers minimize the load on other sites servers"
My response:
Brian-
I read your comments on 'setting high standards' and 'taking pains to minimize your crawlers' load on servers' with great interest.
If the 'bot that usually identifies itself as some flavor of IE operating on IP addresses 63.148.99.xxx is indeed yours (as is widely reported on the Net), I would be grateful for your comments on the behavior I and others have noted in our server's logs. Here's a GREP of my Apache Web server's access log, Dec. 15 to present:
http://www.gulker.com/music_industry/63_148_99_log.txt
Please note that the 'bot in question connects repeatedly to long directories and downloads files sequentially without pausing - sometimes more than a hundred in a row - as fast as my relatively modest 144K net connection will allow. A number of other Webmasters have written me with similar experiences.
While this 'bot is connected, my server is all but inaccessible to others, and we at gulker.com are unable to access external sites easily. This 'bot is not well-behaved: it also ignores robots.txt.
So is it yours? If so, when will you apply your stated policy, and fix the darn thing?
Chris Gulker
http://www.gulker.com/
PS. This is probably redundant, since your firm specializes in knowing what's happening on the Net, but there is a category, complete with RSS feed, of information about the behavior of this 'bot:
http://www.gulker.com/categories/cyveillancebot/
Here's an essay I wrote describing my experience with this creature:
http://www.gulker.com/stories/2003/05/06/whatToThinkAboutCyveillanc
And the column I wrote for London's Independent 2 weeks ago:
http://news.independent.co.uk/digital/features/story.jsp?story=408191
And the article about same on Slashdot:
http://yro.slashdot.org/article.pl?sid=03/05/07/0120237
Awaiting a response with great interest...
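For anyone who wants to run the same kind of check on their own access log, here is a minimal sketch of the idea: pull out the requests from one address block and measure how many arrive back to back. It assumes Apache's common/combined log layout shown elsewhere on this page; the function name `bursts` and the 2-second gap threshold are illustrative choices, not anything taken from the original GREP.

```python
import re
from datetime import datetime, timedelta

PREFIX = "63.148.99."            # the address block named in the note above
MAX_GAP = timedelta(seconds=2)   # assumption: a longer pause ends a burst

# Client address and timestamp at the start of a common/combined log line.
TS_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')


def bursts(lines, prefix=PREFIX, max_gap=MAX_GAP):
    """Return the sizes of runs of back-to-back requests from `prefix`."""
    times = []
    for line in lines:
        m = TS_RE.match(line)
        if m and m.group(1).startswith(prefix):
            times.append(datetime.strptime(m.group(2), "%d/%b/%Y:%H:%M:%S %z"))
    times.sort()

    sizes = []
    for i, ts in enumerate(times):
        if i and ts - times[i - 1] <= max_gap:
            sizes[-1] += 1      # still inside the current burst
        else:
            sizes.append(1)     # first request, or a pause: start a new burst
    return sizes
```

Calling `max(bursts(open("access_log")))` (the file name is a placeholder) would give the longest unbroken run, which is the "more than a hundred in a row" figure the note refers to.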
Comments
9:07:18 AM
Tuesday, May 27, 2003
"The business of popular music, today, is now, in some peculiarly new way, entirely about promotion." William Gibson, in a speech to the Directors Guild of America...
Comments
7:47:23 PM
Tuesday, May 20, 2003
Interesting 'bots: Grub is a distributed bot, whose crawlers run as a background process on many machines; TurnItIn indexes your pages so teachers can see if students are plagiarizing them; Nutch is an Open Source 'bot. They're not all idiots...
Comments
4:18:26 PM
Saturday, May 17, 2003
Darn, I thought Cyveillance had taken me off their list... but their idiot bot's been back, and woe be to those who try to access this site while Cyveillancebot is crawling the !@#$%!@ calendar links (of all things) you see on the right side of this page.
And, oh, yeah, we can't use our net connection while their stupid bot is hammering us, either. I can't believe a responsible company (they have good investors) lets technology that is this poor loose on the Web. You're embarrassing yourselves... and the people who think highly enough of you to actually pay your bills...
Comments
11:30:15 PM
Thursday, May 15, 2003
I'm a geek, I read referrer logs, along with my copy of the NY Times in the morning. And maybe that data point explains why Cyveillancebot hasn't visited ever since an apparent real, live human at Cyveillance read this Weblog a couple days ago.
Why would that cause the apparent (and welcome) cessation of Cyveillancebot activity? Well, one theory would be that Cyveillance is aware that they are in the business of, technically at least, infringing copyright, and are staying low, now that they know that I know.
One of the things I know is that access logs show a different pattern when a page is opened from my server than when a copy of that page is opened from a file saved to a hard drive. So here's what it looks like when the page is opened from the server:
63.148.99.229 - - [13/May/2003:12:24:28 -0700] "GET / HTTP/1.0" 304 - "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
63.148.99.229 - - [13/May/2003:12:24:28 -0700] "GET /graphics/logo_blu_bg_shado_116.png HTTP/1.0" 304 - "http://www.gulker.com/" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
[snip - the full log sequence is here - snip]
63.148.99.229 - - [13/May/2003:12:24:29 -0700] "GET /graphics/right_bg.jpg HTTP/1.0" 304 - "http://www.gulker.com/" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
The first item in the sequence is a request for "/", the short way to request gulker.com's default home page, and all the rest are requests for the graphical bits and pieces that comprise the page. So what happens if someone saves my page to their hard drive, and then opens it in a browser?
What happens is that you get the same sequence in the access log, with one exception: there is no request for "/" - the browser already has it, it only needs the graphics and other 'furniture' to draw the page.
In a recent 2-day period, I noticed that the graphics-only sequence was requested 554 more times than the full sequence. So, more than 250 times a day, a browser somewhere in the world was pulling my page from a local file, rather than from my server.
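A tally like this can be sketched in a few lines of script. What follows is only an illustration of the idea, not the method actually used for the numbers above: it assumes Apache's combined log format (as in the sample lines), treats any request whose referrer is the home page as one of the page's graphics, and uses a 10-second window to decide which requests belong to the same page view. The names `HOME_URL`, `WINDOW` and `count_sequences` are made up for the example.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Apache "combined" log format, as in the sample lines above.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

HOME_URL = "http://www.gulker.com/"  # referrer sent with the page's graphics
WINDOW = timedelta(seconds=10)       # assumption: max gap inside one page view


def parse(line):
    """Return (ip, timestamp, path, referer), or None if the line doesn't match."""
    m = LOG_RE.match(line)
    if not m:
        return None
    ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return m["ip"], ts, m["path"], m["referer"]


def count_sequences(lines):
    """Return (full, graphics_only): page views with and without a "/" request."""
    by_ip = defaultdict(list)
    for line in lines:
        rec = parse(line)
        if rec:
            by_ip[rec[0]].append(rec[1:])

    full = graphics_only = 0
    for hits in by_ip.values():
        hits.sort(key=lambda h: h[0])  # stable: same-second hits keep log order
        last_home = None     # time of the most recent request for "/"
        last_graphic = None  # time of the most recent graphics request
        for ts, path, referer in hits:
            if path == "/":
                last_home = ts
                last_graphic = None  # the next graphics request starts a new view
            elif referer == HOME_URL:
                if last_graphic is None or ts - last_graphic > WINDOW:
                    # First graphics request of a page view: was "/" fetched too?
                    if last_home is not None and ts - last_home <= WINDOW:
                        full += 1
                    else:
                        graphics_only += 1
                last_graphic = ts
    return full, graphics_only
```

The difference `graphics_only - full` is the kind of surplus described above. The script cannot say why a sequence lacks its "/" request; a browser cache and a locally saved copy look the same in the log.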
One reason that this would happen is that the browser has cached my page, but not the graphics files associated with it. I've done some experiments with this (Mozilla, IE, Safari), and the behavior depends on the type of browser and how its cache prefs are set. I'm sure that some of the time, particular browser versions, set in just the right fashion, are causing this behavior, when people come back to my page before it's expired from their cache.
But there is another reason this could happen. If someone were to download my page, store it on their own hard drive or local server, and then open it from that server, you see the same sequence - no "/".
You might see this if, for example, Cyveillance - who have pulled more than a thousand files from my server including essays, articles, research, presos etc. without ever (until the Tuesday human visit) downloading the attendant graphics files - were to post my files to their internal, private network and allow employees and clients to access them.
If that's what they're doing, I think that is copyright infringement - many of the files they have pulled (and continue to pull down, over and over again) are copyrighted. It seems to me that if Cyveillance were to post something like "This imbecile is spouting anti-DMCA blasphemy at http://www.gulker.com/ " on their private or public servers, that would probably not be a copyright violation - their clients would be reading my opinions from my server. In this case, they are being paid to find inimical opinion on the Web.
But if they place my original work on their server, and then distribute it to the clients who pay them (large) fees, that probably is a copyright violation - they are being paid for distributing my copyrighted material without permission, which, of course, is exactly what they and their clients object to so strenuously. So, next step is a little detective work to figure out who owns the IP addresses that are pulling my stuff in this fashion... 63.148.99.229, BTW, is registered to Cyveillance according to arin.net, and is almost certainly one of their firewall machines...
Comments
10:13:50 AM
Tuesday, May 13, 2003
RIAA now admits to sending dozens of erroneous cease-and-desist notices. "The errors represent a black eye for the RIAA's latest efforts against piracy, which rely on automated crawlers to scour the Internet in an attempt to find material that is being distributed in a way that violates federal copyright law." Ahem... this was predicted here...
Comments
3:01:08 PM
Cyveillance humans have arrived... for the first time, requests incoming from the Cyveillance IP block are pulling down graphics files linked to pages... a sign that a human-operated browser rather than a bot is looking. Enjoy the content™... and please respect the copyrights...
Comments
2:13:53 PM
My 'interesting referrer' comes from AOL: Roger looked it up. Hmmmm... could it be? AOL-TW is well acquainted with copyright. They charge for their service, and they frequently cache content, including copyrighted material owned by others, on their servers to minimize bandwidth costs... so the claim could be made that they are selling copyrighted materials to others. Shows how loopy old-school copyright is in a networked world when an ISP is guilty of copyright infringement. So the referrer does indeed have a future with RIAA... heck, they are RIAA...
Comments
12:20:03 PM
More referrer log amusement: this one from someone who followed links in robots.txt intended to catch bad 'bots:
172.174.227.122 - - [11/May/2003:01:46:34 -0700] "GET ***** HTTP/1.0" 200 4695 "-" "By Allowing Me Access You Waive All Rules Associated With It."
A unilateral, after-the-fact contract... Someone has a future with RIAA... You could probably write a script that would put the lines of a haiku or poem in successive referrers... I want to set my referrer to "All your Web are belong to us"...
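Such a script would not take much. Here is a minimal sketch using Python's standard `urllib.request`: it builds one request per line of verse and puts that line in a header. The function name `poem_requests`, the `example.com` URL and the three-line split of the string quoted above are placeholders for illustration. The header name is a parameter because, in Apache's combined log format, the last quoted field of a log line is the User-Agent and the one before it is the Referer, so either could carry the poem.

```python
import urllib.request


def poem_requests(url, lines, header="Referer"):
    """Build one GET request per line of verse, carrying that line in `header`."""
    return [
        urllib.request.Request(url, headers={header: line}, method="GET")
        for line in lines
    ]


# Placeholder verse: the string from the log line above, split into three lines.
LINES = ["By Allowing Me Access", "You Waive All Rules", "Associated With It."]

requests = poem_requests("http://example.com/", LINES)
for req in requests:
    print(req.get_header("Referer"))
    # urllib.request.urlopen(req) would actually send it; left out on purpose.
```

The requests are only constructed here, not sent. Anyone who does send them should pause between requests and honor robots.txt, which is rather the point of this whole category.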
Comments
10:16:57 AM
Americans enjoy a political democracy, but not an economic democracy. Big law firms and corporations know this: they are well aware that almost no one has the resources - time and money - to prevail against a wealthy corporation and its phalanx of lawyers. Erin Brockovich notwithstanding, it is far more likely that corporations, not individuals, will prevail in civil law matters.
Cyveillance, a company that is reported to be an agent of the RIAA, MPAA and dozens of large corporations with a mission to protect their copyrighted materials, has downloaded more than a thousand files from my Web site in the last 5 months. Many hundreds of these files are copyrighted original essays, reports and presentations.
Responsible, law abiding publishers and corporations normally pay me from a few hundred to several thousand dollars to use the materials I have developed to formulate strategy and tactics, gauge markets, etc.
Cyveillance's Web site offers a fee-based service providing companies with information gleaned from the Web. So, since www.gulker.com is so amazingly popular with them, it is reasonable to suspect that they are providing my copyrighted materials (along with god knows how many others) to corporations for fees that are reported to be hundreds of thousands of dollars a year.
True, I place some of these files on the Web site so others can read them: it is my hope that they will spur debate and commentary, and advance the cause and utility of a global network. I would be delighted if Hilary Rosen logged on, read my work, and it helped her to find a way to serve, rather than sue, her customers.
However, I do not authorize anyone to resell my copyrighted materials for a profit without my permission. It is unreasonable, unethical and illegal for others to do this. The way in which my copyrighted material is delivered to RIAA executives or anybody else is important.
No one makes this point more loudly than RIAA and its lawyers: they recently sued 4 students for billions of dollars for distributing what they claimed were their members' copyrighted materials for free on private networks. Yet, here is at least one of their agents doing exactly the same thing (Cyveillance has a private network for its clients) for profit.
Sure I can sue. That involves paying an attorney or team of attorneys several hundred dollars an hour for god knows how long - likely months or years. RIAA, while arguably just as guilty as kids in college dorms, would likely easily outlast my resources.
The 'lawyer door' to RIAA's castle is thick, heavy and defended by large and hideous brutes. However, that doesn't mean that there is no recourse. RIAA's members are publicly-traded corporations whose executives are (or should be) highly sensitive to the profitability of those companies. When RIAA uses its resources to hurt customers by way of making an example of those whose listening and viewing preferences they don't like, it is playing with fire.
So let's approach via (or, better, stay away from) the 'customer door': if angry teenagers and young people and other good customers decided that it would be very cool not to buy any CDs or DVDs or go to a movie for, say, 90 days, the same corporations would likely undergo a rapid attitude, and tactics, adjustment.
Even better if the same customers took their disposable entertainment income and invested it in supporting local musicians, Indies et al. - or sent it along to responsible charities. I have refused to buy CDs or DVDs for 2 years now: the $1000 average that I spent annually on each of those products in the past now goes elsewhere... $1000 probably doesn't make an RIAA lawyer's car payment, but if enough of us choose this path, maybe RIAA will get the message...
Comments
10:02:24 AM
Updated 4/16/04; 1:19:03 PM