Wednesday, June 4, 2003
"Tom Friedman... unofficial Minister of Lucidity for the Bush administration." Why I love reading Doc...
10:45:39 PM
Geek Drawers: Gunnar has a Geek Closet, Bernie has 6, and Anita says hers aren't really that organized - "Really, I'm much more on the cluttered side! Strewing is my natural tendency." Theory: you can't be a geek without at least one serious Geek Drawer...
10:24:23 PM
Martha resigns: "she is being unfairly prosecuted... she is being persecuted because of her high profile or to offset the failure of authorities to prevent mammoth accounting scandals at Enron and WorldCom." You go, girl... your business is a heck of a lot more solid than the above-mentioned...
7:12:59 PM
Pic-o-the-day: Dee Dee Bridgewater: I was fortunate enough to have this charming chanteuse sit for me in the '70s while I was at the long-gone, lamented Herald Examiner. Stay tuned for news about the Gulker Photo Archive...
5:15:22 PM
My note back to Brian at Cyveillance:
Brian,
Thank you very much for responding. I think the issue is that your 'bot assumes a wide pipe on the server: for those of us with meager resources, it's a nuisance. We get particularly cranky when the 'bot downloads a directory of 5 years' worth of copyrighted columns, which it has done many hundreds of times now.
Which brings me to my other 2 concerns, and thank you for suggesting I bring them up directly with you:
Is Cyveillance not concerned that you are in at least technical violation of the DMCA when your 'bot ignores robots.txt? Can't this be construed as circumventing a protection mechanism?
Lastly, what do you do with my copyrighted materials? Your site suggests that you provide Internet content to clients (part of your 'early warning system'?) for a fee, which, in the case of my stuff, is pretty clearly a copyright violation, lacking prior authorization. Giving your clients a pointer is one thing; actually giving them the material is quite another.
Anyway, just curious...
Best, thanks again for looking into the bandwidth issue...
Chris
Somebody's got to ask... are the rules the same for corporations and individual citizens?
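For the technically curious, here is roughly what "honoring robots.txt" means in practice. This is a minimal sketch in Python using the standard library's urllib.robotparser; it is not anybody's actual crawler code. The robots.txt contents, the 'ExampleBot' user-agent name, the example.com URLs and the allowed() helper are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to stay out of /columns/.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /columns/
"""

def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# A well-behaved 'bot asks before every fetch and skips what is disallowed.
for url in ("http://www.example.com/index.html",
            "http://www.example.com/columns/1999.html"):
    verdict = "fetch" if allowed(EXAMPLE_ROBOTS, "ExampleBot", url) else "skip"
    print(verdict, url)
```

Nothing in robots.txt enforces anything: the file is a request, and it only works if the crawler runs a check like this one and respects the answer.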
4:37:47 PM
Cyveillance's Brian Murray responds:
Hi Chris,
Thank you very much for bringing this to my attention. I will speak with our IT folks tomorrow and ask them to investigate and also to exclude your domain from future crawls, just to be sure. We try very hard to prevent this type of thing from happening. It is true that our technology retrieves data by creating a single connection to your Web site and then downloading HTML files across this one connection, but you should know that this is done with the intention of eliminating the work of constantly building and dropping connections. It is also meant to minimize the impact on other users, though it leaves a distinct imprint on typical log files, especially if the server was not loaded at the time, allowing it to fill these serial requests very quickly. Since we do not download images (typically the largest files on a site), minimal bandwidth is required. We believe this low-impact, high-visibility technique is the most responsible technique for us to use under the circumstances. We are also working on ways to further minimize the impact. Based on your message, it appears the approach may not have worked as intended, and I will be sure to look into it. Hopefully, the data you have provided will help us identify what happened here.
Thanks again, and please feel free to contact me directly should you have any further issues.
Sincerely,
Brian Murray Vice President of Client Services Cyveillance, Inc.
Now I just have 2 more questions, Brian...
4:09:10 PM
Cluetrain Manifesto quote on Doc's site this A.M.:
"A powerful global conversation has begun. Through the Internet, people are discovering and inventing new ways to share relevant knowledge with blinding speed. As a direct result, markets are getting smarter — and getting smarter faster than most companies.
"These markets are conversations. Their members communicate in language that is natural, open, honest, direct, funny and often shocking. Whether explaining or complaining, joking or serious, the human voice is unmistakably genuine. It can't be faked."
Where's my copy... time to re-read... may help me understand the mechanisms at work, as mentioned below... and I just found out Doc's a DXer... (shoulda guessed)...
12:50:50 PM
Sensitivity to initial conditions? On my jog across the Stanford campus this morning, I was thinking further about my recent posts about marketing. I am of the opinion, you may recall, that the rise of the Internet has changed the rules of the game for business in general, and marketing in particular.
But we humans have had networks for centuries: postal service, ship lines, rail lines, telegraph, telephone, broadcast networks, airline service etc. etc. Why should one additional network make such a difference?
I think the short answer is probably that complex systems are unpredictable and sensitive to initial conditions. A world that had a number of mostly synchronous networks now has a mainly asynchronous network that's quite flexible and robust, and which is put to ingenious uses daily. Whatever the mechanism, the network cat is out of the bag...
12:41:50 PM
"Army looking for a few good blogs?" asks Roger, who spots some unusual visitors to his blog, one of which has also been to gulker.com. It would seem that something called Land Information Warfare Activity, US Army Intelligence and Security Command (INSCOM), Fort Belvoir, Virginia came in on a Google search for 'Cyveillance'. Land Information Warfare Activity? Well, I guess they got a browserful of my opinion about the quality of Cyveillance technology...
12:14:51 PM
Cyveillance, redux: a number of readers have pointed to this comment by Brian Murray, VP of client services at Cyveillance:
http://news.spamcop.net/pipermail/spamcop-help/2003-June/034004.html
Brian writes:
"In terms of the other concerns expressed in this Forum relating to how Cyveillance gathers information from the Internet, we set the highest standards for our online activities... and we take great pains to ensure that our crawlers minimize the load on other sites servers"
My response:
Brian-
I read your comments on 'setting high standards' and 'taking pains to minimize your crawlers' load on servers' with great interest.
If the 'bot that usually identifies itself as some flavor of IE operating on IP addresses 63.148.99.xxx is indeed yours (as is widely reported on the Net), I would be grateful for your comments on the behavior I and others have noted in our servers' logs. Here's a grep of my Apache Web server's access log, Dec. 15 to present:
http://www.gulker.com/music_industry/63_148_99_log.txt
Please note that the 'bot in question connects repeatedly to long directories and downloads files sequentially without pausing - sometimes more than a hundred in a row - as fast as my relatively modest 144K net connection will allow. A number of other Webmasters have written me with similar experiences.
While this 'bot is connected, my server is all but inaccessible to others, and we at gulker.com are unable to access external sites easily. This 'bot is not well-behaved: it also ignores robots.txt.
So is it yours? If so, when will you apply your stated policy, and fix the darn thing?
Chris Gulker
http://www.gulker.com/
PS. This is probably redundant, since your firm specializes in knowing what's happening on the Net, but there is a category, complete with RSS feed, of information about the behavior of this 'bot:
http://www.gulker.com/categories/cyveillancebot/
Here's an essay I wrote describing my experience with this creature:
http://www.gulker.com/stories/2003/05/06/whatToThinkAboutCyveillanc
And the column I wrote for London's Independent 2 weeks ago:
http://news.independent.co.uk/digital/features/story.jsp?story=408191
And the article about same on Slashdot:
http://yro.slashdot.org/article.pl?sid=03/05/07/0120237
Awaiting a response with great interest...
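For other Webmasters who want to check their own logs for the same visitor, here's a minimal sketch in Python of the kind of filter described in the letter above. It is not the actual command used to produce the log file linked there. It assumes Apache's common or combined log format, where each line begins with the client IP, and the function names and the per-minute tally are my own illustration.

```python
import re
import sys
from collections import Counter

# Assumption: Apache "common"/"combined" log format, e.g.
#   1.2.3.4 - - [15/Dec/2002:10:15:32 -0800] "GET /path HTTP/1.0" 200 5120
LINE_RE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)'
)

def hits_from_prefix(lines, prefix="63.148.99."):
    """Yield (ip, timestamp, path) for requests whose client IP starts with prefix."""
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group(1).startswith(prefix):
            yield m.group(1), m.group(2), m.group(4)

def hits_per_minute(lines, prefix="63.148.99."):
    """Count matching requests per minute, to make rapid-fire bursts stand out."""
    counts = Counter()
    for _ip, stamp, _path in hits_from_prefix(lines, prefix):
        # "15/Dec/2002:10:15:32 -0800"[:17] == "15/Dec/2002:10:15"
        counts[stamp[:17]] += 1
    return counts

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (the log path is whatever your server uses): python botlog.py access_log
    with open(sys.argv[1], errors="replace") as log:
        for minute, n in hits_per_minute(log).most_common(10):
            print(minute, n)
```

A minute with dozens of sequential hits from one address block is the pattern to look for: no human with a browser reads a hundred pages a minute.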
9:07:18 AM
Updated 4/16/04; 12:41:59 PM
Features & Categories:
Columns (soon)
Dotcom Garden
Lone Genius Hackers
Picture Weblog
Theory & Strategy
Weblogging
gulker.com Cam
Interesting blogs et al.:
AlwaysOn Network
Natalie d'Arbeloff
Azeem Azhar
Ken Bereskin
Blogging Ecosystem
Blogging Network
BlogStreet
Boing Boing
Tim Bray
Matt Croydon
DaveNet
Rael Dornfest
Esther Dyson
Dave Farber's IP
Dave Fitch
David Galbraith
John Getze
William Gibson
Dan Gillmor
James Gleick
Bernie Goldbach
Meg Hourihan
Joi Ito
Xeni Jardin
Jeff Jarvis
Linux Journal
Mitch Kapor
Kuro5hin
Gunnar Langemark
Joshua Levy
Scott Loftesness
Macintouch
Ross Mayfield
Hans Moravec
Rafe Needleman
Nonsense Verse
OS Opinion
Tim Porter
Recommended Reading
Reverse Cowgirl
Glenn Reynolds
Roger Ridey
Phil Ringnalda
John Robb
Scott Rosenberg
Anita Rowland
Brent Simmons
Robert Scoble
Doc Searls
Jessica Shea
Gavin Sheridan
Shifted Librarian
Stefan Smalla
Bruce Sterling
Scripting News
Slashdot
Dan Shafer
John Tringham
Jon Udell
Mochio Umeda
Philipp Weltentummler
Kevin Werbach
Amy Wohl



