wp.jochen.hayek.name/blog-en

Month: October 2010

Web crawler – Wikipedia, the free encyclopedia

Oct 30, 2010

—

by

johayek

in Uncategorized

Web crawler – Wikipedia, the free encyclopedia
Web scraping – Wikipedia, the free encyclopedia

Oct 30, 2010

—

by

johayek

in Uncategorized

Web scraping – Wikipedia, the free encyclopedia
screen-scraper.com

Oct 30, 2010

—

by

johayek

in Uncategorized

screen-scraper.com
Google group scrubyt is gone

Oct 30, 2010

—

by

johayek

in Uncategorized

Once in a while I am curious to see what is going on in the scrubyt area. I have a few Atom and RSS feed URLs stored in my feed reader (Firefox Sage), but I can't reach them any more. The group simply does not exist any more: Cannot find scrubyt. The group named scrubyt has…
hpricot | RubyGems.org | a swift, liberal HTML parser with a fantastic library

Oct 30, 2010

—

by

johayek

in Uncategorized

hpricot | RubyGems.org | your community gem host
Nokogiri – an HTML, XML, SAX, & Reader parser with the ability to search documents via XPath or CSS3 selectors… and much more

Oct 30, 2010

—

by

johayek

in Uncategorized

Nokogiri: no JavaScript support.
web scraping afternoon

Oct 29, 2010

—

by

johayek

in Uncategorized

This wasn’t meant to be yet another web scraping afternoon. This afternoon started with me trying to recover a little from a hard time. I had two probation days for a web-site testing job with Selenium, I am in the middle of a couple of recruitment processes, and I don’t want to tell you about…
Scrappy: All Powerful Web Harvester, Spider, Scraper fully automated – search.cpan.org

Oct 29, 2010

—

by

johayek

in Uncategorized

Scrappy – search.cpan.org
EDI for Ruby (edi4r)

Oct 29, 2010

—

by

johayek

in EDIFACT, The Ruby Programming Language

EDI for Ruby (edi4r): Actually, they are referring to EDIFACT here. You can use this software to output JSON, which you can then process in any other software.
WWW::Mechanize::Firefox – search.cpan.org

Oct 29, 2010

—

by

johayek

in CPAN, HTTP scripting, The Perl Programming Language

WWW::Mechanize::Firefox – search.cpan.org: support for JavaScript and XPath. What about recording or capturing such a script?