HTML::TableExtract – Perl module for extracting the content contained in tables within an HTML document, either as text or encoded element trees. – metacpan.org
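HTML::TableExtract itself is Perl. To illustrate the idea only, here is a minimal sketch in Python using the standard library's html.parser that collects each table's cell text as a list of rows. This is not the module's API: the names TableTextExtractor and extract_tables are my own hypothetical choices. The sketch returns text only, not element trees, and it does not handle nested tables.

```python
from html.parser import HTMLParser


class TableTextExtractor(HTMLParser):
    """Collect the text of every <table> as a list of rows of cell strings.

    Hypothetical helper built on Python's stdlib parser, NOT HTML::TableExtract.
    Nested tables are not supported.
    """

    def __init__(self):
        super().__init__()   # convert_charrefs=True by default, so &amp; -> &
        self.tables = []     # list of tables; each table is a list of rows
        self._row = None     # cells of the row currently being read
        self._cell = None    # text fragments of the cell currently being read

    def _end_cell(self):
        # Close the open cell (if any), collapsing runs of whitespace
        if self._cell is not None:
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _end_row(self):
        # Close the open row (if any) and attach it to the current table
        self._end_cell()
        if self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._end_row()          # HTML allows </tr> to be omitted
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()         # HTML allows </td> and </th> to be omitted
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        # Only text inside a cell is kept
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html):
    """Return every table in `html` as a list of rows of cell texts."""
    parser = TableTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


if __name__ == "__main__":
    page = "<table><tr><th>Name</th><th>Qty</th></tr><tr><td>Apple</td><td> 3 </td></tr></table>"
    print(extract_tables(page))  # [[['Name', 'Qty'], ['Apple', '3']]]
```

Real pages are usually messier than this, which is exactly why a dedicated module that can also select tables by headers or position is worth having.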
I just completed two tasks in which I had to deal with obfuscated CGI forms. It was quite a challenge, and at the outset I did not expect to succeed. But it's done. Now I am eager to apply my technology to interesting and lucrative tasks.
Web Scraping – Quora
Scrappy – metacpan.org: “Scrappy – The All Powerful Web Spidering, Scraping, Creeping Crawling Framework”
Amaryllis – Webcrawling – “Die Erschließung des Webs” (“Opening Up the Web”)
An Introduction to Testing Web Applications with twill and Selenium – O’Reilly Media. Too cheap not to own it, I thought, and now I am reading it.
Data Extraction for Web 2.0: Screen Scraping in Ruby/Rails, Episode 1
http://scrubyt.org (Ruby)
HPricot.com: “a swift, liberal HTML parser with a fantastic library” (Ruby)
http://brightplanet.com: “Pioneers in Harvesting the Deep Web”
…
Update 2010-06-05/06: One night later I am still very impressed by scrubyt, and I would really like to try it on a…
Continue reading “more on web harvesting”
Years ago I implemented a toolkit that I call JHwis. Now and then I think I should have done more advertising for it. For years now I have been using software built with that toolkit to download bank account statements and other data. I would like to prove to you that it’s also very well suited for…
Continue reading “web harvesting and my toolkit JHwis”