Category: page scraping
-
my new page scraping assignment – getting familiar again with my toolkit
For my new page scraping assignment, I considered trying a much more modern approach for a while. That actually kept me from really starting on it for a couple of weeks, because it seemed so tedious, and I thought I wouldn't get three shots at it. This week I thought about…
-
CPAN: Scrappy – The All Powerful Web Spidering, Scraping, Creeping Crawling Framework
Scrappy – metacpan.org: “Scrappy – The All Powerful Web Spidering, Scraping, Creeping Crawling Framework”
-
more on web harvesting
Data Extraction for Web 2.0: Screen Scraping in Ruby/Rails, Episode 1
http://scrubyt.org (Ruby)
HPricot.com: “a swift, liberal HTML parser with a fantastic library” (Ruby)
http://brightplanet.com: “Pioneers in Harvesting the Deep Web”
…
Update 2010-06-05/06: One night later, I am still very impressed by scrubyt, and I would rather try it on a…
-
web harvesting and my toolkit JHwis
Years ago I implemented a toolkit that I call JHwis. Now and then I think I should have done more advertising for it. For years now I have been using software created with that toolkit to download bank account statements and other things. I would like to prove to you that it is also very well suited for…