Category: page scraping
-
VTI’s tutorial on “web scraping with LWP”
Perltuts.com | Interactive Perl tutorials
-
Google+ Scraper – retrieve data from Google+ profiles with NodeJS and CoffeeScript
fhemberger/googleplus-scraper – GitHub. A lot of JavaScript, CoffeeScript, NodeJS, etc.
-
Firefox Add-on “Dafizilla Table2Clipboard”
Dafizilla Table2Clipboard :: Add-ons for Firefox (sources on SourceForge.net). If you want to paste data into Microsoft Excel or OpenOffice Calc with the correct layout, simply use Table2Clipboard.
-
HTML::TableExtract – metacpan.org
HTML::TableExtract is a Perl module for extracting the content contained in tables within an HTML document, either as text or as encoded element trees.
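HTML::TableExtract does this job in Perl. To illustrate the basic idea of pulling table content out "as text", here is a minimal Python sketch using only the standard library's `html.parser`. It is not the Perl module and not a port of it: the class name `TableTextExtractor` is hypothetical, it assumes well-formed markup with explicit closing tags, and unlike HTML::TableExtract it cannot select tables by headers or depth, nor does it handle colspan/rowspan or nested tables.

```python
from html.parser import HTMLParser


class TableTextExtractor(HTMLParser):
    """Hypothetical helper: collect the text of every <td>/<th> cell,
    grouped by table and row. Assumes explicit </td>, </th>, </tr> tags;
    colspan/rowspan and nested tables are not handled."""

    def __init__(self):
        super().__init__()
        self.tables = []   # list of tables; each table is a list of rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text inside a cell is kept; everything else is ignored.
        if self._cell is not None:
            self._cell.append(data)


sample = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Widget</td><td> 9.99 </td></tr>
</table>
"""
extractor = TableTextExtractor()
extractor.feed(sample)
extractor.close()
print(extractor.tables)  # [[['Name', 'Price'], ['Widget', '9.99']]]
```

Each table comes back as a list of rows, and each row as a list of cell strings, which is roughly the "as text" mode; the "encoded element trees" mode of the Perl module has no counterpart here.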
-
harvesting HTML-obfuscated websites looks like a horror to you?
I just completed two tasks in which I faced obfuscated CGI forms. It was quite a challenge, and at the start I was not sure I would succeed. But it’s done. Now I am eager to apply my technology to other interesting and lucrative tasks.
-
rather satisfied with today’s page scraping work
I did not run into much trouble; everything worked just as expected. There could be more days like this one.
-
another page scraping task for the same client
It’s getting to be fun again, now that I am back on familiar terms with my “old” tool set. First I take care of the forward navigation. Got the loop running. But will the loop also stop? Yes, it stops successfully. Now for the content. No, rework the loop first. Alright, the navigation part works fine. Now for…
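The forward-navigation loop and the question "will it also stop?" can be illustrated with a small Python sketch. It is not the author's Perl tool set: the in-memory `PAGES` dict, the `rel="next"` link format, the regex and the `max_pages` cap are all hypothetical stand-ins chosen so the example runs without network access. What it shows is the set of stop conditions: no next link, a link back to a page already visited, or a hard page limit.

```python
import re

# Hypothetical in-memory "site": page name -> HTML. A real scraper would
# fetch these over HTTP; the dict keeps the sketch self-contained.
PAGES = {
    "page1": '<p>item A</p><a rel="next" href="page2">next</a>',
    "page2": '<p>item B</p><a rel="next" href="page3">next</a>',
    "page3": '<p>item C</p>',  # last page: no "next" link
}

# Assumed link format; real sites vary, so this regex is illustrative only.
NEXT_LINK = re.compile(r'<a\s+rel="next"\s+href="([^"]+)"')


def crawl_forward(start, fetch, max_pages=100):
    """Follow "next" links from `start` and return the visited page names.

    Stops when a page has no next link, when a link points back to an
    already visited page, or after `max_pages` pages as a safety net.
    """
    visited = []
    seen = set()
    current = start
    while current is not None and current not in seen and len(visited) < max_pages:
        html = fetch(current)
        visited.append(current)
        seen.add(current)
        match = NEXT_LINK.search(html)
        current = match.group(1) if match else None
    return visited


order = crawl_forward("page1", PAGES.__getitem__)
print(order)  # ['page1', 'page2', 'page3']
```

The `seen` check guards against pagination that wraps around to an earlier page, a common reason such loops never terminate; `max_pages` is a last-resort cap for sites that generate endless fresh URLs. Extracting the content is left out, as in the post.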