wp.jochen.hayek.name/blog-en

my approach to HTTP scripting, web harvesting, page scraping, and all that

Years ago (and I guess the situation is not much different nowadays), libcurl was just so much more powerful than LWP that I simply had to go for libcurl. Read “Using cURL to automate HTTP jobs”, aka “The Art Of Scripting HTTP Requests Using Curl”; IMHO that’s the major evergreen in that area. libcurl does a lot more than just simple PUT, GET, and that sort of thing.
All that makes up my Swiss Army knife for web harvesting and page scraping.
Reality is seriously more challenging than textbook examples, trust me!
Right, I could make all of that open source. I only recently started my open sourcerer career, after SzabGab had stayed at my place for a couple of days around LinuxTag.org/2010 in Berlin.

I should also mention Daniel Stenberg, the father of curl. IMO, without his great work the art of HTTP scripting would not be where it is today.

Right, last but not least: no, I am not into dealing with AJAX and all that. For the last couple of years my approach has been this: with the toolset I described above, I can still manage “all” tasks without caring about AJAX. It simply does not matter enough.
