This wasn’t meant to be yet another web scraping afternoon.
This afternoon started with me trying to recover a little from a hard time.
I had two probation days for a web-site testing job with Selenium, I am in the middle of a couple of recruitment processes, and I don’t want to tell you about the real trouble.
- I got curious, searched oreilly.com for literature on Selenium, and found a “Short Cut” document.
 - I found something.
 - I skimmed the chapter on “twill” a few times.
 - Before I really dived into the chapter on Selenium, I summed up what I really liked and disliked about Selenium.
 - Of course, being able to use XPath is great.
 - With Selenium you are hardly aware that a web-site makes use of Javascript; you just leave that to the browser engine, initially to Firefox and the Selenium IDE.
 - I actually hate it when your HTTP scripting depends on desktop computers running a browser, plus some remote control software to connect the server where your “HTTP scripts” actually run with the web browser(s) that you make use of.
 - I did a little superficial research on: perl/ruby + mechanize + xpath.
 - Yes, scrubyt is still around, but isn’t it vaporware itself by now?
 - Found perl’s WWW::Scraper::TidyXML – “TidyXML and XPath support for Scraper”. Not bad. But then it’s from around 2003, and it seems to be vaporware. My e-mail to the author could not be delivered (“over quota”), so I guess it’s seriously no longer maintained.
 - WWW::Mechanize::Firefox seems nice; have a look at WWW::Mechanize::Firefox::Cookbook!
 - …
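
To make the “tidy the page, then query it with XPath, no browser involved” idea from the list a bit more concrete, here is a minimal sketch in Python rather than perl or ruby. Everything in it is an assumption for illustration: the `PAGE` markup, the `extract_links` helper and the class names are made up, and the page is already well-formed, so the fetching and tidying steps are left out. Python’s standard `xml.etree.ElementTree` only understands a subset of XPath, which is enough for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, already well-formed (tidied) page. A real page would be
# fetched over HTTP and cleaned up first; both steps are omitted here.
PAGE = """<html>
  <body>
    <ul id="books">
      <li class="book"><a href="/selenium">Selenium</a></li>
      <li class="book"><a href="/twill">twill</a></li>
      <li class="other"><a href="/about">About</a></li>
    </ul>
  </body>
</html>"""


def extract_links(xhtml, class_name):
    """Return (text, href) pairs for links inside <li> elements of a class."""
    root = ET.fromstring(xhtml)
    # ElementTree implements only a subset of XPath; attribute predicates
    # such as [@class='book'] are part of that subset.
    path = ".//li[@class='%s']/a" % class_name
    return [(a.text, a.get("href")) for a in root.findall(path)]


links = extract_links(PAGE, "book")
for text, href in links:
    print(text, href)
```

This runs entirely on the server side, which is the part I care about. What it cannot do is execute Javascript, so it is no replacement for what the browser engine does for you under Selenium.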
 