This wasn’t meant to be yet another web scraping afternoon.
This afternoon started with me trying to recover a little from a hard time.
I had two trial days for a website testing job with Selenium, I am in the middle of a couple of recruitment processes, and I would rather not tell you about the real trouble.
- I got curious and searched oreilly.com for literature on Selenium, and found a “Short Cut” document.
- I found something.
- I skimmed the chapter on “twill” a few times.
- Before I really dived into the chapter on Selenium, I summed up what I liked and disliked about Selenium.
- Of course, being able to use XPath is great.
- With Selenium you hardly notice that a website uses JavaScript at all; you simply leave that to the browser engine, initially to Firefox and the Selenium IDE.
- I actually hate it when HTTP scripting depends on desktop computers running a browser, plus some remote-control software to connect the server where your “HTTP scripts” actually run with the web browser(s) you use.
- I did a little superficial research on perl/ruby + mechanize + xpath.
- Yes, scrubyt is still around, but hasn’t that become vaporware itself by now?
- Found perl’s WWW::Scraper::TidyXML – “TidyXML and XPath support for Scraper”. Not bad. But then it’s from around 2003, and it seems to be vaporware. My e-mail to the author could not be delivered (“over quota”), so I guess it really is no longer maintained.
- WWW::Mechanize::Firefox seems nice; have a look at WWW::Mechanize::Firefox::Cookbook!
- …
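
To make the “XPath without a remote-controlled browser” idea a bit more concrete, here is a minimal sketch. It is in Python rather than perl or ruby, only because the standard library is enough to run it. Everything in it is an assumption for illustration: the `PAGE` markup, the `book` and `ad` class names, and the `extract_links` helper are all hypothetical. It also assumes the page has already been tidied into well-formed XML, and `xml.etree.ElementTree` only understands a limited subset of XPath, so treat it as a sketch rather than a replacement for any of the modules above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, already-tidied XHTML standing in for a fetched page.
# Real-world HTML would need a tidy step first to become well-formed XML.
PAGE = """<html>
  <body>
    <ul id="books">
      <li class="book"><a href="/selenium">Selenium</a></li>
      <li class="book"><a href="/twill">twill</a></li>
      <li class="ad"><a href="/promo">Promo</a></li>
    </ul>
  </body>
</html>"""


def extract_links(xhtml):
    """Return (text, href) pairs for links inside <li class="book"> items."""
    root = ET.fromstring(xhtml)
    # ElementTree supports only a subset of XPath; this expression stays within it.
    anchors = root.findall(".//li[@class='book']/a")
    return [(a.text, a.get("href")) for a in anchors]


links = extract_links(PAGE)
for title, href in links:
    print(title, href)
```

No browser and no desktop machine is involved here; the price is that nothing executes the page’s JavaScript.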