Blog

  • web scraping afternoon

    This wasn’t meant to be yet another web scraping afternoon.

    This afternoon started with me trying to recover a little from a hard time.
    I had two probation days for a web-site testing job with Selenium, I am in the middle of a couple of recruitment processes, and I don’t want to tell you about the real trouble.

    • I got intrigued to search oreilly.com for literature on Selenium and found a “Short Cut” document.
    • I found something.
    • I had a few looks over the chapter on “twill”.
    • Before I really dived into the chapter on Selenium, I summed up what I really liked and disliked about Selenium.
    • Of course, being able to use XPath is great.
    • With Selenium you don’t have to care whether a web site makes use of Javascript; you just leave that to the browser engine, initially to Firefox and the Selenium IDE.
    • I actually hate it when HTTP scripting depends on desktop computers running a browser, plus some remote control software to connect the server where your “HTTP scripts” actually run with the web browser(s) that you make use of.
    • I did a little superficial research on: perl/ruby + mechanize + xpath.
    • Yes, there is still scrubyt around, but isn’t that abandoned now, too?
    • Found perl’s WWW::Scraper::TidyXML – “TidyXML and XPath support for Scraper”. Not bad. But then it’s from around 2003, and it seems to be abandoned. My e-mail to the author could not be delivered (“over quota”), so I guess it’s seriously no longer maintained.
    • WWW::Mechanize::Firefox seems to be nice, have a look at WWW::Mechanize::Firefox::Cookbook!
  • EDI for Ruby (edi4r)

    EDI for Ruby (edi4r)

    Actually they refer to EDIFACT here.

    You can use this software to output JSON, which you can then process in any other software.
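
    To give an idea of what “EDIFACT to JSON” means, here is a minimal Python sketch. It is not edi4r’s API and not its output format; the function name `parse_edifact` and the sample message are made up for illustration. It assumes the default EDIFACT separators (`'` ends a segment, `+` separates data elements, `:` separates components) and ignores the release character `?` and any UNA header.

```python
import json

def parse_edifact(message):
    """Split an EDIFACT string into segments, elements and components.

    Assumption: default separators only; no release-character handling.
    """
    segments = []
    for raw in message.split("'"):
        raw = raw.strip()
        if not raw:
            continue  # skip the empty piece after the last terminator
        elements = raw.split("+")
        segments.append({
            "tag": elements[0],
            "elements": [e.split(":") for e in elements[1:]],
        })
    return segments

# Hypothetical two-segment sample, just to have something to parse.
SAMPLE = "UNH+1+ORDERS:D:96A:UN'BGM+220+128576+9'"

print(json.dumps(parse_edifact(SAMPLE), indent=2))
```

    Once the message is a list of plain dictionaries, any JSON-aware tool can take over from there.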

  • WWW::Mechanize::Firefox – search.cpan.org

    WWW::Mechanize::Firefox – search.cpan.org

    Support for Javascript and XPath.

    What about recording or capturing such a script?

  • perl, cpan: WWW::Scripter

    WWW::Scripter – search.cpan.org

    From the POD there:

    DESCRIPTION 

    This is a subclass of WWW::Mechanize that uses the W3C DOM and provides support for scripting.

    No actual scripting engines are provided with WWW::Scripter, but are available as separate plugins. (See also the “SEE ALSO” section below.)

    So it supports the DOM, but no XPath expressions yet.
    And there is Javascript support through plugins.

  • HSDD = hypoactive sexual desire disorder

    A link to the abstract of the conference article / press release.

    From that abstract:

    CONCLUSION: Cerebral activation patterns in women with HSDD differs from those in women with normal sexual function and may reflect differences in how they interpret sexual stimuli.

    In other words: Women with low libidos ‘have different brains’.

    Have a good laugh!!!

    Here is a lengthy discussion of the “miserable” approach in that article.

  • Selenium+XPather: e.g. verifyTextPresent vs. verifyElementPresent

    Selenium usually records clicks and tests based on text strings instead of truly native-language-independent XPath expressions. But you can always find the right XPath expression yourself (or with the help of XPather, a Firefox extension) and make use of it in your Selenium code.

    Caveat: the XPath expression that XPather gives you needs one more ‘/’ at the beginning to be useful in your Selenium code, because Selenium treats a locator starting with ‘//’ as XPath.

    Yes, these XPath expressions are lengthy, and you may think they overspecify the location in question, but then: when will that lengthy XPath expression ever fail? When your HTML programmer changes his code. And that’s exactly what you should insist on being informed about in the first place. Track your HTML programmer! If you don’t, he will screw you without any mercy. You don’t want to screw him, but you need to know the consequences of what he is doing. Not in every detail, but more details are better than no details at all.

    We replaced verifyTextPresent with verifyElementPresent, and it worked “out of the box”. We gained native language independence immediately.
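
    The difference can be sketched outside of Selenium with a few lines of Python. This is only an analogue: the two page snippets, the function names and the XPath are invented for illustration, and the standard library’s ElementTree stands in for the browser (it supports only a subset of XPath, hence the leading `.`).

```python
import xml.etree.ElementTree as ET

# Hypothetical renderings of the same page in English and in German.
PAGE_EN = "<html><body><div id='cart'><span class='total'>Total</span></div></body></html>"
PAGE_DE = "<html><body><div id='cart'><span class='total'>Summe</span></div></body></html>"

def text_present(page, text):
    """Rough analogue of verifyTextPresent: look for a visible string."""
    root = ET.fromstring(page)
    return text in "".join(root.itertext())

def element_present(page, xpath):
    """Rough analogue of verifyElementPresent: look for a matching element."""
    root = ET.fromstring(page)
    return root.find(xpath) is not None

# The locator names structure only, never the displayed words.
TOTAL_XPATH = ".//div[@id='cart']/span[@class='total']"

for name, page in (("en", PAGE_EN), ("de", PAGE_DE)):
    print(name, text_present(page, "Total"), element_present(page, TOTAL_XPATH))
```

    The text check passes only for the English page, while the element check passes for both, which is the native language independence we were after.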

  • Wall Street 2: Money Never Sleeps (2010)

    Wall Street 2: Money Never Sleeps (2010)

    My Wednesday (2010-10-27) night movie.

    Very nice. Good entertainment.

    Now I know how to pronounce the surname “Schwartz” in English. That’s one of the main characters in that movie.

    Thanks to my movie night sponsor EN!!!!

  • Selenium: strftime, sprintf

    I would like to see a strftime or an sprintf in Selenium.

    As far as I can tell, Javascript itself has no built-in sprintf; the printf variants I came across only write to files, not to strings.
    (Maybe there is a way to regard strings as files, but my Javascript competence is not good enough for that.)

    I found Javascript code that implements sprintf.

    (I only introduced “Selenium” in the title, because the Tweet created from a title w/o it looks stupid.)
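
    For comparison, this is the kind of thing I mean, sketched in Python, where both are built in. The function name `build_order_id` and the ID layout are made up for illustration; the assumption is that the test is driven from a host language rather than from the Selenium IDE alone, so the formatted string could then be typed into a form field.

```python
from datetime import datetime

def build_order_id(counter, when):
    """Combine a strftime-formatted date with a zero-padded counter."""
    date_part = when.strftime("%Y-%m-%d")             # strftime: date to string
    return "order-%s-%05d" % (date_part, counter)     # sprintf-style formatting

print(build_order_id(42, datetime(2010, 10, 27)))
```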