Blog

  • working through elance.com

    What do you think about working through elance.com as a “provider”?

  • Spidering Hacks – O’Reilly Media

    CHAPTER ONE


    Hack #2 – Best Practices for You and Your Spider    


    Be Liberal in What You Accept
    … This is an inexact science, to put it mildly. …

    Monitor your spider’s output on a regular basis to make sure it’s working as expected [Hack #31], make the appropriate adjustments as soon as possible to avoid losing ground with your data gathering, and design your spider to be as adaptive to site redesigns [Hack #32] as possible.

    Don’t Reinvent the Wheel

    • Best Practices for You

    If you must scrape HTML, do so sparingly. If the information you want is available only embedded in an HTML page, try to find a “Text Only” or “Print this Page” variant; these usually have far less complicated HTML and a higher content-to-presentation markup quotient, and they don’t tend to change all that much (by comparison) during site redesigns.
    Hack #4 – Registering Your Spider
    By the way, you might think that your spider is minimal or low-key enough that nobody’s going to notice it. That’s probably not the case. In fact, sites like Webmaster World (http://www.webmasterworld.com) have entire forums devoted to identifying and discussing spiders. Don’t think that your spider is going to get ignored just because you’re not using a thousand online servers and spidering millions of pages a day.
    Naming Your Spider
    … There are web sites, like http://www.iplists.com, devoted to tracking IP addresses of legitimate spiders. …
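The excerpt above does not show how the book names its spider. One common way, sketched here under that assumption, is to send a descriptive `User-Agent` header with a contact page. The spider name, version, and contact URL are hypothetical placeholders, and no request is actually sent.

```python
import urllib.request

# Hypothetical spider identity; replace with your own name and contact page.
SPIDER_NAME = "ExampleSpider"
SPIDER_VERSION = "0.1"
CONTACT_URL = "http://example.com/spider.html"


def build_request(url: str) -> urllib.request.Request:
    """Return a request that identifies the spider by name and contact page."""
    user_agent = f"{SPIDER_NAME}/{SPIDER_VERSION} (+{CONTACT_URL})"
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Build the request only; nothing is fetched here.
request = build_request("http://example.com/")
print(request.get_header("User-agent"))
```

A webmaster who spots this string in the logs can then look up who runs the spider instead of guessing.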
    Hack #5 – Preempting Discovery
    No matter how gentle and polite your spider is, sooner or later you’re going to be noticed. Some webmaster’s going to see what your spider is up to, and they’re going to want some answers.

    Hack #6 – Keeping Your Spider Out of Sticky Situations
    Bad Spider, No Biscuit!
    … There is nothing stopping a disgruntled site from revising its TOS to deny a spider’s access, and then sending you a “cease and desist” letter. … Spidering another site’s content and reappropriating it into your own framed pages is bad. Don’t do it. …
    Competitive Intelligence
    Some sites complain because their competitors access and spider their data—data that’s publicly available to any browser—and use it in their competitive activities. You might agree with them and you might not, but the fact is that such scraping has been the object of legal action in the past. Bidder’s Edge was sued by eBay (http://pub.bna.com/lw/21200.htm) for such a spider. …
    Possible Consequences of Misbehaving Spiders
    … But considering lawyer’s fees, the time it’ll take out of your life, and the monetary penalties that might be imposed on you, a lawsuit is bad enough, and it’s a good enough reason to make sure that your spiders are behaving and your intent is fair.
    CHAPTER TWO
    Assembling a Toolbox

    Hacks #8–32



    Chapter 4 Gleaning Data from Databases

    Hack #69 – Aggregating RSS and Posting Changes
    -> meta feeds, aggregating feeds, …
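The note above only names the idea of a meta feed. As a rough sketch of what aggregating feeds could look like (not the book's code), the following merges the items of several RSS 2.0 documents, drops duplicates by link, and sorts newest first. The two sample feeds and all function names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two tiny, made-up RSS 2.0 feeds; FEED_B repeats one link from FEED_A.
FEED_A = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>A1</title><link>http://example.com/a1</link>
<pubDate>Mon, 01 Feb 2010 10:00:00 +0000</pubDate></item>
<item><title>A2</title><link>http://example.com/a2</link>
<pubDate>Wed, 03 Feb 2010 09:00:00 +0000</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Feed B</title>
<item><title>B1</title><link>http://example.com/b1</link>
<pubDate>Tue, 02 Feb 2010 12:00:00 +0000</pubDate></item>
<item><title>A1 (copy)</title><link>http://example.com/a1</link>
<pubDate>Mon, 01 Feb 2010 10:00:00 +0000</pubDate></item>
</channel></rss>"""


def parse_items(rss_text):
    """Extract title, link, and publication date from every <item>."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        })
    return items


def aggregate(feeds):
    """Merge items from all feeds, skip repeated links, newest first."""
    seen = set()
    merged = []
    for rss_text in feeds:
        for entry in parse_items(rss_text):
            if entry["link"] not in seen:
                seen.add(entry["link"])
                merged.append(entry)
    return sorted(merged, key=lambda e: e["published"], reverse=True)


for entry in aggregate([FEED_A, FEED_B]):
    print(entry["published"].date(), entry["title"])
```

Posting only the changes would then mean remembering the links seen on the previous run and emitting just the entries that are new.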

  • jobs.perl.org needs a couple of changes — let’s start brainstorming!

    For me as a freelancer it’s very clear:

    • There must be separate feeds for freelance and salaried staff.
    • There should be a way to comment on job postings. For example, if the original poster doesn’t close the job, it makes sense to get that information from somebody else, perhaps somebody who was involved in some way. Yes, that cannot happen anonymously.

    What else?

    Yes, I tried to contact Ask Bjørn Hansen at ask(AT)perl.org before I started this here, but without success.

  • “Senior Software Engineer – Perl” / “Germany, Karlsruhe” / “Pay rate: 70,00 €/h” / CLOSED

    “Closed”, so the recruiter says.
    What a pity that comments on job postings on jobs.perl.org are not possible.

  • “Tour de Babel” by Steve Yegge

    […]
    My whirlwind tour will cover C, C++, Lisp, Java, Perl, (all
    languages we use at Amazon), Ruby (which I just plain like), and Python,
    which is in there because — well, no sense getting ahead of ourselves,
    now.

    […]

  • “A Quick Tour of Ruby” by Steve Yegge

    Very nice to read.

    Ruby used to annoy me simply by existing. I first heard about Ruby
    years ago, in maybe 1997 or 1998, and folks said it was kind of like
    Perl, but “cleaner”, whatever that meant. Ruby fans back then seemed
    like a tiny minority of rebels and fringe separatists.

    Ruby irked me primarily because we already had Perl, which was
    working just fine thank you very much. And if for some strange reason
    you didn’t like Perl, we had Python. If Perl fans were dog owners, and
    Python fans were cat owners, then Ruby fans seemed like ferret owners.
    They could go on and on about how much they adored their
    beady-eyed albino stretch-limo rats, and how cute they were,
    but we all knew they were just looking for attention. Nobody really
    wants a pet rat. (Ferret owners will correct me and say they’re not
    rodents; they’re more closely related to weasels and skunks. As if that
    helps.) Regardless, I didn’t want to have anything to do with Ruby.

    Last year, though, I was looking at a bunch of different languages
    in the hopes of finding one to replace Perl for small- to medium-sized
    tasks. One day my magic Perl dust had worn off rather suddenly, and I’d
    joined the growing ranks of people who were beginning to notice the
    emperor was a wee bit underdressed. But all the alternatives to Perl
    looked pretty bad themselves, and I started judging languages by how far
    I’d get into the reference manual before throwing it across the room.

    I eventually picked up a Ruby book — …

    Steve …’s home page.

    I personally keep loving both of them. I can afford that in the comp.lang.* area and in some others as well, though of course not when it comes to my girlfriend.

    I actually came across Steve when I searched for elisp.

  • iPhone apps that I will need sooner or later

  • how to avoid accidentally quitting Firefox?

    Is there a configuration variable for this?
    There is a checkbox labeled “Warn me when closing multiple tabs”. That does the job.

  • first steps in IRC with pidgin

    1. “Add Account” within pidgin, with IRC as the protocol, for each IRC server/user pair (e.g. irc.freenode.net) that you want to use.
    2. “Join a Chat” (below Buddies): select the right Account (i.e. one of your IRC protocol/server accounts), enter the Channel (including the ‘#’), and leave the Password blank. Here we are!
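For the curious, the two steps above correspond roughly to three lines on the wire. This is a sketch based on the client registration commands in the IRC protocol (RFC 2812), not a dump of what pidgin really sends; the nickname and channel are hypothetical, and nothing is sent over the network.

```python
def irc_login_lines(nick, channel, realname=None):
    """Build the raw IRC lines a client sends to register and join a channel."""
    # The post says to include the '#', so this sketch insists on it.
    if not channel.startswith("#"):
        raise ValueError("channel name must include the leading '#'")
    realname = realname or nick
    return [
        f"NICK {nick}\r\n",                    # choose the nickname
        f"USER {nick} 0 * :{realname}\r\n",    # register the user (no password)
        f"JOIN {channel}\r\n",                 # join the channel, no key given
    ]


# Hypothetical nickname and channel, for illustration only.
for line in irc_login_lines("alice", "#perl"):
    print(repr(line))
```

Step 1 covers the NICK and USER lines for one server, and step 2 is the JOIN.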

    Did I mention recently how much I love my pidgin?
    I did all this with a (fink) pidgin on my MacBook running Snow Leopard (OS X), but I have no doubt it will also run on my openSUSE Samsung notebook.

  • networks and logos

    Where do you get the personalised logos or badges of the various networks:

    To be continued …