Category: DocBook

  • my new page scraping assignment – getting familiar again with my toolkit

    For my new page scraping assignment I thought for a while about trying a much more modern approach.
    That kept me from really starting it for a couple of weeks, because it seemed so very tedious and I thought I wouldn’t get three shots at it.
    This week I decided to go with my own old approach and to make use of the state-of-the-art technology at a (slightly) later stage. That should work.
    So where is my software and where is my documentation?
    • I remember, I had left a link here at my Aleph-Soft.com website
    • that leads me to my slightly more extensive dedicated article
    • of course, while I read it, I switch to the sources of that article, so that I can improve the article “en passant”; OMG: running that DocBook website toolchain even works after at least a year or so! I’m amazed. Well, not updating software does have some positive side-effects.
    • does LiveHTTPHeaders still work with my current Firefox? LiveHTTPHeaders is one of the reasons I still keep my Firefox updated, although I chose Chromium as my main browser on all platforms (*** bookmark ***)
    • what about its cousin ieHTTPHeaders for IE? WTF, where does it actually live and get maintained? alright, I assume Jonas Blunck is the creator and maintainer
    • is there anything like *HTTPHeaders for Chrome/Chromium? that would be nice; I would have to make my respective tool read its logfile then
    • creating a perl script from LiveHTTPHeaders’s log file still works
    • integrated that perl script into my framework for that kind of stuff
    • downloading the root HTML page, parsing it, extracting the 1st few bits of information wanted
    • download the 1st linked page; the navigation doesn’t go further / deeper than this
    • TBD: extract the information details from that linked page; CAVEAT: there is an optional intermediate (“region”) level within that page
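
    The download / parse / follow-the-link steps above could be sketched roughly like this in Python. This is only an illustration, not the Perl framework mentioned above: the names LinkCollector and extract_links, the sample page, and the example.invalid URLs are all made up here, and the sketch parses a string so that it runs without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects (absolute URL, link text) pairs from <a href=...> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the <a> element currently open, if any
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html, base_url):
    """Parse one page and return the links found on it."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical root page; a real run would fetch it first, e.g. with
# urllib.request.urlopen(url).read().decode(...).
ROOT_HTML = """
<html><body>
  <a href="regions/north.html">North</a>
  <a href="regions/south.html">South</a>
</body></html>
"""

if __name__ == "__main__":
    for url, label in extract_links(ROOT_HTML, "http://example.invalid/index.html"):
        print(label, url)
```

    Each collected URL would then be downloaded in turn; since the navigation goes only one level deep, no recursion is needed.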
    (This article is getting extended and updated these days in early November 2011.)
  • web-sites created with DocBook Website and the trolls all over

    A few trolls hang around in mailing lists, trying to bash me for the web-sites I created with DocBook Website.

    I guess they are in the middle of their adolescence, but they let everybody know that the HTML is shitty.

    WTF do they care?
    • It looks impressive to ordinary people and potential customers.
    • It’s trivially created for somebody with the right know-how and easily maintained, w/o a fat CMS underneath,
    • and a pimpled PHP programmer alongside.

    Take this rule seriously:

      Don’t let the cheap little creeps get on your nerves!!!

        But still: they always attempt to steal your time and energy with their lousy behavior.

        Update / 2010-08-02:
        This discussion is quite similar to this one: high-order languages vs. assembly languages.
        The guys who are into assembly languages often argue that hand-written machine code is so much nicer than the code generated by compilers from high-order language code.
        Who TF asks that question, and who wants to know? Not me. So R.I.P.!

      • weird error messages during the DocBook Website compilation batch…

        Sometimes you really didn’t change anything crucial, and you find the new error messages just too obscure. Then do this:
        $ make clean
        $ make realclean
        And most of the time the error messages are gone. At least that worked for me.

      • using SVG for graphics within HTML generated from DocBook Website

        I learned the hard way that SVG graphics must be referenced via EMBED, not via IMG. I do that now.
        But still…
        I created a “generic logo”. It’s white on a transparent background, the real background determined by the context. That’s the idea. But I found this on the web – now I am confused:

        How do I set the background color of an SVG image?

        Sadly, SVG does not support directly specifying an image background color. With aiSee, however, you can easily work around this drawback by artificially enlarging the layout plane as follows: Open the SVG file with a text editor and manually adjust the four values of the viewBox attribute. This attribute is to be found in the third line of the SVG file.
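
        A different workaround that is commonly used (it is not the aiSee viewBox approach quoted above) is to put a rectangle covering the whole canvas in front of all other elements. Here is a minimal sketch in Python, assuming a well-formed SVG whose viewBox starts at 0 0; the function name add_background and the sample logo are made up for this example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def add_background(svg_text, color):
    """Return the SVG with a full-size <rect> inserted as the first child."""
    # Keep the SVG namespace as the default namespace on output.
    ET.register_namespace("", SVG_NS)
    root = ET.fromstring(svg_text)
    rect = ET.Element(
        "{%s}rect" % SVG_NS,
        {"width": "100%", "height": "100%", "fill": color},
    )
    # Elements are painted in document order, so the first child ends up
    # behind everything else.
    root.insert(0, rect)
    return ET.tostring(root, encoding="unicode")


# Hypothetical white-on-transparent logo.
SAMPLE_LOGO = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10">'
    '<circle cx="5" cy="5" r="3" fill="white"/>'
    "</svg>"
)

if __name__ == "__main__":
    print(add_background(SAMPLE_LOGO, "navy"))
```

        For a generic logo it may still be preferable to leave the file transparent and let the surrounding page decide; the rectangle only helps where a fixed color is wanted.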

        The idea is to share this logo with the DocBook community. So far all new DocBook Websites are branded NDW, as in Norman Walsh, because he started that software. I asked him for the sources of the logos a couple of days ago, but he couldn’t find them, and they were GIMP XCF anyway, and not scalable like SVG. SVG is the hit IMO. I thought I should mention this: I am using O’Reilly’s SVG Essentials, which is IMO a great book, and you can read and print it for free on their O’Reilly Commons wiki.

         

      • this was a rather good day DocBookWebsite’ing

        Migrated all relevant web-sites to DocBook Website. Simple HTML from DocBook documents looks quite a bit different and not so appealing. I can only recommend using DocBook Website. That’s another good reason why I want to spread the good word of DocBook in Berlin. Have a look at the block “my most exciting web-sites” in the right column here!
        Update / 2010-07-15:
        If goals are easy to achieve, you don’t delay them for very long, you just do them. With DocBook Website things are easy to achieve, and I keep simplifying, improving, and renovating my web-site(s). I guess the need to change will saturate sooner rather than later.
        Hayek.name is now shorter and nicer than it has ever been before.
        BTW: I have now removed xmllint’ing my documents from my Makefile, as it kept spitting out weird messages. Alright, I agree, that sounds rather silly, but … I use nxml-mode, and that constantly validates my documents, so I assume I am on the safe side.

        The next step:
        Continuous Integration. That means each web-site gets recompiled as soon as its sources are modified. Pretty cool stuff, seriously!
        For now I am using the Unix batch command. Pretty neat as well, of course, but not as neat as Continuous Integration, that’s for sure.
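
        The “recompile as soon as the sources are modified” idea boils down to comparing modification times, which is what make does internally. Here is a minimal sketch of that check in Python; the function name needs_rebuild and the file names in the comment are made up for illustration, and a real setup would more likely just run make from a scheduler or a commit hook.

```python
import os


def needs_rebuild(sources, target):
    """Return True if target is missing or older than any of the sources."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # A single newer source file is enough to trigger a rebuild.
    return any(os.path.getmtime(src) > target_mtime for src in sources)


# Hypothetical usage:
#   if needs_rebuild(["index.xml", "layout.xml"], "index.html"):
#       subprocess.run(["make"], check=True)
```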

      • DocBook Website

        All the relevant pointers in one place:

        • Norman Walsh’s example on SourceForge – enjoy it!
          (IIRC he says, you shouldn’t regard the information contained in there as up-to-date)
        • the example within the Website release on SourceForge is slightly more extensive
        • the release notes for the current release on SourceForge
        • Bob Stayton’s book DocBook XSL: The Complete Guide, chapter 31. Website
          (I do own the dead tree version of it, and if I got a bundle price for the PDF, I would go for it)
        • searching Bob Stayton’s book: “site:www.sagehill.net sitemap”
        • Dave Pawson’s article How to use the DocBook Website system
        • searching Dave Pawson’s article: “site:www.dpawson.co.uk website sitemap”

        DocBook, The Definitive Guide (the book) (I honestly do own various versions of it):

        • http://docbook.org/tdg/ – the book’s website
        • http://docbook.org/tdg5/ – the new book’s website (DocBook 5, The Definitive Guide)
        • http://docbook.org/tdg5/en/html/docbook.html – the new book as HTML
        • http://docbook.org/tdg51/en/html/docbook.html – the book in progress as HTML
        • http://docbook.org/tdg51/en/html/variants.html#s.variants – Website only gets mentioned there
        General pointers:
        • http://wiki.docbook.org

        Update / 2010-07-24:

          I found the mailing list docbook-apps hosted on oasis-open.org very, very valuable.
          I read the mailing list archive via my newsreader at news.gmane.org.
          Not that you want to know that, but my newsreader is Gnus.
        • DocBook Berlin – created the Google Group for the 1st regional DocBook User Group

          I am very excited about this.

          There is an exploding number of web views on that Google Group.
          Keep your fingers crossed, that there will be frequent activities soon.

          DocBook Website is going to revolutionise the activities necessary to set up static and almost static web-sites. It will not stay the gold mine for a few – it will be affordable for the many.

          DocBook Slides is the new way to create slides, forget everything before!

          Tell everybody about this user group in Berlin! Join “us” for learning and helping each other!

        • editing XML documents in emacs using nxml-mode

          One good reason for not authoring in XML is not having a suitable editor or IDE. I personally use and recommend emacs and James Clark’s nxml-mode. I create and modify all sorts of XML documents this way. If you supply nxml-mode with the right schema for your document, nxml-mode can even help you with tag completion and document validation. nxml-mode makes use of schemas in RELAX-NG, co-created by James Clark. RELAX-NG schemas are rather easily created if they are not already available, as they are for DocBook, DocBook Website, DocBook Slides, and many, many other XMLs.

          “Trang” is your tool for creating a RELAX-NG schema:

          • if you want to convert a DTD into a RELAX-NG schema, or
          • if you want to derive a RELAX-NG schema from a couple of XML files of a specific kind.

          I have done that a dozen times; it does work.

          Here you find nxml-mode’s manual page.

          Your mileage may vary …
        • creating my 1st DocBook Website web-sites

          I have the DocBook XSL book in front of me (opened at “Chapter 31. Website”), asking myself and the world (irc://irc.freenode.net#docbook) silly questions, like the ones you can find in my recent articles on this blog.

          I want to change a couple of pretty raw vanilla DocBook web sites to pretty raw but neat vanilla DocBook Website web sites during the next couple of hours. There is other work to complete pretty soon, so I would rather complete this thing now.

          I am using docbook-website-2.6.0/example from SourceForge (look around here!! ((FIXME))) (of course as example-JH-0, so I can always diff against the original).
          Their Makefile-example.txt is now my Makefile, I just had to adapt DOCBOOK_WEBSITE and XSLT:

          • DOCBOOK_WEBSITE=/usr/local/docbook-website-2.6.0
          • XSLT=xsltproc


          Try this:

          $ make clean
          $ make realclean
          $ make depends
          $ make

          Update / 2010-07-14:
          I have made pretty good progress during the last couple of days.
          I converted a couple of plain DocBook web-sites (HTML!!) to DocBook Website, and I rather sense some satisfaction there. You can find those web-sites right here in the right column, listed as my most exciting web-sites. Sorry for the bragging, but the Website guys did a rather good job, so even I cannot spoil that much.
          Right, and a web-designer mate of mine will show up on Friday, and we are going to discuss the replacement logos for all the NDW-logos around there. And a very big “thank you!!!” here to Norman for all his good work!!!

        • DocBook Website – where to get the Relax-NG schema from?

          I found it at SourceForge.
          I really love editing XML in Emacs’s nxml-mode. I did mention that at my DocBook Wiki home page already.