Category: PDF

  • pdfguru – subscription trap – I got my money back

    I used pdfguru a couple of times for getting “image PDFs” OCR-ed. Nice service actually. Cost me € 0.99 per week. Sort of really, really cheap.

    But then they charged me € 49.99, and I found out that they believed I had opted for an ongoing monthly subscription. Searching the web, I learned that this is called a subscription trap, in German: “Abofalle”.

    That’s mean!

    I contacted my bank. OMG, what a painful process altogether!

    I also contacted pdfguru and asked them to reimburse me. They replied that this was impossible: once they have charged, they cannot reimburse.

    My bank accepted the “plea” and started the process. Several days later, pdfguru sent me a mail saying I had been reimbursed. A few days later the transfer showed up in my cash account. I am happy.

    BTW: If you are looking for a similar (but …) service, look for “pdf24” and then for OCR. I keep using their web page for that service.

  • PDF OCR

  • “pdftohtml -xml” – only the poppler suite supports “-xml”

    One of my favourite tools.

    I have been using it for years now – on a daily basis. (I came across it in my local Ruby user group many years ago.)

    Of course it only works on PDF with text.

    Luckily there are tools and services that “OCR” your “image PDF”, in case your PDF file does not actually contain the text it displays.

    I am editing the XML result in Emacs with nXML mode, and I developed a RELAX-NG grammar for context sensitive editing of such XML files.
    I am annotating these XML files using specific XML comments.
    For PDF files from several providers I created scripts for automated annotation. (Best case: find lvalue and rvalue together. Most of the time I find at least lvalue.)
    I created scripts to extract the details from those annotations. They create text that resembles my (personal, home-made and home-maintained) bank statements, so I can “reconcile” them.

    I am processing every bill PDF like that.

    I am processing every contract PDF like that. I guess you understand how much better it is to read and annotate a text file in place instead of keeping notes outside the source. Yes, that’s of course like inline documentation within programming language source files.

    Just in case anybody reads this and finds it useful: Of course I am able and most willing to provide far more details.
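
    For illustration only (this is not the script set described above), here is a minimal Python sketch of reading such XML. It assumes the layout that poppler's pdftohtml -xml emits, i.e. page elements containing text elements with top and left attributes. The sample data, the function names and the "a label ends with a colon" rule are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample in the shape of "pdftohtml -xml" output
# (assumption: <page> elements holding <text top=... left=...> fragments).
SAMPLE = """<pdf2xml producer="poppler">
<page number="1" position="absolute" top="0" left="0" height="1263" width="892">
<text top="100" left="60" width="80" height="14" font="0">Invoice no.:</text>
<text top="100" left="300" width="60" height="14" font="0"><b>4711</b></text>
<text top="130" left="60" width="80" height="14" font="0">Total:</text>
<text top="130" left="300" width="60" height="14" font="0">49.99</text>
</page>
</pdf2xml>
"""


def text_rows(xml_string):
    """Group text fragments by (page, top) and order them left to right."""
    root = ET.fromstring(xml_string)
    rows = {}
    for page in root.iter("page"):
        for text in page.iter("text"):
            key = (int(page.get("number")), int(text.get("top")))
            # itertext() also collects text nested in <b> or <i> elements
            content = "".join(text.itertext()).strip()
            rows.setdefault(key, []).append((int(text.get("left")), content))
    return [[c for _, c in sorted(frags)] for _, frags in sorted(rows.items())]


def label_value_pairs(rows):
    """Hypothetical rule: a fragment ending in ':' labels the fragment after it."""
    pairs = {}
    for row in rows:
        for label, value in zip(row, row[1:]):
            if label.endswith(":"):
                pairs[label.rstrip(":")] = value
    return pairs


if __name__ == "__main__":
    for label, value in label_value_pairs(text_rows(SAMPLE)).items():
        print(f"{label} = {value}")
```

    Grouping by the exact top value is the simplest possible choice; fragments on the same visual line of a real document may differ by a pixel or two, so a tolerance would probably be needed in practice.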

  • error when viewing PDFs in Emacs, or rather after deleting the associated buffers

    the annoying, recurring error message:

    Error running timer `doc-view-display': (error "Selecting deleted buffer")

    what to do:

    M-x cancel-timer
    Cancel timers of function: doc-view-display

    Update 2019-11-05 (GNU Emacs 26.3 on Mojave):

    M-x cancel-function-timers
    Cancel timers of function: doc-view-display

  • book: PDF Explained

  • how to print a PDF document together with its filename in its upper left corner?

    This is only a very special case of “how to add text given on the command line to a PDF file?”

    If I don’t find a nice, ready-made utility to do this,

    • I will create an image file from the PDF file,
    • I will create another image file from that text string (ImageMagick, GraphicsMagick?!?),
    • and I will finally overlay them (ImageMagick, GraphicsMagick?!?).

    PDF Hacks’ Hack #90 describes how to “Superimpose PDF Pages”; there is also a one-liner with pdftk for that:
    $ pdftk mydoc.pdf output mydoc.marked.pdf background watermark.pdf
     
    I need to print about 200 one-page files that look very much alike, and I don’t want to guess the filename from the contents of the page, so I prefer to print the filename on the same page as the actual contents.

     

    Update 2011-07-04:
    I am using text2pdf (a rather, rather simple tool) for this task (regard this as one logical line):

    $ echo "text to be printed into the upper left corner" | /usr/local/text2pdf/text2pdf -A4 -s10 -v12 | pdftk original_file output file_with_sth_in_its_upper_left_corner background -
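
    For a whole batch of files, the same pipeline could be driven from a small script. The following is only a sketch under assumptions: it reuses the text2pdf path and options from the command line above, uses each file's own name as the text, and the function names are made up for illustration.

```python
import subprocess
from pathlib import Path

TEXT2PDF = "/usr/local/text2pdf/text2pdf"  # path as in the command line above


def stamp_commands(pdf_path, out_dir):
    """Return the text2pdf and pdftk argument lists for one input file."""
    pdf_path = Path(pdf_path)
    out_path = Path(out_dir) / pdf_path.name
    text2pdf_cmd = [TEXT2PDF, "-A4", "-s10", "-v12"]
    # "-" makes pdftk read the background PDF from standard input
    pdftk_cmd = ["pdftk", str(pdf_path), "output", str(out_path), "background", "-"]
    return text2pdf_cmd, pdftk_cmd


def stamp(pdf_path, out_dir):
    """Render the file name with text2pdf and apply it as the page background."""
    text2pdf_cmd, pdftk_cmd = stamp_commands(pdf_path, out_dir)
    label = subprocess.run(
        text2pdf_cmd,
        input=(Path(pdf_path).name + "\n").encode(),
        stdout=subprocess.PIPE,
        check=True,
    )
    subprocess.run(pdftk_cmd, input=label.stdout, check=True)


def stamp_all(pdf_paths, out_dir):
    """Stamp every given file; out_dir must exist and differ from the input folder."""
    for pdf_path in pdf_paths:
        stamp(pdf_path, out_dir)
```

    Calling stamp_all(sorted(Path(".").glob("*.pdf")), "stamped") would then process every PDF in the current directory, provided text2pdf and pdftk are installed and the directory "stamped" already exists.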

  • pdftk – The PDF Toolkit

    Superimpose Pages with pdftk

    pdftk packs iText’s power into a standalone program. Apply a single PDF page to the background of an entire document like so:

    pdftk mydoc.pdf output mydoc.marked.pdf background watermark.pdf

    pdftk will use the first page of watermark.pdf, if it has more than one page. You can combine this background option with additional input operations (such as assembling PDFs [Hack #51]) and other output options (such as encryption [Hack #52]).

  • xpdf: Error (…): Missing ‘endstream’

    There are a few PDF documents around here that I can read with Acrobat Reader w/o problems, but xpdf and its companions complain. I guess that’s because they were modified, and slightly damaged in the process, using Acrobat X Pro.

    I used pdftk to get rid of that problem: first output/uncompress, then output/compress again:

    $ pdftk x.pdf output x.uncompressed.pdf uncompress
    $ pdftk x.uncompressed.pdf output x.recompressed.pdf compress
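
    If several files are affected, the two steps can be wrapped in a few lines of Python. This is a sketch only: the function names are hypothetical, and the file naming simply follows the x.uncompressed.pdf / x.recompressed.pdf pattern used above.

```python
import subprocess
from pathlib import Path


def repair_commands(pdf_path):
    """Build the two pdftk calls: uncompress first, then compress again."""
    src = Path(pdf_path)
    plain = src.with_suffix(".uncompressed.pdf")
    fixed = src.with_suffix(".recompressed.pdf")
    return [
        ["pdftk", str(src), "output", str(plain), "uncompress"],
        ["pdftk", str(plain), "output", str(fixed), "compress"],
    ]


def repair(pdf_path):
    """Run both steps; check=True stops at the first failing pdftk call."""
    for cmd in repair_commands(pdf_path):
        subprocess.run(cmd, check=True)
```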

  • how to burst a PDF document into single pages (etc.)

    This command line shows how to get the output files named your way:

    $ pdftk … burst output 'page.%02d.pdf'
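
    The output pattern is a printf-style format string, and Python's % operator follows the same convention for %02d, so the resulting names can be previewed with a short sketch. The helper name is made up, and it assumes that page numbering starts at 1.

```python
def burst_names(pattern, page_count):
    """File names for pages 1..page_count, expanded printf-style."""
    return [pattern % page for page in range(1, page_count + 1)]


if __name__ == "__main__":
    for name in burst_names("page.%02d.pdf", 3):
        print(name)
```

    Note that %02d only pads to two digits: page 100 of a long document would become page.100.pdf, so a wider pattern such as %03d keeps the names sortable.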

    Split Select Pages from Multiple PDFs into a New Document:

    $ pdftk A=one.pdf B=two.pdf cat A1-7 B1-5 A8 output combined.pdf

    Select a single page (#130) into a new document:

    $ pdftk A=one.pdf cat A130 output one.p130.pdf

    Extract pages 10 through 11 to y.pdf:

    $ pdftk x.pdf cat 10-11 output y.pdf
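
    When the page selections are computed rather than typed, the cat command line can be assembled programmatically. A minimal sketch, with a hypothetical helper name, that only builds the argument list and leaves running it to the caller:

```python
import shlex


def cat_command(inputs, ranges, output):
    """Build a pdftk 'cat' call from handle -> file pairs and page ranges."""
    handles = [f"{handle}={path}" for handle, path in inputs.items()]
    return ["pdftk", *handles, "cat", *ranges, "output", output]


if __name__ == "__main__":
    cmd = cat_command(
        {"A": "one.pdf", "B": "two.pdf"},
        ["A1-7", "B1-5", "A8"],
        "combined.pdf",
    )
    # shlex.join quotes arguments where needed, so the line can be pasted into a shell
    print(shlex.join(cmd))
```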

    Please find more information (like examples, man page, …) on pdftk through the link above!