“pdftohtml” – the one PDF utility I cannot do without

To be precise, I mean “pdftohtml -xml”, which converts a PDF into XML. This is my command line:

$ pdftohtml -xml -i -nomerge -hidden FILE.pdf

or, naming the output file explicitly:

$ pdftohtml -xml -i -nomerge -hidden FILE.pdf FILE.pdftohtml.xml
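When I have more than one file to convert, the command line above can be wrapped in a small shell function. This is only a sketch: the function name pdf2xml is hypothetical, and it assumes poppler’s pdftohtml is on the PATH. It passes FILE.pdftohtml.xml as the output name, exactly like the second command. The flags, as described in poppler’s usage text: -xml produces output for XML post-processing, -i ignores images, -nomerge does not merge paragraphs, and -hidden outputs hidden text.

```shell
# Hypothetical helper (not part of poppler-utils): run the command line
# above on each PDF given as an argument, passing FILE.pdftohtml.xml as
# the output name for FILE.pdf.
pdf2xml() {
    for pdf in "$@"; do
        # Only lowercase ".pdf" names are handled; anything else is skipped.
        case "$pdf" in
            *.pdf) ;;
            *) echo "skipping (not a .pdf): $pdf" >&2; continue ;;
        esac
        out="${pdf%.pdf}.pdftohtml.xml"
        # -xml: XML output, -i: ignore images,
        # -nomerge: do not merge paragraphs, -hidden: output hidden text
        pdftohtml -xml -i -nomerge -hidden "$pdf" "$out" || return 1
    done
}
```

Usage, e.g. in a directory full of PDFs: `pdf2xml *.pdf`. The function stops at the first file pdftohtml fails on, so a broken PDF is noticed rather than silently skipped.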

Sometimes I need to run “pdftohtml -xml” on a file stored on my NAS, directly from the NAS command line; it really is an essential utility for me.

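CAVEAT: Make sure you have poppler-utils installed, not xpdf. xpdf’s pdftohtml is far outdated, and its version number is misleading: the two projects use different numbering schemes, so xpdf’s 4.00 is not newer than poppler’s 0.53.0: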

root@DiskStation:~# /opt/bin/opkg install xpdf
Installing xpdf (4.00-1) to root...
Downloading http://pkg.entware.net/binaries/x86-64/xpdf_4.00-1_x86-64.ipk
Configuring xpdf.
root@DiskStation:~# /opt/bin/opkg search /opt/bin/pdftohtml
xpdf - 4.00-1
root@DiskStation:~# /opt/bin/pdftohtml --help
pdftohtml version 4.00

root@DiskStation:~# /opt/bin/opkg remove xpdf
Removing package xpdf from root...
root@DiskStation:~# /opt/bin/opkg install poppler-utils
Installing poppler-utils (0.53.0-1) to root...
Downloading http://pkg.entware.net/binaries/x86-64/poppler-utils_0.53.0-1_x86-64.ipk
Configuring poppler-utils.
root@DiskStation:~# /opt/bin/opkg search /opt/bin/pdftohtml
poppler-utils - 0.53.0-1
root@DiskStation:~# /opt/bin/pdftohtml --help
pdftohtml version 0.53.0
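Since the version number alone does not tell the two programs apart, a quick way to check which pdftohtml is installed is to look for the -xml option in its help text. This is a sketch under assumptions: the function name has_xml_mode is hypothetical, the default path /opt/bin/pdftohtml is the Entware location from the session above, and it relies on poppler’s pdftohtml listing -xml in its usage text while xpdf’s (as far as I know) does not. The help text may go to stderr with a non-zero exit status, so stderr is merged into stdout and only the text is inspected.

```shell
# Hypothetical check: does the pdftohtml at the given path offer -xml?
# Returns 0 if "-xml" appears in its help text, 1 if not,
# and 2 if the binary does not exist or is not executable.
has_xml_mode() {
    bin="${1:-/opt/bin/pdftohtml}"
    [ -x "$bin" ] || return 2
    # Help may be printed to stderr with a non-zero exit status,
    # so merge the streams; the pipeline's status is grep's.
    "$bin" --help 2>&1 | grep -q -e '-xml'
}
```

Usage on the NAS: `has_xml_mode && echo "pdftohtml supports -xml" || echo "install poppler-utils"`.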
