“pdftohtml” vs. DRM


A project of mine involves extracting strings and other details from PDF files using “pdftohtml -xml”.

A plain “pdftohtml -xml” refuses to read PDF files that have their copy-protection bits set. If you add “-nodrm” on the command line, it reads them anyway, though it still mentions the problem on STDERR.
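For scripted use, here is a minimal Python sketch of the same call. The helper names (build_pdftohtml_cmd, run_pdftohtml) and the file names are made up for illustration; only the “-xml” and “-nodrm” options come from the note above. It assumes pdftohtml is on the PATH and makes no claim about the exact wording of the warning or the exit status.

```python
import subprocess


def build_pdftohtml_cmd(pdf_path, out_base, ignore_drm=False):
    """Return the argument list for a `pdftohtml -xml` run.

    Hypothetical helper: with ignore_drm=True, "-nodrm" is added so that
    files with copy-protection bits set are read anyway.
    """
    cmd = ["pdftohtml", "-xml"]
    if ignore_drm:
        cmd.append("-nodrm")
    cmd += [pdf_path, out_base]
    return cmd


def run_pdftohtml(pdf_path, out_base, ignore_drm=False):
    """Run pdftohtml and return (exit status, captured STDERR text).

    STDERR is captured rather than discarded, because that is where
    pdftohtml mentions the copy-protection problem.
    """
    result = subprocess.run(
        build_pdftohtml_cmd(pdf_path, out_base, ignore_drm),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr
```

Calling run_pdftohtml("protected.pdf", "protected", ignore_drm=True) would then hand back whatever pdftohtml wrote to STDERR, so the script can log it or skip over it as needed.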
