“pdftohtml” vs. DRM

johayek

15 years ago

A project of mine involves extracting strings and other details from PDF files using “pdftohtml -xml”.

A plain “pdftohtml -xml” refuses to read PDF files that have copy-protection bits set. If you add “-nodrm” to the command line, it reads them anyway, though it still reports the problem on STDERR.
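
For scripted use, the two invocations might be wrapped roughly as follows. This is only a minimal sketch: the helper names `build_command` and `convert` are hypothetical, it assumes `pdftohtml` is on the PATH, and it leaves output file naming to pdftohtml's defaults. The exact wording of the STDERR message and the exit status are not covered here, so the sketch just hands both back to the caller unchanged.

```python
import subprocess


def build_command(pdf_path, override_drm=False):
    """Assemble the pdftohtml argument list (hypothetical helper)."""
    cmd = ["pdftohtml", "-xml"]
    if override_drm:
        # -nodrm tells pdftohtml to read the file despite its copy-protection bits
        cmd.append("-nodrm")
    cmd.append(pdf_path)
    return cmd


def convert(pdf_path, override_drm=False):
    """Run pdftohtml and return (exit status, STDERR text) without interpreting them."""
    result = subprocess.run(
        build_command(pdf_path, override_drm),
        capture_output=True,  # collect STDOUT and STDERR instead of printing them
        text=True,            # decode the captured output to str
    )
    return result.returncode, result.stderr
```

Calling `convert("protected.pdf", override_drm=True)` with a placeholder file name would run “pdftohtml -xml -nodrm protected.pdf” and return whatever pdftohtml wrote to STDERR, so the caller can log it.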
