https://wp.jochen.hayek.name/blog-en/2021/02/02/pdftohtml-xml-2/
"pdftohtml -xml" – only the poppler suite supports that
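As a rough illustration, here is a minimal Python sketch that calls `pdftohtml -xml` and reads the resulting pdf2xml document. It assumes that the `pdftohtml` first on `PATH` is the poppler-utils build, since per the title other builds reject `-xml`. It also assumes that poppler writes its output to `<basename>.xml`. The helper names `build_command`, `pdf_to_xml` and `extract_text_lines` are hypothetical and exist only for this example.

```python
from __future__ import annotations

import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path


def build_command(pdf: Path, out_base: Path) -> list[str]:
    # -xml: emit pdf2xml output instead of HTML; -i: skip image extraction
    return ["pdftohtml", "-xml", "-i", str(pdf), str(out_base)]


def pdf_to_xml(pdf: Path, out_base: Path) -> bytes:
    """Run pdftohtml (assumed to be the poppler-utils build) and return the XML bytes."""
    subprocess.run(build_command(pdf, out_base), check=True)
    # Assumption: pdftohtml appends ".xml" to the output basename
    xml_path = out_base.parent / (out_base.name + ".xml")
    return xml_path.read_bytes()


def extract_text_lines(xml_source: str | bytes) -> list[tuple[int, float, float, str]]:
    """Return (page number, top, left, text) for every <text> element."""
    root = ET.fromstring(xml_source)
    lines = []
    for page in root.iter("page"):
        page_no = int(page.get("number"))
        for el in page.iter("text"):
            # <text> may wrap inline <b>, <i> or <a>; itertext() flattens them
            text = "".join(el.itertext())
            lines.append((page_no, float(el.get("top")), float(el.get("left")), text))
    return lines
```

A call might look like `extract_text_lines(pdf_to_xml(Path("paper.pdf"), Path("paper")))`. The parsing step is kept separate from the subprocess call so it can be checked against a saved `.xml` file without running `pdftohtml`.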