- https://en.wikipedia.org/wiki/Office_Open_XML_file_formats
- https://github.com/JochenHayek/misc/blob/master/using_timestamps_in_filenames/create_snapshot_from_OOXML.sh
- https://en.wikipedia.org/wiki/XMLStarlet
VSDX is not listed as an OOXML-conformant file format, but for the purpose here we can treat it like one.
Your “.docx” (or “.xlsx”) file is a ZIP file with a docProps/core.xml inside:
$ unzip -l YOUR.docx
…
… docProps/core.xml
…
This is a convenient way to extract docProps/core.xml to STDOUT:
$ unzip -p YOUR.docx docProps/core.xml
…
This is how to get the XML reformatted using xmlstarlet:
$ unzip -p YOUR.docx docProps/core.xml | xml fo
This command line lists the element paths, which you can use as XPath expressions:
$ unzip -p YOUR.docx docProps/core.xml | xml el
…
cp:coreProperties/dcterms:modified
…
How to extract “modified” to STDOUT?
$ unzip -p YOUR.docx docProps/core.xml | xml sel --template --value-of cp:coreProperties/dcterms:modified
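The command above relies on xmlstarlet picking up the cp: and dcterms: prefixes from the root element of core.xml. If your xmlstarlet does not do that, the prefixes can be bound explicitly with -N. What follows is only a sketch: the names core_modified and XML_CMD are made up for this example, and XML_CMD exists because some systems install the binary as xmlstarlet rather than xml.

```shell
#!/bin/sh
# Hypothetical variable: name of the xmlstarlet binary ("xml" or "xmlstarlet").
XML_CMD=${XML_CMD:-xml}

# Hypothetical helper: read docProps/core.xml on STDIN and print the value of
# dcterms:modified, with both namespace prefixes bound explicitly via -N.
core_modified() {
  "$XML_CMD" sel \
    -N cp=http://schemas.openxmlformats.org/package/2006/metadata/core-properties \
    -N dcterms=http://purl.org/dc/terms/ \
    --template --value-of /cp:coreProperties/dcterms:modified --nl
}

# Usage:
#   unzip -p YOUR.docx docProps/core.xml | core_modified
```

The two namespace URIs are the ones that core.xml itself declares on its root element; you can check them in the output of "xml fo".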
And how to reduce the timestamp to nothing but decimal digits?
$ unzip -p YOUR.docx docProps/core.xml | xml sel --template --value-of cp:coreProperties/dcterms:modified | tr -d ':TZ-'
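For repeated use, the steps can be wrapped in two small shell functions. This is a minimal sketch, not a finished tool: the function names timestamp_digits and ooxml_modified_digits are invented here, the sample timestamp is made up for illustration, and the xmlstarlet binary is assumed to be callable as xml, as in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: strip the separators (":", "T", "Z", "-") from a
# timestamp such as 2015-06-05T18:19:34Z, leaving only the digits.
timestamp_digits() {
  printf '%s\n' "$1" | tr -d ':TZ-'
}

# Hypothetical helper: print the "modified" timestamp of the OOXML file
# given as $1, reduced to digits. Assumes unzip and xmlstarlet ("xml").
ooxml_modified_digits() {
  ts=$(unzip -p "$1" docProps/core.xml |
       xml sel --template --value-of cp:coreProperties/dcterms:modified) || return 1
  # An empty result means the element was not found.
  [ -n "$ts" ] || return 1
  timestamp_digits "$ts"
}

# Illustrative call with a made-up timestamp:
timestamp_digits '2015-06-05T18:19:34Z'   # prints 20150605181934
```

With the functions defined, "ooxml_modified_digits YOUR.docx" prints the digits-only timestamp, which is convenient as part of a snapshot file name. Note that tr only removes the four listed characters, so a timestamp with fractional seconds would keep its ".".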
…