Category: XML
-
rnc-mode: Emacs mode to edit Relax-NG Compact files
- https://elpa.gnu.org/packages/rnc-mode.html
- https://elpa.gnu.org – GNU Emacs Lisp Package Archive
-
XmlStarlet User’s Guide and IBM link regarding PYX
- http://xmlstar.sourceforge.net/doc/UG/xmlstarlet-ug.html
- http://www-106.ibm.com/developerworks/xml/library/x-matters17.html – no longer works as of 2017-10-02 (and probably stopped working quite a bit earlier)
- http://www.ibm.com/developerworks/xml/library/x-matters17.html – no longer works as of 2021-08-31 (and probably stopped working quite a bit earlier)
- https://www.ibm.com/developerworks/xml/library/x-matters17/x-matters17-pdf.pdf – this is the actual and relevant document (broken link as of 2021-08-31)
- https://duckduckgo.com/?q=xml+pyx
-
the Cygwin packages that I need most urgently on a Windows PC
my package list:
- https://cygwin.com/packages/x86_64/wget
- https://cygwin.com/packages/x86_64/unzip
- https://cygwin.com/packages/x86_64/git
- https://cygwin.com/packages/x86_64/curl
I can install the following through apt-cyg (see below), and probably git and the others from the list above as well:
- https://cygwin.com/packages/x86_64/diffutils
- https://cygwin.com/packages/x86_64/perl
- https://cygwin.com/packages/x86_64/perl-debuginfo – solves the “Tie::Hash::NamedCapture” problem
- https://cygwin.com/packages/x86_64/python2
- https://cygwin.com/packages/x86_64/python3
- https://cygwin.com/packages/x86_64/ruby
- https://cygwin.com/packages/x86_64/openssh
- https://cygwin.com/packages/x86_64/rsync
- https://cygwin.com/packages/x86_64/xinit – X.Org X server launcher (Cygwin/X)
- https://cygwin.com/packages/x86_64/xmlstarlet – XMLStarlet is a command line XML toolkit which can be used to transform, query, validate, and edit XML documents using a simple set of shell commands
- https://cygwin.com/packages/x86_64/poppler – includes pdftohtml (my pdf2xml converter)
- https://cygwin.com/packages/x86_64/procps-ng – includes watch
- https://cygwin.com/packages/x86_64/jq/ – a lightweight and flexible command-line JSON processor
- https://cygwin.com/packages/x86_64/psmisc – provides pstree
- https://cygwin.com/packages/x86_64/getent
- https://cygwin.com/packages/x86_64/qterminal
- https://cygwin.com/packages/x86_64/xkill
- https://cygwin.com/packages/x86_64/vim
And certainly I do need apt-cyg 😎 — it has its own article here:
- http://Jochen.Hayek.name/wp/blog-en/2018/07/04/apt-cyg/
- http://Jochen.Hayek.name/wp/blog-en/category/cygwin
apt-cyg wants to see itself on PATH.
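A minimal sketch of installing a batch of the packages listed above, assuming apt-cyg is already on PATH and accepts several package names after its `install` subcommand. The function name `install_packages` and the package selection are only illustrative.

```shell
# Excerpt of the package list above; extend as needed.
packages="diffutils perl python3 ruby openssh rsync xmlstarlet jq"

install_packages() {
  # apt-cyg must be reachable via PATH.
  if ! command -v apt-cyg >/dev/null 2>&1; then
    echo "apt-cyg not found on PATH" >&2
    return 1
  fi
  # Word splitting of $packages is intended here.
  apt-cyg install $packages
}
```

Call `install_packages` from a Cygwin shell once apt-cyg is in place.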
-
XML Shell: xmlsh
- http://www.xmlsh.org
- https://github.com/xmlsh/xmlsh1_3
- they also provide a Java .jar file and a shell script etc around it, so you can run it on your platform
- http://www.xmlsh.org/CommandCsv2xml – builtin csv2xml, funny!
- http://www.xmlsh.org/CommandRngconvert – that’s trang, dealing with “Relax NG”
examples for csv2xml:
$ csv2xml -header my_file.csv
$ csv2xml -header -attr my_file.csv
# for CSV files created by Excel in Germany:
$ csv2xml -delim ';' -header -attr my_file.csv
With -header, the column names from the header line are used as the tag names around the individual “cells” in the XML output.
If you prefer to use attributes for the column names, add -attr!
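To illustrate what -header and -attr change, here is a rough awk-based imitation of the idea. It is not xmlsh's implementation: the function name `csv_to_xml` and the element names `root` and `row` are assumptions, it ignores CSV quoting, and it assumes that the column names are valid XML names.

```shell
# csv_to_xml DELIM MODE < file.csv
# MODE "elem": column names become child elements of each row.
# MODE "attr": column names become attributes of each row.
csv_to_xml() {
  awk -F"$1" -v mode="$2" '
    # Escape the characters that are special in XML text and attributes.
    function esc(s) {
      gsub(/&/, "\\&amp;", s); gsub(/</, "\\&lt;", s)
      gsub(/>/, "\\&gt;", s); gsub(/"/, "\\&quot;", s)
      return s
    }
    # First line: remember the column names.
    NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; print "<root>"; next }
    {
      if (mode == "attr") {
        line = "  <row"
        for (i = 1; i <= NF; i++) line = line " " name[i] "=\"" esc($i) "\""
        print line "/>"
      } else {
        print "  <row>"
        for (i = 1; i <= NF; i++) print "    <" name[i] ">" esc($i) "</" name[i] ">"
        print "  </row>"
      }
    }
    END { if (NR > 0) print "</root>" }
  '
}
```

For a semicolon-separated file, `csv_to_xml ';' attr < my_file.csv` yields one `row` element per line with one attribute per column.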
-
your ODF file (“.odt”, “.ods”, …) and its “modified” timestamp
- https://en.wikibooks.org/wiki/XML_-_Managing_Data_Exchange/OpenOffice.org_%26_OpenDocument_Format
- https://github.com/JochenHayek/misc/blob/master/using_timestamps_in_filenames/create_snapshot_from_ODF.sh
- https://en.wikipedia.org/wiki/XMLStarlet
Your “.odt” (or “.ods”) file is a ZIP file with a meta.xml inside:
$ unzip -l YOUR.ods
…
… meta.xml
…
This is a convenient way to extract meta.xml to STDOUT:
$ unzip -p YOUR.ods meta.xml
…
This is how to get the XML reformatted using xmlstarlet:
$ unzip -p YOUR.ods meta.xml | xml fo
This command line shows you the possible XPath expressions:
$ unzip -p YOUR.ods meta.xml | xml el
…
office:document-meta/office:meta/dc:date
…
How to extract “modified” to STDOUT?
$ unzip -p YOUR.ods meta.xml | xml sel --template --value-of office:document-meta/office:meta/dc:date
And how to extract the timestamp with nothing but decimal digits, dropping the fractional seconds?
$ unzip -p YOUR.ods meta.xml | xml sel --template --value-of office:document-meta/office:meta/dc:date | tr -d ':TZ-' | perl -pe 's/^(.*)\..*$/$1/'
…
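The steps above can be wrapped into small shell functions, for example to create a timestamped copy of the file. This is only a sketch: the function names and the naming scheme `NAME.<timestamp>.EXT` are assumptions and not necessarily what the linked create_snapshot_from_ODF.sh does. It also assumes that xmlstarlet is installed as `xml` and that dc:date carries no time zone offset.

```shell
# Reduce an ISO 8601 timestamp on STDIN to its digits,
# dropping fractional seconds first.
timestamp_digits() {
  sed -e 's/\..*$//' -e 's/[^0-9]//g'
}

# Print the "modified" timestamp of an ODF file as digits only.
odf_modified() {
  unzip -p "$1" meta.xml |
    xml sel --template --value-of office:document-meta/office:meta/dc:date |
    timestamp_digits
}

# Copy e.g. YOUR.ods to YOUR.<timestamp>.ods (hypothetical naming scheme).
odf_snapshot() {
  stamp=$(odf_modified "$1")
  # Refuse to copy if no timestamp could be extracted.
  [ -n "$stamp" ] || return 1
  cp -p "$1" "${1%.*}.$stamp.${1##*.}"
}
```

Usage: `odf_snapshot YOUR.ods`.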
-
xmlstarlet: how to deal with a default namespace in XPath expressions?
I ran into this problem when I tried to extract values from JasperReports’ JRXML files using xmlstarlet. JRXML files declare a default namespace (which does not seem to serve much purpose), and XPath processors need to take that into account.
When I searched for help in xmlstarlet’s documentation and on the web, I came across this very helpful article:
Once your XML has a default namespace, you have to use that default namespace with every node (that has no explicit namespace part) in your XPath expressions.
As the article above explains, for “xml sel” each such node …
- … either needs to make use of “_” as the name of the default namespace:
# TBD: use a true JasperReports example!
$ xml sel -t -v '/_:publications/_:magazine' 2-023_node-sets.xml
- … or you assign an explicit name (let’s say “a”) to your default namespace on your xmlstarlet command line, and then you have to use that explicit prefix:
# TBD: use a true JasperReports example!
$ xml sel -N a='http://jasperreports.sourceforge.net/jasperreports' -t -v '/a:publications/a:magazine' 2-023_node-sets.xml
CAVEAT: The XPath expressions created by “xml el” do not deal with a default namespace, so you have to adapt them yourself in one of the ways explained above.
A one-liner (e.g. in Perl) can help:
$ xml el FILE.xml | fgrep /subreportExpression | perl -n -a -F'/\//' -e 'print "_:",join("/_:",@F)' -
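The same rewriting can be sketched with awk. Unlike the Perl one-liner, this variant leaves steps that already carry a namespace prefix untouched. The function name `prefix_default_ns` is made up for this example.

```shell
# Prefix every step of each path read from STDIN with "_:",
# e.g. a/b/c becomes _:a/_:b/_:c.
# Steps that already contain a ":" (an explicit prefix) are left alone.
prefix_default_ns() {
  awk -F/ '{
    out = ""
    for (i = 1; i <= NF; i++) {
      step = ($i ~ /:/) ? $i : "_:" $i
      out = out (i > 1 ? "/" : "") step
    }
    print out
  }'
}
```

Usage: `xml el FILE.xml | fgrep /subreportExpression | prefix_default_ns`.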
-
your OOXML file (“.docx”, “.xlsx”, “.vsdx”, …) and its “modified” timestamp
- https://en.wikipedia.org/wiki/Office_Open_XML_file_formats
- https://github.com/JochenHayek/misc/blob/master/using_timestamps_in_filenames/create_snapshot_from_OOXML.sh
- https://en.wikipedia.org/wiki/XMLStarlet
VSDX is not listed as an OOXML-conformant file format, but for our purpose here we can treat it like one.
Your “.docx” (or “.xlsx”) file is a ZIP file with a docProps/core.xml inside:
$ unzip -l YOUR.docx
…
… docProps/core.xml
…
This is a convenient way to extract docProps/core.xml to STDOUT:
$ unzip -p YOUR.docx docProps/core.xml
…
This is how to get the XML reformatted using xmlstarlet:
$ unzip -p YOUR.docx docProps/core.xml | xml fo
This command line shows you the possible XPath expressions:
$ unzip -p YOUR.docx docProps/core.xml | xml el
…
cp:coreProperties/dcterms:modified
…
How to extract “modified” to STDOUT?
$ unzip -p YOUR.docx docProps/core.xml | xml sel --template --value-of cp:coreProperties/dcterms:modified
And how to extract the timestamp with nothing but decimal digits?
$ unzip -p YOUR.docx docProps/core.xml | xml sel --template --value-of cp:coreProperties/dcterms:modified | tr -d ':TZ-'
…
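Since the ODF and OOXML cases differ only in the ZIP member and the XPath expression, both can be handled by one small dispatcher. This is a sketch under assumptions: the function names are made up, only the extensions mentioned in this article are covered, and xmlstarlet is assumed to be installed as `xml`.

```shell
# Print "ZIP-member XPath" for a given file name, based on its extension.
modified_location() {
  case "$1" in
    *.odt|*.ods)
      echo "meta.xml office:document-meta/office:meta/dc:date" ;;
    *.docx|*.xlsx|*.vsdx)
      echo "docProps/core.xml cp:coreProperties/dcterms:modified" ;;
    *)
      return 1 ;;
  esac
}

# Print the "modified" timestamp of FILE, reduced to decimal digits.
modified_digits() {
  loc=$(modified_location "$1") || {
    echo "unsupported file type: $1" >&2
    return 1
  }
  member=${loc%% *}   # part before the first space
  xpath=${loc#* }     # part after the first space
  unzip -p "$1" "$member" |
    xml sel --template --value-of "$xpath" |
    sed -e 's/\..*$//' -e 's/[^0-9]//g'   # drop fractional seconds, keep digits
}
```

Usage: `modified_digits YOUR.docx` or `modified_digits YOUR.ods`.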