wp.jochen.hayek.name/blog-en

Category: xmlstarlet

XmlStarlet User’s Guide and IBM link regarding PYX
- http://xmlstar.sourceforge.net/doc/UG/xmlstarlet-ug.html
- http://www-106.ibm.com/developerworks/xml/library/x-matters17.html – no longer works as of 2017-10-02 (probably for quite a while before that already)
- http://www.ibm.com/developerworks/xml/library/x-matters17.html – no longer works as of 2021-08-31 (probably for quite a while before that already)
- https://www.ibm.com/developerworks/xml/library/x-matters17/x-matters17-pdf.pdf – this is the actual and relevant document (broken link as of 2021-08-31)
- https://duckduckgo.com/?q=xml+pyx
2017-10-02
the Cygwin packages that I need most seriously on a Windows PC
- https://cygwin.com/cgi-bin2/package-grep.cgi
- https://cygwin.com/packages/package_list.html
my package list:
I can install these ones through apt-cyg (see below!) (maybe git, … from the list above as well):
- https://cygwin.com/packages/x86_64/diffutils
- https://cygwin.com/packages/x86_64/perl
- https://cygwin.com/packages/x86_64/perl-debuginfo – solves the “Tie::Hash::NamedCapture” problem
- https://cygwin.com/packages/x86_64/python2
- https://cygwin.com/packages/x86_64/python3
- https://cygwin.com/packages/x86_64/ruby
- https://cygwin.com/packages/x86_64/openssh
- https://cygwin.com/packages/x86_64/rsync
- https://cygwin.com/packages/x86_64/xinit – X.Org X server launcher (Cygwin/X)
- https://cygwin.com/packages/x86_64/xmlstarlet – XMLStarlet is a command line XML toolkit which can be used to transform, query, validate, and edit XML documents using a simple set of shell commands
- https://cygwin.com/packages/x86_64/poppler – includes pdftohtml (my pdf2xml converter)
- https://cygwin.com/packages/x86_64/procps-ng – includes watch
- https://cygwin.com/packages/x86_64/jq/ – a lightweight and flexible command-line JSON processor
- https://cygwin.com/packages/x86_64/psmisc – provides pstree
- https://cygwin.com/packages/x86_64/getent
- https://cygwin.com/packages/x86_64/qterminal
- https://cygwin.com/packages/x86_64/xkill
- https://cygwin.com/packages/x86_64/vim
And certainly I do need apt-cyg 😎 — it has its own article here:
- http://Jochen.Hayek.name/wp/blog-en/2018/07/04/apt-cyg/
- http://Jochen.Hayek.name/wp/blog-en/category/cygwin
apt-cyg wants to see itself on PATH.
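A minimal sketch of installing the list above in one go. The variable and function names are made up for illustration, and the sketch assumes that `apt-cyg install PACKAGE` is how a single package gets installed:

```shell
# Package names taken from the list above.
CYGWIN_PACKAGES="diffutils perl perl-debuginfo python2 python3 ruby openssh rsync
xinit xmlstarlet poppler procps-ng jq psmisc getent qterminal xkill vim"

# Hypothetical helper: install every package from the list, one by one.
install_my_packages() {
    # apt-cyg has to be on PATH, so stop early if it is not.
    command -v apt-cyg >/dev/null 2>&1 || {
        echo "apt-cyg not found on PATH" >&2
        return 1
    }
    for pkg in $CYGWIN_PACKAGES; do
        apt-cyg install "$pkg"
    done
}
```

Calling `install_my_packages` in a Cygwin shell would then walk through the whole list.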
2017-09-25
your ODF file (“.odt”, “.ods”, …) and its “modified” timestamp
Your “.odt” (or “.ods”) file is a ZIP file with a meta.xml inside:
```
$ unzip -l YOUR.ods
…
… meta.xml
…
```
This is a convenient way to extract meta.xml to STDOUT:
```
$ unzip -p YOUR.ods meta.xml
…
```
This is how to get the XML reformatted using xmlstarlet:
```
$ unzip -p YOUR.ods meta.xml | xml fo
```
This command line shows you the possible XPath expressions:
```
$ unzip -p YOUR.ods meta.xml | xml el
…
office:document-meta/office:meta/dc:date
…
```
How to extract “modified” to STDOUT?
```
$ unzip -p YOUR.ods meta.xml | xml sel --template --value-of office:document-meta/office:meta/dc:date
```
And how to extract the timestamp w/o anything but decimal digits?
```
$ unzip -p YOUR.ods meta.xml | xml sel --template --value-of office:document-meta/office:meta/dc:date | tr -d ':TZ-' | perl -pe 's/^(.*)\..*$/$1/'
```
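The steps above could be wrapped into two small shell functions. This is only a sketch: the function names are made up, and the `sed` call stands in for the `tr` and `perl` steps, assuming that anything after a decimal point is fractional seconds. On some systems the binary is called `xmlstarlet` instead of `xml`.

```shell
# Print the raw "modified" timestamp of an ODF file.
odf_modified() {
    unzip -p "$1" meta.xml |
        xml sel --template --value-of office:document-meta/office:meta/dc:date
}

# Reduce a timestamp on STDIN to its decimal digits:
# first drop fractional seconds, then drop every non-digit.
timestamp_digits() {
    sed -e 's/\..*$//' -e 's/[^0-9]//g'
}

# usage: odf_modified YOUR.ods | timestamp_digits
```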
…
2017-06-29
xmlstarlet: how to deal with a default namespace in XPath expressions?
I ran into this problem when I tried to extract values from JasperReports’ JRXML using xmlstarlet. JRXML files introduce a default namespace (which does not seem to serve much of a purpose), and XPath processors need to take that into account.

When I searched for help in xmlstarlet’s documentation and on the web, I came across this very helpful article:
- http://www.howtobuildsoftware.com/index.php/how-do/bp0A/xml-xmlstarlet-querying-value-of-xml-file-using-xmlstarlet-with-namspace
Once your XML has a default namespace, you have to use that default namespace with every node (that has no explicit namespace part) in your XPath expressions.

As the article above explains, for “xml sel” each such node …
- … either needs to make use of “_” as name of the default namespace:
```
# TBD: use a true JasperReports example!

$ xml sel -t -v '/_:publications/_:magazine' 2-023_node-sets.xml
```
- or you assign an explicit name (let’s say “a”) to your default namespace on your xmlstarlet command line, and then you have to make use of that “explicit default namespace”:
```
# TBD: use a true JasperReports example!

$ xml sel -N a='http://jasperreports.sourceforge.net/jasperreports' -t -v '/a:publications/a:magazine' 2-023_node-sets.xml
```
CAVEAT: The XPath expressions created by “xml el” do not deal with a default namespace, so you have to adapt them yourself in one of the ways explained above.

A one-liner (e.g. in Perl) can help:
```
$ xml el FILE.xml | grep -F /subreportExpression |
perl -n -a -F'/\//' -e 'print "_:", join("/_:", @F)'
```
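The same rewriting can be sketched without Perl. The function name is made up, and the sketch assumes that no step of the path carries a namespace prefix already; the JRXML path in the usage line is only an illustration:

```shell
# Prefix every step of an unindexed "xml el" path with "_:".
# Reads paths on STDIN, one per line.
add_default_ns() {
    sed -e 's|^|_:|' -e 's|/|/_:|g'
}

# usage: xml el FILE.xml | grep -F /subreportExpression | add_default_ns
```

A path like `jasperReport/detail/band` becomes `_:jasperReport/_:detail/_:band`, which “xml sel” accepts for a document with a default namespace.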
2017-02-15
your OOXML file (“.docx”, “.xlsx”, “.vsdx”, …) and its “modified” timestamp
VSDX does not get listed as an OOXML-conformant file format, but for this purpose (here) we can treat it like one.

Your “.docx” (or “.xlsx”) file is a ZIP file with a docProps/core.xml inside:
```
$ unzip -l YOUR.docx
…
… docProps/core.xml
…
```
This is a convenient way to extract docProps/core.xml to STDOUT:
```
$ unzip -p YOUR.docx docProps/core.xml
…
```
This is how to get the XML reformatted using xmlstarlet:
```
$ unzip -p YOUR.docx docProps/core.xml | xml fo
```
This command line shows you the possible XPath expressions:
```
$ unzip -p YOUR.docx docProps/core.xml | xml el
…
cp:coreProperties/dcterms:modified
…
```
How to extract “modified” to STDOUT?
```
$ unzip -p YOUR.docx docProps/core.xml | xml sel --template --value-of cp:coreProperties/dcterms:modified
```
And how to extract the timestamp w/o anything but decimal digits?
```
$ unzip -p YOUR.docx docProps/core.xml | xml sel --template --value-of cp:coreProperties/dcterms:modified | tr -d ':TZ-'
```
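ODF and OOXML files differ only in the ZIP member and the XPath expression, so a small dispatcher on the file name extension could cover both. This is a sketch with made-up function names; the ODF member and path are the ones from the ODF article above, and only the extensions named in these two articles are handled:

```shell
# Print "ZIP-member XPath" for a given document file name.
meta_location() {
    case "$1" in
        *.odt|*.ods)
            echo "meta.xml office:document-meta/office:meta/dc:date" ;;
        *.docx|*.xlsx|*.vsdx)
            echo "docProps/core.xml cp:coreProperties/dcterms:modified" ;;
        *)
            return 1 ;;
    esac
}

# Print the raw "modified" timestamp of an ODF or OOXML file.
modified_of() {
    loc=$(meta_location "$1") || return 1
    member=${loc%% *}   # part before the first blank
    xpath=${loc#* }     # part after the first blank
    unzip -p "$1" "$member" | xml sel --template --value-of "$xpath"
}

# usage: modified_of YOUR.docx | tr -d ':TZ-'
```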
…
2017-01-18
my 2017 Windows working environment
None of these packages or utilities requires Windows admin rights to get “installed” – actually they do not need “a Windows system installation” at all.

Below C:\Users\jhayek I created a couple of subdirectories:
- opt: every package or utility has its own subdirectory below there
- bin: some .bat and .sh (BusyBox ash) scripts go there
Packages and utilities:
- GNU Emacs
- busybox-w32: includes a shell and a lot of Unix utilities
- Strawberry Perl
- https://ConEmu.github.io — a Windows console alternative, where you can paste text w/o using the mouse 😎
- xmlstarlet
After having worked with this set-up for a couple of days, I have to admit: this is not just a minimalist Unix-ish working environment, it is a rather enjoyable one. I do not have the GNU utilities with all their advantages (nice long command line options and lots of features) – but for most of my purposes the utilities built into busybox-w32 are good enough. What a great idea it was to think of “busybox for Windows” a couple of days ago – and actually find “busybox-w32”!!! I had to consider a lightweight alternative to Cygwin, because it is not available on my new client’s Windows computers.

ConEmu makes busybox-w32 and its shell (the “ash”) even more enjoyable.

GNU Emacs is as good as always – I can’t really describe how sad it is to not have it available in a serious working environment.

Strawberry Perl so far has all the modules that my utilities need. I am really glad to have that “distribution”.

xmlstarlet is my XPath and XML Swiss Army Knife.

With all these utilities and packages available it’s even quite fun to work on Windows 7 😆
2017-01-17
creating diary entries from my (little) blog articles
- https://codex.wordpress.org/WordPress_Feeds
- https://codex.wordpress.org/WordPress_Feeds#Finding_Your_Feed_URL
1st approach: xmlstarlet with an XPath expression (rss/channel/item/link) + shell wrapper.

2nd approach: XQuery script looping over rss/channel/item (+ shell wrapper). I am glad to upload it to my github area, in case somebody is interested.
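The first approach could be sketched like this. The function names and `https://example.com/blog` are placeholders, and the feed URL pattern assumes a blog with “pretty” permalinks as described on the WordPress Feeds page linked above:

```shell
# Hypothetical helper: feed URL for a blog's base URL
# (a trailing slash on the base URL is tolerated).
feed_url() {
    printf '%s/feed/\n' "${1%/}"
}

# Print one link per feed item; reads the RSS document on STDIN.
feed_links() {
    xml sel -t -m rss/channel/item -v link -n
}

# usage: curl -s "$(feed_url https://example.com/blog)" | feed_links
```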

Update 2016-08-26: By default WordPress creates an RSS feed with just 10 entries. I don’t create my diary entries (as described above) a couple of times a day, so sometimes it may very well make sense to look back 20 or more entries. I did not immediately realise that this figure is a parameter WordPress allows you to set for your blog. Instead I came across this forum thread:
- https://wordpress.org/support/topic/how-to-make-rss-summary-feeds-show-300-characters-not-just-50
And I found the “expert’s” opinion rather annoying. Luckily enough I was courageous enough to search for the suspected parameter in the Settings / Reading section. I am glad it’s there.
2016-08-21
after a couple of months using XPath and xmlstarlet I created my first lengthy XQuery script using Saxon9HE

“In production” I have been using a Bash+xmlstarlet script that I wanted to optimise by rewriting it as an XQuery script. This would be my 1st XQuery script after a couple of months of practising XQuery with Priscilla Walmsley’s book and Saxon9HE – only a couple of hours per week because of my overall workload.

During the last couple of days I managed to complete the rewriting of my utility. I acquired quite some experience, and now I am rather satisfied with my work – and myself 😎

I learned how to use regular expressions in XQuery – of course with decades of using regular expressions in various languages as a starting point.

And I learned how to use decimal number formatting – in 2 different technical “locales” in parallel. Apparently “left white space padding” of decimal numbers is something the creators of XQuery did not build into the language. Decimal number formatting in XQuery is rather different from how you do it in C or Perl; it more closely resembles how you do it in Fortran or COBOL – and I don’t mean to criticise that approach. All in all it is just very, very different – and confusing when it comes to details.

2016-07-21
XmlStarlet UG on “Studying Structure of XML Document” – “xmlstarlet elements …”
- http://xmlstar.sourceforge.net/doc/UG/xmlstarlet-ug.html#idm47077139665568
- http://xmlstar.sourceforge.net/doc/UG/xmlstarlet-ug.html
I really like “xmlstarlet elements …” – but these very nice and useful XPath expressions come without indexes. So what if you really, really need them with indexes? I am not sure you can write a general utility that rewrites these unindexed XPath expressions into indexed ones, but for my Jenkins CI context I created a utility (jenkins_xpath_el2eli.pl) that does some limited rewriting of that kind.
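One possible way to do such a rewriting, sketched with awk inside a shell function. This is not jenkins_xpath_el2eli.pl; the function name is made up, and the sketch assumes that the paths arrive in document order (parent before child, siblings in sequence), so that the most recent path of depth n-1 is always the parent of a path of depth n:

```shell
# Turn unindexed element paths (one per line on STDIN) into indexed ones.
index_paths() {
    awk -F/ '{
        # indexed path of the parent: the latest path seen one level up
        parent = (NF > 1) ? cur[NF - 1] : ""
        # count how often this element name occurred below this parent
        n = ++seen[parent "/" $NF]
        cur[NF] = (parent == "" ? "" : parent "/") $NF "[" n "]"
        print cur[NF]
    }'
}

# usage: xmlstarlet el config.xml | index_paths
```

A second `project/builders/hudson.tasks.Shell/command` line then comes out as `project[1]/builders[1]/hudson.tasks.Shell[2]/command[1]`, provided the second `hudson.tasks.Shell` line precedes it in the input.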
2016-07-11
extract your build steps from a Jenkins CI job to ordinary and “easy to edit” flat files
Over time Jenkins jobs can grow into something “a little confusing”, in other words: like cancer.

The Jenkins developers were thoughtful enough to provide an API to all the data structures that Jenkins and its jobs operate on, so we are able to export an entire Jenkins job as XML. You certainly do not want to edit a shell script or a Windows batch script encapsulated within this XML. You are certainly not the first one to need an export facility for this, and a couple of approaches have certainly been developed over time. I am trying to collect them here for you and myself. I actually only found Ken Dreyer’s tool in the beginning – but only after I had started developing something myself. NIH applies maybe …
- https://github.com/ktdreyer/jenkins-job-wrecker – Ken Dreyer’s jenkins-job-wrecker – converts Jenkins job XML to JJB YAML
Translate Jenkins XML jobs to YAML. The YAML can then be fed into Jenkins Job Builder.

Have a lot of Jenkins jobs that were crafted by hand over the years? This tool allows you to convert your Jenkins jobs to JJB quickly and accurately.
- https://github.com/openstack-infra/jenkins-job-builder – from YAML back to XML
Initially I was (and still am) only interested in “project/builders” build steps:
- project/builders/hudson.tasks.BatchFile/command
- project/builders/hudson.tasks.Shell/command
- project/builders/EnvInjectBuilder/info/propertiesContent
- project/builders/EnvInjectBuilder/info/propertiesFilePath
I am extracting the bits and pieces with XPath expressions using XMLStarlet within shell scripts. Every build step goes into its own file, with a name derived from the step’s ordinal number within the Jenkins job. Have a look at this example:
- 00–___.properties
- 01–___.propertiesFilePath
- 02–___.sh
- 03–___.bat
These “raw names” push you to think about more reasonable names that will remind you of their meaning and content from then on.
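How such raw names might be derived can be sketched as follows. This is not the actual script: the function names are made up, the mapping from XPath to extension is inferred from the example names above, and the separator is assumed to be `--` (the dash shown in the names above may be blog typography):

```shell
# Map a build step's XPath to a file name extension.
step_extension() {
    case "$1" in
        project/builders/hudson.tasks.Shell/command) echo sh ;;
        project/builders/hudson.tasks.BatchFile/command) echo bat ;;
        project/builders/EnvInjectBuilder/info/propertiesContent) echo properties ;;
        project/builders/EnvInjectBuilder/info/propertiesFilePath) echo propertiesFilePath ;;
        *) return 1 ;;
    esac
}

# Build the raw file name from the step's ordinal number and XPath.
step_file_name() {
    ext=$(step_extension "$2") || return 1
    printf '%02d--___.%s\n' "$1" "$ext"
}

# usage: step_file_name 2 project/builders/hudson.tasks.Shell/command
```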

I called my script “jenkins_config2files.sh”.

For finding the XPath expressions it uses “xmlstarlet el config.xml”. The XPath expressions are not indexed though, i.e. you need to add the XPath indexes yourself.

I created a small Perl script that creates indexed XPath expressions from the output of “xmlstarlet el”. It works for ordinary “project” and also for “matrix-project” Jenkins XML. I will add more as I come across them.

I am going to upload my scripts after completing some tidying to my github account within a couple of days. (Maybe you want to remind me!)
2016-07-09