Category: XPath
-
XmlStarlet User’s Guide and IBM link regarding PYX
- http://xmlstar.sourceforge.net/doc/UG/xmlstarlet-ug.html
- http://www-106.ibm.com/developerworks/xml/library/x-matters17.html – no longer works as of 2017-10-02 (and probably stopped working some time before that)
- http://www.ibm.com/developerworks/xml/library/x-matters17.html – no longer works as of 2021-08-31 (and probably stopped working some time before that)
- https://www.ibm.com/developerworks/xml/library/x-matters17/x-matters17-pdf.pdf – this is the actual and relevant document (broken link as of 2021-08-31)
- https://duckduckgo.com/?q=xml+pyx
-
the Cygwin packages that I need most seriously on a Windows PC
my package list:
- https://cygwin.com/packages/x86_64/wget
- https://cygwin.com/packages/x86_64/unzip
- https://cygwin.com/packages/x86_64/git
- https://cygwin.com/packages/x86_64/curl
I can install the following ones through apt-cyg (see below), and maybe git and others from the list above as well:
- https://cygwin.com/packages/x86_64/diffutils
- https://cygwin.com/packages/x86_64/perl
- https://cygwin.com/packages/x86_64/perl-debuginfo
- – solves the “Tie::Hash::NamedCapture” problem
- https://cygwin.com/packages/x86_64/python2
- https://cygwin.com/packages/x86_64/python3
- https://cygwin.com/packages/x86_64/ruby
- https://cygwin.com/packages/x86_64/openssh
- https://cygwin.com/packages/x86_64/rsync
- https://cygwin.com/packages/x86_64/xinit – X.Org X server launcher (Cygwin/X)
- https://cygwin.com/packages/x86_64/xmlstarlet – XMLStarlet is a command line XML toolkit which can be used to transform, query, validate, and edit XML documents using a simple set of shell commands
- https://cygwin.com/packages/x86_64/poppler – includes pdftohtml (my pdf2xml converter)
- https://cygwin.com/packages/x86_64/procps-ng – includes watch
- https://cygwin.com/packages/x86_64/jq/ – a lightweight and flexible command-line JSON processor
- https://cygwin.com/packages/x86_64/psmisc – provides pstree
- https://cygwin.com/packages/x86_64/getent
- https://cygwin.com/packages/x86_64/qterminal
- https://cygwin.com/packages/x86_64/xkill
- https://cygwin.com/packages/x86_64/vim
And certainly I do need apt-cyg 😎 — it has its own article here:
- http://Jochen.Hayek.name/wp/blog-en/2018/07/04/apt-cyg/
- http://Jochen.Hayek.name/wp/blog-en/category/cygwin
apt-cyg wants to see itself on PATH.
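The second list can be installed in one go. This is only a minimal sketch, assuming apt-cyg is already on PATH. The function name install_cygwin_packages and the DRY_RUN switch are my own additions for illustration; the package names are a subset of the list above.

```shell
#!/bin/sh
# Package names copied from the list above (a subset; extend as needed).
PACKAGES="diffutils perl python3 ruby openssh rsync xmlstarlet poppler procps-ng jq psmisc vim"

# Hypothetical helper: prints the command when DRY_RUN=1, otherwise runs it.
install_cygwin_packages() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "apt-cyg install $*"
    return 0
  fi
  # apt-cyg has to be found on PATH.
  command -v apt-cyg >/dev/null 2>&1 || {
    echo "apt-cyg not found on PATH" >&2
    return 1
  }
  apt-cyg install "$@"
}

# Usage (word splitting of $PACKAGES is intended here):
#   DRY_RUN=1 install_cygwin_packages $PACKAGES
#   install_cygwin_packages $PACKAGES
```

The dry run lets me check the command line before anything gets downloaded.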
-
your ODF file (“.odt”, “.ods”, …) and its “modified” timestamp
- https://en.wikibooks.org/wiki/XML_-_Managing_Data_Exchange/OpenOffice.org_%26_OpenDocument_Format
- https://github.com/JochenHayek/misc/blob/master/using_timestamps_in_filenames/create_snapshot_from_ODF.sh
- https://en.wikipedia.org/wiki/XMLStarlet
Your “.odt” (or “.ods”) file is a ZIP file with a meta.xml inside:
$ unzip -l YOUR.ods
… … meta.xml …
This is a convenient way to extract meta.xml to STDOUT:
$ unzip -p YOUR.ods meta.xml …
This is how to get the XML reformatted using xmlstarlet:
$ unzip -p YOUR.ods meta.xml | xml fo
This command line shows you the possible XPath expressions:
$ unzip -p YOUR.ods meta.xml | xml el
… office:document-meta/office:meta/dc:date …
How to extract “modified” to STDOUT?
$ unzip -p YOUR.ods meta.xml | xml sel --template --value-of office:document-meta/office:meta/dc:date
And how to extract the timestamp w/o anything but decimal digits?
$ unzip -p YOUR.ods meta.xml | xml sel --template --value-of office:document-meta/office:meta/dc:date | tr -d ':TZ-' | perl -pe 's/^(.*)\..*$/$1/'
…
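The clean-up step at the end of that pipeline can also live in a small shell function, which makes it easy to try out on sample values. This is a sketch only: the function name timestamp_digits is made up for this example, and I assume the value has the shape YYYY-MM-DDThh:mm:ss, optionally followed by fractional seconds and/or "Z". Time zone offsets like "+02:00" are not handled.

```shell
#!/bin/sh
# Hypothetical helper: reduce a timestamp like 2017-10-02T12:34:56.789
# to its digits, dropping the fractional seconds.
timestamp_digits() {
  # sed: cut off a fractional-seconds part (".789...") if there is one
  # tr:  delete the separators ":", "T", "Z" and "-"
  printf '%s\n' "$1" | sed 's/\.[0-9]*//' | tr -d ':TZ-'
}

# Usage with a real file (assumes unzip and xmlstarlet, installed as "xml"):
#   timestamp_digits "$(unzip -p YOUR.ods meta.xml |
#     xml sel --template --value-of office:document-meta/office:meta/dc:date)"
```

Called with 2017-10-02T12:34:56.789000000 the function prints 20171002123456.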
-
xmlstarlet: how to deal with a default namespace in XPath expressions?
I ran into this problem when I tried to extract values from JasperReports’ JRXML files using xmlstarlet. JRXML files declare a default namespace (which does not seem to serve much purpose), and XPath processors need to take that into account.
When I searched for help in xmlstarlet’s documentation and on the web, I came across this very helpful article:
Once your XML has a default namespace, you have to use that default namespace with every node (that has no explicit namespace part) in your XPath expressions.
As the article above explains, for “xml sel” each such node …
- … either needs to make use of “_” as name of the default namespace:
# TBD: use a true JasperReports example!
$ xml sel -t -v '/_:publications/_:magazine' 2-023_node-sets.xml
- or you assign an explicit name (let’s say “a”) to your default namespace on your xmlstarlet command line, and then you have to make use of that “explicit default namespace”:
# TBD: use a true JasperReports example!
$ xml sel -N a='http://jasperreports.sourceforge.net/jasperreports' -t -v '/a:publications/a:magazine' 2-023_node-sets.xml
CAVEAT: The XPath expressions created by “xml el” do not deal with a default namespace, so you have to adapt them yourself in one of the ways explained above.
A one-liner (e.g. in Perl) can help:
$ xml el FILE.xml | fgrep /subreportExpression | perl -n -a -F'/\//' -e 'print "_:",join("/_:",@F)' -
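The same rewriting works without Perl. Here is a minimal sketch in plain sh and awk; the function name prefix_default_ns is made up for this example, and the JRXML path in the usage comment is only meant as an illustration. Unlike the Perl one-liner above, it leaves steps alone that already carry a namespace prefix.

```shell
#!/bin/sh
# Hypothetical helper: read element paths (one per line, as printed by
# "xml el") and prefix every step that has no namespace prefix with "_:".
prefix_default_ns() {
  awk -F/ '{
    out = ""
    for (i = 1; i <= NF; i++) {
      step = $i
      # only touch non-empty steps without a "prefix:" part
      if (step != "" && index(step, ":") == 0) step = "_:" step
      out = out (i > 1 ? "/" : "") step
    }
    print out
  }'
}

# Usage (assumes xmlstarlet, installed as "xml"):
#   xml el FILE.xml | fgrep /subreportExpression | prefix_default_ns
# A line like jasperReport/detail/band gets printed as
#   _:jasperReport/_:detail/_:band
```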
-
your OOXML file (“.docx”, “.xlsx”, “.vsdx”, …) and its “modified” timestamp
- https://en.wikipedia.org/wiki/Office_Open_XML_file_formats
- https://github.com/JochenHayek/misc/blob/master/using_timestamps_in_filenames/create_snapshot_from_OOXML.sh
- https://en.wikipedia.org/wiki/XMLStarlet
VSDX is not listed as an OOXML-conformant file format, but for our purpose here we can treat it like one.
Your “.docx” (or “.xlsx”) file is a ZIP file with a docProps/core.xml inside:
$ unzip -l YOUR.docx
… … docProps/core.xml …
This is a convenient way to extract docProps/core.xml to STDOUT:
$ unzip -p YOUR.docx docProps/core.xml …
This is how to get the XML reformatted using xmlstarlet:
$ unzip -p YOUR.docx docProps/core.xml | xml fo
This command line shows you the possible XPath expressions:
$ unzip -p YOUR.docx docProps/core.xml | xml el
… cp:coreProperties/dcterms:modified …
How to extract “modified” to STDOUT?
$ unzip -p YOUR.docx docProps/core.xml | xml sel --template --value-of cp:coreProperties/dcterms:modified
And how to extract the timestamp w/o anything but decimal digits?
$ unzip -p YOUR.docx docProps/core.xml | xml sel --template --value-of cp:coreProperties/dcterms:modified | tr -d ':TZ-'
…
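ODF and OOXML files only differ in the name of the ZIP member and in the XPath expression, so one wrapper can serve both. This is a sketch under assumptions: the function names modified_location and print_modified are made up for this example, and xmlstarlet is assumed to be installed as "xml", as in the command lines above.

```shell
#!/bin/sh
# Hypothetical helper: print "<ZIP member> <XPath>" for a document file name.
# The two mappings are the ones from the ODF and OOXML notes.
modified_location() {
  case "$1" in
    *.odt|*.ods)
      echo "meta.xml office:document-meta/office:meta/dc:date" ;;
    *.docx|*.xlsx|*.vsdx)
      echo "docProps/core.xml cp:coreProperties/dcterms:modified" ;;
    *)
      echo "unsupported file type: $1" >&2
      return 1 ;;
  esac
}

# Hypothetical helper: print the raw "modified" value of a document.
print_modified() {
  loc=$(modified_location "$1") || return 1
  member=${loc%% *}   # text before the first blank
  xpath=${loc#* }     # text after the first blank
  unzip -p "$1" "$member" | xml sel --template --value-of "$xpath"
}

# Usage:
#   print_modified YOUR.docx | tr -d ':TZ-'
```

Keeping the mapping in a function of its own means it can be checked without having unzip or xmlstarlet around.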
-
my 2017 Windows working environment
None of these packages and utilities require Windows admin rights for getting them “installed” – actually they do not need “a Windows system installation” at all.
Below C:\Users\jhayek I created a couple of subdirectories:
- opt: every package or utility has its own subdirectory below there
- bin: some .bat and .sh (BusyBox ash) scripts go there
Packages and utilities:
- GNU Emacs
- busybox-w32: includes a shell and a lot of Unix utilities
- Strawberry Perl
- https://ConEmu.github.io — a Windows console alternative, where you can paste text w/o using the mouse 😎
- xmlstarlet
After having worked with this set-up for a couple of days, I have to admit: this is not just a minimalist Unix-ish working environment, it is a rather enjoyable one. I do not have the GNU utilities with all their advantages (nice long command line options and lots of features) – but for most purposes the utilities built into busybox-w32 are good enough. What a great idea it was to think of “busybox for Windows” a couple of days ago – and actually find “busybox-w32”! I had to consider a lightweight alternative to Cygwin, because it is not available on my new client’s Windows computers.
ConEmu makes busybox-w32 and its shell (the “ash”) even more enjoyable.
GNU Emacs is as good as always – I can’t really describe how sad it is to not have it available in a serious working environment.
Strawberry Perl so far has all the modules that my utilities need. I am really glad to have that “distribution”.
xmlstarlet is my XPath and XML Swiss Army Knife.
With all these utilities and packages available it’s even quite fun to work on Windows 7 😆
-
XQuery/Inserting and Updating Attributes – Wikibooks, open books for an open world