Category: XPath
-
an XQuery recipe: generating lots of documents in a single XQuery run …
http://www.gnu.org/software/qexo/XQ-Gen-XML.html – search there for “Generate all the HTML output files”! … by putting them in a single large XML object – then use a post-processor to split this into separate files. (Alright, this isn’t really a true “single XQuery run” approach, but it is close enough.) With Saxon-HE there is no way to write to separate text…
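The post-processing step could look roughly like this minimal Python sketch. It is not the post's own post-processor. It assumes a hypothetical wrapper format in which the query emits a "files" root element with one child file name="…" per output document. A file element that contains an element is written out as an XML document, and one that contains only text is written as a plain-text file.

```python
import os
import xml.etree.ElementTree as ET


def split_bundle(bundle_path, out_dir):
    """Split a hypothetical <files><file name="...">...</file></files> bundle.

    A <file> that contains an element is written as XML (its first child
    element becomes the document root); a <file> with text only is written
    as a plain-text file. Returns the list of paths written, in bundle order.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for entry in ET.parse(bundle_path).getroot().findall("file"):
        name = entry.get("name", "")
        # Reject empty names and names with directory parts, so a malformed
        # bundle cannot write outside out_dir.
        if name in ("", ".", "..") or os.path.basename(name) != name:
            raise ValueError(f"bad file name: {name!r}")
        path = os.path.join(out_dir, name)
        children = list(entry)
        if children:
            doc = children[0]
            doc.tail = None  # drop the whitespace that followed it in the bundle
            ET.ElementTree(doc).write(path, encoding="utf-8", xml_declaration=True)
        else:
            with open(path, "w", encoding="utf-8") as f:
                f.write(entry.text or "")
        written.append(path)
    return written
```

For example, split_bundle("bundle.xml", "out") writes one file per file element into the directory "out". Both names are placeholders. Refusing names with directory parts is a deliberate choice, because the file names come from query output rather than from you.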
-
Xidel – yet another HTML/XML/JSON data extraction tool
Xidel is a command line tool to download HTML/XML pages and extract data from them using CSS 3 selectors, XPath 3 expressions or pattern-matching templates. http://www.videlibri.de/xidel.html
https://en.wikipedia.org/wiki/XQuery – I am “watching” the changes to this article, and somebody just added Xidel; that’s how I came across it. Cygwin’s and Fink’s repositories do not have Xidel, but Xidel’s…
-
XQuery …
https://en.wikipedia.org/wiki/XQuery
http://www.xml.com/pub/a/2005/03/02/xquery.html – Bob DuCharme: Getting Started with XQuery
http://shop.oreilly.com/product/0636920035589.do – O’Reilly Media book on XQuery, 2nd Edition
https://en.wikibooks.org/wiki/XQuery
https://www.w3.org/XML/Query/#implementations
https://en.wikipedia.org/wiki/Saxon_XSLT – “Saxon is an XSLT and XQuery processor …”
https://sourceforge.net/projects/saxon/files/
https://sourceforge.net/projects/saxon/files/Saxon-HE/9.7/readme97.txt
http://www.saxonica.com/documentation/index.html#!about/installationjava
http://www.saxonica.com/documentation/documentation.xml
http://www.saxonica.com/html/documentation/using-xquery/
http://www.saxonica.com/html/documentation/using-xquery/commandline.html
-
O’Reilly Media book: XQuery, 2nd Edition
http://shop.oreilly.com/product/0636920035589.do
http://www.datypic.com/books/xquery/examples.html – sample queries and XML files
http://www.xqueryfunctions.com
I am executing the examples with Saxon (“Home Edition”), e.g.:
$ saxon net.sf.saxon.Query example0105.xqy
It does help, though, to take a more serious look at Saxon’s “using XQuery” (on the command line) documentation. There are quite a few command line options, and some of them may be…
-
converting a Jenkins CI job’s config.xml to several flat files (.properties, .sh, .bat, …)
Over time, Jenkins jobs can grow into something “a little confusing” – in other words, like cancer. The Jenkins developers were thoughtful enough to provide an API to all the data structures that Jenkins and its jobs operate on, so we can export an entire Jenkins job as XML. You certainly do not want…
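A conversion along these lines could be sketched in Python. It assumes a freestyle job's config.xml with the element names commonly seen there. Build steps are hudson.tasks.Shell and hudson.tasks.BatchFile elements with a command child, and parameters sit under hudson.model.ParametersDefinitionProperty. These element names are assumptions, so check them against your own export. The output names parameters.properties and stepNN.sh/.bat are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET

# Element names as commonly seen in a freestyle job's config.xml (assumption:
# verify against your own export; other job types look different).
SHELL_STEP = "hudson.tasks.Shell"
BATCH_STEP = "hudson.tasks.BatchFile"
PARAMS = ("properties/hudson.model.ParametersDefinitionProperty"
          "/parameterDefinitions/*")


def flatten_job(config_xml):
    """Return {file name: content} for a job's parameters and build steps.

    config_xml is the content of config.xml, as bytes or str.
    """
    root = ET.fromstring(config_xml)
    files = {}

    # One NAME=default line per parameter definition.
    lines = []
    for pdef in root.findall(PARAMS):
        name = pdef.findtext("name")
        if name:
            lines.append(f"{name}={pdef.findtext('defaultValue', default='')}")
    if lines:
        files["parameters.properties"] = "\n".join(lines) + "\n"

    # Each shell / batch build step becomes its own numbered script file.
    for i, step in enumerate(root.findall("builders/*"), start=1):
        command = step.findtext("command", default="")
        if step.tag == SHELL_STEP:
            files[f"step{i:02d}.sh"] = command
        elif step.tag == BATCH_STEP:
            files[f"step{i:02d}.bat"] = command
    return files
```

The values go into the .properties file unescaped. That is fine for simple defaults but not for values containing backslashes or line breaks. Numbering steps by their position under builders keeps the scripts in build order even when shell and batch steps are mixed.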
-
once you get familiar with XPath and XMLStarlet, you end up using them for rather “ordinary tasks”
http://xmlstar.sourceforge.net
https://www.cygwin.com – “Get that Linux feeling – on Windows”
https://cygwin.com/cgi-bin2/package-grep.cgi?grep=xmlstarlet
http://www.finkproject.org – “The Fink project wants to bring the full world of Unix Open Source software to Darwin and Mac OS X. …”
http://pdb.finkproject.org/pdb/package.php/xmlstarlet
Areas where you will want to use XPath expressions and xmlstarlet to extract details: HTML web pages –…
-
using XPath on non-XML HTML – how to tidy dirty HTML?
Scraping HTML using XPath is far nicer than low-level text processing. But how do you proceed if your XPath tool cannot deal with the HTML because it is not XHTML-conformant, i.e. not well-formed XML? My XPath tool is XMLStarlet, and it can also help reformat HTML so that XPath expressions can be applied. I…
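XMLStarlet does the tidying itself. For readers without it, here is a rough stand-in: a minimal Python sketch that uses a different technique. The standard library's html.parser re-emits the tags as well-formed XML, and ElementTree's limited XPath subset can then query the result. This is a simplification, not an HTML5 parser. It handles void elements, a few implicit sibling closes (such as li after li), stray end tags and unclosed tags. It drops comments, the doctype, the xmlns attribute and any tag or attribute whose name is not a plain XML name.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# HTML elements that never have an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
# Simplified rule: opening one of these while the same element is the
# innermost open one closes it first (e.g. "<li>a<li>b").
SIBLING_CLOSE = {"li", "p", "td", "th", "tr", "option", "dt", "dd"}
# Only tag/attribute names that are safe as plain XML names are kept.
SAFE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


class Tidier(HTMLParser):
    """Re-emit tag soup as well-formed XML (comments and doctype are dropped)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out, self.stack = [], []

    def handle_starttag(self, tag, attrs):
        if not SAFE_NAME.fullmatch(tag):
            return  # drop odd tags, keep their text content
        if tag in SIBLING_CLOSE and self.stack and self.stack[-1] == tag:
            self.out.append(f"</{self.stack.pop()}>")
        # xmlns is dropped so plain tag names work in ElementTree paths.
        kept = {k: v for k, v in attrs if SAFE_NAME.fullmatch(k) and k != "xmlns"}
        attr_text = "".join(f" {k}={quoteattr(v or '')}" for k, v in kept.items())
        if tag in VOID:
            self.out.append(f"<{tag}{attr_text}/>")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag: ignore it
        # Close elements left open inside this one, then the element itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))

    def xml(self):
        # Close whatever is still open at the end of the input.
        return "".join(self.out) + "".join(f"</{t}>" for t in reversed(self.stack))


def tidy_html(html):
    """Return a well-formed XML string wrapped in a single <root> element."""
    parser = Tidier()
    parser.feed(html)
    parser.close()
    return "<root>" + parser.xml() + "</root>"
```

After ET.fromstring(tidy_html(page)), expressions such as root.findall(".//li") work. ElementTree supports only a subset of XPath, so full XPath still calls for a real XPath tool such as XMLStarlet.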
-
Q: how to get updates from web pages w/o RSS feed? A: XPath + cron or Jenkins job
Sadly, even now in 2016 a lot of web pages are not XHTML-conformant, but getting them fairly conformant is not that expensive:
use “xmlstarlet fo --html --recover”
get the (cron or) Jenkins job to save the current page content in the job’s workspace
let the Jenkins job compare the current to the last…
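The compare-with-last-run step might be sketched in Python as follows. It assumes the page has already been fetched and tidied and the interesting part extracted into a string. The state file (for example a hypothetical last.txt in the job's workspace) is an illustrative choice, not something taken from the post.

```python
import os


def check_for_update(current_text, state_file):
    """Compare freshly extracted page content with the copy saved last run.

    Returns True if the content differs from the saved copy (or no copy exists
    yet) and stores the new content for the next run; returns False otherwise.
    """
    previous = None
    if os.path.exists(state_file):
        # newline="" keeps line endings exactly as written, so they compare equal.
        with open(state_file, encoding="utf-8", newline="") as f:
            previous = f.read()
    if previous == current_text:
        return False
    # Write to a temporary file first, then rename, so an interrupted run
    # never leaves a half-written state file behind.
    tmp_file = state_file + ".tmp"
    with open(tmp_file, "w", encoding="utf-8", newline="") as f:
        f.write(current_text)
    os.replace(tmp_file, state_file)
    return True
```

A cron or Jenkins job could, for instance, print or mail the new content whenever this returns True. How the original job reacts to a change is cut off in this excerpt.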
-
Jenkins: how to authenticate as a scripted client?
https://wiki.jenkins-ci.org/display/JENKINS/Authenticating+scripted+clients
“To make scripted clients (such as wget) invoke operations that require authorization (such as scheduling a build), use HTTP BASIC authentication to specify the user name and the API token. This is often more convenient than emulating the form-based authentication.”
The article quoted above mentions “buildToken”, but I don’t need it at all. The…
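A minimal Python sketch of the HTTP BASIC approach follows. It uses only the standard library and adds the Authorization header (user name and API token) to the very first request. It does not rely on a 401 challenge, which a server is not guaranteed to send. The function name is a hypothetical helper, not part of any Jenkins client library.

```python
import base64
import urllib.request


def jenkins_request(url, user, api_token, method="GET", data=None):
    """Build a urllib Request that sends HTTP BASIC credentials up front.

    The Authorization header is added directly instead of waiting for a
    401 challenge, so the first request is already authenticated.
    """
    credentials = f"{user}:{api_token}".encode("utf-8")
    auth = "Basic " + base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(url, data=data, method=method,
                                  headers={"Authorization": auth})
```

For example, passing jenkins_request("https://jenkins.example.com/job/my-job/build", "alice", "API_TOKEN", method="POST") to urllib.request.urlopen would ask Jenkins to schedule a build. The host, job name, user and token are all placeholders. Keep the real API token out of the script itself, e.g. in an environment variable.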