Author: johayek
-
“Apache Nutch” is a highly extensible and scalable open source web crawler software project
https://en.wikipedia.org/wiki/Apache_Nutch
https://en.wikipedia.org/wiki/Web_crawler
https://nutch.apache.org – official website
https://wiki.apache.org/nutch – official wiki
https://wiki.apache.org/nutch/Nutch2Crawling – “a description of the crawling jobs and field to database mappings”
https://www.amazon.de/dp/1590596870 – apress: Building Search Applications with Lucene and Nutch
https://www.amazon.de/dp/1783286857 – PACKT: Web Crawling and Data Mining with Apache Nutch (2017-06-20: PACKT do not list this product any longer – but still date 2014-… and available “somewhere”)
https://www.amazon.de/dp/1156025532 –…
-
Asaf Avidan’s 2017 “into the labyrinth” tour
http://www.asafavidanmusic.com/tour-1/ He will be in Berlin again. I am looking forward to 2017-12-04.
-
MediaWiki: how to extract the wiki text (only) from an article?
https://stackoverflow.com/questions/1625162/get-text-content-from-mediawiki-page-via-api
https://stackoverflow.com/questions/33844207/how-to-get-wikipedia-content-as-text-by-api
https://www.mediawiki.org/w/api.php?action=help&modules=query
The Stack Overflow articles above do not answer my question exactly, but they come close. The key is to use api.php with action=query. Purpose: I want to compare articles from two closely related wikis, where the articles share a common ancestor.
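A minimal sketch of the api.php / action=query approach, in Python. Several things here are assumptions and not taken from the linked articles: the function names are hypothetical, the `rvslots=main` parameter assumes a reasonably recent MediaWiki (older installations may need it dropped, and then return a different response layout), and using `difflib` for the comparison is just one possible way to look at two articles that share an ancestor.

```python
import difflib
import json
import urllib.parse
import urllib.request


def build_query_url(api_url, title):
    """Build an api.php URL asking for the latest revision's wikitext."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",        # assumption: the wiki supports revision slots
        "titles": title,
        "format": "json",
        "formatversion": "2",     # pages come back as a list, not a dict
    }
    return api_url + "?" + urllib.parse.urlencode(params)


def extract_wikitext(response):
    """Pull the wikitext out of a decoded formatversion=2 response."""
    page = response["query"]["pages"][0]
    if page.get("missing"):
        raise KeyError(f"page not found: {page.get('title')}")
    return page["revisions"][0]["slots"]["main"]["content"]


def fetch_wikitext(api_url, title):
    """Fetch one article's wikitext over HTTP (needs network access)."""
    request = urllib.request.Request(
        build_query_url(api_url, title),
        headers={"User-Agent": "wikitext-compare-sketch/0.1"},  # hypothetical name
    )
    with urllib.request.urlopen(request) as resp:
        return extract_wikitext(json.load(resp))


def diff_wikitext(text_a, text_b):
    """Return a unified diff of two wikitext strings as a list of lines."""
    return list(
        difflib.unified_diff(
            text_a.splitlines(),
            text_b.splitlines(),
            "wiki A",
            "wiki B",
            lineterm="",
        )
    )
```

Usage would be along the lines of `diff_wikitext(fetch_wikitext(api_a, title), fetch_wikitext(api_b, title))`, where `api_a` and `api_b` are the api.php URLs of the two wikis.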
-
ebooks: bookzz… moved to b-ok…
I hope this note will help me find them again next time. http://www.gadgetswright.com/bookzz-alternatives
-
Synology’s Virtual Machine Manager
https://www.synology.com/en-global/beta/2017_VMM https://www.synology.com/en-global/beta/2017_VMM/ReleaseNotes … models with less than 4GB of memory cannot run virtual machines …
-
OpenOffice, RTF files, “fields”, and the irritating RTF keyword “MERGEFIELD”
https://en.wikipedia.org/wiki/Rich_Text_Format – AKA RTF, filename extension “.rtf”; a Microsoft standard
https://en.wikipedia.org/wiki/OpenOffice
http://shop.oreilly.com/product/9780596004750.do – RTF Pocket Guide
http://interglacial.com/rtf/ – RTF Pocket Guide (home page)
http://interglacial.com/rtf/emacs/ – emacs RTF mode
Once a year my accountants send me RTF files (“.rtf”) with “fields”. Fields have “names” and “values”. Microsoft Office (presumably) displays and prints the field values properly. On OS…
-
“The Search for Traces” on Exhibit in Berlin – it documents Jewish heritage sites in Ukraine, Moldova, Romania and Poland
https://vanishedworld.wordpress.com/2017/05/08/the-search-for-traces-on-exhibit-in-berlin/ – I attended the vernissage on 2017-05-07, because my cousin pointed me to it
https://vanishedworld.wordpress.com/book/ – ISBN 978-3-00-048258-8 – the accompanying book, with the pictures shown and short text
https://vanishedworld.files.wordpress.com/2017/05/ausstellungstexte-berlin-2017-02-021.pdf – the exhibition booklet as PDF with extended text – the same text as shown on the A4 pages (to be read within the exhibition room) with small pictures…
-
MySQL / MariaDB and “CEST” as server time zone value
My Synology DS713+ had a couple of package updates, MariaDB being one of them. I am running hibiscus-server, which has its own database on a MariaDB (MySQL) server there.
5.5.54-MariaDB – this currently looks like the only relevant updated piece
hibiscus-server-2.7.0-1418
mysql-connector-java-6.0.5.jar
On restarting that hibiscus server the connection to the MariaDB server keeps failing: detected…
-
dmoztools.net “is an independently created static mirror of dmoz.org”
http://dmoztools.net
https://en.wikipedia.org/wiki/DMOZ – as of 2017-04-21 does not mention what became of the DMOZ.org content after the shutdown on 2017-03-17
https://www.resource-zone.com – “visit resource-zone to stay in touch with the community”
I have no idea whether dmoztools.net is being actively maintained, but I (still) find DMOZ’s “hierarchical ontology scheme for organizing site listings” very useful. I am…