web crawling – wp.jochen.hayek.name/blog-en

https://en.wikipedia.org/wiki/Apache_Nutch
https://en.wikipedia.org/wiki/Web_crawler
https://nutch.apache.org – official website
https://wiki.apache.org/nutch – official wiki
https://wiki.apache.org/nutch/Nutch2Crawling – “a description of the crawling jobs and field to database mappings”
https://www.amazon.de/dp/1590596870 – Apress: Building Search Applications with Lucene and Nutch
https://www.amazon.de/dp/1783286857 – PACKT: Web Crawling and Data Mining with Apache Nutch (2017-06-20: PACKT no longer lists this product, but it is still dated 2014-… and available “somewhere”)
https://www.amazon.de/dp/1156025532 – LLC Books: Free Search Engine Software: Lucene, Apache Solr, Yacy, Dataparksearch, Nutch, Pubchemsr, Sciencenet, Xapian, Opensearchserver, Grub, Ht—Dig

Scrappy – The All Powerful Web Spidering, Scraping, Creeping Crawling Framework | Al Newkirk [blogs.perl.org]

Scrappy – metacpan.org: “Scrappy – The All Powerful Web Spidering, Scraping, Creeping Crawling Framework”