“Apache Nutch” is a highly extensible and scalable open source web crawler software project
8 years ago
https://en.wikipedia.org/wiki/Apache_Nutch
https://en.wikipedia.org/wiki/Web_crawler
https://nutch.apache.org – official website
https://wiki.apache.org/nutch – official wiki
https://wiki.apache.org/nutch/Nutch2Crawling – “a description of the crawling jobs and field to database mappings”
https://www.amazon.de/dp/1590596870 – Apress: Building Search Applications with Lucene and Nutch
https://www.amazon.de/dp/1783286857 – PACKT: Web Crawling and Data Mining with Apache Nutch (2017-06-20: PACKT no longer lists this product, but it is still dated 2014-… and available “somewhere”)
https://www.amazon.de/dp/1156025532 – LLC Books: Free Search Engine Software: Lucene, Apache Solr, Yacy, Dataparksearch, Nutch, Pubchemsr, Sciencenet, Xapian, Opensearchserver, Grub, Ht://Dig