“Apache Nutch” is a highly extensible and scalable open source web crawler software project

https://en.wikipedia.org/wiki/Apache_Nutch
https://en.wikipedia.org/wiki/Web_crawler
https://nutch.apache.org – official website
https://wiki.apache.org/nutch – official wiki
https://wiki.apache.org/nutch/Nutch2Crawling – “a description of the crawling jobs and field to database mappings”
https://www.amazon.de/dp/1590596870 – Apress: Building Search Applications with Lucene and Nutch
https://www.amazon.de/dp/1783286857 – PACKT: Web Crawling and Data Mining with Apache Nutch (2017-06-20: PACKT no longer lists this product, but the book dates from 2014-… and is still available “somewhere”)
https://www.amazon.de/dp/1156025532 –…