https://wp.jochen.hayek.name/blog-en/2017/06/20/apache-nutch/
"Apache Nutch" is a highly extensible and scalable open source web crawler software project.