Category: web crawling
-
automating & scraping the Web with JavaScript and Puppeteer
https://codeburst.io/a-guide-to-automating-scraping-the-web-with-javascript-chrome-puppeteer-node-js-b18efb9e9921 https://developers.google.com/web/tools/puppeteer/ https://github.com/puppeteer/puppeteer/
-
“Apache Nutch” is a highly extensible and scalable open source web crawler software project
https://en.wikipedia.org/wiki/Apache_Nutch
https://en.wikipedia.org/wiki/Web_crawler
https://nutch.apache.org – official website
https://wiki.apache.org/nutch – official wiki
https://wiki.apache.org/nutch/Nutch2Crawling – “a description of the crawling jobs and field to database mappings”
https://www.amazon.de/dp/1590596870 – Apress: Building Search Applications with Lucene and Nutch
https://www.amazon.de/dp/1783286857 – PACKT: Web Crawling and Data Mining with Apache Nutch (2017-06-20: PACKT no longer lists this product, but it is still dated 2014-… and available “somewhere”)
https://www.amazon.de/dp/1156025532 –…
-
Scrappy – The All Powerful Web Spidering, Scraping, Creeping Crawling Framework | Al Newkirk [blogs.perl.org]
-
CPAN: Scrappy – The All Powerful Web Spidering, Scraping, Creeping Crawling Framework
Scrappy – metacpan.org: “Scrappy – The All Powerful Web Spidering, Scraping, Creeping Crawling Framework”
-
Ronny Harbich: Amaryllis – Webcrawling (in German)
Amaryllis – Webcrawling – “Die Erschließung des Webs” (“Opening Up the Web”)
-
web harvesting and my toolkit JHwis
Years ago I implemented a toolkit that I call JHwis. Now and then I think I should have done more advertising for it. I have been using software built with that toolkit for downloading bank account statements and other stuff for years now. I would like to prove to you that it’s also very well suited for…