I made a few updates to the JavaScript URL scraping code snippet:
// Google Images
var urls=Array.from(document.querySelectorAll(".rg_i")).map(el=>el.hasAttribute("data-src")?el.getAttribute("data-src"):el.getAttribute("data-iurl")).filter(l=>l!=null).join("\n");
var a = document.createElement("a");a.download = "filename.txt";a.href = "data:text/csv;charset=utf-8,"+urls;a.click();
// DuckDuckGo Images (uses Bing)
var urls=Array.from(document.querySelectorAll(".tile--img__img")).map(el=>el.hasAttribute("data-src")?el.getAttribute("data-src"):el.getAttribute("src")).filter(l=>l!=null).map(l=>"https:"+l).join("\n");
var a = document.createElement("a");a.download = "filename.txt";a.href = "data:text/csv;charset=utf-8,"+urls;a.click();
// Bing Images
var urls=Array.from(document.querySelectorAll(".mimg")).map(el=>el.hasAttribute("src")?el.getAttribute("src"):null).filter(l=>l!=null&&l.startsWith("http")).join("\n");
var a = document.createElement("a");a.download = "filename.txt";a.href = "data:text/csv;charset=utf-8,"+urls;a.click();
This fixes a few issues in the code snippet from lesson2-download.ipynb:
- the escape() function was escaping the newline
- filters out the blank lines that were being added every 100 lines or so
- you can now set the file name (filename.txt) instead of the file being downloaded as "download"
- uses an anchor tag instead of window.open() to be more compatible with most browsers
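If you would rather not keep three near-identical one-liners around, the shared steps above (pick an attribute, drop blanks, download through an anchor tag) could be pulled into two small helpers. This is only a rough sketch: the names collectUrls and downloadText are made up for illustration and are not from the notebook, and the download step uses a Blob URL rather than the data: URI in the snippets above, on the assumption that your browser supports it.

```javascript
// Hypothetical helpers; the names are illustrative and not part of lesson2-download.ipynb.

// For each element, take the first attribute in `attrs` that has a non-empty value,
// drop missing or blank entries, and return one URL per line.
// `prefix` is only prepended to protocol-relative URLs (those starting with "//").
function collectUrls(elements, attrs, prefix = "") {
  return Array.from(elements)
    .map(el => {
      const name = attrs.find(a => el.hasAttribute(a) && el.getAttribute(a));
      return name ? el.getAttribute(name) : null;
    })
    .filter(url => url != null && url.trim() !== "")
    .map(url => (url.startsWith("//") ? prefix + url : url))
    .join("\n");
}

// Save `text` as `filename` by clicking a temporary anchor tag.
// Assumption: a Blob URL is swapped in for the data: URI used in the original snippets.
function downloadText(filename, text) {
  const a = document.createElement("a");
  a.download = filename;
  a.href = URL.createObjectURL(new Blob([text], { type: "text/plain" }));
  a.click();
}

// Example for Google Images, pasted into the browser console:
// downloadText("filename.txt",
//   collectUrls(document.querySelectorAll(".rg_i"), ["data-src", "data-iurl"]));
```

For DuckDuckGo the call would pass ".tile--img__img", ["data-src", "src"] and "https:" as the prefix; for Bing, ".mimg" and ["src"].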
Hope you find it useful!