Duck Duck Go Scraper

j_co · July 25, 2022, 9:42pm

Hi!

Where do the functions for the Duck Duck Go scraper come from? E.g. in this code in Lesson 1:

from fastcore.all import *
import time

def search_images(term, max_images=200):
    url = 'https://duckduckgo.com/'
    res = urlread(url,data={'q':term})
    searchObj = re.search(r'vqd=([\d-]+)\&', res)
    requestUrl = url + 'i.js'
    params = dict(l='us-en', o='json', q=term, vqd=searchObj.group(1), f=',,,', p='1', v7exp='a')
    urls,data = set(),{'next':1}
    while len(urls)<max_images and 'next' in data:
        data = urljson(requestUrl,data=params)
        urls.update(L(data['results']).itemgot('image'))
        requestUrl = url + data['next']
        time.sleep(0.2)
    return L(urls)[:max_images]

I can’t seem to find any functions like urlread or urljson in the fastai package, or online, but I’m interested to know more about how they work. Thanks!

bencoman · July 26, 2022, 2:29am

To answer the general case, go to the org level on GitHub (https://github.com/fastai),
then from there search for “def urlread” (quotes included).

HTH

[Edit: Jeremy’s response below is better. I’m falling back to old habits from other systems.]

ForBo7 · July 26, 2022, 6:16am

I think it’s either in the fastai package or the fastcore package; I don’t remember off the top of my head, though it’s probably the latter.

You can try doing from fastcore import * instead
or
from fastai import *

jeremy · July 26, 2022, 6:18am

Type urlread?? in Jupyter to get the source code and file location of the symbol. (PS: this is in chapter 1 of the book IIRC so be sure to read it!)
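
For anyone reading this outside Jupyter, roughly the same lookup can be done with the standard library's inspect module. This is only a minimal sketch: locate_source is a hypothetical helper name made up for illustration, and the urlread part assumes fastcore is installed (it is skipped otherwise). It only works for objects defined in Python source, not for C builtins.

```python
import inspect
import json


def locate_source(obj):
    """Return (module name, source file path, source text) for a Python-level object.

    `locate_source` is a hypothetical helper, not part of fastai or fastcore.
    """
    module = inspect.getmodule(obj)
    module_name = module.__name__ if module else None
    path = inspect.getsourcefile(obj)   # file where the object is defined
    source = inspect.getsource(obj)     # full source text of the object
    return module_name, path, source


# Demo on a standard-library function so the example runs anywhere.
module_name, path, source = locate_source(json.dumps)
print(module_name)
print(path)
print(source.splitlines()[0])

# Same lookup for urlread, assuming fastcore is installed.
try:
    from fastcore.all import urlread
except ImportError:
    print("fastcore is not installed; skipping the urlread lookup")
else:
    name, file_path, _ = locate_source(urlread)
    print(name, file_path)
```

The printed module name and path tell you which package and file to open, which is the same information the ?? output gives you in a notebook.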

j_co · July 26, 2022, 6:57pm

Thank you! And thanks @bencoman and @ForBo7 too