Duckduckgo search not working

Thanks for your time putting this up. This is exactly what worked for me, and the only method that worked for me essentially. Cheers!

Thanks, this was the only solution that worked for me (or at least first one – I’ll stop digging now :laughing:)

What did not work:

  • Upgrading duckduckgo_search to >= 2.9.4: even 3.3.0 failed with HTTP 403 Forbidden, although the same URL worked in the browser. The cause might be the User-Agent header used in fastai’s util function.
  • Fixing the deprecation warning and using DDGS(), same error in request library call.
  • Restarting the kernel because some caching apparently prevented the usage of the updated/changed library, so using a working library version might still fail (but no version of plain duckduckgo_search fixed my problem).

So thanks again, @maitland :slight_smile:

PS: I did not have to fiddle with the parameter names, max_images worked just fine.

PPS: This is equivalent to what @cmondorf did in his kaggle notebook linked above.

Yes @sebkraemer ! restarting the runtime works, I am using colab.


Anyone know how to fix this?

If anyone is still having issues with duckduckgo on Kaggle (403 Forbidden for me), you can use Google Colab instead. It worked nicely for me.

Hey there, I found a workaround for the duckduckgo search issue.

Here is my notebook that works:

Long story short: there seems to be a problem with version 3.8.5 of the duckduckgo_search library; once I updated to the latest version, 3.9.5, it started working. Note that this requires Python >= 3.9.
On Kaggle, I had to change a setting under Notebook options: Always use latest environment (see screenshot).
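
Since the fix depends on which library release and Python version the notebook is really running, a quick check like the following may help before and after upgrading. This is only a minimal sketch: `installed_version` is a hypothetical helper name, and it assumes Python 3.8 or newer (for `importlib.metadata`) and that the package is registered under the name `duckduckgo_search`.

```python
import sys
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report what the running kernel actually sees.
print("Python:", ".".join(str(part) for part in sys.version_info[:3]))
print("duckduckgo_search:", installed_version("duckduckgo_search"))
```

If the printed version does not change after an upgrade, restarting the kernel (as mentioned earlier in the thread) is probably needed before the new version is picked up.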

Hope that helps!

Hi y’all, as of today, the Client error “403 forbidden” occurs for me no matter what: locally, Paperspace Gradient, Colab, Kaggle … Also the workarounds mentioned above and utilizing the alternative context manager implementation like

def search_images_ddg(term, max_images=200):
    with DDGS() as ddgs:
        results = ddgs.images(keywords=term)
        images = [next(results).get("image") for _ in range(max_images)]
        return L(images)

don’t work anymore or make no difference.

Anybody familiar with the API chiming in on what’s going on and/or a (new) fix would be highly appreciated. I pulled too many hairs trying to get it working on a Friday evening :slightly_smiling_face:

I found this GitHub issue which lists a solution to pass the dictionary {"Accept-Encoding": "gzip, deflate, br"} to the headers parameter of DDGS as follows:

def search_images(term, max_images=200):
    with DDGS(headers = {"Accept-Encoding": "gzip, deflate, br"}) as ddgs:
        results = ddgs.images(keywords=term)
        images = [next(results).get("image") for _ in range(max_images)]
        return L(images)

I tested this in a Kaggle notebook and it seems to work:
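
One caveat with the list comprehension above: it calls `next(results)` exactly `max_images` times, so it raises `StopIteration` if the search yields fewer results than requested. A minimal sketch of a more forgiving variant, assuming `ddgs.images(...)` returns an iterable of dicts with an `"image"` key as in the snippets in this thread; `take_image_urls` is a hypothetical helper name, and the fake generator only stands in for real search results:

```python
from itertools import islice


def take_image_urls(results, max_images=200):
    """Collect the 'image' URLs from at most max_images result dicts."""
    urls = []
    for result in islice(results, max_images):
        url = result.get("image")
        if url:  # skip results that carry no image URL
            urls.append(url)
    return urls


# Stand-in for ddgs.images(...): a generator that yields only three results.
fake_results = ({"image": f"https://example.com/{i}.jpg"} for i in range(3))
print(take_image_urls(fake_results, max_images=200))
```

Inside `search_images`, the comprehension could then be replaced by `L(take_image_urls(results, max_images))`, with `L` coming from fastcore as elsewhere in the thread.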

Can confirm, tested it locally and this solves the issue :+1:

Works perfectly!

Use this: Duckduckgo search not working - #38 by iluu

Yeah, ddg_images is deprecated.
You can replace

from duckduckgo_search import ddg_images

def search_images(term, max_images=30):
    print(f"Searching for '{term}'")
    return L(ddg_images(term, max_results=max_images)).itemgot('image')

with

from duckduckgo_search import DDGS

def search_images(keywords, max_images = 30):
    print(f"Searching for {keywords}")
    return L(DDGS().images(keywords,max_results=max_images)).itemgot('image')

This worked for me. Thank you very much!

Going into the notebook options and following these settings helped me. I also restarted and cleared my cell output if that helps.

Hi all,

I had the same issues with downloading images using Bing/Azure, and I found two versions of the Duckduckgo code. I made things even worse by getting blacklisted by ddg for a whole day :confused:

It seems there is an alternative solution using the Hugging Face image search API.

It looks something like this:

from io import BytesIO

import requests
from PIL import Image, UnidentifiedImageError

SEARCH_URL = "https://huggingface.co/api/experimental/images/search"

def get_image_urls_by_term(search_term: str, count=150):
    params  = {"q": search_term, "license": "public", "imageType": "photo", "count": count}
    response = requests.get(SEARCH_URL, params=params)
    response.raise_for_status()
    response_data = response.json()
    image_urls = [img['thumbnailUrl'] for img in response_data['value']]
    return image_urls


def gen_images_from_urls(urls):
    num_skipped = 0
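    for url in urls:
        response = requests.get(url)
        if response.status_code != 200:
            num_skipped += 1
            continue  # skip failed downloads instead of trying to decode them
        try:
            img = Image.open(BytesIO(response.content))
            yield img
        except UnidentifiedImageError:
            # the response body was not a readable image
            num_skipped += 1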

    print(f"Retrieved {len(urls) - num_skipped} images. Skipped {num_skipped}.")


def urls_to_image_folder(urls, save_directory):
    for i, image in enumerate(gen_images_from_urls(urls)):
        image.save(save_directory / f'{i}.jpg')

I found this snippet here.
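
Note that `urls_to_image_folder` builds file names with `save_directory / f'{i}.jpg'`, so it appears to expect a `pathlib.Path` pointing at a folder that already exists. A minimal sketch of preparing such a folder; `prepare_save_directory` is a hypothetical helper, not part of the snippet above, and the temporary directory is only there so the demo leaves nothing behind:

```python
import tempfile
from pathlib import Path


def prepare_save_directory(path):
    """Return path as a Path object, creating the folder (and parents) if needed."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    return directory


# Demo in a temporary location; in a notebook you would pass your own folder.
with tempfile.TemporaryDirectory() as tmp:
    save_directory = prepare_save_directory(Path(tmp) / "images" / "birds")
    print(save_directory / "0.jpg")
```

The returned path could then be passed as the second argument to `urls_to_image_folder`.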

Thank you for this, Christian!

Helped me immensely.

Regards,
Beau

Not sure if this is the same thing, but in lesson 1, I ran the cell that contains this code:

#NB: `search_images` depends on duckduckgo.com, which doesn't always return correct responses.
#    If you get a JSON error, just try running it again (it may take a couple of tries).
urls = search_images('bird photos', max_images=1)
urls[0]

And got this error output:

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
/tmp/ipykernel_17/2432147335.py in <module>
      1 #NB: `search_images` depends on duckduckgo.com, which doesn't always return correct responses.
      2 #    If you get a JSON error, just try running it again (it may take a couple of tries).
----> 3 urls = search_images('bird photos', max_images=1)
      4 urls[0]

/tmp/ipykernel_17/1717929076.py in search_images(term, max_images)
      4 def search_images(term, max_images=30):
      5     print(f"Searching for '{term}'")
----> 6     return L(ddg_images(term, max_results=max_images)).itemgot('image')

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/compat.py in ddg_images(keywords, region, safesearch, time, size, color, type_image, layout, license_image, max_results, page, output, download)
     80         type_image=type_image,
     81         layout=layout,
---> 82         license_image=license_image,
     83     ):
     84         results.append(r)

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in images(self, keywords, region, safesearch, timelimit, size, color, type_image, layout, license_image)
    403         assert keywords, "keywords is mandatory"
    404 
--> 405         vqd = self._get_vqd(keywords)
    406         assert vqd, "error in getting vqd"
    407 

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in _get_vqd(self, keywords)
     93     def _get_vqd(self, keywords: str) -> Optional[str]:
     94         """Get vqd value for a search query."""
---> 95         resp = self._get_url("POST", "https://duckduckgo.com", data={"q": keywords})
     96         if resp:
     97             for c1, c2 in (

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in _get_url(self, method, url, **kwargs)
     87                 logger.warning(f"_get_url() {url} {type(ex).__name__} {ex}")
     88                 if i >= 2 or "418" in str(ex):
---> 89                     raise ex
     90             sleep(3)
     91         return None

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in _get_url(self, method, url, **kwargs)
     80                 )
     81                 if self._is_500_in_url(str(resp.url)) or resp.status_code == 202:
---> 82                     raise httpx._exceptions.HTTPError("")
     83                 resp.raise_for_status()
     84                 if resp.status_code == 200:

What’s going on? Any help is much appreciated, and thanks in advance!
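
The comment in that cell suggests simply running it again on failure, and the traceback shows the library itself already retrying a few times with a `sleep(3)` in between. For transient failures, that advice can be wrapped in a small helper. A minimal sketch; `call_with_retries` is a hypothetical name, and retries will not help if the server keeps refusing every request (for example a persistent 403):

```python
import time


def call_with_retries(func, attempts=3, delay_seconds=3.0):
    """Call func() until it succeeds or attempts run out, then re-raise the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as error:  # deliberately broad for a sketch
            if attempt == attempts:
                raise
            print(f"Attempt {attempt} failed ({type(error).__name__}); retrying")
            time.sleep(delay_seconds)
```

In the lesson notebook this could be used as `urls = call_with_retries(lambda: search_images('bird photos', max_images=1))`.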

thank you kindly!! that works!!

Thank you this worked for me too!

Hi All - I’ve spent a bit of time in the forums trying various solutions proposed but wasn’t able to resolve my error. I kept encountering the 403 error no matter what I did. I eventually was able to adjust the code using pieces from 2 different solutions.

I amended the first code cell under STEP 1

from duckduckgo_search import ddg_images, DDGS

from fastcore.all import *

def search_images(term, max_images=200):
    with DDGS(headers = {"Accept-Encoding": "gzip, deflate, br"}) as ddgs:
        results = ddgs.images(keywords=term)
        images = [next(results).get("image") for _ in range(max_images)]
        return L(images)

This resolved the error for me when using Kaggle.
