Thanks for taking the time to put this up. This is exactly what worked for me; essentially, it was the only method that did. Cheers!
Thanks, this was the only solution that worked for me (or at least the first one; I’ll stop digging now).
What did not work:
- Upgrading duckduckgo_search to >= 2.9.4 (even 3.3.0): it failed with HTTP 403 Forbidden, although the same URL worked in the browser. It might be the User-Agent header used in fastai’s util function.
- Fixing the deprecation warning and using DDGS(): same error in the request library call.
- Restarting the kernel, since some caching apparently prevented the updated/changed library from being used, so even a working library version might still fail (but no version of plain duckduckgo_search fixed my problem).
So thanks again, @maitland
PS: I did not have to fiddle with the parameter names; max_images worked just fine.
PPS: This is equivalent to what @cmondorf did in his Kaggle notebook linked above.
If anyone is still having issues with DuckDuckGo on Kaggle (403 Forbidden for me), they can use Google Colab instead. It worked nicely for me.
Hey there, I found a workaround for the duckduckgo search issue.
Here is my notebook that works:
Long story short: there seems to be a problem with version 3.8.5 of the duckduckgo_search library; once I updated to the latest version, 3.9.5, it started working. Note that this requires Python >= 3.9.
In Kaggle, there is a setting in the Notebook options that I had to change: "Always use latest environment" (see screenshot).
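If you are not sure which combination your notebook is running, a quick check like the one below may help. It is a minimal sketch: the 3.9.5 and Python 3.9 thresholds are the ones reported in this post, the helper names are my own, and the naive parser assumes plain numeric versions such as "3.9.5" (pre-release suffixes like "b1" are not handled).

```python
import sys

# Thresholds reported in the post above (duckduckgo_search 3.9.5 needs Python >= 3.9)
MIN_DDG_VERSION = "3.9.5"
MIN_PYTHON = (3, 9)

def parse_version(text: str) -> tuple:
    """Turn a plain numeric version such as '3.9.5' into (3, 9, 5)."""
    return tuple(int(part) for part in text.split("."))

def ddg_version_ok(installed: str, minimum: str = MIN_DDG_VERSION) -> bool:
    """True if `installed` is at least `minimum`, compared number by number."""
    return parse_version(installed) >= parse_version(minimum)

def check_environment() -> None:
    if sys.version_info < MIN_PYTHON:
        print(f"Python {sys.version.split()[0]} is older than 3.9; switch to a newer environment first.")
        return
    # importlib.metadata is only available from Python 3.8 onwards
    from importlib.metadata import PackageNotFoundError, version
    try:
        installed = version("duckduckgo_search")
    except PackageNotFoundError:
        print("duckduckgo_search is not installed.")
        return
    status = "OK" if ddg_version_ok(installed) else f"older than {MIN_DDG_VERSION}, upgrade it"
    print(f"duckduckgo_search {installed}: {status}")

check_environment()
```

Comparing tuples rather than strings matters here: as strings, "3.10.0" sorts before "3.9.5", while (3, 10, 0) correctly sorts after (3, 9, 5).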
Hope that helps!
Hi y’all, as of today the client error "403 Forbidden" occurs for me no matter what: locally, on Paperspace Gradient, Colab, Kaggle… The workarounds mentioned above, and using the alternative context-manager implementation like
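```python
from itertools import islice

from duckduckgo_search import DDGS
from fastcore.all import L

def search_images_ddg(term, max_images=200):
    with DDGS() as ddgs:
        results = ddgs.images(keywords=term)
        # islice stops early instead of raising StopIteration when fewer results come back
        images = [r.get("image") for r in islice(results, max_images)]
    return L(images)
```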
don’t work anymore or make no difference.
If anybody familiar with the API could chime in on what’s going on and/or a (new) fix, that would be highly appreciated. I pulled out way too much hair trying to get this working on a Friday evening.
I found this GitHub issue, which lists a solution: pass the dictionary {"Accept-Encoding": "gzip, deflate, br"} to the headers parameter of DDGS, as follows:
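```python
from itertools import islice

from duckduckgo_search import DDGS
from fastcore.all import L

def search_images(term, max_images=200):
    # The Accept-Encoding header is the workaround from the GitHub issue
    with DDGS(headers={"Accept-Encoding": "gzip, deflate, br"}) as ddgs:
        results = ddgs.images(keywords=term)
        # islice stops early instead of raising StopIteration when fewer results come back
        images = [r.get("image") for r in islice(results, max_images)]
    return L(images)
```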
I tested this in a Kaggle notebook and it seems to work:
Can confirm: I tested it locally and this solves the issue.
Works perfectly!
Yeah, ddg_images is deprecated. You can replace
```python
from duckduckgo_search import ddg_images
from fastcore.all import L

def search_images(term, max_images=30):
    print(f"Searching for '{term}'")
    return L(ddg_images(term, max_results=max_images)).itemgot('image')
```
with
```python
from duckduckgo_search import DDGS
from fastcore.all import L

def search_images(keywords, max_images=30):
    print(f"Searching for {keywords}")
    return L(DDGS().images(keywords, max_results=max_images)).itemgot('image')
```
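For anyone not using fastcore, roughly the same extraction can be written in plain Python. This is a sketch assuming each search result is a dict with an "image" key, as the itemgot('image') call above implies; first_image_urls is a hypothetical helper name and the sample results are made up. Unlike itemgot, which would raise on a missing key, it skips entries without an image.

```python
from typing import Iterable, List

def first_image_urls(results: Iterable[dict], max_images: int = 30) -> List[str]:
    """Plain-Python stand-in for L(results).itemgot('image'), capped at max_images.

    Accepts a list or a generator of result dicts and skips any
    entry without an "image" key instead of failing.
    """
    urls: List[str] = []
    if max_images <= 0:
        return urls
    for result in results:
        url = result.get("image")
        if url:
            urls.append(url)
            if len(urls) >= max_images:
                break  # stop pulling from the generator once we have enough
    return urls

# Made-up results in the dict shape the search returns
sample = [
    {"image": "https://example.com/a.jpg"},
    {"title": "entry without an image"},
    {"image": "https://example.com/b.jpg"},
]
print(first_image_urls(sample, max_images=1))  # ['https://example.com/a.jpg']
print(first_image_urls(sample))  # ['https://example.com/a.jpg', 'https://example.com/b.jpg']

# With the real library (needs network access):
# from duckduckgo_search import DDGS
# urls = first_image_urls(DDGS().images("bird photos", max_results=30), max_images=30)
```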
This worked for me. Thank you very much!
Going into the notebook options and applying these settings helped me. I also restarted and cleared my cell output, in case that helps.
Hi all,
I had the same issues downloading images with Bing/Azure, and I found two versions of the DuckDuckGo code. I made things even worse by getting blacklisted by DDG for a whole day.
It seems there is an alternative solution using the Hugging Face image search API. It looks something like this:
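```python
from io import BytesIO
from pathlib import Path

import requests
from PIL import Image, UnidentifiedImageError

SEARCH_URL = "https://huggingface.co/api/experimental/images/search"

def get_image_urls_by_term(search_term: str, count=150):
    # Query the image search endpoint and return the thumbnail URL of each hit
    params = {"q": search_term, "license": "public", "imageType": "photo", "count": count}
    response = requests.get(SEARCH_URL, params=params)
    response.raise_for_status()
    response_data = response.json()
    image_urls = [img['thumbnailUrl'] for img in response_data['value']]
    return image_urls

def gen_images_from_urls(urls):
    # Download each URL and yield it as a PIL image, counting failures
    num_skipped = 0
    for url in urls:
        response = requests.get(url)
        if response.status_code != 200:
            # Don't try to decode an error page as an image
            num_skipped += 1
            continue
        try:
            img = Image.open(BytesIO(response.content))
        except UnidentifiedImageError:
            # The body was not a decodable image
            num_skipped += 1
            continue
        yield img
    print(f"Retrieved {len(urls) - num_skipped} images. Skipped {num_skipped}.")

def urls_to_image_folder(urls, save_directory):
    save_directory = Path(save_directory)
    save_directory.mkdir(parents=True, exist_ok=True)
    for i, image in enumerate(gen_images_from_urls(urls)):
        # JPEG can't store alpha or palette modes, so convert to RGB first
        image.convert("RGB").save(save_directory / f'{i}.jpg')
```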
I found this snippet here.
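To see what get_image_urls_by_term actually requests, and what response shape it expects, both halves can be exercised offline. This sketch assumes the endpoint behaves as the snippet implies (a JSON object whose "value" list holds dicts with a "thumbnailUrl" key); the endpoint is marked experimental and may change, build_search_url and thumbnail_urls are hypothetical helpers, and the sample payload is invented. urlencode should produce the same query string requests builds from a params dict.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://huggingface.co/api/experimental/images/search"

def build_search_url(search_term: str, count: int = 150) -> str:
    """Build the query URL that requests.get(SEARCH_URL, params=...) would hit."""
    params = {"q": search_term, "license": "public", "imageType": "photo", "count": count}
    return f"{SEARCH_URL}?{urlencode(params)}"

def thumbnail_urls(response_data: dict) -> list:
    """Pull thumbnail URLs out of a parsed JSON response, skipping malformed entries."""
    return [img["thumbnailUrl"] for img in response_data.get("value", []) if "thumbnailUrl" in img]

print(build_search_url("grizzly bear", count=5))
# https://huggingface.co/api/experimental/images/search?q=grizzly+bear&license=public&imageType=photo&count=5

# Invented payload in the shape the snippet expects
sample_response = {"value": [{"thumbnailUrl": "https://example.com/t1.jpg"}, {"name": "no thumbnail"}]}
print(thumbnail_urls(sample_response))  # ['https://example.com/t1.jpg']
```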
Thank you for this, Christian!
Helped me immensely.
Regards,
Beau
Not sure if this is the same thing, but in lesson 1, I ran the cell that contains this code:
```python
#NB: `search_images` depends on duckduckgo.com, which doesn't always return correct responses.
# If you get a JSON error, just try running it again (it may take a couple of tries).
urls = search_images('bird photos', max_images=1)
urls[0]
```
And got this error output:
```
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
/tmp/ipykernel_17/2432147335.py in <module>
      1 #NB: `search_images` depends on duckduckgo.com, which doesn't always return correct responses.
      2 # If you get a JSON error, just try running it again (it may take a couple of tries).
----> 3 urls = search_images('bird photos', max_images=1)
      4 urls[0]

/tmp/ipykernel_17/1717929076.py in search_images(term, max_images)
      4 def search_images(term, max_images=30):
      5 print(f"Searching for '{term}'")
----> 6 return L(ddg_images(term, max_results=max_images)).itemgot('image')

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/compat.py in ddg_images(keywords, region, safesearch, time, size, color, type_image, layout, license_image, max_results, page, output, download)
     80 type_image=type_image,
     81 layout=layout,
---> 82 license_image=license_image,
     83 ):
     84 results.append(r)

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in images(self, keywords, region, safesearch, timelimit, size, color, type_image, layout, license_image)
    403 assert keywords, "keywords is mandatory"
    404
--> 405 vqd = self._get_vqd(keywords)
    406 assert vqd, "error in getting vqd"
    407

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in _get_vqd(self, keywords)
     93 def _get_vqd(self, keywords: str) -> Optional[str]:
     94 """Get vqd value for a search query."""
---> 95 resp = self._get_url("POST", "https://duckduckgo.com", data={"q": keywords})
     96 if resp:
     97 for c1, c2 in (

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in _get_url(self, method, url, **kwargs)
     87 logger.warning(f"_get_url() {url} {type(ex).__name__} {ex}")
     88 if i >= 2 or "418" in str(ex):
---> 89 raise ex
     90 sleep(3)
     91 return None

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in _get_url(self, method, url, **kwargs)
     80 )
     81 if self._is_500_in_url(str(resp.url)) or resp.status_code == 202:
---> 82 raise httpx._exceptions.HTTPError("")
     83 resp.raise_for_status()
     84 if resp.status_code == 200:
```
What’s going on? Any help is much appreciated, and thanks in advance!
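Judging from the traceback, the library already retries internally: _get_url sleeps 3 seconds between attempts and re-raises once i >= 2 (the third failure), or immediately if the error mentions 418. If you also want to retry the whole search_images call, as the notebook comment suggests ("just try running it again"), a small wrapper with exponential backoff is one option. This is a generic sketch; retry_with_backoff and its parameters are hypothetical, and since a 403 often means the client is being blocked (one poster here was blacklisted for a whole day), fast repeated retries are more likely to hurt than help.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

def retry_with_backoff(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 3.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying on `retry_on` exceptions with exponentially growing waits.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last exception once `attempts` calls have failed.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on as exc:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed ({type(exc).__name__}); retrying in {delay:.0f}s")
            sleep(delay)
    raise ValueError("attempts must be at least 1")

# Usage (needs network; search_images is the notebook's own function).
# Passing retry_on=(httpx.HTTPError,) would limit retries to the error type seen in the traceback.
# urls = retry_with_backoff(lambda: search_images('bird photos', max_images=1))
```

The sleep parameter is injectable so the backoff schedule can be checked without actually waiting.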
Thank you kindly!! That works!!
Thank you this worked for me too!
Hi all, I’ve spent a bit of time in the forums trying the various proposed solutions but wasn’t able to resolve my error; I kept getting the 403 error no matter what I did. I was eventually able to fix the code by combining pieces from two different solutions.
I amended the first code cell under STEP 1:
```python
from itertools import islice

from duckduckgo_search import DDGS
from fastcore.all import *

def search_images(term, max_images=200):
    with DDGS(headers={"Accept-Encoding": "gzip, deflate, br"}) as ddgs:
        results = ddgs.images(keywords=term)
        # islice stops early instead of raising StopIteration when fewer results come back
        images = [r.get("image") for r in islice(results, max_images)]
    return L(images)
```
This resolved the error for me when using Kaggle.