Yes, @sebkraemer! Restarting the runtime works. I am using Colab.
If anyone is still having issues with DuckDuckGo on Kaggle (403 Forbidden for me), they can use Google Colab instead. It worked nicely for me.
Hey there, I found a workaround for the DuckDuckGo search issue.
Here is my notebook that works:
Long story short: there seems to be a problem with version 3.8.5 of the duckduckgo_search library. Once I updated to the latest version (3.9.5), it started working. Note that this requires Python >= 3.9.
In Kaggle, I also had to change one setting under Notebook options: "Always use latest environment" (see screenshot).
Hope that helps!
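If you want to check whether your notebook actually meets the versions mentioned above before re-running the search, here is a minimal sketch using only the standard library's `importlib.metadata`. The helper names `parse_version` and `ddg_environment_ok` are hypothetical, the thresholds (Python 3.9, duckduckgo_search 3.9.5) are simply the ones from the post, and the naive parser is an assumption that ignores pre-release suffixes. If the check fails, the fix described in the post is to upgrade (e.g. `pip install -U duckduckgo_search`) and, on Kaggle, switch to the latest environment.

```python
import sys
from importlib.metadata import version, PackageNotFoundError

def parse_version(v: str) -> tuple:
    # Naive parser (assumption): keep only the leading digits of each dotted part,
    # so "3.9.5" -> (3, 9, 5) and "4.0.0rc1" -> (4, 0, 0).
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

def ddg_environment_ok(min_python=(3, 9), min_ddg="3.9.5") -> bool:
    """Return True if Python and duckduckgo_search meet the (hypothetical) minimums."""
    if sys.version_info[:2] < min_python:
        return False
    try:
        installed = version("duckduckgo_search")
    except PackageNotFoundError:
        return False  # library not installed at all
    return parse_version(installed) >= parse_version(min_ddg)
```

Running `ddg_environment_ok()` in a cell gives a quick yes/no before you spend time debugging the search itself.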
Hi y’all, as of today I get the client error "403 Forbidden" no matter where I run it: locally, on Paperspace Gradient, on Colab, and on Kaggle. The workarounds mentioned above, as well as the alternative context-manager implementation like
```python
def search_images_ddg(term, max_images=200):
    with DDGS() as ddgs:
        results = ddgs.images(keywords=term)
        images = [next(results).get("image") for _ in range(max_images)]
        return L(images)
```
don’t work anymore or make no difference.
I'd highly appreciate it if anybody familiar with the API could chime in on what's going on and/or suggest a (new) fix. I pulled out too much hair trying to get this working on a Friday evening.
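One separate thing worth noting about the snippet above (it won't fix the 403 itself): `[next(results)... for _ in range(max_images)]` raises `StopIteration` whenever the search yields fewer than `max_images` results. Here is a minimal sketch of a safer way to take at most N items, assuming `results` is an iterator of dicts with an `"image"` key, as the snippets in this thread imply. `take_image_urls` is a hypothetical helper name.

```python
from itertools import islice

def take_image_urls(results, max_images=200):
    """Return up to max_images 'image' URLs from an iterable of result dicts.

    islice simply stops early if the iterator runs out, instead of
    raising StopIteration like repeated next() calls would.
    """
    return [r.get("image") for r in islice(results, max_images)]
```

Inside the `with DDGS() as ddgs:` block you could then write `images = take_image_urls(ddgs.images(keywords=term), max_images)`.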
I found this GitHub issue, which suggests passing the dictionary `{"Accept-Encoding": "gzip, deflate, br"}` to the `headers` parameter of `DDGS`, as follows:
```python
def search_images(term, max_images=200):
    with DDGS(headers={"Accept-Encoding": "gzip, deflate, br"}) as ddgs:
        results = ddgs.images(keywords=term)
        images = [next(results).get("image") for _ in range(max_images)]
        return L(images)
```
I tested this in a Kaggle notebook and it seems to work:
Can confirm: I tested it locally and this solves the issue.
Works perfectly!
Yeah, `ddg_images` is deprecated.
You can replace
```python
from duckduckgo_search import ddg_images

def search_images(term, max_images=30):
    print(f"Searching for '{term}'")
    return L(ddg_images(term, max_results=max_images)).itemgot('image')
```
with
```python
from duckduckgo_search import DDGS

def search_images(keywords, max_images=30):
    print(f"Searching for {keywords}")
    return L(DDGS().images(keywords, max_results=max_images)).itemgot('image')
```
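In both versions, fastcore's `L(...).itemgot('image')` pulls the `'image'` entry out of every search result. For anyone who wants to see what that step does without fastcore, here is a plain-Python sketch; it assumes each result is a dict with an `'image'` key, as the snippets above imply, and `image_urls` is a hypothetical helper name.

```python
from operator import itemgetter

def image_urls(results):
    # Roughly what L(results).itemgot('image') does: index every result dict by 'image'
    return list(map(itemgetter("image"), results))
```

Unlike `.get("image")` in the `DDGS` context-manager snippets, this raises `KeyError` if a result has no `'image'` key, which makes malformed responses easier to spot.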
This worked for me. Thank you very much!
Going into the notebook options and applying these settings helped me. I also restarted and cleared my cell outputs, in case that helps.
Hi all,
I had the same issues downloading images with Bing/Azure, and I found two versions of the DuckDuckGo code. I made things even worse by getting blacklisted by DDG for a whole day.
There seems to be an alternative solution using the Hugging Face image search API.
It looks something like this:
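```python
from io import BytesIO
from pathlib import Path

import requests
from PIL import Image, UnidentifiedImageError

SEARCH_URL = "https://huggingface.co/api/experimental/images/search"

def get_image_urls_by_term(search_term: str, count=150):
    # Ask the search endpoint for public-license photos matching the term
    params = {"q": search_term, "license": "public", "imageType": "photo", "count": count}
    response = requests.get(SEARCH_URL, params=params)
    response.raise_for_status()
    response_data = response.json()
    # Keep the thumbnail URL of every result
    image_urls = [img['thumbnailUrl'] for img in response_data['value']]
    return image_urls

def gen_images_from_urls(urls):
    num_skipped = 0
    for url in urls:
        response = requests.get(url)
        if response.status_code != 200:
            num_skipped += 1
            continue  # don't try to decode a failed download
        try:
            img = Image.open(BytesIO(response.content))
        except UnidentifiedImageError:
            num_skipped += 1  # the response wasn't a readable image
            continue
        yield img
    print(f"Retrieved {len(urls) - num_skipped} images. Skipped {num_skipped}.")

def urls_to_image_folder(urls, save_directory):
    save_directory = Path(save_directory)
    save_directory.mkdir(parents=True, exist_ok=True)
    for i, image in enumerate(gen_images_from_urls(urls)):
        # JPEG can't store alpha or palette modes, so convert to RGB before saving
        image.convert("RGB").save(save_directory / f'{i}.jpg')
```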
I found this snippet here.
Thank you for this, Christian!
It helped me immensely.
Regards,
Beau
Not sure if this is the same thing, but in lesson 1, I ran the cell that contains this code:
```python
#NB: `search_images` depends on duckduckgo.com, which doesn't always return correct responses.
# If you get a JSON error, just try running it again (it may take a couple of tries).
urls = search_images('bird photos', max_images=1)
urls[0]
```
And got this error output:
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
/tmp/ipykernel_17/2432147335.py in <module>
1 #NB: `search_images` depends on duckduckgo.com, which doesn't always return correct responses.
2 # If you get a JSON error, just try running it again (it may take a couple of tries).
----> 3 urls = search_images('bird photos', max_images=1)
4 urls[0]
/tmp/ipykernel_17/1717929076.py in search_images(term, max_images)
4 def search_images(term, max_images=30):
5 print(f"Searching for '{term}'")
----> 6 return L(ddg_images(term, max_results=max_images)).itemgot('image')
/opt/conda/lib/python3.7/site-packages/duckduckgo_search/compat.py in ddg_images(keywords, region, safesearch, time, size, color, type_image, layout, license_image, max_results, page, output, download)
80 type_image=type_image,
81 layout=layout,
---> 82 license_image=license_image,
83 ):
84 results.append(r)
/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in images(self, keywords, region, safesearch, timelimit, size, color, type_image, layout, license_image)
403 assert keywords, "keywords is mandatory"
404
--> 405 vqd = self._get_vqd(keywords)
406 assert vqd, "error in getting vqd"
407
/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in _get_vqd(self, keywords)
93 def _get_vqd(self, keywords: str) -> Optional[str]:
94 """Get vqd value for a search query."""
---> 95 resp = self._get_url("POST", "https://duckduckgo.com", data={"q": keywords})
96 if resp:
97 for c1, c2 in (
/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in _get_url(self, method, url, **kwargs)
87 logger.warning(f"_get_url() {url} {type(ex).__name__} {ex}")
88 if i >= 2 or "418" in str(ex):
---> 89 raise ex
90 sleep(3)
91 return None
/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in _get_url(self, method, url, **kwargs)
80 )
81 if self._is_500_in_url(str(resp.url)) or resp.status_code == 202:
---> 82 raise httpx._exceptions.HTTPError("")
83 resp.raise_for_status()
84 if resp.status_code == 200:
What’s going on? Any help is much appreciated, and thanks in advance!
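The traceback shows that the library's `_get_url` already retries internally and re-raises the error after its third attempt (or immediately on a 418), and the notebook comment suggests simply re-running the cell. If you want to automate that re-run, here is a minimal sketch of a retry-with-exponential-backoff wrapper. `call_with_retries` is a hypothetical helper, the attempt count and delays are assumptions, and retrying will not help if DuckDuckGo is actively blocking your IP (as with the 403 errors reported above).

```python
import time

def call_with_retries(fn, *args, attempts=3, base_delay=3.0,
                      retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs); on a listed exception, wait and try again.

    The wait doubles after each failure (base_delay, 2*base_delay, ...).
    The last exception is re-raised once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)
```

Usage in the lesson 1 cell would look like `urls = call_with_retries(search_images, 'bird photos', max_images=1)`. Narrowing `retry_on` to the specific HTTP error type avoids silently retrying genuine bugs.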
Thank you kindly! That works!
Thank you, this worked for me too!
Hi all, I’ve spent a bit of time in the forums trying the various proposed solutions but wasn’t able to resolve my error: I kept getting the 403 error no matter what I did. I was eventually able to adjust the code using pieces from two different solutions.
I amended the first code cell under STEP 1:
```python
from duckduckgo_search import ddg_images, DDGS
from fastcore.all import *

def search_images(term, max_images=200):
    with DDGS(headers={"Accept-Encoding": "gzip, deflate, br"}) as ddgs:
        results = ddgs.images(keywords=term)
        images = [next(results).get("image") for _ in range(max_images)]
        return L(images)
```
This resolved the error for me when using Kaggle.
This worked for me too. Thanks, @periculo!
The only thing I had to do was swap the double quotes for single quotes and nudge the indentation on a few lines.
```python
from duckduckgo_search import ddg_images, DDGS
from fastcore.all import *

def search_images(term, max_images=200):
    with DDGS(headers={'Accept-Encoding': 'gzip, deflate, br'}) as ddgs:
        results = ddgs.images(keywords=term)
        images = [next(results).get('image') for _ in range(max_images)]
        return L(images)
```
This works for me too! Thank you!