Lesson 1 official topic

Hello, everyone,
I’m studying lesson 1 and got stuck while executing the search_images() block in Kaggle. I made sure the network was turned on, but I still got an HTTPError. Any help would be appreciated.

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
/tmp/ipykernel_17/3262053840.py in <module>
      1 #NB: `search_images` depends on duckduckgo.com, which doesn't always return correct responses.
      2 #    If you get a JSON error, just try running it again (it may take a couple of tries).
----> 3 urls = search_images('bird photos', max_images=1)
      4 urls[0]
      5 print(urls[0])

/tmp/ipykernel_17/1717929076.py in search_images(term, max_images)
      4 def search_images(term, max_images=30):
      5     print(f"Searching for '{term}'")
----> 6     return L(ddg_images(term, max_results=max_images)).itemgot('image')

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/compat.py in ddg_images(keywords, region, safesearch, time, size, color, type_image, layout, license_image, max_results, page, output, download)
     80         type_image=type_image,
     81         layout=layout,
---> 82         license_image=license_image,
     83     ):
     84         results.append(r)

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in images(self, keywords, region, safesearch, timelimit, size, color, type_image, layout, license_image)
    403         assert keywords, "keywords is mandatory"
    404 
--> 405         vqd = self._get_vqd(keywords)
    406         assert vqd, "error in getting vqd"
    407 

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in _get_vqd(self, keywords)
     93     def _get_vqd(self, keywords: str) -> Optional[str]:
     94         """Get vqd value for a search query."""
---> 95         resp = self._get_url("POST", "https://duckduckgo.com", data={"q": keywords})
     96         if resp:
     97             for c1, c2 in (

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in _get_url(self, method, url, **kwargs)
     87                 logger.warning(f"_get_url() {url} {type(ex).__name__} {ex}")
     88                 if i >= 2 or "418" in str(ex):
---> 89                     raise ex
     90             sleep(3)
     91         return None

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in _get_url(self, method, url, **kwargs)
     80                 )
     81                 if self._is_500_in_url(str(resp.url)) or resp.status_code == 202:
---> 82                     raise httpx._exceptions.HTTPError("")
     83                 resp.raise_for_status()
     84                 if resp.status_code == 200:

HTTPError: 

I followed the tips and changed the environment to “Always use latest environment” and the problem was solved.

It worked! Thank you so much for your help. I tried both methods and they both worked :slight_smile:

Just sharing my homework from lesson 1: an art style classifier for paintings.

This was my first time using the fastai libraries, so I followed along with Jeremy - then unpicked the code piece by piece to better understand it and figure out where the different components were coming from (was a really rewarding exercise).

I used 20 different art categories and downloaded 200 images for each as a training dataset.

I used resnet34 and got an error rate of about 40% - I’m not too sure if that’s a success or not? Haha

Hi everyone,

I am stuck about halfway through the first lesson already lol. I won’t give up, so if anyone has any idea how to help, I’d appreciate it.

My problem is this → HTTPStatusError: Client error ‘403 Forbidden’ for url

I was having problems with the search_images function and tried a bunch of things to solve it. I ended up running the code many times, which is why I think I have the 403 error now. I looked it up and it’s supposed to be caused by restricted access via DuckDuckGo.

Any help would be much appreciated, I am brand new and really enjoy this but am frustrated currently haha.
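
A 403 after many rapid reruns can be a sign of rate limiting, though that is only a guess here. Below is a minimal sketch of spacing out retries instead of rerunning the cell by hand. `retry_with_backoff` is a hypothetical helper, not part of fastai or duckduckgo_search, and waiting may not help if the installed search library itself is out of date.

```python
import time


def retry_with_backoff(fn, attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**i seconds, then try again.

    Hypothetical helper for illustration. `sleep` is injectable so the
    function can be exercised without actually waiting.
    """
    last_error = None
    for i in range(attempts):
        try:
            return fn()
        except Exception as err:  # broad on purpose: the exception class varies by library version
            last_error = err
            if i < attempts - 1:
                # Delays grow as 2s, 4s, 8s, ... with the default base_delay.
                sleep(base_delay * 2 ** i)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

In a notebook this could wrap the failing call, for example `urls = retry_with_backoff(lambda: search_images('bird photos', max_images=1))`.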

Just saw a similar question on another topic so I’m posting my potential solution here:

You can also place the number of images argument directly into DDGS().images(query, max_results), rather than at the end of the iterator :slight_smile:
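
A minimal sketch of that suggestion, assuming a duckduckgo_search version whose `DDGS().images` accepts `max_results`. The names `search_image_urls`, `images_fn`, and `fake_images` are hypothetical and only there so the sketch can be tried without network access. It returns a plain list rather than a fastai `L`.

```python
def search_image_urls(term, max_images=30, images_fn=None):
    """Return up to max_images image URLs for a search term."""
    if images_fn is None:
        # Imported lazily so the sketch also runs offline with a stand-in.
        from duckduckgo_search import DDGS
        images_fn = DDGS().images
    # The limit goes straight into the search call instead of slicing afterwards.
    results = images_fn(term, max_results=max_images)
    return [r["image"] for r in results]


def fake_images(term, max_results=None):
    """Offline stand-in that mimics the shape of the search results (assumption)."""
    data = [{"image": f"https://example.com/{term}/{i}.jpg"} for i in range(5)]
    return data[:max_results]
```

Calling `search_image_urls("bird photos", max_images=1)` without `images_fn` would use the real search and needs network access.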

Wow! You’re awesome! Thanks so much for your help, this is a game changer!!

thanks so much for your reply!

Hi all. I’m hitting a wall in the lesson. I’ve searched this thread and I haven’t found the same issue anywhere, so apologies in advance if I have missed something simple.

This is in the ‘grab a few examples’ section. I’ve corrected ‘search_images’ as recent comments have suggested, and the single bird and forest pics are fetched and displayed, but when I run the cell to download images, the notebook throws an error:

I hope the solution is something simple. I appreciate any help!

I resolved it by setting max_images=100 in the search_images function. Anything more than that throws that StopIteration error again.

DDG must have recently reduced the maximum number of image URLs that can be queried per request.
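
The post does not show which line raised StopIteration. One common way to get it, shown here under that assumption, is calling `next()` more times than the search has results. `itertools.islice` simply stops early instead. `fake_results` is a stand-in generator, not the DDG API.

```python
from itertools import islice


def fake_results(available):
    """Stand-in for a search that only has `available` results to give."""
    for i in range(available):
        yield {"image": f"https://example.com/{i}.jpg"}


def take_with_next(results, n):
    # Raises StopIteration as soon as the results run out before n items.
    return [next(results) for _ in range(n)]


def take_with_islice(results, n):
    # Returns however many items exist, up to n, without raising.
    return list(islice(results, n))
```

With only 3 results available, `take_with_next(fake_results(3), 5)` raises StopIteration, while `take_with_islice(fake_results(3), 5)` returns the 3 items that exist.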

Thank you @arikru - that solved it!

Hey I got exactly the same error…

I’m getting “HTTPError” when using the DDG search through ddg_images in the Kaggle notebook. I can’t even get a single image downloaded. This is the error message:

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
/tmp/ipykernel_65/2432147335.py in <module>
      1 #NB: `search_images` depends on duckduckgo.com, which doesn't always return correct responses.
      2 #    If you get a JSON error, just try running it again (it may take a couple of tries).
----> 3 urls = search_images('bird photos', max_images=1)
      4 urls[0]

/tmp/ipykernel_65/1717929076.py in search_images(term, max_images)
      4 def search_images(term, max_images=30):
      5     print(f"Searching for '{term}'")
----> 6     return L(ddg_images(term, max_results=max_images)).itemgot('image')

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/compat.py in ddg_images(keywords, region, safesearch, time, size, color, type_image, layout, license_image, max_results, page, output, download)
     80         type_image=type_image,
     81         layout=layout,
---> 82         license_image=license_image,
     83     ):
     84         results.append(r)

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in images(self, keywords, region, safesearch, timelimit, size, color, type_image, layout, license_image)
    403         assert keywords, "keywords is mandatory"
    404 
--> 405         vqd = self._get_vqd(keywords)
    406         assert vqd, "error in getting vqd"
    407 

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in _get_vqd(self, keywords)
     93     def _get_vqd(self, keywords: str) -> Optional[str]:
     94         """Get vqd value for a search query."""
---> 95         resp = self._get_url("POST", "https://duckduckgo.com", data={"q": keywords})
     96         if resp:
     97             for c1, c2 in (

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in _get_url(self, method, url, **kwargs)
     87                 logger.warning(f"_get_url() {url} {type(ex).__name__} {ex}")
     88                 if i >= 2 or "418" in str(ex):
---> 89                     raise ex
     90             sleep(3)
     91         return None

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in _get_url(self, method, url, **kwargs)
     80                 )
     81                 if self._is_500_in_url(str(resp.url)) or resp.status_code == 202:
---> 82                     raise httpx._exceptions.HTTPError("")
     83                 resp.raise_for_status()
     84                 if resp.status_code == 200:

HTTPError: 

Please can I get the updated discord channel invite link?

Hi!

I’m jumping back here from lesson 3 :slight_smile: I’ve been trying to train my own little model to be able to detect skin pigmentation issues. I have been trying to increase my dataset (that gets downloaded from DDG), however, it seems like I can’t go over a certain point. I seem to only be able to download ±100 images per category (so ±300 total). I set max images to 250, 500, and 1000 (to test), yet the amount of downloaded data stays the same. I don’t think it has anything to do with the number of available images on DDG since I’ve tested it on different queries (unless I’m missing something, please lmk). Does anyone know the reason for this?

Here’s part of my code for reference:

def search_images(term, max_images=1000):
    print(f"Searching for {term}.")
    return L(ddg_images(term, max_results=max_images)).itemgot("image")

skin_status = "melasma", "acne marks", "healthy"
path = Path("skin_concerns")

if not path.exists():
    path.mkdir()
    for o in skin_status:
        dest = (path/o)
        dest.mkdir(exist_ok=True)
        if o == "healthy":
            download_images(dest, urls=search_images(f"{o} skin photo"))
        else:
            download_images(dest, urls=search_images(f"{o} on skin photo"))
        resize_images(path/o, max_size=400, dest=path/o)

ddg_images has been deprecated and replaced with DDGS().images. Not sure if that’s why you are not getting the expected performance, but I replaced the search_images function with the following in Colab and was able to get close to 300 images downloaded per category when setting max_images=300:

def search_images(keywords, max_images=300):
    print(f"Searching for {keywords}")
    return L(DDGS().images(keywords,max_results=max_images)).itemgot('image')

Hi! Thank you so much. I suspected that was the issue, and indeed it was. It seems I can only go up to about 500 images per item, which is definitely better than before! Perhaps there’s some sort of internal limitation I don’t know about. But this is good enough for me :).

Thank you for your help!
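
For anyone comparing numbers like these, here is a small sketch for counting what actually landed on disk per category. `count_images` and the suffix list are assumptions made for illustration. The folder layout (one subfolder per category) follows the code posted earlier in this thread.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever the downloads contain.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}


def count_images(root):
    """Map each category subfolder of `root` to its number of image files."""
    root = Path(root)
    return {
        d.name: sum(
            1 for f in d.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
        for d in sorted(root.iterdir())
        if d.is_dir()
    }
```

For the dataset above this would be called as `count_images("skin_concerns")`.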

I have the exact same issue. Were you able to solve it somehow?

Hi!

Here is my first homework:

There was an issue with the ddg search function; I’ve found a fix for that.
