Lesson 1 official topic

Maybe the DuckDuckGo API changed since then, because it returns 100 results, and when the code tries to parse the “next” key, it isn’t present in the “data” dict (i.e. the JSON response).

KeyError: 'next'

in search_images(term, max_images)
....
---> 20         requestUrl = url + data['next']
data:
{'ads': None, 
'query': 'bird photo', 
'queryEncoded': 'bird%20photo', 
'response_type': 'images', 
'results':{....}}

I should have mentioned that I was trying the notebook on Kaggle. I have just tried it on Colab without any issues, other than having to install fastai and pytorch manually.

Hi!

I’m sad to say I’m stuck at the very first lesson :frowning:
I have no clue why I keep getting this error. I’ve been trying to insert an image to run the textbook example on, but it keeps saying “No such file or directory”. Where is it meant to be accessing the image from? At first I stored the image in a local folder on my desktop; then I tried connecting my Colab notebook to Google Drive and placing the image in the same folder as the notebook. Neither worked. I feel like I’m missing something very, very simple, I just can’t get it. Can anyone help?

Here are a couple of ways to resolve this error:

You can “Upload to session storage” an image titled “Young_cats.jpg”, which will make the rest of the code work. The only downside to this approach is that once you disconnect from your Google Colab runtime, the uploaded image is deleted, and the next time you start a new session you’ll have to upload it again.

Alternatively, you can run the following code to connect to your Google Drive (it’ll ask you to sign-in to Google Drive):

from google.colab import drive

drive.mount('/content/gdrive')

And then you can access Google Drive files with the base path of "/content/gdrive/MyDrive":
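For example, if you uploaded an image named Young_cats.jpg (a hypothetical filename, substitute your own) to the top level of your Drive, building the full path looks like this sketch:

```python
from pathlib import Path

# Base path where Colab exposes a mounted Google Drive.
base = Path("/content/gdrive/MyDrive")

# Hypothetical filename; substitute whatever image you uploaded.
img_path = base / "Young_cats.jpg"
print(img_path)  # /content/gdrive/MyDrive/Young_cats.jpg
```

Once the file exists at that path, you can pass `img_path` to whatever is opening the image in the notebook (e.g. fastai's `PILImage.create`).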

Hello, everyone,
I’m studying lesson 1 and got stuck while executing the search_images() block in Kaggle. I made sure I had the network turned on, but I still got an HTTPError. Any help would be appreciated.

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
/tmp/ipykernel_17/3262053840.py in <module>
      1 #NB: `search_images` depends on duckduckgo.com, which doesn't always return correct responses.
      2 #    If you get a JSON error, just try running it again (it may take a couple of tries).
----> 3 urls = search_images('bird photos', max_images=1)
      4 urls[0]
      5 print(urls[0])

/tmp/ipykernel_17/1717929076.py in search_images(term, max_images)
      4 def search_images(term, max_images=30):
      5     print(f"Searching for '{term}'")
----> 6     return L(ddg_images(term, max_results=max_images)).itemgot('image')

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/compat.py in ddg_images(keywords, region, safesearch, time, size, color, type_image, layout, license_image, max_results, page, output, download)
     80         type_image=type_image,
     81         layout=layout,
---> 82         license_image=license_image,
     83     ):
     84         results.append(r)

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in images(self, keywords, region, safesearch, timelimit, size, color, type_image, layout, license_image)
    403         assert keywords, "keywords is mandatory"
    404 
--> 405         vqd = self._get_vqd(keywords)
    406         assert vqd, "error in getting vqd"
    407 

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in _get_vqd(self, keywords)
     93     def _get_vqd(self, keywords: str) -> Optional[str]:
     94         """Get vqd value for a search query."""
---> 95         resp = self._get_url("POST", "https://duckduckgo.com", data={"q": keywords})
     96         if resp:
     97             for c1, c2 in (

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in _get_url(self, method, url, **kwargs)
     87                 logger.warning(f"_get_url() {url} {type(ex).__name__} {ex}")
     88                 if i >= 2 or "418" in str(ex):
---> 89                     raise ex
     90             sleep(3)
     91         return None

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in _get_url(self, method, url, **kwargs)
     80                 )
     81                 if self._is_500_in_url(str(resp.url)) or resp.status_code == 202:
---> 82                     raise httpx._exceptions.HTTPError("")
     83                 resp.raise_for_status()
     84                 if resp.status_code == 200:

HTTPError: 

I followed the tips and changed the environment to “Always use latest environment” and the problem was solved.

It worked! Thank you so much for your help. I tried both methods and they both worked :slight_smile:

Just sharing my homework from lesson 1: an art style classifier for paintings.

This was my first time using the fastai libraries, so I followed along with Jeremy - then unpicked the code piece by piece to better understand it and figure out where the different components were coming from (was a really rewarding exercise).

I used 20 different art categories and downloaded 200 images for each as a training dataset.

I used resnet34 and got an error rate of about 40% - I’m not too sure if that’s a success or not? Haha

Hi everyone,

I am stuck about halfway through the first lesson already lol. I won’t give up, so if anyone has any idea how to help, I’d appreciate it.

My problem is this → HTTPStatusError: Client error ‘403 Forbidden’ for url

I was having problems with the search_images function and tried a bunch of things to solve it, ending up running the code many times, which is why I think I now have the 403 error. I looked it up and it’s supposedly caused by DuckDuckGo restricting access.

Any help would be much appreciated, I am brand new and really enjoy this but am frustrated currently haha.

Just saw a similar question on another topic so I’m posting my potential solution here:

You can also place the number of images argument directly into DDGS().images(query, max_results), rather than at the end of the iterator :slight_smile:
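As a rough sketch (assuming a recent duckduckgo_search version where the `DDGS` class is available; the API has changed between versions), the call would look like the commented lines below. The helper just pulls the 'image' key out of each result dict, which is where the URL lives:

```python
# With a recent duckduckgo_search (an assumption -- the API has changed
# across versions), the search itself would be:
#   from duckduckgo_search import DDGS
#   results = DDGS().images("bird photos", max_results=30)

def image_urls(results):
    """Extract the image URL from each DDG result dict."""
    return [r["image"] for r in results]

# Fake results standing in for a real DDG response:
sample = [
    {"image": "https://example.com/a.jpg", "title": "a"},
    {"image": "https://example.com/b.jpg", "title": "b"},
]
print(image_urls(sample))  # ['https://example.com/a.jpg', 'https://example.com/b.jpg']
```

Passing `max_results` directly to `images()` lets the library handle the result count, instead of slicing the iterator yourself.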

Wow, you’re awesome! Thanks so much for your help, this is a game changer!!

thanks so much for your reply!

Hi all. I’m hitting a wall in the lesson. I’ve searched this thread and I haven’t found the same issue anywhere, so apologies in advance if I have missed something simple.

This is in the ‘grab a few examples’ section. I’ve corrected ‘search_images’ as recent comments have suggested, and the single bird and forest pics are fetched and displayed, but when I run the cell to download images, the notebook throws an error:

I hope the solution is something simple. I appreciate any help!

I resolved it by setting max_images=100 in the search_images call. Anything more than that throws the StopIteration error again.

DDG must have recently reduced the maximum number of image URLs that can be queried per request …
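If DDG really does cap results at around 100 per request (an observation from this thread, not documented behaviour), a defensive sketch is to clamp whatever count the caller asks for before passing it on to the search:

```python
DDG_CAP = 100  # observed per-request limit; not an official figure

def clamp_max_images(requested, cap=DDG_CAP):
    """Clamp the requested image count to the observed DDG limit."""
    return min(requested, cap)

print(clamp_max_images(500))  # 100
print(clamp_max_images(30))   # 30
```

You would then call e.g. `search_images(term, max_images=clamp_max_images(n))` so that an over-large request no longer triggers the StopIteration.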

Thank you @arikru - that solved it!

Hey I got exactly the same error…

I’m getting “HTTPError” when using the DDG search through ddg_images in the Kaggle notebook. I can’t even get a single image downloaded. This is the error message:

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
/tmp/ipykernel_65/2432147335.py in <module>
      1 #NB: `search_images` depends on duckduckgo.com, which doesn't always return correct responses.
      2 #    If you get a JSON error, just try running it again (it may take a couple of tries).
----> 3 urls = search_images('bird photos', max_images=1)
      4 urls[0]

/tmp/ipykernel_65/1717929076.py in search_images(term, max_images)
      4 def search_images(term, max_images=30):
      5     print(f"Searching for '{term}'")
----> 6     return L(ddg_images(term, max_results=max_images)).itemgot('image')

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/compat.py in ddg_images(keywords, region, safesearch, time, size, color, type_image, layout, license_image, max_results, page, output, download)
     80         type_image=type_image,
     81         layout=layout,
---> 82         license_image=license_image,
     83     ):
     84         results.append(r)

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in images(self, keywords, region, safesearch, timelimit, size, color, type_image, layout, license_image)
    403         assert keywords, "keywords is mandatory"
    404 
--> 405         vqd = self._get_vqd(keywords)
    406         assert vqd, "error in getting vqd"
    407 

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in _get_vqd(self, keywords)
     93     def _get_vqd(self, keywords: str) -> Optional[str]:
     94         """Get vqd value for a search query."""
---> 95         resp = self._get_url("POST", "https://duckduckgo.com", data={"q": keywords})
     96         if resp:
     97             for c1, c2 in (

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in _get_url(self, method, url, **kwargs)
     87                 logger.warning(f"_get_url() {url} {type(ex).__name__} {ex}")
     88                 if i >= 2 or "418" in str(ex):
---> 89                     raise ex
     90             sleep(3)
     91         return None

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in _get_url(self, method, url, **kwargs)
     80                 )
     81                 if self._is_500_in_url(str(resp.url)) or resp.status_code == 202:
---> 82                     raise httpx._exceptions.HTTPError("")
     83                 resp.raise_for_status()
     84                 if resp.status_code == 200:

HTTPError: 

Please can I get the updated Discord channel invite link?

Hi!

I’m jumping back here from lesson 3 :slight_smile: I’ve been trying to train my own little model to detect skin pigmentation issues. I’ve been trying to increase my dataset (which gets downloaded from DDG), but it seems like I can’t get past a certain point: I can only download ±100 images per category (so ±300 total). I set max_images to 250, 500, and 1000 (to test), yet the amount of downloaded data stays the same. I don’t think it has anything to do with the number of images available on DDG, since I’ve tested different queries (unless I’m missing something, please lmk). Does anyone know the reason for this?

Here’s part of my code for reference:

def search_images(term, max_images=1000):
    print(f"Searching for {term}.")
    return L(ddg_images(term, max_results=max_images)).itemgot("image")

skin_status = "melasma", "acne marks", "healthy"
path = Path("skin_concerns")

if not path.exists():
    path.mkdir()
    for o in skin_status:
        dest = (path/o)
        dest.mkdir(exist_ok=True)
        if o == "healthy":
            download_images(dest, urls=search_images(f"{o} skin photo"))
        else:
            download_images(dest, urls=search_images(f"{o} on skin photo"))
        resize_images(path/o, max_size=400, dest=path/o)
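Since each DDG query seems to top out at around 100 URLs (per the earlier posts in this thread), one workaround, sketched below rather than taken from the lesson, is to run several query variants per category and merge the resulting URL lists, dropping duplicates:

```python
def merge_urls(url_lists):
    """Merge URL lists from several query variants, preserving order
    and dropping duplicates."""
    seen, merged = set(), []
    for urls in url_lists:
        for u in urls:
            if u not in seen:
                seen.add(u)
                merged.append(u)
    return merged

# Lists as they might come back from e.g. "melasma on skin photo",
# "melasma face photo", "melasma skin closeup" (hypothetical queries):
a = ["u1", "u2", "u3"]
b = ["u2", "u4"]
print(merge_urls([a, b]))  # ['u1', 'u2', 'u3', 'u4']
```

Each variant can then be fetched with search_images as before, and the merged list passed to download_images, which should push each category well past the per-query cap.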