Lesson 1 official topic

I resolved it by setting max_images=100 in the search_images function. Anything more than that throws that StopIteration error again.

DDG must have recently reduced the maximum number of image URLs that can be queried per request …
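
A minimal sketch of the workaround above, assuming the 100-result ceiling observed in this thread (an empirical value, not a documented DuckDuckGo limit). `capped_search` and `search_fn` are hypothetical names; pass in whatever search function your notebook already defines.

```python
def capped_search(search_fn, term, max_images=30, cap=100):
    """Call search_fn(term, max_images=n) with n clamped to the range 1..cap.

    cap=100 is the value reported to work in this thread, not an official limit.
    """
    n = max(1, min(max_images, cap))
    return search_fn(term, max_images=n)
```

In the notebook this would be called as `capped_search(search_images, 'bird photos', max_images=250)`, which asks for 100 results instead of 250.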

Thank you @arikru - that solved it!

Hey I got exactly the same error…

I’m getting “HTTPError” when using the DDG search through ddg_images in the Kaggle notebook. I can’t even get a single image downloaded. This is the error message:

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
/tmp/ipykernel_65/2432147335.py in <module>
      1 #NB: `search_images` depends on duckduckgo.com, which doesn't always return correct responses.
      2 #    If you get a JSON error, just try running it again (it may take a couple of tries).
----> 3 urls = search_images('bird photos', max_images=1)
      4 urls[0]

/tmp/ipykernel_65/1717929076.py in search_images(term, max_images)
      4 def search_images(term, max_images=30):
      5     print(f"Searching for '{term}'")
----> 6     return L(ddg_images(term, max_results=max_images)).itemgot('image')

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/compat.py in ddg_images(keywords, region, safesearch, time, size, color, type_image, layout, license_image, max_results, page, output, download)
     80         type_image=type_image,
     81         layout=layout,
---> 82         license_image=license_image,
     83     ):
     84         results.append(r)

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in images(self, keywords, region, safesearch, timelimit, size, color, type_image, layout, license_image)
    403         assert keywords, "keywords is mandatory"
    404 
--> 405         vqd = self._get_vqd(keywords)
    406         assert vqd, "error in getting vqd"
    407 

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in _get_vqd(self, keywords)
     93     def _get_vqd(self, keywords: str) -> Optional[str]:
     94         """Get vqd value for a search query."""
---> 95         resp = self._get_url("POST", "https://duckduckgo.com", data={"q": keywords})
     96         if resp:
     97             for c1, c2 in (

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in _get_url(self, method, url, **kwargs)
     87                 logger.warning(f"_get_url() {url} {type(ex).__name__} {ex}")
     88                 if i >= 2 or "418" in str(ex):
---> 89                     raise ex
     90             sleep(3)
     91         return None

/opt/conda/lib/python3.7/site-packages/duckduckgo_search/duckduckgo_search.py in _get_url(self, method, url, **kwargs)
     80                 )
     81                 if self._is_500_in_url(str(resp.url)) or resp.status_code == 202:
---> 82                     raise httpx._exceptions.HTTPError("")
     83                 resp.raise_for_status()
     84                 if resp.status_code == 200:

HTTPError: 
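
The traceback shows the library already retrying a few times before giving up, and the notebook comment suggests simply re-running the cell. Below is a hedged sketch of automating that re-run with a growing delay. `retry_search`, `attempts`, and `base_delay` are illustrative names, and retrying will not help if DuckDuckGo is rejecting requests from the notebook's IP outright.

```python
import time

def retry_search(search_fn, term, attempts=3, base_delay=2.0, sleep=time.sleep, **kwargs):
    """Call search_fn(term, **kwargs), retrying on any exception.

    Waits base_delay, 2*base_delay, ... between attempts and re-raises
    the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return search_fn(term, **kwargs)
        except Exception as ex:  # broad on purpose: the error type varies by library version
            last_error = ex
            if attempt < attempts - 1:
                sleep(base_delay * (attempt + 1))
    raise last_error
```

For example, `retry_search(search_images, 'bird photos', max_images=1)` would try up to three times before surfacing the error.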

Please can I get the updated Discord channel invite link?

Hi!

I’m jumping back here from lesson 3 :) I’ve been trying to train my own little model to detect skin pigmentation issues. I have been trying to enlarge my dataset (which gets downloaded from DDG), but I can’t seem to get past a certain point: I can only download roughly 100 images per category (so about 300 in total). I set max_images to 250, 500, and 1000 (to test), yet the number of downloaded images stays the same. I don’t think it has anything to do with the number of images available on DDG, since I’ve tested it with different queries (unless I’m missing something, in which case please let me know). Does anyone know the reason for this?

Here’s part of my code for reference:

def search_images(term, max_images=1000):
    print(f"Searching for {term}.")
    return L(ddg_images(term, max_results=max_images)).itemgot("image")

skin_status = "melasma", "acne marks", "healthy"
path = Path("skin_concerns")

if not path.exists():
    path.mkdir()
    for o in skin_status:
        dest = (path/o)
        dest.mkdir(exist_ok=True)
        if o == "healthy":
            download_images(dest, urls=search_images(f"{o} skin photo"))
        else:
            download_images(dest, urls=search_images(f"{o} on skin photo"))
        resize_images(path/o, max_size=400, dest=path/o)

ddg_images has been deprecated and replaced with DDGS().images. Not sure if that’s why you are not getting the expected performance, but I replaced the search_images function with the following in Colab and was able to get close to 300 images downloaded per category when setting max_images=300:

def search_images(keywords, max_images=300):
    print(f"Searching for {keywords}")
    return L(DDGS().images(keywords,max_results=max_images)).itemgot('image')

Hi! Thank you so much. I suspected that was the issue, and indeed it was. It seems I can only go up to about 500 images per item, which is definitely better than before! Perhaps there’s some sort of internal limitation I don’t know about. But this is good enough for me :).

Thank you for your help!
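
If a single query tops out at a few hundred results, as described above, one common way to grow a dataset further is to merge several related queries and drop duplicate URLs. This is only a sketch under that assumption: `collect_urls` and the query variants are hypothetical, and `search_fn` stands in for the `search_images` function defined earlier in the thread.

```python
def collect_urls(search_fn, queries, max_images=300):
    """Run each query through search_fn and merge the results,
    keeping the first occurrence of every URL and preserving order."""
    seen = set()
    merged = []
    for query in queries:
        for url in search_fn(query, max_images=max_images):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```

A call such as `collect_urls(search_images, ["melasma on skin photo", "melasma face close-up"])` would return the combined, de-duplicated URL list to pass to `download_images`.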

I have the exact same issue. Were you able to solve it somehow?

Hi!

Here is my first homework:

There was an issue with the DDG search function; I’ve found a fix for that.

IMDB Movie Review - how satisfactory is the result? - How to improve it?

Hello all,
as my very first example, I tried the code on p. 43 of the book to classify my own reviews. I am completely new, so I don’t know what quality to expect from an AWD_LSTM architecture, but I was astonished to find, in response to my request
learn.predict("I don't recommend this movie.")
the answer
('pos', tensor(1), tensor([0.0418, 0.9582]))

Is it expected that the “don’t” does not seem to be interpreted together with “recommend”? Only when I asked for a prediction on a sentence with a clearly negative adjective, like “This movie is boring.”, did I get an actual negative result.

I would like to try a different model (or should I try different settings for the model, or for fine_tune?), but I am unsure how to find one. I noticed that this model is the only one defined in the respective .py file, fastai\text\models\awdlstm.py, and the folder does not contain any other models. I searched docs.fast.ai for AWD_LSTM to locate feasible alternatives but didn’t succeed.

Best regards,
Martin
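
For readers interpreting the output above: `learn.predict` returns the decoded label, its class index, and the per-class probabilities, so 0.9582 is the model's confidence in `pos`. Here is a small sketch with plain Python values standing in for the tensors. `describe_prediction` is a hypothetical helper, and the class order `("neg", "pos")` is an assumption inferred from the output shown.

```python
def describe_prediction(label, index, probs, vocab=("neg", "pos")):
    """Return a readable summary of a (label, index, probabilities) triple."""
    confidence = float(probs[index])
    # Probabilities assigned to every class other than the predicted one.
    others = {name: float(p) for name, p in zip(vocab, probs) if name != label}
    return f"{label} ({confidence:.1%} confident); other classes: {others}"

# Values copied from the post, with tensors replaced by plain numbers.
summary = describe_prediction("pos", 1, [0.0418, 0.9582])
print(summary)
```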

I am facing the same issue. Can anyone help?

I found an answer elsewhere in this forum. To solve it, first add fastbook to your pip install command, then rewrite search_images as:

from fastbook import search_images_ddg

def search_images(term, max_images=30):
    print(f"Searching for '{term}'")
    return search_images_ddg(term, max_images=max_images)

I also had an issue with urlopen not working with DDGS, but I got past it, feel free to have a look here:

Basically, I rewrote download_images to use the Python Requests library, a third-party alternative to urllib.
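
The linked rewrite is not reproduced in this thread, so the following is only a sketch of what a Requests-based downloader could look like, not the poster's actual code. `download_with_requests` and its parameters are hypothetical. It assumes the `requests` package is installed, and it names files by their position in the list, taking the extension from the URL path (defaulting to `.jpg`).

```python
from pathlib import Path
from urllib.parse import urlparse

def download_with_requests(dest, urls, timeout=4, session=None):
    """Download each URL into `dest` and return the list of files written.

    Failed downloads are skipped rather than aborting the whole batch.
    """
    if session is None:
        import requests  # third-party; imported lazily so a custom session can be injected
        session = requests.Session()
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    for i, url in enumerate(urls):
        try:
            resp = session.get(url, timeout=timeout)
            resp.raise_for_status()
        except Exception as ex:
            print(f"Skipping {url}: {ex}")
            continue
        suffix = Path(urlparse(url).path).suffix or ".jpg"
        target = dest / f"{i:05d}{suffix}"
        target.write_bytes(resp.content)
        written.append(target)
    return written
```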

However, further down, I had to address the urllib issue directly. I had to create an unverified SSL context:

import ssl
ssl._create_default_https_context = ssl._create_unverified_context

Note: this turns off HTTPS certificate verification for urllib, so it carries some security risk (a tampered or spoofed download would not be detected). I can make HTTPS calls with the Python Requests library just fine, so I’m not sure why urllib verification was failing. But the code that calls urllib is too deep in the fastai code to work around, and all it is doing is grabbing the resnet18 weights from PyTorch Hub, so if you trust your local network and DNS server I think it’s OK.
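
Because the assignment above changes the default for every later urllib HTTPS call in the process, one way to limit the exposure is to apply it only around the call that needs it and restore the default afterwards. This is a sketch of that idea: `unverified_https` is a hypothetical helper, and it relies on the same private `ssl` attributes as the snippet above, which may change between Python versions.

```python
import ssl
from contextlib import contextmanager

@contextmanager
def unverified_https():
    """Temporarily make urllib's default HTTPS context skip certificate checks."""
    original = ssl._create_default_https_context
    ssl._create_default_https_context = ssl._create_unverified_context
    try:
        yield
    finally:
        # Always restore the previous behaviour, even if the wrapped code raises.
        ssl._create_default_https_context = original
```

The idea is to wrap only the step that downloads the resnet18 weights in `with unverified_https(): ...`, so that every other request in the notebook is still verified.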

I saw that the link to https://course.fast.ai/datasets returns a 404. Is it browsable?

I think you probably saw that link in the previous (2020) version of the course, which is now located here:

This is different from the 2022 version of the course, which I do not believe has a specific URL for a datasets section.

Thank you!

Lesson One Homework:

Does anyone have any working examples of how to use the download_images() function with a text or CSV file in Kaggle? The file would contain the list of URLs of the images I want to use as part of my dataset for the homework.

I’ve tried both. I uploaded them to Kaggle under the notebook’s ‘input data’, and they appear as datasets, with the file there.

searchTerm = 'blonde woman'
path = Path('blondes')
print(path)
dest.mkdir(exist_ok=True, parents=True)
download_images(dest, url_file= '/kaggle/input/brit-csv/brittany_list.csv')

The error that I’m seeing is:

File /opt/conda/lib/python3.10/site-packages/fastai/vision/utils.py:39, in download_images(dest, url_file, urls, max_pics, n_workers, timeout, preserve_filename)
     37 def download_images(dest, url_file=None, urls=None, max_pics=1000, n_workers=8, timeout=4, preserve_filename=False):
     38     "Download images listed in text file `url_file` to path `dest`, at most `max_pics`"
---> 39     if urls is None: urls = url_file.read_text().strip().split("\n")[:max_pics]
     40     dest = Path(dest)
     41     dest.mkdir(exist_ok=True)

AttributeError: 'str' object has no attribute 'read_text'

Any ideas? I think it’s how I’m referencing the file, but not sure.

It’s clear that url_file.read_text() isn’t working, but I don’t know where that .read_text() function is defined so I could further debug.

Instead of having a search give me the dataset, I wanted to reference my own dataset, where the URLs I specifically want to use for images are contained in the CSV.

As a reply to my own question…

It’s because url_file needs to be a Path() object. The documentation wasn’t clear about the required type.

Thanks!
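
To make the fix above concrete: the traceback shows `download_images` calling `url_file.read_text()`, a method that `pathlib.Path` has and `str` does not. The sketch below reproduces just that parsing line with the standard library, so the difference is visible without fastai. `read_url_list` is a hypothetical stand-in, and the file contents are made up for illustration.

```python
from pathlib import Path
import tempfile

def read_url_list(url_file, max_pics=1000):
    """Mimic the parsing step from the traceback: one URL per line."""
    return url_file.read_text().strip().split("\n")[:max_pics]

# Build a throwaway URL list to stand in for the uploaded CSV.
tmp_dir = Path(tempfile.mkdtemp())
url_path = tmp_dir / "urls.csv"
url_path.write_text("https://example.com/a.jpg\nhttps://example.com/b.jpg\n")

urls = read_url_list(url_path)        # works: Path has read_text()

try:
    read_url_list(str(url_path))      # a plain string fails, as in the post
    error = None
except AttributeError as ex:
    error = str(ex)
```

In the notebook, the equivalent change is `download_images(dest, url_file=Path('/kaggle/input/brit-csv/brittany_list.csv'))`. Note that the file is split on newlines only, so a header row or extra CSV columns would be treated as part of a URL.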

Is it necessary to read the book after each lecture? I realize it would be ideal, but I have limited time due to work and wonder whether it is a bit overkill or really necessary.

To anyone who has watched the lectures and read the book: did you find it necessary? Or do you feel the lectures alone are adequate to learn everything, so that you can move on to learning by doing and then to the next lecture?