Beginner: Basics of fastai, PyTorch, numpy, etc ✅

Conwyn · December 28, 2024, 5:11pm

Hi Robert
Try with less data (say 3000, then 6000, then 12000 samples) and note the time for each. A variation of 3-5 minutes seems like a lot. Are you running on your own PC, a shared computer, or some type of paid service? The latter may be applying a throttle or quota system.
Regards, Conwyn
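A minimal sketch of the timing experiment suggested above. `build_dataset` is a hypothetical stand-in for whatever builds your Datasets object; swap in your real code. If preparation scales linearly, doubling the sample count should roughly double the time. A much steeper increase may point to a step that does more work than expected per item.

```python
import time

def build_dataset(items):
    # Hypothetical stand-in for the real preprocessing step (e.g. creating a
    # fastai Datasets object); here it just does some per-item work.
    return [sum(range(x % 100)) for x in items]

def time_at_sizes(build, data, sizes=(3000, 6000, 12000)):
    """Time `build` on the first n items for each n in `sizes`."""
    timings = {}
    for n in sizes:
        start = time.perf_counter()
        build(data[:n])
        timings[n] = time.perf_counter() - start
    return timings

if __name__ == "__main__":
    data = list(range(12000))
    for n, secs in time_at_sizes(build_dataset, data).items():
        print(f"{n:>6} samples: {secs:.3f}s")
```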

rgne · December 29, 2024, 5:22pm

Hey Conwyn,

thanks, decreasing the number of samples definitely helps (for testing purposes, I do that a lot). Actually, the time it takes to process the same data is consistent (sorry if I was not precise about this); the 3-5 minutes referred to different numbers of samples. I am wondering whether 2-3 minutes to prepare the Datasets object is normal. And you are absolutely right, it also depends on the hardware I am using. I use my own computer, and it handles machine learning tasks fairly well (I have an RTX 2070). Maybe it is better to ask my question this way: creating the Datasets object takes about as long as one epoch of training an RNN (which is also subjective, I know). So I guess what I am asking is: how do you know that something is not OK, or could be improved, during the data preparation phase, so that creating the Datasets object is as streamlined as it should be? Or, if the Datasets object gets created at all, is everything fine, and the input data format does not affect performance (e.g. whether the targets are provided as a list or as a tensor)?

Thanks,

Robert
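One general way to find out whether data preparation has a bottleneck is to profile it rather than guess. The sketch below uses Python's standard `cProfile` and `pstats` modules; `tokenize` and `build_items` are hypothetical placeholders for your own per-item transform and dataset-building loop. The report lists the functions with the highest cumulative time, which shows where the 2-3 minutes actually go.

```python
import cProfile
import io
import pstats

def tokenize(text):
    # Hypothetical per-item step; replace with your real transform.
    return text.lower().split()

def build_items(texts):
    # Hypothetical dataset-building loop standing in for Datasets creation.
    return [tokenize(t) for t in texts]

def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()

texts = ["The quick brown fox"] * 10000
items, report = profile_call(build_items, texts)
print(report)
```

If a single conversion (for example, building a tensor per item instead of once for the whole target list) dominates the report, that is a candidate for optimization.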

VarelleSoraya · December 30, 2024, 12:08am

#NB: `search_images` depends on duckduckgo.com, which doesn't always return correct responses.
#    If you get a JSON error, just try running it again (it may take a couple of tries).
urls = search_images('bird photos', max_images=1)
urls[0]

When I try to execute this code from one of the cells in the ‘Is it a bird? Creating a model from your own data’ notebook from the fast.ai course, I get the following error: HTTPStatusError: Client error '403 Forbidden' for url. Does anyone know what I might be doing wrong? Is there a bird images dataset I could point to if I am unable to download bird images? Sorry, I’m a beginner.
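The notebook comment suggests simply re-running the cell when DuckDuckGo returns a bad response. A small retry helper with exponential backoff can automate that. This is a generic sketch, not part of fastai or duckduckgo_search; `with_retries` is a hypothetical name. Note that a 403 means the server refused the request (for example, because of rate limiting), so waiting and retrying may help in some cases but not if the requests are being blocked outright.

```python
import time

def with_retries(func, *args, attempts=3, base_delay=1.0,
                 retry_on=(Exception,), **kwargs):
    """Call func(*args, **kwargs), retrying with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
```

Usage with the notebook's function would look like `urls = with_retries(search_images, 'bird photos', max_images=1)`. Narrowing `retry_on` to the specific exception type you see avoids retrying on genuine bugs.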

VarelleSoraya · December 30, 2024, 12:29am

I re-read the error message and learned that the function used in the notebook is deprecated. I updated the search function to use DDGS().images() as below, but I am still getting the ‘403 Forbidden’ error.

# from duckduckgo_search import ddg_images  (deprecated)
from duckduckgo_search import DDGS
from fastcore.all import *

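def search_images(term, max_images=30):
    print(f"Searching for '{term}'")
    ddgs = DDGS()
    # Pass max_images through as max_results; otherwise it is silently ignored
    return L(ddgs.images(keywords=term, max_results=max_images)).itemgot('image')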

#NB: `search_images` depends on duckduckgo.com, which doesn't always return correct responses.
#    If you get a JSON error, just try running it again (it may take a couple of tries).
urls = search_images('bird photos', max_images=1)
urls[0]

Any insight from the community is appreciated.

Conwyn · December 30, 2024, 6:41pm

Hi
It seems OK on Colab.

Conwyn · December 30, 2024, 6:43pm

VarelleSoraya · January 4, 2025, 2:43am

Hi @Conwyn, thanks for testing my code on your Colab instance and confirming it works. I’m not sure why mine isn’t working. Maybe I’ll try using the %pip cell magic to install the duckduckgo-search module to see if that makes a difference. Thanks again.
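Since the same code works on Colab but not locally, one thing worth comparing is which interpreter the notebook kernel uses and which version of the package is installed there. `%pip install -U duckduckgo-search` installs into the running kernel's environment. The sketch below uses only the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper name. Running it in both environments shows whether they differ, which is one plausible (not certain) cause of the different behavior.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name):
    """Return the installed version of a distribution, or None if missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

# Which Python is this kernel running, and what does it have installed?
print("Python executable:", sys.executable)
print("duckduckgo-search:", installed_version("duckduckgo-search"))
```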