Lesson 1 official topic

Hi there,

I don’t know if this is really the right place to ask this or not.

I’m having issues getting the search_images or search_images_ddg working locally.

I’ve even created a separate notebook to test just the search_images_ddg function, and this doesn’t work either. The issue appears to be with the urlread line.

I’m running a Mac M1, 32GB, Ventura 13.0 with the Conda setup described here

Any help would be really appreciated as I can’t follow along with the exercises right now…it’s been a few hours of trying to get this working and no luck :frowning:

Thanks a lot !

The environment can have a significant impact, and I don’t have a Mac, so I’m of little help.

I could help if you were using one of the cloud services and set your notebook to public viewing. Using another platform would also give you an extra data point for your troubleshooting; it is often very useful to compare a working and a non-working implementation.

It is generally easier to start your DL journey using a cloud service, so you don’t have to deal with platform issues as much.

I have the same configuration as you (M1 MBP, 32GB, Ventura etc.) and I was able to run the search_images_ddg method with no issues locally:

The error screenshot you posted does not show all the error info. There’s a bit at the very end (or the very beginning? I forget which …) which should show the actual error. So if you can post the full error stack, it would be helpful in figuring out what might be going on.

Hey Fahim, really appreciate your help!!

The full error I get back is below…

Would it be possible to share your setup / conda env with me to see what I might have done differently?

---------------------------------------------------------------------------
TimeoutError                              Traceback (most recent call last)
File ~/miniconda3/envs/fastai/lib/python3.10/urllib/request.py:1348, in AbstractHTTPHandler.do_open(self, http_class, req, **http_conn_args)
   1347 try:
-> 1348     h.request(req.get_method(), req.selector, req.data, headers,
   1349               encode_chunked=req.has_header('Transfer-encoding'))
   1350 except OSError as err: # timeout error

File ~/miniconda3/envs/fastai/lib/python3.10/http/client.py:1282, in HTTPConnection.request(self, method, url, body, headers, encode_chunked)
   1281 """Send a complete request to the server."""
-> 1282 self._send_request(method, url, body, headers, encode_chunked)

File ~/miniconda3/envs/fastai/lib/python3.10/http/client.py:1328, in HTTPConnection._send_request(self, method, url, body, headers, encode_chunked)
   1327     body = _encode(body, 'body')
-> 1328 self.endheaders(body, encode_chunked=encode_chunked)

File ~/miniconda3/envs/fastai/lib/python3.10/http/client.py:1277, in HTTPConnection.endheaders(self, message_body, encode_chunked)
   1276     raise CannotSendHeader()
-> 1277 self._send_output(message_body, encode_chunked=encode_chunked)

File ~/miniconda3/envs/fastai/lib/python3.10/http/client.py:1037, in HTTPConnection._send_output(self, message_body, encode_chunked)
   1036 del self._buffer[:]
-> 1037 self.send(msg)
   1039 if message_body is not None:
   1040 
   1041     # create a consistent interface to message_body

File ~/miniconda3/envs/fastai/lib/python3.10/http/client.py:975, in HTTPConnection.send(self, data)
    974 if self.auto_open:
--> 975     self.connect()
    976 else:

File ~/miniconda3/envs/fastai/lib/python3.10/http/client.py:1447, in HTTPSConnection.connect(self)
   1445 "Connect to a host on a given (SSL) port."
-> 1447 super().connect()
   1449 if self._tunnel_host:

File ~/miniconda3/envs/fastai/lib/python3.10/http/client.py:941, in HTTPConnection.connect(self)
    940 sys.audit("http.client.connect", self, self.host, self.port)
--> 941 self.sock = self._create_connection(
    942     (self.host,self.port), self.timeout, self.source_address)
    943 # Might fail in OSs that don't implement TCP_NODELAY

File ~/miniconda3/envs/fastai/lib/python3.10/socket.py:845, in create_connection(address, timeout, source_address)
    844 try:
--> 845     raise err
    846 finally:
    847     # Break explicitly a reference cycle

File ~/miniconda3/envs/fastai/lib/python3.10/socket.py:833, in create_connection(address, timeout, source_address)
    832     sock.bind(source_address)
--> 833 sock.connect(sa)
    834 # Break explicitly a reference cycle

TimeoutError: [Errno 60] Operation timed out

During handling of the above exception, another exception occurred:

URLError                                  Traceback (most recent call last)
Input In [9], in <cell line: 1>()
----> 1 results = search_images_ddg('grizzly bear', max_images=10)
      2 # ims = results.attrgot('contentUrl')
      3 print(len(results), results)

File ~/miniconda3/envs/fastai/lib/python3.10/site-packages/fastbook/__init__.py:57, in search_images_ddg(term, max_images)
     55 assert max_images<1000
     56 url = 'https://duckduckgo.com/'
---> 57 res = urlread(url,data={'q':term})
     58 searchObj = re.search(r'vqd=([\d-]+)\&', res)
     59 assert searchObj

File ~/miniconda3/envs/fastai/lib/python3.10/site-packages/fastcore/net.py:117, in urlread(url, data, headers, decode, return_json, return_headers, timeout, **kwargs)
    115 "Retrieve `url`, using `data` dict or `kwargs` to `POST` if present"
    116 try:
--> 117     with urlopen(url, data=data, headers=headers, timeout=timeout, **kwargs) as u: res,hdrs = u.read(),u.headers
    118 except HTTPError as e:
    119     if 400 <= e.code < 500: raise ExceptionsHTTP[e.code](e.url, e.hdrs, e.fp, msg=e.msg) from None

File ~/miniconda3/envs/fastai/lib/python3.10/site-packages/fastcore/net.py:108, in urlopen(url, data, headers, timeout, **kwargs)
    106     if not isinstance(data, (str,bytes)): data = urlencode(data)
    107     if not isinstance(data, bytes): data = data.encode('ascii')
--> 108 try: return urlopener().open(urlwrap(url, data=data, headers=headers), timeout=timeout)
    109 except HTTPError as e: 
    110     e.msg += f"\n====Error Body====\n{e.read().decode(errors='ignore')}"

File ~/miniconda3/envs/fastai/lib/python3.10/urllib/request.py:519, in OpenerDirector.open(self, fullurl, data, timeout)
    516     req = meth(req)
    518 sys.audit('urllib.Request', req.full_url, req.data, req.headers, req.get_method())
--> 519 response = self._open(req, data)
    521 # post-process response
    522 meth_name = protocol+"_response"

File ~/miniconda3/envs/fastai/lib/python3.10/urllib/request.py:536, in OpenerDirector._open(self, req, data)
    533     return result
    535 protocol = req.type
--> 536 result = self._call_chain(self.handle_open, protocol, protocol +
    537                           '_open', req)
    538 if result:
    539     return result

File ~/miniconda3/envs/fastai/lib/python3.10/urllib/request.py:496, in OpenerDirector._call_chain(self, chain, kind, meth_name, *args)
    494 for handler in handlers:
    495     func = getattr(handler, meth_name)
--> 496     result = func(*args)
    497     if result is not None:
    498         return result

File ~/miniconda3/envs/fastai/lib/python3.10/urllib/request.py:1391, in HTTPSHandler.https_open(self, req)
   1390 def https_open(self, req):
-> 1391     return self.do_open(http.client.HTTPSConnection, req,
   1392         context=self._context, check_hostname=self._check_hostname)

File ~/miniconda3/envs/fastai/lib/python3.10/urllib/request.py:1351, in AbstractHTTPHandler.do_open(self, http_class, req, **http_conn_args)
   1348         h.request(req.get_method(), req.selector, req.data, headers,
   1349                   encode_chunked=req.has_header('Transfer-encoding'))
   1350     except OSError as err: # timeout error
-> 1351         raise URLError(err)
   1352     r = h.getresponse()
   1353 except:

URLError: <urlopen error [Errno 60] Operation timed out>

Hey Sam, happy to help :slight_smile:

My setup won’t help you since your URL query is timing out for some reason — it’s basically just a network error and not anything else, as far as I can tell. This is the relevant bit:

URLError: <urlopen error [Errno 60] Operation timed out>

I believe the URL that the code is trying to connect to is:

https://duckduckgo.com/

The first debugging step you might want to try is to see if you can connect to that URL directly from your browser. If that doesn’t work, the problem is on the network side and you’ll need to do a bit more checking there. But if it does work, then you at least know the connection should generally be fine, and the issue is perhaps somewhere in the Python code.
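
For example, a minimal connectivity check from Python (just an illustrative snippet using the standard library, not the fastai helpers) might look like this:

# Try fetching the same URL that search_images_ddg hits and see whether it also
# times out outside of fastai/fastcore.
from urllib.request import urlopen

try:
    with urlopen('https://duckduckgo.com/', timeout=10) as resp:
        print('Connected, HTTP status:', resp.status)
except Exception as e:
    print('Connection failed:', e)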

Thanks Fahim!

Narrowing it down haha…

Looks like I can’t access duckduckgo.com.

If I restart my router, then duckduckgo opens; however, when I run search_images_ddg, duckduckgo then becomes unresponsive again. I’ve tried this twice now. For some reason, running search_images_ddg stops duckduckgo being accessible on my wifi.

No idea what’s going on here :man_facepalming:

EDIT: I think I’ve found a similar issue - going to try a different tack: Ddg not replying?

Good luck with the issue. That is a strange one …

Generally, the first lesson should work on an M1 device, since I did get it to work after making some changes. That was a few months back and PyTorch was at a different point then. They’ve fixed a bunch of things since then to work on M1 Macs, so hopefully it works better now. If you do run into issues, check the fastai GitHub repo: I opened an issue way back when listing all the changes that had to be made to get things working for lesson 1. Hope that helps!

Thanks Fahim!

I think I’ve discovered the issue → the conda install of fastbook is different from the git install…

However, I’ve set up my env to use conda, so I’m going to have to figure out how to get the git install working.

Note the different underlying code for the util function.

Will check out your GitHub issue to see if you explained how to get your Mac M1 setup working with the fastai library!

Thanks again!

As far as I can tell, what’s on the GitHub repo is version 0.0.19, whereas what you get via pip/conda is 0.0.29. So that would seem to indicate that what you get via pip/conda is the later version … You should be able to simply add the code from the GitHub repo to your Jupyter notebook and call it from within your notebook itself, to see if that version works when the other does not …

That might help you figure out whether the issue is with one version of the code or the other, or something else …
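
For what it’s worth, a quick (purely illustrative) way to check which fastbook your notebook is actually importing:

from importlib.metadata import version
import fastbook

print(version('fastbook'))   # version recorded for the installed package
print(fastbook.__file__)     # the file the import actually resolves to, to spot a stray git checkout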

Hi,
I’m trying to build my own classifier based on my pet cat (named Poopy). I’m running the code from lesson 1 on Kaggle, but the only difference is that I’m not using DuckDuckGo and am instead using my own JPG files. I uploaded a bunch of JPG files and then tried to use the DataBlock, but I’m running into errors:

ValueError: This DataLoader does not contain any batches

Can anyone help me? I’ve been stuck on this for a while.

Thanks!

Also, I know it’s reading data into the datablock, because when I switch it to a dataset, it shows the file:

If you want to, you could make your notebook public and share a link here. It makes debugging easier, especially since there is no obvious mistake in your code snippet, at least that I can see.

Anyways, the dblock in your datasets example doesn’t contain the blocks argument; this could be the reason why your dsets work but the dls won’t.

You can have a detailed look at how the dblock is generated with:

dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    get_y=lambda x: x.name[0] == "I",
    item_tfms=[Resize(192, method='squish')],
)

dblock.summary(path)

Hopefully that gives more detail on what’s going wrong.

Thanks for the response!

Here’s a link to the public notebook:

And actually, I just removed that blocks section in dblock so it would display the filename; otherwise it would display random parameters, but I can see why that’s confusing. I tried the summary, but it still doesn’t make sense to me:

Thanks!

Sure :slight_smile:

I still can’t see anything that shouldn’t work code-wise, so my guess is that the problem is in the data itself.
Your dataset is private, so I can’t check that :frowning: . You could make it public or add me as a contributor (“benkarr” - in the dataset’s Settings tab under Sharing). If you want to keep your Poopy private (:face_with_hand_over_mouth:), one thing I would try is to load all the images, see if there are broken ones, and remove them, e.g.:

# Assumes the usual notebook imports (get_image_files from fastai, Image from PIL)
# and that `path` points at your image folder.
fns = get_image_files(path)
for fn in fns:
    try:
        Image.open(fn)      # try to open each image
    except Exception:
        print(fn)           # print the paths of images that fail to load
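
fastai also has a helper for exactly this; a roughly equivalent sketch (assuming the usual fastai imports and the same path as above) would be:

from pathlib import Path
from fastai.vision.all import get_image_files, verify_images

fns = get_image_files(path)    # collect all image files under `path`
failed = verify_images(fns)    # paths of images that cannot be opened
failed.map(Path.unlink)        # optionally delete the broken files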

@blin17 OK, the issue is that the dataset has 15 images and you use RandomSplitter to set some of them aside for validation, so your training set has fewer than 15 images. Since the batch size is also 15, the dataloader tries to pull 15 images when creating the first batch and fails. Using a batch size of at most (15 * 0.8 =) 12 should work!
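
For example (just a sketch; any bs that fits the 12 training images will do):

dls = dblock.dataloaders(path, bs=8)   # bs <= 12 so a full batch can be drawn from the training split
dls.show_batch()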

Besides that: you can attach your dataset to a Kaggle notebook via the right panel: (+Add Data) > (Your Datasets) > “Poopy Train”, and you don’t have to use the opendatasets library. You can then find the data at ../input/poopy-train.

Also: Poopy!! :heart_eyes_cat: :heart_eyes_cat:

@benkarr haha no, I don’t mind sharing pictures of the Poops -- made you a collaborator on the dataset. But I tried loading every single image in that train folder and it still doesn’t work -- I hand-pasted every file name in the training set into the top cell and the pictures all load fine:

:face_with_peeking_eye:

it works!! Thank you @benkarr! But now it recognizes everything as a Poopy; is that because there’s not enough training data?

I was distracted by Poopy’s cuteness, so I didn’t think of it before :laughing:
Right now (at least it seems so in the notebook/datasets) you only provide data for your “True” label (images of Poopy) during training.
In order to make the model learn to predict “False” for other things, you also have to provide the “other things” to the Learner. Your last screenshot suggests that you already have a ‘notpoopy’ folder, so to keep things simple I would do the following:

Locally, create a single poopy_dataset folder. This folder contains poopy and notpoopy subfolders (we’re going to derive the labels from the folder names); put all the images (so no split into train and valid as in your Kaggle datasets before) into the corresponding folders. The folder structure should look something like:

poopy_dataset
├── notpoopy
│   ├── IMG_2314.jpg <-- images of something else
│   ├── IMG_3142.jpg
│   └── etc…
└── poopy
    ├── IMG_1234.jpg <-- images of Poopy
    ├── IMG_4321.jpg
    └── etc…

Zip poopy_dataset and upload the .zip file to Kaggle, creating a new dataset (Kaggle automatically unzips it). If you use the method I mentioned before to add a dataset to a notebook, you will have the same folder structure as above at path = Path('../input/poopy_dataset') (depending on the name you gave the dataset).

For training, you only have to adjust the dblock in one place, namely the get_y function that generates the labels for each image. You can use parent_label, which fastai provides. It returns the name of the folder each image is contained in as the label, so notpoopy for parent_label('../input/poopy_dataset/notpoopy/IMG_2314.jpg'), for example.
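
As a quick illustration (the path here is just an example):

from fastai.vision.all import parent_label

print(parent_label('../input/poopy_dataset/notpoopy/IMG_2314.jpg'))   # -> notpoopy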

A functioning result might look like this:

from fastai.vision.all import *
path = Path('../input/poopy_dataset')
dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),              # images in, category labels out
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,                              # label = name of the parent folder
    item_tfms=[Resize(192, method='squish')]
)
dls = dblock.dataloaders(path, bs=8)
dls.show_batch()   ## Check if the labels match the images.
learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(3)

Let me know if it worked or if you need any other help :slight_smile:.

Does the “Weights” in the above image refer to the parameters of the model/architecture?

P.S: Image is from Chapter 1 of the book under the section “How Our Image Recognizer Works”.

Yes - the weights of the model are the parameters of the model.
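
For example (just a sketch, assuming learn is a fastai Learner like the one trained earlier in this thread), you can count them directly via PyTorch:

n_params = sum(p.numel() for p in learn.model.parameters())
print(f'The model has {n_params:,} weights (parameters)')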