General course chat

Random tidbit I came across while studying: if you’re trying to clean the data for a vision model (lessons 1 and 2) and following along with the ch-2 production notebook, using

fns = get_image_files(path)

won’t reveal non-image files to your verify_images function later. I haven’t noticed whether this affects the learner (I think the dataloaders know to ignore non-images anyway), but if you write a function to convert images to JPEG, this will cause errors. Instead, using:

fns = get_files(path)

will expose everything in path.
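
To see which files the image-only listing would skip, a stdlib-only sketch like the one below may help. It does not use fastai at all: `split_by_image_type` is a hypothetical helper, and it decides "image or not" purely from the file extension via `mimetypes`. I believe that is similar to how fastai builds its list of image extensions, but treat that as an assumption.

```python
import mimetypes
from pathlib import Path

def split_by_image_type(path):
    """Return (image_files, other_files) for every file under `path`.

    Hypothetical helper: classifies by file extension only, not by content.
    """
    images, others = [], []
    for f in sorted(Path(path).rglob('*')):
        if not f.is_file():
            continue  # skip directories
        mime, _ = mimetypes.guess_type(f.name)
        if mime is not None and mime.startswith('image/'):
            images.append(f)
        else:
            others.append(f)
    return images, others
```

Calling `images, others = split_by_image_type(path)` and printing `others` should show the stray files (HTML pages, text files and so on) that a conversion loop would otherwise trip over.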

This is useful because PIL will complain when your dataloader comes across non-JPEG images (RGBA instead of RGB) when you do your first training. A conversion function:

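    from PIL import Image

    # subdir is a pathlib.Path pointing at one folder of downloaded images
    for fpath in subdir.iterdir():
        if fpath.suffix != '.jpg':
            im = Image.open(fpath)
            im = im.convert('RGB')  # drop the alpha channel so it can be saved as JPEG
            im.save(fpath.with_suffix('.jpg'))
            fpath.unlink()  # delete original file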

will remove this warning later on.

Where Lesson 3 at [28:08] says “this is how we set the seed so that each time I run this I gotta get the same random numbers,” it reminds me of this example of well-commented code from xkcd…
[image: xkcd comic]

Are fastai’s output results reproducible? I am observing that they are not. Any comments on how to produce reproducible results?

Where specifically are you finding that the training process is not reproducible?

Are you using the set_seed function?

To ensure reproducibility when using fastai, you may want to follow the approach outlined in this fastai forum post. It’s arguably one of the hardest things to get right and therefore can be frustrating as heck :wink:

I’ve been using this approach in my work for some time now. Try it out with a small model and small dataset to get things reproducible and take it from there.

Put set_seed(42,True) at the top of your notebook, and restart the notebook kernel each time you run your notebook, and you should get the same results each time.
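
For anyone wondering what seeding actually buys you, here is a tiny stdlib-only sketch of the idea. It uses Python's `random` module rather than fastai, and `noisy_run` is a made-up stand-in for a training run. As far as I know, fastai's set_seed applies the same idea to the `random`, NumPy and PyTorch generators, and passing True as the second argument additionally asks cuDNN to behave deterministically, but check the fastai source before relying on that.

```python
import random

def noisy_run(seed=None):
    """Hypothetical stand-in for a training run: returns three 'random' metrics."""
    if seed is not None:
        random.seed(seed)  # reset the global generator to a known state
    return [round(random.random(), 4) for _ in range(3)]

same_a = noisy_run(seed=42)
same_b = noisy_run(seed=42)  # same seed, so the same sequence again
unseeded = noisy_run()       # continues from wherever the generator is now

print(same_a == same_b)      # the two seeded runs match
print(same_a == unseeded)    # the unseeded run draws different numbers
```

This is also why restarting the kernel matters: the seed only fixes the starting state, so any extra random draws made before your training cell will shift everything that follows.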

This might be a stupid question … but, in what kind of cases/situations would you want to set reproducible=False (the default)?

No, I did not. I will use it now and test it out.

I almost never do reproducible notebooks. The main exception is when I’m creating a lesson, and I want to be able to refer to the exact results in prose. So I don’t want them to change.

But otherwise I really like to see the variation that occurs across runs.

I tried and tested set_seed(42,True) and got the same output each time. My main goal here is to produce reproducible results, since this is for a live hackathon I am participating in. Thanks @jeremy @ilovescience

A bit off topic, but still programming related… this was just too awesome not to share
and actually a bit earlier there was a bit on neural networks [up to 22:00].

What are folks (especially beginners) doing after finishing the course?

  • working through the book
  • revising lessons and trying to improve my “Road To The Top” Paddy Kaggle entry
  • hanging about the forums answering questions, to practice what I’ve learnt and learn more from being corrected when I’m wrong
  • looking for opportunities at work to apply ML

Hi Jeremy,

I couldn’t install fastbook on an M1 chip; an error keeps popping up. Please let me know your suggestion. Thank you!
ImportError: dlopen(/Users/dinglab/mambaforge/lib/python3.9/site-packages/scipy/special/_ufuncs.cpython-39-darwin.so, 0x0002): Library not loaded: @rpath/liblapack.3.dylib
Referenced from: /Users/dinglab/mambaforge/lib/python3.9/site-packages/scipy/special/_ufuncs.cpython-39-darwin.so
Reason: tried: ‘/Users/dinglab/mambaforge/lib/python3.9/site-packages/scipy/special/…/…/…/…/liblapack.3.dylib’ (no such file), ‘/Users/dinglab/mambaforge/lib/python3.9/site-packages/scipy/special/…/…/…/…/liblapack.3.dylib’ (no such file), ‘/Users/dinglab/mambaforge/bin/…/lib/liblapack.3.dylib’ (no such file), ‘/Users/dinglab/mambaforge/bin/…/lib/liblapack.3.dylib’ (no such file), ‘/usr/local/lib/liblapack.3.dylib’ (no such file), ‘/usr/lib/liblapack.3.dylib’ (no such file)

@lingliao please don’t cross-post.

Sure, thanks for the reminder!

Separating images into folders according to labels:
I am using this code to separate the HAM10000 images into folders, but it is showing the error below.
# Sort images into subfolders first
import pandas as pd
import os
import shutil

# Dump all images into a folder and specify the path:
data_dir = os.getcwd() + "/data/all_images/"

# Path to destination directory where we want subfolders
dest_dir = os.getcwd() + "/data/reorganized/"

# Read the csv file containing image names and corresponding labels
skin_df2 = pd.read_csv('data/HAM10000/HAM10000_metadata.csv')
print(skin_df2['dx'].value_counts())

label=skin_df2['dx'].unique().tolist()  #Extract labels into a list
label_images = []


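# Copy images to new folders, one subfolder per label
for i in label:
    # exist_ok=True avoids an error if the folder is already there from an earlier run
    os.makedirs(dest_dir + str(i) + "/", exist_ok=True)
    sample = skin_df2[skin_df2['dx'] == i]['image_id']
    for image_id in sample:
        # data_dir already ends with "/", so no extra separator is needed
        shutil.copyfile(data_dir + image_id + ".jpg", dest_dir + str(i) + "/" + image_id + ".jpg")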

ERROR:

File "C:\Users\Sam\PycharmProjects\pythonProject6\main.py", line 8
    data_dir = os.getcwd() + "/data/all_images/"
    ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

Hi @samely, Since the above error message indicates it occurs on Line 8 I’m confused that you include all that code after Line 8 in your query. Do you think that code may have an impact on the error? Including all that additional code makes it harder to answer you.

Tip: To make it easier to answer you, please trim down your code to a Minimal Example. That is, in your notebook, remove everything after Line 8, and see if the error remains. If the error goes away, restore lines until the error comes back.

Line 8 also seems to rely only on “import os”, and on no other lines prior to Line 8, so delete those preceding lines. I suspect you’ll end up trimming your notebook down to as small as this…

import os
dest_dir = os.getcwd() + "/data/reorganized/"

with the error still occurring. This tiny bit of code has a much smaller “surface area” that is easier to reason about. If you get trimmed down to this…

import os
data_dir = os.getcwd() + "/data/all_images/"
dest_dir = os.getcwd() + "/data/reorganized/"

and the error occurs on line 3, that means line 2 was okay. Since these lines are very similar, a Minimal Example makes it easy to examine closely what differs between the working and non-working lines, which may highlight the cause of the error. At this point, try googling your error message.

p.s. For problems in my own code, I often find that the process of paring down my code to get ready to ask a question on the forum causes the answer to leap out at me. Please follow up with your observations from running your Minimal Example. This will provide good info so readers can try to assist you further.
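
For what it's worth, this particular error message usually comes from a backslash followed by U inside an ordinary string literal, typically a Windows path such as C:\Users\... typed directly in the code. Python reads \U as the start of an 8-digit unicode escape and fails when the digits are missing. The line posted above contains no backslash, so the file actually being run may differ from what was pasted; that part is a guess. A small sketch of the failure and some common ways around it (all paths here are made up for illustration):

```python
from pathlib import Path

# Source text containing a normal (non-raw) string with a Windows path.
# It is held as text and compiled, so this script itself still runs.
bad_source = r'p = "C:\Users\Sam"'

try:
    compile(bad_source, '<example>', 'exec')
    error = None
except SyntaxError as e:
    error = str(e)
print(error)  # shows the unicodeescape message

# Three common ways to write such a path safely (hypothetical path):
raw_path = r"C:\Users\Sam\data"        # raw string: backslashes are literal
escaped_path = "C:\\Users\\Sam\\data"  # each backslash escaped
forward_path = "C:/Users/Sam/data"     # forward slashes also work on Windows

# pathlib joins path pieces without hand-written separators
data_dir = Path.cwd() / "data" / "all_images"
print(data_dir)
```

Running the Minimal Example suggested above should show quickly whether a literal path like this is hiding somewhere in the real file.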

Is there any news or update on Part 2 of the course?
Eagerly waiting :smiley:
@jeremy