Lesson 1 discussion - beginner

(Faris Baker) #22


I cloned fastai onto my computer using:

$ git clone https://github.com/fastai/fastai.git

However, when I run it on my CPU, I receive the following message:

ModuleNotFoundError: No module named 'bcolz'




(Rahul Pathak) #23

You are probably missing some of the dependencies required for this course. Go to the top level of the fastai folder and run the following command, which installs them:

$ pip install -r requirements.txt

(Rahul Pathak) #24

If you are running this for the first time, you also need to install PyTorch. Check pytorch.org and use the installation command that matches your system configuration.

(Vijay Narayanan Parakimeethal) #25

Please run pip install bcolz on your system. Better still, create a conda environment: look for environment.yml in the top-level fastai folder, and if you have Anaconda installed, run

$ conda env create -f environment.yml
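Before reinstalling anything, it can help to check which modules the Python interpreter you are actually using can see. Here is a minimal sketch using the standard library's importlib.util.find_spec; the helper name missing_modules is made up for illustration, and the module list is just the packages mentioned in this thread, not the course's full requirements. Run it with the same interpreter (or Jupyter kernel) that raised the error.

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names that cannot be found by this interpreter."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]

# Packages mentioned in this thread (not the complete requirements list)
print(missing_modules(["bcolz", "torch"]))
```

An empty list means both packages are importable from this environment; otherwise install whatever is listed.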

(naveen manwani) #26

I was wondering why this was used. Could anyone please explain the intuition behind this step?
[Crestle has the datasets required for fast.ai in /datasets, so we’ll create symlinks to the data we want for this competition. (NB: we can’t write to /datasets, but we need a place to store temporary files, so we create our own writable directory to put the symlinks in, and we also take advantage of Crestle’s /cache/ faster temporary storage space.)]

(Jeremy Howard) #27

Running the notebooks on your own computer is an advanced topic. I’d suggest avoiding it if at all possible. If you need to do so, please discuss this on #part1-v2 rather than on the beginner forum.

(Jeremy Howard) #28

Can you explain what you’re asking about here? Which bit is unclear?

(naveen manwani) #29

Actually, @jeremy, I need to know what this part of the code, shown below, does:
os.makedirs('data/dogscats/models', exist_ok=True)

!ln -s /datasets/fast.ai/dogscats/train {PATH}
!ln -s /datasets/fast.ai/dogscats/test {PATH}
!ln -s /datasets/fast.ai/dogscats/valid {PATH}

os.makedirs('/cache/tmp', exist_ok=True)
!ln -fs /cache/tmp {PATH}

I’m asking because I’m new to the concept of symlinks.

Thanks in advance.

(Faris Baker) #30

Thanks for your reply, Jeremy. I will start running the scripts remotely, at least for part 1.

(Faris Baker) #31

In Lesson 1, cell In [5], what does the statement sz=224 mean?

(Jeremy Howard) #32

If you’re running on Crestle, you need to run that code, because Crestle already has the data downloaded for you. The details don’t matter at all; it’s just a little technical detail for this particular platform. If you’re interested in learning more, you could read https://kb.iu.edu/d/abbe
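For anyone curious what a symlink actually is, here is a small Python sketch that mimics the notebook cell using temporary folders. The folder names are hypothetical stand-ins for Crestle's read-only /datasets/fast.ai/dogscats/train and for PATH, and it assumes a Linux or macOS machine where os.symlink works without special privileges. The point is that the link is just a pointer: the files stay where they are and nothing is copied, yet they appear inside our own writable data folder.

```python
import os
import tempfile

# Hypothetical stand-ins: `datasets` plays the role of Crestle's read-only
# /datasets/fast.ai/dogscats/train, and `path` plays the role of PATH
root = tempfile.mkdtemp()
datasets = os.path.join(root, "datasets", "dogscats", "train")
os.makedirs(datasets)
with open(os.path.join(datasets, "cat.1.jpg"), "w") as f:
    f.write("not a real image")

path = os.path.join(root, "data", "dogscats")
os.makedirs(os.path.join(path, "models"), exist_ok=True)  # our own writable folder

# Rough equivalent of `ln -s /datasets/fast.ai/dogscats/train {PATH}`:
# PATH/train becomes a pointer to the original folder; nothing is copied
link = os.path.join(path, "train")
os.symlink(datasets, link)

print(os.path.islink(link))   # True: PATH/train is a link, not a real folder
print(os.listdir(link))       # ['cat.1.jpg'], read through the link
print(os.path.realpath(link) == os.path.realpath(datasets))  # True
```

The last line of the notebook cell, !ln -fs /cache/tmp {PATH}, works the same way: it makes PATH/tmp point at Crestle's faster /cache/tmp folder, and -f tells ln to replace an existing link of that name.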

(Jeremy Howard) #33

It sets a variable called sz equal to the value 224. Later on in the notebook we’ll use that variable to define the size of the images passed to the model.
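To make "the size of the images" concrete, here is a simplified plain-Python sketch. The nearest-neighbour resize is a stand-in chosen for illustration, not the transform fastai actually applies (which also crops and augments); the only thing it shares with the notebook is that every image, whatever its original size, ends up sz by sz pixels before reaching the model.

```python
sz = 224  # same value as in the lesson notebook

def resize_nearest(img, size):
    """Nearest-neighbour resize of a 2-D grid (list of rows) to size x size.
    Simplified stand-in for the real image transforms."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

# A fake 300x500 single-channel "image" whose pixel value is its row index
img = [[r for _ in range(500)] for r in range(300)]
small = resize_nearest(img, sz)
print(len(small), len(small[0]))  # 224 224
print(sz * sz * 3)                # 150528 input values for a 3-channel (RGB) image
```

A larger sz keeps more detail but means more inputs per image, so training is slower and uses more memory.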

(naveen manwani) #34

Thank you, @jeremy.

(Faris Baker) #35

Thank you Jeremy.

(Adil Ansari) #37

Running lesson 1 in an AWS environment and getting this:

(Jeremy Howard) #39

@adilansari you have last year’s version there! Make sure you use the part2 AMI, which has the new version installed.


What do the various folders mean, specifically valid and train? And how would I use them to, say, set up an example that distinguishes hot dogs from hamburgers?

My attempt at understanding is that the model picks some training data and runs on it, then runs this newly trained model against the valid folder. Once it has a perfect model, it runs on the test folder to see whether it is correct or not.

Is this a good understanding?

(Ramesh Sampath) #41

train - Images in this folder are used for training the learner (updating the weights to minimize the training loss on the images in the train folder).

valid - After each epoch, learn.fit() generates predictions for the images in this folder and tells you how much error the model has. This is what you look at to judge whether the model will generalize.

test - Images in this folder are never seen by the model until the final run, or when you use learn.TTA(is_test=True).

sample - Jeremy likes to run the models on a sample of images to do a sanity check and also see if the model is able to learn (train). Sometimes you only need a sample of images to get good results.

models - Probably holds the saved weights; please double-check.

tmp - I am not sure of all the scenarios it’s used for. I think it’s used when you set precompute=True, to save the output of the frozen layers so that the custom layers can be trained quickly and reach a reasonable place before you start to unfreeze the layers of the pre-trained network.

That’s my current understanding; I haven’t dug deep into the models or tmp directories. Hope this is useful.
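The train/valid distinction can be illustrated without any images at all. The toy sketch below fits a one-variable linear model with stochastic gradient descent in plain Python; it is not what learn.fit() does internally, and the data, the 80/20 split, and the learning rate are arbitrary assumptions. What it does show is the division of labour described above: the weights are updated using only the training rows, while the validation rows are merely scored at the end of each epoch.

```python
import random

random.seed(0)
# Toy labelled data, y = 3x + 1 plus noise (a stand-in for labelled images)
data = [(x, 3 * x + 1 + random.gauss(0, 0.1)) for x in [i / 50 for i in range(100)]]
random.shuffle(data)
train, valid = data[:80], data[80:]  # arbitrary 80/20 split

def mse(w, b, rows):
    """Mean squared error of the line w*x + b on the given rows."""
    return sum((w * x + b - y) ** 2 for x, y in rows) / len(rows)

w, b, lr = 0.0, 0.0, 0.1
valid_losses = []
for epoch in range(5):
    # Weights are updated using the training set only
    for x, y in train:
        err = w * x + b - y
        w -= lr * 2 * err * x
        b -= lr * 2 * err
    # The validation set is only scored, never trained on
    valid_losses.append(mse(w, b, valid))
    print(f"epoch {epoch}: train loss {mse(w, b, train):.4f}, "
          f"valid loss {valid_losses[-1]:.4f}")
```

If the training loss kept falling while the validation loss started rising, that would be the over-fitting signal the valid folder exists to catch.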

(Hiromi Suenaga) #42
  • train folder contains labeled data (i.e. you know the correct answers for them) for training.
  • valid folder contains labeled data that is used after every epoch to see how good your model is. These images are not used for actual training and become important when you check for over-fitting/under-fitting etc.
  • test folder contains un-labeled data (i.e. you don’t know the correct answers) which you make predictions against to submit to competitions such as Kaggle’s.

Ha. @ramesh beat me to it :slight_smile:
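For the hot dog vs hamburger question, one way to set things up is to copy your labelled photos into one subfolder per class inside train and valid (the layout the dogscats data uses), plus a test folder for unlabelled images. The sketch below is a hypothetical helper, make_split, not part of fastai; the 20% validation fraction and the fixed seed are arbitrary choices, and empty placeholder files stand in for real photos.

```python
import os
import random
import shutil
import tempfile

def make_split(src_by_class, dest, valid_frac=0.2, seed=42):
    """Copy labelled files into dest/train/<class>/ and dest/valid/<class>/.
    src_by_class maps a class name (e.g. 'hotdog') to a list of file paths."""
    rng = random.Random(seed)
    for label, files in src_by_class.items():
        files = sorted(files)
        rng.shuffle(files)
        n_valid = int(len(files) * valid_frac)
        splits = {"train": files[n_valid:], "valid": files[:n_valid]}
        for split, split_files in splits.items():
            out_dir = os.path.join(dest, split, label)
            os.makedirs(out_dir, exist_ok=True)
            for f in split_files:
                shutil.copy(f, out_dir)
    # test/ holds unlabelled images, so it has no class subfolders
    os.makedirs(os.path.join(dest, "test"), exist_ok=True)

# Demo with empty placeholder files standing in for real photos
src = tempfile.mkdtemp()
src_by_class = {}
for label in ["hotdog", "hamburger"]:
    src_by_class[label] = []
    for i in range(10):
        p = os.path.join(src, f"{label}.{i}.jpg")
        open(p, "w").close()
        src_by_class[label].append(p)

dest = os.path.join(tempfile.mkdtemp(), "hotdogs_burgers")
make_split(src_by_class, dest)
for split in ["train", "valid"]:
    for label in ["hotdog", "hamburger"]:
        # expect 8 train and 2 valid files per class
        print(split, label, len(os.listdir(os.path.join(dest, split, label))))
```

Keeping each image in exactly one of train or valid is what makes the validation error an honest estimate of how the model does on photos it has not seen.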

(Jeremy Howard) #43

This is exactly right! :smiley: