Lesson 1 discussion - beginner

FarisMBaker · November 10, 2017, 10:11am

Hi,

I cloned fastai in my computer using:

$ git clone https://github.com/fastai/fastai.git

However, when I run it on my CPU, I receive the following message:

ModuleNotFoundError: No module named 'bcolz'

Regards,

Faris

rpathak · November 10, 2017, 10:20am

You are probably missing dependencies required for this course. Go to the top level of the fastai folder and run ‘pip install -r requirements.txt’. This will install the dependencies.

rpathak · November 10, 2017, 10:22am

If you are running this for the first time, you also need to install torch. Check pytorch.org and use the installation command that matches your system configuration.

pnvijay · November 10, 2017, 12:03pm

Please run pip install bcolz on your system. Better still, create the environment with conda: in the fastai parent folder, look for environment.yml. If you have Anaconda installed, run conda env create -f environment.yml
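
Before reinstalling anything, it can help to see which modules Python actually fails to find. Here is a minimal sketch using only the standard library. The REQUIRED list holds just the two names mentioned in this thread and is an illustrative assumption, not the full contents of requirements.txt.

```python
import importlib.util

# Module names mentioned in this thread; illustrative only, not the full requirements.txt.
REQUIRED = ["bcolz", "torch"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

Run it with the same Python interpreter (or conda environment) that runs the notebook, since each environment has its own set of installed packages.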

naveenmanwani · November 10, 2017, 3:18pm

Hi,
I was wondering why this was used. Could anyone please explain the intuition behind this step?
[Crestle has the datasets required for fast.ai in /datasets, so we’ll create symlinks to the data we want for this competition. (NB: we can’t write to /datasets, but we need a place to store temporary files, so we create our own writable directory to put the symlinks in, and we also take advantage of Crestle’s /cache/ faster temporary storage space.)]

jeremy · November 10, 2017, 4:43pm

Running the notebooks on your own computer is an advanced topic. I’d suggest avoiding it if at all possible. If you need to do so, please discuss this on #part1-v2 , rather than on the beginner forum.

jeremy · November 10, 2017, 4:44pm

Can you explain what you’re asking about here? Which bit is unclear?

naveenmanwani · November 10, 2017, 4:47pm

Actually @jeremy, I need to know what this part of the code, shown below, does:
os.makedirs('data/dogscats/models', exist_ok=True)

!ln -s /datasets/fast.ai/dogscats/train {PATH}
!ln -s /datasets/fast.ai/dogscats/test {PATH}
!ln -s /datasets/fast.ai/dogscats/valid {PATH}

os.makedirs('/cache/tmp', exist_ok=True)
!ln -fs /cache/tmp {PATH}

I ask because I’m new to the concept of symlinks.

Thanks in advance.

FarisMBaker · November 10, 2017, 5:36pm

Thanks for your reply, Jeremy. I will start by running the scripts remotely, at least for part 1.

FarisMBaker · November 10, 2017, 6:15pm

In Lesson 1, cell ( In [5] ), what does the statement sz=224 mean?

jeremy · November 10, 2017, 8:49pm

If you’re running on Crestle, you need to run that code, because Crestle already has the data downloaded for you. The details don’t matter at all - it’s just a little technical detail for this particular platform. If you’re interested in learning more, you could read https://kb.iu.edu/d/abbe
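
For anyone new to symlinks, here is a minimal sketch of what the ln -s lines do, written with Python's os.symlink instead of the shell. All directory and file names below are made up for illustration and live in a temporary folder, not in Crestle's real /datasets. It assumes Linux or macOS, since creating symlinks on Windows may need extra privileges.

```python
import os
import tempfile


def demo_symlink():
    """Create a directory with one file, link to it, and read the file through the link."""
    with tempfile.TemporaryDirectory() as root:
        # Stand-in for a read-only dataset location (hypothetical path).
        real_dir = os.path.join(root, "datasets", "train")
        os.makedirs(real_dir, exist_ok=True)
        with open(os.path.join(real_dir, "cat.1.txt"), "w") as f:
            f.write("pretend image")

        # Stand-in for the writable working directory (hypothetical path).
        work_dir = os.path.join(root, "data", "dogscats")
        os.makedirs(work_dir, exist_ok=True)

        # Rough equivalent of: ln -s <real_dir> <work_dir>
        link = os.path.join(work_dir, "train")
        os.symlink(real_dir, link)

        return {
            "is_link": os.path.islink(link),
            "files_via_link": os.listdir(link),
            "same_place": os.path.realpath(link) == os.path.realpath(real_dir),
        }


if __name__ == "__main__":
    print(demo_symlink())
```

The link behaves like a shortcut: nothing is copied, but code that reads from the working directory sees the files stored in the original location.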

jeremy · November 10, 2017, 8:50pm

It sets a variable called sz equal to the value 224. Later on in the notebook we’ll use that variable to define the size of the images passed to the model.

naveenmanwani · November 11, 2017, 3:15am

Thank you, @jeremy.

FarisMBaker · November 11, 2017, 5:25am

Thank you Jeremy.

adilansari · November 13, 2017, 12:28am

I am running lesson 1 in an AWS environment and getting this:

jeremy · November 13, 2017, 4:33am

@adilansari you have last year’s version there! Make sure you use the part2 AMI, which has the new version installed.

Judywawira · November 15, 2017, 4:04am

What do the various folders mean, specifically valid and train? And how do I use them to set up an example, say, differentiating hotdogs and hamburgers?

My attempt at understanding is that the model picks some training data and trains on it, and then this newly trained model is run against the valid folder. Once it has a perfect model, it runs on the test folder to see whether the predictions are correct or not.

Is this a good assumption / understanding?

ramesh · November 15, 2017, 4:13am

train - Images in this folder are used for training the learner (updating weights to minimize the training loss based on the images in the train folder).

valid - On each epoch, learn.fit() generates predictions for the images in this folder and tells you how much error the model has. This is what you look at to understand whether the model will generalize.

test - Images in this folder are never seen by the model until the final run or when you use learn.TTA(is_test=True)

sample - Jeremy likes to run the models on a sample of images to do a sanity check and also see if the model is able to learn (train). Sometimes you only need a sample of images to get good results.

models - Probably has the saved weights - please double check

tmp - I am not sure of all the scenarios that it’s used for. I think it’s used when you set precompute=True, to save the frozen layers’ output so the custom layers can be trained quickly to get to a reasonable place before you start to unfreeze layers of the pre-trained network.

This is my current understanding. I have not dug deep into the models or tmp directories. Hope this is useful.

hiromi · November 15, 2017, 4:15am

train folder contains labeled data (i.e. you know the correct answers for them) for training.
valid folder contains labeled data that is used after every epoch to see how good your model is. These images are not used for actual training and become important when you check for over-fitting/under-fitting etc.
test folder contains un-labeled data (i.e. you don’t know the correct answers) which you make predictions against to submit to competitions like those on Kaggle.

Ha. @ramesh beat me to it

jeremy · November 15, 2017, 4:33am

This is exactly right!
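
For the hotdogs vs hamburgers question, here is a rough sketch of how the train and valid folders described above could be built from your own images. It assumes the same layout as the dogscats data, with one subfolder per class inside train and valid. The function name, the 20% validation fraction, and the fake file names are all illustrative assumptions, not part of the fastai library.

```python
import random
import shutil
import tempfile
from pathlib import Path


def split_into_train_valid(src_dir, dest_dir, valid_fraction=0.2, seed=0):
    """Copy src_dir/<class>/* into dest_dir/train/<class>/ and dest_dir/valid/<class>/."""
    src, dest = Path(src_dir), Path(dest_dir)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {}
    for class_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        n_valid = int(len(files) * valid_fraction)
        for i, f in enumerate(files):
            # The first n_valid shuffled files go to valid, the rest to train.
            split = "valid" if i < n_valid else "train"
            target = dest / split / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target / f.name)
            key = (split, class_dir.name)
            counts[key] = counts.get(key, 0) + 1
    return counts


def demo():
    """Build ten fake files per class in a temporary folder and split them."""
    with tempfile.TemporaryDirectory() as root:
        raw = Path(root) / "raw"
        for label in ("hotdogs", "hamburgers"):
            (raw / label).mkdir(parents=True)
            for i in range(10):
                (raw / label / f"{label}.{i}.jpg").write_text("pretend image")
        return split_into_train_valid(raw, Path(root) / "data" / "food")


if __name__ == "__main__":
    print(demo())
```

The key point is that no image ends up in both train and valid, so the validation error reflects images the model did not train on.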