Developer chat

stas · October 28, 2018, 5:53pm

You haven’t followed up on my reply to your question in Developer chat, so I’m not sure what you’re commenting on, @gsg. Why did you expect it to suddenly become different? Why do you want the date in the version? You can look at the fastai/version.py timestamp to see the date if you really need the date.

stas · October 28, 2018, 5:55pm

@gsg is asking to have the date embedded in the version like the pre-release versions of pytorch. But pytorch is only doing that temporarily until 1.0 is released if I understand their strategy correctly. We have already released 1.0, so we are on normal version numbers now.

nok · October 28, 2018, 5:58pm

I cloned my fork and then ran this command. It creates a repo inside my current fork checkout, and then I have to go inside it and check out the new branch. Did I do something wrong here? Where should I run this command? Thanks.

tools/fastai-make-pr-branch

gsg · October 28, 2018, 6:36pm

Yes, I do use the bleeding edge version w/developer install.
Sorry about the confusion, I was referring to Sylvain’s Developer chat post:
"Merged a big change: Learner objects now determine from the loss function…"

I (tried to) upload that merge but the label was still 1.0.15, so was not sure if I did get the newer version (with Sylvain’s) changes or the previous version without them.
I understand that both before and after the merge are not “releases” so both are still 1.0.15… so adding a 1.0.15.20181028 would help differentiate between them.
Just a small bandaid for those at the bleeding-edge…

stas · October 28, 2018, 6:37pm

Thank you for trying the new tool, @nok.

First, to explain how it currently works:

It clones the repo into wherever you are running it from.

It first checks whether the directory you’re in is already a clone of the repo you’re asking for. If it is, it doesn’t clone, but re-uses the current checkout. The logic is to compare the output of:

git config --get remote.origin.url

with the url you are asking for, so for example if I’m inside the original fastai repo, the above command will return:

git@github.com:fastai/fastai.git

but if I’m asking for the fork of the same, which in my case would be git@github.com:stas00/fastai.git, then it can’t reuse that checkout and must make a new one. And so it does.

However if I’m already inside a checkout that matches: git@github.com:stas00/fastai.git and I am invoking tools/fastai-make-pr-branch for the same repo, it will not do a new checkout and use the current one instead.
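
To make the check above concrete, here is a rough Python sketch of the same comparison. The real tool is a bash script, so this is only an illustration: `current_origin_url` and `should_reuse_checkout` are names made up for this example.

```python
import subprocess


def current_origin_url(path="."):
    """Return remote.origin.url as seen from `path`, or None if it is not set."""
    result = subprocess.run(
        ["git", "config", "--get", "remote.origin.url"],
        cwd=path,
        capture_output=True,
        text=True,
    )
    # git exits non-zero when the key is not set (for example outside a checkout)
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


def should_reuse_checkout(current_url, requested_url):
    """Re-use the current directory only if it is already a clone of the requested repo."""
    return current_url is not None and current_url == requested_url
```

With the URLs from this post, comparing git@github.com:fastai/fastai.git against git@github.com:stas00/fastai.git gives False, so a new clone is made. When both sides are the stas00 URL, the current checkout is re-used.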

Now to how we can improve usability. I think the issue is that when you call it from inside the fastai repo, with tools/fastai-make-pr-branch, that’s where it will create the new clone. So ideally it should not be called that way, but as explained here: https://docs-dev.fast.ai/git.html#helper-program

curl -O https://raw.githubusercontent.com/fastai/fastai/master/tools/fastai-make-pr-branch
chmod a+x fastai-make-pr-branch
./fastai-make-pr-branch https your-github-username fastai new-feature

another approach is to position yourself into the base directory you want the clone to happen in:

cd fastai
cd ..
fastai/tools/fastai-make-pr-branch https your-github-username fastai new-feature

or put the script somewhere in your $PATH, so that you could invoke it from anywhere.

or we should instrument it to have an extra argument so that the user can specify where the output should go. So say if you do call it from the fastai checkout folder you could say:

cd fastai
tools/fastai-make-pr-branch https your-github-username fastai new-feature ../put-it-here
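
As a sketch of what the proposed extra argument could look like, here is a hypothetical Python command-line parser. Nothing here exists in the tool today: the argument names (`auth`, `user`, `repo`, `branch`, `dest`) and the default of `.` for the destination are assumptions for illustration.

```python
import argparse


def build_parser():
    """Hypothetical CLI: four required arguments plus an optional destination."""
    parser = argparse.ArgumentParser(prog="fastai-make-pr-branch")
    parser.add_argument("auth", help="how to talk to GitHub, e.g. https")
    parser.add_argument("user", help="your GitHub username")
    parser.add_argument("repo", help="repository name, e.g. fastai")
    parser.add_argument("branch", help="name of the new PR branch")
    # nargs="?" makes the destination optional; "." keeps today's behaviour
    # of cloning into the directory the tool is run from (assumed default)
    parser.add_argument("dest", nargs="?", default=".",
                        help="base directory for the new clone")
    return parser


args = build_parser().parse_args(
    ["https", "your-github-username", "fastai", "new-feature", "../put-it-here"]
)
print(args.dest)
```

Making the destination a trailing optional argument means existing invocations with four arguments keep working unchanged.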

Thoughts?

stas · October 28, 2018, 6:41pm

Thank you for the clarification, @gsg. We already have that mechanism in place. It’s .dev0. If you have 1.0.15 then you’re using a released version, if you now do a dev install you will get 1.0.16.dev0 after git pull - and now you’re on the bleeding edge. You weren’t before.

The timeline is:

...
1.0.14
1.0.15.dev0
1.0.15
1.0.16.dev0
...
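
The ordering in this timeline (a .dev0 version comes before the release with the same number) can be checked with a small sort key. This is a minimal sketch that only handles the X.Y.Z and X.Y.Z.devN forms shown here; `version_key` and `is_dev` are hypothetical helper names, not part of fastai.

```python
import re

_VERSION_RE = re.compile(r"(\d+(?:\.\d+)*)(?:\.dev(\d+))?")


def version_key(version):
    """Sort key that places X.Y.Z.devN before the X.Y.Z release."""
    match = _VERSION_RE.fullmatch(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    dev = match.group(2)
    # (release, 0, N) for X.Y.Z.devN sorts before (release, 1, 0) for X.Y.Z
    return (release, 0, int(dev)) if dev is not None else (release, 1, 0)


def is_dev(version):
    """True for a bleeding-edge (.devN) version string."""
    return version_key(version)[1] == 0


versions = ["1.0.16.dev0", "1.0.15", "1.0.14", "1.0.15.dev0"]
print(sorted(versions, key=version_key))
```

So a checkout reporting 1.0.15 is a released version, while 1.0.16.dev0 is newer and still in development.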

gsg · October 28, 2018, 7:04pm

Thanks for the follow-up @stas.
My understanding now is that whenever we do the developer install

git clone https://github.com/fastai/fastai
cd fastai
tools/run-after-git-clone
pip install -e .[dev]

twice, if fastai.__version__ has not changed between the 2 deployments, then there have been no changes to fastai code.

In the above case, since it stayed the same, e.g., 1.0.15.dev0,
this indicates that the changes that Sylvain announced Sunday morning, were not yet in the latest “bleeding” edge…(or that they were already in the Saturday git pull)
Correct?

stas · October 28, 2018, 7:33pm

See the updated doc here: https://docs-dev.fast.ai/develop.html#development-editable-install

You only need to do it once. After that you just do git pull and nothing else.

If you do git pull right now you will see 1.0.16.dev0, so now you’re on the bleeding edge and all the commits should be there. Please double check that it’s the case.

you can also run:

git log --oneline

in your checkout to see the short log of everything you have. If you want a prettier view:

git log --graph --decorate --pretty=oneline --abbrev-commit

* 4c11bce (HEAD -> master, origin/master, origin/HEAD) improvements:
* 3856935 document version+timeline, adjust levels
*   ffa3f50 Merge branch 'master' of github.com:fastai/fastai
|\
| * 32377a3 new dev cycle: 1.0.16.dev0
| * 0a4e629 CHANGES
| * ef15dda rename create_cnn
* | 67d2ff3 require dev install for PR, plus run-after-git-clone, and split steps
* | b56c79a require coverage for dev, needed for testing on fastai and fastai_docs
* | 2f98955 azure support links
|/
*   14c02c2 Merge branch 'master' of github.com:fastai/fastai
|\
| * 70cb432 Add maybe copy tests (#980)
* | a456e56 Remove model type
|/
* fbd6235 Learner.create_cnn
*   768d606 Merge branch 'master' of github.com:fastai/fastai
|\
| * 2d63ae4 Update CHANGES
| * c037f61 Fix pred_batch
| * 644cb64 create x in cuda for model_sizes() (#990)
* | 01aec14 Learner.create_cnn
|/
* a1ff5c2 Auto activ (#992)
* 7da5bd3 SegmentationDataset classes
* bc255a8 document the issue with missing libcuda.so.1
* 2735255 document gpustat, and nvidia-smi dmon -s u (forum tips)
* bd62fdb add jekyll templates in the package
* 186739f Ensure that plot_pdp accepts an axis name. Fixes #986. (#987)
* ab4a39b Fix saleElapsed vs YearMade interaction plot in ml1/lesson2-rf_interpretation. Fixes #988. (#989)
* 0de3384 move property
*   7d68137 recurse flag

plus, there is CHANGES.md where important changes like bugfixes are logged.
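
If you would rather not scan the log by eye, the plain git log --oneline output can be filtered for a commit subject. A small sketch, assuming the plain format (abbreviated hash, a space, then the subject) without --graph; `parse_oneline_log` and `commits_matching` are names made up for this example.

```python
def parse_oneline_log(log_text):
    """Split `git log --oneline` text into (hash, subject) pairs."""
    commits = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # the first space separates the abbreviated hash from the subject
        commit_hash, _, subject = line.partition(" ")
        commits.append((commit_hash, subject))
    return commits


def commits_matching(log_text, needle):
    """Return the commits whose subject contains `needle`."""
    return [(h, s) for h, s in parse_oneline_log(log_text) if needle in s]


# three entries taken from the log above, without the graph decoration
sample = """32377a3 new dev cycle: 1.0.16.dev0
0a4e629 CHANGES
ef15dda rename create_cnn
"""
print(commits_matching(sample, "new dev cycle"))
```

The text to parse would come from running git log --oneline in your checkout and capturing its output.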

jeremy · October 28, 2018, 7:46pm

Just pushed 1.0.15. Main change (from CHANGES.md):

ConvLearner ctor is replaced by a function called create_cnn

gsg · October 28, 2018, 8:40pm

If you do git pull right now you will see 1.0.16.dev0, so now you’re on the bleeding edge and all the commits should be there. Please double check that it’s the case.

Confirmed!
Bleeding again…
Thanks!!

stas · October 28, 2018, 10:59pm

How do we go about creating a TODO/HELP-WANTED list, and invite others to contribute on tasks that need to be done?

I have one item that is up for grabs: https://docs-dev.fast.ai/git.html#hub (see HELP-WANTED there). It should be a fun little project that shouldn’t take more than a few hours to figure out. I laid out all the details, and it just needs to be coded in Python to support Windows users w/o bash.

gsg · October 28, 2018, 11:45pm

@stas Perhaps a simple way would be to create a “TODO/HELP-WANTED” category in the Forum.
Each entry under the category could follow a template (eg like the bug reports template),
that describes the requirements, such as, in the above example, “Windows Platform”, etc.
Others may then state their interest and even create small groups to tackle the task together as a teaching opportunity…

sgugger · October 29, 2018, 12:29am

New big change: I introduced the data block API. Jeremy will explain it more on Tuesday and I’ll document it tomorrow, but the basic idea is that it lets you plug together the different parts of creating a DataBunch as you want, with a lot more flexibility than the current factory methods. Specifically, you tell it:

where are the filenames (if applicable)
how to determine the label of each input (re pattern, folder names, csv file…)
how to create a validation set (random split, folder names, valid indexes…)
what Dataset function to apply (ImageDataset, ImageMultiDataset, SegmentationDataset…)
transforms to apply (if applicable)
how to databunch it (which is where you tell the batchsize, the dl transforms…)

Examples are in the 104a and 104b notebooks in the dev folder, but here are a few of them:

Pets datasets from lesson 1

path = untar_data(URLs.PETS)
tfms = get_transforms()
data = (InputList.from_folder(path/'images')
        .label_from_re(r'^(.*)_\d+\.jpg$')
        .random_split_by_pct(0.2)
        .datasets(ImageClassificationDataset)
        .transform(tfms, size=224)
        .databunch(bs=64))

Classic dogscats in an Imagenet style folder structure

path = Path('data/dogscats')
tfms = get_transforms()
data = (InputList.from_folder(path)
        .label_from_folder()
        .split_by_folder()
        .datasets(ImageClassificationDataset)
        .transform(tfms, size=224)
        .databunch(bs=64))

Planet dataset (multi-label classification problem with labels in a csv file)

path = untar_data(URLs.PLANET_SAMPLE)
tfms = get_transforms()
data = (InputList.from_folder(path)
        .label_from_csv('labels.csv', sep=' ', suffix='.jpg', folder='train')
        .random_split_by_pct(0.2)
        .datasets(ImageMultiDataset)
        .transform(tfms, size=128)
        .databunch(bs=64))

Camvid (segmentation tasks with segmentation masks in another folder):

path = Path('data/camvid')
path_lbl = path/'labels'  # assumption: folder holding the masks, not given in the post
get_y_fn = lambda x: path_lbl/f'{x.stem}_P{x.suffix}'
codes = np.loadtxt(path/'codes.txt', dtype=str)
tfms = get_transforms()
data = (InputList.from_folder(path/'images')
        .label_from_func(get_y_fn)
        .split_by_fname_file('../valid.txt')
        .datasets(SegmentationDataset, classes=codes)
        .transform(tfms, size=128, tfm_y=True)
        .databunch(bs=64))
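
To show why the steps above can be chained, here is a toy sketch of the pattern in plain Python. It is not fastai's implementation: `ToyItemList` and `from_names` are invented for this example, and only the labelling and splitting steps are imitated.

```python
import random
import re
from pathlib import Path


class ToyItemList:
    """Toy stand-in for the chained API; NOT fastai's implementation."""

    def __init__(self, items):
        self.items = list(items)
        self.labels = None
        self.train, self.valid = None, None

    @classmethod
    def from_names(cls, names):
        return cls(Path(name) for name in names)

    def label_from_re(self, pattern):
        regex = re.compile(pattern)
        labels = []
        for item in self.items:
            match = regex.search(item.name)
            if match is None:
                raise ValueError(f"pattern does not match {item.name!r}")
            labels.append(match.group(1))
        self.labels = labels
        return self  # returning self is what makes the calls chainable

    def random_split_by_pct(self, valid_pct, seed=0):
        pairs = list(zip(self.items, self.labels))
        random.Random(seed).shuffle(pairs)
        cut = int(len(pairs) * valid_pct)
        self.valid, self.train = pairs[:cut], pairs[cut:]
        return self


names = [f"beagle_{i}.jpg" for i in range(5)] + [f"pug_{i}.jpg" for i in range(5)]
data = (ToyItemList.from_names(names)
        .label_from_re(r'^(.*)_\d+\.jpg$')
        .random_split_by_pct(0.2))
print(sorted(set(data.labels)), len(data.train), len(data.valid))
```

Each step only knows about its own job, which is why the label source, the split, and the later steps can be swapped independently.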

argetlam93 · October 29, 2018, 1:25am

Facing the same issue, after the latest pull
NameError: name 'ConvLearner' is not defined

Did you use create_cnn from vision.learner then?

When I used that like:

learn = create_cnn(data, models.resnet34, metrics=error_rate)

I get the following error:

AttributeError: module 'fastai.vision.data' has no attribute 'c'

Kaspar · October 29, 2018, 11:47am

Thanks for the new camvid notebook - so elegant.

running:
train_ds = SegmentationDataset(train_fns, y_train_fns)
valid_ds = SegmentationDataset(valid_fns, y_valid_fns)

i get this:

TypeError Traceback (most recent call last)
in
----> 1 train_ds = SegmentationDataset(train_fns, y_train_fns)
2 valid_ds = SegmentationDataset(valid_fns, y_valid_fns)

TypeError: __init__() missing 1 required positional argument: 'classes'

I believe it is fixed by passing codes as the classes argument:

train_ds = SegmentationDataset(train_fns, y_train_fns, codes)
valid_ds = SegmentationDataset(valid_fns, y_valid_fns, codes)

Also there is a:

learn.unfreezefreeze()

that should be changed to:

learn.unfreeze()

A small suggestion would be to use an explicit mapping from mask pixel values to classes, using a dict with:

key as the classes pixel value in the mask
value as the class
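
A minimal sketch of what such a mapping could look like, using plain Python lists as a stand-in for a mask. The pixel values and class names below are hypothetical and not taken from the camvid codes file; `class_counts` is a helper invented for this example.

```python
from collections import Counter

# hypothetical mapping: pixel value in the mask -> class name
pixel_to_class = {0: "background", 1: "road", 2: "car"}


def class_counts(mask, mapping):
    """Count pixels per class, failing loudly on values missing from the mapping."""
    counts = Counter()
    for row in mask:
        for value in row:
            if value not in mapping:
                raise KeyError(f"mask value {value} has no class in the mapping")
            counts[mapping[value]] += 1
    return dict(counts)


# tiny 2x3 mask, each entry is a pixel value
mask = [[0, 0, 1],
        [1, 1, 2]]
print(class_counts(mask, pixel_to_class))
```

With an explicit dict, a mask value that has no class is reported immediately instead of silently being treated as some position in a list.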

sgugger · October 29, 2018, 1:33pm

You should restart your notebook and make sure you define data somewhere, as Python believes it’s the data module of fastai.vision from your error message.
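
The effect can be reproduced without fastai. In this sketch a dummy module object stands in for the data module that ends up in the namespace (presumably through a star import, which is an assumption here), and `number_of_classes` and `ToyDataBunch` are invented stand-ins for code that expects a data object with a c attribute.

```python
import types


def number_of_classes(data):
    """Stand-in for library code that reads `data.c`."""
    return data.c


# stand-in for the module that the name `data` refers to
# when no data object has been defined in the notebook
data = types.ModuleType("fastai.vision.data")

try:
    number_of_classes(data)
except AttributeError as err:
    message = str(err)
    print(message)


class ToyDataBunch:
    """Hypothetical data object; the value of c is arbitrary."""
    c = 2


# once `data` is rebound to a real data object, the same call works
data = ToyDataBunch()
classes = number_of_classes(data)
print(classes)
```

The first call fails because a module has no c attribute, which is the same kind of message as in the report above.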

jeremy · October 29, 2018, 1:34pm

Thanks - note that this is a work in progress so no need to give feedback until it’s done. It’s not in a working state yet.

nok · October 29, 2018, 2:03pm

Thanks for your detailed explanation, I always learn a lot from the tools/tricks that you share.

For now, it feels natural to call it from within a fastai repo, but I don’t see anything stopping this tool from being used outside fastai. I am not good at bash scripting, but I saw orig_user_name=fastai inside the script, so from my understanding we can simply change this line and use it in other open source projects as well. So putting it in $PATH makes perfect sense.

I will give it a few more tries and come back if I have any questions.

Thanks again.

fredguth · October 29, 2018, 2:55pm

It reminds me of Processing language.

shaun1 · October 29, 2018, 3:55pm

I have the developer version installed with the latest pull. I get the following error when trying to create a cnn (from the docs):

AttributeError: type object 'Learner' has no attribute 'create_cnn'

Any ideas?