Developer chat

See the updated doc here: https://docs-dev.fast.ai/develop.html#development-editable-install

You only need to do it once. After that you just do git pull and nothing else.

If you do git pull right now you will see 1.0.16.dev0, so now you’re on the bleeding edge and all the commits should be there. Please double check that it’s the case.

You can also run:

git log --oneline

in your checkout to see the short log of everything you have. If you want it pretty:

git log --graph --decorate --pretty=oneline --abbrev-commit

* 4c11bce (HEAD -> master, origin/master, origin/HEAD) improvements:
* 3856935 document version+timeline, adjust levels
*   ffa3f50 Merge branch 'master' of github.com:fastai/fastai
|\
| * 32377a3 new dev cycle: 1.0.16.dev0
| * 0a4e629 CHANGES
| * ef15dda rename create_cnn
* | 67d2ff3 require dev install for PR, plus run-after-git-clone, and split steps
* | b56c79a require coverage for dev, needed for testing on fastai and fastai_docs
* | 2f98955 azure support links
|/
*   14c02c2 Merge branch 'master' of github.com:fastai/fastai
|\
| * 70cb432 Add maybe copy tests (#980)
* | a456e56 Remove model type
|/
* fbd6235 Learner.create_cnn
*   768d606 Merge branch 'master' of github.com:fastai/fastai
|\
| * 2d63ae4 Update CHANGES
| * c037f61 Fix pred_batch
| * 644cb64 create x in cuda for model_sizes() (#990)
* | 01aec14 Learner.create_cnn
|/
* a1ff5c2 Auto activ (#992)
* 7da5bd3 SegmentationDataset classes
* bc255a8 document the issue with missing libcuda.so.1
* 2735255 document gpustat, and nvidia-smi dmon -s u (forum tips)
* bd62fdb add jekyll templates in the package
* 186739f Ensure that plot_pdp accepts an axis name. Fixes #986. (#987)
* ab4a39b Fix saleElapsed vs YearMade interaction plot in ml1/lesson2-rf_interpretation. Fixes #988. (#989)
* 0de3384 move property
*   7d68137 recurse flag

Plus, there is CHANGES.md, where important changes like bug fixes are logged.

Just pushed 1.0.15. Main change (from CHANGES.md):

ConvLearner ctor is replaced by a function called create_cnn
If you do git pull right now you will see 1.0.16.dev0, so now you’re on the bleeding edge and all the commits should be there. Please double check that it’s the case.

Confirmed!
Bleeding again… :slight_smile:
Thanks!!


How do we go about creating a TODO/HELP-WANTED list, and invite others to contribute on tasks that need to be done?

I have one item that is up for grabs: https://docs-dev.fast.ai/git.html#hub (see HELP-WANTED there). It should be a fun little project, shouldn’t take more than a few hours to figure out. I laid out all the details, and it just needs to be coded in python to support windows users w/o bash.

@stas Perhaps a simple way would be to create a “TODO/HELP-WANTED” category in the Forum.
Each entry under the category could follow a template (eg like the bug reports template),
that describes the requirements, such as, in the above example, “Windows Platform”, etc.
Others may then state their interest and even create small groups to tackle the task together as a teaching opportunity…

New big change: we introduced the data block API. Jeremy will explain it more on Tuesday and I’ll document it tomorrow, but the basic idea is that it lets you plug together the different parts of creating a DataBunch as you want, with a lot more flexibility than the current factory methods. Specifically, you tell it:

  • where the filenames are (if applicable)
  • how to determine the label of each input (re pattern, folder names, csv file…)
  • how to create a validation set (random split, folder names, valid indexes…)
  • which Dataset function to apply (ImageDataset, ImageMultiDataset, SegmentationDataset…)
  • which transforms to apply (if applicable)
  • how to databunch it (which is where you set the batch size, the dl transforms…)

Examples are in the 104a and 104b notebooks in the dev folder, but here are a few of them:

Pets datasets from lesson 1

path = untar_data(URLs.PETS)
tfms = get_transforms()
data = (InputList.from_folder(path/'images')
        .label_from_re(r'^(.*)_\d+.jpg$')
        .random_split_by_pct(0.2)
        .datasets(ImageClassificationDataset)
        .transform(tfms, size=224)
        .databunch(bs=64))
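For intuition, label_from_re pulls the class out of each filename with the regex shown above. Here is a minimal pure-Python sketch of that extraction (the filenames are made up, and this is not the fastai internals, just the same regex at work):

```python
import re

# the same pattern used in the Pets example above
pat = r'^(.*)_\d+.jpg$'

def label_of(fname):
    # e.g. 'german_shorthaired_105.jpg' -> 'german_shorthaired'
    return re.search(pat, fname).group(1)

print(label_of('german_shorthaired_105.jpg'))  # german_shorthaired
print(label_of('Abyssinian_12.jpg'))           # Abyssinian
```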

Classic dogscats in an Imagenet style folder structure

path = Path('data/dogscats')
tfms = get_transforms()
data = (InputList.from_folder(path)
        .label_from_folder()
        .split_by_folder()
        .datasets(ImageClassificationDataset)
        .transform(tfms, size=224)
        .databunch(bs=64))

Planet dataset (multiclassification problem with labels in a csv file)

path = untar_data(URLs.PLANET_SAMPLE)
tfms = get_transforms()
data = (InputList.from_folder(path)
        .label_from_csv('labels.csv', sep=' ', suffix='.jpg', folder='train')
        .random_split_by_pct(0.2)
        .datasets(ImageMultiDataset)
        .transform(tfms, size=128)
        .databunch(bs=64))
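As a rough sketch of what label_from_csv does with sep=' ' and suffix='.jpg': each row pairs an image name with a space-separated list of tags, the suffix completes the filename, and the separator splits the cell into multiple labels. The row below is hypothetical, not actual fastai internals:

```python
# hypothetical row from labels.csv: image name and space-separated tags
row = ('train_0', 'haze primary')

fname, tags = row
fname = fname + '.jpg'    # suffix='.jpg' completes the filename
labels = tags.split(' ')  # sep=' ' turns the cell into a list of labels

print(fname)   # train_0.jpg
print(labels)  # ['haze', 'primary']
```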

Camvid (segmentation tasks with segmentation masks in another folder):

path = Path('data/camvid')
path_lbl = path/'labels'  # assuming the masks live in a 'labels' folder
get_y_fn = lambda x: path_lbl/f'{x.stem}_P{x.suffix}'
codes = np.loadtxt(path/'codes.txt', dtype=str)
tfms = get_transforms()
data = (InputList.from_folder(path/'images')
        .label_from_func(get_y_fn)
        .split_by_fname_file('../valid.txt')
        .datasets(SegmentationDataset, classes=codes)
        .transform(tfms, size=128, tfm_y=True)
        .databunch(bs=64))
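The get_y_fn lambda above just maps an image filename to its mask filename in another folder. The same mapping in plain pathlib (the folder layout here is illustrative):

```python
from pathlib import Path

path_lbl = Path('data/camvid/labels')  # assumed location of the mask images

def get_y_fn(x):
    # images/0001TP_006690.png -> labels/0001TP_006690_P.png
    return path_lbl / f'{x.stem}_P{x.suffix}'

print(get_y_fn(Path('data/camvid/images/0001TP_006690.png')))
```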

Facing the same issue, after the latest pull
NameError: name 'ConvLearner' is not defined

Did you make use of create_cnn in vision.learner, then?

When I used that like:

learn = create_cnn(data, models.resnet34, metrics=error_rate)

I get the following error:

AttributeError: module 'fastai.vision.data' has no attribute 'c'


Thanks for the new camvid notebook - so elegant.

running:
train_ds = SegmentationDataset(train_fns, y_train_fns)
valid_ds = SegmentationDataset(valid_fns, y_valid_fns)

i get this:

TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 train_ds = SegmentationDataset(train_fns, y_train_fns)
      2 valid_ds = SegmentationDataset(valid_fns, y_valid_fns)

TypeError: __init__() missing 1 required positional argument: 'classes'

I believe it is fixed by adding codes as the classes argument?

train_ds = SegmentationDataset(train_fns, y_train_fns, codes)
valid_ds = SegmentationDataset(valid_fns, y_valid_fns, codes)

Also there is a:

learn.unfreezefreeze()

that should be changed to:

learn.unfreeze()

A small suggestion would be to use an explicit mapping from segments to classes using a dict with:

  • key as the classes pixel value in the mask
  • value as the class

You should restart your notebook and make sure you define data somewhere, as Python believes it’s the data module of fastai.vision, judging from your error message.

Thanks - note that this is a work in progress so no need to give feedback until it’s done. It’s not in a working state yet.

Thanks for your detailed explanation; I always learn a lot from the tools/tricks that you share.

For now, it feels natural to call it within a fastai repo, but I don’t see anything stopping this tool from being used outside fastai. I am not good at bash scripting, but I saw orig_user_name=fastai inside the script, so from my understanding we can simply change this line and use it in other open source projects as well. So putting it in $PATH makes perfect sense.

I will give it a few more trials and see if I come back with any questions. :slight_smile:

Thanks again.

It reminds me of Processing language.

I have the developer version installed with the latest pull. I get the following error when trying to create a cnn (from the docs):

AttributeError: type object 'Learner' has no attribute 'create_cnn'

Any ideas?

create_cnn is a function, it’s not inside Learner.

Do the docs need to be updated then? It says here:

learn = Learner.create_cnn(data, models.resnet18, metrics=accuracy)

Yes, orig_user_name can be made into a parameter, and then you could use the script with any GitHub project.
That’s why I called it fastai-make-pr-branch, as it hardwires the fastai user :wink:

The only custom thing in the script is that it runs tools/run-after-git-clone if it finds it in the repo.


Hey all,

Just sharing this issue here with the validation set random seed.
https://forums.fast.ai/t/lesson-1-pets-benchmarks/27681/55?u=jamesrequa

Please feel free to verify this on your end as well. Steps to reproduce:

  1. Set a random seed in the jupyter notebook np.random.seed(2)
  2. Create an ImageDataBunch data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=299, bs=bs, valid_pct=0.2)
  3. Save train/val x for later trn_x, val_x = data.train_ds.x, data.valid_ds.x
  4. Create a new ImageDataBunch repeating step 2
  5. Check again the train/val x for this new data instance trn_x2, val_x2 = data.train_ds.x, data.valid_ds.x. Compare with the first train/valid set and verify they are not the same.

I have already implemented the code changes which fix this issue, so if you like I can submit a PR :slight_smile: I think this is pretty important to fix right away, as it can result in validation loss/error rate results which are not reliable, and it can happen in a very innocent way, like if you just wanted to change the batch size or image size (as we saw, one student achieved 1% error rate on the pets dataset for this reason).

I don’t have that issue. The thing is, you say to repeat the creation from step 2, but you have to go back to step 1 and reset the seed to 2; then your validation set will be the same.
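The pitfall can be reproduced with Python's stdlib random module instead of numpy (a sketch of the seeding behaviour, not fastai code): sampling twice after one seed call gives two different splits, while re-seeding first reproduces the split.

```python
import random

random.seed(2)
split1 = random.sample(range(100), 20)  # pretend these are validation indexes

# creating a "new DataBunch" without re-seeding draws fresh random numbers
split2 = random.sample(range(100), 20)

random.seed(2)                          # reset the seed first...
split3 = random.sample(range(100), 20)  # ...and the split is reproduced

print(split1 == split2)  # False
print(split1 == split3)  # True
```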


Hi Sylvain, good to see you here!

Thank you for confirming this! I was not re-running the code cell np.random.seed(2) when I went to re-create ImageDataBunch. My guess is that others did the same.

I would suggest updating the notebooks so that this seed generation is done in the same cell block as the creation of the ImageDataBunch, to avoid something like this happening to others :slight_smile:

Alternatively, passing the seed value in as a parameter to ImageDataBunch would ensure this mix-up never happens (more user friendly imho), but I realize this affects the code so it’s probably less desirable.

Good idea.