Bug when using ImageDataBunch.from_folder and valid_pct with test

You have to change the data object of your Learner for learn.TTA and learn.get_preds. Did you set learn.data = data_test?

Yes, I did, but it is still using the valid_pct partition from the first dataset.
This is exactly what I have done:

learn.load(S)
learn.data=test_data
log_preds, y_true = learn.TTA(ds_type=test_data.valid_dl, beta=0.5, scale=1.3)

However, I figured out what the issue is: unless you pass your dataset as ds_type=DatasetType.Valid, it won’t use the new validation set.

Could you please let me know why this is, or point me to where it is explained? I would appreciate it, as it would help me and others better understand the difference between DatasetType.Valid and data_test.valid_dl.
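To illustrate the distinction being asked about here: a minimal, hypothetical sketch (not fastai's actual implementation) of why an API like this would expect a `DatasetType` enum member rather than a dataloader object. The enum is just a label telling the Learner which of *its own* dataloaders to look up, so handing it the dataloader itself has no meaning:

```python
from enum import Enum

# Simplified stand-in for fastai's DatasetType: a label selecting
# which of the Learner's own dataloaders to use.
class DatasetType(Enum):
    Train = 1
    Valid = 2
    Test = 3

class ToyLearner:
    def __init__(self, data):
        # data maps DatasetType members to dataloaders (lists, for the sketch)
        self.data = data

    def dl(self, ds_type):
        # ds_type must be a DatasetType member; a raw dataloader object
        # is not a valid key here, which is why passing data.valid_dl fails.
        if not isinstance(ds_type, DatasetType):
            raise TypeError(f"ds_type must be a DatasetType, got {type(ds_type)}")
        return self.data[ds_type]

data = {DatasetType.Train: ["t1", "t2"], DatasetType.Valid: ["v1"]}
learn = ToyLearner(data)
print(learn.dl(DatasetType.Valid))  # -> ['v1']
```

Passing the dataloader itself (`learn.dl(data[DatasetType.Valid])`) raises a `TypeError` in this sketch, analogous to the failure discussed in this thread.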

Thanks a lot!

That line can’t work with current fastai. ds_type must be of type DatasetType. I’ll stop replying until you provide your whole code, as it’s pointless for me to try to guess what’s happening. Not trying to be mean, but I (or anyone else on this forum) really can’t help without seeing everything. The failure might be linked to some lines of code before what you are showing.
Also, the whole error message (if applicable) and your current setup (given by show_install) are necessary information to figure out what’s going on.

Thanks, and I think the answer is [quote=“sgugger, post:23, topic:28292”]
That line can’t work with current fastai. ds_type must be of DatasetType .
[/quote]
Simply put, during the version update many things changed while the docs are still not fully updated, which is somewhat frustrating for everyone using the library. :sweat:

I understand that I have to share my code, but in this case everything was done according to the aforementioned docs, whose link I shared, so I thought it was needless to copy and paste everything again here. But here we go, as you requested:

data = (ImageList.from_folder(f'{path_LC}')
        .filter_by_folder(include=['train'], exclude=['test'])
        .split_by_rand_pct(valid_pct=0.2, seed=None)
        .label_from_folder()
        .transform(ds_tfms, size=329)
        .databunch(bs=8, num_workers=0))

learn = cnn_learner(data, arch, pretrained=True,
                    metrics=[accuracy, error_rate, top_k_accuracy],
                    callback_fns=[partial(CSVLogger, filename=str('stat_' + str(tr) + '_S_' + aName))])

learn.fit_one_cycle(epoch, max_lr=maxLR, moms=[0.95, 0.85], div_factor=25.0)

learn.freeze()
learn.export()

test_data = (ImageList.from_folder(f'{path_LC}')
             .split_by_folder(train='train', valid='test')
             .label_from_folder()
             .transform(ds_tfms, size=329)
             .databunch(bs=16, num_workers=0))

learn.load()
learn.data=test_data
log_preds, y_true = learn.TTA(ds_type=test_data.valid_ds, beta=0.5, scale=1.3)
=== Software === 
python        : 3.7.1
fastai        : 1.0.49
fastprogress  : 0.1.20
torch         : 1.0.1
torch cuda    : 10.0 / is available
torch cudnn   : 7401 / is enabled

=== Hardware === 
torch devices : 1
  - gpu0      : GeForce GTX 1080 with Max-Q Design

=== Environment === 
platform      : Windows-10-10.0.16299-SP0
conda env     : base
python        : C:\ProgramData\Anaconda3\python.exe
sys.path      : C:\Users\sshahinf\Desktop\Python_code
C:\ProgramData\Anaconda3\python37.zip
C:\ProgramData\Anaconda3\DLLs
C:\ProgramData\Anaconda3\lib
C:\ProgramData\Anaconda3

C:\ProgramData\Anaconda3\lib\site-packages
C:\ProgramData\Anaconda3\lib\site-packages\win32
C:\ProgramData\Anaconda3\lib\site-packages\win32\lib
C:\ProgramData\Anaconda3\lib\site-packages\Pythonwin
C:\ProgramData\Anaconda3\lib\site-packages\IPython\extensions
C:\Users\sshahinf\.ipython
no nvidia-smi is found

The docs are updated with each new version. If there are places where they’re not fully updated, any PR to fix them will always be more than welcome. TTA isn’t documented in any case, which is another thing where a contribution would be appreciated. In general you’ll find people will be more inclined to help you if you use language like “it’s not perfect, how can I help make it better?” rather than just complaining.

Same for the changes. I’m not sure what the many things that changed are, since the functions you use haven’t moved in the past three months.

In any case, the correct line is

learn.TTA(ds_type=DatasetType.Test, beta=0.5, scale=1.3)

Thanks for your clarification. I will try to contribute to the TTA documentation.
This is what I wrote two comments above:

And I just asked the question of what the difference between the two data types is.

Well, there have been changes, each of which took a bit of time until we figured out how to fix it, going from v0.7 to v1.0. It was unfortunate that we tried to use this library at a time when many changes had to happen; there is no need to mention all of them here. It was just bad timing, I guess :smile:

thanks anyways!

Ah yes sorry, ds_type=DatasetType.Valid is what you want when you have set your new data, because it’s the validation set of data_test. I confused myself :wink:
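The key point in that correction: `ds_type` is resolved lazily against whatever `learn.data` currently holds, so after `learn.data = test_data`, `DatasetType.Valid` refers to the new bunch's validation set. A minimal sketch of that behavior, using hypothetical toy classes rather than fastai's real ones:

```python
from enum import Enum

class DatasetType(Enum):
    Train = 1
    Valid = 2

class ToyDataBunch:
    """Toy stand-in for a DataBunch: just holds two dataloaders."""
    def __init__(self, train_dl, valid_dl):
        self.train_dl, self.valid_dl = train_dl, valid_dl

class ToyLearner:
    def __init__(self, data):
        self.data = data

    def dl(self, ds_type):
        # Looked up against self.data at call time, so swapping
        # learn.data redirects DatasetType.Valid to the new bunch.
        return self.data.valid_dl if ds_type is DatasetType.Valid else self.data.train_dl

orig_data = ToyDataBunch(train_dl=["train"], valid_dl=["20%-holdout"])
test_data = ToyDataBunch(train_dl=["train"], valid_dl=["real-test-set"])

learn = ToyLearner(orig_data)
print(learn.dl(DatasetType.Valid))  # -> ['20%-holdout']
learn.data = test_data              # the swap from this thread
print(learn.dl(DatasetType.Valid))  # -> ['real-test-set']
```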

Oh, I didn’t realize you were talking about v0.7. It has been stated very clearly that v1.0 was a complete rewrite, so there is absolutely no backward compatibility. Also, there were no docs for v0.7, so it’s not a question of having them updated, more like writing them :slight_smile:

Why do we have two different methods to create data:
ImageDataBunch and ImageList?

Hi Shruti,

There are many methods, actually : ) Please have a look at the data_block docs and the DataBunch docs for a start. You should also be able to Google many useful blog posts about the “fastai data block” API. It’s worth noting the striving for consistency (kudos to the devs) between the creation methods for different data types, e.g. image, tabular, text, etc., such that very similar lines of fastai code can be used for different DL models / applications. Thanks.
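The core idea of the data block API is that each step returns an object you can keep chaining on, so one pipeline covers listing items, splitting, labelling, and bundling. A hypothetical miniature of that pattern (toy class names, not fastai's), just to show why the chained style in the snippets above works:

```python
class ToyItemList:
    """Toy imitation of the data-block chaining pattern."""
    def __init__(self, items):
        self.items, self.labels, self.split = items, None, None

    def split_by_rand_pct(self, valid_pct=0.2):
        # Deterministic split for the sketch: first valid_pct of items
        # become the validation set.
        n_valid = int(len(self.items) * valid_pct)
        self.split = (self.items[n_valid:], self.items[:n_valid])
        return self  # returning self is what makes chaining possible

    def label_from_func(self, func):
        self.labels = [func(i) for i in self.items]
        return self

    def databunch(self):
        return {"train": self.split[0], "valid": self.split[1], "labels": self.labels}

# Each call feeds the next, exactly like the ImageList pipelines above.
bunch = (ToyItemList(["a/1.png", "b/2.png", "a/3.png", "b/4.png"])
         .split_by_rand_pct(valid_pct=0.25)
         .label_from_func(lambda p: p.split("/")[0])
         .databunch())
print(bunch["valid"])  # -> ['a/1.png']
```

The same builder shape works regardless of whether the items are images, text, or tabular rows, which is the consistency the devs aimed for.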

Yijin


Yep, did just that - read a blogpost on the data block API. Now things make more sense :slight_smile: