Fastai_v1, adding features

(Jeremy Howard (Admin)) #83

You’re looking at the sig of DataBunch, which I believe can take arbitrary Datasets. However create_cnn needs an ImageDataBunch, which needs c defined.
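To make the `c` requirement a bit more concrete, here is a minimal sketch of a dataset object that exposes it. `LabelledDataset` is a hypothetical class, not part of fastai, and the sketch assumes `c` means the number of classes. Whether `create_cnn` accepts such an object may depend on the library version.

```python
class LabelledDataset:
    """Duck-typed dataset: supports len() and indexing, and exposes `c`.

    Hypothetical illustration only; not a fastai class.
    """

    def __init__(self, items, labels):
        if len(items) != len(labels):
            raise ValueError("items and labels must have the same length")
        self.items = list(items)
        self.labels = list(labels)
        # Sorted unique labels; `c` is assumed to mean "number of classes".
        self.classes = sorted(set(self.labels))
        self.c = len(self.classes)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        # Return (input, class index), the usual shape of a classification sample.
        return self.items[i], self.classes.index(self.labels[i])


ds = LabelledDataset(["a.png", "b.png", "c.png"], ["cat", "dog", "cat"])
print(ds.c, len(ds), ds[1])
```

A plain `Dataset` that only implements `__len__` and `__getitem__` would lack the `c` attribute, which is the gap described above.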

(Ilia) #84

Yeah, that’s correct. I assumed that if ImageDataBunch accepts arbitrary Dataset instances, then it would work with the other library classes all the way down to the training loop. It was a bit unexpected to me that this is not the case :smile:

Ok, sorry for the misinterpretation.

(Giles Strong) #85

I’m wondering if the tabular data class could be extended to provide the ability to use sample weights for each entry when computing losses?
For the data I work with (high energy physics) these weights are necessary in order to allow the simulated data we train on to match reality.
Currently I use Keras for my work, which has such a feature, but having followed the DL courses I’m looking to move to using the Fast.AI library.

Having looked on the forums, there seem to be only a few topics on balancing classes via hard-coded weights and a custom loss function. With sample weights, though, the weights depend on the batch of data being passed to the loss function, so it’s a bit more tricky.
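As a rough illustration of what a per-sample weighted loss computes, here is a plain-Python sketch (no fastai or Keras involved). The function name `weighted_cross_entropy` is made up for this example, and normalising by the sum of the weights is an assumption; some libraries divide by the batch size instead.

```python
import math


def weighted_cross_entropy(probs, targets, weights):
    """Weighted mean of per-sample cross-entropy losses.

    probs:   list of per-sample class-probability lists
    targets: list of true class indices, one per sample
    weights: list of per-sample weights, one per sample
    """
    if not (len(probs) == len(targets) == len(weights)):
        raise ValueError("probs, targets and weights must have the same length")
    # Unreduced loss: one value per sample in the batch.
    per_sample = [-math.log(p[t]) for p, t in zip(probs, targets)]
    # Assumption: normalise by the total weight of the batch.
    total_weight = sum(weights)
    return sum(w * l for w, l in zip(weights, per_sample)) / total_weight


probs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
targets = [0, 1, 0]
print(weighted_cross_entropy(probs, targets, [1.0, 1.0, 1.0]))
print(weighted_cross_entropy(probs, targets, [1.0, 0.0, 0.0]))
```

The tricky part mentioned above is that the weights have to travel with each batch, so they need to come out of the dataset alongside the inputs and targets. In PyTorch, the analogous step would be computing the loss with `reduction='none'` and multiplying by the batch's weight tensor before averaging.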


(Christian Werner) #86

Adding an option to auto-save model state after each fit_one_cycle() cycle

Would an option on fit_one_cycle to autosave the model state after each cycle be useful? On several cloud providers my notebook died while training a domain language model, so I ended up rerunning the 10-cycle step about 4 times…
Maybe one could specify a flag autosave=True or autosave="modelname", and the function would automatically dump the state into modelname-01.pth, modelname-02.pth, …

If you think this sounds worthwhile, I could take a look at it. Hints on where to start (hooks?) are welcome…
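A minimal sketch of the proposed behaviour, independent of fastai: `fit_with_autosave` and `checkpoint_name` are hypothetical names, `pickle` stands in for whatever the library would use to serialise model state, and the default prefix `"model"` for `autosave=True` is an assumption.

```python
import os
import pickle
import tempfile


def checkpoint_name(prefix, cycle):
    # Zero-padded cycle number, e.g. modelname-01.pth, modelname-02.pth, ...
    return f"{prefix}-{cycle:02d}.pth"


def fit_with_autosave(state, n_cycles, train_one_cycle, autosave=None, directory="."):
    """Run n_cycles of training, optionally saving the state after each cycle.

    autosave: None/False (no saving), True (assumed default prefix "model"),
              or a string used as the file-name prefix.
    """
    saved = []
    for cycle in range(1, n_cycles + 1):
        state = train_one_cycle(state)
        if autosave:
            prefix = "model" if autosave is True else autosave
            path = os.path.join(directory, checkpoint_name(prefix, cycle))
            with open(path, "wb") as f:
                pickle.dump(state, f)  # stand-in for a real model save
            saved.append(path)
    return state, saved


# Toy "training": the state is a dict whose counter increases each cycle.
with tempfile.TemporaryDirectory() as tmp:
    final_state, saved = fit_with_autosave(
        {"step": 0}, 3, lambda s: {"step": s["step"] + 1},
        autosave="modelname", directory=tmp,
    )
    saved_names = [os.path.basename(p) for p in saved]
    with open(saved[-1], "rb") as f:
        restored = pickle.load(f)

print(saved_names)
```

If the notebook dies midway, the last file written holds the most recent completed cycle, so training can resume from there instead of from scratch.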

(Michael Schuldes) #87

Hi Christian,
Have you taken a look at the SaveModelCallback? It might already be what you need.

(Christian Werner) #88

Did not know about that one!

And yes, seems like it does just that!

(Fabrizio) #89

Hi, and sorry if this request has already been discussed somewhere else!
I think that fastai is fantastic, but as an AllenNLP user I find it quite convenient to instantiate objects from Jsonnet blobs. In ablation studies, such a declarative syntax lets you specify an entire experiment in JSON; moreover, it lets you change architectures without changing code. It would be great to have experiment configuration files in fastai too. Thanks!!
For readers unfamiliar with it, check this out to get an idea:
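Separately from any linked material, the declarative style described in this post can be sketched in a few lines of plain Python. This is not AllenNLP's implementation: `REGISTRY`, `register`, `from_config`, `Linear` and `MLP` are all hypothetical names, and the sketch uses plain JSON (a subset of Jsonnet) with a `"type"` key selecting the class.

```python
import json

REGISTRY = {}


def register(name):
    """Class decorator that records a class under a config name."""
    def decorator(cls):
        REGISTRY[name] = cls
        return cls
    return decorator


@register("linear")
class Linear:
    def __init__(self, n_in, n_out):
        self.n_in, self.n_out = n_in, n_out


@register("mlp")
class MLP:
    def __init__(self, layers, dropout=0.0):
        self.layers, self.dropout = layers, dropout


def from_config(cfg):
    """Recursively build objects from dicts that carry a "type" key."""
    if isinstance(cfg, list):
        return [from_config(c) for c in cfg]
    if isinstance(cfg, dict):
        params = {k: from_config(v) for k, v in cfg.items() if k != "type"}
        if "type" in cfg:
            return REGISTRY[cfg["type"]](**params)
        return params
    return cfg  # plain values (numbers, strings, ...) pass through unchanged


config = json.loads("""
{
  "model": {
    "type": "mlp",
    "dropout": 0.1,
    "layers": [
      {"type": "linear", "n_in": 8, "n_out": 16},
      {"type": "linear", "n_in": 16, "n_out": 2}
    ]
  }
}
""")

model = from_config(config["model"])
print(type(model).__name__, model.dropout, len(model.layers))
```

Changing the architecture for an ablation run then means editing the JSON (for example, dropping a layer or changing `dropout`) rather than the code.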

(Gidi Shperber) #90

I’m using ImageDataBunch.from_df, which randomly splits the data frame into train/test sets, with no option to pass a random seed.

I would like to add a seed=None argument, which would be passed to random_split_by_pct and allow a reproducible split. Do you think it’s necessary?
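For illustration, a reproducible split of this kind can be sketched as below. `seeded_split` is a hypothetical helper, not the fastai function, and using a local `random.Random` instance (rather than seeding the global generator) is a design choice made for this example.

```python
import random


def seeded_split(n_items, valid_pct=0.2, seed=None):
    """Split indices 0..n_items-1 into (train, valid) index lists.

    With seed=None the split differs between runs; with a fixed seed
    the same split is produced every time.
    """
    rng = random.Random(seed)  # local RNG: global random state is untouched
    idxs = list(range(n_items))
    rng.shuffle(idxs)
    cut = int(valid_pct * n_items)
    return idxs[cut:], idxs[:cut]


train_a, valid_a = seeded_split(10, valid_pct=0.2, seed=42)
train_b, valid_b = seeded_split(10, valid_pct=0.2, seed=42)
print(train_a == train_b and valid_a == valid_b)
```

Passing the same seed gives the same held-out rows across runs, which is what makes results comparable between experiments.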


You can do it with the data block API (which you should learn since it’s more flexible than the factory methods).