How to save preprocessed objects?

Hi,
Is it really possible that there is no method for saving preprocessed objects, like the output of “TextDataLoaders.from_d”?
Those methods are time consuming (they run only on the CPU), so how can you save them and avoid losing even more time if something goes wrong?
Why doesn’t it create a JSON vocab file? That would be the most logical thing to do.
Moreover, why is there no caching of the processing step?
The processing step is also slower and far more memory hungry than more complex architectures like transformers… how do you work under these conditions?
mha…
There is some good stuff in this framework, but it is too weak, at least for serious NLP projects… sorry.

@madara
You can find your numericalized texts in dls.dataset, your vocabulary in dls.vocab, the mapping in dls.o2i, and use dls.numericalize.decodes/encodes for reverse mapping. You can save these as JSON if you prefer, or just pickle the whole dataloader to keep the results.
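For example, a minimal sketch based on the attributes above (dls is your existing TextDataLoaders object; double-check the attribute names against your fastai version):

```python
import json

numericalized = dls.dataset   # the numericalized texts
vocab = dls.vocab             # index -> token (for a classifier this may hold text and label vocabs)
token_to_idx = dls.o2i        # token -> index mapping

# Save the vocabulary as JSON for quick reuse
with open('vocab.json', 'w') as f:
    json.dump(list(vocab), f)
```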

I find the way you ask your “questions” rude, and be aware that many people will simply ignore them because of that. It is not a great way to provide feedback to an open source project. If you are not happy with the performance, I’m sure everyone would applaud if you can make it better.

@micstan
“You can find your numericalized texts in dls.dataset, your vocabulary in dls.vocab,
the mapping in dls.o2i, and use dls.numericalize.decodes/encodes for reverse mapping.
You can save these as JSON if you prefer, or just pickle the whole dataloader to keep the results.”

Thanks;

Sorry if I seemed rude, but I was only expressing my opinion.
I’m not a developer, just a data analyst.
Personally, I have no interest in improving a framework (even if I could, which is probably not the case) when I can leverage other open source frameworks, even if they don’t work in exactly the same way, of course.
However, thank you for your answer! Good explanation.

That said, I find the documentation a bit inconsistent and not well suited to quick reference.
fastai is probably best used together with its course, and that is fine for beginners or students.
But when I need tools for some DL task, I don’t have time to take a course on a new framework; what I need is access to good documentation.
When I need information about some TensorFlow, PyTorch or Keras feature, I find their docs very intuitive and easy to consult at a glance.
Unfortunately, I can’t say the same about fastai.
Maybe, and probably, that is just my own limitation…
Thanks again.

Hi micstan

How do you pkl the whole dataloader?

Also, when I created my dataloader it saved a .vocab and a .model file in a tmp folder, which I moved to my gdrive. I lost the dls object when Colab logged out; is there any way I can recreate the dls without having it retrain the tokenizer internally? It took a long time the first time!

Thanks!

@wjs20 you can pickle it in the same way as any other Python object (https://wiki.python.org/moin/UsingPickle). I’m not sure I understand the second part correctly, but I believe your model should have a dls attribute with the data loaders (learn.dls).
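For example, a plain-pickle sketch (assuming dls is the DataLoaders you want to keep; the file name is just an example):

```python
import pickle

# Save the whole DataLoaders, preprocessing included
with open('dls.pkl', 'wb') as f:
    pickle.dump(dls, f)

# Reload it later without redoing the slow preprocessing
with open('dls.pkl', 'rb') as f:
    dls = pickle.load(f)
```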

Just do torch.save(learn.dls, myFname)

Torch will pickle it. Bring it back in with torch.load(myFname)
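Something like this, for example (the file name is arbitrary; depending on your PyTorch version you may need weights_only=False when loading arbitrary pickled objects):

```python
import torch

# Save the DataLoaders attached to the learner
torch.save(learn.dls, 'dls.pth')

# Bring it back later
dls = torch.load('dls.pth')
```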

Does learn.save() save the dls structure with it?

No, that only saves the model and the optimizer, as the documentation states…

Save model and optimizer state (if with_opt) to self.path/self.model_dir/file
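In code that looks roughly like this (names are illustrative; loading assumes a Learner built with the same architecture):

```python
# Writes the weights (and optionally the optimizer state) under learn.path/learn.model_dir
learn.save('stage-1')

# Restores them into an existing Learner with the same architecture
learn = learn.load('stage-1')
```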

So if I have created the learner object but don’t want to start training immediately, would I use learn.export() instead, and then load_learner?
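For what it’s worth, the pattern being asked about would look roughly like this (as far as I understand, export pickles the Learner for inference, without the training items; file names are just examples):

```python
from fastai.text.all import load_learner

# Serialize the Learner (model + transforms, minus the training data);
# by default the file is written under learn.path
learn.export('export.pkl')

# Later: restore it without re-running the preprocessing
learn = load_learner('export.pkl')
```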