I really like the idea of using the data block API to create a DataBunch. I also understand the pipeline and the order of the operations in it.
But when I dig into the source code to understand what really happens behind the scenes, I get lost at some points. Maybe Jeremy will make this clearer in the next lectures, but for now I'd like to clarify a few things:
Normally, the data block API pipeline is processed in this order.
Example for a text problem: TextList -> ItemList -> ItemLists -> LabelLists -> DataBunch.
So where in this pipeline is the text in the dataset preprocessed (tokenized) and transformed into numbers? I guess this is done by calling the process method, but I don't know exactly where in the pipeline above it gets called.
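If I read the fastai v1 source correctly (worth double-checking against your installed version), nothing is processed when the TextList is created or split: the processors only run once you label, because the label_from_* step builds the LabelLists and then calls its process method. The processors are created once, fitted on the training set (that is where the vocab gets built) and then reused on the validation set. The toy sketch below mimics that design in plain Python; every class in it (ToyItemList, ToyItemLists, ToyLabelLists, Tokenize, Numericalize) is hypothetical and is NOT fastai's API.

```python
# Toy model of a data-block-style pipeline (hypothetical classes, NOT fastai's API).
# Assumption: raw items stay as strings until the labelling step, which builds
# the labelled lists and immediately runs the processors on them.

class Tokenize:
    """Split each raw string into lowercase tokens (stateless)."""
    def process(self, items):
        return [s.lower().split() for s in items]

class Numericalize:
    """Map tokens to integer ids; the vocab is built on the first call (the train set)."""
    def __init__(self):
        self.vocab = None

    def process(self, items):
        if self.vocab is None:
            tokens = sorted({t for doc in items for t in doc})
            self.vocab = {t: i for i, t in enumerate(tokens)}
        # tokens never seen in the train set get id -1 in this toy
        return [[self.vocab.get(t, -1) for t in doc] for doc in items]

class ToyItemList:
    processors = [Tokenize, Numericalize]   # plays the role of a class-level _processor

    def __init__(self, items):
        self.items = list(items)

    def split(self, n_valid):
        # first n_valid items go to validation, the rest to training
        return ToyItemLists(ToyItemList(self.items[n_valid:]),
                            ToyItemList(self.items[:n_valid]))

class ToyItemLists:
    def __init__(self, train, valid):
        self.train, self.valid = train, valid

    def label(self, y_train, y_valid):
        lls = ToyLabelLists(self.train, self.valid, y_train, y_valid)
        return lls.process()                 # processing happens here, at labelling time

class ToyLabelLists:
    def __init__(self, train, valid, y_train, y_valid):
        self.train, self.valid = train, valid
        self.y_train, self.y_valid = y_train, y_valid

    def process(self):
        # create the processors once, fit them on train, reuse their state on valid
        self.procs = [p() for p in self.train.processors]
        for p in self.procs:
            self.train.items = p.process(self.train.items)
        for p in self.procs:
            self.valid.items = p.process(self.valid.items)
        return self

src = ToyItemList(["Bad movie", "Great movie", "great fun", "bad plot"])
print(src.items)         # still the raw strings
data = src.split(n_valid=1).label([1, 1, 0], [0])
print(data.train.items)  # [[2, 3], [2, 1], [0, 4]]
print(data.valid.items)  # [[0, 3]]
```

The point of deferring the work this way is that the vocab can only be built once you know which items are in the training set, so processing has to wait until after the split.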
This may be related to the first question: what are private class attributes such as _bunch and _processor in the TextList class used for? They are defined at the top of the class, but I don't see them being used anywhere in the pipeline (maybe I missed something).
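My understanding (an assumption, not checked line by line against the fastai source) is that these are class attributes consumed by generic code in the base classes: _processor tells the base ItemList which processors to create, and _bunch tells the databunch step which DataBunch subclass to build. That would explain why you never see TextList itself "call" them. The hypothetical sketch below (BaseList, ToyTextList, LowercaseProcessor and the DataBunch stand-ins are all made up, NOT fastai's code) shows the pattern.

```python
# Hypothetical sketch (NOT fastai's code): a subclass only *declares* _bunch and
# _processor; generic base-class methods read them through self.

class DataBunchBase:
    def __init__(self, train, valid):
        self.train, self.valid = train, valid

class TextDataBunchToy(DataBunchBase):
    pass

class LowercaseProcessor:
    def process(self, items):
        return [s.lower() for s in items]

class BaseList:
    _bunch = DataBunchBase      # which container databunch() should build
    _processor = None           # which processor class(es) to run on the items

    def __init__(self, items):
        self.items = list(items)

    def get_processors(self):
        procs = self._processor
        if procs is None:
            return []
        if not isinstance(procs, list):
            procs = [procs]
        return [p() for p in procs]

    def process(self):
        for p in self.get_processors():
            self.items = p.process(self.items)
        return self

    def databunch(self, valid):
        # the base class picks the concrete DataBunch type from the subclass attribute
        return self._bunch(self.process(), valid.process())

class ToyTextList(BaseList):
    _bunch = TextDataBunchToy
    _processor = [LowercaseProcessor]

db = ToyTextList(["Hello World"]).databunch(ToyTextList(["Bye NOW"]))
print(type(db).__name__)               # TextDataBunchToy
print(db.train.items, db.valid.items)  # ['hello world'] ['bye now']
```

With this design, adding a new kind of list is mostly a matter of overriding two class attributes rather than rewriting the pipeline methods.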
In lesson 3, I got an error when trying to print a TextList:
print(TextList.from_csv(path, 'texts.csv', cols='text'))
AttributeError: 'NoneType' object has no attribute 'textify'
The same thing happens with TabularList. Is this expected? In this case I would expect it to print out the type of the list instead.
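My guess (an assumption, not verified against the fastai source for your version) is that print calls __repr__, which formats the first few items, and for a text list that formatting goes through a vocab that only exists after processing. Before labelling, the vocab is still None, hence the error. The hypothetical classes below (Vocab, FragileTextList, SafeTextList, NOT fastai's code) reproduce the same message and show one plausible guard: fall back to the raw item when there is no vocab yet.

```python
# Hypothetical reproduction (NOT fastai's code) of how printing an unprocessed
# list can raise "'NoneType' object has no attribute 'textify'".

class Vocab:
    def __init__(self, itos):
        self.itos = itos            # index -> token

    def textify(self, ids):
        return " ".join(self.itos[i] for i in ids)

class FragileTextList:
    def __init__(self, items, vocab=None):
        self.items, self.vocab = items, vocab   # vocab stays None until processing

    def get(self, i):
        # assumes the list has already been numericalized
        return self.vocab.textify(self.items[i])

    def __repr__(self):
        shown = "; ".join(str(self.get(i)) for i in range(min(3, len(self.items))))
        return f"{type(self).__name__} ({len(self.items)} items)\n{shown}"

class SafeTextList(FragileTextList):
    def get(self, i):
        # return the raw item when there is no vocab yet
        o = self.items[i]
        return o if self.vocab is None else self.vocab.textify(o)

raw = ["a good movie", "a bad one"]
try:
    print(FragileTextList(raw))
except AttributeError as e:
    print(e)   # 'NoneType' object has no attribute 'textify'
print(SafeTextList(raw))                                        # shows the raw strings
print(SafeTextList([[0, 1]], vocab=Vocab(["good", "movie"])))   # shows "good movie"
```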