Looking for documentation on the 'path' argument in TabularDataBunch.from_df()

I’m not sure I understand the purpose of the mandatory path argument in TabularDataBunch.from_df(path=path,df=df,...) in the fast.ai library (Python 3.6).

I checked the documentation, but can’t seem to find the details there.
In particular, I have a pd.DataFrame that does not have an associated CSV file on disk. How do I go about applying the .from_df method to it?

Does anyone have more info or links to references?

I found an example here that helped, but I would still like to figure out what the path argument is and how to use it.

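import numpy as np
import pandas as pd
from fastai.tabular import *  # fastai v1: provides TabularDataBunch, Categorify, to_np
df = pd.DataFrame({'A': list('aabbccabca'), 'B': np.random.normal(size=10).round(2), 'Y': list('aabbccabca')})
tfms = [Categorify]
# 'output' is the path argument; rows 7 and 8 form the validation set
tblrData = TabularDataBunch.from_df('output', df, dep_var='Y', valid_idx=[7,8], procs=tfms, cat_names=['A'], bs=4)
(cat_x,cont_x),y = next(iter(tblrData.train_dl))
for o in (cat_x, cont_x, y): print(to_np(o[:5]))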

Also, is there documentation on the bs argument?
UPDATE: found it in the video. bs is the batch size.
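
To make the batch size concrete, here is a plain-Python illustration using the numbers from the example above (10 rows, valid_idx=[7,8], bs=4). It only mimics the idea of splitting training rows into batches; it is not fastai's actual data loader, and the variable names are made up for illustration.

```python
# Plain-Python illustration of batching; not fastai's actual data loader.
rows = list(range(10))   # stand-in for the 10 row indices of df
valid_idx = [7, 8]       # rows held out for validation, as in the example
train_rows = [r for r in rows if r not in valid_idx]

bs = 4                   # batch size: how many rows go into one batch
batches = [train_rows[i:i + bs] for i in range(0, len(train_rows), bs)]
print(batches)           # [[0, 1, 2, 3], [4, 5, 6, 9]]
```

With 8 training rows and bs=4, each pass over the training data yields two batches of four rows.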

path is just the working directory where temporary files/models will be saved.
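
For a DataFrame with no file on disk, any writable directory should do. A minimal sketch, assuming a throwaway temporary directory is acceptable (the work_dir name is hypothetical, and the fastai call is left as a comment because it needs fastai v1 installed):

```python
import tempfile
from pathlib import Path

# Hypothetical working directory; any writable location would serve.
work_dir = Path(tempfile.mkdtemp()) / "output"
work_dir.mkdir(parents=True, exist_ok=True)
print(work_dir.is_dir())

# With fastai v1 installed, work_dir would be passed as the first argument:
# data = TabularDataBunch.from_df(work_dir, df, dep_var='Y', valid_idx=[7, 8],
#                                 procs=[Categorify], cat_names=['A'], bs=4)
# As I understand it, a Learner built on this data would then save models
# under work_dir / 'models' by default.
```

The DataFrame itself never has to be written to that directory; the path only tells the library where to put its own output.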

Thanks, Sylvain. I also found it in the video (lecture 4, 43rd minute).

Is there documentation that describes this? I’m sure I will have more questions about other arguments as I start applying the methods.

Be careful: you’re using fastai v1 with the old MOOC, so it won’t work properly.

My version of fastai is 1.0.39. What’s the best, most up-to-date source for usage explanations and examples of the library?

Apart from docs.fast.ai and the examples/notebooks on the fastai GitHub (and the code itself), this is a good read on the data block API: https://medium.com/@wgilliam/finding-data-block-nirvana-a-journey-through-the-fastai-data-block-api-c38210537fe4
This tutorial on torch.nn by @jeremy will also help put things in context: https://pytorch.org/tutorials/beginner/nn_tutorial.html
