Does ImageList.from_df add './' in front of filenames by default?

jiapei100 · August 28, 2019, 9:57pm

I’m using ImageList.from_df to load a set of images as my training data. However, it seems ImageList.from_df always adds “./” at the very front of every file name?

print(ImageList.from_df(df=df,path='',cols='path',folder=None,suffix=''))

brings me the following ERROR message:

FileNotFoundError: [Errno 2] No such file or directory: './/XXXXXXXX

Clearly, the leading ‘./’ is incorrect.

Any suggestions?
Pei

KevinB · August 28, 2019, 10:21pm

What happens if you change path to path='/'?

~~I am trying to figure out where that ./ is being added and I haven’t found it quite yet, but will keep looking.~~

So looking at the code for ImageList.from_df, this calls:

res = super().from_df(df, path=path, cols=cols, **kwargs)

super() in this case refers to ItemList, so let's look at what ItemList.from_df actually takes in:

    @classmethod
    def from_df(cls, df:DataFrame, path:PathOrStr='.', cols:IntsOrStrs=0, processor:PreProcessors=None, **kwargs)->'ItemList':
        "Create an `ItemList` in `path` from the inputs in the `cols` of `df`."
        inputs = df.iloc[:,df_names_to_idx(cols, df)]
        assert not inputs.isna().any().any(), f"You have NaN values in column(s) {cols} of your dataframe, please fix it."
        res = cls(items=_maybe_squeeze(inputs.values), path=path, inner_df=df, processor=processor, **kwargs)
        return res

I see that it calls the class constructor (cls here is the class itself, not an instance) with items=_maybe_squeeze(inputs.values), path=path, inner_df=df, processor=processor, **kwargs
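To make the hand-off concrete, here is a minimal sketch of the same pattern: a classmethod that forwards path unchanged to the constructor. MiniList, MiniImageList, and from_rows are hypothetical stand-ins I made up for illustration; this is not fastai code.

```python
from pathlib import PurePosixPath


class MiniList:
    """Hypothetical stand-in for ItemList (not fastai code)."""

    def __init__(self, items, path="."):
        self.items = list(items)
        self.path = PurePosixPath(path)

    @classmethod
    def from_rows(cls, rows, path="."):
        # cls is the class itself, so cls(...) runs __init__ with the same path
        return cls(items=rows, path=path)


class MiniImageList(MiniList):
    """Hypothetical stand-in for ImageList (not fastai code)."""

    @classmethod
    def from_rows(cls, rows, path="."):
        # super().from_rows keeps cls bound, so a MiniImageList is still built
        return super().from_rows(rows, path=path)


res = MiniImageList.from_rows(["a.png"], path="")
print(type(res).__name__, str(res.path))  # MiniImageList .
```

The empty string passed to from_rows reaches __init__ untouched, which is why the next place to look is what the constructor does with it.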

So at this point, remember that path=''.

So now, let’s see what ItemList.__init__ looks like:

    def __init__(self, items:Iterator, path:PathOrStr='.', label_cls:Callable=None, inner_df:Any=None,
                 processor:PreProcessors=None, x:'ItemList'=None, ignore_empty:bool=False):
        self.path = Path(path)
        self.num_parts = len(self.path.parts)
        self.items,self.x,self.ignore_empty = items,x,ignore_empty
        if not isinstance(self.items,np.ndarray): self.items = array(self.items, dtype=object)
        self.label_cls,self.inner_df,self.processor = ifnone(label_cls,self._label_cls),inner_df,processor
        self._label_list,self._split = LabelList,ItemLists
        self.copy_new = ['x', 'label_cls', 'path']

So path is what we are interested in here:

self.path = Path(path)

This is your culprit.

Path('') evaluates to “.” (the current directory), so if you want the path to be at the root level, you have to say so explicitly, e.g. path='/'.
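I have not traced exactly where the filenames get joined, so treat the following as a sketch under an assumption: that the prefix is built as str(path) plus a separator and then concatenated with the filename as a plain string. build_filename and the example filenames are hypothetical. Under that assumption, an absolute filename reproduces the doubled slash from the error message:

```python
import posixpath
from pathlib import PurePosixPath


def build_filename(path, fname, sep="/"):
    # Assumption: the prefix is str(path) + separator, followed by
    # plain string concatenation (no real path joining).
    prefix = f"{PurePosixPath(path)}{sep}"
    return prefix + fname


name = "/data/img_001.png"  # hypothetical absolute filename from the dataframe

print(build_filename("", name))           # .//data/img_001.png
print(build_filename("", "img_001.png"))  # ./img_001.png

# With path="/", the result has extra leading slashes but still
# normalizes to the absolute location on POSIX.
print(posixpath.normpath(build_filename("/", name)))  # /data/img_001.png
```

If that assumption holds, path='' only works when the filenames in the dataframe are relative to the current directory.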

This can be tested and shown by executing Path("") and Path("/")
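A quick check along those lines. I use PurePosixPath here so the output is the same on any OS; on Linux or macOS, Path gives the same results.

```python
from pathlib import PurePosixPath

empty = PurePosixPath("")
root = PurePosixPath("/")

# An empty string is normalized to the current directory
print(str(empty), empty.is_absolute(), len(empty.parts))  # . False 0

# The root has to be requested explicitly
print(str(root), root.is_absolute(), len(root.parts))     # / True 1
```

The len(...parts) value is what ItemList.__init__ stores as num_parts, so it also differs between the two cases.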

jiapei100 · August 28, 2019, 10:27pm

Oops…
Thank you so much… That helped… It’s working now…
Thank you

KevinB · August 28, 2019, 10:28pm

I added a bit more explanation after digging in a bit deeper. Let me know if you have any questions!