How to customize the way my images are read?

I use ImageDataBunch.from_csv and it works fine, but I want to customize how my images are read. In 1.0.22 I found I could modify data.train_dl.dl.dataset.ds.image_opener, but in 1.0.28 that no longer works. How do I do it now?

Previously I did it this way:

# adapted from fastai's open_image
def open_image_rgby(fn:PathOrStr)->Image:
    ...  # custom image-reading logic goes here
    return image

data.train_dl.dl.dataset.ds.image_opener = open_image_rgby

and that was it. How can I do this now?


You have to subclass ImageItemList now. It has an open method, so just override it:

class MyImageItemList(ImageItemList):
    def open(self, fn:PathOrStr)->Image:
        ...
        return image

Then, when using the data block API, instead of calling ImageItemList.from_csv (or whichever factory method you use), call MyImageItemList.from_csv.

If you aren't using the data block API (you should!), check the source code of the ImageDataBunch factory method you're using. Those methods are written with the data block API, so you can copy/paste them and then adapt them to your new custom class.
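To see why calling the factory method on the subclass is enough, here is a toy, pure-Python sketch of the pattern: a classmethod factory that builds `cls(...)`, and a base `get` that delegates to an overridable `open`. `ItemListSketch`, `MyItemListSketch`, `from_items` and `get` are hypothetical stand-ins, not fastai's actual classes or signatures.

```python
class ItemListSketch:
    """Hypothetical stand-in for a fastai ItemList; NOT the real class."""

    def __init__(self, items):
        self.items = list(items)

    @classmethod
    def from_items(cls, items):
        # `cls` is whatever class the factory was called on, so a subclass
        # gets instances of itself without redefining the factory.
        return cls(items)

    def open(self, fn):
        return f"default:{fn}"

    def get(self, i):
        # the base class decides *when* to open; subclasses decide *how*
        return self.open(self.items[i])


class MyItemListSketch(ItemListSketch):
    def open(self, fn):
        return f"custom:{fn}"


base = ItemListSketch.from_items(["a.png", "b.png"])
mine = MyItemListSketch.from_items(["a.png", "b.png"])
print(base.get(0))  # default:a.png
print(mine.get(0))  # custom:a.png
```

Only `open` is overridden; everything else, including the factory, is inherited unchanged.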


So is the image_opener assignment gone now?

And how can I pass a custom sampler to the DataLoader constructor?
I am currently mixing the old and new approaches. Is this the right way?

def get_data(sz=64, bs=64, pct=0.2, sample=5000):
#     sz, pct, bs = 64, 0.2, 64
    src = (MyImageItemList.from_df(df=seg, path=PATH, folder=TRAIN)
           .random_split_by_pct(pct)
           .label_from_df(sep=' ')
           .add_test([TEST/f for f in test_names]))
    data = (src.transform(tfms, size=sz))
    #         .databunch(bs=bs).normalize(stats))

   
    datasets = data.train, data.valid,  data.test
    sampler = ImbalancedDatasetSampler(datasets[0], num_samples=sample)
    train_dl = DataLoader(datasets[0], bs, sampler=sampler, num_workers=12)
    val_dl = DataLoader(datasets[1], 2*bs, False, num_workers=8)
    test_dl = DataLoader(datasets[2], 2*bs, False, num_workers=8)

    return ImageDataBunch(train_dl, val_dl, test_dl).normalize(stats)

I am passing a custom sampler to the DataLoader constructor (just an oversampler for low-count classes).
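I can't vouch for how ImbalancedDatasetSampler computes its weights, but a common way to oversample low-count classes is to give each sample a weight inversely proportional to its class frequency and draw indices with replacement. A minimal stdlib sketch, with a hypothetical helper `inverse_frequency_weights` and toy labels:

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-sample weight = 1 / (number of samples sharing that label)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


labels = ["cat"] * 8 + ["dog"] * 2          # toy, imbalanced labels
weights = inverse_frequency_weights(labels)

# Draw indices with replacement, proportional to the weights. Each class now
# carries the same total weight (8 * 1/8 == 2 * 1/2 == 1.0).
rng = random.Random(0)
idxs = rng.choices(range(len(labels)), weights=weights, k=1000)
drawn = Counter(labels[i] for i in idxs)
print(drawn)  # roughly balanced; exact counts depend on the seed
```

PyTorch's `torch.utils.data.WeightedRandomSampler(weights, num_samples, replacement=True)` accepts a per-sample weight list like this one and can be passed to a DataLoader as `sampler=`.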


The fastai library is changing rapidly and I don't know what the final API will look like, so I chose to stay on version 1.0.22 and wait for a stable release.


@tcapelle Did you succeed with this snippet?
I found it while googling the same problem (using a custom batch sampler), and now I get:

TypeError: batch must contain tensors, numbers, dicts or lists; found <class 'fastai.data_block.LabelList'>

Yes, it worked.

Thanks. I finally got this working too: I needed to set batch_sampler, not sampler.

But I am passing the sampler to the PyTorch DataLoader. OK, so you built a batch sampler; that's not the same thing.
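In plain PyTorch the two arguments expect different things: `sampler` yields one index at a time, while `batch_sampler` yields a list of indices per batch. A minimal sketch on a toy `TensorDataset`, using only standard `torch.utils.data` classes:

```python
import torch
from torch.utils.data import BatchSampler, DataLoader, SequentialSampler, TensorDataset

# Toy dataset of 10 (x, y) pairs.
ds = TensorDataset(torch.arange(10, dtype=torch.float32).unsqueeze(1), torch.arange(10))

# 1) `sampler` yields ONE index at a time; the DataLoader groups the
#    indices into batches of `batch_size` itself.
loader_a = DataLoader(ds, batch_size=4, sampler=SequentialSampler(ds))

# 2) `batch_sampler` yields a whole LIST of indices per batch. It is mutually
#    exclusive with batch_size, shuffle, sampler and drop_last.
bsampler = BatchSampler(SequentialSampler(ds), batch_size=4, drop_last=False)
loader_b = DataLoader(ds, batch_sampler=bsampler)

sizes_a = [len(y) for _, y in loader_a]
sizes_b = [len(y) for _, y in loader_b]
print(sizes_a, sizes_b)  # [4, 4, 2] [4, 4, 2]
```

If a batch sampler is passed as `sampler=` instead, every "index" the loader requests is a whole list. My understanding is that indexing a fastai v1 `LabelList` with a list returns another `LabelList` rather than one item, which would fit the collate error reported above, but that is my interpretation, not something confirmed in this thread.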

I am on fastai v1.0.42 and run into the following:

ds = ImageItemList.from_csv(path=PATH, csv_name=LBLS, folder=TRAIN, cols=[0,1])
print(type(ds))
ds= ds.random_split_by_pct(0.2, seed=SEED)
print(type(ds))
ds = ds.label_from_list(lbls.Target.values)
print(type(ds))
<class 'fastai.vision.data.ImageItemList'>
<class 'fastai.data_block.ItemLists'>
<class 'fastai.data_block.LabelLists'>

My issue is that even though I overwrite ds.open = my_image_loader, no matter what I do, the ds instance always falls back to the standard open() method, which calls open_image().

I tried overwriting ds.open at each stage of building ds, i.e. on the ImageItemList, the ItemLists and the LabelLists. On the ImageItemList I get my method as desired, but as soon as I move on to ItemLists or further downstream, I get the default method again.

I tried subclassing as you outlined above with:

class MyImageItemList(ImageItemList):
    def open(self, fn):
        ...
        return image
How can I enforce the use of my custom method?

You should just replace the ImageItemList by MyImageItemList in your call of the data block API.
Also, be very careful with label_from_list: it won't work here because your inputs are no longer in the same order after the split. Use label_from_df instead.
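A toy illustration of the ordering problem in plain Python: a fixed permutation stands in for the random split, and a dict stands in for looking labels up by filename the way a DataFrame column does. None of these names are fastai APIs.

```python
items = ["img0", "img1", "img2", "img3", "img4"]
labels = ["a", "b", "c", "d", "e"]           # labels[i] belongs to items[i], in CSV order
label_of = dict(zip(items, labels))          # keyed lookup, like reading a df column by filename

perm = [3, 0, 4, 1, 2]                       # fixed stand-in for a random permutation
train_items = [items[i] for i in perm[:4]]   # first 4 -> train, last 1 -> valid

# Positional pairing with the ORIGINAL label list: labels drift away from their items.
positional = list(zip(train_items, labels))
# Keyed pairing: each item fetches its own label, so the split order is irrelevant.
keyed = [(it, label_of[it]) for it in train_items]

print(positional)  # [('img3', 'a'), ('img0', 'b'), ('img4', 'c'), ('img1', 'd')]
print(keyed)       # [('img3', 'd'), ('img0', 'a'), ('img4', 'e'), ('img1', 'b')]
```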

I did both of these things. What I don't understand is why the LabelList that gets created falls back to the default open(). I presume it is a scoping problem, but I just can't find where it goes wrong. I will report back if I find out.

It seems to work now. In an older version of my code (subclassing fastai well before 1.0.42), I had overridden the methods

    def __getitem__(self,i):
        return self.open(self.items[i])
    
    def get(self, fn):
        return self.open(fn)

as well. I thought I needed them, but in fact they mess everything up and cause a lot of trouble.
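My reading of the fastai v1 source (please check your version) is that the base `ImageList.get` takes an integer index, looks up the filename in `items`, and only then calls `open`. If that's right, an override like `get(self, fn)` silently receives an index instead of a filename. A hypothetical, pure-Python mock of that call chain (`ImageListSketch`, `OldStyleList` and `GoodList` are illustrative names only):

```python
class ImageListSketch:
    """Hypothetical stand-in; assumes the base `get` takes an INDEX."""

    def __init__(self, items):
        self.items = list(items)

    def open(self, fn):
        return f"opened {fn}"

    def get(self, i):
        return self.open(self.items[i])     # index -> filename -> open

    def __getitem__(self, i):
        return self.get(i)


class OldStyleList(ImageListSketch):
    def get(self, fn):                       # wrong: actually receives an index
        return self.open(fn)


class GoodList(ImageListSketch):
    def open(self, fn):                      # only customize how a file is read
        return f"custom {fn}"


files = ["0001.png", "0002.png"]
print(GoodList(files)[1])       # custom 0002.png
print(OldStyleList(files)[1])   # opened 1  <- the index leaked into open()
```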

Yeah, just subclassing open is preferable. Glad it’s working now!
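The same mechanism plausibly explains the earlier symptom where assigning `ds.open = my_image_loader` was lost downstream: each data block step builds new list objects from the class, so a function patched onto one instance doesn't travel, while a method defined on a subclass does. A toy sketch of that mechanism; `ListSketch`, `MyListSketch` and `split` are hypothetical stand-ins, not fastai's real code:

```python
class ListSketch:
    """Hypothetical stand-in for a fastai list type; NOT the real class."""

    def __init__(self, items):
        self.items = list(items)

    def open(self, fn):
        return f"default:{fn}"

    def split(self, n_valid):
        # Builds NEW objects from the class, so anything patched onto the
        # old *instance* is not carried over.
        cls = self.__class__
        return cls(self.items[n_valid:]), cls(self.items[:n_valid])


class MyListSketch(ListSketch):
    def open(self, fn):
        return f"custom:{fn}"


def my_loader(fn):
    return f"patched:{fn}"


patched = ListSketch(["a", "b", "c"])
patched.open = my_loader             # instance attribute: lost on the next step
train_p, _ = patched.split(1)

sub = MyListSketch(["a", "b", "c"])  # class override: inherited by new objects
train_s, _ = sub.split(1)

print(patched.open("a"))   # patched:a
print(train_p.open("b"))   # default:b
print(train_s.open("b"))   # custom:b
```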

Hi, I'm taking baby steps toward training on 3D images, and I've been reading the posts above. I'm trying to understand how to use:

class MyImageItemList(ImageItemList):
    def open(self, fn:PathOrStr)->Image:
        ...
        return image

as described above. The images are from MRNet and are stored as .npy files. I can read each file into my Jupyter notebook with:

img_array = np.load('0000.npy')

The resulting array has shape (44, 256, 256): basically 44 slices, each of shape (256, 256). So if I want to view the last slice, I run:

plt.imshow(img_array[43], cmap='gray')
plt.show()
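A NumPy-only sketch of pulling 2D inputs out of such a volume. Random data with the same shape stands in for `np.load('0000.npy')`, and repeating one slice into 3 channels is just one possible option (taking three adjacent slices as channels is another), not an established MRNet recipe:

```python
import numpy as np

# Stand-in for np.load('0000.npy'): random data with the shape from the post.
img_array = np.random.default_rng(0).integers(0, 256, size=(44, 256, 256), dtype=np.uint8)

n_slices, h, w = img_array.shape
last = img_array[-1]                 # same as img_array[43]
middle = img_array[n_slices // 2]    # one 2D slice, shape (256, 256)

# Many 2D pipelines expect channels-first, 3-channel float input, so one option
# is to repeat a single slice 3 times and scale it to [0, 1].
three_ch = np.repeat(middle[None, :, :], 3, axis=0).astype(np.float32) / 255.0
print(last.shape, three_ch.shape)    # (256, 256) (3, 256, 256)
```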

I'm trying to use the data block API, and I think a custom ItemList is the first step. I initially tried:

mri_list = ObjectItemList.from_csv(path, 'train_knee_tiny.csv', folder='sagittal',
                                   suffix='.npy')

It runs, but when I display mri_list I get:

OSError: cannot identify image file '.\\sagittal\\n0000.npy'

I guess that makes sense, since ObjectItemList opens each file as a 2D image (via PIL), and PIL can't read .npy files.

So I tried the subclass above, with open just returning:

return np.load(fn)

I get

NameError: name 'ImageItemList' is not defined

So I think I have some conceptual errors about how this works. I'm reading more (I just started lesson 7, so I'm not yet sure of myself with the data block API or how to handle something that seems non-standard).

Hi,
I'm trying a similar image read from .npy files and get the same error.
Did you solve the problem?

@burak and @jmstadt
I think this should work:

def open_npy(fn:PathOrStr, cls:type=MyImage, after_open:Callable=None)->Image:
    x = np.load(fn)                          # e.g. a (slices, H, W) volume
    if after_open: x = after_open(x)
    return cls(torch.from_numpy(x).float())  # fastai's Image wraps a torch tensor, not a numpy array

class MyList(ImageList):
    def open(self, fn):
        return open_npy(fn, after_open=self.after_open)

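Change cls:type=MyImage to cls:type=Image if you don't want to customize your ItemBase class. If you do want to override something in the Image class, subclass it (and define the subclass before open_npy, since the default argument cls=MyImage is evaluated when open_npy is defined):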

class MyImage(Image):
    # Do whatever is necessary to represent your data :-)
    pass
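To test the loading logic on its own, without fastai, here is a framework-free version of the same "load, then optional after_open hook" idea. `load_npy` and `middle_slice` are hypothetical helpers, and the file is a zero-filled dummy written to a temporary directory; in fastai you would then hand the array to your Image class as in open_npy.

```python
import os
import tempfile

import numpy as np


def load_npy(fn, after_open=None):
    """Hypothetical, framework-free version of open_npy: load, then optional hook."""
    x = np.load(fn)
    if after_open is not None:
        x = after_open(x)
    return x.astype(np.float32)


def middle_slice(vol):
    # example hook: keep the middle slice of a (slices, H, W) volume as (1, H, W)
    return vol[vol.shape[0] // 2][None]


with tempfile.TemporaryDirectory() as d:
    fn = os.path.join(d, "0000.npy")
    np.save(fn, np.zeros((44, 256, 256), dtype=np.uint8))
    full = load_npy(fn)
    one = load_npy(fn, after_open=middle_slice)

print(full.shape, one.shape)  # (44, 256, 256) (1, 256, 256)
```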

Hi Burak, no, I had not found a solution. So thanks, Kai! I will give that a try!

I'm working on the Digit problem. Since the data is given in CSV format, I wrote a custom class that converts each row into an image:

class CustomImageList(ImageList):
    def open(self, fn):
        img = fn.reshape(28,28)
        img = np.stack((img,)*3, axis=-1)
        return Image(pil2tensor(img, dtype=np.float32))
    
    @classmethod
    def from_csv_custom(cls, path:PathOrStr, csv_name:str, imgIdx:int=1, header:str='infer', **kwargs)->'ItemList': 
        df = pd.read_csv(Path(path)/csv_name, header=header)
        res = super().from_df(df, path=path, cols=0, **kwargs)
        
        res.items = df.iloc[:,imgIdx:].apply(lambda x: x.values / 255.0, axis=1).values
        
        return res
    
    @classmethod
    def from_df_custom(cls, path:PathOrStr, df:DataFrame, imgIdx:int=1, header:str='infer', **kwargs)->'ItemList': 
        res = super().from_df(df, path=path, cols=0, **kwargs)
        
        res.items = df.iloc[:,imgIdx:].apply(lambda x: x.values / 255.0, axis=1).values
        
        return res
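To check what `open` receives and returns without fastai, here is the same array manipulation in plain NumPy on one synthetic row (label plus 784 pixel values). The final transpose to channels-first mirrors the layout that pil2tensor produces; treat that correspondence as my assumption rather than a guarantee for every fastai version.

```python
import numpy as np

# One synthetic CSV row: label followed by 784 pixel values (real rows come from the CSV).
row = np.concatenate([[7], np.arange(784) % 256])
pixels = row[1:] / 255.0                                 # what from_csv_custom stores per item

img = pixels.reshape(28, 28)                             # H x W
img = np.stack((img,) * 3, axis=-1)                      # H x W x 3, as in CustomImageList.open
chw = np.transpose(img, (2, 0, 1)).astype(np.float32)    # channels-first
print(img.shape, chw.shape)                              # (28, 28, 3) (3, 28, 28)
```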

I call it just like any other Data Block API factory method, and everything seems to run fine. I exported my .pkl file too. But when I try to run this on Heroku, I get the following error in my logs:

AttributeError: Can't get attribute 'CustomImageList' on <module '__main__' from 'app/server.py'>

Can any one help here?

Edit: adding my server.py code too:

import aiohttp
import asyncio
import uvicorn
from fastai import *
from fastai.vision import *
from io import BytesIO
from starlette.applications import Starlette
from starlette.middleware.cors import CORSMiddleware
from starlette.responses import HTMLResponse, JSONResponse
from starlette.staticfiles import StaticFiles

export_file_url = 'https://www.googleapis.com/drive/v3/files/1iRYfxkbrmHoAiV6aJbiLoaEOyXdERBe1?alt=media&key=AIzaSyA1CbVi3ynikmMs4KXq1xXnHSol27UaQ2U'
export_file_name = 'export.pkl'

Port = int(os.environ.get('PORT', 50000))

classes = ['0','1','2','3','4','5','6','7','8','9']
path = Path(__file__).parent

app = Starlette()
app.add_middleware(CORSMiddleware, allow_origins=['*'], allow_headers=['X-Requested-With', 'Content-Type'])
app.mount('/static', StaticFiles(directory='app/static'))


async def download_file(url, dest):
    if dest.exists(): return
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            data = await response.read()
            with open(dest, 'wb') as f:
                f.write(data)


async def setup_learner():
    await download_file(export_file_url, path / export_file_name)
    try:
        learn = load_learner(path, export_file_name)
        return learn
    except RuntimeError as e:
        if len(e.args) > 0 and 'CPU-only machine' in e.args[0]:
            print(e)
            message = "\n\nThis model was trained with an old version of fastai and will not work in a CPU environment.\n\nPlease update the fastai library in your training environment and export your model again.\n\nSee instructions for 'Returning to work' at https://course.fast.ai."
            raise RuntimeError(message)
        else:
            raise


loop = asyncio.get_event_loop()
tasks = [asyncio.ensure_future(setup_learner())]
learn = loop.run_until_complete(asyncio.gather(*tasks))[0]
loop.close()


@app.route('/')
async def homepage(request):
    html_file = path / 'view' / 'index.html'
    return HTMLResponse(html_file.open().read())


@app.route('/analyze', methods=['POST'])
async def analyze(request):
    img_data = await request.form()
    img_bytes = await (img_data['file'].read())
    img = open_image(BytesIO(img_bytes))
    prediction = learn.predict(img)[0]
    return JSONResponse({'result': str(prediction)})


if __name__ == '__main__':
    if 'serve' in sys.argv:
        uvicorn.run(app=app, host='0.0.0.0', port=Port, log_level="info")

Sorry for the long post!

Okay, I figured out the error. I created a utils.py and pasted my custom class CustomImageList into it (don't forget to import the fastai, pandas and numpy libraries there as well). Then, in your server.py, add from utils import CustomImageList. That worked for me.
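Why this works: pickle stores classes by reference (module name plus class name). The error shows the exported learner refers to `__main__.CustomImageList`, so unpickling looks that name up on whatever module is `__main__` at load time, here server.py. Importing the class into server.py makes the lookup succeed. A stdlib-only sketch of the mechanism, using a throwaway module with the hypothetical name `train_script` in place of `__main__`:

```python
import pickle
import sys
import types

# Throwaway module standing in for the training process's __main__ (hypothetical name).
train_mod = types.ModuleType("train_script")
sys.modules["train_script"] = train_mod


class CustomImageList:
    """Stand-in for the real fastai subclass; only the pickling behaviour matters here."""

    def __init__(self, items):
        self.items = items


# Pretend the class was defined in (and lives on) the training module.
CustomImageList.__module__ = "train_script"
train_mod.CustomImageList = CustomImageList

payload = pickle.dumps(CustomImageList([1, 2, 3]))  # refers to train_script.CustomImageList

# "Server" side: the module exists, but the class was never defined or imported into it.
del train_mod.CustomImageList
try:
    pickle.loads(payload)
    failed, error_text = False, ""
except (AttributeError, pickle.UnpicklingError) as e:
    # e.g. "Can't get attribute 'CustomImageList' on <module 'train_script'>"
    # (exact wording varies by Python version)
    failed, error_text = True, str(e)

# The fix: make the class reachable under the same module name again, which is
# what `from utils import CustomImageList` does for server.py running as __main__.
train_mod.CustomImageList = CustomImageList
restored = pickle.loads(payload)
print(failed, restored.items)  # True [1, 2, 3]
```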