Bengali AI Kaggle contest

I am new to Kaggle and am taking part in the Bengali AI contest, but I have hit a roadblock. Training on Kaggle is proving very difficult: a Kaggle kernel cannot sit idle for more than an hour, and training a ResNet-101 for 10 epochs takes about 10 hours, which exceeds the total time a Kaggle kernel can run. I thought of using Colab instead, but I am struggling to get the 200K-image dataset into Google Drive. I downloaded it to Drive as a zip file and am trying to extract it there, but the Colab notebook keeps crashing because of the sheer amount of data. Any suggestions?

Are you running out of room specifically in Colab for the files?

Memory doesn’t seem to be the problem. I download the competition dataset into my Google Drive by setting up the Kaggle API on Colab. The dataset arrives as a zip file and is stored in my Google Drive. When I extract the zip file inside Drive, the Colab notebook crashes with a timeout because of the large number of images. I have also tried extracting it manually, but that didn’t work either.

Can you post the exact error it winds up throwing at you?

If you are using fastai2, you can check this kernel I just shared on Kaggle:
https://www.kaggle.com/mnpinto/fastai2-starter-lb0-9598

There are other good kernels using fastai v1 that you can look at to get a good baseline.

Regarding Colab, I’m not sure. But if you try a smaller model, even a resnet50, the 9h limit on Kaggle should be enough to train a single fold.

I think you’re unzipping the files directly from Google Drive. It is better to copy the zip file to local Colab storage first and then extract it there.
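
For anyone who wants a concrete version of this suggestion, here is a minimal sketch using only the Python standard library. The function name `copy_and_extract` and the paths in the usage note are hypothetical and not from this thread; it assumes Drive is already mounted in the Colab session.

```python
import shutil
import zipfile
from pathlib import Path


def copy_and_extract(zip_on_drive, local_dir):
    """Copy a zip archive to local storage, then extract it there.

    Returns the directory the files were extracted into.
    """
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)

    # Copy the archive in one transfer so that extraction reads from
    # local disk rather than from the mounted Drive.
    local_zip = Path(shutil.copy(zip_on_drive, local_dir))

    # Extract into a folder named after the archive (without ".zip").
    out_dir = local_dir / local_zip.stem
    with zipfile.ZipFile(local_zip) as zf:
        zf.extractall(out_dir)

    # Remove the local copy of the archive; the original on Drive is untouched.
    local_zip.unlink()
    return out_dir
```

In Colab this might be called as copy_and_extract("/content/gdrive/My Drive/Colab_Data/Bengali/train.zip", "/content/data"), where both paths are placeholders to adapt. Note that local Colab storage is wiped when the session ends, so the copy and extraction have to be repeated in each new session.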

Not sure if you have already found a solution, but this is how I did it. Once you have downloaded the zip files from Kaggle into Colab, move them to your Google Drive (using your own paths):

import shutil
shutil.move("/content/train.csv.zip", "/content/gdrive/My Drive/Colab_Data/Bengali/")

Once moved, cd into that directory and unzip:

!unzip train_image_data_0.parquet.zip

Here is a link to the code I used: https://github.com/asvcode/BengaliAI

How can we create a separate ImageDataBunch for test data that doesn’t have any labels?

I don’t know if this is the easiest way, but you can set the labels to all zeros for the test data and create the ImageDataBunch as usual.
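
As a rough illustration of the zero-label idea, the sketch below builds a placeholder label table for the test images with the standard library only. The helper names are hypothetical, and the column names are an assumption based on the layout of the competition’s train.csv; the resulting CSV would then be passed to whichever DataBunch factory you already use for training.

```python
import csv

# Assumed label columns, following the layout of the competition's train.csv.
LABEL_COLS = ["grapheme_root", "vowel_diacritic", "consonant_diacritic"]


def dummy_label_rows(image_ids):
    """Return one row per test image with every label set to 0."""
    return [{"image_id": i, **{c: 0 for c in LABEL_COLS}} for i in image_ids]


def write_dummy_labels(image_ids, csv_path):
    """Write the placeholder rows to a CSV with the same columns as train.csv."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["image_id"] + LABEL_COLS)
        writer.writeheader()
        writer.writerows(dummy_label_rows(image_ids))
```

The zeros are placeholders only, so any loss or metric computed against them is meaningless; only the predictions should be used.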

I tried to load a pretrained densenet121 model. While creating the cnn_learner object, the weights are downloaded from the internet. As Kaggle doesn’t allow submitting a kernel with the internet switched on, I created the directory '/tmp/.cache/torch/checkpoints', added the pretrained densenet121 weights using “Add Data”, and moved them into that directory. This technique usually works, and others have used it in other contests, but I am still getting an error. Any idea what to do?
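
For reference, the cache-directory technique described here can be sketched as below. The function name `stage_pretrained_weights` is hypothetical. Two assumptions are worth checking if the approach fails: the copied file must have exactly the file name that the model loader tries to download, and the directory torch looks in depends on the environment and the torch version (older versions used a `checkpoints` folder under the torch cache, while newer ones use `hub/checkpoints`; in recent versions `torch.hub.get_dir()` reports the hub directory).

```python
import shutil
from pathlib import Path


def stage_pretrained_weights(weights_file, cache_dir):
    """Copy a weights file into the directory torch checks before downloading.

    `cache_dir` is an assumption the caller must verify for their environment.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Keep the original file name, since the loader looks the file up by name.
    target = cache_dir / Path(weights_file).name
    if not target.exists():
        shutil.copy(weights_file, target)
    return target
```

This has to run before the learner is created, so that the file is already in place when the loader checks the cache.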

Train in one notebook/script and then save/export the model and predict in another notebook/script.

That’s what I did. I loaded the trained weights separately in my inference notebook.

Is it possible to reduce the number of channels of an image while creating the DataBunch object? If so, how?

If you are interested in teaming up… let me know. I’m looking to do this contest as well.

My current LB score is 96.45. Happy to team up if anyone is looking.

Have you tried the competition previously?

I have found that RandomCrop increased my validation recall_score by 1.2%. I got the idea to try RandomCrop from the second chapter of FastBook.

I’ve stored the test images in a directory, and all I have to do is get predictions for each one. Since the train and test directories are different, the image getter defined in my DataBlock won’t work with test_dl, so I chose the low-level API to build the dataloader:

def get_x(o): return f'test_images/{o}.png'

dsets = Datasets(test_ids, tfms=[[get_x, PILImageBW.create]])
tdl = TfmdDL(dsets, bs=1, after_item=[ToTensor()], after_batch=[IntToFloatTensor(),Normalize.from_stats([0.0692],[0.2051])], device=default_device())

Now I just need predictions for each image, so I passed the dl to learn.get_preds as follows:

learn.get_preds(dl=tdl, with_decoded=True,with_loss=False)

Note: calling it without any extra kwargs also causes the same issue.

I do have a metric called RecallCombine, but since I don’t need the results or the loss, I assumed it wouldn’t be invoked by a plain get_preds call. Still, I’m getting an error associated with RecallCombine for the line above. The error log is as follows:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-114-f5c756526f11> in <module>
----> 1 learn.get_preds(dl=tdl, with_decoded=True,with_loss=False)

/opt/conda/lib/python3.6/site-packages/fastai2/learner.py in get_preds(self, ds_idx, dl, with_input, with_decoded, with_loss, act, inner, **kwargs)
    216             for mgr in ctx_mgrs: stack.enter_context(mgr)
    217             self(event.begin_epoch if inner else _before_epoch)
--> 218             self._do_epoch_validate(dl=dl)
    219             self(event.after_epoch if inner else _after_epoch)
    220             if act is None: act = getattr(self.loss_func, 'activation', noop)

/opt/conda/lib/python3.6/site-packages/fastai2/learner.py in _do_epoch_validate(self, ds_idx, dl)
    175         except CancelValidException:                         self('after_cancel_validate')
    176         finally:
--> 177             dl,*_ = change_attrs(dl, names, old, has);       self('after_validate')
    178 
    179     def fit(self, n_epoch, lr=None, wd=None, cbs=None, reset_opt=False):

/opt/conda/lib/python3.6/site-packages/fastai2/learner.py in __call__(self, event_name)
    121     def ordered_cbs(self, cb_func): return [cb for cb in sort_by_run(self.cbs) if hasattr(cb, cb_func)]
    122 
--> 123     def __call__(self, event_name): L(event_name).map(self._call_one)
    124     def _call_one(self, event_name):
    125         assert hasattr(event, event_name)

/opt/conda/lib/python3.6/site-packages/fastcore/foundation.py in map(self, f, *args, **kwargs)
    360              else f.format if isinstance(f,str)
    361              else f.__getitem__)
--> 362         return self._new(map(g, self))
    363 
    364     def filter(self, f, negate=False, **kwargs):

/opt/conda/lib/python3.6/site-packages/fastcore/foundation.py in _new(self, items, *args, **kwargs)
    313     @property
    314     def _xtra(self): return None
--> 315     def _new(self, items, *args, **kwargs): return type(self)(items, *args, use_list=None, **kwargs)
    316     def __getitem__(self, idx): return self._get(idx) if is_indexer(idx) else L(self._get(idx), use_list=None)
    317     def copy(self): return self._new(self.items.copy())

/opt/conda/lib/python3.6/site-packages/fastcore/foundation.py in __call__(cls, x, *args, **kwargs)
     39             return x
     40 
---> 41         res = super().__call__(*((x,) + args), **kwargs)
     42         res._newchk = 0
     43         return res

/opt/conda/lib/python3.6/site-packages/fastcore/foundation.py in __init__(self, items, use_list, match, *rest)
    304         if items is None: items = []
    305         if (use_list is not None) or not _is_array(items):
--> 306             items = list(items) if use_list else _listify(items)
    307         if match is not None:
    308             if is_coll(match): match = len(match)

/opt/conda/lib/python3.6/site-packages/fastcore/foundation.py in _listify(o)
    240     if isinstance(o, list): return o
    241     if isinstance(o, str) or _is_array(o): return [o]
--> 242     if is_iter(o): return list(o)
    243     return [o]
    244 

/opt/conda/lib/python3.6/site-packages/fastcore/foundation.py in __call__(self, *args, **kwargs)
    206             if isinstance(v,_Arg): kwargs[k] = args.pop(v.i)
    207         fargs = [args[x.i] if isinstance(x, _Arg) else x for x in self.pargs] + args[self.maxi+1:]
--> 208         return self.fn(*fargs, **kwargs)
    209 
    210 # Cell

/opt/conda/lib/python3.6/site-packages/fastai2/learner.py in _call_one(self, event_name)
    124     def _call_one(self, event_name):
    125         assert hasattr(event, event_name)
--> 126         [cb(event_name) for cb in sort_by_run(self.cbs)]
    127 
    128     def _bn_bias_state(self, with_bias): return bn_bias_params(self.model, with_bias).map(self.opt.state)

/opt/conda/lib/python3.6/site-packages/fastai2/learner.py in <listcomp>(.0)
    124     def _call_one(self, event_name):
    125         assert hasattr(event, event_name)
--> 126         [cb(event_name) for cb in sort_by_run(self.cbs)]
    127 
    128     def _bn_bias_state(self, with_bias): return bn_bias_params(self.model, with_bias).map(self.opt.state)

/opt/conda/lib/python3.6/site-packages/fastai2/callback/core.py in __call__(self, event_name)
     21         _run = (event_name not in _inner_loop or (self.run_train and getattr(self, 'training', True)) or
     22                (self.run_valid and not getattr(self, 'training', False)))
---> 23         if self.run and _run: getattr(self, event_name, noop)()
     24         if event_name=='after_fit': self.run=True #Reset self.run to True at each end of fit
     25 

/opt/conda/lib/python3.6/site-packages/fastai2/learner.py in after_validate(self)
    429     def begin_validate(self): self._valid_mets.map(Self.reset())
    430     def after_train   (self): self.log += self._train_mets.map(_maybe_item)
--> 431     def after_validate(self): self.log += self._valid_mets.map(_maybe_item)
    432     def after_cancel_train(self):    self.cancel_train = True
    433     def after_cancel_validate(self): self.cancel_valid = True

/opt/conda/lib/python3.6/site-packages/fastcore/foundation.py in map(self, f, *args, **kwargs)
    360              else f.format if isinstance(f,str)
    361              else f.__getitem__)
--> 362         return self._new(map(g, self))
    363 
    364     def filter(self, f, negate=False, **kwargs):

/opt/conda/lib/python3.6/site-packages/fastcore/foundation.py in _new(self, items, *args, **kwargs)
    313     @property
    314     def _xtra(self): return None
--> 315     def _new(self, items, *args, **kwargs): return type(self)(items, *args, use_list=None, **kwargs)
    316     def __getitem__(self, idx): return self._get(idx) if is_indexer(idx) else L(self._get(idx), use_list=None)
    317     def copy(self): return self._new(self.items.copy())

/opt/conda/lib/python3.6/site-packages/fastcore/foundation.py in __call__(cls, x, *args, **kwargs)
     39             return x
     40 
---> 41         res = super().__call__(*((x,) + args), **kwargs)
     42         res._newchk = 0
     43         return res

/opt/conda/lib/python3.6/site-packages/fastcore/foundation.py in __init__(self, items, use_list, match, *rest)
    304         if items is None: items = []
    305         if (use_list is not None) or not _is_array(items):
--> 306             items = list(items) if use_list else _listify(items)
    307         if match is not None:
    308             if is_coll(match): match = len(match)

/opt/conda/lib/python3.6/site-packages/fastcore/foundation.py in _listify(o)
    240     if isinstance(o, list): return o
    241     if isinstance(o, str) or _is_array(o): return [o]
--> 242     if is_iter(o): return list(o)
    243     return [o]
    244 

/opt/conda/lib/python3.6/site-packages/fastcore/foundation.py in __call__(self, *args, **kwargs)
    206             if isinstance(v,_Arg): kwargs[k] = args.pop(v.i)
    207         fargs = [args[x.i] if isinstance(x, _Arg) else x for x in self.pargs] + args[self.maxi+1:]
--> 208         return self.fn(*fargs, **kwargs)
    209 
    210 # Cell

/opt/conda/lib/python3.6/site-packages/fastai2/learner.py in _maybe_item(t)
    385 
    386 def _maybe_item(t):
--> 387     t = t.value
    388     return t.item() if isinstance(t, Tensor) and t.numel()==1 else t
    389 

<ipython-input-106-0c956accd1b2> in value(self)
     34     @property
     35     def value(self):
---> 36         return self.combine

AttributeError: 'RecallCombine' object has no attribute 'combine'

The code for RecallPartial and RecallCombine is borrowed from this kernel:

class RecallPartial(Metric):
    # based on AccumMetric
    "Stores predictions and targets on CPU in accumulate to perform final calculations with `func`."
    def __init__(self, a=0, **kwargs):
        self.func = partial(recall_score, average='macro', zero_division=0)
        self.a = a

    def reset(self): self.targs,self.preds = [],[]

    def accumulate(self, learn):
        pred = learn.pred[self.a].argmax(dim=-1)
        targ = learn.y[self.a]
        pred,targ = to_detach(pred),to_detach(targ)
        pred,targ = flatten_check(pred,targ)
        self.preds.append(pred)
        self.targs.append(targ)

    @property
    def value(self):
        if len(self.preds) == 0: return
        preds,targs = torch.cat(self.preds),torch.cat(self.targs)
        return self.func(targs, preds)

    @property
    def name(self): return df.columns[self.a+1]
    
class RecallCombine(Metric):
    def accumulate(self, learn):
        scores = [learn.metrics[i].value for i in range(3)]
        self.combine = np.average(scores, weights=[2,1,1])

    @property
    def value(self):
        return self.combine
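
One plausible cause, which this thread does not confirm, is that `accumulate` never runs when the batches carry no targets, so `combine` is never assigned before `after_validate` reads `value`. The sketch below shows a guarded version of the metric. It is a stand-in that deliberately avoids the fastai `Metric` base class so it can run on its own; the class name `RecallCombineSafe` is hypothetical, and the weighted average is written in plain Python instead of `np.average`.

```python
class RecallCombineSafe:
    """Stand-in for RecallCombine that tolerates `value` being read early."""

    weights = (2, 1, 1)

    def reset(self):
        # In the traceback above, the Recorder calls reset on each metric
        # in begin_validate, so this gives `combine` a defined starting value.
        self.combine = None

    def accumulate(self, learn):
        scores = [learn.metrics[i].value for i in range(3)]
        if any(s is None for s in scores):
            return  # the partial recalls have nothing to report yet
        total = sum(w * s for w, s in zip(self.weights, scores))
        self.combine = total / sum(self.weights)

    @property
    def value(self):
        # getattr guards against `value` being read before reset has run.
        return getattr(self, "combine", None)
```

With this guard the metric reports None instead of raising AttributeError when no targets were seen, which mirrors how RecallPartial returns nothing when it has no predictions. Another option might be to remove the metrics from the learner before running inference.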