I am a novice Kaggler taking part in the Bengali.AI contest, and I am facing a roadblock. Training on Kaggle is proving incredibly difficult: the kernel shuts down after an hour of inactivity, and training ResNet-101 for 10 epochs takes about 10 hours, which is beyond the total time a Kaggle kernel can run. I thought of using Colab instead, but I am stuck getting the 200K-image dataset onto Google Drive (I downloaded it as a zip file into Drive, and extracting it keeps failing because the Colab notebook crashes under the sheer amount of data). Any suggestions?
Are you running out of room specifically in Colab for the files?
Memory doesn't seem to be the problem. The data is downloaded into my Google Drive by setting up the Kaggle API on Colab; it arrives as a zip file stored in Drive. While extracting that zip file from Drive, the Colab notebook crashes with a timeout because of the large number of images. I have also tried extracting it manually, but that doesn't bear fruit either.
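For reference, the setup described above looks roughly like this (the competition slug and paths are assumptions, not copied from the post):

from google.colab import drive
drive.mount('/content/gdrive')                      # mount Google Drive in Colab
!pip install -q kaggle
!mkdir -p ~/.kaggle && cp kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json
!kaggle competitions download -c bengaliai-cv19 -p "/content/gdrive/My Drive/Bengali"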
Can you post the exact error it winds up throwing at you?
If you are using fastai2, you can check this kernel I just shared on Kaggle:
https://www.kaggle.com/mnpinto/fastai2-starter-lb0-9598
There are other good kernels using fastai v1 that you can look at to get a good baseline.
Regarding Colab I'm not sure, but if you try smaller models, even a ResNet-50, the 9-hour limit on Kaggle should be enough to train a single fold.
I think you're unzipping the files directly from Google Drive; it is better to copy the zip file to local Colab storage and then extract it there.
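A minimal sketch of that suggestion, assuming the zip already sits in Drive (paths are illustrative):

!cp "/content/gdrive/My Drive/Bengali/train_image_data_0.parquet.zip" /content/   # Drive -> local Colab disk
!unzip -q /content/train_image_data_0.parquet.zip -d /content/bengali              # extract locally, not on Drive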
Not sure if you have already found a solution, but this is how I did it. Once you have downloaded the zip files from Kaggle into Colab, move them to your Google Drive (using your own paths):
import shutil
shutil.move('/content/train.csv.zip', '/content/gdrive/My Drive/Colab_Data/Bengali/')
Once moved, cd into that directory and unzip:
!unzip train_image_data_0.parquet.zip
I have included a link to the code I used: GitHub - asvcode/BengaliAI: Google Colab starter notebook to download and extract the Bengali.AI Handwritten Grapheme Classification dataset
How can we create a separate ImageDataBunch for test data which doesn't have any labels?
I don't know if this is the easiest way, but you can define the labels to be all zero for the test data and create the ImageDataBunch as usual.
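A rough sketch of that idea with fastai v1 (file and column names are placeholders):

from fastai.vision import *
import pandas as pd

# give every test image a dummy label of 0 so the usual factory method works
test_df = pd.DataFrame({'image_id': test_ids, 'label': 0})     # test_ids assumed to exist
data_test = ImageDataBunch.from_df('test_images', test_df, fn_col='image_id',
                                   label_col='label', suffix='.png', valid_pct=0)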
I tried to load a pretrained densenet121 model. While creating the cnn_learner object, the weights are downloaded over the internet. As Kaggle doesn't allow submitting a kernel with the internet switched on, I created the directory /tap/.cache/torch/checkpoints, added pretrained densenet121 weights via "Add Data", and moved them into that directory. This technique usually works; others have employed it in other contests. But I am still getting an error. Any idea what to do?
Train in one notebook/script and then save/export the model and predict in another notebook/script.
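With fastai2 that flow looks roughly like this (the file name and attached dataset path are assumptions):

# training notebook: export the fitted learner to disk
learn.export('bengali_model.pkl')

# inference notebook (internet off), with the .pkl added as a Kaggle dataset:
learn = load_learner('../input/bengali-model/bengali_model.pkl')
preds, _ = learn.get_preds(dl=test_dl)    # test_dl built from the test images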
That's what I did. I loaded the trained weights separately in my inference notebook.
Is it possible to reduce the number of channels of an image while creating the DataBunch object? If so, how?
If you are interested in teaming up, let me know. I'm looking to do this contest as well.
My current LB score is 96.45. Happy to team up if someone is looking.
Have you tried the competition previously?
I have found that RandomCrop increased my validation recall_score by 1.2%. The idea to try RandomCrop came from the second chapter of fastbook.
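For reference, a sketch of where RandomCrop would go in a fastai2 DataBlock (the crop size and getters are illustrative):

dblock = DataBlock(blocks=(ImageBlock(cls=PILImageBW), CategoryBlock),
                   get_x=get_x, get_y=get_label,                      # getters assumed to exist
                   item_tfms=[RandomCrop(128)],                       # random 128x128 crop of each image
                   batch_tfms=[Normalize.from_stats([0.0692], [0.2051])])
dls = dblock.dataloaders(train_df, bs=128)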
I've stored the test images in a directory, and all I have to do is get predictions for each one. Since the train and test directories are different, the image getter defined in my DataBlock won't work with test_dl, so I chose the low-level API to build the dataloader:
def get_x(o): return f'test_images/{o}.png'      # map an id to its test image path
dsets = Datasets(test_ids, tfms=[[get_x, PILImageBW.create]])
tdl = TfmdDL(dsets, bs=1, after_item=[ToTensor()],
             after_batch=[IntToFloatTensor(), Normalize.from_stats([0.0692],[0.2051])],
             device=default_device())
Now I just need predictions for each image, so I passed the dl to learn.get_preds as follows:
learn.get_preds(dl=tdl, with_decoded=True, with_loss=False)
Note: calling it without any extra kwargs also causes the same issue.
I do have a metric called RecallCombine, but since I don't need the results/loss I assumed it shouldn't be invoked by a plain get_preds call. Still, I'm getting an error associated with RecallCombine for the line above. The error log is as follows:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-114-f5c756526f11> in <module>
----> 1 learn.get_preds(dl=tdl, with_decoded=True,with_loss=False)
/opt/conda/lib/python3.6/site-packages/fastai2/learner.py in get_preds(self, ds_idx, dl, with_input, with_decoded, with_loss, act, inner, **kwargs)
216 for mgr in ctx_mgrs: stack.enter_context(mgr)
217 self(event.begin_epoch if inner else _before_epoch)
--> 218 self._do_epoch_validate(dl=dl)
219 self(event.after_epoch if inner else _after_epoch)
220 if act is None: act = getattr(self.loss_func, 'activation', noop)
/opt/conda/lib/python3.6/site-packages/fastai2/learner.py in _do_epoch_validate(self, ds_idx, dl)
175 except CancelValidException: self('after_cancel_validate')
176 finally:
--> 177 dl,*_ = change_attrs(dl, names, old, has); self('after_validate')
178
179 def fit(self, n_epoch, lr=None, wd=None, cbs=None, reset_opt=False):
/opt/conda/lib/python3.6/site-packages/fastai2/learner.py in __call__(self, event_name)
121 def ordered_cbs(self, cb_func): return [cb for cb in sort_by_run(self.cbs) if hasattr(cb, cb_func)]
122
--> 123 def __call__(self, event_name): L(event_name).map(self._call_one)
124 def _call_one(self, event_name):
125 assert hasattr(event, event_name)
/opt/conda/lib/python3.6/site-packages/fastcore/foundation.py in map(self, f, *args, **kwargs)
360 else f.format if isinstance(f,str)
361 else f.__getitem__)
--> 362 return self._new(map(g, self))
363
364 def filter(self, f, negate=False, **kwargs):
/opt/conda/lib/python3.6/site-packages/fastcore/foundation.py in _new(self, items, *args, **kwargs)
313 @property
314 def _xtra(self): return None
--> 315 def _new(self, items, *args, **kwargs): return type(self)(items, *args, use_list=None, **kwargs)
316 def __getitem__(self, idx): return self._get(idx) if is_indexer(idx) else L(self._get(idx), use_list=None)
317 def copy(self): return self._new(self.items.copy())
/opt/conda/lib/python3.6/site-packages/fastcore/foundation.py in __call__(cls, x, *args, **kwargs)
39 return x
40
---> 41 res = super().__call__(*((x,) + args), **kwargs)
42 res._newchk = 0
43 return res
/opt/conda/lib/python3.6/site-packages/fastcore/foundation.py in __init__(self, items, use_list, match, *rest)
304 if items is None: items = []
305 if (use_list is not None) or not _is_array(items):
--> 306 items = list(items) if use_list else _listify(items)
307 if match is not None:
308 if is_coll(match): match = len(match)
/opt/conda/lib/python3.6/site-packages/fastcore/foundation.py in _listify(o)
240 if isinstance(o, list): return o
241 if isinstance(o, str) or _is_array(o): return [o]
--> 242 if is_iter(o): return list(o)
243 return [o]
244
/opt/conda/lib/python3.6/site-packages/fastcore/foundation.py in __call__(self, *args, **kwargs)
206 if isinstance(v,_Arg): kwargs[k] = args.pop(v.i)
207 fargs = [args[x.i] if isinstance(x, _Arg) else x for x in self.pargs] + args[self.maxi+1:]
--> 208 return self.fn(*fargs, **kwargs)
209
210 # Cell
/opt/conda/lib/python3.6/site-packages/fastai2/learner.py in _call_one(self, event_name)
124 def _call_one(self, event_name):
125 assert hasattr(event, event_name)
--> 126 [cb(event_name) for cb in sort_by_run(self.cbs)]
127
128 def _bn_bias_state(self, with_bias): return bn_bias_params(self.model, with_bias).map(self.opt.state)
/opt/conda/lib/python3.6/site-packages/fastai2/learner.py in <listcomp>(.0)
124 def _call_one(self, event_name):
125 assert hasattr(event, event_name)
--> 126 [cb(event_name) for cb in sort_by_run(self.cbs)]
127
128 def _bn_bias_state(self, with_bias): return bn_bias_params(self.model, with_bias).map(self.opt.state)
/opt/conda/lib/python3.6/site-packages/fastai2/callback/core.py in __call__(self, event_name)
21 _run = (event_name not in _inner_loop or (self.run_train and getattr(self, 'training', True)) or
22 (self.run_valid and not getattr(self, 'training', False)))
---> 23 if self.run and _run: getattr(self, event_name, noop)()
24 if event_name=='after_fit': self.run=True #Reset self.run to True at each end of fit
25
/opt/conda/lib/python3.6/site-packages/fastai2/learner.py in after_validate(self)
429 def begin_validate(self): self._valid_mets.map(Self.reset())
430 def after_train (self): self.log += self._train_mets.map(_maybe_item)
--> 431 def after_validate(self): self.log += self._valid_mets.map(_maybe_item)
432 def after_cancel_train(self): self.cancel_train = True
433 def after_cancel_validate(self): self.cancel_valid = True
/opt/conda/lib/python3.6/site-packages/fastcore/foundation.py in map(self, f, *args, **kwargs)
360 else f.format if isinstance(f,str)
361 else f.__getitem__)
--> 362 return self._new(map(g, self))
363
364 def filter(self, f, negate=False, **kwargs):
/opt/conda/lib/python3.6/site-packages/fastcore/foundation.py in _new(self, items, *args, **kwargs)
313 @property
314 def _xtra(self): return None
--> 315 def _new(self, items, *args, **kwargs): return type(self)(items, *args, use_list=None, **kwargs)
316 def __getitem__(self, idx): return self._get(idx) if is_indexer(idx) else L(self._get(idx), use_list=None)
317 def copy(self): return self._new(self.items.copy())
/opt/conda/lib/python3.6/site-packages/fastcore/foundation.py in __call__(cls, x, *args, **kwargs)
39 return x
40
---> 41 res = super().__call__(*((x,) + args), **kwargs)
42 res._newchk = 0
43 return res
/opt/conda/lib/python3.6/site-packages/fastcore/foundation.py in __init__(self, items, use_list, match, *rest)
304 if items is None: items = []
305 if (use_list is not None) or not _is_array(items):
--> 306 items = list(items) if use_list else _listify(items)
307 if match is not None:
308 if is_coll(match): match = len(match)
/opt/conda/lib/python3.6/site-packages/fastcore/foundation.py in _listify(o)
240 if isinstance(o, list): return o
241 if isinstance(o, str) or _is_array(o): return [o]
--> 242 if is_iter(o): return list(o)
243 return [o]
244
/opt/conda/lib/python3.6/site-packages/fastcore/foundation.py in __call__(self, *args, **kwargs)
206 if isinstance(v,_Arg): kwargs[k] = args.pop(v.i)
207 fargs = [args[x.i] if isinstance(x, _Arg) else x for x in self.pargs] + args[self.maxi+1:]
--> 208 return self.fn(*fargs, **kwargs)
209
210 # Cell
/opt/conda/lib/python3.6/site-packages/fastai2/learner.py in _maybe_item(t)
385
386 def _maybe_item(t):
--> 387 t = t.value
388 return t.item() if isinstance(t, Tensor) and t.numel()==1 else t
389
<ipython-input-106-0c956accd1b2> in value(self)
34 @property
35 def value(self):
---> 36 return self.combine
AttributeError: 'RecallCombine' object has no attribute 'combine'
The code for RecallPartial and RecallCombine is borrowed from this kernel:
# imports needed to make the metrics below self-contained
from functools import partial
import numpy as np
import torch
from sklearn.metrics import recall_score
from fastai2.basics import *    # Metric, to_detach, flatten_check

class RecallPartial(Metric):
    # based on AccumMetric
    "Stores predictions and targets on CPU in accumulate to perform final calculations with `func`."
    def __init__(self, a=0, **kwargs):
        self.func = partial(recall_score, average='macro', zero_division=0)
        self.a = a          # index of the output head this metric scores
    def reset(self): self.targs,self.preds = [],[]
    def accumulate(self, learn):
        pred = learn.pred[self.a].argmax(dim=-1)
        targ = learn.y[self.a]
        pred,targ = to_detach(pred),to_detach(targ)
        pred,targ = flatten_check(pred,targ)
        self.preds.append(pred)
        self.targs.append(targ)
    @property
    def value(self):
        if len(self.preds) == 0: return
        preds,targs = torch.cat(self.preds),torch.cat(self.targs)
        return self.func(targs, preds)
    @property
    def name(self): return df.columns[self.a+1]   # df: the training dataframe from the kernel

class RecallCombine(Metric):
    def accumulate(self, learn):
        scores = [learn.metrics[i].value for i in range(3)]
        self.combine = np.average(scores, weights=[2,1,1])
    @property
    def value(self):
        return self.combine
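The AttributeError above appears to happen because value is read before accumulate has ever set self.combine on the unlabelled test dl. Two possible workarounds, as a sketch only (not tested on this kernel):

# 1) define reset() so self.combine always exists, even if accumulate() never runs
class RecallCombine(Metric):
    def reset(self): self.combine = None
    def accumulate(self, learn):
        scores = [learn.metrics[i].value for i in range(3)]
        self.combine = np.average(scores, weights=[2,1,1])
    @property
    def value(self): return self.combine

# 2) or drop the metrics entirely before inference, since they are not needed there
learn.metrics = []
preds = learn.get_preds(dl=tdl, with_decoded=True)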