Beginner: Beginner questions that don't fit elsewhere ✅

I see. Thank you for answering my question!

Generally it is “safe”, except from yourself and those whom you grant write access to. :slight_smile:

These protections are more to enforce that best practices are used and to prevent you from accidentally messing up your repository (e.g. with --force)…

I would set the repository to private and double-check that no credentials or “secrets” were pushed to the repository before exposing it to the public.

Which one do I click if I want others to send pull requests, but I need to approve them before the changes are merged?

Likely nothing. If they don’t have write access to the repo, then the only way to incorporate changes is through a pull request, which must be processed by someone who does have write access.

Edward

Yeah same here, I’m using a remote server on Callisto right now. It makes things like this a little easier since I don’t have to worry about M1 compatibility issues.

Does anybody know how to copy a line in jupyter notebook without highlighting it?
And is it possible to move a line up and down (like using Ctrl-Alt-Arrow keys I think in VS code)?

I could not find it in the keyboard shortcuts menu, but maybe there’s a way to do it.
Thank you.

Hey friends! I’ve not done the 2022 course, and I am preparing to have a group of people go through it together. I am trying to plan for us to have enough time to go through the assignments together, live. From what I can tell, the guidance from Jeremy is that each 90-minute lesson will require 10 hours of going through the assignments in order to consolidate it. Is that right? Since I have about 4 full workdays to work with, that probably means we only get through the first 3-4 lessons. So I wanted to check if that’s really how I should plan it.

I did the 2022 Part 1 course (my first exposure to fast.ai), but it’s hard to judge how much time I spent on it - there were lots of different things to dig into, and also interesting rabbit holes to follow. It also depends on each student’s past experience. Are you considering 4 consecutive days, or some other arrangement?

I don’t think you need to fully consolidate lessons before moving to the next one. Learning this is an iterative process with pieces bouncing off each other as you go along.

I think you could aim for two course lessons a day, plus one Live Coding video for homework.

Prior to each lesson, as a group, read through a printout of the associated questionnaire, and answer the questions as you first watch the lesson video all the way through. Then rewatch it in a start/stop way to replicate it - here is the notebook for the first one: Is it a bird? Creating a model from your own data | Kaggle.

Look for others in these lesson resources or here.


Thanks for your help! Yes, I am planning four 8-hour days, with a lunch break in between.

hey,
I wanted to know what the difference is between the 2022 version and the 2020 version, as from what I can see, the 2020 version doesn’t talk about recurrent networks and doesn’t have homework.
thanks

Hi!

First-time poster, so please forgive me if this is in the wrong place! (I did search the forums first to try to find an answer before posting :slight_smile: )

I’m trying to follow along with Lesson 2, in the context of wanting to use some PyTorch datasets we already have set up at work with fastai’s DataBlock API. In particular, I think things like .show_batch() and the cleaner seem immediately useful for inspecting our data, and the DataBlock API overall sounds great for combining data in different ways to reframe problems, so I’d love to be able to use it!

I’ve been approaching this by trying to follow along with the DataBlock tutorial (notebook 50_tutorial.datablock.ipynb), trying to adapt it to use an existing dataset rather than a path / list of filenames. The existing dataset is basically a collection of images and the corresponding target/y; each item in existing_dataset is a dict with keys ['images', 'target'] mapping 'images'>tensor, 'target'>tensor. So in theory it’s structurally super similar to the example in Lesson 2 and in the DataBlock tutorial notebook. Before getting to dblock.dataloaders(), I figured I would start by trying to get the simpler dblock.datasets() going. I would think that something like this would work:

def get_items(existing_dataset):
    # just a passthrough
    return existing_dataset

def get_target_from_item(item):
    import pdb
    pdb.set_trace()  # for debug only, but isn't executed?
    return item['target']  # shape: torch.Size([1])

def get_image_from_item(item):
    import pdb
    pdb.set_trace()  # for debug only, but isn't executed?
    return item['images']  # shape: torch.Size([3, 256, 256])

dblock = DataBlock(
    blocks = (ImageBlock, RegressionBlock),
    get_items = get_items,
    get_y = get_target_from_item,
    get_x = get_image_from_item,
    splitter = RandomSplitter()
)

print(f"{len(existing_dataset)=}")
print(existing_dataset[0])
dsets = dblock.datasets(existing_dataset, verbose=True)

dsets.train[0]

But unfortunately this doesn’t work: I get an IndexError deep inside fastcore.foundation. (Full traceback below, since it’s long.) My first instinct was to check that I’m at least getting the image and target correctly with get_x and get_y, so I put in those pdb.set_trace() statements so I could inspect the item and see what I was returning… but it doesn’t seem like the pdb breakpoints there are ever even executed! Is there something structurally broken about this approach even before we get to using the actual data?

I’m feeling stuck. I’ve been reading around a bunch, and see lots of people using the provided example datasets, and creating their own datasets in a similar format (files in the filesystem), but I haven’t yet found a case where someone’s using an existing (pytorch) dataset and wants to use the DataBlock API with it. Could someone point me in the right direction, please? Thank you so much!

Full output w traceback:

len(existing_dataset)=3513681
{'targets': tensor([0.5000]), 'images': tensor([[[0.8549, 0.8549, 0.8549,  ..., 0.8706, 0.8706, 0.8706],
         [0.8549, 0.8549, 0.8549,  ..., 0.8706, 0.8706, 0.8706],
         [0.8549, 0.8549, 0.8549,  ..., 0.8706, 0.8706, 0.8706],
         ...,
         [0.8588, 0.8588, 0.8588,  ..., 0.8902, 0.8902, 0.8902],
         [0.8588, 0.8588, 0.8588,  ..., 0.8902, 0.8902, 0.8902],
         [0.8588, 0.8588, 0.8588,  ..., 0.8902, 0.8902, 0.8902]],

        [[0.8588, 0.8588, 0.8588,  ..., 0.8745, 0.8745, 0.8745],
         [0.8588, 0.8588, 0.8588,  ..., 0.8745, 0.8745, 0.8745],
         [0.8588, 0.8588, 0.8588,  ..., 0.8745, 0.8745, 0.8745],
         ...,
         [0.8549, 0.8549, 0.8549,  ..., 0.8902, 0.8902, 0.8902],
         [0.8549, 0.8549, 0.8549,  ..., 0.8902, 0.8902, 0.8902],
         [0.8549, 0.8549, 0.8549,  ..., 0.8902, 0.8902, 0.8902]],

        [[0.8392, 0.8353, 0.8392,  ..., 0.8824, 0.8824, 0.8824],
         [0.8392, 0.8353, 0.8392,  ..., 0.8824, 0.8824, 0.8824],
         [0.8392, 0.8353, 0.8392,  ..., 0.8824, 0.8824, 0.8824],
         ...,
         [0.8392, 0.8392, 0.8392,  ..., 0.8824, 0.8824, 0.8824],
         [0.8392, 0.8392, 0.8392,  ..., 0.8824, 0.8824, 0.8824],
         [0.8392, 0.8392, 0.8392,  ..., 0.8824, 0.8824, 0.8824]]])}
Collecting items from <existing_dataset_class[redacted] object at 0x7f04a8f414c0>
Found 3513681 items
2 datasets of sizes 2810945,702736
Setting up Pipeline: get_image_from_item -> PILBase.create
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Input In [95], in <cell line: 34>()
     32 print(f"{len(existing_dataset)=}")
     33 print(existing_dataset[0])
---> 34 dsets = dblock.datasets(existing_dataset, verbose=True)
     36 dsets.train[0]

File /opt/conda/lib/python3.8/site-packages/fastai/data/block.py:147, in DataBlock.datasets(self, source, verbose)
    145 splits = (self.splitter or RandomSplitter())(items)
    146 pv(f"{len(splits)} datasets of sizes {','.join([str(len(s)) for s in splits])}", verbose)
--> 147 return Datasets(items, tfms=self._combine_type_tfms(), splits=splits, dl_type=self.dl_type, n_inp=self.n_inp, verbose=verbose)

File /opt/conda/lib/python3.8/site-packages/fastai/data/core.py:451, in Datasets.__init__(self, items, tfms, tls, n_inp, dl_type, **kwargs)
    442 def __init__(self, 
    443     items:list=None, # List of items to create `Datasets`
    444     tfms:list|Pipeline=None, # List of `Transform`(s) or `Pipeline` to apply
   (...)
    448     **kwargs
    449 ):
    450     super().__init__(dl_type=dl_type)
--> 451     self.tls = L(tls if tls else [TfmdLists(items, t, **kwargs) for t in L(ifnone(tfms,[None]))])
    452     self.n_inp = ifnone(n_inp, max(1, len(self.tls)-1))

File /opt/conda/lib/python3.8/site-packages/fastai/data/core.py:451, in <listcomp>(.0)
    442 def __init__(self, 
    443     items:list=None, # List of items to create `Datasets`
    444     tfms:list|Pipeline=None, # List of `Transform`(s) or `Pipeline` to apply
   (...)
    448     **kwargs
    449 ):
    450     super().__init__(dl_type=dl_type)
--> 451     self.tls = L(tls if tls else [TfmdLists(items, t, **kwargs) for t in L(ifnone(tfms,[None]))])
    452     self.n_inp = ifnone(n_inp, max(1, len(self.tls)-1))

File /opt/conda/lib/python3.8/site-packages/fastcore/foundation.py:98, in _L_Meta.__call__(cls, x, *args, **kwargs)
     96 def __call__(cls, x=None, *args, **kwargs):
     97     if not args and not kwargs and x is not None and isinstance(x,cls): return x
---> 98     return super().__call__(x, *args, **kwargs)

File /opt/conda/lib/python3.8/site-packages/fastai/data/core.py:365, in TfmdLists.__init__(self, items, tfms, use_list, do_setup, split_idx, train_setup, splits, types, verbose, dl_type)
    363 if do_setup:
    364     pv(f"Setting up {self.tfms}", verbose)
--> 365     self.setup(train_setup=train_setup)

File /opt/conda/lib/python3.8/site-packages/fastai/data/core.py:386, in TfmdLists.setup(self, train_setup)
    383 def setup(self, 
    384     train_setup:bool=True # Apply `Transform`(s) only on training `DataLoader`
    385 ):
--> 386     self.tfms.setup(self, train_setup)
    387     if len(self) != 0:
    388         x = super().__getitem__(0) if self.splits is None else super().__getitem__(self.splits[0])[0]

File /opt/conda/lib/python3.8/site-packages/fastcore/transform.py:200, in Pipeline.setup(self, items, train_setup)
    198 tfms = self.fs[:]
    199 self.fs.clear()
--> 200 for t in tfms: self.add(t,items, train_setup)

File /opt/conda/lib/python3.8/site-packages/fastcore/transform.py:204, in Pipeline.add(self, ts, items, train_setup)
    202 def add(self,ts, items=None, train_setup=False):
    203     if not is_listy(ts): ts=[ts]
--> 204     for t in ts: t.setup(items, train_setup)
    205     self.fs+=ts
    206     self.fs = self.fs.sorted(key='order')

File /opt/conda/lib/python3.8/site-packages/fastcore/transform.py:87, in Transform.setup(self, items, train_setup)
     85 def setup(self, items=None, train_setup=False):
     86     train_setup = train_setup if self.train_setup is None else self.train_setup
---> 87     return self.setups(getattr(items, 'train', items) if train_setup else items)

File /opt/conda/lib/python3.8/site-packages/fastai/data/core.py:338, in <lambda>(i, x)
    334         dls = [dl] + [dl.new(self.subset(i), **merge(kwargs,def_kwargs,val_kwargs,dl_kwargs[i]))
    335                       for i in range(1, self.n_subsets)]
    336         return self._dbunch_type(*dls, path=path, device=device)    
--> 338 FilteredBase.train,FilteredBase.valid = add_props(lambda i,x: x.subset(i))
    340 # %% ../../nbs/03_data.core.ipynb 52
    341 class TfmdLists(FilteredBase, L, GetAttr):

File /opt/conda/lib/python3.8/site-packages/fastai/data/core.py:373, in TfmdLists.subset(self, i)
--> 373 def subset(self, i): return self._new(self._get(self.splits[i]), split_idx=i)

File /opt/conda/lib/python3.8/site-packages/fastcore/foundation.py:120, in L._get(self, i)
    116 if is_indexer(i) or isinstance(i,slice): return getattr(self.items,'iloc',self.items)[i]
    117 i = mask2idxs(i)
    118 return (self.items.iloc[list(i)] if hasattr(self.items,'iloc')
    119         else self.items.__array__()[(i,)] if hasattr(self.items,'__array__')
--> 120         else [self.items[i_] for i_ in i])

File /opt/conda/lib/python3.8/site-packages/fastcore/foundation.py:120, in <listcomp>(.0)
    116 if is_indexer(i) or isinstance(i,slice): return getattr(self.items,'iloc',self.items)[i]
    117 i = mask2idxs(i)
    118 return (self.items.iloc[list(i)] if hasattr(self.items,'iloc')
    119         else self.items.__array__()[(i,)] if hasattr(self.items,'__array__')
--> 120         else [self.items[i_] for i_ in i])

IndexError: list index out of range

Hello, I am trying to deploy a model to Hugging Face Spaces using a Gradio interface, but it is showing that gradio.inputs and gradio.outputs are deprecated. I don’t know how to work with gradio.components; I tried to Google it but that wasn’t helpful… please find the attached image. Can someone please help?

If you want to help me find what went wrong, please refer to my Kaggle notebook.

Hello! In Lesson 2, in the section “Creating a Gradio interface”, these parameters were defined as follows: inputs=image, outputs=label.
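For reference, the lesson defines those two variables with the old-style Gradio component classes, roughly like the sketch below (reconstructed from the lesson, so treat the exact shape argument as an assumption; these are the same gradio.inputs/gradio.outputs helpers that newer Gradio versions deprecate):

import gradio as gr

# Old-style Gradio API as used in the Lesson 2 app (since deprecated):
# build input/output components as objects, then pass the objects to Interface.
image = gr.inputs.Image(shape=(192, 192))
label = gr.outputs.Label()
intf = gr.Interface(fn=classify_image, inputs=image, outputs=label, examples=examples)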

Hello, I am doing Lesson 6. I cannot find the material covered after the ‘random forest’ topic in the notebooks. Where can I find it?

Hi Karen, it is one and the same; instead of defining separate variables image and label, I directly assigned the values. Please refer to the section which I have commented out in the screenshot.

Actually, I haven’t reached that point. I’m close, though. If things work for me, I’ll share with you my solution.

Just curious!

Do top Kagglers use some high-end local GPU box, or do they use Gradient or other cloud GPUs?

Does success in Kaggle competitions depend on the hardware used?

I am in the process of setting a goal to get involved with Kaggle competitions, and I was wondering if a lack of hardware will hinder my ability to excel. I heard somewhere that in Kaggle competitions the data is huge and it takes hours or days to train the models.


Hello, I’m observing the following error while running the stable_diffusion.ipynb notebook on Kaggle:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_402/1246433801.py in <module>
----> 1 pipe(prompt).images[0]

/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
     24         def decorate_context(*args, **kwargs):
     25             with self.__class__():
---> 26                 return func(*args, **kwargs)
     27         return cast(F, decorate_context)
     28 

/opt/conda/lib/python3.7/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py in __call__(self, prompt, height, width, num_inference_steps, guidance_scale, negative_prompt, num_images_per_prompt, eta, generator, latents, output_type, return_dict, callback, callback_steps)
    487         # 2. Define call parameters
    488         batch_size = 1 if isinstance(prompt, str) else len(prompt)
--> 489         device = self._execution_device
    490         # here `guidance_scale` is defined analog to the guidance weight `w` of equation (2)
    491         # of the Imagen paper: https://arxiv.org/pdf/2205.11487.pdf . `guidance_scale = 1`

/opt/conda/lib/python3.7/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py in _execution_device(self)
    211         hooks.
    212         """
--> 213         if self.device != torch.device("meta") or not hasattr(self.unet, "_hf_hook"):
    214             return self.device
    215         for module in self.unet.modules():

RuntimeError: Expected one of cpu, cuda, mkldnn, opengl, opencl, ideep, hip, msnpu, xla device type at start of device string: meta

Please suggest if I’m missing something.
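For comparison, the standard setup from the notebook looks roughly like the sketch below (the model id and dtype here are assumptions). Since the error mentions the meta device, it may be worth double-checking that the pipeline was actually moved onto the GPU with .to("cuda"):

import torch
from diffusers import StableDiffusionPipeline

# A sketch of the usual pipeline setup (not necessarily the fix): load fp16
# weights and move the whole pipeline onto the GPU before calling it.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")
pipe("a photograph of an astronaut riding a horse").images[0]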

@george23

hey,
I wanted to know what the difference is between the 2022 version and the 2020 version, as from what I can see, the 2020 version doesn’t talk about recurrent networks and doesn’t have homework.
thanks

The 2022 version of the course is the one you want to do. It’s more up-to-date, uses more relevant technologies, and presents topics differently from the book.


@prasadkulkarni

Hello, I am trying to deploy a model to Hugging Face Spaces using a Gradio interface, but it is showing that gradio.inputs and gradio.outputs are deprecated. I don’t know how to work with gradio.components; I tried to Google it but that wasn’t helpful… please find the attached image. Can someone please help?

You can do inputs='image' and outputs='label' instead. Example below:

interface = gr.Interface(fn=classify_image, inputs='image', outputs='label',
                         examples=examples, title=title,
                         description=description, article=article)

If you want more control, I think you can do:

image = gr.Image()
label = gr.Label()

And then pass in those as the respective arguments. Check the docs for the gr.Image and gr.Label classes on how you can further customize them.
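For example, a lightly customized version might look like this sketch (the parameter choices are assumptions, not from the lesson; check your Gradio version’s docs for the exact arguments):

# Hypothetical customization: accept uploads as PIL images and show only
# the top 3 predicted classes in the label output.
image = gr.Image(type='pil')
label = gr.Label(num_top_classes=3)

interface = gr.Interface(fn=classify_image, inputs=image, outputs=label,
                         examples=examples, title=title,
                         description=description, article=article)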


@theaeonwanderer

Just curious!

Do top Kagglers use some high-end local GPU box, or do they use Gradient or other cloud GPUs?

Does success in Kaggle competitions depend on the hardware used?

I am in the process of setting a goal to get involved with Kaggle competitions, and I was wondering if a lack of hardware will hinder my ability to excel. I heard somewhere that in Kaggle competitions the data is huge and it takes hours or days to train the models.

Kaggle offers its own free GPUs. Though the GPU configuration isn’t exactly the best, it’s still great for those who do not have access to a good GPU and it can definitely take you a long way.

Some competitions have lots of data, others don’t, and others contain data that simply takes a long time to process. It all depends on the competition and how you approach experimenting with the models. If there is too much data, or a model takes too long to train, you typically want to begin iterating with a smaller subset of the data or a smaller model, respectively.
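If it helps, one fastai-specific way to iterate on a subset is RandomSubsetSplitter, sketched below (the fractions and the image task are arbitrary assumptions; any sampling approach achieves the same thing):

from fastai.vision.all import *

# A sketch: train/validate on small random fractions of the items so each
# experiment runs quickly, then scale the fractions up once the pipeline works.
dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    get_y=parent_label,
    splitter=RandomSubsetSplitter(train_sz=0.1, valid_sz=0.02, seed=42),
)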


In some comps, yes, it can be hard (or near impossible!) to get a gold medal without good hardware. But I think in pretty much all comps you could get a silver medal using free compute.


What’s the best practice to follow when posting our work on Twitter? Can we mention Jeremy?

And you can always rent very powerful GPUs (e.g. an Nvidia A100) on services like https://jarvislabs.ai or https://lambdalabs.com/
