Beginner: Beginner questions that don't fit elsewhere ✅

New sub for him. I might take a gander at his videos.


I’ve just started the course and it’s great so far, thank you!!

I have a question about getting different results in the fine-tuning step of the “Is it a bird?” intro notebook.

I’ve run the notebook both in the cloud (Kaggle) and in a local Jupyter lab instance and I get wildly different results during the fine-tuning step.

At first I thought it might have been differences in the image data (bird_or_not), but I tarred up and downloaded the data from Kaggle and still got very different results.

My best guess at the moment is that training/fine-tuning somehow varies depending on hardware. For context, I am training locally on an Intel MacBook with no GPU.

I am very confused by this, as I was guessing that the only difference when not using a GPU would be performance (much slower), but that the error rate would be similar.

Please see below for results on Kaggle vs laptop.

[screenshot: Kaggle results]

[screenshot: laptop results]

Edit: I’ve also searched here, but the questions seem to center on different results across different runs on the same platform.


Update: I’ve just tried using AWS SageMaker Studio with the smallest available instance, ml.t3.medium, with 2 vCPUs and 4 GiB of memory.

The results were much better, as seen below, so I can only assume that fine-tuning is somehow hardware-dependent (though I don’t yet understand why).


Are the differences consistent, i.e., are the results on your laptop always better than on Kaggle?

Thanks for the response!!

I’ve tried this a number of times locally as well as using various cloud services and the results are consistent:

  1. The error rate when running on my laptop is ~0.5 for each epoch
  2. The error rates when running on Kaggle/Colab/SageMaker are much lower (between 0.009 and 0.06)

In addition, one of the things I noticed when running in SageMaker is that each epoch took significantly longer. I am guessing that’s because I was using the “lowest” ML instance type.

One thing that I am also a little confused about is your last statement about my results being better on my laptop. I thought that lower error rates were “better”?

Back when I did this lesson, I was also confused as to why 0 was chosen. It felt quite arbitrary.

That’s because the threshold can be anything. The model will learn during the training process that anything above 0 is classified as a 3, and anything below 0 is classified as a 7.

When you move on to more categories, you’ll have to introduce new thresholds.
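For instance, here is a toy sketch of the idea with made-up model outputs (the numbers are illustrative only):

# Made-up single-number outputs for four images from the 3-vs-7 model.
scores = [-2.3, 0.7, 1.5, -0.1]

# The 0 threshold just splits the number line into the two classes.
preds = [3 if s > 0 else 7 for s in scores]
print(preds)  # [7, 3, 3, 7]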

If no one else can help you, you can try ChatGPT. The premium version is pretty good at Python coding. The free version could still be insightful.

Copy and paste the code and the error message, and ask a question.

I’ve got a few questions about the Lesson 1 notebook.

Is it bad to put all the imports in the first cell? That’s what I’m used to from programming, but in the lesson 1 notebook the imports are scattered across multiple cells.

In my notebook I just used !pip install -Uqq fastai duckduckgo_search without checking if I am in Kaggle.
Why does it check if the notebook is inside Kaggle?

import os

# KAGGLE_KERNEL_RUN_TYPE is an environment variable that Kaggle sets inside
# its kernels, so its presence tells you whether the notebook is on Kaggle.
iskaggle = os.environ.get('KAGGLE_KERNEL_RUN_TYPE', '')

if iskaggle:
    !pip install -Uqq fastai

Why do I have to rerun all the imports each time the session restarts?

Do I need to redownload the images when the session restarts? Does the HDD have persistence?

Is there a way to view the downloaded images, like a file browser?


Hey,

a few observations:

Training speed
It’s totally expected that training takes longer on some hardware. Roughly, the better the GPU, the faster training will be.

Training quality
I think it’s also expected that training results will differ a bit on different hardware. Some parameters affect training results (e.g. batch size), and if you let a framework choose the best parameters for you, it might choose different ones on different hardware. Maybe one machine has more GPU memory, so a larger batch size can be used, while on another machine with less GPU memory a smaller batch size is auto-chosen. So I would expect that if you really fix all the low-level choices, you would get identical results on different hardware. But don’t worry about that for now! 🙂
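If you do want to pin those choices down, here is a minimal sketch using fastai (assuming the standard course-style setup; path and the hyperparameter values are placeholders, and even this doesn’t guarantee bit-identical results across machines, but it removes the biggest sources of variation):

from fastai.vision.all import *

set_seed(42, reproducible=True)  # fix the RNG state; also asks cuDNN to be deterministic

dls = ImageDataLoaders.from_folder(
    path,                    # placeholder: wherever your bird_or_not data lives
    valid_pct=0.2, seed=42,  # fixed, reproducible validation split
    item_tfms=Resize(192),
    bs=32,                   # explicit batch size instead of a hardware-dependent choice
)
learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(3)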

Metrics
Yes, lower error rates are better. But I looked at the validation loss. Interestingly, the val loss on your laptop is way smaller, even though the error rate is higher. As the val loss is what the training process actually optimizes, I usually look at that to determine whether training is successful.
Normally, lower val loss and lower error rate go hand in hand. If that’s not the case, you might have a lot of randomness in your data, perhaps due to a small data set.

I’m not very familiar with GitHub and Kaggle, so please bear with me. I can find the notebooks in the GitHub repo but can’t seem to open them in Kaggle to interact with them myself. Is it possible to do that? I wanted to use the stripped-down versions but can’t find them on Kaggle.

By the way, is the one on cats and dogs not on Kaggle anymore? I can only find the birds one.

Thanks,

Hey! I made a short video on how to do it:

Is there a way to send a notebook from github to Kaggle?

I had a notebook in Kaggle and pushed it to GitHub. Then I downloaded the repo and edited it in PyCharm. I pushed the changes back to GitHub. Now I want to take my edited notebook and run it on Kaggle. How do I do that?

Hi guys,

I get an error when I try to run mamba install -c fastchan fastbook (as in Live Coding part 2).
The OS is EndeavourOS (an Arch-based Linux).
The error is:

Multi-download failed. Reason: Transfer finalized, status: 404 [https://conda.anaconda.org/fastchan/noarch/platformdirs-3.10.0-pyhd8ed1ab_0.conda] 17 bytes

# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<

    Traceback (most recent call last):
      File "/home/ltr01/mambaforge/lib/python3.10/site-packages/conda/exceptions.py", line 1132, in __call__
        return func(*args, **kwargs)
      File "/home/ltr01/mambaforge/lib/python3.10/site-packages/mamba/mamba.py", line 941, in exception_converter
        raise e
      File "/home/ltr01/mambaforge/lib/python3.10/site-packages/mamba/mamba.py", line 934, in exception_converter
        exit_code = _wrapped_main(*args, **kwargs)
      File "/home/ltr01/mambaforge/lib/python3.10/site-packages/mamba/mamba.py", line 892, in _wrapped_main
        result = do_call(parsed_args, p)
      File "/home/ltr01/mambaforge/lib/python3.10/site-packages/mamba/mamba.py", line 754, in do_call
        exit_code = install(args, parser, "install")
      File "/home/ltr01/mambaforge/lib/python3.10/site-packages/mamba/mamba.py", line 588, in install
        transaction.fetch_extract_packages()
    RuntimeError: Multi-download failed. Reason: Transfer finalized, status: 404 [https://conda.anaconda.org/fastchan/noarch/platformdirs-3.10.0-pyhd8ed1ab_0.conda] 17 bytes

`$ /home/ltr01/mambaforge/bin/mamba install -c fastchan fastbook`

  environment variables:
                 CIO_TEST=<not set>
        CONDA_DEFAULT_ENV=base
                CONDA_EXE=/home/ltr01/mambaforge/bin/conda
             CONDA_PREFIX=/home/ltr01/mambaforge
    CONDA_PROMPT_MODIFIER=(base)
         CONDA_PYTHON_EXE=/home/ltr01/mambaforge/bin/python
               CONDA_ROOT=/home/ltr01/mambaforge
              CONDA_SHLVL=1
           CURL_CA_BUNDLE=<not set>
               LD_PRELOAD=<not set>
                     PATH=/home/ltr01/mambaforge/bin:/home/ltr01/mambaforge/condabin:/usr/local/
                          sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/
                          usr/bin/core_perl
       REQUESTS_CA_BUNDLE=<not set>
            SSL_CERT_FILE=<not set>
            XDG_SEAT_PATH=/org/freedesktop/DisplayManager/Seat0
         XDG_SESSION_PATH=/org/freedesktop/DisplayManager/Session1

     active environment : base
    active env location : /home/ltr01/mambaforge
            shell level : 1
       user config file : /home/ltr01/.condarc
 populated config files : /home/ltr01/mambaforge/.condarc
          conda version : 23.3.1
    conda-build version : not installed
         python version : 3.10.12.final.0
       virtual packages : __archspec=1=x86_64
                          __cuda=12.2=0
                          __glibc=2.38=0
                          __linux=6.4.12=0
                          __unix=0=0
       base environment : /home/ltr01/mambaforge  (writable)
      conda av data dir : /home/ltr01/mambaforge/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/fastchan/linux-64
                          https://conda.anaconda.org/fastchan/noarch
                          https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /home/ltr01/mambaforge/pkgs
                          /home/ltr01/.conda/pkgs
       envs directories : /home/ltr01/mambaforge/envs
                          /home/ltr01/.conda/envs
               platform : linux-64
             user-agent : conda/23.3.1 requests/2.31.0 CPython/3.10.12 Linux/6.4.12-arch1-1 endeavouros/rolling glibc/2.38
                UID:GID : 1000:1000
             netrc file : None
           offline mode : False

An unexpected error has occurred. Conda has prepared the above report.

Any idea what I can do to fix it?
Thanks!


Bump because I have the same error on Ubuntu. I tried both mamba install -c fastchan fastai and conda install -c fastchan fastai.

I went to the fastchan file list on Anaconda.org and noticed that noarch/platformdirs-3.10.0-pyhd8ed1ab_0.conda has 0 downloads despite being uploaded over a month ago. If I go to the link https://conda.anaconda.org/fastchan/noarch/platformdirs-3.10.0-pyhd8ed1ab_0.conda, it says the release was not found. I think there is an issue with the package source?


Try the fastai channel instead:

conda install -c fastai fastai

(I’m not sure why there are two; the Anaconda docs list both fastchan and fastai in different places: Fastai :: Anaconda.org)


After lesson 1, I was expecting all of the lessons to be on Kaggle and match up with the videos, but for lesson 2 the Kaggle notebook provided is different from the video. Am I missing something?

I am migrating PyTorch code to fastai as described here. For the most part, it’s going okay. However, I am getting an error outside of the training portion of the code:

Traceback (most recent call last):
  File "/home/miran045/reine097/projects/loes-scoring-2/src/dcan/training/training.py", line 428, in <module>
    LoesScoringTrainingApp().main()
  File "/home/miran045/reine097/projects/loes-scoring-2/src/dcan/training/training.py", line 307, in main
    create_scatterplot(output_distributions, self.cli_args.plot_location)
  File "/home/miran045/reine097/projects/loes-scoring-2/src/dcan/plot/create_scatterplot.py", line 21, in create_scatterplot
    plt.scatter(xs, ys)
  File "/home/miran045/reine097/projects/AlexNet_Abrol2021/venv/lib/python3.9/site-packages/matplotlib/pyplot.py", line 2807, in scatter
    __ret = gca().scatter(
  File "/home/miran045/reine097/projects/AlexNet_Abrol2021/venv/lib/python3.9/site-packages/matplotlib/__init__.py", line 1412, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "/home/miran045/reine097/projects/AlexNet_Abrol2021/venv/lib/python3.9/site-packages/matplotlib/axes/_axes.py", line 4367, in scatter
    y = np.ma.ravel(y)
  File "/home/miran045/reine097/projects/AlexNet_Abrol2021/venv/lib/python3.9/site-packages/numpy/ma/core.py", line 6773, in __call__
    marr = asanyarray(a)
  File "/home/miran045/reine097/projects/AlexNet_Abrol2021/venv/lib/python3.9/site-packages/numpy/ma/core.py", line 8005, in asanyarray
    return masked_array(a, dtype=dtype, copy=False, keep_mask=True, subok=True)
  File "/home/miran045/reine097/projects/AlexNet_Abrol2021/venv/lib/python3.9/site-packages/numpy/ma/core.py", line 2826, in __new__
    _data = np.array(data, dtype=dtype, copy=copy,
  File "/home/miran045/reine097/projects/AlexNet_Abrol2021/venv/lib/python3.9/site-packages/torch/_tensor.py", line 678, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Here is the routine it is failing on:

import matplotlib.pyplot as plt

def create_scatterplot(d, output_file):
    xs = []
    ys = []
    for x in d:
        vals = d[x]
        for y in vals:
            xs.append(x)
            ys.append(y)

    _, ax = plt.subplots()
    plt.scatter(xs, ys)
    plt.title('Actual Loes score vs. predicted Loes score')
    lims = [
        min([ax.get_xlim(), ax.get_ylim()]),  # min of both axes
        max([ax.get_xlim(), ax.get_ylim()]),  # max of both axes
    ]

    # now plot both limits against each other
    ax.plot(lims, lims, 'k-', alpha=0.75, zorder=0)
    ax.set_aspect('equal')
    ax.set_xlim(lims)
    ax.set_ylim(lims)
    plt.axvline(x=10, color='b', ls='--')
    plt.axhline(y=10, color='b', ls='--')
    plt.xlabel("Actual Loes score")
    plt.ylabel("Predicted Loes score")

    plt.savefig(output_file)

Line 21 is here:

plt.scatter(xs, ys)

xs is a list of ints and so is ys.

What am I doing wrong here? This code was working before I started the migration to fastai. Let me know if you need any more information.

Hey guys,

I think I have a beginner question that does not fit elsewhere:

Is it possible to train a neural net with non-binary targets? For example, say we wanted to train a knob for a certain task or measurement, or anything really, say a knob to regulate a temperature?

Normally, the targets we’ve had, as in the MNIST dataset example, are binary (e.g. either it’s a 9 or it is not a 9). Could you train a neural net to reproduce a certain “degree” of a target?

Just an arbitrary example: say the neural net gets pictures of landscapes and has to guess the temperature in the scenery. Rather than calculating the loss by saying “the proposed answer was a 9 / the letter X, but this is not a 9 / the letter X”, we would say “the proposed temperature was 20 degrees Celsius, but the actual temperature in the scenery (the target) was 31 degrees Celsius, so compute the loss as the mean squared error between the predicted and the correct temperature”. Would it be possible to train a net like this at all? I’ve never seen anything like it, but I could imagine it would work. Or am I overlooking something?
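What’s being described here is just regression. A minimal sketch of the idea in plain PyTorch (the sizes and values are made up purely for illustration):

import torch
import torch.nn as nn

# Hypothetical toy setup: predict one continuous value (a "temperature")
# from a 100-dimensional input, training with mean squared error.
model = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

x = torch.randn(32, 100)         # a batch of 32 made-up inputs
target = torch.rand(32, 1) * 40  # made-up temperatures between 0 and 40

for _ in range(10):
    pred = model(x)               # continuous output, no threshold involved
    loss = loss_fn(pred, target)  # MSE between predicted and actual value
    opt.zero_grad()
    loss.backward()
    opt.step()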

Only a guess: find which variable is a tensor using

print(type(myvar))

then do

myvar = myvar.cpu()
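For instance, here is a guess at how that fix could look for the scatterplot code above, assuming the values in d are 0-d CUDA tensors coming out of the model (to_number is a hypothetical helper, not from the original code):

import torch

def to_number(v):
    # Move a GPU tensor back to host memory and unwrap it into a plain
    # Python number; leave ordinary ints/floats untouched.
    if torch.is_tensor(v):
        return v.cpu().item()
    return v

# Inside create_scatterplot the appends would then become:
#     xs.append(to_number(x))
#     ys.append(to_number(y))
# so matplotlib only ever sees plain numbers.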