Wiki thread: lesson 1

I received the activation fine (via Facebook), but I cannot accept the rules for the Blue Book for Bulldozers competition, so I can't download the dataset. I can accept the rules of other competitions without a problem. If anyone can put train.zip somewhere so I can get moving with this course, it would be appreciated!

Edit: for those interested, I have found a similar competition that may be good enough to play with: https://www.kaggle.com/c/house-prices-advanced-regression-techniques (and I can accept the rules for this one).

For those having trouble accepting rules: you must verify your phone number in your profile to accept the rules. It fixed the issue for me, so just double check your phone is verified!

I tried reinstalling feather as suggested, and am now getting this error on the read_feather call:

TypeError: read_feather() got an unexpected keyword argument 'nthreads'

Have you run into that at all?

But in any event, you were correct that the feathered version was causing the crash, and I was able to get the notebook to work by simply skipping the feathering/unfeathering steps and keeping df_raw in its original form. Thanks for getting me unblocked! :smile:

Guys, is Google Colab suitable for this work?

Hi, I followed your instructions and launched Jupyter Notebook. I opened a notebook and tried importing the libraries.

from fastai import *
from fastai.text import *

Below is the error message I am getting.

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-9acdcc7330cd> in <module>
----> 1 from fastai import *        # Quick access to most common functionality
      2 from fastai.text import *   # Quick access to NLP functionality

~/.local/lib/python3.6/site-packages/fastai/__init__.py in <module>
----> 1 from .basic_train import *
      2 from .callback import *
      3 #from .callbacks import *
      4 from .core import *
      5 from .basic_data import *

~/.local/lib/python3.6/site-packages/fastai/basic_train.py in <module>
      1 "Provides basic training and validation with `Learner`"
----> 2 from .torch_core import *
      3 from .basic_data import *
      4 from .callback import *
      5 

~/.local/lib/python3.6/site-packages/fastai/torch_core.py in <module>
      1 "Utility functions to help deal with tensors"
----> 2 from .imports.torch import *
      3 from .core import *
      4 
      5 AffineMatrix = Tensor

~/.local/lib/python3.6/site-packages/fastai/imports/__init__.py in <module>
----> 1 from .core import *
      2 from .torch import *

~/.local/lib/python3.6/site-packages/fastai/imports/core.py in <module>
      4 import abc, collections, hashlib, itertools, json, operator, pathlib
      5 import mimetypes, inspect, typing, functools, importlib
----> 6 import html, re, spacy, requests, tarfile, numbers
      7 
      8 from abc import abstractmethod, abstractproperty

~/.local/lib/python3.6/site-packages/spacy/__init__.py in <module>
      6 
      7 # These are imported as part of the API
----> 8 from thinc.neural.util import prefer_gpu, require_gpu
      9 
     10 from .cli.info import info as cli_info

~/.local/lib/python3.6/site-packages/thinc/neural/__init__.py in <module>
----> 1 from ._classes.model import Model

~/.local/lib/python3.6/site-packages/thinc/neural/_classes/model.py in <module>
     10 
     11 from .. import util
---> 12 from ..train import Trainer
     13 from ..ops import NumpyOps, CupyOps
     14 from ..mem import Memory

~/.local/lib/python3.6/site-packages/thinc/neural/train.py in <module>
      1 from __future__ import unicode_literals, print_function
      2 
----> 3 from .optimizers import Adam, SGD, linear_decay
      4 from .util import minibatch
      5 

optimizers.pyx in init thinc.neural.optimizers()

ops.pyx in init thinc.neural.ops()

ImportError: /home/user/.local/lib/python3.6/site-packages/murmurhash/mrmr.cpython-36m-x86_64-linux-gnu.so: file too short

Please suggest a fix.

Sounds like your versions are a bit wonky. I have been playing with fast.ai v1 and the feather stuff works with these lines to load it:

import feather
df_raw = feather.read_dataframe('tmp/bulldozers-raw')

This thread is useful: Read_feather() function error

Hi All. I had a play using fast.ai v1 instead and seem to be able to get everything to work. The short of it is that a bunch of functions from structured.py need to be copy + pasted over, and the feather loading is slightly different. Nothing else major had to change that I came across.

I have made a condensed gist of lessons 1 + 2 notebooks into one, that works with the current version of fast.ai. Hope it helps: https://gist.github.com/mnye/bb1653562b6e2d85ee44478cfdf0f5a1

I am not sure why these functions were completely thrown away from the repo, but there is a new tabular section for NN which might be worth taking a look at. It would be interesting to hear from @jeremy what his plan is for this course and in particular if things are in for a shake up now v1 is out?

Thanks for doing this! I would love to see a fastai v1 compatible version of the whole course. If there are important bits of missing functionality, I’d be happy to discuss ways to make them work. I’d like to find a more integrated way of doing things overall. fastai v1 is much more carefully designed than 0.7, so hopefully we can find neat ways of incorporating all the functionality required.

(This will require a community effort however - it’s not something I have time to do myself at the moment.)

I was secretly hoping you had run a course this year with v1, or would be doing so soon, and would update accordingly :slight_smile: As mentioned, there are just a handful of helper functions required (at least for the random forest portion of the course), so I think it would not be hard to keep it working / alive.

Integrating it with the new structure (which looks quite impressive!) I can’t comment on, but I’m hoping to play with the new features in the coming weeks. I have found the random forest portion of the course fascinating though (such a good insight despite already having been exposed to them previously) and it would be great to keep the simple functionality of them alive.

Can someone please help me download the data for lesson 1 of the machine learning course?
Kaggle is asking for my phone number, and I am in India, so the SMS PIN can't reach me.
Please help.

Hi @harrshjain – All 5 of the notebooks (and the associated datasets) for the ML lessons are available on Kaggle, if that helps.

Wow, you really saved the day!
Thank you, sir!

I am getting this error in Jupyter Notebook, although I have already installed all the fastai packages and updated them. What should I do?

You want to use v0.7 of fastai, not v1. Make sure you have the right versions of the libraries installed.

Hi,

When doing the initial processing of a dataframe, is it better to run add_datepart on all columns of dtype ‘datetime64’?

I’ve come up with the following loop, which calls add_datepart() on a column only if it has the datetime dtype:

# Run add_datepart on every datetime64 column of df_raw
for col in list(df_raw):
    if df_raw[col].dtype == '<M8[ns]':
        add_datepart(df_raw, col)

Do you think this is good?
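
One possible variation on the loop above, as a minimal sketch: instead of comparing against the '<M8[ns]' string, let pandas pick out the datetime columns with select_dtypes. The function add_datepart_sketch below is a made-up stand-in, not fastai's add_datepart (it only adds year, month and day), and expand_all_dates and the sample data are also hypothetical.

```python
import pandas as pd

def add_datepart_sketch(df, col):
    """Hypothetical stand-in for fastai's add_datepart: adds a few date parts in place."""
    for part in ("year", "month", "day"):
        df[f"{col}_{part}"] = getattr(df[col].dt, part)
    df.drop(columns=col, inplace=True)

def expand_all_dates(df, expand=add_datepart_sketch):
    """Apply `expand` to every datetime64 column and return the names handled."""
    # Snapshot the names first, because `expand` changes df's columns.
    date_cols = list(df.select_dtypes(include="datetime").columns)
    for col in date_cols:
        expand(df, col)
    return date_cols

# Made-up sample data with one datetime column.
df_demo = pd.DataFrame({
    "saledate": pd.to_datetime(["2011-11-16", "2004-03-26"]),
    "SalePrice": [66000, 57000],
})
handled = expand_all_dates(df_demo)
```

With the real library you would presumably pass add_datepart as the `expand` argument; that pairing is an assumption and has not been tested here.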

Hi,

I would like to know how one can add one's own functions to the fastai library to make them available to all notebooks.

For example, I have written the following small snippet to run add_datepart on every datetime column:

# Run add_datepart on every datetime64 column of df_raw
for col in list(df_raw):
    if df_raw[col].dtype == '<M8[ns]':
        add_datepart(df_raw, col)

I’d like to save it somewhere so I can use it in future notebooks. I guess I can just write a python file, but I don’t know where to save it. Also, I’m afraid that it will be overwritten whenever I git pull. Does anyone have any advice to give me on this?
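
One common approach, sketched here under assumptions: keep personal helpers in your own file in a folder outside the fastai clone, so a git pull never touches it, and add that folder to sys.path in each notebook. The module name my_helpers and the function datetime_columns are hypothetical, and a temporary directory stands in for whatever permanent folder you would actually use.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# A temporary directory stands in for a permanent folder of your own
# (anywhere outside the fastai git clone).
helpers_dir = Path(tempfile.mkdtemp())

# Hypothetical helper module; in practice you would write this file by hand.
(helpers_dir / "my_helpers.py").write_text(
    "def datetime_columns(dtypes):\n"
    "    # Return the names whose dtype string marks a datetime column.\n"
    "    return [n for n, dt in dtypes.items() if str(dt).startswith('datetime64')]\n"
)

# Make the folder importable, then import the module like any other.
sys.path.insert(0, str(helpers_dir))
importlib.invalidate_caches()
import my_helpers

found = my_helpers.datetime_columns({"saledate": "datetime64[ns]", "SalePrice": "int64"})
```

In a notebook, only the sys.path line and the import would be needed once the file exists.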

Hi,

I’d like to know how best to handle the order of categories after running train_cats(). In lesson 1, Jeremy rearranges the order of the category ‘UsageBand’.

Do we have to look at each category created and update their order if it is wrong? It seems like a slow process to do this for each category column.

Does anyone have any experience with this?
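
A minimal sketch of one way to make this less tedious, assuming plain pandas categoricals: keep the desired orders in a dictionary and apply them in a loop. The dictionary wanted_orders and the sample data are made up, and only columns with a meaningful order would need an entry.

```python
import pandas as pd

# Made-up sample column, converted to a categorical.
df_demo = pd.DataFrame({"UsageBand": ["Low", "High", "Medium", "Low"]})
df_demo["UsageBand"] = df_demo["UsageBand"].astype("category")

# By default pandas sorts the categories, here alphabetically.
default_order = list(df_demo["UsageBand"].cat.categories)

# Hypothetical mapping from column name to the order you want.
wanted_orders = {"UsageBand": ["High", "Medium", "Low"]}
for col, order in wanted_orders.items():
    df_demo[col] = df_demo[col].cat.set_categories(order, ordered=True)

# Integer codes now follow the chosen order: High=0, Medium=1, Low=2.
codes = df_demo["UsageBand"].cat.codes.tolist()
```

The result is assigned back to the column rather than modified in place, since recent pandas versions no longer accept an inplace argument on set_categories.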

Based on a fork of this and the work @mrbruce did above, I got Lesson 1 working in a kaggle kernel.

The kernel is here: https://www.kaggle.com/beezus666/fast-ai-machine-learning-lesson-1

The one little tweak to @mrbruce’s work was that I had to change is_string_dtype to pd.api.types.is_string_dtype.

Also, I might be doing something wrong, as I’m getting pretty different results in some spots from what others were getting. I’m going to go through the lesson again with this working and see what’s what.
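
For anyone copying helper functions over by hand, here is a rough sketch of where that import matters. The function train_cats_sketch is a simplified stand-in written for illustration, not the library's train_cats, and the sample data is made up.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

def train_cats_sketch(df):
    """Rough stand-in for train_cats: turn string columns into ordered categoricals."""
    for name in list(df.columns):
        if is_string_dtype(df[name]):
            df[name] = df[name].astype("category").cat.as_ordered()

# Made-up sample data: one string column and one numeric column.
df_demo = pd.DataFrame({
    "UsageBand": ["Low", "High", "Medium"],
    "YearMade": [2004, 1996, 2001],
})
train_cats_sketch(df_demo)
```

Importing the name from pandas.api.types has the same effect as writing pd.api.types.is_string_dtype at each call site.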

I just completed Lesson 1. In the proc_df procedure, why are we replacing the missing values with the median of the column?
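
To make the step in question concrete, here is a minimal sketch, assuming proc_df treats numeric columns roughly like the simplified function below (flag the missing rows in an extra column, then fill with the median). The name fix_missing_sketch and the sample data are made up for illustration.

```python
import pandas as pd

def fix_missing_sketch(df, name):
    """Hypothetical simplification of the numeric missing-value step."""
    col = df[name]
    if col.isnull().any():
        # Keep a record of which rows were missing...
        df[name + "_na"] = col.isnull()
        # ...then fill the gaps with the column median.
        df[name] = col.fillna(col.median())

# Made-up data with one missing value and one large outlier.
df_demo = pd.DataFrame({"MachineHours": [10.0, None, 30.0, 1000.0]})
mean_before = df_demo["MachineHours"].mean()
fix_missing_sketch(df_demo, "MachineHours")
```

In this toy column the median of the known values is 30, while the mean is pulled far higher by the single outlier, which is one reason a median is often preferred as a filler.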

In the notebook, SalePrice is converted to its log. But doing that means our predicted values are also log(SalePrice), so if we submit this output to Kaggle we will get a very bad result. So it is actually better to compute the log in the definition of RMSE.

import math, numpy as np
def rmse(x,y): return math.sqrt(((np.log(x) - np.log(y)) ** 2).mean())

Correct me if I’m wrong
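
A small worked check, as a sketch with made-up numbers: plain RMSE computed on log prices gives the same score as the log-based RMSE computed on the prices themselves, so either approach measures the same thing. The remaining step is converting log predictions back with np.exp before submitting, assuming the submission file is expected to hold actual prices. The name rmsle is used here only to tell the two functions apart.

```python
import math
import numpy as np

def rmse(x, y):
    """Root mean squared error on the values as given."""
    return math.sqrt(((x - y) ** 2).mean())

def rmsle(x, y):
    """Root mean squared error on the logs of the values."""
    return math.sqrt(((np.log(x) - np.log(y)) ** 2).mean())

# Made-up prices and made-up model output in log space.
actual_price = np.array([66000.0, 57000.0, 10000.0])
predicted_log = np.log(np.array([60000.0, 59000.0, 12000.0]))

# Plain RMSE in log space...
score_in_log_space = rmse(predicted_log, np.log(actual_price))

# ...matches the log-based RMSE after converting predictions back to prices.
predicted_price = np.exp(predicted_log)
score_on_prices = rmsle(predicted_price, actual_price)
```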
