Hi,
consider me the newb of the day…
To head off being told to go follow along in Jupyter etc… I did follow along, on Google Colaboratory.
For a bit more background, I have been writing applications professionally since the 1980s, when dinosaurs roamed the earth and Kevin Mitnick was a criminal…
I built my own system, which I use for all manner of development and play (others call it masochism).
It's an i7 system: Asus board, water cooled, 32 GB of Ripjaws RAM, and a Titan X card…
(I am currently out of work after 15 years at my last job and looking, and this is how I use my time between searches… ultimately I want to do Kaggle and other things.)
I am fine following the courses on Colaboratory,
but now I want to start writing my own nets, and I have spent two weeks prepping data, with the idea that after I run things several ways in tabular I will try other ideas.
The data is huge… however, I am starting with a subset of 90,000 rows…
My Jupyter seems to be running right, but I prefer to run natively in Spyder…
Both say my GPU is available… even telling me it's a GTX Titan X.
I even enjoyed the page code where you can load a 500,500,500 and see the time difference:
on my machine the CPU did it in milliseconds, while the GPU did it in microseconds.
And when I was young and foolish I thought a 300 baud modem was exciting…
Love the courses; they have been the most productive so far towards getting working code up!!!
A stupendous feat, if you have spent as much time on this as I have…
Now I am trying to do my first tabular data run and having problems (which I hope someone here will find simple).
I am trying to run it natively, and then tried it in Jupyter, which is adding more issues than it removes…
One is that the tutorial code from git does not match the docs.fast.ai/tabular.html code… whee!!! Such fun…
1st problem:
path = untar_data(URLs.ADULT_SAMPLE)
Instead I am using the following, and the file is in the local directory I am running from:
df = pd.read_csv('filename.CSV')
This wouldn't be much of a problem, except that TabularList.from_df throws an error without path,
and I even tried putting one in with path = 'text'.
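A side note on Windows paths, offered as a guess rather than a diagnosis: in a plain Python string literal a backslash followed by t is a tab escape, so a path like M:\tabular typed as an ordinary string does not contain the characters it appears to. The sketch below uses only the standard library. Whether this has anything to do with the TabularList.from_df error is an assumption I have not verified.

```python
from pathlib import PureWindowsPath

plain = 'M:\tabular'    # "\t" is interpreted as a TAB character
raw = r'M:\tabular'     # raw string: the backslash is kept literally

print(repr(plain))      # repr makes the embedded tab visible
print(repr(raw))
print(len(plain), len(raw))   # the plain version is one character shorter

# PureWindowsPath avoids hand-written backslashes entirely
p = PureWindowsPath('M:/') / 'tabular'
print(p)
```

Either the raw string or a pathlib path should give the folder name that was intended.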
2nd problem:
The tutorial gives examples of cat and cont names… and a list of procs…
[The video gives >50, the tutorial gives salary when the column is targets, and the targets HTML page gets it right, using salary with a column named salary.]
The targets page only gives cat names…
and my project has no cat names, only cont names… fun fun fun…
(The idea of all-numeric tabular data seems to have escaped the mind.)
3rd problem:
The tutorial page from git and online creates a test df using iloc:
test = TabularList.from_df(df.iloc[800:1000].copy(), path=path, cat_names=cat_names, cont_names=cont_names)
The docs page doesn't make a test set, and uses valid_idx = range(len(df)-2000, len(df)).
It also leaves out cont names…
So that is a lot of different combinations I had been trying well before I got this far…
That is, I have been trying to make this work with these varying examples!!!
And it does run without error to this point…
but given the differences, who knows if it's working right? Not I.
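To make the two splitting styles concrete, here is a small sketch with a plain integer standing in for len(df). The 90,000 is the subset size mentioned above, and 8000 is the holdout size from my own listing rather than the 2000 on the docs page. It only shows which row positions each expression selects, not what fastai does with them.

```python
n_rows = 90_000                          # stand-in for len(df)

# Docs-page style: hold out the last rows as the validation set
valid_idx = range(n_rows - 8000, n_rows)
print(valid_idx)                         # range(82000, 90000)
print(len(valid_idx))                    # 8000

# Tutorial style: the positions selected by df.iloc[800:1000]
test_positions = range(n_rows)[800:1000]
print(len(test_positions))               # 200

# Positions left over for training once the validation rows are removed
train_positions = range(0, n_rows - 8000)
overlap = [i for i in test_positions if i in train_positions]
print(len(overlap))                      # 200: the test slice sits inside the training rows
```

If I am reading this right (and assuming every row outside valid_idx is used for training), the iloc[800:1000] test rows are also training rows, so for my data they would not be a truly held-out set.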
4th problem:
The tutorial says data.show_batch(rows=10), which so far seems to work for me…
but the docs page says…
(cat_x,cont_x),y = next(iter(data.train_dl))
for o in (cat_x, cont_x, y): print(to_np(o[:5]))
which throws a broken pipe error (Errno 32) that I have not been able to resolve…
Do note that cat_x appears nowhere on the page other than there, and they use cat_names,
so I have no idea what that code is meant to refer to… does anyone?
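My current understanding, which may be wrong: cat_x and cont_x are not fastai names at all, just local variables created by tuple unpacking, so each batch would be shaped like ((categorical inputs, continuous inputs), targets). The sketch below fakes such a batch with plain lists to show the unpacking pattern only. The values and the name fake_batches are made up for illustration.

```python
# Hypothetical stand-in for data.train_dl: an iterable of ((cat, cont), y) batches
fake_batches = [
    (([[1, 2], [3, 4]], [[0.5, 1.5], [2.5, 3.5]]), [0, 1]),
]

# Same unpacking pattern as the docs snippet
(cat_x, cont_x), y = next(iter(fake_batches))

for o in (cat_x, cont_x, y):
    print(o[:5])   # first (up to) five items of each part
```

With no categorical columns, I would expect the cat_x part to be empty or a placeholder, but that is a guess about fastai rather than something this sketch shows.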
Here is the error:
File "C:\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
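One lead I have not confirmed: the traceback comes from multiprocessing, and on Windows Python starts worker processes by spawning a fresh interpreter that re-imports the main script. Code that launches workers at module top level can then fail unless it sits behind an if __name__ == '__main__': guard. PyTorch's DataLoader uses worker processes when num_workers is greater than 0, so this may be related. The sketch below shows the guard idiom with the standard library only. The names square and run_pool are illustrative, not from any tutorial.

```python
import multiprocessing as mp


def square(x):
    """Work done in each worker; defined at top level so it can be pickled."""
    return x * x


def run_pool(values, workers=2):
    """Map square over values using a pool of worker processes."""
    with mp.Pool(processes=workers) as pool:
        return pool.map(square, values)


if __name__ == '__main__':
    # Only the process started as the script reaches this block;
    # spawned workers re-import the module but skip the guarded code.
    print(run_pool([1, 2, 3]))
```

For the fastai case, the analogous things to try would be wrapping the script body in such a guard, or asking for zero DataLoader workers (num_workers=0 on torch.utils.data.DataLoader). Whether the fastai call I am using forwards that argument is something I would still need to check.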
So now I am quite stuck!
But at least I know my CUDA core works…
torch.cuda.is_available() and print(torch.cuda.is_available())
both print True in Spyder and Jupyter…
I was so hoping to see my first net running after two and a half weeks of full-time effort…
Here is all my code… (I shortened the cont_names list for space.)
import torch
import pandas as pd
from fastai.tabular import *

# Raw string so the backslash in the Windows path is not read as an escape (\t is a tab)
path = r'M:\tabular'
df = pd.read_csv('filename.CSV')  # dtype=np.float64

# Hold out the last 8000 rows for validation
valid_idx = range(len(df)-8000, len(df))
print(valid_idx)

dep_var = 'Target'
cat_names = []  # no categorical columns in this data
cont_names = ['H1', 'M1', 'OP1', 'HP1']  # -list shortened-
procs = [FillMissing, Categorify, Normalize]

test = TabularList.from_df(df.iloc[800:1000].copy(), path=path, cont_names=cont_names)
data = TabularDataBunch.from_df(path, df, dep_var, valid_idx=valid_idx, procs=procs, cont_names=cont_names)

# Pull one batch and print the first five rows of each part (this is where the BrokenPipeError is raised)
(cat_x,cont_x),y = next(iter(data.train_dl))
for o in (cat_x, cont_x, y): print(to_np(o[:5]))