Loading text CSV without header


(Avi A) #1

The class method TextDataBunch.from_csv() doesn’t give the ability to pass the CSV headers to it.
Suggestion: add a names parameter and pass it to pd.read_csv().

It’s a small pull request but can be useful IMO.


#2

We don’t want the factory methods to have too many parameters (note that there is a header argument, but I’m not sure it’s what you want). In this case, it’s easy to read the CSV into a dataframe with the right header, then use from_df.
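
A minimal sketch of that workaround, assuming a headerless two-column file with the label first and the text second. The sample rows and the column names label and text are made up for illustration, and the from_df call is left as a comment because its exact arguments depend on your fastai version:

```python
import io
import pandas as pd

# Hypothetical headerless CSV: label in the first column, text in the second.
raw = "pos,I loved this film\nneg,Terrible plot and acting\n"

# header=None tells pandas there is no header row;
# names= supplies the column names to use instead.
df = pd.read_csv(io.StringIO(raw), header=None, names=["label", "text"])
print(df.columns.tolist())

# The dataframe can then go to the factory method, for example (not run here;
# path, train_df and valid_df are placeholders, check the signature you have):
# data = TextDataBunch.from_df(path, train_df, valid_df,
#                              text_cols="text", label_cols="label")
```

With a real file, replace io.StringIO(raw) with the path to the CSV.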


(Avi A) #3

Sounds legit, thanks!
(header indeed doesn’t meet my needs)


(Beatrice Paige) #4

Also, this works well for me on v0.21.

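import io
import pandas as pd

text = \
'''1,4.0,?,?,none,?
2,2.0,3.0,?,none,38
2,2.5,2.5,?,tc,39'''

buf = io.StringIO(text)

# header=None: the data has no header row; prefix names the columns col_0, col_1, ...
# (the prefix argument was removed from read_csv in pandas 2.0)
df = pd.read_csv(buf, na_values=['?', 'none'], header=None, prefix='col_')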
df

   col_0  col_1  col_2  col_3 col_4  col_5
0      1    4.0    NaN    NaN   NaN    NaN
1      2    2.0    3.0    NaN   NaN   38.0
2      2    2.5    2.5    NaN    tc   39.0

Another trick (if this still doesn’t work) would be to use add_prefix:

df

   0    1    2   3    4     5
0  1  4.0  NaN NaN  NaN   NaN
1  2  2.0  3.0 NaN  NaN  38.0
2  2  2.5  2.5 NaN   tc  39.0

df = df.add_prefix('col_')    
df

   col_0  col_1  col_2  col_3 col_4  col_5
0      1    4.0    NaN    NaN   NaN    NaN
1      2    2.0    3.0    NaN   NaN   38.0
2      2    2.5    2.5    NaN    tc   39.0
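
For reference, here is the add_prefix route as one self-contained snippet. It is only a sketch that reuses the sample data from this post, and it avoids the prefix argument of read_csv, which newer pandas releases no longer accept:

```python
import io
import pandas as pd

text = '''1,4.0,?,?,none,?
2,2.0,3.0,?,none,38
2,2.5,2.5,?,tc,39'''

# Without a header row, pandas labels the columns 0, 1, 2, ...
df = pd.read_csv(io.StringIO(text), na_values=['?', 'none'], header=None)

# add_prefix turns those integer labels into the strings col_0, col_1, ...
df = df.add_prefix('col_')
print(df.columns.tolist())
```

The same result can be had by passing names=[...] to read_csv when the column names are known up front.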