Loading text CSV without header

bachsh · December 24, 2018, 8:07pm

The class method TextDataBunch.from_csv() doesn’t provide a way to pass in column names for a CSV file that has no header row.
Suggestion: add a names parameter and pass it through to pd.read_csv().

It’s a small pull request but can be useful IMO.

sgugger · December 26, 2018, 10:40am

We don’t want the factory methods to have too many parameters (note that there is an argument header but I’m not sure it’s what you want). In this case, it’s easy to read the csv as a dataframe with the right header then use from_df.
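A minimal sketch of the pandas half of that suggestion, assuming a headerless two-column file. The sample rows and the column names label and text are made up for illustration, and the names argument of pd.read_csv supplies the header:

```python
import io
import pandas as pd

# Hypothetical headerless CSV: first column is a label, second is the text.
raw = "pos,I loved this movie\nneg,Terrible plot and acting\n"

# header=None marks the first row as data; names supplies the column names.
df = pd.read_csv(io.StringIO(raw), header=None, names=["label", "text"])
print(df.columns.tolist())
```

The resulting dataframe already has the right header, so it can be handed to from_df instead of going through from_csv.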

bachsh · December 26, 2018, 12:45pm

Sounds legit, thanks!
(header indeed doesn’t meet my needs)

Beatrice · December 26, 2018, 12:49pm

Also, this works well for me on v0.21.

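import io
import pandas as pd

text = \
'''1,4.0,?,?,none,?
2,2.0,3.0,?,none,38
2,2.5,2.5,?,tc,39'''

buf = io.StringIO(text)

# header=None: the first row is data; prefix names the columns col_0, col_1, ...
df = pd.read_csv(buf, na_values=['?', 'none'], header=None, prefix='col_')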
df

   col_0  col_1  col_2  col_3 col_4  col_5
0      1    4.0    NaN    NaN   NaN    NaN
1      2    2.0    3.0    NaN   NaN   38.0
2      2    2.5    2.5    NaN    tc   39.0

Another trick (if this still doesn’t work) would be to use add_prefix:

df

   0    1    2   3    4     5
0  1  4.0  NaN NaN  NaN   NaN
1  2  2.0  3.0 NaN  NaN  38.0
2  2  2.5  2.5 NaN   tc  39.0

df = df.add_prefix('col_')    
df

   col_0  col_1  col_2  col_3 col_4  col_5
0      1    4.0    NaN    NaN   NaN    NaN
1      2    2.0    3.0    NaN   NaN   38.0
2      2    2.5    2.5    NaN    tc   39.0
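For reference, here is a self-contained sketch of the add_prefix route using the same sample data. It may be the safer choice on newer pandas, since the prefix argument of read_csv was deprecated in pandas 1.4 and removed in 2.0:

```python
import io
import pandas as pd

text = "1,4.0,?,?,none,?\n2,2.0,3.0,?,none,38\n2,2.5,2.5,?,tc,39"

# Read without a header row; columns get the integer labels 0..5.
df = pd.read_csv(io.StringIO(text), na_values=["?", "none"], header=None)

# Turn the integer labels into col_0 .. col_5.
df = df.add_prefix("col_")
print(df.columns.tolist())
```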