Hi everyone,
I have had some pleasant experiences with fast.ai so far, but now I’ve encountered my first major issue. I’m using the following piece of code to read a DataFrame into a DataBunch:
from fastai.tabular import *  # fastai v1; also makes pd and np available

def get_valid_idx(data: pd.DataFrame, valid_percentage: float=0.2):
    # Use the last `valid_percentage` of the rows as the validation set.
    split_idx = int(np.floor(data.shape[0]*(1-valid_percentage)))
    return range(split_idx, data.shape[0])

dep_var = 'Transaction Label'
cat_names = ['Buchungstext', 'Auftraggeber / Beguenstigter', 'Verwendungszweck',
             'Kontonummer', 'BLZ', 'Glaeubiger-ID', 'Mandatsreferenz',
             'Kundenreferenz', 'Balance']
cont_names = ['Betrag (EUR)']
procs = [FillMissing, Categorify, Normalize]
path = './tmp'

df = ba.get_data()  # ba is my own helper module that loads the bank export
df_dropped = df.copy()
del df_dropped['Buchungstag']
del df_dropped['Wertstellung']

valid_idx = get_valid_idx(df)
data = TabularDataBunch.from_df(path, df_dropped, dep_var, valid_idx=valid_idx,
                                procs=procs, cat_names=cat_names)
However, after the parsing, many of the entries are replaced with #na#. This happens especially in ‘Auftraggeber / Beguenstigter’ (German for ‘Client / Beneficiary’). The values in this column contain multiple words. Is it a problem to have values with more than one word, or with special characters like ‘+’, ‘/’, ‘&’?
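One thing I could check to narrow this down is whether the affected values only occur in the last 20% of the rows, which get_valid_idx uses as the validation set. The snippet below is only a minimal sketch of such a check: the helper name `unseen_in_training` and the toy values are made up for illustration and are not taken from my real data. It uses plain Python so that it runs on its own; for a real column it could be called with something like `list(df_dropped['Auftraggeber / Beguenstigter'])`.

```python
from collections import Counter

def unseen_in_training(values, valid_percentage=0.2):
    """Count validation values that never occur in the training part.

    `values` is any sequence of column values. The split mirrors
    get_valid_idx: the last `valid_percentage` of the rows are treated
    as the validation set. (Hypothetical helper, for illustration only.)
    """
    values = list(values)
    split_idx = int(len(values) * (1 - valid_percentage))
    train_seen = set(values[:split_idx])
    # Keep only validation values that were not seen in the training rows.
    return Counter(v for v in values[split_idx:] if v not in train_seen)

# Made-up example data, not from the real DataFrame.
toy = ['REWE', 'REWE', 'Amazon', 'REWE', 'Amazon',
       'REWE', 'Amazon', 'REWE', 'Stadtwerke & Co', 'Miete / Mueller']
unseen = unseen_in_training(toy)
print(unseen)
```

If a check like this reports many values for a column, the #na# entries might be related to how the data is split rather than to the spaces or special characters, but that is only a guess on my side.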
I’m using the fastai notebook on paperspace.com, by the way. If you need more information to help me, I’m happy to provide it.
Thanks to all of you in advance!