I currently have a trained model that achieves quite good results. I exported the learner and also exported the validation set with `nlp_learn.dls.valid.items.to_csv("../data/interim/cta_nn/valid.csv")`. In a new notebook, I loaded the learner, read the CSV into a dataframe, and created a DataLoader with `dl = learn.dls.test_dl(df.tfm_text)`.
However, when I run `dl.show_batch()`, the items in the batch appear in a different order than in the dataframe. The same thing happens when I pass `shuffle=False` to `test_dl()`. This makes it impossible to match the predictions back to the dataframe for downstream processing. I'm fairly certain I'm just missing something (i.e., doing something incorrectly). Alternatively (and perhaps the better solution), how can I get the dataframe index returned along with the predictions? My current load-and-predict pipeline is below.
```python
import pandas as pd
from fastai.text.all import *
from fastai.callback.wandb import *

# Load the exported validation items and the trained learner
df = pd.read_csv("../data/interim/cta_nn/valid.csv")
learn = load_learner("../models/learner_best_dutiful-sponge-33", cpu=False)
learn.remove_cbs([WandbCallback])  # W&B logging is not needed for inference

# Clean the raw text the same way as during training
df["tfm_text"] = df.CHIEF_COMPLAINT.str.replace("'", "").str.replace('"', "").str.strip().str.capitalize()
df = df.drop(columns=["text", "Unnamed: 0.1", "Unnamed: 0"])
df = df.dropna(subset="CHIEF_COMPLAINT")

# Build a test DataLoader and run inference
dl = learn.dls.test_dl(df.tfm_text, shuffle=False)
preds, _ = learn.get_preds(dl=dl)
```
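For reference, here is my understanding of what might be going on, reproduced in plain Python. This is a minimal sketch, assuming the text loader sorts items by length (longest first), which is roughly what fastai's `SortedDL` does for text; the sample texts, the `df_index` values, and the fake "predictions" are all made up for illustration. It shows how predictions that come back in loader order can be mapped back to the original row order and dataframe index by inverting the permutation.

```python
# Hypothetical illustration, not fastai code: a loader that visits texts
# sorted by length (longest first), similar in spirit to a length-sorted DL.
texts = ["Chest pain", "Fever", "Shortness of breath on exertion", "Cough"]
df_index = [10, 11, 12, 13]  # hypothetical, non-contiguous index left by dropna

# Order in which a length-sorting loader would visit the items
idxs = sorted(range(len(texts)), key=lambda i: len(texts[i]), reverse=True)

# Fake "predictions" produced in loader order (score = text length)
preds_in_loader_order = [float(len(texts[i])) for i in idxs]

# Invert the permutation: position p in loader order holds original item idxs[p]
preds = [None] * len(texts)
for p, i in enumerate(idxs):
    preds[i] = preds_in_loader_order[p]

# Predictions are now in the original row order and can be keyed by the index
aligned = dict(zip(df_index, preds))
print(idxs)     # loader order
print(aligned)  # index -> prediction
```

If I read the fastai source correctly, `Learner.get_preds` has a `reorder` argument (default `True`) that applies this same inversion, so the returned predictions may already be in dataframe order even though `show_batch()` displays the sorted order.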
Thank you in advance.