Lesson 7 - Nietzsche train / val?

Hi everyone,

I had a quick question. I am looking at the video for Part 1 of the fast.ai course. In short, I am trying to understand the setup for the trn and the val directories associated with training the Nietzsche data set with the “Stateful” model of https://github.com/fastai/fastai/blob/master/courses/dl1/lesson6-rnn.ipynb (which actually seems to be covered in “Lesson 7”).

The code I see is:

from torchtext import vocab, data

from fastai.nlp import *
from fastai.lm_rnn import *

PATH='data/nietzsche/'

TRN_PATH = 'trn/'
VAL_PATH = 'val/'
TRN = f'{PATH}{TRN_PATH}'
VAL = f'{PATH}{VAL_PATH}'

%ls {PATH}

and I see:

models/  nietzsche.txt  trn/  val/

Unfortunately, when I look inside the trn directory, it is empty:

%ls {PATH}trn

[ No result ]

Just curious what everyone else is doing. Is everyone splitting the Nietzsche data set by hand into two parts and placing them in trn and val?
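If you hit the same empty `trn/` surprise, a plain-Python check (instead of `%ls`) can report which split directories exist and whether they contain any text files yet. This is a minimal sketch: `describe_split_dirs` is a hypothetical helper, and it assumes the training text is stored as `.txt` files under `trn/` and `val/`, matching the paths above.

```python
import os
import tempfile


def describe_split_dirs(path, subdirs=('trn/', 'val/')):
    """Hypothetical helper: map each split subdirectory to its .txt files.

    A missing directory maps to None; an empty list means the directory
    exists but contains no text files yet.
    """
    report = {}
    for sub in subdirs:
        full = os.path.join(path, sub)
        if not os.path.isdir(full):
            report[sub] = None
        else:
            report[sub] = sorted(f for f in os.listdir(full) if f.endswith('.txt'))
    return report


# Demo on a throwaway directory that has an empty trn/ and no val/ at all.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'trn'))
    print(describe_split_dirs(tmp))  # {'trn/': [], 'val/': None}
```

In the notebook you would call it as `describe_split_dirs(PATH)`.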

Hi Ralph,

In case you have not already figured this out, splitting nietzsche.txt manually is mentioned in the Lesson 7 video. See the video timeline for the exact spot. HTH.

Pomo,

Thanks for the advice. I found the same thing a couple of hours after posting, but I couldn't find a way to delete the post or mark the question as resolved.

Thanks again for the follow up.

Ralph

I was confused too. I usually can't finish a lesson in one sitting, so when I came back to the video on a different day I had forgotten what happened earlier.
In fact, Jeremy explained that he had created the training and validation parts by hand.
But since I was too lazy to copy-paste text and make folders manually, I wrote a few lines of code to prepare the data:

import os

os.makedirs(TRN, exist_ok=True)
os.makedirs(VAL, exist_ok=True)

train_perc = .8  # fraction of lines that go to the training set
with open(f'{PATH}nietzsche.txt', 'r') as fp:
    lines = fp.readlines()

# First ~80% of the lines go to trn/, the remaining ~20% to val/
split_ix = int(len(lines) * train_perc)
with open(f'{TRN}nietzsche1.txt', 'w') as part_train:
    part_train.writelines(lines[:split_ix])
with open(f'{VAL}nietzsche2.txt', 'w') as part_val:
    part_val.writelines(lines[split_ix:])

Run this only after TRN and VAL have been defined, i.e. after:

from torchtext import vocab, data

from fastai.nlp import *
from fastai.lm_rnn import *

PATH = 'data/nietzsche/'

TRN_PATH = 'trn/'
VAL_PATH = 'val/'
TRN = f'{PATH}{TRN_PATH}'
VAL = f'{PATH}{VAL_PATH}'
%ls {PATH}
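The splitting snippet above depends on `PATH`, `TRN` and `VAL` from the notebook. A self-contained variant of the same idea, wrapped in a hypothetical `split_text_file` helper so it can be tried on a small file first, might look like this. It assumes, like the original, a contiguous split by line (the tail of the book becomes validation text) and keeps the `nietzsche1.txt` / `nietzsche2.txt` file names.

```python
import os
import tempfile


def split_text_file(src, trn_dir, val_dir, train_perc=0.8,
                    trn_name='nietzsche1.txt', val_name='nietzsche2.txt'):
    """Hypothetical helper: write the first train_perc of src's lines to
    trn_dir and the rest to val_dir. Returns (n_train_lines, n_val_lines).

    The split is contiguous (no shuffling), so validation text is the end
    of the file rather than lines interleaved with the training text.
    """
    os.makedirs(trn_dir, exist_ok=True)
    os.makedirs(val_dir, exist_ok=True)
    with open(src, 'r', encoding='utf-8') as fp:
        lines = fp.readlines()
    split_ix = int(len(lines) * train_perc)
    with open(os.path.join(trn_dir, trn_name), 'w', encoding='utf-8') as f:
        f.writelines(lines[:split_ix])
    with open(os.path.join(val_dir, val_name), 'w', encoding='utf-8') as f:
        f.writelines(lines[split_ix:])
    return split_ix, len(lines) - split_ix


# Usage on a tiny stand-in for nietzsche.txt (10 lines -> 8 train, 2 val).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'nietzsche.txt')
    with open(src, 'w', encoding='utf-8') as f:
        f.writelines(f'line {i}\n' for i in range(10))
    print(split_text_file(src, os.path.join(tmp, 'trn'), os.path.join(tmp, 'val')))  # (8, 2)
```

In the notebook the equivalent call would be `split_text_file(f'{PATH}nietzsche.txt', TRN, VAL)`.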