I wanted to see whether an LSTM can predict whether a name is female or male.
The dataset I am using is http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/names/.
My code is as follows:
import numpy as np
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, Activation, BatchNormalization

maxLen = 15  # this will be used for padding in keras

"""
open the files, load them, and clean them
"""
file = open('../.datasets/genderNames/female.txt')
femaleArr = file.read().lower().split('\n')[:-1]
file = open('../.datasets/genderNames/male.txt')
maleArr = file.read().lower().split('\n')[:-1]

characters = """abcdefghijklmnopqrstuvwxyz- '"""

print('femaleArrays length ' + str(len(femaleArr)))
print('maleArrays length ' + str(len(maleArr)))

"""
looks up the variable characters and converts each character to its index.
"""
def getIndices(name):
    indices = []
    for character in name:
        indices.append(characters.index(character) + 1)
    return indices

male_x = [getIndices(x) for x in maleArr]
female_x = [getIndices(x) for x in femaleArr]

"""
at this point, male_x should be a list whose length equals the number of
male names, and each element in it is one name. Each character is
replaced by the index that getIndices returned
"""
x = []
y = []
x.extend(male_x)
y = [0] * len(male_x)
x.extend(female_x)
y.extend([1] * len(female_x))
print(len(x))
print(len(y))
print(y)
# 0 is male, 1 is female

"""
adding a padding to X so that keras can accept batches
"""
X = sequence.pad_sequences(x, maxlen=maxLen)
# X = np.expand_dims(X,axis=2)
print(X.shape)
Y = np.array(y)
Y = Y.reshape((7944, 1))
print(Y.shape)

"""
The model.
I have already tried removing Embedding, and it still does not work
"""
model = Sequential()
model.add(Embedding(len(characters), 50))
model.add(BatchNormalization())
model.add(LSTM(100))
model.add(BatchNormalization())
model.add(Dense(200, activation='relu'))
model.add(Dense(1))
model.add(Activation('softmax'))
model.compile(loss='binary_crossentropy', optimizer='adam')

model.fit(X, Y, batch_size=128, nb_epoch=100)
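To make the preprocessing concrete, here is a small Keras-free sketch of what I expect it to do for a single name. `get_indices` mirrors my `getIndices`, and `pad_left` is a hypothetical helper written only for this illustration. It assumes the `pad_sequences` defaults: zeros are added at the front, and over-long sequences are cut from the front.

```python
characters = "abcdefghijklmnopqrstuvwxyz- '"
max_len = 15


def get_indices(name):
    # 1-based indices, so that 0 stays free for padding
    return [characters.index(c) + 1 for c in name]


def pad_left(seq, maxlen):
    # Hypothetical stand-in for sequence.pad_sequences with default settings:
    # keep the last `maxlen` items, then left-pad with zeros.
    seq = seq[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq


encoded = get_indices("anna")
padded = pad_left(encoded, max_len)

print(encoded)  # [1, 14, 14, 1]
print(padded)   # eleven zeros followed by 1, 14, 14, 1
```

So each row of `X` should be a length-15 vector of integers between 0 and `len(characters)`, with the name right-aligned.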
This loss is stuck at exactly
Even after multiple epochs.
Is there something obvious that I am missing?