Any interest in Seq2SQL?

I see. I've tried both character- and word-level generation, but I always run into an infinite loop. It just keeps repeating postid, posts, etc…

I’d have to see a notebook or something to be able to have an idea of what’s going on.

Try using multinomial so it doesn’t always select the best match, like this:

to_np(torch.multinomial(probs[0][-1].exp(), 1))[0]

That seems like a weird spot for the infinite loop, though.

Will clean up and post later today; it is messy and full of experimental code right now.

So is mine :stuck_out_tongue: that’s ok

Is torch.multinomial basically sampling the words according to a probability distribution?

You are probably right, though I haven’t read the spaCy source code. Still, I think tokenizing WED as “we” “d” is probably a bad choice.


Great question. I know it selects the top answer most of the time, but it also has a chance to select other answers. I believe Jeremy used it at one point as a way to predict the next word in sentence generation. I’m not sure whether there’s something better to use, but it has done a pretty good job for me.
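For reference, a minimal sketch of how that sampling can sit in a generation loop. The model interface (log-probabilities out, last timestep at out[0][-1], as in the snippet above) and the itos index-to-word list are just assumptions about the notebook, so adapt as needed:

import torch

def generate(model, seed_ids, itos, n_words=20):
    # Sample each next word with torch.multinomial instead of taking the argmax.
    ids = list(seed_ids)
    model.eval()
    for _ in range(n_words):
        inp = torch.tensor(ids).unsqueeze(1)          # (seq_len, 1) input, assumed shape
        with torch.no_grad():
            out = model(inp)
        probs = out[0][-1].exp()                      # last timestep -> probabilities over the vocab
        next_id = torch.multinomial(probs, 1).item()  # draw one word id proportional to its probability
        ids.append(next_id)
    return ' '.join(itos[i] for i in ids)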

I probably missed that. I implemented a get_next_random with np.random.choice(), which samples from a probability distribution. However, since softmax always tries to pick a winner, e.g. with 4 words the probabilities are [0.95, 0.01, 0.01, 0.03], it doesn’t help a lot, although it does make the random generation richer.
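To make that concrete, a quick standalone demo with the example distribution above (numbers are just illustrative): with a 0.95 winner the sampled word is almost always the argmax anyway, which is why sampling alone doesn’t change much.

import numpy as np

probs = np.array([0.95, 0.01, 0.01, 0.03])                  # the peaked softmax output from the example
samples = np.random.choice(len(probs), size=1000, p=probs)  # sample the "next word" 1000 times
print(np.bincount(samples, minlength=len(probs)) / 1000)    # index 0 wins ~95% of the time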

What if you use a sigmoid instead of a softmax? I think that will let the percentages be less aggressive.
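Rough toy example of what I mean (made-up logits): softmax squashes everything except the winner, while normalised sigmoids stay much flatter.

import torch

logits = torch.tensor([5.0, 0.5, 0.5, 1.0])             # hypothetical logits for a 4-word vocab

softmax_probs = torch.softmax(logits, dim=0)            # ~[0.96, 0.01, 0.01, 0.02] - very peaked
sigmoid_scores = torch.sigmoid(logits)                  # each score in (0, 1), doesn't sum to 1
sigmoid_probs = sigmoid_scores / sigmoid_scores.sum()   # ~[0.33, 0.21, 0.21, 0.25] - much flatter

print(softmax_probs, sigmoid_probs)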

Hmmm… not sure how I can use sigmoid as the output? The probability distribution needs to add up to 1, and only one class can be output at a time here, so softmax with a cross-entropy loss makes sense to me rather than a sigmoid, but I will think more about it later.

I will clean up my code first. Thanks.

So after reading your notebook, I have added accuracy as an evaluation metric during training.

Edited: Found an error in the accuracy function; the dimensions don’t actually match, so the comparison broadcasts into a matrix and produces weird accuracy numbers.
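In case anyone hits the same thing, a sketch of an accuracy metric that flattens both tensors first so a shape mismatch can’t silently broadcast into a matrix (the (seq_len * bs, vocab) / (seq_len * bs,) shapes are just the assumption of a typical language-model head):

import torch

def lm_accuracy(log_probs, targets):
    # log_probs: (seq_len * bs, vocab), targets: (seq_len * bs,) - assumed shapes
    preds = log_probs.argmax(dim=-1).view(-1)   # flatten predictions to 1-D
    targets = targets.view(-1)                  # flatten targets to 1-D
    # Comparing an (N,) tensor against an (N, 1) tensor broadcasts into an
    # (N, N) matrix and gives a misleading score, hence the explicit flattening.
    return (preds == targets).float().mean()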

I spent half my day debugging. I was messing up the eval metric, but that does not affect training. Accuracy looks good now, but it is still producing garbage results.

Have you tried more epochs? And maybe use 10,1 for clr_beta?

But everything in your vocab that isn’t a SQL keyword or function is unnecessary, which will make training more difficult.

I think I tried that. The accuracy is so high that I don’t understand why it generates garbage sequences; I tried to find a bug in the training but couldn’t find anything.

Yeah, I guess I will have to figure out some regex to clean it up (rough sketch of what I mean below)… or maybe I will just borrow your pre-processing and see if the model works later. I guess it may be a good idea to check whether there is a bug inside my code as well…
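Something along these lines: keep SQL keywords/functions, numbers and punctuation, and replace everything else with a placeholder token. The keyword list and the _unk_ placeholder are just illustrative choices, not the actual pre-processing from either notebook.

import re

# Illustrative (incomplete) set of SQL keywords/functions to keep in the vocab.
SQL_TOKENS = {
    'select', 'from', 'where', 'group', 'by', 'order', 'having', 'join',
    'on', 'and', 'or', 'not', 'as', 'count', 'sum', 'avg', 'min', 'max', 'limit',
}

def clean_query(query, unk='_unk_'):
    # Split into words, numbers and single punctuation characters.
    tokens = re.findall(r"[A-Za-z_]+|[0-9]+|[^\sA-Za-z0-9]", query.lower())
    # Replace any word that isn't a known SQL token with the placeholder.
    return [t if (t in SQL_TOKENS or not t.isalpha()) else unk for t in tokens]

print(clean_query('SELECT postid FROM posts WHERE score > 10'))
# ['select', '_unk_', 'from', '_unk_', 'where', '_unk_', '>', '10']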

36% accuracy? That seems very low to me. What happens when you run a few more epochs?

I guess I should clean up a bit more; accuracy is ~0.95 if I train it fully, and the loss is around 0.4.

You also don’t want to overfit or it will just generate SQL from the dataset.

But you shouldn’t just be getting garbage.

EDIT: Still, you’re using a vocab of 16k, which is 100x more than I use.

Yeah, but I can’t even overfit it. The notebook is word level. I tried character level to avoid producing garbage sentences (I didn’t upload the character-level notebook). I still think there may be some bug, so I am going to try your pre-processing and re-run the model.

Yeah, sorry, I definitely misread when skimming it! I see the “94% accuracy” now.

Have you tried using the fastai LanguageModelLoader / LanguageModelData?
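Very rough sketch, from memory of how those were used in the old fastai (0.7) course notebooks; argument names/order may differ in your version, and trn_ids/val_ids/itos/PATH/opt_fn/em_sz/nh/nl are placeholders, so treat this as a pointer rather than a recipe:

import numpy as np
from fastai.text import *

bs, bptt = 64, 70
trn_dl = LanguageModelLoader(np.concatenate(trn_ids), bs, bptt)   # trn_ids: list of token-id arrays
val_dl = LanguageModelLoader(np.concatenate(val_ids), bs, bptt)
md = LanguageModelData(PATH, 1, len(itos), trn_dl, val_dl, bs=bs, bptt=bptt)
learner = md.get_model(opt_fn, em_sz, nh, nl)   # then learner.fit(...) as usual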
