Any interest in Seq2SQL?

I see. I've tried both character- and word-level generation, but I always run into an infinite loop. It just keeps repeating postid, posts, etc…

I’d have to see a notebook or something to be able to have an idea of what’s going on.

Try using multinomial so it doesn’t always select the best match, like this:

to_np(torch.multinomial(probs[0][-1].exp(), 1))[0]

That seems like a weird spot for the infinite loop, though.

Will clean up and post later today; it is messy and full of experimental code right now.

So is mine :stuck_out_tongue: that’s ok

Is torch.multinomial basically sampling the words according to a probability distribution?

You are probably right, though I haven’t read the spaCy source code. Still, I think tokenizing WED as “we” “d” is probably a bad choice.


Great question. I know it selects the top answer most of the time, but it also has a chance to select other answers. I believe Jeremy used it at one point as a way to predict the next word in sentence generation. I’m not sure whether there’s something better to use, but it has done a pretty good job for me.
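For reference, a minimal sketch of how that sampling can sit in a generation loop. The model interface (log-probabilities out, last timestep at out[0][-1], as in the snippet above) and the itos index-to-word list are just assumptions about the notebook, so adapt as needed:

import torch

def generate(model, seed_ids, itos, n_words=20):
    # Sample each next word with torch.multinomial instead of taking the argmax.
    ids = list(seed_ids)
    model.eval()
    for _ in range(n_words):
        inp = torch.tensor(ids).unsqueeze(1)          # (seq_len, 1) input, assumed shape
        with torch.no_grad():
            out = model(inp)
        probs = out[0][-1].exp()                      # last timestep -> probabilities over the vocab
        next_id = torch.multinomial(probs, 1).item()  # draw one word id proportional to its probability
        ids.append(next_id)
    return ' '.join(itos[i] for i in ids)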

I probably missed that. I implemented a get_next_random with np.random.choice(), which samples from a probability distribution. However, since softmax always tries to pick a winner, e.g. with 4 words the probabilities are [0.95, 0.01, 0.01, 0.03], it doesn’t help a lot, although it does make the random generation richer.
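To make that concrete, a quick standalone demo with the example distribution above (numbers are just illustrative): with a 0.95 winner the sampled word is almost always the argmax anyway, which is why sampling alone doesn’t change much.

import numpy as np

probs = np.array([0.95, 0.01, 0.01, 0.03])                  # the peaked softmax output from the example
samples = np.random.choice(len(probs), size=1000, p=probs)  # sample the "next word" 1000 times
print(np.bincount(samples, minlength=len(probs)) / 1000)    # index 0 wins ~95% of the time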

What if you use a sigmoid instead of a softmax? I think that will let the percentages be less aggressive.
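Rough toy example of what I mean (made-up logits): softmax squashes everything except the winner, while normalised sigmoids stay much flatter.

import torch

logits = torch.tensor([5.0, 0.5, 0.5, 1.0])             # hypothetical logits for a 4-word vocab

softmax_probs = torch.softmax(logits, dim=0)            # ~[0.96, 0.01, 0.01, 0.02] - very peaked
sigmoid_scores = torch.sigmoid(logits)                  # each score in (0, 1), doesn't sum to 1
sigmoid_probs = sigmoid_scores / sigmoid_scores.sum()   # ~[0.33, 0.21, 0.21, 0.25] - much flatter

print(softmax_probs, sigmoid_probs)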

Hmmm… not sure how I can use sigmoid as the output? The probability distribution needs to add up to 1, and only one class can be output at a time here, so softmax with a cross-entropy loss makes sense to me rather than a sigmoid, but I will think more about it later.

I will clean up my code first. Thanks.

So after reading your notebook, I have added accuracy as an evaluation metric during training.

Edited: Found an error in the accuracy function; the dimensions don’t actually match, so the comparison broadcasts into a matrix and produces weird accuracy numbers.
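In case anyone hits the same thing, a sketch of an accuracy metric that flattens both tensors first so a shape mismatch can’t silently broadcast into a matrix (the (seq_len * bs, vocab) / (seq_len * bs,) shapes are just the assumption of a typical language-model head):

import torch

def lm_accuracy(log_probs, targets):
    # log_probs: (seq_len * bs, vocab), targets: (seq_len * bs,) - assumed shapes
    preds = log_probs.argmax(dim=-1).view(-1)   # flatten predictions to 1-D
    targets = targets.view(-1)                  # flatten targets to 1-D
    # Comparing an (N,) tensor against an (N, 1) tensor broadcasts into an
    # (N, N) matrix and gives a misleading score, hence the explicit flattening.
    return (preds == targets).float().mean()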

I spent half my day debugging. I was messing up the eval metric, but that does not affect training. Accuracy looks good now, but it is still producing garbage results.

Have you tried more epochs? And maybe use 10,1 for clr_beta?

But everything in your vocab that isn’t a SQL keyword or function is unnecessary, which will make training more difficult.

I think I tried that. The accuracy is so high that I don’t understand why it generates garbage sequences; I tried to find a bug in the training but couldn’t find anything.

Yeah, I guess I will have to figure out some regex to clean it up (rough sketch of what I mean below)… or maybe I will just borrow your pre-processing and see if the model works later. I guess it may be a good idea to check whether there is a bug inside my code as well…
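Something along these lines: keep SQL keywords/functions, numbers and punctuation, and replace everything else with a placeholder token. The keyword list and the _unk_ placeholder are just illustrative choices, not the actual pre-processing from either notebook.

import re

# Illustrative (incomplete) set of SQL keywords/functions to keep in the vocab.
SQL_TOKENS = {
    'select', 'from', 'where', 'group', 'by', 'order', 'having', 'join',
    'on', 'and', 'or', 'not', 'as', 'count', 'sum', 'avg', 'min', 'max', 'limit',
}

def clean_query(query, unk='_unk_'):
    # Split into words, numbers and single punctuation characters.
    tokens = re.findall(r"[A-Za-z_]+|[0-9]+|[^\sA-Za-z0-9]", query.lower())
    # Replace any word that isn't a known SQL token with the placeholder.
    return [t if (t in SQL_TOKENS or not t.isalpha()) else unk for t in tokens]

print(clean_query('SELECT postid FROM posts WHERE score > 10'))
# ['select', '_unk_', 'from', '_unk_', 'where', '_unk_', '>', '10']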

36% accuracy? That seems very low to me. What happens when you run a few more epochs?

I guess I should clean up a bit more; accuracy is ~0.95 if I train it fully, and the loss is around 0.4.

You also don’t want to overfit or it will just generate SQL from the dataset.

But you shouldn’t just be getting garbage.

EDIT: Still, you’re using a vocab of 16k, which is 100x more than I use.

Yeah, but I can’t even overfit it. The notebook is word level. I tried character level to avoid producing garbage sentences (I didn’t upload the character-level notebook). I still think there may be some bug, so I am going to try your pre-processing and re-run the model.

Yeah, sorry, I definitely misread when skimming it! I see the “94% accuracy” now.

Have you tried using the fastai LanguageModelLoader / LanguageModelData?
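Very rough sketch, from memory of how those were used in the old fastai (0.7) course notebooks; argument names/order may differ in your version, and trn_ids/val_ids/itos/PATH/opt_fn/em_sz/nh/nl are placeholders, so treat this as a pointer rather than a recipe:

import numpy as np
from fastai.text import *

bs, bptt = 64, 70
trn_dl = LanguageModelLoader(np.concatenate(trn_ids), bs, bptt)   # trn_ids: list of token-id arrays
val_dl = LanguageModelLoader(np.concatenate(val_ids), bs, bptt)
md = LanguageModelData(PATH, 1, len(itos), trn_dl, val_dl, bs=bs, bptt=bptt)
learner = md.get_model(opt_fn, em_sz, nh, nl)   # then learner.fit(...) as usual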
