NLP input-output-based substring prediction

Hi all,

I need help. Here is a task, pretty similar to what’s been on the lessons (IMDB) and if you could give me some tips about that, that would help me solve my main problem (I’m going to write about the exact task to be solved bellow).
So what if I don’t want to do any classification with the IMDB dataset, but I want to predict which movie it was written about? We know, that the review contains the movie title as a substring. For this I think I would have to use a dataset that contains tuples (review,movie title), and feed that to a neural network.

Here is what I actually want to do:
I have a short (max. 10 lines) python code as the input. I want to use deep learning to predict the “main” variable from the ones being used in the source code. It is part of a bigger project, so I can’t explain the meaning of “main” here, but I think it is not that important either. I have a dataset that has the source codes, and the correct variables, that should be guessed as tupples in a csv. In this case there are no classes, because from every input, the output is chosen from a different set of tokens.

I also considered this: I could write a small python script, which would return a list of all the variables in the code, from which the “main” one should be chosen. How much do you think that would help me at all solving the task?

I don’t have any specific question, the problem is that I don’t know how to start, so anything would help. An example jupiter notebook would be amazing, but I couldn’t find any by myself.