Who is best in class?

RAtwood · November 9, 2020, 10:32pm

I have tabular data about students and would like to predict who is best in class.

student class attribute1 attribute2 attribute3 ... attribute30  target
John    1 ...                                                   is_best
Anna    1 ...                                                   is_not_best
...
Peter   1 ...                                                   is_not_best
Amir    2 ...                                                   is_not_best
Ahmed   2 ...                                                   is_not_best
...
Aalyia  2 ...                                                   is_best
Kim     3 ...                                                   is_not_best
Seojeon 3 ...                                                   is_not_best
...
Yuri    3 ...                                                   is_best
...

The problem is that I want to make this prediction separately for each class of students.

If I just use a normal tabular model, the class is just another attribute. What I want to know is who is best within a given class, not who is best overall: given this class, who is the best, so to speak.

For example, Lea might be very good, but her class is exceptional, so she is only in the top 3, whereas Mia is not that good in absolute terms but is at the top of her class.

There is no absolute reference.

Any experience or advice on training with such data?

cid_2 · November 10, 2020, 1:56am

Here is one way that might work:

If you have an actual score that you can run regression on, say, average grades, then train a model to predict it for all students, regardless of which class they’re in.

Then you could predict on every student in a class, and just rank them by the predicted average grade after the fact.
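A minimal sketch of that ranking step, assuming the regression model has already produced one predicted score per student. The function name `rank_within_class` and the scores are made up for illustration; the model itself is not shown.

```python
from collections import defaultdict

def rank_within_class(rows):
    """rows: iterable of (student, class_id, predicted_score) tuples."""
    by_class = defaultdict(list)
    for student, class_id, score in rows:
        by_class[class_id].append((student, score))

    ranked = {}
    for class_id, members in by_class.items():
        # Sort descending by predicted score; rank 1 is the predicted best in class.
        ordered = sorted(members, key=lambda m: m[1], reverse=True)
        ranked[class_id] = [
            (rank, student, score)
            for rank, (student, score) in enumerate(ordered, start=1)
        ]
    return ranked

# Hypothetical predictions from a model trained on all students (made-up numbers).
predictions = [
    ("John", 1, 8.7), ("Anna", 1, 7.9), ("Peter", 1, 6.5),
    ("Amir", 2, 5.1), ("Ahmed", 2, 4.8), ("Aalyia", 2, 6.0),
]

ranking = rank_within_class(predictions)
best_per_class = {class_id: rows[0][1] for class_id, rows in ranking.items()}
```

Because the ranking is computed inside each class, the predicted scores only need to be comparable within a class, not across classes.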

birosjh · November 10, 2020, 1:57pm

I second what cid_2 said. I would also suggest just separating them by class and running the model individually on each class.
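One way to read "separating them by class" is sketched below: split the rows by class, score each class on its own, and keep the top scorer. `best_in_each_class` and `score_fn` are hypothetical names, and `score_fn` stands in for whatever trained model you use.

```python
from itertools import groupby
from operator import itemgetter

def best_in_each_class(rows, score_fn):
    """rows: list of dicts with 'student' and 'class' keys.
    score_fn: hypothetical stand-in for a trained model's per-student score."""
    key = itemgetter("class")
    result = {}
    # groupby only groups adjacent items, so sort by class first.
    for class_id, group in groupby(sorted(rows, key=key), key=key):
        members = list(group)
        # Score this class only, then pick the index of the highest score.
        scores = [score_fn(m) for m in members]
        top = max(range(len(members)), key=scores.__getitem__)
        result[class_id] = members[top]["student"]
    return result

# Toy data: 'attribute1' is a placeholder feature with made-up values,
# and the lambda below is a placeholder for a real model.
students = [
    {"student": "Kim", "class": 3, "attribute1": 0.2},
    {"student": "Yuri", "class": 3, "attribute1": 0.9},
    {"student": "Amir", "class": 2, "attribute1": 0.4},
    {"student": "Aalyia", "class": 2, "attribute1": 0.7},
]

winners = best_in_each_class(students, score_fn=lambda s: s["attribute1"])
```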

RAtwood · November 13, 2020, 3:02pm

I see what you mean and I considered this approach.

The issue is that rank 6 in class 1 can be better than rank 2 in class 2 in absolute terms. For example, one entire class can be better than another (if we had an absolute measure).