How to structure tabular data with time series? (League of Legends)

Hello everyone! :smiley:

Before posting, I searched the forum and Medium for posts explaining what I am looking for, but I did not find any. Maybe I missed something?

I started on fast.ai a month ago; before that I was learning AI/deep learning on Coursera with Prof. Ng.

Here are my questions:

From the Riot API (game: League of Legends), I am able to download all the available information about “ranked” games. The data is quite similar to https://www.kaggle.com/paololol/league-of-legends-ranked-matches

In this Kaggle dataset, you can see that the player “role” is available: TOP, MID, JUNGLER, ADC, SUPPORT. In football, the equivalent would be goalkeeper, defenders, attacking players … However, when downloading data from the Riot API, this information is not available and has to be “created”.

So I started working on this “classification” problem. We want to predict the player role from the data accumulated about the matches. We have access to a set of instances (matches) and, for each, a collection of features (such as player IDs, all actions performed by players during the match linked to a timeline, player levels, gold earned …). The target variable is the role (top, mid, adc, support, jungler).

I don’t know how to structure this data in order to feed it into a random forest or deep learning model.

How should this pandas dataframe / CSV file be structured?
Do I have to list all matches in the first column and, for each match, list all player IDs in different columns? Here, the problem is how to structure all the actions performed by players in the other columns: how do we link an action column such as “number of kills” to a specific “player id” that sits in another column? :sweat_smile:

Or do I need to have one “player id” per row in the dataframe, and thus have:

match id n1 - player 1 - number of kill - gold earned -
match id n1 - player 2 - number of kill - gold earned -
match id n1 - player 3 - number of kill - gold earned -
match id n1 - player 4 - number of kill - gold earned -
match id n1 - player 5 - number of kill - gold earned -
match id n1 - player 6 - number of kill - gold earned -
match id n1 - player 7 - number of kill - gold earned -
match id n1 - player 8 - number of kill - gold earned -
match id n1 - player 9 - number of kill - gold earned -
match id n1 - player 10 - number of kill - gold earned -

match id n10 - player 1 - number of kill - gold earned -

With this structure, how can I use the data linked to the match timeline? As an example, I would need to know at what time player id n1 killed another player, and then analyze patterns between time, action and player position. I also need to do this for each match.
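To make the question concrete, a minimal sketch of the one-row-per-player layout, with the timeline kept in a separate event table, might look like the following. All column names and values are made up for illustration; they are not the Riot API's actual field names.

```python
import pandas as pd

# One row per (match, player). Hypothetical columns, made-up values.
players = pd.DataFrame({
    "match_id":    [1, 1, 1],
    "player_id":   [1, 2, 3],
    "kills":       [2, 1, 0],
    "gold_earned": [9500, 11200, 7800],
})

# One row per timeline event, keyed by the same (match_id, player_id) pair.
events = pd.DataFrame({
    "match_id":   [1, 1, 1],
    "player_id":  [1, 1, 2],
    "time_s":     [185, 640, 410],   # seconds since the start of the match
    "event_type": ["kill", "kill", "kill"],
})

# The shared key ties each event back to a specific player in a specific match.
player1_timeline = events[(events["match_id"] == 1) & (events["player_id"] == 1)]
kill_times = player1_timeline.loc[
    player1_timeline["event_type"] == "kill", "time_s"
].tolist()
```

Here `kill_times` holds the times at which player 1 got a kill in match 1, so the "when did it happen" information lives in the event table rather than in extra columns of the player table.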

I think I really need to find more information about how to structure tabular data, such as a list of matches with associated time series. How would you structure such a table for football, with all actions and variables linked to players and to time, for all matches of a season?

Thanks a lot for the help, I hope my description is clear.

Seems like you are trying to do transactional analysis on aggregated data - is this correct? You could add a bunch of columns for first kill, second kill etc., with a repeating set of what role was killed, how long since last kill, and whatever else you have. Probably better to create pseudo-transactional data with a different datapoint for each event, with incremental time, and incremental gold earned etc since last event.
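As a rough sketch of that pseudo-transactional layout (the column names and numbers are invented for illustration), each event gets its own row and the incremental values are differences within each match/player group:

```python
import pandas as pd

# One row per event. "total_gold" is the player's running gold at that event.
events = pd.DataFrame({
    "match_id":   [1, 1, 1, 1, 1],
    "player_id":  [1, 2, 1, 2, 1],
    "time_s":     [120, 200, 300, 450, 700],
    "event_type": ["kill", "assist", "kill", "kill", "death"],
    "total_gold": [800, 900, 1900, 2600, 4100],
})

# Order events in time within each player's match before taking differences.
events = events.sort_values(["match_id", "player_id", "time_s"])
grouped = events.groupby(["match_id", "player_id"])

# diff() gives the change since the previous event of the same player.
# Assumption: for a player's first event, the increment is measured from
# the start of the match (time 0, gold 0), so the raw value is used.
events["time_since_last"] = grouped["time_s"].diff().fillna(events["time_s"])
events["gold_since_last"] = grouped["total_gold"].diff().fillna(events["total_gold"])
```

Each row now carries what happened, how long after the previous event, and how much gold was gained in between, which is a shape a tree-based model can consume directly.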

That will do for a random forest or other traditional ML model. If you want to do deep learning, you might work through the Rossmann approach to categorical data for some ideas. Fastai covers this in most versions of the course. That analysis has a time-series aspect as well, so a good choice to study either way.

Hello @Ralph, thanks a lot for your help!

Sorry for my late answer. I took some time to study more ML and DL, and I created a basic model using only 4 features to predict the player role. I built a random forest, using supervised learning, based on the same approach as this article: https://medium.com/snipe-gg/using-unsupervised-machine-learning-to-assume-positions-in-league-of-legends-8eb142063ea6

Precision is around 99%, which is really good, but to learn more I would like to build a more complex model by doing transactional analysis on aggregated data. Could you give me some advice, or links to articles or courses, where I can find information on how to create and use a relational database for a random forest?

The idea here is not to use just 4 important features from my dataset, but instead to create several tables linked together as a relational database. Maybe like this:

  • 1 dataset listing all match_id values, containing match features such as time, winning team, losing team, players_id
  • 1 dataset for each player_id with all actions performed by the player, such as kills, assists, gold etc., as time series events/data

Example:

table1
match_id_1; player_1; match_duration; winning_team; …
match_id_1; player_2; match_duration; winning_team; …
match_id_1; player_3; match_duration; winning_team; …

match_id_1; player_10; match_duration; winning_team; …

match_id_1000; player_10; match_duration; winning_team; …

table2 for match_id_1; player_1
time; event_type; role_killed; gold; etc …

Table 2 represents a time series for a specific player in a specific match.

This relational database could allow me to feed the random forest with much more information, but I do not know how to proceed with a random forest. Could we just use the “join_df” function and then train the model?
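One possible way to picture the join, as a sketch only: since a random forest expects one fixed-length row per sample, the per-player time series from table 2 would first be summarised into a few features and then merged onto table 1 by its keys. The column names, the chosen summary features and all values below are hypothetical, and plain pandas `merge` is used here in place of any course helper.

```python
import pandas as pd

# table1: one row per (match, player). Hypothetical columns and values.
table1 = pd.DataFrame({
    "match_id":       [1, 1, 2],
    "player_id":      [1, 2, 1],
    "match_duration": [1800, 1800, 2100],
    "role":           ["MID", "SUPPORT", "MID"],  # target, for labelled rows
})

# table2 stacked for all players and matches: one row per event.
table2 = pd.DataFrame({
    "match_id":   [1, 1, 1, 2],
    "player_id":  [1, 1, 2, 1],
    "time":       [200, 900, 400, 300],
    "event_type": ["kill", "kill", "assist", "kill"],
    "gold":       [300, 450, 150, 300],
})

# Collapse each player's time series into fixed-length summary features.
features = (
    table2.groupby(["match_id", "player_id"])
    .agg(
        n_events=("event_type", "size"),
        n_kills=("event_type", lambda s: int((s == "kill").sum())),
        first_event_time=("time", "min"),
        event_gold=("gold", "sum"),
    )
    .reset_index()
)

# A left join keeps every table1 row, even for players without any event.
joined = table1.merge(features, on=["match_id", "player_id"], how="left")

# Feature matrix and target: identifiers are dropped so the model
# cannot simply memorise them.
X = joined.drop(columns=["role", "match_id", "player_id"])
y = joined["role"]
```

`X` and `y` could then be passed to a random forest implementation such as scikit-learn's `RandomForestClassifier`. Joining the raw event rows without summarising them would instead produce one row per event, which changes what a single training sample means.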

Do you have any advice?