Understanding structured data with anonymous features

I was working on a structured dataset with anonymous features. How should I proceed to draw better inferences from the data and do some feature engineering on it?

My takeaway from the classes has been that you need to draw tons of graphs to understand your features: how they interact with the dependent variable, what their distributions look like, etc.

As an example, one of the classes has a discussion about data leakage and how certain missing fields were a great indicator of whether grant applications were approved. The only way to know this is to plot the features against the dependent variable.
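A minimal pandas sketch of that kind of check, assuming a DataFrame with a binary target column. For every column that has missing values, it compares the mean target when the value is missing against when it is present; a large gap is the sort of signal that pointed to leakage in the grant-application example. The function name `missingness_vs_target` and the toy data are made up for illustration.

```python
import numpy as np
import pandas as pd


def missingness_vs_target(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """For each column with missing values, compare the mean target when the
    value is missing vs. present. A large gap can hint at data leakage."""
    rows = []
    for col in df.columns.drop(target):
        is_missing = df[col].isna()
        if not is_missing.any():
            continue  # fully populated column: nothing to compare
        rows.append({
            "feature": col,
            "frac_missing": is_missing.mean(),
            "target_mean_missing": df.loc[is_missing, target].mean(),
            "target_mean_present": df.loc[~is_missing, target].mean(),
        })
    out = pd.DataFrame(rows)
    if out.empty:
        return out
    out["gap"] = (out["target_mean_missing"] - out["target_mean_present"]).abs()
    return out.sort_values("gap", ascending=False).reset_index(drop=True)


# Made-up toy data: f1 is missing exactly for the rejected applications.
toy = pd.DataFrame({
    "f1": [1.0, np.nan, 2.0, np.nan, 3.0, np.nan],
    "f2": [np.nan, 5.0, 6.0, 7.0, np.nan, 8.0],
    "f3": [1, 2, 3, 4, 5, 6],
    "approved": [1, 0, 1, 0, 1, 0],
})
report = missingness_vs_target(toy, "approved")
print(report)
```

If matplotlib is installed, `report.plot.bar(x="feature", y="gap")` turns the table into the kind of graph described above.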

Can you mention which lesson this was discussed in?

Data leakage was first mentioned in Lesson 3 and came up again in Lesson 6.

For me, one of the most important takeaways from the machine learning course is that you should build a first model quickly, without overthinking it, using the pre-processing best practices taught by Jeremy (add_datepart, proc_df, get_elapsed to deal with events, etc.).
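To give a feel for the date pre-processing step, here is a rough pandas-only sketch that splits one datetime column into several numeric columns a tree model can split on. The helper `expand_date` and its column names are hypothetical: it is only loosely in the spirit of fastai's add_datepart and is not that function's actual implementation or output.

```python
import pandas as pd


def expand_date(df: pd.DataFrame, col: str, drop: bool = True) -> pd.DataFrame:
    """Hypothetical helper: expand a date column into numeric/boolean parts."""
    out = df.copy()
    d = pd.to_datetime(out[col])
    p = col + "_"
    out[p + "year"] = d.dt.year
    out[p + "month"] = d.dt.month
    out[p + "day"] = d.dt.day
    out[p + "dayofweek"] = d.dt.dayofweek  # Monday = 0
    out[p + "dayofyear"] = d.dt.dayofyear
    out[p + "is_month_start"] = d.dt.is_month_start
    out[p + "is_month_end"] = d.dt.is_month_end
    # Whole seconds since the Unix epoch, so the model can also split on absolute time.
    out[p + "elapsed"] = (d - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)
    if drop:
        out = out.drop(columns=col)
    return out


sales = pd.DataFrame({"saledate": ["2011-01-31", "2012-03-01"], "price": [10, 20]})
expanded = expand_date(sales, "saledate")
print(expanded.T)
```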

Then, you should use the interpretation techniques seen in class to better understand the features and, in turn, refine the pre-processing to create a better model.

For example, use feature importance to identify which features are worth exploring and focusing on (you don’t want to spend too much time on features that have little impact on the outcome). Use the max_n_cat parameter in proc_df to turn low-cardinality categorical features into separate one-hot columns, and check the effect on both feature importance and model performance. Remove redundant features found with hierarchical clustering to make the model easier to interpret, etc.
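Two of those steps can be sketched with scikit-learn and SciPy instead of the course's fastai helpers (a swap made here so the example is self-contained). The data is synthetic: f0 drives the target, f1 is a near-copy of f0 (redundant) and f2 is noise. Random-forest importance shows where to look, and clustering features on Spearman rank correlation groups the redundant pair. The 0.1 distance cut-off is an arbitrary choice for this toy data.

```python
import numpy as np
import pandas as pd
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
f0 = rng.normal(size=n)
X = pd.DataFrame({
    "f0": f0,
    "f1": f0 + rng.normal(scale=0.01, size=n),  # near-duplicate of f0
    "f2": rng.normal(size=n),                    # pure noise
})
y = 3 * f0 + rng.normal(scale=0.1, size=n)

# 1) Feature importance: which columns deserve a closer look?
rf = RandomForestRegressor(n_estimators=50, random_state=0, n_jobs=-1)
rf.fit(X, y)
importance = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importance)

# 2) Hierarchical clustering on rank correlation to spot redundant features.
corr = X.corr(method="spearman").to_numpy()
dist = np.clip(1 - np.abs(corr), 0, None)  # highly correlated -> small distance
np.fill_diagonal(dist, 0)
link = hierarchy.linkage(squareform(dist, checks=False), method="average")
labels = hierarchy.fcluster(link, t=0.1, criterion="distance")
cluster_of = dict(zip(X.columns, labels))
print(cluster_of)  # f0 and f1 share a cluster label; f2 gets its own
```

Passing `link` to `hierarchy.dendrogram` (with matplotlib installed) gives the usual dendrogram plot. Note that the near-duplicates split the importance between them, which is one reason to remove redundant features before reading importance too literally.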

So, to me, model-based feature engineering is at least as important as feature engineering based on the raw data.

I had gone through the lessons and was already doing model-based interpretation, but I wanted to know how to proceed without a model, since I wasn’t familiar with how to make graphs and interpret the data properly.
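Without a model, one common starting point is a per-feature overview plus the numbers behind a feature-vs-target plot. The minimal sketch below assumes a numeric target. `feature_overview` reports cardinality, missingness, skew and Spearman rank correlation with the target (computed as Pearson correlation on ranks). `target_by_bins` gives the mean target per quantile bin of one feature. Both names and the demo data are illustrative only.

```python
import numpy as np
import pandas as pd


def feature_overview(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """One row per feature: distribution stats plus rank correlation with the target."""
    rows = []
    for col in df.columns.drop(target):
        s = df[col]
        row = {"feature": col, "n_unique": s.nunique(), "frac_missing": s.isna().mean()}
        if pd.api.types.is_numeric_dtype(s):
            row["skew"] = s.skew()
            both = df[[col, target]].dropna()
            # Spearman correlation = Pearson correlation of the (average) ranks.
            row["spearman_vs_target"] = both[col].rank().corr(both[target].rank())
        rows.append(row)
    return pd.DataFrame(rows).set_index("feature")


def target_by_bins(df: pd.DataFrame, col: str, target: str, q: int = 5) -> pd.Series:
    """Mean target per quantile bin of `col`: the data behind a feature-vs-target plot."""
    bins = pd.qcut(df[col], q=q, duplicates="drop")
    return df.groupby(bins, observed=True)[target].mean()


rng = np.random.default_rng(0)
demo = pd.DataFrame({"a": rng.normal(size=200), "b": rng.integers(0, 3, size=200)})
demo["y"] = 2 * demo["a"] + rng.normal(scale=0.5, size=200)

overview = feature_overview(demo, "y")
binned = target_by_bins(demo, "a", "y")
print(overview)
print(binned)
```

With anonymous features the column names carry no meaning, so tables like these (or their plots via `DataFrame.plot` / `Series.plot` when matplotlib is available) are how you decide which features to dig into.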

