Correlation: A Naive Approach

Before considering any models, it is best to start off by simply looking at correlations. As you may have noticed, we have a number of categorical variables, which poses a problem for calculating correlations.

There are two options when it comes to categorical variables: creating dummies or considering correlations by group. In our example we do the latter. If you would rather use a dummy variable, proceed as follows. Take a binary variable such as region, which distinguishes between "Advanced Economies" and "Emerging Markets and Developing Economies". Suppose we want to see whether log GDP per capita is correlated with region. To do so, create a region dummy that is 1 for "Advanced Economies" and 0 for "Emerging Markets and Developing Economies", and include it as an additional variable in your correlation plot.
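The dummy-variable steps can be sketched in pandas roughly as follows. This is a minimal illustration, not the course's own code: the DataFrame, the column names region and log_gdp_pc, and all values are made up for the example, so swap in the names from your own dataset.

```python
import pandas as pd

# Hypothetical toy data: column names and values are assumptions for illustration only.
df = pd.DataFrame({
    "region": [
        "Advanced Economies",
        "Emerging Markets and Developing Economies",
        "Advanced Economies",
        "Emerging Markets and Developing Economies",
    ],
    "log_gdp_pc": [10.8, 8.9, 10.5, 9.2],
})

# 1 for "Advanced Economies", 0 for "Emerging Markets and Developing Economies".
df["region_dummy"] = (df["region"] == "Advanced Economies").astype(int)

# The dummy is numeric, so it can now enter a correlation matrix or plot
# alongside the other numeric variables.
corr_matrix = df[["log_gdp_pc", "region_dummy"]].corr()
print(corr_matrix)
```

With a 0/1 dummy, the Pearson correlation reported here reflects how strongly the group means of log GDP per capita differ relative to the overall spread.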

Create a correlation plot in whichever way makes the most sense to you. Since you have plenty of variables, you do not have to stick to the ones in our example; feel free to use more or others! Briefly explain which correlations with log GDP per capita you observe.

Replicating the exact same plot in Python can be a bit tricky, but fear not! Seaborn's pairplot gets pretty close to the graphic above. Since incorporating the correlation values directly into the plot is somewhat challenging in Python, you can examine them separately with the corr() method of a pandas DataFrame.

