Industry-Location Relationship Analysis

Now we want to analyze how port locations correlate with dominant industries using a categorical-categorical visualization, while practicing log-scale transformations for skewed distributions.

We will create a heatmap to show the relationships between continents and top industries. Here's some reading to get you familiar with the concept:


🔧 Your tasks:

  • There is a problem here: the "Mineral Products" industry dominates all other industries so heavily that we cannot see the relationships among the other columns. We can use a log scale to compensate for this.

  • Optionally, to reduce clutter and focus on the most important industries, you can first filter the dataset to include only the top 5 industries and then calculate the contingency table from the filtered data. That would result in this:


Interpretation tasks:

You can use pandas' crosstab() function (called as pd.crosstab()) to compute a contingency table between two variables.
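As a minimal sketch of the crosstab() hint, the snippet below builds a contingency table from a tiny invented DataFrame. The column names `continent` and `industry` and all the rows are assumptions made for illustration; the real dataset may use different names.

```python
import pandas as pd

# Toy data invented for illustration; column names are assumptions,
# not necessarily those of the course dataset.
ports = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe", "Europe", "Africa", "Asia"],
    "industry": ["Mineral Products", "Machinery", "Mineral Products",
                 "Mineral Products", "Machinery", "Mineral Products"],
})

# Rows are continents, columns are industries,
# and each cell counts the ports with that combination.
table = pd.crosstab(ports["continent"], ports["industry"])
print(table)
```

Combinations that never occur in the data show up as 0 in the table, which matters once a logarithm is applied.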

For the log-scale transformation, you can use numpy's log1p(). This calculates the natural logarithm of one plus the value, that is, ln(1 + x). (🤔 Why plus one?)
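Here is a small sketch of what log1p() does to counts that span several orders of magnitude. The numbers are made up; try running np.log() on the same array and compare the results.

```python
import numpy as np

# Made-up counts: one empty cell, a few small ones, one dominant one.
counts = np.array([0, 9, 99, 9999])

# Elementwise ln(1 + x); the ordering of the values is preserved,
# but the gap between small and large counts shrinks.
scaled = np.log1p(counts)
print(scaled)
```

np.log1p() also accepts a pandas DataFrame, so it can be applied directly to a contingency table.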

You can use value_counts() with nlargest(x) to find the x most frequent values and then filter on them.
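One possible way to combine value_counts() and nlargest() for filtering is sketched below, again on invented data. The names `ports`, `continent`, `industry`, and `top_n` are hypothetical, and `top_n` is set to 2 only because the toy data is so small.

```python
import pandas as pd

# Toy data invented for illustration; column names are assumptions.
ports = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe", "Europe", "Africa", "Asia", "Europe"],
    "industry": ["Mineral Products", "Machinery", "Mineral Products",
                 "Textiles", "Machinery", "Mineral Products", "Vegetable Products"],
})

top_n = 2  # hypothetical; the task suggests 5 for the real dataset

# Count how often each industry occurs and keep the labels of the most frequent ones.
top_industries = ports["industry"].value_counts().nlargest(top_n).index

# Keep only rows whose industry is among those labels.
filtered = ports[ports["industry"].isin(top_industries)]

# The contingency table now contains only the selected industries.
table = pd.crosstab(filtered["continent"], filtered["industry"])
print(table)
```

Filtering before calling crosstab() keeps the rare industries out of the table entirely instead of leaving them as mostly empty columns.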
