Industry-Location Relationship Analysis
Now we want to analyze how port locations correlate with dominant industries using categorical-categorical visualization, while practicing log-scale transformations for skewed distributions.
We will create a heatmap to show the relationships between continents and top industries. Here's some reading to get you familiar with the concept:
Heatmaps and correlation plots, Ricardo García Ramírez, Medium.com
🔧 Your tasks:

There is a problem here: the "Mineral Products" industry dominates all other industries so heavily that it hides the relationships in the remaining columns. We can use a log scale to compress the largest counts and make those relationships visible.
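As a minimal sketch of the log-scale idea, the snippet below builds a count table and transforms it. The toy data and the column names `continent` and `industry` are assumptions made for illustration only; your dataset will use its own names and values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names "continent" and "industry" are assumptions.
ports = pd.DataFrame({
    "continent": ["Asia", "Asia", "Asia", "Europe", "Europe",
                  "Africa", "Africa", "Africa"],
    "industry": ["Mineral Products", "Mineral Products", "Machinery",
                 "Mineral Products", "Textiles",
                 "Mineral Products", "Mineral Products", "Mineral Products"],
})

# Contingency table: rows are continents, columns are industries, cells are counts.
counts = pd.crosstab(ports["continent"], ports["industry"])

# log1p(x) computes ln(1 + x) element-wise, compressing the dominant counts.
log_counts = np.log1p(counts)

print(counts)
print(log_counts.round(2))
```

The transformed table keeps the same shape and labels as the original, so it can be passed to whichever heatmap function you are using in place of the raw counts.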

Optionally, to reduce clutter and focus on the most important industries, you can first filter the dataset to include only the top 5 industries and then compute the contingency table from the filtered data. That would result in this:

Interpretation tasks:
You can use pandas' crosstab() for computing a contingency table between two variables.
For log scale transformation, you can use numpy's log1p(). This calculates the natural logarithm of the value plus 1. (🤔 Why plus one?)
You can use value_counts() with nlargest(x) to filter the top x values.
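The top-x filtering hint can be sketched roughly as below. This is an illustration under assumptions, not the reference solution: the toy data, the column names `continent` and `industry`, and the variable `top_n` are all hypothetical, and `top_n` is set to 2 only because the toy data is tiny.

```python
import pandas as pd

# Hypothetical toy data; the column names "continent" and "industry" are assumptions.
ports = pd.DataFrame({
    "continent": ["Asia", "Asia", "Asia", "Europe", "Europe", "Africa", "Africa"],
    "industry": ["Mineral Products", "Machinery", "Machinery",
                 "Mineral Products", "Textiles",
                 "Mineral Products", "Mineral Products"],
})

top_n = 2  # hypothetical value; the exercise suggests 5 on the real dataset

# Count rows per industry and keep the labels of the top_n most frequent ones.
top_industries = ports["industry"].value_counts().nlargest(top_n).index

# Keep only the rows whose industry is among those labels.
filtered = ports[ports["industry"].isin(top_industries)]

# Contingency table computed from the filtered rows only.
top_counts = pd.crosstab(filtered["continent"], filtered["industry"])
print(top_counts)
```

Filtering before calling crosstab() means the less frequent industries never appear as columns, which keeps the resulting heatmap narrow.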
