# Industry-Location Relationship Analysis

Now we want to analyze how port locations correlate with dominant industries using a categorical-categorical visualization, while practicing log-scale transformations for skewed distributions. We will create a heatmap to show the relationships between continents and top industries.

Here's some reading to get you familiar with the concept:

* [Heatmaps and correlation plots](https://medium.com/@rgr5882/100-days-of-data-science-day-53-heatmaps-and-correlation-plots-33f7fea60bc0), Ricardo García Ramírez, Medium.com
* [Demystifying heatmaps: A comprehensive beginner's guide](https://datasciencedojo.com/blog/heatmaps/)

***

### **🔧 Your tasks:**

* [ ] First calculate the contingency table.
* [ ] Plot the contingency table as a heatmap. It will likely look like this:

* There is a problem here: the "Mineral Products" industry absolutely dominates all other industries, which prevents us from seeing the relationships in the other columns. We can use a log scale to address this.
* [ ] Transform the contingency table values to log scale. Now you should have something like this:

* Optionally, to reduce clutter and focus on the most important industries, you can first filter the dataset to include only the top 5 industries and then calculate the contingency table from the filtered data. That would result in this:

***

**Interpretation tasks:**

* [ ] 🤔 Do some online research to find an explanation for the dominance of the "Mineral Products" industry in maritime trade.
* [ ] 🤔 Do some online research to find out which real-world factors explain Asia's high "Animal & Animal Products" presence.

> You can use pandas' `pd.crosstab()` to compute a contingency table between two variables.
>
> For the log-scale transformation, you can use numpy's `log1p()`, which calculates the natural logarithm of the value plus 1. (🤔 Why plus one?)
>
> You can use `value_counts()` with `nlargest(x)` to find the `x` most frequent values, which you can then use to filter the dataset.
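
The hints above can be combined roughly as in the following minimal sketch. It uses a tiny made-up DataFrame rather than the real port data, and the names `ports`, `continent`, and `industry` are assumptions for illustration only; your dataset's column names may differ. It keeps the top 2 industries instead of 5 simply because the toy data is so small.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; the column names are assumptions.
ports = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe", "Europe",
                  "Asia", "Africa", "Europe", "Asia"],
    "industry": ["Mineral Products", "Mineral Products", "Mineral Products",
                 "Machinery", "Textiles", "Mineral Products",
                 "Textiles", "Mineral Products"],
})

# Find the most frequent industries (top 2 here) and keep only those rows.
top_industries = ports["industry"].value_counts().nlargest(2).index
top_ports = ports[ports["industry"].isin(top_industries)]

# Contingency table: rows are continents, columns are industries,
# and each cell counts the ports in that combination.
table = pd.crosstab(top_ports["continent"], top_ports["industry"])

# Apply log(1 + x) element-wise to compress the dominant counts.
log_table = np.log1p(table)

print(table)
print(log_table.round(2))
```

Either `table` or `log_table` can then be passed to whichever heatmap function you prefer, for example seaborn's `heatmap()`, if that is the plotting library you are using.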