# Industry-Location Relationship Analysis

Next, we want to analyze how port locations relate to dominant industries by plotting one categorical variable against another, while practicing log-scale transformations for skewed distributions.

We will create a heatmap to show the relationships between continents and top industries. Here's some reading to get you familiar with the concept:

* [Heatmaps and correlation plots](https://medium.com/@rgr5882/100-days-of-data-science-day-53-heatmaps-and-correlation-plots-33f7fea60bc0), Ricardo García Ramírez, Medium.com
* [Demystifying heatmaps: A comprehensive beginner's guide](https://datasciencedojo.com/blog/heatmaps/)

***

### **🔧 Your tasks:**

* [ ] First calculate the contingency table.
* [ ] Plot the contingency table as a heatmap. It will likely look like this:

<figure><img src="https://2669499530-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FnYNN3nXNuXMJpHACcH73%2Fuploads%2FaKnqLZI7NydSmEGdCKdo%2Fimage.png?alt=media&#x26;token=2571f1ae-83ce-4a2c-b9e4-1ab9d060b650" alt="" width="563"><figcaption></figcaption></figure>
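
As a rough sketch of these two steps, the snippet below builds a contingency table with `pd.crosstab()` and defines a small plotting helper. The rows are made-up toy data, and the column names `continent` and `industry` as well as the helper `plot_heatmap` are assumptions for illustration; substitute the names from your own dataset.

```python
import pandas as pd

# Made-up toy rows for illustration only.
# `continent` and `industry` are assumed column names, not the real dataset's.
ports = pd.DataFrame({
    "continent": ["Asia", "Asia", "Asia", "Europe", "Europe", "Africa"],
    "industry": ["Mineral Products", "Mineral Products", "Textiles",
                 "Mineral Products", "Machinery", "Mineral Products"],
})

# Rows are continents, columns are industries, cells are port counts.
contingency = pd.crosstab(ports["continent"], ports["industry"])


def plot_heatmap(table, title, fmt="d"):
    """Hypothetical helper: draw a table as an annotated heatmap.

    The plotting libraries are imported here so that the table code above
    runs even where seaborn and matplotlib are not installed.
    """
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(10, 6))
    sns.heatmap(table, annot=True, fmt=fmt, cmap="viridis", ax=ax)
    ax.set_title(title)
    fig.tight_layout()
    return ax
```

Calling `plot_heatmap(contingency, "Ports by continent and industry")` followed by `plt.show()` in a notebook should display the heatmap. The default `fmt="d"` suits integer counts; pass something like `fmt=".1f"` for non-integer values.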

* There is a problem here: the "Mineral Products" industry dominates all other industries so strongly that the differences between the remaining cells are almost invisible. A log scale compresses the large values and makes those differences visible again.

- [ ] Transform the contingency table values to log scale. Now you should have something like this:

<figure><img src="https://2669499530-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FnYNN3nXNuXMJpHACcH73%2Fuploads%2F3EOsxznHlYYsbYFxikQJ%2Fimage.png?alt=media&#x26;token=cb71e354-7e08-4146-82ca-f3d67c89d344" alt="" width="563"><figcaption></figcaption></figure>
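
A minimal sketch of the transformation, assuming the contingency table is a pandas DataFrame of counts. The numbers below are invented to mimic one dominant column and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Invented counts with one dominant column (illustration only).
counts = pd.DataFrame(
    {"Mineral Products": [5000, 1200, 0], "Textiles": [40, 0, 3]},
    index=["Asia", "Europe", "Africa"],
)

# NumPy functions apply element-wise to a DataFrame and keep its labels.
log_counts = np.log1p(counts)

# The spread between the largest and smallest non-zero cell shrinks a lot.
raw_ratio = counts.to_numpy().max() / 3
log_ratio = log_counts.to_numpy().max() / np.log1p(3)
```

Here the largest cell is more than 1,600 times the smallest non-zero cell before the transformation and only about 6 times after it, which is why the color scale becomes readable. If you plot with seaborn, passing the original table as `annot=counts` together with `fmt="d"` keeps the raw counts as cell labels while the colors follow the log values.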

* Optionally, to reduce clutter and focus on the most important industries, you can first filter the dataset to include only the top 5 industries and then calculate the contingency table from the filtered data. That would result in this:

<figure><img src="https://2669499530-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FnYNN3nXNuXMJpHACcH73%2Fuploads%2FggZItjecSsoX2rHnjejw%2Fimage.png?alt=media&#x26;token=3b9a60d8-14b0-45cb-8280-201f2b0253ea" alt="" width="563"><figcaption></figcaption></figure>
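
One possible way to do the filtering is sketched below. It again uses made-up rows and the assumed column names `continent` and `industry`; `top_n` is set to 2 only to keep the toy example small, whereas the exercise uses 5.

```python
import pandas as pd

# Made-up toy rows; `continent` and `industry` are assumed column names.
ports = pd.DataFrame({
    "continent": ["Asia", "Asia", "Asia", "Europe", "Europe", "Africa", "Africa"],
    "industry": ["Mineral Products", "Mineral Products", "Textiles",
                 "Mineral Products", "Machinery", "Mineral Products", "Textiles"],
})

top_n = 2  # the exercise asks for 5

# Names of the most frequent industries.
top_industries = ports["industry"].value_counts().nlargest(top_n).index

# Keep only rows whose industry is among the top ones, then cross-tabulate.
filtered = ports[ports["industry"].isin(top_industries)]
contingency_top = pd.crosstab(filtered["continent"], filtered["industry"])
```

Filtering the rows before calling `pd.crosstab()` means the less frequent industries never appear as columns, so there is nothing to drop from the table afterwards.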

***

**Interpretation tasks:**

* [ ] 🤔 Do some online research to find an explanation for the dominance of the "Mineral Products" industry in maritime trade.
* [ ] 🤔 Do some online research to find out which real-world factors explain Asia's high "Animal & Animal Products" presence.

> <img src="https://2669499530-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FnYNN3nXNuXMJpHACcH73%2Fuploads%2Ft1yAGmUambZeYVQvPSeu%2Fp.png?alt=media&#x26;token=01872756-9ca8-44f9-9ec1-1ff5f70ce561" alt="" data-size="line">
>
> You can use pandas' `crosstab()` function (called as `pd.crosstab()`) to compute a contingency table between two variables.
>
> For the log-scale transformation, you can use numpy's `log1p()`. This calculates the natural logarithm of the value plus 1. (🤔 Why plus one?)
>
> You can use `value_counts()` with `nlargest(x)` to filter the top `x` values.
