# Initial Data Visualization

***Note: Some of the following plots were made in R and some in Python. The same plots can be produced in both languages, but they may look slightly different. Don't worry too much about recreating our plots exactly.***

Now that we have a feel for the data, it’s time to explore and visualize it before diving into cleaning. Working with raw data helps identify potential issues and anomalies that need to be addressed during cleaning. Visualization also reveals patterns and trends that are not immediately obvious from the raw data alone. By creating bar charts and other plots, you’ll deepen your understanding of the data and uncover insights that can guide your analysis. Let’s dive in and see what the raw data has to show!

First we want to look at our sample by plotting some demographics (age, gender, descent). Think about your choice of plot type!

* [ ] Plot a histogram of age.
* [ ] Create a bar plot of victim descent and note down the five most affected descent groups.
* [ ] Plot the victim's gender distribution with a plot type of your choice.
* [ ] How many cases remain open? Create a bar chart to visualize this.

This is what a bar chart of the victims' gender might look like:

<figure><img src="https://825077565-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fy9vTgCprl10E1m8Ff5QP%2Fuploads%2F17DKhgF4Bl3ZXl9nQqWg%2Fe04d417b-b1e5-4652-925d-e997e7fd5001.png?alt=media&#x26;token=ef2dc1cc-a17d-44a0-a80f-3ecfbb49a86c" alt="" width="375"><figcaption></figcaption></figure>

Here is an example of the victims' gender distribution:

<figure><img src="https://825077565-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fy9vTgCprl10E1m8Ff5QP%2Fuploads%2FhwwhHQpHqh3TMnDuq4zr%2Fimage.png?alt=media&#x26;token=3456adfc-cd2a-4492-840e-626fbb8a7590" alt="" width="375"><figcaption></figcaption></figure>

* [ ] What do you think about this plot? Write down what is good and, especially, what is bad about it.

<figure><img src="https://825077565-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fy9vTgCprl10E1m8Ff5QP%2Fuploads%2FoYjQEEJxVG0TnuqWE4Hm%2Fimage.png?alt=media&#x26;token=ba41b663-42f0-4a14-a281-12a4e94c620f" alt="" width="563"><figcaption></figcaption></figure>

Often we get unexpected results, especially when working with data that we didn’t collect ourselves. The causes can be very diverse, which is why it's important to understand the underlying data before drawing conclusions. Can you explain everything in your plots, or do you find any anomalies?

* [ ] Look at the plots you created above. Is anything still unclear? If so, how can you handle it? \
  (*Hint: When working with data that we didn't collect ourselves, look at the data source for a description*.)
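
One way to check whether a plot hides something unexpected is to look at the raw value counts and summary statistics behind it. The following is only a minimal sketch using pandas: the column names `age` and `gender`, the toy values, and the age thresholds are all invented for illustration and are not taken from the course dataset.

```python
import pandas as pd

# Toy data, invented for illustration only (hypothetical column names and values).
df = pd.DataFrame(
    {
        "age": [34, 0, 27, -1, 45, 120],
        "gender": ["F", "M", "X", None, "M", "F"],
    }
)

# Count every category, including missing values, to spot unexpected codes.
gender_counts = df["gender"].value_counts(dropna=False)
print(gender_counts)

# Summary statistics quickly reveal implausible minimum or maximum values.
age_summary = df["age"].describe()
print(age_summary)

# Flag rows with implausible ages (the thresholds are an arbitrary assumption).
suspicious = df[(df["age"] <= 0) | (df["age"] > 110)]
print(suspicious)
```

Whether such values are errors, placeholders, or meaningful codes is something only the data source's documentation can tell you.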

{% hint style="info" %}
🏴‍☠️:  If you’re looking for a simple solution to create bar charts, the base R function `barplot()` works perfectly. You can use it to quickly visualize the number of victims by sex or ethnicity by first creating a summary table using `table()` and then passing it to `barplot()`.

For more customization and detailed visualizations, the **ggplot2** package is a great option. You can use `ggplot()` combined with `geom_bar()` for more control over aesthetics like colors, themes, and labels. This will allow you to experiment and refine your charts to highlight insights in a visually appealing way.

Feel free to start simple with `barplot()` or dive deeper with **ggplot2** for a more enhanced visualization!
{% endhint %}

{% hint style="info" %}
:snake:: Matplotlib's `pyplot` module offers a variety of functions to create visually appealing charts.

Some useful functions include `hist()` for histograms to display the distribution of numerical data, `bar()` for bar charts to compare different categories, and `pie()` for pie charts to show proportions within a whole.

Don't forget to customize your plots with titles, labels, and colors to enhance readability. If you're unsure about the arguments these methods require, the Matplotlib documentation provides clear examples to help you get started.
{% endhint %}
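
As a rough illustration of the three `pyplot` functions mentioned above, here is a minimal sketch that draws a histogram, a bar chart, and a pie chart side by side. The data is invented for illustration and is not taken from the course dataset, so treat it as a starting point rather than a solution.

```python
import matplotlib.pyplot as plt

# Toy data, invented for illustration only.
ages = [23, 35, 31, 47, 52, 29, 38, 41, 35, 60, 19, 33]
labels = ["A", "B", "C"]
values = [40, 25, 10]

plt.figure(figsize=(12, 4))

# Histogram: distribution of a numerical variable.
plt.subplot(1, 3, 1)
counts, bin_edges, _ = plt.hist(ages, bins=5)
plt.title("Histogram")
plt.xlabel("Age")
plt.ylabel("Frequency")

# Bar chart: comparison of counts across categories.
plt.subplot(1, 3, 2)
bars = plt.bar(labels, values)
plt.title("Bar chart")
plt.xlabel("Category")
plt.ylabel("Count")

# Pie chart: proportions within a whole.
plt.subplot(1, 3, 3)
plt.pie(values, labels=labels, autopct="%1.0f%%")
plt.title("Pie chart")

plt.tight_layout()
```

In a script, call `plt.show()` afterwards to display the figure; in a notebook it is usually rendered automatically.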

{% hint style="info" %}
:snake:: Quick guide to creating visually pleasing charts in Python.

* **Figure Size & Layout:**
  * Use `plt.figure(figsize=(width, height))` to ensure your chart has the right proportions for readability and aesthetics.
  * Control layout spacing with `plt.tight_layout()` to avoid overlapping elements.
* **Contextual Elements:**
  * Add informative titles, axis labels, and legends using `plt.title()`, `plt.xlabel()`, `plt.ylabel()`, and `plt.legend()`.
  * Use `plt.grid()` to include light, unobtrusive gridlines for better readability.
* **Adding Color:**
  * Customize bar or pie chart colors using arguments like `color` (e.g., `color=['skyblue', 'orange']`).
  * Change background color or figure facecolor with `plt.gcf().set_facecolor('lightgray')`.
* **Polish with Details:**
  * Adjust font size and style using `fontsize` in titles, labels, or ticks.
  * Use `plt.xticks(rotation=45)` to handle long category names without clutter.
  * Avoid overloading colors and focus on a harmonious palette to maintain clarity.
* For more, see [this Earth Data Science lesson on customizing Matplotlib plots](https://www.earthdatascience.org/courses/scientists-guide-to-plotting-data-in-python/plot-with-matplotlib/introduction-to-matplotlib-plots/customize-plot-colors-labels-matplotlib/) and [this Medium article on creating an elegant plot](https://ozbunae.medium.com/creating-an-elegant-plot-17de19a3550c).
  {% endhint %}
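
The tips from the quick guide above can be combined in a single chart. The following is a minimal sketch with invented category names and counts; the specific colors, font sizes, and figure size are just one possible choice.

```python
import matplotlib.pyplot as plt

# Toy data, invented for illustration only.
categories = ["Category one", "Category two", "Category three"]
counts = [12, 7, 3]

# Figure size and background color.
plt.figure(figsize=(8, 5))
plt.gcf().set_facecolor("lightgray")

# Bars with a small, harmonious set of colors.
plt.bar(categories, counts, color=["skyblue", "orange", "seagreen"], label="Cases")

# Contextual elements: title, axis labels, legend.
plt.title("Example bar chart", fontsize=14)
plt.xlabel("Category", fontsize=12)
plt.ylabel("Count", fontsize=12)
plt.legend()

# Polish: rotated tick labels and light horizontal gridlines.
plt.xticks(rotation=45)
plt.grid(axis="y", alpha=0.3)

# Avoid overlapping elements.
plt.tight_layout()
```

Calling `plt.tight_layout()` last lets it account for the rotated tick labels and the legend when adjusting the spacing.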
