Initial Data Visualization

Note: Some of the following plots were made in R and some in Python. The same plots can be created in both languages, but they might look a bit different. Don't worry too much about recreating our plots exactly.

Now that we have a feeling for the data, it’s time to explore and visualize it before diving into cleaning. Working with the raw data helps identify potential issues and anomalies that need to be addressed during the cleaning process. Visualization also reveals patterns and trends that are not immediately obvious from the raw values alone. By creating bar charts and other plots, you’ll deepen your understanding of the data and find insights that can guide your analysis. Let’s dive in and see what the raw data has to show!

First, we want to look at our sample by plotting some demographics (age, gender, descent). Think about your choice of plot type!

This is what a bar chart of the victims' gender might look like:

Here is an example for the victim's gender distribution:

Oftentimes we get unexpected results, especially when working with data that we didn’t collect ourselves. The causes can be very diverse, which is why it's important to understand the underlying data before drawing conclusions. Can you explain everything in your plots, or do you find any anomalies?

🏴‍☠️: If you’re looking for a simple solution to create bar charts, the base R function barplot() works perfectly. You can use it to quickly visualize the number of victims by sex or ethnicity by first creating a summary table using table() and then passing it to barplot().

For more customization and detailed visualizations, the ggplot2 package is a great option. You can use ggplot() combined with geom_bar() for more control over aesthetics like colors, themes, and labels. This will allow you to experiment and refine your charts to highlight insights in a visually appealing way.

Feel free to start simple with barplot() or dive deeper with ggplot2 for a more enhanced visualization!

🐍: Matplotlib's pyplot module offers a variety of functions to create visually appealing charts.

Some useful functions include hist() for histograms to display the distribution of numerical data, bar() for bar charts to compare different categories, and pie() for pie charts to show proportions within a whole.

Don't forget to customize your plots with titles, labels, and colors to enhance readability. If you're unsure about the arguments these functions require, the Matplotlib documentation provides clear examples to help you get started.
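As a minimal sketch of how bar() and hist() could be used for the demographics, the example below counts categories by hand and plots them next to an age histogram. The values in genders and ages are invented for illustration and are not taken from the real dataset; the variable names are our own choice as well. The odd entries (an empty string, an age of -1) are included on purpose to show how anomalies can surface in a plot.

```python
from collections import Counter

import matplotlib.pyplot as plt

# Hypothetical sample values, invented for illustration only.
genders = ["F", "M", "M", "X", "F", "M", "", "F", "M", "H"]
ages = [23, 35, 0, 41, 29, 57, -1, 33, 62, 18]

# Count how often each category occurs (similar to table() in R).
gender_counts = Counter(genders)

# Two plots side by side: a bar chart and a histogram.
fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Bar chart: one bar per category; empty strings get a readable label.
labels = [label if label else "(missing)" for label in gender_counts]
ax_bar.bar(labels, list(gender_counts.values()))
ax_bar.set_title("Victims by gender (made-up data)")
ax_bar.set_xlabel("Gender code")
ax_bar.set_ylabel("Count")

# Histogram: distribution of a numerical variable.
ax_hist.hist(ages, bins=10)
ax_hist.set_title("Victim age (made-up data)")
ax_hist.set_xlabel("Age")
ax_hist.set_ylabel("Count")

fig.tight_layout()
# In a script, call plt.show() here to open the figure window.
```

With real data you would replace the two lists with the corresponding columns of your dataset. Unexpected categories or impossible ages then show up as extra bars or as bins at the edge of the histogram.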

🐍: Quick guide to creating visually pleasing charts in Python.

  • Figure Size & Layout:

    • Use plt.figure(figsize=(width, height)) to ensure your chart has the right proportions for readability and aesthetics.

    • Control layout spacing with plt.tight_layout() to avoid overlapping elements.

  • Contextual Elements:

    • Add informative titles, axis labels, and legends using plt.title(), plt.xlabel(), plt.ylabel(), and plt.legend().

    • Use plt.grid() to include light, unobtrusive gridlines for better readability.

  • Adding Color:

    • Customize bar or pie chart colors using arguments like color (e.g., color=['skyblue', 'orange']).

    • Change the background color of the figure with plt.gcf().set_facecolor('lightgray').

  • Polish with Details:

    • Adjust font size and style using fontsize in titles, labels, or ticks.

    • Use plt.xticks(rotation=45) to handle long category names without clutter.

    • Avoid overloading colors and focus on a harmonious palette to maintain clarity.

  • For more, check this and this out.
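The tips from the quick guide can be combined into one chart, roughly as sketched below. The categories and counts are made up for illustration, and the colors and font sizes are just one possible choice, not a recommended standard.

```python
import matplotlib.pyplot as plt

# Hypothetical category counts, invented for illustration only.
categories = ["Category A", "Category B", "Category C", "Unknown"]
counts = [120, 95, 40, 15]

# Figure size and background color.
plt.figure(figsize=(8, 5))
plt.gcf().set_facecolor("lightgray")

# Bars with a small, harmonious palette; gray marks the unknown group.
plt.bar(categories, counts, color=["skyblue", "orange", "skyblue", "gray"])

# Contextual elements: title and axis labels with explicit font sizes.
plt.title("Victims by category (made-up data)", fontsize=14)
plt.xlabel("Category", fontsize=11)
plt.ylabel("Number of victims", fontsize=11)

# Rotate long category names and add light horizontal gridlines.
plt.xticks(rotation=45)
plt.grid(axis="y", alpha=0.3)

# Adjust spacing so rotated labels are not cut off.
plt.tight_layout()
# In a script, call plt.show() here to open the figure window.
```

plt.legend() is left out because a single series of bars has nothing to distinguish. It becomes useful once you plot several groups and pass a label to each.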
