Data Science - Wintersemester 24/25

Initial Data Visualization


Last updated 3 months ago

Note: Some of the following plots were made in R and some in Python. The same plots can be produced in both languages, but they might look a bit different. Don't worry too much about recreating our plots exactly.

Now that we have a feeling for the data, it’s time to explore and visualize it before diving into cleaning. Working with raw data helps identify potential issues and anomalies that need to be addressed during the cleaning process. Data visualization helps reveal patterns, trends, and insights that may not be immediately obvious from the raw data alone. By creating bar charts and other visual aids, you’ll enhance your understanding of the information and uncover valuable insights that can guide your analysis. Let’s dive in and see what the raw data has to show!

First, we want to look at our sample by plotting some demographics (age, gender, descent). Think about your choice of plot type!

This is what a bar chart of the victims' gender might look like:

Here is an example of the victims' gender distribution:

We often get unexpected results, especially when working with data that we didn’t collect ourselves. The causes can be very diverse, which is why it's important to understand the underlying data before drawing conclusions. Can you explain everything in your plots, or do you find any anomalies?
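🐍: One way to look for anomalies before plotting is to count the raw categories and check numeric columns for implausible values. The sketch below is only an illustration: the tiny DataFrame, the column names `victim_sex` and `victim_age`, and the age bounds are all made up and will differ from the actual dataset.

```python
import pandas as pd

# Made-up sample standing in for the raw crime data.
# Column names and values are assumptions for illustration only.
raw = pd.DataFrame({
    "victim_sex": ["M", "F", "F", "X", None, "M", "H", "F"],
    "victim_age": [34, 27, 0, -1, 45, 120, 19, 52],
})

# Count every category, including missing values, to spot unexpected codes.
sex_counts = raw["victim_sex"].value_counts(dropna=False)
print(sex_counts)

# Flag ages outside a plausible range (the bounds are a judgment call).
suspicious_ages = raw[(raw["victim_age"] <= 0) | (raw["victim_age"] > 110)]
print(suspicious_ages)
```

Rare codes, missing values, or impossible ages that show up here are good candidates to note down for the cleaning step.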

🏴‍☠️: If you’re looking for a simple solution to create bar charts, the base R function barplot() works perfectly. You can use it to quickly visualize the number of victims by sex or ethnicity by first creating a summary table using table() and then passing it to barplot().

For more customization and detailed visualizations, the ggplot2 package is a great option. You can use ggplot() combined with geom_bar() for more control over aesthetics like colors, themes, and labels. This will allow you to experiment and refine your charts to highlight insights in a visually appealing way.

Feel free to start simple with barplot() or dive deeper with ggplot2 for a more enhanced visualization!

🐍: Matplotlib's pyplot module offers a variety of functions to create visually appealing charts.

Some useful functions include hist() for histograms to display the distribution of numerical data, bar() for bar charts to compare different categories, and pie() for pie charts to show proportions within a whole.

Don't forget to customize your plots with titles, labels, and colors to enhance readability. If you're unsure about the arguments these functions require, the Matplotlib documentation provides clear examples to help you get started.

🐍: Quick guide to creating visually pleasing charts in Python.

  • Figure Size & Layout:

    • Use plt.figure(figsize=(width, height)) to ensure your chart has the right proportions for readability and aesthetics.

    • Control layout spacing with plt.tight_layout() to avoid overlapping elements.

  • Contextual Elements:

    • Add informative titles, axis labels, and legends using plt.title(), plt.xlabel(), plt.ylabel(), and plt.legend().

    • Use plt.grid() to include light, unobtrusive gridlines for better readability.

  • Adding Color:

    • Customize bar or pie chart colors using arguments like color (e.g., color=['skyblue', 'orange']).

    • Change the figure's background color (facecolor) with plt.gcf().set_facecolor('lightgray').

  • Polish with Details:

    • Adjust font size and style using fontsize in titles, labels, or ticks.

    • Use plt.xticks(rotation=45) to handle long category names without clutter.

    • Avoid overloading colors and focus on a harmonious palette to maintain clarity.

For more, check this and this out.
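🐍: As a rough illustration of how the pyplot tips for figure size, labels, gridlines, colors, and tick rotation fit together, here is a minimal sketch of a styled bar chart. The counts, category codes, and the helper name `plot_victim_sex` are invented for this example; in the project you would compute the counts from the crime dataset instead.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical counts, not taken from the real dataset.
sex_counts = {"M": 420, "F": 380, "X": 95, "H": 5}


def plot_victim_sex(counts):
    """Draw a styled bar chart of counts per category and return the figure."""
    fig = plt.figure(figsize=(8, 5))          # figure size
    plt.bar(
        list(counts.keys()),
        list(counts.values()),
        color=["skyblue", "orange", "lightgreen", "salmon"],  # one color per bar
    )
    plt.title("Victims by sex (made-up numbers)", fontsize=14)
    plt.xlabel("Sex code")
    plt.ylabel("Number of victims")
    plt.grid(axis="y", alpha=0.3)             # light horizontal gridlines
    plt.xticks(rotation=45)                   # rotate category labels
    fig.set_facecolor("lightgray")            # figure background
    plt.tight_layout()                        # avoid overlapping elements
    return fig


fig = plot_victim_sex(sex_counts)
```

In a notebook the figure is displayed automatically; in a script you could call fig.savefig() with a file name of your choice to write it to disk.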