Data Science - Wintersemester 24/25

Grouping and Merging Data


Last updated 3 months ago

You're doing great with the data visualization, but now your boss wants you to dig even deeper!

Your boss gave you a scientific paper that says, "Hot temperatures directly influence aggression and violence."

🔥 -> ?

Your task now is to check whether the data from the LAPD supports this claim.

So, what’s the next step? Let’s break it down!

Calculate the Crime Count for Each Day

To see if hot weather leads to more crime, we need to compare the number of crimes on hot days to those on cooler days. Luckily, you already have the weather dataset with daily temperatures! 🌡️

What’s still missing? The crime count per day!

Group your crime data by day and count how many crimes occurred each day. This will give you the total number of crimes on each day, which is exactly what you need to compare with the temperatures.

🏴‍☠️: To calculate the crime count per day, use the group_by() function to group your crime data by day, and then use summarize() to count the number of crimes for each group.

🐍: Use `groupby` with size() or count() to find the number of crimes on each day.
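
As a rough illustration of the pandas hint, here is a minimal sketch of counting crimes per day. The DataFrame and its column names (`date_occ`, `crime_type`, `day`, `crime_count`) are made up for this example, so adapt them to your own cleaned crime dataset.

```python
import pandas as pd

# Hypothetical sample data; the real column names in your crime dataset may differ.
crimes = pd.DataFrame({
    "date_occ": pd.to_datetime([
        "2023-07-01 08:15", "2023-07-01 21:40",
        "2023-07-02 13:05", "2023-07-02 13:30", "2023-07-02 23:59",
    ]),
    "crime_type": ["theft", "assault", "theft", "burglary", "assault"],
})

# Reduce each timestamp to its calendar day (the time part is set to midnight).
crimes["day"] = crimes["date_occ"].dt.normalize()

# size() counts the rows in each group, including rows with missing values.
daily_counts = (
    crimes.groupby("day")
    .size()
    .reset_index(name="crime_count")
)

print(daily_counts)
```

The difference between the two options: size() counts every row in a group, while count() counts only the non-missing values in each column, so the results can differ if your data still contains gaps.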

Merge the Temperature and Crime Count

Now that you have both the daily crime counts and daily temperatures, it’s time to merge the two datasets. By doing this, you'll be able to easily see which day had what temperature and how many crimes occurred.

🏴‍☠️: To merge the daily temperature data and daily crime counts, use the merge() function in R.

Ensure that both datasets have a common column (in this case, day) with the same format (e.g., Date) before merging. If the formats differ, you can use the as.Date() function to standardize them.

Optional: If you want to include all rows from one dataset even if there’s no match in the other, use the all.x = TRUE or all.y = TRUE argument in merge() for left or right joins.
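
The same two ideas, a shared key in a common format and an optional left join, carry over to pandas. The sketch below assumes hypothetical data in which the weather file stores days as text while the crime counts use real dates; the column names `day`, `temperature`, and `crime_count` are placeholders.

```python
import pandas as pd

# Hypothetical data: the weather days are stored as text, the crime days as datetimes.
weather = pd.DataFrame({
    "day": ["2023-07-01", "2023-07-02", "2023-07-03"],
    "temperature": [24.5, 31.0, 28.2],
})
daily_counts = pd.DataFrame({
    "day": pd.to_datetime(["2023-07-01", "2023-07-02"]),
    "crime_count": [2, 3],
})

# Bring both key columns to the same dtype before merging.
weather["day"] = pd.to_datetime(weather["day"])

# how="left" keeps every weather row, similar to all.x = TRUE in R's merge().
merged = weather.merge(daily_counts, on="day", how="left")

print(merged)
```

Days without a matching crime count end up with a missing value in `crime_count`, which makes gaps in the data easy to spot.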

🐍: pandas offers at least three ways to combine your datasets: merge(), concat(), and join(). Be creative! Read up on each method and find out how they differ. You've calculated the daily crime count before, so you know the drill.
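
To get a feel for how the three methods differ, here is a small side-by-side sketch on made-up data (all names and values are placeholders). It is one possible comparison, not the expected solution.

```python
import pandas as pd

# Hypothetical data: three days of weather, but crime counts for only two of them.
days = pd.to_datetime(["2023-07-01", "2023-07-02", "2023-07-03"])
weather = pd.DataFrame({"day": days, "temperature": [24.5, 31.0, 28.2]})
daily_counts = pd.DataFrame({"day": days[:2], "crime_count": [2, 3]})

# merge(): matches rows on a shared column; inner join by default.
via_merge = pd.merge(weather, daily_counts, on="day")

# join(): matches rows on the index; left join by default.
via_join = weather.set_index("day").join(daily_counts.set_index("day"))

# concat(axis=1): places frames side by side, aligned on the index; outer join by default.
via_concat = pd.concat(
    [weather.set_index("day"), daily_counts.set_index("day")], axis=1
)

print(len(via_merge), len(via_join), len(via_concat))
```

Because the defaults differ, the same inputs can produce a different number of rows, so it is worth checking the shape of the result after merging.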
