Grouping and Merging Data

You're doing great with the data visualization, but now your boss wants you to dig even deeper! 💃

Your boss gave you a scientific paper that says, "Hot temperatures directly influence aggression and violence."

🔥 -> 👊?

Your task now is to check whether the LAPD data supports this claim.

So, what’s the next step? Let’s break it down!

Calculate the Crime Count for Each Day

To see if hot weather leads to more crime, we need to compare the number of crimes on hot days to those on cooler days. Luckily, you already have the weather dataset with daily temperatures! 🌡️

What’s still missing? The crime count per day!

Group your crime data by day and count how many crimes occurred each day. This will give you the total number of crimes on each day, which is exactly what you need to compare with the temperatures.

🏴‍☠️: To calculate the crime count per day, use dplyr's group_by() function to group your crime data by day, and then use summarize() with n() to count the number of crimes in each group.

🐍: Use `groupby` with size() or count() to find the number of crimes for each day.
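Here is a minimal sketch of that grouping step on toy data, not the real LAPD file. The column names `date_occurred`, `crime_type`, `day`, and `crime_count` are assumptions made up for illustration, so swap in whatever your dataset actually uses.

```python
import pandas as pd

# Toy stand-in for the crime data; column names and values are hypothetical.
crimes = pd.DataFrame({
    "date_occurred": pd.to_datetime([
        "2023-07-01 08:15", "2023-07-01 23:40",
        "2023-07-02 12:00",
        "2023-07-03 01:05", "2023-07-03 14:30", "2023-07-03 19:45",
    ]),
    "crime_type": ["theft", "assault", "theft", "burglary", "assault", "theft"],
})

# Drop the time of day so all rows from the same calendar day share one key.
crimes["day"] = crimes["date_occurred"].dt.normalize()

# size() counts the rows in each group; reset_index() turns the resulting
# Series back into a DataFrame with a named count column.
daily_counts = crimes.groupby("day").size().reset_index(name="crime_count")
print(daily_counts)
```

One difference worth knowing: size() counts every row in a group, while count() counts only the non-missing values in each column, so the two can disagree when your data has gaps.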

Merge the Temperature and Crime Count

Now that you have both the daily crime counts and daily temperatures, it’s time to merge the two datasets. By doing this, you'll be able to easily see which day had what temperature and how many crimes occurred.

🏴‍☠️: To merge the daily temperature data and daily crime counts, use the merge() function in R.

Ensure that both datasets have a common column (in this case, day) with the same format (e.g., Date) before merging. If the formats differ, you can use the as.Date() function to standardize them.

Optional: If you want to keep all rows from one dataset even when there’s no match in the other, pass all.x = TRUE (left join) or all.y = TRUE (right join) to merge().

🐍: pandas gives you at least three ways to combine datasets: merge(), concat() and join(). Read up on each one and how they differ, then pick the one that fits. Be creative! You've calculated the daily crime count before, you know the drill 😉
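To get a feel for how the three options differ, here is a small sketch on toy data. The column names (`day`, `crime_count`, `temp_max`) and all values are invented for illustration, and this is only one possible way to set things up, not the expected solution.

```python
import pandas as pd

# Toy data; every name and number here is hypothetical.
daily_counts = pd.DataFrame({
    "day": ["2023-07-01", "2023-07-02", "2023-07-03"],  # dates stored as strings
    "crime_count": [2, 1, 3],
})
weather = pd.DataFrame({
    "day": pd.to_datetime(["2023-07-01", "2023-07-02", "2023-07-04"]),
    "temp_max": [31.5, 28.0, 35.2],
})

# Give both key columns the same dtype first; string and datetime keys
# will not line up with each other.
daily_counts["day"] = pd.to_datetime(daily_counts["day"])

# merge(): SQL-style join on a named column. how="left" keeps every weather row.
merged = pd.merge(weather, daily_counts, on="day", how="left")

# join(): matches on the index, so the key column becomes the index first.
joined = weather.set_index("day").join(daily_counts.set_index("day"), how="left")

# concat(axis=1): places the frames side by side, aligned on the index.
# Its default is an outer join, so days found in either frame are kept.
side_by_side = pd.concat(
    [weather.set_index("day"), daily_counts.set_index("day")], axis=1
)

print(merged)
print(joined)
print(side_by_side)
```

In this toy setup merge() and join() give the same left-joined rows, with a missing crime count for the day that has weather but no crimes recorded, while concat() keeps the days from both frames. For joining on a shared date column, merge() is usually the most direct choice.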
