Getting started

For this project, you will work with crime data from Los Angeles covering the years 2020 to 2023. You will need to choose one of the following two crime datasets to work with:

  1. The original crime dataset, which includes all reported crimes.

  2. The cleaned crime dataset, where crimes involving sensitive topics (such as sexual abuse or other potentially triggering content) have been removed.

ℹ️ Data source & variable explanations can be found here.

You only need to download and work with one of these datasets, based on your preference. However, please note that the cleaned dataset is not fully representative of the actual crime data, as it excludes certain categories of crimes. Any conclusions drawn from this version should be interpreted with caution, and you cannot infer strong causal relationships from it, since key records have been omitted.

In addition to your selected crime dataset, you will also need to download the weather dataset, which includes weather data for the same time period. This will be useful for analyzing how weather conditions may have influenced crime patterns.

Once you've downloaded the weather dataset and your selected crime dataset, the next step is to place them in your working directory, load them, and begin the Exploratory Data Analysis (EDA). This process will help you understand the structure and contents of the data, laying the foundation for more detailed analysis later on.

🏴‍☠️: You can import data files by, for example, using the command dataset <- read_csv("dataset.csv"). However, you first have to choose the correct folder / working directory to load the file from. Here the command setwd("working directory") comes in handy. You might also have to install the readr package. Use the calls install.packages("package") and library(package) to do so.

Besides that, make sure you have installed and loaded the tidyverse package, as it includes a lot of useful functions for the first exercises!

🐍: Before you import your data files, I would recommend importing all the important libraries and functions first. It is good practice to have all imports at the beginning of your code (clean coding). Importing libraries and functions is as easy as you can imagine: use import (e.g. import matplotlib as mpl); for something specific, you just write from where import what (e.g. from sklearn.tree import DecisionTreeClassifier).
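For illustration, a minimal sketch of such an import block is shown below, assuming a pandas/matplotlib-based workflow (the exact set of libraries is an assumption; swap in whatever your own analysis needs):

```python
# Minimal sketch of a clean import block at the top of the script.
# Assumed libraries -- adjust to what your later exercises actually require.
import pandas as pd               # tabular data handling
import numpy as np                # numerical helpers
import matplotlib.pyplot as plt   # plotting
from sklearn.tree import DecisionTreeClassifier  # example of a specific import
```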

To-Do's:

  1. Import all the necessary libraries (pandas, ...), methods, and functions

  2. Import the data: my personal tip is to create variables for your loaded data (e.g. data_crime = xxxx). There are many ways to load the data (pd.read_csv("file"), using numpy, using pandas, ...). Use one effective way, remember to load the correct files, and remember the variable name of each file (e.g. data_crime) — see the sketch after this list.
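A minimal sketch of this second step, assuming the hypothetical file names data_crime.csv and data_weather.csv (use the actual names of your downloaded files):

```python
import pandas as pd

# Hypothetical file names -- replace with the actual names of your downloads.
data_crime = pd.read_csv("data_crime.csv")
data_weather = pd.read_csv("data_weather.csv")

# First look for the EDA: dimensions, column types, and summary statistics.
print(data_crime.shape)       # (rows, columns)
data_crime.info()             # column names, dtypes, non-null counts
print(data_crime.describe())  # summary statistics for numeric columns
print(data_weather.head())    # first few rows of the weather data
```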
