Data Science - Wintersemester 24/25

Getting started



For this project, you will work with crime data from Los Angeles covering the years 2020 to 2023. Two versions of the crime dataset are available, and you need to choose one of them:

  1. The original crime dataset, which includes all reported crimes.

  2. The cleaned crime dataset, where crimes involving sensitive topics (such as sexual abuse or other potentially triggering content) have been removed.

ℹ️ The data source and explanations of the variables can be found here.

You only need to download and work with one of these datasets based on your preference. However, please note that the cleaned dataset is not fully representative of the actual crime data, as it excludes certain categories of crimes. Any conclusions drawn from this version should be interpreted with caution, as key data has been omitted. You cannot infer strong causal relationships based on this dataset due to the missing information.

In addition to your selected crime dataset, you will also need to download the weather dataset, which includes weather data for the same time period. This will be useful for analyzing how weather conditions may have influenced crime patterns.

Once you've downloaded the weather dataset and your selected crime dataset, the next step is to load them into your working directory and begin the Exploratory Data Analysis (EDA). This process will help you understand the structure and contents of the data, laying the foundation for more detailed analysis later on.

๐Ÿดโ€โ˜ ๏ธ: You can import data files by, for example, using the command dataset <- read_csv(โ€œdataset.csvโ€). However, first you have to choose the correct folder / working directory to load the the file from. Here the command setwd("working directory") comes in handy. You also might have to install the readr package. Use the calls install.package("package") and library(package) to do so.

Besides that, make sure you have installed and loaded the tidyverse package, as it includes readr and a lot of other useful functions for the first exercises!

🐍: Before you import your data files, I recommend importing all the libraries and functions you will need. Keeping the imports at the beginning of your code is good practice (clean coding). Importing is as easy as you would imagine: use import for a whole library (e.g. import matplotlib as mpl). For something specific, state from where to import what (e.g. from sklearn.tree import DecisionTreeClassifier).
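As a minimal sketch, the top of a script could look like the following. It assumes pandas and numpy are already installed in your environment. The aliases pd and np are only common conventions, not requirements.

```python
# All imports sit at the top of the script, before any data is loaded.
import numpy as np            # whole library, used through the alias np
import pandas as pd           # whole library, used through the alias pd
from pathlib import Path      # one specific name taken from a module

# After the import, the alias is the name you use in your code.
print(pd.__version__)
print(np.__version__)
print(Path.cwd())             # the folder Python currently reads files from
```

Other libraries follow the same two patterns once they are installed.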

To-Dos:

  1. Import all the necessary libraries (pandas, ...), methods and functions.

  2. Import the data: my personal tip is to create a variable for each loaded dataset (e.g. data_crime = xxxx). There are many ways to load the data (pd.read_csv("file") from pandas, numpy, ...). Pick one effective way, make sure to load the correct files, and remember the variable name of each file (e.g. data_crime).
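One possible way to approach the second to-do with pandas is sketched below. The file names crime.csv and weather.csv, the column names and the two sample rows are made-up placeholders so that the example runs on its own. In the project, point data_dir at the folder with your downloaded files and use their real names.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Placeholder files (hypothetical names and columns) so the sketch runs anywhere.
# In the project, skip this part and set data_dir to your own data folder.
data_dir = Path(tempfile.mkdtemp())
(data_dir / "crime.csv").write_text("id,date\n1,2020-01-01\n2,2020-01-02\n")
(data_dir / "weather.csv").write_text("date,temp\n2020-01-01,15.5\n2020-01-02,17.0\n")

# One clearly named variable per dataset.
data_crime = pd.read_csv(data_dir / "crime.csv")
data_weather = pd.read_csv(data_dir / "weather.csv")

# Quick sanity check: number of rows and columns, then the first rows.
print(data_crime.shape)
print(data_weather.head())
```

Checking shape and head() right after loading is a cheap way to confirm that you picked up the correct file before starting the exploration.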
