Data Science - Wintersemester 24/25

Linear Regression


Last updated 3 months ago

Now you're ready to see if hot weather really correlates with crime rates! 🔥👮

Visualisation

Correlation

But judging by eye alone isn't rigorous! Let's compute the correlation between the variables!

Simple Linear Regression

Now that we've seen there could be a correlation between temperature and crime count, let's test it with a simple linear regression!

Linear Regression

Simple linear regression alone won't be enough. Let's bring in some more independent variables!
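As a rough orientation, the step from simple to multiple linear regression can be written as follows. The symbols are assumptions chosen for illustration and do not come from the project guide: $y_i$ is the crime count on day $i$, $x_{i1}$ the temperature, $x_{i2}, \dots, x_{ik}$ any further weather variables, and $\varepsilon_i$ the error term.

```latex
% Simple linear regression: one independent variable (temperature)
y_i = \beta_0 + \beta_1 x_{i1} + \varepsilon_i

% Multiple linear regression: k independent variables
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i
```

Each coefficient $\beta_j$ in the multiple model describes the expected change in crime count for a one-unit change in $x_{ij}$ while the other variables are held constant.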

🏴‍☠️ R: To smooth noisy data for plotting, apply a rolling mean to both the crime count and the temperature. Use rollmean() from the zoo package to calculate a 30-day rolling mean. Then build the plot with ggplot() and draw both series with geom_line(). Because both lines are drawn against the primary y-axis, multiply the temperature values by a scaling factor so they fall in the same range as the crime counts. For the secondary y-axis, pass sec_axis() to scale_y_continuous() with the inverse of that scaling so the axis shows the temperature in degrees Celsius. Set the y-axis limits in scale_y_continuous() as well, and use scale_color_manual() to give each line a distinct color.

To explore the relationship between crime count and temperature, consider using the lm() function to perform a linear regression.

🐍 Python: To smooth noisy data for plotting, apply a rolling mean to both the crime count and the temperature using the .rolling() and .mean() methods.
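A minimal sketch of the rolling mean, assuming a DataFrame with one row per day. The synthetic data and the column names `crime_count` and `temperature` are placeholders, not the project's actual dataset, and the 30-day window is borrowed from the R hint.

```python
import numpy as np
import pandas as pd

# Synthetic daily data standing in for the merged crime/weather table.
# Column names and values are assumptions made for this example.
rng = np.random.default_rng(42)
dates = pd.date_range("2023-01-01", periods=365, freq="D")
seasonal = np.sin(2 * np.pi * (np.arange(365) - 100) / 365)
daily = pd.DataFrame(
    {
        "crime_count": 700 + 80 * seasonal + rng.normal(0, 40, 365),
        "temperature": 10 + 12 * seasonal + rng.normal(0, 4, 365),
    },
    index=dates,
)

# 30-day rolling mean of every column. The first 29 rows are NaN
# because the window is not yet full.
smoothed = daily.rolling(window=30).mean()

print(smoothed.dropna().head())
```

If the leading NaN rows get in the way, `min_periods=1` in `.rolling()` averages over whatever days are available so far.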

You'll need a graph with a dual y-axis. Use plt.subplots() to create a figure with custom dimensions (set via figsize).

Use twinx() to add a second y-axis, which lets you display two different data scales on the same graph. Use the .corr() method to determine the correlation between temperature and crime rate.
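The dual-axis plot and the correlation check could look roughly like this. The data is synthetic and the column names, colors, and figure size are assumptions made for the example. The `Agg` backend is only there so the script runs without a display, and can be dropped in a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily data; names and values are assumptions for this example.
rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-01", periods=365, freq="D")
temperature = (
    10 + 12 * np.sin(2 * np.pi * (np.arange(365) - 100) / 365)
    + rng.normal(0, 3, 365)
)
crime_count = 600 + 8 * temperature + rng.normal(0, 30, 365)
daily = pd.DataFrame(
    {"temperature": temperature, "crime_count": crime_count}, index=dates
)
smoothed = daily.rolling(window=30).mean()

# Left y-axis: crime count.
fig, ax_crime = plt.subplots(figsize=(12, 5))
(crime_line,) = ax_crime.plot(
    smoothed.index, smoothed["crime_count"],
    color="tab:red", label="Crime count (30-day mean)",
)
ax_crime.set_xlabel("Date")
ax_crime.set_ylabel("Crimes per day", color="tab:red")

# Right y-axis: temperature, sharing the same x-axis.
ax_temp = ax_crime.twinx()
(temp_line,) = ax_temp.plot(
    smoothed.index, smoothed["temperature"],
    color="tab:blue", label="Temperature (30-day mean)",
)
ax_temp.set_ylabel("Temperature (°C)", color="tab:blue")

# One legend covering the lines from both axes.
ax_crime.legend(handles=[crime_line, temp_line], loc="upper left")
fig.tight_layout()

# Pearson correlation on the unsmoothed daily values.
correlation = daily["temperature"].corr(daily["crime_count"])
print(f"Pearson correlation: {correlation:.2f}")
```

Call `fig.savefig(...)` or `plt.show()` afterwards to view the chart. Here the correlation is computed on the raw daily values, not the smoothed ones, since rolling means tend to inflate it.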

To explore the relationship between crime count and temperature, and later additional variables, consider using OLS from the statsmodels library to perform both simple and multiple linear regressions.

Color and labels help visually distinguish the data lines, enhancing clarity and making the plot easier to interpret.
