Data Science - Wintersemester 24/25

Initial Exploration Tasks

Let's kick things off with a few straightforward tasks that will help you get familiar with the datasets and deepen your understanding of their structure!


Last updated 5 months ago

🏴‍☠️ R: You can use dplyr's select(), filter(), count() and arrange() functions to compute the answers to these questions. If you haven't heard of the dplyr package yet, take the respective DataCamp course as soon as possible! That said, str(), dim() and summary() are also powerful non-dplyr functions for getting started.

🐍 Python: When exploring and manipulating data, Pandas offers a range of functions to make tasks easier.

Useful attributes and methods for this task include

  • shape (to check dataset dimensions),

  • notnull() (to flag non-missing values) or isnull() (to flag missing ones),

  • sum() (to take a sum of something, for example to count the flagged values),

  • dtypes (to identify variable types),

  • sort_values() (to sort a Series or DataFrame) and

  • value_counts() (to find the distribution of categories).

If these methods feel unfamiliar, don't hesitate to explore their examples in the Pandas documentation or try them out on small subsets of the data. Hands-on practice is the best way to get comfortable with these tools!
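
To make the methods listed above concrete, here is a minimal sketch on a tiny made-up DataFrame. The column names and values are hypothetical and only serve as an illustration; they are not taken from the project's crime or weather datasets, so adapt them to the columns you actually find there.

```python
import pandas as pd

# Hypothetical toy data (assumption: made up for illustration only,
# not the real crime or weather dataset).
df = pd.DataFrame(
    {
        "category": ["theft", "assault", "theft", None, "fraud", "theft"],
        "temperature": [12.5, None, None, 15.2, 9.9, 11.1],
    }
)

# shape is an attribute: a (rows, columns) tuple
n_rows, n_cols = df.shape

# dtypes is an attribute: one data type per column
column_types = df.dtypes

# notnull()/isnull() return True/False per cell; sum() counts the True values
non_missing = df.notnull().sum()
missing = df.isnull().sum()

# sort_values() orders a Series; here: columns with the most missing values first
most_missing = missing.sort_values(ascending=False)

# value_counts() gives the distribution of categories (missing values are skipped by default)
category_counts = df["category"].value_counts()

print(f"{n_rows} rows, {n_cols} columns")
print(column_types)
print(most_missing)
print(category_counts)
```

Note that shape and dtypes are written without parentheses because they are attributes, while the other entries are methods that need to be called.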
