Initial Exploration Tasks

Let's kick things off with a few straightforward tasks that will help you get familiar with the datasets and deepen your understanding of their structure!

🏴‍☠️: You could use dplyr’s select(), filter(), count() and arrange() functions to compute the desired outcome and answer the question. If you haven’t heard of the dplyr package yet, take the respective DataCamp course asap! However, str(), dim() and summary() are also powerful non-dplyr functions for getting started.

🐍: When exploring and manipulating data, Pandas offers a range of functions to make tasks easier.

Useful functions and attributes for this task include

  • shape (to check dataset dimensions),

  • notnull() (to flag non-missing values) or isnull() (to flag missing values),

  • sum() (to take a sum of something, e.g. to count the flagged values),

  • dtypes (to identify variable types),

  • sort_values() (to sort a Series) and

  • value_counts() (to find the distribution of categories).
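
To see how these pieces fit together, here is a minimal sketch on a tiny made-up DataFrame. The column names `category` and `amount` and all values are purely hypothetical and only stand in for the real datasets, so swap in your own data and column names.

```python
import pandas as pd

# Hypothetical toy data; replace with one of the actual datasets.
df = pd.DataFrame(
    {
        "category": ["a", "b", "a", None, "a"],
        "amount": [10.0, 5.0, None, 2.5, 7.5],
    }
)

# shape: dataset dimensions as (rows, columns)
n_rows, n_cols = df.shape

# dtypes: the data type of each column
column_types = df.dtypes

# isnull() / notnull() flag each cell; sum() then counts the flags per column
missing_per_column = df.isnull().sum()
present_per_column = df.notnull().sum()

# value_counts(): how often each category occurs (missing values are skipped by default)
category_counts = df["category"].value_counts()

# sort_values(): sort a Series, here from largest to smallest (missing values go last)
sorted_amounts = df["amount"].sort_values(ascending=False)

print(n_rows, n_cols)
print(column_types)
print(missing_per_column)
print(category_counts)
print(sorted_amounts)
```

Chaining isnull() with sum() works because True counts as 1 when summed, which makes it a quick way to check how many values are missing in each column.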

If these methods feel unfamiliar, don’t hesitate to explore their examples in the Pandas documentation or try them out on small subsets of the data. Hands-on practice is the best way to get comfortable with these tools!
