Discovering the Data

The first step in any project is loading the data. For the exploratory data analysis (EDA) part, we will use only the `Ports.csv` data set.
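As a starting point, loading a CSV file and getting a first overview could look like the sketch below. It assumes that `Ports.csv` sits in your current working directory. The helper name `load_and_describe` is made up for this illustration and is not part of the project template.

```python
from pathlib import Path

import pandas as pd


def load_and_describe(csv_source):
    """Read a CSV file into a DataFrame and print a first overview."""
    df = pd.read_csv(csv_source)
    print(df.shape)         # (number of rows, number of columns)
    print(df.dtypes)        # data type pandas inferred for each column
    print(df.head())        # first five rows
    print(df.isna().sum())  # number of missing values per column
    return df


# Assumption: Ports.csv is in the current working directory.
ports_path = Path("Ports.csv")
if ports_path.exists():
    ports = load_and_describe(ports_path)
```

The printed overview is only a first look. It tells you how large the data set is and where values are missing, but not yet what the columns mean.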

Import the data set and describe it according to these questions:

Some of these questions are best answered in code, while others require you to explain in writing what you think. Keywords alone will not suffice; please tell your audience your reasoning. If you are unsure how to answer this question or any of the following ones, go back to the section How to Complete the Project, where we outline what we expect of your solutions.

Don't fret if you cannot answer all of the questions right now; we do not expect you to. The goal of this exercise is for you to structure your thoughts, to familiarize yourself with the data, and to anticipate difficulties that might come up in your own independent research. For now, we will guide you along after your initial thinking process. Perhaps, after working through some of the later questions, you will identify issues that change the answer you gave to this very first question. If you do, please update your answer here. We want you to understand what you are doing, and we want to understand your thinking.

To work with large amounts of data, pandas provides many functions for selecting and summarizing the entries of a DataFrame. Useful ones are `loc`, `iloc`, `filter`, `value_counts`, and `max`. If some of these seem new to you, don't shy away from them just yet. Most packages come with easy-to-understand examples in their documentation.
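To give a feel for these functions, here is a small sketch on an invented toy DataFrame. The column names and values are placeholders chosen for illustration; they are not taken from `Ports.csv`.

```python
import pandas as pd

# Invented toy data; the column names are placeholders, not those of Ports.csv.
df = pd.DataFrame(
    {
        "name": ["A", "B", "C", "D"],
        "region": ["north", "south", "north", "east"],
        "size_a": [10, 25, 7, 25],
        "size_b": [1.5, 2.0, 0.5, 3.0],
    }
)

# loc selects by label or boolean mask: rows in the "north" region, two columns.
north = df.loc[df["region"] == "north", ["name", "size_a"]]

# iloc selects by integer position: first two rows, first two columns.
top_left = df.iloc[:2, :2]

# filter selects columns by label, here every column whose name contains "size".
sizes = df.filter(like="size")

# value_counts counts how often each value occurs in a column.
region_counts = df["region"].value_counts()

# max returns the largest value of a column.
largest = df["size_a"].max()
```

Printing each result (for example `print(north)`) is a quick way to see what the selection returned before applying the same call to the real data.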
