Your Personal Data

3 Your personal data

3.1 Load your data

You will now use the data set that you requested from Netflix. In the Netflix folder you will find the document of interest: ViewingActivity. Load it into your environment and inspect it as you did before with the Netflix data set. If you can’t request your data, ask your mentor; they will provide you with an alternative data set.
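A minimal Python sketch of the loading step follows. It assumes the export is a CSV file and uses a small made-up sample in its place; the column names (`Profile Name`, `Start Time`, `Duration`, `Title`) are assumptions, so check them against your own file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the ViewingActivity file; names and rows are made up.
sample_csv = io.StringIO(
    "Profile Name,Start Time,Duration,Title\n"
    "Alex,2021-03-01 20:15:00,00:42:10,Example Show: Season 1: Pilot\n"
    "Sam,2021-03-02 18:00:00,00:00:05,Example Movie\n"
)

# With your real export, pass the path to the file instead of sample_csv.
viewing = pd.read_csv(sample_csv, parse_dates=["Start Time"])

# Durations arrive as text (HH:MM:SS); timedeltas are easier to filter and sum.
viewing["Duration"] = pd.to_timedelta(viewing["Duration"])

# First look: the first rows, then column types and missing values.
print(viewing.head())
viewing.info()
```

Converting the time columns right after loading keeps the later cleaning and plotting steps simpler.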

3.2 Clean and transform dataset

As you might have noticed, Netflix recorded an entry every time you clicked on a movie, even if you didn’t watch it. Find out which column marks these entries with a specific value.
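One way to hunt for such a column in Python is to list the distinct values of every column that has only a few of them. The sketch below uses a made-up frame; the column `Entry Type` and its value `preview` are placeholders, not the real Netflix names.

```python
import pandas as pd

# Hypothetical miniature of the viewing data; "Entry Type" is a placeholder name.
viewing = pd.DataFrame(
    {
        "Title": ["Example Movie", "Example Movie", "Example Show: Season 1: Pilot"],
        "Duration": pd.to_timedelta(["00:00:04", "01:35:00", "00:42:10"]),
        "Entry Type": ["preview", None, None],
    }
)

# Columns with few distinct values are the likely candidates for a marker column.
max_unique = 5  # arbitrary threshold, adjust for your data
for column in viewing.columns:
    if viewing[column].nunique(dropna=False) <= max_unique:
        print(viewing[column].value_counts(dropna=False), "\n")

# Once the marker column is identified, keep only the rows where it is empty.
watched = viewing[viewing["Entry Type"].isna()]
print(watched)
```

Very short durations are a second hint worth cross-checking against the marker column before you drop any rows.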

3.3 Merging datasets: primitive approach

As a data scientist, you’ll often find yourself working with data sets from different data sources referencing the same object. For example, you might have the movie names in one file and the respective genre in a separate file. It would make more sense to just merge the two data sets into one. Indeed, this is the case with our data. Your Netflix data does not provide information on genre, actor, or director, while the general Netflix data set does.

Several things are “wrong” with the merged data set. What are they, and why? Tip: Prepare your data set so that the combined title, season, and episode information is split into separate title, season, and episode columns. Only then can the two data frames be joined properly.
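The split-then-merge idea from the tip can be sketched in Python as follows. Both frames are made up: the `Show: Season: Episode` layout of `Title` and the catalog columns `title` and `listed_in` are assumptions about the two data sets.

```python
import pandas as pd

# Hypothetical personal viewing data with title, season and episode in one string.
viewing = pd.DataFrame(
    {"Title": ["Example Show: Season 1: Pilot", "Example Movie"]}
)

# Hypothetical excerpt of the general Netflix data set.
catalog = pd.DataFrame(
    {"title": ["Example Show", "Example Movie"], "listed_in": ["Dramas", "Comedies"]}
)

# Split on the first two ": " separators only; movies get missing season/episode.
parts = viewing["Title"].str.split(": ", n=2, expand=True)
parts = parts.reindex(columns=range(3))  # guarantee three columns
parts.columns = ["title", "season", "episode"]
viewing = pd.concat([viewing, parts], axis=1)

# A left join keeps every viewing record, matched or not.
merged = viewing.merge(catalog, on="title", how="left")
print(merged)
```

Titles that themselves contain a colon would be split in the wrong place by this simple rule, so inspect the unmatched rows after merging.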

3.4 Dynamic line plot

Your goal for this task is to plot how each viewer’s activity was recorded over time. Since it would be a bit unclear in a static plot, let’s try to do a dynamic chart here!

The data is already prepared for this. Your ggplot() call might need more input than you used before. Since the graphic is supposed to be a line with points, geom_line() combined with geom_point() is a good choice. The dynamic aspect is no magic: you just need to add the transition_reveal() call from the gganimate package.

Just a heads-up: in Python the solution will not be as straightforward as in R, because Python offers far more customization. This dynamic plot will help us understand the backbone of all the graphs we have plotted so far: we will learn more about matplotlib by trying to animate our simple line plot. If you don’t feel up to this challenge yet, don’t worry; a line plot with nice axis labels will also count as a solution.

For those who are up to the challenge: try searching online for guides with the keywords “animated lineplot python”. One possible solution is to use the matplotlib.animation module, specifically the class FuncAnimation. It requires us to write our own update function, in which we tell matplotlib what to display each time the function is called. Every call to the update function becomes one frame of the animation.
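The update-function idea can be sketched as below. The data (hours watched per day) is made up, and the sketch draws a single line; for one line per viewer you would create one line object per profile and update each of them inside the same function.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the script runs anywhere; drop this in a notebook

import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# Hypothetical data: hours watched on seven consecutive days.
days = [1, 2, 3, 4, 5, 6, 7]
hours = [0.5, 1.0, 0.0, 2.5, 1.5, 3.0, 2.0]

fig, ax = plt.subplots()
(line,) = ax.plot([], [], marker="o")  # empty line with point markers
ax.set_xlim(min(days), max(days))
ax.set_ylim(0, max(hours) + 0.5)
ax.set_xlabel("Day")
ax.set_ylabel("Hours watched")


def update(frame):
    """Show the data up to and including index `frame`."""
    line.set_data(days[: frame + 1], hours[: frame + 1])
    return (line,)


# One call to update() per frame; interval is the delay between frames in ms.
animation = FuncAnimation(fig, update, frames=len(days), interval=300)

# To write a GIF (requires the Pillow package):
# animation.save("viewing_activity.gif", writer="pillow")
```

Fixing the axis limits up front keeps the axes from jumping while the line grows, which plays a similar role to what transition_reveal() handles for you in R.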

If you managed to create a static plot, but ran into trouble making the plot dynamic, don't hesitate to ask your mentors.
