📈 Linear Regression

We would like to start with a simple linear regression model. Using the UN_data set, begin by replicating our results.

The "Advanced Economies Dummy" is a dummy variable equal to one if the country is an advanced economy and zero otherwise. Labor participation and the employment shares are given in the data; rename them for your own convenience! The "Top 1 - Top 3 Export to Advanced Economies" variables are generated as follows. The data set contains three variables naming each country's top three export destinations. Generate three new dummy variables, one per destination. Each is one if the destination country is an advanced economy, zero if it is not, and NA if you do not have information on that country. For your convenience, you can click through to the list of countries considered "Advanced Economies".
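The dummy construction described above can be sketched in pandas as follows. This is a minimal sketch, not our reference solution. The toy rows, the column names (`top_export_1` to `top_export_3`, the labor participation label) and the short `advanced` list are hypothetical placeholders. Substitute the real UN_data columns and the full list of advanced economies.

```python
import pandas as pd

# Hypothetical excerpt of the UN data; the real column names will differ.
un_data = pd.DataFrame({
    "country": ["Germany", "Kenya", "Chile", "Viet Nam"],
    "Labour force participation (% of pop.)": [61.0, 72.0, 59.0, 76.0],
    "top_export_1": ["United States", "Uganda", "China", "United States"],
    "top_export_2": ["France", "Netherlands", "United States", None],
    "top_export_3": ["China", "United Kingdom", "Japan", "China"],
})

# Rename a long column to something shorter (hypothetical original name).
un_data = un_data.rename(
    columns={"Labour force participation (% of pop.)": "labor_part"}
)

# Illustrative subset only: use the full "Advanced Economies" list in practice.
advanced = ["United States", "Germany", "France", "Netherlands",
            "United Kingdom", "Japan"]

# Advanced Economies Dummy: 1 if the country itself is advanced, else 0.
un_data["advanced_dummy"] = un_data["country"].isin(advanced).astype(int)

# One dummy per top-export destination: 1.0 if advanced, 0.0 if not,
# and NaN (pandas' NA for floats) where the destination is missing.
for i in (1, 2, 3):
    col = f"top_export_{i}"
    dummy = un_data[col].isin(advanced).astype(float)
    # where() keeps values where the condition holds and inserts NaN elsewhere.
    un_data[f"top{i}_export_advanced"] = dummy.where(un_data[col].notna())

print(un_data[["country", "advanced_dummy", "top1_export_advanced",
               "top2_export_advanced", "top3_export_advanced"]])
```

The dummies are stored as floats because a plain integer column cannot hold NaN. The `.where(... notna())` step makes sure a missing destination becomes NA rather than being silently counted as "not advanced".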

After replicating our models, build a model of your own. Feel free to generate new variables, but remember that any new variable has to exist in both the training and the testing data set. Always comment your code extensively to explain what you are doing, both for your audience and for your future self. You can also include more models than our two plus your own mandatory one. Just explain the intuition behind each model and its variables.

Once you have your models, briefly comment on the output. Specifically, tell us what the coefficients are and how you can interpret them. If you are unsure how to interpret coefficients when one of your variables is in natural logs, have a peek here. Also, briefly explain what the R-squared and the adjusted R-squared are, as well as what the stars and statistical significance mean. All of this is covered in your DataCamp courses, but feel free to use a textbook or Google if you would like to refresh your memory. Furthermore, what do you notice about the number of observations? Do you have the same number of observations as in your testing data subset? If not, why not?
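As a quick refresher, the standard textbook formulas are sketched below. Here n is the number of observations actually used in the regression and k is the number of explanatory variables, excluding the intercept. The percentage readings for the log models are approximations that work best when the coefficient is small.

```latex
% Goodness of fit
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\qquad
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}

% Significance of a single coefficient: compare t_j with a t distribution with n - k - 1 degrees of freedom
t_j = \frac{\hat{\beta}_j}{\operatorname{se}(\hat{\beta}_j)}

% Interpreting the slope \beta_1
\begin{aligned}
y     &= \beta_0 + \beta_1 x      &&\Rightarrow\ \Delta x = 1 \text{ unit} \;\to\; \Delta y = \beta_1 \text{ units} \\
\ln y &= \beta_0 + \beta_1 x      &&\Rightarrow\ \Delta x = 1 \text{ unit} \;\to\; \%\Delta y \approx 100\,\beta_1 \\
y     &= \beta_0 + \beta_1 \ln x  &&\Rightarrow\ \%\Delta x = 1 \;\to\; \Delta y \approx \beta_1 / 100 \text{ units} \\
\ln y &= \beta_0 + \beta_1 \ln x  &&\Rightarrow\ \%\Delta x = 1 \;\to\; \%\Delta y \approx \beta_1
\end{aligned}
```

The adjusted R-squared penalizes each additional regressor, so, unlike R-squared, it can fall when you add a variable that explains little. Regression tables usually attach stars to coefficients whose p-values fall below conventional thresholds such as 0.1, 0.05 and 0.01. The exact cut-offs depend on the software, so check its legend.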

In Python, you can filter your data directly with the isin() method of a pandas DataFrame or Series. It can be helpful to write the filtered data into a new DataFrame.
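Filtering with isin() and storing the result in a new DataFrame might look like the sketch below. The toy data, the `region` column and the list of selected regions are hypothetical and only illustrate the pattern.

```python
import pandas as pd

# Hypothetical toy data standing in for the UN data set.
un_data = pd.DataFrame({
    "country": ["Germany", "Kenya", "Chile", "Japan"],
    "region": ["Europe", "Africa", "Americas", "Asia"],
    "gdp_growth": [1.1, 5.4, 2.4, 0.9],
})

selected_regions = ["Europe", "Asia"]

# isin() builds a boolean mask; indexing with it keeps the matching rows.
# .copy() makes the result an independent DataFrame.
europe_asia = un_data[un_data["region"].isin(selected_regions)].copy()

# The ~ operator negates the mask and selects all remaining rows.
rest = un_data[~un_data["region"].isin(selected_regions)].copy()

print(europe_asia)
print(rest)
```

Calling .copy() avoids pandas' SettingWithCopyWarning if you later add columns, such as new dummies, to the filtered DataFrame.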
sklearn.linear_model contains the LinearRegression class, whose fit() method you can use. We recommend it, although you can also use scipy.stats.linregress(), which only handles a single explanatory variable. For output, the packages tabulate or PrettyTable can simplify things.
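A minimal sketch of fitting a model with LinearRegression.fit() on a training set follows. The toy data and column names are hypothetical. It also shows why the number of observations can shrink: LinearRegression does not accept missing values, so incomplete rows are dropped before fitting. Note that sklearn reports coefficients and R-squared (via score()) but neither adjusted R-squared nor p-values. The sketch therefore computes adjusted R-squared by hand. For a full table with standard errors and significance stars, statsmodels' OLS is a common alternative (a different library from the one recommended above).

```python
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.linear_model import LinearRegression

# Hypothetical training data; NaN marks missing information.
train = pd.DataFrame({
    "gdp_growth":     [2.0, 3.1, np.nan, 4.2, 1.5, 2.8, 3.9, 2.2],
    "labor_part":     [60.0, 65.0, 70.0, 72.0, 58.0, 63.0, 71.0, np.nan],
    "advanced_dummy": [1, 0, 0, 0, 1, 1, 0, 1],
})

y_col = "gdp_growth"
x_cols = ["labor_part", "advanced_dummy"]

# LinearRegression cannot handle NaN, so keep only complete rows.
model_data = train.dropna(subset=[y_col] + x_cols)
X = model_data[x_cols]
y = model_data[y_col]

reg = LinearRegression().fit(X, y)

# n observations actually used, k explanatory variables (no intercept).
n, k = X.shape
r2 = reg.score(X, y)  # R-squared on the data the model was fitted on
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(f"Observations used: {n} of {len(train)} rows")
print(f"Intercept: {reg.intercept_:.3f}")
for name, coef in zip(x_cols, reg.coef_):
    print(f"{name}: {coef:.3f}")
print(f"R-squared: {r2:.3f}, adjusted R-squared: {adj_r2:.3f}")

# scipy's linregress only supports one explanatory variable,
# but it does report a p-value for the slope.
simple = stats.linregress(model_data["labor_part"], model_data[y_col])
print(f"Simple regression slope: {simple.slope:.3f}, p-value: {simple.pvalue:.3f}")
```

Here two of the eight rows contain a missing value, so the model is estimated on six observations. The same mechanism explains why your own regression output may report fewer observations than your data subset has rows.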
