Evaluating Your Models
Now that you have several linear regression models, the time has come to evaluate which of them is the best. Defining "best" can be tricky: how should goodness of fit be evaluated? You have already encountered one approach: R-squared and adjusted R-squared, which are relative measures of fit and are usually reported conveniently alongside the model output. If you have not reviewed these two measures in the previous exercise, go check them out now. As you have probably read, R-squared has the drawback that it never decreases as more regressors are added to a model. Although adjusted R-squared penalizes additional regressors, we should still keep this issue in the back of our minds.
Additionally, we should consider the Root Mean Squared Error (RMSE), which we will calculate ourselves using the following formula:
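In its standard textbook form, with $y_i$ the observed values, $\hat{y}_i$ the model's predictions, and $n$ the number of observations (the symbols are chosen here for illustration), the RMSE reads:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
$$

A lower RMSE means the predictions lie closer to the observed values, and the result is expressed in the same units as the dependent variable.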
Furthermore, remember that we split our data into two parts? We will consider not only the RMSE of the model on the training data it was fitted to (in sample, so to speak), but also how well the model predicts values on our testing data (out of sample). Not sure what to do now? Follow the steps below:
If you previously used `sklearn` and have your model object, you can use its `predict()` method to make predictions. If you used `scipy`, you can write a simple prediction function yourself and apply it to your test data using the pandas `DataFrame.apply()` function.
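Both routes can be sketched as follows. This is a minimal illustration, not the exercise's reference solution: the data are synthetic, and the names `df`, `x`, `y`, `rmse`, and `predict_row` are hypothetical stand-ins for your own data and helpers. It assumes a single regressor and uses `scipy.stats.linregress` for the `scipy` route.

```python
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic stand-in data (hypothetical): y is linear in x plus noise.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.uniform(0, 10, size=200)})
df["y"] = 2.0 * df["x"] + 1.0 + rng.normal(0, 1, size=200)

# Split into training and testing parts.
train, test = train_test_split(df, test_size=0.25, random_state=0)

# Route 1: sklearn model object and its predict() method.
model = LinearRegression().fit(train[["x"]], train["y"])
rmse_in_sklearn = rmse(train["y"], model.predict(train[["x"]]))
rmse_out_sklearn = rmse(test["y"], model.predict(test[["x"]]))

# Route 2: scipy fit plus a hand-written prediction function,
# applied row by row with DataFrame.apply().
fit = stats.linregress(train["x"], train["y"])


def predict_row(row):
    """Predict y for one row from the fitted intercept and slope."""
    return fit.intercept + fit.slope * row["x"]


rmse_in_scipy = rmse(train["y"], train.apply(predict_row, axis=1))
rmse_out_scipy = rmse(test["y"], test.apply(predict_row, axis=1))

print(f"sklearn: in-sample {rmse_in_sklearn:.3f}, out-of-sample {rmse_out_sklearn:.3f}")
print(f"scipy:   in-sample {rmse_in_scipy:.3f}, out-of-sample {rmse_out_scipy:.3f}")
```

Since both routes fit the same ordinary least squares line, their RMSE values should agree up to floating-point error. Comparing the in-sample and out-of-sample numbers for each of your models is what tells you how well it generalizes beyond the training data.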
