Linear Regression Model-Part 2

May 5, 2021

In my previous article, we fitted a model to the given data on the percentage of hardwood pulp in the paper (X) and the tensile strength of the paper (Y).

Once we have established the significance of the regression model and obtained the estimates of its population parameters, we have to analyze the trends in the residuals to verify the assumptions of the linear regression model. We run the following R code to obtain a series of insightful plots, discussed below.

Note: The assumptions of the linear regression model are never met perfectly; we are examining whether they hold well enough for us to rely on this analysis and continue with the regression model.
In simple linear regression, we can always plot the observed values and check for linearity directly, because there are only two variables. When fitting a multiple linear regression, however, there are several regressors, so that scatter plot is harder to obtain and analyze. In that case, these residual plots play a vital role.

R code for obtaining the Diagnostic Model plots:
> plot(model)
*model refers to the variable that stores the linear regression model built in R (from the previous article)
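As a minimal sketch of how the four plots can be viewed together: the hardwood data is not reproduced in this article, so the built-in `cars` dataset is used here as a stand-in (an assumption made only for illustration), and `model` is refitted on it.

```r
# Stand-in model: the built-in `cars` data replaces the hardwood data,
# which is not reproduced in this article.
model <- lm(dist ~ speed, data = cars)

# Draw the four default diagnostic plots on one 2 x 2 page,
# then restore the previous graphics settings.
old_par <- par(mfrow = c(2, 2))
plot(model)
par(old_par)

# A single plot can be requested with `which`, e.g. Residuals vs Fitted:
plot(model, which = 1)
```

Note that the four default plots correspond to `which = c(1, 2, 3, 5)`, so the Residuals vs Leverage plot (Plot 4 in this article) is `which = 5` in R's own numbering.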

Output Plot 1: Residuals v/s Fitted Values

Checking the Linearity Assumption

According to the theory, the plot of the Residuals v/s Fitted Values for a good model should look random. In the plot above, the points are scattered randomly and do not show any trend such as an upward-sloping line, a downward-sloping line or a cyclic pattern. With reference to the vertical axis, the residuals should also be concentrated around their mean value, i.e., 0 (stemming from the assumption Errors ~ N(0, σ²), i.e., zero mean and constant variance).

Ideally, the red line should be fairly flat (along the horizontal line where the residual equals 0), but that is not the case here.
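The same plot can be rebuilt by hand, which makes it clearer what is being drawn. This is only a sketch: it again uses the built-in `cars` data as a stand-in for the hardwood data, and the red curve is a `lowess` smoother, similar in spirit to (but not necessarily identical with) the one drawn by `plot(model)`.

```r
# Stand-in model on the built-in `cars` data (not the hardwood data).
model <- lm(dist ~ speed, data = cars)

res <- residuals(model)  # observed value minus fitted value
fit <- fitted(model)     # values predicted by the model

# Residuals against fitted values, with the zero reference line
plot(fit, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Smoothed trend of the residuals; ideally it stays close to the dashed line
lines(lowess(fit, res), col = "red")

# For a model with an intercept, the residuals average to (numerically) zero
mean_res <- mean(res)
```

A red curve that bends away from the dashed line hints that a straight-line relationship may not be adequate.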

Output Plot 2: QQ plot for the Residuals

Normal QQ plot

One of the assumptions of the linear regression model is the normality of the residuals, and the QQ plot lets us check it visually. It plots the Standardized residuals of the model against the Theoretical Quantiles of the normal distribution. If the observed points lie close to the straight 45-degree line, we can reasonably conclude that the residuals are approximately normal (as also discussed in the article Testing for Normality).
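A minimal sketch of the same check done by hand, again on the stand-in `cars` model rather than the hardwood data. The Shapiro-Wilk test is added here as one possible numerical complement to the plot; it is an illustrative choice, not necessarily the test used in the other article.

```r
# Stand-in model on the built-in `cars` data (not the hardwood data).
model <- lm(dist ~ speed, data = cars)

# Standardized residuals, as used on the vertical axis of the QQ plot
std_res <- rstandard(model)

# QQ plot against the normal distribution, with a reference line
qqnorm(std_res)
qqline(std_res)

# Optional numerical check: a small p-value suggests departure from normality
sw <- shapiro.test(std_res)
sw$p.value
```

Points that drift away from the line mainly at the two ends usually indicate heavier or lighter tails than the normal distribution.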

Output Plots 3 and 4: Scale-Location Plot and Residuals vs Leverage Plot

Non Linearities and Non-constant Variance

The last two of the four diagnostic plots help us check for non-linearities and non-constant variance. In Plot 3, i.e., the Scale-Location Plot, we should ideally observe a roughly horizontal red line with the points spread evenly around it, which indicates constant variance. Both plots use the Standardized residuals. Practically, we have three types of residuals in a model: Regular Residuals, Standardized Residuals and Studentized Residuals.
In Plot 4, leverage is a measure of how far an observation's predictor value lies from the center (the mean) of the predictor values.
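The three types of residuals and the leverage can be computed directly in R. The sketch below uses the stand-in `cars` model again (an assumption, since the hardwood data is not shown here) and checks the textbook formulas against R's built-in functions. In R, `rstudent()` returns the externally studentized residuals, where the error variance is re-estimated with the observation left out.

```r
# Stand-in model on the built-in `cars` data (not the hardwood data).
model <- lm(dist ~ speed, data = cars)

e <- residuals(model)      # regular residuals
h <- hatvalues(model)      # leverage of each observation
s <- summary(model)$sigma  # residual standard error

# Standardized residuals: each residual divided by its estimated standard deviation
std_manual <- e / (s * sqrt(1 - h))
std_ok <- all.equal(unname(std_manual), unname(rstandard(model)))

# Studentized residuals (leave-one-out estimate of the error variance)
stud <- rstudent(model)

# Leverage in simple linear regression depends only on the distance of x from its mean
x <- cars$speed
h_manual <- 1 / length(x) + (x - mean(x))^2 / sum((x - mean(x))^2)
lev_ok <- all.equal(unname(h), h_manual)

# Quantity on the vertical axis of the Scale-Location plot
scale_loc <- sqrt(abs(rstandard(model)))
```

The leverage formula makes the definition concrete: the further a predictor value is from the mean of the predictor, the larger its leverage.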

Limitations of the Model Fit

You might notice some shortcomings in the model fit, due to the lack of sufficiently large data. With a larger dataset that covers more of the situations of interest, we can obtain better estimates and a better fit of the model. This also tends to improve the diagnostic plots of the model, making the assumptions of linear regression more plausible.
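The effect of sample size can be illustrated with a small simulation. Everything in this sketch is hypothetical: the data is simulated, and the helper `slope_se`, the true intercept and slope (20 and 2) and the noise level are arbitrary choices made only for illustration.

```r
set.seed(1)  # make the simulation reproducible

# Hypothetical helper: simulate n points from a known straight line plus noise,
# fit a simple linear regression and return the standard error of the slope.
slope_se <- function(n) {
  x <- runif(n, min = 1, max = 15)
  y <- 20 + 2 * x + rnorm(n, sd = 5)
  summary(lm(y ~ x))$coefficients["x", "Std. Error"]
}

se_small <- slope_se(10)    # small sample
se_large <- slope_se(1000)  # large sample

c(small = se_small, large = se_large)
```

The standard error of the slope shrinks as the sample grows, which is one way to see why a larger dataset gives more reliable estimates.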

Written by Anushka Agrawal
