Lean Six Sigma Resources

Residuals Analysis

Residuals analysis is one of the most important steps in validating any regression model. While regression equations and correlation coefficients often get the spotlight, residuals quietly reveal whether your model is trustworthy. In the Improve phase, where decisions translate directly into changes in the process, you cannot afford to rely on a model that violates assumptions or misrepresents reality. Residuals analysis ensures that the model you built is both statistically sound and practically reliable.

A residual is the observed value minus the value predicted by the regression model. In simple terms, it represents the portion of the response that the model could not explain. If the model is appropriate, the residuals should behave like random noise: scattered evenly around zero with no patterns, trends, or structure. When residuals show patterns, they signal that the model is missing something important.
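As a minimal sketch of that definition, the Python below fits a simple least-squares line and computes each residual as observed minus fitted. The data values and the function names fit_line and residuals are hypothetical, chosen only for illustration.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares fit for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def residuals(x, y):
    """Return (fitted values, residuals), where residual = observed - fitted."""
    intercept, slope = fit_line(x, y)
    fitted = [intercept + slope * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, resid

# Hypothetical data: a process input (x) and a measured response (y).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fitted, resid = residuals(x, y)
for xi, yi, fi, ri in zip(x, y, fitted, resid):
    print(f"x={xi}  observed={yi:.2f}  fitted={fi:.2f}  residual={ri:+.2f}")
```

For a least-squares line with an intercept, the residuals always sum to zero (up to rounding), so the useful information lies in their pattern rather than their average.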

The first step in residuals analysis is examining a residuals‑versus‑fits plot. This plot shows residuals on the vertical axis and predicted values on the horizontal axis. Ideally, the points should form a random cloud. If you see a funnel shape, it suggests non‑constant variance—the spread of residuals increases or decreases with the predicted value. This violates a key regression assumption and can distort confidence intervals and p‑values. If you see curvature, it suggests that the relationship is not truly linear and that a transformation or non‑linear model may be more appropriate.
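The funnel shape described above can also be checked numerically. The sketch below is a rough heuristic, not a formal test and not a replacement for looking at the plot: it sorts the residuals by fitted value and compares the spread in the upper half with the spread in the lower half. The function name spread_ratio and the sample values are hypothetical.

```python
from statistics import pstdev

def spread_ratio(fitted, resid):
    """Ratio of residual spread at high fitted values to spread at low ones.

    A ratio near 1 is consistent with constant variance; a ratio far from 1
    hints at a funnel shape. This is a crude screen, not a formal test.
    """
    pairs = sorted(zip(fitted, resid))   # order residuals by fitted value
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]   # residuals at the lowest fits
    high = [r for _, r in pairs[-half:]] # residuals at the highest fits
    return pstdev(high) / pstdev(low)

# Hypothetical residuals whose spread grows with the fitted value.
fitted = [1, 2, 3, 4, 5, 6, 7, 8]
resid = [0.1, -0.1, 0.2, -0.2, 0.8, -0.9, 1.1, -1.0]
ratio = spread_ratio(fitted, resid)
print(f"spread ratio (high fits / low fits): {ratio:.2f}")
```

What counts as "far from 1" is a judgment call that depends on sample size; with only a handful of points the ratio is very noisy.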

Another essential tool is the normal probability plot of residuals. The hypothesis tests and confidence intervals in regression assume that the errors, which the residuals estimate, are normally distributed. While the model can tolerate some deviation, strong departures from normality, such as heavy tails or skewness, can affect the accuracy of those tests and intervals. If the points deviate significantly from the straight line, you may need to transform the response or consider a different modeling approach.
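To show what a normal probability plot actually draws, the sketch below pairs the sorted residuals with the standard normal quantiles they would be expected to match. The plotting position (i - 0.5) / n is one common convention, assumed here; statistical packages may use a slightly different one. The correlation between the two columns is included as a rough summary of how straight the plot is. The function names and sample residuals are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def normal_plot_points(resid):
    """Return (theoretical quantiles, ordered residuals) for a normal plot."""
    n = len(resid)
    ordered = sorted(resid)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed plotting position: (i - 0.5) / n for the i-th smallest residual.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, ordered

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    a_bar, b_bar = mean(a), mean(b)
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)

# Hypothetical residuals from a fitted model.
resid = [0.3, -1.2, 0.8, -0.1, 0.4, -0.5, 1.1, -0.9]
theoretical, ordered = normal_plot_points(resid)
straightness = pearson_r(theoretical, ordered)
print(f"correlation of plot points: {straightness:.3f}")
```

A correlation close to 1 means the points fall near a straight line. It is only a summary; the plot itself shows whether the departure is in the tails, in the middle, or from a single point.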

Residuals should also be independent. A residuals‑versus‑order plot helps you detect time‑based patterns. If residuals drift upward or downward over time, the process may be unstable, or the model may be missing a time‑related factor. Independence is especially important in service and transactional processes where time‑based patterns are common.
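A numeric companion to the residuals-versus-order plot is the Durbin-Watson statistic, which this article does not mention but which is commonly used for the same purpose. It compares each residual with the one before it in time order. Values near 2 are consistent with independence, values well below 2 suggest drift or positive autocorrelation, and values well above 2 suggest alternation. The sketch below assumes the residuals are already listed in the order the data were collected; the sample values are hypothetical.

```python
def durbin_watson(resid):
    """Durbin-Watson statistic for residuals listed in time order."""
    # Sum of squared differences between consecutive residuals.
    numerator = sum(
        (resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid))
    )
    # Sum of squared residuals.
    denominator = sum(e ** 2 for e in resid)
    return numerator / denominator

# Hypothetical residuals that drift steadily upward over time.
drifting = [-3, -2, -1, 0, 1, 2, 3]
# Hypothetical residuals that flip sign at every observation.
alternating = [1, -1, 1, -1]

dw_drifting = durbin_watson(drifting)
dw_alternating = durbin_watson(alternating)
print(f"drifting:    {dw_drifting:.3f}")
print(f"alternating: {dw_alternating:.3f}")
```

Formal significance bounds for the statistic depend on the sample size and the number of predictors, so treat the value as a screen that tells you when to look more closely at the order plot.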

Outliers are another critical consideration. A single extreme point can distort the regression line, inflate the error, and weaken the model’s predictive power. Residuals analysis helps you identify outliers and determine whether they represent data entry errors, special causes, or legitimate but unusual conditions. Outliers should never be removed without understanding their cause.
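One simple way to screen for outliers is to scale each residual by the residual standard error and flag the large ones for investigation. The sketch below assumes a simple regression with two estimated parameters (intercept and slope) and a cutoff of 2, both of which are illustrative choices. It does not adjust for leverage, as studentized residuals do. The function name and data are hypothetical.

```python
from math import sqrt

def flag_outliers(resid, n_params=2, threshold=2.0):
    """Return (index, standardized residual) for points beyond the threshold.

    n_params is the number of estimated coefficients (2 for a straight line).
    Flagged points are candidates for investigation, not automatic removal.
    """
    n = len(resid)
    # Residual standard error: sqrt(SSE / degrees of freedom).
    s = sqrt(sum(e ** 2 for e in resid) / (n - n_params))
    return [
        (i, e / s) for i, e in enumerate(resid) if abs(e / s) > threshold
    ]

# Hypothetical residuals with one unusually large value at index 6.
resid = [-0.5, -0.3, -0.6, -0.4, -0.2, -0.5, 4.0, -0.6, -0.4, -0.5]
flagged = flag_outliers(resid)
for index, z in flagged:
    print(f"observation {index}: standardized residual {z:+.2f}")
```

Because an extreme point also inflates the standard error it is divided by, this screen can understate how unusual that point is. A flagged index is a prompt to go back to the process and find the cause.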

In the Improve phase, residuals analysis is not just a statistical exercise—it is a quality check. It ensures that the model you rely on for predictions, decisions, and improvement actions is grounded in reality. When residuals behave well, you can trust your model. When they do not, residuals analysis points you toward the adjustments needed to strengthen it.
