A Guide to Overfitting and Underfitting in Machine Learning

AISchoolAuthor

October 21, 2023

Introduction to Overfitting and Underfitting

When training machine learning models, one of the key challenges is balancing model complexity. Models that are too simple may struggle to detect the underlying patterns in the data. This is called underfitting. On the other hand, models that are too complex may fit the noise and irrelevant details in the training data. This is called overfitting.

The goal in machine learning is to find the “Goldilocks” model that is “just right” – not too simple and not too complex. This ideal model is able to learn the true signal from the data without memorizing unnecessary noise. Finding the right level of model complexity for the problem and dataset is critical for success.

In this guide, we will dig into overfitting and underfitting. You will learn how to detect overfitted and underfitted models, understand the dangers of these issues, and apply techniques to address them. Mastering model complexity helps build machine learning models that generalize well to new, unseen data.

What is Model Complexity?

Model complexity refers to how flexible or rigid a machine learning model is. Simple models like linear regression have low complexity: the model only learns a single line or (hyper)plane. More complex models like deep neural networks have many parameters and can learn highly non-linear relationships.

Model complexity is driven by factors like:

Number of features or input variables
Number of model parameters that must be learned
Type of model (linear vs. non-linear)
Depth of the model (e.g. number of hidden layers in a neural network)

Complex models are very flexible and can fit many types of functions. Simple models are more rigid but may fail to pick up nuanced patterns. The optimal model complexity depends on the complexity of the true underlying relationship we want to model.

Overfitting in Machine Learning

Overfitting occurs when a machine learning model fits the training data too well, but does poorly on new data. An overfit model has “memorized” the noise and details of the training set, instead of learning the true underlying relationship.

For example, imagine training a classifier to detect spam emails. If the model overfits, it may fit the quirks and characteristics of the emails in the training set, without learning general properties of spam. The performance on the training set will be excellent, but the model will do poorly on new emails it hasn’t seen before.

Overfitting commonly occurs with highly flexible models like deep neural networks. The abundance of parameters enables the model to perfectly fit the training data by essentially memorizing it.

Signs that a Model is Overfitting

There are several telltale signs that a machine learning model is overfitting:

Training error is low, but validation error is high. The gap between training and validation performance indicates overfitting.
Training error keeps decreasing, but validation error starts increasing. The model begins overfitting as training continues or capacity increases.
The model is very complex relative to the amount and complexity of the training data. High capacity models tend to overfit small or simple datasets.
The model reaches very high training accuracy quickly (for example, above 95%) while validation accuracy lags well behind. This suggests it may be memorizing details of the training set rather than learning general patterns.

Comparing training and validation error as model capacity increases helps diagnose overfitting. In general, if validation performance degrades while training performance keeps improving, it’s a sign of overfitting.
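To make this comparison concrete, here is a minimal sketch that fits polynomials of increasing degree to a small, noisy dataset and records training and validation error for each. The toy target (a sine curve), the noise level, the dataset sizes, and the helper names make_data and mse are all assumptions chosen for illustration. On typical runs, training error keeps shrinking as the degree grows, while validation error eventually stops following it.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

def make_data(n):
    """Noisy samples from a smooth curve (an assumed toy target)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(3.0 * x) + rng.normal(scale=0.2, size=n)
    return x, y

def mse(coeffs, x, y):
    """Mean squared error of the polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

x_train, y_train = make_data(20)   # small training set
x_val, y_val = make_data(200)      # held-out validation set

errors = {}  # degree -> (training MSE, validation MSE)
for degree in (1, 3, 6, 9):
    # Higher degree means more parameters, i.e. higher model capacity.
    coeffs = np.polyfit(x_train, y_train, degree)
    errors[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_val, y_val))
    print(f"degree={degree}  train={errors[degree][0]:.3f}  val={errors[degree][1]:.3f}")
```

A growing gap between the two printed columns is the pattern described in the list above.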

Problems with Overfitting

Overfitting leads to several issues:

Poor generalization – The model will not perform well on new data since it did not learn the true underlying patterns.
Increased variance – Small changes in the training data can significantly impact the learned patterns.
Lack of interpretability – Analyzing and reasoning about heavily overfit models is difficult.
Wasted resources – Additional model capacity beyond a certain point is not helpful since the model starts overfitting.

In real-world applications, being able to apply models to new, unseen data (generalization) is critical. Overfitting interferes with this goal and should generally be avoided.

What is Underfitting in Machine Learning?

Underfitting occurs when a machine learning model is not complex enough to learn the underlying pattern in the data. An underfit model will have high error on both the training data and new data.

For example, fitting a simple linear model to non-linear data will underfit the relationship. The linear model lacks the capacity to adequately capture the patterns.

Underfitting commonly occurs with very simple models like linear regression applied to complex data. The model will struggle to understand the relationships beyond its limited capability.

Signs that a Model is Underfitting

There are a few signs that indicate a machine learning model is underfitting:

Training error and validation error are both high. The model fails to fit the training data well.
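Increasing model capacity lowers both training and validation error. This suggests the current model is too simple.
The model’s accuracy plateaus at a low level early in training and stays there. As a rough rule of thumb, this might mean below 60-70% on a balanced task, though the right threshold depends on the problem.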
Adding more data does not lead to lower error. The model cannot take advantage of additional training data.

If a model’s error is high from early in training and does not improve with additional iterations or data, it’s a clear sign of underfitting. The solution is to increase model capacity.
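The following sketch illustrates the point about additional data. It fits a straight line to data generated from a quadratic curve at several training set sizes. The quadratic target, the noise level, and the sizes are assumptions picked for illustration. Because a line cannot represent the curve, both errors should stay far above the noise level no matter how much data is added.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable

def make_data(n):
    """Noisy samples from a quadratic curve (an assumed toy target)."""
    x = rng.uniform(-2.0, 2.0, size=n)
    y = x ** 2 + rng.normal(scale=0.1, size=n)
    return x, y

def line_mse(line, x, y):
    """Mean squared error of a fitted straight line."""
    return float(np.mean((np.polyval(line, x) - y) ** 2))

x_val, y_val = make_data(500)  # held-out validation set

curve = {}  # training set size -> (training MSE, validation MSE)
for n in (50, 500, 5000):
    x_train, y_train = make_data(n)
    line = np.polyfit(x_train, y_train, 1)  # degree 1: a straight line
    curve[n] = (line_mse(line, x_train, y_train), line_mse(line, x_val, y_val))
    print(f"n={n}  train={curve[n][0]:.3f}  val={curve[n][1]:.3f}")
```

The noise variance in this setup is 0.01, so errors that remain well above that value at every size indicate a capacity problem rather than a data problem.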

Problems with Underfitting

Underfitting causes the following issues:

Poor generalization – Like overfitting, underfitting hurts the model’s ability to generalize to new data since it did not capture the patterns well.
High bias – The rigid assumptions baked into an underfit model result in high bias.
Wasted resources – Collecting additional training data cannot help if the model does not have enough capacity to learn from it.

Choosing a model that is too simple can waste considerable effort on improvements that cannot pay off. The first priority should be ensuring the model has sufficient complexity for the problem.

The Bias-Variance Tradeoff

Overfitting and underfitting relate closely to the bias-variance tradeoff. Bias is error due to overly rigid assumptions in the model. Variance is error due to the model’s sensitivity to the particular training set it saw, which grows with model flexibility.

Simple models with low capacity have high bias and low variance. Complex models with high capacity have low bias but high variance. The goal is to find a model complexity that balances bias and variance.
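For squared-error loss, this balance can be written as a standard decomposition. As a sketch, assume the data is generated as y = f(x) + ε with noise variance σ², and let f̂ denote the model fit on a randomly drawn training set. This notation is introduced here for illustration only. The expected error at a point x then splits into three parts:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The expectations are taken over training sets and noise. Only the first two terms depend on the choice of model, which is why they trade off against each other.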

As model complexity increases:

Bias decreases (good)
Variance increases (bad)

The optimal model complexity depends on the problem. Simple problems may need simple models. But most real-world datasets require some model complexity in order to fit nonlinear relationships while avoiding overfitting.
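One way to see the tradeoff numerically is to refit the same model on many noisy versions of a training set and measure how the predictions behave. The sketch below does this for polynomials of three degrees. The sine target, the fixed input grid, the noise level, the number of trials, and the helper name bias_variance are assumptions made for illustration. Squared bias is estimated as the gap between the average prediction and the true curve, and variance as the spread of predictions across refits.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed so the run is repeatable

def true_fn(x):
    """The assumed underlying relationship for this toy experiment."""
    return np.sin(3.0 * x)

x_train = np.linspace(-1.0, 1.0, 20)  # fixed training inputs
x_test = np.linspace(-1.0, 1.0, 50)   # points where predictions are compared

def bias_variance(degree, n_trials=200, noise=0.2):
    """Estimate squared bias and variance of a polynomial fit of a given degree."""
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        # Each trial redraws only the label noise, then refits the model.
        y_train = true_fn(x_train) + rng.normal(scale=noise, size=x_train.size)
        preds[t] = np.polyval(np.polyfit(x_train, y_train, degree), x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = float(np.mean((mean_pred - true_fn(x_test)) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance

estimates = {d: bias_variance(d) for d in (1, 5, 9)}
for d, (bias_sq, variance) in estimates.items():
    print(f"degree={d}  bias^2={bias_sq:.4f}  variance={variance:.4f}")
```

In this setup the straight line should show the largest squared bias and the smallest variance, with the degree-9 polynomial showing the reverse.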

Techniques to Deal with Overfitting and Underfitting

There are several ways to combat overfitting and underfitting:

For overfitting:

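Regularization – Adds a penalty on model complexity (for example, on the size of the weights) to the training objective, which discourages overly flexible fits and tends to improve generalization.
Early stopping – Stops training once validation error stops improving, before the model overfits the training data.
Dropout – Randomly drops neurons during neural network training to reduce co-dependence between them.
Data augmentation – Expands the training set with modified copies of existing examples (such as flipped or cropped images) to reduce overfitting.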
Model averaging – Averages predictions from multiple models to reduce variance.
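As one concrete instance of regularization, here is a minimal sketch of ridge (L2) regression solved in closed form on degree-9 polynomial features. The toy data, the penalty strengths, and the helper name ridge_fit are assumptions for illustration. Leaving the intercept unpenalized is a common convention rather than a requirement. Larger penalties shrink the weights, trading a little training error for a smoother fit.

```python
import numpy as np

rng = np.random.default_rng(3)  # fixed seed so the run is repeatable

# Small noisy dataset (assumed toy target), deliberately easy to overfit.
x = rng.uniform(-1.0, 1.0, size=15)
y = np.sin(3.0 * x) + rng.normal(scale=0.2, size=15)

# Columns are 1, x, x^2, ..., x^9: a degree-9 polynomial model.
X = np.vander(x, N=10, increasing=True)

def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y, leaving the intercept unpenalized."""
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0  # column 0 is the constant term
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

summary = {}  # penalty strength -> (norm of non-intercept weights, training MSE)
for lam in (1e-6, 1e-2, 1.0):
    w = ridge_fit(X, y, lam)
    train_mse = float(np.mean((X @ w - y) ** 2))
    summary[lam] = (float(np.linalg.norm(w[1:])), train_mse)
    print(f"lam={lam:g}  weight_norm={summary[lam][0]:.2f}  train_mse={train_mse:.4f}")
```

In practice the penalty strength would be chosen by comparing validation error across candidate values, not training error.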

For underfitting:

Increase model capacity – Use more complex models like deep neural networks.
Reduce constraints – Remove rigid assumptions limiting model flexibility.
Feature engineering – Create better model inputs that allow learning richer relationships.
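The feature engineering item can be illustrated with a small sketch. Here the target depends on the product of two inputs, which a linear model on the raw inputs cannot express. Adding the product as a third feature gives the same linear model enough capacity. The data-generating process and the helper name fit_mse are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)  # fixed seed so the run is repeatable

n = 500
x1 = rng.uniform(-1.0, 1.0, size=n)
x2 = rng.uniform(-1.0, 1.0, size=n)
# Assumed toy target: driven entirely by the interaction between x1 and x2.
y = x1 * x2 + rng.normal(scale=0.05, size=n)

def fit_mse(features, y):
    """Fit least-squares linear regression on the given features; return training MSE."""
    X = np.column_stack([np.ones(len(y))] + features)  # prepend an intercept column
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((X @ w - y) ** 2))

mse_raw = fit_mse([x1, x2], y)                  # underfits: no way to express x1 * x2
mse_engineered = fit_mse([x1, x2, x1 * x2], y)  # same model type, richer inputs

print(f"raw features:        {mse_raw:.4f}")
print(f"with x1 * x2 added:  {mse_engineered:.4f}")
```

The model class is unchanged between the two fits. Only the inputs differ, which is often a cheaper fix for underfitting than switching to a more complex model.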

In practice, most solutions deal with overfitting as it is more common. But underfitting should not be overlooked, especially early in the model development process.

Examples of Overfitting and Underfitting

Here are some examples of overfitting and underfitting:

Overfitting

A high-capacity deep neural network trained on a small image dataset. The model achieves 99% training accuracy but only 50% validation accuracy – a sign of overfitting.
A polynomial regression model trained to predict housing prices. As the polynomial degree increases, training error goes down but validation error goes up, indicating overfitting.
A support vector machine classifier with an RBF kernel trained to classify emails as spam or not spam. The model perfectly classifies the emails in the training set but does poorly on new emails.

Underfitting

Fitting a linear regression model to non-linear data. The model cannot capture the true relationship, resulting in high training and validation error.
Using logistic regression on raw features to predict complex human behavior. The linear decision boundary is too rigid for such data, resulting in poor performance.
Training a decision tree with a max depth of 2 on a dataset with many interacting features. The overly simple model cannot learn the patterns well.

The key is to plot training and validation performance over training time and across model complexity. A widening gap between the two curves points to overfitting, while two curves that both plateau at high error point to underfitting.
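Monitoring validation performance over time is also the basis of early stopping. The sketch below is a minimal, framework-free version: it scans a list of per-epoch validation losses and returns the epoch with the lowest loss, giving up once a set number of consecutive epochs fail to improve. The function name best_epoch, the patience parameter, and the loss values are hypothetical and made up for illustration.

```python
def best_epoch(val_losses, patience=3):
    """Return the index of the lowest validation loss seen before stopping.

    Scanning stops once `patience` consecutive epochs fail to improve
    on the best loss so far.
    """
    best_idx = 0
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_idx = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation loss has stopped improving
    return best_idx

# Made-up validation losses: they fall, bottom out, then rise again.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
stop_at = best_epoch(val_losses, patience=3)
print(f"keep the model from epoch {stop_at}")
```

In a real training loop the same logic would run alongside training, with the model weights saved each time the validation loss improves.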

Conclusion

Finding the right level of model complexity is critical in machine learning. Overly simple models will underfit and fail to capture rich relationships in the data. Overly complex models will overfit – memorizing noise instead of learning generalizable patterns.

Techniques like regularization, early stopping, and getting more training data can help prevent overfitting. For underfitting, increasing model capacity and removing rigid assumptions enables learning more complex relationships.

The bias-variance tradeoff provides a useful framework for reasoning about model complexity. Optimal performance requires balancing the flexibility to fit the data with regularization to avoid capturing noise. By understanding overfitting, underfitting and the tools to address them, you can build machine learning models that generalize well to new data.
