In the previous blog on Ensemble Learning Explained, we explored the world of ensemble learning and how combining multiple models can often outperform even the most finely tuned single model. We looked at voting classifiers—arguably the most intuitive form of ensemble learning—where each model casts a vote and the majority wins. While voting is simple and surprisingly effective, it only scratches the surface of what ensemble methods can do.
In this post, we are going deeper into the ensemble toolbox to look at two closely related techniques: Bagging and Pasting. If ensemble learning is about creating a collaborative team of models, then bagging and pasting are about how you build that team in the first place.
At first glance, these two techniques might appear nearly identical, and indeed they share a common core idea. Both involve training multiple instances of a base model on different subsets of the training data. But the difference in how these subsets are created can significantly affect the diversity and performance of the resulting ensemble.
Let’s unpack both methods in detail.

What’s the problem again?
Many machine learning models, especially decision trees, tend to overfit. They latch on too tightly to the training data and struggle when faced with new, unseen examples. One clever solution to this is to train multiple models on different subsets of the data and then average their predictions (for regression) or take a majority vote (for classification). This approach not only reduces variance but often improves generalization. Enter Bagging and Pasting.
The Core Idea: Train on Subsets, Combine the Results
Imagine you have a dataset and you’re training a decision tree on it. The tree might overfit, especially if it’s deep and your data is noisy. Now imagine training ten such trees, each on a different subset of your training data. When it’s time to make a prediction, you collect the outputs of all ten and either take the majority vote (for classification) or average them (for regression).
That is the essence of both bagging and pasting. The difference lies in how these subsets are formed.
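To make the idea concrete before we reach the Scikit-Learn version, here is a minimal hand-rolled sketch: train ten trees on random subsets of the training data and take a majority vote. The 80% subset size and the count of ten trees are arbitrary choices for illustration (the subsets are drawn with replacement here; the next sections explain that choice).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

rng = np.random.default_rng(42)
n_samples = len(X_train)
subset_size = int(0.8 * n_samples)  # each tree sees 80% of the training data

trees = []
for _ in range(10):
    # draw a random subset of training indices (with replacement, for illustration)
    idx = rng.choice(n_samples, size=subset_size, replace=True)
    trees.append(DecisionTreeClassifier(random_state=42).fit(X_train[idx], y_train[idx]))

# stack the predictions of all ten trees and take the majority vote per test instance
all_preds = np.array([tree.predict(X_test) for tree in trees])
majority = np.array([np.bincount(column).argmax() for column in all_preds.T])
print("Hand-rolled ensemble accuracy:", (majority == y_test).mean())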
Bagging
Bagging stands for Bootstrap Aggregating. It uses a statistical technique called bootstrap sampling, which means randomly selecting a subset of the training data with replacement. Because of this replacement, the same training instance may appear multiple times in a single subset, while others might not appear at all.
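A quick NumPy sketch makes this visible (ten toy indices standing in for training-set rows): sampling with replacement produces duplicates and leaves some indices out entirely.
import numpy as np

rng = np.random.default_rng(0)
indices = np.arange(10)                                  # pretend these are training-set row indices
bootstrap = rng.choice(indices, size=10, replace=True)   # bootstrap sample: WITH replacement
print("Bootstrap sample:", np.sort(bootstrap))           # some indices appear more than once
print("Unique indices  :", np.unique(bootstrap))         # the rest never made it into this subset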
The key benefit of this randomness is diversity. Each model in the ensemble sees a slightly different view of the training data, which makes them less likely to all make the same mistakes. When their predictions are combined, the noise and variance of individual models are reduced.
In practice, bagging tends to work particularly well with models that have high variance and low bias—decision trees being the most common example. That is why the very popular Random Forest algorithm is essentially a forest of decision trees built using bagging.
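If you want to see that connection for yourself, the sketch below (hyperparameters chosen purely for illustration) compares bagged decision trees, each splitting on a random subset of features, with an off-the-shelf RandomForestClassifier on the Iris data; the two typically land in the same ballpark.
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Bagging plus per-split feature randomness is, roughly, what a Random Forest does under the hood
bagged_trees = BaggingClassifier(
    DecisionTreeClassifier(max_features="sqrt", random_state=42),
    n_estimators=100, bootstrap=True, random_state=42
)
forest = RandomForestClassifier(n_estimators=100, random_state=42)

print("Bagged trees  CV accuracy:", cross_val_score(bagged_trees, X, y, cv=5).mean())
print("Random forest CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())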
Another advantage of bagging is that it allows us to evaluate model performance without using a separate validation set. Since some data points are left out during bootstrap sampling, these can be used as a kind of built-in validation set. This is known as out-of-bag (OOB) evaluation, and it often provides a reliable estimate of generalization performance.
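A small back-of-the-envelope sketch shows why there is always something left over to validate on: a bootstrap sample of size n contains, on average, only about 63% of the distinct training instances, so roughly a third of the data is out-of-bag for any given model (the exact fraction below depends on the seed).
import numpy as np

rng = np.random.default_rng(42)
n = 1000
bootstrap = rng.choice(n, size=n, replace=True)    # one bootstrap sample, same size as the data
in_bag = np.unique(bootstrap)                      # the distinct instances this model actually trains on
oob_fraction = 1 - len(in_bag) / n                 # everything else is available for OOB evaluation
print(f"Out-of-bag fraction: {oob_fraction:.2f}")  # typically around 0.37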
Pasting
Pasting is a lesser-known but equally interesting technique. Conceptually, it’s nearly identical to bagging. The major difference is that it selects training subsets without replacement. This means every data point in a given subset is unique—there are no duplicates.
Because of this, pasting introduces slightly less randomness than bagging. That may result in slightly lower diversity among the models in the ensemble, which can impact the ensemble’s ability to reduce variance. However, the fact that every instance appears only once can be useful in some scenarios, particularly when working with smaller datasets where you don’t want to waste any training data.
In short, while bagging leans into randomness for diversity, pasting takes a more deterministic approach, ensuring that no data point is used more than once in any single training subset.
In both cases, the models are trained independently and their predictions are aggregated at the end.
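In code, the whole distinction comes down to one flag when drawing the subsets; here is a toy NumPy sketch (ten indices, subsets of seven, sizes picked only for illustration):
import numpy as np

rng = np.random.default_rng(0)
n, subset_size = 10, 7
bagging_idx = rng.choice(n, size=subset_size, replace=True)    # bagging: duplicates allowed
pasting_idx = rng.choice(n, size=subset_size, replace=False)   # pasting: every index unique
print("Bagging subset:", np.sort(bagging_idx))
print("Pasting subset:", np.sort(pasting_idx))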
Before we jump into code, a quick detour—don’t just scroll past this. If you’re finding this blog useful (or at least tolerable enough to make it this far), head over to my Instagram: @machinelearningsite and hit follow. It’s where I post bite-sized ML tips, behind-the-scenes project chaos, and the occasional sarcastic reel about why your model’s 99% accuracy is probably lying to you. Go on, your algorithm won’t mind the pause.
A Quick Implementation in Python
Let’s see both methods in action with Scikit-Learn. We’ll use the classic Iris dataset, train decision trees with both bagging and pasting, and add a soft-voting classifier at the end as a point of comparison.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
# Load dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)
# Base estimator: single decision tree with some randomness
base_clf = DecisionTreeClassifier(max_features=0.8, random_state=42)
# 1. Individual classifier
base_clf.fit(X_train, y_train)
y_pred_base = base_clf.predict(X_test)
acc_base = accuracy_score(y_test, y_pred_base)
# 2. Bagging Classifier (with replacement)
bagging_clf = BaggingClassifier(
    estimator=base_clf,
    n_estimators=10,
    max_samples=0.8,
    bootstrap=True,
    oob_score=True,
    random_state=42
)
bagging_clf.fit(X_train, y_train)
y_pred_bag = bagging_clf.predict(X_test)
acc_bag = accuracy_score(y_test, y_pred_bag)
print("Bagging Accuracy:", accuracy_score(y_test, y_pred_bag))
print("Out-of-Bag Score:", bagging_clf.oob_score_)
To switch to pasting, all we need to do is change one parameter:
# 3. Pasting Classifier (without replacement)
pasting_clf = BaggingClassifier(
    estimator=base_clf,
    n_estimators=10,
    max_samples=0.8,
    bootstrap=False,
    random_state=42
)
pasting_clf.fit(X_train, y_train)
y_pred_paste = pasting_clf.predict(X_test)
acc_paste = accuracy_score(y_test, y_pred_paste)
# 4. Voting Classifier for comparison
voting_clf = VotingClassifier(
    estimators=[
        ('dt', DecisionTreeClassifier(random_state=42)),
        ('knn', KNeighborsClassifier(n_neighbors=5)),
        ('svc', SVC(probability=True, random_state=42))
    ],
    voting='soft'
)
voting_clf.fit(X_train, y_train)
y_pred_vote = voting_clf.predict(X_test)
acc_vote = accuracy_score(y_test, y_pred_vote)
# Print Results
print("Base Classifier Accuracy:", acc_base)
print("Bagging Accuracy        :", acc_bag)
print("Pasting Accuracy        :", acc_paste)
print("Voting Accuracy         :", acc_vote)
Even though the code for both methods is nearly identical, the subtle difference in how the training data is sampled can lead to different levels of model diversity, and therefore to different performance. We also compared the accuracy of the bagging and pasting classifiers to that of the base classifier, and, as expected, the ensemble models outperform the individual classifier.
So When Should You Use Bagging or Pasting?
If your base models tend to overfit the data, bagging is often a great choice. The randomness introduced by bootstrap sampling helps reduce variance and prevents the ensemble from simply memorizing the training data. This is especially true for decision trees, which is why Random Forests—an ensemble of bagged decision trees—are so effective in practice.
Pasting might be preferable when you want more control over the training data or when working with smaller datasets where repetition could be detrimental. It is also a good choice when the training process is expensive and you want to avoid redundancy across your training subsets.
Both methods are valuable tools for building stronger, more generalizable machine learning models.
Final Thoughts
Bagging and pasting are deceptively simple ideas with a big impact. They are based on the principle that training multiple models on different views of the data and combining their predictions can significantly improve accuracy and reduce overfitting. Whether you choose to sample with replacement (bagging) or without it (pasting), you are tapping into the power of randomness and diversity—two ingredients that are often underappreciated in machine learning.
Ensemble methods are a deep and fascinating area of machine learning, and we’re just getting started.
If you found this post helpful, stay tuned for the rest of the series. Feel free to share your experiments or ask questions in the comments. And as always—happy coding.
What’s More?
You’ve just learned the concept of bagging and pasting—powerful ensemble techniques that enhance model performance. But why stop here? To truly elevate your machine learning skills, consider exploring these handpicked articles:
- Transfer Learning Explained: Save Time and Improve Model Performance: Unlock the potential of pre-trained models to accelerate your projects and achieve superior results, especially when data is scarce.
- Gradient Boosting Explained: How to Make Your Machine Learning Model Supercharged using XGBoost: Dive into gradient boosting and discover how XGBoost can transform weak learners into a formidable predictive force.
- 3 Practical SVM Examples to Boost Your Machine Learning Skills: Enhance your understanding of Support Vector Machines through real-world examples that bridge theory and practice.