Why Complex Models Need More Data: Polynomial Fitting in Machine Learning

You know how some people like to claim that the math we learn in high school has no real-world use? That it’s just a bunch of abstract equations you’ll never see again once you start working? Well, next time you hear someone say that, send them a link to this blog — because here’s a perfect counterexample.

Story time!

I’m currently working on a project that requires knowledge of the steering angle of a vehicle — a crucial parameter for several control and planning tasks. However, this value isn’t directly available on the vehicle’s CAN bus. What is available, though, is the steering wheel angle. The challenge, then, is to determine a reliable way to convert the steering wheel angle into the actual steering angle of the wheels.

To do this, we conducted an experiment using a rotating plate table to log pairs of corresponding values: the steering wheel angle (which we can measure electronically) and the steering angle (which we measure manually). This gave us a dataset — a collection of real, measurable input-output relationships. Our goal? To model this data with a function: one that takes the steering wheel angle as input and returns the steering angle.

Now here’s where it gets interesting. Once you have this dataset, you can fit a linear equation — if the relationship is simple. But if you want higher precision, especially across a wide range of steering inputs, you might need to fit a higher-degree polynomial, maybe even degree 5 or more. The math behind choosing and fitting such equations? It’s the same algebra, systems of equations, and polynomial functions you were probably taught in high school. Suddenly, those abstract math lessons become very real — and very practical.

This brings us to an important observation. The accuracy of the model you fit — whether it’s linear or nonlinear — depends heavily on how many measurements you collect. These measurements form the basis for a system of equations. The number of measurements (or equations) you have in relation to the number of unknowns (model parameters) determines whether the system has:

  • A unique solution
  • Infinitely many solutions
  • Or no exact solution at all

This isn’t just theoretical math — it has real consequences on your project’s performance. In our case, if we try to fit a 5th-degree polynomial (which has six unknown coefficients), but only collect four measurements, we can’t solve for all the coefficients uniquely. But if we collect ten measurements, we’re in a much stronger position to fit an accurate model that generalizes well.
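
To see this in code, here is a minimal sketch that checks the design matrix you would get when fitting a 5th-degree polynomial to only four measurements; the angle values are placeholders, not our actual logged data:

import numpy as np

# Four placeholder steering wheel angle measurements (degrees)
x = np.array([-30.0, -15.0, 15.0, 30.0])

# Design matrix of a 5th-degree polynomial: one column per unknown coefficient
A = np.vander(x, 6)                  # shape (4, 6): 4 equations, 6 unknowns
print(A.shape)                       # (4, 6)
print(np.linalg.matrix_rank(A))      # 4 < 6, so the coefficients are not uniquely determined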

Now here’s the exciting part — this very concept lies at the heart of machine learning.

In machine learning, especially when using nonlinear models (which are the norm in real-world problems), the number of model parameters can be large — sometimes in the thousands or millions. To determine those parameters reliably, you need a lot of data. The more complex or flexible the model, the more measurements (or training data) you need to avoid overfitting or instability.

So in this blog, we’re going to connect these ideas together. We’ll start with the classic mathematical relationship between equations and unknowns, look at what it means for a system to be overdetermined or underdetermined, and then carry that understanding into the world of machine learning.

We’ll also include a Python example that demonstrates how the number of data points affects the ability to fit a polynomial, and how that same principle applies to ML models.

The Number of Equations vs the Number of Unknowns

Let’s revisit a little algebra — nothing too complicated, just enough to understand how systems of equations work.

Suppose you want to fit a degree-2 polynomial: \(y=a_2x^2+a_1x+a_0\). This polynomial has three unknowns: \(a_2\), \(a_1\), and \(a_0\). If you collect three data points, you can construct three equations and solve for the coefficients exactly (assuming the equations are consistent and independent). This is called an exactly determined system.
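
As a minimal sketch of the exactly determined case, here are three made-up points that happen to lie on \(y = 2x^2 + 1\), solved as a square linear system:

import numpy as np

# Three made-up points lying on y = 2x^2 + 1
x = np.array([-1.0, 0.0, 2.0])
y = np.array([3.0, 1.0, 9.0])

# Square 3x3 system: columns are x^2, x, 1
A = np.vander(x, 3)
coeffs = np.linalg.solve(A, y)   # unique solution because A is square and invertible
print(coeffs)                    # [2. 0. 1.]  ->  a2 = 2, a1 = 0, a0 = 1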

But what if you have more data points — say 10? Now you have more equations than unknowns. The system becomes overdetermined, and unless all the points fall exactly on a degree-2 curve (which almost never happens with real, noisy data), you can’t find a perfect solution. Instead, you find the best approximation using least squares.

And if you have fewer data points than parameters — like only 2 points for a degree-2 polynomial — the system is underdetermined, and you end up with infinitely many polynomials that pass through the points. There’s not enough information to find a unique solution.
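
The following sketch shows both of the remaining cases with made-up numbers: an overdetermined fit of noisy samples from \(y = 2x^2 + 1\), and an underdetermined fit where two points leave the leading coefficient completely free:

import numpy as np

# Overdetermined: 10 noisy samples of y = 2x^2 + 1, but only 3 unknowns
rng = np.random.default_rng(0)
x_over = np.linspace(-5, 5, 10)
y_over = 2 * x_over**2 + 1 + rng.normal(scale=0.5, size=10)
coeffs, residual, rank, _ = np.linalg.lstsq(np.vander(x_over, 3), y_over, rcond=None)
print(coeffs)     # close to [2, 0, 1]; the nonzero residual shows no exact solution exists

# Underdetermined: 2 points, 3 unknowns -> infinitely many parabolas fit exactly
x_under = np.array([0.0, 1.0])
y_under = np.array([1.0, 3.0])
for a2 in (0.0, 1.0, 5.0):                    # pick any leading coefficient...
    a1 = y_under[1] - y_under[0] - a2         # ...and the other two adjust to match
    a0 = y_under[0]
    print([a2, a1, a0])                       # every triple passes through both points

In the underdetermined case, a solver such as np.linalg.lstsq would silently hand you just one of these infinitely many solutions (the minimum-norm one), which is why having too few points is so easy to miss.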

In summary:

  • Exactly determined: Number of equations = number of unknowns → unique solution
  • Overdetermined: More equations than unknowns → least squares solution
  • Underdetermined: Fewer equations than unknowns → infinite solutions

This relationship is the foundation for understanding why collecting enough data is crucial when building models.

Solving Polynomial Equations: When Do You Get Unique Solutions?

Let’s say you’re trying to fit a polynomial of degree d. This polynomial will have d+1 unknown coefficients.

So, if you want a unique solution for a:

  • Degree 1 polynomial: you need at least 2 data points
  • Degree 2 polynomial: you need at least 3 data points
  • Degree 5 polynomial: you need at least 6 data points

If you collect exactly d+1 data points, you can solve the system exactly. If you collect more than d+1, you’re in a better position — you can average out noise and get a more stable, generalized solution. This need for sufficient data is even more critical when you’re not fitting a clean mathematical curve, but rather training a model to generalize to new, unseen inputs.
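
Here is a quick sketch of that effect, fitting a degree-2 polynomial to noisy samples of a made-up quadratic (the true coefficients and noise level are invented for illustration):

import numpy as np

rng = np.random.default_rng(42)
true_coeffs = [0.4, -1.0, 2.0]   # made-up "true" curve: y = 0.4x^2 - x + 2

def fit_noisy_quadratic(n_points):
    """Fit a degree-2 polynomial to n noisy samples of the true curve."""
    x = np.linspace(-10, 10, n_points)
    y = np.polyval(true_coeffs, x) + rng.normal(scale=2.0, size=n_points)
    return np.polyfit(x, y, 2)

print(fit_noisy_quadratic(3))    # exactly d+1 points: the fit absorbs the noise completely
print(fit_noisy_quadratic(30))   # many more points: noise averages out

With only three points the fit passes exactly through the noisy samples, so the recovered coefficients depend heavily on that particular noise; with thirty points the noise largely averages out and the coefficients settle close to the true values.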

The Jump to Machine Learning: Why Models Need More Data

Machine learning models — particularly the nonlinear ones like neural networks — often have dozens, thousands, or even millions of parameters. These models are powerful because they can represent very complex relationships. But the downside is that they need a lot of data to constrain all those parameters. Just like with polynomials, if you don’t have enough data points, the model has too much freedom. It might fit the training data perfectly, but it will likely fail on new data. This is the classic problem of overfitting.
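
To get a feel for how quickly the number of unknowns grows, here is a back-of-the-envelope count for a small, hypothetical fully connected network (the layer sizes are arbitrary and chosen only for illustration):

# Rough parameter count for a small fully connected network.
# Each layer contributes (inputs x outputs) weights plus one bias per output;
# the layer sizes below are made up purely for illustration.
layer_sizes = [10, 128, 128, 1]   # input -> two hidden layers -> output

n_params = sum(
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
)
print(n_params)   # 18049 -- already vastly more unknowns than any polynomial fit above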

On the other hand, simpler models (like linear regression) don’t require much data, but they may fail to capture important non-linearities in the data — which are almost always present in real-world scenarios. In short, the more complex the model, the more data you need. This is not just a heuristic; it’s rooted in the same principles that govern solving systems of equations.

Non-Linearity in Real Life and Non-Linearity in Models

In practice, most systems are nonlinear. Whether it’s predicting steering angles, weather patterns, financial markets, or customer behavior, real-world systems are rarely described accurately with a straight line. That’s why nonlinear models — from polynomial regression to deep learning — are so widely used.

But non-linearity comes with a cost. These models can express a much broader range of functions, but they require more data to constrain their parameters and avoid overfitting.

This brings us back full circle to the importance of collecting enough measurements — whether you’re doing a simple curve fit or training a large machine learning model.

Programming Example: Fitting Polynomials with Python

Let’s use Python to visualize how the number of data points affects the ability to fit a polynomial of a certain degree.

import numpy as np
import matplotlib.pyplot as plt

# Synthetic data: steering wheel angle vs steering angle
x = np.array([-30, -15, 15, 30])  # 4 data points
y = np.array([-12, -5, 6, 13])

# Dense grid so the fitted curves are drawn smoothly between the measurements
x_plot = np.linspace(x.min(), x.max(), 200)

# Plot the measured data
plt.scatter(x, y, color='red', label='Measured Data')

# Fit degree 1 polynomial (2 coefficients): 4 equations, 2 unknowns -> least squares
coeffs_deg1 = np.polyfit(x, y, 1)
plt.plot(x_plot, np.polyval(coeffs_deg1, x_plot), label='Degree 1 Fit')

# Fit degree 2 polynomial (3 coefficients): still overdetermined
coeffs_deg2 = np.polyfit(x, y, 2)
plt.plot(x_plot, np.polyval(coeffs_deg2, x_plot), label='Degree 2 Fit')

# Attempt to fit degree 5 polynomial (6 coefficients) with only 4 points.
# The system is underdetermined: np.polyfit issues a RankWarning, and the
# returned curve is only one of infinitely many that pass through the data.
coeffs_deg5 = np.polyfit(x, y, 5)
plt.plot(x_plot, np.polyval(coeffs_deg5, x_plot), label='Degree 5 Fit (Overfit)')

plt.xlabel('Steering Wheel Angle (degrees)')
plt.ylabel('Steering Angle (degrees)')
plt.title('Polynomial Fits and the Importance of Data Quantity')
plt.legend()
plt.grid(True)
plt.show()

The Python code above generates a set of measured points and fits three polynomials of increasing complexity, evaluating each fit on a dense grid so the curves are drawn smoothly. The degree-1 and degree-2 fits are reasonable, but the degree-5 fit has six coefficients and only four equations to constrain them: NumPy raises a RankWarning, and the curve it returns is just one of infinitely many that pass through the measurements. The resulting plot is shown in the image below.

[Figure: polynomial fitting]

Try adding more data points and re-running the script. You’ll see that the higher-degree fit improves significantly with more data — just like a nonlinear machine learning model.
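
As a hedged sketch of that experiment, the snippet below refits the degree-5 polynomial on an extended dataset; the extra measurement values are invented for illustration, not real logged data:

import numpy as np
import matplotlib.pyplot as plt

# Extended (made-up) dataset: same angle range, but with more intermediate measurements
x = np.array([-30, -25, -20, -15, -10, -5, 5, 10, 15, 20, 25, 30])
y = np.array([-12, -10, -8, -5, -3.5, -1.8, 2.2, 3.8, 6, 8.5, 10.5, 13])

x_plot = np.linspace(x.min(), x.max(), 200)

# With 12 measurements and 6 coefficients the system is overdetermined,
# so np.polyfit returns a well-behaved least-squares solution
coeffs_deg5 = np.polyfit(x, y, 5)

plt.scatter(x, y, color='red', label='Measured Data')
plt.plot(x_plot, np.polyval(coeffs_deg5, x_plot), label='Degree 5 Fit (12 points)')
plt.xlabel('Steering Wheel Angle (degrees)')
plt.ylabel('Steering Angle (degrees)')
plt.legend()
plt.grid(True)
plt.show()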

Before we proceed, if you’re enjoying this post and want more content like this — real-world machine learning, Python tricks, and project insights — follow me on Instagram @machinelearningsite. Not only will you find code snippets and updates on the latest blogs, but also… hey, why don’t you check it out and see for yourself! I am 96.23% sure you will follow.

Now, let’s get back to it.

What Do We Conclude in the Context of Machine Learning?

From this example and experience, several key lessons emerge: First, always match your model complexity to your data volume. Don’t use a complex model unless you have enough data to support it.

Second, remember that nonlinear models are powerful but inherently riskier in low-data settings. Use regularization, cross-validation, and simplicity as allies (a short sketch of what that can look like follows after these takeaways).

Third, understand that the math you learned — systems of equations, polynomials, least squares — is still directly relevant. These aren’t just academic exercises; they’re the foundation for real engineering and machine learning systems.

Finally, when you encounter a real-world nonlinear relationship — like mapping steering wheel angle to wheel direction — embrace the math, gather enough data, and model responsibly.
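
Picking up on that second point, here is a minimal sketch of regularization plus cross-validation for a polynomial fit. The data, the degree-5 feature expansion, and the ridge penalty alpha=1.0 are all illustrative assumptions, not settings from the steering project:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Made-up measurements: a gently nonlinear relationship plus noise
rng = np.random.default_rng(0)
x = np.linspace(-30, 30, 12).reshape(-1, 1)
y = 0.4 * x.ravel() + 0.001 * x.ravel() ** 2 + rng.normal(scale=0.5, size=12)

# Degree-5 features, but a ridge penalty keeps the coefficients from exploding
model = make_pipeline(PolynomialFeatures(degree=5), Ridge(alpha=1.0))

# 3-fold cross-validation estimates how well the fit generalizes to unseen angles
scores = cross_val_score(model, x, y, cv=3, scoring='neg_mean_squared_error')
print(scores.mean())

The ridge penalty keeps the six polynomial coefficients small even though twelve points barely constrain them, and the cross-validation score tells you whether the extra flexibility actually helps on held-out measurements.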

What’s Next?

Now that you understand how model complexity and data volume affect both traditional curve fitting and machine learning, you might be wondering how these ideas apply to more advanced models — like Support Vector Machines (SVMs).

SVMs are powerful supervised learning algorithms that handle nonlinear data exceptionally well by using a concept called the kernel trick. Just like in polynomial fitting, they introduce complexity to better model real-world patterns — and they also rely on the right balance of data and regularization to avoid overfitting.

If you’re curious about how SVMs manage non-linearity, margin maximization, and generalization, head over to this in-depth guide:

Step-by-Step Guide to Support Vector Machines with Hands-On Exercise

It’s the perfect next step if you’re looking to deepen your understanding of how machine learning models make sense of complex data.
