A few months ago, I was working on a project where I had to train a regression model that predicted the price of a vehicle from features such as age, brand, engine capacity, and year of manufacture. The model initially suffered from overfitting; after many trials with different regularization constants, I managed to develop a model that generalized well to new, unseen samples. However, for every new regularization value I retrained the model on the entire training set from scratch, so arriving at an optimal model cost me a lot of time.
It wasn’t until I read the book “Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow” that I realized that instead of simply trying out different regularization values by hand, I could have used a more efficient method to tune my hyperparameter. And that is exactly what we are going to talk about in this blog.
What are Hyperparameters?
Hyperparameters are configuration settings that are determined before training a machine learning model. Unlike model parameters, they are not learned during training but are set by the user, and they remain fixed for the entire training run. Hyperparameters control the behavior of the learning algorithm and have a significant impact on the model’s performance. To build a model that generalizes well, you need to set the hyperparameters to the right values. In other words, you need to tune your hyperparameters.
In practice, there is no formula that gives the best hyperparameter value; finding it largely comes down to experimentation and experience. In the scenario I shared above, the hyperparameter I wanted to tune was the regularization parameter. Other hyperparameters that may need tuning include:
- learning rate
- number of layers in a neural network
- number of nodes in a layer
- activation function
- batch size
- dropout rate
- number of epochs
There are certainly more variables that fall under the category of hyperparameters, depending on the type of model you use in your project. So the question remains: how do we tune hyperparameters efficiently?
Tuning Hyperparameters Efficiently
It is common practice to divide the available data into a training set and a test set, so that model performance can be evaluated on unseen data. When tuning hyperparameters, we introduce a third subset called the validation set, which is carved out of the existing training set. In other words, we now divide all the available data into training, validation, and test sets. A general rule of thumb is 60% of the data for the training set and 20% each for the validation and test sets.
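The 60/20/20 split above can be sketched with scikit-learn's `train_test_split`, applied twice. The arrays below are synthetic placeholders, not data from the original project:

```python
# Sketch of a 60/20/20 train/validation/test split with scikit-learn.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 4)  # 100 samples, 4 illustrative features
y = np.random.rand(100)

# First carve off 20% of the data for the test set...
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# ...then split the remaining 80% into training and validation sets.
# 0.25 of the remaining 80% equals 20% of the full data set.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42
)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```

Note that the second `test_size` is 0.25, not 0.20, because it is a fraction of the remaining 80%, not of the whole data set.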
The idea is to determine a range of hyperparameter values and train one model per value on the training set. Each of these models is then evaluated on the validation set, and the hyperparameters of the best-performing model are selected. We then build a new model with those hyperparameters and train it on the training and validation sets combined. This final model is ultimately tested on the test set. The flowchart below illustrates the entire concept:
In more detail, you can follow these steps to determine the right hyperparameter values. Say you want to tune the regularization parameter:
- Create 3 models that have different regularization parameters.
- Train those models on the training set and evaluate their performance using the validation set.
- Pick out the model with the best performance among those three and train it again over the training set and the validation set combined.
- Ultimately, test the model on the test set and evaluate its results.
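The four steps above can be sketched in code. This is a minimal illustration using Ridge regression, where `alpha` is the regularization parameter; the data, candidate values, and split sizes are all assumptions for the example, not details from the original project:

```python
# Sketch of manual hyperparameter tuning with a validation set,
# using Ridge regression's alpha as the regularization parameter.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data (placeholder for the vehicle-price data set).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=200)

# 60/20/20 split into training, validation, and test sets.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0
)

# Steps 1 and 2: train one model per candidate value on the training set,
# then score each model on the validation set.
candidates = [0.01, 1.0, 100.0]
val_errors = {}
for alpha in candidates:
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    val_errors[alpha] = mean_squared_error(y_val, model.predict(X_val))

# Step 3: pick the best value and retrain on training + validation combined.
best_alpha = min(val_errors, key=val_errors.get)
final_model = Ridge(alpha=best_alpha).fit(
    np.vstack([X_train, X_val]), np.concatenate([y_train, y_val])
)

# Step 4: a single final evaluation on the held-out test set.
test_mse = mean_squared_error(y_test, final_model.predict(X_test))
print(f"best alpha: {best_alpha}, test MSE: {test_mse:.4f}")
```

The key point is that the test set is touched exactly once, after the hyperparameter has been fixed, so the reported test error is an unbiased estimate of performance on new data.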
Summary
To summarize, hyperparameters can be tuned by dividing the available data into training, validation, and test sets. After determining the hyperparameters that work best on the validation set, a final model is built with them, trained on the combined training + validation set, and ultimately evaluated on the test set.
If you are learning machine learning and want to become an expert in the field, I strongly recommend getting a hold of Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. The book covers important machine learning concepts and provides code that allows you to practice the concepts simultaneously as you learn them.