
How hyperparameter tuning works

A typical ML model training workflow looks fairly standard these days:

  • Prepare data for training and validation
  • Choose an architecture for the model
  • Pick sensible initial parameters
  • Tweak the model and its parameters until you get a result that meets your requirements

Today I’d like to explore the last step: tweaking the model. There’s even a special name for this process, hyperparameter optimization, and it’s a relatively large field with established tools and some competition among them.

However, if all you need is hyperparameter tuning without any bells and whistles on top, the software required is fairly simple. It’s still interesting to see how much such a simple tool helps in everyday ML tasks.

So I figured that writing such a framework from scratch would make a good topic for my next Medium post and would let me show how these tools work :)

First of all, let me define exactly what I’m going to build: a micro-framework written in Python that simplifies and speeds up the optimization of certain hyperparameters of my future Keras models.

  • The framework must be written in Python
  • The framework must be fully compatible with Keras
  • The framework must be compatible with TensorFlow Dataset API
  • Getting started with the framework shouldn’t require a month or more of ramp-up time

Sounds legit, but this formulation also requires me to define what “tuning” and “certain hyperparameters” mean.

Each network has a general architecture, say, a ResNet50-inspired network. That’s what defines the high-level structure of the neural network: certain combinations of layers, specific activation functions in certain places, etc. But then there are all the other activation functions, initial weight distributions, the optimizer to use, the learning rate, optional structural blocks within the network, and so on. These are the hyperparameters: settings that define how the neural network will learn its own parameters.

Now for “tuning”: here it means “grid search”. The framework should accept a list of candidate values for each parameter, test every combination one by one, and tell me which combination produces the best model. This set of combinations is the “grid”, and we search it for the best one: a kind of “brute force” approach in the ML/DL field.

This grid might look like the dict below:

params = {'lr': [0.001, 0.0005],
          'loss': ['mean_squared_error'],
          'act_in': ['relu', 'swish', 'tanh'],
          'act_out': ['tanh', 'sigmoid']}

The framework then generates a list of all possible combinations from this structure. Frankly speaking, it’s just a nested for-loop.
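A minimal sketch of that expansion, assuming the grid is a plain dict of lists as above. `itertools.product` does the job of the nested for-loop for any number of keys; the helper name `expand_grid` is hypothetical, not the author’s code.

```python
import itertools
from typing import Any, Dict, List


def expand_grid(grid: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Return every combination of grid values as a separate params dict.

    Hypothetical helper. itertools.product behaves like one nested
    for-loop per key, with the last key varying fastest.
    """
    keys = list(grid)
    return [dict(zip(keys, values)) for values in itertools.product(*grid.values())]


params = {'lr': [0.001, 0.0005],
          'loss': ['mean_squared_error'],
          'act_in': ['relu', 'swish', 'tanh'],
          'act_out': ['tanh', 'sigmoid']}

combinations = expand_grid(params)
print(len(combinations))  # 2 * 1 * 3 * 2 = 12
print(combinations[0])
```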

From the end user’s point of view, the result looks like this:

{'lr': 0.001, 'loss': 'mean_squared_error', 'act_in': 'relu', 'act_out': 'tanh'}
{'lr': 0.001, 'loss': 'mean_squared_error', 'act_in': 'relu', 'act_out': 'sigmoid'}
...
{'lr': 0.0005, 'loss': 'mean_squared_error', 'act_in': 'tanh', 'act_out': 'sigmoid'}

Once I have these combinations, I can pass them one by one to a Python function that builds a Keras Model with all these options applied.

This function generates a Model with the given params
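The original function is not reproduced in this text. Below is a minimal sketch, assuming a small fully connected regression network; the layer sizes, the `input_dim` argument and the name `build_model` are hypothetical choices for illustration. Only the keys of the `params` dict come from the grid above.

```python
import tensorflow as tf


def build_model(params, input_dim=8):
    """Build and compile a small Keras model from one params dict.

    Hypothetical architecture: one hidden Dense layer using act_in and
    a single output unit using act_out; lr and loss go into compile().
    """
    inputs = tf.keras.Input(shape=(input_dim,))
    x = tf.keras.layers.Dense(16, activation=params['act_in'])(inputs)
    outputs = tf.keras.layers.Dense(1, activation=params['act_out'])(x)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=params['lr']),
        loss=params['loss'],
    )
    return model


model = build_model({'lr': 0.001, 'loss': 'mean_squared_error',
                     'act_in': 'relu', 'act_out': 'tanh'})
model.summary()
```

Keeping the architecture fixed and reading everything tunable from `params` is what lets the same function serve every combination in the grid.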

// Data pipeline

Now that I can build every possible combination of models, it’s time to think about the data pipeline. There are two major requirements:

  • Each model should be trained on the same amount of data
  • TensorFlow/Keras expect “endless” generators.

I suppose the simplest way to meet these requirements is to use a Callable that returns a Python generator, plus a wrapper that makes it “endless”.
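A minimal sketch of such a wrapper, assuming the Callable takes no arguments and returns a fresh, finite generator on every call; `make_endless` and `toy_batches` are hypothetical names. Re-calling the factory after each pass restarts the data from the beginning, so every pass has the same length.

```python
from typing import Callable, Iterator, Tuple

import numpy as np

Batch = Tuple[np.ndarray, np.ndarray]


def make_endless(generator_fn: Callable[[], Iterator[Batch]]) -> Iterator[Batch]:
    """Yield batches forever by re-calling generator_fn whenever it is exhausted.

    Hypothetical helper, not the author's implementation.
    """
    while True:
        produced_any = False
        for batch in generator_fn():  # fresh generator on every pass
            produced_any = True
            yield batch
        if not produced_any:
            # Guard against spinning forever on an empty data source
            raise ValueError("generator_fn returned an empty generator")


def toy_batches():
    # Hypothetical finite generator: two batches of 4 samples each
    for i in range(2):
        yield np.full((4, 3), i, dtype=np.float32), np.full((4, 1), i, dtype=np.float32)


stream = make_endless(toy_batches)
first_values = [float(next(stream)[0][0, 0]) for _ in range(5)]
print(first_values)  # [0.0, 1.0, 0.0, 1.0, 0.0]
```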

The general scheme looks like this:

  • Python generator yields either Tuple[np.ndarray, np.ndarray] or Tuple[Dict[str, np.ndarray], Dict[str, np.ndarray]]
  • We need at least one generator, for the training data.
  • A validation generator is optional.
  • Each generator is made “endless”.
  • Each “endless” generator is converted to a TF Dataset.
  • The Datasets are fed into the Models.

Turning a Python generator into a TensorFlow Dataset
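The original code for this step is not reproduced in this text. A minimal sketch, assuming batches of float32 features with a fixed width (`N_FEATURES = 8`, `endless_batches` and `to_dataset` are hypothetical) and TensorFlow 2.4 or newer, where `tf.data.Dataset.from_generator` accepts `output_signature`:

```python
import numpy as np
import tensorflow as tf

N_FEATURES = 8  # hypothetical input width


def endless_batches(batch_size=4):
    """Hypothetical endless generator of random (features, targets) batches."""
    rng = np.random.default_rng(0)
    while True:
        x = rng.random((batch_size, N_FEATURES), dtype=np.float32)
        y = rng.random((batch_size, 1), dtype=np.float32)
        yield x, y


def to_dataset(generator_fn):
    """Wrap a zero-argument generator factory into a tf.data.Dataset."""
    return tf.data.Dataset.from_generator(
        generator_fn,
        output_signature=(
            tf.TensorSpec(shape=(None, N_FEATURES), dtype=tf.float32),
            tf.TensorSpec(shape=(None, 1), dtype=tf.float32),
        ),
    )


train_ds = to_dataset(endless_batches)
for x, y in train_ds.take(2):
    print(x.shape, y.shape)  # (4, 8) (4, 1)
```

Because such a dataset never ends, `model.fit` needs an explicit `steps_per_epoch` (and `validation_steps` for a validation set). Keeping `steps_per_epoch` and `epochs` identical for every model is one simple way to train each model on the same amount of data. For a finite generator, `train_ds.repeat()` is an alternative to writing an endless one.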

// Comparing the models

To pick the best model, we need some kind of “ground truth”: a score we can compare. Luckily, there are various metrics suited to different models: AUC, Accuracy, Recall, Precision, F1 and, of course, the error itself. For the sake of simplicity, I’ll just use the error, i.e. the loss in TF terms.

How can I do that? Easy! The Keras interface supports callbacks for exactly such tasks:

Storing the “loss” score here; however, it can be any metric you add to the Model
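The author’s callback is not reproduced in this text. A minimal sketch, assuming we only want the lowest value of one monitored metric across epochs; the class name `BestScoreTracker` is hypothetical, while `tf.keras.callbacks.Callback` and its `on_epoch_end(epoch, logs)` hook are standard Keras. Use `monitor='val_loss'` when a validation generator is supplied; for metrics where higher is better (AUC, accuracy) the comparison would have to flip.

```python
import tensorflow as tf


class BestScoreTracker(tf.keras.callbacks.Callback):
    """Hypothetical callback: remember the lowest monitored value seen at epoch end."""

    def __init__(self, monitor='loss'):
        super().__init__()
        self.monitor = monitor
        self.best = float('inf')

    def on_epoch_end(self, epoch, logs=None):
        # Keras passes the epoch's metrics, e.g. {'loss': 0.12}, in logs
        value = (logs or {}).get(self.monitor)
        if value is not None and value < self.best:
            self.best = float(value)


# Usage, assuming a compiled model and an endless train_ds:
# tracker = BestScoreTracker()
# model.fit(train_ds, steps_per_epoch=100, epochs=5, callbacks=[tracker])
# print(tracker.best)
```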

// Tuning the model

Now that all the parts are ready, let’s use them.
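The original script is not reproduced in this text. Below is a minimal sketch of the search loop, assuming a hypothetical `train_and_score(params)` function that builds a model, trains it on the same data for a fixed number of steps and returns the best loss recorded by the callback. The loop itself is plain Python, so it can be exercised with a stand-in scoring function instead of real training.

```python
from typing import Any, Callable, Dict, Optional, Sequence, Tuple

Params = Dict[str, Any]


def grid_search(combinations: Sequence[Params],
                train_and_score: Callable[[Params], float]
                ) -> Tuple[Optional[Params], float]:
    """Train one model per params dict and return the lowest-scoring one.

    Hypothetical helper; train_and_score is assumed to return a loss.
    """
    best_params, best_score = None, float('inf')
    for i, params in enumerate(combinations, start=1):
        print(f"Model {i} of {len(combinations)}...")
        score = train_and_score(params)  # lower is better
        if score < best_score:  # strict '<' keeps the earliest of tied combos
            best_params, best_score = params, score
    print(f"Best params found: {best_params}")
    return best_params, best_score


# Stand-in scorer for illustration only; a real one would train a Keras model.
candidates = [{'lr': 0.001}, {'lr': 0.0005}, {'lr': 0.01}]
best, score = grid_search(candidates, lambda p: abs(p['lr'] - 0.0006))
```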

When I run the script, it tells me which params work best for this particular model:

Model 1 of 12...
...
Model 12 of 12...
Best params found: {'lr': 0.0005, 'loss': 'mean_squared_error', 'act_in': 'tanh', 'act_out': 'tanh'}

I hope it doesn’t look too complex :)

Sure, there are plenty of possible improvements here, e.g.:

  • Allow data parallelism
  • Use multiple workers in parallel to speed up the search process
  • Use a smarter search algorithm instead of brute force

And, obviously, there are pitfalls to keep in mind:

  • It’s very easy to end up with a huge grid where the search takes days, months, or even years: the number of models is the product of the number of values for each parameter, so every extra parameter multiplies it. You have to keep the number of possible combinations under tight control.
  • In real tasks, you’ll probably use only a tiny fraction of the available data for tuning, so you have to make sure that this fraction is truly representative.
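Checking the grid size before launching a search is cheap. A small sketch, assuming Python 3.8+ for `math.prod`; `grid_size`, `estimate_hours`, the 30-minute per-model training time and the extra parameters in `bigger` are all hypothetical:

```python
import math
from typing import Any, Dict, List


def grid_size(grid: Dict[str, List[Any]]) -> int:
    """Number of models a full grid search will train."""
    return math.prod(len(values) for values in grid.values())


def estimate_hours(grid: Dict[str, List[Any]], minutes_per_model: float) -> float:
    """Rough wall-clock estimate for a sequential search."""
    return grid_size(grid) * minutes_per_model / 60


params = {'lr': [0.001, 0.0005],
          'loss': ['mean_squared_error'],
          'act_in': ['relu', 'swish', 'tanh'],
          'act_out': ['tanh', 'sigmoid']}
print(grid_size(params))           # 12
print(estimate_hours(params, 30))  # 6.0

# Hypothetical: three more parameters with 5 options each multiply the grid by 125
bigger = dict(params,
              batch_size=[16, 32, 64, 128, 256],
              dropout=[0.0, 0.1, 0.2, 0.3, 0.5],
              units=[32, 64, 128, 256, 512])
print(grid_size(bigger))           # 1500
print(estimate_hours(bigger, 30))  # 750.0
```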

Thanks for reading. Feel free to contact me if you have any questions :)

Full source code is available in my GitHub repository:
