Optuna: Hyperparameter Tuning Proven by Data Scientists to Be Among the Best


Hi there, once again we meet with another advanced and interesting topic: Optuna. Yes, you might have heard the name before; Optuna is the next gold rush for data scientists when it comes to hyperparameter tuning.

Before we get hands-on with Optuna, let's understand what Optuna is all about.

Optuna is a framework for finding optimal hyperparameter values by trial and error in pursuit of the best performance. By default it employs a Bayesian optimization algorithm called the Tree-structured Parzen Estimator (TPE), which you can import with:

from optuna.samplers import TPESampler

Optuna also supports other samplers such as grid search, random search, and evolutionary algorithms. You can find the full list of samplers in the Optuna documentation.

Now the question is: why Optuna?

To understand this, we need to compare it with the traditional hyperparameter tuning methods:

Grid Search: a brute-force search that evaluates every combination of parameters to find the optimal one. For real-life project use cases this is very time-consuming and inefficient.

Randomized Search: an alternative designed to cut down Grid Search's time-consuming process by evaluating random combinations of parameters. As a result, it can sometimes miss the best parameter settings.

Optuna: uses a Bayesian framework to model where the optimal values are likely to lie, and cuts out unnecessary computation on non-performing parameter combinations while searching for the optimal settings. Another advantage of Optuna is its efficient sampling and pruning algorithms.

Let’s get started with some hands-on

import optuna
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection

def objective(trial):
    iris = sklearn.datasets.load_iris()

    n_estimators = trial.suggest_int('n_estimators', 2, 20)
    max_depth = int(trial.suggest_loguniform('max_depth', 1, 32))
    clf = sklearn.ensemble.RandomForestClassifier(
        n_estimators=n_estimators, max_depth=max_depth)

    return sklearn.model_selection.cross_val_score(
        clf, iris.data, iris.target, n_jobs=-1, cv=3).mean()

Here we used the iris dataset as a toy example, defined the search space for the random forest's parameters, and returned the mean of the cross-validation scores.

Then we create the Optuna study to optimize the objective, in other words, to get the best parameters out of the study:

# direction='minimize' or 'maximize'
# here I want to maximize the cross-validation score
study = optuna.create_study(direction='maximize')
# equivalently: study = optuna.create_study(sampler=TPESampler(), direction='maximize')
# since sampler=TPESampler() is the default
study.optimize(objective, n_trials=100)

Now we can access the best parameter values through study.best_trial:

trial = study.best_trial
print('Accuracy: {}'.format(trial.value)) #0.9733333333333333
print("Best hyperparameters: {}".format(trial.params))
#{'n_estimators': 11, 'max_depth': 27.827767703750034}

Well, that's it! short and simple yet powerful.

optuna.visualization.plot_optimization_history(study)
optuna.visualization.plot_slice(study)
optuna.importance.get_param_importances(study)
help(optuna)

There are also other functions and parameters to explore that will help you interpret the tuning process; see the Optuna documentation link below.

That's the end of another interesting topic. No worries, we have more coming!

Likewise, if you like this article do visit my other articles and happy machine learning.

Next, we will look into Gaussian Mixtures, one of the best alternatives to K-means :)

You can also find me on Facebook, Instagram, Udemy, Blogger, Issuu, and more.

Also available on Quora @ https://www.quora.com/profile/Rupak-Bob-Roy

Have a good day. Talk soon!


