How to compare 2 ML models with the 5x2cv paired t-test: Overcome the limitations of the paired Student's t-test with paired_ttest_5x2cv

December 16, 2022

Hi there, you might have wondered, or been asked at some point, “How will you identify which machine learning model is better?”

“Which one is better, mine or yours?”

Don't worry, I've got you covered. Today we will learn how to answer those questions with statistical evidence.

Let’s get started.

You might have heard of the paired Student's t-test, which is often used to compare before and after results. Voilà! But hold on: the problem with using the paired Student's t-test here is that the evaluations of the model are not independent. The reason is that the same rows of data are used to train the model multiple times; a row is left out of training only when it falls in the hold-out test fold.

This lack of independence between evaluations is what makes the paired Student's t-test biased.
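To make the overlap concrete, here is a minimal sketch in plain Python. It is an illustration only: the helper names `kfold_train_sets` and `train_overlap` are made up for this example, and the folds are built by simple striding rather than by scikit-learn. It measures how many training rows two folds of the same k-fold run have in common.

```python
def kfold_train_sets(n_rows, k):
    """Return the k training sets (as sets of row indices) of a plain k-fold split."""
    rows = list(range(n_rows))
    # Fold i holds every k-th row starting at i; everything else is its training set.
    test_folds = [set(rows[i::k]) for i in range(k)]
    all_rows = set(rows)
    return [all_rows - test_fold for test_fold in test_folds]


def train_overlap(n_rows, k):
    """Fraction of the first training set that also appears in the second one."""
    first, second = kfold_train_sets(n_rows, k)[:2]
    return len(first & second) / len(first)


print(round(train_overlap(1000, 10), 3))  # 0.889
print(round(train_overlap(1000, 2), 3))   # 0.0
```

With 10 folds, any two models are trained on almost 89% of the same rows, so their scores tend to move together. With 2 folds, the two training sets of a single split do not overlap at all.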

This limitation can be addressed by choosing the number of folds and repeats so that the procedure gives a good sample of model performance that generalizes well. That is what we will do: two-fold cross-validation with 5 repeats, i.e. 5x2-fold cross-validation.
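For the curious, the statistic behind the 5x2cv test can be sketched in a few lines of plain Python. This is a simplified illustration of the test as it is usually described (Dietterich's 5x2cv paired t-test), not mlxtend's actual source. The function names `two_sided_p_df5` and `five_by_two_cv_ttest` and the numbers in `example_diffs` are assumptions made up for this example. Each of the 5 repeats gives two score differences between the models, one per fold. The numerator is the first difference of the first repeat, and the denominator is built from the per-repeat variances.

```python
import math


def two_sided_p_df5(t):
    """Two-sided p-value of a t statistic with 5 degrees of freedom.

    Uses the closed form of the t distribution for 5 degrees of freedom so the
    sketch needs no extra packages; scipy.stats.t.sf(abs(t), 5) * 2 is the
    usual way to get the same number.
    """
    theta = math.atan(abs(t) / math.sqrt(5))
    cos_t = math.cos(theta)
    inside = (2 / math.pi) * (theta + math.sin(theta) * (cos_t + (2 / 3) * cos_t ** 3))
    return 1 - inside


def five_by_two_cv_ttest(diffs):
    """diffs: 5 pairs (d1, d2) of score differences, one pair per repeat."""
    if len(diffs) != 5:
        raise ValueError("expected exactly 5 repeats")
    variance_sum = 0.0
    for d1, d2 in diffs:
        mean = (d1 + d2) / 2
        variance_sum += (d1 - mean) ** 2 + (d2 - mean) ** 2
    if variance_sum == 0:
        raise ValueError("all differences are identical; the statistic is undefined")
    t = diffs[0][0] / math.sqrt(variance_sum / 5)
    return t, two_sided_p_df5(t)


# Made-up accuracy differences (model 1 minus model 2), for illustration only.
example_diffs = [(0.02, 0.00), (0.01, -0.01), (0.00, 0.02), (-0.01, 0.01), (0.02, 0.00)]
t, p = five_by_two_cv_ttest(example_diffs)
print(round(t, 3), round(p, 3))  # 1.414 0.216
```

In this made-up case the p-value is well above 0.05, so the differences would not count as significant.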

5x2 statistical hypothesis testing to compare 2 machine learning models.

from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from mlxtend.evaluate import paired_ttest_5x2cv

# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1)

# evaluate model 1
model1 = LogisticRegression()
cv1 = RepeatedStratifiedKFold(n_splits=2, n_repeats=5, random_state=1)
scores1 = cross_val_score(model1, X, y, scoring='accuracy', cv=cv1, n_jobs=-1)
print('LogisticRegression Mean Accuracy: %.3f (%.3f)' % (mean(scores1), std(scores1)))

# evaluate model 2
model2 = LinearDiscriminantAnalysis()
cv2 = RepeatedStratifiedKFold(n_splits=2, n_repeats=5, random_state=1)
scores2 = cross_val_score(model2, X, y, scoring='accuracy', cv=cv2, n_jobs=-1)
print('LinearDiscriminantAnalysis Mean Accuracy: %.3f (%.3f)' % (mean(scores2), std(scores2)))

Above are two simple models, evaluated with our regular ML pipeline.

# Apply the 5x2cv paired t-test (two folds, five repeats)
t, p = paired_ttest_5x2cv(estimator1=model1, estimator2=model2, X=X, y=y, scoring='accuracy', random_seed=1)

# Summarize
print('P-value: %.3f, t-Statistic: %.3f' % (p, t))

# Interpret the result
if p <= 0.05:
    print('Difference between mean performance is probably real')
else:
    print('Algorithms probably have the same performance')

And we are done. In this run the test does not reject the null hypothesis, so at the 95% confidence level we have no evidence that the two algorithms perform differently.

Admittedly, the models here are very simple, so similar performance is no surprise. But we no longer have to argue about whose model is better, do we?

I hope you enjoyed the article. If so, there are plenty of other interesting topics in my other articles.

But wait, the paired t-test still has some limitations because of its assumptions:

Observations in each sample are independent and identically distributed (iid).

Observations in each sample are normally distributed.

Observations in each sample have the same variance.

Next, we will look at how to overcome the limitations of parametric tests like the paired Student's t-test with non-parametric hypothesis tests such as the Wilcoxon signed-rank test. Stay tuned :)

Thanks again for your time. If you enjoyed this short article, there are tons of topics in advanced analytics, data science, and machine learning available in my Medium repo: https://medium.com/@bobrupakroy

Some of my alternative internet presences: Facebook, Instagram, Udemy, Blogger, Issuu, Slideshare, Scribd and more.

Also available on Quora @ https://www.quora.com/profile/Rupak-Bob-Roy

Let me know if you need anything. Talk soon.
