PowerTransformer: A powerful skewness correction approach where Standard or MinMax scaler fails.

 


Hello there, today we will look into another interesting topic called Power Transformer.

We often encounter highly skewed data, and simple rescaling does not correct it. Many linear models tend to work better when numeric features are close to normally distributed, because of the linear modeling assumptions, so some transformation is usually needed first.

The StandardScaler and the MinMaxScaler work well for most distributions but not for highly skewed data. Both only shift and rescale a feature, so its shape stays exactly as skewed as before, and the statistics they rely on (mean, standard deviation, minimum, and maximum) are pulled around by the long tail.
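A quick way to see this: both scalers are linear, and skewness does not change under a linear rescaling. The sketch below uses synthetic log-normal data (an assumption for illustration, not the diamonds data used later) and `scipy.stats.skew` to compare the skewness before and after each scaler.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic right-skewed feature (illustrative stand-in for real data)
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=5000).reshape(-1, 1)

skew_raw = skew(x.ravel())
skew_std = skew(StandardScaler().fit_transform(x).ravel())
skew_minmax = skew(MinMaxScaler().fit_transform(x).ravel())

# All three values should match up to floating-point error
print(f"raw: {skew_raw:.4f}, standard: {skew_std:.4f}, minmax: {skew_minmax:.4f}")
```

The range of the values changes, but the long tail is still there.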

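Applying sklearn's PowerTransformer fits a power transform to each feature (Yeo-Johnson by default, or Box-Cox for strictly positive data) to bring a skewed feature as close to a normal distribution as possible. A plain logarithm is the special case of Box-Cox with lambda = 0. By default the output is also standardized to zero mean and unit variance.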
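As a rough sketch of the two available methods, the snippet below fits both on the same synthetic log-normal feature (again an assumption for illustration) and prints the lambda each one estimates through the fitted `lambdas_` attribute. Box-Cox only accepts strictly positive values, which is why Yeo-Johnson is the default.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer

# Synthetic strictly positive, right-skewed feature (illustrative only)
rng = np.random.default_rng(42)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(5000, 1))

pt_yj = PowerTransformer(method="yeo-johnson")  # the default method
pt_bc = PowerTransformer(method="box-cox")      # requires X > 0

X_yj = pt_yj.fit_transform(X)
X_bc = pt_bc.fit_transform(X)

# One lambda is estimated per feature by maximum likelihood
print("Yeo-Johnson lambda:", pt_yj.lambdas_)
print("Box-Cox lambda:", pt_bc.lambdas_)

skew_raw = skew(X.ravel())
skew_yj = skew(X_yj.ravel())
skew_bc = skew(X_bc.ravel())
print(f"skew raw: {skew_raw:.3f}, yeo-johnson: {skew_yj:.3f}, box-cox: {skew_bc:.3f}")
```

For log-normal data the Box-Cox lambda should land near 0, which is the logarithm case mentioned above.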

Let’s see how it works.

import seaborn as sns
diamonds = sns.load_dataset("diamonds")
diamonds[["price", "carat"]].hist(figsize=(10, 5));

# Both are heavily skewed. Let's fix that using a power transform

from sklearn.preprocessing import PowerTransformer
pt = PowerTransformer()
# Assign whole columns so the integer price column can become float
diamonds[["price", "carat"]] = pt.fit_transform(diamonds[["price", "carat"]])
diamonds[["price", "carat"]].hist(figsize=(10, 5));

Here we are: both features are now close to normally distributed.
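To put a number on "close to normal", and to get back to the original units when needed, something like the following could work. It is a minimal sketch on synthetic data standing in for a price-like column (the `prices` array is hypothetical), using `scipy.stats.skew` and the transformer's `inverse_transform` method.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer

# Hypothetical price-like column: positive and right-skewed
rng = np.random.default_rng(7)
prices = rng.lognormal(mean=3.0, sigma=0.8, size=(2000, 1))

pt = PowerTransformer()
transformed = pt.fit_transform(prices)

skew_before = skew(prices.ravel())
skew_after = skew(transformed.ravel())
print(f"skew before: {skew_before:.3f}, after: {skew_after:.3f}")

# Undo the standardization and the power transform
recovered = pt.inverse_transform(transformed)
round_trip_ok = np.allclose(recovered, prices, rtol=1e-6)
print("round trip recovers the original values:", round_trip_ok)
```

A skewness close to 0 after the transform is a reasonable sign that the long tail is gone.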

This can help a linear model fit better, although how much it gains depends on the model and the data.

It's great to know advanced techniques like this, thanks to sklearn. Next we will look at another advanced Linear Regression variant called the Theil-Sen Regressor.

Thanks again for your time. If you enjoyed this short article, there are tons of topics in advanced analytics, data science, and machine learning available in my Medium repo: https://medium.com/@bobrupakroy

Some of my alternative internet presences: Facebook, Instagram, Udemy, Blogger, Issuu, Slideshare, Scribd and more.

Also available on Quora @ https://www.quora.com/profile/Rupak-Bob-Roy

Let me know if you need anything. Talk Soon.

