Tweedie Regressor: Generalized Linear Model with a Tweedie distribution.

October 19, 2022

Tweedie Regressor: Generalized Linear Model with a Tweedie distribution.

Tweedie Regressor

Generalized Linear Model with a Tweedie distribution.

Hello there. today we will look into an unusual regressor named as Tweedie Regressor.

Sklearn states Tweedie Regressor as a Generalized Linear Model with a Tweedie distribution.

Wiki, In probability and statistics, the Tweedie distributions are a family of probability distributions which include the purely continuous normal, gamma and Inverse Gaussian distributions, the purely discrete scaled Poisson distribution, and the class of compound Poisson–gamma distributions which have positive mass at zero, but are otherwise continuous.[1] Tweedie distributions are a special case of exponential dispersion models and are often used as distributions for generalized linear models.[2]

This tweedie estimator can be used to model different GLMs depending on the power parameter, which determines the underlying distribution.

Parameter: power float, default=0

The power determines the underlying target distribution according to the following table:

Power Distribution
0      Normal
1      Poisson
(1,2)  Compound Poisson Gamma
2      Gamma
3      Inverse Gaussian

Let’s apply the Tweedie Regressor with the help of an example

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import sklearn
import seaborn as sns

from pandas import read_csv
from pandas.plotting import scatter_matrix

df = pd.read_csv("1000_Companies.csv")

sns.displot(df.Profit)
sns.heatmap(df.corr(),annot=True)

for x in ["Profit"]:
    q75,q25 = np.percentile(df.loc[:,x],[75,25])
    intr_qr = q75-q25
    
    max = q75+(1.5*intr_qr)
    min = q25-(1.5*intr_qr)
    
    df.loc[df[x] < min, x] = np.nan
    df.loc[df[x] > max, x] = np.nan

#drop alll the rows that have null values
df.dropna(inplace=True)

#check the datatypes
df.dtypes
#convert the State column
df = pd.get_dummies(df,drop_first=True)

X= df.drop("Profit",axis=1)
y = df["Profit"]

Basic Data Preprocessing is done. Now the model part.

from sklearn.model_selection import train_test_split
X_train,X_test, y_train, y_test = train_test_split(X,y,test_size=0.25, random_state=1)

from sklearn import linear_model

model = linear_model.TweedieRegressor(max_iter=500).fit(X_train,y_train)
model.score(X_train,y_train)

#predict
y_pred = model.predict(X_test)

#Evaluation 
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error

print("Mean Square Error", mean_absolute_error(y_pred,y_test))
print('RMSE',np.sqrt(mean_absolute_error(y_pred,y_test)))
print("Mean Abosulute Error", mean_absolute_error(y_pred,y_test))

#plot to check if tweedieregressor follows linear regression

fig, ax = plt.subplots()
ax.scatter(y_test,y_pred,edgecolors=(0,0,0))
ax.plot([y.min(),y.max()],[y.min(),y.max()], 'k--',lw=4)
ax.set_xlabel("Measured")
ax.set_ylabel("Predicted")
plt.show()

sklearn linear model tweedie regressor — sklearn linear tweedie regressor

Well that's it. We can observe from the tweedieregressor power parameters that the tweedie regression can be used for different types of distribution like Poisson, Compound Poisson Gamma, Gamma, Inverse Gaussian including normal distribution which in turn will be very helpful.

Check out the tweedie code implementation at Kaggle:

tweedie_regressor_1000_companies_profit
Explore and run machine learning code with Kaggle Notebooks | Using data from 1000_companies_profitwww.kaggle.com

Thanks, for your time, if you enjoyed this short article there are tons of topics in advanced analytics, data science, and machine learning available in my medium repo. https://medium.com/@bobrupakroy

Some of my alternative internet presences Facebook, Instagram, Udemy, Blogger, Issuu, Slideshare, Scribd and more.

Also available on Quora @ https://www.quora.com/profile/Rupak-Bob-Roy

Let me know if you need anything. Talk Soon.

Search This Blog

Welcome to #bobrupakroy

Tweedie Regressor: Generalized Linear Model with a Tweedie distribution.

Tweedie Regressor

Comments

Post a Comment

Popular Posts

Neural Prophet for Time Series- A deep learning approach for sequential learning time-series data

Condensed Nearest Neighbor Rule Undersampling (CNN) ~ An alternative to oversampling techniques like SMOTE