Tweedie Regressor: Generalized Linear Model with a Tweedie distribution.

 

Hello there. Today we will look at an unusual regressor: the Tweedie Regressor.

Scikit-learn describes the Tweedie Regressor as a Generalized Linear Model (GLM) with a Tweedie distribution.

According to Wikipedia: in probability and statistics, the Tweedie distributions are a family of probability distributions which include the purely continuous normal, gamma and inverse Gaussian distributions, the purely discrete scaled Poisson distribution, and the class of compound Poisson–gamma distributions which have positive mass at zero, but are otherwise continuous.[1] Tweedie distributions are a special case of exponential dispersion models and are often used as distributions for generalized linear models.[2]

The Tweedie estimator can be used to model different GLMs depending on the power parameter, which determines the underlying distribution.
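One way to read the power parameter is through the mean-variance relationship of the Tweedie family. The symbols below (Y for the target, \mu for its mean, \phi for the dispersion and p for the power) are notation chosen here for illustration:

```latex
\operatorname{Var}(Y) = \phi \, \mu^{p}, \qquad \mu = \operatorname{E}[Y], \quad \phi > 0
```

With p = 0 the variance does not depend on the mean (Normal), with p = 1 it grows in proportion to the mean (Poisson), with p = 2 in proportion to the squared mean (Gamma), and with p = 3 in proportion to the cubed mean (Inverse Gaussian).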

Parameter: power (float, default=0)
The power determines the underlying target distribution according to the following table:

Power    Distribution
0        Normal
1        Poisson
(1,2)    Compound Poisson Gamma
2        Gamma
3        Inverse Gaussian
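To get a feel for the table above, here is a minimal sketch that fits one TweedieRegressor per power value. The data is synthetic and made up for this illustration (it is not the company data used later), and the choices alpha=0.0 and link="log" are assumptions made to keep the example simple, not recommendations.

```python
import numpy as np
from sklearn.linear_model import TweedieRegressor

# Synthetic, strictly positive target (an assumption made for illustration):
# gamma noise around a mean that depends log-linearly on two features.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 1, size=(500, 2))
mu = np.exp(0.5 + 1.0 * X_demo[:, 0] - 0.5 * X_demo[:, 1])
y_demo = rng.gamma(shape=4.0, scale=mu / 4.0)

# Fit one model per power value from the table.
# link="log" keeps every prediction positive, which the Poisson, Gamma
# and Inverse Gaussian distributions require.
scores = {}
for power in [0, 1, 1.5, 2, 3]:
    glm = TweedieRegressor(power=power, alpha=0.0, link="log", max_iter=1000)
    glm.fit(X_demo, y_demo)
    # score() returns D^2, the fraction of deviance explained
    scores[power] = glm.score(X_demo, y_demo)
    print(f"power={power}: D^2 = {scores[power]:.3f}")
```

Each D^2 value is computed with the deviance of its own distribution, so the numbers are best read per model rather than as a strict ranking across powers.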

Let’s apply the Tweedie Regressor with the help of an example:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# load the data set
df = pd.read_csv("1000_Companies.csv")

# distribution of the target
sns.displot(df.Profit)
plt.show()

# correlation between the numeric columns
sns.heatmap(df.corr(numeric_only=True), annot=True)
plt.show()

# mark outliers in Profit with the 1.5 * IQR rule
for x in ["Profit"]:
    q75, q25 = np.percentile(df.loc[:, x], [75, 25])
    intr_qr = q75 - q25

    # avoid the names max and min, which would shadow the Python built-ins
    upper = q75 + (1.5 * intr_qr)
    lower = q25 - (1.5 * intr_qr)

    df.loc[df[x] < lower, x] = np.nan
    df.loc[df[x] > upper, x] = np.nan

# drop all the rows that have null values
df.dropna(inplace=True)
# check the datatypes
df.dtypes
# convert the State column to dummy variables
df = pd.get_dummies(df, drop_first=True)
X = df.drop("Profit", axis=1)
y = df["Profit"]
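The outlier step above relies on the 1.5 * IQR rule. As a small, self-contained sketch of that rule, the helper iqr_bounds and the toy values below are hypothetical and only serve to show the arithmetic; they are not taken from 1000_Companies.csv.

```python
import numpy as np
import pandas as pd

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences of the k * IQR rule."""
    q25, q75 = np.percentile(values, [25, 75])
    iqr = q75 - q25
    return q25 - k * iqr, q75 + k * iqr

# Hypothetical toy data with one obvious outlier
toy = pd.DataFrame({"Profit": [10.0, 12.0, 11.0, 13.0, 12.0, 95.0]})
lower, upper = iqr_bounds(toy["Profit"])

# Keep only the rows that fall inside the fences
filtered = toy[toy["Profit"].between(lower, upper)]
print(lower, upper)
print(filtered)
```

Filtering with a boolean mask gives the same result as setting the outliers to NaN and calling dropna, just in one step.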

Basic data preprocessing is done. Now the model part.

from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn.metrics import mean_absolute_error, mean_squared_error

# hold out 25% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

# default power=0, i.e. a Normal target distribution
model = linear_model.TweedieRegressor(max_iter=500).fit(X_train, y_train)
# score returns D^2, the fraction of deviance explained (equal to R^2 for power=0)
model.score(X_train, y_train)

# predict
y_pred = model.predict(X_test)

# evaluation
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error", mse)
print("RMSE", np.sqrt(mse))
print("Mean Absolute Error", mean_absolute_error(y_test, y_pred))

# plot predicted against measured values; points close to the dashed
# diagonal are predicted well
fig, ax = plt.subplots()
ax.scatter(y_test, y_pred, edgecolors=(0, 0, 0))
ax.plot([y.min(), y.max()], [y.min(), y.max()], 'k--', lw=4)
ax.set_xlabel("Measured")
ax.set_ylabel("Predicted")
plt.show()
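The three error metrics are easy to mix up, so here is a tiny sketch with hypothetical numbers (not results from the model above) that can be checked by hand.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical measured and predicted values, chosen so the arithmetic is easy:
# the errors are -10, 10, -30 and 0
y_true_demo = np.array([100.0, 200.0, 300.0, 400.0])
y_pred_demo = np.array([110.0, 190.0, 330.0, 400.0])

mse = mean_squared_error(y_true_demo, y_pred_demo)   # mean of the squared errors
rmse = np.sqrt(mse)                                   # back in the unit of the target
mae = mean_absolute_error(y_true_demo, y_pred_demo)  # mean of the absolute errors

print("MSE", mse)
print("RMSE", rmse)
print("MAE", mae)
```

RMSE is the square root of the mean squared error, not of the mean absolute error, and it weights the single large error of 30 more heavily than MAE does.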

Well, that's it. As the power parameter shows, Tweedie regression covers several target distributions with a single estimator: Normal, Poisson, Compound Poisson Gamma, Gamma and Inverse Gaussian. That makes it very helpful when the target does not follow a normal distribution.
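As a last illustration of why the choice of power matters, here is a minimal sketch on a target with exact zeros, which is the compound Poisson-gamma case. The data is synthetic and invented for this example, and alpha=0.0 and link="log" are assumptions made for simplicity. A power between 1 and 2 accepts the zeros, while the Gamma setting (power=2) is expected to reject them because it needs strictly positive targets.

```python
import numpy as np
from sklearn.linear_model import TweedieRegressor

rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 1, size=(400, 1))
mu = np.exp(0.2 + 1.0 * X_demo[:, 0])

# Compound Poisson-gamma sample: a Poisson number of gamma-distributed amounts,
# so many targets are exactly zero and the rest are positive and continuous.
counts = rng.poisson(lam=1.0, size=400)
y_demo = np.array([
    rng.gamma(shape=2.0, scale=m / 2.0, size=n).sum()
    for m, n in zip(mu, counts)
])
print("share of exact zeros:", np.mean(y_demo == 0))

# A power between 1 and 2 allows zeros in the target
cpg = TweedieRegressor(power=1.5, alpha=0.0, link="log", max_iter=1000)
cpg.fit(X_demo, y_demo)

# The Gamma distribution (power=2) requires y > 0, so fitting should fail here
try:
    TweedieRegressor(power=2, link="log").fit(X_demo, y_demo)
    gamma_failed = False
except ValueError as err:
    gamma_failed = True
    print("power=2 rejected the data:", err)
```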

Check out the tweedie code implementation at Kaggle:

Thanks for your time. If you enjoyed this short article, there are tons of topics in advanced analytics, data science, and machine learning available in my Medium repo: https://medium.com/@bobrupakroy

Some of my alternative internet presences: Facebook, Instagram, Udemy, Blogger, Issuu, Slideshare, Scribd and more.

Also available on Quora @ https://www.quora.com/profile/Rupak-Bob-Roy

Let me know if you need anything. Talk soon.

