
More Rigorous EDA Pipelines with Pingouin
Are your exploratory analyses just histograms and scatter plots? Time to level up.
What is Pingouin?
Pingouin is a Python library that bridges SciPy and pandas, enabling statistically rigorous EDA pipelines.
What You Can Validate with Pingouin
- Univariate normality: Shapiro-Wilk test via pg.normality()
- Homoscedasticity: Levene test via pg.homoscedasticity()
- Advanced correlations: with p-values and robust statistics
- Outliers and statistical tests: complete with a single function
Quick Example
import pingouin as pg
import pandas as pd
df = pd.read_csv("wine-quality.csv")
# Normality test
normality = pg.normality(df[['pH', 'alcohol', 'fixed acidity']])
print(normality)

The Golden Rule: Garbage In, Garbage Out (GIGO)
Feeding a model data that violates its mathematical assumptions is a reliable recipe for an ineffective model. Pingouin helps you detect these violations before modeling.
Explanation in a nutshell
Pingouin is a Python statistics library that extends SciPy with a pandas-like API, making statistical validation in EDA pipelines straightforward. It lets you check normality, variance homogeneity, and correlations with p-values in just a few lines: essential validation before training any ML model.
More information at the link.
Also published on LinkedIn.

