
Learning Machine Learning: Predicting Customer Churn



The goal: 🔍 identify which customers are most likely to cancel a service. We use classification algorithms such as:

  • 🧠 Logistic regression – ideal for binary problems.
  • 🌲 Random Forests – usually more accurate, but “heavier”.
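The two algorithms can be compared head to head. A minimal sketch, using a synthetic dataset from scikit-learn in place of real customer data (the feature count and class weights are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic churn-like dataset: 1000 customers, 10 features, 20% churners
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.8, 0.2], random_state=42)

for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=100, random_state=42)):
    # 5-fold cross-validated F1 on the churn class
    scores = cross_val_score(model, X, y, cv=5, scoring='f1')
    print(f'{model.__class__.__name__}: mean F1 = {scores.mean():.3f}')
```

Cross-validation gives a fairer comparison than a single split, which matters when the "heavier" random forest is only slightly ahead.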

💡 A common challenge: imbalanced data.

When churned customers are far fewer than those who stay, the model can "cheat" by predicting that everyone stays and still look accurate. Techniques to address this:

  • 🔁 Oversampling (replicate minority cases)
  • ➖ Undersampling (reduce majority cases)
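Both techniques can be sketched with scikit-learn's `resample` utility on a toy table (the column names and the 90/10 split are made up for illustration):

```python
import pandas as pd
from sklearn.utils import resample

# Toy frame: 90 customers who stay (0) vs 10 who churn (1)
df = pd.DataFrame({'churn': [0] * 90 + [1] * 10, 'tenure': range(100)})

majority = df[df['churn'] == 0]
minority = df[df['churn'] == 1]

# Oversampling: replicate minority rows (with replacement) up to majority size
minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=42)
balanced_up = pd.concat([majority, minority_up])

# Undersampling: keep only a random subset of the majority instead
majority_down = resample(majority, replace=False,
                         n_samples=len(minority), random_state=42)
balanced_down = pd.concat([majority_down, minority])

print(balanced_up['churn'].value_counts())    # 90 vs 90
print(balanced_down['churn'].value_counts())  # 10 vs 10
```

Resampling should be applied only to the training split, never to the test set, so the evaluation still reflects the real class balance.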

And, of course, we always go through the usual preparation steps:

  • 🧼 clean missing values.
  • 🔤 encode categorical variables.
  • 📊 split into training / testing.
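The three steps above can be sketched on a tiny made-up customer table (column names are illustrative assumptions, not the Telco schema):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy customer table with a missing value and a categorical column
df = pd.DataFrame({
    'tenure':   [1, 34, None, 45],
    'contract': ['Month-to-month', 'One year', 'Month-to-month', 'Two year'],
    'churn':    [1, 0, 1, 0],
})

# 1) clean missing values (here: fill with the column median)
df['tenure'] = df['tenure'].fillna(df['tenure'].median())

# 2) encode categorical variables as dummy (one-hot) columns
X = pd.get_dummies(df.drop(columns=['churn']), drop_first=True)
y = df['churn']

# 3) split into training / testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
```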

Then you train and evaluate using:

  • 📉 confusion matrix
  • 🏅 metrics such as F1‑score
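Both tools come from `sklearn.metrics`. A minimal sketch with hypothetical labels (1 = churned, 0 = stayed):

```python
from sklearn.metrics import confusion_matrix, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows = actual class, columns = predicted class:
# [[true negatives, false positives],
#  [false negatives, true positives]]
print(confusion_matrix(y_true, y_pred))  # [[3 1] [1 3]]

# F1 balances precision and recall on the churn class
print(f1_score(y_true, y_pred))  # 0.75
```

The confusion matrix shows *how* the model errs (missed churners vs false alarms); the F1-score compresses that trade-off into a single number.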

You can practice with a public dataset, for example the Telco Customer Churn from Kaggle.

💻 Code example

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# load and basic preprocessing
df = pd.read_csv('Telco-Customer-Churn.csv')

# TotalCharges is read as text (it contains blank strings); make it numeric
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce').fillna(0)

# extract the target first, then one-hot encode the remaining features
y = (df['Churn'] == 'Yes').astype(int)
X = pd.get_dummies(df.drop(columns=['customerID', 'Churn']), drop_first=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))

📘 Explanation in a few words

Imagine you’re a detective 📂: you have a list of customers with their data (age, plan, billing, etc.) and a label indicating whether they left or not.

The model learns patterns in that data to guess, when a new customer arrives, whether they will churn.

Imbalanced data is like having only two suspects and a hundred innocents; the trick is to "balance the scales" so the detective doesn't fall asleep. Finally, you look at how the model makes mistakes (confusion matrix) and how much those mistakes matter to you (F1‑score).

More in the following external reference.
Also published on LinkedIn.
Juan Pedro Bretti Mandarano