
📊 Accuracy can lie to you. The F1 Score won’t.
If your dataset has imbalanced classes (e.g., 95% negative, 5% positive), a model that always predicts “negative” gets 95% accuracy. But its F1 Score for the positive class is 0, because it never detects a single positive. That’s exactly what you need to know.
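Here is a minimal pure-Python sketch of that scenario. The 1,000-sample dataset and the helper name `accuracy_and_f1` are illustrative assumptions, and F1 is set to 0 when there are no true positives, which is a common convention.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels, treating 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # Convention (assumption): F1 is 0 when there are no true positives.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp > 0 else 0.0
    return accuracy, f1


# Hypothetical dataset: 950 negatives and 50 positives (95% / 5%).
y_true = [0] * 950 + [1] * 50
# A "model" that always predicts negative.
y_pred = [0] * 1000

accuracy, f1 = accuracy_and_f1(y_true, y_pred)
print(f"Accuracy: {accuracy:.2f}")  # Accuracy: 0.95
print(f"F1 Score: {f1:.2f}")        # F1 Score: 0.00
```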
🧮 The three key metrics:
| Metric | Question it answers | Formula |
|---|---|---|
| Precision | Of everything I predicted positive, how much actually is positive? | TP / (TP + FP) |
| Recall | Of all actual positives, how many did I detect? | TP / (TP + FN) |
| F1 Score | How well are the two balanced? | Harmonic mean of Precision and Recall |

TP = true positives, FP = false positives, FN = false negatives.
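The three definitions in the table can be sketched in a few lines of plain Python. The function name `precision_recall_f1` is an illustrative assumption, 1 is treated as the positive class, and undefined ratios (zero denominators) are reported as 0 by convention. The toy labels are the same ones used in the scikit-learn snippet in this post.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = positive)."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1  # predicted positive, actually positive
        elif p == 1 and t == 0:
            fp += 1  # predicted positive, actually negative
        elif p == 0 and t == 1:
            fn += 1  # predicted negative, actually positive

    # Assumption: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"Precision: {precision:.2f}")  # Precision: 0.75
print(f"Recall:    {recall:.2f}")     # Recall:    0.60
print(f"F1 Score:  {f1:.2f}")         # F1 Score:  0.67
```

Here TP = 3, FP = 1 and FN = 2, so precision is 3/4 and recall is 3/5.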
📐 The formula:
$$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

The harmonic mean penalizes extreme values. If precision=0.90 and recall=0.10, F1 ≈ 0.18, not 0.50.
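To see that penalty in numbers, here is a small sketch comparing the arithmetic mean with F1 using Python's standard `statistics` module. The helper name `f1_from` and the example pairs are illustrative assumptions.

```python
from statistics import harmonic_mean, mean


def f1_from(precision, recall):
    """F1 written out as in the formula: 2 * P * R / (P + R)."""
    if precision + recall == 0:
        return 0.0  # assumption: define F1 as 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (precision, recall) pairs: one lopsided, one balanced.
examples = [(0.90, 0.10), (0.50, 0.50)]

for p, r in examples:
    arithmetic = mean([p, r])
    f1 = f1_from(p, r)
    # The formula above is exactly the harmonic mean of the two values.
    assert abs(f1 - harmonic_mean([p, r])) < 1e-12
    print(f"P={p:.2f} R={r:.2f} -> arithmetic mean={arithmetic:.2f}, F1={f1:.2f}")

# P=0.90 R=0.10 -> arithmetic mean=0.50, F1=0.18
# P=0.50 R=0.50 -> arithmetic mean=0.50, F1=0.50
```

Both pairs share the same arithmetic mean, but only the balanced one keeps an F1 of 0.50.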
🐍 In Python with scikit-learn:
```python
from sklearn.metrics import f1_score, classification_report

y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]

print(f"F1 Score: {f1_score(y_true, y_pred):.2f}")
# F1 Score: 0.67

print(classification_report(y_true, y_pred))
```

📏 How to interpret the result:
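- Below 0.5: Poor model
- 0.5–0.7: Acceptable (may be a good starting point)
- 0.7–0.9: Strong model
- 0.9–1.0: Excellent

Treat these ranges as rough rules of thumb: what counts as a good F1 depends on the problem and on the baseline you are comparing against.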
⚠️ When NOT to use F1?
- When one type of error is far more costly than the other (use Precision or Recall individually)
- When classes are balanced and all errors are equally costly (use accuracy)
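A quick sketch of the first case: two models can share the same F1 while making very different kinds of mistakes. All counts and the 10:1 cost ratio below are hypothetical, chosen only for illustration, and the names `summarize`, `FN_COST` and `FP_COST` are assumptions as well.

```python
# Hypothetical costs: a missed positive (FN) hurts 10x more than a false alarm (FP).
FN_COST = 10
FP_COST = 1


def summarize(name, tp, fp, fn):
    """Return precision, recall, F1 and total error cost for given counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)  # equivalent to 2PR / (P + R)
    cost = fn * FN_COST + fp * FP_COST
    print(f"{name}: precision={precision:.2f} recall={recall:.2f} "
          f"F1={f1:.2f} cost={cost}")
    return precision, recall, f1, cost


# Both hypothetical models are evaluated on the same 100 real positives.
model_a = summarize("Model A", tp=60, fp=20, fn=40)
model_b = summarize("Model B", tp=80, fp=60, fn=20)

# Model A: precision=0.75 recall=0.60 F1=0.67 cost=420
# Model B: precision=0.57 recall=0.80 F1=0.67 cost=260
```

F1 cannot tell these two apart, but once misses are priced higher than false alarms, the higher-recall model is clearly the cheaper one. That is the situation where looking at Precision or Recall directly is more informative.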
Also published on LinkedIn.

