
I Built a Real-Time Customer Churn Prediction Model Using XGBoost: Full Python Code and Deployment Walkthrough


Table of Contents

  1. Introduction: The Business Case for Real-Time Churn
  2. Phase 1: Model Development with Python and XGBoost
       - Data Cleaning and Feature Engineering
       - Handling Imbalanced Data (SMOTE / scale_pos_weight)
       - Hyperparameter Tuning and Evaluation
  3. Phase 2: Deployment for Real-Time Prediction
       - The FastAPI/Flask API Structure
       - Creating the Inference Pipeline
  4. Phase 3: Productionizing with Docker and Scalability
       - Containerization with Docker
       - Real-Time Monitoring and Retraining


1. Introduction: The Business Case for Real-Time Churn

Customer churn—the rate at which customers stop using a product or service—is a critical metric directly impacting a company's revenue. Identifying at-risk customers in real-time allows businesses to deploy proactive retention strategies (like personalized offers or discounts). This project details an end-to-end Machine Learning pipeline using XGBoost, a powerful, optimized gradient boosting framework, to predict churn and deploy the model as a scalable API for immediate business insights.


2. Phase 1: Model Development with Python and XGBoost


Data Cleaning and Feature Engineering

The first step is data preparation. This walkthrough uses the Telco Customer Churn dataset (or similar transactional data) as the working example. Key steps in Python (using Pandas and NumPy) include:

  1. Handling Missing Values: Replacing or removing missing data points.
  2. Encoding Categorical Variables: Converting features like 'Contract Type' or 'Internet Service' into numerical formats using One-Hot Encoding (pd.get_dummies) or Label Encoding.
  3. Feature Engineering: Creating highly predictive features, such as ChargesPerMonth or TenureGroups, which often reveal deeper customer behaviour patterns.
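The three steps above can be sketched on a tiny made-up Telco-style frame; the column names and fill strategy are illustrative assumptions, not a prescription:

```python
import pandas as pd

# Toy frame standing in for the Telco Customer Churn data
df = pd.DataFrame({
    "tenure": [1, 24, 60],
    "TotalCharges": [29.85, " ", 5400.0],   # Telco hides missing values as " "
    "Contract": ["Month-to-month", "One year", "Two year"],
})

# 1. Handle missing values: blank strings -> NaN -> median fill
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].median())

# 2. One-hot encode categorical variables
df = pd.get_dummies(df, columns=["Contract"], drop_first=True)

# 3. Engineered features: average monthly spend and tenure buckets
df["ChargesPerMonth"] = df["TotalCharges"] / df["tenure"].clip(lower=1)
df["TenureGroup"] = pd.cut(df["tenure"], bins=[0, 12, 48, 72],
                           labels=["new", "established", "loyal"])

print(df.columns.tolist())
```

In a real pipeline the fitted encoders (the dummy-column layout, the fill medians) must be saved alongside the model so inference can reproduce them exactly.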


Handling Imbalanced Data (SMOTE / scale_pos_weight)

Customer churn datasets are highly imbalanced (e.g., 80% non-churners vs. 20% churners). To prevent the XGBoost model from being biased toward the majority class:

  1. Use the scale_pos_weight hyperparameter within XGBClassifier to assign a higher penalty for misclassifying the minority class (churners).
  2. Alternatively, apply SMOTE (Synthetic Minority Over-sampling Technique) on the training set using the imblearn library.
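A minimal sketch of both options, using a toy 80/20 label split; the XGBoost and imblearn calls are shown commented out since they assume those libraries and a fitted `X_train`:

```python
# Derive scale_pos_weight as the ratio of negatives to positives
y_train = [0] * 80 + [1] * 20   # toy 80/20 imbalance for illustration

neg, pos = y_train.count(0), y_train.count(1)
scale_pos_weight = neg / pos    # 80 / 20 = 4.0

# Option 1 -- weight the minority class (assumes xgboost is installed):
# from xgboost import XGBClassifier
# model = XGBClassifier(scale_pos_weight=scale_pos_weight)

# Option 2 -- oversample the training set instead (assumes imbalanced-learn):
# from imblearn.over_sampling import SMOTE
# X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

print(scale_pos_weight)
```

Note that SMOTE should only ever be applied to the training split, never the test set, or the evaluation metrics become optimistic.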


Hyperparameter Tuning and Evaluation

The XGBoost model is trained using the processed features. Evaluation focuses not just on Accuracy (which can be misleading in imbalanced data) but on Recall (correctly identifying actual churners) and the F1-Score (the harmonic mean of Precision and Recall). Hyperparameter Tuning (using RandomizedSearchCV or Optuna) is essential to optimize parameters like max_depth, learning_rate, and n_estimators for the best possible F1-Score.
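A hedged sketch of such a search follows; the parameter ranges are illustrative starting points rather than tuned values, and `X_train` / `y_train` are assumed to exist from the earlier steps:

```python
# Candidate values for the parameters named above (lists work directly
# with RandomizedSearchCV; scipy distributions are an alternative)
param_distributions = {
    "max_depth": [3, 4, 5, 6, 8],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "n_estimators": [100, 200, 400, 600],
    "subsample": [0.6, 0.8, 1.0],
}

# Assumes scikit-learn and xgboost are installed:
# from sklearn.model_selection import RandomizedSearchCV
# from xgboost import XGBClassifier
#
# search = RandomizedSearchCV(
#     XGBClassifier(eval_metric="logloss"),
#     param_distributions,
#     n_iter=25,
#     scoring="f1",   # optimize F1, not accuracy, for the imbalanced target
#     cv=5,
#     random_state=42,
# )
# search.fit(X_train, y_train)
# print(search.best_params_)
```

Setting `scoring="f1"` is the key detail: it makes the search select for the metric the business case actually cares about.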


3. Phase 2: Deployment for Real-Time Prediction

To achieve real-time prediction, the trained model must be saved (serialized) using Python's pickle or joblib and served via a low-latency web API.
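The serialization step is a one-liner either way. This sketch uses stdlib pickle so it runs anywhere; `joblib.dump` / `joblib.load` have the same shape and are preferred for large NumPy-backed models. The `model` and `encoders` objects here are placeholders for the trained artifacts:

```python
import pickle

model = {"kind": "stand-in for the trained XGBClassifier"}   # placeholder
encoders = {"Contract": ["Month-to-month", "One year", "Two year"]}

# Save the model and its preprocessing artifacts together
with open("xgb_churn_model.pkl", "wb") as f:
    pickle.dump(model, f)
with open("encoders.pkl", "wb") as f:
    pickle.dump(encoders, f)

# At API startup, load them back
with open("xgb_churn_model.pkl", "rb") as f:
    loaded = pickle.load(f)
```

Whichever format is chosen, model and encoders should be versioned and shipped as a unit, since a model loaded against mismatched encoders fails silently.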


The FastAPI/Flask API Structure

We use FastAPI (or Flask) to build a RESTful API endpoint. The core structure involves:

  1. Loading the Model: Loading the serialized .pkl or .joblib model artifact upon application startup.
  2. Defining the Endpoint: Creating a POST endpoint (e.g., /predict_churn) that accepts new customer data (features) in JSON format.
  3. Inference Pipeline: Within the endpoint function, the raw JSON input is passed through the exact same preprocessing steps (encoding, scaling, etc.) used during training, and then fed to the loaded XGBoost model for a prediction.


Creating the Inference Pipeline (Simplified Python Snippet)


Python


from fastapi import FastAPI
import joblib
import pandas as pd

app = FastAPI()

# Load the model and preprocessing tools (encoders/scalers) once at startup
model = joblib.load("xgb_churn_model.joblib")
encoders = joblib.load("encoders.joblib")

@app.post("/predict_churn")
async def predict_churn(customer_data: dict):
    # 1. Convert the incoming JSON payload to a single-row DataFrame
    df = pd.DataFrame([customer_data])

    # 2. Apply the exact same preprocessing (One-Hot Encoding, etc.)
    # NOTE: apply_preprocessing is your training-time pipeline, imported
    # or defined alongside the app; its logic must be identical to training!
    processed_df = apply_preprocessing(df, encoders)

    # 3. Predict the churn probability (real-time prediction);
    # cast to float so the value is JSON-serializable
    churn_proba = float(model.predict_proba(processed_df)[:, 1][0])

    # 4. Apply the business threshold (e.g., 0.4)
    prediction = 1 if churn_proba >= 0.4 else 0

    return {"churn_probability": churn_proba, "prediction": prediction}


4. Phase 3: Productionizing with Docker and Scalability


Containerization with Docker

For consistent and scalable deployment, the entire application (FastAPI code, XGBoost model file, preprocessing logic, and dependencies) is bundled into a single unit using Docker. The Dockerfile specifies the Python environment, copies the application files, and defines the startup command. This container can then be deployed reliably to cloud services like AWS Fargate, Google Cloud Run, or Kubernetes.
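A minimal Dockerfile for this service might look like the following; the file names (`main.py`, `requirements.txt`, the `.joblib` artifacts) and port are assumptions carried over from this walkthrough, not fixed requirements:

```dockerfile
# Illustrative Dockerfile for the FastAPI churn service
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so the layer is cached between code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Application code and serialized model artifacts
COPY main.py xgb_churn_model.joblib encoders.joblib ./

EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Copying `requirements.txt` before the application code keeps Docker's layer cache effective: dependency installation is only re-run when dependencies actually change.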


Real-Time Monitoring and Retraining

The final step for a production-ready model is MLOps.

  1. Monitoring: Tools like Prometheus or AWS SageMaker Model Monitor are used to track the API's performance and detect data drift (when new customer data features deviate from the training data) or model drift (when the model’s performance degrades over time).
  2. Retraining: Based on monitoring alerts, an automated pipeline is triggered to retrain the XGBoost model on the latest customer data, ensuring the predictive accuracy remains high. This guarantees the model continues to provide reliable, real-time insights to the business.
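To make the data-drift idea above concrete, here is an illustrative Population Stability Index (PSI) check; this is not any particular tool's API (Prometheus and SageMaker Model Monitor ship their own drift metrics), just a sketch of the underlying signal:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """PSI between a training-time feature distribution and live traffic."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor the bin proportions to avoid log(0)
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
train_tenure = rng.normal(30, 10, 5000)   # distribution seen at training time
live_tenure = rng.normal(35, 10, 5000)    # shifted mean -> simulated drift

score = psi(train_tenure, live_tenure)
# Common rule of thumb: PSI above ~0.2 suggests drift worth an alert
print(round(score, 3))
```

A scheduled job computing this per feature over recent API traffic, alerting when the score crosses a threshold, is often the simplest trigger for the retraining pipeline.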


Frequently Asked Questions (FAQ)

1. Why is XGBoost often chosen for churn prediction?
Answer: XGBoost is an optimized gradient boosting algorithm known for its high accuracy, efficiency, and ability to automatically handle non-linear relationships and feature interactions in structured datasets, which is common in customer data.
2. What is the role of the scale_pos_weight parameter?
Answer: In imbalanced churn data, scale_pos_weight is an XGBoost hyperparameter that increases the penalty for misclassifying the minority class (churners), forcing the model to pay more attention to correctly identifying the high-risk customers.
3. How is "real-time" prediction achieved during deployment?
Answer: Real-time prediction is achieved by deploying the model via a low-latency web framework like FastAPI or Flask. New customer data is sent to the API, processed instantly, and the prediction is returned in milliseconds, allowing immediate business action.

