homework_churn-in-a-telecom-operator_2026-01.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.2"
},
"colab": {
"provenance": [],
"include_colab_link": true
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/alonsosilvaallende/61173701827d812dee997f39df0647df/homework_churn-in-a-telecom-operator_2026-01.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "IWUt4iKZJLIi"
},
"source": [
"# Churn prediction in a telco\n",
"\n",
"Treselle Systems, a data consulting service, [analyzed customer churn data using logistic regression](http://www.treselle.com/blog/customer-churn-logistic-regression-with-r/).\n",
"\n",
"We will use that dataset to do our analysis. The dataset includes information about:\n",
"\n",
"+ Customers who left within the last month: the column is called Churn\n",
"+ Services that each customer has signed up for: phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies\n",
"+ Customer account information: how long they've been a customer, contract, payment method, paperless billing, monthly charges, and total charges\n",
"+ Demographic info about customers: gender, age range, and if they have partners and dependents"
]
},
{
"cell_type": "code",
"metadata": {
"id": "G9M17t2UJTfc",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "e50c2876-32a3-4046-dad6-1add351b2547"
},
"source": [
"%pip install --quiet lifelines scikit-survival"
],
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
" Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m349.3/349.3 kB\u001b[0m \u001b[31m10.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m4.0/4.0 MB\u001b[0m \u001b[31m32.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m117.3/117.3 kB\u001b[0m \u001b[31m5.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m222.1/222.1 kB\u001b[0m \u001b[31m9.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25h Building wheel for autograd-gamma (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
]
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "4eQkkBwbJLIl"
},
"source": [
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"import numpy as np\n",
"plt.style.use('seaborn-v0_8-bright')"
],
"execution_count": 2,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-09T22:37:18.227285Z",
"start_time": "2020-01-09T22:37:18.131371Z"
},
"id": "So7R4TFbJLI5"
},
"source": [
"churn_data = pd.read_csv(\n",
"'https://raw.githubusercontent.com/treselle-systems/customer_churn_analysis/master/WA_Fn-UseC_-Telco-Customer-Churn.csv')"
],
"execution_count": 3,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-09T22:37:18.287007Z",
"start_time": "2020-01-09T22:37:18.264485Z"
},
"scrolled": false,
"id": "S5CwX51iJLJM",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 707
},
"outputId": "ce3def35-0dd1-4a4e-bce6-06b1c95fbee2"
},
"source": [
"churn_data.head().T"
],
"execution_count": 4,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" 0 1 2 \\\n",
"customerID 7590-VHVEG 5575-GNVDE 3668-QPYBK \n",
"gender Female Male Male \n",
"SeniorCitizen 0 0 0 \n",
"Partner Yes No No \n",
"Dependents No No No \n",
"tenure 1 34 2 \n",
"PhoneService No Yes Yes \n",
"MultipleLines No phone service No No \n",
"InternetService DSL DSL DSL \n",
"OnlineSecurity No Yes Yes \n",
"OnlineBackup Yes No Yes \n",
"DeviceProtection No Yes No \n",
"TechSupport No No No \n",
"StreamingTV No No No \n",
"StreamingMovies No No No \n",
"Contract Month-to-month One year Month-to-month \n",
"PaperlessBilling Yes No Yes \n",
"PaymentMethod Electronic check Mailed check Mailed check \n",
"MonthlyCharges 29.85 56.95 53.85 \n",
"TotalCharges 29.85 1889.5 108.15 \n",
"Churn No No Yes \n",
"\n",
" 3 4 \n",
"customerID 7795-CFOCW 9237-HQITU \n",
"gender Male Female \n",
"SeniorCitizen 0 0 \n",
"Partner No No \n",
"Dependents No No \n",
"tenure 45 2 \n",
"PhoneService No Yes \n",
"MultipleLines No phone service No \n",
"InternetService DSL Fiber optic \n",
"OnlineSecurity Yes No \n",
"OnlineBackup No No \n",
"DeviceProtection Yes No \n",
"TechSupport Yes No \n",
"StreamingTV No No \n",
"StreamingMovies No No \n",
"Contract One year Month-to-month \n",
"PaperlessBilling No Yes \n",
"PaymentMethod Bank transfer (automatic) Electronic check \n",
"MonthlyCharges 42.3 70.7 \n",
"TotalCharges 1840.75 151.65 \n",
"Churn No Yes "
],
"text/html": [
"\n",
" <div id=\"df-fadcf37d-e7ec-4e4c-9250-13f1a9acf39d\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" <th>3</th>\n",
" <th>4</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>customerID</th>\n",
" <td>7590-VHVEG</td>\n",
" <td>5575-GNVDE</td>\n",
" <td>3668-QPYBK</td>\n",
" <td>7795-CFOCW</td>\n",
" <td>9237-HQITU</td>\n",
" </tr>\n",
" <tr>\n",
" <th>gender</th>\n",
" <td>Female</td>\n",
" <td>Male</td>\n",
" <td>Male</td>\n",
" <td>Male</td>\n",
" <td>Female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>SeniorCitizen</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Partner</th>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Dependents</th>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>tenure</th>\n",
" <td>1</td>\n",
" <td>34</td>\n",
" <td>2</td>\n",
" <td>45</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>PhoneService</th>\n",
" <td>No</td>\n",
" <td>Yes</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>Yes</td>\n",
" </tr>\n",
" <tr>\n",
" <th>MultipleLines</th>\n",
" <td>No phone service</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No phone service</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>InternetService</th>\n",
" <td>DSL</td>\n",
" <td>DSL</td>\n",
" <td>DSL</td>\n",
" <td>DSL</td>\n",
" <td>Fiber optic</td>\n",
" </tr>\n",
" <tr>\n",
" <th>OnlineSecurity</th>\n",
" <td>No</td>\n",
" <td>Yes</td>\n",
" <td>Yes</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>OnlineBackup</th>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>DeviceProtection</th>\n",
" <td>No</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>TechSupport</th>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>StreamingTV</th>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>StreamingMovies</th>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Contract</th>\n",
" <td>Month-to-month</td>\n",
" <td>One year</td>\n",
" <td>Month-to-month</td>\n",
" <td>One year</td>\n",
" <td>Month-to-month</td>\n",
" </tr>\n",
" <tr>\n",
" <th>PaperlessBilling</th>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>Yes</td>\n",
" </tr>\n",
" <tr>\n",
" <th>PaymentMethod</th>\n",
" <td>Electronic check</td>\n",
" <td>Mailed check</td>\n",
" <td>Mailed check</td>\n",
" <td>Bank transfer (automatic)</td>\n",
" <td>Electronic check</td>\n",
" </tr>\n",
" <tr>\n",
" <th>MonthlyCharges</th>\n",
" <td>29.85</td>\n",
" <td>56.95</td>\n",
" <td>53.85</td>\n",
" <td>42.3</td>\n",
" <td>70.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>TotalCharges</th>\n",
" <td>29.85</td>\n",
" <td>1889.5</td>\n",
" <td>108.15</td>\n",
" <td>1840.75</td>\n",
" <td>151.65</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Churn</th>\n",
" <td>No</td>\n",
" <td>No</td>\n",
" <td>Yes</td>\n",
" <td>No</td>\n",
" <td>Yes</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" <div class=\"colab-df-buttons\">\n",
"\n",
" <div class=\"colab-df-container\">\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-fadcf37d-e7ec-4e4c-9250-13f1a9acf39d')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
" <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
" </svg>\n",
" </button>\n",
"\n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" .colab-df-buttons div {\n",
" margin-bottom: 4px;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-fadcf37d-e7ec-4e4c-9250-13f1a9acf39d button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-fadcf37d-e7ec-4e4c-9250-13f1a9acf39d');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
"\n",
"\n",
" </div>\n",
" </div>\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"variable_name": "churn_data"
}
},
"metadata": {},
"execution_count": 4
}
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-09T22:37:18.251731Z",
"start_time": "2020-01-09T22:37:18.229450Z"
},
"id": "ku717ihdJLJe",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "38012f88-82de-461a-a724-ebf71dce1078"
},
"source": [
"N = churn_data.shape[0]\n",
"churned = len(churn_data.query(\"Churn == 'Yes'\"))\n",
"notChurned = len(churn_data.query(\"Churn == 'No'\"))\n",
"\n",
"print(f'customers: {N}\\n')\n",
"print(f\"customers who churned: {churned}\")\n",
"print(f\"customers who haven't churned yet: {notChurned}\\n\")\n",
"print(f'percentage of customers who churned: {100*churned/len(churn_data):.0f}%')\n",
"print(f\"percentage of customers who haven't churned yet: {100*notChurned/len(churn_data):.0f}%\")"
],
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"customers: 7043\n",
"\n",
"customers who churned: 1869\n",
"customers who haven't churned yet: 5174\n",
"\n",
"percentage of customers who churned: 27%\n",
"percentage of customers who haven't churned yet: 73%\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"churn_data[\"Churn\"].unique().tolist()"
],
"metadata": {
"id": "GmZRZJMJME4F",
"outputId": "9b7db33f-fd98-4418-b490-443a1b155fdd",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": 7,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"['No', 'Yes']"
]
},
"metadata": {},
"execution_count": 7
}
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-09T22:37:18.305561Z",
"start_time": "2020-01-09T22:37:18.288618Z"
},
"id": "WZgadAo6JLJp"
},
"source": [
"churn_data['Churn'] = (churn_data['Churn'] == 'Yes')"
],
"execution_count": 8,
"outputs": []
},
{
"cell_type": "code",
"source": [
"churn_data[\"Churn\"].unique().tolist()"
],
"metadata": {
"id": "BxK09LoeMPJw",
"outputId": "4ef465ea-f6cf-470e-fbcc-cb2282aa5945",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": 9,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"[False, True]"
]
},
"metadata": {},
"execution_count": 9
}
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-09T22:37:18.262702Z",
"start_time": "2020-01-09T22:37:18.254982Z"
},
"id": "CEv5C_L7JLJ2"
},
"source": [
"# Drop customerID column\n",
"churn_data = churn_data.drop('customerID', axis=1)\n",
"\n",
"# Drop TotalCharges column: otherwise together with MonthlyCharges you can deduce how many months you have been subscribed\n",
"churn_data = churn_data.drop('TotalCharges', axis=1)"
],
"execution_count": 10,
"outputs": []
},
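{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of why `TotalCharges` carries the tenure information. It uses a small hypothetical frame (`sample`) filled with the five rows displayed by `churn_data.head()` earlier, not the full dataset, and `implied_tenure` is an illustrative column name that does not exist in the original data. Dividing `TotalCharges` by `MonthlyCharges` lands close to `tenure`, which is the duration used as the survival time in the rest of the notebook."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical sample: the five rows shown by churn_data.head() earlier\n",
"sample = pd.DataFrame({\n",
"    'tenure': [1, 34, 2, 45, 2],\n",
"    'MonthlyCharges': [29.85, 56.95, 53.85, 42.30, 70.70],\n",
"    'TotalCharges': [29.85, 1889.50, 108.15, 1840.75, 151.65],\n",
"})\n",
"\n",
"# implied_tenure is an illustrative column, not part of the dataset\n",
"sample['implied_tenure'] = (sample['TotalCharges'] / sample['MonthlyCharges']).round(1)\n",
"\n",
"# Compare the recorded tenure with the value implied by the two charge columns\n",
"sample[['tenure', 'implied_tenure']]"
],
"execution_count": null,
"outputs": []
},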
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-09T22:37:18.691230Z",
"start_time": "2020-01-09T22:37:18.307317Z"
},
"scrolled": false,
"id": "Td3MQtORJLKB",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 641
},
"outputId": "7e7650f0-f49f-4dfa-80d7-cda5f62c9d41"
},
"source": [
"from lifelines import KaplanMeierFitter\n",
"\n",
"kmf = KaplanMeierFitter()\n",
"kmf.fit(churn_data['tenure'], churn_data['Churn'], label='Estimate for Average Customer')\n",
"fig, ax = plt.subplots(figsize=(10,7))\n",
"kmf.plot(ax=ax)\n",
"ax.set_title('Kaplan-Meier Survival Curve - All Customers')\n",
"ax.set_xlabel('Customer Tenure (Months)')\n",
"ax.set_ylabel('Customer Survival Chance (%)')\n",
"plt.show()"
],
"execution_count": 12,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 1000x700 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {}
}
]
},
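{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make explicit what a Kaplan-Meier curve estimates, here is a rough sketch of the product-limit formula $\\hat S(t) = \\prod_{t_i \\le t} (1 - d_i / n_i)$, where $d_i$ is the number of churn events at duration $t_i$ and $n_i$ is the number of customers still at risk just before $t_i$. The helper `kaplan_meier_sketch` and the toy data are hypothetical and only for illustration: this is not how `lifelines` is implemented, it computes no confidence intervals, and it does not add the starting point at duration 0."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import pandas as pd\n",
"\n",
"def kaplan_meier_sketch(durations, events):\n",
"    '''Product-limit estimate of S(t) at each observed duration (illustrative helper).'''\n",
"    df = pd.DataFrame({'duration': durations, 'event': events})\n",
"    # Per distinct duration: churn events (d_i) and customers leaving the risk set (events + censored)\n",
"    counts = df.groupby('duration')['event'].agg(events='sum', removed='count')\n",
"    # Customers still at risk just before each duration (n_i)\n",
"    at_risk = len(df) - counts['removed'].cumsum().shift(1, fill_value=0)\n",
"    # Multiply the conditional survival fractions (1 - d_i / n_i)\n",
"    return (1 - counts['events'] / at_risk).cumprod()\n",
"\n",
"# Toy data: durations in months; False marks a customer who has not churned yet (censored)\n",
"toy_durations = [1, 2, 2, 3, 5]\n",
"toy_events = [True, False, True, True, False]\n",
"kaplan_meier_sketch(toy_durations, toy_events)"
],
"execution_count": null,
"outputs": []
},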
{
"cell_type": "code",
"metadata": {
"id": "OaQL73F8JLKQ"
},
"source": [
"data = pd.DataFrame()\n",
"data = kmf.survival_function_\n",
"data['lower'] = kmf.confidence_interval_['Estimate for Average Customer_lower_0.95']\n",
"data['upper'] = kmf.confidence_interval_['Estimate for Average Customer_upper_0.95']"
],
"execution_count": 13,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "8iBX5KruJLKK"
},
"source": [
"import altair as alt"
],
"execution_count": 15,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"scrolled": true,
"id": "iexNxw4lJLKd",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 385
},
"outputId": "3696c655-3ce3-4898-b45e-43357fc4411f"
},
"source": [
"label = alt.selection_point(\n",
" encodings=['x'], # limit selection to x-axis value\n",
" on='mouseover', # select on mouseover events\n",
" nearest=True, # select data point nearest the cursor\n",
" empty='none' # empty selection includes no data points\n",
")\n",
"\n",
"base = alt.Chart(data.reset_index()).encode(\n",
" x=alt.X('timeline:Q', scale=alt.Scale(zero=False), axis=alt.Axis(title=\"Customer tenure (months)\"))\n",
")\n",
"\n",
"line = base.mark_line(point=False).encode(\n",
" y=alt.Y('Estimate for Average Customer', scale=alt.Scale(zero=False), axis=alt.Axis(title='Customer survival probability')),\n",
" color=alt.value('blue'),\n",
" tooltip = ['timeline', 'Estimate for Average Customer']\n",
")\n",
"\n",
"band = alt.Chart(data.reset_index()).mark_area(\n",
" opacity=0.5\n",
").encode(\n",
" x=alt.X('timeline:Q', scale=alt.Scale(zero=False)),\n",
" y='lower:Q',\n",
" y2='upper:Q'\n",
")\n",
"\n",
"alt.layer(\n",
" line, # base line chart\n",
" band,\n",
" alt.Chart().mark_rule(color='#aaa').encode(\n",
" x = alt.X('timeline:Q', scale=alt.Scale(zero=False), sort=None)\n",
" ).transform_filter(label),\n",
" # add circle marks for selected time points, hide unselected points\n",
" base.mark_circle(size=80).encode(\n",
" y=alt.Y('Estimate for Average Customer', scale=alt.Scale(zero=False), axis=alt.Axis(title='Customer survival probability')),\n",
" opacity=alt.condition(label, alt.value(1), alt.value(0))\n",
" ).add_params(label),\n",
" # add white stroked text to provide a legible background for labels\n",
" base.mark_text(align='left', dx=5, dy=-5, stroke='white', strokeWidth=2).encode(\n",
" text='Estimate for Average Customer:Q'\n",
" ).transform_filter(label),\n",
" # add text labels for stock prices\n",
" base.mark_text(align='left', dx=5, dy=-5).encode(\n",
" text='Estimate for Average Customer:Q'\n",
" ).transform_filter(label),\n",
"\n",
" data=data.reset_index()\n",
").properties(\n",
" title=f'Kaplan-Meier Survival Curve - All Customers',\n",
" width=600\n",
")"
],
"execution_count": 18,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"\n",
"<style>\n",
" #altair-viz-3b1f35d816cb47b8ba6945c24ca5a429.vega-embed {\n",
" width: 100%;\n",
" display: flex;\n",
" }\n",
"\n",
" #altair-viz-3b1f35d816cb47b8ba6945c24ca5a429.vega-embed details,\n",
" #altair-viz-3b1f35d816cb47b8ba6945c24ca5a429.vega-embed details summary {\n",
" position: relative;\n",
" }\n",
"</style>\n",
"<div id=\"altair-viz-3b1f35d816cb47b8ba6945c24ca5a429\"></div>\n",
"<script type=\"text/javascript\">\n",
" var VEGA_DEBUG = (typeof VEGA_DEBUG == \"undefined\") ? {} : VEGA_DEBUG;\n",
" (function(spec, embedOpt){\n",
" let outputDiv = document.currentScript.previousElementSibling;\n",
" if (outputDiv.id !== \"altair-viz-3b1f35d816cb47b8ba6945c24ca5a429\") {\n",
" outputDiv = document.getElementById(\"altair-viz-3b1f35d816cb47b8ba6945c24ca5a429\");\n",
" }\n",
"\n",
" const paths = {\n",
" \"vega\": \"https://cdn.jsdelivr.net/npm/vega@5?noext\",\n",
" \"vega-lib\": \"https://cdn.jsdelivr.net/npm/vega-lib?noext\",\n",
" \"vega-lite\": \"https://cdn.jsdelivr.net/npm/vega-lite@5.20.1?noext\",\n",
" \"vega-embed\": \"https://cdn.jsdelivr.net/npm/vega-embed@6?noext\",\n",
" };\n",
"\n",
" function maybeLoadScript(lib, version) {\n",
" var key = `${lib.replace(\"-\", \"\")}_version`;\n",
" return (VEGA_DEBUG[key] == version) ?\n",
" Promise.resolve(paths[lib]) :\n",
" new Promise(function(resolve, reject) {\n",
" var s = document.createElement('script');\n",
" document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
" s.async = true;\n",
" s.onload = () => {\n",
" VEGA_DEBUG[key] = version;\n",
" return resolve(paths[lib]);\n",
" };\n",
" s.onerror = () => reject(`Error loading script: ${paths[lib]}`);\n",
" s.src = paths[lib];\n",
" });\n",
" }\n",
"\n",
" function showError(err) {\n",
" outputDiv.innerHTML = `<div class=\"error\" style=\"color:red;\">${err}</div>`;\n",
" throw err;\n",
" }\n",
"\n",
" function displayChart(vegaEmbed) {\n",
" vegaEmbed(outputDiv, spec, embedOpt)\n",
" .catch(err => showError(`Javascript Error: ${err.message}<br>This usually means there's a typo in your chart specification. See the javascript console for the full traceback.`));\n",
" }\n",
"\n",
" if(typeof define === \"function\" && define.amd) {\n",
" requirejs.config({paths});\n",
" let deps = [\"vega-embed\"];\n",
" require(deps, displayChart, err => showError(`Error loading script: ${err.message}`));\n",
" } else {\n",
" maybeLoadScript(\"vega\", \"5\")\n",
" .then(() => maybeLoadScript(\"vega-lite\", \"5.20.1\"))\n",
" .then(() => maybeLoadScript(\"vega-embed\", \"6\"))\n",
" .catch(showError)\n",
" .then(() => displayChart(vegaEmbed));\n",
" }\n",
" })({\"config\": {\"view\": {\"continuousWidth\": 300, \"continuousHeight\": 300}}, \"layer\": [{\"data\": {\"name\": \"data-8b7e071a471b915467a66d1d3190ead4\"}, \"mark\": {\"type\": \"line\", \"point\": false}, \"encoding\": {\"color\": {\"value\": \"blue\"}, \"tooltip\": [{\"field\": \"timeline\", \"type\": \"quantitative\"}, {\"field\": \"Estimate for Average Customer\", \"type\": \"quantitative\"}], \"x\": {\"axis\": {\"title\": \"Customer tenure (months)\"}, \"field\": \"timeline\", \"scale\": {\"zero\": false}, \"type\": \"quantitative\"}, \"y\": {\"axis\": {\"title\": \"Customer survival probability\"}, \"field\": \"Estimate for Average Customer\", \"scale\": {\"zero\": false}, \"type\": \"quantitative\"}}}, {\"data\": {\"name\": \"data-8b7e071a471b915467a66d1d3190ead4\"}, \"mark\": {\"type\": \"area\", \"opacity\": 0.5}, \"encoding\": {\"x\": {\"field\": \"timeline\", \"scale\": {\"zero\": false}, \"type\": \"quantitative\"}, \"y\": {\"field\": \"lower\", \"type\": \"quantitative\"}, \"y2\": {\"field\": \"upper\"}}}, {\"mark\": {\"type\": \"rule\", \"color\": \"#aaa\"}, \"encoding\": {\"x\": {\"field\": \"timeline\", \"scale\": {\"zero\": false}, \"sort\": null, \"type\": \"quantitative\"}}, \"transform\": [{\"filter\": {\"param\": \"param_3\", \"empty\": false}}]}, {\"data\": {\"name\": \"data-8b7e071a471b915467a66d1d3190ead4\"}, \"mark\": {\"type\": \"circle\", \"size\": 80}, \"encoding\": {\"opacity\": {\"condition\": {\"param\": \"param_3\", \"value\": 1, \"empty\": false}, \"value\": 0}, \"x\": {\"axis\": {\"title\": \"Customer tenure (months)\"}, \"field\": \"timeline\", \"scale\": {\"zero\": false}, \"type\": \"quantitative\"}, \"y\": {\"axis\": {\"title\": \"Customer survival probability\"}, \"field\": \"Estimate for Average Customer\", \"scale\": {\"zero\": false}, \"type\": \"quantitative\"}}, \"name\": \"view_3\"}, {\"data\": {\"name\": \"data-8b7e071a471b915467a66d1d3190ead4\"}, \"mark\": {\"type\": \"text\", \"align\": \"left\", \"dx\": 5, 
\"dy\": -5, \"stroke\": \"white\", \"strokeWidth\": 2}, \"encoding\": {\"text\": {\"field\": \"Estimate for Average Customer\", \"type\": \"quantitative\"}, \"x\": {\"axis\": {\"title\": \"Customer tenure (months)\"}, \"field\": \"timeline\", \"scale\": {\"zero\": false}, \"type\": \"quantitative\"}}, \"transform\": [{\"filter\": {\"param\": \"param_3\", \"empty\": false}}]}, {\"data\": {\"name\": \"data-8b7e071a471b915467a66d1d3190ead4\"}, \"mark\": {\"type\": \"text\", \"align\": \"left\", \"dx\": 5, \"dy\": -5}, \"encoding\": {\"text\": {\"field\": \"Estimate for Average Customer\", \"type\": \"quantitative\"}, \"x\": {\"axis\": {\"title\": \"Customer tenure (months)\"}, \"field\": \"timeline\", \"scale\": {\"zero\": false}, \"type\": \"quantitative\"}}, \"transform\": [{\"filter\": {\"param\": \"param_3\", \"empty\": false}}]}], \"data\": {\"name\": \"data-8b7e071a471b915467a66d1d3190ead4\"}, \"params\": [{\"name\": \"param_3\", \"select\": {\"type\": \"point\", \"encodings\": [\"x\"], \"nearest\": true, \"on\": \"mouseover\"}, \"views\": [\"view_3\"]}], \"title\": \"Kaplan-Meier Survival Curve - All Customers\", \"width\": 600, \"$schema\": \"https://vega.github.io/schema/vega-lite/v5.20.1.json\", \"datasets\": {\"data-8b7e071a471b915467a66d1d3190ead4\": [{\"timeline\": 0.0, \"Estimate for Average Customer\": 1.0, \"lower\": 1.0, \"upper\": 1.0}, {\"timeline\": 1.0, \"Estimate for Average Customer\": 0.9459613196814566, \"lower\": 0.9404183713158356, \"upper\": 0.9510021209572403}, {\"timeline\": 2.0, \"Estimate for Average Customer\": 0.9278349382636628, \"lower\": 0.9215059899590076, \"upper\": 0.9336721186843941}, {\"timeline\": 3.0, \"Estimate for Average Customer\": 0.9137245217943558, \"lower\": 0.9068570856457402, \"upper\": 0.9201081780961634}, {\"timeline\": 4.0, \"Estimate for Average Customer\": 0.9010445125469171, \"lower\": 0.8937333460911131, \"upper\": 0.9078789401526071}, {\"timeline\": 5.0, \"Estimate for Average Customer\": 
0.8911105161984236, \"lower\": 0.8834732306073576, \"upper\": 0.8982764957239141}, {\"timeline\": 6.0, \"Estimate for Average Customer\": 0.8848262389332724, \"lower\": 0.8769910175693204, \"upper\": 0.8921935526017892}, {\"timeline\": 7.0, \"Estimate for Average Customer\": 0.8767129454802709, \"lower\": 0.8686302744761023, \"upper\": 0.8843320072393812}, {\"timeline\": 8.0, \"Estimate for Average Customer\": 0.8699329889878812, \"lower\": 0.8616495182497433, \"upper\": 0.8777563360010985}, {\"timeline\": 9.0, \"Estimate for Average Customer\": 0.862394006792432, \"lower\": 0.8538928337872115, \"upper\": 0.8704388125380358}, {\"timeline\": 10.0, \"Estimate for Average Customer\": 0.854915161098529, \"lower\": 0.8462039691721595, \"upper\": 0.8631736733837074}, {\"timeline\": 11.0, \"Estimate for Average Customer\": 0.8496909604294861, \"lower\": 0.8408359247763904, \"upper\": 0.8580958350839235}, {\"timeline\": 12.0, \"Estimate for Average Customer\": 0.8431995538158302, \"lower\": 0.834168685811462, \"upper\": 0.8517833206931235}, {\"timeline\": 13.0, \"Estimate for Average Customer\": 0.8366025632774321, \"lower\": 0.8273958732708122, \"upper\": 0.84536517439355}, {\"timeline\": 14.0, \"Estimate for Average Customer\": 0.8323737381892569, \"lower\": 0.8230555387776636, \"upper\": 0.841249734571825}, {\"timeline\": 15.0, \"Estimate for Average Customer\": 0.8257817372660968, \"lower\": 0.8162924499579037, \"upper\": 0.8348317518565246}, {\"timeline\": 16.0, \"Estimate for Average Customer\": 0.8207255621855252, \"lower\": 0.811106906401148, \"upper\": 0.8299071739485888}, {\"timeline\": 17.0, \"Estimate for Average Customer\": 0.8159762043807567, \"lower\": 0.8062374084341788, \"upper\": 0.82528003664297}, {\"timeline\": 18.0, \"Estimate for Average Customer\": 0.8115314860636578, \"lower\": 0.8016809173680022, \"upper\": 0.8209489863094623}, {\"timeline\": 19.0, \"Estimate for Average Customer\": 0.8079531388287515, \"lower\": 0.798012367175139, \"upper\": 
0.8174622790431181}, {\"timeline\": 20.0, \"Estimate for Average Customer\": 0.8045199101935029, \"lower\": 0.7944926763694108, \"upper\": 0.8141168446808157}, {\"timeline\": 21.0, \"Estimate for Average Customer\": 0.8012361554580191, \"lower\": 0.7911262275701932, \"upper\": 0.8109170130349665}, {\"timeline\": 22.0, \"Estimate for Average Customer\": 0.7959622948540777, \"lower\": 0.7857205720282491, \"upper\": 0.8057769198289992}, {\"timeline\": 23.0, \"Estimate for Average Customer\": 0.7933831548159159, \"lower\": 0.7830768790035613, \"upper\": 0.8032632537959522}, {\"timeline\": 24.0, \"Estimate for Average Customer\": 0.7887363983705965, \"lower\": 0.7783129555299131, \"upper\": 0.798735202136176}, {\"timeline\": 25.0, \"Estimate for Average Customer\": 0.7840035684299431, \"lower\": 0.7734606163306005, \"upper\": 0.7941233735135383}, {\"timeline\": 26.0, \"Estimate for Average Customer\": 0.78087089567383, \"lower\": 0.7702488090051803, \"upper\": 0.791070790797879}, {\"timeline\": 27.0, \"Estimate for Average Customer\": 0.7781086312809701, \"lower\": 0.7674160577861241, \"upper\": 0.7883797833200772}, {\"timeline\": 28.0, \"Estimate for Average Customer\": 0.7755170954565537, \"lower\": 0.7647577267893169, \"upper\": 0.7858557012640619}, {\"timeline\": 29.0, \"Estimate for Average Customer\": 0.7722365662879559, \"lower\": 0.761392420324585, \"upper\": 0.7826607509843774}, {\"timeline\": 30.0, \"Estimate for Average Customer\": 0.7686799211927888, \"lower\": 0.7577433426203282, \"upper\": 0.7791973520011836}, {\"timeline\": 31.0, \"Estimate for Average Customer\": 0.7650647304993484, \"lower\": 0.7540337514754943, \"upper\": 0.7756773411800988}, {\"timeline\": 32.0, \"Estimate for Average Customer\": 0.7607086532205094, \"lower\": 0.7495639740543699, \"upper\": 0.7714358919002775}, {\"timeline\": 33.0, \"Estimate for Average Customer\": 0.7574498034209116, \"lower\": 0.7462199150029253, \"upper\": 0.7682629264281162}, {\"timeline\": 34.0, \"Estimate for 
Average Customer\": 0.7546129127713944, \"lower\": 0.7433083428939619, \"upper\": 0.7655012642310856}, {\"timeline\": 35.0, \"Estimate for Average Customer\": 0.7510069256125633, \"lower\": 0.7396066912924588, \"upper\": 0.7619915810379179}, {\"timeline\": 36.0, \"Estimate for Average Customer\": 0.7485454148763702, \"lower\": 0.7370785621208633, \"upper\": 0.7595970312942879}, {\"timeline\": 37.0, \"Estimate for Average Customer\": 0.7448039349619592, \"lower\": 0.7332355622236266, \"upper\": 0.7559575770131214}, {\"timeline\": 38.0, \"Estimate for Average Customer\": 0.7415060973752753, \"lower\": 0.7298477058195078, \"upper\": 0.7527501715335171}, {\"timeline\": 39.0, \"Estimate for Average Customer\": 0.7378977951982666, \"lower\": 0.7261404450815181, \"upper\": 0.7492412557752938}, {\"timeline\": 40.0, \"Estimate for Average Customer\": 0.7344973445291503, \"lower\": 0.7226464865355982, \"upper\": 0.7459347086154924}, {\"timeline\": 41.0, \"Estimate for Average Customer\": 0.7307675792685739, \"lower\": 0.718813324657747, \"upper\": 0.7423087300192401}, {\"timeline\": 42.0, \"Estimate for Average Customer\": 0.7269600816467797, \"lower\": 0.7148991560020901, \"upper\": 0.7386082428763483}, {\"timeline\": 43.0, \"Estimate for Average Customer\": 0.7228012711110432, \"lower\": 0.7106230187730818, \"upper\": 0.7345670985744338}, {\"timeline\": 44.0, \"Estimate for Average Customer\": 0.7211052180697188, \"lower\": 0.7088784590135658, \"upper\": 0.7329196547767036}, {\"timeline\": 45.0, \"Estimate for Average Customer\": 0.7193787091677161, \"lower\": 0.7071016548595497, \"upper\": 0.7312434901038949}, {\"timeline\": 46.0, \"Estimate for Average Customer\": 0.7158480161165857, \"lower\": 0.7034660682549528, \"upper\": 0.7278176757076321}, {\"timeline\": 47.0, \"Estimate for Average Customer\": 0.7116211615296467, \"lower\": 0.6991114342385604, \"upper\": 0.7237184770494878}, {\"timeline\": 48.0, \"Estimate for Average Customer\": 0.7088401843460741, \"lower\": 
0.696245083070984, \"upper\": 0.7210227332102614}, {\"timeline\": 49.0, \"Estimate for Average Customer\": 0.7040913666751537, \"lower\": 0.6913482448254978, \"upper\": 0.7164216030931225}, {\"timeline\": 50.0, \"Estimate for Average Customer\": 0.7008511855123595, \"lower\": 0.6880058448875696, \"upper\": 0.7132833750752133}, {\"timeline\": 51.0, \"Estimate for Average Customer\": 0.6981876180614814, \"lower\": 0.685256130132329, \"upper\": 0.7107056284968808}, {\"timeline\": 52.0, \"Estimate for Average Customer\": 0.6954455950155849, \"lower\": 0.6824228100028451, \"upper\": 0.7080544290948518}, {\"timeline\": 53.0, \"Estimate for Average Customer\": 0.690470511556097, \"lower\": 0.6772766645524837, \"upper\": 0.7032492874895002}, {\"timeline\": 54.0, \"Estimate for Average Customer\": 0.6857136929815187, \"lower\": 0.6723539178814393, \"upper\": 0.6986572790007199}, {\"timeline\": 55.0, \"Estimate for Average Customer\": 0.6823209369414782, \"lower\": 0.668840380360719, \"upper\": 0.6953844208282612}, {\"timeline\": 56.0, \"Estimate for Average Customer\": 0.6784330683549172, \"lower\": 0.6648106650609784, \"upper\": 0.6916372461214441}, {\"timeline\": 57.0, \"Estimate for Average Customer\": 0.6751927910135208, \"lower\": 0.6614472891464104, \"upper\": 0.6885188721810138}, {\"timeline\": 58.0, \"Estimate for Average Customer\": 0.6705796725656021, \"lower\": 0.6566540260819743, \"upper\": 0.6840840058943622}, {\"timeline\": 59.0, \"Estimate for Average Customer\": 0.6671029147039528, \"lower\": 0.6530376784372129, \"upper\": 0.6807452682464185}, {\"timeline\": 60.0, \"Estimate for Average Customer\": 0.6644039143747397, \"lower\": 0.6502267928533968, \"upper\": 0.6781567791866754}, {\"timeline\": 61.0, \"Estimate for Average Customer\": 0.6606262091046632, \"lower\": 0.6462841968422708, \"upper\": 0.6745416370566342}, {\"timeline\": 62.0, \"Estimate for Average Customer\": 0.6581445178608437, \"lower\": 0.6436882696632222, \"upper\": 0.6721723630788525}, 
{\"timeline\": 63.0, \"Estimate for Average Customer\": 0.6560568270825379, \"lower\": 0.6414987023418927, \"upper\": 0.6701847193256479}, {\"timeline\": 64.0, \"Estimate for Average Customer\": 0.6538497393547579, \"lower\": 0.6391766092494745, \"upper\": 0.668090317858698}, {\"timeline\": 65.0, \"Estimate for Average Customer\": 0.6485434745628794, \"lower\": 0.6335747882063961, \"upper\": 0.6630731098587361}, {\"timeline\": 66.0, \"Estimate for Average Customer\": 0.640381746422204, \"lower\": 0.6249410515971489, \"upper\": 0.6553729251824221}, {\"timeline\": 67.0, \"Estimate for Average Customer\": 0.6335980414812913, \"lower\": 0.6177440126424799, \"upper\": 0.6489931821153638}, {\"timeline\": 68.0, \"Estimate for Average Customer\": 0.626857636784682, \"lower\": 0.6105561099292072, \"upper\": 0.6426893869172589}, {\"timeline\": 69.0, \"Estimate for Average Customer\": 0.6201353028781441, \"lower\": 0.6033383740719858, \"upper\": 0.6364493769082435}, {\"timeline\": 70.0, \"Estimate for Average Customer\": 0.6096568261782064, \"lower\": 0.5920089857499321, \"upper\": 0.6267982298776328}, {\"timeline\": 71.0, \"Estimate for Average Customer\": 0.6027809973115349, \"lower\": 0.5844705956903173, \"upper\": 0.6205645127398534}, {\"timeline\": 72.0, \"Estimate for Average Customer\": 0.5927901520522275, \"lower\": 0.5730628910042918, \"upper\": 0.6119362091926619}]}}, {\"mode\": \"vega-lite\"});\n",
"</script>"
],
"text/plain": [
"alt.LayerChart(...)"
]
},
"metadata": {},
"execution_count": 18
}
]
},
{
"cell_type": "code",
"source": [
"from lifelines import KaplanMeierFitter\n",
"kmf = KaplanMeierFitter()\n",
"fig, ax = plt.subplots(figsize=(10,7))\n",
"for r in churn_data['gender'].unique():\n",
" ix = churn_data['gender'] == r\n",
" kmf.fit(churn_data['tenure'].loc[ix], churn_data['Churn'].loc[ix], label=r)\n",
" # kmf.survival_function_.plot(ax=ax)\n",
" kmf.plot(ax=ax)\n",
"# kmf.fit(churn_data['tenure'], churn_data['Churn'], label='Estimate for Average Customer')\n",
"# kmf.plot(ax=ax)\n",
"ax.set_title('Kaplan-Meier Survival Curve - All Customers')\n",
"ax.set_xlabel('Customer Tenure (Months)')\n",
"ax.set_ylabel('Customer Survival Chance (%)')\n",
"plt.show()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 641
},
"id": "gDmUO9iWQbnH",
"outputId": "52f738e9-782d-40fe-e527-16203e085da9"
},
"execution_count": 19,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 1000x700 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": [
"## Exercise 1\n",
"\n",
"Find interesting features that separate the Kaplan-Meier Survival Curve"
],
"metadata": {
"id": "p7iYfGtPLDb1"
}
},
{
"cell_type": "markdown",
"source": [
"# Exercise 2\n",
"\n",
"Consider only these features: gender, SeniorCitizen, InternetService, Contract"
],
"metadata": {
"id": "OQbZkfE-W0Vc"
}
},
{
"cell_type": "markdown",
"source": [
"https://scikit-survival.readthedocs.io/en/stable/api/generated/sksurv.datasets.get_x_y.html"
],
"metadata": {
"id": "ckDU5maLWByj"
}
},
{
"cell_type": "markdown",
"source": [
"Example:\n",
"```python\n",
"from sksurv.datasets import get_x_y\n",
"X, y = get_x_y(data, attr_labels=[\"Churn\", \"tenure\"], pos_label=True)\n",
"```"
],
"metadata": {
"id": "316E_eyLWEIJ"
}
}
]
}