The telecom industry is highly competitive, and retaining customers is a challenge in itself. Understanding why customers drop one provider for a competitor's service is key to retaining them and to running telecom services profitably.
The data shows a general tendency: the longer a customer stays with a telecom service, the more likely they are to stay even longer. Similarly, a longer stay means the customer has paid the company more in total, and customers with higher total spending are more likely to stay. Notably, once a customer has spent more than $8K with the telecom operator, the chance of that customer migrating is almost nil.
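One way to check the tenure pattern described above is to bucket customers by tenure and compare the churn rate in each bucket. The following is a minimal sketch with pandas, assuming hypothetical column names `tenure` (in months) and `Churn` (Yes/No). The rows are invented for illustration and are not the data behind this article, and the bin edges are an arbitrary choice.

```python
import pandas as pd

# Toy rows, invented for illustration only (not the article's dataset).
df = pd.DataFrame({
    "tenure": [1, 3, 5, 10, 14, 20, 30, 45, 60, 70],
    "Churn":  ["Yes", "Yes", "No", "Yes", "No", "No", "Yes", "No", "No", "No"],
})

# Bucket tenure into ranges of months (bin edges are an arbitrary choice).
bins = [0, 12, 24, 48, 72]
labels = ["0-12", "13-24", "25-48", "49-72"]
df["tenure_band"] = pd.cut(df["tenure"], bins=bins, labels=labels)

# Share of customers in each band who churned.
churn_rate = (
    df.assign(churned=df["Churn"].eq("Yes"))
      .groupby("tenure_band", observed=True)["churned"]
      .mean()
)
print(churn_rate)
```

The same grouping could be applied to a total-charges column to look at the spending pattern.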
The graph below shows the counts of subscribers who left versus those who stayed for each feature. This gives a broad picture of which features contribute to a higher churn rate.
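The counts behind a graph of this kind can be tabulated with a cross-tabulation of each feature against the churn flag. This is a sketch only: the column names `Contract`, `gender` and `Churn` are assumptions, and the rows are made up so that the example runs.

```python
import pandas as pd

# Toy rows, invented for illustration only (not the article's dataset).
df = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year",
                 "Two year", "Month-to-month", "Two year"],
    "gender":   ["Female", "Male", "Male", "Female", "Female", "Male"],
    "Churn":    ["Yes", "Yes", "No", "No", "No", "No"],
})

# For each feature, count customers who left ("Yes") versus stayed ("No").
counts = {
    feature: pd.crosstab(df[feature], df["Churn"])
    for feature in ["Contract", "gender"]
}

for feature, table in counts.items():
    print(table, end="\n\n")
```

A feature whose rows show very different Yes/No proportions is a candidate driver of churn; one with similar proportions in every row is probably not.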
Interestingly, gender and having dependents or a partner do not seem to have a significant impact on customer churn.
Luckily, machine learning models can be built and trained to predict the probability that a customer will leave. This gives telecom companies an opportunity to proactively address customer issues before it is too late. Many ideas are presented in other articles, like here, and other articles describe data models used to predict telecom customer churn.
With the data and models I used to predict the probability of a customer moving on, the machine learning model using logistic regression proved to be the best.
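To make the approach concrete, here is a minimal sketch of a logistic regression churn model in scikit-learn, with categorical columns replaced by dummy (0/1) variables. It is not the exact pipeline used for this article: the data is synthetic and generated only so the code runs, the column names `tenure`, `Contract` and `Churn` are assumptions, and the 70/30 train/test split is an assumed choice.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data, generated only so the sketch runs; not the article's dataset.
rng = np.random.default_rng(42)
n = 400
tenure = rng.integers(1, 73, size=n)
contract = rng.choice(["Month-to-month", "One year", "Two year"], size=n)
# Made-up rule: short tenure and month-to-month contracts churn more often.
p_churn = 0.6 - 0.007 * tenure + 0.25 * (contract == "Month-to-month")
churn = (rng.random(n) < np.clip(p_churn, 0.02, 0.95)).astype(int)

df = pd.DataFrame({"tenure": tenure, "Contract": contract, "Churn": churn})

# Dummy variable substitution: one 0/1 column per category level,
# dropping the first level to avoid a redundant column.
X = pd.get_dummies(df.drop(columns="Churn"), columns=["Contract"],
                   drop_first=True, dtype=int)
y = df["Churn"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

model = LogisticRegression(C=1.0, solver="liblinear", max_iter=100, random_state=42)
model.fit(X_train, y_train)

# Probability of churn (class 1) for each held-out customer.
churn_probability = model.predict_proba(X_test)[:, 1]
print(churn_probability[:5])
```

Customers with a high predicted probability are the ones a retention team would want to contact first.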
For those who are not familiar with machine learning models, or who think of them as machines with the intelligence of a human being, check out here.
The metrics of the models used are compared here. Logistic regression with dummy variable substitution has the best model metrics and can be used for prediction on new customer data.
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1, penalty='l2', random_state=42, solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
Classification report for dataset with dummy variable replacement:

             precision    recall  f1-score   support

          0       0.85      0.90      0.87      1539
          1       0.68      0.57      0.62       574

avg / total       0.80      0.81      0.81      2113
Accuracy score: 0.811168954094
Area under curve: 0.737099071074
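The figures above can be reproduced from a single confusion matrix. The four counts below are not printed in the report. They are inferred here as the integer counts consistent with the reported supports (1539 and 574), accuracy and area under curve, so treat them as an assumption. The sketch also shows that the reported area under curve equals the mean of the two per-class recalls, which is what an ROC AUC reduces to when it is computed from hard 0/1 predictions rather than predicted probabilities. This suggests the score was computed from predicted labels.

```python
# Confusion-matrix counts inferred from the reported metrics (an assumption,
# not figures printed in the report). Class 1 = churned, class 0 = stayed.
tn, fp = 1384, 155   # actual class 0 (support 1539)
fn, tp = 244, 330    # actual class 1 (support 574)

total = tn + fp + fn + tp

accuracy = (tp + tn) / total

precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)
f1_0 = 2 * precision_0 * recall_0 / (precision_0 + recall_0)

# ROC AUC of hard 0/1 predictions equals the mean of the two recalls.
auc_from_labels = (recall_0 + recall_1) / 2

print(f"class 0: precision={precision_0:.2f} recall={recall_0:.2f} f1={f1_0:.2f}")
print(f"class 1: precision={precision_1:.2f} recall={recall_1:.2f} f1={f1_1:.2f}")
print(f"accuracy={accuracy:.6f} auc={auc_from_labels:.6f}")
```

Read this way, the model catches a little over half of the customers who actually churn (recall 0.57 for class 1), which matters more for retention work than the overall accuracy of 0.81.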