On screen, Al Pacino looked pitch-perfect as the blind man in ‘Scent of a Woman’. He did justice to the role, and the audience responded with thunderous applause. That is post-production success for you. During pre-production, the hunt for the right actor for a specific role begins with the casting process. Individuals go through auditions, where their performances are evaluated along with the chemistry they create with other characters.
In Data Science, this screening process is known as Exploratory Data Analysis (EDA). It is about ‘screening the data’ to learn more about it and about how data in combination can solve a business problem.
How does EDA augment Machine Learning?
Exploratory Data Analysis is about applying techniques to data to gain insights before applying any formal modelling techniques. EDA describes data by means of statistical and visualization techniques, bringing out its important aspects.
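As a small illustration of the statistical side of EDA, the sketch below summarizes one numeric column using only Python's standard library. The function name `summarize_column` and the toy `monthly_bill` values are hypothetical and are not taken from any real dataset; in practice a data-frame library would usually do this work.

```python
import statistics


def summarize_column(values):
    """Return basic descriptive statistics for a numeric column.

    Missing entries are represented as None and are excluded from the
    statistics, but they are counted so that gaps in the data are visible.
    """
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "min": min(present),
        "max": max(present),
    }


# Hypothetical toy column: monthly bill amounts with one missing entry.
monthly_bill = [20.0, 35.0, None, 50.0, 35.0]
summary = summarize_column(monthly_bill)
print(summary)
```

Even a summary this simple surfaces missing values and the spread of a column, which is the kind of check that is easy to skip when going straight to modelling.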
By performing EDA, we look at the data from different angles and gain insights without prior assumptions, insights that cannot be gained by merely glancing at a large dataset. We unearth what the data actually is and check whether it is what it is claimed to be. You never know in advance what the data will yield, or how crucial those insights can be in addressing a business problem.
By revealing what the data actually is and whether it is pertinent to the problem at hand, EDA also plays a pivotal role in choosing the right ML model for a specific business problem. It offers the context needed to select an appropriate model, suggesting which algorithm ought to be chosen or providing enough clarity to make a start on the problem.
If EDA is skipped and we jump right into modelling, all we can do is feed a model with data and wait for the results to surprise us. And it can indeed surprise us, by performing poorly!
To find out why the model underperformed, we then have to retrace our steps and perform EDA after all, to learn which ML model suits the scenario or what went wrong with the data.
Through Exploratory Data Analysis, we capture critical information that would otherwise have been missed, information that strengthens the analysis in the long run, from framing questions to presenting results.
A good data exploration exercise allows the data scientist to:
- Verify or confirm certain relationships in data.
- Find unexpected relationships in data.
- Deliver data-driven insights to business stakeholders by confirming that they are asking the right questions and not biasing the investigation with their assumptions.
How does EDA uncover the ‘Why’?
A Telecom customer was fighting the churn battle. They needed a predictive model to identify customers who were likely to churn, so that action could be taken to reduce churn. Before that, it was essential to use historical data to understand what type of customers were churning and why.
Saksoft’s domain expertise and rich experience in working with Telecom clients put us on the right Exploratory Data Analysis path. As attention turned towards the columns, what proved significant was the focus on Service Request Analysis, on churn and its dimensions, and on churn based on measures. Observations were captured using Tableau, as shown in the following images.
Exploratory Data Analysis was pivotal in unearthing striking features that led to customer churn. Some of the key findings included:
- Customers who raised many service requests churned
- Product ‘Y’ had the highest number of service requests
- Customers who gave a satisfaction score of 7 to 9 once their service requests (SRs) were addressed accounted for 63% of the overall churn
- Customers were retained when service resolutions were offered within 2000 minutes
These findings paved the way to select the right ML algorithm for this customer’s churn problem and to sharpen the accuracy of the ML model, supporting insightful decisions for improving customer retention.
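A finding such as "customers who raised many service requests churned" typically comes from comparing churn rates across groups. The sketch below shows one way such a comparison might be computed. The records, the field names `service_requests` and `churned`, and the threshold of three requests are all made up for illustration and do not come from the customer's data or from the Tableau analysis described above.

```python
from collections import defaultdict


def churn_rate_by_group(records, key):
    """Compute the share of churned customers within each group.

    `key` maps a customer record to a group label.
    """
    totals = defaultdict(int)
    churned = defaultdict(int)
    for rec in records:
        group = key(rec)
        totals[group] += 1
        if rec["churned"]:
            churned[group] += 1
    return {group: churned[group] / totals[group] for group in totals}


def sr_bucket(rec):
    """Bucket a customer by service request count (threshold is an assumption)."""
    return "3+ requests" if rec["service_requests"] >= 3 else "0-2 requests"


# Hypothetical toy records, purely illustrative.
customers = [
    {"service_requests": 0, "churned": False},
    {"service_requests": 1, "churned": False},
    {"service_requests": 2, "churned": True},
    {"service_requests": 3, "churned": False},
    {"service_requests": 4, "churned": True},
    {"service_requests": 5, "churned": True},
]

rates = churn_rate_by_group(customers, sr_bucket)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%} churned")
```

The same helper can be reused with a different `key` function, for example to group by product or by satisfaction score, which mirrors how several dimensions of churn are compared during exploration.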
How can EDA-augmented ML enhance business?
When Exploratory Data Analysis augments ML, staying within the right business context becomes easier. With a problem to be solved, EDA helps stakeholders learn whether the questions they asked were the right ones.
It all comes down to exploring, analyzing and retaining the data that adds value to the business context, and to verifying assumptions about that data. This helps build accurate models with good results. Ignoring this crucial step leaves machine learning models standing on a weak, shaky foundation of assumptions. Hoping that the models will work magic on their own only leads to unpleasant surprises. Most of the time spent on data goes into the steps taken before modelling starts.