When you perform exploratory data analysis for a client, it’s a great idea to avoid providing insights that are too obvious. Instead, aim for insights that stakeholders least expect and that have the greatest actionable impact. The following three approaches will help you achieve that. For each approach, I’ll also provide examples, real-life use cases, techniques, and visualization types.
Approach #1: Identify Outliers and Anomalies
This involves finding trends and data points that deviate from the norm, such as:
- Sudden changes in trends
- Attributes that are over-performing or under-performing compared to the past
- Metrics that most deviate from performance targets set by stakeholders
A real-life use case might involve identifying cities, counties, products, and salespeople that are significantly over-performing or under-performing relative to the rest of your business.
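As a rough illustration of this use case, the following Python sketch (using pandas) compares each salesperson’s current sales with the prior period and with a stakeholder target. The column names, the figures, and the 25% threshold are all hypothetical; substitute your own data and a threshold that makes sense for your business.

```python
import pandas as pd

# Hypothetical example data: sales per salesperson for the prior and current
# periods, plus a stakeholder target. Names and figures are illustrative only.
sales = pd.DataFrame({
    "salesperson": ["Ana", "Ben", "Chen", "Dara", "Eli"],
    "prior_sales": [100_000, 95_000, 80_000, 120_000, 60_000],
    "current_sales": [104_000, 60_000, 82_000, 180_000, 61_000],
    "target": [110_000, 100_000, 85_000, 125_000, 65_000],
})

# Change relative to the past, as a fraction of prior-period sales.
sales["pct_change_vs_prior"] = (
    sales["current_sales"] - sales["prior_sales"]
) / sales["prior_sales"]

# Deviation from the stakeholder target, as a fraction of the target.
sales["pct_vs_target"] = (
    sales["current_sales"] - sales["target"]
) / sales["target"]

# Flag period-over-period swings beyond an assumed 25% threshold.
THRESHOLD = 0.25
sales["flag"] = "normal"
sales.loc[sales["pct_change_vs_prior"] >= THRESHOLD, "flag"] = "over-performing"
sales.loc[sales["pct_change_vs_prior"] <= -THRESHOLD, "flag"] = "under-performing"

# Largest deviations from target (in either direction) first.
ranked = sales.sort_values("pct_vs_target", key=lambda s: s.abs(), ascending=False)
print(ranked[["salesperson", "pct_change_vs_prior", "pct_vs_target", "flag"]])
```

Sorting by the absolute deviation puts both the biggest beats and the biggest misses at the top, which is usually where the unexpected insights are.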
Visualization types you could use to accomplish this include:
- Stacked bar charts
- Chart matrices
- Cumulative line charts
- Box-and-whisker plots
- Scatter plots
- Heat tables
- And more
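Techniques you could use include:
- Using z-scores to detect outliers
- Using the interquartile range (IQR) to flag values that fall more than 1.5 times the IQR below the first quartile or above the third quartile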
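Here is a minimal pandas sketch of the z-score and interquartile-range techniques. It assumes the common conventions of flagging values with an absolute z-score above 3, or values more than 1.5 times the IQR below the first quartile or above the third quartile; both cutoffs, the function names, and the sample data are illustrative choices rather than fixed rules.

```python
import pandas as pd

def zscore_outliers(values: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Boolean mask: True where |z-score| exceeds the threshold."""
    z = (values - values.mean()) / values.std(ddof=0)
    return z.abs() > threshold

def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask: True outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Hypothetical daily sales with one unusual spike on the last day.
daily_sales = pd.Series([100, 102, 98, 101, 99, 103, 97, 100, 102, 98,
                         101, 99, 100, 103, 97, 100, 101, 99, 98, 250])

print(daily_sales[zscore_outliers(daily_sales)])
print(daily_sales[iqr_outliers(daily_sales)])
```

The IQR method tends to be more robust, because quartiles are not pulled around by extreme values, whereas an outlier inflates the very mean and standard deviation used to compute its own z-score. With 10 or fewer data points, no value can reach an absolute z-score above 3, so that threshold never fires on very small samples.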
Approach #2: Identify Causes and Relationships
This involves finding the drivers of upswings and downswings in your data.
A real-life use case might involve spotting a sharp increase in sales in a country and identifying the specific cities and counties that contributed to it.
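This kind of drill-down can be sketched with a pandas pivot table: compute each city’s change between two periods and its share of the country’s total change. The city names, quarters, and sales figures are hypothetical.

```python
import pandas as pd

# Hypothetical sales for one country, broken down by city and quarter.
df = pd.DataFrame({
    "city": ["Lyon", "Lyon", "Nice", "Nice", "Paris", "Paris"],
    "quarter": ["Q1", "Q2", "Q1", "Q2", "Q1", "Q2"],
    "sales": [200, 210, 150, 300, 500, 540],
})

# One row per city, one column per quarter.
by_city = df.pivot_table(index="city", columns="quarter",
                         values="sales", aggfunc="sum")

# Absolute change per city and its share of the country's total change.
by_city["change"] = by_city["Q2"] - by_city["Q1"]
by_city["share_of_change"] = by_city["change"] / by_city["change"].sum()

# Biggest contributors to the increase first.
ranked = by_city.sort_values("change", ascending=False)
print(ranked)
```

A city’s share of the change can exceed 100% (or turn negative) when other cities declined, so it is worth reading it alongside the absolute change.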
Visualization types include:
- Chart matrices, which let you observe trends across multiple dimensions at once.
- Scatter plots, which are ideal for determining whether two measures are correlated.
- Correlation plots, which let you analyze correlations among many numeric variables at once. Correlation plots are typically created with programming languages such as Python and R.
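The correlation matrix behind a correlation plot takes a single call in pandas (`DataFrame.corr`). The metrics below are made up for illustration, and Pearson correlation is only one choice; `method="spearman"` is an option when relationships are monotonic but not linear.

```python
import pandas as pd

# Hypothetical monthly metrics; the column names are illustrative only.
metrics = pd.DataFrame({
    "ad_spend":    [10, 12, 15, 18, 20, 25],
    "site_visits": [100, 118, 150, 170, 205, 240],
    "returns":     [9, 7, 8, 6, 7, 5],
    "sales":       [50, 58, 74, 86, 99, 121],
})

# Pairwise Pearson correlations between every pair of numeric columns.
corr = metrics.corr(method="pearson")

# Correlations with sales, strongest (by absolute value) first,
# excluding the trivial correlation of sales with itself.
sales_corr = corr["sales"].drop("sales").sort_values(key=abs, ascending=False)
print(sales_corr)
```

Strong correlations point you toward candidate drivers worth investigating; on their own, they do not prove that one variable causes another.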
It’s also worth mentioning that Power BI has an insights feature that can help you identify causes automatically.
Approach #3: Segment and Group Data in New and Creative Ways
This involves transforming your data and creating new columns in your data sets. When you change how your data is presented, you give yourself the chance to find insights your company has never seen before.
General examples include:
- Converting numerical values into categorical groups (in other words, bins)
- Converting categorical values into numerical values
- Converting a dimension field with a high number of attributes into a dimension field with fewer groups of those attributes
- Using clustering methods such as k-means to group together items that have similar attributes (dimensionality-reduction techniques such as PCA can be applied first)
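To show what k-means does under the hood, here is a minimal NumPy sketch of Lloyd’s algorithm: assign each item to its nearest centroid, move each centroid to the mean of its items, and repeat until the centroids stop moving. The `kmeans` function and the sample data (products described by two already-scaled features, such as sales growth and product age) are illustrative; in practice you would more likely use a library implementation such as scikit-learn’s `KMeans`.

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Minimal Lloyd's-algorithm k-means. Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct rows at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Distance from every point to every centroid: shape (n_points, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points
        # (keep the old centroid if a cluster ends up empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical products, each described by two already-scaled features.
X = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],
              [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]])
labels, centroids = kmeans(X, k=2)
print(labels)
```

Because k-means relies on distances, it generally helps to standardize features measured on different scales before clustering; otherwise the feature with the largest range dominates the result.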
A real-life use case might involve grouping your company’s products into categories based on their product life cycle phase or other attributes. Or you may want to place numerical values into bins to make it easier to spot differences across a wide range of numbers.
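The binning, encoding, and regrouping transformations listed earlier can be sketched in pandas as follows. The product data, bin edges, labels, and the choice to keep only the two most frequent channels are all hypothetical.

```python
import pandas as pd

# Hypothetical product data; the column names and values are illustrative.
products = pd.DataFrame({
    "product": ["A", "B", "C", "D", "E", "F"],
    "revenue": [1_200, 48_000, 950_000, 15_500, 310_000, 2_700_000],
    "channel": ["web", "store", "web", "partner", "kiosk", "store"],
})

# 1. Numerical -> categorical: place revenue into labelled bins.
#    The bin edges are assumptions; choose ones that suit your data.
products["revenue_band"] = pd.cut(
    products["revenue"],
    bins=[0, 10_000, 100_000, 1_000_000, float("inf")],
    labels=["<10K", "10K-100K", "100K-1M", ">1M"],
)

# 2. Categorical -> numerical: one-hot encode the sales channel.
channel_dummies = pd.get_dummies(products["channel"], prefix="channel", dtype=int)

# 3. Many attributes -> fewer groups: keep the most frequent channels
#    and lump everything else into "other".
top_channels = products["channel"].value_counts().nlargest(2).index
products["channel_grouped"] = products["channel"].where(
    products["channel"].isin(top_channels), "other"
)

print(products)
print(channel_dummies)
```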
Here are just a few common tools you could use to perform data transformation:
- Tableau’s sets and groups features
- Tableau Prep
- Power BI’s Power Query Editor
- Python packages such as pandas and NumPy
- R’s tidyverse collection of packages