When you perform exploratory data analysis for a client, avoid presenting insights that are too obvious. Instead, aim for insights that stakeholders least expect and that lead to the greatest actionable impact. The following three approaches will help you achieve that. For each approach, I’ll also provide examples, real-life use cases, techniques, and visualization types.

Approach #1: Identify Outliers and Anomalies

This involves finding trends and data points that deviate from the usual.

Examples:

  • Sudden changes in trends
  • Attributes that are over-performing or under-performing compared to the past
  • Metrics that most deviate from performance targets set by stakeholders

A real-life use case might involve identifying cities, counties, products, and salespeople that are over-performing or under-performing by a significant amount relative to the rest of your business.

Visualization types you could use to accomplish this are:

  • Stacked bar charts
  • Chart matrices
  • Cumulative line charts
  • Box-and-whisker plots
  • Histograms
  • Scatter plots
  • Heat tables
  • And more

Analytical approaches:

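  • Using z-scores to detect outliers, for example by flagging values that lie more than about three standard deviations from the mean
  • Using the interquartile range (IQR) to flag values that fall below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR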
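
To make the two analytical approaches concrete, here is a minimal sketch that uses only Python's standard library. The function names, the sample sales figures, the z-score threshold of 3, and the IQR multiplier of 1.5 are my own assumptions for illustration; they are common conventions, not fixed rules.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return (index, value, z) for points whose |z-score| exceeds the threshold."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, so nothing can deviate
    flagged = []
    for i, v in enumerate(values):
        z = (v - mean) / stdev
        if abs(z) > threshold:
            flagged.append((i, v, z))
    return flagged

def iqr_outliers(values, k=1.5):
    """Return (index, value) for points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]

# Hypothetical monthly sales figures with one unusual month (index 8).
monthly_sales = [120, 125, 118, 130, 122, 127, 119, 124, 310, 121, 126, 123]

z_flags = zscore_outliers(monthly_sales)
iqr_flags = iqr_outliers(monthly_sales)
print("z-score outliers:", z_flags)
print("IQR outliers:", iqr_flags)
```

Keep in mind that in small samples a single extreme value inflates the standard deviation, so the z-score test can miss it. The IQR rule is less sensitive to that, which is one reason to try both.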

Approach #2: Identify Causes and Relationships

This involves finding drivers of upswings and downswings in your data.

A real-life use case might involve finding a sharp increase of sales in a country and identifying the specific cities and counties that contribute to those sales.
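
One way to approach a use case like this is to break the overall change down by segment and rank the segments by how much each one moved. The sketch below is only an illustration: the function name, the city names, and the sales figures are hypothetical, and it assumes you already have totals per city for two periods.

```python
def contribution_to_change(previous, current):
    """Rank segments by their absolute change between two periods.

    Returns a list of (segment, delta, share_of_total_change) tuples,
    largest absolute delta first.
    """
    segments = sorted(set(previous) | set(current))
    deltas = {s: current.get(s, 0) - previous.get(s, 0) for s in segments}
    total_change = sum(deltas.values())
    ranked = sorted(deltas.items(), key=lambda item: abs(item[1]), reverse=True)
    return [
        (segment, delta, delta / total_change if total_change else 0.0)
        for segment, delta in ranked
    ]

# Hypothetical sales by city for two consecutive periods.
last_quarter = {"Austin": 100, "Dallas": 200, "Houston": 150}
this_quarter = {"Austin": 180, "Dallas": 205, "Houston": 140}

drivers = contribution_to_change(last_quarter, this_quarter)
for city, delta, share in drivers:
    print(f"{city}: change {delta:+d}, share of total change {share:.0%}")
```

A share above 100% simply means that other segments moved in the opposite direction and offset part of the change.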

Visualization types include:

  • Chart matrices so that you can observe the trends of multiple dimensions at a time.
  • Scatter plots, which are ideal for determining correlations or lack thereof.
  • Correlation plots that can be used to analyze correlations of multiple dimensions at a time. Typically, correlation plots are created with programming languages such as Python and R.
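
The numbers behind a correlation plot are just pairwise correlation coefficients. Here is a small sketch that computes a Pearson correlation matrix with the standard library only. The column names and values are made up for illustration, and in practice a data analysis library would usually do this for you.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / (spread_x * spread_y)

def correlation_matrix(columns):
    """Build a nested dict of pairwise correlations from {name: values}."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical data: sales rise with ad spend, returns tend to fall.
data = {
    "ad_spend": [10, 20, 30, 40, 50],
    "sales": [25, 45, 65, 85, 105],
    "returns": [9, 7, 8, 6, 5],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Remember that a strong correlation only suggests a relationship worth investigating. It does not prove that one dimension causes the other.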

It’s also worth mentioning that Power BI has an insights feature that can help you identify potential causes automatically.

Approach #3: Segment and Group Data in New and Creative Ways

This involves transforming your data and creating new columns in data sets. When you change the presentation of your data, you give yourself the ability to find insights that were never seen before in your company.

General examples include:

  • Converting numerical values into categorical groups (in other words, bins)
  • Converting categorical values into numerical values
  • Converting a dimension field with a high number of attributes into a dimension field with fewer groups of those attributes
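  • Using clustering methods such as k-means to group together items that have similar attributes (PCA is a dimensionality-reduction technique rather than a clustering method, but it can be used to reduce the number of attributes before clustering)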
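
The first three transformations in this list can be sketched in a few lines of plain Python. Everything named here (the helper functions, the bin edges and labels, and the sample data) is hypothetical and only meant to show the idea.

```python
from bisect import bisect_right
from collections import Counter

def bin_values(values, edges, labels):
    """Map numbers to labeled bins; a value equal to an edge goes in the higher bin."""
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return [labels[bisect_right(edges, v)] for v in values]

def encode_categories(categories):
    """Map each distinct category to an integer code (alphabetical order)."""
    codes = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [codes[c] for c in categories], codes

def collapse_rare(categories, top_n, other="Other"):
    """Keep the top_n most frequent categories and group the rest together."""
    keep = {c for c, _ in Counter(categories).most_common(top_n)}
    return [c if c in keep else other for c in categories]

# Hypothetical order values and product codes.
order_values = [35, 100, 250, 499, 500, 1200]
products = ["A", "A", "A", "B", "B", "C", "D"]

size_bins = bin_values(order_values, edges=[100, 500], labels=["low", "mid", "high"])
size_codes, code_map = encode_categories(size_bins)
grouped_products = collapse_rare(products, top_n=2)

print(size_bins)
print(size_codes, code_map)
print(grouped_products)
```

Note that integer codes assigned alphabetically do not carry a meaningful order, so treat them as labels unless you define the mapping yourself.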

A real-life use case might involve grouping your company’s products by product life cycle phase or by attribute. Or you may want to place numerical values into bins to make it easier to spot differences across a wide range of numbers.

Here are just a few common tools you could use to perform data transformation:

  • Tableau’s sets and groups features
  • Tableau Prep
  • Power BI’s Power Query Editor
  • Python packages such as pandas and NumPy
  • R’s tidyverse collection of packages, which can be used for data transformation