If you work in analytics or data science, you’ll regularly need to create new columns in your data to understand it better. Doing that by hand in a tool like Excel quickly becomes time-consuming.

So in this article, I’m going to show you how to create columns in Python with functions.

The Data

In this example, I’m using a sample data set I created. There are columns for customers, revenues, and growth %.

import pandas as pd

CustomerSampleData = pd.read_excel(r'C:\Users\timen\Documents\Data Sets\Customer Sample Data.xlsx')

CustomerSampleData.head()
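If you don’t have the Excel file, a small stand-in DataFrame is enough to follow along. This is only a sketch: the column names match the ones used in this article, but every value below (and the customer “Example Travel Co”) is invented for illustration and is not the original data.

```python
import pandas as pd

# Hypothetical stand-in for the Excel file; all values are invented.
CustomerSampleData = pd.DataFrame({
    'Customers': [
        'Duke Trading Associates',
        'Luxury Restaurant Group',
        'ABC Software',
        'Sterling Energy Corp',
        'Example Travel Co',  # hypothetical customer, not from the original data
    ],
    'Revenues': [12000000, 2500000, 800000, 15000000, 1000000],
    'Growth%': [0.25, 0.10, 0.35, 0.05, 0.20],  # growth stored as a fraction
})

print(CustomerSampleData.head())
```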

String Conditions

First, I’m creating a function called IndustryConditions.

A function is a block of code that only runs when it’s called.

The name inside the parentheses is the function’s parameter. Information is passed into a function as arguments.
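To make the difference between a parameter and an argument concrete, here is a minimal sketch. The function add_suffix is a made-up name for this illustration and is not used anywhere else in the article.

```python
def add_suffix(name):
    # 'name' is the parameter: a placeholder for whatever value gets passed in.
    return name + ' Inc.'

# 'ABC Software' is the argument: the actual value supplied when calling the function.
result = add_suffix('ABC Software')
print(result)  # prints: ABC Software Inc.
```

Nothing inside add_suffix runs until the line that calls it.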

In this function, the name of an industry is returned for each customer.

I’m also creating a new column called “Industries” by applying the function to each row of the data with apply and axis=1.

In the table, there’s a new column with each of these industries.

def IndustryConditions(a):
    if a['Customers'] == 'Duke Trading Associates':
        return 'Financial Services'
    elif a['Customers'] == 'Luxury Restaurant Group':
        return 'Food & Beverage'
    elif a['Customers'] == 'ABC Software':
        return 'Software'
    elif a['Customers'] == 'Sterling Energy Corp':
        return 'Energy'
    else:
        return 'Travel & Hospitality'

CustomerSampleData['Industries'] = CustomerSampleData.apply(IndustryConditions, axis=1)

CustomerSampleData.head()
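When every condition is an exact match on one column, a dictionary lookup can do the same job as the if/elif chain. The sketch below is an alternative, not the approach used in this article: it uses pandas’ Series.map and fillna, and the names df and industry_by_customer, along with the customer “Example Travel Co,” are assumptions made for the example.

```python
import pandas as pd

# Small hypothetical frame so the example runs on its own.
df = pd.DataFrame({
    'Customers': ['Duke Trading Associates', 'ABC Software', 'Example Travel Co'],
})

# Same customer-to-industry pairs as in IndustryConditions.
industry_by_customer = {
    'Duke Trading Associates': 'Financial Services',
    'Luxury Restaurant Group': 'Food & Beverage',
    'ABC Software': 'Software',
    'Sterling Energy Corp': 'Energy',
}

# map() looks up each customer; names missing from the dictionary become NaN,
# and fillna() plays the role of the final "else" branch.
df['Industries'] = df['Customers'].map(industry_by_customer).fillna('Travel & Hospitality')

print(df)
```

Adding a new customer then means adding one line to the dictionary instead of another elif.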

Numerical Conditions

In this section, I’m creating a function called RevenueConditions.

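In this function, I’m categorizing revenues from highest to lowest: 10 million and above is “High,” 1 million and above is “Medium,” and everything else is “Low.” The order matters, because the first condition that is true determines the result.

I’m storing the result in a new column called “RevenueCategories.”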

def RevenueConditions(b):
    if b['Revenues'] >= 10000000:
        return 'High'
    elif b['Revenues'] >= 1000000:
        return 'Medium'
    else:
        return 'Low'

CustomerSampleData['RevenueCategories'] = CustomerSampleData.apply(RevenueConditions, axis=1)

CustomerSampleData.head()
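On larger data sets, the same categories can be computed on the whole column at once instead of row by row. The following is a sketch of that alternative using NumPy’s np.select, which, like the if/elif chain, takes the first condition that is true. The frame df and its revenue values are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical revenues, including values exactly on each threshold.
df = pd.DataFrame({'Revenues': [12000000, 10000000, 2500000, 1000000, 800000]})

# Conditions are checked in order, so the highest threshold comes first.
conditions = [
    df['Revenues'] >= 10000000,
    df['Revenues'] >= 1000000,
]
labels = ['High', 'Medium']

# Rows matching no condition get the default label.
df['RevenueCategories'] = np.select(conditions, labels, default='Low')

print(df)
```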

Numerical Conditions 2

In this function, I’m flagging an opportunity as “High” when revenues are at least 1 million and the growth percentage is at least 20 percent. Everything else is labeled “Other.”

I’m storing the result in a new column called “Opportunity.”

def OpportunityConditions(c):
    if c['Revenues'] >= 1000000 and c['Growth%'] >= .2:
        return 'High'
    else:
        return 'Other'

CustomerSampleData['Opportunity'] = CustomerSampleData.apply(OpportunityConditions, axis=1)

CustomerSampleData.head()
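A two-way split like this one can also be sketched with np.where, combining the two column-level comparisons with & (the element-wise version of “and”). This is an alternative shown for comparison, not the method used in this article, and the frame df with its values is made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data; growth is stored as a fraction (0.2 means 20 percent).
df = pd.DataFrame({
    'Revenues': [12000000, 2500000, 800000, 1000000],
    'Growth%': [0.25, 0.10, 0.35, 0.20],
})

# Each comparison yields a True/False column; the parentheses are required
# because & binds more tightly than >=.
is_high = (df['Revenues'] >= 1000000) & (df['Growth%'] >= 0.2)

# np.where picks 'High' where the mask is True and 'Other' elsewhere.
df['Opportunity'] = np.where(is_high, 'High', 'Other')

print(df)
```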

Code used in this article is here:

https://github.com/TimEnalls/Creating-Columns-with-Python-Functions

Sources:

https://www.w3schools.com/python/python_functions.asp
https://docs.python.org/2.0/ref/calls.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
