If you work in analytics or data science, you’ll often need to create new columns in your data so that you can understand it better. And most likely, doing that by hand in software like Excel is way too time-consuming.
So in this article, I’m going to show you how to create columns in Python with functions.
The Data
In this example, I’m using a sample data set I created. There are columns for customers, revenues, and growth %.
import pandas as pd
CustomerSampleData = pd.read_excel(r'C:\Users\timen\Documents\Data Sets\Customer Sample Data.xlsx')
CustomerSampleData.head()
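If you don’t have the Excel file, a small stand-in DataFrame is enough to follow along. This is only a sketch: the column names and the first four customer names come from this article, but every number, and the fifth customer “Global Travel Inc,” is made up for illustration and is not the actual sample data.

```python
import pandas as pd

# Hypothetical stand-in for the Excel file: the column names match the article,
# but every value (and the "Global Travel Inc" customer) is made up.
CustomerSampleData = pd.DataFrame({
    'Customers': [
        'Duke Trading Associates',
        'Luxury Restaurant Group',
        'ABC Software',
        'Sterling Energy Corp',
        'Global Travel Inc',
    ],
    'Revenues': [12000000, 800000, 2500000, 15000000, 950000],
    'Growth%': [0.25, 0.10, 0.30, 0.05, 0.22],
})

print(CustomerSampleData.head())
```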
String Conditions
First, I’m creating a function called IndustryConditions.
A function is a block of code that only runs when it’s called.
In the parentheses is the name of the function’s parameter. Information is passed into functions as arguments; here, the argument is one row of the data set at a time.
In this function, the name of an industry is returned for each customer.
I’m also creating a new column called “Industries” and I’m applying the function to every row of the data set (that’s what axis=1 does).
In the table, there’s a new column with each of these industries.
# a is a single row of the data set
def IndustryConditions(a):
    if (a['Customers'] == 'Duke Trading Associates'):
        return 'Financial Services'
    elif (a['Customers'] == 'Luxury Restaurant Group'):
        return 'Food & Beverage'
    elif (a['Customers'] == 'ABC Software'):
        return 'Software'
    elif (a['Customers'] == 'Sterling Energy Corp'):
        return 'Energy'
    else:
        # every other customer falls into this category
        return 'Travel & Hospitality'

# axis=1 calls the function once per row
CustomerSampleData['Industries'] = CustomerSampleData.apply(IndustryConditions, axis=1)
CustomerSampleData.head()
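When every condition is a simple equality check like this, a dictionary lookup can do the same job. This is an alternative sketch, not the approach used above: it uses the pandas Series methods map and fillna, and the small DataFrame (including the customer “Global Travel Inc”) is made up so the example runs on its own.

```python
import pandas as pd

# Made-up rows for illustration; only the column name and the first four
# customer names come from the article.
df = pd.DataFrame({
    'Customers': [
        'Duke Trading Associates',
        'Luxury Restaurant Group',
        'ABC Software',
        'Sterling Energy Corp',
        'Global Travel Inc',  # hypothetical customer that hits the fallback
    ]
})

# Lookup table with the same pairs as the if/elif branches.
industry_by_customer = {
    'Duke Trading Associates': 'Financial Services',
    'Luxury Restaurant Group': 'Food & Beverage',
    'ABC Software': 'Software',
    'Sterling Energy Corp': 'Energy',
}

# map() gives NaN for customers missing from the dictionary,
# so fillna() plays the role of the else branch.
df['Industries'] = df['Customers'].map(industry_by_customer).fillna('Travel & Hospitality')

print(df)
```

The dictionary keeps the customer-to-industry pairs in one place, which can be easier to maintain than a long chain of elif statements as the customer list grows.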
Numerical Conditions
In this section, I’m creating a function called RevenueConditions.
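In this function, I’m categorizing revenues from highest to lowest: 10 million or more is “High,” 1 million or more is “Medium,” and anything below that is “Low.” The order of the conditions matters, because the function returns at the first condition that matches.
I’m storing the result in a new column called “RevenueCategories.”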
# b is a single row of the data set
def RevenueConditions(b):
    if (b['Revenues'] >= 10000000):
        return 'High'
    elif (b['Revenues'] >= 1000000):
        return 'Medium'
    else:
        return 'Low'

CustomerSampleData['RevenueCategories'] = CustomerSampleData.apply(RevenueConditions, axis=1)
CustomerSampleData.head()
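For larger data sets, the same thresholds can be checked on the whole column at once instead of row by row. The sketch below is an alternative, not the method used above: it relies on NumPy’s np.select, which picks the label of the first condition that is true for each row. The revenue figures are made up to cover each category and both boundary values.

```python
import numpy as np
import pandas as pd

# Made-up revenues chosen to hit every category, including the boundaries.
df = pd.DataFrame({'Revenues': [12000000, 1000000, 999999, 10000000, 50000]})

# Conditions are checked in order, like the if/elif chain:
# the first one that is True for a row decides its label.
conditions = [
    df['Revenues'] >= 10000000,
    df['Revenues'] >= 1000000,
]
labels = ['High', 'Medium']

# default covers the else branch.
df['RevenueCategories'] = np.select(conditions, labels, default='Low')

print(df)
```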
Numerical Conditions 2
In this function, I’m determining whether an opportunity is high. An opportunity is “High” only when both conditions are true: revenues are equal to or above 1 million, and the growth percentage is equal to or above 20 percent (stored as 0.2). Everything else is “Other.”
I’m storing the result in a new column called “Opportunity.”
# c is a single row of the data set
def OpportunityConditions(c):
    if (c['Revenues'] >= 1000000) and (c['Growth%'] >= 0.2):
        return 'High'
    else:
        return 'Other'

CustomerSampleData['Opportunity'] = CustomerSampleData.apply(OpportunityConditions, axis=1)
CustomerSampleData.head()
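A two-way split like this can also be written without a function. The sketch below is an alternative, not the approach used above: it combines the two column comparisons with & (the element-wise version of and) and passes the result to NumPy’s np.where. The rows are made up so that each combination of conditions appears once.

```python
import numpy as np
import pandas as pd

# Made-up rows: both conditions met, growth too low, revenue too low,
# and both exactly on the threshold.
df = pd.DataFrame({
    'Revenues': [2000000, 2000000, 500000, 1000000],
    'Growth%': [0.25, 0.10, 0.30, 0.20],
})

# Each comparison needs its own parentheses because & binds tighter than >=.
is_high = (df['Revenues'] >= 1000000) & (df['Growth%'] >= 0.2)

# np.where picks 'High' where the mask is True and 'Other' everywhere else.
df['Opportunity'] = np.where(is_high, 'High', 'Other')

print(df)
```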
Code used in this article is here:
• https://github.com/TimEnalls/Creating-Columns-with-Python-Functions
Sources:
• https://www.w3schools.com/python/python_functions.asp
• https://docs.python.org/2.0/ref/calls.html
• https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html