pandas nico rubino data analysis

How to Build Pivot Table in Pandas. Examples for Marketers.

This blog post covers:
      • Generating pivot tables from a CSV file
      • Renaming columns in pivot tables
      • Visualizing pivot tables with Matplotlib

 

 

Introduction

In this blog post, we’ll explore how to build and manipulate marketing pivot tables using pandas in Python and discover how to incorporate plotting libraries for diverse visualizations. As a data scientist, pivot tables offer me unparalleled flexibility, and I must confess, I heavily rely on them during the exploratory data analysis phase of any data science project for my clients.


 

Why Pivot Tables?

Pivot tables are an incredible tool for analysing data swiftly. Whether you want to delve into the intricate details of a specific product category or gain a comprehensive overview of your data, pivot tables provide the flexibility to do so with just a click of your mouse. 

I am 100% sure you’re already familiar with Pivot Tables in Microsoft Excel. Working with pivot tables in Excel you’ve likely encountered and utilized pivot tables extensively in the past, considering them Excel’s most popular feature. However, here’s an exciting fact: you can construct pivot tables in Python using Pandas! Creating a pivot table with Pandas is a pretty easy, and it functions in the same way. You can use them to summarise, sort or group data sorted in a table where you can transform rows to columns and columns to rows (Baig, 2021).

In the next paragraph we will manipulate a sales marketing dataset with the aim to learn from a real example.

 

A Marketing Challenge

 

B2C, EdTech Startup

You are the Growth Marketing Manager at ConnectMe, a fictional B2C SaaS platform aiming to connect professionals who seek to network while maintaining long-term goals. Your manager provided you a dataset named in .CSV named ‘Sales_ConnectMe’. It shows the income of your customers, their age, country, spending score and customer ID.

Here are the challenges when working with this DataFrame:

 

      1. What is the country with the highest level of income?
      2. Rename the columns: ‘Annual Income (k$)’, ‘Spending Score (1-100)’
      3. Generate graphs of income of clients
      4. What is the average salary of our customers?

 

Let’s start working on it! 🙂

 

Step1. Importing Libraries

When using pandas, you should use the following command to import pandas and load a new CSV file. Let’s imagine you work as a Growth Marketing Manager or Analyst for a Startup. The CSV file may contain revenue records of customers, their acquisition costs, product types sold, and information about the advertising campaigns and channels that brought in new customers.

Open Jupyter and insert the following lines of code:

 

import numpy as np, pandas as pd
import matplotlib.pyplot as plt, seaborn as sns 
%matplotlib inline
sales22 = pd.read_csv ('Sales_ConnectMe.csv')

The command ‘sales22’ returns the following output:

 

Source: Govindan, G., Baig, M. R., & Shrimali, V. R. (2021). Data Science for Marketing Analytics

Take a look at the table and you will notice in the dataset 198 rows and 6 columns. The column named ‘Annual Income (k$) and Spending Score (1-100) will be renamed in the next step.

 

Step2. Renaming Columns

Let’s change the name of the columns by applying these commands:

sales22.rename ({'Annual Income (k$)' : 'Income', 'Spending Score (1-100)' : 'Spending Score'}, 
axis=1, inplace=True)

 

 

 

Step3. Build Your Pivot Table

In this step we are building our first Pivot Table. We will want to segment the income of our customers based on their usual residence with the aim to understand where are based the richest customers?

To build a new pivot tables we will need to create a new DataFrame named ‘country_pivot_table’

 

country_pivot_table = sales22.pivot_table(index='Country', values='Income', aggfunc='sum')

country_pivot_table = pivot_table.applymap('{:,.0f}'.format)

 

In the aggfunc function, you may want to use a mean, max or sum. The table above shows Irish-based clients generate the highest number of revenues for our Ed Tech startup. German customers are the least in terms of value.

I applied the following command: ‘country_pivot_table = pivot_table.applymap(‘{:,.0f}’.format)’ to correctly set a formatting number in thousands.

 

Step4. Data Visualisation

In the fourth step of this exercise we will create our first histogram of the dataset. In particular, we will perform a data visualisation of the column ‘Income’ using the Matplotlib and Seaborn library. In this step we perform traditional rule-based segmentation approach to define customer segments.

 

sales22.Income.plot.hist(color='gray')
plt.xlabel('Income')
plt.show()

 

The histogram above shows that the majority of customers lies in the 50-90k range. Customers earning less than 40k are considered low-earning income customers.

 

Step5. Segmentation on Income

What is the the last question for the challenge for our Growth Marketing Manager? The question is: ‘What is the mean of the Moderate Income Clients?’ To answer this question we will build a new column named ‘Cluster’ and also we will apply a GroupBy Function.

 

sales22['Cluster'] = np.where(sales22.Income >= 90, 'High earners', np.where(sales22.Income < 50, 'Low Income', 'Moderate Income'))

sales22.groupby('Cluster')['Income'].describe()

 

 

In the last step we applied the function .describe() to explore that the statistical values of our three segments. In particular, the mean of the moderate income clients that is: €69k – This is a very useful information to improve future marketing campaigns and exclude customers included in the ‘Low Income” cluster.

 

Conclusion

In this blog post, you learned how to import pandas libraries to explore, analyse data manipulation and create pivot tables for marketing analytics. We made use of a practical example such as fictional startup in EdTech

You learned how to launch commands in Jupyter to build pivot tables in Python. We also learned to segment users and performing analysis of our data set through pandas. With pandas as a powerful tool in your arsenal, you are well-equipped to tackle data challenges and drive growth in your marketing initiatives with Google Analytics to drive predictive growth.

Send me an email here if you have any questions!


 


 

Bibliography


Govindan, G., Baig, M. R., & Shrimali, V. R. (2021). Data Science for Marketing Analytics: A Practical Guide to Forming a Killer Marketing Strategy Through Data Analysis with Python. Packt.

Hadley, W., & Grolemund, G. (2016). R for data science: Import, tidy, transform, visualize, and model data. O’Reilly Media.

McKinney, W. (2018). Python for data analysis: Data wrangling with Pandas, NumPy, and IPython. O’Reilly Media.

Paskhaver, B. (2021). Pandas in Action. Manning Publications.

VanderPlas, J. (2016). Python data science handbook: Essential tools for working with data. O’Reilly Media.

Wickham, H., & Grolemund, G. (2017). R packages: Organize, test, document, and share your code. O’Reilly Media.

Zaki, M. J., & Meira Jr, W. (2014). Data mining and analysis: Fundamental concepts and algorithms. Cambridge University Press.

 

2560 1900 Nicola Rubino

Leave a Reply