
How to import CSV files in pandas?

 

Overview

In this blog post, you will learn the steps required to import CSV files into the pandas library using a Jupyter notebook. We will also look at the other data source types pandas can read and write.

Data Source Types Explained for Growth Marketers

You might wonder why you would need to import CSV files into pandas. Teams in marketing, finance and product often have their own preferred data sources. In paid media marketing, for instance, extensive customer data such as demographics and purchase history may live in a MySQL database or in Excel files. A product team that deals with large amounts of textual data, on the other hand, might prefer JSON, CSV or XML. To cover these diverse requirements, pandas provides a family of reader and writer functions that load each format into the same DataFrame structure, so data from different teams can be read, combined and merged in one place.
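As a minimal sketch of that flexibility, the snippet below reads a CSV export and a JSON export into two DataFrames and merges them on a shared key. The column names (customer_id, channel, purchases) and all values are made up for illustration, and io.StringIO stands in for real files so the example runs anywhere.

```python
import io

import pandas as pd

# Hypothetical CSV export, e.g. from a paid-media team (made-up values)
csv_text = """customer_id,channel
1,search
2,social
3,email
"""

# Hypothetical JSON export, e.g. from a product team (made-up values)
json_text = (
    '[{"customer_id": 1, "purchases": 4},'
    ' {"customer_id": 2, "purchases": 1},'
    ' {"customer_id": 3, "purchases": 7}]'
)

# Different readers, same result type: both return a DataFrame
ads = pd.read_csv(io.StringIO(csv_text))
product = pd.read_json(io.StringIO(json_text), orient="records")

# Join the two sources on the shared customer_id column
combined = ads.merge(product, on="customer_id", how="inner")
print(combined)
```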

What Kinds of Data Can You Import into Pandas?

The table below summarises each data source type, the functions used to read and write it, and its format type.

Data Source Type | Read Function | Write Function | Format Type
CSV | read_csv | to_csv | Text
JSON | read_json | to_json | Text
HTML | read_html | to_html | Text
XML | read_xml | to_xml | Text
Excel | read_excel | to_excel | Binary
Stata | read_stata | to_stata | Binary
SAS | read_sas | none (pandas has no SAS writer) | Binary
Clipboard | read_clipboard | to_clipboard | Text
Python Pickle File Format | read_pickle | to_pickle | Binary
Source: adapted from Govindan et al. (2021), Data Science for Marketing Analytics: A Practical Guide to Forming a Killer Marketing Strategy Through Data Analysis with Python, p. 9. Packt.

 

The table shows that pandas can import nine types of data source, stored in either text or binary format (the pandas documentation lists further formats, such as SQL databases and Parquet). To load a CSV file into a DataFrame, use ‘read_csv’; to write a DataFrame to a CSV file, use ‘to_csv’.
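A quick way to see both functions at work is a round trip: write a small DataFrame to CSV text with to_csv, then read it back with read_csv. The year and revenue values are placeholders, and io.StringIO stands in for a file on disk.

```python
import io

import pandas as pd

# A tiny illustrative DataFrame (placeholder values)
df = pd.DataFrame({"year": [2022, 2023], "revenue": [100, 150]})

# With no path argument, to_csv returns the CSV as a string;
# index=False keeps the 0, 1 row index out of the output
csv_text = df.to_csv(index=False)
print(csv_text)

# read_csv accepts any file-like object, so StringIO can replace a real file
df_back = pd.read_csv(io.StringIO(csv_text))
print(df_back.equals(df))
```

In practice you would pass a file name instead, e.g. df.to_csv("sales.csv", index=False) and pd.read_csv("sales.csv").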

 

How can I import a CSV file in pandas?

Let’s assume you work as a Growth Marketing Manager for eThePlay, a fictional EdTech scale-up offering eLearning. The finance team has sent you a CSV file named ‘sales.csv’ with the company’s financial performance over the last five years, including ‘Yearly revenue’, ‘Number of Users’, ‘Cost of acquisition’, ‘ROAS’ and ‘ROI’.
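If you want to follow along without the finance team’s file, the sketch below writes a stand-in sales.csv with the columns listed above. The Year column, the 2019 to 2023 range and every figure are assumed placeholders, not real eThePlay results.

```python
import pandas as pd

# Placeholder figures for the fictional eThePlay example; replace with the real export
sample = pd.DataFrame(
    {
        "Year": [2019, 2020, 2021, 2022, 2023],  # assumed column, not in the original brief
        "Yearly revenue": [100000, 120000, 150000, 170000, 200000],
        "Number of Users": [1000, 1300, 1700, 2000, 2400],
        "Cost of acquisition": [50.0, 48.0, 45.0, 44.0, 42.0],
        "ROAS": [2.0, 2.2, 2.5, 2.6, 2.8],
        "ROI": [0.10, 0.12, 0.15, 0.16, 0.18],
    }
)

# Write the file next to the notebook; index=False leaves out pandas' row index
sample.to_csv("sales.csv", index=False)
```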

Scenario A)

You received the file by email, downloaded it, and saved it in the same folder as your Jupyter notebook (.ipynb) file. Because Jupyter uses the notebook’s folder as its working directory, you can load the file by name alone:

 
import pandas as pd
sales = pd.read_csv("sales.csv")  # the file name must be a quoted string
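The relative name sales.csv only works if the file sits in Python’s current working directory. Below is a minimal sketch of a helper that makes that assumption explicit and, when the name does not match, lists the CSV files it can see. The function name load_from_notebook_folder is hypothetical, not part of pandas.

```python
from pathlib import Path

import pandas as pd


def load_from_notebook_folder(filename="sales.csv"):
    """Read a CSV from the current working directory (hypothetical helper).

    In Jupyter the working directory is normally the folder containing the .ipynb file.
    """
    path = Path.cwd() / filename
    if not path.exists():
        # Listing the CSVs that are present makes typos in the file name easy to spot
        found = sorted(p.name for p in Path.cwd().glob("*.csv"))
        raise FileNotFoundError(f"{filename!r} not found in {Path.cwd()}; CSV files here: {found}")
    return pd.read_csv(path)


print("Working directory:", Path.cwd())
# Usage: sales = load_from_notebook_folder("sales.csv")
```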

Scenario B)

You have saved sales.csv in a different folder from your Jupyter notebook, so you need to pass the file’s full path:

import pandas as pd
sales = pd.read_csv(r"C:\Users\NICO RUBINO\Documents\Business\ThePlay\Finance\FY2023\sales.csv")
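An alternative to typing the full Windows path is pathlib, which builds the path from its parts with the correct separator for your operating system, so no backslash escaping is needed. This sketch assumes the folder sits under your home directory (Path.home(), which on Windows is typically C:\Users\<your user name>); adjust the parts to your own folder layout.

```python
from pathlib import Path

import pandas as pd

# Assumed folder layout, mirroring the path above; change the parts to match yours
finance_dir = Path.home() / "Documents" / "Business" / "ThePlay" / "Finance" / "FY2023"
csv_path = finance_dir / "sales.csv"

# read_csv accepts Path objects directly
if csv_path.exists():
    sales = pd.read_csv(csv_path)
else:
    print(f"File not found: {csv_path}")
```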

Steps Explained:

    • import pandas as pd: imports the pandas library under its conventional alias, pd. Run it once at the top of your notebook or script (and again after restarting the kernel); from then on, every pandas function is available as pd.<name>, for example pd.read_csv.
    • sales: the variable name we chose for the new DataFrame. read_csv returns a DataFrame (often abbreviated df), which we assign to sales. A DataFrame is similar to an Excel spreadsheet: it is the fundamental tabular structure in pandas, storing data in rows and labelled columns. If you are not familiar with this topic, take a look at this blog post for examples and to learn more.
    • read_csv: as shown in the table above, ‘read_csv’ is the pandas function that reads a CSV file into a DataFrame.
    • r: prefixing the string with r makes it a raw string, so Python treats the backslashes in the Windows path as literal characters rather than escape sequences. Without it, the \U in C:\Users is read as the start of a Unicode escape and Python raises a SyntaxError.
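To see why the r prefix matters, compare a normal string with a raw string containing the same characters. The folder names here are made up for illustration; the compile call checks whether a C:\Users path written without the prefix is even valid Python.

```python
# Hypothetical path: in a normal string, \n becomes a newline and \t a tab
plain = "C:\new_folder\table.csv"
# The r prefix keeps every backslash as a literal character
raw = r"C:\new_folder\table.csv"

# The plain string is shorter because each two-character escape collapsed into one character
print(len(plain), len(raw))
# Doubling each backslash is the other way to write a literal backslash
print(raw == "C:\\new_folder\\table.csv")

# Without r, C:\Users starts a \U Unicode escape, which fails to parse
try:
    compile(r'"C:\Users\sales.csv"', "<example>", "eval")
    users_path_parses = True
except SyntaxError:
    users_path_parses = False
print(users_path_parses)
```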

To check that the file has loaded correctly, run these two commands:

sales.head()

The output is a table showing the first five rows (or all rows, if the file has fewer than five).
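head() also takes a row count, and tail() shows the end of the table, which helps spot an export that was cut off. The five-row DataFrame below is a placeholder standing in for sales, with made-up ROI values.

```python
import pandas as pd

# Placeholder DataFrame standing in for sales (made-up values)
sales = pd.DataFrame(
    {"Year": [2019, 2020, 2021, 2022, 2023], "ROI": [0.10, 0.12, 0.15, 0.16, 0.18]}
)

first_rows = sales.head()    # default n=5, so all five rows here
first_three = sales.head(3)  # pass n to show a different number of rows
last_two = sales.tail(2)     # the last rows of the table

print(first_three)
print(last_two)
```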

sales.info()

The output is a summary of the DataFrame: the index range (which gives the number of rows), the number of columns, each column’s name, non-null count and data type, and the memory usage.
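The same checks can be done in code, which is handy when you load files regularly. In this sketch a tiny CSV with one missing value stands in for the real sales.csv (the values are placeholders); note how the gap turns the integer revenue column into float64, a common surprise when reading info() output.

```python
import io

import pandas as pd

# Placeholder CSV with one missing revenue value
csv_text = """Year,Yearly revenue,ROAS
2022,150000,2.5
2023,,2.8
"""
sales = pd.read_csv(io.StringIO(csv_text))

sales.info()  # prints the summary described above

rows, cols = sales.shape       # number of rows and columns
missing = sales.isna().sum()   # count of missing values per column
print(sales.dtypes)            # Yearly revenue is float64 because NaN is a float
print(missing)
```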

Conclusion

In conclusion, pandas is a powerful and versatile library that gives data analysts efficient tools for data manipulation and analysis. In this post from a new, short, hands-on blog series, we looked at the data sources pandas can read and walked through loading a CSV file into a DataFrame.

Stay tuned for more helpful tips!


Bibliography

Govindan, G., Baig, M. R., & Shrimali, V. R. (2021). Data science for marketing analytics: A practical guide to forming a killer marketing strategy through data analysis with Python. Packt.

McKinney, W. (2018). Python for data analysis: Data wrangling with pandas, NumPy, and IPython. O’Reilly Media.

Paskhaver, B. (2021). Pandas in action. Manning Publications.

VanderPlas, J. (2016). Python data science handbook: Essential tools for working with data. O’Reilly Media.

Wickham, H. (2015). R packages: Organize, test, document, and share your code. O’Reilly Media.

Wickham, H., & Grolemund, G. (2016). R for data science: Import, tidy, transform, visualize, and model data. O’Reilly Media.

Zaki, M. J., & Meira Jr, W. (2014). Data mining and analysis: Fundamental concepts and algorithms. Cambridge University Press.

 

Nicola Rubino
