How to import CSV files in pandas?

Nicola Rubino
June 30, 2023

Overview

In this blog post you will learn how to import CSV files into pandas from a Jupyter notebook. We will also look at the most common data source types that pandas can read and write.

Data Source Types Explained for Growth Marketers

You might wonder why you would need to import CSV files into pandas. Teams in marketing, finance, and product often have their own preferred data sources. For instance, a paid media team handling extensive customer data, such as demographic information and purchase history, might rely on MySQL databases or Excel files. A product team that deals with large amounts of textual data might instead opt for JSON, CSV, or XML. To serve these different needs, pandas offers a flexible set of read and write functions. It can read many file types and merge the results, so each team can keep working with its preferred data source.
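
As a rough illustration of reading two different source types and merging them, the sketch below loads a small CSV and a small JSON document from in-memory strings and joins them on a shared column. The column names (customer_id, country, purchases) and all values are made up for this example.

```python
from io import StringIO

import pandas as pd

# Hypothetical data standing in for exports from two different teams
csv_text = "customer_id,country\n1,IT\n2,UK\n"
json_text = '[{"customer_id": 1, "purchases": 3}, {"customer_id": 2, "purchases": 5}]'

# Each source type has its own read function
customers = pd.read_csv(StringIO(csv_text))
purchases = pd.read_json(StringIO(json_text))

# Combine the two sources on the column they share
combined = customers.merge(purchases, on="customer_id")
print(combined)
```

With real files you would pass a file path to each read function instead of a StringIO object.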

What Kind of Data can You Import in Pandas?

The table below summarises the data source types, the functions you use to read and write each one, and the format type.

Data Source Type	Read Function	Write Function	Format Type
CSV	read_csv	to_csv	Text
JSON	read_json	to_json	Text
HTML	read_html	to_html	Text
XML	read_xml	to_xml	Text
Excel	read_excel	to_excel	Binary
Stata	read_stata	to_stata	Binary
SAS	read_sas	(not available)	Binary
Clipboard	read_clipboard	to_clipboard	Text
Python Pickle File Format	read_pickle	to_pickle	Binary

Source: Govindan et al (2021). In Data Science for Marketing Analytics: A Practical Guide to Forming a Killer Marketing Strategy Through Data Analysis with Python (p. 9). Packt.

The table above shows that pandas can import nine types of data source, in both text and binary formats. To load a CSV file into a DataFrame, use ‘read_csv’. To write a DataFrame to a CSV file, use ‘to_csv’.
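
To make the read and write pair concrete, here is a minimal sketch that writes a small DataFrame to a CSV file and reads it back. The column names, the values, and the file name example.csv are assumptions made for this illustration, and a temporary folder is used so that nothing is left on disk.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data, not taken from a real file
df = pd.DataFrame({"year": [2021, 2022, 2023], "users": [1200, 1850, 2400]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "example.csv"
    # index=False keeps the row index out of the file
    df.to_csv(csv_path, index=False)
    # read_csv loads the file back into a new DataFrame
    round_trip = pd.read_csv(csv_path)

print(round_trip)
```

Passing index=False is a choice made here so that the file contains only the two data columns.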

How can I import a CSV file in pandas?

Let’s assume you work as a Growth Marketing Manager for eThePlay, a fictional EdTech scale-up in the eLearning sector. The finance team has sent you a CSV file named ‘sales’ that covers the financial performance of the last 5 years, with columns such as ‘Yearly revenue’, ‘Number of Users’, ‘Cost of acquisition’, ‘ROAS’ and ‘ROI’.

Scenario A)

You received the file by email, downloaded it, and saved it in the same directory as your Jupyter notebook file. Because pandas looks for relative file names in the notebook's working directory, the file name alone is enough:

import pandas as pd
sales = pd.read_csv("sales.csv")  # the file name must be a quoted string
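
If pandas reports that the file cannot be found, it can help to check which folder the notebook is actually running in. The following is a small sketch, assuming the file is called sales.csv as in the example; it only reads the file if it exists in the current working directory.

```python
from pathlib import Path

import pandas as pd

# Folder that relative file names are resolved against
working_dir = Path.cwd()
csv_path = working_dir / "sales.csv"  # file name from the example

if csv_path.exists():
    sales = pd.read_csv(csv_path)
    print(f"Loaded {len(sales)} rows from {csv_path}")
else:
    print(f"sales.csv was not found in {working_dir}")
```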

Scenario B)

You have downloaded the sales.csv file and stored it in a folder other than the one that holds your Jupyter notebook. In this case, pass the full path to the file:

import pandas as pd
sales = pd.read_csv(r"C:\Users\NICO RUBINO\Documents\Business\ThePlay\Finance\FY2023\sales.csv")

Steps Explained:

- import pandas as pd: include this line at the top of your notebook or script. It loads pandas and gives it the short alias ‘pd’, which every later call uses.
- sales: this is the name we chose for the new DataFrame. Importing a CSV file into pandas creates a new DataFrame (often abbreviated df). A DataFrame is like a spreadsheet in Excel: it is the fundamental tabular structure that stores data in rows and columns. If you are not familiar with this topic, take a look at this blog post for examples and to learn more.
- read_csv: as shown in the previous table, ‘read_csv’ is the function that reads a CSV file and loads it into a DataFrame.
- r: the r before the opening quote makes the path a raw string, so Python keeps each backslash as a literal character. Without it, sequences such as \U in a Windows path are read as escape codes and cause an error.
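
The effect of the r prefix can be seen in a short sketch. The path C:\new\table.csv is hypothetical and was chosen only because it contains \n and \t, which Python normally reads as a newline and a tab.

```python
from pathlib import PureWindowsPath

# Hypothetical path, chosen because it contains \n and \t
plain = "C:\new\table.csv"    # \n becomes a newline, \t becomes a tab
raw = r"C:\new\table.csv"     # backslashes are kept as typed
forward = "C:/new/table.csv"  # forward slashes need no prefix

print("\n" in plain)  # the plain string contains a real newline
print("\n" in raw)    # the raw string does not
# The raw and forward-slash spellings describe the same Windows path
print(PureWindowsPath(raw) == PureWindowsPath(forward))
```

Writing the path with forward slashes is an alternative to the r prefix that many Windows users find easier to read.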

You can check that the file has loaded successfully by running these two commands:

sales.head()

The output is a table showing the first 5 rows of the DataFrame, or all of them if it has fewer than 5.

sales.info()

The output is a summary of the DataFrame, including the index range, the number of rows and columns, and the data type of each column.
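
Since the real sales.csv is not available here, the sketch below builds a small stand-in DataFrame to show both checks. The Year column and all figures are invented for illustration; only the ‘Yearly revenue’ and ‘Number of Users’ column names come from the scenario.

```python
import pandas as pd

# Hypothetical figures standing in for the contents of sales.csv
sales = pd.DataFrame(
    {
        "Year": [2018, 2019, 2020, 2021, 2022, 2023],
        "Yearly revenue": [0.8, 1.1, 1.9, 2.4, 3.0, 3.6],
        "Number of Users": [900, 1400, 2600, 3100, 3900, 4500],
    }
)

# head() returns the first 5 rows by default, so the sixth row is left out
first_rows = sales.head()
print(first_rows)

# info() prints the index range, column names, non-null counts and data types
sales.info()

# shape gives (number of rows, number of columns) as a quick extra check
print(sales.shape)
```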

Conclusion

In conclusion, pandas is a powerful and versatile library that gives data analysts efficient tools for data manipulation and analysis. In this post, part of a new series of short, actionable, hands-on guides, I have shown how to load a CSV file into a DataFrame and listed the other data sources that pandas can read.

Stay tuned for more helpful tips!

Bibliography

Govindan, G., Baig, M. R., & Shrimali, V. R. (2021). Data Science for Marketing Analytics: A Practical Guide to Forming a Killer Marketing Strategy Through Data Analysis with Python. Packt.

Wickham, H., & Grolemund, G. (2016). R for data science: Import, tidy, transform, visualize, and model data. O’Reilly Media.

McKinney, W. (2018). Python for data analysis: Data wrangling with Pandas, NumPy, and IPython. O’Reilly Media.

Paskhaver, B. (2021). Pandas in Action. Manning Publications.

VanderPlas, J. (2016). Python data science handbook: Essential tools for working with data. O’Reilly Media.

Wickham, H., & Grolemund, G. (2017). R packages: Organize, test, document, and share your code. O’Reilly Media.

Zaki, M. J., & Meira Jr, W. (2014). Data mining and analysis: Fundamental concepts and algorithms. Cambridge University Press.

