Pandas and Python: A Symphony of Data Analysis

Data is the backbone of machine learning, and since Python is one of the most popular programming languages in the field, we need to be able to collect, import and process different kinds of data with it. This is where Pandas comes in.

Pandas is an open-source data analysis library built on top of the Python programming language. Its built-in functions and methods cover importing data, creating and saving data, plotting various types of graphs, removing null values, normalizing or standardizing data, and computing summary statistics, among other things.

In this article, we will only scratch the surface of Pandas: we will see how to create data, save it, and load already existing data.

What is a Dataframe in Pandas?

In the world of data analysis with Pandas in Python, a DataFrame is a fundamental and powerful structure. Think of it as a two-dimensional, tabular data structure that resembles a spreadsheet or SQL table, where rows and columns intersect to form a grid. Each column in a DataFrame represents a different variable, while each row corresponds to a specific entry or observation. This structure allows data scientists and analysts to effortlessly manipulate and analyze data, performing operations like filtering, sorting, and aggregating with ease.
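To make the operations named above concrete, here is a minimal sketch of filtering, sorting and aggregating. The toy values and the variable names (df, multi_room, by_price, mean_price) are made up for illustration only.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the three operations
df = pd.DataFrame({
    'Rooms': [1, 2, 3, 2],
    'Price': [10000, 20000, 30000, 20000],
})

# Filtering: keep only the rows with more than one room
multi_room = df[df['Rooms'] > 1]

# Sorting: order the rows by price, highest first
by_price = df.sort_values('Price', ascending=False)

# Aggregating: mean price for each room count
mean_price = df.groupby('Rooms')['Price'].mean()

print(multi_room)
print(by_price)
print(mean_price)
```

Each operation returns a new object and leaves df itself unchanged.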

Pandas tutorial in Python

In the following beginner-friendly tutorial, we will see how to create, save and load data using Pandas in Python. As an example, we will create a small dataset of house prices based on the number of rooms each house has.

1. Creating a Pandas Dataframe in Python

Pandas provides a constructor called DataFrame() that converts an input array or dictionary into a dataframe. Let us start by creating a simple dictionary containing house prices according to the number of rooms.

import pandas as pd

house_prices = {'Rooms':[1,2,3,2,3,1,3,1, 2,2], 'Price': [10000,20000,30000,20000,30000,10000,30000,10000,20000,20000]}

house_data = pd.DataFrame(house_prices)
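The constructor also accepts array-like input, as mentioned above. The following is a small sketch of that variant, assuming a shortened, made-up list of rows; the name house_data_from_rows is only illustrative.

```python
import pandas as pd

# Each inner list is one row: [rooms, price] (shortened example data)
rows = [[1, 10000], [2, 20000], [3, 30000]]

# With array-like input, the column names are passed explicitly via `columns`
house_data_from_rows = pd.DataFrame(rows, columns=['Rooms', 'Price'])

print(house_data_from_rows)
```

With a dictionary, the keys become the column names; with a list of rows, the names have to be supplied separately, otherwise Pandas numbers the columns 0, 1, and so on.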

To look at the data that we just created, we will call the head() method of our dataframe, which returns the first five rows by default. It is important to note that I used the word “method” in the previous statement. This is because house_data is an object, so we can call different methods on it as needed.

house_data.head()
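Besides head(), a few other members of the dataframe object are handy for a first look. This is a short sketch using the same house data; the variable names first_three, last_two and summary are only illustrative.

```python
import pandas as pd

house_prices = {
    'Rooms': [1, 2, 3, 2, 3, 1, 3, 1, 2, 2],
    'Price': [10000, 20000, 30000, 20000, 30000,
              10000, 30000, 10000, 20000, 20000],
}
house_data = pd.DataFrame(house_prices)

# head() takes an optional row count; without it, five rows are returned
first_three = house_data.head(3)

# tail() is the counterpart that returns rows from the end
last_two = house_data.tail(2)

# shape is an attribute (no parentheses): (number of rows, number of columns)
n_rows, n_cols = house_data.shape

# describe() computes summary statistics for the numeric columns
summary = house_data.describe()

print(first_three)
print(last_two)
print(n_rows, n_cols)
print(summary)
```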

The output of house_data.head() is a table with the number of rooms and the price of each house. Clearly, the houses are priced according to the number of rooms they have. But the dataframe looks incomplete, as it does not tell us which row belongs to which house. For that, we will add an index to the dataframe.

house_data = pd.DataFrame(house_prices, index=['House 1','House 2','House 3','House 4','House 5','House 6','House 7','House 8','House 9','House 10'])

Now, when you display the dataframe again, the leftmost column shows the house labels instead of the default numbers 0 to 9. So far so good! We do not want to lose this data, and we want to be able to share it with others. Thus we will go ahead and save it.
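Typing ten labels by hand is error-prone, so here is one possible way to generate them and then use the index to look up a single house. This is a sketch, not the only approach; the names labels and house_3 are illustrative.

```python
import pandas as pd

house_prices = {
    'Rooms': [1, 2, 3, 2, 3, 1, 3, 1, 2, 2],
    'Price': [10000, 20000, 30000, 20000, 30000,
              10000, 30000, 10000, 20000, 20000],
}

# Build 'House 1' ... 'House 10', one label per row of data
labels = [f'House {i}' for i in range(1, len(house_prices['Rooms']) + 1)]
house_data = pd.DataFrame(house_prices, index=labels)

# .loc selects a row by its index label
house_3 = house_data.loc['House 3']

print(house_3)
```

Label lookups are exact string matches, so a stray space inside a label would make that row unreachable under its intended name.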

2. Saving the Dataframe

Pandas dataframes have a method for saving an existing dataframe to disk: to_csv(). The method takes the path, including the file name, under which the dataframe is to be saved.

#saving the dataframe in csv format with the name 'housing-data.csv'
house_data.to_csv('/home/housing-data.csv')

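Voila! We have now saved our data. You can check that a new CSV file exists at the path you passed to to_csv() (here, the /home directory). If you pass only a file name, the file is written to your working directory instead.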
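If you would like to verify the result from within Python, the following sketch saves a shortened version of the data and inspects the file. The temporary directory is an assumption that stands in for your real target folder, and index_label is an optional to_csv() parameter that gives the index column a header.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Shortened example data with a labelled index
house_data = pd.DataFrame(
    {'Rooms': [1, 2, 3], 'Price': [10000, 20000, 30000]},
    index=['House 1', 'House 2', 'House 3'],
)

# Assumption: a temporary directory stands in for the real target folder
out_dir = Path(tempfile.mkdtemp())
csv_path = out_dir / 'housing-data.csv'

# index_label names the index column in the header row of the file
house_data.to_csv(csv_path, index_label='House')

# Confirm that the file was written and look at its raw contents
print(csv_path.exists())
print(csv_path.read_text())
```

Without index_label, the index is still written as the first column, but its header cell is left empty.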

Before we proceed, a quick note. Hopefully, all of your code has worked well so far. However, in case you feel stuck or hit an error at some point, feel free to DM me on Instagram: @machinelearningsite. Coding hiccups happen to the best of us, and sometimes it’s the little things that trip us up.

3. Loading an existing Dataframe in Python

We now want to import an existing dataset into our code. Just as we used the to_csv() method to save a dataframe, the function read_csv() is used to read existing CSV data into a dataframe.

# reading the data and saving it to a variable
data = pd.read_csv('/home/housing-data.csv')

And done! We have now successfully loaded the saved data into a dataframe, and we can proceed to analyze it or use it to train a machine learning model.
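One detail worth knowing: to_csv() writes the index as the first column of the file, and read_csv() with default settings reads it back as an ordinary column named 'Unnamed: 0'. The sketch below shows one way to restore the house labels as the index by passing index_col=0. It uses an in-memory buffer in place of a file on disk (an assumption made to keep the example self-contained) and shortened data.

```python
import io

import pandas as pd

# Shortened example data with a labelled index
house_data = pd.DataFrame(
    {'Rooms': [1, 2, 3], 'Price': [10000, 20000, 30000]},
    index=['House 1', 'House 2', 'House 3'],
)

# Assumption: an in-memory text buffer stands in for the CSV file on disk
buffer = io.StringIO()
house_data.to_csv(buffer)

# Default read: the saved index comes back as a regular column
buffer.seek(0)
default_read = pd.read_csv(buffer)

# index_col=0 tells read_csv to use the first column as the index again
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

print(default_read.columns.tolist())
print(restored)
```

The same index_col argument works when a file path is passed instead of a buffer.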

Summary

In this article, we learnt about creating, saving and loading a DataFrame using Pandas in Python. I hope you’ve gained a basic understanding of how Pandas and Python work together. Pandas has numerous other useful features, such as data visualization, data cleaning and data sorting; however, these fall outside the scope of this blog post.

