
Exploratory Data Visualization Using Python

This document discusses exploratory data analysis (EDA) using Python and the Pandas library. It introduces common EDA commands like reading data, displaying the dataframe, checking the shape, showing the head and tail, and using describe() for summary statistics. These commands are demonstrated on a tips.csv dataset to explore the key attributes and get an understanding of the data. The document explains that EDA involves using various commands to understand the structure and patterns in a new dataset.

Uploaded by

Reymon Dela Cruz
Copyright
© All Rights Reserved

Exploratory Data Visualization Using Python

Hello and welcome back to the data visualization course.


In this lesson, we will learn how to do exploratory data analysis, or EDA, in Jupyter.
Let’s start with the basic EDA commands available in Pandas.
Open Anaconda and then Jupyter.
Next, let’s make a new notebook, so click the New button and pick Python 3 in the dropdown.
This creates the new notebook. Now rename the file to SPARTA Week 7. …
And now we can start. We’re still going to use the tips.csv dataset, so let’s read that file.
Click into the code cell and type the following: import pandas as pd. As you know, this imports the pandas library and makes it available under the short name pd, which gives us access to all the features in pandas. Next, let’s read the data from the file and put all of it in a variable called tips. So let’s type tips = pd.read_csv("tips.csv").
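Here is a minimal sketch of the loading step. Since the tips.csv file itself is not reproduced in this transcript, the sketch reads a few illustrative rows from an in-memory string instead. The column names other than total_bill, tip, and size are assumptions about the file’s layout, and SAMPLE_CSV is a hypothetical name used only for this example.

```python
import io

import pandas as pd

# Hypothetical stand-in for tips.csv: a few illustrative rows in the assumed
# seven-column layout. With the real file on disk you would simply call
# pd.read_csv("tips.csv").
SAMPLE_CSV = """total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.50,Male,No,Sun,Dinner,3
23.68,3.31,Male,No,Sun,Dinner,2
24.59,3.61,Female,No,Sun,Dinner,4
25.29,4.71,Male,No,Sun,Dinner,4
"""

# read_csv accepts a file path or any file-like object, so StringIO works here.
tips = pd.read_csv(io.StringIO(SAMPLE_CSV))

# In a notebook, a bare `tips` on the last line of a cell renders the table;
# in a plain script, print() shows the same content as text.
print(tips)
```

The only difference from the lesson is the source of the data; everything after read_csv behaves the same way on the real file.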
Now that we’ve loaded the dataset into the tips variable, we can start exploring it. Before we proceed, let me elaborate a bit more on exploratory data analysis.
Exploring a new dataset is like being blindfolded and led to an unknown sculpture. You use your hands to feel the shape of the sculpture. You try to find out how irregular the shape is: How tall is it? How wide? How big? What are its quirks? And so on.
In Python, we can explore the dataset using pandas commands. Let’s try some of these commands.
Do you still remember that typing the name of the dataset will show you the dataset in table format?
So, let’s do that. If we type tips here and run this code…, we’ll get this table showing the key attributes and content of the tips table. Pandas refers to this table as a DataFrame.
Displaying the dataframe is always a good start to exploring the dataset.
The next exploratory command we’ll try is shape. Type tips.shape, with no parentheses because shape is an attribute rather than a method, and run it to see what happens. Here we are. The output is a pair of numbers. The first number is the number of rows and the second is the number of columns in the dataframe. So the dataset contains 244 rows and 7 columns.
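As a small sketch of how shape can be used, the example below builds a tiny hypothetical dataframe (three rows and three columns, not the real tips data) and unpacks the pair of numbers into two variables. The names df, n_rows, and n_cols are made up for this illustration.

```python
import pandas as pd

# Tiny hypothetical frame; the real tips data would report (244, 7).
df = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01],
        "tip": [1.01, 1.66, 3.50],
        "size": [2, 3, 3],
    }
)

# shape is an attribute holding a (rows, columns) tuple, so it can be unpacked.
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")
```

Unpacking the tuple is handy when only one of the two numbers is needed later, for example the row count.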
Next let’s try the head command, so type tips.head().
The result shows the top 5 records of the tips dataset.
We can show the top 10 by putting a value inside the parentheses. Like so… type 10 inside, so it reads tips.head(10), and press Shift+Return. … So now we have 10 records.
To show the last 5 records, we type tips.tail(). And here we see the last 5 rows of the dataset.
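To make the head and tail behavior concrete, here is a short sketch on a hypothetical 12-row dataframe. The row_id column and the values are invented so that it is easy to see which rows come back; they are not from tips.csv.

```python
import pandas as pd

# Hypothetical 12-row frame so that head() and tail() return different rows.
df = pd.DataFrame(
    {
        "row_id": list(range(12)),
        "tip": [1.0 + 0.25 * i for i in range(12)],
    }
)

first_five = df.head()    # no argument: the first 5 rows
first_ten = df.head(10)   # explicit count: the first 10 rows
last_five = df.tail()     # no argument: the last 5 rows

print(first_ten)
print(last_five)
```

Both commands return a new dataframe and leave the original untouched, so they are safe to run as often as you like while exploring.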
Next, we’ll use the describe command to get summary statistics on our data, so type tips.describe(), run it with Shift+Enter, and let’s check out the report.
By default, the describe() command picks out the numerical attributes in our table and calculates statistical summaries for them. That is why, in this result set, pandas left out the columns containing categorical values and kept only the numerical ones.
… The first row shows the record count. The three attributes (total_bill, tip, and size) all have 244 records. Next, we see the mean values for each of the three attributes. Here, the average total bill is about 19.80. The average tip comes to about 3, and the average size, which is the number of people at the table, is between 2 and 3 people.
The rest are basic statistics on the dataframe. Here we have the standard deviation, the minimum value per attribute, the 25th, 50th, and 75th percentiles (the quartiles), and finally, the maximum value for each.
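The describe report can be sketched on a small hypothetical dataframe like the one below. The values are invented to keep the arithmetic easy to check, and the day column is an assumed categorical column included only to show that it is left out of the default report.

```python
import pandas as pd

# Hypothetical data: two numeric columns and one categorical column.
df = pd.DataFrame(
    {
        "total_bill": [10.0, 20.0, 30.0, 40.0],
        "tip": [1.0, 2.0, 3.0, 4.0],
        "day": ["Sun", "Sun", "Sat", "Sat"],
    }
)

# By default, describe() summarizes only the numeric columns.
summary = df.describe()
print(summary)

# The row labels are the statistics walked through in the lesson.
print(list(summary.index))

# The report is itself a dataframe, so a single value can be looked up
# by statistic name and column name.
mean_bill = summary.loc["mean", "total_bill"]
print(mean_bill)
```

Because the report is an ordinary dataframe, any one statistic can be pulled out with .loc instead of being read off the printed table.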
So those are the important EDA commands you need to know to make it easier to explore the data. There are actually more, and we will take up some of them as we go along.
Save your work, and see you in the next lesson!
