This document discusses exploratory data analysis (EDA) using Python and the Pandas library. It introduces common EDA commands like reading data, displaying the dataframe, checking the shape, showing the head and tail, and using describe() for summary statistics. These commands are demonstrated on a tips.csv dataset to explore the key attributes and get an understanding of the data. The document explains that EDA involves using various commands to understand the structure and patterns in a new dataset.
Exploratory Data Visualization Using Python
Hello and welcome back to the data visualization course.
In this lesson, we will learn how to do exploratory data analysis, or EDA, in Jupyter. Let’s start with the basic EDA commands available in Pandas.

Open Anaconda and then Jupyter. Next, let’s make a new notebook, so click the New button and pick Python 3 in the dropdown. This creates the new notebook. Now rename the file to SPARTA Week 7. … And now we can start.

We’re still going to use the tips.csv dataset, so let’s read that file. Click in the code cell and type the following: import pandas as pd. As you know, this imports the pandas library and makes it available under the name pd, which gives us access to all the features in pandas. Next, let’s read the data from the file and put all of it in a variable called tips. So let’s type tips = pd.read_csv("tips.csv"). Now that we’ve loaded the dataset into the tips variable, we can start exploring it.

Before we proceed, let me elaborate a bit more on exploratory data analysis. Exploring a new dataset is like being blindfolded and led to an unknown sculpture. You use your hands to feel the shape of the sculpture. You try to find out how irregular the shape is: How tall is it? How wide? How big? What are its quirks? And so on. In Python, we can explore the dataset using pandas commands.

Let’s try some of these commands. Do you still remember that typing the name of the dataset will show you the dataset in table format? So, let’s do that. If we type tips here and run this code…, we’ll get this table showing the key attributes and content of the tips table. Pandas refers to this table as a dataframe. Displaying the dataframe is always a good start to exploring the dataset.

The next exploratory command we’ll try is shape. It is an attribute rather than a function, so it takes no parentheses. Type tips.shape and run it to see what happens. Here we are. The output is a pair of numbers. The first number is the number of rows and the second is the number of columns in the dataframe.
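The steps so far can be sketched roughly as follows. So that the sketch runs without the file, it reads a few made-up rows from an in-memory string instead of tips.csv. The column names sex, smoker, day, and time are an assumption about the lesson's file; only total_bill, tip, and size are named in the lesson.

```python
import io

import pandas as pd

# Stand-in for tips.csv: four made-up rows, NOT the real data.
# The seven column names are assumed to match the lesson's file.
sample_csv = io.StringIO(
    "total_bill,tip,sex,smoker,day,time,size\n"
    "10.00,1.50,Female,No,Sun,Dinner,2\n"
    "12.50,2.00,Male,No,Sun,Dinner,3\n"
    "20.00,3.00,Male,Yes,Sat,Dinner,4\n"
    "18.00,2.50,Female,No,Thur,Lunch,2\n"
)

# With the real file this would be: tips = pd.read_csv("tips.csv")
tips = pd.read_csv(sample_csv)

# In Jupyter, a cell containing just `tips` displays the table;
# print() does the same job in a plain script.
print(tips)

# shape is an attribute (no parentheses): (number of rows, number of columns)
n_rows, n_cols = tips.shape
print(n_rows, n_cols)  # 4 rows and 7 columns for this stand-in data
```

With the real tips.csv in the notebook's folder, only the read_csv line changes, and the row count reflects the full dataset.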
So the dataset contains 244 rows and 7 columns.

Next let’s try the head command, so type tips.head(). The result shows the top 5 records of the tips dataset. We can show the top 10 by putting a value inside the parentheses. Like so… type 10 inside and press Shift+Enter. … So now we have 10 records. To show the last 5 records, we type tips.tail(). And here we see the last 5 rows of the dataset.

Next, we’ll use the describe command to get summary statistics on our data, so type tips.describe(), run it with Shift+Enter, and let’s check out the report. By default, describe() picks out the numerical attributes in our table and calculates statistical summaries for them. So in this result, we can see that pandas left out the columns containing categorical values and kept only the numerical ones. …

The first row shows the record count. The three attributes, total_bill, tip, and size, all have 244 records. Next, we see the mean values for each of the three attributes. Here, the average total bill is about 19.80. The average tip comes to about 3, and the average size, which is the number of people at the table, is between 2 and 3 people. The rest are basic statistics on the dataframe: the standard deviation, the minimum value per attribute, the 25th, 50th, and 75th percentiles (the quartiles), and finally the maximum value for each.

So those are the important EDA commands you need to know to make it easier to explore the data. There are more, and we will take up some of them as we go along. Save your work, and see you in the next lesson!
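For reference, the head, tail, and describe steps from this lesson can be sketched as below. The dataframe here is a small made-up stand-in with 12 rows, not the real tips data, so the numbers it prints will differ from the ones in the lesson. The last line, which subtracts the 25% row from the 75% row to get the interquartile range, is an extra illustration and not part of the lesson.

```python
import pandas as pd

# Made-up stand-in for the tips dataframe (12 rows, NOT the real data).
tips = pd.DataFrame({
    "total_bill": [10.0, 12.5, 20.0, 18.0, 25.0, 30.0,
                   15.0, 22.0, 9.5, 40.0, 17.0, 21.0],
    "tip": [1.5, 2.0, 3.0, 2.5, 4.0, 5.0,
            2.0, 3.5, 1.0, 6.0, 2.5, 3.0],
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat",
            "Sat", "Sun", "Sun", "Sun", "Sun", "Thur"],
    "size": [2, 2, 3, 2, 4, 4, 2, 3, 1, 6, 2, 3],
})

first_five = tips.head()    # first 5 rows (the default)
first_ten = tips.head(10)   # first 10 rows
last_five = tips.tail()     # last 5 rows (the default)

# By default describe() summarizes only the numeric columns,
# so the text column "day" does not appear in the result.
summary = tips.describe()
print(summary)

# Rows of the summary: count, mean, std, min, 25%, 50%, 75%, max.
# The interquartile range can be derived from the quartile rows.
iqr = summary.loc["75%"] - summary.loc["25%"]
print(iqr)
```

To also summarize the non-numeric columns, describe() accepts include="all", which adds rows such as unique, top, and freq.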