Python EDA for Beginners

This document discusses exploratory data analysis (EDA) using Python and the Pandas library. It introduces common EDA commands like reading data, displaying the dataframe, checking the shape, showing the head and tail, and using describe() for summary statistics. These commands are demonstrated on a tips.csv dataset to explore the key attributes and get an understanding of the data. The document explains that EDA involves using various commands to understand the structure and patterns in a new dataset.

Uploaded by

Reymon Dela Cruz

Exploratory data visualization using Python

Hello and welcome back to the data visualization course.

In this lesson, we will learn how to do exploratory data analysis, or EDA, in Jupyter.
Let’s start with the basic EDA commands available in Pandas.
Open Anaconda and then Jupyter.
Next, let’s make a new notebook, so click the New button and pick Python 3 in the dropdown.
This creates the new notebook. Now rename the file to SPARTA Week 7. …
And now we can start. We’re still going to use the tips.csv dataset, so let’s read that file.
Let’s go to the code cell and type the following: import pandas as pd. As you know, this imports the pandas library under the short name pd, which gives us access to all the features available in pandas. Next, let’s read the data from the file and put all of it in a variable called tips. So let’s type tips = pd.read_csv("tips.csv").
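The loading step just described can be sketched as follows. This is only a minimal sketch: since tips.csv may not be in your working folder, it reads a three-row stand-in instead. The rows are made up, and the column names other than total_bill, tip, and size are assumed from the commonly distributed version of the tips dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for tips.csv: three made-up rows, not the real data.
# With the real file available, use pd.read_csv("tips.csv") instead.
sample_csv = io.StringIO(
    "total_bill,tip,sex,smoker,day,time,size\n"
    "10.00,1.50,Female,No,Sun,Dinner,2\n"
    "20.00,3.00,Male,No,Sun,Dinner,3\n"
    "30.00,4.50,Male,Yes,Sat,Dinner,4\n"
)

# read_csv accepts a file path or a file-like object and returns a DataFrame.
tips = pd.read_csv(sample_csv)
print(type(tips))
```

In a notebook, the last line could simply be tips, which displays the table.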
Now that we’ve loaded the dataset into the tips variable, we can start exploring it. Before we proceed, let me elaborate a bit more on exploratory data analysis.
Exploring a new dataset is like being blindfolded and led to an unknown sculpture. You use your hands to feel the shape of the sculpture. You try to find out how irregular the shape is: How tall is it? How wide? How big? What are its quirks? And so on.
In Python, we can explore the dataset using pandas commands. Let’s try some of these commands.
Do you still remember that typing the name of the dataset will show you the dataset in table format?
So, let’s do that. If we type tips here and run this code…, we’ll get this table showing the key attributes and content of the tips table. Pandas refers to this table as a dataframe.
Displaying the dataframe is always a good start to exploring the dataset.
The next exploratory command we’ll try is the shape command. Type tips.shape and run it to see what happens. Here we are. The output is a pair of numbers. The first number is the number of rows and the second is the number of columns in the dataframe. So the dataset contains 244 rows and 7 columns.
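Here is a small sketch of reading the shape. It uses a made-up stand-in dataframe (an assumption, so the numbers differ from the real 244 rows and 7 columns), but the pattern is the same.

```python
import pandas as pd

# Made-up stand-in rows (assumption); the real tips dataset is larger.
tips = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0],
    "tip": [1.5, 3.0, 4.5],
    "day": ["Sun", "Sun", "Sat"],
    "size": [2, 3, 4],
})

# shape is an attribute, not a method, so it takes no parentheses.
n_rows, n_cols = tips.shape
print(n_rows, n_cols)  # 3 4
```

Unpacking the pair into two names makes it easy to reuse the row count later.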
Next, let’s try the head command, so type tips.head().
The result shows the top 5 records of the tips dataset.
We can show the top 10 by putting a value inside the parentheses. Like so… type 10 inside and press Shift + Return. … So now we have 10 records.
To show the last 5 records, we type tips.tail(). And here we see the last 5 rows of the dataset.
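The head and tail calls above can be sketched like this, again on a made-up 12-row stand-in (an assumption) that is just long enough to show the default of 5 rows.

```python
import pandas as pd

# Made-up stand-in with 12 rows (assumption), enough to show the defaults.
tips = pd.DataFrame({
    "total_bill": [float(10 + i) for i in range(12)],
    "size": [2] * 12,
})

first_five = tips.head()    # default is the first 5 rows
first_ten = tips.head(10)   # explicit row count
last_five = tips.tail()     # last 5 rows, original index labels kept

print(len(first_five), len(first_ten), len(last_five))  # 5 10 5
print(list(last_five.index))  # [7, 8, 9, 10, 11]
```

tail() accepts a row count in the same way, for example tips.tail(10).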
Next, we’ll use the describe command to get summary statistics on our data, so type tips.describe(), run it with Shift + Enter, and let’s check out the report.
The describe() command detects the numerical attributes in our table and calculates statistical summaries for them. That is why, in this result set, we can see that pandas dropped the columns containing categorical values and kept only the numerical ones.
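To see which columns describe() keeps, a quick sketch on a made-up stand-in dataframe (an assumption, with one text column named day) compares the column lists before and after.

```python
import pandas as pd

# Made-up stand-in rows (assumption); "day" plays the categorical column.
tips = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0],
    "tip": [1.5, 3.0, 4.5],
    "day": ["Sun", "Sun", "Sat"],
    "size": [2, 3, 4],
})

summary = tips.describe()
print(list(tips.columns))     # ['total_bill', 'tip', 'day', 'size']
print(list(summary.columns))  # ['total_bill', 'tip', 'size']

# include="all" asks describe() to report on non-numeric columns as well.
summary_all = tips.describe(include="all")
print("day" in summary_all.columns)  # True
```

The include="all" option is not used in this lesson; it is shown only to make clear that the text column is skipped by default rather than lost.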
… The first row shows the record count. The three attributes, total_bill, tip, and size, all have 244 records. Next, we see the mean values for each of the three attributes. Here, the average total bill is about 19.80, the average tip is about 3, and the average size, which is the number of people at the table, is between 2 and 3 people.
The rest are basic statistics on the dataframe. Here we have the standard deviation, the minimum values per attribute, the quartile values (the 25th, 50th, and 75th percentiles), and finally, the maximum values for each.
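The rows of the report can also be read out in code. This is a minimal sketch on made-up stand-in numbers (an assumption), so the values differ from the ones on screen, but the row labels are the ones describe() produces for numeric columns.

```python
import pandas as pd

# Made-up stand-in numbers (assumption), not the real tips data.
tips = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0],
    "tip": [1.5, 3.0, 4.5],
    "size": [2, 3, 4],
})

summary = tips.describe()

# One row per statistic, one column per numeric attribute.
print(list(summary.index))
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

# .loc[row_label, column_label] picks out a single statistic.
print(summary.loc["mean", "total_bill"])  # 20.0
print(summary.loc["50%", "size"])         # 3.0
```

The 50% row is the median, so for these stand-in numbers it matches the middle value of each column.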
So those are the important EDA commands you need to know to make it easier to explore the data. There are actually more, and we will take up some of them as we go along.
Save your work, and see you in the next lesson!
