Data Science - Sec3

Uploaded by

abdallahmostafa1836

Data Science
Section 3: Pandas
Pandas

• Pandas is a Python library used for working with data sets.

• It has functions for analyzing, cleaning, exploring, and manipulating data.
• The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.
• Pandas allows us to analyze big data and draw conclusions based on statistical theory.
• Pandas can clean messy data sets and make them readable and relevant.
• Relevant data is very important in data science.
Pandas

• Pandas is a data processing tool that supports data analysis.

• It provides functions and methods to efficiently manipulate large datasets.
• Data structures in Pandas:
▪ Series (one-dimensional array)
▪ DataFrame (two-dimensional array)
Pandas

• Install pandas, for example with pip: pip install pandas
• Pandas is usually imported under the pd alias.
▪ Alias: in Python, an alias is an alternate name for referring to the same thing.
• The two most common terms used in Pandas:
▪ Series
▪ DataFrame
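A minimal sketch of the setup described above. The install command appears only as a comment because it runs in a terminal rather than in Python, and it assumes pip is available on the machine.

```python
# Run once in a terminal (assumes pip is available):
#     pip install pandas

import pandas as pd  # "pd" is the conventional alias for the pandas module

# The alias refers to the same module object, so pd.Series is pandas.Series.
print(pd.__version__)
```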
Series
• It is a one-dimensional array holding data of any type.
• A Pandas Series is like a column in a table.
• Labels in a Series:
▪ If nothing else is specified, the values are labeled with their index number: the first value has index 0, the second value has index 1, and so on.
▪ With the index argument, you can name your own labels.

[Figures: a Series with a custom index; a Series with the default index]
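The two labeling options can be sketched as follows. The values and the labels "x", "y", "z" are illustrative assumptions, not data from the original slides.

```python
import pandas as pd

values = [10, 20, 30]  # illustrative data

# Default index: labels are 0, 1, 2, ...
default_s = pd.Series(values)

# Custom index: labels are supplied through the index argument
custom_s = pd.Series(values, index=["x", "y", "z"])

print(default_s)
print(custom_s)
print(list(default_s.index))  # [0, 1, 2]
print(list(custom_s.index))   # ['x', 'y', 'z']
```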
Access Data in Series
• Pandas Series support both label-based and position-based indexing.
• Example 1: access elements by label.
• Example 2: access elements by position.
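A small sketch of both access styles, using an illustrative Series. It uses iloc for positional access, which is one explicit way to do it; the original slides may have used a different spelling.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["x", "y", "z"])  # illustrative data

# Example 1: access by label (s.loc["y"] is equivalent)
by_label = s["y"]

# Example 2: access by position (0-based) with iloc
by_position = s.iloc[1]

print(by_label, by_position)  # 20 20
```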
Slicing in Series
• Example 1: slicing by labels.
▪ [start_label : end_label]
▪ Both endpoints are included.
• Example 2: slicing by positions.
▪ [start_index : end_index]
▪ The end index is not included.
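The difference between the two slicing rules can be sketched like this. The data is illustrative, and loc and iloc are used here to make the label-based and position-based cases explicit.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

# Slicing by labels: both "b" and "d" are included
label_slice = s.loc["b":"d"]

# Slicing by positions: position 3 is excluded, as in plain Python slicing
position_slice = s.iloc[1:3]

print(label_slice.tolist())     # [20, 30, 40]
print(position_slice.tolist())  # [20, 30]
```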
• We can check the number of elements in a Series with the size attribute and get its shape with the shape attribute.
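A short sketch with illustrative data. Note that size and shape are read as attributes, without parentheses.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])  # illustrative data

print(s.size)   # 5
print(s.shape)  # (5,)
```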
DataFrame
• A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array or a table with rows and columns.
• Create a simple Pandas DataFrame using a dictionary:
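One way this might look, with hypothetical column names and values. Each dictionary key becomes a column name and each list becomes that column's values.

```python
import pandas as pd

# Hypothetical data: keys become column names, lists become column values
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}

df = pd.DataFrame(data)

print(df)
print(df.shape)  # (3, 2)
```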
DataFrame
• Create a simple Pandas DataFrame using nested lists:
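A sketch with hypothetical names and ages. Each inner list is one row, and the columns argument supplies the column names.

```python
import pandas as pd

# Hypothetical data: each inner list is one row
rows = [
    ["Ali", 25],
    ["Mona", 31],
    ["Omar", 22],
]

df = pd.DataFrame(rows, columns=["name", "age"])

print(df)
```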
DataFrame
• Pandas uses the loc attribute to return one or more rows by label.
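A minimal sketch of loc on a DataFrame with the default index, using hypothetical data. A single label returns a Series; a label slice returns a DataFrame and includes both endpoints.

```python
import pandas as pd

df = pd.DataFrame({
    "calories": [420, 380, 390],  # hypothetical values
    "duration": [50, 40, 45],
})

# One row: the result is a Series whose labels are the column names
first_row = df.loc[0]

# Several rows: a label slice includes both 0 and 1
first_two = df.loc[0:1]

print(first_row)
print(first_two)
```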
DataFrame
• The loc attribute can also return specific rows without slicing, by passing a list of row labels.
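Selecting non-adjacent rows might look like the sketch below. The row labels "day1" to "day3" and the values are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],  # hypothetical row labels
)

# A list of labels picks exactly those rows, in the order given
selected = df.loc[["day1", "day3"]]

print(selected)
```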
CSV File
• A simple way to store big data sets is to use CSV (comma-separated values) files.
• Create a CSV file:
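One way to create a CSV file is to write a DataFrame with to_csv. The sketch below uses hypothetical data and writes to a temporary directory so that it does not overwrite any existing file; the file name data.csv is only illustrative.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "Duration": [60, 45],        # hypothetical values
    "Calories": [409.1, 282.4],
})

# Write to a temporary directory; index=False leaves out the row labels
csv_path = os.path.join(tempfile.mkdtemp(), "data.csv")
df.to_csv(csv_path, index=False)

# Read the raw text back to see what was written
with open(csv_path) as f:
    text = f.read()
print(text)
```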
CSV File
• Load the CSV into a DataFrame:
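Loading can be sketched with read_csv. To keep the example self-contained, it first writes a tiny CSV file with made-up contents to a temporary directory; in practice the path would point at an existing file.

```python
import os
import tempfile

import pandas as pd

# Create a small CSV file to load (contents are made up for illustration)
csv_path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(csv_path, "w") as f:
    f.write("Duration,Calories\n60,409.1\n45,282.4\n")

# The first line of the file becomes the column names
df = pd.read_csv(csv_path)

print(df)
```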
Excel File
• Create an Excel file and load it into a DataFrame:
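A hedged sketch of a round trip through an Excel file. Writing and reading .xlsx files relies on an optional dependency (commonly openpyxl), so the example assumes it may be missing and handles that case; the file name and data are hypothetical.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["Ali", "Mona"], "age": [25, 31]})  # hypothetical data
xlsx_path = os.path.join(tempfile.mkdtemp(), "people.xlsx")

try:
    # Create the Excel file, then load it back into a DataFrame
    df.to_excel(xlsx_path, index=False)
    loaded = pd.read_excel(xlsx_path)
    print(loaded)
except ImportError:
    # Raised when no Excel engine (for example openpyxl) is installed
    loaded = None
    print("Install an Excel engine first, for example: pip install openpyxl")
```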
Exploratory analysis using pandas
• Load the data.csv file into a DataFrame, then print it.
• If you have a large DataFrame with many rows, printing it shows only the first 5 rows and the last 5 rows.
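The truncated display can be sketched with a synthetic 100-row DataFrame standing in for data.csv, which is not included here. The cut-off is governed by the display.max_rows option, and to_string() is one way to get every row as text.

```python
import pandas as pd

# Synthetic stand-in for a large data set
df = pd.DataFrame({"x": range(100)})

# With default display options, only the first and last rows are shown
print(df)

# The threshold that triggers truncation
print(pd.get_option("display.max_rows"))

# to_string() renders every row: one header line plus 100 data lines
full_text = df.to_string()
print(len(full_text.splitlines()))  # 101
```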
Viewing the Data

• The head() method returns the column headers and a specified number of rows, starting from the top.
• Note: if the number of rows is not specified, the head() method returns the top 5 rows.
• The tail() method returns the column headers and a specified number of rows, starting from the bottom.
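A brief sketch of head() and tail() on a synthetic 20-row DataFrame (an assumption, used in place of the course data set).

```python
import pandas as pd

df = pd.DataFrame({"x": range(20)})  # synthetic data

top_three = df.head(3)     # first 3 rows
top_default = df.head()    # first 5 rows when no number is given
bottom_three = df.tail(3)  # last 3 rows

print(top_three)
print(bottom_three)
```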
Viewing the Data
• A DataFrame object has a method called info() that gives you more information about the data set.
• The info() method also tells us how many non-null values are present in each column. In our data set there are 164 non-null values out of 169 in the "Calories" column.
• This means that 5 rows have no value at all in the "Calories" column, for whatever reason.
• Empty values, or null values, can distort an analysis, and you should consider removing rows with empty values. This is a step towards what is called data cleaning.
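The same inspection can be sketched on a tiny made-up DataFrame with two missing "Calories" values; the numbers below are not the 169-row data set from the slides. dropna() is shown as one way to remove rows with empty values.

```python
import pandas as pd

# Made-up data; None becomes a missing value (NaN) in the numeric column
df = pd.DataFrame({
    "Duration": [60, 45, 30, 60],
    "Calories": [409.1, None, 195.0, None],
})

# Prints column names, non-null counts and dtypes
df.info()

# Count the missing values in one column
missing = df["Calories"].isna().sum()
print(missing)  # 2

# Returns a new DataFrame without the rows that contain missing values
cleaned = df.dropna()
print(len(cleaned))  # 2
```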
Viewing the Data
• Examples 1 and 2: extract specific subsets of the data based on a defined condition.
• The output of a conditional expression (>, but ==, !=, <, <= and so on also work) is a pandas Series of boolean values (either True or False) with the same number of rows as the original DataFrame. Such a Series of boolean values can be used to filter the DataFrame by putting it between the selection brackets []. Only rows for which the value is True are selected.
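The filtering mechanism described above can be sketched in two steps, using made-up data: first build the boolean Series, then pass it to the selection brackets.

```python
import pandas as pd

df = pd.DataFrame({
    "Duration": [60, 45, 90, 30],           # made-up values
    "Calories": [409.1, 282.4, 600.0, 195.0],
})

# Step 1: the condition yields one True/False value per row
mask = df["Duration"] > 45
print(mask.tolist())  # [True, False, True, False]

# Step 2: only rows where the mask is True are kept
long_sessions = df[mask]
print(long_sessions)
```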
Viewing the Data
• Example: select specific columns.
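Column selection might look like the sketch below, with hypothetical column names. A list of names inside the brackets returns a DataFrame; a single name returns a Series.

```python
import pandas as pd

df = pd.DataFrame({
    "Duration": [60, 45, 90],        # made-up values
    "Pulse": [110, 117, 103],
    "Calories": [409.1, 282.4, 600.0],
})

# Several columns: pass a list of names, the result is a DataFrame
subset = df[["Duration", "Calories"]]

# One column: pass a single name, the result is a Series
pulse = df["Pulse"]

print(subset)
print(pulse)
```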
isin() method
• The isin() method checks whether each value in the DataFrame is among the specified value(s).
• Example 1: return rows that have the value 80 or 90 in the “Duration” column.
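Example 1 can be sketched as follows, on made-up data. Here isin() is applied to a single column, which produces a boolean Series that is then used as a row filter.

```python
import pandas as pd

df = pd.DataFrame({
    "Duration": [60, 80, 90, 45, 80],   # made-up values
    "Pulse": [110, 117, 103, 109, 120],
})

# True where Duration is 80 or 90, False elsewhere
mask = df["Duration"].isin([80, 90])

# Keep only the matching rows
result = df[mask]
print(result)
```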
Practical section
Steps:
• Download the data set from this link:
• https://tinyurl.com/Sec3DS
• Import pandas.
• Load the “CardioGoodFitness.csv” file.
