Data Science - Sec3
Section 3
Pandas
• Install pandas (for example, with pip install pandas).
• Pandas is usually imported under the pd alias.
▪ alias: In Python, an alias is an alternate name that refers to the same thing.
• The two core data structures in Pandas are:
▪ Series
▪ DataFrame
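A minimal sketch of the import convention and the two structures named above. The sample values and column names are made up for illustration.

```python
import pandas as pd  # "pd" is the conventional alias for pandas

# A Series: one labeled, one-dimensional column of values
s = pd.Series([1, 2, 3])

# A DataFrame: a two-dimensional table of rows and columns
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

print(type(s))
print(type(df))
```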
Series
• A Series is a one-dimensional labeled array that can hold data of any type.
• A Pandas Series is like a column in a table.
• Labels in a Series:
▪ If nothing else is specified, the values are labeled with their index number: the first value has index 0, the second has index 1, and so on.
▪ With the index argument, you can name your own labels.
Examples: custom index, default index
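The two labeling options can be sketched as follows; the values and the labels "x", "y", "z" are arbitrary examples.

```python
import pandas as pd

values = [10, 20, 30]

# Default index: values are labeled 0, 1, 2
default_s = pd.Series(values)
print(default_s)

# Custom index: labels supplied through the index argument
custom_s = pd.Series(values, index=["x", "y", "z"])
print(custom_s)
```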
Access Data in Series
• A Pandas Series supports both label-based and position-based indexing.
• Example 1: access elements by label.
• Example 2: access elements by position.
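A minimal sketch of both examples, assuming a Series with string labels. It uses the explicit .loc (label) and .iloc (position) accessors; writing s[1] for positional access on a labeled Series is deprecated in recent pandas versions, so the explicit form is the safer habit.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Example 1: access by label
by_label = s.loc["b"]     # 20

# Example 2: access by position (second element)
by_position = s.iloc[1]   # 20

print(by_label, by_position)
```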
Slicing in Series
• Example 1: slicing by labels.
• [start_label : end_label]
• Both the start and end labels are included.
• Example 2: slicing by positions.
• [start_index : end_index]
• The end index is not included.
• We can check the number of elements in a Series with the size attribute and get its dimensions with the shape attribute (both are attributes, not methods, so no parentheses are used).
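The slicing rules and the size/shape attributes above can be sketched on a small example Series (values and labels are arbitrary):

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

# Slicing by labels: start and end labels are both included
label_slice = s.loc["b":"d"]   # labels b, c, d
print(label_slice)

# Slicing by positions: the end position is excluded
pos_slice = s.iloc[1:3]        # positions 1 and 2 (labels b, c)
print(pos_slice)

# size and shape are attributes, not methods
print(s.size)   # 5
print(s.shape)  # (5,)
```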
DataFrame
• A Pandas DataFrame is a 2-dimensional data structure, like a 2-dimensional array or a table with rows and columns.
• Create a simple Pandas DataFrame using a dictionary:
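A minimal sketch of building a DataFrame from a dictionary; the "calories" and "duration" columns are hypothetical sample data. Each key becomes a column name and each list becomes that column's values.

```python
import pandas as pd

# Hypothetical sample data: one key per column
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}

df = pd.DataFrame(data)
print(df)
```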
• Create a simple Pandas DataFrame using nested lists:
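A sketch of the nested-list form, where each inner list is one row. The names, ages, and column labels are hypothetical; without the columns argument, pandas would label the columns 0, 1, and so on.

```python
import pandas as pd

# Each inner list is one row
rows = [
    ["Alice", 25],
    ["Bob", 30],
    ["Carol", 35],
]

# Column names are supplied separately (hypothetical names)
df = pd.DataFrame(rows, columns=["name", "age"])
print(df)
```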
• Pandas uses the loc attribute to return one or more rows.
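A minimal sketch, reusing hypothetical sample data. Selecting a single label returns that row as a Series; a label slice returns a DataFrame and, as with Series, includes the end label.

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

# One row: returned as a Series
row0 = df.loc[0]
print(row0)

# Several rows via a label slice: returned as a DataFrame (end label included)
first_two = df.loc[0:1]
print(first_two)
```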
• Pandas can also use the loc attribute to return specific rows by passing a list of row labels instead of a slice.
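A sketch of selecting non-adjacent rows with a list of labels; the "day1" to "day3" index labels are hypothetical. The result is always a DataFrame, with rows in the order the labels were listed.

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

# A list of labels selects exactly those rows
selected = df.loc[["day1", "day3"]]
print(selected)
```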
CSV File
• A simple way to store big data sets is to use CSV (comma-separated values) files.
• Create a CSV file:
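One way to create a CSV file from a DataFrame is DataFrame.to_csv. This sketch writes a hypothetical data.csv into a temporary folder so it does not touch the working directory; index=False keeps the row index out of the file.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

# Hypothetical file name inside a fresh temporary folder
path = os.path.join(tempfile.mkdtemp(), "data.csv")

# Write the DataFrame as comma-separated text, without the row index
df.to_csv(path, index=False)

with open(path) as f:
    print(f.read())
```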
• Load the CSV into a DataFrame:
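A minimal sketch using pd.read_csv. To keep it self-contained, it first writes a small hypothetical data.csv to a temporary folder; with a real file you would pass its path directly. By default, the first line is read as the column names.

```python
import os
import tempfile

import pandas as pd

# Create a small hypothetical CSV file so the example runs on its own
path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(path, "w") as f:
    f.write("calories,duration\n420,50\n380,40\n390,45\n")

# Load the CSV; the header row becomes the column names
df = pd.read_csv(path)
print(df)
```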
Excel File
• Create an Excel file and load it into a DataFrame:
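A sketch using DataFrame.to_excel and pd.read_excel. It assumes the optional openpyxl package is installed (for example, pip install openpyxl), since pandas relies on it to read and write .xlsx files; the file name and sheet name are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Assumes openpyxl is installed for .xlsx support
df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

path = os.path.join(tempfile.mkdtemp(), "data.xlsx")  # hypothetical file name

# Create the Excel file (row index left out)
df.to_excel(path, index=False, sheet_name="Sheet1")

# Load the sheet back into a DataFrame
loaded = pd.read_excel(path, sheet_name="Sheet1")
print(loaded)
```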
Exploratory analysis using pandas
• Load the data.csv file into a DataFrame, then print it:
• If a DataFrame has more rows than the display limit (pd.options.display.max_rows, 60 by default), printing it shows only the first 5 and the last 5 rows.
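A minimal sketch of loading and printing a large file. Because the course's data.csv is not included here, the code creates a hypothetical 100-row stand-in in a temporary folder. Printing it exceeds the default row limit, so the output is truncated and ends with a "[100 rows x 2 columns]" summary; to_string() would render every row instead.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for data.csv: 100 rows, 2 columns
path = os.path.join(tempfile.mkdtemp(), "data.csv")
pd.DataFrame({"id": range(100), "value": range(0, 200, 2)}).to_csv(path, index=False)

df = pd.read_csv(path)

# More rows than display.max_rows, so only the first and last rows are shown
print(df)

# Current display limit
print(pd.get_option("display.max_rows"))

# To print every row instead:
# print(df.to_string())
```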
Viewing the Data