3rd Week Report

The document discusses activities undertaken during a virtual internship including learning about pandas DataFrame functions, EDA, and data visualization. It provides details on various DataFrame methods and functions learned for exploring, cleaning, and manipulating data in pandas.

Uploaded by

Nandini Singh

WEEKLY REPORT OF DATA ANALYSIS USING PYTHON

Name of the Student and Roll No.: Nandini Singh / 220617005

Name of the Company: Samatrix.io
Period of the Report (Week 1st / 2nd / 3rd / 4th / 5th / 6th / 7th / 8th / 9th / 10th): 3rd
Activities undertaken during the week:
Details of the activity:
Pandas, DataFrame functions, and different types of functions like describe, info, and columns.
Project: importing a data set.
Various EDA functions on the dataset.
Exploratory Data Analysis (EDA).
Data visualization in pandas.
Using data in Kaggle.
Details of field trips undertaken (if any) and summary of results of such trips: As this is a virtual internship, no field trips have been taken yet.
Learning Points acquired from above activities:
We learned about many topics this week, including:
• A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).
• A pandas DataFrame consists of three principal components: the data, the rows, and the columns.
• A pandas DataFrame is usually created by loading a dataset from existing storage, such as a SQL database, a CSV file, or an Excel file. It can also be created from lists, from a dictionary, from a list of dictionaries, etc.
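The creation routes listed above can be sketched as follows. This is a minimal illustration, not the code used in the internship: the sample names and ages are made up, and io.StringIO stands in for a real CSV file path.

```python
import io

import pandas as pd

# From a list of lists, with explicit column names (hypothetical sample data).
df_list = pd.DataFrame([["Asha", 21], ["Ravi", 23]], columns=["name", "age"])

# From a dictionary that maps column names to lists of values.
df_dict = pd.DataFrame({"name": ["Asha", "Ravi"], "age": [21, 23]})

# From a list of dictionaries (one dictionary per row).
df_records = pd.DataFrame(
    [{"name": "Asha", "age": 21}, {"name": "Ravi", "age": 23}]
)

# From CSV text; io.StringIO is an assumption that replaces a file on disk.
csv_text = "name,age\nAsha,21\nRavi,23\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv)
```

With a real file, pd.read_csv would be given the file path instead of the StringIO object.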
• Creating a DataFrame using a list.
• Dealing with rows and columns.
• Indexing and selecting data.
• Working with missing data.
• Iterating over rows and columns.
• Dropping missing values using dropna().
• DataFrame methods and attributes:
• FUNCTION: DESCRIPTION
• index: attribute (accessed without parentheses) that returns the index (row labels) of the DataFrame.
• insert(): inserts a column into a DataFrame at a given position.
• add(): returns the element-wise addition of the DataFrame and another object (binary operator add).
• sub(): returns the element-wise subtraction of the DataFrame and another object (binary operator sub).
• mul(): returns the element-wise multiplication of the DataFrame and another object (binary operator mul).
• div(): returns the element-wise floating-point division of the DataFrame and another object (binary operator truediv).
• unique(): Series method that extracts the unique values in a column.
• nunique(): returns the count of unique values in each column of the DataFrame.
• value_counts(): counts the number of times each unique value occurs within a Series.
• columns: attribute that returns the column labels of the DataFrame.
• axes: attribute that returns a list representing the axes of the DataFrame.
• isnull(): creates a Boolean mask that is True where a value is null; the mask can be used to extract rows with null values.
• notnull(): creates a Boolean mask that is True where a value is not null; the mask can be used to extract rows with non-null values.
• between(): Series method used to extract rows where a column value falls within a predefined range.
• isin(): used to extract rows from a DataFrame where a column value exists in a predefined collection.
• dtypes: attribute that returns a Series with the data type of each column. The result's index is the original DataFrame's columns.
• astype(): converts the data types in a Series or DataFrame.
• values: attribute that returns a NumPy representation of the DataFrame, i.e. only the values in the DataFrame are returned and the axes labels are removed.
• sort_values(): sorts a DataFrame in ascending or descending order of the passed column.
• sort_index(): sorts a DataFrame by its index labels instead of by its values. This is useful when a DataFrame is made out of two or more DataFrames and its index is no longer in order.
• loc[]: indexer that retrieves rows based on index label.
• iloc[]: indexer that retrieves rows based on index position.
• ix[]: indexer that retrieved DataFrame rows based on either index label or index position, combining features of .loc[] and .iloc[]. It was deprecated in pandas 0.20 and removed in pandas 1.0, so .loc[] or .iloc[] should be used instead.
• rename(): called on a DataFrame to change the names of the index labels or columns.
• columns: assigning a new list of names to this attribute is an alternative way to change the column names.
• drop(): deletes rows or columns from a DataFrame.
• pop(): removes a single column from a DataFrame and returns it as a Series.
• sample(): pulls out a random sample of rows or columns from a DataFrame.
• nsmallest(): pulls out the rows with the smallest values in a column.
• nlargest(): pulls out the rows with the largest values in a column.
• shape: attribute that returns a tuple representing the dimensionality of the DataFrame.
• ndim: attribute that returns an int representing the number of axes / array dimensions. It is 1 for a Series and 2 for a DataFrame.
• dropna(): lets the user drop rows or columns with null values in different ways.
• fillna(): lets the user replace NaN values with a value of their own.
• rank(): ranks the values in a Series in order.
• query(): an alternative string-based syntax for extracting a subset from a DataFrame.
• copy(): creates an independent copy of a pandas object.
• duplicated(): creates a Boolean Series that marks duplicate rows, which can be used to extract them.
• drop_duplicates(): an alternative to identifying duplicate rows with duplicated() and removing them through filtering.
• set_index(): sets the DataFrame index (row labels) using one or more existing columns.
• reset_index(): resets the index of a DataFrame. The new index is a sequence of integers ranging from 0 to the length of the data minus one.
• where(): checks a DataFrame against one or more conditions and returns the result accordingly. By default, the values not satisfying the condition are replaced with NaN.
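A few of the entries above can be tried together on a small DataFrame. The following is a minimal sketch under stated assumptions: the names, cities, and scores are hypothetical sample data, and only a subset of the listed methods and attributes is shown.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing score.
df = pd.DataFrame(
    {
        "name": ["Asha", "Ravi", "Meera", "Ravi"],
        "city": ["Delhi", "Pune", "Delhi", "Pune"],
        "score": [82.0, np.nan, 91.0, 67.0],
    }
)

# Attributes such as shape and columns are read without parentheses.
n_rows, n_cols = df.shape
column_names = list(df.columns)

# Label-based (loc) and position-based (iloc) selection.
first_by_label = df.loc[0, "name"]
first_by_position = df.iloc[0, 0]

# Boolean masks used to filter rows.
missing_rows = df[df["score"].isnull()]
mid_scores = df[df["score"].between(60, 85)]
delhi_rows = df[df["city"].isin(["Delhi"])]

# Handling missing data: drop incomplete rows, or fill the gaps.
dropped = df.dropna()
filled = df.fillna({"score": 0.0})

# Sorting, unique values, and duplicates.
by_score = df.sort_values("score", ascending=False)
unique_names = df["name"].unique()
no_dup_names = df.drop_duplicates(subset="name")

# query() as a string-based alternative to a Boolean mask.
high = df.query("score > 80")

print(by_score)
```

Each of these calls returns a new object and leaves df unchanged, so the steps can be run in any order.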
• EDA is applied to investigate the data and summarize the key insights.
• It gives us a basic understanding of our data, its distribution, null values, and much more.
• We can explore data either using graphs or through Python functions.
• There are two types of analysis: univariate and bivariate. In univariate analysis, we analyze a single attribute. In bivariate analysis, we analyze an attribute together with the target attribute.
• In the non-graphical approach, we use functions and attributes such as shape, describe (summary statistics), isnull, info, dtypes, and more.
• In the graphical approach, we use plots such as scatter, box, bar, density, and correlation plots.
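The non-graphical approach can be sketched as below. This is only an illustration: the dataset is made up, and the column "passed" is assumed to play the role of the target attribute.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset; "passed" is assumed to be the target attribute.
df = pd.DataFrame(
    {
        "hours": [1.0, 2.0, 3.0, 4.0, np.nan],
        "score": [35.0, 50.0, 65.0, 80.0, 70.0],
        "passed": [0, 1, 1, 1, 1],
    }
)

print(df.shape)    # (rows, columns)
print(df.dtypes)   # data type of each column
df.info()          # prints column names, non-null counts, and dtypes

summary = df.describe()           # count, mean, std, min, quartiles, max
null_counts = df.isnull().sum()   # number of missing values per column

# Univariate analysis: a single attribute on its own.
mean_score = df["score"].mean()

# Bivariate analysis: an attribute compared against the target attribute.
score_by_target = df.groupby("passed")["score"].mean()

print(summary)
print(null_counts)
print(score_by_target)
```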
• Data visualization with pandas is the presentation of data in a graphical format. It helps people understand the significance of data by summarizing a huge amount of data in a simple, easy-to-understand format, and it helps communicate information clearly and effectively.
• Pandas DataFrame plots:
• There are several plot types built into pandas, most of them statistical plots by nature:
• df.plot.area
• df.plot.barh
• df.plot.density
• df.plot.hist
• df.plot.line
• df.plot.scatter
• df.plot.bar
• df.plot.box
• df.plot.hexbin
• df.plot.kde
• df.plot.pie
• These are the different types of DataFrame plots with which one can visualize data or datasets.
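A few of these plot types can be tried as in the sketch below. It assumes matplotlib is installed (pandas plotting relies on it) and uses made-up sample data; the "Agg" backend is chosen only so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for scripts

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {"hours": [1, 2, 3, 4, 5], "score": [35, 50, 65, 80, 70]}
)

# Each call draws on a new figure and returns a matplotlib Axes object.
ax_line = df.plot.line(x="hours", y="score")
ax_scatter = df.plot.scatter(x="hours", y="score")
ax_hist = df["score"].plot.hist(bins=5)
ax_box = df.plot.box()
```

In a notebook such as those on Kaggle, the backend line can be dropped and the plots appear inline.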
• Kaggle is the world's largest data science community, with powerful tools and resources to help us achieve our data science goals.
• Using data in Kaggle:
• Kaggle allows us to create our own custom datasets, share them with others, and easily import them into our notebooks. Additionally, we can add private datasets that are visible only to us.
• The data types supported for dataset columns in Kaggle are integers, floats, booleans, and strings.
• Kaggle also supports special BigQuery datasets.
• These are the learning points that I have gained from the above activities.
Plan for the next week: Project (Pandas and Data Visualization).
Any leave taken during the week: No
Any other point: No other points as of now.
