3rd Week Report


WEEKLY REPORT OF DATA ANALYSIS USING PYTHON

Name of the Student and Roll No.: Nandini Singh / 220617005


Name of the Company: Samatrix.io
Period of the Report: 3rd week
Activities undertaken during the week: Details of the activity:
Pandas: DataFrame functions and attributes such as describe(), info() and columns.
Project: importing a dataset.
Various EDA functions on the dataset.
Exploratory Data Analysis (EDA).
Data visualization in pandas.
Using data on Kaggle.
Details of field trips undertaken (if any) and summary of results of such trips: As this is a virtual internship, no field trips have been undertaken yet.
Learning Points acquired from above activities: This week we learned about many topics, such as:
• A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).
• A pandas DataFrame consists of three principal components: the data, the rows and the columns.
• A pandas DataFrame can be created by loading a dataset from existing storage, such as a SQL database, a CSV file or an Excel file. It can also be created from lists, dictionaries, a list of dictionaries, etc.
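The construction routes mentioned above can be sketched as follows. This is a minimal illustration with made-up column names and values; the CSV is read from an in-memory string (io.StringIO) instead of a real file, and the SQL and Excel readers (pd.read_sql, pd.read_excel) are not shown because they need a database connection or an Excel engine.

```python
import io

import pandas as pd

# From a list of lists: each inner list becomes one row
from_list = pd.DataFrame([["Asha", 21], ["Ravi", 23]], columns=["name", "age"])

# From a dictionary: keys become column labels, values become column data
from_dict = pd.DataFrame({"name": ["Asha", "Ravi"], "age": [21, 23]})

# From a list of dictionaries: each dictionary becomes one row
from_records = pd.DataFrame([{"name": "Asha", "age": 21}, {"name": "Ravi", "age": 23}])

# From existing storage: read_csv accepts a path or any file-like object;
# an in-memory string stands in for a real CSV file here
csv_text = "name,age\nAsha,21\nRavi,23\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

# All four routes should produce the same table
print(from_list.equals(from_dict), from_dict.equals(from_records), from_records.equals(from_csv))
```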
• Creating a DataFrame using a list.
• Dealing with rows and columns.
• Indexing and selecting data.
• Working with missing data.
• Iterating over rows and columns.
• Dropping missing values using dropna().
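A short sketch of the topics just listed, using a small hypothetical weather table with deliberately missing values. It sticks to standard pandas calls (drop, loc, Boolean filtering, isnull, iterrows, items, dropna); the data and column names are illustrative only.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value in each numeric column
df = pd.DataFrame(
    {"city": ["Delhi", "Pune", "Agra"], "temp": [31.5, np.nan, 29.0], "rain": [12, 4, np.nan]}
)

# Dealing with columns: add a derived column, then delete it again
df["temp_f"] = df["temp"] * 9 / 5 + 32
df = df.drop(columns=["temp_f"])

# Indexing and selecting: one column, a label-based slice, a Boolean filter
temps = df["temp"]
first_two = df.loc[0:1, ["city", "temp"]]  # loc slices include the end label
hot = df[df["temp"] > 30]                  # NaN compares as False

# Working with missing data: count NaNs per column
missing_per_column = df.isnull().sum()

# Iterating over rows (iterrows yields index and row Series) and over columns (items)
for idx, row in df.iterrows():
    print(idx, row["city"], row["temp"])
for name, column in df.items():
    print(name, column.dtype)

# Dropping missing values: by default any row containing a NaN is removed
complete_rows = df.dropna()
print(complete_rows)
```

Row-wise iteration is convenient for inspection but slow on large data; vectorized column operations such as the temp_f calculation are usually preferred.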
• DataFrame methods and attributes (function: description):
• index: attribute that returns the index (row labels) of the DataFrame.
• insert(): inserts a column into a DataFrame at a specified position.
• add(): returns the element-wise addition of the DataFrame and another object (binary operator add).
• sub(): returns the element-wise subtraction of another object from the DataFrame (binary operator sub).
• mul(): returns the element-wise multiplication of the DataFrame and another object (binary operator mul).
• div(): returns the element-wise floating division of the DataFrame by another object (binary operator truediv).
• unique(): a Series method that returns the unique values of a column, in order of appearance.
• nunique(): returns the number of distinct values in each column of the DataFrame (NaN is excluded by default).
• value_counts(): counts the number of times each unique value occurs within a Series.
• columns: attribute that returns the column labels of the DataFrame.
• axes: attribute that returns a list containing the row labels and the column labels of the DataFrame.
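• isnull(): returns a Boolean object of the same shape that marks missing (null) values; applied to a column, the resulting Boolean Series can be used to extract the rows with null values.
• notnull(): the inverse of isnull(); it marks non-null values, so it can be used to extract the rows with non-null values.
• between(): a Series method that returns a Boolean Series that is True where a value lies between two bounds (inclusive of both by default); it is used to extract rows where a column value falls within a predefined range.
• isin(): returns a Boolean mask showing whether each value is contained in a given collection; it is used to extract rows where a column value exists in that collection.
• dtypes: attribute that returns a Series with the data type of each column. The result's index is the original DataFrame's columns.
• astype(): converts (casts) a Series or DataFrame to a specified data type.
• values: attribute that returns a NumPy representation of the DataFrame, i.e. only the values in the DataFrame are returned and the axis labels are removed.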
• sort_values(): sorts a DataFrame in ascending or descending order by the values of the given column(s).
• sort_index(): sorts a DataFrame by its index labels instead of its values. This is useful, for example, when a DataFrame has been built from two or more DataFrames and its index is no longer in order.
• loc[]: indexer that selects rows (and columns) by index label.
• iloc[]: indexer that selects rows (and columns) by integer position.
• ix[]: an older indexer that selected rows by either index label or index position, combining the behavior of .loc[] and .iloc[]. It was deprecated and then removed in pandas 1.0, so .loc[] or .iloc[] should be used instead.
• rename(): changes the names of index labels or columns of a DataFrame.
• columns: assigning a new list of names to the columns attribute is an alternative way to change the column names.
• drop(): deletes rows or columns from a DataFrame.
• pop(): removes a single column from a DataFrame and returns it as a Series.
• sample(): returns a random sample of rows or columns from a DataFrame.
• nsmallest(): returns the first n rows ordered by the smallest values in the given column(s).
• nlargest(): returns the first n rows ordered by the largest values in the given column(s).
• shape: attribute that returns a tuple representing the dimensionality of the DataFrame (number of rows, number of columns).
• ndim: attribute that returns an int representing the number of axes (array dimensions): 1 for a Series and 2 for a DataFrame.
• dropna(): drops rows or columns that contain null values; parameters such as axis, how, thresh and subset control exactly which ones are dropped.
• fillna(): replaces NaN values with a value supplied by the user.
• rank(): computes the numerical rank of the values in a Series (or along an axis of a DataFrame); ties receive the average rank by default.
• query(): an alternative, string-based syntax for extracting a subset of rows from a DataFrame.
• copy(): creates an independent (deep, by default) copy of a pandas object.
• duplicated(): returns a Boolean Series marking duplicate rows (by default, every occurrence after the first), which can be used to extract the rows that have duplicate values.
• drop_duplicates(): an alternative to filtering with duplicated(); it identifies duplicate rows and removes them directly.
• set_index(): sets the DataFrame index (row labels) using one or more existing columns.
• reset_index(): resets the index of a DataFrame to the default integer index, running from 0 to the length of the data minus 1; the old index is kept as a column unless drop=True is passed.
• where(): checks a DataFrame against one or more conditions and returns a result of the same shape; by default, the values (not whole rows) that do not satisfy the condition are replaced with NaN.
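The tour below exercises a subset of the methods and attributes listed above on a small, made-up marks table (all names and numbers are illustrative). It is a sketch rather than a recommended workflow; sample() is left out because its output is random, and ix[] is left out because current pandas versions no longer provide it.

```python
import pandas as pd

# Illustrative marks table; the second "Ravi" row is an exact duplicate
df = pd.DataFrame(
    {"name": ["Asha", "Ravi", "Meena", "Ravi", "Kiran"],
     "maths": [78, 92, 65, 92, 88],
     "science": [81, 70, 90, 70, 75]}
)

# Inspection attributes are accessed without parentheses
print(df.shape, df.ndim, list(df.columns), df.dtypes.to_dict())

# Duplicates: flag them, remove them, then renumber the index from 0
dupes = df.duplicated()
df = df.drop_duplicates().reset_index(drop=True)

# Element-wise arithmetic, column insertion and type conversion
total = df["maths"].add(df["science"])
df.insert(3, "total", total)
df["total"] = df["total"].astype(float)

# Filtering with between(), isin() and query()
mid_range = df[df["maths"].between(70, 90)]
chosen = df[df["name"].isin(["Asha", "Kiran"])]
strong = df.query("science >= 80")

# Sorting, ranking and top-n selection
by_total = df.sort_values("total", ascending=False)
df["rank"] = df["total"].rank(ascending=False)
top_two = df.nlargest(2, "total")

# Label-based versus position-based selection after setting an index
indexed = df.set_index("name")
asha_maths = indexed.loc["Asha", "maths"]
first_row_maths = indexed.iloc[0, 0]

# Renaming, conditional masking (values below 75 become NaN) and popping a column
renamed = df.rename(columns={"maths": "math"})
scores = df[["maths", "science"]]
masked = scores.where(scores >= 75)
rank_col = df.pop("rank")
print(by_total, masked, rank_col, sep="\n\n")
```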
• EDA is applied to investigate the data and summarize its key insights.
• It gives us a basic understanding of our data: its distribution, its null values and much more.
• We can explore data either using graphs or through Python functions.
• There are two types of analysis: univariate and bivariate. In univariate analysis, we analyze a single attribute. In bivariate analysis, we analyze the relationship between two attributes, typically an attribute and the target attribute.
• In the non-graphical approach, we use attributes and functions such as shape, describe() (summary statistics), info(), isnull() and dtypes.
• In the graphical approach, we use plots such as scatter, box, bar, density and correlation plots.
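A minimal sketch of the non-graphical approach, assuming a tiny hypothetical dataset in which "passed" plays the role of the target attribute. The univariate part uses value_counts() and mean(); for the bivariate part, groupby() and corr() are swapped in as one common way to relate an attribute to the target, since they are not among the methods listed earlier.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset; "passed" (0 or 1) is treated as the target attribute
df = pd.DataFrame(
    {
        "hours_studied": [2, 5, 1, 8, 6, np.nan, 7, 3],
        "gender": ["F", "M", "M", "F", "F", "M", "F", "M"],
        "passed": [0, 1, 0, 1, 1, 0, 1, 0],
    }
)

# Non-graphical overview
print(df.shape)           # (rows, columns)
df.info()                 # column dtypes and non-null counts
print(df.describe())      # summary statistics of the numeric columns
print(df.isnull().sum())  # missing values per column

# Univariate analysis: one attribute at a time
gender_counts = df["gender"].value_counts()
hours_mean = df["hours_studied"].mean()  # NaN is skipped by default

# Bivariate analysis: an attribute against the target attribute
pass_rate_by_gender = df.groupby("gender")["passed"].mean()
corr_with_target = df["hours_studied"].corr(df["passed"])  # pairs with NaN are ignored
print(gender_counts, hours_mean, pass_rate_by_gender, corr_with_target, sep="\n\n")
```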
• Data visualization with pandas is the presentation of data in a graphical format. It helps people understand the significance of data by summarizing and presenting large amounts of data in a simple, easy-to-understand format, and it helps communicate information clearly and effectively.
• Pandas DataFrame plots
• There are several plot types built into pandas, most of them statistical plots by nature:
• df.plot.area
• df.plot.barh
• df.plot.density
• df.plot.hist
• df.plot.line
• df.plot.scatter
• df.plot.bar
• df.plot.box
• df.plot.hexbin
• df.plot.kde
• df.plot.pie
• These are the different plot types with which one can visualize the data stored in a DataFrame.
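A sketch of a few of these plot types, assuming matplotlib is installed (pandas uses it as its plotting backend). The data are random, hypothetical heights and weights; the non-interactive "Agg" backend is selected so the figures are written to PNG files in a temporary directory instead of being shown. density and kde are omitted because they additionally require SciPy.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # draw into files, no display window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical numeric data: 200 random heights (cm) and weights (kg)
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {"height": rng.normal(165, 8, 200), "weight": rng.normal(60, 10, 200)}
)

# Four of the built-in plot types; each call returns a matplotlib Axes
plots = {
    "hist": df.plot.hist(bins=20, alpha=0.5),
    "box": df.plot.box(),
    "scatter": df.plot.scatter(x="height", y="weight"),
    "hexbin": df.plot.hexbin(x="height", y="weight", gridsize=15),
}

# Save each figure as a PNG in a temporary directory, then close it
out_dir = tempfile.mkdtemp()
saved = []
for name, ax in plots.items():
    path = os.path.join(out_dir, f"{name}.png")
    ax.figure.savefig(path)
    plt.close(ax.figure)
    saved.append(path)
print(saved)
```

In a Jupyter or Kaggle notebook the matplotlib.use("Agg") line and the saving loop can be dropped, since plots are rendered inline.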
• Kaggle is the world's largest data science community, with powerful tools and resources to help us achieve our data science goals.
• Using data on Kaggle:
• Kaggle allows us to create our own custom datasets, share them with others and easily import them into our notebooks. Additionally, we can add private datasets that are visible only to us.
• The data types found in Kaggle datasets include integers, floats, booleans and strings.
• Kaggle also supports special BigQuery datasets.
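Datasets attached to a Kaggle notebook are mounted read-only under /kaggle/input, and Kaggle's default notebook template lists them with os.walk. The sketch below wraps that idea in two helper functions; the names list_input_files and load_first_csv are hypothetical, and outside Kaggle the directory simply does not exist, so the listing comes back empty.

```python
import os

import pandas as pd


def list_input_files(root="/kaggle/input"):
    """Return the sorted paths of all files below root (empty if root is missing)."""
    paths = []
    for dirname, _, filenames in os.walk(root):
        for filename in filenames:
            paths.append(os.path.join(dirname, filename))
    return sorted(paths)


def load_first_csv(root="/kaggle/input"):
    """Load the first CSV found below root into a DataFrame, or return None."""
    csv_paths = [p for p in list_input_files(root) if p.lower().endswith(".csv")]
    return pd.read_csv(csv_paths[0]) if csv_paths else None


files = list_input_files()
print(files if files else "No datasets found under /kaggle/input")
```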
• These are the learning points that I acquired from the above activities.
Plan for the next week: Project (Pandas and data visualization).
Any leave taken during the week: No
Any other points: None as of now.
