Data Science Report
Module-1:-
Topic:- Significance of Data science
• Data vs Information
The difference can be traced by keeping in mind four key terms: Data, Information,
Knowledge and Wisdom.
• Types of analysis
-Descriptive: It is at the foundation of all data insight. It is the
simplest and most common use of data in business today.
Descriptive analysis answers the question “what happened?” by summarizing
past data, usually in the form of dashboards.
-Prescriptive: Prescriptive analysis utilizes state-of-the-art technology and data practices.
2. Python
3. Jupyter
Topic:- Significance of Python
• Significance of python
• Features of python
Java, etc. It is very easy to code in Python, and anybody can learn it.
2. Free and open source:- Python is freely available at the official website,
and you can download it from there. Since it is open source, its source code is
also available to the public, so you can download it, use it and share it.
• Tokens in python
Comments and their types supported by Python
1. Single-line comment:- represented by ‘#’; everything after # on that
line is a comment
2. Multi-line comment:- Python has no dedicated multi-line comment syntax; a
triple-quoted string such as '''xxx''' (or several # lines) is commonly used
for comments that take up more than one line
Examples
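A minimal sketch of both comment styles. Strictly speaking, the triple-quoted block is an unassigned string literal that Python evaluates and then discards, which is why it can serve as a multi-line comment.

```python
# This is a single-line comment
x = 10  # a comment can also follow code on the same line

'''
This triple-quoted string is not assigned to anything,
so Python evaluates it and discards it; it is commonly
used as a multi-line comment.
'''

print(x)  # prints 10
```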
2. Assignment operators:- Add assign(+=), Assignment(=), Subtract assign(-=),
Multiply assign(*=), divide assign(/=), modulus assign(%=), exponent assign(**=),
floor divide assign(//=)
Examples
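A short sketch applying each assignment operator in turn to one variable (the starting value 10 is arbitrary). Note that `/=` turns the value into a float, so every later result is a float as well.

```python
x = 10      # assignment
x += 5      # x = x + 5   -> 15
x -= 3      # x = x - 3   -> 12
x *= 2      # x = x * 2   -> 24
x /= 4      # x = x / 4   -> 6.0 (true division always gives a float)
x %= 4      # x = x % 4   -> 2.0
x **= 3     # x = x ** 3  -> 8.0
x //= 3     # x = x // 3  -> 2.0 (floor division)
print(x)    # 2.0
```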
3. Comparison operators:- Equal to(==), not equal to(!=), greater than(>), less
than(<), greater than or equal to(>=), less than or equal to(<=)
Examples
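A sketch of the comparison operators on two arbitrary integers; each expression evaluates to a boolean.

```python
a, b = 10, 20
results = [
    a == b,   # False
    a != b,   # True
    a > b,    # False
    a < b,    # True
    a >= 10,  # True
    a <= 5,   # False
]
print(results)  # [False, True, False, True, True, False]
```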
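4. Identity operators:- is, is not (check whether two names refer to the same object)
5. Membership operators:- in, not in (check whether a value is present in a sequence or collection)
6. Bitwise operators:- AND(&), OR(|), NOT(~), XOR(^), left shift(<<), right shift(>>)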
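A sketch of identity, membership and bitwise operators. The lists and the values 12 and 10 are arbitrary; they were chosen so the binary patterns (`0b1100` and `0b1010`) make the bitwise results easy to check by hand.

```python
a = [1, 2, 3]
b = a          # b refers to the same list object as a
c = [1, 2, 3]  # c is an equal but separate list object

print(a is b)      # True  (same object)
print(a is c)      # False (different objects, even though a == c)
print(a is not c)  # True

print(2 in a)      # True
print(5 not in a)  # True

x, y = 12, 10      # 0b1100 and 0b1010
print(x & y)       # 8   (0b1000)
print(x | y)       # 14  (0b1110)
print(x ^ y)       # 6   (0b0110)
print(~x)          # -13 (~x equals -(x + 1))
print(x << 2)      # 48
print(x >> 2)      # 3
```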
Topic:- Control statements in python
• Control statements
There are three main types of control statements in Python, namely
conditional (if, elif, else), looping (for, while) and transfer statements (break, pass)
1. Conditional:-
2. Looping:-
3. Break:-
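A minimal sketch touching each kind of control statement; the helper name `classify` and the sample values are hypothetical.

```python
def classify(n):
    # Conditional: if / elif / else
    if n > 0:
        return "positive"
    elif n < 0:
        return "negative"
    else:
        return "zero"

labels = []
for n in [5, -2, 0]:          # for loop over a list
    labels.append(classify(n))

count = 0
while True:                   # while loop, stopped with break
    count += 1
    if count == 3:
        break

for n in range(3):
    pass                      # pass: a placeholder that does nothing

print(labels, count)          # ['positive', 'negative', 'zero'] 3
```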
Topic:- Functions in python
• Functions in python
A function is a block of code which only runs when it is called.
Syntax:-
def function_name():
    print("Hello World")

function_name()
Topic:- Packages and modules in python
Modules can be imported from any directory on Python's module search path
(sys.path), and they can be given alias names for ease of use.
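A sketch using the standard-library `math` module to show both an aliased module import and importing a single name from a module.

```python
import math as m              # import a module under an alias
from math import sqrt         # import one name directly from the module

print(m.pi)        # 3.141592653589793
print(sqrt(16))    # 4.0
root = m.sqrt(25)  # 5.0, the same function reached through the alias
```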
2. Packages:-
A package, in essence, is like a directory holding sub packages and modules.
While we can create our own packages, we can also use one from the Python
Package Index (PyPI) to use for our projects.
To import a package, we type the following:
import Game.Sound.load
We can also import it giving it an alias:
import Game.Sound.load as load_game
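The `Game.Sound.load` package above is hypothetical, so it cannot be imported here. As a runnable stand-in, this sketch uses the standard-library `xml` package, whose `etree` sub-package contains the `ElementTree` module, to show the same dotted-import and alias forms.

```python
import xml.etree.ElementTree            # package.subpackage.module
import xml.etree.ElementTree as ET      # the same module under an alias

root = ET.fromstring("<game><sound>load</sound></game>")
print(root.find("sound").text)          # load
print(ET is xml.etree.ElementTree)      # True: both names refer to one module object
```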
Module-3:-
1) String:-
Different ways to declare strings
Different ways to access string elements
Mathematical operations on strings
Other operations that can be applied
Implementation using python
String Declaration:-
Syntax:-
1)single quote
2)double quote
3)triple quote
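A sketch of the three declaration styles; the sample text is arbitrary.

```python
s1 = 'Yash'                    # single quotes
s2 = "Yash's notes"            # double quotes allow ' inside the string
s3 = '''Multi-line
string using triple quotes'''  # triple quotes can span several lines
print(s1, s2)
print(s3.count("\n"))          # 1 (the string spans two lines)
```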
->By index:-
Python supports both +ve and -ve indexes
+ve indexes count from the left, starting at 0
-ve indexes count from the right, starting at -1
Syntax:-
Str[beg_index:end_index:step_value]
end_index is excluded from the result
The default step value is 1
Step value can be +ve or -ve
Note:-
In the backward direction (-ve step), if end value = -1 the result is
empty
In the forward direction (+ve step), if end value = 0 the result is empty
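A sketch of indexing and slicing on an arbitrary string, including the two empty-result cases from the note above.

```python
s = "Yashveer"
print(s[0], s[-1])    # Y r
print(s[0:4])         # Yash      (end index is excluded)
print(s[::2])         # Ysve      (step value 2)
print(s[::-1])        # reevhsaY  (negative step reads backward)
print(repr(s[5:-1:-1]))  # ''     (backward direction with end value -1)
print(repr(s[3:0]))      # ''     (forward direction with end value 0)
```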
Ex:- (concatenation with +)
print("Yash" + "veer")  # Yashveer
->Checking membership:-
It enables us to check whether a character or string is a member of another
string
Ex:-
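A sketch of the membership operators on strings (sample text arbitrary).

```python
s = "native tech hub"
print("tech" in s)      # True
print("x" not in s)     # True
print("Tech" in s)      # False: membership is case-sensitive
```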
-> Comparison of Strings:-
Relational operators are used to perform string comparisons
Comparison is lexicographic (dictionary order), based on the Unicode code points of
the characters, so uppercase letters sort before lowercase ones
Ex:-
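A sketch of string comparison; `ord()` shows the code points that decide the ordering.

```python
print("apple" < "banana")   # True: 'a' comes before 'b'
print("Apple" < "apple")    # True: uppercase letters have smaller code points
print("abc" < "abcd")       # True: a prefix is smaller than the longer string
print(ord("A"), ord("a"))   # 65 97
```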
->Finding substrings:-
1) Forward direction:
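->find()
Returns the index of the first occurrence of the given substring, else it returns -1
->index()
It acts the same as find(), but if the given substring does not exist it raises a ValueError
2) Backward direction:-
rfind(): like find(), but returns the index of the last occurrence
rindex(): like index(), but returns the index of the last occurrence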
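A sketch of the forward and backward search methods on an arbitrary string, including the `ValueError` that `index()` raises when the substring is missing.

```python
s = "banana"
print(s.find("an"))    # 1   first occurrence from the left
print(s.rfind("an"))   # 3   last occurrence (searching from the right)
print(s.find("xyz"))   # -1  not found
print(s.index("na"))   # 2
print(s.rindex("na"))  # 4

try:
    s.index("xyz")     # index() raises instead of returning -1
    missing_raised = False
except ValueError as e:
    missing_raised = True
    print("ValueError:", e)
```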
->Joining of strings
separator.join(list_of_strings):- Enables us to join a group of strings, such as a list or
tuple, using the given separator
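A sketch of `join()` with a list and with a tuple (sample words arbitrary).

```python
words = ["native", "tech", "hub"]
joined = "-".join(words)              # native-tech-hub
spaced = " ".join(("Yash", "veer"))   # Yash veer (a tuple works too)
print(joined, spaced)
```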
2) List:-
In Python, a list is an ordered group of heterogeneous items, elements or
objects
All the items are treated as a single entity
Elements of a list can be accessed by an index
Lists can be arbitrarily nested, i.e. a list can contain other lists as sublists
Lists are dynamic (variable size)
Lists are mutable
Elements can be accessed in both directions
1) empty list:- List1 = []
2) static list:- List1 = [10, 20, 30, 40]
3) dynamic list:- List1 = eval(input("enter list of objects"))
4) using list() function:- L = list(range(0, 10, 2)) or L = list("Yash")
5) using split() function:- Str = "native tech hub software company"; L = Str.split()
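A sketch of the creation methods above, skipping the interactive `input()` case so it runs unattended. For that dynamic case, `ast.literal_eval(input(...))` is a safer choice than `eval`, since it accepts only Python literals.

```python
list1 = []                          # empty list
list2 = [10, 20, 30, 40]            # static list
list3 = list(range(0, 10, 2))       # [0, 2, 4, 6, 8]
list4 = list("Yash")                # ['Y', 'a', 's', 'h']
list5 = "native tech hub software company".split()
nested = [1, "two", [3, 4]]         # heterogeneous and nested
print(list5)                        # ['native', 'tech', 'hub', 'software', 'company']
print(nested[2][0])                 # 3
```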
Operations on list:-
Metadata related functions about List:-
len() : returns the length of the list
count() : returns the number of occurrences of the given item in the list
index() : returns the index of the first occurrence of the supplied item
Manipulation functions
append() : appends the item at the end of the list
insert() : inserts the item at a specific index
extend() : list1.extend(list2) appends all the elements of list2 to the end of list1
remove() vs clear() : remove() removes a specified item (its first occurrence), whereas clear()
removes all the items from the list
pop() : removes and returns the last element (or the element at a given index)
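A sketch applying the functions above to one arbitrary list; the comments track the list's contents after each call.

```python
nums = [10, 20, 30, 20]
print(len(nums), nums.count(20), nums.index(30))  # 4 2 2

nums.append(40)          # [10, 20, 30, 20, 40]
nums.insert(1, 15)       # [10, 15, 20, 30, 20, 40]
nums.extend([50, 60])    # [10, 15, 20, 30, 20, 40, 50, 60]
nums.remove(20)          # only the first 20 goes: [10, 15, 30, 20, 40, 50, 60]
last = nums.pop()        # removes and returns 60
snapshot = list(nums)    # [10, 15, 30, 20, 40, 50]

nums.clear()             # removes every item
print(snapshot, last, nums)  # [10, 15, 30, 20, 40, 50] 60 []
```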
Functions of Ordering elements in list:-
reverse()
sort()
Aliasing and cloning methods
Mathematical operations on list:-
Concatenation (+)
Repetition ( * )
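A sketch of the ordering functions, the difference between aliasing and cloning, and the two list operators. Cloning here uses `list.copy()`; the slice `a[:]` works the same way.

```python
a = [3, 1, 2]
a.reverse()          # in place: [2, 1, 3]
a.sort()             # in place: [1, 2, 3]

alias = a            # aliasing: both names refer to one list
clone = a.copy()     # cloning: a new, independent list
alias.append(4)
print(a)             # [1, 2, 3, 4]  changed through the alias
print(clone)         # [1, 2, 3]     the clone is unaffected

print([1, 2] + [3])  # [1, 2, 3]     concatenation
print([0] * 3)       # [0, 0, 0]     repetition
```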
3) Tuple:-
Mathematical Operators:-
Concatenation (+)
Repetition ( * )
Functions:-
len()
count()
index()
sorted()
min()
max()
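A sketch of the tuple operators and functions above on an arbitrary tuple. Note that `len()`, `sorted()`, `min()` and `max()` are built-in functions, while `count()` and `index()` are tuple methods.

```python
t = (30, 10, 20, 10)
print(t + (40,))        # (30, 10, 20, 10, 40)  concatenation
print((1, 2) * 2)       # (1, 2, 1, 2)          repetition
print(len(t), t.count(10), t.index(20))  # 4 2 2
print(sorted(t))        # [10, 10, 20, 30]  sorted() returns a new list
print(min(t), max(t))   # 10 30
```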
4) Set:-
5) Dictionary:-
Declaration of a dictionary data structure:-
D = {}
D = dict()
We can create an empty dictionary and add entries as follows
D[100] = "Yash"
D[200] = "Singh"
D[300] = "NTH"
print(D)
Output: {100: 'Yash', 200: 'Singh', 300: 'NTH'}
->Stack memory (RAM)
->Content held in this memory is completely wiped once the program ends
That is the reason we use files to store data permanently
Access modes:-
r -> the default mode; opens a file for reading only
rb -> opens a file for reading only, in binary format
r+ -> opens a file for both reading and writing
rb+ -> opens a file for both reading and writing, in binary format
w -> opens a file for writing only; overwrites the file if it exists and creates a new
one if it doesn't
wb -> opens a file for writing only, in binary format; overwrites the file if it exists
and creates a new one if it doesn't
a -> opens a file for appending
ab -> opens a file for appending, in binary format
a+ -> opens a file for both appending and reading
ab+ -> opens a file for both appending and reading, in binary format
w+ -> opens a file for both writing and reading; overwrites the file if it exists
and creates a new one if it doesn't
wb+ -> same as w+, in binary format
Renaming a file
os.rename(old_name, new_name)
Deleting a file
os.remove(file_name)
file.seek(offset)
changes the current file position
file.tell()
returns the current position within the file, i.e. where the next read or write will happen
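A sketch that exercises several access modes together with rename, delete, `seek()` and `tell()`. It works in a temporary directory so it leaves nothing behind; the file names are hypothetical.

```python
import os
import tempfile

folder = tempfile.mkdtemp()                      # scratch directory for this demo
path = os.path.join(folder, "demo.txt")          # hypothetical file names
new_path = os.path.join(folder, "renamed.txt")

with open(path, "w") as f:                       # "w": create (or overwrite), write only
    f.write("Hello ")

with open(path, "a") as f:                       # "a": add to the end of the file
    f.write("World")

with open(path, "rb") as f:                      # "rb": read only, binary format
    f.seek(6)                                    # jump to byte offset 6
    pos = f.tell()                               # 6
    rest = f.read()                              # b'World'

os.rename(path, new_path)                        # rename: old name, then new name
os.remove(new_path)                              # delete the file
os.rmdir(folder)                                 # remove the now-empty scratch directory
print(pos, rest)                                 # 6 b'World'
```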
• Regular Expressions:-
An RE is defined as a set of symbols used to extract a set of strings
based on a specified pattern.
Ex:- display all the Python files on our computer whose extension is .py (pattern *.py)
Such a pattern is used to locate a chunk of text in a string by matching it against the
pattern
Ex:- email addresses, phone numbers etc.
In Python we use the module re
Functions:-
search():- searches for the first occurrence of the RE pattern within the string, with optional
flags
sub():- replaces one or many matches with a string
findall():- returns a list containing all matches
split():- returns a list where the string has been split at each match
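A sketch of the four `re` functions. The email pattern is a simplified illustration only and would not validate real-world addresses; the sample text is made up.

```python
import re

text = "Contact: yash@example.com, singh@test.org; phone 98765"
email_pattern = r"[\w.]+@[\w.]+\.\w+"     # simplified email pattern (illustrative only)

m = re.search(email_pattern, text)         # first match only
print(m.group())                           # yash@example.com

emails = re.findall(email_pattern, text)   # every match, as a list
print(emails)                              # ['yash@example.com', 'singh@test.org']

masked = re.sub(r"\d", "#", "phone 98765") # replace each digit
print(masked)                              # phone #####

parts = re.split(r"[,;]\s*", "a, b;c")     # split at each comma or semicolon
print(parts)                               # ['a', 'b', 'c']
```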
Symbolic characters:-
• Exception Handling:-
Syntax:-
try:
    pass  # run code
except Exception:
    pass  # execute this when there is an exception
else:
    pass  # no exception: run this
finally:
    pass  # this code will always run
Python has two types of exceptions
1) predefined:- raised automatically by the Python virtual machine when an abnormal event
occurs
2) user-defined:- these fall under user-domain errors; they are defined by the programmer
and raised explicitly using the raise keyword
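A sketch showing the order in which `try`/`except`/`else`/`finally` run, plus a user-defined exception raised with `raise`. The names `InsufficientFundsError` and `withdraw` are hypothetical.

```python
class InsufficientFundsError(Exception):
    """Hypothetical user-defined exception."""

def withdraw(balance, amount):
    if amount > balance:
        raise InsufficientFundsError("balance too low")  # raised explicitly
    return balance - amount

log = []
for a, b in [(10, 2), (10, 0)]:
    try:
        result = a / b                 # ZeroDivisionError is a predefined exception
    except ZeroDivisionError:
        log.append("error")
    else:
        log.append(result)             # runs only when no exception occurred
    finally:
        log.append("done")             # always runs

try:
    withdraw(100, 500)
except InsufficientFundsError as e:
    log.append(str(e))

print(log)  # [5.0, 'done', 'error', 'done', 'balance too low']
```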
• Database Programming:-
Database:-
A database stores data in a structured format, i.e. in a relational DBMS
or
A shared collection of logically related data, and a description of that data,
designed to fulfill the needs of an organization
DBMS:-
It is software that enables users to define, create, maintain and control access
to the database.
ex:- MySQL, Oracle, SQL Server, DB2, Siebel, FoxPro, MongoDB etc.
Application Program (Frontend):-
a program which interacts with the database by issuing appropriate requests (SQL statements)
Ex:-
The client (frontend) interacts with the server, and the server hosts the DBMS (backend)
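A minimal sketch of an application program issuing SQL statements, using Python's built-in `sqlite3` module. SQLite runs inside the Python process rather than as a separate server, so this only approximates the client/server picture; the `student` table and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")     # temporary in-memory database (the backend)
cur = conn.cursor()

# The Python program acts as the frontend by issuing SQL statements
cur.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
cur.executemany("INSERT INTO student (id, name) VALUES (?, ?)",
                [(100, "Yash"), (200, "Singh")])
conn.commit()

cur.execute("SELECT name FROM student WHERE id = ?", (200,))
row = cur.fetchone()
print(row)                             # ('Singh',)
conn.close()
```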
Limitations of flat file system:-
Dependency of program on physical structure
complex process to retrieve data
loss of data on concurrent access
inability to give access based on record
data redundancy
data inconsistency
only partial data recovery
• Object Oriented Programming:-
Object Oriented Programming is a way of computer programming that uses the
idea of “objects” to represent data and methods
It is an approach used for creating neat, reusable code instead of redundant
code
The program is divided into self-contained objects, or several mini-programs.
Every individual object represents a different part of the application, having its
own logic and data, and the objects communicate with each other.
ENTITY:-
An entity is any singular, identifiable and separate object.
CLASS:-
->A class is a design entity: an object’s blueprint, description or
definition. Classes are created using the keyword ‘class’ and an
indented block, which contains the class methods and attributes.
-> A class definition in Python typically starts with the __init__
method.
-> The __init__ method is called automatically when an object of the class is
created.
-> This method acts as the constructor of the class. A constructor is a
method that we can use to initialize a new object to certain
values. We should use ‘self’ as the first parameter of all instance methods.
Types of variables:-
Instance:-
These variables are also known as object-level variables
➔If the value of a variable varies from object to object, then
such variables are called instance variables
➔For every object a separate copy of the instance variables will be
created
Static:- also known as class-level variables
➔The value of the variable does not vary from object to object
➔Declared within the class directly, but outside of methods
Local:- also known as method-level variables
➔A variable is said to be local if it is declared within a method
➔Local variables are created at the time of method execution
and destroyed once the method completes.
➔Local variables of a method cannot be accessed from outside the
method.
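A sketch showing the three kinds of variables in one class. The `Student` class, its attributes and values are hypothetical.

```python
class Student:
    school = "NTH"                    # static (class-level) variable, shared by all objects

    def __init__(self, name, marks):  # initializer, called automatically on object creation
        self.name = name              # instance variables: one copy per object
        self.marks = marks

    def grade(self):
        bonus = 5                     # local variable: exists only while grade() runs
        return self.marks + bonus

s1 = Student("Yash", 80)
s2 = Student("Singh", 70)
print(s1.name, s2.name)               # Yash Singh  (differs per object)
print(s1.school, s2.school)           # NTH NTH     (same for all objects)
print(s1.grade())                     # 85
```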
->Inheritance:-
Inheritance allows us to inherit attributes and methods from the base/parent
class
➔ useful as we can create sub-classes and get all of the functionality from
our parent class.
➔ Add new functionality without affecting the parent class.
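A minimal inheritance sketch; `Vehicle` and `Car` are hypothetical names. The child class reuses the parent's constructor via `super()` and adds a method without modifying the parent.

```python
class Vehicle:                       # parent / base class
    def __init__(self, wheels):
        self.wheels = wheels

    def describe(self):
        return f"{self.wheels} wheels"

class Car(Vehicle):                  # child class inherits from Vehicle
    def __init__(self):
        super().__init__(4)          # reuse the parent constructor

    def honk(self):                  # new functionality; Vehicle is unchanged
        return "beep"

c = Car()
print(c.describe(), c.honk())        # 4 wheels beep
print(hasattr(Vehicle, "honk"))      # False
```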
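#DEMO: METHOD OVERLOADING IS NOT POSSIBLE IN PYTHON
class Test:
    def m1(self):
        print("no-argument method")
    def m1(self, a):
        print("one-argument method")
    def m1(self, a, b):
        print("two-argument method")

obj = Test()
obj.m1(10, 20)    # two-argument method
# obj.m1()        # TypeError: only the last definition of m1 survives
# obj.m1(10)      # TypeError
This program can invoke only the last method: each new definition of m1 replaces the
previous one, so calls with zero or one argument raise a TypeError.
#DEMO: HOW METHOD OVERLOADING CAN BE EMULATED IN PYTHON (default arguments)
class Test:
    def sum(self, a=None, b=None, c=None):
        if a is not None and b is not None and c is not None:
            print("sum of 3 no's", a + b + c)
        elif a is not None and b is not None:
            print("sum of 2 no's", a + b)
        else:
            print("please provide 2 or 3 arguments")

obj = Test()
obj.sum(10, 20)      # sum of 2 no's 30
obj.sum(10, 20, 5)   # sum of 3 no's 35
obj.sum(10)          # please provide 2 or 3 arguments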
Module-4:-
• Numpy module
-> NumPy stands for Numerical Python
->Fundamental module for scientific computing using array
objects
->It is part of the SciPy ecosystem (the SciPy library itself is built on NumPy)
->Supports N-dimensional arrays
->To save memory and execution time, it uses vectorized
vector/matrix/tensor operations, making it faster and better
than general Python lists
->Learned how to:
o Install and import the numpy package
o Properties of numpy arrays
o Various functions to create arrays
o Combining arrays
o Converting lists to arrays
o Array indexing
o Array slicing
o Array re-shaping
o Limitation of array arithmetic
o Perform vector dot product
o Perform scalar vector multiplication
o Matrix arithmetic
o Matrix duplication
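A compact sketch covering several of the topics listed above with small arbitrary arrays. The failed addition illustrates one limitation of array arithmetic: arrays whose shapes cannot be broadcast together cannot be combined.

```python
import numpy as np

a = np.array([1, 2, 3])                # converting a list to an array
print(a.ndim, a.shape, a.dtype)        # properties (the integer dtype varies by platform)

z = np.zeros((2, 3))                   # creation functions: zeros, ones, arange, ...
r = np.arange(6)                       # [0 1 2 3 4 5]

stacked = np.vstack([a, a * 10])       # combining arrays: [[1 2 3] [10 20 30]]
print(stacked[1, 2], r[1:4])           # indexing and slicing: 30 [1 2 3]

m = r.reshape(2, 3)                    # re-shaping: [[0 1 2] [3 4 5]]
print(m + stacked)                     # element-wise matrix arithmetic (same shapes)

try:
    m + np.array([1, 2])               # shapes (2, 3) and (2,) cannot be broadcast
    broadcast_failed = False
except ValueError:
    broadcast_failed = True            # limitation: incompatible shapes raise ValueError

print(np.dot(a, a))                    # vector dot product: 1 + 4 + 9 = 14
print(2 * a)                           # scalar-vector multiplication: [2 4 6]
print(m @ m.T)                         # matrix multiplication: [[ 5 14] [14 50]]
```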
• We also learned about different types of matrices
o Square matrix: the number of rows and columns are equal
o Triangular matrices: upper when $A_{ij} = 0$ for $i > j$, lower when
$A_{ij} = 0$ for $i < j$
o Diagonal matrix: $A_{ij} = 0$ for $i \neq j$
• Matrix properties:
o Transpose
o Inverse
o Trace
o Determinant
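A sketch of the matrix types and properties above using NumPy, on an arbitrary 2 x 2 matrix chosen so the trace and determinant are easy to verify by hand.

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])                 # square matrix: 2 rows, 2 columns
upper = np.triu(A)                         # upper triangular: A[i, j] = 0 when i > j
lower = np.tril(A)                         # lower triangular: A[i, j] = 0 when i < j
D = np.diag([1.0, 2.0])                    # diagonal matrix

print(A.T)                                 # transpose
A_inv = np.linalg.inv(A)                   # inverse
print(np.allclose(A @ A_inv, np.eye(2)))   # True
print(np.trace(A))                         # trace: 4 + 6 = 10.0
print(np.linalg.det(A))                    # determinant: 4*6 - 7*2 = 10 (up to rounding)
```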
• DataFrame: a DataFrame is like an Excel table / PivotTable. It is a
tabular data structure comprised of rows and columns with
heterogeneous values
• DataFrame operations:
o Loading a csv file into a DataFrame
o Filtering/slicing a DataFrame
o Sorting the DataFrame
o Aggregating a DataFrame
o Creating new fields based on arithmetic calculations of other
columns
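A sketch of the listed DataFrame operations with pandas. To stay self-contained, it reads a small made-up CSV from a string via `io.StringIO`; with a real file you would pass its path to `read_csv` instead.

```python
import io
import pandas as pd

csv_text = """name,dept,salary,bonus
Yash,IT,50000,5000
Singh,HR,40000,3000
Amit,IT,60000,4000
"""
df = pd.read_csv(io.StringIO(csv_text))              # load CSV into a DataFrame

it_staff = df[df["dept"] == "IT"]                    # filtering rows
subset = df.loc[0:1, ["name", "salary"]]             # slicing (label-based, end inclusive)
ranked = df.sort_values("salary", ascending=False)   # sorting
avg_by_dept = df.groupby("dept")["salary"].mean()    # aggregating
df["total"] = df["salary"] + df["bonus"]             # new field from other columns

print(ranked["name"].tolist())                       # ['Amit', 'Yash', 'Singh']
print(avg_by_dept["IT"])                             # 55000.0
```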
• Descriptive analysis: Descriptive statistics (DS) deals with methods for
summarizing raw observations into information. In DS, properties of sample
data, such as accidents, prices of goods, business, incomes, sports data and
population data, are summarized.
• Mean: the sum of the observations divided by the total number
of observations. It is also called the average, which is the sum
divided by the count.
• Mode: the value that has the highest frequency in the given
data set. The data set may have no mode if the frequency of all
data points is the same. Also, we can have more than one
mode if two or more data points share the highest
frequency.
• Median: the middle value of the data set. It splits the data
into two halves. If the number of elements in the data set is
odd, the centre element is the median; if it is even, the
median is the average of the two central elements.
• Range: the difference between the largest and
smallest data points in the data set. The bigger the range, the
greater the spread of the data, and vice versa.
• Variance: the average squared deviation from the
mean. It is calculated by finding the difference between
every data point and the mean (average), squaring these differences,
adding all of them and then dividing by the number of data points
in the data set.
• Standard Deviation: the square root of the variance. It
is calculated by finding the mean, subtracting the mean from each number
and squaring the result, adding all the squared values, dividing by the
number of terms, and finally taking the square
root.
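A plain-Python sketch of the measures above on a made-up sample. Variance divides by the number of data points, as described (population variance); dividing by n - 1 instead gives the sample variance. The mode logic follows the definition above: several modes are allowed, and there is no mode when every value is equally frequent.

```python
import math
from collections import Counter

data = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical sample
n = len(data)

mean = sum(data) / n                                  # 40 / 8 = 5.0

s = sorted(data)
if n % 2:                                             # odd count: the centre element
    median = s[n // 2]
else:                                                 # even count: average of the two central elements
    median = (s[n // 2 - 1] + s[n // 2]) / 2          # (4 + 5) / 2 = 4.5

freq = Counter(data)
top = max(freq.values())
modes = [x for x, c in freq.items() if c == top]      # may hold several modes
if len(modes) == len(freq):                           # all values equally frequent: no mode
    modes = []

data_range = max(data) - min(data)                    # 9 - 2 = 7
variance = sum((x - mean) ** 2 for x in data) / n     # divide by n: 32 / 8 = 4.0
std_dev = math.sqrt(variance)                         # 2.0

print(mean, median, modes, data_range, variance, std_dev)  # 5.0 4.5 [4] 7 4.0 2.0
```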
• Pie Chart
• Line plot with styling
• Seaborn: a statistical data visualization library for Python, based
on matplotlib. It provides a high-level interface for drawing
attractive and informative statistical graphics.
• Features of seaborn:
o A dataset-oriented API for examining relationships
between multiple variables
o Specialized support for using categorical variables to show
observations or aggregate statistics
o Options for visualizing univariate or bivariate distributions
and for comparing them between subsets of data
o Building model
• Detailed steps in training and testing in machine learning
o Collecting data
o Data wrangling
o Analyze data
o Train algorithm
o Test algorithm
o Deployment
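• Machine Learning Algorithms
o $B_1 = \dfrac{\sum_i (x_i - \mathrm{mean}(x))\,(y_i - \mathrm{mean}(y))}{\sum_i (x_i - \mathrm{mean}(x))^2}$
• mean() is the average value of the variable in our
dataset. The $x_i$ and $y_i$ refer to the fact that we need to
repeat these calculations across all values in our dataset,
and i refers to the i’th value of x or y.
• $B_0 = \mathrm{mean}(y) - B_1 \cdot \mathrm{mean}(x)$
• Make predictions
o $y = B_0 + B_1 \cdot x$
• Plot the data as a scatter plot and measure the goodness of fit using the $R^2$ value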
• Python process to execute linear regression
o Import the csv file and read the data
o Split the data into train and test sets
o Predict the results using the test data
o Create a scatter plot and a line plot of the actual
data to check how accurate you are
• Steps in SLR
o Calculate mean and variance
o Calculate the covariance
o Apply simple linear regression
o Calculate mean square error
o Evaluate regression algorithm on training dataset
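A from-scratch sketch of the SLR steps, computing B1 as covariance(x, y) / variance(x) and B0 = mean(y) - B1 * mean(x). Here `variance` and `covariance` return sums rather than averages; the 1/n factors would cancel in B1 anyway. The five data points are made up, and the model is evaluated on its own training data as in the last step.

```python
def mean(values):
    return sum(values) / len(values)

def variance(values, m):
    return sum((v - m) ** 2 for v in values)           # sum of squared deviations

def covariance(x, mean_x, y, mean_y):
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

def coefficients(x, y):
    mx, my = mean(x), mean(y)
    b1 = covariance(x, mx, y, my) / variance(x, mx)
    b0 = my - b1 * mx
    return b0, b1

def simple_linear_regression(train_x, train_y, test_x):
    b0, b1 = coefficients(train_x, train_y)
    return [b0 + b1 * xi for xi in test_x]              # y = B0 + B1 * x

def mean_squared_error(actual, predicted):
    return mean([(a - p) ** 2 for a, p in zip(actual, predicted)])

x = [1, 2, 4, 3, 5]                                     # hypothetical toy data
y = [1, 3, 3, 2, 5]
predicted = simple_linear_regression(x, y, x)           # evaluate on the training data
mse = mean_squared_error(y, predicted)
print(coefficients(x, y), mse)                          # approximately (0.4, 0.8) 0.48
```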
• Brief steps:
o Logistic Regression Model
▪ Prediction = 0 if P(variable) < 0.5
▪ Prediction = 1 if P(variable) >= 0.5
o Find the coefficients of Logistic Regression using stochastic
gradient descent
o Calculate new coefficients
o Make predictions
o Find the accuracy of the Logistic Regression model
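A from-scratch sketch of the brief steps: stochastic gradient descent to learn the coefficients, the 0.5 threshold for predictions, and accuracy. The update uses the gradient of the log-loss (cross-entropy); some tutorials use a variant that also multiplies the error by yhat * (1 - yhat). The learning rate, epoch count and six-point data set are arbitrary.

```python
import math

def predict(row, coef):
    # coef[0] is the intercept; coef[1:] pair up with the row's features
    z = coef[0] + sum(c * x for c, x in zip(coef[1:], row))
    return 1.0 / (1.0 + math.exp(-z))       # sigmoid gives P(y = 1)

def sgd_coefficients(X, y, l_rate, n_epoch):
    coef = [0.0] * (len(X[0]) + 1)
    for _ in range(n_epoch):
        for row, target in zip(X, y):       # one update per training row (stochastic)
            error = target - predict(row, coef)
            coef[0] += l_rate * error       # new intercept
            for i, x in enumerate(row):
                coef[i + 1] += l_rate * error * x   # new coefficient for feature i
    return coef

def classify(row, coef):
    return 1 if predict(row, coef) >= 0.5 else 0    # 0.5 threshold

def accuracy(actual, predicted):
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return 100.0 * correct / len(actual)

# Hypothetical toy data: one feature, class 1 for larger values
X = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 1, 1]
coef = sgd_coefficients(X, y, l_rate=0.1, n_epoch=1000)
preds = [classify(row, coef) for row in X]
print(preds, accuracy(y, preds))
```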
• Python steps to execute logistic regression
o Import the dataset, read it and perform cleaning operations
o Split the data into training and testing sets
o Fit it into the logistic regression model
o Predict the results
o Create the confusion matrix, which gives the number
of correct and incorrect predictions
o Calculate the accuracy
o Remove punctuation
o Perform stemming: A root is a form which is not further
analyzable. In the form ‘untouchables’ the root is ‘touch’,
to which first the suffix ‘-able’, then the prefix ‘un-’ and
finally the suffix ‘-s’ have been added. In a compound
word like ‘wheelchair’ there are two roots, ‘wheel’ and
‘chair’. Derivational morphology results in the creation of a
new word with a new meaning. Inflectional morphology, in
contrast, involves an obligatory grammatical specification
(which doesn't change the core meaning). A stem is of concern
only when dealing with inflectional morphology. In the form
‘untouchables’ the stem is ‘untouchable’, although in the form
‘touched’ the stem is ‘touch’; in the form ‘wheelchairs’ the stem
is ‘wheelchair’, even though the stem contains two roots (‘wheel’
and ‘chair’).
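A sketch of punctuation removal and a toy stemmer. The stemmer is a hypothetical handful of suffix rules for illustration, not the Porter algorithm or any library stemmer; it strips only a few inflectional suffixes, which reproduces the stems given above for 'untouchables', 'touched' and 'wheelchairs'.

```python
import string

def remove_punctuation(text):
    # str.maketrans with a third argument maps those characters to None (deletes them)
    return text.translate(str.maketrans("", "", string.punctuation))

def naive_stem(word):
    # Hypothetical toy rules: strip one inflectional suffix, keep derivational ones
    for suffix in ("ing", "ed", "s"):
        if suffix == "s" and word.endswith("ss"):
            continue                          # e.g. 'class' keeps its final s
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

text = "Untouchables, touched; wheelchairs!"
tokens = remove_punctuation(text).lower().split()
stems = [naive_stem(t) for t in tokens]
print(stems)  # ['untouchable', 'touch', 'wheelchair']
```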