Data Science Report

Uploaded by manimandarapu3
Copyright © All Rights Reserved
Data Science Internship Report (Module-wise)

Module-1:-
Topic:- Significance of Data science

• What is Data Science?


Data science is the field of study that combines domain expertise, programming skills, and
knowledge of mathematics and statistics to extract meaningful insights from data .... In turn,
these systems generate insights which analysts and business users can translate into
tangible business value.

• What is the difference between AI and DS?

• Inductive reasoning vs Deductive reasoning


Inductive | Deductive
Draws general conclusions from specific observations | Derives conclusions from generalised, proven theories
Conclusions are either strong or weak | Conclusions are either valid or invalid
Ex:- "Tomorrow the teacher is likely to use a PPT for the lesson" (based on past lessons) | Ex:- If A->B and B->C, then A->C

• Data vs Information
The difference can be traced using four key terms, known as the DIKW hierarchy: Data, Information, Knowledge and Wisdom. Raw data becomes information once it is organized and given context.

• Types of analysis
-Descriptive: Descriptive analysis is the foundation of all data insight. It is the simplest and most common use of data in business today. It answers the question "what happened?" by summarizing past data, usually in the form of dashboards.

-Diagnostic: After asking the main question of "what happened?", the next step is to dive deeper and ask "why did it happen?". This is where diagnostic analysis comes in.

Diagnostic analysis takes the insights found from descriptive analytics and drills down to find the causes of those outcomes. Organizations use this type of analytics because it creates more connections between data and identifies patterns of behavior.

-Predictive: Predictive analysis attempts to answer the question "what is likely to happen?". This type of analytics uses historical data to make predictions about future outcomes.

This type of analysis is a further step up from descriptive and diagnostic analyses.

-Prescriptive: Prescriptive analysis is the frontier of data analysis. It combines the insights from all previous analyses to determine the course of action to take for a current problem or decision.

Prescriptive analysis relies on state-of-the-art technology and data practices.

• Scope of Data science
Data science has a very broad scope; it can be applied in almost any field, such as IoT, big data, cloud computing and business analytics.
• Data science life cycle
The data science life cycle (DSLC) can be traced through six phases: business understanding, data collection, data wrangling, model planning and building, analysis, and model testing.
Topic:- Data science Pre-requisites
• Software used for data science
1. R and RStudio
2. Python
3. Jupyter
Topic:- Significance of Python

• Significance of python

Data science consulting organizations are encouraging their developers and data scientists to use Python as a programming language. Python has become one of the most popular programming languages in a very short time.

Data scientists need to manage large amounts of data, known as big data. Because it is simple to use and has a large collection of libraries, Python has become a popular choice for working with big data.

• Features of python

1. Easy to code:- Python is a high-level programming language. It is easier to learn than languages such as C, C#, JavaScript and Java, and anybody can learn the basics of Python in a few hours or days. It is also a developer-friendly language.

2. Free and open source:- Python is freely available from the official website. Since it is open source, its source code is also available to the public, so you can download it, use it and share it.

3. Object oriented:- One of the key features of Python is object-oriented programming. Python supports object-oriented concepts such as classes, objects and encapsulation.

4. GUI programming support:- Graphical user interfaces can be built in Python using modules such as PyQt5, PyQt4, wxPython or Tk (tkinter).


Topic:- Tokens and data types in python

• Tokens in python
Comments and their types supported by Python
1. Single-line comment:- starts with '#'; everything after it on that line is ignored
2. Multi-line comment:- Python has no dedicated multi-line comment syntax; a triple-quoted string such as '''xxx''', which is not assigned to anything, is commonly used for comments that take up more than one line

• Types of data in python

Python supports four main basic data types: numeric (int, float, complex), string (str, alphanumeric text), Boolean (bool) and None (NoneType).
1. Integer:-
a = 10
print(a, type(a))
This gives the output: 10 <class 'int'>
2. String:-
name = "Panduranga"
print(name, type(name))
This gives the output: Panduranga <class 'str'>
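To see the basic types side by side, the short sketch below calls type() on one sample value of each; the values are illustrative, and complex is included as Python's third numeric type.

```python
# One sample value of each basic type (values are illustrative)
values = [10, 3.14, 2 + 3j, "Panduranga", True, None]

for v in values:
    print(v, type(v))
# 10 <class 'int'>
# 3.14 <class 'float'>
# (2+3j) <class 'complex'>
# Panduranga <class 'str'>
# True <class 'bool'>
# None <class 'NoneType'>
```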
Module-2:-
Topic:- Operators in python

• Types of operators in python

1. Arithmetic operators:- addition(+), subtraction(-), multiplication(*), division(/), modulus(%), exponent(**), floor division(//)

2. Assignment operators:- assignment(=), add assign(+=), subtract assign(-=), multiply assign(*=), divide assign(/=), modulus assign(%=), exponent assign(**=), floor divide assign(//=)

3. Comparison operators:- equal to(==), not equal to(!=), greater than(>), less than(<), greater than or equal to(>=), less than or equal to(<=)

4. Identity operators:- is, is not (check whether two names refer to the same object)

5. Membership operators:- in, not in (check whether a value is present in a sequence or collection)
6. Bitwise operators:- AND(&), OR(|), NOT(~), XOR(^), left shift(<<), right shift(>>)
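A compact demonstration of each operator group listed above, using illustrative values a = 17 and b = 5; the expected results in the comments follow from Python's standard integer and float semantics.

```python
a, b = 17, 5

# Arithmetic
print(a + b, a - b, a * b)      # 22 12 85
print(a / b, a // b, a % b)     # 3.4 3 2
print(a ** 2)                   # 289

# Assignment
c = 10
c += 5                          # c is now 15
c //= 4                         # c is now 3
print(c)                        # 3

# Comparison
print(a == b, a != b, a > b, a <= b)   # False True True False

# Identity and membership
x = [1, 2, 3]
y = x                           # same object as x
z = [1, 2, 3]                   # equal contents, different object
print(x is y, x is z, x == z)   # True False True
print(2 in x, 5 not in x)       # True True

# Bitwise (17 = 0b10001, 5 = 0b00101)
print(a & b, a | b, a ^ b, ~a, a << 1, a >> 1)   # 1 21 20 -18 34 8
```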
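Topic:- Control statements in python

• Control statements
There are three main types of control statements in Python: conditional (if, elif, else), looping (for, while) and transfer statements (break, continue, pass).

1. Conditional:- runs a block only when its condition is true (if, elif, else)
2. Looping:- repeats a block for each item of a sequence (for) or while a condition stays true (while)

3. Break:- exits the nearest enclosing loop immediately; continue skips to the next iteration, and pass does nothing (it is a placeholder)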
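The sketch below shows each kind of control statement in one place. The function name classify and the sample numbers are illustrative only.

```python
def classify(n):
    # Conditional: exactly one branch runs
    if n > 0:
        return "positive"
    elif n < 0:
        return "negative"
    else:
        return "zero"

for n in [5, -2, 0]:                 # for loop over a list
    print(n, classify(n))            # 5 positive / -2 negative / 0 zero

total = 0
i = 0
while True:                          # while loop, stopped with break
    i += 1
    if i % 2 == 0:
        continue                     # skip even numbers
    if i > 9:
        break                        # leave the loop once i passes 9
    total += i                       # adds 1 + 3 + 5 + 7 + 9
print(total)                         # 25

for ch in "data":
    pass                             # placeholder: the loop body does nothing
```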
Topic:- Functions in python
• Functions in python
A function is a block of code that runs only when it is called.

You can pass data, known as parameters, into a function.

A function can return data as a result.

Syntax:-

def function_name():
    print("Hello World")

function_name()
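Building on the syntax above, this sketch (with a hypothetical function add) shows parameters, a default parameter value and a return value, which the definition mentions but the Hello World example does not use.

```python
def add(a, b=0):
    """Return the sum of a and b; b is optional and defaults to 0."""
    return a + b

result = add(3, 4)          # positional arguments
print(result)               # 7
print(add(10))              # 10 (b uses its default value)
print(add(b=2, a=5))        # 7 (keyword arguments, in any order)
```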
Topic:- Packages and modules in python

1. Modules:- A module is a Python file containing Python statements and definitions. For example, a file evenodd.py is a module, and we call it 'evenodd'. We put similar code together in one module. This helps us modularize our code and makes it much easier to deal with. A module also gives us reusability: we don't need to write the same code again for a new project that we take up.

Modules can be imported from any directory on Python's module search path (sys.path), and they can be given alias names with the 'as' keyword for ease of use.

2. Packages:-
A package is essentially a directory holding sub-packages and modules (traditionally marked by an __init__.py file). We can create our own packages, and we can also install packages from the Python Package Index (PyPI) for our projects.
To import a module from a package, we type the following:
import Game.Sound.load
We can also import it with an alias:
import Game.Sound.load as load_game
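The Game.Sound.load package above is project-specific and not available to run, so this sketch uses standard-library modules to show the same import forms: a plain module with an alias, importing one name from a module, and a dotted package.submodule import with an alias.

```python
import math as m                       # module imported under an alias
from statistics import mean            # import a single name from a module
import xml.etree.ElementTree as ET     # package.subpackage.module with an alias, like Game.Sound.load

print(m.sqrt(16))                      # 4.0
print(mean([1, 2, 3, 4]))              # 2.5
root = ET.fromstring("<game><sound>load</sound></game>")
print(root.find("sound").text)         # load
```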
Module-3:-

• DATA STRUCTURES IN PYTHON:-

Types:-
1) Strings
2) Lists
3) Tuples
4) Sets
5) Dictionaries

1) String:-
Different ways to declare strings
Different ways to access string elements
Mathematical operations on strings
Other operations that can be applied
Implementation using Python

String Declaration:-

Syntax:-
1) single quotes: s = 'Yash'
2) double quotes: s = "Yash"
3) triple quotes: s = '''Yash''' (can span multiple lines)

Access characters in string:-
1) By index
2) By slice operator

->By index:-
Python supports both +ve and -ve indexes
+ve index reads from left to right, starting at 0
-ve index reads from right to left, starting at -1

->By slice operator:-

Syntax:-
s[begin_index:end_index:step_value]
The slice starts at begin_index and stops before end_index
By default the step value is 1
Step value can be +ve or -ve
Note:-
In the backward direction (-ve step), if end index = -1 then the result is empty
In the forward direction (+ve step), if end index = 0 then the result is empty

->Mathematical operators for string:-
1) Concatenation operator (+)
2) Repetition operator (*)

Ex:-
print("Yash" + "veer")    # Yashveer
->Checking membership:-
The in and not in operators check whether a character or substring is a member of another string.
-> Comparison of Strings:-
Relational operators are used to perform string comparisons.
Comparison is lexicographic: strings are compared character by character using the Unicode code point of each character (roughly alphabetical order, except that every uppercase letter sorts before every lowercase letter).
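The declaration styles, indexing, slicing rules and string operators above can be checked with a short sketch on the illustrative string "Yashveer"; the comments give the expected results, including the two empty-slice cases from the notes.

```python
s1 = 'Yash'                  # single quotes
s2 = "veer"                  # double quotes
s3 = '''native
tech hub'''                  # triple quotes: the string spans two lines
print(s3)

s = s1 + s2                  # concatenation -> 'Yashveer'
print(s[0], s[-1])           # Y r   (positive and negative index)
print(s[0:4])                # Yash  (index 0 up to, but not including, 4)
print(s[::2])                # Ysve  (every second character)
print(s[::-1])               # reevhsaY (negative step reads backwards)
print(repr(s[5:-1:-1]))      # ''  backward direction with end index -1
print(repr(s[3:0]))          # ''  forward direction with end index 0

print(s1 * 3)                # YashYashYash (repetition)
print("veer" in s, "z" not in s)   # True True (membership)
print("apple" < "banana")    # True: 'a' comes before 'b'
print("Zebra" < "apple")     # True: 'Z' (code point 90) < 'a' (97)
```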

-> Removing spaces from the string:-
Three methods
1. rstrip() removes spaces (whitespace) from the right-hand side
2. lstrip() removes spaces from the left-hand side
3. strip() removes spaces from both sides

->Finding substrings:-
1) Forward direction:-
->find()
Returns the index of the first occurrence of the given substring; otherwise it returns -1
->index()
Works the same as find(), but if the given substring does not exist it raises ValueError
2) Backward direction:-
rfind() and rindex() work like find() and index(), but return the index of the last occurrence

-> Counting Substrings:-
1) count(substring):- counts occurrences throughout the whole string
2) count(substring, begin_index, end_index):- counts occurrences from begin_index up to end_index-1

-> Replacing a string with another string:-
replace(old_string, new_string):- searches the whole string and replaces every occurrence of old_string with new_string. Strings are immutable, so a new string is returned.
->Splitting of a string
split(separator):- splits the given string at the specified separator
By default, it splits on whitespace.
The return type of split() is list

->Joining of strings
separator.join(list_of_strings):- joins a group of strings, such as a list or tuple, using the given separator
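A sketch exercising the stripping, searching, counting, replacing, splitting and joining methods described above; the sample strings are illustrative.

```python
s = "   native tech hub   "
print(repr(s.strip()))               # 'native tech hub'
print(repr(s.lstrip()))              # 'native tech hub   '
print(repr(s.rstrip()))              # '   native tech hub'

t = "banana"
print(t.find("an"), t.rfind("an"))   # 1 3
print(t.find("x"))                   # -1
try:
    t.index("x")                     # index() raises instead of returning -1
except ValueError as e:
    print("ValueError:", e)
print(t.count("a"), t.count("a", 0, 3))   # 3 1  (second call searches only "ban")
print(t.replace("a", "o"))           # bonono (t itself is unchanged)

words = "native tech hub software company".split()
print(words)                         # ['native', 'tech', 'hub', 'software', 'company']
print("-".join(words))               # native-tech-hub-software-company
```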

->Changing Case of string
Methods:-
1) upper():- converts all characters to upper case
2) lower():- converts all characters to lower case
3) swapcase():- converts lower case to upper case and vice versa
4) title():- converts the first character of every word to upper case and the remaining characters to lower case
5) capitalize():- converts the first character of the string to upper case and the remaining characters to lower case
->Checking starting and ending part of the string:-
Methods:-
1) startswith(substring):- returns True if the string starts with the given substring, else False
2) endswith(substring):- returns True if the string ends with the given substring, else False
->Check type of character present in a string:-
Methods:-
1) isalnum()
2) isalpha()
3) isdigit()
4) islower()
5) isupper()
6) istitle()
7) isspace()

->Formatting the string:-
Method:-
format():- formats the string as per the desired output by substituting values into {} placeholders
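The case-conversion, prefix/suffix, character-type and formatting methods can be tried together as follows; the sample values (including the name and marks) are illustrative.

```python
s = "native tech HUB"
print(s.upper())          # NATIVE TECH HUB
print(s.lower())          # native tech hub
print(s.swapcase())       # NATIVE TECH hub
print(s.title())          # Native Tech Hub
print(s.capitalize())     # Native tech hub

print(s.startswith("native"), s.endswith("hub"))   # True False (the string ends in "HUB")

print("abc123".isalnum(), "abc".isalpha(), "123".isdigit())   # True True True
print("abc".islower(), "ABC".isupper(), "Native Tech".istitle(), "   ".isspace())   # True True True True

name, marks = "Yash", 92.5
print("{} scored {:.1f} marks".format(name, marks))   # Yash scored 92.5 marks
```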

2) List:-
In Python, a list is an ordered group of heterogeneous items (elements or objects)
All the items are treated as a single entity
Elements of a list can be accessed by index
Lists can be arbitrarily nested, i.e. a list can contain other lists as sublists
Its size is dynamic (it can grow or shrink)
A list is mutable
Elements can be accessed in both directions

1) empty list:- list1 = []
2) static list:- list1 = [10, 20, 30, 40]
3) dynamic list:- list1 = eval(input("Enter a list of objects: ")) (ast.literal_eval is a safer alternative to eval for untrusted input)
4) using list() function:- L = list(range(0, 10, 2)) or L = list("Yash")
5) using split() function:- s = "native tech hub software company"; L = s.split()

->Various ways to access elements of a list
Methods:-
Index method:-
It has +ve and -ve indexing
+ve indexes start from 0, from left to right
-ve indexes start from -1, from right to left

Slice operator method:-
Syntax:-
list2 = list1[start:stop:step]
List vs Mutability
After a list is created, its elements can be modified. Hence list objects are mutable.
Ex:-
list1 = [10, 20, 30, 40]
print(list1)      # [10, 20, 30, 40]
list1[1] = 2006
print(list1)      # [10, 2006, 30, 40]

Operations on list:-
Metadata-related functions:-
len():- returns the length of the list
count():- returns the number of occurrences of the given item in the list
index():- returns the index of the first occurrence of the supplied item
Manipulation functions:-
append():- adds the item at the end of the list
insert():- inserts the item at a specific index
extend():- list1.extend(list2) adds all the elements of list2 to the end of list1
remove() vs clear():- remove() removes the first occurrence of a specified item, whereas clear() removes all the items from the list
pop():- removes and returns the last element (or the element at a given index)
Functions for ordering elements in a list:-
reverse()
sort()
Aliasing and cloning methods:- aliasing (list2 = list1) gives a second name for the same list; cloning (list1.copy() or list1[:]) creates an independent copy
Mathematical operations on list:-
Concatenation (+)
Repetition (*)
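The list operations above, applied step by step to one illustrative list; the comments track the list's contents after each call, and the last lines contrast aliasing with cloning.

```python
list1 = [10, 20, 30, 40]
list1[1] = 2006                      # lists are mutable
print(list1)                         # [10, 2006, 30, 40]
print(len(list1), list1.count(30), list1.index(30))   # 4 1 2

list1.append(50)                     # [10, 2006, 30, 40, 50]
list1.insert(0, 5)                   # [5, 10, 2006, 30, 40, 50]
list1.extend([60, 70])               # [5, 10, 2006, 30, 40, 50, 60, 70]
list1.remove(2006)                   # removes the first occurrence of 2006
last = list1.pop()                   # removes and returns the last element
print(last, list1)                   # 70 [5, 10, 30, 40, 50, 60]

list1.reverse()                      # [60, 50, 40, 30, 10, 5]
list1.sort()                         # [5, 10, 30, 40, 50, 60]

alias = list1                        # aliasing: a second name for the same list
clone = list1.copy()                 # cloning: an independent copy (list1[:] also works)
alias.append(99)
print(list1[-1], clone[-1])          # 99 60

print([1, 2] + [3], [0] * 3)         # [1, 2, 3] [0, 0, 0]
list1.clear()                        # remove every item
print(list1, alias)                  # [] []
```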

3) Tuple:-

Different ways to create a tuple:-
T = ()
T = (10, 20)
T = ("Yash", 10, "native")
T = tuple(list1)   # tuple() accepts at most one argument (an iterable)

Accessing elements of a tuple:-
Index method
Slice operator method
Tuple immutability:- once a tuple is created its elements cannot be changed; an assignment such as T[0] = 5 raises TypeError

Mathematical operators:-
Concatenation (+)
Repetition (*)
Functions:-
len()
count()
index()
sorted()
min()
max()

Tuple packing:- creating a tuple by packing a group of variables
Ex:- tuple1 = a, b, c, d
Tuple unpacking:- unpacking the tuple elements into individual variables
Ex:- a, b, c, d = tuple1
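A sketch of tuple creation, access, immutability, operators, functions and packing/unpacking; the sample values are illustrative.

```python
t = ("Yash", 10, "native")
print(t[0], t[-1], t[0:2])           # Yash native ('Yash', 10)

try:
    t[1] = 20                        # tuples are immutable
except TypeError as e:
    print("TypeError:", e)

nums = tuple([30, 10, 20, 10])       # tuple() takes a single iterable
print(nums + (40,), (0,) * 3)        # (30, 10, 20, 10, 40) (0, 0, 0)
print(len(nums), nums.count(10), nums.index(20))   # 4 2 2
print(sorted(nums), min(nums), max(nums))          # [10, 10, 20, 30] 10 30

a, b, c, d = 1, 2, 3, 4
tuple1 = a, b, c, d                  # packing
w, x, y, z = tuple1                  # unpacking
print(tuple1, w + z)                 # (1, 2, 3, 4) 5
```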

4) Set:-

Declaration of a set:-
Method 1:-
Syntax:-
s = {ele1, ele2, ...}
Method 2:-
Using the set() function
s = set(list1)
Note: an empty set can be created only with the set() function
Note: s = {} is treated as an empty dictionary, not as an empty set
Note: so we must write s = set() to create an empty set
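A short check of the set notes above: duplicate elements are stored only once, set() builds a set from a list, and {} gives a dictionary rather than an empty set. The values are illustrative.

```python
s = {10, 20, 30, 20}            # duplicate 20 is stored only once
print(len(s))                   # 3
s2 = set([1, 2, 2, 3])          # set() built from a list
print(s2 == {1, 2, 3})          # True (sets are unordered; only membership matters)
empty = set()                   # the only way to make an empty set
print(type({}), type(empty))    # <class 'dict'> <class 'set'>
```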

5) Dictionary:-
Declaration of a dictionary:-
D = {}
D = dict()
We can create an empty dictionary and add entries as follows:
D[100] = "Yash"
D[200] = "Singh"
D[300] = "NTH"
print(D)
Output: {100: 'Yash', 200: 'Singh', 300: 'NTH'}

Methods applied on a dictionary:-
dir(dict):- lists all the attributes and methods associated with the dictionary data structure
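Continuing the dictionary example above, this sketch uses a few of the methods that dir(dict) reports (get, keys, values, update, pop, items); the extra key 400 is illustrative.

```python
D = dict()
D[100] = "Yash"
D[200] = "Singh"
D[300] = "NTH"
print(D)                          # {100: 'Yash', 200: 'Singh', 300: 'NTH'}

print(D.get(200))                 # Singh
print(D.get(400, "not found"))    # not found (a default instead of a KeyError)
print(list(D.keys()))             # [100, 200, 300]
print(list(D.values()))           # ['Yash', 'Singh', 'NTH']

D.update({400: "Data"})           # add or overwrite entries
removed = D.pop(100)              # remove a key and return its value: 'Yash'
for key, value in D.items():      # iterate over key-value pairs
    print(key, value)

# Public (non-underscore) names reported by dir(dict)
print([name for name in dir(dict) if not name.startswith("_")])
```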
• Files:-
Definition:-
A file is a logical storage medium that stores data or information in a specific structure.
Ex:- Notepad, Word
Common file operations:- create, open, read, rename, delete, close
When execution is completed, the output produced by the program is no longer available, because it was held only in temporary memory:
->Stack memory
->The contents held by this memory are completely wiped
That is why we use files to store data permanently.

open() function has three parameters
file_object = open(file_name, [access_mode], [buffering])
Access mode:- read, write or append (see the access modes below)
Buffering:- sets the buffering policy (0 = no buffering, binary mode only; 1 = line buffering, text mode only; a larger integer = buffer size in bytes)
Ex:-
result = open("foo.txt", "w")
print("Name of the file:-", result.name)
result.close()
Types of files:-
Files are categorized into two types
1) Text files:- store characters such as letters, digits and special symbols
2) Binary files:- store data only in binary form (0s and 1s)
Ex:- images

Access modes:-
r -> default mode; opens a file for reading only
rb -> opens a file for reading only, in binary format
r+ -> opens a file for both reading and writing
rb+ -> opens a file for both reading and writing, in binary format
w -> opens a file for writing only; overwrites the file if it exists and creates a new one if it doesn't
wb -> opens a file for writing only, in binary format; overwrites the file if it exists and creates a new one if it doesn't
a -> opens a file for appending (creates it if it doesn't exist)
ab -> opens a file for appending, in binary format
a+ -> opens a file for both appending and reading
ab+ -> opens a file for both appending and reading, in binary format
w+ -> opens a file for both writing and reading (overwrites or creates it)
wb+ -> opens a file for both writing and reading, in binary format
Renaming a file (requires import os):-
os.rename(current_file_name, new_file_name)
Deleting a file:-
os.remove(file_name)
file.seek(offset):-
changes the current file position
file.tell():-
returns the current position within the file, i.e. where the next read or write will occur
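A runnable sketch of the file operations above. It assumes a throwaway folder created with tempfile so that nothing outside it is touched, and uses 'with' blocks (which close files automatically) alongside the explicit open()/close() form.

```python
import os
import tempfile

folder = tempfile.mkdtemp()                     # scratch folder so the demo touches nothing else
path = os.path.join(folder, "foo.txt")

f = open(path, "w")                             # "w": create the file (or overwrite it)
print("Name of the file:", f.name)
f.write("Hello\nData Science\n")
f.close()                                       # always close what you open

with open(path, "a") as f:                      # "a": append; 'with' closes the file automatically
    f.write("Appended line\n")

with open(path, "r") as f:                      # "r": read (the default mode)
    first = f.readline()                        # reads 'Hello\n'
    pos = f.tell()                              # current position: just after the first line
    f.seek(0)                                   # jump back to the start of the file
    content = f.read()                          # the whole file as one string

new_path = os.path.join(folder, "bar.txt")
os.rename(path, new_path)                       # rename needs the old and the new name
os.remove(new_path)                             # delete the file
os.rmdir(folder)                                # clean up the empty scratch folder
print(first, content, sep="---\n")
```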
• Regular Expressions:-
A regular expression (RE) is a sequence of symbols that defines a search pattern, used to extract the set of strings that match the specified pattern.
Ex:- display all the Python files on our computer whose extension is *.py
Such a pattern is used to locate a chunk of text in a string by matching it against the pattern.
Ex:- email addresses, phone numbers, etc.
In Python we use the module re.

Functions:-
search():- searches for the first occurrence of the RE pattern within the string, with optional flags; returns a match object, or None if there is no match
sub():- replaces one or many matches with a string
findall():- returns a list containing all matches
split():- returns a list where the string has been split at each match

Symbolic characters:-
^ :- matches the start of the string
$ :- matches the end of the string
. :- matches any character except a newline
() :- groups the sub-pattern inside the parentheses and captures the matched text as a group
* :- matches zero or more occurrences of the preceding sub-pattern
+ :- matches one or more occurrences of the preceding sub-pattern
? :- matches zero or one occurrence of the preceding sub-pattern
*? :- non-greedy version of *: matches zero or more occurrences of the preceding sub-pattern, but as few as possible
{m,n} :- matches from m to n occurrences of the preceding sub-pattern
| :- matches either of the sub-patterns on either side of this special character
[] :- matches any one of the characters in the set
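The re functions and special characters listed above, each shown on a small illustrative string; the email and phone-number patterns are deliberately simple and are not complete validators.

```python
import re

text = "Contact: yash@example.com, singh@nth.org; phone 98765-43210"

m = re.search(r"\d{5}-\d{5}", text)            # first match, or None
print(m.group() if m else None)                # 98765-43210

emails = re.findall(r"[\w.]+@[\w.]+\.\w+", text)
print(emails)                                  # ['yash@example.com', 'singh@nth.org']

print(re.sub(r"\d", "#", "a1b22"))             # a#b##
print(re.split(r"[,;]\s*", "a, b;c"))          # ['a', 'b', 'c']

print(bool(re.search(r"^Contact", text)))      # True  (^ anchors at the start)
print(bool(re.search(r"\d$", text)))           # True  ($ anchors at the end)
g = re.search(r"(\w+)@(\w+)", text)            # () captures groups
print(g.groups())                              # ('yash', 'example')
print(re.search(r"<.*>", "<a><b>").group())    # <a><b>  (* is greedy)
print(re.search(r"<.*?>", "<a><b>").group())   # <a>     (*? is non-greedy)
print(re.findall(r"colou?r", "color colour"))  # ['color', 'colour']
print(re.findall(r"cat|dog", "cat dog cow"))   # ['cat', 'dog']
print(re.findall(r"[aeiou]", "regex"))         # ['e', 'e']
```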
• Real Time Project:-
• Exception Handling:-

An exception is an error that occurs at runtime.

Ex:- dividing a number by zero raises an exception at runtime:
a = 10
b = 0
c = a / b
print(c)
Error:- ZeroDivisionError: division by zero
When an exception occurs it must be handled; otherwise the program terminates abnormally.

try:
    c = a / b                        # run code that may raise an exception
except ZeroDivisionError:
    print("Cannot divide by zero")   # executed when there is an exception
else:
    print(c)                         # executed only when no exception occurred
finally:
    print("Done")                    # this code always runs
Python has two types of exceptions
1) Predefined (built-in):- raised automatically by the Python virtual machine when an abnormal event occurs
2) User-defined:- used for application-specific (user domain) errors; defined as classes derived from Exception and raised with the raise keyword
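A sketch of both exception types: a predefined ZeroDivisionError handled with try/except/else/finally, and a user-defined exception raised with raise. InsufficientBalanceError, withdraw and safe_divide are hypothetical names used for illustration.

```python
class InsufficientBalanceError(Exception):
    """Hypothetical user-defined exception for a simple banking example."""

def withdraw(balance, amount):
    if amount > balance:
        # raise the user-defined exception with a helpful message
        raise InsufficientBalanceError(f"cannot withdraw {amount}; balance is {balance}")
    return balance - amount

def safe_divide(a, b):
    try:
        result = a / b                 # may raise the predefined ZeroDivisionError
    except ZeroDivisionError:
        return None                    # handled: the program keeps running
    else:
        return result                  # runs only if no exception occurred
    finally:
        print("safe_divide finished")  # runs in every case, even after return

try:
    withdraw(100, 250)
except InsufficientBalanceError as e:
    print("Error:", e)                 # Error: cannot withdraw 250; balance is 100

print(safe_divide(10, 2), safe_divide(10, 0))   # 5.0 None (after two "safe_divide finished" lines)
```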

• Database Programming:-
Database:-
A database stores data in a structured format, e.g. in a relational DBMS
or
A shared collection of logically related data, and a description of that data, designed to fulfill the needs of an organization

DBMS:-
A DBMS is software that enables users to define, create, maintain and control access to the database.
Ex:- MySQL, Oracle, SQL Server, DB2, Siebel, FoxPro, MongoDB, etc.
Application program (frontend):-
interacts with the database by issuing appropriate requests (SQL statements)
Ex:-
A client (frontend) interacts with a server, and the server hosts the DBMS (backend)
Limitations of the flat file system:-
dependency of programs on the physical structure of the data
complex process to retrieve data
loss of data on concurrent access
inability to grant access at the record level
data redundancy
data inconsistency
only partial data recovery
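To show a Python program acting as the frontend that issues SQL statements to a DBMS, the sketch below uses Python's built-in sqlite3 module with an in-memory database. SQLite is chosen only because it needs no server, and the student table and its rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # in-memory database: nothing is written to disk
cur = conn.cursor()
cur.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, marks REAL)")
cur.executemany(
    "INSERT INTO student (id, name, marks) VALUES (?, ?, ?)",   # ? placeholders keep data out of the SQL text
    [(1, "Yash", 92.5), (2, "Singh", 78.0), (3, "Mani", 85.0)],
)
conn.commit()

cur.execute("SELECT name, marks FROM student WHERE marks > ? ORDER BY marks DESC", (80,))
rows = cur.fetchall()
print(rows)                          # [('Yash', 92.5), ('Mani', 85.0)]
conn.close()
```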
• Object Oriented Programming:-
Object-oriented programming is a way of programming that uses the idea of "objects" to represent data and methods.
It is an approach for creating neat, reusable code instead of redundant code.
The program is divided into self-contained objects, or several mini-programs.
Every individual object represents a different part of the application, with its own logic and data, and objects communicate with each other.
ENTITY:-
An entity is any singular, identifiable and separate object.
CLASS:-
->A class is a design entity: an object's blueprint, description or definition. Classes are created using the keyword 'class' followed by an indented block that contains the class's methods and attributes.
-> A class usually defines an __init__ method.
-> The __init__ method is called automatically when an object of the class is created.
-> This method is the constructor of the class. A constructor is a method that we can use to initialize a new object to certain values. By convention, 'self' is used as the first parameter of every instance method.
Types of variables:-
Instance:- also known as object-level variables
➔If the value of a variable varies from object to object, such variables are called instance variables
➔A separate copy of the instance variables is created for every object
Static:- also known as class-level variables
➔The value of the variable does not vary from object to object
➔Declared directly within the class but outside of methods
Local:- also known as method-level variables
➔A variable is said to be local if it is declared within a method
➔Local variables are created at the time of method execution and destroyed once the method completes
➔Local variables of a method cannot be accessed from outside the method
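The three kinds of variables can be seen in one small class. Student, grade and the sample data are hypothetical; the comments mark which variable is instance-level, class-level (static) or local.

```python
class Student:
    school = "NTH"                     # static (class-level) variable, shared by all objects

    def __init__(self, name, marks):   # constructor, called when an object is created
        self.name = name               # instance variables: a separate copy per object
        self.marks = marks

    def grade(self):
        bonus = 5                      # local variable: exists only while grade() runs
        return "A" if self.marks + bonus >= 90 else "B"

s1 = Student("Yash", 88)
s2 = Student("Singh", 70)
print(s1.name, s1.grade(), s2.name, s2.grade())   # Yash A Singh B
print(s1.school, s2.school)                       # NTH NTH
Student.school = "Native Tech Hub"                # changing the class variable affects every object
print(s1.school, s2.school)                       # Native Tech Hub Native Tech Hub
```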

->Inheritance:-
Inheritance allows a child class to inherit attributes and methods from a base/parent class
➔ It is useful because we can create sub-classes and get all of the functionality of the parent class.
➔ We can add new functionality without affecting the parent class.
#DEMO: METHOD OVERLOADING IS NOT POSSIBLE IN PYTHON
class Test:
    def m1(self):
        print("no-argument method")
    def m1(self, a):
        print("one-argument method")
    def m1(self, a, b):
        print("two-argument method")

obj = Test()
obj.m1(10, 20)    # prints "two-argument method"
# obj.m1() and obj.m1(10) would raise TypeError: each new "def m1" replaces the
# previous one, so only the last (two-argument) version exists.

#DEMO: SIMULATING METHOD OVERLOADING WITH DEFAULT ARGUMENTS
class Test:
    def sum(self, a=None, b=None, c=None):
        if a is not None and b is not None and c is not None:
            print("sum of 3 numbers:", a + b + c)
        elif a is not None and b is not None:
            print("sum of 2 numbers:", a + b)
        else:
            print("please provide 2 or 3 arguments")

obj = Test()
obj.sum(10, 20)       # sum of 2 numbers: 30
obj.sum(10, 20, 5)    # sum of 3 numbers: 35
obj.sum(10)           # please provide 2 or 3 arguments
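The demos above cover method overloading but not inheritance itself. The following sketch, using hypothetical Person and Student classes, shows a child class reusing the parent constructor with super(), adding a new attribute and overriding a method.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name} is a person"

class Student(Person):                  # Student inherits from Person
    def __init__(self, name, course):
        super().__init__(name)          # reuse the parent constructor
        self.course = course            # new attribute added only in the child

    def describe(self):                 # overrides the parent method
        return f"{self.name} studies {self.course}"

p = Person("Singh")
s = Student("Yash", "Data Science")
print(p.describe())                     # Singh is a person
print(s.describe())                     # Yash studies Data Science
print(isinstance(s, Person))            # True
```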
Module-4:-

• Numpy module
-> NumPy stands for Numerical Python
->It is the fundamental module for scientific computing, based on N-dimensional array objects
->Libraries such as SciPy are built on top of it
->Supports N-dimensional arrays
->To save memory and execution time, it uses vectorized vector/matrix/tensor operations, which makes it faster than plain Python lists
->Learned how to:
o Install and import the numpy package
o Properties of numpy arrays
o Various functions to create arrays
o Combining arrays
o Converting between lists and arrays
o Array indexing
o Array slicing
o Array re-shaping
o Limitations of array arithmetic
o Broadcasting arrays and limitations of broadcasting
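A sketch touching each NumPy topic listed above: array properties, creation functions, combining, conversion to lists, indexing, slicing, reshaping and broadcasting, including a shape mismatch that broadcasting cannot resolve. It assumes NumPy is installed (e.g. with pip install numpy); the arrays are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])           # list of lists -> 2-D array
print(a.ndim, a.shape, a.size)                 # 2 (2, 3) 6
print(a.dtype)                                 # an integer dtype; its exact width depends on the platform

z = np.zeros((2, 3))                           # creation functions
r = np.arange(0, 10, 2)                        # [0 2 4 6 8]
lin = np.linspace(0, 1, 5)                     # [0.   0.25 0.5  0.75 1.  ]

v = np.vstack((a, a))                          # combine row-wise -> shape (4, 3)
h = np.hstack((a, a))                          # combine column-wise -> shape (2, 6)

back = a.tolist()                              # array -> list: [[1, 2, 3], [4, 5, 6]]

print(a[1, 2], a[-1, 0])                       # 6 4   (indexing)
print(a[:, 1], a[0, 1:])                       # [2 5] [2 3]   (slicing)
b = a.reshape(3, 2)                            # re-shaping: same 6 values, new shape

print(a + 1)                                   # scalar broadcast to every element
print(a + np.array([10, 20, 30]))              # row broadcast to each row of a
try:
    a + np.array([1, 2])                       # shapes (2, 3) and (2,) are incompatible
except ValueError as e:
    print("Broadcasting failed:", e)
```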

• Vectors: a vector is a tuple of one or more scalar values
• I learned how to:
o Create a vector
o Perform arithmetic operations on vectors
o Perform a vector dot product
o Perform scalar-vector multiplication
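Vector arithmetic, the dot product and scalar-vector multiplication with NumPy arrays; the two sample vectors are illustrative.

```python
import numpy as np

v1 = np.array([1, 2, 3])           # create vectors
v2 = np.array([4, 5, 6])

add = v1 + v2                      # [5 7 9]
sub = v1 - v2                      # [-3 -3 -3]
mul = v1 * v2                      # element-wise: [ 4 10 18]
div = v2 / v1                      # [4.  2.5 2. ]
dot = v1.dot(v2)                   # 1*4 + 2*5 + 3*6 = 32
scaled = 2.5 * v1                  # scalar-vector multiplication: [2.5 5.  7.5]
print(add, sub, mul, div, dot, scaled)
```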

• Numpy module part-2

->Matrix: A matrix is a two-dimensional array of scalars with one or more columns and one or more rows
->I learned how to:
o Define a matrix
o Matrix arithmetic
o Matrix duplication
o Matrix-vector multiplication
o Matrix-scalar multiplication
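The matrix operations listed above, sketched with two illustrative 2x2 NumPy arrays; note that * is element-wise while @ is true matrix multiplication.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])     # define 2x2 matrices
B = np.array([[5, 6], [7, 8]])
v = np.array([1, 1])

print(A + B)       # [[ 6  8] [10 12]]
print(A - B)       # [[-4 -4] [-4 -4]]
print(A * B)       # element-wise (Hadamard) product: [[ 5 12] [21 32]]
print(A / B)       # element-wise division
print(A @ B)       # matrix multiplication: [[19 22] [43 50]]
print(A @ v)       # matrix-vector multiplication: [3 7]
print(3 * A)       # matrix-scalar multiplication: [[ 3  6] [ 9 12]]
```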
• We also learned about different types of matrices
o Square matrix: the number of rows equals the number of columns
o Symmetric matrix: a square matrix that is equal to its own transpose
o Triangular matrices: upper, when Aij = 0 for i > j; lower, when Aij = 0 for i < j
o Diagonal matrix: all entries off the main diagonal are 0
o Identity matrix: 1s in the diagonal positions and 0s in all other places

• Matrix properties:
o Transpose
o Inverse
o Trace
o Determinant
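The matrix types and properties above can be produced or checked with standard NumPy functions. The sample matrices are illustrative, and the determinant is compared with a tolerance because np.linalg.det works in floating point.

```python
import numpy as np

M = np.array([[1, 2, 3],
              [2, 5, 6],
              [3, 6, 9]])
is_square = M.shape[0] == M.shape[1]      # True
is_symmetric = np.array_equal(M, M.T)     # True: M equals its transpose
upper = np.triu(M)                        # zeros below the diagonal (Aij = 0 for i > j)
lower = np.tril(M)                        # zeros above the diagonal (Aij = 0 for i < j)
D = np.diag([1, 2, 3])                    # diagonal matrix
I = np.eye(3)                             # identity matrix

A = np.array([[4.0, 7.0], [2.0, 6.0]])
print(A.T)                                # transpose: [[4. 2.] [7. 6.]]
print(np.trace(A))                        # 10.0 (4 + 6)
print(np.linalg.det(A))                   # about 10.0 (4*6 - 7*2), up to rounding
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))  # True: A times its inverse gives the identity
```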

• Tensors: In TensorFlow, tensors are multi-dimensional arrays with a uniform type (called a dtype). You can see all supported dtypes at tf.dtypes.DType. If you're familiar with NumPy, tensors are (kind of) like np.arrays. All TensorFlow tensors are immutable like Python numbers and strings: you can never update the contents of a tensor, only create a new one.
• We learned how to:
o Define a tensor
o Perform arithmetic operations with tensors
o Perform a tensor product
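The tensor description above comes from TensorFlow, but the same ideas can be sketched with NumPy's N-dimensional arrays, which is what this example assumes (no TensorFlow needed). np.tensordot with axes=0 gives the tensor (outer) product; the values are illustrative.

```python
import numpy as np

T = np.arange(12).reshape(2, 2, 3)     # define a rank-3 tensor with shape (2, 2, 3)
print(T.ndim, T.shape)                  # 3 (2, 2, 3)

T2 = T * 2                              # element-wise arithmetic
S = T + T2                              # same shape; each entry equals 3 * T
a = np.array([1, 2])
b = np.array([3, 4, 5])
P = np.tensordot(a, b, axes=0)          # tensor (outer) product, shape (2, 3)
print(P)                                # values [[3, 4, 5], [6, 8, 10]]
```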

Topic : Pandas module

• The name pandas is derived from "panel data", an econometrics term for data sets that include observations over multiple time periods for the same individuals
• Pandas is a Python package/library for data manipulation and analysis
• Pandas provides data structures such as Series and DataFrame
• It offers the flexible data manipulation capabilities of spreadsheets and relational databases
• Pandas can perform indexing, slicing, dicing, aggregation and subset operations
• Difference between Series and DataFrame: A Series is a one-dimensional labelled array that can hold integers, strings, floats (doubles) and other values. A Series holds a single column of values with an index, whereas a DataFrame can be made of more than one Series; in other words, a DataFrame is a collection of Series, sharing a common index, that can be used to analyse data.
• Operations with series
• DataFrame: A DataFrame is like an Excel table/PivotTable. It is a tabular data structure made of rows and columns with heterogeneous values
• DataFrame operations:
• Loading a csv file into a dataframe
• Operations to view and inspect a dataframe
• Filtering/slicing a dataframe
• Sorting the dataframe
• Aggregating a dataframe
• Creating new fields based on arithmetic calculations on other columns
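A sketch of the Series and DataFrame operations above. It assumes a small illustrative CSV held in a string and read through io.StringIO; with a real file, pd.read_csv takes the file path instead. The employee names and figures are made up.

```python
import io

import pandas as pd

# Series: a single labelled column
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s["b"], s.sum(), (s * 2).tolist())        # 20 60 [20, 40, 60]

csv_text = """name,dept,salary,bonus
Yash,IT,50000,5000
Singh,HR,40000,3000
Mani,IT,60000,7000
Ravi,HR,45000,2000
"""
df = pd.read_csv(io.StringIO(csv_text))          # loading CSV data into a dataframe

print(df.head(2))                                # view the first rows
print(df.shape)                                  # (4, 4)
df.info()                                        # column names, dtypes, non-null counts
print(df.describe())                             # summary statistics of numeric columns

it_staff = df[df["dept"] == "IT"]                # filtering with a boolean condition
subset = df.loc[0:1, ["name", "salary"]]         # slicing: rows 0 to 1 (inclusive with .loc)
by_salary = df.sort_values("salary", ascending=False)
print(by_salary["name"].tolist())                # ['Mani', 'Yash', 'Ravi', 'Singh']
avg_salary = df.groupby("dept")["salary"].mean() # aggregation: HR 42500.0, IT 55000.0
print(avg_salary)
df["total"] = df["salary"] + df["bonus"]         # new field from other columns
print(df[["name", "total"]])
```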

Topic : Statistics module

• Statistics is a mathematical science that includes methods of collecting, organizing and analysing data in order to derive accurate and scientific conclusions for the questions at hand.
Statistics is categorized into two types:
1. Descriptive Statistics 2. Inferential Statistics
• Types of data
• Population: a large group of specifically defined elements or objects whose properties are to be analysed.
• Sample: a subset of the population is called a 'sample'.
• Descriptive statistics: Descriptive statistics (DS) deals with methods for summarizing raw observations into information. In DS, properties of sample data, such as accidents, prices of goods, business incomes, sports data and population data, are summarized.
• Mean: the sum of the observations divided by the total number of observations. It is also called the average.
• Mode: the value that has the highest frequency in the given data set. A data set has no mode if all data points have the same frequency, and it can have more than one mode if two or more data points share the highest frequency.
• Median: the middle value of the data set, which splits the data into two halves. After sorting, if the number of elements is odd the centre element is the median; if it is even, the median is the average of the two central elements.
• Range: the difference between the largest and smallest data points in the data set. The bigger the range, the larger the spread of the data, and vice versa.
• Variance: the average squared deviation from the mean. It is calculated by finding the difference between every data point and the mean, squaring these differences, adding them all and dividing by the number of data points in the data set (for a sample, the sum is usually divided by n - 1 instead).
• Standard Deviation: the square root of the variance. It is calculated by finding the mean, subtracting the mean from each number and squaring the result, adding all these values, dividing by the number of terms and finally taking the square root.
• Correlation: In statistics, correlation is a measure that establishes the relationship between two variables; in other words, it measures the association between variables. Its value ranges from -1 to +1. A correlation of +1 shows a perfect positive correlation, meaning both variables move in the same direction, while -1 shows a perfect negative correlation.
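The descriptive statistics defined above, computed on an illustrative data set both by hand (following the variance and standard deviation definitions) and with Python's statistics module; correlation uses NumPy's corrcoef.

```python
import statistics

import numpy as np

data = [2, 4, 4, 4, 5, 5, 7, 9]            # illustrative sample values

mean = statistics.mean(data)               # 40 / 8 = 5
median = statistics.median(data)           # (4 + 5) / 2 = 4.5
mode = statistics.mode(data)               # 4 appears most often
data_range = max(data) - min(data)         # 9 - 2 = 7

# Variance as defined above: average squared deviation from the mean
variance = sum((x - mean) ** 2 for x in data) / len(data)   # 32 / 8 = 4.0
std_dev = variance ** 0.5                                    # 2.0
print(mean, median, mode, data_range, variance, std_dev)
print(statistics.pvariance(data), statistics.pstdev(data))   # same results via the library
print(statistics.variance(data))           # sample variance, divides by n - 1: 32 / 7

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]                       # y moves exactly with x
r = np.corrcoef(x, y)[0, 1]                # Pearson correlation, about +1
print(r)
```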

Topic : Matplotlib module

• Matplotlib is a visualization library for 2D graphs, with partial support for 3D graphs. plot() and show() are the two main functions used to visualize graphs. It is a multi-platform data visualization library built on NumPy arrays and designed to work with the broader SciPy stack, which makes it fast and efficient. It supports histograms, bar charts, pie charts, scatter plots and area charts. pandas' plotting functions use matplotlib.pyplot as their default backend, and pyplot is also used to plot data from other modules.
• Types of plots
• Scatter plot
• Pie chart
• Line plot with styling
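The three plot types listed above, drawn side by side with matplotlib. The sketch assumes the non-interactive Agg backend and saves to an in-memory buffer so it runs without a display; in a notebook or desktop session you would normally call plt.show() instead. The data values and labels are illustrative.

```python
import io

import matplotlib
matplotlib.use("Agg")                     # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]                       # illustrative data

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

axes[0].scatter(x, y, color="tab:blue")
axes[0].set_title("Scatter plot")

axes[1].pie([40, 35, 25], labels=["IT", "HR", "Sales"], autopct="%1.0f%%")
axes[1].set_title("Pie chart")

axes[2].plot(x, y, color="green", linestyle="--", marker="o", linewidth=2, label="y values")
axes[2].set_title("Line plot with styling")
axes[2].set_xlabel("x")
axes[2].set_ylabel("y")
axes[2].legend()

fig.tight_layout()
buf = io.BytesIO()
fig.savefig(buf, format="png")            # with an interactive backend, plt.show() would open a window
```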
• Seaborn: a statistical data visualization library for Python based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
• Features of seaborn:
o A dataset-oriented API for examining relationships
between multiple variables
o Specialized support for using categorical variables to show
observations or aggregate statistics
o Options for visualizing univariate or bivariate distributions
and for comparing them between subsets of data

Topic : Importance of Machine Learning

• Machine Learning: ML is an application of Artificial Intelligence (AI) that enables a machine to learn automatically from experience, without being explicitly programmed.

• I learned how machine learning gives probable solutions for identifying and detecting fraud in different scenarios, such as classifying spam emails versus legitimate emails.
• I learned how machine learning gives probable solutions by clustering data, e.g. clustering customer purchase history data to give better recommendations and increase sales.
• Features of machine learning:

• Phases of machine learning
o Building model
• Detailed steps in training and testing in machine learning
o Collecting data
o Data wrangling
o Analyze data
o Train algorithm
o Test algorithm
o Deployment
• Machine Learning Algorithms

Topic : Linear Regression

• Equation for Linear Regression
o y = B0 + B1*x
• Estimation of slope (B1)
o B1 = sum((xi - mean(x)) * (yi - mean(y))) / sum((xi - mean(x))^2)
• mean() is the average value of the variable in our dataset. xi and yi refer to the fact that we need to repeat these calculations across all values in our dataset, where i refers to the i'th value of x or y.
• B0 = mean(y) - B1 * mean(x)
• Make predictions
o y = B0 + B1*x
• Plot the data as a scatter plot and estimate the error using the R^2 value
• Python process to execute linear regression
o Import the csv file and read the data
o Split the data into train and test sets
o Fit the train data into the linear regression module
o Predict the results using the test data
o Create a scatter plot of the actual data and a line plot of the predictions to check how accurate the model is
o Calculate the correlation value
• Steps in SLR
o Calculate mean and variance
o Calculate the covariance
o Calculate the coefficient values
o Apply simple linear regression
o Calculate the mean squared error
o Evaluate the regression algorithm on the training dataset
o Calculate correlation values
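The SLR steps above, written from scratch in plain Python using the intercept formula B0 = mean(y) - B1 * mean(x) and the standard least-squares slope B1 = covariance(x, y) / variance(x). The five (x, y) points are illustrative, and, as in the evaluation step, the model is evaluated on its own training data.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def variance(values, m):
    # Sum of squared deviations (not divided by n, as used in the B1 formula)
    return sum((v - m) ** 2 for v in values)

def covariance(x, mx, y, my):
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

def coefficients(x, y):
    mx, my = mean(x), mean(y)
    b1 = covariance(x, mx, y, my) / variance(x, mx)
    b0 = my - b1 * mx
    return b0, b1

def simple_linear_regression(train_x, train_y, test_x):
    b0, b1 = coefficients(train_x, train_y)
    return [b0 + b1 * xi for xi in test_x]      # y = B0 + B1*x

def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def correlation(x, y):
    mx, my = mean(x), mean(y)
    return covariance(x, mx, y, my) / sqrt(variance(x, mx) * variance(y, my))

x = [1, 2, 4, 3, 5]
y = [1, 3, 3, 2, 5]
b0, b1 = coefficients(x, y)
preds = simple_linear_regression(x, y, x)       # evaluate on the training data itself
print(round(b0, 2), round(b1, 2))               # 0.4 0.8
print(round(mse(y, preds), 2))                  # 0.48
print(round(correlation(x, y), 3))              # 0.853
```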

Topic : Logistic Regression

• Logistic regression is used to predict binary outputs with two possible values labeled "0" or "1". The logistic model output can be one of two classes: pass/fail, win/lose, healthy/sick
• Linear regression is not suitable for classification problems. Its output is unbounded, so logistic regression, whose output value ranges from 0 to 1, is a better candidate
• The logistic regression algorithm works by first applying a linear equation to the independent predictors to predict a value, z = B0 + B1*x1 + ... + Bn*xn. This value is then converted into a probability between 0 and 1 using the logistic (sigmoid) function, P = 1 / (1 + e^(-z)).

• Brief steps:
o Logistic Regression Model
▪ Prediction = 0 if P(variable) < 0.5
▪ Prediction = 1 if P(variable) >= 0.5

o Find the coefficients of logistic regression using stochastic gradient descent
o Calculate new coefficients
o Make predictions
o Find the accuracy of the logistic regression model
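A from-scratch sketch of the brief steps above: the linear equation is passed through the sigmoid, coefficients are updated one example at a time by stochastic gradient descent on the log-loss (one common choice of update rule, assumed here), predictions use the 0.5 threshold, and accuracy is reported. The tiny data set, learning rate and epoch count are illustrative.

```python
from math import exp

def predict_proba(row, coef):
    """coef[0] is B0; coef[1:] multiply the features. Returns P(class = 1)."""
    z = coef[0] + sum(c * x for c, x in zip(coef[1:], row))
    return 1.0 / (1.0 + exp(-z))

def sgd_coefficients(X, y, l_rate=0.3, n_epoch=100):
    coef = [0.0] * (len(X[0]) + 1)
    for _ in range(n_epoch):
        for row, target in zip(X, y):
            error = target - predict_proba(row, coef)
            # One SGD step on the log-loss for this example: new coefficients
            coef[0] += l_rate * error
            for i, x in enumerate(row):
                coef[i + 1] += l_rate * error * x
    return coef

def accuracy(actual, predicted):
    return 100.0 * sum(a == p for a, p in zip(actual, predicted)) / len(actual)

X = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]   # illustrative, already-centred feature
y = [0, 0, 0, 1, 1, 1]
coef = sgd_coefficients(X, y)
preds = [1 if predict_proba(row, coef) >= 0.5 else 0 for row in X]   # 0.5 threshold
print(coef, preds, accuracy(y, preds))              # predictions match y: accuracy 100.0
```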
• Python steps to execute logistic regression
o Import the dataset, read it and perform cleaning operations
o Split the data into training and testing sets
o Fit it into the logistic regression module
o Predict the results
o Create the confusion matrix, which gives the number of correct and incorrect predictions
o Calculate the accuracy
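The Python steps above, sketched with scikit-learn (assumed to be installed) on its bundled breast-cancer data set, which needs no download and has a binary target. The test size, random_state and max_iter values are illustrative choices, not tuned settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)        # bundled binary data set (0 = malignant, 1 = benign)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

model = LogisticRegression(max_iter=5000)          # higher max_iter so the solver converges on unscaled data
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

cm = confusion_matrix(y_test, y_pred)              # rows = actual class, columns = predicted class
acc = accuracy_score(y_test, y_pred)               # correct predictions / all predictions
print(cm)
print(f"Accuracy: {acc:.3f}")
```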

Topic : Naïve bayes algorithm

• NBA: Naive Bayes is a classification technique based on Bayes' theorem with an assumption of independence among the predictors (input variables, features or independent variables). In simple terms, a Naive Bayes classifier (NBC) assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature
• Brief steps:
o Create a frequency table for each attribute of the data set
o For each frequency table, generate a likelihood table
o Fit the data into the Naive Bayes algorithm; for continuous variables use the normal distribution
o Predict the output
o Test the accuracy
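The brief steps above (frequency table, likelihood table, prediction) for categorical attributes, in plain Python. The weather-style data set is illustrative, and no Laplace smoothing is applied, so a value never seen with a class gets probability 0.

```python
from collections import Counter, defaultdict

# Illustrative training data: ((outlook, windy), play)
data = [
    (("sunny", "false"), "no"), (("sunny", "true"), "no"),
    (("overcast", "false"), "yes"), (("rainy", "false"), "yes"),
    (("rainy", "true"), "no"), (("overcast", "true"), "yes"),
    (("sunny", "false"), "yes"), (("rainy", "false"), "yes"),
]

# Step 1: frequency tables, one per attribute: freq[i][label][value] = count
class_counts = Counter(label for _, label in data)
freq = defaultdict(lambda: defaultdict(Counter))
for features, label in data:
    for i, value in enumerate(features):
        freq[i][label][value] += 1

# Step 2: likelihood P(value | label) from each frequency table, plus prior P(label)
prior = {c: n / len(data) for c, n in class_counts.items()}

def likelihood(i, value, label):
    return freq[i][label][value] / class_counts[label]   # unseen value -> 0 (no smoothing)

# Step 3: posterior via Bayes' theorem with the naive independence assumption
def posterior(features):
    scores = {}
    for c in class_counts:
        p = prior[c]
        for i, value in enumerate(features):
            p *= likelihood(i, value, c)      # features assumed independent given the class
        scores[c] = p
    total = sum(scores.values())              # normalise so the probabilities sum to 1
    return {c: s / total for c, s in scores.items()}

def predict(features):
    post = posterior(features)
    return max(post, key=post.get)

print(posterior(("sunny", "true")))           # 'no' about 0.87, 'yes' about 0.13
print(predict(("sunny", "true")), predict(("overcast", "false")))   # no yes
```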
• Python procedure:
o Import and read the data
o Set the necessary columns
o Remove punctuation
o Perform stemming:
▪ Root: a form which is not further analyzable. In the form 'untouchables' the root is 'touch', to which first the suffix '-able', then the prefix 'un-' and finally the suffix '-s' have been added. A compound word like 'wheelchair' has two roots, 'wheel' and 'chair'.
▪ Derivational morphology results in the creation of a new word with a new meaning.
▪ Inflectional morphology, in contrast, involves an obligatory grammatical specification that doesn't change the core meaning.
▪ Stem: a stem is of concern only when dealing with inflectional morphology. In the form 'untouchables' the stem is 'untouchable', while in the form 'touched' the stem is 'touch'; in the form 'wheelchairs' the stem is 'wheelchair', even though the stem contains two roots ('wheel' and 'chair').
