Institute's Vision
Institute’s Mission
Department’s Mission
K. C. COLLEGE OF ENGINEERING
AND MANAGEMENT STUDIES AND
RESEARCH THANE (EAST).
Certificate
This is to certify that Mr. / Ms. ___________________________________
has performed and successfully completed all the practicals in the subject
of ______________________________________________ for the
academic year 20___ to 20___ as prescribed by the University of Mumbai.
DATE :- ____________
_____________________________ _____________________________
COLLEGE SEAL
Lab Objectives:
The lab experiments aim:
1. To know the fundamental concepts of data science and analytics
2. To learn data collection, preprocessing and visualization techniques for data science
3. To understand and practice analytical methods for solving real-life problems based on statistical analysis
4. To learn various machine learning techniques to solve complex real-world problems
5. To learn streaming and batch data processing using Apache Spark
6. To map the elements of data science to perceive information
Lab Outcomes:
Sr. No. | Lab Outcomes | Cognitive levels of attainment as per Bloom's Taxonomy
DETAILED SYLLABUS:
Sr. No. | Module | Detailed Content | Hours | LO Mapping

I. Introduction to Data Science and Data Processing using Pandas (04 hours, LO1)
i. Introduction, benefits and uses of data science
ii. Data science tasks
iii. Introduction to Pandas
iv. Data preparation: data cleansing, data transformation, combine/merge/join data, data loading and preprocessing with pandas
v. Data aggregation
vi. Querying data in Pandas
vii. Statistics with Pandas DataFrames
viii. Working with categorical and text data
ix. Data indexing and selection
x. Handling missing data

II. Data Visualization and Statistics (04 hours, LO2)
i. Visualization with Matplotlib and Seaborn
ii. Plotting line plots, bar plots, histograms, density plots, paths, 3D plots, stream plots, logarithmic plots, pie charts, scatter plots and image visualization using Matplotlib
iii. Plotting scatter plot, box plot, violin plot, swarm plot, heatmap and bar plot using Seaborn
iv. Introduction to scikit-learn and SciPy
v. Statistics using Python: linear algebra, eigenvalues, eigenvectors, determinant, singular value decomposition, integration, correlation, central tendency, variability, hypothesis testing, ANOVA, z-test, t-test and chi-square test
Program Outcomes
2. Problem analysis: Identify, formulate, review research literature, and analyze complex
engineering problems reaching substantiated conclusions using first principles of mathematics,
natural sciences, and engineering sciences.
3. Design/development of solutions: Design solutions for complex engineering problems and design
system components or processes that meet the specified needs with appropriate consideration for the
public health and safety, and the cultural, societal, and environmental considerations.
5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modelling to complex engineering activities
with an understanding of the limitations.
6. The engineer and society: Apply reasoning informed by the contextual knowledge to
assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant
to the professional engineering practice.
7. Environment and sustainability: Understand the impact of the professional
engineering solutions in societal and environmental contexts, and demonstrate the knowledge of,
and need for, sustainable development.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and
norms of the engineering practice.
9. Individual and team work: Function effectively as an individual, and as a member or leader
in diverse teams, and in multidisciplinary settings.
11. Project management and finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s own work, as a member and
leader in a team, to manage projects and in multidisciplinary environments.
12. Life-long learning: Recognize the need for, and have the preparation and ability to engage
in independent and life-long learning in the broadest context of technological change.
Department of Information Technology
Semester: VI
Class: TE
Index: Sr. No. | Date of Performance | Date of Submission | Page No. | Grade (A+B)
__________________ __________________
Lab Outcome :-
Implementation (5) | Understanding (5) | Punctuality & Discipline (5) | Total (15)
____________________________
Practical Incharge
EXPERIMENT NO. - 01
THEORY:
Data Preprocessing:
Data preprocessing is a data mining technique used to transform raw data into a useful and
efficient format.
Steps Involved in Data Preprocessing:
1. Data Cleaning:
The data can have many irrelevant and missing parts. Data cleaning is done to handle this.
It involves handling missing data, noisy data, etc.
(a) Missing Data:
This situation arises when some values are missing in the data. It can be handled in various
ways.
Some of them are:
1. Ignore the tuples:
This approach is suitable only when the dataset is quite large and multiple values are
missing within a tuple.
2. Fill the Missing values:
There are various ways to do this. You can choose to fill the missing values manually,
with the attribute mean, or with the most probable value.
(b) Noisy Data:
Noisy data is meaningless data that cannot be interpreted by machines. It can be generated
by faulty data collection, data entry errors, etc. It can be handled in the following ways:
1. Binning Method:
This method works on sorted data in order to smooth it. The whole data is divided into
segments of equal size, and each segment is handled separately. One can replace all data
in a segment by its mean, or boundary values can be used to complete the task.
2. Regression:
Here data can be made smooth by fitting it to a regression function. The regression used
may be linear (having one independent variable) or multiple (having multiple independent
variables).
3. Clustering:
This approach groups similar data into clusters. Outliers may go undetected, or they may
fall outside the clusters.
2. Data Transformation:
This step transforms the data into forms appropriate for the mining process. It involves
the following:
1. Normalization:
It is done in order to scale the data values in a specified range (-1.0 to 1.0 or 0.0 to 1.0)
2. Attribute Selection:
In this strategy, new attributes are constructed from the given set of attributes to help the
mining process.
3. Discretization:
This is done to replace the raw values of numeric attribute by interval levels or conceptual
levels.
4. Concept Hierarchy Generation:
Here attributes are converted from lower level to higher level in hierarchy. For Example-The
attribute “city” can be converted to “country”.
3. Data Reduction:
Data mining is a technique used to handle huge amounts of data, and analysis becomes
harder as the volume of data grows. Data reduction techniques address this: they aim to
increase storage efficiency and reduce data storage and analysis costs.
The various steps to data reduction are:
1. Data Cube Aggregation:
Aggregation operation is applied to data for the construction of the data cube.
2. Attribute Subset Selection:
Only the highly relevant attributes should be used; the rest can be discarded. For performing
attribute selection, one can use the significance level and the p-value of the attribute: an
attribute with a p-value greater than the significance level can be discarded.
3. Numerosity Reduction:
This enables storing a model of the data instead of the whole data, for example regression
models.
4. Dimensionality Reduction:
This reduces the size of the data using an encoding mechanism. It can be lossy or lossless.
If the original data can be retrieved after reconstruction from the compressed data, the
reduction is called lossless; otherwise it is called lossy. Two effective methods of
dimensionality reduction are wavelet transforms and PCA (Principal Component
Analysis).
Feature Scaling:
Feature scaling is a technique to standardize the independent features present in the data to a
fixed range. It is performed during data preprocessing to handle highly varying magnitudes,
values or units. If feature scaling is not done, a machine learning algorithm tends to weigh
greater values higher and treat smaller values as lower, regardless of the unit of the values.
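The steps above can be exercised directly in pandas and scikit-learn. The following is a minimal sketch, assuming scikit-learn is available; the column names and values are hypothetical illustration data, not a prescribed dataset.

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical raw data with a missing value and widely varying magnitudes
df = pd.DataFrame({
    "age": [25, 30, np.nan, 45],
    "salary": [30000, 52000, 48000, 90000],
    "city": ["Thane", "Mumbai", "Thane", "Pune"],
})

# Data cleaning: fill the missing age with the attribute mean
df["age"] = df["age"].fillna(df["age"].mean())

# Data transformation: min-max scaling of the numeric columns into the range 0.0 to 1.0
scaler = MinMaxScaler()
df[["age", "salary"]] = scaler.fit_transform(df[["age", "salary"]])

# Data aggregation: mean scaled salary per city
print(df.groupby("city")["salary"].mean())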
OUTPUT :
CONCLUSION:
EXPERIMENT NO. - 02
Aim of the Experiment :- Data Visualization / Exploratory Data Analysis for the selected data set
using Matplotlib and Seaborn
a. Create a bar graph, contingency table using any 2 variables.
b. Create normalized histogram.
c. Describe what these graphs and tables indicate.
Lab Outcome :-
Implementation (5) | Understanding (5) | Punctuality & Discipline (5) | Total (15)
____________________________
Practical Incharge
EXPERIMENT NO. - 02
AIM : Data Visualization / Exploratory Data Analysis for the selected data set using Matplotlib and
Seaborn
a. Create a bar graph, contingency table using any 2 variables.
b. Create normalized histogram.
c. Describe what these graphs and tables indicate.
THEORY: A bar graph is the graphical representation of categorical data using rectangular bars where
the length of each bar is proportional to the value they represent. A histogram is the graphical
representation of data where data is grouped into continuous number ranges and each range corresponds
to a vertical bar.
Contingency Table is one of the techniques for exploring two or even more variables.
It is basically a tally of counts between two or more categorical variables.
A barplot is basically used to aggregate the categorical data according to some method, and
by default it is the mean. It can also be understood as a visualization of a group-by action.
To use this plot, we choose a categorical column for the x-axis and a numerical column for
the y-axis, and the plot shows the mean of the numerical column per category.
Important parameters of seaborn.barplot():
data (DataFrame, array, or list of arrays, optional): dataset for plotting. If x and y are absent, this is interpreted as wide-form; otherwise it is expected to be long-form.
order, hue_order (lists of strings, optional): order to plot the categorical levels in; otherwise the levels are inferred from the data objects.
color (matplotlib color, optional): color for all of the elements, or seed for a gradient palette.
palette (palette name, list, or dict, optional): colors to use for the different levels of the hue variable. Should be something that can be interpreted by color_palette(), or a dictionary mapping hue levels to matplotlib colors.
errcolor (matplotlib color): color for the lines that represent the confidence interval.
ax (matplotlib Axes, optional): Axes object to draw the plot onto; otherwise uses the current Axes.
Steps: import Seaborn and Matplotlib, then plot.
Example:
import matplotlib.pyplot as plt

k = [5, 5, 5, 5]
# density=True normalizes the histogram so that the total area of the bars is 1
x, bins, p = plt.hist(k, density=True)
plt.show()
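For parts (a) and (b) of the aim, a contingency table, a bar graph and a normalized histogram can be produced as in the sketch below; the gender and pet columns are hypothetical illustration data, not a prescribed dataset.

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical categorical data (replace with the selected data set)
df = pd.DataFrame({
    "gender": ["M", "F", "F", "M", "F", "M", "F", "M"],
    "pet":    ["dog", "cat", "dog", "cat", "bird", "dog", "cat", "dog"],
})

# (a) Contingency table (crosstab) of the two categorical variables
table = pd.crosstab(df["gender"], df["pet"])
print(table)

# (a) Bar graph drawn from the contingency table
table.plot(kind="bar")
plt.show()

# (b) Normalized histogram of a numeric variable (hypothetical values)
values = [1, 2, 2, 3, 3, 3, 4, 4, 5]
plt.hist(values, bins=5, density=True)   # density=True normalizes the histogram
plt.show()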
OUTPUT:
CONCLUSION:
EXPERIMENT NO. - 03
Lab Outcome :-
Implementation (5) | Understanding (5) | Punctuality & Discipline (5) | Total (15)
____________________________
Practical Incharge
Experiment No. 3
AIM : Data Modeling : Validating partition by performing a two‐sample Z‐test.
Data modeling is the process of creating a simplified diagram of a software system and the data
elements it contains, using text and symbols to represent the data and how it flows. Data models
provide a blueprint for designing a new database or reengineering a legacy application.
Z-test
Z-test is a statistical method to determine whether the distribution of the test statistics can be
approximated by a normal distribution. It is the method to determine whether two sample means
are approximately the same or different when their variance is known and the sample size is large
(should be >= 30).
The sample size should be greater than 30. Otherwise, we should use the t-test.
Samples should be drawn at random from the population.
The standard deviation of the population should be known.
Samples that are drawn from the population should be independent of each other.
The data should be normally distributed, however for large sample size, it is assumed to
have a normal distribution.
Hypothesis Testing
Null Hypothesis: The null hypothesis is a statement that the value of a population
parameter (such as proportion, mean, or standard deviation) is equal to some claimed
value. We either reject or fail to reject the null hypothesis. Null Hypothesis is denoted by
H0.
Alternate Hypothesis: The alternative hypothesis is the statement that the parameter has
a value that is different from the claimed value. It is denoted by HA.
Two-sampled z-test:
In this test, we have two normally distributed and independent populations, and we have drawn
samples at random from both populations. Here, let µ1 and µ2 be the population means, and X1
and X2 the observed sample means. Our null hypothesis is:
H0 : µ1 - µ2 = 0
and the alternative hypothesis is
H1 : µ1 - µ2 ≠ 0
The z-test statistic is calculated as:
z = ((X1 - X2) - (µ1 - µ2)) / √(σ1²/n1 + σ2²/n2)
where σ1 and σ2 are the population standard deviations and n1 and n2 are the sample sizes of the
populations corresponding to µ1 and µ2.
Type I error: A Type I error occurs when we reject the null hypothesis even though it is
true. This error is denoted by alpha.
Type II error: A Type II error occurs when we fail to reject the null hypothesis even
though it is false. This error is denoted by beta.
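A minimal sketch of a two-sample z-test in Python, assuming the statsmodels library is installed; the two samples below are randomly generated purely for illustration and stand in for the two partitions being validated.

import numpy as np
from statsmodels.stats.weightstats import ztest

# Hypothetical partition of a dataset into two samples (n >= 30 each)
np.random.seed(0)
sample1 = np.random.normal(loc=100, scale=15, size=50)
sample2 = np.random.normal(loc=100, scale=15, size=50)

# H0: mu1 - mu2 = 0, H1: mu1 - mu2 != 0 (two-sided test)
z_stat, p_value = ztest(sample1, sample2, value=0)
print("z statistic:", z_stat, "p-value:", p_value)

alpha = 0.05
if p_value < alpha:
    print("Reject H0: the two partitions differ significantly.")
else:
    print("Fail to reject H0: the partitions are statistically similar.")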
OUTPUT:
CONCLUSION:
EXPERIMENT NO. - 04
Aim of the Experiment :- Implementation of Statistical Hypothesis Test using SciPy and
scikit-learn.
Lab Outcome :-
Implementation (5) | Understanding (5) | Punctuality & Discipline (5) | Total (15)
____________________________
Practical In charge
Experiment No. 4
AIM : Implementation of Statistical Hypothesis Test using SciPy and scikit-learn.
THEORY:
Pearson's Chi-Square statistical hypothesis test is a test for independence between categorical
variables. In this experiment, we will perform the test using a mathematical approach and then
using Python's SciPy module.
A Contingency table (also called crosstab) is used in statistics to summarise the relationship between
several categorical variables. Here, we take a table that shows the number of men and women buying
different types of pets.
The aim of the test is to conclude whether the two variables (gender and choice of pet) are
related to each other.
Null hypothesis:
We start by defining the null hypothesis (H0) which states that there is no relation between the
variables. An alternate hypothesis would state that there is a significant relation between the two.
We define a significance factor to determine whether the relation between the variables is of
considerable significance. Generally a significance factor or alpha value of 0.05 is chosen. This
alpha value denotes the probability of erroneously rejecting H0 when it is true. A lower alpha
value is chosen in cases where we expect more precision. If the p-value for the test comes out to
be strictly greater than the alpha value, then H0 holds true.
If our calculated value of chi-square is less than or equal to the tabular (also called critical)
value of chi-square, then H0 holds true.
Chi-Square Table :
From this table, we obtain the total of the last column, which gives us the calculated value of chi-
square. Hence the calculated value of chi-square is 4.542228269825232
Now, we need to find the critical value of chi-square. We can obtain this from a table. To use this
table, we need to know the degrees of freedom for the dataset. The degrees of freedom is defined
as : (no. of rows – 1) * (no. of columns – 1).
Hence, the degrees of freedom is (2-1) * (3-1) = 2
Now, look at the table and find the value corresponding to 2 degrees of freedom and 0.05
significance factor :
The tabular or critical value of chi-square here is 5.991
Hence, since the calculated value of chi-square (4.542) is less than the critical value (5.991),
H0 is accepted; that is, the variables do not have a significant relation.
SciPy is an open-source Python library which is used in mathematics, engineering, and
scientific and technical computing.
Installation: SciPy can be installed with pip (pip install scipy).
The chi2_contingency() function of the scipy.stats module takes the contingency table in
2D array format as input. It returns a tuple containing the test statistic, the p-value, the degrees
of freedom and the expected table (the one we created from the calculated values), in that order.
Hence, we need to compare the obtained p-value with the alpha value of 0.05.
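A minimal sketch of the SciPy approach described above, using a hypothetical 2x3 contingency table; the counts are made up for illustration and do not reproduce the worked example above.

import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = gender, columns = pet choice
observed = np.array([[207, 282, 241],
                     [234, 242, 232]])

stat, p_value, dof, expected = chi2_contingency(observed)
print("chi-square statistic:", stat)
print("p-value:", p_value)
print("degrees of freedom:", dof)
print("expected frequencies:\n", expected)

alpha = 0.05
if p_value <= alpha:
    print("Reject H0: the variables are related.")
else:
    print("Fail to reject H0: the variables are independent.")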
OUTPUT:
Chi-square Test for feature selection
Feature selection, also known as attribute selection, is the process of extracting the most relevant
features from the dataset and then applying machine learning algorithms for better
performance of the model. A large number of irrelevant features increases the training time
exponentially and increases the risk of overfitting.
Chi-square test is used for categorical features in a dataset. We calculate Chi-square between
each feature and the target and select the desired number of features with best Chi-square scores.
It determines if the association between two categorical variables of the sample would reflect
their real association in the population.
χ² = Σ (O - E)² / E
where O is the observed frequency and E is the expected frequency.
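For feature selection, scikit-learn's SelectKBest can apply the chi-square test between each feature and the target. A minimal sketch follows, using the Iris dataset purely as an illustrative stand-in:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

# Select the 2 features with the highest chi-square scores against the target
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)

print("chi-square scores:", selector.scores_)
print("selected feature indices:", selector.get_support(indices=True))
print("reduced shape:", X_new.shape)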
OUTPUT:
CONCLUSION:
EXPERIMENT NO. - 05
Aim of the Experiment :- Apply regression model techniques to predict the data on the House
Prices dataset, and prediction of loan using multivariable regression in Python.
Lab Outcome :-
Implementation (5) | Understanding (5) | Punctuality & Discipline (5) | Total (15)
____________________________
Practical In charge
Experiment No. 5
AIM:- Apply regression model techniques to predict the data on the House Prices dataset,
and prediction of loan using multivariable regression in Python.
When implementing linear regression of some dependent variable 𝑦 on the set of independent
variables 𝐱 = (𝑥₁, …, 𝑥ᵣ), where 𝑟 is the number of predictors, you assume a linear relationship
between 𝑦 and 𝐱: 𝑦 = 𝛽₀ + 𝛽₁𝑥₁ + ⋯ + 𝛽ᵣ𝑥ᵣ + 𝜀. This equation is the regression equation. 𝛽₀,
𝛽₁, …, 𝛽ᵣ are the regression coefficients, and 𝜀 is the random error.
To get the best weights, you usually minimize the sum of squared residuals (SSR) for all
observations 𝑖 = 1, …, 𝑛: SSR = Σᵢ(𝑦ᵢ - 𝑓(𝐱ᵢ))². This approach is called the method of ordinary
least squares.
If there are just two independent variables, the estimated regression function is 𝑓(𝑥₁, 𝑥₂) = 𝑏₀ +
𝑏₁𝑥₁ + 𝑏₂𝑥₂. It represents a regression plane in a three-dimensional space. The goal of
regression is to determine the values of the weights 𝑏₀, 𝑏₁, and 𝑏₂ such that this plane is as close
as possible to the actual responses and yields the minimal SSR.
The case of more than two independent variables is similar, but more general. The estimated
regression function is 𝑓(𝑥₁, …, 𝑥ᵣ) = 𝑏₀ + 𝑏₁𝑥₁ + ⋯ + 𝑏ᵣ𝑥ᵣ, and there are 𝑟 + 1 weights to be
determined when the number of inputs is 𝑟.
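A minimal sketch of multivariable linear regression with scikit-learn; the columns area, bedrooms, age and price below are hypothetical stand-ins for the actual house-prices and loan datasets.

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Hypothetical house-prices data (replace with the actual dataset)
df = pd.DataFrame({
    "area":     [1000, 1500, 1800, 2400, 3000, 3500],
    "bedrooms": [2, 3, 3, 4, 4, 5],
    "age":      [20, 15, 10, 8, 5, 2],
    "price":    [40, 55, 65, 90, 110, 130],   # illustrative values only
})

X = df[["area", "bedrooms", "age"]]
y = df["price"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)

model = LinearRegression()
model.fit(X_train, y_train)

print("intercept b0:", model.intercept_)
print("coefficients b1..br:", model.coef_)
print("R^2 on test data:", r2_score(y_test, model.predict(X_test)))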
OUTPUT:
CONCLUSION:
EXPERIMENT NO. - 06
Aim of the Experiment :- Classification modelling
a. Choose a classifier for a classification problem.
b. Evaluate the performance of the classifier.
Lab Outcome :-
Implementation (5) | Understanding (5) | Punctuality & Discipline (5) | Total (15)
____________________________
Practical Incharge
Experiment No. 6
Aim : Classification modelling
a. Choose a classifier for a classification problem.
b. Evaluate the performance of the classifier.
THEORY: Ensemble learning is a machine learning paradigm where multiple models (often
called “weak learners”) are trained to solve the same problem and combined to get better results.
The main hypothesis is that when weak models are correctly combined we can obtain more
accurate and/or robust models.
Bagging is a homogeneous weak learners' model in which the learners are trained independently
of each other in parallel and combined to determine the model average. Bagging is an acronym for
'Bootstrap Aggregation' and is used to decrease the variance in the prediction model. Bagging is
a parallel method that fits the considered learners independently from each other, making it
possible to train them simultaneously.
Bagging generates additional data for training from the dataset. This is achieved by random
sampling with replacement from the original dataset. Sampling with replacement may repeat
some observations in each new training data set. Every element in bagging is equally likely to
appear in a new dataset.
These multiple datasets are used to train multiple models in parallel. The average of all the
predictions from the different ensemble models is calculated. For classification, the majority
vote obtained from the voting mechanism is used. Bagging decreases the variance and tunes
the prediction to an expected outcome.
Example of Bagging: The Random Forest model uses Bagging, where decision tree models with
higher variance are present. It makes random feature selection to grow trees. Several random
trees make a Random Forest.
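A hedged sketch of choosing a bagging-based classifier (Random Forest) and evaluating it; the breast cancer dataset bundled with scikit-learn is used only as a stand-in for the actual classification problem.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Random Forest = bagging of decision trees with random feature selection
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Evaluate the classifier on the held-out test set
y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))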
OUTPUT:
CONCLUSION:
EXPERIMENT NO. - 07
Lab Outcome :-
Implementation (5) | Understanding (5) | Punctuality & Discipline (5) | Total (15)
____________________________
Practical Incharge
Experiment No. 7
AIM : Clustering
a. Clustering algorithms for unsupervised classification.
b. Plot the cluster data.
THEORY: K-Means Clustering is an unsupervised learning algorithm that is used to solve
clustering problems in machine learning or data science. In this topic, we will learn what the
K-means clustering algorithm is and how it works, along with the Python implementation of
K-means clustering.
K-Means Clustering is an unsupervised learning algorithm which groups the unlabeled dataset
into different clusters. Here K defines the number of pre-defined clusters that need to be created
in the process: if K=2, there will be two clusters, for K=3 there will be three clusters, and so on.
It is an iterative algorithm that divides the unlabeled dataset into K different clusters in such a
way that each data point belongs to only one group with similar properties. It allows us to
cluster the data into different groups and is a convenient way to discover the categories of
groups in the unlabeled dataset on its own, without the need for any training. It is a
centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this
algorithm is to minimize the sum of distances between the data points and their corresponding
cluster centroids.
The algorithm takes the unlabeled dataset as input, divides the dataset into K clusters, and
repeats the process until it finds the best clusters. The value of K should be predetermined in
this algorithm.
The K-means algorithm mainly performs two tasks:
o Determines the best values for the K centre points or centroids by an iterative process.
o Assigns each data point to its closest K-centre. The data points which are near a particular
K-centre form a cluster.
Hence each cluster has data points with some commonalities, and it is away from other clusters.
The K-means clustering algorithm works as follows (a sketch is given below):
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points or centroids. (They can be points other than those in the input
dataset.)
Step-3: Assign each data point to its closest centroid, which will form the predefined K
clusters.
Step-4: Calculate the new centroid of each cluster.
Step-5: Repeat the third step, that is, reassign each data point to the new closest centroid
of each cluster.
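A minimal sketch of K-means clustering and plotting the resulting clusters; the two-dimensional blobs are synthetic data generated only for illustration.

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic unlabeled data with 3 natural groups
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Plot the cluster assignments and the final centroids
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", s=20)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            c="red", marker="x", s=100, label="centroids")
plt.legend()
plt.show()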
OUTPUT:
CONCLUSION:
EXPERIMENT NO. - 08
Aim of the experiment :- Using any machine learning technique on an available data set,
develop a recommendation system.
Lab Outcome :-
Implementation (5) | Understanding (5) | Punctuality & Discipline (5) | Total (15)
____________________________
Practical Incharge
EXPERIMENT NO.: 8
AIM: Using any machine learning technique on an available data set, develop a recommendation
system.
Recommender systems are algorithms which are able to suggest "relevant" items to users.
Ideally, the suggested items are as relevant to the user as possible, so that the user can engage
with those items: YouTube videos, news articles, and so on. Items are ranked according to their
relevancy, and the most relevant ones are shown to the user. The relevancy is something that
the recommender system must determine, and it is mainly based on historical data. If you have
recently watched YouTube videos about elephants, then YouTube is going to start showing you
a lot of elephant videos with similar titles and themes.
Recommender systems are generally divided into two main categories: collaborative filtering and
content-based systems.
Collaborative filtering methods for recommender systems are methods that are solely based on
the past interactions between users and the target items. Thus, the input to a collaborative
filtering system will be all historical data of user interactions with target items. This data is
typically stored in a matrix where the rows are the users, and the columns are the items.
The core idea behind such systems is that the historical data of the users should be enough to
make a prediction; that is, we do not need anything more than that historical data, and no extra
input from the user.
Beyond this, collaborative filtering methods are further divided into two sub-groups:
memory-based and model-based methods.
Memory-based methods are the most simplistic as they use no model whatsoever. They assume
that predictions can be made on pure "memory" of past data, usually by employing a simple
similarity or distance measure between users or items.
Model-based approaches, on the other hand, always assume some kind of underlying model and
basically try to make sure that whatever predictions come out will fit the model well.
Steps:
4. Make recommendations.
PCA (Principal Component Analysis) converts a set of observations of possibly correlated
variables into a set of values of linearly uncorrelated variables called principal components.
Step 3: Calculate the eigenvalues and eigenvectors for the covariance matrix.
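A hedged sketch of a memory-based (item-item) collaborative filtering recommender; the user names, item names and ratings below are entirely hypothetical.

import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical user-item rating matrix (0 = not rated)
ratings = pd.DataFrame(
    {"Movie A": [5, 4, 0, 1],
     "Movie B": [4, 5, 1, 0],
     "Movie C": [1, 0, 5, 4],
     "Movie D": [0, 1, 4, 5]},
    index=["user1", "user2", "user3", "user4"],
)

# Item-item similarity computed from the rating columns
item_sim = pd.DataFrame(cosine_similarity(ratings.T),
                        index=ratings.columns, columns=ratings.columns)

def recommend(user, top_n=2):
    # Score unseen items by their similarity to items the user has already rated
    user_ratings = ratings.loc[user]
    scores = item_sim.dot(user_ratings)
    scores = scores[user_ratings == 0]   # keep only items the user has not rated
    return scores.sort_values(ascending=False).head(top_n)

print(recommend("user1"))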
OUTPUT:
CONCLUSION:
EXPERIMENT NO. - 09
Aim of the Experiment :- Exploratory data analysis using Apache Spark and Pandas
Lab Outcome :-
Implementation (5) | Understanding (5) | Punctuality & Discipline (5) | Total (15)
__________________________
Practical Incharge
Experiment No-9
THEORY:
Exploratory Data Analysis (EDA) is the first step in the data analysis process; the approach was
developed by John Tukey in the 1970s. In statistics, exploratory data analysis is an approach to
analyzing data sets to summarize their main characteristics, often with visual methods.
For example, suppose you are planning a trip to location "X". Before taking a decision, you will
explore what places, waterfalls, treks, beaches and restaurants the location has on Google,
Instagram, Facebook and other social websites.
Similarly, when you are trying to build a machine learning model you need to be pretty sure
whether your data makes sense or not. The main aim of exploratory data analysis is to obtain
confidence in your data to the extent that you are ready to apply a machine learning algorithm to it.
Exploratory Data Analysis is a crucial step before jumping to machine learning or modeling of
the data. By doing this you can get to know whether the selected features are good enough to
model, are all the features required, are there any correlations based on which we can either go
back to the Data Pre-processing step or move on to modeling.
Once Exploratory Data Analysis is complete, its features can be used for supervised and
unsupervised machine learning modeling.
In every machine learning workflow, the last step is reporting or providing the insights to the
stakeholders. By completing exploratory data analysis, many plots, heat maps, frequency
distributions, graphs and correlation matrices can be produced, along with the hypothesis, from
which any individual can understand what the data is all about and what insights can be drawn
from exploring the data set.
In the trip example, all the exploration of the selected place is done first; based on it we gain the
confidence to plan the trip, and we can even share the insights we got regarding the place with
our friends so that they can also join.
a) Description of data:
We need to know the different kinds of data and other statistics of our data before we can move
on to the other steps. A good way to start is the describe() function in Python. In Pandas, we
can apply describe() to a DataFrame, which generates descriptive statistics that summarize the
central tendency, dispersion, and shape of a dataset's distribution, excluding NaN values.
The result’s index will include count, mean, std, min, max as well as lower, 50 and upper
percentiles. By default, the lower percentile is 25 and the upper percentile is 75. The 50
percentile is the same as the median.
import pandas as pd
from sklearn.datasets import load_boston

# Note: load_boston was removed in scikit-learn 1.2, so this snippet requires an
# older scikit-learn version (< 1.2).
boston = load_boston()
x = boston.data
y = boston.target
columns = boston.feature_names

# creating the dataframe
boston_df = pd.DataFrame(boston.data)
boston_df.columns = columns
boston_df.describe()
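A minimal sketch of a missing-value check on the same DataFrame, assuming isnull() is the intended approach for the statement that follows:

# Count missing values per column; all zeros means there are no nulls in the dataset
print(boston_df.isnull().sum())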
The above code indicates that there are no null values in our data set.
c) Handling outliers:
An outlier is something which is separate or different from the crowd. Outliers can be a result of
a mistake during data collection or it can be just an indication of variance in your data. Some of
the methods for detecting and handling outliers:
BoxPlot
Scatterplot
Z-score
IQR(Inter-Quartile Range)
BoxPlot:
A box plot is a method for graphically depicting groups of numerical data through their quartiles.
The box extends from the Q1 to Q3 quartile values of the data, with a line at the median (Q2).
The whiskers extend from the edges of the box to show the range of the data. Outlier points are
those past the end of the whiskers. Boxplots show robust measures of location and spread as well
as providing information about symmetry and outliers.
import seaborn as sns
import matplotlib.pyplot as plt

# Box plot of the DIS column to visualize potential outliers
sns.boxplot(x=boston_df['DIS'])
plt.show()
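The IQR and Z-score methods listed above can be applied to the same column as in the following sketch, continuing with the boston_df DataFrame created earlier:

import numpy as np
from scipy import stats

# IQR method: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
Q1 = boston_df["DIS"].quantile(0.25)
Q3 = boston_df["DIS"].quantile(0.75)
IQR = Q3 - Q1
iqr_outliers = boston_df[(boston_df["DIS"] < Q1 - 1.5 * IQR) |
                         (boston_df["DIS"] > Q3 + 1.5 * IQR)]
print("IQR outliers:", len(iqr_outliers))

# Z-score method: flag points more than 3 standard deviations from the mean
z = np.abs(stats.zscore(boston_df["DIS"]))
print("Z-score outliers:", int((z > 3).sum()))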
OUTPUT:
CONCLUSION:
EXPERIMENT NO. - 10
Aim of the Experiment :- Batch and Streamed Data Analysis using Spark.
Lab Outcome :-
Implementation (5) | Understanding (5) | Punctuality & Discipline (5) | Total (15)
__________________________
Practical Incharge
EXPERIMENT NO. - 10
AIM :- Batch and Streamed Data Analysis using Spark.
THEORY:
Datasets are becoming huge. In fact, data is growing faster than processing speeds.
Therefore, algorithms involving large data and a high amount of computation are often
run on a distributed computing system. A distributed computing system involves
nodes (networked computers) that run processes in parallel and communicate (if
necessary).
MapReduce – The programming model that is used for Distributed computing is
known as MapReduce. The MapReduce model involves two stages, Map and
Reduce.
1. Map – The mapper processes each line of the input data (it is in the form of a
file), and produces key – value pairs.
Input data → Mapper → list([key, value])
2. Reduce – The reducer processes the list of key – value pairs (after the
Mapper’s function). It outputs a new set of key – value pairs.
list([key, value]) → Reducer → list([key, list(values)])
Spark – Spark (an open-source big-data processing engine by Apache) is a cluster
computing system. It is faster compared to other cluster computing systems (such
as Hadoop). It provides high-level APIs in Python, Scala, and Java. Parallel jobs are
easy to write in Spark. We will cover PySpark (Python + Apache Spark), because
this makes the learning curve flatter. Installing Spark on a Linux system and running
it on a multi-node cluster are described in the Spark documentation. We will see how to
create RDDs (the fundamental data structure of Spark).
RDDs (Resilient Distributed Datasets) – RDDs are immutable collections of
objects. Since we are using PySpark, these objects can be of multiple types. This
will become clearer below.
SparkContext – For creating a standalone application in Spark, we first define a
SparkContext, as sketched below.
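A minimal sketch of defining a SparkContext for a standalone PySpark application, assuming PySpark is installed; the application name is arbitrary.

from pyspark import SparkConf, SparkContext

# Configure and create a SparkContext for a local standalone application
conf = SparkConf().setAppName("DegreeCount").setMaster("local[*]")
sc = SparkContext(conf=conf)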
Steps:
1. Our text file is in the following format (each line represents an edge of a
directed graph):
1 2
1 3
2 3
3 4
...
2. Large datasets may contain millions of nodes and edges.
3. The first few lines set up the SparkContext. We create an RDD lines from the input text file.
4. Then we transform the lines RDD into an edges RDD. The function conv acts on
each line, and key-value pairs of the form (1, 2), (1, 3), (2, 3), (3, 4), … are stored
in the edges RDD.
5. After this, reduceByKey aggregates all the pairs corresponding to a particular key,
and the numNeighbours function is used for generating each vertex's degree in a
separate RDD Adj_list, which has the form (1, 2), (2, 1), (3, 1), … (see the sketch below).
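A hedged sketch of steps 3–5, assuming the edge list is stored in a text file named edges.txt and with assumed implementations of conv and numNeighbours (the originals were not shown):

from pyspark import SparkConf, SparkContext

# Reuse the SparkContext from the sketch above if it exists, otherwise create one
conf = SparkConf().setAppName("DegreeCount").setMaster("local[*]")
sc = SparkContext.getOrCreate(conf)

def conv(line):
    # Assumed implementation: turn a line such as "1 2" into the pair (1, 2)
    src, dst = line.split()
    return (int(src), int(dst))

def numNeighbours(a, b):
    # Assumed implementation: used with reduceByKey after mapping each edge to
    # (vertex, 1); summing the ones gives each vertex's out-degree
    return a + b

lines = sc.textFile("edges.txt")                  # RDD of raw lines
edges = lines.map(conv)                           # RDD of (source, destination) pairs
Adj_list = edges.map(lambda e: (e[0], 1)) \
                .reduceByKey(numNeighbours)       # RDD of (vertex, degree) pairs

print(Adj_list.collect())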
OUTPUT:
CONCLUSION:
EXPERIMENT NO. - 11
Aim :- Implementation of a mini project based on a case study taken from a given dataset, using
data science and machine learning.
Each group has to select a problem on which the ML project is based. Attach the same here.
The following steps should be outlined.
Lab Outcome :-
Implementation (5) | Understanding (5) | Punctuality & Discipline (5) | Total (15)
____________________________
Practical In charge
EXPERIMENT NO. - 11
AIM: Implementation of a mini project based on a case study taken from a given dataset, using
data science and machine learning.
Each group has to select a problem on which the ML project is based. Attach the same here.
The following steps should be outlined.
PROJECT DETAILS:
CONCLUSION: