Data Science

The document outlines a syllabus for an Introduction to Data Science course, covering topics such as the data science life cycle, tools for data science, artificial intelligence (AI), and machine learning (ML) algorithms. It includes sections on probability theory, SQL commands, and various machine learning techniques like supervised, unsupervised, and reinforcement learning. Additionally, it discusses feature selection methods and provides examples of applications in data science.

Uploaded by

thecarr2006


School of Computer Science & Engineering

UNIT 2
Introduction to Data Science
Syllabus

• What is Data Science? Applications of Data Science, data science life cycle, tools for data science, definition of AI, types of machine learning (ML), list of ML algorithms for classification, clustering, and feature selection. Probability theory, Bayes' theorem, Bayes probability; Cartesian plane, equations of lines, graphs; exponents.

• Introduction to SQL: experimental demonstrations of SQL commands (DDL, DML, DCL, TCL, DQL). Import SQL database data into Excel.

Outline:
Review of Previous Lecture
Topic for the day
Objective and Outcome of Lecture
Lecture Discussion: Data Science, Probability Theorem
Examples

Review of Previous Lecture:
Creating an Excel sheet
Uploading CSV and XML data into Excel

Topic for the Lecture:
What is data science
Probability Theorem
Bayes' Theorem

Objective and Outcome of Lecture:
Lecture Objective
• To understand data science
• Probability theory
• Bayes' theorem
• Cartesian plane
• Equations of lines, graphs, exponents
• SQL

Data Science:
Data science is the field of study that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data.
Data science is the study of data.
It involves developing methods of recording, storing, and analyzing data to effectively extract useful information.

The data science life cycle:
• The business requirement step identifies the problem and the objectives.
• The data acquisition step finds the sources of data, then collects and stores the data.
• The data processing step transforms the data into a form better suited to finding the required information.
• The data exploration step is a brainstorming step in which patterns in the data are identified.
• The data modeling step builds data models and trains them on the data sets.
• The deployment step puts the model to work in the business environment.

Tools for data science
Explanation
• 1. SAS (Statistical Analysis System): a data science tool designed specifically for statistical operations. SAS is closed-source proprietary software used by large organizations to analyze data.
• 2. Apache Spark: Apache Spark, or simply Spark, is an analytics engine and one of the most widely used data science tools. Spark is designed to handle both batch processing and stream processing.
• 3. BigML: provides a fully interactive, cloud-based GUI environment that you can use for running machine learning algorithms.
• 4. D3.js: JavaScript is mainly used as a client-side scripting language. D3.js, a JavaScript library, lets you build interactive visualizations in your web browser.
• 5. MATLAB: a multi-paradigm numerical computing environment for processing mathematical information. It is closed-source software that facilitates matrix functions, algorithmic implementation and statistical modeling of data, and it is widely used in several scientific disciplines. In data science, MATLAB is used for simulating neural networks and fuzzy logic. Using the MATLAB graphics library, you can create powerful visualizations.
• 6. Excel: probably the most widely used data analysis tool. Microsoft developed Excel mostly for spreadsheet calculations, and today it is widely used for data processing, visualization, and complex calculations.
• 7. ggplot2: an advanced data visualization package for the R programming language. Its developers created it to replace the native graphics package of R, and it uses powerful commands to create rich visualizations. It is one of the libraries data scientists use most for creating visualizations from analyzed data.
• 8. Tableau: data visualization software packed with powerful graphics for making interactive visualizations. It is aimed at industries working in the field of business intelligence. The most important aspect of Tableau is its ability to interface with databases, spreadsheets, OLAP (Online Analytical Processing) cubes, etc. Tableau can also visualize geographical data and plot longitudes and latitudes on maps.
• 9. Jupyter: Project Jupyter is an open-source tool based on IPython that helps developers build open-source software and work with interactive computing. Jupyter supports multiple languages, such as Julia, Python, and R. It is a web application used for writing live code, visualizations, and presentations, and it is widely popular for data science work.
• 10. Matplotlib: a plotting and visualization library developed for Python. It is one of the most popular tools for generating graphs from analyzed data, and it can plot complex graphs with a few simple lines of code, including bar plots, histograms, scatterplots, etc. Matplotlib has several essential modules; one of the most widely used is pyplot, which offers a MATLAB-like interface. Pyplot is also an open-source alternative to MATLAB's graphics modules.
• 13. TensorFlow: has become a standard tool for machine learning and is widely used for advanced techniques such as deep learning. Its developers named TensorFlow after tensors, which are multidimensional arrays. It is an open-source, actively evolving toolkit known for its performance and high computational abilities.
Artificial intelligence (AI)
• Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to the natural intelligence displayed by humans and animals, which involves consciousness and emotionality.
• Artificial intelligence (AI) is the ability of a digital computer or computer-controlled robot to perform tasks commonly associated with intelligent beings.
• Artificial intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions.

Machine Learning
“Machine learning enables a machine to automatically learn from data, improve performance from experiences, and predict things without being explicitly programmed.”

Key differences between AI and ML
Types of machine learning (ML)

Supervised learning
• Supervised learning, as the name indicates, involves a supervisor acting as a teacher.
• In supervised learning we teach or train the machine using well-labeled data, that is, data already tagged with the correct answer.
• After that, the machine is given a new set of examples (data); the supervised learning algorithm uses what it learned from the training data (the set of training examples) to produce an outcome for the new examples.

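As a rough illustration of learning from labeled data, the sketch below uses a one-nearest-neighbour rule: a new example receives the label of the closest training example. The animal measurements and the function name are made up for this sketch and do not come from the slides.

```python
import math


def nearest_neighbour_predict(training_data, new_point):
    """Return the label of the training example closest to new_point."""
    _, closest_label = min(
        training_data,
        key=lambda example: math.dist(example[0], new_point),
    )
    return closest_label


# Hypothetical labeled data: (height_cm, weight_kg) tagged with the answer.
training_data = [
    ((25.0, 4.0), "cat"),
    ((23.0, 3.5), "cat"),
    ((60.0, 25.0), "dog"),
    ((55.0, 22.0), "dog"),
]

# New, unlabeled examples are classified from the labeled ones.
print(nearest_neighbour_predict(training_data, (24.0, 4.2)))   # cat
print(nearest_neighbour_predict(training_data, (58.0, 24.0)))  # dog
```

The "training" here is just storing the labeled examples; most supervised algorithms fit parameters instead, but the input (labeled data) and the output (a prediction for new data) have the same shape.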
Unsupervised learning
• Unsupervised learning is the training of a machine using information that is neither classified nor labeled, allowing the algorithm to act on that information without guidance.
• Here the task of the machine is to group unsorted information according to similarities, patterns and differences without any prior training on the data.
• Unlike supervised learning, no teacher is provided, which means the machine is given no labeled training examples.
• The machine therefore has to find the hidden structure in unlabeled data by itself.

Semi-supervised learning & Reinforcement learning
• Semi-supervised learning sits between supervised and unsupervised learning.
• It uses both labelled and unlabelled data for training.
• Reinforcement learning trains an algorithm with a reward system, providing feedback when an artificial intelligence agent performs the best action in a particular situation.
• In reinforcement learning, AI agents attempt to find the optimal way to accomplish a particular goal or to improve performance on a specific task.
• When the agent takes an action that moves it toward the goal, it receives a reward.
Examples / Applications
Difference

Regression
• Regression analysis is a statistical method for modelling the relationship between a dependent (target) variable and one or more independent (predictor) variables.
• Regression is a process of finding the relationship between dependent and independent variables.
• It helps in predicting continuous variables, such as market trends, house prices, etc.

ML Regression Algorithms
• Simple Linear Regression
• Multiple Linear Regression
• Polynomial Regression
• Support Vector Regression
• Decision Tree Regression
• Random Forest Regression

Classification
• A classification algorithm is a supervised learning technique used to identify the category of new observations on the basis of training data.
• In classification, a program learns from the given dataset or observations and then assigns each new observation to one of a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc.

ML Classification Algorithms
• Logistic Regression
• K-Nearest Neighbours
• Support Vector Machines
• Kernel SVM
• Naïve Bayes
• Decision Tree Classification
• Random Forest Classification

Difference between Regression and Classification

Clustering
• A cluster is a group of similar data points.
• Clustering, or cluster analysis, is a machine learning technique that groups an unlabelled dataset.

Clustering Algorithms
• K-Means algorithm
• Agglomerative Hierarchical algorithm
• Mean-shift algorithm
• DBSCAN algorithm (Density-Based Spatial Clustering of Applications with Noise)
• Expectation-Maximization (EM) clustering using GMM (Gaussian Mixture Model)

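To make the first algorithm in the list concrete, here is a minimal sketch of K-Means on one-dimensional data. The data values, the starting centroids and the fixed iteration count are assumptions chosen for illustration; real implementations usually pick starting centroids randomly and stop when the assignments no longer change.

```python
def k_means_1d(values, centroids, iterations=10):
    """Cluster 1-D values around the given starting centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(
                range(len(centroids)),
                key=lambda i: abs(value - centroids[i]),
            )
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical unlabelled data with two obvious groups.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(values, centroids=[0.0, 5.0])
print(centroids)  # [2.0, 11.0]
print(clusters)   # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```

No labels are supplied anywhere: the grouping comes only from the distances between the values.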
Association Rule
• Association rule learning is a type of unsupervised learning technique that checks for the dependency of one data item on another and maps the items accordingly so that the relationship can be used profitably. It tries to find interesting relations or associations among the variables of a dataset, and it relies on different rules to discover them.
• Association rule learning is one of the important concepts of machine learning, and it is employed in market basket analysis, web usage mining, continuous production, etc.

Feature selection
• In machine learning and statistics, feature selection is also known as variable selection, attribute selection or variable subset selection.
• It is the process of selecting a subset of relevant features (variables, predictors) for use in model construction.
• When the number of features is very large, you need not use every feature at your disposal for creating an algorithm.
• You can assist your algorithm by feeding in only those features that are really important.
Feature selection
• Machine learning works on a simple rule: if you put garbage in, you will only get garbage out (garbage here meaning noise). “Sometimes, less is better!”
Top reasons to use feature selection are:
• It enables the machine learning algorithm to train faster.
• It reduces the complexity of a model and makes it easier to interpret.
• It improves the accuracy of a model if the right subset is chosen.
• It reduces overfitting.
ML Feature selection Algorithms
Filter Methods: Filter methods are feature selection methods that select features based on some criterion before the model is built.

• Pearson’s Correlation
• Linear Discriminant Analysis (LDA)
• ANOVA (Analysis of variance)
• Chi-Square

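A filter method based on Pearson's correlation can be sketched as follows: score each feature against the target before any model is built, and keep only the features whose absolute correlation reaches a threshold. The feature names, the data and the threshold of 0.5 are hypothetical choices for this sketch.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def filter_features(features, target, threshold):
    """Keep features whose |correlation| with the target meets the threshold."""
    return [
        name
        for name, column in features.items()
        if abs(pearson_correlation(column, target)) >= threshold
    ]


# Hypothetical data: house price against two candidate features.
features = {
    "area": [50, 60, 80, 100, 120],
    "door_number": [7, 3, 9, 1, 5],
}
price = [100, 120, 160, 200, 240]

print(filter_features(features, price, threshold=0.5))  # ['area']
```

Because no model is trained during the selection, filter methods are cheap to run, which is the main contrast with wrapper methods.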
Wrapper Methods

• The wrapper method has the same goal as the filter method, but it uses a machine learning model for its evaluation. In this method, a subset of features is fed to the ML model and the model's performance is evaluated. The performance decides whether to add or remove features in order to increase the accuracy of the model. This method is usually more accurate than the filter method but more complex to carry out.

• Forward Selection
• Backward Elimination
• Recursive Feature Elimination
Embedded Methods
Embedded methods evaluate the importance of each feature across the training iterations of the machine learning model.
• Decision Tree
• ID3
• C4.5
• Classification And Regression Tree (CART)

Linear regression

y = mx + c + ε
• y = dependent variable (target variable)
• x = independent variable (predictor variable)
• c = y-intercept of the line
• m = slope
• ε = error

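One common way to estimate m and c from data is ordinary least squares, which picks the line that minimises the sum of squared errors ε. The sketch below is one such estimate under that assumption; the five data points are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope m and intercept c for y = m*x + c."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    c = mean_y - m * mean_x
    return m, c


# Hypothetical observations that lie roughly on a line.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

m, c = fit_line(xs, ys)
# The error term for each observation is whatever the line does not explain.
errors = [y - (m * x + c) for x, y in zip(xs, ys)]

print(round(m, 2), round(c, 2))  # 1.97 1.09
```

With least squares the errors always sum to (approximately) zero, which is a quick sanity check on the fit.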
Probability theory

Probability means possibility. Probability theory is the branch of mathematics that deals with the occurrence of random events, that is, with how likely events are to happen.

A probability is expressed as a value from 0 to 1, where 0 means the event is impossible and 1 means the event is certain.
Probability was introduced in mathematics to predict how likely events are to happen.

This basic probability theory is also used in probability distributions, where you learn the possible outcomes of a random experiment and how likely each one is.

To find the probability of a single event occurring, we first need to know the total number of possible outcomes.
Formula for Probability
When all outcomes are equally likely, the probability of an event is the ratio of the number of favourable outcomes to the total number of outcomes:

P(E) = Number of favourable outcomes / Total number of outcomes

This is the basic formula. There are further formulas for different situations or events.

• For example,
When we toss a coin, we get either Head or Tail, so there are only two possible outcomes (H, T).

But if we toss two coins, there are four equally likely outcomes: (H, H), (H, T), (T, H), (T, T). If the order of the coins is ignored, these fall into three events: both coins show heads, both show tails, or one shows heads and one tails. The three events are not equally likely, because "one head and one tail" can happen in two ways.
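The two-coin case can be checked by listing the sample space. The sketch below, using only the Python standard library, enumerates every ordered outcome and then groups the outcomes by the number of heads.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every ordered outcome of tossing two coins.
outcomes = list(product("HT", repeat=2))
print(outcomes)  # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]

# Group the outcomes by how many heads they contain.
heads_count = Counter(outcome.count("H") for outcome in outcomes)
probabilities = {
    heads: Fraction(count, len(outcomes))
    for heads, count in heads_count.items()
}

print(probabilities[2])  # 1/4  (both heads)
print(probabilities[1])  # 1/2  (one head, one tail)
print(probabilities[0])  # 1/4  (both tails)
```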
Problems on Probability
1) There are 6 pillows on a bed: 3 are red, 2 are yellow and 1 is blue. What is the probability of picking a yellow pillow?

Solution:

The probability is equal to the number of yellow pillows on the bed divided by the total number of pillows, i.e. 2/6 = 1/3.
Problems on Probability
2) There is a container full of coloured bottles: red, blue, green and orange. A bottle is picked out and then replaced. Sumit did this 1000 times and got the following results:

No. of blue bottles picked out: 300
No. of red bottles: 200
No. of green bottles: 450
No. of orange bottles: 50

a) What is the probability that Sumit will pick a green bottle?

Ans: For every 1000 bottles picked out, 450 are green.
Therefore, P(green) = 450/1000 = 0.45

b) If there are 100 bottles in the container, how many of them are likely to be green?
Ans: The experiment suggests that 450 out of 1000 bottles are green.
Therefore, out of 100 bottles, about 45 are likely to be green.
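The bottle problem estimates probabilities from observed counts (relative frequencies) rather than from a known sample space. A small sketch of that calculation, using the counts given in the problem:

```python
from fractions import Fraction

# Observed counts from the 1000 draws in the problem.
counts = {"blue": 300, "red": 200, "green": 450, "orange": 50}
total_draws = sum(counts.values())

# Relative frequency of each colour, used as the estimated probability.
estimated_probability = {
    colour: Fraction(count, total_draws) for colour, count in counts.items()
}

print(float(estimated_probability["green"]))  # 0.45

# Expected number of green bottles in a container of 100.
expected_green = estimated_probability["green"] * 100
print(expected_green)  # 45
```

The estimated probabilities sum to 1, as the probabilities of all possible colours must.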
Probability Terms and Definition

Question 2: Two dice are rolled. Find the probability that the sum is:
1) equal to 1
2) equal to 4
3) less than 13
Solution:

First we determine the sample space S of two dice, as shown below.
S = { (1,1),(1,2),(1,3),(1,4),(1,5),(1,6)
(2,1),(2,2),(2,3),(2,4),(2,5),(2,6)
(3,1),(3,2),(3,3),(3,4),(3,5),(3,6)
(4,1),(4,2),(4,3),(4,4),(4,5),(4,6)
(5,1),(5,2),(5,3),(5,4),(5,5),(5,6)
(6,1),(6,2),(6,3),(6,4),(6,5),(6,6) }
So, n(S) = 36
1) Let E be the event “sum equal to 1”. Since no outcome has a sum equal to 1,
P(E) = n(E) / n(S) = 0 / 36 = 0
2) Let A be the event that the sum of the numbers on the dice is equal to 4.
Three possible outcomes give a sum equal to 4:
A = {(1,3),(2,2),(3,1)}
n(A) = 3
Hence, P(A) = n(A) / n(S) = 3 / 36 = 1 / 12
3) Let B be the event that the sum of the numbers on the dice is less than 13.
Every outcome in the sample space gives a sum less than 13, for example (1,1), (1,6), (2,6) or (6,6). The largest possible sum occurs when both dice show 6, i.e. (6,6), which gives 12.
Thus, n(B) = 36
Hence,
P(B) = n(B) / n(S) = 36 / 36 = 1
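The three answers can be verified by enumerating the 36 outcomes and counting the favourable ones. The helper name `probability` is an illustrative choice for this sketch.

```python
from fractions import Fraction
from itertools import product

# Sample space S: every ordered pair of faces from two dice.
sample_space = list(product(range(1, 7), repeat=2))


def probability(event):
    """n(E) / n(S), where event is a test applied to each outcome."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))


print(len(sample_space))                          # 36
print(probability(lambda o: sum(o) == 1))         # 0
print(probability(lambda o: sum(o) == 4))         # 1/12
print(probability(lambda o: sum(o) < 13))         # 1
```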
Applications of Probability

Probability has a wide variety of applications in real life. We use it in everyday life when judging the results of events such as:

Choosing a card from a deck of cards
Flipping a coin
Rolling a die
Pulling a red ball out of a bucket of red and white balls
Winning a lucky draw

Major Applications of Probability

Risk assessment and modelling in various industries
Weather forecasting, or predicting changes in the weather
Estimating the probability of a team winning in a sport, based on the players and the strength of the team
In the share market, estimating the chances of a rise in share prices

Basics of Probability

Example 1: Given 10 marbles: 2 red, 3 green, 5 blue.

• Find: the probability of selecting a green marble.
• Solution: P(G) = 3/10 = 0.30

Two Basic Rules

1. Addition Rule
- Mutually exclusive: events A and B are mutually exclusive when they cannot occur together, i.e. P(A and B) = 0. In that case the probability of A or B is the sum of the two probabilities, i.e. P(A or B) = P(A) + P(B), also written P(A ∪ B) = P(A) + P(B).
- Non mutually exclusive: when A and B are not mutually exclusive, the probability of A or B is the sum of the two probabilities minus the probability of A and B, i.e. P(A or B) = P(A) + P(B) - P(A and B), also written P(A ∪ B) = P(A) + P(B) - P(A ∩ B).

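Both forms of the addition rule can be checked on a single die roll, treating events as sets of outcomes. The choice of events (even, odd, greater than 3) is an assumption made for this sketch.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}


def p(event):
    """Probability of an event given as a set of equally likely outcomes."""
    return Fraction(len(event), len(sample_space))


even = {2, 4, 6}
odd = {1, 3, 5}
greater_than_3 = {4, 5, 6}

# Mutually exclusive: even and odd cannot happen together.
assert p(even & odd) == 0
assert p(even | odd) == p(even) + p(odd)

# Not mutually exclusive: 4 and 6 are both even and greater than 3,
# so the overlap must be subtracted once.
union = p(even | greater_than_3)
by_rule = p(even) + p(greater_than_3) - p(even & greater_than_3)
print(union, by_rule)  # 2/3 2/3
```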
2. Multiplication Rule

The multiplication rule of probability applies whenever an event is the intersection of two other events, that is, when events A and B need to occur simultaneously.

The set A ∩ B denotes the simultaneous occurrence of events A and B, that is, the set in which both event A and event B have occurred.

The probability of A ∩ B is obtained from the definition of conditional probability: P(A ∩ B) = P(A) P(B | A).

If A and B are independent, then P(B | A) = P(B) and the rule reduces to P(A and B) = P(A) ⋅ P(B).

Multiplication Rule of Probability for Dependent Events

If the outcome of one event affects the outcome of the other, the events are referred to as dependent events.

Here the occurrence of the first event changes the probability of the second event. From the theorem, we have P(A ∩ B) = P(A) P(B | A), where A and B are dependent events.
Dependent Event (Conditional Probability)
• The conditional probability of an event B in relationship to an event A is the probability that event B occurs given that event A has already occurred. It is written P(B | A) and, for P(A) > 0, equals P(A and B) / P(A).

Problem 1:
• A math teacher gave her class two tests. 25% of the class passed both tests and 42% of the class passed the first test. What percent of those who passed the first test also passed the second test?
Answer:
P(Second | First) = P(First and Second) / P(First)
= 0.25 / 0.42 ≈ 0.60
= 60%

Problem 2:
• A jar contains black and white marbles. Two marbles are chosen without replacement. The probability of selecting a black marble and then a white marble is 0.34, and the probability of selecting a black marble on the first draw is 0.47. What is the probability of selecting a white marble on the second draw, given that the first marble drawn was black?
• Answer:
• P(White | Black) = P(Black and White) / P(Black)
= 0.34 / 0.47
≈ 0.72
= 72%

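Problems 1 and 2 use the same calculation, P(B | A) = P(A and B) / P(A). A short sketch with the numbers from both problems (the function name is an illustrative choice):

```python
def conditional_probability(p_a_and_b, p_a):
    """P(B | A) = P(A and B) / P(A); only defined when P(A) > 0."""
    if p_a <= 0:
        raise ValueError("P(A) must be positive")
    return p_a_and_b / p_a


# Problem 1: passed both tests, given passed the first test.
print(round(conditional_probability(0.25, 0.42), 2))  # 0.6

# Problem 2: white on the second draw, given black on the first.
print(round(conditional_probability(0.34, 0.47), 2))  # 0.72
```

The unrounded values are about 0.595 and 0.723, so the 60% and 72% in the answers are roundings.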
Bayes Theorem
• Bayes' theorem (alternatively Bayes' law or Bayes' rule), named after Reverend Thomas Bayes, describes the probability of an event based on prior knowledge of conditions that might be related to the event.

• In other words, Bayes' theorem gives the probability of an event given that some related condition holds, so it is a statement about conditional probability.

• For example, suppose we have three bags, each containing balls of three colours (red, blue and black), and we want the probability of drawing a blue ball from the second bag. A probability like this, calculated for an event subject to other conditions, is known as a conditional probability.
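Statement of theorem

Let E1, E2, ..., En be mutually exclusive events that together cover the sample space, and let A be an event with P(A) > 0. Then
P(Ei|A) = P(Ei)P(A|Ei) / [P(E1)P(A|E1) + P(E2)P(A|E2) + ... + P(En)P(A|En)]
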
Example 1:
• Bag I contains 4 white and 6 black balls, while Bag II contains 4 white and 3 black balls. One ball is drawn at random from one of the bags, and it is found to be black. Find the probability that it was drawn from Bag I.
• Solution:
• Let E1 be the event of choosing Bag I, E2 the event of choosing Bag II, and A the event of drawing a black ball.
• Then, P(E1) = P(E2) = 1/2
• Also, P(A|E1) = P(drawing a black ball from Bag I) = 6/10 = 3/5
• P(A|E2) = P(drawing a black ball from Bag II) = 3/7
• By Bayes’ theorem, the probability that the black ball was drawn from Bag I is
• P(E1|A) = P(E1)P(A|E1) / [P(E1)P(A|E1) + P(E2)P(A|E2)]
• = (1/2 × 3/5) / (1/2 × 3/5 + 1/2 × 3/7) = 7/12

Assignment Example 2:
• A man is known to speak the truth 2 out of 3 times. He throws a die and reports that the number obtained is a four. Find the probability that the number obtained is actually a four.
• Solution:
• Let A be the event that the man reports that a four is obtained.
• Let E1 be the event that a four is obtained and E2 be its complementary event.
• Then, P(E1) = Probability that a four occurs = 1/6
• P(E2) = Probability that a four does not occur = 1 - P(E1) = 1 - 1/6 = 5/6
• Also, P(A|E1) = Probability that the man reports a four given that it is actually a four = 2/3
• P(A|E2) = Probability that the man reports a four given that it is not a four = 1/3
• By Bayes’ theorem, the probability that the number obtained is actually a four is
• P(E1|A) = P(E1)P(A|E1) / [P(E1)P(A|E1) + P(E2)P(A|E2)] = (1/6 × 2/3) / (1/6 × 2/3 + 5/6 × 1/3) = 2/7
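The bag example and the die example follow the same pattern: multiply each prior P(Ei) by its likelihood P(A|Ei), then divide the term of interest by the sum of all terms. A minimal sketch with exact fractions (the function name `posterior` is an illustrative choice):

```python
from fractions import Fraction


def posterior(priors, likelihoods, index):
    """P(E_index | A) from priors P(E_i) and likelihoods P(A | E_i)."""
    # Total probability of A across all the events E_i.
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    return priors[index] * likelihoods[index] / evidence


# Example 1: two bags, black ball drawn; probability it came from Bag I.
bag_answer = posterior(
    priors=[Fraction(1, 2), Fraction(1, 2)],
    likelihoods=[Fraction(3, 5), Fraction(3, 7)],
    index=0,
)
print(bag_answer)  # 7/12

# Assignment Example 2: man reports a four; probability it really is a four.
die_answer = posterior(
    priors=[Fraction(1, 6), Fraction(5, 6)],
    likelihoods=[Fraction(2, 3), Fraction(1, 3)],
    index=0,
)
print(die_answer)  # 2/7
```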
Problem on Bayes Theorem
1. In a bolt factory, machines A, B and C manufacture 25%, 35% and 40% of the total output respectively. Of their outputs, 5%, 4% and 2% respectively are defective. A bolt is drawn at random and is found to be defective. What is the probability that it was manufactured by machine C?

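The slide's worked solution for the bolt factory problem is not included in this text. One way to approach it, offered here as a sketch and not necessarily the method the slides used, is to count bolts in a hypothetical batch (the batch size of 10,000 is an arbitrary assumption) and then take the share of defective bolts that came from machine C.

```python
from fractions import Fraction

total_bolts = 10_000  # hypothetical batch size, chosen for round numbers

# Share of output and defect rate per machine, from the problem statement.
share = {"A": Fraction(25, 100), "B": Fraction(35, 100), "C": Fraction(40, 100)}
defect_rate = {"A": Fraction(5, 100), "B": Fraction(4, 100), "C": Fraction(2, 100)}

# Number of defective bolts each machine contributes to the batch.
defective = {
    machine: total_bolts * share[machine] * defect_rate[machine]
    for machine in share
}
print({machine: int(count) for machine, count in defective.items()})
# {'A': 125, 'B': 140, 'C': 80}

# Among all defective bolts, the fraction that came from machine C.
p_c_given_defective = defective["C"] / sum(defective.values())
print(p_c_given_defective)                   # 16/69
print(round(float(p_c_given_defective), 3))  # 0.232
```

The batch size cancels out in the final ratio, so the result matches what Bayes' theorem gives directly: (0.40 × 0.02) / (0.25 × 0.05 + 0.35 × 0.04 + 0.40 × 0.02).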
Assignment
Problem on Bayes Theorem
2. An insurance company insured 2000 scooter drivers, 4000 car drivers and 6000 truck drivers. The probabilities of an accident involving a scooter, a car and a truck are 0.01, 0.03 and 0.15 respectively. One of the insured persons meets with an accident. What is the probability that he is a scooter driver?

Cartesian Plane
The Cartesian plane is a two-dimensional coordinate plane formed by the intersection of two perpendicular lines. The horizontal line is known as the X-axis, and the vertical line is known as the Y-axis. The coordinate point (x, y) on the Cartesian plane says that the horizontal distance of the point from the origin is |x| and the vertical distance is |y|. If x is positive, the point is to the right of the origin; if x is negative, it is to the left. Similarly, if y is positive, the point is |y| units above the origin; if y is negative, it is |y| units below it.

Equation of lines:

The equation of a line is typically written as y = mx + b, where m is the slope and b is the y-intercept.

This lets us find the straight line through any two (x, y) points in a two-dimensional graph, provided the points do not share the same x value (a vertical line has no defined slope).
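Given two points, the slope is the change in y divided by the change in x, and the intercept follows by substituting one point into y = mx + b. A short sketch (the function name and the sample points are illustrative):

```python
def line_through(p1, p2):
    """Return (m, b) for the line y = m*x + b through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # A vertical line cannot be written as y = m*x + b.
        raise ValueError("vertical line: slope is undefined")
    m = (y2 - y1) / (x2 - x1)
    b = y1 - m * x1
    return m, b


m, b = line_through((1, 3), (3, 7))
print(m, b)  # 2.0 1.0

# Check: both points satisfy y = m*x + b.
print(m * 1 + b, m * 3 + b)  # 3.0 7.0
```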
Graphs:

The data science and analytics field also uses graphs to model various structures and problems.

As a data scientist, you should be able to solve problems efficiently, and graphs provide a mechanism to do that.

Graphs are mathematical structures used to study pairwise relationships between objects and entities.
Graphs:

A Graph is a pair of sets, G = (V, E). V is the set of vertices. E is the set of edges. E is made up of unordered pairs of elements from V.

A DiGraph is also a pair of sets, D = (V, A). V is the set of vertices. A is the set of arcs. A is made up of ordered pairs of elements from V.

In the case of digraphs, there is a distinction between `(u,v)` and `(v,u)`. The edges are usually called arcs in such cases to indicate a notion of direction.
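The difference between unordered edges and ordered arcs maps directly onto Python data types: a `frozenset` ignores order, a tuple keeps it. The vertex names below are placeholders for this sketch.

```python
# Graph G = (V, E): edges are unordered pairs, so use frozensets.
vertices = {"u", "v", "w"}
edges = {frozenset({"u", "v"}), frozenset({"v", "w"})}

# DiGraph D = (V, A): arcs are ordered pairs, so use tuples.
arcs = {("u", "v"), ("v", "w")}

# In a graph, {u, v} and {v, u} are the same edge.
print(frozenset({"v", "u"}) in edges)  # True

# In a digraph, (u, v) and (v, u) are different arcs.
print(("u", "v") in arcs)  # True
print(("v", "u") in arcs)  # False


def neighbours(vertex):
    """Vertices joined to the given vertex by an edge of the graph."""
    return {other for edge in edges if vertex in edge for other in edge - {vertex}}


print(sorted(neighbours("v")))  # ['u', 'w']
```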
Types of graph

Trivial Graph:

• A finite graph is said to be trivial if it contains only one vertex and no edge.

Simple Graph:

• A simple graph is a graph that has no self-loops and no more than one edge between any pair of vertices. A simple railway track connecting different cities is an example of a simple graph.

Multi Graph:
• Any graph which contains some parallel edges but doesn’t contain any self-loop is called a multigraph. A road map is an example.
• Parallel Edges: If two vertices are connected by more than one edge, those edges are called parallel edges: many routes, but one destination.
• Loop: An edge of a graph that starts from a vertex and ends at the same vertex is called a loop or a self-loop.

Exponents:

An expression that represents repeated multiplication of the same factor is called a power.
The exponent tells how many times the base is used as a factor in that multiplication.
For example, 2 to the 3rd (written 2^3) means 2 × 2 × 2 = 8.
Properties
• The properties of exponents, or laws of exponents, are the main rules used to solve problems involving exponents. They are listed below.
• Law of Product: a^m × a^n = a^{m+n}
• Law of Quotient: a^m / a^n = a^{m-n}
• Law of Zero Exponent: a^0 = 1
• Law of Negative Exponent: a^{-m} = 1 / a^m
• Law of Power of a Power: (a^m)^n = a^{mn}
• Law of Power of a Product: (ab)^m = a^m b^m
• Law of Power of a Quotient: (a/b)^m = a^m / b^m
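Each law can be spot-checked numerically. The sketch below uses exact fractions so that the comparisons are not affected by floating-point rounding; the particular values of a, b, m and n are arbitrary.

```python
from fractions import Fraction

a, b = Fraction(2), Fraction(3)
m, n = 5, 3

laws = {
    "product": a**m * a**n == a ** (m + n),
    "quotient": a**m / a**n == a ** (m - n),
    "zero exponent": a**0 == 1,
    "negative exponent": a ** (-m) == 1 / a**m,
    "power of a power": (a**m) ** n == a ** (m * n),
    "power of a product": (a * b) ** m == a**m * b**m,
    "power of a quotient": (a / b) ** m == a**m / b**m,
}

for name, holds in laws.items():
    print(f"{name}: {holds}")  # every line prints True
```

A check like this confirms the laws for one set of values only; it is a sanity test, not a proof.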
SQL
OBJECTIVE AND OUTCOME OF LECTURE
Introduction to Microsoft Excel
Lecture Objective: To learn about SQL and database tables.
Lecture Outcome: To implement SQL commands in a database.
DATABASE

A database is a collection of information that is organized so that it can be easily accessed, managed and updated. Computer databases typically contain aggregations of data records or files.
SQL
SQL stands for Structured Query Language.

SQL is a standard language for accessing and manipulating databases.

DATABASE TABLES

A database most often contains one or more tables.
Each table is identified by a name (e.g. "Customers" or "Orders").
Tables contain records (rows) with data.

1. DDL – Data Definition Language: used to create and modify the structure of objects in a database using predefined commands and a specific syntax. These database objects include tables, sequences, locations, aliases, schemas and indexes.

2. DML – Data Manipulation Language: used to make changes to the data in the database, such as the CRUD operations (create, read, update and delete), using the INSERT, SELECT, UPDATE, and DELETE commands.

3. DCL – Data Control Language: its commands are administrative and control which users have access to the database.

4. TCL – Transaction Control Language: used to commit, or save, transactions made to the database or data.
MYSQL COMMANDS
SOME OF THE MOST IMPORTANT
SQL COMMANDS
SELECT - extracts data from a database
UPDATE - updates data in a database

DELETE - deletes data from a database

INSERT INTO - inserts new data into a database

CREATE DATABASE - creates a new database

ALTER DATABASE - modifies a database

CREATE TABLE - creates a new table

ALTER TABLE - modifies a table

DROP TABLE - deletes a table

CREATE INDEX - creates an index (search key)

DROP INDEX - deletes an index

CREATE TABLE

The CREATE TABLE statement is used to create a new table in a database.

To add multiple columns to the table, use the syntax below.

Syntax:
CREATE TABLE table_name (column1 datatype, column2 datatype, column3 datatype, ...)

The column parameters specify the names of the columns of the table.

The datatype parameter specifies the type of data the column can hold (e.g. varchar, integer, date, etc.).

In the example table, the EmpId column is of type int and will hold an integer.

The LastName, FirstName, Address, and City columns are of type varchar and will hold characters; the maximum length of these fields is 255 characters.

INSERT VALUE IN TABLE

The INSERT INTO statement is used to insert new records in a table.

It is possible to write the INSERT INTO statement in two ways.

Syntax

The first way specifies both the column names and the values to be inserted:
INSERT INTO table_name (column1, column2, column3, ...) VALUES (value1, value2, value3, ...)

If you are adding values for all the columns of the table, you do not need to specify the column names in the SQL query. However, make sure that the values are in the same order as the columns in the table:
INSERT INTO table_name VALUES (value1, value2, value3, ...)

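The CREATE TABLE and INSERT INTO statements can be tried without installing a database server by using Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the slides target MySQL, whereas SQLite is used here only for convenience; the table name `Employees` and the sample rows are hypothetical; only the column names and types come from the slides.

```python
import sqlite3

# An in-memory SQLite database that disappears when the connection closes.
connection = sqlite3.connect(":memory:")

# DDL: create a table with the columns described in the slides.
connection.execute(
    """
    CREATE TABLE Employees (
        EmpId INTEGER,
        LastName VARCHAR(255),
        FirstName VARCHAR(255),
        Address VARCHAR(255),
        City VARCHAR(255)
    )
    """
)

# First way: name the columns as well as the values.
connection.execute(
    "INSERT INTO Employees (EmpId, LastName, FirstName, Address, City) "
    "VALUES (?, ?, ?, ?, ?)",
    (1, "Rao", "Asha", "12 Park Road", "Pune"),
)

# Second way: omit the column names; values must follow the column order.
connection.execute(
    "INSERT INTO Employees VALUES (?, ?, ?, ?, ?)",
    (2, "Khan", "Imran", "8 Lake View", "Mumbai"),
)

rows = connection.execute(
    "SELECT EmpId, FirstName, City FROM Employees ORDER BY EmpId"
).fetchall()
print(rows)  # [(1, 'Asha', 'Pune'), (2, 'Imran', 'Mumbai')]

connection.close()
```

The `?` placeholders are the `sqlite3` way of passing values safely; in a MySQL console the values would be written directly into the statement.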
SELECT
Display the contents of the table
Syntax:
SELECT * FROM table_name
Example:
SELECT * FROM tasks
DESCRIBE TABLE
To view the structure / schema of a table

Syntax:
DESCRIBE table_name
DESC table_name

DELETE
To delete the contents of the table
Syntax:
DELETE FROM table_name
DELETE FROM table_name WHERE condition
Example:
DELETE FROM tasks
DELETE FROM tasks WHERE task_id=1
UPDATE
To update a value in a table

Syntax:
UPDATE table_name SET field1 = new-value1, field2 = new-value2 [WHERE Clause]

Example:
UPDATE tasks SET task_name='xyz' WHERE task_id=1
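The SELECT, UPDATE and DELETE statements can be exercised together on a small `tasks` table. As a sketch, this uses Python's built-in `sqlite3` module in place of MySQL; the column types and the three starting rows are assumptions, while the table and column names follow the examples in the slides.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE tasks (task_id INTEGER PRIMARY KEY, task_name TEXT)"
)
connection.executemany(
    "INSERT INTO tasks (task_id, task_name) VALUES (?, ?)",
    [(1, "abc"), (2, "def"), (3, "ghi")],  # hypothetical starting rows
)

# UPDATE changes only the rows matched by the WHERE clause.
connection.execute("UPDATE tasks SET task_name = 'xyz' WHERE task_id = 1")
after_update = connection.execute(
    "SELECT * FROM tasks ORDER BY task_id"
).fetchall()
print(after_update)  # [(1, 'xyz'), (2, 'def'), (3, 'ghi')]

# DELETE with a WHERE clause removes only the matching rows.
connection.execute("DELETE FROM tasks WHERE task_id = 1")
after_delete = connection.execute(
    "SELECT * FROM tasks ORDER BY task_id"
).fetchall()
print(after_delete)  # [(2, 'def'), (3, 'ghi')]

# DELETE without a WHERE clause removes every row but keeps the table.
connection.execute("DELETE FROM tasks")
after_clear = connection.execute("SELECT * FROM tasks").fetchall()
print(after_clear)  # []

connection.close()
```

Forgetting the WHERE clause on UPDATE or DELETE affects every row, which is why the two DELETE forms are shown separately.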
DROP
TRUNCATE
MYSQL DATA TYPES
1. NUMERIC DATA TYPE
2. DATETIME DATA TYPE
3. STRING DATA TYPE
HOW TO IMPORT MYSQL DATABASE INTO EXCEL
1. Create a new workbook in MS Excel
2. Click on the DATA tab
3. Select the From Other Sources button
4. Select From SQL Server
5. Enter the server name/IP address. This tutorial connects to localhost, 127.0.0.1
6. Choose the login type. Use Windows authentication if you are on a local machine and have it enabled.
7. If you are connecting to a remote server, you will need to provide the user id and password.
8. Click on the Next button
9. Select EmployeesDB from the drop-down list
10. Click on the employees table to select it
11. Click on the Next button.
THANK YOU

