
Data Science

The document outlines a syllabus for an Introduction to Data Science course, covering topics such as the data science life cycle, tools for data science, artificial intelligence (AI), and machine learning (ML) algorithms. It includes sections on probability theory, SQL commands, and various machine learning techniques like supervised, unsupervised, and reinforcement learning. Additionally, it discusses feature selection methods and provides examples of applications in data science.

Uploaded by

thecarr2006
Copyright
© © All Rights Reserved

School of Computer Science & Engineering

UNIT 2
Introduction to Data Science
Syllabus

• What is Data Science? Applications of data science, data science life cycle, tools for data science, definition of AI, types of machine learning (ML), list of ML algorithms for classification, clustering, and feature selection. Probability theory, Bayes' theorem, Bayes probability; Cartesian plane, equations of lines, graphs; exponents.

• Introduction to SQL: experimental demonstrations of SQL commands (DDL, DML, DCL, TCL, DQL). Importing SQL database data into Excel.

Outline:

Review of Previous Lecture
Topic for the day
Objective and Outcome of Lecture
Lecture Discussion: Data Science, Probability Theorem
Examples

Review of Previous Lecture:
Creating an Excel sheet

Importing CSV and XML data into Excel

Topic for the Lecture:
What is data science

Probability Theorem

Bayes Theorem

Objective and Outcome of Lecture:

Lecture Objective
• To understand data science,
• probability theory,
• Bayes' theorem,
• the Cartesian plane,
• equations of lines, graphs, exponents,
• SQL

Data Science:

Data science is the field of study that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data.

Data science is the study of data.

It involves developing methods of recording, storing, and analyzing data to effectively extract useful information.

The life cycle of data science:

The business requirement step deals with identifying the problem and the objectives.

The data acquisition step deals with finding the data sources, then collecting and storing the data.

The data processing step transforms the data into a form better suited to finding the required information.

The data exploration step is a brainstorming step in which patterns are identified.

The data modeling step deals with building data models and training them on the data sets.

The deployment step deals with deploying the model in the business environment.
Tools for data science

Explanation
• 1. SAS (Statistical Analysis System)
• It is one of the data science tools specifically designed for statistical operations. SAS is closed-source proprietary software that is used by large organizations to analyze data.

• 2. Apache Spark
• Apache Spark, or simply Spark, is a powerful analytics engine and one of the most widely used data science tools. Spark is designed to handle both batch processing and stream processing.

• 3. BigML
• It provides a fully interactive, cloud-based GUI environment that you can use for running machine learning algorithms.

• 4. D3.js
• JavaScript is mainly used as a client-side scripting language. D3.js, a JavaScript library, allows you to make interactive visualizations in your web browser.
• 5. MATLAB
• MATLAB is a multi-paradigm numerical computing environment for processing mathematical information.
• It is closed-source software that facilitates matrix functions, algorithmic implementation and statistical modeling of data. MATLAB is widely used in several scientific disciplines.
• In data science, MATLAB is used for simulating neural networks and fuzzy logic. Using the MATLAB graphics library, you can create powerful visualizations.

• 6. Excel
• Probably the most widely used data analysis tool. Microsoft developed Excel mostly for spreadsheet calculations, and today it is widely used for data processing, visualization, and complex calculations.

• 7. ggplot2
• ggplot2 is an advanced data visualization package for the R programming language. The developers created this tool to replace the native graphics package of R, and it uses powerful commands to create rich visualizations. It is one of the libraries most widely used by data scientists for creating visualizations from analyzed data.
• 8. Tableau
• Tableau is data visualization software that is packed with powerful graphics for making interactive visualizations. It is focused on industries working in the field of business intelligence.
• The most important aspect of Tableau is its ability to interface with databases, spreadsheets, OLAP (Online Analytical Processing) cubes, etc. Along with these features, Tableau can visualize geographical data and plot longitudes and latitudes on maps.

• 9. Jupyter
• Project Jupyter is an open-source project based on IPython that helps developers build open-source software and experience interactive computing. Jupyter supports multiple languages such as Julia, Python, and R. It is a web application used for writing live code, visualizations, and presentations. Jupyter is a widely popular tool designed to address the requirements of data science.
• 10. Matplotlib
• Matplotlib is a plotting and visualization library developed for Python. It is one of the most popular tools for generating graphs from analyzed data. It is mainly used for plotting complex graphs using a few simple lines of code. Using it, one can generate bar plots, histograms, scatterplots, etc. Matplotlib has several essential modules. One of the most widely used is pyplot, which offers a MATLAB-like interface. Pyplot is also an open-source alternative to MATLAB's graphics modules.

• 13. TensorFlow
• TensorFlow has become a standard tool for machine learning. It is widely used for advanced machine learning techniques such as deep learning. Developers named TensorFlow after tensors, which are multidimensional arrays. It is an open-source and ever-evolving toolkit known for its performance and high computational abilities.
Artificial intelligence (AI)
• Artificial intelligence (AI) is intelligence demonstrated by machines, unlike the natural intelligence displayed by humans and animals, which involves consciousness and emotionality.
• Artificial intelligence (AI) is the ability of a digital computer or computer-controlled robot to perform tasks commonly associated with intelligent beings.
• Artificial intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions.

Machine Learning
"Machine learning enables a machine to automatically learn from data, improve performance from experiences, and predict things without being explicitly programmed."

Key differences between AI
and ML

Types of machine
learning (ML)

3/24/2021 21
Supervised learning
• Supervised learning, as the name indicates, involves a supervisor acting as a teacher.
• In supervised learning we teach or train the machine using well-labeled data, that is, data already tagged with the correct answer.
• After that, the machine is given a new set of examples (data); having analysed the training data (the set of training examples), the supervised learning algorithm predicts an outcome for the new examples from what it learned on the labeled data.

Unsupervised learning
• Unsupervised learning is the training of a machine using information that is neither classified nor labeled, allowing the algorithm to act on that information without guidance.
• Here the task of the machine is to group unsorted information according to similarities, patterns and differences without any prior training on the data.
• Unlike supervised learning, no teacher is provided, which means no labeled training is given to the machine.
• Therefore the machine has to find the hidden structure in unlabeled data by itself.

Semi-supervised learning & Reinforcement learning
• Semi-supervised learning lies between supervised and unsupervised learning.
• It uses both labelled and unlabelled data for training.
• Reinforcement learning trains an algorithm with a reward system, providing feedback when an artificial intelligence agent performs the best action in a particular situation.
• In reinforcement learning, AI agents attempt to find the optimal way to accomplish a particular goal, or to improve performance on a specific task.
• As the agent takes an action that moves toward the goal, it receives a reward.
Examples /
Applications

25
Difference

26
Regression
• Regression analysis is a statistical method to model the relationship between a dependent (target) variable and one or more independent (predictor) variables.
• Regression is a process of finding the correlations between dependent and independent variables.
• It helps in predicting continuous variables, such as market trends, house prices, etc.

ML Regression
Algorithms
• Simple Linear Regression
• Multiple Linear Regression
• Polynomial Regression
• Support Vector Regression
• Decision Tree Regression
• Random Forest Regression

28
Classification
• A classification algorithm is a supervised learning technique that is used to identify the category of new observations on the basis of training data.
• In classification, a program learns from the given dataset or observations and then classifies each new observation into one of a number of classes or groups.
• Examples of classes: Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc.

ML Classification
Algorithms
• Logistic Regression
• K-Nearest Neighbours
• Support Vector Machines
• Kernel SVM
• Naïve Bayes
• Decision Tree Classification
• Random Forest Classification

30
Difference between
Regression and
Classification

31
Clustering
• Grouping similar data is called clustering; each group is a cluster.
• Clustering, or cluster analysis, is a machine learning technique that groups an unlabelled dataset.

Clustering
Algorithms
• K-Means algorithm
• Agglomerative Hierarchical algorithm
• Mean-shift algorithm
• DBSCAN Algorithm (Density-Based Spatial
Clustering of Applications with Noise)
• Expectation-Maximization (EM) Clustering using
GMM (Gaussian Mixture Model)

33
Association Rule
• Association rule learning is a type of unsupervised learning technique that checks for the dependency of one data item on another data item and maps them accordingly so that the result can be more profitable. It tries to find interesting relations or associations among the variables of a dataset. It is based on different rules to discover the interesting relations between variables in the database.

• Association rule learning is one of the very important concepts of machine learning, and it is employed in market basket analysis, web usage mining, continuous production, etc.

Feature selection
• In machine learning and statistics, feature selection is also known as variable selection, attribute selection or variable subset selection.
• It is the process of selecting a subset of relevant features (variables, predictors) for use in model construction.
• When the number of features is very large, you need not use every feature at your disposal for creating an algorithm.
• You can assist your algorithm by feeding in only those features that are really important.
• Machine learning works on a simple rule: if you put garbage in, you will only get garbage out (garbage here means noise). "Sometimes, less is better!"
Top reasons to use feature selection are:
• It enables the machine learning algorithm to train faster.
• It reduces the complexity of a model and makes it easier to interpret.
• It improves the accuracy of a model if the right subset is chosen.
• It reduces overfitting.
ML Feature selection Algorithms
Filter Methods: Filter methods are a type of feature selection method that selects features based on some criterion before the model is built.

• Pearson's Correlation
• Linear Discriminant Analysis (LDA)
• ANOVA (Analysis of variance)
• Chi-Square
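
As a rough illustration of a filter method, the sketch below scores each feature by the absolute value of its Pearson correlation with the target and keeps the features above a cut-off, before any model is built. The feature names, the data and the 0.5 threshold are all invented for this example; they are assumptions, not part of the lecture.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical data set: three candidate features and a target (price).
features = {
    "size": [50, 60, 80, 100, 120],
    "rooms": [2, 2, 3, 4, 4],
    "door_number": [7, 3, 9, 1, 5],
}
price = [150, 180, 240, 300, 360]

# Score every feature against the target, then filter with a threshold.
THRESHOLD = 0.5  # assumed cut-off, chosen only for illustration
scores = {name: abs(pearson(col, price)) for name, col in features.items()}
selected = [name for name, score in scores.items() if score >= THRESHOLD]
print(selected)
```

The selection depends only on the data and the scoring criterion, which is what distinguishes a filter method from the wrapper and embedded methods.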

38
Wrapper Methods

• The wrapper method has the same goal as the filter method, but it uses a machine learning model for its evaluation. In this method, a subset of features is fed to the ML model and its performance is evaluated. The performance decides whether to add or remove features to increase the accuracy of the model. This method is usually more accurate than the filter method but more complex to work with.

• Forward Selection
• Backward Elimination
• Recursive Feature Elimination

Embedded Methods
Embedded methods look across the training iterations of the machine learning model and evaluate the importance of each feature.
• Decision Tree
• ID3
• C4.5
• Classification And Regression Tree (CART)

Linear regression

y= mx+c+ ε
• y= Dependent Variable (Target Variable)
• x= Independent Variable (predictor Variable)
• c= y intercept of the line
• m= slope
• ε= error
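
A minimal sketch of how m and c can be estimated from data, assuming the ordinary least-squares criterion (the slide does not say which fitting method is intended). The data points are made up and lie roughly on y = 2x + 1; the residuals play the role of ε.

```python
def fit_line(xs, ys):
    """Return (m, c) that minimise the squared error of y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    c = mean_y - m * mean_x
    return m, c

xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]  # invented observations

m, c = fit_line(xs, ys)
residuals = [y - (m * x + c) for x, y in zip(xs, ys)]  # the error term ε
print(m, c)
```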

41
Probability theory

Probability means possibility. It is a branch of mathematics that deals with the occurrence of a random event.

Probability theory is the branch of mathematics that deals with the possibility of events happening.

The value is expressed from zero to one.

Probability ranges from 0 to 1, where 0 means the event is impossible and 1 indicates a certain event.

Probability was introduced in mathematics to predict how likely events are to happen.

This is the basic probability theory, which is also used in probability distributions, where you learn the possible outcomes of a random experiment.

To find the probability of a single event occurring, we should first know the total number of possible outcomes.
Formula for Probability
The probability of an event is the ratio of the number of favourable outcomes to the total number of outcomes, provided all outcomes are equally likely.

P(E) = Number of favourable outcomes / Total number of outcomes

This is the basic formula. There are further formulas for different situations or events.

Probability theory

• For example,
When we toss a coin, we get either a head or a tail, so there are only two possible outcomes (H, T).

If we toss two coins, there are four equally likely outcomes: (H, H), (H, T), (T, H) and (T, T). If the order of the coins is ignored, these give three distinct results: both coins show heads, both show tails, or one shows heads and one tails. The last result is twice as likely as either of the others because it arises from both (H, T) and (T, H).
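
The two-coin case can be checked by listing the ordered outcomes and applying the favourable/total formula. This is only an illustrative sketch; the helper name `probability` is made up for the example. It shows that there are four equally likely ordered outcomes, so "one head and one tail" can happen in two ways.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of tossing two coins: HH, HT, TH, TT.
sample_space = list(product("HT", repeat=2))

def probability(event):
    """P(E) = favourable outcomes / total outcomes (equally likely)."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

p_two_heads = probability(lambda o: o == ("H", "H"))
p_one_of_each = probability(lambda o: set(o) == {"H", "T"})
print(len(sample_space), p_two_heads, p_one_of_each)
```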
Problems on Probability
1) There are 6 pillows on a bed: 3 are red, 2 are yellow and 1 is blue. What is the probability of picking a yellow pillow?

Solution:

The probability is equal to the number of yellow pillows on the bed divided by the total number of pillows, i.e. 2/6 = 1/3.
Problems on Probability
2) There is a container full of coloured bottles: red, blue, green and orange. Some of the bottles are picked out and displaced. Sumit did this 1000 times and got the following results:

No. of blue bottles picked out: 300
No. of red bottles: 200
No. of green bottles: 450
No. of orange bottles: 50

a) What is the probability that Sumit will pick a green bottle?
Ans: For every 1000 bottles picked out, 450 are green.
Therefore, P(green) = 450/1000 = 0.45

b) If there are 100 bottles in the container, how many of them are likely to be green?
Ans: The experiment implies that 450 out of 1000 bottles are green.
Therefore, out of 100 bottles, about 45 are likely to be green.
Probability Terms and Definition
Some of the important probability terms are

Question 2: Two dice are rolled. Find the probability that the sum is:
1) equal to 1
2) equal to 4
3) less than 13
Solution:

We first determine the sample space S of two dice, as shown below.
S = { (1,1),(1,2),(1,3),(1,4),(1,5),(1,6)
(2,1),(2,2),(2,3),(2,4),(2,5),(2,6)
(3,1),(3,2),(3,3),(3,4),(3,5),(3,6)
(4,1),(4,2),(4,3),(4,4),(4,5),(4,6)
(5,1),(5,2),(5,3),(5,4),(5,5),(5,6)
(6,1),(6,2),(6,3),(6,4),(6,5),(6,6) }
So, n(S) = 36
1) Let E be the event "sum equal to 1". Since no outcome gives a sum equal to 1,
P(E) = n(E) / n(S) = 0 / 36 = 0
2) Let A be the event that the sum of the numbers on the dice is equal to 4.
Three possible outcomes give a sum equal to 4:
A = {(1,3),(2,2),(3,1)}
n(A) = 3
Hence, P(A) = n(A) / n(S) = 3 / 36 = 1 / 12
3) Let B be the event that the sum of the numbers on the dice is less than 13.
From the sample space, every possible outcome gives a sum less than 13, for example (1,1), (1,6), (2,6) or (6,6).
The largest sum occurs when both dice show 6, i.e. (6,6), which gives 12.
Thus, n(B) = 36
Hence,
P(B) = n(B) / n(S) = 36 / 36 = 1
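
The three answers can be reproduced by enumerating the 36 outcomes instead of counting by hand. The sketch below is one way to do it; `p_sum` is a helper name invented for the example.

```python
from fractions import Fraction
from itertools import product

# Sample space S: every ordered pair of faces from two dice.
S = list(product(range(1, 7), repeat=2))

def p_sum(condition):
    """Probability that the sum of the two dice satisfies `condition`."""
    n_event = sum(1 for a, b in S if condition(a + b))
    return Fraction(n_event, len(S))

p_sum_is_1 = p_sum(lambda s: s == 1)    # impossible event
p_sum_is_4 = p_sum(lambda s: s == 4)    # (1,3), (2,2), (3,1)
p_sum_lt_13 = p_sum(lambda s: s < 13)   # certain event
print(len(S), p_sum_is_1, p_sum_is_4, p_sum_lt_13)
```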
Applications of Probability

Probability has a wide variety of applications in real life. Some common applications that we see in everyday life arise when checking the results of the following events:

Choosing a card from a deck of cards
Flipping a coin
Throwing a die in the air
Pulling a red ball out of a bucket of red and white balls
Winning a lucky draw

Major Applications of Probability

Risk assessment and modelling in various industries
Weather forecasting or prediction of weather changes
Estimating the probability of a team winning in a sport, based on the players and the strength of the team
In the share market, estimating the chances of a rise in share prices

Basics of Probability

Example 1: Given 10 marbles: 2 red, 3 green, 5 blue.
• Find: the probability of selecting a green marble.
• Solution: P(G) = 3/10 = 0.30

Two Basic Rules

1. Addition Rule
- Mutually exclusive: events A and B cannot occur together, i.e. P(A and B) = 0, so the probability of A or B is the sum of their probabilities: P(A or B) = P(A) + P(B), also written P(A∪B) = P(A) + P(B).
- Non-mutually exclusive: the probability of A or B is the sum of the probabilities of A and B minus the probability of A and B: P(A or B) = P(A) + P(B) – P(A and B), also written P(A∪B) = P(A) + P(B) – P(A∩B).
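
Both forms of the addition rule can be checked on a single roll of a die by treating events as sets of outcomes. The events chosen here ("even", "greater than 3", "one", "six") are assumptions made for the illustration and do not come from the slides.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # faces of one fair die

def p(event):
    """Probability of an event given as a set of die faces."""
    return Fraction(len(event & outcomes), len(outcomes))

# Non-mutually exclusive events: they share the faces 4 and 6.
even = {2, 4, 6}
high = {4, 5, 6}
p_union_direct = p(even | high)
p_union_rule = p(even) + p(high) - p(even & high)

# Mutually exclusive events: no face in common, so nothing to subtract.
one, six = {1}, {6}
p_exclusive_direct = p(one | six)
p_exclusive_rule = p(one) + p(six)
print(p_union_direct, p_exclusive_direct)
```

Subtracting P(A and B) removes the outcomes that would otherwise be counted twice, once in A and once in B.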

2. Multiplication Rule

The multiplication rule of probability applies whenever an event is the intersection of two other events, that is, when events A and B need to occur simultaneously. If A and B are independent, then P(A and B) = P(A)⋅P(B).

The set A∩B denotes the simultaneous occurrence of events A and B, that is, the set in which both event A and event B have occurred.

In general, the probability of A∩B is obtained by using the properties of conditional probability: P(A ∩ B) = P(A) P(B | A).

Multiplication Rule of Probability for Dependent Events

If the outcome of one event affects the outcome of the other, then those events are referred to as dependent events.

Sometimes the occurrence of the first event changes the probability of the second event. From the theorem, we have P(A ∩ B) = P(A) P(B | A), where A and B are dependent events.
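
To see the rule for dependent events at work, the sketch below reuses the 10 marbles from the earlier example (2 red, 3 green, 5 blue) and asks for the probability of drawing two green marbles without replacement. The question itself is an added illustration, not one of the lecture problems. The product P(A) P(B | A) is compared with a brute-force count over all ordered pairs of draws.

```python
from fractions import Fraction
from itertools import permutations

marbles = ["R"] * 2 + ["G"] * 3 + ["B"] * 5

# Multiplication rule: P(A and B) = P(A) * P(B | A).
p_first_green = Fraction(3, 10)               # 3 green among 10 marbles
p_second_green_given_first = Fraction(2, 9)   # 2 green left among 9
p_both_green = p_first_green * p_second_green_given_first

# Check by enumerating every ordered pair of distinct marbles.
ordered_pairs = list(permutations(range(len(marbles)), 2))
both_green = [pair for pair in ordered_pairs
              if all(marbles[i] == "G" for i in pair)]
p_enumerated = Fraction(len(both_green), len(ordered_pairs))

# With replacement the draws would be independent: P(A) * P(B).
p_with_replacement = Fraction(3, 10) * Fraction(3, 10)
print(p_both_green, p_enumerated, p_with_replacement)
```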
Dependent Event (Conditional Probability)
• The conditional probability of an event B in relation to an event A is the probability that event B occurs given that event A has already occurred: P(B | A) = P(A and B) / P(A).

Problem 1:
• A math teacher gave her class two tests. 25% of the class passed both tests and 42% of the class passed the first test. What percent of those who passed the first test also passed the second test?
Answer:
P(Second | First) = P(First and Second)/P(First)
= 0.25/0.42 ≈ 0.60
= 60%

Problem 2:
• A jar contains black and white marbles. Two marbles are chosen without replacement. The probability of selecting a black marble and then a white marble is 0.34, and the probability of selecting a black marble on the first draw is 0.47. What is the probability of selecting a white marble on the second draw, given that the first marble drawn was black?
• Answer:
• P(White | Black) = P(Black and White)/P(Black)
= 0.34/0.47
≈ 0.72
= 72%
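
Both problems use the same division, so a small helper is enough to reproduce them. The function name `conditional` is an assumption made for this sketch; the numbers are the ones given in the two problems.

```python
def conditional(p_a_and_b, p_a):
    """P(B | A) = P(A and B) / P(A); undefined when P(A) is zero."""
    if p_a == 0:
        raise ValueError("P(A) must be greater than zero")
    return p_a_and_b / p_a

# Problem 1: passed both tests / passed the first test.
p_second_given_first = conditional(0.25, 0.42)

# Problem 2: black then white / black on the first draw.
p_white_given_black = conditional(0.34, 0.47)

print(round(p_second_given_first, 2), round(p_white_given_black, 2))
```

The unrounded values are about 0.595 and 0.723, which is why the answers are stated as approximately 60% and 72%.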

71
Bayes Theorem
• Bayes' theorem (alternatively Bayes' law or Bayes' rule), named after the Reverend Thomas Bayes, describes the probability of an event based on prior knowledge of conditions that might be related to the event.

• Bayes' theorem describes the probability of occurrence of an event related to a given condition. It is therefore a result about conditional probability.

• For example, suppose we have to calculate the probability of taking a blue ball from the second bag out of three different bags of balls, where each bag contains balls of three different colours: red, blue and black. A probability like this, calculated depending on other conditions, is known as conditional probability.

Statement of theorem

If E1, E2, ..., En are mutually exclusive and exhaustive events with non-zero probabilities and A is any event with P(A) > 0, then
P(Ei|A) = P(Ei)P(A|Ei) / [P(E1)P(A|E1) + P(E2)P(A|E2) + ... + P(En)P(A|En)]

Example 1:
• Bag I contains 4 white and 6 black balls, while Bag II contains 4 white and 3 black balls. One ball is drawn at random from one of the bags, and it is found to be black. Find the probability that it was drawn from Bag I.
• Solution:
• Let E1 be the event of choosing Bag I, E2 the event of choosing Bag II, and A the event of drawing a black ball.
• Then, P(E1) = P(E2) = 1/2
• Also, P(A|E1) = P(drawing a black ball from Bag I) = 6/10 = 3/5
• P(A|E2) = P(drawing a black ball from Bag II) = 3/7
• By Bayes' theorem, the probability that the black ball was drawn from Bag I is
• P(E1|A) = P(E1)P(A|E1) / [P(E1)P(A|E1) + P(E2)P(A|E2)]
• = (1/2 × 3/5) / (1/2 × 3/5 + 1/2 × 3/7) = 7/12
75
Example 2:

76
Problem on Bayes Theorem

77
Assignment Example 2:
• A man is known to speak the truth 2 out of 3 times. He throws a die and reports that the number obtained is a four. Find the probability that the number obtained is actually a four.
• Solution:
• Let A be the event that the man reports that a four is obtained.
• Let E1 be the event that a four is obtained and E2 be its complementary event.
• Then, P(E1) = Probability that a four occurs = 1/6
• P(E2) = Probability that a four does not occur = 1 – P(E1) = 1 − 1/6 = 5/6
• Also, P(A|E1) = Probability that the man reports a four given that it is actually a four = 2/3
• P(A|E2) = Probability that the man reports a four given that it is not a four = 1/3
• By Bayes' theorem, the probability that the number obtained is actually a four is
• P(E1|A) = P(E1)P(A|E1) / [P(E1)P(A|E1) + P(E2)P(A|E2)] = (1/6 × 2/3) / (1/6 × 2/3 + 5/6 × 1/3) = 2/7
Problem on Bayes Theorem
1. In a bolt factory, machines A, B and C manufacture 25%, 35% and 40% of the total output respectively. Of their output, 5%, 4% and 2% respectively are defective. A bolt is drawn at random and is found to be defective. What is the probability that it was manufactured by machine C?
Solution:
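
The worked solution is not reproduced in this text, so the following is a sketch of how the bolt problem can be computed, assuming the defect rates 5%, 4% and 2% apply to machines A, B and C in that order. It evaluates Bayes' theorem for all three machines at once, which also confirms that the posterior probabilities add up to 1.

```python
from fractions import Fraction

# Share of total output per machine (the priors).
share = {"A": Fraction(25, 100), "B": Fraction(35, 100), "C": Fraction(40, 100)}
# Probability that a bolt from each machine is defective (assumed order A, B, C).
defect_rate = {"A": Fraction(5, 100), "B": Fraction(4, 100), "C": Fraction(2, 100)}

# Total probability that a randomly drawn bolt is defective.
p_defective = sum(share[m] * defect_rate[m] for m in share)

# Posterior probability of each machine given that the bolt is defective.
posterior = {m: share[m] * defect_rate[m] / p_defective for m in share}
print(posterior["C"], float(posterior["C"]))
```

Under these assumptions machine C accounts for 16/69 of the defective bolts, about 23%, even though it produces the largest share of the output.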

79
80
Another Way to Solve

81
Assignment
Problem on Bayes Theorem
2.An insurance company insured 2000 scooter drivers, 4000 car
drivers and 6000 truck drivers. The probability of an accident involving
a scooter, a car and a truck are 0.01, 0.03 and 0.15 respectively. One of
the insured persons meets with an accident. What is the probability that
he is a scooter driver?

Solution:
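
The solution slide is missing from this text. One way to approach the insurance problem, offered here as a sketch and not as the lecture's own method, is to count expected accidents per group: the share of scooter drivers among all expected accidents equals the Bayes posterior.

```python
from fractions import Fraction

drivers = {"scooter": 2000, "car": 4000, "truck": 6000}
accident_prob = {
    "scooter": Fraction(1, 100),   # 0.01
    "car": Fraction(3, 100),       # 0.03
    "truck": Fraction(15, 100),    # 0.15
}

# Expected number of accidents contributed by each group of drivers.
expected_accidents = {k: drivers[k] * accident_prob[k] for k in drivers}

# Given that an accident happened, the chance it involved a scooter driver.
p_scooter = expected_accidents["scooter"] / sum(expected_accidents.values())
print(expected_accidents, p_scooter)
```

Out of 1040 expected accidents only 20 involve scooter drivers, so the probability is 1/52, roughly 0.019.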

Cartesian Plane
The Cartesian plane is a two-dimensional coordinate plane formed by the intersection of two perpendicular lines. The horizontal line is known as the X-axis, and the vertical line is known as the Y-axis. The coordinate point (x, y) on the Cartesian plane means that the horizontal distance of the point from the origin is |x| and the vertical distance is |y|. If x is positive, the point is to the right of the origin; if negative, to the left. Similarly, if y is positive, the point is |y| units above the origin; if negative, |y| units below it.

Equation of lines:

The equation of a line is typically written as y = mx + b, where m is the slope and b is the y-intercept.

This lets us find the straight line through any two (x, y) points in a two-dimensional graph, provided the points do not share the same x value (a vertical line has no slope).
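
As a small worked example, the slope and intercept of the line through two points can be computed directly from y = mx + b. The function name and the two sample points are made up for this sketch; a vertical line is rejected because it cannot be written in this form.

```python
def line_through(p1, p2):
    """Return (m, b) of the line y = m*x + b through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    m = (y2 - y1) / (x2 - x1)   # slope = rise over run
    b = y1 - m * x1             # solve y1 = m*x1 + b for b
    return m, b

m, b = line_through((1, 3), (4, 9))
print(m, b)          # slope and y-intercept
print(m * 4 + b)     # the line passes through the second point
```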
88
Graphs:

The data science and analytics field also uses graphs to model various structures and problems.

As a data scientist, you should be able to solve problems efficiently, and graphs provide a mechanism to do that.

Graphs are mathematical structures used to study pairwise relationships between objects and entities.

A Graph is a pair of sets, G = (V,E). V is the set of vertices. E is the set of edges. E is made up of unordered pairs of elements from V.

A DiGraph is also a pair of sets, D = (V,A). V is the set of vertices. A is the set of arcs. A is made up of ordered pairs of elements from V.

In the case of digraphs, there is a distinction between `(u,v)` and `(v,u)`. Usually the edges are called arcs in such cases to indicate a notion of direction.
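
The difference between unordered edges and ordered arcs can be made concrete with Python sets. This is only one possible representation, chosen for the illustration: a `frozenset` stands for an unordered pair and a tuple for an ordered pair. The vertex names are invented.

```python
# Vertices shared by both structures.
V = {"a", "b", "c"}

# Graph G = (V, E): each edge is an unordered pair of vertices.
E = {frozenset({"a", "b"}), frozenset({"b", "c"})}

# DiGraph D = (V, A): each arc is an ordered pair (tail, head).
A = {("a", "b"), ("b", "c")}

# In the graph, {a, b} and {b, a} are the same edge.
edge_is_symmetric = frozenset({"b", "a"}) in E

# In the digraph, (a, b) is present but (b, a) is not.
arc_forward = ("a", "b") in A
arc_backward = ("b", "a") in A

def successors(vertex):
    """Vertices reachable from `vertex` along a single arc of D."""
    return {head for tail, head in A if tail == vertex}

print(edge_is_symmetric, arc_forward, arc_backward, successors("b"))
```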
90
Types of graph

Trivial Graph:

• A finite graph is said to be trivial if it contains only one vertex and no edge.

Simple Graph:

• A simple graph is a graph that contains no self-loop and no more than one edge between any pair of vertices. A simple railway track connecting different cities is an example of a simple graph.

Multi Graph:
• Any graph which contains some parallel edges but doesn't contain any self-loop is called a multigraph. For example, a road map.
• Parallel Edges: If two vertices are connected by more than one edge, such edges are called parallel edges: many routes between the same two places.
• Loop: An edge of a graph that starts from a vertex and ends at the same vertex is called a loop or a self-loop.

Exponents:

An expression that represents repeated multiplication of the same factor is called a power.
An exponent is the number of times the base is used as a factor in that multiplication.
For example, 2 to the 3rd (written 2^3) means 2 x 2 x 2 = 8.
Properties
• The properties of exponents, or laws of exponents, are used to solve problems involving exponents. These properties are also considered the major exponent rules to follow when working with exponents. They are listed below.
• Law of Product: a^m × a^n = a^(m+n)
• Law of Quotient: a^m / a^n = a^(m-n)
• Law of Zero Exponent: a^0 = 1 (for a ≠ 0)
• Law of Negative Exponent: a^(-m) = 1 / a^m
• Law of Power of a Power: (a^m)^n = a^(mn)
• Law of Power of a Product: (ab)^m = a^m b^m
• Law of Power of a Quotient: (a/b)^m = a^m / b^m
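
Each law can be spot-checked numerically. The sketch below uses exact fractions so that the comparisons are not affected by floating-point rounding; the particular values a = 2, b = 3, m = 5, n = 3 are arbitrary choices for the check, and a numeric check of this kind illustrates the laws without proving them.

```python
from fractions import Fraction

a, b = Fraction(2), Fraction(3)   # arbitrary non-zero bases
m, n = 5, 3                       # arbitrary integer exponents

laws = {
    "product": a**m * a**n == a**(m + n),
    "quotient": a**m / a**n == a**(m - n),
    "zero exponent": a**0 == 1,
    "negative exponent": a**(-m) == 1 / a**m,
    "power of a power": (a**m)**n == a**(m * n),
    "power of a product": (a * b)**m == a**m * b**m,
    "power of a quotient": (a / b)**m == a**m / b**m,
}

for name, holds in laws.items():
    print(name, holds)
```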
96
SQL
OBJECTIVE AND OUTCOME OF LECTURE
Introduction to Microsoft Excel

Lecture Objective: To learn about SQL and database tables.

Lecture Outcome: To implement SQL commands in a database.
DATABASE

A database is a collection of information that


is organized so that it can be easily accessed,
managed and updated. Computer databases
typically contain aggregations of data records
or files,
SQL
SQL stands for Structured Query Language.

SQL is a standard language for accessing and manipulating databases.

DATABASE TABLES

A database most often contains one or more tables.
Each table is identified by a name (e.g. "Customers" or "Orders").
Tables contain records (rows) with data.

1. DDL – Data Definition Language: used to create and modify the structure of objects in a database using predefined commands and a specific syntax. These database objects include tables, sequences, locations, aliases, schemas and indexes.

2. DML – Data Manipulation Language: used to make changes to the data in the database, such as the CRUD operations to create, read, update and delete data, using the INSERT, SELECT, UPDATE and DELETE commands. (SELECT is sometimes classed separately as DQL, Data Query Language.)

3. DCL – Data Control Language: administrative commands, such as GRANT and REVOKE, that control which users have access to the database.

4. TCL – Transaction Control Language: commands, such as COMMIT and ROLLBACK, that commit (save) or undo transactions done to the database or data.
MYSQL COMMANDS
SOME OF THE MOST IMPORTANT
SQL COMMANDS
SELECT - extracts data from a database
UPDATE - updates data in a database

DELETE - deletes data from a database

INSERT INTO - inserts new data into a database

CREATE DATABASE - creates a new database

ALTER DATABASE - modifies a database

CREATE TABLE - creates a new table

ALTER TABLE - modifies a table

DROP TABLE - deletes a table

CREATE INDEX - creates an index (search key)

DROP INDEX - deletes an index


104
CREATE TABLE

The CREATE TABLE statement is used to create a new table in a database.

To create a table with multiple columns, use the syntax below.

Syntax:
CREATE TABLE table_name (column1 datatype, column2 datatype, column3 datatype, ...)

The column parameters specify the names of the columns of the table.

The data type parameter specifies the type of data the column can hold (e.g. varchar, integer, date, etc.).

The EmpId column is of type int and will hold an integer.

The LastName, FirstName, Address, and City columns are of type varchar and will hold characters; the maximum length for these fields is 255 characters.
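
The table described above can be tried out without installing a database server by using Python's built-in `sqlite3` module. This is a sketch under two assumptions: the table name `Employees` is invented (the slide does not name the table), and SQLite stands in for MySQL. SQLite accepts the `VARCHAR(255)` declaration but does not enforce the length limit.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Table name "Employees" is a placeholder chosen for this example.
conn.execute("""
    CREATE TABLE Employees (
        EmpId INT,
        LastName VARCHAR(255),
        FirstName VARCHAR(255),
        Address VARCHAR(255),
        City VARCHAR(255)
    )
""")

# SQLite's way of inspecting a table's structure (name and declared type).
columns = [(row[1], row[2])
           for row in conn.execute("PRAGMA table_info(Employees)")]
print(columns)
```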

106
INSERT VALUE IN TABLE

The INSERT INTO statement is used to insert new records in a table.

It is possible to write the INSERT INTO statement in two ways.

Syntax

The first way specifies both the column names and the values to be inserted:
INSERT INTO table_name (column1, column2, ...) VALUES (value1, value2, ...)

The second way omits the column names. If you are adding values for all the columns of the table, you do not need to specify the column names in the SQL query. However, make sure that the order of the values is the same as the order of the columns in the table:
INSERT INTO table_name VALUES (value1, value2, ...)
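
The two forms of INSERT INTO can be compared side by side with Python's built-in `sqlite3` module. The table name `Employees` and the sample rows are invented for the illustration, and SQLite is used here in place of MySQL. The `?` placeholders are how `sqlite3` passes values safely into a statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmpId INT, LastName VARCHAR(255), "
    "FirstName VARCHAR(255), Address VARCHAR(255), City VARCHAR(255))"
)

# First way: name the columns; columns left out are stored as NULL.
conn.execute(
    "INSERT INTO Employees (EmpId, LastName, FirstName) VALUES (?, ?, ?)",
    (1, "Rao", "Asha"),
)

# Second way: no column names, so a value is needed for every column,
# in the same order as the columns were declared.
conn.execute(
    "INSERT INTO Employees VALUES (?, ?, ?, ?, ?)",
    (2, "Khan", "Imran", "12 Lake Road", "Pune"),
)

rows = conn.execute("SELECT * FROM Employees ORDER BY EmpId").fetchall()
print(rows)
```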

SELECT
Display the contents of the table
Syntax:
SELECT * FROM table_name
Example:
SELECT * FROM tasks
DESCRIBE TABLE
To view the structure / schema of a table

Syntax:
DESCRIBE table_name
DESC table_name

Example:
DESCRIBE tasks
DELETE
To delete the contents of the table
Syntax:
DELETE FROM table_name
DELETE FROM table_name WHERE condition
Example:
DELETE FROM tasks
DELETE FROM tasks WHERE task_id=1
UPDATE
To update a value in a table

Syntax:
UPDATE table_name SET field1 = new-value1, field2 = new-value2 [WHERE Clause]

Example:
UPDATE tasks SET task_name='xyz' WHERE task_id=1
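
The SELECT, UPDATE and DELETE statements can be run end to end on a small `tasks` table using Python's built-in `sqlite3` module. The column layout of `tasks` and the sample rows are assumptions made for this sketch, and SQLite stands in for MySQL. Note that the standard form of the delete statement is `DELETE FROM table_name`, without a `*`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (task_id INT, task_name VARCHAR(255))")
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?)",
    [(1, "abc"), (2, "def"), (3, "ghi")],   # invented sample rows
)

def all_tasks():
    """SELECT * FROM tasks, returned as a list of rows."""
    return conn.execute("SELECT * FROM tasks ORDER BY task_id").fetchall()

# UPDATE with a WHERE clause changes only the matching row.
conn.execute("UPDATE tasks SET task_name = 'xyz' WHERE task_id = 1")
after_update = all_tasks()

# DELETE with a WHERE clause removes only the matching row.
conn.execute("DELETE FROM tasks WHERE task_id = 1")
after_delete_one = all_tasks()

# DELETE without a WHERE clause empties the table but keeps its structure.
conn.execute("DELETE FROM tasks")
after_delete_all = all_tasks()

print(after_update, after_delete_one, after_delete_all)
```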
DROP
TRUNCATE
MYSQL DATA TYPES
1. NUMERIC DATA TYPE
2. DATETIME DATA TYPE
3. STRING DATA TYPE
HOW TO IMPORT MYSQL DATABASE INTO EXCEL
1. Create a new workbook in MS Excel
2. Click on the DATA tab
3. Select the From Other Sources button
4. Select From SQL Server
5. Enter the server name/IP address. For this tutorial, we connect to localhost (127.0.0.1)
6. Choose the login type. If you are on a local machine and have Windows authentication enabled, you can use Windows authentication.
7. If you are connecting to a remote server, you will need to provide user id and password details.
8. Click on the Next button
9. Select EmployeesDB from the drop-down list
10. Click on the employees table to select it
11. Click on the Next button.
THANK YOU
