FoDS - L1

Foundations of Data Science

(CS F320)
Prof.N.L.Bhanu Murthy
BITS Pilani
Hyderabad Campus
What is Science?

“Science” is derived from the Latin word scientia, meaning “knowledge”.

“Science is the study of the nature and behaviour of natural things and
the knowledge that we obtain about them.” - Collins

Physical Science

Physical Science is the scientific study of non-living matter.

– Chemistry
The study of all forms of matter,
including how matter interacts
with other matter.

– Physics
The study of energy and
how it affects matter.

Data..
“Data is the New Oil” – World Economic Forum 2011

Data Science

Data science is an interdisciplinary field that uses methods,
processes, algorithms and systems to extract knowledge and
insights from data in various forms, both structured and
unstructured.
“The ability to take data — to be able to understand it, to process it, to extract
value from it, to visualize it, to communicate it — that’s going to be a hugely
important skill in the next decades.”
Hal Varian, chief economist at Google and UC Berkeley professor of
information sciences, business, and economics

Data Science Pipeline

Machine Learning

Machine Learning is the study of algorithms that
improve performance P
at some task T
with experience E
Tom Mitchell (1990)

Well-defined learning task: <P,T,E>


– Regression
– Supervised learning
– Unsupervised learning

Data Mining
Data mining (knowledge discovery from data):
“Extraction of interesting (non-trivial, implicit, previously unknown
and potentially useful) patterns or knowledge from huge
amounts of data”

Association Rule Discovery [Descriptive]
Sequential Pattern Discovery [Descriptive]
Deviation Detection [Predictive]
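
As a rough illustration of association rule discovery, the sketch below computes the two usual rule measures, support and confidence, for a rule on a toy set of transactions. The transactions, the item names, and the helper names `support` and `confidence` are made up for illustration and do not come from the slides.

```python
# Toy market-basket data (hypothetical, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


# Rule under test: {diapers} -> {beer}
rule_support = support({"diapers", "beer"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
print(f"support = {rule_support:.2f}, confidence = {rule_confidence:.2f}")
```

A rule is usually reported only when both measures exceed user-chosen thresholds; those thresholds are a modelling choice, not something the slides specify.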

Information Retrieval

[Figure: a query, a document collection, an IR retrieval system, and the answer list it returns]

RRR vs KGF 2 which is better in Hindi?

– Multimedia information retrieval (MIR)
– Recommender systems
– Web search
Probability Foundations for Data Science
“Probability is a mathematical tool to model uncertainty”

– Frequentist vs Bayesian perspective of Probability
– Probability distributions: Gaussian, Beta, Bernoulli and Binomial
– Maximum likelihood and Bayesian Inference
– Probabilistic perspective of Polynomial Curve Fitting
– Bayesian Curve Fitting
– Mixture of Gaussians and Probability Bounds
– Nonparametric Methods: Nearest-neighbour methods
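
To make the contrast between maximum likelihood and Bayesian inference concrete, here is a minimal sketch for a Bernoulli (coin-flip) model with a Beta prior. The prior parameters `a=2, b=2`, the observed counts, and the function names are assumptions chosen for illustration only.

```python
def bernoulli_mle(heads, tails):
    """Maximum likelihood estimate of P(heads): the observed frequency."""
    return heads / (heads + tails)


def beta_posterior(heads, tails, a=2.0, b=2.0):
    """Posterior Beta parameters after observing the counts.

    The Beta prior is conjugate to the Bernoulli likelihood, so the
    update just adds the counts to the (assumed) prior parameters a, b.
    """
    return a + heads, b + tails


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Three flips, all heads (toy data).
heads, tails = 3, 0
mle = bernoulli_mle(heads, tails)
a_post, b_post = beta_posterior(heads, tails)
posterior_mean = beta_mean(a_post, b_post)

print("MLE:", mle)
print("Posterior: Beta(%g, %g), mean %.3f" % (a_post, b_post, posterior_mean))
```

With so few observations the maximum likelihood estimate jumps to an extreme value, while the posterior mean stays closer to the prior; as more data arrives the two estimates converge.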

Decision & Information Theory Foundations

– Minimizing misclassification rate & expected loss
– Inference and decision
– Loss functions for regression
– Relative Entropy and Mutual Information
– Decision Tree
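
The information-theoretic quantities listed above can be sketched for discrete distributions as follows. This is an illustrative sketch, not course code: the function names are made up, logarithms are base 2 (so results are in bits), and `kl_divergence` assumes `q[i] > 0` wherever `p[i] > 0`.

```python
import math


def entropy(p):
    """Shannon entropy H(p) in bits; zero-probability terms contribute 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def kl_divergence(p, q):
    """Relative entropy KL(p || q) in bits.

    Assumes q[i] > 0 wherever p[i] > 0 (otherwise the value is infinite).
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mutual_information(joint):
    """I(X; Y) in bits from a joint probability table joint[i][j]."""
    px = [sum(row) for row in joint]          # marginal of X (rows)
    py = [sum(col) for col in zip(*joint)]    # marginal of Y (columns)
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )


fair_coin = [0.5, 0.5]
biased_coin = [0.9, 0.1]
independent = [[0.25, 0.25], [0.25, 0.25]]
identical = [[0.5, 0.0], [0.0, 0.5]]

print("H(fair) =", entropy(fair_coin))
print("KL(fair || biased) =", kl_divergence(fair_coin, biased_coin))
print("I(independent) =", mutual_information(independent))
print("I(identical) =", mutual_information(identical))
```

Mutual information is the relative entropy between the joint distribution and the product of its marginals, which is why it vanishes for the independent table.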

Computational Foundations for Data Science
– Unconstrained/Constrained optimization
– Equality/inequality constraints
– Convex optimization
– Lagrange multiplier
– Primal/dual concept
– Quadratic programming
– Kernel Machines for Regression
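
As a small worked example of the Lagrange multiplier technique for an equality constraint, consider the following problem. The objective and constraint are chosen purely for illustration and are not taken from the slides.

```latex
\begin{aligned}
&\text{maximize } f(x_1, x_2) = 1 - x_1^2 - x_2^2
  \quad \text{subject to } g(x_1, x_2) = x_1 + x_2 - 1 = 0 \\
&L(x_1, x_2, \lambda) = 1 - x_1^2 - x_2^2 + \lambda\,(x_1 + x_2 - 1) \\
&\frac{\partial L}{\partial x_1} = -2x_1 + \lambda = 0, \quad
 \frac{\partial L}{\partial x_2} = -2x_2 + \lambda = 0, \quad
 \frac{\partial L}{\partial \lambda} = x_1 + x_2 - 1 = 0 \\
&\Rightarrow\; x_1 = x_2 = \tfrac{1}{2}, \quad \lambda = 1, \quad
 f\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = \tfrac{1}{2}
\end{aligned}
```

The first two stationarity conditions force x_1 = x_2 = λ/2, and substituting into the constraint gives λ = 1.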

Curse of Dimensionality

– Dimensionality Reduction
– Principal Component Analysis
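
One common way to illustrate the curse of dimensionality is to look at how the volume of a ball concentrates near its surface as the dimension grows. The sketch below is an illustration under that framing, not something stated on the slide; the function name `shell_fraction` and the shell thickness of 0.1 are assumptions.

```python
def shell_fraction(dim, eps):
    """Fraction of a unit ball's volume lying within eps of its surface.

    The volume of a ball of radius r in `dim` dimensions scales as
    r ** dim, so the inner ball of radius (1 - eps) holds a fraction
    (1 - eps) ** dim of the total, and the outer shell holds the rest.
    """
    return 1.0 - (1.0 - eps) ** dim


for dim in (1, 2, 20, 200):
    print(f"D = {dim:3d}: fraction in outer 10% shell = {shell_fraction(dim, 0.1):.6f}")
```

In high dimensions almost all of the volume sits in a thin outer shell, which is one reason intuition from two or three dimensions breaks down and dimensionality reduction becomes attractive.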

Data Visualization

Mapping data to graphical elements such as:

– Histograms and Pie Charts
– Box Plot
– Percentile Plots
– Empirical Cumulative Distribution Functions
– Scatter Plots
– Visualizing Spatio-temporal Data
– OLAP and Multidimensional Data Analysis
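
Of the plots listed above, the empirical cumulative distribution function is simple enough to compute by hand before plotting it. The sketch below uses only the standard library; the sample values and the name `make_ecdf` are hypothetical.

```python
from bisect import bisect_right


def make_ecdf(sample):
    """Return F, where F(x) is the fraction of sample values <= x."""
    xs = sorted(sample)
    n = len(xs)

    def ecdf(x):
        # bisect_right counts how many of the sorted values are <= x.
        return bisect_right(xs, x) / n

    return ecdf


sample = [3, 1, 4, 1, 5]  # toy data
F = make_ecdf(sample)
for x in (0, 1, 3, 5):
    print(f"F({x}) = {F(x)}")
```

Plotting F as a step function over the sorted sample gives the ECDF plot; percentile plots read the same curve with the axes swapped.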

Data Preprocessing Techniques

– Types of Data
– Data Quality
– Feature Extraction
– Feature Subset Selection
– Discretization and Binarization
– Measures of Similarity and Dissimilarity
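
Two of the preprocessing steps above, discretization and binarization, can be sketched in a few lines. This is one simple variant (equal-width bins followed by one-hot encoding), shown under assumptions: the function names and the `ages` values are invented, and `equal_width_bins` assumes the values are not all identical.

```python
def equal_width_bins(values, k):
    """Discretize numeric values into k equal-width bins labelled 0..k-1.

    Assumes max(values) > min(values), so the bin width is non-zero.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), k - 1) for v in values]


def binarize(labels, k):
    """One-hot encode bin labels: one binary attribute per bin."""
    return [[1 if label == j else 0 for j in range(k)] for label in labels]


ages = [18, 22, 35, 41, 60]  # toy continuous attribute
bins = equal_width_bins(ages, 3)
one_hot = binarize(bins, 3)

print("bins:   ", bins)
print("one-hot:", one_hot)
```

Equal-width binning is sensitive to outliers; equal-frequency binning is a common alternative when the values are heavily skewed.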

Introduction to Big Data & Analytics

Applications of Data Science

PageRank: The web as a behavioral dataset
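
For readers who have not seen PageRank before, the following is a minimal power-iteration sketch on a four-page toy graph. The graph, the damping factor of 0.85, the iteration count, and the handling of pages without outgoing links are all assumptions made for illustration; the sketch also assumes every link target appears as a key in `links`.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank scores by repeated redistribution of rank.

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page receives a baseline share from random jumps.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's damped rank evenly among its targets.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: C is linked to by A, B and D.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

The scores form a probability distribution over pages, so they sum to 1; a page that nobody links to, such as D here, keeps only the baseline share.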

Applications of Data Science

Sponsored search
– Google's revenue from marketing is around $50 bn/year, 97% of the
company's revenue.
– Sponsored search uses an auction: a pure competition among
marketers trying to win access to consumers.

Applications of Data Science

“In the 21st century, the candidate with [the] best data, merged with
the best messages dictated by that data, wins.”

Andrew Rasiej, Personal Democracy Forum

Teaching & Evaluation (CS F320 – L P U – 3 0 3)

Evaluation Components & Criteria


Component                       Duration    Weightage   Date & Time   Mode
Mid Sem Test                    90 mins     30%         TBA           Closed
Class Participation             5-10 mins   10%         Surprise      Open
Programming Assignments (2-3)   -           20%         TBA           Open
Comprehensive                   180 mins    40%         TBA           Closed

Make-up Policy: Make-ups will be granted only in genuine cases and only
with prior permission.

Course Notices: All notices will be posted on CMS. Students are strongly
advised to log in to CMS and check for notices often.

Chamber Consultation: Tuesday 5PM – 6PM

Text Book

T1. Christopher Bishop: Pattern Recognition and Machine Learning,
Springer International Edition.

Text Book

T2. Tan, Pang-Ning & others: Introduction to Data Mining, Pearson Education,
2006.
Reference Book

R1. Avrim Blum, John Hopcroft, Ravindran Kannan: Foundations of Data
Science, Cambridge University Press.
Reference Book

R2. Tom M. Mitchell: Machine Learning, The McGraw-Hill Companies, Inc.

Thank You!!
