
Introductions To Data Science - Lecture 1 - Introduction


Introduction to Data Science
Lecture 1 - Introduction

Ph.D. Vahan Sargsyan


2021
Welcome to Data Science
• The sexiest profession of the 21st century.
About Myself
• Who: Ph.D. Vahan Sargsyan
• Where: CERGE-EI, Prague, Czech Republic
• What: Data Scientist at NetSuite (Oracle)
• When: Monday 07:30 – 09:00 CEST, Thursday 07:30 – 08:15 CEST

• Teaching Assistants:
• International School of Economics at TSU (ISET) - Giorgi Kvinikadze ([email protected])
• Far Eastern Federal University (FEFU) - Valeria Shichalina ([email protected])
• Novosibirsk State University (NSU) - Elena Limanova ([email protected])
• Westminster International University (WIUT) - Ziyodakhon Malikova ([email protected])

• Contact information: [email protected]
• Office hours: Tuesday 07:30 – 08:30 CET, or by appointment.
About You!
• Far Eastern Federal University (Russia)
• Novosibirsk State University (Russia)
• Westminster International University (Uzbekistan)
• International School of Economics at TSU (ISET, Georgia)

• https://cutt.ly/jc4Bifu
About You!
• DO NOT HESITATE TO ASK QUESTIONS!!!
About the Course
• Check the Syllabus
Data Science as a Profession
• Statistics and Mathematics
• Probability, algebra, regression, etc.
• Choose procedure
• Diagnose problem
• Coding
• Databases
• Tools, programming languages
• Field-specific knowledge
• Experience in field
• Goals, methods and constraints (Domain)
[Venn diagram: three circles labeled Statistics and Mathematics, Coding, and Field-specific knowledge. Statistics and Mathematics + Coding = Machine Learning; Statistics and Mathematics + Field-specific knowledge = Traditional Research; Coding + Field-specific knowledge = Danger Zone; all three together = Data Science.]
Coding and Software
• Programming Languages
What is Machine Learning?
• Term coined by Arthur Samuel (IBM) in 1959
• Machines doing things without being explicitly programmed to do so.
• An algorithm that is able to do two consecutive steps:
1. Find a pattern in the data;
2. Make a prediction based on the found pattern.
• Human learning is very similar:

X1   X2   Y
2    5    12
4    3    10
4    2    8
3    1    ?

1*2 + 2*5 = 12
1*4 + 2*3 = 10
1*4 + 2*2 = 8
1*X1 + 2*X2 = Y
1*3 + 2*1 = 5, so ? = 5
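The two steps above can be sketched in Python. This is only an illustration, not the lecturer's method: it assumes the pattern is a linear rule a*X1 + b*X2 = Y with small integer coefficients, and the names `find_pattern` and `predict` are hypothetical.

```python
from itertools import product

# Training rows from the slide: (X1, X2, Y)
rows = [(2, 5, 12), (4, 3, 10), (4, 2, 8)]


def find_pattern(rows, search_range=range(-5, 6)):
    """Step 1: search for integer (a, b) with a*X1 + b*X2 == Y on every row.

    Assumption: the rule is linear with small integer coefficients.
    Returns the first matching pair, or None if nothing fits.
    """
    for a, b in product(search_range, repeat=2):
        if all(a * x1 + b * x2 == y for x1, x2, y in rows):
            return a, b
    return None


def predict(pattern, x1, x2):
    """Step 2: apply the found pattern to a new input."""
    a, b = pattern
    return a * x1 + b * x2


pattern = find_pattern(rows)
print(pattern)                 # (1, 2)
print(predict(pattern, 3, 1))  # 5
```

Real learning algorithms do not enumerate coefficients like this; the brute-force search is used here only because it makes the "find a pattern, then predict" split easy to see.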
What is Machine Learning?
• Term coined by Arthur Samuel (IBM) in 1959
• Machines doing things without being explicitly programmed to do so.
• An algorithm that is able to do two consecutive steps:
1. Find a pattern in the data;
2. Make a prediction based on the found pattern.
• Human learning is very similar:

Y    X2   X1
2    5    12
4    3    10
4    2    8
3    1    ?

-2*5 + 1*12 = 2
-2*3 + 1*10 = 4
-2*2 + 1*8 = 4
-2*X2 + 1*X1 = Y
-2*1 + 1*? = 3, so ? = 5
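This second reading of the same numbers can be checked with a short sketch. It assumes the columns are read as Y, X2, X1, so the unknown is an input rather than the output; `solve_for_x1` is a hypothetical helper name.

```python
# Same numbers as on the slide, now read as (Y, X2, X1).
known_rows = [(2, 5, 12), (4, 3, 10), (4, 2, 8)]

# Candidate rule from the slide: Y = -2*X2 + 1*X1
A, B = -2, 1

# The rule should reproduce Y on every fully known row.
assert all(A * x2 + B * x1 == y for y, x2, x1 in known_rows)


def solve_for_x1(y, x2, a=A, b=B):
    """Rearrange Y = a*X2 + b*X1 to get the missing X1."""
    return (y - a * x2) / b


print(solve_for_x1(y=3, x2=1))  # 5.0
```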
Disadvantages and Advantages of Machine Learning
• Disadvantages:
• Slightly distorted inputs → completely wrong output
• Advantages:
• The computing power of the machine → may reveal a non-ideal functional form of the relationship.

Ideal relationship:
(1*)X1   (+2*)X2   (=)y
2        5         12
4        3         10
4        2         8
3        1         5

With a third variable and an error term:
(1*)X1   (+2*)X2   (-0.5*)X3   (=)y   (+)error   (=)Y
2        5         4           10     -1         9
4        3         6           7      +0.6       7.6
4        2         2           7      -0.8       6.2
3        1         10          0      +0.4       0.4
6        3         2           11     +1         12
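The second table can be reproduced in a few lines. This is a minimal sketch that assumes the rule y = 1*X1 + 2*X2 - 0.5*X3 and the error values as read off the slide; the function name `signal` is hypothetical.

```python
import math

# Rows from the slide: (X1, X2, X3, error)
data = [
    (2, 5, 4, -1.0),
    (4, 3, 6, 0.6),
    (4, 2, 2, -0.8),
    (3, 1, 10, 0.4),
    (6, 3, 2, 1.0),
]


def signal(x1, x2, x3):
    """Noise-free part of the relationship: y = 1*X1 + 2*X2 - 0.5*X3."""
    return 1 * x1 + 2 * x2 - 0.5 * x3


for x1, x2, x3, error in data:
    y = signal(x1, x2, x3)   # what the ideal rule gives
    observed = y + error     # what is actually recorded (Y)
    print(x1, x2, x3, y, round(observed, 1))

# A learner that only sees X1 and X2 and assumes Y = 1*X1 + 2*X2
# misses both the X3 term and the noise:
naive = [1 * x1 + 2 * x2 for x1, x2, _, _ in data]
print(naive)  # [12, 10, 8, 5, 12]
```

Comparing `naive` with the recorded Y column shows how far a rule that looks perfect on clean data can drift once an unobserved variable and noise enter.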
Advantages of Machine Learning
• The computing power of the machine.

• What if:
• the functional form of the relationship is not ideal;
• the provided data is not complete.
(1*)X1   (+2*)X2   (-0.5*)X3   (=)Y
2        5         -           9
4        -         -           7.6
4        2         -           6.2
-        1         -           0.4
6        3         -           ?
(- marks a value that is not provided)
Econometrics and Regression
• Econometrics is the application of statistical methods to economic data
in order to give empirical content to economic relationships (i.e. find
patterns).
• In statistical modeling, regression analysis is a set of statistical
processes for estimating the relationships between a dependent
variable and one or more independent variables.
• Some of the most common regression models:
• Ordinary Least Squares (OLS);
• Logit;
• Random Forest;
• Neural Networks and Deep Learning.
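As an illustration of the first model in the list, here is a minimal sketch of OLS with one independent variable, using the closed-form estimates slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The data points are made up for this example, and `ols_fit` is a hypothetical function name.

```python
def ols_fit(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sxx: spread of x around its mean; Sxy: co-movement of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data, roughly following y = 2*x with small noise.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = ols_fit(x, y)
print(round(intercept, 2), round(slope, 2))  # 0.05 1.99
```

In practice one would likely use a library for this; the hand-written version is meant only to show what "estimating the relationship" amounts to in the simplest case.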
Data Science and Machine Learning
• Research for an appropriate regression model: about 5%;
• Data mining and cleaning: about 80%;
• Trial and improvement, retrial and improvement, …: about 15%.

• Machine learning is a continuously evolving system based on new data and corrections of past predictions.

• Python as a programming language for ML.


This course
• Planning
• Data Gathering
• Data Cleaning
• Modeling
• Analytics and Evaluation
