ProgrammingForDS17_DataScience


Data Science

Liana Harutyunyan
Programming for Data Science
May 1, 2024
American University of Armenia
[email protected]

Data Science

• Data science is an interdisciplinary field that uses
algorithms, procedures, and processes to examine large
amounts of data in order to uncover hidden patterns,
generate insights, and direct decision making.

• It includes, but is not limited to, these stages:
  • Data collection, cleaning, and preprocessing
  • Exploratory data analysis
  • Machine learning (finding insights)
  • Communication and interpretation of results

Programming for Data Science

The main programming languages used for Data Science are:

• Python - a general-purpose programming language
• R - primarily used for statistical analysis and
data visualization


However, both are developing quickly, and the choice
often comes down to personal preference.

Data Science and Machine Learning

Machine Learning is a part of Data Science.
Deep Learning is a part of Machine Learning.

Main directions

Machine Learning has three main ways of learning
(algorithms are also divided into these categories):
• Supervised Learning - you have a labeled dataset,
with both inputs and outputs
  • regression
  • classification

• Unsupervised Learning - you have an unlabeled dataset, with inputs only;
we give the machine the data and ask it to look for
hidden features

• Reinforcement Learning - the machine learns by receiving
positive and negative rewards from the environment
(similar to how a child learns)
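
The difference between the first two categories can be sketched in Python with scikit-learn. This is a minimal illustration, not part of the original slides: the toy numbers are made up, and `LogisticRegression` and `KMeans` are chosen here only as one example of a supervised and an unsupervised algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Toy inputs (made up): two well-separated groups of 1-D points
X = np.array([[1.0], [1.5], [2.0], [8.0], [8.5], [9.0]])

# Supervised: we also provide the outputs (labels) for every input
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y)
supervised_pred = clf.predict(np.array([[1.2], [8.8]]))

# Unsupervised: only the inputs are given; the algorithm groups them itself
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
clusters = km.labels_

print("Predicted labels:", supervised_pred)
print("Discovered clusters:", clusters)
```

The classifier predicts labels it was taught, while the cluster ids found by `KMeans` are arbitrary: only the grouping itself is meaningful.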

Linear Regression example

Imagine you have a dataset of flat areas and prices, and you
want to predict the price of an apartment based on its
area.
Let’s do the same exercise in both Python and R.

• Build a toy dataset of the area-price relationship.
• Use built-in libraries and functions to fit the
relationship and predict prices.
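
A possible Python version of this exercise is sketched below. The areas and prices are invented for illustration (they are not from the slides), and scikit-learn's `LinearRegression` is assumed as the "built-in" tool.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy dataset (made-up numbers): area in square meters, price in thousands
area = np.array([30, 45, 60, 75, 90, 120]).reshape(-1, 1)  # 2-D: one column
price = np.array([45, 62, 80, 99, 115, 150])

# Fit a straight line: price ≈ slope * area + intercept
model = LinearRegression()
model.fit(area, price)

slope = model.coef_[0]
intercept = model.intercept_

# Predict the price of a 100 m^2 apartment
predicted = model.predict(np.array([[100]]))[0]

print(f"price ≈ {slope:.2f} * area + {intercept:.2f}")
print(f"Predicted price for 100 m^2: {predicted:.1f}")
```

The input is reshaped into a single-column 2-D array because scikit-learn estimators expect one row per sample and one column per feature.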

Classical Models

• In Python, most classical Machine Learning
algorithms are implemented in the sklearn (scikit-learn) package.
• In R, some algorithms are in the base library, while others
have their own packages. A unified package is currently
being developed.
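
One practical consequence of having the algorithms in a single Python package is that they share the same `fit` / `predict` interface. The sketch below assumes a made-up area-price dataset and picks three scikit-learn regressors arbitrarily to show that swapping models changes only one line.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Toy dataset (made-up numbers): area in square meters, price in thousands
area = np.array([30, 45, 60, 75, 90, 120]).reshape(-1, 1)
price = np.array([45, 62, 80, 99, 115, 150])

# Three different classical models, all used through the same two methods
models = [
    LinearRegression(),
    DecisionTreeRegressor(random_state=0),
    KNeighborsRegressor(n_neighbors=2),
]

predictions = {}
for m in models:
    m.fit(area, price)                       # same call for every model
    value = m.predict(np.array([[100]]))[0]  # same call for every model
    predictions[type(m).__name__] = float(value)

for name, value in predictions.items():
    print(f"{name}: {value:.1f}")
```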

Deep Learning Models

To write Deep Learning models (neural networks) in Python, you
typically use one of the following libraries:

• PyTorch
• TensorFlow

They require more code than the classical models.

(For R, deep learning libraries are being developed, but
they are not widely used.)
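
To give a feel for the extra code involved, here is a minimal PyTorch sketch that fits a single linear layer to a toy area-price dataset. The data, the scaling by 100, the learning rate, and the number of epochs are all assumptions made for this illustration; note that the training loop, which a classical model hides behind `fit`, has to be written by hand.

```python
import torch
from torch import nn

torch.manual_seed(0)  # make the random initial weights reproducible

# Toy data (made up): price = 2 * area + 10
area = torch.tensor([[30.0], [45.0], [60.0], [75.0], [90.0]])
price = 2.0 * area + 10.0

# Scale both to a small range so plain gradient descent is stable
x = area / 100.0
y = price / 100.0

model = nn.Linear(1, 1)      # one input feature, one output
loss_fn = nn.MSELoss()       # mean squared error
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for epoch in range(500):
    optimizer.zero_grad()    # reset gradients from the previous step
    pred = model(x)          # forward pass
    loss = loss_fn(pred, y)  # how wrong are we?
    loss.backward()          # compute gradients
    optimizer.step()         # update the weights
    losses.append(loss.item())

first_loss, last_loss = losses[0], losses[-1]
print(f"loss went from {first_loss:.4f} to {last_loss:.6f}")
```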

Other terms

Other terms used in Data Science:

• Web Scraping - extracting data from web pages (for example, tables)
that is not already offered as a download
• APIs - downloading data through an API (you send a request and
receive the data in the reply)
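
Both ideas can be sketched with Python's standard library only. To keep the example runnable offline, the HTML page and the API reply below are hard-coded, made-up strings; in practice they would come from an HTTP request (for instance via `urllib.request.urlopen`), and the `TableParser` class and the `"flats"` field are hypothetical names invented for this sketch.

```python
import json
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


# Web scraping: the data is embedded in a page's HTML (made-up page)
html = (
    "<table>"
    "<tr><th>area</th><th>price</th></tr>"
    "<tr><td>30</td><td>45</td></tr>"
    "<tr><td>60</td><td>80</td></tr>"
    "</table>"
)
parser = TableParser()
parser.feed(html)
scraped_rows = parser.rows

# API: the server replies with structured data, often JSON (made-up reply)
api_reply = '{"flats": [{"area": 30, "price": 45}, {"area": 60, "price": 80}]}'
api_rows = json.loads(api_reply)["flats"]

print(scraped_rows)
print(api_rows)
```

With scraping, you have to dig the values out of markup meant for humans, and they arrive as strings; with an API, the reply is already structured and typed.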
