Tools and Techniques for Data Science
Can you become a Data Scientist?

Intended Audience
Designed for students with no Data Science background but with some understanding of Mathematics, Statistics, and Programming.

Data Science

 A cross-disciplinary approach to solving data-rich problems.
 Also known as data-driven science, it is an interdisciplinary field of scientific methods, processes, and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to data mining (Wikipedia, https://en.wikipedia.org/wiki/Data_science).

Data Scientist

 Someone who extracts insights from messy data.
 Someone who knows more statistics than a computer scientist and more computer science than a statistician.
Data Science and Applications

 What is Data Science? Data Science is a cross-disciplinary approach to solving data-rich problems.

[Figure: Venn diagram of three overlapping areas, Computer Science (Hacking & Coding), Math & Stats, and Domain Expertise (Marketing), with Data Science at the center. Labels in the diagram: Scripting Language, SQL, Data Preparation, Data Governance, Constraints (Privacy, Legal), Data Engineer, Machine Learning, Predictive Analytics, Model fitting, Experimental design, Advanced math, Statistical packages, Statistical Research, Traditional Research, Get the right data, Understand customers, Define metrics that matter, Ask Good Questions, Make it actionable, Translate for non-tech.]
Data Scientist Profile

https://www.datascience.com/blog/data-scientist-skills
Data Science Process

 Data Science Steps


 Data Gathering
If the organization has no data-gathering setup in place, you may have to collect the data in fragments, using credentials from different stakeholders across the organization.
 Data Preparation
Cleaning and formatting gathered data is an art. Real-world data is very messy: missing values, errors in the collection process, inconsistent formatting, unnormalized values, and outliers are all issues you have to learn to deal with.
 Exploration
Before modeling, visualize the data: clustering, scatterplots, bar graphs, and Chernoff faces are all interesting ways to understand structures in the data, which helps in model building.
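
The data preparation issues named above (missing values, outliers, normalizing) can be illustrated with a minimal sketch using only the Python standard library. The column of ages is made-up data, the helper names are hypothetical, and the choices of median imputation, the 1.5 x IQR outlier rule, and min-max scaling are assumptions for illustration, not the only reasonable options.

```python
from statistics import median, quantiles

def fill_missing(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Made-up column: two missing ages and one implausible entry (120).
raw_ages = [23, 25, None, 27, 24, 26, 120, None, 25]

filled = fill_missing(raw_ages)
outliers = iqr_outliers(filled)
kept = [v for v in filled if v not in outliers]
scaled = min_max_normalize(kept)

print("filled:  ", filled)
print("outliers:", outliers)
print("scaled:  ", scaled)
```

In practice a library such as pandas would usually handle these steps, and whether to drop, cap, or keep an outlier depends on the problem.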
Data Science Process
 Data Science Steps
 Model Building
Here you get the opportunity to explore different models and find the best-suited one. No single model fits every problem; you need intuition to pick a strong candidate from among Random Forests, SVMs, Bayesian predictors, neural networks, deep learning, and K-Nearest Neighbors, among others.
 Model Validation
Prediction accuracy is the standard benchmark for model validation. In some cases, however, false positives and false negatives also matter from a problem perspective. If you are predicting a disease, you would care more about false negatives, since a missed case may result in a person's death, whereas a false positive only leads to further testing.
 Model Deployment
You get the opportunity to tweak and improve your model after deploying it in the wild.
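
The point made under Model Validation, that accuracy alone can hide costly false negatives, can be shown with a small sketch. The labels below are invented for a hypothetical disease screening of 20 patients (1 = sick, 0 = healthy), and the function name is an assumption for illustration.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 0 and p == 1:
            fp += 1
        elif a == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

# Made-up screening results: 2 sick patients out of 20.
actual    = [1, 1] + [0] * 18
predicted = [1, 0] + [0] * 17 + [1]

tp, fp, tn, fn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / len(actual)
recall = tp / (tp + fn)               # share of sick patients that were caught
false_negative_rate = fn / (tp + fn)  # share of sick patients that were missed

print(f"accuracy={accuracy:.2f} recall={recall:.2f} "
      f"false_negative_rate={false_negative_rate:.2f}")
```

With these invented labels the model looks accurate overall, yet it misses half of the sick patients, which is why recall and the false negative rate are worth reporting next to accuracy for this kind of problem.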
Amount of data produced yearly

1 zettabyte = 10^21 bytes = 1 billion TB
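
The zettabyte conversion can be checked with integer arithmetic. This is a small sketch that assumes decimal (SI) units, where 1 TB = 10^12 bytes; the constant names are chosen for illustration.

```python
# Decimal (SI) unit sizes in bytes; binary units (ZiB, TiB) would differ.
ZETTABYTE = 10**21
TERABYTE = 10**12

terabytes_per_zettabyte = ZETTABYTE // TERABYTE
print(f"1 ZB = {terabytes_per_zettabyte:,} TB")
```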
Applications

Job Titles and Salaries

Why Big Data Careers Are a Big Craze

 Huge Demand for Big Data Professionals

 The Shortage of Big Data Talent

 Wide Choice of Job Types and Technologies


 Predictive Analytics
 Prescriptive Analytics
 Descriptive Analytics

Cloud and Big Data

 How cloud, big data and AI are key to the future


 71% of enterprises globally predict their investments in data and analytics will accelerate in
the next three years and beyond
 57% of enterprises globally have a Chief Data Officer, a leadership role that is pivotal in
helping to democratise data and analytics across any organisation
 52% of enterprises are leveraging advanced and predictive analytics today to provide greater
insights and contextual intelligence into operations
 41% of all enterprises are considering a move to cloud-based analytics in the next year
 Cloud computing (24%), big data (20%), and AI/machine learning (18%) are the three
technologies predicted to have the greatest impact on analytics over the next five years
 Just 16% of enterprises have enabled at least 75% of their employees to have access to
company data and analytics

 Source: https://www.cloudcomputing-news.net/news/2018/aug/23/global-state-of-enterprise-analytics-2018/
Data Science Tools

These data science tools are used to analyze data and generate predictions:
 DaR (Decision analysis and resolution)
 Python (programming language)
 Jupyter (programming environment)
 Tableau (data visualization)
 Spark with ML (fast and realtime processing/analysis of data)
 Hadoop (Big data analytics tool)
 SAS (Statistical analysis tool)
 SQL
 Orange (data analysis tool)
 Weka (data analysis tool)

Data Science Techniques

 Machine Learning
 Regression
 Logistic Regression
 K Means Clustering
 Association Analysis
 Decision Trees
 Text Mining
 Social Network Analysis
 Time Series Forecasting
 Pareto Analysis
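
As one concrete instance of the techniques listed above, simple linear regression can be sketched with ordinary least squares in plain Python. The data (hours studied versus exam score) is invented and lies exactly on a line, and the function name is hypothetical; real projects would normally use a statistics or machine learning library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: exam score as a function of hours studied.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)
predicted_score = slope * 6 + intercept  # prediction for 6 hours of study

print(f"score = {slope:.1f} * hours + {intercept:.1f}")
print(f"predicted score for 6 hours: {predicted_score:.1f}")
```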

Sources

 https://www.kdnuggets.com/2019/05/poll-top-data-science-machine-learning-platforms.html

 https://www.softwaretestinghelp.com/data-science-tools/

 https://data-flair.training/blogs/data-science-tools/

 https://www.youtube.com/watch?v=mNdbcHECGN4

 https://www.slideshare.net/MSDEVMTL/data-science-presentation-94481624
