DSF 1-2

The document provides an overview of data science fundamentals. It discusses how data science uses statistics, data analysis, and machine learning to extract knowledge and insights from data. It explains why data science has become important in the era of big data, where large data sets require machine processing. The document also describes key components of data science like data wrangling, modeling, and visualization. It discusses how data science has evolved from earlier statistical techniques limited by computing power.

Uploaded by

Sultan mehmood hamza

DATA SCIENCE FUNDAMENTALS
DSC293
Lecture 1-2
Dr. Hufsa Mohsin
OVERVIEW
 Data Science is a combination of multiple disciplines that uses
 statistics,
 data analysis, and
 machine learning

 to analyze data and to extract knowledge and insights from it.

WHY DATA SCIENCE
 Information is what we want, but data are what we’ve got.
 The techniques for transforming data into information go back hundreds of years.
 Early “bills” were tabulations: a condensation of data on individual events into a form more readily
assimilated by the human reader.
 Constructing such tabulations was a manual operation.
WHY DATA SCIENCE…
 Over the centuries, as data became larger, machines were introduced to speed up the
tabulations.
 A major step was Herman Hollerith’s development of punched cards and tabulating machines for the 1890 U.S. Census.
 Also in the late 19th century, statistical methods began to develop rapidly.
 These methods have been tremendously important in interpreting data, but they were not
intrinsically tied to mechanical data processing.
 Generations of students have learned to carry out statistical operations by hand on small sets
of data.
WHY DATA SCIENCE…
 Nowadays, it is common to have data sets that are so large they can be processed only by
machine.
 In this era of big data, data are gathered by networks of instruments and computers. The settings
where such data arise are diverse:
 the genome
 satellite observations of Earth,
 entries by Web users,
 sales transactions, etc.

 There are new opportunities for finding and characterizing patterns using techniques described as
data mining, machine learning, data visualization, and so on.
 Such techniques require computer processing. Among the tasks that need performing are data
cleaning, combining data from multiple sources, and reshaping data into a form suitable as input
to data-summarization operations for visualization and modeling.
WHY DATA SCIENCE…
 Data science spans a wide range of capacities that have been described as “data acumen.” Key
components of data acumen include
 mathematical, computational, and statistical foundations
 data management and curation
 data description and visualization
 data modeling and assessment
 workflow and reproducibility
 communication and teamwork
 domain-specific considerations
 ethical problem solving
DATA WRANGLING
 A process of preparing data for visualization and other modern techniques of statistical
interpretation and using those data to answer statistical questions via modeling and
visualization.
 The ability to reason statistically and utilize computational and algorithmic capacities.
 R and the packages dplyr and ggplot2:
 focus on a small subset of functions that accomplish data wrangling tasks in a concise and expressive way.
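The wrangling tasks named in these slides (cleaning, combining data from multiple sources, and reshaping into a summary) can be sketched in a few lines. The slides point to R with dplyr; the sketch below uses plain Python instead, and the `sales` and `regions` tables and their fields are hypothetical examples, not data from the lecture.

```python
from collections import defaultdict

# Hypothetical raw inputs: two small "tables" from different sources.
sales = [
    {"store": "A", "amount": "120.5"},
    {"store": "B", "amount": ""},        # missing value, removed during cleaning
    {"store": "A", "amount": "79.5"},
    {"store": "C", "amount": "50"},
]
regions = {"A": "North", "B": "South", "C": "South"}


def clean(rows):
    """Drop rows with a missing amount and convert amounts to float."""
    return [{**r, "amount": float(r["amount"])} for r in rows if r["amount"].strip()]


def join_region(rows, lookup):
    """Combine sales rows with the region lookup (a left join on store)."""
    return [{**r, "region": lookup.get(r["store"], "unknown")} for r in rows]


def total_by(rows, key):
    """Reshape row-level data into one total per group."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[key]] += r["amount"]
    return dict(totals)


summary = total_by(join_region(clean(sales), regions), "region")
print(summary)
```

Each function does one job, so the steps can be chained in the same clean, combine, summarize order that the slides describe.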
WHAT IS DATA SCIENCE?
 The science of extracting meaningful information from data.
 Data science is a fine-grained blend of intellectual traditions from statistics and computer
science.
 Computer science
 It is the creation of appropriate abstractions to express computational structures and the development
of algorithms that operate on those abstractions.
 Statistics
 It is the interplay of general notions of sampling, models, distributions and decision-making.

 Data science is based on the idea that these styles of thinking support each other (Pierson
2016).
WHAT IS DATA SCIENCE?
 Data science is best applied in the context of expert knowledge about the domain from which
the data originate.
 The distinction between data and information is the core of data science.
 Data scientists are people who are interested in converting the data that is now abundant into
actionable information that always seems to be scarce.
STATISTICS AND DATA SCIENCE
 Are the goals of data scientists and statisticians the same?
 Much of statistical technique was originally developed in an environment where data were
scarce and difficult or expensive to collect, so statisticians focused on creating methods that
would maximize the strength of inference one is able to make, given the least amount of data.
 These techniques were often ingenious, involved sophisticated mathematics, and have proven
invaluable to the empirical sciences.
 While several of the most influential early statisticians saw computing as an integral part of
statistics, it is also true that much of the development of statistical theory was to find
mathematical approximations for things that we couldn’t yet compute.
STATISTICS AND DATA SCIENCE
 Today, the manner in which we extract meaning from data is different in two ways:
 We are able to compute many more things than we could before.
 Some of the techniques that were ubiquitous in statistics education in the 20th century (e.g., the t-test, ANOVA)
are being replaced by computational techniques that are conceptually simpler but were simply infeasible until
the microcomputer revolution (e.g., the bootstrap, permutation tests).
 We have a lot more data than we had before.
 Many of the data we now collect are observational: they don’t come from a designed experiment, and they
aren’t really sampled at random.
 Clinical trials and A/B tests vs. a predictive model, an interactive visualization of the data, or a web
application that allows the user to engage with the data to explore questions and extract meaning.
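The two computational techniques named above, the permutation test and the bootstrap, are short enough to sketch directly. This is a minimal illustration using only the Python standard library; the two groups are made-up measurements (for example, from an A/B test), and the function names, resample counts, and seeds are assumptions chosen for the example.

```python
import random
from statistics import mean


def permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def bootstrap_ci(sample, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(sample, k=n)) for _ in range(n_boot))
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical measurements for two groups.
group_a = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
group_b = [12.9, 13.1, 12.7, 13.4, 12.8, 13.0]

p_value = permutation_test(group_a, group_b)
ci_low, ci_high = bootstrap_ci(group_a)
print(f"permutation p-value: {p_value:.4f}")
print(f"95% bootstrap CI for mean of group A: ({ci_low:.2f}, {ci_high:.2f})")
```

Both functions replace a formula-based approximation with repeated resampling, which is why they only became practical once computing was cheap.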
EVOLUTION OF DATA SCIENCE
DATA SCIENCE LIFE CYCLE
DATA SCIENCE COMPONENTS
DATA PLANNING AND STRATEGY
 Developing a plan or a data strategy is simply determining what data you are going to gather
and why.
 It is not the strategy for deciding what mathematical techniques we’re going to use or the technologies
required.
 The focus is on the data we need to address the business problem/opportunity and why.
 Hence, deciding on a strategy requires making a connection between the data and the business
goals.
 Gathering and formatting the data, and getting rid of the ‘garbage data’ that doesn’t serve the
business goal, is how we arrive at the mission-critical data for business goals.
DATA MINING
 Data mining basically means analyzing data patterns in large batches of data using one or
more software tools. It has applications in multiple fields like science and research.
 As an application of data mining, businesses can learn more about their customers, which helps
them get closer to those customers, develop more effective strategies for business functions, and
use resources in an optimal and insightful manner.
DATA ENGINEERING
 Data engineering primarily involves the creation of software solutions for data problems that
involve establishing a data system with data pipelines and endpoints within that system.
 Data engineering requires an in-depth understanding of a wide range of data technologies &
frameworks along with creating data solutions to enable business processes.
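A data pipeline with an endpoint can be sketched as a chain of small stages, each feeding the next. The example below is only an illustration under assumed inputs: the CSV text in `RAW`, the stage names `extract`, `transform`, and `load`, and the in-memory sink are all hypothetical stand-ins for real sources and storage systems.

```python
import csv
import io
import json

# Hypothetical source data; the second row has a missing reading.
RAW = """id,temp_c
1,21.5
2,
3,19.0
"""


def extract(text):
    """Source stage: yield one dict per CSV row."""
    yield from csv.DictReader(io.StringIO(text))


def transform(rows):
    """Middle stage: skip incomplete rows and convert field types."""
    for row in rows:
        if row["temp_c"]:
            yield {"id": int(row["id"]), "temp_c": float(row["temp_c"])}


def load(rows, sink):
    """Endpoint stage: write each record to the sink as one JSON line."""
    count = 0
    for row in rows:
        sink.write(json.dumps(row) + "\n")
        count += 1
    return count


sink = io.StringIO()  # stands in for a file, queue, or database endpoint
written = load(transform(extract(RAW)), sink)
print(written, "records written")
```

Because the stages are generators, records flow through one at a time, so the same structure works when the source is too large to hold in memory.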
DATA ANALYSIS & MODELS
 Data analysis and mathematical models are considered the heart of data science. We can think of
them in terms of how to use data to extract insights or make business predictions, and how to create
a tool that replaces or supplements what a human does.
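As one small instance of using a mathematical model to make a business prediction, the sketch below fits a straight line by ordinary least squares and extrapolates one step. The `spend` and `units` figures are invented for illustration, and a straight line is an assumed model, not one prescribed by the lecture.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical data: advertising spend vs. units sold.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units = [12.0, 14.5, 17.0, 19.5, 22.0]

slope, intercept = fit_line(spend, units)
predicted = slope * 6.0 + intercept  # prediction for an unseen spend level
print(f"units = {slope:.2f} * spend + {intercept:.2f}")
print(f"predicted units at spend 6.0: {predicted:.2f}")
```

The fitted line summarizes the data (insight) and also answers a "what if" question for a value outside the observed range (prediction).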
DATA VISUALIZATION & OPERATIONALIZATION
 Data visualization is not just presenting the analyzed data correctly; it involves understanding
the raw data and what needs to be visualized based on the needs and goals of users and the
operations.
 Data operationalization involves a real-time decision or action by a person, a long-term response, or a
recommendation on a specific task.
1. DATA ANALYST
Key skills and tools: SQL, R, SAS, and Python

 Data analysts are responsible for a variety of tasks including visualisation, munging, and
processing of massive amounts of data. They also have to perform queries on the databases
from time to time. One of the most important skills of a data analyst is optimization.
 This is because they have to create and modify algorithms that can be used to cull information from
some of the biggest databases without corrupting the data.
 Roles and Responsibilities:
 Extracting data from primary and secondary sources using automated tools
 Developing and maintaining databases
 Performing data analysis and making reports with recommendations
 Analyzing data and forecasting trends that impact the organization/project
 Working with other team members to improve data collection and quality processes
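The database queries mentioned for this role can be tried without any server by using SQLite from the Python standard library. The `orders` table, its columns, and its rows below are hypothetical; the query shows a typical grouped summary of the kind that would feed a report.

```python
import sqlite3

# In-memory database so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("North", 100.0), ("South", 40.0), ("North", 60.0), ("South", 10.0)],
)

# Count and total the orders per region, largest total first.
rows = conn.execute(
    "SELECT region, COUNT(*) AS n, SUM(amount) AS total "
    "FROM orders GROUP BY region ORDER BY total DESC"
).fetchall()
conn.close()

for region, n, total in rows:
    print(region, n, total)
```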
2. DATA ENGINEERS
Key skills and tools: Hive, NoSQL, R, Ruby, Java, C++, and Matlab

 Data engineers build and test scalable Big Data ecosystems for the businesses so that the data
scientists can run their algorithms on the data systems that are stable and highly optimized.
 Data engineers also update the existing systems with newer or upgraded versions of the
current technologies to improve the efficiency of the databases.
 Roles and Responsibilities:
 Design and maintain data management systems
 Data collection/acquisition and management
 Conducting primary and secondary research
 Finding hidden patterns and forecasting trends using data
 Collaborating with other teams to perceive organizational goals
 Make reports and update stakeholders based on analytics
3. DATABASE ADMINISTRATOR
 The job profile of a database administrator is pretty much self-explanatory: they are
responsible for the proper functioning of all the databases of an enterprise and grant or revoke
database services to employees of the company depending on their requirements. They are also
responsible for database backups and recoveries.

Roles and Responsibilities:

 Working on database software to store and manage data
 Working on database design and development
 Implementing security measures for database
 Preparing reports, documentation, and operating manuals
 Data archiving
 Working closely with programmers, project managers, and other team members
4. MACHINE LEARNING ENGINEER
Key skills and tools: Java, Python, JS
 Machine learning engineers are in high demand today. However, the job profile comes with its
challenges. Apart from having in-depth knowledge of some of the most powerful technologies such
as SQL, REST APIs, etc., machine learning engineers are also expected to perform A/B testing,
build data pipelines, and implement common machine learning algorithms such as classification,
clustering, etc.
 Roles and Responsibilities:
 Designing and developing Machine Learning systems
 Researching Machine Learning Algorithms
 Testing Machine Learning systems
 Developing apps/products based on client requirements
 Extending existing Machine Learning frameworks and libraries
 Exploring and visualizing data for a better understanding
 Training and retraining systems
 Understanding the importance of statistics in machine learning
5. DATA SCIENTIST
Key skills and tools: R, MatLab, SQL, Python

 Data scientists have to understand the challenges of business and offer the best solutions using
data analysis and data processing. For instance, they are expected to perform predictive
analysis and run a fine-toothed comb through “unstructured/disorganized” data to offer
actionable insights. They can also do this by identifying trends and patterns that can help
companies make better decisions.
 Roles and Responsibilities:
 Identifying data collection sources for business needs
 Processing, cleansing, and integrating data
 Automating data collection and management processes
 Using Data Science techniques/tools to improve processes
 Analyzing large amounts of data to forecast trends and provide reports with recommendations
 Collaborating with business, engineering, and product teams
6. DATA ARCHITECT
Key skills and tools: data warehousing, data modelling, extraction, transformation and load (ETL), Hive, Pig, and Spark
 A data architect creates the blueprints for data management so that the databases can be easily
integrated, centralized, and protected with the best security measures. They also ensure that the
data engineers have the best tools and systems to work with.
 Roles and Responsibilities:
 Developing and implementing overall data strategy in line with business/organization
 Identifying data collection sources in line with data strategy
 Collaborating with cross-functional teams and stakeholders for smooth functioning of database
systems
 Planning and managing end-to-end data architecture
 Maintaining database systems/architecture considering efficiency and security
 Regular auditing of data management system performance and making changes to improve systems
accordingly.
7. STATISTICIAN
Key skills and tools: SQL, data mining, and the various machine learning technologies

 A statistician, as the name suggests, has a sound understanding of statistical theories and data
organization. Not only do they extract and offer valuable insights from the data clusters, but
they also help create new methodologies for the engineers to apply.
 Roles and Responsibilities:
 Collecting, analyzing, and interpreting data
 Analyzing data, assessing results, and predicting trends/relationships using statistical
methodologies/tools
 Designing data collection processes
 Communicating findings to stakeholders
 Advising/consulting on organizational and business strategy based on data
 Coordinating with cross-functional teams
8. BUSINESS ANALYST
Key skills and tools: Power BI, Tableau

 The role of business analysts is slightly different than other data science jobs. While they do
have a good understanding of how data-oriented technologies work and how to handle large
volumes of data, they also separate the high-value data from the low-value data. In other
words, they identify how the Big Data can be linked to actionable business insights for
business growth.
 Roles and Responsibilities:
 Understanding the business of the organization
 Conducting detailed business analysis – outlining problems, opportunities, and solutions
 Working on improving existing business processes
 Analysing, designing, and implementing new technology and systems
 Budgeting and forecasting
 Pricing analysis
9. DATA AND ANALYTICS MANAGER
Key skills and tools: Python, SAS, R, Java
 A data and analytics manager oversees the data science operations and assigns the duties to their team
according to skills and expertise. Their strengths should include technologies like SAS, R, SQL, etc.,
and of course management.
 Roles and Responsibilities:
 Developing data analysis strategies
 Researching and implementing analytics solutions
 Leading and managing a team of data analysts
 Overseeing all data analytics operations to ensure quality
 Building systems and processes to transform raw data into actionable business insights
 Staying up to date on industry news and trends

 How to Become a Data and Analytics Manager?

First and foremost, to go down the analytics manager career path, you must have excellent social skills,
leadership qualities, and an out-of-the-box way of thinking. You should also be good at data science
technologies like Python, SAS, R, Java, etc.
