
4-2 B.Tech IT Regulation: R19 Data Science - UNIT-1

UNIT-1
INTRODUCTION
SYLLABUS: UNIT I
Introduction: Need for data science – benefits and uses – facets of data – data science
process – setting the research goal – retrieving data – cleansing, integrating, and transforming
data – exploratory data analysis – build the models – presenting and building applications.
Data Science
 Data science involves using methods to analyze massive amounts of data
and extract the knowledge it contains.
 Data science is an evolutionary extension of statistics capable of dealing
with the massive amounts of data produced today. It adds methods from
computer science to the repertoire of statistics.
 Data Science is a discipline that merges concepts from computer science
(algorithms, programming, machine learning, and data mining),
mathematics (statistics and optimization), and domain knowledge
(business, applications, and visualization) to extract insights from data
and transform it into actions that have an impact in the particular domain
of application.
 Data science is the field of study that combines domain expertise,
programming skills, and knowledge of math and statistics to extract
meaningful insights from data. Data science practitioners apply machine
learning algorithms to numbers, text, images, video, audio, and more to
produce artificial intelligence (AI) systems that perform tasks which
ordinarily require human intelligence. In turn, these systems generate
insights that analysts and business users translate into tangible business
value.
 Data science is a field that studies data and how to extract meaning from it,
using a series of methods, algorithms, systems, and tools to extract insights
from structured and unstructured data. That knowledge then gets applied to
business, government, and other bodies to help drive profits, innovate
products and services, build better infrastructure and public systems, and
more.
 In short, Data Science “uses scientific methods, processes, algorithms
and systems to extract knowledge and insights from data in various
forms”.
 Python is a great language for data science because it has many data
science libraries available, and it’s widely supported by specialized
software. For instance, almost every popular NoSQL database has a
Python-specific API. Because of these features and the ability to
prototype quickly with Python while keeping acceptable performance, its
influence is steadily growing in the data science world.

 Relation between AI, ML, DL and data science

Need for Data Science


 The principal purpose of Data Science is to find patterns within data. It
uses various statistical techniques to analyze and draw insights from the
data.
 Every business has data, but its business value depends on how much the business knows about the data it has.
 Data Science has gained importance in recent times because it can help businesses increase the business value of their available data, which in turn can help them gain a competitive advantage over their competitors.
 It can help us know our customers better, optimize our processes, and make better decisions. Because of data science, data has become a strategic asset.
 Data creates magic. Industries need data to help them make careful
decisions. Data Science churns raw data into meaningful insights.
Therefore, industries need data science.
 A Data Scientist is a wizard who knows how to create magic using data.
 By using Data Science, companies are able to make:
 Better decisions (should we choose A or B)
 Predictive analysis (what will happen next?)
 Pattern discoveries (find pattern, or maybe hidden information in
the data)
Benefits and Uses of Data Science
 Data science is used almost everywhere, in both commercial and non-commercial settings.
 Commercial companies in almost every industry use data science to gain insights into their customers, processes, staff, competition, and products.


 Many companies use data science to offer customers a better user experience, as well as to cross-sell, up-sell, and personalize their offerings.
 A good example of this is Google AdSense, which collects data from
internet users so relevant commercial messages can be matched to the
person browsing the internet. MaxPoint is another example of real-time
personalized advertising.
 Human resource professionals use people analytics and text mining to screen candidates, monitor the mood of employees, and study informal networks among coworkers.
 Financial institutions use data science to predict stock markets, determine
the risk of lending money, and learn how to attract new clients for their
services.
 Governmental organizations are also aware of data’s value. Many
governmental organizations not only rely on internal data scientists to
discover valuable information, but also share their data with the public.
We can use this data to gain insights or build data-driven applications.
Data.gov is but one example; it’s the home of the US Government’s open
data.
 Universities use data science in their research but also to enhance the
study experience of their students. The rise of massive open online
courses (MOOC) produces a lot of data, which allows universities to
study how this type of learning can complement traditional classes.
MOOCs are an invaluable asset for anyone who wants to become a data scientist or big data professional, so definitely look at a few of the better-known ones: Coursera, Udacity, and edX.
Facets of data
(Types of Data or Different Forms of Data)
 Data can come in different forms. The main forms are
 Structured data
 Unstructured data
 Natural language data
 Machine-generated data
 Graph-based data or Network data
 Audio, video, and images
 Streaming data
Structured Data:
 Structured data is data that depends on a data model and resides in a fixed
field within a record.
 As such, it’s often easy to store structured data in tables within databases
or Excel files.

 SQL, or Structured Query Language, is the preferred way to manage and query data that resides in databases; a short query sketch follows below.
 More often, data comes unstructured.
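 As an illustration (not part of the original notes), the sketch below uses Python's built-in sqlite3 module with a hypothetical employees table to show how structured data in fixed fields can be queried with SQL:

import sqlite3

# A minimal sketch: build a small in-memory table of structured records and
# query it with SQL. The table name and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Asha", "IT", 52000.0), ("Ravi", "HR", 48000.0), ("Meena", "IT", 61000.0)],
)
# SQL suits structured data because every record has the same fixed fields.
for row in conn.execute("SELECT dept, AVG(salary) FROM employees GROUP BY dept"):
    print(row)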
Unstructured Data
 Unstructured data is data that isn’t easy to fit into a data model because
the content is context-specific or varying.
 One example of unstructured data is our regular email.
 Although email contains structured elements such as the sender, title, and
body text, it’s a challenge to find the number of people who have written
an email complaint about a specific employee because so many ways
exist to refer to a person, for example. The thousands of different
languages and dialects out there further complicate this.
 A human-written email is also a perfect example of natural language data.
Natural language Data:
 Natural language is a special type of unstructured data; it’s challenging to
process because it requires knowledge of specific data science techniques
and linguistics.
 The natural language processing community has had success in entity
recognition, topic recognition, summarization, text completion, and
sentiment analysis, but models trained in one domain don’t generalize
well to other domains.
 Even state-of-the-art techniques aren’t able to decipher the meaning of
every piece of text. This shouldn’t be a surprise though: humans struggle
with natural language as well. It’s ambiguous by nature.
 The concept of meaning itself is questionable here. Have two people listen to the same conversation. Will they get the same meaning? The meaning of the same words can vary when coming from someone upset or someone joyous.
Machine-generated Data:
 Machine-generated data is information that’s automatically created by a
computer, process, application, or other machine without human
intervention.
 Machine-generated data is becoming a major data resource and will
continue to do so.
 Wikibon has forecast that the market value of the industrial Internet (a
term coined by Frost & Sullivan to refer to the integration of complex
physical machinery with networked sensors and software) will be
approximately $540 billion in 2020.
 IDC (International Data Corporation) has estimated there will be 26 times
more connected things than people in 2020. This network is commonly
referred to as the internet of things.

 The analysis of machine data relies on highly scalable tools, due to its
high volume and speed.
 Examples of machine data are web server logs, call detail records,
network event logs, and telemetry.
Graph-based or Network Data:
 “Graph data” can be a confusing term because any data can be shown in a
graph. “Graph” in this case points to mathematical graph theory.
 In graph theory, a graph is a mathematical structure to model pair-wise
relationships between objects. Graph or network data is, in short, data that
focuses on the relationship or adjacency of objects.
 The graph structures use nodes, edges, and properties to represent and
store graphical data.
 Graph-based data is a natural way to represent social networks, and its
structure allows calculating specific metrics such as the influence of a
person and the shortest path between two people.
 Examples of graph-based data can be found on many social media
websites. For instance, on LinkedIn we can see who we know at which
company. Our follower list on Twitter is another example of graph-based
data.
 The power and sophistication come from multiple, overlapping graphs of the same nodes. For example, imagine the connecting edges here showing "friends" on Facebook. Imagine another graph with the same people that connects business colleagues via LinkedIn. Imagine a third graph based on movie interests on Netflix. Overlaying the three different-looking graphs makes more interesting questions possible.
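 As an illustration (not part of the original notes), the sketch below assumes the networkx library and hypothetical names to show how graph data supports metrics such as the shortest path between two people and a simple influence measure:

import networkx as nx

# A minimal sketch of a small social network as a graph.
g = nx.Graph()
g.add_edges_from([("Amit", "Bala"), ("Bala", "Chitra"),
                  ("Chitra", "Deepa"), ("Amit", "Deepa"), ("Deepa", "Esha")])

# Shortest path between two people, and degree centrality as a rough
# proxy for a person's influence in the network.
print(nx.shortest_path(g, "Amit", "Esha"))
print(nx.degree_centrality(g))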

Audio, image, and video Data:


 Audio, image, and video are data types that pose specific challenges to a
data scientist.


 Tasks that are trivial for humans, such as recognizing objects in pictures,
turn out to be challenging for computers.
 MLBAM (Major League Baseball Advanced Media) announced in 2014
that they’ll increase video capture to approximately 7 TB per game for
the purpose of live, in-game analytics. High-speed cameras at stadiums
will capture ball and athlete movements to calculate in real time, for
example, the path taken by a defender relative to two baselines.
 Recently a company called DeepMind succeeded at creating an algorithm
that’s capable of learning how to play video games. This algorithm takes
the video screen as input and learns to interpret everything via a complex
process of deep learning.
 It’s a remarkable feat that prompted Google to buy the company for their
own Artificial Intelligence (AI) development plans. The learning
algorithm takes in data as it’s produced by the computer game; it’s
streaming data.
Streaming Data
 While streaming data can take almost any of the previous forms, it has an extra property: the data flows into the system when an event happens instead of being loaded into a data store in a batch. Although this isn't really a different type of data, we treat it here as such because we need to adapt our processes to deal with this type of information.
 Examples are the “What’s trending” on Twitter, live sporting or music
events, and the stock market.
Data Science Process
(Steps in Data science process or Flow of Data science Process or phases of data science process)
 The data science process typically consists of six steps.
1. Setting the research goal
2. Gathering data (or) Retrieving data
3. Data preparation (or) Data Preprocessing
4. Data exploration (or) Exploratory data analysis
5. Data Modeling (or) Model building
6. Presentation and Automation


Brief explanation of the six steps of the data science process


 1. The first step of this process is setting a research goal. The main
purpose here is making sure all the stakeholders understand the what,
how, and why of the project. In every serious project this will result in a
project charter.
 2. The second phase is data retrieval. This step includes finding suitable
data and getting access to the data owner. The result is data in its raw
form, which probably needs polishing and transformation before it
becomes usable.
 3. The third step is data preparation. This includes transforming the data from a raw form into data that's directly usable in models. To achieve this, we'll detect and correct different kinds of errors in the data, combine data from different data sources, and transform it.
 4. The fourth step is data exploration. The goal of this step is to gain a
deep understanding of the data. We’ll look for patterns, correlations, and
deviations based on visual and descriptive techniques.
 5. Finally, we get to the sexiest part: model building (often referred to as "data modeling"). It is now that we attempt to gain the insights or make the predictions stated in the project charter.
 6. The last step of the data science process is presenting results and automating the analysis, if needed. Certain projects require performing the business process over and over again, so automating the project will save time.
Advantages of this approach:
 Following these six steps pays off in terms of a higher project success
ratio and increased impact of research results. This process ensures we
have a well-defined research plan, a good understanding of the business
question, and clear deliverables before we even start looking at data.


 Another benefit of following a structured approach is that we work more in prototype mode while we search for the best model.
 Dividing a project into smaller stages also allows employees to work
together as a team.
Six steps of the data science process

Step 1: Defining research goals and creating a project charter


 A project starts by understanding the what, the why, and the how of the project.
 What does the company expect us to do? Why does management place such a value on the research? How does the company benefit from it? Answering these three questions (what, why, how) is the goal of the first phase, so that everybody knows what to do and can agree on the best course of action.


 The outcome should be a clear research goal, a good understanding of the context, well-defined deliverables, and a plan of action with a timetable. This information is then best placed in a project charter.
 After we have a good understanding of the business problem, we should try to get a formal agreement on the deliverables. All this information is best collected in a project charter. For any significant project this would be mandatory.
 A project charter requires teamwork, and the input covers at least the
following:
 A clear research goal
 The project mission and context
 How we’re going to perform analysis
 What resources we expect to use
 Proof that it’s an achievable project, or proof of concepts
 Deliverables and a measure of success
 A timeline
Step 2: Retrieving data

 The next step in data science is to retrieve the required data. This data is
either found within the company or retrieved from a third party.


 The main goal of this phase is finding suitable data and getting access to the data owner.
 Data can be stored in many forms, ranging from simple text files to tables
in a database. The data can be stored in official data repositories such as
databases, data marts, data warehouses, and data lakes maintained by a
team of IT professionals.
 Finding data even within our own company can sometimes be a challenge. As companies grow, their data becomes scattered around many places. Knowledge of the data may be dispersed as people change positions and leave the company.
 Getting access to data is another difficult task. Getting access to the data may take time and involve company politics.
 Although data is considered an asset more valuable than oil by certain
companies, more and more governments and organizations share their
data for free with the world. This data can be of excellent quality.
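 As an illustration (not part of the original notes), the sketch below assumes the pandas library and hypothetical file and table names to show that the retrieval step may read anything from a simple text file to a database table:

import sqlite3

import pandas as pd

# A minimal sketch: retrieve raw data from a flat file and from a database.
# "customers.csv", "sales.db", and the orders table are hypothetical sources.
customers = pd.read_csv("customers.csv")
conn = sqlite3.connect("sales.db")
orders = pd.read_sql_query("SELECT * FROM orders", conn)

# A quick look at the shape of the raw data before any cleansing.
print(customers.shape, orders.shape)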
Step 3: Cleansing, integrating, and transforming data
 This phase sanitizes and prepares the data received from the data retrieval phase for use in the modeling and reporting phases.
 This phase involves checking and remediating data errors, enriching the data with data from other data sources, and transforming it into a suitable format for models.


 This phase consists of the following main steps:


 Data cleaning
 Data transformation
 Data Integration (Combining data)
Cleansing data
 Data cleansing is a subprocess of the data science process that focuses on
removing errors in data so that data becomes a true and consistent
representation of the processes it originates from.
 By "true and consistent representation" we imply that at least two types of errors exist. The first type is the interpretation error, such as taking a value in the data at face value, for example "a person's age is greater than 300 years."
 The second type of error points to inconsistencies between data sources. An example of this class of errors is putting "Female" in one table and "F" in another when they represent the same thing: that the person is female. Another example is using pounds in one table and dollars in another.
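 As an illustration (not part of the original notes), the sketch below assumes the pandas library and shows one way to fix such an inconsistency by mapping both tables onto a single encoding before combining them:

import pandas as pd

# A minimal sketch: one source stores "Female"/"Male", the other stores "F"/"M".
table_a = pd.DataFrame({"gender": ["Female", "Male", "Female"]})
table_b = pd.DataFrame({"gender": ["F", "M", "F"]})

# Map the second table onto the first table's encoding, then combine.
table_b["gender"] = table_b["gender"].replace({"F": "Female", "M": "Male"})
combined = pd.concat([table_a, table_b], ignore_index=True)
print(combined["gender"].unique())   # ['Female' 'Male']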

Data Entry Errors


 Data collection and data entry are error-prone processes. They often require human intervention, and humans make typos or lose their concentration for a second and introduce an error into the chain.


 But data collected by machines or computers isn't free from errors either. Some errors arise from human sloppiness, whereas others are due to machine or hardware failure. Examples of errors originating from machines are transmission errors or bugs in the extract, transform, and load (ETL) phase.
 For small data sets we can check every value by hand. Detecting data
errors when the variables we study don’t have many classes can be done
by tabulating the data with counts.
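 As an illustration (not part of the original notes), the sketch below assumes the pandas library and shows how tabulating a variable with counts can expose likely data entry errors:

import pandas as pd

# A minimal sketch: frequency counts on a categorical column.
ratings = pd.Series(["Good", "Good", "Godo", "Bad", "Good", "Bda"])
print(ratings.value_counts())
# "Godo" and "Bda" stand out as probable typos of "Good" and "Bad".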
Redundant Whitespace
 Whitespaces tend to be hard to detect but cause errors like other
redundant characters.
 If we know to watch out for them, fixing redundant whitespaces is luckily
easy enough in most programming languages. They all provide string
functions that will remove the leading and trailing whitespaces. For
instance, in Python we can use the strip() function to remove leading and
trailing spaces.
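 A minimal sketch (not part of the original notes) of removing redundant whitespace with Python's built-in strip(); the pandas column shown in the comment is a hypothetical extension:

name = "  FR  "
print(repr(name.strip()))   # 'FR' - leading and trailing spaces removed

# With pandas, the same fix can be applied to a whole column, for example:
# df["country"] = df["country"].str.strip()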
Impossible Values and Sanity Checks
 Sanity checks are another valuable type of data check.
 Here we check the value against physically or theoretically impossible
values such as people taller than 3 meters or someone with an age of 299
years.
 Sanity checks can be directly expressed with rules: check = 0 <= age <= 120
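 A minimal sketch (not part of the original notes) of applying the sanity-check rule above to a list of hypothetical ages and flagging the impossible values:

ages = [25, 42, 299, 31, -4]
for age in ages:
    check = 0 <= age <= 120   # the rule from the notes, applied per record
    if not check:
        print("Impossible value found: age =", age)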
Outliers
 An outlier is an observation that seems to be distant from other observations
or, more specifically, one observation that follows a different logic or
generative process than the other observations.
 The easiest way to find outliers is to use a plot or a table with the minimum
and maximum values.
 Outliers can gravely influence the data modeling, so investigate them
first.
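 As an illustration (not part of the original notes), the sketch below assumes pandas (and matplotlib for the plot) and shows how the minimum and maximum values, or a box plot, reveal a suspicious observation:

import pandas as pd

# A minimal sketch: 300 is far from the other values and shows up immediately
# in the min/max from describe() and in a box plot.
values = pd.Series([48, 52, 50, 47, 53, 49, 51, 300])
print(values.describe())

ax = values.plot.box()            # requires matplotlib to be installed
ax.figure.savefig("outliers.png")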
Dealing with Missing Values
 Missing values aren't necessarily wrong, but we still need to handle them separately.
 Certain modeling techniques can’t handle missing values.
 They might be an indicator that something went wrong in data collection
or that an error happened in the ETL process.
 Common techniques data scientists use include omitting the values, setting the value to null, imputing a static value such as 0 or the mean, imputing a value from an estimated or theoretical distribution, and modeling the value from the other variables.
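 As an illustration (not part of the original notes), the sketch below assumes pandas and NumPy and shows two of these techniques: omitting the observations and imputing a static value such as the column mean:

import numpy as np
import pandas as pd

# A minimal sketch: a small table with missing values.
df = pd.DataFrame({"age": [25, np.nan, 42, 31], "salary": [50, 60, np.nan, 55]})

dropped = df.dropna()                             # omit rows with missing values
imputed = df.fillna(df.mean(numeric_only=True))   # impute the column mean
print(dropped)
print(imputed)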


Deviations from a Code Book


 Detecting errors in larger data sets against a code book or against
standardized values can be done with the help of set operations.
 A code book is a description of data, a form of metadata. It contains
things such as the number of variables per observation, the number of
observations, and what each encoding within a variable means.
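 A minimal sketch (not part of the original notes) of checking observed values against a code book with set operations; the encodings are hypothetical:

# Allowed encodings for a gender variable according to the code book.
codebook_values = {"M", "F", "U"}

# Values actually found in the data set.
observed_values = {"M", "F", "Female", "X"}

# The set difference flags every value that deviates from the code book.
invalid = observed_values - codebook_values
print(invalid)   # {'Female', 'X'}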
Different Units of Measurement
 When integrating two data sets, we have to pay attention to their
respective units of measurement.
 An example of this would be when we study the prices of gasoline in the
world. To do this we gather data from different data providers. Data sets
can contain prices per gallon and others can contain prices per liter.
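 A minimal sketch (not part of the original notes) of converting one provider's price per gallon to a price per liter before merging the data sets (1 US gallon is approximately 3.785 liters):

LITERS_PER_GALLON = 3.785

price_per_gallon = 3.60   # hypothetical price from provider A
price_per_liter = price_per_gallon / LITERS_PER_GALLON
print(round(price_per_liter, 3))   # roughly 0.951 per liter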
Different Levels of Aggregation
 Having different levels of aggregation is similar to having different types
of measurement.
 An example of this would be a data set containing data per week versus
one containing data per work week.
 This type of error is generally easy to detect, and summarizing (or the
inverse, expanding) the data sets will fix it.
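 As an illustration (not part of the original notes), the sketch below assumes pandas and shows summarizing daily data to a weekly level so that it matches a data set that is already weekly:

import pandas as pd

# A minimal sketch: fourteen days of hypothetical sales, aggregated per week.
daily = pd.DataFrame(
    {"sales": range(14)},
    index=pd.date_range("2023-01-02", periods=14, freq="D"),
)
weekly = daily.resample("W").sum()
print(weekly)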
Step 4: Exploratory data analysis
 During exploratory data analysis we take a deep dive into the data.
 The goal of this step is to gain a deep understanding of the data.

Prepared By: MD SHAKEEL AHMED, Associate Professor, Dept. Of IT, VVIT, Guntur Page 13
4-2 B.Tech IT Regulation: R19 Data Science-UNIT-1

 Information becomes much easier to grasp when shown in a picture; therefore we mainly use graphical techniques to gain an understanding of the data and the interactions between variables.
 This phase also helps to discover anomalies missed before, forcing us to take a step back and fix them.

 The visualization techniques used in this phase range from simple line graphs or histograms to more complex diagrams such as Sankey diagrams and network graphs.
 Sometimes it's useful to compose a composite graph from simple graphs to get even more insight into the data. Other times the graphs can be animated or made interactive to make them easier to understand.
 Another technique called brushing and linking can also be used. With
brushing and linking we combine and link different graphs and tables (or
views) so changes in one graph are automatically transferred to the other
graphs. This interactive exploration of data facilitates the discovery of
new insights.
 Tabulation, clustering, and other modeling techniques can also be a part
of exploratory analysis.
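 As an illustration (not part of the original notes), the sketch below assumes pandas, NumPy, and matplotlib, and shows two of the simple graphical techniques mentioned above (a histogram and a scatter plot) on synthetic data:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# A minimal sketch: synthetic data with two variables to explore.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"age": rng.integers(18, 70, 200),
                   "income": rng.normal(50000, 12000, 200)})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist(df["age"], bins=20)             # distribution of a single variable
ax1.set_title("Distribution of age")
ax2.scatter(df["age"], df["income"])     # interaction between two variables
ax2.set_title("Age versus income")
fig.savefig("eda_plots.png")

print(df.describe())   # descriptive statistics complement the plots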


Step 5: Build the models

 Building a model is an iterative process.


 Most models consist of the following main steps:
1. Selection of a modeling technique and variables to enter in the model
2. Execution of the model
3. Diagnosis and model comparison
Model and variable selection:
 In this step it is need to select the variables we want to include in model
and a modelling technique.
 Exploratory analysis gives a fair idea of what variables will help to
construct a good model.
 Many modeling techniques are available, and choosing the right model
for a problem requires judgment.
 A model is selected based on
 model performance
 whether the model meets all the project requirements
 whether it is easy to implement
 the maintenance needed on the model
 whether the model is easy to explain
Model execution:
 This step involves implementing the model in code.
 Most programming languages, such as Python, already have libraries
such as StatsModels or Scikit-learn. These packages use several of the
most popular techniques.
 Coding a model is a nontrivial task in most cases, so having these
libraries available can speed up the process.
 Only a handful of techniques have industry-ready implementations in
Python. But it’s fairly easy to use models that are available in R within
Python with the help of the RPy library. RPy provides an interface from Python to R. R is a free software environment, widely used for statistical computing.
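 As an illustration (not part of the original notes), the sketch below uses Scikit-learn (named above) to execute a simple model: a linear regression fitted on synthetic data with a known relationship:

import numpy as np
from sklearn.linear_model import LinearRegression

# A minimal sketch: generate data where y is roughly 3*x + 2, then fit a model.
rng = np.random.default_rng(seed=1)
x = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * x[:, 0] + 2.0 + rng.normal(0, 1, size=100)

model = LinearRegression()
model.fit(x, y)
print(model.coef_, model.intercept_)   # should be close to 3.0 and 2.0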
Model diagnostics and model comparison
 From multiple models we can choose the best one based on multiple
criteria.
 Working with a holdout sample helps to pick the best-performing model.
 A holdout sample is a part of the data that we leave out of the model building so it can be used to evaluate the model afterward.
 The principle here is simple: the model should work on unseen data. We use only a fraction of the data to estimate the model, and the other part, the holdout sample, is kept out of the equation.
 The model is then unleashed on the unseen data and error measures are
calculated to evaluate it. Multiple error measures are available.
 Once the model is trained, we calculate the model error with an error measure. Then we choose the model with the lowest error.
 Many models make strong assumptions, such as independence of the inputs, and we have to verify that these assumptions are indeed met. This is called model diagnostics.
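 As an illustration (not part of the original notes), the sketch below assumes Scikit-learn and shows the holdout idea: the model is estimated on one part of the data and its error is measured on the part that was held out:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# A minimal sketch: synthetic data, 70% used for fitting, 30% held out.
rng = np.random.default_rng(seed=2)
x = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * x[:, 0] + 2.0 + rng.normal(0, 1, size=200)

x_train, x_hold, y_train, y_hold = train_test_split(x, y, test_size=0.3,
                                                    random_state=0)
model = LinearRegression().fit(x_train, y_train)
print(mean_squared_error(y_hold, model.predict(x_hold)))   # error on unseen data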
Step 6: Presenting findings and building applications on top of them

 The last step of the data science process is presenting the results to the stakeholders or the wider business and automating the analysis, if needed.
 The main aim of this phase is presenting the results to the stakeholders and automating or industrializing the analysis process for repetitive reuse and integration with other tools.
 These results can take many forms, ranging from presentations to research
reports.
 Sometimes we need to automate the execution of the process because the business will want to use the insights gained in another project or enable an operational process to use the outcome from the model.


 Certain projects require performing the business process over and over
again, so automating the project will save time.
 The last stage of the data science process is where soft skills will be most
useful, and they’re extremely important.
Unit-1 Tutorial Questions
1. What is data science? Explain briefly the need for data science with real-time applications.
2. List and describe the facets of data with examples.
3. Briefly explain the steps in the data science process with a neat diagram.
4. How are setting the research goal, retrieving data, and data preparation carried out in the data science process?
5. Explain the data exploration, data modeling, and presentation phases in the data science process.
6. What is data cleaning? Explain common errors detected in the data cleaning process with possible solutions.
7. Give an overview of common techniques data scientists use to handle missing data.
8. Write a short note on the following phases of the data science process:
i) Data preparation
ii) Data exploration
iii) Data modeling
9. Write a short note on the following:
i) Model diagnostics
ii) Dummy variables
iii) Brushing and linking
iv) Joining and appending
v) Interpretation errors and inconsistencies
10. What are the advantages of following a structured approach in the data science process?

Prepared By: MD SHAKEEL AHMED, Associate Professor, Dept. of IT, VVIT, Guntur
