
SUBHASHREE K

SOFTWARE TECHNICAL TRAINER


IBM

Data Visualization
UNIT I

INTRODUCTION TO STATISTICS
DATA
Data is a collection of facts, such as numbers, words, measurements,
observations, or just descriptions of things.
A collection of individual pieces of information is known as data.
Relative to today's computers and transmission media, data is
information converted into binary digital form.
Ex: A person's name and address on their own are information; the full
set of attributes about the person, such as name, phone number, address,
gender, and father's name, is data.
DATABASE
A database is an organized collection of structured information, or
data, typically stored electronically in a computer system.
The data can then be easily accessed, managed, modified, updated,
controlled, and organized.
Most databases use structured query language (SQL) for writing
and querying data.
Ex: Student Database, Employee Database
DATABASE
Examples of database languages:
Data definition language (DDL)
Data manipulation language (DML)
Data control language (DCL)
Transaction control language (TCL)
SQL
XQuery
OQL
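For illustration, here is a minimal sketch of DDL and DML statements run through Python's built-in sqlite3 module; the student table and its columns are assumptions made up for this example:

import sqlite3

# Temporary in-memory database for demonstration
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define the structure of an assumed student table
cur.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, gender TEXT)"
)

# DML: insert a row and query it back
cur.execute("INSERT INTO student VALUES (?, ?, ?)", (1, "Asha", "F"))
cur.execute("SELECT name FROM student WHERE roll_no = 1")
print(cur.fetchone())  # ('Asha',)

conn.commit()
conn.close()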
DATA CENTER
A Data center is a physical facility that organizations use to house
their critical applications and data.
A data center's design is based on a network of computing and
storage resources that enable the delivery of shared applications
and data.
The key components of a data center design include routers,
switches, firewalls, storage systems, servers, and
application-delivery controllers.
DATA WAREHOUSE
A data warehouse is a central repository of information that can be
analyzed to make more informed decisions.
Data flows into a data warehouse from transactional
systems, relational databases, and other sources.
Business analysts, data engineers, data scientists, and decision
makers access the data through business intelligence (BI) tools.
DATA WAREHOUSE
Benefits of a data warehouse:
Informed decision making
Consolidated data from many sources
Historical data analysis
Data quality, consistency, and accuracy
Separation of analytics processing from transactional databases, which
improves the performance of both systems
ETL - Extract, Transform and Load
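The ETL idea can be sketched in a few lines of Python; the file names, columns, and transformation here are assumptions made up for illustration:

import csv

# Extract: read raw rows from an assumed source file
with open("sales.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Transform: normalize region names, cast amounts to float, drop bad rows
clean = [
    {"region": r["region"].strip().title(), "amount": float(r["amount"])}
    for r in rows
    if r.get("amount")
]

# Load: write the cleaned rows to an assumed warehouse staging file
with open("warehouse_sales.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["region", "amount"])
    writer.writeheader()
    writer.writerows(clean)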
DATA VISUALIZATION
In today's technical world, a large amount of data is generated day by
day, and analyzing this data for trends and patterns can become
difficult when it is in its raw format.
Data visualization provides a good, organized pictorial representation
of the data, which makes it easier to understand, observe, and analyze.
It also helps communicate the information correctly to the intended
target audience.
Eg: the scores of a batsman in cricket
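As a sketch of the cricket example, the made-up scores below are plotted with matplotlib:

import matplotlib.pyplot as plt

# Made-up runs scored by a batsman across ten innings
innings = list(range(1, 11))
runs = [34, 12, 78, 45, 0, 102, 56, 23, 67, 88]

plt.plot(innings, runs, marker="o")
plt.xlabel("Innings")
plt.ylabel("Runs scored")
plt.title("Batsman's scores across innings")
plt.show()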
DATA COLLECTION
Data represents information collected in the form of numbers and
text.
Data collection is generally done after the experiment or
observation.
Primary data and Secondary data are helpful in planning and
estimating.
Data collection is either qualitative or quantitative.
DATA COLLECTION
The common methods of data collection are:
Census data collection
Sample data collection
Experimental data collection
Observational data collection
CENSUS DATA COLLECTION
Census data collection is a method whereby data is collected from
every member of the population.
SAMPLE DATA COLLECTION
Sample data collection, which is commonly just referred to as sampling,
is a method which collects data from only a chosen portion of the
population.
Sampling is used commonly in everyday life.
For example, consider the research polls conducted before elections:
pollsters don't ask all the people in a given state who they'll vote
for, but choose a small sample and assume that these people represent
how the entire population of the state is likely to vote.
History has shown that these polls are almost always close to accurate,
and as such sampling is a very powerful tool in statistics.
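A minimal sketch of drawing a simple random sample in Python; the population of voter IDs is made up for illustration:

import random

# Made-up population of 10,000 voter IDs
population = list(range(10_000))

# Draw a simple random sample of 500 voters without replacement
sample = random.sample(population, k=500)

print(len(sample), "voters sampled, e.g.:", sample[:5])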
EXPERIMENTAL DATA COLLECTION
Experimental data collection involves performing an experiment and
then collecting the resulting data for further analysis. Experiments
involve tests, and the results of these tests are your data.
An example of experimental data collection is rolling a die one hundred
times while recording the outcomes. Your data would be the results you
get in each roll. The experiment could involve rolling the die in different
ways and recording the results for each of those different ways.
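This experiment is easy to sketch in Python:

import random
from collections import Counter

# Roll a fair six-sided die one hundred times and record each outcome
outcomes = [random.randint(1, 6) for _ in range(100)]

# Tally how often each face appeared; these tallies are the collected data
print(Counter(outcomes))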
OBSERVATIONAL DATA COLLECTION
The observational data collection method involves not carrying out an
experiment but observing the population without influencing it at all.
Observational data collection is popular for studying trends and
behaviors of society where, for example, the lives of a group of
people are observed and data is collected about different aspects of
their lives. Analysis of data collected in such ways can be broadly
categorized into two categories, called descriptive and inferential
statistics.
STATISTICS
Statistics is a mathematical science including methods of
collecting, organizing and analyzing data in such a way that
meaningful conclusions can be drawn from them.
Data can be defined as groups of information that represent
the qualitative or quantitative attributes of a variable or set of
variables.
An example of data can be the ages of the students in a given
class. When you collect those ages, that becomes your data.
DESCRIPTIVE STATISTICS
Descriptive statistics deals with the processing of data without
attempting to draw any inferences from it. The data are
presented in the form of tables and graphs. The characteristics
of the data are described in simple terms. Events that are dealt
with include everyday happenings such as accidents, prices of
goods, business, incomes, epidemics, sports data, and population
data.
MEAN
The mean value is the average value of the whole data.
To calculate the mean, find the sum of all values and divide the sum
by the number of values.
MEDIAN
The median value is the value in the middle of the sorted data.
It is the value that partitions the sorted data into two equal halves.
MODE
The mode is the value that appears the most times in the dataset.
It is found by counting the most frequently repeated value in the
data set.
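All three measures can be computed with Python's built-in statistics module; the ages below are made-up illustration data:

import statistics

# Made-up ages of students in a class
ages = [18, 19, 19, 20, 21, 19, 22]

print("Mean:", statistics.mean(ages))      # sum of values / number of values
print("Median:", statistics.median(ages))  # middle value of the sorted data
print("Mode:", statistics.mode(ages))      # most frequently occurring value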
INFERENTIAL STATISTICS
The objective of making inferences from data is to make intelligent
assertions such as:
1. People who don't smoke live longer than people who smoke.
2. 80% of all vehicles in the USA are four-wheelers.
INFERENTIAL STATISTICS
In our regular life, we make decisions driven by data.
It is always a better idea to back our decisions with data. If we
don't have data to back a decision, it is easy to reach a wrong
conclusion.
Data is also a tangible way to defend yourself from the consequences
of a decision that was correct based on the information available at
the time it was made, but which later turned out wrong.
RANDOM VARIABLE
A random variable is a variable whose value cannot be determined
before an event happens.
Examples:
1. A person's blood type.
2. The number of leaves on a tree.
3. The number of times a user visits LinkedIn in a day.
PROBABILITY DISTRIBUTIONS
Probability distributions are functions that calculate the
probabilities of the outcomes of random variables.
Typical examples of random variables are coin tosses and dice rolls.
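As a sketch, the distribution of the sum of two dice can be estimated by simulation:

import random
from collections import Counter

# Simulate the sum of two dice many times and estimate its distribution
trials = 100_000
sums = [random.randint(1, 6) + random.randint(1, 6) for _ in range(trials)]

counts = Counter(sums)
for total in sorted(counts):
    print(total, round(counts[total] / trials, 3))  # estimated probability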
PROBABILITY DISTRIBUTIONS
Normally distributed data can be transformed into a standard normal
distribution.
Standardizing normally distributed data makes it easier to compare
different sets of data.
The standard normal distribution is used for:
1. Calculating confidence intervals
2. Hypothesis tests
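Standardization converts each value x into a z-score, z = (x - mean) / standard deviation. A minimal sketch with made-up exam marks:

import statistics

# Made-up exam marks, assumed roughly normally distributed
marks = [62, 70, 75, 80, 85, 90, 95]

mu = statistics.mean(marks)
sigma = statistics.stdev(marks)

# z = (x - mean) / standard deviation
z_scores = [(x - mu) / sigma for x in marks]
print([round(z, 2) for z in z_scores])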
DATA SCIENCE
Data Science is the area of study that involves extracting insights
from vast amounts of data using various scientific methods,
algorithms, and processes. It helps you discover hidden patterns in
raw data. The term Data Science has emerged because of the evolution
of mathematical statistics, data analysis, and big data.
Why Data Science?
Data Science can help you detect fraud using advanced machine
learning algorithms.
It helps you prevent significant monetary losses.
It allows you to build intelligent capabilities into machines.
You can perform sentiment analysis to gauge customer brand loyalty.
It enables you to make better and faster decisions.
It helps you recommend the right product to the right customer to
enhance your business.
APPLICATION
Internet Search: Google Search uses data science technology to return
a specific result within a fraction of a second.
Recommendation Systems: Recommendation systems such as "suggested
friends" on Facebook or "suggested videos" on YouTube are built with
the help of data science.
Image & Speech Recognition: Speech recognition systems like Siri,
Google Assistant, and Alexa run on data science techniques. Moreover,
Facebook recognizes your friends when you upload a photo with them,
with the help of data science.
APPLICATION
Gaming world: EA Sports, Sony, and Nintendo use data science
technology to enhance your gaming experience. Games are now developed
using machine learning techniques, and they can update themselves as
you move to higher levels.
Online Price Comparison: PriceRunner, Junglee, and Shopzilla work on
data science mechanisms, where data is fetched from the relevant
websites using APIs.
CHALLENGES
A high variety of information and data is required for accurate
analysis.
The available data science talent pool is inadequate.
Management does not provide financial support for a data science team.
Unavailability of, or difficult access to, data.
Business decision-makers do not effectively use data science results.
Explaining data science to others is difficult.
Privacy issues.
Lack of domain experts.
A very small organization cannot sustain a data science team.
DATA PREPROCESSING
Data preprocessing is the preliminary processing of data in order to
prepare it for the primary processing or for further analysis.
The term can be applied to any first or preparatory processing stage
when several steps are required to prepare data for the user.
For example, extracting data from a larger set, filtering it for
various reasons, and combining sets of data could all be preprocessing
steps.
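A minimal preprocessing sketch using pandas; the file names and columns are assumptions made up for illustration:

import pandas as pd

# Extract: load an assumed raw dataset and keep only the columns of interest
raw = pd.read_csv("survey.csv")
subset = raw[["age", "income"]]

# Filter: drop rows with missing values and out-of-range ages
clean = subset.dropna()
clean = clean[(clean["age"] >= 0) & (clean["age"] <= 120)]

# Combine: append a second assumed dataset with the same columns
extra = pd.read_csv("survey_extra.csv")[["age", "income"]]
combined = pd.concat([clean, extra], ignore_index=True)

print(combined.describe())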
STEPS IN DATA PREPROCESSING

Data transformation: Here, data scientists think about how different aspects of the
data need to be organized to make the most sense for the goal. This could include
things like structuring unstructured data, combining salient variables when it makes
sense or identifying important ranges to focus on.
Data enrichment: In this step, data scientists apply the various feature engineering
libraries to the data to effect the desired transformations. The result should be a data
set organized to achieve the optimal balance between the training time for a new
model and the required compute.
Data validation: At this stage, the data is split into two sets. The first set is used
to train a machine learning or deep learning model. The second set is the testing data
that is used to gauge the accuracy and robustness of the resulting model. This step
helps identify any problems in the hypotheses used in the cleaning and feature
engineering of the data.
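A minimal sketch of that split using scikit-learn's train_test_split; the feature matrix and labels are made up for illustration:

from sklearn.model_selection import train_test_split

# Made-up feature matrix X and labels y
X = [[i, i * 2] for i in range(10)]
y = [0, 1] * 5

# Hold out 30% of the data for testing the trained model
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(len(X_train), "training rows,", len(X_test), "test rows")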
