DV - Unit 1
Data Visualization
UNIT I
INTRODUCTION TO STATISTICS
DATA
Data is a collection of facts, such as numbers, words, measurements,
observations or just descriptions of things.
A collection of pieces of information is known as data.
Relative to today's computers and transmission media, data is
information converted into binary digital form.
Ex: A name and an address are information.
A name together with related details such as phone number, address, gender, and
father's name is data.
DATABASE
A database is an organized collection of structured information, or
data, typically stored electronically in a computer system.
The data can then be easily accessed, managed, modified, updated,
controlled, and organized.
Most databases use Structured Query Language (SQL) for writing
and querying data.
Ex: Student Database, Employee Database
Ex (query languages): SQL, XQuery, OQL.
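To illustrate how SQL is used for writing and querying data, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample rows are hypothetical and only meant to resemble a small Student Database.

```python
import sqlite3

# In-memory database; the "student" table and its rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)"
)

# Writing data: insert two student records.
conn.executemany(
    "INSERT INTO student (name, phone) VALUES (?, ?)",
    [("Asha", "555-0101"), ("Ravi", "555-0102")],
)

# Updating data: change one student's phone number.
conn.execute("UPDATE student SET phone = ? WHERE name = ?", ("555-0199", "Ravi"))

# Querying data: read the records back in alphabetical order.
rows = conn.execute("SELECT name, phone FROM student ORDER BY name").fetchall()
print(rows)
conn.close()
```

The same statements (CREATE, INSERT, UPDATE, SELECT) would apply to an Employee Database with different table and column names.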
DATA CENTER
A data center is a physical facility that organizations use to house
their critical applications and data.
A data center's design is based on a network of computing and
storage resources that enable the delivery of shared applications
and data.
The key components of a data center design include routers,
switches, firewalls, storage systems, servers, and
application-delivery controllers.
DATA WAREHOUSE
A data warehouse is a central repository of information that can be
analyzed to make more informed decisions.
Data flows into a data warehouse from transactional
systems, relational databases, and other sources.
Business analysts, data engineers, data scientists, and decision
makers access the data through business intelligence (BI) tools.
Benefits of a data warehouse include:
Informed decision making.
Consolidated data from many sources.
Historical data analysis.
Data quality, consistency, and accuracy.
Separation of analytics processing from transactional databases, which improves performance of both systems.
ETL: Extract, Transform, and Load
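The three ETL stages can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not a production pipeline: the CSV text, the "sales" table, and the function names extract, transform, and load are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a transactional system (note the messy names).
RAW_CSV = "order_id,customer,amount\n1, alice ,10.50\n2,BOB,4.25\n3, alice ,7.00\n"

def extract(text):
    """Extract: read the raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean customer names and convert fields to proper types."""
    return [
        (int(r["order_id"]), r["customer"].strip().title(), float(r["amount"]))
        for r in rows
    ]

def load(records, conn):
    """Load: write the cleaned records into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()

warehouse = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
load(transform(extract(RAW_CSV)), warehouse)

# Analytical query run against the consolidated data.
totals = warehouse.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Keeping each stage in its own function mirrors the way data flows from source systems into the warehouse before analysts query it.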
INFERENTIAL STATISTICS
1. P
RANDOM VARIABLE
Examples:
1. A person's blood type.
2. Number of leaves on a tree.
3. Number of times a user visits LinkedIn in a day.
PROBABILITY DISTRIBUTIONS
2. Hypothesis tests
DATA SCIENCE
Data transformation: Here, data scientists think about how different aspects of the
data need to be organized to make the most sense for the goal. This could include
things like structuring unstructured data, combining salient variables when it makes
sense or identifying important ranges to focus on.
Data enrichment: In this step, data scientists apply the various feature engineering
libraries to the data to effect the desired transformations. The result should be a data
set organized to achieve the optimal balance between the training time for a new
model and the required compute.
Data validation: At this stage, the data is split into two sets. The first set is used to
train a machine learning or deep learning model. The second set is the testing data
that is used to gauge the accuracy and robustness of the resulting model. Testing
against this second set helps identify any problems in the hypothesis used in the
cleaning and feature engineering of the data.
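The split described in the data validation step can be sketched as follows. This is a minimal illustration using only the standard library; the function name train_test_split, the 20% test fraction, and the fixed seed are assumptions chosen for the example, not values prescribed by the notes.

```python
import random

def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle the records and split them into a training set and a testing set.

    test_fraction and seed are illustrative defaults (assumptions).
    """
    shuffled = list(records)               # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(10))                     # hypothetical data set of 10 records
train, test = train_test_split(data)
print(len(train), len(test))
```

Shuffling before splitting reduces the chance that the ordering of the original data biases either set, and no record appears in both sets.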