
DATA SCIENCE

MADE BY VIKTORIIA BURKO


CONTENTS:

Data Analysis;
Data Types;
Data Formats;
Data Storage in Computers;
Manipulating Data Sets;
Data Cleansing;
Data Analysis Process;
DATA ANALYSIS
Data analysis involves examining,
cleaning, transforming, and modeling
data to discover useful information,
draw conclusions, and support decision-
making. It employs various statistical
and computational techniques to identify
patterns, trends, and relationships within
data sets.
Common steps include data collection, data
processing, exploratory data analysis, and
interpretation of results. Tools like
spreadsheets, programming languages (e.g.,
Python, R), and specialized software (e.g.,
Tableau, SAS) are frequently used. Effective
data analysis can lead to insights that drive
business strategy, scientific research, and policy
development.
DATA TYPES
A data type is an attribute associated with a piece of data that tells a computer
system how to interpret its value. Understanding data types ensures that data is
collected in the preferred format and the value of each property is as expected. Data
types should not be confused with the two types of data that are collectively
referred to as customer data: entity data and event data. To properly define event
properties and entity properties, you need a good understanding of data types. A
well-defined tracking plan must contain the data type of every property to ensure
data accuracy and prevent data loss.
QUALITATIVE DATA
Qualitative data refers to non-numeric information that
describes qualities or characteristics. It is often collected
through interviews, surveys, observations, and textual
analysis, capturing the subjective aspects of experiences and
perceptions. Unlike quantitative data, qualitative data is
typically categorized into themes or patterns rather than
measured in numbers. Examples include opinions,
behaviors, and descriptions, which can provide deep
insights into complex issues. Analyzing qualitative data
often involves coding and identifying recurring themes to
understand underlying meanings and motivations.
QUANTITATIVE DATA
Quantitative data refers to numerical information that can
be measured and quantified. It is often collected through
experiments, surveys, and databases, providing objective
data that can be analyzed statistically. This type of data is
used to identify patterns, test hypotheses, and make
predictions based on numerical trends. Examples of
quantitative data include height, weight, temperature, and
test scores. Analyzing quantitative data involves using
mathematical and statistical techniques to interpret the
numbers and draw conclusions about the relationships and
differences within the data.
COMMON DATA TYPES

Integer (int)

It is the most common numeric data type, used to store numbers without a fractional component (-707, 0, 707).

Floating Point (float)

It is also a numeric data type, used to store numbers that may have a fractional component, like monetary values do (707.07, 0.7, 707.00). Please note that number is often used as a data type that includes both int and float types.

Character (char)

It is used to store a single letter, digit, punctuation mark, symbol, or blank space.
String (str or text)

It is a sequence of characters and the most commonly used data type to store text. Additionally, a string can also include digits and symbols; however, it is always treated as text.

A phone number is usually stored as a string (+1-999-666-3333) but can also be stored as an integer (9996663333).

Boolean (bool)

It represents the values true and false. When working with the boolean data type, it is helpful to keep in mind that
sometimes a boolean value is also represented as 0 (for false) and 1 (for true).

Enumerated type (enum)

It contains a small set of predefined unique values (also known as elements or enumerators) that can be compared and
assigned to a variable of enumerated data type.

The values of an enumerated type can be text-based or numerical. In fact, the boolean data type is a pre-defined
enumeration of the values true and false.
Array
Also known as a list, an array is a data type that stores a number of elements in a specific order, typically all of the
same type.
Since an array stores multiple elements or values, the structure of data stored by an array is referred to as an array data
structure.
Each element of an array can be retrieved using an integer index (0, 1, 2,…), and the total number of elements in an
array represents the length of an array.
Date
Needs no explanation; typically stores a date in the YYYY-MM-DD format (ISO 8601 syntax).
Time
Stores a time in the hh:mm:ss format. Besides the time of the day, it can also be used to store the time elapsed or the
time interval between two events which could be more than 24 hours. For example, the time elapsed since an event
took place could be 72+ hours (72:00:59).
Datetime
Stores a value containing both date and time together in the YYYY-MM-DD hh:mm:ss format.
Timestamp
Typically represented in Unix time, a timestamp represents the number of seconds that have elapsed since 00:00:00 UTC on 1 January 1970.
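The types listed above map naturally onto Python's built-ins; the following sketch (with made-up example values) shows each one in turn:

```python
from enum import Enum
from datetime import date, time, datetime, timezone

# Numeric types: int has no fractional component, float may have one.
count = -707               # int
price = 707.07             # float

# A single character and a longer string; Python uses str for both.
grade = "A"                # a "char" is just a length-1 string in Python
phone = "+1-999-666-3333"  # phone numbers are usually stored as strings

# Boolean values; bool is also represented as 0 (false) and 1 (true).
active = True
assert int(active) == 1

# An enumerated type: a small set of predefined unique values.
class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3

# An array (list): ordered elements retrieved by integer index (0, 1, 2, ...).
scores = [88, 92, 75]
assert scores[0] == 88 and len(scores) == 3

# Date, time, datetime, and a Unix timestamp.
d = date(2024, 1, 31)       # YYYY-MM-DD
t = time(13, 45, 30)        # hh:mm:ss
dt = datetime(2024, 1, 31, 13, 45, 30, tzinfo=timezone.utc)
print(int(dt.timestamp()))  # seconds elapsed since the Unix epoch
```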
IMPORTANCE OF DATA TYPES

You might be wondering why it's important to know about all these data types when you are mainly concerned with understanding how to leverage customer data. There is only one main reason: to gather clean and consistent data.

Your knowledge of data types will come in handy in two stages of your data collection efforts, as described below.
EXAMPLE AND RECAP

Different programming languages offer various other data types for a variety of purposes; however, the most commonly used data types that you need to know to become data-led have been covered.

A good way to think about data types is when you come across any form or survey. Looking at a standard registration form, you should keep in mind that each field accepts values of a particular data type. A text field stores the input as a string, while a number field typically accepts an integer.

Names and email addresses are always of the type string, while numbers can be stored as a numerical type or as a string, since a string is a set of characters including digits. In single-option or multiple-option fields, where one has to select from predefined options, the enumerated and array data types come into play.
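The registration-form recap can be sketched as a small validator. All names and rules here are illustrative, not from any particular framework: a hypothetical form arrives with every value as raw text, and each field is coerced to its expected type:

```python
from enum import Enum

class Plan(Enum):
    """Single-option field: one of a small set of predefined values."""
    FREE = "free"
    PRO = "pro"
    ENTERPRISE = "enterprise"

def validate_registration(form: dict) -> dict:
    """Coerce raw form input into the data type each field expects."""
    return {
        "name": str(form["name"]).strip(),          # text field -> string
        "age": int(form["age"]),                    # number field -> integer
        "plan": Plan(form["plan"]),                 # single option -> enum
        "interests": list(form["interests"]),       # multiple option -> array
        "subscribed": form["subscribed"] == "yes",  # checkbox -> boolean
    }

record = validate_registration({
    "name": "  Ada Lovelace ",
    "age": "28",
    "plan": "pro",
    "interests": ["math", "computing"],
    "subscribed": "yes",
})
print(record["age"], record["plan"].name, record["subscribed"])
```

A tracking plan plays the same role for event and entity properties: declaring the type up front lets bad input be rejected before it pollutes the dataset.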
DATA FORMATS
Data formats refer to the structures in which data is
organized, stored, and transmitted, allowing for efficient
access and interpretation. Common data formats include
CSV (Comma-Separated Values), which is simple and
widely used for tabular data; JSON (JavaScript Object
Notation), which is lightweight and commonly used for
data interchange in web applications; and XML (eXtensible
Markup Language), which is versatile for hierarchical data
representation. Other formats like SQL databases are used
for structured data storage and retrieval, while formats such
as Parquet and Avro are optimized for large-scale data
processing. Choosing the appropriate data format depends
on the specific needs of the data handling, including
storage efficiency, ease of access, and compatibility with
analysis tools.
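The trade-off between formats is easy to see by serializing the same small record set two ways with Python's standard library (the records themselves are made up):

```python
import csv
import io
import json

# One small record set, serialized two ways.
rows = [
    {"id": 1, "name": "Ada", "score": 95},
    {"id": 2, "name": "Grace", "score": 88},
]

# CSV: flat and tabular -- simple and widely supported.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name", "score"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()

# JSON: lightweight, nests naturally, common for web data interchange.
json_text = json.dumps({"students": rows}, indent=2)

# Round-trip both. CSV loses type information -- every field
# comes back as a string -- while JSON preserves numbers.
decoded_csv = list(csv.DictReader(io.StringIO(csv_text)))
decoded_json = json.loads(json_text)["students"]
print(decoded_csv[0]["score"], type(decoded_csv[0]["score"]).__name__)
print(decoded_json[0]["score"], type(decoded_json[0]["score"]).__name__)
```

This is one concrete reason format choice matters: a CSV pipeline must re-parse types on every read, whereas JSON (and typed formats like Parquet and Avro) carry that information with the data.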
STRUCTURED DATA
Structured data refers to information that is organized into a
defined format, making it easily searchable and analyzable
by computers. This type of data is typically stored in
relational databases and spreadsheets, where it is arranged
in tables with rows and columns. Each column represents a
variable, and each row contains a record, ensuring
consistency and enabling efficient querying and reporting.
Examples of structured data include customer information
in a CRM system, financial transactions in accounting
software, and inventory lists. The rigid structure of this data
type allows for straightforward data management and
analysis using SQL and other database management tools.
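A minimal sketch of structured data in a relational table, using Python's built-in sqlite3 module; the table, columns, and values are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,  -- each row is one record
        name  TEXT NOT NULL,        -- each column is one variable
        spend REAL
    )
""")
conn.executemany(
    "INSERT INTO customers (name, spend) VALUES (?, ?)",
    [("Ada", 120.50), ("Grace", 340.00), ("Alan", 89.99)],
)

# The rigid row/column structure makes querying straightforward.
total, = conn.execute(
    "SELECT SUM(spend) FROM customers WHERE spend > 100"
).fetchone()
print(total)
conn.close()
```

The enforced schema (every row has the same columns, each with a declared type) is exactly what makes structured data "easily searchable and analyzable": the query engine can rely on it.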
SEMI-STRUCTURED DATA
Semi-structured data is a form of data that does not reside in a
traditional relational database but still has some organizational
properties that make it easier to analyze than unstructured data.
It contains tags or markers to separate semantic elements and
enforce hierarchies of records and fields within the data.
Common formats of semi-structured data include JSON
(JavaScript Object Notation) and XML (eXtensible Markup
Language); NoSQL databases are often used to store it. This type of data is often
found in web data, emails, and data streams from sensors. The
flexible schema allows for a more adaptable approach to data
storage and retrieval, accommodating varying types of data and
evolving requirements.
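The defining properties (tags separating semantic elements, hierarchy, but no fixed schema) show up clearly in a small JSON document; this example record is invented for illustration:

```python
import json

# Keys act as tags separating semantic elements, and nesting enforces
# a hierarchy -- but there is no fixed schema, so different records in
# the "events" stream carry different fields.
raw = """
{
  "user": "vika",
  "events": [
    {"type": "login",    "device": "mobile"},
    {"type": "purchase", "amount": 19.99, "currency": "USD"}
  ]
}
"""
doc = json.loads(raw)

# Navigate the hierarchy; fields absent from a given record are
# handled gracefully with .get() rather than causing an error.
for event in doc["events"]:
    print(event["type"], event.get("amount", "-"))
```

The `.get()` call is the flexible-schema trade-off in miniature: the reader, not the storage layer, decides what to do when a field is missing.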
UNSTRUCTURED DATA
Unstructured data refers to information that lacks a
predefined format or structure, making it more challenging
to collect, process, and analyze. Unlike structured data,
which fits neatly into tables, unstructured data is often text-
heavy and can include multimedia elements. Examples of
unstructured data include emails, social media posts, videos,
audio files, and images. This type of data is rich in
information but requires advanced techniques like natural
language processing (NLP), image recognition, and
machine learning to extract meaningful insights. Due to its
complexity and volume, managing unstructured data often
involves specialized tools and technologies for storage,
indexing, and analysis.
STORAGE IN COMPUTERS
Storage in computers refers to the components and devices used to
retain digital data, ensuring that information is available for
processing and retrieval as needed. There are two primary types of
storage: primary storage, also known as volatile memory or RAM
(Random Access Memory), which provides fast, temporary storage
for data actively being used by the CPU; and secondary storage,
which includes non-volatile memory such as hard drives (HDDs),
solid-state drives (SSDs), and external storage devices. Secondary
storage retains data even when the computer is turned off, making it
suitable for long-term storage. Additionally, there are cloud storage
solutions that allow data to be stored on remote servers accessed over
the internet, providing scalability and remote access capabilities.
Each type of storage has its own advantages in terms of speed,
capacity, and cost.
MANIPULATING DATA SETS
Manipulating data sets involves various techniques and processes to clean, transform, and analyze data to extract meaningful
insights and facilitate decision-making. Common tasks include:
1. Data Cleaning: Removing or correcting errors, handling missing values, and ensuring consistency in data formats.
2. Data Transformation: Changing the data’s format or structure, such as normalizing, aggregating, or reshaping data to fit
analytical requirements.
3. Filtering and Sorting: Selecting specific subsets of data based on criteria and arranging data in a particular order to highlight
trends or patterns.
4. Merging and Joining: Combining multiple data sets into a single, cohesive data set by aligning related information based on
common keys or indexes.
5. Aggregation: Summarizing data through operations like averaging, summing, or counting to condense large data sets into more
interpretable formats.
Tools such as Excel, SQL, Python (with libraries like pandas), and R are commonly used to perform these tasks, enabling analysts
to refine raw data into actionable insights.
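In practice these tasks are usually done with pandas or SQL; to keep the sketch dependency-free, here are standard-library versions of tasks 3, 4, and 5 (filtering/sorting, joining, aggregation) on made-up sales records:

```python
from itertools import groupby
from statistics import mean

sales = [
    {"region": "north", "amount": 120},
    {"region": "south", "amount": 300},
    {"region": "north", "amount": 80},
    {"region": "south", "amount": 150},
]
regions = [
    {"region": "north", "manager": "Ada"},
    {"region": "south", "manager": "Grace"},
]

# Filtering and sorting: select a subset, then order it to surface trends.
big = sorted((s for s in sales if s["amount"] >= 100),
             key=lambda s: s["amount"], reverse=True)

# Merging/joining: align the two data sets on the common key "region".
managers = {r["region"]: r["manager"] for r in regions}
joined = [{**s, "manager": managers[s["region"]]} for s in sales]

# Aggregation: average amount per region (groupby needs sorted input).
by_region = sorted(sales, key=lambda s: s["region"])
averages = {k: mean(s["amount"] for s in g)
            for k, g in groupby(by_region, key=lambda s: s["region"])}
print(averages)
```

With pandas the same three steps collapse to `df.sort_values`, `df.merge`, and `df.groupby("region")["amount"].mean()`, which is why it is the usual tool for this work.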
Manipulating data sets in Excel involves various techniques to clean, transform, and analyze data efficiently. Key tasks include:

Selecting Data: To select data, you can click and drag over cells, use keyboard shortcuts (like Ctrl + Shift + Arrow keys), or use the name box to jump to specific cell ranges. Excel also offers features like filters and tables to easily select subsets of data based on specific criteria.

Reordering Data: Reordering involves sorting data to arrange it in a meaningful order. You can sort columns in ascending or descending order by selecting the column header and using the Sort feature found in the Data tab. For more complex sorting, you can use the Sort dialog box to sort by multiple columns.

Reformatting Data: Reformatting changes the appearance and structure of data. This can include changing cell formats (like dates, currency, or percentages), applying conditional formatting to highlight specific data points, or using functions to convert text to uppercase or lowercase. The Text to Columns feature can split text data into multiple columns based on delimiters.
SELECTING A COLUMN

REORDERING A COLUMN

REFORMATTING A COLUMN

FILTERING AND SORTING ROWS

SUBSETTING DATA

REMOVING DUPLICATES
DATA ANALYSIS PROCESS
The data analysis process typically involves several key steps:

Defining the Problem: Clearly articulate the objectives of the analysis and the questions you seek to answer. Understanding the
purpose of the analysis helps guide subsequent steps.

Data Collection: Gather relevant data from various sources, ensuring its quality, completeness, and relevance to the analysis goals.
This may involve accessing databases, conducting surveys, or collecting data through experiments.

Data Cleaning and Preparation: Clean the data to remove errors, inconsistencies, and missing values. This step also involves
transforming and restructuring the data to make it suitable for analysis. Tasks may include standardizing formats, encoding
categorical variables, and scaling numerical data.

Exploratory Data Analysis (EDA): Explore the data to understand its characteristics, identify patterns, and detect outliers. EDA
techniques include summary statistics, data visualization (e.g., histograms, scatter plots), and correlation analysis to uncover
insights and hypotheses.
Hypothesis Testing and Modeling: Formulate hypotheses based on insights from EDA and use
statistical methods to test them. This step may involve building predictive models (e.g., regression,
classification) or conducting inferential analyses to draw conclusions about the data population.

Interpretation and Insights: Interpret the results of the analysis in the context of the problem
domain, drawing meaningful conclusions and actionable insights. Communicate findings effectively
to stakeholders through reports, presentations, or visualizations.

Iterative Refinement: Review and refine the analysis process iteratively, incorporating feedback
and additional data as needed. Continuously validate and update models to improve accuracy and
relevance over time.

Decision Making and Implementation: Use the insights gained from the analysis to inform
decision-making and drive actions or interventions. Monitor the impact of decisions and track
performance metrics to assess the effectiveness of the analysis in achieving desired outcomes.
DATA CLEANSING

Data cleansing, also known as data cleaning or data scrubbing, is the process of identifying and correcting errors, inconsistencies, and inaccuracies in a dataset to improve its quality and reliability for analysis. Key steps in the data cleansing process include:

Identifying Data Quality Issues: Review the dataset to identify common issues such as missing values, duplicate records, incorrect formats, and outliers. Understanding the nature and extent of these issues is essential for developing an effective cleansing strategy.

Handling Missing Data: Determine how to handle missing values, which can skew analysis results. Options include imputing missing values using statistical methods, deleting rows or columns with missing data, or flagging missing values for further investigation.

Removing Duplicates: Identify and remove duplicate records or observations from the dataset, ensuring that each entry is unique. This helps prevent duplication bias and ensures the accuracy of analysis results.

Standardizing Data Formats: Standardize data formats to ensure consistency and comparability across the dataset. This may involve converting data types, standardizing date formats, and normalizing text fields to remove variations.
Correcting Errors: Identify and correct errors in the dataset, such as typographical errors, inconsistencies in naming conventions, and invalid values. This may require manual review or automated algorithms to detect and rectify errors.

Handling Outliers: Identify and address outliers, which are data points that deviate significantly from the rest of the dataset. Depending on the analysis goals, outliers can be treated by removing them, transforming them, or analyzing them separately.

Validating Data Integrity: Validate the integrity of the cleansed dataset to ensure that it meets quality standards and is fit for analysis. This may involve cross-referencing data against external sources, conducting data validation checks, and performing quality assurance tests.

Documenting Changes: Document all changes made during the data cleansing process, including the rationale behind each decision and any assumptions made. Maintaining clear documentation helps ensure transparency and reproducibility of the analysis.
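Several of these steps (standardizing formats, normalizing text, removing duplicates, flagging missing values) fit in one short pass over some deliberately messy, made-up records:

```python
from datetime import datetime

# Raw records with typical quality issues: a duplicate, a missing value,
# inconsistent date formats, and inconsistent text casing.
raw = [
    {"name": "Ada",   "signup": "2024-01-05", "city": "london"},
    {"name": "ada",   "signup": "05/01/2024", "city": "London"},    # duplicate
    {"name": "Grace", "signup": None,         "city": "NEW YORK"},  # missing
]

def parse_date(value):
    """Standardize several date formats to ISO YYYY-MM-DD."""
    if value is None:
        return None  # flag for further investigation rather than guess
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")

seen, clean = set(), []
for rec in raw:
    normalized = {
        "name": rec["name"].title(),        # normalize text fields
        "signup": parse_date(rec["signup"]),
        "city": rec["city"].title(),
    }
    key = normalized["name"]                # dedupe on a chosen key
    if key not in seen:
        seen.add(key)
        clean.append(normalized)

print(clean)
```

Note the design choice on missing data: the sketch flags it (keeps `None`) rather than imputing or deleting, which matches the "further investigation" option above; a real pipeline would also log each change for the documentation step.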
THANK YOU FOR YOUR ATTENTION
