CertPREP Instructor PPT ITDataAnlytics 01
Lesson 1
2
Skill 1.1: Define the concept of data
• This skill covers how to:
• Define data and information
• Differentiate between data and information
• Define statistics and its relation with data
3
Define data and information
• Data is a collection of facts or figures that are recorded for
analysis.
• Information is organized data that is analyzed and presented to
make it meaningful and suitable for making decisions.
4
Differentiate between data and information
• Table 1-2: The differences between data and information
Data | Information
A collection of facts that is raw and has little or no meaning. | Processed data that has some meaning.
Data is independent of information. | Information is dependent on data.
Data cannot be used for making decisions unless analyzed. | Information can be used to make personal and business decisions.
It may be difficult to understand data. | It is easier to understand information.
5
Define statistics and its relation with data
• Statistics are summaries of the data collected, often expressed as
numerical values such as an average, a maximum, or a minimum.
• Statistics are often presented in the form of charts, graphs, or
tables.
6
Example
Marks for Math assignment
roll 1 2 3 4 5 6 7 8 9 10
marks 60 40 80 65 55 70 50 85 45 50
Statistics
Average mark 60
Maximum mark 85
Minimum mark 40
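The three statistics above can be reproduced with a few lines of code. The following is a minimal sketch in Python (the language choice and the variable names are assumptions, not part of the lesson), using the ten marks from the table:

```python
from statistics import mean

# Marks for the Math assignment, in roll-number order (roll 1 to roll 10).
marks = [60, 40, 80, 65, 55, 70, 50, 85, 45, 50]

average_mark = mean(marks)   # sum of the marks divided by the number of marks
maximum_mark = max(marks)    # highest mark
minimum_mark = min(marks)    # lowest mark

print("Average mark:", average_mark)
print("Maximum mark:", maximum_mark)
print("Minimum mark:", minimum_mark)
```

The raw list of marks is the data; the three summary values are the statistics derived from it.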
7
Skill 1.2: Describe basic data variable types
• This skill covers how to:
• Define variables
• Identify different data types
• Define type checking
8
Define variables
• In data analytics, a variable is a quantity or characteristic that can be
counted or measured, such as height, age, income, class size, class
grades, or gender.
• It is called a variable because its value can change. For example, age
varies from person to person and also varies over time.
• In a program, a variable can hold different values at different stages of
execution.
9
Define variables
• Common variable naming rules. A variable name:
• Can consist only of letters (A-Z, a-z), digits (0-9), and the underscore (_).
• Can start with a letter or an underscore but cannot start with a digit.
• Is case-sensitive.
• Should not be a keyword. Keywords are the reserved words of a
programming language, e.g., if, while, and for.
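The naming rules above can be checked mechanically. The sketch below is one possible way to do so in Python; the helper `is_valid_name` and the sample names are hypothetical and only illustrate the rules. Python itself is slightly more permissive than these common rules (it also accepts many non-ASCII letters), so the pattern here follows the slide, not the full Python grammar.

```python
import keyword
import re

# Letters, digits, and underscores only; the first character must not be a digit.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_name(name: str) -> bool:
    """Return True if `name` follows the common naming rules listed above."""
    follows_pattern = NAME_PATTERN.fullmatch(name) is not None
    # keyword.iskeyword reports Python's reserved words such as if, while, for.
    return follows_pattern and not keyword.iskeyword(name)


for candidate in ["age", "_total", "class_size2", "2nd_score", "while", "net-income"]:
    print(candidate, "->", is_valid_name(candidate))

# Case sensitivity: these are two different variables.
age = 30
Age = 41
print(age, Age)
```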
10
Identify different data types
• In computer programming, a data type specifies the type of data that a
variable can store.
Data Types Description
11
Example
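As a small illustration of what a data type is, the sketch below stores values of four built-in Python types and prints the type of each. The choice of Python, the variable names, and the sample values are assumptions; the types shown are common ones and not necessarily the ones listed in the lesson's table.

```python
age = 21              # int: a whole number
height_m = 1.68       # float: a number with a fractional part
name = "Asha"         # str: text (hypothetical sample value)
is_enrolled = True    # bool: either True or False

# type() reports the data type that each variable currently stores.
for value in (age, height_m, name, is_enrolled):
    print(repr(value), "->", type(value).__name__)
```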
12
Define type checking
• Types of type checking
  • Compile time
  • Runtime
• Categories of type checking
  • Statically typed languages
  • Dynamically typed languages
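Python is a dynamically typed language, so its type checks happen at runtime. The sketch below (the function `add_bonus` is hypothetical) shows a type error that is only detected when the offending line actually executes; in a statically typed language such as Java, the equivalent mistake would typically be reported at compile time, before the program runs.

```python
def add_bonus(mark, bonus):
    """Add a bonus to a mark; no types are declared for the parameters."""
    return mark + bonus


print(add_bonus(60, 5))      # both arguments are integers, so this works

caught = None
try:
    add_bonus("60", 5)       # a string plus an integer is not allowed
except TypeError as error:
    # The mismatch is detected only now, while the program is running.
    caught = error
    print("Runtime type error:", error)
```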
13
Skill 1.3: Describe basic structures used in data analytics
• This skill covers how to:
• Define tables
• Define arrays
• Define lists
14
Define tables
• A table is a data structure in which data is arranged in rows and
columns.
• The row is also called a record, a tuple, or a vector.
• The column is also called a field, a parameter, a property, or an
attribute.
• The intersection of a row and a column is called a cell.
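The terms above can be made concrete with a tiny table held in memory. This is a minimal sketch in Python, assuming a table is represented as a tuple of column names plus a list of row tuples; the student names are hypothetical, and other representations are equally valid.

```python
# Columns (fields or attributes) of the table.
columns = ("roll", "name", "marks")

# Rows (records or tuples) of the table; one tuple per student.
rows = [
    (1, "Asha", 60),
    (2, "Ben", 40),
    (3, "Chen", 80),
]

# A cell is the intersection of one row and one column.
row_index = 1                              # second row (indexes start at 0)
column_index = columns.index("marks")      # position of the "marks" column
cell = rows[row_index][column_index]
print("Cell value:", cell)

# Reading one column means taking the same position from every row.
marks_column = [row[column_index] for row in rows]
print("Marks column:", marks_column)
```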
15
Define arrays
• Characteristics of arrays:
• Index-based
• Fixed in size
• Provides random access
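The sketch below illustrates index-based, random access using the `array` module from Python's standard library. One caveat is worth stating: a Python `array` can grow and shrink, so the fixed-size characteristic applies to arrays in languages such as C or Java and is not demonstrated here. The variable names are assumptions.

```python
from array import array

# "i" is the type code for signed integers; every element must be an integer.
marks = array("i", [60, 40, 80, 65, 55])

third = marks[2]     # index-based: positions start at 0, so index 2 is the third item
marks[0] = 62        # random access: any position can be read or written directly
print(third, marks[0], len(marks))

rejected = False
try:
    marks[1] = "forty"   # a value of the wrong type is refused
except TypeError:
    rejected = True
print("Wrong-type value rejected:", rejected)
```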
16
Define lists
• A list is a linear data structure that is used to store a collection of
items.
• Lists are supported in many programming languages such as
Python, Java, C#, and C++.
• The items in the list are ordered.
• Lists are mutable: items can be changed, added, or removed.
• Lists allow duplicate items.
• Similar to an array, list items are indexed.
• Unlike an array, the items in a list can be of the same or different
data types.
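The list properties above can be seen in a few lines of Python. This is a minimal sketch with hypothetical sample values; it shows Python's built-in list, and list types in other languages may differ in detail.

```python
# A list can mix data types and can contain duplicate items.
items = ["pen", 3, 2.5, "pen"]

first = items[0]         # indexed: positions start at 0
items[1] = 4             # mutable: an existing item can be replaced
items.append(True)       # mutable: new items can be added at the end

print(items)                       # items keep the order in which they were stored
print("Count of 'pen':", items.count("pen"))
```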
17
Skill 1.4: Describe data categories
• This skill covers how to:
• Differentiate between structured and unstructured data
• Identify and use different types of data
18
Differentiate between structured and unstructured data
• Table 1-14: Key differences between structured and unstructured data
Aspect | Structured Data | Unstructured Data
Data Definition | It is organized and usually has a predefined format. | It is unorganized and has no predefined format.
Data Type | The data type for each attribute is predefined. | The data type for each piece of data is not predefined.
Storage | It is stored in relational databases like Oracle, Microsoft SQL Server, or MySQL. | It is stored in non-relational databases such as MongoDB, Cassandra, or Couchbase.
Searching or Analysis | It is easy to search and analyze. | It is hard to search and analyze.
Examples | Customer records, student records, product information, and sales transactions | Social media posts, images, audio, video, and emails
19
Identify and use different types of data
• Big data
  • Volume
  • Variety
  • Velocity
  • Veracity
  • Variability
  • Value
• Imputed data
• Metadata
• Qualitative data
  • Nominal
  • Ordinal
• Quantitative data
  • Continuous
  • Discrete
20
Summary
• This lesson covered defining the concept of data; describing
basic data variable types; describing basic structures used in
data analytics; and describing data categories.
21