
Data Manipulation

Lesson 2

IT Specialist: Data Analytics


Topics Covered
• Skill 2.1: Import, store, and export data
• Skill 2.2: Clean data
• Skill 2.3: Organize data
• Skill 2.4: Aggregate data

2
Skill 2.1: Import, store, and export data
• This skill covers how to:
• Describe ETL processing
• Perform ETL with relational data
• Perform ETL with data stored in delimited files
• Perform ETL with data stored in XML files
• Perform ETL with data stored in JSON files

3
Describe ETL processing
• Figure 2-1: The ETL process

4
Extract
• During the data extraction phase, raw data is extracted from one or more source systems to a staging area. The raw data can be structured, semi-structured, or unstructured. Possible sources include, but are not limited to, the following:
• Relational or non-relational databases
• Flat files like CSV, JSON, or XML
• Sensors, email, or web pages

5
Transform
• During the data transformation phase, the raw data extracted from the source system is processed and transformed for its intended analytical use case. Some of the tasks performed during this phase are as follows:
• Filtering, cleansing, and deduplicating data (i.e., eliminating duplicate or redundant records)
• Removing, encrypting, decrypting, or hashing critical data as per industry or government data regulations
• Performing necessary calculations or translations, such as converting currencies, converting measurement units, or standardizing text formats
• Changing the data format to match the schema of the target system
• Some methods of filtering, cleaning, and formatting data are discussed later in this lesson.
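
As a minimal sketch (hypothetical raw_sales staging table; the exchange rate is an assumed constant), several of these transformations can be combined in one SQL query:

SELECT
  UPPER(TRIM(customer_name)) AS customer_name,  -- standardize text format and remove extra spaces
  amount_usd * 0.92 AS amount_eur               -- currency conversion at an assumed rate
FROM raw_sales;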
6
Load
• During the data load phase, the transformed data is loaded into the target system.
• The data loading process can be one of the following types:
• Full data load, where all data is loaded into the target system
• Incremental data load, where incremental data changes are loaded periodically after an initial full load
• Full refresh data load, where the old data in the target system is fully replaced by the new data
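
A minimal SQL sketch of the first two load types (hypothetical staging_sales and target_sales tables; the cutoff date stands in for the last load time):

-- Full data load: copy everything into the target table
INSERT INTO target_sales SELECT * FROM staging_sales;

-- Incremental data load: copy only rows changed since the last load
INSERT INTO target_sales
SELECT * FROM staging_sales WHERE modified_date > '2024-01-01';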

7
Perform ETL with relational data
• Varieties of RDBMS
  • Microsoft SQL Server
  • MySQL
  • Oracle Database
• Key components
  • Primary key
  • Foreign key

8
Structured Query Language
• Categories of SQL statements
  • Data Definition Language
  • Data Query Language
  • Data Manipulation Language
  • Data Control Language
• SELECT statements
• SQL alias
• SQL joins
  • Inner
  • Left outer
  • Right outer
  • Full outer
  • Cross
  • Self

9
SQL

10
SELECT Statements
• SELECT * FROM table_name;
• SELECT col1, col2 FROM table_name;
• SELECT DISTINCT col1, col2, ... FROM table_name;
• SELECT col1, col2, ... FROM table_name WHERE condition;
• SELECT TOP N col1, col2, ... FROM table_name; (TOP is SQL Server syntax; many other databases use LIMIT instead)

11
SQL Alias Statement
• SELECT emp_id AS employee_id, emp_name AS employee_name FROM employee;
• SELECT e.emp_id, e.emp_name, d.dept_name
  FROM employee e INNER JOIN department d ON e.dept_id = d.dept_id;

12
Cross Join or Cartesian Join
• Generates the paired combination of each row of the first table with each row of the second table; for example, a table with 3 rows cross joined with a table with 4 rows produces 12 rows

• SELECT column1, column2, ... FROM table1 CROSS JOIN table2;

13
Perform ETL with data stored in delimited
files (Slide 1 of 3)
• Common delimiters
  • Comma (,)
  • Semicolon (;)
  • Tab
  • Space
• Types of delimited files
  • CSV file
  • TSV file
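
For illustration, the same hypothetical record in CSV and TSV form differs only in the delimiter (tab characters shown as →):

CSV:
name,department,salary
Asha,HR,52000

TSV:
name→department→salary
Asha→HR→52000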

14
Perform ETL with data stored in delimited
files (Slide 2 of 3)
• Importing delimited files to Excel
1. Open Excel and click on Data -> From Text.
2. A dialog box will open to allow you to select the file. Select your delimited file and click on Import.
3. Excel will open a preview of the data in the selected file.
• Click Next.
• Choose the delimiter (for example, comma for a CSV file) and click on Finish.
• A new Import Data dialog box will open. Choose either Existing worksheet or New worksheet and click OK.

15
Perform ETL with data stored in delimited
files (Slide 3 of 3)
• Reading and writing delimited files using Python (pandas)
  • read_csv()
  • to_csv()
• Reading and writing delimited files using R (readr)
  • read_csv()
  • write_csv()

16
Perform ETL with data stored in XML files
• XML follows a tree structure that must contain a root element
• The root element is the parent of all other elements
• Each XML element may contain text, attributes, or sub-child
elements
• All attribute values must be quoted with either single or double
quotes
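
A minimal illustrative XML document (hypothetical employee data): <employees> is the root element, id is a quoted attribute, and <name> and <department> are sub-child elements containing text.

<employees>
  <employee id="101">
    <name>Asha</name>
    <department>HR</department>
  </employee>
</employees>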

17
Perform ETL with data stored in JSON files
• JSON file values
• String
• Number
• JSON object
• Array
• Boolean
• Null
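
A minimal illustrative JSON document (hypothetical data) showing all six value types; the labels on the right are annotations, not part of the file:

{
  "name": "Asha",                  <- string
  "age": 30,                       <- number
  "address": { "city": "Pune" },   <- JSON object
  "skills": ["SQL", "Excel"],      <- array
  "active": true,                  <- Boolean
  "manager": null                  <- null
}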

18
Skill 2.2: Clean data
• This skill covers how to:
• Perform data cleaning common practices
• Perform truncation
• Describe data validation

19
Perform data cleaning common practices
• Common practices used in data cleaning process
1. Remove irrelevant data
2. Remove duplicate data
3. Remove unnecessary spaces
4. Handle inconsistent capitalization
5. Data type conversion
6. Handle missing or null values using imputation
7. Deal with outliers
8. Standardize data

20
Removing or filtering out irrelevant data
• In most cases, only part of the dataset is relevant to our data
analysis. In such cases, we either filter out or delete the irrelevant
data and select only the part of the data that is relevant to us.

• Filter out all the inactive employees during any calculation


• SELECT * FROM employee2 WHERE is_active != 0;

• Delete all inactive employees permanently from the table


• DELETE FROM employee2 WHERE is_active = 0;

21
Remove duplicate data
• Duplicate records are very common. Because data is often collected and gathered from multiple different sources, raw, unprocessed data frequently contains duplicates.

• SELECT DISTINCT * FROM raw_employee;

22
Remove unnecessary spaces
• The unnecessary leading and/or trailing space can cause the same
data to be considered different. For example, the values “male”, “ male”,
“male ” and “ male ” are the same, but are considered different by a
string comparison due to leading and/or trailing spaces. Extra spaces
can be handled using the following SQL functions.

• TRIM() removes both leading and trailing spaces.


• LTRIM() removes only leading spaces.
• RTRIM() removes only trailing spaces.

• SELECT TRIM(department) FROM employee;


23
Handling inconsistent capitalization
• Sometimes, the same data looks different to an algorithm or
function that uses string comparison due to inconsistent
capitalization.

• SELECT UPPER(department) FROM employee;

• SELECT LOWER(department) FROM employee;

24
Data type conversion
• Syntax: CAST(expression AS datatype(length))

Value        Description
expression   Required. The value to convert.
datatype     Required. The data type to convert expression to. Can be one of the following: bigint, int, smallint, tinyint, bit, decimal, numeric, money, smallmoney, float, real, datetime, smalldatetime, char, varchar, text, nchar, nvarchar, ntext, binary, varbinary, or image.
(length)     Optional. The length of the resulting data type (for char, varchar, nchar, nvarchar, binary, and varbinary).

• SELECT CAST(age AS INT) AS age FROM employee;

25
Handle missing or null values using
imputation
Real-world datasets often contain missing values, generally represented as NULL, N/A, blanks, etc. Missing values are one of the most common problems in data analysis. There are various ways to handle them:
• One way is to discard the records having missing values. But doing so
may result in the loss of valuable information.
• A better way is to replace missing data with some substituted value. This
technique is known as imputation.
• The substituted value that is used to replace missing data is known
as imputed data. The imputed data is derived from the existing part of
the data.

26
Handle missing or null values using
imputation
The following are some of the popular methods of data imputation:
• Imputation Using Mean or Median Values: In this technique, the
missing values in a column are replaced by the mean or median
value of non-missing values in the same column. This method can
be used only with numeric data.
Mean or average: It is the sum of all the numbers divided by the total
number of numbers. For example, the mean of 5 numbers [9, 12, 8, 14,
7] is (9+12+8+14+7)/5 , i.e., 10.
Median: It is the middle number in the sorted list of numbers in ascending
or descending order. For example, to find the median of 5 numbers [9, 12,
8, 14, 7], first sort the numbers in ascending order [7, 8, 9, 12, 14] and find
the middle number, i.e., 9. Therefore 9 is the median value.
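As a sketch, mean imputation can be written directly in SQL (SQL Server-style syntax; assuming a numeric salary column containing some NULL values):

UPDATE employee
SET salary = (SELECT AVG(salary) FROM employee WHERE salary IS NOT NULL)
WHERE salary IS NULL;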
27
Handle missing or null values using
imputation
• Imputation Using Most Frequent Values: In this technique,
the missing values in a column are replaced by the most
frequent value of non-missing values in the same column. This
method can be used for both numeric and non-numeric data.
• Imputation Using Zero or Constant Values: In this technique,
the missing values of a column are replaced by zero or any
other constant value. This method can be used for both numeric
and non-numeric data.
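
For example, constant-value imputation can be performed at query time with the standard COALESCE() function, which returns its first non-null argument ('Unknown' is an arbitrary substitute value here):

SELECT COALESCE(department, 'Unknown') AS department FROM employee;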

28
Deal with outliers
• An outlier is an extremely high or extremely low data value
compared to the other data values in the dataset.
The outliers are considered abnormal data values but they
should be investigated before eliminating them because they
may be valuable to the data and the analysis. To investigate
them, ask questions such as:
• Why did such data values appear?
• Is it a rare case, or is it likely to appear again?
Based on the investigation, a data analyst may either eliminate
those data points or perform data imputation.
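
A minimal sketch for flagging candidate outliers with a fixed threshold (the salary bounds here are arbitrary, illustrative values):

SELECT * FROM employee2 WHERE salary < 10000 OR salary > 500000;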
29
Standardize data
• Standardizing data is a process of changing data into a
consistent format. This is required in various scenarios like
these:
• Changing temperature to either Fahrenheit or Celsius to have a
consistent unit.
• Ensuring that all instances of a length measurement are given
in the same unit (meters or kilometers).

• SELECT (length_km * 1000) AS length_meter FROM lengths;

30
Describe data validation
• Data validation is performed after the data cleaning process to
validate the data. During data validation, you take steps to
ensure that the data is accurate, complete, consistent, and
uniform.
• Common examples of data validation rules
• Data completeness check to ensure that the required records are not missing
• Data type validation to verify that each field has the correct data type (for example, integer, float, string)
• Range validation to ensure the values are in the correct range (e.g., a number between 1 and 100)
31
Describe data validation
• Uniqueness check (For example, in a relational database, the
uniqueness can be ensured at the time of table creation by creating the
primary key constraints for fields like employee_id in
the EMPLOYEE table and department_id in DEPARTMENT table.)
• Consistent expressions (For example, the same department name
should not have different values like “HR”, “H.R”, and “Hr”)
• No null values (For example, the field name in the EMPLOYEE table
should not have null values).
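
As a sketch, several of these rules can be checked with simple SQL queries against the tables used earlier in this lesson; each query should return nothing if the data is valid:

SELECT * FROM employee WHERE age NOT BETWEEN 1 AND 100;  -- range validation
SELECT * FROM employee WHERE name IS NULL;               -- no-null check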

32
Skill 2.3: Organize data
• This skill covers how to:
• Describe data organization
• Perform sorting
• Perform filtering
• Perform appending and slicing
• Perform pivoting
• Perform transposition

33
Describe data organization
• Data organization plays a vital role in managing and accessing data.
When data is well organized in an Excel worksheet or in a database
table, it allows users to access and process data easily and efficiently.
It is very difficult to access and process data that is not well
organized.
• Data organization helps in categorizing and classifying data to make it more usable. The following processes are used to organize data:
• Sorting data
• Filtering data
• Appending data
• Slicing data
34
Perform sorting (Slide 1 of 2)
• Figure 2-5 Sorting Data in Excel

35
Perform sorting (Slide 2 of 2)
• SQL provides the ORDER BY clause to sort the data selected from the database.
• ORDER BY sorts the records in ascending order by default; the optional ASC keyword makes this explicit.
• The DESC keyword is used to return the result in descending order.
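
For example, using the employee2 table from this lesson's other examples:

SELECT * FROM employee2 ORDER BY name;         -- ascending (ASC is the default)
SELECT * FROM employee2 ORDER BY salary DESC;  -- descending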

36
Perform filtering
• Follow these steps to filter data in Excel.
1. Select any cell within the range of your dataset.
2. Select Data and then click on Filter.
3. After clicking on Filter, the column headers will show arrow icons. Select any of the column header arrows to filter the data on that column.
4. Select Text Filters or Number Filters, and then select a
comparison, like Between.
5. Enter the filter criteria and click on OK.

37
Perform filtering
Operator   Meaning
=          Equal to
!= or <>   Not equal to
>          Greater than
<          Less than
<=         Less than or equal to
>=         Greater than or equal to
BETWEEN    To select items within a specified range, where the start and end items are inclusive
LIKE       To search for a pattern
IN         To specify multiple possible values for a column to include
NOT IN     To specify multiple possible values for a column to exclude


38
Perform filtering
• The following query will pull all records from the employee2 table where the name begins with the character 'R'. Here, the wildcard character % is being used.

• SELECT * FROM employee2 WHERE name LIKE 'R%';

39
Perform filtering
• The following query will return all records from the employee2 table where the id is between 103 and 105, inclusive of the beginning and end values.

• SELECT * FROM employee2 WHERE id BETWEEN 103 AND 105;

40
Perform filtering
• The following query will return all records from the employee2 table where the id is either 103, 104, or 105.

• SELECT * FROM employee2 WHERE id IN (103,104,105);

41
Perform filtering
• The following query will return all records from the employee2 table where the id is not 103, 104, or 105.

• SELECT * FROM employee2 WHERE id NOT IN (103,104,105);

42
AND and OR operators:
• The AND and OR operators are used to filter records based on more than one condition in the WHERE clause
• The AND operator returns TRUE if all the conditions separated by AND are TRUE
• The OR operator returns TRUE if any of the conditions separated by OR is TRUE
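
Illustrative queries (columns drawn from this lesson's employee2 examples; the specific values are assumptions):

SELECT * FROM employee2 WHERE department = 'HR' AND is_active = 1;
SELECT * FROM employee2 WHERE id = 103 OR id = 105;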

43
Perform appending and slicing (Slide 1 of 3)
• Appending is used to combine two or more strings together. In
SQL, the + operator as well as the CONCAT() function are used
to combine strings.
• The following queries combine the two strings 'Data' and ' Analyst' together. (The + operator is SQL Server syntax; other databases use || or CONCAT().)

Query                               Output
SELECT CONCAT('Data', ' Analyst');  Data Analyst
SELECT 'Data' + ' Analyst';         Data Analyst

44
Perform appending and slicing (Slide 2 of 3)
• Slicing is used to extract a subset of elements from a string. In SQL, the SUBSTRING() function is used for slicing: SUBSTRING(string, start, length)
• string - The string to extract from.
• start - The start position. The first position in the string is 1.
• length - The number of characters to extract.

• The following query extracts the first four characters from the string 'Data Analyst'.
Query                                    Output
SELECT SUBSTRING('Data Analyst', 1, 4);  Data
45
Perform appending and slicing (Slide 3 of 3)
• Slicing Data in Excel
1. Select all the data in Excel and format it as a table (Insert -> Table).
Check My Table has Headers and click OK.
2. Click anywhere in the table and select Insert -> Slicer
3. A dialog box for Insert Slicers will open in which you need to select
the fields that you want to use to slice data and then select OK.
4. For each of the selected fields, a slicer will be created and each
slicer will have buttons corresponding to the distinct values in the
selected field.
5. When any of the slicer buttons is clicked, then only the matching
rows in the linked table will be shown.

46
Perform pivoting
1. Click anywhere in the table and select Insert -> PivotTable
2. The Create PivotTable dialog will be opened. Click OK.
3. A new worksheet will be opened that allows you to select the
pivot fields.
4. By default, the calculation is the sum, but it can be changed to count, min, max, and others.
• Select Value Field Settings from the VALUES dropdown.
• Choose your calculation (Sum, Count, Average, Max, Min, Product) from the Value Field Settings dialog box and click OK.

47
Perform transposition
• Steps to create a transposition of a table
1. Select blank cells where you would like the transposed table to be created. (For the legacy array formula, the selected range should have as many rows as the source range has columns, and vice versa.)
2. After selecting the blank cells, type the transpose formula =TRANSPOSE(A1:E4)
3. After writing the transpose formula, press ENTER (CTRL+SHIFT+ENTER in Excel versions without dynamic arrays), which will generate the transposed table.

48
Skill 2.4: Aggregate data
• This skill covers how to:
• Describe the aggregation function
• Use aggregation functions like COUNT, SUM, MIN, MAX, and AVG in
SQL
• Use GROUP BY and HAVING in SQL

49
Describe the aggregation function
• The aggregation of data is one of the most important aspects of
data analytics. It is useful in knowing the summary of the data.
Consider that you want to know the total number of employees
as well as the maximum, minimum, and average salary of
employees working in your organization. In order to find these
details, you need to apply the appropriate aggregation functions
on the employee records stored in the database.

50
Describe the aggregation function
• Table 2-43 Common and frequently used aggregation functions
COUNT Returns the number of records.

SUM Returns the total sum of values in a numeric column.

MIN Returns the smallest value in a column.

MAX Returns the largest value in a column.

AVG Returns the average of all the values in a column.

51
Use aggregation functions like COUNT,
SUM, MIN, MAX, and AVG in SQL
• SELECT
count(*) as total_employee,
sum(salary) as total_salary,
min(salary) as min_salary,
max(salary) as max_salary,
avg(salary) as average_salary
FROM employee2;

52
Use GROUP BY and HAVING in SQL
• The GROUP BY statement groups rows into categories so that aggregation functions can be applied to the rows of each category independently.
• The HAVING clause is used to filter grouped data using conditions calculated with the aggregate functions.

53
Use GROUP BY and HAVING in SQL
• SELECT department, COUNT(*) AS total FROM employee2 GROUP BY department;

• SELECT department, COUNT(*) AS total FROM employee2 GROUP BY department HAVING COUNT(*) > 1;

54
Use GROUP BY and HAVING in SQL
• SELECT department,
COUNT(*) as total_employee,
MIN(salary) as min_salary,
MAX(salary) as max_salary,
AVG(salary) as average_salary
FROM Employee2 GROUP BY department;

55
Summary
• This lesson covered importing, storing, and exporting data;
cleaning data; organizing data; and aggregating data.

56
