Domain 2.0: Data Mining
31. Data integration combines business and technical processes for collating
data from different sources into valuable and meaningful datasets.

32. Extract, transform, load (ETL) enables data engineers to extract data
from multiple source systems, transform the raw data into a more
usable/workable dataset, and finally load the data into a storage system so
end users can access meaningful data in reports or dashboards.
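
A minimal Python sketch of the three ETL stages, assuming a hypothetical sales.csv source file and a local SQLite file as the target datastore (all file, table, and column names here are illustrative):

```python
import sqlite3
import pandas as pd

# Extract: read raw data from a hypothetical CSV source.
raw = pd.read_csv("sales.csv")

# Transform: clean column names, drop incomplete rows, add a derived total.
raw.columns = [c.strip().lower() for c in raw.columns]
clean = raw.dropna(subset=["order_id", "quantity", "unit_price"])
clean = clean.assign(total=clean["quantity"] * clean["unit_price"])

# Load: write the transformed dataset into the target datastore.
with sqlite3.connect("warehouse.db") as conn:
    clean.to_sql("sales_clean", conn, if_exists="replace", index=False)
```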

33. Extract, load, transform (ELT) enables data engineers to extract the data
from data sources, load it into the target datastore, and transform it as
queries are executed to get insights in reports or dashboards.

34. Delta loading refers to the process of extracting the delta, or difference,
between the current data and the data that was previously extracted as part of
the ETL process.
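
A hedged sketch of a delta extract, assuming the source table carries a last_modified timestamp column and the pipeline tracks a watermark from the previous run (table and column names are assumptions):

```python
import sqlite3
import pandas as pd

last_extract = "2024-01-01 00:00:00"  # assumed watermark from the previous run

with sqlite3.connect("source.db") as conn:
    # Only pull rows changed since the previous extraction (the "delta").
    delta = pd.read_sql_query(
        "SELECT * FROM orders WHERE last_modified > ?",
        conn,
        params=(last_extract,),
    )

# The delta rows are then transformed and loaded like any other ETL batch.
print(f"{len(delta)} changed rows to process")
```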

35. An application programming interface (API) provides a programmable
interface for interacting with applications and infrastructure and acts as a
middleware integration layer.

36. APIs enable organizations to selectively share their applications, in terms
of data and functionality, with internal stakeholders (developers and users) as
well as external stakeholders, such as business partners, third-party
developers, and vendors.
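
A minimal sketch of pulling data from a REST API with Python's requests library; the endpoint URL, token, and query parameters below are hypothetical:

```python
import requests

# Hypothetical endpoint; real APIs document their own URLs and auth schemes.
url = "https://api.example.com/v1/customers"
headers = {"Authorization": "Bearer <token>"}

response = requests.get(url, headers=headers, params={"region": "EMEA"}, timeout=30)
response.raise_for_status()          # fail loudly on HTTP errors

customers = response.json()          # most JSON APIs return a list or dict of records
```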

37. Web scraping, also known as web data extraction or web harvesting, is a
method used to extract data from websites.
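
A minimal scraping sketch using requests and BeautifulSoup; the page URL and the .price selector are assumptions, and a real site's terms of service and robots.txt should be checked first:

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical product listing page.
html = requests.get("https://example.com/products", timeout=30).text
soup = BeautifulSoup(html, "html.parser")

# Extract the text of elements assumed to carry a "price" CSS class.
prices = [tag.get_text(strip=True) for tag in soup.select(".price")]
```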

38. Surveys are commonly used to collect data from respondents.

39. Sampling is the process of collecting data from a subdivision/subset of a
given population to get insights that represent the whole population.
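
A minimal pandas sketch of simple random sampling, assuming a hypothetical customers.csv holds the full population:

```python
import pandas as pd

population = pd.read_csv("customers.csv")  # assumed full population

# Random 10% sample of the rows; a fixed seed keeps the sample reproducible.
sample = population.sample(frac=0.10, random_state=42)
```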

40. A derived variable is defined by a parameter or an expression related to
existing variables in a dataset.

41. The process of recoding a variable can be used to transform a current
variable into a different one, based on certain criteria and business
requirements.
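
A short pandas sketch of both ideas on a small made-up dataset: a derived variable computed from existing columns, and an age variable recoded into categories:

```python
import pandas as pd

df = pd.DataFrame({"age": [23, 35, 58], "income": [40000, 72000, 65000]})

# Derived variable: defined by an expression over existing variables.
df["income_per_year_of_age"] = df["income"] / df["age"]

# Recoding: transform the existing variable into a new categorical one.
df["age_group"] = pd.cut(df["age"], bins=[0, 30, 50, 120],
                         labels=["young", "middle", "senior"])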

42. Data merging simplifies data analysis by merging multiple datasets into
one larger dataset.
43. Data blending brings together data from multiple sources that may be
very dissimilar.
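
A minimal pandas sketch of data merging, joining two small made-up datasets on a shared key; blending tools combine similarly dissimilar sources at analysis time rather than physically merging them:

```python
import pandas as pd

orders = pd.DataFrame({"customer_id": [1, 2], "amount": [250, 90]})
crm = pd.DataFrame({"customer_id": [1, 2], "segment": ["enterprise", "retail"]})

# Merge the two datasets on the shared key into one larger dataset.
merged = orders.merge(crm, on="customer_id", how="left")
```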

44. Duplicate data results in multiple entries with the same data values
existing in the database/warehouse.

45. Data appending refers to adding new data elements to an existing
dataset/database.

46. Imputation is helpful for filling in missing values. It can be based on
logical rules, on related observations, on carrying the last observation
forward, or on creating new variable categories.
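
A short pandas sketch of two of these approaches on a made-up series: last observation carried forward, and a simple rule-based fill with the column mean:

```python
import pandas as pd

df = pd.DataFrame({"temp": [21.0, None, 23.5, None, 24.0]})

# Last observation carried forward (LOCF).
df["temp_locf"] = df["temp"].ffill()

# Rule-based imputation: fill missing values with the column mean.
df["temp_mean"] = df["temp"].fillna(df["temp"].mean())
```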

47. Data reduction is a data manipulation technique that is used to minimize
the size of a dataset by aggregating, clustering, or removing any redundant
features.

48. Data redundancy occurs when the same datasets are stored in multiple
data sources.

49. Data manipulation is an important step for business operations and
optimization when working with data and analysis. Data analysts and engineers
manipulate data so that analysis can be performed on cleansed, focused, and
more accurate datasets.

50. Normalization is aimed at removing redundant information from a
database and ensuring that only related data is stored in a table.
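
A small pandas sketch of the idea, splitting a made-up denormalized orders table so that customer details are stored only once:

```python
import pandas as pd

# Denormalized table: customer details repeat on every order row.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": [10, 10, 11],
    "customer_name": ["Acme", "Acme", "Globex"],
    "amount": [250, 90, 400],
})

# Split into two related tables so customer data is stored only once.
customers = orders[["customer_id", "customer_name"]].drop_duplicates()
orders_normalized = orders[["order_id", "customer_id", "amount"]]
```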

51. Many data functions are available to help collate or get focused insights
from data. Some examples are aggregate functions, logical functions,
sorting, and filtering.
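
A short pandas sketch of filtering, an aggregate function, and sorting on a made-up sales table:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "amount": [100, 250, 175, 90],
})

high_value = sales[sales["amount"] > 100]         # filtering
totals = sales.groupby("region")["amount"].sum()  # aggregate function
ranked = totals.sort_values(ascending=False)      # sorting
```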

52. Missing data is one of the key issues with data accuracy and consistency.

53. Specification mismatch is caused by data at the source being a mismatch
for data at the destination due to unrecognized symbols, bad data entry,
invalid calculations, or mismatching of units/labels.

54. A data outlier in a dataset is an observation that is inconsistent or very
dissimilar to the remaining information.
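
A short pandas sketch that flags outliers with the common 1.5 × IQR rule of thumb on a made-up series:

```python
import pandas as pd

values = pd.Series([10, 12, 11, 13, 12, 95])  # 95 is clearly dissimilar

q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1

# Flag points more than 1.5 * IQR outside the quartiles.
outliers = values[(values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)]
```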

55. Invalid data refers to values that were initially generated inaccurately.

56. Non-parametric data is data that does not fit a well-defined or well-stated
distribution.
57. Data type validation ensures that data has the correct data type before it
is leveraged at the destination system.

58. An execution plan works behind the scenes to ensure that a query gets
all the needed resources and is executed; it outlines the steps for execution
of the query from start through output.

59. A parameterized query makes it possible to use placeholders for
parameters, where the parameter values are supplied at execution time.
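
A minimal sqlite3 sketch, assuming a hypothetical orders table; the ? placeholder is bound to a value at execution time, which also protects against SQL injection:

```python
import sqlite3

with sqlite3.connect("warehouse.db") as conn:
    # The ? placeholder is filled with the parameter value when the query runs.
    rows = conn.execute(
        "SELECT order_id, amount FROM orders WHERE region = ?",
        ("EMEA",),
    ).fetchall()
```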

60. Indexing speeds up query execution by letting the database engine find
records quickly instead of performing full table scans; a covering index can
even deliver all the columns a query requests directly from the index.
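
A hedged SQLite sketch, assuming a hypothetical orders table in warehouse.db; EXPLAIN QUERY PLAN shows whether the optimizer uses the index instead of a full table scan (compare item 58 on execution plans):

```python
import sqlite3

with sqlite3.connect("warehouse.db") as conn:
    # An index on the filtered column lets the engine avoid a full table scan.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_orders_region ON orders (region)")

    # Inspect the execution plan to confirm the index is actually used.
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE region = ?", ("EMEA",)
    ).fetchall()
    print(plan)
```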

61. A B-tree is formed of nodes: the tree starts at a root node, which has no
parent, and every other node in the tree has exactly one parent node; any node
may or may not have child nodes.

62. A clustered index sorts the way records in the table are physically stored,
whereas a non-clustered index keeps the index in one place and the records in
another, with the index entries acting as pointers to the data.

63. Temporary tables offer workspace for transitional results when processing
data.

64. There are two types of temporary tables that you can create in Microsoft
SQL Server: global and local.
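
A sketch of the idea using SQLite from Python, where a TEMP table is scoped to the connection; in Microsoft SQL Server the local and global variants are created with the # and ## name prefixes instead (the orders table here is hypothetical):

```python
import sqlite3

conn = sqlite3.connect("warehouse.db")

# TEMP tables hold transitional results and exist only for this connection.
# (In SQL Server, #staging would be a local and ##staging a global temp table.)
conn.execute("CREATE TEMP TABLE staging AS SELECT * FROM orders WHERE amount > 100")
count = conn.execute("SELECT COUNT(*) FROM staging").fetchone()

conn.close()  # the temporary table is dropped when the connection closes
```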

65. A subset is a smaller set of data from a larger database or a data
warehouse that allows you to focus on only the relevant information.

66. Data subsetting can be performed by using two methods: data sharding
and data partitioning. Data sharding involves creating logical horizontal
partitions in a database to quickly access the data of interest, while
partitioning involves creating logical vertical partitions in a database.
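
A small pandas sketch of the two styles of subsetting on a made-up table: a horizontal (row-wise) subset, analogous to sharding, and a vertical (column-wise) subset, analogous to partitioning:

```python
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "region": ["East", "West", "East", "West"],
    "amount": [250, 90, 400, 130],
})

# Horizontal subset: keep only the rows of interest.
east_rows = df[df["region"] == "East"]

# Vertical subset: keep only the columns of interest.
id_and_amount = df[["customer_id", "amount"]]
```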
