C. Semi-Structured Data: Workspaces

Test question 1

Uploaded by datnthe171250
Copyright © All Rights Reserved

1. ___ consists of textual data files with a discernible pattern that enables parsing (such as Extensible Markup Language [XML] data files that are self-describing and defined by an XML schema).
A. Unstructured data
B. Quasi-structured data
C. Semi-structured data
D. Structured data

12. ___ is data that has no inherent structure, which may include text documents, PDFs, images, and video.
A. Unstructured data
B. Quasi-structured data
C. Semi-structured data
D. Structured data

14. ___ is textual data with erratic data formats that can be formatted with effort, tools, and time (for instance, web clickstream data that may contain inconsistencies in data values and formats).
A. Structured data
B. Semi-structured data
C. Quasi-structured data
D. Unstructured data
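
To make "formatted with effort, tools, and time" concrete, here is a small R sketch. The three log lines are invented for illustration, and the regular expression is only one possible way to recover a common field (the URL) from lines whose layouts differ.

```r
# Hypothetical clickstream lines: same kind of event, three different layouts
clicks <- c(
  "2023-01-05 10:02:11 GET https://example.com/home",
  "05/01/2023 10:03 visited http://example.com/products?id=42",
  "click -> https://example.com/cart (user=17)"
)

# Extract the URL from each line, wherever it appears in the line
urls <- regmatches(clicks, regexpr("https?://[^ )]+", clicks))
print(urls)
```

Real clickstream data would usually need more rules than a single pattern, which is exactly the "effort" the question refers to.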

2. Which of the following tends to require highly structured data organized in rows and columns for accurate reporting?
A. Data Science
B. Business Intelligence
C. Data warehouse
D. Data exploratory

3. ___ refers to the process of cleaning data, normalizing datasets, and performing transformations on the data.
A. Data visualization
B. Data conditioning
C. Data processing
D. Learning About the Data
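
A small R sketch of the three operations named in the question stem (cleaning, normalizing, transforming). The data frame and its columns are hypothetical, and min-max scaling and a log transform are just two common choices, not the only ones.

```r
# Hypothetical raw data: inconsistent labels, a missing region, a non-numeric value
raw <- data.frame(
  region = c("North", "north ", "South", "SOUTH", NA),
  sales  = c("120", "95", "300", "n/a", "210"),
  stringsAsFactors = FALSE
)

# Cleaning: standardize labels and drop rows with no region
raw$region <- tolower(trimws(raw$region))
clean <- raw[!is.na(raw$region), ]

# Cleaning: convert text to numbers ("n/a" becomes NA) and drop those rows
clean$sales <- suppressWarnings(as.numeric(clean$sales))
clean <- clean[!is.na(clean$sales), ]

# Normalizing: rescale sales to the [0, 1] range (min-max scaling)
rng <- range(clean$sales)
clean$sales_scaled <- (clean$sales - rng[1]) / (rng[2] - rng[1])

# Transforming: add a log-transformed column
clean$log_sales <- log(clean$sales)
print(clean)
```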

7. ___, referred to as workspaces, are designed to enable teams to explore many datasets in a controlled fashion and are not typically used for enterprise-level financial reporting and sales dashboards.
A. Analytic sandbox
B. Analytic environment
C. Analytic datasets
D. Enterprise data warehouse

4. Which of the following is NOT a function that enables exporting R datasets to an external file?
A. write.table()
B. write.csv()
C. write.csv2()
D. Out.write()
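
A short R sketch that exercises the export functions listed in the options. The data frame is hypothetical, and temporary files are used so nothing is left behind in the working directory.

```r
# A small, hypothetical data frame to export
df <- data.frame(id = 1:3, score = c(2.5, 3.75, 4))

# Export functions from R's utils package, each writing to a temporary file
f1 <- tempfile(fileext = ".txt")
write.table(df, f1)                       # space-separated by default
f2 <- tempfile(fileext = ".csv")
write.csv(df, f2, row.names = FALSE)      # "," separator, "." decimal mark
f3 <- tempfile(fileext = ".csv")
write.csv2(df, f3, row.names = FALSE)     # ";" separator, "," decimal mark

print(readLines(f3))

# Check whether a function named Out.write() is available
print(exists("Out.write"))
```
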

5. The expected defaults for the header, column separator, and decimal point notation of the read.csv2() function are ___
A. Header is false, column separator = "", decimal point = "."
B. Header is true, column separator = ",", decimal point = "."
C. Header is true, column separator = ";", decimal point = ","
D. Header is true, column separator = "\t", decimal point = "."
E. Header is true, column separator = "\t", decimal point = ","

11. The expected defaults for the header, column separator, and decimal point notation of the read.delim2() function are ___
A. Header is false, column separator = "", decimal point = "."
B. Header is true, column separator = ",", decimal point = "."
C. Header is true, column separator = ";", decimal point = ","
D. Header is true, column separator = "\t", decimal point = "."
E. Header is true, column separator = "\t", decimal point = ","

11. The expected defaults for the header, column separator, and decimal point notation of the read.delim() function are ___
A. Header is false, column separator = "", decimal point = "."
B. Header is true, column separator = ",", decimal point = "."
C. Header is true, column separator = ";", decimal point = ","
D. Header is true, column separator = "\t", decimal point = "."
E. Header is true, column separator = "\t", decimal point = ","
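
Rather than memorizing these defaults, they can be read straight from the function definitions. This R sketch uses formals() to tabulate the header, sep, and dec defaults of the read.* family in the utils package.

```r
# Compare default arguments of base R's text-file readers (utils package)
readers <- c("read.table", "read.csv", "read.csv2", "read.delim", "read.delim2")

defaults <- t(sapply(readers, function(fn) {
  args <- formals(getExportedValue("utils", fn))
  c(header = deparse(args$header), sep = args$sep, dec = args$dec)
}))

# One row per function; columns are header, sep, dec
print(defaults)
```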

6. Data science teams would rather keep more data than too little data for the analysis. The questions the team should answer during the data conditioning step include all of the following, except:
A. What are the data sources? What are the target fields?
B. How clean is the data?
C. How consistent are the contents and files?
D. What is the programming language used for data processing?

8. ___ tends to use disaggregated data in a more forward-looking, exploratory way, focusing on analyzing the present and enabling informed decisions about the future.
A. Data Science
B. Business Intelligence
C. Data warehouse
D. Data exploratory
E. Data exploratory and analytics

A. Data Science: Science, analysis, predictive modeling, algorithms, machine learning.

B. Business Intelligence: Business, intelligence, reporting, dashboards, historical data.

C. Data warehouse: Central repository, integrated data, historical data, storage.

D. Data exploratory: Exploration, discovery, forward-looking, exploratory analysis.

E. Data exploratory and analytics: Exploration, analytics, forward-looking, informed decisions.

9. Which of the following is a correct statement to declare a vector?
A. V <- seq(5, 9, by = 0.4)
B. V <- vector(seq(5, 9), 0.4)
C. V <- 5:9, 0.4
D. V <- 0.4, 5:9
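
A quick R check of what seq() produces when given a by step. It counts up from 5 in increments of 0.4 and stops at 9.

```r
# seq() with `by` builds an arithmetic sequence: 5, 5.4, 5.8, ..., 9
V <- seq(5, 9, by = 0.4)
print(V)
print(length(V))      # 11
print(is.vector(V))   # TRUE
```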

10. Which of the following tends to use disaggregated data in a more forward-looking, exploratory way, focusing on analyzing the present and enabling informed decisions about the future?
A. Data Science
B. Business Intelligence
C. Machine learning
D. Data exploratory

13. In which of the following phases do teams begin forming ideas about which data to keep and which data to transform or discard?
A. Data visualization
B. Data conditioning
C. Learning About the Data
D. Prepare analytics sandbox

14. Which of the following is a correct statement to create a list?
A. list_data <- list("Red", "Green", c(21,32,11), TRUE, 51.23, 119.1)
B. list_data <- list("Red", "Green", vector(1:5))
C. list_data <- list("Red", "Green", c(21,32,11), TRUE, 51.23, 119.1);
D. list_data <- list("Red", "Green", "1","5");
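
A short R sketch of how list() behaves with the kinds of elements used in the options: mixed types in one list, a trailing semicolon, and a call to vector() with a non-mode argument.

```r
# A list can hold elements of different types and lengths
list_data <- list("Red", "Green", c(21, 32, 11), TRUE, 51.23, 119.1)
str(list_data)

# A trailing semicolon only terminates the statement; the result is the same
list_data2 <- list("Red", "Green", c(21, 32, 11), TRUE, 51.23, 119.1);
print(identical(list_data, list_data2))   # TRUE

# vector() expects a mode name such as "numeric", so passing 1:5 is an error
bad <- try(vector(1:5), silent = TRUE)
print(inherits(bad, "try-error"))          # TRUE
```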

19. Given the statement
list1 <- list(1:5)
which of the following statements is correct?
A. v1 <- unlist(list1)
B. v1 <- vector(list1)
C. v1 <- matrix(list1)
D. v1 <- list(list1);
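
An R sketch contrasting unlist() with matrix() on the list from the question. unlist() flattens to an atomic vector, while matrix() keeps the list structure.

```r
list1 <- list(1:5)

# unlist() flattens the list into an atomic (integer) vector
v1 <- unlist(list1)
print(v1)                  # 1 2 3 4 5
print(is.atomic(v1))       # TRUE

# By contrast, matrix(list1) keeps the list structure: a 1 x 1 list-matrix
m <- matrix(list1)
print(dim(m))              # 1 1
print(is.list(m))          # TRUE
```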

15. In the data conditioning phase, the data science team should consider the following issues, except:
A. Assess the consistency of the data types
B. Review the content of data columns or other inputs, and check to ensure they make sense
C. Look for any evidence of systematic error
D. Categorize data into classes
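
The checks in options A to C can be sketched in a few lines of R. The dataset below is hypothetical, and treating 0 income as a possible placeholder is an illustrative assumption, not a general rule.

```r
# Hypothetical dataset with a few deliberate problems
df <- data.frame(
  age    = c("34", "29", "41", "-3"),     # numbers stored as text; one impossible value
  income = c(52000, 61000, 0, 48000),     # 0 may be a placeholder for "unknown"
  stringsAsFactors = FALSE
)

# Consistency of data types: is each column stored as the expected type?
types <- sapply(df, class)
print(types)                               # age is "character", income is "numeric"

# Do the contents make sense? Convert, summarize, and flag impossible values
df$age <- as.numeric(df$age)
print(summary(df$age))
bad_age <- which(df$age < 0 | df$age > 120)
print(bad_age)                             # row 4

# Possible systematic error: placeholder values standing in for missing data
zero_income <- sum(df$income == 0)
print(zero_income)
```
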

16. Which of the following is NOT one of the additional questions and considerations for the data conditioning step?
A. What are the data sources?
B. How clean is the data?
C. How consistent are the contents and files?
D. Which data model should we use?

17. The data conditioning step is usually performed by the following members, except:
A. The data owners
B. A data engineer
C. A database administrator
D. A programmer

18. Which of the following involves many operations on the dataset before developing models to process or analyze the data?
A. Data visualization
B. Data conditioning
C. Data processing
D. Prepare analytics sandbox

20. True or false: It is NOT important to involve the data scientist in the data conditioning step, because many decisions made in the data conditioning phase do not affect subsequent analysis.
A. True
B. False

21. Part of ___ involves deciding which aspects of particular datasets will be useful to analyze in later steps.
A. Data conditioning
B. Performing ETLT
C. Developing Initial Hypotheses
D. Data analysis

22. Which of the following is often viewed as a preprocessing step for the data analysis?
A. Data visualization
B. Data conditioning
C. Data processing
D. Learning About the Data
