C. Semi-Structured Data: Workspaces
___ consists of textual data files with a discernible pattern that enables parsing (such
as Extensible Markup Language [XML] data files that are self-describing and defined by an
XML schema).
A. Unstructured data
B. Quasi-structured data
C. Semi-structured data
D. Structured data
12. ___ is data that has no inherent structure, which may include text documents, PDFs,
images, and video.
A. Unstructured data
B. Quasi-structured data
C. Semi-structured data
D. Structured data
14. ___ is textual data with erratic data formats that can be formatted with effort, tools, and
time (for instance, web clickstream data that may contain inconsistencies in data values and
formats).
A. Structured data
B. Semi-structured data
C. Quasi-structured data
D. Unstructured data
2. Which of the following tends to require highly structured data organized in rows and
columns for accurate reporting?
A. Data Science
B. Business Intelligence
C. Data warehouse
D. Data exploratory
11. The expected defaults for headers, column separators, and decimal point notation of the
read.delim2() function are ___
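One way to check the defaults this question asks about is to inspect the function's formal arguments. The following is a minimal sketch using only base R; the sample file contents are hypothetical and exist only to show the defaults in action.

```r
# Inspect the default arguments of read.delim2() (utils package, loaded by default).
defaults <- formals(utils::read.delim2)
header_default <- defaults$header  # default for treating the first row as a header
sep_default    <- defaults$sep     # default column separator
dec_default    <- defaults$dec     # default decimal point character
print(list(header = header_default, sep = sep_default, dec = dec_default))

# Hypothetical sample data: tab-separated columns with decimal commas.
tmp <- tempfile(fileext = ".txt")
writeLines(c("item\tprice", "pen\t1,50", "book\t12,25"), tmp)

# Read the file relying entirely on the defaults.
prices <- read.delim2(tmp)
str(prices)
unlink(tmp)
```

Calling `str(prices)` shows whether the `price` column was parsed as numeric, which is a quick way to confirm that the decimal notation in a file matches what the function expects.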
6. Data science teams would rather keep too much data than too little for the analysis. The
questions the team should answer during the data conditioning step include all of the
following, except:
A. What are the data sources? What are the target fields?
B. How clean is the data?
C. How consistent are the contents and files?
D. What is the programming language used for data processing?
8. ___ tends to use disaggregated data in a more forward-looking, exploratory way, focusing
on analyzing the present and enabling informed decisions about the future.
A. Data Science
B. Business Intelligence
C. Data warehouse
D. Data exploratory
E. Data exploratory and analytics
A. V <- seq(5, 9, by = 0.4)
B. V <- vector(seg(5,9), 0.4)
C. V <- 5:9, 0.4
D. V <- 0.4, 5:9
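The options above can be tried directly in an R session. The following is a minimal sketch using base R only: it evaluates the `seq()` call from option A and checks whether the expression in option C is valid R syntax at all.

```r
# seq() builds a regular sequence from a start value to an end value,
# advancing by the step given in `by`.
V <- seq(5, 9, by = 0.4)
print(V)

# Number of elements in the resulting vector.
n <- length(V)
print(n)

# Parse (without evaluating) the expression from option C to see whether
# R accepts it; try() captures the error instead of stopping the script.
option_c <- try(parse(text = "V <- 5:9, 0.4"), silent = TRUE)
option_c_is_error <- inherits(option_c, "try-error")
print(option_c_is_error)
```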
10. Which of the following tends to use disaggregated data in a more forward-looking,
exploratory way, focusing on analyzing the present and enabling informed decisions about
the future?
A. Data Science
B. Business Intelligence
C. Machine learning
D. Data exploratory
13. In which of the following phases do teams begin forming ideas about which data to keep
and which data to transform or discard?
A. Data visualization
B. Data conditioning
C. Learning About the Data
D. Prepare analytics sandbox
15. In the data conditioning phase, the data science team should consider the following
issues, except:
A. Assess the consistency of the data types
B. Review the content of data columns or other inputs, and check to ensure they make
sense
C. Look for any evidence of systematic error
D. Categorize data into classes
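Checks on data type consistency, column contents, and systematic error, as named in the options above, can be sketched in a few lines of base R. This is a minimal illustration, not a prescribed procedure: the `visits` data frame, its column names, and the plausibility thresholds are all hypothetical.

```r
# Hypothetical sample data; names and values are made up for illustration.
visits <- data.frame(
  user_id  = c("u1", "u2", "u3", "u4"),
  age      = c(34, 29, -1, 41),                # -1 looks like a placeholder value
  duration = c("12.5", "8.0", "n/a", "15.2"),  # numbers stored as text
  stringsAsFactors = FALSE
)

# Assess the consistency of the data types: list the class of each column.
col_types <- sapply(visits, class)
print(col_types)

# Review column contents and check that they make sense:
# flag ages outside an assumed plausible range of 0 to 120.
implausible_age <- which(visits$age < 0 | visits$age > 120)
print(implausible_age)

# Look for evidence of systematic error: find entries in a supposedly
# numeric column that cannot be converted to numbers.
duration_num <- suppressWarnings(as.numeric(visits$duration))
unparsed <- which(is.na(duration_num))
print(unparsed)
```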
16. Which is NOT an additional question or consideration for the data conditioning step?
A. What are the data sources?
B. How clean is the data?
C. How consistent are the contents and files?
D. Which data model should we use?
17. The data conditioning step is usually performed only by the following members, except:
A. The data owners
B. A data engineer
C. A database administrator
D. A programmer
18. Which of the following involves many operations on the dataset before developing models
to process or analyze the data?
A. Data visualization
B. Data conditioning
C. Data processing
D. Prepare analytics sandbox
20. It is NOT important to involve the data scientist in the data conditioning step, because
the many decisions made in the data conditioning phase do not affect subsequent analysis.
A. True
B. False
21. Part of ___ involves deciding which aspects of particular datasets will be useful to
analyze in later steps.
A. Data conditioning
B. Performing ETLT
C. Developing Initial Hypotheses
D. Data analysis
22. Which of the following is often viewed as a preprocessing step for data analysis?
A. Data visualization
B. Data conditioning
C. Data processing
D. Learning About the Data