Datascience One Word

This document contains multiple choice questions about data types, data structures, databases, SQL, NoSQL and related topics. It covers structured vs unstructured data, quantitative vs qualitative data, relational database concepts like keys, SQL statements like DML, DDL, DCL. It also covers semi-structured data formats like XML, NoSQL databases like MongoDB, HBase, cloud computing concepts.

Uploaded by

Shradha Gaikwad

1.The data that can be processed, stored and retrieved in a fixed format is called ______________.

A.Semi-structured data.
B.Structured data.
C.Unstructured data.
D.Hidden data.

2.___________ refers to data that lacks a specific form or structure.


A.Structured data.
B.Semi-structured data.
C.Unstructured Data.
D.Numerical data.

3.Email is an example of ______________.


A.Structured Data.
B.Unstructured data.
C. Discrete data.
D.Continuous data.

4.___________is defined as raw facts and figures collected together and stored in database.
A.Data
B.Analysis
C.Knowledge
D.Wisdom

5.___________count cannot be made more accurate.


A.Continuous data.
B.Discrete data.
C.Structured data.
D.Categorical data.

6._______________is called Quantitative data.


A.Nominal Data.
B.Numerical Data.
C.Ordinal data.
D.Normal data.

7.______________is called Qualitative data.


A.Business data.
B.Discrete data.
C.Filtered data.
D.Categorical data.

8.___________data have finite options.


A.Continuous data.
B.Discrete Data.
C.Ordinal data.
D.Nominal Data.

9.____________data has no hierarchy.


A.Numerical data.
B.Nominal Data.
C.Ordinal Data.
D.Observed data.
10._____________is a translator used for translating high level language into desired output.
A.
B.
C.Interpreter
D.Compiler

11.A ________ is a translator that converts a high-level language program into an equivalent machine-language program.
A.Assembler.
B.Compiler
C.Interpreter
D.

12. ALGOL is an example of the ____________________ type of high-level language.


A.String and list Processing
B.Object Oriented Programming language.
C.Algebraic Formula-Type Processing
D.Visual Programming Language.

13._____________ programming languages are designed for building Windows-based applications.
A.LISP
B.Visual Basic
C.COBOL
D.C++

14.__________________is a software suite that combines basic tools required to write and
test software.
A. An Integrated Development Environment.
B.Exploratory Data Analysis.
C.Data Visualization.
D.

15._____________ is an approach used to analyse a data set and summarize its characteristics using visual methods.
A.Exploratory Data Analysis.
B.
C.Data Cleaning
D.Data Extraction.

16.___________________ is the process of displaying data or information in graphical charts, figures and bars.
A.
B.Data Visualization.
C.
D.

17.______________ presents categorical data with rectangular bars.


A.Box Plot
B.Scatter plot
C.Pie Chart
D.Bar Graph

18.A vertical bar chart is called a __________.


A.Scatter plot
B.Box plot
C.line graph
D.histogram
19. A _______ visualises the distribution of data over a continuous interval or a certain time period.
A.Histogram
B.Box Plot
C.
D.

20.A _________ is a circular statistical graphic, which is divided into slices to illustrate numerical proportion.
A.Bar Chart
B.Line Chart
C.Box Plot
D.Pie Chart
21.A__________displays the five-number summary of a set of data.
A.Scatter Plot
B.Box Plot
C.Line chart
D.Histogram
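
As an illustration of the five-number summary mentioned in question 21, here is a minimal Python sketch. The function name `five_number_summary` and the sample values are hypothetical, and the quartiles use the "inclusive" convention of `statistics.quantiles`; other tools may compute quartiles slightly differently.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a sequence of at least two numbers."""
    data = sorted(values)
    # Quartile conventions differ between tools; "inclusive" is one common choice.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (data[0], q1, median, q3, data[-1])

# Hypothetical sample data, used only for illustration.
summary = five_number_summary([7, 1, 9, 3, 5, 2, 8, 4, 6])
print(summary)  # (1, 3.0, 5.0, 7.0, 9)
```

These five values are the ones a box plot draws: the whisker ends, the box edges and the line inside the box.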

22._____________ is the process of transforming data from its original "raw" form into a more digestible format.
A.Data Extraction
B.Data Wrangling
C.Data Mapping
D.Data cleaning

23.__________________ is a platform in which enterprises analyze and store their user data.
A. Data modelling
B. Data Processing
C. Data Management
D. Data Collection

24.Primary data is also termed as _________.


A.raw data.
B.Structured Data.
C.
D.
25.Primary data is obtained by__________.
A.books
B.magazine
C.newspaper
D.survey

26.The process of fixing or removing incorrect, incomplete and irrelevant data from a dataset
is called _______.
A.Data Analysis
B.Data Extraction
C.Data Cleaning
D.Data Modelling

27.The data extracted directly from the source system is called________.


A.Offline Extraction
B.Incremental Extraction
C.Full Extraction
D.Online Extraction
28.__________is the process of creation of data model which specifies how data is to be
stored in database.
A.Data Analysis
B.Data Modelling
C.Data Processing
D.Data Collection

29.The process of deriving high quality information from text is called_______________.


A.Text Analytics
B.Data Extraction
C.Data Visualization
D.Text Processing.

30._____________ refers to the process of detecting data sets with similar attributes to learn the similarities and differences in the data.
A.Regression Analysis
B.Classification Analysis
C.Clustering Analysis
D.Discrete Analysis
31.____________is based on real-world entities and relationship among them.
A.Entity Relationship model
B.
C.Logical Relationship model
D.Data model

32.The process of extracting data from various source systems is called __________.
A.Data Collection
B.Data Extraction
C.Data Processing
D.Data Analysis

33.Data Visualization examines the data in _________format.


A. Graphical
B. Text based
C.file-based
D.Directory-based

34. In ER model rectangle represents:

a. Attributes
b. Entity set
c. Relationships
d. None of these

35. An entity has a set of ___________ that describe it.


a. Attributes
b. Entity
c. Tuples
d. Relations

36.Data Mining is the process of analyzing large amounts of data to extract previously unknown, interesting patterns of _______, _______ and the dependencies.
a. online, offline extraction
b. full, incremental extraction
c. data, unusual data
d. raw, corrupted data

37.The ______ model is referred to as the physical model.


a. E-R model
b. Data model
c. Data object
d. Relational model
38.Data Curation is an iterative process which includes three main stages:
a. Clustering, Association, Classification
b. Preserving, Sharing, Discovering
c. Collaborate, Supervise, Participate
d. Create, Alter, Drop

39.Which are types of Outer Join:


a. Theta, Natural, Full
b. Sum, Avg, Min
c. Projection, Notation, Union
d. Left, Right, Full

40] Which of the following are skills required of a Data Scientist?


A. Probability & Statistics
B. Machine Learning / Deep Learning
C. Data Wrangling
D. All of the above

41] Which of the following is not an application of data science?


A. Recommendation Systems
B. Image & Speech Recognition
C. Online Price Comparison
D. Privacy Checker

42] EDA and data visualization both mainly aim to represent data in _______ format.
a. graphical
b. univariate graphical
c. multivariate graphical
d. non-graphical
43]Any repository data that is documented but yet to be processed and fully integrated is called ______
a. Data Wrangling
b. Data Munging
c. Raw data
d. Kurtosis

44]

UNIT-2

1.Which of the following is not a stage of data curation?


A.Preserving
B.Sharing
C.Filtering
D.Discovering

2._____________ is used to combine information from two different relations or tables into a single relation.
A.Cartesian Product
B.Set Difference
C.Union Operation
D.Rename Operation

3._________ is used to perform a binary union between two given relations.


A.Union operation
B.Projection
C.Select operation
D.Rename Operation

4.TCL is __________________.
A.Transmission Control Level.
B.Transaction Console Language
C.Transaction Control Language
D.None of the above.

5. The language used by application programs to request data from the DBMS is referred to as __________.
A.DML
B.DDL
C. Query language
D. All of the above

6. Which of the following is not a type of SQL statement?


A. Data Manipulation Language (DML)
B. Data Definition Language (DDL)
C. Data Control Language (DCL)
D. Data Communication Language (DCL)

7. Which of the following is not included in DML (Data Manipulation Language)


A. INSERT
B. UPDATE
C. DELETE
D. CREATE
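
To make the DDL/DML distinction in question 7 concrete, the following is a small sketch using Python's built-in `sqlite3` module. The `student` table, its columns and the sample rows are hypothetical and exist only for illustration.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: CREATE defines the structure of the table.
cur.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")

# DML: INSERT, UPDATE and DELETE change the rows stored in the table.
cur.execute("INSERT INTO student (id, name, marks) VALUES (1, 'Asha', 72)")
cur.execute("INSERT INTO student (id, name, marks) VALUES (2, 'Ravi', 58)")
cur.execute("UPDATE student SET marks = 80 WHERE id = 1")
cur.execute("DELETE FROM student WHERE id = 2")
conn.commit()

rows = cur.execute("SELECT id, name, marks FROM student ORDER BY id").fetchall()
print(rows)  # [(1, 'Asha', 80)]
conn.close()
```

DCL statements such as GRANT are not shown because SQLite does not implement user privileges.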

8. Which of the following keywords is used with Data Control Language (DCL) statements?
A. SELECT
B. INSERT
C. DELETE
D. GRANT

9. The database language that allows you to access or maintain data in a database is ________.
A. DCL
B. DML
C. DDL
D. All of the Mentioned

10. __________ is the attribute or group of attributes that uniquely identifies each occurrence of an entity.
a. Foreign key
b. Super key
c. Primary key
d. All of these
11. In SQL, which command is used to add new rows to a table?
A. Alter Table
B. Add row
C.Insert
D.Append

12. DCL stands for ________.


A. Data Control Language
B. Data Console Language
C. Data Console Level
D. Data Control Level

13. ________is the information about data.

a. Data
b. Meta-data
c. Entity
d. Relations

14. Which is not the feature of database:


a. Data redundancy
b. Independence
c. Flexibility
d. Data integrity

15.Which of the following is not semi-structured data?


A. Markup language XML
B. Open standard JSON
C. NoSQL
D. Excel Spreadsheet

16. Which are the main features of XML?


a.  Text data description
b. Human- and computer-friendly format
c. Handles data in a tree structure having one-and only one-root element
d. All mentioned above

17.  XML stands for?


a. Extensible Markup Language
b. Extended Mashup Language
c. Extensible Mashup Language

d. X-Markup Language

18. Find the correct syntax of the declaration which defines the XML Version?

a.< ?xml version= "1.0" ? >

b. < xml version="1.0"/ >


c. < ?xml version="1.0" / >
d. None of the above

19. XML is ?
a. Platform Independent
b. Language Independent
c. Both A & B
d. None of the above

20.MongoDB is a cross-platform, document-oriented database that provides _______, high availability and easy scalability.
a. high integrity
b. high performance
c. collection
d. multiple databases

21.______ provides a mechanism for storage and retrieval of data which are not based on
RDBMS principle.
a. ODBC
b. JDBC
c. SQL
d. NoSQL

22.HBase is a _______ database and the tables in it are sorted by row.


a. column-oriented
b. row-oriented
c. table-oriented
d. data-oriented

23. Amazon Web Services (AWS) is used to provide IT services to the market in the form of
web services known as ______
a. Public Cloud
b. Hybrid Cloud
c. Private Cloud
d. Cloud Computing

24. Which are the three types of service models in the cloud?


a. Public, Private, Hybrid
b. Cost-Efficient, Reliability, Unlimited Storage
c. IaaS, PaaS, SaaS
d. Backup, Recovery, Easy Access

25. ______ is a collection of datasets that cannot be processed using traditional computing
techniques.
a. Cloud
b. MapReduce
c. Web services
d. Big Data

26]Height, Weight, Length are example of


a) Binomial data b) Discrete data
c) Qualitative data d) Continuous data

27]Which of the following operators is not used in relational algebra?


a) Select b) Project
c) Remove d) Union

28]What is the meaning of DISPOSE in the data curation lifecycle?

a) Store the data in a secure manner
b) Used with proper authentication
c) Used for a longer time
d) Not in use for a longer time

29] Which technique is used for extracting large amounts of data from websites?
a) JSON b) Web scraping
c) Data Modelling d) XML

30].Federated database related to,


a) Heterogeneous database b) Autonomous database
c) Homogeneous database d) None of the above

31]. JSON stands for,


1) JavaScript Object Notification
2) JavaScript Object Notation
3) JavaScript Object Networking
4) None of the above

32]Which of the following types is not related to NoSQL databases?
a) Homogeneous DB b) Document Database
c) Graph Stores d) Key Value Store

33]MongoDB is cross-platform and is written in the ______ language.


a) Java b) SQL
c) C++ d) c
34] The error that arises from erroneous assumptions in the learning algorithm is called ______
a) Variance
b) Bias
c) AIC
d) HBase

35] Ordinal data is a type of


a) measurement b) Categorical
c) Discrete d) Continuous

36] _____ Measures asymmetry about the mean of the probability distribution of a random
variable.
a) Skewness b) Covariance
c) Variance d) Kurtosis

37]In ____, we start with all the features and remove the least significant feature at each iteration.
a) Forward Elimination
b) Backward Elimination
c) Recursive Elimination
d) None of the above

38]The XPath specification has _____ types of nodes.


a) four b) Five
c) Six d) Seven

39] A ____ shows all individual data points.

a) Box plot b) Scatter plot
c) Line plot d) Pie chart

40] Movie Recommendation systems are an example of


a) Classification
b) Clustering
c) Reinforcement Learning
d) Regression

41]Which of the following lists names of variables in a data.frame?


A. par()
B. names()
C. barchart()
D. quantile()

42] Which method shows hierarchical data in a nested format?


A. Treemaps
B. Scatter plots
C. Population pyramids
D. Area charts

43]Which of the following plots are often used for checking


randomness in time series?
A. Autocausation
B. Autorank
C. Autocorrelation
D. None of the above
44] ___________ provides a web service interface that offers resizable compute capacity in the AWS cloud.
a)EC2
B)S3
C)ES2
D)EC3

45] Amazon Web Services falls into which of the following cloud-computing category?
  Platform as a Service
  Software as a Service
  Infrastructure as a Service
  Back-end as a Service

Unit-3

1] ______ is used to address the overfitting faced by models. It does so by introducing a penalty term for the number of parameters or features in the model.
a. AIC
b. Cross Validation
c. BIC
d. RIC

2] Ridge regression is a technique used when the data suffers from ______
a. collinearity
b. noncollinearity
c. multicollinearity
d. regularization
3] The _______ tradeoff is commonly faced in supervised algorithms: accuracy and generalization cannot both be maximized in the same model.
a. Bias
b. Variance
c. Bias-Variance
d. Parsimony

4] Regularization is a form of ______ that constrains, regularizes or shrinks the coefficient estimates towards zero.
a. model
b. coefficient
c. regression
d. procedure

5] Cross Validation is a statistical method used to estimate the skill of _______ models.
a. re-sampling
b. machine learning
c. statistical
d. predictive

6] ______ can include a range of activities like convert data types, cleanse data by removing
nulls or duplicate data, enrich the data, or perform aggregations, depending on the needs of
your project.
a. Data Mining
b. Data Cleaning
c. Data Transformation
d. Data Analysis

7] The technique used for forecasting, time series modelling and finding the causal-effect relationship between variables is called _______
a. Logistic Regression
b. Time Series Analysis
c. Classification Trees
d. Regression Analysis

8] The autoregressive model is written mathematically as _______

a. $x_t = c + \sum_{i=1}^{p} \phi_i x_{t-i} + \epsilon_t$

b. $(x + a)^n = \sum_{k=0}^{n} \binom{n}{k} x^k a^{n-k}$

c. $f(z) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (z - a)^n$

d. $x_t = \mu + \epsilon_t + \sum_{i=1}^{p} \phi_i \epsilon_{t-i}$

9] In machine learning ________ classification is also known as multinomial classification.


a. multiple
b. multiclass
c. multipurpose
d. multisector

10] _______ means there are only two possible classes such as positive or negative, 0 and 1,
true or false, on or off.
a. Dichotomous
b. Logistic Regression
c. Binary Classes
d. Sigmoid Function

11] SVM stands for _______


a. Secure Virtual Machine
b. Security Vector Machine
c. Support Virtual machine
d. Support Vector Machine

12] K-Nearest Neighbors is a simple, basic yet essential classification _______ in Machine Learning.
a. Hyperplane
b. Algorithm
c. Function
d. Recognition

13] Which one is an advantage of the KNN algorithm?

a. No training period
b. Does not work well with large datasets
c. Does not work well with high dimensions
d. Needs feature scaling

14] PCA stands for


a. Principal Computerized Analysis
b. Principal Computational Analysis
c. Principal Component Analysis
d. Principal Clustering Analysis

15] Which of the following is finally produced by Hierarchical Clustering?


a. final estimate of cluster centroids
b. tree showing how close things are to each other
c. assignment of each point to clusters
d. all of the mentioned
16] Which of the following is required by K-means clustering?
a. defined distance metric
b. number of clusters
c. initial guess as to cluster centroids
d. All of the Above

17] Which of the following distance metrics cannot be used in k-NN?

a. Minkowski

b. Manhattan

c. Jaccard

d. All of the above

18] The package used for reading HTML and XML data is

a. httr

b. http

c. httx

d. all of the above

19]Which of the following is the second goal of PCA?

a. Data compression

b. statistical analysis

c. data dredging

d. all of the above


20] Which of the following can be used for data analysis model?

a. CRAN

b. CPAN

c. CTAN

d. All of the above

21] Which of the following is a categorical outcome?

a. RMSE

b. Rsquare

c. Accuracy

d. rjson

22] Which of the following characteristics of big data is of relatively more concern to data science?

a. Velocity

b. Variety

c. Volume

d. Variance

23] Which of the following is commonly referred to as ‘data fishing’?

a. Data bagging

b. Data booting

c. Data Merging

d. Data Dredging
24] A _______ is a tree-like structure used to represent the hierarchical clustering technique.

a. Dendrogram

b. K-means

c. Agglomerative

d. Divisive

25] SEM and PEM are types of which method

a. Clustering

b. Dendrogram

c. Ensemble

d. Hierarchical

26] Bayes’ Theorem is named after

a. Thomas Bayes

b. Chris Bayes

c. Mcloed Bayes

d. Todd Bayes

27] Which of the following methods are present in caret for regularized regression?
a) ridge
b) lasso
c) relaxo
d) all of the mentioned

28] Which of the following analysis is a statistical process for estimating the relationships
among variables?
a) Causal
b) Regression
c) Multivariate
d) All of the mentioned

29] Which of the following options is/are true for K-fold cross-validation?
1. Increase in K will result in higher time required to cross validate the result.
2. Higher values of K will result in higher confidence on the cross-validation result as
compared to lower value of K.
3. If K=N, then it is called Leave one out cross validation, where N is the number of
observations.

a) 1 and 2
b) 2 and 3
c) 1 and 3
d) 1,2 and 3
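
The statements in question 29 can be explored with a minimal sketch of how K-fold splitting works. The helper `kfold_indices` is a hypothetical illustration written in plain Python, not the API of any particular library; it assigns consecutive indices to folds without shuffling.

```python
def kfold_indices(n_samples, k):
    """Split range(n_samples) into k folds; return a list of (train, test) index lists."""
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds

three_fold = kfold_indices(6, 3)     # 3 models to fit, 2 test points each
leave_one_out = kfold_indices(6, 6)  # K = N: 6 models, 1 test point each
print([test for _, test in three_fold])  # [[0, 1], [2, 3], [4, 5]]
```

Each fold requires fitting one model, which is why a larger K increases the time needed; with K equal to the number of observations, every test set holds exactly one point.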

30] Which of the following function tracks the changes in model statistics?
a) varImp
b) varImpTrack
c) findTrack
d) none of the mentioned

31] Which of the following is a characteristic of the best machine learning method?


a) Fast
b) Accuracy
c) Scalable
d) All of the mentioned

32] Which of the following package is used for tidy data?


a) tidyr
b) souryr
c) NumPy
d) all of the mentioned
33] Which of the following function is used for k-means clustering?
a) k-means
b) k-mean
c) heatmap
d) none of the mentioned

34] Which of the following function is used to read data off the webpages?
a) read.web
b) read.Lines
c) read.Line
d) all of the mentioned

35]Which of the following tool is used for estimating standard errors and the bias of
estimators?
a) knitr
b) jackknife
c) ggplot2
d) all of the mentioned

36] Which of the following is similar to a pre-specified clinical trial protocol?


a) Caching-based Data Analysis
b) Evidence-based Data Analysis
c) Markdown-based Data Analysis
d) All of the mentioned

37] Which of the following returns an array of ones with the same shape and type as a given
array?
a) all_like
b) ones_like
c) one_alike
d) all of the mentioned

38] ___________ decomposes the elements of x into mantissa and two's exponent.
a) trunc
b) fmod
c) frexp
d) ldexp
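
The mantissa/exponent decomposition in question 38 can be tried with a short sketch. It assumes Python's standard `math` module, whose `frexp` and `ldexp` functions work on single numbers; the question's option names may instead refer to array versions in a numerical library.

```python
import math

# frexp splits a float into a mantissa m and an integer exponent e
# such that value == m * 2**e, with 0.5 <= abs(m) < 1 for non-zero values.
mantissa, exponent = math.frexp(8.0)
print(mantissa, exponent)  # 0.5 4

# ldexp is the inverse operation: it rebuilds m * 2**e.
rebuilt = math.ldexp(mantissa, exponent)
print(rebuilt)  # 8.0
```
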
39]_____________________ is used to summarize the information in a data set described by multiple variables.
A]Principal Component Analysis.
B]Exploratory Data Analysis.
C]Multidimensionality.
D]Integrated Development Environment

40]K-means clustering is a type of ______________learning.


A]Supervised
B]Unsupervised
C]Semisupervised
D]Reinforcement

41] Which of the following curve analysis is conducted on each predictor for classification?
a) NOC
b) ROC
c) COC
d) All of the mentioned

42] Bayesian Information Criterion (BIC) is related to________.


a)Ridge regression
b)Akaike Information Criterion (AIC)
c)Cross validation
d)Lasso Regression

43] In ___________ hierarchical clustering, the distance between two clusters is defined as the shortest distance between two points in each cluster.

a]single linkage

b]Complete linkage

c]Average linkage

d]Complex linkage
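
The linkage definitions in question 43 can be compared with a minimal sketch. The function names and the two small clusters below are hypothetical, and Euclidean distance is assumed (using `math.dist`, available from Python 3.8).

```python
import math

def single_linkage_distance(cluster_a, cluster_b):
    """Shortest distance between any point in cluster_a and any point in cluster_b."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)

def complete_linkage_distance(cluster_a, cluster_b):
    """Longest distance between any point in cluster_a and any point in cluster_b."""
    return max(math.dist(p, q) for p in cluster_a for q in cluster_b)

# Hypothetical clusters on a line, chosen so the distances are easy to check by hand.
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(4.0, 0.0), (9.0, 0.0)]

print(single_linkage_distance(a, b))    # 3.0, from (1, 0) to (4, 0)
print(complete_linkage_distance(a, b))  # 9.0, from (0, 0) to (9, 0)
```
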
44]____________algorithm is called lazy learner algorithm.

a]KNN

b]SVM

c]PCA

d]EDA

45]SVM creates__________that separates the dataset into classes.

A]Clusters

B]Dimensions

C]Hyperplane

D]Groups

46]
