Data Science One Word
A. Semi-structured data.
B. Structured data.
C. Unstructured data.
D. Hidden data.
4. ___________ is defined as raw facts and figures collected together and stored in a database.
A. Data
B. Analysis
C. Knowledge
D. Wisdom
14. __________________ is a software suite that combines the basic tools required to write and test software.
A. An Integrated Development Environment.
B. Exploratory Data Analysis.
C. Data Visualization.
D.
22. _____________ is the process of transforming data from its original “raw” form into a more digestible format.
A. Data Extraction
B. Data Wrangling
C. Data Mapping
D. Data Cleaning
26. The process of fixing or removing incorrect, incomplete and irrelevant data from a dataset is called _______.
A. Data Analysis
B. Data Extraction
C. Data Cleaning
D. Data Modelling
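A minimal sketch of data cleaning in plain Python, assuming each record is a dict with hypothetical fields `name` and `age`. It drops incomplete records (a missing value), incorrect records (an age outside an assumed valid range of 0 to 120) and duplicates (after normalising whitespace and case in the name).

```python
def clean(records):
    """Drop incomplete, incorrect and duplicate records (hypothetical schema: name, age)."""
    seen = set()
    cleaned = []
    for rec in records:
        # Incomplete: a required field is missing or None
        if rec.get("name") is None or rec.get("age") is None:
            continue
        # Incorrect: age outside an assumed valid range
        if not 0 <= rec["age"] <= 120:
            continue
        # Duplicate: same normalised name and age already kept
        key = (rec["name"].strip().lower(), rec["age"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": rec["name"].strip(), "age": rec["age"]})
    return cleaned

raw = [
    {"name": "Asha", "age": 29},
    {"name": "asha ", "age": 29},   # duplicate once normalised
    {"name": "Ravi", "age": None},  # incomplete
    {"name": "Meera", "age": 250},  # incorrect
    {"name": "John", "age": 41},
]
print(clean(raw))
```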
30. _____________ refers to the process of detecting groups of data points with similar attributes in order to learn the similarities and differences in the data.
A. Regression Analysis
B. Classification Analysis
C. Clustering Analysis
D. Discrete Analysis
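Clustering can be illustrated with k-means, one common clustering algorithm (others exist, such as the hierarchical methods in Unit 3). This is a minimal sketch on hypothetical 2-D points; it assumes a naive initialisation that takes the first k points as starting centroids, which real libraries replace with smarter schemes.

```python
import math

def kmeans(points, k, iters=100):
    """Plain k-means: assign each point to its nearest centroid, then move centroids to cluster means."""
    centroids = [tuple(p) for p in points[:k]]   # naive init: first k points
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # New centroid = mean of each cluster (keep the old one if a cluster is empty)
        new = [tuple(sum(coord) / len(members) for coord in zip(*members)) if members
               else centroids[c]
               for c, members in enumerate(clusters)]
        if new == centroids:    # converged: assignments no longer change
            break
        centroids = new
    return centroids, clusters

pts = [(1, 1), (8, 8), (1.5, 2), (8, 9), (1, 0.5), (9, 8)]
centroids, clusters = kmeans(pts, k=2)
print(centroids)
print(clusters)
```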
31. ____________ is based on real-world entities and the relationships among them.
A. Entity Relationship model
B.
C. Logical Relationship model
D. Data model
32. The process of extracting data from various source systems is called __________.
A. Data Collection
B. Data Extraction
C. Data Processing
D. Data Analysis
a. Attributes
b. Entity set
c. Relationships
d. None of these
36. Data mining is the process of analysing large amounts of data to extract previously unknown, interesting patterns of _______, _______ and dependencies.
a. online, offline extraction
b. full, incremental extraction
c. data, unusual data
d. raw, corrupted data
42] EDA and data visualization both mainly aim to represent data in _______ format.
a. graphical
b. univariate graphical
c. multivariate graphical
d. non-graphical
43] Any repository data that is documented but not yet processed and fully integrated is called ______.
a. Data Wrangling
b. Data Munging
c. Raw data
d. Kurtosis
44]
UNIT-2
2. _____________ is used to combine information from two different relations or tables into a single relation.
A. Cartesian Product
B. Set Difference
C. Union Operation
D. Rename Operation
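The Cartesian product pairs every tuple of one relation with every tuple of the other, so the result has |R| × |S| tuples (in SQL this is a `CROSS JOIN`). A minimal sketch with two hypothetical relations stored as lists of tuples, using the standard-library `itertools.product`:

```python
from itertools import product

# Two small relations as lists of tuples (hypothetical data)
students = [("S1", "Asha"), ("S2", "Ravi")]
courses = [("C1", "Statistics"), ("C2", "Python"), ("C3", "SQL")]

# Cartesian product: concatenate each student tuple with each course tuple
cross = [s + c for s, c in product(students, courses)]
for row in cross:
    print(row)
print(len(cross), "rows =", len(students), "x", len(courses))
```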
4. TCL is __________________.
A. Transmission Control Level.
B. Transaction Console Language
C. Transaction Control Language
D. None of the above.
5. The language used by application programs to request data from the DBMS is referred to as __________.
A. DML
B. DDL
C. Query language
D. All of the above
8. Which of the following keywords is used with Data Control Language (DCL) statements?
A. SELECT
B. INSERT
C. DELETE
D. GRANT
9. The database language that allows you to access or maintain data in a database is ______.
A. DCL
B. DML
C. DDL
D. All of the Mentioned
10. __________ is the attribute or group of attributes that uniquely identifies each occurrence of an entity.
a. Foreign key
b. Super Key
c. Primary Key
d. All of these
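The uniqueness guarantee of a primary key can be seen with Python's built-in `sqlite3` module and an in-memory database. The table `student` and its columns are hypothetical; the sketch inserts two rows, then shows that a second row with an existing key is rejected with `sqlite3.IntegrityError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO student (roll_no, name) VALUES (?, ?)", (1, "Asha"))
conn.execute("INSERT INTO student (roll_no, name) VALUES (?, ?)", (2, "Ravi"))

try:
    # Reusing roll_no 1 violates the primary key's uniqueness
    conn.execute("INSERT INTO student (roll_no, name) VALUES (?, ?)", (1, "Meera"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print("duplicate rejected:", duplicate_rejected)
print("rows stored:", row_count)
```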
11. In SQL, which command is used to add new rows to a table?
A. Alter Table
B. Add row
C. Insert
D. Append
a. Data
b. Meta-Data
c. Entity
d. Relations
d. X-Markup Language
18. Which is the correct syntax of the declaration that defines the XML version?
19. XML is ______.
a. Platform Independent
b. Language Independent
c. Both A & B
d. None of the above
21. ______ provides a mechanism for storage and retrieval of data that is not based on RDBMS principles.
a. ODBC
b. JDBC
c. SQL
d. NoSQL
23. Amazon Web Services (AWS) is used to provide IT services to the market in the form of
web services known as ______
a. Public Cloud
b. Hybrid Cloud
c. Private Cloud
d. Cloud Computing
25. ______ is a collection of datasets that cannot be processed using traditional computing
techniques.
a. Cloud
b. MapReduce
c. Web services
d. Big Data
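MapReduce, one of the options above, is a programming model for processing Big Data in parallel. A single-process sketch of its three stages (map, shuffle, reduce) on the classic word-count task; the input splits are hypothetical, and a real framework such as Hadoop would run each stage across many machines.

```python
from collections import defaultdict
from itertools import chain

def map_phase(split):
    # Emit (word, 1) for every word in one input split
    return [(word.lower(), 1) for word in split.split()]

def shuffle(pairs):
    # Group all intermediate values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Combine each key's values into one result
    return {key: sum(values) for key, values in groups.items()}

splits = ["big data needs big tools", "data tools scale"]
counts = reduce_phase(shuffle(chain.from_iterable(map_phase(s) for s in splits)))
print(counts)
```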
29] Which technique is used for extracting large amounts of data from websites?
a) JSON
b) Web scraping
c) Data Modelling
d) XML
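Web scraping means downloading a page's HTML and extracting the parts you need. A minimal sketch with the standard-library `html.parser`: a hard-coded HTML string stands in for a fetched page (the links are hypothetical), and the parser collects every `href` from `<a>` tags.

```python
from html.parser import HTMLParser

class LinkScraper(HTMLParser):
    """Collect the href attribute of every <a> tag in the document."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Stand-in for HTML that would normally be fetched from a website
page = """
<html><body>
  <a href="/datasets/iris.csv">Iris</a>
  <p>No link here</p>
  <a href="/datasets/titanic.csv">Titanic</a>
</body></html>
"""
scraper = LinkScraper()
scraper.feed(page)
print(scraper.links)
```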
32] Which of the following types is not related to NoSQL databases?
a) Homogeneous DB
b) Document Database
c) Graph Stores
d) Key Value Store
36] _____ measures the asymmetry of the probability distribution of a random variable about its mean.
a) Skewness
b) Covariance
c) Variance
d) Kurtosis
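Skewness is commonly computed as the moment coefficient g1 = m3 / m2^(3/2), where m_k = (1/n) Σ (x_i − x̄)^k. It is zero for symmetric data, positive for a longer right tail and negative for a longer left tail. A minimal sketch on hypothetical samples (this is the population/moment form; some packages apply a small-sample correction):

```python
def skewness(xs):
    """Moment coefficient of skewness g1 = m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

print(skewness([1, 2, 3, 4, 5]))        # symmetric sample
print(skewness([1, 1, 1, 2, 10]))       # long right tail
print(skewness([-10, -2, -1, -1, -1]))  # long left tail
```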
37] In ____, we start with all the features and remove the least significant feature at each iteration.
a) Forward Elimination
b) Backward Elimination
c) Recursive Elimination
d) None of the above
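The elimination loop described above can be sketched generically. Classical backward elimination ranks features by p-value; this sketch instead assumes a caller-supplied score function (higher is better, for example adjusted R² or negative AIC), and the per-feature contributions in `toy_score` are hypothetical stand-ins for refitting a model.

```python
def backward_elimination(features, score, tol=0.0):
    """Start from all features; at each iteration drop the one whose removal hurts
    `score` least. Stop when every removal would lower the score by more than `tol`."""
    selected = list(features)
    best = score(selected)
    while len(selected) > 1:
        # Score every model that has exactly one feature removed
        candidates = [(score([f for f in selected if f != g]), g) for g in selected]
        cand_score, least_useful = max(candidates)
        if cand_score < best - tol:
            break                     # every removal hurts: stop
        selected.remove(least_useful)
        best = cand_score
    return selected

# Hypothetical contributions standing in for a fitted model's score
CONTRIB = {"x1": 0.50, "x2": 0.30, "noise": -0.01}

def toy_score(feats):
    return sum(CONTRIB[f] for f in feats)

print(backward_elimination(["x1", "x2", "noise"], toy_score))
```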
45] Amazon Web Services falls into which of the following cloud-computing categories?
a. Platform as a Service
b. Software as a Service
c. Infrastructure as a Service
d. Back-end as a Service
UNIT-3
1] ______ is used to address the problem of overfitting faced by models; it does so by introducing a penalty term for the number of parameters or features in the model.
a. AIC
b. Cross Validation
c. BIC
d. RIC
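AIC and BIC both add a penalty for model size to the fit term: AIC = 2k − 2 ln L̂ and BIC = k ln n − 2 ln L̂, where k is the number of parameters, n the number of observations and L̂ the maximised likelihood; lower is better. A minimal sketch comparing two models whose log-likelihood values are hypothetical:

```python
import math

def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical fits on n = 100 observations: the bigger model fits only slightly better
simple_model = {"loglik": -120.0, "k": 3}
complex_model = {"loglik": -118.5, "k": 8}
for name, m in [("simple", simple_model), ("complex", complex_model)]:
    print(name, "AIC =", aic(m["loglik"], m["k"]),
          "BIC =", round(bic(m["loglik"], m["k"], 100), 2))
```

Here the penalty outweighs the small gain in likelihood, so both criteria prefer the simpler model.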
2] Ridge regression is a technique that comes into the picture when the data suffers from ______.
a. collinearity
b. noncollinearity
c. multicollinearity
d. regularization
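Ridge regression solves (XᵀX + λI)β = Xᵀy instead of the ordinary least-squares equations. Under multicollinearity XᵀX is (near) singular, and adding λ > 0 to the diagonal makes the system solvable. A minimal pure-Python sketch, assuming no intercept and hypothetical data whose second column copies the first:

```python
def ridge_fit(X, y, lam):
    """Ridge regression without intercept: solve (X^T X + lam*I) beta = X^T y."""
    p = len(X[0])
    # Normal-equation matrix with the ridge penalty added to the diagonal
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return beta

# Two perfectly collinear predictors
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
y = [2.0, 4.0, 6.0]

try:
    ridge_fit(X, y, lam=0.0)      # plain least squares: X^T X is singular
    ols_failed = False
except ZeroDivisionError:
    ols_failed = True

print("OLS failed:", ols_failed)
print("ridge:", ridge_fit(X, y, lam=1.0))   # both coefficients near 28/29
```

The penalty also splits the shared effect evenly between the duplicated columns instead of leaving it undetermined.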
3] The _______ tradeoff is generally faced in supervised learning algorithms, because of which accuracy and generalization cannot both be fully achieved in the model.
a. Bias
b. Variance
c. Bias-Variance
d. Parsimony
5] Cross Validation is a statistical method used to estimate the skill of _______ models.
a. re-sampling
b. machine learning
c. statistical
d. predictive
6] ______ can include a range of activities such as converting data types, cleansing data by removing nulls or duplicates, enriching the data, or performing aggregations, depending on the needs of your project.
a. Data Mining
b. Data Cleaning
c. Data Transformation
d. Data Analysis
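Several of the transformation activities listed above can be chained in a few lines. A minimal sketch on hypothetical sales rows: remove nulls and duplicates, convert amounts from strings to floats, then aggregate totals per region (enrichment is omitted).

```python
from collections import defaultdict

# Hypothetical raw rows: amounts arrive as strings; one duplicate, one null
rows = [
    {"region": "North", "amount": "120.50"},
    {"region": "South", "amount": "80"},
    {"region": "North", "amount": "120.50"},
    {"region": "North", "amount": None},
    {"region": "South", "amount": "19.5"},
]

# 1. Cleanse: drop nulls, then exact duplicates (first occurrence order kept)
non_null = [r for r in rows if r["amount"] is not None]
unique = list({(r["region"], r["amount"]): r for r in non_null}.values())

# 2. Convert data types: string -> float
typed = [{"region": r["region"], "amount": float(r["amount"])} for r in unique]

# 3. Aggregate: total amount per region
totals = defaultdict(float)
for r in typed:
    totals[r["region"]] += r["amount"]
print(dict(totals))
```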
7] The technique used for forecasting, time series modelling and finding the cause-and-effect relationship between variables is called _______.
a. Logistic Regression
b. Time Series Analysis
c. Classification Trees
d. Regression Analysis
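Simple linear regression estimates how a dependent variable changes with a predictor and can then forecast new values. The least-squares estimates are b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄. A minimal sketch on hypothetical advertising-spend data:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

# Hypothetical data: advertising spend vs. sales
spend = [1, 2, 3, 4, 5]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = fit_line(spend, sales)
print(f"sales = {b0:.2f} + {b1:.2f} * spend")
print("forecast for spend = 6:", round(b0 + b1 * 6, 2))
```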
b. $(x + a)^n = \sum_{k=0}^{n} \binom{n}{k} x^k a^{n-k}$
c. $f(z) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (z - a)^n$
d. $x_t = \mu + \epsilon_t + \sum_{i=1}^{p} \phi_i \epsilon_{t-i}$
10] _______ means there are only two possible classes such as positive or negative, 0 and 1,
true or false, on or off.
a. Dichotomous
b. Logistic Regression
c. Binary Classes
d. Sigmoid Function
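Logistic regression handles such binary (dichotomous) outcomes by passing a linear score z through the sigmoid 1 / (1 + e^(−z)), which yields a probability in (0, 1), and then thresholding it. A minimal sketch assuming a threshold of 0.5 and hypothetical scores; the two-branch form avoids overflow for large negative z.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), computed without overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_class(z, threshold=0.5):
    """Map a score to one of two classes: 1 (positive) or 0 (negative)."""
    return 1 if sigmoid(z) >= threshold else 0

for z in (-3.0, 0.0, 2.5):
    print(z, round(sigmoid(z), 3), predict_class(z))
```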
12] K-Nearest Neighbors (KNN) is a simple and one of the most basic yet essential classification _______ in machine learning.
a. Hyperplane
b. Algorithm
c. Function
d. Recognition
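KNN classifies a new point by majority vote among its k closest training points. There is no training step; all the work happens at prediction time, which is why KNN is called a lazy learner. A minimal sketch with Euclidean distance and hypothetical 2-D training data:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to `query`.
    `train` is a list of ((x, y), label) pairs."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B")]
print(knn_predict(train, (1.5, 1.5)))
print(knn_predict(train, (6.5, 6.2)))
```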
a. Minkowski
b. Manhattan
c. Jaccard
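The options above are distance measures used in algorithms such as KNN and clustering. Minkowski distance (Σ|a_i − b_i|^p)^(1/p) reduces to Manhattan distance for p = 1 and Euclidean distance for p = 2, while Jaccard distance compares sets. A minimal sketch on hypothetical inputs:

```python
def minkowski(a, b, p):
    """Minkowski distance; p = 1 is Manhattan, p = 2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def jaccard_distance(s, t):
    """1 - |s & t| / |s | t| for two sets."""
    return 1 - len(s & t) / len(s | t)

print(minkowski((0, 0), (3, 4), 1))   # Manhattan
print(minkowski((0, 0), (3, 4), 2))   # Euclidean
print(jaccard_distance({"a", "b", "c"}, {"b", "c", "d"}))
```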
18] The package used for reading HTML and XML data is ______.
a. httr
b. http
c. httx
a. Data compression
b. statistical analysis
c. data dredging
a. CRAN
b. CPAN
c. CTAN
a. RMSE
b. Rsquare
c. Accuracy
d. rjson
22] Which of the following characteristics of big data is of relatively greater concern to data science?
a. Velocity
b. Variety
c. Volume
d. Variance
a. Data bagging
b. Data booting
c. Data Merging
d. Data Dredging
24] _______ is a tree-like structure that is used to represent the hierarchical clustering technique.
a. Dendrogram
b. K-means
c. Agglomerative
d. Divisive
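Agglomerative hierarchical clustering starts with every point as its own cluster and repeatedly merges the two closest clusters; the merge history, with the distance of each merge, is exactly what a dendrogram draws. A minimal sketch on hypothetical 1-D points, assuming single linkage (the closest pair of members defines cluster distance):

```python
def single_linkage_merges(points):
    """Agglomerative clustering of 1-D points; returns (cluster_a, cluster_b, height) merges."""
    clusters = [[p] for p in points]   # every point starts as its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

for a, b, d in single_linkage_merges([1, 2, 6, 7, 15]):
    print(f"merge {a} + {b} at height {d}")
```

Cutting the dendrogram at a chosen height (for example between 4 and 8 here) yields a flat clustering.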
a. Clustering
b. Dendrogram
c. Ensemble
d. Hierarchical
a. Thomas Bayes
b. Chris Bayes
c. Mcloed Bayes
d. Todd Bayes
27] Which of the following methods are present in caret for regularized regression?
a) ridge
b) lasso
c) relaxo
d) all of the mentioned
28] Which of the following analyses is a statistical process for estimating the relationships
among variables?
a) Causal
b) Regression
c) Multivariate
d) All of the mentioned
29] Which of the following options is/are true for K-fold cross-validation?
1. Increase in K will result in higher time required to cross validate the result.
2. Higher values of K will result in higher confidence on the cross-validation result as
compared to lower value of K.
3. If K=N, then it is called Leave one out cross validation, where N is the number of
observations.
a) 1 and 2
b) 2 and 3
c) 1 and 3
d) 1,2 and 3
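The index bookkeeping behind K-fold cross-validation is short: split the N observations into K folds and use each fold once as the test set while training on the rest. One model is fitted per fold, so a larger K means more fits and more time (statement 1); K = N gives leave-one-out cross-validation (statement 3). A minimal sketch with contiguous folds (in practice the data are usually shuffled first):

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds; return (train, test) index pairs."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)   # spread the remainder over the first folds
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds

for train, test in k_fold_indices(10, 3):
    print("train:", train, "test:", test)

# K = N: leave-one-out cross-validation, one observation per test fold
loocv = k_fold_indices(5, 5)
print([test for _, test in loocv])
```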
30] Which of the following functions tracks the changes in model statistics?
a) varImp
b) varImpTrack
c) findTrack
d) none of the mentioned
34] Which of the following functions is used to read data off web pages?
a) read.web
b) read.Lines
c) read.Line
d) all of the mentioned
35] Which of the following tools is used for estimating the standard errors and bias of estimators?
a) knitr
b) jackknife
c) ggplot2
d) all of the mentioned
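The jackknife recomputes an estimator n times, leaving out one observation each time. With leave-one-out estimates θ₍ᵢ₎ and their mean θ̄, the bias estimate is (n − 1)(θ̄ − θ̂) and the standard error is √((n − 1)/n · Σ(θ₍ᵢ₎ − θ̄)²). A minimal sketch on a hypothetical sample; for the sample mean the jackknife SE equals s/√n and the bias is zero, which makes a handy check.

```python
import math
import statistics

def jackknife(data, estimator):
    """Jackknife estimates of the bias and standard error of `estimator` on `data`."""
    n = len(data)
    theta_hat = estimator(data)
    # Recompute the estimator with each observation left out in turn
    loo = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    theta_bar = sum(loo) / n
    bias = (n - 1) * (theta_bar - theta_hat)
    se = math.sqrt((n - 1) / n * sum((t - theta_bar) ** 2 for t in loo))
    return bias, se

sample = [2, 4, 4, 4, 5, 5, 7, 9]
print("mean:", jackknife(sample, statistics.mean))
print("median:", jackknife(sample, statistics.median))
```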
37] Which of the following returns an array of ones with the same shape and type as a given
array?
a) all_like
b) ones_like
c) one_alike
d) all of the mentioned
38] ___________ decomposes the elements of x into mantissa and twos exponent.
a) trunc
b) fmod
c) frexp
d) ldexp
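`numpy.frexp` applies this decomposition element-wise to arrays, and `numpy.ldexp` reverses it. Python's standard library has scalar counterparts, `math.frexp` and `math.ldexp`, which show the idea without NumPy: x = m · 2^e with 0.5 ≤ |m| < 1 for non-zero x.

```python
import math

# frexp splits x into mantissa m and power-of-two exponent e: x == m * 2**e
m, e = math.frexp(8.0)
print(m, e)
# ldexp is the inverse operation: m * 2**e
print(math.ldexp(m, e))
print(math.frexp(-3.0))
```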
39] _____________________ is used to summarize the information in a data set described by multiple variables.
A] Principal Component Analysis.
B] Exploratory Data Analysis.
C] Multidimensionality.
D] Integrated Development Environment
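PCA finds the directions of greatest variance: the eigenvectors of the covariance matrix, ranked by eigenvalue. For two variables the 2 × 2 eigenproblem has a closed form, so a minimal pure-Python sketch is possible; the sample points are hypothetical, and the sign of the returned direction is arbitrary, as with any eigenvector.

```python
import math

def pca_2d(points):
    """First principal component and its explained-variance ratio for 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix [[a, b], [b, c]]
    a = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    c = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix
    half_trace = (a + c) / 2
    root = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    # Eigenvector of the largest eigenvalue = first principal component
    if b != 0:
        vx, vy = b, lam1 - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm), lam1 / (lam1 + lam2)

data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0)]
pc1, explained = pca_2d(data)
print("PC1 direction:", pc1)
print("share of variance explained:", round(explained, 3))
```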
41] Which of the following curve analyses is conducted on each predictor for classification?
a) NOC
b) ROC
c) COC
d) All of the mentioned
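A ROC curve plots the true-positive rate against the false-positive rate as the classification threshold moves from high to low; the area under it (AUC) summarises how well the scores separate the two classes (1.0 is perfect, 0.5 is chance). A minimal sketch with hypothetical predicted probabilities, computing AUC by the trapezoidal rule:

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by sweeping the threshold from high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

scores = [0.9, 0.8, 0.3, 0.1]   # hypothetical predicted probabilities
labels = [1, 0, 1, 0]           # true classes
pts = roc_points(scores, labels)
print(pts)
print("AUC:", auc(pts))
```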
a] Single linkage
b] Complete linkage
c] Average linkage
d] Complex linkage
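Single, complete and average linkage are the standard criteria for measuring the distance between two clusters in hierarchical clustering: the closest pair of members, the farthest pair, and the mean over all pairs. A minimal sketch on two hypothetical 1-D clusters:

```python
from itertools import product

def linkage_distance(c1, c2, method):
    """Distance between two clusters of 1-D points under a linkage criterion."""
    pair_dists = [abs(a - b) for a, b in product(c1, c2)]
    if method == "single":
        return min(pair_dists)                    # closest pair
    if method == "complete":
        return max(pair_dists)                    # farthest pair
    if method == "average":
        return sum(pair_dists) / len(pair_dists)  # mean over all pairs
    raise ValueError(f"unknown linkage method: {method}")

c1, c2 = [1, 2], [4, 6]
for method in ("single", "complete", "average"):
    print(method, linkage_distance(c1, c2, method))
```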
44] The ____________ algorithm is called a lazy learner algorithm.
a] KNN
b] SVM
c] PCA
d] EDA
A] Clusters
B] Dimensions
C] Hyperplane
D] Groups
46]