
Combinepdf

Uploaded by

rhythmm quira
Copyright: © All Rights Reserved

DATA ANALYSIS

PRELIM QUIZ 1
Question 1
Complete

Mark 1.00 out of 1.00

Flag question

Question text
It transforms data into actionable intelligence for business purposes.
Select one:

a.

Text Analytics
b.

Business Intelligence
c.

Data Mining
d.

Statistics Analytics

Question 2
Complete

Mark 1.00 out of 1.00

Flag question

Question text
It is used in an organization’s strategic and tactical business decision making.
Select one:

a.
data visualization
b.

text analytics
c.

business intelligence
d.

data mining

Question 3
Complete

Mark 1.00 out of 1.00

Flag question

Question text
The following are artifacts used in data analysis EXCEPT:
Select one:

a.

ANOVA
b.

pivot tables
c.

graphs
d.

statistical tools

Question 4
Complete

Mark 1.00 out of 1.00

Flag question
Question text
The following processes are used in data analysis EXCEPT:
Select one:

a.

transforming
b.

collecting
c.

inspecting
d.

cleansing

Question 5
Complete

Mark 1.00 out of 1.00

Flag question

Question text
_____________ includes identifying groups of data records.
Select one:

a.

Text Analytics
b.

Statistics Analytics
c.

Cluster analysis
d.

Business Intelligence

Question 6
Complete

Mark 1.00 out of 1.00


Flag question

Question text
Which of the following types of text is processed in text analytics?
Select one:

a.

structured
b.

unstructured
c.

unorganized
d.

raw

Question 7
Complete

Mark 1.00 out of 1.00

Flag question

Question text
Which of the following is NOT a method used in data analysis?
Select one:

a.

Business Intelligence
b.

Text Analytics
c.

Statistics Analytics
d.

Data Mining

Question 8
Complete

Mark 1.00 out of 1.00

Flag question

Question text
It is a free software programming language.
Select one:

a.

Orange
b.

WEKA
c.

Knime
d.

R-programming

Question 9
Complete

Mark 1.00 out of 1.00

Flag question

Question text
It has the goal of discovering useful information to support decision making.
Select one:

a.

data mining
b.

data visualization
c.

data analysis
d.

database

Question 10
Complete

Mark 1.00 out of 1.00

Flag question

Question text
It makes complex data more understandable and usable.
Select one:

a.

data mining
b.

text analytics
c.

data visualization
d.

business intelligence

Question 11
Complete

Mark 1.00 out of 1.00

Flag question

Question text
The goal is to transform raw data into understandable business information.
Select one:

a.

Data mining
b.

text analytics
c.

data visualization
d.

business intelligence

Question 12
Complete

Mark 1.00 out of 1.00

Flag question

Question text
What is the process of deriving useful information from text?
Select one:

a.

Statistics Analytics
b.

Text Analytics
c.

Data Mining
d.

Business Intelligence

Question 13
Complete

Mark 1.00 out of 1.00


Flag question

Question text
It extracts meaningful numerical indices from information and makes them available to
statistical and machine learning algorithms.
Select one:

a.

Text analytics
b.

data mining
c.

business intelligence
d.

data visualization

Question 14
Complete

Mark 1.00 out of 1.00

Flag question

Question text
___________ uses artifacts to present data visually.
Select one:

a.

Text Analytics
b.

data visualization
c.

Statistics Analytics
d.

Data Mining

Question 15
Complete

Mark 1.00 out of 1.00

Flag question

Question text
What programming language does Orange use?
Select one:

a.

python
b.

Fortran
c.

Cobol
d.

JAVA

Question 16
Complete

Mark 1.00 out of 1.00

Flag question

Question text
It is a powerful tool that shows the network of data.
Select one:

a.

WEKA
b.

Orange
c.

Knime
d.

Rapid Miner

Question 17
Complete

Mark 1.00 out of 1.00

Flag question

Question text
Which of the following data mining techniques is predictive?
Select one:

a.

tracking pattern
b.

clustering
c.

classification
d.

outlier detection

Question 18
Complete

Mark 1.00 out of 1.00

Flag question

Question text
It includes identifying groups of data records.
Select one:

a.

cluster analysis
b.

data mining
c.

data analysis
d.

database

Question 19
Complete

Mark 1.00 out of 1.00

Flag question

Question text
Which of the following is NOT a goal in data mining?
Select one:

a.

evaluating data
b.

discovering useful information


c.

aids in business decision making


d.

collecting data

Question 20
Complete

Mark 1.00 out of 1.00


Flag question

Question text

It is a method for discovering patterns in large data sets.

Select one:

a.

Text Analytics
b.

Data Mining
c.

Statistics Analytics
d.

Business Intelligence

PRELIM QUIZ 2
Question 1
Complete

Mark 1.00 out of 1.00

Flag question

Question text

Which of the following is the transpose of B?


Select one:

a.

b.

c.

d.

Question 2
Complete

Mark 1.00 out of 1.00

Flag question

Question text
What is an organized collection of information and a set of operations used to manage
that information?
Select one:

a.

ADT
b.

ML
c.

data structure
d.
data science

Question 3
Complete

Mark 0.00 out of 1.00

Flag question

Question text

3A + B =
Select one:

a.

b.

c.

d.

Question 4
Complete

Mark 1.00 out of 1.00

Flag question
Question text
The intersection of the two sets A = {2,3} and B = {4,5} is a
Select one:

a.

singleton
b.

singular
c.

null set
d.

nonsingular

Question 5
Complete

Mark 1.00 out of 1.00

Flag question

Question text
What is the earlier name for data science?
Select one:

a.

datology
b.

dataology
c.

datatology
d.

datalogy

Question 6
Complete

Mark 1.00 out of 1.00


Flag question

Question text
Which is NOT a characteristic feature of data structure?
Select one:

a.

It defines as to how components relate to each other.


b.

Set of operations is on one or more component items.


c.

It contains a fixed structure.


d.

It contains component data

Question 7
Complete

Mark 1.00 out of 1.00

Flag question

Question text
An array is a good example of _________data structure.
Select one:

a.

static
b.

dynamic
c.
linear
d.

nonlinear

Question 8
Complete

Mark 1.00 out of 1.00

Flag question

Question text
What is the size of the product of a 5x6 matrix and a 6x8 matrix?
Select one:

a.

5x5
b.

8x8
c.

8x5
d.

5x 8
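
The size rule behind this question (an m x n matrix times an n x p matrix gives an m x p matrix) can be checked with a small sketch. The helper names `matmul` and `shape` and the all-ones entries are illustrative assumptions, not part of the quiz.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    if len(a[0]) != len(b):
        # Columns of the left matrix must equal rows of the right matrix.
        raise ValueError("inner dimensions must match")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def shape(m):
    """Return (rows, columns) of a matrix stored as a list of rows."""
    return (len(m), len(m[0]))


a = [[1] * 6 for _ in range(5)]  # 5x6 matrix (entries are placeholders)
b = [[1] * 8 for _ in range(6)]  # 6x8 matrix (entries are placeholders)
product = matmul(a, b)
print(shape(product))  # (5, 8)
```

The inner dimension (6) disappears and the outer dimensions (5 and 8) give the size of the result.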

Question 9
Complete

Mark 1.00 out of 1.00

Flag question

Question text
If A = {2,3} and B = {4,5}, which of the following is a Cartesian product of the two sets?
Select one:

a.

{ (3,4), (3,5), (2,4), (2,2) }


b.

{ (3,3), (3,5), (2,4), (2,5) }


c.

{ (3,4), (3,5), (2,4), (2,5) }


d.

{ (3,4), (3,3), (2,4), (2,5) }
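
As a quick check of the Cartesian product asked about here, the following sketch pairs every element of A with every element of B using the standard library. The variable name `cartesian` is only illustrative.

```python
from itertools import product

A = {2, 3}
B = {4, 5}

# A x B: every ordered pair (a, b) with a taken from A and b taken from B.
cartesian = set(product(A, B))
print(sorted(cartesian))  # [(2, 4), (2, 5), (3, 4), (3, 5)]
```

Each pair takes its first component from A and its second from B, so pairs such as (3,3) or (2,2) cannot occur.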

Question 10
Complete

Mark 1.00 out of 1.00

Flag question

Question text

Which of the following is TRUE?


Select one:

a.

AB is not possible
b.

AB=BA
c.

A + B = B+ A
d.

BC=CB
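
The matrices A, B and C from the original question were images and are not recoverable here. The sketch below therefore uses two hypothetical 2x2 matrices, `P` and `Q`, only to illustrate the general facts: matrix addition commutes, while matrix multiplication in general does not.

```python
def add(x, y):
    """Entry-wise sum of two same-sized matrices (lists of rows)."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def mul(x, y):
    """Matrix product of x (m x n) and y (n x p)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


# Hypothetical matrices, not the ones from the quiz.
P = [[1, 2], [3, 4]]
Q = [[0, 1], [1, 0]]

print(add(P, Q) == add(Q, P))  # True: addition is commutative
print(mul(P, Q) == mul(Q, P))  # False: multiplication is not, in general
```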

Question 11
Complete

Mark 1.00 out of 1.00

Flag question

Question text
What is a data structure that has a fixed size?
Select one:

a.

dynamic
b.

linear
c.

nonlinear
d.

static

Question 12
Complete

Mark 1.00 out of 1.00

Flag question

Question text
ML means:
Select one:

a.

Machine Learning
b.

Mobile Learning
c.
Math Learning
d.

Machine Landscaping

Question 13
Complete

Mark 1.00 out of 1.00

Flag question

Question text
Addition and subtraction of matrices is possible only if the two or more matrices:
Select one:

a.

Have same sizes.

b.

Have same number of rows


c.

Are square matrices.


d.

Have same number of columns.

Question 14
Complete

Mark 1.00 out of 1.00

Flag question

Question text
The two sets A = {2,3} and B = {4,5} are said to be
Select one:

a.

adjoint
b.

disjoint
c.

joint
d.

equal
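
A short sketch with Python's built-in sets shows why A and B are disjoint: they share no element, so their intersection is the empty (null) set. The variable name `intersection` is illustrative.

```python
A = {2, 3}
B = {4, 5}

intersection = A & B      # elements common to both sets
print(intersection)       # set()
print(A.isdisjoint(B))    # True
```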

Question 15
Complete

Mark 1.00 out of 1.00

Flag question

Question text
What is the correct meaning of ADT?
Select one:

a.

Abstract Data Type


b.

Adequate Data Tautology


c.

Adequate Data Type


d.

Abstract Data Topography

Question 16
Complete

Mark 1.00 out of 1.00


Flag question

Question text
It refers to a data structure that grows and shrinks at execution time.
Select one:

a.

dynamic
b.

nonlinear
c.

linear
d.

static

Question 17
Complete

Mark 1.00 out of 1.00

Flag question

Question text

Matrix B is
Select one:

a.

transpose
b.

inverse
c.

singular
d.

invertible

Question 18
Complete

Mark 1.00 out of 1.00

Flag question

Question text
What is the focus of data science?
Select one:

a.

statistical computation
b.

manipulate data efficiently and effectively


c.

organization of data

d.

collection of data

Question 19
Complete

Mark 1.00 out of 1.00

Flag question

Question text
Which of the matrices is singular?

Select one:

a.

B
b.

none
c.

C
d.

Question 20
Complete

Mark 1.00 out of 1.00

Flag question

Question text
_______________ is a data structure in which every component has a unique predecessor and
successor.
Select one:

a.

static
b.

dynamic
c.

linear
d.
nonlinear

MIDTERM QUIZ 1
Question 1
Complete

Mark 1.00 out of 1.00

Flag question

Question text
It shows a high correlation between the incidence of flu and searches about flu on
Google.
Select one:

a.

Google Flu trends


b.

Google Flu Viral


c.

Google Flu Searches


d.

Google Flu Reactions

Question 2
Complete

Mark 0.00 out of 1.00

Flag question

Question text
It refers to well based theories and sound business judgement.
Select one:
a.

Data Mining
b.

Data Science
c.

Data Analytics
d.

Data visualization

Question 3
Complete

Mark 1.00 out of 1.00

Flag question

Question text
PAW means____________.
Select one:

a.

Preliminary Assumption Web


b.

Predicting Analytics Web


c.

Predictive Analytics World


d.

Predictive Analytics web

Question 4
Complete

Mark 1.00 out of 1.00


Flag question

Question text
He said that “In mathematics the art of proposing a question must be held of higher
value than solving it”.
Select one:

a.

Eric Schmidt
b.

Francis Galton
c.

William Gibson
d.

Georg Cantor

Question 5
Complete

Mark 1.00 out of 1.00

Flag question

Question text
These are the data skills that a good data scientist needs to cultivate EXCEPT
Select one:

a.

Communication
b.

speaking
c.

coding
d.

Math and Stats

Question 6
Complete

Mark 1.00 out of 1.00

Flag question

Question text
What is a great example of data product?

Select one:

a.

google drive
b.

google navigation
c.

google navigation
d.

google maps

Question 7
Complete

Mark 0.00 out of 1.00

Flag question

Question text
It expands available data enormously since there is so much more text being generated
than numbers.
Select one:
a.

text analysis
b.

data mining
c.

data ranking
d.

Text mining

Question 8
Complete

Mark 1.00 out of 1.00

Flag question

Question text
The following are the 3V's of big data EXCEPT
Select one:

a.

velocity
b.

veracity
c.

variety
d.

volume

Question 9
Complete

Mark 1.00 out of 1.00


Flag question

Question text
The developer of FarmVille, a famous game on the internet.
Select one:

a.

Zynga Incorporated
b.

Moontoon
c.

Supercell
d.

Electronic Arts

Question 10
Complete

Mark 1.00 out of 1.00

Flag question

Question text
The explosion of _______data is the main reason why every 2 days 5 exabytes of data are
generated.
Select one:

a.

gargantuan
b.

reaction
c.

transaction
d.

interaction

Question 11
Complete

Mark 1.00 out of 1.00

Flag question

Question text
He pointed out that until 2003, all of mankind had generated just 5 exabytes of data.
Select one:

a.

Eric Schmidt
b.

Eric Smidth
c.

Eric Smith
d.

Eric Smicht

Question 12
Complete

Mark 1.00 out of 1.00

Flag question

Question text
Which is NOT interaction data?
Select one:

a.

data base
b.

RFID data
c.

geo-location
d.

browser action

Question 13
Complete

Mark 1.00 out of 1.00

Flag question

Question text
A new phenomenon for the explosion of _________data
Select one:

a.

communication
b.

transient
c.

interaction
d.

transaction

Question 14
Complete

Mark 1.00 out of 1.00

Flag question

Question text
The creation of data from varied sources and its qualification into information.
Select one:

a.

datafition
b.

datafitration
c.

datafication
d.

datacation

Question 15
Complete

Mark 1.00 out of 1.00

Flag question

Question text
“All models are wrong but some are useful”
Select one:

a.

DJ Patil
b.

William Gibson
c.

George E. P. Box
d.

Georg Cantor

Question 16
Complete

Mark 1.00 out of 1.00


Flag question

Question text
The person who said that “The future is not google-able”.
Select one:

a.

William Gibson
b.

Georg Cantor
c.

D J Patil
d.

Eric Schmidth

Question 17
Complete

Mark 1.00 out of 1.00

Flag question

Question text
Exabyte means ________bytes
Select one:

a.

trillion trillion
b.

thousand thousand
c.

million million
d.

billion billion
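
The arithmetic behind "billion billion" can be checked directly, assuming the short-scale billion (10^9) and the SI exabyte (10^18 bytes). The constant names are illustrative.

```python
BILLION = 10 ** 9     # short-scale billion
EXABYTE = 10 ** 18    # SI definition: 1 EB = 10^18 bytes

print(BILLION * BILLION == EXABYTE)  # True
```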

Question 18
Complete

Mark 1.00 out of 1.00

Flag question

Question text
IOT means
Select one:

a.

Interconnction of things
b.

Internet of time
c.

Interaction of time
d.

Internet of things

Question 19
Complete

Mark 1.00 out of 1.00

Flag question

Question text
How many bytes of data are generated every two days in today's world?
Select one:

a.

5 terabytes
b.

5 exabytes
c.

5 gigabytes
d.

5 megabytes

Question 20
Complete

Mark 1.00 out of 1.00

Flag question

Question text
The creation of data from varied sources and its quantification into information.
Select one:

a.

datology
b.

datalization
c.

Datafication
d.

dataology

PRE-TEST
Question 1
Correct

Mark 1.00 out of 1.00


Flag question

Question text
The proportion of a well defined positive event is called _________________.
Select one:

a.

probability
b.

sensitivity
c.

anonymity
d.

specificity
Feedback
Your answer is correct.

The correct answer is: sensitivity

Question 2
Correct

Mark 1.00 out of 1.00

Flag question

Question text
AUC means___________.
Select one:

a.

Artificial Under Cover


b.

Area Under the Curve


c.

Area Under Coverage


d.

Artificial Unit Curve


Feedback
Your answer is correct.

The correct answer is: Area Under the Curve

Question 3
Correct

Mark 1.00 out of 1.00

Flag question

Question text
It allows you to see which value of the explanatory variable corresponds to a given
probability of success.
Select one:

a.

ogive
b.

probability table
c.

probability analysis table


d.

histogram
Feedback
Your answer is correct.

The correct answer is: probability analysis table

Question 4
Correct

Mark 1.00 out of 1.00


Flag question

Question text
LR means ________________________.
Select one:

a.

Logistic Regression
b.

Logistic Reinforcement
c.

Linear Regression
d.

Linear Relativity
Feedback
Your answer is correct.

The correct answer is: Linear Regression

Question 5
Correct

Mark 1.00 out of 1.00

Flag question

Question text
Positive correlation means that_______________.
Select one:

a.

as x decreases y increases
b.

as x increases y decreases
c.

as x increases y remains constant


d.

as x increases y also increases and vice versa


Feedback
Your answer is correct.

The correct answer is: as x increases y also increases and vice versa
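
To make the idea of positive correlation concrete, here is a minimal sketch of the Pearson correlation coefficient on made-up data. The function name `pearson` and the sample lists are assumptions for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator of r).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations (denominator of r).
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


xs = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 6]    # tends to increase with x (hypothetical data)
falling = [6, 4, 5, 4, 2]   # tends to decrease as x increases (hypothetical data)

print(pearson(xs, rising) > 0)    # True: positive correlation
print(pearson(xs, falling) < 0)   # True: negative correlation
```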

Question 6
Correct

Mark 1.00 out of 1.00

Flag question

Question text
Which of the following belong to the GLM?
Select one:

a.

exponential
b.

quadratic
c.

logistic
d.

multivariate
Feedback
Your answer is correct.

The correct answer is: logistic

Question 7
Correct

Mark 1.00 out of 1.00


Flag question

Question text
GLM means_____________.
Select one:

a.

Generalized Linear Model


b.

Generalized Line Mode


c.

General Line Model


d.

Generalized Linear Mode


Feedback
Your answer is correct.

The correct answer is: Generalized Linear Model

Question 8
Correct

Mark 1.00 out of 1.00

Flag question

Question text
The proportion of well defined negative events is called ________________.
Select one:

a.

regression
b.

probability
c.

specificity
d.

sensitivity
Feedback
Your answer is correct.

The correct answer is: specificity
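
Sensitivity and specificity are usually computed from the counts of a confusion matrix. The sketch below uses the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)); the counts themselves are hypothetical.

```python
def sensitivity(tp, fn):
    """Proportion of actual positives that are correctly identified."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of actual negatives that are correctly identified."""
    return tn / (tn + fp)


# Hypothetical confusion-matrix counts, for illustration only.
tp, fn = 40, 10   # 50 actual positives
tn, fp = 45, 5    # 50 actual negatives

print(sensitivity(tp, fn))  # 0.8
print(specificity(tn, fp))  # 0.9
```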

Question 9
Correct

Mark 1.00 out of 1.00

Flag question

Question text
The method that does not require the assumption that parameters are normally
distributed.
Select one:

a.

profile likeness
b.

feedback
c.

profile likelihood
d.

parameter range
Feedback
Your answer is correct.

The correct answer is: profile likelihood

Question 10
Correct

Mark 1.00 out of 1.00


Flag question

Question text
Data involving two variables are called _________data.
Select one:

a.

dichotomal
b.

multivariate
c.

dichotomy
d.

bivariate
Feedback
Your answer is correct.

The correct answer is: bivariate

Any way to get new expressions from old ones.

Select one:

a.

semantic

b.

surrogate

c.

reasoning

d.

inference

Feedback
The correct answer is: inference
Question 2
Complete

Mark 0.00 out of 1.00

Flag question

Question text
The following are distinct roles that KR plays EXCEPT

Select one:

a.

Surrogate

b.

Medium of human expression

c.

Medium for pragmatically diligent interpretation

d.

Set of ontological commitments

Feedback
The correct answer is: Medium for pragmatically diligent interpretation

Question 3
Complete

Mark 1.00 out of 1.00

Flag question

Question text
Which is NOT a basic representation technology?
Select one:

a.

frame
b.

graph
c.

logic
d.

semantic net
Feedback
The correct answer is: graph

Question 4
Complete

Mark 1.00 out of 1.00

Flag question

Question text
All representations are ________.

Select one:

a.

unstable

b.

perfect

c.

stable

d.

imperfect

Feedback
The correct answer is: imperfect

Question 5
Complete

Mark 1.00 out of 1.00


Flag question

Question text
KR means __________________________.
Select one:

a.

Knowledge Request
b.

Knowledge Requisition
c.

Knowledge Representation
d.

Knowledge Replenished
Feedback
The correct answer is: Knowledge Representation

Question 6
Complete

Mark 1.00 out of 1.00

Flag question

Question text
KR is a set of __________commitments.

Select one:

a.

social

b.

anthropological
c.

ontological

d.

psychological

Feedback
The correct answer is: ontological

Question 7
Complete

Mark 1.00 out of 1.00

Remove flag

Question text
A network purporting to describe family memberships.
Select one:

a.

network topology
b.

network adherence
c.

networking
d.

network tautology
Feedback
The correct answer is: network topology

Question 8
Complete

Mark 1.00 out of 1.00


Flag question

Question text
It sees a set of prototypes in particular prototypical diseases to be matched against the
case at hand.

Select one:

a.

MYCIN
b.

SEMANTIC NETS
c.

INTERNIST
d.

LOGIC
Feedback
The correct answer is: INTERNIST

Question 9
Complete

Mark 1.00 out of 1.00

Flag question

Question text
The following provided inspirations of what constitutes intelligent reasoning EXCEPT
Select one:

a.

Statistics
b.

Psychology
c.

Sociology
d.

Biology
Feedback
The correct answer is: Sociology

Question 10
Complete

Mark 1.00 out of 1.00

Flag question

Question text
It is a variety of formal calculation typically deduction.
Select one:

a.

Intelligent Reasoning
b.

GLM
c.

Artificial Intelligence
d.

KR
Feedback
The correct answer is: Intelligent Reasoning

Question 11
Complete

Mark 1.00 out of 1.00


Flag question

Question text
Which is NOT a component of KR?

Select one:

a.

set of inferences that the representation sanctions

b.

fundamental conception

c.

set of inferences that it recommends

d.

it adheres to the function

Feedback
The correct answer is: it adheres to the function

Question 12
Complete

Mark 0.00 out of 1.00

Flag question

Question text
The following are abstract notions EXCEPT

Select one:

a.

processes

b.

actions
c.

beliefs

d.

casualty

Feedback
The correct answer is: casualty

Question 13
Complete

Mark 1.00 out of 1.00

Flag question

Question text
It is a process that goes on internally while most things it wishes to reason about exist only
externally.

Select one:

a.

inference

b.

logic

c.

actions

d.

reasoning

Feedback
The correct answer is: reasoning

Question 14
Complete

Mark 1.00 out of 1.00


Flag question

Question text
Which is NOT a KR technology?

Select one:

a.

frames

b.

logic

c.

semantic nets

d.

roles

Feedback
The correct answer is: roles

Question 15
Complete

Mark 1.00 out of 1.00

Flag question

Question text
It is used to enable an entity to determine consequences by thinking rather than acting.
Select one:

a.

Knowledge Representation
b.

Artificial Intelligence
c.

Intelligent reasoning
d.

Knowledge Channel
Feedback
The correct answer is: Knowledge Representation

Question 16
Complete

Mark 0.00 out of 1.00

Flag question

Question text
It views the world in terms of attribute-object-value triples.
Select one:

a.

frame
b.

semantic net
c.

rule based
d.

logic
Feedback
The correct answer is: rule based

Question 17
Complete

Mark 0.00 out of 1.00


Flag question

Question text
It views the world in terms of prototypical objects.
Select one:

a.

logic
b.

rule
c.

semantic net
d.

frame
Feedback
The correct answer is: frame

Question 18
Complete

Mark 1.00 out of 1.00

Flag question

Question text
It involves a commitment in viewing the world in terms of individual entities and
relations.
Select one:

a.

logic
b.

semantic nets
c.

frame
d.

rules
Feedback
The correct answer is: logic

Question 19
Complete

Mark 1.00 out of 1.00

Flag question

Question text
KR as a _________is a substitute for the thing itself.

Select one:

a.

surrogate

b.

semantic

c.

ontological

d.

pragmatic

Feedback
The correct answer is: surrogate

Question 20
Complete

Mark 1.00 out of 1.00


Flag question

Question text
It is a language in which we say things about the world.

Select one:

a.

Medium of human experiences

b.

Medium of ontological commitments

c.

Medium of pragmatic evidences

d.

Medium of human expression

Feedback
The correct answer is: Medium of human expression
DATA Analysis Q1 Midterm EXAM

Data Communication and Networking 3 (AMA Computer University)

Downloaded by Rythm Quira ([email protected])
UGRD-MATH6200 Data Analysis


PRELIM Q1 20/20
It makes complex data more understandable and usable.

data visualization

The goal is to transform raw data into understandable business information.

Data mining

Which of the following is NOT a goal in data mining?

collecting data

It includes identifying groups of data records.


cluster analysis

It has the goal of discovering useful information to support decision making.

data analysis

What programming language does Orange use?

Python

The following processes are used in data analysis EXCEPT:

Collecting

Which of the following data mining techniques is predictive?

Classification

It is a method for discovering patterns in large data sets.

Data Mining

It transforms data into actionable intelligence for business purposes.

Business Intelligence

It extracts meaningful numerical indices from information and makes them available to statistical
and machine learning algorithms.

Text analytics

It is a powerful tool that shows the network of data.

Knime

Which of the following types of text is processed in text analytics?

Unstructured

Which of the following is NOT a method used in data analysis?

Statistics Analytics

_____________ includes identifying groups of data records.

Cluster analysis

What is the process of deriving useful information from text?

Text Analytics

___________ uses artifacts to present data visually.

data visualization

It is a free software programming language.

R-programming

The following are artifacts used in data analysis EXCEPT:

ANOVA

It is used in an organization’s strategic and tactical business decision making.

business intelligence

PRELIM Q2 20/20
What is an organized collection of information and a set of operations used to manage that
information?

ADT

Addition and subtraction of matrices is possible only if the two or more matrices:

Have same sizes.

If A = {2,3} and B = {4,5}, which of the following is a Cartesian product of the two sets?

{ (3,4), (3,5), (2,4), (2,5) }

What is a data structure that has a fixed size?

Static

Which of the following is the transpose of B?

The two sets A = {2,3} and B = {4,5} are said to be

Disjoint

What is the correct meaning of ADT?

Abstract Data Type

An array is a good example of _________data structure.

static

It refers to a data structure that grows and shrinks at execution time.

Dynamic

Matrix B is

Invertible

Which is NOT a characteristic feature of data structure?

It contains a fixed structure.

The intersection of the two sets A = {2,3} and B = {4,5} is a

null set

Which of the following is TRUE?

A + B = B+ A

ML means:

Machine Learning

_______________ is a data structure in which every component has a unique predecessor and
successor.

linear

3A + B =

Which of the matrices is singular?

What is the earlier name for data science?

Datalogy

What is the size of the product of a 5x6 matrix and a 6x8 matrix?

5x 8

What is the focus of data science?

manipulate data efficiently and effectively

PRELIM EXAM 49/50

It is used to discover patterns in large data sets

Data mining

3A + B

It includes identifying groups of data records

cluster analysis

If α = babaa and β = a^6b^5bb, what is the length of the concatenation of the two strings?

18
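
The answer can be verified by building the strings explicitly, reading a^6 as six a's and b^5 as five b's. The variable names are illustrative.

```python
alpha = "babaa"                    # length 5
beta = "a" * 6 + "b" * 5 + "bb"    # a^6 b^5 bb: length 6 + 5 + 2 = 13

concatenated = alpha + beta
print(len(concatenated))  # 18
```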

What programming language is used in Rapid miner?

Java

It makes complex data more understandable and usable.

data visualization

There are how many data mining techniques?

It is a theoretical classification that estimates and anticipates the increase in
running time for algorithms.

run time analysis

The product of a 2x5 matrix and a 5x3 matrix is a ______ matrix

2x3

Which of the following is TRUE?

A + B = B+ A

Which of the following is a predictive data mining technique?

Regression

The symbol used to indicate strings with no elements.

λ
Another term for text analytics.
text mining
The following are software tools used in data mining EXCEPT
SPSS

Which of the following is the transpose of B?

It is a process of finding the computational complexity of algorithms.


analysis of algorithms
It is popular among financial data analysts.

Knime
It is used in an organization’s strategic and tactical business decision making.
business intelligence
It is a process of finding the computational complexity of algorithms.
analysis of algorithms
It is a powerful tool that shows the network of data.
Knime
Matrix B is
Invertible
The following are large inputs EXCEPT
Big beta notation

It relates the length of an algorithm to the number of storage location it uses.


space complexity
Refers to using tools of statistics to present data visually.
data visualization
Which of the following data mining techniques is predictive?

classification
It is used for prototyping in Rapid miner.
studio
The process of inspecting, cleansing, transforming and modelling data with the goal of
discovering useful information.
data analysis
Another term for an empty set.
Null
What type of text is processed in text analytics?
Unstructured
A special type of function where the domain is a set of consecutive integers.
Sequence
Given the sets A = { x | x is a distinct letter in the word "MATHEMATICS" } and B = { x | x is a distinct
letter in the word "STATISTICS" }, the two sets are
Joint
The goal is to transform raw data into understandable business information.
Data mining
Addition and subtraction of matrices is possible only if the two or more matrices:
Have same sizes.
The function describing the performance of an algorithm is usually an upper bound
determined from ______inputs.
worst case
An example of an abstract computer.
Turing machine

Which of the matrices is singular?


A
It is a free software programming language.
R-programming
Algorithm analysis is an important part of a broader_____________.
computational complexity theory
If A = { x | x is a distinct letter in the word "MATHEMATICS" } and B = { x | x is a distinct letter in
the word "STATISTICS" }, then their intersection is
{A,C,I,S,T}
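
This intersection can be reproduced by turning each word into a set of its distinct letters. It is a small sketch; the variable names are illustrative.

```python
A = set("MATHEMATICS")   # distinct letters: M, A, T, H, E, I, C, S
B = set("STATISTICS")    # distinct letters: S, T, A, I, C

common = A & B
print(sorted(common))    # ['A', 'C', 'I', 'S', 'T']
print(A.isdisjoint(B))   # False: the sets share elements, so they are joint
```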
If R = { (3,3), (3,6), (5,5), (5,10), (6,12) } is a binary relation, then its domain is
{3,5,6}
The range of the binary relation R = { (3,3), (3,6), (5,5), (5,10), (6,12) } is

{3,5,6,10,12}
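
The domain and range of a binary relation are the sets of first and second components of its pairs. The sketch below assumes the pair written "(6.12)" in the source is meant to be (6, 12).

```python
# Relation as a set of ordered pairs; (6, 12) is an assumed reading of "(6.12)".
R = {(3, 3), (3, 6), (5, 5), (5, 10), (6, 12)}

domain = {x for x, _ in R}        # all first components
range_of_r = {y for _, y in R}    # all second components

print(sorted(domain))       # [3, 5, 6]
print(sorted(range_of_r))   # [3, 5, 6, 10, 12]
```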
Null strings are indicated by
λ
It relates the length of an algorithm’s input to the number of steps it takes.
time complexity
What is the size of the product of a 5x6 matrix and a 6x8 matrix?
5x 8
It offers a way to examine trends from collected data and derive insights from it.
Business Intelligence
He coined the term “analysis of algorithms”.
Donald Knuth
The constant multiplicative factor in which algorithms are related are_______ constants.
Hidden
It is a perfect software for machine learning.
orange

Earlier name for data science.


Datalogy
A matrix that has the same number of rows and columns is called
square

MIDTERM Q1 20/20
It expands available data enormously since there is so much more text being generated than
numbers.
Text mining
A new phenomenon for the explosion of _________data
Interaction
It shows a high correlation between the incidence of flu and searches about flu on google.
Google Flu trends
What is a great example of data product?
google maps
These are the data skills that a good data scientist needs to cultivate EXCEPT
Speaking
It refers to well based theories and sound business judgement.
Data Science
The developer of FarmVille, a famous game on the internet.
Zynga Incorporated
The following are the 3V's of big data EXCEPT
Veracity
He pointed out that until 2003, all of mankind had generated just 5 exabytes of data.
Eric Schmidt
The creation of data from varied sources and its quantification into information.
Datafication
The explosion of _______data is the main reason why every 2 days 5 exabytes of data are
generated.
Interaction
Exabyte means ________bytes

billion billion
IOT means
Internet of things
PAW means____________.
Predictive Analytics World
He said that “In mathematics the art of proposing a question must be held of higher value
than solving it”.
Georg Cantor
How many bytes of data are generated every two days in today's world?
5 exabytes
“All models are wrong but some are useful”
George E. P. Box
The person who said that “The future is not google-able”.
William Gibson
Which is NOT interaction data?
data base
The creation of data from varied sources and its qualification into information.
Datafication

MIDTERM Q2 20/20
What range of values lies within 3 SD below and above the mean in a normal distribution if the mean
is 10 and the standard deviation is 2?
4-16
What is the value of the mean if a score of 110 is 3 standard deviations above the mean?
95
What is the value of the standard deviation in a standard normal distribution?
1
What percent of data will lie within 2 standard deviations of the mean?

95
Empirical rule for a normal distribution that is 3 standard deviations above and below the
mean covers ______% of the data.

99.7
What range of values lie between 3 standard deviations above and below the mean if the
mean is 80 and the standard deviation is 3?
71-89
A bell shaped curve that is symmetric about a vertical line.
normal distribution
A frequency distribution in which large data sets are displayed.
Grouped frequency distribution
The area of the standard normal curve to the right of z=0.82 is _______.
0.206
The normal distribution with a mean of 0 and standard deviation of 1.
Standard
A score of 50 lies 2 standard deviations above a mean of 30.What is the value of the
standard deviation?
10
A bell-shaped distribution that is symmetric about a vertical line?
Normal
What is the mean for a standard normal distribution?
0
A survey of 100 consumers said that the price charged for a kilo of rice could be
approximated by a normal distribution with a mean of 35 and a standard deviation of 4.How
many of them lie between 27 and 43?
95
Empirical rule for a normal distribution that is 2 standard deviations above and below the
mean is ________% of data.
95
Empirical rule for a normal distribution lie ______% of data with 1 standard deviation below
and above the mean.
68
A graph used to indicate intervals in a frequency distribution is referred to as
a ______________.
Histogram


Which of the following is TRUE when a distribution is normal?


Mean=Median=Mode
Lists the percent of data in each distribution.
relative frequency distribution
A survey of 100 consumers said that the price charged for a kilo of rice could be
approximated by a normal distribution with a mean of 35 and a standard deviation of 4.How
many are less than 39?
84
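
The empirical-rule answers above can be checked numerically. The following is a minimal Python sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the helper names `sd_range` and `share_within` are illustrative and are not part of the course material.

```python
from statistics import NormalDist


def sd_range(mean, sd, k):
    """Return the interval from k standard deviations below to k above the mean."""
    return mean - k * sd, mean + k * sd


def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    standard = NormalDist()  # standard normal: mean 0, standard deviation 1
    return standard.cdf(k) - standard.cdf(-k)


# Rice-price survey from the quiz: mean 35, standard deviation 4, 100 consumers.
rice = NormalDist(mu=35, sigma=4)
between_27_and_43 = 100 * (rice.cdf(43) - rice.cdf(27))  # within 2 SD of the mean
below_39 = 100 * rice.cdf(39)                            # below 1 SD above the mean

# Area under the standard normal curve to the right of z = 0.82.
right_of_082 = 1 - NormalDist().cdf(0.82)

print(sd_range(10, 2, 3), sd_range(80, 3, 3))
print(round(between_27_and_43), round(below_39), round(right_of_082, 3))
```

The exact shares within 1, 2 and 3 standard deviations are about 68.3%, 95.4% and 99.7%, which the quiz rounds to 68, 95 and 99.7.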

MIDTERM EXAM 48/50


A vegetable distributor knows that during the month of August, the weights of tomatoes
are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. How
many can be expected to weigh between 0.31 and 0.91 lb in a shipment of 4500 tomatoes?
4275
What percent of data will lie within 2 standard deviation of the mean?
95
If there are 103 scores the median is equal to the _____ranked score.
52nd
Which of the following is used as a method for Correlation?
Pearson r
The major outcome of correlation.
Prediction
On an examination given to 1000 students, Jef’s score of 80 was higher than the score of
480 students who took the exam. What is the percentile for Jef’s score?
48th
A survey of 100 consumers said that the price charged for a kilo of rice could be
approximated by a normal distribution with a mean of 35 and a standard deviation of 4.How
many of them lie between 27 and 43?
95
It lists the percent of data in a distribution.
relative frequency distribution
The equation of the _______line predicts the value of Y given X.
Regression


It refers to the degree of relationship between two variables?


Correlation
What increases data volume?
Velocity
The middle-most value in a ranked list of numbers.
Median
The difference between the highest and lowest value.
Range
What is the value of the mean if a score of 110 is 3 standard deviation above the mean?
95
A bell-shaped distribution that is symmetric about a vertical line?
Normal
A vegetable distributor knows that during the month of August, the weights of tomatoes
are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. What
percent of the tomatoes weigh less than 0.71 lb?
About 75 (z = (0.71 - 0.61) / 0.15 ≈ 0.67)
It expands available data enormously.
text mining
Which is NOT a value of r ?
1.02
A score of 50 lies 2 standard deviations above a mean of 30.What is the value of the
standard deviation?
10
The value of X in the regression equation Y= 1.24 X + 6.9 if Y=13.1 is
5
The following are elements in an analytic plan EXCEPT
graphs
The creation of a data product contains 3 components EXCEPT
Time
The quantification of data into information.
Datafication


If the standard deviation of a distribution is 3.5, the variance is


12.25
A perfect positive correlation coefficient is equal to
1
Who said that "The future is not google-able " ?
William Gibson
Which is NOT a correct correlation Coefficient?
1.2
In the equation of the regression line represented by Y= 1.24 X + 6.9 if X=2 then Y =?
9.38
He coined the term "data scientist"
DJ Patil
Which of the following is TRUE when a distribution is normal?
Mean=Median=Mode
The number that occurs most frequently is called________.
Mode
A survey of 100 consumers said that the price charged for a kilo of rice could be
approximated by a normal distribution with a mean of 35 and a standard deviation of 4.How
many are less than 39?
84
The method of correlation used for ranked score is ________.
Spearman rho
A data set in which every score occurs the same number of times is said to have
no mode
A bell-shaped distribution that is symmetric about a vertical line.
Normal
A positive z-score means that the score is
Higher than the mean
Example of a data product.
Google Maps
The area of the standard normal curve to the right of z=0.82 is _______.


0.206
As of 2014, there are _______ million tweets a day.
500
What range of values lie between 3 standard deviations above and below the mean if the
mean is 80 and the standard deviation is 3?
71-89
Data is NOT information unless we add_________.
Analytics
A vegetable distributor knows that during the month of August, the weights of tomatoes
are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. How
many can be expected to weigh more than 0.31 lb in a shipment of 6000 tomatoes?
5850 (0.31 lb is 2 standard deviations below the mean, so about 97.5% of 6000)
The score NOT easily affected by extreme values.
Median
A negative correlation exists when___________.
x increases y decreases
According to Hilary Mason, which is NOT a skill that a good data scientist must cultivate?
critical thinking
He is someone who asks interesting questions on formal and informal theory.
data scientist
Data involving two variables.
Bivariate
It partitions a ranked data into four equal groups.
Quartile
A graph that is used to indicate frequency distribution.
Histogram
The normal distribution with a mean of 0 and standard deviation of 1.
Standard
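
The regression and correlation items above can be worked through in code. This is a minimal sketch: the slope and intercept come from the quiz's line Y = 1.24 X + 6.9, while the function names and the sample lists `x`, `y_up` and `y_down` are hypothetical and chosen only for illustration.

```python
from math import sqrt

SLOPE, INTERCEPT = 1.24, 6.9  # regression line from the quiz: Y = 1.24 X + 6.9


def predict_y(x):
    """Predict Y from X using the regression line."""
    return SLOPE * x + INTERCEPT


def solve_x(y):
    """Invert the regression line to find the X that gives a target Y."""
    return (y - INTERCEPT) / SLOPE


def pearson_r(xs, ys):
    """Pearson correlation coefficient; the result always lies between -1 and 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    spread_x = sum((a - mean_x) ** 2 for a in xs)
    spread_y = sum((b - mean_y) ** 2 for b in ys)
    return covariation / sqrt(spread_x * spread_y)


# Hypothetical data: y_up rises with x, y_down falls as x rises.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]
y_down = [10, 8, 6, 4, 2]

print(round(predict_y(2), 2))      # Y when X = 2
print(round(solve_x(13.1), 2))     # X when Y = 13.1
print(pearson_r(x, y_up), pearson_r(x, y_down))
```

Because r is a ratio bounded by -1 and 1, values such as 1.02 or 1.2 cannot be correlation coefficients, and a negative r corresponds to y decreasing as x increases.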


ENGR DATA Analysis Midterm Quiz 2

System Analysis, Design and Development (AMA Computer University)


Midterm Quiz 2: Attempt review http://trimestral.amaesonline.com/2213A/mod/quiz/review.php?attemp...

Started on Thursday, 10 November 2022, 11:31 AM


State Finished
Completed on Thursday, 10 November 2022, 11:40 AM
Time taken 8 mins 15 secs
Marks 20.00/20.00
Grade 100.00 out of 100.00

Question 1

Correct

Mark 1.00 out of 1.00

What range of values lies within 3 SD below and above the mean in a normal distribution if the mean is 10 and the standard deviation is 2?

Select one:
a. 10-14

b. 5-15

c. 4-16

d. 8-14

Your answer is correct.

Question 2

Correct

Mark 1.00 out of 1.00

Lists the percent of data in each distribution.

Select one:
a. relative frequency distribution
b. grouped frequency distribution

c. ogive

d. histogram

Your answer is correct.

Question 3

Correct

Mark 1.00 out of 1.00

Which of the following is TRUE when a distribution is normal?

Select one:

a. Mean > Median >Mode

b. Mean=Median=Mode

c. Mean < Median <Mode

d. Mean >Mode >Median

Your answer is correct.

Question 4

Correct

Mark 1.00 out of 1.00

A bell shaped curve that is symmetric about a vertical line.

Select one:
a. normal distribution

b. kurtic

c. standard distribution

d. skewed

Your answer is correct.

Question 5

Correct

Mark 1.00 out of 1.00

A bell-shaped distribution that is symmetric about a vertical line?

Select one:
a. symmetric

b. skewed

c. standard

d. normal

Your answer is correct.

Question 6

Correct

Mark 1.00 out of 1.00

A frequency distribution in which large data sets are displayed.

Select one:


a. Relative frequency distribution

b. histogram

c. Grouped frequency distribution

d. ogive

Your answer is correct.

Question 7

Correct

Mark 1.00 out of 1.00

A graph used to indicate intervals in a frequency distribution is referred to as a ______________.

Select one:
a. bar graph

b. histogram

c. pie graph

d. ogive

Your answer is correct.

Question 8

Correct

Mark 1.00 out of 1.00

A survey of 100 consumers said that the price charged for a kilo of rice could be approximated by a normal distribution with a mean of 35 and a
standard deviation of 4.How many are less than 39?

Select one:
a. 80

b. 84

c. 82

d. 78

Your answer is correct.

Question 9

Correct

Mark 1.00 out of 1.00

What is the value of the standard deviation in a standard normal distribution?


Select one:
a. 5

b. 0

c. 1

d. 2

Your answer is correct.

Question 10

Correct

Mark 1.00 out of 1.00

A survey of 100 consumers said that the price charged for a kilo of rice could be approximated by a normal distribution with a mean of 35 and a
standard deviation of 4.How many of them lie between 27 and 43?

Select one:
a. 92

b. 95

c. 90

d. 88

Your answer is correct.

Question 11

Correct

Mark 1.00 out of 1.00

What is the mean for a standard normal distribution?

Select one:
a. 5

b. 0

c. 2

d. 1

Your answer is correct.

Question 12

Correct

Mark 1.00 out of 1.00


The normal distribution with a mean of 0 and standard deviation of 1.

Select one:
a. Skewed

b. kurtic

c. Standard

d. skewed to the right

Your answer is correct.

Question 13

Correct

Mark 1.00 out of 1.00

A score of 50 lies 2 standard deviations above a mean of 30.What is the value of the standard deviation?

Select one:
a. 10

b. 25

c. 20

d. 15

Your answer is correct.

Question 14

Correct

Mark 1.00 out of 1.00

Empirical rule for a normal distribution lie ______% of data with 1 standard deviation below and above the mean.

Select one:
a. 68

b. 64

c. 75

d. 79

Your answer is correct.

Question 15

Correct

Mark 1.00 out of 1.00


The area of the standard normal curve to the right of z=0.82 is _______.

Select one:
a. 0.295

b. 209

c. 0.294

d. 0.206

Your answer is correct.

Question 16

Correct

Mark 1.00 out of 1.00

Empirical rule for a normal distribution that is 2 standard deviations above and below the mean is ________% of data.

Select one:
a. 80

b. 90

c. 95

d. 85

Your answer is correct.

Question 17

Correct

Mark 1.00 out of 1.00

What is the value of the mean if a score of 110 is 3 standard deviation above the mean?

Select one:
a. 90

b. 91

c. 95

d. 85

Your answer is correct.

Question 18

Correct

Mark 1.00 out of 1.00


Empirical rule for a normal distribution that is 3 standard deviations above and below the mean covers ______% of the data.

Select one:
a. 95

b. 98

c. 99.7

d. 92

Your answer is correct.

Question 19

Correct

Mark 1.00 out of 1.00

What range of values lie between 3 standard deviations above and below the mean if the mean is 80 and the standard deviation is 3?

Select one:
a. 72-89

b. 71-88

c. 71-89

d. 70-89

Your answer is correct.

Question 20

Correct

Mark 1.00 out of 1.00

What percent of data will lie within 2 standard deviation of the mean?

Select one:
a. 95

b. 99

c. 68

d. 90

Your answer is correct.


Exam 2019, questions and answers

Bachelor of Science in Computer Science (AMA Computer University)


Which is NOT a characteristic feature of data structure?


Select one:
a. It contains component data
b. Set of operations is on one or more component items.
c. It defines as to how components relate to each other.

d. It contains a fixed structure.


Feedback

Your answer is correct.


The correct answer is: It contains a fixed structure.

Question 2
Correct
Mark 1.00 out of 1.00

What is the correct meaning of ADT?
Select one:
a. Adequate Data Tautology
b. Abstract Data Topography
c. Abstract Data Type

d. Adequate Data Type


Feedback

Your answer is correct.


The correct answer is: Abstract Data Type

Question 3
Correct
Mark 1.00 out of 1.00

What is an organized collection of information and the set of operations used to manage that information?
Select one:
a. data science
b. ADT
c. data structure

d. ML
Feedback

Your answer is correct.


The correct answer is: ADT

Question 4
Correct
Mark 1.00 out of 1.00

_______________ is a data structure in which every component has a unique predecessor and successor.
Select one:
a. dynamic
b. nonlinear
c. static

d. linear
Feedback

Your answer is correct.


The correct answer is: linear

Question 5
Correct
Mark 1.00 out of 1.00

An array is a good example of a _________ data structure.
Select one:
a. linear
b. dynamic
c. nonlinear

d. static
Feedback

Your answer is correct.


The correct answer is: static

Question 6
Correct
Mark 1.00 out of 1.00

ML means:
Select one:
a. Machine Landscaping
b. Math Learning
c. Mobile Learning

d. Machine Learning
Feedback

Your answer is correct.


The correct answer is: Machine Learning


Question 7
Correct
Mark 1.00 out of 1.00

What is the focus of data science?
Select one:
a. manipulate data efficiently and effectively
b. organization of data
c. collection of data

d. statistical computation
Feedback

Your answer is correct.


The correct answer is: manipulate data efficiently and effectively

Question 8
Correct
Mark 1.00 out of 1.00

What is a data structure that has a fixed size?
Select one:
a. linear
b. static
c. nonlinear

d. dynamic
Feedback


Your answer is correct.


The correct answer is: static

Question 9
Correct
Mark 1.00 out of 1.00

What is the earlier name for data science?
Select one:
a. dataology
b. datalogy
c. datology

d. datatology
Feedback

Your answer is correct.


The correct answer is: datalogy

Question 10
Correct
Mark 1.00 out of 1.00

It refers to a data structure that grows and shrinks at execution time.
Select one:
a. dynamic
b. nonlinear
c. static


d. linear
Feedback

Your answer is correct.


The correct answer is: dynamic
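
The static and dynamic distinctions in this quiz can be illustrated with a short Python sketch. The `FixedArray` class is a hypothetical teaching aid, not a standard library type; it assumes that "static" means the capacity is fixed at creation, while a built-in `list` stands in for a dynamic structure that grows and shrinks at execution time.

```python
class FixedArray:
    """A static, linear structure: its capacity is set once and never changes."""

    def __init__(self, capacity):
        self._slots = [None] * capacity

    def __len__(self):
        return len(self._slots)

    def __getitem__(self, index):
        return self._slots[index]

    def __setitem__(self, index, value):
        # Raises IndexError when index falls outside the fixed capacity.
        self._slots[index] = value


# Static: three slots, no more, no fewer.
scores = FixedArray(3)
scores[0] = 90
scores[1] = 85

# Dynamic: a list grows and shrinks while the program runs.
queue = []
queue.append("a")
queue.append("b")
queue.pop(0)  # removes "a"; the list is now one element long

# Linear: every element except the first and last has exactly one
# predecessor (index - 1) and one successor (index + 1).
print(len(scores), queue)
```

Both structures here are linear; a tree or graph, where a component can have several successors, would be a nonlinear one.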



Introduction
Data Analysis - is the process of inspecting, cleansing, transforming and modelling data with the goal of
discovering useful information, informing conclusions and supporting decision-making. It is the process of
evaluating data using analytical and statistical tools to discover useful information and aid in business decision
making.

Methods used:
1. Data Mining - is a method of data analysis for discovering patterns in large data sets using methods from
statistics, artificial intelligence, machine learning and databases. The goal is to transform raw data into
understandable business information. This might include identifying groups of data records (known as
cluster analysis) or identifying anomalies and dependencies between data groups.
2. Text analytics - is the process of deriving useful information from text. It is accomplished by processing
unstructured textual information, extracting meaningful numerical indices from it and making
the information available to statistical and machine learning algorithms for further processing.
3. Business Intelligence - transforms data into actionable intelligence for business purposes and may be
used in an organization's strategic and tactical business decision making. It offers a way for people to
examine trends in collected data and derive insights from them.
4. Data Visualization - refers very simply to the visual representation of data. In the context of data analysis,
it means using the tools of statistics, probability, pivot tables and other artifacts to present data visually.
It makes complex data more understandable and usable.
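
As a small illustration of the text analytics step described above (turning unstructured text into numerical indices), the following Python sketch counts how often each word occurs. The function name `term_frequencies` and the sample sentence are hypothetical; real text analytics tools apply far more processing than this.

```python
import re
from collections import Counter


def term_frequencies(text):
    """Turn unstructured text into word counts, one simple numerical index."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical customer comment used only for illustration.
comment = "Great rice, great price. The rice was fresh."
counts = term_frequencies(comment)

print(counts.most_common(2))
```

Counts like these can then be passed to statistical or machine learning algorithms, which is the "further processing" the notes refer to.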

Data Mining
7 most important data mining techniques:

1. Tracking patterns
2. Classification (predictive)
3. Association (descriptive)
4. Outlier detection
5. Clustering (descriptive)
6. Regression (predictive)
7. Prediction
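
One of the techniques listed above, outlier detection, can be sketched with z-scores. This is only one possible approach and is an assumption of this example, not the method prescribed by the course; the function name, the threshold of 2 standard deviations and the `daily_sales` figures are all hypothetical.

```python
from statistics import mean, stdev


def z_score_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` sample standard deviations from the mean."""
    centre = mean(values)
    spread = stdev(values)
    return [v for v in values if abs(v - centre) / spread > threshold]


# Hypothetical daily sales figures with one unusual day.
daily_sales = [10, 12, 11, 13, 12, 11, 95]

print(z_score_outliers(daily_sales))
```

In small samples a single extreme value inflates the standard deviation, so the threshold usually needs to be chosen with the sample size in mind.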

Data Mining tools


1. Rapid Miner is one of the best predictive-analysis systems, developed by the company of the same name.
It is written in the Java programming language. It provides an integrated environment for deep learning, text
mining, machine learning and predictive analysis. Rapid Miner offers the server both on premises and in
public/private cloud infrastructures. It has a client/server model as its base. It is rated as the number one
business analytics software.
It consists of three modules:
• Rapid miner studio - for workflow design and prototyping.
• Rapid miner server - to operate predictive data models created in the studio.
• Rapid miner radoop - executes processes directly in Hadoop cluster to simplify predictive analysis.
2. Orange is a perfect software suite for machine learning and data mining. It is particularly strong at data
visualization and is a component-based software. It is written in Python.
As it is a component-based software, the components of Orange are called "widgets". These widgets
range from data visualization and pre-processing to the evaluation of algorithms.
3. Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be
applied directly to a dataset or called from your own Java code. The tool is very sophisticated and used
in many different applications including visualization and algorithms for data analysis and predictive
modelling.
4. Knime is primarily used for data preprocessing: data extraction, transformation and loading. It is a
powerful tool with a GUI that shows the network of data nodes. It is popular amongst financial data analysts.
5. R-programming is primarily written in C and Fortran, and a lot of its modules are written in R itself. It's
a free programming language and software environment for statistical computing and graphics.
It supports nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering and others.

Midterm Quiz 1 Data Analysis

Data Analysis (AMA Computer University)


Question 1
Correct
Mark 1.00 out of 1.00
The person who said, “The future is not google-able”.
Select one:

a.
Eric Schmidth

b.
D J Patil

c.
William Gibson

d.
Georg cantor
Feedback
Your answer is correct.

Question 2
Correct
Mark 1.00 out of 1.00
The following are the 3V's of big data EXCEPT
Select one:

a.
volume

b.
variety

c.
veracity


d.
velocity
Feedback
Your answer is correct.

Question 3
Correct
Mark 1.00 out of 1.00
These are the data skills that a good data scientist needs to cultivate EXCEPT
Select one:

a.
speaking

b.
coding

c.
Communication

d.
Math and Stats
Feedback
Your answer is correct.

Question 4
Correct
Mark 1.00 out of 1.00
He pointed out that until 2003, all of mankind had generated just 5 exabytes of data.
Select one:

a.
Eric Smidth

b.


Eric Schmidt

c.
Eric Smith

d.
Eric Smicht
Feedback
Your answer is correct.

Question 5
Correct
Mark 1.00 out of 1.00
It refers to well based theories and sound business judgement.
Select one:

a.
Data visualization

b.
Data Mining

c.
Data Analytics

d.
Data Science
Feedback
Your answer is correct.

Question 6
Correct
Mark 1.00 out of 1.00
How many bytes of data are generated every two days in today's world?
Select one:

a.
5 exabytes

b.
5 terabytes

c.
5 gigabytes

d.
5 megabytes
Feedback
Your answer is correct.

Question 7
Correct
Mark 1.00 out of 1.00
Which is NOT interaction data?
Select one:

a.
browser action

b.
RFID data

c.
data base

d.
geo-location
Feedback
Your answer is correct.

Question 8
Correct
Mark 1.00 out of 1.00

What is a great example of a data product?
Select one:

a.
google maps

b.
google drive

c.
google navigation

d.
google navigation
Feedback
Your answer is correct.

Question 9
Correct
Mark 1.00 out of 1.00
The creation of data from varied sources and its quantification into information.
Select one:

a.
datalization

b.
dataology

c.
Datafication

d.


datology
Feedback
Your answer is correct.

Question 10
Correct
Mark 1.00 out of 1.00
He said, “In mathematics the art of proposing a question must be held of higher value than solving it”.
Select one:

a.
Eric Schmidt

b.
William Gibson

c.
Francis Galton

d.
Georg Cantor
Feedback
Your answer is correct.

Question 11
Correct
Mark 1.00 out of 1.00
It shows a high correlation between the incidence of flu and searches about flu on Google.
Select one:

a.
Google Flu trends

b.


Google Flu Reactions

c.
Google Flu Searches

d.
Google Flu Viral
Feedback
Your answer is correct.

Question 12
Incorrect
Mark 0.00 out of 1.00
The explosion of _______ data is the main reason why every 2 days 5 exabytes of data are generated.
Select one:

a.
transaction

b.
reaction

c.
gargantuan

d.
interaction
Feedback
Your answer is incorrect.

Question 13
Correct
Mark 1.00 out of 1.00
PAW means ____________.
Select one:

a.
Predictive Analytics web

b.
Predictive Analytics World

c.
Preliminary Assumption Web

d.
Predicting Analytics Web
Feedback
Your answer is correct.

Question 14
Correct
Mark 1.00 out of 1.00
Exabyte means ________ bytes
Select one:

a.
thousand thousand

b.
billion billion

c.
trillion trillion

d.
million million
Feedback
Your answer is correct.

Question 15
Correct
Mark 1.00 out of 1.00

It expands available data enormously since there is so much more text being generated than numbers.
Select one:

a.
data ranking

b.
text analysis

c.
data mining

d.
Text mining
Feedback
Your answer is correct.

Question 16
Correct
Mark 1.00 out of 1.00
The creation of data from varied sources and its quantification into information.
Select one:

a.
datafitration

b.
datafication

c.
datacation

d.
datafition


Feedback
Your answer is correct.

Question 17
Correct
Mark 1.00 out of 1.00
IOT means
Select one:

a.
Interconnection of things

b.
Internet of time

c.
Interaction of time

d.
Internet of things
Feedback
Your answer is correct.

Question 18
Correct
Mark 1.00 out of 1.00
A new phenomenon for the explosion of _________ data
Select one:

a.
interaction

b.
transaction

c.


communication

d.
transient
Feedback
Your answer is correct.

Question 19
Correct
Mark 1.00 out of 1.00
“All models are wrong but some are useful.”
Select one:

a.
William Gibson

b.
DJ Patil

c.
George E. P. Box

d.
Georg cantor
Feedback
Your answer is correct.

Question 20
Correct
Mark 1.00 out of 1.00
The developer of FarmVille, a famous game on the internet.
Select one:

a.
Moontoon


b.
Electronic Arts

c.
Supercell

d.
Zynga Incorporated
Feedback
Your answer is correct.


Midterm Quiz 2 Sauce Data Analysis

Data Analysis (AMA Computer University)


11/13/23, 11:53 AM Midterm Quiz 2: Attempt review

Started on Monday, 13 November 2023, 11:27 AM


State Finished
Completed on Monday, 13 November 2023, 11:48 AM
Time taken 21 mins 21 secs
Marks 20.00/20.00
Grade 100.00 out of 100.00

Question 1
Correct

Mark 1.00 out of 1.00

In 2,4,4,4,5,5,6,8,9 the range is

Select one:

a. 7 ✓

b. 3

c. 5

d. 6

Your answer is correct.

Question 2
Correct

Mark 1.00 out of 1.00

Which is NOT a measure of variability?

Select one:

a. range

b. quartile ✓

c. variance

d. standard deviation

Your answer is correct.

https://trimestral.amaesonline.com/2313A/mod/quiz/review.php?attempt=213520&cmid=21506 1/10

Question 3

Correct

Mark 1.00 out of 1.00

Which of the following statements is TRUE?

Select one:

a. Q2=Mean

b. Q2=Mode

c. Q2=median ✓

d. Q2=Range

Your answer is correct.

Question 4
Correct

Mark 1.00 out of 1.00

The number that occurs most frequently is called________.

Select one:

a. Mode ✓

b. median

c. mean

d. range

Your answer is correct.


Question 5

Correct

Mark 1.00 out of 1.00

A positive z-score means that the score is

Select one:

a. One standard deviation higher than the mean

b. Higher than the mean ✓

c. Lower than the mean

d. Equal to the mean

Your answer is correct.

Question 6
Correct

Mark 1.00 out of 1.00

What is value of quartile 3 in 2,4,4,4,5,5,6,8,9 ?

Select one:

a. 5

b. 6

c. 7 ✓

d. 8

Your answer is correct.


Question 7

Correct

Mark 1.00 out of 1.00

A score of 3 in 2,4,4,4,5,5,6,8,9 is

Select one:

a. 1.2 above the mean

b. 1.02 below the mean ✓

c. 1.92 above the mean

d. 1.18 below the mean

Your answer is correct.

Question 8
Correct

Mark 1.00 out of 1.00

The most frequent score.

Select one:

a. median

b. standard deviation

c. mode ✓

d. mean

Your answer is correct.


Question 9

Correct

Mark 1.00 out of 1.00

Which is not a measure of central tendency?

Select one:

a. mean

b. mode

c. median

d. standard deviation ✓

Your answer is correct.

Question 10
Correct

Mark 1.00 out of 1.00

If in a distribution all scores are distinct then_____________.

Select one:

a. it is normal

b. the mean is higher than the mode

c. it is skewed.

d. there is no mode. ✓

Your answer is correct.


Question 11

Correct

Mark 1.00 out of 1.00

A distribution with 4 modes is said to be a _________distribution.

Select one:

a. bimodal

b. unimodal

c. trimodal

d. multimodal ✓

Your answer is correct.

Question 12
Correct

Mark 1.00 out of 1.00

The score NOT easily affected by extreme values.

Select one:

a. mode

b. Median ✓

c. mean

d. range

Your answer is correct.


Question 13

Correct

Mark 1.00 out of 1.00

On an examination given to 1000 students, Jef’s score of 80 was higher than the score of 480 students who took the exam. What is
the percentile for Jef’s score?

Select one:

a. 48th ✓

b. 65th

c. 50th

d. 60th

Your answer is correct.

Question 14
Correct

Mark 1.00 out of 1.00

The score easily affected by extreme values is the _________.

Select one:

a. median

b. Mean ✓

c. mode

d. range

Your answer is correct.


Question 15

Correct

Mark 1.00 out of 1.00

If there are 101 scores the median is equal to the _____ranked score.

Select one:

a. 54th

b. 55th

c. 52nd

d. 51st ✓

Your answer is correct.

Question 16
Correct

Mark 1.00 out of 1.00

The standard deviation for the data in 2,4,4,4,5,5,6,8,9

Select one:

a. 2.16

b. 2.17 ✓

c. 2.15

d. 2.18

Your answer is correct.


Question 17

Correct

Mark 1.00 out of 1.00

If the standard deviation of a distribution is 3, the variance is

Select one:

a. 1.41

b. 9 ✓

c. 6

d. 1.5

Your answer is correct.

Question 18
Correct

Mark 1.00 out of 1.00

Another term for variability.

Select one:

a. center

b. mean

c. frequent

d. dispersion ✓

Your answer is correct.


Question 19

Correct

Mark 1.00 out of 1.00

The distribution 2,4,4,4,5,5,6,8,9 is said to be

Select one:

a. unimodal ✓

b. multimodal

c. bimodal

d. trimodal

Your answer is correct.

Question 20
Correct

Mark 1.00 out of 1.00

Which is NOT a measure of central tendency?

Select one:

a. mean

b. median

c. mode

d. quartile ✓

Your answer is correct.


Module 1 Introduction

Course Learning Outcomes:


1. To define data analysis.
2. To identify the different methods used in data analysis.
3. To be able to explain each method in data analysis.

Data analysis
Data analysis - the process of inspecting, cleansing, transforming, and modelling data
with the goal of discovering useful information, informing conclusions, and supporting
decision-making. It is the process of evaluating data using analytical and statistical tools
to discover useful information and aid in business decision making.
Methods used:
1. Data Mining
2. Text Analytics
3. Business Intelligence
4. Data Visualization
Data mining - a method of data analysis for discovering patterns in large data sets using methods from statistics,
artificial intelligence, machine learning, and databases. The goal is to transform raw data into understandable
business information. This might include identifying groups of data records (known as cluster analysis) or
identifying anomalies and dependencies between data groups.
Text analytics - the process of deriving useful information from text. It is accomplished by processing
unstructured textual information, extracting meaningful numerical indices from it, and making the
information available to statistical and machine learning algorithms for further processing.
Business intelligence - transforms data into actionable intelligence for business purposes and may be used in an
organization's strategic and tactical business decision making. It offers a way for people to examine trends in
collected data and derive insights from them.

Data visualization - refers very simply to the visual representation of data. In the context of data analysis, it means
using the tools of statistics, probability, pivot tables, and other artifacts to present data visually. It makes complex
data more understandable and usable.
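
As a rough illustration of how text analytics turns unstructured text into numerical indices, the sketch below counts word frequencies in a short passage. The sample sentence and the helper name `word_frequencies` are assumptions made for this example only; a real text analytics pipeline would usually do much more (for instance, removing common words).

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercase word to how often it occurs."""
    # Split the unstructured text into lowercase word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical sample text, chosen only for illustration.
sample = "Data is not information. Data becomes information when analytics is added."
frequencies = word_frequencies(sample)

# The counts are numerical indices that a statistical model could consume.
for word, count in frequencies.most_common(3):
    print(word, count)
```

The resulting counts could then be passed to a statistical or machine learning algorithm, which is the "further processing" step the definition mentions.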
Data Science
Data science is the science of learning from data. The sciences are focusing on answering specific
questions about the world while data science is focusing on how to manipulate data efficiently and
effectively.

The primary focus is not which questions to ask of the data but how we can answer them, whatever they
may be. It is more like computer science and mathematics than it is like natural sciences, in this way. It isn’t
so much about studying the natural world as it is about how to compute data efficiently. Included in data
science is the design of experiments. With the right data, we can address the questions we are interested in.
With a poor design of experiments or a poor choice of which data we gather, this can be difficult. Study
design might be the most important aspect of data science. In this module the focus is on the analysis of
data, once gathered.

Computer science is also mainly the study of computations—as is hinted at in the name—but is a bit
broader in this focus. Although datalogy, an earlier name for data science, was also suggested for
computer science, and for example in Denmark it is the name for computer science, using the name
“computer science” puts the focus on computation while using the name “data science” puts the focus on
data. But of course, the fields overlap.

If you are writing a sorting algorithm, are you then focusing on the computation or the data? Is that even
a meaningful question to ask? There is a huge overlap between computer science and data science and
naturally the skill sets you need overlap as well. To efficiently manipulate data, you need the tools for
doing that, so computer programming skills are a must and some knowledge about algorithms and data
structures usually is as well.

For data science, though, the focus is always on the data. In a data analysis project, the focus is on how
the data flows from its raw form through various manipulations until it is summarized in some useful form.
Although the difference can be subtle, the focus is not on what operations a program does during the
analysis, but on how the data flows and is transformed, what purpose those changes serve, and
how they help us gain knowledge about the data. It is as much about deciding what to do with the data as
it is about how to do it efficiently.

Statistics is of course also closely related to data science. So closely linked, in fact, that many consider data
science just a fancy word for statistics that looks slightly more modern and sexier. I can’t say that I strongly
disagree with this—data science does sound sexier than statistics—but just as data science is slightly
different from computer science, data science is also slightly different from statistics. Just, perhaps,
somewhat less different than computer science is. A large part of doing statistics is building mathematical
models for your data and fitting the models to the data to learn about the data in this way. That is also
what we do in data science. As long as the focus is on the data, I am happy to call statistics data science.

If the focus changes to the models and the mathematics, then we are drifting away from data science into
something else—just as if the focus changes from the data to computations we are drifting from data
science to computer science.

Data science is also related to machine learning and artificial intelligence, and again there are huge
overlaps. Perhaps not surprising since something like machine learning has its home both in computer
science and in statistics; if it is focusing on data analysis, it is also at home in data science. To be honest, it
has never been clear to me when a mathematical model changes from being a plain old statistical model
to becoming machine learning anyway.

Abstract Data Type (ADT)

- set of data values and associated operations that are precisely specified independent of any
implementation.

- organized collection of information and a set of operations used to manage that information

- set of operations defines the interface of the ADT

It satisfies the following conditions:

1. The representation or definition of the type and the operations are contained in a single syntactic unit.

2. The representation of objects of the type is hidden from the program units that use the type, so the only
direct operations possible on those objects are those provided in the type's definition.
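
The two conditions above can be sketched with a small stack ADT. This is a minimal illustration, not a definitive implementation: the class name `Stack` and its method names are assumptions chosen for the example, and Python hides the representation only by convention (the leading underscore).

```python
class Stack:
    """A stack ADT: callers use only push, pop, peek, and is_empty."""

    def __init__(self):
        # Hidden representation; by convention callers do not touch it.
        self._items = []

    def push(self, value):
        """Place a value on top of the stack."""
        self._items.append(value)

    def pop(self):
        """Remove and return the most recently pushed value."""
        if not self._items:
            raise IndexError("pop from an empty stack")
        return self._items.pop()

    def peek(self):
        """Return the top value without removing it."""
        if not self._items:
            raise IndexError("peek at an empty stack")
        return self._items[-1]

    def is_empty(self):
        return not self._items


stack = Stack()
stack.push(1)
stack.push(2)
top = stack.pop()  # removes the most recently pushed value
```

The type and its operations live in one syntactic unit (the class), and the list used for storage could be swapped for another representation without changing any code that uses the four operations.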

Data Structures

Characteristic features are:

1. It contains component data items, which may be atomic or another data structure (still a domain)

2. A set of operations on one or more of the component items.

3. It defines rules for how components relate to each other and to the structure as a whole (assertions)

Types:

1. Static data structure-has a fixed size

Ex. arrays

2. Dynamic data structure-grows and shrinks at execution time as required by its contents. It is
implemented using links.

3. Linear data structure -every component has a unique predecessor and successor except first and last
elements.

4. Non-linear data structure - no such restriction applies; elements may be arranged in any desired
fashion, limited only by the way such types are represented.
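
To make the dynamic and linear categories concrete, here is a minimal sketch of a singly linked list, which grows and shrinks at execution time and is implemented using links. The names `Node`, `LinkedList`, `append`, `remove_first`, and `to_list` are assumptions for this example.

```python
class Node:
    """One component of the list: a value plus a link to its successor."""

    def __init__(self, value):
        self.value = value
        self.next = None


class LinkedList:
    """A dynamic, linear data structure built from linked nodes."""

    def __init__(self):
        self.head = None
        self.size = 0

    def append(self, value):
        """Grow the structure by linking a new node at the end."""
        node = Node(value)
        if self.head is None:
            self.head = node
        else:
            current = self.head
            while current.next is not None:
                current = current.next
            current.next = node
        self.size += 1

    def remove_first(self):
        """Shrink the structure by unlinking the first node."""
        if self.head is None:
            raise IndexError("remove from an empty list")
        value = self.head.value
        self.head = self.head.next
        self.size -= 1
        return value

    def to_list(self):
        """Walk the links from first to last and collect the values."""
        values = []
        current = self.head
        while current is not None:
            values.append(current.value)
            current = current.next
        return values


numbers = LinkedList()
for value in (10, 20, 30):
    numbers.append(value)
removed = numbers.remove_first()
```

Every node except the first and last has exactly one predecessor and one successor, which is what makes the structure linear.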
Module 3-Mathematical Preliminaries

At the end of this module, you are expected to:


1. Define sets and relations
2. Perform operations on matrices.
3. Identify the inverse of a matrix.

Sets and Relations


set - a collection of objects referred to as elements. The elements making up a set are assumed to be distinct.

empty set (also null set or void set) - a set with no elements, denoted by { }

equal sets - sets having identical elements

disjoint sets - sets having no common elements

Cartesian product of two sets X and Y - the set of all ordered pairs (x, y) with x in X and y in Y

Ex. X = {1, 2}, Y = {a, b}; then X x Y = {(1,a), (1,b), (2,a), (2,b)}

Relation - a rule by which elements of a first set relate to elements of a second set

A binary relation R from X to Y is a subset of the Cartesian product X x Y

Domain - the set of all x in X that appear as the first element of some pair in R

Range - the set of all y in Y that appear as the second element of some pair in R

Ex. Let X = {2, 3, 4}, Y = {3, 4, 5, 6, 7}

(x, y) is an element of R if x divides y

R = {(2,4), (2,6), (3,3), (3,6), (4,4)}

Domain is {2, 3, 4}

Range is {3, 4, 6}
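
The Cartesian product and the "x divides y" relation above can be checked with a short sketch. The variable names are assumptions for this example; `itertools.product` is the standard-library function that generates ordered pairs.

```python
from itertools import product

# Cartesian product of X = {1, 2} and Y = {a, b}.
X = {1, 2}
Y = {"a", "b"}
cartesian = set(product(X, Y))

# Binary relation R: (x, y) is in R if x divides y.
X2 = {2, 3, 4}
Y2 = {3, 4, 5, 6, 7}
R = {(x, y) for x, y in product(X2, Y2) if y % x == 0}

# Domain and range are read off the pairs that actually occur in R.
domain = {x for x, _ in R}
range_ = {y for _, y in R}

print(sorted(R))
print(sorted(domain), sorted(range_))
```

Note that 5 and 7 belong to Y but not to the range, because no element of X divides them.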

Sequence - a special type of function in which the domain is a set of consecutive integers

A string over X, where X is a finite set, is a finite sequence of elements from X

Ex. X = {a, b, c}; then a string may be baac or acab. Order is taken into account.

Repetitions in a string can be specified by superscripts; for example, the string bbaaac may be written
b^2a^3c

null string - the string with no elements, indicated by λ

X^* - denotes the set of all strings over X, including the null string

X^+ - denotes the set of all non-null strings over X

The length of a string α, denoted by |α|, is the number of elements in α

Ex. If α = aabab and β = a^3b^4a^32, then

|α| = 5 and |β| = 39

The string consisting of α followed by β, written αβ, is called their concatenation

A string β is a substring of the string α if there are strings γ and δ with α = γβδ
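
The string definitions above map directly onto ordinary Python strings. The sketch below is only an illustration; the variable names and the helper `is_substring` are assumptions for this example.

```python
# alpha = aabab and beta = a^3 b^4 a^32, written with string repetition.
alpha = "aabab"
beta = "a" * 3 + "b" * 4 + "a" * 32

# Length is the number of elements in the string.
length_alpha = len(alpha)
length_beta = len(beta)

# Concatenation: alpha followed by beta.
concatenation = alpha + beta

# The null string has length 0 and leaves any string unchanged.
null_string = ""


def is_substring(candidate, text):
    """True if text = gamma + candidate + delta for some strings gamma, delta."""
    return candidate in text


print(length_alpha, length_beta, len(concatenation))
```

The length of a concatenation is the sum of the two lengths, so here it is 5 + 39 = 44.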

Vector Algebra
matrix - a rectangular array of data, represented by a capital letter. If A is a matrix with m rows
and n columns, its size is written m x n. It is enclosed in either parentheses or brackets.

Operations:

Addition and subtraction of matrices: Possible only if the matrices are of the same
size. Addition and subtraction are done by adding or subtracting corresponding entries.

Scalar multiplication: Obtained by multiplying each entry by a fixed number.

Multiplication of matrices: To multiply two matrices, the number of columns of the first must be
equal to the number of rows of the second. A 3x2 matrix multiplied by a 2x3 matrix yields a 3x3 matrix.

Transpose of a matrix: Obtained by interchanging the rows and columns. It is written A^T.

Matrix raised to an exponent p: M^p is the product of p copies of the square matrix M.

Inverse of a matrix: It exists if and only if the matrix is invertible. For a 2x2 matrix with rows (a, b)
and (c, d), this means ad-bc is not equal to 0. The inverse does not exist if the matrix is NOT invertible.
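
The matrix operations above can be sketched in plain Python with matrices stored as lists of rows. The function names and the sample matrices are assumptions made for this example; exact fractions are used for the 2x2 inverse to avoid rounding.

```python
from fractions import Fraction


def add(a, b):
    """Add two matrices of the same size entry by entry."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def scalar_multiply(k, a):
    """Multiply every entry of a by the fixed number k."""
    return [[k * x for x in row] for row in a]


def transpose(a):
    """Interchange rows and columns."""
    return [list(column) for column in zip(*a)]


def multiply(a, b):
    """Matrix product; columns of a must equal rows of b."""
    if len(a[0]) != len(b):
        raise ValueError("columns of the first must equal rows of the second")
    return [[sum(x * y for x, y in zip(row, column)) for column in zip(*b)]
            for row in a]


def power(m, p):
    """Product of p copies of the square matrix m (p >= 1)."""
    result = m
    for _ in range(p - 1):
        result = multiply(result, m)
    return result


def inverse_2x2(m):
    """Inverse of [[a, b], [c, d]] with integer entries; needs ad - bc != 0."""
    (a, b), (c, d) = m
    determinant = a * d - b * c
    if determinant == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(d, determinant), Fraction(-b, determinant)],
            [Fraction(-c, determinant), Fraction(a, determinant)]]


A = [[1, 2], [3, 4], [5, 6]]   # 3x2
B = [[1, 0, 2], [0, 1, 3]]     # 2x3
AB = multiply(A, B)            # 3x3
M = [[4, 7], [2, 6]]           # ad - bc = 10, so M is invertible
M_inverse = inverse_2x2(M)
```

Multiplying `M` by `M_inverse` gives the identity matrix, which is a quick way to check an inverse.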

References and Supplementary Materials


Books and Journals
1. Sanjiv Ranjan Das; 2016; Data Science: Theories, Models, Algorithms, and Analytics; S. R. Das
2. Richard Johnsonbaugh; 2005; Introduction to Discrete Mathematics; Pearson Education South Asia Pacific

Math 6200 / Data Analysis

Module 4-Algorithm Analysis

At the end of this module, you are expected to:


1. Define algorithm analysis.
2. Distinguish the best-case, worst-case, and average-case times of an algorithm.
3. Identify the cost models for different algorithms.

Algorithm Analysis- refers to the process of deriving estimates for the time and space needed to
execute the algorithm

Time needed to execute an algorithm is a function of the input.

best-case time-minimum time to execute the algorithm

worst-case time-maximum time needed to execute the algorithm

average-case time-average time to execute the algorithm
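
The three cases can be illustrated by counting comparisons in a linear search. This is a sketch under stated assumptions: the step count stands in for time, the target is assumed to be in the list, and every position is assumed equally likely. The function name `linear_search_steps` is hypothetical.

```python
def linear_search_steps(items, target):
    """Return how many comparisons a linear search makes to find target."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            return steps
    return steps  # target absent: every item was compared


items = list(range(1, 11))  # a list of n = 10 entries

# Search for each entry once and record the cost of every search.
counts = [linear_search_steps(items, target) for target in items]

best_case = min(counts)                    # target is the first entry
worst_case = max(counts)                   # target is the last entry
average_case = sum(counts) / len(counts)   # mean over all positions

print(best_case, worst_case, average_case)
```

For a list of n entries this gives 1, n, and (n + 1)/2 comparisons, showing that the time is a function of the input.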

Analysis of algorithms

Figure (not reproduced): For looking up a given entry in a given ordered list, both the binary and the linear search
algorithm (which ignores ordering) can be used. The analysis shows that they take at most about log2(n) + 1 and
n check steps, respectively, for a list of length n. In the example list of length 33, searching for
"Morin, Arthur" takes 5 steps with binary search and 28 steps with linear search.

Figure (not reproduced): Graphs of functions commonly used in the analysis of algorithms, showing the number of
operations N versus input size n for each function.

In computer science, the analysis of algorithms is the process of finding the computational complexity of
algorithms – the amount of time, storage, or other resources needed to execute them. Usually, this
involves determining a function that relates the length of an algorithm's input to the number of steps it
takes (its time complexity) or the number of storage locations it uses (its space complexity). An algorithm
is said to be efficient when this function's values are small, or grow slowly compared to a growth in the
size of the input. Different inputs of the same length may cause the algorithm to have different behavior,
so best, worst and average case descriptions might all be of practical interest. When not otherwise
specified, the function describing the performance of an algorithm is usually an upper bound, determined
from the worst case inputs to the algorithm.

The term "analysis of algorithms" was coined by Donald Knuth.[1] Algorithm analysis is an important part of
a broader computational complexity theory, which provides theoretical estimates for the resources needed
by any algorithm which solves a given computational problem. These estimates provide an insight into
reasonable directions of search for efficient algorithms.

In theoretical analysis of algorithms it is common to estimate their complexity in the asymptotic sense, i.e.,
to estimate the complexity function for arbitrarily large input. Big O notation, Big-omega
notation and Big-theta notation are used to this end. For instance, binary search is said to run in a number
of steps proportional to the logarithm of the length of the sorted list being searched, or in O(log(n)),
colloquially "in logarithmic time". Usually asymptotic estimates are used because
different implementations of the same algorithm may differ in efficiency. However the efficiencies of any
two "reasonable" implementations of a given algorithm are related by a constant multiplicative factor
called a hidden constant.

Exact (not asymptotic) measures of efficiency can sometimes be computed but they usually require certain
assumptions concerning the particular implementation of the algorithm, called model of computation. A
model of computation may be defined in terms of an abstract computer, e.g., Turing machine, and/or by
postulating that certain operations are executed in unit time. For example, if the sorted list to which we
apply binary search has n elements, and we can guarantee that each lookup of an element in the list can
be done in unit time, then at most log2(n) + 1 time units are needed to return an answer.
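
The bound of at most log2(n) + 1 lookups can be checked with a step-counting binary search. This is a minimal sketch assuming each lookup costs one unit of time; the function name `binary_search_steps` is hypothetical.

```python
import math


def binary_search_steps(sorted_items, target):
    """Return (index, lookups); index is -1 if target is absent."""
    low, high = 0, len(sorted_items) - 1
    steps = 0
    while low <= high:
        mid = (low + high) // 2
        steps += 1  # one unit-time lookup of sorted_items[mid]
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1, steps


n = 33
items = list(range(n))

# Worst case over every entry that is present in the list.
worst = max(binary_search_steps(items, target)[1] for target in items)
bound = math.floor(math.log2(n)) + 1

print(worst, bound)
```

Each lookup halves the remaining range, which is why the number of lookups grows with the logarithm of the list length rather than with the length itself.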

Cost models

Time efficiency estimates depend on what we define to be a step. For the analysis to correspond usefully
to the actual execution time, the time required to perform a step must be guaranteed to be bounded
above by a constant. One must be careful here; for instance, some analyses count an addition of two
numbers as one step. This assumption may not be warranted in certain contexts. For example, if the
numbers involved in a computation may be arbitrarily large, the time required by a single addition can no
longer be assumed to be constant.

Two cost models are generally used:[2][3][4][5][6]


- the uniform cost model, also called uniform-cost measurement (and similar variations), assigns a
constant cost to every machine operation, regardless of the size of the numbers involved
- the logarithmic cost model, also called logarithmic-cost measurement (and similar variations),
assigns a cost to every machine operation proportional to the number of bits involved

The latter is more cumbersome to use, so it's only employed when necessary, for example in the analysis
of arbitrary-precision arithmetic algorithms, like those used in cryptography.
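
The difference between the two cost models shows up when numbers grow quickly. The sketch below squares a number repeatedly and charges each multiplication under both models. The charging scheme (one unit per operation versus the bit length of the operand) is a simplifying assumption for illustration, and the function name is hypothetical.

```python
def repeated_squaring_costs(k):
    """Square x = 2 a total of k times; return (uniform, logarithmic) cost."""
    x = 2
    uniform_cost = 0
    logarithmic_cost = 0
    for _ in range(k):
        uniform_cost += 1                    # constant cost per operation
        logarithmic_cost += x.bit_length()   # cost proportional to bits involved
        x = x * x
    return uniform_cost, logarithmic_cost


for k in (5, 10, 15):
    print(k, repeated_squaring_costs(k))
```

Under the uniform model the cost is just k, while under the logarithmic model it roughly doubles with every extra squaring, because the operand's bit length doubles each time.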

A key point which is often overlooked is that published lower bounds for problems are often given for a
model of computation that is more restricted than the set of operations that you could use in practice and
therefore there are algorithms that are faster than what would naively be thought possible.[7]

Run-time analysis
Run-time analysis is a theoretical classification that estimates and anticipates the increase in running
time (or run-time) of an algorithm as its input size (usually denoted as n) increases. Run-time efficiency is a
topic of great interest in computer science: A program can take seconds, hours, or even years to finish
executing, depending on which algorithm it implements. While software profiling techniques can be used
to measure an algorithm's run-time in practice, they cannot provide timing data for all infinitely many
possible inputs; the latter can only be achieved by the theoretical methods of run-time analysis.

Shortcomings of empirical metrics

Since algorithms are platform-independent (i.e. a given algorithm can be implemented in an
arbitrary programming language on an arbitrary computer running an arbitrary operating system), there
are additional significant drawbacks to using an empirical approach to gauge the comparative
performance of a given set of algorithms.

Take as an example a program that looks up a specific entry in a sorted list of size n. Suppose this
program were implemented on Computer A, a state-of-the-art machine, using a linear search algorithm,
and on Computer B, a much slower machine, using a binary search algorithm. Benchmark testing on the
two computers running their respective programs might look something like the following:

n (list size)    Computer A run-time    Computer B run-time
                 (in nanoseconds)       (in nanoseconds)
16               8                      100,000
63               32                     150,000
250              125                    200,000
1,000            500                    250,000
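
The benchmark figures in the table are consistent with a linear growth rate for Computer A and a logarithmic one for Computer B. The sketch below fits two simple models to the table; the constants 0.5 and 25,000 are assumptions read off the table values, not measurements, and the function names are hypothetical.

```python
import math


def model_a_ns(n):
    # Assumed fit: linear search on the fast machine, about 0.5 ns per entry.
    return 0.5 * n


def model_b_ns(n):
    # Assumed fit: binary search on the slow machine, about 25,000 ns per halving.
    return 25_000 * math.log2(n)


# Compare the two models on the table sizes and on one much larger list.
for n in (16, 63, 250, 1_000, 2**20):
    print(n, round(model_a_ns(n)), round(model_b_ns(n)))
```

Under these assumed models, Computer A is far ahead on the small lists in the table, yet by about a million entries the logarithmic growth of binary search lets the slower Computer B overtake it. This is why a few small benchmarks can give a misleading picture of comparative performance.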

References and Supplementary Materials
Books and Journals
1. Sanjiv Ranjan Das; 2016; Data Science: Theories, Models, Algorithms, and Analytics; S. R. Das
2. Richard Johnsonbaugh; 2005; Introduction to Discrete Mathematics; Pearson Education South Asia Pacific
Module 7-Statistical Computations

At the end of this module, you are expected to:


1. Compute for the mean, median and mode.
2. Compute for measures of variability.
3. Interpret the statistical measures obtained.

A. Measures of central tendency

Mean - the sum of the numbers divided by n

mean = Σx / n

Ex. 92, 84, 65, 76, 88, 90    mean = 495/6 = 82.5

Median - of a ranked list of n numbers is

- the middle number if n is odd

- the mean of the two middle numbers if n is even

Ex. 1, 4, 8, 9, 12, 14, 21    n = 7 (odd)    Median = 9
Ex. 23, 46, 77, 89, 92, 108   n = 6 (even)   Median = (77 + 89)/2 = 83

Mode - the number that occurs most frequently

Ex. 18, 15, 21, 16, 15, 14, 15, 21    Mode = 15

unimodal - one mode
bimodal - two modes
trimodal - three modes
multimodal - many modes
If every score appears only once, there is NO mode.
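
The worked examples above can be reproduced with Python's standard `statistics` module. This is a small sketch; the variable names are assumptions for this example. `statistics.multimode` returns every most-frequent value, so a result with one entry corresponds to a unimodal list.

```python
import statistics

scores = [92, 84, 65, 76, 88, 90]
mean = statistics.mean(scores)

# The median functions rank the data internally before picking the middle.
median_odd = statistics.median([1, 4, 8, 9, 12, 14, 21])
median_even = statistics.median([23, 46, 77, 89, 92, 108])

# multimode lists all values tied for the highest frequency.
modes = statistics.multimode([18, 15, 21, 16, 15, 14, 15, 21])

print(mean, median_odd, median_even, modes)
```

One caveat: when every value appears exactly once, `multimode` returns all of the values, whereas these notes treat that situation as having no mode.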

B. Measures of Dispersion/Spread/Variation

Range - the difference between the highest and lowest values

Standard deviation - a measure of how much the individual values deviate from the mean

s = sqrt( Σ(x - mean)^2 / (n - 1) )    (sample)
σ = sqrt( Σ(x - mean)^2 / n )          (population)

Ex. 2, 4, 7, 12, 15    Mean = 8    s = 5.43

Variance - the square of the standard deviation

For the data above, variance = 29.5
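
The sample and population formulas above differ only in the divisor. The sketch below implements both directly so each step of the formula is visible; the function names are assumptions for this example.

```python
import math


def sum_of_squared_deviations(values):
    """Return the sum of (x - mean)^2 over all values."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


def sample_variance(values):
    # Sample formula: divide by n - 1.
    return sum_of_squared_deviations(values) / (len(values) - 1)


def population_variance(values):
    # Population formula: divide by n.
    return sum_of_squared_deviations(values) / len(values)


data = [2, 4, 7, 12, 15]
data_range = max(data) - min(data)
s = math.sqrt(sample_variance(data))
sigma = math.sqrt(population_variance(data))

print(data_range, round(s, 2), sample_variance(data), round(sigma, 2))
```

For this data the squared deviations sum to 118, so the sample variance is 118/4 = 29.5 and s is about 5.43, matching the example.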

C. Measures of Relative Position

Z-score - for a given data value x, the number of standard deviations
that x is above or below the mean of the data

z-score = (x - mean) / s
Ex. x = 950, s = 90, mean = 842
z-score = (950 - 842)/90 = 1.2

Percentile of score x = (number of data values less than x / total number of data values) x 100

Ex. On a reading examination given to 900 students, Elaine's score of 602
was higher than the scores of 576 of the students who took the
examination. What is the percentile for Elaine's score?

percentile = (576/900) x 100 = 64
Elaine's score places her at the 64th percentile.
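
Both formulas above are one-line computations. The sketch below reproduces the two worked examples; the function names `z_score` and `percentile` are assumptions for this example.

```python
def z_score(x, mean, s):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / s


def percentile(count_below, total):
    """Percent of data values that are less than the score of interest."""
    return 100 * count_below / total


z = z_score(950, 842, 90)
elaine = percentile(576, 900)

print(round(z, 2), round(elaine))
```

A positive z-score such as this one means the value lies above the mean; a negative one would mean it lies below.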

Quartiles - three values that partition a ranked data set into four equal groups.

Q1, Q2, Q3    Q2 = Median
25% of the data lies below Q1 and 75% lies below Q3.
Ex. 26 32 33 40 42 43 48    Q2 = median = 40    Q1 = 32    Q3 = 43
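
The quartile example can be reproduced by taking the median of the whole list and then the medians of its lower and upper halves. This sketch assumes the convention used in the example, where the middle value is left out of both halves when n is odd; other textbooks and software use slightly different conventions. The function names are hypothetical.

```python
def median(ranked):
    """Median of an already ranked (sorted) list."""
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        return ranked[middle]
    return (ranked[middle - 1] + ranked[middle]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using medians of the lower and upper halves."""
    ranked = sorted(values)
    n = len(ranked)
    lower_half = ranked[: n // 2]          # values below the median position
    upper_half = ranked[(n + 1) // 2 :]    # values above the median position
    return median(lower_half), median(ranked), median(upper_half)


q1, q2, q3 = quartiles([26, 32, 33, 40, 42, 43, 48])
print(q1, q2, q3)
```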


References and Supplementary Materials


Books and Journals
1. Sanjiv Ranjan Das; 2016; Data Science: Theories, Models, Algorithms, and Analytics; S. R. Das
2. Richard Johnsonbaugh; 2005; Introduction to Discrete Mathematics; Pearson Education South Asia Pacific
A vegetable distributor knows that during the month of August, the weights of tomatoes
are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. How
many can be expected to weigh between 0.31 and 0.91 lb in a shipment of 4500 tomatoes?

Select one:
a. 4100
b. 4000
c. 4275

d. 4215
Feedback

Your answer is correct.

Question 2
Incorrect
Mark 0.00 out of 1.00


Which is NOT a correct correlation Coefficient?

Select one:
a. 0.9
b. 0.56
c. -0.43

d. 1.2
Question 3
Correct
Mark 1.00 out of 1.00

A vegetable distributor knows that during the month of August, the weights of tomatoes
are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. What
percent of the tomatoes weigh less than 0.71 lb?

Select one:
a. 95
b. 97
c. 84

d. 85
Feedback

Your answer is correct.

Question 4
Correct
Mark 1.00 out of 1.00


Example of a data product.

Select one:
a. google search
b. google games
c. google drive

d. google map
Feedback

Your answer is correct.

Question 5
Correct
Mark 1.00 out of 1.00

A bell-shaped distribution that is symmetric about a vertical line.

Select one:
a. normal
b. kurtic
c. skewed

d. standard
Feedback

Your answer is correct.

Question 6
Correct
Mark 1.00 out of 1.00


A perfect positive correlation coefficient is equal to

Select one:
a. 0
b. -1
c. 1

d. 2
Feedback

Your answer is correct.

Question 7
Correct
Mark 1.00 out of 1.00


According to Hilary Mason, which is NOT a skill that a good data scientist must cultivate?

Select one:
a. critical thinking
b. communication
c. coding

d. math and stat


Feedback

Your answer is correct.

Question 8
Correct
Mark 1.00 out of 1.00


A graph that is used to indicate frequency distribution.

Select one:
a. ogive
b. bar graph
c. pie graph

d. histogram
Feedback

Your answer is correct.


Question 9
Correct
Mark 1.00 out of 1.00


What increases data volume?

Select one:
a. velocity
b. variety
c. vastness

d. viscosity
Feedback

Your answer is correct.

Question 10
Correct
Mark 1.00 out of 1.00


He coined the term "data scientist"

Select one:
a. J Pastor
b. G.Cantor
c. N.R. Drops

d. DJ Patil
Feedback
Your answer is correct.

Question 11
Correct
Mark 1.00 out of 1.00


As of 2014, there are _______ million tweets a day.

Select one:
a. 500
b. 200
c. 300

d. 400
Feedback

Your answer is correct.

Question 12
Correct
Mark 1.00 out of 1.00


The following are elements in an analytic plan EXCEPT

Select one:
a. analytic models
b. decision support tools
c. interlinked data output

d. graphs
Feedback

Your answer is correct.

Question 13
Correct
Mark 1.00 out of 1.00


The middle-most value in a ranked list of numbers.

Select one:
a. percentile
b. mode
c. median

d. mean
Feedback

Your answer is correct.

Question 14
Correct
Mark 1.00 out of 1.00


He is someone who asks interesting questions on formal and informal theory.

Select one:
a. data analyst
b. data expert
c. data scientist
d. data drive
Feedback

Your answer is correct.

Question 15
Correct
Mark 1.00 out of 1.00


The quantification of data into information.

Select one:
a. datafication
b. analytics
c. dataology

d. mining
Feedback

Your answer is correct.

Question 16
Correct
Mark 1.00 out of 1.00


It lists the percent of data in a distribution.

Select one:
a. percent distribution
b. relative distribution
c. frequency distribution

d. relative frequency distribution


Feedback

Your answer is correct.

Question 17
Correct
Mark 1.00 out of 1.00


Data is NOT information unless we add_________.

Select one:
a. depth
b. velocity
c. volume

d. analytics
Feedback

Your answer is correct.

Question 18
Correct
Mark 1.00 out of 1.00


It partitions a ranked data into four equal groups.

Select one:
a. mean
b. median
c. percentile

d. quartile
Feedback

Your answer is correct.

Question 19
Correct
Mark 1.00 out of 1.00


The creation of a data product contains 3 components EXCEPT

Select one:
a. time
b. process
c. technical expertise

d. data
Feedback

Your answer is correct.

Question 20
Correct
Mark 1.00 out of 1.00


Who said that "The future is not google-able " ?

Select one:
a. Dennis Grant
b. Roland Patil
c. William Gillason

d. Wiliam Harvey
Feedback

Your answer is correct.

Question 21
Correct
Mark 1.00 out of 1.00


The major outcome of correlation.

Select one:
a. prediction
b. interpretation
c. analysis

d. critical thinking
Feedback

Question 23
Correct
Mark 1.00 out of 1.00


The difference between the highest and lowest value.

Select one:
a. variance
b. deviation
c. range

d. mean
Feedback

Your answer is correct.

Question 24
Correct
Mark 1.00 out of 1.00


It expands available data enormously.

Select one:
a. text
b. text mining
c. volume

d. sorting
Feedback

Your answer is correct.


Module 6-Fundamentals of Data Science

At the end of this module, you are expected to:


1. Define data science.
2. Determine the V’s of data science.
3. Identify the meaning of datafication.

The Art of Data Science

“All models are wrong, but some are useful.”
George E. P. Box and N. R. Draper in “Empirical Model Building and Response
Surfaces,” John Wiley & Sons, New York, 1987.

So you want to be a “data scientist”? There is no widely accepted definition
of who a data scientist is.[1] Several books now attempt to define what data
science is and who a data scientist may be, see Patil (2011), Patil (2012),
and Loukides (2012). This book’s viewpoint is that a data scientist is someone
who asks unique, interesting questions of data based on formal or informal
theory, to generate rigorous and useful insights.[2] It is likely to be an
individual with multi-disciplinary training in computer science, business,
economics, statistics, and armed with the necessary quantity of domain
knowledge relevant to the question at hand. The potential of the field is
enormous, for just a few well-trained data scientists armed with big data have
the potential to transform organizations and societies. In the narrower domain
of business life, the role of the data scientist is to generate applicable
business intelligence.

[1] The term “data scientist” was coined by D.J. Patil. He was the Chief
Scientist for LinkedIn. In 2011 Forbes placed him second in their Data
Scientist List, just behind Larry Page of Google.
[2] To quote Georg Cantor - “In mathematics the art of proposing a question
must be held of higher value than solving it.”

Among all the new buzzwords in business – and there are many – “Big Data” is
one of the most often heard. The burgeoning social web, and the growing role
of the internet as the primary information channel of business, has generated
more data than we might imagine. Users upload an hour of video data to YouTube
every second.[3] 87% of the U.S. population has heard of Twitter, and 7% use
it.[4] Forty-nine percent of Twitter users follow some brand or the other,
hence the reach is enormous, and, as of 2014, there are more than 500 million
tweets a day. But data is not information, and until we add analytics, it is
just noise. And more, bigger, data may mean more noise and does not mean
better data. In many cases, less is more, and we need models as well. That is
what this book is about, it’s about theories and models, with or without data,
big or small. It’s about analytics and applications, and a scientific approach
to using data based on well-founded theory and sound business judgment. This
book is about the science and art of data analytics.

[3] Mayer-Schönberger and Cukier (2013), p8. They report that USC’s Martin
Hilbert calculated that more than 300 exabytes of data storage was being used
in 2007, an exabyte being one billion gigabytes, i.e., 10^18 bytes, and 2^60
of binary usage.
[4] In contrast, 88% of the population has heard of Facebook, and 41% use it.
See www.convinceandconvert.com/7-surprising-statistics-about-twitter-in-america/.
Half of Twitter users are white, and of the remaining half, half are black.

Data science is transforming business. Companies are using medical data and
claims data to offer incentivized health programs to employees. Caesar’s
Entertainment Corp. analyzed data for 65,000 employees and found substantial
cost savings. Zynga Inc, famous for its game Farmville, accumulates 25
terabytes of data every day and analyzes it to make choices about new game
features. UPS installed sensors to collect data on speed and location of its
vans, which combined with GPS information, reduced fuel usage in 2011 by 8.4
million gallons, and shaved 85 million miles off its routes.[5] McKinsey
argues that a successful data analytics plan contains three elements:
interlinked data inputs, analytics models, and decision-support tools.[6] In a
seminal paper, Halevy, Norvig and Pereira (2009), argue that even simple
theories and models, with big data, have the potential to do better than
complex models with less data.

[5] “How Big Data is Changing the Whole Equation for Business,” Wall Street
Journal, March 8, 2013.
[6] “Big Data: What’s Your Plan?” McKinsey Quarterly, March 2013.

In a recent talk[7] well-regarded data scientist Hilary Mason emphasized that
the creation of “data products” requires three components: data (of course)
plus technical expertise (machine-learning) plus people and process (talent).
Google Maps is a great example of a data product that epitomizes all these
three qualities. She mentioned three skills that good data scientists need to
cultivate: (a) in math and stats, (b) coding, (c) communication. I would add
that preceding all these is the ability to ask relevant questions, the answers
to which unlock value for companies, consumers, and society. Everything in
data analytics begins with a clear problem statement, and needs to be judged
with clear metrics.

[7] At the h2o world conference in the Bay Area, on 11th November 2015.

Being a data scientist is inherently interdisciplinary. Good questions come
from many disciplines, and the best answers are likely to come from people who
are interested in multiple fields, or at least from teams that co-mingle
varied skill sets. Josh Wills of Cloudera stated it well - “A data scientist
is a person who is better at statistics than any software engineer and better
at software engineering than any statistician.” In contrast, complementing
data scientists are business analytics people, who are more familiar with
business models and paradigms and can ask
4. 1.1 Volume, Velocity, Variety

There are several "V"s of big data: three of these are volume, velocity, and
variety. (This nomenclature was originated by the Gartner group in 2001, and
has been in place for more than a decade.) Big data exceeds the storage
capacity of conventional databases. This is its volume aspect. The scale of
data generation is mind-boggling. Google’s Eric Schmidt pointed out that
until 2003, all of humankind had generated just 5 exabytes of data (an
exabyte is 1000^6 bytes, or a billion billion bytes). Today we generate 5
exabytes of data every two days.
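
The arithmetic behind these figures can be checked in a few lines. The sketch below assumes decimal (SI) units and uses only the numbers quoted above; the variable names are illustrative.

```python
# Decimal (SI) units: an exabyte is 1000**6 bytes, i.e. 10**18 bytes.
EXABYTE = 1000 ** 6

# "A billion billion" bytes: 10**9 * 10**9 = 10**18.
assert EXABYTE == 10 ** 9 * 10 ** 9

# Figures quoted in the text (attributed to Eric Schmidt).
total_until_2003_eb = 5   # exabytes generated by humankind up to 2003
current_eb = 5            # exabytes now generated ...
current_period_days = 2   # ... every two days

# Implied generation rates under those figures.
daily_rate_eb = current_eb / current_period_days
yearly_rate_eb = daily_rate_eb * 365

# How many "all of history up to 2003" volumes are produced per year.
multiples_per_year = yearly_rate_eb / total_until_2003_eb

print(f"1 EB = {EXABYTE:,} bytes")
print(f"Implied rate: {daily_rate_eb} EB/day, {yearly_rate_eb} EB/year")
print(f"That is {multiples_per_year} times the pre-2003 total, every year")
```

Taken at face value, the quoted figures imply that each year now produces well over a hundred times the data generated in all of history up to 2003.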
The main reason for this is the explosion of “interaction” data, a new
phenomenon in contrast to mere “transaction” data. Interaction data
comes from recording activities in our day-to-day ever more digital lives,
such as browser activity, geo-location data, RFID data, sensors, personal
digital recorders such as the fitbit and phones, satellites, etc. We now live
in the “internet of things” (or IoT), and it’s producing a wild quantity of
data, all of which we seem to have an endless need to analyze. In some
quarters it is better to speak of 4 Vs of big data, as shown in Figure 1.1.
Figure 1.1: The Four Vs of Big Data.

A good data scientist will be adept at managing volume not just technically
in a database sense, but by building algorithms to make intelligent use of
the size of the data as efficiently as possible.
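
Sample size changes how correlations should be read. As a minimal sketch (the correlation value 0.01 and the sample sizes are hypothetical choices, and the normal approximation to the t distribution is a simplifying assumption that is reasonable for large n), the snippet below computes the two-sided p-value for the same tiny correlation at increasing sample sizes.

```python
import math


def correlation_p_value(r: float, n: int) -> float:
    """Two-sided p-value for the null hypothesis of zero correlation.

    Uses the test statistic t = r * sqrt((n - 2) / (1 - r^2)) and,
    as a simplifying assumption, treats t as standard normal, which
    is a close approximation for the large n used here.
    """
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    # For a standard normal Z, P(|Z| > |t|) = erfc(|t| / sqrt(2)).
    return math.erfc(abs(t) / math.sqrt(2.0))


r = 0.01  # a practically negligible correlation (hypothetical value)
for n in (1_000, 100_000, 10_000_000):
    print(f"n = {n:>10,}  p-value = {correlation_p_value(r, n):.3g}")
```

The same negligible correlation moves from clearly insignificant to overwhelmingly significant as n grows, which is why statistical significance alone says little about practical importance, or about causality, in very large data sets.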
Things change when you have gargantuan data because almost all
correlations become significant, and one might be tempted to draw
spurious conclusions about causality. For many modern business
applications today extraction of correlation is sufficient, but good data
science involves techniques that extract causality from these correlations
as well. In many cases, detecting correlations is useful as is. For example,
consider the classic case of Google Flu Trends (see Figure 1.2). The figure
shows the high correlation between flu incidence and searches about “flu”
on Google; see Ginsberg et al. (2009). Obviously searches on the key word
“flu” do not result in the flu itself! Of course, the incidence of searches on
this key word is influenced by flu outbreaks. The interesting point here is
that even though searches about flu do not cause flu, they correlate with
it, and may at times even be predictive of it, simply because searches lead
the actual reported levels of flu, as those may occur concurrently but take
time to be reported. And whereas searches may be predictive, the cause of
searches is the flu itself, one variable feeding on the other, in a repeat
cycle. (Interwoven time series such as these may be modeled using Vector
AutoRegressions, a technique we will encounter later in this book.) Hence,
prediction is a major outcome of correlation, and has led to the recent buzz
around the subfield of “predictive analytics.” There are entire conventions
devoted to this facet of correlation, such as the wildly popular PAW
(Predictive Analytics World). (May be a futile collection of people, with
non-working crystal balls, as William Gibson said: “The future is not
google-able.”) Pattern recognition is in, passé causality is out.
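
The idea that searches can lead reported flu levels can be illustrated with a simple lagged correlation. This is only a sketch on synthetic data: the two series, the two-week delay, and the function names are made up for illustration, and a plain lagged Pearson correlation is used here rather than the Vector AutoRegression approach mentioned above.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def lagged_correlation(leader, follower, lag):
    """Correlation between leader[t] and follower[t + lag], for lag >= 0."""
    if lag == 0:
        return pearson(leader, follower)
    return pearson(leader[:-lag], follower[lag:])


# Synthetic weekly series (not real flu data): a seasonal cycle plus a
# faster wiggle, with "reported" cases echoing "searches" two weeks later.
TRUE_LAG = 2
weeks = range(104)
searches = [50 + 40 * math.sin(2 * math.pi * t / 52) + 5 * math.sin(t)
            for t in weeks]
reported = [searches[max(t - TRUE_LAG, 0)] for t in weeks]

# Scan candidate lags and keep the one with the highest correlation.
correlations = {k: lagged_correlation(searches, reported, k) for k in range(7)}
best_lag = max(correlations, key=correlations.get)

for k, c in correlations.items():
    print(f"lag {k}: correlation {c:.4f}")
print("best lag:", best_lag)
```

On this constructed example the correlation peaks at the built-in delay, which is the sense in which one series "leads" the other. With real data the peak is noisier, and a high lagged correlation still says nothing about which variable causes which.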
Figure 1.2: Google Flu Trends. The figure shows the high correlation
between flu incidence and searches about “flu” on Google. The orange
line is actual US flu activity, and the blue line is the Google Flu Trends
estimate.

Data velocity is accelerating. Streams of tweets, Facebook
entries, financial information, etc., are being generated by more users at
an ever increasing pace. Whereas velocity increases data volume, often
exponentially, it might shorten the window of data retention or
application. For example, high-frequency trading relies on micro-second
information and streams of data, but the relevance of the data rapidly
decays. Finally, data variety is much greater than
ever before. Models that relied on just a handful of variables can now avail
of hundreds of variables, as computing power has increased. The scale of
change in volume, velocity, and variety of the data that is now available
calls for new econometrics, and a range of tools for even single questions.
This book aims to introduce the reader to a variety of modeling concepts
and econometric techniques that are essential for a well-rounded data
scientist. Data science is more than the mere analysis of large data sets. It
is also about the creation of data. The field of “text-mining” expands
available data enormously, since there is so much more text being
generated than numbers. The creation of data from varied sources, and its
quantification into information is known as “datafication.”

1.2 Machine Learning

Data science is also more than “machine learning


References and Supplementary Materials


Books and Journals
1. Sanjiv Ranjan Das; 2016; Data Science: Theories, Models, Algorithms, and Analytics;
S. R. Das
2. Richard Johnsonbaugh; 2005; Introduction to Discrete Mathematics; Pearson
Education South Asia Pacific
