CC Unit - 4 Imp Questions
Unit – 4
Important questions
1. What is data preprocessing? Explain the steps involved in data preprocessing.
Data preprocessing is the process of transforming raw data into an understandable format. The quality of the data should be checked before applying machine learning or data mining algorithms.
Steps Involved in Data Preprocessing:
• Data Cleaning: The data can have many irrelevant and missing parts. Cleaning handles missing values, noisy data, and inconsistencies.
• Data Transformation: This step transforms the data into forms appropriate for the mining process, for example through normalization, aggregation, or discretization.
• Data Reduction: Since data mining handles huge amounts of data, this step obtains a reduced representation of the dataset that is much smaller in volume yet produces the same (or nearly the same) analytical results.
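As a minimal sketch of these three steps, assuming a small pandas DataFrame (the column names and values here are invented for illustration):

```python
import pandas as pd

# Toy data with a missing value and an irrelevant column (hypothetical example)
df = pd.DataFrame({
    "amount": [100.0, 250.0, None, 80.0, 9000.0],
    "age": [25, 41, 33, 29, 52],
    "row_id": [1, 2, 3, 4, 5],          # irrelevant for mining
})

# 1. Data cleaning: drop the irrelevant column, impute the missing amount
df = df.drop(columns=["row_id"])
df["amount"] = df["amount"].fillna(df["amount"].median())

# 2. Data transformation: min-max normalization into [0, 1]
df["amount_norm"] = (df["amount"] - df["amount"].min()) / (df["amount"].max() - df["amount"].min())

# 3. Data reduction: keep a representative random sample of the rows
reduced = df.sample(frac=0.6, random_state=42)
print(reduced)
```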
2. What is sampling? Explain the types of data elements
Sampling is the process of selecting a representative subset of historical data on which to build an analytical model, rather than processing the full dataset.
Big data encompasses a wide variety of data types, including the following:
• structured data, such as transactions and financial records;
• unstructured data, such as text, documents, and multimedia files; and
• semi-structured data, such as web server logs and streaming data from sensors.
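A minimal sketch of simple random sampling versus stratified sampling with pandas (the column names and values are hypothetical):

```python
import pandas as pd

# Hypothetical transaction data with a class label
df = pd.DataFrame({
    "amount": [10, 200, 35, 4000, 60, 75, 120, 9000],
    "fraud":  [0,  0,   0,  1,    0,  0,  0,   1],
})

# Simple random sample: 50% of the rows, chosen uniformly
simple = df.sample(frac=0.5, random_state=0)

# Stratified sample: 50% within each class, preserving the fraud rate
stratified = df.groupby("fraud", group_keys=False).sample(frac=0.5, random_state=0)
print(stratified)
```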
3. Explain outlier detection and treatment
An outlier is a piece of data or observation that deviates drastically from the norm or average of the data set. Outlier detection, therefore, is the process of detecting, and subsequently treating or excluding, outliers in a given set of data.
Ways to deal with outliers in data:
• Set up a filter in your testing tool. Even though this has a small cost, filtering out outliers is worth it.
• Remove or change outliers during post-test analysis.
• Change the value of outliers, e.g., cap them at a chosen threshold.
• Consider the underlying distribution.
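As one concrete detection-and-treatment combination, here is a minimal sketch using the interquartile-range (IQR) rule with capping; the 1.5×IQR fences are a common convention, not taken from the source:

```python
import numpy as np

values = np.array([12.0, 15.0, 14.0, 13.0, 250.0, 16.0, 11.0])  # 250 looks extreme

# Detection: flag points outside the Tukey fences q1 - 1.5*IQR and q3 + 1.5*IQR
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (values < lower) | (values > upper)

# Treatment: cap (truncate) outliers at the fence values instead of removing them
capped = np.clip(values, lower, upper)
print(values[is_outlier], capped)
```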
• Missing value - A missing value occurs when there is no data value for a variable in an observation. The phenomenon of missing values is universal in clinical research involving big data.
• Categorization - Categorization is the exercise of creating meaningful categories for a
particular variable/feature in the dataset, for better inference/understanding.
• Weight of evidence (WOE) coding - The weight of evidence approach means that you use a combination of information from several independent sources to give sufficient evidence to fulfil an information requirement. In analytics, WOE coding recodes each category of a variable as the log ratio of its share of good outcomes to its share of bad outcomes, measuring the category's predictive power (see the sketch after this list).
• Variable selection - Variable selection is the process of testing a collection of candidate model variables for significance during model training and retaining the significant ones. Candidate model variables are also known as independent variables, predictors, attributes, model factors, covariates, regressors, features, or characteristics.
• Segmentation - Segmentation refers to splitting data into groups according to your company's needs, in order to refine your analyses based on a defined context. In concrete terms, a segment enables you to filter your analyses based on certain elements (single or combined).
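A minimal sketch of weight of evidence coding for one categorical variable, assuming a binary good/bad target (the category counts below are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical counts of good and bad outcomes per category
tbl = pd.DataFrame({
    "category": ["A", "B", "C"],
    "good":     [400, 300, 100],
    "bad":      [20,  50,  80],
})

# Distribution of goods and bads across the categories
dist_good = tbl["good"] / tbl["good"].sum()
dist_bad = tbl["bad"] / tbl["bad"].sum()

# WOE per category: ln(% of goods / % of bads)
tbl["woe"] = np.log(dist_good / dist_bad)

# Information value summarizes the variable's overall predictive power
iv = ((dist_good - dist_bad) * tbl["woe"]).sum()
print(tbl, f"\nIV = {iv:.3f}")
```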
6. Explain predictive analytics briefly
Predictive analytics is the use of data, statistical algorithms and machine learning techniques to
identify the likelihood of future outcomes based on historical data. The goal is to go beyond
knowing what has happened to providing a best assessment of what will happen in the future.
Though predictive analytics has been around for decades, it's a technology whose time has
come. More and more organizations are turning to predictive analytics to increase their bottom
line and competitive advantage. Why now?
• Growing volumes and types of data, and more interest in using data to produce valuable
insights.
• Faster, cheaper computers.
• Easier-to-use software.
• Tougher economic conditions and a need for competitive differentiation.
With interactive and easy-to-use software becoming more prevalent, predictive analytics is no
longer just the domain of mathematicians and statisticians. Business analysts and line-of-
business experts are using these technologies as well.
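A minimal sketch of the core idea, fitting a model on historical data and then scoring the likelihood of a future outcome; scikit-learn is assumed to be available, and the feature values are invented:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Historical data: [transaction amount, hour of day] -> fraud yes/no (invented)
X_hist = np.array([[20, 14], [3500, 3], [45, 10], [5000, 2], [60, 16], [4200, 4]])
y_hist = np.array([0, 1, 0, 1, 0, 1])

# Learn the relationship between past inputs and past outcomes
model = LogisticRegression(max_iter=1000).fit(X_hist, y_hist)

# Score a new, unseen transaction: estimated probability it will be fraudulent
new_tx = np.array([[3900, 1]])
print(model.predict_proba(new_tx)[0, 1])
```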
7. Explain descriptive analytics briefly
Descriptive analytics is the interpretation of historical data to better understand changes that
have occurred in a business. Descriptive analytics describes the use of a range of historic data to
draw comparisons. Most commonly reported financial metrics are a product of descriptive
analytics, for example, year-over-year pricing changes, month-over-month sales growth, the
number of users, or the total revenue per subscriber. These measures all describe what has
occurred in a business during a set period.
Key takeaways:
• Using a range of historical data and benchmarking, decision-makers obtain a holistic view of performance and trends on which to base business strategy.
• Descriptive analytics can help to identify areas of strength and weakness in an organization.
• Descriptive analytics is now being used in conjunction with newer analytics, such as predictive and prescriptive analytics.
In its simplest form, descriptive analytics answers the question, "What happened?"
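A minimal sketch of computing one such descriptive metric, month-over-month sales growth, with pandas (the sales figures are invented):

```python
import pandas as pd

# Hypothetical monthly sales figures
sales = pd.Series(
    [120_000, 126_000, 118_000, 131_000],
    index=pd.period_range("2023-01", periods=4, freq="M"),
)

# Month-over-month growth: percentage change versus the previous month
mom_growth = sales.pct_change() * 100
print(mom_growth.round(1))
```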
• Linear regression - Linear regression analysis is used to predict the value of a variable
based on the value of another variable. The variable you want to predict is called the
dependent variable. The variable you are using to predict the other variable's value is
called the independent variable.
• Decision tree - A Decision Tree is an algorithm used for supervised learning problems such as
classification or regression. A decision tree or a classification tree is a tree in which each internal
(nonleaf) node is labeled with an input feature.
• Neural networks - A neural network is a series of algorithms that endeavors to recognize
underlying relationships in a set of data through a process that mimics the way the human brain
operates. In this sense, neural networks refer to systems of neurons, either organic or artificial
in nature.
• Association rule - Association rules are created by searching data for frequent if-then patterns and using the criteria support and confidence to identify the most important relationships (a sketch of support and confidence follows this list).
• Sequence rule - A sequence rule consists of a previous sequence in the rule body that
leads to a consecutive item set in the rule head. The consecutive item set occurs after a
particular period of time. Sequence rules, sequences, and item sets have various
characteristics.
• Social network learning - Social networking big data is a collection of extremely large data sets with great diversity in social networks. Social networking big data is also a core component of social influence analysis and security analysis.
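As a minimal sketch of the support and confidence criteria for one if-then rule, using invented market-basket transactions:

```python
# Hypothetical transactions (market-basket style)
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

# Rule to evaluate: "if bread, then milk"
antecedent, consequent = {"bread"}, {"milk"}
n = len(transactions)

both = sum(1 for t in transactions if antecedent | consequent <= t)
ante = sum(1 for t in transactions if antecedent <= t)

support = both / n        # how often the whole rule occurs in the data
confidence = both / ante  # how often the rule holds when its body occurs
print(f"support={support:.2f}, confidence={confidence:.2f}")
```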
10. Explain relational neighbor classification
A relational model is based on the idea that the behavior of connected nodes is correlated, meaning that connected nodes have a propensity to belong to the same class (homophily). The relational neighbor classifier, in particular, predicts a node's class based on its neighboring nodes and adjacent edges.
In the money-mule example, the transfers dataset consists of transactions between different accounts, and the account info data indicates which of these accounts are money mules.
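A minimal sketch of the relational neighbor classifier as a weighted majority vote over a node's labeled neighbors; the transfer graph and labels below are invented:

```python
from collections import defaultdict

# Hypothetical transfer graph: account -> list of (neighbor, edge weight)
graph = {
    "acct1": [("acct2", 3.0), ("acct3", 1.0)],
    "acct2": [("acct1", 3.0), ("acct4", 2.0)],
    "acct3": [("acct1", 1.0)],
    "acct4": [("acct2", 2.0)],
}

# Known labels from the account info data (1 = money mule, 0 = legitimate)
labels = {"acct2": 1, "acct3": 0, "acct4": 1}

def relational_neighbor(node):
    """Predict a node's class as the weight-dominant class among its labeled neighbors."""
    score = defaultdict(float)
    for neighbor, weight in graph[node]:
        if neighbor in labels:
            score[labels[neighbor]] += weight
    return max(score, key=score.get)

print(relational_neighbor("acct1"))  # acct2 (mule, weight 3) outweighs acct3 (legit, weight 1)
```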