1st Internal Solved
Q. No. | Answers | Marks
1a Describe data, web data and Big Data. Explain the 3Vs characteristics of Big Data. 7M
Definitions of Data
Data is information, usually in the form of facts or statistics that one can analyze or use
for further calculations.
Data is information that can be stored and used by a computer program.
Data is information presented in numbers, letters, or other form.
Data is information from a series of observations, measurements or facts.
Data is information from a series of behavioral observations, measurements or facts.
Volume refers to the size of the data, i.e., the quantity of data generated and stored. Size is what qualifies the data as Big Data or not.
Velocity refers to the speed of generation of data. It is a measure of how fast the data generates and processes. To meet the demands and the challenges of processing Big Data, the velocity of generation of data plays a crucial role.
Variety: Big Data comprises a variety of data. Data is generated from multiple sources. The variety is due to the availability of a large number of heterogeneous platforms in the industry. Variety is an important characteristic that needs to be known for proper processing of data, and it helps in the effective use of data according to its format.
Veracity takes into account the quality of the data captured, i.e., whether the data is uncertain or imprecise.
4M
1b Define Big Data architecture. Draw five layers in architecture design and explain functions 8M
in each layer.
Big Data architecture is defined as: “Big Data architecture is the logical and/or physical layout/structure of how Big Data will be stored, accessed and managed within a Big Data or IT environment.”
2M
Architecture logically defines how a Big Data solution will work, the core components (hardware, database, software, storage) used, the flow of information, security and more.
Figure shows the logical layers and the functions which are considered in Big Data architecture. A data processing architecture consists of five layers.
2M
Figure: Design of logical layers in a data processing architecture and the functions in the layers.
Logical layer 1 (L1) is for identifying data sources, which are external, internal or both. L1 considers the following aspects in a design:
- Amount of data needed at the ingestion layer 2 (L2)
- Push from L1 or pull by L2, as per the mechanism for the usages
- Source data-types: database, files, web or service
- Source formats, i.e., semi-structured, unstructured or structured
The layer 2 (L2) is for data-ingestion. Ingestion is the process of obtaining and importing data for immediate use or transfer. L2 considers the following aspects:
- Obtaining and importing data using ELT (Extract, Load and Transform)
- Data pre-processing (validation, transformation or transcoding) requirements
- Data semantics (such as replace, append, aggregate, compact)
- Ingestion and ETL processes run either in batches or in real time; real time means the data is stored and used as generated, while batch processing uses discrete datasets at scheduled or periodic intervals of time (a sketch of a batch step follows this list)
4M
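A minimal sketch of a batch, ETL-style ingestion step at layer L2, in Python. The CSV field names and validation rules below are illustrative assumptions, not from the source:

import csv
import io

RAW_BATCH = """date,store_id,units_sold
2023-01-01,S1,120
2023-01-01,S2,-5
2023-01-02,S1,abc
2023-01-02,S2,98
"""

def extract(raw_text):
    """Extract: read raw CSV rows from the incoming batch."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def validate(row):
    """Pre-processing: keep only rows with a non-negative integer count."""
    try:
        return int(row["units_sold"]) >= 0
    except ValueError:
        return False

def transform(row):
    """Transform: transcode the count field from text to integer."""
    return {"date": row["date"],
            "store_id": row["store_id"],
            "units_sold": int(row["units_sold"])}

def load(rows):
    """Load: here we just collect rows; a real L3 layer would write to
    HDFS or a NoSQL store instead."""
    return rows

batch = load([transform(r) for r in extract(RAW_BATCH) if validate(r)])
print(batch)  # the two invalid rows (-5 and 'abc') are filtered out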
The L3 layer is for storage of data from the L2 layer. L3 considers the following aspects:
- Data storage type (historical or incremental), formats, compression, frequency of incoming data, querying patterns and data consumption requirements for L4 or L5
- Data storage using the Hadoop distributed file system or NoSQL data stores, such as HBase, Cassandra and MongoDB
The L4 layer is for data processing, using software such as MapReduce, Hive, Pig or Spark for analysis of the stored data. The L5 layer is for data consumption, where the processed data is used for analytics, visualizations, reports, export to datastores and business intelligence.
2b Explain data noise, outliers, data anomaly and duplicate data with examples. Why is filtering required during pre-processing? 5M
Noise
Noise in data refers to data giving additional meaningless information besides the true/actual information. Noise refers to the difference between the measured value and the true value, due to additional influences. The result of data analysis is adversely affected by noisy data.
Ex: Consider noise in wind velocity and direction readings. The velocity at certain instances will appear too high and at others too low. The direction at certain instances will appear inclined towards the north and at others towards the south.
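A minimal sketch of filtering noisy wind-velocity readings with a simple moving average; the readings below are made-up illustrative values:

def moving_average(values, window=3):
    """Smooth a series by averaging each point with its recent neighbours."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)
        smoothed.append(sum(values[lo:i + 1]) / (i + 1 - lo))
    return smoothed

velocity = [10.2, 10.4, 19.8, 10.3, 2.1, 10.5, 10.1]  # the spikes are noise
print(moving_average(velocity))  # smoothed series is closer to true values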
Outliers
Outliers refer to data which appear not to belong to the dataset. Outliers need to be removed from the dataset; else the result will be affected by a small or large amount. If valid data is identified as an outlier, then also the results will be affected. Outliers are a result of human data-entry errors or programming bugs.
Ex: In a student's grade-sheet, the result in one subject in the 4th semester shows 9.0/10 in place of the actual 3.0/10. Data 9.0 is an outlier. The student's semester grade point average (SGPA) will be erroneously declared, and the student may even be wrongly declared to have failed in that semester.
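A minimal sketch of flagging grade outliers, assuming grade points on a 0-10 scale; the simple rule (flag values far from the median) and the threshold are illustrative choices, not a prescribed method:

def find_outliers(grades, max_deviation=4.0):
    """Flag grades that deviate from the median by more than a threshold."""
    ordered = sorted(grades)
    median = ordered[len(ordered) // 2]
    return [g for g in grades if abs(g - median) > max_deviation]

semester_grades = [3.5, 2.8, 3.0, 9.0, 3.2]  # 9.0 entered in place of 3.0
print(find_outliers(semester_grades))        # [9.0] is flagged for review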
4M
Missing Values
A missing value implies data not appearing in the dataset.
Ex: Consider missing values in the sales figures of chocolates. The values were not sent for certain dates. This may be due to a failure of the power supply at the machine or network problems on specific days in a month. The chocolate sales not added for a day can be added to the next day's sales data; the effect on the average sales per day is then not significant. However, if the failure occurred on the last day of a month, then the analysis will be erroneous.
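A minimal sketch of handling a missing daily sales value, assuming a hypothetical day-to-sales mapping where None marks a missing report:

daily_sales = {1: 500, 2: None, 3: 980, 4: 470}  # day 2 report is missing;
                                                 # its sales arrived merged
                                                 # into day 3's figure

# The month total stays correct because the missing day's sales were
# carried into the next day's value.
month_total = sum(v for v in daily_sales.values() if v is not None)

# Average per reporting day; the missing value slightly biases this figure.
reported = [v for v in daily_sales.values() if v is not None]
print(month_total, sum(reported) / len(reported))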
Duplicate Values
A duplicate value implies the same data appearing two or more times in a dataset.
Ex: Consider duplicate values in the sales figures of chocolates. This may be due to some problem in the system. When duplicate values are sent and added, the sales result analysis gets affected. It can even result in false alarms to a service, which affects the supply chain.
Assume network problems at certain instances: the machine may not get an acknowledgement of the sales figures from the server, leading to the sales record being resent. The sales figures of chocolates then get recorded twice at that instance, and the chocolate sales data gets added twice in a specific day's sales data. The calculation of monthly sales data is adversely affected.
Filtering is therefore required during pre-processing: it removes noise, outliers and duplicates and accounts for missing values before analysis, so that results are not adversely affected and false alarms are avoided.
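A minimal sketch of de-duplicating resent sales records, assuming each record carries a hypothetical (store_id, timestamp) pair that uniquely identifies one report:

records = [
    {"store_id": "S1", "timestamp": "2023-01-05T10:00", "units": 40},
    {"store_id": "S1", "timestamp": "2023-01-05T10:00", "units": 40},  # resent
    {"store_id": "S2", "timestamp": "2023-01-05T10:05", "units": 25},
]

seen = set()
unique = []
for rec in records:
    key = (rec["store_id"], rec["timestamp"])
    if key not in seen:  # drop the record resent after a lost acknowledgement
        seen.add(key)
        unique.append(rec)

print(sum(r["units"] for r in unique))  # 65, not the inflated 105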
2c Describe the pre-processing steps: data cleaning, transforming, modeling and visualizing data. 4M
Data Cleaning refers to the process of removing or correcting incomplete, incorrect, inaccurate or irrelevant parts of the data after detecting them.
Ex: Correcting the grade outliers in students' grade-sheets.
Data Transforming
Data reduction enables the transformation of acquired information into an ordered, correct and simplified form. The reduction enables ingestion of meaningful data into the datasets. The basic concept is reducing the multitudinous amount of data and using only its meaningful parts.
Data wrangling refers to the process of transforming and mapping the data.
Ex: Mapping transforms data into another format, which makes it valuable for analytics and data visualizations (a sketch follows).
4M
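A minimal sketch of data wrangling: mapping cleaned records into JSON and CSV formats for analytics and visualization; the record fields are illustrative assumptions:

import csv
import io
import json

records = [{"subject": "Maths", "grade": 3.0},
           {"subject": "Physics", "grade": 3.5}]

# Map into JSON, a format widely supported by analytics and
# visualization tools.
as_json = json.dumps(records)

# Map the same records into CSV.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["subject", "grade"])
writer.writeheader()
writer.writerows(records)

print(as_json)
print(buf.getvalue())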
2M
Figure: Data store export from machines, files, computers, web servers and web services.
The Data Store first pre-processes data from machine and file data sources. Pre-processing transforms the data into a table or partition schema or supported data formats, for example, JSON, CSV or AVRO. The data then exports in compressed or uncompressed data formats.
1M
Cloud offers various services: IaaS, PaaS and SaaS. These services can be accessed through a cloud client (client application), such as a web browser, SQL or other client. The figure shows the data-store export from machines, files, computers, web servers and web services. The data exports to clouds, such as IBM, Microsoft, Oracle, Amazon, Rackspace, TCS, Tata Communications or Hadoop cloud services.
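A minimal sketch of exporting pre-processed data in a compressed format before transfer to a cloud data store; gzip-compressed JSON is one illustrative choice, standing in for formats such as AVRO:

import gzip
import json

rows = [{"machine_id": "M1", "reading": 21.4},
        {"machine_id": "M2", "reading": 19.8}]

payload = json.dumps(rows).encode("utf-8")
compressed = gzip.compress(payload)  # compressed export format
print(len(payload), "->", len(compressed), "bytes")

# An uploader would now transfer `compressed` to the chosen cloud service;
# the receiver can restore the original rows losslessly.
restored = json.loads(gzip.decompress(compressed))
assert restored == rows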
Following are the five application areas for the popularity of Big Data:
1. Leading marketers use Customer Value Analytics (CVA) to deliver consistent customer experiences. CVA uses as inputs the evaluated purchase patterns, preferences, quality, price and post-sales servicing requirements.
2. Operational analytics for optimizing company operations.
3. Detection of frauds and compliances. Ex: Fraud is borrowing money on already mortgaged assets; compliance means returning the loan and interest installments by the borrowers.
4. New products and innovations in services. Ex: A company develops software and then offers services, like Uber.
5. Enterprise data warehouse optimization.
Big Data usage has the following features for enabling detection and prevention of frauds:
1. Fusing existing data at an enterprise data warehouse with data from sources such as social media, websites, blogs and e-mails, thus enriching the existing data.
2. Using multiple sources of data and connecting with many applications.
3. Analyzing data, which enables structured reports and visualization.
4. Providing high-volume data mining and new innovative applications, thus leading to new business intelligence and knowledge discovery.
5. Faster detection of threats and prediction of frauds by using various data and information available publicly.
SQL
SQL (Structured Query Language) is a language for viewing or changing databases, for data access control, schema creation and data modifications.
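A minimal sketch of SQL for schema creation, data modification and viewing, using Python's built-in sqlite3 module; the table and column names are illustrative assumptions:

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Schema creation.
cur.execute("CREATE TABLE sales (store_id TEXT, units INTEGER)")

# Data modification.
cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [("S1", 120), ("S2", 98)])

# Viewing data.
cur.execute("SELECT store_id, SUM(units) FROM sales GROUP BY store_id")
print(cur.fetchall())
conn.close()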
Figure shows the co-existence of data at a server (SQL, RDBMS) with NoSQL and Big Data at Hadoop, Spark, Mesos, S3 or compatible clusters.
1M
Figure: Coexistence of RDBMS for traditional server data, NoSQL, and Hadoop, Spark and compatible Big Data clusters.
4b Explain Traditional and Big Data analytics architecture reference model. 5M
Data Analytics
Analysis brings order, structure and meaning to the collection of data. Analytics uses
historical data and forecasts new values or results. Data analysis helps in finding business
intelligence and in decision making.
Data Analytics Definition
Analysis of data is a process of inspecting, cleaning, transforming and modeling data with the
goal of discovering useful information, suggesting conclusions and supporting decision making.
Phases in Analytics
1. Descriptive analytics enables deriving additional value from visualizations and reports.
2. Predictive analytics is advanced analytics which enables extraction of new facts and knowledge, and then predicts or forecasts (a minimal sketch follows this list).
3. Prescriptive analytics enables derivation of additional value and undertaking better decisions for new options to maximize profits.
3M
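A minimal sketch of the predictive phase: fitting a least-squares line to past monthly sales and forecasting the next month; the sales figures are made-up illustrative data:

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

months = [1, 2, 3, 4, 5]
sales = [100, 110, 125, 130, 145]  # historical data
a, b = fit_line(months, sales)
print(round(a * 6 + b, 1))         # forecast for month 6: 155.0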
Figure shows an overview of a reference model for analytics architecture. The figure also shows the Big Data file systems, machine learning algorithms, query languages and the usage of the Hadoop ecosystem.
2M
Big Data driven approaches help research in medicine. Following are some findings: building the health profiles of individual patients and building predictive models for better diagnosis and better treatment.
2M
1. Aggregating a large volume and variety of information from multiple sources, from DNAs, proteins and metabolites to cells, tissues, organs, organisms and ecosystems, can enhance the understanding of the biology of diseases. Big Data creates patterns and models by data mining and helps in better understanding and research.
2. Deploying wearable-device data: the device records data during active as well as inactive periods, providing a better understanding of patient health and better risk profiling of the user for certain diseases.