
BDAM

ASSIGNMENT - 1

MANAGEMENT DEVELOPMENT INSTITUTE
GURGAON

SUBMITTED BY:
GROUP 2
Dikshika Arya (19PT1-07)
Jigyasa Monga (19PT1-12)
Pankhuri Bhatnagar (19PT1-18)
1. How does in-memory computing work?

 In-memory computing means using middleware software that allows one to store data in RAM, which is faster than a traditional spinning disk, across a cluster of computers, and to process it in parallel.

 RAM storage and parallel distributed processing are two fundamental pillars of in-
memory computing.

 A single modern computer can rarely hold enough RAM to store many of today's operational datasets, which easily measure in terabytes.

 To overcome this problem, in-memory computing software is designed from the ground up to store data in a distributed fashion, where the entire dataset is divided across the memory of individual computers, each storing only a portion of the overall dataset. Once data is partitioned, parallel distributed processing becomes a technical necessity simply because data is stored this way.

 Developing technology that enables in-memory computing and parallel processing is highly challenging.

 By storing data in RAM and processing it in parallel, in-memory computing supplies real-time insights that enable businesses to deliver immediate actions and responses. That is what makes it ideal for transactional and analytical applications sharing the same data infrastructure.
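To make the two pillars concrete, here is a minimal Python sketch (purely illustrative, not any particular product's implementation) of hash-partitioning a dataset across several simulated nodes and processing the partitions in parallel:

    from multiprocessing import Pool

    NUM_NODES = 4  # hypothetical cluster size

    def partition(records):
        """Hash-partition records across nodes (the RAM storage pillar)."""
        nodes = [[] for _ in range(NUM_NODES)]
        for key, value in records:
            # A real system would use a stable hash and replicate partitions.
            nodes[hash(key) % NUM_NODES].append((key, value))
        return nodes

    def local_sum(node_data):
        """Each node computes on only its own slice of the data."""
        return sum(value for _, value in node_data)

    if __name__ == "__main__":
        records = [(f"user-{i}", i) for i in range(100_000)]
        nodes = partition(records)
        with Pool(NUM_NODES) as pool:  # the parallel processing pillar
            partials = pool.map(local_sum, nodes)
        print(sum(partials))  # combine the partial results

Each worker process stands in for a cluster node holding its partition in RAM; the final answer is assembled from per-node partial results.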

2. How Google works:

● Google runs on a distributed network of thousands of low-cost computers and can therefore carry out fast parallel processing.
● Parallel processing is a method of computation in which many calculations can be
performed simultaneously, significantly speeding up data processing.
● Google has three distinct parts:
o Googlebot, a web crawler that finds and fetches web pages.
o The indexer that sorts every word on every page and stores the resulting
index of words in a huge database.
o The query processor, which compares your search query to the index and recommends the documents that it considers most relevant (the indexer and query processor are sketched below).
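The following toy Python sketch (not Google's actual implementation) illustrates the last two parts: the indexer builds an inverted index mapping each word to the pages that contain it, and the query processor intersects the page sets for the query terms:

    pages = {
        "page1.html": "big data analytics in memory",
        "page2.html": "google indexes the web",
        "page3.html": "in memory data grids",
    }

    # Indexer: map every word to the set of pages containing it.
    index = {}
    for url, text in pages.items():
        for word in text.split():
            index.setdefault(word, set()).add(url)

    # Query processor: return pages containing every query term.
    def search(query):
        postings = [index.get(word, set()) for word in query.lower().split()]
        return set.intersection(*postings) if postings else set()

    print(search("in memory data"))  # {'page1.html', 'page3.html'}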
● Pros: -
o Highly scalable data warehouse
o Easily integrated into analytics tools like Data Studio
o Easy to use with SQL support
o Can be used for all batch jobs or aggregations.

● Cons: -
o High price.
o It does not handle external dependencies.

3. Cloud Bigtable

 A fully managed, scalable NoSQL database service for large analytical and
operational workloads.
 It is a compressed, high-performance, proprietary data storage system built on Google File System, Chubby Lock Service, SSTable, and a few other Google technologies.

● Pros: -
o Consistent sub-10 ms latency; handles millions of requests per second.
o Ideal for use cases such as personalization, ad tech, fintech, digital media, and IoT.
o Seamlessly scales to match your storage needs, with no downtime during reconfiguration.
o Designed with a storage engine for machine learning applications, leading to better predictions.
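As a brief illustration, here is a hedged sketch of writing and reading a single row with the google-cloud-bigtable Python client. The project, instance, and table IDs are placeholders, and an existing table with a column family named "stats" is assumed:

    from google.cloud import bigtable

    client = bigtable.Client(project="my-project")  # placeholder project ID
    instance = client.instance("my-instance")       # placeholder instance ID
    table = instance.table("user-events")           # placeholder table ID

    # Write one cell: Bigtable rows and values are byte strings keyed by row key.
    row = table.direct_row(b"user-42")
    row.set_cell("stats", b"clicks", b"17")
    row.commit()

    # Read the row back by its key.
    fetched = table.read_row(b"user-42")
    print(fetched.cells["stats"][b"clicks"][0].value)  # b'17'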

4. What is data and storage virtualization? What are the functions of a VM manager?

Data virtualization is an approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted at the source or where it is physically located, and it can provide a single customer view (or single view of any other entity) of the overall data. A minimal sketch of this idea follows the list of benefits below.

Benefits of data virtualization:

 Reduce the risk of data errors
 Reduce systems workload by not moving data around
 Increase the speed of access to data on a real-time basis
 Significantly reduce development and support time
 Increase governance and reduce risk through the use of policies
 Reduce the data storage required
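As promised above, here is a minimal Python sketch of the data virtualization idea: callers query one logical view and never see where or how each record is stored. Both "sources" below are stand-ins for real systems (a CSV export and a remote API), and the file name is a placeholder:

    import csv

    def from_csv(path):
        # Source 1: a flat file; format details stay hidden from callers.
        with open(path, newline="") as f:
            yield from csv.DictReader(f)

    def from_api():
        # Source 2: a stand-in for rows fetched from a remote service.
        yield {"customer_id": "C2", "city": "Gurgaon"}

    def customer_view():
        """Single customer view: merge all sources behind one interface."""
        yield from from_csv("customers.csv")  # placeholder path
        yield from from_api()

    for record in customer_view():
        print(record["customer_id"], record["city"])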

Storage virtualization

 Storage virtualization is the process of grouping the physical storage from multiple network storage devices so that it looks like a single storage device.
 The process involves abstracting and hiding the internal functions of a storage device from the host application, host servers, or a general network in order to facilitate application- and network-independent management of storage.
 Storage virtualization is sometimes loosely referred to as cloud storage, although the two terms are not strictly interchangeable.
 Some of the benefits of storage virtualization include automated management, expansion of storage capacity, reduced time spent on manual supervision, easy updates, and reduced downtime. A toy model follows.
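The toy Python model below (purely illustrative) pools several physical "devices" behind one logical device using a simple striping policy, so callers never learn which disk actually holds a given block:

    class LogicalVolume:
        def __init__(self, devices):
            self.devices = devices  # underlying physical stores (dicts here)

        def write(self, block_id, data):
            # Stripe blocks across devices; callers see one address space.
            self.devices[block_id % len(self.devices)][block_id] = data

        def read(self, block_id):
            return self.devices[block_id % len(self.devices)][block_id]

    volume = LogicalVolume([{}, {}, {}])  # three simulated disks
    volume.write(7, b"payload")
    print(volume.read(7))  # b'payload'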
Functions of a VM manager:

 Create virtual machines from installation media or from a virtual machine template.
 Delete virtual machines.
 Power off virtual machines.
 Import virtual machines.
 Deploy and clone virtual machines.
 Perform live migration of virtual machines.
 Import and manage ISOs.
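Several of these functions can be exercised programmatically. Here is a hedged sketch using the libvirt Python bindings (the libvirt-python package); a local QEMU/KVM host and an existing VM named "demo-vm" are assumptions, not part of the original text:

    import libvirt

    conn = libvirt.open("qemu:///system")  # connect to the local hypervisor

    # Enumerate managed VMs and their power state.
    for dom in conn.listAllDomains():
        print(dom.name(), "running" if dom.isActive() else "stopped")

    vm = conn.lookupByName("demo-vm")  # hypothetical existing VM
    if not vm.isActive():
        vm.create()    # power on a defined VM
    vm.shutdown()      # request a graceful power off

    conn.close()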

5. Hyper-V technology and Intel VT-x

Hyper-V technology

 Hyper-V is a form of hypervisor-based virtualization technology, which is used for creating, running, and managing virtual machines (VMs). Hyper-V is a Type-1 hypervisor, which means that the hypervisor runs directly on the physical hardware (host machine) and hosts multiple VMs (guest machines) sharing the virtualized hardware resources from the physical server.
 Even though one physical server can host multiple VMs and those VMs share the
same set of physical resources, they do not affect one another’s performance. This is
due to the fact that each VM in a virtual environment runs in isolation from other
VMs.
Intel VT-x

 Intel VT (Virtualization Technology) is the company's hardware assistance for processors running virtualization platforms.
 Intel VT includes a series of extensions for hardware virtualization. The Intel VT-x extensions are probably the best-recognized extensions, adding migration, priority, and memory handling capabilities to a wide range of Intel processors. By comparison, the VT-d extensions add virtualization support to Intel chipsets that can assign specific I/O devices to specific virtual machines (VMs), while the VT-c extensions bring better virtualization support to I/O devices such as network switches.
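On a Linux host, VT-x support can be checked from userspace: the "vmx" CPU flag indicates Intel VT-x (AMD's equivalent, AMD-V, appears as "svm"). A small Python sketch:

    def has_vtx():
        # Intel VT-x is advertised as the "vmx" flag in /proc/cpuinfo.
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    return "vmx" in line.split()
        return False

    print("VT-x available:", has_vtx())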

6. Discuss streaming data access and management.


Streaming data refers to data that is continuously generated, usually in high volumes and at high velocity; it is the continuous flow of data generated by various sources. In streaming data access, instead of reading data as packets or chunks, data is read continuously at a constant bitrate. The application starts reading data from the start of a file and keeps reading it sequentially, without random seeks.

By using stream processing technology, data streams can be processed, stored, analyzed, and acted upon as they are generated, in real time.
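The Python sketch below (illustrative only; "events.log" is a placeholder path) reads a file sequentially in fixed-size chunks from the start, with no random seeks, handling each chunk as it arrives:

    def stream(path, chunk_size=4096):
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):  # constant-size sequential reads
                yield chunk

    for chunk in stream("events.log"):  # placeholder file
        print(len(chunk))  # stand-in for real per-chunk processing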

Streaming Data Architecture

A streaming data architecture can be considered a framework of software components built to ingest and process large volumes of streaming data from multiple sources. Such an architecture consumes data immediately as it is generated, persists it to storage, and can include various additional components, such as tools for real-time processing, data manipulation, and analytics.

Streaming stacks can be built from an assembly line of open-source and proprietary solutions to specific problems, including stream processing, storage, data integration, and real-time analytics.

 The Message Broker / Stream Processor

The message broker is the element that takes data from a source, called a producer, translates it into a standard message format, and streams it on an ongoing basis. Other components can listen in and consume the messages passed on by the broker.

Streaming brokers support very high performance and massive volumes of message traffic, and are highly focused on streaming, with little support for data transformations or task scheduling.
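Here is a hedged sketch of this producer/broker/consumer pattern using the kafka-python client. The broker address ("localhost:9092") and topic name ("events") are assumptions for illustration:

    from kafka import KafkaProducer, KafkaConsumer

    # Producer: translate source data into messages and stream them to the broker.
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("events", b'{"user": "C2", "action": "click"}')
    producer.flush()

    # Consumer: any other component can listen in on the broker's stream.
    consumer = KafkaConsumer("events", bootstrap_servers="localhost:9092",
                             auto_offset_reset="earliest")
    for message in consumer:
        print(message.value)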

 Batch and Real-time ETL tools

Data streams from one or more message brokers need to be aggregated, transformed, and structured before the data can be analyzed with SQL-based analytics tools. This action is performed by an ETL tool or platform, which receives queries from users, fetches events from message queues, applies the query, and generates a result. It may also perform joins, transformations, and aggregations on the data. The result may be an API call, an action, a visualization, an alert, or even a new data stream.
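A minimal Python sketch of the transform-and-aggregate step (the event list below is an in-memory stand-in for a message queue):

    import json
    from collections import Counter

    raw_events = [
        '{"user": "a", "action": "click"}',
        '{"user": "b", "action": "view"}',
        '{"user": "a", "action": "click"}',
    ]

    clicks = Counter()
    for raw in raw_events:
        event = json.loads(raw)           # parse / structure the raw message
        if event["action"] == "click":    # filter (a simple transformation)
            clicks[event["user"]] += 1    # aggregate per user
    print(clicks)                         # Counter({'a': 2})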

 Data Analytics / Serverless Query Engine

After the streaming data has been prepared by the stream processor, it is analyzed to provide value. There are various approaches and tools used for streaming data analytics.
 Streaming Data Storage

Various data storage options are used for storing streaming data, such as a database or data warehouse, the message broker itself, or a data lake. The data lake option is flexible and inexpensive for storing event data; however, it brings its own technical challenges.

Various modern streaming architectures are also being adopted that rely on a full-stack approach (in contrast to patching together open-source technologies), which further provides the benefits of performance, high availability, fault tolerance, flexibility, etc.
