Unit 2 Cloud Computing Reference Model
The Cloud Computing Reference Model (CCRM) serves as a foundational framework for comprehending
the intricacies of cloud computing ecosystems. Its conceptual lens elucidates the dynamic interplay
between various components and their relationships within cloud environments. While diverse
interpretations and iterations exist, the National Institute of Standards and Technology's (NIST) Cloud
Computing Reference Architecture is widely recognized for its comprehensive depiction.
At its core, the CCRM delineates essential aspects such as service models, deployment paradigms,
architectural elements, interfaces, security frameworks, management methodologies, and
interoperability standards. Service models, encompassing Infrastructure as a Service (IaaS), Platform as
a Service (PaaS), Software as a Service (SaaS), and Function as a Service (FaaS), delineate the spectrum
of cloud offerings. Deployment models, including Public, Private, Hybrid, and Community Clouds,
illuminate the diverse infrastructural configurations.
Additionally, the CCRM underscores the criticality of interfaces, security protocols, and compliance
measures in fostering secure and compliant cloud environments. Moreover, it accentuates the
significance of effective management, monitoring, integration, and interoperability for seamless cloud
operations. By synthesizing these multifaceted components, the CCRM facilitates a holistic
understanding of cloud computing landscapes, empowering stakeholders to navigate and harness the
transformative potential of cloud technologies effectively.
The Cloud Computing reference model is divided into 3 major service models:
1. Software as a Service (SaaS)
2. Platform as a Service (PaaS)
3. Infrastructure as a Service (IaaS)
SaaS
Software as a Service (SaaS) is a cloud computing model where software applications are hosted and
provided to users over the internet on a subscription basis. SaaS eliminates the need for users to install,
manage, and maintain software locally, as everything is managed by the service provider. Users access
the software through a web browser or API, enabling them to use the application from any device with
internet connectivity.
SaaS offerings range from productivity tools like email and office suites to specialised business
applications like customer relationship management (CRM) and enterprise resource planning (ERP)
systems. SaaS provides scalability, flexibility, and cost-effectiveness, as users only pay for the features
and resources they need, with the service provider handling software updates, maintenance, security,
and infrastructure management.
Features
Accessibility: SaaS applications provide unparalleled accessibility, enabling users to access them
from anywhere with an internet connection. This accessibility fosters remote work and
flexibility, allowing users to collaborate and perform tasks on the go using various devices such
as laptops, tablets, or smartphones. Users can conveniently access their SaaS applications
whether they are in the office, at home, or traveling, enhancing productivity and responsiveness
to business needs.
Scalability: SaaS offerings are designed to be inherently scalable, allowing users to effortlessly
adjust their usage and subscription plans in response to changing business requirements. Users
can quickly scale up to accommodate increased demand or scale down during periods of
reduced usage without significant upfront investment or infrastructure changes. This scalability
ensures businesses can efficiently manage their resources and costs, adapting to evolving
market conditions and growth opportunities with agility and cost-effectiveness.
Automatic Updates: SaaS providers relieve users of the burden of managing software updates
and upgrades by handling these tasks themselves. This ensures users can access the latest
features, improvements, and security patches without manual intervention. Automatic updates
are seamlessly integrated into the SaaS platform, minimising user workflow disruptions and
eliminating the risk of running outdated software. By staying up-to-date with the latest software
versions, users can benefit from enhanced functionality, improved performance, and
strengthened security measures, ultimately contributing to a more efficient and secure
computing environment.
Cost-effectiveness: SaaS operates on a subscription-based pricing model, where users pay a
recurring fee typically based on usage or the number of users. This pay-as-you-go approach
eliminates the need for upfront software licensing fees and significantly reduces the total cost of
ownership compared to traditional software deployment models. Businesses can accurately
forecast and budget their expenses, as subscription fees are predictable and often scale with
usage.
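As a rough illustration of the pay-as-you-go idea, the sketch below compares a recurring subscription bill with a one-off licence purchase. The per-user prices and infrastructure costs are made-up assumptions for illustration, not any vendor's actual rates.

# Hypothetical comparison of subscription (pay-as-you-go) vs. upfront licensing.
# All prices below are illustrative assumptions, not real vendor rates.

def saas_monthly_cost(users: int, price_per_user: float) -> float:
    """Recurring SaaS cost: scales with the number of users."""
    return users * price_per_user

def on_prem_first_year_cost(users: int, licence_per_user: float,
                            server_cost: float, admin_cost: float) -> float:
    """Traditional model: upfront licences plus hardware and administration."""
    return users * licence_per_user + server_cost + admin_cost

if __name__ == "__main__":
    users = 50
    print("SaaS, 12 months:", 12 * saas_monthly_cost(users, price_per_user=10.0))
    print("On-premises, year 1:", on_prem_first_year_cost(
        users, licence_per_user=150.0, server_cost=5000.0, admin_cost=8000.0))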
PaaS
Platform as a Service (PaaS) is a cloud computing model that provides developers with a platform and
environment to build, deploy, and manage applications without dealing with the underlying
infrastructure complexities. PaaS offerings typically include tools, development frameworks, databases,
middleware, and other resources necessary for application development and deployment.
Developers can focus on writing and improving their code while the PaaS provider handles
infrastructure management, scalability, and maintenance tasks. PaaS streamlines the development
process, accelerates time-to-market, and reduces infrastructure management overhead.
Features
Development Tools: PaaS platforms offer a wide array of development tools, including
integrated development environments (IDEs), code editors, and debugging utilities, to facilitate
efficient application development. These tools provide developers with a cohesive environment
for coding, testing, and debugging applications, enhancing productivity and code quality.
Deployment Automation: PaaS automates the deployment process, allowing developers to
deploy applications quickly and efficiently. By automating provisioning, configuration, and
deployment tasks, PaaS reduces manual intervention, minimises deployment errors, and
accelerates the release cycle, ensuring faster time-to-market for applications.
Scalability: PaaS platforms provide scalable infrastructure resources, enabling applications to
dynamically adjust resource allocation based on demand. This elasticity ensures optimal
performance, resource utilisation, and cost efficiency, enabling applications to handle varying
workloads seamlessly without downtime or performance degradation.
Middleware and Services: PaaS offerings include middleware components and pre-built
services, such as databases, messaging queues, and authentication services, which developers
can leverage to enhance their applications' functionality without building these components
from scratch. These ready-to-use components reduce development time and effort while
improving application functionality and scalability.
IaaS
Infrastructure as a Service (IaaS) offers users virtualised computing resources over the internet. Users
control operating systems, storage, and networking, while the cloud provider manages the underlying
infrastructure, including servers, virtualisation, and networking components. This model grants
flexibility and scalability without the burden of maintaining physical hardware.
LaaS
LaaS (Linguistics as a Service) is a specialised service model within the field of natural language
processing (NLP) and artificial intelligence (AI). It provides on-demand access to linguistic functionalities
and capabilities through cloud-based APIs (Application Programming Interfaces). LaaS enables
developers and businesses to integrate advanced language processing features into their applications
without the need for extensive expertise in NLP or AI.
Features
Language Understanding: LaaS platforms offer robust capabilities for understanding and
interpreting human language, including tasks such as sentiment analysis, entity recognition,
intent detection, and language translation. These features enable applications to extract
meaningful insights from textual data and to interact with users in multiple languages.
Text Analysis and Processing: LaaS services provide tools for analysing and processing text, such
as tokenisation, part-of-speech tagging, syntactic parsing, and named entity recognition. These
functionalities help extract structured information from unstructured text data, enabling
applications to perform tasks like information retrieval, content categorisation, and text
summarisation.
Speech Recognition and Synthesis: Many LaaS platforms offer speech recognition and synthesis
capabilities, allowing applications to transcribe spoken language into text and generate natural-
sounding speech from textual input. These features are essential for building voice-enabled
applications, virtual assistants, and speech-to-text systems.
Customisation and Integration: LaaS platforms often provide tools and APIs for customising and
integrating linguistic functionalities into existing applications and workflows. Developers can
tailor the behaviour of language processing models to suit specific use cases and integrate them
seamlessly with other software components and services.
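As a minimal sketch of how a LaaS capability might be consumed, the snippet below posts text to a hypothetical sentiment-analysis endpoint over HTTPS. The URL, header, API key, and response fields are assumptions for illustration, not any real provider's API.

# Minimal sketch of calling a hypothetical LaaS sentiment-analysis API.
# The endpoint URL, header names, and response fields are illustrative assumptions.
import json
import urllib.request

API_URL = "https://api.example-laas.com/v1/sentiment"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"                                # placeholder credential

def analyse_sentiment(text: str) -> dict:
    payload = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {API_KEY}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())  # e.g. {"label": "positive", "score": 0.93}

# Example (requires a real endpoint and key):
# print(analyse_sentiment("The new release is fantastic"))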
Deployment Models
These models describe how cloud services are deployed and who has access to them. Standard
deployment models include Public Cloud, Private Cloud, Hybrid Cloud, and Community Cloud, each with
ownership, control, and resource-sharing characteristics.
Each deployment model has its advantages and considerations, and organisations may choose to adopt
one or a combination of models based on security requirements, compliance considerations,
performance needs, budget constraints, and strategic objectives. Ultimately, the goal is to select the
deployment model that best aligns with the organisation's goals and requirements while maximising the
benefits of cloud computing.
On-Premises Deployment
In this model, software applications are installed and run on computers and servers located within the
premises of an organisation. The organisation is responsible for managing and maintaining all aspects of
the infrastructure, including hardware, software, security, and backups.
Software applications are installed and run on servers within the organisation's premises. The
organisation manages all aspects of the infrastructure, including hardware, software, security, and
backups.
Cloud Deployment
Cloud deployment involves hosting software applications and services on remote servers maintained by
third-party cloud service providers such as Amazon Web Services (AWS), Microsoft Azure, or Google
Cloud Platform. Users access these applications and services over the Internet. Cloud deployment offers
scalability, flexibility, and cost-effectiveness, as organisations can pay only for the resources they use.
Software applications and services are hosted on remote servers maintained by third-party cloud
service providers. Users access these resources over the internet. Cloud deployment offers scalability,
flexibility, and cost-effectiveness as organisations pay only for the resources they use.
Hybrid Deployment
Hybrid deployment combines elements of both on-premises and cloud deployment models.
Organisations may choose to host some applications and services on-premises while utilising cloud
services for others. This approach allows organisations to leverage the benefits of both deployment
models, such as maintaining sensitive data on-premises while taking advantage of cloud scalability for
other workloads.
Some applications and services run on-premises while others run in the cloud. This approach lets
organisations keep sensitive data in-house while taking advantage of cloud scalability and flexibility for
other workloads.
Multi-Cloud Deployment
Multi-cloud deployment involves using services from multiple cloud providers to meet specific business
needs. Organisations may choose this approach to avoid vendor lock-in, mitigate risk, or take advantage
of specialised services offered by different providers.
These deployment models provide organisations with options to choose the most suitable infrastructure
and delivery method based on their specific requirements, budget, and technical capabilities.
Functional Components
Functional components are essential for effectively managing and utilising cloud resources in cloud
computing. Computing includes virtual machines or containers for processing and executing
applications. Storage encompasses scalable object or block storage solutions for data management.
Networking provides virtualised networks and connectivity between resources. Security includes
measures like firewalls and encryption to protect data and applications. Management ensures efficient
resource allocation, monitoring, and administration. Orchestration automates deployment, scaling, and
management processes for improved operational efficiency.
Computing component
Computing in cloud computing refers to the fundamental capability of provisioning and managing
virtual machines (VMs) or containers to execute applications. Virtual Machines (VMs) emulate physical
computers and support various operating systems (OS).
They are versatile, allowing applications with diverse OS requirements to run within isolated
environments. On the other hand, containers encapsulate applications and their dependencies into
portable units, ensuring consistency across different computing environments. Containers are
lightweight and facilitate efficient deployment and scaling of applications, sharing the host OS kernel for
resource efficiency.
Storage component
Storage solutions in cloud computing offer scalable options for storing and managing data. Object
storage systems store data as objects, each comprising the data itself, metadata (descriptive attributes),
and a unique identifier.
This approach is highly scalable and ideal for unstructured data like media files and backups. Block
storage, in contrast, manages data in fixed-sized blocks and is commonly used for structured data such
as databases and VM disks. It provides high performance and is typically directly attached to VM
instances for persistent storage needs.
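To make the object-storage idea concrete, here is a small sketch using the AWS SDK for Python (boto3) to upload and retrieve an object from Amazon S3. The bucket name, key, and credentials are assumptions, and other object stores expose similar put/get operations.

# Sketch: storing and retrieving an object (data + key + metadata) in S3 via boto3.
# Assumes boto3 is installed, AWS credentials are configured, and the bucket exists.
import boto3

s3 = boto3.client("s3")
BUCKET = "my-example-bucket"  # hypothetical bucket name

# Upload: the object is addressed by its key and can carry descriptive metadata.
s3.put_object(
    Bucket=BUCKET,
    Key="backups/2024/report.txt",
    Body=b"quarterly report contents",
    Metadata={"department": "finance"},
)

# Download: retrieve the same object by bucket and key.
obj = s3.get_object(Bucket=BUCKET, Key="backups/2024/report.txt")
print(obj["Body"].read().decode("utf-8"))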
Networking component
Networking components in cloud computing facilitate the establishment and management of
virtualized networks that interconnect cloud resources. Virtual Private Clouds (VPCs) offer isolated
virtual networks dedicated to specific users or groups, ensuring security and control over network
configurations.
Subnets segment the IP address space within a VPC, enabling further granularity and security. Routing
tables dictate how traffic flows between subnets and external networks, optimizing network efficiency
and security.
Security component
Security measures in cloud computing protect data, applications, and infrastructure from unauthorized
access and cyber threats. Firewalls regulate incoming and outgoing network traffic based on predefined
security rules, guarding against unauthorized access and network-based attacks.
Encryption transforms data into a secure format using algorithms, ensuring only authorized parties can
decrypt and access the original data with appropriate keys. Access controls enforce restrictions on
resource access based on authentication credentials, roles, and permissions, adhering to the principle
of least privilege to mitigate security risks.
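As a small sketch of encryption at rest, the snippet below uses the widely available cryptography package's Fernet recipe (symmetric, authenticated encryption). In a real cloud deployment the key would typically be held by a key-management service rather than generated in application code.

# Sketch: symmetric encryption of data before storing it, using the
# 'cryptography' package (pip install cryptography).
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # secret key (store securely, e.g. in a KMS)
cipher = Fernet(key)

plaintext = b"customer record: account 1234"
ciphertext = cipher.encrypt(plaintext)          # safe to store or transmit
recovered = cipher.decrypt(ciphertext)          # only possible with the key

assert recovered == plaintext
print(ciphertext[:16], "...")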
Management component
Management in cloud computing encompasses tools and processes for efficiently administering cloud
resources throughout their lifecycle. Resource provisioning automates the allocation and deployment of
cloud resources based on demand and workload requirements, ensuring scalability and cost-efficiency.
Performance monitoring continuously tracks resource usage, application performance, and service
availability to detect issues and optimize resource utilization.
Usage optimization analyzes consumption patterns to minimize costs and improve efficiency by
dynamically scaling resources based on workload fluctuations. Compliance management ensures
adherence to regulatory requirements and SLAs, maintaining data protection and service availability
standards.
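A toy sketch of the usage-optimization idea follows: a function that looks at recent utilisation and decides whether to scale out, scale in, or do nothing. The thresholds and limits are arbitrary assumptions; real auto-scaling policies are configured in the provider's management service.

# Toy sketch of a scaling decision based on monitored utilization.
# Thresholds and instance limits are illustrative assumptions.
def scaling_decision(avg_cpu_percent: float, current_instances: int,
                     min_instances: int = 1, max_instances: int = 10) -> int:
    """Return the recommended number of instances for the next interval."""
    if avg_cpu_percent > 75 and current_instances < max_instances:
        return current_instances + 1      # scale out under heavy load
    if avg_cpu_percent < 25 and current_instances > min_instances:
        return current_instances - 1      # scale in when capacity is idle
    return current_instances              # utilization is in the healthy band

print(scaling_decision(avg_cpu_percent=82.0, current_instances=3))  # -> 4
print(scaling_decision(avg_cpu_percent=18.0, current_instances=3))  # -> 2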
Orchestration component
Orchestration automates and coordinates the deployment, scaling, and management of cloud resources
and applications. It facilitates automated deployment of resources, reducing manual intervention and
minimizing errors in provisioning and configuration tasks. Scaling capabilities dynamically adjust
resource capacity based on workload changes, optimizing performance and cost-effectiveness.
Management processes streamline complex workflows across different cloud components, ensuring
consistency and reliability in operations. Tools like Kubernetes and Terraform are commonly used for
orchestration, enabling efficient management of containerized applications and infrastructure as code
(IaC) practices.
Data Formats
Data formats standardize how information is structured and exchanged across various systems and
services. Standard formats like JSON (JavaScript Object Notation) and XML (eXtensible Markup Language)
define rules for encoding and interpreting data, facilitating interoperability and enabling different
applications and platforms to process data consistently and accurately.
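A short example of the JSON format in practice: Python's standard json module serialises a native data structure into the interchange format and parses it back, which is exactly the round trip two interoperating services perform. The field names are illustrative.

# Serialising and parsing JSON with the standard library.
import json

record = {"service": "object-storage", "region": "eu-west-1", "replicas": 3}

encoded = json.dumps(record)          # Python dict -> JSON text for the wire
print(encoded)                        # {"service": "object-storage", ...}

decoded = json.loads(encoded)         # JSON text -> Python dict on the receiver
print(decoded["replicas"] + 1)        # structured access after parsing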
Cloud computing reference models provide a structured framework for understanding the components,
layers, and interactions within a cloud computing environment.
While there isn't a standardized classification of "types" of cloud computing reference models, one
widely recognized reference model is the NIST (National Institute of Standards and Technology) Cloud
Computing Reference Architecture. Here's an overview of the NIST Cloud Computing Reference
Architecture.
Cloud Service
A cloud service is an offering made available to cloud service consumers, which could be in the form of
infrastructure (IaaS), platforms (PaaS), or applications (SaaS). Cloud services represent a pivotal aspect
of modern computing, offering a broad array of solutions and resources accessible over the internet
through cloud service providers (CSPs). These services include Infrastructure as a Service (IaaS),
Platform as a Service (PaaS), and Software as a Service (SaaS), each catering to different needs and
levels of abstraction.
IaaS provides virtualized computing resources, PaaS offers application development and deployment
platforms, and SaaS delivers ready-to-use software applications. Cloud services empower organizations
and individuals to leverage computing resources, applications, and data storage on-demand, facilitating
scalability, flexibility, and cost-effectiveness without the burden of managing physical infrastructure.
Example
An example of a cloud service is Microsoft Office 365, which offers a suite of productivity tools hosted on Microsoft's
cloud infrastructure, including Word, Excel, PowerPoint, Outlook, and more. With Office 365, users can
access these applications from any device with an internet connection without installing or maintaining
software locally.
They can collaborate in real time on documents, store files securely in the cloud, and benefit from
automatic updates and backups. This cloud service provides organisations scalability, flexibility, and
cost-effectiveness, allowing them to streamline productivity and collaboration while reducing the
overhead of managing on-premises software and infrastructure.
Cloud Consumer
Cloud consumers, comprising individuals and organizations, leverage cloud services to fulfill various
computing needs without the burden of maintaining on-premises infrastructure. These consumers
interact directly with cloud providers to access and utilize a wide array of resources delivered over the
Internet, including computing power, storage, and software applications.
By adopting cloud solutions, consumers benefit from the scalability, flexibility, and cost-effectiveness of
pay-as-you-go models, enabling them to scale resources up or down based on demand and only pay for
what they use. Additionally, cloud services facilitate remote access to data and applications from
anywhere with an internet connection, promoting user collaboration and productivity.
Cloud Provider
Cloud providers serve as the backbone of the cloud computing ecosystem, offering a range of
infrastructure and services to support the diverse needs of cloud consumers. These entities encompass
public cloud vendors, private cloud operators, and hybrid cloud environments, delivering computing
resources, storage, and networking capabilities via data centres located worldwide.
Cloud providers manage and maintain the underlying hardware and software infrastructure, ensuring
cloud services' availability, reliability, and security. They also invest heavily in innovation, continually
expanding their service offerings and enhancing performance to meet evolving consumer demands.
Cloud Auditor
Cloud auditors play a critical role in ensuring the security and compliance of cloud environments. As
independent entities, they assess and evaluate the security posture of cloud providers, conducting
thorough examinations to verify adherence to industry standards and best practices.
Through assessments, audits, and certifications, cloud auditors offer assurance to consumers regarding
the security and trustworthiness of cloud services. By validating compliance with regulations such as
GDPR, HIPAA, or SOC 2, they help organizations make informed decisions when selecting cloud
providers and mitigate risks associated with data breaches or regulatory non-compliance.
Cloud Broker
Operating as intermediaries between cloud consumers and providers, cloud brokers facilitate the
selection and procurement of cloud services. They assist consumers in navigating the complex landscape of
cloud offerings, identifying the most suitable solutions based on their requirements and budget
constraints.
Additionally, cloud brokers negotiate contracts with providers to secure favourable terms and pricing
for consumers. Beyond procurement, they offer value-added services such as integration, migration,
and management of cloud resources, streamlining the adoption process and optimizing consumers'
cloud investments.
Cloud Carrier
Cloud carriers are the backbone of cloud connectivity, transporting data and traffic between cloud
consumers and providers. These network and telecommunications providers ensure the reliability,
availability, and performance of network connections, facilitating seamless access to cloud services.
By optimizing network infrastructure and leveraging advanced technologies, cloud carriers enhance
data transfer efficiency across distributed cloud environments, minimizing latency and downtime.
Additionally, they offer value-added services such as network security and traffic optimization to
safeguard data integrity and enhance user experience.
Cloud Consumer
Beyond just utilizing cloud services, cloud consumers play a pivotal role in shaping the demand for
various cloud offerings.
They are responsible for defining requirements, selecting appropriate services, and driving innovation
by adopting new technologies. Cloud consumers also influence the development of cloud solutions
through feedback and market demand, ultimately shaping the evolution of cloud computing.
Cloud Provider
In addition to offering cloud services and infrastructure, cloud providers are tasked with ensuring the
security, reliability, and performance of their offerings.
They invest in data centre infrastructure, network connectivity, and cybersecurity measures to deliver
high-quality services that meet the diverse needs of cloud consumers. Cloud providers also play a
crucial role in supporting regulatory compliance and industry standards, fostering consumer trust and
confidence.
Cloud Service
Cloud services encompass a wide range of offerings, each catering to specific use cases and
requirements. These services are designed to be scalable, flexible, and cost-effective, enabling
consumers to leverage computing resources on demand without upfront investments in hardware or
software.
Cloud services promote agility and innovation by providing access to cutting-edge technologies and
enabling rapid deployment of applications and services.
Cloud Resource
Cloud resources are dynamic and scalable within cloud environments, allowing consumers to adjust
resource allocations based on changing demands.
Cloud providers provision and manage these resources, optimize infrastructure utilization and ensure
efficient resource allocation to meet consumer requirements. Cloud resources include virtual machines,
storage volumes, networks, and application instances, all of which contribute to the delivery of cloud
services.
Cloud Interface
Cloud interfaces are the primary means of interaction between cloud consumers and providers,
facilitating the seamless exchange of data and commands. APIs (Application Programming Interfaces)
play a crucial role in enabling programmatic access to cloud resources, allowing consumers to automate
processes and integrate cloud services with existing workflows.
Command-line interfaces (CLIs) and graphical user interfaces (GUIs) provide alternative methods for
interacting with cloud environments, catering to the preferences and expertise of different users.
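To illustrate programmatic access through an API, the sketch below lists resources from a hypothetical REST endpoint using a bearer token. The URL, token, and response shape are assumptions, since each provider documents its own APIs, CLIs, and SDKs.

# Sketch: programmatic access to a cloud API (hypothetical endpoint and token).
import json
import urllib.request

ENDPOINT = "https://cloud.example.com/api/v1/instances"   # illustrative URL
TOKEN = "YOUR_ACCESS_TOKEN"                                # placeholder credential

request = urllib.request.Request(
    ENDPOINT, headers={"Authorization": f"Bearer {TOKEN}"})

def list_instances() -> list:
    """Call the (hypothetical) API and return its parsed JSON payload."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())

# Example (requires a real endpoint and token):
# for instance in list_instances():
#     print(instance["id"], instance["status"])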
Cloud Agreement
Cloud agreements define the terms and conditions governing the relationship between cloud
consumers and providers. These agreements outline the rights and responsibilities of each party,
including service-level commitments, data protection measures, and dispute resolution mechanisms.
Cloud agreements also establish pricing models, payment terms, and termination clauses, ensuring
transparency and fairness in the delivery and consumption of cloud services. By formalizing contractual
arrangements, cloud agreements mitigate risks and provide assurance to both consumers and providers,
fostering trust and long-term partnerships.
Overall, the OCCI Cloud Reference Model provides a standardized approach to understanding the roles,
relationships, and interactions within cloud computing ecosystems, enabling interoperability and
portability across different cloud platforms and implementations. It serves as a foundation for the
development of open, vendor-neutral cloud standards and specifications, promoting innovation and
collaboration in the cloud computing industry.
European Telecommunications Standards Institute (ETSI) Cloud Standards: Defines standards for cloud
computing in Europe, covering aspects such as interoperability, security, and data protection.
These reference models and frameworks serve different purposes, from defining architectural
components and capabilities to addressing specific security and compliance requirements. They provide
valuable guidance for organisations adopting cloud computing solutions effectively and securely.
The Security Reference Model in cloud computing provides a framework for understanding and
implementing security measures to protect cloud environments and their data. It offers a comprehensive
structure for designing, implementing, and managing security controls to mitigate security risks.
Organizations can tailor this model to their specific requirements and environments while aligning with
industry standards and best practices.
Data Security
Data security protects data throughout its lifecycle, including data-at-rest, in transit, and in use.
Encryption, tokenization, data masking, and data loss prevention (DLP) techniques are commonly used
to safeguard sensitive data from unauthorized access, disclosure, or modification.
Protect sensitive data through encryption, tokenization, or data masking techniques. Implement data
loss prevention (DLP) solutions to prevent unauthorized access, disclosure, or modification of data.
Network Security
Network security encompasses measures to secure network infrastructure, communications, and traffic
within the cloud environment. This includes firewalls, intrusion detection and prevention systems
(IDS/IPS), virtual private networks (VPNs), and network segmentation to prevent unauthorized access
and mitigate network-based attacks.
Secure network infrastructure with firewalls, intrusion detection and prevention systems (IDS/IPS), and
virtual private networks (VPNs). Segment networks to isolate sensitive data and restrict lateral
movement of threats within the cloud environment.
Endpoint Security
Endpoint security involves securing devices such as laptops, smartphones, and servers that access cloud
services. Endpoint protection solutions, including antivirus software, endpoint detection and response
(EDR), and mobile device management (MDM) tools, help detect and prevent security threats at the
device level.
Secure devices accessing cloud services with antivirus software, endpoint detection and response (EDR),
and mobile device management (MDM) solutions. Enforce security policies on endpoints to prevent
malware infections and unauthorized access to cloud resources.
By effectively leveraging the Cloud Computing Reference Model, organisations can capitalise on its
structured approach to enhance scalability, flexibility, security, and innovation, achieving strategic
business objectives in a dynamic digital landscape.
Organizations often measure data center energy efficiency through power usage effectiveness (PUE), which
represents the ratio of the total power entering the data center divided by the power used by IT equipment.
However, the subsequent rise of virtualization has allowed for more productive use of IT equipment, resulting in
much higher efficiency, lower energy usage, and reduced energy costs. Metrics such as PUE are no longer central
to energy efficiency goals. However, organizations can still assess PUE and use comprehensive power and cooling
analysis to better understand and manage energy efficiency.
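Expressed as a formula, with an illustrative example of a facility drawing 1.5 MW in total while its IT equipment uses 1.0 MW:

\[
\mathrm{PUE} = \frac{\text{total facility power}}{\text{IT equipment power}},
\qquad \text{e.g. } \frac{1.5\ \text{MW}}{1.0\ \text{MW}} = 1.5
\]

A PUE closer to 1.0 means proportionally less power is spent on cooling, power distribution, and other overheads.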
Datacenter Level
Data centers are not defined by their physical size or style. Small businesses can operate successfully with
multiple servers and storage arrays networked within a closet or small room. At the same time, major computing
organizations -- such as Facebook, Amazon, or Google -- can fill a vast warehouse space with data center
equipment and infrastructure.
In other cases, data centers may be assembled into mobile installations, such as shipping containers, also known
as data centers in a box, that can be moved and deployed.
However, data centers can be defined by different levels of reliability or flexibility, sometimes referred to as data
center tiers.
In 2005, the American National Standards Institute (ANSI) and the Telecommunications Industry Association (TIA)
published the standard ANSI/TIA-942, "Telecommunications Infrastructure Standards for Data Centers", which
defined four levels of data center design and implementation guidelines.
Each subsequent level aims to provide greater flexibility, security, and reliability than the previous level. For
example, a Tier I data center is little more than a server room, while a Tier IV data center provides redundant
subsystems and higher security.
Levels can be differentiated by available resources, data center capabilities, or uptime guarantees. The Uptime
Institute also publishes a widely used four-tier classification of data center reliability.
Datacenter designs must also implement sound safety and security practices. For example, security is often
reflected in the layout of doors and access corridors, which must accommodate the movement of large,
cumbersome IT equipment and allow employees to access and repair infrastructure.
Fire fighting is another major safety area, and the widespread use of sensitive, high-energy electrical and
electronic equipment precludes common sprinklers. Instead, data centers often use environmentally friendly
chemical fire suppression systems, which effectively starve fires of oxygen while minimizing collateral damage to
equipment. Comprehensive security measures and access controls are needed as the data center is also a core
business asset. These may include:
o badge access;
o biometric access control; and
o video surveillance.
These security measures can help detect and prevent employee, contractor, and intruder misconduct.
Because many service providers today offer managed services alongside their colocation facilities, the definition
of managed services becomes hazy, as all vendors market the term slightly differently. The important distinction
to make is:
o Colocation. The organization pays a vendor to place its hardware in a facility. The customer is paying for the
location alone.
o Managed services. The organization pays the vendor to actively maintain or monitor the hardware
through performance reports, interconnectivity, technical support, or disaster recovery.
What is the difference between Data Center vs. Cloud?
Cloud computing vendors offer similar features to enterprise data centers. The biggest difference between a
cloud data center and a typical enterprise data center is scale. Because cloud data centers serve many different
organizations, they can become very large. And cloud computing vendors offer these services through their data
centers.
Because enterprise data centers increasingly implement private cloud software, they increasingly appear to end
users like the services provided by commercial cloud providers.
Private cloud software builds on virtualization to deliver cloud-like services, including:
o system automation;
o user self-service; and
o billing/chargeback to data center administration.
The goal is to allow individual users to provision on-demand workloads and other computing resources without IT
administrative intervention.
Further blurring the lines between the enterprise data center and cloud computing is the development of hybrid
cloud environments. As enterprises increasingly rely on public cloud providers, they must incorporate
connectivity between their data centers and cloud providers.
For example, platforms such as Microsoft Azure emphasize hybrid use of local data centers with Azure or other
public cloud resources. The result is not the elimination of data centers but the creation of a dynamic
environment that allows organizations to run workloads locally or in the cloud or move those instances to or
from the cloud as desired.
Cloud
Cloud is a term used to describe a group of services, either a global or an individual network of servers, that
serve a particular function. The cloud is not a single physical entity; it is a group or network of remote servers
working together as a single unit for an assigned task.
In practice, the cloud runs in buildings full of computer systems (data centers), and we access it through the
Internet because cloud providers offer it as a service.
One common confusion is whether the cloud is the same as cloud computing. The answer is no. Cloud services
such as compute run in the cloud; the computing service offered by the cloud lets users 'rent' computer systems
in a data center over the Internet.
Another example of a cloud service is storage. AWS says, "Cloud computing is the on-demand delivery of IT
resources over the Internet with pay-as-you-go pricing. Instead of buying, owning, and maintaining physical data
centers and servers, you can access technology services, such as computing power, storage, and databases, from
a cloud provider such as Amazon Web Services (AWS)."
Types of Cloud:
Businesses use cloud resources in different ways. There are mainly four of them:
o Public Cloud: A cloud model open to anyone with an Internet connection on a pay-per-use basis.
o Private Cloud: A cloud model used by an organization to make its data centers accessible only with the
organization's permission.
o Hybrid Cloud: A cloud model that combines public and private clouds, catering to an organization's varied
service needs.
o Community Cloud: A cloud model that provides services to an organization or a group of people within a
single community.
Data Center
A data center can be described as a facility/location of networked computers and associated components (such
as telecommunications and storage) that help businesses and organizations handle large amounts of data. These
data centers allow data to be organized, processed, stored, and transmitted across applications used by
businesses.
The key differences between the cloud and a data center are:
1. The cloud is a virtual resource that helps businesses store, organize, and operate data efficiently, whereas a
data center is a physical resource that serves the same purpose.
2. Scaling in the cloud requires a relatively small investment, whereas scaling a data center requires a huge
investment compared to the cloud.
3. Cloud maintenance costs are lower because the service provider handles maintenance, whereas data center
maintenance costs are high because the organization's own developers perform it.
4. In the cloud, the organization must rely on third parties to store its data, whereas in a data center the
organization's own developers are trusted with the stored data.
5. Cloud performance is high relative to the investment, whereas data center performance is lower relative to
the investment.
6. The cloud requires a plan for optimizing its use, whereas a data center is easily customizable without any hard
planning.
7. The cloud requires a stable internet connection to provide its functions, whereas a data center may or may
not require an internet connection.
8. The cloud is easy to operate and is considered a viable option, whereas data centers require experienced
developers to operate and are not considered as viable an option.
Data centers are physical computing resources that allow organizations to operate their websites or digital
offerings 24/7. Data centers are generally made up of racks (servers are stacked with each other), cabinets,
cables, and many more. Maintaining a data center requires a significant amount of networking knowledge. We
can host our servers in these data centers on either a shared or a dedicated basis. The speed of a website hosted
in a data center usually depends on the server's hardware and specifications; an SSD-based server is faster and
more expensive than an HDD-based server.
Data Center: A dedicated space with strong security levels, where enterprises or organizations store and share
large amounts of data, is known as a data center.
Data Center Infrastructure Design:
Old Data Center Design:
It was mainly based on north-south traffic. The number of hops a packet needed to reach a server was not
predictable, and it could take a lot of time for a packet to travel from server to server. East-west traffic could not
be handled well.
Computing Environments
Computing environments refer to the technology infrastructure and software platforms that are used to develop,
test, deploy, and run software applications. There are several types of computing environments, including:
1. Mainframe: A large and powerful computer system used for critical applications and large-scale data
processing.
2. Client-Server: A computing environment in which client devices access resources and services from a
central server.
3. Cloud Computing: A computing environment in which resources and services are provided over the
Internet and accessed through a web browser or client software.
4. Mobile Computing: A computing environment in which users access information and applications using
handheld devices such as smartphones and tablets.
5. Grid Computing: A computing environment in which resources and services are shared across multiple
computers to perform large-scale computations.
6. Embedded Systems: A computing environment in which software is integrated into devices and products,
often with limited processing power and memory.
Each type of computing environment has its own advantages and disadvantages, and the choice of environment
depends on the specific requirements of the software application and the resources available.
In today's world of technology, where almost every task is performed with the help of computers, computers
have become a part of human life. Computing is the process of completing a task using computer technology, and
it may involve computer hardware and/or software. Computing uses some form of computer system to manage,
process, and communicate information.
Computing Environments: When a computer solves a problem, it uses many devices, arranged in different ways,
that work together. This constitutes a computing environment, in which a number of computer devices are
arranged in different ways to solve different types of problems. In different computing environments, computer
devices are arranged in different ways and exchange information with each other to process and solve problems.
A computing environment consists of many computers, other computational devices, software, and networks
that support processing, sharing information, and completing tasks. Based on the organization of the devices and
the communication processes involved, there are multiple types of computing environments, as listed above:
mainframe, client-server, cloud, mobile, grid, and embedded systems.
Cloud Programming
Cloud computing has revolutionized the way software applications are built, deployed, and managed. Cloud
programming involves several key aspects that developers must consider to ensure scalability, security,
performance, and reliability. The following are the crucial facets of cloud programming that shape modern cloud-
native applications.
1. Scalability and Elasticity
One of the fundamental advantages of cloud computing is its ability to scale resources dynamically based on
demand. Scalability refers to the system’s ability to handle increased workloads by adding resources. It can be of
two types: vertical scaling (scaling up/down) and horizontal scaling (scaling out/in). Vertical scaling increases
the capacity of a single machine (e.g., upgrading RAM or CPU), whereas horizontal scaling involves adding more
instances to distribute the workload.
On the other hand, elasticity is the ability of a system to automatically allocate or deallocate resources as
needed. Cloud platforms such as AWS, Azure, and Google Cloud provide auto-scaling features that adjust
resource allocation in real time based on traffic fluctuations. This ensures optimal performance without over-
provisioning, thereby reducing costs.
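As a small illustration of horizontal scaling arithmetic, the sketch below estimates how many instances are needed for a given request rate. The per-instance capacity figure is an assumption; real auto-scalers apply policies like this against live metrics.

# Sketch: estimating horizontal scale-out for a given load.
# The per-instance capacity is an illustrative assumption.
import math

def instances_needed(requests_per_second: float,
                     capacity_per_instance: float = 200.0,
                     minimum: int = 2) -> int:
    """Number of instances required to serve the load, never below a floor."""
    return max(minimum, math.ceil(requests_per_second / capacity_per_instance))

print(instances_needed(150))    # light traffic -> stays at the minimum of 2
print(instances_needed(1800))   # traffic spike -> scales out to 9 instances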
2. Multi-Tenancy and Resource Sharing
Cloud environments are designed to be multi-tenant, meaning multiple users (tenants) share the same physical
infrastructure while maintaining logical isolation. This approach maximizes resource utilization and cost
efficiency. However, it also brings challenges such as data security, resource contention, and tenant isolation.
Cloud providers implement virtualization, encryption, and access controls to ensure that each tenant’s data
remains private and secure.
A good example of multi-tenancy is Software as a Service (SaaS) platforms like Google Workspace and Microsoft
365, where multiple users share the same infrastructure but experience a personalized, isolated environment.
3. Cloud Service Models (IaaS, PaaS, SaaS, FaaS)
Cloud computing is categorized into different service models, each serving distinct purposes:
Infrastructure as a Service (IaaS): Provides virtualized computing resources such as virtual machines,
storage, and networking (e.g., AWS EC2, Google Compute Engine). Developers have complete control
over infrastructure but must manage system administration tasks.
Platform as a Service (PaaS): Offers a platform for application development without managing
underlying hardware or operating systems (e.g., Google App Engine, AWS Elastic Beanstalk). This allows
developers to focus on writing code rather than managing infrastructure.
Software as a Service (SaaS): Delivers applications over the internet without requiring installation or
maintenance (e.g., Dropbox, Gmail, Microsoft Teams). Users can access these services via web browsers
or mobile apps.
Function as a Service (FaaS) / Serverless Computing: Allows developers to execute code in response to
events without provisioning or managing servers (e.g., AWS Lambda, Azure Functions). This model is cost-
effective for event-driven applications, as billing is based on execution time rather than always-on
infrastructure.
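As a minimal sketch of the FaaS model, here is the shape of an AWS Lambda function in Python: the platform invokes the handler with an event and a context, and billing is tied to execution rather than to an always-on server. The "name" field below is an assumption about how the function might be invoked.

# Minimal AWS Lambda-style handler (Python runtime). The platform calls
# lambda_handler(event, context) when the configured event occurs.
import json

def lambda_handler(event, context):
    name = event.get("name", "world")      # assumed event shape, for illustration
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }

# Local test (outside the cloud): simulate an invocation.
if __name__ == "__main__":
    print(lambda_handler({"name": "cloud"}, None))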
4. Virtualization and Containerization
Virtualization is a key technology that enables cloud computing by allowing multiple virtual machines (VMs) to
run on a single physical server. Hypervisors like VMware ESXi, Microsoft Hyper-V, and KVM manage these VMs,
providing isolated environments for different applications.
However, a more lightweight alternative is containerization, which packages applications with their
dependencies in isolated units called containers. Docker is the most popular containerization tool, and
Kubernetes is widely used for orchestrating and managing containers at scale. Containers offer faster
deployment, lower overhead, and consistent environments across different cloud platforms.
5. Security and Compliance in Cloud Programming
Security is a major concern in cloud computing, as applications and data are hosted on shared infrastructure.
Cloud providers implement security mechanisms such as data encryption (at rest and in transit), identity and
access management (IAM), firewalls, and intrusion detection systems to protect sensitive information.
Additionally, cloud applications must comply with regulatory standards such as General Data Protection
Regulation (GDPR), Health Insurance Portability and Accountability Act (HIPAA), and ISO 27001. Compliance
ensures that data is handled securely and legally. Developers must follow security best practices, such as using
multi-factor authentication (MFA), role-based access control (RBAC), and regular security audits to minimize
risks.
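A toy sketch of role-based access control (RBAC): permissions are attached to roles rather than to individual users, and a request is allowed only if one of the caller's roles grants the needed permission. The roles and permission names below are illustrative assumptions.

# Toy role-based access control (RBAC) check; roles and permissions are illustrative.
ROLE_PERMISSIONS = {
    "viewer": {"storage:read"},
    "developer": {"storage:read", "compute:deploy"},
    "admin": {"storage:read", "storage:write", "compute:deploy", "iam:manage"},
}

def is_allowed(user_roles: list[str], permission: str) -> bool:
    """Grant access if any of the user's roles includes the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)

print(is_allowed(["developer"], "compute:deploy"))   # True
print(is_allowed(["viewer"], "storage:write"))       # False (least privilege)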
6. Cloud Storage and Data Management
Cloud storage services provide scalable and reliable solutions for storing application data. The three main types
of cloud storage are:
Object Storage: Data is stored as objects with metadata, making it ideal for unstructured data like
images, videos, and backups (e.g., Amazon S3, Google Cloud Storage).
Block Storage: Used for persistent storage in virtual machines and databases (e.g., AWS EBS, Azure
Managed Disks).
File Storage: Provides shared file storage for distributed applications (e.g., Google Filestore, Azure Files).
Cloud databases such as Amazon RDS, Google Cloud Spanner, and MongoDB Atlas offer managed solutions that
handle replication, backups, and scaling automatically, allowing developers to focus on application logic rather
than database administration.
7. Cloud Networking and Load Balancing
Cloud networking plays a crucial role in ensuring smooth communication between cloud resources. Cloud
providers offer services such as Virtual Private Clouds (VPCs), VPNs, and firewalls to manage network security
and connectivity.
Load balancing is used to distribute incoming traffic across multiple instances, preventing overload on a single
server and improving application reliability. Cloud-based load balancers (e.g., AWS Elastic Load Balancer, Azure
Load Balancer) automatically adjust traffic distribution based on server health and performance.
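A toy sketch of the round-robin idea behind load balancing: requests are spread across a pool of backend instances in turn. Managed load balancers add health checks, connection draining, and weighting, but the core distribution logic looks like this; the server addresses are placeholders.

# Toy round-robin load balancer over a pool of (placeholder) backend servers.
import itertools

class RoundRobinBalancer:
    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def next_backend(self) -> str:
        """Return the backend that should receive the next request."""
        return next(self._cycle)

balancer = RoundRobinBalancer(["10.0.1.11", "10.0.1.12", "10.0.1.13"])
for request_id in range(5):
    print(f"request {request_id} -> {balancer.next_backend()}")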
8. DevOps and CI/CD in Cloud Development
DevOps practices are widely adopted in cloud computing to streamline software development and deployment.
Continuous Integration (CI) and Continuous Deployment (CD) pipelines automate code testing and deployment,
reducing manual effort and improving software reliability.
Popular CI/CD tools include Jenkins, GitHub Actions, GitLab CI/CD, and AWS CodePipeline. These tools enable
faster software updates, reducing downtime and improving application performance.
9. Edge Computing and Hybrid Cloud
With the rise of Internet of Things (IoT) and real-time applications, edge computing has gained prominence.
Edge computing processes data closer to the source (e.g., IoT devices, local servers) instead of sending everything
to a centralized cloud. This reduces latency, bandwidth usage, and dependency on cloud availability.
Hybrid cloud is another approach where organizations use a mix of public cloud, private cloud, and on-premises
infrastructure. Hybrid solutions like AWS Outposts, Azure Arc, and Google Anthos enable seamless integration
between different environments, offering flexibility and control.
10. Programming Languages and Frameworks for Cloud Development
Developers use various languages and frameworks for cloud-native application development:
Python: Popular for cloud automation and machine learning (used in AWS Lambda, Google Cloud
Functions).
Java: Common for enterprise applications running in cloud environments (Spring Boot, Jakarta EE).
Node.js: Preferred for real-time applications and serverless computing.
Golang: Used for microservices and containerized applications (Docker, Kubernetes).
Frameworks like Serverless Framework, AWS SAM, and Terraform simplify cloud application deployment and
infrastructure management.
Distributed Computing
Distributed computing refers to a computing model where a problem is divided into multiple sub-problems that
are executed on different machines (nodes) in a networked environment. Each node works independently but
communicates with other nodes to complete the computation.
Characteristics of Distributed Computing
Workloads are divided among multiple independent machines.
No shared memory; nodes communicate via a network.
Ensures fault tolerance—if one node fails, others can continue processing.
Commonly used for cloud computing, big data processing, and blockchain technology.
Examples of Distributed Computing
Google Search Engine: Google’s indexing system distributes search queries across multiple servers.
Big Data Processing: Hadoop and Spark process terabytes of data across distributed nodes.
Microservices Architecture: Large applications are built using independent microservices running on
different servers.
2. Introduction to MapReduce
MapReduce is a distributed computing framework introduced by Google for processing large datasets across a
cluster of computers. It follows a divide-and-conquer approach by breaking down tasks into independent sub-
tasks that can be executed in parallel.
Why MapReduce?
Processes massive datasets efficiently.
Runs on clusters of commodity hardware, reducing costs.
Fault-tolerant: If a node fails, the task is reassigned.
Provides a simple programming model for distributed data processing.
Architecture of MapReduce
MapReduce consists of two main functions:
1. Map Function: Divides the input data into smaller chunks and processes them in parallel.
2. Reduce Function: Aggregates and combines the mapped output to generate the final result.
The execution process is managed by a Master Node, which assigns tasks to multiple Worker Nodes that execute
the Map and Reduce functions.
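To make the Map and Reduce functions concrete, here is a single-machine word-count sketch that mirrors the model: a map step emits (word, 1) pairs, a shuffle groups them by key, and a reduce step sums each group. A real framework such as Hadoop runs these phases in parallel across worker nodes; this is only an illustration of the pattern.

# Single-machine sketch of the MapReduce word-count pattern.
from collections import defaultdict

def map_phase(document: str):
    """Emit an intermediate (key, value) pair for every word."""
    for word in document.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Aggregate all values for one key into the final result."""
    return key, sum(values)

documents = ["the cloud scales", "the cloud stores data", "data drives the cloud"]
intermediate = [pair for doc in documents for pair in map_phase(doc)]
results = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(results)   # e.g. {'the': 3, 'cloud': 3, ...}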
5. Applications of MapReduce
MapReduce is widely used in cloud computing, big data, and distributed systems for:
Big Data Analytics: Processing large datasets in platforms like Hadoop.
Search Engines: Google uses MapReduce for indexing web pages.
Log Analysis: Companies analyze massive logs from servers using MapReduce.
Bioinformatics: DNA sequencing and genome analysis require massive data processing, which is done
using MapReduce.
MapReduce Architecture
MapReduce and HDFS (Hadoop Distributed File System) are the two major components of Hadoop that make it so
powerful and efficient to use. MapReduce is a programming model used for efficient parallel processing of large
data sets in a distributed manner. The data is first split and then combined to produce the final result. Libraries
for MapReduce have been written in many programming languages, each with different optimizations. The
purpose of MapReduce in Hadoop is to map each job and then reduce it to equivalent tasks, reducing overhead on
the cluster network and the processing power required. The MapReduce task is mainly divided into two phases:
the Map phase and the Reduce phase.
MapReduce Architecture:
In MapReduce, we have a client. The client submits a job of a particular size to the Hadoop MapReduce Master.
The MapReduce Master then divides this job into further equivalent job parts. These job parts are then made
available to the Map and Reduce tasks. The Map and Reduce tasks contain the program required for the use case
that the particular company is solving; the developer writes the logic to fulfil the industry's requirement. The
input data is fed to the Map task, and the Map generates intermediate key-value pairs as its output. The output of
the Map, i.e. these key-value pairs, is then fed to the Reducer, and the final output is stored on HDFS. There can
be any number of Map and Reduce tasks made available for processing the data as per the requirement. The Map
and Reduce algorithms are written in an optimized way so that time and space complexity are kept to a minimum.
Let’s discuss the MapReduce phases to get a better understanding of its architecture:
The MapReduce task is mainly divided into 2 phases i.e. Map phase and Reduce phase.
1. Map: As the name suggests, its main use is to map the input data into key-value pairs. The input to the map
may itself be a key-value pair, where the key can be the ID of some kind of address and the value is the actual
data it holds. The Map() function is executed in its memory repository on each of these input key-value pairs
and generates intermediate key-value pairs, which work as input for the Reducer or Reduce() function.
2. Reduce: The intermediate key-value pairs that serve as input to the Reducer are shuffled, sorted, and sent to
the Reduce() function. The Reducer aggregates or groups the data based on its key-value pairs as per the
reducer algorithm written by the developer.
How Job tracker and the task tracker deal with MapReduce:
1. Job Tracker: The Job Tracker manages all the resources and all the jobs across the cluster and schedules each
Map task on a Task Tracker running on the same data node, since there can be hundreds of data nodes
available in the cluster.
2. Task Tracker: The Task Trackers can be considered the workers that act on the instructions given by the Job
Tracker. A Task Tracker is deployed on each node in the cluster and executes the Map and Reduce tasks as
instructed by the Job Tracker.
There is also one important component of the MapReduce architecture known as the Job History Server. The Job
History Server is a daemon process that saves and stores historical information about tasks or applications, such
as the logs generated during or after job execution.
Hadoop – Architecture
Hadoop is a framework written in Java that utilizes a large cluster of commodity hardware to store and process very large amounts of data. Hadoop works on the MapReduce programming model, which was introduced by Google. Today many big-brand companies use Hadoop in their organizations to deal with big data, e.g. Facebook, Yahoo, Netflix, and eBay. The Hadoop architecture mainly consists of 4 components:
1. MapReduce
2. HDFS(Hadoop Distributed File System)
3. YARN(Yet Another Resource Negotiator)
4. Common Utilities or Hadoop Common
1. MapReduce
MapReduce can be thought of as an algorithm-like programming model that runs on top of the YARN framework. The major feature of MapReduce is that it performs distributed processing in parallel across a Hadoop cluster, which is what makes Hadoop so fast; when you are dealing with big data, serial processing is no longer of any use. MapReduce has two tasks, divided phase-wise: in the first phase Map is used, and in the next phase Reduce is used.
Here, the input is provided to the Map() function, its output is then used as the input to the Reduce() function, and after that we receive our final output. Let's understand what Map() and Reduce() do.
The input provided to Map() is a set of data; since we are dealing with big data, it arrives as data blocks. The Map() function breaks these data blocks into tuples, which are nothing but key-value pairs. These key-value pairs are then sent as input to Reduce(). The Reduce() function combines the tuples by key into sets and performs some operation on them, such as sorting or a summation-type job, which is then sent to the final output node. Finally, the output is obtained.
The data processing done in the Reducer always depends on the business requirement of that industry. This is how Map() is used first and then Reduce(), one after the other.
Map Task:
RecordReader: The purpose of the RecordReader is to break the input into records. It is responsible for providing key-value pairs to the Map() function; the key is the record's location information (such as its byte offset) and the value is the data associated with it.
Map: The map is a user-defined function whose job is to process the tuples obtained from the RecordReader. For each input, the Map() function may generate no key-value pairs at all or may generate multiple pairs.
Combiner: The combiner is used for grouping the data in the map workflow; it is similar to a local reducer. The intermediate key-value pairs generated by the map are pre-aggregated with the help of this combiner. Using a combiner is optional.
Partitioner: The partitioner is responsible for routing the key-value pairs generated in the map phase. It produces the shards corresponding to each reducer: by default it takes the hash code of each key and computes its modulus with the number of reducers (key.hashCode() % numberOfReducers), as in the sketch below.
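A minimal sketch of this default behaviour, written against Hadoop's org.apache.hadoop.mapreduce.Partitioner API (this is essentially what the built-in HashPartitioner does; the class name here is illustrative):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Assigns each intermediate (key, value) pair to a reducer shard by hashing the key.
public class WordCountPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReducers) {
        // Mask off the sign bit so the modulus result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }
}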
Reduce Task:
Shuffle and Sort: The reducer's work starts with this step. The process in which the mappers' intermediate key-value pairs are transferred to the reducer tasks is known as shuffling; during the shuffling process the system also sorts the data by key.
Shuffling begins as soon as some of the map tasks are done rather than waiting for all mappers to complete, which makes the overall process faster.
Reduce: The main task of Reduce is to gather the tuples generated by Map and then perform aggregation and any sorting-type processing on those key-value pairs, grouped by key.
OutputFormat: Once all the operations are performed, the key-value pairs are written into the output file with the help of the RecordWriter, each record on a new line, with the key and value separated by a delimiter (a tab by default). The driver sketch below shows how all of these pieces are wired together into a job.
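To show how the mapper, combiner, partitioner, reducer, and output format fit together, here is a driver sketch using Hadoop's standard Job API. WordCountMapper, WordCountReducer, and WordCountPartitioner are the illustrative classes sketched earlier, and the input/output paths are placeholders taken from the command line.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");

        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);        // Map task
        job.setCombinerClass(WordCountReducer.class);     // optional local reducer
        job.setPartitionerClass(WordCountPartitioner.class);
        job.setReducerClass(WordCountReducer.class);      // Reduce task
        job.setNumReduceTasks(2);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setOutputFormatClass(TextOutputFormat.class); // RecordWriter writes one key/value per line

        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory must not already exist

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}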
2. HDFS
HDFS (Hadoop Distributed File System) is used for storage. It is mainly designed to work on commodity hardware devices (inexpensive devices), using a distributed file system design. HDFS is designed in such a way that it prefers storing data in a few large blocks rather than in many small blocks.
HDFS provides fault tolerance and high availability to the storage layer and the other devices present in the Hadoop cluster. The data storage nodes in HDFS are:
1. NameNode(Master)
2. DataNode(Slave)
NameNode: The NameNode works as the master in a Hadoop cluster and guides the DataNodes (slaves). The NameNode is mainly used for storing the metadata, i.e. the data about the data. Metadata can be the transaction logs that keep track of user activity in the Hadoop cluster.
Metadata can also be the file name, file size, and the location information (block numbers, block IDs) of the DataNodes; the NameNode stores this to find the closest DataNode for faster communication. The NameNode instructs the DataNodes to perform operations such as create, delete, and replicate.
DataNode: DataNodes work as slaves and are mainly used for storing the data in a Hadoop cluster; the number of DataNodes can range from one to 500 or even more. The more DataNodes there are, the more data the Hadoop cluster can store, so it is advised that DataNodes have high storage capacity to hold a large number of file blocks.
File Block in HDFS: Data in HDFS is always stored in terms of blocks. A single file is divided into multiple blocks of 128 MB each, which is the default size, and this can also be changed manually.
Let's understand this concept of breaking a file into blocks with an example. Suppose you upload a 400 MB file to HDFS; the file is divided into blocks of 128 MB + 128 MB + 128 MB + 16 MB = 400 MB, i.e. 4 blocks are created, each of 128 MB except the last one. Hadoop neither knows nor cares what data is stored in these blocks, so it may treat the final block of a file as containing a partial record, since it has no idea of the record boundaries. In a typical Linux file system a block is about 4 KB, which is far smaller than the default block size in the Hadoop file system. Hadoop is mainly configured for storing very large data, on the petabyte scale; this is what makes the Hadoop file system different from other file systems, since it can be scaled, and nowadays block sizes of 128 MB to 256 MB are common in Hadoop.
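As a quick illustration of the block arithmetic above, here is a small stand-alone Java sketch (not part of Hadoop itself) that splits a file size into HDFS-style blocks:

import java.util.ArrayList;
import java.util.List;

public class BlockSplitDemo {
    // Returns the sizes (in MB) of the blocks a file of the given size would occupy.
    static List<Long> blockSizes(long fileSizeMb, long blockSizeMb) {
        List<Long> blocks = new ArrayList<>();
        long remaining = fileSizeMb;
        while (remaining > 0) {
            blocks.add(Math.min(blockSizeMb, remaining)); // the last block may be smaller
            remaining -= blockSizeMb;
        }
        return blocks;
    }

    public static void main(String[] args) {
        // A 400 MB file with the default 128 MB block size -> [128, 128, 128, 16]
        System.out.println(blockSizes(400, 128));
    }
}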
Replication in HDFS: Replication ensures the availability of the data. Replication means making copies of something, and the number of copies made of a particular block is its replication factor. As we saw with file blocks, HDFS stores the data in the form of blocks, and Hadoop is also configured to make copies of those file blocks.
By default, the replication factor in Hadoop is set to 3, and it can be changed manually as per your requirement. In the example above we had 4 file blocks, so with 3 copies of each block a total of 4 × 3 = 12 blocks are stored across the cluster for redundancy.
This is because Hadoop runs on commodity hardware (inexpensive system hardware) that can crash at any time; we are not using supercomputers for our Hadoop setup. That is why HDFS needs a feature that makes copies of the file blocks for backup purposes; this is known as fault tolerance. Note that by making so many replicas of the file blocks we use a lot of extra storage, but for large organizations the data is far more important than the storage, so nobody minds the overhead. You can configure the replication factor in your hdfs-site.xml file, for example as shown below.
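A minimal example of such a configuration (the property names dfs.replication and dfs.blocksize are standard HDFS settings; the values shown are only illustrative):

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>            <!-- number of copies kept of each block -->
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>    <!-- 128 MB block size, expressed in bytes -->
  </property>
</configuration>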
Rack Awareness: A rack is simply a physical collection of nodes in our Hadoop cluster (perhaps 30 to 40 machines). A large Hadoop cluster consists of many racks. With the help of this rack information the NameNode chooses the closest DataNode while performing reads and writes, which maximizes performance and reduces network traffic.
HDFS Architecture
3. YARN(Yet Another Resource Negotiator)
YARN is the framework on which MapReduce works. YARN performs two operations: job scheduling and resource management. The purpose of the job scheduler is to divide a big task into small jobs so that each job can be assigned to various slaves in the Hadoop cluster and processing can be maximized. The job scheduler also keeps track of which job is important, which job has higher priority, the dependencies between jobs, and other information such as job timing. The resource manager's role is to manage all the resources that are made available for running the Hadoop cluster.
Features of YARN
1. Multi-Tenancy
2. Scalability
3. Cluster-Utilization
4. Compatibility