
IIT Jammu Case Study

Netweb has established a state-of-the-art data center at IIT Jammu, designed to support high-performance computing, big data, AI, and cloud infrastructure for researchers across the Jammu & Kashmir region. The facility boasts 450 Teraflops of computing power, integrating diverse user requirements into a single platform while optimizing energy efficiency. This deployment enhances IIT Jammu's research capabilities in various scientific disciplines and is built for future scalability.


Case Study

High Performance Computing

Netweb Deploys a Complete Data Center at IIT Jammu
Jammu & Kashmir’s first state-of-the-art facility is designed with a unique, high-performance blend of HPC, Big Data, AI, and Cloud Infrastructure. The facility is established with a mandate to help not just IIT Jammu’s own research scholars but scholars and faculty members from the entire Jammu & Kashmir region. This is a stepping stone towards ushering in a high-end technology era for Jammu & Kashmir.

OVERVIEW
Research facilities are expected to deliver HPC systematically and reliably to keep pace with the unprecedented levels of computation required to gather, evaluate, and move voluminous data. The increase in data generated by researchers at IIT Jammu tasked them with facilitating optimal performance while simultaneously minimizing power usage by computing systems and maximizing the efficiency of their cooling processes.

PROJECT DRIVERS
IIT Jammu wanted a facility that can deliver the highest performance, a technology platform that can meet their computational needs and, last but not least, is economical and compact in design.

Removing operating silos: When the customer reviewed what the end-state target data model should look like, it became evident that they wanted to remove any operating silos between the user groups.

The main objective was to give different user groups, such as parallel-computation users, AI/Machine Learning users, and general-purpose computing users, an avenue to not only learn together but also learn cross-platform architectures.

Providing data seamlessly to all users became a prime motivation for adopting a hybrid model that would also allow us to converge these applications and capitalize on the benefits of all the platforms together.

Building a single platform that could serve CPU-intensive, GPU-intensive and large-memory computational needs was a key driver for the project.

“The overall efficiency of the cluster was highly optimized by Netweb HPC experts, which led to higher performance. Even the sustained storage throughput/performance surpassed the criteria we
Dr. Sanat Tiwari
Assistant Professor, Dept. of Physics
IIT Jammu

Industry and Country
Education & Research, India

Products and Services
HPC Cluster, Big Data Cluster, Private Cloud, Parallel File System

Key Challenges
• Designing a Data Center with the lowest hardware footprint yet highest performance
• Integrating cloud & Big Data facilities with existing networks
• Building a single platform that could serve CPU-intensive, GPU-intensive and large-memory computational needs

Results
• Computing power of 450 TF
• 256 TF of peak performance from CPUs and 56 TF from GPUs
• 48 TB of total memory
• Data Center PUE of 1.4
THE NEW DATA CENTER

Established in 2016, IIT Jammu is an Institute of National Importance that promotes higher education and cutting-edge R&D. Since then, the idea of setting up a large-scale computational facility has been evolving. The challenges were to integrate diverse user requirements, campus networking, and IT services in a single infrastructure for optimized usage and performance. Netweb designed the entire data centre and compute infrastructure to deliver high performance while achieving a low PUE level.

The final design was a complete Data Centre with an HPC Cluster, a Big Data Cluster, a Private Cloud for IIT Jammu and a High Throughput Parallel File System. The HPC consists of 80 compute nodes, each with a 20-core Intel processor. Each core runs at 2.50 GHz and has 32 GB memory. The system provides about 450 TFlops of computing power.

IIT Jammu Data Center - Agastya at-a-glance
• 450 TF Compute Power
• 256 TF of Peak Performance from CPUs
• 56 TF of Peak Performance from GPUs
• 48 TB of Total Memory
• Scalable low latency interconnect
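
As a rough illustration of how a theoretical CPU peak is usually estimated, the sketch below multiplies node count, sockets, cores, clock speed and floating-point operations per cycle. The node count, core count and clock speed come from this section. The socket count and the FLOPs-per-cycle value are assumptions that the case study does not state (two sockets per node, and 32 double-precision FLOPs per cycle per core, as on processors with two AVX-512 FMA units). Under those assumptions the arithmetic lands on the 256 TF CPU figure quoted here, but this is not a confirmed description of the hardware.

```python
# Theoretical peak = nodes x sockets x cores x clock x FLOPs per cycle.
NODES = 80                # from the case study
CORES_PER_SOCKET = 20     # from the case study ("20-core Intel processor")
CLOCK_HZ = 2.50e9         # from the case study (2.50 GHz)
SOCKETS_PER_NODE = 2      # ASSUMPTION: not stated in the case study
FLOPS_PER_CYCLE = 32      # ASSUMPTION: double precision, two AVX-512 FMA units per core


def cpu_peak_tflops(nodes, sockets, cores, clock_hz, flops_per_cycle):
    """Return the theoretical double-precision peak in teraflops."""
    return nodes * sockets * cores * clock_hz * flops_per_cycle / 1e12


peak_tflops = cpu_peak_tflops(
    NODES, SOCKETS_PER_NODE, CORES_PER_SOCKET, CLOCK_HZ, FLOPS_PER_CYCLE
)
print(f"Theoretical CPU peak: {peak_tflops:.0f} TF")
```

Changing either assumed value scales the result linearly, so the estimate is only as good as those two inputs.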
CROSS PLATFORM HYBRID ARCHITECTURE
IIT Jammu wanted an infrastructure with scalable resources to run HPC, AI and Big Data applications and to run workloads beyond the limitations of physical servers. This gave Netweb the opportunity to deploy state-of-the-art Private Cloud, HPC and Big Data clusters.

When we reviewed what the end-state target data model should look like, it became evident that we wanted to remove operating silos between different user groups and provide data seamlessly to all users. All this became a prime motivation for adopting a hybrid model.

BIG DATA & AI DEPLOYMENT
Netweb delivered a fault-tolerant and scalable Hadoop system that efficiently oversees the storage and retrieval of data across the cluster. The Big Data centre includes Hadoop, Hive, Pig, Spark, etc. This cloud facility uses Kubernetes container orchestration with the Juju container management tool. Ganglia cloud monitoring is used to monitor the health of VMs and services.

Netweb included 8 nodes of dual NVIDIA Tesla V100 GPUs. The acceleration capability of the GPU component enables our researchers to carry out much larger simulations. We used a container platform from Tyrone to orchestrate various HPC and AI containers over the GPU systems.
[Figure: Agastya architecture diagram. Labels: Big Data, Cloud, Virtualised HPC, GPU Suite, 3 x Data Nodes, Containers, VDI, Parallel File System (PFS), GPU HPC, 80 x Nodes, 8 x Nodes, Tyrone Cluster Manager (TCM). Caption: HPC | Big Data | Private | Parallel File System]

Integral design elements of Agastya
• An efficient, up-to-date HPC compute infrastructure that has 256 Tflops of peak performance from CPUs and approx. 56 Tflops from GPUs
• A scalable low latency interconnect which currently runs at 100 Gbps and is scalable to 200 Gbps in future
• A scalable 500TB+ Parallel File System with sustained performance of more than 30 GBps
• A unified cluster management utility that eases overall administration
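
To put the interconnect and file system figures on the same scale, the sketch below converts the 100 Gbps link speed into gigabytes per second and estimates how many such links would be needed to carry the 30 GBps sustained file system throughput. It assumes decimal units (8 bits per byte, no protocol or encoding overhead), which the case study does not specify, so treat the result as an idealised lower bound and not as a description of the actual network design.

```python
import math

LINK_GBIT_PER_S = 100    # from the case study: current interconnect speed
PFS_GBYTE_PER_S = 30     # from the case study: sustained PFS throughput ("more than 30GBps")

# ASSUMPTION: 8 bits per byte and zero protocol overhead.
link_gbyte_per_s = LINK_GBIT_PER_S / 8

# Smallest whole number of links whose combined bandwidth covers the PFS rate.
links_needed = math.ceil(PFS_GBYTE_PER_S / link_gbyte_per_s)

print(f"One link carries {link_gbyte_per_s} GB/s")
print(f"Links needed for {PFS_GBYTE_PER_S} GB/s: {links_needed}")
```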
TYRONE CLUSTER MANAGER

Tyrone Cluster Manager (TCM) is a complete management suite that lets you deploy complete clusters and manage them effectively.

TCM has given IIT Jammu the flexibility to support a large set of workloads, and it provisioned the Big Data cluster with GPU support so that it can be used for both Big Data and Deep Learning. It is a unified cluster management utility that eases overall administration.

Management software for Cluster Management and Cloud

CLOUD DEPLOYMENT
The private cloud facility uses Tyrone Skylus, our hyper-converged cloud suite based on OpenStack, which is customised to integrate compute, storage and networking at various levels of granularity.

We run some mandatory cloud services for IIT Jammu, including compute services for VMs, bare metal compute and container provisioning, and storage services for object, block and file. VMs can be created with basic cloud features and compilers, or with advanced cloud features and pre-installed images of Deep Learning frameworks.

Tyrone Skylus also provides us with a container platform to orchestrate various HPC and AI containers over the GPU systems.

POWER CONSUMPTION, SPACE, & COST
IIT Jammu has continually monitored energy use since the launch of the data center, using power usage effectiveness (PUE) as the primary metric. Designing a Data Center with the lowest hardware footprint yet highest deliverable performance was a challenge. During execution, the Netweb and IIT teams came up with optimized containment designs, which benefited us in two ways:
• Data centre containment provides us with future scalability
• An improved (lower) PUE
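
PUE is conventionally defined as total facility power divided by the power drawn by the IT equipment, so a value closer to 1.0 means less overhead for cooling and power distribution. The sketch below shows that arithmetic for the PUE of 1.4 reported in this case study. The 100 kW IT load is a hypothetical figure chosen only for illustration; the case study does not give the actual load.

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power usage effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_kw(it_equipment_kw, pue_value):
    """Non-IT power (cooling, power distribution, etc.) implied by a given PUE."""
    return it_equipment_kw * (pue_value - 1.0)


REPORTED_PUE = 1.4    # from the case study
IT_LOAD_KW = 100.0    # HYPOTHETICAL load, for illustration only

extra = overhead_kw(IT_LOAD_KW, REPORTED_PUE)
print(f"Overhead at PUE {REPORTED_PUE}: {extra:.1f} kW per {IT_LOAD_KW:.0f} kW of IT load")
print(f"Round trip: PUE = {pue(IT_LOAD_KW + extra, IT_LOAD_KW):.2f}")
```

In other words, at a PUE of 1.4 every watt delivered to the IT equipment costs roughly 0.4 W of additional facility power.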

Other Key Features

• Virtualization of HPC applications enabled both applications and test/learning codes to reside on VMs. Netweb configured these VMs for quick deployment such that we can provision VMs of different sizes quickly, with some VMs as large as bare metal systems, allowing us to maximise our HPC system.

• VDI and remote desktop feature - We use the remote desktop feature, where users can log in to their desktops and use any part of the technology stack as per their usage rights, such as HPC, GPU or Big Data servers and applications.

• Tyrone Skylus Cloud Suite - Delivers SDN and SDS and virtualizes compute and GPUs. With the help of Skylus, users have the flexibility to create virtual workstations with multi-GPU support and virtual AI clusters with CPU and GPU.

“Needless to say, Netweb has a strong backbone in R&D, which helped us to synchronize quickly. The team of experts worked with us tirelessly to meet our aggressive timelines in spite of COVID challenges. Combining HPC and big data in a single environment is a huge task. But Netweb, being an experienced integrator, understood our team’s requirements for desirable results.”
Dr. Manoj Gaur
Director - IIT Jammu

THE CONVERGENCE OF HPC & AI
Experts have observed a 40% reduction in error rates when 10x more data is being used in coordination with AI.
Key Components of the Data Center - AGASTYA

[Figure: rack layout. Labels: HPC master node, HPC secondary node, 40 Nodes, 30 Nodes, 12 + Storage]

HPC
• 2 x Master Node
• 72 x Compute Nodes
• 8 x GPU Nodes NVIDIA Tesla V100
• 4 x HPC Storage
• 5 x IB Switch (IB-L1), 6 x Ethernet Switch

Big Data (Hadoop)
• 3 x Name Node
• 6 x Data Node (CPU-CPU)
• 3 x Data Node (CPU-GPU)
• 3 x Storage Expansion enclosures

Cloud
• 3 x Infrastructure Node
• 2 x Cloud Manager Node
• 3 x Converged Cloud Node
• 3 x Converged Cloud Node GPU
• 2 x Fibre Switch, 2 x Ethernet Switch

HPC Storage
• 4 x Lustre parallel file system
• Up to 90GB/sec throughput per appliance

Management Software
• Tyrone Skylus
• Tyrone Cluster Manager (TCM)
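
The component counts in this section can be captured in a small data structure and summed as a consistency check. The sketch below is illustrative only: the dictionary layout and the names `INVENTORY` and `subsystem_totals` are hypothetical, and switches are left out so that only nodes, storage units and enclosures are counted. One thing it shows is that the 72 compute nodes plus 8 GPU nodes add up to the 80 compute nodes mentioned earlier in the case study.

```python
# Counts copied from the key-components list; switches are deliberately omitted.
INVENTORY = {
    "HPC": {
        "Master Node": 2,
        "Compute Node": 72,
        "GPU Node (NVIDIA Tesla V100)": 8,
        "HPC Storage": 4,
    },
    "Big Data (Hadoop)": {
        "Name Node": 3,
        "Data Node (CPU-CPU)": 6,
        "Data Node (CPU-GPU)": 3,
        "Storage Expansion Enclosure": 3,
    },
    "Cloud": {
        "Infrastructure Node": 3,
        "Cloud Manager Node": 2,
        "Converged Cloud Node": 3,
        "Converged Cloud Node GPU": 3,
    },
}


def subsystem_totals(inventory):
    """Sum the unit counts of each subsystem."""
    return {name: sum(parts.values()) for name, parts in inventory.items()}


hpc = INVENTORY["HPC"]
hpc_compute_nodes = hpc["Compute Node"] + hpc["GPU Node (NVIDIA Tesla V100)"]
totals = subsystem_totals(INVENTORY)

print(f"HPC compute + GPU nodes: {hpc_compute_nodes}")
for name, count in totals.items():
    print(f"{name}: {count} units")
```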

RESULTS

Deployed in record time amidst the pandemic, Netweb delivered a full-scale mixed workload environment with HPC, Big Data and cloud, along with a complete software stack with containers, micro-service-based applications, and Deep Learning frameworks in a cost-efficient, shared infrastructure. AI was added into the mix as an enabler for HPC and Big Data at scale. This provided end-to-end efficiency for applications in a distributed environment, on-premises and in the cloud.

The Data Center gives IIT Jammu researchers an enormous leap in compute capability. Agastya now offers a peak performance of 256 Teraflops from its CPU compute nodes and 56 Teraflops from its GPU compute nodes, with 48 terabytes of total memory. The overall HPC facility will have a peak computing power of 450 Teraflops.

This deployment will substantially expand IIT Jammu’s ability to carry out research across a broad range of scientific disciplines such as astrophysics, computational chemistry, molecular dynamics, AI, Machine Learning and Deep Learning. The scalable design is also capable of handling years of future technology upgrades, thus increasing agility and the facility’s overall lifespan.

Netweb has a reputation for excellence in deploying supercomputing resources for education and research institutes to help them address difficult research challenges.


© 2021 Netweb. All rights reserved. Netweb & NARL logo are registered trademarks
All third-party trademarks are the property of their respective owners.
