
Generative AI Inferencing with Cisco UCS X-Series-based Converged Infrastructure

John George, TME
@johnrgeorge
BRKCOM-1004

#CiscoLive
Cisco Webex App

Questions?
Use the Cisco Webex App to chat with the speaker after the session.

How
1 Find this session in the Cisco Live Mobile App
2 Click “Join the Discussion”
3 Install the Webex App or go directly to the Webex space
4 Enter messages/questions in the Webex space

Webex spaces will be moderated by the speaker until June 7, 2024.

https://ciscolive.ciscoevents.com/ciscolivebot/#BRKCOM-1004

#CiscoLive BRKCOM-1004 © 2024 Cisco and/or its affiliates. All rights reserved. Cisco Public 2
Agenda
• Intro to Generative AI Inferencing
• Cisco UCS X-Series with GPUs
• FlexPod as a Platform for Generative AI Inferencing
• FlashStack as a Platform for Generative AI Inferencing
• Generative AI Inferencing Examples
• Retrieval Augmented Generation (RAG)

What are we building for?

Source data
• Text, code, image, audio, video, structured data, signals

Training in the cloud
• Foundational models: large language models, computer vision models, generative models
• Very large models built by a few vendors on very large clusters, typically on public cloud

Model hubs
• Hugging Face, ModelHub, NVIDIA NGC, others
• Most customers start here to leverage existing foundational models, integrating their own data to fine-tune the model and deploy it for inference

The goal of the Cisco AI framework is to deliver an “AI ready stack” that integrates and automates compute, network, storage, AI models, and AI tools to get customers started on their AI journey.

Gen AI infrastructure mapping

[Figure: infrastructure required, ranging from CPU only through small GPU counts and CPU/GPU clusters up to 1000+ GPUs, plotted against number of parameters (100M, 1Bil, 10Bil, 100Bil, 100+Bil) across edge and data center (DC) deployments, for three workloads:]
• Inferencing (most GenAI customers are here)
• Fine-tuning (leveraging foundation models)
• Training (build models from scratch)
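
The relationship between parameter count and GPU count in the mapping above can be approximated with simple arithmetic. The following is a minimal sketch, not a sizing method from the session: it assumes that weight memory is parameters times bytes per parameter, and that a multiplier (`overhead`, a hypothetical value of 1.2) covers KV cache and activations. The function names and the 80 GB GPU size used in the example are illustrative assumptions.

```python
import math

# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Memory (GB, 1 GB = 1e9 bytes) needed to hold the model weights alone."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def min_gpus(num_params: float, gpu_memory_gb: float,
             precision: str = "fp16", overhead: float = 1.2) -> int:
    """Smallest GPU count whose combined memory fits weights plus overhead.

    `overhead` is an assumed multiplier for KV cache and activations,
    not a measured value.
    """
    needed_gb = weight_memory_gb(num_params, precision) * overhead
    return max(1, math.ceil(needed_gb / gpu_memory_gb))


if __name__ == "__main__":
    # Two illustrative model sizes against a GPU with 80 GB of memory.
    for name, params in [("13B", 13e9), ("70B", 70e9)]:
        print(name, round(weight_memory_gb(params), 1), "GB of weights,",
              min_gpus(params, gpu_memory_gb=80), "x 80 GB GPU(s)")
```

Under these assumptions a 13-billion-parameter model at FP16 needs about 26 GB for weights, which is consistent with the figure placing models of that size in the low single-digit GPU range for inferencing. Real deployments also depend on batch size, context length, and the serving framework.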
Cisco Validated Designs (CVDs) Covered
• FlexPod Datacenter with Generative AI Inferencing
• FlashStack for Generative AI Inferencing Design Guide

Cisco UCS X-Series and 5th Generation Fabric

[Figure: hardware diagram showing the following components]
• Cisco UCS 6536 Fabric Interconnect
• Cisco UCS X9108 IFM (25G and 100G)
• Cisco UCS X9508 Chassis
• Cisco UCS 9416 X-Fabric
• Cisco UCS X210c M7 Server
• Cisco UCS X440p PCIe Node
X-Series current portfolio

Compute
X210c Compute Node
• 2-socket, single-slot servers
• Two generations: M6 and M7
• Intel 3rd Gen (Ice Lake) and 4th Gen (Sapphire Rapids) Xeon CPUs
X410c Compute Node
• 4-socket, dual-slot servers
• Intel 4th Gen Xeon CPU
• Up to 64 DDR5 DIMMs

Fabric
4th and 5th Gen FI
• 25/100G ports
• Unified ports: up to 16x 32G FC ports (6536)
• Supports VIC 1400, 14000 and 15000 series
25/100G IFM
• 8 x 25/100G connectivity
4th and 5th Gen VIC
• 25/100G connectivity for both blades and racks

X-Fabric and PCIe node
X-Fabric
• Based on native PCIe Gen. 4
• Provides GPU acceleration to enterprise applications
• No backplane or cables = easy upgrades
GPU Node and Front Mezz GPUs
• Nvidia A16, Nvidia L40, Nvidia L4 and Nvidia H100 GPUs today in various configurations
Cisco UCS X210c M7 Supported GPUs
• Front Mezz Supported GPUs
• 1 NVIDIA T4
• 1 Intel Flex 140

Cisco UCS X210c M7 Supported GPUs
• X440p Supported GPUs
• 2 NVIDIA A16
• 2 NVIDIA A40 RTX
• 2 NVIDIA A100-80*
• 2 NVIDIA H100
• 4 NVIDIA L4
• 2 NVIDIA L40
• 2 NVIDIA L40S*
• 4 Intel Flex 140
• 2 Intel Flex 170

FlexPod Family Components

FlexPod with Generative AI Physical Topology

FlexPod Deployment with Ansible Automation

Deploy FlexPod
• Deploy IP-Based FlexPod Using Ansible, Deploying at Least 6
Servers
• Leaving 3 Servers for the Infra Tenant, Create an OCP VMware
Cluster and Move the OCP Servers to this Cluster
• Manage the OCP Cluster with a Single Image and Add the NVAIE
Driver to this Image if Mapping vGPUs to OCP

Add OCP Tenant to FlexPod
Multi-Tenant Design
• Determine Tenant VLANs and Add VLANs and VRF to Cisco Nexus
Switches
• Add VLAN Interfaces, Broadcast Domains, Tenant SVM, and Logical
Interfaces (LIFs) for Storage and Management to NetApp Storage
• Add Tenant VLANs to Cisco IMM UCS Domain Profiles and vNIC
Interfaces in Server Profiles
• Add VMware vDS Port Groups for Tenant VLANs and VMKs for
NVMe-TCP
• Use NetApp ONTAP Tools to Create NFS OCP Datastores

ESXi Network Configuration – OCP Tenant

Deploy Red Hat OCP
• Deploy OCP DNS/DHCP Servers in FlexPod Infra Tenant
• Deploy OCP Installer VM in FlexPod Infra Tenant
• From OCP Installer, Install OCP Using Installer Provisioned
Infrastructure (IPI) Installer
• Create Machine Files and Apply to Cluster
• Modify the VMware OCP Template with GPU Passthrough Settings
• Regenerate Workers with Larger Resources (64 vCPUs, 240GB
RAM, and 512GB Disk, Storage Networks)
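
The worker sizing in the last step can be expressed as a patch to the vSphere MachineSet that OCP uses to create worker VMs. This is a sketch under assumptions rather than the procedure shown in the session: it assumes the `numCPUs`, `memoryMiB`, and `diskGiB` fields of the vSphere provider spec, treats the slide's "240GB" as 240 GiB, and leaves out the storage networks, which are specific to each environment. The function name is hypothetical.

```python
import json


def worker_resource_patch(vcpus: int = 64, memory_gb: int = 240,
                          disk_gb: int = 512) -> dict:
    """Build a merge patch that resizes the workers of a vSphere MachineSet.

    Defaults follow the slide (64 vCPUs, 240 GB RAM, 512 GB disk).
    Memory is converted to MiB on the assumption that GB means GiB here.
    """
    value = {
        "numCPUs": vcpus,
        "memoryMiB": memory_gb * 1024,
        "diskGiB": disk_gb,
    }
    return {"spec": {"template": {"spec": {"providerSpec": {"value": value}}}}}


if __name__ == "__main__":
    # Print the patch as JSON so it can be reviewed before use.
    print(json.dumps(worker_resource_patch(), indent=2))
```

A patch of this shape only changes the template for new machines, which is why the slide speaks of regenerating the workers: existing worker VMs keep their old size until they are replaced.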

Add GPU Support to OCP
• Deploy NVIDIA License Server
• Add vGPUs or GPUs to Worker VMs
• Install the Node Feature Discovery (NFD) Operator
• Install the NVIDIA GPU Operator with Appropriate vGPU or GPU
Driver
• Ensure vGPUs are Licensed
• Enable vGPU or GPU Monitoring Dashboard in OCP
• Enable GPU Monitoring in VMware vCenter

Increasing OCP Worker Resources

Adding vGPUs to Worker VMs

Monitoring vGPUs or GPUs in OCP Console

Monitoring GPUs in VMware vCenter

Monitoring GPUs with nvidia-smi
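
The slide's screenshot is not available in this text, so the following is only a sketch of one way to collect the same information programmatically. It assumes `nvidia-smi` is on the PATH of the worker or driver container, uses its `--query-gpu` CSV output, and parses it into dictionaries. The field selection and function names are illustrative choices, and the sample text in the usage section is made up for demonstration, not captured from the validated system.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi, in the order they are parsed below.
QUERY_FIELDS = ["index", "name", "utilization.gpu", "memory.used", "memory.total"]


def _to_int(text):
    """Convert a numeric field to int; return None for values such as [N/A]."""
    try:
        return int(text)
    except ValueError:
        return None


def parse_gpu_csv(text: str) -> list:
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    gpus = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        index, name, util, used, total = (field.strip() for field in record)
        gpus.append({
            "index": _to_int(index),
            "name": name,
            "utilization_pct": _to_int(util),
            "memory_used_mib": _to_int(used),
            "memory_total_mib": _to_int(total),
        })
    return gpus


def query_gpus() -> list:
    """Run nvidia-smi and return the parsed per-GPU readings."""
    cmd = ["nvidia-smi",
           "--query-gpu=" + ",".join(QUERY_FIELDS),
           "--format=csv,noheader,nounits"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    # Hypothetical sample text, used so the parser can be tried without a GPU.
    sample = "0, NVIDIA L40S, 35, 10240, 46068\n1, NVIDIA L40S, [N/A], 3, 46068\n"
    for gpu in parse_gpu_csv(sample):
        print(gpu)
```

Non-numeric readings are returned as `None` rather than raising, since some fields may be reported as not available depending on the GPU mode.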

Monitoring Server Power Consumption
Cisco Intersight

NetApp Astra Trident
• Open Source
• Maintained and supported by NetApp
• Designed for Kubernetes and NetApp
• Deploys in Kubernetes clusters as pods and
managed with Kubernetes
• CSI Compliant Storage Orchestrator
• Quickly and easily consume persistent storage
• Broad support
• One provisioner for entire NetApp portfolio

NetApp Trident Backend Definition
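
The backend definition shown on this slide did not survive extraction. As a stand-in, here is a hedged sketch of what a JSON backend definition for the `ontap-nas` driver generally looks like, built from Python so the values can be templated. Every value below (backend name, LIF address, SVM name, credentials) is a hypothetical placeholder and not taken from the validated design.

```python
import json


def ontap_nas_backend(name: str, management_lif: str, svm: str,
                      username: str, password: str, data_lif: str = "") -> dict:
    """Build a Trident backend definition for the ontap-nas (NFS) driver."""
    backend = {
        "version": 1,
        "storageDriverName": "ontap-nas",
        "backendName": name,
        "managementLIF": management_lif,
        "svm": svm,
        "username": username,
        "password": password,
    }
    if data_lif:
        # Optional: pin NFS traffic to a specific data LIF.
        backend["dataLIF"] = data_lif
    return backend


if __name__ == "__main__":
    # Placeholder values only; substitute the tenant SVM and LIFs in use.
    backend = ontap_nas_backend(
        name="ocp-nfs-backend",
        management_lif="192.0.2.10",
        svm="OCP-SVM",
        username="vsadmin",
        password="CHANGE-ME",
    )
    print(json.dumps(backend, indent=2))
```

A definition like this is typically saved to a file and registered with Trident, after which storage classes can reference the backend. In practice the credentials would come from a secret rather than being written into the file.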

NetApp DataOps Toolkit

FlexPod as a Platform for Generative AI Inference

FlashStack for Generative AI Inference
• Foundational Architecture for Gen AI
• Validated NVIDIA NeMo Inference with TensorRT-LLM
that accelerates inference performance of LLMs on
NVIDIA GPUs
• Validated models using Text Generation Inference
server from Hugging Face
• Metrics dashboard for insights into infrastructure, cluster
and GPU performance and behavior
• Simplify and Accelerate Model Deployment
• Extensive breadth of validation of AI models such as
GPT, Stable Diffusion and Llama 2 LLMs with diverse
model serving options
• Consistent Performance
• Consistent average latency and Throughput
• Better price to performance ratio
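
To make "consistent average latency and throughput" concrete, the sketch below shows one way such figures could be computed from per-request measurements. It is an illustration, not the benchmark used in the validation: it assumes requests were issued one after another (so total time is the sum of latencies), and it uses the coefficient of variation as a simple, assumed measure of consistency. The function name and result keys are hypothetical.

```python
from statistics import mean, pstdev


def summarize(latencies_s: list, tokens_generated: list) -> dict:
    """Summarize sequential inference requests.

    latencies_s: end-to-end latency of each request, in seconds.
    tokens_generated: number of output tokens produced by each request.
    """
    avg_latency = mean(latencies_s)
    total_time = sum(latencies_s)  # valid only for back-to-back requests
    return {
        "avg_latency_s": avg_latency,
        # Lower coefficient of variation means more consistent latency.
        "latency_cv": pstdev(latencies_s) / avg_latency,
        "throughput_tokens_per_s": sum(tokens_generated) / total_time,
    }


if __name__ == "__main__":
    # Made-up measurements for illustration only.
    print(summarize([2.0, 2.1, 1.9, 2.0], [100, 100, 100, 100]))
```

For concurrent load tests the throughput denominator should be the wall-clock duration of the run instead of the summed latencies.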

Summary of Generative AI Models Validated

Example of Running Llama 2 13B on NeMo

Sample Benchmark Output

Launch Stable Diffusion Using NetApp DataOps Toolkit

Run Stable Diffusion XL 1.0

Stable Diffusion XL 1.0 Output
prompt = “Astronaut riding a horse on Mars, detailed, 8k resolution”

Image generated by AI
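
The code that produced this image is not included in the extracted text. One common way to run Stable Diffusion XL 1.0 with the prompt above is the Hugging Face `diffusers` library, sketched below. This is an assumption about tooling, not a record of what was run in the session: it requires `torch` and `diffusers` to be installed, a CUDA-capable GPU, and access to download the model weights. The function name and output path are hypothetical.

```python
PROMPT = "Astronaut riding a horse on Mars, detailed, 8k resolution"
MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"


def generate(prompt: str = PROMPT, out_path: str = "astronaut.png") -> str:
    """Generate one image with SDXL 1.0 and save it to out_path."""
    # Imported here so the module loads on machines without a GPU stack.
    import torch
    from diffusers import StableDiffusionXLPipeline

    # Load half-precision weights to reduce GPU memory use.
    pipe = StableDiffusionXLPipeline.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
    )
    pipe = pipe.to("cuda")

    image = pipe(prompt=prompt).images[0]
    image.save(out_path)
    return out_path
```

Calling `generate()` from a notebook or Python session on a GPU-backed worker would write the image to the given path. Output varies from run to run unless a seeded generator is passed to the pipeline.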

Retrieval-Augmented Generation (RAG)

• RAG is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources.
• It fills a gap in how Large Language Models (LLMs) work for enterprise applications.
• It generates up-to-date and domain-specific answers by connecting an LLM to enterprise data.
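
The retrieval step described in these bullets can be illustrated with a deliberately small sketch. It is not the architecture from the session: it swaps a real embedding model and vector database for bag-of-words cosine similarity so that it runs with the standard library alone, and it stops at building the augmented prompt rather than calling an LLM. The knowledge-base sentences, function names, and prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, documents: list, k: int = 1) -> list:
    """Return the k documents most similar to the question."""
    query = Counter(tokenize(question))
    ranked = sorted(documents,
                    key=lambda doc: cosine(query, Counter(tokenize(doc))),
                    reverse=True)
    return ranked[:k]


def build_prompt(question: str, documents: list, k: int = 1) -> str:
    """Augment the question with retrieved context before sending to an LLM."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical knowledge base for illustration.
KNOWLEDGE_BASE = [
    "The FlexPod design deploys at least 6 servers and leaves 3 servers for the infra tenant.",
    "NetApp Astra Trident is a CSI compliant storage orchestrator for Kubernetes.",
    "Stable Diffusion XL generates images from text prompts.",
]

if __name__ == "__main__":
    print(build_prompt("How many servers are left for the infra tenant?",
                       KNOWLEDGE_BASE))
```

The point of the sketch is the flow: the question selects relevant enterprise text, and that text is placed in the prompt so the model can answer from it instead of relying only on its training data.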

Generic RAG Architecture

RAG Demo - Question with no RAG

RAG Demo - FlexPod CVD in Knowledge Base

RAG Demo - More Precise Answer with RAG

Q&A

Complete Your Session Evaluations

Complete a minimum of 4 session surveys and the Overall Event Survey to be entered in a drawing to win 1 of 5 full conference passes to Cisco Live 2025.

Earn 100 points per survey completed and compete on the Cisco Live Challenge leaderboard.

Level up and earn exclusive prizes!

Complete your surveys in the Cisco Live mobile app.

Continue your education
• Visit the Cisco Showcase for related demos
• Book your one-on-one Meet the Engineer meeting
• Attend the interactive education with DevNet, Capture the Flag, and Walk-in Labs
• Visit the On-Demand Library for more sessions at www.CiscoLive.com/on-demand

Contact me at: [email protected]
Thank you

#CiscoLive