BRKCOM-1004
Questions?
Use the Cisco Webex App to chat with the speaker after the session.
How
1. Find this session in the Cisco Live Mobile App
https://ciscolive.ciscoevents.com/ciscolivebot/#BRKCOM-1004
#CiscoLive BRKCOM-1004 © 2024 Cisco and/or its affiliates. All rights reserved. Cisco Public 2
Agenda
• Intro to Generative AI Inferencing
• Cisco UCS X-Series with GPUs
• FlexPod as a Platform for Generative AI Inferencing
• FlashStack as a Platform for Generative AI Inferencing
• Generative AI Inferencing Examples
• Retrieval Augmented Generation (RAG)
What are we building for?
Source data
NGC
• Very large models built by a few vendors on very large clusters, typically on public cloud
• Most customers start here to leverage existing foundational models; integrate with own data to fine-tune the model and deploy for inference
The goal of the Cisco AI framework is to deliver an “AI ready stack” that integrates and automates compute, network, storage, AI models, and AI tools to get customers started on their AI journey.
Gen AI infrastructure mapping
[Figure: infrastructure sizing by workload type, plotted against number of parameters. Rows: Inferencing (most GenAI customers are here), Fine-tuning (leveraging foundation models), Training (building models from scratch). Sizing labels range from CPU only (Edge and DC) and CPU-only clusters through 1, 2, 2-4, 4-8, 8-16, and 64-128 GPUs to 1000+ GPUs.]
Cisco Validated Designs (CVDs) Covered
• FlexPod Datacenter with Generative AI Inferencing
• FlashStack for Generative AI Inferencing Design Guide
Cisco UCS X-Series and 5th Generation Fabric
[Figure: Cisco UCS X-Series chassis and fabric. Labeled components: Fabric Interconnect (UCS-FI-6536), UCS 9108 100G and 25G modules, and UCS X9416 modules.]
X-Series current portfolio
Compute
X210c Compute Node
• 2-socket, single-slot servers
• Two generations: M6 and M7
• Intel 3rd Gen (Ice Lake) and 4th Gen (Sapphire Rapids) Xeon CPUs
X410c Compute Node
• 4-socket, dual-slot servers
• Intel 4th Gen Xeon CPU
• Up to 64 DDR5 DIMMs
NVIDIA A16, NVIDIA L40, NVIDIA L4, and NVIDIA H100 GPUs today in various configurations.
• Based on native PCIe Gen 4
• Provides GPU acceleration to enterprise applications
• No backplane or cables = easy upgrades
Cisco UCS X210c M7 Supported GPUs
• Front Mezz Supported GPUs
• 1 NVIDIA T4
• 1 Intel Flex 140
Cisco UCS X210c M7 Supported GPUs
• X440p Supported GPUs
• 2 NVIDIA A16
• 2 NVIDIA A40 RTX
• 2 NVIDIA A100-80*
• 2 NVIDIA H100
• 4 NVIDIA L4
• 2 NVIDIA L40
• 2 NVIDIA L40S*
• 4 Intel Flex 140
• 2 Intel Flex 170
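The per-node limits listed above fit naturally into a small lookup table. The following is a minimal sketch, not a sizing tool: the dictionary only transcribes the X440p counts from this slide, and the helper name `x440p_nodes_needed` is hypothetical. The asterisks on the slide are not explained in this text, so they are dropped here.

```python
import math

# Maximum GPUs per X440p PCIe node, transcribed from the list above.
X440P_MAX_GPUS = {
    "NVIDIA A16": 2,
    "NVIDIA A40 RTX": 2,
    "NVIDIA A100-80": 2,
    "NVIDIA H100": 2,
    "NVIDIA L4": 4,
    "NVIDIA L40": 2,
    "NVIDIA L40S": 2,
    "Intel Flex 140": 4,
    "Intel Flex 170": 2,
}


def x440p_nodes_needed(model: str, gpu_count: int) -> int:
    """Return how many X440p nodes are needed to hold gpu_count GPUs of one model."""
    if gpu_count < 0:
        raise ValueError("gpu_count must be non-negative")
    # Round up: a partially filled node still counts as a node.
    return math.ceil(gpu_count / X440P_MAX_GPUS[model])
```

For example, eight NVIDIA L4 cards need two X440p nodes under these limits, while three NVIDIA H100 cards also need two.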
FlexPod Family Components
FlexPod with Generative AI Physical Topology
FlexPod Deployment with Ansible Automation
Deploy FlexPod
• Deploy IP-Based FlexPod Using Ansible, Deploying at Least 6
Servers
• Leaving 3 Servers for the Infra Tenant, Create an OCP VMware
Cluster and Move the OCP Servers to this Cluster
• Manage the OCP Cluster with a Single Image and Add the NVAIE
Driver to this Image if Mapping vGPUs to OCP
Add OCP Tenant to FlexPod
Multi-Tenant Design
• Determine Tenant VLANs and Add VLANs and VRF to Cisco Nexus
Switches
• Add VLAN Interfaces, Broadcast Domains, Tenant SVM, and Logical
Interfaces (LIFs) for Storage and Management to NetApp Storage
• Add Tenant VLANs to Cisco IMM UCS Domain Profiles and vNIC
Interfaces in Server Profiles
• Add VMware vDS Port Groups for Tenant VLANs and VMKs for
NVMe-TCP
• Use NetApp ONTAP Tools to Create NFS OCP Datastores
ESXi Network Configuration – OCP Tenant
Deploy Red Hat OCP
• Deploy OCP DNS/DHCP Servers in FlexPod Infra Tenant
• Deploy OCP Installer VM in FlexPod Infra Tenant
• From OCP Installer, Install OCP Using the Installer-Provisioned Infrastructure (IPI) Installer
• Create Machine Files and Apply to Cluster
• Modify the VMware OCP Template with GPU Passthrough Settings
• Regenerate Workers with Larger Resources (64 vCPUs, 240 GB RAM, 512 GB Disk, and Storage Networks)
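The worker sizing in the last bullet can be expressed as a patch against a vSphere MachineSet. This is a minimal sketch, assuming the workers are managed by a MachineSet whose vSphere provider spec exposes `numCPUs`, `memoryMiB`, and `diskGiB`. The function name and the choice to read "240 GB" as 240 GiB are assumptions, not taken from the CVD.

```python
import json


def worker_resource_patch(vcpus: int = 64, memory_gib: int = 240, disk_gib: int = 512) -> dict:
    """Build a JSON merge patch that resizes workers in a vSphere MachineSet."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "providerSpec": {
                        "value": {
                            "numCPUs": vcpus,
                            # The provider spec takes memory in MiB.
                            "memoryMiB": memory_gib * 1024,
                            "diskGiB": disk_gib,
                        }
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(worker_resource_patch(), indent=2))
```

The printed JSON could be supplied to `oc patch machineset <name> -n openshift-machine-api --type merge -p '...'`. A MachineSet change only affects machines created afterwards, which is consistent with the slide's wording about regenerating the workers.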
Add GPU Support to OCP
• Deploy NVIDIA License Server
• Add vGPUs or GPUs to Worker VMs
• Install the Node Feature Discovery (NFD) Operator
• Install the NVIDIA GPU Operator with Appropriate vGPU or GPU
Driver
• Ensure vGPUs are Licensed
• Enable vGPU or GPU Monitoring Dashboard in OCP
• Enable GPU Monitoring in VMware vCenter
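Once the NVIDIA GPU Operator is running, each GPU-equipped worker should advertise the `nvidia.com/gpu` resource. This is a minimal sketch for checking that, assuming node data in the shape returned by `oc get nodes -o json`. The sample document and node names below are made up for illustration.

```python
import json

GPU_RESOURCE = "nvidia.com/gpu"


def gpus_per_node(node_list: dict) -> dict:
    """Map node name to allocatable GPU count from a Kubernetes NodeList document."""
    counts = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # Nodes without GPUs simply lack the key, so default to zero.
        counts[name] = int(allocatable.get(GPU_RESOURCE, "0"))
    return counts


# Made-up sample, trimmed to the fields the function reads.
SAMPLE_NODE_LIST = json.loads("""
{
  "items": [
    {"metadata": {"name": "worker-0"},
     "status": {"allocatable": {"cpu": "64", "nvidia.com/gpu": "2"}}},
    {"metadata": {"name": "worker-1"},
     "status": {"allocatable": {"cpu": "64"}}}
  ]
}
""")

if __name__ == "__main__":
    print(gpus_per_node(SAMPLE_NODE_LIST))
```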
Increasing OCP Worker Resources
Adding vGPUs to Worker VMs
Monitoring vGPUs or GPUs in OCP Console
Monitoring GPUs in VMware vCenter
Monitoring GPUs with nvidia-smi
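The screenshot on this slide is not reproduced in the text. As an illustration, `nvidia-smi` can emit machine-readable CSV through its `--query-gpu` and `--format` options, and the sketch below parses that output. The field selection, function names, and sample values are assumptions. Fields that report `[N/A]` (which can happen in some vGPU configurations) would need extra handling.

```python
import csv
import io
import subprocess

QUERY_FIELDS = ["index", "name", "utilization.gpu", "memory.used", "memory.total"]


def parse_gpu_csv(text: str) -> list:
    """Parse output of nvidia-smi --format=csv,noheader,nounits into dicts."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        index, name, util, mem_used, mem_total = record
        rows.append({
            "index": int(index),
            "name": name,
            "utilization_pct": int(util),
            "memory_used_mib": int(mem_used),
            "memory_total_mib": int(mem_total),
        })
    return rows


def query_gpus() -> list:
    """Run nvidia-smi on the local host and return one dict per GPU."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return parse_gpu_csv(out)
```

`parse_gpu_csv` can be exercised without a GPU by passing it a captured string; `query_gpus` requires a host or pod where `nvidia-smi` is installed.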
Monitoring Server Power Consumption
Cisco Intersight
NetApp Astra Trident
• Open Source
• Maintained and supported by NetApp
• Designed for Kubernetes and NetApp
• Deploys in Kubernetes clusters as pods and is managed with Kubernetes
• CSI Compliant Storage Orchestrator
• Quickly and easily consume persistent storage
• Broad support
• One provisioner for entire NetApp portfolio
NetApp Trident Backend Definition
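The backend definition shown on this slide is not reproduced in the text. As a rough illustration only, a Trident ONTAP NAS backend is commonly described by a small JSON document like the one built below. Every value (LIF addresses, SVM name, credentials, backend name) is a placeholder, and the fields used in the CVD may differ.

```python
import json


def ontap_nas_backend(management_lif: str, data_lif: str, svm: str,
                      username: str, password: str,
                      backend_name: str = "ocp-nfs-backend") -> dict:
    """Build a Trident backend definition for the ontap-nas storage driver."""
    return {
        "version": 1,
        "storageDriverName": "ontap-nas",
        "backendName": backend_name,   # placeholder name
        "managementLIF": management_lif,
        "dataLIF": data_lif,
        "svm": svm,
        "username": username,
        "password": password,
    }


if __name__ == "__main__":
    # Documentation-range addresses and dummy credentials, for illustration only.
    backend = ontap_nas_backend("192.0.2.10", "192.0.2.20", "ocp-svm", "vsadmin", "changeme")
    print(json.dumps(backend, indent=2))
```

In practice, credentials would normally be kept in a Kubernetes secret rather than written inline.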
NetApp DataOps Toolkit
FlexPod as a Platform for Generative AI Inference
FlashStack for Generative AI Inference
• Foundational Architecture for Gen AI
• Validated NVIDIA NeMo Inference with TensorRT-LLM, which accelerates inference performance of LLMs on NVIDIA GPUs
• Validated models using Text Generation Inference server from Hugging Face
• Metrics dashboard for insights into infrastructure, cluster, and GPU performance and behavior
• Simplify and Accelerate Model Deployment
• Extensive breadth of validation of AI models such as GPT, Stable Diffusion, and Llama 2 LLMs with diverse model serving options
• Consistent Performance
• Consistent average latency and throughput
• Better price-to-performance ratio
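For the Text Generation Inference server mentioned above, a client typically sends a JSON body to the server's `/generate` route. This is a minimal sketch using only the standard library. The host name, port, and function names are hypothetical, and nothing here reflects the specific deployment in the CVD.

```python
import json
import urllib.request


def build_generate_request(base_url: str, prompt: str,
                           max_new_tokens: int = 64) -> urllib.request.Request:
    """Build (but do not send) a POST request for a TGI /generate endpoint."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, prompt: str, max_new_tokens: int = 64) -> str:
    """Send the request and return the generated text from the response."""
    request = build_generate_request(base_url, prompt, max_new_tokens)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["generated_text"]
```

Splitting request construction from the network call keeps the payload easy to inspect and test without a running server.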
Summary of Generative AI Models Validated
Example of Running Llama 2 13B on NeMo
Sample Benchmark Output
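The benchmark output itself is an image and is not reproduced here. Average latency and throughput, the two metrics this deck highlights, are usually derived along the lines below. This is a minimal sketch under stated assumptions: requests run one after another, throughput is counted in generated tokens per second, and all names and numbers are illustrative rather than measured results.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestTiming:
    latency_s: float        # wall-clock time for one request
    generated_tokens: int   # tokens produced by that request


def summarize(timings: list) -> dict:
    """Compute average latency and token throughput for sequential requests."""
    if not timings:
        raise ValueError("at least one timing is required")
    total_time = sum(t.latency_s for t in timings)
    total_tokens = sum(t.generated_tokens for t in timings)
    return {
        "avg_latency_s": mean(t.latency_s for t in timings),
        "tokens_per_s": total_tokens / total_time,
    }
```

With concurrent requests, throughput would instead divide total tokens by the overall wall-clock duration of the run, not by the sum of per-request latencies.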
Launch Stable Diffusion Using NetApp DataOps Toolkit
Run Stable Diffusion XL 1.0
Stable Diffusion XL 1.0 Output
prompt = “Astronaut riding a horse on Mars, detailed, 8k resolution”
Image generated by AI
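The code used on these slides is not included in the text. One common way to run this prompt is with the Hugging Face `diffusers` library and the public SDXL base checkpoint, sketched below. This assumes `torch` and `diffusers` are installed and a CUDA GPU is available. The output file name and function name are hypothetical, and the CVD may use different tooling.

```python
MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"
PROMPT = "Astronaut riding a horse on Mars, detailed, 8k resolution"


def generate_image(prompt: str = PROMPT, out_path: str = "astronaut.png") -> str:
    """Generate one image with Stable Diffusion XL 1.0 and save it to out_path."""
    # Heavy imports live inside the function so this module can be
    # imported on machines without a GPU or the libraries installed.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,  # half precision to reduce GPU memory use
        variant="fp16",
        use_safetensors=True,
    )
    pipe.to("cuda")
    image = pipe(prompt=prompt).images[0]
    image.save(out_path)
    return out_path


if __name__ == "__main__":
    print(generate_image())
```

The first run downloads the model weights, so the container or notebook needs network access to the model hub or a pre-populated cache on persistent storage.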
Retrieval-Augmented Generation (RAG)
Generic RAG Architecture
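The architecture diagram is not reproduced in the text, but the core RAG loop is: retrieve the passages most relevant to the question, then place them in the prompt sent to the LLM. The toy sketch below shows that flow. It substitutes bag-of-words cosine similarity for the embedding model and vector database a real deployment would use, and every function name is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, documents: list, k: int = 1) -> list:
    """Return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list) -> str:
    """Combine retrieved passages and the question into a single LLM prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The string returned by `build_prompt` is what would be sent to the inference server. Without the retrieval step the model sees only the question, which is the contrast the demo slides that follow illustrate.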
RAG Demo - Question with no RAG
RAG Demo - FlexPod CVD in Knowledge Base
RAG Demo - More Precise Answer with RAG
Q&A
Thank you