Mastering
KVM Virtualization
Second Edition
Vedran Dakic
Prasad Mukhedkar
Anil Vettathu
BIRMINGHAM—MUMBAI
Mastering KVM Virtualization
Second Edition
Copyright © 2020 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or
transmitted in any form or by any means, without the prior written permission of the publisher,
except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information
presented. However, the information contained in this book is sold without warranty, either express
or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable
for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies
and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing
cannot guarantee the accuracy of this information.
Commissioning Editor: Vijin Boricha
Acquisition Editor: Shrilekha Inani
Senior Editor: Arun Nadar
Content Development Editor: Nihar Kapadia
Technical Editor: Soham Amburle
ISBN 978-1-83882-871-4
www.packt.com
25 years ago, a colleague suggested that I should write what he called "a
Linux book". I liked the idea and I promised I would. Years rolled by, and
here I am, a quarter of a century later, acting on a promise. As Steve Jobs
once said, 'Ideas without action aren't ideas. They're regrets.'
Why subscribe?
• Spend less time learning and more time coding with practical eBooks and videos
from over 4,000 industry professionals
• Improve your learning with Skill Plans built especially for you
• Get a free eBook or video every month
• Fully searchable for easy access to vital information
• Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and
ePub files available? You can upgrade to the eBook version at packt.com and, as a print
book customer, you are entitled to a discount on the eBook copy. Get in touch with us at
[email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up
for a range of free newsletters, and receive exclusive discounts and offers on Packt books
and eBooks.
Contributors
About the authors
Vedran Dakic has a master's degree in electrical engineering and computing and is an IT trainer,
covering system administration, cloud, automation, and orchestration courses. He is
a certified Red Hat, VMware, and Microsoft trainer. He is currently employed as the head
of the department of operating systems at Algebra University College in Zagreb. As part of
his job, he lectures on the 3- and 5-year study programs in the systems engineering,
programming, and multimedia tracks. He also does a lot of consulting and systems
integration work on his clients' projects – something he has been doing for the past
20 years. His approach is simple – bring real-world experience to all of the courses that he
is involved with, as this provides added value for his students and customers.
Prasad Mukhedkar is a specialist cloud solution architect at Red Hat India with over 10
years of experience helping customers on their journey to virtualization and cloud
adoption. He is a Red Hat Certified Architect and has extensive experience in designing
and implementing high-performing cloud infrastructure. His areas of expertise are Red
Hat Enterprise Linux 7/8 performance tuning, KVM virtualization, Ansible Automation,
and Red Hat OpenStack. He is a huge fan of the Linux "GNU screen" utility.
Anil Vettathu began his association with Linux while in college and started his career as a
Linux system administrator soon after. He is a generalist, with an interest in open source
technologies. He has hands-on experience in designing and implementing large scale
virtualization environments using open source technologies and has extensive knowledge
in libvirt and KVM. These days he primarily works on Red Hat Enterprise Virtualization,
containers, and real-time performance tuning. Currently, he is working as a Technical
Account Manager for Red Hat. His website is http://anilv.in.
I'd like to thank my wife, Chandni, for her unconditional support. She took
on the pain of looking after our two naughtiest kids, while I enjoyed writing
this book. I'd like to thank my parents, Dr. Annieamma and Dr. George
Vettathu, for their guidance and for pushing me hard to study something
new. Finally, I would like to thank my sister, Dr. Wilma, for her guidance,
and my brother, Vimal.
About the reviewer
Ranjith Rajaram is employed as a senior principal technical support engineer at a
leading open source Enterprise Linux company. He began his career by providing support
to web hosting companies and managing servers remotely. Ranjith has also provided
technical support to end customers. Early in his career, he worked on Linux, Unix, and
FreeBSD platforms.
For the past 15 years, he has been continuously learning something new. This is what he
likes and admires about technical support. As a mark of respect to all his fellow technical
support engineers, he has included "developing software is humane, but supporting it is
divine" in his email signature.
At his current organization, he is involved in implementing, installing, and
troubleshooting networks in Linux environments. Aside from this, he is also an active
contributor to the Linux container space (Docker, Podman), Kubernetes, and OpenShift.
Apart from this book, he has reviewed the first editions of Mastering KVM Virtualization
and Learning RHEL Networking, both available from Packt.
Section 1:
KVM Virtualization Basics
1
Understanding Linux Virtualization
Linux virtualization and how it all started 4
Types of virtualization 6
Xen 12
KVM 13
2
KVM as a Virtualization Solution
Virtualization as a concept 18
Virtualized versus physical environments 18
Why is virtualization so important? 20
Hardware requirements for virtualization 21
Software requirements for virtualization 24
The internal workings of libvirt, QEMU, and KVM 30
libvirt 30
QEMU 38
QEMU – KVM internals 41
Data structures 42
Threading models in QEMU 48
KVM 49
Data structures 55
Execution flow of vCPU 59
Summary 64
Questions 64
Further reading 65
Section 2:
libvirt and oVirt for Virtual Machine Management
3
Installing KVM Hypervisor, libvirt, and oVirt
Getting acquainted with QEMU and libvirt 70
Getting acquainted with oVirt 71
Installing QEMU, libvirt, and oVirt 73
Installing the first virtual machine in KVM 76
Automating virtual machine installation 77
Installing oVirt 80
Starting a virtual machine using QEMU and libvirt 83
Summary 87
Questions 87
Further reading 87
4
Libvirt Networking
Understanding physical and virtual networking 90
Virtual networking 91
Libvirt NAT network 93
Libvirt routed network 94
Libvirt isolated network 95
Using userspace networking with TAP and TUN devices 101
Implementing Linux bridging 103
Configuring Open vSwitch 105
Other Open vSwitch use cases 113
Understanding and using SR-IOV 114
Understanding macvtap 118
Summary 121
Questions 121
Further reading 122
5
Libvirt Storage
Introduction to storage 124
Storage pools 126
Local storage pools 128
Libvirt storage pools 130
NFS storage pool 131
iSCSI and SAN storage 136
Storage redundancy and multipathing 146
Gluster and Ceph as a storage backend for KVM 150
Gluster 150
Ceph 155
Virtual disk images and formats and basic KVM storage operations 162
Getting image information 164
Attaching a disk using virt-manager 164
Attaching a disk using virsh 166
Creating an ISO image library 167
Deleting a storage pool 169
Creating storage volumes 170
Creating volumes using the virsh command 171
Deleting a volume using the virsh command 171
The latest developments in storage – NVMe and NVMeOF 172
Summary 176
Questions 176
Further reading 177
6
Virtual Display Devices and Protocols
Using virtual machine display devices 180
Physical and virtual graphics cards in VDI scenarios 185
GPU PCI passthrough 189
Discussing remote display protocols 193
Remote display protocols history 193
Types of remote display protocols 195
Using the VNC display protocol 196
Why VNC? 197
Using the SPICE display protocol 198
Adding a SPICE graphics server 198
Methods to access a virtual machine console 200
Getting display portability with noVNC 202
Summary 206
Questions 206
Further reading 207
7
Virtual Machines: Installation, Configuration, and Life Cycle Management
Creating a new VM using virt-manager 210
Using virt-manager 210
Using virt-* commands 217
Creating a new VM using Cockpit 223
Migrating VMs 238
Benefits of VM migration 239
Setting up the environment 240
Offline migration 243
Live or online migration 247
8
Creating and Modifying VM Disks, Templates, and Snapshots
Modifying VM images using libguestfs tools 256
virt-v2v 257
virt-p2v 259
guestfish 259
VM templating 263
Working with templates 266
Deploying VMs from a template 275
virt-builder and virt-builder repos 281
virt-builder repositories 283
Snapshots 285
Working with internal snapshots 286
Managing snapshots using virt-manager 291
Working with external disk snapshots 292
Use cases and best practices while using snapshots 305
Summary 306
Questions 306
Further reading 306
Section 3:
Automation, Customization, and Orchestration for KVM VMs
9
Customizing a Virtual Machine with cloud-init
What is the need for virtual machine customization? 312
Understanding cloud-init 314
Understanding cloud-init architecture 315
Installing and configuring cloud-init at boot time 318
Cloud-init images 319
Cloud-init data sources 320
Passing metadata and user data to cloud-init
Using cloud-init modules 322
Examples on how to use a cloud-config script with cloud-init 323
The first deployment 329
The second deployment 332
The third deployment 334
Summary 342
Questions 342
10
Automated Windows Guest Deployment and Customization
The prerequisites to creating Windows VMs on KVM 346
Creating Windows VMs using the virt-install utility 347
Customizing Windows VMs using cloudbase-init 350
cloudbase-init customization examples 353
Troubleshooting common cloudbase-init customization issues 361
Summary 364
Questions 364
Further reading 364
11
Ansible and Scripting for Orchestration and Automation
Understanding Ansible 366
Automation approaches 367
Introduction to Ansible 369
Deploying and using AWX 372
Deploying Ansible 382
Provisioning a virtual machine using the kvm_libvirt module 383
Working with playbooks 387
Installing KVM 393
Using Ansible and cloud-init for automation and orchestration 399
Orchestrating multi-tier application deployment on KVM VM 406
Learning by example – various examples of using Ansible with KVM 409
Summary 410
Questions 410
Further reading 411
Section 4:
Scalability, Monitoring, Performance Tuning, and Troubleshooting
12
Scaling Out KVM with OpenStack
Introduction to OpenStack 416
Software-defined networking 418
Understanding VXLAN 420
Understanding GENEVE 425
Additional OpenStack use cases 439
Creating a Packstack demo environment for OpenStack 441
13
Scaling out KVM with AWS
Introduction to AWS 470
Approaching the cloud 470
Multi-cloud 472
Shadow IT 473
Market share 474
Big infrastructure but no services 474
Pricing 475
Data centers 477
Placement is the key 478
AWS services 480
Preparing and converting virtual machines for AWS 483
What do we want to do? 484
Uploading an image to EC2 498
Building hybrid KVM clouds with Eucalyptus 507
How do you install it? 509
Using Eucalyptus for AWS control 518
Summary 521
Questions 521
Further reading 522
14
Monitoring the KVM Virtualization Platform
Monitoring the KVM virtualization platform 524
Introduction to the open source ELK solution 526
Elasticsearch 526
Logstash 527
Kibana 528
Setting up and integrating the ELK stack 528
Workflow 533
Configuring data collector and aggregator 536
Creating charts in Kibana 537
Creating custom utilization reports 538
ELK and KVM 549
Summary 557
Questions 557
Further reading 557
15
Performance Tuning and Optimization for KVM VMs
It's all about design 560
General hardware design 561
VM design 565
Automatic NUMA balancing 592
The numactl command 593
Understanding numad and numastat 594
16
Troubleshooting Guidelines for the KVM Platform
an advanced knowledge of Linux beforehand. We'll get you there as you go through
the book – it's an integral part of the learning process. If you're interested in KVM,
OpenStack, the ELK Stack, Eucalyptus, or AWS – we've got you covered.
by using NoVNC.
Chapter 7, Virtual Machines: Installation, Configuration, and Life Cycle Management,
introduces additional ways of deploying and configuring KVM virtual machines, as well
as migration processes, which are very important for any kind of production environment.
Chapter 8, Creating and Modifying VM Disks, Templates, and Snapshots, discusses various
virtual machine image types, virtual machine templating processes, the use of snapshots,
and some of the use cases and best practices while using snapshots. It also serves as an
introduction to the next chapter, where we will be using templating and virtual machine
disks in a much more streamlined fashion to customize virtual machines post-boot by
using cloud-init and cloudbase-init.
Chapter 9, Customizing a Virtual Machine with cloud-init, discusses one of the most
fundamental concepts in cloud environments – how to customize a virtual machine
image/template post-boot. Cloud-init is used in almost all cloud environments
for post-boot Linux virtual machine configuration, and we explain how it works
and how to make it work in your environment.
Chapter 14, Monitoring the KVM Virtualization Platform, introduces a very popular
concept of monitoring via the Elasticsearch, Logstash, Kibana (ELK) stack. It also takes
you through the whole process of setting up and integrating the ELK stack with your
KVM infrastructure, all the way through to the end result – using dashboards and UIs
to monitor your KVM-based environment.
Chapter 15, Performance Tuning and Optimization for KVM VMs, talks about various
approaches to tuning and optimization in KVM-based environments by deconstructing
all of the infrastructure design principles and putting them to (correct) use. We cover a
number of advanced topics here – NUMA, KSM, CPU and memory performance, CPU
pinning, the tuning of VirtIO, and block and network devices.
Chapter 16, Troubleshooting Guidelines for the KVM Platform, starts with the basics
– troubleshooting KVM services and logging, and explains various troubleshooting
methodologies for KVM and oVirt, Ansible and OpenStack, Eucalyptus, and AWS. These
are the real-life problems that we've also encountered in our production environments
while writing this book. In this chapter, we basically discuss problems related to every
single chapter of this book, including problems associated with snapshots and templating.
Code in Action
Code in Action videos for this book can be viewed at https://bit.ly/32IHMdO.
Conventions used
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names,
filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles.
Here is an example: "What we need to do is just uncomment the one pipeline that is
defined in the configuration file, located in the /etc/logstash folder."
A block of code is set as follows:
<memoryBacking>
<locked/>
</memoryBacking>
When we wish to draw your attention to a particular part of a code block, the relevant
lines or items are set in bold:
Bold: Indicates a new term, an important word, or words that you see on screen. For
example, words in menus or dialog boxes appear in the text like this. Here is an example:
"After you push the Refresh button, new data should appear on the page."
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book
title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes
do happen. If you have found a mistake in this book, we would be grateful if you would
report this to us. Please visit www.packtpub.com/support/errata, selecting your
book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the internet,
we would be grateful if you would provide us with the location address or website name.
Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise
in, and you are interested in either writing or contributing to a book, please visit
authors.packtpub.com.
Reviews
Please leave a review. Once you have read and used this book, why not leave a review on
the site that you purchased it from? Potential readers can then see and use your unbiased
opinion to make purchase decisions, we at Packt can understand what you think about
our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
Section 1:
KVM Virtualization Basics
Part 1 provides you with an insight into the prevailing technologies in Linux virtualization
and its advantages over other virtualization solutions. We will discuss the important data
structures and the internal implementation of libvirt, QEMU, and KVM.
This part of the book comprises the following chapters:
• Chapter 1, Understanding Linux Virtualization
• Chapter 2, KVM as a Virtualization Solution
(for example, IBM CP-40 and its S/360-40, from 1967). But it sure was a new idea for a
PC market, which was in a weird phase with many things happening at the same time.
Switching to 64-bit CPUs with multi-core CPUs appearing on the market, then switching
from DDR1 to DDR2, and then from PCI/ISA/AGP to PCI Express, as you might
imagine, was a challenging time.
Specifically, I remember thinking about the possibilities – how cool it would be to run an
OS, and then another couple of OSes on top of that. Working in the publishing industry,
you might imagine how many advantages that would offer to anyone's workflow, and I
remember really getting excited about it.
15 or so years of development later, we now have a competitive market in terms of
virtualization solutions – Red Hat with KVM, Microsoft with Hyper-V, VMware with
ESXi, Oracle with Oracle VM, and Google and other key players duking it out for users
and market dominance. This led to the development of various cloud solutions such as
EC2, AWS, Office 365, Azure, vCloud Director, and vRealize Automation for various types
of cloud services. All in all, it was a very productive 15 years for IT, wouldn't you say?
But, going back to October 2003, with all of the changes that were happening in the IT
industry, there was one that was really important for this book and virtualization for
Linux in general: the introduction of the first open source hypervisor for the x86 architecture,
called Xen. It supports various CPU architectures (Itanium, x86, x86_64, and ARM), and
it can run various OSes – Windows, Linux, Solaris, and some flavors of BSD – and it's still
alive and kicking as a virtualization solution of choice for some vendors, such as Citrix
(XenServer) and Oracle (Oracle VM). We'll get into more technical details about Xen a
little bit later in this chapter.
The biggest corporate player in the open source market, Red Hat, included Xen
virtualization in initial releases of its Red Hat Enterprise Linux 5, which was released in
2007. But Xen and Red Hat weren't exactly a match made in heaven and although Red
Hat shipped Xen with its Red Hat Enterprise Linux 5 distribution, Red Hat switched to
KVM in Red Hat Enterprise Linux 6 in 2010, which was – at the time – a very risky move.
Actually, the whole process of migrating from Xen to KVM began in the previous version,
with 5.3/5.4 releases, both of which came out in 2009. To put things into context, KVM
was a pretty young project back then, just a couple of years old. But there were more than
a few valid reasons why that happened, ranging from the technical (Xen was not in the
mainline kernel, while KVM was) to the political (Red Hat wanted more influence over Xen
development, and that influence was fading with time).
Technically speaking, KVM uses a different, modular approach that transforms Linux
kernels into fully functional hypervisors for supported CPU architectures. When we
say supported CPU architectures, we're talking about the basic requirement for KVM
virtualization – CPUs need to support hardware virtualization extensions, known as
AMD-V or Intel VT. To make things a bit easier, let's just say that you're really going to
have to try very hard to find a modern CPU that doesn't support these extensions. For
example, if you're using an Intel CPU on your server or desktop PC, the first CPUs that
supported hardware virtualization extensions date all the way back to 2006 (Xeon LV) and
2008 (Core i7 920). Again, we'll get into more technical details about KVM and provide a
comparison between KVM and Xen a little bit later in this chapter and in the next.
Types of virtualization
There are various types of virtualization solutions, all of which are aimed at different
use cases and depend on which piece of the hardware or software stack we're
virtualizing – that is, what you're virtualizing. It's also worth noting
that there are different types of virtualization in terms of how you're virtualizing – by
partitioning, full virtualization, paravirtualization, hybrid virtualization, or container-
based virtualization.
So, let's first cover the five different types of virtualization in today's IT based on what
you're virtualizing:
If you take a look at these virtualization solutions and scale them up massively (hint: the
cloud), that's when you realize that you're going to need various tools and solutions to
effectively manage the ever-growing infrastructure, hence the development of various
automation and orchestration tools. Some of these tools will be covered later in this
book, such as Ansible in Chapter 11, Ansible and Scripting for Orchestration and Automation. For
the time being, let's just say that you can't manage an environment that contains
thousands of virtual machines by relying on standard utilities only (scripts, commands,
and even GUI tools). You're definitely going to need a more programmatic, API-driven
approach that's tightly integrated with the virtualization solution, hence the development
of OpenStack, OpenShift, Ansible, and the Elasticsearch, Logstash, Kibana (ELK) stack,
which we'll cover in Chapter 14, Monitoring the KVM Virtualization Platform Using the
ELK Stack.
If we're talking about how we're virtualizing a virtual machine as an object, there are
different types of virtualization:
translation and I/O mapping. One of the main advantages of virtualization software is
its capability to run multiple guest operating systems on the same physical system or hardware.
These guest systems can run the same OS or different ones. For example, there
can be multiple Linux guest systems running on the same physical system.
The VMM is responsible for allocating the resources requested by these guest OSes. The
system hardware, such as the processor, memory, and so on, must be allocated to these
guest OSes according to their configuration, and the VMM can take care of this task. Due
to this, the VMM is a critical component in a virtualization environment.
In terms of types, we can categorize hypervisors as either type 1 or type 2.
However, a type 1 hypervisor doesn't favor customization. Generally, there will be some
restrictions when you try to install any third-party applications or drivers on it.
On the other hand, a type 2 hypervisor resides on top of the OS, allowing you to do
numerous customizations. Type 2 hypervisors are also known as hosted hypervisors
that are dependent on the host OS for their operations. The main advantage of type
2 hypervisors is the wide range of hardware support, because the underlying host OS
controls hardware access. The following diagram provides an illustration of the type 2
hypervisor design concept:
That's a good example of a type 2 hypervisor use case. Well-known type 2 hypervisors
include VMware Player, Workstation, Fusion, and Oracle VirtualBox. On the other
hand, if we're specifically aiming to create a server that we're going to use to host virtual
machines, then that's type 1 hypervisor territory.
In the upcoming sections, we will discuss Xen and KVM, which are the leading open
source virtualization solutions in Linux.
Xen
Xen originated at the University of Cambridge as a research project. The first public
release of Xen was in 2003. Later, the leader of this project at the University of Cambridge,
Ian Pratt, co-founded a company called XenSource with Simon Crosby (also from the
University of Cambridge). This company started to develop the project in an open
source fashion. On 15 April 2013, the Xen project was moved to the Linux Foundation
as a collaborative project. The Linux Foundation launched a new trademark for the Xen
Project to differentiate the project from any commercial use of the older Xen trademark.
More details about this can be found at https://xenproject.org/.
The Xen hypervisor has been ported to a number of processor families, such as Intel
IA-32/64, x86_64, PowerPC, ARM, MIPS, and so on.
Xen's core concept is based on four main building blocks:
• Xen hypervisor: The integral part of Xen that handles intercommunication between
the physical hardware and virtual machine(s). It handles all interrupts, timers, CPU
and memory requests, and hardware interaction.
• Dom0: Xen's control domain, which controls a virtual machine's environment. The
main part of it is called QEMU, a piece of software that emulates a regular computer
system by doing binary translation to emulate a CPU.
• Management utilities: Command-line utilities and GUI utilities that we use
to manage the overall Xen environment.
• Virtual machines (unprivileged domains, DomU): Guests that we're running
on Xen.
As shown in the following diagram, Dom0 is a completely separate entity that controls the
other virtual machines, while all the others are happily stacked next to each other, using
system resources provided by the hypervisor:
KVM
KVM represents the latest generation of open source virtualization. The goal of the project
was to create a modern hypervisor that builds on the experience of previous generations
of technologies and leverages the modern hardware available today (VT-x, AMD-V, and
so on).
KVM simply turns the Linux kernel into a hypervisor when you install the KVM kernel
module. However, as the standard Linux kernel is the hypervisor, it benefits from the
changes that were made to the standard kernel (memory support, scheduler, and so
on). Optimizations for these Linux components, such as the scheduler in the 3.1 kernel,
improvements to nested virtualization in 4.20+ kernels, new features for mitigation
of Spectre attacks, support for AMD Secure Encrypted Virtualization, Intel iGPU
passthrough in 4/5.x kernels, and so on, benefit both the hypervisor (the host OS) and the
Linux guest OSes. For I/O emulation, KVM uses userland software called QEMU; this is a
userland program that does hardware emulation.
QEMU emulates the processor and a long list of peripheral devices such as the disk,
network, VGA, PCI, USB, serial/parallel ports, and so on to build a complete piece of
virtual hardware that the guest OS can be installed on. This emulation is powered by KVM.
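If you have a QEMU binary installed, you can see how long this list of emulated devices really is; these are purely informational queries and don't start a virtual machine:
# List every device model this QEMU binary can emulate
qemu-system-x86_64 -device help
# List the CPU models that can be presented to a guest
qemu-system-x86_64 -cpu help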
• OpenStack: A fully open source cloud OS that consists of several open source sub
projects that provide all the building blocks to create an IaaS cloud. KVM (Linux
virtualization) is the most used (and best-supported) hypervisor in OpenStack
deployments. It's governed by the vendor-agnostic OpenStack Foundation. How
to build an OpenStack cloud using KVM will be explained in detail in Chapter 12, Scaling Out KVM with OpenStack.
There are other important questions to consider when discussing OpenStack beyond
the technical bits and pieces that we've discussed so far in this chapter. One of the most
important concepts in IT today is actually being able to run an environment (purely
virtualized one, or a cloud environment) that includes various types of solutions (such
as virtualization solutions) by using some kind of management layer that's capable of
working with different solutions at the same time. Let's take OpenStack as an example of
this. If you go through the OpenStack documentation, you'll soon realize that OpenStack
supports 10+ different virtualization solutions, including the following:
• KVM
• Xen (via libvirt)
• LXC (Linux containers)
• Microsoft Hyper-V
• VMware ESXi
• Citrix XenServer
• User Mode Linux (UML)
• PowerVM (IBM Power 5-9 platform)
• Virtuozzo (hyperconverged solution that can use virtual machines, storage, and
containers)
• z/VM (virtualization solution for IBM Z and IBM LinuxONE servers)
That brings us to multi-cloud environments that can span different CPU
architectures, different hypervisors, and other technologies – all under
the same management toolset. This is just one thing that you can do with OpenStack.
We'll get back to the subject of OpenStack later in this book, specifically in Chapter 12,
Scaling Out KVM with OpenStack.
Summary
In this chapter, we covered the basics of virtualization and its different types. Keep in
mind the importance of virtualization in today's large-scale IT world, and how these
concepts can be tied together to create a bigger picture – large,
virtualized environments and cloud environments. Cloud-based technologies will be
covered later in much greater detail – treat what we've mentioned so far as a starter; the
main course is still to come. But the next chapter belongs to the main star of our book –
the KVM hypervisor and its related utilities.
Questions
1. Which types of hypervisors exist?
2. What are containers?
3. What is container-based virtualization?
4. What is OpenStack?
Further reading
Please refer to the following links for more information regarding what was covered in
this chapter:
2
KVM as a Virtualization Solution
• Virtualization as a concept
• The internal workings of libvirt, QEMU, and KVM
• How all these communicate with each other to provide virtualization
Virtualization as a concept
Virtualization is a computing approach that decouples hardware from software. It provides
a better, more efficient, and programmatic approach to resource splitting and sharing
between various workloads – virtual machines running OSes, and applications on top
of them.
If we were to compare traditional, physical computing of the past with virtualization, we
can say that by virtualizing, we get the possibility to run multiple guest OSes (multiple
virtual servers) on the same piece of hardware (same physical server). If we're using a type
1 hypervisor (explained in Chapter 1, Understanding Linux Virtualization), this means
that the hypervisor is going to be in charge of letting the virtual servers access the physical
hardware. This is necessary because multiple virtual servers share the same hardware
on the same physical server. This is usually supported by some
kind of scheduling algorithm that's implemented programmatically in hypervisors so that
we can get more efficiency from the same physical server.
In a virtualized world, we're running a hypervisor (such as KVM), and virtual machines
on top of that hypervisor. Inside these virtual machines, we're running the same OS and
application, just like in the physical server. The virtualized approach is shown in the
following diagram:
There are still various scenarios in which the physical approach is going to be needed. For
example, there are still thousands of applications on physical servers all over the world
because these servers can't be virtualized. There are different reasons why they can't be
virtualized. For example, the most common reason is also the simplest one – maybe
these applications are being run on an OS that's not on the virtualization software vendor's
supported OS list. That can mean that you can't virtualize that OS/application
combination because that OS doesn't support some virtualized hardware, most commonly
a network or a storage adapter. The same general idea applies to the cloud as well –
moving things to the cloud isn't always the best idea, as we will describe later in this book.
In conclusion, for PC-based servers, looking from the CPU perspective, switching to
multi-core CPUs was an opportune moment to start working toward virtualization as
the concept that we know and love today.
In parallel with these developments, CPUs got other additions – for example, additional
CPU registers that can handle specific types of operations. A lot of people heard about
instruction sets such as MMX, SSE, SSE2, SSE3, SSE4.x, AVX, AVX2, AES, and so on.
These are all very important today as well because they give us a possibility of offloading
certain instruction types to a specific CPU register. This means that these instructions
don't have to be run on a CPU as a general serial device, which executes these tasks
slower. Instead, these instructions can be sent to a specific CPU register that's specialized
for these instructions. Think of it as having separate mini accelerators on a CPU die
that could run some pieces of the software stack without hogging the general CPU
pipeline. One of these additions was Virtual Machine Extensions (VMX) for Intel, or
AMD Virtualization (AMD-V), both of which enable us to have full, hardware-based
virtualization support for their respective platforms.
Keeping all of this in mind, these are the requirements that we have to comply with
today so that we can run modern-day hypervisors with full hardware-assisted
virtualization support:
• The possibility to do PCI passthrough, which means we can take a PCI Express
card (for example, a video card) connected to a server motherboard and
present it to a virtual machine as if that card was directly connected to the virtual
machine, via functionality called Physical Functions (PFs). This means bypassing
various hypervisor levels that the connection would ordinarily take place through.
A quick way to check whether your host is ready for this is shown after this list.
• Trusted Platform Module (TPM) support, which is usually implemented as an
additional motherboard chip. Using TPM can have a lot of advantages in terms of
security because it can be used to provide cryptographic support (that is, to create,
save, and secure the use of cryptographic keys). There was quite a bit of buzz in the
Linux world around the use of TPM with KVM virtualization, which led to Intel's
open sourcing of the TPM2 stack in the summer of 2018.
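As a rough illustration, here is how you could check for some of these capabilities on a typical Linux host. The exact kernel messages and device names vary between distributions and hardware, so treat this as a sketch rather than a definitive test:
# PCI passthrough additionally relies on an IOMMU (Intel VT-d or AMD-Vi) being
# enabled in the firmware and activated via the kernel command line
# (intel_iommu=on or amd_iommu=on)
dmesg | grep -i -e DMAR -e IOMMU
# When the IOMMU is active, passthrough-capable devices appear in IOMMU groups
ls /sys/kernel/iommu_groups/
# A TPM chip, if present and enabled, is usually exposed as a character device
ls -l /dev/tpm* 2>/dev/null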
When discussing SR-IOV and PCI passthrough, make sure that you take note of the core
functionalities, called PF and VF. These two keywords will make it easier to remember
where (on a physical or virtual level) and how (directly or via a hypervisor) devices are
forwarded to their respective virtual machines. These capabilities are very important for
the enterprise space and quite a few specific scenarios. Just as an example, there's literally
no way to have a virtual desktop infrastructure (VDI) solution with workstation-grade
virtual machines that you can use to run AutoCAD and similar applications without
these capabilities. This is because integrated graphics on CPUs are just too slow to do
that properly. That's when you start adding GPUs to your servers – so that you can use
Why is this important? The more channels you have and the lower the latency is, the more
bandwidth you have from CPU to memory. And that is very, very desirable for a lot of
workloads in today's IT space (for example, databases).
As shown in the preceding diagram, the protection rings are numbered from the most
privileged to the least privileged. Ring 0 is the level with the most privilege and interacts
directly with physical hardware, such as the CPU and memory. The resources, such as
memory, I/O ports, and CPU instructions, are protected via these privileged rings. Rings
1 and 2 are mostly unused. Most general-purpose systems use only two rings, even if the
hardware they run on provides more CPU modes than that. The two main CPU modes
are the kernel mode and the user mode, which are also related to the way processes are
executed. You can read more about it at this link: https://access.redhat.com/
sites/default/files/attachments/processstates_20120831.pdf. From
an OS's point of view, ring 0 is called the kernel mode/supervisor mode and ring 3 is the
user mode. As you may have assumed, applications run in ring 3.
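You can observe these user-to-kernel transitions on any Linux system by tracing the system calls that an ordinary ring 3 program makes. The following is a minimal illustration, assuming the strace utility is installed:
# Every line printed by strace is a system call, that is, a controlled switch from
# user mode (ring 3) into kernel mode (ring 0) and back
strace -e trace=openat,read,write cat /etc/hostname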
OSes such as Linux and Windows use the supervisor/kernel and user modes. User mode can do
almost nothing to the outside world without calling on the kernel for help, due
to its restricted access to memory, CPU, and I/O ports. The kernel runs in privileged
mode, which means that it runs in ring 0. To perform specialized functions, the
user-mode code (all the applications that run in ring 3) must perform a system call to
the supervisor mode or even to the kernel space, where the trusted code of the OS will
perform the needed task and return the execution back to the userspace. In short, the OS
runs in ring 0 in a normal environment. It needs the most privileged level to do resource
management and provide access to the hardware. The following diagram explains this:
The rings above 0 run instructions in a processor mode called unprotected. The
hypervisor/Virtual Machine Monitor (VMM) needs to access the memory, CPU, and
I/O devices of the host. Since only the code running in ring 0 is allowed to perform these
operations, it needs to run in the most privileged ring, which is ring 0, and has to be
placed next to the kernel. Without specific hardware virtualization support, the hypervisor
or VMM runs in ring 0; this basically blocks the virtual machine's OS from running in ring 0. So, the
virtual machine's OS must reside in ring 1. An OS installed in a virtual machine is also
expected to access all the resources as it's unaware of the virtualization layer; to achieve
this, it has to run in ring 0, similar to the VMM. Due to the fact that only one kernel can
run in ring 0 at a time, the guest OSes have to run in another ring with fewer privileges or
have to be modified to run in user mode.
This has resulted in the introduction of a couple of virtualization methods called full
virtualization and paravirtualization, which we mentioned earlier. Now, let's try to
explain them in a more technical way.
Full virtualization
In full virtualization, privileged instructions are emulated to overcome the limitations
that arise from the guest OS running in ring 1 and the VMM running in ring 0. Full
virtualization was implemented in first-generation x86 VMMs. It relies on techniques
such as binary translation to trap and virtualize the execution of certain sensitive and
non-virtualizable instructions. This being said, in binary translation, some system calls are
interpreted and dynamically rewritten. The following diagram depicts how the guest OS
accesses the host computer hardware through ring 1 for privileged instructions and how
unprivileged instructions are executed without the involvement of ring 1:
Paravirtualization
In paravirtualization, the guest OS needs to be modified to allow those instructions to
access ring 0. In other words, the OS needs to be modified to communicate between the
VMM/hypervisor and the guest through the backend (hypercalls) path:
Paravirtualization (https://en.wikipedia.org/wiki/Paravirtualization)
is a technique in which the hypervisor provides an API, and the OS of the guest virtual
machine calls that API, which requires guest OS modifications. Privileged instruction calls
are exchanged with the API functions provided by the VMM. In this case, the modified
guest OS can run in ring 0.
As you can see, under this technique, the guest kernel is modified to run on the VMM. In
other words, the guest kernel knows that it's been virtualized. The privileged instructions/
operations that are supposed to run in ring 0 have been replaced with calls known as
hypercalls, which talk to the VMM. These hypercalls invoke the VMM so that it performs
the task on behalf of the guest kernel. Since the guest kernel can communicate directly
with the VMM via hypercalls, this technique results in greater performance compared
to full virtualization. However, this requires a specialized guest kernel that is aware of
paravirtualization and comes with needed software support.
Paravirtualization and full virtualization used to be the common ways to do
virtualization, but not in the best possible, most manageable way. That's where hardware-assisted
virtualization comes into play, as we will describe in the following section.
Hardware-assisted virtualization
Intel and AMD realized that full virtualization and paravirtualization pose major
challenges for virtualization on the x86 architecture (since the scope of this book is limited
to the x86 architecture, we will mainly discuss the evolution of this architecture here) due to
the performance overhead and the complexity of designing and maintaining such solutions.
Intel and AMD independently created new processor extensions of the x86 architecture,
called Intel VT-x and AMD-V, respectively. On the Itanium architecture, hardware-
assisted virtualization is known as VT-i. Hardware-assisted virtualization is a platform
virtualization method designed to efficiently use full virtualization with the hardware
capabilities. Various vendors call this technology by different names, including accelerated
virtualization, hardware virtual machine, and native virtualization.
For better support for virtualization, Intel and AMD introduced Virtualization
Technology (VT) and Secure Virtual Machine (SVM), respectively, as extensions of the
IA-32 instruction set. These extensions allow the VMM/hypervisor to run a guest OS that
expects to run in kernel mode, in lower privileged rings. Hardware-assisted virtualization
not only proposes new instructions but also introduces a new privileged access level,
called ring -1, where the hypervisor/VMM can run. Hence, guest virtual machines can
run in ring 0. With hardware-assisted virtualization, the OS has direct access to resources
without any emulation or OS modification. The hypervisor or VMM can now run at
the newly introduced privilege level, ring -1, with the guest OSes running on ring 0.
Also, with hardware-assisted virtualization, the VMM/hypervisor is relaxed and needs
to perform less work compared to the other techniques mentioned, which reduces the
performance overhead. This capability to run directly in ring -1 can be described with the
following diagram:
libvirt
When working with KVM, you're most likely to first interface with its main Application
Programming Interface (API), called libvirt (https://libvirt.org). But libvirt has
other functionalities – it's also a daemon and a management tool for different hypervisors,
some of which we mentioned earlier. One of the most common tools used to interface
with libvirt is called virt-manager (http://virt-manager.org), a Gnome-based
graphical utility that you can use to manage various aspects of your local and remote
hypervisors, if you so choose. libvirt's CLI utility is called virsh. Keep in mind that you
can manage remote hypervisors via libvirt, so you're not restricted to a local hypervisor
only. That's why virt-manager has an additional parameter called --connect. libvirt
is also part of various other KVM management tools, such as oVirt (http://www.ovirt.org),
which we will discuss in the next chapter.
The goal of the libvirt library is to provide a common and stable layer for managing virtual
machines running on a hypervisor. In short, as a management layer, it is responsible for
providing the API that performs management tasks such as virtual machine provision,
creation, modification, monitoring, control, migration, and so on. In Linux, you will have
noticed that some of the processes are daemonized. The libvirt process is also daemonized,
and it is called libvirtd. As with any other daemon process, libvirtd provides services
to its clients upon request. Let's try to understand what exactly happens when a libvirt
client such as virsh or virt-manager requests a service from libvirtd. Based on the
connection URI (discussed in the following section) that's passed by the client, libvirtd
opens a connection to the hypervisor. This is how clients such as virsh or virt-manager ask
libvirtd to start talking to the hypervisor. In the scope of this book, we are aiming to
look at KVM virtualization technology. So, it would be better to think about it in terms of
a QEMU/KVM hypervisor instead of discussing some other hypervisor communication
from libvirtd. You may be a bit confused when you see QEMU/KVM as the underlying
hypervisor name instead of either QEMU or KVM. But don't worry – all will become clear
in due course. The connection between QEMU and KVM will be discussed in the following
chapters. For now, just know that there is a hypervisor that uses both the QEMU and
KVM technologies.
Let's take a look at the source code now. We can get the libvirt source code from the libvirt
Git repository:
[root@kvmsource]# yum -y install git-core
[root@kvmsource]# git clone git://libvirt.org/libvirt.git
Once you clone the repo, you can see the following hierarchy of files in the repo:
The ability to connect to various virtualization solutions gets us much more usability out
of the virsh command. This might come in very handy in mixed environments, such as
if you're connecting to both KVM and Xen hypervisors from the same system.
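To illustrate the point, here are a few connection URIs that virsh accepts; the hostnames used here (kvmhost.example.com and xenhost.example.com) are just placeholders:
# Local QEMU/KVM connection – the default system instance
virsh --connect qemu:///system list --all
# Remote QEMU/KVM host, tunneled over SSH
virsh --connect qemu+ssh://root@kvmhost.example.com/system list --all
# A Xen host can be managed from the same client by changing the URI
virsh --connect xen+ssh://root@xenhost.example.com/ list --all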
As in the preceding figure, there is a public API that is exposed to the outside world.
Depending on the connection URI (for example, virsh --connect qemu://xxxx/
system) passed by the clients, when initializing the library, this public API uses internal
drivers in the background. Yes, there are different categories of driver implementations
in libvirt. For example, there are hypervisor, interface, network, nodeDevice,
nwfilter, secret, storage, and so on. Refer to driver.h inside the libvirt source
code to learn about the driver data structures and other functions associated with the
different drivers.
Take the following example:
struct _virConnectDriver {
virHypervisorDriverPtr hypervisorDriver;
virInterfaceDriverPtr interfaceDriver;
virNetworkDriverPtr networkDriver;
virNodeDeviceDriverPtr nodeDeviceDriver;
The internal workings of libvirt, QEMU, and KVM 33
virNWFilterDriverPtr nwfilterDriver;
virSecretDriverPtr secretDriver;
virStorageDriverPtr storageDriver;
};
The struct fields are self-explanatory and convey which type of driver is represented
by each of the field members. As you might have assumed, one of the important or
main drivers is the hypervisor driver, which is the driver implementation of different
hypervisors supported by libvirt. The drivers are categorized as primary and secondary
drivers. The hypervisor driver is an example of a primary driver. The following list gives
us some idea about the hypervisors supported by libvirt. In other words, hypervisor-level
driver implementations exist for the following hypervisors (check the README and the
libvirt source code):
Previously, we mentioned that there are secondary-level drivers as well. Not all, but
some secondary drivers (see the following) are shared by several hypervisors. That said,
currently, these secondary drivers are used by hypervisors such as the LXC, OpenVZ,
QEMU, UML, and Xen drivers. The ESX, Hyper-V, Power Hypervisor, Remote, Test, and
VirtualBox drivers all implement secondary drivers directly.
Examples of secondary-level drivers include the following:
are consumed to perform these operations, such as interface setup, firewall rules,
storage management, and general provisioning of APIs. The following is from
https://libvirt.org/api.html:
You should now have some idea about the internal structure of libvirt implementations;
this can be expanded further:
Our area of interest is QEMU/KVM. So, let's explore it further. Inside the src directory
of the libvirt source code repository, there is a directory for QEMU hypervisor driver
implementation code. Pay some attention to the source files, such as qemu_driver.c,
which carries core driver methods for managing QEMU guests.
See the following example:
libvirt makes use of different driver codes to probe the underlying hypervisor/emulator. In
the context of this book, the component of libvirt responsible for finding out the QEMU/
KVM presence is the QEMU driver code. This driver probes for the qemu-kvm binary
and the /dev/kvm device node to confirm that KVM fully virtualized, hardware-accelerated
guests are available. If these are not available, the possibility of a QEMU
emulator (without KVM) is verified with the presence of binaries such as qemu, qemu-
system-x86_64, qemu-system-mips, qemu-system-microblaze, and so on.
The internal workings of libvirt, QEMU, and KVM 37
Basically, libvirt's QEMU driver is looking for different binaries in different distributions and
different paths – for example, qemu-kvm in RHEL/Fedora. Also, it finds a suitable QEMU
binary based on the architecture combination of both host and guest. If both the QEMU
binary and KVM are found, then KVM fully virtualized, hardware-accelerated guests
will be available. It's also libvirt's responsibility to form the entire command-line argument
for the QEMU-KVM process. Finally, after forming the entire command-line (qemu_
command.c) arguments and inputs, libvirt calls exec() to create a QEMU-KVM process:
util/vircommand.c
static int virExec(virCommandPtr cmd) {
…...
if (cmd->env)
execve(binary, cmd->args, cmd->env);
else
execv(binary, cmd->args);
In KVMland, there is a misconception that libvirt directly uses the device file (/dev/
kvm) exposed by KVM kernel modules, and instructs KVM to do the virtualization via
the different ioctl() function calls available with KVM. This is indeed a misconception!
As mentioned earlier, libvirt spawns the QEMU-KVM process and QEMU talks to the
KVM kernel modules. In short, QEMU talks to KVM via different ioctl() to the
/dev/kvm device file exposed by the KVM kernel module. To create a virtual machine
(for example, virsh create), all libvirt does is spawn a QEMU process, which in turn
creates the virtual machine. Please note that a separate QEMU-KVM process is launched
for each virtual machine by libvirtd. Properties of virtual machines (the number
of CPUs, memory size, I/O device configuration, and so on) are defined in separate
XML files that are located in the /etc/libvirt/qemu directory. These XML files
contain all of the necessary settings that QEMU-KVM processes need to start running
virtual machines. libvirt clients issue requests via the AF_UNIX socket /var/run/
libvirt/libvirt-sock that libvirtd is listening on.
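You can see all of this for yourself on a host that's already running a KVM virtual machine. In the following sketch, MyGuest is a placeholder for an existing virtual machine name:
# Each running virtual machine is a separate QEMU-KVM process spawned by libvirtd;
# the full command line that libvirt generated is visible in the process list
ps -ef | grep [q]emu
# The persistent XML definitions that libvirt keeps for QEMU/KVM guests
ls /etc/libvirt/qemu/
# The same configuration, as seen through the libvirt API
virsh dumpxml MyGuest | head -n 20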
The next topic on our list is QEMU – what it is, how it works, and how it interacts
with KVM.
QEMU
QEMU was written by Fabrice Bellard (the creator of FFmpeg). It's free software,
mainly licensed under the GNU General Public License (GPL). QEMU is a generic and
open source machine emulator and virtualizer. When used as a machine emulator, QEMU
can run OSes and programs made for one machine (such as an ARM board) on a different
machine (such as your own PC).
QEMU as an emulator
In the previous chapter, we discussed binary translation. When QEMU operates as
an emulator, it is capable of running OSes/programs made for one machine type on a
different machine type. How is this possible? It just uses binary translation methods. In
this mode, QEMU emulates CPUs through dynamic binary translation techniques and
provides a set of device models. Thus, it is enabled to run different unmodified guest OSes
with different architectures. Binary translation is needed here because the guest code has
to be executed in the host CPU. The binary translator that does this job is known as a Tiny
Code Generator (TCG); it's a Just-In-Time (JIT) compiler. It transforms the binary code
written for a given processor into another form of binary code (for example, running ARM code on x86),
as shown in the following diagram (TCG information from Wikipedia at https://en.wikipedia.org/wiki/QEMU#Tiny_Code_Generator):
QEMU as a virtualizer
This is the mode where QEMU executes the guest code directly on the host CPU, thus
achieving native performance. For example, when working under Xen/KVM hypervisors,
QEMU can operate in this mode. If KVM is the underlying hypervisor, QEMU can
virtualize embedded guests such as Power PC, S390, x86, and so on. In short, QEMU is
capable of running without KVM using the aforementioned binary translation method.
This execution will be slower compared to the hardware-accelerated virtualization enabled
by KVM. In any mode, either as a virtualizer or emulator, QEMU not only emulates the
processor; it also emulates different peripherals, such as disks, networks, VGA, PCI,
serial and parallel ports, USB, and so on. Apart from this I/O device emulation, when
working with KVM, QEMU-KVM creates and initializes virtual machines. As shown
in the following diagram, it also initializes different POSIX threads for each virtual
CPU (vCPU) of a guest. It also provides a framework that's used to emulate the virtual
machine's physical address space within the user-mode address space of QEMU-KVM:
To execute the guest code in the physical CPU, QEMU makes use of POSIX threads. That
being said, the guest vCPUs are executed in the host kernel as POSIX threads. This itself
brings lots of advantages, as these are just some processes for the host kernel at a high-
level view. From another angle, the user-space part of the KVM hypervisor is provided
by QEMU. QEMU runs the guest code via the KVM kernel module. When working with
KVM, QEMU also does I/O emulation, I/O device setup, live migration, and so on.
QEMU opens the device file (/dev/kvm) that's exposed by the KVM kernel module and
executes ioctl() function calls on it. Please refer to the next section on KVM to find
out more about these ioctl()function calls. To conclude, KVM makes use of QEMU
to become a complete hypervisor. KVM is an accelerator or enabler of the hardware
virtualization extensions (VMX or SVM) provided by the processor so that they're tightly
coupled with the CPU architecture. Indirectly, this conveys that virtual systems must also
use the same architecture to make use of hardware virtualization extensions/capabilities.
Once it is enabled, it will definitely give better performance than other techniques, such as
binary translation.
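As a simple sketch of the difference between the two modes, here is how the same guest disk image could be started with and without KVM acceleration; guest-disk.qcow2 and the memory size are purely illustrative:
# Pure emulation – force the TCG binary translator, no KVM involved
qemu-system-x86_64 -accel tcg -m 1024 -hda guest-disk.qcow2
# Hardware-assisted virtualization – QEMU uses the KVM kernel module via /dev/kvm
qemu-system-x86_64 -enable-kvm -m 1024 -hda guest-disk.qcow2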
Our next step is to check how QEMU fits into the whole KVM story.
Once it's cloned, you can see a hierarchy of files inside the repo, as shown in the
following screenshot:
Data structures
In this section, we will discuss some of the important data structures of QEMU. The
KVMState structure contains important file descriptors of virtual machine representation
in QEMU. For example, it contains the virtual machine file descriptor, as shown in the
following code:
int vmfd;
int coalesced_mmio;
struct kvm_coalesced_mmio_ring *coalesced_mmio_ring; ….}
struct CPUState {
…..
int nr_cores;
int nr_threads;
…
int kvm_fd;
….
struct KVMState *kvm_state;
struct kvm_run *kvm_run;
}
Also, CPUX86State holds the standard registers that are used for exception and interrupt handling.
kvm_init() is the function that opens the KVM device file, as shown in the following
code, and it also fills fd [1] and vmfd [2] of KVMState:
.....
}
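As the preceding listing is heavily abridged, here is a minimal, standalone sketch (not the actual QEMU code) of what kvm_init() essentially does with the KVM device file:
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
    int fd = open("/dev/kvm", O_RDWR | O_CLOEXEC);    /* [1] the system (KVM) fd */
    int version = ioctl(fd, KVM_GET_API_VERSION, 0);  /* expected to return 12 */
    int vmfd = ioctl(fd, KVM_CREATE_VM, 0);           /* [2] the VM fd */
    printf("KVM API version %d, vmfd %d\n", version, vmfd);
    return 0;
}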
As you can see in the preceding code, the ioctl() function call with the KVM_CREATE_
VM argument will return vmfd. Once QEMU has fd and vmfd, one more file descriptor
has to be filled, which is just kvm_fd or vcpu fd. Let's see how this is filled by QEMU:
main() ->
-> cpu_init(cpu_model); [#define cpu_init(cpu_model) CPU(cpu_x86_init(cpu_model))]
->cpu_x86_create()
->qemu_init_vcpu
->qemu_kvm_start_vcpu()
->qemu_thread_create
->qemu_kvm_cpu_thread_fn()
-> kvm_init_vcpu(CPUState *cpu)
int kvm_init_vcpu(CPUState *cpu)
{
KVMState *s = kvm_state;
...
ret = kvm_vm_ioctl(s, KVM_CREATE_VCPU, (void *)kvm_arch_vcpu_id(cpu));
cpu->kvm_fd = ret; ---> [vCPU fd]
..
mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
cpu->kvm_run = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
MAP_SHARED, cpu->kvm_fd, 0); [3]
...
ret = kvm_arch_init_vcpu(cpu); [target-i386/kvm.c]
…..
}
Some of the memory pages are shared between the QEMU-KVM process and the KVM
kernel modules. You can see such a mapping in the kvm_init_vcpu() function. That
said, two host memory pages per vCPU make a channel for communication between the
QEMU user-space process and the KVM kernel modules: kvm_run and pio_data. Also
understand that, during the execution of these ioctl() function calls that return the
preceding fds, the Linux kernel allocates a file structure and related anonymous nodes.
The kvm_cpu_exec() function also defines the actions that need to be taken when
control comes back to the QEMU-KVM user space from KVM with a VM exit.
Even though we will discuss later on how KVM and QEMU communicate with each
other to perform an operation on behalf of the guest, let me touch upon this here. KVM
is an enabler of hardware extensions provided by vendors such as Intel and AMD with
their virtualization extensions such as SVM and VMX. These extensions are used by
KVM to directly execute the guest code on host CPUs. However, if there is an event – for
example, if, as part of an operation, the guest kernel code accesses a hardware device register
that is emulated by QEMU – then KVM has to exit back to QEMU and pass control. Then,
QEMU can emulate the outcome of the operation. There are different exit reasons, as
shown in the following code:
switch (run->exit_reason) {
case KVM_EXIT_IO:
    DPRINTF("handle_io\n");
    ...
case KVM_EXIT_MMIO:
    DPRINTF("handle_mmio\n");
    ...
case KVM_EXIT_IRQ_WINDOW_OPEN:
    DPRINTF("irq_window_open\n");
    ...
case KVM_EXIT_SHUTDOWN:
    DPRINTF("shutdown\n");
    ...
case KVM_EXIT_UNKNOWN:
    ...
case KVM_EXIT_INTERNAL_ERROR:
    ...
case KVM_EXIT_SYSTEM_EVENT:
    switch (run->system_event.type) {
    case KVM_SYSTEM_EVENT_SHUTDOWN:
    case KVM_SYSTEM_EVENT_RESET:
    case KVM_SYSTEM_EVENT_CRASH:
        ...
Now that we know about the QEMU-KVM internals, let's discuss the threading model in
QEMU. QEMU-KVM is a multithreaded, event-driven application, and its most important
threads are as follows:
• Main thread
• Worker threads for the virtual disk I/O backend
• One thread for each vCPU
For each and every virtual machine, there is a QEMU process running in the host system.
If the guest system is shut down, this process will be destroyed/exited. Apart from vCPU
threads, there are dedicated I/O threads running a select(2) event loop to process I/O,
such as network packets and disk I/O completion. I/O threads are also spawned by
QEMU. In short, the situation will look like this:
Important note
More details about the threading model can be found at blog.vmsplice.net/2011/03/
qemu-internals-overall-architecture-and.html.
The event loop thread is also called iothread. Event loops are used for timers, file
descriptor monitoring, and so on. The QEMU main event loop thread runs
main_loop_wait(), which is responsible for the main loop services, including file
descriptor callbacks, bottom halves, and timers (defined in qemu-timer.h). Bottom
halves are similar to timers that execute immediately but have lower overhead, and
scheduling them is wait-free, thread-safe, and signal-safe.
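A quick way to see this threading model in practice is to list the threads of a running virtual machine's QEMU process (the binary may be called qemu-kvm or qemu-system-x86_64, depending on the distribution):
ps -T -p $(pgrep -f qemu | head -n 1)
The output shows one thread per vCPU alongside the main loop and worker threads.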
Before we leave the QEMU code base, I would like to point out that there are mainly two
parts to the device code. For example, the block directory contains the host side of the block
device code, and hw/block/ contains the code for device emulation.
KVM
There is a common kernel module called kvm.ko and also hardware-based kernel
modules such as kvm-intel.ko (Intel-based systems) and kvm-amd.ko (AMD-based
systems). Accordingly, KVM will load the kvm-intel.ko (if the vmx flag is present)
or kvm-amd.ko (if the svm flag is present) modules. This turns the Linux kernel into
a hypervisor, thus achieving virtualization.
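A quick check for both the CPU virtualization flags and the loaded KVM modules looks like this:
grep -E -c '(vmx|svm)' /proc/cpuinfo
lsmod | grep kvm
A non-zero count from the first command means that the processor advertises VMX or SVM; the second command should list kvm plus either kvm_intel or kvm_amd.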
KVM exposes a device file called /dev/kvm to applications so that they can make use of
the ioctl() system calls it provides. QEMU makes use of this device file
to talk with KVM and create, initialize, and manage the kernel-mode context of virtual
machines.
Previously, we mentioned that the QEMU-KVM userspace hosts the virtual machine's
physical address space within the user-mode address space of QEMU/KVM, which
includes memory-mapped I/O. KVM helps us achieve that. There are more things that can
be achieved with the help of KVM. The following are some examples:
• Emulation of certain I/O devices; for example, (via MMIO) the per-CPU local APIC
and the system-wide IOAPIC.
• Emulation of certain privileged (R/W of system registers CR0, CR3, and CR4)
instructions.
• The facilitation to run guest code via VMENTRY and handling intercepted events
at VMEXIT.
• Injecting events, such as virtual interrupts and page faults, into the flow of the
execution of the virtual machine and so on. This is also achieved with the help
of KVM.
KVM is not a full hypervisor; however, with the help of QEMU and emulators (a slightly
modified QEMU for I/O device emulation and BIOS), it can become one. KVM needs
hardware virtualization-capable processors to operate. Using these capabilities, KVM
turns the standard Linux kernel into a hypervisor. When KVM runs virtual machines,
every virtual machine is a normal Linux process, which can obviously be scheduled to
run on a CPU by the host kernel, as with any other process present in the host kernel.
In Chapter 1, Understanding Linux Virtualization, we discussed different CPU modes of
execution. As you may recall, there is mainly a user mode and a kernel/supervisor mode.
KVM is a virtualization feature in the Linux kernel that lets a program such as QEMU
safely execute guest code directly on the host CPU. This is only possible when the target
architecture is supported by the host CPU.
However, KVM introduced one more mode called guest mode. In a nutshell, guest mode
allows us to execute guest system code. It can either run the guest user or the kernel code.
With the support of virtualization-aware hardware, KVM virtualizes these process states.
With Intel's VT-X, the VMM runs in VMX root operation mode, while the guests (which
are unmodified OSes) run in VMX non-root operation mode. This VMX brings additional
virtualization-specific instructions to the CPU, such as VMPTRLD, VMPTRST, VMCLEAR,
VMREAD, VMWRITE, VMCALL, VMLAUNCH, VMRESUME, VMXOFF, and VMXON. The
virtualization mode (VMX) is turned on by VMXON and can be disabled by VMXOFF.
To execute the guest code, we have to use the VMLAUNCH/VMRESUME instructions, and we leave it
with VMEXIT. But wait, leave what? It's a transition from non-root operation to root operation.
Obviously, when we do this transition, some information needs to be saved so that it
can be fetched later. Intel provides a structure to facilitate this transition called VMCS; it
handles much of the virtualization management functionality. For example, in the case of
VMEXIT, the exit reason will be recorded inside this structure. Now, how do we read or
write from this structure? VMREAD and VMWRITE instructions are used to read or write to
the respective fields.
Previously, we discussed SLAT/EPT/AMD-Vi. Without EPT, the hypervisor must exit
the virtual machine to perform address translations, which reduces performance. As we
saw with the operating modes of Intel's virtualization-capable processors, AMD's SVM also
has two operating modes, which are simply host mode and guest mode. As
you may have assumed, the hypervisor runs in host mode and the guests run in guest
mode. Obviously, when in guest mode, some instructions can cause VMEXIT exceptions,
which are handled in a manner that is specific to the way guest mode is entered.
Please note that the VMCS and VMCB store guest configuration specifics, such
as machine control bits and processor register settings. I suggest that you examine
the structure definitions from the source. These data structures are also used by the
hypervisor to define events to monitor while the guest is executing. These events can
be intercepted. Note that these structures are in the host memory. At the time of
VMEXIT, the guest state is saved in the VMCS. As mentioned earlier, the VMREAD instruction
reads the specified field from the VMCS, while the VMWRITE instruction writes the
specified field to the VMCS. Also, note that there is one VMCS or VMCB per vCPU.
These control structures are part of the host memory. The vCPU state is recorded in these
control structures.
KVM APIs
As mentioned earlier, there are three main types of ioctl() function calls. The kernel
documentation says the following (you can check it at https://www.kernel.org/doc/
Documentation/virtual/kvm/api.txt):
Three sets of ioctl make up the KVM API. The KVM API is a set of ioctls
that are issued to control various aspects of a virtual machine. These ioctls
belong to three classes:
- System ioctls: These query and set global attributes, which affect the whole KVM
subsystem. In addition, a system ioctl is used to create virtual machines.
- Device ioctls: Used for device control, executed from the same context that
spawned the VM creation.
- VM ioctls: These query and set attributes that affect an entire virtual
machine—for example, memory layout. In addition, a VM ioctl is used
to create virtual CPUs (vCPUs). It runs VM ioctls from the same process
(address space) that was used to create the VM.
- vCPU ioctls: These query and set attributes that control the operation of
a single virtual CPU. They run vCPU ioctls from the same thread that was
used to create the vCPU.
To find out more about the ioctl() function calls exposed by KVM and the ioctl()
function calls that belong to a particular group of fd, please refer to KVM.h.
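To make these classes more concrete, here is a hedged user-space sketch (not QEMU code; error handling and guest memory setup via KVM_SET_USER_MEMORY_REGION are omitted) that issues one ioctl of each class and then enters the run loop we just described:
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

int main(void)
{
    int fd = open("/dev/kvm", O_RDWR);                      /* system fd */
    int vmfd = ioctl(fd, KVM_CREATE_VM, 0);                 /* system ioctl */
    int vcpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0);           /* VM ioctl */
    long mmap_size = ioctl(fd, KVM_GET_VCPU_MMAP_SIZE, 0);  /* system ioctl */

    /* the shared kvm_run page(s) mentioned earlier */
    struct kvm_run *run = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
                               MAP_SHARED, vcpufd, 0);

    for (;;) {
        ioctl(vcpufd, KVM_RUN, 0);                          /* vCPU ioctl */
        switch (run->exit_reason) {
        case KVM_EXIT_IO:
            /* emulate the port access using run->io.* and loop again */
            break;
        case KVM_EXIT_SHUTDOWN:
            return 0;
        default:
            break;
        }
    }
}
On the kernel side, these ioctls land in dispatch routines such as kvm_dev_ioctl() (for the system fd) and kvm_vcpu_ioctl() (for the vCPU fd); the following abridged fragment from the KVM module shows part of that dispatch, together with the vCPU file operations: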
case KVM_GET_API_VERSION:
r = KVM_API_VERSION;
break;
case KVM_CREATE_VM:
r = kvm_dev_ioctl_create_vm(arg);
break;
case KVM_CHECK_EXTENSION:
r = kvm_vm_ioctl_check_extension_generic(NULL,
arg);
break;
case KVM_GET_VCPU_MMAP_SIZE:
. …..
}
};
static struct file_operations kvm_vcpu_fops = {
.release = kvm_vcpu_release,
.unlocked_ioctl = kvm_vcpu_ioctl,
….
.mmap = kvm_vcpu_mmap,
.llseek = noop_llseek,
};
Data structures
From the perspective of the KVM kernel modules, each virtual machine is represented by
a kvm structure:
include/linux/kvm_host.h :
struct kvm {
...
struct mm_struct *mm; /* userspace tied to this vm */
...
struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
....
struct kvm_io_bus __rcu *buses[KVM_NR_BUSES];
….
struct kvm_coalesced_mmio_ring *coalesced_mmio_ring;
…..
}
As you can see in the preceding code, the kvm structure contains an array of pointers to
kvm_vcpu structures, which are the counterparts of the CPUX86State structures in the
QEMU user space:
struct kvm_vcpu {
...
struct kvm *kvm;
int cpu;
…..
int vcpu_id;
…..
struct kvm_run *run;
…...
struct kvm_vcpu_arch arch;
…
}
The x86 architecture-specific part of the kvm_vcpu structure contains fields to which
the guest register state can be saved after a virtual machine exit and from which the guest
register state can be loaded before a virtual machine entry:
arch/x86/include/asm/kvm_host.h
struct kvm_vcpu_arch {
..
unsigned long regs[NR_VCPU_REGS];
unsigned long cr0;
unsigned long cr0_guest_owned_bits;
…..
struct kvm_lapic *apic; /* kernel irqchip context */
..
struct kvm_mmu mmu;
..
struct kvm_pio_request pio;
void *pio_data;
..
/* emulate context */
...
}
As you can see in the preceding code, kvm_vcpu has an associated kvm_run structure
used for the communication (with pio_data) between the QEMU userspace and the
KVM kernel module, as mentioned earlier. For example, in the context of VMEXIT,
to satisfy the emulation of virtual hardware access, KVM has to return to the QEMU
userspace process; KVM stores the information in the kvm_run structure for QEMU
to fetch it:
/include/uapi/linux/kvm.h:
/* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
struct kvm_run {
/* in */
...
/* out */
...
/* in (pre_kvm_run), out (post_kvm_run) */
...
union {
/* KVM_EXIT_UNKNOWN */
...
/* KVM_EXIT_FAIL_ENTRY */
...
/* KVM_EXIT_EXCEPTION */
...
/* KVM_EXIT_IO */
struct {
#define KVM_EXIT_IO_IN 0
#define KVM_EXIT_IO_OUT 1
...
} io;
...
}
The kvm_run struct is an important data structure; as you can see in the preceding code,
the union contains many exit reasons, such as KVM_EXIT_FAIL_ENTRY, KVM_EXIT_IO,
and so on.
When we discussed hardware virtualization extensions, we touched upon VMCS and
VMCB. These are important data structures when we think about hardware-accelerated
virtualization. These control blocks especially help in VMEXIT scenarios. Not every
operation can be allowed for guests; at the same time, it's also difficult if the hypervisor
does everything on behalf of the guest. Virtual machine control structures, such as VMCS
or VMCB, control the behavior. Some operations are allowed for guests, such as changing
some bits in shadowed control registers, but others are not. This clearly provides fine-
grained control over what guests are allowed to do and not do. VMCS control structures
also provide control over interrupt delivery and exceptions. Previously, we said the exit
reason of VMEXIT is recorded inside the VMCS; it also contains some data about it. For
example, if write access to a control register caused the exit, information about the source
and destination registers is recorded there.
Let's look at some of the important data structures before we dive into the vCPU
execution flow.
vcpu_vmx structure
struct vcpu_vmx {
struct kvm_vcpu *vcpu;
...
struct loaded_vmcs vmcs01;
struct loaded_vmcs *loaded_vmcs;
...
}
vcpu_svm structure
struct vcpu_svm {
struct kvm_vcpu *vcpu;
…
struct vmcb *vmcb;
….
}
The vcpu_vmx or vcpu_svm structures are allocated by the following code path:
kvm_arch_vcpu_create()
->kvm_x86_ops->vcpu_create
->vcpu_create() [.vcpu_create = svm_create_vcpu, .vcpu_create = vmx_create_vcpu]
Please note that the VMCS or VMCB store guest configuration specifics such as machine
control bits and processor register settings. I would suggest you examine the structure
definitions from the source. These data structures are also used by the hypervisor to
define events to monitor while the guest is executing. These events can be intercepted and
these structures are in the host memory. At the time of VMEXIT, the guest state is saved
in VMCS. As mentioned earlier, the VMREAD instruction reads a field from the VMCS,
whereas the VMWRITE instruction writes the field to it. Also, note that there is one VMCS
or VMCB per vCPU. These control structures are part of the host memory. The vCPU
state is recorded in these control structures.
According to the underlying architecture and hardware, different structures are initialized
by the KVM kernel modules and one among them is vmx_x86_ops/svm_x86_ops
(owned by either the kvm-intel or kvm-amd module). It defines different operations
that need to be performed when the vCPU is in context. KVM makes use of the
kvm_x86_ops vector to point to either of these vectors according to the KVM module
(kvm-intel or kvm-amd) loaded for the hardware. The run pointer defines the
function that needs to be executed when the guest vCPU run is in action, and handle_
exit defines the actions that need to be performed at the time of VMEXIT. Let's check the
AMD (svm) structure for that:
static struct kvm_x86_ops svm_x86_ops = {
.vcpu_create = svm_create_vcpu,
.run = svm_vcpu_run,
.handle_exit = handle_exit,
..
}
When QEMU issues the KVM_RUN ioctl() on a vCPU file descriptor, the KVM kernel
module dispatches it in kvm_vcpu_ioctl(), as follows:
switch (ioctl) {
case KVM_RUN:
….
kvm_arch_vcpu_ioctl_run(vcpu, vcpu->run);
->vcpu_load
-> vmx_vcpu_load
->vcpu_run(vcpu);
->vcpu_enter_guest
->vmx_vcpu_run
….
}
Inside vcpu_run(), there is an endless loop that keeps entering the guest for as long as the
vCPU is runnable:
for (;;) {
if (kvm_vcpu_running(vcpu)) {
r = vcpu_enter_guest(vcpu);
} else {
r = vcpu_block(kvm, vcpu);
}
Once it's in vcpu_enter_guest(), you can see some of the important calls that happen
when KVM enters and leaves guest mode:
vcpu->mode = OUTSIDE_GUEST_MODE;
kvm_guest_exit();
r = kvm_x86_ops->handle_exit(vcpu);
[vmx_handle_exit or handle_exit ]
…
}
You can see a high-level picture of VMENTRY and VMEXIT from the vcpu_enter_
guest() function. That said, VMENTRY ([vmx_vcpu_run or svm_vcpu_run])
is just a guest OS executing in the CPU; different intercepted events can occur at this
stage, causing VMEXIT. If this happens, any vmx_handle_exit or handle_exit
function call will start looking into this exit cause. We have already discussed the reasons
for VMEXIT in previous sections. Once there is VMEXIT, the exit reason is analyzed and
action is taken accordingly.
vmx_handle_exit() is the function responsible for handling the exit reason:
[EXIT_REASON_EXCEPTION_NMI] = handle_exception,
[EXIT_REASON_EXTERNAL_INTERRUPT] = handle_external_interrupt,
[EXIT_REASON_TRIPLE_FAULT] = handle_triple_fault,
[EXIT_REASON_IO_INSTRUCTION] = handle_io,
[EXIT_REASON_CR_ACCESS] = handle_cr,
[EXIT_REASON_VMCALL] = handle_vmcall,
[EXIT_REASON_VMCLEAR] = handle_vmclear,
[EXIT_REASON_VMLAUNCH] = handle_vmlaunch,
…
}
….
return svm_exit_handlers[exit_code](svm);
}
static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) =
{
[SVM_EXIT_READ_CR0] = cr_interception,
[SVM_EXIT_READ_CR3] = cr_interception,
[SVM_EXIT_READ_CR4] = cr_interception,
….
}
Back in the QEMU user space (kvm-all.c), the exit reason returned by the KVM_RUN
ioctl() is then handled as follows:
switch (run->exit_reason) {
case KVM_EXIT_IO:
DPRINTF("handle_io\n");
/* Called outside BQL */
kvm_handle_io(run->io.port, attrs,
(uint8_t *)run + run->io.data_offset,
run->io.direction,
run->io.size,
run->io.count);
ret = 0;
break;
This chapter was a bit source code-heavy. Sometimes, digging in and checking the source
code is just about the only way to understand how something works. Hopefully, this
chapter managed to do just that.
Summary
In this chapter, we covered the inner workings of KVM and its main partners in Linux
virtualization – libvirt and QEMU. We discussed various types of virtualization – binary
translation, full, paravirtualization, and hardware-assisted virtualization. We checked a
bit of kernel, QEMU, and libvirt source code to learn about their interaction from inside.
This gave us the necessary technical know-how to understand the topics that will follow in
this book – everything ranging from how to create virtual machines and virtual networks
to scaling the virtualization idea to a cloud concept. Understanding these concepts
will also make it much easier for you to understand the key goal of virtualization from
an enterprise company's perspective – how to properly design a physical and virtual
infrastructure, which will slowly but surely be introduced as a concept throughout this
book. Now that we've covered the basics about how virtualization works, it's time to move
on to a more practical subject – how to deploy the KVM hypervisor, management tools,
and oVirt. We'll do this in the next chapter.
Questions
1. What is paravirtualization?
2. What is full virtualization?
3. What is hardware-assisted virtualization?
4. What is the primary goal of libvirt?
5. What does KVM do? What about QEMU?
Further reading
Please refer to the following links for more information regarding what was covered in
this chapter:
Section 2:
libvirt and oVirt for
Virtual Machine
Management
In this part of the book, you will get a complete understanding of how to install, configure,
and manage a KVM hypervisor using libvirt. You will get advanced knowledge of
KVM infrastructure components, such as networking, storage, and virtual hardware
configuration. As part of the learning process, you will also get a thorough knowledge of
virtual machine life cycle management and virtual machine migration techniques, as well
as virtual machine disk management. At the end of part 2, you will be well acquainted with
the libvirt command-line management tool virsh and the GUI tool virt-manager.
This part of the book comprises the following chapters:
This chapter provides you with an insight into the main topic of our book, which is the
Kernel Virtual Machine (KVM) and its management tools, libvirt and oVirt. We will
also learn how to do a complete installation of these tools from scratch using a basic
deployment of CentOS 8. You'll find this to be a very important topic as there will be
situations where you just don't have all of the necessary utilities installed – especially
oVirt, as this is a completely separate part of the overall software stack, and a free
management platform for KVM. As oVirt has a lot of moving parts – Python-based
daemons and supporting utilities, libraries, and a GUI frontend – we will include a
step-by-step guide to make sure that you can install oVirt with ease.
It should not come as a surprise that QEMU also uses a modular approach. This has been a core principle in
the Linux world for many years, which further boosts the efficiency of how we use our
physical resources.
When we add libvirt as a management platform on top of QEMU, we get access to some
cool new utilities such as the virsh command, which we can use to do virtual machine
administration, virtual network administration, and a whole lot more. Some of the
utilities that we're going to discuss later on in this book (for example, oVirt) use libvirt as
a standardized set of libraries and utilities to make their GUI-magic possible – basically,
they use libvirt as an API. There are other commands that we get access to for a variety of
purposes. For example, we're going to use a command called virt-host-validate to
check whether our server is compatible with KVM or not.
This is where the oVirt project comes in. oVirt is an open source platform for the
management of our KVM environment. It's a GUI-based tool that has a lot of moving
parts in the background – the engine runs on a Java-based WildFly server (what used
to be known as JBoss), the frontend uses a GWT toolkit, and so on. But all of them are
there to make one thing possible – for us to manage a KVM-based environment from
a centralized, web-based administration console.
From an administration standpoint, oVirt has two main building blocks – the engine
(which we can connect to by using a GUI interface) and its agents (which are used to
communicate with hosts). Let's describe their functionalities in brief.
The oVirt engine is the centralized service that can be used to perform anything that we
need in a virtualized environment – manage virtual machines, move them, create images,
storage administration, virtual network administration, and so on. This service is used to
manage oVirt hosts and to do that, it needs to talk to something on those hosts. This is
where the oVirt agent (vdsm) comes into play.
Some of the available advanced functionalities of the oVirt engine include the following:
Obviously, we need to deploy an oVirt agent and related utilities to our hosts, which are
going to be the main part of our environment and a place where we will host everything
– virtual machines, templates, virtual networks, and so on. For that purpose, oVirt uses
a specific agent-based mechanism, via an agent called vdsm. This is an agent that we will
deploy to our CentOS 8 hosts so that we can add them to oVirt's inventory, which, in turn,
means that we can manage them by using the oVirt engine GUI. Vdsm is a Python-based
agent that the oVirt engine uses so that it can directly communicate with a KVM host, and
vdsm can then talk to the locally installed libvirt engine to do all the necessary operations.
It's also used for configuration purposes as hosts need to be configured to be used in
the oVirt environment in order to configure virtual networks, storage management
and access, and so on. Also, vdsm has Memory Overcommitment Manager (MOM)
integration so that it can efficiently manage memory on our virtualization hosts.
• We're going to use CentOS 8 for everything in this book (apart from some bits and
pieces that only support CentOS 7 as the last supported version at the time of writing).
• Our default installation profile is always going to be Server with GUI, with the
premise being that we're going to cover both GUI and text-mode utilities to do
almost everything that we're going to do in this book.
• Everything that we need to install on top of our default Server with GUI installation
is going to be installed manually so that we have a complete, step-by-step guide for
everything that we do.
• All the examples that we're going to cover in this book can be installed on a single
physical server with 16 physical cores and 64 GB of memory. If you modify some
numbers (number of cores assigned to virtual machines, amount of memory
assigned to some virtual machines, and so on), you could do this with a 6-core
laptop and 16 GB of memory, provided that you're not running all the virtual
machines all the time. If you shut the virtual machines down after you've completed
this chapter and start the necessary ones in the next chapter, you'll be fine with that.
In our case, we used a HP ProLiant DL380p Gen8, an easy-to-find, second-hand
server – and a quite cheap one at that.
After going through a basic installation of our server – selecting the installation profile,
assigning network configuration and root password, and adding additional users (if we need
them) – we're faced with a system that we can't do virtualization with because it doesn't
have all of the necessary utilities to run KVM virtual machines. So, the first thing that we're
going to do is a simple installation of the necessary modules and base applications so that
we can check whether our server is compatible with KVM. So, log into your server as an
administrative user and issue the following command:
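The exact package set can vary; on CentOS 8, a command set along the following lines (the package selection here is an assumption) installs the KVM/libvirt stack together with the management utilities mentioned below:
dnf module install virt -y
dnf install virt-install virt-viewer virt-manager -y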
We also need to tell the kernel that we're going to use the IOMMU. This is achieved by
editing the /etc/default/grub file, finding the GRUB_CMDLINE_LINUX line, and adding the
following statement at the end of that line:
intel_iommu=on
Don't forget to add a single space before this statement. The next step is a reboot, so we
need to issue the following:
systemctl reboot
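Keep in mind that /etc/default/grub is only a template; for intel_iommu=on to actually reach the kernel command line, the active GRUB configuration normally has to be regenerated before the reboot (the grub.cfg path shown here assumes a BIOS-based system and differs on UEFI systems):
grub2-mkconfig -o /boot/grub2/grub.cfg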
By issuing these commands, we're installing all the necessary libraries and binaries to
run our KVM-based virtual machines, as well as to use virt-manager (the GUI libvirt
management utility) to manage our KVM virtualization server.
Also, by adding the IOMMU configuration, we're making sure that our host sees the
IOMMU and doesn't throw an error when we use the virt-host-validate command.
After that, let's check whether our host is compatible with all the necessary KVM
requirements by issuing the following command:
virt-host-validate
This command goes through multiple tests to determine whether our server is compatible
or not. We should get an output like this:
Figure 3.3 – Testing KVM virtual networks and listing the available virtual machines
By using these two commands, we checked whether our virtualization host has a correctly
configured default virtual network switch/bridge (more about this in the next chapter), as
well as whether we have any virtual machines running. We have the default bridge and no
virtual machines, so everything is as it should be.
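For reference, the two commands from the preceding figure would typically be the following (your output will differ):
virsh net-list
virsh list --all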
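The virt-install command used for this first deployment is shown only as a figure; a roughly equivalent invocation (the ISO path, sizing, and names are placeholders, and the --os-variant value should be verified with osinfo-query os) looks like this:
virt-install --name MasteringKVM01 --memory 2048 --vcpus 2 \
  --os-variant centos8 \
  --cdrom /var/lib/libvirt/images/CentOS-8-x86_64-dvd1.iso \
  --network network=default \
  --disk size=16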
There are some parameters here that might be a bit confusing. Let's start with the
--os-variant parameter, which describes which guest operating system you want
to install by using the virt-install command. If you want to get a list of supported
guest operating systems, run the following command:
osinfo-query os
The --network parameter is related to our default virtual bridge (we mentioned this
earlier). We definitely want our virtual machine to be network-connected, so we picked
this parameter to make sure that it's network-connected out of the box.
one. So, if you don't have that, you're stuck with text-editing an existing kickstart file.
It's not very difficult, but it might take a bit of effort. The most important setting that we
need to configure correctly is related to the location that we're going to install our virtual
machine from – on a network or from a local directory (as we did in our first virt-
install example, by using a CentOS ISO from local disk). If we're going to use an ISO
file locally stored on the server, then it's an easy configuration. First, we're going to deploy
the Apache web server so that we can host our kickstart file online (which will come in
handy later). So, we need the following commands:
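Based on the description that follows, these commands would look roughly like this (the file locations are CentOS defaults and the chmod mode is an assumption):
dnf install httpd -y
systemctl enable --now httpd
cp /root/anaconda-ks.cfg /var/www/html/ks.cfg
chmod 644 /var/www/html/ks.cfg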
Before we start the deployment process, use the vi editor (or any other editor you prefer)
to edit the first configuration line in our kickstart file (/var/www/html/ks.cfg),
which says something like ignoredisk --only-use=sda, to ignoredisk
--only-use=vda. This is because virtual KVM machines don't use sd* naming for
devices, but vd naming. This makes it easier for any administrator to figure out if they
are administering a physical or a virtual server after connecting to it.
By editing the kickstart file and using these commands, we installed and started httpd
(Apache web server). Then, we permanently started it so that it gets started after every
next server reboot. Then, we copied our default kickstart file (anaconda-ks.cfg) to
Apache's DocumentRoot directory (the directory that Apache serves its files from) and
changed permissions so that Apache can actually read that file when a client requests it. In
our example, the client that's going to use it is going to be the virt-install command.
The server that we're using to illustrate this feature has an IP address of 10.10.48.1,
which is what we're going to use for our kickstart URL. Bear in mind that the default
KVM bridge uses IP address 192.168.122.1, which you can easily check with the
ip command:
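A quick way to confirm this (virbr0 is the name libvirt gives to the default bridge) is:
ip addr show virbr0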
Also, there might be some firewall settings that will need to be changed on the physical
server (accepting HTTP connections) so that the installer can successfully get the
kickstart file. So, let's try that. In this and the following examples, pay close attention to
the --vcpus parameter (the number of virtual CPU cores for our virtual machine) as
you might want to change that to your environment. In other words, if you don't have 4
cores, make sure that you lower the core count. We are just using this as an example:
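A sketch of the firewall change and the deployment command, assuming the kickstart URL mentioned earlier and a locally stored installation ISO (names, paths, and sizes are placeholders), would be:
firewall-cmd --permanent --add-service=http
firewall-cmd --reload
virt-install --name MasteringKVM02 --memory 2048 --vcpus 4 \
  --os-variant centos8 \
  --location /var/lib/libvirt/images/CentOS-8-x86_64-dvd1.iso \
  --extra-args "inst.ks=http://10.10.48.1/ks.cfg" \
  --network network=default \
  --disk size=16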
Important note
Please take note of the parameter that we changed. Here, we must use the
--location parameter, not the --cdrom parameter, as we're injecting a
kickstart configuration into the boot process (it's mandatory to do it this way).
After the deployment process is done, we should have two fully functional virtual
machines called MasteringKVM01 and MasteringKVM02 on our server, ready to be
used for our future demonstrations. The second virtual machine (MasteringKVM02)
will have the same root password as the first one because we didn't change anything in
the kickstart file except for the virtual disk option. So, after deployment, we can log into
our MasteringKVM02 machine by using the root username and password from the
MasteringKVM01 machine.
If we wanted to take this a step further, we could create a shell script with a loop that's
going to automatically give unique names to virtual machines by using indexing. We can
easily implement this by using a for loop and its counter:
#!/bin/bash
for counter in {1..5}
do
  echo "deploying VM $counter"
  # The virt-install command from the previous example goes here, with
  # --name LoopVM$counter so that every virtual machine gets a unique name.
done
When we execute this script (don't forget to chmod it to 755!), we should get five virtual
machines named LoopVM1 to LoopVM5, all with the same settings, which includes the
same root password.
If we're using a GUI server installation, we can use GUI utilities to administer our KVM
server. One of these utilities is called Virtual Machine Manager, and it's a graphical utility
that enables you to do pretty much everything you need for your basic administration
needs: manipulate virtual networks and virtual machines, open a virtual machine console,
and so on. This utility is accessible from GNOME desktop – you can use the Windows
search key on your desktop and type in virtual, click on Virtual Machine Manager,
and start using it. This is what Virtual Machine Manager looks like:
Installing oVirt
There are different methods of installing oVirt. We can either deploy it as a self-hosted
engine (via the Cockpit web interface or CLI) or as a standalone application via package-
based installation. Let's use the second way for this example – a standalone installation in
a virtual machine. We're going to split the installation into two parts:
First, let's deal with oVirt engine deployment. Deployment is simple enough, and people
usually use one virtual machine for this purpose. Keeping in mind oVirt's operating system
requirements (CentOS 8 is supported starting with oVirt 4.4), in our CentOS 8 virtual
machine, we need to punch in a couple of commands:
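For a standalone oVirt 4.4 engine on CentOS 8, the commands would typically look like the following (the release RPM URL and version are assumptions based on the oVirt project's usual repository layout):
dnf install https://resources.ovirt.org/pub/yum-repo/ovirt-release44.rpm -y
dnf install ovirt-engine -y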
Again, this is just the installation part; we haven't done any configuration as of yet. So,
that's our logical next step. We need to start a shell application called engine-setup,
which is going to ask us 20 or so questions. They're rather descriptive and explanations are
actually provided by the engine setup directly, so these are the settings that we've used for
our testing environment (FQDN will be different in your environment):
After typing in OK, the engine setup will start. The end result should look something
like this:
you try to log in). After logging in, we should be greeted with the oVirt GUI:
We have various tabs on the left-hand side of the screen – Dashboard, Compute,
Network, Storage, and Administration – and each and every one of these has a
specific purpose:
• Dashboard: The default landing page. It contains the most important information,
a visual representation of the state of the health of our environment, and some basic
information, including the number of virtual data centers that we're managing,
clusters, hosts, data storage domains, and so on.
• Compute: We go to this page to manage hosts, virtual machines, templates, pools,
data centers, and clusters.
• Network: We go to this page to manage our virtualized networks and profiles.
• Storage: We go to this page to manage storage resources, including disks, volumes,
domains, and data centers.
• Administration: For the administration of users, quotas, and so on.
We will deal with many more oVirt-related operations in Chapter 7, Virtual Machine
– Installation, Configuration, and Life Cycle Management, which is all about oVirt. But
for the time being, let's keep the oVirt engine up and running so that we can come back
to it later and use it for all of our day-to-day operations in our KVM-based virtualized
environment.
Let's say that we created all five of our virtual machines from the shell script example and
that we left them powered on. We can easily check their status by issuing a simple virsh
list command:
Figure 3.13 – The virt-manager GUI – we can see the list of registered
virtual machines and start managing them
If we want to do our regular operations on a virtual machine – start, restart, shut down,
turn off – we just need to right-click it and select that option from the menu. For all
the operations to become visible, first, we must start a virtual machine; otherwise, only
four actions are usable out of the available seven – Run, Clone, Delete, and Open.
The Pause, Shut Down sub-menu, and Migrate options will be grayed-out as they can
only be used on a virtual machine that's powered on. So, after we – for example – start
MasteringKVM01, the list of available options is going to get quite a bit bigger:
Figure 3.14 – The virt-manager options – after powering the virtual machine on,
we can now use many more options
We will use virt-manager for various operations throughout this book, so make sure
that you familiarize yourself with it. It is going to make our administrative jobs quite a bit
easier in many situations.
Summary
In this chapter, we laid some basic groundwork and prerequisites for practically
everything that we're going to do in the remaining chapters of this book. We learned how
to install KVM and a libvirt stack. We also learned how to deploy oVirt as a GUI tool to
manage our KVM hosts.
The next few chapters will take us in a more technical direction as we will cover
networking and storage concepts. In order to do that, we will have to take a step back
and learn or review our previous knowledge about networking and storage as these are
extremely important concepts for virtualization, and especially the cloud.
Questions
1. How can we validate whether our host is compatible with the KVM requirements?
2. What's the name of oVirt's default landing page?
3. Which command can we use to manage virtual machines from the command line?
4. Which command can we use to deploy virtual machines from the command line?
Further reading
Please refer to the following links for more information regarding what was covered in
this chapter:
own dedicated, physical network port. By implementing virtual networking, we're also
consolidating networking in a much more manageable way, both from an administration
and cost perspective.
This chapter provides you with an insight into the overall concept of virtualized networking
and Linux-based networking concepts. We will also discuss physical and virtual networking
concepts, try to compare them, and find similarities and differences between them. Also
covered in this chapter is the concept of virtual switching for a per-host concept and
spanned-across-hosts concept, as well as some more advanced topics. These topics include
single-root input/output virtualization, which allows for a much more direct approach to
hardware for certain scenarios. We will come back to some of the networking concepts
later in this book as we start discussing cloud overlay networks. This is because the basic
networking concepts aren't scalable enough for large cloud environments.
In this chapter, we will cover the following topics:
That being said, if you have a firm grasp of VMware or Microsoft-based virtual
networking purely at a technological level, you're in the clear here as all of these
concepts are very similar.
With that out of the way, what's the whole hoopla about virtual networking? It's actually
about understanding where things happen, how, and why. This is because, physically
speaking, virtual networking is literally the same as physical networking. Logically
speaking, there are some differences that relate more to the topology of things than to the
principle or engineering side of things. And that's what usually throws people off a little
bit – the fact that there are some weird, software-based objects that do the same job as the
physical objects that most of us have grown used to managing via our favorite CLI-based
or GUI-based utilities.
First, let's introduce the basic building block of virtualized networking – a virtual switch.
A virtual switch is basically a software-based Layer 2 switch that you use to do two things:
So, let's deal with why we need these virtual switches from the virtual machine
perspective. As we mentioned earlier, we use a virtual switch to connect virtual machines
to it. Why? Well, if we didn't have some kind of software object that sits in-between our
physical network card and our virtual machine, we'd have a big problem – we could
only connect virtual machines for which we have physical network ports to our physical
network, and that would be intolerable. First, it goes against some of the basic principles
of virtualization, such as efficiency and consolidation, and secondly, it would cost a lot.
Imagine having 20 virtual machines on your server. This means that, without a virtual
switch, you'd have to have at least 20 physical network ports to connect to the physical
network. On top of that, you'd actually use 20 physical ports on your physical switch as
well, which would be a disaster.
So, by introducing a virtual switch between a virtual machine and a physical network
port, we're solving two problems at the same time – we're reducing the number of physical
network adapters that we need per server, and we're reducing the number of physical
switch ports that we need to use to connect our virtual machines to the network. We
can actually argue that we're solving a third problem as well – efficiency – as there are
many scenarios where one physical network card can handle being an uplink for 20
virtual machines connected to a virtual switch. Specifically, there are large parts of our
environments that don't consume a lot of network traffic and for those scenarios, virtual
networking is just amazingly efficient.
Virtual networking
Now, in order for that virtual switch to be able to connect to something on a virtual
machine, we have to have an object to connect to – and that object is called a virtual
network interface card, often referred to as a vNIC. Every time you configure a virtual
machine with a virtual network card, you're giving it the ability to connect to a virtual
switch that uses a physical network card as an uplink to a physical switch.
Of course, there are some potential drawbacks to this approach. For example, if you have
50 virtual machines connected to the same virtual switch that uses the same physical
network card as an uplink and that uplink fails (due to a network card issue, cable issue,
switch port issue, or switch issue), your 50 virtual machines won't have access to the
physical network. How do we get around this problem? By implementing a better design
and following the basic design principles that we'd use on a physical network as well.
Specifically, we'd use more than one physical uplink to the same virtual switch.
Linux has a lot of different types of networking interfaces, something like 20 different
types, some of which are as follows:
There are a whole host of others. Then, on top of these network interface types, there are
some 10 types of tunneling interfaces, some of which are as follows:
Getting your head around all of them is quite a complex and tedious process, so, in this
book, we're only going to focus on the types of interfaces that are really important to us
for virtualization and (later in this book) the cloud. This is why we will discuss VXLAN
and GENEVE overlay networks in Chapter 12, Scaling Out KVM with OpenStack, as we
need to have a firm grip on Software-Defined Networking (SDN) as well.
So, specifically, as part of this chapter, we're going to cover TAP/TUN, bridging,
Open vSwitch, and macvtap interfaces as these are fundamentally the most important
networking concepts for KVM virtualization.
But before we dig deep into that, let's explain a couple of basic virtual network concepts
that apply to KVM/libvirt networking and other virtualization products (for example,
VMware's hosted virtualization products such as Workstation or Player use the same
concept). When you start configuring libvirt networking, you can choose between three
basic types: NAT, routed, and isolated. Let's discuss what these networking modes do.
our device for accessing the internet (for example, DSL modem) connects to the public
network (internet) and gets a public IP address as a part of that process. On our side of
the network, we have our own subnet (for example, 192.168.0.0/24 or something like
that) for all the devices that we want to connect to the internet.
Now, let's convert that into a virtualized network example. In our virtual machine
scenario, this means that our virtual machine can communicate with anything that's
connected to the physical network via host's IP address, but not the other way around.
For something to communicate to our virtual machine behind a NAT'd switch, our virtual
machine has to initiate that communication (or we have to set up some kind of port
forwarding, but that's beside the point).
The following diagram might explain what we're talking about a bit better:
Isolated virtual networks are used in many other security-related scenarios, but this is just
an example scenario that we can easily identify with.
Let's describe our isolated network with a diagram:
The default virtual switch works in NAT mode with the DHCP server active, and again,
there's a simple reason for that – guest operating systems are, by default, pre-configured
with DHCP networking configuration, which means that the virtual machine that we just
created is going to poll the network for necessary IP configuration. This way, the VM gets
all the necessary network configuration and we can start using it right away.
The following diagram shows what the default KVM network does:
Now, let's learn how to configure these types of virtual networking concepts from the
shell and from the GUI. We will treat this procedure as a procedure that needs to be
done sequentially:
1. Let's start by exporting the default network configuration to XML so that we can use
it as a template to create a new network:
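The export itself is done with virsh net-dumpxml (the target filename here is an assumption):
virsh net-dumpxml default > packtnat.xml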
Make sure that you change the XML file so that it reflects the fact that we are
configuring a new bridge (virbr1). Now, we can complete the configuration
of our new virtual machine network XML file:
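Assuming the edited file and the new network are both called packtnat (the name is an assumption), the define and start sequence would be:
virsh net-define packtnat.xml
virsh net-start packtnat
virsh net-autostart packtnat
virsh net-list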
Given that we didn't delete our default virtual network, the last command should give us
the following output:
Figure 4.7 – Using virsh net-list to check which virtual networks we have on the KVM host
Now, let's create two more virtual networks – a routed network and an isolated network.
Again, let's use files as templates to create both of these networks. Keep in mind that, in
order to be able to create a routed network, we are going to need a physical network
adapter, so we need to have an available physical adapter in the server for that purpose.
On our server, that interface is called ens224, while the interface called ens192 is
being used by the default libvirt network. So, let's create two configuration files called
packtro.xml (for our routed network) and packtiso.xml (for our isolated network):
In this specific configuration, we're using ens224 as an uplink to the routed virtual
network, which would use the same subnet (192.168.2.0/24) as the physical
network that ens224 is connected to:
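Hedged sketches of the two files (the bridge names and the isolated subnet are assumptions, while the routed subnet and the ens224 uplink come from the description above) could look like this:
packtro.xml – routed network using ens224 as the uplink:
<network>
  <name>packtro</name>
  <forward mode='route' dev='ens224'/>
  <bridge name='virbr2' stp='on' delay='0'/>
  <ip address='192.168.2.1' netmask='255.255.255.0'/>
</network>
packtiso.xml – isolated network (note the complete absence of a forward element):
<network>
  <name>packtiso</name>
  <bridge name='virbr3' stp='on' delay='0'/>
  <ip address='192.168.3.1' netmask='255.255.255.0'/>
</network>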
So far, we've discussed virtual networking from an overall host-level. However, there's
also a different approach to the subject – using a virtual machine as an object to which
we can add a virtual network card and connect it to a virtual network. We can use virsh
for that purpose. So, just as an example, we can connect our virtual machine called
MasteringKVM01 to an isolated virtual network:
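One way to do this (the interface model is an assumption, and --live requires the virtual machine to be running) is:
virsh attach-interface --domain MasteringKVM01 --type network \
  --source packtiso --model virtio --config --live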
There are other concepts that allow virtual machine connectivity to a physical network,
and some of them we will discuss later in this chapter (such as SR-IOV). However, now
that we've covered the basic approaches to connecting virtual machines to a physical
network by using a virtual switch/bridge, we need to get a bit more technical. The thing is,
there are more concepts involved in connecting a virtual machine to a virtual switch, such
as TAP and TUN, which we will be covering in the following section.
virtualization. As a part of that process, some of the checks include checking if the
following devices exist:
• /dev/kvm: The KVM drivers create a /dev/kvm character device on the host to
facilitate direct hardware access for virtual machines. Not having this device means
that the VMs won't be able to access physical hardware even if virtualization is enabled
in the BIOS, and this will reduce the VMs' performance significantly.
• /dev/vhost-net: The /dev/vhost-net character device will be created
on the host. This device serves as the interface for configuring the vhost-net
instance. Not having this device significantly reduces the virtual machine's
network performance.
• /dev/net/tun: This is another character special device used for creating TUN/
TAP devices to facilitate network connectivity for a virtual machine. The TUN/TAP
device will be explained in detail in future chapters. For now, just understand that
having a character device is important for KVM virtualization to work properly.
Let's focus on the last device, the TUN device, which is usually accompanied by a
TAP device.
So far, all the concepts that we've covered include some kind of connectivity to a physical
network card, with isolated virtual networks being an exception. But even an isolated
virtual network is just a virtual network for our virtual machines. What happens when
we have a situation where we need our communication to happen in the user space,
such as between applications running on a server? It would be useless to patch them
through some kind of virtual switch concept, or a regular bridge, as that would just bring
additional overhead. This is where TUN/TAP devices come in, providing packet flow for
user space programs. Easily enough, an application can open /dev/net/tun and use
an ioctl() function to register a network device in the kernel, which, in turn, presents
itself as a tunXX or tapXX device. When the application closes the file, the network
devices and routes created by it disappear (as described in the kernel tuntap.txt
documentation). So, it's just a type of virtual network interface for the Linux operating
system supported by the Linux kernel – you can add an IP address and routes to it so that
traffic from your application can route through it, and not via a regular network device.
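A minimal user-space sketch of registering a TAP device via the TUNSETIFF ioctl (the device name is just an example) looks like this:
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/if.h>
#include <linux/if_tun.h>

/* Open /dev/net/tun and register a TAP (L2) device; use IFF_TUN instead for an L3 device. */
int tap_alloc(const char *name)
{
    struct ifreq ifr;
    int fd = open("/dev/net/tun", O_RDWR);
    if (fd < 0)
        return -1;
    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = IFF_TAP | IFF_NO_PI;   /* raw Ethernet frames, no extra packet info header */
    strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);
    if (ioctl(fd, TUNSETIFF, &ifr) < 0) {  /* the device disappears again when fd is closed */
        close(fd);
        return -1;
    }
    return fd;  /* read()/write() on this fd now carries the Ethernet frames */
}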
TUN emulates an L3 device by creating a communication tunnel, something like a point-
to-point tunnel. It gets activated when the tuntap driver gets configured in tun mode.
When you activate it, any data that you receive from a descriptor (the application that
configured it) will be data in the form of regular IP packages (as the most commonly used
case). Also, when you send data, it gets written to the TUN device as regular IP packages.
This type of interface is sometimes used in testing, development, and debugging for
simulation purposes.
The TAP interface basically emulates an L2 Ethernet device. It gets activated when the
tuntap driver gets configured in tap mode. When you activate it, unlike what happens
with the TUN interface (Layer 3), you get Layer 2 raw Ethernet packages, including ARP/
RARP packages and everything else. Basically, we're talking about a virtualized Layer 2
Ethernet connection.
These concepts (especially TAP) are usable on libvirt/QEMU as well because by using
these types of configurations, we can create connections from the host to a virtual
machine – without the libvirt bridge/switch, just as an example. We can actually configure
all of the necessary details for the TUN/TAP interface and then start deploying virtual
machines that are hooked up directly to those interfaces by using kvm-qemu options. So,
it's a rather interesting concept that has its place in the virtualization world as well. This is
especially interesting when we start creating Linux bridges.
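1. The command that creates the Linux bridge is simply brctl addbr, followed by brctl show to verify it:
# brctl addbr tester
# brctl show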
The # brctl show command will list all the available bridges on the server, along
with some basic information, such as the ID of the bridge, Spanning Tree Protocol
(STP) status, and the interfaces attached to it. Here, the tester bridge does not have
any interfaces attached to its virtual ports.
2. A Linux bridge will also be shown as a network device. To see the network details of
the tester bridge, you can use the ip link show tester command.
You can also use ifconfig to check and configure the network settings for a Linux
bridge; ifconfig is relatively easy to read and understand but not as feature-rich
as the ip command:
# ifconfig tester
tester: flags=4098<BROADCAST,MULTICAST>  mtu 1500
ether 26:84:f2:f8:09:e0  txqueuelen 1000  (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
The Linux bridge tester is now ready. Let's create and add a TAP device to it.
3. First, check if the TUN/TAP device module is loaded into the kernel. If not, you
already know the drill:
# lsmod | grep tun
tun 28672 1
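The tap device itself is created with the counterpart of the delete command shown later in this section:
# ip tuntap add dev vm-vnic mode tap
# ip link show vm-vnic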
We now have a bridge named tester and a tap device named vm-vnic. Let's add
vm-vnic to tester:
# brctl addif tester vm-vnic
# brctl show
bridge name bridge id STP enabled interfaces
Here, you can see that vm-vnic is an interface that was added to the tester bridge.
Now, vm-vnic can act as the interface between your virtual machine and the tester
bridge, which, in turn, enables the virtual machine to communicate with other virtual
machines that are added to this bridge:
You might also need to remove all the objects and configurations that were created in the
previous procedure. Let's do this step by step via the command line:
1. First, we need to remove the vm-vnic tap device from the tester bridge:
# brctl delif tester vm-vnic
# brctl show tester
bridge name bridge id STP enabled interfaces
tester 8000.460a80dd627d no
Once the vm-vnic has been removed from the bridge, remove the tap device using
the ip command:
# ip tuntap del dev vm-vnic mode tap
These are the same steps that libvirt carried out in the backend while enabling or disabling
networking for a virtual machine. We want you to understand this procedure thoroughly
before moving ahead. Now that we've covered Linux bridging, it's time to move on to a
more advanced concept called Open vSwitch.
• The scale of the environment: This one is more obvious. Because of the environment
size, you need some kind of concept that's going to be managed centrally, instead of
on a host-per-host level, such as the virtual switches we've discussed so far.
• Company policies: These usually dictate some kind of compliance that comes from
configuration standardization as much as possible. Now, we can agree that we could
script some configuration updates via Ansible, Puppet, or something like that,
but what's the use? We're going to have to create new config files, new procedures,
and new playbooks every single time we need to introduce a change to KVM
networking. And big companies frown upon that.
So, what we need is a centralized networking object that can span across multiple hosts and
offer configuration consistency. In this context, configuration consistency offers us a huge
advantage – every change that we introduce in this type of object will be replicated to all the
hosts that are members of this centralized networking object. In other words, what we need
is Open vSwitch (OVS). For those who are more versed in VMware-based networking, we
can use an approximate metaphor – Open vSwitch is for KVM-based environments similar
to what vSphere Distributed Switch is for VMware-based environments.
In terms of technology, OVS supports the following:
Now that we've listed some of the supported technologies, let's discuss the way in which
Open vSwitch works.
First, let's talk about the Open vSwitch architecture. The implementation of Open vSwitch
is broken down into two parts: the Open vSwitch kernel module (the data plane) and the
user space tools (the control plane). Since the incoming data packets must be processed as
fast as possible, the data plane of Open vSwitch was pushed to the kernel space:
Open vSwitch works in two modes: normal and flow mode. This chapter will primarily
concentrate on how to bring up a KVM VM connected to Open vSwitch's bridge in
standalone/normal mode, and will give a brief introduction to flow mode using the
OpenDaylight controller:
• Normal Mode: Switching and forwarding are handled by the OVS bridge. In this
mode, OVS acts as an L2 learning switch. This mode is specifically useful when
configuring several overlay networks for your target rather than manipulating the
switch's flow.
• Flow Mode: In flow mode, the Open vSwitch bridge flow table is used to decide on
which port the receiving packets should be forwarded to. All the flows are managed
by an external SDN controller. Adding or removing control flows requires using
an SDN controller that's managing the bridge or using the ovs-ofctl command
(see the sketch that follows this list). This
mode allows a greater level of abstraction and automation; the SDN controller
exposes the REST API. Our applications can make use of this API to directly
manipulate the bridge's flows to meet network needs.
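As a small illustration of what manipulating flows from the CLI looks like (a sketch only – the bridge name, port number, and match fields are examples, not values taken from this setup), flows can be listed and added with ovs-ofctl:
ovs-ofctl dump-flows ovs-br0
ovs-ofctl add-flow ovs-br0 "priority=100,in_port=2,actions=drop"
The first command shows the current flow table of the bridge; the second installs a flow that drops everything arriving on port 2.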
Let's move on to the practical aspect and learn how to install Open vSwitch on CentOS 8:
1. The first thing that we must do is tell our system to use the appropriate repositories.
In this case, we need to enable the repositories called epel and centos-
2. The next step will be installing openvswitch from Red Hat's repository:
dnf install openvswitch -y
The last command should throw you some output specifying the version of
Open vSwitch and its DB schema. In our case, it's Open vSwitch 2.11.0 and
DB schema 7.16.1.
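3. Before we configure anything, the openvswitch service has to be running and enabled so that it survives a reboot. Assuming the standard systemd unit name that comes with the package we just installed, this boils down to the following:
systemctl enable --now openvswitch
systemctl status openvswitch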
4. Now that we've successfully installed and started Open vSwitch, it's time to configure
it. Let's choose a deployment scenario in which we're going to use Open vSwitch as a
new virtual switch for our virtual machines. In our server, we have another physical
interface called ens256, which we're going to use as an uplink for our Open vSwitch
virtual switch. We're also going to clear ens256 configuration, configure an IP address
for our OVS, and start the OVS by using the following commands:
ovs-vsctl add-br ovs-br0
ip addr flush dev ens256
ip addr add 10.10.10.1/24 dev ovs-br0
ovs-vsctl add-port ovs-br0 ens256
ip link set dev ovs-br0 up
5. Now that everything has been configured but not persistently, we need to make
the configuration persistent. This means configuring some network interface
configuration files. So, go to /etc/sysconfig/network-scripts and
create two files. Call one of them ifcfg-ens256 (for our uplink interface):
DEVICE=ens256
TYPE=OVSPort
DEVICETYPE=ovs
OVS_BRIDGE=ovs-br0
ONBOOT=yes
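The second file describes the bridge itself. A typical ifcfg-ovs-br0, reusing the 10.10.10.1/24 address that we assigned earlier, would look roughly like this (a sketch, assuming the standard OVS network-scripts keywords):
DEVICE=ovs-br0
DEVICETYPE=ovs
TYPE=OVSBridge
BOOTPROTO=static
IPADDR=10.10.10.1
NETMASK=255.255.255.0
ONBOOT=yes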
6. We didn't configure all of this just for show, so we need to make sure that our
KVM virtual machines are also able to use it. This means – again – that we need
to create a KVM virtual network that's going to use OVS. Luckily, we've dealt with
KVM virtual network XML files before (check the Libvirt isolated network section),
so this one isn't going to be a problem. Let's call our network packtovs and its
corresponding XML file packtovs.xml. It should contain the following content:
<network>
<name>packtovs</name>
<forward mode='bridge'/>
<bridge name='ovs-br0'/>
<virtualport type='openvswitch'/>
</network>
So, now, we can perform our usual operations when we have a virtual network definition
in an XML file, which is to define, start, and autostart the network:
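Assuming the packtovs.xml file is in the current directory, that boils down to the following three commands:
virsh net-define packtovs.xml
virsh net-start packtovs
virsh net-autostart packtovs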
If we left everything as it was when we created our virtual networks, the output from
virsh net-list should look something like this:
After the virtual machine installation completes, we're connected to the OVS-based
packtovs virtual network, and our virtual machine can use it. Let's say that additional
configuration is needed and that we got a request to tag traffic coming from this virtual
machine with VLAN ID 5. Start your virtual machine and use the following set
of commands:
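One way to check which ports are attached to the OVS bridge (assuming the bridge is still called ovs-br0) is to list its ports:
ovs-vsctl list-ports ovs-br0
ens256
vnet0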
This command tells us that we're using the ens256 port as an uplink and that our virtual
machine, MasteringKVM03, is using the virtual vnet0 network port. We can apply
VLAN tagging to that port by using the following command:
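Assuming the virtual machine's port is vnet0, as shown above, tagging its traffic with VLAN ID 5 would look like this:
ovs-vsctl set port vnet0 tag=5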
We need to take note of some additional commands related to OVS administration and
management since this is done via the CLI. So, here are some commonly used OVS CLI
administration commands:
• #ovs-vsctl show: A very handy and frequently used command. It tells us what
the current running configuration of the switch is.
The following examples are the most used options for each of these commands:
Also, you can always use the ovs-vsctl show command to get information about the
configuration of your OVS switch:
• Connecting data centers and extending cloud overlay networks across data
center boundaries.
• A variety of disaster recovery scenarios. NSX can be a big help for disaster recovery,
for multi-site environments, and for integration with a variety of external services
and devices that can be a part of the scenario (Palo Alto PANs).
• Consistent micro-segmentation, across sites, done the right way on the virtual
machine network card level.
• For security purposes, varying from different types of supported VPN technologies
to connect sites and end users, to distributed firewalls, guest introspection options
(antivirus and anti-malware), network introspection options (IDS/IPS), and more.
• For load balancing, up to Layer 7, with SSL offload, session persistence, high
availability, application rules, and more.
Yes, VMware's take on SDN (NSX) and Open vSwitch seem like competing technologies on
the market, but realistically, there are loads of clients who want to use both. This is where
VMware's integration with OpenStack and NSX's integration with Linux-based KVM
hosts (by using Open vSwitch and additional agents) comes in really handy. Just to further
explain these points – there are things that NSX does that take extensive usage of Open
vSwitch-based technologies – hardware VTEP integration via Open vSwitch Database,
extending GENEVE networks to KVM hosts by using Open vSwitch/NSX integration,
and much more.
Imagine that you're working for a service provider – a cloud service provider, an ISP;
basically, any type of company that has large networks with a lot of network segmentation.
There are loads of service providers using VMware's vCloud Director to provide
cloud services to end users and companies. However, because of market needs, these
environments often need to be extended to include AWS (for additional infrastructure
growth scenarios via the public cloud) or OpenStack (to create hybrid cloud scenarios). If
these solutions couldn't interoperate, there would be no way to use both of these
offerings at the same time. But from a networking perspective,
the network background for that is NSX or NSX-T (which actually uses Open vSwitch).
It's been clear for years that the future is all about multi-cloud environments, and these
types of integrations will bring in more customers; they will want to take advantage of
these options in their cloud service design. Future developments will also most probably
include (and already partially include) integration with Docker, Kubernetes, and/or
OpenShift to be able to manage containers in the same environment.
There are also some more extreme examples of using hardware – in our example, we are
talking about network cards on a PCI Express bus – in a partitioned way. For the time
being, our explanation of this concept, called SR-IOV, is going to be limited to network
cards, but we will expand on the same concept in Chapter 6, Virtual Display Devices and
Protocols, when we start talking about partitioning GPUs for use in virtual machines. So,
let's discuss a practical example of using SR-IOV on an Intel network card that supports it.
SR-IOV needs hardware support, so we need to check if our network card actually supports
it. On a physical server, we could use the lspci command to extract attribute information
about our PCI devices and then grep out Single Root I/O Virtualization as a string to try to
see if we have a device that's compatible. Here's an example from our server:
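A check along these lines does the trick – the capability offset shown in the output is device-specific, so treat the second line as an illustration only:
lspci -vvv | grep -i "Single Root"
Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)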
Important Note
Be careful when configuring SR-IOV. You need to have a server that supports
it, a device that supports it, and you must make sure that you turn on SR-IOV
functionality in BIOS. Then, you need to keep in mind that there are servers
that only have specific slots assigned for SR-IOV. The server that we used (HP
ProLiant DL380p G8) has three PCI Express slots assigned to CPU1, but SR-
IOV worked only in slot #1. When we connected our card to slot #2 or #3, we
got a BIOS message that SR-IOV will not work in that slot and that we should
move our card to a slot that supports SR-IOV. So, please, make sure that you
read the documentation of your server thoroughly and connect an SR-IOV-
compatible device to a PCI Express slot that actually supports SR-IOV.
In this specific case, it's an Intel 10 Gigabit network adapter with two ports, which
we could use to do SR-IOV. The procedure isn't all that difficult, and it requires us to
complete the following steps:
So, what you would do is unload the module that the network card is currently using
by using modprobe -r. Then, you would load it again, but by assigning an additional
parameter. On our specific server, the Intel dual-port adapter that we're using (X540-AT2)
was assigned to the ens1f0 and ens1f1 network devices. So, let's use ens1f0 as an
example for SR-IOV configuration at boot time:
1. The first thing that we need to do (as a general concept) is find out which
kernel module our network card is using. To do that, we need to issue the
following command:
ethtool -i ens1f0 | grep ^driver
driver: ixgbe
We need to find additional available options for that module. To do that, we can use
the modinfo command (we're only interested in the parm part of the output):
modinfo ixgbe
…..
parm: max_vfs (Maximum number of virtual functions
to allocate per physical function - default is zero and
maximum value is 63)
For example, we're using the ixgbe module here, and we can do the following:
modprobe -r ixgbe
modprobe ixgbe max_vfs=4
2. Then, we can use the modprobe system to make these changes permanent across
reboots by creating a file in /etc/modprobe.d called (for example) ixgbe.conf
and adding the following line to it:
options ixgbe max_vfs=4
This would give us up to four virtual functions that we can use inside our virtual
machines. Now, the next issue that we need to solve is how to boot our server with
SR-IOV active at boot time. There are quite a few steps involved here, so, let's get started:
1. We need to add the iommu and vfs parameters to the default kernel boot line and
the default kernel configuration. So, first, open /etc/default/grub and edit the
GRUB_CMDLINE_LINUX line and add intel_iommu=on (or amd_iommu=on if
you're using an AMD system) and ixgbe.max_vfs=4 to it.
2. We need to reconfigure grub to use this change, so we need to use the following
command:
grub2-mkconfig -o /boot/grub2/grub.cfg
3. Sometimes, even that isn't enough, so we need to configure the necessary kernel
parameters, such as the maximum number of virtual functions and the iommu
parameter to be used on the server. That leads us to the following command:
grubby --update-kernel=ALL --args="intel_iommu=on ixgbe.max_vfs=4"
After reboot, we should be able to see our virtual functions. Type in the following command:
pci_0000_04_10_1
pci_0000_04_10_2
pci_0000_04_10_3
pci_0000_04_10_4
pci_0000_04_10_5
pci_0000_04_10_6
pci_0000_04_10_7
The first two devices are our physical functions. The remaining eight devices (two
ports times four functions) are our virtual devices (from pci_0000_04_10_0 to
pci_0000_04_10_7). Now, let's dump that device's information by using the virsh
nodedev-dumpxml pci_0000_04_10_0 command:
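The output looks something like this – abbreviated here, and the parent device, driver name, and exact values will differ depending on your hardware:
<device>
<name>pci_0000_04_10_0</name>
<driver>
<name>ixgbevf</name>
</driver>
<capability type='pci'>
<domain>0</domain>
<bus>4</bus>
<slot>16</slot>
<function>0</function>
<capability type='phys_function'>
<address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
</capability>
</capability>
</device>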
Of course, the domain, bus, slot, and function need to point exactly to our VF. Then, we
can use the virsh command to attach that device to our virtual machine (for example,
MasteringKVM03):
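A sketch of one way to do this: describe the VF in a small XML file (the filename is arbitrary, and the PCI address must match the VF from the previous dump), say sriov-vf.xml:
<interface type='hostdev' managed='yes'>
<source>
<address type='pci' domain='0x0000' bus='0x04' slot='0x10' function='0x0'/>
</source>
</interface>
Then attach it to the virtual machine:
virsh attach-device MasteringKVM03 sriov-vf.xml --config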
When we use virsh dumpxml, we should now see a part of the output that starts with
<driver name='vfio'/>, along with all the information that we configured in the
previous step (address type, domain, bus, slot, function). Our virtual machine should
have no problems using this virtual function as a network card.
Now, it's time to cover another concept that's very useful in KVM networking:
macvtap. It's a newer driver that should simplify our virtualized networking by replacing
the tun/tap and bridge drivers with a single module.
Understanding macvtap
This module works like a combination of the tap and macvlan modules. We already
explained what the tap module does. The macvlan module enables us to create virtual
networks that are pinned to a physical network interface (usually, we call this interface a
lower interface or device). Combining tap and macvlan enables us to choose between four
different modes of operation, called Virtual Ethernet Port Aggregator (VEPA), bridge,
private, and passthru.
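Before going through each mode, here's a minimal sketch of how a macvtap device can be created by hand – the lower device (ens1f0) and the macvtap0 name are just examples:
ip link add link ens1f0 name macvtap0 type macvtap mode bridge
ip link set dev macvtap0 up
ip -d link show macvtap0
The last command shows the detailed link information, including the macvtap mode that was selected.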
If we're using the VEPA mode (default mode), the physical switch has to support VEPA
by supporting hairpin mode (also called reflective relay). When a lower device receives
data from a VEPA mode macvlan, this traffic is always sent out to the upstream device,
which means that traffic is always going through an external switch. The advantage of this
mode is the fact that network traffic between virtual machines becomes visible on the
external network, which can be useful for a variety of reasons. You can check how network
flow works in the following sequence of diagrams:
Figure 4.17 – macvtap VEPA mode, where traffic is forced to the external network
In private mode, it's similar to VEPA in that everything goes to an external switch, but
unlike VEPA, traffic only gets delivered if it's sent via an external router or switch. You can
use this mode if you want to isolate virtual machines connected to the endpoints from one
another, but not from the external network. If this sounds very much like a private VLAN
scenario, you're completely correct:
Figure 4.18 – macvtap in private mode, using it for internal network isolation
In bridge mode, data received on your macvlan that's supposed to go to another macvlan
on the same lower device is sent directly to the target, not externally, and then routed
back. This is very similar to what VMware NSX does when communication is supposed to
happen between virtual machines on different VXLAN networks, but on the same host:
In passthru mode, the physical interface is handed over to the macvtap device directly, which means
that a single network interface can only be passed to a single guest (a 1:1 relationship):
In Chapter 12, Scaling Out KVM with OpenStack and Chapter 13, Scaling Out KVM with
AWS, we'll describe why virtualized and overlay networking (VXLAN, GRE, GENEVE)
is even more important for cloud networking as we extend our local KVM-based
environment to the cloud either via OpenStack or AWS.
Summary
In this chapter, we covered the basics of virtualized networking in KVM and explained
why virtualized networking is such a huge part of virtualization. We went knee-deep into
configuration files and their options as this will be the preferred method for administration
in larger environments, especially when talking about virtualized networks.
Pay close attention to all the configuration steps that we discussed through this chapter,
especially the part related to using virsh commands to manipulate network configuration
and to configure Open vSwitch and SR-IOV. SR-IOV-based concepts are heavily used in
latency-sensitive environments to provide networking services with the lowest possible
overhead and latency, which is why this principle is very important for various enterprise
environments related to the financial and banking sector.
Now that we've covered all the necessary networking scenarios (some of which will be
revisited later in this book), it's time to start thinking about the next big topic of the
virtualized world. We've already talked about CPU and memory, as well as networks,
which means we're left with the fourth pillar of virtualization: storage. We will tackle
that subject in the next chapter.
Questions
1. Why is it important that virtual switches accept connectivity from multiple virtual
machines at the same time?
2. How does a virtual switch work in NAT mode?
3. How does a virtual switch work in routed mode?
4. What is Open vSwitch and for what purpose can we use it in virtualized and
cloud environments?
5. Describe the differences between TAP and TUN interfaces.
Further reading
Please refer to the following links for more information regarding what was covered in
this chapter:
5
Libvirt Storage
This chapter provides you with an insight into the way that KVM uses storage. Specifically,
we will cover both storage that's internal to the host where we're running virtual machines
and shared storage. Don't let the terminology confuse you here – in virtualization and
cloud technologies, the term shared storage means storage space that multiple hypervisors
can have access to. As we will explain a bit later, the three most common ways of achieving
this are by using block-level, share-level, or object-level storage. We will use NFS as an
example of share-level storage, and Internet Small Computer System Interface (iSCSI)
and Fibre Channel (FC) as examples of block-level storage. In terms of object-based
storage, we will use Ceph. GlusterFS is also commonly used nowadays, so we'll make sure
that we cover that, too. To wrap everything up in an easy-to-use and easy-to-manage box,
we will discuss some open source projects that might help you while practicing with and
creating testing environments.
In this chapter, we will cover the following topics:
• Introduction to storage
• Storage pools
• NFS storage
• iSCSI and SAN storage
• Storage redundancy and multipathing
Introduction to storage
Unlike networking, which is something that most IT people have at least a basic
understanding of, storage tends to be quite different. In short, yes, it tends to be a bit more
complex. There are loads of parameters involved, different technologies, and…let's be
honest, loads of different types of configuration options and people enforcing them.
And a lot of questions. Here are some of them:
As you can see, the questions just keep piling up, and we've barely touched the surface,
because there are also questions about which filesystem to use, which physical controller
we will use to access storage, and what type of cabling—it just becomes a big mashup of
variables that has many potential answers. What makes it worse is the fact that many of
those answers can be correct—not just one of them.
Let's get the basic-level mathematics out of the way. In an enterprise-level environment,
shared storage is usually the most expensive part of the environment and can also have the
most significant negative impact on virtual machine performance, while at the same time
being the most oversubscribed resource in that environment. Let's think about this for a
second—every powered-on virtual machine is constantly going to hammer our storage
device with I/O operations. If we have 500 virtual machines running on a single storage
device, aren't we asking a bit too much from that storage device?
At the same time, some kind of shared storage concept is a key pillar of virtualized
environments. The basic principle is very simple – there are loads of advanced
functionalities that will work so much better with shared storage. Also, many operations
are much faster if shared storage is available. Even more so, there are so many simple
options for high availability when we don't have our virtual machines stored in the same
place where they are being executed.
As a bonus, we can easily avoid Single Point Of Failure (SPOF) scenarios if we design
our shared storage environment correctly. In an enterprise-level environment, avoiding
SPOF is one of the key design principles. But when we start adding switches, adapters,
and controllers to the to-buy list, our managers' or clients' heads usually start to hurt. We
talk about performance and risk management, while they talk about price. We talk about
the fact that their databases and applications need to be properly fed in terms of I/O and
bandwidth, and they feel that you can produce that out of thin air. Just wave your magic
wand, and it will be there.
So, let's start our journey through these supported protocols and learn how to configure
them. After we cover storage pools, we are going to discuss NFS, a typical share-level
protocol for virtual machine storage. Then, we're going to move to block-level protocols
such as iSCSI and FC. Then, we will move to redundancy and multipathing to increase
the availability and bandwidth of our storage devices. We're also going to cover various
use cases for not-so-common filesystems (such as Ceph, Gluster, and GFS) for KVM
virtualization. We're also going to discuss the new developments that are de facto trends
right now.
Storage pools
When you first start using storage devices—even if they're cheaper boxes—you're faced
with some choices. They will ask you to do a bit of configuration—select the RAID level,
configure hot-spares, SSD caching...it's a process. The same process applies to a situation
in which you're building a data center from scratch or extending an existing one. You have
to configure the storage to be able to use it.
Hypervisors are a bit picky when it comes to storage, as there are storage types that they
support and storage types that they don't support. For example, Microsoft's Hyper-V
supports SMB shares for virtual machine storage, but it doesn't really support NFS storage
for virtual machine storage. VMware's vSphere Hypervisor supports NFS, but it doesn't
support SMB. The reason is simple—a company developing a hypervisor chooses and
qualifies technologies that its hypervisor is going to support. Then, it's up to various
HBA/controller vendors (Intel, Mellanox, QLogic, and so on) to develop drivers for that
hypervisor, and it's up to the storage vendor to decide which types of storage protocols they're
going to support on their storage device.
From a CentOS perspective, there are many different storage pool types that are
supported. Here are some of them:
From the perspective of libvirt, a storage pool can be a directory, a storage device, or a
file that libvirt manages. That leads us to 10+ different storage pool types, as you're going
to see in the next section. From a virtual machine perspective, libvirt manages virtual
machine storage, which virtual machines use so that they have the capacity to store data.
oVirt, on the other hand, sees things a bit differently, as it has its own service that works
with libvirt to provide centralized storage management from a data center perspective.
Data center perspective might seem like a term that's a bit odd. But think about it—a
datacenter is some kind of higher-level object in which you can see all of your resources.
A data center uses storage and hypervisors to provide us with all of the services that we
need in virtualization—virtual machines, virtual networks, storage domains, and so on.
Basically, from a data center perspective, you can see what's happening on all of your
hosts that are members of that datacenter. However, from a host level, you can't see
what's happening on another host. It's a hierarchy that's completely logical from both
a management and a security perspective.
oVirt can centrally manage these different types of storage pools (and the list can get
bigger or smaller as the years go by):
• iSCSI
• FC
• Local storage (attached directly to KVM hosts)
• GlusterFS exports
• POSIX-compliant file systems
CentOS has a new way of dealing with storage pools. Although still in technology preview
state, it's worth going through the complete configuration via this new tool, called Stratis.
Basically, a couple of years ago, Red Hat finally deprecated the idea of pushing Btrfs for
future releases and started working on Stratis. If you've ever used ZFS, that's where this is
probably going—an easy-to-manage, ZFS-like, volume-managing set of utilities that Red
Hat can stand behind in their future releases. Also, just like ZFS, a Stratis-based pool can
use cache; so, if you have an SSD that you'd like to dedicate to pool cache, you can actually
do that, as well. If you have been expecting Red Hat to support ZFS, there's a fundamental
Red Hat policy that stands in the way. Specifically, ZFS is not a part of the Linux kernel,
mostly because of licensing reasons. Red Hat has a policy for these situations—if it's not
a part of the kernel (upstream), then they don't provide or support it. As it stands, that's
not going to happen anytime soon. These policies are also reflected in CentOS.
Let's start from scratch and install Stratis so that we can use it. Let's use the following
command:
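Assuming the standard CentOS 8 package and service names, those two commands would be:
dnf install stratisd stratis-cli -y
systemctl enable --now stratisd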
The first command installs the Stratis service and the corresponding command-line
utilities. The second one will start and enable the Stratis service.
Now, we are going to go through a complete example of how to use Stratis to configure
your storage devices. We're going to cover an example of this layered approach. So, what
we are going to do is as follows:
The premise here is simple—the software RAID10+ spare via MD RAID is going to
approximate the regular production approach, in which you'd have some kind of a
hardware RAID controller presenting a single block device to the system. We're going to
add a cache device to the pool to verify the caching functionality, as this is something that
we would most probably do if we were using ZFS, as well. Then, we are going to create
a filesystem on top of that pool and mount it to a local directory with the help of the
following commands:
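Here's a sketch of that sequence. The device names, pool name, and cache device are assumptions, and the cache subcommand differs between Stratis releases (add-cache in older versions, init-cache in newer ones):
mdadm --create /dev/md0 --level=10 --raid-devices=4 --spare-devices=1 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
stratis pool create packtpool /dev/md0
stratis pool add-cache packtpool /dev/sdg
stratis filesystem create packtpool packtStratisXFS01
mkdir -p /mnt/packtStratisXFS01
mount /dev/stratis/packtpool/packtStratisXFS01 /mnt/packtStratisXFS01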
This mounted filesystem is XFS-formatted. We could then easily use this filesystem via
NFS export, which is exactly what we're going to do in the NFS storage section. But for
now, this was just an example of how to create a pool by using Stratis.
We've covered some basics of local storage pools, which brings us closer to our next subject,
which is how to use pools from a libvirt perspective. So, that will be our next topic.
We're going to create various different types of storage pools in the following sections—an
NFS-based pool, an iSCSI and FC pool, and Gluster and Ceph pools: the whole nine yards.
We're also going to explain when to use each and every one of them as there will be different
usage models involved.
There are other bits and pieces that were enhanced in v4.2, but for now, this is more than
enough. You can find even more information about this in IETF's RFC 7862 document
(https://tools.ietf.org/html/rfc7862). We're going to focus our attention on
the implementation of NFS v4.2 specifically, as it's the best that NFS currently has to offer.
It also happens to be the default NFS version that CentOS 8 supports.
The first thing that we have to do is install the necessary packages. We're going to achieve
that by using the following commands:
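Assuming the standard CentOS 8 package and service names, those commands are:
dnf install nfs-utils -y
systemctl enable --now nfs-server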
The first command installs the necessary utilities to run the NFS server. The second one is
going to start it and permanently enable it so that the NFS service is available after reboot.
Our next task is to configure what we're going to share via the NFS server. For that,
we need to export a directory and make it available to our clients over the network.
NFS uses a configuration file, /etc/exports, for that purpose. Let's say that we
want to share the /mnt/packtStratisXFS01 directory that we created earlier with
our clients in the 192.168.159.0/255.255.255.0 network, and we want to allow them to write
data on that share. Our /etc/exports file should look like this:
/mnt/packtStratisXFS01 192.168.159.0/24(rw)
exportfs -r
These configuration options tell our NFS server which directory to export (/mnt/packtStratisXFS01),
to which clients (192.168.159.0/24), and what options to use (rw means read-write).
Some other available options include the following:
• no_root_squash: All I/O operations from UID 0 and GID 0 are mapped to
UID 0 and GID 0.
If you need to apply multiple options to the exported directory, you add them with a
comma between them, as follows:
/mnt/packtStratisXFS01 192.168.159.0/24(rw,sync,root_squash)
You can use fully qualified domain names or short hostnames (if they're resolvable by
DNS or any other mechanism). Also, if you don't like using prefixes (24), you can use
regular netmasks, as follows:
/mnt/packtStratisXFS01 192.168.159.0/255.255.255.0(rw,root_squash)
Now that we have configured the NFS server, let's see how we're going to configure libvirt
to use that server as a storage pool. As always, there are a couple of ways to do this. We
could just create an XML file with the pool definition and import it to our KVM host
by using the virsh pool-define --file command. Here's an example of that
configuration file:
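A definition along these lines would do the job. The pool name, the NFS server address, and the target path below are assumptions; the individual elements are explained right after it:
<pool type='netfs'>
<name>NFSpool</name>
<source>
<host name='192.168.159.145'/>
<dir path='/mnt/packtStratisXFS01'/>
<format type='nfs'/>
</source>
<target>
<path>/var/lib/libvirt/images/NFSpool</path>
<permissions>
<mode>0755</mode>
<owner>0</owner>
<group>0</group>
<label>system_u:object_r:virt_image_t:s0</label>
</permissions>
</target>
</pool>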
• pool type: netfs means that we are going to use an NFS file share.
• name: The pool name, as libvirt uses pools as named objects, just like
virtual networks.
• host : The address of the NFS server that we are connecting to.
• dir path: The NFS export path that we configured on the NFS server via
/etc/exports.
• path: The local directory on our KVM host where that NFS share is going to be
mounted to.
• permissions: The permissions used for mounting this filesystem.
• owner and group: The UID and GID used for mounting purposes (that's why we
exported the folder earlier with the no_root_squash option).
• label: The SELinux label for this folder—we're going to discuss this in Chapter 16,
Troubleshooting Guideline for the KVM Platform.
If we wanted, we could've easily done the same thing via the Virtual Machine Manager
GUI. First, we would have to select the correct type (the NFS pool) and give it a name:
Figure 5.3 – Selecting the NFS pool type and giving it a name
After we click Forward, we can move to the final configuration step, where we need to tell
the wizard which server we're mounting our NFS share from:
When we finish typing in these configuration options (Host Name and Source Path),
we can press Finish, which will mean exiting the wizard. Also, our previous configuration
screen, which only contained the default storage pool, now has our newly configured
pool listed as well:
That being said, the fact that it's based on an Ethernet stack makes it easier to deploy
iSCSI-based solutions, while at the same time offering some unique challenges. For
example, sometimes it's difficult to explain to a customer that using the same network
switch(es) for virtual machine traffic and iSCSI traffic is not the best idea. What makes it
even worse is the fact that clients are sometimes so blinded by their desire to save money
that they don't understand that they're working against their own best interest. Especially
when it comes to network bandwidth. Most of us have been there, trying to work with
clients' questions such as "but we already have a Gigabit Ethernet switch, why would you
need anything faster than that?"
The fact of the matter is, with iSCSI's intricacies, more is just – more. The more speed
you have on the disk/cache/controller side and the more bandwidth you have on the
networking side, the more chance you have of creating a storage system that's faster. All
of that can have a big impact on our virtual machine performance. As you'll see in the
Storage redundancy and multipathing section, you can actually build a very good storage
system yourself—both for iSCSI and FC. This might come in real handy when you try
to create some kind of a testing lab/environment to play with as you develop your KVM
virtualization skills. You can apply that knowledge to other virtualized environments,
as well.
The iSCSI and FC architectures are very similar—they both need a target (an iSCSI
target and an FC target) and an initiator (an iSCSI initiator and an FC initiator). In this
terminology, the target is a server component, and the initiator is a client component.
To put it simply, the initiator connects to a target to get access to block storage that's
presented via that target. Then, we can use the initiator's identity to limit what the initiator
is able to see on the target. This is where the terminology starts to get a bit different when
comparing iSCSI and FC.
In iSCSI, the initiator's identity can be defined by four different properties. They are
as follows:
• iSCSI Qualified Name (IQN): This is a unique name that all initiators and targets
have in iSCSI communication. We can compare this to a MAC or IP address in
regular Ethernet-based networks. You can think of it this way—an IQN is for iSCSI
what a MAC or IP address is for Ethernet-based networks.
• IP address: Every initiator will have a different IP address that it uses to connect
to the target.
• MAC address: Every initiator has a different MAC address on Layer 2.
• Fully Qualified Domain Name (FQDN): This represents the name of the server
as it's resolved by a DNS service.
From the iSCSI target perspective—depending on its implementation—you can use any
one of these properties to create a configuration that's going to tell the iSCSI target which
IQNs, IP addresses, MAC addresses, or FQDNs can be used to connect to it. This is what's
called masking, as we can mask what an initiator can see on the iSCSI target by using
these identities and pairing them with LUNs. LUNs are just raw, block capacities that
we export via an iSCSI target toward initiators. LUNs are indexed, or numbered, usually
from 0 onward. Every LUN number represents a different storage capacity that an initiator
can access.
Be careful about the firewall configuration; you might need to configure it to allow
connectivity on port 3260/tcp, which is the port that the iSCSI target portal uses.
So, if your firewall has started, type in the following command:
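For a default firewalld setup, that would be:
firewall-cmd --permanent --add-port=3260/tcp
firewall-cmd --reload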
There are three possibilities for iSCSI on Linux in terms of what storage backend to use.
We could use a regular filesystem (such as XFS), a block device (a hard drive), or LVM.
So, that's exactly what we're going to do. Our scenario is going to be as follows:
So, after we install the necessary packages and configure the target service and firewall, we
should start with configuring our iSCSI target. We'll just start the targetcli command
and check the state, which should be a blank slate as we're just beginning the process:
1. So, let's configure the XFS-based filesystem and configure the LUN0 file image to be
saved there. First, we need to partition the disk (in our case, /dev/sdb):
Figure 5.8 – Formatting the XFS filesystem, creating a directory, and mounting it to that directory
3. The next step is configuring targetcli so that it creates LUN0 and assigns an
image file for LUN0, which will be saved in the /LUN0 directory. First, we need
to start the targetcli command:
The next step is configuring the backend for LUN1 to use /dev/sdc1 (create the partition
using the previous example) and checking the current state:
So, LUN0 and LUN1 and their respective backends are now configured. Let's finish things
off by configuring LVM:
1. First, we are going to prepare the physical volume for LVM, create a volume group
out of that volume, and display all the information about that volume group so that
we can see how much space we have for LUN2:
Figure 5.11 – Configuring the physical volume for LVM, building a volume group,
and displaying information about that volume group
2. The next step is to actually create the logical volume, which is going to be our
block storage device backend for LUN2 in the iSCSI target. We can see from the
vgdisplay output that we have 15,359 4 MB blocks available, so let's use that
to create our logical volume, called LUN2. Go to targetcli and configure the
necessary settings for LUN2:
4. Next, we need to configure the IQN of our initiator. We usually want this
name to be reminiscent of the hostname, so, seeing that our host's FQDN is
PacktStratis01, we'll use that to configure the IQN. To do that, we need
to edit the /etc/iscsi/initiatorname.iscsi file and configure
the InitiatorName option. For example, let's set it to iqn.2019-12.
com.packt:PacktStratis01. The content of the /etc/iscsi/
initiatorname.iscsi file should be as follows:
InitiatorName=iqn.2019-12.com.packt:PacktStratis01
5. Now that this is configured, let's go back to the iSCSI target and create an Access
Control List (ACL). The ACL is going to allow our KVM host's initiator to connect
to the iSCSI target portal:
Figure 5.13 – Creating an ACL so that the KVM host's initiator can connect to the iSCSI target
6. Next, we need to publish our pre-created file-based and block-based devices to the
iSCSI target LUNs. So, we need to do this:
Figure 5.14 – Adding our file-based and block-based devices to the iSCSI target LUNs 0, 1, and 2
<pool type='iscsi'>
<name>MyiSCSIPool</name>
<source>
<host name='192.168.159.145'/>
<device path='iqn.2003-01.org.linux-iscsi.packtiscsi01.x8664:sn.7b3c2efdbb11'/>
</source>
<initiator>
<iqn name='iqn.2019-12.com.packt:PacktStratis01' />
</initiator>
<target>
<path>/dev/disk/by-path</path>
</target>
</pool>
• pool type= 'iscsi': We're telling libvirt that this is an iSCSI pool.
• name : The pool name.
• host name: The IP address of the iSCSI target.
• device path: The IQN of the iSCSI target.
• The IQN name in the initiator section: The IQN of the initiator.
• target path: The location where iSCSI target's LUNs will be mounted.
Now, all that's left for us to do is to define, start, and autostart our new iSCSI-backed KVM
storage pool:
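Assuming we saved the definition above to a file called MyiSCSIPool.xml, that's done like this:
virsh pool-define --file MyiSCSIPool.xml
virsh pool-start MyiSCSIPool
virsh pool-autostart MyiSCSIPool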
The target path part of the configuration can be easily checked via virsh. If we type the
following command into the KVM host, we will get the list of available LUNs from the
MyiSCSIPool pool that we just configured:
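virsh vol-list MyiSCSIPool
The LUNs typically show up as volumes named unit:0:0:0, unit:0:0:1, and so on, with their device paths under /dev/disk/by-path.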
If this output reminds you a bit of the VMware vSphere Hypervisor storage runtime
names, you are definitely on the right track. We will be able to use these storage pools
in Chapter 7, Virtual Machine – Installation, Configuration, and Life-Cycle Management,
when we start deploying our virtual machines.
So, as we're doing this in oVirt, there are a couple of things that we need to do. First, from
a networking perspective, it would be a good idea to create some storage networks. In our
case, we're going to assign two networks for iSCSI, and we will call them iSCSI01 and
iSCSI02. We need to open the oVirt administration panel, hover over Network, and
select Networks from the menu. This will open a pop-up window for the New Logical
Network wizard. So, we just need to name the network iSCSI01 (for the first one),
uncheck the VM network checkbox (as this isn't a virtual machine network), and go to
the Cluster tab, where we deselect the Require all checkbox. Repeat the whole process
again for the iSCSI02 network:
The next step is assigning these networks to host network adapters. Go to compute/
hosts, double-click on the host that you added to oVirt's inventory, select the Network
interfaces tab, and click on the Setup Host Networks icon in the top-right corner. In that
UI, drag and drop iSCSI01 on the second network interface and iSCSI02 on the third
network interface. The first network interface is already taken by the oVirt management
network. It should look something like this:
The last step is enabling iSCSI multipathing. Again, in the oVirt GUI, go to Compute | Data Centers, select your
datacenter with a double-click, and go to the iSCSI Multipathing tab:
Now that we have covered the basics of storage pools, NFS, and iSCSI, we can move
on to a standard open source way of deploying storage infrastructure, which would be
to use Gluster and/or Ceph.
Gluster
Gluster is a distributed filesystem that's often used for high-availability scenarios. Its main
advantages over other filesystems are the fact that it's scalable, it can use replication and
snapshots, it can work on any server, and it's usable as a basis for shared storage—for
example, via NFS and SMB. It was developed by a company called Gluster Inc., which
was acquired by Red Hat in 2011. However, unlike Ceph, it's a file storage service, while
Ceph offers block and object-based storage. Object-based storage for block-based devices
means direct, binary storage, directly to a LUN. There are no filesystems involved, which
theoretically means less overhead as there's no filesystem, filesystem tables, and other
constructs that might slow the I/O process down.
Let's first configure Gluster to show its use case with libvirt. In production, that means
installing at least three Gluster servers so that we can make high availability possible.
Gluster configuration is really straightforward, and in our example, we are going to create
three CentOS 7 machines that we will use to host the Gluster filesystem. Then, we will
mount that filesystem on our hypervisor host and use it as a local directory. We can use
GlusterFS directly from libvirt, but the implementation is just not as refined as using it
via the gluster client service, mounting it as a local directory, and using it directly as a
directory pool in libvirt.
Our configuration will look like this:
So, let's put that into production. We have to issue a large sequence of commands on all
of the servers before we configure Gluster and expose it to our KVM host. Let's start with
gluster1. First, we are going to do a system-wide update and reboot to prepare the core
operating system for Gluster installation. Type the following commands into all three
CentOS 7 servers:
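On CentOS 7, that boils down to something like this:
yum -y update
reboot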
Then, we can start deploying the necessary repositories and packages, format disks,
configure the firewall, and so on. Type the following commands into all the servers:
mkfs.xfs /dev/sdb
mkdir /gluster/bricks/1 -p
echo '/dev/sdb /gluster/bricks/1 xfs defaults 0 0' >> /etc/fstab
mount -a
mkdir /gluster/bricks/1/brick
192.168.159.147 gluster1
192.168.159.148 gluster2
192.168.159.149 gluster3
For the next part of the configuration, we can just log in to the first server and use
it as the de facto management server for our Gluster infrastructure. Type in the
following commands:
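A minimal sketch of that sequence, reusing the hostnames that we added to /etc/hosts earlier, would be:
gluster peer probe gluster1
gluster peer probe gluster2
gluster peer probe gluster3
gluster peer status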
The first three commands should get you the peer probe: success status. The last
one should return an output similar to this:
Then, we could mount Gluster as an NFS directory for testing purposes. For example,
we can create a distributed namespace called kvmgluster to all of the member hosts
(gluster1, gluster2, and gluster3). We can do this by using the following
commands:
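A sketch of that step, creating a replica 3 volume across the bricks we prepared earlier (whether you go for a distributed or a replicated layout is a design choice for your environment):
gluster volume create kvmgluster replica 3 gluster1:/gluster/bricks/1/brick gluster2:/gluster/bricks/1/brick gluster3:/gluster/bricks/1/brick
gluster volume start kvmgluster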
The Gluster part is now ready, so we need to go back to our KVM host and mount the
Gluster filesystem to it by typing in the following commands:
wget https://download.gluster.org/pub/gluster/glusterfs/6/LATEST/CentOS/glusterfs-rhel8.repo -P /etc/yum.repos.d
yum install glusterfs glusterfs-fuse attr -y
mount -t glusterfs -o context="system_u:object_r:virt_image_t:s0" gluster1:/kvmgluster /var/lib/libvirt/images/GlusterFS
We have to pay close attention to Gluster releases on the server and client, which is why
we downloaded the Gluster repository information for CentOS 8 (we're using it on the
KVM server) and installed the necessary Gluster client packages. That enabled us to
mount the filesystem with the last command.
Now that we've finished our configuration, we just need to add this directory as a libvirt
storage pool. Let's do that by using an XML file with the storage pool definition, which
contains the following entries:
<pool type='dir'>
<name>glusterfs-pool</name>
<target>
<path>/var/lib/libvirt/images/GlusterFS</path>
<permissions>
<mode>0755</mode>
<owner>107</owner>
<group>107</group>
<label>system_u:object_r:virt_image_t:s0</label>
</permissions>
</target>
</pool>
Let's say that we saved this file in the current directory, and that the file is called
gluster.xml. We can import and start it in libvirt by using the following
virsh commands:
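Assuming the gluster.xml file is in the current directory, that looks like this:
virsh pool-define --file gluster.xml
virsh pool-start glusterfs-pool
virsh pool-autostart glusterfs-pool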
We should mount this pool automatically on boot so that libvirt can use it. Therefore,
we need to add the following line to /etc/fstab:
gluster1:/kvmgluster /var/lib/libvirt/images/GlusterFS glusterfs defaults,_netdev 0 0
Mounting Gluster this way gives us two advantages:
• We can use Gluster's failover capability, which will be managed automatically by the
Gluster utilities that we installed directly, as libvirt doesn't support them yet.
• We will avoid creating virtual machine disks manually, which is another limitation
of libvirt's implementation of Gluster support, while directory-based storage pools
support it without any issues.
It seems weird that we're mentioning failover, as it seems as though we didn't configure
it as a part of any of the previous steps. Actually, we have. When we issued the last mount
command, we used Gluster's built-in modules to establish connectivity to the first Gluster
server. That, in turn, means that after this connection, we got all of the details about the
whole Gluster pool, which we configured so that it's hosted on three servers. If any kind
of failure happens—which we can easily simulate—this connection will continue working.
We can simulate this scenario by turning off any of the Gluster servers, for example—
gluster1. You'll see that the local directory where we mounted the Gluster volume still
works, even though gluster1 is down. Let's see that in action (the default timeout
period is 42 seconds):
Figure 5.23 – Gluster failover working; the first node is down, but we're still able to get our files
If we want to be more aggressive, we can shorten this timeout period to—for
example—2 seconds by issuing the following command on any of our Gluster servers:
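Assuming the kvmgluster volume from earlier, that's the network.ping-timeout volume option:
gluster volume set kvmgluster network.ping-timeout 2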
The number part is in seconds, and by assigning it a lower number, we can directly
influence how aggressive the failover process is.
So, now that everything is configured, we can start using the Gluster pool to deploy virtual
machines, which we will discuss further in Chapter 7, Virtual Machine – Installation,
Configuration, and Life-Cycle Management.
Seeing as Gluster is a file-based backend that can be used for libvirt, it's only natural to
describe how to use an advanced block-level and object-level storage backend. That's
where Ceph comes in, so let's work on that now.
Ceph
Ceph can act as file-, block-, and object-based storage. But for the most part, we're usually
using it as either block- or object-based storage. Again, this is a piece of open source
software that's designed to work on any server (or a virtual machine). In its core, Ceph
runs an algorithm called Controlled Replication Under Scalable Hashing (CRUSH).
This algorithm tries to distribute data across object devices in a pseudo-random manner,
and in Ceph, it's managed by a cluster map (a CRUSH map). We can easily scale Ceph out
by adding more nodes, and CRUSH will move only as much data as necessary to rebalance
the cluster, keeping the amount of re-replication to a minimum.
The main Ceph services (daemons) are as follows:
• ceph-mon : Used for cluster monitoring, CRUSH maps, and Object Storage
Daemon (OSD) maps.
• ceph-osd: This handles actual data storage, replication, and recovery. It requires at
least two nodes; we'll use three for clustering reasons.
• ceph-mds: Metadata server, used when Ceph needs filesystem access.
In accordance with best practices, make sure that you always design your Ceph
environments with the key principles in mind—all of the data nodes need to have the
same configuration. That means the same amount of memory, the same storage controllers
(don't use RAID controllers, just plain HBAs without RAID firmware if possible), the
same disks, and so on. That's the only way to ensure a constant level of Ceph performance
in your environments.
One very important aspect of Ceph is data placement and how placement groups work.
Placement groups offer us a chance to split the objects that we create and place them in
OSDs in an optimal fashion. Translation: the bigger the number of placement groups we
configure, the better balance we're going to get.
So, let's configure Ceph from scratch. We're going to follow the best practices again
and deploy Ceph by using five servers—one for administration, one for monitoring,
and three OSDs.
Our configuration will look like this:
Also, make sure that you type the following commands into all of the hosts as the root
user. Let's start by deploying packages, creating an admin user, and giving them rights
to sudo:
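The exact package set depends on the Ceph release and repositories that you use, but the user part is straightforward – ceph123 is the password we'll rely on later in this section:
useradd cephadmin
echo "ceph123" | passwd --stdin cephadmin
echo "cephadmin ALL = (root) NOPASSWD:ALL" > /etc/sudoers.d/cephadmin
chmod 0440 /etc/sudoers.d/cephadmin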
Disabling SELinux will make our life easier for this demonstration, as will getting rid of
the firewall:
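Something along these lines does it. The IP addresses below (apart from the monitor's, which we reuse later in this section) are placeholders for our example network:
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
setenforce 0
systemctl stop firewalld
systemctl disable firewalld
echo "192.168.159.150 ceph-admin
192.168.159.151 ceph-monitor
192.168.159.152 ceph-osd1
192.168.159.153 ceph-osd2
192.168.159.154 ceph-osd3" >> /etc/hosts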
Change the last echo part to suit your environment—hostnames and IP addresses. We're
just using this as an example from our environment. The next step is making sure that we
can use our admin host to connect to all of the hosts. The easiest way to do that is by using
SSH keys. So, on ceph-admin, log in as root and type in the ssh-keygen command,
and then press the Enter key all the way through. It should look something like this:
Figure 5.25 – Generating an SSH key for root for Ceph setup purposes
We also need to copy this key to all of the hosts. So, again, on ceph-admin, use
ssh-copy-id to copy the keys to all of the hosts:
ssh-copy-id cephadmin@ceph-admin
ssh-copy-id cephadmin@ceph-monitor
ssh-copy-id cephadmin@ceph-osd1
ssh-copy-id cephadmin@ceph-osd2
ssh-copy-id cephadmin@ceph-osd3
Accept all of the keys when SSH asks you, and use ceph123 as the password, which we
selected in one of the earlier steps. After all of this is done, there's one last step that we
need to do on ceph-admin before we start deploying Ceph—we have to configure SSH
to use the cephadmin user as a default user to log in to all of the hosts. We will do this by
going to the .ssh directory as root on ceph-admin, and creating a file called config
with the following content:
Host ceph-admin
Hostname ceph-admin
User cephadmin
Host ceph-monitor
Hostname ceph-monitor
User cephadmin
Host ceph-osd1
Hostname ceph-osd1
User cephadmin
Host ceph-osd2
Hostname ceph-osd2
User cephadmin
Host ceph-osd3
Hostname ceph-osd3
User cephadmin
That was a long pre-configuration, wasn't it? Now it's time to actually start deploying
Ceph. The first step is to configure ceph-monitor. So, on ceph-admin, type in the
following commands:
cd /root
mkdir cluster
cd cluster
ceph-deploy new ceph-monitor
Because of the fact that we selected a configuration in which we have three OSDs, we need
to configure Ceph so that it uses these additional two hosts. So, in the cluster directory,
edit the file called ceph.conf and add the following two lines at the end:
This will make sure that we can only use our example network (192.168.159.0/24)
for Ceph, and that we have two additional OSDs on top of the original one.
Now that everything's ready, we have to issue a sequence of commands to configure Ceph.
So, again, on ceph-admin, type in the following commands:
• The first command starts the actual deployment process—for the admin, monitor,
and OSD nodes, with the installation of all the necessary packages.
• The second and third commands configure the monitor host so that it's ready to
accept external connections.
• The two disk commands are all about disk preparation—Ceph will clear the disks
that we assigned to it (/dev/sdb per OSD host) and create two partitions on them,
one for Ceph data and one for the Ceph journal.
• The last two commands prepare these filesystems for use and activate Ceph. If at
any time your ceph-deploy script stops, check your DNS and /etc/hosts
and firewalld configuration, as that's where the problems usually are.
We need to expose Ceph to our KVM host, which means that we have to do a bit of extra
configuration. We're going to expose Ceph as an object pool to our KVM host, so we
need to create a pool. Let's call it KVMpool. Connect to ceph-admin, and issue the
following commands:
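ceph osd pool create KVMpool 128 128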
This command will create a pool called KVMpool, with 128 placement groups.
The next step involves approaching Ceph from a security perspective. We don't want
anyone connecting to this pool, so we're going to create a key for authentication to Ceph,
which we're going to use on the KVM host for authentication purposes. We do that by
typing the following command:
ceph auth get-or-create client.KVMpool mon 'allow r' osd 'allow rwx pool=KVMpool'
key = AQB9p8RdqS09CBAA1DHsiZJbehb7ZBffhfmFJQ==
We can then switch to the KVM host, where we need to do two things: define an authentication secret for Ceph, and define the Ceph-backed storage pool itself.
The easiest way to do these two steps would be by using two XML configuration files for
libvirt. So, let's create those two files. Let's call the first one, secret.xml, and here are
its contents:
<secret ephemeral='no' private='no'>
<usage type='ceph'>
<name>client.KVMpool secret</name>
</usage>
</secret>
Make sure that you save and import this XML file by typing in the following command:
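virsh secret-define --file secret.xml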
After you press the Enter key, this command is going to throw out a UUID. Please copy
and paste that UUID someplace safe, as we're going to need it for the pool XML file. In
our environment, this first virsh command threw out the following output:
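It's the same UUID that we will reuse in the pool definition a bit later:
Secret 95b1ed29-16aa-4e95-9917-c2cd4f3b2791 created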
We need to assign a value to this secret so that when libvirt tries to use this secret, it
knows which password to use. That's actually the password that we created on the Ceph
level, when we used ceph auth get-or-create, which gave us the key. So, now that we
have both the secret UUID and the Ceph key, we can combine them to create a complete
authentication object. On the KVM host, we need to type in the following command:
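Using the UUID and the key from the previous steps, that looks like this:
virsh secret-set-value --secret 95b1ed29-16aa-4e95-9917-c2cd4f3b2791 --base64 AQB9p8RdqS09CBAA1DHsiZJbehb7ZBffhfmFJQ==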
Now, we can create the Ceph pool file. Let's call the config file ceph.xml, and here are
its contents:
<pool type="rbd">
<name>KVMpool</name>
<source>
<name>KVMpool</name>
<host name='192.168.159.151' port='6789'/>
<auth username='KVMpool' type='ceph'>
<secret uuid='95b1ed29-16aa-4e95-9917-c2cd4f3b2791'/>
</auth>
</source>
</pool>
So, the UUID from the previous step was used in this file to reference which secret
(identity) is going to be used for Ceph pool access. Now we need to do the standard
procedure—import the pool, start it, and autostart it—if we want to use it permanently
(after the KVM host reboot). So, let's do that with the following sequence of commands
on the KVM host:
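Assuming the libvirt pool is named KVMpool, as in the definition above, that sequence would be:
virsh pool-define --file ceph.xml
virsh pool-start KVMpool
virsh pool-autostart KVMpool
virsh pool-list --all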
Figure 5.26 – Checking the state of our pools; the Ceph pool is configured and ready to be used
Now that the Ceph object pool is available for our KVM host, we could install a
virtual machine on it. We're going to work on that – again – in Chapter 7, Virtual
Machine – Installation, Configuration, and Life-Cycle Management.
Important note:
dd is known to be a resource-hungry command. It may cause I/O problems on
the host system, so it's good to first check the available free memory and I/O
state of the host system, and only then run it. If the system is already loaded,
lower the block size to MB and increase the count to match the size of the file
you wanted (use bs=1M, count=10000 instead of bs=1G, count=10).
• Preallocated: A preallocated virtual disk allocates the space right away at the time
of creation. This usually means faster write speeds than a thin-provisioned virtual
disk.
• Thin-provisioned: In this method, space will be allocated for the volume as
needed—for example, if you create a 10 GB virtual disk (disk image) with sparse
allocation. Initially, it would just take a couple of MB of space from your storage
and grow as it receives writes from the virtual machine, up to the 10 GB size. This
allows storage over-commitment, which means faking the available capacity from
a storage perspective. Furthermore, this can lead to problems later, when storage
space gets filled. To create a thin-provisioned disk, use the seek option with the
dd command, as shown in the following command:
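A sketch of both approaches, reusing the /vms directory and the file names from the rest of this section:
# Preallocated: writes 10 x 1 GB blocks of zeroes up front
dd if=/dev/zero of=/vms/dbvm_disk2.img bs=1G count=10
# Thin-provisioned: seeks to the 10 GB mark without writing any data
dd if=/dev/zero of=/vms/dbvm_disk2_seek.img bs=1G seek=10 count=0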
Each comes with its own advantages and disadvantages. If you are looking for I/O
performance, go for a preallocated format, but if you have a non-IO-intensive load,
choose thin-provisioned.
Now, you might be wondering how you can identify what disk allocation method a certain
virtual disk uses. There is a good utility for finding this out: qemu-img. This command
allows you to read the metadata of a virtual image. It also supports creating a new disk
and performing low-level format conversion.
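For example, we can run it against the two images used in this section:
qemu-img info /vms/dbvm_disk2.img
qemu-img info /vms/dbvm_disk2_seek.img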
See the disk size line of both the disks. It's showing 10G for /vms/dbvm_disk2.img,
whereas for /vms/dbvm_disk2_seek.img, it's showing 10 MiB. This difference is
because the second disk uses a thin-provisioning format. The virtual size is what guests see
and the disk size is what space the disk reserved on the host. If both the sizes are the same,
it means the disk is preallocated. A difference means that the disk uses the thin-provisioning
format. Now, let's attach the disk image to a virtual machine; you can attach it using
virt-manager or the CLI alternative, virsh.
So, let's use the Virtual Machine Manager to attach the disk to the virtual machine:
1. In the Virtual Machine Manager main window, select the virtual machine to which
you want to add the secondary disk.
2. Go to the virtual hardware details window and click on the Add Hardware button
located at the bottom-left side of the dialog box.
3. In Add New Virtual Hardware, select Storage, choose the Create a disk image
for the virtual machine option, and set the virtual disk size, as in the following screenshot:
Important note:
Here, we used a disk image, but you are free to use any storage device that is
present on the host system, such as a LUN, an entire physical disk (/dev/
sdb) or disk partition (/dev/sdb1), or LVM logical volume. We could
have used any of the previously configured storage pools for storing this image
either as a file or object or directly to a block device.
5. Clicking on the Finish button will attach the selected disk image (file) as a second
disk to the virtual machine using the default configuration. The same operation can
be quickly performed using the virsh command.
Using virt-manager to create a virtual disk was easy enough—just a couple of clicks
of a mouse and a bit of typing. Now, let's see how we can do that via the command
line—namely, by using virsh.
In a normal scenario, the following command is sufficient to perform a hot-add disk
attachment to a virtual machine:
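The command isn't reproduced above; based on the parameters described next, a sketch of the hot-add would be:
# virsh attach-disk CentOS8 /vms/dbvm_disk2.img vdb --live --config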
Here, CentOS8 is the virtual machine to which a disk attachment is executed. Then, there
is the path of the disk image. vdb is the target disk name that would be visible inside the
guest operating system. --live means performing the action while the virtual machine
is running, and --config means attaching it persistently across reboot. Not adding a
--config switch will keep the disk attached only until reboot.
Important note:
Hot plugging support: The acpiphp kernel module should be loaded
in a Linux guest operating system in order to recognize a hot-added disk;
acpiphp provides legacy hot plugging support, whereas pciehp provides
native hot plugging support. pciehp is dependent on acpiphp. Loading
acpiphp will automatically load pciehp as a dependency.
You can use the virsh domblklist <vm_name> command to quickly identify how
many vDisks are attached to a virtual machine. Here is an example:
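The command and its output aren't reproduced here; in this case, running it against the VM lists vda and vdb along with their backing file paths:
# virsh domblklist CentOS8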
This clearly indicates that the two vDisks connected to the virtual machine are both
file images. They are visible to the guest operating system as vda and vdb, respectively,
and the last column shows the paths of the disk images on the host system.
Next, we are going to see how to create an ISO library.
1. First, create a directory on the host system to store the .iso images:
# mkdir /iso
2. Set the correct permissions. It should be owned by a root user with permission set
to 700. If SELinux is in enforcing mode, the following context needs to be set:
# chmod 700 /iso
# semanage fcontext -a -t virt_image_t "/iso(/.*)?"
3. Define the ISO image library using the virsh command, as shown in the following
code block:
# virsh pool-define-as iso_library dir - - - - "/iso"
# virsh pool-build iso_library
# virsh pool-start iso_library
5. Now you can copy or move the .iso images to the /iso directory.
6. Upon copying the .iso files into the /iso directory, refresh the pool and
then check its contents:
# virsh pool-refresh iso_library
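The listing command isn't shown above; checking the pool contents would typically be done like this:
# virsh vol-list iso_library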
7. This will list all the ISO images stored in the directory, along with their path. These
ISO images can now be used directly with a virtual machine for guest operating
system installation, software installation, or upgrades.
Creating an ISO image library is the de facto norm in today's enterprises. It's better to
have a centralized place where all your ISO images are, and it makes it easier to implement
some kind of synchronization method (for example, rsync) if you need to synchronize
across different locations.
The next logical step after creating a storage pool is to create a storage volume. From a
logical standpoint, the storage volume slices a storage pool into smaller parts. Let's learn
how to do that now.
In Chapter 8, Creating and Modifying VM Disks, Templates, and Snapshots, all the disk
formats are explained in detail. For now, just understand that qcow2 is a specially
designed disk format for KVM virtualization. It supports the advanced features needed
for creating internal snapshots.
Here, dedicated_storage is the storage pool, vm_vol1 is the volume name, and
10 GB is the size:
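The command itself isn't reproduced in the text; a minimal sketch of the typical invocation would be:
# virsh vol-create-as dedicated_storage vm_vol1 10G --format qcow2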
The virsh command and arguments to create a storage volume are almost the same
regardless of the type of storage pool it is created on. Just enter the appropriate input for
a --pool switch. Now, let's see how to delete a volume using the virsh command.
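The deletion command isn't shown here; a minimal sketch would be:
# virsh vol-delete vm_vol2 --pool dedicated_storage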
Executing this command will remove the vm_vol2 volume from the dedicated_
storage storage pool.
The next step in our storage journey is about looking a bit into the future, as all of the
concepts that we mentioned in this chapter have been well known for years, some even for
decades. The world of storage is changing and moving in new and interesting directions,
so let's discuss that for a bit next.
The latest developments in storage – NVMe and NVMeOF
It's all about performance and latency: older concepts such as the Advanced Host
Controller Interface (AHCI), which we're still actively using with many SSDs on the
market today, are just not good enough to handle the performance that SSDs offer. AHCI
is the standard way in which software talks to SATA devices, and it was designed around
regular hard disks (mechanical, spinning disks). However, the key part of that is hard disk,
which implies cylinders, heads, and sectors—things that SSDs just don't have, as they don't
spin and don't need that kind of paradigm. That meant that another standard had to be
created so that we could use SSDs in a more native fashion. That's what Non-Volatile
Memory Express (NVMe) is all about—bridging the gap between what SSDs are capable
of doing and what they can actually deliver, without translations from SATA to AHCI to
PCI Express (and so on).
The fast development pace of SSDs and the integration of NVMe made huge
advancements in enterprise storage possible. That means that new controllers, new
software, and completely new architectures had to be invented to support this paradigm
shift. As more and more storage devices integrate NVMe for various purposes—primarily
for caching, then for storage capacity as well—it's becoming clear that there are other
problems that need to be solved as well. The first of these is the way in which we're
going to connect storage devices offering such a tremendous amount of capability to
our virtualized, cloud, or HPC environments.
In the past 10 or so years, many people argued that FC was going to disappear from the
market, and a lot of companies hedged their bets on different standards—iSCSI, iSCSI over
RDMA, NFS over RDMA, and so on. The reasoning behind that seemed solid enough:
alternative, Ethernet-based transports kept advertising ever-higher speeds, and vendors
rushed to support those standards as well. That usually sways customers, as higher
numbers mean better throughput, at least in most people's minds.
But what's happening on the market now tells us a completely different story—not just
that FC is back, but that it's back with a mission. The enterprise storage companies have
embraced that and started introducing storage devices with insane levels of performance
(with the aid of NVMe SSDs, as a first phase). That performance needs to be transferred
to our virtualized, cloud, and HPC environments, and that requires the best possible
protocol in terms of latency, design quality, and reliability, and FC has
all of that.
That leads to the second phase, where NVMe SSDs aren't just being used as cache devices,
but as capacity devices as well.
Take note of the fact that, right now, there's a big fight brewing on the storage memory/
storage interconnects market. There are multiple different standards trying to compete
with Intel's Quick Path Interconnect (QPI), a technology that's been used in Intel CPUs
for more than a decade. If this is a subject that's interesting to you, there is a link at the
end of this chapter, in the Further reading section, where you can find more information.
Essentially, QPI is a point-to-point interconnection technology with low latency and high
bandwidth that's at the core of today's servers. Specifically, it handles communication
between CPUs, CPUs and memory, CPUs and chipsets, and so on. It's a technology that
Intel developed after it got rid of the Front Side Bus (FSB) and chipset-integrated memory
controllers. The FSB was a single bus shared between memory and I/O requests.
That approach had much higher latency and lower bandwidth, didn't scale well, and
struggled when a large amount of memory and I/O traffic happened at the same time.
After switching to an architecture where the memory controller was a part
of the CPU (therefore, memory directly connects to it), it was essential for Intel to finally
move to this kind of concept.
If you're more familiar with AMD CPUs, QPI is to Intel CPUs what the HyperTransport
bus is to AMD CPUs with built-in memory controllers.
As NVMe SSDs became faster, the PCI Express standard also needed to be updated, which
is the reason why the latest version (PCIe 4.0 – the first products started shipping recently)
was so eagerly anticipated. But now, the focus has switched to two other problems that
need resolving for storage systems to work. Let's describe them briefly:
• Problem number one is simple. For a regular computer user, one or two NVMe
SSDs will be enough in 99% of scenarios or more. Realistically, the only real reason
why regular computer users need a faster PCIe bus is for a faster graphics card.
But for storage manufacturers, it's completely different. They want to produce
enterprise storage devices that will have 20, 30, 50, 100, 500 NVMe SSDs in a single
storage system—and they want that now, as SSDs are mature as a technology and
are widely available.
• Problem number two is more complex. To add insult to injury, the latest generation
of SSDs (for example, based on Intel Optane) can offer even lower latency and
higher throughput. That's only going to get worse (even lower latencies, higher
throughput) as technology evolves. For today's services—virtualization, cloud, and
HPC—it's essential that the storage system is able to handle any load that we can
throw at it. These technologies are a real game-changer in terms of how much faster
storage devices can become, but only if the interconnects (QPI, FC, and many
more) can handle it. Two concepts derived from Intel Optane, Storage Class Memory
(SCM) and Persistent Memory (PM), are the latest technologies that storage
companies and customers want adopted into their storage systems, and fast.
• The third problem is how to transfer all of that bandwidth and I/O capability to
the servers and infrastructures using them. This is why the concept of NVMe over
Fabrics (NVMe-OF) was created, to try to work on the storage infrastructure stack
to make NVMe much more efficient and faster for its consumers.
If you look at these advancements from a conceptual point of view, it has been clear
for decades that RAM-like memory is the fastest, lowest-latency technology we've had.
It's also logical that we're moving workloads to RAM
as much as possible. Think of in-memory databases (such as Microsoft SQL Server, SAP HANA,
and Oracle). They've been around the block for years.
These technologies fundamentally change the way we think about storage. Basically, no
longer are we discussing storage tiering based on technology (SSD versus SAS versus
SATA), or outright speed, as the speed is unquestionable. The latest storage technologies
discuss storage tiering in terms of latency. The reason is very simple—let's say that you're a
storage company and that you build a storage system that uses 50 SCM SSDs for capacity.
For cache, the only reasonable technology would be RAM, hundreds of gigabytes of it.
The only way you'd be able to work with storage tiering on a device like that is by basically
emulating it in software, by creating additional technologies that will produce tiering-like
services based on queueing, handling priority in cache (RAM), and similar concepts.
Why? Because if you're using the same SCM SSDs for capacity, and they offer the same
speed and I/O, you just don't have a way of tiering based on technology or capability.
Let's further describe this by using an available storage system to explain. The best device
to make our point is Dell/EMC's PowerMax series of storage devices. If you load them with
NVMe and SCM SSDs, the biggest model (8000) can scale to 15 million IOPS(!), 350 GB/s
throughput at lower than 100 microseconds latency and up to 4 PB capacity. Think about
those numbers for a second. Then add another number—on the frontend, it can have up to
256 FC/FICON/iSCSI ports. Just recently, Dell/EMC released new 32 Gbit/s FC modules
for it. The smaller PowerMax model (2000) can do 7.5 million IOPS, sub-100 microsecond
latency, and scale to 1 PB. It can also do all of the usual EMC stuff—replication, compression,
deduplication, snapshots, NAS features, and so on. So, this is not just marketing talk; these
devices are already out there, being used by enterprise customers:
Figure 3.30 – PowerMax 2000 – it seems small, but it packs a lot of punch
These are very important concepts for the future, as more and more manufacturers
produce similar devices (and they are on the way). We fully expect the KVM-based world
to embrace these concepts in large-scale environments, especially for infrastructures with
OpenStack and OpenShift.
Summary
In this chapter, we introduced and configured various open source storage concepts for
libvirt. We also discussed industry-standard approaches, such as iSCSI and NFS, as they are
often used in infrastructures that are not based on KVM. For example, VMware vSphere-
based environments can use FC, iSCSI, and NFS, while Microsoft-based environments can
only use FC and iSCSI, from the list of subjects we covered in this chapter.
The next chapter will cover subjects related to virtual display devices and protocols. We'll
provide an in-depth introduction to VNC and SPICE protocols. We will also provide a
description of other protocols that are used for virtual machine connection. All that will
help us to understand the complete stack of fundamentals that we need to work with our
virtual machines, which we covered in the past three chapters.
Questions
Further reading
Please refer to the following links for more information regarding what was covered in
this chapter:
6
Virtual Display
Devices and
Protocols
In this chapter, we will discuss the way in which we access our virtual machines by using
virtual graphic cards and protocols. There are almost 10 available virtual display adapters
that we can use in our virtual machines, and there are multiple available protocols and
applications that we can use to access our virtual machines. If we forget about SSH for
a second and any kind of console-based access in general, there are various protocols
available on the market that we can use to access the console of our virtual machine,
such as VNC, SPICE, and noVNC.
In Microsoft-based environments, we tend to use the Remote Desktop Protocol (RDP). If
we are talking about Virtual Desktop Infrastructure (VDI), then there are even more
protocols available – PC over IP (PCoIP), VMware Blast, and so on. Some of these
technologies offer additional functionality, such as greater color depth, encryption, audio
and filesystem redirection, printer redirection, bandwidth management, and USB and
other port redirection. These are key technologies for your remote desktop experience
in today's cloud-based world.
All of this means that we must put a bit more time and effort into getting to know various
display devices and protocols, as well as how to configure and use them. We don't want
to end up in situations in which we can't see the display of a virtual machine because we
selected the wrong virtual display device, or in a situation where we try to open a console
to see the content of a virtual machine and the console doesn't open.
In this chapter, we will cover the following topics:
• Using virtual machine display devices
• Discussing remote display protocols
• Methods to access a virtual machine console
• Getting display portability with noVNC
Using virtual machine display devices
There are two sides to working with a virtual machine's display: the virtual graphics adapter
that the virtual machine renders its output to, and the display protocol that we use to
access the graphics from the client. Let's discuss these two concepts, starting with the virtual
graphic adapter. The latest version of QEMU has eight different types of virtual/emulated
graphics adapters. All of these have some similarities and differences, all of which can be
in terms of features and/or resolutions supported or other, more technical details. So,
let's describe them and see which use cases we are going to favor a specific virtual
graphic card for:
• tcx: A SUN TCX virtual graphics card that can be used with old SUN OSes.
• cirrus: A virtual graphic card that's based on an old Cirrus Logic GD5446 VGA
chip. It can be used with any guest OS after Windows 95.
• std: A standard VGA card that can be used with high-resolution modes for guest
OSes after Windows XP.
• vmware: VMware's SVGA graphics adapter, which requires additional drivers in
Linux guest OSes and VMware Tools installation for Windows OSes.
• QXL: The de facto standard paravirtual graphics card that we need to use when
we use the SPICE remote display protocol, which we will cover in detail a bit later in
this chapter. There's an older version of this virtual graphics card called QXL VGA,
which lacks some more advanced features, while offering lower overhead (it uses
less memory).
• Virtio: A paravirtual 3D virtual graphics card that is based on the virgl project,
which provides 3D acceleration for QEMU guest OSes. It has two different types
(VGA and gpu). virtio-vga is commonly used for situations where we need
multi-monitor support and OpenGL hardware acceleration. The virtio-gpu
version doesn't have a built-in standard VGA compatibility mode.
• cg3: A virtual graphics card that we can use with older SPARC-based guest OSes.
• none: Disables the graphics card in the guest OS.
When configuring your virtual machine, you can select these options at startup or virtual
machine creation. In CentOS 8, the default virtual graphics card that gets assigned to
a newly created virtual machine is QXL, as shown in the following screenshot of the
configuration for a new virtual machine:
Also, by default, we can select three of these types of virtual graphics cards for any
given virtual machine, as these are usually pre-installed for us on any Linux server
that's configured for virtualization:
• QXL
• VGA
• Virtio
Some of the new OSes running in KVM virtualization shouldn't use older graphics card
adapters for a variety of reasons. For example, ever since Red Hat Enterprise Linux/
CentOS 7, there's an advisory not to use the cirrus virtual graphics card for Windows 10
and Windows Server 2016. The reason for this is related to the instability of the virtual
machine, as well as the fact that – for example – you can't use a full HD resolution display
with the cirrus virtual graphics card. Just in case you start installing these guest OSes,
make sure that you're using a QXL video graphics card as it offers the best performance
and compatibility with the SPICE remote display protocol.
Theoretically, you could still use cirrus virtual graphics card for some of the really old
guest OSes (older Windows NTs such as 4.0 and older client guest OSes such as Windows
XP), but that's about it. For everything else, it's much better to either use a std or QXL
driver, as they offer the best performance and acceleration support.
Let's show this via an example. Let's say that we want to create a new virtual machine
that is going to have a set of custom parameters assigned to it in terms of how we access
its virtual display. If you remember in Chapter 3, Installing KVM Hypervisor, libvirt, and
ovirt, we discussed various libvirt management commands (virsh, virt-install)
and we also created some virtual machines by using virt-install and some custom
parameters. Let's add to those and use a similar example:
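The exact virt-install command isn't reproduced here. A hedged sketch that matches the VGA adapter and VNC password shown in the following figure (the VM name, ISO path, and password are assumptions) could look like this:
# virt-install \
  --name PacktVGA \
  --vcpus 2 --memory 2048 \
  --disk size=10 \
  --cdrom /iso/CentOS-8-x86_64-1905-dvd1.iso \
  --video vga \
  --graphics vnc,password=Packt123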
Figure 6.2 – KVM virtual machine with a VGA virtual graphics card is created.
Here, VNC is asking for a password to be specified
Figure 6.3 – VGA display adapter and its low default (640x480) initial resolution - a familiar resolution
for all of us who grew up in the 80s
That being said, we just used this as an example of how to add an advanced option to the
virt-install command – specifically, how to install a virtual machine with a specific
virtual graphics card.
There are other, more advanced concepts of using real graphics cards that we installed in
our computers or servers to forward their capabilities directly to virtual machines. This
is very important for concepts such as VDI, as we mentioned earlier. Let's discuss these
concepts for a second and use some real-world examples and comparisons to understand
the complexity of VDI solutions on a larger scale.
However, there's a big but. What happens if the scenario includes hundreds of users who
need access to 2D and/or 3D video acceleration? What happens if we are designing a VDI
solution for a company creating designs – architecture, plumbing, oil and gas, and video
production? Running VDI solutions based on CPU and software-based virtual graphics
cards will get us nowhere, especially at scale. This is where Citrix XenDesktop and VMware
Horizon are much more feature-packed at the technology level. And – to be
quite honest – KVM-based methods aren't all that far behind in terms of display options,
it's just that they lag in some other enterprise-class features, which we will discuss in later
chapters, such as Chapter 12, Scaling Out KVM with OpenStack.
Basically, there are three concepts we can use to obtain graphics card performance for a
virtual machine:
• Software-based (CPU) rendering via a regular virtual graphics adapter
• PCI passthrough of an entire physical GPU to a single virtual machine
• GPU partitioning (sharing), where a physical GPU is split between multiple virtual machines
Just to use the VMware Horizon solution as a metaphor, these solutions would be called
CPU rendering, Virtual Direct Graphics Acceleration (vDGA), and Virtual Shared
Graphics Acceleration (vSGA). Or, in Citrix, we'd be talking about HDX 3D Pro. In
CentOS 8, we are talking about mediated devices in the shared graphics card scenario.
If we're talking about PCI passthrough, it definitely delivers the best performance as you
can use a PCI-Express graphics card, forward it directly to a virtual machine, install a
native driver inside the guest OS, and have the complete graphics card all for yourself.
But that creates four problems:
• You can only have that PCI-Express graphics card forwarded to one virtual machine.
• As servers can be limited in terms of upgradeability, for example, you can't run
50 virtual machines like that on one physical server as you can't fit 50 graphics cards
on a single server – physically or in terms of PCI-Express slots, where you usually
have up to six in a typical 2U rack server.
• If you're using Blade servers (for example, HP c7000), it's going to be even worse as
you're going to use half of the server density per blade chassis if you're going to use
additional graphics cards as these cards can only be fitted to double-height blades.
• You're going to spend an awful lot of money scaling any kind of solution like this
to hundreds of virtual desktops, or – even worse – thousands of virtual desktops.
If we're talking about a shared approach in which you partition a physical graphics
card so that you can use it in multiple virtual machines, that's going to create another
set of problems:
• You're much more limited in terms of which graphics card to use as there are maybe
20 graphics cards that support this usage model (some include NVIDIA GRID,
Quadro, Tesla cards, and a couple of AMD and Intel cards).
• If you share the same graphics card with four, eight, 16, or 32 virtual machines, you
have to be aware of the fact that you'll get less performance, as you're sharing the
same GPU with multiple virtual machines.
• Compatibility with DirectX, OpenGL, CUDA, and video encoding offload
won't be as good as you might expect, and you might be forced to use older
versions of these standards.
• There might be additional licensing involved, depending on the vendor and solution.
The next topic on our list is how to use a GPU in a more advanced way – by using the
GPU partitioning concept to provide parts of a GPU to multiple virtual machines. Let's
explain how that works and gets configured by using an NVIDIA GPU as an example.
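The command whose output is described next isn't shown; a common way to check the GPU and the kernel driver it is bound to (a sketch) is:
# lspci -nnk | grep -i -A 3 nvidia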
The output of this command tells us that we have a 3D-capable GPU – specifically,
an NVIDIA GP104GL-based product. It tells us that this device is already using the
vfio-pci driver. This driver is the native SR-IOV driver for Virtual Functions
(VFs). These functions are the core of SR-IOV functionality. We will describe this by
using this SR-IOV-capable GPU.
The first thing that we need to do – which all of us NVIDIA GPU users have been doing
for years – is to blacklist the nouveau driver, which gets in the way. And if we are going
to use GPU partitioning on a permanent basis, we need to do this permanently so that it
doesn't get loaded when our server starts. But be warned – this can lead to unexpected
behavior at times, such as the server booting and not showing any output without any
real reason. So, we need to create a configuration file for modprobe that will blacklist the
nouveau driver. Let's create a file called nouveauoff.conf in the /etc/modprobe.d
directory with the following content:
blacklist nouveau
options nouveau modeset=0
Then, we need to force our server to recreate the initrd image that gets loaded as our
server starts, and reboot the server to make that change active. We are going to do that
with the dracut command, followed by a regular reboot command:
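The commands aren't reproduced above; a minimal sketch would be:
# dracut -f
# systemctl reboot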
After the reboot, let's check if our vfio driver for the NVIDIA graphics card has loaded
and, if it has, check the vGPU manager service:
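The check commands aren't shown here. A sketch of what this usually looks like (the vGPU manager service name depends on the NVIDIA vGPU software package and is an assumption here):
# lsmod | grep vfio
# systemctl status nvidia-vgpu-mgr.service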
We need to create a UUID that we will use to present our virtual function to a KVM
virtual machine. We will use the uuidgen command for that:
uuidgen
c7802054-3b97-4e18-86a7-3d68dff2594d
Now, let's use this UUID for the virtual machines that will share our GPU. For that, we
need to create an XML template file that we will add to the existing XML files for our
virtual machines in a copy-paste fashion. Let's call this vsga.xml:
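The file contents aren't reproduced above; a minimal sketch of a vsga.xml that references the UUID we just generated could look like this:
<hostdev mode='subsystem' type='mdev' model='vfio-pci'>
  <source>
    <address uuid='c7802054-3b97-4e18-86a7-3d68dff2594d'/>
  </source>
</hostdev>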
Use these settings as a template and just copy-paste the complete content to any virtual
machine's XML file where you want to have access to our shared GPU.
The next concept that we need to discuss is the complete opposite of SR-IOV, where
we're slicing a device into multiple pieces to present these pieces to virtual machines. In
GPU passthrough, we're taking the whole device and presenting it directly to one object,
meaning one virtual machine. Let's learn how to configure that.
GPU PCI passthrough requires a few configuration steps to be done in sequence. By doing these steps in the correct order, we're directly
presenting this hardware device to a virtual machine. Let's explain these configuration
steps and do them:
1. To enable GPU PCI passthrough, we need to configure and enable IOMMU – first
in our server's BIOS, then in our Linux distribution. We're using Intel-based servers,
so we need to add iommu options to our /etc/default/grub file, as shown in
the following screenshot:
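The screenshot isn't reproduced here; the relevant change is appending the IOMMU options to GRUB_CMDLINE_LINUX (the other options on the line will differ from system to system):
GRUB_CMDLINE_LINUX="rhgb quiet intel_iommu=on iommu=pt"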
2. The next step is to reconfigure the GRUB configuration and reboot it, which can be
achieved by typing in the following commands:
# grub2-mkconfig -o /etc/grub2.cfg
# systemctl reboot
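3. Identify the GPU's PCI bus address and its vendor:device ID pair. This step isn't reproduced in full here; a hedged way to gather both is:
# lspci -nn | grep -i nvidia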
We will need both IDs going forward, with each one for defining which device we
want to use and where.
4. The next step is to explain to our host OS that it will not be using this PCI
express device (Quadro card) for itself. In order to do that, we need to change the
GRUB configuration again and add another parameter to the same file (/etc/default/grub):
Figure 6.6 – Adding the pci-stub.ids option to GRUB so that it ignores this device when booting the OS
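In text form, the addition sketched in the preceding figure boils down to appending pci-stub.ids with the GPU's vendor:device pair (the 10de:1bb1 value below is a hypothetical example):
GRUB_CMDLINE_LINUX="rhgb quiet intel_iommu=on iommu=pt pci-stub.ids=10de:1bb1"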
Again, we need to reconfigure GRUB and reboot the server after this, so type in the
following commands:
# grub2-mkconfig -o /etc/grub2.cfg
# systemctl reboot
This step marks the end of the physical server configuration. Now, we can move on
to the next stage of the process, which is how to use the now fully configured PCI
passthrough device in our virtual machine.
5. Let's check if everything was done correctly by using the virsh nodedev-
dumpxml command on the PCI device ID:
Figure 6.7 – Checking if the KVM stack can see our PCIe device
Here, we can see that QEMU sees two functions: 0x1 and 0x0. The 0x1 function
is actually the GPU device's audio chip, which we won't be using for our procedure.
We just need the 0x0 function, which is the GPU itself. This means that we need to
detach the 0x1 function so that it isn't used. We can do that by using the following command:
Figure 6.8 – Detaching the 0x1 device so that it can't be used for passthrough
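In text form, the detach sketched in the preceding figure uses virsh nodedev-detach on the audio function (the PCI device name below is a hypothetical example):
# virsh nodedev-detach pci_0000_82_00_1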
6. Now, let's add the GPU via PCI passthrough to our virtual machine. For this
purpose, we're using a freshly installed virtual machine called MasteringKVM03,
but you can use any virtual machine you want. We need to create an XML file that
QEMU will use to know which device to add to a virtual machine. After that, we
need to shut down the machine and import that XML file into our virtual machine.
In our case, the XML file will look like this:
Figure 6.9 – The XML file with our GPU PCI passthrough definition for KVM
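The file isn't reproduced as text above; a minimal sketch of such an XML definition (the PCI address values are hypothetical and must match your GPU's 0x0 function) would be:
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x82' slot='0x00' function='0x0'/>
  </source>
</hostdev>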
7. The next step is to attach this XML file to the MasteringKVM03 virtual machine.
We can do this by using the virsh attach-device command:
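A sketch of that command, assuming the XML file is called gpu.xml (the file name is an assumption):
# virsh attach-device MasteringKVM03 gpu.xml --config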
8. After the previous step, we can start our virtual machine, log in, and check if the
virtual machine sees our GPU:
Discussing remote display protocols
Throughout the history of virtual machines, there have been a number of different display
protocols taking care of this particular problem. So, let's discuss this history a bit.
The end of the 1970s was an important time in computer history as there were numerous
attempts to start mass-producing personal computers for a large number of people (for
example, Apple II from 1977). In the 1980s, people started using personal computers
more, as any Amiga, Commodore, Atari, Spectrum, or Amstrad fan will tell you. Keep
in mind that the first real, publicly available GUI-based OSes didn't start appearing until
Xerox Star (1981) and Apple Lisa (1983). The first widely available Apple-based GUI OS
was Mac OS System 1.0 in 1984. Most of the other previously mentioned computers were
all using a text-based OS. Even games from that era (and for many years to come) looked
like they were drawn by hand while you were playing them. Amiga's Workbench 1.0 was
released in 1985 and with its GUI and color usage model, it was miles ahead of its time.
However, 1985 is probably going to be remembered for something else – this is the year
that the first Microsoft Windows OS (v1.0) was released. Later, that became Windows 2.0
(1987), Windows 3.0 (1990), Windows 3.1 (1992), by which time Microsoft was already
taking the OS world by storm. Yes, there were other OSes by other manufacturers, too.
All of these were just a tiny dot on the horizon compared to the big storm that happened
in 1995, when Microsoft introduced Windows 95. It was the first Microsoft client OS
that was able to boot to a GUI by default, as the previous versions started from a
command line. Then came Windows 98 and XP, which meant even more market share
for Microsoft. The rest of that story is probably very familiar, with Vista, Windows 7,
Windows 8, and Windows 10.
The point of this story is not to teach you about OS history per se. It's about noticing
the trend, which is simple enough. We started with text interfaces in the command line
(for example, IBM and MS DOS, early versions of Windows, Linux, UNIX, Amiga,
Atari, and so on). Then, we slowly moved toward more visual interfaces (GUI). With
advancements in networking, GPU, CPU, and monitor technologies, we've reached
a phase in which we want a shiny 4K monitor (more than 8 million pixels), low
latency, huge CPU power, fantastic colors, and a specific user experience. That user
experience needs to be immediate, and it shouldn't really matter whether we're using a local
OS or a remote one (VDI, the cloud, or whatever the background technology is).
This means that along with all the hardware components that we just mentioned, other
(software) components needed to be developed as well. Specifically, what needed to be
developed were high-quality remote display protocols, which nowadays must be able to be
extended to a browser-based usage model, as well. People don't want to be forced to install
additional applications (clients) to access their remote resources.
If we further expand our list to protocols that are being used for VDI, then the list
increases further:
• Teradici PCoIP (PC over IP): A UDP-based VDI protocol that we can use to access
virtual machines on VMware, Citrix and Microsoft-based VDI solutions
• VMware Blast Extreme: VMware's answer to PCoIP for the VMware Horizon-based
VDI solution
• Citrix HDX: Citrix's protocol for virtual desktops.
Of course, there are others that are available but not used as much and are way less
important, such as the following:
• Colorado CodeCraft
• OpenText Exceed TurboX
• NoMachine
• FreeNX
• Apache Guacamole
• Chrome Remote Desktop
• Miranex
The major differences between regular remote protocols and fully featured VDI protocols
are related to additional functionalities. For example, on PCoIP, Blast Extreme, and HDX,
you can fine-tune bandwidth settings, control USB and printer redirection (manually
or centrally via policies), use multimedia redirection (to offload media decoding), Flash
redirection (to offload Flash), client drive redirection, serial port redirection, and dozens of
other features. You can't do some of these things on VNC or Remote Desktop, for example.
Having said that, let's discuss two of the most common ones in the open source world:
VNC and SPICE.
When adding VNC graphics, you will be presented with options such as the listen type,
address, port, and password.
For example, let's add VNC graphics to a virtual machine called PacktGPUPass and
then modify its VNC listening IP to 192.168.122.1:
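The commands aren't reproduced here; a hedged sketch using virt-xml would be:
# virt-xml PacktGPUPass --add-device --graphics vnc
# virt-xml PacktGPUPass --edit --graphics listen=192.168.122.1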
You can also use virsh to edit PacktGPUPass and change the parameters individually.
Why VNC?
You can use VNC when you access virtual machines on LAN or to access the VMs
directly from the console. It is not a good idea to expose virtual machines over a public
network using VNC as the connection is not encrypted. VNC is a good option if the
virtual machines are servers with no GUI installed. Another point that is in favor of VNC
is the availability of clients. You can access a virtual machine from any operating system
platform as there will be a VNC viewer available for that platform.
Important Note
Qumranet originally developed SPICE as a closed source code base in 2007.
Red Hat, Inc. acquired Qumranet in 2008, and in December 2009, they decided
to release the code under an open source license and treat the protocol as an
open standard.
SPICE is the only open source solution available on Linux that gives two-way audio. It
has high-quality 2D rendering capabilities that can make use of a client system's video
card. SPICE also supports multiple HD monitors, encryption, smart card authentication,
compression, and USB passthrough over the network. For a complete list of features, you
can visit http://www.spice-space.org/features.html. If you are a developer
and want to know about the internals of SPICE, visit http://www.spice-space.
org/documentation.html. If you are planning for VDI or installing virtual machines
that need GUIs, SPICE is the best option for you.
SPICE may not be compatible with some older virtual machines as they do not have
support for QXL. In those cases, you can use SPICE along with other generic virtual
video cards.
Now, let's learn how to add a SPICE graphics server to our virtual machine. This can be
considered the best-performing virtual display protocol in the open source world.
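The configuration steps aren't reproduced here; a hedged command-line sketch of adding a SPICE graphics server (virt-manager can do the same via its Add Hardware dialog) would be:
# virt-xml PacktGPUPass --add-device --graphics spice,listen=192.168.122.1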
Methods to access a virtual machine console
You will be presented with a list of virtual machines available on the hypervisor.
Select the one you want to access, as shown in the following screenshot:
If your environment is restricted to only a text console, then you must rely on
your favorite virsh – to be more specific, virsh console vm_name. This
needs some additional configuration inside the virtual machine OS, as described
in the following steps.
3. If your Linux distro is using GRUB (not GRUB2), append the following line to
your existing kernel boot line in /boot/grub/grub.conf and shut down the
virtual machine:
console=tty0 console=ttyS0,115200
If your Linux distro is using GRUB2, then the steps become a little complicated.
Note that the following command has been tested on a Fedora 22 virtual machine.
For other distros, the steps to configure GRUB2 might be different, though the
changes that are required for GRUB configuration file should remain the same:
# cat /etc/default/grub (only relevant variables are
shown)
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rd.lvm.lv=fedora/swap rd.lvm.
lv=fedora/root rhgb quiet"
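The change itself isn't shown above. Assuming the same console settings as in the GRUB example, the usual approach is to append the console parameters to GRUB_CMDLINE_LINUX and regenerate the configuration (the grub.cfg path differs on EFI systems):
GRUB_CMDLINE_LINUX="rd.lvm.lv=fedora/swap rd.lvm.lv=fedora/root rhgb quiet console=tty0 console=ttyS0,115200"
# grub2-mkconfig -o /boot/grub2/grub.cfg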
4. Now, shut down the virtual machine. Then, start it again using virsh:
# virsh shutdown PacktGPUPass
# virsh start PacktGPUPass --console
5. Run the following command to connect to a virtual machine console that has
already started:
# virsh console PacktGPUPass
In some cases, we have seen a console command stuck at ^]. To work around this, press
the Enter key multiple times to see the login prompt. Sometimes, configuring a text
console is very useful when you want to capture the boot messages for troubleshooting
purposes. Use Ctrl + ] to exit from the console.
Our next topic takes us to the world of noVNC, another VNC-based protocol that has a
couple of major advantages over the regular VNC. Let's discuss these advantages and the
implementation of noVNC now.
• Clipboard copy-paste
• Supports resolution scaling and resizing
• It's free under the MPL 2.0 license
• It's rather easy to install and supports authentication and can easily be implemented
securely via HTTPS
• Virtual machine(s) that are configured to accept VNC connections, preferably with
a bit of configuration done – a password and a correctly set up network interface
to connect to the virtual machine, for instance. You can freely use tigervnc-
server, configure it to accept connections on – for example – port 5901 for a
specific user, and use that port and server's IP address for client connections.
• noVNC installation on a client computer, which you can either download from
EPEL repositories or as a zip/tar.gz package and run directly from your web
browser. To install it, we need to type in the following sequence of commands:
yum -y install novnc
cd /etc/pki/tls/certs
openssl req -x509 -nodes -newkey rsa:2048 -keyout /etc/
pki/tls/certs/nv.pem -out /etc/pki/tls/certs/nv.pem -days
365
websockify -D --web=/usr/share/novnc --cert=/etc/pki/tls/
certs/nv.pem 6080 localhost:5901
Here, we can use our VNC server password for that specific console. After typing in the
password, we get this:
Figure 6.16 – noVNC console in action – we can see the virtual machine console and use
it to work with our virtual machine
We can also use all these options in oVirt. During the installation of oVirt, we just need to
enable the WebSocket proxy by passing the following option to engine-setup:
--otopi-environment="OVESETUP_CONFIG/
websocketProxyConfig=bool:True"
This option will enable oVirt to use noVNC as a remote display client, on top of the
existing SPICE and VNC.
Let's take a look at an example of configuring a virtual machine in oVirt with pretty much
all of the options that we've discussed in this chapter. Pay close attention to the Monitors
configuration option:
Figure 6.17 – oVirt also supports all the devices we discussed in this chapter
If we click on the Graphics protocol submenu, we will get the option to use SPICE, VNC,
noVNC, and various combinations thereof. Also, at the bottom of the screen, we have
available options for a number of monitors that we want to see in our remote display. This
can be very useful if we want to have a high-performance multi-display remote console.
Seeing that noVNC has been integrated into oVirt as well, you can treat this as a sign of
things to come. Think about it from this perspective – everything related to management
applications in IT has steadily been moving to web-based applications for years now. It's
only logical that the same things happen to virtual machine consoles. This has also been
implemented in other vendors' solutions, so seeing noVNC being used here shouldn't be
a big surprise.
Summary
In this chapter, we covered virtual display devices and protocols used to display virtual
machine data. We also did some digging into the world of GPU sharing and GPU
passthrough, which are important concepts for large-scale virtualized environments
running VDI. We discussed some benefits and drawbacks to these scenarios as they tend
to be rather complex to implement and require a lot of resources – financial resources
included. Imagine having to do PCI passthrough for 2D/3D acceleration for 100 virtual
machines. That would actually require buying 100 graphics cards, which is a big, big
ask financially. Among the other topics we discussed, we went through various display
protocols and options that can be used for console access to our virtual machines.
In the next chapter, we will take you through some regular virtual machine operations –
installation, configuration, and life cycle management, including discussing snapshots and migration.
Questions
1. Which types of virtual machine display devices can we use?
2. What are the main benefits of using a QXL virtual display device versus VGA?
3. What are the benefits and drawbacks of GPU sharing?
4. What are the benefits of GPU PCI passthrough?
5. What are the main advantages of SPICE versus VNC?
6. Why would you use noVNC?
Further reading
Please refer to the following links for more information regarding what was covered in
this chapter:
7
Virtual Machines:
Installation,
Configuration, and Life Cycle Management
Using virt-manager
virt-manager is the go-to GUI utility to manage KVM VMs. It's very intuitive and
easy to use, albeit lacking in functionality a bit, as we will describe a bit later. This is the
main virt-manager window:
Figure 7.2 – Connecting to other KVM hosts by using the Add Connection… option
After we select the Add Connection… option, we will be greeted by a wizard to connect
to the external host, and we just need to punch in some basic information—the username
(it has to be a user that has administrative rights) and hostname or Internet Protocol (IP)
address of the remote server. Before we do that, we also need to configure Secure Shell
(SSH) keys on our local machine and copy our key to that remote machine, as this is
the default authentication method for virt-manager. The process is shown in the
following screenshot:
As this wizard is the same as the wizard for installing VMs on your local server, we'll cover
both of these scenarios in one go. The first step in the New VM wizard is selecting where
you're installing your VM from. As you can see in the following screenshot, there are four
available options:
In VMware ESX integrated (ESXi)-based infrastructures, people often use ISO datastores
or content libraries for this functionality. In Microsoft Hyper-V-based infrastructures,
people usually have a Server Message Block (SMB) file share with ISO files needed for
a VM installation. It would be quite pointless to have a copy of an operating system ISO
per host, so some kind of a shared approach is much more convenient and is a good
space-saving mechanism.
Let's say that we're installing a VM from a network (HyperText Transfer Protocol
(HTTP), HyperText Transfer Protocol Secure (HTTPS), or File Transfer Protocol
(FTP)). We're going to need a couple of things to proceed: a URL that points to the
installation source and, optionally, a kickstart file to automate the installation, passed in the following format:
ks=http://kickstart_file_url/file.ks
Note that we manually selected Red Hat Enterprise Linux 8.0 as the target guest
operating system as virt-manager doesn't currently recognize CentOS 8 (1905) as
the guest operating system from the URL that we specified. If the operating system had
been on the list of currently recognized operating systems, we could've just selected the
Automatically detect from installation media / source checkbox, which you sometimes
need to re-check and uncheck a couple of times before it works.
After clicking on the Forward button, we're faced with memory and central processing
unit (CPU) settings for this VM. Again, you can go in two different directions here,
as follows:
• Select the bare minimum of resources (for example, 1 virtual CPU (vCPU) and 1
GB of memory), and then change that afterward if you need more CPU horsepower
and/or more memory.
• Select a decent amount of resources (for example, 2 vCPU and 4 GB of memory)
with a specific usage in mind. For example, if the intended use case for this VM
is a file server, you won't get an awful lot of performance if you add 16 vCPUs
and 64 GB of memory to it, but there might be other use cases in which this
will be appropriate.
The next step is configuring the VM storage. There are two available options, as we can see
in the following screenshot:
It's very important that you select a proper storage device for the VM, as you might have
various problems in the future if you don't. For example, if you put your VM on the wrong
storage device in a production environment, you'll have to migrate storage of that VM
to another storage device, which is a tedious and time-consuming process that will have
some nasty side effects if you have loads of VMs running on the source or destination
storage device. For starters, it will seriously impact their performance. Then, if you have
some dynamic workload management mechanism in your environment, it could trigger
additional VM or VM storage movement in your infrastructure. Features such as VMware's
Distributed Resource Scheduler (DRS)/Storage DRS, Hyper-V performance and resource
optimization (with System Center Operations Manager (SCOM) integration), and oVirt/
Red Hat Enterprise Virtualization cluster scheduling policies do things such as that. So,
adopting the think twice, do once strategy might be the correct approach here.
If you select the first available option, Create a disk image for the virtual machine,
virt-manager will create a VM hard disk in its default location—for Red Hat
Enterprise Linux (RHEL) and CentOS, that's in the /var/lib/libvirt/images
directory. Make sure that you have enough space for your VM hard disk. Let's say that
we have 8 GB of space available in the /var/lib/libvirt/images directory and its
underlying partition. If we leave everything as-is from the previous screenshot, we'd get
an error message because we tried to create a 10 GB file on a local disk where only 8 GB
is available.
After we click the Forward button again, we're at the final step of the VM creation
process, where we can select the VM name (as it will appear in virt-manager),
customize the configuration before the installation process, and select which virtual
network the VM will use. We will cover the hardware customization of the VM a bit later
in the chapter. After you click Finish, as shown in the following screenshot, your VM will
be ready for deployment and—after we install the operating system—use:
virt-viewer
As we've already used the virt-install command heavily before (check out
Chapter 3, Installing a Kernel-based Virtual Machine (KVM) Hypervisor, libvirt, and ovirt,
where we installed quite a few VMs by using this command), we're going to cover the
remaining commands.
Let's start with virt-viewer, as we've used this application before. Every time we
double-click on a VM in virt-manager, we open a VM console, and that happens
to be virt-viewer working in the background of this procedure. But if we wanted to use
virt-viewer from a shell—as people often do—we need some more information
about it. So, let's use a couple of examples.
First, let's connect to a local KVM called MasteringKVM01, which resides on the host
that we're currently connected to as root, by running the following command:
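A minimal sketch of that command:
# virt-viewer --connect qemu:///system MasteringKVM01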
We could also connect to the VM in kiosk mode, which means that virt-viewer
will close when we shut down the VM that we connect to. To do this, we would run the
following command:
virt-viewer --connect qemu:///system --kiosk --kiosk-quit on-disconnect MasteringKVM01
If we need to connect to a remote host, we can also use virt-viewer, but we need a
couple of additional options. The most common way to authenticate to a remote system
is through SSH, so we can do the following:
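A hedged sketch, with a hypothetical remote host address:
# virt-viewer --connect qemu+ssh://[email protected]/system MasteringKVM01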
virt-xml
The next command-line utility on our list is virt-xml. We can use it with virt-
install command-line options to change the VM configuration. Let's start with
a basic example—let's just enable the boot menu for the VM, as follows:
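The command isn't reproduced here; a hedged sketch would be:
# virt-xml MasteringKVM04 --edit --boot menu=on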
Then, let's add a thin-provisioned disk to the VM, in three steps— first, create the disk
itself, and then attach it to the VM and check that everything worked properly. The output
can be seen in the following screenshot:
Figure 7.9 – Adding a thin-provision QEMU copy-on-write (qcow2) format virtual disk to a VM
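In text form, the three steps sketched in the preceding figure could look like this (the disk path is a hypothetical example):
# qemu-img create -f qcow2 /var/lib/libvirt/images/MasteringKVM04_disk2.qcow2 10G
# virt-xml MasteringKVM04 --add-device --disk /var/lib/libvirt/images/MasteringKVM04_disk2.qcow2,format=qcow2
# virsh domblklist MasteringKVM04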
As we can see, virt-xml is quite useful. By using it, we added another virtual disk to
our VM, and that's one of the simplest things that it can do. We can use it to deploy any
additional piece of VM hardware to an existing VM. We can also use it to edit a VM
configuration, which is really handy in larger environments, especially when you have
to script and automate such procedures.
virt-clone
Let's now check virt-clone by using a couple of examples. Let's say we just want a
quick and easy way to clone an existing VM without any additional hassle. We can do
the following:
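A minimal sketch of such a quick clone (the source VM should be shut down or suspended first):
# virt-clone --original MasteringKVM04 --auto-clone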
Let's see how this could be a bit more customized. By using virt-clone, we are going
to create a VM named MasteringKVM05, by cloning a VM named MasteringKVM04,
and we are going to customize virtual disk names as well, as illustrated in the following
screenshot:
Figure 7.11 – Customized VM creation: customizing VM names and virtual hard disk filenames
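In text form, the customized clone sketched in the preceding figure could look like this (the target disk path is a hypothetical example):
# virt-clone --original MasteringKVM04 --name MasteringKVM05 \
  --file /var/lib/libvirt/images/MasteringKVM05.qcow2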
There are situations in real life that require you to convert VMs from one virtualization
technology to another. The bulk of that work is actually converting the VM disk format
from one format to another. That's what virt-convert is all about. Let's learn how it
does its job.
qemu-img
Let's now check how we will convert a virtual disk to another format, and how we will
convert a VM configuration file from one virtualization method to another. We will use an
empty VMware VM as a source and convert its vmdk virtual disk and .vmx file to a new
format, as illustrated in the following screenshot:
Figure 7.12 – Converting VMware virtual disk to qcow2 format for KVM
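In text form, the disk conversion sketched in the preceding figure boils down to a qemu-img convert call (the file names are hypothetical examples):
# qemu-img convert -p -f vmdk -O qcow2 MasteringKVM06.vmdk MasteringKVM06.qcow2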
If we are faced with projects that involve moving or converting VMs between these
platforms, we need to make sure that we use these utilities as they are easy to use and
understand and only require one thing—a bit of time. For example, if we have a 1 terabyte
(TB) VMware virtual disk (VM Disk (VMDK) and flat VMDK file), it might take hours
for that file to be converted to qcow2 format, so we have to be patient. Also, we need to be
prepared to edit vmx configuration files from time to time as the conversion process from
vmx to kvm format isn't 100% smooth, as we might expect it to be. During the course
of this process, a new configuration file is created. The default directory for KVM VM
configuration files is /etc/libvirt/qemu, and we can easily see Extensible Markup
Language (XML) files in that directory—these are our KVM VM configuration files.
Filenames represent VM names from the virsh list output.
There are also some new utilities in CentOS 8 that will make it easier for us to manage
not only the local server but also VMs. The Cockpit web interface is one of those—it has
the capability to do basic VM management on a KVM host. All we need to do is connect
to it via a web browser, and we mentioned this web application in Chapter 3, Installing a
Kernel-based VM (KVM) Hypervisor, libvirt, and ovirt, when discussing the deployment
of oVirt appliances. So, let's familiarize ourselves with VM management by using Cockpit.
Figure 7.14 – Cockpit web console, which we can use to deploy VMs
We will definitely want to configure boot options—attach a CD/ISO, add a virtual hard
disk, and configure the boot order, as illustrated in the following screenshot:
Figure 7.20 – Installing KVM VM from oVirt: make sure that you select correct boot options
Realistically, if you're managing an environment that has more than two to three KVM
hosts, you'll want to use some kind of centralized utility to manage them. oVirt is really
good for that, so don't skip it.
Now that we have done the whole deployment procedure in a variety of different ways, it's
time to think about the VM configuration. Keeping in mind that a VM is an object that
has many important attributes—such as the number of virtual CPUs, amount of memory,
virtual network cards, and so on—it's very important that we learn how to customize the
VM settings. So, let's make that our next topic.
Configuring your VM
When we were using virt-manager, if you go all the way to the last step, there's an
interesting option that you could've selected, which is the Customize configuration
before install option. The same configuration window can be accessed if you check the
VM configuration post-install. So, whichever way we go, we'll be faced with the full scale
of configuration options for every VM hardware device that was assigned to the VM we
just created, as can be seen in the following screenshot:
For example, if we click on the CPUs option on the left-hand side, you will see the number
of available CPUs (current and maximum allocation), and we'll also see some pretty
advanced options such as CPU topology (Sockets/Cores/Threads), which enables us to
configure specific non-uniform memory access (NUMA) configuration options. Here's
what that configuration window looks like:
In terms of configuration options, by far the most feature-rich configuration menu for
virt-manager is the virtual storage menu—in our case, VirtIO Disk 1. If we click
on that, we're going to get the following selection of configuration options:
• Disk bus—There are usually five options here, VirtIO being the default (and the
best) one. Just as with VMware ESXi and Hyper-V, KVM has different virtual
storage controllers available. For example, VMware has BusLogic, LSI Logic,
Paravirtual, and other types of virtual storage controllers, while Hyper-V has the
integrated drive electronics (IDE) and small computer system interface (SCSI)
controllers. This option defines the storage controller that the VM is going to see
inside its guest operating system.
• Storage format—There are two formats: qcow2 and raw (dd type format).
The most common option is qcow2 as it offers the most flexibility for VM
management—for example, it supports thin provisioning and snapshots.
• Cache mode—There are six types: writethrough, writeback, directsync,
unsafe, none, and default. These modes explain how data gets written from
an I/O that originated from the VM to the storage underlay below the VM. For
example, if we're using writethrough, the I/O gets cached on the KVM host and
is written through to the VM disk as well. On the other hand, if we're using none,
there's no caching on the host (except for the disk writeback cache), and data
gets written to the VM disk directly. Different modes have different pros and cons,
but generally, none is the best option for VM management. You can read more
about them in the Further reading section.
• IO mode—There are two modes: native and threads. Depending on this
setting, the VM I/O will be written either via kernel asynchronous I/O or via a pool
of threads in user space (threads is also the default value). When working
with the qcow2 format, it's generally accepted that threads mode is better, as the
qcow2 format first allocates sectors and then writes to them, which would otherwise
hog the vCPUs allocated to the VM and directly influence I/O performance.
• Discard mode—There are two available modes here, called ignore and unmap.
If you select unmap, when you delete files from your VM (which translates to
free space in your qcow2 VM disk file), the qcow2 VM disk file will shrink to
reflect the newly freed capacity. Depending on which Linux distribution, kernel,
and kernel patches you have applied and the Quick Emulator (QEMU) version,
this function might only be available on a SCSI disk bus. It's supported for QEMU
version 4.0+.
• Detect zeroes—There are three modes available: off, on, and unmap. If you
select unmap, zero write will be translated as an unmapping operation (as explained
in discard mode). If you set it to on, zero writes by the operating system will be
translated to specific zero write commands.
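As promised, here is a sketch of how these settings surface in the VM's XML definition: they are attributes of the disk's driver element. The VM name, path, and values shown here are assumptions:
# virsh dumpxml MasteringKVM01 | grep -A 3 '<disk type'
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' io='native' discard='unmap' detect_zeroes='unmap'/>
      <source file='/var/lib/libvirt/images/MasteringKVM01.qcow2'/>
      <target dev='vda' bus='virtio'/>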
During the lifespan of any given VM, there's a significant chance that we will reconfigure
it. Whether that means adding or removing virtual hardware (of course, usually, it's
adding), it's an important aspect of a VM's life cycle. So, let's learn how to manage that.
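As a quick, hedged illustration of that kind of reconfiguration, virsh can attach devices to an existing VM directly from the command line (the VM name, image path, and network name are assumptions):
# virsh attach-disk MasteringKVM01 /var/lib/libvirt/images/extra-disk.qcow2 vdb --subdriver qcow2 --persistent
# virsh attach-interface MasteringKVM01 --type network --source default --model virtio --persistent
The --persistent flag writes the change to the VM's persistent configuration, not just to the running instance.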
Migrating VMs
In simple terms, migration enables you to move your VM from one physical machine to
another physical machine, with minimal downtime or no downtime at all. We can also
move VM storage, which is a resource-intensive operation that needs to be carefully
planned and—if possible—executed after hours so that it doesn't affect other VMs'
performance as much as it otherwise could.
There are various different types of migration, as follows:
• Offline (cold)
• Online (live)
• Suspended migration
There are also various different types of online migrations, depending on what you're
moving, as follows:
• The compute part of the VM (moving the VM from one KVM host to another
KVM host)
• The storage part of the VM (moving VM files from one storage pool to another
storage pool)
• Both (moving the VM from host to host and storage pool to storage pool at the
same time)
There are some differences in terms of which migration scenarios are supported if you're
using just a plain KVM host versus oVirt or Red Hat Enterprise Virtualization. If you want
to do a live storage migration, you can't do it on a KVM host directly, but you can easily
do it if the VM is shut down. If you need a live storage migration, you will have to use
oVirt or Red Hat Enterprise Virtualization.
We discussed single-root input-output virtualization (SR-IOV), Peripheral Component
Interconnect (PCI) device passthrough, virtual graphics processing units (vGPUs), and
similar concepts as well (in Chapter 2, KVM as a Virtualization Solution, and Chapter 4,
Libvirt Networking). In CentOS 8, you can't live-migrate a VM that has any of these
options assigned to it.
Whatever the use case is, we need to be aware of the fact that migration needs to be
performed either as the root user or as a user that belongs to the libvirt user
group (what Red Hat refers to as system versus user libvirt session).
There are different reasons why VM migration is a valuable tool to have in your arsenal.
Some of these reasons are obvious; others, less so. Let's try to explain different use cases
for VM migration and its benefits.
Benefits of VM migration
The most important benefits of VM live migration are listed as follows:
VM migration needs proper planning to be put in place. There are some basic
requirements the migration looks for. Let's see them one by one.
The migration requirements for production environments are the following:
Check out Chapter 4, Libvirt Networking, and Chapter 5, Libvirt Storage, to remind
yourself how to create a storage pool using shared storage.
There are, as always, some rules that apply here. These are rather simple, so we need to
learn them before starting migration processes. They are as follows:
There are some pre-requisites that we need to have in mind when planning our
environment for VM migration. For the most part, these pre-requisites are mostly the
same for all virtualization solutions. Let's discuss these pre-requisites and, in general,
how to set up our environment for VM migration next.
Let's build the environment to do VM migration—both offline and live migrations. The
following diagram depicts two standard KVM virtualization hosts running VMs with
a shared storage:
We're going to create an NFS share on a CentOS 8 server. It's going to be hosted in the
/testvms directory, which we're going to export via NFS. The name of the server is
nfs-01 (in our case, the IP address of nfs-01 is 192.168.159.134).
1. The first step is creating and exporting the /testvms directory from nfs-01 and
turning off SELinux (check Chapter 5, Libvirt Storage, Ceph section to see how):
# mkdir /testvms
# echo '/testvms *(rw,sync,no_root_squash)' >> /etc/exports
2. Then, allow the NFS service in the firewall by executing the following code:
# firewall-cmd --get-active-zones
public
interfaces: ens33
# firewall-cmd --zone=public --add-service=nfs
# firewall-cmd --zone=public --list-all
4. Confirm that the share is accessible from your KVM hypervisors. In our case, it is
PacktPhy01 and PacktPhy02. Run the following code:
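A simple test mount from each hypervisor is enough for this check (assuming the NFS service is already running on nfs-01; the IP address is the one mentioned earlier):
# mount -t nfs 192.168.159.134:/testvms /mnt
# df -h /mnt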
5. If mounting fails, reconfigure the firewall on the NFS server and recheck the mount.
This can be done by using the following commands:
firewall-cmd --permanent --zone=public --add-service=nfs
firewall-cmd --permanent --zone=public --add-service=mountd
firewall-cmd --permanent --zone=public --add-service=rpc-bind
firewall-cmd --reload
6. Unmount the volume once you have verified the NFS mount point from both
hypervisors, as follows:
# umount /mnt
The testvms storage pool is now created and started on two hypervisors.
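If you need to re-create that pool, a minimal sketch with virsh could look like this, run on both hypervisors (the pool name and paths follow the NFS setup above, but the exact commands are an assumption):
# virsh pool-define-as testvms netfs --source-host 192.168.159.134 --source-path /testvms --target /var/lib/libvirt/images/testvms
# virsh pool-build testvms
# virsh pool-start testvms
# virsh pool-autostart testvms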
In this next example, we are going to isolate the migration and VM traffic. It is highly
recommended that you do this isolation in your production environment, especially if
you do a lot of migrations, as it will offload that demanding process to a separate network
interface, thus freeing other congested network interfaces. So, there are two main reasons
for this, as follows:
We will discuss three of the most important scenarios— offline migration, non-live
migration (suspended), and live migration (online). Then, we will discuss storage
migration as a separate scenario that requires additional planning and forethought.
Offline migration
As the name suggests, during offline migration, the state of the VM will be either shut
down or suspended. The VM will then be resumed or started at the destination host. In
this migration model, libvirt will just copy the VM's XML configuration file from the
source to the destination KVM host. It also assumes that you have the same shared storage
pool created and ready to use at the destination. As the first step in the migration process,
you need to set up two-way passwordless SSH authentication on the participating KVM
hypervisors. In our example, they are called PacktPhy01 and PacktPhy02.
For the following exercises, disable Security-Enhanced Linux (SELinux) temporarily.
In /etc/sysconfig/selinux, use your favorite editor to change the following line:
SELINUX=enforcing
to this:
SELINUX=permissive
Also, in the command line, as root, we need to temporarily set SELinux mode to
permissive, as follows:
# setenforce 0
On PacktPhy01, generate an SSH key pair and copy it over to PacktPhy02, as follows:
# ssh-keygen
# ssh-copy-id root@PacktPhy02
Then, on PacktPhy02, do the same in the other direction:
# ssh-keygen
# ssh-copy-id root@PacktPhy01
You should now be able to log in to both of these hypervisors as root without typing
a password.
Let's do an offline migration of MasteringKVM01, which is already installed, from
PacktPhy01 to PacktPhy02. The general format of the migration command looks
similar to the following:
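A hedged sketch of that general format, using our host names (exact options can vary; --offline copies only the definition of a shut-down VM, and --persistent keeps it defined on the destination):
# virsh migrate --offline --persistent MasteringKVM01 qemu+ssh://PacktPhy02/system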
When a VM is on shared storage and you have some kind of issue with one of the hosts,
you could also manually register a VM on another host. That means that you might end
up in a situation where the same VM is registered on two hypervisors, after you repair
the issue on your host that had an initial problem. It's something that happens when
you're manually managing KVM hosts without a centralized management platform such
as oVirt, which wouldn't allow such a scenario. So, what happens if you're in that kind of
situation? Let's discuss this scenario.
• lockd: lockd makes use of the POSIX fcntl() advisory locking capability.
It is started by the virtlockd daemon. It requires a shared filesystem
(preferably NFS) accessible to all the hosts that share the same storage pool.
• sanlock: This is used by the oVirt project. It uses a disk Paxos algorithm for
maintaining continuously renewed leases.
Enabling lockd
For image-based storage pools that are POSIX-compliant, you can enable lockd easily
by uncommenting the following line in /etc/libvirt/qemu.conf on both hypervisors:
lock_manager = "lockd"
Now, enable and start the virtlockd service on both the hypervisors. Also, restart
libvirtd on both the hypervisors, as follows:
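A minimal sketch of those service operations:
# systemctl enable --now virtlockd
# systemctl restart libvirtd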
Another method to enable lockd is to use a hash of the disk's file path. Locks are saved
in a shared directory that is exported through NFS, or similar sharing, to the hypervisors.
This is very useful when you have virtual disks that are created and attached using a
multipath logical unit number (LUN). fcntl() cannot be used in such cases. We
recommend that you use the methods detailed next to enable the locking.
On the NFS server, run the following code (make sure that you're not running any virtual
machines from this NFS server first!):
mkdir /flockd
# echo "/flockd *(rw,no_root_squash)" >> /etc/exports
# systemctl restart nfs-server
# showmount -e
Export list for :
/flockd *
/testvms *
Add the following code to both the hypervisors in /etc/fstab and type in the rest of
these commands:
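A sketch of what that could look like; the mount point, mount options, and lockspace directory are assumptions, and the NFS server is the one we used earlier:
# mkdir -p /var/lib/libvirt/lockd/flockd
# echo "192.168.159.134:/flockd /var/lib/libvirt/lockd/flockd nfs defaults 0 0" >> /etc/fstab
# mount -a
# echo 'file_lockspace_dir = "/var/lib/libvirt/lockd/flockd"' >> /etc/libvirt/qemu-lockd.conf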
Reboot both hypervisors, and, once rebooted, verify that the libvirtd and virtlockd
daemons started correctly on both the hypervisors, as follows:
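For example:
# systemctl status libvirtd virtlockd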
MasteringKVM01 has two virtual disks, one created from an NFS storage pool and the
other created directly from a LUN. If we try to power it on the PacktPhy02 hypervisor
host, MasteringKVM01 fails to start, as can be seen in the following code snippet:
When using LVM volumes that can be visible across multiple host systems, it is desirable
to do the locking based on the universally unique identifier (UUID) associated with
each volume, instead of their paths. Setting the following path causes libvirt to do
UUID-based locking for LVM:
lvm_lockspace_dir = "/var/lib/libvirt/lockd/lvmvolumes"
When using SCSI volumes that can be visible across multiple host systems, it is desirable
to do locking based on the UUID associated with each volume, instead of their paths.
Setting the following path causes libvirt to do UUID-based locking for SCSI:
scsi_lockspace_dir = "/var/lib/libvirt/lockd/scsivolumes"
Important note
If you are not able to start VMs due to locking errors, just make sure that they
are not running anywhere and then delete the lock files. Start the VM again. We
deviated a little from migration for the lockd topic. Let's get back to migration.
Migration implementation in KVM does not need any support from the VM. It means
that you can live-migrate any VMs, irrespective of the operating system they are using.
A unique feature of KVM live migration is that it is almost completely hardware-
independent. You should ideally be able to live-migrate a VM running on a hypervisor
that has an Advanced Micro Devices (AMD) processor to an Intel-based hypervisor.
We are not saying that this will work in 100% of the cases or that we in any way recommend
having this type of mixed environment, but in most of the cases, it should be possible.
Before we start the process, let's go a little deeper to understand what happens under the
hood. When we do a live migration, we are moving a live VM while users are accessing
it. This means that users shouldn't feel any disruption in VM availability when you do
a live migration.
Live migration is a complex, five-stage process, even though none of these stages are
exposed to the sysadmin. libvirt will do the necessary work once the VM migration
action is issued. The stages through which a VM migration goes are explained in the
following list:
1. Preparing the destination: When you initiate a live migration, the source
libvirt (SLibvirt) will contact the destination libvirt (DLibvirt) with
the details of the VM that is going to be transferred live. DLibvirt will pass
this information to the underlying QEMU, with relevant options to enable live
migration. QEMU will start the actual live migration process by starting the VM
in pause mode and will start listening on a Transmission Control Protocol
(TCP) port for VM data. Once the destination is ready, DLibvirt will inform
SLibvirt, with the details of QEMU. By this time, QEMU, at the source, is ready
to transfer the VM and connects to the destination TCP port.
2. Transferring the VM: When we say transferring the VM, we are not transferring
the whole VM; only the parts that are missing at the destination are transferred—for
example, the memory and the state of the virtual devices (VM state). Other than the
memory and the VM state, all other virtual hardware (virtual network, virtual disks,
and virtual devices) is available at the destination itself. Here is how QEMU moves
the memory to the destination:
a) The VM will continue running at the source, and the same VM is started in
pause mode at the destination.
b) In one go, it will transfer all the memory used by the VM to the destination. The
speed of the transfer depends upon the network bandwidth. Suppose the VM is using
10 gibibytes (GiB) of memory; transferring it will take roughly the same time as copying
10 GiB of data to the destination using the Secure Copy Protocol (SCP). In default mode,
it will make use of the full bandwidth. That is the reason we are separating the
administration network from the VM traffic network.
c) Once the whole memory is at the destination, QEMU starts transferring the dirty
pages (pages that are not yet written to the disk). If it is a busy VM, the number of
dirty pages will be high and it will take time to move them. Remember, dirty pages
will always be there and there is no state of zero dirty pages on a running VM.
Hence, QEMU will stop transferring the dirty pages when it reaches a low threshold
(50 or fewer pages).
QEMU will also consider other factors, such as iterations, the number of dirty pages
generated, and so on. This can also be determined by migrate-setmaxdowntime,
which is in milliseconds.
3. Stopping the VM on the source host: Once the number of dirty pages reaches
the said threshold, QEMU will stop the VM on the source host. It will also sync
the virtual disks.
4. Transferring the VM state: At this stage, QEMU will transfer the state of the VM's
virtual devices and remaining dirty pages to the destination as quickly as possible.
We cannot limit the bandwidth at this stage.
5. Continuing the VM: At the destination, the VM will be resumed from the paused
state. Virtual network interface controllers (NICs) become active, and the bridge
will send out gratuitous Address Resolution Protocols (ARPs) to announce the
change. After receiving the announcement from the bridge, the network switches
will update their respective ARP cache and start forwarding the data for the VM
to the new hypervisors.
Note that Steps 3, 4, and 5 will be completed in milliseconds. If some errors happen,
QEMU will abort the migration and the VM will continue running on the source
hypervisor. All through the migration process, libvirt services from both participating
hypervisors will be monitoring the migration process.
Our VM called MasteringKVM01 is now running on PacktPhy01 safely, with lockd
enabled. We are going to live-migrate MasteringKVM01 to PacktPhy02.
We need to open the necessary TCP ports used for migration. You only need to do that
at the destination server, but it's a good practice to do this in your whole environment
so that you don't have to micro-manage these configuration changes as you need them
in the future, one by one. Basically, you have to open the ports on all the participating
hypervisors by using the following firewall-cmd command for the default zone
(in our case, the public zone):
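Libvirt uses the 49152-49215/tcp port range for QEMU migrations by default, so a sketch would be:
# firewall-cmd --zone=public --add-port=49152-49215/tcp --permanent
# firewall-cmd --reload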
Check and verify that all the attached virtual disks are available at the destination, on the
same path, with the same storage pool name. This also applies to attached unmanaged
virtual disks (iSCSI and FC LUNs, and so on).
Check and verify that all the network bridges and virtual networks used by the VM are available at
the destination. After that, we can start the migration process by running the following code:
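A sketch of that command, using our VM and host names:
# virsh migrate --live --persistent MasteringKVM01 qemu+ssh://PacktPhy02/system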
Our VM is using only 4,096 megabytes (MB) of memory, so all five stages completed
in a couple of seconds. The --persistent option is optional, but we recommend
adding this.
This is the output of ping during the migration process (10.10.48.24 is the IP address
of MasteringKVM01):
# ping 10.10.48.24
PING 10.10.48.24 (10.10.48.24) 56(84) bytes of data.
64 bytes from 10.10.48.24: icmp_seq=12 ttl=64 time=0.338 ms
64 bytes from 10.10.48.24: icmp_seq=13 ttl=64 time=3.10 ms
If you get an error message about an unsafe migration caused by disk caching, change the
cache mode to none on the attached virtual disk. Here, target is the disk for which to
change the cache setting. You can find the target name by running the following command:
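Using our example VM name:
# virsh domblklist MasteringKVM01
The Target column of the output (for example, vda) is the name to use when changing the cache attribute of the disk's driver element via virsh edit.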
You can try a few more options while performing a live migration, as follows:
• --undefine domain: Option used to remove a KVM domain from a KVM host.
• --suspend domain: Suspends a KVM domain—that is, pauses a KVM domain
until we unsuspend it.
• --compressed: When we do a VM migration, this option enables us to compress
memory. That will mean a faster migration process, based on the --comp-methods
parameter.
• --abort-on-error: If the migration process throws an error, it is automatically
stopped. This is a safe default option as it will help in situations where any kind of
corruption might happen during the migration process.
• --unsafe: Kind of like the polar opposite of the --abort-on-error option. This
option forces migration at all costs, even in the case of an error, data corruption, or any
other unforeseen scenario. Be very careful with this option—don't use it often, and never
in any situation where VM data consistency is a key prerequisite.
You can read more about these options in the RHEL 7—Virtualization Deployment and
Administration guide (you can find the link in the Further reading section at the end of
this chapter). Additionally, the virsh command also supports the following options:
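For example, virsh provides subcommands for tuning a migration, such as capping its bandwidth (in MiB/s) or the maximum allowed downtime (in milliseconds). A quick sketch, with the VM name and values as assumptions:
# virsh migrate-setspeed MasteringKVM01 --bandwidth 500
# virsh migrate-getspeed MasteringKVM01
# virsh migrate-setmaxdowntime MasteringKVM01 --downtime 500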
As you can see, migration is a complex process from a technical standpoint, and has
multiple different types and loads of additional configuration options that you can use
for management purposes. That being said, it's still such an important capability of a
virtualized environment that it's very difficult to imagine working without it.
Summary
In this chapter, we covered different ways of creating VMs and configuring VM hardware.
We also covered VM migration in detail, and live and offline VM migration. In the next
chapter, we will work with VM disks, VM templates, and snapshots. These concepts are
very important to understand as they will make administering a virtualized environment
much easier.
Questions
1. Which command-line tools can we use to deploy VMs in libvirt?
2. Which GUI tools can we use to deploy VMs in libvirt?
3. When configuring our VMs, which configuration aspects should we be careful with?
4. What's the difference between online and offline VM migration?
5. What's the difference between VM migration and VM storage migration?
6. How can we configure bandwidth for the migration process?
Further reading
Please refer to the following links for more information regarding what was covered in
this chapter:
Creating and Modifying VM Disks, Templates, and Snapshots
This chapter represents the end of the second part of the book, in which we focused on
various libvirt features—installing Kernel-based Virtual Machine (KVM) as
a solution, libvirt networking and storage, virtual devices and display protocols,
installing virtual machines (VMs) and configuring them… and all of that as a preparation
for things that are coming in the next part of the book, which is about automation,
customization, and orchestration. In order for us to be able to learn about those concepts,
we must now switch our focus to VMs and their advanced operations—modifying,
templating, using snapshots, and so on. Some of these topics will often be referenced later
in the book, and some of these topics will be even more valuable for various business
reasons in a production environment. Let's dive in and cover them.
• virt-edit
• virt-filesystems
• virt-rescue
• virt-sparsify
• virt-sysprep
• virt-v2v
• virt-p2v
We'll start with five of the most important commands—virt-v2v, virt-p2v, virt-
copy-in, virt-customize, and guestfish. We will cover virt-sysprep when
we cover VM templating, and we have a separate part of this chapter dedicated to virt-
builder, so we'll skip these commands for the time being.
virt-v2v
Let's say that you have a Hyper-V-, Xen-, or VMware-based VM and you want to convert
it to KVM, oVirt, Red Hat Enterprise Virtualization, or OpenStack. We'll just use a
VMware-based VM as an example here and convert it to a KVM VM that is going to be
managed by libvirt utilities. Because of some changes that were introduced in 6.0+
revisions of VMware platforms (both on the ESX integrated (ESXi) hypervisor side
and on the vCenter server side and plugin side), it is going to be rather time-consuming
to export a VM and convert it to a KVM machine—either by using a vCenter server or
an ESXi host as a source. So, the simplest way to convert a VMware VM to a KVM VM
would be the following:
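Roughly speaking, the first part is exporting the VM to an OVA file with OVFtool; this is only a sketch, as the server name, credentials, and inventory path here are pure placeholders:
# ovftool vi://administrator@vcenter.local/Datacenter/vm/MyVM /tmp/MyVM.ova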
The reason why we need OVFtool for this is rather disappointing—it seems that
VMware removed the option to export the OVA file directly. Luckily, OVFtool exists for
Windows-, Linux-, and OS X-based platforms, so you'll have no trouble using it. Here's
the last step of the process:
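That last step is the virt-v2v conversion itself; a sketch in which the OVA file name and the target network are assumptions:
# virt-v2v -i ova /tmp/MyVM.ova -o local -os /var/lib/libvirt/images -of qcow2 -n default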
The -of and -o options specify the output format (a qcow2 libvirt image), and -n maps the VM's network interface to the specified libvirt network.
Make sure that you specify the VM disk location correctly. The -o local and -os /
var/lib/libvirt/images options make sure that the converted disk image gets
saved locally, in the specified directory (the KVM default image directory).
There are other types of VM conversion processes, such as converting a physical machine
to a virtual one. Let's cover that now.
virt-p2v
Now that we've covered virt-v2v, let's switch to virt-p2v. Basically, virt-v2v and
virt-p2v perform a job that seems similar, but the aim of virt-p2v is to convert a
physical machine to a VM. Technically speaking, this is quite a bit different, as with
virt-v2v we can either access a management server and hypervisor directly and
convert the VM on the fly (or via an OVA template). With a physical machine, there's
no management machine that can provide some kind of support or application
programming interface (API) to do the conversion process. We have to attack the
physical machine directly. In the real world of IT, this is usually done via some kind
of agent or additional application.
Just as an example, if you want to convert a physical Windows machine to a VMware-based
VM, you'll have to do it by installing a VMware vCenter Converter Standalone on a system
that needs to be converted. Then, you'll have to select a correct mode of operation and
stream the complete conversion process to vCenter/ESXi. It does work rather well, but—for
example—Red Hat's approach is a bit different. It uses boot media to convert a physical
server. So, before using this conversion process, you have to log in to the Customer Portal
(located at https://access.redhat.com/downloads/content/479/ver=/
rhel---8/8.0/x86_64/product-software for Red Hat Enterprise Linux (RHEL)
8.0, and you can switch versions from the menu). Then, you will have to download a correct
image and use the virt-p2v and virt-p2v-make-disk utilities to create an image.
guestfish
The last utility that we want to discuss in this intro part of the chapter is called
guestfish. This is a very, very important utility that enables you to do all sorts of
advanced things with actual VM filesystems. We can also use it to do different types of
conversion—for example, convert an International Organization for Standardization
(ISO) image to tar.gz; convert a virtual disk image from an ext4 filesystem to a Logical
Volume Management (LVM)-backed ext4 filesystem; and much more. We will show you
a couple of examples of how to use it to open a VM image file and root around a bit.
The first example is a really common one—you have prepared a qcow2 image with a
complete VM; the guest operating system is installed; everything is configured; you're
ready to copy that VM file somewhere to be reused; and... you remember that you
didn't configure a root password according to some specification. Let's say that this was
something that you had to do for a client, and that client has specific root password
requirements for the initial root password. This makes it easier for the client—they
don't need to have a password sent by you in an email; they have only one password to
remember; and, after receiving the image, it will be used to create the VM. After the VM
has been created and run, the root password will be changed to something—according
to security practices—used by a client.
So, basically, the first example is an example of what it means to be human—forgetting
to do something, and then wanting to repair that, but (in this case) without actually
running the VM as that can change quite a few settings, especially if your qcow2 image
was created with VM templating in mind, in which case you definitely don't want to start
that VM to repair something. More about that in the next part of this chapter.
This is an ideal use case for guestfish. Let's say that our qcow2 image is called
template.qcow2. Let's change the root password to something else—for example,
packt123. First, we need a hash for that password. The easiest way to do that would be
to use openssl with the -6 option (which equals SHA512 encryption), as illustrated in
the following screenshot:
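In command form, that boils down to a single line (using the example password mentioned above):
# openssl passwd -6 packt123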
Now that we have our hash, we can mount and edit our image, as follows:
Figure 8.3 – Using guestfish to edit the root password inside our qcow2 VM image
Shell commands that we typed in were used to get direct access to the image (without
libvirt involvement) and to mount our image in read-write mode. Then, we started
our session (guestfish run command), checked which filesystems are present in the
image (list-filesystems), and mounted the filesystem on the root folder. In the
second-to-last step, we changed the root's password hash to the hash created by openssl.
The exit command closes our guestfish session and saves changes.
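A minimal sketch of such a session follows; the filesystem device name is an assumption (take the real one, whether a plain partition or an LVM path, from the list-filesystems output), and the hash generated by openssl is edited into the root line of /etc/shadow:
# guestfish --rw -a template.qcow2
><fs> run
><fs> list-filesystems
><fs> mount /dev/sda1 /
><fs> vi /etc/shadow
><fs> exit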
You could use a similar principle to—for example—remove forgotten sshd keys from the
/etc/ssh directory, remove user ssh directories, and so on. The process can be seen in
the following screenshot:
The second example is also rather useful, as it involves a topic covered in the next chapter
(cloud-init), which is often used to configure cloud VMs by manipulating the early
initialization of the VM instance. Also, taking a broader view of the subject, you can use
this guestfish example to manipulate the service configuration inside VM images. So,
let's say that our VM image was configured so that the cloud-init service is started
automatically. We want that service to be disabled for whatever reason—for example,
to debug an error in the cloud-init configuration. If we didn't have the capability to
manipulate qcow image content, we'd have to start that VM, use systemctl to disable
the service, and—perhaps—do the whole procedure to reseal that VM if this was a VM
template. So, let's use guestfish for the same purpose, as follows:
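One way to sketch this is to mask the service by pointing its unit file to /dev/null, assuming a systemd-based guest (note the ln-sf spelling, which the following note explains):
# guestfish --rw -a template.qcow2
><fs> run
><fs> mount /dev/sda1 /
><fs> ln-sf /dev/null /etc/systemd/system/cloud-init.service
><fs> exit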
Important note
Be careful in this example, as normally we'd use ln -sf with a space
character between the command and options. Not so in our guestfish
example—it needs to be used without a space.
And lastly, let's say that we need to copy a file to our image. For example, we need to copy
our local /etc/resolv.conf file to the image as we forgot to configure our Domain
Name System (DNS) servers properly. We can use the virt-copy-in command for
that purpose, as illustrated in the following screenshot:
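In command form, it might look like this, with the image name being the one we have been using:
# virt-copy-in -a template.qcow2 /etc/resolv.conf /etc/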
VM templating
One of the most common use cases for VMs is creating VM templates. So, let's say that we
need to create a VM that is going to be used as a template. We use the term template here
literally, in the same manner in which we can use templates for Word, Excel, PowerPoint,
and so on, as VM templates exist for the very same reason—to have a familiar working
environment preconfigured for us so that we don't need to start from scratch. In the case of
VM templates, we're talking about not installing a VM guest operating system from scratch,
which is a huge time-saver. Imagine getting a task to deploy 500 VMs for some kind of
testing environment to test how something works when scaled out. You'd lose weeks doing
that from scratch, even allowing for the fact that you can do installations in parallel.
VMs need to be looked at as objects, and they have certain properties or attributes. From
the outside perspective (meaning, from the perspective of libvirt), a VM has a name,
a virtual disk, a virtual central processing unit (CPU) and memory configuration,
connectivity to a virtual switch, and so on. We covered this subject in Chapter 7, VM:
Installation, Configuration, and Life Cycle Management. That being said, we didn't touch
the subject of inside a VM. From that perspective (basically, from the guest operating
system perspective), a VM also has certain properties—installed guest operating system
version, Internet Protocol (IP) configuration, virtual local area network (VLAN)
configuration… After that, it depends on which operating system family the VM is based on.
We thus need to take the guest operating system into account as well.
It can be even more specific than that. For example, preparing a template for Ubuntu-
based VMs is different from preparing a template for CentOS 8-based VMs. And to create
these templates properly, we need to learn some basic procedures that we can then use
repetitively every single time when creating a VM template.
Consider this example: suppose you wish to create four Apache web servers to host your
web applications. Normally, with the traditional manual installation method, you would
first have to create four VMs with specific hardware configurations, install an operating
system on each of them one by one, and then download and install the required Apache
packages using yum or some other software installation method. This is a time-consuming
job, as you will be mostly doing repetitive work. But with a template approach, it can be
done in considerably less time. How? Because you will bypass operating system installation
and other configuration tasks and directly spawn VMs from a template that consists of a
preconfigured operating system image, containing all the required web server packages
ready for use.
The following screenshot shows the steps involved in the manual installation method. You
can clearly see that Steps 2-5 are just repetitive tasks performed across all four VMs, and
they would have taken up most of the time required to get your Apache web servers ready:
Now, see how the number of steps is drastically reduced by simply following Steps 1-5
once, creating a template, and then using it to deploy four identical VMs. This will save
you a lot of time. You can see the difference in the following diagram:
This isn't the whole story, though. There are different ways of actually going from Step 3
to Step 4 (from Create a Template to deployment of VM1-4), which either includes a full
cloning process or a linked cloning process, detailed here:
• Full clone: A VM deployed using the full cloning mechanism creates a complete
copy of the VM, the problem being that it's going to use the same amount of
capacity as the original VM.
• Linked clone: A VM deployed using the thin cloning mechanism uses the template
image as a base image in read-only mode and links an additional copy-on-write
(COW) image to store newly generated data. This provisioning method is heavily
used in cloud and Virtual Desktop Infrastructure (VDI) environments as it saves
a lot of disk space. Remember that fast storage capacity is something that's really
expensive, so any kind of optimization in this respect will be a big money saver.
Linked clones will also have an impact on performance, as we will discuss a bit later.
Creating templates
Templates are created by converting a VM into a template. This is actually a three-step
procedure that includes the following:
1. Installing and customizing the VM, with all the desired software, which will become
the template or base image.
2. Removing all system-specific properties to ensure VM uniqueness—we need to take
care of SSH host keys, network configuration, user accounts, media access control
(MAC) address, license information, and so on.
3. Marking the VM as a template by renaming it, with template as a prefix. Some
virtualization technologies have special VM file types for this (for example, a
VMware .vmtx file), which effectively means that you don't have to rename a VM
to mark it as a template.
To understand the actual procedure, let's create two templates and deploy a VM from
them. Our two templates are going to be the following:
• A CentOS 8 VM with a complete Linux, Apache, MySQL, and PHP (LAMP) stack
• A Windows Server 2019 VM with SQL Server Express
1. Create a VM and install CentOS 8 on it, using the installation method that you
prefer. Keep it minimal as this VM will be used as the base for the template that is
being created for this example.
2. SSH into or take control of the VM and install the LAMP stack. Here's a script for
you to install everything needed for a LAMP stack on CentOS 8, after the operating
system installation has been done. Let's start with the package installation, as
follows:
yum -y update
yum -y install httpd httpd-tools mod_ssl
systemctl start httpd
systemctl enable httpd
yum -y install mariadb-server mariadb
yum install -y php php-fpm php-mysqlnd php-opcache php-gd
php-xml php-mbstring libguestfs*
After we're done with the software installation, let's do a bit of service
configuration—start all the necessary services and enable them, and reconfigure the
firewall to allow connections, as follows:
systemctl start mariadb
systemctl enable mariadb
systemctl start php-fpm
systemctl enable php-fpm
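Assuming firewalld, the firewall part of that reconfiguration could look like this:
firewall-cmd --permanent --zone=public --add-service=http
firewall-cmd --permanent --zone=public --add-service=https
firewall-cmd --reload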
3. After this has been done, we need to configure MariaDB, as we have to set some
kind of MariaDB root password for the database administrative user and configure
basic settings. This is usually done via a mysql_secure_installation script
provided by MariaDB packages. So, that is our next step, as illustrated in the
following code snippet:
mysql_secure_installation
Figure 8.9 – First part of MariaDB setup: assigning a root password that is empty after installation
After assigning a root password for the MariaDB database, the next steps are more
related to housekeeping—removing anonymous users, disallowing remote login,
and so on. Here's what that part of the wizard looks like:
Figure 8.10 – Housekeeping: anonymous users, root login setup, test database data removal
We installed all the necessary services—Apache, MariaDB—and all the necessary
additional packages (PHP, FastCGI Process Manager (FPM)), so this VM is ready
for templating. We could also introduce some kind of content to the Apache web
server (create a sample index.html file and place it in /var/www/html), but
we're not going to do that right now. In production environments, we'd just copy
web page contents to that directory and be done with it.
4. Now that the required LAMP settings are configured the way we want them, shut
down the VM and run the virt-sysprep command to seal it. If you want to
expire the root password (translation—force a change of the root password on the
next login), type in the following command:
passwd --expire root
Our test VM is called LAMP and the host is called PacktTemplate, so here are the
necessary steps, presented via a one-line command:
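A minimal sketch of that step, which simply makes sure the VM is shut down cleanly before we seal it:
# virsh shutdown LAMP; sleep 10; virsh list --all | grep LAMP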
Our LAMP VM is now ready to be reconfigured as a template. For that, we will use the
virt-sysprep command.
What is virt-sysprep?
This is a command-line utility provided by the libguestfs-tools-c package to ease
the sealing and generalizing procedure of a Linux VM. It prepares a Linux VM to become
a template or clone by removing system-specific information automatically so that clones
can be made from it. virt-sysprep can be used to add some additional configuration
bits and pieces—such as users, groups, SSH keys, and so on.
There are two ways to invoke virt-sysprep against a Linux VM: using the -d or -a
option. The first option points to the intended guest using its name or universally unique
identifier (UUID), and the second one points to a particular disk image. This gives us
the flexibility to use the virt-sysprep command even if the guest is not defined in
libvirt.
Once the virt-sysprep command is executed, it performs a bunch of sysprep
operations that make the VM image clean by removing system-specific information from
it. Add the --verbose option to the command if you are interested in knowing how this
command works in the background. The process can be seen in the following screenshot:
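Against our example VM, the invocation itself is simple (the VM name is an assumption):
# virt-sysprep -d LAMP --verbose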
By default, virt-sysprep performs more than 30 operations. You can also choose
which specific sysprep operations you want to use. To get a list of all the available
operations, run the virt-sysprep --list-operation command. The default
operations are marked with an asterisk. You can change the default operations using the
--operations switch, followed by a comma-separated list of operations that you want
to use. See the following example:
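For instance, to run only a handful of operations (these operation names exist in virt-sysprep; the VM name is an assumption):
# virt-sysprep -d LAMP --operations ssh-hostkeys,net-hwaddr,machine-id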
Important note
Make sure that from now on, this VM is never started; otherwise, it will lose all
sysprep operations and can even cause problems with VMs deployed using the
thin method.
LAMP-Template, our template, is now ready to be used for future cloning processes. You
can check its settings by using the following command:
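For example, using dominfo, just as we will do for the Windows template later:
# virsh dominfo LAMP-Template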
1. Create a VM and install the Windows Server 2019 operating system on it. Our VM
is going to be called WS2019SQL.
2. Install the Microsoft SQL Express software and, once it's configured the way you want,
restart the VM and launch the sysprep application. The .exe file of sysprep is
present in the C:\Windows\System32\sysprep directory. Navigate there by
entering sysprep in the run box and double-click on sysprep.exe.
3. Under System Cleanup Action, select Enter System Out-of-Box Experience
(OOBE) and click on the Generalize checkbox if you want to do system
identification number (SID) regeneration, as illustrated in the following screenshot:
4. Under Shutdown Options, select Shutdown and click on the OK button. The
sysprep process will start after that, and when it's done, it will be shut down.
5. Rename the VM using the same procedure we used on the LAMP template,
as follows:
# virsh domrename WS2019SQL WS2019SQL-Template
Again, we can use the dominfo option to check basic information about our newly
created template, as follows:
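In command form:
# virsh dominfo WS2019SQL-Template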
Important note
Be careful when updating templates in the future—you need to run them,
update them, and reseal them. With Linux distributions, you won't have
many issues doing that. But repeatedly sysprepping Microsoft Windows (starting
the template VM, updating it, running sysprep, and doing that again in the
future) will eventually get you to a situation in which sysprep throws an error.
So, there's another school of thought that you can use here. You can do the
whole procedure as we did it in this part of our chapter, but don't sysprep it.
That way, you can easily update the VM, then clone it, and then sysprep the
clone. It will save you a lot of time.
2. Provide a name for the resulting VM and skip all other options. Click on the Clone
button to start the deployment. Wait till the cloning operation finishes.
3. Once it's finished, your newly deployed VM is ready to use and you can start using
it. You can see the output from the process in the following screenshot:
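For the linked-clone variant described in the next steps, we first need overlay images that use the template disk as a backing file. A sketch with qemu-img; the paths and names match the steps that follow, but are assumptions as far as the exact commands go:
# qemu-img create -f qcow2 -b /var/lib/libvirt/images/WS2019SQL.qcow2 -F qcow2 /var/lib/libvirt/images/SQL1.qcow2
# qemu-img create -f qcow2 -b /var/lib/libvirt/images/WS2019SQL.qcow2 -F qcow2 /var/lib/libvirt/images/SQL2.qcow2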
2. Verify that the backing file attribute for the newly created qcow2 images is pointing
correctly to the /var/lib/libvirt/images/WS2019SQL.qcow2 image,
using the qemu-img command. The end result of these three procedures should
look like this:
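To get the SQL1.xml and SQL2.xml definition files used in the following steps, we can simply dump the template's configuration twice:
# virsh dumpxml WS2019SQL-Template > SQL1.xml
# virsh dumpxml WS2019SQL-Template > SQL2.xml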
4. By using the uuidgen -r command, generate two random UUIDs. We will need
them for our VMs. The process can be seen in the following screenshot:
5. Edit the SQL1.xml and SQL2.xml files by assigning them new VM names and
UUIDs. This step is mandatory as VMs have to have unique names and UUIDs.
Let's change the name in the first XML file to SQL1, and the name in the second
XML file to SQL2. We can achieve that by changing the <name></name>
statement. Then, copy and paste the UUIDs that we created with the uuidgen
command in the SQL1.xml and SQL2.xml <uuid></uuid> statement. So,
relevant entries for those two lines in our configuration files should look like this:
Figure 8.19 – Changing the VM name and UUID in their respective XML configuration files
6. We need to change the virtual disk location in our SQL1 and SQL2 image files. Find
entries for .qcow2 files later in these configuration files and change them so that
they use the absolute path of files that we created in Step 1, as follows:
Figure 8.20 – Changing the VM image location so that it points to newly created linked clone images
7. Now, import these two XML files as VM definitions by using the virsh create
command, as follows:
Figure 8.21 – Creating two new VMs from XML definition files
8. Use the virsh command to verify if they are defined and running, as follows:
Figure 8.23 – Result of linked clone deployment: base image, small delta images
This should provide plenty of examples and info about using the linked cloning process.
Don't take it too far (many linked clones on a single base image) and you should be
fine. But now, it's time to move to our next topic, which is about virt-builder. The
virt-builder concept is very important if you want to deploy your VMs quickly – that
is, without actually installing them. We can use virt-builder repos for that. Let's learn
how to do that next.
virt-builder and virt-builder repos
virt-builder provides us with a way of doing just that. By issuing a couple of simple
commands, we can download a CentOS 8 image, import it into KVM, and start it. Let's
proceed, as follows:
Figure 8.25 – Using virt-builder to grab a CentOS 8.0 image and check its size
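In text form, that first step boils down to something like this; the output path and size are assumptions:
# virt-builder centos-8.0 --size 8G -o /var/lib/libvirt/images/centos-8.0.img
# ls -lh /var/lib/libvirt/images/centos-8.0.img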
2. A logical next step is to do virt-install—so, here we go:
Figure 8.26 – New VM configured, deployed, and added to our local KVM hypervisor
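A sketch of that virt-install command, importing the existing disk instead of running an installation (the name, sizing, and network are assumptions):
# virt-install --name centos8 --memory 2048 --vcpus 2 \
  --disk path=/var/lib/libvirt/images/centos-8.0.img,format=raw \
  --import --os-variant centos8 --network network=default --noautoconsole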
3. If this seems cool to you, let's expand on that. Let's say that we want to take
a virt-builder image, add a yum package group called Virtualization
Host to that image, and, while we're at it, add the root's SSH key. This is what
we'd do:
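A hedged sketch of such a command; the output path, size, and SSH key location are assumptions, and --selinux-relabel is added because we modify files inside the image:
# virt-builder centos-8.0 --size 10G -o /var/lib/libvirt/images/centos8-virthost.img \
  --run-command 'dnf -y groupinstall "Virtualization Host"' \
  --ssh-inject root:file:/root/.ssh/id_rsa.pub \
  --selinux-relabel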
In all reality, this is really, really cool—it makes our life much easier, does quite a bit of
work for us, and does it in a pretty simple way, and it works with Microsoft Windows
operating systems as well. Also, we can use custom virt-builder repositories to
download specific VMs that are tailored to our own needs, as we're going to learn next.
virt-builder repositories
Obviously, there are some pre-defined virt-builder repositories (http://libguestfs.org/
is one of them), but we can also create our own. If we go to
the /etc/virt-builder/repos.d directory, we'll see a couple of files there
(libguestfs.conf and its key, and so on). We can easily create our own additional
configuration file that will reflect our local or remote virt-builder repository. Let's
say that we want to create a local virt-builder repository. Let's create a config file
called local.conf in the /etc/virt-builder/repos.d directory, with the
following content:
[local]
uri=file:///root/virt-builder/index
Then, copy or move an image to the /root/virt-builder directory (we will use our
centos-8.0.img file created in the previous step, which we will convert to xz format
by using the xz command), and create a file called index in that directory, with the
following content:
[Packt01]
name=PacktCentOS8
osinfo=centos8.0
arch=x86_64
file=centos-8.0.img.xz
checksum=ccb4d840f5eb77d7d0ffbc4241fbf4d21fcc1acdd3679
c13174194810b17dc472566f6a29dba3a8992c1958b4698b6197e6a1689882
b67c1bc4d7de6738e947f
format=raw
size=8589934592
compressed_size=1220175252
notes=CentOS8 with KVM and SSH
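After that, listing the available templates should show our image alongside the public ones:
# virt-builder --list | more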
You can clearly see that our Packt01 image is at the top of our list, and we can easily
use it to deploy new VMs. By using additional repositories, we can greatly enhance our
workflow and reuse our existing VMs and templates to deploy as many VMs as we want
to. Imagine what this, combined with virt-builder's customization options, does for
cloud services on OpenStack, Amazon Web Services (AWS), and so on.
The next topic on our list is related to snapshots, a hugely valuable and misused VM
concept. Sometimes, you have concepts in IT that can be equally good and bad, and
snapshots are the usual suspect in that regard. Let's explain what snapshots are all about.
Snapshots
A VM snapshot is a file-based representation of the system state at a particular point in
time. The snapshot includes configuration and disk data. With a snapshot, you can revert
a VM to a point in time, which means by taking a snapshot of a VM, you preserve its state
and can easily revert to it in the future if needed.
Snapshots have many use cases, such as saving a VM's state before a potentially
destructive operation. For example, suppose you want to make some changes on your
existing web server VM, which is running fine at the moment, but you are not certain if
the changes you are planning to make are going to work or will break something. In that
case, you can take a snapshot of the VM before doing the intended configuration changes,
and if something goes wrong, you can easily revert to the previous working state of the
VM by restoring the snapshot.
libvirt supports taking live snapshots. You can take a snapshot of a VM while the guest
is running. However, if there are any input/output (I/O)-intensive applications running
on the VM, it is recommended to shut down or suspend the guest first to guarantee a
clean snapshot.
There are mainly two classes of snapshots for libvirt guests: internal and external; each
has its own benefits and limitations, as detailed here:
In this section, you'll learn how to create, delete, and restore internal snapshots (offline/
online) for a VM. You'll also learn how to use virt-manager to manage internal
snapshots.
Internal snapshots work only with qcow2 disk images, so first make sure that the VM
for which you want to take a snapshot uses the qcow2 format for the base disk image. If
not, convert it to qcow2 format using the qemu-img command. An internal snapshot is
a combination of disk snapshots and the VM memory state—it's a kind of checkpoint to
which you can revert easily when needed.
I am using a LAMP01 VM here as an example to demonstrate internal snapshots. The
LAMP01 VM is residing on a local filesystem-backed storage pool and has a qcow2 image
acting as a virtual disk. The following command lists the snapshot associated with the VM:
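That command is the following:
# virsh snapshot-list LAMP01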
As can be seen, currently there are no existing snapshots associated with the VM; the
virsh snapshot-list LAMP01 command lists all of the available snapshots for
the given VM. The default information includes the snapshot name, creation time, and
domain state. There is a lot of other snapshot-related information that can be listed by
passing additional options to the snapshot-list command.
The following is a simple example of creating a snapshot. Running the following command
will create an internal snapshot for the LAMP01 VM:
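In its simplest form:
# virsh snapshot-create LAMP01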
By default, a newly created snapshot gets a unique number as its name. To create a
snapshot with a custom name and description, use the snapshot-create-as
command. The difference between these two commands is that the latter one allows
configuration parameters to be passed as an argument, whereas the former one does not.
It only accepts XML files as the input. We are using snapshot-create-as in this
chapter as it's more convenient and easy to use.
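For example (the snapshot name and description are arbitrary):
# virsh snapshot-create-as LAMP01 Snapshot1 "First snapshot" --atomic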
With the --atomic option specified, libvirt will make sure that no changes happen
if the snapshot operation is successful or fails. It's always recommended to use the
--atomic option to avoid any corruption while taking the snapshot. Now, check the
snapshot-list output here:
Our first snapshot is ready to use and we can now use it to revert the VM's state if
something goes wrong in the future. This snapshot was taken while the VM was in a
running state. The time to complete snapshot creation depends on how much memory
the VM has and how actively the guest is modifying that memory at the time.
Note that the VM goes into paused mode while snapshot creation is in progress; therefore,
it is always recommended you take the snapshot while the VM is not running. Taking a
snapshot from a guest that is shut down ensures data integrity.
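To see how multiple snapshots relate to each other, we can list them together with their parents:
# virsh snapshot-list LAMP01 --parent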
Here, we used the --parent switch, which prints the parent-children relation of
snapshots. The first snapshot's parent is (null), which means it was created directly on
the disk image, and Snapshot1 is the parent of Snapshot2 and Snapshot2 is the
parent of Snapshot3. This helps us know the sequence of snapshots. A tree-like view
of snapshots can also be obtained using the --tree option, as follows:
Now, check the state column, which tells us whether the particular snapshot is live
or offline. In the preceding example, the first and second snapshots were taken while the
VM was running, whereas the third was taken when the VM was shut down.
Restoring to a shutoff-state snapshot will cause the VM to shut down. You can also use
the qemu-img command utility to get more information about internal snapshots—for
example, the snapshot size, snapshot tag, and so on. In the following example output, you
can see that the disk named LAMP01.qcow2 has three snapshots with different tags.
This also shows you when a particular snapshot was taken, with its date and time:
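For example (the image path is an assumption):
# qemu-img info /var/lib/libvirt/images/LAMP01.qcow2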
This can also be used to check the integrity of the qcow2 image using the check switch,
as follows:
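For example:
# qemu-img check /var/lib/libvirt/images/LAMP01.qcow2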
If any corruption occurred in the image, the preceding command will throw an error. A
backup from the VM should be immediately taken as soon as an error is detected in the
qcow2 image.
If you are reverting to a shutdown snapshot, then you will have to start the VM manually.
Use the --running switch with virsh snapshot-revert to get it started
automatically.
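For example, reverting to an offline snapshot and starting the VM right away might look like this; the snapshot names are the ones shown in the listing that follows:
# virsh snapshot-revert LAMP01 --snapshotname Snapshot4 --running
# virsh snapshot-list LAMP01
Name Creation Time State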
------------------------------------------------------
Snapshot1 2020-02-05 09:00:13 +0230 running
Snapshot3 2020-02-05 09:00:43 +0230 running
Snapshot4 2020-02-05 10:17:00 +0230 shutoff
Let's now check how to do these procedures by using virt-manager, our GUI utility
for VM management.
Then, if we want to take a snapshot, just use the + button, which will open a simple wizard
so that we can give the snapshot a name and description, as illustrated in the following
screenshot:
If something goes wrong, you can simply discard the overlay_image image and you
are back to the original state.
With external disk snapshots, the backing_file image can be any disk image (raw,
qcow, even VMDK), unlike internal snapshots, which only support the qcow2 image format.
1. Check that the VM for which you want to take the snapshot is running, as follows:
# virsh list
Id Name State
------------------------------------
4 WS2019SQL-Template running
You can take an external snapshot while a VM is running or when it is shut down.
Both live and offline snapshot methods are supported.
2. Create a VM snapshot via virsh, as follows:
# virsh snapshot-create-as WS2019SQL-Template snapshot1
"My First Snapshot" --disk-only --atomic
The --disk-only parameter creates a disk snapshot, while --atomic is used for
integrity and to avoid any possible corruption.
3. Now, check the snapshot-list output, as follows:
# virsh snapshot-list WS2019SQL-Template
Name Creation Time State
--------------------------------------------------------
--
snapshot1 2020-02-10 10:21:38 +0230 disk-snapshot
4. Now, the snapshot has been taken, but it is only a snapshot of the disk's state; the
contents of memory have not been stored, as the following output confirms:
# virsh snapshot-info WS2019SQL-Template snapshot1
Name: snapshot1
Domain: WS2019SQL-Template
Current: no
State: disk-snapshot
Location: external <<
Parent: -
Children: 1
Descendants: 1
Metadata: yes
5. Now, list all the block devices associated with the VM once again, as follows:
# virsh domblklist WS2019SQL-Template
Target Source
------------------------------------
vda /var/lib/libvirt/images/WS2019SQL-Template.snapshot1
Notice that the source got changed after taking the snapshot. Let's gather some
more information about this new image /var/lib/libvirt/images/
WS2019SQL-Template.snapshot1 snapshot, as follows:
# qemu-img info /var/lib/libvirt/images/WS2019SQL-
Template.snapshot1
image: /var/lib/libvirt/images/WS2019SQL-Template.
snapshot1
file format: qcow2
virtual size: 19G (20401094656 bytes)
disk size: 1.6M
cluster_size: 65536
backing file: /var/lib/libvirt/images/WS2019SQL-Template.
img
backing file format: raw
Important note
/var/lib/libvirt/images/WS2019SQL-Template.img is the
backing file (original disk).
/var/lib/libvirt/images/WS2019SQL-Template.
snapshot1 is the newly created overlay image, where all the writes are now
happening.
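Next, we create a second external snapshot, this time telling libvirt exactly where to store the overlay; the description text is an assumption, while the path matches the output below:
# virsh snapshot-create-as WS2019SQL-Template snapshot2 "Second snapshot" --disk-only \
  --diskspec vda,snapshot=external,file=/snapshot_store/WS2019SQL-Template.snapshot2 --atomic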
# virsh domblklist WS2019SQL-Template --details
Type Device Target Source
------------------------------------------------
file disk vda /snapshot_store/WS2019SQL-Template.snapshot2
Here, we used the --diskspec option to create the snapshot in the desired location. The
option needs to be formatted in the disk[,snapshot=type][,driver=type]
[,file=name] format. This is what the parameters we used signify: vda is the target disk,
snapshot=external selects an external snapshot, and file= points to the location of
the new overlay image.
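The third snapshot is created in the same way, with one extra option (again, the description is an assumption):
# virsh snapshot-create-as WS2019SQL-Template snapshot3 "Third snapshot" --disk-only \
  --diskspec vda,snapshot=external,file=/snapshot_store/WS2019SQL-Template.snapshot3 --quiesce --atomic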
Notice that this time, I added one more option: --quiesce. Let's discuss this in the
next section.
What is quiesce?
Quiesce is a filesystem freeze (fsfreeze/fsthaw) mechanism. This puts the guest
filesystems into a consistent state. If this step is not taken, anything waiting to be written
to disk will not be included in the snapshot. Also, any changes made during the snapshot
process may corrupt the image. To work around this, the qemu-guest-agent needs to be
installed on—and running inside—the guest; otherwise, snapshot creation with the
--quiesce option will fail with an error.
Always use this option to be on the safe side while taking a snapshot. Guest tool
installation is covered in Chapter 5, Libvirt Storage; you might want to revisit this
and install the guest agent in your VM if it's not already installed.
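For reference, requesting a quiesced, disk-only external snapshot only requires appending --quiesce to the command we used earlier. A sketch of what the command for snapshot3 most likely looked like (the snapshot name and target path follow the ones used in this example, so treat them as assumptions):
# virsh snapshot-create-as WS2019SQL-Template snapshot3 --disk-only --atomic --quiesce --diskspec vda,snapshot=external,file=/snapshot_store/WS2019SQL-Template.snapshot3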
We have created three snapshots so far. Let's see how they are connected with each other
to understand how an external snapshot chain is formed, as follows:
1. List all the snapshots associated with the VM, like this:
# virsh snapshot-list WS2019SQL-Template
Name Creation Time State
----------------------------------------------------------
snapshot1 2020-02-10 10:21:38 +0230 disk-snapshot
snapshot2 2020-02-10 11:51:04 +0230 disk-snapshot
snapshot3 2020-02-10 11:55:23 +0230 disk-snapshot
2. Check which is the current active (read/write) disk/snapshot for the VM by running
the following code:
# virsh domblklist WS2019SQL-Template
Target Source
------------------------------------------------
vda /snapshot_store/WS2019SQL-Template.snapshot3
3. You can enumerate the backing file chain of the current active (read/write) snapshot using the --backing-chain option provided with qemu-img. --backing-chain will show us the whole tree of parent-child relationships in a disk image chain. Refer to the following code snippet for a further description:
# qemu-img info --backing-chain /snapshot_store/WS2019SQL-Template.snapshot3 | grep backing
backing file: /snapshot_store/WS2019SQL-Template.snapshot2
backing file format: qcow2
backing file: /var/lib/libvirt/images/WS2019SQL-Template.snapshot1
backing file format: qcow2
From the preceding details, we can see that the chain is formed in the following manner: WS2019SQL-Template.img (base, raw) <- snapshot1 <- snapshot2 <- snapshot3 (current active overlay).
Figure 8.33 – All snapshots made from virt-manager and libvirt commands
without additional options are internal snapshots
Does that mean that, once an external disk snapshot is taken for a VM, there is no way to
revert to that snapshot? No—it's not like that; you can definitely revert to a snapshot but
there is no libvirt support to accomplish this. You will have to revert manually by
manipulating the domain XML file.
Suppose you want to revert to snapshot2. The solution is to shut down the VM (yes—a
shutdown/power-off is mandatory) and edit its XML file to point to the snapshot2 disk
image as the boot image, as follows:
1. Locate the disk image associated with snapshot2. We need the absolute path
of the image. You can simply look into the storage pool and get the path, but the
best option is to check the snapshot XML file. How? Get help from the virsh
command, as follows:
# virsh snapshot-dumpxml WS2019SQL-Template --snapshotname snapshot2 | grep source
The -r switch with qemu-img tries to repair any inconsistencies that are found during the check. -r leaks repairs only cluster leaks, whereas -r all fixes all kinds of errors, with a higher risk of choosing the wrong fix or hiding corruption that has already occurred.
Let's check the information about this snapshot, as follows:
# qemu-img info /snapshot_store/WS2019SQL-Template.snapshot2 | grep backing
backing file: /var/lib/libvirt/images/WS2019SQL-Template.snapshot1
backing file format: qcow2
7. It is time to manipulate the XML file. You can remove the currently attached disk from the VM and add /snapshot_store/WS2019SQL-Template.snapshot2. Alternatively, edit the VM's XML file by hand and modify the disk path. One of the better options is to use the virt-xml command, as follows:
# virt-xml WS2019SQL-Template --remove-device --disk target=vda
# virt-xml WS2019SQL-Template --add-device --disk /snapshot_store/WS2019SQL-Template.snapshot2,format=qcow2,bus=virtio
There are many options to manipulate a VM XML file with the virt-xml
command. Refer to its man page to get acquainted with it. It can also be used
in scripts.
8. Start the VM, and you are back to the state when snapshot2 was taken. Similarly,
you can revert to snapshot1 or the base image when required.
The next topic on our list is deleting external disk snapshots, which—as we mentioned—is a bit complicated. Let's check how we can do that next.
• blockcommit: Merges data with the base layer. Using this merging mechanism,
you can merge overlay images into backing files. This is the fastest method of
snapshot merging because overlay images are likely to be smaller than backing
images.
• blockpull: Merges data toward the active layer. Using this merging mechanism,
you can merge data from backing_file to overlay images. The resulting file will
always be in qcow2 format.
Next, we are going to read about merging external snapshots using blockcommit.
3. Time to merge all the snapshot images into the base image, like this:
# virsh blockcommit VM1 hda --verbose --pivot --active
Block Commit: [100 %]
Successfully pivoted
4. Now, check the current active block device in use:
# virsh domblklist VM1
Target Source
--------------------------
hda /var/lib/libvirt/images/vm1.img
Notice that now, the current active block device is the base image and all writes are
switched to it, which means we successfully merged the snapshot images into the base
image. But the snapshot-list output in the following code snippet shows that there
are still snapshots associated with the VM:
If you want to get rid of this, you will need to remove the appropriate metadata and delete
the snapshot images. As mentioned earlier, libvirt does not have complete support for
external snapshots. Currently, it can just merge the images, but no support is available for
automatically removing snapshot metadata and overlay image files. This has to be done
manually. To remove snapshot metadata, run the following code:
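A sketch of this, assuming the snapshot is named snapshot1 (check virsh snapshot-list VM1 for the actual names in your environment); the --metadata flag removes only libvirt's record of the snapshot, leaving the overlay image files on disk for you to delete manually:
# virsh snapshot-delete VM1 snapshot1 --metadata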
In this example, we learned how to merge external snapshots by using the blockcommit method. Let's learn how to merge external snapshots by using the blockpull method next.
3. Merge the base image into the snapshot image (base to overlay image merging),
like this:
# virsh blockpull VM2 --path /var/lib/libvirt/images/vm2.snap1 --wait --verbose
Block Pull: [100 %]
Pull complete
We ran the merge and snapshot deletion tasks while the VM was in the running state,
without any downtime. blockcommit and blockpull can also be used to remove
a specific snapshot from the snapshot chain. See the man page for virsh to get more
information and try it yourself. You will also find some additional links in the Further
reading section of this chapter, so make sure that you go through them.
• When you take a VM snapshot, you are creating a new delta copy of the VM disk (a qcow2 or raw file), and then you are writing to that delta. So, the more data you write, the longer it's going to take to commit and consolidate it back into the parent. Yes—you will eventually need to commit snapshots, but it is not recommended that you go into production with a snapshot attached to the VM.
• Snapshots are not backups; they are just a picture of a state, taken at a specific point in time, to which you can revert when required. Therefore, do not rely on them as a direct backup process. For that, you should implement a backup infrastructure and strategy.
• Don't keep a VM with a snapshot associated with it for a long time. As soon as you verify that reverting to the state at the time a snapshot was taken is no longer required, merge and delete the snapshot immediately.
• Use external snapshots whenever possible. The chances of corruption are much
lower in external snapshots when compared to internal snapshots.
• Limit the snapshot count. Taking several snapshots in a row without any cleanup can hurt VM and host performance, as QEMU will have to traverse each image in the snapshot chain whenever it reads data that still lives in the base_image.
• Have the guest agent installed in the VM before taking snapshots. Certain operations in the snapshot process can be improved through support from within the guest.
• Always use the --quiesce and --atomic options while taking snapshots.
If you follow these best practices, we are comfortable recommending that you use snapshots. They will make your life much easier and give you a point in time that you can come back to, without the problems that careless snapshot use can bring.
Summary
In this chapter, you learned how to work with libguestfs utilities to modify VM
disks, create templates, and manage snapshots. We also looked into virt-builder and
various provisioning methodologies for our VMs, as these are some of the most common
scenarios used in the real world. We will learn even more about the concept of deploying
VMs in large numbers (hint: cloud services) in the next chapter, which is all about
cloud-init.
Questions
1. Why would we need to modify VM disks?
2. How can we convert a VM to KVM?
3. Why do we use VM templates?
4. How do we create a Linux-based template?
5. How do we create a Microsoft Windows-based template?
6. Which cloning mechanisms for deploying from template do you know of? What are
the differences between them?
7. Why do we use virt-builder?
Further reading
Please refer to the following links for more information regarding what was covered in
this chapter:
Section 3: Automation, Customization, and Orchestration for KVM VMs
In this part of the book, you will get a complete understanding of how to customize
KVM virtual machines by using cloud-init and cloudbase-init. This part also
covers how to leverage the automation capabilities of Ansible to manage and orchestrate
the KVM infrastructure.
This part of the book comprises the following chapters:
Customizing a virtual machine often seems simple enough – clone it from a template;
start; click a couple of Next buttons (or text tabs); create some users, passwords, and
groups; configure network settings... That might work for a virtual machine or two. But
what happens if we have to deploy two or three hundred virtual machines and configure
them? All of a sudden, we're faced with a mammoth task – and it's a task that will be
prone to errors if we do everything manually. We're wasting precious time while doing
that instead of configuring them in a much more streamlined, automated fashion. That's
where cloud-init comes in handy, as it can customize our virtual machines and install software on them, and it can do so on the first and on subsequent virtual machine boots. So, let's discuss
cloud-init and how it can bring value to your large-scale configuration nightmares.
In this chapter, we will cover the following topics:
Basically, Kickstart fills in the installation's configuration prompts with settings that we defined earlier. This means that we are still doing a full installation and creating a complete system from scratch every time we need to deploy a new virtual machine.
The main problem is that other distributions do not use Kickstart. There are similar systems that enable unattended installations: Debian and Ubuntu use a tool/system called preseed and are able to support Kickstart in some parts, SUSE uses AutoYaST, and there are even a couple of tools that offer some sort of cross-platform functionality. One of them, called Fully Automatic Installation (FAI), is able to automate the installation and even the online reconfiguration of different Linux distributions. But that still doesn't solve all of the
problems that we have. In a dynamic world of virtualization, the main goal is to deploy
as quickly as possible and to automate as much as possible, since we tend to use the same
agility when it comes to removing virtual machines from production.
Imagine this: you need to create a single application deployment to test your new
application with different Linux distributions. All of your future virtual machines will
need to have a unique identifier in the form of a hostname, a deployed SSH identity that
will enable remote management through Ansible, and of course, your application. Your
application has three dependencies – two in the form of packages that can be deployed
through Ansible, but one depends on the Linux distribution being used and has to be
tailored for that particular Linux distribution. To make things even more realistic, you expect that you will have to periodically repeat this test, and every time you will need to go through the whole deployment and configuration process again.
Another approach to this problem can be using a system like Ansible – we deploy all the
systems from virtual machine templates, and then do the customization from Ansible.
This is better, as Ansible is designed for a scenario just like this, but it means that we must first create virtual machine templates that are able to support Ansible deployment, with SSH keys in place and everything else that Ansible needs to function.
There is one problem neither of these approaches can solve, and that is the mass
deployment of machines. This is why a framework called cloud-init was designed.
Understanding cloud-init
We need to get a bit more technical in order to understand what cloud-init is and to
understand what its limitations are. Since we are talking about a way to fully automatically
reconfigure a system using simple configuration files, it means that some things need to be
prepared in advance to make this complex process user friendly.
We already mentioned virtual machine templates in Chapter 8, Creating and Modifying VM
Disks, Templates, and Snapshots. Here, we are talking about a specially configured template
that has all the elements needed to read, understand, and deploy the configuration that we
are going to provide in our files. This means that this particular image has to be prepared in advance, and preparing it is the most complicated part of the whole system.
Luckily, cloud-init images can be downloaded already pre-configured, and the only thing
that we need to know is which distribution we want to use. All the distributions we have
mentioned throughout this book (CentOS 7 or 8, Debian, Ubuntu, and Red Hat Enterprise
Linux 7 and 8) have images we can use. Some of them even have different versions of the
base operating system available, so we can use those if we need to. Be aware that there may
be differences between installed versions of cloud-init, especially on older images.
Why is this image important? Because it is prepared so that it can detect the cloud system it is running under, determine whether cloud-init should be used or disabled, and, after that, read and apply the configuration of the system itself.
Cloud-init's boot process is organized into several stages:
• The generator is the first one, and the simplest one: it will determine whether we are
even trying to run cloud-init, and based on that, whether it should enable or disable
the processing of data files. Cloud-init will not run if there are kernel command-line
directives to disable it, or if a file called /etc/cloud/cloud-init.disabled exists. For more information on this and all the other things in this chapter, please read the documentation (start at https://cloudinit.readthedocs.io/en/latest/topics/boot.html) since it contains much more detail about
switches and different options that cloud-init supports and that make it tick.
• The local phase tries to find the data that we included for the boot itself, and then
it tries to create a running network configuration. This is a relatively simple task
performed by a systemd service called cloud-init-local.service, which
will run as soon as possible and will block the network until it's done. The concept
of blocking services and targets is used a lot in cloud-init initialization; the reason is simple – to ensure system stability. Since cloud-init procedures modify a lot of core settings for a system, we cannot afford to let the usual startup scripts run and create a parallel configuration that could override the one created by cloud-init.
• The network phase is the next one, and it uses a separate service called
cloud-init.service. This is the main service that will bring up the previously
configured network and try to configure everything we scheduled in the data files.
This will typically include grabbing all the files specified in our configuration,
extracting them, and executing other preparation tasks. Disks will also be formatted
and partitioned in this stage if such a configuration change is specified. Mount
points will also get created, including those that are dynamic and specific to a
particular cloud platform.
• The config stage follows, and it will configure the rest of the system, applying
different parts of our configuration. It uses cloud-init modules to further configure
our template. Now that the network is configured, it can be used to add repositories
(the yum_repos or apt modules), add an SSH key (the ssh-import-id
module), and perform similar tasks in preparation for the next phase, in which
we can actually use the configuration done in this phase.
• The final stage is the part of the system boot that runs things that would probably belong in userland – installing packages, deploying configuration management plugins, and executing any user scripts.
After all this has been done, the system will be completely configured and up and running.
The main advantage of this approach, although it seems complicated, is to have only one
image stored in the cloud, and then to create simple configuration files that will only
cover the differences between the vanilla default configuration, and the one that we need.
Images can also be relatively small since they do not contain too many packages geared
toward an end user.
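If you want to see these stages on a booted guest, a quick way (assuming a systemd-based distribution and a reasonably recent cloud-init version; unit names can vary slightly between distributions) is to query the stage services and the cloud-init CLI itself:
# cloud-init status --long
# systemctl status cloud-init-local.service cloud-init.service cloud-config.service cloud-final.service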
Cloud-init is often used as the first stage in deploying a lot of machines that are going to
be managed by orchestration systems such as Puppet or Ansible since it provides a way to
create working configurations that include ways of connecting to each instance separately.
Every stage uses YAML as its primary data syntax, and almost everything is simply a
list of different options and variables that get translated into configuration information.
Since we are configuring a system, we can also include almost any other type of file in the
configuration – once we can run a shell script while configuring the system, everything
is possible.
Why is all of this so important?
Cloud-init stems from a simple idea: create a single template that will define the base
content of the operating system you plan to use. Then, we create a separate, specially
formatted data file that will hold the customization data, and then combine those two
at runtime to create a new instance when you need one. You can even improve things a
bit by using a template as a base image and then create different systems as differencing
images. Trading a little up-front preparation for speed in this way can mean deploying in minutes instead of hours.
Cloud-init supports the following distributions:
• Ubuntu
• SLES/openSUSE
• RHEL/CentOS
• Fedora
• Gentoo Linux
• Debian
• Arch Linux
• FreeBSD
We enumerated all the distributions, but cloud-init, as its name suggests, is also
cloud-aware, which means that cloud-init is able to automatically detect and use almost
any cloud environment. Running any distribution on any hardware or cloud is always
a possibility, even without something like cloud-init, but since the idea is to create a
platform-independent configuration that will be deployable on any cloud without any
reconfiguration, our system needs to automatically account for any differences between
different cloud infrastructures. On top of that, cloud-init can be used for bare-metal
deployment, even if it isn't specifically designed for it, or to be more precise, even if it
is designed for a lot more than that.
Important note
Being cloud-aware means that cloud-init gives us tools to do post-deployment
checks and configuration changes, another extremely useful option.
This all sounds a lot more theoretical than it actually is. In practice, once you start
using cloud-init and learn how to configure it, you will start to create a virtual machine
infrastructure that will be almost completely independent of the cloud infrastructure you
are using. In this book, we are using KVM as the main virtualization infrastructure, but
cloud-init works with any other cloud environment, usually without any modification.
Cloud-init was initially designed to enable easy deployment on Amazon AWS but it has
long since transcended that limitation.
Also, cloud-init is aware of all the small differences between different distributions, so all
the things you put in your configuration file will be translated into whatever a particular
distribution uses to accomplish a particular task. In that sense, cloud-init behaves a lot like
Ansible – in essence, you define what needs to be done, not how to do it, and cloud-init
takes that and makes it happen.
Once a system has booted, there is a large amount of data created by the system about how the boot was
done, what actual cloud configuration the system is running on, and what was done in
regard to customization. Any of your applications and scripts can then rely on this data
and use it to run and detect certain configuration and deployment parameters. Check
out this example, taken from a virtual machine in Microsoft Azure, running Ubuntu:
Cloud-init images
In order to use cloud-init at boot time, we first need a cloud image. At its core, it is
basically a semi-installed system that contains specially designed scripts that support
cloud-init installation. On all distributions, these scripts are part of a package called
cloud-init, but the images usually contain more than just that package, since they try to strike a fine balance between size and convenience of installation.
In our examples, we are going to use the ones available at the following URLs:
• https://cloud.centos.org/
• https://cloud-images.ubuntu.com/
In all the examples we are going to work with, the main intention is to show how the system works on two completely different distributions with minimal to no modifications.
Under normal circumstances, getting the image is all you need to be able to run
cloud-init. Everything else is handled by the data files.
For example, these are some of the available images for the CentOS distribution:
Notice that images cover almost all of the releases of the distribution, so we can simply
test our systems not only on the latest version but on all the other versions available. We
can freely use all of these images, which is exactly what we are going to do a bit later
when we start with our examples.
A good example of this is the hostname, which is part of the metadata that will end
up as the actual hostname on the individual machine. This file is populated by cloud-init
and can be accessed through the command line or directly.
When creating any file in the configuration, we can use any file format available, and we
are able to compress the files if needed – cloud-init will decompress them before it runs. If
we need to pass the actual files into the configuration, there is a limitation though – files
need to be encoded as text and put into variables in a YAML file, to be used and written
later on the system we are configuring. Just like cloud-init, YAML syntax is declarative –
this is an important thing to remember.
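As a small, hypothetical illustration of this (the path and content are invented for the example), passing a file as an encoded string in YAML is done with the write_files module:
#cloud-config
write_files:
  - path: /etc/example/example.conf
    permissions: '0644'
    encoding: b64
    # "Hello from cloud-init" encoded as base64
    content: SGVsbG8gZnJvbSBjbG91ZC1pbml0Cg==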
Now, let's learn how we pass metadata and user data to cloud-init.
The most complicated part is defining how and what we need to configure when booting.
All of this is accomplished on a machine that is running the cloud-utils package for a
given distribution.
At this point, we need to say something about the two different packages that are used across the distributions to enable cloud-init support:
• cloud-init
• cloud-utils
The main difference between these packages is the computer we are installing them on.
cloud-init is to be installed on the computer we are configuring and is part of the
deployment image. cloud-utils is the package intended to be used on the computer
that will create the configuration.
In all the examples and all the configuration steps in this chapter, we are in fact referring to two different computers/servers: the one that will eventually be deployed and configured by cloud-init, and the one that we use as a workstation. Unless we state otherwise, the computer we are using in this chapter is the workstation, the one we use to create the configuration for cloud-init deployment. It is not the computer that is going to be configured with this configuration; it is just the machine on which we prepare our files.
In this simplified environment, this is the same computer that runs the entire KVM
virtualization and is used both to create and deploy virtual machines. In a normal setup,
we would probably create our configuration on a workstation that we work on and deploy
to some kind of KVM-based host or cluster. In that case, every step that we present in this
chapter basically remains the same; the only difference is the place that we deploy to, and
the way that the virtual machine is invoked for the first boot.
We will also note that some virtualization environments, such as OpenStack, oVirt, or
RHEV-M, have direct ways to communicate with a cloud-init enabled template. Some of
them even permit you to directly reconfigure the machine on first boot from a GUI, but
that falls way out of the scope of this book.
The next topic on our list is cloud-init modules. Cloud-init uses modules for a reason – to extend the range of actions it can take during the virtual machine boot phase. There are dozens of cloud-init modules available – SSH, yum, apt, setting the
hostname, password, locale, and creating users and groups, to name a few. Let's
check how we can use them.
Cloud-init has to translate universal configuration demands, such as 'this package needs to be installed', into actual shell commands on a particular system. The way this is done is through modules. Modules are
logical units that break down different functionalities into smaller groups and enable us
to use different commands. You can check the list of all available modules at the following
link: https://cloudinit.readthedocs.io/en/latest/topics/modules.
html. It's quite a list, which will just further show you how well developed cloud-init is.
As we can see from the list, some of the modules, such as Disk setup or Locale, are completely platform-independent, while some, such as Puppet, are designed to be used with a specific software solution and its configuration, and some are specific to a particular distribution or a group of distributions, such as Yum Add Repo or Apt Configure.
This can seem to break the idea of a completely distribution-agnostic way to deploy
everything, but you must remember two things – cloud-init is first and foremost
cloud-agnostic, not distribution-agnostic, and distributions sometimes have things that
are way too different to be solved with any simple solution. So, instead of trying to be
everything at once, cloud-init solves enough problems to be useful, and at the same time
tries not to create new ones.
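As a short, hedged illustration of how these module names map to user-data keys (the repository URL below is made up), a fragment might mix platform-independent modules with a distribution-specific one:
#cloud-config
# Locale and Timezone modules - platform-independent
locale: en_US.UTF-8
timezone: Europe/Zagreb
# Yum Add Repo module - only meaningful on yum/dnf-based distributions
yum_repos:
  example-repo:
    name: Example repository
    baseurl: http://repo.example.com/el/8/
    enabled: true
    gpgcheck: false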
Important note
We are not going to deal with particular modules one by one since it would
make this chapter too long and possibly turn it into a book on its own. If you
plan on working with cloud-init, consult the module documentation since it
will provide all the up-to-date information you need.
Note that the actual size on the disk is different – qemu-img gives us 679 MB and 2.2 GB
versus roughly 330 MB and 680 MB of actual disk usage:
Figure 9.4 – Image size via qemu-img differs from the real virtual image size
We can now do a couple of everyday administration tasks on these images – grow
them, move them to the correct directory for KVM, use them as a base image, and
then customize them via cloud-init:
1. Let's make these images bigger, just so that we can have them ready for future
capacity needs (and practice):
Figure 9.5 – Growing the Ubuntu and CentOS maximum image size to 10 GB via qemu-img
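For reference, the figure boils down to two qemu-img resize calls; the image filenames here are placeholders for whatever you downloaded:
# qemu-img resize centos-cloud-image.qcow2 10G
# qemu-img resize ubuntu-cloud-image.img 10G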
After growing our images, note that the size on the disk hasn't changed much:
Figure 9.6 – The real disk usage has changed only slightly
The next step is to prepare our environment for the cloud-image procedure so that
we can enable cloud-init to do its magic.
2. The images that we are going to use are going to be stored in /var/lib/
libvirt/images:
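Alongside the images, cloud-init's NoCloud data source reads a small file called meta-data. A minimal version of it (the hostname value is an assumption, chosen to match the deploy-1 name used for the ISO later on) is just one line:
local-hostname: deploy-1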
This file is all it takes to name the machine the way we want and is written in a
normal YAML notation. We do not need anything else, so this file essentially
becomes a one-liner. Then we need an SSH key pair and we need to get it into the
configuration. We need to create a file called user-data that will look like this:
#cloud-config
users:
  - name: cloud
    ssh-authorized-keys:
      - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCZh66Gf1lNuMeenGywifUSW1T16uKW0IXnucNwoIynhymSm1fkTCqyxLkImWbyd/tDFkbgTlei3qa245Xwt//5ny2fGitcSa7jWvkKvTLiPvxLP0CvcvGR4aiV/2TuxA1em3JweqpNppyuapH7u9q0SdxaG2gh3uViYl/+8uuzJLJJbxb/a8EK+szpdZq7bpLOvigOTgMan+LGNlsZc6lqEVDlj40tG3YNtk5lxfKBLxwLpFq7JPfAv8DTMcdYqqqc5PhRnnKLakSUQ6OW0nv4fpa0MKuha1nrO72Zyur7FRf9XFvD+Uc7ABNpeyUTZVIj2dr5hjjFTPfZWUC96FEh [email protected]
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    groups: users
    shell: /bin/bash
runcmd:
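  # The following entries are an assumption based on the explanation later in
  # this section: allow the 'cloud' user in the sshd configuration and restart
  # the SSH service (adjust the service name for your distribution).
  - echo "AllowUsers cloud" >> /etc/ssh/sshd_config
  - systemctl restart sshd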
Note that the file must follow the way YAML defines everything including the
variables. Pay attention to the spaces and newlines, as the biggest problems with
deployment come from misplaced newlines in the configuration.
There is a lot to parse here. We are creating a user that uses the username cloud.
This user will not be able to log in using a password since we are not creating one,
but we will enable login using SSH keys associated with the local root account,
which we will create by using the ssh-keygen command. This is just an example
SSH key, and the SSH key that you're going to use will likely be different. So, as root, go
through the following procedure:
Figure 9.10 – SSH keygen procedure done, SSH keys are present and accounted for
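The figure shows a standard key-generation run; a minimal equivalent (accepting the default file location and an empty passphrase when prompted) is simply:
# ssh-keygen -t rsa -b 2048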
Keys are stored in the local .ssh directory, so we just need to copy them. When
we are doing cloud deployments, we usually use this method of authentication, but
cloud-init enables us to define any method of user authentication. It all depends
on what we are trying to do and whether there are security policies in place that
enforce one authentication method over another.
In cloud environments, we will rarely define users that are able to log in
with a password, but for example, if we are deploying bare-metal machines for
workstations, we will probably create users that use normal passwords. When we
create a configuration file like this, it is standard practice to use hashes of passwords
instead of literal cleartext passwords. The directive you are looking for is probably
passwd: followed by a string containing the hash of a password.
Next, we configured sudo. Our user needs to have root permissions since there are
no other users defined for this machine. This means they need to be a member of
the sudo group and have to have the right permissions defined in the sudoers
file. Since this is a common setting, we only need to declare the variables, and
cloud-init is going to put the settings in the right files. We will also define a
user shell.
In this file, we can also define all the other users' settings available on Linux, a
feature that is intended to help deploy user computers. If you need any of those
features, check the documentation available here: https://cloudinit.readthedocs.io/en/latest/topics/modules.html#users-and-groups. All the extended user information fields are supported.
The last thing we are doing is using the runcmd directive to define what will happen after the installation finishes, in the last stage. In order to permit the user to log in, we need to put them on the list of allowed users in the sshd configuration, and we need to restart the service.
Now we are ready for our first deployment.
5. We have three files in our directory: a hard disk that uses a base file with the cloud
template, a meta-data file that contains just minimal information that is essential
for our deployment, and user-data, which contains our definitions for our user.
We didn't even try to install or copy anything; this install is as minimal as it gets, but
in a normal environment this is a regular starting point, as a lot of deployments are
intended only to bring our machine online, and then do the rest of the installation
by using other tools. Let's move to the next step.
We need a way to connect the files we just created, the configuration, with the
virtual machine. Usually, this is done in a couple of ways. The simplest way is
usually to generate a .iso file that contains the files. Then we just mount the file as
a virtual CD-ROM when we create the machine. On boot, cloud-init will look for
the files automatically.
Another way is to host the files somewhere on the network and grab them when we
need them. It is also possible to combine these two strategies. We will discuss this
a little bit later, but let's finish our deployment first. The local .iso image is the
way we are going to go on this deployment. There is a tool called genisoimage
(provided by the package with the same name) that is extremely useful for this (the
following command is a one-line command):
genisoimage -output deploy-1-cidata.iso -volid cidata -joliet -rock user-data meta-data
What we are doing here is creating an emulated CD-ROM image that will follow the
ISO9660/Joliet standard with Rock Ridge extensions. If you have no idea what we
just said, ignore all this and think about it this way – we are creating a file that will
hold our metadata and user data and present itself as a CD-ROM:
Figure 9.12 – ISO is created and we are ready to start a cloud-init deployment
Please note that images are taken post deployment, so the size of disk can vary
wildly based on your configuration. This was all that was needed in the form of
preparations. All that's left is to spin up our virtual machine.
Now, let's start with our deployments.
Although it may look complicated, if you came to this part of the book after reading its
previous chapters, there should be nothing you haven't seen yet. We are using KVM,
creating a name for our domain (virtual machine), we are going to give it 1 CPU and
2 GB of RAM. We are also telling KVM we are installing a generic Linux system. We
already created our hard disk, so we are mounting it as our primary drive, and we are
also mounting our .iso file to serve as a CD-ROM. Lastly, we will connect our virtual
machine to the default network:
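A hedged sketch of that virt-install call is shown here; the VM name, disk paths, and OS variant are assumptions, so adjust them to your environment:
# virt-install --connect qemu:///system \
  --name deploy-1 --memory 2048 --vcpus 1 \
  --os-variant generic --import --noautoconsole \
  --disk path=/var/lib/libvirt/images/deploy-1.qcow2,device=disk,bus=virtio \
  --disk path=/var/lib/libvirt/images/deploy-1-cidata.iso,device=cdrom \
  --network network=default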
Figure 9.14 – The cloud-init.log file, used to check what cloud-init did to the operating system
Another thing is how much actually happens below the surface completely automatically.
Since this is CentOS, cloud-init has to deal with the SELinux security contexts in real
time, so a lot of the information is simply that. There are also a lot of probes and tests
going on. Cloud-init has to establish what the running environment is and what type of
cloud it is running under. If something happens during the boot process and it in any way
involves cloud-init, this is the first place to look.
Let's now deploy our second virtual machine by using a second (Ubuntu) image. This is
where cloud-init really shines – it works with various Linux (and *BSD) distributions,
whatever they might be. We can put that to the test now.
Figure 9.15 – Preparing our environment for another cloud-init-based virtual machine deployment
What do we need to do? We need to copy both meta-data and user-data to the
new folder. We need to edit the metadata file since it has the hostname inside it, and we
want our new machine to have a different hostname. As for user-data, it is going to be
completely the same as on our first virtual machine. Then we need to create a new disk
and resize it:
Figure 9.16 – Growing our virtual machine image for deployment purposes
We are creating a virtual machine from our downloaded image, and just allowing for more
space as the image is run. The last step is to start the machine:
The command line is almost exactly the same, only the names change:
Figure 9.19 – Using SSH to verify whether we can connect to our virtual machine
As we can see, the connection to our virtual machine works without any problems.
One more thing is to check the deployment log. Note that there is no mention of
configuring SELinux since we are running on Ubuntu:
Figure 9.20 – The Ubuntu cloud-init log file has no mention of SELinux
Just for fun, let's do another deployment with a twist – let's use a module to deploy a
software package.
This time we are adding another section (packages) to the configuration, so that we can
tell cloud-init that we need a package to be installed (httpd):
Figure 9.21 – Cloud-init configuration file for the third virtual machine deployment
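In terms of user-data, the only addition compared to the previous deployments is a packages section; a minimal sketch of the relevant fragment looks like this:
#cloud-config
packages:
  - httpd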
Since all the steps are more or less the same, we get the same result – success:
Figure 9.22 – Repeating the deployment process for the third virtual machine
We should wait for a while so that the VM gets deployed. After that, let's log in and check
whether the image deployed correctly. We asked for httpd to be installed during the
deployment. Was it?
ready. Both of them are empty but very important. The fact that they exist signifies that
cloud-init is enabled, and that network has been configured and is working. If the files
are not there, something went wrong and we need to go back and debug. More about
debugging can be found at https://cloudinit.readthedocs.io/en/latest/
topics/debugging.html.
The results.json file holds this particular instance's metadata. status.json is more
concentrated on what happened when the whole process was running, and it provides info
on possible errors, the time it took to configure different parts of the system, and whether
everything was done.
Both those files are intended to help with the configuration and orchestration, and, while
some things inside these files are important only to cloud-init, the ability to detect and
interact with different cloud environments is something that other orchestration tools
can use. Files are just a part of it.
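On the distributions used here, these files typically live under /run/cloud-init (with copies kept under /var/lib/cloud/data), so a quick way to inspect them is:
# ls /run/cloud-init/
# cat /run/cloud-init/status.json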
Another big part of this scheme is the command-line utility called cloud-init. To get
information from it, we first need to log in to the machine that we created. We are going to
show the differences between machines that were created by the same file, and at the same
time demonstrate similarities and differences between distributions.
Before we start talking about this, be aware that cloud-init, as with all Linux software,
comes in different versions. CentOS 7 images use an old version, 0.7.9:
Realistically, a lot of things are missing for CentOS, some of them completely:
Figure 9.29 – After a bit of yum update, an up-to-date list of cloud-init features
As we can see, this will make things a lot easier to work with.
We are not going to go into too much detail about the cloud-init CLI tool, since there
is simply too much information available for a book like this, and as we can see, new
features are being added quickly. You can freely check additional options by browsing at
https://cloudinit.readthedocs.io/en/latest/topics/cli.html. In
fact, they are being added so quickly that there is a devel option that holds new features
while they are in active development. Once they are finished, they become commands of
their own.
There are two commands that you need to know about, both of which give an enormous
amount of information about the boot process and the state of the booted system. The
first one is cloud-init analyze. It has two extremely useful subcommands: blame
and show.
The aptly named blame is actually a tool that reports how much time was spent on the different procedures that cloud-init performed during boot. For example, we
can see that configuring grub and working with the filesystem was the slowest operation
on Ubuntu:
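The calls themselves are short; a sketch, run inside the deployed guest:
# cloud-init analyze blame
# cloud-init analyze show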
The third virtual machine that we deployed uses a CentOS image, and we added httpd to it. As expected, installing that package was by far the slowest thing that happened during the cloud-init process:
Figure 9.31 – Checking time consumption – it took quite a bit of time for
cloud-init to deploy the necessary httpd packages
A tool like this makes it easier to optimize deployments. In our particular case, almost
none of this makes sense, since we deployed simple machines with almost no changes to
the default configuration, but being able to understand why the deployment is slow is a
useful, if not essential, thing.
Another useful thing is being able to see how much time it took to actually boot the
virtual machine:
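That information comes from yet another analyze subcommand; assuming a recent enough cloud-init version, the call is simply:
# cloud-init analyze boot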
After working with it for even a few hours, cloud-init becomes one of those indispensable
tools for a system administrator. Of course, its very essence means it will be much more
suited to those of us who have to work in the cloud environment, because the thing it
does best is the quick and painless deployment of machines from scripts. But even if you
are not working with cloud technologies, the ability to quickly create instances that you
can use for testing, and then to remove them without any pain, is something that every
administrator needs.
Summary
In this chapter, we covered cloud-init, its architecture, and the benefits in larger deployment
scenarios, where configuration consistency and agility are of utmost importance. Pair that
with the paradigm change in which we don't do everything manually – we have a tool that
does it for us – and it's an excellent addition to our deployment processes. Make sure that
you try to use it as it will make your life a lot easier, while preparing you for using cloud
virtual machines, where cloud-init is extensively used.
In the next chapter, we're going to learn how to expand this usage model to Windows
virtual machines by using cloudbase-init.
Questions
1. Recreate our setup using CentOS 7 and Ubuntu base cloud-init images.
2. Create one Ubuntu and two CentOS instances using the same base image.
3. Add a fourth virtual machine using Ubuntu as a base image.
4. Try using some other distribution as a base image without changing any of the
configuration files. Give FreeBSD a try.
5. Instead of using SSH keys, use predefined passwords. Is this more or less secure?
6. Create a script that will create 10 identical instances of a machine using cloud-init
and a base image.
7. Can you find any reason why it would be more beneficial to use
a distribution-native way of installing machines instead of using cloud-init?
Further reading
Please refer to the following links for more information regarding what was covered in
this chapter:
10
Automated Windows Guest Deployment and Customization
Now that we have covered the different ways of deploying Linux-based Virtual Machines
(VMs) in KVM, it's time to switch our focus to Microsoft Windows. Specifically, we'll
work on Windows Server 2019 machines running on KVM, and cover prerequisites and
different scenarios for the deployment and customization of Windows Server 2019 VMs.
This book isn't based on the idea of Virtual Desktop Infrastructure (VDI) and desktop operating systems, which require a completely different scenario, approach, and technical implementation than virtualizing server operating systems.
In this chapter, we will cover the following topics:
Let's start from scratch. We are going to create a Windows Server 2019 VM in this chapter.
The version was selected to stay in line with the most recent release of Microsoft server operating systems on the market. Our goal will be to deploy a Windows Server 2019 VM template that we can use later for further deployments and for cloudbase-init, and the tool of choice for this installation process is going to be virt-install. If you
need to install an older version (2016 or 2012), you need to know two facts:
If you want to use Virtual Machine Manager to deploy Windows Server 2019, make
sure that you configure the VM properly. That includes selecting the correct ISO file
for the guest operating system installation, and connecting another virtual CD-ROM
for virtio-win drivers so that you can install them during the installation process.
Make sure that your VM has enough disk space on the local KVM host (60 GB+ is
recommended), and that it has enough horsepower to run. Start with two virtual CPUs
and 4 GB of memory, as this can easily be changed later.
The next step in our scenario is to create a Windows VM that we'll use throughout this
chapter to customize via cloudbase-init. In a real production environment, we need
to do as much configuration in it as possible – driver installation, Windows updates,
commonly used applications, and so on. So, let's do that first.
Then, it's time to start deploying our VM. Here are our settings:
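A hedged sketch of such a virt-install invocation follows; the ISO paths, VM name, and sizes are placeholders, and --os-variant win2k19 assumes a reasonably recent osinfo database:
# virt-install --name WS2019 --memory 4096 --vcpus 2 \
  --os-variant win2k19 \
  --cdrom /iso/windows-server-2019.iso \
  --disk path=/var/lib/libvirt/images/WS2019.qcow2,size=60,format=qcow2,bus=virtio \
  --disk path=/iso/virtio-win.iso,device=cdrom \
  --network network=default,model=virtio \
  --graphics spice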
When the installation process starts, we have to click Next a couple of times before we
reach the configuration screen where we can select the disk where we want to install our
guest operating system. At the bottom left of that screen, there's a button called
Load driver, which we can now use, repeatedly, to install all of the necessary virtio-
win drivers. Make sure that you untick the Hide drivers that aren't compatible with
this computer's hardware checkbox. Then, add the following drivers one by one, from a
specified directory, and select them with your mouse:
After that, click Next and wait for the installation process to finish.
You might be asking yourself: why did we micro-manage this so early in the installation
process, when we could've done this later? The answer is two-fold – if we did it later, we'd
have the following problems:
• There's a chance – at least for some operating systems – that we won't have all the
necessary drivers loaded before the installation starts, which might mean that the
installation will crash.
• We'd have loads of yellow exclamation marks in Device Manager, which is usually
annoying to people.
As things stand after deployment, our Device Manager is happy and the installation was a success:
最新资料最新资料
Figure 10.1 – The operating system and all drivers installed from the get-go
The only thing that's highly recommended post-installation is that we install the guest agent from virtio-win.iso after we boot our VM. You will find an .exe file on the virtual CD-ROM, in the guest-agent directory, and you just need to click the Next button until the installation is complete.
Now that our VM is ready, we need to start thinking about customization. Specifically,
large-scale customization, which is a normal usage model for VM deployments in the
cloud. This is why we need to use cloudbase-init, which is our next step.
• It can execute custom commands and scripts, most commonly coded in PowerShell,
although regular CMD scripts are also supported.
• It can work with PowerShell remoting and the Windows Remote Management
(WinRM) service.
• It can manage and configure disks, for example, to do a volume expansion.
• It can do basic administration, including the following:
a) Creating users and passwords
b) Setting up a hostname
c) Configuring static networking
d) Configuring MTU size
e) Assigning a license
f) Working with public keys
g) Synchronizing clocks
We mentioned earlier that our Windows Server 2019 VM is going to be used for
cloudbase-init customization, so that's our next subject. Let's prepare our VM for
cloudbase-init. We are going to achieve that by downloading the cloudbase-
init installer and installing it. We can find the cloudbase-init installer by pointing
our internet browser at https://cloudbase-init.readthedocs.io/en/
latest/intro.html#download. The installation is simple enough, and it can work
both in a regular, GUI fashion and silently. If you're used to using Windows Server Core
or prefer silent installation, you can use the MSI installer for silent installation by using
the following command:
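A sketch of that silent installation (the MSI filename is whatever you downloaded; /qn suppresses the UI and /l*v writes a verbose log):
C:\> msiexec /i CloudbaseInitSetup_Stable_x64.msi /qn /l*v cloudbase-init-install.log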
Make sure that you check the cloudbase-init documentation for further
configuration options as the installer supports additional runtime options. It's located at
https://cloudbase-init.readthedocs.io/en/latest/.
Let's stick with the GUI installer as it's simpler to use, especially for a first-time user. First,
the installer is going to ask about the license agreement and installation location – just the
usual stuff. Then, we're going to get the following options screen:
• bin: The location where some of the binary files are installed, such as elevate,
bsdtar, mcopy, mdir, and so on.
• conf: The location of three main configuration files that we're going to work with,
which is discussed a bit later.
• LocalScripts: The default location for PowerShell and similar scripts that we
want to run post-boot.
• Log: The location where we'll store the cloudbase-init log files by default so
that we can debug any issues.
• Python: The location where local installation of Python is deployed so that we can
also use Python for scripting.
Let's focus on the conf directory, which contains our configuration files:
• cloudbase-init.conf
• cloudbase-init-unattend.conf
• unattend.xml
The way that cloudbase-init works is rather simple – it uses the unattend.xml file during the Windows sysprep phase to execute cloudbase-init with the cloudbase-init-unattend.conf configuration file. The default cloudbase-init-unattend.conf configuration file is easily readable, and we can use the example provided by the cloudbase-init project, with the default configuration file explained step by step:
[DEFAULT]
# Name of the user that will get created, group for that user
username=Admin
groups=Administrators
firstlogonbehaviour=no
inject_user_password=true # Use password from the metadata (not random).
The next part of the config file is about devices – specifically, which devices to inspect
for a possible configuration drive (metadata):
config_drive_raw_hhd=true
config_drive_cdrom=true
# Path to tar implementation from Ubuntu.
bsdtar_path=C:\Program Files\Cloudbase Solutions\Cloudbase-Init\bin\bsdtar.exe
mtools_path= C:\Program Files\Cloudbase Solutions\Cloudbase-Init\bin\
# Logging level
verbose=true
debug=true
# Where to store logs
logdir=C:\Program Files (x86)\Cloudbase Solutions\Cloudbase-Init\log\
logfile=cloudbase-init-unattend.log
default_log_levels=comtypes=INFO,suds=INFO,iso8601=WARN
logging_serial_port_settings=
The next part of the configuration file is about networking, so we'll use DHCP to get all
the networking settings in our example:
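Judging by the full example later in this chapter, the DHCP-related part boils down to these two options:
mtu_use_dhcp_config=true
ntp_use_dhcp_config=true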
We need to configure the location where the scripts are residing, the same scripts that we
can use as a part of the cloudbase-init process:
The last part of the configuration file is about the services and plugins to be loaded, along
with some global settings, such as whether to allow the cloudbase-init service
to reboot the system or not and how we're going to approach the cloudbase-init
shutdown process (false=graceful service shutdown):
Let's just get a couple of things out of the way from the get-go. Default configuration
files already contain some settings that were deprecated, as you're going to find out
soon enough. Specifically, settings such as verbose, logdir and logfile are
already deprecated in this release, as you can see from the following screenshot, where
cloudbase-init is complaining about those very options:
Figure 10.4 – cloudbase-init complaining about its own default configuration file options
• We want our VM to ask us to change the password post-sysprep and after the
cloudbase-init process.
• We want our VM to take all of its network settings (the IP address, netmask,
gateway, DNS servers, and NTP) from DHCP.
• We want to sysprep the VM so that it's unique to each scenario and policy.
So, let's create a cloudbase-init-unattend.conf config file that will do this for us.
The first part of the configuration file was taken from the default config file:
[DEFAULT]
username=Admin
groups=Administrators
inject_user_password=true
config_drive_raw_hhd=true
config_drive_cdrom=true
config_drive_vfat=true
bsdtar_path=C:\Program Files\Cloudbase Solutions\Cloudbase-Init\bin\bsdtar.exe
mtools_path= C:\Program Files\Cloudbase Solutions\Cloudbase-Init\bin\
debug=true
default_log_levels=comtypes=INFO,suds=INFO,iso8601=WARN
logging_serial_port_settings=
mtu_use_dhcp_config=true
ntp_use_dhcp_config=true
As we decided to use PowerShell for all of the scripting, we created a separate directory for
our PowerShell scripts:
local_scripts_path=C:\PS1
The rest of the file was also just copied from the default configuration file:
metadata_services=cloudbaseinit.metadata.services.base.EmptyMetadataService
plugins=cloudbaseinit.plugins.common.mtu.MTUPlugin,cloudbaseinit.plugins.common.sethostname.SetHostNamePlugin,cloudbaseinit.plugins.common.localscripts.LocalScriptsPlugin,cloudbaseinit.plugins.common.userdata.UserDataPlugin
allow_reboot=false
stop_service_on_exit=false
As for the cloudbase-init.conf file, the only change that we made was selecting the
correct local script path (reasons to be mentioned shortly), as we will use this path in our
next example:
[DEFAULT]
username=Admin
groups=Administrators
inject_user_password=true
config_drive_raw_hhd=true
config_drive_cdrom=true
config_drive_vfat=true
Also, part of our default config file contained paths for tar, mtools, and debugging:
This part of the config file was also taken from the default config file, and we only changed
local_scripts_path so that it's set to the directory that we're using to populate with
PowerShell scripts:
first_logon_behaviour=no
default_log_levels=comtypes=INFO,suds=INFO,iso8601=WARN
logging_serial_port_settings=
mtu_use_dhcp_config=true
ntp_use_dhcp_config=true
local_scripts_path=C:\PS1
We can then go back to the cloudbase-init installation screen, check the sysprep
option, and click Finish. After starting the sysprep process and going through with it,
this is the end result:
Figure 10.5 – When we press Sign in, we are going to be asked to change the administrator's password
Now, let's take this a step further and complicate things a bit. Let's say that you want to
do the same process, but with additional PowerShell code that should do some additional
configuration. Consider the following example:
• It should create another two local users called packt1 and packt2, with a
predefined password set to Pa$$w0rd.
• It should create a new local group called students, and add packt1 and packt2
to this group as members.
• It should set the hostname to Server1.
The PowerShell code that enables us to do this needs to do the following:
• Sets the PowerShell execution policy to unrestricted so that our host doesn't stop
our script execution, which it would do by default.
• Creates a password variable from a plaintext string (Pa$$w0rd), which gets
converted to a secure string that we can use with the New-LocalUser PowerShell
cmdlet to create a local user.
• New-LocalUser is a PowerShell cmdlet that creates a local user. Mandatory
parameters include a username and password, which is why we created a
secure string.
• New-LocalGroup is a PowerShell cmdlet that creates a local group.
• Add-LocalGroupMember is a PowerShell cmdlet that adds members to an existing local group.
• Rename-Computer is a PowerShell cmdlet that changes the hostname of
a Windows computer.
We also need to call this code from cloudbase-init somehow, so we need to add this code as a script. Most commonly, we'll use a directory called LocalScripts in the cloudbase-init installation folder for that. Let's call this script userdata.ps1, save the content mentioned previously to it in the folder defined in the .conf file (c:\PS1), and add a cloudbase-init parameter at the top of the file:
# ps1
$password = 'Pa$$w0rd' | ConvertTo-SecureString -AsPlainText -Force
New-LocalUser -Name "packt1" -Password $password
New-LocalUser -Name "packt2" -Password $password
New-LocalGroup -Name "Students"
Add-LocalGroupMember -Group "Students" -Member "packt1","packt2"
Rename-Computer -NewName "Server1" -Restart
After starting the cloudbase-init procedure again, which can be achieved by starting
the cloudbase-init installation wizard and going through it as we did in the previous
example, here's the end result in terms of users:
Figure 10.6 – The packt1 and packt2 users were created, and added to the group created
by our PowerShell script
We can clearly see that the packt1 and packt2 users were created, along with a group
called Students. We can then see that the Students group has two members –
packt1 and packt2. Also, in terms of setting the server name, we have the following:
Figure 10.7 – Changing the server name via PowerShell script also works
Using cloudbase-init really isn't simple and requires a bit of investment in terms of time and tinkering. Afterward, though, it makes our job much easier – not being forced to do pedestrian tasks such as these over and over again should be reward enough. That is also why we need to talk a little bit about troubleshooting, as we're sure that you'll run into issues as you ramp up your cloudbase-init usage.
Troubleshooting common cloudbase-init customization issues
Although this seems counter-intuitive, we had much more success getting cloudbase-init to work with the latest development version instead of the latest stable one. We're not exactly sure why, but the latest development version (at the time of writing, version 0.9.12.dev125) worked for us right out of the gate. With version 0.9.11, we had massive issues getting the PowerShell script to even start.
Apart from these, there are other issues that you will surely encounter as you get to know cloudbase-init. The first one is the reboot loop. This problem is really common, and it almost always happens because of mistakes in the configuration files. Making a mistake in a configuration file happens often, and it throws cloudbase-init into a weird state that ends up like this:
We've seen this situation multiple times. The real problem is that it can take hours of waiting, sometimes cycling through numerous reboots, and it's not just a regular reboot loop. It really does seem that cloudbase-init is doing something – a CMD window is started, you get no errors in it or on the screen, but it keeps doing something and then finishes like this.
Other issues that you might encounter are even trickier – for example, when cloudbase-init fails to reset the password during the sysprep/cloudbase-init process. This can happen if you manually change the password of the account that the cloudbase-init service runs under (which is why using LocalSystem is a better idea). That will make the whole cloudbase-init procedure fail, and a failure to reset the password can be one part of that.
There's an even more obscure reason why this might happen – sometimes we manage system services manually via the services.msc console and deliberately disable services that we don't immediately recognize. If you set the cloudbase-init service to disabled, its process will fail as well. These services need to be set to start automatically and shouldn't be manually reconfigured to be disabled.
A failure to reset the password can also happen because of security policies – for example, if the password isn't complex enough. That's why we used a more complex password in our PowerShell script, as most of us system engineers have learned that particular lesson the hard way.
Summary
In this chapter, we worked with Windows VM customization, a topic that's just as important as Linux VM customization – maybe even more so, keeping in mind the market share numbers and the fact that a lot of people use Windows in cloud environments as well.
Now that we have covered all the bases in terms of working with VMs, templating, and
customization, it's time to introduce a different approach to additional customization
that's complementary to cloud-init and cloudbase-init. So, the next chapter
is about that approach, which is based around Ansible.
Questions
1. Which drivers do we need to install onto Windows guest operating systems so that
we can make a Windows template on the KVM hypervisor?
2. Which agent do we need to install onto Windows guest operating systems to have
better visibility into the VM's performance data?
3. What is sysprep?
4. What is cloudbase-init used for?
Further reading
Please refer to the following links for more information:
Ansible and Scripting for Orchestration and Automation
Ansible has become the de facto standard in today's open source community because it
offers so much while asking so little of you and your infrastructure. Using Ansible with
Kernel-based Virtual Machine (KVM) also makes a lot of sense, especially when you
think about larger environments. It doesn't really matter if it's just a simple provisioning
of KVM hosts that you want to do (install libvirt and related software), or if you want to
uniformly configure KVM networking on hosts – Ansible can be invaluable for both. For
example, in this chapter, we will use Ansible to deploy a virtual machine and multi-tier
application that's hosted inside KVM virtual machines, which is a very common use case
in larger environments. Then, we'll move on to the more pedantic subject of combining Ansible and cloud-init, since they differ in terms of when they're applied and the way in which they get things done. Cloud-init is an ideal way to automate the initial virtual machine configuration (hostname, network, and SSH keys). Then, we usually move to Ansible so
that we can perform additional orchestration post-initial configuration – add software
packages, make bigger changes to the system, and so on. Let's see how we can use Ansible
and cloud-init with KVM.
In this chapter, we will cover the following topics:
• Understanding Ansible
• Provisioning a virtual machine using the kvm_libvirt module
• Using Ansible and cloud-init for automation and orchestration
• Orchestrating multi-tier application deployment on KVM VMs
• Learning by example, including various examples on how to use Ansible with KVM
Understanding Ansible
One of the primary roles of a competent administrator is to try and automate themselves
out of everything they possibly can. There is a saying that you must do everything
manually at least once. If you must do it again, you will probably be annoyed by it, and the
third time you must do it, you will automate the process. When we talk about automation,
it can mean a lot of different things.
Let's try to explain this with an example as this is the most convenient way of describing
the problem and solution. Let's say that you're working for a company that needs to
deploy 50 web servers to host a web application, with standard configuration. Standard
configuration includes the software packages that you need to install, the services and
network settings that need to be configured, the firewall rules that need to be configured,
and the files that need to be copied from a network share to a local disk inside a virtual
machine so that we can serve these files via a web server. How are you going to make
that happen?
There are three basic approaches that come to mind:
• Do everything manually. This will cost a lot of time and there will be ample
opportunity to do something wrong as we're humans, after all, and we make
mistakes (pun intended).
• Try to automate the process by deploying 50 virtual machines and then throwing
the whole configuration aspect into a script, which can be a part of the automated
installation procedure (for example, kickstart).
• Try to automate the process by deploying a single virtual machine template that
will contain all the moving parts already installed. This means we just need to
deploy these 50 virtual machines from a virtual machine template and do a bit
of customization to make sure that our virtual machines are ready to be used.
There are different kinds of automation available. Pure scripting is one of them, and
it involves creating a script out of everything that needs to run more than once. An
administrator that has been doing a job for years usually has a batch of useful scripts.
Good administrators also know at least one programming language, even when they hate
to admit it, since being an administrator means having to fix things after others break
them, and it sometimes involves quite a bit of programming.
So, if you're considering doing automation via a script, we absolutely agree with you that
it's doable. But the question remains regarding how much time you'll spend covering
every single aspect of that script to get everything right so that the script always works
properly. Furthermore, if it doesn't, you're going to have to do a lot of manual labor to
make it right, without any real way of amending an additional configuration on top of
the previous, unsuccessful one.
This is where procedure-based tools such as Ansible come in handy. Ansible pushes modules to endpoints (in our example, virtual machines) that bring our objects to a desired state. If you're coming from the Microsoft PowerShell world, yes,
Ansible and PowerShell Desired State Configuration (DSC) are essentially trying to do
the same thing. They just go about it in a different way. So, let's discuss these different
automatization processes to see where Ansible fits into that world.
Automation approaches
In general, all of this applies to administering systems and their parts, installing
applications, and generally taking care of things inside the installed system. This can be
considered an old approach to administration since it generally deals with services, not
servers. At the same time, this kind of automation is decidedly focused on a single server
or a small number of servers since it doesn't scale well. If we need to work on multiple
servers, using regular scripts creates new problems. We need to take a lot of additional
variables into account (different SSH keys, hostnames, and IP addresses) since scripts
are more difficult to expand to work on multiple servers (which is easy in Ansible).
If one script isn't enough, then we have to move to multiple scripts, which creates a new
problem, one of which is script management. Think about it – what happens when we need
to change something in a script? How do we make sure that all the instances on all the
servers are using the same version, especially if the server IP addresses aren't sequential?
So, to conclude, while old and tested, this kind of automation has serious drawbacks.
Agentless systems
Agentless systems behave differently. Nothing is installed on the system that has to be
managed; instead, the central server (or servers) does everything using some kind of
command and control channel. On Windows, this may be PowerShell, WinRM, or
something similar, while on Linux, this usually SSH or some other remote execution
framework. The central server creates a task that then gets executed through the remote
channel, usually in the form of a script that is copied and then started on the target
system. This is what this principle would look like:
Figure 11.2 – The management platform doesn't need an agent to connect to objects that need
orchestration and automation
Regardless of their type, these systems are usually called either automation or configuration management systems. Although these are two different things, in reality the terms are used interchangeably. At the time of writing, the two most popular tools are Puppet and Ansible, although there are others available as well.
Introduction to Ansible
Ansible is an IT automation engine – some call it an automation framework – that enables
administrators to automate provisioning, configuration management, and many everyday
tasks a system administrator may need to accomplish.
The easiest (and way too simplified) way of thinking about Ansible is that it is
a complicated set of scripts that are intended to accomplish administration tasks on
a large scale, both in terms of complexity and the sheer number of systems it can control.
Ansible runs on a simple server that has all the parts of the Ansible system installed. It
requires nothing to be installed on the machines it controls. It is safe to say that Ansible
is completely agentless and that in order to accomplish its goal, it uses different ways to
connect to remote systems and push small scripts to them.
This also means that Ansible has no way of detecting changes on the systems it controls;
it is completely up to the configuration script we create to control what happens if
something is not as we expect it to be.
There are a couple of things that we need to define before doing everything else – things
that we can think of as building blocks or modules. Ansible likes to call itself a radically
simple IT engine, and it only has a couple of these building blocks that enable it to work.
First, it has inventories – lists of hosts that define what hosts a certain task will be
performed on. Hosts are defined in a simple text file and can be as simple as a straight list
that contains one host per line, or as complicated as a dynamic inventory that is created as
Ansible is performing a task. We will cover these in more detail as we show how they are
used. The thing to remember is that hosts are defined in text files as there are no databases
involved (although there can be) and that hosts can be grouped, a feature that you will
use extensively.
Secondly, there's a concept called play, which we will define as a set of different tasks run
by Ansible on target hosts. We usually use a playbook to start a play, which is another type
of object in the Ansible hierarchy.
In terms of playbooks, think of them as a policy or a set of tasks/plays that are required
to do something or achieve a certain state on a particular system. Playbooks are also text
files and are specifically designed to be readable by humans and are created by humans.
Playbooks are used to define a configuration or, to be more precise, declare it. They can
contain steps that start different tasks in an ordered manner. These steps are called plays,
hence the name playbook. The Ansible documentation explains this with a sports analogy: a playbook contains a set of plays that are documented and available to be performed, but not every play has to be called. The important thing to understand here is that our playbooks can have decision-making logic inside them.
The fourth big part of the Ansible puzzle are its modules. Think of modules as small
programs that are executed on the machines you are trying to control in order to
accomplish something. There are literally hundreds of modules included with the
Ansible package, and they can be used individually or inside your playbooks.
Modules allow us to accomplish tasks, and some of them are strictly declarative. Others
return data, either as the results of the tasks the modules did, or explicit data that the
module got from a running system through a process called fact gathering. This process is handled by the setup module (controlled through the gather_facts setting). Gathering correct facts about the system is one
of the most important things we can do once we've started to develop our own playbooks.
If you're more interested in a GUI-based approach, you can always consider buying Red Hat Ansible Tower.
Ansible Tower is a GUI-based utility that you can use to manage your Ansible-based
environments. This started as a project called AWX, which is still very much alive today.
But there are some key differences in the way in which AWX gets released versus how
Ansible Tower gets released. The main one is the fact that Ansible Tower uses specific release versions, while AWX takes an approach more like what OpenStack used to have – a project that moves forward rather quickly and releases new versions very often.
As Red Hat clearly states at https://www.ansible.com/products/awx-project/faq:
"Ansible Tower is produced by taking selected releases of AWX, hardening them for long-term
supportability, and making them available to customers as Ansible Tower offerings."
We don't nearly have enough space here to demonstrate how it looks and what it can be
used for, so we are just going to go through the basics of installing it and deploying the
simplest scenario.
The single most important address we need to know about when we are talking about AWX is https://github.com/ansible/awx. This is the place where the project
resides. The most up-to-date information is here, in readme.md, a file that is shown on
the GitHub page. If you are unfamiliar with cloning from GitHub, do not worry – we are
basically just copying from a special source that will enable you to copy only the things that
have changed since you last got your version of the files. This means that in order to update
to a new version, you only need to clone once more using the same exact command.
On the GitHub page, there is a direct link to the install instructions we are going to
follow. Remember, this deployment is from scratch, so we will need to build up our
demo machine once again and install everything that is missing.
The first thing we need to do is get the necessary AWX files. Let's clone the GitHub
repository to our local disk:
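Assuming git is already installed on our machine, this is just a standard clone of the repository mentioned above:
git clone https://github.com/ansible/awx.git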
Also, this is a good place to mention that we need at least 4 GB of RAM and 20 GB of disk space on our machine in order to run AWX. This differs from the low footprint that we are used to with Ansible, but it makes sense since AWX is much more than just a bunch of scripts. Let's start by installing the prerequisites.
Docker is the first one we will install. We are using CentOS 8 for this, so Docker is no
longer part of the default set of packages. Therefore, we need to add the repository and
then install the Docker engine. We are going to use the -ce package, which stands for
Community Edition. We will also use the --nobest option to install Docker – without
this option, CentOS will report that we are missing some dependencies:
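A hedged sketch of those steps on CentOS 8 – the exact repository URL and package versions may differ from what was used originally:
# dnf-plugins-core provides the config-manager subcommand
sudo dnf install -y dnf-plugins-core
sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
# --nobest lets dnf pick an installable docker-ce version on CentOS 8
sudo dnf install -y docker-ce --nobest
sudo systemctl enable --now docker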
The overall result should look something like this. Note that the versions of every package
on your particular installations will probably be different. This is normal as packages
change all the time:
If you are running on a completely clean CentOS 8 installation, you might have to install
epel-release before Ansible is available.
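If that's the case, it's a single additional package:
sudo dnf install -y epel-release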
Next on our list is Python. Just using the dnf command is not going to get Python
installed as we're going to have to supply the Python version we want. For this, we would
do something like this:
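A hedged example on CentOS 8, where the package name carries the version we want:
sudo dnf install -y python3
The next prerequisite is docker-compose, which we can pull directly from GitHub: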
curl -L https://github.com/docker/compose/releases/download/1.25.0/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
This command downloads the appropriate docker-compose binary from GitHub, using the output of the uname commands to pick the right build for our architecture, and saves it to /usr/local/bin/docker-compose.
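A binary downloaded this way usually isn't executable yet, so, assuming the path above, we also need the following:
chmod +x /usr/local/bin/docker-compose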
We know this is a lot of dependencies, but AWX is a pretty complex system under the
hood. On the surface, however, things are not so complicated. Before we do the final
install part, we need to verify that our firewall has stopped and that it is disabled. We
are creating a demo environment, and firewalld will block communication between
containers. We can fix that later, once we have the system running.
Once we have everything running, installing AWX is simple. Just go to the
awx/installer directory and run the following:
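At the time of writing of the AWX version used here, the installer was itself an Ansible playbook, so the commands would look something like this (the inventory file ships with the installer):
cd awx/installer
ansible-playbook -i inventory install.yml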
The installation should take a couple of minutes. The result should be a long listing that
ends with the following:
This means that the local AWX environment has been deployed successfully.
Now, the fun part starts. AWX is comprised of four small Docker images. For it to work,
all of them need to be configured and running. You can check them out by using docker
ps and docker logs -t awx_task.
The first command lists all the images that got deployed, as well as their status:
The second command shows us all the logs that the awx_task machine is creating.
These are the main logs for the whole system. After a while, the initial configuration
will complete:
We can stop following the log output by using Ctrl + C.
After this whole process, we can point our web browser to http://localhost. We
should be greeted by a screen that looks like this:
The default username is admin, while the password is password. After logging in
successfully, we should be faced with the following UI:
All of these attributes – which we'll come back to later in this chapter, when we deploy Ansible – are different parts of an Ansible playbook run, including the playbook itself, the inventory, the credentials used, and a couple of other things that make using Ansible easier. If we scroll down a bit, there should be
three buttons there. Press the LAUNCH button. This will play the template and turn it
into a job:
Figure 11.15 – By clicking on the Launch button, we can start our template job
The idea is that we can create templates and run them at will. Once you've run them, the
results of the runs will end up under Jobs (find it as the second item on the left-hand side
of the window):
Deploying Ansible
Out of all the similar applications designed for orchestration and systems management,
Ansible is probably the simplest one to install. Since it requires no agents on the systems
it manages, installation is limited to only one machine – the one that will run all the
scripts and playbooks. By default, Ansible uses SSH to connect to machines, so the only
prerequisite for its use is that our remote systems have an SSH server up and running.
Other than that, there are no databases (Ansible uses text files), no daemons (Ansible
runs on demand), and no management of Ansible itself to speak of. Since nothing is
running in the background, Ansible is easily upgraded – the only thing that can change
is the way playbooks are structured, and that can easily be fixed. Ansible is based on the
Python programming language, but its structure is simpler than that of a standard Python
program. Configuration files and playbooks are either simple text files or YAML formatted
text files, with YAML being a file format used to define data structures. Learning YAML is
outside the scope of this chapter, so we will just presume that you understand simple data
structures. The YAML files we'll be using as examples are simple enough to warrant almost
no explanation, but if one is needed, it will be provided.
The installation can be as simple as running the following:
You can run this command as the root user or use the following command:
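A hedged sketch of both variants, assuming a CentOS 8 control node with the EPEL repository enabled:
# as root
dnf install -y ansible
# or as a regular user with sudo privileges
sudo dnf install -y ansible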
The machine that Ansible is installed on is also called the control node. It must be installed
on a Linux host as Windows is not supported in this role. Ansible control nodes can be
run inside virtual machines.
Machines that we control are called managed nodes, and by default, they are Linux
boxes controlled through the SSH protocol. There are modules and plugins that enable
extending this to Windows and macOS operating systems, as well as other communication
channels. When you start reading the Ansible documentation, you will notice that most
of the modules that support more than one architecture have clear instructions regarding
how to accomplish the same tasks on different operating systems.
We can configure Ansible's settings using /etc/ansible/ansible.cfg. This file defines the defaults; most of its lines are commented out, but they show the default values for everything Ansible uses to work. Unless we change
something, these are the values that Ansible is going to use to run. Let's use Ansible in
a practical sense to see how all of this fits together. In our scenario, we are going to use
Ansible to provision a virtual machine by using its built-in module.
One thing that you may or may not include is a setting that defines how SSH is used to
connect to machines Ansible is going to configure. Before we do that, we need to spend
a bit of time talking about security and Ansible. Like almost all things related to Linux
(or *nix in general), Ansible is not an integrated system, instead relying on different
services that already exist. To connect to systems it manages and to execute commands,
Ansible relies on SSH (in Linux) or other systems such as WinRM or PowerShell
on Windows. We are going to focus on Linux here, but remember that quite a bit of
information about Ansible is completely system-independent.
SSH is a simple but extremely robust protocol that allows us to transfer data (Secure
FTP, SFTP, and so on) and execute commands (SSH) on remote hosts through a secure
channel. Ansible uses SSH directly by connecting and then executing commands and
transferring files. This, of course, means that in order for Ansible to work, it is crucial
that SSH works.
There are a couple of things that you need to remember when using SSH to connect:
• The first is a key fingerprint, as seen from the Ansible control node (server). When
establishing a connection for the first time, SSH requires the user to verify and
accept keys that the remote system presents. This is designed to prevent MITM
attacks and is a good tactic in everyday use. But if we are in the position of having
to configure freshly installed systems, all of them will require for us to accept their
keys. This is time-consuming and complicated to do once we start using playbooks,
so the first playbook you run will probably disable key checking when logging into machines. Of course, this should only be used in a controlled
environment since this lowers the security of the whole Ansible system.
• The second thing you need to know is that Ansible runs as a normal user. Having
said that, maybe we do not want to connect to the remote systems as the current
user. Ansible solves that by having a variable that can be set on individual
computers or groups that indicates what username the system is going to use to
connect to this particular computer. After connecting, Ansible allows us to execute
commands on the remote system as a different user entirely. This is something that
is commonly used since it enables us to reconfigure the machine completely and
change users as if we were at the console.
• The third thing that we need to remember is the keys – SSH can log in using either passwords or key pairs, and key pairs are the preferred option when automating things.
Although we can use fixed passwords inside inventory files (or special key vaults), this
is a bad idea. Luckily, Ansible enables us to script a lot of things, including copying keys
to remote systems. This means that we are going to have some playbooks that are going
to automate deployment of new systems, and these will enable us to take control of them
for further configuration.
To sum this up, the Ansible steps for deploying a system will probably start like this:
1. Install the core system and make sure that SSHD is running.
2. Define a user that has admin rights on the system.
3. From the control node, run a playbook that will establish the initial connection and
copy the local SSH key to a remote location.
4. Use the appropriate playbooks to reconfigure the system securely, and without the
need to store passwords locally.
Every reasonable manager will tell you that in order to do anything, you need to define
the scope of the problem. In automation, this means defining systems that Ansible is
going to work on. This is done through an inventory file called hosts, located in /etc/ansible.
Hosts can be grouped or individually named. In text format, that can look like this:
[servers]
srv1.local
srv2.local
srv3.local
[workstations]
wrk1.local
wrk2.local
wrk3.local
Computers can be part of multiple groups simultaneously, and groups can be nested.
The format we used here is straight text. Let's rewrite this in YAML:
all:
  children:
    servers:
      hosts:
        srv1.local:
        srv2.local:
        srv3.local:
    workstations:
      hosts:
        wrk1.local:
        wrk2.local:
        wrk3.local:
    production:
      hosts:
        srv1.local:
      children:
        workstations:
Important Note
We created another group called Production that contains all the workstations
and one server.
Anything that is not part of the default or standard configuration can be included
individually in the host definition or in the group definition as variables. Every Ansible
command has some way of giving you flexibility in terms of partially or completely
overriding all the items in the configuration or inventory.
The inventory supports ranges in host definitions. Our previous example can be written
as follows:
[servers]
srv[1:3].local
[workstations]
wrk[1:3].local
This also works for characters, so if we need to define servers named srva, srvb, srvc,
and srvd, we can do that by stating the following:
srv[a:d]
IP ranges can also be used. So, for instance, 10.0.0.0/24 would be written down
as follows:
10.0.0.[1:254]
There are two predefined default groups that can also be used: all and ungrouped.
As their names suggest, if we reference all in a playbook, it will be run on every server
we have in our inventory. Ungrouped will reference only those systems that are not part
of any group.
Ungrouped references are especially useful when setting up new computers – if they
are not in any group, we can consider them new and set them up to be joined to
a specific group.
These groups are defined implicitly and there is no need to reconfigure them or even
mention them in the inventory file.
We mentioned that the inventory file can contain variables. Variables are useful when we need a property that is defined for a whole group of computers – a user, a password, or a setting specific to that group. Let's say that we want to define the user that is going to be used on the servers group. We define the variables that will apply to the whole group:
[servers:vars]
ansible_user=Ansibleuser
ansible_connection=ssh
This will use the user named Ansibleuser to connect using SSH when asked
to perform a playbook.
Important Note
Note that the password is not present and that this playbook will fail
if either the password is not separately mentioned or the keys are not
exchanged beforehand. For more on variables and their use, consult Ansible
documentation.
Now that we've created our first practical Ansible configuration, it's time to talk about how to make Ansible do many things at once. It's important to be able to take a single task, or a couple of tasks, and combine them through a concept called a playbook, which can include multiple tasks/plays.
In our examples, we've configured four CentOS 7 systems, given them consecutive addresses in the range of 10.0.0.1 to 10.0.0.4, and used them for everything. Ansible is installed on the system with the IP address 10.0.0.1, but as we already said,
this is completely arbitrary. Ansible has a minimal footprint on the system that is used
as a control node and can be installed on any system as long as it has connectivity to the
rest of the network we are going to manage. We simply chose the first computer in our
small network. One more thing to note is that the control node can be controlled by itself
through Ansible. This is useful, but at the same time not a good thing to do. Depending
on your setup, you will want to test not only playbooks, but individual commands before
they are deployed to other machines – doing that on your control server is not a wise
thing to do.
Now that Ansible is installed, we can try and do something with it. There are two distinct
ways that Ansible can be run. One is by running a playbook, a file that contains tasks that
are to be performed. The other way is by using a single task, sometimes called ad hoc
execution. There are reasons to use Ansible either way – playbooks are our main tool, and
you will probably use them most of the time. But ad hoc execution also has its advantages,
especially if we are interested in doing something that we need done once, but across
multiple servers. A typical example is using a simple command to check the version of
an installed application or application state. If we need it to check something, we are not
going to write a playbook.
To see if everything works, we are going to start by simply using ping to check if the
machines are online.
Ansible likes to call itself radically simple automation, and the first thing we are going
to do proves that.
We are going to use a module named ping that tries to connect to a host, verifies that there is a usable Python environment on it, and returns a message if everything is OK. Do not confuse this module with the ping command in Linux; we are not sending ICMP packets across the network – we are only checking that the control node can connect to, and run code on, the server we are trying to control.
We will use a simple ansible command to ping all the defined hosts by issuing the
following command:
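That command, which we'll break down right after the figure, is simply:
ansible all -m ping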
Figure 11.18 – Our first Ansible module – ping, checks for Python and reports its state
What we did here is run a single command called ansible all -m ping.
ansible is the simplest command available and runs a single task. The all parameter
means run it on all the hosts in the inventory, and -m is used to call a module that will
be run.
This particular module has no parameters or options, so we just need to run it in order to
get a result. The result itself is interesting; it is in YAML format and contains a few things
other than just the result of the command.
If we take a closer look at this, we will see that Ansible returned one result for each host
in the inventory. The first thing we can see is the final result of the command – SUCCESS
means that the task itself ran without a problem. After that, we can see data in the form of an array – ansible_facts contains information that the module returns, and it is used
extensively when writing playbooks. Data that is returned this way can vary. In the next
section, we will show a much bigger dataset, but in this particular case, the only thing
that is shown is the location of the Python interpreter. After that, we have the changed
variable, which is an interesting one.
When Ansible runs, it tries to detect whether it ran correctly and whether it has changed
the system state. In this particular task, the command that ran is just informative and does
not change anything on the system, so the system state was unchanged.
In other words, this means that whatever was run did not install or change anything on
the system. States will make more sense later when we need to check if something was
installed or not, such as a service.
The last variable we can see is the return of the ping command. It simply states pong
since this is the correct answer that the module gives if everything was set up correctly.
Let's do something similar, but this time with an argument, such as an ad hoc command
that we want to be executed on remote hosts. So, type in the following command:
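Based on the description that follows, the ad hoc command was along these lines (the exact shell argument is an assumption):
ansible all -m shell -a "hostname"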
Figure 11.19 – Using Ansible to explicitly execute a specific command on Ansible targets
Here, we called another module called shell. It simply runs whatever is given as
a parameter as a shell command. What is returned is the local hostname. This is
functionally the same as what would happen if we connected to each host in our
inventory using SSH, executed the command, and then logged out.
For a simple demonstration of what Ansible can do, this is OK, but let's do something
more complex. We are going to use a module called yum that is specific to CentOS/Red
Hat to check if there is a web server installed on our hosts. The web server we are going
to check for is going to be lighttpd since we want something lightweight.
When we talked about states, we touched on a concept that is both a little confusing at
first and extremely useful once we start using it. When calling a command like this, we are
declaring a desired state, so the system itself will change if the state is not the one we are
demanding. This means that, in this example, we are not actually testing if lighttpd is
installed – we are telling Ansible to check it and, if it's not installed, to install it. Even
this is not completely true – the module takes two arguments: the name of the service and
the state it should be in. If the state on the system we are checking is the same as the state
we sent when invoking the module, we are going to get changed: false since nothing
changed. But if the state of the system is not the same, Ansible will make the current state
of the system the same as the state we requested.
To prove this, we are going to make sure the service is not installed – absent, in Ansible terms. Remember that if the service was installed, this will uninstall it. Type in the
following command:
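A hedged reconstruction of that command, using the standard yum module arguments:
ansible all -m yum -a "name=lighttpd state=absent"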
This is what you should get as the result of running the preceding command:
Then, we can say that we want it present on the system. Ansible is going to install the
services as needed:
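And the corresponding hedged reconstruction for the present state:
ansible all -m yum -a "name=lighttpd state=present"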
Figure 11.21 – Using the yum install command on all Ansible targets
Here, we can see that Ansible simply checked and installed the service since it wasn't
there. It also provided us with other useful information, such as what changes were done
on the system and the output of the command it performed. Information was provided
as an array of variables; this usually means that we will have to do some string
manipulation in order to make it look nicer.
Now, let's run the command again:
Figure 11.22 – Using Ansible to check the service state after service installation
As we can see, there were no changes here since the service is installed.
These were all just starting examples so that we could get to know Ansible a little bit. Now,
let's expand on this and create an Ansible playbook that's going to install KVM on our
predefined set of hosts.
Installing KVM
Now, let's create our first playbook and use it to install KVM on all of our hosts. For
our playbook, we used an excellent example from the GitHub repository, created by
Jared Bloomer, that we changed a bit since we already have our options and inventory
configured. The original files are available at https://github.com/jbloomer/
Ansible---Install-KVM-on-CentOS-7.git.
This playbook will show everything that we need to know about automating simple tasks.
We chose this particular example because it shows not only how automation works, but
also how to create separate tasks and reuse them in different playbooks. Using a public
repository has an added benefit that you will always get the latest version, but it may differ significantly from the one presented here:
1. First, we created our main playbook – the one that will get called – and named it
installkvm.yaml:
Figure 11.23 – The main Ansible playbook, which checks for virtualization support and installs KVM
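A minimal sketch of what such a main playbook might look like, assuming the two roles we'll look at in a moment:
- name: Install KVM
  hosts: all
  remote_user: root
  roles:
    - checkVirtualization
    - installKVM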
As we can see, this is a simple declaration, so let's analyze it line by line. First, we have
the playbook name, a string that can contain whatever we want:
The hosts variable defines what part of the inventory this playbook is going to be
performed on – in our case, all the hosts. We can override this (and all the other
variables) at runtime, but it helps to limit the playbook to just the hosts we need to
control. In our particular case, this is actually all the hosts in our inventory, but in
production, we will probably have more than one group of hosts.
The next variable is the name of the user that is going to perform the task. What we
did here is not recommended in production since we are using a superuser account
to perform tasks. Ansible is completely capable of working with non-privileged
accounts and elevating rights when needed, but as in all demonstrations, we are going to make mistakes so that you don't have to – all in order to make things easier to understand.
Now comes the part that is actually performing our tasks. In Ansible, we declare
roles for the system. In our example, there are two of them. Roles are really just sets of tasks to be performed that will result in the system being in a certain state.
In our first role, we are going to check if the system supports virtualization, and
then in the second one, we will install KVM services on all the systems that do.
2. When we downloaded the script from GitHub, it created a few folders. In the
one named roles, there are two subfolders that each contain a file; one is called
checkVirtualization and the other is called installKVM.
You can probably already see where this is heading. First, let's see what
checkVirtualization contains:
Figure 11.24 – Checking for CPU virtualization via the lscpu command
This task simply calls a shell command and tries to grep for the lines containing
virtualization parameters for the CPU. If it finds none, it fails.
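Based on that description, the role's task file is probably something like this minimal sketch:
# roles/checkVirtualization/tasks/main.yml – minimal sketch
# grep exits with a non-zero code when nothing matches, which makes the task fail
- name: Check whether the CPU exposes virtualization extensions
  shell: lscpu | grep -i virtualization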
3. Now, let's see the other task:
Figure 11.25 – Ansible task for installing the necessary libvirt packages
The first part is a simple loop that will just install five different packages if they are not
present. We are using the package module here, which is a different approach than
the one we used in our first demonstration regarding how to install packages. The
module that we used earlier in this chapter is called yum and is specific to CentOS as a
distribution. The package module is a generic module that will translate to whatever
package manager a specific distribution is using. Once we've installed all the packages
we need, we need to make sure that libvirtd is enabled and started.
We are using a simple loop to go through all the packages that we are installing.
This is not necessary, but it is a better way to do things than copying and pasting
individual commands since it makes the list of packages that we need much
more readable.
Then, as the last part of the task, we verify that the KVM kernel module has loaded.
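A hedged sketch of what this role might contain – the exact package list in the original repository may differ:
# roles/installKVM/tasks/main.yml – minimal sketch
- name: Install the KVM/libvirt package set
  package:
    name: "{{ item }}"
    state: present
  loop:
    - qemu-kvm
    - libvirt
    - libvirt-client
    - virt-install
    - libguestfs-tools
- name: Enable and start libvirtd
  service:
    name: libvirtd
    state: started
    enabled: yes
- name: Verify that the kvm kernel module is loaded
  shell: lsmod | grep -i kvm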
As we can see, the syntax for the playbook is a simple one. It is easily readable, even
by somebody who has only minor knowledge of scripting or programming. We
could even say that having a firm understanding of how the Linux command line
works is more important.
4. In order to run a playbook, we use the ansible-playbook command, followed
by the name of the playbook. In our case, we're going to use the ansible-playbook installkvm.yaml command. Here are the results:
5. Here, we can see that Ansible breaks down everything it did on every host, change
by change. The end result is a success:
Figure 11.28 – Using Ansible to check all the virtual machines on Ansible targets
Having finished this simple exercise, we have a running KVM on four machines and the
ability to control them from one place. But we still have no VMs running on the hosts.
Next, we are going to show you how to create a CentOS installation inside the KVM
environment, but we are going to use the most basic method to do so – virsh.
We are going to do two things: first, we are going to download a minimal ISO image
for CentOS from the internet. Then, we are going to call virsh. This book will show you
different ways to accomplish this task; downloading from the internet is one of
the slowest:
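A hedged sketch of that download step using the get_url module – the mirror URL here is a placeholder and needs to point at a mirror that actually hosts the image:
ansible all -m get_url -a "url=http://YOUR-CENTOS-MIRROR/CentOS-7-x86_64-Minimal-1810.iso dest=/var/lib/libvirt/boot/CentOS-7-x86_64-Minimal-1810.iso"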
Figure 11.30 – Status check – checking if the files have been downloaded to our targets
3. Since we are not automating this and instead creating a single task, we are going
to run it in a local shell. The command to run for this would be something like
the following:
ansible all -m shell -a "virt-install --name=COS7Core
--ram=2048 --vcpus=4 --cdrom=/var/lib/libvirt/boot/
CentOS-7-x86_64-Minimal-1810.iso --os-type=linux
Figure 11.31 – Using Ansible to check if all our VMs are running
Here, we can see that all the KVMs are running and that each of them has its own virtual
machine online and running.
Now, we are going to wipe our KVM cluster and start again, but this time with a different
configuration: we are going to deploy the cloud version of CentOS and reconfigure it
using cloud-init.
More details can be found at cloud-init.io, but in a nutshell, cloud-init is a tool that
enables the creation of special files that can be combined with VM templates in order to
rapidly deploy them. The main difference between cloud-init and unattended installation
scripts is that cloud-init is more or less distribution-agnostic and much easier to change
with scripting tools. This means less work during deployment, and less time from start
of deployment until machines are online and working. On CentOS, this can be
accomplished with kickstart files, but they are not nearly as flexible as cloud-init.
Cloud-init works using two separate parts: one is the distribution file for the operating
system we are deploying. This is not the usual OS installation file, but a specially
configured machine template intended to be used as a cloud-init image.
The other part of the system is the configuration file, which is compiled – or, to be more precise, packed – from a special YAML text file that contains the configuration for the
machine. This configuration is small and ideal for network transmission.
These two parts are intended to be used as a whole to create multiple instances of identical
virtual machines.
The way this works is simple:
1. First, we distribute a machine template that is completely identical for all the machines that we are going to create. This means having one master copy and creating every instance from it, with the small cloud-init configuration providing the per-instance settings.
Let's simplify this even more: if we need to create 100 servers that will have four different
roles using the unattended installation files, we would have to boot 100 images and wait
for them to go through all the installation steps one by one. Then, we would need to
reconfigure them for the task we need. Using cloud-init, we are booting one image in
100 instances, but the system takes only a couple of seconds to boot since it is already
installed. Only critical information is needed to put it online, after which we can take
over and completely configure it using Ansible.
We are not going to dwell too much on cloud-init's configuration; everything we need is
in this example:
1. First, we are going to copy the cloud image and our configuration onto our KVM
hosts. After that, we are going to create a machine out of these and start it:
Figure 11.34 – The playbook that will download the required image, configure cloud-init,
and start the VM deployment process
Since this is our first complicated playbook, we need to explain a few things. In every
play or task, there are some things that are important. A name is used to simplify
running the playbook; this is what is going to be displayed when the playbook
runs. This name should be explanatory enough to help, but not too long in order
to avoid clutter.
After the name, we have the business part of each task – the name of the module
being called. In our example, we are using three distinct ones: copy, command, and
virt. copy is used to copy files between hosts, command executes commands on
the remote machine, and virt contains commands and states needed to control
the virtual environment.
You will notice when reading this that copy looks strange; src denotes a local
directory, while dest denotes a remote one. This is by design. To simplify things,
copy works between the local machine (the control node running Ansible) and the
remote machine (the one being configured). Directories will get created if they do
not exist, and copy will apply the appropriate permissions.
After that, we are running a command that will work on local files and create
a virtual machine. One important thing here is that we are basically running the
image we copied; the template is on the control node. At the same time, this saves
disk space and deployment time – there is no need to copy the machine from local
to remote disk and then duplicate it on the remote machine once again; as soon as
the image is there, we can run it.
Back to the important part – the local installation. We are creating a machine with
1 GB of RAM and one CPU using the disk image we just copied. We're also
attaching our config.iso file as a virtual CD/DVD. We are then importing
this image and using no graphic terminal.
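Pulling that description together, a hedged sketch of such a playbook might look like the following – the image name, paths, and VM name are illustrative placeholders, not the exact ones used in the original figure:
- name: Deploy a cloud image with cloud-init on the KVM hosts
  hosts: all
  become: yes
  tasks:
    - name: Copy the cloud image to the KVM host
      copy:
        src: centos-cloud.qcow2
        dest: /var/lib/libvirt/images/deploy-1.qcow2
    - name: Copy the cloud-init configuration ISO
      copy:
        src: config.iso
        dest: /var/lib/libvirt/images/config.iso
    - name: Import the copied image as a new virtual machine
      command: >
        virt-install --name deploy-1 --memory 1024 --vcpus 1
        --disk /var/lib/libvirt/images/deploy-1.qcow2,device=disk
        --disk /var/lib/libvirt/images/config.iso,device=cdrom
        --import --os-variant centos7.0 --graphics none --noautoconsole
    - name: Make sure the virtual machine is running
      virt:
        name: deploy-1
        state: running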
2. The last task in the playbook starts the VM on the remote KVM host. We can run the whole playbook with the following command:
ansible-playbook installvms.yaml
Let's check two more things – networking and the machine state. Type in the
following command:
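A hedged sketch of such a check – the exact commands used in the original may differ:
ansible all -m shell -a "virsh net-list && virsh list --all"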
The difficult part is the package names: on our demonstration machine, we are using CentOS 7 as the operating system, and its package names are a little different. Apache is called httpd, and MySQL is replaced with MariaDB, another engine that is compatible with MySQL. PHP is luckily named the same as on other distributions. We also need another
package named python2-PyMySQL (the name is case sensitive) in order to get our
playbook to work.
The next thing we are going to do is test the installation by starting all the services and
creating the simplest .php script possible. After that, we are going to create a database
and a user that is going to use it. As a warning, in this chapter, we are concentrating
on Ansible basics, since Ansible is far too complex to be covered in one chapter of a
book. Also, we are presuming a lot of things, and our biggest assumption is that we are
creating demo systems that are not in any way intended for production. This playbook in particular lacks one important step: setting a root password for the database. Do not go into production without your SQL root password set.
One more thing: our script presumes that there is a file named index.php in the
directory our playbook runs from, and that file will get copied to the remote system:
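A hedged sketch of the kind of playbook being described – the database name, user, and password here are illustrative placeholders:
- name: Deploy a simple LAMP stack on CentOS 7
  hosts: all
  become: yes
  tasks:
    - name: Install the LAMP packages
      package:
        name: "{{ item }}"
        state: present
      loop:
        - httpd
        - mariadb-server
        - php
        - python2-PyMySQL
    - name: Start and enable the services
      service:
        name: "{{ item }}"
        state: started
        enabled: yes
      loop:
        - httpd
        - mariadb
    - name: Create a test database
      mysql_db:
        name: testdb
        state: present
    - name: Create a database user for the test database
      mysql_user:
        name: testuser
        password: 'Pa$$w0rd'
        priv: 'testdb.*:ALL'
        state: present
    - name: Copy our test page to the web root
      copy:
        src: index.php
        dest: /var/www/html/index.php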
As we can see, there is nothing complicated going on, just a simple sequence of steps.
Our .php file looks like this:
Ansible is much more complex than what we showed you here in this chapter, so we
strongly suggest you do some further reading and learning. What we did here was just
the simplest example of how we can install KVM on multiple hosts and control all of
them at once using the command line. What Ansible does best is save us time – imagine
having a couple of hundred hypervisors and having to deploy thousands of servers.
Using playbooks and a couple of preconfigured images, we can not only configure
KVM to run our machines, but reconfigure anything on the machines themselves. The
only real prerequisites are a running SSH server and an inventory that will enable us
to group machines.
• Task 1:
We configured and ran one machine per KVM host. Create a playbook that will
form a pair of hosts – one running a website and another running a database.
• Task 4:
Based on our LAMP deployment playbook, improve on it by doing the following:
a) Create a playbook that will run on a remote machine.
b) Create a playbook that will install different roles on different servers.
c) Create a playbook that will deploy a more complex application, such as WordPress.
If you managed to solve these tasks, then congratulations – you're en route to
becoming an administrator who can use Automation, with a capital A.
Summary
In this chapter, we discussed Ansible – a simple tool for orchestration and automation.
It can be used both in open source and Microsoft-based environments as it supports both
natively. Open source systems can be accessed via SSH keys, while Microsoft operating
systems can be accessed by using WinRM and PowerShell. We learned a lot about simple
Ansible tasks and more complex ones since deploying a multi-tier application that's hosted
on multiple virtual machines isn't an easy task to do – especially if you're approaching the
problem manually. Even deploying a KVM hypervisor on multiple hosts can take quite a
bit of time, but we managed to solve that with one simple Ansible playbook. Mind you, we only needed some 20 configuration lines to do that, and the upshot is that we can easily add hundreds more hosts as targets for this Ansible playbook.
The next chapter takes us to a world of cloud services – specifically OpenStack – where
our Ansible knowledge is going to be very useful for large-scale virtual machine
configuration as it's impossible to configure all of our cloud virtual machines by using
any kind of manual utilities. Apart from that, we'll extend our knowledge of Ansible by
integrating OpenStack and Ansible so that we can use both of these platforms to do what
they do really well – manage cloud environments and configure their consumables.
Questions
1. What is Ansible?
2. What does an Ansible playbook do?
3. Which communication protocol does Ansible use to connect to its targets?
4. What is AWX?
5. What is Ansible Tower?
Further reading
Please refer to the following links for more information regarding what was covered in
this chapter:
Section 4:
Scalability, Monitoring,
Performance Tuning,
and Troubleshooting
In this part of the book, you will learn about the scalability, monitoring, advanced
performance tuning, and troubleshooting of KVM-based virtual machines and hypervisors.
This part of the book comprises the following chapters:
Being able to virtualize a machine is a big thing, but sometimes, just virtualization is
not enough. The problem is how to give individual users tools so that they can virtualize
whatever they need, when they need it. If we combine that user-centric approach with
virtualization, we are going to end up with a system that needs to be able to do two things:
it should be able to connect to KVM as a virtualization mechanism (and not only KVM)
and enable users to get their virtual machines running and automatically configured
in a self-provisioning environment that's available through a web browser. OpenStack
adds one more thing to this since it is completely free and based entirely on open source
technologies. Provisioning such a system is a big problem due to its complexity, and in
this chapter, we are going to show you – or to be more precise, point you – in the right
direction regarding whether you need a system like this.
In this chapter, we will cover the following topics:
• Introduction to OpenStack
• Software-defined networking
• OpenStack components
• Additional OpenStack use cases
Introduction to OpenStack
In its own words, OpenStack is a cloud operating system that is used to control
a large number of different resources in order to provide all the essential services for
Infrastructure-as-a-Service (IaaS) and orchestration.
But what does this mean? OpenStack is designed to completely control all the resources
that are in the data center, and to provide both central management and direct control
over anything that can be used to deploy both its own and third-party services. Basically,
for every service that we mention in this book, there is a place in the whole OpenStack
landscape where that service is or can be used.
OpenStack itself consists of several different interconnected services or service parts,
each with its own set of functionalities, and each with its own API that enables full
control of the service. In this part of this book, we will try to explain what different
parts of OpenStack do, how they interconnect, what services they provide, and how
to use those services to our advantage.
The reason OpenStack exists is that there was a need for an open source cloud
computing platform that would enable the creation of public and private clouds that are
independent of any commercial cloud platform. All parts of OpenStack are open source
and were released under the Apache License 2.0. The software was created by a large,
mixed group of individuals and large cloud providers. Interestingly, the first major release
was the result of NASA (a US government agency) and Rackspace Technology (a large US
hosting company) joining their internal storage and computing infrastructure solutions.
These releases were later designated with the names Nova and Swift, and we will cover
them in more detail later.
The first thing you will notice about OpenStack is its services, since there is no single
OpenStack service but an actual stack of services. The name OpenStack comes directly
from this concept because it correctly identifies OpenStack as an open stack of open
source services that are, in turn, grouped into functional sets.
Once we understand that we are talking about autonomous services, we also need to
understand that services in OpenStack are grouped by their function, and that some
functions have more than one specialized service under them. We will try to cover as
much as possible about different services in this chapter, but there are simply too many
of them to even mention all of them here. All the documentation and all the whitepapers
can be found at http://openstack.org, and we strongly suggest that you consult it
for anything not mentioned here, and even for things that we mention but that could have
changed by the time you read this.
The last thing we need to clarify is the naming – every service in OpenStack has its project
name and is referred to by that name in the documentation. This might, at first glance,
look confusing since some of the names are completely unrelated to the specific function
a particular service has in the whole project, but using names instead of official
designators for a function is far easier once you start using OpenStack. Take, for example,
Swift. Swift's full name is OpenStack Object Store, but this is rarely mentioned in the
documentation or its implementation. The same goes for other services or projects under
OpenStack, such as Nova, Ironic, Neutron, Keystone, and over 20 other different services.
If you step away from OpenStack for a second, then you need to consider what cloud
services are all about. The cloud is all about scaling – in terms of compute resources,
storage, network, APIs – whatever. But, as always in life, as you scale things, you're going
to run into problems. And these problems have their own names and solutions. So, let's
start with a scalability problem.
Let's continue our journey through OpenStack by explaining the most fundamental
subject of cloud environments, which is scaling cloud networking via software-defined
networking (SDN). The reason for this is really simple – without SDN concepts, the cloud
wouldn't really be scalable enough for customers to be happy, and that would be
a complete showstopper. So, buckle up your seatbelts and let's do an SDN primer.
Software-defined networking
One of the straightforward stories about the cloud – at least on the face of it – should
have been the story about cloud networking. In order to understand how simple this
story should've been, we only need to look at one number, and that number is the virtual
LAN (VLAN) ID. As you might already be aware, by using VLANs, network
administrators have a chance to divide a physical network into separate logical networks.
Bearing in mind that the VLAN ID field in the Ethernet header is 12 bits long, the
maximum number of these logically isolated networks is 4,096. Usually, the first and last
VLANs are reserved (0 and 4095), as is VLAN 1.
So, basically, we're left with 4,093 separate logical networks in a real-life scenario, which
is probably more than enough for the internal infrastructure of any given company.
However, this is nowhere near enough for public cloud providers. The same problem
applies to customers who use hybrid-cloud types of services to – for
example – extend their compute power to the public cloud.
So, let's focus on this network problem for a bit. Realistically, if we look at this problem
from the cloud user perspective, data privacy is of utmost importance to us. If we look
at this problem from the cloud provider perspective, then we want our network isolation
problem to be a non-issue for our tenants. This is what cloud services are all about at
a more basic level – no matter what the background complexity in terms of technology
is, users have to be able to access all of the necessary services in as user-friendly a way as
possible. Let's explain this by using an example.
What happens if we have 5,000 different clients (tenants) in our public cloud
environment? What happens if every tenant needs to have five or more logical networks?
We quickly realize that we have a big problem as cloud environments need to be
separated, isolated, and fenced. They need to be separated from one another at a network
level for security and privacy reasons. However, they also need to be routable, if a tenant
needs that kind of service. On top of that, we need the ability to scale so that situations in
which we need more than 5,000 or 50,000 isolated networks don't bother us. And, going
back to our previous point – roughly 4,000 VLANs just isn't going to cut it.
There's a reason why we said that this should have been a straightforward story. The
engineers among us see these situations in black and white – we focus on a problem and
try to come to a solution. And the solution seems rather simple – we need to extend the
12-bit VLAN ID field so that we can have more available logical networks. How difficult
can that be?
As it turns out, very difficult. If history teaches us anything, it's that various interests,
companies, and technologies compete for years for top dog status in anything related
to IT. Just think of the good old days of DVD+R, DVD-R,
DVD+RW, DVD-RW, DVD-RAM, and so on. To simplify things a bit, the same thing
happened here when the initial standards for cloud networking were introduced. We
usually call these network technologies cloud overlay network technologies. These
technologies are the basis for SDN, the principle that describes the way cloud networking
works at a global, centralized management level. There are multiple standards on the
market to solve this problem – VXLAN, GRE, STT, NVGRE, NVO3, and more.
Realistically, there's no need to break them all down one by one. We are going to take
a simpler route – we're going to describe one of them that's the most valuable for us in
the context of today (VXLAN) and then move on to something that's considered to be
a unified standard of tomorrow (GENEVE).
First, let's define what an overlay network is. When we're talking about overlay networks,
we're talking about networks that are built on top of another network in the same
infrastructure. The idea behind an overlay network is simple – we need to disentangle the
physical part of the network from the logical part of the network. If we want to do that
in absolute terms (configure everything without spending massive amounts of time in
the CLI to configure physical switches, routers, and so on), we can do that as well. If we
don't want to do it that way and we still want to work directly with our physical network
environment, we need to add a layer of programmability to the overall scheme. Then, if
we want to, we can interact with our physical devices and push network configuration to
them for a more top-to-bottom approach. If we do things this way, we'll need a bit more
support from our hardware devices in terms of capability and compatibility.
Now that we've described what network overlay is, let's talk about VXLAN, one of the
most prominent overlay network standards. It also serves as a basis for developing some
other network overlay standards (such as GENEVE), so – as you might imagine – it's very
important to understand how it works.
Understanding VXLAN
Let's start with the confusing part. VXLAN (IETF RFC 7348) is an extensible overlay
network standard that enables us to aggregate and tunnel multiple Layer 2 networks
across Layer 3 networks. How does it do that? By encapsulating a Layer 2 packet inside a
Layer 3 packet. In terms of transport protocol, it uses UDP, by default on port 4789 (more
about that in just a bit). In terms of special requirements for a VXLAN implementation – as long
as your physical network supports an MTU of 1,600 bytes, you can implement VXLAN as a cloud
overlay solution easily. Almost all the switches you can buy (except for the cheap home
switches, but we're talking about enterprises here) support jumbo frames, which means
that we can use MTU 9000 and be done with it.
Encapsulation and decapsulation are handled by devices called VTEPs (VXLAN tunneling
endpoints), which check VXLAN network identifiers (VNIs) so that they can decide which
packets go where.
If this seems complicated, then don't worry – we can simplify this. From the perspective
of VXLAN, a VNI is the same thing as a VLAN ID is to VLAN. It's a unique network
identifier. The difference is just the size – the VNI field has 24 bits, compared to VLAN's
12. That means that we have 2^24 VNIs compared to VLAN's 2^12. So, VXLANs – in
terms of network isolation – are VLANs squared.
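To make the VNI concept a little more tangible, here is a minimal sketch of how a single VXLAN segment can be created on a plain Linux host with iproute2. The interface name, VNI, multicast group, and addresses are just example values, and the commands need to be run as root:
# Create a VXLAN interface with VNI 5001 on the default UDP port 4789,
# using a multicast group to learn remote MAC addresses, bound to eth0
ip link add vxlan5001 type vxlan id 5001 group 239.1.1.1 dstport 4789 dev eth0
# Assign an address and bring the interface up – traffic on this subnet is now
# carried inside VXLAN-encapsulated UDP packets
ip addr add 10.50.1.1/24 dev vxlan5001
ip link set vxlan5001 up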
The simplicity, scalability, and extensibility of VXLAN also enable some really useful usage
models, such as the following:
• Stretching Layer 2 across sites: This is one of the most common problems
regarding cloud networking, as we will describe shortly.
• Layer 2 bridging: Bridging a VLAN to a cloud overlay network (such as VXLAN)
is very useful when onboarding our users to our cloud services as they can then just
connect to our cloud network directly. Also, this usage model is heavily used when
we want to physically insert a hardware device (for example, a physical database
server or a physical appliance) into a VXLAN. If we didn't have Layer 2 bridging,
imagine all the pain that we would have. All our customers running the Oracle
Database Appliance would have no way to connect their physical servers to our
cloud-based infrastructure.
• Various offloading technologies: These include load balancing, antivirus,
vulnerability and antimalware scanning, firewall, IDS, IPS integration, and so on.
All of these technologies enable us to have useful, secure environments with simple
management concepts.
Let's illustrate the first usage model – stretching Layer 2 across sites. Imagine that a virtual
machine (VM4) is located in some other remote site, while its segment (VXLAN 5001) spans across those sites.
How? As long as the underlying hosts can communicate with each other over the VXLAN
transport network (usually via the management network as well), the VTEPs from the
first site can talk to the VTEPs from the second site. This means that virtual machines that
are backed by VXLAN segments in one site can talk to the same VXLAN segments in the
other site by using the aforementioned Layer 2-to-Layer 3 encapsulation. This is a really
simple and elegant way to solve a complex and costly problem.
We mentioned that VXLAN, as a technology, served as a basis for developing some other
standards, with the most important being GENEVE. As most manufacturers work toward
GENEVE compatibility, VXLAN will slowly but surely disappear. Let's discuss what the
purpose of the GENEVE protocol is and how it aims to become the standard for cloud
overlay networking.
Understanding GENEVE
The basic problem that we touched upon earlier is the fact that history kind of repeated
itself in cloud overlay networks, as it did many times before. Different standards, different
firmware, and different manufacturers support one standard over another, even though all
of the standards are incredibly similar yet still not compatible with each other. That's why
VMware, Microsoft, Red Hat, and Intel proposed GENEVE, a new cloud overlay standard
that only defines the encapsulation data format, without interfering with the control
planes of these technologies, which are fundamentally different. For example, VXLAN
uses a 24-bit field width for VNI, while STT uses 64-bit. So, the GENEVE standard
proposes no fixed field size as you can't possibly know what the future brings. Also, taking
a look at the existing user base, we can still happily use our VXLANs as we don't believe
that they will be influenced by future GENEVE deployments.
Let's see what the GENEVE header looks like:
OpenStack components
When OpenStack was first formed as a project, it was built around two different
services:
• A computing service that was designed to manage and run virtual machines
themselves
• A storage service that was designed for large-scale object storage
These services are now called OpenStack Compute or Nova, and OpenStack Object
Store or Swift. These services were later joined by Glance or the OpenStack Image service,
which was designed to simplify working with disk images. Also, after our SDN primer,
we need to discuss OpenStack Neutron, the Network-as-a-Service (NaaS) component
of OpenStack.
The following diagram shows the components of OpenStack:
We'll go through these in no particular order and will include additional services that are
important. Let's start with Swift.
Swift
The first service we need to talk about is Swift. For that purpose, we are going to grab the
project's own definition from the OpenStack official documentation and parse it to try
and explain what services are fulfilled by this project, and what it is used for. The Swift
website (https://docs.openstack.org/swift/latest/) states the following:
"Swift is a highly available, distributed, eventually consistent object/blob store. Organizations
can use Swift to store lots of data efficiently, safely, and cheaply. It's built for scale and
optimized for durability, availability, and concurrency across the entire dataset. Swift is ideal
for storing unstructured data that can grow without bounds."
Having read that, we need to point out quite a few things that may be completely new to
you. First and foremost, we are talking about storing data in a particular way that is not
common in computing unless you have used unstructured data stores. Unstructured does
not mean that this way of storing data is lacking structure; in this context, it means that
we are the ones that are defining the structure of the data, but the service itself does not
care about our structure, instead relying on the concept of objects to store our data. One
result of this is something that may also sound unusual at first, and that is that the data
we store in Swift is not directly accessible through any filesystem, or any other way we are
used to manipulating files through our machines. Instead, we are manipulating data as
objects and we must use the API that is provided as part of Swift to get the data objects.
Our data is stored in blobs, or objects, that the system itself just labels and stores to take
care of availability and access speed. We are supposed to know what the internal structure
of our data is and how to parse it. On the other hand, because of this approach, Swift can
be amazingly fast with any amount of data and scales horizontally in a way that is almost
impossible to achieve using normal, classic databases.
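As a quick, hedged illustration of that API-driven access, this is roughly what uploading and reading back an object looks like over Swift's REST API with curl. The endpoint, account, container, and token are placeholders rather than values from a real deployment:
# Upload (PUT) a local file as an object into the backups container
curl -X PUT -T report.csv -H "X-Auth-Token: $TOKEN" \
    https://swift.example.com/v1/AUTH_demo/backups/report.csv
# Read (GET) the same object back – note that we address it by object name,
# not by a filesystem path
curl -H "X-Auth-Token: $TOKEN" -o report.csv \
    https://swift.example.com/v1/AUTH_demo/backups/report.csv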
Another thing worth mentioning is that this service offers highly available, distributed,
and eventually consistent storage. This means that, first and foremost, the priority is for
the data to be distributed and highly available, which are two things that are important
in the cloud. Consistency comes after that but is eventually achieved. Once you come to
use this service, you will understand what that means. In almost all usual scenarios where
data is read and rarely written, it is nothing to even think about, but there are some cases
where this can change the way we need to think about the way we go about delivering the
service. The documentation states the following:
"Because each replica in Object Storage functions independently and clients generally require
only a simple majority of nodes to respond to consider an operation successful, transient
failures such as network partitions can quickly cause replicas to diverge. These differences
are eventually reconciled by asynchronous, peer-to-peer replicator processes. The replicator
processes traverse their local filesystems and concurrently perform operations in a manner
that balances load across physical disks."
We can roughly translate this. Let's say that you have a three-node Swift cluster. In such
a scenario, a Swift object will become available to clients after the PUT operation has been
confirmed to have been completed on at least two nodes. So, if your goal is low-latency,
synchronous storage replication, Swift is not the right tool – there are other solutions
available for that.
Having put aside all the abstract promises regarding what Swift offers, let's go into more
details. High availability and distribution are the direct result of using a concept of zones
and having multiple copies of the same data written onto multiple storage servers. Zones
are nothing but a simple way of logically dividing the storage resources we have at our
disposal and deciding on what kind of isolation we are ready to provide, as well as what
kind of redundancy we need. We can group servers by the server itself, by the rack, by
sets of servers across a Datacenter, in groups across different Datacenters, and in any
combination of those. Everything really depends on the amount of available resources and
the data redundancy and availability we need and want, as well as, of course, the cost that
will accompany our configuration.
Based on the resources we have, we are supposed to configure our storage system in terms
of how many copies it will hold and how many zones we are prepared to use. A copy of
a particular data object in Swift is referred to as a replica, and currently, the best practices
call for at least three replicas across no less than five zones.
A zone can be a server or a set of servers, and if we configure everything correctly, losing
any one zone should have no impact on the availability or distribution of data. Since
a zone can be as small as a server and as big as any number of data centers, the way we
structure our zones has a huge impact on the way the system reacts to any failures and
changes. The same goes for replicas. In the recommended scenario, configuration has a
smaller number of replicas than the number of zones, so only some of the zones will hold
some of these replicas. This means the system must balance the way data is written in order
to evenly distribute both the data and the load, including both the writing and the reading
load for the data. At the same time, the way we structure the zones will have an enormous
impact on the cost – redundancy has a real cost in terms of server and storage hardware,
and multiplying replicas and zones creates additional demands in regard to how much
storage and computing power we need to allocate for our OpenStack installation. Being
able to do this correctly is the biggest problem that a Datacenter architect has to solve.
Now, we need to go back to the concept of eventual consistency. Eventual consistency in
this context means that data is going to be written to the Swift store and that objects are
going to get updated, but the system will not be able to do a completely simultaneous
write of all the data into all the copies (replicas) of the data across all zones. Swift will try
to reconcile the differences as soon as possible and will be aware of these changes, so it
serves new versions of the objects to whoever tries to read them. Scenarios where data is
inconsistent due to a failure of some part of the system exist, but they are to be considered
abnormal states of the system and need to be repaired rather than the system being
designed to ignore them.
Swift daemons
Next, we need to talk about the way Swift is designed in regard to its architecture. Data is
managed through three separate logical daemons:
• Swift-account is used to manage a SQL database that contains all the accounts
defined within the object storage service. Its main task is to read and write the data
that all the other services need, primarily in order to validate and find appropriate
authentication and other data.
• Swift-container is another database process, but it is used strictly to map data into
containers, a logical structure similar to AWS buckets. This can include any number
of objects that are grouped together.
• Swift-object manages mapping to actual objects, and it keeps track of the location
and availability of the objects themselves.
All these daemons are just in charge of data and make sure that everything is both
mapped and replicated correctly. Data is used by another layer in the architecture: the
presentation layer.
When a user wants to use any data object, they first need to authenticate via a token that
can be either externally provided or created by an authentication system inside Swift.
After that, the main process that orchestrates data retrieval is Swift-proxy, which handles
communication with the three daemons that deal with the data. Provided that the user
presented a valid token, the requested data object is delivered to them.
This is just the briefest of overviews regarding how Swift works. In order to understand
this, you need to not only read the documentation but also use some kind of system that
will perform low-level object retrieval and storage into and out of Swift.
Cloud services can't be scaled or used efficiently if we don't have orchestration services,
which is why we need to discuss the next service on our list – Nova.
Nova
Another important service or project is Nova – an orchestration service that is used for
providing both provisioning and management for computing instances at a large scale.
What it basically does is allow us to use an API structure to directly allocate, create,
reconfigure, and delete or destroy virtual servers. The following is a diagram of a logical
Nova service structure:
Almost all distributed systems must rely on queues to be able to perform their tasks.
Messages need to be forwarded to a central place that will enable all daemons to do their
tasks, and using the right messaging and queueing system is crucial for system speed and
reliability. Nova currently uses RabbitMQ, which is itself a highly scalable and available system.
Using a production-ready system like this means that not only are there tools to debug
the system itself, but there are a lot of reporting tools available for directly querying the
messaging queue.
The main purpose of using a messaging queue is to completely decouple any clients from
servers, and to provide asynchronous communication between different clients. There
is a lot to be said on how the actual messaging works, but for this chapter, we will just
refer you to the official documentation at https://docs.openstack.org/nova/
latest/, since we are not talking about a couple of functions on a server but an entirely
independent software stack.
The database is in charge of holding all the state data for the tasks currently being
performed, as well as enabling the API to return information about the current state
of different parts of Nova.
All in all, the system consists of the following:
• nova-api: The daemon that is directly facing the user and is responsible for
accepting, parsing, and working through all the user API requests. Almost all
interaction with Nova starts with a request to this daemon.
As a side note, nova-conductor is there to process requests that require any conversion
regarding objects, resizing, and database/proxy access.
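To give you a feel for that API-driven lifecycle, here is a short sketch using the unified OpenStack CLI; the image, flavor, and network names are examples and have to exist in your environment:
# Create, list, and delete a virtual machine instance through Nova
openstack server create --image cirros --flavor m1.tiny --network shared demo-vm
openstack server list
openstack server delete demo-vm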
The next service on our list is Glance – a service that is very important for virtual machine
deployment as we want to do this from images. Let's discuss Glance now.
Glance
At first glance, having a separate service for cloud disk image management may make little sense, but
when scaling any infrastructure, image management will become a problem that needs
an API to be solved. Glance basically has this dual identity – it can be used to directly
manipulate VM images and store them inside blobs of data, but at the same time it can
be used to completely automatically orchestrate a lot of tasks when dealing with a huge
number of images.
Glance is relatively simple in terms of its internal structure as it consists of an image
information database, an image store that uses Swift (or a similar service), and an API
that glues everything together. The database is sometimes called the Registry, and it basically
holds information about a given image. Images themselves can be stored on different types of
stores – in Swift (as blobs), on HTTP servers, or on a filesystem (such as NFS).
Glance is completely nonspecific about the type of image store it uses, so NFS is perfectly
okay and makes implementing OpenStack a little bit easier, but when scaling OpenStack,
both Swift and Amazon S3 can be used.
When thinking about the place in the big OpenStack puzzle that Glance belongs to, we
could describe it as being the service that Nova uses to find and instantiate images. Glance
itself uses Swift (or any other storage) to store images. Since we are dealing with multiple
architectures, we need a lot of different supported file formats for images, and Glance does
not disappoint. Every disk format that is supported by different virtualization engines is
supported by Glance. This includes both unstructured formats such as raw and structured
formats such as VHD, VMDK, qcow2, VDI, ISO, and AMI. OVF – as an example of an
image container – is also supported.
Glance probably has the simplest API of them all, enabling it to be used even from the
command line using curl to query the server and JSON as the format of the messages.
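As a sketch of how approachable that API is, the following shows a raw REST query with curl next to an image upload through the unified CLI. The endpoint, token, and file name are assumptions, not values from a specific deployment:
# Query the Glance v2 API directly – the response is plain JSON
curl -s -H "X-Auth-Token: $TOKEN" http://controller:9292/v2/images | python3 -m json.tool
# Upload a qcow2 image through the unified CLI
openstack image create --disk-format qcow2 --container-format bare \
    --file bionic-server.qcow2 ubuntu-18.04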
We'll finish this section with a small note directly from the Nova documentation: it
explicitly states that everything in OpenStack is designed to be horizontally scalable but
that, at any time, there should be significantly more computing nodes than any other type.
This actually makes a lot of sense – computing nodes are the ones in charge of actually
accepting and working on requests. The number of storage nodes you'll need will depend
on your usage scenario, and Glance's needs will inevitably depend on the capabilities and
resources available to Swift.
The next service in line is Horizon – a human-readable GUI dashboard of OpenStack
where we consume a lot of OpenStack visual information.
Horizon
Having explained the core services that enable OpenStack to do what it does the way it
does in some detail, we need to address the user interaction. In almost every paragraph
in this chapter, we refer to APIs and scripting interfaces as a way to communicate and
orchestrate OpenStack. While this is completely true and is the usual way of managing
large-scale deployments, OpenStack also has a pretty useful interface that is available as
a web service in a browser. The name of this project is Horizon, and its sole purpose is
to provide a user with a way of interacting with all the services from one place, called
the dashboard. Users can also reconfigure most, if not all, the things in the OpenStack
installation, including security, networking, access rights, users, containers, volumes, and more.
Designate
Every system that uses any kind of network must have at least some kind of name
resolution service in the form of a local or remote DNS or a similar mechanism.
Designate is a service that tries to integrate the DNSaaS concept in OpenStack in one
place. When connected to Nova and Neutron, it will try to keep up-to-date records
regarding all the hosts and infrastructure details.
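Where Designate is deployed, DNS zones and records become just another API object. A hedged CLI sketch (the zone, email, record name, and address are examples, and the Designate client plugin has to be installed) might look like this:
# Create a DNS zone and add an A record for one of our instances
openstack zone create --email [email protected] example.com.
openstack recordset create --type A --record 203.0.113.10 example.com. web01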
Another very important aspect of the cloud is how we manage identities. For that specific
purpose, OpenStack has a service called Keystone. We'll discuss what it does next.
Keystone
Identity management is a big thing in cloud computing, simply because when deploying
a large-scale infrastructure, not only do you need a way to scale your resources, but
you also need a way to scale user management. A simple list of users that can access a
resource is not an option anymore, mainly because we are not talking about simple users
anymore. Instead, we are talking about domains containing thousands of users separated
by groups and by roles – we are talking about multiple ways of logging in and providing
authentication and authorization. Of course, this also can span multiple standards for
authentication, as well as multiple specialized systems.
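In practice, all of this is exposed through the Identity API, which the unified CLI wraps. A minimal sketch of creating a project, a user, and a role assignment could look like the following; the names are examples, and the exact default role name can differ between releases:
# Verify that we can authenticate and get a token
openstack token issue
# Create a project, a user inside it, and grant the user a role on that project
openstack project create --description "Demo tenant" demo
openstack user create --project demo --password-prompt alice
openstack role add --project demo --user alice member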
Neutron
OpenStack Neutron is an API-based service that aims to provide a simple and extensible
cloud network concept, as a successor to what used to be called the Quantum service
in older releases of OpenStack. Before this service, networking was managed by
nova-network, which, as we mentioned, is now obsolete – Neutron replaced it.
Neutron integrates with some of the services that we've already
discussed – Nova, Horizon, and Keystone. As a standalone concept, we can deploy
Neutron to a separate server, which will then give us the ability to use the Neutron API.
This is reminiscent of what VMware does in NSX with the NSX Controller concept.
When we deploy neutron-server, a web-based service that hosts the API connects to the
Neutron plugin in the background so that we can introduce networking changes to our
Neutron-managed cloud network. In terms of architecture, Neutron supports a range of backend plugins, including the following:
• Cisco UCS/Nexus
• The Brocade Neutron plugin
• IBM SDN-VE
• VMware NSX
• Juniper OpenContrail
• Linux bridging
• ML2
• Many others
Most of these plugin names are logical, so you won't have any problems understanding
what they do. But we'd like to mention one of these plugins specifically, which is the
Modular Layer 2 (ML2) plugin.
By using the ML2 plugin, OpenStack Neutron can connect to various Layer 2 backends
– VLAN, GRE, VXLAN, and so on. It also enabled Neutron to move away from the Open
vSwitch and Linux bridge plugins as its core plugins (which are now obsolete). These
plugins are considered to be too monolithic for Neutron's modular architecture, and ML2
has replaced them completely since the release of Havana (2013). ML2 today has many
vendor-based plugins for integration. As shown by the preceding list, Arista, Cisco, Avaya,
HP, IBM, Mellanox, and VMware all have ML2-based plugins for OpenStack.
In terms of network categories, Neutron supports two: tenant networks, which projects (tenants) create for their own use, and provider networks, which administrators create and map directly to an existing physical network in the data center.
Tenant networks usually use some kind of SNAT mechanism to access external
networks, and this service is usually implemented via virtual routers. The same
concept is used in other cloud technologies such as VMware NSX-v and NSX-t,
as well as Microsoft Hyper-V SDN technologies backed by Network Controller.
In terms of network types, Neutron supports multiple types, such as flat, VLAN, GRE, VXLAN, and GENEVE.
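A hedged sketch of a typical tenant workflow through the Neutron API, using the unified CLI, looks like this; the network names and address range are examples, and the external network is assumed to be called public:
# Create a tenant network and subnet, then route it toward the external network
openstack network create tenant-net
openstack subnet create --network tenant-net --subnet-range 192.168.10.0/24 tenant-subnet
openstack router create tenant-router
openstack router add subnet tenant-router tenant-subnet
openstack router set --external-gateway public tenant-router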
Now that we've covered OpenStack's usage models, ideas, and services, let's
discuss additional ways in which OpenStack can be used. As you might imagine,
OpenStack – being what it is – is highly capable of being used in many non-standard
scenarios. We'll discuss these non-obvious scenarios next.
Additional OpenStack use cases
When deploying OpenStack, we are talking about a large-scale enterprise solution that is
usually deployed for one of three reasons:
• Testing and learning: Maybe we need to learn how to configure a new installation, or
we need to test a new computing node before we even go near production systems.
For that reason, we need a small OpenStack environment, perhaps a single server
that we can expand if there is a need for that. In practice, this system should be able
to support probably a single user with a couple of instances. Those instances will
usually not be the focus of your attention; they are going to be there just to enable
you to explore all the other functionalities of the system. Deploying such a system
is usually done the way we described in this chapter – using a readymade script that
installs and configures everything so that we can focus on the part we are actually
working on.
• We have a need for a staging or pre-production environment: Usually, this means that
we need to either support the production team so they have a safe environment
to work in, or we are trying to keep a separate test environment for storing and
running instances before they are pushed into production.
• We need a production environment: This is a full-scale, multi-server deployment in which
we need monitoring, alerting, and everything else to be able to spot problems before
the users see them. Has a switch failed over? Are the computing nodes all running
correctly? Have the disks degraded in performance due to a failure? Each of these things,
in a carefully configured system, will have minimal to no impact on the users, but if we
are not proactive in our approach, compounding errors can quickly bring the system down.
Having distinguished between a single server and a full install in two different scenarios,
we are going to go through both. The single server will be done manually using scripts,
while the multi-server will be done using Ansible playbooks.
Now that we've covered OpenStack in quite a bit of detail, it's time to start using it. Let's
start with some small things (a small environment to test) in order to provision a regular
OpenStack environment for production, and then discuss integrating OpenStack with
Ansible. We'll revisit OpenStack in the next chapter, when we start discussing scaling
out KVM to Amazon AWS.
For a quick proof of concept, we can use Packstack on a CentOS host, which installs an all-in-one OpenStack (Train) environment with just a few commands:
yum update -y
yum install -y centos-release-openstack-train
yum update -y
yum install -y openstack-packstack
packstack --allinone
As the process goes through its various phases, you'll see various messages, such as the
following, which are quite nice as you get to see what's happening in real time with
a decent verbosity level:
After the installation is finished, you will get a report screen that looks similar to this:
More information about the available options can be found in the Packstack documentation at https://wiki.openstack.org/wiki/Packstack.
If you're going to use an external network, you need a static IP address without
NetworkManager, and you probably want to either configure firewalld or stop it
altogether. Other than that, you can start using this as your demo environment.
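For reference, on a CentOS demo host this usually comes down to something like the following sketch – treat it as lab-only convenience, not as a hardening recommendation:
# Packstack demo setups usually assume a static IP, no NetworkManager, and no firewalld
systemctl disable --now NetworkManager
systemctl disable --now firewalld
# If your distribution still ships the legacy network service, keep it for the static IP configuration
systemctl enable --now network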
OpenStack is a cloud operating system, and its main idea is to enable us to use multiple
servers and other devices to create a coherent, easily configured cloud that can be
managed from a central point, either through an API or through a web server. The size
and type of the OpenStack deployment can be from one server running everything, to
thousands of servers and storage units integrated across several Datacenters. OpenStack
does not have a problem with large-scale deployment; the only real limiting factor is
usually the cost and other requirements for the environment we are trying to create.
We mentioned scalability a few times, and this is where OpenStack shines in both directions.
The amazing thing is that not only does it scale up easily, but it also scales down.
An installation that will work perfectly fine for a single user can be done on a single
machine – even on a single VM inside a single machine – so you will be able to have your
own cloud within a virtual environment on your laptop. This is great for testing things but
nothing else.
Having a bare-metal install that will follow the guidelines and recommended
configuration requirements for particular roles and services is the only way to go forward
when creating a working, scalable cloud, and obviously this is the way to go if you need
to create a production environment. Having said that, between a single machine and
a thousand server installs, there are a lot of ways that your infrastructure can be shaped
and redesigned to support your particular use case scenario.
Let's first quickly go through an installation inside another VM, a task that can be
accomplished in under 10 minutes on a faster host machine. For our platform, we
decided on installing Ubuntu 18.04.3 LTS in order to be able to keep the host system to
a minimum. The entire guide for Ubuntu regarding what we are trying to do is available
at https://docs.openstack.org/devstack/latest/guides/single-
machine.html.
One thing that we must point out is that the OpenStack site has a guide for a number
of different install scenarios, both on virtual and bare-metal hardware, and they are
all extremely easy to follow, simply because the documentation is straight to the point.
There's also a simple install script that takes care of everything once a few steps are done
manually by you.
Be careful with hardware requirements. There are some good sources available to cover
this subject. Start here: https://docs.openstack.org/newton/install-
guide-rdo/overview.html#figure-hwreqs.
We also need to install git and switch to our newly created user:
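Assuming the stack user has already been created as described in the devstack guide, the commands are roughly as follows:
sudo apt-get update
sudo apt-get install -y git
sudo su - stack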
Now for the fun part. We are going to clone (copy the latest version of) devstack,
the installation script that will provide everything we need to be able to run and use
OpenStack on this machine:
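A sketch of the clone step, using the upstream devstack repository:
git clone https://opendev.org/openstack/devstack
cd devstack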
There will be some issues with this installation process, and as a result, installation might
break twice because of the following reasons:
• Ownership of /opt/stack/.cache is root:root, instead of stack:stack.
Please correct this ownership before running the installer;
• A known installer problem, where the installer fails while installing a component. The
solution is rather simple – there's a line that needs to be changed in a file called python
in the inc directory. At the time of writing, line 192 of that file needs to be changed
from $cmd_pip $upgrade \ to $cmd_pip $upgrade --ignore-installed \
In the end, after we collected all the data and modified the file, we settled on this
configuration:
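The exact local.conf we used is not reproduced here, but a minimal single-machine configuration typically looks something like the following sketch; the passwords and addresses are placeholders that you must change:
[[local|localrc]]
ADMIN_PASSWORD=SuperSecret
DATABASE_PASSWORD=$ADMIN_PASSWORD
RABBIT_PASSWORD=$ADMIN_PASSWORD
SERVICE_PASSWORD=$ADMIN_PASSWORD
HOST_IP=192.168.122.10
FLOATING_RANGE=192.168.122.224/27
With local.conf in place, the installation is started by running ./stack.sh from the devstack directory.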
Once you've run the script, a really verbose installation should start and dump a lot of
lines on your screen. Wait for it to finish – it is going to take a while and download a lot
of files from the internet. At the end, it is going to give you an installation summary that
looks something like this:
After logging in with the credentials that are displayed at the end of the installation
(the default administrator name is admin and the password is the one you set
in local.conf when installing the service), you are going to be welcomed by a screen
showing you the stats for your cloud. The screen you are looking at is actually a Horizon
dashboard and is the main screen that provides you with all you need to know about your
cloud at a glance.
OpenStack administration
Looking at the top-left corner of Horizon, we can see that there are three distinct sections
that are configured by default. The first one – Project – covers everything about our
default project and its resources. This is where you can create new instances, manage
images, and work on server groups. Our cloud is just a core installation, so we only have
one server and two defined zones, which means that we have no server groups installed:
1. Go to Launch Instance in the far-right part of the screen. A window will open
that will enable you to give OpenStack all the information it needs to create a
new VM instance:
2. On the next screen, you need to supply the system with the image source. We
already mentioned Glance – these images are taken from the Glance store and can
be either an image snapshot, a ready-made volume, or a volume snapshot. We can
also create a persistent image if we want to. One thing that you'll notice is that there
are two differences when comparing this process to almost any other deployment.
The first is that we are using a ready-made image by default as one was provided
for us. Another big thing is the ability to create a new persistent volume to store
our data in, or to have it deleted when we are done with the image, or have it not
be created at all.
Choose the one image you have allocated in the public repository; it should be called
something similar to the one shown in the following screenshot. CirrOS is a test
image provided with OpenStack. It's a minimal Linux distribution that is designed to
be as small as possible and enable easy testing of the whole cloud infrastructure but
to be as unobtrusive as possible. CirrOS is basically an OS placeholder. Of course,
we need to click on the Launch Instance button to go to the next step:
3. The next important part of creating a new image is choosing a flavor. This is another
one of those peculiarly named things in OpenStack. A flavor is a combination of
certain resources that basically creates a computing, memory, and storage template
for new instances. We can choose from instances that have as little as 64 MB of
RAM and 1 vCPU and go as far as our infrastructure can provide:
4. The last thing we actually need to choose is the network connectivity. We need to set
all the adapters our instance will be able to use while running. Since this is a simple
test, we are going to use both adapters we have, both the internal and external one.
They are called public and shared:
We'll quickly create another instance, and then create a snapshot so that we can
show you how image management works. If you click on the Create snapshot
button on the right-hand side of the instance list, Horizon will create a snapshot
and immediately put you in the interface meant for image administration:
What we can see is all the information we need on our instances, what their
IP addresses are, their flavor (which translates into what amount of resources are
allocated for a particular instance), the availability zone that the image is running
in, and information on the current instance state. The next thing we are going to
check out is the Volumes tab on the left. When we created our instances, we told
OpenStack to create one permanent volume for the first instance. If we now click
on Volumes, we should see the volume under a numeric name:
If we click on the Network Topology tab, we will get the whole network topology of
our currently running network, shown in a simple graphical display. We can choose
from Topology and Graph, both of which basically represent the same thing:
Day-to-day administration
We are more or less finished with the most important options that are in any way
connected to the administration of our day-to-day tasks in the Project Datacenter. If we
click on the tab named Admin, we will notice that the menu structure we've opened looks
a lot like the one under Project. This is because, now, we are looking at administration
tasks that have something to do with the infrastructure of the cloud, not the infrastructure
of our particular logical Datacenter, but the same building blocks exist in both of these.
However, if we – for example – open Compute, a completely different set of options exists:
The administrative view enables us to monitor our nodes on a more direct level, not only
through the services they provide, but also through raw data about a particular host and
the resources utilized on it:
The basic idea is to create flavors that will give individual users just enough resources to
get their job done in a satisfactory way. This is not obvious in a deployment that has 10
instances, but once we run into thousands, a flavor that always leaves 10 percent of the
storage unused is quickly going to eat into our resources and limit our ability to serve
more users. Striking this balance between what we have and what we give users to use in a
particular way is probably the hardest task in planning and designing our environments:
3. Select the size of a base disk, and an ephemeral disk that doesn't get included in any
of the snapshots and gets deleted when a virtual machine is terminated.
4. Select the amount of swap space.
5. Select the RX/TX factor so that we can create a bit of QoS on the network level.
Some flavors will need to have more network traffic priority than others.
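The same kind of flavor can also be created from the CLI; the name and values below are purely illustrative:
# 2 vCPUs, 4 GB RAM, 40 GB root disk, 10 GB ephemeral disk, 1 GB swap, RX/TX factor 1.5
openstack flavor create --vcpus 2 --ram 4096 --disk 40 --ephemeral 10 \
    --swap 1024 --rxtx-factor 1.5 m2.balanced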
OpenStack allows a particular project to have more than one flavor, and for a single
flavor to belong to different projects. Now that we've learned that, let's work with our
user identities and assign them some objects.
Identity management
The last tab on the left-hand side is Identity, which is responsible for handling users, roles,
and projects. This is where we are going to configure not only our usernames, but the user
roles, groups, and projects a particular user can use:
The point of this structure is to enable the administrator to limit the users not only to
what they can administer, but also to how many of the available resources are available for
a particular project. Let's use an example for this. If we go to Projects and edit an existing
project (or create a new one), we'll see a tab called Quota in the configuration menu,
which looks like this:
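The same quotas can be inspected and changed from the CLI as well; the project name and limits here are examples only:
openstack quota show demo
openstack quota set --instances 20 --cores 40 --ram 102400 --volumes 10 demo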
Integrating OpenStack with Ansible
When we are talking about deployment and target hosts, we need to make a clear
distinction: the deployment host is a single entity holding Ansible, scripts, playbooks,
roles, and all the supporting bits. The target hosts are the actual servers that are going
to be part of the OpenStack cloud.
The requirements for installation are straightforward:
Disk requirements are really up to you. OpenStack suggests using fast disks if possible,
recommending SSD drives in a RAID, and large pools of disks available for block storage.
• Infrastructure nodes have requirements that are different than the other types
of nodes since they are running a few databases that grow over time and need at
least 100 GB of space. The infrastructure also runs its services as containers, so
it will consume resources in a particular way that will be different than the other
compute nodes.
The deployment guide also suggests running a logging host since all the services create
logs. The recommended disk space is at least 50 GB for logs, but in production, this will
quickly grow by orders of magnitude.
OpenStack needs a fast, stable network to work with. Since everything in OpenStack
will depend on the network, every possible solution that will speed up network access
is recommended, including using 10G and bonded interfaces. Installing a deployment
server is the first step in the overall process, which is why we'll do that next.
Once the deployment host is installed, you need to configure SSH keys to be able to log
into the target hosts. This is an Ansible requirement and is also a best practice that improves
both security and ease of access. The network side is simple – our deployment host must have
connectivity to all the other hosts. The deployment host should also be attached to the Layer 2
network that is designed for container management.
Then, the repository should be cloned:
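A sketch of that step, assuming the upstream openstack-ansible repository and a branch that matches the release you want to deploy:
git clone -b stable/train https://opendev.org/openstack/openstack-ansible /opt/openstack-ansible
cd /opt/openstack-ansible
The Ansible bootstrap script shown next is then run from that directory: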
# scripts/bootstrap-ansible.sh
This concludes preparing the Ansible deployment server. Now, we need to prepare the
target computers we are going to use for OpenStack. Target computers are currently
supported on Ubuntu Server (18.04) LTS, CentOS 7, and openSUSE 42.x (at the time of
writing, there still isn't CentOS 8 support). You can use any of these systems. For each of
them, there is a helpful guide that will get you up and running quickly: https://docs.
openstack.org/project-deploy-guide/openstack-ansible/latest/
deploymenthost.html. We'll just explain the general steps to ease you into installing
it, but in all truth, just copy and paste the commands that have been published for your
operating system from https://www.openstack.org/.
No matter which system you decide to run on, you have to be completely up to date
with system updates. After that, install the linux-image-extra package (if it exists
for your kernel) and install the bridge-utils, debootstrap, ifenslave, lsof,
lvm2, chrony, openssh-server, sudo, tcpdump, vlan, and Python packages. Also,
enable bonding and VLAN interfacing. All these things may or may not be available for
your system, so if something is already installed or configured, just skip over it.
Configure the NTP time sync in chrony.conf to synchronize time across the whole
deployment. You can use any time source you like, but for the system to work, time must
be in sync.
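A minimal chrony.conf fragment might look like this; the pool servers are just an example time source:
# /etc/chrony.conf – all nodes should point at the same time source
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst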
Now, configure the ssh keys. Ansible is going to deploy using ssh and key-based
authentication. Just copy the public keys from the appropriate user on your deployment
machine to /root/.ssh/authorized_keys. Test this setup by simply logging in
from the deployment host to the target machine. If everything is okay, you should be able
to log in without any password or any other prompt. Also, note that the root user on the
deployment host is the default user for managing everything and that they have to have
their ssh keys generated in advance since they are used not only on the target hosts but
also in all the containers for different services running across the system. These keys must
exist when you start to configure the system.
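A sketch of that key setup, run from the deployment host; the target address is a placeholder:
# Generate a key pair for root (if one does not exist yet) and copy it to a target host
ssh-keygen -t ed25519 -N '' -f /root/.ssh/id_ed25519
ssh-copy-id [email protected]
# Verify passwordless login
ssh [email protected] hostname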
For storage nodes, please note that LVM volumes will be created on the local disks,
thus overwriting any existing configuration. Network configuration is going to be done
automatically; you just need to ensure that Ansible is able to connect to the target machines.
The next step is configuring our Ansible inventory so that we can use it. Let's do that now.
The inventory file, openstack_user_config.yml, defines which hosts run which services and
node roles. Before committing to installing, please review the openstack-ansible deployment
documentation referenced earlier.
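To give you an idea of its shape, here is a heavily trimmed, illustrative fragment of that file; the group names follow the openstack-ansible conventions, while the addresses are placeholders:
# /etc/openstack_deploy/openstack_user_config.yml (fragment)
shared-infra_hosts:
  infra1:
    ip: 172.29.236.11
compute_hosts:
  compute1:
    ip: 172.29.236.21
storage_hosts:
  storage1:
    ip: 172.29.236.31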
One configuration option that stands out is install_method. You can choose either source
or distro. Each has its pros and cons:
• Source is the simplest installation as it's done directly from the sources on the
OpenStack official site and contains an environment that's compatible with
all systems.
• The distro method is customized for the particular distribution you are installing
on by using specific packages known to work and known as being stable. The
major drawback of this is that updates are going to be much slower since not only
OpenStack needs to be deployed but also information about all the packages on
distributions, and that setup needs to be verified. As a result, expect long waits
between when the upgrade reaches the source and gets to your distro installation.
After installing, you must stick with your initial choice; there is no mechanism
for switching from one method to the other.
The last thing you need to do is open the user_secrets.yml file and assign passwords
for all the services. You can either create your own passwords or use a script provided just
for this purpose.
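The helper script lives in the openstack-ansible checkout; a sketch of running it looks like this:
cd /opt/openstack-ansible
./scripts/pw-token-gen.py --file /etc/openstack_deploy/user_secrets.yml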
All of these Ansible playbooks need to finish successfully so that we can integrate
Ansible with OpenStack. So, the only thing left is to run them, starting with the
following commands:
# openstack-ansible setup-hosts.yml
# openstack-ansible setup-infrastructure.yml
# openstack-ansible setup-openstack.yml
All the playbooks should finish without unreachable or failed plays. And with
that – congratulations! You have just installed OpenStack.
Summary
In this chapter, we spent a lot of time describing the architecture and inner workings
of OpenStack. We discussed software-defined networking and its challenges, as well as
different OpenStack services such as Nova, Swift, Glance, and so on. Then, we moved
on to practical issues, such as deploying Packstack (let's just call that OpenStack for
proof of concept), and full OpenStack. In the last part of this chapter, we discussed
OpenStack-Ansible integration and what it might mean for us in larger environments.
Now that we've covered the private cloud aspect, it's time to grow our environment
and expand it to a more public or hybrid-based approach. In KVM-based infrastructures,
this usually means connecting to AWS to convert your workloads and transfer them there
(public cloud). If we're discussing the hybrid type of cloud functionality, then we have
to introduce an application called Eucalyptus. For the hows and whys, check out the
next chapter.
Questions
1. What is the main problem with VLAN as a cloud overlay technology?
2. Which types of cloud overlay networks are being used on the cloud market today?
3. How does VXLAN work?
4. What are some of the most common problems with stretching Layer 2 networks
across multiple sites?
5. What is OpenStack?
6. What are the architectural components of OpenStack?
7. What is OpenStack Nova?
8. What is OpenStack Swift?
9. What is OpenStack Glance?
10. What is OpenStack Horizon?
11. What are OpenStack flavors?
12. What is OpenStack Neutron?
Further reading
Please refer to the following links for more information regarding what was covered in
this chapter:
Introduction to AWS
While talking about cloud services, AWS is one that needs almost no introduction,
although few people actually understand how big and complex a system the whole
Amazon cloud is. What is completely certain is that, at this time, it is unquestionably
the biggest and most used service on the planet.
Before we do anything else, we need to talk about how and why AWS is so important,
not only in regard to its impact on the internet but also on any task that even remotely
tries to provide some kind of scaling.
As we already have done a couple of times in this book, we will start with the basic
premise of the AWS cloud – to provide a broadly scalable and distributed solution that
will encompass all possible scenarios for performing any type of workload on the internet.
In almost every other place in this book where we mentioned the cloud, we talked about
scaling up, but when we try to describe AWS as being able to scale, we are talking about
probably one of the largest providers of capacity and scalability on the planet.
Right now, there are more or less four really big cloud providers on the internet: AWS,
Microsoft Azure, Google Cloud Platform, and Alibaba. Since all the numbers are
confidential for business reasons, the number of servers and sheer capacity they can
provide is something analysts try to estimate, or more frequently guess, but it has to
be in the millions.
This idea was different from anything on the market. People were used to having
collocated computers in data centers, and being able to rent a server in the cloud, but the
concept of renting just the part of the stack they needed instead of the entire hardware
infrastructure was something new. The first big service AWS offered was simple, and
was even named like that – Simple Storage Service (S3), along with Elastic Compute
Cloud (EC2). S3 was basically cloud-backed storage that offered almost unlimited storage
resources for those who could pay for them in pretty much the same way it is available
even today. EC2 offered computing resources.
Offerings expanded to a Content Delivery Network (CDN) and much more during the
next 6 years, while the competition was still trying to get to grips with what the cloud
actually meant.
We'll come back to the services AWS offers in a moment, but only after we mention the
competition they eventually got in what has become a market worth hundreds of billions
of dollars yearly.
Microsoft realized it would need to build up an infrastructure to support itself and its
customers sometime in the late 2000s. Microsoft had its own business support infrastructure
in place to run the company, but there were no public services offered to the general public
at that time. That changed once Microsoft Azure was introduced in 2010. Initially, it was
called Windows Azure, and it was mainly created to run services for both Microsoft and its
partners, mainly on Windows. Very quickly, Microsoft realized that offering just a Microsoft
operating system in the cloud was going to cost them a lot of customers, so Linux was also
offered as an operating system that could be installed and used.
Azure now runs as a publicly available cloud, but a large portion of it is still used by
Microsoft and its services, most notably Office 365 and a myriad of Microsoft training
and marketing solutions.
Google, on the other hand, came to the market in a different way. It also realized the need for a cloud offering but limited its first engagement with the cloud to a single service called App Engine, launched in 2008. It was a service targeted at the web developer community, and Google even gave out 10,000 free licenses for limited usage of the service. At its core, this service and almost all the services that came after it were built on the premise that the web needs services that enable developers to quickly deploy something that may or may not scale and that may or may not work. Therefore, giving it away for free meant that a lot of developers were inclined to use the service just for simple testing.
Google now also has a vast number of services offered, but when you take a look from
outside at the actual services and the pricing, it seems that Google has created its cloud
as a way to lease out extra capacity it has available in its data centers.
Multi-cloud
Looking a few years back, both Azure and Google Cloud Platform had a viable cloud service, but compared to what AWS was offering, their services were simply not up to par. AWS was the biggest player, both in terms of market share and in people's minds. Azure was considered more Microsoft-oriented, although more than half of the servers running on it are Linux-based, and Google just wasn't perceived as a competitor; their cloud looked more like a side business than a serious proposal for running a cloud.
Then came multi-cloud. The idea is simple – do not use a single cloud to deploy your services; use multiple cloud providers to gain data redundancy, availability, and failover, and – most importantly – cost reduction and agility. It may sound strange, but one of
the biggest costs when using a cloud service is getting data out of it. Usually, getting data
into the cloud, be it the user uploading data, or you deploying data on a server, is either
free or has an extremely low cost, which makes sense, since you are more likely to use
more services on this particular cloud if you have a lot of data online. Once you need to
extract your data, it becomes expensive. This is intentional, and it keeps users locked into
the cloud provider. Once you upload your data, it is much cheaper to just keep it on the
servers and not try to work with it offline. But the data is not the only thing that has to be
considered when talking about multi-cloud; services are also part of the equation.
Why multi-cloud?
Many companies (and we must stress that multi-cloud users are mostly big companies
because of the costs involved) are scared of being locked into a particular platform
or technology. One of the biggest questions is what happens if a platform changes
so much that the company has to redo part of its infrastructure? Imagine that you
are a multibillion-dollar company running an enterprise application for hundreds of
thousands of your own users. You chose the cloud for the usual reasons – to keep capital
expenditures down and to be able to scale your services. You decided to go with one
of the big providers. Suddenly, your provider decides it is going to change technologies
and will phase out some part of the infrastructure. When it comes to shifts like that, it usually means that your current setup will slowly become much more expensive, or you are going to lose some of the available functionality. Thankfully, these things also
typically stretch into years, as no sane cloud provider is going to go through a strategic
change overnight.
But a change is a change, and you as a corporation have a choice – stay with the provider
and face the consequences in the form of a much higher price for your systems – or
redesign the systems, which will also cost money, and may take years to finish – sometimes
decades.
So, a lot of companies decided on a very conservative strategy – to design a system that
could run on any cloud, and that means using the lowest common denominator of all
available technologies. This also means that the system can be migrated from cloud to
cloud in a matter of days. Some of them then decided to even run the system on different
clouds at the same time. This is an extremely conservative approach, but it works.
Another big reason to use a multi-cloud strategy is the complete opposite of the one that
we just mentioned. Sometimes, the idea is to use a particular cloud service or services
that are the best in a very specialized task. This means choosing different services from
different providers to perform separate tasks but to do it as efficiently as possible. In the
long run, it will also mean having to change providers and systems from time to time, but
if the core system that the company uses is designed with that in mind, this approach can
have its benefits.
Shadow IT
There is another way that a company can become a multi-cloud environment without even knowing it; this is usually called Shadow IT. If a company does not have a strict security policy and rules, some of the workers might start using services that are not part of the services that the company provides them with. It could be a cloud storage container, a videoconferencing system, or a mailing list provider. In bigger companies, it could even be that entire departments start using something from different cloud providers without even realizing it. All of a sudden, there is company data on a server that is outside of the scope that the company's IT covers or is able to cover.
One of the better examples of this particular phenomenon was how the usage of video conferencing services changed during the worldwide COVID-19 pandemic. Almost
all companies had an established communication system, usually a messaging system that
covered the whole company. And then, literally overnight, the pandemic put all workers
in their homes. Since communication is the crucial thing in running a company, everyone
decided to switch to video and audio conferencing in the span of a week, globally. What
happened next can and probably will become a bestselling book theme one day. Most
companies tried to stick with their solution but almost universally that attempt failed on
the first day, either because the service was too basic or too outdated to be used as both
an audio and video conferencing solution, or because the service was not designed for
the sheer volume of calls and requests and crashed.
People wanted to coordinate, so suddenly nothing was off the table. Every single video
conferencing solution suddenly became a candidate. Companies, departments, and teams
started experimenting with different conferencing systems, and cloud providers soon
realized the opportunity – almost all the services instantly became free for even sizeable
departments, allowing, in some cases, up to 150 people to participate in conferences.
A lot of services crashed due to demand, but the big providers were mostly able to scale up to the volume required to keep everything running.
Since the pandemic was global, a lot of people decided they also needed a way to talk to
their family. So individual users started using different services at home, and when they
decided something worked, they used it at work. In a matter of days, companies became
a multi-cloud environment with people using one provider for communication, another
for email, a third for storage, and a fourth for backups. The change was so quick that
sometimes IT was informed of the change a couple of days after the systems went online
and the people were already using them.
This change was so enormous that at the time we are writing this book, we cannot even try
to predict how many of these services are going to become a regular part of the company
toolset, once users realize something works better than the company-provided software.
These services further prove this point by being able to work continuously through a
major disaster like this, so there is only so much a company-wide software usage policy
can do to stop this chaotic multi-cloud approach.
Market share
One of the first things everyone brings up as soon as cloud computing companies and services are mentioned is the market share each of them has. We also need to address this point, since we have been referring to the biggest and second-biggest providers. Before multi-cloud became a thing, market share was basically divided between AWS, with the biggest share; Azure as a distant second; followed by Google and a big group of smaller providers, such as Alibaba, Oracle, and IBM.
Once multi-cloud became a thing, the biggest problem became how to establish who
had the biggest actual market share. All the big companies started using different cloud
providers and just simply trying to add up the market share of the providers became
difficult. From different polls, it is clear that Amazon is still the leading provider but that
companies are slowly starting to use other providers together with Amazon services.
What this means is that, right now, AWS is still by far the cloud provider of choice but the
choice itself is no longer about a single provider. People are using other providers as well.
There is a distinct company that has a big cloud presence but uses its own infrastructure
almost exclusively to deliver its own content – Facebook. Although it's hard to compare
infrastructure sizes in terms of the number of servers, data centers, or any other metric,
since those numbers are a closely guarded secret, Facebook has an infrastructure that is in
the same order of magnitude in size as AWS. The real difference is that this infrastructure
is not going to serve services for third parties, and in reality, it was never meant to do so;
everything that Facebook created was tailor-made to support itself, including choosing
locations for the data centers, configuring and deploying hardware, and creating software.
Facebook is not going to suddenly turn into another AWS; it's too big to do that. Available
infrastructure does not always correlate with cloud market share.
Pricing
Another topic we have to cover, if only to mention it, is pricing. Almost every
mention of the cloud in this book is technical. We compared probably every possible
metric that made any sense, from IOPS, through GHz, to PPS on the network side, but
the cloud is not only a technical problem – when you have to put it in use, someone has
to pay for it.
Pricing is a hot topic in the cloud world since the competition is fierce. All the cloud providers have their own pricing strategies, and rebates and special deals are almost the norm, all of which turns understanding pricing into a nightmare, especially if you are new to all the different models. One thing is certain: all the providers will say that they are going to charge you only for what you use, but defining what they actually mean by that can be a big problem.
When starting to plan the costs of deployment, you should first stop and try to define
what you need, how much of it you need, and whether you are using the cloud in the way
it is meant to be used. By far the most common mistake is to think that the cloud is in
any form similar to using a normal server. The first thing people notice is the price of a particular instance, in a particular data center, running a particular configuration. The price will usually be given either as a monthly cost for the instance – usually prorated, so you pay only for the part that you use – or per a different time unit: per day, per hour, or maybe even per second. This should be your first clue: you pay for using the instance, so in order to keep the costs down, do not keep your instances running all the time. This also means that your instances must be designed to be quickly brought up and down on demand, so the standard approach of installing one or more servers and running them all the time is not necessarily a good option here.
When choosing instances, the options are literally too numerous to name here. All the
cloud providers have their own idea of what people need, so you can not only choose
simple things such as the number of processors or the amount of memory but also get an
OS preinstalled, and get wildly varied types of storage and networking options. Storage
is an especially complicated topic we are just going to quickly scratch the surface of here
and only mention later. All the cloud providers offer two things – some sort of storage
meant to be attached to instances, and some sort of storage that is meant to be used as a
service. What a given provider offers can depend on the instance you are trying to request,
the data center you are trying to request it in, and a number of other factors. Expect that
you will have to balance three things: capacity, pricing, and speed. Of course, here, we are talking about instance storage. Storage as a service is even more complicated, and there you have to think not only about pricing and capacity but also about other factors such as latency, bandwidth, and so on.
For example, AWS enables you to choose from a variety of services that go from database
storage, file storage, and long-term backup storage, to different types of block and object
storage. In order to use these services optimally, you need to first understand what is being
offered, how it's being offered, what the different costs involved are, and how to use them
to your advantage.
Another thing that you will notice quickly is that when the cloud providers said that everything is a service, they really meant it. It is completely possible to have a running environment built entirely out of managed services, without administering a single server yourself.
The other thing to mention when talking about pricing is that everything costs a little, but
the composite price for a given configuration can be huge. Any additional resource will
cost money. An internal link between servers, an external IP address, a firewall, a load balancer, a virtual switch – these are all things that we usually don't even think about when designing infrastructure, but once we need them in the cloud, they can become expensive.
Another thing to expect is that some of the services have different contexts – for example,
network bandwidth can have a different price if you are transferring data between
instances or to the outside world. The same goes for storage – as we mentioned earlier in
this chapter, most providers will charge you different prices when storing and getting data
out of the cloud.
Data centers
A couple of times in this chapter, we have mentioned data centers, and it is important that we talk a little bit about them. Data centers are at the core of the cloud infrastructure, in more ways than you may think. When we talked about a lot of servers, we mentioned that we usually group them into racks, and put the racks into data centers. As you are probably aware, a data center is in essence a group of racks with all the infrastructure that servers need to function optimally, in terms of power and data, but also when it comes to cooling, security, and all the other things required to keep the servers running. They also need to be logically divided into risk zones that we usually call fault domains, so that we can avert the various risks associated with the "we deployed everything on one rack" or "we deployed everything on one physical server" scenarios.
A data center is a complex infrastructure element in any scenario since it requires a
combination of things to be efficient, secure, and redundant. Putting a group of servers
in a rack is easy enough, but providing cooling and power is not a simple task. Add to that the fact that the cooling, power, and data all have to be redundant if you want your servers to work, and that all of it needs to be secured against fires, floods, earthquakes, and people, and the cost of running a real data center can be high. Of course, a data center running a couple of hundred servers is not as complex as one running thousands or even tens of thousands, and the prices rise with the size of the facility. Add to that the fact that having multiple data centers creates additional infrastructure challenges in connecting them, so the costs add up.
Now multiply that cost by a hundred, since that is the number of data centers each of the big cloud providers keeps around the world. Some of the centers are small, some are huge, but the name of the game is simple – networking. In order to be a truly global provider,
all of them have to have a data center, or a couple of servers at least, as close to you as
possible. If you are reading this in one of the bigger cities in almost any big country in the
world, chances are there is an AWS, Microsoft, or Google-owned server in a radius of 100
miles from you. All the providers try to have at least one data center in every big city in
every country since that can enable them to offer a range of services extremely quickly.
This concept is called a Point of Presence (PoP) and means that when connecting to the
provider's cloud, you just need to get to the nearest server, and after that, the cloud will
make sure your services are as quick as possible.
But when we are talking about data centers that actually belong to Amazon or the others,
we are still dealing with a large-scale operation. Here, numbers are in the hundreds, and
their location is also a secret, mainly for security reasons. They all have a few things in
common. They are a highly automated operation situated somewhere in the vicinity of
a major power source, a major source of cooling, or a major data hub. Ideally, they would
be placed in a spot that has all three of those things, but that is usually not possible. Choosing a good location for a data center can mean a lot of cost savings. Some even go to extremes – Microsoft, for instance, has a data center completely submerged in the ocean to facilitate cooling.
When providing a service for a particular user, your main concern is usually speed and
latency, and that in turn means that you want your server or your service to run in the
data center that is closest to the user. For that purpose, all cloud providers divide their
data centers geographically, which in turn enables administrators to deploy their services
in the optimal part of the internet. But at the same time, this creates a typical problem
with resources – there are places on the planet that have a small number of available data
centers but are heavily populated, and there are places that are quite the opposite. This in
turn has a direct influence on the price of resources. When we talked about pricing, we
mentioned different criteria; now we can add another one – location. The location of a
data center is usually given as a region. This means that AWS, or any other provider for
that matter, is not going to give you the location of their data center, but instead will say
users in this region would be best served by this group of servers. As a user, you have no idea
where the particular servers are, but instead, you only care about the region as given to
you by the provider. You can find the names of service regions and their codes here:
Figure 13.1 – Service regions on AWS with the names used in the configuration
Choosing a service provided by a region that is in heavy demand can be expensive, and
that same service can be much cheaper if you choose a server that is somewhere else. This
is the beauty of the cloud – you can use the services that suit you and your budget.
Sometimes price and speed are not the most important things. For example, a legal framework such as GDPR, a European regulation on personal data collection, processing, and movement, basically states that companies from Europe must use a data center in Europe since they are covered by the regulation. Using a US region in this case
could mean that a company could be legally liable (unless the company running this cloud
service is a part of some other framework that allows this – such as Privacy Shield).
AWS services
We need to talk a little bit about what AWS offers in terms of services since understanding
services is one thing that will enable you to use the cloud appropriately. On AWS, all the
available services are sorted into groups by their purpose. Since AWS has hundreds of
services, the AWS management console, the first page you will see once you log in, will
at first be a daunting place.
You will probably be using the AWS Free Tier for learning purposes, so the first step is to
actually open an AWS Free account. Personally, I used my own account. For the
Free account, we need to use the following URL: https://aws.amazon.com/free/,
and follow along with the procedure. It just asks for a couple of pieces of information,
such as email address, password, and AWS account name. It will ask you for credit card
info as well, to make sure that you don't abuse the AWS account.
After signing up, we can log in and get to the AWS dashboard. Take a look at this screenshot:
Every single thing here is a link, and they all point to different services or pages with
subservices. Moreover, this screenshot shows only about a third of all available services.
There is no point in covering them all in this book; we are just going to use three of all
these to show how AWS connects to our KVM infrastructure, but once you get the hang
of it, you will slowly begin to understand how everything connects and what to use in
a particular moment. What really helps is that AWS has great documentation, and that
all the different services have provisioning wizards that help you find the thing you are
looking for.
In this particular chapter, we are going to use these three services: IAM, EC2, and S3.
All of these are of course abbreviations, but other services just use project names, such as
CloudFront or Global Accelerator. In any case, your first order of business should be to start using them, not just read about them; it is much easier to understand the structure once you put it to good use.
In this chapter, we used a Free account, and almost everything we did was free, so there
is no reason for you not to try to use the AWS infrastructure yourself. AWS tries to be
helpful there as much as it can, so if you scroll down on the console page, you will find
these helpful icons:
Figure 13.3 – Some AWS wizards, documentation, and videos – all very helpful
All of these are simple scenarios that will get you up and running in a couple of minutes,
for free. Amazon realizes that first-time users of the cloud are overwhelmed with all the
choices, so they try to get your first machine running in a couple of minutes to show you
how easy it is.
Let's get you acquainted with the services we are going to use, which we're going to do
by using a scenario. We want to migrate a machine that was running in our local KVM
installation into Amazon AWS. We are going to go through the whole process step by step,
but we first need to understand what we need. The first thing, obviously, is the ability to
run virtual machines in the cloud. In the AWS universe, this is EC2 or Amazon Elastic
Compute Cloud in full.
EC2
EC2 is one of the few real core services that basically runs everything there is to run in
the AWS cloud. It is a scalable computing capacity provider for the whole infrastructure.
It enables running different instances or virtual computing environments, using various
configurations of storage, memory, CPU, and networking, and it also provides everything
else those instances need, including security, storage volumes, zones, IP addresses, and
virtual networks. Some of these services are also available separately in case you need more complex scenarios – for example, a lot of different storage options exist – but the core functionality for the instances is provided by EC2.
S3
The full name of this service is actually Amazon Simple Storage Service, hence the name
Amazon S3. The idea is to give you the ability to store and retrieve any amount of data,
anytime you need it, using one or more of the methods offered. The most important
concept we are going to use is an S3 bucket. A bucket is a logical storage element that
enables you to group objects you store. Think of it as a name for a storage container you
will later use to store things, whatever those things may be. You can name your buckets
however you want, but there is a thing we must point out – the names of buckets have to
be globally unique. This means that when you name a bucket, it must have a name that is
not repeated anywhere else in any of the regions. This makes sure that your bucket will
have a unique name, but it also means that trying to create a generic-sounding name
such as bucket1 or storage is probably not going to work.
Once you create a bucket, you can upload and download data from it using the web, a
CLI, or an API. Since we are talking about a global system, we must also point out that
data is stored in the region you specify when creating the bucket, and is kept there unless
you specify you want some form of multi-region redundancy. Have that in mind when
deploying buckets, since once you start using the data in the bucket, your users or your
instances need to get the data, and latency can become a problem. Due to legal and privacy
concerns, data never leaves your dedicated region unless you explicitly specify otherwise.
A bucket can store any number of objects, but there is a limit of 100 buckets per account.
If that is not enough, you can request (and pay) to have that limit raised to 1,000 buckets.
Also, take a close look at the other options for storing and moving data – there are different types of storage that may or may not fit your needs and budget, such as S3 Glacier, which offers much cheaper options for storing large amounts of data, but is expensive if you need to get the data out.
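For reference, once the AWS CLI is configured (we will do that later in this chapter), a bucket can also be created straight from the command line. A minimal sketch – the bucket name and region here are just placeholders, and the name still has to be globally unique:
aws s3 mb s3://my-globally-unique-bucket-name --region eu-central-1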
IAM
AWS Identity and Access Management (IAM) is the service we need to use since it enables
access management and permissions for all the objects and services. In our example, we are
going to use it to create policies, users, and roles necessary to accomplish our task.
Other services
There is simply no way to briefly describe all the services AWS offers. We mentioned only the ones that were necessary and tried to point you in the right direction.
It is up to you to try and see what your usage scenario is, and how to configure whatever
satisfies your particular needs.
So far, we have explained what AWS is and how complex it can become. We have also
mentioned the most commonly used parts of the platform and started explaining what
their functions are. We are going to expand on that as we actually migrate a machine
from our local environment into AWS. This is going to be our next task.
If you actually try to do it, you will quickly understand that, without a basic knowledge of the way AWS works, you will not be able to follow the instructions. This is why we chose this simple task as an example of using AWS to quickly create a working VM in the cloud.
We only need to convert the image file containing the disk image for the system; other data was left out of the installation of the VM. Keep in mind that we are converting to a format that has no compression, so the file size can increase significantly:
Figure 13.6 – The conversion process and the corresponding capacity change
We can see that our file increased from 42 MB to 8 GB just because we had to remove
the advanced features qcow2 offers for data storage. The free tier offers only 5 GB of
storage, so please make sure to configure the raw image size correspondingly.
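The conversion itself boils down to a single qemu-img call; as a sketch, assuming the source image is named deploy1.qcow2 (the deploy1.raw name matches the file we upload later in this chapter):
# convert the qcow2 disk image to an uncompressed raw image
qemu-img convert -f qcow2 -O raw deploy1.qcow2 deploy1.raw
# check the resulting file and its virtual size
qemu-img info deploy1.raw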
Our next obvious step is to upload this image to the cloud, since the final conversion into an EC2 image is done there. Here, you can use different methods – either the GUI or the CLI (the API is also an option); we are going to use the AWS CLI.
Before we can use the AWS CLI, we need to configure it. This tool has to know how
it is going to connect to the cloud, which user it's going to use, and has to have all
the permissions granted in order for the tool to be able to get the data uploaded to
AWS, and then imported and converted into the EC2 image. Since we do not have
that configured, let's switch to the GUI on AWS and configure the things we need.
Important note
From now on, if something looks edited in the screenshots, it probably is. AWS puts a lot of personal and account data on the screen, so we have redacted it.
4. We will go into Identity and Access Management or IAM, which looks like the
following screenshot. Go to Services | Security, Identity, & Compliance and click on IAM. This is what you should see:
We need to choose Users on the left side of the screen. This will give us access to
the user console. We will create a user named Administrator, a group named
administrators, and apply appropriate permissions to them. Then we are going
to join the user to the group. On the first screen, you can choose both options,
Programmatic access and AWS Management Console access. The first one enables
you to use the AWS CLI, and the second one enables the user to log in to the
management console if we need this account to configure something. We choose
only the first one but will add the API key later:
We can assign appropriate policies directly to the user, but having policies assigned
to groups, and then assigning users to appropriate groups is a better option, saving
a lot of time when we need to remove some permissions from users. Once you
click the Create group button on the left, you will be able to name and create
the group. Below the name box are a lot of predefined policies that we can use to
configure a strict user policy. We can also create custom policies, but we are not
going to do that:
The next step is the tags: you can create different attributes or tags that can be used
later for identity management.
5. Tagging can be done by using almost anything – name, email, job title, or whatever
you need. We are going to leave this empty:
Figure 13.16 – Reviewing the user configuration with group, policy, and tag options
Accept these and add the user. You should be greeted with a reassuring green
message box that will give you all the relevant details about what just happened.
There is also a direct link to the console for management access, so you can share
that with your new user:
A couple of things need to be said about this key. There are two fields – one is the key identifier, called the Access key ID; the other is the secret part of the key, called the Secret access key. In terms of security, this is completely the same as having a username and
password for a particular user. You are given only one opportunity to see and download
the key, and after that, it is gone. This is because we are dealing with hashed information
here, and AWS is not storing your keys, but hashes of them. This means there is no way
to retrieve a key if you didn't save it. It also means if somebody grabs a key, let's say
by reading it off a screenshot, they can identify themselves as the user that has the key
assigned. The good thing is that you can create as many keys as you want and revoking
them is only a question of deleting them here. So, save the key somewhere safe:
We are finished with the GUI for now. Let's go back and install the AWS CLI:
1. We just need to start the installation script and let it finish its job. This is done by
starting the file called install in the aws directory:
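As a sketch – assuming the standard AWS CLI version 2 bundle for Linux x86_64, which unpacks into an aws directory containing an install script – the whole sequence, including the initial configuration with the access key we saved earlier, looks like this:
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o awscliv2.zip
unzip awscliv2.zip
sudo ./aws/install
# the CLI ends up in /usr/local/bin/aws; configure it with the
# Access key ID, Secret access key, default region, and output format
/usr/local/bin/aws configure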
2. Select S3 as the service in the console. There is a button at the top right labeled
Create bucket – click it. The following screen will appear. Now create a bucket that
is going to store your virtual machine in its raw format. In our case, we labeled the
bucket importkvm but choose a different name. Make sure that you take note of
the region pull-down menu – this is the AWS location where your resource will
be created. Remember that the name has to be unique; if everybody who bought
this book tried to use this name, only the first one would succeed. Fun fact: if by
the time you read this, we haven't deleted this bucket, nobody will be able to create
another with the same name, and only those of you reading this exact sentence will
understand why. This wizard is quite big in terms of screen real estate and might not fit on a single book page, so let's split it into two parts:
Figure 13.21 – Wizard for creating an S3 bucket – selecting bucket name and region
OK, having done that, it's time to wait for the next command to finish. In the next step, we are going to use the AWS CLI to copy the .raw file onto S3.
Important note
Depending on the type of account, from this point on, it is possible that we will have to pay for some of the services that we create, since they may go over the free tier enabled on your account. If you do not enable anything expensive, you should be fine, but always take a look at your Cost management dashboard and check that you are still in the black.
Figure 13.24 – Using the AWS CLI to copy a virtual machine raw image to an S3 bucket
That's the end result. Since we are talking about 8 GB of data, you will have to wait
for some time, depending on your upload speed. The syntax for the AWS CLI is
pretty straightforward. You can use most of the Unix commands that you know – both ls and cp do their job. The only thing to remember is to give your bucket name
in the following format as the destination: s3://<bucketname>.
2. After that, we do an ls – it will return the bucket names, but we can list their contents
by using the bucket name. In this example, you can also see it took us something like
15 minutes to transfer the file from the moment we created the bucket:
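As a sketch, the upload and the listing from the previous figures boil down to commands along these lines (the bucket name is ours; substitute your own):
# upload the raw image into the bucket
/usr/local/bin/aws s3 cp deploy1.raw s3://importkvm/
# list all buckets, then list the contents of ours
/usr/local/bin/aws s3 ls
/usr/local/bin/aws s3 ls s3://importkvm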
And now the fun part starts. We need to import the machine into EC2. To do that, we need to do a few things before we are able to do the conversion. The problem
is related to permissions – AWS services are unable to talk to each other by default.
Therefore, you have to give explicit permission to each of them to do the importing.
In essence, you have to let EC2 talk to S3 and get the file from the bucket.
3. For the import, we will introduce another AWS concept – .json files. A lot
of things in AWS are stored in .json format, including all the settings. Since the
GUI is rarely used, this is the quickest way to communicate data and settings, so
we must also use it. The first file we need is trust-policy.json, which we are
using to create a role that will enable the data to be read from the S3 bucket:
{
    "Version":"2012-10-17",
    "Statement":[
        {
            "Effect":"Allow",
            "Principal":{
                "Service":"vmie.amazonaws.com"
            },
            "Action":"sts:AssumeRole",
            "Condition":{
                "StringEquals":{
                    "sts:ExternalId":"vmimport"
                }
            }
        }
    ]
}
Just create a file with the name trust-policy.json, and get the preceding
code typed in. Do not change anything. The next one up is the file named
role-policy.json. This one has some changes that you have to make. Take
a closer look inside the file and find the lines where we mention our bucket name
(importkvm). Delete our name and put the name of your bucket instead:
{
    "Version":"2012-10-17",
    "Statement":[
        {
            "Effect":"Allow",
            "Action":[
                "s3:GetBucketLocation",
                "s3:ListBucket",
                "s3:GetObject"
            ],
            "Resource":[
                "arn:aws:s3:::importkvm",
                "arn:aws:s3:::importkvm/*"
            ]
        },
        {
            "Effect":"Allow",
            "Action":[
                "s3:GetObject",
                "s3:GetBucketLocation",
                "s3:ListBucket",
                "s3:GetBucketAcl",
                "s3:PutObject"
            ],
            "Resource":[
                "arn:aws:s3:::importkvm",
                "arn:aws:s3:::importkvm/*"
            ]
        },
        {
            "Effect":"Allow",
            "Action":[
                "ec2:ModifySnapshotAttribute",
                "ec2:CopySnapshot",
                "ec2:RegisterImage",
                "ec2:Describe*"
            ],
            "Resource":"*"
        }
    ]
}
Now it's time to put it all together and finally upload our virtual machine to AWS.
4. Execute these two commands and disregard the line wrapping shown here – both of them are one-liners, and the filename is the last part of each command:
/usr/local/bin/aws iam create-role --role-name vmimport
--assume-role-policy-document file://trust-policy.json
/usr/local/bin/aws iam put-role-policy --role-name
vmimport --policy-name vmimport --policy-document file://
role-policy.json
The last file we need tells the import service where the uploaded disk image is located inside the bucket:
[
    {
        "Format": "raw",
        "UserBucket": {
            "S3Bucket": "importkvm",
            "S3Key": "deploy1.raw"
        }
    }
]
As you can see, there is nothing special in the file, but when you create your own version, make sure you use your own bucket name and the name of the disk image that is stored inside the bucket. Name the file whatever you want, and use that name to call the import process:
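The import call itself looks along these lines – the containers.json filename and the description text are just the names we picked for this example:
/usr/local/bin/aws ec2 import-image --description "KVM VM import" --disk-containers file://containers.json
# the import runs asynchronously; its progress can be checked with
/usr/local/bin/aws ec2 describe-import-image-tasks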
1. Go to your EC2 console. You should be able to find the image under AMIs on the
left side:
Figure 13.29 – Our AMI has been uploaded successfully and we can see it in the EC2 console
Now click the big blue Launch button. There are a couple of steps you need to
finish before your instance is running, but we are almost there. First, you need to
choose your instance type. This means choosing what configuration fits your needs,
according to how much of everything (CPU, memory, and storage) you need.
2. If you are using a region that is not overcrowded, you should be able to spin up a free tier instance type that is usually called t2.micro and is clearly marked. In your
free part of the account, you have enough processing credits to enable you to run
this machine completely free:
3. EC2 is going to put this key into the appropriate accounts on the machine you
are just creating (all of them), so you can log in without using the password. A key
pair is generated if you choose to do so, but Amazon will not store it – you have
to do that:
To connect, you need to use the key pair provided to you, and you need an ssh client.
Alternatively, you can use the embedded ssh that AWS provides. In any case, you need the
outside address of the machine, and AWS also provides that, along with simple instructions:
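From a Linux shell, the connection is a standard ssh call – the key file name is the one you downloaded when creating the key pair, and the username and address are placeholders that depend on what exists inside your imported VM and on what AWS assigned to the instance:
# restrict permissions on the downloaded private key, then connect
chmod 400 mykeypair.pem
ssh -i mykeypair.pem <username>@<instance-public-IP-or-DNS>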
That's it. You have successfully connected to your machine. You can even keep it running.
But be careful if you have accounts or services that are enabled by default or have no password – you have, after all, pulled a VM out of your safe home sandbox and stuck it on the big, bad internet. And one last thing: after you have your VM running, delete the file in the bucket to save some resources (and money). After conversion, this file is no longer needed.
The next topic on our list is how to extend our local cloud environments into hybrid
cloud environments by using an application called Eucalyptus. This is a hugely popular
process that a lot of enterprise companies go through as they scale beyond their local infrastructure. Also, this offers benefits in terms of scalability when
needed – for example, when a company needs to scale its testing environment so an
application that its employees are working on can be load-tested. Let's see how it's done
via Eucalyptus and AWS.
Eucalyptus mimics the AWS way of doing things – with buckets and all of that. This is on purpose, and with consent from Amazon. Having
an environment like this is a great thing for everybody since it creates a safe space for
developers and companies to deploy and test their instances.
Eucalyptus consists of several parts:
• Cloud Controller (CLC) is the central point of the system. It provides both the EC2-compatible and the web interfaces, and every task is routed through it. It is there to provide scheduling, allocation of resources, and accounting. There is one of these per cloud.
• Cluster Controller (CC) is the part that manages each individual node and controls
VMs and their execution. One is running in each availability zone.
• Storage Controller (SC) is there to provide block-level storage, and to provide
support for instances and snapshots but within the cluster. It is similar to EBS
storage from AWS.
• Node Controller (NC) hosts instances and their endpoints. One is running for
each node.
When you get to know Eucalyptus, you'll notice that it has a vast array of capabilities. It
can do the following:
• Work with volumes, instances, key pairs, snapshots, buckets, images, network
objects, tags, and IAM.
• Work with load balancers.
• Work with AWS as an AWS integration tool.
These are just some of the features worth mentioning at the start of your Eucalyptus journey.
Eucalyptus has a command-line interface called Euca2ools, available as a package for all the major Linux distributions. Euca2ools provides
full API and CLI compatibility between AWS and Eucalyptus. This means that you can use
a single tool to manage both and to perform hybrid cloud migrations. The tool is written
in Python, so it is more or less platform-independent. If you want to learn more about this
interface, make sure that you visit https://wiki.debian.org/euca2ools.
• If you try to install on a machine that has less than 16 GB of RAM, the installation
will probably fail.
• The installation will, however, succeed on a machine with a smaller disk size than
the minimum recommended, but you will almost immediately run out of disk space
as soon as you start getting the deployment images installed.
For production, everything changes – the minimums are 160 GB for the storage, or 500 GB
of storage for nodes that are going to run Walrus and SC services. Nodes must run on bare
metal; nested virtualization is not supported. Or, to be more precise, it will work but will
negate any positive effect that the cloud can provide.
Having said all that, we have another point to make before you start installing – check for
the availability of a new version, and have in mind that it is quite possible that there is a
newer release than the one that we are working on in this book.
Important note
At the time of writing, the current version was 4.4.5, with version 5 being
actively worked on and close to being released.
Having installed your base operating system – and it has to be a core system without a GUI – it's time to do the actual Eucalyptus installation. The whole system is installed
using FastStart, so the only thing we have to do is to run the installer from the internet.
The link is helpfully given on the front page of the following URL for the project –
https://eucalyptus.cloud.
The installation looks strange if you're seeing it for the first time. It kind of reminds us
of some text-based games and services that we used in the 1990s (MUD, IRC):
Once installed, Eucalyptus will provide you with a default set of credentials:
Important note
In the event that the current installer breaks, the solutions to the subsequent
problems are in the CiA video here: <video_URL>. There are known bugs, and they may or may not be solved before this book hits the stores. Make sure that
you check https://eucalyptus.cloud and documentation before
installation.
The information is case sensitive. Having finished the installation, you can connect to
the machine using a web browser and log in. The IP address you are going to use is the
IP address of the machine you just installed:
On the other hand, by the time you read this, the problem may already be fixed. The
tutorial on how to do this and all of its parts is offered in plain text mode on the machine
Eucalyptus is running on. It's not available in the GUI:
Figure 13.38 – Using the master tutorial to learn how to configure Eucalyptus
As you can see, everything is explained in detail, so you can really learn key concepts in a
short amount of time. Go through the tutorial – it is well worth it. Another thing you can
do from the command line as soon as you start up the system is to download a couple of
new template images. The script for this is also started from the web, and is written in big
letters on the official site, literally on the landing page located at the following URL (make
sure that you scroll down a bit) – https://www.eucalyptus.cloud/:
Figure 13.40 – Simple menu asking us to select which image we want to install
Choose one at a time, and they will get included in the image list.
Now let's switch to the web interface to see how it works. Log in using the credentials
written above. You will be greeted with a well-designed dashboard. On the right, there
are groups of functionalities that are most commonly used. The left part is reserved for
the menu that holds links to all the services. The menu will autohide as soon as you move
your mouse away from it, leaving only the most essential icons:
1. In the left part of the stack of services, there is an inviting green button labeled
Launch instance – click on it. A list of the images that are available on the system
will appear. We already used the script to grab some cloud images, so we have
something to choose from:
After clicking the LAUNCH INSTANCE button, your machine should boot. For
testing purposes, we already started another machine earlier, so right now we have
two of them running:
We will stop here with the AWS integration. The point of this chapter was, after all, to
get you to see how Eucalyptus connects to AWS. You might notice that this interface lacks some functionality that AWS has, but at the same time it can be more than enough to control a basic mid-size infrastructure – buckets, images, and instances – from one place. Having tested the 5.0 beta 1 version, we can definitely tell you that the full 5.0 version should be quite a substantial upgrade when it comes out. The beta version already has many additional options, and we're rather excited to see the full release.
Summary
In this chapter, we covered a lot of topics. We introduced AWS as a cloud solution and did some cool things with it – we converted our virtual machine so that we could run it in AWS, and made sure that everything works. Then we moved to Eucalyptus, to check how we can
use it as a management application for our local cloud environment, and how to use it
to extend our existing environment to AWS.
The next chapter takes us into the world of monitoring KVM virtualization by using
the ELK stack. It's a hugely important topic, especially as companies and infrastructures
grow in size – you just can't keep up with the organic growth of IT services by manually
monitoring all possible services. The ELK stack will help you with that – just how much,
you'll learn in the next chapter.
Questions
1. What is AWS?
2. What are EC2, S3, and IAM?
3. What is an S3 bucket?
4. How do we migrate a virtual machine to AWS?
5. Which tool do we use to upload a raw image to AWS?
6. How do we authenticate ourselves to AWS as a user?
7. What is Eucalyptus?
8. What are the key services in Eucalyptus?
9. What are availability zones? What are fault domains?
10. What are the fundamental problems of delivering Tier-0 storage services for
virtualization, cloud, and HPC environments?
Further reading
Please refer to the following links for more information regarding what was covered in
this chapter:
euca2ools-guide/index.html
14
Monitoring the KVM Virtualization Platform
When you move away from an environment that only has a couple of objects to manage
(for example, KVM hosts) to an environment that has hundreds of objects to manage, you
start asking yourself very important questions. One of the most prominent questions is,
How am I going to monitor my hundreds of objects without doing a lot of manual work and
with some GUI reporting options? And the answer to that question is the Elasticsearch,
Logstash, Kibana (ELK) stack. In this chapter, we'll see what these software solutions can
do for you and your KVM-based environment.
Behind those cryptic names are technologies that are here to solve a lot of problems you
might have when running more than one server. Although you can run the ELK stack
to monitor one service, it makes no sense to do so. The advice and solutions provided in
this chapter are applicable to all projects involving multiple devices and servers, not only
those running on KVM but, in essence, anything that is capable of producing any kind of
logging. We will start with the basics of how to monitor KVM as a virtualization platform
in general. Then, we'll move on to the ELK stack, including its building blocks and
installation, before moving on to its advanced configuration and customization.
the amount of disk space left, devices that are connecting and disconnecting, and more.
When we start running any useful task, the logs will only get larger.
Having a good, verbose log means that we can find out what is going on with the system at this instant: is it running correctly, and do we need to do something to make it run
better? If something unexpected happens, logs can help us determine what is actually
wrong and point us in the direction of the solution. Correctly configured logs can even
help us spot errors before they start to create problems.
Suppose you have a system that is getting slower and slower week after week. Let's
further suppose that our problem is with the memory allocation of an app we installed
on the system. But let's also suppose that this memory allocation is not constant, and
instead varies with the number of users using the system. If you take a look at any point
in time, you may notice the number of users and memory allocated. But if you just take
measurements at different times, you will have a hard time understanding what kind of
correlation there is between the memory and the number of users – will the amount of
memory allocated be linear to the number of users or will it behave exponentially? If we
can see that 100 users are using 100 MB of memory, does that mean that 1,000 users will
use 1,000 MB?
But let's suppose that we are logging the amount of memory and the number of users at
equally spaced intervals.
We are not doing anything complicated; every couple of seconds, we are writing down
the time of the measurement, the amount of memory allocated, and the number of users
using the system. We are creating something called a dataset, consisting of data points.
Using data points is no different than what we did in the preceding example, but once
we have a dataset, we can do trend analysis. Basically, instead of looking at a slice of the
problem, we can analyze different time segments and compare the number of users and
what the amount of memory they were using actually was. That will give us important
information about how our system is actually using our memory and at what point we
had a problem, even if we don't have a problem right now.
This approach can even help us find and troubleshoot problems that are non-obvious,
such as a backup that is taking too long to finish once a month and works normally the
rest of the time. This kind of capability that enables us to spot trends and analyze data
and system performance is what logging is all about.
Put simply, any kind of monitoring boils down to two things: collecting data from the
thing we are trying to monitor and analyzing that data.
Monitoring can be either online or offline. Online monitoring is useful when we are trying to create some sort of alerting system or when we are trying to establish a self-correcting system that will be able to respond to changes in the process. Then, we can either try to
correct problems or shut down or restart the system. Online monitoring is usually used by
the operations team in order to make sure that everything is running smoothly and that
the problems the system may have are logged.
Offline monitoring is much more complicated. Offline monitoring enables us to gather all
the data into logs, analyze these logs later, and extrapolate trends and figure out what can
be done to the system to make it better. But the fact of the matter is that it's always delayed
in terms of real-time activity since the offline methodology requires us to download and
then analyze the logs. That's why we prefer real-time log ingestion, which is something
that needs to be done online. That's why learning about the ELK stack is so important.
By fitting all these small pieces – real-time log ingestion, search, analytics, and reports –
into one larger stack, ELK makes it easier for us to monitor our environment in real time.
Let's learn how.
Elasticsearch
The first component that was created and that got traction in the community was
Elasticsearch, created to be a flexible, scalable system for indexing and searching large
datasets. Elasticsearch was used for thousands of different purposes, including searching
for specific content in documents, websites, or logs. Its main selling point and the reason
a lot of people started using it is that it is both flexible and scalable, and at the same time
extremely fast.
When we think of searching, we usually think about creating some kind of query and
then waiting for the database to give us back some form of answer. In complex searches,
the problem is usually the waiting since it is exhausting having to tweak our queries and
wait for them to produce results. Since a lot of modern data science relies on the concept
of non-structured data, meaning that a lot of data that we need to search has no fixed
structure, or no structure at all, creating a fast way to search inside this pool of data is
a tough problem.
Imagine you need to find a certain book in a library. Also, imagine you do not have a
database of all the books, authors, publishing information, and everything else that a
normal library has; you are only allowed to search through all the books themselves.
Having a tool that is able to recognize patterns in those books and that can tell you the
answer to questions such as who wrote this book? or how many times is KVM mentioned
in all the books that are longer than 200 pages? is a really useful thing. This is what a good
search solution does.
Being able to search for a machine that is running the Apache web server and has
problems with a certain page requested by a certain IP address is essential if we want
to quickly and efficiently administer a cluster or a multitude of clusters of physical and
virtual servers.
The same goes for system information when we are monitoring even a single point of
data, such as memory allocation across hundreds of hosts. Even presenting that data is
a problem and searching for it in real time is almost impossible without the right tool.
Elasticsearch does exactly that: it creates a way for us to quickly go through enormous
amounts of barely structured data and then comes up with results that make sense. What
makes Elasticsearch different is its ability to scale, which means you can use it to create
search queries on your laptop, and later just run them on a multi-node instance that
searches through a petabyte of data.
Elasticsearch is also fast, and this is not something that only saves time. Having the ability
to get search results faster gives you a way to learn more about your data by creating and
modifying queries and then understanding their results.
Since this is just a simple introduction to what ELK actually does, we will switch to the
next component, Logstash, and come back to searching a bit later.
Logstash
Logstash has a simple purpose. It is designed to be able to digest any number of logs and
events that generate data and store them for future use. After storing them, it can export
them in multiple formats such as email, files, HTTP, and others.
What is important about how Logstash works is its versatility in accepting different input
streams. It is not limited to using only logs; it can even accept things such as Twitter feeds.
Kibana
The last part of the old ELK stack is Kibana. If Logstash is storage and Elasticsearch is
for computing, then Kibana is the output engine. Simply put, Kibana is a way to use the
results of Elasticsearch queries to create visually impressive and highly customizable
layouts. Although the output of Kibana is usually some kind of a dashboard, its output
can be many things, depending on the user's ability to create new layouts and visualize
data. Having said all this, don't be afraid – the internet offers at least a partial, if not full
solution, to almost every imaginable scenario.
Next, what we will do is go through the basic installation of the ELK stack, show what
it can do, point you in the right direction, and demonstrate one of the most popular
beats – metricbeat.
Using the ELK stack is, in many ways, identical to running a server – what you need to do
depends on what you actually want to accomplish; it takes only a couple of minutes to get
the ELK stack running, but the real effort only starts there.
Of course, for us to fully understand how the ELK stack is used in a live environment,
we need to deploy it and set it up first. We'll do that next.
Thankfully, almost everything that we need to install is already prepared by the Elasticsearch
team. Aside from Java, everything is nicely sorted and documented on their site.
The first thing you need to do is install Java – ELK depends on Java to run, so we need to
have it installed. Java has two different install candidates: the official one from Oracle and
the open source OpenJDK. Since we are trying to stay in the open source ecosystem, we'll
install OpenJDK. In this book, we are using CentOS 8 as our platform, so the yum package
manager will be used extensively.
Let's start with the prerequisite packages. The only prerequisite package we need in order
to install Java is the java-11-openjdk-devel package (substitute "11" with the
current version of OpenJDK). So, here, we need to run the following command:
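On CentOS 8, that command would typically be as follows (package naming may differ slightly between distributions and OpenJDK versions):
sudo yum install java-11-openjdk-devel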
After issuing that command, the installation should complete without errors. We can then
verify the Java installation by running the following command:
java -version
The output should be the current version of Java and no errors. Other than verifying
whether Java works, this step is important in order to verify that the path to Java is correctly
set – if you are running on some other distribution, you may have to set the path manually.
Now that Java is installed and ready to go, we can continue with the installation of the ELK
stack. The next step is to configure the installation source for Elasticsearch and the other
services – create a repository file (for example, /etc/yum.repos.d/elasticsearch.repo) with
the following content:
[Elasticsearch-7.x]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-Elasticsearch
enabled=1
autorefresh=1
type=rpm-md
Save the file. The important thing here is that the repository is GPG-signed,
so we need to import its key so that the packages can be verified.
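One way to do that, reusing the key URL from the repository definition above, is a standard rpm key import:
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-Elasticsearch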
3. Now, we are ready to do some housekeeping on the system side and grab all the
changes in the repository system:
sudo yum clean all
sudo yum makecache
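With the metadata refreshed, the stack itself can be installed. A typical command, assuming the package names published in the Elastic repository, is:
sudo yum install elasticsearch logstash kibana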
Neither elasticsearch nor any of the other services are going to be started or
enabled automatically. We must do this manually for each of them. Let's do that now.
4. The procedure to start and enable services is standard and is the same for all
three services:
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch.service
sudo systemctl start elasticsearch.service
sudo systemctl status elasticsearch.service
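The same commands, with only the service name changed, apply to the other two services; for example:
sudo systemctl enable logstash.service
sudo systemctl start logstash.service
sudo systemctl enable kibana.service
sudo systemctl start kibana.service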
The last thing to do is installing beats, which are services that are usually installed
on the monitored servers, and which can be configured to create and send
important metrics on the system. Let's do that now.
5. For the purpose of this demonstration, we will install them all, although we are not
going to use all of them:
sudo yum install filebeat metricbeat packetbeat heartbeat-elastic auditbeat
After this, we should have a functional system. Let's have a quick overview.
Kibana and Elasticsearch are both running as web services, on different ports. Elasticsearch
answers on http://localhost:9200, while Kibana – which we interact with via the web
browser, and which is where the visualization happens – is served on http://localhost:5601.
With that, the deployment process was finished successfully. Our logical next step would
be to create a workflow. Let's do that now.
Workflow
In this section, we are going to establish a workflow – we are going to create logs and
metrics that are going to be ingested into Logstash, queried via Elasticsearch, and then
visually represented in Kibana.
By default, Kibana runs on port 5601, which can be changed in the configuration.
But what does this mean for me? What does this mean for KVM?
The biggest selling point for using Elastic stack is flexibility and ease of presentation.
It doesn't matter if we are running one, 10, or 1,000 machines inside dozens of KVM
hosts; we can treat them the same in production and establish a stable monitoring
workflow. Using extremely simple scripts, we can create completely custom metrics and
quickly display them, we can watch for trends, and we can even create a near-real-time
monitoring system. All this, essentially for free.
Let's create a simple monitor that is going to dump system metrics for the host system that
is running ELK. We've already installed Metricbeat, so the only thing left is to configure
the service to send the data to Elasticsearch. Data is sent to Elasticsearch, not Logstash,
and this is simply because of the way that the services interoperate. It is possible to send
both to Logstash and Elasticsearch, so we need to do a quick bit of explaining here.
Logstash is, by definition, a service that stores data that's sent to it. Elasticsearch searches
that data and communicates with Logstash. If we send the data to Logstash, we are
not doing anything wrong; we are just dumping data for later analysis. But sending to
Elasticsearch gives us one more feature – we can send not only data but also information
about the data in the form of templates.
On the other hand, Logstash has the ability to perform data transformation right after
it receives it and before data is stored, so if we need to do things such as parse GeoIP
information, change the names of hosts, and so on, we will probably use Logstash as our
primary destination. Keeping that in mind, do not set Metricbeat so that it sends data
both to Elasticsearch and Logstash; you will only get duplicate data stored in the database.
Using ELK is simple, and we've got this far into the installation without any real effort.
When we start analyzing the data is when the real problems start. Even simple and
perfectly formatted data that comes out of Metricbeat can be complex to visualize,
especially if we are doing it for the first time. Having premade templates both for
Elasticsearch and Kibana saves a lot of time.
A Logstash pipeline is built from three types of elements:
• The input is always the first element in the pipeline and is designed to receive data from
the source.
• The output is the last element in the pipeline, and it outputs the data.
• The filter is an optional element and stands between the input and output in order
to modify the data in accordance with the rules that we can define.
All these elements can be chosen from a list of plugins in order for us to create an optimal
pipeline adjusted for a specific purpose. Let's go through this step by step.
What we need to do is just uncomment the one pipeline that is defined in the configuration
file, located in the /etc/logstash folder.
The whole stack uses YAML as the standard for its configuration file structure, so every
configuration file ends with the .yml extension. This is important to understand: any file
that does not have this extension is there either as a sample or as some kind of template
for the configuration; only files with the .yml extension will get parsed.
To configure Logstash, just open logstash.yml and uncomment all the lines that are
related to the first pipeline, called main. We don't need to do anything else. The file itself
is located in the /etc/logstash folder, and should look something like this after you
make these changes:
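As a rough sketch – depending on the Logstash version, the pipeline definition may live in logstash.yml itself or in the companion pipelines.yml file in the same folder – the uncommented main pipeline usually boils down to these two settings:
- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"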
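Switching to the beats side of things, the command referred to here is the beat's own setup routine; with the default configuration in place, it is run as follows (shown for Metricbeat):
metricbeat setup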
This command will go through the initial setup. This part of the setup is probably
the most important thing in the whole initial configuration – pushing the
dashboard templates to Kibana. These templates will enable you to get up and
running quickly.
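The listing itself comes from the modules subcommand; a typical invocation is:
metricbeat modules list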
This will give you a list of all the modules that Metricbeat already has prepared for
different services. Go ahead and enable two of them, logstash and kvm:
metricbeat modules enable kvm
metricbeat modules enable logstash
The logstash module is confusingly named since it is not intended to push data to
Logstash; instead, its main purpose is to report on the Logstash service itself and enable
you to monitor it through Metricbeat. Sound confusing? Let's rephrase this: this module
enables Metricbeat to monitor Logstash. Or, to be more precise, it enables beats to monitor
part of the Elastic stack.
The KVM module is a template that will enable you to gather different KVM-related metrics.
This should be it. As a precaution, type in the following command to check Metricbeat's
configuration:
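Metricbeat ships a built-in test subcommand for exactly this purpose:
metricbeat test config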
If the preceding command runs okay, start the Metricbeat service using the following
command:
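On a systemd-based distribution such as CentOS 8, that would be:
sudo systemctl enable metricbeat
sudo systemctl start metricbeat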
You now have a running service that is gathering data on your host – the same one that
is running KVM and dumping that data into Elasticsearch. This is essential since we are
going to use all that data to create visualizations and dashboards.
In Kibana's index management screen, you should see a newly created metricbeat index
whose name contains the date of the first entry in the log file. This is completely arbitrary
and is just a default that ensures you know when this instance was started.
In the same line as this name, there should be some numbers: what we are interested in is
the docs count – the number of objects that the index holds. For the time being, if it's not
zero, we are okay.
Now, go to the Dashboard page and open the Metricbeat System Overview ECS
dashboard. It will show a lot of visual widgets representing CPU, memory, disk, and
network usage:
Now, you can click on Host Overview and view even more data about your system. Try
playing with the dashboard and different settings. One of the most interesting items on
this dashboard is the one in the upper-right part of the screen – the one that defines the
timespan that we are interested in. We can either create our own or use one of the presets,
such as last 15 minutes. After you click the Refresh button, new data should show
on the page.
With that, you now know enough about Kibana to get started, but we still are unable
to visualize KVM data. The next step is to create a dashboard that will cover that.
But before we do that, think about what you can do with only what we've learned so far.
Not only can you monitor the local system that has your KVM stack installed, but you
can also monitor any system that is able to run Metricbeat. The only thing that you need
to know is the IP address of the ELK stack, so that you can send data to it. Kibana will
automatically deal with visualizing all the different data from different systems, as we
will see later.
At first glance, these checks – Elasticsearch's so-called bootstrap checks – may confuse you:
the installation that we guided you through will work, and then suddenly, as you try to
change some settings, everything will fail. This is intentional.
In previous versions, these checks were performed but only produced warnings if a
configuration item was missing or misconfigured. Starting from version 7, these checks
trigger an error when the system is in production mode and not configured correctly,
which effectively means that your installation will not work if it's not configured properly.
ELK has two distinct modes of operation: development and production. On the first
installation, it is assumed that you are in development mode, so most of the functionality
simply works out of the box.
Things change a lot once you go into production mode – security settings and other
configuration options need to be explicitly set in order for the stack to function.
The trick is that there is no explicit mode change – production settings and the checks
associated with them are triggered by certain settings in the configuration. The idea is that
once you reconfigure something that is important from a security standpoint, you need to
reconfigure everything correctly. This prevents you from forgetting something that could be
a big problem in production and forces you to have at least a stable configuration to start
from. There is a switch to disable the checks, but it is not recommended under any circumstances.
The main thing to pay attention to is the binding interface – the default installation binds
everything to localhost, or the local loopback interface, which is completely fine for
development. Once your Elasticsearch is capable of forming a cluster – and that can be
triggered simply by reconfiguring the network address for HTTP and transport
communication – you have to pay attention to the checks and reconfigure the whole system
in order to make it work. Please consult the documentation available at
https://www.elastic.co/ for more information, starting with
https://www.elastic.co/guide/index.html.
For example, configuring clusters in the Elastic stack and all that it entails is way out of
the scope of this book – we are going to stay within the realm of a single-node cluster
in our configuration. This solution was specifically created for situations that can work
with a single node or, more precisely, a single machine instance that covers all the
functionality of a stack. In a normal deployment, you will run Elastic stack in a cluster, but
implementation details will be something determined by your configuration and its needs.
We need to warn you of two crucial points – firewall and SELinux settings are up to you.
All the services use standard TCP to communicate. Don't forget that for the services to
run, the network has to be configured correctly.
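As a minimal sketch for firewalld on CentOS 8 – assuming the default ports and adjusting the list to the services each host actually runs – opening the stack's ports could look like this:
sudo firewall-cmd --permanent --add-port=9200/tcp   # Elasticsearch HTTP
sudo firewall-cmd --permanent --add-port=9300/tcp   # Elasticsearch transport
sudo firewall-cmd --permanent --add-port=5044/tcp   # Logstash beats input
sudo firewall-cmd --permanent --add-port=5601/tcp   # Kibana
sudo firewall-cmd --reload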
Now that we've gotten that out of the way, let's answer one simple question: what do we
need to do to make the Elastic stack work with more than one server? Let's discuss this
scenario, bit by bit.
Elasticsearch
Go to the configuration file (/etc/elasticsearch/elasticsearch.yml) and add
a line in the discovery section:
discovery.type: single-node
Using this section is not mandatory, but it helps when you must go back to the
configuration later.
This option will tell Elasticsearch that you will have only one node in the cluster, and it
will make Elasticsearch ignore all the checks associated with the cluster and its network.
This setting will also make this node the master node automatically since Elasticsearch
depends on having master nodes that control everything in the cluster.
Change the setting under network.host: so that it points to the IP address of the
interface Elasticsearch is going to be available on. By default, it points to localhost and
is not visible from the network.
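For example (the address is just a placeholder for your server's real IP):
network.host: 192.168.122.10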
Restart the Elasticsearch service and make sure it is running and not generating errors:
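For example:
sudo systemctl restart elasticsearch.service
sudo systemctl status elasticsearch.service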
Once you have it working, check whether the service is behaving normally from the local
machine. The easiest way to do this is as follows:
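A simple curl against the HTTP port does the job:
curl -XGET http://localhost:9200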
The response should be JSON-formatted text containing information about the server.
Important note
The Elastic stack has three (or four) parts or services. In all our examples,
three of them (Logstash, Elasticsearch, and Kibana) were running on the same
server, so no additional configuration was necessary to accommodate network
communication. In a normal configuration, these services would probably run
on independent servers and in multiple instances, depending on the workload
and configuration of the service we are trying to monitor.
Logstash
The default installation of Logstash ships a file named logstash-sample.conf in the
/etc/logstash folder. This contains a simple Logstash pipeline to be used when we are
using Logstash as the primary destination for beats. We will come to this later, but for the
time being, copy this file to /etc/logstash/conf.d/logstash.conf and change
the address of the Elasticsearch server in the file you just copied. It should look something
like this:
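Trimmed to the relevant parts – the exact sample shipped with your Logstash version may differ slightly – the copied file should contain something along these lines:
input {
  beats {
    port => 5044
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
  }
}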
Change localhost to the correct IP address of your server. This will make Logstash
listen on port 5044 and forward the data to Elasticsearch. Restart the service and verify
that it runs:
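For example:
sudo systemctl restart logstash.service
sudo systemctl status logstash.service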
Kibana
Kibana also has some settings that need to be changed, but when doing so, there are a
couple of things to remember about this service:
• By itself, Kibana is a service that serves visualizations and data over the HTTP
protocol (or HTTPS, depending on the configuration).
• At the same time, Kibana uses Elasticsearch as its backend in order to get and work
with data. This means that there are two IP addresses that we must care about:
a) The first one is the address that will be used to show Kibana pages. By default,
this is localhost on port 5601.
b) The other IP address is that of the Elasticsearch service that will deal with the queries.
The default for this is also localhost, but it needs to be changed to the IP address
of the Elasticsearch server.
The file that contains configuration details is /etc/kibana/kibana.yml and you
need to at least make the following changes:
• server.host: This needs to point to the IP address where Kibana is going to have
its pages.
• elasticsearch.hosts: This needs to point to the host (or a cluster, or multiple
hosts) that will perform the queries.
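A minimal sketch of those two settings, with placeholder addresses, would be:
server.host: "Your-Kibana-Host-IP"
elasticsearch.hosts: ["http://Your-Elasticsearch-IP:9200"]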
Restart the service, and that's it. Now, log into Kibana and test whether everything works.
To get you even more familiarized with Kibana, we will try and establish some basic
system monitoring and show how we can monitor multiple hosts. We are going to
configure two beats: Metricbeat and Filebeat.
We already configured Metricbeat, but it was for localhost, so let's fix that first. In the
/etc/metricbeat/metricbeat.yml file, reconfigure the output in order to send
data to the elasticsearch address. You only need to change the host IP address since
everything else stays the same:
# Array of hosts to connect to
hosts: ["Your-host-IP-address:9200"]
Make sure that you change Your-host-IP-address to the IP address you're using.
Configuring filebeat is mostly the same; we need to use /etc/filebeat/filebeat.yml
to configure it. Since all the beats use the same concepts, both filebeat and metricbeat
(as well as other beats) use modules to provide functionality. In both, the core module is
named system, so enable it using the following command in filebeat:
filebeat modules enable system
We mentioned this previously, in the first example, but you can test your configuration
by running the following command:
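For filebeat, that check looks like this (test output additionally verifies connectivity to the configured output):
filebeat test config
filebeat test output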
Repeat all this for every system that you intend to monitor. In our example, we have two
systems: one that is running KVM and another that is running Kibana. We also have beats
set up on that second system so that we can test syslog and the way it notifies us of the
problems it notices.
We need to configure filebeat and metricbeat to send data to Kibana. We'll edit the
filebeat.yml and metricbeat.yml files for that purpose, by changing the
following portion of both files:
setup.kibana:
  host: "Your-Kibana-Host-IP:5601"
Before running beats, on a fresh installation, you need to upload dashboards to Kibana. You
only need to do this once for each Kibana installation, and you only need to do this from
one of the systems you are monitoring – templates will work, regardless of the system they
were uploaded from; they will just deal with data that is coming into Elasticsearch.
filebeat setup
metricbeat setup
This will take a couple of seconds or even a minute, depending on your server and client.
Once it says that it created the dashboards, it will display all the dashboards and settings
it created.
Now, you are almost ready to go through all the data that Kibana will display:
Before we start, there's something else you need to know about time and timestamps. The
date/time picker in the top-right corner will let you choose either your own timespan or
one of the predefined intervals:
Important note
Always remember that the time that's shown is local to the time zone of the
browser/machine you are accessing Kibana from.
All the timestamps in the logs are local to the machine that is sending the logs. Kibana
will try and match time zones and translate the resulting timestamps, but if there is a
mismatch in the actual time settings on the machines you are monitoring, there is going
to be a problem trying to establish a timeline of events.
Let's presume you got filebeat and metricbeat running. What can you do with these? As it
turns out, a lot:
• The first thing is discovering what is in your data. Press the Discover button in
Kibana (it looks like a small compass). Some data should show on the right if
everything is okay.
• To the right of the icon you just clicked on, a vertical space will fill up with all the
attributes that Kibana got from the data. If you do not see anything or something is
missing, remember that the time span you select narrows down the data that will get
shown in this view. Try readjusting the interval to Last 24 hours or Last 30 days.
Once the list of attributes shows up, you can quickly establish how many times each shows
up in the data you just selected – just click on any attribute and select Visualize. Also note
that once you click on the attribute, Kibana shows you the top five distinct values in the
last 500 records. This is a very useful tool if you need to know, for example, which hosts
are showing data, or how many different OS versions there are.
The visualization of particular attributes is just a start – notice how, once you hover over
an attribute name, a button called Add appears? Try clicking it. A table will start forming
on the right, filled with just the attributes you selected, sorted by timestamp. By default,
these values are not auto-refreshed, so the timestamps will be fixed. You can choose as
many attributes as you want and save this list or open it later.
The next thing we need to look at is individual visualizations. We are not going to go
into too many details, but you can create your own visualizations out of the datasets
using predefined visualization types. At the same time, you are not limited to using only
predefined things – using JSON and scripting is also possible, for even more customization.
The next thing we need to learn about is dashboards.
Depending on a particular dataset, or to be more precise, on the particular set of
machines you are monitoring, some of them will have attributes that cover things only a
particular machine does or has. One example is virtual machines on AWS – they will have
some information that is useful only in the context of AWS. This is not important in our
configuration, but you need to understand that there may be some attributes in the data
that are unique for a particular set of machines. For starters, choose one of the system
metrics; either System Navigation ECS for metricbeat or Dashboards ECS for filebeat.
These dashboards show a lot of information about your systems in a lot of ways. Try
clicking around and see what you can deduce.
The metricbeat dashboard is more oriented toward running systems and keeping an eye
on memory and CPU allocation. You can click and filter a lot of information, and have it
presented in different ways. The following is a screenshot of metricbeat so that you can
get a rough idea of what it looks like:
The filebeat dashboard is more oriented toward analyzing what happened and establishing
trends. Let's check a couple of excerpts from the filebeat dashboard, starting with the
syslog entries part:
At first glance, you can notice a couple of things. We are showing data for two systems,
and the data is partial since it covers a part of the interval that we set. Also, we can see that
some of the processes are running and generating logs more frequently than others. Even
if we do not know anything about the particular system, we can now see there are some
processes that show up in logs, and they probably shouldn't:
Let's take a look at setroubleshoot. Click on the process name. In the window that
opens, click on the magnifying glass. This isolates only this process and shows only its
logs at the bottom of the screen.
We can quickly see on which host setroubleshoot is writing logs – including how often and
why. This is a quick way to spot potential problems. In this particular case, some
action should obviously be taken on this system to reconfigure SELinux since it generates
exceptions and stops some applications from accessing files.
Let's move along the vertical navigation bar and point out some other interesting
functionalities.
Going from top to bottom, the next big functionality is Canvas – it enables us to create
live presentations using data from the dataset we are collecting. The interface is similar to
what can be expected from other presentation programs, but the accent is on using data
directly in slides and generating slides in almost real time.
The next is Maps. This is a new addition to version 7.0 and allows us to create a
geographic presentation of data.
Machine learning is next – it enables you to manipulate data and use it to "train" filters
and create pipelines out of them.
Infrastructure is also interesting – when we mentioned dashboards, we were talking about
flexibility and customization. Infrastructure is a module that enables us to do real-time
monitoring with minimal effort and observe important metrics. You can see important data
either as a table, a balloon-like interface, or as a graph. Data can be averaged or presented
in other ways, and all that is done through a highly intuitive interface.
Heartbeat is another of these highly specialized boards – as its name suggests, it is the
easiest way to track and report on uptime data, and to quickly notice if something has
gone offline. It requires the Heartbeat service to be installed and configured to check each
system we intend to monitor.
SIEM deserves a more thorough explanation: if we think about dashboards as being
multipurpose, SIEM is the exact opposite; it is created to be able to track all the events
on all the systems that can be categorized as security-related. This module will parse the
data when searching for IPs, network events, sources, destinations, network flows, and all
other data, and create simple to understand reports regarding what is happening on the
machines you are monitoring. It even offers anomaly detection, a feature that enables the
Elastic stack to serve as a solution for advanced security purposes. This feature is a paid
one and requires the highest-paid tier to function.
Stack monitor is another notable board as it enables you to actually see what is happening
in all the different parts of the Elastic stack. It will show the status of all the services, their
resource allocation, and license status. The Logs feature is especially useful since it tracks
how many logs of what type the stack is generating, and it can quickly point to problems
if there are any.
This module also generates statistics for services, enabling us to understand how the
system can be optimized.
Management, the last icon at the bottom, was already mentioned – it enables the
management of the cluster and its parts. This is the place where we can see whether the
indices we expect are there, whether the data is flowing in, whether we can optimize
something, and so on. This is also the place where we can manage licenses and create
snapshots of the system configuration.
1. First, go to the hypervisor and open a virsh shell. List all the available domains,
choose a domain, and use the dommemstat --domain <domain_name>
command.
The result should be something like this:
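An illustrative output – the numbers and the SQLForNuma domain name are just examples – looks something like this:
virsh # dommemstat --domain SQLForNuma
actual 4194304
swap_in 0
swap_out 0
major_fault 453
minor_fault 216739
unused 3983616
available 4193848
rss 1472064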
2. Open Kibana and log in, go to the Discover tab, and select metric* as the index
we are working with. The left column should populate with attributes that are in the
dataset that metricbeat sends to this Kibana instance. Now, look at the attributes
and select a couple of them:
3. For now, let's stick with the ones we selected. To the right of the column, a table is
formed that contains only the fields you selected, enabling you to check the data that
the system is receiving. You may need to scroll down to see the actual information
since this table will display all the data that was received that has at least one item
that has a value. Since one of the fields is always a timestamp, there will be a lot of
rows that will not contain any useful data for our analysis:
4. Now, let's build a gauge that will show us the real-time data and a quick
visualization of one of the values. Go to Visualize and click Create new
visualization:
5. We are going to configure the y axis to show average data for kvm.dommemstat.
stat.value, which is the attribute that holds our data. Select Average under
Aggregation and kvm.dommemstat.stat.value as the field we are aggregating.
You can create a custom label if you want to:
6. Before we finish this visualization, we need to add a filter. The problem with the data
received from the KVM metricbeat module is that it uses one attribute to hold different
kinds of data – if we want to know what the number in the field we are displaying actually
means, we need to read its name from kvm.dommemstat.stat.name. To accomplish
this, just create a filter of kvm.dommemstat.stat.name:"unused".
After we refresh the visualization, our data should be correctly visualized on
the right:
8. Save this dashboard so that you can use it later. All the elements of the dashboard
can be customized, and the dashboard can consist of any number of visualizations.
Kibana lets you customize almost everything you see and combine a lot of data
on one screen for easy monitoring. There is only one thing we need to change to
make this a good monitoring dashboard, and that is to make it autorefresh. Click
on the calendar icon on the right-hand side of the screen and select the auto refresh
interval. We decided on 5 seconds:
Summary
What Kibana enables you to do is create custom dashboards that can show you data
for different machines side by side, so KVM is just one of the many options we have.
Depending on your needs, you can display, for example, disk usage for a KVM hypervisor
and all the hosts running on it, or some other metric. The Elastic stack is a flexible tool,
but as with all things, it requires time to master. This chapter only covered the bare
basics of Elastic configuration, so we strongly recommend further reading on this
topic – alongside KVM, ELK can be used to monitor almost everything that produces
any kind of data.
The next chapter is all about performance tuning and optimization for KVM virtual
machines, a subject that we didn't really touch upon. There's quite a lot to be discussed
– virtual machine compute sizes, optimizing performance, disks, storage access and
multipathing, optimizing kernels, and virtual machine settings, just to name a few.
All these subjects will be more important the larger our environment becomes.
Questions
1. What do we use metricbeat for?
2. Why do we use Kibana?
Further reading
Please refer to the following links for more information regarding what was covered in
this chapter:
Performance Tuning and Optimization for KVM VMs
When we're thinking about virtualization, there are always questions that keep coming
up. Some of them might be simple enough, such as what are we going to get out of
virtualization? Does it simplify things? Is it easier to back up? But there are also much
more complex questions that start coming up once we've used virtualization for a while.
How do we speed things up on a compute level? Is there a way to do more optimization?
What can we tune additionally to get some more speed out of our storage or network?
Can we introduce some configuration changes that will enable us to get more out of the
existing infrastructure without investing a serious amount of money in it?
That's why performance tuning and optimization is so important to our virtualized
environments. As we will find out in this chapter, there are loads of different parameters
to consider – especially if we didn't design things properly from the very start, which
is usually the case. So, we're going to cover the subject of design first, explain why it
shouldn't be just a pure trial-and-error process, and then move on to disassembling
that thought process through different devices and subsystems.
Important note:
It's really simple: linear design will get you nowhere. Proper design is the basis of
good performance, and it leaves much less work to be done on performance
tuning afterward.
Imagine that you're buying a new server for your environment: you go to your preferred
vendor's website and select a model from a big list. It doesn't really matter which brand –
there are a lot of models on
offer. You can go with 1U (so-called pizza box) servers, which mostly have either one or
two CPUs, depending on the model. Then, you can select a 2U server, a 3U server…the list
gets exponentially bigger. Let's say that you selected a 2U server with one CPU.
In the next step, you select the amount of memory – let's say 96 GB or 128 GB. You place
your order, and a couple of days or weeks later, your server gets delivered. You open it up,
and you realize something – all of the RAM is connected to CPU1 memory channels. You
put that in your memory bank, forget about it, and move on to the next phase.
Then, the question becomes about the micro-management of some very pedestrian
settings. The BIOS version of the server, the drivers on the hypervisor level, and the
BIOS settings (power management, C-states, Turbo Boost, hyperthreading, various
memory-related settings, not allowing cores to turn themselves off, and so on) can have
a vast influence on the performance of our VMs running on a hypervisor. Therefore, it's
definitely best practice to first check whether there are any newer BIOS/firmware versions
for our hardware, and check the manufacturer and other relevant documentation to make
sure that the BIOS settings are as optimized as possible. Then, and only then, we can start
checkboxing some physical and deployment procedures – deploying our server in a rack,
installing an OS and everything that we need, and start using it.
Let's say that after a while, you realize that you need to do some upgrades and order some
PCI Express cards – two single-port Fibre Channel 8 Gbit/s host-based adapters, two
single-port 10 Gbit/s Ethernet cards, and two PCI Express NVMe SSDs. For example, by
ordering these cards, you want to add some capabilities – to access Fibre Channel storage
and to speed up your backup process and VM migrations by switching both of these
functionalities from 1 Gbit/s to 10 Gbit/s networking. You place your order, and a couple
of days or weeks later, your new PCI Express cards are delivered. You open them up,
shut down your server, take it out of the rack, and install these cards. 2U servers usually
have space for two or even three PCI Express riser cards, which are effectively used for
connecting additional PCI Express devices. Let's say that you use the first PCI Express
riser to deploy the first two cards – the Fibre Channel controllers and 10 Gbit/s Ethernet
cards. Then, noticing that you don't have enough PCI Express connectors to connect
everything to the first PCI Express riser, you use the second PCI Express riser to install
your two PCI Express NVMe SSDs. You screw everything down, close the server cover,
put the server back in your rack, and power it back on. Then, you go back to your laptop
and connect to your server in a vain attempt to format your PCI Express NVMe SSDs and
use them for new VM storage. You realize that your server doesn't recognize these SSDs.
You ask yourself – what's going on here? Do I have a bad server?
Figure 15.1 – A PCI Express riser for DL380p G8 – you have to insert your
PCI Express cards into its slots
You call up your sales rep, and tell them that you think the server is malfunctioning as it
can't recognize these new SSDs. Your sales rep connects you to the pre-sales tech; you hear
a small chuckle from the other side and the following information: "Well, you see, you
can't do it that way. If you want to use the second PCI Express riser on your server, you
have to have a CPU kit (CPU plus heatsink) in your second CPU socket, and memory for
that second CPU, as well. Order these two things, put them in your server, and your PCI
Express NVMe SSDs will work without any problems."
You end your phone conversation and are left with a question mark over your head – what
is going on here? Why do I need to have a second CPU and memory connected to its memory
controllers to use some PCI Express cards?
This is actually related to two things:
• You can't use the memory slots of an uninstalled CPU, as that memory needs a
memory controller, which is inside the CPU.
• You can't use the PCI Express slots wired to an uninstalled CPU, as the PCI Express
lanes that connect the riser slots aren't necessarily provided by the chipset – they are
often provided directly by the CPU, especially for the fastest connections, as you'll
learn in a minute.
We know this is confusing; we can feel your pain as we've been there. Sadly, you'll have to
stay with us for a little bit longer, as it gets even more confusing.
In Chapter 4, Libvirt Networking, we learned how to configure SR-IOV by using an Intel
X540-AT2 network controller. We mentioned that we were using the HP ProLiant DL380p
G8 server when configuring SR-IOV, so let's use that server for our example here, as well.
If you take a look at specifications for that server, you'll notice that it uses an Intel C600
chipset. If you then go to Intel's ARK website (https://ark.intel.com) and search
for information about C600, you'll notice that it has five different versions (C602, C602J,
C604, C606, and C608), but the most curious part of it is the fact that all of them only
support eight PCI Express 2.0 lanes. Keeping in mind that the server specifications clearly
state that this server supports PCI Express 3.0, it gets really confusing. How can that be
and what kind of trickery is being used here? Yes, PCI Express 3.0 cards can almost always
work at PCI Express 2.0 speeds, but it would be misleading at best to flat-out say that this
server supports PCI Express 3.0, only to discover that it delivers PCI Express 2.0 levels of
performance (half the bandwidth per PCI Express lane).
It's only when you go to the HP ProLiant DL380p G8 QuickSpecs document and find
the specific part of that document (the Expansions Slots part, with descriptions of three
different types of PCI Express risers that you can use) where all the information that
we need is actually spelled out for us. Let's use all of the PCI Express riser details for
reference and explanation. Basically, the primary riser has two PCI Express v3.0 slots
that are provided by processor 1 (x16 plus x8), and the third slot (PCI Express 2.0 x8) is
provided by the chipset. For the optional riser, it says that all of the slots are provided by
the CPU (x16 plus x8 times two). There are actually some models that can have three PCI
Express risers, and for that third riser, all of the PCI Express lanes (x16 times two) are also
provided by processor 2.
This is all very important. It's a huge factor in performance bottlenecks for many scenarios,
which is why we centered our example around the idea of two PCI Express NVMe SSDs.
We wanted to go through the whole journey with you.
So, at this point, we can have an educated discussion about what should be the de facto
standard hardware design of our example server. If our intention is to use these PCI
Express NVMe SSDs for local storage for our VMs, then most of us would treat that as a
priority. That would mean that we'd absolutely want to connect these devices to the PCI
Express 3.0 slot so that they aren't bottlenecked by PCI Express 2.0 speeds. If we have
two CPUs, we're probably better off using the first PCI Express slot in both of our PCI
Express risers for that specific purpose. The reasoning is simple – they're PCI Express 3.0
compatible and they're provided by the CPU. Again, that's very important – it means that
they're directly connected to the CPU, without the added latency of going through the
chipset. Because, at the end of the day, the CPU is the central hub for everything, and data
going from VMs to SSDs and back will go through the CPU. From a design standpoint,
we should absolutely use the fact that we know this to our advantage and connect our
PCI Express NVMe SSDs locally to our CPUs.
The next step is related to Fibre Channel controllers and 10 Gbit/s Ethernet controllers.
The vast majority of 8 Gbit/s Fibre Channel controllers are PCI Express 2.0 compatible. The
same thing applies to 10 Gbit/s Ethernet adapters. So, it's again a matter of priority. If
you're using Fibre Channel storage a lot from our example server, logic dictates that you'd
want to put your new and shiny Fibre Channel controllers in the fastest possible place.
That would be the second PCI Express slot in both of our PCI Express risers. Again,
second PCI Express slots are both provided by CPUs – processor 1 and processor 2. So
now, we're just left with 10 Gbit/s Ethernet adapters. We said in our example scenario that
we're going to be using these adapters for backup and VM migration. The backup won't
suffer all that much if it's done via a network adapter that's on the chipset. VM migration
might be a tad sensitive to that. So, you connect your first 10 Gbit/s Ethernet adapter to
the third PCI Express slot on the primary riser (for backup, provided by the chipset).
Then, you also connect your second 10 Gbit/s Ethernet adapter to the third PCI Express
slot on the secondary riser (PCI Express lanes provided by processor 2).
We've barely started on the subject of design with the hardware aspect of it, and already we
have such a wealth of information to process. Let's now move on to the second phase of our
design – which relates to VM design. Specifically, we're going to discuss how to create new
VMs that are designed properly from scratch. However, if we're going to do that, we need
to know which application this VM is going to be created for. For that matter, we're
going to create a scenario. We're going to use a VM that we're creating to host a node in a
Microsoft SQL database cluster on top of a VM running Windows Server 2019. The VM
will be installed on a KVM host, of course. This is a task given to us by a client. As we
already did the general hardware design, we're going to focus on VM design now.
VM design
Creating a VM is easy – we can just go to virt-manager, click a couple of times,
and we're done. The same applies to oVirt, Red Hat Enterprise Virtualization Manager,
OpenStack, VMware, and Microsoft virtualization solutions… it's more or less the same
everywhere. The problem is designing VMs properly. Specifically, the problem is creating
a VM that's going to be pre-tuned to run an application on a very high level, which then
only leaves a small number of configuration steps that we can take on the server or VM
side to improve performance – the premise being that most of the optimization process
later will be done on the OS or application level.
So, people usually start creating a VM in one of two ways – either by creating a VM
from scratch with XYZ amount of resources added to the VM, or by using a template,
which – as we explained in Chapter 8, Creating and Modifying VM Disks, Templates, and
Snapshots – will save a lot of time. Whichever way we use, there's a certain amount of
resources that will be configured for our VM. We then remember what we're going to
use this VM for (SQL), so we increase the amount of CPUs to, for example, four, and the
amount of memory to 16 GB. We put that VM in the local storage of our server, spool it
up, and start deploying updates, configuring the network, and rebooting and generally
preparing the VM for the final installation step, which is actually installing our application
(SQL Server 2016) and some updates to go along with it. After we're done with that, we
start creating our databases and move on to the next set of tasks that need to be done.
Let's take a look at this process from a design and tuning perspective next.
• A lot of pre-configuration can be done before the installation phase or during the
template phase, before you clone the VM. If it's an existing environment that you're
migrating to a new one, collect information about the old environment. Find out what
the database sizes are, what storage is being used, and how happy people are with
the performance of their database server and the applications using them.
At the end of the whole process, learn to take a mile-high perspective on the IT-related
work that you do. From a quality assurance standpoint, IT should be a highly structured,
procedural type of work. If you've done something before, learn to document the things
that you did while installing things and the changes that you made. Documentation – as it
stands now – is one of the biggest Achilles' heels of IT. Writing documentation will make
it easier for you to repeat the process in the future when faced with the same (less often)
or a similar (much more often) scenario. Learn from the greats – just as an example, we
would know much less about Beethoven, for example, if he didn't keep detailed notes
of the things he did day in, day out. Yes, he was born in 1770 and this year will mark
250 years since he was born, and that was a long time ago, but that doesn't mean that
250-year-old routines are bad.
So now, your VM is configured and in production, and a couple of days or weeks later, you
get a call from the company and they ask why the performance is not all that great. Why
isn't it working just like on a physical server?
As a rule of thumb, when you're looking for performance issues on Microsoft SQL, they
can be roughly divided into four categories:
In our experience, the first and second category can easily account for 80–85% of SQL
performance issues. The third would probably account for 10%, while the last one is rather
rare, but it still happens. Keeping that in mind, from an infrastructure standpoint, when
you're designing a database VM, you should always look into VM memory and storage
configuration first, as they are by far the most common reasons. The problems just kind of
accumulate and snowball from there. Specifically, some of the most common key reasons
for sub-par SQL VM performance is the memory location, looking at it from a CPU
perspective, and storage issues – latencies/IOPS and bandwidth being the problem. So,
let's describe these one by one.
The first issue that we need to tackle is related to – funnily enough – geography. It's very
important for a database to have its memory content as close as possible to the CPU
cores assigned to its VMs. This is what NUMA is all about. We can easily overcome this
specific issue on KVM with a bit of configuration. Let's say that we chose that our VM
uses four virtual CPUs. Our test server has Intel Xeon E5-2660v2 processors, which have
10 physical cores each. Keeping in mind that our server has two of these Xeon processors,
we have 20 cores at our disposal overall.
We have two basic questions to answer:
• How do these four cores for our VM correlate to 20 physical cores below?
• How does that relate to the VM's memory and how can we optimize that?
The answer to both of these questions is that it depends on our configuration. By default,
our VM might get two cores from each of the two physical processors (or a 3+1 split) and
spread its memory across both of them. None of these configuration examples are good.
What you want is to have all the virtual CPU cores on one physical processor, and you
want those virtual CPU cores to use memory that's local to those four physical cores –
directly connected to the underlying physical processor's memory controller. What we just
described is the basic idea behind NUMA – to have nodes (consisting of CPU cores) that
act as building compute blocks for your VMs with local memory.
If at all possible, you want to reserve all the memory for that VM so that it doesn't swap
somewhere outside of the VM. In KVM, that outside of the VM would be in the KVM
host swap space. Having access to real RAM memory all of the time is a performance and
SLA-related configuration option. If the VM uses a bit of underlying swap partition that
acts as its memory, it will not have the same performance. Remember, swapping is usually
done on some sort of local RAID array, an SD card, or a similar medium, which are many
orders of magnitude slower in terms of bandwidth and latency compared to real RAM
memory. If you want a high-level statement about this – avoid memory overcommitment
on KVM hosts at all costs. The same goes for the CPU, and this is a commonly used best
practice on any other kind of virtualization solution, not just on KVM.
Furthermore, for critical resources, such as a database VM, it definitely makes sense to
pin vCPUs to specific physical cores. That means that we can use specific physical cores
to run a VM, and we should configure other VMs running on the same host not to use
those cores. That way, we're reserving these CPU cores specifically for a single VM, thus
configuring everything for maximum performance not to be influenced by other VMs
running on the physical server.
Yes, sometimes managers and company owners won't like you because of this best practice
(as if you're to blame), as it requires proper planning and enough resources. But that's
something that they have to live with – or not, whichever they prefer. Our job is to make
the IT system run as best as it possibly can.
VM design has its basic principles, such as the CPU and memory design, NUMA
configuration, configuring devices, storage and network configuration, and so on. Let's go
through all of these topics step by step, starting with an advanced CPU-based feature that
can really help make our systems run as best as possible if used properly – CPU pinning.
CPU pinning
CPU pinning is nothing but the process of setting the affinity between the vCPU and the
physical CPU core of the host so that the vCPU will be executing on that physical CPU
core only. We can use the virsh vcpupin command to bind a vCPU to a physical CPU
core or to a subset of physical CPU cores.
There are a couple of best practices when doing vCPU pinning:
• If the number of guest vCPUs is more than the single NUMA node CPUs, don't go
for the default pinning option.
• If the physical CPUs are spread across different NUMA nodes, it is always better
to create multiple guests and pin the vCPUs of each guest to physical CPUs in the
same NUMA node. This is because accessing different NUMA nodes, or running
across multiple NUMA nodes, has a negative impact on performance, especially for
memory-intensive applications.
1. Execute virsh nodeinfo to gather details about the host CPU configuration:
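An illustrative output for the dual 10-core Xeon E5-2660 v2 host used in this chapter (the memory size is just an example) would look like this:
CPU model:           x86_64
CPU(s):              20
CPU frequency:       2200 MHz
CPU socket(s):       1
Core(s) per socket:  10
Thread(s) per core:  1
NUMA cell(s):        2
Memory size:         131781632 KiB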
2. The next step is to get the CPU topology by executing the virsh capabilities
command and check the section tagged <topology>:
Figure 15.3 – The virsh capabilities output with all the visible physical CPU cores
Once we have identified the topology of our host, the next step is to start pinning
the vCPUs.
3. Let's first check the current affinity or pinning configuration with the guest named
SQLForNuma, which has four vCPUs:
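Running virsh vcpupin with just the domain name prints the current pinning; on an unpinned guest, it typically looks something like this:
virsh vcpupin SQLForNuma
 VCPU   CPU Affinity
----------------------
 0      0-19
 1      0-19
 2      0-19
 3      0-19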
4. Let's pin vCPU0 to physical core 0, vCPU1 to physical core 1, vCPU2 to physical
core 2, and vCPU3 to physical core 3:
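Using the same guest name, the pinning itself is one virsh vcpupin call per vCPU:
virsh vcpupin SQLForNuma 0 0
virsh vcpupin SQLForNuma 1 1
virsh vcpupin SQLForNuma 2 2
virsh vcpupin SQLForNuma 3 3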
You can also make use of virsh vcpuinfo to verify the pinning. The output of the
virsh vcpuinfo command is as follows:
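One entry is printed per vCPU; for vCPU0 pinned to physical core 0 on a 20-core host, it looks roughly like this (the CPU time value is illustrative):
VCPU:           0
CPU:            0
State:          running
CPU time:       124.5s
CPU Affinity:   y-------------------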
When it comes to tuning guest memory, there are three main areas we will work through:
• Memory allocation
• Memory tuning
• Memory backing
Let's start by explaining how to configure memory allocation for a virtual system or guest.
Memory allocation
To make the allocation process simple, we will consider the virt-manager
libvirt client again. Memory allocation can be done from the VM's memory configuration
window in virt-manager, where two values can be set:
• Maximum allocation: The runtime maximum memory allocation of the guest. This
is the maximum memory that can be allocated to the guest when it's running.
• Current allocation: How much memory a guest always uses. For memory
ballooning reasons, we can have this value lower than the maximum.
The virsh command can be used to tune these parameters. The relevant virsh
command options are setmem and setmaxmem.
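A hedged example, reusing the SQLForNuma guest and illustrative sizes – note that the current allocation can never exceed the maximum, and lowering the maximum usually requires the guest to be shut down first:
virsh setmaxmem SQLForNuma 16G --config
virsh setmem SQLForNuma 8G --config --live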
Memory tuning
The memory tuning options are added under <memtune> of the guest configuration file.
Additional memory tuning options can be found at
http://libvirt.org/formatdomain.html#elementsMemoryTuning.
The admin can configure the memory settings of a guest manually. If the <memtune>
configuration is omitted, the default memory settings apply for a guest. The virsh
command at play here is virsh memtune. It can take any of the following parameters, all
of which are well documented in the man page: hard_limit, soft_limit,
swap_hard_limit, and min_guarantee.
The default/current values that are set for the memtune parameter can be fetched
as shown:
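Run without any options, virsh memtune simply prints the current values; with an option, it sets them (values default to KiB, so the hard limit below is 9 GiB – guest name and size are illustrative):
virsh memtune SQLForNuma
hard_limit     : unlimited
soft_limit     : unlimited
swap_hard_limit: unlimited

virsh memtune SQLForNuma --hard-limit 9437184 --config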
When setting hard_limit, you should not set this value too low. This might lead
to a situation in which a VM is terminated by the kernel. That's why determining the
correct amount of resources for a VM (or any other process) is such a design problem.
Sometimes, designing things properly seems like dark arts.
To learn more about how to set these parameters, please see the output of the
virsh help memtune command.
Memory backing
The following is the guest XML representation of memory backing:
<domain> ...
<memoryBacking>
<hugepages>
<page size="1" unit="G" nodeset="0-3,5"/>
<page size="2" unit="M" nodeset="4"/>
</hugepages>
<nosharepages/>
<locked/>
</memoryBacking> ...
</domain>
You may have noticed that there are three main options for memory backing: locked,
nosharepages, and hugepages. Let's go through them one by one, starting
with locked.
locked
In KVM virtualization, guest memory lies in the process address space of the qemu-kvm
process in the KVM host. These guest memory pages can be swapped out by the Linux
kernel at any time, based on the requirement that the host has, and this is where locked
can help. If you set the memory backing option of the guest to locked, the host will not
swap out memory pages that belong to the virtual system or guest. The virtual memory
pages in the host system memory are locked when this option is enabled:
<memoryBacking>
<locked/>
</memoryBacking>
When locking guest memory, we also need to use <memtune> to set hard_limit. The
calculation is simple – whatever amount of memory the guest needs, plus some overhead.
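A minimal sketch of that, assuming an 8 GB guest plus roughly 1 GB of overhead, would be:
<memtune>
<hard_limit unit='GiB'>9</hard_limit>
</memtune>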
nosharepages
The following is the XML representation of nosharepages from the guest
configuration file:
<memoryBacking>
<nosharepages/>
</memoryBacking>
There are different mechanisms that can enable the sharing of memory when the memory
pages are identical. Techniques such as Kernel Same-Page Merging (KSM) share pages
among guest systems. The nosharepages option instructs the hypervisor to disable
shared pages for this guest – that is, setting this option will prevent the host from
deduplicating memory between guests.
hugepages
The third and final option is hugepages, which can be represented in XML format, as
follows:
<memoryBacking>
<hugepages/>
</memoryBacking>
HugePages were introduced in the Linux kernel to improve the performance of memory
management. Memory is managed in blocks known as pages. Different architectures (i386,
ia64) support different page sizes. We don't necessarily have to use the default setting for
x86 CPUs (4 KB memory pages), as we can use larger memory pages (2 MB to 1 GB),
a feature that's called HugePages. A part of the CPU called the Memory Management
Unit (MMU) manages these pages by using a list. The pages are referenced through page
tables, and each page has a reference in the page table. When a system wants to handle a
huge amount of memory, there are mainly two options. One of them involves increasing
the number of page table entries in the hardware MMU. The second method increases the
default page size. If we opt for the first method of increasing the page table entries, it is
really expensive.
The second and more efficient method when dealing with large amounts of memory is
using HugePages – that is, increased page sizes. Since different servers have different
amounts of memory, there is also a need for different page sizes.
The default values are okay for most situations, while huge memory pages (for example,
1 GB) are more efficient if we have large amounts of memory (hundreds of gigabytes or
even terabytes). This means less administrative work in terms of referencing memory
pages and more time spent actually getting the content of these memory pages, which
can lead to a significant performance boost. Most of the known Linux distributions can
use HugePages to manage large memory amounts. A process can use HugePages memory
support to improve performance by increasing the CPU cache hits against the Translation
LookAside Buffer (TLB), as explained in Chapter 2, KVM as a Virtualization Solution.
You already know that guest systems are simply processes in a Linux system, thus the
KVM guests are eligible to do the same.
Before we move on, we should also mention Transparent HugePages (THP). THP is an
abstraction layer that automates the HugePages size allocation based on the application
request. THP support can be entirely disabled, can only be enabled inside MADV_HUGEPAGE
regions (to avoid the risk of consuming more memory resources), or enabled system-wide.
There are three main options for configuring THP in a system: always, madvise,
and never:
# cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
From the preceding output, we can see that the current THP setting in our server is
madvise. Other options can be enabled by using one of the following commands:
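For example (these changes take effect immediately, but they will not persist across a reboot unless you add them to the kernel command line or a tuning profile):
# echo always > /sys/kernel/mm/transparent_hugepage/enabled
# echo never > /sys/kernel/mm/transparent_hugepage/enabled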
The system settings for performance are automatically optimized by THP, and we can
get performance benefits by using memory as cache. It is possible to use static HugePages
while THP is in place – in other words, THP won't prevent us from using the static
method. If we don't configure our KVM hypervisor to use static HugePages, it will use
transparent HugePages on top of the default 4 KB pages. The advantages we get from using HugePages for a KVM
guest's memory are that less memory is used for page tables and TLB misses are reduced;
obviously, this increases performance. But keep in mind that when using HugePages for
guest memory, you can no longer swap or balloon guest memory.
Let's have a quick look at how to use static HugePages in your KVM setup. First, let's
check the current system configuration – it's clear that the HugePages size in this
system is currently set at 2 MB:
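A quick way to verify this is to query /proc/meminfo (2,048 KB, or 2 MB, is the typical x86 default; your output may differ):
# grep Hugepagesize /proc/meminfo
Hugepagesize:       2048 kB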
1. View the current explicit hugepages value by running the following command or
fetch it from sysfs, as shown:
# cat /proc/sys/vm/nr_hugepages
0
2. As the HugePage size is 2 MB, we can set hugepages in increments of 2 MB. To set
the number of hugepages to 2,000, use the following command:
# echo 2000 > /proc/sys/vm/nr_hugepages
The total memory assigned for hugepages cannot be used by applications that are
not hugepage-aware – that is, if you over-allocate hugepages, normal operations of
the host system can be affected. In our example, 2,000 * 2 MB equals 4,000 MB
of memory, which we need to have available when we do this configuration.
3. We need to tell the system that this type of configuration is actually OK and
configure /etc/security/limits.conf to reflect that. Otherwise, the system
might refuse to give us access to 2,000 hugepages times 2 MB of memory. We need
to add two lines to that file:
soft memlock <value>
hard memlock <value>
4. Then, mount the hugetlbfs filesystem, reconfigure the VM so that its memory
backing uses hugepages, and restart the host:
# mount -t hugetlbfs hugetlbfs /dev/hugepages
<memoryBacking>
<hugepages/>
</memoryBacking>
It's time to shut down the VM and reboot the host. Inside the VM, do the following:
# systemctl poweroff
Then, on the host, run the following:
# systemctl reboot
After the host reboot and the restart of the VM, it will now start using the hugepages.
The next topic is related to sharing memory content between multiple VMs, referred to
as KSM. This technology is heavily used to save memory. At any given moment, when
multiple VMs are running on the virtualization host, there's a big statistical chance
that some of their memory blocks have identical contents. In that case, there's no reason
to store the same contents multiple times. Usually, we
refer to KSM as a deduplication process being applied to memory. Let's learn how to use
and configure KSM.
KSM relies on the copy-on-write (COW) principle: when a memory page is shared and
common to more than one process, the process that requests a change gets a new copy
and the changes are saved in it.
Even though the consolidated COW shared page is accessible by all the processes,
whenever a process tries to change the content (write to that page), the process gets a
new copy with all of the changes. By now, you will have understood that, by using KSM,
we can reduce physical memory consumption. In the KVM context, this can really add
value, because guest systems are qemu-kvm processes in the system, and there is a huge
possibility that all the VM processes will have a good amount of similar memory.
For KSM to work, the process/application has to register its memory pages with KSM.
In KVM-land, KSM allows guests to share identical memory pages, thus achieving an
improvement in memory consumption. That might be some kind of application data, a
library, or anything else that's used frequently. This shared page or memory is marked as
copy on write. In short, KSM avoids memory duplication and it's really useful when
similar guest OSes are present in a KVM environment.
By using the theory of prediction, KSM can provide enhanced memory speed and
utilization. Mostly, this common shared data is stored in cache or main memory, which
causes fewer cache misses for the KVM guests. Also, KSM can reduce the overall guest
memory footprint so that, in a way, it allows the user to do memory overcommitting in a
KVM setup, thus supplying the greater utilization of available resources. However, we have
to keep in mind that KSM requires more CPU resources to identify the duplicate pages
and to perform tasks such as sharing/merging.
Previously, we mentioned that the processes have to mark the pages to show that they are
eligible candidates for KSM to operate. The marking can be done by a process based on
the MADV_MERGEABLE flag, which we will discuss in the next section. You can explore
the use of this flag in the madvise man page:
# man 2 madvise
MADV_MERGEABLE (since Linux 2.6.32)
Enable Kernel Samepage Merging (KSM) for the pages in the
range specified by addr and length. The kernel regularly scans
those areas of user memory that have been marked as mergeable,
looking for pages with identical content. These are replaced
by a single write-protected page (that is automatically copied
if a process later wants to update the content of the page).
KSM merges only private anonymous pages (see mmap(2)).
KSM gets deployed as a part of the qemu-kvm package. Information about the KSM
service can be fetched from the sysfs filesystem, in the /sys directory. There are
different files available in this location, reflecting the current KSM status. These are updated
dynamically by the kernel, and it has a precise record of the KSM usage and statistics:
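Depending on the kernel version, you'll see files such as the following:
# ls /sys/kernel/mm/ksm/
full_scans  merge_across_nodes  pages_shared  pages_sharing
pages_to_scan  pages_unshared  pages_volatile  run  sleep_millisecs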
As with any other service, the ksmtuned service also has logs stored in a log file, /var/
log/ksmtuned. If we add DEBUG=1 to /etc/ksmtuned.conf, we will have logging
from any kind of KSM tuning actions. Refer to https://www.kernel.org/doc/
Documentation/vm/ksm.txt for more details.
Once we start the KSM service, as shown next, you can watch the values change
depending on the KSM service in action:
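On CentOS/RHEL-based hosts, the relevant units are called ksm and ksmtuned, so starting them boils down to something like this:
# systemctl enable --now ksm
# systemctl enable --now ksmtuned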
We can then check the status of the ksm service like this:
Figure 15.15 – The ksm service command and the ps command output
Once the KSM service is started and we have multiple VMs running on our host, we can
check the changes by querying sysfs by using the following command multiple times:
cat /sys/kernel/mm/ksm/*
Let's explore the ksmtuned service in more detail. The ksmtuned service is designed
so that it goes through a cycle of actions and adjusts KSM. This cycle of actions continues
its work in a loop. Whenever a guest system is created or destroyed, libvirt will notify the
ksmtuned service.
The /etc/ksmtuned.conf file is the configuration file for the ksmtuned service.
Here is a brief explanation of the configuration parameters available. You can see these
configuration parameters match with the KSM files in sysfs:
Keep in mind that KSM can also create performance overhead in some setups or environments – for example, if you have a few VMs that
have similar memory content when you start them and loads of memory-intensive
operations afterward. This will create issues as KSM will first work very hard to reduce the
memory footprint, and then lose time to cover for all of the memory content differences
between multiple VMs. Also, there is a concern that KSM may open a channel that could
potentially be used to leak information across guests, as has been well documented in the
past couple of years. If you have these concerns or if you see/experience KSM not helping
to improve the performance of your workload, it can be disabled.
To disable KSM, stop the ksmtuned and ksm services in your system by executing
the following:
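On a CentOS/RHEL host, that comes down to the following (disabling the units as well makes the change persistent across reboots):
# systemctl stop ksm ksmtuned
# systemctl disable ksm ksmtuned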
We have gone through the different tuning options for CPU and memory. The next big
subject that we need to cover is NUMA configuration, where both CPU and memory
configuration become a part of a larger story or context.
One of the easiest ways to show the current NUMA topology of a system is via the
numactl command:
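For reference, the full hardware topology can be listed as follows (the actual output, showing the nodes, CPUs, memory sizes, and node distances, will differ from system to system):
# numactl --hardware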
The preceding numactl output conveys that there are 10 CPUs in the system and they
belong to a single NUMA node. It also lists the memory associated with each NUMA
node and the node distance. When we discussed CPU pinning, we displayed the topology
of the system using the virsh capabilities command. To get a graphical view of the NUMA
topology, you can make use of a command called lstopo, which is available with the
hwloc package in CentOS-/Red Hat-based systems:
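On CentOS 8, for example, installing and running it would look roughly as follows (the graphical lstopo binary ships in the hwloc-gui package; package names can differ slightly between distributions):
# dnf -y install hwloc hwloc-gui
# lstopo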
Moving on to the guest configuration, NUMA memory allocation policies can be controlled via the optional numatune element in the guest XML, as the following example shows:
<domain>
…
<numatune>
<memory mode="strict" nodeset="1-4,^3"/>
<memnode cellid="0" mode="strict" nodeset="1"/>
<memnode cellid="2" mode="preferred" nodeset="2"/>
</numatune> ...
</domain>
Even though the element called numatune is optional, it is provided to tune the
performance of the NUMA host by controlling the NUMA policy for the domain process.
The main sub-tags of this optional element are memory and nodeset. Some notes on
these sub-tags are as follows:
• memory: This element describes the memory allocation process on the NUMA
node. There are three policies that govern memory allocation for NUMA nodes:
a) Strict: When a VM tries to allocate memory and that memory isn't available,
allocation will fail.
b) Interleave: Memory is allocated in a round-robin fashion across the NUMA nodes defined in the nodeset.
c) Preferred: The VM tries to allocate memory from a preferred node. If that
node doesn't have enough memory, it can allocate memory from the remaining
NUMA nodes.
• nodeset: Specifies a NUMA node list available on the server.
There are some more things to consider when thinking about CPU pinning in the NUMA
context. We discussed the basics of CPU pinning earlier in this chapter, as it gives us better,
predictable performance for our VMs and can increase cache efficiency. Just as an example,
let's say that we want to run a VM as fast as possible. It would be prudent to run it on the
fastest storage available, which would be on a PCI Express bus on the CPU socket where
we pinned the CPU cores. If we're not using an NVMe SSD local to that VM, we can use a
storage controller to achieve the same thing. However, if the storage controller that we're
using to access VM storage is physically connected to another CPU socket, that will lead
to latency. For latency-sensitive applications, that will mean a big performance hit.
However, we also need to be aware of the other extreme – if we do too much pinning, it
can create other problems in the future. For example, if our servers are not architecturally
the same (having the same number of cores and the same amount of memory), migrating VMs might become
problematic. We can create a scenario where we're migrating a VM with CPU cores
pinned to cores that don't exist on the target server of our migration process. So, we
always need to be careful about what we do with the configuration of our environments so
that we don't take it too far.
The next subject on our list is emulatorpin, which can be used to pin our qemu-kvm
emulator to a specific CPU core so that it doesn't influence the performance of our VM
cores. Let's learn how to configure that.
Understanding emulatorpin
The emulatorpin option also falls into the CPU tuning category. The XML
representation of this would be as follows:
<domain> ...
<cputune>
...
<emulatorpin cpuset="1-3"/>
...
</cputune> ...
</domain>
The emulatorpin element is optional and is used to pin the emulator (qemu-kvm) to a
host physical CPU. This does not include the vCPU or IO threads from the VM. If this is
omitted, the emulator is pinned to all the physical CPUs of the host system by default.
Important note:
Please note that <vcpupin>, <numatune>, and <emulatorpin>
should be configured together to achieve optimal, deterministic performance
when you tune a NUMA-capable system.
Before we leave this section, there are a couple more things to cover: the guest system
NUMA topology and hugepage memory backing with NUMA.
Guest NUMA topology can be specified using the <numa> element in the guest XML
configuration; some call this virtual NUMA:
<cpu> ...
<numa>
<cell id='0' cpus='0-3' memory='512000' unit='KiB'/>
<cell id='1' cpus='4-7' memory='512000' unit='KiB' />
</numa> ...
</cpu>
The id attribute of each cell element identifies the virtual NUMA node, the cpus
attribute assigns a specific core (or cores) to that node, and the memory attribute assigns
the amount of memory per node. Each NUMA node is indexed by number, starting from 0.
Previously, we discussed the memorybacking element, which can be specified to use
hugepages in guest configurations. When NUMA is present in a setup, the nodeset
attribute can be used to configure the specific hugepage size per NUMA node, which
may come in handy as it ties a given guest's NUMA nodes to certain hugepage sizes:
<memoryBacking>
<hugepages>
<page size="1" unit="G" nodeset="0-2,4"/>
<page size="4" unit="M" nodeset="3"/>
</hugepages>
</memoryBacking>
This type of configuration can optimize the memory performance, as guest NUMA nodes
can be moved to host NUMA nodes as required, while the guest can continue to use the
hugepages allocated by the host.
NUMA tuning also has to consider the NUMA node locality for PCI devices, especially
when a PCI device is being passed through to the guest from the host. If the relevant PCI
device is affiliated to a remote NUMA node, this can affect data transfer and thus hurt
the performance.
The easiest way to display the NUMA topology and PCI device affiliation is by using the
lstopo command that we discussed earlier. The non-graphic form of the same command
can also be used to discover this configuration. Please refer to the earlier sections.
When KSM and NUMA are used in combination, we also need to check whether KSM is allowed to merge pages across NUMA nodes, which is controlled by the following parameter:
# cat /sys/kernel/mm/ksm/merge_across_nodes
1
If this parameter is set to 0, KSM only merges memory pages from the same NUMA node.
If it's set to 1 (as is the case here), it will merge across the NUMA nodes. That means that
the VM CPUs that are running on the remote NUMA node will experience latency when
accessing a KSM-merged page.
Obviously, you know the guest XML entry (the memorybacking element) for asking the
hypervisor to disable shared pages for the guest. If you don't remember, please refer back
to the memory tuning section for details of this element. Even though we can configure
NUMA manually, there is something called automatic NUMA balancing. We did mention
it earlier, but let's see what this concept involves.
Automatic NUMA balancing shows up as NUMA-related flags in the kernel scheduler features, which we can list as follows:
# cat /sys/kernel/debug/sched_features
GENTLE_FAIR_SLEEPERS START_DEBIT NO_NEXT_BUDDY LAST_BUDDY
CACHE_HOT_BUDDY WAKEUP_PREEMPTION ARCH_POWER NO_HRTICK
NO_DOUBLE_TICK LB_BIAS NONTASK_POWER TTWU_QUEUE
NO_FORCE_SD_OVERLAP RT_RUNTIME_SHARE NO_LB_MIN NUMA
NUMA_FAVOUR_HIGHER NO_NUMA_RESIST_LOWER
We can check whether it is enabled in the system via the following method:
# cat /proc/sys/kernel/numa_balancing
1
The automatic NUMA balancing mechanism works based on a number of algorithms
and data structures. The internals of this method are based on the following:
• Task placement
• Task grouping
One of the best practices or recommendations for a KVM guest is to limit its resource to
the amount of resources on a single NUMA node. Put simply, this avoids the unnecessary
splitting of VMs across NUMA nodes, which can degrade the performance. Let's start
by checking the current NUMA configuration. There are multiple available options to do
this. Let's start with the numactl command, NUMA daemon, and numastat, and then
go back to using a well-known command, virsh.
# cat /sys/kernel/debug/sched_features
This will not list NUMA flags if the system is not NUMA-aware.
Generally, don't make VMs wider than what a single NUMA node can provide. Even if
the NUMA is available, the vCPUs are bound to the NUMA node and not to a particular
physical CPU.
For a large in-memory database application, for example, especially if memory accesses are
likely to remain unpredictable, numad will probably not improve performance.
To adjust and align the CPUs and memory resources automatically according to the
NUMA topology, we need to run numad. To use numad as an executable, just run
the following:
# numad
Once the numad binary is executed, it will start the alignment, as shown in the following
screenshot. In our system, we have the following VM running:
When needed, numad can be stopped by setting its scanning interval to 0:
# numad -i 0
We can always stop it, but that will not change the NUMA affinity state that was
already configured by numad. Now let's move on to numastat.
The numastat command, which ships as part of the numactl package (just as the numad
package provides the numad binary), can be used to check per-NUMA-node memory
statistics for a given process, such as qemu-kvm:
Figure 15.22 – The numastat command output for the qemu-kvm process
Important note:
The numerous memory tuning options that we have used have to be thoroughly
tested using different workloads before moving the VM to production.
Before we jump on to the next topic, we'd just like to remind you of a point we made
earlier in this chapter. Live-migrating a VM with pinned resources might be complicated,
as you have to have some form of compatible resources (and their amount) on the target
host. For example, the target host's NUMA topology doesn't have to be aligned with
the source host's NUMA topology. You should consider this fact when you tune a KVM
environment. Automatic NUMA balancing may, to a certain extent, reduce the need for
manually pinning guest resources, though.
There are mainly two layers (virt queue and virtual ring) to support communication
between the guest and the hypervisor.
Virt queue and virtual ring (vring) are the transport mechanism implementations in
virtio. Virt queue (virtio) is the queue interface that attaches the frontend and backend
drivers. Each virtio device has its own virt queues and requests from guest systems are
put into these virt queues. Each virt queue has its own ring, called a vring, which is where
the memory is mapped between QEMU and the guest. There are different virtio drivers
available for use in a KVM guest.
The devices are emulated in QEMU, and the drivers are part of the Linux kernel, or an
extra package for Windows guests. Some examples of device/driver pairs are as follows:
• virtio-console: The virtio console device is a simple device for data input and
output between the guest and host userspace.
• virtio-rng: The virtio entropy device supplies high-quality randomness for
guest use. There are several more device/driver pairs, such as virtio-blk and virtio-net, which we'll come back to later in this chapter.
In general, you should make use of these virtio devices in your KVM setup for better
performance.
A guest treats the virtual disk as its storage. When an application inside a guest OS writes
data to the local storage of the guest system, it has to pass through a couple of layers. That
said, this I/O request has to traverse through the filesystem on the storage and the I/O
subsystem of the guest OS. After that, the qemu-kvm process passes it to the hypervisor
from the guest OS. Once the I/O is within the realm of the hypervisor, it starts processing
the I/O like any other application running in the host OS. Here, you can see the number
of layers that the I/O has to pass through to complete an I/O operation. Hence, the block
device backend performs better than the image file backend.
The following are our observations on disk backends and file- or image-based virtual disks:
• A file image is part of the host filesystem and it creates an additional resource
demand for I/O operations compared to the block device backend.
• Using sparse image files helps to over-allocate host storage, but doing so will reduce
the performance of the virtual disk.
• The improper partitioning of guest storage when using disk image files can cause
unnecessary I/O operations. Here, we are mentioning the alignment of standard
partition units.
At the start of this chapter, we discussed virtio drivers, which give better performance. So,
it's recommended that you use the virtio disk bus when configuring the disk, rather than
the IDE bus. The virtio_blk driver uses the virtio API to provide high performance
for storage I/O devices, thus increasing storage performance, especially in large enterprise
storage systems. We discussed the different storage formats available in Chapter 5, Libvirt
Storage; however, the main ones are the raw and qcow formats. The best performance
will be achieved when you are using the raw format. There is obviously a performance
overhead introduced by the format layer when using qcow, because the format layer has to
perform additional operations at times – for example, if you want to grow a qcow image, it has
to allocate new clusters, and so on. However, qcow would be an option if you want to
make use of features such as snapshots. These extra facilities are provided with the image
format, qcow. Some performance comparisons can be found at http://www.linux-kvm.org/page/Qcow2.
There are three options that can be considered for I/O tuning, which we discussed in
Chapter 7, Virtual Machine – Installation, Configuration, and Life Cycle Management:
• Cache mode
• I/O mode
• I/O tuning
Let's briefly go through some XML settings so that we can implement them on our VMs.
The cache option settings can reflect in the guest XML, as follows:
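A minimal sketch of a disk definition with explicit cache and I/O settings might look like this (the file path and device name are placeholders, and the cache='none' and io='native' combination is just one commonly used example, not a universal recommendation):
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2' cache='none' io='native'/>
<source file='/var/lib/libvirt/images/testvm.qcow2'/>
<target dev='vda' bus='virtio'/>
</disk>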
• Limiting the disk I/O of each guest may be required, especially when multiple
guests are running on the same host.
Even though the disk I/O is not the only resource that has to be considered to
guarantee QoS, this has some importance. Tuning I/O can prevent a guest system from
monopolizing shared resources and lowering the performance of other guests running on
the same host. This is really a requirement, especially when the host system is serving a
Virtual Private Server (VPS) or a similar kind of service. KVM gives the flexibility to do
I/O throttling on various levels – throughput and I/O amount, and we can do it per block
device. This can be achieved via the virsh blkdeviotune command. The different
options that can be set using this command are displayed as follows:
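A hedged example of throttling a single virtual disk might look like this (the target device name and the limit values are purely illustrative):
# virsh blkdeviotune <domain> vda --total-iops-sec 1000 --total-bytes-sec 52428800 --live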
After that, you need to edit /etc/sysctl.conf to make this setting persistent.
For more information on ARP Flux, please refer to http://linux-ip.net/html/
ether-arp.html#ether-arp-flux.
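One common way to deal with ARP flux on hosts with multiple interfaces in the same subnet is to tighten the ARP behavior via sysctl – treat the following values as a starting point rather than a definitive recommendation:
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2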
Additional tuning can be done on the driver level; that said, by now we know that virtio
drivers give better performance compared to emulated device APIs. So, obviously,
using the virtio_net driver in guest systems should be taken into account. When
we use the virtio_net driver, it has a backend driver in qemu that takes care of the
communication initiated from the guest network. Even though this already performs better,
further enhancements in this area introduced a new driver called vhost_net, which
provides in-kernel virtio devices for KVM. Even though vhost is a common framework
that can be used by different drivers, the network driver, vhost_net, was one of the
first drivers. The following diagram will make this clearer:
As you may have noticed, the number of context switches is really reduced with the new
path of communication. The good news is that there is no extra configuration required in
guest systems to support vhost because there is no change to the frontend driver.
vhost_net reduces copy operations, lowers latency and CPU usage, and thus yields
better performance. First of all, the kernel module called vhost_net (refer to the
screenshot in the next section) has to be loaded in the system. As this is a character device
inside the host system, it creates a device file called /dev/vhost-net on the host.
How to turn it on
When QEMU is launched with -netdev tap,vhost=on, it will instantiate the
vhost-net interface by using ioctl() calls. This initialization process binds qemu
with a vhost-net instance, along with other operations such as feature negotiations
and so on:
• Guest-to-host communication
• Small packet workloads
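To illustrate how the -netdev tap,vhost=on option mentioned previously fits into a QEMU command line, here's a rough sketch (the memory size, image path, and device IDs are placeholders):
# qemu-system-x86_64 -m 2048 \
  -drive file=/var/lib/libvirt/images/guest.qcow2,if=virtio \
  -netdev tap,id=net0,vhost=on \
  -device virtio-net-pci,netdev=net0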
Also, a performance improvement can be obtained by enabling multi-queue virtio-
net. For additional information, check out https://fedoraproject.org/wiki/
Features/MQ_virtio_net.
One of the bottlenecks when using virtio-net was its single RX and TX queue. Even
though there are more vCPUs, the networking throughput was affected by this limitation.
virtio-net was originally a single-queue implementation, so multi-queue virtio-net was
developed. Before this option was introduced, virtual NICs could not utilize the multi-
queue support that is available in the Linux kernel.
This bottleneck is lifted by introducing multi-queue support in both frontend and backend
drivers. This also helps guests scale with more vCPUs. To start a guest with two queues,
you could specify the queues parameters to both tap and virtio-net, as follows:
<interface type='network'>
<source network='default'/>
<model type='virtio'/>
<driver name='vhost' queues='M'/>
</interface>
Here, M can be 1 to 8, as the kernel supports up to eight queues for a multi-queue tap
device. Once it's configured for qemu, inside the guest, we need to enable multi-queue
support with the ethtool command. Enable the multi-queue through ethtool
(where the value of K is from 1 to M), as follows:
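A sketch of the guest-side command (eth0 is just an example interface name):
# ethtool -L eth0 combined K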
You can check the following link to see when multi-queue virtio-net provides the
greatest performance benefit: https://access.redhat.com/documentation/
en-us/red_hat_enterprise_linux/7/html/virtualization_
tuning_and_optimization_guide/sect-virtualization_tuning_
optimization_guide-networking-techniques.
Don't use the options mentioned on the aforementioned URL blindly – please test the
impact on your setup, because the CPU consumption will be greater in this scenario
even though the network throughput is impressive.
kvm-clock
kvm-clock is also known as a virtualization-aware (paravirtualized) clock device.
When kvm-clock is in use, the guest asks the hypervisor about the current time,
guaranteeing both stable and accurate timekeeping. The functionality is achieved by the
guest registering a page and sharing the address with the hypervisor. This is a shared page
between the guest and the hypervisor. The hypervisor keeps updating this page unless it
is asked to stop. The guest can simply read this page whenever it wants time information.
However, please note that the hypervisor should support kvm-clock for the guest to use
it. For more details, you can check out https://lkml.org/lkml/2010/4/15/355.
By default, most of the newer Linux distributions use Time Stamp Counter (TSC), a CPU
register, as a clock source. You can verify whether TSC or kvm_clock are configured
inside the guest via the following method:
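One way to check is to read the current clocksource from sysfs (in a KVM guest, the output will typically be kvm-clock):
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
kvm-clock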
You can also use ntpd or chrony as your time synchronization services on Linux, which requires
minimal configuration. In your Linux VM, edit /etc/ntp.conf or /etc/chrony.conf
and modify the server configuration lines to point to your NTP servers by IP
address. Then, just enable and start the service that you're using (we're using chrony
as an example here):
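For chrony, that comes down to something like this:
# systemctl enable --now chronyd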
There's another, a bit newer, protocol that's being heavily pushed for time synchronization,
which is called the Precision Time Protocol (PTP). Nowadays, this is becoming the de
facto standard service to be used on the host level. This protocol is directly supported in
hardware (as in network interface cards) for many of the current network cards available
on the market. As it's basically hardware-assisted, it should be even more accurate than
ntpd or chronyd. It uses timestamping on the network interface, external time sources,
and the computer's system clock for synchronization.
Installing all of the necessary prerequisites is just a matter of one yum command, after which we
enable and start a service:
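On CentOS/RHEL, this would look roughly as follows (the linuxptp package provides the ptp4l service; depending on your setup, you may also need to point ptp4l at the correct network interface):
# yum -y install linuxptp
# systemctl enable --now ptp4l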
By doing this, a ptp device will be created in the /dev directory, which we can then
use as a chrony time source. Add the following line to /etc/chrony.conf and
restart chronyd:
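Assuming the device shows up as /dev/ptp0, the chrony directive would look something like this:
refclock PHC /dev/ptp0 poll 2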
By using an API call, all Linux VMs are then capable of getting their time from the host itself.
Software-based design
Remember our initial scenario, involving a Windows Server 2019-based VM that should
be a node in a Microsoft SQL Server cluster? We covered a lot of the settings in terms of
tuning, but there's more to do – much more. We need to be asking some questions. The
sooner we ask these questions, the better, as they're going to have a key influence on
our design.
• Excuse me, dear customer, when you say cluster, what do you mean specifically, as
there are different SQL Server clustering methodologies?
• Which SQL licenses do you have or are you planning to buy?
• Do you need active-active, active-passive, a backup solution, or something else?
• Is this a single-site or a multi-site cluster?
• Which SQL features do you need exactly?
• Which licenses do you have and how much are you willing to spend on them?
• Is your application capable of working with a SQL cluster (for example, in a
multi-site scenario)?
• What kind of storage system do you have?
• What amount of IOPS can your storage system provide?
• How are latencies on your storage?
• Do you have a storage subsystem with different tiers?
• What are the service levels of these tiers in terms of IOPS and latency?
• If you have multiple storage tiers, can we create SQL VMs in accordance with the
best practices – for example, place data files and log files on separate virtual disks?
• Do you have enough disk capacity to meet your requirements?
These are just licensing, clustering, and storage-related questions, and they are not going
to go away. They need to be asked, without hesitation, and we need to get real answers
before deploying things. We have just mentioned 14 questions, but there are actually
many more.
Furthermore, we need to think about other aspects of VM design. It would be prudent
to ask some questions such as the following:
• Do you have any information about the scale and/or amount of queries that you're
designing this SQL infrastructure for?
• Is money a big deciding factor in this project (as it will influence a number of design
decisions as SQL is licensed per core)? There's also the question of Standard versus
Enterprise pricing.
This stack of questions actually points to one very, very important part of VM design,
which is related to memory, memory locality, the relationship between CPU and memory,
and also one of the most fundamental questions of database design – latency. A big part
of that is related to correct VM storage design – the correct storage controller, storage
system, cache settings, and so on, and VM compute design – which is all about NUMA.
We've explained all of those settings in this chapter. So, to configure our SQL VM
properly, here's a list of the high-level steps that we should follow:
• Configure a VM with the correct NUMA settings and local memory. Start with four
vCPUs for licensing reasons and then figure out whether you need more (such as if
your VM becomes CPU-limited, which you will see from performance graphs and
SQL-based performance monitoring tools).
• If you want to reserve CPU capacity, make use of CPU pinning so that specific CPU
cores on the physical server's CPU are always used for the SQL VM, and only for it.
Isolate other VMs to the remaining cores.
• Reserve memory for the SQL VM so that it doesn't swap, as only using real RAM
memory will guarantee smooth performance that's not influenced by noisy neighbors.
• Configure KSM per VM if necessary and avoid using it on SQL VMs as it might
introduce latency. In the design phase, make sure you buy as much RAM memory
as possible so that memory doesn't become an issue as it will be a very costly
issue in terms of performance if a server doesn't have enough of it. Don't ever
overcommit memory.
• Configure the VM with multiple virtual hard disks and put those hard disks in
storage that can provide levels of service needed in terms of latency, overhead, and
caching. Remember, an OS disk doesn't necessarily need write caching, but database
and log disks will benefit from it.
• Use separate physical connections from your hosts to your storage devices and tune
storage to get as much performance out of it as possible. Don't oversubscribe – both
on the links level (too many VMs going through the same infrastructure to the same
storage device) and the datastore level (don't put one datastore on a storage device
and store all VMs on it as it will negatively impact performance – isolate workloads,
create multiple targets via multiple links, and use masking and zoning).
The takeaway of this chapter is the following – don't just blindly install an application
because a client asks you to install it. It will come to haunt you later on, and it will be
much, much more difficult to resolve any kind of problems and complaints. Take your
time and do it right. Prepare for the whole process by reading the documentation, as it's
widely available.
Summary
In this chapter, we did some digging, going deep into the land of KVM performance tuning
and optimization. We discussed many different techniques, varying from simple ones,
such as CPU pinning, to much more complex ones, such as NUMA and proper NUMA
configuration. Don't be put off by this, as learning design is a process, and designing things
correctly is a craft that can always be improved with learning and experience. Think of it
this way – when architects were designing the highest skyscrapers in the world, didn't they
move the goalposts farther and farther with each new highest building?
In the next chapter – the final chapter of this book - we will discuss troubleshooting your
environments. It's at least partially related to this chapter, as we will be troubleshooting
some issues related to performance as well. Go through this chapter multiple times before
switching to the troubleshooting chapter – it will be very, very beneficial for your overall
learning process.
Questions
1. What is CPU pinning?
2. What does KSM do?
3. How do we enhance the performance of block devices?
4. How do we tune the performance of network devices?
5. How can we synchronize clocks in virtualized environments?
6. How do we configure NUMA?
7. How do we configure NUMA and KSM to work together?
Further reading
Please refer to the following links for more information:
• CPU tuning: http://libvirt.org/formatdomain.html#elementsCPUTuning
• KSM kernel documentation: https://www.kernel.org/doc/
Documentation/vm/ksm.txt
• Placement: http://libvirt.org/formatdomain.
html#elementsNUMATuning
• Automatic NUMA balancing: https://www.redhat.com/files/
summit/2014/summit2014_riel_chegu_w_0340_automatic_numa_
balancing.pdf
• Virtio 1.1 specification: http://docs.oasis-open.org/virtio/virtio/
v1.1/virtio-v1.1.html
• ARP Flux: http://Linux-ip.net/html/ether-arp.html#ether-arp-
flux
• MQ virtio: https://fedoraproject.org/wiki/Features/MQ_virtio_
net
• libvirt NUMA tuning on RHEL 7: https://access.redhat.com/
documentation/en-us/red_hat_enterprise_linux/7/html/
virtualization_tuning_and_optimization_guide/sect-
virtualization_tuning_optimization_guide-numa-numa_and_
libvirt
16
Troubleshooting
Guidelines for the
KVM Platform
If you've followed this book all the way from Chapter 1, Understanding Linux
Virtualization, then you'll know we went through a lot together in this book – hundreds
and hundreds of pages of concepts and practical aspects, including configuration
examples, files and commands – everything. 700 or so pages of it. So far, we've almost
completely ignored troubleshooting as part of that journey. We didn't do this on the
premise that everything just works in Linux and that we didn't have any issues at all
and that we achieved a state of nirvana while going through this book cover to cover.
It was a journey riddled with various types of issues. Some of them aren't worth
mentioning as they were our own mistakes. Mistakes like the ones we made (and you
will surely make more too) mostly come from the fact that we mistyped something (in a
command or configuration file). Basically, humans play a big role in IT. But some of these
issues were rather frustrating. For example, implementing SR-IOV required a lot of time
as we had to find different types of problems at the hardware, software, and configuration
levels to make it work. oVirt was quite quirky, as we'll soon explain. Eucalyptus was
interesting, to put it mildly. Although we used it a lot before, cloudbase-init was really
complicated and required a lot of our time and attention, which turned out not to be due
to something we did – it was just the cloudbase-init version. But overall, this just further
proved a general point from our previous chapter – reading about various IT subjects in
books, articles, and blog posts – is a really good approach to configuring a lot of things
correctly from the start. But even then, you'll still need a bit of troubleshooting to make
everything picture perfect.
Everything is great and amazing once you install a service and start using it, but it seldom
happens that way the first time round. Everything we used in this book was actually
installed to enable us to test different configurations and grab the necessary screenshots,
but at the same time, we wanted to make sure that they can actually be installed and
configured in a more structured, procedural way.
So, let's start with some simple things related to services, packages, and logging. Then,
we'll move on to more advanced concepts and tools for troubleshooting, described
through various examples that we have covered along the way.
In this chapter, we will cover the following topics:
In Chapter 3, Installing a KVM Hypervisor, libvirt, and ovirt, we did a basic installation
of the overall KVM stack by installing the virt module and using the dnf command to
deploy various packages. There are a couple of reasons why this might not end up being
a good idea:
You can also use the following command, if you already installed your Linux host
and the appropriate packages that we mentioned in Chapter 3, Installing a KVM
hypervisor, libvirt, and ovirt:
virt-host-validate
If you don't get any output from this command, your system either doesn't support
virtualization (less likely) or doesn't have virtualization features turned on. Make
sure that you check your BIOS settings.
Often, we forget to start and enable the libvirt-guests service, and then we
get very surprised after we reboot our host. The result of libvirt-guests not
being enabled is simple: when enabled, this service suspends your virtual machines when you
initiate a host shutdown and resumes them on the next boot. In other words, if you don't
enable it, your virtual machines won't resume after the next reboot. Also, check
out its configuration file, /etc/sysconfig/libvirt-guests. It's a simple
text configuration file that enables you to configure at least three very important
settings: ON_SHUTDOWN, ON_BOOT, and START_DELAY. Let's explain these (a short sample configuration follows the list):
a) By using the ON_SHUTDOWN setting, we can select what happens with the virtual
machine when we shut down your host since it accepts values such as shutdown
and suspend.
b) The ON_BOOT option does the opposite – it tells libvirtd whether it needs to
start all the virtual machines on host boot, whatever their autostart settings are. It
accepts values such as start and ignore.
c) The third option, START_DELAY, allows you to set a timeout value (in seconds)
between multiple virtual machine power-on actions while the host is booting. It
accepts numeric values, with 0 being the value for parallel startup and all other
(positive) numbers being the number of seconds it waits before it starts the next
virtual machine.
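A minimal sketch of /etc/sysconfig/libvirt-guests using these three settings might look like this (the values shown are examples; pick whatever suits your environment):
ON_BOOT=start
ON_SHUTDOWN=suspend
START_DELAY=0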
• Make sure that these two services are actually running by typing in the following
commands:
systemctl status libvirtd
systemctl status libvirt-guests
• When you start the libvirt service, it usually comes with some sort of pre-defined
firewall configuration. Keep that in mind in case you ever decide to disable libvirt
services as the firewall rules will almost always still be there. That might require
a bit of additional configuration.
The next step in your troubleshooting journey will be checking through some of the log
files. And there are plenty to choose from – KVM has its own, oVirt has its own, as does
Eucalyptus, ELK, and so on. So, make sure that you know these services well so that you
can check the correct log files for the situation you're trying to troubleshoot. Let's start
with KVM services logging.
• Let's say that you're logged in as root in the GUI and that you started virt-manager.
This means that you have a virt-manager.log file located in the
/root/.cache/virt-manager directory. It's really verbose, so be patient when reading
through it.
• The /etc/libvirt/libvirtd.conf file is libvirtd's configuration file and
contains a lot of interesting options, but some of the most important options are
actually located almost at the end of the file and are related to auditing. You can
select the commented-out options (audit_level and audit_logging) to suit
your needs.
• The /var/log/libvirt/qemu directory contains logs and rotated logs for all of
the virtual machines that were ever created on our KVM host.
Also, be sure to check out a command called auvirt. It's really handy as it tells you
basic information about the virtual machines on your KVM host – both virtual machines
that are still there and/or successfully running and the virtual machines that we tried to
install and failed at doing so. It pulls its data from audit logs, and you can use it to display
information about a specific virtual machine we need as well. It also has a very debug-level
option called --all-events, if you want to check every single little detail about any
virtual machine that was – or still is – an object on the KVM host.
Let's explain them step by step. The first option – log_level – describes log verbosity.
This option has been deprecated since libvirt version 4.4.0. In the Logging controls
section of the file, there's additional documentation hardcoded into the file to make things
easier. For this specific option, this is what the documentation says:
Important Note
After changing any of these settings, we need to make sure that we restart the
libvirtd service by typing in the systemctl restart libvirtd
command.
export LIBVIRT_LOG_OUTPUTS="1:file:/var/log/libvirt_guests.log"
All these options are valid until the next libvirtd service restart, which is quite handy
for permanent settings. However, there's a runtime option that we can use when we need
to debug a bit on the fly, without resorting to permanent configuration. That's why we
have a command called virt-admin. We can use it to set our own settings. For example,
let's see how we can use it to get our current settings, and then how to use it to set
temporary settings:
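For example, assuming libvirt 4.4 or newer, something like the following should work (treat it as a sketch – the filter string and log file path are arbitrary):
# virt-admin daemon-log-filters
# virt-admin daemon-log-outputs
# virt-admin daemon-log-filters "1:qemu 1:libvirt 3:remote"
# virt-admin daemon-log-outputs "1:file:/var/log/libvirt/libvirtd-debug.log"
When we're done, we can set the filters and outputs back to less verbose values in exactly the same way.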
This is something that's definitely recommended after we're done debugging. We don't
want to use our log space for nothing.
In terms of straight-up debugging virtual machines – apart from these logging options
– we can also use serial console emulation to hook up to the virtual machine console.
This is something that we'd do if we can't get access to a virtual machine in any other
way, especially if we're not using a GUI in our environments, which is often the case in
production environments. Accessing the console can be done via the virsh console command, as follows:
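# virsh console kvm_domain_name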
In the preceding command, kvm_domain_name is the name of the virtual machine that
we want to connect to via the serial console.
• oVirt problems
• Problems with snapshots and templates
• Virtual machine customization issues
• Ansible issues
• OpenStack problems
• Eucalyptus and AWS combo problems
• ELK stack issues
Interestingly enough, one thing that we usually don't have problems with when dealing
with KVM virtualization is networking. It's really well documented – from KVM bridges
all the way to open vSwitch – and it's just the matter of following the documentation. The
only exception is related to firewall rules, which can be a handful, especially when dealing
with oVirt and remote database connections while keeping a minimal security footprint.
If you're interested in this, make sure that you check out the following link: https://
www.ovirt.org/documentation/installing_ovirt_as_a_standalone_
manager_with_remote_databases/#dns-requirements_SM_remoteDB_
deploy.
There's a big table of ports later in that article describing which port gets used for what
and which protocols they use. Also, there's a table of ports that need to be configured at
the oVirt host level. We recommend that you use this article if you're putting oVirt
into production.
oVirt
There are two common problems that we often encounter when dealing with oVirt:
• Installation problems: We need to slow down when we're typing installation options
into the engine setup and configure things correctly.
• Update problems: These can either be related to incorrectly updating oVirt or the
underlying system.
Installation problems are fairly simple to troubleshoot as they usually happen when we're
just starting to deploy oVirt. This means that we can afford the luxury of just stopping the
installation process and starting from scratch. Everything else will just be too messy and
complicated.
Update problems, however, deserve a special mention. Let's deal with both subsets of
oVirt update issues and explain them in a bit more detail.
Updating the oVirt Engine itself requires doing the thing that most of us just dislike
doing – reading through heaps and heaps of documentation. The first thing that we
need to check is which version of oVirt are we running. If we're – for example – running
version 4.3.0 and we want to upgrade to 4.3.7, this is a minor update path that's pretty
straightforward. We need to back up our oVirt database first:
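A backup invocation looks something like this (the backup and log file names are arbitrary examples):
# engine-backup --mode=backup --scope=all --file=engine-backup.bck --log=engine-backup.log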
We do this just as a precaution. Then, later on, if something does get broken, we can use
the following command:
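The restore would then look roughly as follows (again, the file names are placeholders):
# engine-backup --mode=restore --file=engine-backup.bck --log=engine-restore.log --provision-db --provision-dwh-db --restore-permissions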
If you didn't deploy the DWH service and its database, you can ignore the
--provision-dwh-db option. Then, we can do the standard procedure:
engine-upgrade-check
yum update ovirt\*setup\*
engine-setup
This should take about 10 minutes and cause no harm at all. But it's still better to be safe
than sorry and back up the database before doing that.
If we're, however, migrating from some older version of oVirt to the latest one – let's
say, from version 4.0.0, or 4.1.0, or 4.2.0 to version 4.3.7 – that's a completely different
procedure. We need to go to the ovirt.org website and read through the documentation.
For example, let's say that we're updating from 4.0 to 4.3. There's documentation on ovirt.
org that describes all these processes. You can start here: https://www.ovirt.org/
documentation/upgrade_guide/.
This will give us 20 or so substeps that we need to complete to successfully upgrade. Please
be careful and patient, as these steps are written in a very clear order and need to be
implemented that way.
Now that we've covered oVirt troubleshooting in terms of upgrading, let's delve into OS
and package upgrades as that's an entirely different discussion with much more to consider.
Keeping in mind that oVirt has its own prerequisites, ranging from CPU, memory, and
storage requirements to firewall and repository requirements, we can't just blindly go and
use a system-wide command such as the following:
yum -y update
We can't expect oVirt to be happy with that. It just won't be, and this has happened to us
many times, both in production environments and while writing this book. We need to
check which packages are going to be deployed and check if they're in some co-dependent
relationship to oVirt. If there are such packages, you need to make sure that you do the
engine-backup procedure that we mentioned earlier in this chapter. It will save you from a
lot of problems.
It's not only the oVirt Engine that can be a problem – updating KVM hosts that oVirt has
in its inventory can also be quite a bit melodramatic. The oVirt agent (vdsm) that gets
deployed on hosts either by the oVirt Engine or our manual installation procedures, as
well as its components, also have their own co-dependencies that can be affected by a
system-wide yum -y update command. So, put the handbrake on before just accepting
upgrades as it might bring a lot of pain later. Make sure that you check the vdsm logs
(usually located in the /var/log/vdsm directory). These log files are very helpful when
you're trying to decipher what went wrong with vdsm.
The most common problem apart from these pre-configuration issues is related to failover
– a scenario where a path toward a storage device fails. That's when we are very happy if
we scaled out our storage or storage network infrastructure a bit – we added additional
adapters, additional switches, configured multipathing (MPIO), and so on. Make sure
that you check your storage device vendor's documentation and follow along with the
best practices for a specific storage device. Believe us when we say this – iSCSI storage
configuration and its default settings are a world apart from configuring Fibre Channel
storage, especially when multipathing is concerned. For example, when using MPIO with
iSCSI, it's much happier and snappier if you configure it properly. You'll find more details
about this process in the Further reading section at the end of this chapter.
If you're using IP-based storage, make sure that multiple paths toward your storage
device(s) use separate IP subnets, as everything else is a bad idea. LACP-like technologies
and iSCSI don't belong in the same sentence; otherwise, you'll end up troubleshooting a
technology that's not meant for storage connections and is actually working properly, while
you're thinking that it's not. We need to know what we're troubleshooting; otherwise,
troubleshooting makes no sense. Creating LACP for iSCSI equals still using one path
for iSCSI connections, which means wasting network connectivity that doesn't actively
get used except for in the case of a failover. And you don't really need LACP or similar
technologies for that. One notable exception might be blade servers as you're really
limited in terms of upgrade options on blades. But even then, the solution to the we
need more bandwidth from our host to storage problem is to get a faster network or Fibre
Channel adapter.
The most common snapshot-related problems that we tend to run into are as follows:
• The backup application doesn't want to start because the virtual machine has a
snapshot (a common one).
• A snapshot doesn't want to delete and assemble.
• Multiple snapshots don't want to delete and assemble.
• A snapshot crashes a virtual machine for quirky reasons.
• A snapshot crashes a virtual machine for valid reasons (lack of disk space
on storage)
• A snapshot crashes an application running in a virtual machine as that application
doesn't know how to tidy itself up before the snapshot and goes into a dirty state
(VSS, sync problems)
• Snapshots get lightly misused, something happens, and we need to troubleshoot
• Snapshots get heavily misused, something always happens, and we need
to troubleshoot
This last scenario occurs far more often than expected as people really tend to flex their
muscles regarding the number of snapshots they have if they're given permission to. We've
seen virtual machines with 20+ snapshots running on a production environment and
people complaining that they're slow. All you can do in that situation is breathe in, breathe
out, shrug, and ask, "What did you expect, that 20+ snapshots are going to increase the
speed of your virtual machine"?
Through it all, what got us through all these issues were three basic principles:
• Always checking, before doing anything snapshot-related, the amount of available storage space on the datastore where the virtual machine is
located, and then checking if the virtual machine already has snapshots.
• Constantly repeating the mantra: snapshots are not backups to all of our clients, over,
and over again, and hammering them with additional articles and links explaining
why they need to lay off the snapshots, even if that means denying someone
permission to even take a snapshot.
Actually, this last one has become a de facto policy in many environments we've
encountered. We've even seen companies implementing a flat-out policy when dealing
with snapshots, stating that the company policy is to have one or two snapshots, max, for
a limited period of time. For example, in VMware environments, you can assign a virtual
machine advanced property that sets the maximum number of snapshots to 1 (using
a property called snapshot.maxSnapshots). In KVM, you're going to have to use
storage-based snapshots for these situations and hope that the storage system has policy-
based capabilities to set the snapshot number to something. However, this kind of goes
against the idea of using storage-based snapshots in many environments.
The vast majority of these and other problems are related to the fact that cloudbase-init
has documentation that's really bad. It does have some config file examples, but most of
it is more related to APIs or the programmatic approach than actually explaining how to
create some kind of configuration via examples. Furthermore, we had various issues with
different versions, as we mentioned in Chapter 10, Automated Windows guest deployment
and customization. We then settled on a pre-release version, which worked out-of-the-
box with a configuration file that wasn't working on a stable release. But by and large,
the biggest issue we had while trying to make it work was related to making it work with
PowerShell properly. If we get it to execute PowerShell code properly, we can pretty much
configure anything we want on a Windows-based system, so that was a big problem.
Sometimes, it didn't want to execute a PowerShell script from a random directory on the
Windows system disk.
Make sure that you use examples in this book for your starting points. We deliberately
made examples in Chapter 10, Automated Windows guest deployment and customization as
simple as possible, which includes the executed PowerShell code. Afterward, spread your
wings and fly – do whatever needs to be done with it. PowerShell makes everything easier
and more natural when you're working with Microsoft-based solutions, both local and
hybrid ones.
As you've probably deduced yourself, OpenStack is really, really picky when it comes
to storage. That's why storage companies usually create reference architectures for their
own storage devices to be used in OpenStack-based environments. Check out these two
documents from HPE and Dell EMC as good examples of that approach:
• https://www.redhat.com/cms/managed-files/cl-openstack-hpe-synergy-ceph-reference-architecture-f18012bf-201906-en.pdf
• https://docs.openstack.org/cinder/rocky/configuration/block-storage/drivers/dell-emc-unity-driver.html
One last word of warning relates to the most difficult obstacle to surmount – OpenStack
version upgrades. We could tell you loads of horror stories on this subject. That being
said, we're also partially to blame here, because we, as users, deploy various third-
party modules and utilities (vendor-based plugins, forks, untested solutions, and so
on), forget that we're using them, and then we're genuinely surprised and horrified when the
upgrade procedure fails. This goes back to the multiple discussions about documenting
environments that we've had throughout this book. It's a subject that we'll revisit one
final time a bit later in this chapter.
Dependencies
Every administrator is completely aware that almost every service has some dependencies
– either services that depend on this particular service running, or services that our
service needs in order to work. Dependencies are also a big thing when working with packages
– the whole point of a package manager is to keep strict track of what needs to be
installed and what it depends on so that our system works as it should.
What most admins get wrong is forgetting that, in larger systems, dependencies can stretch
across multiple systems, clusters, and even data centers.
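On a single host, at least, both kinds of dependencies are easy to inspect. These are standard commands on a CentOS/RHEL-type KVM host; the package and service names here are just examples:

# Which units does libvirtd pull in or depend on?
systemctl list-dependencies libvirtd
# What does a running service require right now?
systemctl show -p Requires -p Wants -p After libvirtd
# Which packages and capabilities does qemu-kvm depend on?
rpm -q --requires qemu-kvm

The cross-system dependencies are the ones that no single command will show you, which is exactly why they belong in your documentation.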
Every single course that covers OpenStack has a dedicated lesson on starting, stopping,
and verifying different OpenStack services. The reason for this is simple – OpenStack
is usually run across a large number of nodes (hundreds, sometimes thousands). Some
services must run on every node, some are needed by a set of nodes, some services are
duplicated on every node instance, and some services can only exist as a single instance.
Understanding the basics of each service and how it fits into the whole OpenStack
schema is not only essential when installing the whole system, but is also the most
important thing to know when debugging why something is not working on OpenStack.
Read the documentation at least once to connect the dots. Again, the Further reading
section at the end of this chapter contains links that will point you in the right direction
regarding OpenStack.
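As a starting point for that kind of debugging, the OpenStack CLI can show you the state of the core services across all nodes. These are standard commands, although the exact systemd unit names will depend on your distribution and deployment:

# Nova compute services and their state per host
openstack compute service list
# Neutron agents (DHCP, L3, Open vSwitch, and so on)
openstack network agent list
# Cinder volume services
openstack volume service list
# On a specific node, check the individual systemd units, for example:
systemctl status openstack-nova-compute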
OpenStack is one of those systems whose documentation includes entries such as how do I
properly reboot a machine running X?. The reason for this is as simple as the whole system
is complex – each part of the system has something it depends on and something
that depends on it, so if something breaks, you need to understand not only how this
particular part of the system works, but also how it affects everything else. There is a
silver lining through all of this, though – in a properly configured system, a lot of it is
redundant, so sometimes the easiest way of repairing something is to reinstall it.
And this probably sums up the whole troubleshooting story – trying to fix a simple system
is always going to be easier than trying to fix a complex one.
Troubleshooting Eucalyptus
It would be a lie to say that once we started the installation process, everything went
according to the manual. Most of it did, and we are reasonably sure that if you follow the
steps we documented, you will end up with a working service or system, but at any point
in time there are things that can – and will – go wrong. This is when you need to do the
most complicated thing imaginable – troubleshoot. But how do you do that? Believe it or
not, there is a more or less systematic approach that will enable you to troubleshoot almost
any problem, not just KVM/OpenStack/AWS/Eucalyptus-related ones.
Gathering information
Before we can do anything, we need to do some research. This is the moment when most
people do the wrong thing, because the obvious answer is to go to the internet and search
for the problem. Take a look at this screenshot:
Figure 16.5 – Eucalyptus logs, part I – clean, crisp, and easy to read – every procedure that's been done
in Eucalyptus clearly visible in the log
If you haven't noticed already, the internet is full of ready-made solutions for almost any
imaginable problem – and a lot of them are wrong. There are two reasons why this is so.
The first is that most of the people who worked on a solution didn't understand what the
problem was, so as soon as they found anything that solved their particular problem, they
simply stopped looking further. In other words, a lot of people in IT picture the path from
point A (problem) to point B (solution) as a laser beam – perfectly straight, the shortest
possible path, no obstacles along the way. Everything looks nice and crisp, and that mental
model messes with our troubleshooting thought process as soon as the laser beam principle
stops working, because, in IT, things are rarely that simple.
Take, for example, any problem caused by misconfigured DNS. Most of these can be
solved by creating an entry in the hosts file. This solution usually works but is, at the same
time, wrong on almost every level imaginable. The problem is solved on only one machine
– the one that has that particular hosts file on it. The DNS is still misconfigured; we've just
created a quick, undocumented workaround that happens to work in our particular case.
Every other machine that has the same problem will need to be patched in this way, and
there is a real possibility that our fix is going to create even more problems down the road.
The real solution would obviously be to get to the root of the problem itself and fix the
DNS, but solutions like this are few and far between on the internet. This happens mainly
because the majority of commenters on the internet are not familiar with a lot of services,
and quick fixes are basically the only ones they are able to apply.
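In practice, getting to the root of a DNS problem usually starts with a few minutes of checking what the machine is actually asking and whom it is asking. Here is a quick, hedged example – the host name and DNS server address are made up:

# What resolvers is this machine configured to use?
cat /etc/resolv.conf
# Does the name resolve at all, and what does it resolve to?
dig +short broken-host.example.com
# Ask a specific DNS server directly to see whether the server or the client is at fault
dig +short broken-host.example.com @192.168.1.254

Two or three queries like these will usually tell you whether the problem is the record, the server, or the client configuration – which is far more useful than another hosts file entry.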
The other reason the internet is mostly wrong is the famous reinstalling fixed the problem
solution. Linux has a better track record here, as people who use it are less inclined to solve
everything by wiping and reinstalling the system, but most of the solutions you will find
for Windows problems will include at least one simple reinstall fixed it. Compared to just
offering a random fix as the one that always works, this reinstall approach is far worse. Not
only does it mean you are going to waste a lot of time reinstalling everything; it also means
your problem may or may not be solved in the end, depending on what the problem
actually was.
So, the first short piece of advice we will give is this: do not blindly trust the internet.
OK, but what should you actually do? Let's take a look:
1. Gather information about the problem. Read the error message, read the logs (if the
application has logs), and try to turn on debug mode if at all possible (see the example
right after this list). Get some solid data. Find out what is crashing, how it is crashing,
and what is causing it to crash. Take a look at the following screenshot:
Figure 16.6 – Eucalyptus logs, part II – again, clean, crisp, and easy to read – information messages
about what was updated and where
2. Read the documentation. Is the thing you are trying to do even supported? What are
the prerequisites for a functioning system? Are you missing something? A cache
disk? Some amount of memory? A fundamental service that your particular system
depends on? A library or additional packages? A
firmware upgrade?
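For the KVM part of the stack, turning on debug mode usually means two things – reading the journal and raising libvirt's log level. The log_filters and log_outputs settings shown here are real libvirtd.conf options, but treat the exact values as an illustration rather than a recommendation:

# Everything libvirtd logged since the last boot
journalctl -u libvirtd -b
# In /etc/libvirt/libvirtd.conf, raise the log level for the qemu driver:
#   log_filters="1:qemu 1:libvirt"
#   log_outputs="1:file:/var/log/libvirt/libvirtd.log"
# Then restart the daemon so that the new settings take effect
systemctl restart libvirtd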
Sometimes, you will run into an even bigger problem, especially with poorly written
documentation – some crucial system dependency may only be mentioned in passing, yet
be capable of bringing your entire system down. Take, for example, an external identification
service – maybe your directory uses the wrong character set, causing your system to crash
when a particular user uses it in a particular way. Always make sure you understand how
your systems are interconnected.
Next, check your system. If you are installing a new system, check the prerequisites. Do
you have enough disk space and memory? Are all the services your application requires
readily available and working properly?
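None of this requires anything fancy – a couple of standard commands answer most of these questions (libvirtd is just an example of a service you might depend on):

df -h                          # enough free disk space on the right filesystem?
free -h                        # enough memory, or is it already eaten by something else?
systemctl is-active libvirtd   # is the service we depend on actually running?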
Then, search the internet. We mentioned previously that the internet has a simple, incorrect
solution to every possible problem, but it usually also has the right solution hidden
somewhere among the wrong ones. Having armed yourself with a lot of data about your
particular system and your specific problem, the internet will soon become your friend.
Since you understand what the problem is, you will be able to recognize which of the
solutions being offered to you are simply wrong.
Now, let's talk about a real-world problem we created on purpose while installing
Eucalyptus, just to show you how important documentation is.
We showed you how to install Eucalyptus in Chapter 13, Scaling Out KVM with AWS – we
not only went through the installation process but also showed how to use this amazing service.
If you want to learn something about how not to do it, continue reading. We will present
you with a deliberate scenario of an unsuccessful Eucalyptus installation that won't finish
because we creatively forgot to do some steps that we knew we needed to do. Let's put it this
way – we acted like humans and browsed the documentation instead of actually sitting
down and reading it. Does that sound familiar?
Installing Eucalyptus should be a straightforward task, since its installation is, in essence,
an exercise in applied scripting. Eucalyptus even says so on the front page of the project:
just run this script.
But the truth is much more complicated – Eucalyptus can definitely be installed using
only this script, but certain prerequisites must be met. Of course, in your rush to test the
new service, you will probably neglect to read the documentation, as we did, since we
already had experience with Eucalyptus.
We configured the system, we started the installation, and we ran into a problem. After
confirming the initial configuration steps, our installation failed with an error that said it
was unable to resolve a particular address: 192.168.1.1.nip.io.
DNS is one of the primary sources of problems in IT infrastructure – there's even a saying
in IT: it's always DNS. So we quickly started debugging, and the first thing we wanted to
see was what this particular address was. It looked like a local address, so we started
pinging it, and it seemed fine. But why was DNS even involved with IP addresses?
DNS should be resolving domain names, not IP addresses. Then, we turned to the
documentation, but that didn't yield much. The only thing we found was that DNS
must work for the whole system to work.
Then, it was time to try and debug the DNS. First, we tried resolving the name from the
machine we were installing on. The DNS lookup timed out. We tried the same thing on
another machine and got back a response we didn't expect – 127.0.0.1.nip.io resolved to
127.0.0.1, which means localhost. Basically, we asked a DNS server on the internet to give
us an address, and it directed us to our local system.
So, we had an error we didn't understand, an address that resolved to an IP address we
hadn't expected, and two different systems exhibiting completely different behaviors for
an identical command. We turned our attention to the machine we were installing on and
realized it was misconfigured – there was no DNS configured at all. The machine not only
failed to resolve our strange address but failed to resolve anything.
We fixed that by pointing the machine to the right DNS server. Then, in true IT fashion,
we restarted the installation. This time, we got through this part and everything was OK
– or so it seemed. But what had happened? Why was a local service resolving such strange
names, and why do they get resolved at all?
We turned to the internet and took a look at the domain name that our mystery name had
at its end. What we found out is that the service, nip.io, does exactly what we observed it
doing – when asked for a name formed from an IP address in the local subnet range (as
defined by RFC 1918), it returns that same IP.
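You can verify this behavior yourself with a single query, provided that the machine you run it on has a working resolver that can reach the public nip.io service:

# nip.io simply echoes the IP address embedded in the name
dig +short 192.168.1.1.nip.io
# Expected output:
# 192.168.1.1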
Our next question was – why?
After some more reading, we realized what the trick was – Eucalyptus uses DNS
names to talk to all of its components. The authors very wisely chose not to hardcode a
single address into the application, so all the services and nodes of the system have to have
a real, DNS-registered name. In a normal multi-node, multi-server installation, this works
like a charm – every server and every node is first registered with the appropriate DNS
server, and Eucalyptus resolves those names so that it can communicate with the machines.
We were installing a single machine that runs all the services, which makes installation
easier, but the nodes do not have separate names, and even our machine may not be registered
with the DNS. So, the installer does a little trick: it turns local IP addresses into completely
valid domain names and makes sure we can resolve them.
So, now we knew what had happened (the resolving process was not working) and why it
had happened (our DNS server settings were broken), and we also understood why DNS
was needed in the first place.
This brings us to the next point – do not presume anything.
While we were troubleshooting and then following up on our DNS problem, our
installation crashed. Eucalyptus is a complex system and its installation is a fairly complex
thing – it automatically updates the machine you run it on, then it installs what seems
like thousands of packages, and then it downloads, configures, and runs a small army
of images and virtual packages. To keep things tidy, the user doesn't see everything that
is happening, only the most important bits. The installer even has a nice ASCII graphic
screen to keep you busy. Everything was OK up to a point, but then our installation
crashed completely. All we got was a huge stack trace that looked like it belonged to
Python. We reran the installation, but it failed again.
The problem at this point was that we had no idea why all this was happening, since the
installation calls for a minimal installation of CentOS 7, and we were running our tests
on a virtual machine.
As with all the great installers of the IT universe, this one also has something reserved
especially for this possibility: a log file. Take a look at the following screenshot:
Figure 16.7 – The Eucalyptus installation process takes time when you don't read its documentation.
And then some more time... and some more...
This is the installation screen. We can't see any real information regarding what is
happening, but the third line from the top contains the most important clue – the location
of the log file. In order to stop your screen from being flooded with information, the
installer shows this very nice figlet-coffee graphic (everyone who ever used IRC in the
1990s and 2000s will probably smile now), but it also dumps everything that is happening
into a log. By everything, we mean everything – every command, every input, and every
output. This makes debugging easy – we just need to scroll to the end of this file and work
backward from that point to see what broke. Once we did that, the solution was simple
– we had forgotten to allocate enough memory for the machine. We gave it 8 GB of RAM,
while officially it should have at least 16 GB to run smoothly. There are reports of machines
running with as little as 8 GB of RAM, but that makes absolutely no sense – we are
running a virtualized environment, after all.
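The workflow itself is trivial once you know where the log is. The path below is a placeholder – use whatever location the installer prints on its third line:

# Jump to the end of the installer log and work backward
tail -n 100 /var/log/euca-install.log
# Or let grep find the interesting parts for you
grep -iE 'error|fail|traceback' /var/log/euca-install.log | tail -n 20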
If you just skim the help or documentation without understanding what it is actually
for, it will get you nowhere. Help in this form only works as long as you have a basic
understanding of the concepts. If you are doing something for the first time, or you have
a problem you haven't seen before, we suggest that you find an example of the task you
are trying to do, and do the exercise.
In our case, this was strange, since all we had to do was run a simple command, yet our
import kept failing. After a couple of hours of banging our heads against the wall, we
decided to behave as if we knew nothing and worked through the how do I import a VM
into AWS? example. Everything worked. Then, we tried importing our own machine; that
didn't work. The commands were copied and pasted, but it still didn't work.
And then we realized the most important thing – we need to pay attention to details.
Without this train of thought properly implemented and executed, we're inviting a world
of problems upon ourselves.
What we had misconfigured was the name of the role – to import a VM into EC2, there
needs to be a service role named vmimport that gives EC2 the right permissions. In our
haste, we had configured a role named importvm. When we worked through the examples,
we pasted them verbatim and everything was fine, but as soon as we switched to our own
security settings, EC2 failed to do its job. So, always check the product documentation and
read it carefully.
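A minute spent checking your own settings against the documentation would have caught this immediately. For example, with the AWS CLI (the description and the containers.json file name are placeholders):

# The role must be called vmimport - anything else will make the import fail
aws iam get-role --role-name vmimport
# Start the import, then keep an eye on its progress
aws ec2 import-image --description "My imported VM" --disk-containers file://containers.json
aws ec2 describe-import-image-tasks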
elastalert
• Use Elastic Stack Features (formerly X-Pack) – check out this URL: https://www.elastic.co/guide/en/x-pack/current/installing-xpack.html
Here's one more piece of advice: you can always centralize logs via rsyslog, as it's a
built-in feature. If you create a centralized log server, there are free applications out there
for browsing through log files (Adiscon LogAnalyzer, for example). If dealing with ELK
seems like a bit too much to handle, but you're aware of the fact that you need something,
start with something like that. It's very easy to install and configure, and it offers a free
web-like interface with regular expression support so that you can browse through
log entries.
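Getting started really is as simple as it sounds. On every KVM host, one extra rsyslog configuration file is enough to forward everything to a central box; the host name is, of course, just an example:

# /etc/rsyslog.d/90-forward.conf on each client - @@ means TCP, a single @ means UDP
*.* @@loghost.example.com:514

# /etc/rsyslog.d/00-central.conf on the central server - accept incoming TCP syslog
module(load="imtcp")
input(type="imtcp" port="514")

# Restart rsyslog on both sides afterward
systemctl restart rsyslog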
• Keep it simple, in configuration: What good does it do to deploy 50 OpenStack hosts
across three subnets in one site? Just because you can subnet an IP range to within an
inch of its life doesn't mean you should. Just because you have eight available
connections on your server doesn't mean that you should LACP all of them to access
iSCSI storage. Think about end-to-end configuration (for example, Jumbo Frames
configuration for iSCSI networks). Simple configuration almost always means simpler
troubleshooting.
• Keep it simple, in troubleshooting: Don't go chasing the super-complex scenarios
first. Start simple. Start with log files. Check what's written there. With time, your gut
feeling will develop, and you'll be able to trust it.
• Use monitoring tools such as the ELK stack: Use something to monitor your
environments constantly. Invest in some kind of large-screen display, hook it up
to a separate computer, hang that display on a wall, and spend time configuring
important dashboards for your environments.
• Use reporting tools to create multiple automated reports about the state of your
environments.
Get used to having a large portion of <insert your favorite drink here>
available at all times, and to many sleepless nights, if you want to work in IT as an
administrator, engineer, or DevOps engineer. Coffee, Pepsi, Coca-Cola, lemon juice,
orange juice… whatever gets your intellectual mojo flowing. And sometimes, learn to walk
away from a problem for a short period of time. Solutions often click in your head when
you're thinking about something completely unrelated to work.
And finally, remember to try and have fun while working. Otherwise, the whole ordeal of
working with KVM or any other IT solution is just going to be an Open Shortest Path First
to relentless frustration. And frustration is never fun. We prefer yelling at our computers
or servers. It's therapeutic.
Summary
In this chapter, we tried to describe some basic troubleshooting steps that can be applied
generally and when troubleshooting KVM. We also discussed some of the problems
that we had to deal with while working with various subjects of this book – Eucalyptus,
OpenStack, the ELK stack, cloudbase-init, storage, and more. Most of these issues were
caused by misconfiguration, but there were quite a few where documentation was severely
lacking. Whatever happens, don't give up. Troubleshoot, make it work, and celebrate when
you do.
Questions
1. What do we need to check before deploying the KVM stack?
2. What do we need to configure after deploying the KVM stack in terms of making
sure that virtual machines are going to run after reboot?
Further reading
Please refer to the following links for more information regarding what was covered in
this chapter:
knowledgecenter/STHGUJ_8.2.1/com.ibm.storwize.v5100.821.doc/storwize_openstack_matrix.html
• HPE Reference Architecture for the Red Hat OpenStack Platform on HPE Synergy with Ceph Storage: https://www.redhat.com/cms/managed-files/cl-openstack-hpe-synergy-ceph-reference-architecture-f18012bf-201906-en.pdf
• Integrating Dell EMC Unity and OpenStack: https://docs.openstack.org/cinder/rocky/configuration/block-storage/drivers/dell-emc-unity-driver.html
• DM-multipath configuration for Red Hat Enterprise Linux 7: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/dm_multipath/mpio_setup
• DM-multipath configuration for Red Hat Enterprise Linux 8: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/pdf/configuring_device_mapper_multipath/Red_Hat_Enterprise_Linux-8-Configuring_device_mapper_multipath-en-US.pdf
Index
A
Access Control List (ACL) 143
Address Resolution Protocols (ARPs) 249
ad hoc 388
Advanced Host Controller Interface (AHCI) 172
Advanced Micro Devices (AMD) 247
advanced troubleshooting tools
  about 621
  Ansible and OpenStack, problems working with 627, 628
  AWS 637
  dependencies 628, 629
  ELK stack, used for troubleshooting problems 639
  Eucalyptus, troubleshooting 629
  KVM, storage problems 624
  oVirt 622, 623
  oVirt, storage problems 624
  service, executing 638, 639
  service, implementing 638, 639
  snapshots and templates, problems 624-626
agentless systems 369
aggregator
  configuring 536, 537
Amazon Elastic Compute Cloud (EC2) 14
Amazon Machine Image (AMI) 503
Amazon Web Services (AWS)
  about 14, 470, 637
  big infrastructure 474, 475
  cloud, approaching 470, 471
  data centers 477, 478
  image, uploading to EC2 498-507
  key placement 478, 479
  market share 474
  multi-cloud 472
  pricing 475-477
  Shadow IT 473, 474
  verbosity 637
  virtual machines, converting 484
  virtual machines, migrating 484-498
  virtual machines, preparing 484
AMD Virtualization (AMD-V) 21
Ansible
  about 366-372
  approaches 366
  deploying 382, 383
  examples, with KVM 409, 410
Gluster
  using, as storage backend for KVM 150-154
GPU
  partitioning, with NVIDIA vGPU 187, 188, 189
GPU PCI passthrough
  enabling 189-193
graphical user interface (GUI) 286
guestfish
  using 259-262

H
hardware-assisted virtualization 28, 29
hardware-based approach 8
hardware design 561-564
high availability (HA) 226
Horizon 435
hosted hypervisors 11
host's NUMA topology 596
HugePages 576
hybrid KVM clouds
  building, with Eucalyptus 507, 508
hybrid virtualization 8
hypercalls 28
HyperText Transfer Protocol (HTTP) 214
HyperText Transfer Protocol Secure (HTTPS) 214
hypervisor
  type 1 hypervisor 10, 11
  type 2 hypervisor 10, 11
  using 9
Hyper-V Network Virtualization (HNV) 425

I
Identity and Access Management (IAM) 481, 483
image information
  obtaining 164
Infrastructure-as-a-Service (IaaS) 14, 416
input/output (I/O) 231
input/output (I/O)-intensive applications 285
Input/Output Memory Management Unit (IOMMU) 22
integrated drive electronics (IDE) 234
internal snapshot
  about 285
  creating 287
  creating, with custom name and description 288
  deleting 290, 291
  multiple snapshots, creating 288, 289
  reverting to 290
  working with 287
International Organization for Standardization (ISO) 213
Internet of Things (IoT) 182
Internet Protocol (IP) 212, 263
Internet Small Computer Systems Interface (iSCSI) 239
inventories 370
iothread 49
iSCSI
  using 136-145
ISO image library
  creating 167, 168
KVM
  guest mode 50
  installing 393-399
  internal working 49, 50
  storage problems 623, 624
  virtual machine, installing 76
  Windows VMs, creating prerequisites 346
KVM APIs 52
kvm-clock 605, 606
KVM guest time-keeping
  best practices 604
KVM internals 41
KVM issues
  best practices, for troubleshooting 640
kvm_libvirt module
  used, for provisioning virtual machine 383-387
KVM services logging 617

libvirt
  internal working 30
  URL 30
  used, for starting virtual machine 83-86
libvirtd 30
libvirt isolated network 95-100
libvirt NAT network 93, 94
libvirt routed network 94, 95
libvirt storage pools 130
linear design 561
linked cloning method
  about 265
  used, for deploying VM 277-280
Linux, Apache, MySQL, and PHP (LAMP) 266
Linux bridging
  implementing 103-105
T
TAP devices
  userspace networking, using with 101, 102
templates
  creating 266
  creating, examples 266-273
  problems 624-626
  virt-sysprep 270, 272
  working with 266
terabyte (TB) 222
Time Stamp Counter (TSC) 605
Tiny Code Generator (TCG)
  about 39
  reference link 39
Translation Lookaside Buffer (TLB) 22, 577
Transmission Control Protocol (TCP) 248
Transparent Hugepages (THP) 577

U
Uniform Resource Locator (URL) 214
universally unique identifier (UUID) 160, 246
user data
  about 320
  passing, to cloud-init 321, 322
user mode Linux (UML) 15, 433
userspace networking
  using, with TAP devices 101, 102
  using, with TUN devices 101, 102

V
vCPU
  execution flow 59-64
VDI scenarios
  physical graphics cards 185-187
  virtual graphics cards 185-187