Getting Started Developing vSphere IO Filter Solutions
Version 1.0
VMware, Inc.
3401 Hillview Ave.
Palo Alto, CA 94304
www.vmware.com
Preface 7
Acknowledgements and Credits 7
Legal Access Notice 7
1 Course Overview 9
About This Course 9
Understand the Course's Prerequisites 9
Understand the Course's Strategic Objectives 10
Understanding the Course's Tactical Objectives 10
Understanding the Course's Organization 12
Chapter Summary 12
Index 275
n Content developers (in alphabetical order): Nagib Gulam, Vasuki Nagaraju, Kamal Prasad, Deepa
Seshadri, Matthew Thurmaier, Jinlu Yu
n Content reviewers (in alphabetical order): Adrian Drzewiecki, Bob Seigman, Christoph Klee, Dworkin
Muller, Jesse Pool, Nasser Shayesteh, Nishant Yadav, Suraj Swaminathan, Rohit Jog, Zahed Khurasani.
These people gave significant feedback and corrections. To the extent that there are errors still in the
course, they are the responsibility of Vasuki, Nagib and Matthew, not these engineers.
The course's content developers received significant help from the IO Filter engineering team and the project
and program managers including Alex Jauch and Nasser Shayesteh.
The entire team hopes you find the course valuable and enjoyable.
n C programming language fluency and experience, including compiling programs, solving syntax
issues, etc.
n Experience using the gdb debugger, for example to examine post-mortem core dumps
n Experience using a Linux shell and text editors (vi/vim, EMACS, etc.)
n Know how to deploy and manage a basic vSphere development cluster as presented in the VMware
Fundamentals for Developers course, including:
n Experience installing and removing VIBs from ESXi instances, including moving ESXi into and out of
maintenance mode, all from the ESXi CLI
n Using the Remote System Explorer (RSE) in Eclipse, especially the VMware customized RSE in VMware
Workbench
n Understand the flow of an IO between a guest OS and a Virtual Machine Disk (VMDK), with and
without an IO Filter, including understanding the role of:
n The VMX
n The Virtual SCSI (vSCSI), POSIX Object Layer (POL) File System Switch, and SSMod kernel
modules
n Understand the role of CIM providers and VWC plug-ins in IO filter Solutions
n List the components of a minimal environment for developing and testing IO Filter Solutions
n List the key folders, files, and tools created by installing the VAIODK, including the sample filters
n Specify where to create your build folder on your build system, and why
n Create appropriate entries in your Solution's SCONS and JSON files, based on the plans for your
Solution
n Create minimal source files for the library and daemon components of your Solution
n Define the prototype for each of the callbacks in a library and daemon component, and know when the IO
Filters Framework invokes each
n Specify the components in which you cannot, may not, and may create threads using
pthread_create()
n Understand the use of blocking rules in Poll and Timer Callbacks, and WorkGroup Functions
n Understand when you can and cannot hold a lock across calls to the VAIO
1 Course Overview (this chapter) — Discuss course overview, objectives, and organization.
Chapter Summary
This chapter has presented an overview of the course, its objectives, and organization. You should now be
able to:
n The Virtual SCSI (vSCSI), POSIX Object Layer (POL) File System Switch, and SSMod kernel
modules
n Understand the role of CIM providers and VWC plug-ins in IO filter Solutions
n “Understanding the Purpose and High-Level Attributes of vSphere IO Filter Solutions,” on page 14
n “Understanding IO Flows to VMDKs and Back to Requester on vSphere With IO Filters,” on page 20
n “Understanding the Role of vSphere Web Client (VWC) Plugins in IO Filter Solutions,” on page 28
In a traditional multipurpose OS, the initiator can be a user-space application or a kernel module, with the
target being a device driver that sends the data to the target device for write requests or extracts the data
from the device for read requests. Components of the IO Stack can include buffer caches for block disk IO
and / or packet rings for network IO. Historically, the only flexibility in general-purpose OS IO stacks was:
n Certain 3rd-party developers could create device drivers (using platform-vendor-provided DDKs) to
interface with hardware not supported by the base kernel
The following diagram illustrates this historic architecture of the IO Stack (the red line indicates an example
flow of an IO between an initiator and a target):
Other than in the driver itself, these IO stacks did not allow 3rd party developers to intercept and process
any IOs. That is, if a developer wanted to process IOs for devices managed by Driver A, there was no way to
do this. There are many use cases for developers to intercept and process these IOs (called IO Filtering),
including:
n Checking data being read from or written to the disk for malware, blocking IOs containing infected
data
n Compressing data in an IO for better throughput across a network and / or to fit more data onto a single
disk
IO Filter Software that intercepts IO transactions somewhere between the initiator and
target, regardless of the final disposition of said transactions (blocking,
deleting, replicating, etc.)
Eventually, most general-purpose OS vendors created frameworks that allow 3rd party developers to create
IO Filters to insert into said vendor's OSes' IO Stacks. The framework most OS vendors provide requires IO
Filter developers to create kernel-modules that have specific entry-points, similar to device drivers, for
receiving and processing IO requests. Thus, IO Filters using such frameworks are often called Filter
Drivers. The following figure extends the preceding figure, inserting Filter Drivers.
Through vSphere 6.0, with respect to IO Filters, VMware's ESXi was similar to the traditional OSes. While
3rd parties were allowed to create specific-purpose kernel modules in the IO flow, (for example for
Pluggable Storage Architecture (PSA) or vSphere APIs for Array Integration (VAAI-Block) ), said kernel
modules did not provide a framework for general purpose IO Filtering.
Now, VMware has created the vSphere APIs for IO Filter, abbreviated VAIO (the F of Filter is not included
in the acronym, and the "APIs" is plural even though there is really only one API with many functions and
data structures) which provides a general-purpose framework that allows 3rd parties to create solutions
(called IO Filter Solutions), analogous to Filter Drivers in non-hypervisor operating systems.
n Compression — To increase the amount of information stored in a given space on the disk
n Encryption — Separate from compression, requiring a key to decrypt the information, used for
enhancing security
n Cache — Using SSDs to cache the contents of spinning disks, used to increase performance
VAIO defines each use case as a separate class of filter. Currently, it allows administrators to attach at most
one filter of each class to a given virtual disk at any one time. For example, a VM called foo could have a
virtual disk called foo-disk1.vmdk with filters for inspection (for malware), encryption, and replication
attached.
NOTE The first release of vSphere APIs for IO Filters will only support the replication and caching use
cases. VMware will not certify VAIO solutions attempting to support other use cases at this time.
n Filtering of IOs to VMDKs. There is currently no support for filtering network IO.
n Filter modules run in multi-world user space cartels (like multi-threaded user-space processes in a Unix
environment). This feature ensures that flaws in an IO filter will not crash the ESXi kernel.
The following figure provides a high-level overview of flow of an IO between a VM and a filtered VMDK
(and back) using a user-space IO Filter Solution:
1 The guest OS in the VM issues an IO request to one of its virtual disks (VMDKs)
2 The ESXi kernel starts processing the request, then realizes that the target disk has an IO filter on it, so
the kernel sends the IO request to the vSphere IO Filter Framework
3 The IO Filter Framework sends the IO request to the IO Filter Solution. There are actually several sub-
steps involved in getting the IO to user-space and to the right IO Filter Solution.
4 The IO Filter Solution processes / filters the IO request appropriately for the use case it implements. For
example: a replication solution might send write requests to a remote replication site; a caching solution
might look up read requests in the cache and update the cache with write requests.
After processing the request, said solution may then allow the rest of the IO stack to continue
processing the IO. The solution continues the processing by returning control of the IO to the vSphere
IO Filter Framework.
NOTE Later sections in this course discuss exceptions to this rule including dropping the IO and
processing the IO request asynchronously.
5 The IO Filter Framework continues the IO through the rest of the IO stack (typically including other
ESXi kernel modules), which eventually results in a call to the device driver controlling the hardware
on which the VMDK resides.
6 The driver completes the IO request and sends the result back towards the requesting VM, including
the vSphere IO Filter Framework.
7 If, during the IO Filter solution's processing of the IO request (step 4), said solution requests to be
notified when the IO completes, the IO Filter Framework sends said completed request back to the IO
Filter Solution again. For example, a caching solution may wait to update the cache on writes until after
the write completes on the VMDK itself. If said solution does not ask for IO completion notification, the
flow continues at Step 9.
8 The IO Filter solution does any additional processing required on IO completion, and then may notify
the vSphere IO Filter Framework that the IO can complete.
9 The vSphere IO Filter Framework continues the IO on its way, typically through additional ESXi kernel
modules.
10 The last ESXi kernel module in the IO Stack presents the results of the IO to the guest OS in the VM.
In addition to the DiskLib used by ESXi, VMware offers another version of DiskLib (called vixDiskLib) via
the un-gated Virtual Disk Development Kit (VDDK). The version provided in the VDDK provides functions for
accessing and manipulating VMDK files from a Windows or Linux program. Both versions of DiskLib invoke
IO Filtering Solutions for all IOs to VMDKs marked for filtering. This leads to the following definitions:
On-line Filtering The application of IO Filtering Solutions to IOs between a VM's guest OS and
any of its VMDKs that are marked for filtering
Off-line Filtering The application of IO Filtering Solutions to IOs to VMDKs (that are marked
for filtering) driven by anything other than a VM's guest OS. Examples
include:
It is important to understand that there are three main sources of IO requests on ESXi, as illustrated by the
following figure:
n User-space cartels accessing a VMDK. Two special cases of this are the hostd and vpxa cartels which,
among other things, proxy requests from off-host entities to access or manage VMDKs on host.
This section concentrates on the first two items in the preceding list.
The first kernel module to receive IO requests depends on the source of the IO.
n For IOs generated from a guest VM, the IO gets handed to the vSCSI module in the kernel which is
invoked by the Virtual Machine Monitor (VMM).
TIP For those unfamiliar with ESXi VMM/VMX architecture, here is some background information.
When ESXi starts running a VM, it creates a new kernel-space cartel called the Virtual Machine Monitor
(VMM) with one world (thread) per vCPU in the VM. The VMM in turn creates a user-space cartel called
the Virtual Machine Executable (VMX), again with one world per vCPU in the VM. Each pair of worlds
works in concert to provide virtualization services to the guest OS, and then the VMM starts the guest
code running.
NOTE When the guest code is running on the CPU, the vCPU is said to be in direct execution mode.
At some point, the guest code performs some operation that requires the services of the hypervisor,
such as submitting an IO request to virtual hardware. The CPU switches from direct execution mode back
to running the VMM. The VMM processes and then submits the IO request to the vSCSI module in the
ESXi kernel. Said vSCSI module implements a SCSI interface for all storage IO requests (to a VMDK).
After performing some processing on the IO request, the vSCSI module passes the IO to the File System
Switch (FSS) kernel module (see “Understanding the Role of the File System Switch (FSS) Module in IO
Flows,” on page 19 ).
n For user-space cartels, the module that handles IO requests is called POSIX Object Layer (POL)
(“Understanding the Role of the POSIX Object Layer in IO Flows,” on page 19). For reads, this layer
may consult the Buffer Cache (BC) to obviate a read to physical hardware. This layer may also update the
BC for write operations. On BC miss, or if it does not use the BC, this layer hands off the IO request to
the File System Switch (FSS) kernel module (see “Understanding the Role of the File System Switch (FSS)
Module in IO Flows,” on page 19 ).
Thus, whether a VMDK's IOs are generated from a user space cartel or a VM, eventually, said requests end
up in the FSS module.
Next, the FSS module invokes the appropriate filesystem callbacks to perform the IO. For NFS, this involves
network IO. For other filesystems, the ESXi kernel invokes some (purposely obscured, for purposes of this
discussion) other kernel modules, eventually resulting in a call to the storage device's driver.
When the device completes the IO, analogous (but not necessarily the same) modules process the result on
its way back to the requester.
NOTE The ESXi File IO Stack, as described in this and subsequent subsections, may appear to be similar to
File IO Stacks in other semi-POSIX-compliant operating systems. However, it is important to remember that
ESXi is its own proprietary OS with its own kernel and should be thought of as such. You should not think
of ESXi as being based on any other semi-POSIX-compliant OS.
When a user-space cartel issues a file-related system call, the VMkernel redirects the call to this
module, which then invokes the FSS layer using an appropriate FSS handle.
Understanding the Role of the File System Switch (FSS) Module in IO Flows
ESXi's File System Switch (FSS) module is analogous to the Virtual File System (VFS) layer in Linux / Unix
systems.
For those unfamiliar with either of these, the purpose of this module is to provide a uniform set of
operations to higher-layer kernel modules, (for ESXi: POL / vSCSI), but allow different implementation of
the actual filesystem organization (for ESXi: NFS, VMFS, VSAN, etc.).
For those familiar with object-oriented programming, think of this module as a pure-virtual class (C++) or
Interface (Java), with filesystems being the derived classes that actually implement each of the methods in
the FSS class.
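To make the analogy concrete, here is a minimal C sketch of a switch expressed as a table of function pointers, with one filesystem supplying the implementations. All names here (FSSOps, demoFS, etc.) are hypothetical illustrations, not ESXi's actual FSS interface:

```c
#include <string.h>

/* Hypothetical FSS-style operations table. Each concrete filesystem
 * (VMFS, NFS, ...) fills in its own implementations; higher layers
 * (POL, vSCSI) call only through the table. */
typedef struct FSSOps {
    const char *name;
    int (*open)(const char *path);
    int (*read)(int handle, void *buf, int len);
} FSSOps;

static int demo_open(const char *path)          { (void)path; return 42; }
static int demo_read(int h, void *buf, int len) { (void)h; memset(buf, 0, len); return len; }

/* A "derived class": one filesystem's implementation of the interface. */
static const FSSOps demoFS = { "demoFS", demo_open, demo_read };

/* Higher-layer code is written against FSSOps only. */
static int upper_layer_read(const FSSOps *fs, const char *path, void *buf, int len)
{
    int h = fs->open(path);
    return fs->read(h, buf, len);
}
```

Because the upper layer holds only an FSSOps pointer, a new filesystem can be plugged in without changing any code above the switch.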
Attaching an IO Filter to a VMDK means that the Framework sends all subsequent IOs for said VMDK to
said IO Filter for processing. Further, it means the Framework invokes filter-supplied functions (called
callbacks) to alert said filter of key events related to the VMDK (other than IOs) for example:
n when someone migrates the VM controlling a disk from one host to another
n when a program opens or closes the disk (such as VM start and stop / pause, etc.)
Detaching an IO Filter from a VMDK means the Framework no longer sends IOs or invokes callbacks for
VMDK-related events.
IMPORTANT The concept of callbacks is central to developing IO Filters. In this context, a callback is analogous
to an entry-point function in a device driver. The VAIO specifies a series of callbacks that every filter must
provide for the IO Filter Framework to call, as well as some that are optional.
This section focuses on describing the flow of an IO to a VMDK with IO Filters attached. Other chapters of
this course discuss how to define and flesh out the callbacks that make up an IO Filter. This section builds on
the preceding discussion, and is also high-level, keeping the concepts as simple as possible. The intent is to
provide you with a big picture and context for how vSphere IO Filters work, which will help you
understand what you can and can't do in an IO Filter and why.
The following figure augments the preceding one (Figure 2-4) and shows the possible IO Flows to a VMDK
with an IO Filter attached.
Figure 2‑5. IO Stack with vSphere IO Filters and other Filter Components
NOTE To reduce the complexity of this (already complicated) figure, this diagram does not show the path of
IOs to VMDKs without IO Filters. That said, for those IOs, the path would flow from the FSS beneath the
VMM to either DevFS (in the case of writing to a device file) or to a filesystem module such as NFS, VMFS,
etc.
The following sub-topics describe the vSphere IO Filter-specific modules illustrated in this figure.
NOTE IO Filters are interested in, and must process, events beyond the IO requests passed to them via
upcalls. Those events not passed via upcalls are sent to SSLib via other mechanisms.
NOTE As depicted in the preceding figure, IO Filters can have several components: A Library Instance (LI),
a Daemon, and a CIM Provider, each of which are discussed in other sub-topics. The component that
provides the callbacks referenced in the preceding paragraph is the Library Instance.
Library Instances are shared object (library) files that get loaded into the VMX process of VMs with said
library attached to one of their VMDKs. For offline filtering, SSLib and the LIs get loaded into the cartel
accessing the VMDK, for example hostd, vmkfstools, etc. In this latter case, SSMod makes upcalls to the
SSLib in the cartel that loaded it for offline processing.
SSLib invokes the applicable callback(s) in sequence according to the class / type of the filter. Currently, the
two classes of filters supported by vSphere IO Filters, and the order in which they are invoked by SSLib, are:
n Replication
n Cache
For example, if VMDK foo.vmdk has a caching filter C and a replication filter R attached, SSLib will invoke
R's callback first and then C's callback for any given event for, or IO request to, foo.vmdk.
NOTE Developers specify the class of their filter in a build-related configuration file, discussed in “Creating
and Populating a Correct Scons File,” on page 89. SSLib relies solely on this declaration to define a filter's
type.
Since callbacks are in the LI of the filter, SSLib invokes them with a simple procedure call. As such, it must
wait for each callback to return before it can proceed. How SSLib proceeds depends on the value returned
from the callback. Return values cause the SSLib to fail events (for example fail the read(), snapshot,
migration, etc.). A value indicating success causes SSLib to submit the IO to the next filter in sequence, or, if
all filters have processed the event successfully, return a success value to the caller. There is a third option
beyond success or failure, though, as discussed in the next sub-topic (“Understanding Synchronous vs
Asynchronous Processing in Callbacks,” on page 22).
n The completion callbacks (and data) registered by filters to be issued when the IO request is completed
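The sequential, synchronous dispatch described above can be sketched as follows. This is a hedged illustration: the FilterCallback type, the callback signature, and the VMIOF_FAILURE name are assumptions; only the idea that SSLib calls each filter in class order with a plain procedure call and stops on failure comes from the course text:

```c
#include <stddef.h>

/* Assumed return codes; the real VAIO defines its own set. */
typedef enum { VMIOF_SUCCESS = 0, VMIOF_FAILURE = 1 } VMIOFStatus;

typedef VMIOFStatus (*FilterCallback)(void *event);

/* Stub filters for demonstration. */
static VMIOFStatus ok_cb(void *e)   { (void)e; return VMIOF_SUCCESS; }
static VMIOFStatus fail_cb(void *e) { (void)e; return VMIOF_FAILURE; }

/* Filters arrive ordered by class: replication first, then cache.
 * SSLib-style dispatch: call each one and wait for it to return. */
static VMIOFStatus dispatch_event(FilterCallback *filters, size_t nfilters, void *event)
{
    for (size_t i = 0; i < nfilters; i++) {
        VMIOFStatus s = filters[i](event);  /* simple procedure call */
        if (s != VMIOF_SUCCESS)
            return s;                       /* fail the whole event */
    }
    return VMIOF_SUCCESS;                   /* all filters passed */
}
```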
Many VMDK events require synchronous processing. That is, all work that a filter must do within a callback
must be completed before the callback returns. In these cases, unless the callback wants to return an error, it
returns a VAIO-defined code representing success (VMIOF_SUCCESS).
However, some events require work that may not complete quickly. For example, consider a replication
filter that receives a write request from a guest OS, where the filter is configured to only complete the write
locally after it receives an ACK that the data has been written on the replication site(s). The replication filter
must send the data to the replication site, and wait for an ACK from that site, before sending the write
request to the local disk and completing the write request.
If the filter were to block, waiting for the ACK from the remote site before returning, it would tie up a world
(thread) in the cartel that issues the write. In the case of online filtering, the VMX process has very few
worlds. Having them blocked indefinitely (in this example, waiting for an ACK) would have adverse impact
on performance. In these cases, the callback may schedule some method for handling the work
asynchronously, and then return a code indicating this to the IO Filter Framework(VMIOF_ASYNC).
When SSLib receives VMIOF_ASYNC from a callback, it suspends any further processing of the event. When the
filter completes the asynchronous work (in the example, it receives the ACK and sends the write to the local
disk), the filter invokes a VAIO function to tell the IO Filter Framework that it can continue processing the
event.
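A minimal sketch of this pattern follows. The DiskIO structure, the on_remote_ack helper, and the callback signature are hypothetical stand-ins; only the VMIOF_SUCCESS / VMIOF_ASYNC return codes and the rule against blocking in the issuing cartel come from the course:

```c
/* Assumed codes; the real VAIO defines its own set. */
typedef enum { VMIOF_SUCCESS = 0, VMIOF_ASYNC = 2 } VMIOFStatus;

typedef struct DiskIO {
    int is_write;
    int remote_acked;   /* set when the replication site ACKs */
} DiskIO;

/* The filter's IO callback: never block here waiting for the remote ACK. */
static VMIOFStatus replication_io_start(DiskIO *io)
{
    if (!io->is_write)
        return VMIOF_SUCCESS;   /* reads: nothing to replicate */

    /* queue the write to the replication site here (not shown) ... */
    return VMIOF_ASYNC;         /* SSLib suspends further processing */
}

/* Invoked later (e.g. from a poll callback) once the ACK arrives; at this
 * point a real filter would call the VAIO function that tells the
 * Framework it may continue processing the suspended IO. */
static void on_remote_ack(DiskIO *io)
{
    io->remote_acked = 1;
    /* framework "continue IO" call would go here */
}
```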
Callbacks, other than the one for IO requests, include a pointer to a function the Filter must call to signal
completion of the callback.
For IO requests, the VAIO provides separate utility functions that signal the Framework that the Filter has
completed its work. One function indicates that the Filter has fulfilled (completed) the IO request, which
causes the Framework to turn the IO request around and start sending it up the IO stack. Another function
indicates that the Filter has completed its work and that the Framework should continue the request down
the IO stack. Each of these functions take a parameter that indicates whether the IO request was serviced
successfully or had a failure.
From the time a filter receives an IO request until it completes or continues it, the filter is said to own the IO.
Filters are required to keep track of each IO request they own on a per-VMDK basis. How they implement this
is up to the Filter developer, though this course provides a suggestion and example. Filters must do this
because the Framework may send abort or reset requests to a Filter, which must be able to check if it owns
the IO(s) being aborted / reset.
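One possible way to satisfy this requirement, sketched under the assumption that each in-flight IO can be identified by a numeric ID, is a simple linked list in the filter's per-VMDK instance data. All type names are illustrative, and a real multi-world filter would also protect the list with a lock:

```c
#include <stdlib.h>

typedef struct OwnedIO {
    unsigned long   id;       /* identifies an in-flight IO request */
    struct OwnedIO *next;
} OwnedIO;

typedef struct InstanceData {  /* one per LI, i.e. per filtered VMDK */
    OwnedIO *owned;
} InstanceData;

/* Record ownership when the filter receives the IO. */
static void track_io(InstanceData *inst, unsigned long id)
{
    OwnedIO *n = malloc(sizeof *n);
    n->id = id;
    n->next = inst->owned;
    inst->owned = n;
}

/* Answer abort/reset queries: does this LI own the IO? */
static int owns_io(const InstanceData *inst, unsigned long id)
{
    for (const OwnedIO *p = inst->owned; p; p = p->next)
        if (p->id == id)
            return 1;
    return 0;
}

/* Drop ownership when the filter completes or continues the IO. */
static void untrack_io(InstanceData *inst, unsigned long id)
{
    for (OwnedIO **pp = &inst->owned; *pp; pp = &(*pp)->next)
        if ((*pp)->id == id) {
            OwnedIO *dead = *pp;
            *pp = dead->next;
            free(dead);
            return;
        }
}
```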
One callback that each Library component must supply is to a function generically referred to as
diskIOStart. SSLib invokes this function for each read / write IO request it receives on a VMDK to which
this filter is attached. This function must perform whatever filtering logic is appropriate, and then return a
value indicating success, failure, or that the filter will complete the IO request later, asynchronously.
SSLib then proceeds with the IO request as discussed in the previous topic, “Understanding the Role of the
SecretSauce Library (SSLib) in IOs With IO Filters,” on page 22. Code in this function often involves
communication with a Filter's daemon, if present, as discussed in the next topic.
Developers specify the source files that make up a filter's library instance using definitions in the filter's
SCONS file (see “Creating and Populating a Correct Scons File,” on page 89).
It is important to understand that for any given user-space cartel that opens one or more filtered VMDKs,
the IO Filter Framework (through SSLib) loads the Library component of the required IO Filters just once.
When the IO Filter Framework attaches an IO Filter to a VMDK, or opens a VMDK with an IO Filter
attached, the framework allocates a new opaque handle that it associates with that VMDK. Whenever the IO
Filter Framework invokes a callback of the Library component, said framework passes the opaque handle to
the callback so that the Library code knows on which VMDK the event occurred. Thus, an IO Filter's Library
component code must keep separate data for each VMDK it is filtering. This data is known as instance data.
The context of a Library component filtering a single VMDK is thus called a Library Instance (LI). For
example, consider the following figure:
n Filter A's Library component is loaded into the VMX of VM Y (so one address for each of the
callbacks in the Library component)
n The Library code keeps separate data about each of the VMDKs to which it is attached (represented
by the hexagons labeled ID1 and ID2). Thus, it is said there are two LIs for this filter in this VM's
context.
n Filter A's and Filter B's Library component is loaded into the VMX of VM Z. That is, there are two
addresses for each callback, one for Filter A, one for Filter B.
n Filter A only keeps track of data for VMDK3. Thus, it is said there is only one LI for this filter in this
VM's context.
n Filter B keeps separate data for VMDK3 and VMDK4. Thus it is said there are two LIs for this filter
in this VM's context.
n Configuration data (also called filter properties or filter capabilities) — For example:
n A replication solution may allow administrators to configure whether the writes to the replication
site must be ACKed by at least one remote site before continuing the IO request on the source host
n A replication solution may want to track the average throughput between the host and replication
site(s)
Some of this data is fast-moving and need not persist between the close of a VMDK and it being reopened
(for example between a power-off and power-on of a VM owning the VMDK), such as current cache hit
rates. Some of this data must persist and be present as long as the filter is attached to the VMDK, such as
configuration data. All such data is referred to as filter-private data.
The non-persistent filter-private data kept by a LI is called instance data as discussed in “Understanding the
Role of an IO Filter Solution's Library and Library Instances (LIs) in IO Flows with IO Filters,” on page 23
and must be kept by Filters in RAM-based data structures they define themselves. The persistent filter-
private data is called sidecar data because the Filter keeps it in a file associated with the VMDK called a
sidecar. The VAIODK provides a set of utility functions that allow Library code to manage both instance and
sidecar data.
n The stun level of a VMDK (defined and discussed in detail in “Understanding and Processing diskStun
and diskUnstun Events,” on page 208)
n Handles for any sidecar files associated with the instance's VMDK
n The pathname of the VMDK file (this can change as a result of storage migration)
n The current settings for filter properties and any additional configuration parameters
n Reside in the same directory as a VMDK, though their file name is (currently) obscured
For information on the instance data functions, see “Understanding and Using Filter-Private Data Functions
to Keep Non-Persistent Per-VMDK Meta Data,” on page 145. For information on sidecar functions, see
“Using Sidecars Functions in Library Code to Keep Persistent Per-VMDK Meta Data,” on page 137.
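The split between non-persistent instance data and persistent sidecar data might look like the following struct sketch. All type and field names here are illustrative, not VAIO definitions; the real sidecar and filter-private data functions are covered in the sections cited above:

```c
/* Stand-in for whatever handle the sidecar utility functions return. */
typedef struct SidecarHandle { int fd; } SidecarHandle;

typedef struct LIInstanceData {
    /* non-persistent "instance data": lost when the VMDK is closed */
    unsigned long cacheHits;
    unsigned long cacheMisses;

    /* persistent "sidecar data": survives close/reopen via the sidecar file */
    SidecarHandle configSidecar;
    int           readAheadBlocks;  /* example configurable property */
} LIInstanceData;

/* A fast-moving statistic that need not persist: the current hit rate. */
static double hit_rate(const LIInstanceData *d)
{
    unsigned long total = d->cacheHits + d->cacheMisses;
    return total ? (double)d->cacheHits / total : 0.0;
}
```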
The daemon gets loaded and started as part of the filter's VIB installation onto an ESXi host. The iofilterd
cartel starts a daemon by invoking a (mandatory) start callback. The daemon is stopped and unloaded as
part of a filter's VIB removal. The iofilterd cartel stops a daemon by invoking a (mandatory) stop callback.
Daemons are not allowed to run in infinite loops as they do in Linux / Unix (or services in Windows).
Instead, the start function typically creates and binds to sockets, then sets up asynchronous IO callbacks on
receipt of data on said sockets. They may also perform timer operations, etc. using VAIO utility functions
(discussed in Chapter 5).
Thus, Daemons: start before LI code runs; stop after all LI code exits; and can handle asynchronous events
that make it appear as though they are running at all times. A common thing for LIs to do in the callback for
handling VMDK open events is to make a socket connection to the filter's Daemon.
NOTE It is a best practice to use UNIX domain sockets for these connections.
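The socket setup a daemon's start callback might perform can be sketched as follows. The socket calls themselves are standard POSIX; the socket path and the idea of registering a VAIO async-IO callback afterward (rather than looping on accept()) are assumptions about a typical design:

```c
#include <sys/socket.h>
#include <sys/un.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>

/* Create a listening UNIX domain socket for LI connections.
 * Returns the listening fd, or -1 on error. */
static int daemon_listen_socket(const char *path)
{
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    snprintf(addr.sun_path, sizeof addr.sun_path, "%s", path);
    unlink(path);   /* remove a stale socket from a prior run */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 8) < 0) {
        close(fd);
        return -1;
    }
    /* Here a real daemon would register an asynchronous IO callback on fd
     * with the VAIO utility functions instead of blocking in accept(). */
    return fd;
}
```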
The purpose of a Daemon is to provide functionality to LIs that LIs cannot do themselves. For example, LI
code cannot make a TCP/IP socket connection to an off-host server. This is enforced with ESXi access
restrictions. However, daemons can. Further, after making such a connection, they can pass the file-
descriptor of the resulting socket to a LI so that said LI can communicate directly with the remote server
without having to continually involve the daemon.
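Passing an open file descriptor between two processes over a UNIX domain socket uses the standard POSIX SCM_RIGHTS ancillary-data mechanism, sketched below. The sendmsg/recvmsg code is real POSIX; the surrounding daemon/LI protocol (what the one-byte payload means, when each side calls these) is an assumption:

```c
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>
#include <unistd.h>

/* Daemon side: hand an open fd (e.g. a connected TCP socket) to the LI. */
static int send_fd(int chan, int fd_to_pass)
{
    char dummy = 'F';
    struct iovec iov = { &dummy, 1 };
    char ctrl[CMSG_SPACE(sizeof(int))];
    struct msghdr msg;

    memset(ctrl, 0, sizeof ctrl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = sizeof ctrl;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type  = SCM_RIGHTS;
    c->cmsg_len   = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd_to_pass, sizeof(int));

    return sendmsg(chan, &msg, 0) == 1 ? 0 : -1;
}

/* LI side: receive the fd; it now refers to the same open connection. */
static int recv_fd(int chan)
{
    char dummy;
    struct iovec iov = { &dummy, 1 };
    char ctrl[CMSG_SPACE(sizeof(int))];
    struct msghdr msg;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = sizeof ctrl;

    if (recvmsg(chan, &msg, 0) != 1)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (!c || c->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}
```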
Another example is, for a caching filter, it is considered a best practice to have the daemon create and
manage vFlash File System (VFFS) cache files on caching storage (typically, but not always, SSDs). In this case, for
read request processing, the LI queries the daemon to see if the cache file contains a required block for a
given VMDK. For write request processing, the LI must send updated data to the daemon to update the
cache.
To avoid sending large amounts of data for IO requests between daemon and LI over a socket, a reasonable
programming pattern is to: 1) Create a shared memory area accessible to LI and Daemon components; 2)
Place data for read / write requests in the shared data; 3) Send requests / responses, that include offsets to
the applicable data, in messages between the LI and daemon over the socket. You can think of the shared
memory as a data-plane and the socket as control-plane for the data sharing.
In summary, for a given IO request, an LI may return VMIOF_ASYNC to the SSLib, then send requests to its
daemon to perform processing on the request that the LI cannot (or should not) perform itself, and then wait
for a response from the daemon.
As with LIs, you specify the source files that make up the daemon using SCONS file definitions (see
“Creating and Populating a Correct Scons File,” on page 89).
NOTE The daemon in your IO Filter Solution should use SSL to establish secure connections
between its LIs and off-host peer daemons, incorporating additional measures to authenticate such
connections, such as magic numbers, etc.
For example, a caching filter may optionally perform read-ahead when processing a cache miss. Further, it
may allow administrators to configure how many blocks to read ahead. Such a filter can include a CIM
provider that receives configuration commands from a standard management tool (vSphere Client, vSphere
Web Client, Power CLI, etc.), and pass those commands onto the filter instance, typically via the filter's
daemon. The CIM provider can also surface the current configuration to said management tools by
requesting current settings from the filter daemon or LI. If the daemon supports separate read-ahead values
for each VMDK, the filter can keep the parameter in one of the VMDK's sidecar files. If the daemon only
supports a global setting, the filter can keep the data in the CIM provider, which has standardized methods
for storing and retrieving persistent data.
While each IO Filter must define and provide source for a CIM provider in its SCONS file, said CIM
provider code need not actually do anything useful. Unless you are an experienced CIM provider
developer, consider starting with the skeletal example provided by one of the example filters in the
VAIODK, then adding functionality as you learn.
Developing CIM providers is a large topic on its own, and is somewhat orthogonal to the discussion of
developing a vSphere IO Filter. Thus, further discussion of developing a CIM provider is beyond the scope
of this course.
Further, vSphere IO Filters integrate, at the vSphere layer, with vSphere's Storage Policy Based Management
(SPBM) feature. Historically, SPBM has allowed administrators to apply policies to VMDKs such as: Only
place this VMDK on devices that can provide X IOPs, and migrate it to a sufficient device if the current
device falls below X IOPs. For vSphere IO Filters, administrators associate an IO Filter to a VMDK as part of
the SPBM framework.
The work flow for configuring IO Filters in a cluster, and applying them to VMDKs for VMs running there
is similar to, if not exactly, the following:
1 Obtain the IO Filter Solution's bundle from the solution developer.
2 Deploy the bundle to the vCenter Server / VCSA controlling clusters in the datacenter. This enables
administrators to associate an IO Filter with a VMDK using SPBM methods in the VWC.
3 Apply an IO Filter SPBM policy to a VMDK, either when creating said VMDK or after it is created,
using the VWC. This action causes the vCenter Server / VCSA to do two things:
a Deploy the specified IO Filter's VIB to each ESXi host in the cluster (which in turn starts said IO
Filter's daemon)
b Perform a diskAttach operation for said IO Filter to the specified VMDK, using default values for
any configurable filter properties.
There are implications of running VMs with IO Filters attached to their VMDKs in a cluster, including:
n Migration — For a VM (with filtered disks) to migrate from Host A to B, B must have the required IO
Filter installed, too. Further, if the filter is of class cache, does B have its own SSD with a VFFS volume?
What is the appropriate behavior for the filter to take in either case?
NOTE It is entirely possible for a VM running on Host B to use the VFFS cache on Host A, and still
increase performance over just accessing VMDKs on their native backing store without cache.
n If a host enters the cluster with DRS, the vCenter Server / VCSA must (and does) deploy every IO Filter
already deployed in the cluster to that host as part of the join. Similarly, if a host leaves the cluster, all VMs with filtered
VMDKs must migrate to other hosts in the cluster, and the filter must be removed as part of the leave.
Where necessary, topics in Chapter 5 include detailed discussion of cluster implications on callback design
decisions.
n Deploy an IO Filter Solution into a cluster, meaning that the method causes the Filter's VIB to get
installed into each ESXi host in the cluster.
n Configure properties of an IO Filter Solution, which are surfaced to the VWC via entries in the filter's
build environment files. That is, when you specify that your filter has a configurable property (for
example cache read-ahead) in your filter's build configuration variables, then compile the VIB, the VIB
includes the information you provide about those configurable properties. When the VIB is deployed to
the cluster, the VWC is able to see the configurable properties in the VIBs and provide a standard UI for
changing the properties. When the administrator changes the property, the VWC sends the change
command, via the vCenter Server, to the IO Filter's CIM Provider, which can then effect the change in
the other components of the IO Filter as appropriate (for example sending the command to change the
read-ahead value to the Filter's daemon or LI). Note that the CIM provider is not involved if the change
is only to a filter property / capability.
VMware provides an un-gated, free SDK that allows anyone, but typically partners, to develop plugins that
provide additional functionality for managing their solutions within a vSphere environment. As a vSphere
IO Filter Developer, you need only develop a VWC plugin for your filter if you need functionality beyond
what is provided by the base VWC, that is, beyond requesting reads / writes to filter properties. If you do
decide to create your own VWC plugin, you should still use the CIM framework for communicating
requests to your solution.
Creating VWC plugins is a large topic, for which VMware offers a 5-6 day bootcamp. Thus, further
discussion of developing VWC plugins is beyond the scope of this course.
n Each IO Filter must provide Library Instance code that gets invoked by the IO Filter Framework to filter
IOs to designated VMDKs. The library code runs in the context of a VM's VMX user-space cartel and / or
in the context of a user-space cartel performing off-line processing of the VMDK.
n Each IO Filter can optionally provide a Daemon that runs in the context of the iofilterd user-space
cartel. Daemons are started by the IO Filter Framework during deployment of the filter's VIB on an
ESXi host.
n IOs to VMDKs from VMs enter the storage stack via the VMM invoking the vSCSI module. IOs to VMDKs
from on-host user-space programs, including hostd, enter the storage stack via the POL module. The hostd
process proxies IOs that originate in Windows / Linux applications for this off-line processing. The
combination of these factors indicates that all IOs to VMDKs going through these three sources enter the
ESXi storage stack and are thus subject to IO Filtering.
NOTE IOs to VMDK files on shared storage that do not go through ESXi are not subject to IO filtering. For
example, a VMDK on an NFS file store, opened by a program using standard POSIX open/close/read/write
system calls will not have its IOs filtered by vSphere IO Filters.
Executing filter code in the context of a user-space cartel has the benefit that fatal flaws in a given filter's
code affect only VMs whose VMDKs are attached to said filter. This contrasts with kernel-based filtering
solutions where a fatal flaw in the filtering code causes the kernel to crash. This user-space architecture
increases the reliability of the ESXi hosts.
Chapter Summary
To achieve the tactical objectives of this course, after successfully completing this chapter, you should be
able to:
n Understand the flow of an IO between a guest OS and a Virtual Machine Disk (VMDK), with and
without an IO Filter, including understanding the role of:
n The VMX
n The Virtual SCSI (vSCSI), POSIX Object Layer (POL) File System Switch, and SSMod kernel
modules
n Understand the role of CIM providers and VWC plug-ins in IO Filter Solutions
a Caching
b Encryption
c Inspection
d Replication
2 Which of the following are required components of a vSphere IO Filter solution developed by you (a
partner)? (choose all that apply)
a Library Instance
b Daemon
c CIM Provider
d VWC Plugin
3 In which of the following contexts does SSLib run for a VM? (Choose the best answer)
d A VMkernel module
4 Which of the following components does VMware recommend control VFFS cache files for a caching
filter solution?
b The daemon
c The LI
a VMM
b VMX
c iofilterd
d VMkernel
a It receives events from SSMod / hostd / vpxa and invokes the appropriate callback of the LI
7 Where do IO Filters store persistent meta-data about the VMDKs they filter?
a Instance Data
b Sidecars
n List the key folders, files, and tools created by installing the VAIODK, including the sample filters
n “Understanding a Filter's Directory Contents and Building a VAIODK Sample Filter,” on page 47
n A C tool chain that includes a cross compiler and customized gdb debugger, linker, etc. The entire
toolchain must run on a supported 64-bit Linux system. It also includes make and scons, which are used
to build IO Filter VIBs / Bundles. If you are unfamiliar with scons:
n Don't Panic! This course teaches you what entries to make in scons-related build files. You don't
have to learn scons to create IO Filters.
n A customized version of the CIM Provider Development Kit (CIMPDK) to build CIM providers
n A customized version of the Host Extension Development Kit (HEXDK) that the toolchain uses to create
a VIB and Bundle with which you can deploy your IO Filter Solution to ESXi hosts and vCenter Servers
n It is possible to include your own static ELF libraries in your IO Filter Solution, allowing for modular
development of your solution.
As indicated by the presence of a C tool chain, all IO Filters must be written in C. No other language is
supported at this time, including C++. You must use the VAIO-provided toolchain to compile your filters
and build your VIBs / Bundles.
To develop and test vSphere IO Filters, you must have the following in your development environment:
n At least two ESXi 6.0 U1 systems, each with a VMkernel vNIC configured for vMotion traffic. You
should enable SSH to each host, and disable ESXi shell timeouts.
n vSphere DRS configured. Consider changing the Automation parameter from the default of Fully
Automated to Manual.
WHAT'S NEW DRS is no longer a requirement starting in ESXi 6.0 U2. This allows customers to
upgrade or uninstall the IO Filter by moving the hosts into maintenance mode and then performing the
upgrade or uninstall process, without needing DRS.
n A VM installed to the shared datastore. The guest OS in the VM must be one that can support
dynamically adding additional disks. It can be anything that you know how to administer: DOS,
Windows, Linux, etc. You must have an appropriate license for said guest OS.
IMPORTANT The build of ESXi you use to deploy your ESXi hosts must match the build of the VAIODK.
Mismatches will (almost certainly) cause IO Filters to fail to load, VMs to crash in unexpected ways, etc.
NOTE The authentication system used between vCenter Server and ESXi hosts is time sensitive. If the
systems' times are too far out of sync, certain cluster operations may fail. Therefore, a best practice is to
configure the vCenter Server and ESXi hosts to use the same Network Time Protocol (NTP) server.
n A 64-bit Linux system with which you write, build, and debug your IO Filter. To do this you must
install the VAIODK onto this system and follow the other procedures discussed in this chapter.
While you can use just about any 64-bit Linux system to do your development today, VMware
recommends using the VMware Workbench VM, installed in a Fusion / vSphere / Workstation
environment that is not part of the cluster you will be using to test your IO Filter.
NOTE VMware Workbench VM is currently based on SUSE Enterprise Linux (SLES) 11 SP 3. You get access
to this product free from VMware as part of belonging to the IO Filters program. It is provided through
a special licensing agreement with Novell, owner of SLES, eliminating the need for you to buy a license
for the version of SLES used for Workbench yourself.
Another advantage of using VMware Workbench VM is that you can use its customized Remote System
Explorer to install and remove VIBs from ESXi hosts instead of having to do this from the ESXi CLI.
n CIM Providers are compiled to 32-bit ELF shared object (.so) files that get loaded into the context of the
sfcbd (small foot-print cim broker daemon) cartel, which is a 32-bit executable
n IO Filter Daemons are compiled to 64-bit ELF shared object (.so) files that get loaded into the context of
the iofilterd cartel, which is a 64-bit executable
n IO Filter Libraries are compiled to both 32-bit and 64-bit ELF shared object (.so) files that get loaded
into the context of 32-bit and 64-bit executables, respectively. The key user of the 64-bit Library is
vmx. Most other user-space cartels, such as vmkfstools, hostd and vpxa are currently 32-bit executables
and thus use the 32-bit Library.
The mixed world has several implications for writing code, generating the following guidelines:
n The read/write buffer for the structure the VAIODK defines for IO elements (scatter/gather items),
VMIOF_DiskIOElem, contains an addr field declared to be of type uint64_t. To convert this address to a
C pointer, and vice versa, use the A2P() and P2A() macros.
These macros are defined in at least one of the sample filters that come with the VAIODK and discussed
in “Understanding the Results of a Successful VAIODK Install,” on page 43.
n You cannot share a mutex between 32 and 64-bit code. For example, if you create a shared memory
segment between your LI and Daemon, and control access to the area with a pthread_mutex_t object,
this will work for LIs running in a VMX context, but not in the context of, for example, vmkfstools.
n System V semaphores are an alternative to pthread mutexes when working with 32 and 64-bit code.
n If you want to include your own static library in an IO Filter solution, you must provide both 32 and 64-
bit versions of said libraries so they, too, can get loaded into both 32 and 64-bit cartels such as vmx and
vmkfstools
Prerequisites
In order to deploy the VAIODK, you must:
n Have a 64-bit Linux system installed and available for hosting the VAIODK, as discussed in
“Understanding the Development Environment and its Requirements,” on page 32
n Have received the download location for the VAIODK from the VAIO program management.
Procedure
1 Download and unwrap the VAIODK package.
2 Install the VAIODK using VMware Workbench.
3 Verify that the installation created content in the appropriate places on the disk.
What to do next
Build the sample filters as discussed in “Understanding a Filter's Directory Contents and Building a
VAIODK Sample Filter,” on page 47
The files listed in the preceding figure are located in the folder /iofilter-early-access/20151014-3074370-
VAIODK-GA-ESXi6.0U1A. Your account on the SFTP site does not have permission to browse the directories on
the site. Therefore you must specify files to get by their full pathname. VMware recommends using tools
such as Filezilla or WinSCP to access and download files from the SFTP site.
Historically, the VAIODK was staged on this server. Now, you download the VAIODK from Developer Center
as discussed in the next topic.
It is important to note that the VAIODK install requires Workbench version 3.5.3.
Given the above, the procedure for installing and unwrapping the VAIODK is:
Procedure
1 Download the VAIODK from Developer Center.
2 Download Workbench VM 3.5 from Developer Center (as discussed in the course VMware
Fundamentals for Developers) and deploy it in a virtualized environment that is NOT part of your test
environment. (The developers of this course typically run Workbench in VMware Fusion or VMware
Workstation).
3 Download the Workbench 3.5.3 update package from Developer Center, and install it within the
Workbench application (Eclipse) in the Workbench VM.
4 Install the VAIODK archive as described in the next topic: “Installing the VAIODK using VMware
Workbench,” on page 35.
Prerequisites
The VMware Fundamentals for Developers course discusses how to install VMware Workbench. You can find
the Workbench VM download at https://developercenter.vmware.com/group/workbench/vm/3.5. You must
have also updated your Workbench environment to version 3.5.3 by installing the VMware Workbench Core
Update found on the page at that same URL.
Procedure
1 If you have not already done so, start the VMware Workbench VM. Log in as either root or
vmware. The default password for both is vmware.
Whichever username you use, when you start the Workbench app, it runs as root.
2 Start the Workbench application by clicking on the VMware Workbench icon on the desktop.
3 Specify the location where you want your project to reside. Click OK to continue.
5 In the Install Wizard, click on Add to specify the location of the VAIODK zip file that you downloaded
in the previous step.
6 Click on Archive and browse to the location of the VAIODK zip file and select it.
7 Click on Select All to include all the files inside the zip file. Click on Next to proceed with the VAIODK
installation.
8 You are prompted with the Licensing Terms and Conditions. Click on the I accept the terms of the license
agreements radio button. Then click on Finish to complete the VAIODK installation.
9 To verify that the VAIODK installation is successful, open a terminal and execute the command rpm -qa
| grep vmware-esx. The display is similar to the following screenshot.
NOTE Installation of the VAIODK should be done only as the root user. However, it is recommended that you
do your IO Filter solution development as a non-root user; for example, you can use the already available
vmware account (password vmware).
NOTE Within this topic, the term buildId refers to a specific build of ESXi. You must use the version of ESXi
indicated in the buildId of a VAIODK.
Installing the VAIODK results in adding the following key items to the filesystem:
n /build/toolchain — This directory is the traditional location of VMware's internal tools and supporting
tools (such as various versions of Perl, gcc, etc.). These tools will migrate to /opt/vmware/toolchain
over time.
n /opt/vmware — This directory is where VMware installs its SDKs. It includes directories such
as /opt/vmware/toolchain, which is the (new) location for VMware's internal tools and supporting
tools. Sub-directories containing SDKs have a name in the form SDKName-SDKVersion, for example
vaiodk-6.0.0-2329218.
Within the /opt/vmware directory, VAIODK installation creates several sub-directories, of which the
following are key:
n cimpdk — As the name suggests, this directory contains the CIM Provider Development Kit (CIMPDK).
This allows IO Filter Solutions to include a CIM Provider. The tool chain for building an IO Filter VIB
invokes tools in this CIMPDK.
n vaiodk-6.0.0-buildId — This directory contains the core of the VAIODK as discussed later in this
topic.
While writing code for, building, and debugging IO Filters, you will spend most of your time working
within the /opt/vmware/vaiodk-6.0.0-buildId folder. The key sub-directories within this directory after
install are illustrated in the following figure:
n debug — This directory contains items needed to perform post-mortem debugging on IO Filter
components
n docs — As the name suggests, this directory contains documentation for the VAIODK. Most notably, the
docs/html directory contains doxygen content generated from the VAIODK .h files.
n src — This is the directory under which all IO filter code must be created. It also includes the directory
with the VAIODK include files and samples, as discussed in the next two sub-topics.
The directory bora/lib/public/vmiof of the VAIODK's src directory contains all the header files that define
data types and prototypes for the VAIO. It is a best practice to just #include vmiof.h, which includes all of
the other .h files for you. However, the following list describes each of the header files for your background
information.
n vmiof_aio.h — This file provides prototypes for the functions, and related data structures, used to
perform asynchronous IO from within an IO Filter
n vmiof_crossfd.h — This file provides prototypes for the functions, and related data structures, that a
cartel can use to provide access to its memory to other cartels (for example iofilterd and a cartel that
loaded an IO Filter Library) using a file IO abstraction. For more information about these functions, see
“Understanding and Using the IO Filters CrossFD Functions,” on page 184.
n vmiof_daemon.h — This file provides prototypes for the functions, and related data structures, used to
create Daemons for an IO Filter
n vmiof_disk.h — This file provides prototypes for the callback functions defined by an IO Filter, utility
functions used by an IO Filter, and their related data structures. While the callback functions defined
here are only used to write Filter Libraries, the utility functions are used by Filter Daemons as well.
n vmiof_heap.h — This file provides prototypes for the functions, and related data structures, used to
create, destroy, and manage dynamic memory spaces (heaps) within an IO Filter. For more
information about these functions, see “Managing Memory in an IO Filter Solution,” on page 130.
n vmiof_log.h — This file provides prototypes for the functions, and related data structures, used to
create log entries within an IO Filter. For more information about these functions, see “Understanding
Logging in an IO Filter,” on page 50.
n vmiof_poll.h — This file provides prototypes for the functions, and related data structures, used to do
poll-based processing, as an alternative to using the select(2) or poll(2) system calls (or variants
thereof), within an IO Filter. For more information about these functions, see “Understanding and
Using the IO Filters Polling Functions,” on page 141.
n vmiof_scsi.h — This file provides prototypes for the functions, and related data structures, used to
issue SCSI commands within an IO Filter
n vmiof_status.h — This file provides definitions for the common return values returned by callbacks
and utility functions of the VAIO. For more information about status codes, see “Understanding
VMIOF_Status Results for Functions in the VAIO,” on page 102.
n vmiof_timer.h — This file provides prototypes for the functions, and related data structures, for doing
time-based processing within an IO Filter. For more information about these functions, see
“Understanding and Using the IO Filters Timer Functions,” on page 171.
n vmiof_cache.h — This file provides prototypes for the functions, and related data structures, used to
create, destroy, and perform IO to files on an ESXi host's vFlash Files System (VFFS), created from an
aggregate of the host's SSDs. For more information about these functions, see “Understanding and
Using the VMIOF_Cache*() Functions,” on page 253.
n vmiof_work.h — This file provides prototypes for the functions, and related data structures, for
implementing work-pile based processing within an IO Filter. For more information about these
functions, see “Understanding and Using the IO Filters Worker Functions,” on page 174.
The VAIODK includes several sample IO Filter Solutions, each of which serves a specific purpose in aiding
you to develop your own IO Filter Solution. All of the samples are located within sub-directories
of /opt/vmware/vaiodk-version-build/src/partners/samples/iofilters. The samples and their
descriptions, in order of complexity, are as follows:
n sampfilt — This solution includes LI and CIM provider components, but no daemon component. The
LI does not communicate with the CIM provider.
The LI contains the minimum code necessary for a functional LI: it has just the required entry points,
and the code for each of those merely makes a log entry and then returns a value indicating that the
callback succeeded. The CIM provider is similarly minimal.
n Validate the VAIODK build capability — After installing the VAIODK, build this solution. If you
encounter errors, there is almost certainly something wrong with your VAIODK deployment.
n Validate the VAIO runtime in your ESXi hosts — After building this solution, deploy its VIB to an
ESXi host, attach it to the VMDK of a VM, and power on the VM. If you have any problems doing
this, you have issues in your development environment.
n Understanding callback invocation — When an entity loads and runs this filter, you can observe
when the IO Filter Framework invokes various callbacks. For example, you can perform a
snapshot, migration, power on/off, etc. and observe when the framework invokes callbacks such as
diskOpen(), diskIOStart(), diskStun(), diskUnStun(), etc. Further, you can enhance the code by
having it log the values passed into parameters of the various callbacks to gain an understanding of
said parameters.
While this filter is designated as a replication class filter, in fact it performs no replication.
n countIO — This solution is only slightly more complicated than sampfilt. It includes a daemon
component in addition to an LI and CIM provider. That said, none of the components communicate
with one another.
The daemon component contains minimal functionality, including the callbacks for starting, stopping,
and cleaning up after a daemon stops.
The LI callbacks include code to manipulate non-persistent per-VMDK meta-data (private data) as well
as counting the number of times the IO Filter Framework invokes the diskIOStart() callback. Both
components use the VMIOF_heapAllocate() and VMIOF_heapFree() functions (see "Managing Memory in
an IO Filter Solution,” on page 130).
This filter is designated as a caching filter, even though it performs no caching. Thus, you can use this
filter, in combination with sampfilt, to understand the sequence in which the IO Filter Framework
invokes callbacks on a VMDK with multiple filters attached. That is, if you configure a VMDK with
both the sampfilt and countIO filters attached, you can see that for any given event, the IO Filter
framework invokes the callback for sampfilt before countIO because of their respective classes, and that
the IO Filter Framework invokes callbacks for replication filters before caching filters.
n proxy — This solution is more complicated still, though it does not include a daemon component. The
complexity added in this solution is in how it processes IO requests in diskIOStart(). Instead of just
completing each IO request synchronously, this filter illustrates several commonly used programming
patterns for handling IOs, including:
n Registering for IO completion — Before submitting the duplicate IO to the IO Filter Framework,
the LI registers a callback function for the IO Filter Framework to call upon completion of the IO
request. In that callback, the LI completes the original IO request, and frees the duplicated IO it
had previously created and submitted.
n Asynchronous IO completion — After duplicating, registering for completion, and submitting the
duplicate IO to the IO Filter Framework, the LI signals said framework that it will complete the
original IO request asynchronously. Per the preceding bullet, the completion callback signals said
framework that the original IO is complete after it receives the results using the duplicate IO
request.
n A fully functional LI and Daemon that communicate with one another via UNIX domain sockets
and shared memory using the Crossfd functions (see “Understanding and Using the IO Filters
CrossFD Functions," on page 184)
n Use of the vFlash functions to create cache files on a cache device, for example an SSD (see
“Understanding and Using the VMIOF_Cache*() Functions,” on page 253)
n Use of sidecar functions to manage persistent per-VMDK meta-data (see “Using Sidecars Functions
in Library Code to Keep Persistent Per-VMDK Meta Data,” on page 137)
n Asynchronous IO completion — Requests are always completed asynchronously, after read data is
retrieved from the cache or disk and after writes are completed to the disk and cache, respectively.
n IO request duplication, allocation, and freeing — Read requests are fulfilled from the cache as
much as possible, and from the disk for whatever is not in the cache. For the data missing from the
cache, the filter creates a new IO request and submits it to the disk, filling in the original request
with the complete data on completion of the filter-generated IO request.
n Use of work-pile and poll callbacks — The LI and daemon both use work-piles to complete
background processing of certain tasks. They also use poll callbacks to process reads/writes
through their shared socket.
VMware maintains a VMware Confidential Copyright on all of the sample code. In general, you are free to
copy the patterns in your own solutions. Check your program contracts for specific language on how you
can (and can't) use the source code in these samples.
Prerequisites
At this time, uninstall support for the VAIODK in Workbench is unconfirmed. We recommend that
you create a new Workbench VM and install the desired VAIODK.
During development, as an alternative, you can deploy an IO Filter Solution to ESXi hosts as VIBs. This is
faster than deploying to a whole cluster and is appropriate when testing non-cluster related functionality.
Thus, the objective of an IO Filter build is the generation of its VIB and bundle files.
The VAIODK sample filters are ready to build as soon as you install the VAIODK because they
contain all the required, and sometimes optional, components of an IO Filter Solution. This section discusses
what those components are, and how to build them into a VIB / bundle.
Specifically, each IO Filter source directory must contain the following items:
n Source for the filter's LI, based on the name of the filter, for example sampfilt.c, countIO.c, etc.
n A source file for the filter's daemon, if present. Remember that the sampfilt and proxy sample solutions
do not include daemons.
n A sub-directory with a build environment for the filter's CIM provider. The sub-directory's name is also
based on the filter name using the pattern cim/filter-name, for example cim/sampfilt, cim/countIO, etc.
The CIMPDK dictates the contents of this sub-directory. For this topic, it is enough to know that the
CIM provider source is located in cim/filter-name/src.
n A SCONS file, whose name is based on the filter, for example sampfilt.sc, with rules for building the
solution's VIB, bundle, and their components. For details on this file, see “Creating and Populating a
Correct Scons File,” on page 89.
n A Makefile with rules for invoking scons to build the solution's VIB, debugging environment, and to
clean the build environment for this filter.
In aggregate, the contents of each filter's directory allow you to build the VIB very easily:
Prerequisites
To build the VAIODK sample filters, you must have already installed the VAIODK.
Procedure
1 Change into the sample's directory.
2 Run make.
Make invokes SCONS to build the solution. It compiles the 32 and 64-bit versions of the filter's LI, the
64-bit daemon shared object, and 32-bit CIM provider shared object, then builds the VIB and bundle file
from those components.
After successfully building your filter, you should deploy it to, and test it on your vSphere cluster as
described in “Testing an IO Filter,” on page 49.
IMPORTANT Run "make clean" before you run "make"; otherwise, the VAIODK might bundle old
VIBs into the final bundle. You can use "unzip -l <bundle-name>.zip" to verify what is inside the bundle.
NOTE The x86_64 in the vib file name indicates that it is meant to be loaded on 64-bit ESXi hosts, not
that all of its contents are 64-bit executables. Remember that IO Filter Solution VIBs have a mix of 32
and 64-bit objects.
The notable exception is that the build creates CIM objects under the filter's cim/filter_name/build/ sub-
directory, for example cim/sampfilt/build/.libs/libsampfiltprovider.so.
Again, these examples were for builds of the countIO sample. For whatever filter you build, just replace
countio or countIO with your filter's name.
IMPORTANT Each time you build, you should check the timestamp on the VIB and bundle to ensure that they
were updated by your most recent build.
Testing an IO Filter
IO Filter Solutions are meant to be deployed to a cluster, which in turn causes the cluster's vCenter Server to
deploy it to all hosts within the cluster. That said, during early development, for simplicity, you may choose
to only deploy your IO Filter Solution directly to one or more ESXi hosts.
Once you have the IO Filter Solution deployed in your vSphere environment (to a cluster or one or more
hosts), you must attach the filter to one or more VMDKs. Again, for initial simplicity, you may attach it to
only one VMDK, then to multiple VMDKs associated with the same VM, and finally to multiple VMDKs on
multiple VMs.
To prevent damage to the guest OS running in your test VM, while developing your Solution, attach your
filter to and detach it from a non-system disk. VMware recommends just adding an additional VMDK to
your VM and using that with your IO Filter, until you are confident in the correctness of your code. At that
point, a good test of your Solution is to try installing a guest OS in a new VM with your Solution attached to
all of said VM's VMDKs. Try migrating the VM from host to host in the cluster, and taking snapshots,
during the installation. If the installation fails because of an IO Filter issue, you have more work to do.
Once you have your Solution working with multiple VMDKs in multiple VMs, you should also deploy a
sample Solution of a different class than your Solution to ensure that your Solution "plays well with others."
You may even consider adding an alternate test filter which injects various faults into certain operations,
such as preparing for snapshots or migrations, to validate how your Solution responds to the IO Filter
Framework surfacing those faults for your LI.
Prerequisites
To test an IO Filter Solution, you must have successfully built its VIB and bundle files and have a minimal
vSphere cluster into which you deploy them. The VM should be running a guest OS that you can easily
administer, such as adding and removing disks, making filesystems on those disks, etc.
Procedure
1 Set each of the ESXi hosts in your cluster to Community Supported mode so that they can accept VIBs whose
signatures can't be verified.
You can now install the VIBs you build in your environment without having to sign each VIB with your
company's certificates.
2 Deploy your filter on a single ESXi host using the filter's VIB file and test it in a single-host environment.
This simplifies the environment in which your filter runs. You don't have to worry about DRS, vCenter
Server issues, using the MOB or VWC, etc.
3 After thoroughly testing your filter on an ESXi host, deploy it to your cluster's vCenter Server using the
filter's offline bundle file and test it in that environment.
What to do next
Each of the sub-topics that follow provides detailed instructions for deploying your filter onto an individual
host and into a cluster.
However, if you deploy your IO Filter Solution to a cluster, the vCenter Server sends the VIB to each host
and proxies the install request through said hosts' hostd cartel. In this case, hostd does not tell the system to
ignore the VIB certificates, which causes the installation to fail unless you first put the ESXi hosts into
Community Support mode. In this mode, the ESXi Installer (the back-end invoked by all installation
methods) ignores all VIB certificates and just installs the VIB.
DANGER You should only put development hosts into Community Support mode. You should never put
production systems into Community Support mode, for security reasons.
Prerequisites
In order to perform the commands in this topic, you must have configured the host for SSH access and
enabled the ESXi CLI.
Procedure
1 SSH into your ESXi system as root.
2 Run the command: esxcli software acceptance set --level=CommunitySupported
This command should print the following: Host acceptance level changed to 'CommunitySupported'.
3 Verify that the preceding command worked by entering this command: esxcli software acceptance get
n VMX — The VMX sends log messages to the file vmware.log located in the VM's directory. For example, a
VM called foo running on a datastore mounted at /vmfs/volumes/nas1 will have VMX log entries appear
in /vmfs/volumes/nas1/foo/vmware.log.
n vmkfstools — This ESXi CLI command sends log messages to its standard output (stdout). However, it
only makes log messages visible if you set the verbose flag (-v or --verbose) to 5 or higher.
n hostd — This cartel, among many other things, proxies requests from a vCenter Server to perform IO
Filter requests on the host on which hostd runs. For example, when you deploy an IO Filter to a cluster,
and the vCenter Server in turn deploys the VIB to hosts, it uses hostd on each host to perform the VIB
deployment. Hostd sends log messages to /var/log/hostd.log. Check this file for log entries generated
by your LI when you perform operations through the VWC such as attaching and detaching an IO
Filter from a VMDK.
n vpxa — Similar to hostd, vpxa also proxies requests from a vCenter Server to perform IO Filter requests
on the host. Vpxa sends log messages to /var/log/vpxa.log. Logs for some operations initiated by the
VWC, such as disk cloning, go to vpxa.log. If you have a hard time finding log messages, check both
hostd.log and vpxa.log.
The iofilterd cartel also sends log messages generated by a Daemon to /var/log/syslog.log.
The places on the ESXi host to look for log messages related to IO Filtering include:
n /var/log/syslog.log
n /var/log/iofilter-init.log
n /var/log/hostd.log
n /var/log/vpxa.log
n /var/log/esxupdate.log
n /vmfs/volumes/some-datastore/VM-name/vmware.log
n Stdout on the ESXi CLI for commands you run from there that load the LI, for example vmkfstools
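When chasing a problem, it can help to watch several of these host-side files at once. A sketch (run on the ESXi host; the file list is a suggestion, not exhaustive):

```shell
# Follow the host-side logs most relevant to IO Filter work while you
# install VIBs and attach or detach filters.
tail -f /var/log/hostd.log /var/log/vpxa.log /var/log/iofilter-init.log
```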
The places on the vCenter Server to look for log messages related to IO Filtering include:
n /var/log/vmware/eam/eam.log
n /var/log/vmware/vpxd/vpxd.log
n VMIOF_LOG_PANIC
n VMIOF_LOG_ERROR
n VMIOF_LOG_WARNING
n VMIOF_LOG_INFO
n VMIOF_LOG_VERBOSE
n VMIOF_LOG_TRIVIA
By default, the IO Filter Infrastructure only logs messages with a level of VMIOF_LOG_WARNING or higher. To
force the infrastructure to log messages with a lower level, do one of two things. Either:
n Edit the .vmx file for the VM, and add log.logMinLevel=X to the end of the file (where X is the minimum
log level you want the infrastructure to use)
n Edit the file /etc/vmware/config on the ESXi host, adding vmx.log.logMinLevel=X to the end of the file
(where X is the minimum log level you want the infrastructure to use)
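For example, to apply the setting host-wide you might append the line to the host's config file. This is only a sketch; X stands for the minimum log level you choose, as described above, and backing up the file first is a suggested precaution:

```shell
# Back up the host-wide VMX config, then append the minimum-log-level
# setting (development hosts only; X is a placeholder for the level).
cp /etc/vmware/config /etc/vmware/config.bak
echo "vmx.log.logMinLevel=X" >> /etc/vmware/config
```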
VMIOF_VLog() function
The VAIO API also provides VMIOF_VLog() to generate log messages. This function has the following prototype:
n The first input parameter to this function is the level of the message, which is analogous to the severity
parameter to syslog(). The log levels defined for VMIOF_VLog() are:
n VMIOF_LOG_PANIC
n VMIOF_LOG_ERROR
n VMIOF_LOG_WARNING
n VMIOF_LOG_INFO
n VMIOF_LOG_VERBOSE
n VMIOF_LOG_TRIVIA
1 Install the VIB to the ESXi host. You can do this using an ESXi CLI command, the Remote System
Explorer in VMware Workbench, the PowerCLI, or a Perl or Java program using the VIM API.
2 Attach the IO Filter to a VMDK of a test VM. While you can create a VMDK for a VM with any of the
management tools, you must attach the filter to the VMDK with the ESXi CLI.
NOTE If the VMDK does not yet exist, you must power off the VM, create the VMDK and associate it with
the VM, and then power the VM back on.
As you discover and fix flaws or provide enhancements to your filter, you must remove the existing filter
before you can install a new one.
The following sub-topics provide steps for performing each of these actions.
The ESXi CLI command you use to deploy a VIB to an ESXi host requires that you provide a URL to the VIB
file. The acceptable URLs include:
n file://some-pathname — where some-pathname is a file on the host's filesystem. To use this type of URL,
you must scp the VIB to your ESXi host, or access the VIB staged on an NFS datastore.
n http://IP-or-hostname/some-path — where some-path is the path relative to the document root of a web
server running on the specified IP or hostname.
n ftp://IP-or-hostname/some-path — where some-path is the path relative to the root of an FTP server
running on the specified IP or hostname.
To use the latter two URL forms, you must copy the VIB to an appropriate place under the doc root of the
web server or the root of the FTP server.
NOTE To be able to scp files to an ESXi host, you must enable SSH to the host using its Direct Console.
ATTENTION While VMware's Workbench does not currently ship with a standard web server, there is a trick
you can use to create a web server using the build/vib directory of your filter's source directory:
1 Open a shell on your Workbench VM.
2 Cd to the build/vib directory within your filter's source. Remember, this directory only exists after you
build your VIB the first time.
3 Start a web server that listens on port 8000 and serves that directory.
You can now browse to the IP address of your Workbench VM, port 8000, using http, and see your VIB file
there. The URL to your VIB is now http://ip-of-WorkbenchVM/vib-file-name, for example:
http://172.16.1.2:8000/sampfilt-1.0.0-1OEM.600.0.0.2329218.x86_64.vib
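One way to provide such a web server is Python's built-in HTTP server, which is assumed to be available in the Workbench VM (a sketch, not the only option):

```shell
# Serve the current directory (your filter's build/vib) over HTTP on
# port 8000 using Python's built-in web server.
cd build/vib
python3 -m http.server 8000    # on Python 2 systems: python -m SimpleHTTPServer 8000
```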
To test your IO Filter's functionality on a single ESXi host, you must install its VIB to said host. The
following procedure provides one method (of several) to complete this task:
Prerequisites
To install an IO Filter VIB via the ESXi CLI you must have:
n Enabled SSH for your ESXi host from its Direct Console
n Staged the VIB so that it is accessible by the ESXi host using a supported URL
Procedure
1 SSH into your ESXi host.
2 Run the command: esxcli software vib install -v VIB-URL, where VIB-URL is the URL at which you
staged the VIB, in any of the supported URL forms discussed earlier.
This command invokes the ESXi installer, which opens the VIB and deploys its components to the
appropriate places on the host. On success, the command displays output similar to the following:
Installation Result
Message: Operation finished successfully.
Reboot Required: false
VIBs Installed: ZZZ_bootbank_countio_1.0.0-1OEM.600.0.0.2329218
VIBs Removed:
VIBs Skipped:
n The deployment should place an init script for it in the file /etc/init.d/iofilterd-filter_name (for
example /etc/init.d/iofilterd-countio)
n In the ESXi shell, run the command: ps -c | grep iof. You should see one iofilterd cartel
running, with a user-world belonging to that cartel called iofltd-filter_name (e.g. iofltd-countio).
n Check /var/log/iofilter-init.log for messages indicating that your daemon started and did
not terminate due to a watchdog or for some other reason. You should see output similar to
the following in this file:
n The XML file created from parsing the CAPABILITIES part of your filter's .json file is placed
in /usr/lib/vmware/vmiof/disk/filter_name-config.xml.
n The XML file created from parsing the NETWORK part of your filter's .json file is placed
in /etc/vmware/firewall/vmiof-disk-filter_name.xml.
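You can also confirm that the VIB is registered in the host's VIB database. The install and check below are a sketch; the URL and filter name are illustrative:

```shell
# Install the VIB from a staged URL, then confirm it appears in the
# VIB database (run as root on the ESXi host).
esxcli software vib install -v http://172.16.1.2:8000/sampfilt-1.0.0-1OEM.600.0.0.2329218.x86_64.vib
esxcli software vib list | grep sampfilt
```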
This topic builds on the content presented in “Deploying an IO Filter VIB to an ESXi Instance Using the ESXi
CLI,” on page 53. It has, essentially, the same prerequisites and purpose. However, this topic discusses how
to complete the same task using the Remote System Explorer component of VMware Workbench instead of
the ESXi CLI.
One difference is that you do not need to stage the VIB to deploy it with Workbench. That is handled under
the covers for you by Workbench. The other difference is that you need to have configured your ESXi host in
the Remote System Explorer before you can follow this procedure.
NOTE The Remote System Explorer (RSE) is an open source plugin to Eclipse, a key component of VMware
Workbench. VMware expanded RSE's functionality to understand ESXi type hosts and perform certain
operations with said hosts, including installing and removing VIBs. The VMware Fundamentals for Developers
course discusses how to create RSE connections to ESXi hosts.
Once you've successfully built your filter's VIB, to deploy it with Workbench, follow these steps:
Procedure
1 If you have not already done so, start VMware Workbench.
2 Open the Remote System Explorer perspective. One way to do this is to select Window > Open
Perspective > Remote System Explorer.
In one of its panes, Workbench displays the RSE perspective similar to the following:
Figure 3‑12. Workbench: Remote System perspective, with ESXi hosts added
3 Right-click on the ESXi host to which you would like to deploy the VIB and then select VMware >
Install Package...
Notice that the wizard includes a list of all of the ESXi systems that exist in the RSE, with the one you
right-clicked on already checked. To install the VIB on multiple ESXi hosts at the same time, check
additional hosts.
4 Either enter the full pathname to the VIB file, or click Browse... to use a modified File Chooser to
navigate to the VIB file.
The following figure shows the use of File Chooser, navigated to the build/vib folder for the countIO
sample:
NOTE You can create shortcuts to your build/vib directory by: Navigating to the build directory;
Highlighting the vib directory; Clicking + Add. This figure shows that shortcuts were added for several
vib directories. In the future, just select the vib directory shortcut instead of navigating through the
entire file tree.
The Install Wizard now has the name of the VIB file to deploy.
5 Click Next.
Workbench displays the next dialog in the Install Wizard, similar to the following:
6 Click Install.
The Install Wizard attempts to deploy the VIB to the selected hosts. As it does this, it displays a
progress dialog similar to the following:
When the operation completes, the Install Wizard displays the results in a dialog similar to the
following:
NOTE The text displayed in this dialog is the output from the ESXi Installer as discussed in “Deploying
an IO Filter VIB to an ESXi Instance Using the ESXi CLI,” on page 53
8 If the install failed, fix the problem, then use the < Back button to return to the beginning and try
again.
10 Click Finish.
After you deploy your IO Filter for the first time on a host, you must attach said filter to a VMDK in order to
test it. You need only do this once, even if you later replace the filter on the host (for example, after fixing a
flaw or adding functionality) or change the filter's version, because you attach filters to a VMDK using the
name of the filter, not the name of its VIB file.
Attaching the IO Filter to a VMDK causes the IO Filter Framework to invoke the diskAttach() callback of
said filter, passing any filter properties that you may set when you perform this procedure.
This procedure uses the vmkfstools command, whose syntax has been modified to support IO Filter
operations. The relevant syntax is:
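A sketch of the command shape, inferred from the placeholder descriptions that follow (the exact flag ordering is an assumption):

```shell
# Attach an IO Filter, with optional properties, to a VMDK.
# Angle-bracketed items are the placeholders explained below.
vmkfstools -v <#> --iofilters "<filter_name>[:<property>=<value>...]" <vmdkfile>
```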
Where:
n # — is the level of debugging verbosity. During development, consider setting this number to 5 or
higher to see the messages generated by calls to VMIOF_Log().
n filter_name — is the name of the IO Filter as specified in the name field of the Identification
dictionary in its SCONS file. Do not set this to the VIB's file name.
n property — is the name of one of the properties exposed by the IO Filter as listed in its .json file
n vmdkfile — is the name of the VMDK's meta-data file, not its flat file or snapshot files
NOTE To specify values for multiple properties, separate each property name/value pair with a colon.
Further, all values are passed as ASCII strings. It is up to the IO Filter LI to convert ASCII strings to binary
numbers if needed.
Prerequisites
In order to attach an IO Filter to a VMDK via the ESXi Shell using the procedure described in this topic, you
must have already:
n Deployed the IO Filter VIB to the ESXi host running the VM to which the VMDK belongs
NOTE While it is technically possible to both create a VMDK and attach a filter to it with a single command,
attaching a filter to an existing VMDK provides behavior that more closely approximates the behavior of
attaching a filter to a VMDK via the VWC.
The VMware Fundamentals for Developers course discusses how to create VMDKs (vDisks) for a VM.
Procedure
1 Run the vmkfstools command providing the --iofilters option with the name of your IO Filter and
any properties you want to set and also providing the name of the VMDK to which you wish to add the
IO Filter.
2 Run the vmkfstools command providing the --iofilterslist option and the name of the VMDK file
specified in the preceding step.
The command displays the list of IO Filters attached to the VMDK file.
countio
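The two steps above might look like the following on a host; the VMDK path and property name are hypothetical examples:

```shell
# Step 1: attach the countio filter, setting one property.
vmkfstools -v 5 --iofilters "countio:Acceleration=5" /vmfs/volumes/nas1/foo/foo.vmdk
# Step 2: list the filters now attached to the VMDK.
vmkfstools --iofilterslist /vmfs/volumes/nas1/foo/foo.vmdk
```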
This topic builds on the information presented in the topic “Attaching an IO Filter to a VMDK, Using the
ESXi Shell,” on page 60. It has analogous context and prerequisites. It has one additional prerequisite: that
you have already attached the IO Filter that you wish to detach.
Detaching an IO Filter from a VMDK causes the IO Filter Framework to invoke the diskDetach() callback of
said filter.
To detach a filter from a VMDK using the ESXi CLI, perform the following steps as root:
Procedure
u Run the vmkfstools command as you did to attach the IO Filter, except: A) use "" (empty quotes) for the
filter_name; B) Do not set any property values.
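Concretely, the detach invocation might look like this (the VMDK path is illustrative):

```shell
# Detach by passing an empty filter name and no properties.
vmkfstools -v 5 --iofilters "" /vmfs/volumes/nas1/foo/foo.vmdk
```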
Removing an IO Filter VIB from an ESXi Instance using the ESXi CLI
Every time you make changes to your IO Filter code and successfully build it, you need to deploy the new
VIB to test it. Before you can deploy the new VIB, you must remove the previously installed VIB. This topic
provides one procedure (of many) for completing the task of removing the VIB, using the ESXi CLI.
To remove an IO Filter VIB, the system must be in system maintenance mode. In maintenance mode, ESXi
hosts do not run VMs. Thus, to enter maintenance mode, all VMs running on the host must be shut down
or suspended. Certain methods you use to put the system in maintenance mode migrate the VMs to another
host in the cluster for you. Other methods require you to suspend or power off the VMs manually. This
topic assumes that you have powered off your test VM or migrated it to another host so that you can put
this host into maintenance mode.
To remove an IO Filter VIB from an ESXi host using the ESXi CLI, follow this procedure, running all
commands as root:
Prerequisites
To remove an IO Filter VIB via the ESXi CLI you must have:
n Enabled SSH for your ESXi host from its Direct Console
n Deployed the VIB to the ESXi host (using any of the available methods)
n Placed the ESXi host in system maintenance mode, migrating all running VMs to another host in the
DRS cluster or powering them off on this host
Procedure
1 Run the command: esxcli system maintenanceMode set -e true
The system tries to enter maintenance mode. If any VMs are running that it cannot stop, it fails to enter
maintenance mode and displays the following error message:
The system does not generate any output if it successfully enters maintenance mode. You can always
obtain the current mode using the following command: esxcli system maintenanceMode get.
2 Run the command: esxcli software vib remove -n filter_name (for example: esxcli software vib
remove -n countio)
The ESXi Installer stops your IO Filter's daemon, if it has one. Then, it removes the items from the VIB
deployed to the filesystem (for example /usr/lib/vmware/plugin/libvmiof-disk-filter_name.so).
Finally, it removes the VIB from the VIB database.
The ESXi Installer displays the results of the command, similar to the following:
Removal Result
Message: Operation finished successfully.
Reboot Required: false
VIBs Installed:
VIBs Removed: ZZZ_bootbank_countio_1.0.0-1OEM.600.0.0.2329218
VIBs Skipped:
3 Take the system out of maintenance mode by running the command: esxcli system maintenanceMode
set -e false
The system exits system maintenance mode, starting any VMs configured to auto-start. All other VMs
on the host remain paused or powered off.
Your IO Filter VIB is no longer on the ESXi host. You are now able to install a new IO Filter VIB.
This topic builds on the content presented in “Removing an IO Filter VIB from an ESXi Instance using the
ESXi CLI,” on page 62. It has, essentially, the same prerequisites and purpose. However, this topic discusses
how to complete the same task using the Remote System Explorer component of VMware Workbench
instead of the ESXi CLI.
One difference is that you need to have configured your ESXi host in the Remote System Explorer (RSE)
before you can follow this procedure.
The workflow for removing a VIB via RSE in Workbench is similar to the workflow for deploying a VIB via
RSE in Workbench as described in “Deploying an IO Filter VIB to an ESXi Instance Using Workbench,” on
page 55. The key differences are:
n The procedure uses the VMware Uninstall Wizard instead of Install Wizard
To remove an IO Filter VIB from an ESXi host using RSE in Workbench, follow these steps from within the
Workbench GUI:
Procedure
1 If you have not already done so, start VMware Workbench.
2 Open the Remote System Explorer perspective. One way to do this is to select Window > Open
Perspective > Remote System Explorer.
In one of its panes, Workbench displays the RSE perspective similar to the following:
Figure 3‑20. Workbench: Remote System perspective, with ESXi hosts added
3 Right-click on the ESXi host from which you would like to remove the VIB and then select VMware >
Package Manager
4 Scroll through the list of installed VIBs and select the VIB for your IO Filter Solution.
When you select your VIB, the Package Manager enables the Remove button.
5 Click Remove.
The Package Manager launches the Uninstall Wizard, which displays its first dialog similar to the
following:
6 If the host is not already in maintenance mode, check Maintenance Mode.
Checking Maintenance Mode causes the Uninstall Wizard to (attempt to) place the host in maintenance
mode before it tries to uninstall the VIB, and to take it back out again after the operation. Again, you
can only remove IO Filter VIBs when a host is in maintenance mode.
7 Click Uninstall.
The Uninstall Wizard starts the uninstall process, displaying a dialog similar to the following as it
progresses:
After the operation completes, the Uninstall Wizard displays another dialog with the results, similar to
the following:
NOTE The text displayed in this dialog is the output from the ESXi Installer as discussed in “Removing
an IO Filter VIB from an ESXi Instance using the ESXi CLI,” on page 62.
8 If the uninstall failed, fix the problem, then use the < Back button to return to the beginning and try
again.
9 Click Finish.
Workbench closes the Uninstall Wizard dialog, leaving the Package Manager dialog.
10 Click Close.
Your IO Filter VIB is no longer on the ESXi host. You are now able to install a new IO Filter VIB.
n Deploy / upgrade / remove IO Filters to a cluster managed by a vCenter Server using the VIM API (and
a supported SDK for that API)
n Create a SPBM policy that uses the IO Filter with its properties set to certain values. If they want the
filter to use different property values for different VMDKs, they must create separate policies for each
set of values.
n Apply the desired SPBM policy to various VMDKs of VMs. Applying the policy to a VMDK causes the
IO Filter Framework to attach the IO Filter in the SPBM policy to the VMDK. Removing the policy
causes the IO Filter Framework to detach the IO Filter from the VMDK.
Administrators use the VWC to perform all of these actions, except deploy / upgrade / removal of IO filters.
Again, the latter operations currently require invoking functions in the VIM API.
The following sub-topics provide detailed steps for completing each of these tasks.
Once you have built an IO Filter, which also builds its offline bundle file, you must stage the bundle to a web
server. Do this in a manner analogous to the procedure given in “Staging your VIB for Deployment from the
ESX CLI,” on page 53, except:
n You must stage the bundle to a web server (http or https). You cannot stage it to an FTP or other server.
VMware has enhanced the VIM API with several functions for managing IO Filters on a cluster. Common
parameters to these functions include:
n clusterMOID — is the Managed Object ID (MOID) of the cluster to which you wish to deploy the IO
Filter.
n filterID — is a string formed by concatenating these items, each separated by an underscore: the 3-
letter vendor_code in the filter's SCONS file; the string bootbank; the filter's name from its SCONS file;
and the filter's version from its SCONS file, followed by a dash and a string assigned by the build
system, for example "1OEM.600.0.0.2329218" (the numbers after the last decimal are the build number
of the ESXi at which the SDK is targeted). For example, ZZZ_bootbank_countio_1.0.0-1OEM.600.0.0.2329218.
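The filterID can be assembled mechanically from those parts. A small sketch, using the countio example values from the text:

```shell
# Build a filterID from its constituent parts, underscore-separated,
# with the build-system suffix appended to the version after a dash.
vendor_code="ZZZ"
filter_name="countio"
version="1.0.0"
build_suffix="1OEM.600.0.0.2329218"
filterID="${vendor_code}_bootbank_${filter_name}_${version}-${build_suffix}"
echo "$filterID"
# Prints: ZZZ_bootbank_countio_1.0.0-1OEM.600.0.0.2329218
```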
This function creates a new task on the vCenter Server against which you run this function. The task, in
turn, attempts the deployment, which includes deploying the IO Filter's VIB (enclosed in the bundle
file) to each host in the cluster, and making the IO Filter available for SPBM policies.
NOTE The bundleURL can be HTTPS, however, currently vSphere (VC/EAM) doesn't validate the
certificate and has no way for partners to provide credentials for the bundle server.
VMware provides several officially supported SDKs with which you can develop programs using the VIM
API, including SDKs for Perl, Java, and the PowerCLI. It also provides a community-supported SDK for the
VIM API that uses Python (pyvmomi). Currently, you can only access the IO Filter functions of the VIM API
via the officially supported SDKs, and the Managed Object Browser (MOB) of the vCenter Server containing
the cluster to which you wish to deploy your IO Filter Solution.
Since the MOB is language-independent, and accessible via any web browser, this topic demonstrates
deploying the IO Filter Solution via this interface. To deploy your IO Filter Solution to your cluster via the
MOB, follow these steps:
Prerequisites
To deploy an IO Filter Solution to a cluster, you must have:
n Already built the IO Filter, specifically its bundle (located in the file build/bundle/filter_name-
offline-bundle.zip under the Filter's source directory)
n Credentials for an administrator account on the vCenter Server that contains the cluster
n Removed the IO Filter from the cluster, if you had deployed it previously
Procedure
1 Browse to https://IP/mob.
Where IP is the IP address (or host name) of the vCenter Server containing the cluster to which you
wish to deploy your IO Filter Solution.
The MOB requires authentication, so your browser displays a standard browser authentication dialog.
2 In the authentication dialog, enter credentials of a vCenter Server user with administrator privileges.
By default, vCenter Servers install with the user [email protected]. You set the password for
this user during deployment of the vCenter Server.
The browser displays a page with the MOB's content similar to the following:
Figure 3‑27. MOB: Content page, with IOFilterManager outlined for emphasis
The browser displays a page with the IO Filter Manager VIM functions, similar to the following:
Figure 3‑28. MOB: IOFilterManager page, with InstallIoFilter_Task outlined for emphasis
The browser displays a page with fields for each of the parameters for the function, similar to the
following:
6 Enter the URL of the staged IO Filter Bundle file in the text field labeled vibURL (despite its name, this
field expects a bundle URL, not a VIB URL), outlined with a dashed line in the preceding figure for emphasis.
7 Replace the text MOID with the managed object ID of the cluster to which you wish to deploy your IO
Filter Solution.
The following figure shows the form filled out with the URL for the countIO bundle, staged at
https://172.16.193.11 to the cluster whose MOID is domain-c7:
The MOB creates a vCenter Server task that invokes the InstallIoFilter_Task() function, passing the
arguments you placed in this web form. The browser displays the task ID in another form at the bottom
of the web form. The following figure shows an example result page:
The MOB queries the vCenter Server for properties of the task, which the browser displays, similar to
the following:
The MOB queries the vCenter Server for information about the task, which the browser displays,
similar to the following:
This figure shows that the task state is "running" (outlined for emphasis). This means that the vCenter
Server is deploying the IO Filter VIBs in the bundle to the ESXi hosts, creating the appropriate SPBM
entities, etc.
11 Continue to refresh the task info page until the state is either "success" or "error".
On success, you are ready to create SPBM policies, attach the policies to VMDKs, etc.
On error, you need to explore the log files (especially /var/log/hostd.log on each of the ESXi hosts) to
determine why the operation failed.
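A sketch of commands for hunting the failure on each ESXi host; the log paths come from the list earlier in this chapter, and the grep patterns are only suggestions:

```shell
# Scan the host logs for recent IO Filter and VIB-related messages.
grep -i iofilter /var/log/hostd.log | tail -n 20
grep -i error /var/log/esxupdate.log | tail -n 20
```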
What to do next
After successfully deploying an IO Filter Solution to a cluster, you need to create an SPBM policy that uses
the filter.
In a production environment, in order to get an IO Filter attached to a disk, administrators must first create
an SPBM policy that includes a rule for said IO Filter. They do this with the VWC. To accomplish this
task, follow these steps using the VWC:
Prerequisites
You must have deployed an IO Filter's bundle to a cluster before you can create an SPBM policy that uses it.
You must also log into the VWC using the credentials of a user with administrator privileges, such as
[email protected].
Procedure
1 Browse to the Home page, home tab. The browser displays a page similar to the following:
Figure 3‑34. VWC: Home Page, Home Tab, VM Storage Policies icon highlighted
The VWC displays a page with a list of VM Storage Policies (SPBM policies). By default the page looks
like the following:
3 Click the icon to create a new SPBM policy.
The VWC displays the Create New VM Storage Policy dialog similar to the following:
Figure 3‑36. VWC: Create New VM Storage Policy dialog, Name and Description page
4 Enter a name and description for the policy in the fields provided, and then click Next.
Each policy must have a unique name, which should provide a brief but accurate indication of what the
policy is for. Use the Description field to provide detailed information about the policy. For IO Filter
policies, you should describe the details of the policy, at a high level. For example, an administrator
might set the name to "countioAcceleration5" and have a description of "CountIO policy with
Acceleration property set to 5."
Figure 3‑37. VWC: Create New VM Storage Policy dialog, Rule-Sets page
5 Read the information in the Rule-Sets page, and then click Next.
The VWC displays the Common rules page of the dialog as follows:
Figure 3‑38. VWC: Create New VM Storage Policy dialog, Common rules page
You must enable common rules because IO Filters are part of the SPBM common rule set.
The VWC displays the list of IO Filter classes currently installed on the host. Currently, this is either
cache or replication, as specified in the CLASS field of a filter's .json file.
8 Select the class of filter for which you want to make a rule in the policy, using the pull-down.
The VWC expands the Rule-set page with another pull-down for you to select the IO Filter of the
selected type. The following figure shows the results of selecting a replication filter class and the pull-
down for selecting the filter (outlined for emphasis):
Figure 3‑39. VWC: Create New VM Storage Policy dialog, Common rules page, adding a rule
9 Use the [Select Value] pull-down to select the IO Filter to add to the rule.
The VWC displays a list of properties (capabilities) exposed by the IO Filter's .json file, along with their
default values and pull-downs / text fields for modifying their values. The following figure shows the
properties for a variant of sampfilt called MyFilter1:
Figure 3‑40. VWC: Create New VM Storage Policy dialog, Common rules page, adding a rule, setting
property values
10 Set the values of the filter's properties as desired, using the pull-downs and text fields.
NOTE All VMDKs to which you attach this policy will receive the same values for these properties. If
you want different VMDKs to have different values for these properties, you must create separate
policies with the different values.
11 If desired, add additional rules to the policy using the <Add rule> pull down, repeating steps 9 and 10
for the additional rule(s).
NOTE You can currently only add two rules to a common rule policy, because there are only two
classes of filters supported and a given VMDK can only have one filter of each class attached at a given
time.
12 Click Next.
Figure 3‑41. VWC: Create New VM Storage Policy dialog, Rule-Set 1 page
You can use the items on this page to configure datastore rules for your policy. By default, no datastore
rules are present in a vSphere system. Further, they are not needed for testing IO Filters.
14 Click Next.
The VWC checks the cluster's storage against the datastore rule set and displays the Storage
compatibility page of the dialog similar to the following:
Figure 3‑42. VWC: Create New VM Storage Policy dialog, Storage compatibility page
NOTE Since you did not create any datastore rules, this dialog should list all of your datastores. In any
event, there are no actionable items in this dialog.
15 Click Next to see a summary of the policy configured thus far, or Finish to just complete the policy
creation.
If you click Finish, continue to the discussion of Step 17. If you click Next, the VWC displays the Ready
to complete page similar to the following:
Figure 3‑43. VWC: Create New VM Storage Policy dialog, Ready to complete page
16 Review the content in this page. If anything is wrong, use Back to correct it and then navigate back to
this page.
The VWC causes the vCenter Server to create the new policy as configured. On successful completion,
the VWC displays the VM Storage Policies page as shown earlier in this topic, except that the list of
policies now contains the one you just created.
The cluster now has a VM Storage Policy (SPBM policy) which you can apply to VMDKs in the cluster.
What to do next
Apply the policy you created to one or more VMDKs.
NOTE In ESXi 6.0 U1, after upgrading your Filter to a newer version, you need to delete the SPBM policy and
recreate it. This is no longer a requirement in 6.0 U2.
Attaching an IO Filter to, and Detaching it from, a VMDK Using the VWC and SPBM
In production, you attach IO Filters to VMDKs via Storage Policy-Based Management (SPBM) policies. After
creating an SPBM policy that references the IO Filter as a common rule, you then apply that policy to
VMDK(s). This causes the vCenter Server to send requests to the hostd on the ESXi system currently hosting
the VM to attach the filter to said VMDK(s).
You configure (add/remove/change) storage policies for a VMDK using the same VWC pages you use to
configure other aspects of a VMDK. For example:
n For an existing VM, you assign a policy to any of a VM's VMDKs in the Virtual Hardware tab of the
Edit Settings dialog (reached by right-clicking on a VM and then selecting Edit Settings, and then
selecting the Virtual Hardware tab.)
n While creating a VM, you assign a policy to any of the VM's VMDKs in the Customize hardware page
of the New Virtual Machine dialog (reached via Actions > New Virtual Machine > New Virtual
Machine... with a VM or cluster selected in the Object Navigator).
In either case the VWC displays a list of VM hardware similar to the following:
Clicking the arrow associated with a virtual disk causes the VWC to display the attributes of the VMDK,
similar to the following:
The VM storage policy pull-down allows you to select which storage policy, of those defined on the cluster, to
apply to this VMDK. You can apply a different storage policy to each VMDK, as long as the policies do not
conflict.
In summary, follow these steps to add / remove / change the IO Filters attached to a VMDK, using the VWC
and SPBM policies:
Prerequisites
In order to attach an IO Filter to a VMDK using the VWC, you must:
Procedure
1 Define a storage policy that references the desired IO Filter(s).
2 Select a VM that contains the VMDK to which you wish to attach the IO Filter(s) in said storage policy,
or create the VM.
3 In the Virtual Hardware tab of the Edit Settings dialog, or Customize hardware page of the New
Virtual Machine dialog, respectively, display the list of properties for the VMDK to which you wish to
attach the IO Filter(s).
4 Select the SPBM Policy you created in Step 1 using the pull-down list associated with the VM storage
policy label.
This causes the VWC to tell the vCenter Server to direct the hostd running on the ESXi system hosting
the VM to have the IO Filter Framework on said host invoke the diskAttach() callback for the IO
Filter(s) listed in the policy.
Removing an IO Filter Solution from a Cluster Using the Managed Object Browser
(MOB)
This topic builds on the concepts presented in two other topics: “Removing an IO Filter VIB from an ESXi
Instance using the ESXi CLI,” on page 62 and “Deploying an IO Filter Solution to a Cluster using the MOB,”
on page 68. The reason for removing an IO Filter Solution is the same as that for removing an IO Filter VIB
from an ESXi instance. The prerequisites are analogous: you must have installed the bundle, have access
to the MOB with administrator credentials, and know the MOID of the cluster from which you
wish to remove the IO Filter Solution.
However, there are some additional considerations for removing an IO Filter Solution from a cluster:
n You must know the ID (not the name) of the IO Filter you wish to remove. You can retrieve this
information using the QueryIoFilterInfo() VIM API method via the MOB, PowerCLI, or a
supported SDK.
n You cannot remove an IO Filter that is used in a rule in an SPBM policy that is currently applied to a
VMDK. You must edit the policy so that it does not contain the IO Filter in any of its rules, or you must
delete the policy entirely. However, you cannot delete a policy that is currently applied to any VMDKs.
n If you remove an IO Filter that is used in a rule in an SPBM policy that is currently not applied to any
VMDKs, the policy becomes invalid until you either edit the associated rule, or re-deploy the IO Filter
Solution.
n The URL you specified when installing the filter must still be accessible. The vCenter Server does not
store your filter bundle, so it must retrieve the bundle information from that URL before uninstalling.
As a reminder, the MOB is one interface for invoking the VIM API IOFilterManager methods. You use the
UninstallIoFilter_Task() VIM API method to remove an IO Filter Solution from a cluster. To invoke this
method via the MOB, follow these steps:
Procedure
1 Use the MOB to access the IOFilterManager by following Steps 1-4 of the procedure in “Deploying an
IO Filter Solution to a Cluster using the MOB,” on page 68.
2 Click UninstallIoFilter_Task.
3 Enter the ID of the IO Filter in the ID field, and the MOID of the cluster into which the IO Filter has
been deployed in the MOID field, and then click Invoke Method.
The MOB invokes the method on the vCenter Server, which creates a task to remove the filter from each
ESXi host in the cluster, and displays a link to the task in the same web form page.
4 View the task information as discussed in Steps 9-11 of the procedure in “Deploying an IO Filter
Solution to a Cluster using the MOB,” on page 68.
The IO Filter Solution is removed from the cluster, and each ESXi host therein.
Chapter Summary
The topics in this chapter presented details such that you should now be able to:
n List the components of a minimal environment for developing and testing IO Filter Solutions
n List the key folders, files, and tools created by installing the VAIODK, including the sample filters
c A vCenter Server
e VMware Workbench VM
2 Why does the VAIODK tool-chain compile each Library component in both 32 and 64-bit mode?
b So that the LI can be linked with the 32 and 64-bit versions of the Daemon
a make
b scons
c config; make
d jam
4 In normal operations, to which vSphere component do administrators deploy their IO Filter Solutions?
a ESXi instances
b Clusters
c Datacenters
d vCenter Server
n Specify where to create your build folder on your build system, and why
n Create appropriate entries in your Solution's SCONS and JSON files, based on the plans for your
Solution
n Create minimal source files for the library and daemon components of your Solution
n Define the prototype for each of the callbacks in a library and daemon component, and when the IO
Filters Framework invokes each
n “Creating a Skeletal Filter Library Component Source File and Understanding its Entry-points /
Callbacks,” on page 101
n Start with a sample filter that is closest to your product, then morph that code as you need
Regardless of the strategy you choose, at a minimum, you need a filter with a library component and a CIM
provider component. Most IO Filter Solutions also include a Daemon component. The code in all of these
components contains a set of callback functions invoked by the IO Filter and CIM Provider frameworks,
respectively.
This chapter provides the details you need to create each of these components.
n The Class Executable rule (a VMware extension to SCONS) requires that its sources be under the
SCONS root directory.
You must have the VAIODK installed before performing this task. Enter the following commands at the
shell prompt on your development platform:
Procedure
1 cd /opt/vmware/vaiodk-6*/src
2 mkdir myfilter
4 cd myfilter
n To make it easier for people unfamiliar with scons but familiar with make, the samples provided in the
SDK also include a Makefile with rules that invoke scons to build the filter, clean the build
environment, etc.
NOTE As an exercise, use diff to compare the Makefiles in two sample filter's directories.
Once you create your filter's source directory (as discussed in “Creating a Build Folder,” on page 88), create
a Makefile in that directory by copying the generic Makefile from any of the sample filter directories into
your filter's source directory.
NOTE Remember, the source directories for all sample filters are under
/opt/vmware/vaiodk-*/src/partners/samples/iofilter/*. For example, the
directory /opt/vmware/vaiodk-6.0.0-2198567/src/partners/samples/iofilter/countIO contains the
countIO sample filter.
After creating a Makefile, you need to create / edit a SCONS file as discussed in detail in section “Creating
and Populating a Correct Scons File,” on page 89.
mkdir -p /opt/vmware/cimpdk-6.0.0-2799832/oss/sfcb/src
mkdir: cannot create directory `/opt/vmware/cimpdk-6.0.0-2799832/oss/sfcb/src': Permission denied
make: *** [generic-prep] Error 1
Enter the following commands at the shell prompt on your development platform:
Procedure
1 cd /opt/vmware/cimpdk*/oss
3 cd /opt/vmware/cimpdk*/oss
The SCONS file has precise syntax requirements. Before getting to IO Filter-specific rules, some general
syntax notes are:
n SCONS is built on top of Python, so all of its syntax rules apply, including: Case sensitivity; comments
begin with a pound sign (#); You can use either single or double quotes for strings, as long as you begin
and end the string with the same quote; The escape character is back-slash (\), etc.
n Some of the SCONS rules required to build IO Filter solutions use lists. Python's syntax has lists
enclosed in square brackets ([]), with items in the list separated by commas.
n SCONS files for VAIO require you to define several dictionaries (a Python data structure) with specific
names, with specific keys within those dictionaries set to values you provide. While it is a gross
injustice to equate Python dictionaries with C structures, the way the VAIO SCONS files use
dictionaries is analogous to defining a C structure with members initialized to values you provide.
While there are several methods for creating dictionaries in Python, the syntax used in VAIO SCONS
files is:
somename = {
'key1' : 'yourvalue1',
'key2' : 'yourvalue2',
}
This snippet defines a dictionary called somename that contains two keys: key1 and key2, with values
yourvalue1 and yourvalue2 respectively.
VAIO has some specific content requirements for SCONS files to build an IO Filter solution. This topic
discusses the dictionaries you must define and methods you must invoke to successfully define a SCONS
file for building an IO Filter solution.
To understand these requirements, consider the following SCONS file from the countIO sample solution,
with a following discussion of the content:
92 'identification' : countIOIdentification,
93 'cim location' : 'cim/countIO',
94 'shared include' : [ 'include/common',
95 ],
96 }
97 countIOProvider = defineCimProvider(countIOProviderDef)
98
99 #
100 # Build the Filter's config file
101 #
102 countIOConfig = defineVMIOFconfig(countIOVmiofDef, cim=countIOProviderDef)
103
104 #
105 # VIB build definitions for the filter package
106 #
107 countIOVibDef = {
108 'identification' : countIOIdentification,
109 'payload' : [ countIOSo,
110 countIOdaemonSo,
111 countIOConfig,
112 countIOProvider
113 ],
114 'vib properties' : {
115 'provides' : [],
116 'depends' : [],
117 'conflicts' : [],
118 'replaces' : [],
119 'acceptance-level' : 'community',
120 }
121 }
122 countIOVib = defineVmiofVib(countIOVibDef)
123
124 #
125 # Offline Bundle definition for the filter package
126 #
127 countIOBulletinDef = {
128 "identification" : countIOIdentification,
129 "vibs" : [ countIOVib,
130 ],
131
132 "bulletin" : {
133 # These elements show the default values for the corresponding items
134 # in bulletin.xml file. Uncomment a line if you need to use a
135 # different value.
136 #'severity' : 'general',
137 #'category' : 'Enhancement',
138 #'releaseType' : 'extension',
139 #'urgency' : 'Important',
140
141 'kbUrl' : 'http://kb.vmware.com/kb/example.html',
142
143 # 1. At least one target platform needs to be specified with
144 # 'productLineID'
145 # 2. The product version number may be specified explicitly, like 7.8.9,
146 # or, when it's None or skipped, be a default one for the devkit
The first 28 lines just contain copyright and comments. The remaining contents are divided into sections as
described in the following sub-topics.
You use this dictionary to define other dictionaries in the SCONS file, as well as with functions you invoke
in the SCONS file. The keys you must provide in this definition, as seen in lines 32-43 of the countIO.sc file,
are:
n name — Set this to the name of the filter. In the example, the name is countio. The name is exposed in
UIs as well as used in the names of the VIB and bundle files. Because it is used in filenames, it cannot
contain spaces or other punctuation. You are required to name your filter using the format
VendorCodePRDName where:
For example, if ExampleCo is assigned the VendorCode exc, and calls their filter product cacheme, the
name should be exccacheme. Only lowercase characters can be used.
n binary compat — While you can set this to either yes or no, you must set it to yes for asynchronous
releases and certification
n summary — A longer name of the filter. Unlike the name key, this key can contain spaces, punctuation,
etc. However, it should be brief.
n version — Version number of the filter. This must be strictly numbers and dots. The ESXi installer uses
this number to differentiate revisions of the filter. If you deploy version X to a host, then re-deploy
version X to the same host, the installer believes they are the same filter. To upgrade, the new filter
must have a different version number.
n license — Using one of the following macros, specify the type of license you use to release your IO Filter
solution. The available license macros, and their meaning, are:
n VMK_MODULE_LICENSE_VMWARE — The same license agreement as VMware uses to release its products
Alternatively, you can set this to something you define yourself, for example,
"MySpecialLicenseAgreement". In this case, the build system considers this a "third party license"
agreement which is unknown to VMware.
n vendor_code — You must obtain a vendor code from your IO Filter program manager, and set this key to
that value
n vendor_email — Set this to the email address you want customers to use when contacting you about
issues with this filter
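Putting these keys together, an identification dictionary might look like the following sketch. All values here are hypothetical (the fictional ExampleCo filter from above), and the key spellings should be verified against lines 32-43 of the countIO.sc sample:

```python
import re

# Hypothetical identification dictionary for the fictional "exccacheme" filter.
# Key names follow the discussion above; check them against countIO.sc.
excIdentification = {
    'name'          : 'exccacheme',        # VendorCode + product name, lowercase
    'binary compat' : 'yes',               # required for async releases/certification
    'summary'       : 'ExampleCo CacheMe IO Filter',
    'version'       : '1.0.0',             # strictly numbers and dots
    'license'       : 'MySpecialLicenseAgreement',  # or a VMK_MODULE_LICENSE_* macro
    'vendor_code'   : 'exc',               # assigned by your IO Filter program manager
    'vendor_email'  : 'support@example.com',
}

# The naming and version rules stated above, expressed as simple checks
# (our own illustration, not part of the VAIODK):
assert re.fullmatch(r'[a-z]+', excIdentification['name'])
assert re.fullmatch(r'[0-9]+(\.[0-9]+)*', excIdentification['version'])
```

Remember that redeploying the same version number is treated as redeploying the same filter, so an upgrade must carry a changed version string.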
As with the Identification dictionary property, you use this dictionary in a function you invoke within the
SCONS file. The keys you must provide in this definition, as seen in lines 48-53 of the countIO.sc file, are:
n identification — Set this to the name of the identification dictionary you have previously defined
n VMIOFTYPE type — The type of IO filter implemented in this solution. Currently, you can only set this
value to disk.
n VMIOF version — The version of the IO Filtering framework to use with this IO Filter
WHAT'S NEW Currently this tag is not used. If you build with the 6.0 U1 VAIODK, the VIB/bundle will
depend on VMIOF API version 1.0; if you build with the 6.0 U2 VAIODK, the VIB/bundle will depend on
VMIOF API version 1.1.
n capabilities — The path, relative to the filter's directory, of a .json file that contains settings required for
an IO Filter solution as described in “Creating and Populating a Correct .json File,” on page 98.
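Assembled from the keys above, the filter-properties dictionary for the same hypothetical filter might be sketched as follows. The key spellings here are our reading of the text; compare them with lines 48-53 of countIO.sc before relying on them:

```python
# Stand-in for the identification dictionary defined earlier in the .sc file.
excIdentification = {'name': 'exccacheme'}

# Hypothetical filter-properties dictionary (cf. countIOVmiofDef in the sample).
excVmiofDef = {
    'identification' : excIdentification,
    'type'           : 'disk',   # currently the only supported IO filter type
    # A VMIOF version key would name the framework version; per the note
    # above, the devkit currently picks the VMIOF API version itself.
    'capabilities'   : 'exccacheme_config.json',  # path relative to the filter dir
}
```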
As with other dictionaries, you use this dictionary's definition with a function you invoke within the SCONS
file. The keys you must provide in this definition, as seen in lines 58-66 of the countIO.sc file, are:
n identification — Set this to the name of the identification dictionary you have previously defined
n source files — This must be a list of files whose sources make up your filter's LI. In the example, the LI
only has one source file, countIO.c.
n cc defs / cc flags / cc warnings — These are optional flags that can be passed to the compiler. cc defs are
flags that are passed to the C preprocessor. Since the compiler invokes the preprocessor, they are also
included on the C compiler command line; by defining them separately, it is possible to use the
preprocessor independently of the compiler. cc flags are flags that are given only to the C compiler.
cc warnings are flags passed to the C compiler to enable or suppress different categories of warnings;
warnings are diagnostic messages that report constructions that are not inherently erroneous but that
are risky or suggest that there may have been an error.
n extra objects — You can pass pre-compiled, partner-provided opaque binary ELF files into the user-world
build information of your IO filter.
NOTE You must provide both 32-bit and 64-bit versions of these files, as they could get loaded into
either 32-bit or 64-bit cartels such as vmx and vmkfstools, as the case may be.
n The IO Filter properties definition previously defined in Filter Properties dictionary item
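The library-instance keys above can be collected into a dictionary like the following sketch. The defineVMIOFso() name comes from the payload discussion later in this topic; everything else is illustrative:

```python
# Stand-in for the identification dictionary defined earlier in the .sc file.
excIdentification = {'name': 'exccacheme'}

# Hypothetical library-instance (LI) dictionary; compare with lines 58-66 of
# countIO.sc. 'identification' and 'source files' are the essential keys;
# the cc* keys and 'extra objects' are optional.
excSoDef = {
    'identification' : excIdentification,
    'source files'   : ['exccacheme.c'],
    'cc defs'        : ['EXC_DEBUG=0'],   # handed to the C preprocessor
    'cc flags'       : ['-O2'],           # handed only to the C compiler
    'extra objects'  : [],                # pre-built ELF files (32- and 64-bit)
}

# In the real .sc file you would then invoke the VAIODK-provided function:
# excSo = defineVMIOFso(excSoDef)
```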
n identification—Set this to the name of the identification dictionary you have previously defined
n source files — This must be a list of files whose sources make up your IO Filter's daemon code. In the
example, the daemon only has one source file, countIODaemon.c.
n cc defs / cc flags / cc warnings — These are optional flags that can be passed to the compiler. cc defs are
flags that are passed to the C preprocessor. Since the compiler invokes the preprocessor, they are also
included on the C compiler command line; by defining them separately, it is possible to use the
preprocessor independently of the compiler. cc flags are flags that are given only to the C compiler.
cc warnings are flags passed to the C compiler to enable or suppress different categories of warnings;
warnings are diagnostic messages that report constructions that are not inherently erroneous but that
are risky or suggest that there may have been an error.
n extra objects — You can pass pre-compiled, partner-provided opaque binary ELF files into the user-world
build information of your IO filter.
NOTE You only need to provide a 64-bit version of this binary ELF file, since it gets loaded into the
64-bit iofilterd cartel.
n The IO Filter's Daemon plug-in dictionary item previously defined in this topic
n The IO Filter properties definition previously defined in Filter Properties dictionary item in this topic
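The daemon component is declared the same way; a brief sketch (names are illustrative, and defineVMIOFDaemonSo() is taken from the payload discussion later in this topic):

```python
# Stand-in for the identification dictionary defined earlier in the .sc file.
excIdentification = {'name': 'exccacheme'}

# Hypothetical daemon dictionary. Note that any 'extra objects' listed here
# need only 64-bit builds, since the daemon runs in the 64-bit iofilterd cartel.
excDaemonSoDef = {
    'identification' : excIdentification,
    'source files'   : ['exccachemeDaemon.c'],
}

# In the real .sc file: excDaemonSo = defineVMIOFDaemonSo(excDaemonSoDef)
```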
n identification— Set this to the name of the identification dictionary previously defined in this topic
n cim location— The location of the directory containing the CIM provider project files. This location is
relative to the directory holding the .sc file.
n shared include— The location of any files necessary to the CIM provider build that are shared with the
IO filter. This location is relative to the IO filter project directory.
n IO filter properties definition as per Filter Properties Dictionary Item previously defined in this topic
n CIM provider definition as per CIM Provider Dictionary Item previously defined in this topic
n identification— Set this to the name of the identification dictionary previously defined in this topic
n payload— The shared objects to be included in the VIB. Refer to sections Invoking the defineVMIOFso
Function, Invoking the defineVMIOFDaemonSo Function, and Invoking the defineVMIOFconfig Function
defined previously in this topic.
n Setting VIB properties — The VIB properties provide the ESXi installer with information about your IO
Filter that is necessary for it to install correctly. You must set these fields to appropriate values as
required for your IO Filter.
n provides— Lists interfaces or virtual packages that this VIB package provides. Each entry has two
members: a required name field and an optional version field.
n depends— This is the list of VIBs on which this VIB depends.
n conflicts— This is the list of VIBs that must not be installed alongside this VIB.
n replaces— A list of VIBs, if any, that this VIB replaces. There is no need to enter a replaces entry for
older versions of the same package name, as these are replaced automatically. Use this only
when packages have been renamed.
n acceptance-level— This represents the VIB acceptance level. Valid values are community, partner,
accepted, certified.
Each IO filter project has standard depends and provides properties. These are automatically added to the
VIB. You can use the VIB section to add vendor-specific depends and provides properties information.
n identification— Set this to the name of the identification dictionary previously defined in this topic
n vibs— List of VIBs to be included in the offline bundle as per Invoking the definePartnerVib Function
topic defined previously.
n Bulletin Properties — The information used to create elements in the bulletin.xml file comprises the
following:
n severity— This identifies the bulletin's severity. Accepted values are: critical, security, general.
n category— This field defines the purpose of the packaged item, for example what kind of issue
the VIB is addressing. Valid values are: Security, BugFix, Enhancement (default), Recall, RecallFix,
Info, Misc.
n releaseType— This field specifies the release type. Valid values are: patch, rollup, update, extension,
notification, upgrade.
n urgency— This field enables you to define the importance of the packaged item. Valid values are:
critical, important (default), moderate, low.
n kbUrl— This is the URL of a knowledge base article or similar online documentation about the
entire issue. This field must contain text, but can indicate that no URL is available.
n platforms— This field denotes the platform for which the IO filter was developed. At least one
target platform needs to be specified.
The following SCONS file snippet illustrates how to create the dictionary item and invoke the function:
#
# Build the test app
#
sampfiltTestUserworldDef = {
    'identification' : sampfiltIdentification,
    'source dirs' : ['tests',
                    ]
}
sampfiltTest = defineVmiofTestApp('test1',
                                  sampfiltTestUserworldDef,
                                  sampfiltVmiofDef,
                                  testDir='ZZZ_corp_tests')
n identification — Set this to the name of the identification dictionary you have previously defined (Not
shown)
n source dirs — A list of directories, relative to the filter's directory, containing the source files for the
IO Filter test app. In the example, the sources are in the tests directory.
n The IO Filter properties definition previously defined in Filter Properties dictionary item (Not shown in
this example)
NOTE You cannot add the test app to the vib or offline bundle. You must manually copy it to the ESXi host
in order to run it there (or have it available via an NFS mount).
The document vSphere APIs for IO Filtering Development Kit (VAIODK) Guide for the Command Line contains
the topic Editing the .json File (<filter_name>_config.json), which describes the contents of this file well.
To understand these requirements, consider the following .json file from the countIO sample solution, with
a following discussion of the content:
{
  "INFO" : [
    " Copyright 2014 VMware, Inc. ",
    " All rights reserved. -- VMware Confidential "
  ],
  "PROPERTIES" : {
    "TYPE" : "disk",
    "CLASS" : "cache",
    "CATALOGS" : {
      "en" : "catalogs/catalog_en.vmsg"
    }
  },
  "CAPABILITIES": {
    "numWorkGroups": {
      "min": 0,
      "max": 8,
      "default": 0
    },
    "encoding": {
      "values": ["2", "4", "6"],
      "default": "2"
    }
  },
  "NETWORK" : {
    "firewall" : {
      "inbound" : [ 32768, [32770, 32774], 32790],
      "outbound" : [ 25, 80, 443, [32770, 32774]]
    }
  },
  "RESOURCES" : {
    "DAEMON MEMORY RESERVATION" : 100
  }
}
n CLASS — It is important to note that you define the class of the IO Filter Solution in the PROPERTIES
section. This is the only place that you define the class. The IO Filter Framework uses this setting to
determine the order in which to invoke this filter's callbacks, relative to other IO Filters attached to the
same VMDK.
n CATALOGS — You must create one entry in this section for each locale for which you define natural
language (localized) versions of your filter's properties' names and descriptions (see next section). You
only have to provide this section if your filter defines CAPABILITIES / properties. For details on
defining locales and catalogs for them, see “Understanding How to Localize your Solution,” on
page 116.
n LIs are given the list of properties and their initial values when an administrator attaches the IO Filter to
a VMDK
n Administrators can change the values via the vSphere Web Client (VWC) or vmkfstools, which trigger
(optional) callbacks for validating and changing the property's value.
n You specify the initial value of each property using the default attribute of each property.
Administrators can override these initial values in the VWC and using vmkfstools, when the
administrator attaches the IO Filter to a VMDK.
n If you specify the min and max attributes, or values attribute of a property, the VWC only allows settings
within bounds or from the list, respectively.
n LIs typically store the values of each property in a sidecar for the VMDK.
IMPORTANT Currently, each IO Filter solution must define at least one capability / property, even if the
solution does not use it.
NOTE In 6.0 U2, support was added to create a capability that accepts any value from the user. This can be
used to provide IPs, network shares, or other arbitrary string-based filter settings. The default value must be
"*" to pass the SPBM compatibility check.
To prevent port-usage collisions, VMware assigns each IO Filter partner up to 8 discrete ports to which their
solutions can bind to listen for incoming connection requests. VMware limits solutions to creating up to 200
outbound connections to off-host destinations.
You must create a firewall entry within the NETWORK entry, with separate lists of the inbound and outbound
ports used by your solution. On installation of your IO Filter solution's VIB on a host, the installation process
automatically opens the listed ports in the firewall in the indicated directions. For example, the example
entry indicates that the solution listens on (binds to) ports 32768, 32770 through 32774, and 32790, and makes
outbound connections on ports 25, 80, 443, and 32770 through 32774.
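Entries in the inbound and outbound lists are either a single port or a [start, end] range; the following sketch expands such a specification into individual ports (illustrative only):

```python
def expand_ports(spec):
    """Expand a firewall port list from the .json NETWORK section, where each
    entry is either a single port or a [start, end] inclusive range."""
    ports = set()
    for entry in spec:
        if isinstance(entry, list):
            start, end = entry
            ports.update(range(start, end + 1))
        else:
            ports.add(entry)
    return sorted(ports)

# The inbound list from the sample .json file:
print(expand_ports([32768, [32770, 32774], 32790]))
# [32768, 32770, 32771, 32772, 32773, 32774, 32790]
```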
NOTE We recommend that the daemon in your IO Filter Solution use SSL to establish secure connections
between its LIs and off-host peer daemons, incorporating additional measures, such as magic numbers, to
authenticate such connections.
If you do not provide this attribute, the build system assumes its value to be zero. Thus, failing to provide it
does not cause a build issue. However, providing a value that is too low does prevent your daemon from
starting at runtime. For example, if you specify 10MB in this value, and your daemon includes 15MB of
static data, the IO Filter Framework will fail to start the daemon during ESXi boot or after VIB installation.
REMEMBER Memory allocated using the VMIOF_Heap* API does not count against this limit.
1 Set "DAEMON MEMORY RESERVATION" to a very large value, e.g. 1GB, then build and install the vib.
5 Run your IO Filter through a worst-case scenario by opening the maximum number of VMDKs you want
to support at the same time.
6 Run memstats again, and rMinPeak in the first row will be the maximum memory needed by your
daemon. Then add some headroom on top of that.
7 Set "DAEMON MEMORY RESERVATION" to the new value you just figured out.
NOTE Shared memory is not counted against DAEMON MEMORY RESERVATION, but the page table
used to map shared memory is counted; this is automatically covered by the steps described
above.
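The sizing procedure above amounts to simple arithmetic: measure the daemon's peak (rMinPeak) under worst-case load, then add headroom. A hedged sketch (the 20% headroom figure is our own illustrative choice, not a VMware recommendation):

```python
import math

def daemon_memory_reservation(r_min_peak_mb, headroom_fraction=0.2):
    """Turn a measured rMinPeak (in MB) into a DAEMON MEMORY RESERVATION
    value by adding headroom and rounding up to a whole number of MB."""
    return math.ceil(r_min_peak_mb * (1 + headroom_fraction))

# e.g. a measured peak of 85 MB with 20% headroom:
print(daemon_memory_reservation(85))  # 102
```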
n #include(s) — The source must #include <vmiof.h> in order to have definitions for the macro you use
to enumerate the mandatory and optional callbacks present in your Library component
n Callbacks — The source must define certain code for all mandatory callbacks. It may also define code
for the optional callbacks allowed by the IO Filter Framework.
n VMIOF_DEFINE_DISK_FILTER — The code must contain one invocation of this macro, which provides the
IO Filter Framework with pointers to the callbacks in the Library (among other things)
You must reference this file in the SCONS file's Library Instance dictionary item, in the source files key.
The sub-sections that follow provide details for creating a skeletal LI source file.
NOTE The sampfilt sample is essentially the skeletal solution discussed in this topic, except that it has
functions for optional as well as mandatory callbacks.
Procedure
1 Create a C source file in the IO Filter's source directory.
b At the bottom of the file, use the VMIOF_DEFINE_DISK_FILTER macro (discussed in the next section) to
define the callbacks you will have in your filter's Library component. Initially consider including
calls to VMIOF_Log() in each of the callbacks to see when the IO Filter Framework invokes each of
these functions.
c In the middle of the file, declare and define the body of each callback you specify in the
VMIOF_DEFINE_DISK_FILTER macro.
This chapter provides one topic for each callback you can define in a LI.
You have a starting point for fleshing a LI with your own code.
An example VMIOF_DEFINE_DISK_FILTER macro invocation for the countIO sample filter, taken from the
countIO.c file, is shown below:
VMIOF_DEFINE_DISK_FILTER(
.diskAttach = &TestDiskAttach,
.diskDetach = &TestDiskDetach,
.diskOpen = &TestDiskOpen,
.diskClose = &TestDiskClose,
.diskPropertiesSet = &TestDiskSetProperties,
.diskPropertiesGet = &TestDiskGetProperties,
.diskRequirements = &TestDiskRequirements,
/*
* The following optional event callback functions could be implemented
* as well but are omitted here for brevity.
* .diskSnapshot = ...,
* .diskCollapse = ...,
* .diskClone = ...,
* .diskVmMigration = ...,
*/
.diskIOStart = &TestDiskStartIO,
.diskIOAbort = &TestDiskAbortIO,
.diskIOsReset = &TestDiskResetIOs,
);
NOTE This macro has pointers to all required callbacks. It mentions, but does not provide pointers to
functions for, some of the optional callbacks. The additional optional callbacks not listed in this code sample
include: diskRelease, diskPropertiesFree, diskPropertiesValid, diskGrow, diskStun, and diskUnstun.
n VMIOF_SIDECAR_LIMIT — The maximum number of sidecar files per filter was reached
n VMIOF_SCSI_RESERVATION_CONFLICT - There was a SCSI Reservation conflict error from the underlying
storage
NOTE This VMIOF_STATUS type is used by both utility functions (discussed in “Overview of VMIOF IO Utility
Functions,” on page 181) implemented in the IO Filters Framework and entry point / callback functions
(discussed in “Understanding and Processing diskIOStart Events,” on page 169 and “Understanding and
Processing Other IO Filter Events,” on page 208) that you must implement.
NOTE The diskAttach() callback function must complete synchronously. Hence, any long-running task
that needs to be done as a result of attaching the filter to the disk must be done outside the context of
this callback.
See “Understanding and Processing DiskAttach Events,” on page 155 for further information, including
example code.
The IO Filter Framework invokes a filter's diskDetach() callback when an administrator removes the IO
Filter from a VMDK. It indicates that the IO Filter is being disassociated from the disk. It is essential
that you release any resources your IO Filter has been holding that are no longer needed, such as
the sidecar associated with the VMDK. You do this by invoking VMIOF_DiskSidecarDelete(). Refer to
section “Using Sidecars Functions in Library Code to Keep Persistent Per-VMDK Meta Data,” on page 137
for details on sidecars.
NOTE The diskDetach() callback function does not need to complete synchronously. The filter must remove
itself from the disk completely, which requires completing any pending work and returning with the
disk data in a consistent state. Depending on the detachFlags member of the VMIOF_DiskDetachInfo
structure, the detach operation could also result in disk deletion. The VMIOF_DiskDetachInfo
structure therefore has progressFunc and completionFunc members to keep the filter framework informed of
progress and subsequent completion of work in the case of asynchronous processing.
For further information on diskDetach(), including a code sample, refer to section “Understanding and
Processing DiskDetach Events,” on page 158.
Refer to section, “Understanding and Processing diskOpen Events,” on page 159 for further information
including a code sample.
The IO Filter Framework invokes the diskClose() callback when an open VMDK is closed. All work groups
must be waited on to ensure that all queued work has been drained, all poll callbacks must be removed, and
any timer operations must be removed before diskClose() returns. Other cleanup operations, such as
removing any connections to a daemon, freeing the sidecar and instance data structure resources of
your IO filter, and finally destroying the heap itself, are also done as part of diskClose().
Refer to “Understanding and Processing DiskClose Events,” on page 165 for further information,
including a code sample.
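The cleanup ordering described above can be sketched as follows. The helper functions are stand-ins for the real VMIOF teardown calls (work-group waits, VMIOF_PollRemove(), timer removal, and heap destruction); the point of the sketch is only the required sequence:

```c
#include <assert.h>
#include <string.h>

/* Records the order of teardown steps so the required sequence is visible. */
static char gOrder[64];
static void Record(const char *step) { strcat(gOrder, step); }

/* Stand-ins for the real VMIOF teardown calls. */
static void DrainWorkGroups(void)     { Record("W"); } /* wait on each work group */
static void RemovePollCallbacks(void) { Record("P"); } /* VMIOF_PollRemove() per callback */
static void RemoveTimers(void)        { Record("T"); } /* remove each timer, if any */
static void FreeInstanceData(void)    { Record("F"); } /* sidecar and instance frees */
static void DestroyHeap(void)         { Record("H"); } /* destroy the heap last */

static void
MyFilterDiskClose(void)
{
    DrainWorkGroups();      /* all queued work must drain first */
    RemovePollCallbacks();
    RemoveTimers();
    FreeInstanceData();     /* free everything allocated from the heap... */
    DestroyHeap();          /* ...before destroying the heap itself */
}
```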
Each snapshot occurs in the following phases: Prepare, Notification, and possibly Failure. A brief
description of these phases is:
n Prepare — As the name suggests, the prepare phase occurs after snapshot method invocation but before
vSphere starts performing the snapshot itself. It allows the filter to perform any preparation work,
such as flushing dirty cache buffers to the disk, before vSphere performs the snapshot. For more
information on this phase, see “Understanding and Processing a Snapshot Event,” on page 193.
n Notification — By the time the IO Filter Framework invokes this callback, almost all of the snapshot
work is done. You can consider the snapshot complete when the framework invokes the next
diskClose() callback after this callback.
NOTE It is possible to receive this callback for a Notification phase without receiving it for a Prepare
phase. For example, when vSphere creates a Linked Clone of a VM, the IO Filter Framework invokes
this callback for both the Prepare and Notification phases for the VMDK(s) of the source VM, but it
invokes this callback with only the Notification phase for each VMDK of the linked clone (target)
VM, doing so as vSphere creates said VMDK(s).
n Failure — Something prevented the snapshot from completing successfully, for example running out of
disk space.
The IO Filter Framework invokes a filter's diskSnapshot() callback function once for each phase of a
snapshot on each disk being snapshotted. The prototype for a diskSnapshot() function is:
n VMIOF_ASYNC — The function cannot complete processing of the notification at this time. The IO Filter
Framework postpones further snapshot processing until the filter calls the function pointed to by info-
>completionFunc (which must match the prototype of VMIOF_DiskOpCompletionFunc), typically in the
context of a work group, timer, or poll callback function. If info->completionFunc is NULL, then you
cannot return VMIOF_ASYNC from the function.
Returning any code other than VMIOF_SUCCESS or VMIOF_ASYNC from the prepare phase causes the IO
Filter Framework to cancel the snapshot process. This causes the IO Filter Framework to post a failure
notification to the other IO Filters.
If the snapshot code contains any long-running work, or processes the request asynchronously, it must
invoke the function pointed to by info->progressFunc at least once every 10 seconds to prevent the IO Filter
Framework from timing out the snapshot operation.
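A skeletal diskSnapshot handler that dispatches on the phase might look like the following sketch. The Phase enum, SnapshotInfo structure, and FlushDirtyBuffers() are simplified stand-ins for illustration, not the real <vmiof.h> definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types; the real definitions come from <vmiof.h>. */
typedef int VMIOF_Status;
#define VMIOF_SUCCESS 0
#define VMIOF_ASYNC   1
typedef enum { PHASE_PREPARE, PHASE_NOTIFICATION, PHASE_FAILURE } Phase;
typedef void (*CompletionFunc)(void *data, VMIOF_Status status);

typedef struct {
    Phase phase;
    CompletionFunc completionFunc;  /* may be NULL */
    void *completionData;
} SnapshotInfo;

/* Hypothetical helper: flush dirty cache buffers before the snapshot. */
static VMIOF_Status FlushDirtyBuffers(void) { return VMIOF_SUCCESS; }

static VMIOF_Status
MyFilterDiskSnapshot(const SnapshotInfo *info)
{
    switch (info->phase) {
    case PHASE_PREPARE:
        /* A long-running flush would be queued to a work group and the
         * function would return VMIOF_ASYNC -- but only if
         * info->completionFunc is non-NULL. Here the flush is quick. */
        return FlushDirtyBuffers();
    case PHASE_NOTIFICATION:
        /* May arrive without a preceding Prepare (e.g. linked clones). */
        return VMIOF_SUCCESS;
    case PHASE_FAILURE:
        /* Roll back any work done in the Prepare phase. */
        return VMIOF_SUCCESS;
    }
    return VMIOF_SUCCESS;
}
```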
n VMIOF_FAILURE — The IO Filter is not willing to allow the collapse to proceed, which results in the link
hierarchy remaining unaffected. Because the framework finishes a diskCollapse() before calling this
routine, the chain of links that has been deleted cannot be recovered, and the parent link will contain
all of the data in a single file.
NOTE This callback is optional. If the Library component does not have it, the IO Filter Framework assumes
it would return VMIOF_SUCCESS.
The first callback is invoked before deleting the blocks of a virtual disk. The filter can deny this
operation by returning an appropriate error code if the LI is not ready for the UNMAP operation. It also gives
the LI a chance to do any internal bookkeeping necessary to accommodate the operation. The UNMAP can come
from utilities such as vmkfstools, or be initiated inside the guest.
VMIOF_Status
(*diskDeleteBlocksPrepare)(VMIOF_DiskHandle *handle, const VMIOF_DiskDeleteBlocksInfo *info);
This callback returns VMIOF_SUCCESS if the operation is allowed to proceed, else it returns an appropriate
error value.
The second callback is called to signal the status of the block deletion to the filter. It is invoked after a set of
virtual disk blocks is deleted, or if the operation fails. This callback allows the LI to complete any
bookkeeping started in the diskDeleteBlocksPrepare callback.
void
(*diskDeleteBlocks)(VMIOF_DiskHandle *handle, const VMIOF_DiskDeleteBlocksInfo *info,
VMIOF_Status status);
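A minimal sketch of the two callbacks, with hypothetical bookkeeping (a counter of UNMAP operations in flight) and stand-in types replacing the real <vmiof.h> definitions, could look like this:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the <vmiof.h> definitions. */
typedef int VMIOF_Status;
#define VMIOF_SUCCESS 0
#define VMIOF_FAILURE (-1)
typedef struct VMIOF_DiskHandle VMIOF_DiskHandle;
typedef struct VMIOF_DiskDeleteBlocksInfo VMIOF_DiskDeleteBlocksInfo;

static int gUnmapsInFlight = 0;  /* example bookkeeping, hypothetical */
static int gReady = 1;           /* whether the LI can accept an UNMAP now */

static VMIOF_Status
MyDiskDeleteBlocksPrepare(VMIOF_DiskHandle *h, const VMIOF_DiskDeleteBlocksInfo *info)
{
    (void)h; (void)info;
    if (!gReady) {
        return VMIOF_FAILURE;   /* deny: LI is not ready for the UNMAP */
    }
    gUnmapsInFlight++;          /* bookkeeping started here... */
    return VMIOF_SUCCESS;
}

static void
MyDiskDeleteBlocks(VMIOF_DiskHandle *h, const VMIOF_DiskDeleteBlocksInfo *info,
                   VMIOF_Status status)
{
    (void)h; (void)info; (void)status;  /* status: success or failure */
    gUnmapsInFlight--;          /* ...and completed here, even on failure */
}
```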
To understand how to calculate the memory requirements, see “Understanding and Using a
diskRequirements Callback,” on page 153.
n Any other value indicates the filter does not allow the clone to proceed.
NOTE This callback is optional. If the Library component does not have it, the IO Filter Framework assumes
it would return VMIOF_SUCCESS.
n VMIOF_ASYNC — Processing of the migration event continues asynchronously. The framework
postpones the migration until the filter calls completionFunc.
n Any other value is taken as a failure causing the IO Filter Framework to abort the migration.
NOTE This callback function is not allowed to block, nor can it perform any long-running activity.
n When the guest VM to which the SCSI device has been added boots up, it tries to read the first few
sectors of the device; it does the same when the device is mounted.
n Mounting a device requires that a filesystem be created on the disk, which in turn requires that the first
few sectors of the device be written.
n Regular read() / write() operations on a disk or a file thereon. Invocations of these functions may be
coalesced by the guest OS.
Return Value : Only two return values are permitted for this callback - VMIOF_SUCCESS and VMIOF_ASYNC. This
callback function is not allowed to block.
NOTE The Framework may invoke multiple diskIOStart callbacks for the same diskHandle at the same
time.
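As a sketch, a diskIOStart callback that merely counts IOs might look like the following. The types are stand-ins for the <vmiof.h> definitions, and returning VMIOF_SUCCESS is assumed here to let the IO proceed down the filter chain; because multiple diskIOStart callbacks can run concurrently for the same handle, real code must protect shared state:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the <vmiof.h> definitions. */
typedef int VMIOF_Status;
#define VMIOF_SUCCESS 0
typedef struct VMIOF_DiskHandle VMIOF_DiskHandle;
typedef struct VMIOF_DiskIO VMIOF_DiskIO;

/* Concurrent invocations are possible for the same diskHandle, so a real
 * filter must protect this counter with a mutex or an atomic operation
 * (LI callbacks must not wait indefinitely, but mutexes are allowed). */
static long gIOCount = 0;

static VMIOF_Status
MyFilterDiskIOStart(VMIOF_DiskHandle *handle, VMIOF_DiskIO *io)
{
    (void)handle; (void)io;
    gIOCount++;             /* unsynchronized only for this sketch */
    return VMIOF_SUCCESS;   /* assumed: the IO proceeds; must not block */
}
```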
Prototype :
Return Value : Only the following two return values are permitted for this callback –
Prototype :
In general, the IO Filter framework invokes the diskUnstun() callback to undo a corresponding diskStun()
operation for a given VMDK.
NOTE It is possible for the IO Filter framework to invoke diskUnstun() without a preceding diskStun(). In
that case, this callback function should return without doing anything.
This is an optional callback function. If you do not provide it, the framework assumes a return value of
VMIOF_SUCCESS.
Refer to “Understanding and Processing diskStun and diskUnstun Events,” on page 208 for further
information.
NOTE The framework invokes diskRelease without first calling the LI's diskOpen() callback.
In response to this callback, the LI must get whichever cartel has opened the VMDK to close it. For an
explanation of how it does that, see “Understanding and Processing a DiskRelease Event,” on page 228. The
LI reflects its success in getting the other cartel to close the VMDK in its return value, as follows:
n VMIOF_SUCCESS – The other cartel has closed the VMDK. On receiving this return value, the kernel tries
to open the VMDK again. If the retry fails, the operation requesting the open (such as starting the VM)
fails.
n VMIOF_FAILURE – Either the other cartel could not be determined (meaning this disk has not been
opened by a component belonging to this filter), or it refused to close the VMDK. The result is
analogous to the failed retry described in the preceding bullet.
The parameter to this callback is VMIOF_DiskHandle *handle, an opaque handle to the virtual disk to be
released. You use this parameter in the code that implements this callback, for example to open certain
sidecar files associated with this VMDK. For details on implementing this callback, see “Understanding and
Processing a DiskRelease Event,” on page 228.
NOTE This callback is optional. If the Library component does not have it, the IO Filter Framework assumes
that it would return VMIOF_FAILURE.
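A minimal diskRelease sketch under these rules follows, with AskOwnerToClose() as a hypothetical helper that locates the cartel known to have the VMDK open and asks it to close; the types are stand-ins for the <vmiof.h> definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the <vmiof.h> definitions. */
typedef int VMIOF_Status;
#define VMIOF_SUCCESS 0
#define VMIOF_FAILURE (-1)
typedef struct VMIOF_DiskHandle VMIOF_DiskHandle;

/* Hypothetical helper: find the cartel this filter knows has the VMDK
 * open and ask it to close. Returns nonzero on success. */
static int AskOwnerToClose(VMIOF_DiskHandle *handle) { (void)handle; return 1; }

static VMIOF_Status
MyFilterDiskRelease(VMIOF_DiskHandle *handle)
{
    if (!AskOwnerToClose(handle)) {
        /* Owner unknown, or it refused to close: the kernel's retry
         * of the open will fail. */
        return VMIOF_FAILURE;
    }
    return VMIOF_SUCCESS;  /* the kernel now retries the open */
}
```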
Return VMIOF_SUCCESS to allow the grow to proceed, or any other value to abort the grow operation.
The diskExtentGetPre() callback is invoked before a disk extent get operation is performed on the virtual
disk. The filter may modify the start offset if needed. The function prototype for diskExtentGetPre() is:
The diskExtentGetPost() callback of a filter is invoked after the disk extent get operation is completed
on the virtual disk. The filter may modify the start offset, and the extent's offset and length, if needed. The
function prototype for diskExtentGetPost() is:
VMIOF_Status
(*diskPropertiesValid)(VMIOF_DiskHandle *handle, const VMIOF_DiskFilterProperty *const
*properties);
VMIOF_Status
(*diskPropertiesSet)(VMIOF_DiskHandle *handle,
const VMIOF_DiskFilterProperty *const *properties);
Prototype :
Prototype :
void
(*diskPropertiesFree)(VMIOF_DiskHandle *handle, VMIOF_DiskFilterProperty **properties);
When it comes to filter library callbacks, there are three things to consider. First, whether the callback is
required or optional. Some of the callbacks are required, meaning the filter library must implement them,
like diskAttach, while some are optional, like diskSnapshot.
Second, whether the callback must complete synchronously or may complete asynchronously.
Third, whether the callback can perform a long-running operation. If a VMIOF_DiskOpProgressFunc is passed
in as part of the parameter for a callback (e.g. VMIOF_DiskOpProgressFunc is part of VMIOF_DiskGrowInfo for
the diskGrow callback), then the callback is allowed to perform a long-running operation, but it must report
progress using the VMIOF_DiskOpProgressFunc at least once every 10 seconds. The following table lists these
three attributes for all the filter library callbacks.
Other than the scenarios where completionFunc or progressFunc is present, an overall non-blocking
rule applies to all callbacks in a Library Instance. In LI callbacks, you should not wait indefinitely (for
example, on a semaphore, a socket read, or an eventfd read); mutexes and spinlocks are allowed.
NOTE The table only summarizes the typical case. You always need to check whether the
VMIOF_DiskOpProgressFunc and VMIOF_DiskOpCompletionFunc exist before calling them. For example, when
you detach a filter from a VMDK of a powered-off VM, completionFunc is not set in the diskDetach
callback, so you cannot return VMIOF_ASYNC.
Table 4‑1. Filter Library Callback Attributes
Callbacks | Required / Optional | Sync / Async | Whether Long Running Operation Allowed (Y/N)
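The guard pattern from the preceding note can be sketched as follows, with simplified stand-in types for the <vmiof.h> definitions: the callback reports progress only when progressFunc is set, and returns VMIOF_ASYNC only when completionFunc is set (here the completion is invoked inline purely for illustration; real code would call it later from a work group, timer, or poll callback):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the <vmiof.h> definitions. */
typedef int VMIOF_Status;
#define VMIOF_SUCCESS 0
#define VMIOF_ASYNC   1
typedef void (*ProgressFunc)(void *data, unsigned int percent);
typedef void (*CompletionFunc)(void *data, VMIOF_Status status);

typedef struct {
    ProgressFunc progressFunc;      /* may be NULL */
    CompletionFunc completionFunc;  /* may be NULL */
    void *data;
} OpInfo;

/* Demo counters and callbacks used to observe the guard pattern. */
static int gProgressCalls = 0;
static int gCompletionCalls = 0;
static void CountProgress(void *d, unsigned int p) { (void)d; (void)p; gProgressCalls++; }
static void CountCompletion(void *d, VMIOF_Status s) { (void)d; (void)s; gCompletionCalls++; }

static VMIOF_Status
DoLongRunningOp(const OpInfo *info)
{
    unsigned int pct;
    for (pct = 0; pct <= 100; pct += 25) {
        /* In real code, report at least once every 10 seconds. */
        if (info->progressFunc != NULL) {
            info->progressFunc(info->data, pct);
        }
    }
    if (info->completionFunc == NULL) {
        return VMIOF_SUCCESS;   /* no completionFunc: must finish synchronously */
    }
    info->completionFunc(info->data, VMIOF_SUCCESS);  /* inline for the sketch */
    return VMIOF_ASYNC;
}
```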
vmkfstools --iofilters
<filtername>:property1="value1":property2="value2":...:property="valueN" <vmdk filename>
An example using a filter named countio with properties numWorkGroups and port would be:
n vmkfstools --iofilters countio:numWorkGroups="3":port="32769" iofilter.vmdk
n #include(s) — The source must #include <vmiof.h> in order to have definitions for the macro you use
to enumerate the mandatory callbacks for Daemons
n Callbacks — The source must define code for certain mandatory callbacks. The IO Filter Framework
does not have optional callbacks for Daemons.
You must reference the source files that make up your daemon code in the SCONS file's Daemon Plugin
dictionary item, in the source files key.
The sub-sections that follow provide details for creating a skeletal Daemon source file.
NOTE The countIO sample provides a skeletal Daemon that you can copy into your filter and use as a
starting point.
VMIOF_DEFINE_DAEMON(
.start = &SCDStart,
.stop = &SCDStop,
.cleanup = &SCDCleanup,
);
The next three sub-topics discuss the prototype of each function, when the IO Filter Framework invokes it,
and the typical things found in the body of each.
The prototype for the start function is defined by the type VMIOF_DaemonOpsStartCB, which is defined as:
That is, you define the start function similar to the following:
VMIOF_Status
daemonStart(void) {...}
Like a Unix / Linux daemon, an IO Filter Daemon's start function typically does the following:
n Make outbound connections — Only Daemons are allowed to open off-host TCP/IP connections, such
as to replication sites or management interfaces of storage devices, etc.
n Listen for inbound connections — Daemons typically accept connections from Library Instances and
possibly from the filter's CIM provider
n Initialize any global state or related data structures — This includes creating at least one heap from
which to dynamically allocate memory for data structures such as those used to keep track of the state
of each client connection. Daemons may also use System V synchronization primitives (semaphores,
shared memory, etc.) and pthread synchronization primitives (mutex's, etc.) to communicate with LIs
and possibly CIM providers.
Unlike a Unix / Linux daemon, the code in the start function must return a status of VMIOF_SUCCESS in a
timely manner to tell the IO Filter Framework that it has successfully completed performing the setup tasks
listed in the preceding list. Otherwise:
n If the Daemon returns a failure code, the IO Filter Framework logs the error and notes that the Filter is
not available. An administrator may diagnose and resolve the problem and then attempt to restart the
daemon manually from the ESXi CLI.
n If the Daemon fails to return any code in a timely manner, the IO Filter Framework kills and attempts to
restart the Daemon several times before eventually giving up, logging the error, and making the filter
unavailable.
Because of the need for start functions to return in a timely manner, if your Daemon uses sockets, you
cannot block waiting for connection requests on said sockets. Instead, you must use the VMIOF_PollAdd()
function to register callbacks that the IO Filter Framework invokes on receipt of IO on said sockets. The
callback for bound sockets typically:
1 Accepts connections (most commonly from an LI, but possibly from a CIM Provider)
2 Creates a data structure to keep track of the conversation with the client
3 Uses VMIOF_PollAdd() to register a different callback for processing the conversation with said client
NOTE On detecting a close on the socket by the client, the Daemon must remove the poll callback by
invoking VMIOF_PollRemove().
For details on using the VMIOF_Poll*() functions, see “Understanding and Using the IO Filters Polling
Functions,” on page 141.
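A sketch of a start function following this pattern is shown below. PollAdd() is a recording stub standing in for VMIOF_PollAdd(), and the listening socket is merely a hypothetical descriptor rather than a real bind()/listen() sequence; the essential points are that the accept callback is registered and that start returns promptly:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the <vmiof.h> definitions. */
typedef int VMIOF_Status;
#define VMIOF_SUCCESS 0
typedef void (*PollCallback)(int fd, void *data);

/* Stand-in for VMIOF_PollAdd(): records the registration. */
static int gRegisteredFd = -1;
static PollCallback gRegisteredCb = NULL;
static VMIOF_Status
PollAdd(int fd, PollCallback cb, void *data)
{
    (void)data;
    gRegisteredFd = fd;
    gRegisteredCb = cb;
    return VMIOF_SUCCESS;
}

/* Accept callback: a real daemon would accept() the connection, allocate
 * per-client state, and register a second callback for the conversation
 * with another VMIOF_PollAdd() call. */
static void AcceptCallback(int fd, void *data) { (void)fd; (void)data; }

/* Daemon start: create the listening socket (stubbed as descriptor 3
 * here), register the accept callback, and return promptly -- a start
 * function must never block waiting for connections. */
static VMIOF_Status
daemonStart(void)
{
    int listenFd = 3;  /* hypothetical: socket()/bind()/listen() go here */
    return PollAdd(listenFd, AcceptCallback, NULL);
}
```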
NOTE It is recommended that the daemon in your IO Filter Solution use SSL to establish secure connections
between its LIs and off-host peer daemons, incorporating additional measures to authenticate such
connections, such as magic numbers.
1 Check if there are any VMs using the LI of the filter. If so, ESXi refuses the request.
2 Tell the IO Filter Framework to stop the daemon (by invoking the stop callback).
When the Framework invokes the function, it passes a pointer to a function (called the stopped function)
that the daemon must call when it has finished stopping. Upon receipt of the stopped callback, the
Framework signals ESXi that daemon shutdown is complete.
The prototype for the stop function is defined by the type VMIOF_DaemonOpsStopCB, which is defined as:
That is, you define the stop function similar to the following:
void
daemonStop(VMIOF_DaemonStoppedCB stopped, void *data) {...}
The body of this function must undo whatever the start function and subsequent operations have done,
including:
n Signal any VMs with which the daemon is connected that it is shutting down
The stop function cannot be long running. However, the tasks in the preceding list may take some time.
Thus, this function typically creates a work group and queues an asynchronous function to complete said
tasks. If the stop function uses this pattern, ensure that the worker function calls the stopped function.
Otherwise, the stop function must call the stopped function itself.
NOTE In some caching filter design patterns, under certain conditions, daemons on one host act as caches
for VMs running on other hosts. In this case, the daemon must still perform all the steps listed in the
preceding list before ceasing operation.
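The stop pattern above can be sketched as follows. QueueWork() stands in for queuing a worker to a VMIOF work group (it runs the worker inline so the sketch stays self-contained), the shutdown steps are hypothetical, and the essential point is that the framework's stopped function is always called when the work finishes:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the <vmiof.h> definition. */
typedef void (*VMIOF_DaemonStoppedCB)(void *data);

/* Stand-in for queuing a worker to a work group: runs inline here. */
typedef void (*WorkerFunc)(void *data);
static void QueueWork(WorkerFunc fn, void *data) { fn(data); }

/* Hypothetical shutdown tasks; a real stop would undo whatever the
 * start function and subsequent operations have done. */
static int gShutdownSteps = 0;
static void SignalClients(void)     { gShutdownSteps++; } /* tell connected VMs */
static void FlushPendingState(void) { gShutdownSteps++; }

typedef struct {
    VMIOF_DaemonStoppedCB stopped;
    void *data;
} StopCtx;

/* Worker that finishes shutdown, then calls the framework's stopped
 * function -- this call must never be forgotten. */
static void
ShutdownWorker(void *arg)
{
    StopCtx *ctx = arg;
    SignalClients();
    FlushPendingState();
    ctx->stopped(ctx->data);
}

static StopCtx gCtx;

/* stop itself returns quickly; the (possibly long) tasks are queued. */
static void
daemonStop(VMIOF_DaemonStoppedCB stopped, void *data)
{
    gCtx.stopped = stopped;
    gCtx.data = data;
    QueueWork(ShutdownWorker, &gCtx);
}

/* Demo stand-in for the framework's stopped function. */
static int gStoppedCalls = 0;
static void FrameworkStopped(void *data) { (void)data; gStoppedCalls++; }
```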
NOTE The daemon keeps running when ESX enters Maintenance Mode.
It may be desirable for the IO Filter Solution to differentiate between these two conditions. For example,
during a complete removal, the logic in a daemon's stop function (and associated code) should delete cache
files from a host's SSD(s), whereas on an upgrade the logic may wish to leave the cache files in place.
The IO Filter Framework does not provide any reason for the invocation when it calls stop (host shutdown,
filter upgrade, filter removal, etc.). If a filter requires this reason (context), the IO Filter Solution should
include a VWC plugin that drives its update and removal. Such a plugin can send context to the daemon
(through the CIM provider) before invoking the SPBM API to update or remove the filter. The stop function
can then check for this context to determine what action to take, erasing the context as part of its task
list. If the stop function finds no context when it is invoked, it can assume a default context of host shutdown.
The prototype for the cleanup function is defined by the type VMIOF_DaemonCleanupCB, which is defined as:
That is, you define the cleanup function similar to the following:
void
daemonCleanup(void) {...}
REMEMBER The CIMP code for IO Filter examples is located in the directory cim/name (where name is the
sample filter name), within the sample filter.
For example, to copy the CIM code from the sampfilt sample into your filter's source directory, follow these
steps:
2 Make a directory for the CIM content, using the folder name specified in the CIM dictionary definition,
in the cim location key (e.g. CIM/myfilter). For example:
mkdir cim/myfilter1
cd cim/myfilter1
cp -R /opt/vmware/VAIO-6*/src/partners/iofilters/sampfilt/cim/sampfilt/ .
5 In each of the files in which the filter name occurs, replace the name of the filter from which you copied
the CIM code with your filter's name.
1 Create an entry for the locale in the CATALOGS section of the .json file that specifies the locale you are
adding and an associated catalog file.
NOTE Currently, you are only required to provide localization for English, using the "en" locale, which is
what the VWC defaults to if it cannot find localization files for the locale currently set in the browser in
which it is running.
Creating New Entries in the CATALOGS section of the .json File for Each Locale
Each sample filter's .json file has the following CATALOGS definition in the PROPERTIES section:
"CATALOGS" : {
"en" : "catalogs/catalog_en.vmsg"
}
This definition defines a single locale, en (English), and specifies that the catalog file for this locale is located
in the catalogs/catalog_en.vmsg file (relative to the scons root directory of the filter).
To add a new locale entry, create a new line within the CATALOGS definition. The format for each
entry is:
n locale — a string, within double quotes, specifying the locale defined by ISO 639
n filename — a double-quoted string specifying the pathname to the catalog file. The pathname is relative
to the scons root of the filter. The format of this file is defined in the next section.
For example, the following CATALOGS definition provides entries for the six locales supported by the
VWC, plus the default entry:
"CATALOGS" : {
"en" : "catalogs/catalog_en.vmsg",
"en_US" : "catalogs/catalog_en_US.vmsg",
"de_DE" : "catalogs/catalog_de_DE.vmsg",
"ja_JP" : "catalogs/catalog_ja_JP.vmsg",
"zh_CN" : "catalogs/catalog_zh_CN.vmsg",
"fr_FR" : "catalogs/catalog_fr_FR.vmsg",
"ko_KR" : "catalogs/catalog_ko_KR.vmsg",
}
Remember, for each locale / catalog pair, you must also create the indicated catalog file as discussed in the
next section.
The following content is taken from the countIO sample's .json file:
"CAPABILITIES": {
"acceleration": {
"min": 1,
"max": 5,
"default": 3
}
The following content is taken from the countIO sample's "en" catalog file:
name.label = countio
name.description = Hello, I am the countio IO filter.
capability.acceleration.label = acceleration
capability.acceleration.description = acceleration level
n name.label and name.description — These two keys define the label and description to display for an
IO Filter. For example, the French version of name.label could be: Nombre IO, and the name.description:
Bonjour, Je suis le Nombre IO filtre.
capability.xxx.label = something
capability.xxx.description = something else
where:
n xxx is the name of the capability / property given in the CAPABILITIES section of the .json file
n something and something else are the values assigned to the keys. Note that you are not required
to put the values in quotes; if you do, the quotes themselves become part of the value.
NOTE In ESXi 6.0 U1 and 6.0 U2, there is a known issue where your customers may see the filter name shown as
"iofilters.disk.<filter-name>.name.label" in the VWC instead of the name defined in the catalog file. This is a
bug in the VWC. The workaround is to log out of the VWC and log in again. This bug has been fixed for the
ESXi 2016 release.
Since this topic involves running commands in multiple environments, the commands shown use different
prompts to indicate both platform and authentication required to perform the task. Specifically:
n DEV$ — You are an ordinary user (or root) on the development platform (for example VMware
Workbench)
Whenever a user-space cartel generates a core dump, ESXi generates a core file of the form command-zdump.#,
where command is the name of the program that dumped core, and # is a 3-digit number that sequences the
core files (e.g. 001, 002, etc.). ESXi places most core files in the /var/core directory. One exception is the
core files for VMX cartels, which are located in the same folder as the VM's .vmx file. Also, in the case of VMX
dumps, the command part of the file name is vmx-debug, not just vmx, for example vmx-debug-zdump.001.
Once ESXi generates the core file, to use gdb to debug the issue, follow these steps:
1 Copy the zdump file from the ESXi host to the directory on the development platform that contains the
IO filter project’s .sc file (for example, by using scp while logged into the DEV system and in the IO
Filter's root directory).
2 In the development machine directory containing the zdump file, type make prep-debug. This displays a
list of the zdump files in the directory and a prompt for specifying the zdump.
NOTE You must run make prep-debug each time you want to examine a new core file.
3 At the prompt, enter matching characters for the zdump you want and press <ENTER>. This creates a
dstage directory with the necessary tools to run the gdb session.
4 Navigate to the dstage directory and start the gdb session by typing ./gdbiof
DEV$ cd /workspace/acme_sampcache/sampcache/dstage
DEV$ ./gdbiof
Traceback (most recent call last):
File "<string>", line 35, in <module>
File "/usr/share/gdb/python/gdb/__init__.py", line 23, in <module>
'gdb.function': os.path.join(gdb.PYTHONDIR, 'gdb', 'function'),
NameError: name 'os' is not defined
GNU gdb (GDB) 7.2.0.20100903-cvs (build 2014-05-12)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
(gdb) bt
#0 0x05d21092 in _dl_sysinfo_int80 () from
/workspace/acme_sampcache/sampcache/dstage/lib/ld-linux.so.2
#1 0x0bb38ca5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:67
#2 0x0bb3a4e3 in abort () at abort.c:92
#3 0x0d40594d in TestDiskOpen (handle=0x7d30cde8, diskInfo=0xff9d5904) at
/workspace/acme_sampcache/sampcache/sampcache.c:65
#4 0x0b655eaf in FiltLibDiskOpenFilter (dh=0x7d30e1b4, dlInfo=0x7d30f8c0,
iofilters=0x7d30fb70 "countIO", lightWeightOpen=0 '\000',
outCtx=0x7d30e1e4) at bora/lib/filtlib/filtlibDisk.c:1118
#5 FiltLibDiskOpenAllFilters (dh=0x7d30e1b4, dlInfo=0x7d30f8c0, iofilters=0x7d30fb70
"countIO", lightWeightOpen=0 '\000', outCtx=0x7d30e1e4) at
bora/lib/filtlib/filtlibDisk.c:1154
#6 FiltLibCreateContextFromFiltersInt (dh=0x7d30e1b4, dlInfo=0x7d30f8c0,
iofilters=0x7d30fb70 "countIO", lightWeightOpen=0 '\000',
outCtx=0x7d30e1e4) at bora/lib/filtlib/filtlibDisk.c:1287
#7 FiltLib_CreateContextFromFilters (dh=0x7d30e1b4, dlInfo=0x7d30f8c0,
iofilters=0x7d30fb70 "sampcache", lightWeightOpen=0 '\000',
outCtx=0x7d30e1e4) at bora/lib/filtlib/filtlibDisk.c:1368
#8 0x0b656181 in FiltLib_CreateContext (dh=0x7d30e1b4, dlInfo=0x7d30f8c0,
lightWeightOpen=0 '\000', outCtx=0x7d30e1e4) at
bora/lib/filtlib/filtlibDisk.c:1420
#9 0x0b59ff41 in DiskLibFiltLibInit (diskHandle=0x7d30e1b4, dlInfo=0x7d30f8c0,
forceInit=0 '\000') at bora/lib/disklib/diskLib.c:6241
...
#50 0x0555d7d9 in HostdApp::Run (upgradeVMs=false, autoStartVMs=false) at
bora/vim/hostd/main/hostdApp.cpp:892
#51 0x0555598e in HostdMain (argc=2, argv=0xff9d6e8c) at bora/vim/hostd/main/main.cpp:411
#52 0x047f07c5 in main (argc=2, argv=0xff9d6e8c) at
bora/vim/hostd/main/static/main.cpp:62
(gdb) quit
Live Debugging
While writing your IO Filter solution, there may be situations where you need to debug a process
live. For this purpose, VAIODK supports remote live debugging of vmkfstools, iofilterd, test-apps
developed for the IO Filter, and the vmx process. This section describes the steps for performing live debugging.
Since this topic involves running commands in multiple environments, the commands shown use different
prompts to indicate both platform and authentication required to perform the task. Specifically:
n DEV$ — You are an ordinary user (or root) on the development platform (for example VMware
Workbench)
It is assumed that you have access credentials for the ESXi host on which you want to perform live
debugging. It is also required that you understand the VAIODK framework and concepts like vmx,
daemon, and hostd before proceeding further.
DEV$ cd /opt/vmware/vaiodk-6.0.0-2799832/src/partners/samples/iofilter/sampfilt
3 When you run the make command, you are prompted to enter the ESXi hostname or IP address. Enter
the relevant information, and provide the password when prompted.
4 You will be prompted to enter the cartel id/name to which you want to attach the gdb process. If you
want to debug the LI context, you can choose the vmx process. If your filter has a daemon component,
you can choose the daemon process to debug.
5 Now navigate to the directory named dstage created in your working folder.
DEV$ cd dstage
6 Run the script gdbiof to attach the gdb process to the cartel you selected in Step 4 above.
DEV$ ./gdbiof
Traceback (most recent call last):
File "<string>", line 35, in <module>
File "/usr/share/gdb/python/gdb/__init__.py", line 23, in <module>
'gdb.function': os.path.join(gdb.PYTHONDIR, 'gdb', 'function'),
NameError: name 'os' is not defined
GNU gdb (GDB) 7.2.0.20100903-cvs (build 2014-05-12)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attached; pid = 1000752474
Listening on port 50000
Remote debugging from host 10.112.82.247
warning: Can not parse XML target description; XML support was disabled at compile time
[New Thread 1000752474]
Created trace state variable $trace_timestamp for target's variable 1.
(gdb)
[Thread 1000752474] #1 stopped.
0x0000000022dcd758 in ?? ()
warning: .dynamic section for "/lib64/libgcc_s.so.1" is not at the expected address (wrong
library or version mismatch?)
warning: Could not load shared library symbols for 8 libraries, e.g. /lib64/libX11.so.6.
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
(gdb) bt
#0 0x0000000022dcd758 in ppoll (fds=0x3ffe671da08, nfds=12, timeout=0x3ffe671b960,
sigmask=0x0) at
../sysdeps/unix/sysv/linux/ppoll.c:58
#1 0x00000000204db2a4 in PollExecuteDevice (polltab=0x3ffe694e010, class=POLL_CLASS_MAIN,
timeout=<value optimized out>)
at bora/vmx/main/pollVMX.c:3012
#2 0x00000000204dbb27 in PollVMXLoopTimeout (loop=0 '\000', exit=0x218f3245 "",
class=<value optimized out>, timeout=1000000)
at bora/vmx/main/pollVMX.c:2255
#3 0x00000000204c7eac in VMXPoweredOnLoop () at bora/vmx/main/vmx.c:2394
#4 VMXPowerOnMainThread () at bora/vmx/main/vmx.c:2307
#5 VMX_Loop () at bora/vmx/main/vmx.c:244
#6 0x00000000204c3aa6 in MainRun (ac=<value optimized out>, av=0x3ffe671edf8) at
bora/vmx/main/main.c:2276
#7 main (ac=<value optimized out>, av=0x3ffe671edf8) at bora/vmx/main/main.c:575
(gdb) info threads
[New Thread 1000752510]
[New Thread 1000752509]
[New Thread 1000752508]
8 You can now start using gdb commands and attach a breakpoint where desired. Note that there are
multiple threads and attaching gdb to one thread will not stop processing on other threads. Once the
control reaches the breakpoint that you added, you will need to identify the appropriate thread
executing the operation.
(gdb) b SampleFilterDiskStartIO
Breakpoint 1 at 0x23cf0070: file partners/samples/iofilter/sampfilt/sampfilt.c, line 200.
(gdb) c
Continuing.
^C
[Thread 1000752474] #1 stopped.
0x0000000022dcd758 in ppoll (fds=0x3ffe671da08, nfds=12, timeout=0x3ffe671b960, sigmask=0x0)
at ../sysdeps/unix/sysv/linux/ppoll.c:58
58 ../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory.
in ../sysdeps/unix/sysv/linux/ppoll.c
(gdb) info threads
7 Thread 1000752497 (running)
6 Thread 1000752505 SampleFilterDiskStartIO (handle=0x32abf7d0, io=0xbbaec230) at
partners/samples/iofilter/sampfilt/sampfilt.c:200
5 Thread 1000752507 (running)
4 Thread 1000752508 (running)
3 Thread 1000752509 (running)
2 Thread 1000752510 (running)
* 1 Thread 1000752474 0x0000000022dcd758 in ppoll (fds=0x3ffe671da08, nfds=12,
timeout=0x3ffe671b960, sigmask=0x0)
at ../sysdeps/unix/sysv/linux/ppoll.c:58
(gdb) thread 6
[Switching to thread 6 (Thread 1000752505)]#0 SampleFilterDiskStartIO (handle=0x32abf7d0,
io=0xbbaec230)
at partners/samples/iofilter/sampfilt/sampfilt.c:200
200 {
9 Once you are in the correct thread-context, you can step through the function/code using gdb
commands.
(gdb) bt
#0 SampleFilterDiskStartIO (handle=0x32abf7d0, io=0xbbaec230) at
partners/samples/iofilter/sampfilt/sampfilt.c:200
#1 0x0000000020b07902 in FiltLibWrapIOStart (handle=0x32abf7d0, io=0xbbaec230) at
bora/lib/filtlib/filtlibTrace.c:212
#2 0x0000000020b03c2e in FiltLibDiskIOHandleIO (io=0xbc3003b0) at
bora/lib/filtlib/filtlibDiskIO.c:133
#3 0x0000000020b092a8 in FiltLibUpcallProcessIOs (data=0x32ac37e0) at
bora/lib/filtlib/filtlibUpcall.c:685
Live-debugging a test-app
You can also write your own test-app to test your IO Filter solution. The test-app is written in a
directory named <tests> in your <filter> directory. In addition to the files that contain the code for your test-
app, you also create a subdir.json file that defines the rules for compiling your test-app. You also
need to add the compilation rules to the scons files defined in your <filter> directory; see “Creating and
Populating a Correct Scons File,” on page 89. More information on how to build your test-app can be
found in the document “vSphere APIs for IO Filtering Development Kit (VAIODK) Guide for the Command
Line”.
When you build your IO Filter by running the make command, the test-app is also built. You can verify this
by navigating to the build folder inside the <filter> directory and checking for the <test-apps> folder. The
binary for the test-app resides inside this folder.
DEV$ pwd
/opt/vmware/vaiodk-6.0.0-2799832/src/partners/samples/iofilter/sampfilt/build
DEV$ ls
.cimpdk_clean .cimpdk_stage bundle catalogs cim config init.d shutdown.d test-apps usr
vib
DEV$ cd test-apps
DEV$ ls
ZZZ_corp_tests
2 Once on the ESXi host, invoke the test-app so that it runs as a process to which the gdb process can be
attached remotely. For example, in the case of sampfilt-test1, the test-app must be invoked
repeatedly through a script, because it exits after printing some data.
#!/bin/bash
while :
do
./sampfilt-test1
sleep 1
done
1 To enable remote live-debugging of the test-app, follow these steps. Note: for this example,
"vmkfstools -v" is also running in a script.
DEV$ cd /opt/vmware/vaiodk-6.0.0-2799832/src/partners/samples/iofilter/sampfilt
DEV$ make live-debug
enter an ESXi hostname or ip address: 10.112.80.113
Checking for ssh RSA key on ESXi host 10.112.80.113.
DEV$ cd dstage
DEV$ ls
ZZZ_corp_tests bin etc gdb.cmd gdbiof include init.d lib lib64 sbin share
shutdown.d test-apps usr usr64 var
DEV$ ./gdbiof
Traceback (most recent call last):
File "<string>", line 35, in <module>
File "/usr/share/gdb/python/gdb/__init__.py", line 23, in <module>
'gdb.function': os.path.join(gdb.PYTHONDIR, 'gdb', 'function'),
NameError: name 'os' is not defined
GNU gdb (GDB) 7.2.0.20100903-cvs (build 2014-05-12)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Process /ZZZ_corp_tests/sampfilt-test1 created; pid = 1004367573
Listening on port 50000
Remote debugging from host 10.112.80.155
warning: Can not parse XML target description; XML support was disabled at compile time
[New Thread 1004367573]
Created trace state variable $trace_timestamp for target's variable 1.
3 Once the gdb process is attached to the test-app, you can use gdb commands to do further processing.
(gdb)
[Thread 1004367573] #1 stopped.
0x0000000020a6d270 in ?? ()
bt
#0 0x0000000020a6d270 in _start ()
from /opt/vmware/vaiodk-6.0.0-2697104/src/partners/samples/iofilter/sampfilt/dstage/lib64/ld-
linux-x86-64.so.2
#1 0x0000000000000001 in ?? ()
#2 0x000003fff736ff11 in ?? ()
#3 0x0000000000000000 in ?? ()
(gdb) b main
Breakpoint 1 at 0x2086a6b0: file partners/samples/iofilter/sampfilt/tests/test1.c, line 13.
(gdb) c
Continuing.
warning: Could not load shared library symbols for 2 libraries, e.g. /lib64/libvmiof.so.0.
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
n Specify where to create your build folder on your build system, and why
n Create appropriate entries in your Solution's SCONS and JSON files, based on the plans for your
Solution
n Create minimal source files for the library and daemon components of your Solution
n Define the prototype for each of the callbacks in a library and daemon component, and state when the IO
Filters Framework invokes each
Review Questions
1 Under which directory do you create IO Filter source?
2 In which file do you list the source files that make up the components of your IO Filter Solution?
a Makefile
3 Which of the following callbacks are optional in an LI? (choose all that apply)
a diskAttach
b diskPropertiesSet
c diskRelease
d diskIOStart
b Makefile
5 Which of the following are required callbacks for a Daemon? (choose all that apply)
a start
b IOStart
c stop
d shutdown
6 If your filter dumps core while running in the context of a VMX, where would you find the core file?
a /var/core
d /tmp
a Library (LI)
b Daemon
c CIM Provider
d Test application
8 What entry do you use in a JSON file to specify the amount of static memory needed by a daemon?
c You don't put it in the JSON file, but in the SCONS file, as part of the daemon dictionary definition
9 What entry do you use in the SCONS file to set warning flags for C compilation?
b 'cc warnings' in the definition of each component for which you want to set said flags
c 'cc warnings' in the last parameter to the function that defines a given component
d You don't do this in a SCONS file, but rather set the MAKE_WARNINGS environment variable in
your shell to the flags you want
n “Understanding and Using IO Filter Utility Functions Common to Most Solutions,” on page 130
The following subsections discuss how to use the utility functions provided for these tasks.
The framework serializes many, but not all, callbacks. For example, it will not block timer destruction while a
timer callback runs, and will not invoke a timer callback during a timer destruction call. As a
counter-example, the framework can invoke diskStun and diskUnstun concurrently with other callbacks.
Timer, Workgroup, and Poll callbacks are not serialized, so it is up to the IO Filter developer to
provide synchronization.
Since the Timer, Workgroups, and Poll callbacks are invoked asynchronously to other callbacks, you need to
ensure that there is no race between diskClose and these other callbacks when diskClose frees the LI private
data. To prevent this race, the diskClose callback should remove any pending Poll, and Timer callbacks, and
wait for all remaining Workgroups to complete before it frees the LI private instance data.
NOTE VAIO provides utility functions beyond those discussed in the next few subsections, for
example to create and submit IO requests. Those utility functions are discussed in more appropriate
contexts later in this chapter.
You define the size of each IO Filter heap you create, ideally using a utility function provided by VAIO
(discussed later in this topic), based on the set of structures your filter would expect to create during normal,
or even stressed, operations. You may define multiple heaps in your filter. Regardless, the IO Filter
Framework enforces memory limits specified in the diskRequirements callback for LIs.
At a high level, there are three sets of functions you use to manage memory within an IO Filter solution:
n Setup — There is a set of functions used to define a heap handle, estimate the size of the heap, and then
actually create the heap. You typically invoke these functions during diskOpen, diskAttach / diskDetach,
diskRelease, and daemonStart callbacks. You may also create separate heaps in your Daemon each time
it receives a connection with a LI, CIM Provider, or another Daemon, with said heaps sized to keep
track of the state of transactions with those entities.
n Operation — There is a set of functions your filter components use during normal operation, or during a
diskAttach or diskRelease operation, to allocate and free memory from the heap(s) set up earlier. Allocations
can be either aligned or unaligned.
n Cleanup — There is a function used during diskClose, diskAttach / diskDetach, diskRelease and
daemonStop / daemonCleanup callbacks, to destroy the heap(s) created during setup.
A key thing to remember is that, whatever heap you create, you must destroy before the context of your
filter goes away. Whatever memory you allocate from a heap, you must free before the heap is destroyed.
Failure to follow these rules causes the framework to crash your filter, which can result in crashing a VM.
DANGER When you create a heap for an LI (typically in the body of a diskOpen() callback), you must only
access that heap while in the context of that LI, including freeing any memory allocated from it and
destroying it. For example, you cannot allocate a heap in the context of VMDK1, and then destroy it or free it
in the context of VMDK2. Doing this will crash the filter. Put another way, perform all heap operations
within the same LI that created said heap.
Thus, at a high level, the steps for using these functions are:
1 Declare a pointer to VMIOF_HeapHandle. This represents the opaque handle to the heap that you create.
You use this handle later when you dynamically allocate memory units from the heap and also when
you finally destroy the heap when you no longer need it.
2 Estimate the required heap size based on the set of allocations you expect your filter component to
make (for example 1 struct foo per open VMDK, one struct bar per outstanding IO, multiplied by the
number of possible open disks and outstanding IOs, respectively). The estimation of required space also
must include space to provide alignment for certain data structures your filter component may need to
allocate, such as buffers for doing I/O to sidecars.
3 Create the new heap object based on the estimated heap size.
4 Allocate memory, either aligned or non-aligned, from the heap created in Step 3.
NOTE Always free the same unit you allocated. Always free all memory you allocated from a heap before
destroying said heap.
WHAT'S NEW Starting in release 60U2, the framework supports a single heap shared across different LIs within
the same VMX cartel. However, there is no proper way to report this single-heap requirement, and
everything reported in VMIOF_DiskRequirements is counted once for every LI. This will be fixed in
the 2016 release.
An alternative way to have memory accessible to different LIs is to use mmap with MAP_SHARED. Just like
System V Shared Memory Segments, mmap with MAP_SHARED comes from the kernel directly, so there is
no limitation. However, the page table for it comes from different resource pools. For the VMX cartel, it
comes from the uwshmempt page pool (beginning in 60U2), for other cartels, it comes from resource pools
with a fixed limit. You don't need to account for the System V Shared memory itself, but you need to
account for the page table overhead for the Daemon in DAEMON_MEMORY_RESERVATION and in
diskRequirements() for LIs.
Estimating the size of the heap: Understanding the VMIOF_HeapAllocation type and
VMIOF_HeapEstimateRequiredSize ()
Estimating the size of the heap(s) required by your code is critical to overall system behavior, as well as
functionality of your filter. Creating heaps larger than you need potentially deprives other parts of the
system of memory. Creating heaps smaller than needed will prevent your filter from functioning properly.
NOTE At no time can an LI create heaps that, in aggregate, exceed the total amount of memory returned by
the diskRequirements callback.
To estimate a heap's size, you must know:
n Which data structures you will dynamically allocate from said heap
n The maximum number of those data structures you will need allocated at any given time
Instead of calculating the worst-case combination of these structures yourself, use
VMIOF_HeapEstimateRequiredSize() to do this for you. To use this function, you must first create an array of
structures of type VMIOF_HeapAllocation, each of which describes one of the data structures the filter
component will allocate from this heap, its alignment requirements (if any), and the maximum number of
the data structures the heap must support at any given time.
After declaring an array of these structures, initialized with data for the data structures to be allocated from
the heap, use VMIOF_HeapEstimateRequiredSize() to estimate the heap’s size. This function determines the
minimum heap size required such that the heap has enough room to allocate any combination of the data
structures described by the array of VMIOF_HeapAllocation structures. The prototype for this function is:
n const VMIOF_HeapAllocation *allocations — This input parameter represents the base address of an
allocations array of VMIOF_HeapAllocation structures, that describe all sets and types of memory
allocations to be performed on the heap.
n size_t count — This input parameter is the total number of VMIOF_HeapAllocation objects in the
allocations array.
The function returns the estimated heap size as a size_t. On failure, it returns 0.
Understanding VMIOF_HeapCreate ()
Use VMIOF_HeapCreate() to create a new heap from which you can dynamically allocate memory. The
prototype of this function is:
n VMIOF_HeapHandle **heap — This output parameter is the opaque handle to the heap object that gets
created upon success. Use this handle for all subsequent heap operations, including allocating / freeing
memory and destroying the heap.
n Any other value — The heap creation failed. The error code indicates the reason for the failure.
Understanding VMIOF_HeapDestroy ()
When a heap is no longer needed, call VMIOF_HeapDestroy() to destroy the heap created with
VMIOF_HeapCreate(). The prototype for this function is:
The parameter to this function is VMIOF_HeapHandle *heap, the handle to the heap to destroy, as
initialized by a call to VMIOF_HeapCreate().
The function returns VMIOF_SUCCESS when the heap is successfully destroyed, or some other code on error.
NOTE The heap must be empty before the framework will destroy it. Attempting to destroy a heap that is
not empty results in a crash in the framework.
Understanding VMIOF_HeapAllocate ()
Call VMIOF_HeapAllocate() to dynamically allocate memory from a heap. The prototype of this function is:
n size_t size — This input parameter is the size of the memory block to be allocated.
The function returns a void pointer to the memory allocated, or NULL on failure.
Understanding VMIOF_HeapAllocateAligned ()
Use VMIOF_HeapAllocateAligned() to dynamically allocate memory from a heap, with the memory aligned
on a specified boundary. The prototype for this function is:
n size_t size — This input parameter is the size of the memory block to be allocated.
n size_t alignment — This input parameter provides the desired alignment of the memory.
The function returns a void pointer to the memory allocated, or NULL on failure.
Understanding VMIOF_HeapFree ()
When you are done using any block of dynamically allocated memory, use VMIOF_HeapFree() to return it to
the heap from whence it came. The prototype of this function is:
n VMIOF_HeapHandle *heap — This input parameter is the handle to the heap from which the memory was
originally allocated.
n void *memory — This is a pointer to the memory to free. It should point to memory that was allocated
from the indicated heap, that is no longer needed, and not already freed. It must be the same pointer
returned by VMIOF_HeapAllocate*(). That is, you cannot free pieces of allocated memory from a heap.
NOTE Freeing an already-freed block or a NULL pointer results in a crash in the framework.
1. VMIOF_Status
2. SampleFilterDiskOpen(VMIOF_DiskHandle *diskHandle, const VMIOF_DiskInfo *diskInfo)
3. {
4. VMIOF_Status res;
5. size_t heapSizeEstimate;
6. InstanceData_t *id;
7. uint32_t idx, count=diskInfo->linksInChain;
8. VMIOF_HeapAllocation allocations[] = {
9. { sizeof(InstanceData_t), 0, 1 }, /* instance data */
10. { MY_SIDECAR_SIZE, VMIOF_DISK_SIDECAR_ALIGN, 1 }, /* sidecar */
11. { MY_OWNER_SIDECAR_SIZE, VMIOF_DISK_SIDECAR_ALIGN, 1 }, /* sidecar */
12. { sizeof(SampfiltIOXact_t), 0, MAX_OUTSTANDING_IOTS}, /* io transactions */
13. };
14. size_t numAllocations = sizeof(allocations)/sizeof(VMIOF_HeapAllocation);
15. VMIOF_HeapHandle *instanceHeapHandlep;
16. char mbuf[MAX_FLAGS_STRING_SIZE];
17. VMIOF_DiskSidecar *ownerSidecarHandlep;
18. char *base;
19. ...
20. heapSizeEstimate = VMIOF_HeapEstimateRequiredSize(allocations,
21. numAllocations);
22. res = VMIOF_HeapCreate(heapSizeEstimate, &instanceHeapHandlep);
23. if( VMIOF_SUCCESS != res ) {
24. /* couldn't create the heap */
25. VMIOF_Log(VMIOF_LOG_ERROR,"NoopfiltLI(%s): error creating filter heap "
26. "(%d)\n", __func__, res);
27. return res;
28. }
29. VMIOF_Log(VMIOF_LOG_ERROR,"NoopfiltLI: created heap\n");
30. ...
31. id = VMIOF_HeapAllocate(instanceHeapHandlep, sizeof(InstanceData_t));
32. if(!id) {
33. VMIOF_Log(VMIOF_LOG_ERROR,"Sampfilt(%s): Could not allocate instance "
34. "id. Failing the open.\n", __func__);
35. VMIOF_HeapDestroy(instanceHeapHandlep);
36. return VMIOF_NO_MEMORY;
37. }
38. VMIOF_Log(VMIOF_LOG_ERROR,"SampFilt: did ID heap allocation\n");
39. bzero(id,sizeof(*id)); /* Heap allocations are not zero'd */
40. ...
41. /* allocate buffer for sidecar from heap aligned as required */
42. id->mySidecarp = (SampfiltSidecar_t *)VMIOF_HeapAllocateAligned(
43. id->heapHandlep, MY_SIDECAR_SIZE, VMIOF_DISK_SIDECAR_ALIGN);
44. if (NULL == id->mySidecarp) {
45. VMIOF_Log(VMIOF_LOG_ERROR,"NoopfiltLI(%s): could not allocate sidecar "
46. "buffer\n", __func__);
47. close(id->daemonSockFD);
48. VMIOF_HeapFree(id->heapHandlep, id);
49. VMIOF_HeapDestroy(instanceHeapHandlep);
50. return VMIOF_NO_MEMORY;
51. }
52. }
53. …
54. VMIOF_Status
55. SampleFilterDiskIOStart(VMIOF_DiskHandle *handle, VMIOF_DiskIO *io) {
56. …
57. /* create an I/O transaction for this request */
58. if (NULL == (iotp = VMIOF_HeapAllocate(id->heapHandlep, sizeof(SampfiltDelayIO_t)))) {
59. VMIOF_Log(VMIOF_LOG_ERROR,"VMIOF-Client(DiskIOStart): can't HeapAlloc iotp for local
worker. just continuing the IO for now\n");
60. return VMIOF_SUCCESS;
61. } /* heap alloc */
62. iotp->io = io;
63. iotp->handle = handle;
64. iotp->sequence = sequence++;
65. VMIOF_Log(VMIOF_LOG_ERROR,"VMIOF-Client: Client(DiskStartIO): async-local queueing
work\n");
66. res = VMIOF_WorkQueue(id->workGroupp, SampfiltDelayIOWorker, (void *)iotp);
67. if( VMIOF_SUCCESS != res) {
68. /* well, that didn't work */
69. VMIOF_Log(VMIOF_LOG_ERROR,"VMIOF-Client(DiskStartIO): could not add delayIO to the
work pile (%d).",res);
70. return res;
71. } else { /* succeeded in adding to the work pile */
72. VMIOF_Log(VMIOF_LOG_ERROR,"VMIOF-Client(DiskStartIO): added delay IO to the work
pile\n");
73. }
74. return VMIOF_ASYNC;
75. }
76. …
77. VMIOF_Status
78. SampleFilterDiskClose(VMIOF_DiskHandle *diskHandle)
79. {
80. …
81. /* free the sidecar buffers BEFORE freeing the instance data */
82. VMIOF_HeapFree(id->heapHandlep,id->mySidecarp);
83. /* free our instance data from the heap */
84. tmp = id->heapHandlep; /* save this pointer, first */
85. VMIOF_HeapFree(id->heapHandlep,id);
86. VMIOF_HeapDestroy(tmp);
87. …
88. }
n Line 8-13: allocations[]: An array of type VMIOF_HeapAllocation with the elements we will be
allocating over the life of the heap.
n Line 20 : Estimate the size of the heap to be created and capture the return value of
VMIOF_HeapEstimateRequiredSize() in heapSizeEstimate.
n Line 22-28 : Create the heap of size heapSizeEstimate via VMIOF_HeapCreate(). Capture the return value in
local variable res, and a pointer to the heap in instanceHeapHandlep. If heap creation is not successful,
an error message is sent to the log.
n Line 31-37 : Allocate a structure of type InstanceData_t from the heap and capture the return value of
VMIOF_HeapAllocate() in pointer id. If the allocation is not successful, an appropriate error message is sent
to the log and the heap is destroyed.
n Line 39: Upon successful allocation of structure InstanceData_t from the heap, zero out the structure, as
heap allocations are not automatically zeroed.
n Line 58-61: Allocate a SampfiltDelayIO_t structure using VMIOF_HeapAllocate(). If the allocation was not
successful, send an error message to the log.
n Line 66-73: You are ready to do asynchronous local queueing of work. Enqueue the work into the
workgroup queue. The function associated with the work thread is SampfiltDelayIOWorker() and it
accepts the pointer iotp. If the enqueue is not successful, an error message is sent to the log.
n Line 82 : Free the previously allocated sidecar structure pointed to by id->mySidecarp, once you are done
using it, by calling VMIOF_HeapFree().
n Line 85 : Free the previously allocated structure of type InstanceData_t pointed to by id, after saving off
the heap pointer in a temporary variable (tmp), by calling VMIOF_HeapFree().
n Line 86: Once you are done using the heap, destroy it using VMIOF_HeapDestroy(). Ensure that you free
all previously allocated memory blocks before calling this function.
The following sub-sections provide information on using each of the sidecar-related functions.
n Each read from or write to a sidecar file must be at an offset that is a whole multiple of
VMIOF_DISK_SIDECAR_ALIGN bytes
n The address of the memory (RAM) from which you read or to which you write data must be on a
VMIOF_DISK_SIDECAR_ALIGN boundary
It is important that you use this macro in your code rather than hard-coding its current value, as VMware
reserves the right to change the value associated with this macro at any time. Failure to use appropriate
values for offsets, sizes, or addresses causes the sidecar-related functions to fail, returning appropriate status
codes.
The VAIODK defines the type VMIOF_DiskSidecar which is analogous to a file descriptor in POSIX file
functions. Your code receives pointer to a valid VMIOF_DiskSidecar structure when it successfully creates or
opens a sidecar. Your code must pass this pointer into subsequent functions to read from / write to / delete /
close sidecar files.
You can create a small number of sidecar files for each VMDK that your code filters. To distinguish between
sidecars, your code must assign a unique key to each. Keys are of type uint32_t and must be within the range
VMIOF_DISK_SIDECAR_KEYMIN to VMIOF_DISK_SIDECAR_KEYMAX. Thus, the number of sidecars allowed is
VMIOF_DISK_SIDECAR_KEYMAX - VMIOF_DISK_SIDECAR_KEYMIN.
n VMIOF_DiskHandle *handle — The handle of the VMDK as passed into the diskAttach or
diskPropertiesSet callback
n uint64_t size — The size of the sidecar file. This must be a whole multiple of
VMIOF_DISK_SIDECAR_ALIGN
n VMIOF_SUCCESS — The function succeeded. It created the sidecar and set outh to point to the
VMIOF_DiskSidecar handle to be used in subsequent operations with said sidecar.
n VMIOF_NOT_SUPPORTED —The function failed because it was not invoked in the context of a diskAttach or
diskPropertiesSet callback
n VMIOF_SIDECAR_LIMIT — The function failed because the LI has already created the maximum number of
sidecar files per LI for this VMDK
n VMIOF_DiskHandle *handle— The handle of the VMDK as passed into the diskAttach or
diskPropertiesSet callback
n VMIOF_SUCCESS — The function succeeded and the sidecar file was deleted
n VMIOF_NOT_SUPPORTED — The function failed because it was not invoked in the context of a diskDetach or
diskPropertiesSet callback
n VMIOF_BUSY — The function was invoked for an already opened sidecar. The filter must close the sidecar
using VMIOF_DiskSidecarClose before calling this function.
n VMIOF_DiskHandle *handle— The handle of the VMDK as passed into the callback
n uint32_t key — The key representing the sidecar. It must be within the range
VMIOF_DISK_SIDECAR_KEYMIN and VMIOF_DISK_SIDECAR_KEYMAX.
n VMIOF_SUCCESS — The function succeeded and the sidecar file was indeed opened. outh now contains a
valid handle to an opened sidecar
n VMIOF_NOT_SUPPORTED — The function was invoked from an unsupported filter operation. In other
words, the function was not invoked in the context of diskOpen, diskClose, diskAttach,
diskPropertiesSet, diskDetach, diskGrow, diskClone, diskCollapse, diskVmMigration, diskSnapshot and
diskRelease callback functions
n VMIOF_DiskSidecar *handle — The handle to the sidecar that you are closing
n VMIOF_NOT_SUPPORTED — The function was invoked from an unsupported filter operation. In other
words, the function was not invoked in the context of diskOpen, diskClose, diskAttach,
diskPropertiesSet, diskDetach, diskGrow, diskClone, diskCollapse, diskVmMigration, diskSnapshot and
diskRelease callback functions
n VMIOF_DiskSidecar *handle — The handle to the sidecar that you are reading
n void *buffer — This is a pointer to a buffer into which the sidecar data is read. This memory
should be VMIOF_DISK_SIDECAR_ALIGN aligned.
n uint64_t numBytes — The number of bytes to read from the sidecar file into the buffer. This value must
be a multiple of VMIOF_DISK_SIDECAR_ALIGN
n uint64_t byteOff — The offset in the sidecar file at which to start reading. This offset must be a
multiple of VMIOF_DISK_SIDECAR_ALIGN
n VMIOF_SUCCESS — The function succeeded and the buffer now contains the data that was read from the
sidecar file
n VMIOF_OUT_OF_RANGE — You are trying to read beyond the size of the sidecar's capacity
n VMIOF_DiskSidecar *handle — The handle to the sidecar that you are writing into
n void *buffer — This is a pointer to a buffer with data that will be written into the sidecar file. This
memory should be VMIOF_DISK_SIDECAR_ALIGN aligned.
n uint64_t numBytes — The number of bytes to write from the buffer into the sidecar. This value must be
a multiple of VMIOF_DISK_SIDECAR_ALIGN
n uint64_t byteOff — The offset in the sidecar file at which to start writing. This offset must be a
multiple of VMIOF_DISK_SIDECAR_ALIGN
n VMIOF_SUCCESS — The function succeeded and the data in the buffer is written into the sidecar file
n VMIOF_OUT_OF_RANGE — You are trying to write beyond the size of the sidecar's capacity
n VMIOF_DiskSidecar *handle — The handle to the sidecar whose size you are trying to retrieve
n VMIOF_SUCCESS — The function succeeded and size now contains the retrieved sidecar file size
VMIOF_DiskSidecarSetSize — Use this function to resize your sidecar file. You can call this function only in
the context of the diskAttach, diskPropertiesSet, and diskGrow callbacks
n VMIOF_DiskSidecar *handle — The handle to the sidecar that you are resizing
n VMIOF_SUCCESS — The function succeeded and size now contains the new size of the sidecar file
n VMIOF_NOT_SUPPORTED — The function was invoked from an unsupported filter operation. The function
failed because it was invoked outside the context of diskAttach, diskPropertiesSet and diskGrow
Important Notes
The VAIO implementation limits when you can invoke certain sidecar functions, including:
n You can only invoke VMIOF_DiskSidecarCreate() during the diskAttach or diskPropertiesSet callbacks.
n You cannot call any sidecar functions while a VM is stunned. Thus you need to keep the current stun
status of a VMDK's VM in its VMDK private data, not in its sidecar.
The IO Filter framework imposes no limit on the size of sidecar files. However, the downside of
large sidecars is the time spent copying them during snapshot operations. A VM can see long stun
times during snapshot creation if the sidecars are large, because the sidecars are copied from the
parent disk to the child disk while the VM is stunned. The same is true when the sidecars are
copied from the child disk to the parent disk during the snapshot consolidate operation.
NOTE The vm-support bundle, used for crash analysis, can collect the first 1MB of each sidecar file.
However, this is not enabled by default; you have to specify it explicitly. Note that this
feature does not work on VVOL and vSAN datastores.
For example, two coding patterns for server programs that use sockets are :
n Many-simple threads:
n For each connection accepted, spawn a new thread to process and respond to requests on the file
descriptor returned by accept()
n IO Filters are discouraged from creating additional threads in either their daemon or Library Instance
code as excessive threads may cause performance problems and may even cause ESXi to crash
n Library Instances and Daemons only get invoked through their entry-point functions. The Library
Instance routines must eventually return to continue the IO flow on the VM. The Daemon's routines must
return in a timely manner or the Daemon will be killed by the IO Filter Framework's Daemon watchdog.
Instead of using either of the patterns described above, the proper way to wait for file IO within the IO Filter
Solution is to use its poll functions. At a high level, your code registers each file descriptor for which it
wants to be notified of pending IO with the framework via the VMIOF_PollAdd() function (discussed in
detail later in this topic), associating a callback function with the file descriptor. Whenever there is specific
IO pending on this file descriptor, the IO Filter Framework invokes the associated callback to process the IO.
NOTE Poll functions available in the IO Filter framework currently support operations on sockets and pipes
only.
Given the above details, common examples of using the poll functions include:
n Daemon code waiting on incoming requests / commands from either Library Instances or CIM
Providers
n Library instances waiting for results from deferred IO requests sent to a Daemon for fulfillment
The following sub-sections provide details and an example of using the VAIO poll functionality.
NOTE Poll callbacks won't be invoked during the diskOpen and diskClose callbacks, because the same thread
that invokes all the poll callbacks is also responsible for these two callbacks. As a workaround, we recommend
that you cancel all timer and poll callbacks in diskClose, and then use poll/select directly.
Understanding VMIOF_PollHandle
The VAIO API provides the opaque data type VMIOF_PollHandle. This represents a handle to the poll callback
function registered via VMIOF_PollAdd() (discussed in detail in the next topic). That function is
invoked upon occurrence of the event being polled, for example a read or write operation on the file descriptor
(socket or pipe).
VMIOF_PollHandle **poll;
Understanding VMIOF_PollAdd ()
Use VMIOF_PollAdd() to register a poll callback for a certain event, such as a read or write operation on a socket or
pipe file descriptor. VMIOF_PollAdd() has the following prototype, declared in vmiof_poll.h:
VMIOF_Status
VMIOF_PollAdd(VMIOF_FileHandle file, VMIOF_PollEvent event, VMIOF_PollCallback
callback, void *data, VMIOF_PollHandle **poll);
n VMIOF_FileHandle file — This is the file descriptor of the socket or pipe that is polled
n VMIOF_PollEvent event — This refers to the event being polled: a read or write operation on the
socket or pipe.
n VMIOF_PollCallback callback - This function pointer is the poll callback function that is registered to be
invoked upon occurrence of VMIOF_PollEvent
n void *data — A pointer to data to be passed to VMIOF_PollCallback function by the IO Filter Framework.
NOTE In both LIs and Daemons, no blocking functions may be called in Poll callbacks.
Understanding VMIOF_PollRemove ()
Use VMIOF_PollRemove() to remove or unregister the VMIOF_PollCallback(). Use this function when you
no longer want to poll the event (read or write) associated with the socket or pipe file descriptor.
VMIOF_PollRemove() has the following prototype, declared in vmiof_poll.h:
VMIOF_Status
VMIOF_PollRemove(VMIOF_PollHandle *poll);
n VMIOF_PollHandle *poll — This is the opaque handle to the poll callback function to remove or
unregister, associated with the event (read or write) on the socket or pipe file descriptor.
IMPORTANT Be sure to call this function on a file descriptor before you close the file descriptor. If you fail to
remove the poll before you close the file descriptor, the framework will call the callback function
continuously.
0. char Mbuf[1024];
1. /* Declare the poll handle for the callback function to be registered via VMIOF_PollAdd()*/
2. VMIOF_PollHandle *SampfiltLISocketPollHandle;
3. int SampfiltSockFD;
4. void SampfiltDaemonReadWorker(void *datap) {
5. SampfiltDelayIO_t delayIO;
6. if(-1 == read(SampfiltSockFD,&delayIO,sizeof(delayIO))) {
7. /*Unable to read Daemon socket data*/
n Line 0 : Define a character buffer, Mbuf, that the code uses for buffering log messages
n Line 3 : Define SampfiltSockFD, an integer that holds the file descriptor of the socket
n Line 4-16 : SampfiltDaemonReadWorker() is the callback function that is registered via VMIOF_PollAdd().
This function reads data from the socket.
n Line 6-9: Read from socket using its file descriptor SampfiltSockFD. In case the read fails log an
appropriate error message indicating the failure
n Line 18-33 : Define SampleFilterDiskOpen(). Not shown in this code snippet, the Library Instance defines
this function as the entry point for the diskOpen event. Specifically:
n Line 20: Declare res to hold the return values from certain VAIO functions called by this function
n Line 22 - 25 : Create the socket, an endpoint for communication, and accept the return value in the
socket file descriptor SampfiltSockFD. Log an appropriate error message upon failure to create the
socket.
n Line 28 - 33 : Upon successful creation of the socket, add it to the poll list to monitor it for the read
event using VMIOF_PollAdd(). The first parameter to VMIOF_PollAdd() is the socket file descriptor
SampfiltSockFD. The second parameter, VMIOF_POLL_EVENT_READ, indicates that the event being polled
is a read event on the socket. The third parameter, SampfiltDaemonReadWorker, is the callback
function to invoke when the read event occurs. The fourth parameter, (void *)NULL, indicates
that the callback function SampfiltDaemonReadWorker() being registered does not expect any
parameter. The fifth parameter is an output parameter that holds an opaque handle to the callback
function SampfiltDaemonReadWorker() being registered. This handle is needed for any future
manipulation, for example unregistering the callback function when you no longer want to poll the
socket for the event. The return value of VMIOF_PollAdd() is captured in res and checked for
success; an appropriate error message is logged upon failure.
n Lines 36-42 : Define SampleFilterDiskClose(). Not shown in this code snippet, the Library Instance defines
this function as the entry point for the diskClose event. Specifically:
n Line 40: Removes or unregisters the poll callback function SampfiltDaemonReadWorker() via the
VMIOF_PollHandle. The event on the socket is no longer polled or monitored afterward.
Normally, code should check the return value of this function; for simplicity, this example
discards the return value without checking it.
Thus, in general, whenever ESXi opens a VMDK associated with this filter, the library code creates a socket
to talk to the daemon (which might be providing caching or replication services on a different ESXi host)
and polls it for the read event. When the event occurs, the callback function SampfiltDaemonReadWorker() is
called and IO continues. When ESXi closes the disk, the registered callback is removed via
VMIOF_PollRemove().
n Handles to timer ( “Understanding and Using the IO Filters Timer Functions,” on page 171 ), poll
( “Understanding and Using the IO Filters Polling Functions,” on page 141 ), and worker callbacks
( “Understanding and Using the IO Filters Worker Functions,” on page 174).
n VM stun state, for example to determine whether you can perform Sidecar operations as discussed in
the next section
n A cached copy of sidecar data that changes quickly, where continually updating the sidecar would have
an adverse performance impact
The VAIO provides a method for filters to associate arbitrary information (defined by the Filter Solution,
called Filter-Private Data) with a VMIOF_DiskHandle pointer, and then retrieve that data on demand. The
functions are:
n void VMIOF_DiskFilterPrivateDataSet(VMIOF_DiskHandle *handle, void *data)
Use this function to associate the Filter-Private data data with handle
n void *VMIOF_DiskFilterPrivateDataGet(VMIOF_DiskHandle *handle)
Use this function to retrieve the Filter-Private data associated with handle
Because the data is meant to be per-VMDK, and filters should be designed to support an arbitrary number
of VMDKs, filter libraries typically:
n Define the Filter-Private data structure in a .h file
n Allocate and initialize the data in the diskOpen callback, then associate it with the disk handle using
VMIOF_DiskFilterPrivateDataSet()
n Retrieve the data (using VMIOF_DiskFilterPrivateDataGet()) in each of the other callbacks that require
the data. You may also consider passing the Filter-Private data as the parameter to timer, poll, and
worker callback functions (discussed in the previously referenced sections).
NOTE In addition to the example content listed previously, Filter-Private Data typically includes a pointer
to the VMIOF_DiskHandle with which the data is associated. This allows timer/poll/worker callback functions,
which do not receive such a handle directly, to access it through the data the IO Filter Framework passes
when it invokes those callbacks.
NOTE Library Instances should not share Heaps with each other. Filter Private Data should be allocated
from its own heap.
1. /* data needed per filtered VMDK. Allocated on DiskOpen() and freed on DiskClose() */
2. typedef struct InstanceData_s {
3. VMIOF_DiskHandle *handle;
4. int SampfiltSockFD;
5. VMIOF_PollHandle *SampfiltLISocketPollHandle;
6. VMIOF_DiskSidecar *mySidecarHandlep;
7. SampfiltSidecar_t *mySidecarp; /* pointer to mmaped space for reading/writing
sidecar data */
8. VMIOF_WorkGroup *workGroupp;
9. VMIOF_WorkHandle *workHandlep;
10. VMIOF_TimerHandle *timerHandlep; /* handle for the 30-second demo timer */
11. VMIOF_TimerHandle *snapTimerHandlep; /* handle for snapshot progress timer */
12. uint64_t snapCount; /* how far we've progressed through the simulated
'work'of a snapshot */
13. pthread_cond_t snapCondVar; /* used for signalling when snap progress is
complete */
14. pthread_mutex_t snapMutex;
15. char stunStatus; /* 0=unstunned, 1=stunned */
16. } InstanceData_t;
1. /* **************************************************************************
2. * Copyright 2014 VMware, Inc. All rights reserved.
3. * -- VMware Confidential
4. * **************************************************************************/
5.
6. #include "vmiof/vmiof_disk.h"
7.
8. #include "sampfilt_delayIO.h"
9. #include "sampfilt_instanceData.h"
10.
11. #define THIRTY_SECONDS (uint64_t)30000000 /* 30 million microseconds = 30 seconds */
12. void
13. TimerCallback(void *datap) {
14. static int count = 1;
15. VMIOF_Status res;
16. InstanceData_t *id = (InstanceData_t *)datap;
17.
18. VMIOF_Log(VMIOF_LOG_ERROR,"VMIOF-Client(TimerCallback): Disk %p: Timer fired!\n",id-
>handle);
19. ...
20. return;
21. } /* TimerCallback() */
22.
23. VMIOF_Status
24. SampleFilterDiskOpen(VMIOF_DiskHandle *diskHandle, const VMIOF_DiskInfo *diskInfo)
25. {
26. VMIOF_Status res;
27. struct sockaddr_in serv_addr;
28. struct hostent *server;
29. size_t heapSizeEstimate;
30. InstanceData_t *id;
31. uint32_t idx, count=diskInfo->linksInChain;
32.
33. ...
34. /* Heap is created before this*/
35. id = VMIOF_HeapAllocate(SampfiltHeapHandlep, sizeof(*id));
36. if(!id) {
37. VMIOF_Log(VMIOF_LOG_ERROR,"VMIOF-Client(DiskOpen): Could not allocate instance id.
Failing the open on\n");
38. return VMIOF_NO_MEMORY;
39. } else {
40. bzero(id,sizeof(*id)); /* just to be sure */
41. }
42. pthread_cond_init(&id->snapCondVar, NULL); /* initialize the condition variable */
43. pthread_mutex_init(&id->snapMutex, NULL); /* initialize the mutex variable */
44. id->handle = diskHandle;
45. /* create a timer */
46. if( VMIOF_SUCCESS != (res = VMIOF_TimerAdd(THIRTY_SECONDS, TimerCallback, (void *)id,
&id->timerHandlep))) {
47. /* couldn't create the timer */
48. VMIOF_Log(VMIOF_LOG_ERROR,"VMIOF-Client(DiskOpen): creating timer failed (%d)",res);
49. return res;
50. } else {
51. VMIOF_Log(VMIOF_LOG_ERROR,"VMIOF-Client(DiskOpen): created timer\n");
52. } /* if VMIOF_TimerAdd() */
53. VMIOF_DiskFilterPrivateDataSet(diskHandle, (void *)id);
54. ...
55. return res;
56. } /* SampleFilterDiskOpen() */
57.
58. VMIOF_Status
59. SampleFilterDiskClose(VMIOF_DiskHandle *diskHandle)
60. {
61. VMIOF_Status res = VMIOF_SUCCESS;
62. InstanceData_t *id;
63.
64. /* announce we are here, and print the parameters */
65. VMIOF_Log(VMIOF_LOG_ERROR,"VMIOF-Client(DiskClose): handle=%p\n",diskHandle);
66. id = (InstanceData_t *)VMIOF_DiskFilterPrivateDataGet(diskHandle);
67. ...
68. /* cancel the timer, remove the workgroup, remove the poll */
69. (void)VMIOF_TimerRemove(id->timerHandlep); /* don't care about the return value here */
70. ...
71. /* free our instance data from the heap */
72. VMIOF_HeapFree(SampfiltHeapHandlep,id);
73. ...
74. return res;
75. } /* SampleFilterDiskClose() */
n Line 11 declares a constant used when creating a timer that fires every 30 seconds (see line 46)
n Lines 12-21 define the function TimerCallback() which is registered as the callback for the 30-second
timer in line 46. NOTE that this callback expects to receive an InstanceData_t * in its data parameter.
The code shown prints out the VMIOF_DiskHandle associated with this InstanceData. The handle is
stored in the InstanceData in line 44.
n Lines 23-56 define the diskOpen() callback, SampleFilterDiskOpen(), which does the following:
n Lines 26-31 declare local variables, including a pointer to an InstanceData_t, called id. This topic
does not discuss the other variables, but may provide food for thought for items that are
appropriate to be local instead of in Filter-Private Data.
n Lines 34-41 attempt to allocate a new InstanceData_t structure from the heap, storing the result in
id. If the allocation fails, the code logs the failure and returns failure to the caller. If the allocation
succeeds, the code initializes the structure to all zeros.
n Lines 45-52 attempt to create a timer that fires every 30 seconds, logging an error and returning
failure to the caller if the creation fails. Specifically, on line 46, the call passes: the callback
TimerCallback in the second parameter; id in the third parameter (this is the parameter passed to
TimerCallback()); and the timerHandlep member of id in the fourth parameter. This handle is used to
cancel/remove the timer in line 69.
n Line 53 associates id with diskHandle so that it can be recalled in various other routines, such as on
line 66.
n Lines 58-75 define the diskClose() callback, SampleFilterDiskClose(), which does the following:
n Line 66 retrieves the InstanceData_t associated with diskHandle, storing it in id. The code missing in
line 67 should check that id is non-NULL before proceeding.
n Line 69 cancels/removes the timer whose handle is id->timerHandlep, which was created in line 46.
NOTE vSphere keeps the UUID the same during move or rename operations. vSphere assigns a new UUID
for copied VMDKs. In the case of a linked clone or snapshot, vSphere keeps the same UUID. In some cases
(e.g., Windows VSS), the snapshot may be opened on the same VM, and will have the same UUID. The same
VMDK will never be opened twice simultaneously on the same VM, but two separate (but related) VMDKs
with the same UUIDs may be opened simultaneously. Combining the disk UUID with the VM's instance
UUID generates an identifier that may be sufficiently unique for logging and reporting.
n uint8_t uuid — This output parameter is used to capture the UUID of the disk upon successful return
of VMIOF_DiskUuidGet().
#define VMIOF_DISK_UUID_SIZE 16
n VMIOF_NOT_FOUND — The UUID of the disk is not available. All virtual disks have a valid UUID
when created; however, VMIOF_NOT_FOUND may be returned if the virtual disk descriptor file has been
modified or corrupted.
n VMIOF_NOT_SUPPORTED — The operation is not supported on this disk handle. This can happen if the
disk chain is only partially opened, that is, only a subset of the total links is open.
NOTE Upon successful execution of VMIOF_DiskUuidGet(), the UUID returned matches the UUID that is
available in the vSphere API under vim.vm.device.VirtualDisk.backing.uuid and also in the virtual
disk descriptor file with the key ddb.uuid, in the Disk Data Base section.
NOTE The UUID usually doesn't change; however, it can change if someone invokes the public vSphere
API to change it.
The following code snippet shows the use of the VMIOF_DiskUuidGet() function outlined above —
7. if (status != VMIOF_SUCCESS) {
8. LOG("ERROR: could not get a disk uuid for this filter\n");
9. return (status);
10. }
11. LOG("Disk UUID for this filter is:");
12. for (i = 0; i < VMIOF_DISK_UUID_SIZE; i++) {
13. LOG("%x ", uuid[i]);
14. }
15. LOG("\n");
16. return (status);
17. }
n Line 4 — Declare local variable uuid to capture the uuid of the disk
n Line 5 — Invocation of function VMIOF_DiskUuidGet() that takes the disk handle and the uuid. Here uuid
is an output parameter that gets populated with the disk uuid upon successful return of
VMIOF_DiskUuidGet().
n Line 7-10 — Upon unsuccessful return from VMIOF_DiskUuidGet(), log a message to convey that you could
not get the disk uuid for this filter.
n Line 11-13 — Upon successful return from VMIOF_DiskUuidGet(), log the disk uuid.
n This corresponds to the uuid field of the disk data base in the Disk Descriptor File of the virtual
disk to which your filter is attached. For example, line 12 logged "60 00 C2 98 e1 85 37 85-62 d7 25
0d fe 1e 6e 5e", which corresponded to the entry iofilter1.vmdk:ddb.uuid = "60 00 C2 98 e1 85 37
85-62 d7 25 0d fe 1e 6e 5e" in the iofilter*.vmdk file. This is the vmdk to which the IO Filter is
attached.
n Line 14-16 — Log a new line and return to the calling function.
VmkuserStatus_Code
VmkuserVersion_GetUniqueSystemVersion(
VmkuserVersion_UniqueSystemVersionInfo *versionInfo);
typedef struct VmkuserVersion_UniqueSystemVersionInfo {
char buildDate[VMKUSER_SYSVERSION_STRING_LEN];
char buildTime[VMKUSER_SYSVERSION_STRING_LEN];
char releaseUpdate[VMKUSER_SYSVERSION_STRING_LEN];
char releasePatch[VMKUSER_SYSVERSION_STRING_LEN];
unsigned int vmkernelBuild;
} VmkuserVersion_UniqueSystemVersionInfo;
Invoking this function causes the IO Filter Framework to generate a certain VMKEvent, which ESXi sends to
the vCenter Server, which in turn has two effects:
n It generates an alert, visible in the VWC, that informs administrators of the issue
n It advises vCenter Server to not power-on or provision VMs on the host from which the event was
received
This function has no effect, with respect to IO Filter Framework events, for the invoking VM or any other
VMs using this filter. It is up to each filter instance / daemon to decide how to proceed in the face of the
specific error it encounters. Consider the following examples:
n Suppose an administrator has configured two VMDKs (A and B) to be replicated to two different sites
rA and rB and that the VM with these VMDKs is currently running. Suppose the connection to rB is
dropped due to a cable cut, but not rA.
In this scenario
n The LI for B invokes this function, with the previously described effects, and then must decide
what to do with its pending IOs and how else to proceed. For example, it may:
n Ask a daemon on another host in the cluster to attempt sending the replication IOs
n Crash the VM (very drastic action, probably more appropriate for caching errors than the one
described in this scenario).
n Suppose a caching solution detects that an entire SSD it uses has died, and there are currently two
VMDKs (A and B) with the solution's filter attached in one or more running VMs. Call the LIs for these
VMDKs lA and lB. Next, suppose that the daemon for the solution controls the cache and detected the
issue. Finally, suppose that lA has IOs in flight to the cache but lB does not. In this scenario:
n The daemon invokes VMIOF_FailureReportDisabled(), and (probably) sends control messages to the
LIs.
n lA may wish to crash the VM to prevent it from thinking that its IOs are OK, because they are not.
n lB may just fail any future IOs (return a failure code for each diskIOStart() invocation), since its
VMDK is intact but cannot perform any additional IOs, allowing the guest to give up or attempt
recovery, etc.
Understanding VMIOF_FailureReportDisabled()
This utility function has the following prototype (in vmiof_failure.h):
n const char *reason — This input parameter is the string describing the reason for failure.
The function returns an error code describing the result of the operation:
n VMIOF_SUCCESS — The function succeeded and the IO Filter's operational state was successfully set to
disabled.
n VMIOF_FAILURE — The function was unable to set the IO Filter's operational state to disabled.
NOTE If you plan to use this API, please work with VMware to provide a link to your documentation so
that we can update our KB article accordingly.
where filtername is the name of the IO Filter. Invoking this command sends a different VMKEvent than the one
described earlier, which ESXi sends to the vCenter Server, which in turn has two effects:
n Advises vCenter Server that it is now OK to power-on and provision VMs on this host
In addition, you can list all the installed Filters on the host using the following command:
Understanding VmkuserVob_Notify
WHAT'S NEW From version 6.0 U2, VAIO provides this function to send a notification about a critical
problem, or a critical observation that may help identify the root cause of a problem. This call
should not be used for reporting general error conditions without specific, known solutions.
The VMIOF_DiskRequirements structure includes the following fields:
n uint64_t requiredMemoryPerIO — The memory required in bytes per IO. Internally we compute
the heap overhead as requiredMemoryPerIO * VMIOF_DiskMaxOutstandingIOsGet(). We understand
that this value can get relatively high for a large per IO overhead, so you can also decide to support
a smaller number of in-flight IOs and return VMIOF_NO_MEMORY. In that case your computation
should be: perIO = (real per IO overhead * own_max_supported_IOs /
VMIOF_DiskMaxOutstandingIOsGet());
n uint64_t requiredStaticMemory — The required memory amount should include all the memory
that is not accounted for by requiredMemoryPerDiskMiB or requiredMemoryPerIO, even if its total
amount may depend on a customizable variable through the filter policy.
The following code snippet provides an example of the proper use of the diskRequirements callback:
1 static void
2 TestDiskRequirements(VMIOF_DiskRequirements *requirements)
3 {
4 uint32_t maxIOs = VMIOF_DiskMaxOutstandingIOsGet();
5 VMIOF_HeapAllocation staticMemory[] = { /* size, alignment, # */
6 { sizeof(CountIOPrivateData), 0, 1 }, /* instance data */
7 { sizeof(CountIOSidecar),VMIOF_DISK_SIDECAR_ALIGN, 1}, /* sidecar data */
8 };
9 size_t countStatic = sizeof(staticMemory)/sizeof(VMIOF_HeapAllocation);
10 VMIOF_HeapAllocation perIO[] = { /* size, alignment, # */
11 { sizeof(CountIOWorkItem), 0, maxIOs },
12 };
13 size_t countPerIO = sizeof(perIO)/sizeof(VMIOF_HeapAllocation);
14
15 requirements->requiredMemoryPerDiskMiB = 0;
16 requirements->requiredMemoryPerIO =
17 (VMIOF_HeapEstimateRequiredSize(perIO, countPerIO) + maxIOs - 1) / maxIOs;
18
19 requirements->requiredStaticMemory = VMIOF_HeapEstimateRequiredSize(staticMemory,
20 countStatic);
21 }
n Line 4: maxIOs holds the maximum number of IOs you will get from the framework.
n Line 13: countPerIO holds the number of elements of the perIO[] array
n Line 15: Set requiredMemoryPerDiskMiB to 0 since we don't have any per-megabyte disk requirements
n Line 16-17: Set requiredMemoryPerIO using the VMIOF_HeapEstimateRequiredSize() function. We use
this function because it takes into account the proper alignment when sizing the heap.
NOTE This function can be called at any point after the filter is loaded, and the code in this function
should not assume any state information. That is, the framework may query the filter's requirements
even before initializing the filter instance or attaching the filter to a specific disk.
NOTE You don't need to account for System V Shared Memory Segments or mmap with MAP_SHARED,
because they come directly from the kernel. However, you do need to account for the page table
overhead in diskRequirements.
When DRS tries to power on a VM, it checks the number reported by this callback; if that number is
larger than the remaining memory capacity of every host in the cluster, the power-on fails. However, if
a user performs a manual power-on, ESXi does not perform this admission check.
NOTE You can only create sidecar files from the context of a diskAttach function and diskPropertiesSet
function.
For details on sidecar-related functions, see “Using Sidecars Functions in Library Code to Keep Persistent
Per-VMDK Meta Data,” on page 137.
Recall that data written to a sidecar file must be aligned on a boundary specified by the
VMIOF_DISK_SIDECAR_ALIGN macro. Further, recall that the VAIO provides the VMIOF_HeapAllocateAligned()
function to allocate aligned memory from the heap. Thus, if your diskAttach() writes initial data to sidecar
files, you must first create a heap and then allocate aligned memory for this purpose. The size of the heap
needed in a diskAttach() is different from that needed for processing IOs once the filter is attached. Be sure
to invoke VMIOF_HeapEstimateRequiredSize() in your diskAttach() before calling VMIOF_HeapCreate(), and to call
it with just enough size to perform the diskAttach(). For details on using these heap functions, see
“Managing Memory in an IO Filter Solution,” on page 130.
At a minimum, consider storing the following data in sidecars associated with a VMDK:
n Disk parameters, such as size. You can update this information in the callbacks you receive when it
changes, such as diskGrow, diskCollapse, diskSnapshot, etc.
n Filter properties. As with disk parameters, you can update property values in your
VMIOF_DiskPropertiesSet callback.
n If you plan on supporting the diskRelease callback, have a separate sidecar that contains the pathname
and, optionally, an IP address and port number. When the diskOpen callback runs in the context of a
Daemon, it will fill in these latter fields with the IP information. This course refers to this as the owner
sidecar. Initialize the IP and port number to zero. If you don't plan to support diskRelease, put the
pathname in some other sidecar.
The IO Filters Framework passes this information into the diskAttach() function using parameters as
follows:
Parameters :
n VMIOF_DiskHandle *handle - This input parameter is an opaque handle to the disk and is valid only for
the filter to which it is passed. Almost every callback function takes this disk handle as its first
parameter.
n const VMIOF_DiskInfo *info - This is a pointer to a structure of type VMIOF_DiskInfo that describes disk
information such as the disk capacity, the number of links that compose the disk, and the files that
compose the disk.
Return Value :
n Every callback function should return a valid VMIOF_Status (Refer section “Understanding
VMIOF_Status Results for Functions in the VAIO,” on page 102)
Following is a code snippet describing the activities associated with diskAttach callback :
1. VMIOF_Status
2. SampleFilterDiskAttach(VMIOF_DiskHandle *handle,
3. const VMIOF_DiskInfo *diskInfo,
4. const VMIOF_DiskFilterProperty *const *properties)
5. {
6. VMIOF_DiskSidecar *schp;
7. VMIOF_Status res;
8. SampfiltSidecar_t *sidecarp;
9. size_t heapSizeEstimate;
47. }
48. ...
49. /* Sidecar Write */
50. if( VMIOF_SUCCESS != (res = VMIOF_DiskSidecarWrite(schp,
51. sidecarp, MY_SIDECAR_SIZE, (uint64_t)0))) {
52. VMIOF_Log(VMIOF_LOG_ERROR,"VMIOF-Client(DiskAttach): writing initial "
53. "sidecar failed (%d)\n",res);
54. ...
55. return res ;
56. } else { /* it worked */
57. VMIOF_Log(VMIOF_LOG_ERROR,"VMIOF-Client(DiskAttach): initial sidecar"
58. " written\n");
59. }
60. ...
61. /* Close the sidecar */
62. if( schp && (VMIOF_SUCCESS != (res = VMIOF_DiskSidecarClose(schp)))) {
63. VMIOF_Log(VMIOF_LOG_ERROR,"VMIOF-Client(DiskAttach): closing sidecar "
64. "failed (%d)\n",res);
65. ...
66. return res;
67. } else { /* it worked */
68. VMIOF_Log(VMIOF_LOG_ERROR,"VMIOF-Client(DiskAttach): sidecar closed\n");
69. } /* if write update */
70.
71. ...
72.
73. /* Destroy the heap */
74. VMIOF_HeapDestroy(heapHandlep);
75. ...
n Lines 6-14 : Declaration and initialization of local variables needed for heap creation and sidecar
creation, and to accept the return values of VAIO functions via res
n Lines 18-24 : The heap is created via VMIOF_HeapCreate(). If not successful, log an appropriate message
to indicate failure and return failure via res
n Lines 28-33 : Create your sidecar using VMIOF_DiskSidecarCreate(). If not successful, log appropriate
message indicating failure and return the failure status via res.
n Lines 38-43 : Map a buffer from the heap you created for your sidecar file using
VMIOF_HeapAllocateAligned(). Note that data written to a sidecar file must be aligned on a boundary
specified by the VMIOF_DISK_SIDECAR_ALIGN macro. Initialization of different members of this structure is
not shown specifically in this code sample.
n Lines 44-47 : Log appropriate message upon successful mapping of the buffer from the heap via
VMIOF_HeapAllocateAligned().
n Lines 49-55 : Write to the sidecar using VMIOF_DiskSidecarWrite(). In case of a failure, log appropriate
error message and return failure via res
n Lines 61-66 : Now that you are done with your sidecar, close it using the VMIOF_DiskSidecarClose()
function. In case of failure, log an appropriate message and return the failure status via res.
n VMIOF_DiskHandle *handle — An opaque handle to the virtual disk from which the IO Filter is being
detached
n const VMIOF_DiskDetachInfo *info— Information about the detach event, defined as:
n VMIOF_DiskDetachFlags detachFlags — Currently only one flag value is defined:
VMIOF_DISK_DETACH_DELETE. The IO Filter Framework sets this flag when the vdisk from
which the filter is being detached is also being deleted. One special case is during a diskCollapse.
In that case, the IO Filter Framework performs a series of diskAttach / diskDetach callbacks for
each delta disk, setting the VMIOF_DISK_DETACH_DELETE flag for the delta disks getting deleted as
part of the collapse.
n VMIOF_ASYNC — The callback is continuing its work asynchronously. It must call completionFunc to
indicate when it is done.
Following is a code snippet describing the activities associated with diskDetach() callback :
n Lines 1-3 : Define SampleFilterDiskDetach function. The Library Instance defines this function as the
entry point for the diskDetach event.
n Line 4 : Variable res declared to hold the return value of type VMIOF_Status (Refer to section
“Understanding VMIOF_Status Results for Functions in the VAIO,” on page 102)
n Line 5 : You retrieve the Filter-Private Data of your IO Filter via
VMIOF_DiskFilterPrivateDataGet() and accept it in a pointer to a structure of type InstanceData_t. You
later use this pointer id to reference the sidecar that you first close and then delete as part of the
diskDetach operation.
n Lines 6-11 : First close the sidecar that you intend to delete as part of diskDetach. This is done by
invoking VMIOF_DiskSidecarClose(). Note that you are referencing the sidecar to be closed via id
n Lines 12-18 : After ensuring that you closed the sidecar in the previous step, you now remove it by
invoking VMIOF_DiskSidecarDelete(). You accept the return value of this function in variable res. If not
successful, you log an appropriate message and return failure via res.
n Lines 19-23 : Upon successfully deleting the sidecar, you log message indicating that the sidecar indeed
got deleted, and return success via res.
The VMIOF_DiskFlags parameter in the VMIOF_DiskInfo structure specifies the mode in which the disk is
opened. You must check the disk size during open and free any resources in excess of what the current
disk size requires. This can happen when the filter reserved resources during a disk grow and the grow
operation subsequently failed for some reason.
Following are some of the operations that are done in the context of diskOpen():
1 Determine the size of heap that you need to create using VMIOF_HeapEstimateRequiredSize().
2 Create the heap of the required size as determined above, using VMIOF_HeapCreate(). This heap is used
for all your dynamic memory allocation requirements like creating the instance data structure for your
IO Filter and mapping buffer for your sidecar.
NOTE A filter instance MUST create its own heap, and you may not share a heap with another filter
instance.
3 Allocate memory for your instance data structure from the heap you created using
VMIOF_HeapAllocate(). If you are unable to do this, return with VMIOF_NO_MEMORY after invoking
VMIOF_HeapDestroy() on the heap created in Step 2. Upon successful allocation of the instance data
structure, you initialize its members to appropriate values. Remember that the instance data should
contain a list of IOs owned by the instance. Ensure that you initialize the list to an empty state.
4 If your LI needs to connect with the filter's Daemon component, establish a UNIX socket connection
between them. Using a UNIX socket allows you to pass file descriptors (such as a crossFD, discussed in
the next item) between cartels using control messages.
NOTE The daemon in your IO Filter Solution should use SSL to establish secure
connections between its LIs and off-host peer daemons, incorporating additional measures to authenticate
such connections, such as magic numbers, etc.
5 If you want the Daemon to share memory using the VAIO crossFD utility functions, use the appropriate
functions to create a crossFD and manage the address space shared between the LI and Daemon. When
the LI does its initial handshake with the Daemon, pass the crossFD's file descriptor to the Daemon in
the control part of a POSIX msghdr structure.
6 Allocate memory from the heap to process IOs to/from your sidecar(s) using
VMIOF_HeapAllocateAligned().
NOTE You must have created the sidecar(s) using VMIOF_DiskSidecarCreate() in your diskAttach() or
diskPropertiesSet(). You cannot create sidecar files in the context of diskOpen().
7 Open the sidecar(s) using VMIOF_DiskSidecarOpen().
8 Upon successful open of the sidecar(s), read the values into structures allocated from a heap, caching
the values in the VMDK's instance data. Remember that you may have multiple sidecar(s) for a filter.
9 If the open is happening after a failed diskGrow event, the code for processing the grow may have
allocated resources that are no longer needed. To detect this, after a diskGrow, determine the size of the
disk (using information in the info parameter) and compare it with the disk size-related resources used
by your filter. If appropriate, free any unused resources.
10 If you need to react to a storage migration, determine whether the pathname passed into open is the
same as the one last written into whichever sidecar you keep the VMDK's pathname. After processing
the migration, update the pathname in the sidecar.
b If so:
1 Write the VMDK's pathname and the IP address of this host and the port number on which the
Daemon is listening (if the port is not static) into it.
c If not, close the ownership sidecar. It is VITAL that you keep this sidecar closed except during
diskAttach, diskOpen, and diskClose callbacks. Otherwise, you can't implement diskRelease.
12 Create a workgroup via VMIOF_WorkGroupAlloc() to which you can later submit workers.
NOTE This callback must complete synchronously. Hence, if there is any long running task that needs to be
done as a result of opening the disk, it must be done outside the context of this callback.
n VMIOF_DiskInfo *info —This structure describes the disk being opened as follows:
n const char *const *filesInChain — An array of pointers to strings with the absolute path names
of the base and delta disks of the virtual disk (see “Understanding and Processing Snapshot-related
Events,” on page 192). The most recent delta disk will be the first in the array, while the base disk
will be the last in the array. Logically, this should be declared as:
n VMIOF_DiskFlags diskFlags — Zero or more of the following flags to indicate how the disk is being
opened:
n VMIOF_DISK_NO_IO — The filter is not allowed to do IO to the virtual disk, but can perform IO
to its sidecar file(s) and to cache files
n VMIOF_DISK_RO — The virtual disk is read-only. The filter cannot perform writes to the virtual
disk or its sidecar file(s)
n VMIOF_DISK_CREATE — The virtual disk is newly created (this is the first open on the disk)
n VMIOF_DISK_SHARED — The virtual disk can have multiple writers on the disk at the same time.
This can occur if the VMDK's Sharing property is set to multi-writer or if the vHBA to which
the VMDK is attached has its SCSI Bus Sharing property set to Virtual or Physical.
n VMIOF_DISK_SYNC — The virtual disk cannot be made dirty by the filter (any caches should be
write through)
n Any other value indicates your code failed to process the open event
49. id->transListCount=0;
50. if(VMIOF_SUCCESS != (res = SampleFilterCrossfdSetup(id))) {
51. /* cleanup */
52. close(id->daemonSockFD);
53. VMIOF_HeapFree(id->heapHandlep, id);
54. VMIOF_HeapDestroy(instanceHeapHandlep);
55. return res;
56. }
57. /* map buffer for sidecar from heap aligned as required */
58. if(NULL == (id->mySidecarp = (SampfiltSidecar_t *)VMIOF_HeapAllocateAligned(
59. id->heapHandlep, MY_SIDECAR_SIZE, VMIOF_DISK_SIDECAR_ALIGN))) {
60. VMIOF_Log(VMIOF_LOG_ERROR,"VMIOF-Client(DiskOpen): could not " "allocate
sidecar buffer\n");
61. close(id->daemonSockFD);
62. VMIOF_HeapFree(id->heapHandlep, id);
63. VMIOF_HeapDestroy(instanceHeapHandlep);
64. return VMIOF_NO_MEMORY;
65. } else {
66. VMIOF_Log(VMIOF_LOG_ERROR,"VMIOF-Client(DiskOpen): mySidecarp =" " %p\n",id-
>mySidecarp);
67. }
68. if( VMIOF_SUCCESS != (res = VMIOF_DiskSidecarOpen(diskHandle, MY_SIDECAR_KEY, &id-
>mySidecarHandlep))) {
69. VMIOF_Log(VMIOF_LOG_ERROR,"VMIOF-Client(DiskOpen): no sidecar found on " "open
(%d)...\n",res);
70. /* NOTE: on failure mySidecarHandlep is left unchanged. That is, it should be NULL */
71. } else { /* open succeeded! */
72. VMIOF_Log(VMIOF_LOG_ERROR,"VMIOF-Client(DiskOpen): sidecar opened"
73. " handlep=%p\n",id->mySidecarHandlep);
74. /* go read the sidecar data */
75. if( VMIOF_SUCCESS != (res = VMIOF_DiskSidecarRead(id->mySidecarHandlep,
76. id->mySidecarp, MY_SIDECAR_SIZE,
(uint64_t)0))) {
77. VMIOF_Log(VMIOF_LOG_ERROR,"VMIOF-Client(DiskOpen): reading sidecar "
78. "data failed (%d), going to recreate it\n", res);
79. /* for now, just recreate the initial data */
80. bzero(id->mySidecarp,MY_SIDECAR_SIZE);
81. strncpy(id->mySidecarp->signature, "MySidecarSignature", MY_SIDECAR_SIG_SIZE-1);
82. id->mySidecarp->open_count = 1; /* we've had an open */
83. if( VMIOF_SUCCESS != (res = VMIOF_DiskSidecarWrite(id->mySidecarHandlep,
84. id->mySidecarp, MY_SIDECAR_SIZE,
(uint64_t)0))) {
85. VMIOF_Log(VMIOF_LOG_ERROR,"VMIOF-Client(DiskOpen): re-writing" " initial sidecar
failed (%d)\n",res);
86. } else { /* reinitializing write worked */
87. VMIOF_Log(VMIOF_LOG_ERROR,"VMIOF-Client(DiskOpen): re-initial" " sidecar
written\n");
88. }
89. } else { /* read succeeded */
90. /* add one to 'open_count' */
91. id->mySidecarp->open_count++;
92. /* now write it out */
93. if( VMIOF_SUCCESS != (res = VMIOF_DiskSidecarWrite(id->mySidecarHandlep,
94. id->mySidecarp, MY_SIDECAR_SIZE,
(uint64_t)0))) {
n Lines 3-6 : Declare local variables: res to hold the return value of type VMIOF_Status from function calls
(refer to section “Understanding VMIOF_Status Results for Functions in the VAIO,” on page 102),
heapSizeEstimate to get the estimated heap size to create, id as a pointer to the instance data
structure, etc.
n Lines 7-15 : Initialize the required variables and estimate the size of the heap to create using
VMIOF_HeapEstimateRequiredSize()
n Lines 16-19 : Create the heap using VMIOF_HeapCreate(). If not successful, log appropriate message and
return the failure via res
n Lines 20-22 : If successful in creating the heap, log appropriate message indicating success.
n Lines 23-27 : Allocate memory from heap for the instance data structure. If not successful, log
appropriate failure message and return VMIOF_NO_MEMORY
n Lines 28-34 : Upon successful allocation of instance id structure, zero it out and log its address.
n Lines 50-56 : CrossFD setup using SampleFilterCrossfdSetup(). Upon failure, clean up by closing
the daemon socket, freeing the instance data structure, and destroying the heap.
n Lines 57-64 : Allocate the memory mapping buffer for your sidecar using VMIOF_HeapAllocateAligned().
Upon failure, log an appropriate error message, then clean up by closing the daemon socket
connection, freeing the instance data structure, and destroying the heap.
n Lines 65-67 : Upon successful allocation of the mapping buffer, log its address.
n Lines 68-70 : Open the sidecar. Upon failure, log appropriate failure message.
n Lines 71-73 : Log appropriate message indicating successful opening of the sidecar.
n Lines 74-88 : Read the sidecar data. If not successful, log an appropriate message, recreate the initial
data, and write it to the sidecar using VMIOF_DiskSidecarWrite()
n Lines 89-101 : Upon successful read of sidecar data, write to sidecar using VMIOF_DiskSidecarWrite()
n Lines 102-110 : Create a workgroup with a single thread. If not successful, log a failure message followed
by a thorough cleanup, including removing the poll, releasing the memory for the sidecar and instance id
structure, and destroying the heap.
n Lines 111-115 : Log message indicating successful creation of the workgroup and return.
Long-running processing is not allowed in the diskClose callback. If your solution is a caching solution and
you have dirty disk data on SSD when diskClose is called, you should not flush the dirty data in this
callback. Instead, let your daemon open the disk using VMIOF_VirtualDiskOpen and flush the dirty data. For
how to use VMIOF_VirtualDiskOpen, see “Understanding and Using the VMIOF_VirtualDisk*
Functions,” on page 231.
Parameters :
n VMIOF_DiskHandle *handle - This input parameter is an opaque handle to the disk. Almost every callback
function has this handle to the disk as the first parameter.
Return Value :
n VMIOF_SUCCESS : diskClose() returns VMIOF_SUCCESS upon successfully closing the VMDK
Following is a code snippet describing the activities associated with diskClose event :
1 VMIOF_Status
2 SampfiltDiskClose(VMIOF_DiskHandle *diskHandle)
3 {
4 VMIOF_Status res = VMIOF_SUCCESS;
5 InstanceData_t *id;
6 VMIOF_HeapHandle *tmp;
7
8 /* announce we are here, and print the parameters */
9 VMIOF_Log(VMIOF_LOG_ERROR,"SampfiltLI(%s): handle=%p\n",__func__,diskHandle);
10 id = (InstanceData_t *)VMIOF_DiskFilterPrivateDataGet(diskHandle);
11 /* wait for all async I/Os to complete before continuing */
12 VMIOF_WorkGroupWait(id->workGroupp);
13
14 /* update close count and close the sidecar */
15 if(id->mySidecarHandlep) {
16 id->mySidecarp->close_count++;
n Line 15: Check to see if the sidecar handle is valid. If the IO Filter Framework invoked diskDetach()
before close, that callback will have set the sidecar handle to NULL before closing and deleting the
sidecar.
n Lines 26-34: Attempt to close the sidecar. Accept the return value of SampleFilterDiskClose() in local
variable res. Upon failure to close the sidecar log appropriate message.
n Lines 46 & 47: Time to free the instance structure. First, store the heap handle from the instance
structure in the local variable tmp; once you free the instance structure, you lose all its information
and would be unable to destroy the heap in the subsequent step. Now free the instance structure,
which is itself drawn from the heap.
n Line 48: Destroy the heap via VMIOF_HeapDestroy, passing it the variable tmp that holds the heap
handle.
The following figure summarizes the layout of a VMIOF_DiskIO structure, with members of the structure
defined in the following text:
n DiskResetIdentifier resetIdentifier — This is essentially a large integer. By default all IOs belonging
to a diskHandle have the same resetIdentifier. However, there might be cases where two IOs belonging
to the same diskHandle might have different resetIdentifiers.
The other flags may or may not be set with the VMIOF_READ_OP or VMIOF_WRITE_OP flag. When set, they
have the following meaning:
n VMIOF_VM_IO — As the comment suggests, if this flag is set, the IO request originated in a VM. This
contrasts with IOs that may originate in Filter Library code using VMIOF_DiskIOAlloc() (see
“Understanding and Using the IO Filters VMIOF_DiskIOAlloc Function,” on page 182).
n VMIOF_ZERO_COPY — The original semantics of this flag were "The IO has not been copied via
VMIOF_DiskIODup()." The implications were that the IO was coming from a VM, meaning that
the addr members of the VMIOF_DiskIOElem items (discussed later in this section) pointed at
memory in the guest VM. The implication of that was that filters could not change the data pointed
to (except when completing a VMIOF_READ_OP).
The semantics have changed to mean that, except when completing a VMIOF_READ_OP, the
buffer pointed to by the addr member of the VMIOF_DiskIOElem items is read-only. There are many
reasons why the framework may set this flag, including the original semantics (that the buffer is
guest memory). But Filter Library code creating new IOs (via VMIOF_DiskIOAlloc()) can also set this flag
to prevent filters lower in the stack from writing over their data.
NOTE VMIOF_DiskIODup() does not replicate this flag in the duplicated IO structure it creates.
n uint64_t offset — The offset into the file to start reading / writing data. The amount of data to read /
write, and the address of the RAM buffer, are in the VMIOF_DiskIOElem structures described later in
this list.
n uint32_t numelems — This member indicates how many VMIOF_DiskIOElem structures are in the array
pointed to by the elems member.
n uint64_t addr — The address in memory from which the data is to be written or to which the data
is to be read. To convert from addr to a pointer or a pointer to an addr, use the A2P() and P2A()
macros found in the proxy and sampcache samples.
NOTE The offset and the total length of an IO are always sector (512 bytes) aligned, but this may
not be true for the individual elements.
Return Value : Only two return values are permitted for this callback - VMIOF_SUCCESS and VMIOF_ASYNC. This
callback function is not allowed to block.
n SYNC processing — The LI performs its filter processing, if any, possibly including setting up a completion
callback, and then returns VMIOF_SUCCESS to the IO Filter Framework. In this case, the IO Filter
Framework continues the IO through the rest of the IO stack. This may mean passing the IO to the next
IO Filter (if there is one), or sending the request to the next kernel module in the IO stack. Before
returning one of the synchronous return codes, the LI must remove the IO from its list of owned IOs.
n ASYNC processing — You should process the request asynchronously if said processing involves calls
to remote services, such as Daemons for caching solutions or replication sites for replication solutions.
To process an IO asynchronously:
b Enqueue the IO to some service (a worker callback, a timer callback, or the Daemon)
In this case, the IO Filter Framework suspends processing of the IO request until the LI tells said
Framework that it is finished processing said IO, either completing or continuing it. Whatever code
completes processing the IO must also remove it from the LI's list of owned IOs.
Whether processing the IO synchronously or asynchronously, the callback may register a further callback
function for the IO Filter Framework to invoke when the IO request has been completed (for example when
the data has been read from or written to a VMDK). It does this by invoking
VMIOF_DiskIOCompletionCallbackSet(). In addition to the IO request and VMIOF_DiskHandle parameters,
this function takes a pointer to a function of type VMIOF_DiskIOCompletionCallback that you provide in your
LI, and a pointer to opaque data your LI can associate with this IO request.
NOTE The framework may invoke multiple completion callback functions at the same time.
VMIOF_DiskIOCompletionCallbackSet(VMIOF_DiskHandle *handle,
VMIOF_DiskIO *io,
VMIOF_DiskIOCompletionCallback callback,
void *data);
n VMIOF_DiskHandle *handle — The handle of the VMDK as passed into the diskIOStart callback.
n VMIOF_Status ioStatus —The status code with which the IO was completed.
NOTE The utility function and callback type are very similarly named. Remember that the utility function
ends in "Set" and the callback type does not.
The IO Filter Framework passes each of these parameters to your callback, plus the completion status of the
IO (for example did the read / write succeed, and did no other IO Filter fail it). The completion function can
further delay the IO by returning VMIOF_ASYNC, or continue the completion by returning
VMIOF_SUCCESS.
NOTE Generally you do not need to call VMIOF_DiskIOComplete() in the completion callbacks, but if you do,
the completion callback needs to return VMIOF_ASYNC.
Sometimes, an IO Filter may wish to short-circuit the normal flow of an IO request, completing the request
without having the rest of the IO stack process said request. For example, a caching solution may find the
desired data in its cache file, in which case there is no reason for the request to progress through the rest of
the IO stack to the VMDK. Call VMIOF_DiskIOComplete() to cause the IO Filter Framework to turn the IO
around and start it heading back to its requestor.
NOTE If an LI (say for filter X) has registered a completion callback on an IO request (via
VMIOF_DiskIOCompletionCallbackSet()), and a subsequent LI (say for filter Y) invokes
VMIOF_DiskIOComplete() on the same IO request, the IO Filter Framework invokes the completion callback
for filter X.
IMPORTANT If your filter requires stable data, remember to check the ioFlags member of the request's
VMIOF_DiskIO structure against VMIOF_ZERO_COPY. If the flag is set, the data pointed to by the address
members of said request's VMIOF_DiskIOElem elements can change while the IO is still in flight. So if your IO
Filter solution needs stable data in the IO request, it should use VMIOF_DiskIODup to create a duplicate of the
data and process the duplicate. This is similar to the bounce buffer on a Linux or BSD system, which copies
data for devices that have specific addressing requirements for DMA access.
NOTE ESXi will kill the VMX cartel if it is stuck for over 120 seconds in the IO processing thread. Please be
aware of time spent in both diskIOStart callback and VMIOF_DiskIOCompletionCallback callbacks.
Timers in the IO Filter Framework are used to invoke a specified function in an IO Filter with a specified
periodicity (though some latency can be expected, since ESXi is not a real-time operating
system). Examples of reasons to use timers in an IO Filter solution include:
n Creating timeouts for an ACK for each IO request sent to a replication site
n Invoking a function every 10 seconds or less to indicate the progress of a given long-running operation,
such as required by certain entry points including diskAttach(), diskSnapshot(), and diskVmMigration()
1 Define callback functions for your timer(s) as discussed in the sub-section “Understanding
VMIOF_TimerCallback(),” on page 172.
3 Create timers with specific periodicity and an associated callback function as discussed in the sub-
section “Understanding VMIOF_TimerAdd(),” on page 172.
4 If/when you no longer wish the timer to fire, remove it as discussed in the sub-section “Understanding
VMIOF_TimerRemove(),” on page 173.
The VAIO defines all of the data types and function prototypes it uses to implement timers in the file
vmiof_timer.h.
Understanding VMIOF_TimerHandle
The VAIO API provides an opaque data type VMIOF_TimerHandle to represent each timer you create in your
code. If your code uses a fixed number of timers, as in the case of a timer for a progress function, you may
consider declaring handles for such timers as regular variables. If your code needs to create timers
dynamically, as in the case of a timer per IO request being sent to a replication site, use dynamic memory
allocation to create space for the timer handles.
Because most VAIO timer functions require a pointer to a VMIOF_TimerHandle instead of an actual
VMIOF_TimerHandle, a typical declaration looks similar to the following:
VMIOF_TimerHandle *timerHandlep;
Understanding VMIOF_TimerCallback()
Whenever a timer fires (expires), the IO Filter Framework invokes a callback function that you specify when
you create the timer. All callback functions must be of type VMIOF_TimerCallback, which is defined in
vmiof_timer.h as:
That is, your VMIOF_TimerCallback must take a pointer to some data (the data itself is opaque to the
IO Filter framework) and return void. You associate the data pointer with the timer when you create the
timer. Continuing the preceding replication timeout example, the replication code could associate a pointer
to the IO request that it is sending to the replication site and on whose ACK said replication code is still
waiting. Alternatively, in some instances, solutions choose to associate no data, that is, a NULL pointer, with
a timer. In this latter case, when the IO Filter Framework invokes the callback, the data parameter is NULL.
NOTE In both LIs and Daemons, no blocking functions may be called in Timer callbacks.
Understanding VMIOF_TimerAdd()
You create a timer using VMIOF_TimerAdd(), which has the following prototype (in vmiof_timer.h):
VMIOF_Status
VMIOF_TimerAdd(uint64_t delay, VMIOF_TimerCallback callback, void *data,
VMIOF_TimerHandle **timer);
n void *data — A pointer to the data to associate with the timer. Again, this pointer is passed to callback by
the IO Filter Framework
n VMIOF_BAD_PARAM — The value of delay is too large; currently the maximum value is INT_MAX.
n VMIOF_NO_MEMORY — The IO Filter framework was unable to allocate the memory it needed to create the
timer
Understanding VMIOF_TimerRemove()
Once your code creates a timer, the IO Filter framework continues to fire the timer with the given
periodicity, theoretically forever. To stop it, your code must invoke VMIOF_TimerRemove() provided by the
VAIO. This function has the following prototype (in vmiof_timer.h):
VMIOF_Status
VMIOF_TimerRemove(VMIOF_TimerHandle *timer);
This function takes only one parameter, the VMIOF_TimerHandle * returned by VMIOF_TimerAdd().
n VMIOF_SUCCESS — The function succeeded in removing the timer. That is, the timer will no longer fire.
NOTE The framework synchronizes removing a Timer with its own lock, eliminating a race between the
removal of the timer and its firing.
0. char Mbuf[1024];
1. #define THIRTY_SECONDS (uint64_t)30000000
2. /* Declare the timer handle for the new timer to be created */
3. VMIOF_TimerHandle *SampFiltTimerHandlep;
4. void
5. TimerCallback(void *datap) {
6. IOFLOG1("It's Time!\n");
7. } /* TimerCallback() */
8.
9. VMIOF_Status
10.SampleFilterDiskOpen(VMIOF_DiskHandle *diskHandle, const VMIOF_DiskInfo *diskInfo) {
11. VMIOF_Status res;
12. …
13. /* create a timer */
14. if( VMIOF_SUCCESS != (res = VMIOF_TimerAdd(THIRTY_SECONDS, TimerCallback, NULL,
&SampFiltTimerHandlep))) {
15. /* couldn't create the timer */
16. sprintf(Mbuf,"creating timer failed (%d)",res);
17. IOFLOG1(Mbuf);
18. } else {
19. IOFLOG1("created timer\n");
20. }
21. …
22. } /* End of SampleFilterDiskOpen () */
23.
24. VMIOF_Status
25. SampleFilterDiskClose(VMIOF_DiskHandle *diskHandle) {
26. …
27. /* cancel the timer */
28. (void)VMIOF_TimerRemove(SampFiltTimerHandlep);
29. …
30. } /* end of SampleFilterDiskClose() */
n Line 0 : Defines a character buffer, Mbuf that the code uses for buffering log messages
n Line 1 : Defines THIRTY_SECONDS to be the number of microseconds in 30 full seconds. The code uses this
value for the periodicity of the timer.
n Line 3 : Defines a timer handle called SampFiltTimerHandlep that the code uses for its one and only timer
n Lines 4-7: Defines TimerCallback(), the callback function that the code associates with its timer on line 14.
The function has just one line of code, Line 6, which displays It's Time! when it gets invoked by the IO
Filter Framework
n Line 9-22 : Define SampleFilterDiskOpen(). Not shown in this code snippet, the Library Instance defines
this function as the entry point for the diskOpen event. Specifically:
n Line 11: Defines res to hold the return values from certain VAIO functions called by this function
n Line 14: Invokes VMIOF_TimerAdd() to create a new timer with a periodicity of THIRTY_SECONDS, that
calls TimerCallback() when the timer fires, passes NO data to TimerCallback() when the IO Filter
Framework invokes the function, and stores a handle to the timer in SampFiltTimerHandlep. The
code evaluates the return value of the invocation.
n Lines 15-17: Log an appropriate error message in the event of failure adding the timer on Line
14
n Line 19: Logs an appropriate message in the event the timer add on Line 14 succeeds
n Lines 24-30 : Define SampleFilterDiskClose(). Not shown in this code snippet, the Library Instance defines
this function as the entry point for the diskClose event. Specifically:
n Line 28: Cancels / removes the timer created / added on Line 14. Normally code should check the
return value of this function. For simplicity, this code discards the return value without checking
it
Thus, in general, whenever ESXi opens a VMDK covered by this filter, the code starts a timer that fires every
30 seconds, calling TimerCallback(). The timer continues to fire until ESXi closes the disk, at which time the
code cancels the timer.
One of the patterns for writing multi-threaded code is called work pile (see Pthreads Programming by Buttlar,
Farrell, and Nichols from O'Reilly Media, September 1996). This model contains:
1 A single queue that contains the objects of work to be done at any time, indicated by a function to call
and data to pass to said function.
3 A pool of threads that perform the work. Each thread pulls an item off the head of the queue, invokes
the indicated function, passing the indicated data. When the function finishes, the thread pulls the next
item from the queue, rinse and repeat, until there are no items in the queue.
The queue is called the work pile. One distinguishing attribute of this pattern is that it does not require a
thread to coordinate which worker thread performs which item of work. (That pattern is called a boss /
worker model by the previously referenced book.)
VAIO provides a set of functions that implement a work-pile pattern, though it uses the term work group.
The main reason to queue work to a work-pile in IO Filters is to prevent the current thread from blocking,
for example:
n Performing an asynchronous operation such as querying the Daemon for I/O processing
n There are a limited number of threads that can process poll events in a Library / Daemon (currently,
only one!). If you block in a poll thread, no other poll-driven processing can occur until that thread
unblocks. If the code that unblocks the poll code in turn depends on a poll event to run, the Solution
will deadlock.
n Increasing performance. If you block in certain functions, you prevent the Framework from submitting
additional events to the Solution. Thus, many events allow a VMIOF_ASYNC return to complete at a
later time. You use the work group functions to complete the event. Examples of this include:
n You may want diskDetach() to be done asynchronously, for example to flush cache data or remove
encryption, either of which may take significant time
n A replication filter may have to duplicate incoming IOs, send them to the replication sites, and wait
for acknowledgements; doing so synchronously blocks the caller. To avoid this blocking and allow
the IO Filter Framework to continue processing events, perform these long-running actions in
worker functions. The workers must inform the framework when they finish work for those events.
The pattern for using the work group VAIO functions is:
n Once — Declare a work group handle to keep track of the work group.
n Once — Create (allocate) a work group, specifying the maximum number of worker threads the
framework will allocate to work on the items in the queue.
n As necessary — Queue work items into the work group. For each item, specify the function that the
thread must call to perform the work, and the data on which the function must work.
n When desired, especially before shutting down / closing — Wait for all the items in the work group to
complete. You cannot add new work items to the queue while waiting for the existing items to
complete.
The remaining sub-topics discuss how to use the VAIO work group functions to implement this pattern.
Understanding VMIOF_WorkGroup
The VAIO API provides an opaque data type VMIOF_WorkGroup to represent each work group you create in
your code. You normally declare a handle to a work group as a regular static variable. Upon allocation of a new
work group, you use this handle for all subsequent references to the work group: for example, when you
want to enqueue a work item into the work group, wait until all the work items in the work group have
completed executing, or free the work group itself when you no longer need it.
Because many VAIO work functions require a pointer to the work group that they operate upon, a typical
declaration looks similar to the following:
VMIOF_WorkGroup *workGroupp;
Understanding VMIOF_WorkGroupAlloc()
Use VMIOF_WorkGroupAlloc() to allocate a new work group. Creating work groups is a means of grouping
logically related work items together. These work items are executed asynchronously by the group's
worker threads to accomplish their specified tasks. You can wait for the enqueued work items to finish
before proceeding further. This provides a mechanism for the master to coordinate the various tasks
that it is implementing via multiple work items. When VMIOF_WorkGroupAlloc() returns VMIOF_SUCCESS,
it returns an opaque handle to the VMIOF_WorkGroup in the group parameter. Work group creation can
fail due to a memory allocation failure, with a return value of VMIOF_NO_MEMORY.
VMIOF_Status
VMIOF_WorkGroupAlloc(uint32_t maxThrds, VMIOF_WorkGroup **group);
n uint32_t maxThrds— This input parameter indicates the maximum number of threads for this work
group.
n VMIOF_WorkGroup**group— This output parameter is the opaque handle to the work group that gets
allocated.
n VMIOF_SUCCESS — This value is returned upon successful creation of the work group.
n VMIOF_NO_MEMORY — This value is returned if work group allocation fails due to memory allocation
failure.
Understanding VMIOF_WorkQueue()
Use VMIOF_WorkQueue() to enqueue a work item into the work group queue. Each work item executes
asynchronously. The master is thereby able to coordinate the activities performed by the worker
threads. VMIOF_WorkQueue() returns VMIOF_SUCCESS upon successful execution, indicating that the work
item was indeed enqueued into the work group queue.
n VMIOF_WorkGroup *group — This input parameter is an opaque handle to the work group into which the
work item needs to be enqueued.
n VMIOF_WorkFunc func — This input parameter is a function pointer. It points to the function that will be
invoked to perform the actual task.
n void *data — This input parameter can be used to pass any user-defined data that VMIOF_WorkFunc
can operate upon.
Understanding VMIOF_WorkGroupWait()
VMIOF_WorkGroupWait() can be used by the master to wait for all the work items in the given work
group to finish before the master proceeds further. It takes, as its input parameter, a handle to the
group on which it needs to wait.
n VMIOF_WorkGroup*group — This input parameter is an opaque handle to the work group that is watched
for all its work items to finish execution.
Understanding VMIOF_WorkGroupFree()
When you no longer need the work group, free it using VMIOF_WorkGroupFree(). It takes a single
input parameter of type VMIOF_WorkGroup *, an opaque handle to the work group that needs to be freed. All
enqueued work items must have finished execution, or else you should have called VMIOF_WorkGroupWait(),
before you call VMIOF_WorkGroupFree().
n VMIOF_WorkGroup *group — This input parameter refers to the opaque handle of the work group that will
be freed.
IMPORTANT You should call VMIOF_WorkGroupWait() before calling this function. This function waits for any
existing work function to complete before destroying the work group.
1. VMIOF_WorkGroup *workGroupp;
2. …
3. void Worker(void *datap) {
4. int count = *(int *)datap;
5. sprintf(Mbuf,"worker: count is %d\n",count);
6. IOFLOG1(Mbuf);
7. return;
8. }
9. /* create a workgroup (work-pile thread model) with 1 thread */
10. if( VMIOF_SUCCESS != (res = VMIOF_WorkGroupAlloc(1 /* thread */, &workGroupp))) {
11. /* couldn't create the workgroup */
12. sprintf(Mbuf,"creating work group failed (%d)",res);
13. IOFLOG1(Mbuf);
14. } else {
15. IOFLOG1("created work group\n");
16. } /* if VMIOF_WorkGroupAlloc() */
17. …
18. /* add a call to the Worker function to the "work pile" pointed to by workGroupp (set in
DiskOpen) */
19. if( VMIOF_SUCCESS != (res = VMIOF_WorkQueue(workGroupp, Worker, (void *)&count))) {
20. /* well, that didn't work */
21. sprintf(Mbuf,"could not add work to the pile (%d).",res);
22. IOFLOG1(Mbuf);
23. } else {
24. IOFLOG1("added work to the pile\n");
25. }
26. …
27. VMIOF_WorkGroupWait(workGroupp);
28. VMIOF_WorkGroupFree(workGroupp);
29. …
n Line 1 : Declare a pointer to VMIOF_WorkGroup. It is an opaque handle to the work group that gets created
via VMIOF_WorkGroupAlloc().
n Lines 3-8 : This is the work item function that is passed to VMIOF_WorkQueue(). It
provides the functionality that the work item is expected to perform. In this example, the function
sends the current value of the count variable to the log.
n Lines 9-13 : Work group allocation happens here. The work group can accommodate one worker
thread. The opaque handle to the work group, workGroupp, is passed to VMIOF_WorkGroupAlloc() as an
output parameter. If VMIOF_WorkGroupAlloc() fails, a log message is sent conveying that work group
creation failed.
n Lines 14-16 : Upon successful execution of VMIOF_WorkGroupAlloc(), a message conveying successful
creation of the work group is sent to the log.
n Lines 18-22 : The work item Worker is enqueued into the work group workGroupp; count is
the parameter to be passed to the Worker function.
n Lines 23-25 : Upon successful execution of VMIOF_WorkQueue(), log a message to convey that work was
added to the work pile.
n Lines 27-28 : When the work group is no longer needed, you call VMIOF_WorkGroupWait() to wait for it to
complete and then VMIOF_WorkGroupFree() to free it, passing them the work group handle workGroupp.
VMIOF_Status
(*diskDeleteBlocksPrepare)(VMIOF_DiskHandle *handle, const VMIOF_DiskDeleteBlocksInfo *info);
n uint64_t offset — Offset (in bytes) to the first disk block to be deleted
This callback returns VMIOF_SUCCESS if the operation is allowed to proceed, else it returns an appropriate
error value.
The second callback is called to signal the status of the block deletion to the filter. It is invoked after a set of
virtual disk blocks has been deleted, or if the operation failed. The prototype for this callback is :
void
(*diskDeletedBlocks)(VMIOF_DiskHandle *handle, const VMIOF_DiskDeleteBlocksInfo *info,
VMIOF_Status status);
n VMIOF_DiskHandle *handle —This input parameter is an opaque handle to the disk and is valid only for
the filter that it is passed to.
These two callbacks are not allowed to perform long-running operations. If your solution is a caching
solution and has cached data for the blocks being deleted, you should hold the data, and only delete it
after diskDeletedBlocks is received.
NOTE While some blocks are being deleted, you may see parallel reads / writes from a guest OS, but they
should not target the same blocks.
As an example, the following simple code was added to the sampfilt example.
void
SampleFilterDiskDeletedBlocks(VMIOF_DiskHandle *handle,
                              const VMIOF_DiskDeleteBlocksInfo *info,
                              VMIOF_Status status)
{
   VMIOF_Log(VMIOF_LOG_ERROR, "In callback %s\n", __func__);
   VMIOF_Log(VMIOF_LOG_ERROR, "info->numBlockDescs = 0x%x\n", info->numBlockDescs);
   VMIOF_Log(VMIOF_LOG_ERROR, "info->descs[0]: offset = 0x%lx length = 0x%lx\n",
             (long unsigned int)info->descs[0].offset,
             (long unsigned int)info->descs[0].length);
   VMIOF_Log(VMIOF_LOG_ERROR, "status = 0x%x\n", status);
}

VMIOF_Status
SampleFilterDiskDeleteBlocksPrepare(VMIOF_DiskHandle *handle,
                                    const VMIOF_DiskDeleteBlocksInfo *info)
{
   VMIOF_Log(VMIOF_LOG_ERROR, "In callback %s\n", __func__);
   VMIOF_Log(VMIOF_LOG_ERROR, "info->numBlockDescs = 0x%x\n", info->numBlockDescs);
   VMIOF_Log(VMIOF_LOG_ERROR, "info->descs[0]: offset = 0x%lx length = 0x%lx\n",
             (long unsigned int)info->descs[0].offset,
             (long unsigned int)info->descs[0].length);
   return VMIOF_SUCCESS;
}
The code is then invoked using vmkfstools, which is an easy way to send the SCSI UNMAP command. The
filter has already been attached to test.vmdk.
The body of this callback must search for the IO being aborted to see if said IO is on the LI's list of owned
IOs. If so, it must invoke VMIOF_DiskIOComplete() on the IO being aborted with a status of VMIOF_IO_ABORTED,
and then return with that same status. If not, it must return VMIOF_NOT_FOUND. It may not return any other
values.
n Remove said IO from its list of owned IOs, as aborted IOs no longer exist
n Abort any subordinate IOs it may have created and submitted with VMIOF_DiskIOSubmit() by calling
VMIOF_DiskIOAbort()
Ensure the callback cancels workers, timers, etc. that may be associated with processing the IO as part of
cancelling it.
n VMIOF_DiskHandle *handle — This input parameter is an opaque handle to the disk and is valid only for
the filter that it is passed to. Almost every callback function has this handle to the disk as the first
parameter.
n VMIOF_DiskIO *io— This input parameter describes a disk IO request that should get aborted by this
filter.
Return Values:
n VMIOF_NOT_FOUND — The filter does not own the IO and therefore cannot abort it
NOTE The diskIOAbort() callback function is not allowed to block. It is also not allowed to defer the request
to another context (poll or worker); it must complete the IO with VMIOF_IO_ABORTED from within the calling
context if the IO is found.
NOTE If you want to test diskIOAbort(), you can simply delay your IO processing; this will trigger
diskIOAbort(). At some later point, the framework will issue the diskIOsReset() callback.
The body of this callback must search for its list of owned IOs to find any owned IOs whose resetIdentifier
matches the resetIdentifier passed into the callback. For each IO found with the matching resetIdentifier, the
callback must invoke VMIOF_DiskIOComplete() on it with a status of VMIOF_IO_ABORTED.
If a LI aborts an IO it owns during a diskIOsReset callback, it must remove said IO from its list of owned
IOs, as aborted IOs no longer exist. Ensure the callback cancels workers, timers, etc. that may be associated
with processing the IOs being reset.
Parameters:
n VMIOF_DiskHandle *handle — This input parameter is an opaque handle to the disk and is valid only for
the filter that it is passed to. Almost every callback function has this handle to the disk as the first
parameter.
n VMIOF_DiskResetIdentifier resetIdentifier — IOs associated with this identifier should get aborted
NOTE The diskIOsReset() callback function is not allowed to block. It is also not allowed to defer the request
to another context (poll or worker); it must complete the IOs with VMIOF_IO_ABORTED from within the
calling context.
NOTE If you want to test diskIOsReset(), you can simply delay your IO processing; this will trigger
diskIOAbort(), and at some later point the framework will issue the diskIOsReset() callback.
Parameters:
n VMIOF_DiskHandle *handle — The handle to the VMDK for which origIO is a request
n VMIOF_HeapHandle *heap — The handle to the heap to be used for allocating IO memory for the
duplicated request returned in outIO
Return Value:
n VMIOF_SUCCESS — The function succeeded and outIO points to a newly allocated disk IO object
n VMIOF_NO_MEMORY — The function could not allocate the memory necessary to duplicate the IO request
A common use of this function is to create a snapshot of an IO request whose ioFlags element includes
VMIOF_ZERO_COPY, as the data pointed to by the addr members of said request's VMIOF_DiskIOElem
elements can change while the IO is still in flight. If your IO Filter Solution needs stable data in the IO
request, it should use this function to create a duplicate of the data and process the duplicate.
Parameters:
n VMIOF_DiskHandle *handle — The handle to the VMDK for which origIO is a request
n VMIOF_HeapHandle *heap — The handle to the heap to be used for allocating IO memory for the
VMIOF_DiskIOElem structures allocated by the function
n uint32_t numElems — The number of VMIOF_DiskIOElem structures the function must allocate and place
in the structure pointed to by *outIO.
n VMIOF_DiskIO **outIO — The new IO request structure, allocated from heap. The structure will have
numElems VMIOF_DiskIOElem structures in it. However, none of the elements' members are set, nor is space
allocated for them. Further, the function sets the ioFlags member of the VMIOF_DiskIO structure to zero
(0).
Return Value:
n VMIOF_SUCCESS — The function succeeded and outIO points to a newly allocated disk IO object
n VMIOF_NO_MEMORY — The function could not allocate the memory necessary to fulfill the request
A common use for this function is to create and issue IOs against VMDKs from within Daemons that have
performed a VMIOF_VirtualDiskOpen() on said VMDK.
VMIOF_Status
VMIOF_DiskIOSubmit(VMIOF_DiskHandle *handle, VMIOF_DiskIO *io);
The function returns VMIOF_SUCCESS on success, or some other status indicating why it failed (for
example VMIOF_BAD_PARAM for an invalid pointer to the IO structure).
IMPORTANT The IO Filter Framework provides a default completion mechanism that frees the IO structures
allocated by VMIOF_DiskIOAlloc() or VMIOF_DiskIODup(). If you want to process the IO after it is
completed, you must call VMIOF_DiskIOCompletionCallbackSet() before you invoke
VMIOF_DiskIOSubmit().
NOTE Filters using the VMIOF_DiskIOSubmit() interface should not overload the storage hardware. Your
code should watch for latencies and, if they rise, decrease the load it is placing on the hardware.
VMIOF_Status
VMIOF_DiskIOFree(VMIOF_DiskHandle *handle, VMIOF_DiskIO *io);
n VMIOF_DiskHandle *handle — The handle passed into the function that created the IO request
n VMIOF_DiskIO *io — The IO to free that was previously allocated either via VMIOF_DiskIOAlloc() or
VMIOF_DiskIODup()
This function returns VMIOF_SUCCESS after freeing the IO request, or a descriptive result on error.
IMPORTANT This function frees the data structure pointed to by io. It also frees the memory pointed to by
the addr members of the IO request's VMIOF_DiskIOElem structures if io was allocated by
VMIOF_DiskIODup(), but not if it was allocated by VMIOF_DiskIOAlloc().
NOTE A VMIOF_DiskIO cannot be reused. When an IO completes, you must free the VMIOF_DiskIO structure
and allocate new structures for any other IOs to be submitted.
VMware expects you to do three specific things in your IO Filter Solutions to get better performance (than
not doing them):
n Use the VMIOF_Crossfd* functions to share memory between your LIs and your Daemon
n Use VMIOF_AIO* functions to perform asynchronous IO to and from cache files and buffers in your
Daemon or memory in your Crossfd-shared memory
n After the Daemon makes off-host TCP/IP connections, it should pass the file descriptors for those
sockets to the LIs (using standard fd-passing techniques through UNIX domain sockets), so that said
LIs can communicate directly with off-host entities. This obviates the need for the Daemon to proxy IOs
between the LIs and off-host entities.
The purpose of CrossFD functions is to allow the daemon to perform disk IO on behalf of a filter instance,
without needing to copy data. In other words, the purpose is to share memory between entities within a
filter, but using the file IO abstraction instead of traditional shared memory APIs such as mmap() or System V
shared memory segments.
This set of functions creates a special kind of file descriptor, called a crossfd, that can be used to read and
write memory of the cartel that creates it. Said cartel then associates with the crossfd the specific memory to
which it is willing to grant access. It then passes the crossfd to other cartels (using a UNIX domain socket
sendmsg()) with which it is willing to share the associated memory. The other cartels, after receiving the
crossfd (using a UNIX domain socket recvmsg()), can use the VAIO asynchronous IO (AIO) functions to
perform IO directly to the associated memory in the sharing cartel; they can also use file IO functions such as
pread()/pwrite() to access that memory.
c Send the crossfd via a UNIX domain socket to another cartel using sendmsg().
2 In the cartel that wishes to access the shared memory (typically a Daemon):
b Choice 1: Use AIO functions to perform IO directly to memory represented by the crossfd. When
the kernel processes such AIO requests, it writes directly into the address space of the cartel that
created the crossfd. For more details about AIO, see “Understanding and Using the IO Filters AIO
Functions (for cache file IO),” on page 186
NOTE The purpose of CrossFD is to allow the Daemon to perform AIO on behalf of a filter instance. If AIO
is not needed, pread() and pwrite() can be used to access data through the crossfd, but because every access
introduces an additional system call, performance is probably not as good as expected. As a preferable
alternative, System V shared memory segments can be used. System V shared memory comes directly from
the kernel, so the memory itself is not limited; however, the page-table bookkeeping overhead comes from a
different resource pool. For the VMX it comes from the uwshmempt page pool (6.0 U2 release); for other
cartels, it comes from resource pools with a fixed limit. You do not need to account for the System V shared
memory itself, but you do need to account for the page-table overhead in DAEMON_MEMORY_RESERVATION for the
Daemon and in diskRequirements() for LIs.
The doxygen for these functions provides details on their syntax. The following sub-sections provide a
synopsis of that information:
As shown, the function takes a single output parameter, pcrossfd, a pointer to an integer that is the file
descriptor of the crossfd.
n VMIOF_SUCCESS — The function call succeeded and int *pcrossfd now points to a crossfd.
n VMIOF_NO_RESOURCES — The function call failed, and the file descriptor was not created, because the
system did not have enough resources. The value pointed to by pcrossfd is undefined.
NOTE You can use the standard close() function on the crossfd.
VMIOF_Status
VMIOF_CrossfdGrantAccessToRange(int crossfd, uintptr_t start, unsigned long length);
n uintptr_t start — The starting address of memory range (within the calling cartel) to which you are
granting access.
REMEMBER The size of a uintptr_t varies with the execution environment. Typically, in 32-bit
environments it is 32 bits, while in 64-bit environments it is 64 bits.
n unsigned long length — The amount of memory, starting at start, to which you wish to grant access.
n VMIOF_ALREADY_EXISTS — The grant failed because a range covering the entire region or a part of it
already exists.
n VMIOF_BAD_PARAM — The grant failed because one of the parameters is invalid (for example start is an
invalid address)
n VMIOF_NO_RESOURCES — The grant failed because the system did not have enough resources to keep track
of it internally
VMIOF_Status
VMIOF_CrossfdRevokeAccessToRange(int crossfd, uintptr_t start, unsigned long length);
As shown, this function takes the same parameters as VMIOF_CrossfdGrantAccessToRange(), with the same
semantics. The difference is that start and length define the range of memory to which the function revokes
access for other cartels.
The return values are similarly analogous, except that VMIOF_NO_RESOURCES is not a value returned by the
function.
Understanding and Using the IO Filters AIO Functions (for cache file IO)
The VAIO provides a set of functions to manage asynchronous IO transactions to cache files (see
“Understanding and Using the VMIOF_Cache*() Functions,” on page 253) using scatter / gather lists that
increase your IO Filter performance vs alternative IO methods. For example, without the AIO functions, a
Daemon for a caching Filter that wants to write n sets of blocks to a cache file would have to invoke pwrite()
(or similar function) n separate times, waiting for each to complete, blocking the Daemon each time,
probably causing context switches for each. Using the AIO functions, the Daemon can create a list of n AIO
scatter / gather structures (of type VMIOF_AIO), and then submit the list to the IO Filter Framework with a
single function invocation, and then receive and process callbacks as said Framework completes each of the
items on the list.
NOTE The AIO functions and data structures in IO Filters are analogous to, but different from, those
defined by POSIX.
Further, the VAIO AIO functions have been enhanced to allow (but not require) code to perform IO between
a file (such as a cache file) and a crossFD file (shared memory) rather than just to a cartel's memory. This
further increases performance in IO Filter component by eliminating the need to copy data between cartel-
private and cartel-shared memory.
NOTE The older version of these functions forced the developer to use the poll callback mechanism, while
this is no longer the case.
NOTE While you can't use AIO with a socket, you are able to use crossFD with sendfile()
This topic provides a discussion of the data structures and functions related to performing AIO within IO
Filter Daemon and Library components.
The comments in the code above describe the usage of each member.
NOTE You can perform vectored IO via VMIOF_AIO_READV or VMIOF_AIO_WRITEV. This would allow you to
only have one VMIOF_AIO structure.
NOTE Each AIO request must be a multiple of 512 bytes. For vectored IO, the overall length of all the
vectors must meet this criterion, since together they compose one IO request.
The type VMIOF_AIOCallback defines the prototype for all callback functions, which is:
n void *data — The value passed in the data member of the VMIOF_AIO structure associated with the IO
that just completed
n VMIOF_NO_CONNECTION — No connection to the device; you can retry, but only after some delay.
The comments in the code above describe the usage of each member.
n VMIOF_AIOContextProperties *props — The properties from which the function allocates the context
structure
n VMIOF_NO_MEMORY — The function could not create the context because of insufficient memory. Either the
specified heap was out of space or the system failed the underlying memory allocation to the heap
n VMIOF_NO_RESOURCES — The function could not create the context because it lacked some resource other
than memory.
When the function returns any value other than VMIOF_SUCCESS the value in pcontext is undefined. You can
retry the function a few times with a delay before finally giving up.
n uint32_t submitted — This optional parameter gets the number of AIOs that were submitted. If the
system did not have all the available resources, this parameter will reflect the index of the first AIO that
was not submitted.
n VMIOF_BAD_PARAM — The AIOs were not submitted because a parameter was malformed.
NOTE If submitted is specified, the framework will not try to submit the remaining IOs and instead return
the number of submitted IOs. If submitted is not specified the AIOs that were not submitted will have their
completion callback invoked with an error status.
NOTE There is no strict limit for the number of AIOs you submit, but you might hit fast slab allocation
failures if you have more than 4k or 8k outstanding IOs, since they are all allocated from the same slab on a
per context basis. If you want to go way beyond that, you should create individual contexts and should be
able to push the IO limits well into the 16k to 32k range.
n VMIOF_AIOContext *context — the context to which the aio was previously submitted
n uint64_t maxIOs — The maximum number of IOs for which the completion callback will be called.
NOTE The caller of this function is responsible for calling eventfd_read() to determine the maxIOs value.
n VMIOF_AIOContext *context — the context to which the aio was previously submitted
n VMIOF_Status *aioStatus — the status of the AIO only in the case where the AIO is complete, but the
callback has not been called.
n VMIOF_SUCCESS — The AIO was aborted. A completion callback for this IO has not been and will not be called.
1 Create a new AIO context in a Daemon's Start callback, and destroy it in the Cleanup callback.
2 For each IO operation in a group of operations (for example, for each VMIOF_DiskIOElem in a
VMIOF_DiskIO structure):
3 Submit the AIOs using the VMIOF_AIOSubmit() function. As the IO Filter Framework process each AIO, it
calls the appropriate callback.
2 When creating the context, set the eventfd and set the flag to VMIOF_AIO_CONTEXT_CUSTOM_EVENTFD
5 You will get called in your eventfd Poll callback whenever there are pending IOs. The Poll callback
needs to:
a Call eventfd_read(eventfd, &count) to get the number of AIOs that require completion
2 When creating the context, set the eventfd and set the flag to VMIOF_AIO_CONTEXT_CUSTOM_EVENTFD
4 Sit on a blocking call (e.g. a blocking eventfd_read()) and loop until all AIOs are complete
VMIOF_AIOSubmit(aios, naios);
pending = naios;
while (pending > 0) {
   /* block waiting for at least one AIO to finish */
   eventfd_read(fd, &count);
   /* process completions */
   VMIOF_AIOProcessCompletions(ctx, count);
   pending -= count;
}
1 When creating the context, set the eventfd == -1 and set the flag to VMIOF_AIO_CONTEXT_FLAGS_NONE
3 You will get called in your AIO callback for each completed AIO.
4 You don't need to call VMIOF_AIOProcessCompletions(), as the AIO is complete once your callback is
called. In fact, it is strictly forbidden to call VMIOF_AIOProcessCompletions() if you do not specify a
custom eventfd.
NOTE In our internal testing, we observed the best performance from the second option (custom eventfd
and no poll callback). The thread that submits the IOs should be the one that waits for them and processes
their completions.
The programming pattern between LIs and Daemons deserves further explanation in cache solutions.
Generally, in this case, the Daemon will create and manage the cache file, but the LI receives the
diskIOStart callbacks with the IO requests. Thus, the transaction involves two programming patterns: One
in the LI; one in the Daemon, as follows:
In the LI:
1 Create a set of buffers to share memory between the LI and Daemon. (Optional: You can directly map
the guest's IO buffers)
2 Create a crossFD and grant access to the buffers created in the previous step as described in the topic
“Understanding and Using the IO Filters CrossFD Functions,” on page 184. Do this in the LI's diskOpen
callback. Close the resulting file descriptor in the LI's diskClose callback. Send the crossFD file
descriptor to the Daemon.
3 In the diskIOStart callback, send a control message to the Daemon indicating how many IO requests to
perform to fulfill this IO's request. Then, for each VMIOF_DiskIOElem in a VMIOF_DiskIO structure that you
want the Daemon to fulfill:
a For read operations, assign a buffer in shared memory into which the Daemon must write, and send
this information to the Daemon in a control message
b For write operations, assign a buffer in shared memory, copy the data from the VMIOF_DiskIOElem
into said buffer, and then send this information to the Daemon in a control message
In the Daemon:
1 For each LI connection, create an AIO context. Destroy it, performing any appropriate cleanup, when
the LI disconnects. Receive the crossFD from the LI during initial handshake with it.
b Allocate a new VMIOF_AIO structure and populate its members with the addresses of the buffers,
lengths, etc. specified in the control packet received in the preceding step. Remember to set the
crossFD member to the one received at handshake.
d Upon receiving all of the IO elements for a given request, submit the list.
IMPORTANT Your code must provide any synchronization necessary between submit / drain operations and
abort operations. For example, extending the preceding programming pattern, if a Daemon receives a
control message to abort an IO element after it has been submitted, said request can arrive while the callback
for the given IO element is running. Therefore the code must synchronize access to any data structures
shared between the callback and abort code.
NOTE You need to protect against data leakage from a VM by ensuring that a LI can’t read data in the cache
that is from a different VM. Typically, you do this by using some level of indirection in your cache
implementation.
n NFS and VMFS-based datastores use delta disks for VMDKs. Each time a snapshot happens:
a The existing VMDK is frozen. No new writes are allowed to this VMDK file. It is now called a
parent VMDK.
b The hypervisor creates a new VMDK file, called the child (relative to the parent discussed in the
preceding step). All writes to the virtual disk occur to this child VMDK file. This is why it is called a
delta disk.
The hypervisor satisfies reads by first checking for the block in the child VMDK file. If the child file
does not contain the block, the hypervisor looks for the block in the parent VMDK file.
c The virtual disk now consists of this chain of parent and child VMDK files. The VMDK file that
exists before any snapshots are taken is referred to as the base VMDK.
Should someone take another snapshot, the hypervisor repeats these three steps, and the virtual disk then
consists of a chain of three VMDK files, and so on. Reads may take longer now because the hypervisor may
have to check three files before it finds the block it needs. Long chains of delta disks can have a significant
impact on the performance of a VM.
n For native snapshots (VVOL and NFS VAAI), no delta disk is created; rather, the whole disk is copied,
so you will not see the disk chain opened.
The IO Filter Framework passes several snapshot-related events onto IO Filters, including:
n Taking a Snapshot — When an administrator or program takes a snapshot on a VM, the IO Filters
attached to the VM's VMDKs are notified of the snapshot event (unless the VMDK is marked as
independent by the vSphere Administrator). A successful snapshot operation results in the host having a
complete image of the VM's state, including that of its VMDKs, such that the vSphere environment may
later return (revert) the VM to this state at will. For that to be possible, each IO Filter must do its part to
make the VMDKs to which they are attached clean and complete so that the snapshot can proceed, or
fail the event to prevent the snapshot from happening.
n Deleting a Snapshot — When an administrator or program deletes a snapshot, the hypervisor takes the
data from that snapshot and rolls it forward into the earlier VMDK file. After the snapshots are deleted,
the IO Filters attached to the VM's VMDKs are notified of a disk collapse event (unless the VMDK is
marked as independent by the Administrator). They must determine which snapshots were deleted and
update their state accordingly.
The following subsections provide details for the callbacks invoked for snapshot and collapse events. They
also discuss functions a filter can use to scan for blocks that have changed since the most recent snapshot.
NOTE Currently we don't support attaching filters to a VM/VMDK that has existing snapshots.
Parameters:
n VMIOF_DiskHandle *handle — This input parameter is an opaque handle to the disk and is valid only for
the filter that it is passed to. Almost every callback function has this handle to the disk as the first
parameter.
Return Value:
n VMIOF_SUCCESS — The function succeeded in handling the reported phase of the snapshot
n VMIOF_ASYNC — The filter has asynchronous work to perform before it is ready for the snapshot (see the
prepare phase discussion below)
NOTE This return value is only allowed in the prepare phase. It must not be used in the notify or failure
phase.
n Any other value will indicate failure, which will fail the snapshot operation. Returning an error code is
only allowed in the phases of VMIOF_SNAPSHOT_PREPARE and VMIOF_SNAPSHOT_NOTIFY, while the phase
VMIOF_SNAPSHOT_FAIL may only return VMIOF_SUCCESS.
NOTE Usually the IO Filter framework expects filters to fail the snapshot operation in the
VMIOF_SNAPSHOT_PREPARE phase. Remember that there are cases where VMIOF_SNAPSHOT_PREPARE is not
delivered, e.g. when vSphere creates VMDKs for linked clone VMs. In such cases filters may fail the
snapshot operation in VMIOF_SNAPSHOT_NOTIFY phase.
Prepare Phase — During the prepare phase, the handler is invoked with the phase set to
VMIOF_SNAPSHOT_PREPARE. In this phase, the filter instance should stun all its background operations and
flush all its dirty data to disk to make the disk consistent. It may also perform additional work that is
required by a snapshot. IOs to the disk must continue to be processed while this notification is in progress.
On completion, the filter must continue to process IOs keeping the disk in a consistent state until the disk is
closed or a VMIOF_SNAPSHOT_FAIL notification is received. There is no guarantee that the snapshot will be
created immediately after filter finishes its prepare stage. Note that neither VMIOF_SNAPSHOT_NOTIFY nor
VMIOF_SNAPSHOT_FAIL may be seen in the event of a catastrophic failure.
If there is no asynchronous work required by the filter instance, a value of VMIOF_SUCCESS is returned. This
is the state where no action is needed and everything is ready for the next phase.
When there is work to be done by the filter instance, a value of VMIOF_ASYNC is returned. The filter should
stun all its background operations and flush all its "dirty data" to disk to make the disk consistent. The filter
instance must report the progress of its actions continuously to the Filter Framework using the
VMIOF_DiskOpProgressFunc() pointer in the VMIOF_DiskSnapshotInfo structure. This function includes two
parameters, the disk handle and the percentage complete, and allows for updating the progress on a
granularity of one percent. This function can report the same progress multiple times but decreasing the
progress is prohibited. Progress is reported approximately every 10 seconds until the necessary actions are
completed.
Completion of work is indicated to the Filter Framework using the VMIOF_DiskOpCompleteFunc() pointer in
the VMIOF_DiskSnapshotInfo structure. If the filter instance returns VMIOF_SUCCESS for this function, the Filter
Framework is allowed to proceed with the snapshot. However, if the filter sets any other status value, the
Prepare Phase will end for all filters and the snapshot will be terminated.
Once all filters have returned VMIOF_SUCCESS, the Prepare Phase is over. At this point, all disks associated
with the VM are stunned/closed. The Filter Framework issues a diskStun()/diskClose() to all the filters.
Notify Phase — The following diagram (taken from a VMware KB Article) shows an overview of a disk
with 3 snapshots.
Assume we take the first snapshot. After the snapshot, we will have the parent VMDK file vm.vmdk and the
child VMDK file vm-001.vmdk. All sidecar files are also copied for the child VMDK file. So if you have a
sidecar file vm.vmfd for vm.vmdk, then after the snapshot you will have a copy of vm.vmfd, named
vm-001.vmfd, associated with the child VMDK file vm-001.vmdk.
In case of failure, the diskSnapshot() callback will be seen with the phase VMIOF_SNAPSHOT_FAIL. On failure, a
filter instance can resume background work and the disk is not required to be in a consistent state.
First, the child VMDK vm-001.vmdk will be opened with VMIOF_DiskFlags in VMIOF_DiskInfo set to
VMIOF_DISK_NO_IO. At this time, IOs to the disk are not allowed, but IOs to sidecar files are allowed. All
access to the sidecar files will go to the ones associated with the child VMDK, which is vm-001.vmfd in our
example. If a filter needs to track parent–child disk relationships, VMware recommends updating the child's
sidecar so that the relationship can be inferred from it. Then vm-001.vmdk will receive the diskSnapshot
callback with VMIOF_DiskSnapshotPhase in VMIOF_DiskSnapshotInfo set to VMIOF_SNAPSHOT_NOTIFY, and the
diskClose callback afterwards.
Then, the whole VMDK chain, vm.vmdk and vm-001.vmdk, will be opened together. In other words, in
VMIOF_DiskInfo, linksInChain will be 2, while filesInChain will refer to the two VMDK path names,
vm.vmdk and vm-001.vmdk. Access to the sidecar files will continue to go to vm-001.vmfd. At this point all
IOs are still suspended until the diskUnstun callback is received. Then the VMDKs will resume seeing IOs.
NOTE A Linked Clone disk is implemented through the Disk Snapshot mechanism, so the LI should only
expect a diskSnapshot callback, not a diskClone callback. In the current release there is no way to
distinguish a Snapshot from a Linked Clone, but in the next release a flag will be added to the diskSnapshot
callback to distinguish them. For a Linked Clone, the LI will not receive diskSnapshot PREPARE, only
diskSnapshot NOTIFY. A caching solution does not need to flush dirty data in the Linked Clone case: the
Linked Clone is built on top of a Snapshot, so the dirty data should already have been flushed as part of the
Snapshot process.
NOTE Independent-Nonpersistent VMDKs are also implemented using Snapshots. When a VM is powered
on, the base Independent-Nonpersistent disk is opened in read-only mode, then a delta disk is created as
the "redo log", the diskSnapshot callback is delivered with phase NOTIFY, and then the disk chain is opened
for IOs. When the VM is powered off, the delta disk is deleted and its contents are discarded, causing the
diskDetach callback to be invoked with the VMIOF_DISK_DETACH_DELETE flag set.
The following is an example of log messages from vmware.log during a snapshot operation while the VM is
running. In this case, the disk name is cent55.vmdk:
Quiesce Snapshot — When customers take a snapshot, they have the option to "Quiesce the guest file
system". vSphere needs VMware Tools to orchestrate this application-level quiescing inside the guest
operating system. After application-level quiescing is complete, applications are allowed to modify the
quiesced state of the disk, mostly to clean up unwanted data. Our snapshots are read-only, so to allow
applications to write to the quiesced state, we create one more child from the quiesced disk state. Then we
hot-add that disk to the guest so that applications can write to it. The second disk will always be opened
with SYNC flags set, and a caching-based IO Filter Solution should write through to this disk. After the
snapshot operation is complete, the snapshot points to the second child; the IO Filter Solution can then open
this second child and the base disk as a chain for a consistent snapshot image backup.
Currently, writable snapshots are taken only on Win2k3, Win2k8, Win2k8r2, and Win8server.
Both VMFS and NFS Native Snapshots create a second child disk for a quiesced snapshot. On VVOL,
however, no second child disk is created for a quiesced snapshot; instead, the parent disk sees the writes.
Reverting a Snapshot
A snapshot revert creates a new child disk by invoking the snapshot callback, and deletes the current
running point using the detach callback. The following is an example of the series of callbacks invoked
using the vSphere Web Client GUI.
In this example, a VM (TVM) was created and two snapshots, snap1 and snap2, were taken. The VM was in
a powered-on state. Using the vCenter GUI, the "Revert to the latest snapshot" operation was performed.
A new snapshot is created based on the chain of TVM.vmdk and TVM-000001.vmdk, by forming a new
child TVM-000003.vmdk. After that, the child disk TVM-000002.vmdk is deleted. This is reflected in
hostd.log:
n VMIOF_FAILURE — This return value indicates a failure. The framework will fail the disk collapse
operation, but the link hierarchy will be unaffected. Because the deletion of any delta disk (or analogous
object on VVOLS) has already occurred and cannot be reverted, the parent link now contains the
combined disk contents of the child and parent.
There is a consolidation of disks (if this wasn’t the tail of the snapshot chain), in which the snapshot that
followed the deleted snapshot is consolidated in order to maintain consistency. For example, suppose that at
time T, a virtual disk had the following chain:
Suppose an administrator (or program) deletes Snap 2. To effect this, the hypervisor takes the contents of
LinuxVM-000002.vmdk and writes them into LinuxVM-000001.vmdk, then moves the label of Snap 3 to
LinuxVM-000001.vmdk. For convenience, we express the combined disk as LinuxVM-000001.vmdk
+LinuxVM-000002.vmdk. There are no changes to the base disk or LinuxVM-000003.vmdk.
Let's look at a real example showing what the IOFilter Framework does and which Library callbacks are
invoked during disk consolidation while a VM is running.
Before the operation, the disk chain, consisting of the parent disk "cent55.vmdk" and the child disk
"cent55-001.vmdk", is open and accepting IOs. There are two sidecar files, "cent55.vmfd" and "cent55-001.vmfd",
associated with the parent and child VMDK respectively. Sidecar access goes to the child file, "cent55-001.vmfd".
During disk consolidation operation, the IOFilter Framework takes the following steps:
1 Since the VM is running, the Framework must first stun the VM. It does this by invoking the diskStun
and then the diskClose callback on the disk chain.
2 In order to consolidate the two VMDKs, both are opened and then closed separately.
3 The framework then opens the disk chain and unstuns it, so that the VM can resume running.
4 Consolidation occurs by copying the content from "cent55-001.vmdk" to "cent55.vmdk" while the VM is
actively running.
5 The framework then stuns the VM again by invoking the diskStun and the diskClose callbacks to the
chain. It then copies the content of sidecar file "cent55-001.vmfd" to "cent55.vmfd" replacing the original
content.
6 The framework then opens the disk chain, invokes the diskCollapse callback, and then closes the disk
chain.
7 It then updates the vmx file to point to the consolidated disk "cent55.vmdk".
8 The next step is to open the child "cent55-001.vmdk" with VMIOF_DiskFlags in VMIOF_DiskInfo set to
VMIOF_DISK_NO_IO. The diskDetach callback is then invoked with the VMIOF_DiskDetachFlags in
VMIOF_DiskDetachInfo set to VMIOF_DISK_DETACH_DELETE. The framework then closes the child disk.
9 The last step involves opening the parent disk (cent55.vmdk), unstunning it, and beginning to issue
IOs. Access to the sidecar file is now directed to "cent55.vmfd".
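The filter's side of step 6 might be sketched as follows. This is only a sketch: the VMIOF_* definitions are minimal stand-ins (the real disk handle is opaque), and FilterDiskState with its cache counters is purely hypothetical bookkeeping for a caching filter, not part of the VMIOF API.

```c
#include <stdint.h>

/* Minimal stand-ins so this sketch compiles on its own. */
typedef int VMIOF_Status;
#define VMIOF_SUCCESS 0
#define VMIOF_FAILURE 1

/* Hypothetical per-disk filter state. A real filter would reach its
 * own state through instance data, not through the opaque handle. */
typedef struct {
    int cacheEntriesForChild;
    int cacheEntriesForParent;
} FilterDiskState;

/* After the framework has copied the child's content and sidecar into
 * the parent (steps 4 and 5 above), a caching filter's diskCollapse
 * handling might fold the child's cache metadata into the parent's. */
VMIOF_Status
HandleDiskCollapse(FilterDiskState *state)
{
    state->cacheEntriesForParent += state->cacheEntriesForChild;
    state->cacheEntriesForChild = 0;
    /* Returning VMIOF_FAILURE here would fail the collapse operation,
     * though the link hierarchy itself would be unaffected. */
    return VMIOF_SUCCESS;
}
```

The key point is that by the time diskCollapse runs (step 6), the two VMDKs and their sidecars have already been merged, so the filter only reconciles its own metadata.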
NOTE These two callbacks only concern cached write data, not the read cache. Since all write data goes to the
current delta disk, if currentDelta is not set, the filter can simply ignore these two callbacks.
The framework invokes diskExtentGetPre before a disk extent get operation (performed by vSphere).
VMIOF_Status
(*diskExtentGetPre)(VMIOF_DiskHandle *handle, VMIOF_DiskExtentGetInfo *info);
n VMIOF_DiskHandle *handle —This input parameter is an opaque handle to the disk and is valid only for
the filter that it is passed to.
n const VMIOF_DiskExtentGetInfo *info — This input parameter is a pointer to a structure describing
the disk extent data. The structure is defined as follows:
n uint64_t extentOffset —The offset, in bytes, of the found extent post scan.
n uint64_t length —The length, in bytes, of the found extent post scan.
n bool currentDelta — Indicates whether the scan includes current disk state.
This callback returns VMIOF_SUCCESS if the operation is allowed to proceed, else it returns an appropriate
error value.
The framework invokes the diskExtentGetPost callback after vSphere completes a disk extent get operation on a
VMDK.
VMIOF_Status
(*diskExtentGetPost)(VMIOF_DiskHandle *handle, VMIOF_DiskExtentGetInfo *info);
n VMIOF_DiskHandle *handle —This input parameter is an opaque handle to the disk and is valid only for
the filter that it is passed to.
n const VMIOF_DiskExtentGetInfo *info — This input parameter is a pointer to a structure describing
the disk extent data. The structure is defined as follows:
n uint64_t extentOffset —The offset, in bytes, of the found extent post scan.
n uint64_t length —The length, in bytes, of the found extent post scan.
n bool currentDelta — Indicates whether the scan includes current disk state.
This callback returns VMIOF_SUCCESS if the operation is allowed to proceed, else it returns an appropriate
error value.
NOTE For a caching solution, the filter only needs to change extentOffset and length (if needed) in
diskExtentGetPost, not in diskExtentGetPre. For example, suppose a diskExtentGetPost callback reports
startOffset=0x1000, extentOffset=0x1400, length=0x300. If the filter is caching a block at 0x1200 with
length=0x100, it must update extentOffset to 0x1200 and length to 0x100. If it is caching a block at 0x1200 with
length=0x200, it must update extentOffset to 0x1200 and length to 0x500.
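The merge rule in the NOTE above can be expressed as a small helper. This is a sketch only: CachedBlock and MergeCachedExtent are hypothetical names, not part of the VMIOF API; only the arithmetic follows the two examples in the NOTE.

```c
#include <stdint.h>

/* Hypothetical record of one cached block held by the filter. */
typedef struct {
    uint64_t offset;
    uint64_t length;
} CachedBlock;

/* Widen the extent reported to diskExtentGetPost so it also covers the
 * cached block when the two regions touch or overlap; otherwise report
 * whichever region starts first, per the NOTE's two examples. */
void
MergeCachedExtent(uint64_t *extentOffset, uint64_t *length,
                  const CachedBlock *cached)
{
    uint64_t extEnd = *extentOffset + *length;
    uint64_t cacheEnd = cached->offset + cached->length;

    if (cacheEnd >= *extentOffset && cached->offset <= extEnd) {
        /* Regions touch or overlap: report the union. */
        uint64_t start = cached->offset < *extentOffset ?
                         cached->offset : *extentOffset;
        uint64_t end = cacheEnd > extEnd ? cacheEnd : extEnd;
        *extentOffset = start;
        *length = end - start;
    } else if (cached->offset < *extentOffset) {
        /* Disjoint, and the cached block comes first: report it alone. */
        *extentOffset = cached->offset;
        *length = cached->length;
    }
}
```

With the NOTE's numbers, a cached block at 0x1200/0x100 replaces the extent (0x1200/0x100), while a cached block at 0x1200/0x200 merges with it (0x1200/0x500).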
VMIOF_DiskScanBegin() — Use this function to set up a scan of disk extents within a virtual disk chain. In
other words, this function initializes the state used in scanning a disk for used or changed blocks.
n VMIOF_DiskHandle *handle — This input parameter is an opaque handle to the disk and is valid only for
the filter that it is passed to. Almost every callback function has this handle to the disk as the first
parameter.
n VMIOF_HeapHandle *heap — This input parameter refers to the heap used to allocate the scan state.
n VMIOF_DiskScan **pscan — This output parameter refers to the initialized scan state
n VMIOF_NO_MEMORY — This return value indicates that heap has insufficient space to allocate the scan state.
VMIOF_DiskScanEnd() — Use this function to conclude a scan of disk extents within a virtual disk chain. In
other words, this function destroys the state used in scanning a disk for used or changed blocks.
n VMIOF_DiskScan *scan — This input parameter refers to the scan state to destroy
VMIOF_DiskExtentGetChanged() — Use this function to find the blocks that are private to the current (most
recent) snapshot of a virtual disk chain. In other words, this function gets the region that has changed since
the last snapshot. It scans the disk starting at the provided startOffset, and returns the offset and length of
a region of the disk that is private to the most current snapshot.
n VMIOF_DiskScan *scan — This input parameter refers to the scan state returned from
VMIOF_DiskScanBegin()
n uint64_t *startOffset — As input parameter this refers to the byte offset where the search should
start. As output parameter it refers to the start of a private region or start of next search if length is 0.
n uint64_t *length — This output parameter refers to the length of the region. It can be 0.
n VMIOF_OUT_OF_RANGE — This return value indicates that the start offset is beyond the disk capacity.
VMIOF_DiskExtentGetUsed() — Use this function to find the blocks of a virtual disk chain that exist in any of
the snapshot disks. That is, this function finds blocks that have changed since the first snapshot was taken
from the base disk of a virtual disk chain. It scans the disk starting at the provided startOffset, and returns
the offset and length of a region of the disk that contains valid data, that is, a disk region that has been
written to by the current or any prior snapshot. The offset and length are in bytes.
n uint64_t *startOffset — As input parameter this refers to the byte offset where the search should
start. As output parameter it refers to the start of a valid region or start of next search if length is 0.
n uint64_t *length — This output parameter refers to the length of the region. It can be 0.
n VMIOF_OUT_OF_RANGE — This return value indicates that the start offset is beyond the disk capacity.
{
    ...
    /* list changed extents */
    status = VMIOF_DiskScanBegin(handle, heap, &scan);
    if (status != VMIOF_SUCCESS) {
        VMIOF_Log(VMIOF_LOG_WARNING, "Failed to begin scan.\n");
        return status;   /* do not use an uninitialized scan state */
    }
    offset = 0;
    length = 0;
    start = 0;
    total = 0;
    for (;;) {
        status = VMIOF_DiskExtentGetChanged(scan, &offset, &length);
        if (status != VMIOF_SUCCESS) {
            break;
        }
        if (offset != start + total) {
            if (total != 0) {
                VMIOF_Log(VMIOF_LOG_INFO, "Changed @ %lu length %lu\n",
                          (unsigned long)start, (unsigned long)total);
            }
            start = offset;
            total = 0;
        }
        total += length;
        offset += length;
    }
    if (total != 0) {
        VMIOF_Log(VMIOF_LOG_INFO, "Changed @ %lu length %lu\n",
                  (unsigned long)start, (unsigned long)total);
    }
    VMIOF_DiskScanEnd(scan);
    ...
}
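The run-coalescing logic in the loop above (folding adjacent extents into a single logged run) can be isolated into a helper, sketched below. ExtentRun and ExtentRunAdd are illustrative names, not part of the VMIOF API.

```c
#include <stdint.h>
#include <stdbool.h>

/* State for coalescing adjacent extents into one contiguous run, as the
 * loop above does before logging a "Changed @" line. */
typedef struct {
    uint64_t start;   /* start of the run being accumulated */
    uint64_t total;   /* bytes accumulated so far */
} ExtentRun;

/* Feed one returned extent into the run. Returns true (and fills
 * *flushStart / *flushLen) when a completed run should be logged. */
bool
ExtentRunAdd(ExtentRun *run, uint64_t offset, uint64_t length,
             uint64_t *flushStart, uint64_t *flushLen)
{
    bool flush = false;

    if (offset != run->start + run->total) {
        /* Gap: the previous run (if any) is complete. */
        if (run->total != 0) {
            *flushStart = run->start;
            *flushLen = run->total;
            flush = true;
        }
        run->start = offset;
        run->total = 0;
    }
    run->total += length;
    return flush;
}
```

Isolating this makes the scan loop a straight call sequence: fetch an extent, feed it to the run, and log whenever the helper reports a completed run (plus one final flush after the loop).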
Generally, invocations of the diskStun callback are paired with invocations of the diskUnstun callback.
However, there are exceptions to this rule, discussed later in this topic. Further, diskStun / diskUnstun
callbacks may be nested; that is, it is possible to get a diskStun, then another diskStun, before receiving a
diskUnstun. A filter must keep a stun level in its instance data (not its sidecar, since sidecars cannot be
modified while the disk is stunned). The level starts at one during processing of a diskOpen event, and is
logically reset to zero on receipt of a diskClose event (because instance data is destroyed at diskClose). The
filter must increment the stun level in its diskStun callback and decrement it in its diskUnstun callback.
That said, if the filter receives a diskUnstun callback while the stun level is zero, it must simply return
from the callback, essentially ignoring it. This can happen when a VM is resumed after a suspend or migration.
NOTE One exception: after the diskOpen() callback but before the diskUnstun() callback, access to sidecar
files is allowed, but access to the disk is not.
To be clear, a filter is permitted to do IO to its VMDK and sidecars during the processing of the diskStun
callback that takes the stun level from zero to one (0 -> 1), but not after returning from said callback. A
filter may resume IO to its VMDK and sidecars as soon as it enters the diskUnstun event that takes the stun
level from one back to zero (1 -> 0).
Thus, upon receipt of a stun event taking the stun level from zero to one, a filter must complete all pending
IOs that it owns before returning from the callback. The framework will not issue any IOs to the Filter until
it returns from the diskUnStun event that returns the stun level to zero.
NOTE It is possible that a disk notification for another disk requires stunning the VM while a disk
notification is already in progress for the current disk. In this case, the filter should halt the operations
it is performing for the earlier disk notification and resume them once it has received the diskUnstun
notification for the current disk. It is permitted to post progress updates during this time period,
reporting the same progress value several times. As an example, suppose the disk is performing a long-running
diskDetach operation and there is an in-guest VM reboot. The reboot triggers a stun notification, meaning the
filter should suspend the detach operation and just post the same progress value until it receives the
unstun.
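The stun-level bookkeeping described above might look like the following sketch, where FilterInstanceData and the helper names are hypothetical filter-side constructs (the real callbacks receive a VMIOF_DiskHandle through which a filter would locate its instance data).

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical per-disk instance data for a filter. */
typedef struct {
    uint32_t stunLevel;   /* starts at 1 while diskOpen is processed */
    /* ... other per-disk filter state ... */
} FilterInstanceData;

/* Called from the diskStun callback. Returns true when this call took
 * the level from 0 to 1, i.e. all pending IOs the filter owns must be
 * completed before returning to the framework. */
bool
FilterHandleStun(FilterInstanceData *inst)
{
    inst->stunLevel++;
    return inst->stunLevel == 1;
}

/* Called from the diskUnstun callback. An unstun at level 0 (e.g.
 * after a resume from suspend or a migration) is simply ignored.
 * Returns true when the level went from one back to zero, i.e. IO to
 * the VMDK may resume. */
bool
FilterHandleUnstun(FilterInstanceData *inst)
{
    if (inst->stunLevel == 0) {
        return false;   /* no matching stun: ignore the callback */
    }
    inst->stunLevel--;
    return inst->stunLevel == 0;
}
```

Keeping the counter in instance data (rather than a sidecar) matters because, as noted above, sidecars cannot be modified while the disk is stunned.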
n VMIOF_DiskHandle *handle - This input parameter is an opaque handle to the disk and is valid only for
the filter that it is passed to.
n const VMIOF_DiskStunInfo *info - This input parameter is a structure which provides a flag and a
progress function callback that can be updated by the filter to indicate the progress of the stun activity.
n The stunFlags can only have the value 0x1 at this time. The enumerated type is defined as
follows:
The code should check to see if the stun level is zero, in which case it should ignore the call, unless a
Migration is in progress. For processing in this latter case, see “Understanding and Processing an
xMigration (diskVmMigration) Event,” on page 216.
Parameters :
n VMIOF_DiskHandle *handle — This input parameter is an opaque handle to the disk and is valid only for
the filter that it is passed to. Almost every callback function has this handle to the disk as the first
parameter.
Return Value :
NOTE A Linked Disk Clone is implemented through the Disk Snapshot mechanism. The LI should only
expect a diskSnapshot callback, and not a diskClone callback.
Usually, you need to track the status of the VMDK (e.g. whether it is dirty) and the position of the cache in
the sidecar file. As part of the clone, both the VMDK and the sidecar are copied. After that, you will get
a diskClone callback on the newly cloned VMDK. Using the sidecar file, you will know the current status
and the cache location, which you can then use to talk to the original cache. If the disk is dirty, you can
transfer the data from the original cache and flush it later. When the diskClone is delivered, the disk is
in a stunned state, so you won't be able to flush inside diskClone.
sMigration while the VM is powered off (also known as VMDK cold migration, or a VMDK relocation) is
implemented using Disk Clone. So you will only get a diskClone callback, not a diskVmMigration callback. One
way to distinguish the two cases is by the disk UUID: vSphere assigns a new UUID to copied VMDKs, while
the UUID stays the same for sMigration. Another way to handle the two cases is to keep a reference count
on the cache of how many VMDKs are using it. When you talk to the original cache on behalf of the cloned
disk, increase the reference count. If it is a real clone, both the original VMDK and the newly cloned VMDK
will use the same cache, and you can separate them at a later time of your choosing. If it is a VMDK cold
migration, the original VMDK will be opened and deleted; since the reference count is greater than 1, you
won't delete the cache as a result. Then, since only the new VMDK will use the cache, you can continue to
use it or create a new one later.
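The reference-count approach can be sketched as follows. SharedCache and the two helpers are purely illustrative filter bookkeeping, not part of the VMIOF API.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical shared-cache record kept by the filter. */
typedef struct {
    uint32_t refCount;   /* how many VMDKs currently use this cache */
} SharedCache;

/* On diskClone for the new VMDK: the clone starts out talking to the
 * original cache, so it becomes an additional user. */
void
CacheAddUser(SharedCache *cache)
{
    cache->refCount++;
}

/* On diskDetach with the DELETE flag for some VMDK using this cache.
 * Returns true only when the last user goes away, i.e. the cache
 * itself may now be deleted. For a cold migration the count is still
 * above one when the original VMDK is deleted, so the cache survives
 * for the relocated disk. */
bool
CacheDropUser(SharedCache *cache)
{
    cache->refCount--;
    return cache->refCount == 0;
}
```

The effect is that a true clone and a cold migration need no up-front distinction: the count simply decides whether the cache outlives the original VMDK.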
The following log excerpt shows the activities associated with the diskClone() callback:
NOTE You cannot return VMIOF_ASYNC in this function, so all tasks must be completed synchronously.
** The framework creates the new disk, and attaches the filter
FiltLib: DisableDiskIO upcallThread 1000017310 initalized 1
FiltLib: DetachFromFiltMod 1F05F5E0
DISKLIB-LIB_CLONE : Failed to clone disk using Object Cloning
DISKLIB-LIB_CREATE : CREATE: "disk2.vmdk" -- vmfs capacity=20480 (10 MB) adapter=lsilogic
info=cowGran=1 allocType=3 objType=file policy=''
DISKLIB-LIB_CREATE : CREATE: Creating disk backed by 'default'
Clone: 10% done.DISKLIB-DSCPTR: "disk2.vmdk" : creation successful.
DISKLIB-VMFS : "/vmfs/volumes/53601316-8ca9ccc0-175e-000c290c3136/WB21/disk2-flat.vmdk" : open
** New disk is opened in R/W mode, and the diskClone callback is invoked.
In the callback SampleFilterDiskOpen
(SampleFilterDiskOpen): handle=1F0663B0, flags=0(). File Chains=1. Files in chain:
Name[0]='/vmfs/volumes/53601316-8ca9ccc0-175e-000c290c3136/WB21/disk2.vmdk'
clock_gettime: 1440486892 sec, 940818835 nsec
FiltLib: sampfilt: diskOpen successful.
In the callback SampleFilterDiskClone
The progress is 25 percent
The progress is 50 percent
The progress is 75 percent
The progress is 100 percent
FiltLib: sampfilt: diskClone successful.
In the callback SampleFilterDiskClose
In the callback SampleFilterDiskRequirements
Changed @ 0 length 10485760
FiltLib: heap sampfilt statistics: numGrowthOps 0 mem bytes 0 numShrinkOps 0 mem bytes 0
numSuccAllocs 1 numFailedAllocs 0.
FiltLib: sampfilt: diskClose successful.
FiltLib: DisableDiskIO upcallThread 1000017311 initalized 1
FiltLib: DetachFromFiltMod 1F066200
FiltLib: DisableDiskIO upcallThread 4294967295 initalized 0
DISKLIB-VMFS : "/vmfs/volumes/53601316-8ca9ccc0-175e-000c290c3136/WB21/disk2-flat.vmdk" :
closed.
PluginLdr_Load: Loaded plugin libvmiof-disk-sampfilt.so from /usr/lib/vmware/plugin/libvmiof-
disk-sampfilt.so
VTHREAD start thread 9 "Upcall-11136" pid 1000017312
FiltLib: Context 1F06B460: initialized the upcall thread.
Name[0]='/vmfs/volumes/53601316-8ca9ccc0-175e-000c290c3136/WB21/test.vmdk'
clock_gettime: 1440486893 sec, 569226610 nsec
FiltLib: sampfilt: diskOpen successful.
In the callback SampleFilterDiskClose
In the callback SampleFilterDiskRequirements
Changed @ 0 length 10485760
FiltLib: heap sampfilt statistics: numGrowthOps 0 mem bytes 0 numShrinkOps 0 mem bytes 0
numSuccAllocs 1 numFailedAllocs 0.
FiltLib: sampfilt: diskClose successful.
FiltLib: DisableDiskIO upcallThread 1000017312 initalized 1
FiltLib: DetachFromFiltMod 1F06B460
DISKLIB-VMFS : "/vmfs/volumes/53601316-8ca9ccc0-175e-000c290c3136/WB21/test-flat.vmdk" : closed.
BACKGROUND The framework always opens source sidecars in READONLY mode during a clone operation. This is
due to an issue where the sidecars were opened in write-exclusive mode when the disk was opened with only
the OPEN_NOIO flag. During a parallel linked clone, the same sidecar could be opened concurrently, so the
solution is to always open the sidecar in READONLY mode.
First, vSphere creates a snapshot, so it freezes the original disk cent55.vmdk, and creates a delta disk
cent55-000001.vmdk. The VM keeps running, and all new data goes to the delta disk. This is reflected in
vmware.log.
Then, vSphere clones the base disk cent55.vmdk and creates a new VMDK called cloned.vmdk. As part of
the process, the IOFilter framework invokes the diskClone callback of the filter. This is reflected in vpxa.log.
Finally, vSphere deletes the earlier created snapshot, so it consolidates the disks of cent55.vmdk and
cent55-000001.vmdk, and deletes the delta disk cent55-000001.vmdk. Then all new data goes to the base disk
cent55.vmdk. This is reflected in vmware.log.
NOTE There is no "migration happening" or "migration completed" phase. A filter knows that a migration
has completed when it gets a diskUnstun callback after a migration callback with a prepare phase.
n VMIOF_DiskVmMigrationType type – The type of the diskVmMigration callback. Possible values are:
n VMIOF_MIGRATION_SVMOTION — This value indicates that only the virtual disks belonging to the
VM have been migrated to a destination datastore on the same host.
n VMIOF_DiskVmMigrationIpSpec ipSpec — The IP address of the target host for VM migration. For
migration failure notification, the IO Filter Framework sets this to
VMIOF_MIGRATION_IP_ADDR_INVALID.
n VMIOF_ASYNC — Processing of the migration event continues asynchronously. The framework will
postpone the migration until the function calls completionFunc.
n Any other value is taken as a failure causing the IO Filter Framework to abort the migration.
NOTE This callback function is not allowed to block and must not perform any long running activity.
The course VMware Fundamentals for Developers and the various VMware vSphere documentation topics
discuss that VMware vSphere products (ESXi and vCenter Server), when configured into a cluster, can
migrate VMs from one host to another, even while the VM continues to run. VMware calls this feature
vMotion. Further, since the introduction of vMotion, vSphere has gained the ability to migrate the virtual
disks of a VM, for example from one datastore to another, even while the VM is running and continues to
perform IOs to the disk. VMware calls this feature Storage vMotion. For clarity, this course uses the terms
xMigration and sMigration to refer to the two different types of migrations, respectively. There is an
additional scenario where vSphere performs an xMigration: when a VM's hardware is changed while the
VM is running, for example the hot-adding / removing of a vNIC, vHBA, vDISK, etc.
At a low-level, all migrations are invoked by functions provided by various APIs. At a high level, excepting
the hardware-hot-add case, the migration functions are invoked in one of two ways:
n A human starts the migration via one of VMware's management UIs (vSphere Client, vSphere Web
Client, esxcli). For example, in the VWC, if you right-click on a VM that is running on a host in a cluster
and select "All vCenter Actions" > "Migrate", your browser displays a wizard dialog to migrate the VM
from host to host or its disks from one datastore to another.
n A management automation product starts a migration. For example, the DRS feature of vSphere may
initiate an xMigration on a VM to balance CPU and/or RAM load between hosts in a cluster.
All migration events are significant to vSphere IO Filters Solutions. For example:
n A caching solution must flush the dirty blocks in the SSD back to the disk before either type of
migration:
n For xMigration from host H1 to H2, the caching filter on H1 will have used a local SSD to cache the
data. When the VM gets to H2, the filter there will use its own local SSD to cache appropriate
VMDK data.
n For sMigration, the data must be flushed to the VMDK or the copy of said VMDK on the
destination datastore will contain stale (wrong) data.
n A replication solution, at a minimum, must perform the following for the different types of migration:
n For xMigration from host H1 to H2, the filter's Daemon on H1 should close its socket connection to
the Daemon on the replication host if it is not replicating any other VMDKs there. The filter's
Daemon on H2 must then open a connection to the replication host so that it can start sending
writes there.
n For sMigration, the filter's Daemon will almost certainly need to obtain the pathname of the VMDK
on the destination datastore and advise the filter's Daemon on the replication host of that change,
for housekeeping purposes if not others.
n The actual migration - During the migration, the Framework invokes several callbacks (including
diskClose(), diskOpen(), diskStun() and diskUnStun()).
n The migration completes or fails - On failure, the Framework notifies the Library Instance
explicitly. Currently, the Framework does not send a similar notification to the Library Instance for
success. Instead, the Library Instance can infer the success by noting the Framework's invocation of
its diskUnStun() callback after an invocation of its diskOpen() callback on H2.
The following diagram illustrates the basic sequence of interactions between the IO Filter Framework and a
Solution's Library Instance during an xMigration (after the migration is initiated by DRS, SDRS, or user
interaction):
1 The Framework on H1 iterates through each Filter attached to affected VMDKs, invoking their
diskMigration() callback. The framework passes three parameters: a handle to the VMDK affected by
the migration; a VMIOF_DiskVMMigrationInfo structure, with the phase member set to
VMIOF_MIGRATE_PREPARE to indicate that this is the preparation phase of a migration; and a pointer to a
function of type VMIOF_DiskProgressFunc(), generically referred to as progress.
2 The Library Instance must take whatever steps necessary for the migration. For example, a caching
solution may notify the daemon to flush all dirty blocks for the VMDK.
NOTE Until the preparation is complete, the Library Instance must call progress at least once every 10
seconds. Each invocation of progress must pass the current amount of work done and the total amount
of work to do. For example, in a caching solution, suppose that at the start of the event the Library
Instance discovers there are 170 dirty blocks to be flushed, and saves that number in a well-known
location. Then, as the Filter Instance writes the blocks to the VMDK, it can update a counter of how many
have been written, also stored in a well-known location.
3 diskMigration() returns VMIOF_SUCCESS when it is ready for the migration to proceed, or some other
code, which prevents the migration from proceeding. In the latter case, the iteration is restarted,
invoking diskMigration() with the phase member set to VMIOF_MIGRATE_FAILED.
NOTE Given this, it is possible for the Framework to invoke a filter's diskMigration() callback with
failed set indicating failure before it invokes the callback indicating prepare. In this case, the filter
should treat the invocation as a no-op. To detect this condition, filters should keep migration state for
each VMDK.
4 The Framework continues iterating through the other filters on the VMDK.
5 If all conditions are met for a migration, vSphere begins the second phase, that is the actual migration.
During this time, the VM continues to run, which means it can continue to do IOs to its VMDKs, which
means that the Framework continues to invoke the Filter Library's callback's (for example
diskIOStart()) as appropriate. Code in these callbacks should check the VMDK's migration state and act
accordingly. For example, a caching solution may keep its cache synced with the VMDK. vSphere
performs migrations by copying all of the RAM (xMigration) or disk (sMigration) to the destination,
keeping track of things that change during the copy. It then repeats this step, just for the changes.
Eventually the change set will converge to a relatively small number of pages / blocks, respectively.
6 When the migration change set converges, vSphere must stop the VM from running while it finishes
copying the remaining changes (typically a very short period of time). To do this, the ESXi kernel stuns
the VM. When it does this, the Framework invokes the Library Instance's diskStun() callback, passing
the VMDK handle and a VMIOF_DiskStunFlags parameter.
8 vSphere must close the VMDK on H1 as that host will no longer access it. The Framework invokes the
LI's diskClose() callback.
9 As with all callbacks, diskClose() must return VMIOF_SUCCESS to indicate that the migration can
proceed. Any other return aborts the migration as discussed earlier in this sequence.
10 vSphere must open the VMDK on H2 as that host will now perform IOs to it. The Framework
invokes the LI's diskOpen() callback.
11 The LI must connect to the Daemon and inform it that the disk is open.
12 The filter must return VMIOF_SUCCESS on success of its diskOpen() callback. Any other result causes the
migration to abort. At this stage, this involves moving the VM back to H1, if it can. If the move back to
H1 fails, the VM is left paralyzed.
13 vSphere un-stuns the VM. The Framework invokes the Library Instance's diskUnStun() callback (on H2).
The Library Instance can match the diskUnStun() call with the migration prepare state, and conclude
that the migration has succeeded. In this case, the callback should perform any necessary cleanup from
the migration operation.
NOTE It is possible to receive diskUnStun() callbacks without a corresponding diskStun() event. The
diskUnStun() callback should treat such cases as no-ops.
NOTE It is possible in the next step for this callback to return something other than VMIOF_SUCCESS. The
results would be similar to having the diskOpen() fail as discussed in step 12.
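Steps 1 through 4 might translate into a callback along the lines of the sketch below. This is only a sketch: the VMIOF_* definitions are minimal stand-ins (the real callback receives a disk handle, an info structure, and the progress function), and the per-VMDK state handling is one possible design for the no-op rule in step 3's NOTE.

```c
/* Minimal stand-ins so the sketch compiles on its own. */
typedef int VMIOF_Status;
#define VMIOF_SUCCESS 0
#define VMIOF_FAILURE 1

typedef enum {
    VMIOF_MIGRATE_PREPARE,
    VMIOF_MIGRATE_FAILED
} VMIOF_DiskVmMigrationPhase;

/* Hypothetical per-VMDK migration state kept by the filter. */
typedef enum {
    MIG_IDLE,
    MIG_PREPARING
} FilterMigrationState;

typedef struct {
    FilterMigrationState migState;
} FilterInstance;

VMIOF_Status
SampleFilterDiskMigration(FilterInstance *inst,
                          VMIOF_DiskVmMigrationPhase phase)
{
    switch (phase) {
    case VMIOF_MIGRATE_PREPARE:
        /* Flush dirty state here, calling progress at least once
         * every 10 seconds (omitted from this sketch). */
        inst->migState = MIG_PREPARING;
        return VMIOF_SUCCESS;
    case VMIOF_MIGRATE_FAILED:
        if (inst->migState == MIG_IDLE) {
            /* FAILED without a preceding PREPARE: treat as a no-op,
             * as the NOTE in step 3 requires. */
            return VMIOF_SUCCESS;
        }
        inst->migState = MIG_IDLE;   /* undo the preparation */
        return VMIOF_SUCCESS;
    }
    return VMIOF_FAILURE;
}
```

Tracking the state per VMDK is what lets the filter recognize a FAILED notification that arrives before (or without) its own PREPARE, as step 3's NOTE warns can happen.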
Example of detailed log messages as seen on vmware.log for xMigration is as follows. These are the events
received on the source host (where xMigration was initiated):
These are the events received on the destination host (where the VM was migrated to):
The same set of events is received when we perform an sMigration; however, the notifications are sent only
to the source disk(s). An example is a write-back caching filter that has queued up some writes during an
sMigration to a different datastore, while the VM remains on the same host. The filter stores the cache
location in the sidecar. After the sMigration, the destination gets a copy of the sidecar and continues to use
the same cache location. On receiving the prepare-migration notification during an sMigration, the source
filter can flush all its caches and continue in write-through mode during the migration. On the source,
after the sMigration succeeds, the framework sends a detach callback with the delete flag set. The filter on
the destination is unaware of the migration and may try to delete or reset the cache, so communication
should be set up through the sidecar or via the Daemon in order to maintain cache consistency.
NOTE sMigration while the VM is powered off (aka VMDK cold migration, or a VMDK relocation) is
implemented using Disk Clone.
>>>> Now ESXi will create the disk on the new datastore
'/vmfs/volumes/549dec0c-38a54e37-14f1-0050569436e5/WB21/WB21_1.vmdk'
2015-11-19T06:14:16.964Z| Worker#1| I120: DISKLIB-LIB_CREATE : CREATE:
"/vmfs/volumes/549dec0c-38a54e37-14f1-0050569436e5/WB21/WB21_1.vmdk"
-- vmfs capacity=28672 (14 MB) adapter=lsi
logic info=cowGran=0 allocType=3 objType= policy=''
2015-11-19T06:14:16.964Z| Worker#1| I120: DISKLIB-LIB_CREATE : CreateObjExtParams: Object
backing type 0 is invalid. Figuring out the most
suitable backing type...
2015-11-19T06:14:16.967Z| Worker#1| I120: DISKLIB-LIB_CREATE : CREATE: Creating disk backed by
'default'
2015-11-19T06:14:17.651Z| Worker#1| I120: DISKLIB-DSCPTR:
"/vmfs/volumes/549dec0c-38a54e37-14f1-0050569436e5/WB21/WB21_1.vmdk" :
creation successful.
2015-11-19T06:14:18.119Z| Worker#1| I120: DISKLIB-VMFS :
"/vmfs/volumes/549dec0c-38a54e37-14f1-0050569436e5/WB21/WB21_1-flat.vmdk" :
open successful (17) size = 14680064, hd = 0. Type
3
2015-11-19T06:14:18.676Z| Worker#1| I120: FiltLib: There are no io filters for this disk.
2015-11-19T06:14:19.201Z| Worker#1| I120: DISKLIB-VMFS :
"/vmfs/volumes/549dec0c-38a54e37-14f1-0050569436e5/WB21/WB21_1-flat.vmdk" : closed.
2015-11-19T06:14:19.373Z| Worker#1| I120: MigrateWriteHostLog: Writing to log file took 170668
us.
…
2015-11-19T06:14:29.695Z| Worker#1| I120: DISKLIB-DSCPTR: Opened [0]: "WB21_1-flat.vmdk" (0x20a)
2015-11-19T06:14:29.695Z| Worker#1| I120: DISKLIB-LINK : Opened
'/vmfs/volumes/549dec0c-38a54e37-14f1-0050569436e5/WB21/WB21_1.vmdk'
(0x20a): vmfs, 28672 sectors / 14 MB.
2015-11-19T06:14:29.696Z| Worker#1| I120: FiltLib: There are no io filters for this disk.
2015-11-19T06:14:29.696Z| Worker#1| I120: DISKLIB-LIB : Opened
"/vmfs/volumes/549dec0c-38a54e37-14f1-0050569436e5/WB21/WB21_1.vmdk"
(flags 0x20a, type vmfs).
2015-11-19T06:14:33.486Z| Worker#1| I120: DDB: "longContentID" =
"47efb6fe4e1ef2bf24148069ba8e8bc8" (was "26cb0eae164c3e31ac0aba15fffffffe")
2015-11-19T06:14:35.731Z| Worker#1| I120: DDB: "uuid" = "60 00 C2 9c a0 83 c1 1c-93 25 54 0b 01
a5 65 e3"
(was "60 00 C2 9b 30 34 a8 cf-0b f4 43 7e 64 ea 2c 51")
2015-11-19T06:14:37.383Z| Worker#1| I120: SVMotionLocalDiskQueryInfo: Got block size 1048576 for
filesystem VMFS.
>> We stun the VM (not shown) and close the source disk so we can copy from a consistent state
>> The LI is reopened on the source. This is so we can install the “mirror” driver.
>> From the vSphere 5.0 Storage Technical Whitepaper: Mirror Mode enables a single-pass block
copy of the source disk to the destination
disk by mirroring I/Os of copied blocks.
>> At some point in the future, the Library indicates that the copy is complete
>> Once the storage migration is complete, the LI on the destination now gets an Open. Note the
new Path, which is the clue that we are
now in the destination LI.
>> The source LI gets an open/detach/close with the flag set to VMIOF_DISK_DETACH_DELETE (1)
On receipt of this event, the filter (between its LI and Daemon) must determine whether a Daemon has the VMDK
open for offline processing, and if so, get it to stop said processing and close the VMDK so that the current
cartel can open it. If the VMDK gets closed by whoever has it opened, the library code must return
VMIOF_SUCCESS to the Framework, causing the Framework to retry the open. If the disk has not been
opened by this filter, the Library Instance should return VMIOF_FAILURE. The LI can also return other error
codes to indicate different error scenarios. If all the filters return failure or an error, the Framework will
fail to open the disk. The following figure illustrates a simplified version of this sequence:
Parameters :
n VMIOF_DiskHandle *handle — This input parameter is an opaque handle to the disk and is valid only for the
filter that it is passed to. Almost every callback function has this handle to the disk as the first
parameter.
Return Value :
n VMIOF_SUCCESS — The function returns VMIOF_SUCCESS when its operation succeeds and the disk is
indeed closed. The disk open operation can then be retried.
n VMIOF_FAILURE — The function failed if the disk could not be closed. It might not have been opened by a
component belonging to this filter.
NOTE Returning VMIOF_ASYNC is prohibited. Providing this callback is optional for a filter. If it is not
provided, the behavior is the same as if VMIOF_FAILURE was returned.
n The library code must maintain a separate owner sidecar that, at a minimum, includes the pathname of
the VMDK. This is required because, as shown in the prototype above, the diskRelease callback only
receives a VMIOF_DiskHandle, not a VMIOF_DiskInfo with the pathname of the VMDK. This sidecar
may also include the IP address of the host whose Daemon has opened the VMDK for offline
processing, if any. The diskOpen callback must write the host's IP address in the owner sidecar when it
is invoked in the context of a Daemon (which it must determine on its own), and set it back to zeros (or
some other magic number) during the corresponding diskClose callback. The reason to keep the IP
address is explained later in this topic.
n The Daemon must keep a list of all VMDKs it opens for background processing so that, on receipt of a
request from an LI to release a disk, it can reject the request if it does not have the disk open. If it does
have the disk open, it can stop processing and honor the request.
n All Daemons for the Solution within a cluster must be able to communicate with one another. This is so
that, if the Daemon that opens the VMDK and the VM are on different hosts (say host1 and host2
respectively), the Solution on host2 can ask the Daemon on host1 to close it.
n The Library code must keep the owner sidecar open only for short periods during the diskAttach,
diskOpen, and diskClose callbacks. If the sidecar is left open at any other time, diskRelease will be
unable to open it to read the VMDK's pathname (and possibly the owner IP) in order to start processing
the event.
With these things in place, the diskRelease code can perform the steps shown in the preceding figure.
However, that diagram assumed that the VM attempting to open a locked VMDK is on the same host as the
Daemon that has locked it. The following figure provides the same sequence, but dives deeper into step 5,
where the LI asks the Daemon to release the disk, and the Daemon that has locked it is on a separate host. If
the Daemon writes its IP (or other identifying information) into the owner sidecar, the Daemon on the host
with the VM attempting to open the VMDK can proxy the release request to the Daemon that has actually
opened the VMDK:
1 A Daemon (for example, on H2) receives a release request (call this the local Daemon).
2 The local Daemon checks its VMDK list. If it does not find the VMDK on its list, it broadcasts messages
to the other Daemons asking if any of them have it open.
3 Each Daemon responds with an ACK or NAK. If all of them NAK, the local Daemon NAKs to the LI. If one of
them ACKs, the local Daemon ACKs to the LI.
4 The LI returns a result to the framework based on the ACK/NAK from the local Daemon.
n For VMDKs with multiple filters attached, a Daemon for a different filter may have opened the VMDK.
In this case, the Framework sends diskRelease to each of the filters, in the same order in which the
filters receive IOs, until either one filter's diskRelease callback returns VMIOF_SUCCESS or all filters
return failure. In the latter case, the Framework fails the open.
n A VMDK could be locked by tools such as vmkfstools. Since the IO Filters attached to a VMDK are opaque to
these tools, there is no (good) way for a diskRelease callback to get the tool to close the VMDK. In this
case, the VM start will simply fail.
NOTE If the VM is powered off and the Daemon has opened the disk for offline flushing, and a user then tries
to clone the VM, the diskRelease callback will be delivered to the filter. During the whole cloning
process, the IO Filter Framework keeps the disk open in read-only mode, so the Daemon will not be able to
open it again. However, the filter will see the diskClose callback before the cloning happens and another
diskOpen callback after the clone is complete.
Should your Daemon need to perform IO to a VMDK, open it with VMIOF_VirtualDiskOpen(). This
function causes the IO Filter Framework to load an instance of the filter's library into the context of the
Daemon and call its diskOpen() callback, through which the Daemon receives a VMIOF_DiskHandle as one of the
parameters to diskOpen(). Once your Daemon has a handle to the disk, it can invoke VMIOF_DiskIOAlloc()
and VMIOF_DiskIOSubmit(). That said, Daemons typically communicate IO requests to their LI via a UNIX
socket used as a control plane and a crossFD shared memory area used as a data plane.
VMIOF_Status
VMIOF_VirtualDiskOpen(const char *path, VMIOF_DiskFlags flags, VMIOF_VirtualDiskHandle **handle);
n VMIOF_DiskFlags flags — The flags to pass in the flags parameter to the filter's diskOpen() callback
n VMIOF_VirtualDiskHandle **handle — A pointer to the virtual disk handle pointer that is created on
success
NOTE You cannot use a VMIOF_VirtualDiskHandle with any function other than
VMIOF_VirtualDiskClose(). Further, you cannot pass VMIOF_DiskHandle objects between entities. They
only have meaning in the context in which they are issued.
Upon success, this function returns VMIOF_SUCCESS. On failure, the function returns a status indicating the
reason for failure, for example VMIOF_BAD_PARAM for invalid flag combinations. The handle is only valid if the
function returns VMIOF_SUCCESS.
NOTE VMIOF_DISK_NO_IO, VMIOF_DISK_RO, and VMIOF_DISK_SHARED are the supported flag values. If the disk is
opened with the VMIOF_DISK_NO_IO flag, it is not locked. This means that other applications can access the
disk, but no IO is permitted to the VMDK; only sidecar operations are allowed. To access a sidecar in
VMIOF_DISK_NO_IO mode, the sidecar file must first be closed so that the caller of this function can open it,
or all applications that have opened the sidecar must have opened the VMDK with VMIOF_DISK_RO.
Opening the virtual disk with both VMIOF_DISK_SHARED and VMIOF_DISK_RO is not allowed.
When you have finished performing IO to a VMDK, such as during Daemon shutdown, your code must call
VMIOF_VirtualDiskClose(). VMIOF_VirtualDiskClose() takes only one parameter, the handle returned by
VMIOF_VirtualDiskOpen().
VMIOF_Status
VMIOF_VirtualDiskClose(VMIOF_VirtualDiskHandle *handle);
Return Value :
n VMIOF_Status — VMIOF_SUCCESS on success
If you want the daemon to open an existing VMDK that already has a filter attached, you can do that using
VMIOF_VirtualDiskOpen(). However, if you want the daemon to open a VMDK that does not exist, you will
need to call VMIOF_VirtualDiskCreate(). This function creates a new VMDK file with a single filter attached.
VMIOF_Status
VMIOF_VirtualDiskCreate(const char *path,VMIOF_VirtualDiskType type, uint64_t size,
const char *filter, const VMIOF_DiskFilterProperty * const props[],
uint64_t numProps)
NOTE This function is deprecated and will be removed once alternative functionality is available.
n VMIOF_VirtualDiskType — The type of the virtual disk (The valid disk-types are
VMIOF_VIRTUAL_DISK_TYPE_VMFS_THIN, VMIOF_VIRTUAL_DISK_TYPE_VMFS_THICK_EAGER_ZERO, and
VMIOF_VIRTUAL_DISK_TYPE_VMFS_THICK_LAZY_ZERO).
n const char *filter — The name of the filter to attach to the disk.
Return Value :
n VMIOF_SUCCESS — The function succeeded and the vmdk was created.
n VMIOF_MISALIGNED — The vmdk could not be created because the size of the disk is not a multiple of the
sector size (512 bytes).
Following is a code-snippet describing the activities associated with a VirtualDiskOpen event on the
Daemon:
1. static VMIOF_Status
2. VmiofTestCreateDiskDaemonStart(void)
3. {
4. const char *vmdk = getenv("VMDK_PATH");
5. const char *vmdkType = getenv("VMDK_TYPE");
6. const char *filter = "vmiofTestCreateDisk";
7. const size_t size = 10 * 1024 * 1024;
8. VMIOF_VirtualDiskType type;
9. const VMIOF_DiskFilterProperty prop1 = {
10. .name = "success", .value = "true",
11. };
12. const VMIOF_DiskFilterProperty prop2 = {
13. .name = "notsuccess", .value = "false"
14. };
15. const VMIOF_DiskFilterProperty * const props[3] = {
16. &prop1, &prop2, NULL,
17. };
18. if (strcmp(vmdkType, "VMIOF_VIRTUAL_DISK_TYPE_VMFS_THIN") == 0) {
19. type = VMIOF_VIRTUAL_DISK_TYPE_VMFS_THIN;
20. } else if (strcmp(vmdkType, "VMIOF_VIRTUAL_DISK_TYPE_VMFS_THICK_EAGER_ZERO") == 0) {
21. type = VMIOF_VIRTUAL_DISK_TYPE_VMFS_THICK_EAGER_ZERO;
22. } else if (strcmp(vmdkType, "VMIOF_VIRTUAL_DISK_TYPE_VMFS_THICK_LAZY_ZERO") == 0) {
23. type = VMIOF_VIRTUAL_DISK_TYPE_VMFS_THICK_LAZY_ZERO;
24. } else {
25. VERIFY(false);
26. }
27. LOG("%s: daemon starting up and creating vmdk \"%s\" of type %s", __FUNCTION__, vmdk,
vmdkType);
28. return VMIOF_VirtualDiskCreate(vmdk, type, size, filter, props, ARRAYSIZE(props));
29. }
Lines 1-2: Define VmiofTestCreateDiskDaemonStart() function. This is the callback for a daemon start and for
this example acts as the entry point for invoking the VMIOF_VirtualDiskCreate() function.
Lines 4-26: Initialize the various parameters to be passed to the VMIOF_VirtualDiskCreate() function.
Lines 9-17: Initialize the properties to be passed to the filter and the disk.
Line 28: Calls the VMIOF_VirtualDiskCreate() function with the desired parameters.
Prototype :
Parameters :
n VMIOF_DiskHandle *handle — This input parameter is an opaque handle to the disk and is valid only for
the filter that it is passed to. Almost every callback function has this handle to the disk as the first
parameter.
n VMIOF_DiskGrowInfo *info – This input parameter is a pointer to a structure that contains brief
information for a disk grow notification for the filter. The capacity member of this structure conveys
the new capacity to which the disk is about to be grown. The filter can post the progress update of
diskGrow() to the IO filter framework as part of its processing and notification via the function pointer
VMIOF_DiskOpProgressFunc.
Return VMIOF_SUCCESS to allow the grow to continue. Returning any other value aborts the disk grow
operation.
NOTE The provision of this callback by a filter is optional. If not provided, the behaviour is the same as if
VMIOF_SUCCESS was returned.
The following is a sample log file that helps explain the diskGrow event sequence:
>> First the framework stuns the VM and closes all the disks. Note: In this scenario, the
filter is only attached to disk 2 (WB21_1)
>> The framework only opens the disk that it is about to Grow
'/vmfs/volumes/53601316-8ca9ccc0-175e-000c290c3136/WB21/WB21_1.vmdk' (0x18):
vmfs, 24576 sectors / 12 MB.
2015-09-16T22:18:25.704Z| vcpu-0| I120: PluginLdr_Load: Loaded plugin libvmiof-disk-countio.so
from
/usr/lib64/vmware/plugin/libvmiof-disk-countio.so
2015-09-16T22:18:25.709Z| Upcall-2bd8e1| I120: VTHREAD start thread 9 "Upcall-2bd8e1" pid
1000158970
2015-09-16T22:18:25.710Z| vcpu-0| I120: FiltLib: Context 32ABAD70: initialized the upcall thread.
2015-09-16T22:18:25.710Z| vcpu-0| I120: In the callback TestDiskOpen
2015-09-16T22:18:25.710Z| vcpu-0| I120: diskFlags = 0x0
2015-09-16T22:18:25.744Z| vcpu-0| I120: FiltLib: countio: diskOpen successful.
2015-09-16T22:18:25.744Z| vcpu-0| I120: DISKLIB-LIB : Opened
"/vmfs/volumes/53601316-8ca9ccc0-175e-000c290c3136/WB21/WB21_1.vmdk"
(flags 0x18, type vmfs).
>> The framework has reopened the disk in order to notify the library that the disk is about to
be grown
>> The framework has closed the disk and proceeds to grow it.
>> Now the framework reopens the disk then closes the disk, giving the filter a chance to update
its metadata (sidecars).
2015-09-16T22:18:28.114Z| vcpu-0| I120: FiltLib: heap countio statistics: numGrowthOps 1 mem
bytes 12288 numShrinkOps 0 mem bytes 0
numSuccAllocs 2 numFailedAllocs 0.
2015-09-16T22:18:28.115Z| vcpu-0| I120: FiltLib: countio: diskClose successful.
numLinks = 1, allocationType = 0
2015-09-16T22:18:28.273Z| vcpu-0| I120: SCSIDiskESXPopulateVDevDesc: Using FS backend
2015-09-16T22:18:28.273Z| vcpu-0| I120: DISKUTIL: scsi0:1 : geometry=14/64/32
2015-09-16T22:18:28.274Z| vcpu-0| I120: FiltLib: handle 32ABEE20 adapter 0
2015-09-16T22:18:28.283Z| vcpu-0| I120: Vigor_UpdateSchedulingPolicy: results: 1 args: normal
18446744073709551615 18446744073709551614 0
2015-09-16T22:18:28.285Z| vcpu-0| I120: # "#disk0.ddb.thinProvisioned" = "1"
2015-09-16T22:18:28.285Z| vcpu-0| I120: # "#disk1.label" = "scsi0:1"
2015-09-16T22:18:28.285Z| vcpu-0| I120: # "#disk0.ddb.toolsVersion" = "2147483647"
2015-09-16T22:18:28.285Z| vcpu-0| I120: # "#disk0.ddb.uuid" = "60 00 C2 9f 35 1c 92 6c-0b b6 30
8d 38 6c 51 06"
2015-09-16T22:18:28.285Z| vcpu-0| I120: # "#disk1.ddb.iofilters" = "countio"
2015-09-16T22:18:28.285Z| vcpu-0| I120: # "#disk1.ddb.longContentID" =
"47efb6fe4e1ef2bf24148069ba8e8bc8"
2015-09-16T22:18:28.285Z| vcpu-0| I120: # "#disk1.ddb.adapterType" = "lsilogic"
2015-09-16T22:18:28.285Z| vcpu-0| I120: # "#disk0.ddb.geometry.heads" = "255"
2015-09-16T22:18:28.285Z| vcpu-0| I120: # "#disk1.ddb.geometry.heads" = "64"
2015-09-16T22:18:28.285Z| vcpu-0| I120: # "#disk1.ddb.uuid" = "60 00 C2 9c a0 83 c1 1c-93 25 54
0b 01 a5 65 e3"
2015-09-16T22:18:28.286Z| vcpu-0| I120: # "#disk0.ddb.adapterType" = "lsilogic"
2015-09-16T22:18:28.286Z| vcpu-0| I120: # "#disk1.ddb.sidecars" = "countio_1,WB21_1-
a5df5f3ae9d2d1b5.vmfd"
2015-09-16T22:18:28.286Z| vcpu-0| I120: # "#disk0.label" = "scsi0:0"
2015-09-16T22:18:28.286Z| vcpu-0| I120: # "#disk0.ddb.geometry.cylinders" = "2610"
2015-09-16T22:18:28.286Z| vcpu-0| I120: # "#disk0.ddb.geometry.sectors" = "63"
2015-09-16T22:18:28.286Z| vcpu-0| I120: # "#disk0.ddb.virtualHWVersion" = "11"
2015-09-16T22:18:28.286Z| vcpu-0| I120: # "#disk1.ddb.geometry.cylinders" = "14"
2015-09-16T22:18:28.286Z| vcpu-0| I120: # "#disk0.ddb.longContentID" =
"0b5b332e78a517e8399fec717fb7afcb"
2015-09-16T22:18:28.286Z| vcpu-0| I120: # "#disk1.capacityMB" = "14"
2015-09-16T22:18:28.286Z| vcpu-0| I120: # "#disk1.ddb.geometry.sectors" = "32"
2015-09-16T22:18:28.286Z| vcpu-0| I120: # "#disk1.ddb.virtualHWVersion" = "11"
2015-09-16T22:18:28.286Z| vcpu-0| I120: # "#disk0.capacityMB" = "20480"
There are two ways to set and change VMDK capabilities/properties. One way is using vmkfstools, either
when attaching a filter to a VMDK or after attaching. This is only for development and testing purposes, so it
should not be used in a production environment.
The preferred way is to configure an SPBM policy. You can configure an SPBM policy either using the
vSphere Web Client or using the vSphere Storage Policy API. SPBM is designed as configuration that
applies to a large number of virtual disks, not as a mechanism for extremely fine-grained virtual disk management. If
you need fine-grained control over each and every virtual disk, you need to have your VWC Plugin talk to
your CIM provider or daemon to provide additional configuration information to the Library Instance.
When you change the filter properties through an SPBM policy, you will be asked whether you want to apply
the change immediately. If you answer yes, diskPropertiesValid / diskPropertiesSet will be called with the
changed properties for each and every disk. Keep in mind that you may also see these callbacks
when other filters change around you, so you might see them without any actual changes to your filter.
NOTE If you change filter properties for a VMDK of a running VM, the VM will be stunned first.
NOTE If you change to a different SPBM policy, and both policies contain your filter, your LI will
only see diskPropertiesValid / diskPropertiesSet callbacks, not detach/attach callbacks.
diskPropertiesValid() — Determine if the list of properties and their associated values are valid. In other
words given a list of filter properties, report if the properties can be applied to the virtual disk.
VMIOF_Status(*diskPropertiesValid)(VMIOF_DiskHandle *handle,
const VMIOF_DiskFilterProperty *const *properties);
Parameters —
n VMIOF_DiskHandle *handle — This input parameter is an opaque handle to the disk and is valid only for
the filter that it is passed to. Almost every callback function has this handle to the disk as the first
parameter.
n const char *name — A pointer to an ASCII string containing the name of the property to be
validated, set, retrieved, or freed.
n const char *value — For diskPropertiesValid and diskPropertiesSet, this points to the ASCII value
being validated or set, respectively.
Return Values —
n A return value of VMIOF_SUCCESS indicates that the specified filter properties are valid for this filter
instance. A return value of VMIOF_FAILURE indicates that the specified filter properties are not valid or
cannot be set at this time.
NOTE Providing this callback is optional for a filter. If it is not provided, setting properties is
expected to normally succeed.
NOTE The diskHandle in diskPropertiesValid might be NULL depending on the context. For example,
the first time you attach the filter to a disk, the pointer will be NULL, since the filter hasn't been
attached to the disk yet. If the filter has already been attached to the disk, and you change the policy,
you will see a valid diskHandle.
diskPropertiesSet() — Update the values of the specified properties with their associated values, typically
by writing them to a sidecar.
Parameters —
n VMIOF_DiskHandle *handle — This input parameter is an opaque handle to the disk and is valid only for
the filter that it is passed to. Almost every callback function has this handle to the disk as the first
parameter.
Return Values —
n VMIOF_SUCCESS — The specified filter properties have been accepted and will be applied.
n VMIOF_FAILURE — The specified filter properties have not been accepted and will not be applied.
NOTE Long running work that needs to take effect as a result of the change in properties, must happen
outside of the context of this callback.
diskPropertiesGet() — Retrieve the values of the specified properties. The function is expected to
dynamically allocate the space for the retrieved values.
Parameters —
n VMIOF_DiskHandle *handle — This input parameter is an opaque handle to the disk and is valid only for
the filter that it is passed to. Almost every callback function has this handle to the disk as the first
parameter.
Return Values —
n none
diskPropertiesFree() — Free any memory that was dynamically allocated by diskPropertiesGet() for property values.
Parameters —
n VMIOF_DiskHandle *handle — This input parameter is an opaque handle to the disk and is valid only for
the filter that it is passed to. Almost every callback function has this handle to the disk as the first
parameter.
Return Values —
n none
NOTE Implementing this callback by a filter is optional. If it is not provided, the filter is expected to free the
associated memory on close.
WHAT'S NEW In ESX 60U1, after upgrading your filter from one version to another, you need to delete the
SPBM policy and recreate it. This has been fixed in 60U2.
In newer versions of your filter, you can introduce new properties, but you can never remove old
properties.
Assume you install filter version-1 on Cluster-1 and create SPBM Policy-1, then install filter version-2
(which introduces some new properties in addition to those in version-1) on Cluster-2 and create SPBM
Policy-2. SPBM Policy-1 will have only the old properties, but if you edit it, you will see the new properties, and it can be
applied to both Cluster-1 and Cluster-2. SPBM Policy-2 will include all the old and new properties, but it can
only be applied to Cluster-2.
There are two implications for filters. One is that a filter should not crash if it receives an unexpected
property, since this might be a property that is valid in a future filter version. The other is that from an SPBM
policy you cannot necessarily tell to which filter versions it will fully apply and to which it will only partially apply.
It is important to understand the sequence in which the IO Filter Framework invokes these callbacks.
a diskPropertiesGet()
It is also important to understand how the IO Filter Framework passes properties and their values into
these callbacks. The second parameter in each of these callbacks is either
const VMIOF_DiskFilterProperty *const *properties (as in the diskPropertiesValid prototype above) or
VMIOF_DiskFilterProperty **properties for Free.
This syntax should actually be written with *properties replaced by properties[] because properties is
actually an array of pointers to VMIOF_DiskFilterProperty structures, not a pointer to a pointer to a single
structure of that type. The Framework sets the last element in the array to NULL.
Each of the VMIOF_DiskFilterProperty (the elements in the array) are defined as:
n const char *name — A pointer to an ASCII string containing the name of the property to be validated,
set, retrieved, or freed.
n const char *value — For valid and set, this points to the value being diskPropertiesValid /
diskPropertiesSet, respectively, in ASCII.
For diskPropertiesGet, the function is expected to set this pointer to an address that contains an ASCII
representation of the property's value. If the property is naturally ASCII (a string), or you keep an
ASCII representation (uncommon), the code can just set this member to the address of that value. In the
more likely case that an integer value is not kept in ASCII, the code should dynamically allocate
memory to store an ASCII representation of the current value of the named property and set this member to the
address of the allocated memory, which is freed some time later by a call by the Framework to
diskPropertiesFree().
Having pointers for values for string properties pointing to existing memory and pointers for other
values dynamically allocated adds a small layer of complexity to the solution such that when the
Framework calls diskPropertiesFree, the code for that callback has to distinguish which were
dynamically allocated and which were not.
For diskPropertiesFree, properties contains the addresses the code set during diskPropertiesGet. The
code must free the memory pointed to if it was dynamically allocated.
The code for most functions is somewhat obvious once you understand the parameters. That said, here are
some important notes:
n Return VMIOF_SUCCESS from diskPropertiesValid if ALL of the parameters and their proposed values
are valid. The validity of values may depend on dynamic factors such as the current time (for
example with licenses), the size of the disk, available cache memory, and so on. Return VMIOF_FAILURE if any
property or its value is invalid.
n Return success from diskPropertiesSet only if you can set ALL of the parameters to their proposed
values. That is, a set operation should be atomic: either the code changes all of them, or none of them.
n diskPropertiesValid is optional. If the Library component does not provide this callback, the
Framework just calls diskPropertiesSet() and diskAttach() as though it was present and returned
VMIOF_SUCCESS.
NOTE The SCSI commands issued from the IO Filter are batched and executed asynchronously.
n VMIOF_ScsiCallback done — This is essentially a callback that is invoked by the IO filter framework for
each SCSI command issued to an iSCSI target.
n void *doneData — The parameter to be passed to the SCSI command callback function
n void *buffer — The buffer is the placeholder for the read/write operations performed by the SCSI
commands.
n ssize_t length — This is the size of the buffer being used for IO operations.
n uint64_t timeoutMS — This specifies the time in milliseconds after which, if there is no response, the SCSI
command times out.
n VMIOF_ScsiCommandFlags flags — Flags that describe the SCSI commands are as follows:
typedef enum {
   VMIOF_SCSI_FLAG_NONE = 0,                        /* Empty flag. */
   VMIOF_SCSI_FLAG_DIRECTION_TO_DEVICE = 1u << 0,   /* Direction of transfer is to device. */
   VMIOF_SCSI_FLAG_DIRECTION_FROM_DEVICE = 1u << 1, /* Direction of transfer is from device. */
   VMIOF_SCSI_FLAG_NO_RETRY = 1u << 2,              /* Upon transient failure, do not retry command. */
} VMIOF_ScsiCommandFlags;
n uint8_t *cdb — This is the pointer to the SCSI CDB in which commands are specified.
n uint16_t hostStatus — This is the host adapter status (and error codes)
n uint32_t bytesXferred — This is a SCSI statistic to specify the actual number of bytes of data transferred
by SCSI.
n uint8_t *sense — This is the pointer to the sense buffer to hold status information
NOTE The memory for the SCSI command and the buffers and cdb should be allocated from the Heap.
Similarly, at the end of the command, you should also free the memory.
NOTE The LI is allowed to use SCSI APIs, but the SCSI device must be opened by the Daemon.
Understanding VMIOF_ScsiHandle
The VAIO API provides an opaque data type VMIOF_ScsiHandle. This represents a handle to the SCSI disk to
which SCSI commands need to be issued. The handle is returned on a successful call to the
VMIOFScsiDiskOpen() function (discussed in detail in the next topic).
Understanding VMIOF_ScsiCallback()
The IO filter framework invokes the VMIOF_ScsiCallback() function for each SCSI command issued to an
iSCSI target. The function prototype is as follows:
typedef
void (*VMIOF_ScsiCallback) (void *data, VMIOF_Status status)
n void *data — This parameter holds any data pertaining to the SCSI command issued.
Understanding VMIOF_ScsiEstimateHeapSize()
To perform any SCSI operations from an IO Filter, you will perform dynamic memory allocation for the
SCSI components. To this end, you will need to estimate the memory requirement and specify the maximum
number of devices and SCSI commands that will be allocated on the user-defined heap (as described in the
section Managing Memory in an IO Filter Solution). Use VMIOF_ScsiEstimateHeapSize() to determine how
much heap space is required for the SCSI device handle and the command tracking information.
Once the heap size required for SCSI operations is calculated, add this value to the total
heap required and create the heap using the VMIOF_HeapCreate() function described in the topic “Managing
Memory in an IO Filter Solution,” on page 130.
size_t
VMIOF_ScsiEstimateHeapSize(uint32_t numDevs, uint32_t numCmds)
n uint32_t numCmds — This is the maximum number of SCSI commands that will be issued.
The function returns the size of the heap that will be required; this is the amount of heap that needs to be
created.
Understanding VMIOF_ScsiDiskOpen()
The prototype of the function is as follows:
VMIOF_Status
VMIOF_ScsiDiskOpen(VMIOF_HeapHandle *heap, const char *name, VMIOF_ScsiHandle **phandle)
n const char *name — This is the name of the SCSI disk to which the commands should be issued. Note
that the name does not include the path to the disk; the IO Filter framework supplies the
“/dev/disk” path prefix itself.
n VMIOF_Scsihandle **phandle — The handle to the SCSI disk to which commands will be issued.
The function returns the status of the operation: VMIOF_SUCCESS if the operation is successful, or an
appropriate failure status on a failure condition.
Understanding VMIOF_ScsiClose()
Once you are done performing operations on the SCSI disk, you can close the disk by calling the
VMIOF_ScsiClose() function. The prototype of the function is as follows:
VMIOF_Status
VMIOF_ScsiClose(VMIOF_ScsiHandle *handle)
The function returns VMIOF_SUCCESS if the operation is successful or returns VMIOF_BAD_PARAM if the handle is
invalid.
Understanding VMIOF_ScsiCommandsIssue()
SCSI commands can be issued to the disk using VMIOF_ScsiCommandsIssue(). The commands are
batched and completed asynchronously. On completion, each command calls its callback
function (defined in the VMIOF_ScsiCommand structure). The prototype of the function is as follows:
VMIOF_Status
VMIOF_ScsiCommandsIssue(VMIOF_ScsiHandle *handle, VMIOF_ScsiCommand **cmds, uint32_t count,
uint32_t *submitted)
n VMIOF_ScsiCommand **cmds — The array of pointers to commands that will be batched and sent to the
SCSI disk. When a command completes, the memory allocated for each command should be
freed separately.
n uint32_t *submitted — The number of commands that were successfully submitted. If there are any
malformed commands in the array, they are not submitted; additionally, all commands
subsequent to a malformed command in the array are not submitted.
The function returns VMIOF_SUCCESS if all the commands were submitted. It returns VMIOF_BAD_PARAM if a
command is malformed, VMIOF_NO_MEMORY if there is not enough heap space, and VMIOF_NO_RESOURCES if
the system does not have enough resources.
45. {
46. VMIOF_ScsiCommand *c;
47. uint32_t len;
48. c = VMIOF_HeapAllocate(heap, sizeof *c);
49. assert(c != NULL);
50. memset(c, 0, sizeof *c);
51. len = MIN_BUFSZ + (rand() % (MAX_BUFSZ - MIN_BUFSZ));
52. c->done = ScsiTestCallback;
53. c->doneData = c;
54. c->buffer = VMIOF_HeapAllocate(heap, len);
55. assert(c->buffer != NULL);
56. c->length = len;
57. c->timeoutMS = 0;
58. c->flags = VMIOF_SCSI_FLAG_NONE;
59. c->cdb = VMIOF_HeapAllocate(heap, CMDSZ);
60. assert(c->cdb != NULL);
61. c->cdb[0] = 0x12; /* Opcode for Inquiry Command */
62. c->cdb[1] = 0; /* Misc CDB Info */
63. c->cdb[2] = 0;
64. c->cdb[3] = 0; /* MSB for Allocation Length */
65. c->cdb[4] = len; /* LSB for Allocation Length -> for Inquiry Command needs to be at least 5 */
66. c->cdb[5] = 0; /* Control information */
67. c->cdbLen = CMDSZ;
68. assert(CMDSZ >= 6);
69. c->sense = VMIOF_HeapAllocate(heap, SENSE_BUFSZ);
70. assert(c->sense != NULL);
71. c->senseSize = SENSE_BUFSZ;
72. LOG("Inquiry Command CDB: %x %x %x %x %x %x \n",
73. c->cdb[0], c->cdb[1], c->cdb[2], c->cdb[3], c->cdb[4], c->cdb[5]);
74. return c;
75. }
76. static void ScsiTestCleanupTimerCb(void *data)
77. {
78. int exitVal = (int)(uintptr_t)data;
79. VMIOF_TimerRemove(timer);
80. VMIOF_WorkGroupWait(workGroup);
81. VMIOF_WorkGroupFree(workGroup);
82. workGroup = NULL;
83. VMIOF_ScsiClose(scsiDisk);
84. VMIOF_HeapDestroy(heap);
85. exit(exitVal);
86. }
87. static void ScsiTestWorker(void *data)
88. {
89. VMIOF_ScsiCommand *cmds[NUM_CMDS];
90. VMIOF_ScsiCommand *invalidCmd;
91. uint32_t i, numPending;
92. VMIOF_Status status;
93. for (i = 0; i < NUM_CMDS; i ++) {
94. cmds[i] = ScsiTestMakeCommand(true /* => valid sense buffer */);
95. }
96. status = VMIOF_ScsiCommandsIssue(scsiDisk, cmds, i, &numPending);
97. LOG("VMIOF_ScsiCommandsIssue: status=%d i=%d numPending=%d\n", status, i, numPending);
98. if (numPending == 0) {
149. VMIOF_ScsiClose(scsiDisk);
150. VMIOF_HeapDestroy(heap);
151. stoppedCb(data);
152. }
153. static void ScsiTestCleanup(void)
154. {
155. //Any required cleanup action
156. }
157. VMIOF_DEFINE_DAEMON(
158. .start = ScsiTestStart,
159. .stop = ScsiTestStop,
160. .cleanup = ScsiTestCleanup
161. );
n Lines 1-32: include all the necessary header files and define some global and static variables for the
sample code.
n Lines 33-43: define the callback that is invoked on a completion of the SCSI command issued. In the
callback, we print the status of the command and free all the allocated memory for the IO buffer and
sense buffer. We also free the memory allocated to the SCSI command.
n Line 52: Sets the callback function to invoke when the command completes
n Lines 61-66: Fills the CDB structure with the SCSI command parameters
n Lines 76-86: define the Timer callback to clean up the outstanding workGroups and allocated memory.
n Lines 87-103: define the worker thread callback that is responsible for issuing the SCSI commands to
the SCSI disk.
n Line 94: calls the function to fill up the VMIOF_ScsiCommand structure with appropriate command
parameters.
n Line 97: The parameter numPending is the number of commands that have not yet been
“submitted”.
n Lines 104-146: define the function that is invoked when the daemon starts
n Lines 109-112: obtain the disk name against which the SCSI commands need to be issued. The
disk name is obtained from an environment variable, and some sanity checking is
performed to confirm it is not null. Refer to the section "How to get the
SCSI disk name" for how to obtain the disk name.
n Lines 113-117: calculates the estimated memory requirements for the SCSI commands
n Line 118: creates the heap and returns the heap-handle which will be used to allocate memory.
n Lines 130-144: create the work group and submit the worker that is responsible for issuing the
SCSI commands.
$ pwd
/opt/vmware/vaiodk-6.0.0-2897841/src/partners/samples/iofilters
$ cd countIO
2 Replace the countIODaemon.c file with the sample code shown above. Note that the sample code above
issues SCSI commands from within the daemon context of the framework.
3 Compile the filter and copy the VIB as described in the Section “Chapter Summary,” on page 84
4 Once you have installed the countIO IO Filter on the ESX host and have identified the SCSI disk name
you want to target (refer to the section "How to get SCSI disk name"), type the following on your ESX
console
Sample logs:
The name of the SCSI disk is the value within the parentheses for the parameter “Display Name”. In the
above example, the name of the SCSI disk is “naa.6b083fe0bf1f32001bd84a99083d89bd”.
NOTE The SCSI disk against which SCSI commands are to be issued must not be in active use.
NOTE You can also use the coredump partition as a SCSI disk. This should not be done in a production
environment.
1 On the ESX console, type the following command to list the coredump partition
You can now use this coredump partition as a valid SCSI disk against which you can issue SCSI commands.
WHAT'S NEW In the 60U2 release, the IO Filter Framework fixed a bug in which VMIOF_ScsiClose and a SCSI
command callback could race. It is now acceptable to call VMIOF_ScsiClose from its last SCSI command
callback.
In 60U1, you can work around this as follows: first make sure all the SCSI command callbacks have been
called. Then call VMIOF_ScsiClose to close the device in a poll callback or a timer callback, and later call
VMIOF_HeapDestroy to destroy the heap for the device.
Chapter Summary
The topics in this chapter presented details such that you should now be able to:
d Stop the VM
2 Which of the following should a diskOpen callback do? (choose all that apply)
3 What is the difference between instance data and sidecar data? (fill in the blank)
4 What is the difference between a diskAttach and a diskOpen event? (fill in the blank)
5 When does a filter know that a diskVmMigration has completed successfully? (fill in the blank)
6 Which two callbacks does the IO Filter Framework use to set capabilities / filter properties? (fill in the
blanks)
7 How does a filter know when its VMDK is getting deleted? (fill in the blank)
n “Updating the Dirty State of the VMDK using VMIOF_DiskContentsDirtySet(),” on page 257
For cache class filters, VMware highly recommends using a design pattern that includes having the filter's
Daemon perform all operations on the cache file. This greatly simplifies many other issues related to filter
design. With the Daemon owning the cache file, the LIs must communicate with the daemon to request
cache entries for reads and update the cache for writes. The protocol used between the LI and Daemon is
completely up to the implementor. That said, VMware strongly suggests the design include the following:
n The LIs should set up buffers for IO blocks, and share them with the Daemon via the VMIOF_crossfd*()
functions
n The Daemon should use the VMIOF_AIO*() functions to perform scatter-gather IO between these crossfd
buffers and the cache file. The Daemon must place blocks for read hits into crossfd buffers, and copy
write data from crossfd buffers into the cache file.
The Daemon uses the functions discussed in the following sub-sections to manage the cache file.
bool VMIOF_CacheFileVolumeIsAvailable(void)
VMIOF_Status
VMIOF_CacheFileVolumeGetAvailableSpace(uint64_t *spaceInMB);
If this function returns VMIOF_SUCCESS, it stores the number of megabytes available for new cache files in the
spaceInMB parameter. If it is unable to find the cache volume on the host, it returns VMIOF_NOT_FOUND. If it is
unable to get the free space available, it returns VMIOF_FAILURE.
The only parameter is a structure of type VMIOF_CacheFileParam, which has the following three members:
n const char *name — A name for the cache file to be created. To prevent collisions between competing
users of a VFFS, use a URI-style name of the form com.vendor.filter_name. If you decide to have
multiple cache files in your filter, use the form com.vendor.filter_name.cache_nameX.
n bool needContiguous— Set to true if the solution requires the space in the cache file to be contiguous
within the VFFS volume. If it is true, space will be pre-allocated. If this is set to false, writes can fail due
to lack of space, and it is even possible to create a cache file that is larger than the VFFS partition.
NOTE VMIOF_CacheFileCreate won't fail due to fragmentation. Even if you set needContiguous to true,
you will still be able to consume the entire VFFS volume.
NOTE While you may consider having one cache file per VMDK processed by a filter, VMware
recommends that each IO Filter Solution have a single cache file that it uses for all VMDKs it processes.
size_t VMIOF_CacheFileHandleAllocationSize(void);
The return value is the minimum allocation size required to accommodate the cache file handle.
n VMIOF_HeapHandle heap — The heap from which the handle gets allocated.
n VMIOF_NOT_FOUND — The cache file with the given name was not found.
n VMIOF_FAILURE — The function was unable to get a handle for the cache file.
NOTE You are expected to close the cache file once all operations are completed by calling
VMIOF_CacheFileClose()
n VMIOF_NOT_FOUND — The cache file with the given name was not found.
NOTE You are expected to close the cache file before calling this function.
n VMIOF_NOT_FOUND — The cache file with the given name was not found.
n VMIOF_NO_SPACE — There is not enough space available to resize the cache file.
n uint64_t byteOffset — The offset into the cache file to start freeing space
n uint64_t *bytesFreed — If the function returns VMIOF_SUCCESS, it writes the number of bytes actually
freed into this parameter
n VMIOF_NOT_SUPPORTED — The operation is not supported on contiguous files.
n VMIOF_MISALIGNED — The byteOffset and/or numBytes parameters are not aligned to the correct block
size.
You can only perform this operation on non-contiguous cache files. You cannot free less than one block of
data on a cache file, where the block size is reported by VMIOF_CacheFileVolumeGetBlockSize(). Further,
byteOffset and numBytes must be multiples of that block size.
As shown, this function takes a single parameter. Upon returning VMIOF_SUCCESS, it places the size of the
blocks in the VFFS volume in the space pointed to by the blockSize parameter. A return value of
VMIOF_NOT_FOUND indicates that it was unable to find the cache volume on the host. A return value of
VMIOF_FAILURE indicates that it was unable to get the block size of the cache volume.
VAIO provides the VMIOF_DiskContentsDirtySet() function to set the dirty or clean state of a VMDK. Use
this function to set the state to dirty whenever the VMDK becomes dirty, and to clean when it becomes
clean again. The prototype of this function is:
n VMIOF_DiskHandle *handle — Set this to the handle passed into the callback from which you invoke this
function
NOTE You can only use this function from the diskOpen and the diskClose callbacks.
Chapter Summary
The topics in this chapter presented details such that you should now be able to:
a VMIOF_CacheFileCreate
b VMIOF_CacheFileGetFD
c open
d VMIOF_CacheFileOpen
2 Which IO Filter component does VMware recommend manage and perform IOs to cache files?
a Library
b Daemon
c CIM Provider
3 How many cache files does VMware recommend a caching IO Filter Solution create?
a One
c Two per VMDK it filters, one for clean data and one for dirty data
n Specify in which components you cannot, may not, and may create threads using pthread_create()
n Understand the use of blocking rules in Poll and Timer Callbacks, and WorkGroup Functions
n Understand when you can and cannot hold a lock across calls to the VAIO
n “Blocking Rules in Poll and Timer Callbacks and WorkGroup Functions,” on page 264
n “Troubleshooting vSphere Cluster and Filter Build Configuration (Checklist),” on page 265
1 Filter Library Component — You cannot use pthread_create() in Filter Library code. The SDK looks
for pthread_create() in the source code at compile time and raises an error, preventing compilation. The
reason for this restriction is that Filter Libraries may run in resource-constrained environments, where
creating additional threads may harm the system. One example is hostd, which is limited to 200 total
threads in vSphere 6.0.
Instead, use VMIOF_WorkGroupAlloc() to create a thread pool and VMIOF_WorkQueue() to submit work to
the thread pool.
2 Filter Daemon Component — You may, but should not, use pthread_create() in Filter Daemon code.
VMware prefers that you use VMIOF_WorkGroupAlloc() to create a thread pool and VMIOF_WorkQueue() to
submit work to the thread pool.
3 CIM Provider — You may use pthread_create() as you wish within CIM provider code. That said,
please be considerate of others and limit the number of threads you create to just those you need to
perform CIM operations.
When you use pthread_create() in the context of a library instance, the thread is not allocated from the
thread pool; it is created directly in hostd. When the thread's callback is invoked, the context that
originally created the thread cannot be identified. As a result, you end up crashing hostd.
The following code snippet shows how using pthread_create() in the context of a library instance can
crash hostd. When a snapshot is requested, the diskSnapshot callback function
SampleFilterDiskSnapshot() is invoked. During the SNAPSHOT_PREPARE phase, it creates a thread via
pthread_create(). The thread callback SampleFilterThreadSnapshotCallback() then tries to invoke the
progressFunc() callback. Because the thread is actually created in the hostd context, which has no
knowledge of either progressFunc or completionFunc, hostd crashes. Note that the structure
SnapshotCallbackInfo is passed as a parameter to the thread created in the diskSnapshot callback.
12. sleep(5);
13.
14. if (sninfo->progressFunc != NULL) {
15. sninfo->progressFunc(sninfo->snhandle, VMIOF_SUCCESS);
16. }
17. if (sninfo->completionFunc != NULL) {
18. sninfo->completionFunc(sninfo->snhandle, VMIOF_SUCCESS);
19. }
20. return VMIOF_SUCCESS;
21. }
22.
23. VMIOF_Status SampleFilterDiskSnapshot(VMIOF_DiskHandle *handle,
24. const VMIOF_DiskSnapshotInfo *info)
25. {
26. int ret = 0;
27. SnapshotCallbackInfo *buf = NULL;
28. buf = (SnapshotCallbackInfo *)malloc(sizeof(SnapshotCallbackInfo));
29. buf->snhandle = handle;
30. buf->progressFunc = info->progressFunc;
31. buf->completionFunc = info->completionFunc;
32.
33. VMIOF_Log(VMIOF_LOG_INFO, "In the callback %s\n", __func__);
34.
35. if (info->phase == VMIOF_SNAPSHOT_PREPARE) {
36. pthread_t thread_info;
37. pthread_attr_t attr;
38. pthread_attr_init(&attr);
39. pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
40. ret = pthread_create(&thread_info, &attr,
41. SampleFilterThreadSnapshotCallback, buf);
42. VMIOF_Log(VMIOF_LOG_INFO, "%s : Created thread\n", __func__);
43. return VMIOF_ASYNC;
44. }
45.
46. return VMIOF_SUCCESS;
47.
48. }
n Lines 1-5 : You define the snapshot callback structure SnapshotCallbackInfo, which stores information
about the disk being snapshotted.
n Lines 7-22 : Define the snapshot callback function that runs on the thread created using
pthread_create().
n Line 9 : Note that the parameter buf is cast to SnapshotCallbackInfo and assigned to
sninfo.
n Lines 14 & 17 : Calls to progressFunc and completionFunc respectively. This is where control
should transfer to the LI context. However, since the thread was created in the context of hostd,
which has no knowledge of either the progressFunc or completionFunc, hostd crashes.
n Lines 28-32 : Allocate memory for a variable of type SnapshotCallbackInfo and assign values to the
members of this structure.
n Lines 33-44 : Check whether control reached the diskSnapshot callback in the VMIOF_SNAPSHOT_PREPARE
phase. We then create a thread using pthread_create() with SampleFilterThreadSnapshotCallback()
as its callback, and return VMIOF_ASYNC on line 43.
n When this code sample is executed and a snapshot operation is initiated on the disk to which the
sampfilt filter is attached, you observe a hostd crash. The stack trace looks as follows:
p:767
#20 VigorVim::VigorVimOp::ReportProgress (this=0x259bbcc0, current=0, maximum=512)
at bora/vim/lib/vigorVim/vigorVimOp.cpp:340
#21 0x0ac8a00a in VigorCpp::VigorOp::ResultCbWrapper (cbData=0x259bae58, result=0x26b8004c)
at bora/lib/vigorCpp/VigorOp.cpp:181
#22 0x0aca5d3c in SnapshotVigorProgressCB (progressData=0x1f4a93e4, cur=0, max=512)
at bora/lib/vigorOffline/offlineSnapshot.c:387
#23 0x0c011d34 in SnapshotProgress (cbData=0x2627f83c, pos=0, end=100) at
bora/lib/snapshot/snapshot.c:1789
#24 SnapshotProgress (cbData=0x2627f83c, pos=0, end=100) at bora/lib/snapshot/snapshot.c:1766
#25 0x0bfb7925 in FLDiskNotificationReportProgressToBE (info=0x257fc5a0) at
bora/lib/filtlib/flDiskNotification.c:85
#26 FLDiskNotificationReportProgressToBE (info=0x257fc5a0) at
bora/lib/filtlib/flDiskNotification.c:75
#27 0x0bfb7a8b in FLDiskNotificationProgressCb (handle=0x257de0e0, percentage=0)
at bora/lib/filtlib/flDiskNotification.c:1122
#28 0x0de37151 in ?? ()
#29 0x0c49dd6a in start_thread (arg=0x26b80b70) at pthread_create.c:301
#30 0x0c586d9e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:133
(gdb)
Now go to frame 5 in the stack trace and list the code. You see that it is trying to get the thread from the
thread pool.
(gdb) frame 5
#5 0x070e540b in Vmomi::PropertyCollectorInt::PropertyCollectorImpl::TriggerProcessGUReqs
(this=0x25c5ade8, filter=0x255d7248)
at bora/vim/lib/vmomi/propertyCollector.cpp:1410
1410 this, &PropertyCollectorImpl::ProcessGUReqs));
(gdb) list
1405 /* schedule the thread
1406 */
1407
1408 GetThreadPool()->ScheduleWorkItem(
1409 MakeFunctor(
1410 this, &PropertyCollectorImpl::ProcessGUReqs));
1411 }
1412 }
1413 }
1414
If you look at the disassembly, you see that GetThreadPool() is in fact returning a NULL pointer.
0x070e5403 <+227>:
call 0x70ce420 <_ZN7Vmacore6System13GetThreadPoolEv@plt>
0x070e5408 <+232>: lea -0x38(%ebp),%edi
=> 0x070e540b <+235>: mov (%eax),%edx
The reason GetThreadPool() returns NULL is that the progressFunc callback referred to on line 15 of
the code snippet shown above is not running on a hostd thread-pool thread. It is in fact running on the
thread created directly by pthread_create() inside hostd, which has no knowledge of, or control over, the
IO Filter Library Instance context's threads.
In both LIs and Daemons, no blocking functions may be called in Poll callbacks and Timer callbacks. In the
IOFilter framework there is a single thread, for each VMX and iofilterd, that calls all Poll callbacks and Timer
callbacks. As a result, if any callback blocks, all other callbacks have to wait. Furthermore, if one callback
depends on another callback, for example by waiting for a semaphore to be posted by it, it will deadlock,
since the other callback will never have the chance to run and post the semaphore.
In LIs, no blocking functions may be called in WorkGroup functions. There are several worker threads in the
IOFilter framework that are responsible for processing all the WorkGroup functions in LIs, but in some
application environments the number of worker threads might be as few as one. If any WorkGroup
function blocks, the worker thread blocks and cannot process other functions. As with Poll callbacks and
Timer callbacks, WorkGroup functions that depend on each other are subject to deadlock.
Daemons, however, can use a set of the available worker threads (currently 64) for blocking operations.
You might wonder what you can do if you really need to block and wait for something to complete before
you can proceed. Our recommendation is to schedule a timer callback that periodically checks the status of
the thing you are waiting for. One example is the diskDetach callback, in which you need to detach your
filter and do some cleanup, such as removing sidecar files, while your filter might still be issuing its own
IOs. You need to wait for these IOs to complete before you can detach the filter. In this situation, schedule a
timer callback that periodically checks whether all the IOs have completed; you can also use this timer
callback to call the VMIOF_DiskOpProgressFunc provided in VMIOF_DiskDetachInfo. Once all the IOs have
completed, perform the remaining cleanup steps.
The URL you specified in InstallIOFilter_Task when installing the IO Filter bundle must remain accessible
until the filter is uninstalled. The reason is that VC does not store the bundle, so it has to access that URL to
get bundle information every time a new host joins the cluster, a stateless host reboots and reconnects to
VC, and when the filter is uninstalled from the cluster.
If the URL is not accessible, you will observe various errors. If the error message on vCenter Server is not
clear enough, you can examine the vpxd.log and eam.log in vCenter Server to figure out the reason.
For example, if the URL is not accessible when trying to uninstall the filter, the task fails and gives an error
message on vCenter Server such as:
This topic provides a brief list of things to check in your cluster and build configuration:
n Each host is in Community Supported mode — The InstallIOFilter_Task method requires this mode
to install unsigned VIBs.
n VIB's 'acceptance-level' set to 'community' in the SCONS file — The InstallIOFilter_Task method
requires this acceptance level to install unsigned VIBs.
n During install/uninstall, EAM will first put the host into maintenance mode, and take it back out
afterward. EAM does this successfully; however, it reports an error in the Task List in the
vSphere Web Client UI. You can ignore this error message for now; it will be fixed in the next release.
n DRS is properly set - When uninstalling or upgrading, DRS will migrate running VMs to other hosts. In
order for DRS to do so, (1) DRS needs to be set to Fully Automated Mode, (2) VMs need to be on shared
storage, (3) vMotion Traffic Tag must be set on one of the vmknics on all the hosts.
n Hosts / vCenter date / time synchronized — The authentication system used between vCenter Server
and ESXi hosts is time sensitive. If the time on the systems is too far out of sync, certain cluster
operations may fail. A best practice is to configure the vCenter Server and ESXi hosts to use the same
Network Time Protocol (NTP) server.
Following is a code snippet showing how to use this ramdisk to generate a shared memory key and attach a
shared memory segment.
key_t key;
int shmid;
char *data;
size_t size = 1024*1024*2;
key = ftok("/var/run/iofilters/com.companyA.productA", 'R');
if (key == (key_t)-1) { /* handle ftok() failure */ }
shmid = shmget(key, size, 0644 | IPC_CREAT);
if (shmid == -1) { /* handle shmget() failure */ }
data = shmat(shmid, (void *)0, 0);
if (data == (char *)-1) { /* handle shmat() failure */ }
n VMIOF_DiskIOSubmit() — This function is used to submit a previously allocated disk IO object to the
kernel.
Consider a filter A that invokes the following functions in sequence in its diskIOStart callback function —
pthread_mutex_lock(M);
VMIOF_DiskIOSubmit();
pthread_mutex_unlock(M);
Now, if this thread tries to acquire the same mutex M in its completion callback, it can self-deadlock. A
filter underneath might complete the IO immediately, so that the completion path executes inside
VMIOF_DiskIOSubmit itself, on the same thread that still holds the mutex.
The following table summarizes the various sidecar functions along with the list of valid callback functions
that can invoke them —
VMIOF_DiskSidecarGetSize() ANY
NOTE If the filter implements the diskStun and diskUnstun callbacks, then in stunned state,
VMIOF_DiskSidecarRead() and VMIOF_DiskSidecarWrite() can be used in all LI callbacks, while in unstunned
state, they can be used in any context.
Heap Debugging
When the Library Instance exits, an ASSERT() will occur if there are any memory leaks. In order to debug
the heap, you will require the heap.gdb script referenced in this topic. At the current time, it is not available
in the Dev Kit, so please open a DCPN case requesting this script.
As an example of using this script, the following code was added to the countIO sample in the
TestDiskStartIO() function
if (data->ioCount == 0) {
// Allocate an I/O that is never freed, creating a deliberate leak
status = VMIOF_DiskIOAlloc(handle, data->heap, 1, &outIO);
}
Running this filter causes an ASSERT() when the filter is unloaded, and the heap.gdb script is used to assist
in determining the leak.
(gdb) frame 5
#5 0x01f880f8 in HeapDestroy (heap=0x2b926000) at bora/lib/filtlib/heap.c:743
743 bora/lib/filtlib/heap.c: No such file or directory.
in bora/lib/filtlib/heap.c
(gdb) heapprint heap
heap: countio
dlmalloc segment 0: base=0x2b928000, size=12288, next=0x2b927fe0
INUSE: mchunkptr: 0x2b928000 (raw addr 0x2b928008); mchunk size=160
Poison @0x2b928094 OK; bytes: 132, sufixBytes: 24, callerPC: 0x1f74931
<FiltLibVmiofDiskAllocIO+97>:
mov -0x38(%ebp),%edx
dlmalloc segment 1: base=0x2b92608c, size=8052, next=(nil)
FREE: mchunkptr: 0x2b926268 (raw addr 0x2b926270); mchunk size=7536 fd=0x2b926268
bk=0x2b926268
(gdb) p ((FiltLibDiskIO*)0x2b928008)->context.ioCount.value
$1 = 1
(gdb) p *((FiltLibDiskIO*)0x2b928008)
$2 = {diskIO = 0x2b928064, heap = 0x2b926000, completionStatus = VMIOF_SUCCESS, sharedDataIndex
= 0, context = 0x1f05f938,
currentClass = FILTLIB_DISK_FILTER_CLASS_CACHE, currentClassGuard =
FILTLIB_DISK_FILTER_CLASS_INVALID, referenceCount = {value = 1},
submit = 0x1f7acc0 <FiltLibSubmitUserIO>, finalize = 0x1f73e30 <FiltLibFreeIO>, completionPairs
= {{callback = 0, data = 0x0},
{callback = 0, data = 0x0}, {callback = 0, data = 0x0}, {callback = 0, data = 0x0}, {callback =
0, data = 0x0},
{callback = 0x1f744a0 <FiltLibDefaultUserIOCompletionCb>, data = 0x0}}, debugInfoIndex = 1}
(gdb)
/opt/vmware/vaiodk-symbols-6.0.0-2799832
/opt/vmware/vaiodk-symbols-6.0.0-2897841
Then "make prep-debug" and "make live-debug" will automatically choose the right symbol files to use.
During the following example, the symbol files of build 2799832 are chosen:
proma-2n-dhcp211:/opt/vmware/vaiodk-6.0.0-2897841/src/partners/samples/iofilter/sampfilt # make
prep-debug
RDTSC(void)
{
#ifdef VM_X86_64
uint64 tscLow;
uint64 tscHigh;
__asm__ __volatile__(
"rdtsc"
: "=a" (tscLow), "=d" (tscHigh)
);
return tscHigh << 32 | tscLow;
#elif defined(VM_X86_32)
uint64 tim;
__asm__ __volatile__(
"rdtsc"
: "=A" (tim)
);
return tim;
#endif
}
n libvmkuserlib – This is a stable interface exported by the VMkernel to userlevel applications that want
to access VMkernel specific functions.
n librt – This library provides most of the POSIX Realtime Extension interfaces. VMware supports a
limited subset of these interfaces.
The four libraries are linked dynamically and automatically. All you need to do is include the header files
and run make.
If you want to use any other library, either public or proprietary, there are two cases. If the library has no
dependencies, you can statically link it into your Library Instance or Daemon by using the "extra objects"
keyword in the scons file. If the library has any dependency, e.g., glibc, you will have to extract the code you
need from the library and create your own static library. You will need to take care of any legal obligations
involved.
Please let us know if you depend heavily on a public library for which creating a static library as described
above is not feasible. We will consider adding it to our supported list of dynamic libraries in a future
release.
Development Tips
The following tips can help speed up your development efforts:
n Use the built in NFS client on ESXi to NFS mount the Linux Dev System:
n Can be used with symlinks instead of doing VIB installs after rebuilding your code (only after the first
VIB install)
n In order to reload the daemon, you will need to stop/start it using the instructions in the document
"vSphere APIs for IO Filtering Development Kit (VAIODK) Guide for the Command Line" in the
section "Starting and Stopping the IO Filter Daemon"
n Create a test user-space program to interact with the filter. See the section "Building a Test Application
for Your IO Filter" in the document "vSphere APIs for IO Filtering Development Kit (VAIODK) Guide
for the Command Line" .
n Use cache files for local databases (even if it's not a caching solution)
n Use “extern char * program_invocation_name” within the Library Instance code to determine who
loaded the library. This is important if you need to understand if you are being loaded in the context of
the VMX, or a hostd process such as vmkfstools
n If you have code that creates a sidecar but you need to remove the sidecar manually due to a bug, you
can do so by editing the VMDK descriptor file with vi or an equivalent text editor. An example of the
relevant line in the VMDK file is as follows:
ddb.sidecars = "sampcache_1,iofilter-97b1142038b77f6d.vmfd"
2 The ESXi configuration EnableBlockDelete is set, which is not the default. However, setting this
configuration has additional ramifications, because vSphere then sends TRIM/UNMAP commands
whether or not the underlying device is a local SSD, and regardless of whether the volume is VMFS or
VFFS.
n Put the information in a sidecar or VMDK only created and used by the daemon
If you don't want to implement shared VMDK (multi-writer) support for guests such as Oracle, you should
fail the open call if you see the flag VMIOF_DISK_SHARED.
The IO Filter framework won't allow attaching a filter to a shared VMDK while the VM is powered on. If
you attach a filter to a shared VMDK when the VM is powered off, the attach succeeds, but the
VMIOF_DISK_SHARED flag is not present in either the diskAttach or diskOpen callbacks, as the Framework does
not yet have that information. You only see the VMIOF_DISK_SHARED flag in diskOpen when the VM is being
powered on. If you return VMIOF_NOT_SUPPORTED, the customer sees the error message "The specified feature
is not supported by this version" on VC; you can also log your own message in vmware.log to assist
customer troubleshooting.
Q: What is the maximum combined size of all the elems in a DiskIO structure?
We don't have any limit on the total size of an IO. Besides, since the filters are chained, if there is a filter
above your filter, it can issue IOs of arbitrary size.
Use the following commands to generate a backtrace for a specific world in vmkernel.log:
Use the following commands to force vSphere to generate a kernel core file:
Once you have installed valgrind for ESXi, issue the following command:
For the ESX host, you can run "esxcli software vib get -n esx-base" and search for vmiof_; this tells you
whether the host is IO Filter capable and which IO Filter versions it supports.
Q: The function VMIOF_DiskAdapterGet() fails with the status VMIOF_NOT_FOUND after a new
disk has been added to a running VM. What is the workaround?
This is a known bug, and likely won't be fixed since this function is marked as deprecated. The workaround
is to call the function from the first diskUnstun() callback after diskOpen().
Chapter Summary
The topics in this chapter presented details such that you should now be able to:
n Specify in which components you cannot, may not, and may create threads using
pthread_create()
n Understand the use of blocking rules in Poll and Timer Callbacks, and WorkGroup Functions
n Understand when you can and cannot hold a lock across calls to the VAIO
Symbols
/opt/vmware 43
/var/log/hostd.log 50
/var/log/iofilter-init.log 50
/var/log/syslog.log 50
Numerics
32-bit 33
64-bit 33
A
Acknowledgements 7
Asynchronous 22
B
BC 20
Buffer Cache 20
build 48
buildId 43
bundle 48
Bundle 27, 264
C
cache 15
Capabilities 74
CATALOG 116
Chapter Objectives 13
Chapter summary 12
checklist 265
CIM 26, 89, 116
CIM Provider (CIMP) 116
CIMP 116
class 15
cluster 27
ClusterIOFilterInfo 68
clusterMIOD 68
Common Information Model (CIM) 26
Common rule set 74
CommunitySupported 50
compression 15
config.xml 53
countIO 45
Course Prerequisites 9
crossFD 184
D
Daemon 25, 113
DAEMON MEMORY RESERVATION 100
Daemon Start Callback 113
Daemon Stop Callback 114
Development 270
Development Environment Requirements 32
diskAttach 103, 155
diskClone 106, 210
diskClose 104, 165
diskCollapse 105, 201
diskDeleteBlocks 105, 178
diskDeleteBlocksPrepare 105, 178
diskDetach 61, 103, 158
diskExtentGetPost 109, 204
diskExtentGetPre 109, 204
diskGrow 109, 233
diskIOAbort 108, 180
diskIOReset 181
diskIOsReset 108
diskIOStart 107, 257
DiskIOStart 169
DiskLib 17, 21
diskOpen 104, 159
diskRelease 109
DiskRelease 228
diskRequirements 106, 153
diskSnapshot 104
diskStun 108, 208
diskUnstun 108, 208
diskVmMigration 107
downloading VAIODK 34
E
encryption 15
esxcli 62
esxcli software vib install 53
esxcli system maintenanceMode set -e true 62
F
FAQ 270
filter class 74
Filter class 22
filterID 68
FSS 19
G
gdb 118, 120
gdbiof 118, 120
glossary 7
H
Heap 266
hostd 17, 33
I
init.d 53
inspection 15
InstallIOFilter_Task 68
instance data 24
intended audience 7
IO Filter Architecture 28
IO Stack 14
iofiltd 53
iofilterd 25, 33, 53
IOFilterManager 68, 265
J
json 98
L
LI 23
Library Instance (LI) 23
live-debug 120
locale 116
localization 116
M
maintenanceMode 62
make 47
Makefile 88, 89
Managed Object Browser (MOB) 68, 83
meta data 24
MOB 68, 83
N
name.description 116
name.label 116
NFS 19
non-persistent 24
O
Offline filtering 17
Online filtering 17
overview 9
P
persistent 24
Poll Callback 264
properties 60
Properties 74
property 60
proxy 45
pthread_create 260
python 53
Q
QueryIOFilterInfo 68
R
RAMDisk 265
RDTSC 269
Remote System Explorer (RSE) 55
replication 15
requiredMemoryPerDiskMiB 106, 153
requiredMemoryPerIO 106, 153
requiredStaticMemory 106, 153
revert 192, 198
RSE 55
Rule-Set 1 74
S
sampcache 45
sampfilt 45
scons 47, 88
SCONS 25, 89
scp 53
SecretSauce Library 22
sidecar 24
sidecar data 24
sidecars 24
SimpleHTTPServer 53
snapshot 192, 198
Snapshot 193
software acceptance 50
SPBM Policy 74
SSLib 21, 22
SSMod 21
static library 269
stdout 50
Strategic Objective 10
Summary 29
Synchronous 22
T
Tactical objectives 10
Timer Callback 264
U
uninstall 47
UninstallIoFilter_Task 68
Unix domain socket 184
unwrapping VAIODK 34
upcall 21
UpgradeIoFilter 68
UpgradeIoFilter_Task 68
User Object Module 19
User space cartels 16
utility functions 181
V
VAIO 88
VAIODK 88, 89
vCenter Server 27
VCSA 27
VFFS 25
vFlash 25
VFS 19
VIB 27, 48, 53, 89
vixDiskLib 17, 21
VM Storage Policy 74
VMFS 19
vmiof_aio.h 44
VMIOF_AIOAbort 186
VMIOF_AIOCallback 186
VMIOF_AIOQueueCreate 186
VMIOF_AIOQueueDestroy 186
VMIOF_AIOQueueDrain 186
VMIOF_AIOSubmit 186
VMIOF_ASYNC 22
VMIOF_cache 253
vmiof_cache.h 44
vmiof_crossfd.h 44
VMIOF_CrossfdCreate() 184
VMIOF_CrossfdGrantAccessToRange() 184
VMIOF_CrossfdRevokeAccessToRange() 184
vmiof_daemon.h 44
VMIOF_DEFINE_DISK_FILTER 101
VMIOF_DirtyState 257
VMIOF_DISK_CLEAN 257
VMIOF_DISK_DIRTY 257
VMIOF_DISK_SIDECAR_ALIGN 137
VMIOF_DISK_SIDECAR_KEYMAX 137
VMIOF_DISK_SIDECAR_KEYMIN 137
VMIOF_DISK_STUN_FLUSH_DIRTY_DATA 208
vmiof_disk.h 44
VMIOF_DiskCloneInfo 106, 210
VMIOF_DiskCollapseInfo 105, 201
VMIOF_DiskContentsDirtySet 257
VMIOF_DiskDeleteBlockDesc 178
VMIOF_DiskDeleteBlocksInfo 178
VMIOF_DiskDetachInfo 158
VMIOF_DiskFilterPrivateDataGet 145
VMIOF_DiskFilterPrivateDataSet 145
VMIOF_DiskFilterProperty 110
VMIOF_DiskGrowInfo 109
VMIOF_DiskHandle 104
VMIOF_DiskInfo 104
VMIOF_DiskIO 107, 108, 167, 169, 182
VMIOF_DiskIOAlloc 169, 182, 183
VMIOF_DiskIOComplete 107, 169
VMIOF_DiskIOCompletionCallback 183
VMIOF_DiskIOCompletionCallbackSet 107, 169, 183
VMIOF_DiskIOContinue 169
VMIOF_DiskIODup 169, 182, 183
VMIOF_DiskIOElem 107, 167, 169
VMIOF_DiskIOFree 183
VMIOF_DiskIOSubmit 169, 183
VMIOF_DiskMaxOutstandingIOsGet 169
VMIOF_DiskMigrationPhase 107
VMIOF_DiskOpCompletionFunc 107
VMIOF_DiskOpProgressFunc 106–108
VMIOF_DiskResetIdentifier 107, 108, 169
VMIOF_DiskSidecar 137
VMIOF_DiskSidecarClose 137
VMIOF_DiskSidecarCreate 137
VMIOF_DiskSidecarDelete 137
VMIOF_DiskSidecarGetSize 137
VMIOF_DiskSidecarOpen 137
VMIOF_DiskSidecarRead 137
VMIOF_DiskSidecarSetSize 137
VMIOF_DiskSidecarWrite 137
VMIOF_DiskSnapshotInfo 104
VMIOF_DiskStunFlags 108
VMIOF_DiskStunInfo 108
VMIOF_DiskUuidGet 149
VMIOF_DiskVmMigrationInfo 107, 216
VMIOF_DiskVmMigrationIpSpec 107
VMIOF_DiskVmMigrationType 107
VMIOF_FailureReportDisabled 151
vmiof_heap.h 44
VMIOF_HeapAllocation 130
VMIOF_HeapDestroy 130
VMIOF_HeapEstimateRequiredSize 130
VMIOF_HeapFree 130
VMIOF_HeapHandle 130
VMIOF_IOFlags 107, 167, 169
VMIOF_LOG_ERROR 50
VMIOF_LOG_INFO 50
VMIOF_LOG_PANIC 50
VMIOF_LOG_TRIVIAL 50
VMIOF_LOG_VERBOSE 50
vmiof_log.h 44
vmiof_poll.h 44
VMIOF_PollAdd 141
VMIOF_PollHandle 141
VMIOF_PollRemove 141
VMIOF_READ_OP 167
vmiof_scsi.h 44
VMIOF_ScsiCallback 242
VMIOF_ScsiClose 242
VMIOF_ScsiCommandsIssue 242
VMIOF_ScsiDiskOpen 242
VMIOF_ScsiEstimateHeapSize 242
VMIOF_ScsiHandle 242
VMIOF_STATUS 102
vmiof_status.h 44
VMIOF_SUCCESS 22
vmiof_timer.h 44
VMIOF_TimerAdd 171
VMIOF_TimerHandle 171
VMIOF_TimerRemove 171
VMIOF_VirtualDiskClose 231
VMIOF_VirtualDiskCreate 231
VMIOF_VirtualDiskOpen 231
VMIOF_VM_IO 167
vmiof_work.h 44
VMIOF_WRITE_OP 167
VMIOF_ZERO_COPY 167
vmiof.h 44
vmkaccess 231
vmkfstools 20, 33, 60, 112
VmkuserVersion_GetUniqueSystemVersion 150
VMware Workbench 32
vmware.log 50
VSAN 19
vSCSI Module 19
vSphere Cluster Configuration 265
vSphere Web Client (VWC) 28
VWC 28
W
WorkGroup Function 264
X
xMigration 216
Z
zdump 118