Inside the Windows 95 File System
Stan Mitchell
O'REILLY
Cambridge • Köln • Paris • Sebastopol • Tokyo
Inside the Windows 95 File System
by Stan Mitchell
Published by O'Reilly & Associates, Inc., 101 Morris Street, Sebastopol, CA 95472.
Printing History:
Nutshell Handbook and the Nutshell Handbook logo are registered trademarks, and The Java
Series is a trademark, of O'Reilly & Associates, Inc. The use of the mollusk image in
association with Windows file systems is a trademark of O'Reilly & Associates, Inc. Windows,
Windows NT, and Windows 95 are registered trademarks of Microsoft Corporation. Many of
the designations used by manufacturers and sellers to distinguish their products are claimed
as trademarks. Where those designations appear in this book, and O'Reilly & Associates, Inc.
was aware of a trademark claim, the designations have been printed in caps or initial caps.
While every precaution has been taken in the preparation of this book, the publisher assumes
no responsibility for errors or omissions, or for damages resulting from the use of the
information contained herein.
This book is printed on acid-free paper with 85% recycled content, 15% post-consumer waste.
O'Reilly & Associates is committed to using paper with the highest recycled content available
consistent with high quality.
ISBN: 1-56592-200-X
Table of Contents
Implementation of VWIN32_Int21Dispatch
fhandle Structures and the FSD's Handle-Based Function Table
10. Virtual Memory, the Paging File, and Pagers
The Windows 95 Paging File
Pagers
Where Does Block Cache Memory Come From?
How Does the Memory Manager Control Block Cache Size?
Miscellaneous
Debugging
VREDIR Interfaces
The SMB File Sharing Protocol
Bibliography
This book will walk you through the inner workings of the Windows 95 file
system. The standard file systems which ship with Windows 95 include: VFAT, the
virtual FAT file system; VREDIR, the Microsoft Networks client; and NWREDIR, the
Microsoft Netware client. These and other file systems supplied by third party
developers register with the Installable File System Manager, or IFSMgr, to make
their services available to the system. IFSMgr manages the resources which are
currently in use by each file system and routes client requests to the intended file
system.
This book anticipates some of the changes to the file system which will appear in
the successor to Windows 95 (code-named Memphis). These new features include
FAT32, support for volumes up to 2 terabytes in size, and WDM (the Win32
Driver Model). The Microsoft Networks file and printer sharing protocol, SMB
(Server Message Block), is also undergoing some changes to make it
suitable for accessing the Internet. SMB's future extension to the Internet as CIFS
(the Common Internet File System) is also examined.
The core of this book is based on the flow of execution through the layers of the
file system (stopping short of the disk system, managed by IOS, the I/O
Supervisor). Requests are made of the file system through the application programming
interfaces (APIs) that are appropriate for the operating environment (interrupt
21h, Win16, or Win32). These requests ultimately arrive at IFSMgr, which must
find a file system driver to relay the request to. Although three different Windows
95 operating environments generate these requests, IFSMgr relays them to the file
system drivers using a common I/O request packet structure. A file system driver
doesn't know and doesn't care if the request originated in a DOS application or in
a Win32 program.
As file system requests pass through IFSMgr on their way to file system drivers, a
file system monitor may intercept the I/O request packets. These monitors may
simply report the file system requests and pass them on, or they may change the
operation or direct it to a different driver. This capability provides some
interesting possibilities for developers.
The structure of file system drivers (FSDs) is examined and two sample FSDs are
implemented. One is for a character device which acts as an interface to a
monochrome display adapter; the other implements a "file system within a file" by
using some of IFSMgr's ring-0 services. The VFAT and VREDIR file system drivers
are also scrutinized.
Our coverage will stray a little from IFSMgr and FSDs by examining paging and
cache services. The paging file in Windows 95 is implemented as a VFAT file;
page-ins and page-outs to this file are done using the system pagers, routines
which control the lifecycle of pages. FSDs rely upon VCACHE's services to keep
the most recently used disk blocks in memory, thereby minimizing disk "hits."
Chapter 11, on VCACHE, will explain how these services work.
Since much of this material is new, you are probably wondering: "What is the
source for this information? Do you have access to IFSMgr source code, or do you
have a good connection at Microsoft?" Recently, Geoff Chappell (author of DOS
Internals) was asked a similar question in an Internet newsgroup. His answer says
it all:
Q: So have you gotten your hands on IFSMgr code somehow, or are you just
hacking through it with SoftICE?
A: I have my hands on IFSMgr code. So have you. Source code, of course, is
another matter-but why should I want that? I may be the only person on the
planet who works primarily with VxDs but who doesn't use SoftICE (and
indeed never have), but yes, if I talk of looking over code, I mean the code
that the machine sees. I prefer to think of this as high-quality documentation
written in a language that happens not to be English. It is, however, the only
authoritative, reliable documentation that Microsoft releases.
Versions
Unless otherwise stated, code fragments shown in the book are from Windows 95
build 950. This is the retail release of the product. Some material is specific to
OEM Service Release 2, also known as Windows 95 build 950B. References to this
material are flagged with the abbreviation "OSR2".
Intended Audience
This book is geared to engineers and managers who wish to tap into the new
capabilities of Windows 95. IFSMgr, file system drivers, and file system monitors
are all implemented as kernel mode or ring-0 components. In the Windows 95
environment this means they are implemented as virtual device drivers, or VxDs.
First-hand experience with VxDs is not a requirement for reading this book.
However, I do not attempt to provide a tutorial on VxDs.
Chapter Summary
This book contains fourteen chapters and four appendixes:
Chapter 2, Where Do Filenames Go? traces the path of filenames, UNC names, and
device names as they pass through the file system.
Chapter 3, Pathways to the File System, examines the mechanisms that the kernel
(VMM) uses to allow DOS, Windows 3.x, and Win32 programs access to IFSMgr.
Chapter 4, File System API Mapping, reveals how the Win32 APIs create Kernel32
file objects and how file object services ultimately become Interrupt 21h requests.
Chapter 5, The "New" MS-DOS File System, shows that the MS-DOS interrupt
interfaces are still supported but now they are mostly implemented in IFSMgr's ring-0
code.
Chapter 6, Dispatching File System Requests, looks at how I/O request packets
are routed to file system drivers. Three key IFSMgr data structures are introduced:
the ifsreq structure, the shell resource, and the fhandle structure. These data
structures allow IFSMgr to call into the appropriate file system driver entry points.
Chapter 7, Monitoring File Activity, examines the use of file system hooks and
looks at several example programs. IFSMgr_NetFunction and path hooks are also
discussed.
Chapter 8, Anatomy of a File System Driver, looks at the details of the linkage
between file system drivers and IFSMgr. It examines in detail how each type of
FSD handles the mounting and dismounting operations. Two sample FSDs are
described: MONOCFSD, a character FSD, and FSINFILE, a remote FSD.
Chapter 9, VFAT: The Virtual FAT File System Driver, reviews the FAT16 file
structure and contrasts it with that of FAT32. Some implementation details of VFAT are
examined, including initialization and registration, mounting a volume, opening a
file, and locating a directory. Some basic IOS data structures and services are
introduced.
Chapter 10, Virtual Memory, the Paging File, and Pagers, shows how the paging
file is accessed via IFSMgr. The use of each of the system pagers is also explored.
Chapter 11, VCACHE: Caches Big and Small, describes the VCACHE services and
data structures. Many undocumented features are described here.
Chapter 12, A Survey of IFSMgr Services, categorizes and enumerates all IFSMgr
services. It provides undocumented details on heap management, event
management, and path-parsing services.
Chapter 13, VREDIR: The Microsoft Networks Client, looks at how the redirector
interfaces with other network components. The NetBIOS and SMB protocols are
introduced, and these protocols are traced with MultiMon to see how remote file
system requests are handled. The CIFS protocol is contrasted with the SMB
protocol.
Chapter 14, Looking Ahead, explores the differences between the Windows NT
and Windows 95 file systems. The impact of WDM is also assessed.
Appendix A, MultiMon: Setup, Usage, and Extensions, describes how to install and
use MultiMon, a Windows 95 internals snooping tool. A sample extension driver
is also described.
Appendix D, IFS Development Aids, describes four tools for VxD writers using the
DDK, including IFSWRAPS, a library of all IFSMgr services, and DEBIFS, a
debugger "dot" command for examining IFSMgr data structures.
Typographical Conventions
Throughout this book, we have used the following typographic conventions:
Bold
Indicates the name of a Windows API or a VxD service name, functions,
monitors, and commands. Bold is also used to indicate menus, buttons, dialogs,
and other parts of the Windows 95 GUI.
Italic
Indicates filenames and variables, and is used for emphasis. Manifest constants
are represented by uppercased italicized names, e.g., MAXFUNC.
Constant width
Indicates a language construct such as a data type, a data structure, a macro,
or a code example.
Getting Updates
Updates to the source code on the companion diskette can be found at:
http://www.sourcequest.com/win95ifs
From time to time, new utilities will be posted there for download.
Acknowledgments
Thanks are due to the many people who have helped make this book possible.
Andrew Schulman, my editor, who saw the significance of the Windows 95 file
system and encouraged me to expose it in a Nutshell series book. This book
would not have been attempted without his encouragement. Although he sparks
controversy by his writings, he has won the admiration and respect of the
Geoff Chappell, for sharing some of his intimate knowledge of IFSMgr. Material
that he has generously provided is duly noted.
Rajeev Nagar, author of the forthcoming Windows NT File System Internals, for
making suggestions about the content of the "Looking Ahead" chapter.
Mark Russinovich, for supplying me with an advance copy of his Dr. Dobb's
Journal article, "Examining the Windows NT Filesystem" (February 1997), written
with Bryce Cogswell.
Russ Arun at Microsoft for prying the "IFS Specification" loose and getting it into
developers' hands during the Chicago beta.
The many developers who post file-system related questions in the Internet
newsgroups and CompuServe forums. Some of these questions became the basis
for a book topic or sample program.
The crew at O'Reilly who helped this novice bookwriter learn the ropes. Special
thanks to Troy Mott, my "O'Reilly connection," who helped resolve many issues
that arose during the course of the project. Thanks also to Edie Freedman for her
excellent cover design. Frank Willison, Editor in Chief, who made many
suggestions for improvement. David Futato, for producing an attractive addition to our
bookshelves.
And last, but not least, Maggie, my wife, for enduring yet another project. Her
support kept me sane during the long haul. She also kept an eye on my schedule
and kept me moving towards the final goal.
From IFSMgr to the Internet
The file system in Windows 95 resides in a component named the Installable File
System Manager, or IFSMgr. As its name suggests, IFSMgr is responsible for
routing file system requests to the installed file systems. Multiple file systems are
implemented as independent drivers underneath IFSMgr. Thus, it is hard to get a
complete picture of the file system without examining file system drivers (FSDs)
too. Later chapters will focus on the underpinnings of IFSMgr and file system
drivers, but for now let's get a feel for why the file system is so important.
Long Filenames
One of the most touted features of Windows 95 is its support for long filenames.
This support is brought to you through the Win32 API (application programming
interface) and also through the clunky, old Int 21h interface. These two interfaces
cover three of the Windows 95 operating modes: Win32, Win16, and DOS box.
But what about MS-DOS mode, the real-mode DOS version 7.0? Does it support
long filenames?
To find out, let's build the simple DOS application in Example 1-1, which uses
one of the new long filename APIs (the source and executable for this example
are in the DOSVOL directory of the companion disk).
For brevity, Example 1-1 does not display the implementations of several support
routines such as GetStartupDrive(), GetVolInfo(), etc. These are small C functions
that contain inline assembler Int 21h calls.
This little application prints the MS-DOS version and, if Windows is detected, the
Windows version as well. The function GetVolInfo moves its function arguments
into appropriate registers and then invokes interrupt 21h function 71a0h. This Int
21h service returns volume information for the drive specified by a root path
string, e.g., C:\. If successful, this service returns the file system name, the
maximum length for a filename component, and the maximum length for a fully
qualified filename for the specified volume. This is essentially the DOS equivalent
of the Win32 function GetVolumeInformation.
    if ( WinCheck() == 0 )
        printf( " - Windows Version %d.%02d\n", GetWinMajorVersion(),
                GetWinMinorVersion() );
    else printf( "\n" );

    /* Build the root path: start from "@:\" and add the 1-based */
    /* startup drive number, so that 3 yields "C:\".             */
    strcpy( szRootName, "@:\\" );              /* volume string */
    szRootName[0] += GetStartupDrive();

    printf( "Get Volume Information, Int 21h Function 71A0h.\n" );
    if ( !GetVolInfo( szRootName, szFS, sizeof(szFS),
                      &maxfn, &maxpath, &flags ) )
        printf( " Drive %c - FAILED.\n\n", szRootName[0] );
    else
        printf( " Drive %c - File system: %s MaxFileName: %d "
                "MaxPathName: %d\n\n", szRootName[0], szFS, maxfn,
                maxpath );
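The root-path step in the listing can be pulled out into a small routine. The following is a minimal sketch, not code from the companion disk: the helper name make_root_path is hypothetical, and it assumes (as the "@" trick in the listing implies) that the startup drive is reported as a 1-based number, with 1 meaning drive A.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: builds "X:\" for a 1-based drive number
 * (1 = A, 2 = B, 3 = C, ...). Returns 0 on success, -1 on bad input. */
int make_root_path(int drive, char *out, size_t outsize)
{
    assert(out != NULL);
    if (outsize < 4 || drive < 1 || drive > 26)
        return -1;
    strcpy(out, "@:\\");              /* '@' precedes 'A' in ASCII */
    out[0] = (char)(out[0] + drive);  /* 3 turns '@' into 'C' */
    return 0;
}
```

The copy has to happen before the drive number is added; doing it the other way round would overwrite the adjusted first character with "@" again.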
Now let's take the same DOS application and execute it in MS-DOS mode. You
reach that mode by selecting "Restart windows in MS-DOS mode" from the Shut
Down Windows dialog. This time you get these results:
    MSDOS Version 7.00
    Get Volume Information, Int 21h Function 71A0h.
    Drive C - FAILED.
Hmm... long filename support is not available from real-mode DOS! Well, where
is it coming from then? Function 71a0h and the other long filename (71xxh)
functions are supplied by IFSMgr. IFSMgr defines the APIs that a file system can
support, but it in turn needs an installed file system driver to fulfill the requests.
This simple example illustrates that the DOS long filename APIs are only available
if VxDs, like IFSMgr, are present to provide them.
Windows 3.11 Had an IFSMgr?
It might appear that IFSMgr is adding features to an MS-DOS base. Actually, the
change is more fundamental than that. Most of the DOS-like functionality that you
enjoy in a Windows 95 DOS box, at least as far as the file system goes, is brought
to you by IFSMgr. It is more accurate to think of IFSMgr as a replacement for the
DOS file system. The MS-DOS code base is still used for some functions, but in a
subservient role.*
We've just looked at a single API here, one of many that are documented in "Part
5: Using Microsoft MS-DOS Extensions," of Programmer's Guide to Microsoft
Windows 95. Microsoft calls these new APIs MS-DOS extensions. The name is
significant: they look like good old MS-DOS but they are not a part of a new
MS-DOS version. Rather, they are part of IFSMgr, extending it from the baseline
implementation that came with Windows 3.11.
A good example of this is provided by the DOS subst command. The subst
command, you'll recall, is used to map a drive letter to a local directory. If you
have a Windows 3.11 configuration available, you might want to try this. First you
should make sure that you are currently using 32-bit file access. You do this with
the 386 virtual memory settings from the Control Panel. Once you have 32-bit file
access set up, insert a command like this into autoexec.bat:
    subst d: c:\windows\system
where d: is whatever the next available drive letter might be for the system.
Now shut down Windows and reboot the system so that the new line added to
autoexec.bat will execute. After the initial Windows logo screen is displayed, a
blue character mode "pop up" will appear with the following message:
    32-bit File System
    The 32-bit file system is incompatible with the SUBST utility.
    To use 32-bit file access, do not use the SUBST utility before
    starting Windows for Workgroups.
    Press any key to continue
* This topic is discussed in great detail in Unauthorized Windows 95 by Andrew Schulman (especially
Chapter 8, appropriately entitled "The Case of the Gradually Disappearing DOS"). Also see
http://www.sonic.net/~undoc/.
If you press Return, Windows continues to start up. But if you check the 386
virtual memory settings in the Control Panel, you will find that you are using
16-bit file access, even though the checkbox for 32-bit file access is checked. What is
happening here? If IFSMgr detects that you have subst drives in the system during
its initialization, it will not support 32-bit file access on any drive, and drops back
into 16-bit file access using MS-DOS.
subst is only one example where the Windows 3.11 IFSMgr gracefully degrades
back to 16-bit file access; other examples include the presence of a DOS 6.0
DoubleSpace drive, the presence of some other types of compressed drives, and
the existence of open files on a drive when IFSMgr initializes. In contrast,
Windows 95 fully supports subst drives and DoubleSpace drives.
MultiMon is an exciting new tool, which you get with this book. It is described in
detail in Appendix A, MultiMon: Setup, Usage, and Extensions, and you also get
complete source code for it. Unlike a lot of other "snooping tools," MultiMon
reveals what is going on at ring-0. It doesn't tell you which Win32 API is being
called; instead, it may reveal a sequence of ring-0 APIs and events that
correspond to a single Win32 API.
The experiments we conducted at the beginning of this chapter give you
first-hand knowledge about the role IFSMgr plays in Windows 95. Tools like MultiMon
will take you much further and allow you to ferret out many other secrets about
IFSMgr and other Windows 95 internals. Before we put MultiMon to work, let's
digress a bit to get an overview of IFSMgr. The next section may be a little
abstract, but having this conceptual framework will prepare you for what's to
come.
An Overview of IFSMgr
To reiterate, the Installable File System Manager is responsible for routing file
system requests to the installed file systems, and file systems are implemented as
independent drivers under IFSMgr. The target file system for a request depends
upon the format of the filename by which the file is initially opened or created.
The forms that a filename may take are discussed in Chapter 2, Where Do
Filenames Go?
The system components to which IFSMgr interfaces are shown in Figure 1-1. The
arrows leading in to IFSMgr are from clients that make requests upon a file
system. The arrows leading out from IFSMgr are to file system drivers (FSDs). All
of the components shown here execute in one of the Intel x86 processor's
protected modes. The dark grey boxes indicate components with the least
privilege level (ring-3) whereas the pale boxes are virtual device drivers with the
highest privilege level (ring-0).
[Figure 1-1. Diagram labels: Ring 3, Ring 0, Supporting Sub-layers]
parameters. This mode is available in a "DOS box," a window into a virtual 8086
machine executing some DOS application.
Given that all of these application modes ultimately make requests via an
interrupt 21h interface, it should come as no surprise that this interface is IFSMgr's
primary client interface. However, this interrupt 21h interface is extended beyond
the range of commands currently encountered in the MS-DOS environment. In the
DOS environment, the upper limit is set at function 71h, which corresponds to the
new long-filename commands added as MS-DOS extensions to Windows 95.
IFSMgr maps commands over the range 00h to E7h, with 00h through 71h being
equivalent to MS-DOS usage. (The highest DOS command is 73h in OSR2.)
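The function ranges just described can be captured in a few lines. This is only a sketch of the numbering scheme, not IFSMgr's dispatch code: the enum and function names are hypothetical, and treating the OSR2 limit of 73h as a simple raised upper bound is an assumption drawn from the parenthetical remark above.

```c
/* Hypothetical classification of an Int 21h function number (AH). */
enum int21_class {
    INT21_DOS_EQUIVALENT,   /* same meaning as under MS-DOS */
    INT21_IFSMGR_EXTENDED,  /* beyond DOS, still mapped by IFSMgr */
    INT21_OUT_OF_RANGE      /* above E7h */
};

/* osr2 is nonzero when running on OEM Service Release 2. */
enum int21_class classify_int21(unsigned ah, int osr2)
{
    unsigned dos_max = osr2 ? 0x73u : 0x71u;

    if (ah <= dos_max)
        return INT21_DOS_EQUIVALENT;
    if (ah <= 0xE7u)
        return INT21_IFSMGR_EXTENDED;
    return INT21_OUT_OF_RANGE;
}
```

With these bounds, the long filename function 71h falls in the DOS-equivalent range on the retail build, while 72h is classified as an IFSMgr extension.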
IFSMgr also has many ring-0 clients. Figure 1-1 shows a couple of examples with
VSERVER and VWIN32. VSERVER provides support for the server side of an
MS-NET peer-to-peer network. When some remote system requests a file operation of
a server, VSERVER fields the request and routes it directly to IFSMgr. Another
example is provided by VWIN32, the driver which helps KERNEL32 implement
the Win32 APIs. This driver exposes an interrupt 21h dispatcher interface which
ultimately calls into IFSMgr when it executes interrupt 21h requests on behalf of
Win32 applications. Yet another example is provided by DYNAPAGE, the driver
which supports the dynamic paging file. When the memory manager needs to
page-out or page-in some part of virtual memory, it uses IFSMgr to do the reads
and writes via the DYNAPAGE driver.
which are required by the service request are combined in an ifsreq data
structure. IFSMgr uses this common ifsreq structure to send commands to all FSDs.
The FSD also uses the ifsreq structure to return the command results.
IFSMgr must keep track of registered resources and the FSDs that registered them.
Resources can include local disk drives, network connections, network drives,
and character devices. When a resource is added to the system, it is registered
with IFSMgr through a "mount" operation. This operation also binds a resource to
a particular FSD. Resources may also be removed from the system through a
"dismount" operation.
Similarly, IFSMgr tracks open file handles and the resources with which they are
associated. A file handle may refer to a mapping between a filename and a disk
allocation, or it can refer to a search context, as in the Win32 functions
FindFirstFile and FindNextFile. A file handle may also be used for tracking clients which
are accessing a character device.
Resources and file handles each have their own sets of operations. These
operations are exposed by each FSD through two separate function tables: a table of
functions for accessing a resource's services and a table of functions for accessing
services requiring an open file handle. The functions which make up these tables
are defined by the FSD interface; each function expects specific usage of fields in
the ifsreq structure for passing arguments and returning results.
When IFSMgr receives a request, it must convert it into one or more calls to an
FSD's function table. It uses the information in the request to select a
particular FSD. In the case of local drives, the volume number provides this association;
in the case of remote drives and connections, the server name and share name
are used; in the case of character devices, the device name is used.
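The pairing of a request with an FSD can be pictured as a lookup over the mounted resources. The sketch below is a deliberately simplified model: the struct layout, the names, and the exact string comparison are all assumptions made for illustration, and IFSMgr's real bookkeeping structures look different.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a mounted resource and of a request key. */
enum res_kind { RES_LOCAL, RES_REMOTE, RES_CHARDEV };

struct resource {
    enum res_kind kind;
    int volume;          /* RES_LOCAL: 0 = A, 1 = B, 2 = C, ... */
    const char *server;  /* RES_REMOTE: server name */
    const char *share;   /* RES_REMOTE: share name */
    const char *device;  /* RES_CHARDEV: device name */
    const char *fsd;     /* driver bound to the resource at mount time */
};

static int same(const char *a, const char *b)
{
    return a != NULL && b != NULL && strcmp(a, b) == 0;
}

/* Returns the name of the FSD bound to the matching resource,
 * or NULL when no mounted resource matches the key. */
const char *route_request(const struct resource *tab, size_t n,
                          const struct resource *key)
{
    size_t i;

    assert(tab != NULL && key != NULL);
    for (i = 0; i < n; i++) {
        const struct resource *r = &tab[i];
        if (r->kind != key->kind)
            continue;
        switch (key->kind) {
        case RES_LOCAL:      /* local drive: volume number decides */
            if (r->volume == key->volume) return r->fsd;
            break;
        case RES_REMOTE:     /* remote: server and share must both match */
            if (same(r->server, key->server) && same(r->share, key->share))
                return r->fsd;
            break;
        case RES_CHARDEV:    /* character device: device name decides */
            if (same(r->device, key->device)) return r->fsd;
            break;
        }
    }
    return NULL;
}
```

A table holding volume 2 bound to VFAT, a server and share pair bound to VREDIR, and a device name bound to MONOCFSD would route a request for volume C to VFAT and return NULL for an unknown device name.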
Local drive FSDs (e.g., VFAT) are responsible for implementing the semantics of a
particular file system. They know about things like disk layout, disk storage
allocation, and file and directory naming. These FSDs call upon IFSMgr for help with
name parsing but rely upon IOS (I/O Supervisor) for accessing the physical disk
coordinates. Local file systems are used to partition the fixed disks and to provide
hardware-independent coordinates for locations on the disk (e.g., volume C,
logical sector 234). The I/O Supervisor is only briefly discussed in this book.
Remote or network FSDs (e.g., VREDIR) typically package a file system request
in one or more packets and ship it across a network. The request is translated
into a file-sharing protocol (such as SMB) and transferred using a transport
protocol (such as NETBEUI). These FSDs call upon IFSMgr for help with name
parsing, setting up, and tearing down connections, but rely upon the transport
layer for accessing the remote system.
In terms of the layers of the Open System Interconnect (OSI) Reference Model, a
network FSD or redirector occupies the application and presentation layers and
interfaces at its lower boundary with the session layer (e.g., VNETBIOS).
Character FSDs (e.g., MONOCFSD) model devices that send and receive data one
byte at a time, in a serial fashion.
All FSDs use the same function table structure to interface with IFSMgr, but the
functions that each type of driver exposes can be quite different. If an FSD does not
need to support a particular function, it still supplies an entry point, one that
returns an error if a client should happen to call it. This is necessary because
there is no means of determining in advance which functions a particular FSD
has implemented.
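One way to picture this convention is a table in which every slot starts out pointing at a stub that fails, with the driver overriding only the slots it supports. The sketch below assumes that arrangement; the request struct is a stand-in for ifsreq, and every name, slot index, and error code in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define FS_OK             0
#define FS_NOT_SUPPORTED  (-1)   /* hypothetical error code */

/* Stand-in for the ifsreq packet: arguments in, result out. */
struct fs_request {
    int  length;   /* argument: byte count */
    long result;   /* filled in by the driver */
};

typedef int (*fs_entry)(struct fs_request *);

enum { FN_READ, FN_WRITE, FN_SEEK, FN_COUNT };   /* hypothetical slots */

struct fs_table { fs_entry entry[FN_COUNT]; };

/* Default slot: the entry point exists, but it only reports failure. */
int fs_not_supported(struct fs_request *rq)
{
    rq->result = 0;
    return FS_NOT_SUPPORTED;
}

/* A character device can accept writes but has no notion of seeking. */
int chardev_write(struct fs_request *rq)
{
    rq->result = rq->length;   /* pretend every byte was written */
    return FS_OK;
}

void fs_table_init(struct fs_table *t)
{
    int i;

    assert(t != NULL);
    for (i = 0; i < FN_COUNT; i++)
        t->entry[i] = fs_not_supported;
}

/* The caller cannot ask whether a function is implemented;
 * it simply calls through the table and checks the return code. */
int fs_call(const struct fs_table *t, int fn, struct fs_request *rq)
{
    assert(t != NULL && rq != NULL);
    if (fn < 0 || fn >= FN_COUNT)
        return FS_NOT_SUPPORTED;
    return t->entry[fn](rq);
}
```

After fs_table_init, a character driver would install chardev_write in the FN_WRITE slot and leave FN_SEEK alone, so a seek request comes back with FS_NOT_SUPPORTED instead of jumping through an empty pointer.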
The Function column in Figure 1-2 displays the names FS_OpenFile and
FS_CloseFile. These are the names of entry points provided by a file system driver. The
Device column tells us which file system driver is being used. In this case, all of
the file opens are completed by VFAT, the Virtual FAT file system. The Handle
column contains the numeric value of the handle returned by the open. Two
ranges of numeric handles will be seen in this column: DOS handles, which are
less than 200h, and extended handles, which are 200h and greater. The Args
column contains the pathname of the file. It is followed by a Flags2 column
which contains "oe" for each of the opens, which indicates open-existing,
meaning the open will fail if the file does not already exist.
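When reading a trace, the two handle ranges can be told apart with a one-line test. The function name below is hypothetical; the 200h boundary is the one stated above.

```c
/* Returns 1 for an extended handle (200h and greater),
 * 0 for a DOS handle (less than 200h). */
int is_extended_handle(unsigned handle)
{
    return handle >= 0x200u;
}
```

A handle value such as 0217h is therefore an extended handle, while 0005h would be a DOS handle.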
In Figure 1-2, we see the span of time which starts with Explorer calling
ShellExecute until Netscape is an independent process. We are narrowing our focus to
those components that are loaded by the operating system before control is
actually passed to the newly-formed Netscape process. During this intermediate stage,
the address space for Netscape is being prepared. It's not quite a complete
process yet, so its module name is flagged with a * prefix. You can see this in the
column labeled Module, where the name changes from "Explorer" to "*netscape"
to "Netscape".
Table 1-1 contains a list of the files that we see being opened in Figure 1-2. At the
bottom of the table, there is an entry for the VxD WSOCK. This is a helper VxD
that wsock32.dll opens when its entry point is called with the
DLL_PROCESS_ATTACH flag. This is after the Netscape process is created, so we will ignore it for
now.
You may feel a little uneasy about what is missing in Table 1-1. Where are
KERNEL32, USER32, and GDI32? Surely, Netscape uses these ubiquitous system
DLLs. Actually, a better way to get a list of required modules is to look at the
import list for Netscape using a utility like Quick View. Doing this yields the
You may be thinking that these DLLs reside in shared memory and so there is no
need to load them for each process. That answer is partially correct. To see why,
let's look at the image base addresses for each of Netscape's imported modules.
The image base address is the preferred address at which a module wishes to be
loaded. If it gets that address, its memory image does not have to be relocated, so
this provides a load-time optimization. (Image base addresses can also be
determined using Quick View.)
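The saving can be made concrete with a little arithmetic. The sketch below assumes the usual scheme in which every absolute address stored in the image is adjusted by the difference between the actual and the preferred load address; the function names and the sample addresses are hypothetical, and this is not the loader's actual code.

```c
/* Difference between where the module landed and where it wanted to be.
 * A result of 0 means no fixups have to be applied at load time. */
unsigned long relocation_delta(unsigned long preferred_base,
                               unsigned long actual_base)
{
    return (actual_base - preferred_base) & 0xFFFFFFFFUL;
}

/* Adjusts one absolute 32-bit address stored in the image. */
unsigned long apply_fixup(unsigned long stored_address, unsigned long delta)
{
    return (stored_address + delta) & 0xFFFFFFFFUL;
}
```

A module that prefers 10000000h but is loaded at 10200000h has a delta of 200000h, and each stored address must be patched by that amount; a module loaded at its image base skips the work entirely.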
Table 1-2 shows the modules and their image base addresses in descending
order. The linear address space of an application is divided into four regions or arenas:
DOS (0-003fffffh), private (00400000-7fffffffh), shared (80000000-bfffffffh), and
system (c0000000-ffbfffffh). The first five modules in Table 1-2 are loaded to the
shared memory arena. To quote the DDK documentation, "This arena is used for
ring-3 shared code and data." Thus, once one of these DLLs is loaded it will be
visible to all other code and data, such as 16-bit Windows applications and DLLs,
DPMI memory, and 32-bit system DLLs.
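Given an image base or any other linear address, the arena it falls into follows directly from the four ranges above. This is a minimal sketch using exactly those bounds; the enum and function names are hypothetical, and addresses above the quoted system range are simply reported as belonging to no arena.

```c
enum arena {
    ARENA_DOS,       /* 00000000h - 003fffffh */
    ARENA_PRIVATE,   /* 00400000h - 7fffffffh */
    ARENA_SHARED,    /* 80000000h - bfffffffh */
    ARENA_SYSTEM,    /* c0000000h - ffbfffffh */
    ARENA_NONE       /* above the ranges quoted in the text */
};

enum arena arena_of(unsigned long linear)
{
    if (linear <= 0x003FFFFFUL) return ARENA_DOS;
    if (linear <= 0x7FFFFFFFUL) return ARENA_PRIVATE;
    if (linear <= 0xBFFFFFFFUL) return ARENA_SHARED;
    if (linear <= 0xFFBFFFFFUL) return ARENA_SYSTEM;
    return ARENA_NONE;
}
```

An executable loaded at 00400000h sits at the very bottom of the private arena, whereas any module whose image base is 80000000h or higher lands in the shared arena.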
The remaining ten modules in Table 1-2 are destined to be loaded into Netscape's
private arena. The private arena is used for code and data that is private to a
Win32 process. Private means that the page table entries corresponding to the
linear address range are kept separately for each process. Each Win32 process has
its own mapping of pages in its private arena; this mapping is called a memory
context. This is why all applications can load at the same linear address of
400000h.
At this point you are probably comfortable with the idea of sharing DLL code and
data as long as it is in the shared arena. But what if modules are loaded into a
process's private arena? Can they still be shared with other processes? We need
more information to answer this. Let's try another MultiMon trace. This time we'll
continue to look at only file opens (FS_OpenFile) and file closes (FS_CloseFile)
but we'll start sampling from the time the system boots and continue until we
have launched Netscape. This, in effect, will give us a list of open modules at the
time we start Netscape.
This experiment produces a lot of output, over 1800 lines for this particular
configuration. Many files go through an open and close cycle; we are not interested in
these. Once we filter out this noise, we are left with files which are opened and
remain opened. Further condensing this list to just the modules which Netscape is
dependent on, we arrive at Table 1-3.
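The filtering step can be automated. The sketch below assumes the trace has already been reduced to a sequence of open and close events keyed by handle; the event struct and function name are hypothetical and are not part of MultiMon. An open counts as "still open" when no later close for the same handle appears in the trace.

```c
#include <assert.h>
#include <stddef.h>

struct fs_event {
    int is_open;        /* 1 = FS_OpenFile, 0 = FS_CloseFile */
    unsigned handle;    /* handle returned by, or passed to, the call */
    const char *path;   /* pathname for opens; may be NULL for closes */
};

/* Writes the paths of files that are opened and never closed into out
 * (at most outcap entries) and returns how many such files were found. */
size_t files_left_open(const struct fs_event *ev, size_t n,
                       const char **out, size_t outcap)
{
    size_t i, j, found = 0;

    assert(ev != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        int closed = 0;

        if (!ev[i].is_open)
            continue;
        /* Handles are reused, so only a close after this open counts. */
        for (j = i + 1; j < n; j++) {
            if (!ev[j].is_open && ev[j].handle == ev[i].handle) {
                closed = 1;
                break;
            }
        }
        if (!closed) {
            if (found < outcap)
                out[found] = ev[i].path;
            found++;
        }
    }
    return found;
}
```

The quadratic scan is acceptable for a trace of a couple of thousand lines; a larger trace would call for a table indexed by handle.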
In this experiment, we get a slightly different list of modules which are opened
and loaded along with netscape.exe. This list is given in Table 1-4.
What we see here is that any module that has already been loaded won't be
loaded again. It makes no difference whether the module is loaded into a private
arena; it can still be shared.
How does Windows 95 do this? It turns out that there is an obscure function,
called _PageAttach, made just for this purpose. For example, if I know that the
memory context for explorer.exe contains an image of the module OLE32, I can
map all or some of the pages of that image into my process's memory context.
Selective mapping is necessary because some pages of the image, such as data,
may have to be loaded directly from the source file and not be shared with other
memory contexts.
*netscape FS_ReadFile (3f) 0217 cnt=1000H ofs=6a400H ptr=c1351000H
*netscape PageReserve 0007ff60 00000086 00000010
*netscape PageCommit 0007ff60 00000001 09 00b20000 60040000
*netscape PageAttach 7ff61 c10a0e20 7ff61 66
*netscape PageAttach 7ffc7 c10a0e20 7ffc7 5
*netscape PageCommit 0007ffcc 00000001 01 00b00000 60060000
*netscape PageCommit 0007ffcd 00000001 01 00b50001 60060000
*netscape PageAttach 7ffce c10a0e20 7ffce 1
*netscape PageAttach 7ffcf c10a0e20 7ffcf 6
*netscape PageCommit 0007ffd5 00000003 08 a0b00070 60060000
*netscape PageCommit 0007ffd8 00000001 08 a0b10073 60060000
*netscape PageCommit 0007ffd9 00000002 08 c0b00073 60060000
*netscape PageCommit 0007ffdb 00000001 08 c0b30075 60060000
*netscape PageAttach 7ffdc c10a0e20 7ffdc 2
*netscape PageAttach 7ffde c10a0e20 7ffde 2
*netscape PageAttach 7ffe0 c10a0e20 7ffe0 6
*netscape FS_ReadFile (3f) 02cc cnt=1000H ofs=73c00H ptr=c135f000H
*netscape FS_ReadFile (3f) 02cc cnt=1000H ofs=74c00H ptr=c1351000H
*netscape FS_ReadFile (3f) 02cc cnt=600H ofs=75c00H ptr=c135f000H
*netscape PageCommit 0007ffd9 00000001 08 c0b00073 60060000
*netscape FS_ReadFile (3f) 02cc cnt=1000H ofs=73c00H ptr=c135f000H
*netscape FS_ReadFile (3f) 0217 cnt=1000H ofs=6a400H ptr=c1351000H
same set of pages in the memory context whose handle is c10a0e20h (Explorer).
Similarly, the 5 pages starting at linear address 7ffc7000h (the .orpc section) are
also mapped to the same set of pages in Explorer's memory context. You get the
idea: attached pages are mapped and thus shared whereas committed pages are
private. The three FS_ReadFile calls load a private copy of the .idata section, the
module's import table. A summary of how the page ranges are treated is given in
Table 1-5.
What we have seen in our first example is how the file system intermingles with
operating system internals. Now let's turn our attention to an example from the
application realm.
A glance back at Table 1-1 will remind you that Netscape loads wsock32.dll and
then wsock.vxd is opened by WSOCK32. The relationship between these two
components is that of a client and a service provider. WSOCK provides an interface
to socket services, and WSOCK32 exports the Windows Sockets APIs and
makes calls into WSOCK to implement the APIs. WSOCK32 accesses these ring-0
socket services via the DeviceIoControl Win32 API.
It just so happens that we have a MultiMon extension for monitoring DeviceIoControl
calls (see Chapter 3, Pathways to the File System). Each DeviceIoControl call
targets a specific device; it specifies a command code and buffers for input and
output arguments. To report on WSOCK calls, we just need to interpret the arguments
which are passing through the monitor. A little bit of work leads to the
mapping shown in Table 1-6.
Armed with our primitive Winsock monitor we can now see web browser operations
in terms of socket calls. For the results which I show here, the Netscape disk
cache was cleared and a connection to my Internet service provider was already
established. To minimize extraneous noise, the display of the default home page
which you connect to should be finished as well. MultiMon is then started and
monitors are enabled for "VWIN32 DeviceIoControl" and "IFSMgr Filehook" (with
FS_OpenFile, FS_CloseFile, FS_ReadFile, and FS_WriteFile APIs selected). Then go
back to Netscape and at the Go to: prompt enter http://www.ora.com/ and
press Return. This will take you to the O'Reilly & Associates, Inc. home page.
Once the status message says "Document Done", you can stop MultiMon.
The output that I got for this experiment is spread over several examples, starting
with Example 1-2. The output has been "cleaned up" by removing traces of swap
file I/O, extra select calls, and file I/O for non-web-page files.
Example 1-2 shows the steps that are taken just to get connected to
www.ora.com. To establish a connection a socket is opened with the socket API.
Sockets have handles just like files do, but they also have a "handle context,"
which is like a file descriptor structure. The first socket opened returns a handle
of 42h, but is referenced in subsequent calls with the handle context of c0f10e50h.
Next we see several calls setting up the properties and event handlers on this
socket. For instance, the WSAAsyncSelect call requests that notifications for read,
write, connect, accept, etc. be sent as Windows messages to the window with
handle 408h. A single registered message (cffeh) is used with the socket handle
in the wparam and the event in the lparam. The setsockopt API requests that the
socket linger a certain amount of time when it is closed if unsent data is present.
The ioctlsocket call requests that the socket operate in non-blocking mode.
At this point socket 42h is poised to connect to www.ora.com, but before it can
do so it needs to know the IP address (204.148.40.9) to connect to. The next few
lines are involved with resolving this name. First, we see a read from the local
HOSTS file to see if there is a matching entry. My HOSTS file only contains names
of local machines so I know that will fail. So Netscape is forced to go to the
Internet to find the IP address for the name. To do this it opens another socket,
number 62h, and connects on that socket to 204.156.128.1, the IP address of my
service provider's DNS (Domain Name System) name server. It connects on the
well-known port 53 for DNS and sends a packet containing information about the
name it is searching. The select call waits for the reply and the subsequent recv
presumably gets a matching IP address back. Now that we have the IP address,
we're done with socket 62h, so closesocket gets rid of it.
Continuing with the trace in Example 1-3, Netscape sends a packet containing the
string "GET / HTTP/1.0 ...", which requests the server's home page from the root
directory of the web server. Several recv's are then made on socket 42h, but the
actual amount read is uncertain since the requested amount is usually not the
same as the returned amount. With some portion of the HTML home page read
in, Netscape creates a file named mop17ie0 in its .\cache directory in which to
store it. As more data is received on socket 42h, it is appended to a local buffer.
Finally, at the bottom of Example 1-4, the entire home page has been received
(all 0a18h bytes), the socket handle is closed, the buffer is written to mop17ie0,
and the file is closed.
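To make the request string concrete, here is a minimal sketch in portable C of how a client might format such a request. The function name build_get_request is my own invention for illustration, and the sketch emits only the request line and the blank line that terminates it; the trailing "..." in the trace stands for whatever else Netscape sent, which I have not tried to guess.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a bare HTTP/1.0 GET request for `path`.
 * Returns the number of bytes written (not counting the terminator),
 * or -1 if the buffer is too small to hold the whole request. */
int build_get_request(char *buf, size_t size, const char *path)
{
    int n;

    assert(buf != NULL && path != NULL);
    /* Request line, then an empty line to end the (empty) header block. */
    n = snprintf(buf, size, "GET %s HTTP/1.0\r\n\r\n", path);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

Calling build_get_request(buf, sizeof buf, "/") yields the home page request described above; passing "/graphics/space.gif" yields the kind of request used for an image.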
While the home page is still being read in, sockets 63h, 64h, and 65h are created
in Example 1-4. These sockets are created in the same fashion as socket 42h was.
Note that as these new sockets are added, the socket lists passed to select appear
to include them as well, since the list sizes increase by the same amount. Each of
these sockets is going to handle the transfer of a referenced image in the HTML
page.
The final bit of output that we'll look at, shown in Example 1-5, corresponds to
socket 65h (handle context c0f29a3ch). The output for sockets 63h and 64h is
essentially the same, so there is no need to show that too. After connecting to the
IP address for www.ora.com, Netscape sends a packet containing the string
"GET /graphics/space.gif HTTP/1.0", which requests the server's space.gif file from the
/graphics directory of the web server. Several recv's are then made on socket 65h.
Once the GIF file has been received, Netscape creates a file named mop17IE3.gif
in its .\cache directory and then closes socket 65h. At the bottom of Example 1-5,
the received buffer is written to mop17IE3.gif, and the file is closed.
This example illustrates the limits of looking just at the file system. If all we saw
were the opens, writes, and closes, we would be unaware of the concurrency of
these operations. By combining some rudimentary information about Windows
sockets with a trace of file system activity, we see that a socket connection is
assigned to each file transfer, and when the transfer completes, the socket goes
away.
We have covered a lot of territory in this chapter, literally from IFSMgr to the
Internet. I hope it has impressed upon you how pervasive the file system is. In
the next chapter we'll continue our excursion with a look at the varieties of file
names supported by Windows 95.
FS_OpenFile (6c) VFAT 29b* ..\NETSCAPE\N..R\CACHE\MOP17IE3.GIF ca
(closesocket) WSOCK c0f29a3c
FS_WriteFile (d6) VFAT 29b cnt=39H ofs=0H ptr=12c6618H
FS_CloseFile (3e) VFAT 29b f
Where Do
Filenames Go?
A file system is an abstract idea. What you deal with on a daily basis are the
names of files that a file system stores and retrieves. Before Windows 95, DOS
and Windows 3.x users learned to accept the limitations of their systems. Instead
of a descriptive name like FooTech Annual Report 97.doc, they constructed a
name like foo_ar97.doc. Much of the talk about the Windows 95 file system
focuses on this transition from "short names" to "long names." While increasing a
name's length is a long-awaited benefit, there are much more interesting aspects
of a filename.
What's in a Name?
Most of us equate filenames with strings like c:\foobar\foo.txt. This example
adheres to the "8.3" convention of limiting filename components to 8 characters
95 VFAT file system. VFAT continues to support the 8.3 naming convention and
provides for conversions between long and short forms of pathnames.
We won't delve into the detailed rules governing the construction of valid file
names in the FAT and VFAT systems. These topics have been addressed in other
books and periodicals (see "Long Filenames" in Programmer's Guide to Microsoft
Windows 95, Microsoft Press, 1995).
Another kind of naming that you will encounter follows the Universal Naming
Convention (UNC). A UNC name consists of two leading backslashes followed by
a machine name, a share name, and then directory and filename, as in
\\TOPDOG\DEVDISK\bin\nmake.exe. These names are used primarily for referencing
network resources, although a local share can be accessed with a full UNC
name, as in \\MYMACHINE\MYSHARE\foodir\foofile.txt. The machine name is
limited to 16 characters, including the null terminator, and the share name is
limited to 13 characters, including the null terminator. The remaining portions of a
UNC name follow the VFAT naming conventions.
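The anatomy of a UNC name is easy to express in code. What follows is a sketch of my own, not anything lifted from the system: the function name unc_split and the two size constants are invented for illustration, and the only rules it enforces are the ones just stated (two leading backslashes, a machine name that fits in 16 bytes with its terminator, a share name that fits in 13).

```c
#include <assert.h>
#include <string.h>

#define UNC_MACHINE_MAX 16  /* machine name limit, including the null */
#define UNC_SHARE_MAX   13  /* share name limit, including the null */

/* Hypothetical helper: split "\\machine\share\rest" into its parts.
 * Returns a pointer to the remainder (the backslash that follows the
 * share name, or the terminating null if nothing follows), or NULL if
 * the name is not a well-formed UNC name or a part is too long. */
const char *unc_split(const char *name,
                      char machine[UNC_MACHINE_MAX],
                      char share[UNC_SHARE_MAX])
{
    const char *m, *s, *rest;
    size_t mlen, slen;

    assert(name != NULL && machine != NULL && share != NULL);
    if (name[0] != '\\' || name[1] != '\\')
        return NULL;                    /* no leading double backslash */
    m = name + 2;
    s = strchr(m, '\\');
    if (s == NULL)
        return NULL;                    /* machine name but no share */
    mlen = (size_t)(s - m);
    s++;                                /* step over the separator */
    rest = strchr(s, '\\');
    if (rest == NULL)
        rest = s + strlen(s);           /* share is the last component */
    slen = (size_t)(rest - s);
    if (mlen == 0 || slen == 0 ||
        mlen >= UNC_MACHINE_MAX || slen >= UNC_SHARE_MAX)
        return NULL;
    memcpy(machine, m, mlen);
    machine[mlen] = '\0';
    memcpy(share, s, slen);
    share[slen] = '\0';
    return rest;
}
```

Given \\TOPDOG\DEVDISK\bin\nmake.exe, the sketch reports TOPDOG as the machine, DEVDISK as the share, and returns \bin\nmake.exe as the remainder, the part that follows the VFAT naming conventions.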
Some special forms of UNC names are based on the use of a dot (.) for the server
name. These names are used to refer to resources residing on the local machine.
For example, a local mailslot is referenced as \\.\MAILSLOT\fooslot. Windows 95
also uses this form of UNC name for referencing some devices. To open a virtual
device driver, you pass the name \\.\VxDName to the Win32 API CreateFile.
VxDName can be either a VxD module name, a VxD file name, or an entry under
the registry key HKLM\System\CurrentControlSet\Control\SessionManager\
KnownVxDs. A filename is distinguished by having the name include an explicit
extension.
Another type of device name is used to reference the "standard devices." Some of
these are holdovers from MS-DOS: devices like CON, LPT1, and PRN. New standard
device names can be added to the system by implementing a character file
system driver and registering it with IFSMgr.
So we see that Windows 95 supports several kinds of names. Some are meant to
access plain-vanilla disk files, others reach across the network to access a file at a
remote location, and yet others point to a device. Let's look at how Windows 95
deals with these different varieties of names.
application, called NT32, for testing names with the Win32 APIs. It attempts to
open the filename entered on the command line with the fopen, CreateFile, and
OpenFile functions. If the function is successful, the returned handle is immediately
closed. This little application also emits tag strings at each step so that we
may easily trace its execution with MultiMon. Here is the MultiMon trace that was
logged when the command nt32 c:\windows\system.ini was executed:
This output packs quite a bit of information. Let's start by getting familiar with
what each column contains. The first column, Type, tells us which MultiMon
monitor reported the line. This trace contains lines of output contributed by five
different monitors: tag comes from TAGMON, fsh comes from FSHOOK, w21
comes from WIN32CB, and p21 and v21 come from I21HELP1.
The next column, labeled Function, contains a description of the API or event
which the line represents. Many of the lines identify functions of the interrupt 21h
interface. Those whose names begin with "FS_" are functions in a file system
driver like VFAT.
The Flags1 column looks like a pattern in a bowl of alphabet soup. All these odd-looking
characters are described in detail in Appendix B, MultiMon: Monitor Reference.
Each character represents a state flag that is either on (uppercase) or off
(lowercase). For instance, the leading e indicates the function call succeeded
whereas an E indicates the function failed. The next four flags indicate the kind of
resource where a filename resides. In this example, every call into VFAT was
accompanied by the flags cLnu; the capital L signifies local.
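The uppercase/lowercase convention lends itself to a tiny decoder. This is a sketch under assumptions: the function name decode_resource_flags and the RES_ bit names are mine, and the letter meanings are the ones this chapter gives (L for local here; later sections show N for network, U for a UNC name, and C for a character resource). Appendix B remains the authority on the full flag set.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical bit names for the four resource flags. */
enum {
    RES_CHAR  = 1,  /* C: character resource */
    RES_LOCAL = 2,  /* L: local resource */
    RES_NET   = 4,  /* N: network resource */
    RES_UNC   = 8   /* U: name is a UNC name */
};

/* Decode a four-letter group such as "cLnu". An uppercase letter means
 * the flag is on, lowercase means off. Returns the bit mask, or -1 if
 * the string is not four letters in the order c, l, n, u. */
int decode_resource_flags(const char *f)
{
    static const char letters[4] = { 'c', 'l', 'n', 'u' };
    int mask = 0;
    int i;

    assert(f != NULL);
    if (strlen(f) != 4)
        return -1;
    for (i = 0; i < 4; i++) {
        unsigned char ch = (unsigned char)f[i];
        if (tolower(ch) != letters[i])
            return -1;              /* wrong letter in this position */
        if (isupper(ch))
            mask |= 1 << i;         /* uppercase: flag is set */
    }
    return mask;
}
```

With this in hand, cLnu decodes to RES_LOCAL alone, which is what we expect for a call into VFAT.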
The Dev (or Device column) contains the module name of the device that is
receiving the function request. For instance, in this listing, each "FS_" call is to the
VFAT file system driver.
The Hdl (or Handle) column contains the system file number, if the call is handle-based.
When a file is initially opened and the handle is first created, it is marked
with an asterisk.
The Args column contains the filename or pathname that is an argument to the
function. There is a limit to how many characters are stored, so you may see truncation
at the beginning of the name.
Finally, we have another flags column, called Flags2. This column reports flags
that are passed to a function as part of the calling parameters. Here, we have oe
for open existing, f for final, Gt for get attributes, and Gm for get modification
time and date.
Now that you are a little more comfortable with the output, what does it mean?
Start with the fopen call. In our test application, nt32, there are two program
statements:
that we see is an FS_OpenFile reported by the fsh monitor. This is where IFSMgr
is making a call into the VFAT file system driver. This open succeeds and returns
a handle of 0x2da. Note that this handle is not the same as the handle returned
by CreateFile.
What we have seen so far corresponds to a CreateFile call within the fopen function.
Before fopen returns, it also makes a call to the Win32 API GetFileType. This
call appears in the log as two lines reporting the interrupt 21h function 4400h (get
device data). As with the extended file open call, the w21 monitor first picks it up
as a KERNEL32 call into VWIN32. Then VWIN32 passes it to the protected-mode
interrupt 21h interface which generates the p21 monitor line. Since this call is not
sent along any further, i.e., to the file system driver, it is presumably handled by
IFSMgr.
To keep our little program tidy, we close the file descriptor returned by fopen as
soon as fopen returns. The fclose call adds three lines to our trace. These entries
follow the same pattern. We first see the close request in the w21 monitor of
VWIN32. VWIN32 passes the request down to the protected-mode interrupt 21h
interface, which generates the p21 monitor line. The next line that we see is an
FS_CloseFile reported by the fsh monitor. Again, we see IFSMgr making a call into
the VFAT file system driver.
I won't provide detailed descriptions of the CreateFile and OpenFile traces since
they are very similar. It is interesting that OpenFile is the "busiest" of the three;
apparently it has more work to do to fill in an OFSTRUCT. OpenFile also has
some different sequences than we have seen before. For instance, the removable
media check function 4408h goes from w21 to p21 to v21 to fsh. The v21
monitor is a virtual-86 mode interrupt 21h hook; it will see the interrupt before
IFSMgr sees it on its V86 interrupt 21h hook. By absorbing this interrupt 21h
request much later in the chain, IFSMgr is giving a wider range of drivers an
opportunity to see it.
Before we move on to see how the system handles a UNC name, let's sketch a
picture of the path we have followed. Tracing our path in Figure 1-1, we started
in a Win32 application (nt32), then dropped down into the file system, passing
through KERNEL32, VWIN32, IFSMgr, and finally ended up in VFAT.
Here is a portion of the MultiMon trace that was logged when the command nt32
\\WETSUIT\C\windows\system.ini was executed:
Here we only show the response to the fopen call. If you compare this with the
function sequence for a local file system call, you'll see they are the same.
However, if you compare the FS_OpenFile and FS_CloseFile calls you'll see that
they reference different devices, in this case VREDIR instead of VFAT. VREDIR is
a network file system driver, also known as a redirector. Note that the Flags1 field
has also changed from cLnu for a local file system call to clNU for a remote file
access. The "N" signifies a network resource is being accessed and the "U" indicates
that the filename is a UNC name.
In the FS_OpenFile call to VREDIR, the server name and share name have been
stripped off; only the directory and filename are supplied (for example,
\\WETSUIT\C\windows\system.ini becomes \windows\system.ini). This truncated
name is passed because there is an implicit connection established with the
server called "WETSUIT" for the share named "C" . Once the connection is made
there is no need to keep passing around its name; a resource handle is used
instead. This resource handle is a hidden argument to FS_OpenFile.
What we have been looking at is the client side of Microsoft Network. If you have
configured your machine to share files (and printers, too), you can be a server
like WETSUIT in the example above. If we run MultiMon on the server side, we
get a log like this corresponding to the fopen call:
What is conspicuously absent is any interrupt 21h call; we only see calls into
VFAT. First there is an attempt to locate the file using FS_FindFirstFile, and if that
succeeds an open is attempted. If you have keen eyesight, you might have also
noticed that the S flag is set in the Flags1 column. This flag is set if a file system
Before we move on to see how the system handles a device name, let's refer back
to Figure 1-1 to trace the path we have just followed. On the client side, we
started in a Win32 application (nt32) and then dropped down into the file system
passing through KERNEL32, VWIN32, IFSMgr, and finally ending up in VREDIR
and ultimately out onto the LAN. On the server side, packets come in and move
up through the network layers to arrive at VSERVER; it passes the request directly
to IFSMgr, which relays it on to the local file system driver, VFAT.
One type of naming that IFSMgr is unable to cope with is a Uniform Resource
Locator (URL). For example, in Chapter 1, From IFSMgr to the Internet, we
retrieved a graphics image from the O'Reilly & Associates home page using the
URL http://www.ora.com/graphics/space.gif. In addition to the server's directory
and filename, /graphics/space.gif, this name specifies a protocol, http, and server
location, www.ora.com. Currently, URLs are handled in the Explorer shell's
namespace using OLE COM (Component Object Model).* But there is an effort
underway to extend the SMB protocol, which is currently used as the LAN file
sharing protocol, to also share files across the Internet. This new file sharing
protocol is called CIFS, for Common Internet File System (see Chapter 13,
VREDIR: The Microsoft Networks Client).
Accessing Devices
To complete our mini-tour of file system names, we'll look at the peculiarities of
using device names. Let's use nt32 again, but this time we'll supply it with the
name of a "standard device." The standard device that we'll access is housed in
the file system driver, MONOCFSD, which is presented in Chapter 8, Anatomy of
a File System Driver (instructions are given there for installation). MONOCFSD
adds a device called "mono" which stands for a monochrome TTL display (as
opposed to a monochrome VGA display). This is a write-only device.
Here is a portion of the MultiMon trace that was logged when the command nt32
mono was executed:
* See the article "Sweeper," by Paul DiLascia and Victor Stone, in Microsoft Interactive Developer, available
at http://www.microsoft.com/mind/0396/sweeper.sweeper.htm.
If you compare this with the function sequences for our previous examples, you'll
see they are quite similar. One call that stands out here is FS_MountVolume. On
the first call to open this device, IFSMgr calls MONOCFSD's mount entry point.
This function establishes the linkage between the file system driver and IFSMgr.
Since this is a character file system driver, subsequent calls into MONOCFSD have
the C flag set in the Flags1 column, to indicate that this is a character resource.
Although we passed mono as the filename to fopen and CreateFile, notice that
the argument that the interrupt 21h functions see, and that ultimately gets passed
to FS_OpenFile, is E:\ifsbook\nt32\mono. The directory E:\ifsbook\nt32 was the
directory from which I executed nt32. IFSMgr doesn't care because when it comes
to standard device names, it ignores the drive and path.
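The effect of ignoring the drive and path can be imitated in a few lines. To be clear, this is only an illustration of the outcome, written in portable C with a function name (last_component) that I made up; I am not suggesting that this is the code IFSMgr runs.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return the final component of a DOS-style path,
 * i.e. whatever follows the last backslash, or the last colon when
 * there is no backslash. A name with neither is returned unchanged. */
const char *last_component(const char *path)
{
    const char *p;

    assert(path != NULL);
    p = strrchr(path, '\\');
    if (p != NULL)
        return p + 1;
    p = strrchr(path, ':');
    return p != NULL ? p + 1 : path;
}
```

Fed E:\ifsbook\nt32\mono, the sketch hands back mono, which is all that matters when the name turns out to be a standard device.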
In this case, IFSMgr doesn't see these requests. Instead this is a job that VWIN32
assumes as part of its support for the DeviceIoControl function. If we change
The new lines that we have added, of Type dev, originate in the WIN32CB
monitor. One of the things this driver monitors is VWIN32's ring-0 Win32 service
to support KERNEL32's DeviceIoControl interface. This interface is also "wired-up"
to the Win32 functions CreateFile and CloseHandle, when these functions are
referencing a VxD name. That is what we are seeing here, an "Open Device" for
IFSMgr from CreateFile and a "Close Device" for IFSMgr from CloseHandle. The
TAGMON driver, which spits out the tag strings in our trace, also uses DeviceIoControl
to receive tag strings. The private code that it assigns to this function is
256. This trace also shows us that the Win32 OpenFile API doesn't accept VxD
device names.
To finish up our mini-tour of filenames, let's refer back to Figure 1-1 one last
time. We have traced two different paths for device names. For a standard device
name, we start in a Win32 application, then pass through KERNEL32, VWIN32,
and IFSMgr before ultimately arriving at the character file system driver,
MONOCFSD, in our example. On the other hand, for a VxD device name, only
KERNEL32 and VWIN32 are involved.
This chapter has been a quick "once-over" to introduce you to some of the
system components which play a role in the file system's operation. I have
thrown out some terms like Win32 services, protected-mode interrupts, and virtual-
86 interrupts. These system features are at the heart of what makes the file system
tick. They are the focus of the next chapter.
Pathways to the
File System
In this chapter we will focus on file system plumbing-those mechanisms that are
used to make file system services available to an array of operating system modes:
DOS/V86, Win16, Win32, and ring-0. In the next chapter we'll look at what gets
carried through this plumbing: the various APIs.
To carry the plumbing analogy further, when a building is finished the pipes are
hidden from view. To see the plumbing you have to peer into crawl spaces with
a flashlight, or remove wall panels. But, if you visit while the building is going
up, before the floors and walls are erected, the plumbing is in clear view.
Well, we're not going to rebuild Windows 95 from the ground up; instead we're
going to watch as Windows 95 starts up to get a clearer view of the file system.
We'll be tracing through Windows 95 from the "Big Bang" to its quiescent state,
kernel idle. Armed with this background, we'll come back to the Windows 95
operating system modes, and examine how the file system is accessed from each
of them.
booting. More accurately, the log will collect events from System Critical Init until
Kernel Idle.
MultiMon can be configured with a variety of drivers to collect information about
different APIs and events. In this chapter, we are especially interested in looking
at how the interrupt vector tables and callbacks get initialized. With this goal in
mind, I've used the set of MultiMon drivers shown in Table 3-1 to collect the
traces that we will be examining in the coming sections.
Rep!GlobalEnv (47)
VWIN32 Win32 Services K32Init (36)
VWIN32 Win32 Services
If you want to repeat this on your own system, you need to follow these steps:
• Install the drivers listed in Table 3-1, using MultiMon's Add/Remove Driver...
dialog from Options on the main menu.
• You must reboot your system to actually get the drivers loaded, since these
are static VxDs.
• After rebooting, start MultiMon and bring up the Filters dialog to adjust your
session logging options. Make sure the monitors in Table 3-1 are checked off
and other monitors are disabled. Within each monitor, select only the APIs
listed in Table 3-1 .
• After each monitor and its associated APIs are selected, press the dialog button
Save As Default. (This button must be pressed once for each monitor.)
• Now reboot your system and this time, as it starts up, a log file will be created.
Once the system has finished initialization, launch MultiMon; you will
be greeted with a message box stating: "BOOTMGR has captured a log file.
menu; you may also want to remove other drivers which you don't plan to
use again.
With the exception of the first two services, all of these services are hooked by
vectors.vxd. For these hooked services, VECTORS has installed a preamble and/or
postamble which is executed whenever these services are called.
In the last two sections of the session logfile, Begin PM App and Kernel32 Initialized,
we also see other types of entries in the Function column. In these cases,
the line Type will be p21, v21, p2f, v2f, vw32, or dev. The first four refer to interrupts
21 and 2f, whereas vw32 and dev refer to the Win32 callback. We have
hooked these interfaces by installing an interrupt handler and chaining it to the
previous handler. Hooking the Win32 callback is a little more involved and we'll
get to the details later in this chapter.
The other columns you will see in the log are:
Flags1
May contain "Entry" or "Return" to indicate which side of a call the line was
reported from
Device
May contain the name of the VxD which is being called into
Handle
Used to store the interrupt number, as in "Int 21"
Args
A string describing input arguments or return values
Flags2
Not used
Let us examine the output section-by-section, starting with the first two tables,
shown in Figure 3-1. These tables display the values of the V86 and protect mode
interrupt vectors for the five software interrupts which IFSMgr monitors. The V86
vectors are segment:offset pairs that reference code that executes in V86 mode.
The protect mode vectors all have the characteristic 003Bh selector which
earmarks them as protected mode callbacks. The segment with this selector consists
of an array of Int 30h instructions (interrupt gates) which change the execution
ring level (see the sidebar "Breakpoints and Callbacks").
VECTORS Get Initial IDT Vectors
? vec Get_PM_Int_Vector Entry Int 17 PM Vector=3B:2E
? vec Get_PM_Int_Vector Entry Int 21 PM Vector=3B:42
? vec Get_PM_Int_Vector Entry Int 25 PM Vector=3B:4A
? vec Get_PM_Int_Vector Entry Int 26 PM Vector=3B:4C
? vec Get_PM_Int_Vector Entry Int 2F PM Vector=3B:208
breakpoints and callbacks are stored. The amount of storage set aside depends
on the value of the MaxBPS key in the [386Enh] section of system.ini. In
Windows 95 Build 950, the default value for MaxBPS is 400. The MaxBPS value
is rounded upwards to the actual number of breakpoints (ActualBPS) so the
storage claimed is the nearest whole number of pages. This storage is divided
into two portions.
The lower portion begins at the base address of the allocation and is ActualBPS*8
in size. Each V86 callback or PM callback consumes 8 bytes of this region.
A V86 breakpoint needs twice as much storage as a callback. To get the
additional space, ActualBPS is reduced by one and the freed storage is used
for the breakpoint.
For every callback and breakpoint two doublewords are stored, the Refdata
value and the Callback address as they were passed as arguments to the corresponding
services. Note that this table does not distinguish a V86 callback
from a PM callback or a V86 breakpoint. This table grows towards higher addresses,
limited only by ActualBPS.
The additional 8 bytes of storage required for a V86 breakpoint is also allocated
from this same region but from the other end, i.e., from higher addresses towards
lower. The first breakpoint would be stored at (ActualBPS-1)*8, the next
at (ActualBPS-2)*8, and so on. Thus as breakpoints are added, the maximum
number of breakpoints (and callbacks) is reduced by one. In the 8 bytes of additional
storage, the first doubleword is the linear address of the V86 breakpoint,
followed by a word index into the "Refdata/Callback" array, followed by
the byte replaced with the arpl instruction, and then a byte of 0ffh for padding
(and probably to assure a mismatch when scanning for a matching CS:EIP).
Immediately following the region just described is a region filled with Int 30h
instructions, the interrupt gates for jumping from ring-3 to ring-0. The size of
this region is defined by the equation (ActualBPS+100h)*2 bytes. A descriptor
with selector 3Bh is defined just to reference this table. The additional 100h
entries are included for default reflection of protect-mode interrupts to V86
mode.
When a V86 callback is called, an invalid opcode fault causes the program to
enter VMM. VMM uses the CS:EIP in the client registers to determine if the caller
came from the arpl byte location. If it did, the actual segment-offset encoding
of the address is used to look up the entry in the "Refdata/Callback" array.
When a PM callback executes its matching Int 30h instruction, the interrupt
gate transfers control to VMM. VMM uses the CS:EIP in the client registers to
determine if the interrupt came from code executing with selector 3Bh. If so,
EIP-2 is used to index into the "Refdata/Callback" array.
When a V86 breakpoint is "hit", an invalid opcode fault causes the program to
enter VMM. In this case, the CS:EIP in the client registers does not point to the
single callback arpl instruction; rather, it points to an arpl that has been inserted
in the instruction stream. VMM uses the CS:EIP value to scan the breakpoint
array to locate a matching CS:EIP. If found, the index value is used to look up
the corresponding "Refdata/Callback" entry.
IDT stands for interrupt descriptor table. There isn't just one IDT; separate IDTs
exist for virtual-86 and protected mode. What is more, each virtual machine has
its own pair of V86 and PM IDTs. The current IDT is constantly changing, as VMM
switches VMs and as execution modes are changed within a VM. When
Set_PM_Int_Vector is called, it sets the protected mode IDT vector referenced by
the current VM to the specified handler; the IDT for V86 mode is not affected. In
V86 mode, it is the V86 IDT which is consulted when a hardware or software
interrupt occurs, not the interrupt vector table (IVT) at 0:0 in the current VM.
The IVT comes into play when no protected mode handler services the request. VMM
then reflects the interrupt to "real mode", to the corresponding entry in the IVT.
To assign a vector to the IVT for the current VM, Set_V86_Int_Vector is used. This
service stuffs the vector into the currently mapped VM at 00000000+4*intnum.
Software interrupts or traps occurring in V86 mode are always going to be initially
serviced at ring-0. In protected mode, the situation is a little more complicated.
Each entry in the PM IDT is a gate with a specific privilege level. When a software
interrupt occurs, the privilege level of the interrupting program is compared
against the privilege level of the gate. The interruptor must be at least the same
privilege level as the gate or a general protection fault is issued against the int n
instruction. This will still force the program to enter VMM, but at the GP fault
handler rather than at the intended interrupt handler.
This property of PM software interrupts also allows the PM IDT to contain
addresses of handlers which reside in a ring-3 DLL. It is also for this reason that
protected mode callbacks go through an interrupt gate which has a privilege level
of 3.
Now we have seen that Set_PM_Int_Vector and Set_V86_Int_Vector apply to the
current VM, but during System Critical Init, Device Init, etc. a VM does not yet
exist, so what effect do they have at this early stage? The DDK reference tells us
that if these services are called before the System VM Init control message is
broadcast, the installed handler becomes part of the default IDT and IVT which
are used for every VM which is subsequently created.
Another observation we can make from the protected mode vectors shown in
Figure 3-1 is that each one is at an offset of 2*intnum in the Int 30h segment. The
first 100h entries in this array are the default protected mode vectors that are used
for each VM. Their corresponding addresses will be from 3b:0000 to 3b:01fe. Note
that the address for the Int 2f handler lies outside this range. This is because VMM
has already overridden the default entry by installing a callback at 3b:0208. The
default protected mode vector which this handler should chain to would be at
3b:005e.
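The arithmetic behind these addresses is simple enough to sketch in C. The
following is a minimal illustration, assuming only what the text states (selector
3Bh, two bytes per Int 30h entry, 100h default entries); the type and function
names are hypothetical.

```c
#include <stdint.h>

#define INT30_SELECTOR      0x3Bu   /* selector of the Int 30h segment     */
#define DEFAULT_VECTOR_SPAN 0x200u  /* 100h default entries * 2 bytes each */

/* Hypothetical selector:offset pair for a protected mode vector. */
typedef struct PmVector {
    uint16_t selector;
    uint32_t offset;
} PmVector;

/* Default protected mode vector for an interrupt: 3b:(2*intnum). */
static PmVector default_pm_vector(uint8_t intnum)
{
    PmVector v = { INT30_SELECTOR, 2u * intnum };
    return v;
}

/* Nonzero if the vector still points into the default range 3b:0000-3b:01fe. */
static int is_default_pm_vector(PmVector v)
{
    return v.selector == INT30_SELECTOR && v.offset < DEFAULT_VECTOR_SPAN;
}
```

For Int 2Fh this yields 3b:005e, while the callback installed at 3b:0208 falls
outside the default range, matching the observation above.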
Continuing with the System Critical Init phase, Figure 3-2 shows a few of the
entries from this stage. There are no entries made by IFSMgr, but DOSMGR does
install protected mode handlers for Int 21h, 25h, and 26h, the same interrupts
IFSMgr has an interest in. Note that for each protected mode handler installed,
first a callback is allocated and then the protected mode vector is set to this
callback address. Each of the Allocate_PM_Call_Back calls associates a ring-0
procedure with the callback. For instance, in the case of Int 21h, the ring-0
procedure is c02201ac. VMM provides a handy service, _GetVxDName, that converts a
ring-0 address into a device name, segment, and offset form. For example, the
ring-0 address c02201ac is located in DOSMGR segment 0Ah at an offset of 1ACh
from its origin (DOSMGR(0A) + 000001AC).
Hook_V86_Int_Chain is used to install V86 interrupt handlers for 1Bh, 21h, 23h,
24h, and 2Ah. When VMM receives the interrupt via the V86 IDT, it will check to
see if any handlers have been installed for the interrupt by the
Hook_V86_Int_Chain service, and if so, control is passed to the handler. This
service may be used to install multiple V86 handlers for a particular interrupt.
The last handler installed gets the first crack at handling the interrupt. If it
doesn't handle the interrupt, or wishes other handlers to see the interrupt too, it
returns with carry set. If carry is cleared on return, then VMM does not pass the
interrupt on any further. Only if all of the installed handlers fail to service the
interrupt (or if no ring-0 handlers have been installed) does VMM consult the IVT
for this VM and pass the interrupt to the "real mode" components in the VM.
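The chaining rule just described (last installed is called first; carry set means
"pass it on") can be modeled in a few lines of C. This is a user-mode sketch of
the dispatch order only, not VMM code: the types, the fixed-size table, and the
two sample hooks are all hypothetical, and a boolean return value stands in for
the carry flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* A hook returns true for "carry set" (not handled, keep chaining)
   and false for "carry clear" (handled, stop here). */
typedef bool (*V86Hook)(unsigned intnum);

#define MAX_HOOKS 8

typedef struct HookChain {
    V86Hook hooks[MAX_HOOKS];
    size_t  count;
} HookChain;

static bool hook_install(HookChain *chain, V86Hook hook)
{
    if (chain->count == MAX_HOOKS)
        return false;
    chain->hooks[chain->count++] = hook;
    return true;
}

/* Returns true if some ring-0 hook serviced the interrupt; false means
   the interrupt would be reflected through the VM's IVT. */
static bool dispatch_v86_int(const HookChain *chain, unsigned intnum)
{
    for (size_t i = chain->count; i-- > 0; ) {   /* newest hook first */
        if (!chain->hooks[i](intnum))
            return true;                         /* carry clear: done */
    }
    return false;
}

/* Two sample hooks used only to exercise the chain. */
static int hooks_called;

static bool sample_pass_hook(unsigned intnum)
{
    (void)intnum;
    hooks_called++;
    return true;                 /* never services anything */
}

static bool sample_int21_hook(unsigned intnum)
{
    hooks_called++;
    return intnum != 0x21;       /* services Int 21h only */
}
```

If sample_pass_hook is installed first and sample_int21_hook second, an Int 21h
is consumed by the newer hook without the older one ever running, while an Int
17h visits both hooks and then falls through to the IVT.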
Device Init is the phase during which devices do most of their initialization.
This is the phase where we see the first entries in the log file for IFSMgr. We see
from the output in Figure 3-3 that IFSMgr is interested in interrupts 17h, 21h, 25h,
26h, and 2Fh. Of these, 21h, 25h, and 26h have protected mode vectors installed
using the Allocate_PM_Call_Back service along with Set_PM_Int_Vector, as we
saw with DOSMGR. For the V86 IDT, IFSMgr installs ring-0 handlers for interrupts
17h, 21h, 25h, 26h, and 2Fh. The only thing unaccounted for is the V86 callback.
This callback is passed to the DOS device driver ifshlp.sys. It provides a way for it
to enter IFSMgr (see the section "Bouncing Back from ifshlp.sys" in Chapter 5, The
"New" MS-DOS File System).
Figure 3-4 shows the entries for the final VMM initialization stage, Init Complete.
Here, we see VMPOLL install both protected mode and V86 mode handlers for
Interrupt 21h.
IVT and protected mode IDT of the System VM are stored away as templates to
be used for creating future VMs.
VMs begin life in V86 mode, and the System VM is no different. To switch the VM
to protected mode requires launching an application in the VM that makes use of
Windows' DPMI services to make the change. The application that gets launched
is krnl386.exe, a 16-bit protected mode application. When a protected mode
application starts in a VM, VMM broadcasts the message "Begin PM App." Starting
with this stage, we see ring-3 services added to the MultiMon trace in Figure 3-5.
Many of the services listed in the Function column in Figure 3-5 are ring-3,
application level services. These include:
Win/386 Multiplex, Get Device API (Int 2Fh, AX=1684h)
Win/386 Multiplex, Get DPMI Extension (Int 2Fh, AX=168Ah)
Win/386 Multiplex, Get Win32 API (Int 2Fh, AX=168Dh)
SetVect (Int 21h, AH=25h)
ReplGlobalEnv (VxDCall(002A0031h))
K32Init (VxDCall(002A001Fh))
These are just a small fraction of the services that could be logged at this stage.
There are numerous Int 21h and Win32 services that don't show up here. The
services that were selected were chosen because they help to account for the
ring-0 Allocate_PM_Call_Back and Set_PM_Int_Vector calls.
The log shows us that KRNL386 at this stage is concerned with fault and
exception handlers. We see it installing protected mode handlers for Interrupts 1
and 3, the Debug Exception and Debug Breakpoint. We also see several PM callbacks
being allocated to the VMM address c023183bh. These are used to install
exception handlers for interrupts 6, B, C, D, and E: the invalid opcode, segment
not present, stack exception, general protection fault, and page fault, respectively.
Presumably DPMI calls are used to set these exception handlers.
There are several Int 2Fh calls to retrieve the protected mode interfaces for
devices. The devices that are interrogated on this system are: PAGEFILE, VWIN32,
VMM, and VTDAPI. Note that the protected mode callback (which is used for the
PM APIs for these VxDs) is not allocated until some client requests it from the
device.
There are also a couple of rare Int 2Fh calls: 168Ah, which retrieves the protected
mode callback to vendor specific DPMI extensions, and 168Dh, which retrieves
the protected mode callback to Win32 services. It is KERNEL32 which actually
uses this callback to implement the undocumented VxDCall function. At the time
Get Win32 API is called, a protected mode callback is allocated and assigned a
ring-0 handler in VMM. In order to monitor VxDCall traffic we install our ring-0
handler in its place and then chain on to the original handler. This allows us to
examine all VxDCall calls, but we only show two at the end of this section of the
log. The first, ReplaceGlobalEnv, is a wrapper for the VMM function
VMM_Replace_Global_Environment. K32Init is a wrapper for the VMM System_Control
service. It is used to broadcast the control message "Kernel32 Init," which marks
the beginning of the next stage.
After the Kernel32 Initialized message is broadcast, the kernel continues with its
initialization and performs operations similar to what we saw in the previous
stage. The log is much longer for this stage; a portion of it is shown in Figure 3-6.
Again, there are several Int 2Fh calls to retrieve the protected mode interfaces for
devices. The devices that are interrogated on this system include VDD, VTDAPI,
VMOUSE, Device=37h, REBOOT, SHELL, VMM, VFLATD, CONFIGMG,
MMDEVLDR(44ah), VDSPD, and VJOYD.
The kernel also continues to toy with the protected mode IDT. In this stage we
see handlers installed for interrupts 0, 2, 4, 6, 7, 9, D, 21, 24, 2f, 31, 3e, 71, and
75. The handlers that are getting installed are in ring-3; they are specific to the
System VM. Recall that after System VM Init, Set_PM_Int_Vector applies to the
current VM. So, the modification of the IDT we have seen here and in the
previous stage only affects the System VM.
This trace shows us output from the dev monitor for the first time. These lines
come from the monitor for WIN32 DeviceIoControl. This isn't the Win32
DeviceIoControl exactly; rather, it is the VWIN32 function that implements a large
portion of it. We are seeing this function called through the Win32 callback on
behalf of Win32 APIs: DeviceIoControl, CreateFile, and CloseHandle.
Up until now our trace has shown a lot of Int 2Fh calls to retrieve the protected
mode interface for a variety of devices. These protected mode callbacks can only
be used from Win16 programs that still allow Int 2Fh calls. Win32 programs are
required to use a new mechanism for accessing VxDs.
This mechanism requires that the device be opened with CreateFile, exchange data
or commands using DeviceIoControl, and be closed with CloseHandle. All three of
these functions go through the same VWIN32 function. If the dwIoControlCode is
0 we have an open on behalf of CreateFile (labeled as Open Device in the trace);
if the dwIoControlCode is -1 we have a close on behalf of CloseHandle. Other
dwIoControlCode values indicate specific DeviceIoControl commands that are
private to the device, i.e., a value of 100 for IFSMgr does not mean the same as a
value of 100 for VREDIR.
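As a rough illustration of how a single entry point can serve all three Win32
calls, the following C sketch classifies a request by its dwIoControlCode. The
enumeration and function names are hypothetical; only the special values 0 and -1
come from the text.

```c
#include <stdint.h>

/* Hypothetical classification of a request arriving at a VxD's
   DeviceIoControl entry point. */
typedef enum VxdRequestKind {
    REQ_OPEN_DEVICE,     /* issued on behalf of CreateFile      */
    REQ_CLOSE_DEVICE,    /* issued on behalf of CloseHandle     */
    REQ_DEVICE_PRIVATE   /* meaning is defined by the device    */
} VxdRequestKind;

static VxdRequestKind classify_ioctl(uint32_t dwIoControlCode)
{
    if (dwIoControlCode == 0u)
        return REQ_OPEN_DEVICE;
    if (dwIoControlCode == 0xFFFFFFFFu)   /* -1 as a 32-bit value */
        return REQ_CLOSE_DEVICE;
    return REQ_DEVICE_PRIVATE;
}
```

Any code other than 0 or -1 has to be interpreted by the target device itself,
which is why the same number can mean different things to IFSMgr and VREDIR.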
For IFSMgr, the dwIoControlCode of 100 is defined in ifs.h from the DDK as
IFS_IOCTL_21. The comment with the equate states "These definitions are used by
MSNET32 for making DeviceIoControl calls to IFSMgr." The last two lines in
Figure 3-6 show two such calls with an AX value of 5f8ah, indicating a call to the
DOS Int 21h function 5f8ah. There are three other dwIoControlCodes which
IFSMgr recognizes: IFS_IOCTL_2F (101), IFS_IOCTL_GET_RES (102), and
IFS_IOCTL_GET_NETPRO_NAME_A (103). In the next chapter we'll take a closer look
at what these functions do.
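For reference while reading traces, the four codes can be collected into a small
lookup helper. This sketch mirrors the names and values quoted above rather than
reproducing ifs.h itself, and the helper function is hypothetical.

```c
#include <stddef.h>

/* Values as quoted in the text; this is not a copy of the DDK header. */
enum {
    IFS_IOCTL_21                = 100,
    IFS_IOCTL_2F                = 101,
    IFS_IOCTL_GET_RES           = 102,
    IFS_IOCTL_GET_NETPRO_NAME_A = 103
};

/* Hypothetical helper: map a dwIoControlCode seen in a trace to its
   symbolic name, or NULL if IFSMgr does not recognize it. */
static const char *ifs_ioctl_name(unsigned long code)
{
    switch (code) {
    case IFS_IOCTL_21:                return "IFS_IOCTL_21";
    case IFS_IOCTL_2F:                return "IFS_IOCTL_2F";
    case IFS_IOCTL_GET_RES:           return "IFS_IOCTL_GET_RES";
    case IFS_IOCTL_GET_NETPRO_NAME_A: return "IFS_IOCTL_GET_NETPRO_NAME_A";
    default:                          return NULL;
    }
}
```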
Accessing IFSMgr
Figure 3-7 illustrates the IFSMgr entry paths from the four Windows 95 execution
modes. IFSMgr is a virtual device driver that executes in ring-0; thus, three of the
paths involve a ring transition from the application level, ring-3, to the kernel
level, ring-0. To support DOS and Windows 3.x applications, we see continued
support for the software interrupt interfaces, whereas for Win32 applications and
ring-0, new interfaces have been introduced.
Using MultiMon, we traced the installation of these handlers for all interrupts by
hooking the VMM service Hook_V86_Int_Chain. Table 3-2 summarizes V86
interrupt handlers for interrupts 17h, 21h, 25h, 26h, and 2Fh, the interrupts that
IFSMgr monitors. Each column shows the sequence of events for servicing that
interrupt. For instance, interrupt 17h is initially handled by the service routine in
the VM's V86 IDT. This will be a ring-0 interrupt handler in VMM that will check for
installed V86 handlers. If handlers have been installed, then the last one installed
is called first, then the next most recent, etc., until one services the interrupt. If
none of them service it, then the ring-3 V86 handler in DOS is used.

[Figure 3-7: IFSMgr entry paths from Ring 0, Win32 (DeviceIoControl, Win32
services), Win16 PM/Ring 3 (Int 21h, 25h, 26h, 2Fh), and DOS/V86 Ring 3 (Int 17h
and the other software interrupts)]
[Tables 3-2 and 3-3: interrupt handler chains. Recoverable entries: SHELL(01) +
47C; (VMM) 3b:0330, 3b:0208; (DOSMGR); default PM callbacks 3b:002e (Int 17h),
3b:0042 (Int 21h), 3b:004a (Int 25h), 3b:004c (Int 26h), 3b:005e (Int 2Fh)]
The handlers in the protected mode IDT may reside in 16-bit Windows DLLs or in
ring-0 VxDs. In Table 3-3, the first handlers to get a shot at Int 21h and Int 2Fh
reside in DLLs. All of the other handlers in this table are the addresses of
protected mode callbacks. Each of these callbacks corresponds to an Int 30h
interrupt gate which maps the callback to a ring-0 handler. The VxDs which own
these handlers are shown in parentheses in the table.
As we saw in our trace of MultiMon events, KRNL386 has further customized the
System VM by installing ring-3 protected mode interrupt handlers. This gives
KRNL386 an opportunity to look at some of the interrupt requests before they are
passed down to ring-0 drivers. The kernel has a chance to "skim off" some Int
21h requests and handle them internally so they never reach the lower interrupt
chain, or perhaps arrive there in a different form.
At the bottom of each column is the address of the default PM callback. If none of
the PM handlers service the interrupt request, then when VMM sees a default PM
callback it reflects the interrupt to V86 mode. This means the interrupt chain
continues in the corresponding column of Table 3-2.
One exceptional case is Int 17h. It does not have a protected mode interrupt
handler installed for it in the PM IDT. So whatever handler is found here was
installed by VMM during system initialization. If you examine the PM IDT (using
WDEB386 or WinIce) you will find a ring-0 interrupt gate in the Int 17h slot.
Gates are like selectors in that they have descriptors which provide details about
their address, type, and privilege level. When issuing a software interrupt from a
protected mode application, the interrupt gate or trap gate must have a privilege
level no higher than that of the application (numerically, the gate's privilege
level must be greater than or equal to the caller's).
In the case of Int 17h, the interrupt gate has a privilege level of 0, but it is being
called by an application with a privilege level of 3; the result is a General
Protection fault (Int 0Dh). The fault handler in VMM looks at which instruction
caused the fault; if it was an Int n, it reflects the interrupt to V86 mode as if VMM
had encountered the default PM callback for that interrupt number.
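The privilege comparison that produces this fault can be stated compactly in C.
The sketch below models the processor's check for software interrupts, where ring
numbers are privilege levels (0 is most privileged, 3 least); the function and
type names are hypothetical.

```c
#include <stdbool.h>

/* A software interrupt (Int n) is allowed through a gate only when the
   caller's ring number does not exceed the gate's privilege level. */
static bool software_int_allowed(unsigned caller_ring, unsigned gate_level)
{
    return caller_ring <= gate_level;
}

typedef enum IntOutcome {
    ENTER_HANDLER,    /* control reaches the gate's handler           */
    RAISE_GP_FAULT    /* Int 0Dh is raised against the Int n instead  */
} IntOutcome;

static IntOutcome software_int_outcome(unsigned caller_ring,
                                       unsigned gate_level)
{
    return software_int_allowed(caller_ring, gate_level)
               ? ENTER_HANDLER
               : RAISE_GP_FAULT;
}
```

A ring-3 caller going through the level-3 Int 30h gate enters the handler
directly, whereas a ring-3 Int 17h against a level-0 gate takes the GP fault path
described above.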
Although we have entered the brave new world of 32-bit Windows development,
maintaining compatibility with 16-bit applications puts some serious constraints
on the Windows 95 architecture. One such constraint is the "bitness" of VMs.
Recall that VMs begin life in virtual-86 mode. If DPMI services are subsequently
used to switch the VM into protected mode, either a 16- or 32-bit mode is
selected as one of the arguments. Thereafter, that VM is marked as either a 16-bit
or 32-bit protected mode VM.
Since the System VM is created to load KRNL386 (a 16-bit protected mode
application), the System VM is marked as a 16-bit protected mode VM. The upshot of
this is that if Win32 apps were to call into VMM through PM callbacks, VMM
would still perceive them as having a 16-bit stack. This breaks routines like
Simulate_Iret when it manipulates the stack using the contents of the
Client_Register structure.
For these reasons, Microsoft is endorsing the DeviceIoControl interface as the way
to go. Protected mode callbacks are out. Here is a quote from the introductory
chapter of the DDK reference on VMM:
...Win32 programs will appear as 16-bit applications from VMM's point of view.
In other words, Win32 programs will not be recognized by VMM as 32-bit
applications. This should not be a problem because Win32 programs should be using
the DeviceIoControl interface to communicate with VxDs. This is merely a warning
not even to try it any other way because it won't work. [my italics]
Despite this dire warning, KERNEL32 continues to use a protected mode callback
to access VxD services, specifically what are called Win32 services. Before
Windows 95, VxDs only exported functions which could be used by other VxDs
as a table of services. With Windows 95, VxDs can now export a table of services
which can be accessed from ring-3 through a special protected mode callback.
The table of Win32 services is constructed much like "regular" VxD services, by
using several macros: Begin_Win32_Services, End_Win32_Services, and
Declare_Win32_Service. Win32 services are dynamically registered with VMM using
the VMM service Register_Win32_Services. Only a few VxDs export Win32 services at
this time; the most notable are VMM and VWIN32 (IFSMgr does not).
To get the Win32 protected mode callback address, you need to use the Int 2Fh
interface with the function W386_Get_Win32_API (168Dh), which is defined in
int2fapi.h from the DDK. This function returns a PM callback in ES:DI. You can
see the call to this function in the MultiMon trace shown in Figure 3-5. There is a
catch-22 situation here. We need the callback address in a Win32 program but we
can't retrieve it because software interrupts (Int 2Fh) are not allowed in a Win32
application! There are various work-arounds here; perhaps the easiest is to use an
undocumented KERNEL32 function which has Ordinal 1. In the early Windows 95
beta, this function was exported as VxDCall and the name has stuck although the
function is no longer exported by name in the retail release. KERNEL32 relies
heavily on this interface to access Win32 services in VWIN32 and VMM. If you are
curious about the details of how this Win32 callback works, see the section "The
Win32 Callback."
The Win32 callback is an interface to IFSMgr but not a direct one, since IFSMgr
does not provide Win32 services itself; rather, it is VWIN32 that provides the
connection. Andrew Schulman, in Unauthorized Windows 95, describes the
arguments required for VxDCall. Here is the passage in which he describes the Win32
service that provides Int 21h:
VxDCall() expects a VxD Win32 service number (such as 2A0010h), and any
values for EAX and ECX on the stack. 2A0010h indicates VxD ID #002Ah, Win32
service #0010h. The PM callback in VMM decodes such Win32 service requests.
VxD 2Ah is VWIN32, and the PM callback in VMM will call its Win32 service #10.
VWIN32's Win32 service #10h issues INT 21h on behalf of Win32 applications by
calling Exec_PM_Int, a VMM service new to Windows 95, with the parameter 21h.
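The split of a Win32 service number into its two halves is easy to show in C.
This is a minimal sketch of the encoding described in the passage (VxD ID in the
high word, service number in the low word); the structure and function names are
hypothetical.

```c
#include <stdint.h>

/* Hypothetical decoded form of a Win32 service number. */
typedef struct Win32ServiceId {
    uint16_t vxd_id;    /* high word, e.g. 002Ah for VWIN32 */
    uint16_t service;   /* low word, e.g. 0010h             */
} Win32ServiceId;

static Win32ServiceId decode_win32_service(uint32_t number)
{
    Win32ServiceId id;
    id.vxd_id  = (uint16_t)(number >> 16);
    id.service = (uint16_t)(number & 0xFFFFu);
    return id;
}

static uint32_t encode_win32_service(uint16_t vxd_id, uint16_t service)
{
    return ((uint32_t)vxd_id << 16) | service;
}
```

Decoding 2A0010h gives VxD ID 2Ah and service 10h; encoding VxD 2Ah with
service 1Fh gives 002A001Fh, the K32Init value seen in the earlier trace.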
This VWIN32 Win32 service is being called constantly but doesn't show up in our
MultiMon trace because it was filtered out. Many Win32 file operations are
converted into one or more calls to this service and many ultimately are handled
by IFSMgr's protected mode Int 21h callback. There are other VWIN32 Win32
services which call IFSMgr services directly; we will examine these in the next
chapter.
The real meat of the file system services is provided via the Win32 callback by
way of VWIN32. Another Win32 interface to IFSMgr that we uncovered during our
examination of the MultiMon trace is DeviceIoControl. IFSMgr exports this
interface to MSNET32, the Network API Library for Microsoft Networks.
IFSMgr provides services that allow other VxDs to install hooks into the file
system. In some cases, a program only needs to monitor file activity. These
services provide mechanisms for doing so.
IFSMgr also installs service hooks on a number of other VxDs at Init Complete
time. These include:
VWIN32_ActiveTimeBiasSet (2a0015)
Schedule_Global_Event (1000e)
Resume_Exec (10085)
Suspend_VM (1002b)
Resume_VM (1002c)
No_Fail_Resume_VM (1002d)
Nuke_VM (1002e)
Close_VM (100ec)
Crash_Cur_VM (1002f)
So IFSMgr is even lurking around in VxD-land and doing a number on some
standard VxD services.
It pushes these registers and the Win32 service number for VWIN32's Int 21h
provider and then calls yet another KERNEL32 service. This function, ORD_0001,
is exported as ordinal 1; it is also known as VxDCall. This function is a wrapper
for the Win32 callback. It copies the first argument on the stack (the Win32
service number) to EAX and then pops the return address over the top stack
argument, replacing the Win32 service number with another copy of the return
address. It then performs an intersegment call using a far pointer (an FWORD in
32-bit land). It should come as no surprise that the address stored in
CS:[BFFBC004] is none other than our Win32 callback address: 003b:000003da.
The Int 30h interrupt gate then transfers us into ring-0.
KERNEL32!ORD_0001
0137:BFF713D4  MOV   EAX,[ESP+4]
0137:BFF713D8  POP   DWORD PTR [ESP]
0137:BFF713DB  CALL  FWORD PTR CS:[BFFBC004]
003B:000003DA  INT   30               ; #0028:C0236288 VMM(0D)+1288
Before we look at the Win32 service handler, let's take a quick look at the Int 30h
handler. This is the common entry point in VMM for all protected mode callbacks,
not just for Win32 services. On entry into ring-0, the ring-3 register state is
preserved in the client register structure. VMM checks whether the caller's ring-3
CS was selector 3bh on entry, i.e., VMM is expecting an Int 30h from the
breakpoint segment to get us here. If that is true, then the caller's EIP is
decremented by two to point to the beginning of the Int 30h instruction that caused
the transfer. This value is then used to consult the breakpoint table to load the
corresponding reference data to EDX before branching to the installed PM callback.
Note that a number of other registers are also initialized before control is
transferred: EBX is set to the current VM handle, EDI is set to the current thread
handle, and ESP is set to the current thread's stack. Note also the check for an EIP
value less than 200h; this signifies a default PM callback and requires a different
handler which is responsible for reflection to V86 mode.
VMM(01)
+ 0b04  sub   esp, +04                  ; no error code on stack for trap
+ 0b07  cld
+ 0b08  pushad                          ; complete the client register area
+ 0b09  mov   ebp, esp                  ; set EBP to client register structure
+ 0b0b  mov   dword ptr [ebp+3c], ds    ; save segments to client registers
+ 0b0e  mov   dword ptr [ebp+38], es
+ 0b11  mov   dword ptr [ebp+40], fs
+ 0b14  mov   dword ptr [ebp+44], gs
+ 0b17  cmp   word ptr [esp+28], +3bh   ; client cs == 3bh?
+ 0b1d  jnz   short L_B5D
+ 0b1f  mov   ax, 0030                  ; set segment registers
+ 0b23  mov   ds, eax
L_B3D:
+ 0b3d  xchg  dword ptr [edi+4c], esp
+ 0b40  sti
+ 0b41  push  offset C1_300             ; the handler will return to C1_300
+ 0b46  cmp   ah, 02                    ; callback offset < 200h?
+ 0b49  jc    L_1666                    ; branch to V86 reflection
L_B5D:
+ 0b5d  mov   esi, 0c0h
+ 0b62  jmp   L_2C0
+ 0b67  nop
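The decision made at the top of the Int 30h handler can be paraphrased in C. The
sketch below covers only the three outcomes named in the text (not from the
breakpoint segment, default PM callback, installed PM callback); the names are
hypothetical, and how VMM turns the offset into a table index is not modeled.

```c
#include <stdint.h>

#define BREAKPOINT_SELECTOR 0x3Bu    /* selector of the Int 30h segment     */
#define INT30_LENGTH        2u       /* an Int 30h instruction is two bytes */
#define DEFAULT_LIMIT       0x200u   /* offsets below this are defaults     */

typedef enum CallbackKind {
    NOT_A_CALLBACK,          /* client CS was not 3Bh                    */
    DEFAULT_PM_CALLBACK,     /* offset < 200h: reflect to V86 mode       */
    INSTALLED_PM_CALLBACK    /* offset >= 200h: branch to ring-0 handler */
} CallbackKind;

typedef struct CallbackInfo {
    CallbackKind kind;
    uint32_t     offset;     /* offset of the Int 30h that was executed  */
} CallbackInfo;

static CallbackInfo classify_int30(uint16_t client_cs, uint32_t client_eip)
{
    CallbackInfo info = { NOT_A_CALLBACK, 0u };

    if (client_cs != BREAKPOINT_SELECTOR || client_eip < INT30_LENGTH)
        return info;

    /* Back up over the Int 30h instruction that caused the transfer. */
    info.offset = client_eip - INT30_LENGTH;
    info.kind   = (info.offset < DEFAULT_LIMIT) ? DEFAULT_PM_CALLBACK
                                                : INSTALLED_PM_CALLBACK;
    return info;
}
```

For the Win32 callback at 3b:03da the client EIP on entry is 3dch, which backs up
to 3dah and is treated as an installed callback; an interrupt arriving through
3b:005e (client EIP 60h) is treated as the default Int 2Fh callback.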
We've finally arrived at the handler for Win32 services. A close study of this code
reveals some interesting facts. The first thing it does is examine the caller's stack
by testing SS from the client registers. SS is just a selector to which there
corresponds a descriptor. The LAR assembly instruction returns a byte of attributes
from the descriptor for a given selector. Only one bit is of interest here: the B-bit
(big-bit). It tells us whether the stack segment is 32-bit (pushes and pops are 32
bits at a time) or whether it is 16-bit (pushes and pops are 16 bits at a time). If it
is a 16-bit stack then VMM is careful to clear the upper 16 bits of ESI since it is an
alias for SP and not ESP.
VMM(0D)
+ 1288  mov   ds, word ptr [ebp+34]     ; client SS
+ 128c  mov   esi, dword ptr [ebp+30]   ; client ESP
+ 128f  lar   eax, dword ptr [ebp+34]   ; load attribute byte of SS descriptor
+ 1293  test  eax, 400000h              ; test B-bit for 32-bit stack (ESP)
+ 1298  jnz   short L_129D              ; branch if 32-bit
+ 129a  movzx esi, si                   ; zero extend 16-bit stack offset (SP)
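The same test can be written in C against the value that LAR leaves in EAX. This
is a sketch of the logic in the listing, with hypothetical names; the mask
400000h is the one used by the test instruction above.

```c
#include <stdint.h>

#define DESC_BIG_BIT 0x00400000u   /* B-bit as it appears in the LAR result */

/* Given the LAR result for the client SS and the saved client ESP,
   return the stack offset VMM should actually use. */
static uint32_t effective_stack_offset(uint32_t lar_result,
                                       uint32_t client_esp)
{
    if (lar_result & DESC_BIG_BIT)
        return client_esp;            /* 32-bit stack: use all of ESP  */
    return client_esp & 0xFFFFu;      /* 16-bit stack: only SP counts  */
}
```

With a 16-bit stack segment a saved ESP of 12345678h is reduced to 5678h, which
is what the movzx in the listing accomplishes.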
Note that DS:ESI now points to the caller's stack, with the following contents:
Continuing our trace, we see that VMM discards the Win32 callback and
ORD_0001 return addresses on the client stack by adding 12 to the stack pointer
(ESI). It sets the client CS:EIP to the return instruction in the procedure starting
at BFF712C5, as if returning from ORD_0001.
L_129D:
+ 129d  mov   eax, dword ptr [esi+08]   ; get EIP to ORD_0001 return
+ 12a0  mov   edx, dword ptr [esi+04]   ; get CS to ORD_0001 return
+ 12a3  mov   dword ptr [ebp+24], eax   ; store to client registers
+ 12a6  mov   word ptr [ebp+28], dx
+ 12aa  add   esi, +0c                  ; remove return addresses from stack
Recall that EAX is loaded with the Win32 service number before entering the
callback, so this argument is retrieved here and its device number is extracted. If
the device number is less than 40h, VMM consults a Win32 service table in an array
for faster lookup. If the device number is 40h or higher, the VxD list is searched
for a matching device ID. In either case, if the device ID is found and the device
has Win32 services registered for it, the service number is compared against the
total number of services offered. If this is within range, then a lookup in the
Win32 service table is made for the number of expected arguments (pushed on
the stack) and the address of the service routine.
lookup_Win32_service:
+ 12c6  movzx eax, ax                   ; extract Win32 service to EAX
+ 12c9  cmp   dword ptr es:[edx], eax   ; number of services > requested service#?
+ 12cc  jbe   short service_not_found   ; branch if service outside range
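The two-tier lookup described above can be modeled in C. This is a sketch under
stated assumptions: the structures below are invented for illustration and do not
reflect the real layout of a DDB or of VMM's internal tables; only the 40h
threshold, the list search by device ID, and the range check come from the text.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*ServiceProc)(void);          /* hypothetical signature */

typedef struct Win32ServiceEntry {
    uint32_t    arg_count;                  /* expected stack arguments */
    ServiceProc proc;                       /* ring-0 service routine   */
} Win32ServiceEntry;

typedef struct Win32ServiceTable {
    uint32_t                 service_count;
    const Win32ServiceEntry *entries;
} Win32ServiceTable;

typedef struct Device {
    uint16_t                 device_id;
    const Win32ServiceTable *win32_services; /* NULL if none registered */
    const struct Device     *next;
} Device;

#define FAST_TABLE_SIZE 0x40u

typedef struct Registry {
    const Win32ServiceTable *fast[FAST_TABLE_SIZE]; /* device IDs < 40h */
    const Device            *device_list;           /* everything else  */
} Registry;

/* Returns the table entry for a Win32 service number, or NULL if the
   device is unknown, has no Win32 services, or the number is out of range. */
static const Win32ServiceEntry *lookup_win32_service(const Registry *reg,
                                                     uint32_t number)
{
    uint16_t dev = (uint16_t)(number >> 16);
    uint16_t svc = (uint16_t)(number & 0xFFFFu);
    const Win32ServiceTable *table = NULL;

    if (dev < FAST_TABLE_SIZE) {
        table = reg->fast[dev];
    } else {
        for (const Device *d = reg->device_list; d != NULL; d = d->next) {
            if (d->device_id == dev) {
                table = d->win32_services;
                break;
            }
        }
    }
    if (table == NULL || table->service_count <= svc)
        return NULL;
    return &table->entries[svc];
}
```

The final comparison mirrors the cmp/jbe pair in the listing: the request fails
when the number of services is less than or equal to the requested service
number.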
Now prepare the ring-0 stack before calling the service. A VWIN32 Int 21h service
is passed two arguments on the stack, EAX and ECX, so 8 bytes are reserved on
the ring-0 stack for these arguments.
Next, the current VM handle, then the address of the client register structure, and
finally the address of the return procedure, are pushed onto the stack. The passed
arguments are copied from the ring-3 stack to the reserved area on the ring-0
stack. This leaves ESI pointing at BFF76802 (the return address from BFF712B9)
and it is stored as the new ESP in the client registers.
L_12EB:
+ 12eb  mov   eax, ss                   ; restore DS
+ 12ed  mov   ds, eax
+ 12ef  mov   dword ptr [ebp+30], esi   ; save new stack ptr to client ESP
When control is transferred to the Win32 service, the ring-0 stack looks like this:
next_DDB:
+ 12f9  mov   ecx, dword ptr es:[ecx]   ; last device?
+ 12fc  jecxz short service_not_found   ; then exit loop
+ 12fe  cmp   word ptr es:[ecx+06], dx  ; matching device ID?
+ 1303  jnz   short next_DDB            ; no, then loop back
This concludes our examination of the file system plumbing. In the next chapter
we turn our attention to the file system APIs, especially the Win32 API.
File System API Mapping
In Chapter 3, Pathways to the File System, we saw how file system requests are
channeled in diverse operating environments. The MS-DOS Int 21h interface
forms the core API for the operating system modes: DOS/V86, Win16, and Win32.
To a considerable extent, the Win32 file APIs are mapped to the extended MS-DOS
API, although some additional assistance is needed from VWIN32 and VMM.
In this chapter, we will survey the Win32 and Win16 APIs and see how they map
to the extended MS-DOS API. We'll also encounter the concept of KERNEL32
objects, a concept which will provide a framework for our examination of the
Win32 APIs. Microsoft has us all believing that Win32 is the API of the future, so
let's begin with a look at how the Win32 APIs are implemented, primarily those
related to file I/O.
Function 71A6H is one of many new Int 21h services that have been added to
Windows 95 to support long filenames and other extensions for MS-DOS and
Win16 applications.*
There are still other calls to Int 21h hiding here. For instance, x_GetExtendedError
is another thin wrapper around a Win32 callback. In this case the code is:
EnterMustComplete();
x_MaybeChangePSP(hFile, &wPSP);
pK32FileObj = x_ConvertHandleToK32Object(hFile,
                  K32OBJ_INCREF | K32OBJ_FILE_TYPE, 0);
if (pK32FileObj) {
    _asm movzx ebx, word ptr pK32FileObj->hExtendedFileHandle
    _asm mov   edx, dword ptr lpFileInfo
    _asm mov   eax, 71a6h
* You will find documentation for these functions in the Programmer's Guide to Microsoft Windows 95,
Part 5: Using Microsoft MS-DOS Extensions. See http://www.microsoft.com/msdn/sdk/platforms/doc/sdk/
win32/95guide/src/95.func_28.htm.
† A PSP (Program Segment Prefix) refers to the DOS data structure that describes a program's execution
environment.
else retc = 1;
There is a lot more going on in this function besides Win32 callbacks. Let's take a
closer look. First, you'll notice some unfamiliar function names:
EnterMustComplete, x_MaybeChangePSP, x_ConvertHandleToK32Object, etc. These are
names I've coined for some internal KERNEL32 functions.
Why would a thread want to change its PSP? In this case, it wants the PSP to
match the owner of the handle. As we'll see later, the handle table is a
per-process data structure and handles are indexes into this table. For instance, a
handle of 5 in one process may reference a file, whereas in another process it
may reference a pipe. However, KERNEL32 also recognizes global handles; these
are handles which are associated with the KERNEL32 process and one of its PSPs.
These global handles have a unique signature formed by the index value
exclusive-ORed with 0x544a4d3f. To test if a handle is global, first AND it with
0xffff0000 and then compare with 0x544a0000.
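The global handle test translates directly into C. The sketch below uses only the
two constants given in the text; the function names are hypothetical, and the
round trip assumes index values small enough not to disturb the upper word.

```c
#include <stdint.h>

#define GLOBAL_HANDLE_KEY  0x544a4d3fu
#define GLOBAL_HANDLE_MASK 0xffff0000u
#define GLOBAL_HANDLE_SIG  0x544a0000u

/* Nonzero if the handle carries the global-handle signature. */
static int is_global_handle(uint32_t handle)
{
    return (handle & GLOBAL_HANDLE_MASK) == GLOBAL_HANDLE_SIG;
}

/* XOR is its own inverse, so the same key converts in both directions. */
static uint32_t make_global_handle(uint32_t index)
{
    return index ^ GLOBAL_HANDLE_KEY;
}

static uint32_t global_handle_index(uint32_t handle)
{
    return handle ^ GLOBAL_HANDLE_KEY;
}
```

An index of 5 becomes the global handle 0x544a4d3a, which passes the signature
test, whereas the plain per-process handle 5 does not.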
The last function that is also preparatory before making the Int21Dispatch is
x_ConvertHandleToK32Object. Basically, this function converts any type of Win32
handle into a pointer to a KERNEL32 data structure that describes that object. In
this case, we are asking it to take what we believe to be a file handle (hFile) and
convert it into a KERNEL32 file data structure. Now, if the caller passes us, say, a
console handle instead, the return value stored in pK32FileObj will be NULL,
causing the else if (x_ConvertHandleToK32Object...) clause to be executed.
This time the call will look for any handle type (K32OBJ_ALL_TYPE). If this last
call succeeds, the function fails and an ERROR_NOT_SUPPORTED will be
returned by GetLastError.
If a valid file handle is supplied by the caller, then pK32FileObj will contain a
pointer to a file object structure. The only piece of information we need from it is
yet another file handle, one that IFSMgr will understand, an "extended file
handle" in the field named hExtendedFileHandle. This is the handle that is
ultimately passed to Int21Dispatch to acquire the BY_HANDLE_FILE_INFORMATION
data structure.
For example, a file is an instance of a file object type and an event is an instance
of an event object type. As with Windows NT, instances of object types are
created by services and are represented by object handles. Again using the same
examples, a file is created by the service CreateFile, which returns a file handle;
and an event is created by the service CreateEvent, which returns an event
handle. Quoting again from Helen Custer, "An NT object handle is an index into a
process-specific object table."
For each indexed entry in the object table there is a pointer to the object instance
and a flags field specifying access rights and inheritance designations. Although
there are a lot of similarities between NT executive objects and Windows 95
The Win32 AP/ and KERNEL32 Objects 57
For each of these object types, a block of data is allocated from the KERNEL32
heap to represent an object's instance. The KERNEL32 process object is also
known as the process database, or PDB. Similarly, the KERNEL32 thread object is
also known as the thread database, or TDB. Both of these data structures are
described in detail in Windows 95 System Programming Secrets. Although each
KERNEL32 object is represented by a different data structure, all KERNEL32
objects have the same header:
The dwType field takes a value between 1 and 17 corresponding to its object
type. The dwRefCnt field is used to maintain a usage count for the object. When a
handle is closed and the dwRefCnt of its corresponding object has reached zero,
the object is destroyed.
The KERNEL32 process object contains a member (at offset 0x44) which points to
the table of object handles. The Win32 handles which are returned by CreateFile,
CreateMutex, etc. are simply indices into this table. The function that we met in
the last section, x_ConvertHandleToK32Object, is designed to retrieve an object
from the object handle table given its Win32 handle. Thus given a handle of one
of these 17 object types, we can get the address of its corresponding data
structure, which was allocated from the KERNEL32 heap. Actually, there are two fields
for each entry in the object handle table:
The first DWORD in the object handle table contains the maximum number of
entries in the table, so the handle table can be represented by this structure:
Dropping down another level, we see in the listing below that
x_RefHandleToK32Object sandwiches its body by acquiring a KERNEL32 mutex and releasing
it on exit. We also see the reference count for the KERNEL32 object incremented
on return from x_Win32HandleToK32Object if the K32OBJ_INCREF flag is set in
fObjTypes.
K32ObjectHeader* x_RefHandleToK32Object(PPDB pProcess, HANDLE hObject,
                                        DWORD fObjTypes, DWORD fAccess) {
    K32ObjectHeader* pK32Obj;
    DWORD fObjTypeFlags;
        if ((1 << (pK32Obj->dwType - 1)) & fObjTypes) return pK32Obj;
    }
    InternalSetLastError(ERROR_INVALID_HANDLE);
    return NULL;
}
This function can be split into roughly two halves. The first half massages the
input handle to get it into a form that can be used to directly access the process's
object handle table. The second half retrieves the entry in the object handle table
and returns its pK32Object member.
First we see that the high-order word of the handle is tested for the signature
0x544a. Normally when an application creates KERNEL32 objects, the handles
which are returned are nice small integer numbers, so we are talking handle
values in the range 1 to say 1000. However, if you place a breakpoint at this
location in the function, you will see handles are frequently passed which indeed
have this 0x544a signature. The next two lines in the code help clarify what these
handles signify. First, we switch to a different process (pK32Process), namely
KERNEL32, and then the handle is exclusive-ORed with the value 0x544a4d3f.
After this operation the handle value becomes a "nice small integer."
So what have we done? We have just created an index into KERNEL32's handle
table and ultimately, when we return, we'll be returning a KERNEL32 object that
actually belongs to the KERNEL32 process.
In my statements above, I've simplified things a bit by separating handles into just
two groups. There is actually a third group that might be called "standard
handles;" these are handles every process has. For instance, the return value from
GetCurrentProcess is always 0x7fffffff no matter which process you are calling
from. Similarly, the return value from GetCurrentThread is always 0xfffffffe no
matter which thread you call from. These magic values as well as the standard
console handles are just constants that KERNEL32 translates into "real" object
handles. In the switch statement, the first four magic values are translated into
handles by looking up the values in the environment database (pEDB) of the
process. The fifth value in the switch statement represents the handle of the
current thread. Here, it is easier to just look up the KERNEL32 object for the
current thread since it is stored in a global variable, rather than determine its
index in the object handle table.
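To make the three groups concrete, here is a small sketch that sorts a handle value into one of them. It is an illustration only: the enum and function names are hypothetical, and the console handle constants are left out because their values are not quoted in this section.

```c
#include <stdint.h>

#define PSEUDO_CURRENT_PROCESS 0x7fffffffu /* GetCurrentProcess, per the text */
#define PSEUDO_CURRENT_THREAD  0xfffffffeu /* GetCurrentThread, per the text  */

typedef enum {
    HANDLE_ORDINARY,        /* small index into the caller's own table    */
    HANDLE_GLOBAL,          /* 0x544a signature: index into KERNEL32's    */
    HANDLE_CURRENT_PROCESS, /* magic constant translated by KERNEL32      */
    HANDLE_CURRENT_THREAD   /* magic constant translated by KERNEL32      */
} handle_group;

/* Decide which group a raw Win32 handle value falls into. */
static handle_group classify_handle(uint32_t h)
{
    if (h == PSEUDO_CURRENT_PROCESS) return HANDLE_CURRENT_PROCESS;
    if (h == PSEUDO_CURRENT_THREAD)  return HANDLE_CURRENT_THREAD;
    if ((h & 0xffff0000u) == 0x544a0000u) return HANDLE_GLOBAL;
    return HANDLE_ORDINARY;
}
```

Neither magic constant has 0x544a in its high word, so the order of the tests does not affect the result.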
Before the KERNEL32 thread object is returned, we see that some test is
performed. This test is in the form of the following expression:
(1 << (pK32Obj->dwType - 1)) & fObjTypes
The first half of this expression simply takes a KERNEL32 object type number,
decrements it by 1, and then left-shifts a single bit that number of times. In other
words, it is converting the object type number to a bit position. For example,
0x00001 represents K32OBJ_SEMAPHORE, 0x00002 represents K32OBJ_EVENT,
0x00040 represents K32OBJ_FILE, and 0x10000 represents K32OBJ_SOCKET.
fObjTypes is also a bit map of the types of KERNEL32 objects that the caller will accept
a conversion into. We know that a thread object has a dwType of 6 so its bit map
will be 0x00020. If the caller did not set this bit in fObjTypes, the function will fail
and return NULL; otherwise it will return pK32Obj for the thread object.
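The type-to-bit conversion can be checked with a couple of one-line helpers. This is a sketch with hypothetical function names; the type numbers and bit values are the ones quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert a KERNEL32 object type number (1..17) to its bit in fObjTypes. */
static uint32_t k32_type_to_mask(uint32_t dwType)
{
    return 1u << (dwType - 1);
}

/* True if the caller's fObjTypes bit map accepts an object of this type. */
static bool k32_type_accepted(uint32_t dwType, uint32_t fObjTypes)
{
    return (k32_type_to_mask(dwType) & fObjTypes) != 0;
}
```

A thread object (type 6) maps to 0x00020, so a caller that passes only the file bit (0x00040) in fObjTypes is turned away.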
Now, we are faced with the last half of the function. We have our Win32 handle
massaged so it can index the object handle table, so we first find the object
handle table pHdlTbl in the process database. Then the Win32 handle is
compared with the range of the object handle table by verifying that it is less than
the maximum handle value in the first DWORD of the table. If this test succeeds,
the handle is used as an index into the array of table entries. The KERNEL32
object pointer in the entry is then tested to see that it is non-zero and not -1. If
this holds true then the fAccess argument is tested for a non-zero value. If the
caller has specified fAccess bits, then these are also tested. Finally, the requested
object types fObjTypes are compared against the returned object type. If these
match, then a pointer to the KERNEL32 object is returned.
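The sequence of checks just described can be sketched as a small lookup routine. Treat this as a model rather than a reconstruction: the HandleEntry and HandleTable layouts and the name lookup_object are hypothetical, and the way the access bits are compared is an assumption, since the text only says that they are "tested."

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t dwType;    /* object type, 1..17 */
    uint32_t dwRefCnt;  /* usage count        */
} K32ObjectHeader;

typedef struct {                 /* hypothetical entry layout */
    uint32_t         flags;      /* access rights and inheritance bits */
    K32ObjectHeader *pK32Object;
} HandleEntry;

typedef struct {                 /* hypothetical table layout */
    uint32_t     cEntries;       /* maximum number of entries */
    HandleEntry *entries;
} HandleTable;

static K32ObjectHeader *lookup_object(const HandleTable *tbl, uint32_t index,
                                      uint32_t fObjTypes, uint32_t fAccess)
{
    if (index >= tbl->cEntries)
        return NULL;                              /* out of range */

    const HandleEntry *e = &tbl->entries[index];
    if (e->pK32Object == NULL || (uintptr_t)e->pK32Object == UINTPTR_MAX)
        return NULL;                              /* empty slot or -1 marker */

    /* Assumption: every requested access bit must be present in the entry. */
    if (fAccess != 0 && (e->flags & fAccess) != fAccess)
        return NULL;

    if (((1u << (e->pK32Object->dwType - 1)) & fObjTypes) == 0)
        return NULL;                              /* wrong object type */

    return e->pK32Object;
}
```

In the real function a failed lookup also ends in InternalSetLastError(ERROR_INVALID_HANDLE); the sketch simply returns NULL.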
Now that your curiosity about KERNEL32 objects has been whetted, let's fill in
some more details about the following types: K32OBJ_FILE, K32OBJ_PIPE,
K32OBJ_MAILSLOT, K32OBJ_CHANGE, K32OBJ_MEM_MAPPED_FILE, and
K32OBJ_DEVICE_IOCTL.
12h WORD reserved
14h DWORD dwModeAndFlags
If the handle is less than 0x200 (a DOS handle), then store the mode and flags word
used to open or create the file.
18h DWORD pszFullPath
This member is 0 except for some special cases. If dwModeAndFlags is
non-zero, then a heap allocation is made in which the full path of the file is stored;
in that case, this member holds the pointer to that allocation.
There are numerous file object services supplied by the Win32 API. Some of these
services are general purpose and work with many different types of KERNEL32
objects. CreateFile and CloseHandle are good examples of such general purpose
services. Internally they have separate implementations for each object type.
Table 4-2 enumerates the file object services and key Int21Dispatch calls used in
their implementation. All of the Int 21h functions listed are documented.
Table 4-3 enumerates the file-change object services and key Int21Dispatch calls
used in their implementation. All of the Int 21h functions listed are undocumented.
Table 4-4 enumerates the pipe object services and key Int21Dispatch calls used in
their implementation. The Int 21h functions in the 5fxxh series are
undocumented. (See Chapter 13 for more information.)
Table 4-5 enumerates the mailslot object services and key Int21Dispatch calls
used in their implementation. The Int 21h functions in the 5fxxh series are
undocumented. (See Chapter 13 for more information.)
Table 4-6 enumerates the memory-mapped file object APIs and key Win32
services used in their implementation. (See Chapter 10, Virtual Memory, the
Paging File, and Pagers, for more information.)
member contains a pointer to the pathname used to load the device, e.g.,
\\.\VTESTD. Later when the device is closed and its dwRefCnt reaches zero,
Unlike the file, file-change, pipe, and mailslot object services, which rely on
IFSMgr for implementation support, the device object is dependent on VWIN32,
specifically the Win32 service with ordinal 0x2a001f. This service takes 12
arguments and there appear to be three distinct ways of calling it. First, when a
virtual device is loaded or opened by a call to CreateFile, the calling arguments
take this form:
VxDCall(DWORD svc,            // has the Win32 service ordinal (0x2a001f)
        DDB pDDB,             // pointer to device descriptor block
        DWORD FuncAddr,       // FuncAddr is the address of a K32 procedure
        char* pszDevName,     // 8 character device name as it appears in
                              // the DDB
        BOOL bDoLoad,         // if TRUE load device, else search DDB list
        char* pszLoadPath,    // pathname used to load the device
        DWORD unused0,        // has the value 0
        DWORD unused1,        // has the value 0
        DWORD InitialRing0ID, // contains a ring-0 THCB
        DWORD unused2,        // has the value 0
        PPDB pProcess,        // pointer to the process database
        char* pszReturnName); // pointer at which to store device name
This call is always made with pDDB equal to NULL. There are two variations
based on the value of bDoLoad. If bDoLoad is FALSE, the Device Descriptor Block
list is searched for a device with a name matching pszDeviceName. If bDoLoad is
TRUE, the VXDLDR_LoadDevice service is used to attempt to load the device file
pszDeviceName. It turns out that bDoLoad is TRUE if the device name has an
extension, but FALSE if an extension is not specified. If the device is located or
loaded successfully, the 8-character device name is copied to pszReturnName and
a DIOC_OPEN (DIOC_GETVERSION) call is made to the device's control
procedure. The arguments FuncAddr and InitialRing0ID appear to be used only for
initialization of VWIN32 variables when the first call is made to Win32 service
0x2a001f.
IOCTL Services
Once an application retrieves a handle to a device object, it may use that handle
to access IOCTL services using the DeviceIoControl API. It turns out that both
VWIN32 and IFSMgr offer public services of this kind, each with different sets of
functionality.
VWIN32 provides a DeviceIoControl interface for a limited set of MS-DOS
functions. It seems that these functions were added primarily for disk utility programs
which require direct access to file system structures and need to request exclusive
volume locks on the drives which are being manipulated. There are four
dwIoControlCode values that are defined:
VWIN32_DIOC_DOS_INT13 (4)
This control code is used for BIOS level Int 13h. It allows access to the
physical sectors of a disk drive but only for the floppy disk drives in a system. This
behavior is documented by the MSDN KnowledgeBase Article Q137176: PRB:
DeviceIoControl Int 13h Does Not Support Hard Disks. If you need BIOS Int
13h services for a fixed disk, this article shows how to thunk to a Win16 DLL
that uses the DPMI Simulate Real Mode Interrupt function to issue Int 13h.
VWIN32_DIOC_DOS_INT25 (2)
This control code is used for issuing an absolute disk read on a specific
volume. Int 25h reads chunks of disk storage which are referenced by logical
sectors. To force a read from the physical disk, an exclusive volume lock
needs to be acquired for the volume or the read may actually return cached
data. This interrupt has been superseded by Int 21h Function 440dh Minor
Code 61h, Read Track on Logical Drive.
VWIN32_DIOC_DOS_INT26 (3)
This control code is used for issuing an absolute disk write on a specific
volume. Int 26h writes chunks of disk storage which are referenced by logical
sectors. To write to the physical disk, an exclusive volume lock needs to be
acquired for the volume; otherwise a write protect error will be returned. This
interrupt has been superseded by Int 21h Function 440dh Minor Code 41h,
Write Track on Logical Drive.
VWIN32_DIOC_DOS_IOCTL (1)
This control code is used for issuing Int 21h Functions in the range 4400h
through 4411h. This range includes the "conventional" DOS IOCTL functions
as well as the new volume locking functions.*
To issue the above DeviceIoControl calls, the lpvInBuffer and lpvOutBuffer
reference DIOC_REGISTERS structures. These structures define the values of the 32-bit
registers EAX, EBX, ECX, EDX, EDI, ESI, and flags. Note, however, that the
segment registers are not specified. •
* See Programmer's Guide to Microsoft Windows 95, Article 25, "Exclusive Volume Locking."
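As a rough illustration of how such a register block is prepared, the sketch below fills in a stand-in for DIOC_REGISTERS and checks the result. It is deliberately portable and makes no system calls: the struct name with the _SKETCH suffix and the helper names are hypothetical, and the field order follows Microsoft's published definition of DIOC_REGISTERS. The carry test reflects the usual DOS convention that carry set on return signals an error.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VWIN32_DIOC_DOS_IOCTL 1u      /* Int 21h functions 4400h through 4411h */
#define CARRY_FLAG            0x0001u /* bit 0 of the flags register */

/* Stand-in for DIOC_REGISTERS; a real program would use the SDK definition. */
typedef struct {
    uint32_t reg_EBX;
    uint32_t reg_EDX;
    uint32_t reg_ECX;
    uint32_t reg_EAX;
    uint32_t reg_EDI;
    uint32_t reg_ESI;
    uint32_t reg_Flags;
} DIOC_REGISTERS_SKETCH;

/* True if AX names a function that VWIN32_DIOC_DOS_IOCTL can carry. */
static bool in_dos_ioctl_range(uint32_t ax)
{
    return ax >= 0x4400u && ax <= 0x4411u;
}

/* Build the input block; registers not given are zeroed. */
static DIOC_REGISTERS_SKETCH make_ioctl_request(uint32_t ax, uint32_t bx,
                                                uint32_t cx, uint32_t dx)
{
    DIOC_REGISTERS_SKETCH r;
    memset(&r, 0, sizeof r);
    r.reg_EAX = ax;
    r.reg_EBX = bx;
    r.reg_ECX = cx;
    r.reg_EDX = dx;
    return r;
}

/* Inspect the output block after the call has returned. */
static bool dos_call_failed(const DIOC_REGISTERS_SKETCH *out)
{
    return (out->reg_Flags & CARRY_FLAG) != 0;
}
```

In a real Win32 program the same block would be passed as both the input and output buffer of DeviceIoControl, with VWIN32_DIOC_DOS_IOCTL as the control code and a handle obtained by opening \\.\vwin32.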
IFS_IOCTL_21 (100)
This control code is used for issuing Int 21h functions of the 5Fxxh series
which are handled by IFSMgr's dFunc5F dispatch function (see Chapter 6,
Dispatching File System Requests). Other Int 21h functions are passed to the
IFSMgr_NetFunction hook chain (see Chapter 7, Monitoring File Activity). The
lpvInBuffer and lpvOutBuffer arguments to DeviceIoControl reference
win32apireq structures. These structures define the values of the 32-bit
registers EAX, EBX, ECX, EDX, EDI, ESI, and EBP. There is also a field that will
give the ID of the Network Provider and a field in which to store a return
code. This structure is defined in ifs.h of the Windows 95 DDK.
IFS_IOCTL_2F (101)
This control code is used for issuing Int 2Fh functions. These are also passed
to the IFSMgr_NetFunction hook chain. The same calling arguments are used
as with control code IFS_IOCTL_21.
IFS_IOCTL_GET_RES (102)
This function takes a WORD size input buffer (lpvInBuffer) which holds an
SFT or extended file handle that is owned by the calling process. The output
is returned in a DWORD size output buffer (lpvOutBuffer) which holds the
address of the file's fhandle structure after it has been exclusive-ORed with
0xa5a5a5a5 and rotated left by 13 bit positions.
IFS_IOCTL_GET_NETPRO_NAME_A (103)
This function takes a buffer containing an ASCIIZ UNC pathname
(lpvInBuffer) with the length of the pathname in cbInBuffer. It looks up the Net ID
of the FSD which owns this UNC connection and returns it in the DWORD
size output buffer (lpvOutBuffer). Net IDs are enumerated in the SDK header
file winnetwk.h, e.g., the Net ID for Microsoft Networks is given the manifest
constant WNNC_NET_LANMAN (0x00020000).
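The value returned by IFS_IOCTL_GET_RES is only lightly disguised, and the transformation is easy to invert. The sketch below assumes the operations are applied in the order the description gives them (XOR first, then rotate); the function names are hypothetical.

```c
#include <stdint.h>

#define IFS_RES_XOR 0xa5a5a5a5u  /* XOR key quoted in the text */

static uint32_t rol32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v << n) | (v >> (32u - n)) : v;
}

static uint32_t ror32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v >> n) | (v << (32u - n)) : v;
}

/* What the IOCTL is described as doing to the fhandle address. */
static uint32_t scramble_fhandle(uint32_t addr)
{
    return rol32(addr ^ IFS_RES_XOR, 13);
}

/* Undo it: rotate right by 13, then XOR with the same key. */
static uint32_t unscramble_fhandle(uint32_t value)
{
    return ror32(value, 13) ^ IFS_RES_XOR;
}
```

Because both steps are reversible, unscramble_fhandle(scramble_fhandle(x)) returns x for any 32-bit value.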
• For more details on using these functions, see Programmer's Guide to Microsoft Windows 95, Article 20,
"Device I/O Control."
Implementation of VWIN32_Int21Dispatch
Our survey of the Win32 API, as summarized in Tables 4-2 through 4-6, has
shown that Int21Dispatch is the primary link that KERNEL32 has to IFSMgr. In
Chapter 3, we traced a Win32 callback into VMM and looked at how a Win32
service was dispatched. For a review of that, see the section "The Win32
Callback" in Chapter 3. Now we are going to pick up where we left off there, and
trace into VWIN32's Win32 service 0x2a0010, which we'll refer to as
VWIN32_Int21Dispatch hereafter. The assembly code for VWIN32_Int21Dispatch is shown
in Examples 4-3 and 4-4.
L_B4C:
+0b4c   mov     esi, dword ptr [edi].TCB_Flags
+0b4e   and     esi, THFLAG_CHARSET_MASK
+0b54   and     dword ptr [edi].TCB_Flags, NOT THFLAG_CHARSET_MASK
+0b5a   test    dword ptr [ebx].Flags, fOkToSetThreadOem
+0b61   jz      short L_B69
+0b63   or      dword ptr [edi].TCB_Flags, THFLAG_OEM
L_B69:
+0b69   mov     eax, dword ptr [esp+0c]        ; Int 21h function
+0b6d   mov     edx, dword ptr [esp+04]        ; client register structure
+0b71   mov     dword ptr [edx].Client_EAX, eax
+0b74   mov     ecx, dword ptr [esp+10]        ; 3rd VxDCall arg
+0b78   mov     dword ptr [edx].Client_ECX, ecx
+0b7b   push    dword ptr [edx].Client_FS      ; preserve this
nested_exec:
+0b88   mov     eax, 21h
+0b8d   VMMcall Exec_PM_Int
L_B93:
+0b93   pop     eax
+0b94   mov     edx, dword ptr [esp+04]
L_BA8:
+0ba8   retn    0010
Note:
FILE_MASK equ (THFLAG_EXTENDED_HANDLES OR THFLAG_OPEN_AS_IMMOVABLE_FILE)
The raw disassembly has been cleaned up by adding equates from VMM.INC and
using names that Matt Pietrek has assigned to members of the thread database
structure (TDB). In the simplest case this function takes five steps. It modifies the
current thread's flags, it initializes some client registers, it performs the Int 21h
request, it restores some client registers, and it restores the current thread's flags
before returning. Let's look at each of these steps.
Lines 0b2bh to 0b63h modify the current thread's flags. This starts with a call to
Get_Cur_Thread_Handle which returns the handle, which is also the address, of
the thread control block (tcb_s in vmm.inc). The first field of the thread control
block contains the thread flags, TCB_Flags. The first flag to be modified is
THFLAG_Extended_Handles; it is simply set. This informs IFSMgr that this thread
uses extended file handles. The next flag which may be modified is
THFLAG_Open_As_Immovable_File. Whether this flag is set depends upon the setting of the
equivalent flag in the ring-3 thread database. Yes, even down in VWIN32, the
current KERNEL32 thread object is being accessed! The DDK has this to say about
this flag: "Used by VWIN32 to prevent defragmenter from moving an open file."
Moving along to the last set of flags, THFLAG_ANSI and THFLAG_OEM are both
cleared, which implies use of the ANSI character set. Then the current KERNEL32
thread object is consulted to see if it is using the OEM character set; if so, the
THFLAG_OEM bit is set.
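The flag handling can be modeled in a few lines of C. This is a sketch under stated assumptions: the bit values below are placeholders chosen for the example (the real ones are defined in VMM.INC), the function names are hypothetical, and restoring the whole saved word on exit is a simplification of whatever bit-by-bit restoration the real code performs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions, chosen only for this sketch. */
#define THFLAG_ANSI                   0x0001u
#define THFLAG_OEM                    0x0002u
#define THFLAG_CHARSET_MASK           (THFLAG_ANSI | THFLAG_OEM)
#define THFLAG_EXTENDED_HANDLES       0x0004u
#define THFLAG_OPEN_AS_IMMOVABLE_FILE 0x0008u

/* Apply the entry-time changes; returns the old flag word for the restore. */
static uint32_t enter_int21_dispatch(uint32_t *tcb_flags, bool immovable, bool oem)
{
    uint32_t saved = *tcb_flags;

    *tcb_flags |= THFLAG_EXTENDED_HANDLES;          /* always set */
    if (immovable)
        *tcb_flags |= THFLAG_OPEN_AS_IMMOVABLE_FILE; /* mirrors ring-3 flag */
    *tcb_flags &= ~THFLAG_CHARSET_MASK;             /* cleared means ANSI */
    if (oem)
        *tcb_flags |= THFLAG_OEM;
    return saved;
}

/* Simplification: put the whole flag word back rather than individual bits. */
static void leave_int21_dispatch(uint32_t *tcb_flags, uint32_t saved)
{
    *tcb_flags = saved;
}
```

The immovable and oem arguments stand for the two pieces of information read from the ring-3 thread database.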
Recall that on entry to VWIN32_Int21Dispatch the stack looks like this:
Next, in lines 0b69h to 0b7bh, we see the calling arguments being accessed.
We see that EAX is loaded with the requested Int 21h function number (the
second VxDCall argument) and EDX is loaded with the address of the client
register structure. Then we see EAX stored to Client_EAX and the third VxDCall
argument stored to Client_ECX. Finally, the current value of Client_FS is pushed
on the stack. These actions prepare the registers that will be used when Int 21h is
invoked.
On lines 0b7eh to 0b86h, we see a check for AH values 3fh (read) and 40h
(write). If either of these functions is being requested, a branch is made to the
code shown in Example 4-4.
L_BF6:
+0bf6   clc
L_BF7:
+0bf7   pop     edx
+0bf8   pop     ebx
+0bf9   pop     esi
+0bfa   jc      short nested_exec              ; try Int 21h
+0bfc   jmp     short L_B93
Finally, on lines 0b88h and 0b8dh, Int 21h is invoked by the service Exec_PM_Int.
This service simulates the interrupt into the current virtual machine (the System
VM). It first assures that the caller is in PM execution mode, and if not calls
Set_PM_Execution_Mode. Then it safeguards its stack from being paged out by
locking it in place, using the service Begin_Use_Locked_PM_Stack. It uses the
current client registers during the execution of the interrupt, except that a PM
callback is stored in CS:EIP. This breakpoint becomes the return address after the
interrupt completes. The interrupt is then launched by the service Exec_Int,
which in turn performs the Simulate_Int and Resume_Exec services. When the
interrupt returns, control is regained at the breakpoint. Then the service
End_Nest_Exec is called, which restores CS:EIP and the original stack before returning from
Exec_PM_Int.
Exec_PM_Int does pack quite a punch. It has a serious side effect too. The client
registers and flags are modified to reflect the results of the software interrupt that
was performed. Perhaps this is why the DDK warns us: "This service is intended
to be used only by the Windows kernel; external virtual devices should not use it.
External virtual devices should use the Exec_Int service instead."
On lines 0b93h to 0b98h, we see the original value of Client_FS being popped
into EAX and then written back to the client register member Client_FS. So when
VxDCall returns, the only client register which you can be sure of is FS! On lines
0b9ch to 0ba8h, VWIN32_Int21Dispatch undoes any changes it has made to
thread control block flags and then returns.
Now let's look at the case where the requested function is a read or write. For
these cases, VWIN32 tries to perform an optimization. Instead of sending the
request to the protected mode Int 21h handler, it attempts to convert the
extended file handle into a ring-0 file handle using the IFSMgr service
IFSMgr_Win32_Get_Ring0_Handle. This service takes an extended file handle in EBX and
returns a ring-0 handle, also in EBX. Extended file handles are numbers greater
than 0x200, whereas ring-0 file handles are ring-0 addresses. If this conversion
succeeds, then another IFSMgr service, IFSMgr_Ring0_FileIO, is used to perform
the file read or write, thereby completely bypassing Int 21h.
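The decision between the two paths boils down to two questions: is AH 3fh or 40h, and did the handle conversion succeed? A minimal sketch of that decision follows. The names are hypothetical, and the boolean parameter stands in for the outcome of the handle conversion, which the assembly reports through the carry flag.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { PATH_INT21, PATH_RING0_FILEIO } io_path;

/* Extract AH, the Int 21h function number, from the EAX argument. */
static uint8_t int21_ah(uint32_t eax)
{
    return (uint8_t)((eax >> 8) & 0xffu);
}

static bool is_read_or_write(uint32_t eax)
{
    uint8_t ah = int21_ah(eax);
    return ah == 0x3fu || ah == 0x40u;   /* 3fh = read, 40h = write */
}

/* 'converted' models whether a ring-0 handle could be obtained. */
static io_path choose_path(uint32_t eax, bool converted)
{
    if (is_read_or_write(eax) && converted)
        return PATH_RING0_FILEIO;        /* bypass Int 21h entirely */
    return PATH_INT21;                   /* fall back to nested_exec */
}
```

Anything other than a read or write, such as an open (3dh), always takes the Int 21h path in this model.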
IFSMgr_Ring0_FileIO supports a range of DOS-like file I/O services. For read and
write, it takes the following arguments:
RemoveDirectory 713ah
SetCurrentDirectory SetCurrentDirectoryA
SetFileAttributes 7143h
Table 4-7 shows an added twist for some of the new Win16 APIs. APIs such as
FindFirstFile, FindNextFile, and FindClose thunk to the corresponding KERNEL32
routines. Thus, even though the function originates in a Win16 application, it will
still generate VWIN32_Int21Dispatch calls.
Chapter 5: The "New" MS-DOS File System
Back in Chapter 3, Pathways to the File System, we saw that IFSMgr hooks several
"legacy" interfaces. In this chapter we'll look at IFSMgr's handlers for these
interrupts and see to what extent they are passed down the interrupt chain or handled
within IFSMgr. Recall from Chapter 3 that there are five interrupts to be
considered and they come in either PM or V86 modes, or both. Here again is the list of
interrupts:
Although the bulk of file I/O continues to be serviced through these interrupt
interfaces, this need not be the case since ring-0 file services
(IFSMgr_Ring0_FileIO) are also available and in a few instances are used directly for performance
or design reasons.
Interrupt 21h Handlers
IFSMgr's protected mode and virtual-86 mode Int 21h handlers have many
similarities. Disassemblies of these handlers are shown in Examples 5-1 and 5-2. Keep in
mind that a protected mode handler consumes an interrupt by returning via
Simulate_Iret and chains to the previous handler by a Simulate_Far_Jmp. In
Example 5-1, the labels SimIRet and NxtPM21 correspond to these two cases. On
the other hand, a V86 interrupt handler consumes an interrupt by returning with
carry clear and chains to the previous handler by returning with carry set. In
Example 5-2, NextV86Hook and a return through line 1238h both set the carry
flag, so the next V86 interrupt handler will be called. So to see which Int 21h
functions are handled by IFSMgr and which are passed on, we need to examine how
these handlers decide upon these alternatives.
Initially, both PM and V86 handlers look at the Int 21h function in the AH client
register, to see if it lies below the constant MAXDOSFUNC+1. The functions
between 0 and MAXDOSFUNC make up the MS-DOS API. For the retail release of
Windows 95, MAXDOSFUNC is 71h, and for OSR2 it is 73h. Function numbers
from MAXDOSFUNC+1 to FFh correspond to APIs supported by various network
providers, or vendor specific extensions; e.g., function EAh is used to detect if a
Netware client is installed. Each of these groupings has a separate lookup table
for it. The lookup table is indexed by the function number and the table entries
are the addresses of preamble functions.
The first table of functions, called Lower72_Preambles, is filled in with default
handlers by IFSMgr. The second table of functions, called Upper8E_Preambles, is
not created by IFSMgr until a network provider or other client registers a
preamble for a function in the range MAXDOSFUNC+1 to FFh. When the table is
initially created, it is filled with addresses of a preamble function which just sets
carry and returns. A preamble function for either table can be registered using the
IFSMgr service IFSMgr_SetReqHook, which is available during Device Init or Init
Complete phases.
Example 5-1. Protected Mode Int 21h Handler at IFSMGR(1)+1140h (continued)
1186    jz      short SimIRet
NxtPM21:
1188    mov     ecx, dword ptr NextPM21Sel
118e    mov     edx, dword ptr NextPM21Ofs
1194    VMMjmp  Simulate_Far_Jmp
FuncGt71:
119a    cmp     dword ptr Upper8E_Preambles, 00
11a1    jz      short NxtPM21
11a3    mov     edx, dword ptr Upper8E_Preambles
11a9    mov     esi, 0ffffffffh
11ae    call    dword ptr [edx+ecx*4-1c8h]
11b5    jc      short NxtPM21
11b7    mov     ecx, 0d4h
Dispatch_PM_Int21:
11bc    VxDcall IFSMgr_FillHeapSpare
11c2    mov     eax, dword ptr OfsVMCB
11c7    mov     edx, 0ffffffffh
11cc    call    dword ptr [ebx+eax+0c]
11d0    jc      short NxtPM21
SimIRet:
11d2    mov     ax, word ptr [ebp].Client_Flags
11d6    and     ax, +01
11da    VMMcall Simulate_Iret
11e0    and     word ptr [ebp].Client_Flags, -02
11e5    or      word ptr [ebp].Client_Flags, ax
11e9    retn
In Examples 5-1 and 5-2, you can see calls to the Lower72_Preambles at lines
117Ah and 122Ah. In each case, the Int 21h function number is multiplied by 4,
the size of each doubleword address in the table, and added to the base of the
table. You can also see calls to the Upper8E_Preambles, at lines 11AEh and
124Ah. In these cases, the offset is reduced by 1C8h (or 1D0h for OSR2), the
offset to the base of the table ((MAXDOSFUNC+1) * 4).
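The offset arithmetic is worth verifying by hand. The sketch below uses hypothetical helper names and simply reproduces the calculation performed by the indexed call instructions.

```c
#include <stdint.h>

#define MAXDOSFUNC_RETAIL 0x71u  /* Windows 95 retail, per the text */
#define MAXDOSFUNC_OSR2   0x73u  /* OSR2, per the text */

/* Byte offset of a function's slot in Lower72_Preambles. */
static uint32_t lower_offset(uint32_t func)
{
    return func * 4u;
}

/* Amount subtracted so that function MAXDOSFUNC+1 lands on slot 0. */
static uint32_t upper_table_bias(uint32_t maxdosfunc)
{
    return (maxdosfunc + 1u) * 4u;
}

/* Byte offset of a function's slot in Upper8E_Preambles. */
static uint32_t upper_offset(uint32_t func, uint32_t maxdosfunc)
{
    return func * 4u - upper_table_bias(maxdosfunc);
}
```

With MAXDOSFUNC at 71h the bias is 72h * 4 = 1C8h, and with 73h it is 74h * 4 = 1D0h, matching the two constants above. For the retail release the table names also add up: 72h lower entries plus 8Eh upper entries cover all 100h function numbers.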
In both Examples 5-1 and 5-2, we see that a number of tests are performed before
a Lower72_Preambles function is called. The first test involves the HookerFlags
variable, which uses two bits of one byte of storage. This variable is global in
scope; that is, it is visible across all VMs. I've called bit 1 LOCALINT21 and bit 0
UNUSEDFLAG. The UNUSEDFLAG bit is always zero. The LOCALINT21 bit is set
when V86 Int 21h is hooked in any VM. For instance, if I start up a DOS box and
run a DOS application that hooks Int 21h, this flag will be set and will be seen
from the System VM as well as other VMs. So we may interpret the four lines of
code starting at 114bh in Example 5-1 and at 1200h in Example 5-2 as a three-way
test. If both flags are clear, then call the preamble with EDX=0. If only the
UNUSEDFLAG bit is set, call the preamble with EDX=1. And last, if only the
LOCALINT21 bit is set, continue performing additional tests.
NextV86Hook:
1278    stc
1279    retn
Let's assume only LOCALINT21 is set. We then drop into another bit test over the
next five lines, starting at 115bh in Example 5-1, and at 1212h in Example 5-2. At
this point, EBX is the current VM handle, which is also the base of the VM control
block. During Device Init, IFSMgr calls _Allocate_Device_CB_Area to allocate a
block of memory which is specific to IFSMgr and which is private to each VM.
This block begins at offset OfsVMCB from the beginning of the VM control block;
thus EBX + OfsVMCB is the address of the base of this pervm data structure (see
Appendix C, IFSMgr Data Structures, for pervm's typedef). The pv_flags member
of this structure, a byte at offset 8, contains flag bits. Bit 4, which I've named
LOCALINT21HOOKER, indicates whether there is a local Int 21h hooker in this
VM. So this test is checking whether this VM is the VM which has installed the
local hook. If not, then the preamble is called with EDX=0.
Ok, now let's assume the LOCALINT21 bit is set and we are in a VM which has a
local Int 21h hook; then the function Is71_A3_A4_A5_A8 is called. This is a simple
function which returns with carry set if the requested function is not 71A3h,
71A4h, 71A5h, or 71A8h. So unless the Int 21h request is for one of these
functions, the request will be passed to the next PM or V86 handler. It is interesting to
note that functions 71A3h to 71A5h are undocumented but clearly are related to
the implementation of Find Change Notification. Function 71A8h is used to
generate a short name alias from a long filename.
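Pulling the tests together, the routing decision can be sketched as one function. This models the prose description only; the enum and function names are hypothetical. The bit positions are the ones named above (bits 0 and 1 of HookerFlags, bit 4 of pv_flags). The EDX value used when one of the four exempt functions is let through is not stated in the text, so the sketch gives that case its own result code.

```c
#include <stdbool.h>
#include <stdint.h>

#define UNUSEDFLAG       0x01u  /* bit 0 of HookerFlags */
#define LOCALINT21       0x02u  /* bit 1 of HookerFlags */
#define LOCALINT21HOOKER 0x10u  /* bit 4 of pervm.pv_flags */

typedef enum {
    ROUTE_PREAMBLE_EDX0,   /* call the preamble with EDX=0 */
    ROUTE_PREAMBLE_EDX1,   /* call the preamble with EDX=1 */
    ROUTE_PREAMBLE_EXEMPT, /* 71A3h/A4h/A5h/A8h: EDX not stated in the text */
    ROUTE_NEXT_HANDLER     /* pass to the next PM or V86 handler */
} route_t;

static bool is_71_a3_a4_a5_a8(uint16_t ax)
{
    return ax == 0x71a3u || ax == 0x71a4u || ax == 0x71a5u || ax == 0x71a8u;
}

static route_t route_int21(uint8_t hooker_flags, uint8_t pv_flags, uint16_t ax)
{
    if (!(hooker_flags & LOCALINT21))         /* nobody has hooked V86 Int 21h */
        return (hooker_flags & UNUSEDFLAG) ? ROUTE_PREAMBLE_EDX1
                                           : ROUTE_PREAMBLE_EDX0;
    if (!(pv_flags & LOCALINT21HOOKER))       /* hook lives in some other VM */
        return ROUTE_PREAMBLE_EDX0;
    return is_71_a3_a4_a5_a8(ax) ? ROUTE_PREAMBLE_EXEMPT
                                 : ROUTE_NEXT_HANDLER;
}
```

An ordinary open (3D00h) issued in a VM that has a local hook therefore comes back as ROUTE_NEXT_HANDLER, while 71A8h still reaches a preamble.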
In any event, if a preamble is called, the carry flag on return determines whether
the function is ultimately dispatched. If the preamble returns with carry set, then
the function is not handled and is passed on to the next handler. However, if the
preamble returns with carry clear, then the function is dispatched to the file
system at Dispatch_PM_Int21 or Dispatch_V86. In either case, the address of the
dispatch function is located in the VM's pervm data structure in the member
pv_dispfunc. If the dispatch function fails, it also returns with carry set, and the
function is passed on to the next handler in the chain.
The LOCALINT21 bit of HookerFlags and the LOCALINT21HOOKER bit of the
pv_flags member of the VM's pervm structure have a dramatic effect on the routing of
Int 21h requests. When both bits are set for a VM, they essentially shut down the
PM and V86 Int 21h handlers. This is a pretty drastic measure. Why would IFSMgr
do this? Well, before we explore this mystery let's take a closer look at preamble
functions.
Preamble Functions
Preamble functions are described in the DDK's IFS Specification under the section
on the IFSMgr_SetReqHook service. This service takes two arguments, an
unsigned int containing the interrupt number in the high word and the function
number in the low word, and the address of the preamble function to install. At
this time, this service only installs preambles for Int 21h. IFSMgr_SetReqHook
returns the address of the previous preamble function, if successful, or 0 if the
service fails. If a preamble function rejects an Int 21h request, it must chain to the
previous preamble function.
EBX
The current VM handle
ECX
The Int 21h function number
EBP
A pointer to the client register structure
ESI
The provider ID which is initialized to ANYPROID (-1)
The preamble function decides whether to accept or reject the Int 21h request.
There is always a default preamble function installed for a given request number.
The default preamble function will return with carry set if it wishes for the
request to be rejected, and with carry clear if the request is to be accepted. An
installed preamble function will return with carry clear if it accepts the request,
but chains on to the next preamble if it rejects the request. So the net effect of
calling a preamble function chain is to return with carry set to indicate rejection
or clear to indicate acceptance. Note that this description is at odds with the IFS
Specification, which incorrectly states that an installed preamble function should
return with carry set if it accepts a request.
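The chaining rule is easier to see in a small model. The sketch below is plain C, not VxD code, and everything in it is hypothetical: carry is represented by a boolean return value (true meaning carry set, that is, rejected), set_req_hook stands in for IFSMgr_SetReqHook, and the sample preamble's acceptance test (AL equal to 01h) is an arbitrary choice made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A preamble returns the carry flag: true = rejected, false = accepted. */
typedef bool (*preamble_fn)(uint32_t func, uint32_t al);

static preamble_fn g_preambles[0x100];   /* one slot per Int 21h function */

/* Default preamble in this sketch: set carry and return. */
static bool default_reject(uint32_t func, uint32_t al)
{
    (void)func;
    (void)al;
    return true;
}

/* Pack the request: interrupt number in the high word, function in the low. */
static uint32_t make_request(uint32_t intr, uint32_t func)
{
    return (intr << 16) | (func & 0xffffu);
}

/* Stand-in for IFSMgr_SetReqHook: returns the previous preamble, or NULL
   on failure. Only Int 21h requests are accepted. */
static preamble_fn set_req_hook(uint32_t request, preamble_fn fn)
{
    uint32_t intr = request >> 16, func = request & 0xffffu;
    if (intr != 0x21u || func > 0xffu || fn == NULL)
        return NULL;
    preamble_fn prev = g_preambles[func] ? g_preambles[func] : default_reject;
    g_preambles[func] = fn;
    return prev;
}

/* Run whatever preamble heads the chain for this function number. */
static bool call_preamble(uint32_t func, uint32_t al)
{
    preamble_fn fn = g_preambles[func & 0xffu];
    return fn ? fn(func, al) : default_reject(func, al);
}

/* Example hook: accept subfunction 01h, otherwise chain to the previous one. */
static preamble_fn g_prev_preamble = default_reject;

static bool sample_preamble(uint32_t func, uint32_t al)
{
    if (al == 0x01u)
        return false;                    /* carry clear: accept */
    return g_prev_preamble(func, al);    /* reject by chaining  */
}
```

Installing the hook with g_prev_preamble = set_req_hook(make_request(0x21, 0xea), sample_preamble) makes function EAh with AL=01h come back accepted, while any other AL value falls through to the default and comes back rejected.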
Table 5-1 enumerates the default preamble functions which IFSMgr uses to
initialize Lower72_Preambles. Functions 44h and 71h also have subtables indexed
by the subfunction number in the AL register. These preamble functions are
entered as 44xxh and 71xxh. The 71xxh series functions (except 71a0h-71aah)
are remapped by the preamble into their non-long filename equivalent functions
but with the LFN flag set (bit 30 of the ECX register). Functions 71a0h through
Interrupt 2Ih Handlers 85
71aah are mapped to a different set of functions, but these also have the LFN flag
set.
The functions which do not appear in Table 5-1 are not accepted by IFSMgr.
with Windows for Workgroups 3.11 and Windows 95. The experiments were
performed with a simple DOS application, TEST21, which hooks Int 21h using
DOS function 25h, set interrupt vector. TEST21 issues a sequence of Int 21h
functions and tabulates a count of received Int 21h requests. It then compares the
sent versus received counts for each function number.
When TEST21 is executed at the DOS prompt (outside of Windows), the sent and
received counts are equal. However, if TEST21 is executed in a Windows for
Workgroups 3.11 DOS box, the only Int 21h request which is received is the
function 25h request; the other calls, functions 3Dh, 3Fh, 40h, and 3Eh, are handled by
IFSMgr without being reflected to DOS. When the same test is performed in a
Windows 95 DOS box, all of the Int 21h requests are received by TEST21.
TEST21 falls into the category of a "local hooker" since it is executed in a DOS
box (VM) after Windows is running. The reflection of file I/O Int 21h requests to
By using the HOOKER21 TSR, which is on the companion diskette, you can
confirm this behavior for yourself. HOOKER21 is a minimal TSR that calls set
interrupt vector to establish a new Int 21h handler that does nothing except chain to
the previous handler. If this TSR is placed in a winstart.bat file in the \windows
directory, it will be executed in the context of the System VM after IFSMgr has
completed Device Init. Thus IFSMgr detects the re-vectoring of Int 21h and flags
the System VM for Int 21h reflection.
To see this, perform a "before-and-after" test. Run MultiMon with the monitors
"VWIN32 Int 21" and "V86 Int 21 (post-IFSMgr)" enabled. Generate some file
activity by using the right mouse button to create a shortcut on the desktop. Most
of the Int 21h requests which originate in VWIN32 do not make it as far as the
V86 Int 21h handler. Now create a \windows\winstart.bat file that loads
hooker21.exe and restart the system. Repeat the MultiMon test and generate
some file activity. The MultiMon trace will now show a matching V86 Int 21h
request for each VWIN32 Int 21h request (at least for the file I/O functions).
We can see why this is happening if we examine the code for the function 25h
preamble in Example 5-3. First we see that this preamble is only interested in
changes to the Int 21h vector and only if they originate in V86 mode. If the client
making the request is executing in protected mode or if the vector being set is not
for Int 21h, the preamble returns immediately. Next, the preamble determines
whether the vector being set is the original vector (whose linear address was
stored in LinV86I21Vec during Device Init) or a new vector.
The vector argument in DS:DX is converted to a linear address for comparison
with LinV86I21Vec, and execution continues at the label ResVect or SetVect,
depending on the outcome.
The flags which track Int 21h reflection are found in three different locations.
First, there is the pv_flags member of the VM's pervm structure. Next, there are
HookerFlags and HookedVMs variables which reside in global IFSMgr memory.
Finally, there are flags in the DOS device driver, ifshlp.sys. These flags are
referenced as offsets from lin_IFSHLP_data, the linear address of a shared data area in
ifshlp.sys.
The key flag is LOCALINT21HOOKER of pv_flags in the VM's pervm data
structure. If it is getting cleared by the restoration of the Int 21h vector or if it is getting
set because a new vector is installed, then all of the other flags also are updated.
Setting the Int 21h vector multiple times in a VM has no effect on the flags after
the first change.
If a VM has a local hooker installed, it will appear before ifshlp.sys in the IVT
chain. There may also be other global hookers installed via autoexec.bat or
config.sys that appear in the IVT chain before ifshlp.sys and any local hooker.
If a request gets routed all the way down to ifshlp.sys, what happens to it? Does it
keep going and end up being serviced by MS-DOS? To answer these questions
we'll need to look at the disassembly of the Int 21h handler in ifshlp.sys, shown in
Example 5-4.
This handler routes requests in two possible directions. If line 4E3 is reached, the
request is being sent down the interrupt chain to the next "real-mode" handler
and may end up being serviced by MS-DOS. If line 4EE is reached, the jmp
transfers control to a V86 callback which re-enters IFSMgr.
This 16-bit code bears some resemblance to the PM_Int21_Chain and
V86_Int21_Chain handlers shown in Examples 5-1 and 5-2. The flags variable resides in
global memory and is modified by all VMs. Bit 1 signifies that IFSHLP has been
initialized by a call from IFSMgr, bit 7 is equivalent to the LOCALINT21 bit of
HookerFlags, and bit 6 is equivalent to the UNUSEDFLAG bit of HookerFlags, as
used in IFSMgr. The other variable tested here is perVM_flags. It lies in a region
of IFSHLP which is instanced, i.e., which has a private copy mapped into each
VM's address space.
The V86 callback to IFSMgr is called if, at least, the following conditions are met:
• Bit 1 is set in the flags variable, indicating IFSHLP has been initialized by
IFSMgr;
• Bit 7 is set in the flags variable, indicating that some VM has a local Int 21h
hooker;
• Bit 0 is set in the perVM_flags variable, indicating that the current VM has a
local Int 21h hooker;
• The function number is 0Bh or greater but less than 72h.
The callback may also get called if a preamble returns with carry clear. Preambles
may be called on the following Int 21h functions: 0Bh, 0Dh, 0Eh, 3Eh, 3Fh, 40h,
41h, 42h, 47h, 57h, 5Ch, 5Dh, 5Eh, 5Fh, 68h, and 71h.
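The four listed conditions can be written as a single predicate. The sketch below models only those conditions; it does not cover the preamble path just mentioned, and the constant and function names are assumptions made for illustration rather than names taken from ifshlp.sys.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit assignments as described in the text; the names are invented. */
#define IFSHLP_INITIALIZED  0x02u  /* bit 1 of the flags variable      */
#define IFSHLP_LOCAL_HOOKER 0x80u  /* bit 7 of the flags variable      */
#define PERVM_LOCAL_HOOKER  0x01u  /* bit 0 of the per-VM flags        */

/* True when all four listed conditions hold for this request. */
static bool takes_v86_callback(uint8_t flags, uint8_t per_vm_flags,
                               uint8_t func)
{
    return (flags & IFSHLP_INITIALIZED) &&
           (flags & IFSHLP_LOCAL_HOOKER) &&
           (per_vm_flags & PERVM_LOCAL_HOOKER) &&
           func >= 0x0B && func < 0x72;
}
```

For example, an open request (3Dh) in a VM with a local hooker passes the test, while the same request in a VM whose per-VM bit is clear does not.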
One question still remains unanswered: who sets and clears these IFSHLP
variables? We can find the answer back in Example 5-3 in the code for Preamble_25.
IFSMgr stores away a linear address (in lin_IFSHLP_data) which points to offset
0024h in IFSHLP, the start of the shared data area. Preamble_25 loads EDI with
lin_IFSHLP_data and then uses EDI to reference bytes at offsets 11h and 12eh. If
you add 24h to these offsets, you get the addresses of the flags and perVM_flags
variables in IFSHLP.
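The arithmetic is simple enough to check directly. In this sketch the helper name is invented, and it is assumed that the two offsets correspond to the two variables in the order given.

```c
#include <stdint.h>

#define IFSHLP_SHARED_DATA_OFS 0x0024u  /* start of the shared data area */

/* Convert an offset relative to lin_IFSHLP_data into an offset from
   the start of IFSHLP. */
static uint32_t ifshlp_offset(uint32_t shared_ofs)
{
    return IFSHLP_SHARED_DATA_OFS + shared_ofs;
}
```

Offset 11h becomes 35h and offset 12eh becomes 152h within IFSHLP.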
Before we move on, let's recap. Several flags are maintained at a global scope
and at a per-VM scope, to determine whether to reflect an Int 21h request
downward towards MS-DOS land. IFSHLP is positioned along this downward path so
that it can snatch up these requests and redirect them back to IFSMgr just before
they drop into MS-DOS. For more details on how IFSMgr and IFSHLP exchange
data, see the sidebar "The IFSHLP/IFSMgr Connection."
This excursion into IFSHLP and its role in Int 21h reflection has uncovered a
"back door" into IFSMgr: that of the V86 callback. The ring-0 code for this
callback is shown in Example 5-5. This routine's first order of business is to clean up
the client stack. It does this by simulating a POP BX and then an IRET. Before BX
is restored, the value of BX in the client registers is loaded into ECX to use as the
function number. Except for the check for a special function value, BDh, which is
vectored to the Int 17h handler, this code closely follows that of V86_Int_Chain.
There is one small difference in the arguments to preamble functions: EDX has
the value 2; when preambles are called from PM_Int21_Chain and V86_Int_Chain,
EDX is either 0 or 1. If the preamble function rejects the request, or if IFSMgr fails
the call, then the request is channeled back down the "real-mode" interrupt chain.
The address of the previous Int 21h handler is loaded from IFSHLP's shared data
area. This address is passed to Build_Int_Stack_Frame to make it the new CS:EIP
after the client registers Client_CS, Client_EIP, and Client_Flags are pushed on
the client stack. When the callback returns, execution resumes in the VM at this
previous handler. Note that if the callback services the request, CS:EIP is set to the
instruction following the Int 21h call since the request has been completed.
The handler for interrupt 2Fh function 05h, shown in Example 5-6, is quite
simple. If AL is zero, the call is an installation check and AL is returned as 0FFh
Interrupt 2Fh Handler 93
Example 5-6. V86 Interrupt 2Fh Function 05h Handler at IFSMGR(3)+1130
Int2f_05xx_Handler Proc Near
1130 mov edx, dword ptr [ebp].Client_EAX
1133 test dl, dl
1135 jnz short L_113C
1137 mov byte ptr [ebp].Client_AL, 0ffh
113b retn
L_113C:
113c mov ecx, 0d2h
1141 push edx
1142 call Dispatch_V86
1147 pop edx
1148 test byte ptr [ebp].Client_Flags, 01
114c jnz short L_114F
114e retn
L_114F:
114f mov dword ptr [ebp].Client_EAX, edx
1152 stc
1153 retn
The handler for the Network Redirector functions (11xxh) is more complicated.
The disassembly for this routine is shown in Example 5-7. For each minor
function number (in client AL), a table in IFSHLP is consulted to see if it is supported.
The linear address for the table is at lin_IFSHLP_data + 2eh. This table lies in the
instanced portion of the IFSHLP data area, so the address in the current VM's
address space is found by adding [EBX].CB_High_Linear, where EBX is the current
VM handle. This table is indexed by the minor function number. If the high-order
bit of the byte at the indexed location is set, then a function in the array
Table_2f11 is called. Otherwise, the previously installed V86 Int 2Fh handler will get
control.
The code which determines the index into Table_2f11 is a little tricky. If the
minor function number is less than 80h, then the comparison at line 111c will set
the carry flag. The instruction at line 111f then complements the carry flag,
thereby clearing it, so that the subtract with borrow at line 1120 makes EDX zero.
The net effect is that Table_2f11 is indexed by (function*4). However, if the minor
function number is 80h or greater, then the comparison at line 111c will clear the
carry flag. Complementing the carry flag then sets it so that the subtract with
borrow leaves EDX equal to ffffffffh. The subsequent AND with fffffec8h sets EDX
to that value. This is equivalent to c8h-(80h*4). The net effect is that minor
functions 80h or greater index a section of Table_2f11 starting at offset c8h.
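The carry-flag trick can be reproduced in portable C to confirm the two cases. This is a sketch of the arithmetic only: the function name is invented, and it is assumed that the resulting EDX value is simply added to function*4 with 32-bit wraparound.

```c
#include <stdint.h>

/* Mirrors the compare / complement-carry / subtract-with-borrow / AND
   sequence and returns the byte offset into the table. */
static uint32_t table_2f11_offset(uint8_t minor)
{
    uint32_t carry = (minor < 0x80) ? 1u : 0u;  /* compare against 80h     */
    carry ^= 1u;                                /* complement carry        */
    uint32_t edx = 0u - carry;                  /* sbb edx, edx: 0 or ~0   */
    edx &= 0xfffffec8u;                         /* AND with fffffec8h      */
    return (uint32_t)minor * 4u + edx;          /* wraps modulo 2^32       */
}
```

Minor function 23h maps to offset 8ch (23h*4), while 80h maps to c8h and 81h to cch, as the text describes.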
Table 5-2 summarizes the functions for which handlers are installed by IFSMgr.
Most of the functions are mapped to a different function number and then sent to
Dispatch_V86.
Note that in MS-DOS the Network Redirector functions are called by DOS. The
functions which are enumerated here are not called internally. For more
information on the Network Redirector, see Chapter 8 of Undocumented DOS by Andrew
Schulman et al.
mode code is shown here in Example 5-8. On entry, AL contains the drive
number on which the read or write is to be performed. If the drive number is
validated, the request is sent to the dispatch point as function DDh for Int 25h or
function DEh for Int 26h. After the request is dispatched and returns, the client
flags are pushed onto the client stack. This is done to simulate the "quirky"
behavior of these software interrupts.
Example 5-8. Protected Mode Int 25h/26h Handler at IFSMGR(3)+162f
PM_Int25_26_Chain Proc Near
162f mov eax, edx
1631 movzx edx, byte ptr [ebp].Client_AL
1635 call ValidateDrive
163a jc short next_pm_int
163c mov ecx, 0ddh
1641 cmp eax, +25
1644 jz short dispatch_int
1646 mov ecx, 0deh
dispatch_int:
164b VMMcall Simulate_Iret
1651 mov eax, dword ptr OfsVMCB
1656 mov edx, 0ffffffffh
165b call dword ptr [ebx+eax+0c]
165f mov eax, dword ptr [ebp].Client_EFlags
1662 VMMcall Simulate_Push
1668 retn
next_pm_int:
1669 mov ecx, dword ptr NextPM25Sel
166f mov edx, dword ptr NextPM25Ofs
1675 cmp eax, +25
1678 jz short L_1686
167a mov ecx, dword ptr NextPM26Sel
1680 mov edx, dword ptr NextPM26Ofs
L_1686:
1686 VMMjmp Simulate_Far_Jmp
Interrupt 17h Handler
The virtual-86 mode Int 17h handler for BIOS printer services would take several
pages if we were to display it all. However, it is relevant to discuss one aspect of
it. This is that even printer services are channeled to the Dispatch_V86 routine.
The function number which they are dispatched under is CCh.
Storing the dispatch address in such a convenient location makes it easy to write
a simple hook for monitoring traffic through the dispatch point. The IFSDSPAT
monitor driver does just that. It hooks the dispatch point in all VMs and displays
each dispatched function and some associated registers. This driver works in
conjunction with MultiMon, so its output is displayed in MultiMon's application
window along with the output from other monitors that are also enabled.
The output in Example 5-9 was generated in response to clicking the right mouse
button on the desktop and selecting "New Folder." These are just the first few
lines; the complete trace spans several pages. The lines of output that we see
here are from three different monitors:
• w21, VWIN32's Int 21h dispatcher (WIN32CB)
• p21, protect-mode Int 21h hook before IFSMgr (I21HELP1)
• dsp, hook at IFSMgr's dispatch point (IFSDSPAT)
For each Int 21h function, two or three lines are displayed. If the interrupt request
originated in VWIN32, then the trace begins with the Win32 callback shown as a
w21 line. VWIN32's interrupt dispatcher then generates a protected-mode nested
IFSMGR's Common Dispatch Routine 97
execution of the interrupt which produces the p21 line. If the interrupt request is
handled by IFSMgr, then it gets sent to the dispatch point and we get a dsp line.
The Func value shown on each dsp line is the function number. We see that this
is usually the same as the Int 21h function number. The get file attributes
function, 7143h, is mapped to function 43h with the long filename flag set in the high
order byte giving us 40000043h. We also see this apply to the make directory
function, 7139h. Something different is happening with the last function call in the
trace. Here, 71A4h becomes 400000e1h when it is dispatched. In this case, there
is no standard implementation of function A4h so it is mapped to an available
number above 71h, which happens to be E1h. In fact we have been seeing this
kind of mapping in the handlers for interrupts 2Fh, 25h, 26h, and 17h.
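The values in the trace can be reproduced by combining the long filename flag with the dispatched function number. The sketch below assumes the flag is bit 30 of ECX, as stated earlier for the 71xxh preambles; the macro and function names are invented for illustration.

```c
#include <stdint.h>

#define LFN_FLAG 0x40000000u  /* bit 30 of ECX */

/* Build the ECX value seen at the dispatch point from a dispatched
   function number and a long-filename indication. */
static uint32_t dispatch_code(uint8_t func, int long_filename)
{
    return (long_filename ? LFN_FLAG : 0u) | func;
}
```

So 7143h is dispatched as 40000043h, and 71A4h, once remapped to E1h, as 400000e1h.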
Here is a more formal description of the calling convention for the dispatch point:
ECX
The dispatched function number in the low byte, the high byte consists of
several flag bits
EBX
The current VM handle
EAX
The offset to IFSMgr's pervm data structure for the VM
EBP
Pointer to the client register structure
ESI
The provider ID (usually -1 for ANYPROID)
EDX
? (may be function specific)
EDI
? (may be function specific)
The file API which IFSMgr exports to other VxDs, IFSMgr_Ring0_FileIO, is also a
thin veneer around a call to the dispatch point. Unfortunately, the dispatch
routine is called directly and not through the entry in IFSMgr's area of the VM
control block. So our hook doesn't show these calls.
The first involves determining the offset of IFSMgr's VM control block area, and
the other is how to track the dispatch function for each VM separately.
To get IFSMgr's VMCB offset, I used a direct approach: just load before IFSMgr,
hook the _Allocate_Device_CB_Area service, and watch for IFSMgr's call. The
code for this is shown in Example 5-10. This function has a special header in
order to support Unhook_Device_Service; HOOK_PREAMBLE is the macro which
achieves this. At the center of the code is the indirect call to pPrevAllocDevCB, a
variable which holds the previous service address when Hook_Device_Service
returns. The key to knowing which VxD has made the call is to look at the return
address on the stack. This address is passed to _GetVxDName to let it do the
grunge work of figuring out which device that address belongs to. For instance, if
IFSMgr is making the call, the string returned might be "IFSMGR(2)+c01234567".
The intrinsic function memcmp() then compares the first 6 characters returned
against "IFSMGR". If we get a match, then we've got what we're after and store
the returned offset in the global variable OfsIfsVMCB. Since our hook has served
its purpose, we unhook it before returning; that way it won't get called again.
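The name test at the heart of this hook can be shown in isolation. This is a user-mode sketch, not the driver code: the function name is invented, and a length check is added so that memcmp() never reads past a short string.

```c
#include <string.h>

/* Returns nonzero when a name of the form "IFSMGR(2)+..." identifies
   IFSMgr as the caller. */
static int is_ifsmgr(const char *vxd_name)
{
    static const char tag[] = "IFSMGR";
    size_t n = sizeof tag - 1;               /* the first 6 characters */
    return strlen(vxd_name) >= n && memcmp(vxd_name, tag, n) == 0;
}
```

Only the device name prefix is examined, so the object number and offset that follow it in the string do not affect the match.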
_asm popad
_asm mov eax, dwOfs
_asm mov esp, ebp
The second problem I needed to address was how to keep track of each VM's
dispatch function address so that if MultiMon shuts down, the original dispatch
function can be restored on a per-VM basis. Currently, all VMs use the same
dispatch function but IFSMgr's design allows multiple dispatch addresses, so let's
support that.
This doubleword of storage lies at the address VMHandle+OfsMyVMCB for each
VM. The original dispatch address for a VM is stored in this location before it is
replaced with the dispatch hook function.
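The save-and-replace step can be modeled with a byte array standing in for a VM control block. Everything here is an assumption made for illustration: addresses are plain 32-bit values, the offsets are passed in by the caller, and none of the names come from the MultiMon sources.

```c
#include <stdint.h>
#include <string.h>

/* Read a doubleword at a byte offset inside a simulated control block. */
static uint32_t cb_read(const uint8_t *vm, uint32_t ofs)
{
    uint32_t v;
    memcpy(&v, vm + ofs, sizeof v);
    return v;
}

/* Write a doubleword at a byte offset inside a simulated control block. */
static void cb_write(uint8_t *vm, uint32_t ofs, uint32_t v)
{
    memcpy(vm + ofs, &v, sizeof v);
}

/* Save this VM's current dispatch address in our own per-VM slot,
   then install the hook address in its place. */
static void hook_vm(uint8_t *vm, uint32_t ofs_disp, uint32_t ofs_mine,
                    uint32_t hook)
{
    cb_write(vm, ofs_mine, cb_read(vm, ofs_disp));
    cb_write(vm, ofs_disp, hook);
}

/* Restore whatever address this particular VM had before the hook. */
static void unhook_vm(uint8_t *vm, uint32_t ofs_disp, uint32_t ofs_mine)
{
    cb_write(vm, ofs_disp, cb_read(vm, ofs_mine));
}
```

Because each VM keeps its own saved value, unhooking restores the right address even if different VMs had different dispatch functions.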
Dispatching File System Requests
This chapter is going to look at what is our first taste of the real IFS. So far, we
have been hovering about looking at the various ways we arrive at the IFSMgr
and its services, but now we have arrived. The dispatch point is the ultimate IFS
named-pipes also passes through here. At this point, we start utilizing data
structures and file system drivers that are uniquely those of IFSMgr. We are no longer
propping up legacy APIs. However, IFSMgr borrows a lot from DOS and builds
upon it, so we can't claim a clean break with the past.
This dispatch point is just another API of sorts. It is not one that has been
documented in the IFS Specification, although key data structures that are part of it
have been partially documented. Unlike the many interrupt-based APIs we have
been looking at, this new API is based upon a packet or block of data describing
a desired operation. This packet is constructed from a set of input parameters,
one of which is a function number. This function number lies in the range 0 to
MAXIFSFUNC, where MAXIFSFUNC is E7h for the retail release of Windows 95
and EAh for OSR2. The values 0 through MAXDOSFUNC (see Chapter 5, The
"New" MS-DOS File System) overlap with the corresponding DOS function
numbers, although there are large gaps in the coverage, especially for those
functions which are not file-related. Other legacy APIs are also mapped in this
function range; for instance, Int 25h and Int 26h are mapped to functions DDh and
DEh, and Int 17h is mapped to function CCh.
The Dispatch Point
This API is not just a convergence of legacy interrupts into a single linear range of
function numbers; it is more fundamental than that. By moving the function
description into a packet structure, a function request can be more completely
described. It can carry a complete description of the register state and pointers to
important system data structures upon which the command depends. Packets can
also be scheduled to execute as an event, providing a mechanism for asynchronous
operations.
Since the packet is such a key part of this new API, we'll start by examining how
these packets are constructed. The dispatch point is where this process begins.
Although the dispatch point is primarily the common entry point for ring-3 file
system requests, there are two ring-0 IFSMgr services which also use it. First, the
service IFSMgr_Ring0_FileIO enters the dispatch point directly using a near call.
On the other hand, IFSMgr_ServerDOSCall enters the dispatch point using an
indirect jump through pv_dispfunc.
The dispatch point routine needs to do several things. It builds an ifsreq packet
and passes it to a function handler. After the function handler returns, it performs
some optional cleanup and other completion handling chores.
Think for a minute about who will be calling this routine. Just about every
component in the system will be executing this code (applications, system services, and
ring-0 clients) on different threads and in different process contexts. Is this
interface going to be synchronous or asynchronous? Will it be re-entrant? If so, how
might these objectives be achieved?
Figure 6-1 portrays the ifsreq packet, showing its members and the groupings
which are initialized by the dispatch point handler. For details on each of the
members, see Appendix C, IFSMgr Data Structures. There are four groupings of
members that are distinguished in Figure 6-1. At the bottom of the ifsreq
packet, storage is set aside for saving the client register structure. On top of the
client register structure is a group of members which are undocumented. These
start with the member ifs_pdb and end with the member ifs_VMHandle. These are
all initialized in the dispatch point handler. Then there is a section which is
initialized to zero, followed by the topmost members of the structure. The topmost
members are documented in the IFS Specification. Of these, members ir_length
through ir_data are initialized by the dispatch point handler.
Figure 6-1. ifsreq Structure
It would be interesting to walk through the dispatcher code, but it would take us
four or five pages just to display it in pseudocode form. Instead, Table 6-1 distills
this routine into a chart of ifsreq members and how each member gets its value
from the execution of the dispatcher code. Although the main purpose of the
dispatcher is to get the ifsreq packet into a good known state before passing it on,
it also performs other chores such as passing CTRL-C down to IFSHLP if CTRL-C
checking is turned on and the VM is not the system VM. It also performs a series
Table 6-2 shows the contents of an ifsreq before and after a file create
operation: creating a shortcut on the Windows 95 desktop. The Int 21h function that is
behind the ultimate dispatch call is 716ch.
In the Return column, several of the ioreq members have different names than
the operation started with in the Entry column. These represent overlays of
different members of a union. For example, ir_aux1 is a union of type aux_t.
The ioreq structure declaration in ifs.h declares this member as:
aux_t ir_aux1; /* secondary user data buffer (CurDTA) */
The ifs.h header file also contains this declaration of the union aux_t:
typedef union {
    ubuffer_t aux_buf;
    unsigned long aux_ul;
Any of these members can be combined with ir_aux1. So if this field happened
to represent an unsigned long volume handle, then it would be referred to as
ir_aux1.aux_ul, or if it represents a table of handle-based functions, it would be
referred to as ir_aux1.aux_hf. ifs.h has gone further and defined macros for some
common union references:
#define ir_volh ir_aux1.aux_ul /* VRP address for Mount */
#define ir_hfunc ir_aux1.aux_hf /* file handle function vector */
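To see how such macros read in practice, here is a cut-down sketch. The types below are simplified stand-ins invented for this example (the real union in ifs.h has more members and different member types); only the two macro definitions follow the text.

```c
/* Hypothetical stand-ins for the ifs.h types. */
typedef struct hndlfunc_sketch { int placeholder; } *hfunc_sketch_t;

typedef union {
    void           *aux_buf;  /* stand-in for ubuffer_t            */
    unsigned long   aux_ul;   /* e.g. a volume handle              */
    hfunc_sketch_t  aux_hf;   /* e.g. a handle-based function table */
} aux_sketch_t;

struct ioreq_sketch {
    aux_sketch_t ir_aux1;
};

/* The macros hide the union member behind a descriptive name. */
#define ir_volh  ir_aux1.aux_ul
#define ir_hfunc ir_aux1.aux_hf

/* Reads the same storage as pir->ir_aux1.aux_ul. */
static unsigned long get_volh(const struct ioreq_sketch *pir)
{
    return pir->ir_volh;
}
```

Because both macros expand to members of the same union, ir_volh and ir_hfunc name the same storage; which one is meaningful depends on the operation that filled it in.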
The ir_hfunc member is one of the more interesting return values on a file create.
It points to a table of functions in the FSD that support read, write, and other
handle-based operations. The results column also contains three different forms of
handles. The member ir_sfn contains the System File Number for the newly
created file. This is the number that backs up a Win32 file object (see Chapter 4,
File System API Mapping). The field ifs_pfh is a pointer to a fhandle structure
which also happens to be used as a ring-0 file handle. And lastly, ir_fh is a file
handle that is private to the FSD.
It is interesting to follow what has happened to the file name that was passed to
the function. Originally, it was a pointer in the client registers, specifically,
Client_ESI, and it pointed to the long filename C:\WINDOWS\Desktop\New
Shortcut.lnk.
On return, four different fields contain some representation of the original file
name: ir_ppath, ir_data, ir_upath, and ifs_pbuffer. Now, ir_data just holds the
original pointer to the filename but the other three pointers are different. The
member ir_upath is declared as type string_t, which is unsigned short *, i.e., a
Unicode string. This string is also "unparsed": it is a straight conversion of the
input path to Unicode. The members ir_ppath and ifs_pbuffer, on the other hand,
are of type ParsedPath. A path which is represented by a ParsedPath structure
is called a canonicalized path. Here is the declaration for the ParsedPath type:
struct ParsedPath {
    unsigned short pp_totalLength;
    unsigned short pp_prefixLength;
    struct PathElement pp_elements[1];
};
The member pp_totalLength gives the total length of the pathname including the
size of the ParsedPath structure (4 bytes). The member pp_prefixLength gives
the offset of the last path element in the pathname relative to the start of the
ParsedPath structure. These members are followed by zero or more
PathElement structures. A PathElement structure has this declaration:
struct PathElement {
    unsigned short pe_length;
    unsigned short pe_unichars[1];
};
The member pe_length gives the length in bytes of pe_unichars, including its null
termination. The member pe_unichars contains the zero or more Unicode
characters that make up the path element string. The PathElements in a pathname are
delimited by the path separator character ("\" or "/") but the separator character is
removed from the extracted Unicode string.
An example will make this much clearer. Here is the ParsedPath representation
for our "New Shortcut":
0046h 0024h
0010h "WINDOWS"
0010h "DESKTOP"
0022h "NEW SHORTCUT.LNK"
In this example, the total length of the path, 46h, is equal to the sum of the
lengths of the PathElements (10h+10h+22h) plus the length of the ParsedPath
structure (4). We also see that pp_prefixLength, which has a value of 24h, gives us
the offset to the filename portion of the path. Note that all elements are converted
to uppercase and the strings are in Unicode. These canonicalized paths are
always relative to the root of the volume, and a volume designator is not part of
the path description. For instance, a root path can be represented by a
ParsedPath structure containing a pp_totalLength of 4 and a pp_prefixLength of 4.
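The length arithmetic can be verified with a short routine that walks a root-relative path and accumulates the two header fields. This is a sketch under stated assumptions, not IFSMgr_ParsePath: the names are invented, the input is taken to be already uppercased with the drive letter removed, and each element is assumed to occupy 2 bytes plus 2 bytes per character, which is what the example figures imply.

```c
#include <stddef.h>
#include <string.h>

struct pp_lengths {
    unsigned short total;   /* would be stored in pp_totalLength  */
    unsigned short prefix;  /* would be stored in pp_prefixLength */
};

/* Compute both ParsedPath length fields for a path whose elements are
   separated by '\\' or '/'. */
static struct pp_lengths parsed_path_lengths(const char *path)
{
    struct pp_lengths r = { 4, 4 };           /* header only: the root   */
    while (*path != '\0') {
        size_t n = strcspn(path, "\\/");      /* characters in element   */
        if (n > 0) {
            r.prefix = r.total;               /* offset of newest element */
            r.total = (unsigned short)(r.total + 2 + 2 * n);
        }
        path += n;
        if (*path != '\0')
            path++;                           /* skip the separator      */
    }
    return r;
}
```

Feeding it WINDOWS\DESKTOP\NEW SHORTCUT.LNK gives a total of 46h and a prefix of 24h, and an empty path gives 4 and 4, matching the values quoted above.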
There is a lot more information that we could extract from Table 6-2, but it will
make more sense once we have better grounding in the IFSMgr's internal data
structures.
Dispatch Functions
The dispatch function table contains functions for handling each command type,
as shown in Tables 6-3 through 6-5. For instance, the command 6Ch can come in
several forms. If it is function 6Ch using a short filename, then the LFN command
flag will be cleared. However, if it was called using function 716Ch, then the LFN
bit will be set. Or, it may have been invoked in response to an
IFSMgr_Ring0_FileIO service and the command LFN and IFSMgr_Ring0_FileIO flags will be set.
Yet another variation in command flags would be seen if the call was made via
IFSMgr_ServerDOSCall. Although several different calling methods could be used,
the same dispatch function will service all of these requests for function 6Ch.
Tables 6-3, 6-4, and 6-5 enumerate the functions in the dispatch function table.
Each known function has been given a descriptive name in these tables. These
are simply names that I have created for convenience; you will not find them
documented anywhere. If a function number is not represented in the tables but
lies in the range 0 through MAXIFSFUNC, the default handler shown in Example
6-1 is called. This routine does nothing but return with an error code of 1.
However, if a kernel debugger is loaded, a breakpoint will occur at the int 3
instruction. The contents of ECX, EAX, EDX, and EBX will indicate which
command was attempted and where it originated. In reality, this function should
not get called; the preamble routines should weed out any unsupported functions.
Table 6-3 consists entirely of Int 21h functions with the table function number
corresponding to the Int 21h function. In Table 6-4, many of the functions are
handlers for the Int 2Fh function 11xxh interface. Where this is the case, the func
To get a feel for how a dispatcher function is implemented, we'll take a look at
the pseudocode for dGetVolInfo, one of the shorter functions (see Example 6-2).
The Programmer's Guide to Microsoft Windows 95 describes the input and output
parameters for this function in the section "Interrupt 21h Function 71A0h Get
Volume Information." There is essentially one input, the root path of the volume
for which information is requested. This string takes the form "C:\". Upon arrival
at dGetVolInfo, the pointer to the rootname, which was originally in DS:DX or
retc = _PathToShRes(pifs, 0);
if (!retc) {
    if (pifs->ifs_drv == 2 &&              // drive B
        (DriveAttribs[1] & 0x08) &&        // single drive system
        !(DriveAttribs[1] & 0x80) &&
        pifs->ifs_VMHandle == hvmSystem) {
        pifs->ifs_ir.ir_error = ERROR_INVALID_DRIVE;
        return;
return;
}
if (retc == 0xffffffc0) {
    if (IsPhysicalDrive(pifs->ifs_drv)) {
        pifs->ifs_crs.Client_CX = 0x000c;  // Max fn len
        pifs->ifs_crs.Client_BX = 0x8000;  // FS flags
        pifs->ifs_crs.Client_DX = 0x0050;  // Max path len
        pifs->ifs_ir.ir_error = 0;
    }
    else pifs->ifs_ir.ir_error = ERROR_INVALID_DRIVE;
EDX, is now stored in the ifsreq member ir_data. Other members of ifsreq
are filled in as outlined in Table 6-1.
The dispatcher function wants to pass the request to a file system driver,
specifically the driver's FS_QueryResourceInfo routine which is designed to return its
"Volume Information." To do this, it has to find which FSD handles the requested
volume. The call to _PathToShRes (my name) achieves this by processing the
ifsreq packet. It relies upon the service IFSMgr_ParsePath to convert the path in
member ir_data into a ParsedPath with a pointer to it left in ir_ppath (and
ifs_pbuffer) on return. This service also fills in ir_uFName (ir_aux2), ir_upath
(ir_aux3), and, most importantly, ifs_psr. This last member is important because a
ParsedPath only contains the path components and not the drive letter. The
ifs_psr member is a pointer to an IFSMgr shell resource; it describes the volume to
which the ParsedPath refers. When IFSMgr_ParsePath returns, _PathToShRes
does some additional processing and also fills in the ir_rh member. This is a
resource handle for the volume; a handle which the FSD returned when the
volume was initially mounted.
Once the ifsreq packet is primed with this information, we know how to call
the FSD. Before doing so, there are a few more parameters which need to be set
up: ir_options is set to 2 for a level 2 request, ir_data is now pointed at the buffer
which will hold the file system name on return, ir_length contains the length of
this buffer, and ir_pos is set to 0. The ifsreq structure is now ready for a
FS_QueryResourceInfo call (for a description of the calling parameters see the DDK's
IFS Specification).
This brings us to the Call_FSD function. The first argument to this function is
key: it is the address of the FSD function to be called. How does it know which
FSD and which function? By using ifs_psr. This pointer to the shell resource
gives us access to a function "exported" by the FSD. The shell resource's member
sr_func is a pointer to a volfunc structure, which is an array of all of the
volume-based entry points in the FSD. This structure is defined in ifs.h along
with manifest constants for each function. In our case, we need VFN_QUERY, which
corresponds to FS_QueryResourceInfo. The pir argument to Call_FSD will be
passed as an argument on the call to the FSD function.
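The lookup path from the request packet to a volume-based entry point can be
modeled in a few lines of C. This is only a sketch under stated assumptions: the
mock_ types are simplified stand-ins rather than the definitions in ifs.h, the
index value used for VFN_QUERY is assumed, and stub_query is a hypothetical
routine standing in for an FSD's FS_QueryResourceInfo.

```c
#include <assert.h>
#include <stddef.h>

/* All types below are simplified stand-ins, not the DDK definitions. */
typedef struct mock_ioreq { int ir_options; int ir_error; } mock_ioreq;
typedef int (*mock_pIFSFunc)(mock_ioreq *pir);

enum { MOCK_VFN_QUERY = 8, MOCK_NUM_VOLFUNC = 15 };   /* assumed values */

typedef struct mock_volfunc {
    int vfn_version;
    int vfn_revision;
    int vfn_size;                              /* number of table entries */
    mock_pIFSFunc vfn_func[MOCK_NUM_VOLFUNC];
} mock_volfunc;

typedef struct mock_shres { mock_volfunc *sr_func; } mock_shres;
typedef struct mock_ifsreq { mock_ioreq ifs_ir; mock_shres *ifs_psr; } mock_ifsreq;

/* Hypothetical FSD routine standing in for FS_QueryResourceInfo. */
static int stub_query(mock_ioreq *pir)
{
    pir->ir_error = 0;
    return pir->ir_options;        /* echo the query level back */
}

/* Follow ifs_psr -> sr_func -> vfn_func[index]; NULL if out of range. */
static mock_pIFSFunc find_volume_func(const mock_ifsreq *pifs, int index)
{
    const mock_volfunc *tbl = pifs->ifs_psr->sr_func;
    if (index < 0 || index >= tbl->vfn_size) return NULL;
    return tbl->vfn_func[index];
}

/* Builds a one-volume model and issues a level 2 query through the table. */
static int demo_volume_query(void)
{
    static mock_volfunc tbl = { 0, 0, MOCK_NUM_VOLFUNC, { 0 } };
    mock_shres sr = { &tbl };
    mock_ifsreq req = { { 2, -1 }, &sr };
    mock_pIFSFunc fn;

    tbl.vfn_func[MOCK_VFN_QUERY] = stub_query;
    fn = find_volume_func(&req, MOCK_VFN_QUERY);
    assert(fn != NULL);
    return fn(&req.ifs_ir);        /* the FSD sees only the request packet */
}
```

The point of the model is that nothing but ifs_psr is needed to reach the FSD:
the shell resource carries the function table, and the table index selects the
routine.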
restored before the Int 21h request ultimately returns. By changing this image we
are assured that the caller will see the returned values.
From this example we have seen that volume-based FSD functions are found in a
shell resource structure for a given local or remote drive. There are also
handle-based FSD functions which are found in the fhandle structure
corresponding to the file's SFN. So, just as the ifsreq member ifs_psr is
required for volume-based FSD function calls, ifs_pfh is required for
handle-based FSD function calls. Detailed descriptions of fhandle structures and
shell resources are given in Appendix C. In the next two sections we will
examine these key file system structures in more detail.
Descriptions of the members of the shell resource are given in Appendix C. For
our purposes now, we are interested in the sr_func->vfn_func and sr_rh entries.
When a file system driver registers with IFSMgr during the Device Init stage,
the address of the FS_MountVolume function provided by the FSD is supplied. When
the first access is made to this volume, the FS_MountVolume function is called
to mount the volume. This establishes its table of volume-based functions, and
the FSD returns a unique handle, sr_rh, which is then passed to the FSD on
future calls. This handle is not interpreted by IFSMgr, so the FSD is free to
use the address of a data structure or any other unique value to identify a
volume.
The contents of the FSD's volume-based function table are shown in Table 6-6. At
the head of the table, version and revision are given first, followed by the
table size, and then the actual function entries (this structure is defined in
ifs.h). The
1 14 Chapter 6: Dispatching File System Requests
SysVolTable
corresponding FS_ function name for each table entry is also shown. These are
the functions which are described in the IFS Specification.
SFNBuckets
Several data structures are used to represent a system file number. Initially a
single SFNBucket is allocated; it is a pointer that references a block of
storage able to hold 256 files. As more handles are required, additional
SFNBuckets are allocated by IFSMgr. The maximum number of SFNBuckets that can be
accommodated is 254, so the file system has a capacity for 254 x 256 = 65024
files.
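The capacity figure is simple arithmetic, and it can be checked with a short
sketch. The constant names and the buckets_needed helper are hypothetical; they
only restate the numbers quoted above and say nothing about how IFSMgr actually
lays out or indexes its buckets.

```c
#include <assert.h>

/* Figures quoted in the text; the names are invented for this sketch. */
enum { FILES_PER_BUCKET = 256, MAX_BUCKETS = 254 };

/* Total number of files addressable when every bucket is allocated. */
static unsigned sfn_capacity(void)
{
    return FILES_PER_BUCKET * MAX_BUCKETS;
}

/* Buckets that must exist to hold open_files handles, growing on demand. */
static unsigned buckets_needed(unsigned open_files)
{
    assert(open_files <= sfn_capacity());
    return (open_files + FILES_PER_BUCKET - 1) / FILES_PER_BUCKET;
}
```

With these figures, a system holding 257 open files would need a second bucket,
and the 254th bucket is exhausted at 65024 files.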
The contents of the FSD's handle-based function table are shown in Table 6-7. At
the head of the table, version and revision are given first, followed by the table
size and then the actual function entries (this structure is defined in ifs.h). The
corresponding FS_ function name for each table entry is also shown. Note that the
functions FS_ReadFile and FS_WriteFile correspond to the members hf_read and
hf_write and are not included in the table pointed to by fh_hf.hf_misc.
ESI is a pointer to an ifsreq packet, pifs, and [ESI+70] references its member
ifs_psr, the shell resource. EAX is assigned the address of the shres structure,
so [EAX+0c] references its member, sr_func, the volfunc structure. Finally, the
function at offset 24h in the structure is pushed on the stack as an argument.
The volume-based call was straightforward. Now let's take a look at a
handle-based call from the dispatch handler, dByHandleInfo. Here is the call
into Call_FSD as it appears in assembly language:
push 00
push esi
push +11
mov eax, dword ptr [esi+74]
mov eax, dword ptr [eax+08]
push dword ptr [eax+20]
call Call_FSD
add esp, +10
ESI is a pointer to an ifsreq packet, pifs, and [ESI+74] references its member
ifs_pfh, the fhandle structure. EAX is assigned the address of the fhandle
structure, so [EAX+08] references its member, fh_hf->hf_misc, the handle-based
function table. Finally, the function at offset 20h in hf_misc is pushed on the
stack as an argument. This corresponds to fh_hf->hf_misc.hm_func[HM_ENUMHANDLE].
In C the function call would look like this:
Call_FSD(pifs->ifs_pfh->fh_hf->hf_misc.hm_func[HM_ENUMHANDLE],
         IFSFN_ENUMHANDLE, pifs, FALSE);
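The offset 20h can be cross-checked against the shape of the handle-based
table. The sketch below assumes a 4-byte table header (a 16-bit version, an
8-bit revision, and an 8-bit size), 32-bit function pointers, and an index of 7
for HM_ENUMHANDLE; mock_hndlmisc is a model built on those assumptions, not the
definition from ifs.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MOCK_HM_ENUMHANDLE = 7, MOCK_NUM_HNDLMISC = 8 };   /* assumed values */

/* 32-bit model: function pointers are stored as uint32_t so the layout
   matches a 32-bit VxD even when this is compiled on a 64-bit host. */
typedef struct mock_hndlmisc {
    uint16_t hm_version;
    uint8_t  hm_revision;
    uint8_t  hm_size;
    uint32_t hm_func[MOCK_NUM_HNDLMISC];
} mock_hndlmisc;

/* Byte offset of table entry i from the start of the table. */
static size_t hm_func_offset(int i)
{
    assert(i >= 0 && i < MOCK_NUM_HNDLMISC);
    return offsetof(mock_hndlmisc, hm_func) + (size_t)i * sizeof(uint32_t);
}
```

Under these assumptions the eighth entry sits 4 + 7 x 4 = 32 bytes into the
table, which is the 20h seen in the push instruction.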
From these two examples, we see that the first argument to Call_FSD is the
address of either a volume-based or handle-based FSD function. The other
arguments include a constant which identifies the FSD function, a pointer to the
ifsreq packet, and a Boolean. To gain some further insight into this function,
take a look at its pseudocode in Example 6-3.
Call_FSD is just a wrapper around the call to the FSD function which is passed
as the first argument. Call_FSD decides whether or not to call a file system API
hook rather than making a direct call to the FSD. The Boolean argument bHookLock
plays a role in making this decision. If bHookLock is FALSE, which is the most
common situation, the file system API hook will not be called if the volume
referenced by the ifsreq packet has a lock on it.
int Call_FSD(pIFSFunc FSDFnAdr, int Func, ifsreq* pifs, BOOL bHookLock) {
    fhandle* pfh = pifs->ifs_pfh;
    shres* psr = pifs->ifs_psr;
    DWORD flags, drive, retc;
    BOOL bCallHook = bHookLock;
    flags = psr->sr_flags;
    if (flags & IFSFH_RES_NETWORK) {
        if (Func <= IFSFN_ENUMHANDLE) drive = 0xffffffff;
        else drive = pifs->ifs_drv;
    }
    else drive = psr->sr_uword + 1;
If it is decided that the file system hook will be called, then some additional work
is needed to prepare the arguments to the hook function. Here is a prototype for
this function:
int FileSystemApiHookFunction(pIFSFunc FSDFnAddr, int FunctionNum,
                              int Drive, int ResourceFlags,
                              int CodePage, pioreq pir);
The first argument is simply the address of the FSD function to be called. The
second argument is the function number being called. This is the same as the
second argument to Call_FSD and would be IFSFN_QUERY or IFSFN_ENUMHANDLE in the
examples shown above. There are some special cases, however. If the second
argument to Call_FSD is either IFSFN_CLOSE or IFSFN_READ, it may need to be
translated. For IFSFN_CLOSE, IFSFN_FINDCLOSE or IFSFN_FCNCLOSE may be
substituted if the fhandle indicates it refers to a find or file change handle.
Similarly, IFSFN_READ may be replaced with IFSFN_FINDNEXT or IFSFN_FCNNEXT, if
appropriate.
The drive argument for a local drive is derived from the sr_uword member of the
shell resource. This is a zero-based drive number, so one is added to it. If the
drive is remote, the drive is set to -1 for functions less than
IFSFN_ENUMHANDLE; otherwise the drive number in the ifsreq packet is used. The
ResourceFlags argument is the value of the sr_flags member of the shell resource
ANDed with the mask ALLRES. The CodePage is determined by the corresponding
bits in the ifsreq member ifs_nflags.
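The two derivations described above, the function number substitution and the
drive argument, can be restated as a pair of small C functions. This is a
sketch, not IFSMgr's code: the X_ constants are placeholders whose values should
be checked against ifs.h, handle_kind is a hypothetical way of saying what the
fhandle refers to, and the remote-drive test uses <= as the Call_FSD pseudocode
does.

```c
#include <assert.h>

/* Placeholder constants; the real values live in ifs.h. */
enum { X_IFSFN_READ = 0, X_IFSFN_FINDNEXT = 2, X_IFSFN_FCNNEXT = 3,
       X_IFSFN_CLOSE = 11, X_IFSFN_ENUMHANDLE = 17,
       X_IFSFN_FINDCLOSE = 18, X_IFSFN_FCNCLOSE = 19 };
enum { X_RES_NETWORK = 0x08 };

/* Hypothetical classification of what an fhandle refers to. */
typedef enum { H_FILE, H_FIND, H_FCN } handle_kind;

/* Drive argument: 1-based for local volumes; for remote ones, -1 on
   handle-based calls, otherwise the drive recorded in the request. */
static int hook_drive(unsigned sr_flags, int sr_uword, int func, int ifs_drv)
{
    if (sr_flags & X_RES_NETWORK)
        return (func <= X_IFSFN_ENUMHANDLE) ? -1 : ifs_drv;
    return sr_uword + 1;
}

/* Function number reported to the hook after the find/FCN substitution. */
static int hook_function(int func, handle_kind kind)
{
    if (func == X_IFSFN_CLOSE) {
        if (kind == H_FIND) return X_IFSFN_FINDCLOSE;
        if (kind == H_FCN)  return X_IFSFN_FCNCLOSE;
    } else if (func == X_IFSFN_READ) {
        if (kind == H_FIND) return X_IFSFN_FINDNEXT;
        if (kind == H_FCN)  return X_IFSFN_FCNNEXT;
    }
    return func;   /* every other function number passes through unchanged */
}
```

For example, a close on a find handle is reported to the hook as
IFSFN_FINDCLOSE, while a close on an ordinary file handle stays IFSFN_CLOSE.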
Before each call into the file system hook, the global variable cntHookCalls is
incremented; when the file system hook returns, this count is decremented. If
this variable is zero, there are no calls executing or blocked which were
initiated from the file system hook chain. A related global variable,
claimHookerList, is a synchronization primitive used to control access to the
list of installed file system hooks. When either
IFSMgr_InstallFileSystemApiHook or IFSMgr_RemoveFileSystemApiHook attempts to
modify the hook list, the critical section around the hook list needs to be
claimed. If cntHookCalls is non-zero, then these services block until all
pending hook calls complete. Threads are blocked waiting for this critical
section when claimHookerList is non-zero. The blocked threads are awakened by
the call IFSMgr_WakeUp(&claimHookerList), once cntHookCalls drops to zero.
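The counting discipline can be illustrated with a single-threaded model. This
sketch only shows the increment and decrement bracketing a hook call and the
test an install or remove service would make; it does not model blocking,
claimHookerList, or IFSMgr_WakeUp, and the hook signature is reduced to one
integer for brevity.

```c
#include <assert.h>

static int cntHookCalls = 0;            /* models the global named in the text */

typedef int (*hook_fn)(int arg);        /* simplified hook signature */

/* Bracket a hook call with the increment and decrement described above. */
static int call_hook_counted(hook_fn hook, int arg)
{
    int rc;
    ++cntHookCalls;
    rc = hook(arg);
    --cntHookCalls;
    return rc;
}

/* The hook list may only be modified when no hook call is in flight. */
static int hook_list_can_change(void)
{
    return cntHookCalls == 0;
}

/* Sample hook that records whether a list change would be allowed mid-call. */
static int observed_during_call = -1;

static int sample_hook(int arg)
{
    observed_during_call = hook_list_can_change();
    return arg + 1;
}
```

While sample_hook is running, hook_list_can_change reports 0; once the call
returns and the count drops back to zero it reports 1, which is the moment the
real services would be allowed to proceed.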
FSDs as Providers
The idea of a "provider" stems from the WOSA (Windows Open System Architecture)
concept of an SP and SPI, a service provider and service provider interface.
IFSMgr and its file system drivers are part of the WOSA-SPI layer, and thus are
considered service providers. During the Device Init stage of system
initialization, each FSD registers with IFSMgr using one of the registration
services and thereby establishes its provider ID. There are four types of
providers that an FSD can supply and these have distinct registration functions:
IFSMgr_RegisterMount for local drives, IFSMgr_RegisterNet for remote drives,
IFSMgr_RegisterCFSD for character devices, and IFSMgr_RegisterMailSlot for
mailslots. Each of these registration functions returns a provider ID on
success.
[Figure 6-4]
A c105d094 Sr 04 c0fd4d4c c002ae44 (VDEF(01) + 00000104) 00000002 0000 0000 10 00000000
C c1021e74 Sr 00 00000000 c001fde8 (VFAT(01) + 00000F54) 00000086 0002 0084 10 00000002
D c0fd4d4c Sr 02 c1038950 c001fde8 (VFAT(01) + 00000F54) 0000001a 0003 0018 10 00000002
E c1038950 Sr 01 c1021e74 c001fde8 (VFAT(01) + 00000F54) 0000000c 0004 000a 10 00000002
F c1074718 Sr 05 c105d094 c001fde8 (VFAT(01) + 00000F54) 00000002 0005 0000 10 00000002
G c10cd6e4 Sr 03 c1074718 c01d7554 (CDFS(01) + 00000944) 00000004 0006 0002 10 00000001
K c107f1d4 Sr 06 00000000 c00379d0 (VREDIR(01) + 00004818) 00000004 0000 0002 09 0000000a
Each column in Figure 6-4 corresponds to a member of the shres structure (see
Appendix C for details) with the exception of the Drive and Sr Address columns,
which contain the drive letter and the address of each line's shres structure,
respectively. The shell resource structures are arranged in a singly linked
list; the links are shown in the sr_next column. The lists for local drives and
remote drives are kept separately. The sr_func column contains the address of
this drive's volume-based function table. The address is decomposed into the
FSD's name, segment, and address. The system that this output was produced on
has a floppy drive A which has not had a floppy inserted since system startup.
Until it sees some media inserted, the default FSD is used: VDEF. The other
local drives all use VFAT except for a CD-ROM which is using the CDFS driver. A
connection to \\SERVER\SERVER_C is mapped to drive K and it is represented by
the MSNet redirector VREDIR. Note that each of these FSDs has a unique provider
ID given in the sr_Proid column.
If you run ScanDisk on a volume and at the same time capture output from
sr.exe, you will see results like those in Figure 6-5. You may refresh the SR
display while the ScanDisk operation proceeds by selecting Refresh from the
Operations menu. In this case, ScanDisk is being executed on drive D. The
sr_LockType column shows the type of volume lock currently active, with 0
corresponding to none, 1 to a level 0, 2 to a level 1, 3 to a level 2, etc. It
is interesting that the sr_func column now indicates that IFSMgr owns the volume
function table for this drive; the original function table address is stored in
sr_LockSavFunc. This reflects the fact that IFSMgr takes over the function
tables for drives that are volume locked.
Another Windows utility, fh.exe, displays fhandle structures for currently open
files on a specified volume. Each column in Figure 6-6 corresponds to a member
of the fhandle structure (see Appendix C for details) with the exception of the
sfn, Pathname, and pfh columns, which contain the system file number, the
associated pathname, and the address of each line's fhandle structure,
respectively. You may select a different drive or refresh the FH display by
selecting the corresponding option from the Operations menu.
The first few entries of the list of files open on a system drive (drive C) are
shown in Figure 6-6. The numbers in the sfn column appear to have gaps in the
sequence. In some cases this is because the file was opened as a memory-mapped
file. A memory-mapped file has two handles that refer to it: the initial fhandle
used to open it (fh_sfn) and a duplicate handle used for the memory mapping
(fh_mmsfn).
To retrieve a list of open files on a volume, fh.exe relies upon a dynamically
loaded VxD, filefh.vxd. This virtual driver supports a DeviceIoControl
interface. A list of open files is requested of FILEFH by supplying it with a
volume number and a buffer in which to copy the fhandle structures and
associated file names. FILEFH creates the list by first installing a file system
hook and then requesting a level 1 volume lock on the specified volume. One of
the activities associated with acquiring a level 1 lock is to build a list of
open files on the volume. To do this, the volume locking function (interrupt
21h, function 440dh, subfunction 084ah) calls FS_EnumerateHandle repeatedly to
get the names of all the open handles associated with the volume. As each
FS_EnumerateHandle call comes in, the ifs_pfh and ir_sfn members of the ifsreq
structure are copied. After the FS_EnumerateHandle call completes, the filename
is also copied. When the volume lock function completes, the volume is
immediately unlocked and the file system hook is removed. One advantage of using
a volume lock to get the file list is that it creates a snapshot at one instant
in time.
Monitoring File Activity
IFSMgr provides at least three methods for hooking file system notifications.
The most general technique is to install a file system API hook. This method
allows an application to see much of the ifsreq packet traffic that passes
through to file system drivers. This method can also change the way a request is
handled, and so can serve to override the behavior of a FSD. Another source of
notifications can be tapped by installing a hook (using Hook_Device_Service) on
the service IFSMgr_NetFunction. IFSMgr makes various internal broadcasts through
this function, such as when a drive appears in a system or when a drive goes
away. This service is also called when a "hooked" Int 21h function is called.
Here, the term "hooked" means that a preamble has been installed for an Int 21h
function which is greater than 71h. Some Int 2fh functions also generate events
here. Yet another source of notifications can be received by way of
IFSMgr_ParsePath (or IFSMgr_FSDParsePath) to allow a FSD installed path checking
routine to get a first crack at parsing a path. This path checking routine is
installed with the service IFSMgr_SetPathHook.
A file system API hook is installed using the IFSMgr service
IFSMgr_InstallFileSystemApiHook. Once it is installed, it is not permanent; it
can be removed using the
The File System API Hook 125
Under what conditions is the file system hook called? Generally, any file system
request, either local or remote, will pass through the installed hook function.
The hook will also see activity on any character FSDs, such as LPTn and PRN of
spooler.vxd and PIPESTDX of vcond.vxd. IFSMgr_Ring0_FileIO and
IFSMgr_ServerDOSCall services are also routed through the file system hook.
Having said that, you should be aware of some exceptions. IFSMgr does not
always use Call_FSD as the gateway into file system drivers. For instance, there
are circumstances where FS_MountVolume is called directly using the addresses in
MountVolTable[]. Similarly, FS_ConnectNetResource sometimes is called directly
through ConnectNetTable[].
Even if Call_FSD is used, recall that one argument to that function controls
whether a file hook will be called when a volume lock is taken. So, if a volume
lock is in place you won't see the FSD calls on that volume. Another peculiarity
occurs with the functions that support file change notifications. The
FindFirstChangeNotification call does not go through the file system hook,
although the FindNextChangeNotification and FindCloseChangeNotification
functions do. Although some "change" notification functions do go through the
file system hook, they do not get serviced by a file system driver; rather they
are routed back into IFSMgr.
We saw this function called in the routine Call_FSD in the previous chapter.
The first argument, FSDFnAddr, is simply the address of the function to call in
the FSD. It corresponds to one of the addresses in the volume-based or
handle-based tables (see Table 6-6 and Table 6-7). Most commonly this address
resides in another VxD, although there are cases where this address will reside
in IFSMgr (the change notification functions and the mailslot functions).
The value of the FunctionNum argument tells us which FSD function is being
called. There is a mapping between the set of FunctionNum values and the
entries in the FSD's volume-based and handle-based function tables. Table 7-1,
later in the chapter, shows this relationship. There are two exceptions to this
rule: IFSFN_FCNNEXT and IFSFN_FCNCLOSE do not have FSD functions corresponding
to them. This is because the support for file change notifications is done
entirely within IFSMgr without the participation of FSDs. Still, these functions
are sent down the file system hook before being processed by IFSMgr, and IFSMgr
has an internal handle-based function table which is referenced by the fhandle
structure which FindFirstChangeNotification creates.
FindFirstChangeNotification is not sent down to the file system hook, so there
is no FunctionNum corresponding to it.
The third argument, Drive, is the 1-based volume number to which the function
refers. If the volume resource is a UNC name, this argument has the value -1.
There are situations where Drive can have the value 0. This may happen when
the target resource is a character FSD. In general, you can think of Drive as
corresponding to ir_rh, the resource handle, and ifs_psr, the address of the
shell resource structure.
The fourth argument, ResourceFlags, is a collection of four bits extracted from
the shell resource that indicate whether the resource is a character FSD,
whether it is local, whether it is remote, and whether it is represented by a
UNC name.
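A hook that wants to log these four bits can decode them with a few masks. The
sketch below is an illustration only: the X_RES_ values are the bit assignments
as I recall them from ifs.h and should be treated as assumptions, and the
four-character output format (a letter when the bit is set, an underscore when
it is clear) is invented for this example.

```c
#include <assert.h>
#include <string.h>

/* Bit values as recalled from ifs.h; treat them as assumptions. */
enum { X_RES_UNC = 0x01, X_RES_NETWORK = 0x08,
       X_RES_LOCAL = 0x10, X_RES_CFSD = 0x80 };

/* Writes one character per resource bit into out, in the order
   character FSD, local, network, UNC, and returns out. */
static const char *describe_resource(unsigned flags, char out[5])
{
    out[0] = (flags & X_RES_CFSD)    ? 'c' : '_';
    out[1] = (flags & X_RES_LOCAL)   ? 'l' : '_';
    out[2] = (flags & X_RES_NETWORK) ? 'n' : '_';
    out[3] = (flags & X_RES_UNC)     ? 'u' : '_';
    out[4] = '\0';
    return out;
}
```

A local volume would then print as "_l__" and a remote volume reached through a
UNC name as "__nu".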
The CodePage argument indicates which of the ANSI or OEM code page character
sets should be used with the function. The corresponding manifest constants are
BCS_WANSI and BCS_OEM.
The last argument, pir, is a pointer to the ioreq or ifsreq structure. This is
the only argument passed to the FSD. The other arguments here are provided as a
convenience to the file hook.
So what can a file system hook do when it gets called? Here is what Microsoft
says, in DOS/Win32 Installable File System Specification, p. 70:
The hooker gets control before the FSD is called to perform the function and it
can do anything it wants. Hookers can do one of four things when they get called
on a hooked call:
Ignore the call and chain on to the previous hooker in the hook chain.
Process the call and return directly to the IFS manager.
Change the call or make multiple calls to the FSD directly, and then return to the
IFS manager.
It can call down the chain and do some processing on the way back.
Basically, the hooker has complete control over how it wants to process the call.
From this description it would appear that anything is possible in a hook
function. The documentation does not elaborate on how to go about making
"multiple calls to the FSD directly." It does hint that:
The preferred method for hookers to perform other functions while on a hooked
call is to use the ring-0 APIs. It is usually quite safe to issue a ring-0 API
call while on a file system API hook; the IFS manager is re-entrant.
By "ring-0 API call" one would have to consider that both IFSMgr_Ring0_FileIO
and IFSMgr_ServerDOSCall are fair game. An equally attractive alternative is to
perform a direct call into the FSD without performing re-entrant calls to the
dispatch point. This requires that we use our knowledge of undocumented fields
in the ifsreq structure, namely ifs_psr and ifs_pfh, to access the volume-based
and handle-based function tables. It is clear that this is what is implied in
the statement "make multiple calls to the FSD directly." We'll work through a
few examples to give you a feel for these different approaches.
FSHook
FSHook is a file system API hook that reports all FSD calls to MultiMon for
display. Its predecessor, FILEMON, was the basis for an article on monitoring
file system activity in Windows 95 that appeared in what was then called
Windows/DOS Developer's Journal ("Monitoring Windows 95 File System Activity in
Ring 0," July 1995; now Windows Developer's Journal). The file monitor presented
here is much improved. It is configurable through MultiMon's filter settings; it
spools its output to a file for later display, and the spooler file is accessed
using ring-0 APIs. These changes eliminate the buffer overrun problems that
FILEMON had. FSHook output can be combined with other monitor output to gain a
multidimensional picture of system activity.
FSHook displays one line of output for each FSD call. Each FSD call is identified
by a function number (see Table 7-1 for a list of possible values). Output from
FSHook tends to be rather lengthy if all functions are included, so usually it
helps to filter out Read, Write, and Seek functions. Figure 7-1 contains a trace
fragment that was collected during the system's response to a right mouse-button
click on the icon for drive A, when the drive did not contain a floppy diskette.
The first column, which contains "Explorer," is the process which was executing
when the call was made. fsh is an identifier for file system hook entries in the
trace. The next column contains the name of the operation; here we see
FS_MountVolume for IFSFN_CONNECT, FS_Ioctl16Drive for IFSFN_IOCTL16DRIVE, and
FS_FileAttribs for IFSFN_FILEATTRIB. The dispatch function (ifs_func) associated
with an operation is shown in parentheses. The Flags1 column shows the settings
for ifs_nflags and the ResourceFlags passed in to the file hook function. For
the FS_MountVolume entries, ifs_func and ifs_nflags are both 0, indicating that
these FSD calls did not directly originate from a dispatch call; rather, they
were "spun off" to bring the volume online. For the FS_FileAttributes entries we
see the dispatch function 43h, which corresponds to the Int 21h function number
for getting or setting file attributes. The ifs_nflags indicate two conditions
accompany this function. It is a long filename call (L) and it uses extended
handles (X), i.e., this was Int 21h function 7143h. The first five characters in
the Flags1 column are a sequence of 5 letters, eclnu. An e indicates the call
reported an error, a c indicates the call is to a character FSD, an l indicates
the call is to a local FSD, an n indicates the call is to a network FSD, and a u
indicates that the remote volume is referenced by a UNC name. From this we see
that all of the FS_MountVolume function calls for drive A have failed. The
Device column gives the name of the FSD which was called. Here we see an attempt
to mount drive A through VFAT, but that fails. The next available local FSD is
VDEF, the default FSD. A mount is attempted through its FS_MountVolume, and it
also fails. If there were additional local FSDs in the system, they would be
called before VDEF. Finally, we see the call to FS_FileAttributes getting passed
to VDEF and it fails. The Gt signifies get attributes and "A:" is the path for
which the attributes are requested.
Figure 7-2 shows another sample fragment. Here we see a sequence of FS_ReadFile
calls on a local volume supported by the VFAT FSD. For FS_ReadFile and
FS_WriteFile functions, the FSD name is followed by system file number, some
function arguments, and another set of flags in the Flags2 column. The possible
characters in the Flags2 column are msn, where an m indicates a memory-mapped
file access, an s indicates a swap file access, and an n indicates that caching
should not be used on the call. What is significant about the calls in this
sample is that they are reads from the paging file and they all have a system
file number of 200h, the base value for the range of extended file handles. Also
notice the value of the dispatch function (d6h) and the R flag under Flags1.
These indicate that the read originated as an IFSMgr_Ring0_FileIO call.
Explorer fsh FS_ReadFile (d6) e_clnu_sl.HRmwoa VFAT 200 cnt=1000H ofs=388000H ptr=c135f000H ·sn
Explorer fsh FS_ReadFile (d6) e_clnu_sLHRmwoa VFAT 200 cnt=1000H ofs=373000H ptr=c135f000H ·sn
Explorer fsh FS_ReadFile (d6) e_clnu_slxRmwoa VFAT 200 cnt=1000H ofs=372000H ptr=c135f000H ·sn
For a complete reference to the meanings of the various fields in FSHook output,
see Appendix B, MultiMon: Monitor Reference.
To ease implementation of FSHook (and other samples), all of the IFSMgr services
have been wrapped as C-callable routines and made available through
ifswraps.clb. (For more information see Appendix D, IFS Development Aids.)
The simplest scenario for installing a file system hook would start with a call
to IFSMgr_InstallFileSystemApiHook during the Device Init phase. This function
takes the address of the hook function to be installed and returns the address
of the previous hook function you chain onto. Example 7-1 shows the simplest
possible hook function, where ppPrevHook is a pointer to the previous hook
function. It simply calls the previous hook function and returns.
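The shape of such a pass-through hook can be modeled outside the VxD
environment. In the sketch below every mock_ name is a simplified stand-in for
the corresponding DDK type, and mock_install_hook is a hypothetical substitute
for IFSMgr_InstallFileSystemApiHook that supports a single installed hook; only
the chaining pattern through ppPrevHook is meant to carry over.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the DDK types. */
typedef struct mock_ioreq { int ir_error; } mock_ioreq, *mock_pioreq;
typedef int (*mock_pIFSFunc)(mock_pioreq pir);
typedef int (*mock_hook)(mock_pIFSFunc pfn, int fn, int drive,
                         int resType, int codePage, mock_pioreq pir);

/* End of the chain in this model: call the FSD function directly. */
static int default_hook(mock_pIFSFunc pfn, int fn, int drive,
                        int resType, int codePage, mock_pioreq pir)
{
    (void)fn; (void)drive; (void)resType; (void)codePage;
    return pfn(pir);
}

static mock_hook  current_hook = default_hook;  /* head of the model chain */
static mock_hook  saved_prev;                   /* where the old head is kept */
static mock_hook *ppPrevHook;                   /* pointer to the previous hook */

/* Hypothetical installer: makes hook the head of the chain and returns
   the location that holds the previous head. */
static mock_hook *mock_install_hook(mock_hook hook)
{
    saved_prev = current_hook;
    current_hook = hook;
    return &saved_prev;
}

/* The simplest hook: pass every argument through to the previous hook. */
static int FileHook(mock_pIFSFunc pfn, int fn, int drive,
                    int resType, int codePage, mock_pioreq pir)
{
    return (*(*ppPrevHook))(pfn, fn, drive, resType, codePage, pir);
}

/* Stub FSD routine used to exercise the chain. */
static int stub_fsd(mock_pioreq pir)
{
    pir->ir_error = 0;
    return 7;
}
```

A driver would store the installer's return value in ppPrevHook once, at
initialization, and from then on every request entering FileHook flows on to
whatever hook (or FSD call) was there before it.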
The FileHook function used by FSHook examines the function number to determine
the type of function call and fills in an event structure describing the
function call. When the call into the previous hook function returns, the error
status and sometimes other values are retrieved and added to the event structure
FSHQuery
FSHQuery demonstrates how to "piggyback" an additional call to a FSD whenever
a FS_DeleteFile is attempted. The piggybacked call is a FS_QueryResourceInfo,
the equivalent of a GetVolumeInformation Win32 call for local drives or a
WNetGetConnection for a remote drive. The code for FSHQuery's file system hook
function is shown in Example 7-2. This is a stand-alone driver that is installed
by making an entry in the system.ini file. To see its output you need to execute
it with a kernel debugger (WinIce or WDEB386).
Debug_Printf("Query level 0, drive %d resource name %s\n",
             drv, pszName);
cp, (pioreq)pifs);
IFSMgr_RetHeap(pirx);
}
The general approach is to clone the ifsreq packet that is used by the
FS_DeleteFile call. This gives us a painless way to get the ir_pid, ir_user,
ir_rh, ifs_psr, ifs_VMHandle, and ifs_PV fields. Some of the remaining fields
will require initialization for the FS_QueryResourceInfo call. Specifically, it
is necessary to set the ir_options member to the "query level," level 2 for
local resources and level 0 for remote resources. If it is a level 2 query, we
need to provide a buffer to hold the returned file system name string, in
ir_data, with the length of the buffer given by ir_length. On the other hand,
for a level 0 query, we just provide a pointer, in ir_ppath, to a buffer for the
returned ParsedPath structure which represents the name of the remote resource.
Several of the fields require buffers: one to contain the cloned ifsreq, one to
contain a ParsedPath structure, etc. You'll notice that _HeapAllocate is not
used here, but instead IFSMgr's heap routines: IFSMgr_GetHeap and
IFSMgr_RetHeap. IFSMgr creates its heap in pages of locked system memory. There
is a main heap and a "spare heap"; the latter is allocated prior to entering the
dispatch point by a call to IFSMgr_FillHeapSpare. The advantage of using the
IFSMgr_GetHeap routine is that for requests less than a page in size, it will
not trigger paging activity. This is a requirement for file hooks and FSDs that
are accessing the swap file or a
In Example 7-2, the actual call into the FSD occurs at the following lines:
FSHEnum
FSHEnum demonstrates how to piggyback an additional call to a FSD whenever a
FS_CloseFile is attempted. The piggybacked call is a FS_EnumerateHandle,
subfunction ENUMH_GETFILENAME. There is no Win32 or Int 21h call that directly
maps to this function. The closest ones are GetFileInformationByHandle, which
maps to FS_EnumerateHandle, subfunction ENUMH_GETFILEINFO, and Int 21h Function
440dh Subfunction 086dh, Enumerate Open Files. The code for FSHEnum's file
system hook function is shown in Example 7-3. This is a stand-alone driver that
is installed by making an entry in the system.ini file. To see its output you
need to execute it with a kernel debugger (WinIce or WDEB386).
Here again we clone the ifsreq packet that, in this case, is used by the
FS_CloseFile call. This gives us a painless way to get the ir_pid, ir_user,
ir_rh, ir_sfn, ir_fh, ifs_psr, ifs_pfh, ifs_VMHandle, and ifs_PV fields. Some of
the remaining fields will require initialization for the FS_EnumerateHandle
call. Specifically, it is necessary to set the ir_flags member to
ENUMH_GETFILENAME to request the filename for the given resource handle (ir_rh)
and FSD file handle (ir_fh). We also need to provide a pointer, in ir_ppath, to
a buffer for the returned ParsedPath structure which represents the name of the
file.
cp, (pioreq)&ifs);
memset(pszName, 0, MAX_PATH);
qw = UniToBCSPath(pszName, pUniPPath->pp_elements, MAX_PATH, cp);
if (qw.ddLower) {
    Debug_Printf("closing file %s\n", pszName);
}
IFSMgr_RetHeap((void*)pUniPPath);
}
IFSMgr_RetHeap(pszName);
}
It is important to note that the filename is not stored by IFSMgr. It is the job
of the FSD to store this information for files which are opened on its drives.
IFSMgr only holds onto the FSD file handle and fhandle information. When an open
occurs the FSD receives a name in a standard canonicalized form (a ParsedPath).
Whether the drive accepts a particular name depends on its underlying file
system. So it makes sense that, given a SFN (System File Number), it would be
necessary to retrieve its name from its FSD.
In Example 7-3, the actual call into the FSD occurs at the following line:
(pioreq)&ifs);
When I was testing this code with Build 950 of Windows 95, I found an
interesting bug in VCOND, the Virtual console device for Win32 console
applications. VCOND registers a character FSD with IFSMgr called PIPESTDX. This
is used when redirecting output from a console application, such as running
NMAKE from an editor and collecting its output to a file. FS_CloseFile is called
on a handle of this character FSD. The bug appears when attempting to call
FS_EnumerateHandle for this handle: it will always crash the system. The problem
occurs because VCOND's handle-based function table does not contain a valid
function address for HM_ENUMHANDLE (it is always 00000001h). It should implement
an error handler if it doesn't support the function.
FSHAttr
For a final file system hook example, we'll use IFSMgr_Ring0_FileIO to create a
re-entrant call into the dispatch point. We aren't able to take the FSHQuery or
FSHEnum examples and redo them using this ring-0 API because they each use FSD
APIs that are not exposed through the ring-0 interface. So in some cases, the
"direct call to FSD" approach is the only one viable.
FSHAttr demonstrates how to piggyback a ring-0 call to Get File Attributes
whenever a FS_DeleteFile is attempted. The piggybacked call is an
IFSMgr_Ring0_FileIO, subfunction R0_FILEATTRIBUTES. This is equivalent to an Int
21h function 7143h call. The code for FSHAttr's file system hook function is
shown in Example 7-4. This is a stand-alone driver that is installed by making
an entry in the system.ini file. To see its effect, you need to look at the
trace output from FSHook after performing some file deletes.
*p++ = '@' + drv;
memset(pszName, 0, MAX_PATH);
;
attr = (retc == 0) ? r.r_ecx : 0;
Debug_Printf("FSHATTR: %s attribs: %04x\n", pszName, attr);
}
IFSMgr_RetHeap(pszName);
}
AH = 43h, AL = 00h,
ESI = linear address of pathname.
On return, if carry is clear, then the attributes are in the CX register; if carry is set,
AX holds the error code.
There is an error in the IFS Specification regarding the arguments to this function.
It shows the calling parameters as AH=R0_FILEATTRIBUTES. This has the effect of
setting AH to 0 because R0_FILEATTRIBUTES is defined as 0x4300 in ifs.h.
Instead, you should set AX=R0_FILEATTRIBUTES and then adjust AL to 0 for a get
and 1 for a set.
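The truncation described here is easy to check in isolation. The following is a minimal sketch, not code from FSHAttr: the helper names load_into_ah, make_ax, ATTR_GET, and ATTR_SET are hypothetical, and only the value 0x4300 comes from the text.

```c
#include <stdint.h>

/* Value quoted in the text for ifs.h. */
#define R0_FILEATTRIBUTES 0x4300u

/* Hypothetical names for the AL subfunction: 0 = get, 1 = set. */
enum { ATTR_GET = 0, ATTR_SET = 1 };

/* Models loading the 16-bit constant into the 8-bit AH register:
   only the low byte survives, so the 43h is lost. */
static uint8_t load_into_ah(uint16_t value)
{
    return (uint8_t)value;
}

/* Models the corrected setup: the constant goes into AX (AH = 43h)
   and AL is then adjusted to select get or set. */
static uint16_t make_ax(uint8_t subfunction)
{
    return (uint16_t)(R0_FILEATTRIBUTES | subfunction);
}
```

Loading the constant into AH leaves 0 there, whereas make_ax(ATTR_GET) keeps 43h in the high byte and 00h in the low byte.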
Figure 7-3 shows the FSHook trace when deleting c:\windows\desktop\test.txt
from Explorer. The FS_FileAttributes entry preceding the FS_DeleteFile shows that
the re-entrant ring-0 API call goes through the file system hook.
An IFSMgr_NetFunction hook will receive four arguments on each call. These are
a pointer to an ifsreq structure appropriate for the call, a pointer to the client
registers structure, a provider identifier, and a flag indicating whether the call
originated from a Win32 API (see Example 7-5). All of the arguments actually
reference the contents of the ifsreq structure, i.e., pRegs is &(pir->ifs_crs), proId
is pir->ifs_proid, and flags is given by the expression (pir->ifs_nflags & 0x04). A
NetFunction handler will need to examine the Client_AX value in the client
registers structure to determine the type of call. The calls can be grouped into three
different categories: IFSMgr broadcasts, dispatch handlers, and DeviceIoControl
handlers.
Table 7-2 shows the function values for IFSMgr broadcasts. The first five entries in
the table correspond to events generated by IFSMgr. The function type is given by
the value of Client_AX in the client register structure. Functions 1 and 2 occur
when a drive (local or remote) appears or disappears from the system. When
these events are broadcast, the ifsreq structure contains the resource handle for
the drive (ir_rh), the 1-based drive letter (ir_flags), and the provider ID for the
FSD which handles the drive (ir_aux1.aux_ul). Functions 3, 4, and 5 report events
for network printers. For these functions, proId contains the provider ID of the
printer handler, and ifsreq holds the resource handle (ir_rh) for the printer, a
buffer to contain a returned job ID (ir_data), or an index (0-8 for LPT1 through
LPT9) to the printer (ir_flags). For each of these calls, the return value is stored to
The last entry in Table 7-2 corresponds to an Int 2Fh function call and should be
lumped together with the dispatch handlers. DOS/Win32 Installable File System
Specification, p. 91, has this to say about NetFunctions:
This service is provided to export certain functions most of which are specific to
the network FSDs. These functions can come from a variety of sources: Int 21h
and Int 2Fh functions that the IFS hooks but does not support, Int 21h functions
that the IFS does not support that are hooked via IFSMgr_SetReqHook . . .
Several of the dispatch functions listed in Table 6-3 call into IFSMgr_NetFunction.
These include dProcExit, dFunc5F, and dNetFunc. dProcExit corresponds to the
Int 2Fh call 111dh. Some other Int 2Fh functions are sent to dNetFunc: 1180h,
1181h (NF_NetSetUserName), 1182h, 1184h, 118bh, 118ch, 118dh, and 118eh.
dFunc5F handles several Int 21h functions in the range 5f00h through 5f53h.
Many of the functions in this range and all those greater than 5f54h are routed to
IFSMgr_NetFunction. For some of these functions, IFSMgr does provide an
implementation (e.g., dProcExit) and the call to IFSMgr_NetFunction is only another
form of broadcast. However, in most cases IFSMgr only goes as far as wiring the
functions up to the dispatcher so that an FSD can use a NetFunction hook to
provide an implementation.
Actually, IFSMgr takes this interface a step further by allowing some Int 21h
functions to be attached to the dNetFunc dispatch function. This is done by installing
a preamble for the function using IFSMgr_SetReqHook. We looked at preamble
functions back in Chapter 6, Dispatching File System Requests. There we
concentrated on the preambles which IFSMgr installs by default for Int 21h functions in
the range 00 through MAXDOSFUNC. Here, we are interested in the Int 21h
functions from MAXDOSFUNC+1 to FFh.
The preamble function decides whether it wishes to accept the Int 21h function
(see Figure 6-1 and Example 6-1), which has dNetFunc as its handler. The
preamble function only decides whether it wants to accept the call; it is the
IFSMgr_NetFunction hook which will actually look for the function call by
examining the Client_AX register value. Unlike the broadcasts from IFSMgr, which
provide information, these calls to IFSMgr_NetFunction are requests for a service.
This implies that if an FSD completes the request it should not pass the request
down the chain. Rather, it should return with the same value that it stuffed into
ir_error.
One additional source of calls into IFSMgr_NetFunction comes from IFSMgr's
DeviceIoControl interface. In Chapter 4, several IOCTL Services were described. Two
of these, IFS_IOCTL_21 and IFS_IOCTL_2F, use the contents of the win32apireq
structure to fill the client register portion of an ifsreq packet. The remainder of
the packet is initialized and then, for functions of the 5fxxh series, is sent to
dFunc5F. Others are routed to the chain of IFSMgr_NetFunction hooks.
NetFunc
NetFunc is an IFSMgr_NetFunction hook that reports all calls to MultiMon for
display. NetFunc shows one line of output for each NetFunction call. Figure 7-4
shows a sample trace fragment that was collected while running a simple program
from DEBUG in a DOS box. The first column, which contains "VM2", indicates
the process was executing in a second VM (DOS box) when the call was made.
nfn is an identifier for NetFunction entries in the trace. The next column contains
the function number. 8000h corresponds to an Int 21h function that NetFunc has
installed. Function 111dh is recorded when DEBUG is terminated. The Args field
shows the values of the EDX and ESI registers. The four bytes that comprise EDX,
from most significant to least significant, are: ifs_nflags, ifs_hflag, ifs_drv, and
ifs_func from the ifsreq structure; ESI contains the value of the provider ID passed
to the hook function. In Figure 7-4, we see interrupt 21h function 80h map to the
dispatcher function D4h and we see interrupt 2Fh function 111dh map to
dispatcher function 93h.
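To read the EDX column of such a trace, the four bytes can be split apart as follows. This is only a sketch of the byte layout described above; the TraceArgs type and the unpack_edx and pack_edx helpers are hypothetical names, not part of NetFunc or MultiMon.

```c
#include <stdint.h>

/* Hypothetical holder for the four ifsreq bytes packed into EDX. */
typedef struct {
    uint8_t nflags;  /* ifs_nflags: most significant byte  */
    uint8_t hflag;   /* ifs_hflag                          */
    uint8_t drv;     /* ifs_drv                            */
    uint8_t func;    /* ifs_func: least significant byte   */
} TraceArgs;

/* Split a traced EDX value into its four fields. */
static TraceArgs unpack_edx(uint32_t edx)
{
    TraceArgs a;
    a.nflags = (uint8_t)(edx >> 24);
    a.hflag  = (uint8_t)(edx >> 16);
    a.drv    = (uint8_t)(edx >> 8);
    a.func   = (uint8_t)edx;
    return a;
}

/* Inverse operation: rebuild the EDX value from the fields. */
static uint32_t pack_edx(TraceArgs a)
{
    return ((uint32_t)a.nflags << 24) | ((uint32_t)a.hflag << 16) |
           ((uint32_t)a.drv << 8) | (uint32_t)a.func;
}
```

For a value whose low byte is D4h, unpack_edx reports func as D4h, the dispatcher function that the trace associates with Int 21h function 80h.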
The hook function installed by NetFunc is shown in Example 7-6. This function
does not use a stack frame so that the HOOK_PREAMBLE macro can insert extra
information to allow the hook to be removed. This also requires that the calling
arguments be moved into local variables so they can be referenced by C
statements. There are two main sections here. In the clause beginning if
(bEnabled) ..., the routine is checking if MultiMon has enabled monitoring of
IFSMgr_NetFunction calls. If so, it prepares a notification structure and sends it.
The next interesting clause begins if (pRegs->Client_AX == 0x8000) .... This
checks if the function we are being called on is one that we have installed a
handler for. If it is, we just print out a message and return. Otherwise, we restore
the original stack frame and jump to the next hook function.
The actual preamble function, MyPreamble, is shown in Example 7-8. This
function simply clears the carry flag and returns. Some logic may be required to
decide whether to accept or reject the request.
#ifdef NOT_HOOKED
// ... If we don't handle it, call the next preamble
_asm jmp dword ptr pPrevPreamble
#else
// ... Do whatever checks are required
_asm clc // Clear carry if we accept the function call
_asm ret
#endif
}
To test our preamble and NetFunction hook we need to generate an Int 21h
Function 80h call in either V86 or protected mode. The simplest way to do this is to
open a DOS box and run DEBUG. At the - prompt, type the following four-line
program:
-a100
mov ax,8000
int 21
mov ax,4c00
int 21
-g
Then let it execute. To see the message "Int 21h Function 8000h called," a kernel
debugger will have to be running (WinIce or WDEB386). This little program also
creates the MultiMon trace shown in Figure 7-4 when the IFSMgr NetFunction
filter is enabled.
Hooking a Path
The last hook function that we'll take a look at, IFSMgr_SetPathHook, is closely
tied to IFSMgr_ParsePath (and IFSMgr_FSDParsePath). Recall that
IFSMgr_ParsePath is called for the volume-based FSD functions that receive a path string (in
ifsreq member ir_data). In other words, in preparation for calling FS_OpenFile,
FS_FileAttributes, etc., a call into IFSMgr_ParsePath is needed to set up the
ifsreq packet. By parsing the path string, this service fills in the ifs_psr member
of the ifsreq packet, as well as the ParsedPath structure required for ir_ppath.
This service installs a path check routine and returns a previous path check
routine. The service is available at Device Init or Init Complete time. The path
check routine is called by IFSMgr_ParsePath if the input path does not contain
leading \, /, or d: characters. What does a path check routine do? Here is what
Microsoft has to say in DOS/Win32 Installable File System Specification, p. 90:
This service has been provided for FSDs to check for special path prefixes and
process them separately. The FSD can register a routine with the IFS manager that
is called every time a path is parsed. If this is a prefix the FSD wants to process, it
can claim it and the IFS manager will then call the FSD directly on the path-based
operation.
If the path check routine does not "claim" the path, then it needs to jump to the
previous path check routine with all registers preserved. The last path check
routine in the chain is supplied by IFSMgr; it just sets the carry flag and returns.
This tells the parser to use default handling.
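The claim-or-pass-on behavior of the chain can be modeled in plain C. This is only an illustration of the chaining logic, under several assumptions: the real routine is register-based and works on a Unicode string, whereas here the carry flag is modeled as a return value (1 = carry set, 0 = carry clear), the path is an ordinary C string, and the prefix "zeta!" and all function names are hypothetical.

```c
#include <string.h>

/* Model of a path check routine: returns 1 for "carry set"
   (not claimed, use default handling), 0 for "carry clear" (claimed). */
typedef int (*PathCheck)(const char *path);

/* Stands in for the last routine in the chain, supplied by IFSMgr. */
static int default_check(const char *path)
{
    (void)path;
    return 1;                      /* carry set: default parsing */
}

static PathCheck chain_head = default_check;
static PathCheck prev_check;       /* saved when our hook is installed */

/* Models installing a hook: the previous routine is handed back. */
static PathCheck set_path_hook(PathCheck new_check)
{
    PathCheck old = chain_head;
    chain_head = new_check;
    return old;
}

/* A hook that claims paths starting with a hypothetical prefix and
   passes everything else to the previous routine untouched. */
static int my_check(const char *path)
{
    static const char prefix[] = "zeta!";
    if (strncmp(path, prefix, sizeof prefix - 1) == 0)
        return 0;                  /* claimed */
    return prev_check(path);
}
```

After prev_check = set_path_hook(my_check), a call through chain_head claims "zeta!data" and lets "readme.txt" fall through to the default routine.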
The inputs to and outputs from the path check function are summarized in Table
7-3. As you can see it is entirely register-based, so it needs to be written in inline
assembly code. We also see from the input arguments that by the time the path
check function is called, the ir_data member of ifsreq has been translated into
a Unicode string (ESI); however, the PathElements (EDI) have not been created
yet.
The path check routine can look for a specific signature at the beginning of the
string pointed to by ESI. This string can be a prefix which is stripped off from the
remainder, or it may convert the prefix into some other string or character and
store it to a PathElement structure in the buffer pointed to by EDI. The prefix
string may also just be copied to a PathElement. There is considerable flexibility
here: at one extreme, the string may be completely parsed into PathElements
before returning; at the other extreme, the entire path might be passed back and
no parsing is done, only the provider ID is set. If any of the string is passed back
Over the course of this book we have progressively stripped away the layers of
the Windows 95 file system. We have seen that the programming APIs converge
upon a dispatch point that has the characteristics of an extended Int 21h interface.
Many of the dispatch functions require support from an underlying file system
driver. In the last chapter we used MultiMon, with the FSHook driver, to monitor
the calls into the underlying FSDs. In this chapter we will shift our focus to the
file system drivers.
FSDs Come in Three Flavors
[Figure: Volume mounting: a mount call goes to FS_MountVolume, which returns volfunc[]. File open: an open call goes to FS_OpenFile, which returns hdlfunc[].]
Character FSDs
The term character originated in the UNIX world to distinguish block and
character devices. Block devices are characterized by data transfers of blocks of data
of a fixed size (usually the sector size), whereas character devices transfer data
a byte at a time in a serial fashion. This is also the meaning attached to character as
it applies to FSDs.
Some examples of character FSDs include vcond.vxd and spooler.vxd. VCOND,
the virtual console driver, exposes a number of Win32 VxD services which are
used by KERNEL32 to provide support for Win32 console applications. Tucked
away inside this driver is a character FSD, which registers under the name
PIPESTDX. This device is opened by redirect.mod, which in turn is loaded by
KERNEL32, to enable redirection for certain kinds of console applications.
SPOOLER, the other example given, is a character FSD registered for the system
printer devices: LPT1 through LPT9 and PRN.
Character FSDs are good candidates for modeling devices which transfer data a
byte at a time and which do not already have an existing driver class. It is the
lack of dependency on the I/O subsystem or network protocol stack that makes
this type of FSD most flexible.
Local FSDs
A local FSD provides support for local storage devices, such as floppy disk drives,
fixed disk drives, and CD-ROM drives.
Local FSDs register with IFSMgr by calling the service IFSMgr_RegisterMount. The
registering FSD passes the address of its FS_MountVolume entry point. Local
storage devices are partitioned into volumes, and when a volume is first accessed,
FS_MountVolume is called on each local FSD until one recognizes the media and
claims it. This establishes a shell resource for the local device and the
volume-based function table which provides linkage to IFSMgr.
The system registers one default local FSD through IFSMgr_RegisterMount. When
IFSMgr searches for a local FSD to claim a volume, the search may fail. The
default local FSD is there to claim those volumes that other local FSDs do not
recognize. Some common situations where this would occur include an
unformatted volume or a floppy drive without media inserted.
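The probing sequence can be pictured with a small model. This is a sketch under stated assumptions, not IFSMgr's implementation: the mount functions take a media description string instead of an ifsreq packet, a return of 0 means "claimed", and the probe order (newest registration first, so that the default FSD registered first is asked last) is an assumption made so the fallback behavior is visible. All names are hypothetical.

```c
#include <string.h>

#define MAX_LOCAL_FSDS 10          /* assumed capacity for this model */

/* Hypothetical mount entry point: returns 0 if the FSD claims the media. */
typedef int (*MountFn)(const char *media);

static MountFn mount_table[MAX_LOCAL_FSDS];
static int mount_count = 0;

/* Models registration: returns the slot (provider ID) or -1 if full. */
static int register_mount(MountFn fn)
{
    if (mount_count >= MAX_LOCAL_FSDS)
        return -1;
    mount_table[mount_count] = fn;
    return mount_count++;
}

static int fat_mount(const char *media)  { return strcmp(media, "FAT") ? -1 : 0; }
static int cdfs_mount(const char *media) { return strcmp(media, "ISO-9660") ? -1 : 0; }
static int default_mount(const char *media) { (void)media; return 0; } /* claims anything */

/* Ask each FSD in turn until one claims the volume; returns its slot. */
static int mount_volume(const char *media)
{
    int i;
    for (i = mount_count - 1; i >= 0; --i)
        if (mount_table[i](media) == 0)
            return i;
    return -1;
}
```

With the default FSD registered first, followed by the FAT and CD-ROM models, an unrecognized volume ends up with the default FSD while recognized media go to the FSD that understands them.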
Some examples of local FSDs include vfat.vxd, cdfs.vxd, and vdef.vxd. VFAT is the
protected mode FAT file system driver that provides access to most floppy and
fixed media. CDFS is the protected mode ISO-9660 file system driver that provides
access to CD-ROM media. VDEF is the default local FSD (the source for vdef.vxd
is given in the DDK).
Each storage device present in the system requires one or more hardware drivers
that fall under the umbrella of the I/O subsystem. These drivers hide the
differences in bus types and controller chip sets, and present a logically consistent
view of the various devices to the file system drivers. Thus, local FSDs rely upon
the I/O subsystem services for their implementation. Local FSDs also conceal
knowledge of the disk layout for a specific file system. A local FSD just accepts
properly constructed filenames and returns handles through which logical
operations may be performed.
Remote FSDs
A remote FSD connects to a resource which is shared by a server. There are two
scenarios. In a peer-to-peer network, each system may be a client and a server
and the protocol stacks of the client and server match, layer for layer. In a
non-peer-to-peer network, a client PC system connects to a server host; there is
no peer server.
The remote FSD, which resides in a client machine, connects to the server
through some network medium and protocol. IFS requests on the client machine
are redirected by the remote FSD to the server. The shared resource can be a
character or block storage device.
Remote FSDs register with IFSMgr by calling the service IFSMgr_RegisterNet. The
registering FSD passes the address of its FS_ConnectNetResource entry point.
Dynamic connections to remote resources are made using the service
IFSMgr_SetupConnection and broken by IFSMgr's internal function IoreqDerefConnection.
These services call FS_ConnectNetResource and FS_DisconnectResource,
respectively. A connection is attempted when a UNC path is resolved to a remote server
and share. If the connection is mapped to a volume, then the connection persists
until the volume is explicitly unmapped. Each connection to a unique remote
server and share is represented by a shell resource.
To support the Windows 95 peer-to-peer networking, Microsoft Networks and
Microsoft Netware Networks clients and servers are included in the package. The
Microsoft Networks client is the remote FSD, vredir.vxd, and its matching server is
vserver.vxd. These components work with NetBEUI, TCP/IP, and IPX/SPX
protocols through the NetBIOS interface. When an IFS request is redirected by VREDIR,
it is in the form of the Server Message Block (SMB) protocol. VSERVER interprets
the SMB protocol and, if appropriate, generates an IFS request on the server
machine using the IFSMgr_ServerDOSCall service. The results of the request are
then returned via the SMB protocol.
In a similar fashion, the Netware Networks client is the remote FSD, nwredir.vxd,
and its matching server is nwserver.vxd. These components work with the
IPX/SPX protocols. When an IFS request is redirected by NWREDIR, it is in the form
of the Netware Core Protocol (NCP). NWSERVER interprets the NCP protocol and,
if appropriate, generates an IFS request on the server machine using the
IFSMgr_ServerDOSCall service. The results of the request are then returned via NCP.
FSD Mechanics
There are certain characteristics of an FSD that you must understand to use them
properly: the contents of the Device Description Block; whether it is static or
dynamic; how it can be segmented; and how it is affected by multiple threads.
Initialization order for a static FSD is important. The header file vmm.h defines
the manifest constant FSD_INIT_ORDER (0xa0010100) as the base value for FSDs.
This assures that they load after IFSMgr. This is the Init_Order assigned to VFAT,
CDFS, and VDEF. But again there are exceptions to the rule. In the case of remote
FSDs, the Init_Order may also require that other network components be loaded
before the FSD. For example, VREDIR has an Init_Order of 0xa0021000, which
assures that it loads after IFSMgr and also after vnetsup.vxd. VCOND breaks even
this rule by having an Init_Order of UNDEFINED_INIT_ORDER (0x80000000) that is less
than IFSMgr's. It gets away with this because VCOND does not register its character
device with IFSMgr until a V86 API is called in response to running a console
application. This is long after IFSMgr has completed its initialization.
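The effect of these values is simply that static VxDs initialize in ascending Init_Order. A minimal sketch of that ordering follows. The Device type and the sort helper are hypothetical, and the value used for IFSMgr in the usage note (0xa0010000) is an assumption; the text gives only the FSD, VREDIR, and VCOND values.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record: a device name and its Init_Order value. */
typedef struct {
    const char *name;
    uint32_t    init_order;
} Device;

/* qsort comparator: ascending Init_Order, compared as unsigned values. */
static int by_init_order(const void *a, const void *b)
{
    uint32_t x = ((const Device *)a)->init_order;
    uint32_t y = ((const Device *)b)->init_order;
    return (x > y) - (x < y);
}

/* Arrange devices in the order in which they would initialize. */
static void sort_by_init_order(Device *devs, size_t count)
{
    qsort(devs, count, sizeof *devs, by_init_order);
}
```

Sorting VREDIR (0xa0021000), VFAT (0xa0010100), VCOND (0x80000000), and IFSMgr (assumed 0xa0010000) puts VCOND first, then IFSMgr, then VFAT, then VREDIR, which matches the load-order claims made above.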
All VxDs have a control procedure and FSDs are no different.
Static or Dynamic?
The DOS/Win32 Installable File System Specification is emphatic about FSDs being
static drivers. On page 3, it states:
The FSDs will be loaded and initialized when the system starts up. Once they are
loaded they will remain loaded until the system hardware is shutdown or
rebooted.
This makes sense because a file system has to be in place for the operating
system to start up. However, there may be circumstances where an FSD might
load dynamically; this is especially true of character FSDs.
If you intend to unload the FSD as well, one precaution needs to be observed.
This arises because registering an FSD with IFSMgr creates a permanent linkage to
the mount entry point and, in the case of character FSDs, a list of device names.
Removing these from memory by performing an unload may eventually lead to a
page fault. One work-around is to make the segment containing the mount entry
point and device names a static segment.
OEM Service Release 2 appears to expand the options available to FSDs. Although
the services are undocumented at this time, two new services are provided for
registering and deregistering FSDs with IFSMgr. (See Chapter 12, A Survey of
IFSMgr Services.)
Segmentation
This section may seem to be an anachronism; after all, weren't segments
supposed to go away with 32-bit code? Segmentation as used here might be more
accurately thought of as groupings of code or data with similar attributes. For
instance, some code gets discarded after Device Init, other code is locked in
memory and never swapped to disk, while pageable code may be paged out
when demands upon system memory require it. Although these code and data
areas are distinct "objects" with different memory attributes, they are part of the
continuum of the 4-gigabyte address space and thus don't require selector
changes when switching from one to another.
The segmentation of a VxD is rooted in its linear executable (LE) file format.
Each grouping of code or data is assigned to a distinct object in the file. The
attributes of each object determine what the loader does with it. An object will be
created for each unique (non-empty) segment in the assembly language source.
Traditionally, a macro from vmm.inc is used to specify the segment directives in a
VxD.
Using C to write VxDs is more typical today and this change requires using a
different sort of macro to specify segmentation. These new macros are found in
vmm.h. The more common ones are reproduced in Example 8-1.
The keywords code_seg and data_seg are pragma directives specific to the
Microsoft compiler. The first argument in parentheses is the Portable Executable
section name and the second argument is a class name. At the compile stage, a
COFF object module is created with each segment name mapped to the named
section. At the link stage, instead of creating a portable executable (PE) format
EXE file, the linker generates a VxD with the OBJ's sections mapped to linear
executable objects.
Example 8-2 shows a C code fragment using pragmas to set the code and data
segments. The assembly language output from the compiler for this fragment is
given in Example 8-3. To assure that pageable_item is assigned to the proper
segment (_PDATA), it is necessary to initialize it; otherwise the variable will be
assigned to the _DATA segment, the default segment for uninitialized data.
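As a rough illustration of the kind of fragment being described, the sketch below uses the two pragmas directly rather than the vmm.h macros. It is not the book's Example 8-2: the class names "PCODE" and "PDATA" and the body of pageable_func are assumptions, and the pragmas are guarded so that the fragment still compiles with other compilers, where no section placement takes place.

```c
/* Assumed to correspond to the pageable code and data macros in vmm.h;
   the class names given here are an assumption. */
#ifdef _MSC_VER
#pragma code_seg("_PTEXT", "PCODE")
#pragma data_seg("_PDATA", "PDATA")
#endif

/* Initialized on purpose: an uninitialized variable would not be
   placed in the named data section. */
int pageable_item = 1;

/* Hypothetical body; only the function name appears in the text. */
int pageable_func(void)
{
    return pageable_item;
}

/* Return to the compiler's default sections. */
#ifdef _MSC_VER
#pragma code_seg()
#pragma data_seg()
#endif
```

With the Microsoft compiler, an assembly listing of this fragment should show pageable_func inside a _PTEXT segment and pageable_item inside _PDATA.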
Segmentation also affects which library routines are statically linked to a VxD. The
libraries VXDWRAPS and IFSWRAPS create six versions of each routine, one
specific to each of the main segment types. The name of a library routine is
PUBLIC _pageable_func
_PTEXT SEGMENT
_pageable_func PROC NEAR
Multi-Threading Considerations
As noted in Chapter 7, Monitoring File Activity, the path through the file system is
multi-threaded. This will have an impact on the design of an FSD. Any global data
accessed by more than one thread in an FSD must be protected by
synchronization primitives. A variety of synchronization services are supplied by VMM to fill
this need.*
In the sample FSDs described at the end of this chapter, I use a simple technique
based on blocking identifiers. To gain access to a critical section containing a
shared resource, the following page-locked code acts as a guard:
DWORD claim_resource = -1;
* For a good discussion of synchronization services, see Walter Oney's account in Systems Programming
for Windows 95 (Microsoft Press), Chapter 9.
If only a single thread has attempted to claim the critical section, then on leaving,
the variable claim_resource will be 0, and decrementing it will restore it to -1 and
execution will continue at the label released_resource. However, if one or
more threads have been blocked attempting to get at the resource, then
claim_resource will be greater than or equal to zero after the decrement operation. In this
case, claim_resource is reset to -1, all threads which are currently blocked on the
specified blocking ID are signaled by the call to the service _SignalID, and then
the critical section is left. Since all threads blocked on the &claim_resource ID
will be awakened, the first one to retry the get_resource test above will be able
to access the critical section.
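The counter arithmetic can be followed with a single-threaded model. This is a sketch, not the driver's guard code: the acquire side (increment and test for zero) is inferred from the description of the release side, the real code would use an atomic increment and block on the ID instead of returning, and the names try_claim, release_resource, and signals_sent are hypothetical.

```c
/* -1 means the critical section is free. */
static long claim_resource = -1;

/* Counts how often the release path would have called _SignalID. */
static int signals_sent = 0;

/* Returns 1 if the caller now owns the critical section, 0 if it
   would have to block on &claim_resource and retry later. */
static int try_claim(void)
{
    claim_resource++;              /* atomic in a real driver */
    return claim_resource == 0;
}

/* Leave the critical section; wake waiters if any tried to get in. */
static void release_resource(void)
{
    claim_resource--;
    if (claim_resource >= 0) {     /* someone else incremented meanwhile */
        claim_resource = -1;       /* reset so a retry can succeed */
        signals_sent++;            /* stands in for _SignalID */
    }
}
```

Tracing two callers shows the behavior: the first try_claim succeeds, the second fails and leaves the counter at 1, and release_resource then resets the counter to -1 and signals, so the second caller's retry succeeds.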
FSD Linkage
Although much of IFSMgr's internals are undocumented, perhaps an area where
documentation is most sorely missed is in how IFSMgr and FSDs establish their
linkage. A better understanding of this linkage can help when analyzing certain
kinds of bugs, like "Why doesn't IFSMgr call my FSD?" or "Why isn't my FSD
mounted?"
The process of making a device visible to IFS is called mounting if the device is
local, or connecting if the device is remote. The reverse processes, dismounting
or disconnecting, remove a device from the system. At the FSD level, mounting is
FSD Registration
The FS_MountVolume and FS_ConnectNetResource functions are installed by
each FSD through one of the registration calls to IFSMgr. Recall that there are
three different types of registration: IFSMgr_RegisterMount, IFSMgr_RegisterNet,
and IFSMgr_RegisterCFSD, corresponding to local FSDs, remote FSDs, and
character FSDs. The provider IDs returned by IFSMgr_RegisterMount and
IFSMgr_RegisterNet form a continuous range 0 through 9 for local FSDs and 10 through
17 for remote FSDs. IFSMgr creates a function pointer table, MountVolTable[],
of 18 entries, where FS_MountVolume and FS_ConnectNetResource addresses are
Character devices store their mount function pointers in a table separate from
local and remote FSDs. The elements in this table are structures with two
members:
typedef struct { int (*mntfunc)(); PathElement *pDevName[]; }
CHARDEV, *PCHARDEV;
The first member, mntfunc, holds the address of the mount function, and the
second member, pDevName, is a pointer to an array of pointers to device names
stored as PathElements. Up to 8 character FSDs can be registered with IFSMgr
and these are stored in an array I've named MountCharTable[]. Once a
matching device name is located in MountCharTable[], its accompanying mount
function can be called like this:
Call_FSD(MountCharTable[i].mntfunc, IFSFN_CONNECT, pifs, FALSE)
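The lookup that precedes this call can be sketched as follows. The model makes several assumptions: device names are plain C strings rather than PathElements, each name list ends with a NULL pointer, and register_cfsd, find_char_fsd, and stub_mount are hypothetical helpers; only the table size of 8 and the two-member layout come from the text.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CHAR_FSDS 8            /* limit quoted in the text */

typedef int (*MountFn)(void);

/* Simplified stand-in for the CHARDEV structure. */
typedef struct {
    MountFn      mntfunc;          /* mount function for this FSD */
    const char **pDevName;         /* NULL-terminated list of device names */
} CharDev;

static CharDev MountCharTable[MAX_CHAR_FSDS];
static int char_fsd_count = 0;

/* Placeholder mount function for the model. */
static int stub_mount(void) { return 0; }

/* Models registration: returns the table index or -1 if full. */
static int register_cfsd(MountFn fn, const char **names)
{
    if (char_fsd_count >= MAX_CHAR_FSDS)
        return -1;
    MountCharTable[char_fsd_count].mntfunc = fn;
    MountCharTable[char_fsd_count].pDevName = names;
    return char_fsd_count++;
}

/* Find the table index whose name list contains the device, or -1. */
static int find_char_fsd(const char *device)
{
    int i;
    const char **p;
    for (i = 0; i < char_fsd_count; ++i)
        for (p = MountCharTable[i].pDevName; *p != NULL; ++p)
            if (strcmp(*p, device) == 0)
                return i;
    return -1;
}
```

Registering one FSD for "PIPESTDX" and another for "LPT1" and "PRN" lets find_char_fsd("PRN") return the second entry, whose mntfunc would then be the one handed to Call_FSD.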
initiate a mount operation if that volume is not already mounted. In practice, the
system drive will be accessed first and mounted first, but only after IFSMgr has
completed its Device Init phase. It is during the Device Init phase that IFSMgr
initializes its internal data structures to reflect known drives in the system as
determined by examining the DOS CDS array and querying IOS for drive information.
For each such drive detected, a zero-filled volinfo structure is allocated and its
address stored in SysVolTable[]. Recall from Figure 6-2 that for each local
volume (volnum 0-31), SysVolTable[volnum] contains the address of a
volinfo structure. The first member of the volinfo structure, vi_psr, is a pointer
to the volume's shell resource structure (see Appendix C, IFSMgr Data Structures,
for details on the volinfo structure).
The first access to a local drive typically occurs through IFSMgr's Int 21h dispatch
routines. These routines indirectly rely upon a pair of IFSMgr's internal functions
to check if a mount is needed (_NeedMount) and to actually perform the mount
(_Gen_FSMount_IFSReq). The prototype for _NeedMount has this form:
If the function returns TRUE, the specified zero-based Drive needs to be mounted.
The variable pifs holds a pointer to the ifsreq structure for the current file system
request, and the variable bChgReset indicates whether the IOS function for media
change reset is to be called.
One indicator that a drive needs to be mounted is given by SysVolTable[]. If
the indexed entry is NULL, or if the volinfo member which points to the shell
resource (SysVolTable[drive]->vi_psr) is NULL, the drive needs to be mounted.
After a successful mount, volinfo and shell resource structures are allocated and
initialized.
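The table test just described amounts to two NULL checks. The sketch below models only that indicator; it is not _NeedMount itself, whose other checks are not covered here. The VolInfo type shows just the first member, the range guard is an added assumption, and needs_mount is a hypothetical name.

```c
#include <stddef.h>

#define MAX_VOLUMES 32             /* volnum 0-31, as stated in the text */

typedef struct ShellRes ShellRes;  /* opaque stand-in for a shell resource */

/* Only the first member of volinfo matters for this test. */
typedef struct {
    ShellRes *vi_psr;              /* NULL until the volume is mounted */
} VolInfo;

static VolInfo *SysVolTable[MAX_VOLUMES];

/* Returns 1 if the table says the zero-based drive is not yet mounted. */
static int needs_mount(int drive)
{
    if (drive < 0 || drive >= MAX_VOLUMES)
        return 0;                  /* assumption: out-of-range drives are ignored */
    return SysVolTable[drive] == NULL || SysVolTable[drive]->vi_psr == NULL;
}
```

A drive with no table entry, or with a zero-filled volinfo, reports that a mount is needed; once vi_psr points at a shell resource the test returns 0.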
To do the mounting operation, _Gen_FSMount_IFSReq is called. It has the
prototype:
int _Gen_FSMount_IFSReq(int Drive, int arg2)
the FSD which supports the drive. The steps which are taken can be summarized
as follows:
• Allocate an ifsreq structure and initialize its contents
• If SysVolTable[Drive] is NULL, allocate a volinfo structure and insert it
in SysVolTable[Drive]
parent drive's shell resource. Three members of a volinfo structure are used to
track the subst drive: vi_drv contains the volume number for the referenced drive,
vi_subst_path is the null-terminated Unicode string of the complete path to which
the subst drive refers, and vi_leng contains the length of the Unicode string in
bytes. While the creation of such a drive generates IFSMgr_NetFunction
(NF_DRIVEUSE) notifications, there is no underlying call to the parent FSD's
FS_MountVolume entry point. Figure 8-1 shows the relationships between the various
data structures used to track standard and subst local drives.
[Figure 8-1 labels: SysVolTable, Shell Resource]
where pifs is a pointer to the ifsreq structure for the current file system request
and wildcards indicates how wildcards are to be treated; a value of 0 for no
Figure 8-2. Using a CharSrTable to get device name ("zeta") and mount function (func1)
The variable pifs holds an ifsreq structure which has been allocated and
initialized for the function call.
_Dismount_Local_Drives and the functions it calls attempt to reduce the reference
counts on the various data structures that track the local drives. This involves
closing any open files, reclaiming heap allocations, and ultimately calling
FS_DisconnectResource on each volume.
2. Walk list of local shell resources (a nested walk) for those having a matching
VRP address and a non-zero sr_inUse. Remove any remaining references such
as subst drives.
3. Do a final IoreqDerefConnection which reduces the sr_inUse to zero and
forces a FS_DisconnectResource call on the volume; this call also frees the
shell resource structure if it succeeds, followed by removal of the resource
from the SrTable with adjustment of Head_Local_Srs.
Finally, for each drive which has been removed, perform these steps:
1. Generate an IFSMgr_NetFunction broadcast of type NF_DRIVEUNUSE
The variable pifs holds a pointer to an ifsreq structure which has been allocated
and initialized for the function call. _Dismount_Char_Devices and the functions it
calls attempt to reduce the reference counts on the various data structures that
track the character device. This involves closing any open handles, reclaiming
heap allocations, and ultimately calling FS_DisconnectResource on each device.
IFSMgr maintains a separate table, CharSrTable, containing addresses of shell
resources for character devices and printers. A one-way linked list threads
through the table. The head for the list is Head_Char_Srs, and starts with the most
with each device. The steps which are taken at each shell resource in the list can
be summarized as follows:
• If the resource has a non-zero sr_inUse and a valid pointer to a chain of
fhandles, the corresponding handles are closed, thereby reducing the sr_inUse.
• sr_inUse is decremented.
FSD Connecting
Some examples of connections are mapping a local drive letter to a remote server
and share name, and accessing a remote file by a UNC pathname.
Drive-based connections
When mapping a local drive to a remote drive and directory, the standard
connection dialog is displayed in response to the WNetConnectionDialog1 API. The
information gathered by this dialog is used by the Multiple Provider Router (MPR)
to route the request to an appropriate Network Provider and call that provider's
NPAddConnection SPI.
The Network Provider then passes the request to the remote FSD, using the Devi
celoControl, IFS_IOCTL_2 1 , interface. As an example, for Microsoft Networks, Int
21h function 5F47h (NetUseAdd, a Lan Manager DOS extension), is called. This
function receives the following register arguments: BX is the level number, either
1 or 2; CX is the size of the use_info structure; and ES:DI is a pointer to the use_
info structure. The use_info structure which is passed to NetUseAdd is either a
use_info_l or a use_info_2 structure, depending on the level of the call. As
part of its argument checking, IFSMgr verifies the size (CX) of the use_info struc
ture to be either 26 bytes for use_info_l or 52 bytes for use_info_2 . This
function is actually handled by IFSMgr's dispatch function dNetFunc. A similar Int
21h function, Make Net Connection, 5F03h, serves the same purpose but uses
different arguments.
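The size check on CX can be pictured with a few lines of C. This is only a sketch of the rule stated above, not IFSMgr's code: the function name use_info_size_ok and the two macro names are hypothetical, and only the level numbers and the 26- and 52-byte sizes come from the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Structure sizes quoted in the text; the macro names are hypothetical. */
#define USE_INFO_1_SIZE 26u
#define USE_INFO_2_SIZE 52u

/* Returns true when the level (BX) and the structure size (CX) agree. */
bool use_info_size_ok(unsigned level, unsigned size)
{
    switch (level) {
    case 1:  return size == USE_INFO_1_SIZE;
    case 2:  return size == USE_INFO_2_SIZE;
    default: return false;   /* only levels 1 and 2 are described */
    }
}

/* Usage: a level 2 call must carry a 52-byte structure. */
void use_info_size_demo(void)
{
    assert(use_info_size_ok(2, 52));
    assert(!use_info_size_ok(2, 26));
}
```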
The handlers for Int 21h functions 5F03h and 5F47h massage the input parameters
and call a common internal IFSMgr function which I've named _UseAdd. This
function can also be accessed at ring-0 through the service IFSMgr_UseAdd. This
internal function, _UseAdd, is a front end to a call to IFSMgr_SetupConnection.
The function prototype for _UseAdd takes this form:
_UseAdd(ifsreq* pifs, void* pinfo, int connstatus, int bStatic)
The calling arguments consist of pifs, a pointer to the ifsreq structure; pinfo, a
pointer to a structure containing information about the mapping; connstatus, an
integer having the value 0 if the resource is set up connected and 1 if the resource
is set up disconnected; and bStatic, a Boolean which is 0 if the connection is to be
established at system startup (static), and 1 if the connection is established by the user.
FSD Linkage 1 61
The declaration for the use_info_2 structure is given in ifsmgrex.h on the
companion disk. The only members which _UseAdd cares about are ui2_local,
ui2_remote, ui2_password, and ui2_asg_type, whether pinfo points to a
use_info_1 or use_info_2 structure. (Note that the use_info_2 structure given in
the DDK file ifsmgr.inc is not correct.)
_UseAdd performs several preliminaries prior to calling IFSMgr_SetupConnection:
• Validates the local drive (from ui2_local) to use in a mapping, and verifies it
is not a drive in use and does not exceed the "last drive" limit; the local drive
number (1-based) is placed into pifs->ifs_drv.
• If a printer port is specified in place of a drive letter, e.g., LPT1, a drive
number is assigned in the range 21h to 29h for LPT1 to LPT9 and is placed into
pifs->ifs_drv (it isn't clear how generic character devices are redirected).
• Validates the server name and share name (from ui2_remote) via a call to
IFSMgr_ParsePath; this path must be a UNC path or a path which has been
parsed by a custom parser installed via IFSMgr_SetPathHook; the resultant
ParsedPath is stored to pifs->ir_ppath, e.g., \\SERVER\SHARE.
• Allocates a volinfo structure which is stored to SysVolTable[pifs->ifs_drv-1] .
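The numbering of local devices in the first two steps can be sketched as follows. This is a rough illustration under stated assumptions: local_name_to_drv is a hypothetical helper, not an IFSMgr function, and it leaves out the "in use" and "last drive" checks that _UseAdd performs. Only the 1-based drive numbers and the 21h to 29h range for LPT1 to LPT9 are taken from the text.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns the value that would be placed in ifs_drv.
 * "A:".."Z:" map to 1..26, "LPT1".."LPT9" map to 0x21..0x29, and
 * anything else yields 0 (not recognized by this sketch). */
int local_name_to_drv(const char *name)
{
    assert(name != NULL);
    size_t len = strlen(name);

    if (len == 4 &&
        toupper((unsigned char)name[0]) == 'L' &&
        toupper((unsigned char)name[1]) == 'P' &&
        toupper((unsigned char)name[2]) == 'T' &&
        name[3] >= '1' && name[3] <= '9')
        return 0x21 + (name[3] - '1');

    if (len == 2 && isalpha((unsigned char)name[0]) && name[1] == ':')
        return toupper((unsigned char)name[0]) - 'A' + 1;

    return 0;
}
```

For example, local_name_to_drv("F:") gives 6 and local_name_to_drv("LPT2") gives 22h.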
_UseAdd then calls IFSMgr_SetupConnection with these arguments:
IFSMgr_SetupConnection(pifs, RESOPT_DEV_ATTACH, RESTYPE_DISK)
The contents of the ifsreq structure are modified to reflect the arguments passed
to _UseAdd. This form of connection is referred to as a "drive-based" connection
in the IFS specification.
Now what does IFSMgr_SetupConnection do internally? Without getting into all of
the details and handling of error and exceptional conditions, here are the basic
steps it takes:
1. Allocate a block in which to store a shell resource structure.
_UseAdd also clears the ifs_psr member of the ifsreq structure. This step assures
that the connection's reference count is not immediately decremented by a call to
IoreqDerefConnection.
UNC-based connections
The path that we have just traced is the system response to the deliberate
mapping of a drive. IFSMgr_SetupConnection is also called when a UNC pathname
is processed by IFSMgr's Int 21h dispatch routines. Many of the dispatch
routines, including dRing0_OpenCreate, dOpenCreate, dMkRmDir, dChDir,
dGetCurDir, dAttribs, dGetVolInfo, dDelete, dGetFullName, dFindFile, dRename,
dSubst, and dIoctl, use IFSMgr's internal function _PathToShRes to convert a
pathname, UNC or otherwise, into a shell resource. These "connections on demand"
are made by a call to IFSMgr_SetupConnection, which takes this form:
IFSMgr_SetupConnection(pifs, RESOPT_UNC_REQUEST, RESTYPE_WILD)
This call establishes what the IFS specification refers to as a UNC-based
connection. It follows the same basic steps as described above for drive-based
connections. The sr_flags member of the shell resource for a UNC-based
connection has both IFSFH_RES_NETWORK and IFSFH_RES_UNC attributes set.
One of the main differences between a UNC-based and a drive-based connection
is in the way the connection's reference count is maintained. For a UNC-based
connection the reference count is decremented by a call to IoreqDerefConnection
as soon as the dispatch function completes. This happens because the ifs_psr
member of ifsreq is not cleared before returning to the dispatcher. This would
seem to suggest that UNC-based connections only last for the length of a file
system request if the reference count drops to zero. This is not the case and we'll
see why when we look at how UNC-based disconnection occurs.
FSD Disconnecting
Some examples of disconnection are removing a drive letter mapping to a remote
server and share name, and automatic disconnection after a period where a
connection is not used.
Drive-based disconnection
The calling arguments include pifs, a pointer to the ifsreq structure; drvnum,
the one-based local drive which is to be unmapped; and ForceLevel, the force
level to use for the disconnection. There are four force levels which are
interpreted differently depending on the resource connected to. In the case of a
drive-based disconnection, force levels 0 and 1 will fail if there are any open files on
the mapped drive or if it is the current drive, whereas force level 2 closes open
files and then disconnects the drive, but will fail if it is the current drive, and force
level 3 closes open files and disconnects the drive even if it is the current drive.
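The force-level rules for a mapped drive can be written as a small decision function. This is a sketch of the rules as just described, not _UseDel's actual logic; the enum, the function name, and the two Boolean inputs are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of the pre-check for a drive-based disconnection. */
typedef enum {
    DISC_FAIL,            /* the request is refused                  */
    DISC_OK,              /* disconnect, nothing needs to be closed  */
    DISC_OK_CLOSE_FILES   /* close open files first, then disconnect */
} disc_result;

disc_result check_drive_disconnect(int force_level, bool open_files,
                                   bool is_current_drive)
{
    assert(force_level >= 0 && force_level <= 3);

    if (force_level <= 1)                       /* levels 0 and 1 */
        return (open_files || is_current_drive) ? DISC_FAIL : DISC_OK;

    if (force_level == 2 && is_current_drive)   /* level 2 */
        return DISC_FAIL;

    /* level 2 on a non-current drive, or level 3 in any case */
    return open_files ? DISC_OK_CLOSE_FILES : DISC_OK;
}
```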
_UseDel performs the following steps when called with a mapped-drive argument:
IFSMgr_PNPEvent(DBT_DEVICEREMOVECOMPLETE, drvnum, PNPT_VOLUME | DBTF_NET)
UNC-based disconnection
UNC-based connections persist as long as the connection's reference count does
not drop to zero. Some actions on a connection keep the connection open until
the actions are explicitly undone, e.g., opening a file will increment the reference
count until a close on that file decrements the reference count.
Other file system requests will only keep the reference count incremented for the
duration of the operation. For example, checking the file attributes on an explicit
UNC pathname will create a UNC-based connection via a call to _PathToShRes.
After the request has been completed, the dispatcher will check for a non-zero
ifs_psr member of the ifsreq structure. If it is non-zero, IoreqDerefConnection will
be called to decrement its reference count. If the reference count drops to zero,
then the sr_flags for the shell resource are checked for the IFSFH_RES_UNC
attribute. If this attribute is set, the connection is not immediately disconnected, as
would be the case with a drive-based connection. Instead, the shell resource's
reference count is left as zero to mark the connection for removal.
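The difference between the two kinds of connection at dereference time can be sketched like this. The types below are hypothetical stand-ins that model only the members mentioned above (ifs_psr, sr_flags, and the reference count); RES_UNC is a made-up value standing in for IFSFH_RES_UNC, and the connected flag is an invention of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RES_UNC 0x01u          /* hypothetical stand-in for IFSFH_RES_UNC */

typedef struct {
    unsigned flags;            /* models sr_flags                     */
    int      refcount;         /* models the connection's use count   */
    bool     connected;        /* invented for the sketch             */
} shres_sketch;

typedef struct {
    shres_sketch *psr;         /* models ifs_psr */
} ifsreq_sketch;

/* Models the dispatcher's call to IoreqDerefConnection on completion. */
void deref_connection(ifsreq_sketch *req)
{
    assert(req != NULL);
    shres_sketch *sr = req->psr;

    if (sr == NULL)            /* ifs_psr was cleared: nothing to do */
        return;

    assert(sr->refcount > 0);
    if (--sr->refcount == 0 && !(sr->flags & RES_UNC))
        sr->connected = false; /* drive-based: disconnect at once */

    /* UNC-based: stays connected with a zero count, which marks the
     * connection for later removal. */
}
```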
In order for one of these marked UNC-based connections to get removed it needs
to "age" a few minutes. To handle the aging of these connections and their
eventual removal, IFSMgr schedules a recurring event every 120 seconds. The event
handler walks the list of current connections and looks for two special connection
states. The first state is a UNC-based connection which has a reference count of
zero. When a connection with this state is found it is advanced to the next state.
UNC | 0x02.
If the connection to this particular server and share gets used before it is
removed, the state gets reset on a call to IoreqDerefConnection. However, if the
MONOCFSD is on the companion diskette. This makes a good example for
introducing the structure of an FSD since we don't have to worry about IOS or
network protocol details. In the next section, we'll look at an example of a
remote FSD, FSINFILE.
Features
Basically, MONOCFSD is an FSD for a standard 80x25 monochrome display
adapter. It associates a single device name, MONO, with the character device.
Multiple file handles can be opened on MONO. It accepts independent writes on
these separate open handles. Any programming language that supports file open,
file write, and file close can use MONO as an output device. Multiple processes
can write to MONO simultaneously. MONO is equally accessible from Win32,
Win16, and DOS/V86 operating environments.
Output to the MONO device is buffered in the driver. A primitive keyboard
interface allows scrolling of the display using line up, line down, and clear screen
operations, using keys on the numeric keypad.
MONOCFSD fails initialization if a monochrome display adapter is not detected.
Design
The design centers on using a file model to interact with the monochrome display
device. A client uses the MONO device much like one would use stdout, except
that an explicit open is required. Thus for a client to use MONO, an open is
performed, which returns a handle if successful. Output is sent to the device by
performing writes to the handle. A separate line buffer is managed for each
handle. A line will be displayed when either a carriage return and line feed are
received or the 80 character buffer fills. Thus, all screen output is in complete
lines. This allows multiple processes to interleave lines of output. The combined
output of all clients is stored in a 200-line buffer. Normally, only the most recent
25 lines are displayed. A line-up operation will scroll back through the buffer by
one line; a line-down operation will scroll forward through the buffer by one line.
A keyboard interface to the scroll operations is achieved by assigning each to a
hotkey.
MONOCFSD supports up to 10 clients; this is an arbitrary limit. MONOCFSD loads
as a static VxD.
Implementation
During Device Init phase, MONOCFSD registers with IFSMgr using
IFSMgr_RegisterCFSD, passing it the address of a mount function and the single
device name, MONO. Device names are passed as an array of pointers to path
elements, with the end of array marked by a NULL pointer. The device name is
given by the PathElement { 10, 'M', 'O', 'N', 'O' }. The first element is the total
length of the array which is 5 * sizeof(WORD) since the characters are in Unicode.
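In C, the path element and the NULL-terminated name list might be declared as below. This is a sketch under assumptions: WORD is taken to be a 16-bit unsigned short, and the names mono_name, device_names, and path_element_chars are hypothetical, not taken from the MONOCFSD source.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short WORD;   /* assumed 16 bits wide */

/* Length word (in bytes, counting itself) followed by Unicode characters. */
const WORD mono_name[] = { 10, 'M', 'O', 'N', 'O' };

/* List of device names; a NULL pointer marks the end of the array. */
const WORD *const device_names[] = { mono_name, NULL };

/* Number of characters in a path element, excluding the length word. */
size_t path_element_chars(const WORD *pe)
{
    assert(pe != NULL && pe[0] >= sizeof(WORD));
    return pe[0] / sizeof(WORD) - 1;
}
```

Here the leading 10 equals sizeof(mono_name), and path_element_chars(mono_name) is 4.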
The mount function for MONOCFSD will get called the first time the "MONO"
device is accessed. The source for the mount function is shown in Example 8-4.
The mount function exchanges parameters with IFSMgr using the ioreq structure.
As input, MONOCFSD receives the resource handle that IFSMgr is using to track
this device (what we have referred to as a shell resource). MONOCFSD does not
interpret this handle but does store it away, in ifs_resource_hdl, for possible
future use in calls to certain IFSMgr services. MONOCFSD returns to IFSMgr a
pointer to the structure containing all of the volume-based entry points. This
address is placed in the ioreq member ir_vfunc. This structure is shown in
Example 8-5. The other value returned to IFSMgr is a resource handle known
only to the FSD. This handle is placed in the ioreq member ir_rh. It can be the
address of an internal data structure or other guaranteed unique value. IFSMgr
does not interpret this value; it simply passes it in on calls into MONOCFSD
corresponding to this particular mount. The FSD can use this value to validate calls and
also to distinguish mounts under different device names. As an example, the
screen might be split into scrolling and non-scrolling regions, and these could be
given separate device names. The non-scrolling screen might be treated as a
fixed-size file, using a file seek to position the output cursor. For our needs it is
sufficient to use the unique integer value 'MONO'.
The volume-based function table in Example 8-5, which supplies the linkage to
IFSMgr, provides a function for every entry in the array. For most of the functions,
the routine FailFsdCall is used. This function sets the ir_error member of the
ioreq structure to ERROR_INVALID_FUNCTION and returns that value. This
informs IFSMgr that the function is not implemented. The functions which are
implemented include FS_OpenFile, FS_Ioctl, and FS_Disconnect. Of these,
FS_Disconnect has the simplest implementation; it just sets ir_error to
ERROR_SUCCESS and returns that value. This allows MONO to be dismounted without
returning an error.
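The pattern of filling unimplemented slots with a common failure routine can be sketched as follows. This is not the table from Example 8-5: ioreq_sketch models only ir_error, the two-slot table and its slot names are hypothetical, and MonoDisconnect is an invented name for the FS_Disconnect handler. The two error values are the standard Win32 codes.

```c
#include <assert.h>
#include <stddef.h>

#define ERROR_SUCCESS          0
#define ERROR_INVALID_FUNCTION 1

typedef struct { int ir_error; } ioreq_sketch;  /* only the member used here */
typedef int (*fsd_func)(ioreq_sketch *pir);

/* Shared handler for every entry point the FSD does not implement. */
int FailFsdCall(ioreq_sketch *pir)
{
    assert(pir != NULL);
    return pir->ir_error = ERROR_INVALID_FUNCTION;
}

/* Disconnect always succeeds, so the device can be dismounted. */
int MonoDisconnect(ioreq_sketch *pir)
{
    assert(pir != NULL);
    return pir->ir_error = ERROR_SUCCESS;
}

/* Hypothetical two-slot table; the real table has more entries, in an
 * order fixed by IFSMgr. */
enum { SLOT_UNSUPPORTED, SLOT_DISCONNECT, SLOT_COUNT };
const fsd_func vol_table[SLOT_COUNT] = { FailFsdCall, MonoDisconnect };
```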
The FS_Ioctl function, shown in Example 8-6, is required to support the Int 21h
function 4400h, Get Device Data. For all other Ioctl functions, an
ERROR_INVALID_FUNCTION error code is returned. The ir_flags member of the ioreq
structure contains the Ioctl subfunction number, and only subfunction 0 is
checked for. Depending on the value of the ir_options member, a pointer to the
client registers structure is retrieved from either ir_data or ir_cregptr (ir_aux2).
Within the client registers, bit 7 of DX is set to 1 to indicate that the handle in BX
refers to a device.
else {
return (pir->ir_error = 1);
Only two handle-based functions are supported for the MONO device: FS_WriteFile
and FS_CloseFile. The remainder of the functions in the handle-based
function table (see Example 8-8) call into FailFsdCall, to indicate that they are not
implemented.
else {
return (pir->ir_error = 1);
The last function that we'll take a look at is FS_WriteFile, shown in Example 8-10.
On each write, ir_length contains the number of characters written, ir_data
contains a pointer to the buffer containing the characters to be written, and ir_fh
contains the particular MONO handle to which to write the data. The handle in
ir_fh is validated by checking that it is contained in the OpenHandles array. If the
handle is found to be valid, then the handle is cast to a pointer to a line buffer
structure. The characters in the buffer at ir_data are transferred into the line
buffer starting at the current line buffer index. If a carriage return/line feed pair is
encountered or if the line buffer fills (80 characters), the accumulated line is
written to the monochrome monitor, using the MonoPrint function. The index
into the line buffer is then reset to the beginning and the process continues until
ir_length is exhausted. Multiple writes to a handle may be made before the
assembled line is actually written to the monitor.
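The line assembly described above can be sketched in portable C. This is not the code from Example 8-10: line_buf, line_buf_write, and mono_print_stub (a stand-in for MonoPrint that merely records the last line) are hypothetical names, and as a simplification the sketch does not recognize a carriage return/line feed pair that is split across two writes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINE_WIDTH 80

typedef struct {
    char   text[LINE_WIDTH + 1];  /* one pending line plus a terminator */
    size_t index;                 /* next free position in text         */
} line_buf;

int  lines_printed;               /* bookkeeping for the stand-in below */
char last_line[LINE_WIDTH + 1];

/* Stand-in for MonoPrint: records the line instead of displaying it. */
void mono_print_stub(const char *line)
{
    strcpy(last_line, line);
    lines_printed++;
}

static void flush_line(line_buf *lb)
{
    lb->text[lb->index] = '\0';
    mono_print_stub(lb->text);
    lb->index = 0;                /* start assembling the next line */
}

/* Transfers length characters into the handle's line buffer, emitting a
 * line at each CR/LF pair or whenever 80 characters have accumulated. */
void line_buf_write(line_buf *lb, const char *data, size_t length)
{
    assert(lb != NULL && data != NULL);
    for (size_t i = 0; i < length; i++) {
        if (data[i] == '\r' && i + 1 < length && data[i + 1] == '\n') {
            flush_line(lb);
            i++;                  /* skip the line feed as well */
            continue;
        }
        lb->text[lb->index++] = data[i];
        if (lb->index == LINE_WIDTH)
            flush_line(lb);
    }
}
```

Because each handle owns its own line_buf, two clients can write partial lines at the same time without their characters being mixed within a line.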
Using MONOCFSD
To illustrate how one might use MONOCFSD, we'll show typical usage from a C
program. First, the device must be opened using statements like the following:
    FILE* fMono;
    fMono = fopen("mono", "r+");
Then, at points where output is to be displayed, any of the standard C stream I/O
functions could be used with the fMono stream. For example, the following lines
output a single line of text:
    fprintf(fMono, "In function %s, SomeVariable=%lx\n",
            "SomeFunc", SomeVar);
    fflush(fMono);
Since stream I/O is buffered by default, fflush forces the text to be written
immediately. Another way to accomplish this is to use the functions setbuf or setvbuf to
disable buffering for the stream. Finally, the program would release the MONO
handle with a call to fclose. MONO might also be used from a DOS box as a
target for redirection, as in the command dir > mono.
Features
FSINFILE creates a file called fsif.bin in the windows directory. The creation of
and reads and writes to this file are done using the IFSMgr_Ring0_FileIO service.
Internally, fsif.bin contains the structure of a simple file system. It is divided into
three sections: allocation bitmap, root directory entries, and user space. The unit
of user space is a 512-byte sector. For each sector in user space, there
corresponds a single bit in the allocation bitmap. If a bit is set, the sector is allocated;
otherwise it is free. Directory entries hold the 8.3 names of files which are stored
in user space, as well as a creation date and time, size, attributes, and a map of
allocated sectors. This is not a "production" file system, but it does provide a great
test-bed for experimenting with FSD functions and exploring interactions with
IFSMgr. A production remote file system would also supply a Network Provider
DLL to support drive enumeration and other WNet functions.
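An allocation bitmap of this kind takes only a few lines of C. The sketch below is not FSINFILE's code: the names, the 64-sector volume size, and the bit order within each byte are assumptions. Only the rule that a set bit means an allocated sector comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SECTORS 64u    /* hypothetical size of user space, in sectors */

typedef struct {
    unsigned char bits[NUM_SECTORS / 8];   /* one bit per sector */
} alloc_bitmap;

/* Non-zero if the sector's bit is set, i.e. the sector is allocated. */
int sector_is_allocated(const alloc_bitmap *bm, unsigned sector)
{
    assert(bm != NULL && sector < NUM_SECTORS);
    return (bm->bits[sector / 8] >> (sector % 8)) & 1;
}

/* Marks the first free sector as allocated; returns it, or -1 if full. */
int alloc_sector(alloc_bitmap *bm)
{
    assert(bm != NULL);
    for (unsigned s = 0; s < NUM_SECTORS; s++) {
        if (!sector_is_allocated(bm, s)) {
            bm->bits[s / 8] |= (unsigned char)(1u << (s % 8));
            return (int)s;
        }
    }
    return -1;
}

/* Clears the sector's bit, returning it to the free pool. */
void free_sector(alloc_bitmap *bm, unsigned sector)
{
    assert(bm != NULL && sector < NUM_SECTORS);
    bm->bits[sector / 8] &= (unsigned char)~(1u << (sector % 8));
}
```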
Implementation Notes
The source code for FSINFILE is amply documented, so refer to the companion
disk for complete information. Here, I will just single out one aspect of its
implementation that is a little unusual. The file system registers through
IFSMgr_RegisterNet as a network FSD. I use a "bogus" Net ID, i.e., a value which lies
outside the range of currently assigned networks. This registration returns a
provider ID which is used with subsequent IFSMgr services.
If you think about it, a remote FSD just maps local file operations to operations in
another domain. This applies equally well to our situation except instead of our
file system residing on another machine across the network, it resides on our
machine and it is embedded in a local file.
The main reason for using this approach is that it is the simplest way to create a
drive. IFSMgr provides facilities which make connections to network drives easy
to set up and tear down. This facility is supported through the services
IFSMgr_UseAdd and IFSMgr_InitUseAdd. I use the latter because it allows us to create the
drive implicitly at system startup by assigning it the next available drive in the
range of available drives as shown in Figure 8-3 (the upper limit is set by the
LastDrive command if it is issued in config.sys, otherwise the default is either 26, or if
you have the Netware client installed, 32). IFSMgr_InitUseAdd uses the supplied
provider ID and use_info_2 structure to create a properly formed IFS request to
the service IFSMgr_SetupConnection. The latter prepares the
FS_ConnectNetResource call into the FSD which matches the provider ID. This initial
call is used to mount our file system, by either creating or opening the file
fsif.bin and initializing the file system's internal state.
VFAT: The Virtual FAT File System Driver
The FAT file system was invented in 1977 as a method for storing data on floppy
disks for Microsoft Stand-Alone Disk BASIC. It achieved wider usage in 1981 as
the floppy disk storage mechanism used by MS-DOS Version 1 shipped with the
first IBM PC. At that time, the OS code ran in 8 KB of memory and 5.25" floppy
disk media only had a single level directory. With the introduction in 1982 of the
IBM PC-XT with a 10 MB fixed disk, MS-DOS underwent a major revision. In
MS-DOS Version 2, we saw the introduction of a hierarchical directory structure,
support for fixed disks as well as floppy disks, and a UNIX-like handle-based file
structure. Filenames were a maximum of 8 characters long with a 3 character
extension and a pathname could be up to 64 characters long. Since then, the
various releases of MS-DOS have extended support for larger and larger hard
disks, but much of the underlying file structure has remained unchanged.
VFAT was introduced with Windows for Workgroups Version 3.11. Up until that
time, the manipulation of file system structures in Windows 3.x was done by
MS-DOS code executing in virtual-86 mode. Although the actual FAT file structures
on the disk still mirrored those of MS-DOS 5 and 6.x, VFAT and IFSMgr provided
file system services that executed in ring-0 protected mode.
The latest version of VFAT which accompanied the rollout of Windows 95 goes
further by making some changes to the FAT file structures on the disk in order to
support long filenames. Even more recent changes to VFAT, in OEM Service
Release 2 (October 1996), increased the size of entries in the file allocation table
from 16 bits to 32 bits, thereby increasing the maximum allowable drive size to 2047
gigabytes.
The role of VFAT is to control reads from and writes to the disk in accordance
with the FAT file structure. It understands how to convert a pathname into the
chain of disk clusters and then return the contents of those sectors. Or, it can
reverse direction and create long filename directory entries from a pathname and
allocate clusters of storage and save a file's image within them. Before we dig into
some aspects of VFAT's implementation, let's review the FAT file structure. In
large measure, the DOS 6.x structure remains the same in Windows 95.
A volume will usually contain space for two FATs which are mirror images of
each other. The extra FAT is used to detect disk corruption and allows recovery
from some minor FAT problems.
Following the two FATs, space is set aside for the root directory entries. This is a
part of the disk structure that has undergone some change with the Windows 95
version of VFAT. We will take a closer look at directory entries below. The space
following the root directory entries is available for user data, and the first sector
here marks the beginning of cluster number 2.
The boot record is always present as the first sector whether the volume is boot
able or not. In addition to containing the OS boot code, it begins with a
BOOTSECTOR structure which describes the layout of the disk volume. This
includes such parameters as the size of a sector, the size of a cluster, the number
of sectors used up by the FAT, the number of entries in the root directory, and
the total number of sectors in the volume.
The information in the boot record is sufficient to delineate the starting positions
of all of the important volume structures. The diskette accompanying this book
contains the utility DUMPDISK, which displays the contents of the boot record,
portions of the FATs, and the root directory entries for a fixed disk or floppy diskette.
It is a Win32 console application (see source on the diskette) that illustrates use of
the DeviceIoControl interface to VWIN32 to do direct disk reads. Some sample
output from DUMPDISK is shown in Example 9-1. In this particular example, a
fixed disk of 455 MB, sectors 0 through 467 are set aside for the boot record, the
FATs, and the root directory entries. The first sector available for allocation to files
and subdirectories is at 468.
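The arithmetic behind such a layout can be sketched as follows. The structure and function names are hypothetical (they are not the BOOTSECTOR member names), and the sketch assumes the conventional FAT12/FAT16 layout of reserved sectors, then the FATs, then a fixed-size root directory of 32-byte entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of the boot record parameters. */
typedef struct {
    unsigned bytes_per_sector;
    unsigned sectors_per_cluster;
    unsigned reserved_sectors;   /* includes the boot record itself */
    unsigned num_fats;
    unsigned sectors_per_fat;
    unsigned root_entries;       /* 32 bytes each */
} fat_layout;

/* First sector available for files and subdirectories (cluster 2). */
unsigned long first_data_sector(const fat_layout *v)
{
    assert(v != NULL && v->bytes_per_sector != 0);
    unsigned long root_bytes   = 32UL * v->root_entries;
    unsigned long root_sectors =
        (root_bytes + v->bytes_per_sector - 1) / v->bytes_per_sector;
    return v->reserved_sectors +
           (unsigned long)v->num_fats * v->sectors_per_fat +
           root_sectors;
}

/* Logical sector at which a given cluster (numbered from 2) begins. */
unsigned long cluster_to_sector(const fat_layout *v, unsigned long cluster)
{
    assert(cluster >= 2);
    return first_data_sector(v) + (cluster - 2) * v->sectors_per_cluster;
}
```

For a standard 1.44 MB diskette (512-byte sectors, 1 reserved sector, two 9-sector FATs, and 224 root directory entries) first_data_sector returns 33. The parameters of the 455 MB disk in the sample output are not reproduced here.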
Note that the Chksum column is blank except for the longname entry "Program
Files." The checksum is only used on longname entries; however, the checksum
is calculated on its associated alias entry (which follows on the next line).
Windows 95 Directory Entries 1 79
} LONGDIRENTRY, *PLONGDIRENTRY;
Since each longname entry can hold 13 characters, if a filename is longer than
that, additional longname entries are needed to store the additional characters.
The first byte in a longname entry serves as an integral sequence number starting
at 1. The sequence number of the last longname entry is ORed with 40h. A typical
sequence of longname entries is shown in Example 9-3.
C         ilename        --vshr  1e
2         ee_direntry_f  --vshr  1e
1         This_is_a_thr  --vshr  1e
THIS_I~1  a-----  05-20-96 16:57:08  05-20-96  247c  8
This sample sequence of entries (shown in Example 9-3) consists of three
longname entries followed by a single alias entry. The filename which is spread over
the three longname entries is This_is_a_three_direntry_filename. The sequence
numbers are 1, 2, and C (43h). Adjacent to the first longname entry is the alias
entry which contains an 8.3 format name, THIS_I~1, which is a capitalized and
compressed version of the long filename. The alias entry is crucial for recording
the actual attributes, creation date/time, starting cluster, and file size. The
checksum value which is stored in the longname entry is computed on the alias
name. This provides a means for reconciling a longname entry with an alias entry.
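The entry count, the sequence bytes, and the checksum can be worked through in a few lines of C. Treat this as a sketch: the function names are hypothetical, and the checksum routine is the rotate-right-and-add algorithm commonly documented for FAT long filename entries, which this chapter does not spell out. It operates on the 11-byte alias as stored in the directory (8 name characters and 3 extension characters, space padded).

```c
#include <assert.h>
#include <stddef.h>

#define LFN_CHARS_PER_ENTRY 13u
#define LFN_LAST_FLAG       0x40u

/* Number of longname entries needed for a name of name_len characters. */
size_t lfn_entry_count(size_t name_len)
{
    assert(name_len > 0);
    return (name_len + LFN_CHARS_PER_ENTRY - 1) / LFN_CHARS_PER_ENTRY;
}

/* Sequence byte of the n-th entry (1-based) out of count entries. */
unsigned char lfn_sequence_byte(size_t n, size_t count)
{
    assert(n >= 1 && n <= count && count < LFN_LAST_FLAG);
    unsigned char seq = (unsigned char)n;
    if (n == count)
        seq |= LFN_LAST_FLAG;      /* the last entry is ORed with 40h */
    return seq;
}

/* Checksum over the 11-byte alias: rotate right one bit, add next byte. */
unsigned char lfn_checksum(const unsigned char alias[11])
{
    unsigned char sum = 0;
    for (int i = 0; i < 11; i++)
        sum = (unsigned char)(((sum & 1) << 7) + (sum >> 1) + alias[i]);
    return sum;
}
```

For the 33-character name This_is_a_three_direntry_filename, lfn_entry_count gives 3 and the last sequence byte is 43h. Applied to the alias THIS_I~1 padded with three spaces, lfn_checksum yields 1Eh.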
The first thing you'll notice is that the boot sector has expanded. Actually, the
SDK does not define a BOOTSECTOR structure as was the case with MS-DOS.
Instead you have to piece together a "BOOTSECTOR32" structure like this:
typedef struct _BOOTSECT32 {
    BYTE bsJump[3];      // jmp instruction
    char bsOemName[8];   // OEM name and version
    ...
} BOOTSECTOR32, *PBOOTSECTOR32;
The structure named A_BF_BPB is a new expanded BPB (BIOS Parameter Block)
for FAT32. It is documented in the SDK and it is this portion of the
BOOTSECTOR32 structure where the change has occurred. If you look back at
the DUMPDISK output, a range of entries in the BOOTSECTOR area are marked
with asterisks. These members are either new to the FAT32 BPB or are "widened"
members, i.e., they have expanded from 16 to 32 bits. The Reserved Sectors entry
tells us the number of sectors before the start of the first FAT; in this case it is 20h
or 32 sectors. On this particular drive, only 6 of these sectors are put to use. Four
of these sectors are used for the boot sector, two for a primary copy and two for
a backup copy. Two sectors are now needed for the boot sector because the BPB
has expanded in size causing the boot code to spill over to another sector.
The other two sectors are for a primary and a backup copy of an FSINFO sector.
The SDK describes the structure in this way:
...there is a sector in the reserved area on FAT32 drives that contains values for
the count of free clusters and the cluster number of the most recently allocated
cluster. These values are members of the BIGFATBOOTFSINFO (FAT32) structure
which is contained within this sector. These additional fields allow the system to
initialize the values without having to read the entire file allocation table.
This sector is sandwiched between the two boot sectors on this particular drive.
Another peculiarity about FAT32 partitions is that the BPB indicates they have 0
root directory entries. Instead of specifying a fixed number of entries, a FAT32
root directory is treated like a file. It has a minimum size consisting of a single
starting cluster but can be expanded by adding more clusters to its chain. Note
that for the example FAT32 DUMPDISK output above, the first available cluster on
the drive is also the first cluster of the root directory.
As its name implies, FAT32 File Allocation Tables contain 32-bit cluster numbers.
The SDK notes that "...the high 4 bits of the 32-bit values in the FAT32 file
allocation table are reserved and are not part of the cluster number. Applications that
directly read a FAT32 file allocation table must mask off these bits and preserve
them when writing new values." The first cluster which can be allocated is
number 2. A look at the FAT tables reveals a 0xffffffff at this location; this signifies
the end of a cluster chain.
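The masking rule quoted from the SDK might be coded as shown below. The function and macro names are hypothetical, and the end-of-chain test uses the conventional threshold of 0FFFFFF8h, which is an assumption not stated in the text above.

```c
#include <assert.h>
#include <stdint.h>

#define FAT32_CLUSTER_MASK  0x0FFFFFFFu   /* low 28 bits: cluster number */
#define FAT32_RESERVED_MASK 0xF0000000u   /* high 4 bits: reserved       */

/* Cluster number held in a FAT32 entry, with reserved bits masked off. */
uint32_t fat32_cluster(uint32_t entry)
{
    return entry & FAT32_CLUSTER_MASK;
}

/* New entry value that preserves the reserved bits of the old entry. */
uint32_t fat32_update(uint32_t old_entry, uint32_t new_cluster)
{
    assert((new_cluster & FAT32_RESERVED_MASK) == 0);
    return (old_entry & FAT32_RESERVED_MASK) | new_cluster;
}

/* Non-zero if the entry marks the end of a cluster chain (assumed range). */
int fat32_is_end_of_chain(uint32_t entry)
{
    return fat32_cluster(entry) >= 0x0FFFFFF8u;
}
```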
The sample code for DUMPDISK illustrates some techniques for determining
whether a system supports FAT32 and whether a particular drive is a FAT32 drive.
It also includes typedefs for some of the FAT32 data structures.
occupied by drivers that handle physical aspects of disk I/O and are referred to in
the DDK as the Layered Block Device Drivers. The layered driver model in
Windows 95 is implemented in a VxD called the I/O Supervisor (IOS).
The subject of the IOS could easily fill another book.* Here, we will be content
with addressing only two aspects of IOS: the types of drivers which make up the
layered model and the role which IOS serves.
Some of the common types of block device drivers which Windows 95 uses fall
into these categories, arranged from highest to lowest:
Volume trackers
The volume tracking driver, or VTD, makes sure that the target drive for an
incoming request matches the media that is actually in the drive. The VTD is
only needed for drives which have removable media, e.g., floppy drives and
CD-ROMs.
Type-specific drivers
All devices of a certain class have a common type-specific driver, or TSD. The
TSD is responsible for casting the logical view of a device, as it is viewed
from an FSD, into its physical view. This might involve translating a logical
block address into the physical head, cylinder, and sector. TSDs also know
about drive partitions and are able to match up a volume identifier with a
subsection of a fixed disk as defined in its master boot record.
Vendor-supplied drivers
Several slots in the hierarchy are set aside for vendor-supplied drivers, or
VSDs. This is a provision for adding vendor-specific functionality for a device
by inserting an auxiliary driver in the path of I/O requests.
SCSI manager and miniport drivers
The SCSI device architecture is inherited from Windows NT. The SCSI
Manager is a device independent layer that abstracts the behavior of SCSI
controller cards. A miniport driver is the lower layer which supports the SCSI
manager for a specific type of SCSI adapter.
Port drivers
For non-SCSI controller cards a port driver is required. The port driver
controls the hardware. It does such things as write to I/O ports, program
DMA transfers, and service hardware interrupts, in order to take control of a
disk drive or other device which is attached. Port drivers are also inherited
from Windows NT.
* For more extensive coverage, see the DDK, and Walter Oney's book, Systems Programming for
Windows 95, Chapter 15, "Block Device Drivers." For a higher-level account, see Chapter 7, "The Filesystem,"
Real-mode mappers
In cases where no protected-mode driver exists for a piece of hardware, calls
to a real-mode driver are passed from protected mode to real mode using this
type of driver.
Here are a couple of examples. A standard floppy disk drive is represented by
drivers from three layers. It has a volume tracking driver for detecting a media
change, a disk-type specific driver, and a port driver for an NEC floppy controller
card. An IDE fixed disk also has three layer drivers. It has a disk-type specific
driver, a miscellaneous port driver (layer 19), and a port driver for an IDE
controller card.
Three basic services which IOS supplies to clients are IOS_Register,
IOS_SendCommand, and IOS_Requestor_Service. IOS_Register is the means by which IOS
becomes aware of a driver. It receives a DRP (Driver Registration Packet)
structure. IOS_SendCommand is the interface for submitting I/O requests: an I/O request
packet, or IOP, is passed through this interface and IOS routes it through the
driver layers using the calldown chain. The IOS_Requestor_Service interface
supplies a number of utility functions for clients.
As drivers initialize during startup, each driver for a device specifies the level at
which it wishes to be called in the layered hierarchy. In response, IOS builds a
chain of target functions, the calldown chain, in the correct order. Later, when an
I/O request is routed to a device, the order of the functions in the calldown chain
determines the order in which the layered drivers will be called. When a driver
receives an I/O packet, it decides what to do with it; it may decide to pass it
down the chain, or possibly complete the request and not pass it down.
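The idea of a calldown chain can be modeled with a linked list of handlers. This is a conceptual sketch only: every type and function name below is hypothetical, and the central routing loop is a simplification of the sketch (it is not how IOS structures the chain internally, and none of the IOS packet layouts are modeled).

```c
#include <assert.h>
#include <stddef.h>

typedef struct iop_sketch {
    int layers_visited;        /* how many drivers saw the request     */
    int completed;             /* set once some layer has completed it */
} iop_sketch;

typedef enum { PASS_DOWN, COMPLETE } layer_action;
typedef layer_action (*layer_handler)(iop_sketch *iop);

typedef struct calldown_node {
    layer_handler         handler;
    struct calldown_node *next;     /* next lower layer, or NULL */
} calldown_node;

/* Offers the request to each layer in chain order until one completes it. */
void route_request(calldown_node *head, iop_sketch *iop)
{
    assert(iop != NULL);
    for (calldown_node *n = head; n != NULL; n = n->next) {
        iop->layers_visited++;
        if (n->handler(iop) == COMPLETE) {
            iop->completed = 1;
            return;
        }
    }
}

/* Two trivial layers: one passes requests on, one completes them. */
layer_action pass_through(iop_sketch *iop)  { (void)iop; return PASS_DOWN; }
layer_action complete_here(iop_sketch *iop) { (void)iop; return COMPLETE; }
```

With a chain of three nodes in which the middle layer completes the request, the bottom layer is never called.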
The DRP_LGN member specifies the driver's load group and initialization layer.
Each bit of DRP_LGN corresponds to one of 32 initialization layers. The lower the
bit, the higher the layer and the later it will be initialized. At the top of the
hierarchy is IFS manager, followed by FSDs, etc. The DRP_LGN value also informs
IOS of the driver's registration type. Noncompliant registration is used for FSDs
and IFS drivers; this means the driver will not receive AEP (asynchronous event
packet) notifications at its asynchronous event routine. Since VFAT supplies an
asynchronous event routine (Async_Event_Rtn), it uses a load group of DRP_TSD,
giving it the same initialization order as a type-specific driver.
The DRP_ilb member supplies the address of an ILB (IOS linkage block) structure,
which IOS will fill in before returning. The members of this structure are
shown in Example 9-6. This structure contains several IOS entry points for
requesting services.
    ebp_pir->ws.ior_error = 0;
    ebp_pir->ws.b35 = 0;
    ebp_pir->ws.hi_options = 0;
    ebp_pir->ws.w38 = 0;
    retc = _MountVol();
    ebp_pir->ir_vfunc = VolFunc;
    _Release_Level2();
    _asm btr dword ptr D1_9E66, 00
default:
    retc = ERROR_INVALID_FUNCTION;
    break;
of the DOS DPB (Disk Parameter Block) chain, ir_mntdrv (ir_aux2) contains the
drive number of the volume to be mounted, and ir_fh contains the address of
IFSMgr's as yet unfilled shell resource structure. On return, ir_rh will contain
VFAT's resource handle for the volume and ir_vfunc will contain the address of
the table of volume-based entry points.
The first four lines in FS_MountVolume (see Example 9-7) initialize members of
the structure WS. Recall that the ir_fsd member of ioreq is a 64-byte "provider
work space" for use by FSDs. VFAT puts this entire area to use.
Most of the logic for mounting the volume is implemented in the routine
_MountVol. It reads the first logical sector of the volume, which should be a DOS
boot sector. Using the BOOTSECTOR structure (see the Microsoft MS-DOS
Programmer's Reference, Version 5 or newer, for a description of this structure) at
the beginning of this sector, VFAT creates a Resource Block structure for the
volume and adds it to a doubly-linked list of such structures. The C volume gets
special treatment; if it is being mounted, the DOS DPB structure is compared field-by-field
with corresponding members of the Resource Block structure. If there is a
mismatch, an error message is displayed via VMM's Fatal_Error_Handler service.
as using the BIOS Int 13h interface. The first step is to ask IOS to allocate an IOP
(I/O Request Packet). An IOS service request is made by pushing the address of
an ISP (IOS Services Packet) on the stack and calling the address of the IOS
service routine in the ILB_service_rtn member of the ILB.
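The shape of such a service request can be sketched in C. Everything below is a stand-in rather than the DDK definitions: SvcHeader, SvcCreateIop, LinkageBlock, and the function code are hypothetical names, and fake_service_rtn merely plays the role of the routine IOS would place in the linkage block.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { SVC_CREATE_IOP = 0 };      /* hypothetical function code */

/* Common header: the first two members of every services packet. */
typedef struct {
    uint16_t func;                /* which service is requested */
    uint16_t result;              /* filled in by the service routine */
} SvcHeader;

/* Packet for a "create IOP" request; the layout is illustrative only. */
typedef struct {
    SvcHeader hdr;
    uint16_t  iop_size;           /* bytes to allocate for the packet */
    uint32_t  delta_to_ior;       /* offset of the IOR inside the IOP */
    void     *iop_ptr;            /* out: the allocated packet */
} SvcCreateIop;

/* Linkage block: the supervisor fills in its entry point at registration. */
typedef struct {
    void (*service_rtn)(SvcHeader *pkt);
} LinkageBlock;

/* Stand-in for the supervisor's service routine. */
void fake_service_rtn(SvcHeader *pkt)
{
    if (pkt->func == SVC_CREATE_IOP) {
        SvcCreateIop *req = (SvcCreateIop *)pkt;  /* header is the first member */
        req->iop_ptr = calloc(1, req->iop_size);
        pkt->result = (uint16_t)(req->iop_ptr ? 0 : 1);
    } else {
        pkt->result = 1;          /* unknown service */
    }
}

/* Client side: build the packet, call through the linkage block, and
   return a pointer to the IOR embedded in the new IOP (NULL on failure). */
void *create_iop(const LinkageBlock *ilb, uint16_t size, uint32_t delta)
{
    SvcCreateIop req = { { SVC_CREATE_IOP, 0 }, 0, 0, NULL };
    assert(delta < size);
    req.iop_size = size;
    req.delta_to_ior = delta;
    ilb->service_rtn(&req.hdr);
    if (req.hdr.result != 0)
        return NULL;
    return (char *)req.iop_ptr + delta;
}
```

The point of the sketch is the calling convention: one entry point, with the packet's leading function code selecting the service and the remaining members varying per service.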
The form and content of the ISP varies from service to service. Example 9-8
shows how the ISP is structured for an ISP_CREATE_IOP service.
The first two members of this structure are common to all ISP structures, and the
remaining members are unique to the Create IOP call. The following members are
initialized prior to making the call: ISP_func is set to ISP_CREATE_IOP, ISP_IOP_size
is set to pVRP->VRP_max_req_size, ISP_delta_to_ior is set to pVRP->VRP_
I'll confine our discussion to just the elements of IOR that are initialized for the
boot sector read; for more details on the IOR structure see the Windows 95 DDK
documentation for layered block drivers.
The bits which are not set in the IOR_flags member are more revealing than those
which are. IORF_CHAR_COMMAND flag clear implies IOR_xfer_count refers to
sectors rather than bytes. IORF_SYNC_COMMAND flag clear implies that the
command is asynchronous and IOR_callback is called on completion. IORF_LOGICAL_START_SECTOR
flag clear implies that IOR_start_addr is a physical
address which is in the range pVRP->VRP_partition_offset to pVRP->VRP_partition_offset
+ total sectors in the volume.
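To make the "flag clear" reasoning concrete, here is a small C sketch that decodes the three flags and computes a physical start sector. The bit values are placeholders that I have assumed for illustration (the real ones come from the DDK headers), and RequestTraits, describe_request, and physical_start are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit values; the real definitions live in the DDK headers. */
#define F_CHAR_COMMAND         0x0001u
#define F_SYNC_COMMAND         0x0002u
#define F_LOGICAL_START_SECTOR 0x0004u

typedef struct {
    int count_in_bytes;    /* otherwise the transfer count is in sectors */
    int synchronous;       /* otherwise completion arrives via a callback */
    int start_is_logical;  /* otherwise the start address is physical */
} RequestTraits;

/* Decode the three flags discussed in the text. */
RequestTraits describe_request(uint32_t flags)
{
    RequestTraits t;
    t.count_in_bytes   = (flags & F_CHAR_COMMAND) != 0;
    t.synchronous      = (flags & F_SYNC_COMMAND) != 0;
    t.start_is_logical = (flags & F_LOGICAL_START_SECTOR) != 0;
    return t;
}

/* With the logical-sector flag clear, the caller supplies a physical
   sector: the partition offset plus the sector's index in the volume. */
uint32_t physical_start(uint32_t partition_offset,
                        uint32_t volume_sector,
                        uint32_t total_sectors)
{
    assert(volume_sector < total_sectors);
    return partition_offset + volume_sector;
}
```

For the boot sector read, the volume-relative sector is 0, so the physical start address is simply the partition offset.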
The address of the IOR is placed in the ESI register and EDI is set to the address
of the DCB (Device Control Block) for the physical device which holds the
volume. Then the IOS_SendCommand service is invoked to perform the read. This
call sets the wheels in motion by passing the request down through the layers of
the IOS subsystem. Before the disk access is completed, IOS_SendCommand will
return, since VFAT made an asynchronous request.
Upon return from IOS_SendCommand, VFAT suspends the current thread until
IOR_callback is called. To coordinate the suspension and resumption of the
thread, the first two doubleword elements of IOR's _ureq member are used; the
first doubleword is used as a simple flag and the address of the second doubleword
serves as a blocking identifier.
Example 9-10 shows the code used to suspend the thread. Interrupts are disabled
to assure that the test and call to block are treated as an "atomic" operation. The
doublewords at EBX+2Ch and EBX+30h are elements in the _ureq member of
IOR. Bit 0 of the first element is set by the callback handler once the requested
service completes. So on the first execution of this loop, the bit test will return
with the carry flag clear, and the function Cli_Block_Thread will be called. This
function takes the address of a blocking identifier; it increments the contents of
that address and then calls IFSMgr_Block. IFSMgr_Block, in turn, is a wrapper for
the VMM service _BlockOnID, which is passed the same blocking identifier and
the flags BLOCK_ENABLE_INTS and BLOCK_SVC_INTS. These flags force interrupts
to be re-enabled.
The function Cli_Block_Thread will not return until the blocking identifier is
signaled. This, of course, is done in the callback handler and the code fragment
which achieves this is shown in Example 9-11. The Wakeup_Thread function is a
wrapper to IFSMgr_Wakeup which, in turn, is a call to _SignalID with the given
blocking ID.
When control does return from Cli_Block_Thread, the bit test will set the carry
flag, and execution will resume at the label continue. The IOR_status member
will then reveal whether the request was successful. If an error is reported by
IOS, _MountVol calls the IOS service IOSMapIORSToI21 to convert the error code
into an equivalent Int 21h error code before returning.
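The test-then-block loop can be modeled in portable C. This is a single-threaded simulation, not VFAT's code: there is no real thread or interrupt here, block_on_cell simply runs the pending completion as if the disk interrupt had fired while the thread slept, and all names (WaitCell, pending_completion, and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two doublewords, as in the text: a flag and a blocking identifier. */
typedef struct {
    uint32_t done;      /* bit 0 is set by the completion callback */
    uint32_t block_id;  /* its address serves as the blocking identifier */
} WaitCell;

typedef void (*Completion)(WaitCell *cell);

/* Simulation only: the completion that "arrives" while the thread is blocked. */
Completion pending_completion = NULL;

/* Callback side: mark the request done; real code would then signal the ID. */
void on_request_done(WaitCell *cell)
{
    cell->done |= 1u;
}

/* Stand-in for blocking: bump the identifier, then let the pending
   completion run, as if the interrupt fired while we were asleep. */
void block_on_cell(WaitCell *cell)
{
    Completion fn = pending_completion;
    assert(fn != NULL);           /* otherwise this simulation would hang */
    cell->block_id++;
    pending_completion = NULL;
    fn(cell);
}

/* Waiting side: test the flag, block if it is clear, retest on wakeup.
   Returns how many times the caller had to block. */
int wait_for_done(WaitCell *cell)
{
    int blocks = 0;
    while ((cell->done & 1u) == 0) {
        block_on_cell(cell);
        blocks++;
    }
    return blocks;
}
```

The retest after wakeup is the important part of the pattern: the waiter trusts the flag, not the mere fact that it was resumed.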
if ((action != 2) ||
51 if (carry_flag)
52
Since we are tracing the open of an existing file, the function _OpenExisting will
be called. If the function succeeds, the carry flag will be clear on return and the
action variable will be assigned the return value ACTION_OPENED, and execution
will continue at line 1 (with the label store_results) in Example 9-14. If the
carry flag is set on return, the open failed and execution continues at line 26
(with the label error_exit) in Example 9-14.
After a successful open of an existing file, return values are extracted from the
VFAT File Instance Block and File Open Block structures. These values are stored
to the ir_size, ir_dostime, and ir_attr members of the ioreq structure. The value
of the action variable, returned by _OpenExisting, is stored to ir_options.
The common cleanup code starts at line 27 where the first if clause checks if an
allocation needs to be freed or just placed on the free list. Then at line 36, the
current error code value (O if no error) is stored to the ir_error member of the
ioreq structure. At line 37, a check is made to see if the file open was for a ring-0
swapper file or memory-mapped file; if so, special action is taken here.
This top level view of FS_OpenFile reveals some interesting aspects of VFAT's
implementation, but we need to descend to lower levels to see how the file is
located on the disk and to learn more about the File Instance and Open Block
structures.
This function starts out by extracting the path-parsing flags which were passed
into FS_OpenFile in the upper word of the ir_attr member of the ioreq structure.
This is accomplished by the call to _Init_PathAttribs on line 8. The path-parsing
flags as well as other path-related attributes are combined into ebp_pir->ws.path_attribs,
a word-sized member of the ioreq's working area, WS structure.
Next, on lines 10 through 13, the validity of the access and sharing modes is verified.
If invalid values are detected here, the error code ERROR_INVALID_ACCESS
(0x0c) is returned to the caller and the carry flag is set. These operations are
combined in the macro return_carry().
At line 15, the EAX register is initialized with the address of ir_ppath, the pointer
to the ParsedPath structure for the canonicalized input filename. A special case
is checked at line 16, where this address is NULL, signifying an open using an
SFTOpenInfo structure. In this situation, the address of this structure is contained
in the ir_uFName member of the ioreq structure. This is passed via the ESI
register to the function _SFT_Open, where the file is opened not by pathname,
but by logical cluster number, directory entry index, and an 8.3 FCB-style name.
The IFS specification states that, "This special kind of open is issued by the IFS
manager when it is taking over a file handle left open by a TSR before booting
into Windows." We are more interested in the other half of the if clause which
starts at line 22.
The _FindPath function, which is called at line 22, attempts to walk the disk
through each of the path elements in the ir_ppath member of the ioreq structure.
It follows a sequence like this: For each path component, starting from the
root, locate the directory entry for the path component (using the function _FindDirEntry).
A "located" path component has a pointer to a cache buffer containing
the corresponding directory entry. The starting cluster of the directory entry is
then used to retrieve the next directory level, where an attempt is made to locate
the next path component. This process is repeated for all the components in the
path and ultimately, if a filename is specified, it is searched for in the last located
directory.
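The component-by-component walk can be illustrated with an in-memory tree. This is a simplified sketch and not VFAT's _FindPath: the DirEntry type is hypothetical, the children pointer stands in for "read the directory at the entry's starting cluster," names are compared exactly rather than with IFSMgr's matching rules, and no caches are involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory directory entry. */
typedef struct DirEntry {
    const char            *name;
    int                    is_dir;
    const struct DirEntry *children;   /* stand-in for the starting cluster */
    size_t                 nchildren;
} DirEntry;

/* Search one directory level for a component of the given length. */
const DirEntry *find_dir_entry(const DirEntry *dir, const char *comp, size_t len)
{
    size_t i;
    for (i = 0; i < dir->nchildren; i++) {
        const DirEntry *e = &dir->children[i];
        if (strlen(e->name) == len && memcmp(e->name, comp, len) == 0)
            return e;
    }
    return NULL;
}

/* Walk a backslash-separated path from the root, one component at a
   time. Returns the final entry, or NULL if any component is missing
   or a non-directory appears in the middle of the path. */
const DirEntry *find_path(const DirEntry *root, const char *path)
{
    const DirEntry *cur = root;
    assert(root != NULL && path != NULL);
    while (*path != '\0') {
        size_t len;
        if (*path == '\\') {           /* skip separators */
            path++;
            continue;
        }
        if (!cur->is_dir)
            return NULL;
        len = strcspn(path, "\\");
        cur = find_dir_entry(cur, path, len);
        if (cur == NULL)
            return NULL;
        path += len;
    }
    return cur;
}
```

Each successful lookup supplies the directory to search for the next component, which is the role the starting cluster plays on disk.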
_FindPath also makes use of the Path Cache and the Name Cache. Before starting
to walk the disk for a pathname, it consults the Path Cache to see if it holds an
entry for the path portion of a filename. If it finds an entry, the starting cluster for
the specified directory is returned, thereby saving one or more directory entry
traversals. Similarly, the Name Cache is consulted to see if it has an entry for the
filename portion of the pathname. If it does, the starting cluster and directory
entry index for the file are used to vector more directly to the file's contents.
Eventually, when _FindPath returns, the EBX register contains a pointer to the
directory entry structure for the file, if the search was successful. An error return is
indicated by setting either the carry flag, the zero flag, or both, and returning an
error code. On a successful return, the attribute byte in the directory entry is
checked for read-only attributes (see line 31). If this is true, then some special
actions are taken in lines 32 through 40.
The next significant event occurs at line 43. Here, the call to _Add_Open_Instance
uses the information in the file's directory entry to fill in VFAT's file structures.
The first of these structures is a File Instance Block; the address of this block
becomes VFAT's file handle which is returned in the ir_fh member of ioreq. The
second structure is an Open File Block, which is added to VFAT's table of open
files. Only one Open File Block is created for each unique file, whereas a separate
File Instance Block is created for each file open or create. Note that Vcache_Hold
and Vcache_Unhold calls are used to make sure that the cache block for the
directory entry is not discarded while it is in use during the _Add_Open_Instance
call.
Finally, before returning from _OpenExisting, some of the file is loaded into the
cache. This is accomplished by the call to _ReadAhead at line 57.
On entry, _FindDirEntry clears and initializes its workspace buffer, null terminates
the path element it receives, and then makes an initial read from the specified
start sector. The read may actually be avoided if the sector is found in the cache.
Following this initialization, the search loop begins. Here are the various steps
taken:
Next entry:
• If the first byte of the directory entry is 0, then the end of the used portion of
the directory has been reached. Go to Match failed.
• Examine the attribute byte of the directory entry in the cache buffer; if it is a
Ofh attribute, go to Long entry. Otherwise, go to Short entry.
Short entry:
• Copy the 8.3 BCS (byte character set) filename and extension from the directory
entry to the workspace buffer.
• Create a Unicode FCB style name using IFS manager's BCSToUni service to
convert the BCS filename and extension.
• Use the IFS manager service FcbToShort to convert the Unicode FCB style
name to a Unicode 8.3 name with a dot separating primary and extension
components.
• If a longname buffer exists which has been created from long directory
entries preceding the alias directory entry, go to Alias entry.
• Now use the IFS manager service, IFSMgr_MetaMatch, to compare the input
Unicode path component with the Unicode 8.3 name created from the directory
entry. For this example, the UFLG_NT flag is passed to this service to
select NT matching semantics.
If a match is found, go to Match attributes; otherwise, continue at the label
Increment entry.
Long entry:
• If this directory entry has the last-in-sequence indicator (it is encountered
first), the number of directory entries in this sequence is determined from the
first byte of the entry and stored as a counter. The checksum byte for the
shortname alias is also saved.
• For all long directory entries, append the Unicode characters in the fields of
the directory entry to a longname buffer and decrement the entry count. If
the directory entry does not have the last-in-sequence indicator, compare its
checksum against that which was initially saved. Go to Increment entry.
Alias entry:
• A checksum is calculated on the 11-character name in the alias directory entry
and it is compared against the value found in the preceding long directory
entries.
• If the path component is a filename, and the path portion was added to the
Path Cache, then the filename portion is added to the Name Cache.
• Now use the IFS manager service IFSMgr_MetaMatch to compare the input
Unicode path component with the long filename created from the one or
more long directory entries. For this example, the UFLG_NT flag is passed to
this service to select NT matching semantics. If this match succeeds, perform
an uppercase comparison with the alias name up until the first "~" character
is encountered. If this also succeeds, go to Match attributes.
• If the previous compare fails, use IFSMgr_MetaMatch to compare the input
Unicode path component with the Unicode 8.3 name created from the alias
directory entry. For this example, the UFLG_NT flag is passed to this service to
select NT matching semantics. If this match succeeds, go to Match attributes,
otherwise go to Increment entry.
Match attributes:
• If the directory attributes match the input criteria, then go to Match return;
otherwise go to Increment entry.
Increment entry:
• The directory index is incremented and the cache buffer pointer is advanced
to the next directory entry. If the cache pointer exceeds the cache block
range, then the cache block for the next sector will have to be filled.
• If the end of the directory is reached go to Match failed; otherwise go to Next
entry.
Match return:
• Replace the null termination of the path component with the original value.
• Set EAX to 0.
• The EBX register points to the short or alias directory entry for the match.
Match failed:
• Replace the null termination of the path component with the original value.
• Set carry flag to indicate failure.
0 unused ?
4 unused ?
8 POPEN_BLK first_open_block
c POPEN_BLK last_open_block
When a file is opened, an OPEN_BLK structure is created for it and the first INST_BLK
structure is created to reference it. As new file handles are requested on the
open file, additional INST_BLK structures are created to reference the single OPEN_BLK
structure. Initially, the pfirst_inst and plast_inst members of the OPEN_BLK
point to the single INST_BLK structure. As new instances of the file are opened,
each new INST_BLK is added to the head of the list at pfirst_inst. The INST_BLK
structure contains pnext and pprev members for traversing forwards and backwards
through the list of instances. The last pnext pointer and the first pprev
pointer point to the referenced OPEN_BLK structure. There is also a pob member
which points to the common OPEN_BLK structure.
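The relationship between the two structures can be sketched in C. Only the member names mentioned above (pfirst_inst, plast_inst, pnext, pprev, pob) come from the text; the layouts are hypothetical, and as a simplification the list ends are marked with NULL here rather than pointing back at the OPEN_BLK.

```c
#include <assert.h>
#include <stddef.h>

struct OpenBlk;

/* One per open or create of the file. */
typedef struct InstBlk {
    struct InstBlk *pnext;
    struct InstBlk *pprev;
    struct OpenBlk *pob;      /* the shared per-file block */
} InstBlk;

/* One per unique open file. */
typedef struct OpenBlk {
    InstBlk *pfirst_inst;
    InstBlk *plast_inst;
} OpenBlk;

/* Link a new instance at the head of the file's instance list. */
void add_instance(OpenBlk *ob, InstBlk *inst)
{
    assert(ob != NULL && inst != NULL);
    inst->pob = ob;
    inst->pprev = NULL;
    inst->pnext = ob->pfirst_inst;
    if (ob->pfirst_inst != NULL)
        ob->pfirst_inst->pprev = inst;
    else
        ob->plast_inst = inst;    /* first instance is also the last */
    ob->pfirst_inst = inst;
}

/* Count the instances by walking forward from the head. */
size_t count_instances(const OpenBlk *ob)
{
    size_t n = 0;
    const InstBlk *p;
    for (p = ob->pfirst_inst; p != NULL; p = p->pnext)
        n++;
    return n;
}
```

Since new instances go on the head, plast_inst always names the oldest open of the file and pfirst_inst the newest.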
To determine if the open should succeed, VFAT calls the service IFSMgr_CheckAccessConflict.
One of the arguments to this service is the address of an
enumeration function. This function is called by IFSMgr for each open instance of
the file. On each call to the enumeration function, VFAT returns information
about an instance of the open file. The enumeration function returns 1 for enumeration
to continue and 0 for enumeration to stop. When the enumeration is
complete, IFSMgr_CheckAccessConflict returns 0 if the desired access and sharing
mode can be granted, or an error code if not.
Virtual Memory,
the Paging File,
and Pagers
Virtual memory and paging have been the topics of numerous texts. If you would
like some background in these areas, I recommend Operating System Concepts, by
Abraham Silberschatz and Peter Galvin (Addison-Wesley, March 1994), especially
Chapter 8 on memory management and Chapter 9 on virtual memory. Paging in
Windows 95 is, of course, dependent on hardware support in the x86 family of
microprocessors. Many books have described the details of page directories, page
tables, and page faults of the Intel microprocessors; Programming the 80386 by
John Crawford and Patrick Gelsinger is one that I refer to frequently. This background
is really essential to understanding this chapter, although I'll throw in a
brief refresher for some of the thornier topics.
Paging is not new to Windows 95. Earlier versions of Windows utilized the paging
capability of the 386 and 486. Andrew Schulman's article, "Exploring Demand-Paged
Virtual Memory in Windows Enhanced Mode," in Microsoft Systems Journal,
December 1992, examines paging in Windows 3.1. More recently, Matt Pietrek, in
Chapter 5 of his book, Windows 95 System Programming Secrets, looks at
memory paging as a prelude to his in-depth discussion of Win32 memory
management.
the root directory with system attributes and accessed via either the Windows
block device driver or Int 13h. In the \windows directory another file was created
called spart.par, which gave the size and location of 386spart.par.
Windows 3.x also had the option to use a temporary swap file which it created
while Windows was running and deleted automatically on exit. It also could grow
or shrink as necessary. This was a DOS file with normal attributes, called
win386.swp. Since access was via Int 21h in virtual-86 mode, performance
suffered compared to the fixed file option. Although the temporary swap file was
not a popular option with Windows 3.x users, it is the only option available in
Windows 95.
Paging or Swapping?
A leisurely scan of the Microsoft Windows 95 Resource Kit reveals several references
to the Windows 95 swap file. For instance, in Chapter 17 on Performance
Tuning, there is a section on "Optimizing the Swap File," and in Chapter
31 on Windows 95 Architecture there is a section on "Windows 95 Swap File."
The file that is being referred to is stored under the filename win386.swp. The
term swapping has traditionally referred to the process of moving entire processes
to and from the disk (see Operating System Concepts, pp. 303-304). This
is not the mechanism used by Windows 95. The technically correct term is paging.
The distinction is that a pager moves page-sized chunks (4096 bytes) of
code or data to main memory from the disk but only when that page is needed.
On the other hand, a swapper brings in the code and data for the entire process,
while moving a process to disk to make room. You will see the terms
swapping and paging used interchangeably in Windows 95 documentation.
1. Launch MultiMon.
2. Select only the FSHook and BOOTMGR monitors in the Add/Remove Drivers
dialog that you get from the Options Menu, Add/Remove Drivers...
command. FSHook will allow us to capture file system events and, when
used in conjunction with BOOTMGR, we can capture events during system
startup.
3. Bring up the Filter Options dialog by clicking the Filters button on the
toolbar. Select "IFSMgr Filehook" and then check the boxes for the following
[Figure 10-1: MultiMon trace of file system requests logged during system startup (listing not recoverable). Legible entries include an FS_OpenFile (d5|50) to VFAT on handle 0200 with flags "oa spn" for C:\WIN386.SWP, a series of FS_GetDiskInfo (36|01) requests to VFAT on drive C, and FS_WriteFile (d6|50) requests on handle 0200 with flags "-sn" at positions 0H@100000H, 0H@180000H, and 0H@200000H. BOOTMGR tags such as SysCritInit, DeviceInit, and InitComplete separate the groups.]
In Figure 10-1, groups of lines are separated by tags that BOOTMGR inserts to
flag the stages of system initialization: "**** DeviceInit", "**** InitComplete",
etc. The third line in the listing shows an FS_OpenFile command being
sent to VFAT for the file named c:\win386.swp. The field d5|50 indicates the
dispatched command and accompanying flags. Referring back to Chapter 6,
Dispatching File System Requests, we know that the command d5 corresponds to a
ring-0 open or create, the function I named dR0_OpenCreate (see Table 6-5). The
flags byte 50 signifies the LFN and IFSMgr_Ring0_FileIO bits. These pieces of information
point to an IFSMgr_Ring0_FileIO call and in this case the subfunction R0_OPENCREATEFILE.
We can read more into this call from the flags which accompany the open. The
characters "oa" signify ACTION_OPENALWAYS, meaning open an existing file but
if it doesn't exist, create it. The special options "spn" are "s" for R0_SWAPPER_CALL,
"p" for OPEN_FLAGS_NO_COMPRESS, and "n" for OPEN_FLAGS_NO_CACHE.
Another thing to note is that the value 200h (ir_sfn) is the first value in
the range of extended file handles.
Scanning down the listing, you will also note a few FS_WriteFile calls on this
extended file handle using "-sn" attributes: R0_SWAPPER_CALL and R0_NO_CACHE.
It's interesting that the length of the writes is 0 but the position of the
write is not, e.g., 0H@100000H. This initial write sets the size of win386.swp to 1
megabyte. If we were to extend our logging and launch some applications, we
would see FS_WriteFile and FS_ReadFile calls on the handle 200h with lengths
which are a multiple of 1000h, the size of a page.
To sum up, we have found that Windows 95, like Windows 3.x, uses a temporary
file called win386.swp for its paging file. While Windows 3.x used only virtual-86
DOS calls to access this file, Windows 95 uses IFSMgr's ring-0 APIs (when the
underlying hardware supports it). As we have seen, these APIs are a thin veneer
to the underlying FSD, VFAT. VFAT in turn utilizes IOS services. These changes
have breathed new life into what was a sluggish Windows 3.x option.
FSHook has a registry option for just such a need. This is not a feature that most
users will want to experiment with, so it is left as a registry entry that is set manually
using REGEDIT. In Figure 10-2, the registry values under the MultiMon_fshook
key are shown. The value name "Int3On" will not be defined unless you
have experimented with this feature already. To add this value, select the menu
Edit, submenu New, followed by DWORD Value. Type in Int3On for the value
name. The DWORD associated with this is a Boolean, 1 for "on" and 0 for "off."
When the breakpoint occurs, execution stops on the instruction following the Int
3. The actual code, in both C and indented assembly, is shown in Example 10-1.
Here you see the call to the previous file system hook function, which looks a
little strange because of the double indirection involved, (*(*ppPrevHook)). Using
the debugger to step forward we can watch as each of the arguments is pushed
onto the stack in preparation for calling down into the FSD. Right now, I'm interested
in seeing who is making this call, so I won't step into the FSD code, but
rather step over it. By continuing to step through code we work our way up
through the series of nested functions which initiated the call into FS_OpenFile.
DYNAPAGE, internally, in its Device Descriptor Block, this driver goes by the
name PAGEFILE. PAGEFILE and PAGESWAP are not new to Windows 95. They
are revamped versions of their Windows 3.x counterparts.
Fortunately, we are given the entire source code for the PAGEFILE (DYNAPAGE)
driver; it can be found in the Windows 95 DDK directory \base\samples\ . .
At this point there are two possibilities: paging is provided through the virtual-86
DOS Int 21h interface, or paging is provided through IFSMgr's ring-0 APIs. We'll
only show the ring-0 case for the remainder.
• Uses IFSMgr_Ring0_FileIO subfunction R0_OPENCREATEFILE to create the
paging file with normal attributes. Performs the create using the special flags:
R0_SWAPPER_CALL, R0_NO_CACHE, and OPEN_FLAGS_NO_COMPRESS.
• Uses IFSMgr_Ring0_FileIO subfunction R0_WRITEFILE to set the initial length
of the paging file to the value specified by MinPagingFileSize. If this fails
then tries again using a different value. If the system has less than 9 megabytes
of RAM under control of the memory manager (as reported by _GetDemandPageInfo),
then sets the file size to 9216 Kbytes (amount of physical
RAM in Kbytes). Otherwise retries with a size of 0.
• On success, returns the maximum paging file size in EAX (in pages) and the
current paging file size in EBX (in pages).
The call trace shown in Figure 10-1 also reveals several calls to FS_GetDiskInfo.
Those which are marked by command and flag bytes of 36|01 are the result of
Int 21h function 36h requests. Note that only the ANSI code page flag is set, so
these calls are not invoked using IFSMgr_Ring0_FileIO. Instead, they originate as
Exec_VxD_Int calls in PageFile_Get_Size_Info. This latter function reports the
minimum, maximum, and current size of the paging file. The amount of free
space on the disk containing the paging file enters into the calculations of these
parameters. PageFile_Get_Size_Info, in turn, is called by two VMM services:
_GetDemandPageInfo and _PageGetAllocInfo.
The last four lines shown in Figure 10-1 are two pairs of FS_GetDiskInfo and FS_WriteFile
calls. Both of these calls are made via IFSMgr_Ring0_FileIO. Each pair of
calls corresponds to a single call to PageFile_Grow_File requesting that the paging
file grow by 80h pages (512 Kbytes). Growing and shrinking the paging file is an
ongoing process. Any service that commits "swappable pages" (e.g., _PageCommit)
adds that number of pages to a running total. The requests are not acted
on until the total outstanding exceeds the current paging file size by at least 80h
pages. Similarly, decommitting swappable pages reduces the size of the paging
file by a like amount, but the paging file is not shrunk until its new size would be
at least 80h pages less than its current size. While the growth of the paging file
occurs directly in response to committing new swappable pages, shrinking the
paging file goes on as a background process from a callback installed by the
VMM service Call_When_Idle. Pages which are allocated as fixed or which are
subsequently locked do not require space in the paging file, since they will never
be candidates for page-outs. Also, some pages use a different backing file, such as
those for memory-mapped files, and are not counted as swappable.
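The 80h-page thresholds amount to a simple hysteresis, which the following C sketch models. It is an interpretation of the behavior described above and not the PAGEFILE source: PagingState and the three functions are hypothetical, and I am assuming that the file always changes size in whole 80h-page steps.

```c
#include <assert.h>
#include <stdint.h>

#define GRANULE_PAGES 0x80u    /* 80h pages of 4096 bytes = 512 Kbytes */

typedef struct {
    uint32_t committed_pages;  /* running total of swappable pages */
    uint32_t file_pages;       /* current paging file size, in pages */
} PagingState;

/* Committing pages grows the file at once, but only in whole granules
   and only when the total outruns the file by at least one granule. */
void commit_swappable(PagingState *s, uint32_t pages)
{
    s->committed_pages += pages;
    while (s->committed_pages >= s->file_pages &&
           s->committed_pages - s->file_pages >= GRANULE_PAGES)
        s->file_pages += GRANULE_PAGES;
}

/* Decommitting only lowers the running total. */
void decommit_swappable(PagingState *s, uint32_t pages)
{
    assert(pages <= s->committed_pages);
    s->committed_pages -= pages;
}

/* Shrinking is deferred to idle time, and happens only when a whole
   granule can be released without dropping below the committed total. */
void shrink_when_idle(PagingState *s)
{
    while (s->file_pages >= GRANULE_PAGES &&
           s->file_pages - GRANULE_PAGES >= s->committed_pages)
        s->file_pages -= GRANULE_PAGES;
}
```

The granule keeps the file from being resized on every small commit or decommit, which would otherwise turn each allocation into a file system request.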
The key PAGEFILE service for moving pages to and from the paging file is
PageFile_Read_Or_Write. This service takes a single argument, a pointer to a
PageSwapBufferDesc structure (see Example 10-3). PAGEFILE converts the
parameters in this structure into an IFSMgr_Ring0_FileIO call for either R0_READFILE
or R0_WRITEFILE, depending on the value of PS_BD_Cmd.
The transfer count is equal to PS_BD_nPages * 4096 bytes. The file position at
which the operation begins is determined by PS_BD_File_Page * 4096. The paging
file remains open, so the handle returned by the OpenCreateFile call in PageFile_Init_File
is still valid and used by PAGEFILE here. Note that although we can't
explicitly specify the R0_NO_CACHE, R0_SWAPPER_CALL, and OPEN_FLAGS_NO_COMPRESS
options as we did on the OpenCreateFile call, these attributes are
stored with the fhandle structure. Before the call is passed down to the FSD,
IFSMgr propagates these attributes to the ir_options member of the ifsreq structure,
so they will be seen by FS_WriteFile and FS_ReadFile.
Pagers
Pagers are a new addition to the VMM in Windows 95. A pager is simply code
called by the VMM to move pages in and out of memory. A pager does not have
to reside in a virtual device, and in fact several pager routines are located in
KERNEL32.
Pagers are used for loading and initializing both swappable and fixed pages.
Pagers are involved during the entire lifetime of a page, from the time it is
committed until it is freed. Not all pages fall under the control of a pager though;
the exceptions include hooked pages, instanced pages, and pages committed
using the service _PageCommitPhys.
A pager exposes one or more action functions through a Pager Descriptor (PD)
structure (see Example 10-4). Each pager action function (e.g., pd_virginin) has
the following prototype:
ULONG _cdecl FUNPAGE(PULONG ppagerdata,
    PVOID ppage, ULONG faultpage);
If a function pointer member of the PD structure is zero, the pager will not be
notified when the corresponding action is taken. It is customary that a pager will
not implement all action functions.
A virtual device may register a pager with VMM using the _PagerRegister service.
This service takes a pointer to a PD structure as its only argument. It returns a
handle, actually a 1-based index, that represents the pager. This handle can be
passed to other services, such as _PagerQuery, to retrieve the pager's PD structure,
or _PagerDeregister, to remove the pager from VMM.
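A table of pager descriptors with 1-based handles might look like the following C sketch. It is not the VMM implementation: PagerDesc carries only two hypothetical action slots, PagerTable and the pager_* functions are invented names, and the meaning of an action function's return value is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PAGERS 16

/* Same shape as the action function prototype given in the text. */
typedef unsigned long (*PagerFn)(unsigned long *ppagerdata,
                                 void *ppage, unsigned long faultpage);

/* Cut-down descriptor; a NULL slot means "do not notify this pager". */
typedef struct {
    PagerFn virgin_in;
    PagerFn tainted_in;
} PagerDesc;

typedef struct {
    const PagerDesc *slots[MAX_PAGERS];
} PagerTable;

/* Returns a 1-based handle, or 0 if the table is full. */
unsigned pager_register(PagerTable *t, const PagerDesc *pd)
{
    unsigned i;
    assert(pd != NULL);
    for (i = 0; i < MAX_PAGERS; i++) {
        if (t->slots[i] == NULL) {
            t->slots[i] = pd;
            return i + 1;
        }
    }
    return 0;
}

const PagerDesc *pager_query(const PagerTable *t, unsigned handle)
{
    if (handle == 0 || handle > MAX_PAGERS)
        return NULL;
    return t->slots[handle - 1];
}

int pager_deregister(PagerTable *t, unsigned handle)
{
    if (pager_query(t, handle) == NULL)
        return 0;
    t->slots[handle - 1] = NULL;
    return 1;
}

/* Returns 1 if the pager was notified, 0 if it has no such function. */
int notify_virgin_in(const PagerTable *t, unsigned handle,
                     unsigned long *data, void *page, unsigned long faultpage)
{
    const PagerDesc *pd = pager_query(t, handle);
    if (pd == NULL || pd->virgin_in == NULL)
        return 0;
    pd->virgin_in(data, page, faultpage);
    return 1;
}

/* Sample action function: zero-fill a page on first touch. */
unsigned long zero_fill(unsigned long *ppagerdata, void *ppage,
                        unsigned long faultpage)
{
    (void)ppagerdata;
    (void)faultpage;
    memset(ppage, 0, 4096);
    return 0;
}
```

Handing out an index rather than a pointer keeps the handle small enough to store with each page, which matters once every committed page carries one.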
All system pages which are under control of a pager have such a handle associated
with them. The association is made at the time pages are committed through
_PageCommit. Here are the parameters passed in to _PageCommit:
ULONG _PageCommit(ULONG page, ULONG pages,
    ULONG hpd, ULONG pagerdata, ULONG flags);
• page is the linear page number, i.e., the linear address returned by _PageReserve
divided by 4096
• pages specifies the number of pages to commit but can be no larger than the
number of pages initially reserved by the call to _PageReserve
• hpd is the handle of the pager whose action functions will be called for these
pages. VMM supplies four internal pagers with handles 1 to 4, which are:
PD_ZEROINIT(1) for swappable zero-initialized pages
PD_NOINIT(2) for swappable uninitialized pages
PD_FIXEDZERO(3) for fixed zero-initialized pages
PD_FIXED(4) for fixed uninitialized pages
• pagerdata is a 32-bit value associated with this page or pages; if used in conjunction
with the PC_INCR flag, then pagerdata is incremented by one for
each page in the range
• flags specifies various options such as whether the pages are permanently
locked, are accessible by ring-3 applications, etc.
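Two of these parameters involve simple arithmetic that is easy to get wrong: converting a linear address to a page number, and working out which pagerdata value a given page in a PC_INCR range ends up with. The helper names below are hypothetical and the code only models the description in the list above; it does not call VMM.

```c
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/* Linear address -> linear page number (the "page" argument). */
uint32_t linear_to_page(uint32_t linaddr) {
    return linaddr / TOY_PAGE_SIZE;
}

/* pagerdata associated with the i-th page of a committed range
 * (i = 0 for the first page). A non-zero incr models PC_INCR:
 * the value grows by one per page. Without it, every page in
 * the range shares the same value. */
uint32_t pagerdata_for_page(uint32_t pagerdata, uint32_t i, int incr) {
    return incr ? pagerdata + i : pagerdata;
}
```

For example, linear address 760000h corresponds to page number 760h, and the fourth page of a PC_INCR range committed with pagerdata 100h would carry 103h.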
For each pager action function there is a corresponding column, VirginIn,
TaintedIn, etc. The addresses displayed in these columns are given as
Device(obj) + ofs, where Device is the virtual device, obj is the object or
segment number, and ofs is the offset from the beginning of the segment. A zero
indicates that the action function is not implemented for that pager. In a few
cases, a linear address is given, e.g., bff7b4b6. This is an address in KERNEL32.
If you compare the pager type with the number of functions it has implemented
you will note that SWAPPER type pagers provide the most functionality. This is
understandable, since these pagers support the movement of data to and from the
paging file. PAGERONLY type pagers do not use the system paging file, either
because the pages are fixed or because they use a different backing file.
Another item of interest is that a pager can "inherit" functions from another pager.
For instance, under the columns TaintedFree and Dirty, all pagers use the same
implementation provided by VMM.
216 Chapter 10: Virtual Memory, the Paging File, and Pagers
Ignore the descriptions column for a moment and just look at the addresses of the
action functions. Handles 8 through 12 are unique in that the action functions are
in KERNEL32's address range. Handles 5, 6, 7, 10, and 11 have action functions
that reside in VWIN32. If the description strings weren't available, this KERNEL32/
VWIN32 association would be enough to suspect that these pagers are used by
Win32.
The descriptions for the pagers with handles 5 through 12 were found by using
the .M debugging command, which is built in to VMM for both the retail and
debug versions. This command can be invoked in either WinIce or WDEB386; it
has many options and reveals a wealth of information about the internal workings
of the memory manager. The subcommand which displays the pager descriptors
is .MG.
The last pager displayed in the output, the one with handle 13, is registered by
qpagers.vxd, the helper VxD which PAGERS uses to collect the information it
displays. We will be using this pager to get a closer look at when and why the
pager action functions are called.
VMM will call the various pager functions in the PD structure, to control the life of
a page. The function pd_virginin is called to move a page into memory, if the
page is clean and has never been modified. This could involve reading a portion
from the original file on disk into the page or just initializing the page contents to
zero. The function pd_taintedin is also used to move a page into memory, but for
pages which have undergone some change. VMM also has two functions for
moving pages out of memory. The first is called pd_cleanout, which is used to
move out a page which has not been dirtied since the last time it was paged out.
The function pd_dirtyout does the same, but for pages which have not been
paged out since they were dirtied. The destination for a page out could be the
paging file or the backing file for a memory-mapped file.
The first test routine is shown in Example 10-5. The sequence that this routine
follows is very simple. It first reserves three pages of memory and then commits
the pages. It then reads a byte and writes a byte to each page. The pages are then
decommitted and then freed. Interspersed with these steps are printouts to the
debug console of several data structures. qpagers.vxd installs its own pager which
is a wrapper around calls to VMM's Swappable Zero-Init pager. As the Test1
routine executes, the calls to the pager's action functions are also logged to the
debug console. This output is shown in Example 10-6.
    TestNum = 1;
    CheckPageRange(0, 0);
    TestNum = 0;
    _Dirty(C04072BB[760], 0, 0)
    TEST1(6): _PageDecommit: linear addr = 760000
The first group of lines starts at TEST1(1). These show the page table entries for
the three pages reserved in the private arena (PR_PRIVATE). The linear address
for the first page is at 760000h, the second is at 761000h, and the third is at
762000h. The corresponding addresses of the page table entries (pPTE) are
FF801D80h, FF801D84h, and FF801D88h. These are computed using the formula:
    FF800000h + 4 * [linear page number] = pPTE
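The formula is easy to check against the three addresses quoted above. The following sketch simply evaluates it; the function name pte_address and the constant name TOY_PTE_BASE are made up for this illustration.

```c
#include <stdint.h>

/* Base linear address of the page tables, as given in the formula above. */
#define TOY_PTE_BASE 0xFF800000u

/* Each PTE is 4 bytes, so the PTE for linear page N sits at base + 4*N. */
uint32_t pte_address(uint32_t linear_page) {
    return TOY_PTE_BASE + 4u * linear_page;
}
```

Pages 760h, 761h, and 762h yield FF801D80h, FF801D84h, and FF801D88h, matching the trace.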
At this stage, the page table entries (PTE) at these locations are non-zero but the
flags in the lower 12 bits are all cleared. The number which is stored in the page
frame address is an index to an Arena Record (iAR).
After committing the pages, the PTE contents are displayed again at TEST1(2).
The lower 12 bits of flags in the PTE now have the value 206h. This corresponds
to the attributes: committed, clean, unaccessed, user, read/write, and not present.
Bits 9, 10, and 11 are not predefined by the x86 chip, and are used by the
memory manager to indicate whether the page is committed (Bit 9) and whether
the page is physically mapped (Bit 11). The number which is now stored in the
page frame address is an index to a Virtual Page (iVP). At this point, we haven't
actually made the pages physically present. We could have done that by
specifying the PC_PRESENT flag in our _PageCommit call. What we have done is first,
reserve a swath of the linear address space which is private to our memory
context, and second, commit some pages of virtual memory.
Indented under _VirginIn... is a line starting with pVP=.... This shows the
contents of a Virtual Page structure. It includes such things as the handle to the
pager, the pagerdata passed in to pd_virginin, the index to the Arena Record,
and a flags byte describing the state of the page.
In the mid-section of the Test1 routine in Example 10-5, you will notice a for loop
where the page "touching" and "dirtying" is done. A touch occurs when a
memory location in the page is read (abyte = *p), while we make the page
dirty by writing a byte to it (*p = 'a'). Examination of the PTEs immediately
following each of these program statements reveals the changes that the page is
undergoing. The dump of the PTE immediately following a touch shows that the
lower 12 bits now have the value 227h, and indicate these attributes: committed,
clean, accessed, user, read/write, and present. After a page has been dirtied, the
lower 12 bits of the PTE have the value 267h, indicating that a single attribute has
changed: it has gone from clean to dirty. Also note that since the present bit is
set, the page frame address now refers to the physical address of a page of some
system memory (it is no longer an iAR or iVP).
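The three flag values seen in the trace (206h, 227h, and 267h) can be decoded mechanically. The sketch below uses the standard x86 meanings for bits 0, 1, 2, 5, and 6, and the meaning this chapter gives for bit 9 (committed). The macro and function names are invented for the example.

```c
#include <stdint.h>
#include <stdio.h>

#define PTE_PRESENT   0x001u  /* bit 0: page is present (x86)          */
#define PTE_WRITE     0x002u  /* bit 1: read/write (x86)               */
#define PTE_USER      0x004u  /* bit 2: user-accessible (x86)          */
#define PTE_ACCESSED  0x020u  /* bit 5: accessed (x86)                 */
#define PTE_DIRTY     0x040u  /* bit 6: dirty (x86)                    */
#define PTE_COMMITTED 0x200u  /* bit 9: committed (OS-defined, per text) */

/* Write a readable description of the low 12 bits of a PTE into buf
 * and return buf. */
const char *describe_pte_flags(uint32_t pte, char *buf, size_t len) {
    uint32_t f = pte & 0xFFFu;
    snprintf(buf, len, "%s, %s, %s, %s, %s, %s",
             (f & PTE_COMMITTED) ? "committed"  : "not committed",
             (f & PTE_DIRTY)     ? "dirty"      : "clean",
             (f & PTE_ACCESSED)  ? "accessed"   : "unaccessed",
             (f & PTE_USER)      ? "user"       : "supervisor",
             (f & PTE_WRITE)     ? "read/write" : "read-only",
             (f & PTE_PRESENT)   ? "present"    : "not present");
    return buf;
}
```

Feeding it 206h, 227h, and 267h reproduces the three attribute lists given in the text, and shows that the only difference between the last two is bit 6.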
Since we dirtied some pages, we would expect to see some pd_dirty pager
function calls (here called _Dirty). VMM's memory manager does not guarantee timely
delivery of these notifications; in fact, we don't see them until we are
decommitting the pages under TEST1(6). The pd_dirty function receives a pointer to the
pVP->pagerdata for the page, but the other arguments do not appear to be valid.
VMM's PD_ZEROINIT pager handles this call by freeing the corresponding swap
file page if one has been allocated in the paging file.
As we leave the Test1 routine, we call _PageDecommit and _PageFree for the
pages which we have been using. As each page is decommitted, the pager
function, pd_taintedfree (here named _TaintedFree), is called. This call informs the
pager that this is the last reference to the Virtual Page (pVP) before the page is
decommitted. The pd_taintedfree function receives a pointer to pVP->pagerdata
but the other arguments are not valid. VMM's PD_ZEROINIT pager handles this
call by freeing the corresponding swap file page if one has been allocated in the
paging file.
After _PageDecommit returns, a dump of each page's PTE shows that it has been
reverted to its reserved state. _PageFree goes a step further by setting the PTEs to
zero.
The output from the Test2 routine is shown in Example 10-7; the source code for
this routine is similar to that for Test1 so it isn't shown here. Like Test1, Test2
reserves and commits two pages, reads from one page and writes to the other,
and then decommits and frees the pages. The additional twist added here is that
Test2 forces these two pages to get written out to the paging file.
Example 10-7. Pager Function Trace Showing Page-Outs & Page-Ins (continued)
Test2 does a couple of things to nudge these pages out. First, it makes use of the
VMM service _PageDiscardPages to mark these pages as unaccessed. An
unaccessed page will get paged out before an accessed one. You can see the
Next, Test2 needs to overcommit pages to force the memory manager to start
moving some pages from memory to the paging file. As a starting point for
determining the minimum number of pages to commit, the VMM service
_GetFreePageCount is used to determine the number of free pages in the system.
These pages are then reserved, committed, and touched to force them to be
present. Once pd_dirtyout has been called, signaling that one of our pages has
been moved to the paging file, a flag is set. If Test2 sees that this flag has been
set, it assumes it has succeeded; if it is not set, this group of pages is freed, and
the process is repeated with the same amount plus 256. At TEST2(9) in Example
10-7, you see that 4f3h pages were committed and touched, but that amount was
not sufficient, so they were freed and then 5f3h pages were tried, this time with
success. The pager functions pd_dirtyout (here named _DirtyOut) and
pd_cleanout (here named _CleanOut) were called to page out the dirty page and
then the clean page. Only two arguments to these functions are used. The first is
a pointer to pagerdata and the second is the linear address of the page's contents.
The third argument is always -1. pd_dirtyout is the primary pager function where
PageFile_Read_Or_Write is called to write the contents of a dirtied page to the
paging file. While a swappable page is in memory, the Virtual Page structure
holds the address of the page's Page Frame structure. When the page is swapped
to the paging file, the Virtual Page structure holds the Swap Frame for the page,
i.e., the offset into the paging file to find the page's contents. You can see this
under TEST2(11) at the line starting pVP=.... Here, the SF=9F entry in the VP
structure tells us that frame 9fh in the paging file contains this page.
At TEST2(10), the contents of the page's PTEs are shown after both of the pages
have been paged out. Both pages have the same attributes: committed, clean,
unaccessed, user, read/write, and not present. The page frame field of the PTE
holds the index to the page's Virtual Page structure.
At TEST2(11), the two pages are accessed by reading a byte from each of them.
For the page which had been earlier modified, the pager function pd_taintedin
(here named _TaintedIn) is called by the memory manager, requesting that the
page's contents be restored. The pager function receives a pointer to pagerdata,
which now contains the swap frame in the paging file; a pointer to a buffer where
the page can be written; and the original linear page number where this page was
committed. This pager function is the counterpart to pd_dirtyout, because this is
the primary pager function where PageFile_Read_Or_Write is used to read the
contents of a tainted page from the paging file. Since the other page was never
modified, pd_virginin (here named _VirginIn) only needs to create it from scratch
by zero-initializing the page's contents.
At TEST2(12) the PTEs for these two pages are displayed. Both pages have the
same attributes: committed, clean, accessed, user, read/write, and present. The
fact that one of the pages is tainted is stored in the Virtual Page structure flags.
Finally, at TEST2(13), we decommit and free the two pages. The page which
was tainted has the pd_taintedfree (here named _TaintedFree) function called for
it whereas the unmodified page has the pd_virginfree (here named _VirginFree)
function called for it. Both functions receive a pointer to the pVP->pagerdata
member of the Virtual Page structure; the other arguments are zero. As noted in
Test1, VMM's PD_ZEROINIT pager handles the pd_taintedfree call by freeing the
corresponding swap file page if one has been allocated in the paging file. VMM's
PD_ZEROINIT pager does not implement the pd_virginfree function.
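The pattern seen in the Test1 and Test2 traces can be summarized as a small decision table: which action function is called depends on the event and on whether the page has been modified. The sketch below encodes only what the traces show; it is not a reconstruction of VMM's logic, and the names PageEvent and pager_action_for are invented. Here "tainted" means the page has been modified at some point in its life, and "dirty" means it has been modified since it was last paged out.

```c
typedef enum { PAGE_IN, PAGE_OUT, PAGE_FREE } PageEvent;

/* Return the name of the action function observed in the traces
 * for a given event and page state. */
const char *pager_action_for(PageEvent ev, int tainted, int dirty) {
    switch (ev) {
    case PAGE_IN:   return tainted ? "pd_taintedin"   : "pd_virginin";
    case PAGE_OUT:  return dirty   ? "pd_dirtyout"    : "pd_cleanout";
    case PAGE_FREE: return tainted ? "pd_taintedfree" : "pd_virginfree";
    }
    return "";
}
```

In Test2, the page that was written follows the path pd_dirtyout, pd_taintedin, pd_taintedfree, while the page that was only read follows pd_cleanout, pd_virginin, pd_virginfree.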
Demand Page Loading 225
After VWIN32 has registered its three pagers, it proceeds to reserve and commit
pages for KERNEL32. To reserve the linear address range needed by KERNEL32, it
issues the service call _PageReserve(0xbff70, 0x8f, PR_STATIC). This will reserve
the address range BFF70000h to BFFFEFFFh.
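The address range follows directly from the page number and page count. As a quick check, this sketch converts a (page, count) pair into its first and last linear addresses; the type LinearRange and the function reserved_range are names made up for the illustration.

```c
#include <stdint.h>

typedef struct {
    uint32_t first;  /* first byte of the range */
    uint32_t last;   /* last byte of the range  */
} LinearRange;

/* Linear address range covered by npages pages starting at page number page. */
LinearRange reserved_range(uint32_t page, uint32_t npages) {
    LinearRange r;
    r.first = page << 12;                    /* page number * 4096 */
    r.last  = r.first + (npages << 12) - 1;  /* inclusive end      */
    return r;
}
```

With page 0xbff70 and 0x8f pages, the range is BFF70000h through BFFFEFFFh, as stated above.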
Next, VWIN32 commits the first page of the file image using the service call
_PageCommit(0xbff70, 1, 6, 0, PC_INCR | PC_STATIC | PC_USER). This page contains the
file's DOS header and PE (portable executable) header. From these, the layout of
the remainder of the file can be determined. In fact, the rest of the file gets
loaded based upon the contents of the PE header's section table.*
KERNEL32 contains six sections; their names, sizes, and characteristics are
summarized in Table 10-2. The VWIN32 loader looks at two characteristics of a PE
section to decide which pager to commit it with. If it is loading a read-only
section without initialized data, then pager 6 is used. If it is loading a read-only
section with initialized data, then pager 7 is used. If it is loading a writeable
section, then pager 5 is used. Here are the actual service calls which commit
KERNEL32's sections:
    _FREQASM (code)
    _PageCommit(0xbff71, 6, 6, 40000000h, PC_INCR | PC_STATIC | PC_USER)
    .text (code)
    _PageCommit(bff78h, 41h, 6, 20000007h, PC_INCR | PC_STATIC | PC_USER)
    _PageCommit(bffb9h, 1, 6, 20070048, PC_INCR | PC_STATIC | PC_USER)
    _INIT (code)
    _PageCommit(bffbah, 1, 6, 40000048h, PC_INCR | PC_STATIC | PC_USER)
    _PageCommit(bffbbh, 1, 6, 40040049h, PC_INCR | PC_STATIC | PC_USER)
    .edata (exports)
    _PageCommit(bffc0h, 4, 6, a000004dh, PC_INCR | PC_STATIC | PC_USER)
    _PageCommit(bffc4h, 1, 6, a0040051h, PC_INCR | PC_STATIC | PC_USER)
    .rsrc (resources)
    _PageCommit(bffc5h, 12h, 6, 20000052h, PC_INCR | PC_STATIC | PC_USER)
    _PageCommit(bffd7h, 1, 6, 20060064, PC_INCR | PC_STATIC | PC_USER)
* See Chapter 8 of Windows 95 System Programming Secrets, by Matt Pietrek, for details of the PE file
format.
There are two _PageCommit calls for each section because VWIN32's algorithm
commits the whole pages first and then, if it finds a remainder (a fraction of a
page), it commits one more page for it. The .data section, which is the only
section which is writeable, uses pager 5; all other sections use pager 6.
The pagerdata value supplied to these _PageCommit calls may look a little
strange. The doubleword has two fields. The most significant 10 bits hold an
index which is used to look up a file handle. The lower 22 bits hold the file offset
to the raw data to be read into a page; this is the byte offset divided by 512. Now
take that value and rotate it to the right by 3 bits. This last twist has the magic
effect of aligning bit 0 on the page digit. Since the PC_INCR flag is set for these
pages, the pagerdata values will be incremented for each page in the set. This
rotation makes sure the increment actually increases the file offset by 1000h bytes.
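The encoding is easier to follow in code. The sketch below packs a file handle index and a byte offset as described, rotates right by 3, and reverses the process. The function names are hypothetical, and the sketch covers only the two fields the text describes; the remainder-page values in the listing carry additional bits that it does not attempt to interpret.

```c
#include <stdint.h>

static uint32_t ror32(uint32_t v, unsigned n) {
    n &= 31u;
    return n ? (v >> n) | (v << (32u - n)) : v;
}

static uint32_t rol32(uint32_t v, unsigned n) {
    n &= 31u;
    return n ? (v << n) | (v >> (32u - n)) : v;
}

/* Pack: top 10 bits = file handle index, low 22 bits = byte offset / 512,
 * then rotate the whole doubleword right by 3 bits. */
uint32_t k32_pagerdata_encode(uint32_t file_index, uint32_t byte_offset) {
    uint32_t v = ((file_index & 0x3FFu) << 22) |
                 ((byte_offset / 512u) & 0x3FFFFFu);
    return ror32(v, 3);
}

/* Unpack: undo the rotation, then split the two fields. */
void k32_pagerdata_decode(uint32_t pagerdata,
                          uint32_t *file_index, uint32_t *byte_offset) {
    uint32_t v = rol32(pagerdata, 3);
    *file_index  = v >> 22;
    *byte_offset = (v & 0x3FFFFFu) * 512u;
}
```

Under this reading, index 0 with file offset 400h encodes to 40000000h, and offset 7200h encodes to 20000007h, the values used for the first pages of _FREQASM and .text in the listing. Adding 1 to an encoded value adds 8 to the sector count, i.e., 8 * 512 = 1000h bytes, which is why PC_INCR advances the file offset by exactly one page.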
Referring once again to Figures 10-3 and 10-4, you can see that pager 5 is the
same as VMM's Swappable Zero-Init pager, except that pd_virginin has been
replaced with an action function in VWIN32. This same action function is used by
pager 6 for handling both pd_virginin and pd_taintedin. This action function
switches to KERNEL32's PSP, extracts the file handle index and file offset from the
pagerdata, and then proceeds to seek to that location and read the page. The
current PSP is restored and the function returns. The seek and read are executed
using _ExecVxDintMustComplete.
It is interesting that pager 5 uses the system paging file for backing up changes to
KERNEL32's .data section. Except for the fact the section's initial contents are
loaded directly from the KERNEL32 image, the life of pages in this section will be
the same as those controlled by the PD_ZEROINIT pager.
The three pagers we just examined are only used with KERNEL32. It appears that
at one time, files other than KERNEL32 were demand-paged using this code, since
there is a file index built into the pagerdata value. Perhaps this pager is separate
because it can be put to use before the Win32 subsystem is up and running, and
thus serves as sort of a bootstrap pager.
are used but via the Win32 VxDCall interface. Rather than drill down into
KERNEL32's code, I'm going to spy on the VxDCalls for PageReserve and
PageCommit. We can use MultiMon to do this by loading the WIN32CB and FSHook
drivers and enabling the filters for VMM Win32 Services (PageReserve and
PageCommit) and IFSMgr Filehook (FS_OpenFile). To capture the trace that we'll be
looking at, press the Start button, launch the Notepad application, terminate
Notepad, and then press the Stop button.
After you hit the Show button, scroll through the output until you find the point
where notepad.exe is being opened (FS_OpenFile); you should see something
similar to the output in Figure 10-5. What we see is a trace of the Win32 loader as
it assigns pages and pagers to the sections of Notepad.
Right after the FS_OpenFile line, a PageReserve call is made with these arguments:
linear page number = 400h, number of pages = 0Ch, and flags = 10h (PR_STATIC).
This call is reserving 48 Kbytes for the file image of Notepad starting at linear
address 400000h. We can use a tool like the Explorer's Quick View to determine
Notepad's PE file sections. With this information we can interpret the sequence of
PageCommit calls as follows:
    PE header
    _PageCommit(0x400, 1, 9, 00f20000h, PC_INCR | PC_STATIC | PC_USER)
    PageCommit    0008153b 00000001 01 00000000
    PageDecommit  8153b 1 20000000
    FS_OpenFile   6c160 -- 1 -- VFAT 0209 oe C:\WINDOWS\NOTEPAD.
    PageReserve   00000400 0000000c 00000010
    PageCommit    00000400 00000001 09 00f20000 60040000
    PageCommit    00000401 00000003 09 40f00000 60040000
    PageCommit    00000404 00000001 09 40f50003 60040000
    PageCommit    00000405 00000001 01 00f30000 60060000
    PageCommit    00000406 00000001 08 e0f20003 60060000
    PageCommit    00000407 00000001 09 20f70004 60040000
    PageCommit    00000408 00000002 09 00f00005 60040000
    PageCommit    0000040a 00000001 09 00f60007 60040000
    PageCommit    0000040b 00000001 09 c0f50007 60040000
    PageReserve   80000400 00000020 00000000
    PageCommit    00000410 00000010 01 00000000 00060000
Figure 10-5. MultiMon output showing page commits when loading Notepad
This output appears to be generated by the same algorithm that is used by the
KERNEL32 loader, only different pagers are used. Pager 9, which is used to load
read-only sections of code or data, only implements pd_virginin. Pager 8, which
is used to load read-write, initialized data sections, uses the same implementation
of pd_virginin, but in other respects is a clone of PD_ZEROINIT. For uninitialized
data sections, VMM's PD_ZEROINIT pager is used. Pages which are under control
of pagers 1 or 8 are backed up by the system paging file.
Standard Win32 code for creating and accessing a mapped file is shown in
Example 10-8. You can launch this test code from pagers.exe by selecting the Test
menu, sub-item MemMapped R/O. The output shown in Figure 10-6 was collected
by MultiMon while this code executed. MultiMon had WIN32CB and FSHook
drivers loaded and the filters for VMM Win32 Services (PageReserve, PageCommit,
and PageFree) and IFSMgr Filehook (FS_OpenFile, FS_ReadFile, FS_WriteFile,
FS_FileSeek, and FS_CloseFile).
The first line of output is from an attempt to create a new copy of mapfile.tst, a
test file 64 Kbytes in length. In this case, the file had already been created, so the
create call fails, but the subsequent open of the existing file succeeds, and returns
a file handle of 264h. There are three intervening seeks, perhaps to determine the
file size, before the FS_ReadFile call. This read corresponds to the Win32
CreateFileMapping call. It is a special case where ir_length is 0 and the
R0_MM_READ_WRITE flag is set in ir_options. This combination indicates that a
memory-mapping is being created to an existing open file. This special call
originates from IFSMgr_Win32DupHandle when it is called with the
DUP_MEMORY_MAPPED flag. This service duplicates the handle 264h to 26bh before
making the FS_ReadFile call on the duplicated handle.
When the Win32 API MapViewOfFile is called, virtual memory is reserved for the
file image. Since we specified that the entire file be mapped, an equivalent
number of pages are reserved. The _PageReserve request is for 10h pages in the
shared memory area at 80060000h with the PR_STATIC flag. The subsequent
commit passes in 82869h as the linear page number, so _PageReserve must have
returned 82869000h as the base linear address of the mapping. _PageCommit
commits all 10h pages using pager 10 with PC_INCR, PC_STATIC, and PC_USER
flags. Since we requested FILE_MAP_READ, we are not given the PC_WRITEABLE
attribute and the mapping is read-only.
Next, we proceed to read the first byte of each page of the mapping. Each read
forces a pd_virginin call for a page which results in the series of FS_ReadFile calls
on the duped handle 26bh. These reads also are marked with the
R0_MM_READ_WRITE flag. Note that if a page out occurs for one of the mapped pages,
it is essentially a discard since the pages cannot enter the dirty state. A subsequent
access would restore the page using pd_virginin. At the bottom of the trace, we see
the pages being freed in response to the UnmapViewOfFile, and then the
CloseHandle calls for hMapFile and hFile.
Very similar Win32 code for creating and accessing a mapped file is shown in
Example 10-9. You can launch this test code from pagers.exe by selecting the Test
menu, sub-item MemMapped R/W. The output shown in Figure 10-7 was
collected by MultiMon while this code executed. The difference between this
example and the previous one is in granting the mapping read-write access and
writing to it.
Zeroing in on just those areas which are different in Figure 10-7, we see that
_PageCommit uses the PC_WRITEABLE attribute since we passed
FILE_MAP_WRITE to MapViewOfFile. Although we are writing a byte to each page of the
mapping, each write forces a pd_virginin call for a page which results in the
series of FS_ReadFile calls on the duped handle. Eventually, when
UnmapViewOfFile is called, we see pd_dirtyout in action as each page which has been
dirtied is written out to mapfile.tst.
Example 10-10 again illustrates very similar Win32 code for creating and accessing
a mapped file. You can launch this test code from pagers.exe by selecting the Test
menu, sub-item MemMapped WriteCopy. The output shown in Figure 10-8 was
collected by MultiMon while this code executed. The difference between this
example and the previous one is that write access is granted only to a copy of the
mapping file. This difference in behavior is brought about by subtle changes in
the flags to CreateFileMapping, which uses PAGE_WRITECOPY, and
MapViewOfFile, which here uses FILE_MAP_COPY.
Underneath the Win32 code, we can see what is going on by looking at the
MultiMon trace in Figure 10-8. When MapViewOfFile commits memory to match
mapfile.tst's file size, it uses pager 12, the one described as Win32 Copy-On-Write
Mapped File. We see this in the _PageCommit call:
    _PageCommit(82869h, 10h, 12, 00700000h,
                PC_INCR | PC_STATIC | PC_USER | PC_WRITEABLE)
            }
            CloseHandle(hMapFile);
        }
        CloseHandle(hFile);
    }
Paging aims to minimize disk access and resource usage by bringing the disk
image into memory only as needed. In the next chapter we'll look at caching,
which reduces disk access by keeping frequently used portions of the disk image
in main memory.
VCACHE: Caches Big and Small
The idea of a cache was motivated by the need to reduce costly I/O processing. It
is much faster to read a block of data from memory than it is to read the same
data from a physical disk. The cache keeps some subset of a larger collection of
data within local memory. Often, the items in the cache are determined by usage.
The most recently used items are kept in the cache, and once the cache is full,
the least recently used items are discarded to make room for new additions. This
algorithm is referred to as least recently used, or LRU.
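As an illustration of the LRU policy just described (and not of VCACHE's actual implementation), here is a minimal fixed-capacity cache in C. Every access stamps an entry with an increasing counter; when the cache is full, the entry with the oldest stamp is replaced. All names and the capacity of three entries are invented for the example.

```c
#include <string.h>

#define LRU_CAP 3   /* tiny capacity so eviction is easy to observe */

typedef struct {
    int key;
    int value;
    unsigned long stamp;  /* larger = used more recently */
    int used;             /* 0 = slot is empty           */
} LruSlot;

typedef struct {
    LruSlot slot[LRU_CAP];
    unsigned long clock;  /* incremented on every access */
} LruCache;

void lru_init(LruCache *c) {
    memset(c, 0, sizeof *c);
}

/* Look up key. On a hit, refresh its stamp, store the value, return 1. */
int lru_get(LruCache *c, int key, int *value) {
    for (int i = 0; i < LRU_CAP; i++) {
        if (c->slot[i].used && c->slot[i].key == key) {
            c->slot[i].stamp = ++c->clock;
            if (value) *value = c->slot[i].value;
            return 1;
        }
    }
    return 0;
}

/* Insert or update key. If the cache is full, the least recently
 * used entry is discarded to make room. */
void lru_put(LruCache *c, int key, int value) {
    int target = -1;
    for (int i = 0; i < LRU_CAP; i++) {          /* key already cached? */
        if (c->slot[i].used && c->slot[i].key == key) { target = i; break; }
    }
    if (target < 0) {
        target = 0;
        for (int i = 0; i < LRU_CAP; i++) {
            if (!c->slot[i].used) { target = i; break; }   /* free slot  */
            if (c->slot[i].stamp < c->slot[target].stamp)  /* older item */
                target = i;
        }
    }
    c->slot[target].key   = key;
    c->slot[target].value = value;
    c->slot[target].stamp = ++c->clock;
    c->slot[target].used  = 1;
}
```

If keys 1, 2, and 3 are inserted, key 1 is then read, and key 4 is inserted, key 2 is the one discarded, because it is the entry that has gone longest without being used.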
Windows 95 supplies vcache.vxd to provide two kinds of LRU caches to VxD
writers. The first type of cache, the block cache, deals with 4096-byte memory
pages; the size of the allocation is fixed. A separate data structure, represented by
a cache block handle, is used to track each page. It contains information such as
ownership, lookup keys, lock counts, and usage counts. This is the cache used by
VFAT when accessing the system's disk drives. The second type of cache, the
lookup cache, is suitable for small items; these items may be of variable and
arbitrary size. This cache is the in-memory image of a section of the system registry. A
lookup cache is created as a key with some maximum number of elements. The
elements are just values under the key. The LRU algorithm kicks in when the
number of values added under the key exceeds the maximum number of
elements. The registry file serves as persistent storage for a lookup cache.
Where Does Block Cache Memory Come From? 235
The official documentation for VCACHE's services is in the DDK document file
stdvxd.doc. Unfortunately, the information presented there is incomplete. This
chapter will help fill in what's missing and supply additional background
information.
where PR_SYSTEM requests that the pages be reserved anywhere in the system
arena (C0000000h-FFBFFFFFh) and PR_FIXED says do not move the pages on a
_PageReAllocate. The subsequent call, which commits some of this range to form
the initial cache, takes this form:
    _PageCommit(linBase>>12, initCache, PD_FIXEDZERO, 0, PC_FIXED)
Note that these pages are PC_FIXED, meaning that the memory is permanently
locked. Not all of the pages initially reserved are committed. Instead the following
algorithm is used to determine the initial cache size:
    mininitial = (minCache >= 64) ? 64 : minCache;
    initCache = maxCache - 1024;
    if (initCache <= mininitial) initCache = mininitial;
    if (initCache > maxCache) initCache = maxCache;
    if (initCache > 2304) initCache = 2304;
Put simply, the initial cache size will be 1024 less than the number of reserved
pages but will not exceed 2304.
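A few worked values make the clamping behavior clearer. The function below is a direct, runnable transcription of the algorithm above with one assumption added: the arithmetic is signed, so that maxCache - 1024 can go negative for small caches and be caught by the lower clamp. The function name is invented.

```c
/* Initial number of cache pages to commit, given the minimum and
 * maximum cache sizes in pages. Signed arithmetic is an assumption
 * of this sketch. */
long initial_cache_pages(long minCache, long maxCache) {
    long mininitial = (minCache >= 64) ? 64 : minCache;
    long initCache = maxCache - 1024;
    if (initCache <= mininitial) initCache = mininitial;  /* lower clamp   */
    if (initCache > maxCache) initCache = maxCache;       /* never > max   */
    if (initCache > 2304) initCache = 2304;               /* absolute cap  */
    return initCache;
}
```

With minCache = 100, a maxCache of 4096 pages gives 2304 (the cap), 2000 gives 976, 500 gives 64 (the lower clamp), and 40 gives 40 (the cache can never start larger than its maximum).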
In somewhat the same way that DYNAPAGE and PAGESWAP use legacy entries
in the system.ini file to set various parameters controlling the paging file, VCache
uses entries in the [vcache] section of the system.ini file to set parameters
controlling the block cache. The keys which VCache retrieves during initialization
are minfilecache, maxfilecache, and CacheBufRRT. The minfilecache and
maxfilecache entries are in units of kilobytes; if a value is not specified in the system.ini
file, a default of 0 is used.
initCache, the subset of reserved pages which are initially committed for use. To
get from minfilecache and maxfilecache to the final values of minCache and
maxCache, the following algorithm is used:
    max = Get_Profile_Decimal_Int("vcache", "maxfilecache", 0);  // kbytes
    min = Get_Profile_Decimal_Int("vcache", "minfilecache", 0);
    maxCache = (max + 3) / 4;                      // round up to nearest page
    minCache = (min + 3) / 4;
    numFreeLockablePages = _GetFreePageCount(0);   // returned in EDX
    if (minCache == 0)                             // using defaults?
        minCache = (numFreeLockablePages < 1280) ? numFreeLockablePages / 40 :
                                                   numFreeLockablePages / 24;
    avail = (numFreeLockablePages >= 392) ? numFreeLockablePages - 384 : 8;
    if (minCache > avail) minCache = avail;
    if (minCache <= 8) minCache = 8;
    if (maxCache > avail) maxCache = avail;
    if (maxCache > 204800) maxCache = 204800;
    if (maxCache < minCache) minCache = maxCache;
Summarizing, if your system is using defaults for its cache size, VCache will
determine these values at Device Init time from the number of lockable free pages
returned by _GetFreePageCount. If this function reports fewer than 1280 pages, the
minimum cache size is the number of free lockable pages divided by 40; if 1280
or more pages are free, this amount is divided by 24 to arrive at the minimum
size. In no case will the minimum be less than 8 pages. The default setting for the
maximum cache size is the number of free lockable pages minus 384. In no case
will the cache size exceed 204800 pages. Table 11-1 shows default initial cache
sizes for several PC configurations.
Table 11-1. Default Block Cache Sizes for Some Typical Systems
What we have described so far is the initial configuration of the cache if you were
to take a snapshot after VCache has finished its initialization. Like the swap file,
cache size is dynamic. Let's take a look at how the memory manager can make
the cache shrink or allow it to grow.
How Does the Memory Manager Control Block Cache Size? 23 7
    DWORD Take_VCache_Page() {
        DWORD linPage, numPage, iCachePage;
        if (amtShrinkCache == 0 || curCachePages <= minCachePages) return 0;
        amtGrowCache = 0;
        linPage = VCache_RelinquishPage();        // request a page
        if (linPage == 0) goto not_taken;
        numPage = linPage >> 12;                  // convert linear addr
                                                  // to page number
        if (numPage < pgnumCacheStart) goto not_taken;    // less than cache?
        iCachePage = numPage - pgnumCacheStart;   // page index
        if (iCachePage >= maxCachePages) goto not_taken;  // greater than cache?
        _FreeUsedPage(pBitMap_VCachePages, ++iCachePage); // mark page unused
        amtShrinkCache--;                         // shrunk by one page
        curCachePages--;                          // current cache size is one less
        return linPage;                           // return linear address of page
    not_taken:
        amtShrinkCache = 0;    // shrink failed, turn off further attempts
        return 0;              // no linear address returned
    }
On entry this function checks several global VMM variables before proceeding.
First, amtShrinkCache should be set to a non-zero value by the memory manager,
to indicate the number of pages to reclaim. Secondly, the current number of
pages in the cache should not drop below minCachePages; if it does then the
request is ignored. If these conditions are met, VCache_RelinquishPage is called to
get the linear address of a page within the cache. In response to this request,
VCache will first give up pages which are on its free list. Once those are
exhausted it will start searching for candidates on its LRU list. Only those which
are not held or dirty, and which have aged sufficiently, will be sacrificed.
The opposite of shrinking the cache is growing the cache, and VMM has a global
variable, amtGrowCache, which indicates how many pages to give back to
VCache. This variable is updated at one-second intervals by a timeout procedure
installed by Set_Async_Time_Out. The decision to grow the cache is based on
two statistics returned by VCache_GetStats at these one-second intervals: the
number of cache blocks which have been discarded and the number of cache hits
to the last 26 LRU cache blocks. When conditions are appropriate for growing the
cache, VMM sets up an event callback that will invoke VCache_UseThisPage.
Rather than call this function directly, VMM schedules a wrapper function,
Give_VCache_Page (my name), as an event using the Call_Restricted_Event
service. The pseudocode for Give_VCache_Page follows:
    void Give_VCache_Page(void)
    {
        DWORD iCachePage, numPage;

        if (amtGrowCache == 0) return;              // is VCache getting pages?
        while (TRUE) {
            iCachePage = _GetUnusedPage(pBitMap_VCachePages, maxCachePages);
            if (iCachePage == 0) return;
            numPage = pgnumVCacheStart + iCachePage - 1;    // new page
            if (_PageCommit(numPage, 1, PD_FIXED, 0, PC_FIXED | PC_WRITEABLE) == 0) {
                _FreeUsedPage(pBitMap_VCachePages, iCachePage);
                return;
            }
            Flush_TLB();
            Deccounter();                           /* Dl_F7E4 */
            VCache_UseThisPage(numPage << 12);      // give page to VCache
            curCachePages++;
            if (amtGrowCache == 0) break;
            amtGrowCache--;
        }
    }
This routine first checks that amtGrowCache is non-zero, i.e., there is something
to do. If so, it enters a loop where it attempts to grow the cache a page at a time
until the requested number of pages has been added. To add a page to the cache
it needs to know the linear address of a page in the cache's address range which
is currently uncommitted. _GetUnusedPage scans the bitmap of unused cache pages,
pBitMap_VCachePages, and returns the index of an unused page.
This index is converted into a page number and passed to
_PageCommit to map a physical page to a linear address in the cache. That linear
address is then passed to VCache_UseThisPage, to inform VCache that it is
available.
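The index arithmetic in the pseudocode (a return of 0 meaning "nothing found", and the "- 1" when forming the page number) suggests that bitmap indices are one-based. The following is a minimal sketch of that bookkeeping under that assumption. The names get_unused_page, free_used_page, and cache_page_linear are hypothetical stand-ins, not the VMM internals, and the bit layout of the real bitmap is not known from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for _GetUnusedPage: find the first clear bit among
 * the first max_pages bits, set it, and return its one-based index.
 * Returns 0 when every page is already in use. */
static uint32_t get_unused_page(uint32_t *bitmap, uint32_t max_pages)
{
    for (uint32_t i = 0; i < max_pages; i++) {
        uint32_t mask = 1u << (i % 32);
        if ((bitmap[i / 32] & mask) == 0) {
            bitmap[i / 32] |= mask;
            return i + 1;
        }
    }
    return 0;
}

/* Hypothetical stand-in for _FreeUsedPage: clear the bit that belongs to a
 * one-based index. */
static void free_used_page(uint32_t *bitmap, uint32_t index)
{
    assert(index != 0);
    bitmap[(index - 1) / 32] &= ~(1u << ((index - 1) % 32));
}

/* Convert a one-based bitmap index into a linear address, following the
 * pseudocode: page number first, then a shift by the 4 KB page size. */
static uint32_t cache_page_linear(uint32_t pgnum_start, uint32_t index)
{
    assert(index != 0);
    return (pgnum_start + index - 1) << 12;
}
```

With one-based indices the value 0 stays free to signal "no unused page", which is how Give_VCache_Page decides to stop.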
To be complete, I should mention one other method by which the cache can be
made to grow. VMM's Win32 service number 0x28 checks if the current cache size
is at least 128 pages. If it is not, amtGrowCache is set by the following expression:
    if (128 <= maxCachePages) amtGrowCache = 128 - curCachePages;
    else amtGrowCache = maxCachePages - curCachePages;
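As a quick check of that expression, here is a small sketch that wraps it in a function. The name grow_amount is hypothetical, and returning 0 when the cache already holds 128 pages is a simplification of "amtGrowCache is left alone".

```c
#include <assert.h>
#include <stdint.h>

#define FLOOR_PAGES 128u    /* threshold quoted in the text */

/* Number of pages needed to bring the cache up to 128 pages, or up to its
 * maximum size when the maximum is smaller than 128. */
static uint32_t grow_amount(uint32_t cur_pages, uint32_t max_pages)
{
    assert(cur_pages <= max_pages);
    if (cur_pages >= FLOOR_PAGES)
        return 0;                       /* simplification: nothing to add */
    uint32_t target = (FLOOR_PAGES <= max_pages) ? FLOOR_PAGES : max_pages;
    return target - cur_pages;
}
```

For example, a cache of 100 pages with room for 1000 asks for 28 more pages, while a cache of 50 pages capped at 96 asks for 46.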
Pages which are used to store cache blocks are tracked by an array
(pCBPagesList) of the page linear addresses. The size of this array is determined by the
maximum cache size; it is given by the formula: ((maxCachePages + 63)/64)*4
bytes. This array is allocated from the heap at Device Init time. Initially it is zero
filled, but as each page is removed from the free list to create new cache blocks,
the page's linear address is added to the first available slot in the array. Once a
page is allocated for creating cache blocks, it is never reclaimed to the free page
pool.
Pages which are used to contain data are referenced by the linear address stored
in the BufPtr member of the cache block data structure (shown below). These
pages come from the same pool of free pages. There is a one-to-one correspondence
between cache blocks and data pages.
This brings us to the cache block, the central data structure used by block cache
services. Here is the layout of this structure:
    typedef struct {
        struct cb* cb_next;     /* 00 - head of free list / collision list */
Cache blocks which are not in use are placed on a free list whose head is given
by a VCache global variable (pCBFreeList). In these cache blocks, the members
cb_next and cb_prev provide linkage for members in the list.
Cache blocks which are in use are strung together on a different list, the LRU list.
The head of this list is a pseudo-cache block in VCache's locked data area. Only
two members of this cache block are used, lru_next and lru_prev. These point to
the head and the tail of the list. The most recently used cache block is at the head
of this list, while the least recently used cache block is at the tail of this list. The
lru_next and lru_prev members provide the linkage for this doubly-linked list.
Each cache block is uniquely identified by two keys, FSKey1 and FSKey2, and a
one-byte ownership ID, FSD_ID. The FSKey1 and FSKey2 values are allowed any
values other than 0. For example, VFAT uses FSKey1 as the logical sector number
and FSKey2 as the volume resource handle. These two keys are used in conjunction
with a hash table. Each bucket or entry in the hash table consists of two
pointers. If the bucket is empty, the pointers reference the address of their
bucket. If the bucket contains one cache block, then both bucket pointers point
to the same cache block. Both of the cache block's cb_next and cb_prev pointers
refer back to the hash table bucket. If the bucket contains more than one cache
block, the first bucket pointer refers to the first cache block and the second
bucket pointer refers to the last cache block. The intervening cache blocks that
belong to the bucket are linked by the cb_next and cb_prev members. The
cb_prev pointer of the first cache block and the cb_next pointer of the last cache
block refer back to the hash table bucket. The cache blocks in a bucket have
FSKey1 and FSKey2 values which hash to the same value. This hash value serves
as an index into the hash table.
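The bucket arrangement described above behaves like a circular doubly-linked list in which the bucket itself acts as the list head. The sketch below illustrates the three cases (empty, one block, several blocks). The struct link type and the bucket_* helpers are hypothetical; whether VCache adds new blocks at the front or the back of a bucket is not stated in the text, so appending at the back is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Only the two link members named in the text; a real cache block has more.
 * A bucket uses the same pair: next = first block, prev = last block. */
struct link {
    struct link *next;      /* role of cb_next / first bucket pointer */
    struct link *prev;      /* role of cb_prev / second bucket pointer */
};

/* An empty bucket points at itself with both pointers. */
static void bucket_init(struct link *bucket)
{
    bucket->next = bucket;
    bucket->prev = bucket;
}

static int bucket_is_empty(const struct link *bucket)
{
    return bucket->next == bucket;
}

/* Assumption: a new block goes to the end of the bucket's chain. */
static void bucket_append(struct link *bucket, struct link *cb)
{
    assert(cb != bucket);
    cb->next = bucket;          /* last block's cb_next refers to the bucket */
    cb->prev = bucket->prev;
    bucket->prev->next = cb;
    bucket->prev = cb;
}

/* Unlink a block; no bucket argument is needed because neighbours are
 * reachable through the block's own pointers. */
static void bucket_remove(struct link *cb)
{
    cb->prev->next = cb->next;
    cb->next->prev = cb->prev;
    cb->next = NULL;
    cb->prev = NULL;
}
```

Because the first block's cb_prev and the last block's cb_next refer to the bucket, insertion and removal need no special cases for the ends of the chain.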
To calculate a hash value VCache uses a simple hash function which is represented
here as C pseudocode:
    i = (FSKey1 & 0xffff0000) >> 16;
    i ^= FSKey1;
    i ^= FSKey2;
    i &= LookupMask;
The value i which results from these statements is used to directly index the hash
table. The value i is constrained to the hash table range by the last step where it
is ANDed with the LookupMask. The LookupMask depends upon the hash table
size. If the hash table has 2047 (7ffh) buckets, then the mask will be (7ffh)<<3 or
3ff8h. Before a match is returned by a search, the cache blocks in the bucket are
compared with FSKey1, FSKey2, and FSD_ID, to verify that the match is exact.
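The hash steps can be tried out with a few lines of C. This is a sketch only: it assumes the combining operator in the pseudocode is exclusive OR, and it assumes that the result is a byte offset into a table of 8-byte buckets (two 4-byte pointers each), which is what a mask of 7ffh shifted left by 3 suggests. The names vcache_hash and bucket_index are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fold the high word of key1 into the low word, mix in key2, then clip the
 * result to the table with the lookup mask. */
static uint32_t vcache_hash(uint32_t key1, uint32_t key2, uint32_t lookup_mask)
{
    uint32_t i = (key1 & 0xffff0000u) >> 16;
    i ^= key1;
    i ^= key2;
    return i & lookup_mask;
}

/* Assumption: the hash is a byte offset and each bucket occupies 8 bytes. */
static uint32_t bucket_index(uint32_t offset)
{
    assert(offset % 8 == 0);
    return offset >> 3;
}
```

With the mask 3ff8h the low three bits of the result are always clear, so every hash value lands on a bucket boundary. For instance, the keys 12345678h and 0000ABCDh give the offset 2F80h, which is bucket 5F0h.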
VCache may have up to 10 clients. Each client registers with VCache at Device
Init time and if successful receives a unique identifier. This is the value that will
be stored in the FSD_ID member of this client's cache blocks. Internally, VCache
keeps track of its clients using a structure like this:
    struct {
        DWORD BlksInUse;
        DWORD BlksReserved;
        void (*DiscardFunc)();
        DWORD reserved;
    } reg_data[10];
An FSD should set the Dirty byte in the cache block structure to a non-zero value
if the contents of a page have been modified. This flag is controlled by the FSD
and is used to prevent VCache from discarding a page. It is the responsibility of
the FSD to write a dirty page to disk and clear the flag. Another member which the
FSD can use to prevent a page from being discarded is HoldCnt. This word value
is an unsigned count of locks which have been requested on the page. As long as
at least one lock is outstanding, the page will not be discarded. An FSD may use
the 28 bytes in FSDData[] for any information it may wish to store along with a
page. This area is free format, so it is up to the FSD to define how it will be used.
When a new cache block is created, its age, the member AgeCnt, is initialized to
the current value of VCache's global variable nAgeCount, and then nAgeCount is
incremented. This is equivalent to making the cache block most recently used.
This also implies that the block is placed at the head of the MRU list.
time with the service VCache_Deregister. When registering you supply a buffer
discard callback function.
Service                 Function
VCache_AdjustMinimum    Adjusts the number of reserved blocks for an FSD
VCache_CheckAvail       Verifies that enough cache blocks are available
VCache_Deregister       Frees cache resources owned by an FSD
VCache_Enum             Calls enumeration function for all blocks owned by FSD
VCache_FindBlock        Finds or creates a cache block
VCache_FreeBlock        Places a cache block and its data page on free lists
VCache_GetSize          Returns number of blocks in cache
VCache_GetStats         Returns statistics for use by memory manager
VCache_Get_Version      Gets VCache's version number
VCache_Hold             Increments cache block's HoldCnt
VCache_MakeMRU          Moves cache block to head of MRU list
VCache_RecalcSums       Debug only (not available in retail release)
VCache_Register         Installs discard function and returns FSD ID
VCache_SwapBuffers      Swaps data pages between two cache blocks
VCache_TestHandle       Validates a cache block handle
VCache_TestHold         Tests cache block's HoldCnt
VCache_Unhold           Decrements cache block's HoldCnt
VCache_VerifySums       Debug only (not available in retail release)
This buffer discard function will receive the address of the cache block which is
being discarded, in the ESI register. Cache block discards may occur in response
to VCache_RelinquishPage and VCache_FindBlock (with the VCFB_Create flag)
calls. A cache block is a candidate for discarding if it has its Dirty flag clear, its
HoldCnt is zero, and its AgeCnt is such that: (nAgeCount - cb.AgeCnt) > AgeDelta.
At initialization time, the global variable AgeDelta is set to initCache / 8 (where
initCache is the initial cache size) or 16, whichever is smaller. As the cache is
dynamically sized, AgeDelta is not adjusted unless the cache size drops below 128
pages, in which case it is recalculated as curTotalCachePages / 8.
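The discard test and the initial AgeDelta calculation are easy to express in C. The following is a minimal sketch; struct cb_state is a hypothetical three-member subset of the cache block, and the member widths (in particular a 32-bit age counter) are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of a cache block: just the members the test reads. */
struct cb_state {
    uint8_t  dirty;         /* non-zero while the page has unwritten changes */
    uint16_t hold_cnt;      /* outstanding holds on the block */
    uint32_t age_cnt;       /* value of nAgeCount when last made MRU */
};

/* AgeDelta at initialization: initCache / 8, but never more than 16. */
static uint32_t initial_age_delta(uint32_t init_cache_pages)
{
    uint32_t d = init_cache_pages / 8;
    return d < 16 ? d : 16;
}

/* A block may be discarded only if it is clean, unheld, and old enough.
 * Unsigned subtraction keeps the age difference correct across wraparound
 * of the global counter. */
static int is_discard_candidate(const struct cb_state *cb,
                                uint32_t n_age_count, uint32_t age_delta)
{
    assert(cb != 0);
    return cb->dirty == 0 && cb->hold_cnt == 0 &&
           (uint32_t)(n_age_count - cb->age_cnt) > age_delta;
}
```

A clean, unheld block stamped with age 100 becomes a candidate once nAgeCount passes 116 when AgeDelta is 16.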
cache block) and EAX contains the address of the buffer (the BufPtr member). If
AL has the VCFB_Create flag set, and a matching cache block is not found, a
new cache block will be created. In this case, the return values refer to the newly
created cache block and buffer. Other flags can be used in AL, such as
VCFB_Hold to increment the HoldCnt of a find, and VCFB_MakeMRU to move a
find to the head of the MRU list. The service VCache_MakeMRU provides a more
efficient way to move a cache block to the head of the MRU list. It takes a cache
block handle in ESI as its single argument.
Before allocating some cache blocks, you can verify that the number of cache
blocks you need are available using the service VCache_CheckAvail. Before
calling, the AH register is loaded with the FSD ID and ECX is loaded with the
desired number of blocks. The result of this call is given by the state of the carry
flag. If the carry flag is set, not enough buffers are available; otherwise the
request can be granted and the number of buffers available is returned in EAX.
VCache_FreeBlock removes a cache block specified by the ESI register and its
associated buffer from the MRU list. The cache block and the buffer page are
placed on their respective free lists.
Monitoring VCache
MultiMon includes a monitor for VCache services. Using it in conjunction with the
file system hook adds some additional detail to our understanding of VFAT's FSD
functions. As an example, I'll execute the DISKDUMP program from Chapter 9
with three monitors: VCHook, FSHook, and TAGMON. Example 11-1 is a small
portion of the trace output.
Note that for vch lines, the dev column contains the FSD ID and the handle
column contains the cache block handle. If the handle is marked with an asterisk,
it represents a newly created cache block.
In this trace, DISKDUMP performs three FS_DirectDiskIO reads. The first read is of
the volume's boot sector, the second is of the first sector of the first FAT, the third
is of the first sector of the second FAT, and at the end of the trace we see the
beginning of a read of the root directory sectors. The fsh entries in the trace are
highlighted; these lines of the trace are added on completion of the
FS_DirectDiskIO calls. The vch entries of the trace record VFAT's calls into
VCache's services.
For instance, the following sequence is associated with the read of boot sector 0:
From this sequence we see that VFAT first searches for a cache block for the
needed sector and volume, and only if that fails does it create a new cache block.
We can also infer that VFAT doesn't just read in a single sector; rather, it reads an
entire page. This is revealed by the following sequence for the subsequent read
of the first sector of the first FAT (sector 1):
In this case the search for the cache block succeeds because it is already in
memory, having been loaded along with the boot sector.
The keys which are passed to VCache_FindBlock require some explanation. The
second key is simply the address of the volume's resource block structure (see
Chapter 9, VFAT: The Virtual FAT File System Driver) which is owned by VFAT.
The first key represents the sector on the volume. But how does sector 0 become
0xfffffffd? Why do both sector 0 and sector 1 use this same hash key?
To understand this, you need to look at the disk layout. The sectors in a volume
either lie in the system area (boot sector, FATs, root directory entries) or in clusters
which are assigned to files and subdirectories through the FAT. The line
between these regions is drawn at the first sector of the first available cluster.
Cache blocks are also aligned at this boundary. In our DISKDUMP example,
volume D has the first sector of the first cluster at sector 125h. This value serves
as a key for sectors 125h, 126h, ... 12Ch, since the volume's sector size allows 8
sectors to be stored in a cache page. Since this alignment boundary lies on a
sector which is not an even multiple of 8, the key for the first cache block will
start at (125h mod 8)-8 or -3 (0xfffffffd), and this value will serve as the key for
sectors -3, -2, -1, 0, 1, 2, 3, and 4.
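The alignment rule can be checked with a short calculation. The function below is my reconstruction from the DISKDUMP numbers, not VFAT's actual code; the name cache_block_key and its parameters are hypothetical. It rounds a sector down to the start of its cache page, where page boundaries are measured from the first sector of the first cluster.

```c
#include <assert.h>
#include <stdint.h>

/* Return the key (first sector of the cache page) for a given sector.
 * Pages are sectors_per_page long and aligned to first_cluster_sector, so
 * the page holding sector 0 may start at a negative sector number. */
static int32_t cache_block_key(int32_t sector, int32_t first_cluster_sector,
                               int32_t sectors_per_page)
{
    assert(sectors_per_page > 0 && sector >= 0 && first_cluster_sector >= 0);
    int32_t phase = first_cluster_sector % sectors_per_page;
    int32_t rel = sector - phase;       /* negative inside the first page */
    int32_t q = rel / sectors_per_page;
    if (rel % sectors_per_page < 0)
        q--;                            /* C truncates; we need the floor */
    return q * sectors_per_page + phase;
}
```

With the first cluster at sector 125h and 8 sectors per page, sectors 0 through 4 all map to -3, which is 0xfffffffd when viewed as an unsigned doubleword, and sector 5 begins the next page.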
Figure 11-1. Registry editor display of the lookup cache
Internally, VCache uses the IFSMgr_GetHeap service to allocate storage for data
structures and the memory-image of each lookup cache. IFSMgr's heap allocator
disburses blocks from locked pages. Each cache is represented by a single
LOOKUP_KEY data structure and one or more LOOKUP_VAL structures, one for
each cache item. The LOOKUP_KEY structures are strung together in a linked list to
facilitate the validation of lookup cache handles (HLOOKUP), to determine
whether a cache name is already in use. Here is the layout for a LOOKUP_KEY:
    typedef struct {
        void* next;             /* head of list of LOOKUP_VAL structures (mru) */
        void* prev;             /* tail of list of LOOKUP_VAL structures (lru) */
        PLOOKUP_KEY next_cache; /* next lookup cache */
        char* pszCacheName;     /* name of the cache */
        DWORD refcnt;           /* number of cache users */
        DWORD numElements;      /* current number of elements */
        DWORD maxElements;      /* max number of elements retained in memory */
        DWORD Flags;            /* determines type of background processing */
These two types of structures are what the lookup cache is built from.
Whenever new items are added to a cache, or when the value of a cache item
changes, or when a lookup occurs which moves an item to the head of the MRU
list, this change needs to be written to the corresponding registry key. These
updates to the registry are deferred until an Appy Time callback is executed. This
callback is scheduled each time a cache change occurs, unless a callback is
already pending; a callback will occur after a 300 second time-out expires. Prior
to scheduling the callback, the Flags members of the affected LOOKUP_KEY and
LOOKUP_VAL structures are set to indicate the kind of processing which is
required. When it is called, the callback handler starts at the head of the
LOOKUP_KEY list and examines the Flags member of each structure. For those
structures needing attention, it first clears the Flags member and then completes
the registry update. While this background processing is taking place, calls to the
lookup services will return with an error code of 1.
Service                      Description
_VCache_CloseLookupCache     Closes registry key and releases storage
_VCache_CreateLookupCache    Creates or opens a lookup cache
_VCache_DeleteLookupCache    Not implemented, just returns 0
_VCache_Lookup               Looks up a cache key and returns its data
_VCache_UpdateLookup         Adds or updates elements in the cache
Unlike the block cache services which use processor registers for passing arguments,
the lookup cache services all use a C calling convention. Also, unlike a
block cache which must be registered at Device Init time, a lookup cache can be
created after initialization.
It receives four arguments. The first is the name of the cache which will become
the name of the registry key which will hold the cache's contents. Additional arguments
include the maximum number of elements the cache will hold in memory,
a DWORD of flags (initialized to 0), and the address of a doubleword in which a
handle to the lookup structure will be returned. VCache searches through the list
of LOOKUP_KEY structures to see if the named cache already exists. If the
LOOKUP_KEY does not exist, then an attempt is made to open the registry key. If
the registry key is found, then the values under the key are enumerated; a
LOOKUP_KEY and one or more LOOKUP_VAL structures are allocated from IFSMgr's
heap and initialized with the results of this enumeration. The address of the
LOOKUP_KEY is then inserted at the head of the list.
If this is a brand new cache without an entry in the registry, then only a
LOOKUP_KEY structure is allocated from IFSMgr's heap, and its address is inserted
at the head of the LOOKUP_KEY list. The registry key is not created until an entry
is added to the cache using _VCache_UpdateLookup.
an infinite loop! Perhaps this is why IFSMgr and VREDIR call this function only at
system shutdown, so the Appy Time callback never gets called.
It calculates a checksum value for the specified key's value (pointed to by pKey)
and compares this checksum with the KeySum member of any LOOKUP_VALs in
the cache. If a match is found, the contents of the existing LOOKUP_VAL structure
are modified to hold the new values. If no match is found, a new LOOKUP_VAL
structure is allocated and initialized with the pKey and pData values provided as
arguments. In either case, appropriate Flags bits are set and then an Appy Time
callback is scheduled in 300 seconds. The Appy Time handler will refresh or
create keys and values in the registry to reflect the current set of LOOKUP_VAL
structures. Note that if a new value is being added to the cache, its LOOKUP_VAL
moves to the head of the cache's MRU list. Also, once the number of elements in
the cache exceeds maxElements, each addition of an element requires that the
LOOKUP_VAL at the LRU end of the list be removed.
_VCache_Lookup is the service used for retrieving data for a specified cache key.
This function's prototype has this form:
    int _VCache_Lookup(HLOOKUP h, DWORD keylen, void* pKey,
                       DWORD* pdatalen, void* pData)
It calculates a checksum value for the specified key's value (pointed to by pKey)
and compares this checksum with the KeySum member of any LOOKUP_VALs in
the cache. If a match is found, the data associated with the key is copied to the
buffer at pData. One side effect of this function is that it moves the accessed
cache element to the head of the MRU list.
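To make the MRU behaviour of the update and lookup services concrete, here is a toy in-memory model. It is a sketch under several assumptions: the checksum is a plain byte sum (the text does not give the real algorithm), the full key is compared after the checksum matches, elements live in a small array rather than a linked list of LOOKUP_VAL structures, an update of an existing element also moves it to the MRU end, and nothing is written to the registry. All names (toy_cache, toy_update, toy_lookup, key_sum) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ELEMS 4
#define MAX_KEY   32

struct toy_val {
    uint32_t keysum;                /* checksum of the key bytes */
    uint32_t keylen;
    uint8_t  key[MAX_KEY];
    uint32_t data;
};

struct toy_cache {
    uint32_t num;                   /* current number of elements */
    uint32_t max;                   /* element limit, at most MAX_ELEMS */
    struct toy_val v[MAX_ELEMS];    /* v[0] is MRU, v[num-1] is LRU */
};

/* Assumed checksum: a simple byte sum. */
static uint32_t key_sum(const void *key, uint32_t len)
{
    const uint8_t *p = key;
    uint32_t s = 0;
    while (len--) s += *p++;
    return s;
}

/* Return the slot holding the key, or -1 if it is not cached. */
static int find(const struct toy_cache *c, const void *key, uint32_t len)
{
    uint32_t sum = key_sum(key, len);
    for (uint32_t i = 0; i < c->num; i++)
        if (c->v[i].keysum == sum && c->v[i].keylen == len &&
            memcmp(c->v[i].key, key, len) == 0)
            return (int)i;
    return -1;
}

/* Move element i to slot 0, sliding the more recent elements down. */
static void make_mru(struct toy_cache *c, uint32_t i)
{
    struct toy_val tmp = c->v[i];
    memmove(&c->v[1], &c->v[0], i * sizeof c->v[0]);
    c->v[0] = tmp;
}

static void toy_update(struct toy_cache *c, const void *key, uint32_t len,
                       uint32_t data)
{
    assert(len <= MAX_KEY && c->max >= 1 && c->max <= MAX_ELEMS);
    int i = find(c, key, len);
    if (i < 0) {
        if (c->num == c->max) c->num--;     /* drop the LRU element */
        i = (int)c->num++;
        c->v[i].keysum = key_sum(key, len);
        c->v[i].keylen = len;
        memcpy(c->v[i].key, key, len);
    }
    c->v[i].data = data;
    make_mru(c, (uint32_t)i);
}

/* Returns 0 and stores the data on a hit, non-zero on a miss. */
static int toy_lookup(struct toy_cache *c, const void *key, uint32_t len,
                      uint32_t *data)
{
    int i = find(c, key, len);
    if (i < 0) return 1;
    *data = c->v[i].data;
    make_mru(c, (uint32_t)i);
    return 0;
}
```

In a two-element cache, adding A and B, looking up A, and then adding C evicts B, because the lookup of A left B at the LRU end.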
    if (hServerNameCache == 0) return;
    Data = NetIDs[proid];
    pUniPath = pp->pp_elements[0]->pe_unichars;
    keylen = pp->pp_elements[0]->pe_length - sizeof(short);
    return _VCache_UpdateLookup(hServerNameCache, keylen,
                                pUniPath, datalen, &Data);
The ParsedPath argument to this function comes from the ir_ppath member of
the ioreq structure. This contains the canonicalized UNC path, starting with the
server name and share name. (For a review of the ParsedPath structure see
Chapter 6, Dispatching File System Requests.) The first element of the ParsedPath
structure, the Unicode server name, is used as the key for the cache. The second
argument to this function is the provider ID for the FSD which performed the
connection. This value is converted to a NetID and it becomes the data associated
with the key.
    if (hServerNameCache == 0) return 0;
    pUniPath = pp->pp_elements[0]->pe_unichars;
    keylen = pp->pp_elements[0]->pe_length - sizeof(short);
    retc = _VCache_Lookup(hServerNameCache, keylen,
                          pUniPath, &datalen, &Data);
    if (retc != 0) return 0;
    return Data;
    }
This function takes a single ParsedPath argument which contains the server
name as its first PathElement. This is used to perform a cache lookup and if
successful, the variable Data will contain the matching NetID. IFSMgr uses
another internal function to convert the NetID into a provider ID.
A Survey of IFSMgr Services
I promised myself that if I ever wrote a book about VxDs, I wouldn't fill it up
with warmed-over API descriptions. The DDK's IFS document and online help file
should be your basic references for API descriptions. But in some cases, the information
these resources contain is inadequate to effectively use IFSMgr's services.
In this chapter, I'll address some of these shortcomings. I'm going to single out
several categories of services and provide more complete documentation for
them. However, all IFSMgr services are summarized in a series of tables.
The summary tables use the following conventions. The Ordinal column contains
the service ordinal number starting with 0. In a few cases the value in this column
will have a subscript; this is the ordinal for the equivalent service in Windows
3.11. The column headings 16, 22, and 22+ refer to the three different versions of
IFSMgr: Windows 3.11, Windows 95 build 950, and build 950B (OEM 2). The
trend is toward providing more services, starting with 61 in Windows 3.11, to 117
in Windows 95 build 950, to 121 in build 950B. These counts include a number of
services which have no implementation, i.e., in the retail builds, at least, the
service returns 0 or perhaps sets the carry flag. In the table, these "unimplemented"
services are marked with a u, debug services are marked with a d, and
services which are only available at initialization are indicated by an i. An h
indicates that a service is meant to be hooked, and not called directly. The
Segment column indicates whether the function resides in locked or pageable
code. Note that a service whose entry point is in locked code is not
precluded from taking a path through pageable code. The Ref column gives
chapter numbers where a service is used or described.
The descriptions presented here apply to the Windows 95 versions of IFSMgr.
However, the services provided by Windows 3.11 are also tabulated. The
companion disk contains the library ifswraps.clb, a C library of wrapper functions
for all of the IFSMgr services. For more information on the library, see Appendix
D, IFS Development Aids.
IFSMgr Versions
Your first line of attack to determine which version of IFSMgr a system is using
should be to call IFSMgr_Get_Version. For Windows 3.11 this will return 0x16,
and for Windows 95 it will return 0x22. Currently, two versions of Windows 95
exist: the retail build 950 and OEM service release 2, which is referred to as build
950B. The IFSMgr VxDs which accompany these two Windows 95 versions are
somewhat different. If you examine the file properties of these drivers using
Explorer, the file versions reported are 4.00.950 and 4.00.1111. One way to distinguish
these drivers at runtime is to examine the Device Descriptor Block to see
how many services are in the service table. For file version 4.00.950 this value is
117 and for version 4.00.1111 it is 121.
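The two-step check (version number first, then service count) can be captured in a small classifier. This is a sketch only: classify_ifsmgr and the enum names are hypothetical, and the caller is assumed to have already obtained the version from IFSMgr_Get_Version and the service count from the Device Descriptor Block by its own means.

```c
#include <assert.h>
#include <stdint.h>

enum ifsmgr_build {
    IFSMGR_WIN311,          /* version 0x16 */
    IFSMGR_W95_950,         /* version 0x22, 117 services */
    IFSMGR_W95_950B,        /* version 0x22, 121 services */
    IFSMGR_UNKNOWN
};

/* Map the reported version and service-table size to a known build. */
static enum ifsmgr_build classify_ifsmgr(uint32_t version,
                                         uint32_t service_count)
{
    if (version == 0x16)
        return IFSMGR_WIN311;
    if (version == 0x22) {
        if (service_count == 117) return IFSMGR_W95_950;
        if (service_count == 121) return IFSMGR_W95_950B;
    }
    return IFSMGR_UNKNOWN;  /* a build this chapter does not describe */
}
```

Returning an explicit "unknown" value keeps a caller from assuming build 950B behaviour on a release with a different service count.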
FSD Registration
Table 12-1 lists IFSMgr's registration services. For a detailed discussion of these
functions see Chapter 8, Anatomy of a File System Driver.
Services 117 and 118 are new to build 950B. Although these services are not yet
documented, it is clear that they provide FSDs with the capability of registering
and deregistering with IFSMgr.
Heap Management
Given the extensive set of VMM services for memory allocation, you might
wonder why IFSMgr has to offer yet another set of services (see Table 12-2). It is
because FSDs and filehooks can't touch pageable memory and can't invoke
memory allocation services which might cause paging when handling the swap
file and memory-mapped files. The reasons for these requirements are discussed
in Chapter 7, Monitoring File Activity. To work around these restrictions, IFSMgr
allocates some fixed system pages and then disburses blocks from these pages
using the service IFSMgr_GetHeap. The blocks are returned to the heap by the
service IFSMgr_RetHeap. Beyond these basic functions, there are additional
services for special needs, such as assuring memory is available under critical
conditions. To begin, let's look at how the heap gets initialized and how it is
organized.
The main and spare heaps are separate one-way linked lists of heap blocks. A
heap block consists of one or more pages of fixed system memory. At the begin
ning of each heap block, a 32-byte structure is used to manage the heap block's
allocations. This structure has the following layout:
    typedef struct tagMemHdr {
        void* pBlk;         /* address of this heap block */
        DWORD signature;    /* IFSMgr's signature, 'IFSH' */
Allocations are made from the block's memory range starting at alloc and
extending to pEnd. The first available (free) allocation address in the block is at
pAvail. The following diagram illustrates a heap block containing three allocations,
A, B, and C.
20 Aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-20 Bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
10 Ccccccccccccccccc
00 << end of block
The four bytes preceding each allocation hold its length. For example, allocation
A is specified by the address of its first byte (represented by the uppercase A); the
length of this allocation is given by the doubleword at address A-4 and is 20 bytes
long. Allocation B has a negative length; this signifies that the 20 byte allocation is
free. It is followed by allocation C with a length of 10 bytes. The length of a heap
block's first allocation is given by alloc[0]. Starting with this first allocation, all
allocations in a heap block can be walked using these length fields. The very last
doubleword in the heap block contains a 0, and marks an allocation of length 0.
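Walking a heap block by its length fields can be sketched in a few lines of C. The sketch assumes that each stored length counts only the bytes that follow the length doubleword, that lengths are multiples of 4, and that the values in the diagram are hexadecimal; none of this is confirmed by the text. The names heap_usage and walk_heap_block are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct heap_usage {
    uint32_t used_bytes;    /* total size of allocations in use */
    uint32_t free_bytes;    /* total size of free allocations */
    uint32_t allocations;   /* number of allocations in use */
};

/* first points at the length doubleword of the block's first allocation.
 * A positive length marks a used allocation, a negative length a free one,
 * and a zero length ends the walk. */
static struct heap_usage walk_heap_block(const int32_t *first)
{
    struct heap_usage u = {0, 0, 0};
    const int32_t *len = first;

    while (*len != 0) {
        int32_t n = *len;
        uint32_t size = n < 0 ? 0u - (uint32_t)n : (uint32_t)n;
        assert(size % 4 == 0);          /* assumed doubleword granularity */
        if (n > 0) {
            u.used_bytes += size;
            u.allocations++;
        } else {
            u.free_bytes += size;
        }
        len += 1 + size / 4;            /* skip the length field and payload */
    }
    return u;
}
```

For the block in the diagram (A of 20h bytes in use, B of 20h bytes free, C of 10h bytes in use) the walk reports 30h bytes used in two allocations and 20h bytes free.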
IFSMgr's allocator uses a "first-fit" algorithm. The amtFree member of the header
structure indicates the maximum block size that might be allocated from the heap
block. If the amtFree value is large enough to satisfy the requested allocation size,
the first available allocation in the heap block, given by the address pAvail, is
combined with any adjoining free allocations to create a single free allocation. If
this allocation is large enough to satisfy the request, it is used, possibly splitting
the allocation into a used portion and a new free allocation. If the size of the
allocation is insufficient, this process is repeated for the next free allocation in the
block. If the request is not satisfied in one block, the next block in the heap is
tried. For each successful allocation from a heap block, the cnt member is
incremented.
IFSMgr_GetHeap
IFSMgr_GetHeap receives a single argument, the requested size of the allocation,
in bytes. If successful it returns the address of the allocation; if it fails it returns
NULL. The actual size of the allocation is adjusted by the formula (req_amt + 7) &
0xfffffffc. This rounds the allocation size up to the nearest multiple of 4 and adds
4 bytes for the doubleword which holds the size of the allocation.
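A few sample values show what the size adjustment does. The helper name heap_alloc_size is hypothetical; the expression inside it is the formula quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Round the request up to a multiple of 4 and add 4 bytes for the length
 * doubleword, in a single step. */
static uint32_t heap_alloc_size(uint32_t req_amt)
{
    assert(req_amt <= 0xfffffff8u);     /* keep req_amt + 7 from wrapping */
    return (req_amt + 7) & 0xfffffffcu;
}
```

Requests of 1 through 4 bytes consume 8 bytes of heap, requests of 5 through 8 bytes consume 12, and a full 4096-byte request consumes 4100.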
Heap blocks are searched in order for one which will satisfy this allocation
request. When all of the main heap blocks have been searched and none can
satisfy the request, some other storage possibilities are tried. First, the registered
heap reclamation functions are called to see if any user can free an allocation of
at least the requested size. If that does not succeed, the blocks on the spare heap
are searched to see if they can satisfy the request. If a block on the spare list can
supply the required allocation, the block is moved from the spare list to the main
heap, and the allocation succeeds. Finally, if the spare heap can not meet the
request, the allocation will fail if it is less than or equal to 4096 bytes but will
succeed if it is greater than this amount and the required pages can be allocated.
This distinction between "small" and "large" allocations is important. There are
situations in which you would rather fail an allocation than have the service
attempt to grow the heap by allocating more pages. As long as you stick to allocations
of 4096 bytes or less, you will get this behavior. However, if you call
IFSMgr_GetHeap at a time when it is safe to perform page allocations, then you can make
multiple page allocations from this service. Then, instead of failing, a new heap
block containing the needed pages will be added to the main heap.
The DDK documentation states that the largest allocation that may be made by
this function is 32 Kbytes or 8 pages. It would seem that the upper limit on allocation
size is determined by the amtFree member of the heap block. This member is
an unsigned short so 64 Kbytes or 16 pages appears to be the actual upper limit.
Note that this is a limit imposed by the maximum size of a heap block.
IFSMgr_RegisterHeap
places it at the current front of a linked list. When reclamation functions are called
by IFSMgr_GetHeap, the function at the head of the list is called first, then the
next function, and so forth until the tail of the list is reached. The tail of the list
holds IFSMgr's heap reclamation function; it returns without doing anything.
When a reclamation function is called it receives the requested size of the allocation
on the stack in the doubleword at location EBP+0Ch. If the function returns
zero in EAX, then it is saying that it cannot supply the needed memory.
However, if the function returns a non-zero value in EAX, then the doubleword
stored at location EBP+8 is interpreted as the address of a heap block.
IFSMgr_GetHeap will examine the available allocation given by the pAvail member of the
heap block. If this allocation cannot satisfy the request, then IFSMgr_GetHeap
will fail; otherwise it will be used to satisfy the request.
IFSMgr_FillHeapSpare
Each call to this service adds a one-page block of fixed memory from the system
arena to the spare heap list. As we have seen, IFSMgr_GetHeap uses the blocks
on the spare list as a reserve when an allocation cannot be met by the main
heap. Once a block on the spare list is used, it is removed from the spare list and
added to the main heap.
IFSMgr calls this service before dispatching protected mode and V86 mode Int
21h requests.
IFSMgr_RetHeap
This function receives the address of an allocation made via IFSMgr_GetHeap. In
response the function searches the main heap blocks to find one for which the
allocation's address lies between alloc and pEnd. The allocation being freed is
combined with any free allocations which may follow it. This free allocation is
then marked with a negative value equal to its total size. The cnt member of the
heap block is then decremented and if the cnt has reached 0, this heap block is
moved to the spare heap list.
Time Management
The time management services deal with three different time representations:
DOS, Net, and Win32. A DOS time represents a local time; it is stored in a
dostime_t structure which consists of three components:
• Packed 16-bit word containing year, month, and day: bits 0-4, day (1-31); bits
5-8, month (1-Jan, 2-Feb, etc.); bits 9-15, year offset from 1980.
• Packed 16-bit word containing hour, minute, and second: bits 0-4, seconds
divided by 2; bits 5-10, minute (0-59); and bits 11-15, hour (0-23).
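The bit layout of the two packed words can be sketched in C. This is only an illustration of the field positions listed above; the helper names (dos_pack_date, dos_pack_time, and the accessors) are hypothetical and are not IFSMgr services.

```c
#include <stdint.h>

/* Pack a calendar date into the DOS date word:
   bits 0-4 day, bits 5-8 month, bits 9-15 years since 1980. */
static uint16_t dos_pack_date(unsigned year, unsigned month, unsigned day)
{
    return (uint16_t)(((year - 1980u) << 9) | (month << 5) | day);
}

/* Pack a time of day into the DOS time word:
   bits 0-4 seconds divided by 2, bits 5-10 minute, bits 11-15 hour. */
static uint16_t dos_pack_time(unsigned hour, unsigned minute, unsigned second)
{
    return (uint16_t)((hour << 11) | (minute << 5) | (second / 2u));
}

/* Accessors that undo the packing. */
static unsigned dos_year(uint16_t d)   { return 1980u + (unsigned)(d >> 9); }
static unsigned dos_month(uint16_t d)  { return (unsigned)(d >> 5) & 0x0Fu; }
static unsigned dos_day(uint16_t d)    { return d & 0x1Fu; }
static unsigned dos_hour(uint16_t t)   { return (unsigned)(t >> 11); }
static unsigned dos_minute(uint16_t t) { return (unsigned)(t >> 5) & 0x3Fu; }
static unsigned dos_second(uint16_t t) { return (t & 0x1Fu) * 2u; }
```

Because seconds are stored divided by 2, an odd second count is rounded down when the word is unpacked: a time of 13:45:31 comes back as 13:45:30.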
A Net time is a 32-bit unsigned value which is the number of seconds which have
elapsed since January 1, 1970. This time is in UTC (Coordinated Universal Time),
which used to be known as Greenwich Mean Time (GMT), i.e., the local time at
the Greenwich meridian. A remainder component preserves the number of
milliseconds in a fractional 1 second interval.
Table 12-3 enumerates the time management services which IFSMgr provides. The
services IFSMgr_Get_NetTime and IFSMgr_Get_DOSTime retrieve the current date
and time as Net time or DOS time, respectively. The next six functions are pairs
of functions which convert a given time representation to one of the other
possible representations, e.g., IFSMgr_NetToDosTime converts a Net time to a
DOS time and IFSMgr_NetToWin32Time converts a Net time to a Win32 time.
Note that a Win32 time cannot be retrieved directly; it must be derived from
either a Net time or DOS time.
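The arithmetic behind a Net-to-Win32 conversion can be sketched as follows. The sketch assumes that a Win32 time has the usual FILETIME form (a 64-bit count of 100-nanosecond intervals since January 1, 1601, UTC); the function names are hypothetical and this is not the implementation of IFSMgr_NetToWin32Time.

```c
#include <stdint.h>

/* Seconds between January 1, 1601 and January 1, 1970. */
#define SECS_1601_TO_1970 11644473600ULL

/* Net time (seconds since 1970 plus a millisecond remainder) to a
   FILETIME-style count of 100 ns units since 1601 (assumed format). */
static uint64_t net_to_win32_time(uint32_t net_secs, uint32_t ms)
{
    return ((uint64_t)net_secs + SECS_1601_TO_1970) * 10000000ULL
         + (uint64_t)ms * 10000ULL;
}

/* The reverse conversion; the millisecond remainder is returned through
   ms_out when the caller supplies a pointer. */
static uint32_t win32_to_net_time(uint64_t ft, uint32_t *ms_out)
{
    uint64_t total_ms = ft / 10000ULL;
    if (ms_out)
        *ms_out = (uint32_t)(total_ms % 1000ULL);
    return (uint32_t)(total_ms / 1000ULL - SECS_1601_TO_1970);
}
```

The reverse conversion only makes sense for dates that fit in the 32-bit Net time range, so a real implementation would need to reject or clamp values outside it.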
Network Management
Table 12-4 lists IFSMgr's network management services. Many of these services
are discussed in Chapters 8 and 13. The services in this group can be divided into
server and client categories. The server functions include IFSMgr_ServerDOSCall,
IFSMgr_SetLoopBack, and IFSMgr_ClearLoopBack. IFSMgr_ServerDOSCall is the
means that a server uses to execute a local file system request on behalf of
some network client. How IFSMgr dispatches these requests is described in
Chapter 6, Dispatching File System Requests. A server will also use
IFSMgr_SetLoopBack and IFSMgr_ClearLoopBack to maintain loopback paths. A
loopback path refers to a shared network resource on the local machine. For
instance, if a system's server name is TOPDOG and it is sharing a directory
C:\BIN as DEV, then one of the system's loopback paths is the UNC path
\\TOPDOG\DEV. The function IFSMgr_SetLoopBack receives pairs of UNC paths and
local paths which allow mapping of local UNC paths to a local drive and
directory, e.g., \\TOPDOG\DEV maps to C:\BIN. IFSMgr_ParsePath checks the UNC
paths it receives against this loopback list and, for matches, substitutes the
local path.
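The substitution step can be pictured with a small table lookup. This is a sketch under stated assumptions: the structure and function names are hypothetical, ordinary C strings stand in for IFSMgr's internal path format, and the case-insensitive comparison is an assumption rather than documented behavior.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One loopback pair: a UNC share name and the local path it refers to. */
struct loopback_entry {
    const char *unc;    /* e.g. \\TOPDOG\DEV */
    const char *local;  /* e.g. C:\BIN       */
};

/* True when path begins with prefix (ignoring case) and the prefix ends
   on a path component boundary. */
static int prefix_matches(const char *path, const char *prefix)
{
    while (*prefix) {
        if (toupper((unsigned char)*path) != toupper((unsigned char)*prefix))
            return 0;
        path++;
        prefix++;
    }
    return *path == '\0' || *path == '\\';
}

/* If path falls under one of the loopback shares, write the equivalent
   local path to out and return 1; otherwise return 0. */
static int map_loopback(const struct loopback_entry *tab, size_t n,
                        const char *path, char *out, size_t outsz)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (prefix_matches(path, tab[i].unc)) {
            int len = snprintf(out, outsz, "%s%s",
                               tab[i].local, path + strlen(tab[i].unc));
            return len >= 0 && (size_t)len < outsz;
        }
    }
    return 0;
}
```

With the pair from the text in the table, \\TOPDOG\DEV\tool.exe is rewritten to C:\BIN\tool.exe, while a path on any other server is left for the network redirector.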
Event Management
Table 12-5 lists IFSMgr's event management services.
The event management services in several cases are simply wrappers for VMM
services. Important exceptions are IFSMgr_SchedEvent, IFSMgr_QueueEvent, and
IFSMgr_FreeIoreq. These allow creation of a special kind of IFSMgr event that is
accompanied by an initialized ifsreq structure. Considerable detail is provided
for these functions, since the DDK documentation is incomplete.
The type of event which is scheduled depends on the options which are set in
There are several combinations which are permitted; these are shown in Table
12-6. Most of these flags restrict when the event is scheduled. The one exception
is EVF_TASKTIME, which determines whether an ifsreq structure is initialized
and passed to the callback.
Note that the callback routine is not the event procedure. A single event
procedure is used for all of the event types. The function ev_func is called from
the common event procedure. The callback function has the following prototype:
    void EventCallback(pevent pev, pioreq pir)
tion, EVF_TASKTIME can be used in conjunction with EVF_NOTCRIT and
EVF_NOTNESTEDEXEC.
This service may also be used to create initialized ifsreq blocks for calling into
an FSD or IFSMgr. The ifsreq blocks created in this way set the ir_pev member to
the address of the event structure associated with it; ir_user, ir_error,
ifs_VMHandle, and ifs_PV are the only other members which are initialized. Although
the documentation refers to the allocated structure as ioreq, a full ifsreq
structure is actually allocated (including space for the client register structure). Once
the callback procedure has completed its event processing, it must return the
ifsreq block to IFSMgr using the service IFSMgr_FreeIoreq.
IFSMgr_KillEvent
This service can be used to cancel an event which has been scheduled by either
IFSMgr_SchedEvent or IFSMgr_QueueEvent. It receives the address of the event
structure and, depending on the state of the event and type of event, it may issue
Cancel_Time_Out, Cancel_Priority_VM_Event, or Cancel_Restricted_Event.
    push BlockID
    VMMCall(_BlockOnID);
    _asm cld
}
Two other services are provided which allow events to run in a nested execution
block. The main difference between these is that IFSMgr_Yield enables interrupts
in the VM before running events. Here are the implementations of these functions:
void IFSMgr_Yield() {
    VMMCall(Begin_Nest_Exec);
    VMMCall(Enable_VM_Ints);
    VMMCall(Resume_Exec);
    VMMCall(End_Nest_Exec);
    _asm cld
}
void IFSMgr_RunScheduledEvents() {
    if (bPendingGlobalEvents) {
        VMMCall(Begin_Nest_Exec);
        VMMCall(Resume_Exec);
        VMMCall(End_Nest_Exec);
        _asm cld
    }
}
Codepage and Unicode Conversion
BCS encodings are represented by codepages. Two codepages are available for
an application to use: an ANSI codepage and an OEM codepage. The OEM codepage
is associated with MS-DOS applications and includes the line-drawing
characters. Win32 console applications also use the OEM codepage by default.
The ANSI codepage is used by Windows 95 applications (Win16 and Win32). The
specific codepages a Windows 95 system uses depend on the locale; for the
United States, the defaults are MS-DOS US codepage 437 for OEM, and US
codepage 1252 (Latin 1) for ANSI.
While it is always possible to convert non-Unicode data to Unicode, the reverse is
not always possible. When it isn't possible to convert a Unicode character to a
character of the current codepage, a default character is used (the underscore
character, "_" (0x5f)).
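The default-character rule can be illustrated with a toy conversion. The sketch below assumes a hypothetical codepage that contains only the 7-bit ASCII range; the real services consult the OEM and ANSI tables loaded by IFSMgr, and the function name is not an IFSMgr service.

```c
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_CHAR 0x5F   /* underscore, substituted for unmappable input */

/* Convert n 16-bit Unicode characters to single bytes. Characters that
   the toy codepage (ASCII only, an assumption) cannot represent become
   the default character and *lossy is set to 1. Returns bytes written. */
static size_t uni_to_bcs_sketch(const uint16_t *src, size_t n,
                                unsigned char *dst, int *lossy)
{
    size_t i;
    if (lossy)
        *lossy = 0;
    for (i = 0; i < n; i++) {
        if (src[i] < 0x80) {
            dst[i] = (unsigned char)src[i];
        } else {
            dst[i] = DEFAULT_CHAR;
            if (lossy)
                *lossy = 1;
        }
    }
    return n;
}
```

Since the substitution is not reversible, converting the result back to Unicode yields an underscore rather than the original character, which is why a round trip through BCS can silently change a filename.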
When IFSMgr initializes, it loads conversion tables that map between its local
codepages (OEM and ANSI) and the corresponding subset of Unicode. Addresses
of these tables are returned by IFSMgr_GetConversionTablePtrs.
Each of the conversion services shown in Table 12-7 that contain BCS requires an
argument specifying one of the manifest constants BCS_OEM or BCS_WANSI to
select a codepage for the conversion. The services BCSToBCS and BCSToBCSUpper
require two such arguments, since these functions convert a string from
OEM to ANSI codepage or vice versa (the "Upper" version also uppercases the
destination string). The services UniToBCS and BCSToUni convert from Unicode
to BCS or vice versa. UniToBCSPath takes a ParsedPath structure representing a
canonicalized Unicode pathname and converts it to BCS. UniCharToOEM converts
a Unicode character to a character of the OEM codepage. UniToUpper converts a
Unicode string to upper case.
Filename Manipulation
There are three fundamental filename types which IFSMgr uses: Unicode FCB
Name, Unicode 8.3 Name, and Unicode Long Name. The FCB is an ancient MS-DOS
structure known as the file control block which contains a drive identifier,
filename, extension, file size, record size, various file pointers, and date and time
stamps. The filename is limited to 8 characters and padded with spaces; similarly,
the extension is limited to 3 characters and also padded with spaces. The Unicode
version of this name format is the same except that each character occupies 16
bits.
A Unicode Long Name is just a Unicode string. The dot character assumes no
special significance and is treated like any other character. A Unicode 8.3 Name is
a special case of a Unicode Long Name.
The services which IFSMgr supplies for manipulating these types of names are
shown in Table 12-8.
IFSMgr provides several services for converting one name type to another.
CreateBasis takes a Unicode Long Name and converts it into a Unicode FCB Name (the
"basis") according to a set of truncation and translation rules. FCBToShort
converts a Unicode FCB Name to a Unicode 8.3 Name, whereas ShortToFCB does
just the opposite. The service ShortToLossyFCB also translates a Unicode 8.3
Name to a Unicode FCB Name but uses only Unicode characters which are also
available in the OEM codepage. The AppendBasisTail service adds a "numeric tail"
to the 8 character filename portion of a Unicode FCB Name created by
CreateBasis. This function assures that after appending the numeric tail, the filename will
not exceed 8 bytes if it is converted to BCS. This service is used to create short
name aliases for long filenames. One thing that the short to FCB conversion
services fail to do is convert "*" into a sequence of "?" characters. You can detect
the presence of this wildcard character by examining the parsing flags; it will be
indicated with the FILE_FLAG_HAS_STAR bit. This becomes an issue with the
meta-matching services when short name matching semantics are being used. In
this matching mode, only the "?" character is treated as a wildcard ("*" is a literal
character).
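To make the name formats concrete, here is a minimal sketch of an 8.3-to-FCB conversion together with the "*" expansion that, as noted above, the IFSMgr services leave to the caller. Both function names are hypothetical; the real ShortToFCB service has its own calling convention and handles cases (such as "." and "..") that this sketch ignores.

```c
#include <stddef.h>
#include <stdint.h>

#define FCB_NAME_LEN 11   /* 8 filename characters + 3 extension characters */

/* Convert a NUL-terminated 16-bit 8.3 name such as README.TXT into the
   11-character, space-padded FCB layout. Returns 0 if a field overflows. */
static int short_to_fcb_sketch(const uint16_t *name83,
                               uint16_t fcb[FCB_NAME_LEN])
{
    size_t i = 0, pos;

    for (pos = 0; pos < FCB_NAME_LEN; pos++)
        fcb[pos] = ' ';

    /* Filename portion: everything up to the dot, at most 8 characters. */
    for (pos = 0; name83[i] != 0 && name83[i] != '.'; i++) {
        if (pos >= 8)
            return 0;
        fcb[pos++] = name83[i];
    }

    /* Extension portion: at most 3 characters, no second dot. */
    if (name83[i] == '.') {
        i++;
        for (pos = 8; name83[i] != 0; i++) {
            if (pos >= FCB_NAME_LEN || name83[i] == '.')
                return 0;
            fcb[pos++] = name83[i];
        }
    }
    return 1;
}

/* Replace '*' and the rest of its field with '?' characters. */
static void fcb_expand_star(uint16_t fcb[FCB_NAME_LEN])
{
    size_t i;
    for (i = 0; i < 8; i++)
        if (fcb[i] == '*')
            for (; i < 8; i++)
                fcb[i] = '?';
    for (i = 8; i < FCB_NAME_LEN; i++)
        if (fcb[i] == '*')
            for (; i < FCB_NAME_LEN; i++)
                fcb[i] = '?';
}
```

Under these assumptions the pattern A*.TXT first becomes the FCB name "A*      TXT" and, after expansion, "A???????TXT", which is the form that "?"-only matching expects.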
Filename Matching
Table 12-9 lists the filename matching services which IFSMgr provides.
When an FSD needs to search media for a matching filename or a set of filenames
that match a wildcard string, IFSMgr_MetaMatch is the service to use. This service
takes a pattern string, a filename to test, and flags which control the matching
semantics. If the pattern string and the filename to be tested are in Unicode FCB
format, then DOS matching semantics are specified. If the pattern string and file
name are Unicode Long or Unicode 8.3, then NT matching semantics are
specified. When matching a Unicode Long Name pattern against Unicode 8.3
Names, it may be necessary to append a trailing dot to the short name to get DOS
compatible match behavior.
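For names in Unicode FCB format, DOS-style matching reduces to a position-by-position comparison in which "?" matches any character, including a padding space. The following sketch shows that idea only; it is not the IFSMgr_MetaMatch interface, both function names are hypothetical, and names are assumed to be uppercased already.

```c
#include <stddef.h>
#include <stdint.h>

#define FCB_NAME_LEN 11

/* Helper for the example: widen an ASCII string into a space-padded
   11-character FCB name. */
static void fcb_from_ascii(const char *s, uint16_t out[FCB_NAME_LEN])
{
    size_t i;
    int ended = 0;
    for (i = 0; i < FCB_NAME_LEN; i++) {
        if (!ended && s[i] == '\0')
            ended = 1;
        out[i] = ended ? (uint16_t)' ' : (uint16_t)(unsigned char)s[i];
    }
}

/* Return 1 when every pattern position is '?' or equals the name. */
static int fcb_match_sketch(const uint16_t *pat, const uint16_t *name)
{
    size_t i;
    for (i = 0; i < FCB_NAME_LEN; i++)
        if (pat[i] != '?' && pat[i] != name[i])
            return 0;
    return 1;
}
```

A pattern of "A???????TXT" therefore matches "ABC     TXT" but not "ABC     DOC".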
of numeric tail). After testing the entries in a directory for matches with a basis
name, a list of numeric values already in use will be obtained. A new unique alias
Path Parsing
IFSMgr's path parsing services are listed in Table 12-10. The primary path parsing
service is IFSMgr_ParsePath. IFSMgr_FSDParsePath is a wrapper around IFSMgr_
The ir_data member of ifsreq holds the input path string which is to be parsed.
This string can be encoded as either BCS or Unicode. The ifs_nflags member
contains two bits which indicate the string type. If bit 0 is set, it contains charac
ters which are in the current OEM codepage, whereas if it is clear, characters
come from the current ANSI codepage. If bit 1 is clear, the string uses BCS
encoding, but if it is set, Unicode is used.
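The two ifs_nflags bits can be decoded as shown below. The macro and enumerator names are hypothetical (the DDK headers may use different ones), and the sketch assumes that bit 0 is irrelevant once bit 1 selects Unicode.

```c
/* Hypothetical names for the two string-type bits described in the text. */
#define NFLAG_OEM     0x01u   /* bit 0: set = OEM codepage, clear = ANSI */
#define NFLAG_UNICODE 0x02u   /* bit 1: set = Unicode, clear = BCS       */

enum path_charset { PATH_BCS_ANSI, PATH_BCS_OEM, PATH_UNICODE };

/* Classify the input path string from the low two bits of ifs_nflags. */
static enum path_charset classify_nflags(unsigned nflags)
{
    if (nflags & NFLAG_UNICODE)
        return PATH_UNICODE;
    return (nflags & NFLAG_OEM) ? PATH_BCS_OEM : PATH_BCS_ANSI;
}
```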
The parsing routines require some buffers for working space and to return the
ParsedPath data structure. If the ir_ppath member of ifsreq is initialized to
0xfffbffbb, then IFSMgr will assign the caller a buffer from its pool of parse
buffers. These buffers are reclaimed by IFSMgr when it performs cleanup after a
command is dispatched. You shouldn't use this facility if you are performing your
own cleanup since the internal functions which are needed are not available to
FSDs. The alternative is to pass in a pointer to your own buffer. You do this by
creating an 1820-byte allocation and assigning its address to both ir_ppath and
ifs_pbuffer.
The main result of a parsing operation is a canonicalized path stored in a Parsed
The return value of IFSMgr_ParsePath also contains information about the path;
the format of the doubleword which is returned is described by Table 12-11. The
DDK documentation only gives descriptions of the parsing flag values; it does not
mention the value returned in the low byte. This value classifies the path type.
Parsing Flag/Path Name Type    High Byte  Mid Word  Low Byte
FILE_FLAG_LONG_PATH            20h        x         x
FILE_FLAG_KEEP_CASE            10h        x         x
FILE_FLAG_HAS_DOT              08h        x         x
FILE_FLAG_IS_LFN               04h        x         x
Path name types:
Standard path                  x          x         0
?                              x          x         1
UNC Path                       x          x         2
Invalid Pathname¹              x          x         3
Path is Hooked                 x          x         4
Network Printer¹               x          x         5
Invalid Resource¹              x          x         6
Character FSD Device Name      x          x         7
DOS Device Name                x          x         8
¹ Thanks to Geoff Chappell for supplying these entries.
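Splitting the returned doubleword into its two parts is a matter of shifting and masking. In this sketch the PP_ names are hypothetical; the flag values are the high-byte values from Table 12-11 and the type codes are the low-byte values from the same table.

```c
#include <stdint.h>

/* High-byte parsing flags, using the values listed in Table 12-11. */
#define PP_FLAG_LONG_PATH 0x20u
#define PP_FLAG_KEEP_CASE 0x10u
#define PP_FLAG_HAS_DOT   0x08u
#define PP_FLAG_IS_LFN    0x04u

/* A few of the low-byte path type codes from Table 12-11. */
enum {
    PP_TYPE_STANDARD = 0,
    PP_TYPE_UNC      = 2,
    PP_TYPE_HOOKED   = 4,
    PP_TYPE_DOS_DEV  = 8
};

/* Parsing flags live in the high byte of the return value. */
static unsigned pp_flags(uint32_t rv)
{
    return (unsigned)(rv >> 24);
}

/* The path type classification lives in the low byte. */
static unsigned pp_path_type(uint32_t rv)
{
    return (unsigned)(rv & 0xFFu);
}
```

For example, a return value of 24000002h would describe a UNC path that is a long path containing a long filename.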
File Sharing
Table 12-12 lists IFSMgr's file sharing services.
These services fall into two categories. The first group is used by an FSD to
maintain a lock list for a file handle. IFSMgr is the actual keeper of the active lock list
for a file. To add a lock to a file, IFSMgr_LockFile is called like this:
    IFSMgr_LockFile(&pFSDLockList, pir->ir_pos, pir->ir_locklen,
                    pir->ir_pid, pir->ir_fh, pir->ir_options)
This call is shown as it might be made from an FSD's FS_LockFile function, which
receives a pointer to the ioreq structure in pir. As you can see, in addition to the
lock's starting position and length, the process, file open instance, and lock
options are recorded as well. The variable pFSDLockList holds the return value,
the head of the lock list for this file. Typically, this would be stored as part of a
data structure that is associated with the open file instance. IFSMgr_UnlockFile
removes a single lock; it must be called with the same parameters that were used
in the IFSMgr_LockFile call. There are occasions when all locks must be removed
from a single file open instance or all file open instances, such as closing a file or
deleting a file. To handle this situation, use IFSMgr_RemoveLocks. Before
touching a locked region of a file, an FSD should call IFSMgr_CheckLocks to see
if a read or write operation would violate any active locks. Finally, IFSMgr_Count
Locks gives an FSD a means of counting the number of active locks on an open
file instance.
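The kind of test that a lock check performs can be sketched as a walk over a linked list of byte ranges. The structure below is a hypothetical model, not IFSMgr's lock list format, and it makes the simplifying assumption that a lock never conflicts with its own process; the real service also takes the file open instance and lock options into account.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lock record: a byte range owned by one process. */
struct lock_sketch {
    uint32_t pos;               /* starting offset of the locked range */
    uint32_t len;               /* length of the locked range          */
    uint32_t pid;               /* owning process                      */
    struct lock_sketch *next;   /* next lock on the same file          */
};

/* Return 1 if an access to [pos, pos+len) by process pid overlaps a
   range locked by some other process, otherwise 0. */
static int lock_conflicts(const struct lock_sketch *head,
                          uint32_t pos, uint32_t len, uint32_t pid)
{
    uint64_t b0 = pos, b1 = b0 + len;
    for (; head != NULL; head = head->next) {
        uint64_t a0 = head->pos, a1 = a0 + head->len;
        if (head->pid != pid && a0 < b1 && b0 < a1)
            return 1;
    }
    return 0;
}
```

The range ends are computed in 64 bits so that a lock reaching the top of the 32-bit offset space does not wrap around.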
The services IFSMgr_UnassignLockList and IFSMgr_ReassignLockList are used for
saving and restoring locks for files which are temporarily closed during a level 3
volume lock. A level 3 lock prevents all processes except the lock owner from
reading or writing to the disk. In preparation for entering this mode, the files on
the volume are closed with a special ir_options flag
(FILE_CLOSE_FOR_LEVEL3_LOCK). On a normal close, the FSD would call
IFSMgr_RemoveLocks, but when it receives this flag it should save the lock list
for each file by calling IFSMgr_UnassignLockList. Later, when the level 3 volume
lock is relinquished, a special ir_options flag (OPEN_FLAGS_REOPEN) is specified
for each file as it is reopened. As
part of opening the file, the FSD needs to restore any locks that previously
existed; IFSMgr_ReassignLockList retrieves the necessary information.
See Chapter 8 for details on using IFSMgr_CheckAccessConflict.
Plug-and-Play
Table 12-13 lists IFSMgr's plug-and-play services.
Three of these functions are called by IOS (I/O Supervisor) to query or report a
change in state of a plug-and-play drive. NotifyVolumeArrival reports the appear
ance of a new drive to IFSMgr, NotifyVolumeRemoval reports the removal of a
drive, and QueryVolumeRemoval checks the status of a drive prior to removing it.
_VolFlush is also included under plug-and-play services, since it is usually neces
sary to flush dirty buffers to a volume before removing it from the system. This
service takes a volume number and an optional flag which forces any cached data
to be discarded. This service ultimately results in a FS_FlushVolume call to the
volume's FSD.
resources, plug-and-play drives, and network transports. Drivers that register with
the Configuration Manager through CONFIGMG_Register_Device_Driver supply a
callback entry point that receives these PNP broadcasts. These broadcasts are also
sent to applications via the WM_DEVICECHANGE message.*
Win32 Support
Table 12-14 lists IFSMgr's Win32 support services.
The Win32 Support services all carry the warning: "This service is intended solely
for the purpose of the Win32 subsystem. It should not be used by any other VxD
in the system." OK, you've been warned.
* For an excellent discussion of plug-and-play and the configuration manager, see Chapters 11 and 12 of
Systems Programming for Windows 95 by Walter Oney. His book also includes a useful spy utility which
monitors WM_DEVICECHANGE messages.
Ring-0 File I/O
IFSMgr_Ring0_FileIO is essentially a ring-0 interrupt 21h interface. You load the
EAX, EBX, ECX, EDX, and ESI registers with parameters, invoke the function, and
get the results in the EAX and ECX registers and in buffers referred to by the
input registers. As in the Int 21h interface, the AH portion of the EAX input register
holds the function number. Only 15 major functions are supported, some of
which have subfunctions; these are listed in Table 12-16. See the DDK
documentation for details on register usage for each function.
As with the protected-mode and virtual-86 mode Int 21h handler, a preamble is
called on each ring-0 Int 21h function. If the preamble returns with carry set, the
function is not dispatched. Note that the preamble functions for the ring-0
interface cannot be modified using IFSMgr_SetReqHook. For many of the functions,
the R0_Default preamble is used, which simply clears the carry flag and returns,
allowing the function to be dispatched. Functions which receive a pathname as
an argument call R0_MapPath, which in turn calls an Int 21h preamble which uses
Map_Flat to convert DS:DX into linear addresses and possibly run the path
through IFSMgr_ParsePath. When this preamble is called from the ring-0 interface,
however, it does nothing. The only preambles which actually test the input
parameters are R0_DriveChk1 and R0_DriveChk2, and they only validate the zero-based
drive number. So you need to heed the DDK warning: "Users of this service
should be very careful to check that they are passing in valid parameters."
Table 12-16 also enumerates the dispatch routines which are invoked for each
ring-0 function. For most of the functions, a common dispatch routine is shared
by the ring-0 interface and the PM/V86 mode Int 21h handler. The dispatch
routines which are unique to the ring-0 interface have names which begin with
dR0. These routines reside in locked code.
Miscellaneous
Table 12-17 lists IFSMgr's services which don't fall into one of the other categories.
Debugging
Table 12-18 lists IFSMgr's debugging services.
To aid our exploration of VREDIR, two new monitors for MultiMon are intro
duced. The first is a NetBIOS monitor that displays all calls through VNETBIOS;
the second is a monitor that displays the types of SMB packets passing through
NetBIOS. While they aren't a substitute for a LAN protocol analyzer or "packet
sniffer," they have the advantage of integrating well with our IFSMgr Filehook
monitor so we can relate file system requests to the resultant network activity.
276 Chapter 13: VREDIR: The Microsoft Networks Client
transport protocols: NetBEUI, TCP/IP, or IPX/SPX (or any transport that supports
NetBIOS). The last two require shims to convert the NetBIOS request into a form
amenable to TCP/IP or IPX/SPX. These protocols frame the SMB packet or trans
ferred data with appropriate headers and trailers before passing it to the NDIS
driver. Incoming packets wend their way up to VNETBIOS which notifies clients
of completed requests and the receipt of data. Since VREDIR is the Microsoft
Networks client, it does not accept requests from other systems; VSERVER fulfills
that role.
The two interfaces in Figure 13-1 which we are most interested in are the IFSMgr/
VREDIR and VREDIR/VNETBIOS boundaries. IFSMgr and VREDIR use the stan
dard FSD linkage which we explored in Chapter 8, Anatomy of a File System
Driver. For VREDIR to establish a connection to a remote "share," there must be a
server on a remote computer which is sharing it. Although peer-to-peer Windows
95 networks would rely on VSERVER to provide these shares, many other SMB
server possibilities exist, including Windows for Workgroups, LAN Manager,
Windows NT, OS/2, and UNIX/Linux workstations running SAMBA. To represent
a connection, IFSMgr creates a shell resource on the client computer. For
instance, suppose a single server exposes two different directories as shares with
UNC names \\SERVER\DESKTOP (local directory: c:\windows\Desktop) and
\\SERVER\PGMS (local directory: c:\Program Files). If a file is opened in each
directory from a remote computer using full UNC paths, two shell resources will
be created, one for each shared resource connection. On the other hand, if we
were to open two files in the remote directory \\SERVER\DESKTOP only a single
shell resource would be required. In either case, two file handles are needed.
VREDIR Interfaces
The upper side of VREDIR communicates with IFSMgr via the function table inter
face. Network FSDs populate their function tables with somewhat different
routines than a local FSD. Since shared resources may be of several different
types, open operations on these resources may return addresses to one of several
handle-based function tables. The lower side of VREDIR needs to communicate
with the local area network. There are two levels at which this is done. The first is
concerned with the mechanics of sending and receiving packets to specific
servers on the net; this is taken care of by the NetBIOS interface, which we
examine here. The second level concerns the content of these packets, i.e., format
ting the packets according to the protocol expected by the server. This is taken
care of by the SMB file sharing protocol which we'll examine in the next section.
The functions which are listed in bold characters are implemented by VREDIR.
Note that FS_Ioctl16Drive, FS_GetDiskParms, and FS_DASDIO are not imple
mented in a network FSD but FS_NamedPipeUNCPipeRequest is. This is in
Table 13-1 shows the handle-based functions (in bold) for each resource type.
The FS_ReadFile and FS_WriteFile functions at the top use different routines
depending on the open access mode. A "deny" entry means that the function
both sets ir_error to ERROR_ACCESS_DENIED and returns that error code. A
"zero" entry means that the function sets ir_length to zero and it returns success.
We can see from the table that all resource types use a common FS_CloseFile
function, close1. For RESTYPE_WILD and RESTYPE_CHARDEV resources, FS_CloseFile
pointer to the NCB. The way that VxDs use NetBIOS is to load the linear address
of the NCB in EBX and call the service VNETBIOS_Submit.
The Network Control Block which is used to request NetBIOS services has the
following layout:
typedef struct _NCB {
    UCHAR  ncb_command;             /* 00 command code                  */
    UCHAR  ncb_retcode;             /* 01 return code                   */
    UCHAR  ncb_lsn;                 /* 02 local session number          */
    UCHAR  ncb_num;                 /* 03 number of our network name    */
    PUCHAR ncb_buffer;              /* 04 address of message buffer     */
    WORD   ncb_length;              /* 08 size of message buffer        */
    UCHAR  ncb_callname[NCBNAMSZ];  /* 0A blank-padded name of remote   */
    UCHAR  ncb_name[NCBNAMSZ];      /* 1A our blank-padded netname      */
    UCHAR  ncb_rto;                 /* 2A rcv timeout/retry count       */
    UCHAR  ncb_sto;                 /* 2B send timeout/sys timeout      */
    void (*ncb_post)(struct _NCB *); /* 2C POST routine address         */
    UCHAR  ncb_lana_num;            /* 30 lana (adapter) number         */
    UCHAR  ncb_cmd_cplt;            /* 31 0xff => command pending       */
    UCHAR  ncb_reserve[10];         /* 32 reserved, used by BIOS        */
    HANDLE ncb_event;               /* 3C HANDLE to Win32 event which   */
                                    /* will be set to the signalled     */
                                    /* state when an ASYNCH command     */
                                    /* completes                        */
} NCB, *PNCB;
This definition comes from the Win32 SDK header file nb30.h; an equivalent
header is not provided in the DDK. Several fields in this structure are used in
every NetBIOS command; others are only needed for certain commands. The
NetBIOS commands are grouped into four broad categories: name support,
datagram support, session support, and utility. The manifest constants which are used
here to refer to NetBIOS commands are defined in the header file nb30.h. The
name commands add and remove names from the local name table. The first
name in this table is the local node name or MAC address and cannot be deleted.
A name is added to the table with the command NCBADDNAME but only if it is
verified to be unique on the LAN. Each name is subsequently referred to by its
index in the local name table. A name is removed from the local name table with
NCBDELNAME. A non-unique group name may also be added to the local name
table using the command NCBADDGRNAME. In order for this command to
succeed, the group name must not have already been claimed as a unique name
on the LAN. Group names are intended to be registered by more than one
network node.
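Preparing an NCB for a name command mostly consists of filling in the command code and a blank-padded name. The sketch below mirrors the 32-bit layout shown earlier with fixed-width integers so that it compiles anywhere; the struct ncb32 type and both helper functions are hypothetical, and only the command value NCBADDNAME (30h) is taken from nb30.h. Submitting the block through VNETBIOS_Submit is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NCBNAMSZ   16     /* length of a NetBIOS name field   */
#define NCBADDNAME 0x30   /* add a unique name to local table */

/* Stand-in for the 32-bit NCB layout; pointers and handles are held
   as 32-bit values, as they would be in a Windows 95 VxD. */
struct ncb32 {
    uint8_t  ncb_command;
    uint8_t  ncb_retcode;
    uint8_t  ncb_lsn;
    uint8_t  ncb_num;
    uint32_t ncb_buffer;
    uint16_t ncb_length;
    uint8_t  ncb_callname[NCBNAMSZ];
    uint8_t  ncb_name[NCBNAMSZ];
    uint8_t  ncb_rto;
    uint8_t  ncb_sto;
    uint32_t ncb_post;
    uint8_t  ncb_lana_num;
    uint8_t  ncb_cmd_cplt;
    uint8_t  ncb_reserve[10];
    uint32_t ncb_event;
};

/* Copy name into a 16-byte field, padding with blanks (not NULs). */
static void ncb_pad_name(uint8_t dst[NCBNAMSZ], const char *name)
{
    size_t n = strlen(name);
    if (n > NCBNAMSZ)
        n = NCBNAMSZ;
    memset(dst, ' ', NCBNAMSZ);
    memcpy(dst, name, n);
}

/* Zero the block and fill in the fields an add-name request needs. */
static void ncb_prepare_add_name(struct ncb32 *ncb, const char *name,
                                 uint8_t lana)
{
    memset(ncb, 0, sizeof *ncb);
    ncb->ncb_command  = NCBADDNAME;
    ncb->ncb_lana_num = lana;
    ncb_pad_name(ncb->ncb_name, name);
}
```

With natural alignment, the member offsets of this stand-in structure agree with the offsets given in the comments of the NCB listing (for example, ncb_post at 2Ch and ncb_event at 3Ch).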
* For more information see How to Use LANA Numbers in a 32-bit Environment, Microsoft Knowledge
Base article Q138037. See http://www.microsoft.com/kb/articles/q138/0/37.htm.
The NetBIOS utility commands include NCBRESET, which resets the NetBIOS
name and session tables and aborts any existing sessions; NCBCANCEL, which
cancels a specified NetBIOS command; and NCBASTAT, which requests status of a
local or remote adapter. NCBASTAT can be used to retrieve the MAC address of
an adapter.*
MultiMon's NetBIOS monitor only sees commands which are issued through
VNETBIOS_Submit. The driver name for this monitor is nbhook.vxd. We will be
using this monitor in a following section to trace VREDIR's operation.
This has been a condensed overview of NetBIOS. For more, see C Programmer's
Guide to NetBIOS, by W. David Schwaderer (Howard Sams & Co. , 1988).
The SMB File Sharing Protocol
since then to become the native file-sharing protocol for LAN Manager, Windows
NT, OS/2, and Windows 95. UNIX and Linux platforms can also become SMB
servers and clients by installing the SAMBA suite. SAMBA is available via FTP
* See Getting the MAC Address for an Ethernet Adapter, Microsoft Knowledge Base Article Q118623. See
http://www.microsoft.com/kb/articles/q118/6/23.htm.
† From the draft document Microsoft Networks SMB File Sharing Protocol, Document Version 6.0p, Jan. 1,
1996, Microsoft Corp.
};
    USHORT Tid;                        // 18 Tree identifier
    USHORT Pid;                        // 1A Caller's process id
    USHORT Uid;                        // 1C Unauthenticated user id
    USHORT Mid;                        // 1E Multiplex id
    UCHAR  WordCount;                  // 20 Count of parameter words
    // The remaining fields depend upon command type
    USHORT ParameterWords[WordCount];  // The parameter words
    USHORT ByteCount;                  // Count of bytes
    UCHAR  Buffer[ByteCount];          // The bytes
} SMB_HEADER;
calling process. A client uses the Pid value in a response message block to sort
out which process the server is responding to. A Mid would be used by a multi
threaded client to identify a thread within a process. It allows for multiplexing
multiple message blocks on the same connection. A Uid is returned in a server
response message block as an identifier representing a validated account name
and password. Uids are only returned by user level servers but not by share level
servers. A share level server simply makes a resource available on the network to
any client which knows its name; password protection is optional. The last fixed
member in the header is WordCount. It tells us the number of intervening words
between it and the member ByteCount. ByteCount tells us the number of bytes
until the end of the message block.
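Because ParameterWords and Buffer are variable in length, the tail of a message block has to be located at run time rather than through a fixed C structure. The following sketch reads the identifier fields at the offsets shown in the listing (18h through 20h) and then finds ByteCount; SMB fields are little-endian. The struct smb_ids type and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* The identifier fields and counts pulled out of one message block. */
struct smb_ids {
    uint16_t tid, pid, uid, mid;
    uint8_t  word_count;
    uint16_t byte_count;
};

/* Read a little-endian 16-bit value. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Parse the fixed identifier fields, then step over the parameter words
   to find ByteCount. Returns 0 if the block is too short to be valid. */
static int smb_parse_ids(const uint8_t *msg, size_t len, struct smb_ids *out)
{
    size_t bc_off;

    if (len < 0x21)
        return 0;
    out->tid = rd16(msg + 0x18);
    out->pid = rd16(msg + 0x1A);
    out->uid = rd16(msg + 0x1C);
    out->mid = rd16(msg + 0x1E);
    out->word_count = msg[0x20];

    /* ByteCount follows WordCount 16-bit parameter words. */
    bc_off = 0x21 + 2u * (size_t)out->word_count;
    if (len < bc_off + 2)
        return 0;
    out->byte_count = rd16(msg + bc_off);

    /* The byte buffer must fit in what was received. */
    return len >= bc_off + 2 + out->byte_count;
}
```

A monitor that only wants to pair requests with responses can stop after the four identifiers; the length checks matter as soon as the parameter words or the buffer are examined.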
PC NETWORK PROGRAM 1.0
MICROSOFT NETWORKS 3.0
DOS LM1.2X002
DOS LANMAN2.1
Windows for Workgroups 3.1a
NT LM 0.12
There are something like 10 different dialects of the SMB protocol. When a client
claims compatibility with a certain dialect, it is also claiming compatibility with
that dialect's precursors. Table 13-2 indicates the major dialect in which a
command was introduced.
The most basic dialect is that named PCNET PROGRAM 1.0. This is also called the
"core protocol" because it is the minimum SMB implementation. The next
significant expansion of the protocol occurred with LANMAN 1.0. The other dialects
listed in Table 13-2 are LM1.2X002 for Lan Manager 2.0 and NT LM 0.12, Lan
Manager 2.0 for Windows NT. Windows 95 supports this "highest" dialect.
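Dialect negotiation can be pictured as the server picking one entry from the list of dialect strings the client offers and answering with its index. The sketch below assumes the client lists dialects from least to most capable, so the highest matching index wins; the function name is hypothetical and no claim is made about how any particular server implements the choice.

```c
#include <string.h>

/* Return the index (within the client's offered list) of the most capable
   dialect the server also supports, or -1 if there is none. Assumes the
   offered list is ordered from oldest to newest dialect. */
static int select_dialect(const char *const *offered, int n_offered,
                          const char *const *supported, int n_supported)
{
    int i, j;
    for (i = n_offered - 1; i >= 0; i--)
        for (j = 0; j < n_supported; j++)
            if (strcmp(offered[i], supported[j]) == 0)
                return i;
    return -1;
}
```

Given the six dialect strings listed earlier, a server that only understands "PC NETWORK PROGRAM 1.0" and "DOS LM1.2X002" would select index 2.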
The names of the commands provide some hint as to what they do. For instance,
SMB_COM_OPEN opens a file on the server, SMB_COM_QUERY_INFORMATION
gets file attributes for a file on a server, and SMB_COM_TREE_CONNECT
establishes a connection to a shared directory (or "tree") on the server. You'll notice
many commands have the suffix "ANDX". These commands support a form of
command batching in which a single message block contains more than one
command. For instance, SMB_COM_OPEN_ANDX will open a file and possibly do
commands "X", where additional commands are defined by fields in the
parameter section of the message block.
Message Flow
To get a feel for how the SMB protocol is used, let's follow the steps taken in
response to a simple Win32 program that performs these statements:
    hFile = CreateFile("\\\\WETSUIT\\DESKTOP\\Notes.doc",
                       GENERIC_READ, 0, NULL, OPEN_EXISTING, 0, NULL);
    size = GetFileSize(hFile, NULL);
    ReadFile(hFile, pBuf, size, &actual, NULL);
    CloseHandle(hFile);
Table 13-3 shows the exchange of messages between client and server when this
code executes. The first six lines in the table correspond to the single Win32
CreateFile call. If a connection does not already exist with the specified server
(WETSUIT) then a session is established using SMB_COM_NEGOTIATE and
SMB_COM_SESSION_SETUP_ANDX. If these commands succeed then a connection is
cols used on the Internet today are the Hypertext Transfer Protocol (HTTP) and
the File Transfer Protocol (FTP). HTTP is a read-only protocol and FTP is for
transferring complete files. CIFS would provide file sharing with read-write access and
thus support collaborative work on files across the Internet. The SMB protocol,
upon which CIFS is based, already implements a variety of locking and security
features which give clients more optimized access to server files than HTTP or
FTP. CIFS is also intended to give all applications access to files on the Internet,
The full specification for CIFS/1.0 has been submitted to the Internet Engineering
Task Force (IETF) as an Internet draft document and is available via FTP from ftp://
this time); a web browser is required to interpret them. The second example the
specification gives is a UNC name, such as \\corpserver\public\policy.doc. Here
again, the server name is delimited by the leading double slashes and the next
slash, and everything after that is the relative name, i.e., corpserver and
public\policy.doc, respectively. In the specification's final example, a drive letter
is mapped to a server and relative name through a lookup table. For instance, if
drive x: is mapped to the server corpserver and the relative name is public, then
the name x:\policy.doc is equivalent to our previous example.
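The UNC case described above can be sketched as a simple split at the first backslash after the leading pair. The function name and buffer-passing convention are hypothetical; the drive letter case would first need the lookup table mentioned in the text and is not handled here.

```c
#include <stddef.h>
#include <string.h>

/* Split a UNC name of the form \\server\relative\name into its server
   name and relative name. Returns 1 on success and 0 if the input is not
   a UNC name or a piece does not fit in the caller's buffer. */
static int split_unc(const char *unc, char *server, size_t ssz,
                     char *rel, size_t rsz)
{
    const char *p, *sep;
    size_t n;

    if (unc[0] != '\\' || unc[1] != '\\')
        return 0;

    /* The server name runs from the double slash to the next slash. */
    p = unc + 2;
    sep = strchr(p, '\\');
    n = sep ? (size_t)(sep - p) : strlen(p);
    if (n == 0 || n >= ssz)
        return 0;
    memcpy(server, p, n);
    server[n] = '\0';

    /* Everything after that slash is the relative name. */
    p = sep ? sep + 1 : p + n;
    if (strlen(p) >= rsz)
        return 0;
    strcpy(rel, p);
    return 1;
}
```

Applied to \\corpserver\public\policy.doc, this yields the server name corpserver and the relative name public\policy.doc.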
Once a server name is extracted from a client URL or UNC name, it needs to be
converted to a server transport address. Again, this is not a part of the CIFS
specification. Traditionally, the SMB protocol is implemented using the NetBIOS API
and so a server name would be limited by NetBIOS naming conventions (i.e., up to 15
characters and uppercase). However, CIFS is really targeted at servers out on the
Internet and server names should be resolved using DNS (the Domain Name
System). The CIFS specification also notes that a server name may be given using
dotted decimal notation, as in 157.33.135.101. In this case, the server transport
address is simply its 32-bit IP address.
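As a rough sketch of the dotted decimal case, the following C function packs the four octets into a 32-bit value with the first octet in the most significant byte. The name dotted_to_u32 is made up for this example, and the byte order a transport actually expects on the wire is not addressed here.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: convert text such as "157.33.135.101" to a 32-bit
   value, most significant octet first. Returns 0 on success, -1 on bad input. */
int dotted_to_u32(const char *text, uint32_t *out)
{
    uint32_t value = 0;
    int i;

    for (i = 0; i < 4; i++) {
        char *end;
        unsigned long octet;

        if (*text < '0' || *text > '9')
            return -1;                  /* each octet must start with a digit */
        octet = strtoul(text, &end, 10);
        if (octet > 255)
            return -1;
        value = (value << 8) | (uint32_t)octet;
        if (i < 3) {
            if (*end != '.')
                return -1;              /* octets are separated by dots */
            text = end + 1;
        } else if (*end != '\0') {
            return -1;                  /* nothing may follow the fourth octet */
        }
    }
    *out = value;
    return 0;
}
```

For the address in the text, 157.33.135.101, the octets are 9Dh, 21h, 87h, and 65h, so the packed value is 9D218765h.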
A connection is established with session service TCP port 139 of the server by
sending a session request packet. This packet contains a calling name and called
name. The calling name is used to distinguish clients using the same transport
address. The called name is the invalid NetBIOS name *SMBSERVER padded with
spaces to 15 characters. A CIFS server should accept a session request with this
called name. Note that CIFS is using NetBIOS on top of TCP as detailed in RFC
1001/1002.*
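The padding rule for the called name is easy to get wrong by one, so here is a minimal sketch of it. The function make_called_name and the constant CALLED_NAME_LEN are illustrative names only; the sketch builds the padded text and does not attempt the name encoding that RFC 1001/1002 applies before the name goes on the wire.

```c
#include <string.h>

#define CALLED_NAME_LEN 15   /* padded length given in the text */

/* Hypothetical helper: fill out with "*SMBSERVER" followed by spaces up to
   15 characters, then a terminating NUL. out must hold 16 bytes. */
void make_called_name(char out[CALLED_NAME_LEN + 1])
{
    static const char base[] = "*SMBSERVER";
    size_t n = sizeof base - 1;          /* 10 characters without the NUL */

    memset(out, ' ', CALLED_NAME_LEN);   /* start with all spaces */
    memcpy(out, base, n);                /* overwrite the front with the name */
    out[CALLED_NAME_LEN] = '\0';
}
```

The result is the ten characters of *SMBSERVER followed by five spaces.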
Once the connection is established with the server, the flow of SMB commands
would follow the same pattern as we saw in the previous section, "Message
Flow."†
* See Karl Auerbach, Protocol Standard for a NetBIOS Service on a TCP/UDP Transport: Concepts and
Methods, RFC 1001, March 1987; and Protocol Standard for a NetBIOS Service on a TCP/UDP Transport:
Detailed Specifications, RFC 1002, March 1987.
† For a readable account of the CIFS/SMB protocol's various types of locks (opportunistic locks, exclusive
locks, batch oplocks, and level II oplocks) see the article by Paul Leach and Dan Perry, CIFS: A Common
Internet File System, in Microsoft Interactive Developer, November 1996 (this article can be viewed online
at http://www.microsoft.com/mind).
Tracing VREDIR Operations
monitors that were used to collect this trace were Int21 Win32 Service (w21),
IFSMgr Filehook (fsh), NetBIOS Calls (ncb), and SMB Packets (smb):

Monitor  Function  Status  Device  Handle  Parameters

CreateFile
w21  LFN (71) Extended Open (6c)
       \\WETSUIT\DESKTOP\Notes.doc
ncb  Call async  Lana=07  c16b7640  Callname: WETSUIT
ncb  Call post (00)  c16b7640  LSN: 07 *
ncb  Send async  Lana=07  c16b7640  LSN: 07
       Buffer: c3a743e4 (009a)
smb  NEGOTIATE request  c16b7640
ncb  Send post (00)  c16b7640
ncb  Send async  Lana=07  c16b7640  LSN: 07
       Buffer: c3a743e4 (008e)
smb  SESSION_SETUP_ANDX
     TREE_CONNECT_ANDX request  c16b7640  \\WETSUIT\DESKTOP
ncb  Send post (00)  c16b7640
ncb  Send async  Lana=07  c16b7640  LSN: 07
       Buffer: c3a743e4 (004c)
smb  OPEN_ANDX request  c16b7640  Notes.doc
ncb  Send post (00)  c16b7640
fsh  FS_OpenFile (6c)  VREDIR  2f2  * \NOTES.DOC oe

GetFileSize
w21  Seek (42)  2f2  (1) offs=0
fsh  FS_FileSeek (42)  VREDIR  2f2  ofs=0H b
w21  Seek (42)  2f2  (2) offs=0
fsh  FS_FileSeek (42)  VREDIR  2f2  ofs=0H e
w21  Seek (42)  2f2  (0) offs=0
fsh  FS_FileSeek (42)  VREDIR  2f2  ofs=0H b

ReadFile
w21  Read (3f)  2f2  cnt=4800
       buf=13f:d934
ncb  Receive async  Lana=07  c16b7640  LSN: 07
       Buffer: c3ab7934 (4800)
ncb  Send async  Lana=07  c16b76e0  LSN: 07
       Buffer: c3a743e4 (0033)
smb  READ_RAW request  c16b76e0
ncb  Send post (00)  c16b76e0
ncb  Receive post (00)  c16b7640
fsh  FS_ReadFile (d6)  VREDIR  2f2  cnt=4800H ofs=0H
       ptr=65d934H

CloseHandle
w21  Close (3e)  2f2
ncb  Send async  Lana=07  c16b7640  LSN: 07
       Buffer: c3a743e4 (0029)
smb  CLOSE request  c16b7640
fsh  FS_CloseFile (3e)  VREDIR  2f2  f
ncb  Send post (00)  c16b7640
Chapter 13: VREDIR: The Microsoft Networks Client
The output has been grouped into four sections, one section for each Win32
function call.
Beginning with the CreateFile call, we see that it gets passed to VWIN32 where it
is dispatched as a protected-mode Int 21h function 716ch. This function
will enter IFSMgr through the dispatch function which we named dOpenCreate
(see Chapter 6, Dispatching File System Requests). As dOpenCreate prepares an
ifsreq structure, it generates a canonicalized pathname by a call to
IFSMgr_ParsePath. As we saw in Chapter 7, Monitoring File Activity, this service will
establish a connection to a server and share using IFSMgr_SetupConnection, if it is
passed a UNC path. VREDIR is called at this point through its FS_ConnectNetResource
entry point, but this doesn't show up in our trace because the call is made directly
through the table of registered FSDs (ConnectNetTable) and not through the
system filehooks.
The first action that we see VREDIR take is to make a NetBIOS Call to the
specified server, in this case WETSUIT. The line in the trace indicates that this
function call was made asynchronously to LANA 7 using an NCB at address c16b7640h.
The next line of the trace shows that this command has completed successfully
(post (0)) and a Local Session Number of 7 has been assigned to this connection
with WETSUIT.
Now that a session has been established, VREDIR does a NetBIOS Send, reusing
the same NCB at c16b7640h. This NCB contains a pointer to a buffer at c3a743e4h
which is 9ah bytes in size. This buffer contains the message block for the
SMB_COM_NEGOTIATE command which is sent to the session partner of LSN 7
(WETSUIT). Again this is an asynchronous command, and we see it complete two
lines down where its matching post (0) is recorded. At this stage, we have
notified WETSUIT about the dialects of SMB which we support. The next NetBIOS
Send command transfers a message block containing a batched command
consisting of SMB_COM_SESSION_SETUP_ANDX and
SMB_COM_TREE_CONNECT_ANDX. The latter command creates a connection to the
subdirectory \\WETSUIT\DESKTOP and returns a Tid which is used in subsequent
commands which reference this server and share. When this command completes, we have
seen the last action taken on behalf of FS_ConnectNetResource. From this we see
that VREDIR needs to keep at least two pieces of information about this
connection, its LSN and its Tid. The resource handle (ir_rh) which VREDIR returns to
IFSMgr retains this and other state information. IFSMgr in turn builds its own shell
resource structure (shres) to represent the connection.
The last NetBIOS Send, under the CreateFile section, transfers a message block
containing an SMB_COM_OPEN_ANDX command. This requests that the server
WETSUIT open the file named Notes.doc on the Tid for this connection. This
action is taken in response to a call to VREDIR's FS_OpenFile entry point. The
trace output line for this call occurs after the NetBIOS activity, because the file
hook reports function calls after they complete. Just as the resource handle retains
VREDIR's information about a connection, VREDIR's returned file handle (ir_fh)
retains information about this open file. This would include things such as the Fid
(file identifier) returned by the SMB_COM_OPEN_ANDX command, its open
mode, and various file attributes. When VREDIR returns, IFSMgr builds its own
file handle structure (fhandle) and assigns it an extended handle of 2f2h.
GetFileSize is implemented as three Int 21h function 42xxh calls via VWIN32. The
first seek (method 1, offset 0) moves the file pointer zero bytes from its current
position, which simply reports where the pointer is. Then a seek (method 2) is
performed to the end of the file to determine its maximum byte position; then
the file pointer is restored (method 0) to the beginning of the file. Although
VREDIR's FS_FileSeek entry point is called on each of these seeks, VREDIR refers
to information stored in its file handle structure to satisfy the requests.
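The same three-step pattern can be tried out with the standard C library, which makes a convenient stand-in for the Int 21h seeks. This is an analogy only: size_by_seeking is a made-up name, and ftell/fseek are used here in place of the function 42h calls that VWIN32 actually issues.

```c
#include <stdio.h>

/* Hypothetical helper: return the size of an open binary stream using three
   position operations, or -1 on error. The caller's position is preserved. */
long size_by_seeking(FILE *f)
{
    long saved, size;

    saved = ftell(f);                     /* 1: learn the current position */
    if (saved < 0)
        return -1;
    if (fseek(f, 0L, SEEK_END) != 0)      /* 2: seek to the end of the file */
        return -1;
    size = ftell(f);                      /* the end position is the size */
    if (fseek(f, saved, SEEK_SET) != 0)   /* 3: put the pointer back */
        return -1;
    return size;
}
```

In the trace the file had just been opened, so the saved position was 0 and the final seek returned the pointer to the beginning of the file.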
ReadFile becomes an Int 21h function 3fh call passed to IFSMgr via VWIN32. This
call then gets passed to the FS_ReadFile entry point of VREDIR. The first action
we see taken is to initiate an asynchronous NetBIOS Receive command for 4800h
bytes on LSN 7. While this Receive is pending, a NetBIOS Send transfers a
message block containing an SMB_COM_READ_RAW command to the server. We
see the Send carrying the read command finish first, followed by the Receive. The
underlying protocol handles the assembly of incoming data packets into the 4800h
byte buffer.
Finally, CloseHandle becomes an Int 21h function 3eh call passed to
IFSMgr via VWIN32. This call then gets passed to the FS_CloseFile entry point of
VREDIR. The NetBIOS Send transfers a message block containing an
SMB_COM_CLOSE command for the Fid returned by the earlier SMB_COM_OPEN_ANDX
command.
Mailslots
The simplest type of interprocess communication (IPC) which VREDIR and
IFSMgr support is the mailslot. A mailslot user plays one of two roles. The
mailslot server creates the mailslot and only reads from it. The mailslot client
opens the mailslot and only writes to it. A single process may be both a mailslot
client and server. Data is transferred as datagrams and thus its arrival is not
guaranteed.
Registering a mailslot
ir_flags
0, create mailslot; 1, delete mailslot; 2, write mailslot
ir_options
1, first mailslot create; >1, subsequent create
ir_ppath
canonicalized UNC mailslot name without the leading \MAILSLOT\ component
ir_data
supplies address of function to be used for mailslot reads
ir_aux1
IFSMgr's mailslot handle (address of mailslot block)
ir_pos
TRUE, call originated in an FSD; FALSE, call originated in User API
ir_hfunc
pointer to handle function table
ifs_psr
pointer to IFSMgr's mailslot shell resource
ir_aux1
on return, contains mailslot handle created by IFSMgr
ir_error
on return, contains error code (0 if successful)
In Chapter 8 we examined the mounting and connecting functions used by local,
network, and character FSDs. In these cases, the FS_MountVolume or
FS_ConnectNetResource functions always returned a volume-based function table. We
don't see that with mailslots; furthermore, the shell resource structure for mailslots
sets sr_func to NULL. Mailslots which are created using Win32 and MS-DOS APIs are
represented by an SFT-backed DOS file handle. The fhandle structure associated
with this file handle holds the handle-based function table in the member fh_hf.
The functions which a mailslot implements are FS_ReadFile, FS_WriteFile,
FS_CloseFile, FS_FileDateTime, and FS_NetHandleInfo.
Server-side
When a mailslot is created, it is given a UNC name of the form
\\.\MAILSLOT\testslot. The leading characters, "\\.\", indicate that a mailslot can
only be created on a local machine. The actual name of the mailslot is the portion
that follows "\\.\MAILSLOT\". Also note that mailslot names follow the 8.3
naming convention.*
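A short sketch shows how the two rules above might be checked before a mailslot is created. Both function names are invented for the example, the case-insensitive match on the prefix is an assumption, and fits_8_3 checks only lengths for a single name component; it is not the validation Windows 95 itself performs.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static const char PREFIX[] = "\\\\.\\MAILSLOT\\";

/* Hypothetical helper: return the part of a local mailslot name that
   follows \\.\MAILSLOT\, or NULL if the prefix is missing or nothing follows. */
const char *mailslot_name(const char *unc)
{
    size_t i, n = sizeof PREFIX - 1;

    for (i = 0; i < n; i++) {
        /* comparison stops at the first mismatch, including a short string */
        if (toupper((unsigned char)unc[i]) != (unsigned char)PREFIX[i])
            return NULL;
    }
    return unc[n] != '\0' ? unc + n : NULL;
}

/* Hypothetical helper: check one name component against the 8.3 length limits. */
int fits_8_3(const char *name)
{
    const char *dot = strchr(name, '.');
    size_t base = dot ? (size_t)(dot - name) : strlen(name);
    size_t ext = dot ? strlen(dot + 1) : 0;

    if (dot && strchr(dot + 1, '.'))
        return 0;                        /* at most one dot */
    return base >= 1 && base <= 8 && ext <= 3;
}
```

With the name from the text, mailslot_name returns testslot, which is exactly eight characters and so passes the length check.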
If a mailslot message is present to be read, the Win32 API ReadFile or one of the
MS-DOS functions 3fh (Read File) or 5f50h (DosReadMailslot) is called. Ultimately,
these functions utilize FS_ReadFile in the handle-based function table which was
set up when the mailslot provider registered itself. The fh_fh member of the file's
fhandle structure tells us where the mailslot block is located. The read operation
is completed by transferring the requested amount of data from the mailslot's
buffers into the caller's buffer and adjusting pointers and counts.
* This is documented in the Microsoft Knowledge Base article Q139716, BUG: Windows 95 Limits Mailslot
Names to 8.3 Naming Convention. See http://www.microsoft.com/kb/articles/q139/7/16.htm.
name. When a datagram does come in, a Receive Datagram completes and the
post routine is called. The post routine stores an appropriate handler address in
the NCB, and then calls Call_Priority_VM_Event with an event procedure and the
NCB as reference data. In the event handler, a Receive Datagram command is
reissued for the same local name number and the post handler function is called.
The handler processes the NCB and input buffer. It verifies that the buffer
contains an SMB message block with an SMB_COM_TRANSACTION command
(subcommand 1). If everything is in order, then an IFSMgr_WriteMailslot command is
issued using the contents of the NCB and associated buffer. This service gets an
asynchronous ifsreq packet from IFSMgr, fills it with the service's arguments,
and then calls into the mailslot FS_WriteFile. When FS_WriteFile returns, the
ifsreq packet is released by calling IFSMgr_FreeIOReq.
Removing a mailslot requires calling the matching close function. For a handle
returned by CreateMailslot use CloseHandle; for a handle returned by MS-DOS
function 5f4dh (DosMakeMailslot), call either MS-DOS function 3eh (Close) or
function 5f4eh (DosDeleteMailslot); for a handle returned by IFSMgr_MakeMailslot
call IFSMgr_DeleteMailslot.*
Client-side
A message is actually written to a mailslot when the Win32 WriteFile API is called.
This function, in turn, invokes the MS-DOS function 5f52h (DosWriteMailslot). If
the write originates in an MS-DOS application or a Win16 program, then only
MS-DOS function 5f52h need be called, since the Win32 CreateFile and CloseHandle
calls are only for KERNEL32 object housekeeping. Ultimately the way the write
operation is completed depends on whether the write is to the local machine or a
* Partial documentation for the MS-DOS variants of the mailslot functions can be found in Chapter 19
(LAN Manager) of Uninterrupted Interrupts by Ralf Brown and Jim Kyle (Addison-Wesley).
so, then that function is called, otherwise IFSMgr's implementation is called which
writes to the local mailslot buffer. On the other hand, if FS_ConnectNetResource
(ir_flags = 2) is called, it will generate a NetBIOS Send Datagram command. The
Named Pipes
Unlike mailslots, named pipes fit nicely into the remote FSD model. Windows 95
only supports client-side named pipes. A client connects to a known named pipe
by calling the Win32 API CreateFile using a UNC name of the form
\\SERVER\PIPE\testpipe. As with other UNC names, a connection is first
attempted to the specified server using the service IFSMgr_SetupConnection. A
call to VREDIR's FS_ConnectNetResource entry point attempts to establish the
connection. If the connection succeeds, then a shell resource structure is
constructed for the connection, and, in this case, it is marked with sr_type of 4 for
IPC (interprocess communication). The shell resource structure also will receive
sr_func, the address of VREDIR's UNC path-based function table. To finish the
CreateFile call, the FS_OpenFile entry point in this table is called to connect to the
server's named pipe. A successful return results in an fhandle structure for the
extended file handle which is used to refer to this named pipe in subsequent API
calls. This fhandle structure will hold the FS_ReadFile, FS_WriteFile, and a
pointer to the miscellaneous handle-based functions in VREDIR.
During the media blitz that accompanied the rollout of Windows 95 in the
summer of 1995, Microsoft kept asking us "Where do you want to go today?"
Now, Microsoft is at work on our destination for tomorrow. Although the Internet
phenomenon caught them off guard, Microsoft is positioning the Windows
platform as the platform of choice for Internet browsing and establishing personal
intranets. Even if the Internet dominates the future, it will require an infrastructure
to support it on both client and server.
Since the release of Windows 95, we have seen some indications as to what
direction these infrastructure changes will take. As of the close of 1996, Microsoft has
completed or announced two enhancements to Windows 95 that are relevant to
the file system. The first is the shipment of OEM Service Release 2, which
included support for FAT32. The second is the WDM (Win32 Driver Model)
initiative. We looked at FAT32 in Chapter 9, VFAT: The Virtual FAT File System Driver,
but we haven't discussed WDM yet.
What is significant about WDM is that the Windows NT driver model is becoming
the model for future Windows 95 drivers. To better understand WDM, we need to
look at the Windows NT architecture, especially as it applies to the file system. It
is also important to contrast these systems so that you'll have some idea of how a
Windows 95 file system design would be ported to Windows NT.
IFSMgr vs. NT's Object Manager
a separate process acting as a server of a particular API, and their clients are
applications written to those APIs. In theory, when a client application calls an API the
application makes a request of the server through an inter-process communication
mechanism known as LPC (a local variant of RPC). To improve performance,
requests which don't use or modify the subsystem's global data are serviced
within client-side DLLs.
into several system service groupings such as the object manager, the process
manager, the virtual memory manager, and the I/O manager. Of these, the object
manager and the I/O manager play significant roles in the implementation of
Windows NT file systems.
attributes and methods. The attributes describe the state of the object, such as
name or access mode, and the methods provide ways of performing operations
on the objects, such as open, close, or query. Except perhaps for KERNEL32
objects (see Chapter 4, File System API Mapping), there is nothing comparable in
Windows 95.
Objects need to be located, retrieved, and shared. This is made possible by giving
them unique names. These names are global to a single computer. An object of
type object directory may contain other objects and object directories. This allows
object names to be structured in a hierarchical fashion, much like pathnames. As
with pathnames, the component object names are separated by backslashes. For
example, \Device\HardDisk0\Partition1 refers to an object directory named
Device which contains a variety of device objects including Floppy0, Serial0,
Serial1, and Parallel0, to name a few. It also contains HardDisk0, which is an
object directory that, in turn, contains the device objects Partition0 and Partition1.
To minimize name searching, objects are opened by name and returned a unique
handle. Thereafter, other object methods are invoked using the handle. When a
thread is done using the object, it closes the object's handle and thereby
relinquishes its use of the resource.
Symbolic link objects can be used to assign an alias to another object name.
When a lookup is performed for a name, if a symbolic link object is encountered,
the lookup continues with the name which the link references. A special type of
symbolic link is used to represent the system's drive letters. For example, when
the object manager is asked to look up \DosDevices\C:, it finds that DosDevices is
a symbolic link to the object directory named ??. The search is continued in the
object directory ?? for C:. There, the object C: is located and is found to be a
symbolic link to \Device\HardDisk0\Partition1. The object manager uses this
technique to associate a specific device with a drive letter or volume. Symbolic
links are also used to associate devices with other names, like LPT1, NUL, PRN,
COM1, PIPE, etc.
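The drive-letter lookup can be modeled with a small table of links. This is a deliberately simplified sketch: the links table, the resolve_links function, and the prefix-substitution approach are assumptions made for illustration, whereas the real object manager walks directory objects one component at a time.

```c
#include <stdio.h>
#include <string.h>

struct symlink { const char *name; const char *target; };

/* Hypothetical link table modeled on the two links described in the text. */
static const struct symlink links[] = {
    { "\\DosDevices", "\\??" },
    { "\\??\\C:",     "\\Device\\HardDisk0\\Partition1" },
};

/* Expand leading symbolic links in name until none match.
   Returns 0 on success, -1 if the result does not fit or the links loop. */
int resolve_links(const char *name, char *out, size_t out_len)
{
    char tmp[256];
    int hops, changed;
    size_t i;

    if (strlen(name) >= out_len)
        return -1;
    strcpy(out, name);
    for (hops = 0; hops < 16; hops++) {
        changed = 0;
        for (i = 0; i < sizeof links / sizeof links[0]; i++) {
            size_t n = strlen(links[i].name);
            /* a link matches only on a whole component boundary */
            if (strncmp(out, links[i].name, n) == 0 &&
                (out[n] == '\\' || out[n] == '\0')) {
                int len = snprintf(tmp, sizeof tmp, "%s%s",
                                   links[i].target, out + n);
                if (len < 0 || (size_t)len >= sizeof tmp ||
                    (size_t)len >= out_len)
                    return -1;
                strcpy(out, tmp);   /* continue the lookup with the new name */
                changed = 1;
                break;
            }
        }
        if (!changed)
            return 0;               /* no link at the front: fully resolved */
    }
    return -1;                      /* too many expansions, probably a loop */
}
```

Starting from \DosDevices\C:, the first pass rewrites the name to \??\C: and the second to \Device\HardDisk0\Partition1, mirroring the two steps in the paragraph above.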
We can now begin to see the mechanism that the object manager uses to
associate names in the Windows NT namespace with devices. But does the object
manager know about names that are used by a file system? For example, how is
the name c:\winnt\notepad.exe treated by the object manager? We know from the
discussion above that c: is a symbolic link which after expansion will leave us
with the complete name, \Device\HardDisk0\Partition1\winnt\notepad.exe. As
the object manager performs a name search, for each object in a name, it looks to
see if the object has a parse method. This is a method that is unique to some
objects; it is registered with the object manager when these objects are created. If
a parse method is found, then the remainder of the name is passed to the parse
method to locate the object. Thus, a parse method allows an object to extend the
namespace beyond that which the object manager is aware of. In the example above,
the device object Partition1 defines a parse method which is responsible for the
namespace on a partition of the hard disk. Depending on whether the partition is
FAT, HPFS, or NTFS, a different parse method will be used to locate members of
the namespace.
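The hand-off to a parse method can be sketched as a component-by-component walk over a small, made-up namespace. Everything named here (struct object, lookup, partition_parse, and the single file the parse method recognizes) is hypothetical and exists only to show where the remainder of the name is passed along.

```c
#include <stddef.h>
#include <string.h>

/* A parse method receives the part of the name the object manager did not
   consume and reports whether it could locate the object (1) or not (0). */
typedef int (*parse_fn)(const char *remainder);

struct object { const char *path; parse_fn parse; };

/* Hypothetical file system lookup: it knows about exactly one file. */
static int partition_parse(const char *remainder)
{
    return strcmp(remainder, "\\winnt\\notepad.exe") == 0;
}

/* Hypothetical namespace: two directories and one device with a parse method. */
static const struct object objects[] = {
    { "\\Device",                        NULL },
    { "\\Device\\HardDisk0",             NULL },
    { "\\Device\\HardDisk0\\Partition1", partition_parse },
};

/* Walk the name one component at a time. At the first object that has a
   parse method, hand over the rest of the name. Returns 1 if found. */
int lookup(const char *name, const char **remainder_out)
{
    const size_t count = sizeof objects / sizeof objects[0];
    size_t end = 0, i;

    while (name[end] != '\0') {
        end++;                                    /* step over the backslash */
        while (name[end] != '\0' && name[end] != '\\')
            end++;                                /* consume one component */
        for (i = 0; i < count; i++) {
            if (strlen(objects[i].path) == end &&
                strncmp(name, objects[i].path, end) == 0)
                break;
        }
        if (i == count)
            return 0;                             /* not in the namespace */
        if (objects[i].parse != NULL) {
            *remainder_out = name + end;          /* the file system's part */
            return objects[i].parse(name + end);
        }
    }
    return 0;   /* the name ended at an object without a parse method */
}
```

For the expanded name in the text, the walk stops at Partition1 and passes \winnt\notepad.exe to that object's parse method; the object manager never interprets that part itself.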
The object manager is able to use a drive letter to link a filename to a device
object, but how is I/O performed on that device and how is a particular file
IFSMgr vs. NT's I/O Manager
The I/O manager is concerned with three types of NT Executive objects: file,
device, and driver. A file object is an in-memory representation of some physical
device. It could be a text file on a floppy disk, a tape drive, or a serial
communications port, so don't let the word "file" make you think it applies only to disk
subsystems. File objects are different from other objects that are handled by the
object manager. Most objects are manipulated directly because the object is a
memory resource. A file object, however, is an intermediary between some
physical resource and the object manager. The object manager doesn't know about the
peculiarities of the hardware to which the file object refers. Instead, the object
manager calls the I/O manager to assist with accesses to the device.
When a user-mode program opens a file handle, a new file object is created to
represent the underlying physical resource. More than one process may open a
file handle to a single physical resource and each is represented by a separate file
object. Since multiple processes are accessing a shared resource, they must
synchronize their access using locks or by opening the file object with exclusive
write access.
The device object refers to one of three types of NT device drivers. There is the
low-level driver, which corresponds to a device object; a file system driver, which
corresponds to a particular file system such as FAT, HPFS, or NTFS, and is
represented by a driver object; and an intermediate driver, which situates itself
between the other two, e.g., a network transport driver would be above the MAC
layer NDIS driver but below the file system redirector driver. Although these
drivers provide drastically different functionality, they all use a common structure.
At a minimum, a device driver has routines which load and unload it from the
system plus a set of dispatch routines for each operation which it supports.
As I noted above, file objects carry around pointers to the device objects which
contain them. Device objects contain pointers which refer back to the driver
object which is layered above them. Driver objects contain the dispatch routines
which the I/O manager calls when it needs to satisfy an I/O request. The driver
object will need to call upon the dispatch routines in the device object to fulfill
these requests. This linkage up and down the driver chain is very flexible and
allows for the insertion of auxiliary drivers to achieve special needs, such as
providing filtering.
What we have been examining is the linkage used to tie filenames to specific file
system drivers. In Windows 95, linkage ties a filename or file handle to a shell
resource which contains a pointer to the dispatch routines of the responsible file
system driver. Although KERNEL32 creates file objects for Win32 applications, the
actual tracking of file handles occurs within IFSMgr, by its use of fhandle
structures.
Just as IFSMgr creates ifsreq packets to route I/O requests to file system drivers,
the NT I/O manager creates IRPs (I/O request packets) in response to I/O
requests and routes them through the various driver layers. Unlike the packets
which IFSMgr uses, IRPs contain separate stack locations for each driver which it
will be sent to. For instance, when the I/O manager receives a disk file read
request, it would create an IRP and fill in the first stack location with parameters
describing the operation from the file system driver's point of view. On receiving
the IRP, the file system driver would convert the request into a form that the disk
device driver will understand, and place those parameters in the second stack
location. On return, the I/O manager sends the same IRP to the disk device driver
which then uses the parameters in the second stack location to perform the
operation.
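The idea of one packet carrying a separate parameter block per driver can be shown with a toy structure. None of this is the real IRP layout from ntddk.h: struct toy_irp, both functions, the 512-byte sector size, and the assumption of a contiguous file are all simplifications chosen for the sketch.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u   /* assumed sector size for the sketch */

/* One stack location: the request as seen by one driver in the chain. */
struct stack_location {
    uint32_t offset;   /* byte offset (file system) or sector number (disk) */
    uint32_t length;   /* byte count (file system) or sector count (disk) */
};

/* A toy request packet with one location per driver layer. */
struct toy_irp {
    struct stack_location loc[2];   /* [0] file system driver, [1] disk driver */
    int current;                    /* index of the location now in use */
};

/* I/O manager step: describe the read from the file system's point of view. */
void io_manager_build_read(struct toy_irp *irp, uint32_t file_offset,
                           uint32_t bytes)
{
    irp->loc[0].offset = file_offset;
    irp->loc[0].length = bytes;
    irp->current = 0;
}

/* File system driver step: translate the request into sectors and store the
   result in the next stack location. file_start_sector is where the
   (contiguous, hypothetical) file begins on the disk. */
void fsd_prepare_next(struct toy_irp *irp, uint32_t file_start_sector)
{
    const struct stack_location *mine = &irp->loc[0];
    struct stack_location *next = &irp->loc[1];
    uint32_t first = mine->offset / SECTOR_SIZE;
    uint32_t last = (mine->offset + mine->length + SECTOR_SIZE - 1) / SECTOR_SIZE;

    next->offset = file_start_sector + first;   /* first sector to read */
    next->length = last - first;                /* number of sectors */
    irp->current = 1;                           /* the disk driver is next */
}
```

A read of 600 bytes at file offset 1000 touches the file's sectors 1 through 3, so for a file starting at sector 2048 the second location ends up describing three sectors starting at 2049. The first location is left untouched, which is the point of giving each driver its own slot.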
This has been a very brief look at the file system in Windows NT. Here are some
references for additional information: Helen Custer, Inside the Windows NT
File System (Microsoft Press, 1993); the online help documents which accompany
the NT Device Driver Kit; Mark Russinovich and Bryce Cogswell, "Examining the
Windows NT File System," Dr. Dobb's Journal (1997); Art Baker, The Windows NT
Device Driver Book: A Guide for Programmers (Prentice-Hall, 1997); Rajeev Nagar,
Windows NT File System Internals (O'Reilly & Associates, Inc., 1997).
privileged operations is also worlds apart. Windows 95 uses VxDs to provide
ring-0 support, whereas Windows NT uses kernel-mode drivers.
The way that these two driver types expose their interfaces is also very different.
A VxD exports the address of its Device Descriptor Block, which contains the
address of its control procedure, optional service table, optional PM and V86
APIs, and optional Win32 service table. On the other hand, a kernel-mode driver
exports the names of its entry points, in the same way you would export
functions in a Win32 DLL. To call ring-0 operating system functions in the NT
Executive, you link a kernel-mode driver with the import library NTOSKRNL and
simply call the functions by name (or ordinal). Contrast this with the mechanism
used by a VxD to call a service in another VxD using Int 20h dynalinks.
As you know, writing a VxD requires selecting appropriate services from the
hundreds which are provided by VMM, IFSMgr, VWIN32, etc. Similarly, writing a
kernel-mode driver requires selecting appropriate functions from the hundreds
which are provided by NTOSKRNL. Add to this the fundamental architectural
differences which we examined in the last two sections, and you should have a
pretty clear picture of the chasm that separates these two worlds.
WDM
WDM was officially unveiled at the Windows Hardware Engineering Conference
(WinHEC) in April 1996. Although it impacts Windows 95 developers most by
making them prepare for a new driver infrastructure, it also impacts Windows NT
developers by introducing common drivers for plug-and-play, power
management, and the Universal Serial Bus (USB). The presentations emphasized that the
initial focus of WDM would be on device drivers and not file system drivers.
Furthermore, although Windows NT will not support VxDs, VxDs can peacefully
coexist with WDM on Windows platforms. WDM will also coexist with existing
class-specific driver models such as mass storage and networking.
Even though the stated focus of WDM will be on new buses and device types, the
changes should impact a lot of system components. This is because drivers
written to this standard require a new and extensive API. Most of this API is
declared in the header file ntddk.h. Services from the I/O manager, the virtual
memory manager, the kernel, etc. are represented here.
At the time this book is being completed, WDM is still under development. At
WinHEC-97, in April 1997, a WDM beta was distributed as well as a Developer's
Release of Memphis. In addition to FAT32, and WDM support for USB, 1394,
Plug-and-Play, and Power Management, the next release of Windows (code-named
Memphis) will incorporate WDM streaming-class drivers for audio and video. This
is in line with the Microsoft goal of making the PC the "Entertainment PC" in 1998.
To support this effort, Memphis will ship with DVD drivers, including a new file
system driver called udf.vxd for the Universal Disk Format used by the DVD-ROM.
Is WDM on Windows in your future? Probably not any time soon, if you are
working on file system drivers or file system hooks. When I put this question to
one of the Microsoft speakers at the WinHEC-96 conference, their response was
that the Windows platform would probably be phased out before they got around
to converting the mass storage, network, and file system drivers to WDM.
However, WDM is in your future if you plan to do any Windows NT file system
development. As Windows NT continues to build momentum, there may be more
pressure to extend WDM on Windows to a wider array of drivers.
MultiMon: Setup, Usage, and Extensions
MultiMon is used throughout this book as a multi-purpose spy program. By
installing this tool you can perform the experiments described in the text and do
exploration on your own. To help you get up to speed with MultiMon, this
appendix will describe what it is, how it works, and how to set it up and use it.
I've also included some background information on its design and
implementation. For the more adventurous, I'll show how to extend its capabilities for your
own purposes.
What Is MultiMon?
Monitor or spy programs are very popular among PC programmers. They afford
the user an opportunity to examine the inner workings of living and breathing
systems and applications. This is a valuable capability because seeing code in
action speaks louder than words. Spy programs also have the annoying habit of
revealing undocumented or incompletely documented APIs and data structures.
You will encounter a fair share of undocumented features in this way.
The predecessor to MultiMon was called FileMon. It was the basis for my article
"Monitoring Windows 95 File Activity in Ring 0," in Windows/DOS Developer's
Journal, July 1995. FileMon is a monitoring tool which displays the calls made by
IFSMgr into the underlying file system drivers. It was used to demonstrate how to
write a Windows 95 file system hook using IFSMgr services. FileMon also
illustrated a simple technique for exchanging information between a Win32
application and a VxD which allowed the VxD to display its output in a console
application window. MultiMon includes and extends the capabilities that FileMon
had.
MultiMon, which you get on the companion diskette, was designed as a
general-purpose tool to use in exploring Windows 95 internals. MultiMon provides a
general framework for collecting and reporting on events of interest. An event
could be the occurrence of a software interrupt, a call to a hooked VxD service,
or even a direct application call. These events are reported by monitors. A
monitor detects a certain kind of event, encapsulates a description of it in a
generic data structure, and then sends that structure to an event manager. The
event manager acts as a funnel. It receives events from a variety of monitors and
serializes these events in a large queue. The event manager also supplies
monitors with chunks of memory in which events are recorded. The event manager is
also busy writing portions of the queue to a log file.
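The funnel arrangement can be pictured as a single queue that stamps each event with a sequence number as it arrives. The sketch below is not MultiMon's code: the structure layouts, the names em_init, em_post, and em_take, and the tiny queue size are assumptions, and where the real event manager hands memory chunks to monitors this version simply copies strings. It is also single-threaded, so it leaves out any locking a ring-0 implementation would need.

```c
#include <stddef.h>
#include <string.h>

#define QUEUE_CAPACITY 8   /* small on purpose; the text describes a large queue */

/* Generic event record that every monitor fills in. */
struct event {
    unsigned long seq;     /* order in which the manager accepted the event */
    char monitor[4];       /* short monitor tag such as "fsh" or "ncb" */
    char text[48];         /* description of the event */
};

struct event_manager {
    struct event queue[QUEUE_CAPACITY];
    size_t head, count;
    unsigned long next_seq;
};

void em_init(struct event_manager *em)
{
    memset(em, 0, sizeof *em);
}

/* Called by a monitor. Returns 0 on success, -1 if the queue is full. */
int em_post(struct event_manager *em, const char *monitor, const char *text)
{
    struct event *e;

    if (em->count == QUEUE_CAPACITY)
        return -1;
    e = &em->queue[(em->head + em->count) % QUEUE_CAPACITY];
    e->seq = em->next_seq++;                       /* serialize arrivals */
    strncpy(e->monitor, monitor, sizeof e->monitor - 1);
    e->monitor[sizeof e->monitor - 1] = '\0';
    strncpy(e->text, text, sizeof e->text - 1);
    e->text[sizeof e->text - 1] = '\0';
    em->count++;
    return 0;
}

/* Called by the log writer: removes the oldest event. Returns 0, or -1 if empty. */
int em_take(struct event_manager *em, struct event *out)
{
    if (em->count == 0)
        return -1;
    *out = em->queue[em->head];
    em->head = (em->head + 1) % QUEUE_CAPACITY;
    em->count--;
    return 0;
}
```

Events posted by different monitors come back out in arrival order, which is what lets a report interleave w21, ncb, smb, and fsh lines in a single trace.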
Two types of event managers are supplied: a session manager and a boot
manager. The boot manager allows monitoring of events during system startup,
and the session manager is a dynamic VxD loaded by the Win32 reporter
application. The reporter application formats and displays the events so they can be
scrolled or saved to a text file. The reporter is also responsible for displaying the
drivers which are available for installation, the APIs which will be monitored for
each driver, and whether the APIs are to be monitored during system startup.
This approach is inherently extensible and configurable. Simply add and remove
monitors to get the mix that provides the picture you want.
Using MultiMon
We have included MultiMon on the companion disk. This section explains how to
install, configure, and use MultiMon.
Installation
The installation diskette contains a Setup program for installing MultiMon as well
as other utilities and source code. Simply launch setup.exe from the floppy
New entries are also added to the system registry. For this reason, MultiMon and
other components are removed by running uninstal.exe using the standard
Windows 95 uninstall procedure (from Control Panel select Add/Remove
Programs) and following the steps of the uninstall wizard.
MultiMon maintains entries of known static and dynamic monitors in the system
registry. Candidates for inclusion in the registry are VxDs in the directory from
which MultiMon is launched. Only VxDs which have a VersionInfo resource with
a File Description containing a "MultiMon" string are included. During initialization,
MultiMon determines which of these monitors are present and displays them
in the Add/Remove Driver dialog box. Dynamic monitors are distinguished from
static monitors by having the string "Dynamic" somewhere in their File Description
string.
MultiMon setup is the initial step where the user selects a set of drivers to be used
for event collection (using the Add/Remove Driver dialog). After a set of drivers
has been selected, it may be necessary to restart the system if the selection
includes static components which are not currently in memory. Figure A-1 shows
the Add/Remove Driver dialog which is reached via the Options menu. A driver is
added by selecting it in the uninstalled column and then clicking the Add button.
A driver is removed by selecting it in the installed column and then clicking the
Remove button. If a driver has a ",s" suffix it is a static driver; if it has a ",d" suffix
it is a dynamic driver.
Once MultiMon detects installed drivers, the Filters dialog will display all available
monitors for those drivers. A driver may contain more than one monitor; each
monitor is independently enabled and disabled. You enable those which are of
interest and disable the others. Table A-1 shows the list of drivers and supported
monitors which are included on the companion diskette. Each of these monitors
is used in this book.
Filtering Output
In addition to turning monitors on and off, you can select individual APIs where a
monitor supports it. For instance, you may enable notifications of Int 21h Function 4Ch but
disable notifications of Int 21h Function 2Ah. Not all monitors have API selections.
Figure A-2 shows the Filters dialog which is reached via the Filters toolbar button
or the Options menu. It shows two panes. On the left all available monitors are
displayed. If the checkbox in front of the monitor name is checked, that monitor
is enabled. The right pane displays a list of API functions for that monitor. If an
API is checked, it will generate notifications. Two buttons at the bottom of the
dialog provide shortcuts for either selecting all APIs or deselecting all APIs.
[Figure A-2. The Filters dialog]
Saving a Configuration
The registry is used to save one default configuration for each monitor. A configuration
is defined as the enabled/disabled state for a monitor and its map of
enabled/disabled APIs. The configuration for the currently selected monitor is
saved by pressing the Save As Default button. In addition to the convenience of
saving a commonly used configuration, the default configuration is the configuration
used by BOOTMGR.
The Options menu under the main menu provides access to the Filters and the
Add/Remove Drivers dialogs, as shown in Figure A-4.
A Sample Session
Here are the steps to follow to get a quick sample of the output from the FSHook
monitor.
1. In the Add/Remove Drivers dialog: remove all drivers from the installed
   column; add only FSHook. You may be prompted to restart your system to
   load the static FSHook driver.
2. In the Filters dialog: under the monitor type column, check "IFSMgr FileHook";
   in the window entitled "APIs for IFSMgr FileHook" check all boxes by
   pressing the Select All APIs button.
3. Press the Start button to begin capturing events.
4. Perform some activity you wish to monitor, e.g., pop the Properties dialog for
   the desktop window.
5. Press the Stop button to end capturing events.
6. Press the Show button to display the contents of the log file.
Two lines of output from the log file are shown in Figure A-5. This view of the
data is the same as the "Details" view used by the Windows 95 Explorer. A
column can be resized by dragging the right boundary of the column header. If
the current column size truncates data, the display shows an ellipsis (...) to indicate
there is more to see.
All monitors use the same column headers for their output. The columns and their
contents are described in Table A-2. These are general guidelines about what to
expect in each column; for specifics about usage for a particular monitor, see
Appendix B, MultiMon: Monitor Reference.
Table A-2
Column     Contents
Function   An API name or description
Flags1     Generic flags
Device     Target device name for the call
Handle     File or other handle value
Args       Arguments passed in or return values from the API call
Flags2     Additional flags specific to the API
At system startup, the file system is not ready to receive writes to a log file. To
circumvent this, an additional driver is used, called bootmgr.vxd. It allocates some
pages of memory in which to temporarily store captured events. Events are
captured until either the buffer fills up or the user launches MultiMon after system
initialization completes. When MultiMon starts, it writes BOOTMGR's buffer to a
boot.log file and then frees the allocated pages. The size of the capture buffer
defaults to 10 pages but a user-defined value can be specified through the registry
value cpgInBuf (a DWORD type) under the key
HKLM\System\CurrentControlSet\Services\VxD\MultiMon_bootmgr.
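The capture policy described above (store events until the buffer fills, then stop) can be sketched in portable C. This is only an illustration, not BOOTMGR's code: the names capture_buf, capture_init, and capture_event are hypothetical, and the buffer is fixed at the default of 10 pages of 4096 bytes rather than read from the registry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE     4096u  /* x86 page size */
#define DEFAULT_PAGES 10u    /* default capture size given in the text */

/* Hypothetical capture buffer: a byte arena that refuses writes once full. */
typedef struct {
    unsigned char data[DEFAULT_PAGES * PAGE_SIZE];
    size_t        used;  /* bytes already holding captured events */
    int           full;  /* set once an event no longer fits */
} capture_buf;

void capture_init(capture_buf *cb)
{
    assert(cb != NULL);
    cb->used = 0;
    cb->full = 0;
}

/* Append one event record; returns 1 if stored, 0 if capture has stopped. */
int capture_event(capture_buf *cb, const void *event, size_t len)
{
    assert(cb != NULL && event != NULL);
    if (cb->full || len > sizeof cb->data - cb->used) {
        cb->full = 1;  /* stop capturing and keep what was collected */
        return 0;
    }
    memcpy(cb->data + cb->used, event, len);
    cb->used += len;
    return 1;
}
```

Once the buffer reports full, later events are dropped rather than overwriting earlier ones, which matches the "captured until the buffer fills up" behavior described in the text.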
When MultiMon is first started after a trace has been collected using BOOTMGR, the
user receives a prompt that reports the captured log and asks whether to
view it.
The second area of the registry which MultiMon utilizes is also under
HKEY_LOCAL_MACHINE, in the section which defines the system's static VxDs:
System\CurrentControlSet\Services\VxD. The Windows 95 loader enumerates the
subkeys in this section. The loader attempts to load each VxD driver name given
by the StaticVxD value in each subkey. The value of StaticVxD is a string which
may contain a fully-qualified path.
MultiMon creates a subkey for each static driver which is displayed in the
Add/Remove Driver dialog. To prevent name collisions, the key name is formed by
prepending MultiMon_ to the driver or device name. For example, the entry for
fshook.vxd would be MultiMon_fshook. The StaticVxD value is defined to point to
the launch directory for MultiMon, where all monitor drivers are kept.
Underneath the MultiMon_ driver key, one key will be defined for each monitor
that the driver supports. Monitor keys start at 0 and increment by one for each
additional monitor. For example, if the driver has two monitors, then the keys 0
and 1 will be defined. Within each monitor key several values will be defined
which are used to record its default configuration. These include the values
Enabled, NumApi, Index, and ApiStates.
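To make the key layout concrete, here is a small sketch that builds the registry path for one monitor of one driver. The helper name monitor_key_path is hypothetical, and the code only formats a string; it does not touch the registry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define VXD_SERVICES_KEY "System\\CurrentControlSet\\Services\\VxD"

/* Build "<services key>\MultiMon_<driver>\<monitor index>" into out.
   Returns the number of characters written, or -1 if out is too small. */
int monitor_key_path(char *out, size_t cap, const char *driver, int iMon)
{
    int n;

    assert(out != NULL && driver != NULL);
    n = snprintf(out, cap, "%s\\MultiMon_%s\\%d",
                 VXD_SERVICES_KEY, driver, iMon);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

For the second monitor of fshook.vxd, a call such as monitor_key_path(buf, sizeof buf, "fshook", 1) yields the path of the key under which Enabled, NumApi, Index, and ApiStates would be stored.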
Win32 Frontend
The frontend or reporter portion of MultiMon is a respectable Win32 application
written in C. The user interface is based upon a dialog box which contains a list
view control and status control, so no window creation code is needed for these
parts. A dialog procedure handles the requisite windows messages, like
WM_INITDIALOG, WM_SIZE, WM_COMMAND, etc.
At one point, I had output from the monitors displayed directly in the listview.
However, this had a major drawback. Since much of the window drawing code
relies heavily on 16-bit USER and GDI, it acquires the Win16Mutex. This
created a severe bottleneck at times. To alleviate this, output is written to a log
file by a separate thread, independently of the user interface thread. This creates
much smoother operation and significantly reduces the impact of monitoring on
system performance.
The main thread handles the message pump and responds to user input. A
secondary thread is dedicated to the interface with the event manager,
sessmgr.vxd. When events are being captured with bootmgr.vxd, the MultiMon
application is not loaded.
VxD/Win32 Interface
When MultiMon initializes it looks to see if bootmgr.vxd is loaded. If it is found, a
DeviceIoControl command is sent to it, requesting that it shut down any active
monitors and save its capture buffer to boot.log. Then sessmgr.vxd is loaded and a
secondary thread is created to interface with it.
SESSMGR also receives a list of drivers, their active monitors, and selected APIs
before event capture begins. SESSMGR uses this list to initialize the monitors.
again. This loop exits with an error when MultiMon sends SESSMGR a
DeviceIoControl command to stop.
During this loop SESSMGR is writing the collected events to a binary log file
named session.log, using IFSMgr's ring-0 file I/O functions. When event collection
is stopped, MultiMon reads, formats, and displays the contents of this file in the
listview control.
VxD Monitors
SESSMGR creates a pool of event blocks from an area of locked memory. Event
blocks hold an EBLOCK structure in which a monitor describes an event. Monitors
request an event block, record the event, and then send it back to the event
manager. The event manager then writes one or more event blocks to the log file
and then frees the event blocks for reuse.
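The request/record/return cycle can be pictured as a fixed pool with a free list. The sketch below is a simplification under stated assumptions: the EBLOCK shown is a hypothetical stand-in (the real structure is declared in multimon.h), the function names are invented for illustration, and no synchronization is shown even though a real VxD would need it.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_EBLOCKS 4

/* Hypothetical stand-in for EBLOCK; the real layout is in multimon.h. */
typedef struct eblock {
    struct eblock *next;      /* free-list link while the block is unused */
    int            type;      /* event type chosen by the monitor */
    char           text[32];  /* event description */
} EBLOCK;

static EBLOCK  pool[NUM_EBLOCKS];
static EBLOCK *free_list;

/* Event manager: thread every block of the pool onto the free list. */
void pool_init(void)
{
    int i;

    free_list = NULL;
    for (i = 0; i < NUM_EBLOCKS; i++) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

/* Monitor side: get an empty block, or NULL if the pool is exhausted. */
EBLOCK *eblock_alloc(void)
{
    EBLOCK *b = free_list;

    if (b != NULL)
        free_list = b->next;
    return b;
}

/* Event manager side: after logging, return the block for reuse. */
void eblock_free(EBLOCK *b)
{
    assert(b != NULL);
    b->next = free_list;
    free_list = b;
}
```

A fixed pool keeps allocation cheap and bounded, which suits code that runs inside hooked services; when the pool is empty the monitor simply has to drop or defer the event.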
Extending MultiMon
Extending MultiMon with a new monitor requires additions in two areas. First, an
existing VxD needs to be modified or a new VxD must be written to collect the
desired data. Second, the Win32 application has to add a new report routine for
the new type of data.
Writing a Monitor
Writing a monitor involves writing a VxD. VxDs can be written in assembly
language, but it is more common today to use either the C wrappers that accompany
the Windows 95 DDK or a third party package called VToolsD from Vireo
Software. The examples in the book use C and the DDK.
I'd like to give you a feel for how easy it is to write a monitor. To illustrate, I've
come up with an example that is both simple and useful. It is sometimes handy to
output strings to the trace log file to mark various execution points or perhaps
print out a function's return values. This requires that you have the source to the
application you are monitoring so that DeviceIoControl calls can be inserted.
We'll only consider Win32 applications, although the idea could be extended to
Win16 and DOS applications.
The implementation of the entire monitor VxD is in a single source file, tagmon.c,
which you can find on the companion diskette. It starts off with a Declare_DDB
macro which defines the Device Descriptor Block for the VxD. This specifies the
VxD's name, initialization order, etc. so the loader will install it properly. The
DDB also gives the address of the VxD's control procedure, CtrlMsgDispatch,
which is the heart of our monitor (see Example A-1).
The system sends messages to each VxD's control procedure to notify it of system-wide
events which it may need to respond to. The control procedure only needs
to respond to messages in which it is interested.
The event which our monitor is going to report is actually a DeviceIoControl call
into the VxD. This is handled by the third line, which can be read "on receiving a
W32_DEVICEIOCONTROL message call the function CtrlMsg_W32DeviceIoControl."
The code for this handler is shown in Example A-2.
    case DIOC_TAG_STRING:
        if (pDIOCParams->cbInBuffer == 0)
            return ERROR_NOT_SUPPORTED;
    default:
        return ERROR_NOT_SUPPORTED;
The value of the input variable service can be a system-defined value such as
DIOC_OPEN or DIOC_CLOSEHANDLE, or it can be a programmer-defined value
like DIOC_TAG_STRING. When the DIOC_TAG_STRING service is requested, we
expect the input structure DIOCParams to contain specific values; the member
lpvInBuffer should point to a buffer containing a string and cbInBuffer should
of services which VCACHE exports. Since TAGMON does not have any APIs, we
use 0. Next, insert the device name of the driver which is to contain the monitor
in the member szDevName. The rest of the members are initialized to 0 or NULL,
as appropriate. If you have more than one monitor in your driver, you need to
bump iMon by 1 for each additional monitor.
typedef struct {
    UINT  flags;         // bit0: installed, bit1: enabled
    BOOL  bChecked;      // monitor checked in Filters dialog
    int   iMon;          // 0-based index for monitor in this driver
    char* name;          // User-friendly monitor name
    int   numApis;       // number of APIs monitored
    UINT* pApiState;     // array of enabled/disabled states
    char  szDevName[9];  // device name for Monitor
} MONDEF, *PMONDEF;
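As a rough illustration of the initialization rules just described, the sketch below fills in a MONDEF for a monitor with no selectable APIs. The helper mondef_init and the monitor name used in the usage note are hypothetical; UINT and BOOL are declared locally so the fragment compiles outside the Windows headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int UINT;  /* stand-ins for the Windows types */
typedef int BOOL;

typedef struct {
    UINT  flags;         /* bit0: installed, bit1: enabled */
    BOOL  bChecked;      /* monitor checked in Filters dialog */
    int   iMon;          /* 0-based index for monitor in this driver */
    char *name;          /* user-friendly monitor name */
    int   numApis;       /* number of APIs monitored */
    UINT *pApiState;     /* array of enabled/disabled states */
    char  szDevName[9];  /* device name for monitor */
} MONDEF, *PMONDEF;

/* Fill in a MONDEF for a monitor that exposes no selectable APIs. */
void mondef_init(PMONDEF pm, int iMon, char *name, const char *dev)
{
    assert(pm != NULL && dev != NULL);
    memset(pm, 0, sizeof *pm);  /* remaining members become 0 or NULL */
    pm->iMon = iMon;            /* bump by 1 for each extra monitor */
    pm->name = name;
    pm->numApis = 0;            /* no APIs to select */
    /* Copy at most 8 characters; the memset above supplies the terminator. */
    strncpy(pm->szDevName, dev, sizeof pm->szDevName - 1);
}
```

A call such as mondef_init(&m, 0, "Tag Strings", "TAGMON") would describe the first (and only) monitor in the TAGMON driver.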
The common index to these three data structures is defined by a unique manifest
constant which is added to multimon.h. For TAGMON, we will use the constant
TAG_STRING. This index is used as the type in the EBLOCK structure.
With the data structures taken care of, we need to now write some code-the
display handler and filter function. The display handler function is called to return
a string for each column of the listview display. The prototype for the function
has this form:
    void Display_Handler(int iSubitem, PEBLOCK pb, char* pszText)
The filter function is called to return a string which describes an API. This is used
to populate the listview control in the Filter dialog. The prototype for the function
has this form:
    char* Filter_Func(int index)
The display handler and filter function along with static string tables are placed in
a separate C file and added to the build. Some additional examples of extension
files can be found on the companion diskette: hookmon.c, vcmon.c, int2fmon.c,
etc.
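A minimal sketch of such an extension file follows. It is not one of the files on the diskette: the EBLOCK shown is a hypothetical subset, the column indices are assumed to follow the order of Table A-2, the API names are purely illustrative, and the caller is assumed to pass a text buffer of at least 80 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of EBLOCK; the real structure is in multimon.h. */
typedef struct {
    int  type;      /* manifest constant identifying the monitor */
    char text[64];  /* tagged string recorded by the monitor */
} EBLOCK, *PEBLOCK;

/* Assumed column order, following Table A-2. */
enum { COL_FUNCTION, COL_FLAGS1, COL_DEVICE, COL_HANDLE, COL_ARGS, COL_FLAGS2 };

/* Return the text for one listview column of a tag-string event. */
void Display_Handler(int iSubitem, PEBLOCK pb, char *pszText)
{
    assert(pb != NULL && pszText != NULL);
    switch (iSubitem) {
    case COL_FUNCTION:
        strcpy(pszText, "TagString");
        break;
    case COL_ARGS:
        sprintf(pszText, "\"%s\"", pb->text);  /* show the string in quotes */
        break;
    default:
        pszText[0] = '\0';                     /* column not used */
        break;
    }
}

/* Static string table for the Filters dialog (illustrative names only). */
static char *api_names[] = { "Open", "Close" };

/* Return a description of API number index, or NULL when out of range. */
char *Filter_Func(int index)
{
    if (index < 0 || index >= (int)(sizeof api_names / sizeof api_names[0]))
        return NULL;
    return api_names[index];
}
```

Keeping the strings in static tables means the handlers never allocate memory, so the reporter can call them once per visible cell without cleanup.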
MultiMon: Monitor
Reference
MultiMon comes with the monitors listed in Table A-1 of Appendix A, MultiMon:
Setup, Usage, and Extensions. The kind of output produced by each of these monitors
is quite varied and yet MultiMon presents this information using the same
view. This appendix describes in detail the information displayed by each monitor
and thus serves as a reference.
In the descriptions that follow, a C printf format is used to define output strings.
These format strings are enclosed in double quotes, while arguments are represented
by suggestive variable names, e.g., "drive=%c", drive_letter.
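For readers less familiar with printf notation, the following sketch shows how two of the format strings used in this appendix expand. The helper names format_drive and format_vector are hypothetical and exist only to exercise the formats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Expand the "drive=%c" format used as the example in the text. */
int format_drive(char *out, size_t cap, char drive_letter)
{
    assert(out != NULL && cap > 0);
    return snprintf(out, cap, "drive=%c", drive_letter);
}

/* Expand a segment:offset format; %04X pads the offset to four hex digits. */
int format_vector(char *out, size_t cap, unsigned seg, unsigned off)
{
    assert(out != NULL && cap > 0);
    return snprintf(out, cap, "V86 Vector=%X:%04X", seg, off);
}
```

With drive_letter set to 'C' the first helper produces drive=C; with a segment of 0xC9 and an offset of 0x1A the second produces V86 Vector=C9:001A.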
Interrupt 21h
Driver     Monitor                         Type
I21Help1   PM Int21 hook (pre IFSMgr)      p21
I21Help1   V86 Int21 hook (pre IFSMgr)     v21
I21Help2   PM Int21 hook (post IFSMgr)     p21-
I21Help2   V86 Int21 hook (post IFSMgr)    v21-
Win32cb    VWIN32 Int21 Dispatch           w21
Flags2 by function:
7143h    Gt(GET_ATTRIBUTES)
         St(SET_ATTRIBUTES)
         Gs(GET_ATTRIB_COMP_FILESIZE)
         Sm(SET_ATTRIB_MODIFY_DATETIME)
         Gm(GET_ATTRIB_MODIFY_DATETIME)
         Sa(SET_ATTRIB_LAST_ACCESS_DATETIME)
         Ga(GET_ATTRIB_LAST_ACCESS_DATETIME)
         Sc(SET_ATTRIB_CREATION_DATE_TIME)
         Gc(GET_ATTRIB_CREATION_DATE_TIME)
         Gu(GET_ATTRIB_FIRST_CLUST)
Interrupt 2Fh
Driver     Monitor                         Type
I2fmon1    PM Int2f hook (pre IFSMgr)      p2f
I2fmon1    V86 Int2f hook (pre IFSMgr)     v2f
IFSMgr Dispatcher
Driver Monitor Type
ifsdspat IFSMgr dispatcher dsp
caching of read/write
IFSFN_SEEK          flag character: b-seek relative to beginning of file;
                    e-seek relative to end of file
IFSFN_CLOSE,        flag: f-CLOSE_FINAL, p-CLOSE_FOR_PROCESS,
IFSFN_FINDCLOSE,    h-CLOSE_HANDLE
IFSFN_FCNCLOSE
IFSFN_COMMIT        flag: a-FILE_COMMIT_ASYNC,
                    n-FILE_NO_LAST_ACCESS_DATE
IFSFN_FILELOCKS     flag: L-LOCK_REGION, U-UNLOCK_REGION
IFSFN_FILETIMES     Gm(GET_MODIFY_DATETIME)
                    Sm(SET_MODIFY_DATETIME)
                    Ga(GET_LAST_ACCESS_DATETIME)
                    Sa(SET_LAST_ACCESS_DATETIME)
                    Gc(GET_CREATION_DATE_TIME)
                    Sc(SET_CREATION_DATE_TIME)
IFSFN_ENUMHANDLE    fi ENUMH_GETFILEINFO: get file info by handle
                    fn ENUMH_GETFILENAME: get filename associated with handle
                    ir ENUMH_GETFINDINFO: get info for resuming
                    rf ENUMH_RESUMEFIND: resume find operation
                    rd ENUMH_RESYNCFILEDIR: resync dir entry info for file
IFSMgr File System Hook
                    v(IR_FSD_VERIFY)
                    m(IR_FSD_MOUNT)
                    g(IR_FSD_UNLOAD)
                    c(IR_FSD_MOUNT_CHILD)
                    p(IR_FSD_MAP_DRIVE)
                    u(IR_FSD_UNMAP_DRIVE)
IFSFN_DIR           option string: mk(CREATE_DIR), rm(DELETE_DIR),
                    ck(CHECK_DIR), 83(QUERY83_DIR), lf(QUERYLONG_DIR)
IFSFN_FILEATTRIB    option string:
                    Gt(GET_ATTRIBUTES)
                    St(SET_ATTRIBUTES)
                    Gs(GET_ATTRIB_COMP_FILESIZE)
                    Sm(SET_ATTRIB_MODIFY_DATETIME)
                    Gm(GET_ATTRIB_MODIFY_DATETIME)
                    Sa(SET_ATTRIB_LAST_ACCESS_DATETIME)
                    Ga(GET_ATTRIB_LAST_ACCESS_DATETIME)
                    Sc(SET_ATTRIB_CREATION_DATE_TIME)
                    Gc(GET_ATTRIB_CREATION_DATE_TIME)
                    Gu(GET_ATTRIB_FIRST_CLUST)
IFSMgr_NetFunction Hook
Driver Monitor Type
netfunc IFSMgr_NetFunction hook nfh
Flags1:
Not used
Device:
Not used
Handle:
Not used
Args:
"EDX=%08lx ESI=%08lx", ifsreq.ifs_func, provider
Flags2:
Not used
Flags1:
"Entry" or "Return" depending on which side of the service the display line
was generated.
Device:
Not used
Handle:
On entry, interrupt number as a string "Int %x"
Args by service:
On entry:
Install_V86_Break_Point    "V86 BrkPt=%X:%04X Ring0 Function=%08lx (%s)",
                           brk_segment, brk_offset, func_addr, VxD_Name
Allocate_V86_Call_Back     "Ring0 Function=%08lx (%s)", func_addr, Vxd_Name
Allocate_PM_Call_Back      "Ring0 Function=%08lx (%s)", func_addr, Vxd_Name
Hook_V86_Int_Chain         "Ring0 Hook=%08lx (%s)", func_addr, Vxd_Name
Get_V86_Int_Vector,        "V86 Vector=%X:%04X", V86_segment, V86_offset
Set_V86_Int_Vector
Get_PM_Int_Vector,         "PM Vector=%X:%lX", PM_selector, PM_offset
Set_PM_Int_Vector
On return:
Allocate_V86_Call_Back     "V86 App Callback: %x:%04x",
                           V86_callback_segment, V86_callback_offset
Allocate_PM_Call_Back      "PM App Callback: %x:%04x",
                           PM_callback_selector, PM_callback_offset
Flags2:
Not used
VCACHE Services
Driver Monitor Type
vchook VCache services vch
Flags1:
Options on entry to VCache_FindBlock:
Create Hold MakeMRU
LowPri MustCreate RemoveFromLRU
Device:
FSD cache ID
Handle:
Cache block handle
Args by function:
VCache_Get_Version(Return)   "Ver: %04x", version_number
VCache_Register(Return)      "DiscardFunc: %08lx MinReserv: %lx",
                             buffer_discard_func, min_reserved_blocks
VCache_GetSize(Return)       For a specific FSD ID:
                             "MaxFSDBlks: %lx MaxCacheBlks: %lx",
                             max_blocks_for_fsd, max_num_cache_blocks
                             For any FSD (id=0):
                             "CurCacheSize: %lx MaxCacheBlks: %lx",
                             num_blocks_in_cache, max_num_cache_blocks
VCache_CheckAvail(Entry)     "Needed: %lx", num_blocks_needed
VCache_CheckAvail(Return)    "Avail: %lx", num_avail_blocks
VCache_FindBlock(Entry)      "Key1: %08lx Key2: %08lx", key1_value,
Flags2:
Not used
VWIN32 DeviceIoControl
(IFSMgr, VWIN32, WSOCK)
Driver     Monitor                   Type
win32cb    VWIN32 DeviceIoControl    dev
VMM Win32 Services
Driver Monitor
win32cb
Args:
Arguments specific to a function as array of unlabeled doubleword values
Flags2:
Not used
NetBIOS Calls
Driver Monitor Type
nbhook
NCBCANCEL
NCBFINDNAME
"Canceled NCB: %08lx", addr_of_ncb
NCBDGSEND, NCBASTAT "Buffer: %08lx(%04x) Callname: %s",
ncb_buffer, ncb_length, ncb_callname
Args:
Return values specific to a function
NCBCALL                "LSN: %02x*", ncb_lsn
0x48 (Send-Receive)    "RecvBuf: %08lx(%04x)", ncb_buffer, ncb_length
NCBRECVANY             "LSN: %02x Length: %04x", ncb_lsn, ncb_length
Flags2:
Flags specific to a function
NCBRECVANY, NCBDGSEND, NCBDGRECV,    "NAME#: %02x", ncb_num
NCBDGSENDBC, NCBDGRECVBC, NCBRESET,
0x48 (Send-Receive)
SMB Packets
Driver Monitor Type
nbhook smb
Device:
Not used
Handle:
Address of NCB (Network Control Block) whose buffer references the SMB
command
Args:
Arguments specific to a function
SMB_COM_OPEN "%s", pathname_or_domain
SMB_COM_OPEN_ANDX
SMB_COM_TREE_CONNECT
SMB_COM_SESSION_SETUP_ANDX
SMB_COM_TREE_CONNECT_ANDX
SMB_COM_TRANSACTION "%s SubCommand:%02x",
mailslot_or_namedpipe,
subcommand_code
SMB_COM_TRANSACTION2 Subcommands 0 through 0x0e: "%s",
trans2_subcommand_name
Flags2:
Not used
IFSMgr Data
Structures
typedef struct {
    ioreq     ifs_ir;        /* 00 embedded ioreq structure */
    /* These members are known only to IFSMgr */
    fhandle*  ifs_pfh;       /* 74 pointer to fhandle structure */
    DWORD     ifs_psft;      /* 78 pointer to SFT */
    shres*    ifs_psr;       /* 7C pointer to shell resource */
    DWORD     ifs_pdb;       /* 80 linear base of owner PSP */
    DWORD     ifs_proid;     /* 84 provider id */
    BYTE      ifs_func;      /* 88 function of dispatched command */
    BYTE      ifs_drv;       /* 89 drive from dispatched command */
    BYTE      ifs_hflag;     /* 8A flag */
    BYTE      ifs_nflags;    /* 8B flags, see Table C-1 */
    void*     ifs_pbuffer;   /* 8C pointer to parse buffer */
    HVM       ifs_VMHandle;  /* 90 VM of request */
    void*     ifs_PV;        /* 94 pointer to "per VM data" area */
    union {
        Client_Register    ifs_crs;           /* 98 client registers for
                                                    dispatch */
        Ring0_Register     ifs_ring0_frame;   /* 98 client registers for
                                                    ring0 file i/o */
        ServerDos_Register ifs_server_frame;  /* 98 client registers for
                                                    server DosCall */
    };
} ifsreq;
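One way to sanity-check offsets like these is to mirror the layout with fixed-width fields and let the compiler verify them. The sketch below is such a mirror, not the real declaration: pointers are modeled as 32-bit values so the numbers hold on any host, the embedded ioreq is assumed to occupy 0x74 bytes (as implied by the offset of ifs_pfh), and the register-frame union at offset 98 is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout mirror of the IFSMgr-private part of ifsreq (illustrative only). */
typedef struct {
    uint8_t  ifs_ir[0x74];  /* 00 embedded ioreq, assumed 0x74 bytes long */
    uint32_t ifs_pfh;       /* 74 pointer to fhandle structure */
    uint32_t ifs_psft;      /* 78 pointer to SFT */
    uint32_t ifs_psr;       /* 7C pointer to shell resource */
    uint32_t ifs_pdb;       /* 80 linear base of owner PSP */
    uint32_t ifs_proid;     /* 84 provider id */
    uint8_t  ifs_func;      /* 88 function of dispatched command */
    uint8_t  ifs_drv;       /* 89 drive from dispatched command */
    uint8_t  ifs_hflag;     /* 8A flag */
    uint8_t  ifs_nflags;    /* 8B flags */
    uint32_t ifs_pbuffer;   /* 8C pointer to parse buffer */
    uint32_t ifs_VMHandle;  /* 90 VM of request */
    uint32_t ifs_PV;        /* 94 pointer to per-VM data area */
} ifsreq_layout;

/* Compile-time checks against the offsets given in the comments. */
static_assert(offsetof(ifsreq_layout, ifs_pfh) == 0x74, "ifs_pfh offset");
static_assert(offsetof(ifsreq_layout, ifs_func) == 0x88, "ifs_func offset");
static_assert(offsetof(ifsreq_layout, ifs_PV) == 0x94, "ifs_PV offset");
static_assert(sizeof(ifsreq_layout) == 0x98, "register frames start at 98");
```

If a member were added or resized by mistake, the build would fail at the first mismatched offset instead of producing a subtly wrong structure dump.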
Chapter 8, Figure 8-1); it also holds references to the CDS structure and the shell
resource structure.
typedef struct {
    shres*    vi_psr;         /* 00 ptr shell resource for volume */
    char*     vi_pszRootDir;  /* 04 path following drive & colon in CDS */
    WORD      vi_Client_CX;   /* 08 */
    BYTE      vi_unk1;        /* 0A */
    BYTE      vi_flags;       /* 0B Volume is subst drive  0x10
                                    ?                      0x08
                                    ?                      0x04
                                    Static connection      0x02
                                    ?                      0x01 */
    WORD      vi_leng;        /* 0C length of Unicode subst path */
    BYTE      vi_unk2;        /* 0E */
    BYTE      vi_drv;         /* 0F one-based volume */
    string_t  vi_subst_path;  /* 10 Unicode Subst path */
    void*     vi_CDS_copy;    /* 14 Copy of CDS */
} volinfo;
Per-VM Data
During Device Init, IFSMgr allocates per-VM data using the service
Allocate_Device_CB_Area. The size of this area is determined by the following formula:
What is returned by this service is the offset to IFSMgr's per-VM data from the
address given by the VM handle. It is the sum of these two values which is stored
in ifsreq. ifs_PV.
The layout of IFSMgr's per-VM data is divided into three areas. At the beginning
of the area is the pervm structure given below. It is followed by two additional
tables of equal size which will hold pointers for up to 256 SFT entries plus
pointers for FCBs inherited from MS-DOS before Windows 95 started. The second
of these two tables is pointed at by the pervm member pv_ppsft.
typedef struct {
    void*     pv_next;        /* 00 */
    void*     pv_prev;        /* 04 */
    BYTE      pv_flags;       /* 08 bit 0 */
                              /*    bit 1 */
                              /*    bit 2 */
                              /*    bit 3 */
                              /*    bit 4 Local Int21 hooker */
                              /*    bit 5 Control-C check */
                              /*    bit 6 */
                              /*    bit 7 */
    BYTE      pv_cnt;         /* 09 */
    BYTE      pv_curdrv;      /* 0A */
    BYTE      pv_unk2;        /* 0B */
    void*     pv_dispfunc;    /* 0C address of dispatch function */
    ifsreq*   pv_pifs;        /* 10 active ifsreq */
    pevent    pv_pev_vm;      /* 14 VM tasktime event */
    DWORD     pv_Client_DS;   /* 18 DS:DX or DS:EDX */
    DWORD     pv_Client_EDX;  /* 1C address of Disk Transfer Area */
    HEVENT    pv_hev;         /* 20 */
    fhandle*  pv_pfh[32];     /* 24 */
    pevent    pv_pev_vm2;     /* 48 */
    void*     pv_ppsft;       /* 4C pointer to second SFT table */
    void*     pv_curdir[32];  /* 50 current directory for this VM */
    WORD      pv_flags2;      /* D0 */
    WORD      pv_unk2;        /* D2 */
} pervm;
Per-Thread Data
IFSMgr piggybacks a doubleword onto every thread. It does this by allocating a
thread data slot (using VMM service _AllocateThreadDataSlot) at Device Init time.
Unlike some other devices which use this doubleword to store a pointer to a
more substantial data structure, IFSMgr is content with using just the data slot.
The data slot is located by an offset from the address of a thread's control block,
which is the same as the ring-0 thread handle. The layout of IFSMgr's thread
doubleword is as follows:
Geoff Chappell shared his insights regarding the use of these bit flags in a recent
email:
The bit flags are concerned with the status of one thread with respect to threads
that propose to work or have started to work on a volume lock.
then it will have to wait until nobody else is working on the same volume lock
but even after then, it will have to wait until no thread is doing anything that
might be affected by the change in the volume's lock state.
At the time that a thread is to start working on a volume lock, there is not much
status information to go on. IFSMgr assumes that just about any thread that is in
an IFS operation is liable to be affected. The general scheme is to set the Marked
flag in each of them.
Some threads will already have the NoBlock flag set because it was deduced at an
earlier stage that their IFS operation could not be affected by work on a volume
lock. For instance, these threads do not get "marked. "
Some threads will already have the Blocked flag set because they are blocked at
places in their IFS operations where it is known not to matter if a volume lock
gets worked on. For instance, if a thread has to wait for a parse buffer to become
available, then it is not very far into its IFS operation and certainly a long way
from being worried whether some volume is locked. Threads that have blocked at
safe places do not get "marked" either.
The thread that wants to work on the volume lock blocks on a special key. As the
other threads execute, some may finish their IFS operations. That's good: it makes
one less thread to worry about. The general scheme when the IFSMgr decides a
thread can't be affected by work on a volume lock is that if the thread has its
"Marked" flag set, then the flag is cleared and the thread is deemed to no longer
contribute to the count of threads that could be affected. When there are no
longer any threads that could be affected, all threads waiting to work on volume
locks are signalled.
Another good outcome, handled the same way, is that a "marked" thread blocks
at a place known to be safe.
Some threads that were blocked at safe places may wake up. These and other
threads (with or without the Marked flag) may eventually reach far enough into
their IFS operation that they want access to a volume whose lock is to be worked
on by some waiting thread. For some operations (such as on the paging file and
on memory-mapped files and on pages opened as immovable), this won't matter,
but in general, a thread that wants to access the volume will have to block until
the work on that particular volume's lock is done. Again, the IFSMgr knows that
the thread cannot now be affected by work on the volume's lock and so again, it
may signal the threads that are waiting to work on volume locks.
In summary, Marked means that the thread is thought (possibly only cautiously)
to prevent proceeding immediately with proposed work on a volume lock,
Blocked means that the thread is blocked at a stage where it can't be affected by
proposed work on a volume lock and NoBlock means that if work on a volume
lock is proposed, then this thread is not to be regarded as preventing the work.
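The marking scheme in this description can be condensed into a small model. The sketch below is an interpretation of the prose, not IFSMgr's code: the bit values, the thread_state structure, and both function names are hypothetical, and real code would run under appropriate synchronization.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit assignments; the actual positions are not shown here. */
enum { TF_MARKED = 0x1, TF_BLOCKED = 0x2, TF_NOBLOCK = 0x4 };

typedef struct {
    unsigned flags;      /* combination of the TF_ bits above */
    int      in_ifs_op;  /* nonzero while the thread is inside an IFS operation */
} thread_state;

/* A thread proposes work on a volume lock: mark every thread that might be
   affected and return how many must get out of the way first. */
int mark_threads(thread_state *t, int n)
{
    int i, count = 0;

    assert(t != NULL);
    for (i = 0; i < n; i++) {
        if (t[i].in_ifs_op && !(t[i].flags & (TF_BLOCKED | TF_NOBLOCK))) {
            t[i].flags |= TF_MARKED;
            count++;
        }
    }
    return count;
}

/* A thread finished its operation or blocked at a safe place. Returns 1 when
   no marked threads remain, i.e. the waiting lock workers may be signalled. */
int thread_now_safe(thread_state *t, int *count)
{
    assert(t != NULL && count != NULL && *count >= 0);
    if (t->flags & TF_MARKED) {
        t->flags &= ~(unsigned)TF_MARKED;
        (*count)--;
    }
    return *count == 0;
}
```

The count plays the role of "threads that could be affected": threads already flagged Blocked or NoBlock never contribute to it, and the lock work proceeds only once it drops to zero.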
IFS Development Aids
This appendix describes some aids that were used in developing the sample code
which accompanies the book. Since I have adopted the DDK's approach to
writing VxDs in C (see What's New in Windows 95 for VxD Writers? by Ruediger
Asche, April 24, 1994, MSDN CD), these aids fill in a few gaps where I felt there
were some deficiencies.
A VxD has a single exported symbol which is its device name with the suffix
"_DDB" appended. This points to the device descriptor block and is used by the
loader to find the segments in a VxD when bringing the module into memory.
The C compiler only allows names like _FSHOOK_DDB, where FSHOOK_DDB is
what is really desired. Using the decorated name would require clients of the VxD
to use the name _FSHOOK when referring to it. Clearly this is not desirable.
The chentry.exe utility lets you go ahead and use decorated names by removing
the underscore from the exported name after the VxD is built. If the exported
DDB name does not have a leading underscore, CHENTRY does nothing. To use
CHENTRY, you simply add the command chentry VxdName following the link
step in your makefiles.
Every VxD requires two basic structures, a device descriptor block and a control
message dispatch procedure. The primary purpose of vxd.h is to provide macros
for setting up these two constructs.
Setting up a VxD's device descriptor block requires two steps. First, before the
include statement for vxd.h, define the name for your device descriptor block. For
example, these statements set up a device descriptor block for the VECTORS VxD:
    #define DDB VECTORS_DDB
    #include "vxd.h"
Inside vxd.h the following macro is defined which will be used from our C source
file to initialize the contents of VECTORS_DDB:
    // Declare Device Descriptor Block
    #define Declare_DDB(name, major, minor, dispatch, devID, init, \
                        v86proc, pmproc, refdata, svctbl, numsvcs) \
    struct VxD_Desc_Block \
    DDB = { 0, DDK_VERSION, devID, major, minor, 0, name, init, \
            (DWORD)dispatch, (DWORD)v86proc, (DWORD)pmproc, \
            0, 0, refdata, svctbl, numsvcs, 0, 'Prev', \
            sizeof(struct VxD_Desc_Block), 'Rsv1', 'Rsv2', \
            'Rsv3' };
Then from the C source file, within a locked data segment, a global instance of
the DDB is defined like this:
    Declare_DDB("VECTORS", 1, 0, CtrlMsgDispatch,
                UNDEFINED_DEVICE_ID, VMM_INIT_ORDER,
                0, 0, 0, 0, 0);
The control message dispatch procedure is constructed from macros that make it
resemble a message map. Here is a typical dispatch procedure for a MultiMon
monitor:
    void _declspec(naked) CtrlMsgDispatch(void) {
        BEGIN_DISPATCH_MAP
            ON_SYS_CRITICAL_INIT(CtrlMsg_Sys_Crit_Init)
            ON_DEVICE_INIT(CtrlMsg_Device_Init)
            ON_INIT_COMPLETE(CtrlMsg_Init_Complete)
            ON_SYS_VM_TERMINATE(CtrlMsg_Sys_VM_Terminate)
        END_DISPATCH_MAP
    }
These prototypes are required so that the proper arguments are pushed on the
stack prior to calling the handler. The header file vxd.h contains macros and
message handler prototypes for known control messages.
The dispatch macros also handle directed system control messages, those control
messages which are private to a set of cooperating VxDs. The macros ON_DIRECTED0
and ON_DIRECTED1 take two arguments, the handler function and a
message number (e.g., PRIVATE_INIT). The message number is private to the cooperating
VxDs but is required to be in the range 0x70000000 to 0x7FFFFFFF. The
reason that two ON_DIRECTED macros are used here is that ON_DIRECTED0 calls a
handler that takes no arguments whereas ON_DIRECTED1 calls a handler which
takes one argument which is passed in the EBX register.
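Because the private range is a hard requirement, it can be worth checking a message number when it is defined. The sketch below is illustrative only: the two range macros, the value chosen for PRIVATE_INIT, and the helper is_private_directed_msg are all assumptions made for this example and are not taken from vxd.h.

```c
#include <assert.h>
#include <stdint.h>

/* Bounds of the private range given in the text. */
#define PRIVATE_MSG_FIRST 0x70000000u
#define PRIVATE_MSG_LAST  0x7FFFFFFFu

/* Hypothetical message number shared by a set of cooperating VxDs. */
#define PRIVATE_INIT (PRIVATE_MSG_FIRST + 1u)

/* Reject a bad message number at compile time. */
static_assert(PRIVATE_INIT >= PRIVATE_MSG_FIRST &&
              PRIVATE_INIT <= PRIVATE_MSG_LAST,
              "private message number out of range");

/* Runtime form of the same check, e.g. for a debug build. */
int is_private_directed_msg(uint32_t msg)
{
    return msg >= PRIVATE_MSG_FIRST && msg <= PRIVATE_MSG_LAST;
}
```

The compile-time check costs nothing in the built VxD, while the runtime helper can guard a dispatch routine that receives message numbers from other components.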
One more fundamental macro that is included helps when creating a stack frame
for a "hooked procedure. " This is used when declaring a hook procedure for
VMM's Hook_Device_Service. New with Windows 95 is the ability to unhook
these services. To do so requires creating a proper function preamble and this is
done by declaring the function with the HOOK_PREAMBLE macro:
    // These two jumps make up the hook preamble
    // These are needed to support Unhook_Device_Service
    // The real hook procedure begins after these at "real_entry"
    #define HOOK_PREAMBLE(prev) \
        _asm jmp short real_entry \
        _asm jmp dword ptr prev \
        _asm real_entry:
The prev argument to this macro is a doubleword storage location which holds
the original service's address. This location is filled in automatically by the
Hook_Device_Service function. Here is an example of how this macro would be used:
    // Win95 Hook_Device_Service fills this in!
    PFN pPrev_Allocate_PM_Call_Back;
    _asm ret
vxd.h contains a variety of other simple macros which I leave to you to explore.
IFSWRAPS
IFSWRAPS is a static library, included on the companion diskette, which provides
C-callable functions for all IFSMgr services as well as a few VWIN32 and VMM
services. This library was constructed in the same way as VXDWRAPS, which
accompanies the DDK. The header file ifswraps.h is included in source files
where you call the library functions.
Most of the services supplied by IFSMgr use the C calling convention. This makes
it almost trivial to make wrappers for these functions since no coding is required.
For these functions, the calling parameters and return values are as described in
the DDK. There are a handful of functions which use registers to pass arguments
and receive return values; only these functions require some special treatment.
These exceptions are described below:
If the return value is 0, the call was successful and the EREGS structure
contains the return values in registers; if the return value is non-zero, it is an
error code. See the DDK for register assignments for each call.
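The calling pattern implied by this convention is to test the return value before reading any register. The sketch below uses a mock service and a made-up register block to show the pattern; eregs_sketch, mock_register_service, and the error value 5 are assumptions for illustration only, and the real EREGS layout and register assignments come from ifswraps.h and the DDK.

```c
#include <stdint.h>

/* Hypothetical register block; not the real EREGS definition. */
typedef struct {
    uint32_t r_eax;
    uint32_t r_ebx;
    uint32_t r_ecx;
    uint32_t r_edx;
} eregs_sketch;

/* Mock register-based service: succeeds for even inputs and returns the
   halved value in r_eax; fails with a non-zero error code otherwise. */
static int mock_register_service(uint32_t input, eregs_sketch *out)
{
    if (input & 1u)
        return 5;            /* arbitrary non-zero error code */
    out->r_eax = input / 2u;
    return 0;                /* success: register values are valid */
}

/* Caller pattern: only read the register block when the call returned 0. */
static int half_or_error(uint32_t input, uint32_t *result)
{
    eregs_sketch regs = { 0, 0, 0, 0 };
    int err = mock_register_service(input, &regs);

    if (err != 0)
        return err;          /* registers are not meaningful on failure */
    *result = regs.r_eax;
    return 0;
}
```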
DWORD VWIN32_GetCurrentProcessHandle(VOID)
VOID Simulate_FarJmp(DWORD selector, DWORD offset)
BOOL Get_PM_Int_Vector(DWORD intnum, PWORD pSel, PDWORD pOfs)
BOOL Hook_PM_Interrupt(DWORD intnum, PWORD pSel, PDWORD pOfs,
PVOID handler, DWORD refdata)
BOOL Hook_V86_Int_Chain(DWORD intnum, PVOID handler)
BOOL Test_Sys_VM_Handle(HVM hvm)
PVOID Map_Flat(DWORD segofs, DWORD offof)
BOOL Directed_Sys_Control0(PVMMDDB pDDB, DWORD SysControl)
BOOL Directed_Sys_Control1(PVMMDDB pDDB, DWORD SysControl,
PVOID arg1)
BOOL Directed_Sys_Control2(PVMMDDB pDDB, DWORD SysControl,
PVOID arg1, PVOID arg2)
PVOID Hook_Device_Service(DWORD svcnum, PVOID handler)
BOOL Unhook_Device_Service(DWORD svcnum, PVOID handler)
DEBIFS
DEBIFS is the name of a VxD, included on the companion diskette, which
contains a dot command. By dot command I mean a command which you enter
in your debugger, like .vmm b. The commands which DEBIFS provides dump out
useful information about IFSMgr's data structures. The available commands are:
.debifs i address
Dumps an ifsreq structure at specified address
.debifs s address
Dumps a shres structure at specified address
.debifs f address
Dumps a fhandle structure at specified address
This dump was created from SoftICE for Windows 95. Note that a register name
may be passed as an address; in actuality, any valid debugger expression may be
used for an address. The hexadecimal value in parentheses following each
member name is the offset of the member from the beginning of the structure.
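The "member name (offset)" layout is easy to reproduce with the standard offsetof macro. The following sketch is not DEBIFS source code: sample_rec is a hypothetical structure (not the real ifsreq, shres, or fhandle layout), and format_member and dump_sample_rec are names made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical structure for illustration only. */
typedef struct {
    uint32_t flags;
    uint16_t mode;
    uint16_t refcount;
} sample_rec;

/* Format one member as "name (offset): value", with the offset in hex.
   Returns the number of characters written, as snprintf does. */
static int format_member(char *buf, size_t size, const char *name,
                         size_t offset, unsigned long value)
{
    return snprintf(buf, size, "%s (%02X): %08lX",
                    name, (unsigned)offset, value);
}

/* Dump every member of a sample_rec, one per line. */
static void dump_sample_rec(const sample_rec *r, FILE *out)
{
    char line[64];

    format_member(line, sizeof line, "flags",
                  offsetof(sample_rec, flags), r->flags);
    fprintf(out, "%s\n", line);

    format_member(line, sizeof line, "mode",
                  offsetof(sample_rec, mode), r->mode);
    fprintf(out, "%s\n", line);

    format_member(line, sizeof line, "refcount",
                  offsetof(sample_rec, refcount), r->refcount);
    fprintf(out, "%s\n", line);
}
```

Printing the offset next to each name makes it straightforward to match a dumped field against a raw memory display at the same address.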
Bibliography
Arun, Russ. 1994. "Chicago File System Features-Tips & Issues," Microsoft Corp.
White Paper, April 22, 1994.
Asche, Ruediger. 1994. "What's New in Windows 95 for VxD Writers?," Microsoft
Developer's Network CD-ROM, April 1994.
Baker, Art. 1997. The Windows NT Device Driver Book: A Guide for Programmers.
Prentice-Hall, Inc.
Brown, Ralf and Kyle, Jim. 1994. Uninterrupted Interrupts. (A Programmer's CD-ROM
Reference to Network APIs and to BIOS, DOS, and Third-Party Calls).
Addison-Wesley Publishing Co.
Crawford, John and Gelsinger, Patrick. 1987. Programming the 80386. SYBEX, Inc.
DiLascia, Paul and Stone, Victor. 1996. "Sweeper," Microsoft Interactive Developer,
vol. 1, no. 1 (Spring 1996), p. 16.
Microsoft Corp. 1996. "Microsoft Networks SMB File Sharing Protocol," Document
Version 6.0p.
Mitchell, Stan. 1995. "Monitoring Windows 95 File Activity in Ring 0," Windows/DOS
Developer's Journal, vol. 6, no. 7 (July 1995), p. 6.
Oney, Walter. 1996. Systems Programming for Windows 95. Microsoft Press.
Perry, Dan. 1996. "CIFS: A Common Internet File System," Microsoft Interactive
Developer, vol. 1, no. 5 (November 1996), p. 56.
Pietrek, Matt. 1996. Windows 95 System Programming Secrets. IDG Books
Worldwide.
Russinovich, Mark and Cogswell, Bryce. 1997. "Examining the Windows NT File
System," Dr. Dobb's Journal, vol. 21, no. 2 (February 1997).
Internet Resources
Windows 95 File System / VxDs
O'Reilly Windows Center                    http://www.ora.com/centers/windows/
Author Page: "Inside Win95 File System"    http://www.sourcequest.com/win95ifs/
Device Driver Development Home Page        http://www.albany.net/~danorton/ddk
Vireo Software Home Page                   http://www.vireo.com
UseNet Newsgroup                           comp.os.ms-windows.programmer.vxd
CIFS/SMB
CIFS and SMB specifications                ftp://ftp.microsoft.com/developr/drg/CIFS
CIFS Home Page                             http://www.microsoft.com/intdev/cifs/cifs.htm
SAMBA download                             ftp://samba.anu.edu.au/pub/samba
UseNet Newsgroup                           comp.protocols.smb
WDM/Kernel-Mode Drivers
WDM Home Page                              http://www.microsoft.com/hwdev/pcfuture/wdm.htm
WDM for Windows & Windows NT               http://www.microsoft.com/hwdev/pcfuture/wdmview.htm
NT Internals Home Page                     http://www.ntinternals.com
Microsoft Interactive Developer            http://www.microsoft.com/mind
UseNet Newsgroup                           comp.os.ms-windows.programmer.nt.kernel-mode
About the Author
Stan Mitchell is a consulting software engineer in Silicon Valley. He specializes in
driver and system level programming on the Wintel platform. Stan earned a Bachelor
of Science degree from Wayne State University in 1970 and a Master of
Science from University of Waterloo in 1976.
He entered the microcomputer field in 1979. His early projects emphasized logic
design of single-board microcomputers and micro-controllers. The most memorable
project during this period was the design of a full-SCSI host adapter with
8048 firmware at Adaptec, Inc.
After the introduction of the IBM-PC, Stan shifted his focus to MS-DOS system software
and then to MS-Windows. His recent projects have included developing a
NetBIOS layer over TCP/IP for NetManage and a Windows 95 file system monitor
for Xerox/XSoft.
Stan and his wife, Maggie, make Milpitas, CA, their home. In his spare time, he
likes to romp with his dogs (Yanni and Munchkin), play a serious game of table
tennis, and browse the shelves of nearby bookstores.
Colophon
The animal featured on the cover of Inside the Windows 95 File System is a representative
of one of the more than 65,000 species of mollusks. There are six classes
of mollusk. The largest of these classes is the gastropod. The coiled shell on the
animal on the cover of this book is typical of many, but not all, gastropods. This
mollusk may be an Astraea Heliotropium, a native of the waters surrounding New
Zealand. The Astraea Heliotropium grows to a size of three to four inches, and
has a lovely iridescent purplish-pink shell.
No species shows as much diversity of shape and size as the mollusk. Despite this
diversity, most mollusks have the same basic body plan. The word mollusk means
"soft bodied." The soft mollusk body is composed of a combined head-foot
containing the central nervous system and a layer of tissue called the mantle that
covers the internal organs. The mantle also secretes the shell that covers the
mollusk's body. The shell is part of the animal and grows with it.
Edie Freedman designed the cover of this book, using a 19th-century engraving
from the Dover Pictorial Archive. The cover layout was produced with Quark
XPress 3.3 using the ITC Garamond font.
The inside layout was designed by Edie Freedman and Nancy Priest and implemented
in FrameMaker 5.0 by Mike Sierra. The text and heading fonts are ITC
Garamond Light and Garamond Book. The illustrations that appear in the book
were created in Macromedia Freehand 5.0 by Chris Reilley. This colophon was
written by Clairemarie Fisher O'Leary.