Walking the File Tree
Do you need to create an application that will recursively visit all the files in a file tree? Perhaps you need to delete every .class
file in a tree, or find every file that has not been accessed in the last year. You can do so with the FileVisitor
interface.
The FileVisitor Interface
To walk a file tree, you first need to implement a FileVisitor
. A FileVisitor
specifies the required behavior at key points in the traversal process: when a file is visited, before a directory is accessed, after a directory is accessed, or when a failure occurs. The interface has four methods that correspond to these situations:
preVisitDirectory()
– Invoked before a directory's entries are visited.postVisitDirectory()
– Invoked after all the entries in a directory are visited. If any errors are encountered, the specific exception is passed to the method.visitFile()
– Invoked on the file being visited. The file'sBasicFileAttributes
is passed to the method, or you can use the file attributes package to read a specific set of attributes. For example, you can choose to read the file'sDosFileAttributeView
to determine if the file has the "hidden" bit set.visitFileFailed()
– Invoked when the file cannot be accessed. The specific exception is passed to the method. You can choose whether to throw the exception, print it to the console or a log file, and so on.
If you do not need to implement all four of the FileVisitor
methods, instead of implementing the FileVisitor
interface, you can extend the SimpleFileVisitor
class. This class is an adapter, which implements the FileVisitor
interface, visits all files in a tree and throws an IOError
when an error is encountered. You can extend this class and override only the methods that you require.
Here is an example that extends SimpleFileVisitor
to print all entries in a file tree. It prints the entry whether the entry is a regular file, a symbolic link, a directory, or some other "unspecified" type of file. It also prints the size, in bytes, of each file. Any exception that is encountered is printed to the console.
The FileVisitor
methods are shown in the following code:
Kickstarting the Process
Once you have implemented your FileVisitor
, how do you initiate the file walk? There are two walkFileTree()
methods in the Files class.
The first method requires only a starting point and an instance of your FileVisitor
. You can invoke the PrintFiles
file visitor as follows:
The second walkFileTree()
method enables you to additionally specify a limit on the number of levels visited and a set of FileVisitOption
enums. If you want to ensure that this method walks the entire file tree, you can specify Integer.MAX_VALUE
for the maximum depth argument.
You can specify the FileVisitOption
enum, FOLLOW_LINKS
, which indicates that symbolic links should be followed.
This code snippet shows how the four-argument method can be invoked:
Considerations When Creating a FileVisitor
A file tree is walked depth first, but you cannot make any assumptions about the iteration order that subdirectories are visited.
If your program will be changing the file system, you need to carefully consider how you implement your FileVisitor
.
For example, if you are writing a recursive delete, you first delete the files in a directory before deleting the directory itself. In this case, you delete the directory in postVisitDirectory()
.
If you are writing a recursive copy, you create the new directory in preVisitDirectory()
before attempting to copy the files to it (in visitFiles()
). If you want to preserve the attributes of the source directory (similar to the UNIX cp -p
command), you need to do that after the files have been copied, in postVisitDirectory()
. The Copy
example shows how to do this.
If you are writing a file search, you perform the comparison in the visitFile()
method. This method finds all the files that match your criteria, but it does not find the directories. If you want to find both files and directories, you must also perform the comparison in either the preVisitDirectory()
or postVisitDirectory()
method. The Find
example shows how to do this.
You need to decide whether you want symbolic links to be followed. If you are deleting files, for example, following symbolic links might not be advisable. If you are copying a file tree, you might want to allow it. By default, walkFileTree()
does not follow symbolic links.
The visitFile()
method is invoked for files. If you have specified the FOLLOW_LINKS
option and your file tree has a circular link to a parent directory, the looping directory is reported in the visitFileFailed()
method with the FileSystemLoopException
. The following code snippet shows how to catch a circular link and is from the Copy
example:
The visitFile()
method is invoked for files. If you have specified the FOLLOW_LINKS
option and your file tree has a circular link to a parent directory, the looping directory is reported in the visitFileFailed()
method with the FileSystemLoopException
. The following code snippet shows how to catch a circular link and is from the Copy
example:
This case can occur only when the program is following symbolic links.
Controlling the Flow
Perhaps you want to walk the file tree looking for a particular directory and, when found, you want the process to terminate. Perhaps you want to skip specific directories.
The FileVisitor
methods return a FileVisitResult
value. You can abort the file walking process or control whether a directory is visited by the values you return in the FileVisitor
methods:
CONTINUE
– Indicates that the file walking should continue. If thepreVisitDirectory()
method returnsCONTINUE
, the directory is visited.TERMINATE
– Immediately aborts the file walking. No further file walking methods are invoked after this value is returned.SKIP_SUBTREE
– WhenpreVisitDirectory()
returns this value, the specified directory and its subdirectories are skipped. This branch is "pruned out" of the tree.SKIP_SIBLINGS
– WhenpreVisitDirectory()
returns this value, the specified directory is not visited,postVisitDirectory()
is not invoked, and no further unvisited siblings are visited. If returned from thepostVisitDirectory()
method, no further siblings are visited. Essentially, nothing further happens in the specified directory.
In this code snippet, any directory named SCCS is skipped:
In this code snippet, as soon as a particular file is located, the file name is printed to standard output, and the file walking terminates:
Finding Files
If you have ever used a shell script, you have most likely used pattern matching to locate files. In fact, you have probably used it extensively. If you have not used it, pattern matching uses special characters to create a pattern and then file names can be compared against that pattern. For example, in most shell scripts, the asterisk, *
, matches any number of characters. For example, the following command lists all the files in the current directory that end in .html
:
The java.nio.file
package provides programmatic support for this useful feature. Each file system implementation provides a PathMatcher
. You can retrieve a file system's PathMatcher
by using the getPathMatcher(String)
method in the FileSystem
class. The following code snippet fetches the path matcher for the default file system:
The string argument passed to getPathMatcher(String)
specifies the syntax flavor and the pattern to be matched. This example specifies glob syntax. If you are unfamiliar with glob syntax, see the section What is a Glob.
Glob syntax is easy to use and flexible but, if you prefer, you can also use regular expressions, or regex, syntax. For further information about regex, see the section Regular Expressions. Some file system implementations might support other syntaxes.
If you want to use some other form of string-based pattern matching, you can create your own PathMatcher
class. The examples in this page use glob syntax.
Once you have created your PathMatcher
instance, you are ready to match files against it. The PathMatcher
interface has a single method, matches()
, that takes a Path
argument and returns a boolean
: It either matches the pattern, or it does not. The following code snippet looks for files that end in .java
or .class
and prints those files to standard output:
Recursive Pattern Matching
Searching for files that match a particular pattern goes hand-in-hand with walking a file tree. How many times do you know a file is somewhere on the file system, but where? Or perhaps you need to find all files in a file tree that have a particular file extension.
The Find
example does precisely that. Find
is similar to the UNIX find
utility, but has pared down functionally. You can extend this example to include other functionality. For example, the find
utility supports the -prune
flag to exclude an entire subtree from the search. You could implement that functionality by returning SKIP_SUBTREE
in the preVisitDirectory()
method. To implement the -L
option, which follows symbolic links, you could use the four-argument walkFileTree(Path, Set, int, FileVisitor)
method and pass in the FOLLOW_LINKS
enum (but make sure that you test for circular links in the visitFile()
method).
To run the Find
application, use the following format:
The pattern is placed inside quotation marks so any wildcards are not interpreted by the shell. For example:
The Find Example
Here is the source code for the Find
example:
The Copy Example
The Chmod Example
Last update: January 4, 2024