Scalable Detection of Semantic Clones

Lingxiao JIANG, Singapore Management University, [email protected]
Zhendong SU, University of California, Davis

5-2008

Citation:
GABEL, Mark; JIANG, Lingxiao; and SU, Zhendong. Scalable detection of semantic clones. (2008). ICSE '08: ACM/IEEE 30th International Conference on Software Engineering, 10-18 May 2008, Leipzig, Germany. 321-330. Research Collection School Of Information Systems.
Available at: http://ink.library.smu.edu.sg/sis_research/934
Scalable Detection of Semantic Clones ∗
[Figure 3: The PDG for Figure 1. The figure's node labels, characteristic vectors, and the accompanying partial code listing are not reproduced here.]

…presentation of our definitions and algorithm (Section 3). We then discuss our implementation (Section 4) and present the results of our empirical evaluation (Section 5). Finally, we discuss related work (Section 6) and conclude with ideas for future work (Section 7).
3.3.2 Semantic Threads

Although analyzing weakly connected components for clones does yield interesting results, many semantic clones may not be detected this way. Thus, for better coverage, we need a more fine-grained partitioning of the statements in a procedure body. First, to illustrate the problem, let us consider the hypothetical example in Figure 6.

It is clear that there are two distinct flows of data throughout the function, which only merge at the end. However, the aggregation of the two calculations (through "return result") causes the entire function body to be grouped as a single component. The PDG subgraphs corresponding to these computations overlap, but only at the end when the results are returned.

One way of modeling this particular flow of data is through the concept of a forward slice [8], which is a specific variant of program slicing [20]. A forward slice from a program point s with respect to a variable v consists of all program points that may be directly or indirectly affected by the execution of s with the value of v.

In addition, although some overlapping among different slices should be allowed (and even desired), a significant amount of overlapping may imply that these slices might be part of the same higher-level computation. This is especially evident in forward slices from consecutive, related declarations:

    int count_list_nodes(struct list_node *head) {
        int i = 0;
        struct list_node *tail = head->prev;

        while (head != tail && i < MAX) {
            i++;
            head = head->next;
        }

        return i;
    }

The forward slices from the declarations on the first and second lines differ only in their respective first nodes. Considering these to be separate computations would be a mistake: we not only create redundant data, but we also fail to recognize the larger semantic thread and may miss important clones.

We define a set, IST(P, γ), that consists of our set of interesting γ-overlapping semantic threads. These subgraphs represent our candidates for possible semantic clones.
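To make the forward-slice construction concrete, the following is a minimal Java sketch (not the paper's implementation) of computing a forward slice by depth-first search over a toy PDG representation; the PdgNode class and its single successor list are hypothetical simplifications of CodeSurfer's richer node and edge kinds.

    import java.util.*;

    // Hypothetical, simplified PDG node: a real PDG distinguishes data and
    // control dependence edges and many node kinds.
    final class PdgNode {
        final int id;
        final List<PdgNode> successors = new ArrayList<>(); // dependence successors

        PdgNode(int id) { this.id = id; }
    }

    final class ForwardSlicer {
        // The forward slice from s: every node reachable from s along
        // dependence edges, including s itself.
        static Set<PdgNode> forwardSlice(PdgNode s) {
            Set<PdgNode> slice = new LinkedHashSet<>();
            Deque<PdgNode> work = new ArrayDeque<>();
            work.push(s);
            while (!work.isEmpty()) {
                PdgNode n = work.pop();
                if (slice.add(n)) {
                    for (PdgNode succ : n.successors) {
                        work.push(succ);
                    }
                }
            }
            return slice;
        }
    }

In the count_list_nodes example above, the slices rooted at the declarations of i and tail would differ only in their first nodes; that heavy overlap is exactly what motivates merging them into a single semantic thread.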
Definition 3.6 (Interesting Semantic Threads) The set of interesting γ-overlapping semantic threads is a finite set of semantic threads with the following properties:

1. The set is complete; its union represents the entire PDG.

2. The set must not contain any fully subsumed threads:

   ∄ sl, sl′ ∈ IST(P, γ), sl ≠ sl′. sl ⊆ sl′

3. Any two threads in the set share at most γ nodes:

   ∀ sl, sl′ ∈ IST(P, γ), sl ≠ sl′. |sl ∩ sl′| ≤ γ

4. IST(P, γ) is maximal, i.e., it has the maximal size of all sets that meet properties 1-3.

With γ set to one, the first code example has two semantic threads. Note that setting γ to zero is precisely equivalent to computing weakly connected components.

Algorithm 1 is a simple greedy algorithm for computing this set. The function AddSlice ensures that the final set contains no threads that overlap by more than γ nodes, and the enumeration of each node in the PDG implies the completeness of the returned set. The following argues that Algorithm 1 produces a maximal set.

Algorithm 1 Construct Semantic Threads
 1: function BST(P : PDG, γ : int): STs
 2:   IST, seen ← ∅
 3:   Sort nodes in P in asc. order w.r.t. their locs. in source code.
 4:   for all node n in P do
 5:     if n ∉ seen then
 6:       slice ← DepthFirstSearch(n, P)
 7:       seen ← seen ∪ slice
 8:       IST ← AddSlice(IST, slice, γ)
 9:     end if
10:   end for
11:   return IST
12: end function
13:
14: function AddSlice(IST : STs, slice : ST, γ : int): STs
15:   conflicts ← ∅
16:   for all thread T in IST do
17:     if |slice ∩ T| > γ then
18:       conflicts ← conflicts ∪ {T}
19:     end if
20:   end for
21:   if conflicts = ∅ then
22:     return IST ∪ {slice}
23:   else
24:     slice ← slice ∪ ⋃ T∈conflicts T
25:     return AddSlice(IST \ conflicts, slice, γ)
26:   end if
27: end function
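For illustration, here is a minimal Java sketch of the greedy combination performed by AddSlice, assuming each slice and thread is represented simply as a set of PDG node ids. It is a simplification of Algorithm 1 (the slices are taken as given rather than produced by the depth-first enumeration in BST) and is not the tool's actual implementation.

    import java.util.*;

    final class SemanticThreads {
        // Greedy construction of γ-overlapping semantic threads from a list of
        // forward slices, each represented as a set of PDG node ids.
        static List<Set<Integer>> buildThreads(List<Set<Integer>> slices, int gamma) {
            List<Set<Integer>> threads = new ArrayList<>();
            for (Set<Integer> slice : slices) {
                addSlice(threads, new HashSet<>(slice), gamma);
            }
            return threads;
        }

        private static void addSlice(List<Set<Integer>> threads, Set<Integer> slice, int gamma) {
            // Collect every existing thread that overlaps the slice in more than γ nodes.
            List<Set<Integer>> conflicts = new ArrayList<>();
            for (Set<Integer> t : threads) {
                if (overlap(slice, t) > gamma) {
                    conflicts.add(t);
                }
            }
            if (conflicts.isEmpty()) {
                threads.add(slice);           // no conflict: the slice becomes its own thread
            } else {
                threads.removeAll(conflicts); // merge every conflicting thread into the slice
                for (Set<Integer> t : conflicts) {
                    slice.addAll(t);
                }
                addSlice(threads, slice, gamma); // the merged thread may conflict with others
            }
        }

        private static int overlap(Set<Integer> a, Set<Integer> b) {
            Set<Integer> small = a.size() <= b.size() ? a : b;
            Set<Integer> large = (small == a) ? b : a;
            int count = 0;
            for (Integer x : small) {
                if (large.contains(x)) count++;
            }
            return count;
        }
    }

The recursive call is what makes the combination transitive: merging two conflicting threads can create new conflicts, which are then merged in turn.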
Lemma 3.7 If AddSlice combines two slices, then they must be combined in any set that satisfies the definition of interesting semantic threads.

PROOF. Consider a procedure P and two statements, s1 and s2, and let f(s) denote the forward slice from s. Assume that |f(s1) ∩ f(s2)| > γ. Their individual presence in the final set would clearly violate property 3. Let IST(P, γ) be an arbitrary set that meets our requirements. Because every node must be included in at least one semantic thread and the domain of semantic threads consists of non-empty unions of forward slices, it follows that:

   |f(s1) ∩ f(s2)| > γ ⇒ ∃ T ∈ IST(P, γ). f(s1) ∪ f(s2) ⊆ T

Similarly, if two slices must be combined, then any thread that conflicts with this combined thread must also be combined. AddSlice implements this process of recursive greedy combination exactly.

Theorem 3.8 (Maximality) The set of interesting semantic threads returned by BST is maximal.

PROOF. Define BST(P, γ) to be the set returned by Algorithm 1. Assume that there exists another set, IST′(P, γ), that meets all requirements and is strictly larger than this set:

   BST(P, γ) = {T1, . . . , Tn}
   IST′(P, γ) = {T1′, . . . , Tn′, . . . , Tm′}

Because each set contains no fully subsumed threads, it follows that there exists at least one "head node," h1, . . . , hn and h1′, . . . , hm′, for each semantic thread. By the pigeonhole principle,

   ∃ i, j, k. hi′ ∈ Tk ∧ hj′ ∈ Tk

hi′ and hj′ are associated with unique forward slices that do not fully subsume each other. By Lemma 3.7, because AddSlice combined these two slices (f(hi′) ∪ f(hj′) ⊆ Tk), they must be combined in every set that meets our requirements. Because these slices are separated in IST′(P, γ), IST′(P, γ) does not meet our requirements—a contradiction.

In the worst case, the algorithm's execution time is cubic in the number of nodes of a given procedure's PDG. In practice, this is not a problem. The size of a given PDG is usually small, in the tens of nodes, and the number of non-subsumed forward slices is considerably less.

The problem also scales gracefully in the sense that procedure sizes are generally bounded: larger code bases have more procedures, not necessarily larger procedures. Finally, our empirical results show that the execution time of this algorithm is inconsequential (Section 5).
3.3.3 Empirical Study

This section contains a brief evaluation of the occurrence of weakly connected components and semantic threads in real systems. We evaluated five open source projects: The GIMP, GTK+, MySQL, PostgreSQL, and the Linux kernel (these same projects are analyzed for clones in Section 5). Figure 7 contains the numbers of weakly connected components per project.

We noted that each project contains a significant number of procedures with more than one weakly connected component. This suggests that there are functions in real systems that do in fact perform separate computations. Not all of these computations are necessarily interleaved; they could be sequential, and may not represent new targets for clone detection.

Figure 8 counts the number of procedures that contain non-trivial weakly connected components (γ = 0) and γ = 3 semantic threads.

                             Procedures with n WCCs
               Procedures       1      2      3      4     5+
  GIMP              13337    7255   3498   1255    627    702
  GTK               13284    8773   2967    763    348    433
  MySQL             14408    5419   6134   1450    616    789
  PostgreSQL         9276    4105   3290   1033    335    513
  Linux            136480   60533  52771  13273   5094   4809

Figure 7: Number of weakly connected components.
                              Procs w/ non-triv.   Procs w/ non-triv.
               Procedures         γ = 0 STs            γ = 3 STs
  GIMP              13337              903                 3008
  GTK               13284              697                 2380
  MySQL             14408             1618                 2441
  PostgreSQL         9276             1221                 2267
  Linux            136480            10609                22514

Figure 8: Number of procedures with non-trivial semantic threads.
Non-trivial semantic threads include interleaved sequences of related code that cannot be detected by current scalable clone detection techniques.

Overall, we are able to consider a significant number of new clone candidates. In addition, the concept of γ-overlapping semantic threads allows us to extend our search to a far greater number of potential clone candidates.

4. IMPLEMENTATION

This section describes the implementation of our tool. It consists of three primary components: AST and PDG generation, vector generation, and LSH clustering. To generate syntax trees and dependence graphs, we use Grammatech's CodeSurfer¹, which allows us to analyze both C and C++ code bases. We output this data to a proprietary format using a Scheme script that utilizes Grammatech's API.

¹ http://www.grammatech.com

This raw data is read and used by a Java implementation of DECKARD's vector generation engine. This component also performs the syntactic image mapping and semantic vector generation. The LSH clustering back-end of DECKARD is used without modification.

4.1 Implementation Details

DECKARD's vector generation engine previously operated over parse trees. We reimplemented the algorithm to generate vectors for abstract syntax trees. In the process, we made several core improvements.

In order to efficiently utilize our dual core machines, we made the vector generation phase parallel using Java's concurrency API. At present, we use a procedure as a single unit of work. The tasks are inherently independent: generating the vectors for a procedure does not require any data outside of the procedure. Our parallel Java implementation generated vectors faster than we expected (Section 5).
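As a rough illustration of this parallelization strategy, the sketch below submits one task per procedure to a fixed thread pool via Java's concurrency API; the Procedure and VectorGenerator types are hypothetical placeholders, not the engine's actual interfaces.

    import java.util.*;
    import java.util.concurrent.*;

    final class ParallelVectorGeneration {
        // Hypothetical stand-ins for the real data types.
        interface Procedure { String name(); }
        interface VectorGenerator { List<int[]> vectorsFor(Procedure p); }

        static List<int[]> generateAll(List<Procedure> procedures, VectorGenerator generator)
                throws InterruptedException, ExecutionException {
            ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
            try {
                // One independent task per procedure: no data is shared across tasks.
                List<Future<List<int[]>>> futures = new ArrayList<>();
                for (Procedure p : procedures) {
                    futures.add(pool.submit(() -> generator.vectorsFor(p)));
                }
                List<int[]> all = new ArrayList<>();
                for (Future<List<int[]>> f : futures) {
                    all.addAll(f.get());
                }
                return all;
            } finally {
                pool.shutdown();
            }
        }
    }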
The move to ASTs also posed a challenge. Unlike token-based parse trees, setting the minimum size for a vector was not intuitive. While 30 tokens (DECKARD's default) usually map to about three statements, 30 AST nodes could map to either fewer (less than one) or many more. Instead of judging a vector's size on its magnitude, we utilize the additional semantic information from the AST type hierarchy to judge vectors based on the number of contained statement nodes. We modified both the subtree and sliding window phases to use this new measure.²

² As a usability improvement, we can also set the measure to use lines of code. This can cause issues with multiline statements, though.

One challenge the original DECKARD faced was the coverage of all interesting combinations of statements. The coverage was affected by three parameters: the minimum vector size, the size of the sliding window, and the sliding window's stride, or how often it outputs vectors. Because the sliding window is now sized on statement nodes, we can permanently set the stride to one and output all interesting vectors: each new vector has at least one new statement.

Instead of operating in a single pass over a fixed vector size, we scan several times, starting at the minimum vector size. Each pass increases the sliding window size by a multiplicative factor, which we have set at 1.5. The sliding window phase terminates when the minimum vector size exceeds the size of the procedure. We apply this exponential sliding window when generating vectors over semantic threads as well.
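A minimal sketch of this statement-node-based sliding window follows, with a stride of one and a window that grows by a factor of 1.5 each pass; the Stmt type (a statement subtree with a precomputed characteristic vector) is a hypothetical simplification of the engine's AST representation.

    import java.util.*;

    final class SlidingWindow {
        // Hypothetical: one statement-level AST subtree and its characteristic vector.
        static final class Stmt {
            final int[] vector;
            Stmt(int[] vector) { this.vector = vector; }
        }

        // Emit one merged vector per window position; grow the window 1.5x each pass.
        static List<int[]> generate(List<Stmt> statements, int minStatements) {
            List<int[]> out = new ArrayList<>();
            for (int size = minStatements; size <= statements.size();
                 size = Math.max(size + 1, (int) (size * 1.5))) {              // guarantee progress
                for (int start = 0; start + size <= statements.size(); start++) { // stride of one
                    out.add(mergeVectors(statements.subList(start, start + size)));
                }
            }
            return out;
        }

        private static int[] mergeVectors(List<Stmt> window) {
            int[] sum = new int[window.get(0).vector.length];
            for (Stmt s : window) {
                for (int i = 0; i < sum.length; i++) {
                    sum[i] += s.vector[i];
                }
            }
            return sum;
        }
    }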
4.2 Other Implementation Considerations

Our greatest limitation is that we must have compilable code to retrieve ASTs and PDGs, and only the compiled code is reflected in these structures. At present, we do not have a way to scan code that is deleted by the preprocessor before compilation (other than running multiple builds with different settings). To mitigate this problem for our evaluated projects, we set the configuration to maximize the amount of compiled code whenever possible. For example, our Linux configuration builds every possible kernel option and builds modules for every driver.

The construction of PDGs is not a trivial task, and it presents scaling issues in its own right. CodeSurfer facilitates this process by offering numerous options that control the precision of the PDG build. In order to build PDGs for projects on the million line scale, we were forced to disable precise alias analysis on all builds. This undoubtedly leads to some imprecision in the final graphs, and we could potentially produce multiple semantic threads where only one truly exists. This may cause our tool to miss certain semantic clones, but it does not cause false positives.

With the addition of the semantic vector phase, we have the potential to generate many duplicate vectors. This is not a problem in practice. First, we take advantage of the intraprocedural model and buffer all vectors before printing them, conservatively removing the likely duplicates as they are added. The comparatively small size of a single procedure lets these linear algorithms run quickly.

Second, the LSH back-end is robust against these extra vectors: they merely show up as duplicates in true clone groups or as spurious clone groups. Our post-processing engine quickly removes these.

Third, we exploit the fact that there is a correlation between the number of PDG nodes and the number of AST statement nodes. In the semantic vector phase, we size γ (the overlap constant) to be strictly smaller than our minimum vector size.
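As a simple illustration of the buffering step described above (not the tool's actual data structures), the vectors for a single procedure can be collected in a set keyed on their contents so that exact duplicates are dropped as they are added:

    import java.util.*;

    final class VectorBuffer {
        // Buffer vectors for one procedure; vectors with identical contents are
        // conservatively treated as duplicates and kept only once.
        private final Set<List<Integer>> seen = new LinkedHashSet<>();

        void add(int[] vector) {
            List<Integer> key = new ArrayList<>(vector.length);
            for (int v : vector) key.add(v);
            seen.add(key); // set membership drops exact duplicates
        }

        List<int[]> drain() {
            List<int[]> out = new ArrayList<>();
            for (List<Integer> key : seen) {
                int[] v = new int[key.size()];
                for (int i = 0; i < v.length; i++) v[i] = key.get(i);
                out.add(v);
            }
            seen.clear();
            return out;
        }
    }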
5. EMPIRICAL EVALUATION

We evaluated the effectiveness of our tool on five open source projects: The GIMP, GTK+, MySQL, PostgreSQL, and the Linux kernel. The evaluation was performed against DECKARD, the state-of-the-art tool for detecting syntactic clones. In this section, we also present examples of new classes of detectable clones.

5.1 Experimental Setup

We performed our evaluation on a Fedora 6/64bit workstation with a 2.66GHz Core 2 Duo and 4GB of RAM. We used CodeSurfer 2.1p1 and the Sun Java 1.6.0u1 64-bit server VM. To set up the data for analysis, we first maximized the build configuration of each project. We then allowed CodeSurfer to build the PDGs and dump the information to a file. Figure 9 lists the approximate project sizes and build times for our test targets. The size metric is approximate; all whitespace is counted.

               Size (MLoc)   PDG Build Time   PDG Dump Time
  GIMP             0.78          25m 57s          20m 40s
  GTK              0.88          12m 50s          16m 54s
  MySQL            1.13          16m 56s          12m 36s
  PostgreSQL       0.74           9m 12s          21m 48s
  Linux            7.31         296m  1s         241m  4s

Figure 9: Project sizes and AST/PDG build times.
The PDG builds—especially for the Linux kernel—are particularly expensive. When viewed in the context of other PDG-based detection approaches that use subgraph isomorphism testing, though, the build times are quite reasonable. In addition, this cost is incurred once per project: repeated runs of our tool reuse the same input data.

5.2 Performance

Through our testing, we observed that requiring a minimum statement node count of 8 produces clones similar in size to DECKARD's minimum token count of 50. Figure 10 shows the execution times for both our semantic clone detection algorithm and our tree-based algorithm.

                     AST Only              AST/PDG
                  VGen    Cluster       VGen    Cluster
  GIMP           0m37s     1m11s       0m44s     1m45s
  GTK            0m31s     0m57s       0m34s     0m53s
  MySQL          0m27s     1m16s       0m29s     1m34s
  PostgreSQL     0m40s     1m50s       0m51s     2m30s
  Linux          8m42s     6m1s        9m48s     7m24s

Figure 10: Clone detection times.

In this table, the VGen phase performs all vector generation. For both the tree-only and the tree/PDG modes, this includes the subtree and sliding window phases. In addition, the AST/PDG mode enumerates both the weakly connected components and the γ = 3 semantic threads (Algorithm 1) and generates their respective vectors using the sliding window.

Semantic vector generation adds surprisingly little to the execution overhead. We can attribute this to several factors:

• PDGs are in general significantly (about an order of magnitude) smaller than their equivalent ASTs;
• there are relatively few semantic threads per procedure; and
• our parallel Java implementation allows the utilization of spare CPU cycles that sat idle during the previously IO-bound tree-only phase.

Coverage-wise, our tool locates more clones than its tree-only predecessor. This is expected: we produce exactly the same set of vectors, then augment it with vectors for semantic clones. In many cases, we observed that the average number of cloned lines of code per clone group differs significantly between the tree-only and semantic versions of the analysis. As we increase the minimum number of statement nodes for a given clone group, the clones reported by the semantic analysis tend to cover more lines of code than those reported by the tree-only analysis.

We believe this is because when the minimum vector size is set to smaller values, the larger semantic clones are detected simultaneously with their smaller, contiguous constituent components. While the semantics-based analysis is able to tie these disparate components together, it does not necessarily increase the coverage. When the minimum is raised, these smaller components are no longer detected as clones.

Figure 11 contains our coverage results for the Linux kernel. Line counts are conservative: we count the precise set of lines covered by each clone group. Whitespace is ignored, and multi-line statements are usually counted as a single line.

After each of our experiments, we sampled thirty clone groups at random and verified their contents as clones. When the minimum number of statement nodes was set to 4, we experienced a false positive rate of 2 in 30. These false positives took the form of small (two to three line) snippets of code that incidentally mapped to identical characteristic vectors. When the minimum was set to 8 or more, we found no false positives in these random samples.

This low false positive rate is possibly due to the relatively large magnitude of AST-based vectors: the Linux kernel code contained (after macro expansion) an average of 30 AST nodes per line. These larger vectors create a more unique signature for each line of code that is less likely to incidentally match a non-identical line.

5.3 Qualitative Analysis

The quantitative results show that this technique finds more clones with a larger average size. However, this new class of analysis deserves a closer, qualitative look at the results. Semantic clones are more interesting than simple copied-and-pasted or otherwise structurally identical code. We have observed programming idioms that are pervasive throughout the results.

On a general level, our tool was able to locate semantic clones that were slightly to somewhat larger than their syntactic equivalents, which were also found. The semantic clone often contained the syntactic clone coupled with a limited number of declarations, initializations, or return statements that were otherwise separated from the syntactic clone by unrelated statements. In addition, many semantic clones were subsumed by larger syntactic clones.

We observed cases where our tool was able to locate clone groups that differed only in their use of global locking (Figures 12 and 13). In each case, the tool generated semantic threads for the intrinsic calculation as well as the locking. While the locking pattern itself was too small to be considered a clone candidate, the calculations themselves were matched. In Figure 12, we omitted the third member of the clone group due to space restrictions. This third member also had the locking code in place, and each came from very similar drivers. This lack of locking in one of the three could possibly be indicative of a bug.

    static void zc0301_release_resources(struct zc0301_device* cam)
    {
        DBG(2, "V4L2 device /dev/video%d deregistered", cam->v4ldev->minor);
        video_set_drvdata(cam->v4ldev, NULL);
        video_unregister_device(cam->v4ldev);
        kfree(cam->control_buffer);
    }

    static void sn9c102_release_resources(struct sn9c102_device* cam)
    {
        mutex_lock(&sn9c102_sysfs_lock);

        DBG(2, "V4L2 device /dev/video%d deregistered", cam->v4ldev->minor);
        video_set_drvdata(cam->v4ldev, NULL);
        video_unregister_device(cam->v4ldev);

        mutex_unlock(&sn9c102_sysfs_lock);
        kfree(cam->control_buffer);
    }

Figure 12: Two semantic clones differing only by global locking (Linux).

Our tool also found clones that differed only by debugging statements. One example appears in Figure 14. While we found several examples of this behavior, we do suspect that we missed other cases due to the fact that logging code often displays current state information. This places a data dependency on the logging code and causes its inclusion in a larger semantic thread.

We were able to discover specific data access patterns. One example appears in Figure 15. The pattern consists of the semantic thread created by the union of the forward slices of the underlined variables. Note that unrelated (data-wise, but perhaps temporally related) statements are interleaved through the pattern-forming code. The frequency and complexity of these "patterns" imply that they are possibly prescriptive and not just coincidental. They could then be used as a specification for bug finding.
  (a) Cloned LOC
  Min. Nds.   AST Only    PDG/AST
  4            935203      940497
  8            350804      354079
  14           150694      152484
  22            65275       66489
  32            30039       30367

  (b) Num. of Clone Groups
  Min. Nds.   AST Only    PDG/AST
  4            160934      170544
  8             49003       54761
  14            16114       18918
  22             5692        7439
  32             2295        3446

  (c) Avg. Cloned LOC / Group
  Min. Nds.   AST Only    PDG/AST
  4              13.9        14.1
  8              15.5        16.2
  14             20.8        22.5
  22             26.5        30.1
  32             31.9        38.9

Figure 11: Coverage results for the Linux Kernel.
Figure 13: Another example of semantic clones differing only by global locking (MySQL).
6. DISCUSSION AND RELATED WORK

This work presents the first scalable clone analysis that incorporates semantic information. Komondoor and Horwitz [12] use the dependence graph to find semantically identical code fragments. They also successfully use this technique [11] to identify candidates for automatic procedure extraction. Our work also has the potential to be used in this way: our semantic threads are similar to the subgraphs that they discover and extract. Their work relies heavily on expensive graph algorithms and pairwise comparisons and does not scale like ours: they report analysis times of more than an hour [12] (not including the PDG build) for a 10,000 line program. Our algorithm's scalability would allow us to analyze larger projects that may have a greater number of duplicate code fragments. This scalability also makes possible a more detailed and direct comparison with different techniques and tools, similar to the experiments performed by Bellon et al. [4].

We use a scalable, approximate technique for solving the tree similarity problem. Wahler et al. [19] use frequent itemset mining on serialized representations of ASTs to detect clones. Other techniques [13, 17] generate fingerprints of subtrees and report code with similar fingerprints as clones. Compared with our vector-based clone detection, these techniques are less scalable and more coarse grained.

Most potential applications for purely syntactic clone detection are also feasible for semantics-assisted clone detection, and other, new applications exist as well. In the previous section, we identified code patterns that our tool was able to find with the aid of dependency information. Bruntink et al. studied the capabilities of token-based and AST-based clone detection tools for detecting crosscutting concerns [5]. Our PDG-based clone definition may further facilitate such detection since crosscutting concerns may form large semantic threads. Li and Zhou [15] use frequent itemset mining to identify similar code patterns. Their technique is highly scalable as well, but the mined properties lack temporal information, capturing only association. The patterns we inferred are specific and precise, reflecting direct data flow relationships. However, we found fewer total patterns. We leave for future work the study of our tool's efficacy in mining true specifications and the evaluation of these pattern- and data-based specifications against those found by automaton-learning techniques [1].

Clone detection has also been used to detect design-level similarities. Basit and Jarzabek [2] use CCFinder to detect syntactic clone fragments and later correlate them using data mining techniques. Our semantics-based technique could be used in this way as well, and the ability to detect interleaving patterns might increase the scope of the analysis.

Another potentially interesting application of this work is software plagiarism detection. Current, well-used tools include Moss [18] and JPlag³, but these are too coarse grained to find general sets of code clones. Liu et al. [16] have recently developed the GPLAG tool, which applies subgraph isomorphism testing to PDGs to identify plagiarized code. They note that PDGs are resilient to semantics-preserving modifications like (unrelated) statement insertion, statement reordering, and control replacement. Our technique can easily handle interleaved clones, which are characteristic of code with purposefully inserted garbage statements. We expect that our technique can be straightforwardly extended to handle control replacement as well.

³ http://www.jplag.de

We also handle statement reordering: we eliminate ordering information as a result of our transformation of trees to characteristic vectors. Our scalability provides additional opportunities. For example, our tool could be used to perform open source license compliance checks for proprietary software. In this mode, we could generate a large body of vectors representing common or related open source projects and include them in the clustering phase. We leave for future work the evaluation of our tool's applicability to plagiarism detection.
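To illustrate why statement reordering does not perturb the comparison, recall that a characteristic vector only counts node kinds; two fragments with the same statements in a different order therefore produce identical vectors. The tiny Java example below makes this concrete (the node-kind names are invented for illustration and are not the tool's actual vocabulary):

    import java.util.*;

    final class OrderInsensitivity {
        // Count occurrences of each node kind; the order of the nodes is irrelevant.
        static Map<String, Integer> characteristicVector(List<String> nodeKinds) {
            Map<String, Integer> counts = new TreeMap<>();
            for (String kind : nodeKinds) {
                counts.merge(kind, 1, Integer::sum);
            }
            return counts;
        }

        public static void main(String[] args) {
            List<String> original  = List.of("decl", "assign", "call", "assign", "return");
            List<String> reordered = List.of("assign", "decl", "assign", "return", "call");
            // Both fragments map to the same vector, so clustering treats them alike.
            System.out.println(characteristicVector(original).equals(characteristicVector(reordered)));
        }
    }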
Aside from the scale issues of performing both pairwise comparisons and subgraph isomorphism testing, GPLAG considers only top-level procedures as candidates for clones. We are able to consider a much larger set that includes smaller code fragments. Their definition of code similarity as general subgraph isomorphism is also less refined than ours: two isomorphic subgraphs that cross logical flows of data are not likely to be interesting clones. We more carefully enumerate these flows as semantic threads.

7. CONCLUSIONS AND FUTURE WORK

This paper presents the first scalable algorithm for semantic clone detection based on dependence graphs. We have extended the definition of a code clone to include semantically related code and provided an approximate algorithm for locating these clone pairs. We reduced the difficult graph similarity problem to a tree similarity problem by mapping interesting semantic fragments to their related syntax. We then solved this tree-based problem using a highly scalable technique. We have implemented a practical tool based on our algorithm that scales to millions of lines of code. It finds …
[Figure 14: Partial semantic clones differing only by a debugging statement (Linux). The two partial NFS code listings, which differ only in a dprintk() call, are not reproduced here.]