In upgrading rust-lang/rust to LLVM 5.0 we've discovered that the compiler itself will segfault when compiled with LLVM 5.0. This looks like a misoptimization during the `-jump-threading -correlated-propagation` passes. We used `bugpoint` to shrink the program from ~1 million lines of IR to just a few:

; ModuleID = 'small.ll'
source_filename = "bugpoint-output-834ffe1.bc"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

declare i8* @foo()

declare i32 @rust_eh_personality() unnamed_addr

; Function Attrs: nounwind
declare void @llvm.assume(i1) #0

define void @bar() personality i32 ()* @rust_eh_personality {
bb9:
  %t9 = invoke i8* @foo()
          to label %good unwind label %bad

bad:                                              ; preds = %bb9
  %t10 = landingpad { i8*, i32 }
          cleanup
  resume { i8*, i32 } %t10

good:                                             ; preds = %bb9
  %t11 = icmp ne i8* %t9, null
  %t12 = zext i1 %t11 to i64
  %cond = icmp eq i64 %t12, 1
  br i1 %cond, label %if_true, label %done

if_true:                                          ; preds = %good
  call void @llvm.assume(i1 %t11)
  br label %done

done:                                             ; preds = %if_true, %good
  ret void
}

attributes #0 = { nounwind }

This was tested locally via `opt -S -jump-threading -correlated-propagation foo.ll`, and the results from before/after were diffed, yielding:

--- before.ll	2017-07-24 15:52:48.364787248 -0700
+++ after.ll	2017-07-24 15:53:34.701537275 -0700
@@ -21,13 +21,12 @@
   resume { i8*, i32 } %t10
 
 good:                                             ; preds = %bb9
-  %t11 = icmp ne i8* %t9, null
-  %t12 = zext i1 %t11 to i64
+  %t12 = zext i1 true to i64
   %cond = icmp eq i64 %t12, 1
   br i1 %cond, label %if_true, label %done
 
 if_true:                                          ; preds = %good
-  call void @llvm.assume(i1 %t11)
+  call void @llvm.assume(i1 true)
   br label %done
 
 done:                                             ; preds = %if_true, %good

I think the transformation here may be invalid: `%t11` is inferred to be always `true`, but that's not guaranteed for the value coming out of `@foo`.
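To make the miscompile concrete, here is a small Python model of the `good`/`if_true` blocks (a sketch, not the actual IR semantics): the `assert` stands in for `@llvm.assume`, which only constrains executions that actually reach it, and `None` stands in for a null pointer. Folding `%t11` to `true` throughout the block changes observable behavior whenever `@foo` returns null.

```python
def original(t9):
    """Models the 'good' block before the transformation."""
    t11 = t9 is not None          # %t11 = icmp ne i8* %t9, null
    t12 = int(t11)                # %t12 = zext i1 %t11 to i64
    cond = t12 == 1               # %cond = icmp eq i64 %t12, 1
    if cond:                      # br i1 %cond, label %if_true, label %done
        assert t11                # call void @llvm.assume(i1 %t11)
    return cond

def after_bad_fold(t9):
    """Models the block after %t11 was (incorrectly) folded to true."""
    t12 = int(True)               # %t12 = zext i1 true to i64
    cond = t12 == 1
    if cond:
        assert True               # @llvm.assume(i1 true) is a no-op
    return cond

# When @foo returns a non-null pointer, both versions agree...
assert original(object()) == after_bad_fold(object()) == True
# ...but when it returns null, the fold changes observable behavior.
assert original(None) is False
assert after_bad_fold(None) is True
```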
Oh I should mention that this is against the current `release_50` branch of LLVM as well!
The transformation is being performed because %t11 is passed to @llvm.assume, which according to the semantics [1] allows the optimizer to assume the conditional is _always_ true.

It seems your code expected that @llvm.assume, when applied to a conditional, indicates that the conditional may be assumed to be true only at the point of the intrinsic.

[1] https://llvm.org/docs/LangRef.html#llvm-assume-intrinsic
(In reply to Kavon Farvardin from comment #2)
> The transformation is being performed because %t11 is passed to
> @llvm.assume, which according to the semantics [1], allows the optimizer to
> assume the conditional is _always_ true.
>
> It seems your code expected that @llvm.assume, when applied to a
> conditional, indicates that the conditional may be assumed to be true only
> at the point of the intrinsic.
>
> [1] https://llvm.org/docs/LangRef.html#llvm-assume-intrinsic

I haven't looked at this closely, just bisected to https://reviews.llvm.org/D30352, but I'm afraid there's a bug hidden somewhere. In fact, if you run

./opt -jump-threading | ./opt -correlated-propagation

you get a different result (the code is not optimized at all), which feels wrong. I think we're somehow getting a wrong value from LVI, and as JumpThreading caches LVI, this only happens when you run the two passes in sequence.
@Kavon: Mh, are you sure? That's not how I read the semantics section of assume. To quote:

> The intrinsic allows the optimizer to assume that the provided condition is always true **whenever the control flow reaches the intrinsic call**.

Note the **highlighted** part.
(In reply to Tim Neumann from comment #4)
> @Kavon: Mh, are you sure? That's not how I read the semantics section of
> assume, to quote:
>
> > The intrinsic allows the optimizer to assume that the provided condition is always true **whenever the control flow reaches the intrinsic call**.
>
> Note the **highlighted** part.

Oops, I read the wrong subsection... the transformation is wrong.
Again, I suspect it's a bug in LVI. Please note that if you remove the caching of LVI from JT (i.e. you comment out the addPreserved<JumpThreading...> line) you get the correct result.

When we ask LVI what it thinks about the value produced by the invoke, it answers that it's never null (notconstant in LVI lingo means that the constant *can't* have that value):

LVI Getting value   %t9 = invoke i8* @foo()
          to label %good unwind label %bad at 't11'
  Result = notconstant<i8* null>

while it should be

  Result = overdefined

as we can't necessarily conclude anything about `%t9`, for a couple of reasons:
1) This is an intraprocedural analysis.
2) We don't know what `@foo` will do, so inferring a return value out of thin air seems just wrong.
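As a toy illustration of this reasoning (the function name here is hypothetical, not LLVM's actual LVI implementation): for the result of an opaque external call, the only sound answer an intraprocedural analysis can give is overdefined, unless an assumption is valid at the exact query point. The bug is that LVI acted as if the assume applied where it did not.

```python
def lvi_answer_for_call_result(assume_applies_at_query_point):
    """Toy model of what LVI may conclude about %t9 = invoke i8* @foo().

    An intraprocedural analysis knows nothing about an external call's
    return value, so only an assumption that is valid *at the query
    point* may refine the answer beyond overdefined."""
    if assume_applies_at_query_point:
        # Refined by 'assume(icmp ne %t9, null)': %t9 provably isn't null.
        return "notconstant<i8* null>"
    return "overdefined"

# At 't11' in %good, the assume (down in %if_true) does not apply:
print(lvi_answer_for_call_result(False))  # overdefined  <- the sound answer
# The stale cached state made LVI behave as if it applied:
print(lvi_answer_for_call_result(True))   # notconstant<i8* null>  <- the bug
```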
This is a hilarious bug, and I think I got to the bottom of it.

So, the reason why LVI gives us that answer is that it thinks the assumption is valid for the context we're in (which, of course, it is not).

If we run only correlated-value-propagation (or disable the caching of LVI), we don't have a DomTree available, so `isValidAssumeForContext()` falls back to comparing the parent blocks of the two instructions:

llvm::isValidAssumeForContext (Inv=0x5b77490, CxtI=0x5b76ac0, DT=0x0) at ../lib/Analysis/ValueTracking.cpp:475
475       if (DT) {
(gdb) n
485       if (Inv->getParent() != CxtI->getParent())
(gdb) n
486         return false;

If we cache LVI, we *think* we have a DomTree, so here's how the invocation of `llvm::isValidAssumeForContext()` looks:

(gdb) s
llvm::isValidAssumeForContext (Inv=0x5b77490, CxtI=0x5b76ac0, DT=0x5b977e0) at ../lib/Analysis/ValueTracking.cpp:475
475       if (DT) {
(gdb) n
476         if (DT->dominates(Inv, CxtI))
(gdb) n
477           return true;

I was very confused because I didn't really believe in this dominance relation, so I ran verify() on the DT and boom, I got a crash. So we happen to get an answer, but the underlying dominator tree is completely bogus.

Running with some more debugging reveals that:

[2017-07-25 11:22:56.638141270] 0x76c8d00 Executing Pass 'Function Pass Manager' on Module 'x.ll'...
[2017-07-25 11:22:56.638220852] 0x76e8860 Executing Pass 'Dominator Tree Construction' on Function 'bar'...
[2017-07-25 11:22:56.638297844] 0x76e8860 Executing Pass 'Basic Alias Analysis (stateless AA impl)' on Function 'bar'...
0x76e8710 Required Analyses: Assumption Cache Tracker, Dominator Tree Construction, Target Library Information
[2017-07-25 11:22:56.638354993] 0x76e8860 Executing Pass 'Function Alias Analysis Results' on Function 'bar'...
0x76e8660 Required Analyses: Basic Alias Analysis (stateless AA impl), Target Library Information
0x76e8660 Used Analyses: Scoped NoAlias Alias Analysis, Type-Based Alias Analysis, ObjC-ARC-Based Alias Analysis, Globals Alias Analysis, ScalarEvolution-based Alias Analysis, Inclusion-Based CFL Alias Analysis, Unification-Based CFL Alias Analysis
[2017-07-25 11:22:56.638427811] 0x76e8860 Executing Pass 'Lazy Value Information Analysis' on Function 'bar'...
0x76e96d0 Required Analyses: Assumption Cache Tracker, Target Library Information
[2017-07-25 11:22:56.638462642] 0x76e8860 Executing Pass 'Jump Threading' on Function 'bar'...
0x76e8410 Required Analyses: Function Alias Analysis Results, Lazy Value Information Analysis, Target Library Information
0x76e8410 Preserved Analyses: Lazy Value Information Analysis, Globals Alias Analysis
-- 'Jump Threading' is not preserving 'Function Alias Analysis Results'
-- 'Jump Threading' is not preserving 'Basic Alias Analysis (stateless AA impl)'
-- 'Jump Threading' is not preserving 'Dominator Tree Construction'
-*- 'Jump Threading' is the last user of following pass instances. Free these instances
[2017-07-25 11:22:56.638719401] 0x76e8860 Freeing Pass 'Jump Threading' on Function 'bar'...
[2017-07-25 11:22:56.638737039] 0x76e8860 Freeing Pass 'Function Alias Analysis Results' on Function 'bar'...
[2017-07-25 11:22:56.638751239] 0x76e8860 Freeing Pass 'Basic Alias Analysis (stateless AA impl)' on Function 'bar'...
[2017-07-25 11:22:56.638765276] 0x76e8860 Freeing Pass 'Dominator Tree Construction' on Function 'bar'...
[2017-07-25 11:22:56.638782278] 0x76e8860 Executing Pass 'Value Propagation' on Function 'bar'...
0x76e9780 Required Analyses: Lazy Value Information Analysis
[2017-07-25 11:22:56.638881102] 0x76e8860 Made Modification 'Value Propagation' on Function 'bar'...
0x76e9780 Preserved Analyses: Globals Alias Analysis
-- 'Value Propagation' is not preserving 'Lazy Value Information Analysis'

So JumpThreading claims it preserves LVI but not DomTree; we free DomTree, yet the cached LVI still references it, so it's essentially accessing freed memory. Trying to work on a fix now.
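To see why the dominance query should have said "no", here is a toy model (hypothetical helper names, not LLVM's API) of `isValidAssumeForContext` over the CFG from the reduced test case: with a correct dominator tree, the assume in `if_true` does not dominate the query point in `good`, so the no-DT fallback and the DT path agree that the assume is not valid there.

```python
# CFG from the reduced test case: bb9 -> {good, bad}, good -> {if_true, done},
# if_true -> done.  Immediate dominators, computed by hand for this tiny CFG:
IDOM = {"bad": "bb9", "good": "bb9", "if_true": "good", "done": "good"}

def dominates(a, b):
    """True if block a dominates block b (walk up the idom chain)."""
    while b is not None:
        if a == b:
            return True
        b = IDOM.get(b)
    return False

def is_valid_assume_for_context(assume_bb, ctx_bb, have_domtree):
    """Toy model of llvm::isValidAssumeForContext's two paths."""
    if have_domtree:
        return dominates(assume_bb, ctx_bb)
    # No DT: conservative fallback, same-block only (LLVM additionally
    # scans the instructions in between; elided here).
    return assume_bb == ctx_bb

# Query: is assume(%t11), sitting in %if_true, valid at %t11's
# definition in %good?
print(is_valid_assume_for_context("if_true", "good", have_domtree=False))  # False
print(is_valid_assume_for_context("if_true", "good", have_domtree=True))   # False
# Both paths agree given a *correct* DomTree; only the freed, bogus one
# could have answered True here.
```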
Thanks so much for the investigation and quick diagnosis!
Anna, this looks like it may interfere a bit with your "print LVI cache after transforms" work, so I'd love your opinion on it.

After the analysis, it's actually clear that LVI isn't really preserved by JT, because all the analyses it depends on (and holds references to) get freed if JT is their last user (as happens here).

So, I see several options:

1) Drop the preservation of LVI from JT (which, as far as I can tell, is the only in-tree pass preserving that analysis). This may have a compile-time impact, but I'd rather live with that than query broken analyses. FWIW, as far as I can tell Chandler implemented in the new pass manager a mechanism to invalidate LVI if one of its dependencies gets invalidated, and in the new pass manager JT currently doesn't preserve LVI, so this is a legacy-PM-only problem.

2) Teach JT to preserve all the dependencies of LVI (i.e. DomTree, AA & AssumptionCache). I haven't estimated how much work that is, but my gut feeling says it won't be trivial. This also has a maintainability drawback: every new pass that wants to preserve LVI would have to be explicitly taught to preserve all its dependencies as well, otherwise the preservation is useless.

Ideally, longer term we might consider moving away from analyses holding references to other analyses, as they're quite tricky to handle correctly (as this PR shows :)

What do you think? CC:ing Chandler as well, as he might have thoughts.
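The underlying lifetime problem can be sketched with a toy model of the legacy pass manager's bookkeeping (all class and field names here are illustrative, not LLVM's actual types): an analysis that holds references to other analyses becomes unsafe to query as soon as any of its dependencies is freed, even if the analysis itself was marked preserved.

```python
class Analysis:
    def __init__(self, name, deps=()):
        self.name = name
        self.deps = list(deps)   # analyses this one holds references to
        self.freed = False

    def is_safe(self):
        """A cached analysis is only safe to query if everything it
        holds references to is still alive."""
        return not self.freed and all(d.is_safe() for d in self.deps)

# LVI holds references to DomTree (among others).
domtree = Analysis("Dominator Tree Construction")
lvi = Analysis("Lazy Value Information Analysis", deps=[domtree])

# JumpThreading declares: preserved = {LVI}, so the PM keeps LVI cached,
# but frees DomTree because JT was its last user...
domtree.freed = True

# ...leaving the "preserved" LVI cache referencing freed state. Querying
# it (as Value Propagation does next) is a use-after-free:
print(lvi.is_safe())   # False
```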
(In reply to Davide Italiano from comment #9)
> Anna, this looks like may interfere a bit with yours "print LVI cache after
> transforms" work, therefore I'd love your opinion on it.
>
> After the analysis, it's actually clear that LVI isn't really preserved by
> JT, because all the analyses which depend on it (and LVI holds as
> references) gets freed if JT is the last user (as it happens here).
>
> So, I see several options:
> 1) Drop the preservation of LVI from JT (which, as far as I can tell, is the
> only in-tree pass preserving that analysis). This may have a compile-time
> impact, but I'd rather leave with it than querying broken analyses.
> FWIW, as far as I can tell Chandler implemented in the new pass manager a
> mechanism to invalidate LVI if one of the dependencies gets invalidated & in
> the new pass manager JT currently doesn't preserve LVI, so that is a legacy
> PM only problem.

Hi Davide,

Thanks for the analysis on this bug; I agree that dropping preservation of LVI in JT is probably the safest (and quickest) way forward. This would also fix another class of bugs: the LVI cache from JT being used by CVP.

You're right that it would interfere with the "printing of LVI after transforms" work. I had initially added it precisely for LVI caching bugs in JT, which are themselves tricky :)

However, the printing also enables us to view the LVI analysis results (for example: -lazy-value-info -print-lvi), apart from what happens after a transform. That functionality would remain intact, and useful, even when we move away from the caching model for LVI or no longer preserve LVI after JT.
(In reply to Anna from comment #10)
> Hi Davide,
> Thanks for the analysis on this bug and I agree that dropping preservation
> of LVI in JT is probably the safest (and quickest) way forward. This would
> also fix another class of bugs: LVI cache from JT that's used by CVP.
>
> You're right that it would interfere with the "printing of LVI after
> transforms" work. I had initially added it precisely for LVI caching bugs in
> JT, which are itself tricky :)
>
> However, the printing also enables us to view the LVI analysis results (for
> example: -lazy-value-info -print-lvi), apart from what happens after a
> transform. That functionality would remain intact, and useful even when we
> move away from the caching model for LVI, or when we no longer preserve LVI
> after JT.

Thanks. I agree this functionality is very useful and should stay. After the change, one of the tests fails (lvi-after-jumpthreading.ll) because the LVI cache has already been purged by the time we call printLVI. Can you think of an alternative way to test this (as there would be no coverage of the feature without that test)?

Thanks!

--
Davide
Yes, that test will fail. I thought about the printer pass a bit more. There are 3 kinds of bugs we can catch with the LVI printer:
1. missed cache invalidation entries of LVI between transforms
2. missed cache invalidation entries of LVI during a transform
3. incorrectly calculated LVI results during a transform

#1 will no longer occur once we avoid preserving LVI in JT (since there are no other in-tree passes that preserve LVI). I had written and used the pass precisely for catching bugs of kind #1.

To catch #2 and #3, we will need to print the LVI *before* purging the cache in the JT (and/or CVP) passes. Can we require the printer pass as part of the finalization of the JT pass? I'm not sure how else we can test #2 and #3 (without which we'll have no coverage for, or purpose of, the printer pass in tree).
(In reply to Anna from comment #12)
> Yes, that test will fail. I thought about the printer pass a bit more.
> There are 3 kinds of bugs we can catch with the LVI printer:
> 1. missed cache invalidation entries of LVI between transforms
> 2. missed cache invalidation entries of LVI during a transform.
> 3. incorrectly calculated LVI results during a transform
>
> #1 will no longer occur once we avoid preservation of LVI in JT (since
> there's no in-tree passes that preserve LVI). I had written and used the
> pass for catching bugs of kind #1.
> To get #2 and #3, we will need to print the LVI *before* purging the cache
> in JT (and/or CVP) passes. Can we require the printer pass as part of
> finalization of the JT pass? I'm not sure how else we can test #2 and #3
> (without which we'll have no coverage or purpose of the printer pass in
> tree).

What about adding a cl::opt that calls the printer pass after JT runs (but before freeing the state)?
(In reply to Davide Italiano from comment #13)
> What about adding a cl::opt that calls the printer pass after JT runs (but
> before freeing the state?)

Ah, that should work as well, and it's definitely much simpler :)
Anna, the first patch is here:
https://bugs.llvm.org/show_bug.cgi?id=33917

I'd love to get this included in 5.0 (otherwise the Rust folks will have to carry local patches around).

Thanks!

--
Davide
(In reply to Davide Italiano from comment #15)
> Anna, the first patch is here
> https://bugs.llvm.org/show_bug.cgi?id=33917
>
> I'd love to get this included in 5.0 (otherwise the Rust folks will have to
> carry local patches around).

Done. Thanks!
r309353/r309355. Hans, we might consider backporting this to 5.0 as it's blocking the Rust people.
Awesome, thanks so much again for the prompt fix and for helping to track this down!
(In reply to Davide Italiano from comment #17)
> r309353/r309355.
>
> Hans, we might consider backporting this to 5.0 as it's blocking the Rust
> people.

Merged in r309439.