[GSoC 2024] Remove undefined behavior from tests

nlopes · February 26, 2024, 9:22pm

Many of LLVM’s unit tests have been reduced automatically from larger tests. Previous-generation reduction tools used undef and poison as placeholders everywhere and also introduced undefined behavior (UB). Tests with UB are undesirable because 1) they are fragile, since a future compiler may optimize more aggressively and break the test, and 2) they break translation validation tools such as Alive2 (since it’s correct to translate a function that is always UB into anything).
The major steps include:

Replace known patterns such as branch on undef/poison, memory accesses with invalid pointers, etc., with non-UB patterns.
Use Alive2 to detect further patterns (by searching for tests that are always UB).
Report any LLVM bug found by Alive2 that is exposed when removing UB.

Expected result: The majority of LLVM’s unit tests will be free of UB.

Skills: Experience with scripting (Python or PHP) is required. Experience with regular expressions is encouraged.

Project size: Either medium or large.

Difficulty: Medium

miguelraz · February 27, 2024, 1:40am

Hello Nuno!
This seems like cool stuff!

So far, I’ve found this:

~/o/l/l/test (main)> rg "phi(.)*undef, " | wc -l
2091
~/o/l/l/test (main)> rg "undef, " | wc -l
147419
~/o/l/l/test (main)> rg "poison, " | wc -l
70385
~/o/l/l/test (main)> rg "phi(.)*poison," | wc -l
676

Quick questions:

Should the ultimate goal just be “no poison or undef should be part of the test suite” or rather “be clever about your searching so that no use of poison or undef that leads to UB should be part of the test suite”?
Are there known “quick fixes” for some of the easier patterns to detect?

Thanks for putting this together; it’s great to see this initiative gain some traction.

ChuanqiXu · February 27, 2024, 2:50am

nit: the “unit test” here actually means “lit test”, right?

nlopes · February 27, 2024, 7:59am

No, the goal is not to eliminate all uses of undef and poison, just the ones that render functions always UB. The text above mentions the known patterns.
Furthermore, uses of undef should be replaced with poison whenever it makes sense.

Yes, e.g., ‘br undef’ → ‘br %a_new_fn_argument’

nlopes · February 27, 2024, 8:00am

Anything in llvm/test.

pzh97 · March 21, 2024, 12:36am

Hi! As previously discussed, the task is to identify tests that cause undefined behavior (UB) in functions. So, I was wondering if we are simply removing tests containing the following patterns using regular expressions:

div v, undef / poison / 0
br undef / poison
load ptr undef / poison
store val, ptr undef / poison
memcpy ptr undef / poison
getelementptr undef/poison
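
The patterns listed above could be searched for with a small script like the following sketch. The regular expressions are illustrative assumptions, not a vetted list: they only flag candidate lines, they do not handle address-space qualifiers or parameter attributes between the pointer type and the operand, and whether a flagged function really is always UB would still have to be confirmed (for example with Alive2). The names `PATTERNS` and `find_ub_candidates` are hypothetical.

```python
import re

# A pointer operand that is undef or poison, in opaque ("ptr") or
# typed ("i32*") pointer syntax, appearing after a comma.
PTR_UB = r',\s*(?:ptr|\S+\*)\s+(?:undef|poison)\b'

PATTERNS = {
    'div': re.compile(r'\b[us]div\b.*,\s*(?:undef|poison|0)\b'),
    'br': re.compile(r'\bbr i1 (?:undef|poison)\b'),
    'load': re.compile(r'\bload\b.*' + PTR_UB),
    'store': re.compile(r'\bstore\b.*' + PTR_UB),
    'memcpy': re.compile(
        r'@llvm\.memcpy\b.*(?:\bptr\b|\*)[^,)]*\b(?:undef|poison)\b'),
    'getelementptr': re.compile(r'\bgetelementptr\b.*' + PTR_UB),
}


def find_ub_candidates(ir):
    """Return (line number, pattern name) pairs for suspicious lines."""
    hits = []
    for lineno, line in enumerate(ir.splitlines(), start=1):
        # Drop IR comments so CHECK lines are not reported.
        code = line.split(';', 1)[0]
        for name, rx in PATTERNS.items():
            if rx.search(code):
                hits.append((lineno, name))
    return hits
```

Note that `store ptr undef, ptr %p` is deliberately not flagged: there the undef is the stored value, not the address.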

I was also curious about how we could theoretically define which patterns might result in UB (in a general sense). And are there any suggested references?

Thanks!

nlopes · March 21, 2024, 8:43am

There is a theoretical definition of UB, yes. There are many cases, so the easiest is to use a tool like Alive2 to find all those cases automatically.

If you are curious about the cases, you can start by searching for “undefined” in the LLVM Language Reference Manual (LLVM 19.0.0git documentation).

azmat-y · March 25, 2024, 6:04am

Hi,
I am interested in this project. I have scanned through the Language Reference and read some articles on Alive2, such as “Alive2 Part 1: Introduction”. Are there other references or documented fixes that you would recommend reading?

cascades-sjtu · March 25, 2024, 7:26am

Hi, I’m interested in this project. As you mentioned, the project needs experience with Python scripting. Is there a recommended Python binding for LLVM? What about llvmlite and llvmpy?

nlopes · March 25, 2024, 7:59am

You can read this: https://web.ist.utl.pt/nuno.lopes/pubs/undef-pldi17.pdf

nlopes · March 25, 2024, 8:01am

I was thinking about regular expressions, not reading/writing IR through some bindings, as that would cause too many changes in the tests.
Also, it doesn’t have to be Python. PHP is fine as well (and much faster, FWIW).

cascades-sjtu · March 27, 2024, 8:04am

I see, thanks for your answer

azmat-y · March 28, 2024, 6:31am

Hello again,

This is a question related to the proposal and Alive2. In the proposal, should I document all the undefined behavior listed in the LangRef that occurs in llvm/test, along with the fixes that I can find?

Also, I tried running Alive2 on the InstCombine transforms:

$LLVM2_BUILD/bin/llvm-lit -s -Dopt=$ALIVE2_HOME/alive2/build/opt-alive.sh $LLVM2_HOME/llvm/test/Transforms/InstCombine

It ran for 2 hours without terminating.
Is this due to processing power or is there something else at play?

My CPU Specifications
i3 10th generation, 2 cores 4 threads

nlopes · March 28, 2024, 9:20am

Alive2 takes ~4 hours to run on an 8-core server (for the whole Transforms dir). I don’t have stats for InstCombine alone.
An i3 with 2 cores is going to take a whole day, I guess.
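
As a rough back-of-the-envelope check of that guess, the sketch below scales the 8-core figure by core count. It assumes perfectly linear scaling and equal per-core speed, neither of which is stated in the thread; laptop cores are typically slower than server cores, so the real figure is likely higher. The function name `estimate_hours` is hypothetical.

```python
def estimate_hours(ref_hours, ref_cores, cores):
    """Scale wall-clock time, assuming linear scaling with core count."""
    return ref_hours * ref_cores / cores


# ~4 hours on 8 cores for the whole Transforms dir, rescaled to 2 cores.
print(estimate_hours(4, 8, 2))  # 16.0
```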

The proposal doesn’t need to list all UB cases. We know what they are. You should write about the strategy to find & fix all the cases.

kevwjin · March 29, 2024, 1:18am

Hello, I finished reading through the Alive2 Part 1: Introduction that @azmat-y linked. The article features a bug found using Alive2, and I’m having trouble interpreting its Alive2 output. Taking %y = 0 as suggested by Alive2, I manually traced through and wrote down the (incorrect?) range for each value, also arriving at the conclusion that the transformation doesn’t verify:

define i1 @cv0_GOOD(i8 %y) {
%0:
  %x = call i8 @gen8()          (-128, 127) // assuming x is signed
  %tmp0 = lshr i8 255, %y       (255) // assuming tmp0 is unsigned
  %tmp1 = and i8 %tmp0, %x      (-128, 127) // bitwise `and` returns x
  %ret = icmp sgt i8 %x, %tmp1  (1) // x == tmp1 is always true
  ret i1 %ret
}
=>
define i1 @cv0_GOOD(i8 %y) {
%0:
  %x = call i8 @gen8()          (-128, 127) // assuming x is signed
  %tmp0 = lshr i8 255, %y       (255) // assuming tmp0 is unsigned
  %1 = icmp sgt i8 %x, %tmp0    (0) // anything in (-128, 127) isn't > than 255
  ret i1 %1
}
// %ret = 1 and %1 = 0 respectively, so the transformation doesn't verify

However, the Alive2 output is as follows:

Example:
i8 %y = #x00 (0)

Source:
i8 %x = #x7f (127, -129)
i8 %tmp0 = #xff (255, -1)
i8 %tmp1 = #x7f (127, -129)
i1 %ret = #x0 (0)

Target:
i8 %x = #x7f (127, -129)
i8 %tmp0 = #xff (255, -1)
i1 %1 = #x1 (1)
Source value: #x0 (0)
Target value: #x1 (1)

It arrives at the same conclusion, with %ret = 1 and %1 = 0, but I’m not sure what its parentheses notation represents. Any clarification would be appreciated!

nlopes · March 29, 2024, 8:54am

The parentheses are not ranges. They just show the signed & unsigned representation of the values.
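
To illustrate the two readings of one bit pattern, here is a small sketch (the helper name `i8_views` is hypothetical). An 8-bit pattern has an unsigned reading in 0..255 and a two's complement signed reading in -128..127, so a value like -129 cannot be a valid reading of an i8.

```python
def i8_views(bits):
    """Return the (unsigned, signed) readings of an 8-bit pattern."""
    unsigned = bits & 0xFF
    # Two's complement: patterns with the top bit set are negative.
    signed = unsigned - 256 if unsigned >= 128 else unsigned
    return unsigned, signed


print(i8_views(0xFF))  # (255, -1)
print(i8_views(0x7F))  # (127, 127)
```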

kevwjin · March 29, 2024, 3:41pm

Since they are not ranges, how did Alive2 determine the values 127 and -129 for %x? The -129 signed value is particularly confusing to me since it requires 9 bits to represent in two’s complement form, but i8 can only store 8-bit signed values.

nlopes · March 29, 2024, 4:02pm

I can’t reproduce what you’re observing. See here: Compiler Explorer
Maybe you’re using an old version of Alive2 or Z3?

If you are able to make the online version produce such a wrong result, please file a bug report.

kevwjin · March 29, 2024, 5:12pm

Sorry, I believe I wasn’t clear in my original post. I did not produce the output; I’m just reading the LLVM Bugzilla report, which has your comment from 2019:

Nuno Lopes 2019-06-08 15:42:02 PDT
The following instcombine rewrite seems incorrect (test/Transforms/InstCombine/canonicalize-constant-low-bit-mask-and-icmp-sgt-to-icmp-sgt.ll):

define i1 @cv0_GOOD(i8 %y) {
%0:
  %x = call i8 @gen8()
  %tmp0 = lshr i8 255, %y
  %tmp1 = and i8 %tmp0, %x
  %ret = icmp sgt i8 %x, %tmp1
  ret i1 %ret
}
=>
define i1 @cv0_GOOD(i8 %y) {
%0:
  %x = call i8 @gen8()
  %tmp0 = lshr i8 255, %y
  %1 = icmp sgt i8 %x, %tmp0
  ret i1 %1
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
i8 %y = #x00 (0)

Source:
i8 %x = #x7f (127, -129)
i8 %tmp0 = #xff (255, -1)
i8 %tmp1 = #x7f (127, -129)
i1 %ret = #x0 (0)

Target:
i8 %x = #x7f (127, -129)
i8 %tmp0 = #xff (255, -1)
i1 %1 = #x1 (1)
Source value: #x0 (0)
Target value: #x1 (1)

So as suggested, perhaps the wrong result is a consequence of running an old version of Alive2 or Z3?

nlopes · March 29, 2024, 5:18pm

Probably.
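
Regardless of how the values are printed, the counterexample from the bug report (%y = 0, %x = #x7f) can be replayed by hand. The sketch below models the two functions with plain Python integers. It is an approximation under assumptions: %x (the result of @gen8()) is treated as a free input, and shift amounts of 8 or more, which yield poison in LLVM, are not modeled. The helper names are hypothetical.

```python
MASK = 0xFF  # i8 values are kept as 8-bit patterns


def to_signed(v):
    """Two's complement signed reading of an 8-bit pattern."""
    v &= MASK
    return v - 256 if v >= 128 else v


def source(x, y):
    tmp0 = (0xFF >> y) & MASK                # lshr i8 255, %y
    tmp1 = tmp0 & x                          # and i8 %tmp0, %x
    return to_signed(x) > to_signed(tmp1)    # icmp sgt i8 %x, %tmp1


def target(x, y):
    tmp0 = (0xFF >> y) & MASK                # lshr i8 255, %y
    return to_signed(x) > to_signed(tmp0)    # icmp sgt i8 %x, %tmp0


print(source(0x7F, 0), target(0x7F, 0))  # False True
```

With %y = 0, %tmp0 is #xff, which is -1 as a signed i8. The source compares 127 with 127 and returns 0; the target compares 127 with -1 and returns 1, matching the "Source value" and "Target value" lines in the report.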
