[GSoC 2024] Remove undefined behavior from tests

Many of LLVM’s unit tests have been reduced automatically from larger tests. Previous-generation reduction tools used undef and poison as placeholders everywhere and also introduced undefined behavior (UB). Tests with UB are undesirable because 1) they are fragile, since a future compiler may optimize more aggressively and break the test, and 2) they break translation validation tools such as Alive2 (since it’s correct to translate a function that is always UB into anything).
The major steps include:

  1. Replace known patterns such as branch on undef/poison, memory accesses with invalid pointers, etc., with non-UB patterns.
  2. Use Alive2 to detect further patterns (by searching for tests that are always UB).
  3. Report any LLVM bug found by Alive2 that is exposed when removing UB.

Expected result: The majority of LLVM’s unit tests will be free of UB.

Skills: Experience with scripting (Python or PHP) is required. Experience with regular expressions is encouraged.

Project size: Either medium or large.

Difficulty: Medium

Hello Nuno!
This seems like cool stuff!

So far, I’ve found this:

~/o/l/l/test (main)> rg "phi(.)*undef, " | wc -l
2091
~/o/l/l/test (main)> rg "undef, " | wc -l
147419
~/o/l/l/test (main)> rg "poison, " | wc -l
70385
~/o/l/l/test (main)> rg "phi(.)*poison," | wc -l
676

Quick questions:

  1. Should the ultimate goal just be “no poison or undef should be part of the test suite” or rather “be clever about your searching so that no use of poison or undef that leads to UB should be part of the test suite”?
  2. Are there known “quick fixes” for some of the easier patterns to detect?

Thanks for putting this together; it’s great to see this initiative gain some traction.

nit: the “unit test” here actually means “lit test”, right?

No, the goal is not to eliminate all uses of undef and poison, just the ones that render functions always UB. The text above mentions the known patterns.
Furthermore, uses of undef should be replaced with poison whenever it makes sense.

Yes, e.g., ‘br undef’ → ‘br %a_new_fn_argument’
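To make the quick fix above concrete, here is a minimal regex-based sketch in Python. It is only an illustration, not an existing LLVM script: the function name `fix_function` and the default argument name `%c` are hypothetical, and it assumes a simple parameter list without nested parentheses and that `%c` is not already used in the function.

```python
import re

# A conditional branch whose condition is undef or poison.
BR_UB = re.compile(r"\bbr i1 (?:undef|poison)\b")
# A function definition header; group 2 is the parameter list.
# Assumption: the parameter list contains no nested parentheses.
DEFINE = re.compile(r"^(define\b[^@]*@[\w.$-]+\()([^)]*)(\).*)$")

def fix_function(lines, arg_name="%c"):
    """Make branches on undef/poison branch on a new i1 argument instead."""
    if not any(BR_UB.search(line) for line in lines):
        return list(lines)  # nothing to fix, leave the function untouched
    out = []
    for line in lines:
        m = DEFINE.match(line)
        if m:
            # Append the new i1 parameter to the existing parameter list.
            params = m.group(2).strip()
            new_params = f"{params}, i1 {arg_name}" if params else f"i1 {arg_name}"
            line = f"{m.group(1)}{new_params}{m.group(3)}"
        else:
            line = BR_UB.sub(f"br i1 {arg_name}", line)
        out.append(line)
    return out

example = [
    "define void @f(i32 %x) {",
    "entry:",
    "  br i1 undef, label %a, label %b",
    "a:",
    "  ret void",
    "b:",
    "  ret void",
    "}",
]
print("\n".join(fix_function(example)))
```

A real script would also have to update any callers and the test's CHECK lines, which this sketch does not attempt.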

Anything in llvm/test.

Hi! As previously discussed, the task is to identify tests that cause undefined behavior (UB) in functions. So, I was wondering if we are simply removing tests containing the following patterns using regular expressions:

  • div v, undef / poison / 0
  • br undef / poison
  • load ptr undef / poison
  • store val, ptr undef / poison
  • memcpy ptr undef / poison
  • getelementptr undef/poison
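As a rough illustration of how a few of these patterns could be located with regular expressions, here is a minimal Python sketch. The pattern table and the `scan` helper are hypothetical and cover only a subset of the list; a hit only marks a candidate for inspection, since a regex cannot tell whether the instruction is reachable or whether the function is always UB.

```python
import re

# Hypothetical pattern table covering a few of the cases listed above.
PATTERNS = {
    "div by undef/poison/0": re.compile(r"\b[us]div\b.*,\s*(?:undef|poison|0)\s*$"),
    "br on undef/poison": re.compile(r"\bbr i1 (?:undef|poison)\b"),
    "load from undef/poison": re.compile(r"\bload\b.*,\s*ptr (?:undef|poison)\b"),
    "store to undef/poison": re.compile(r"\bstore\b.*,\s*ptr (?:undef|poison)\b"),
}

def scan(ir_text):
    """Return (line number, pattern name) pairs for suspicious lines."""
    hits = []
    for lineno, line in enumerate(ir_text.splitlines(), start=1):
        code = line.split(";", 1)[0]  # ignore IR comments and CHECK lines
        for name, rx in PATTERNS.items():
            if rx.search(code):
                hits.append((lineno, name))
    return hits

sample = """define i32 @f(i32 %a, ptr %p) {
  %r = udiv i32 %a, undef
  %v = load i32, ptr poison, align 4
  store i32 0, ptr %p ; ptr undef in a comment
  br i1 undef, label %x, label %y
}"""
for lineno, name in scan(sample):
    print(lineno, name)
```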

I was also curious about how we could theoretically define which patterns might result in UB (in a general sense). And are there any suggested references?

Thanks!

There is a theoretical definition of UB, yes. There are many cases, so the easiest is to use a tool like Alive2 to find all those cases automatically.

If you are curious about the cases, you can start by searching for “undefined” in the LLVM Language Reference Manual (LLVM 19.0.0git documentation).

Hi,
I am interested in this project. I have scanned through the Language Reference and read some articles on Alive2 (Alive2 Part 1: Introduction). Are there other references or documented fixes that you would recommend reading?

Hi, I’m interested in this project. As you mentioned, the project needs experience with Python scripting. Is there any recommendation for LLVM’s Python bindings? What about llvmlite and llvmpy?

You can read this: https://web.ist.utl.pt/nuno.lopes/pubs/undef-pldi17.pdf

I was thinking about regular expressions, not reading/writing IR through some bindings, as that would cause too many changes in the tests.
Also, it doesn’t have to be Python. PHP is fine as well (and much faster, FWIW).

I see, thanks for your answer

Hello again,

This is a question related to the proposal and Alive2. In the proposal, should I document all the undefined behavior described in the LangRef that occurs in llvm/test, along with the fixes that I can find?

Also, I tried running Alive2 on the InstCombine transforms:

$LLVM2_BUILD/bin/llvm-lit -s -Dopt=$ALIVE2_HOME/alive2/build/opt-alive.sh $LLVM2_HOME/llvm/test/Transforms/InstCombine

It ran for 2 hours without terminating.
Is this due to processing power or is there something else at play?

My CPU Specifications
i3 10th generation, 2 cores 4 threads

Alive2 takes ~4 hours to run on an 8-core server (for the whole Transforms dir). I don’t have stats for InstCombine alone.
An i3 with 2 cores is going to take a whole day, I guess.

The proposal doesn’t need to list all UB cases. We know what they are. You should write about the strategy to find & fix all the cases.

Hello, I finished reading through the Alive2 Part 1: Introduction article that @azmat-y linked. The article features a bug found using Alive2, and I’m having trouble interpreting the Alive2 output for it. Taking %y = 0 as suggested by Alive2, I manually traced through the code and wrote down the (incorrect?) range for each value, also arriving at the conclusion that the transformation doesn’t verify:

define i1 @cv0_GOOD(i8 %y) {
%0:
  %x = call i8 @gen8()          (-128, 127) // assuming x is signed
  %tmp0 = lshr i8 255, %y       (255) // assuming tmp0 is unsigned
  %tmp1 = and i8 %tmp0, %x      (-128, 127) // bitwise `and` returns x
  %ret = icmp sgt i8 %x, %tmp1  (1) // x == tmp1 is always true
  ret i1 %ret
}
=>
define i1 @cv0_GOOD(i8 %y) {
%0:
  %x = call i8 @gen8()          (-128, 127) // assuming x is signed
  %tmp0 = lshr i8 255, %y       (255) // assuming tmp0 is unsigned
  %1 = icmp sgt i8 %x, %tmp0    (0) // anything in (-128, 127) isn't > than 255
  ret i1 %1
}
// %ret = 1 and %1 = 0 respectively, so the transformation doesn't verify

However, the Alive2 output is as follows:

Example:
i8 %y = #x00 (0)

Source:
i8 %x = #x7f (127, -129)
i8 %tmp0 = #xff (255, -1)
i8 %tmp1 = #x7f (127, -129)
i1 %ret = #x0 (0)

Target:
i8 %x = #x7f (127, -129)
i8 %tmp0 = #xff (255, -1)
i1 %1 = #x1 (1)
Source value: #x0 (0)
Target value: #x1 (1)

It arrives at the same conclusion that the transformation doesn’t verify (although it reports %ret = 0 and %1 = 1, the reverse of my trace), but I’m not sure what its parentheses notation represents. Any clarification would be appreciated!
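For reference, the counterexample can be replayed by hand with a small Python sketch that models the i8 arithmetic. The helper names are hypothetical, the value returned by @gen8 is passed in as `x`, and the shift amount is assumed to be below 8 (a larger shift would yield poison in LLVM, which this sketch does not model).

```python
def to_signed(value, bits=8):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value

def source(x, y):
    """Source function: icmp sgt i8 %x, (and (lshr 255, %y), %x)."""
    tmp0 = (0xFF >> y) & 0xFF   # lshr i8 255, %y (assumes y < 8)
    tmp1 = tmp0 & x             # and i8 %tmp0, %x
    return int(to_signed(x) > to_signed(tmp1))

def target(x, y):
    """Target function: icmp sgt i8 %x, (lshr 255, %y)."""
    tmp0 = (0xFF >> y) & 0xFF
    return int(to_signed(x) > to_signed(tmp0))

# Alive2's counterexample: %y = 0 and @gen8 returning 0x7f.
print(source(0x7F, 0), target(0x7F, 0))
```

With these inputs, %tmp0 is 0xff, which is -1 when read as a signed i8. The source compares 127 with 127 and the target compares 127 with -1, so the two functions return different values.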

The parentheses are not ranges. They just show the signed & unsigned representation of the values.
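A small Python sketch of that idea, assuming the layout seen in the output above (hex bit pattern, then the unsigned value, then the signed value when it differs). The `describe` helper is hypothetical and only imitates the format; it is not Alive2's code.

```python
def describe(value, bits=8):
    """Format a bit pattern as hex plus its unsigned and signed readings."""
    unsigned = value & ((1 << bits) - 1)
    # Two's complement: patterns with the top bit set are negative.
    signed = unsigned - (1 << bits) if unsigned >= (1 << (bits - 1)) else unsigned
    width = (bits + 3) // 4  # number of hex digits
    if signed == unsigned:
        return f"#x{unsigned:0{width}x} ({unsigned})"
    return f"#x{unsigned:0{width}x} ({unsigned}, {signed})"

for v in (0x00, 0x7F, 0xFF):
    print(describe(v))
```

For an i8, the signed reading always lies between -128 and 127, so 0x7f reads as 127 in both interpretations.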

Since they are not ranges, how did Alive2 determine the values 127 and -129 for %x? The -129 signed value is particularly confusing to me since it requires 9 bits to represent in two’s complement form, but i8 can only store 8-bit signed values.

I can’t reproduce what you’re observing. See here: Compiler Explorer
Maybe you’re using an old version of Alive2 or Z3?

If you are able to make the online version produce such a wrong result, please file a bug report.

Sorry, I believe I wasn’t clear in my original post. I did not produce the output; I’m just reading the LLVM Bugzilla report, which has your comment from 2019:

Nuno Lopes 2019-06-08 15:42:02 PDT
The following instcombine rewrite seems incorrect (test/Transforms/InstCombine/canonicalize-constant-low-bit-mask-and-icmp-sgt-to-icmp-sgt.ll):

define i1 @cv0_GOOD(i8 %y) {
%0:
  %x = call i8 @gen8()
  %tmp0 = lshr i8 255, %y
  %tmp1 = and i8 %tmp0, %x
  %ret = icmp sgt i8 %x, %tmp1
  ret i1 %ret
}
=>
define i1 @cv0_GOOD(i8 %y) {
%0:
  %x = call i8 @gen8()
  %tmp0 = lshr i8 255, %y
  %1 = icmp sgt i8 %x, %tmp0
  ret i1 %1
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
i8 %y = #x00 (0)

Source:
i8 %x = #x7f (127, -129)
i8 %tmp0 = #xff (255, -1)
i8 %tmp1 = #x7f (127, -129)
i1 %ret = #x0 (0)

Target:
i8 %x = #x7f (127, -129)
i8 %tmp0 = #xff (255, -1)
i1 %1 = #x1 (1)
Source value: #x0 (0)
Target value: #x1 (1)

So as suggested, perhaps the wrong result is a consequence of running an old version of Alive2 or Z3?

Probably.
