The select story

Juneyoung Lee

Oct 4th 2021

This is a story about the removal of a compiler optimization that had been in LLVM for a long time.

1. Background: How Does LLVM Compile ‘e1 && e2’?

In C and C++, ‘e1 && e2’ conditionally evaluates e2 only if e1 evaluates to true. This is called a short-circuit evaluation, and programmers often rely on this to write an idiom like ‘idx < LEN && check(array[idx])’ that prevents the latter expression from raising an out of bounds access using the former condition[1].

To correctly implement short-circuit evaluation, Clang inserts a branch that has the result of e1 as its condition when it translates ‘e1 && e2’ into the intermediate representation of LLVM. Then, it places e2 at a basic block which is executed only if the branch is taken. For example, the emitted intermediate representation of LLVM – or simply LLVM IR – translated from ‘idx < LEN && check(array[idx])’ roughly looks like this.

entry:

  %e1 = icmp slt i64 %idx, %LEN                 ; e1 = %idx < %LEN

  br i1 %e1, label %land.rhs, label %land.end   ; Is ‘e1’ true?

land.rhs:

%val = load i32, i32* ...(array + idx)

  %e2 = call @check(i32 %val)                   ; e2 = check(array[idx])

br label %land.end

land.end:

  ... (%e2 if taken, otherwise %e1)

However, we know that conditional branches are, well, expensive. It is the SimplifyCFG pass that tries to remove the branch that looks like the one in the above example. It checks whether hoisting out the evaluation of e2 from the condition branch is (A) correct and (B) beneficial. A is met if e2 does not have any side effect, or the operations in e2 are possible to execute speculatively. B is met if e2 is very cheap, e.g., a simple integer comparison.

For the above example, neither A nor B holds because speculating a load is not safe or beneficial in general. For a simpler expression like ‘idx < LEN && idx != 1’ however, both A and B hold. Therefore, SimplifyCFG optimizes it to an IR program that does not have a branch. To represent the output, it uses the select instruction. Select is equivalent to the ternary operator ‘?:’ in C, except that the instruction can only take a constant or variable as its value operands. In the code generation, select can be efficiently lowered into an assembly like cmov in x86-64.

%e1 = icmp slt i64 %idx, %LEN           ; %idx < %LEN

  %e2 = icmp ne  i64 %idx, 1              ; %idx != 1

  %res = select i1 %e1, i1 %e2, i1 false ; %e1 ? %e2 : false

The journey does not stop here, and LLVM goes one step further. The InstCombine pass in LLVM rewrites ‘select’ to ‘and’. This is beneficial because it allows many optimizations for the bitwise ‘and’ operation (e.g., ‘x & (y & x)’ => ‘x & y’) to be reused.

%e1 = icmp slt i64 %idx, %LEN ; %idx < %LEN

  %e2 = icmp ne  i64 %idx, 1     ; %idx != 1

  %res = and i1 %e1, %e2         ; %e1 & %e2

2. The Canonicalization Was Correct, But It Isn’t Anymore!

The topic of this post is about the validity of the last canonicalization (select -> and). Unsurprisingly, the canonicalization has been in InstCombine for a long time; It was introduced in Apr 2004[2], which was a month after ‘select’ was added to the list of IR instructions[3].

However, the canonicalization is incorrect because the optimized program is more undefined if one of its operands is a poison value. Precisely speaking, the canonicalization had been correct in the past because the poison value was introduced six years after the canonicalization was written. After the introduction of the poison value, its correctness was in the gray area. Then, it explicitly became incorrect after a recent clarification of the semantics of the select instruction with respect to the poison value.

Then, what is a poison value? A poison value is a conceptual value that represents the result of an erroneous operation. For operations like shifting with an amount larger than integer bitwidth - 1, the evaluation does not immediately make the program undefined, but a poison value is stored into the virtual register[4]. If the program does not access the register at all, everything is fine. An instruction that takes a poison value as input also returns poison (with a few exceptions). If the instruction has a side-effect, the program behavior finally becomes entirely undefined. According to the GitHub commit history, it was initially called a “trap value”[5]. After a year, it was renamed to a “poison value”[6].

Then, why does the existence of a poison value make the canonicalization incorrect? It is because the ‘select’ and ‘and’ operations have different rules about how a poison input is propagated to the output. According to the latest LangRef, they are like this:

  1. For ‘select false, poison, y’, the result is simply y (not poison) because the true operand is not chosen. Similarly, for ‘select true, y, poison’. This is analogous to how C’s ternary expression is evaluated.
  2. For ‘and 0, poison’, the result must be poison. This is analogous to how ‘x & y’ in C evaluates. For example, ‘0 & (1 << 32)’ has undefined behavior.

According to this definition, canonicalizing ‘select A, B, 0’ into ‘and A, B’ is incorrect because the result is more poisonous if A is false and B is poison! To see the problem in practice, let’s consider a C expression ‘i < 32 && (16 >= 1 << i)’ where ‘i’ is a 32-bit integer. The expression is evaluated to false if i is 32. However, the canonicalization in LLVM converts it into an expression that uses the bitwise ‘and’, which is analogous to ‘i < 32 & (16 >= 1 << i)’. According to the poison propagation rule, the result is more undefined if i = 32.

Two tables below show the results of the two expressions ‘select x, y, false’ and ‘and x, y’ when various values are given to x and y:

x \ y

false

true

poison

false

false

false

false

true

false

true

poison

poison

poison

poison

poison

Table 1. The outputs of ‘select i1 x, y, false’

x \ y

false

true

poison

false

false

false

poison

true

false

true

poison

poison

poison

poison

poison

Table 2. The outputs of ‘and i1 x, y’

Note that the poison propagation rules were fully specified in LangRef in 2020, which is quite recent. Before this change, LangRef stated that instructions other than phi nodes depend on their operands. This meant that ‘select false, poison, _’ was poison, invalidating lowering C’s “:?” / “&&” / “||” into the select instruction. Due to these problems, LangRef was updated to treat ‘select’ like a phi node in 2020, reaching to the current wordings[7]. Instead, the canonicalization explicitly became an invalid transformation.

The canonicalization had been in the InstCombine pass of LLVM for more than 14 years. Then, it was discovered by a formal verification tool in 2014 that the canonicalization was problematic[8]. However, the canonicalization was not fixed because it did not cause any real-world miscompilation[9]. The first real-world miscompilation bug was reported in early 2020. It caused a patch that implements a valid optimization[10] to be reverted because it interacted badly with the canonicalization, causing an end-to-end miscompilation. Then, after a poison constant was added to LLVM in 2020[11] and used more and more, many miscompilation bugs began to appear. This necessitated the removal of the old and incorrect canonicalization.

3. Two Possible Solutions

To fix the incorrect canonicalization, we came up with two solutions. The first one was to simply remove it. However, this could lead to performance regressions in optimized programs. In LLVM, simplification patterns for the bitwise ‘and’ operation (e.g., ‘x && (y && x)’ -> ‘x & (y & x)’ -> ‘y & x’) were responsible for optimizing conjunctions (similarly for bitwise ‘or’ and disjunctions). Without this canonicalization, conjunctions could not be optimized.

The second solution was to use the freeze instruction. Canonicalizing ‘select i1 x, y, false’ to ‘and x, (freeze y)’ is correct because the freeze instruction blocks the propagation of the poison value. The two expressions are not equivalent as the latter has a more defined result. This transformation is correct because a compiler can remove undefinedness from the source program.

x \ y

false

true

poison

false

false

false

false

true

false

true

false or true
(nondet. chosen)

poison

poison

poison

poison

Table 3. The outputs of ‘and i1 x, (freeze y)’. If x is true and y is poison, the result is well-defined.

This solution salvages a portion of, but not all, optimizations. For example, ‘x && (y && x)’ is canonicalized to ‘and x, (freeze (and (freeze y), x))’, which can be optimized into ‘and x, (freeze y)’ thanks to the existing optimizations on the (1) ‘and’ operation and (2) ‘freeze’ operation.

However, the second solution had one serious drawback: the insertion of freeze permanently removes undefinedness from the source program (look at the table above). Since undefined behavior in the program gives more freedom to compiler to do optimizations, its removal induces suboptimal code generation. I will leave a realistic optimization example that can be blocked at the bottom of this blog post.

Of course, it is reasonable to add a transformation that inserts freeze if the transformation brings a performance improvement. However, the canonicalization does not make the program fast or small by itself; it is the later pipeline that recognizes the bitwise operations and makes things better. Well, in theory, we could update all bitwise optimizations to accept selects as well and still achieve the goal.

4. Biting the Bullet

After discussions, we chose to pursue the first solution. That was a tough decision because, as mentioned before, it meant we needed to start investigating all optimizations (as well as analyses) in LLVM and fix them.

1) Conditionally Reallowing the Canonicalization If Valid

A question that was naturally raised was whether we could revive the canonicalization in certain cases to minimize the impact. For example, if ‘y’ in ‘x && y’ is known to never be a poison value, then the canonicalization is valid. The never-poison analysis already existed in LLVM[12] and we could simply use it. However, this is not the only case where the canonicalization is valid.

Then, what is the precise analysis that allows this canonicalization? The only problematic case was when x was false and y was poison. What about implementing a value analysis that takes two values (x, y) and returns true if ‘x != false or y != poison’? According to the definition of implication, this is equivalent to ‘y = poison implies x != false?’. But, the analysis was too specific to this problem. There was also a canonicalization that rewrites a disjunction ‘x || y’ to ‘x | y’ as well, and the analysis certainly needed to be reused to this case. By slightly sacrificing the precision, we generalized it to ‘does y = poison imply (x != false /\ x != true)’, which is equivalent to

Does ‘y = poison’ imply ‘x = poison’?

The new value analysis function was called ‘impliesPoison(y, x)’. It is currently located at Analysis/ValueTracking.{h,cpp}.

Let’s visualize the cases where impliesPoison works. In table 4, a cell has a check (v) if it is theoretically correct to canonicalize ‘select i1 x, y, false’ to ‘and i1 x, y’ when (x, y) have the corresponding values. Also, a cell is colored blue if impliesPoison(y, x) is true in that case. Note that impliesPoison(y, x) is true if y is not poison. This is because false implies anything in logic.

x \ y

false

true

poison

false

v

v

x

true

v

v

v

poison

v

v

v

Table 4. Using impliesPoison covers most cases.

You can see that using impliesPoison reallows the canonicalization in most cases.

Then, how is the impliesPoison(y, x) implemented? We inductively defined impliesPoison as follows:

  1. If y = x or y is a non-poison constant, impliesPoison(y, x) trivially holds.
  2. If x = op x1 x2 and op propagates poison, then
     
    impliesPoison(y, op x1 x2) = impliesPoison(y, x1) \/ impliesPoison(y, x2).
  3. If y = op y1 y2 and op never creates poison (e.g., add without any flag), then y = poison means either y1 = poison \/ y2 = poison. In both cases, x must be poison.
     impliesPoison(op y1 y2, x) = impliesPoison(y1, x) /\ impliesPoison(y2, x).

Interestingly, this analysis turned out to work well in practice. It was because many optimizations on conjunctions were dealing with formulae whose two subexpressions use the same set of variable, e.g., ‘idx < 10 && idx < 20’ or ‘(a | b) && (~a | b)’. Note that we can easily prove that impliesPoison(idx < 20, idx < 10) is true.

  1. idx < 20 is poison iff idx is poison, therefore it is enough to check impliesPoison(idx, idx < 10)
  2. Similarly, idx < 10 is poison iff idx is poison, so it is equivalent to impliesPoison(idx, idx), which is true!

Therefore, the formula can be canonicalized into ‘and (idx < 10), (idx < 20)’. Similarly, impliesPoison(~a | b, a | b) is true.

2) Updating Optimizations to Recognize Selects

Resurrecting the canonicalization using impliesPoison could partially recover the performance. However, the performance recovery was still not very satisfactory. We needed to update existing optimizations on the ‘and’ operator to deal with the select form (similarly for ‘or’ and its select form as well).

In order to facilitate updating existing optimizations, two pattern matchers were added into LLVM: m_LogicalAnd and m_LogicalOr. m_LogicalAnd(e, A, B) checks whether expression e is either ‘and i1 e1, e2’ or ‘select i1 e1, e2, false’ (allowing a vector type of i1s as well) such that ‘e1’ satisfies pattern A and ‘e2’ satisfies pattern B; similarly, m_LogicalOr checks whether the expression is a disjunction. Also, to track missing optimizations on the select form, unit tests on ‘and’ optimizations in LLVM were duplicated and edited in order to track whether ‘select’ instructions are also optimized. To make the comparison of the assembly outputs easy, a command-line flag that disables the select canonicalization was added as well.

These preparations made the updates easier than before, but we still needed to visit each optimization and carefully check whether generalizing it to accepting the ‘select’ form. The impliesPoison-based canonicalization could significantly reduce the number of optimizations to visit, but there were still many. Fortunately, we find that many of the remaining optimizations were recognizing conjunctions at conditional branches, and they fell into one of these two cases:

  1. There are optimizations that recognize a conjunction in a conditional branch and use it in the taken branch:

if (cond1 && a == b) {
 use(a); // can be optimized to ‘use(b)’ because neither

          // a nor b can be poison.
}

This optimization is still valid even if ‘cond1 && cond2’ is not canonicalized into the ‘and’ form.

  1. On the other hand, there are optimizations that recognize a conjunction and use it in the untaken branch:

int i = 0;
while (i < n1 && i < n2) { i++; }
use(i); // cannot be optimized to ‘use(min(n1, n2))’ because
       // n1 = 0 /\ n2 = poison!

This optimization isn’t valid because replacing use(i) with use(min(0, poison)) = use(poison) is incorrect.

3) Adding More Optimizations to Recover Performance

To fully recover the performance, we added peephole optimizations that simplify expressions containing selects. The simplest and new one was an optimization folding ‘a && (a && b)’ (in C) into ‘a && b’ (in C)[13]. General patterns were added by calling isImpliedCondition(lhs, rhs) in LLVM’s value analysis[14]. isImpliedCondition is an LLVM function that returns true if lhs = true implies rhs = true (e.g., isImpliedCondition(“x < 10”, “x < 20”) returns true). For example, ‘(a || b) && c’ is now simplified into ‘a && c’ if isImpliedCondition(c, !b) holds. Optimizations for expressions containing both ‘select’ and bin ops - such as folding ‘a & (b ? A : B)’ into ‘a ? A : false’ if a = true implies b = true – were added as well[15].

Checking the correctness of these optimizations was not trivial due to the existence of poison and undef values. For each patch, we left link(s) to Alive2 to show the validity of transformation.

Fixing some optimizations on the and/or operations required additional tricks. For example, LLVM simplifies ‘((t1 & k) == 0) | ((t4 & k) == 0)’ into ‘((t1 | t4) & k) == 0’ because the latter has fewer operations. However, the select version of the input pattern - ‘((t1 & k) == 0) || ((t4 & k) == 0)’ - cannot be optimized to the latter because the result may be more undefined if t4 is poison. To correctly fix it, t4 must be replaced with ‘freeze t4’. This tweak was necessary to resolve 6% performance regression after the removal of canonicalization[16].

We updated LLVM to recognize the select form of conjunction as having the cost of an ‘and’ operation[17]. The backend was updated to generate optimized assembly for the selects[18].

5. Removing the Canonicalization in Three Steps

The canonicalization was gradually removed from LLVM in three steps.

First, we left InstCombine as the only place where the canonicalization happened. Previously, there were several optimizations that used the ‘and’ operation to create a conjunction expression in an IR program (‘or’ for a disjunction as well). For example, SimplifyCFG was inserting ‘and’ to represent the merged condition after merging two consecutive branches into one[19]. Also, a loop vectorizer and peephole optimization were incorrectly introducing ‘and’ as well[20].

Second, we conditionally disabled the InstCombine canonicalization[21] to fix a miscompilation bug reported in March 2021[22]. The canonicalization was disabled if the second operand of the select (‘b’ in ‘select a, b, false’) was a poison-creating operation like ‘add nsw’. Since many arithmetic operations translated from C have poison-generating flags, it made the canonicalization seldomly happen. Fortunately, no performance regression that was hard to fix was reported after this patch had landed. This meant that the complete removal of the canonicalization was prospective.

The last step was to fully disable the canonicalization unless impliesPoison hold[23]. It was spurred by another miscompilation bug report due to this canonicalization[24]. After a lot of optimizations were added in advance to avoid a performance regression, the canonicalization was fully removed in May 2021.

Interestingly, after each step, hidden bugs and issues in LLVM were revealed[25]. We found miscompilation bugs after the first step and the third step. Also, an unexpected performance regression and infinite loop were found after the second step. If the removal of the canonicalization had been done at once, downstream projects of LLVM must have fallen into a chaotic state. Gradually removing the canonicalization helped us avoid the hassle.

6. Conclusion & What’s Next?

That was a long trip!

To summarize, the story was as follows:

  1. LLVM had been canonicalizing selects into the and/or binary operations for a long time.
  2. The recent clarification of the semantics of select in LangRef explicitly made the canonicalization an incorrect one.
  3. There were two possible solutions, and we chose the solution that was unlikely to cause a performance regression if everything was correctly fixed. Existing optimizations were updated and new ones were added to simplify expressions containing selects. Also, a new analysis ‘impliesPoison(x,y)’ was added to reenable the canonicalization if possible.
  4. The canonicalization was gradually disabled in three steps to avoid undesirable consequences. It was fully removed in May 2021.

The changes have been shipped to the official release of LLVM 13.0.

However, the story isn’t over yet. A report[26] demonstrates suboptimal code generations due to uncovered simplification patterns on select. This implies that there still exists a set of transformations that are uncovered by our updates. Furthermore, we don’t know which transformations are uncovered unless users report performance regressions. Exhaustively adding all possible optimizations on select is infeasible and will cause the compiler to run infinitely.

Then, how can we solve this problem? The problem exists because LLVM has poison value, but we cannot change the IR because its existence allows another set of important optimizations. Instead, we can update the frontends (e.g., C/C++, Swift, Rust) to leave never-poison marks at SSA variables. Then, the canonicalization will fire again because impliesPoison(y,x) returns true if y is never poison (it’s because false implies everything).

This is somewhat analogous to how the alignment information of a pointer is encoded in LLVM IR. Many architectures support fast and wide memory instructions that are tailored for specific alignment(s). Therefore, to use the instructions, it is important to check whether a dereferencing pointer has the alignment. However, in LLVM IR, a pointer type ‘ty*’ has nothing to do with its underlying type ‘ty’. This means that one cannot infer the alignment of a pointer from ‘ty’. Instead, it is the frontend that leaves the alignment information. For example, the C language standard states that types have alignment requirements, and the addresses of objects of that type must satisfy the restriction. The clang frontend attaches the ‘align’ attribute (with the corresponding alignment number) to pointers and instructions using them.

Actually, LLVM IR already has an attribute for presenting the never-poison information - it is the ‘noundef’ attribute. If the ‘noundef’ attribute is attached to a function argument, its value is neither undef nor poison because passing such values to the argument is undefined behavior. What is cool is that clang already has a command-line flag that enables attaching the ‘noundef’ attribute to arguments if C/C++ standards allow doing so[27]. Turning on the flag effectively resolved a performance regression after another miscompilation bug was fixed using the freeze instruction. I believe that the flag will also help address unseen performance regressions after the removal of select canonicalization as well. To enable the flag by default, a student who participated in Google Summer of Code this year created a patch[28]. We welcome any support for reviewing our patch!

Appendix I. An example showing that folding ‘select x, y, false’ to ‘add x, (freeze y)’ can block further optimizations.

The inserted freeze can block induction variable widening. “The nsw story”[29] has a nice background about the transformation. Imagine this loop:

// n is a 64-bit integer

int y = 0;

while (x && (y <= n)) {

arr[y] = 0;
y = add nsw y, 1

}

If the branch condition is canonicalized into ‘and’ with its second operand frozen, it becomes:

int y = 0;

while (and x, (freeze (y <= n))) {

arr[y] = 0;

y = add nsw y, 1

}

Let’s assume that x was true, leaving “freeze (y <= n)” only:

int y = 0;

while (freeze (y <= n)) {
arr[y] = 0;

y = add nsw y, 1

}

This is bad because the loop condition is frozen. This blocks loop optimizations such as induction variable widening on y. If there was no freeze at all, like this:

int y = 0;

while (y <= n) {
arr[y] = 0;

y = add nsw y, 1

}

y could be widened into 64 bits integer, which is beneficial.


[1] One can also use a similar technique for `e1 || e2`. I will mainly use ‘e1 && e2’ as an example in this post for brevity.

[2] https://github.com/llvm/llvm-project/commit/1c631e813da9d

[3] https://github.com/llvm/llvm-project/commit/439b128ce7b6d4a09fc2c042c5330aef0323d4a6

[4] I will not explain what undefined behavior is in this post because there are a lot of other great posts that describe it.

[5] https://github.com/llvm/llvm-project/commit/ffc9a6b4ac6b979bec1592cfc9f2722f9bb031ed

[6] https://github.com/llvm/llvm-project/commit/9a2a0933edeb7970a6f11f7de68be68ddbce6b4f

[7] https://reviews.llvm.org/rGfaba1d034a046305a3a7902e1753bf5dfc4f686e

[8] https://groups.google.com/g/llvm-dev/c/eQ7Fw2C_O9Q

[9] Fortunately, the situation is much better now. If you report that a transformation is buggy with anAlive2 proof (https://alive2.llvm.org/), it is likely to be fixed.

[10] https://reviews.llvm.org/D72396

[11] https://reviews.llvm.org/D71126

[12] It is the isGuaranteedNotToBePoison function in ValueTracking.{h,cpp}.'

[13] https://reviews.llvm.org/D96945

[14] https://reviews.llvm.org/D101375, https://reviews.llvm.org/D101807

[15] https://reviews.llvm.org/D101720

[16] https://reviews.llvm.org/D101423

[17] https://reviews.llvm.org/D97360, https://reviews.llvm.org/D99884 

[18] https://reviews.llvm.org/D93853 

[19] https://reviews.llvm.org/D95026 

[20] https://reviews.llvm.org/D95217

[21] https://reviews.llvm.org/D99674

[22] https://bugs.llvm.org/show_bug.cgi?id=49688

[23] https://reviews.llvm.org/D101191

[24] https://bugs.llvm.org/show_bug.cgi?id=50145

[25] https://bugs.llvm.org/show_bug.cgi?id=49495

[26] https://bugs.llvm.org/show_bug.cgi?id=51577

[27] https://godbolt.org/z/Tq6bdWsT1 - the flag does not attach noundef to load instructions yet.

[28] https://reviews.llvm.org/D105169, https://reviews.llvm.org/D108453 

[29] https://groups.google.com/g/llvm-dev/c/sDYaYV_ZF-g/m/5Ektu6vM_0oJ