crypto: new assembly policy #37168
Comments
In general the policy seems good to me, though I do feel that if a contributor is following the current assembly policy then they would already be meeting the bar of the new one. Maybe we need to expand the advice even more, I'm not sure.

One thing that this proposal doesn't address is the fairly common practice of taking assembly from another project (e.g. OpenSSL) and porting it to Go assembly. I see this in reviews every so often. One concrete thing that I think we should encourage people to do is to document the repositories and, importantly, the exact version control revisions that their code is based on, so that it is easier to check for upstream changes. Having said that, when reviewing such code do you think it is better to encourage keeping the code close to the original, so that it is easier to update with upstream changes (assuming anyone ever bothers to check for them), or to modify it more heavily to fit with the policy? Sometimes contributors don't fully understand the implementation or algorithm they are porting, and so it will be very hard for them to rework it to fit with the policy. Or they may just be reluctant to change the code too much. How firm should we be in that sort of situation, assuming the code is otherwise well tested?

One other point I'd like to make is that I don't think we should mandate the use of code generation. It should only be used when it would increase maintainability. Short hand-written assembly files can be preferable in some situations.
Yes, I agree. I switched from assembly in go-ristretto to using plain Go for Go 1.13, as it has […]
@bwesterb please do file issues when you encounter specific intrinsics that are important for eliminating assembly. There's uncertainty around how vector intrinsics should work, but I think there's openness to new scalar intrinsics.

Edit: just realized you were asking about a vector intrinsic. Sorry. I'd personally like to have those, too, but no one has put together a complete design yet. There's some prior discussion in another issue I can't find at the moment.
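For context, the scalar intrinsics being discussed follow the math/bits pattern: ordinary Go calls that the compiler lowers to single instructions on supported architectures. A minimal sketch (not code from this thread; the function name is made up):

```go
package field

import "math/bits"

// mulAdd accumulates the 128-bit product x*y into the two-limb
// accumulator (lo, hi). On amd64 and several other architectures the
// compiler lowers bits.Mul64 and bits.Add64 to single instructions,
// which is what makes pure Go like this competitive with assembly.
func mulAdd(lo, hi, x, y uint64) (newLo, newHi uint64) {
	h, l := bits.Mul64(x, y)
	l, c := bits.Add64(l, lo, 0)
	h, _ = bits.Add64(h, hi, c)
	return l, h
}
```

No equivalent exists for vector operations, which is what the go-ristretto question above is about.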
I seem to remember a discussion when we wrote https://github.com/golang/go/wiki/AssemblyPolicy (only 18 months ago). Does anyone know if there's a link somewhere to that discussion?
Hmm, the major differences should be a stronger preference for code generation, and the requirement for a Go reference implementation that matches small assembly functions 1:1. For example, there are generic implementations that do 32-bit limbs while the assembly does 64-bit limbs, and that would not be acceptable anymore, or assembly where the only entry point is ScalarMult, which would have to be broken up into a dozen or so functions.
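Concretely, the 1:1 requirement implies a layout like the following sketch (all file, package, and function names are hypothetical, and the three files are shown in one block for brevity):

```go
// fe_amd64.go: used on amd64 unless the purego tag is set.

//go:build amd64 && !purego

package field

// feAdd sets out = a + b. It is implemented in fe_amd64.s and, per the
// 1:1 rule, has exactly the same contract as feAddGeneric, so tests
// can run both on the same inputs and compare results.
//go:noescape
func feAdd(out, a, b *fieldElement)

// fe_generic.go: every other configuration uses the reference directly.

//go:build !amd64 || purego

package field

func feAdd(out, a, b *fieldElement) { feAddGeneric(out, a, b) }

// fe.go: shared by all platforms.

package field

import "math/bits"

// fieldElement is four 64-bit limbs, matching the assembly's layout.
type fieldElement [4]uint64

// feAddGeneric adds limb-wise with carry propagation; modular
// reduction is omitted to keep the sketch short.
func feAddGeneric(out, a, b *fieldElement) {
	var c uint64
	for i := range out {
		out[i], c = bits.Add64(a[i], b[i], c)
	}
}
```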
I should mention that, by default, this is not ok license-wise. There's a path to bring in assembly that was dual-licensed as part of cryptogams, and that's the path that was taken a couple of times. I think we tried the "keep the port close to the source" route, and it does not meet the standards of maintainability we'd like to have, in part because the source is often in perlasm or something like that, the calling conventions are different, Go has reserved registers, and so on. I'd like us to try developing a small and curated set of assembly implementations rather than bring in large amounts from other projects with different priorities.
This is particularly experimental, and I'm willing to reconsider after we've seen some results, but I'd like to try it out, as I think it might lead to much better readability and portability. Maybe it will even force the creation of more tools like avo that can abstract some of the useless manual work.
Should the new assembly policy make any statement about when assembly code eventually gets removed, or is very unlikely to get merged? It's probably not worth further maintaining M asm implementations of a "broken" cryptographic primitive. One example I remember has been the […]. Similarly, we may not want to maintain more asm code for "outdated" primitives, i.e. SHA-1. However, both aspects are quite specific to cryptographic implementations, so they may be too specific for a general assembly policy.
Can we clarify how "high level programs to generate assembly" make the code more secure? If anything, I would think they require more scrutiny. The author of a program that calls a third-party program to generate assembly has no idea why that program generated the assembly the way it did. They need to understand their implementation in the high-level language as well as review and understand the low-level assembly generated by the third-party program. It is a mistake to delegate trust to a third-party code generator to generate assembly for a unique cryptographic primitive.
Which processes? What is the context of this statement? Are we talking about guarantees that the instructions aren't corrupting memory, or about verifying that the implementation itself is doing the right thing for all possible inputs? The latter seems impossible, and the intent of formal verification should be clearly stated in the proposal to reduce confusion.
Hi @FiloSottile, FWIW I like the general philosophy here around active experimentation to see if a new approach yields better results, and it sounds like the concept is to observe and refine based on how the experiment goes. A few questions on the fuzzing piece: […]
Firstly, a disclaimer: I wrote avo. I'm pleased to see it mentioned in this proposal as one of many tools available to help reduce complexity in high-performance crypto assembly. That said, I'm not naive enough to think it's always the right tool for the job!

The claim is that code generation makes code easier to write, review and maintain, ultimately making bugs less likely. It allows us to assign meaningful names to variables, and to use control structures to handle the repetitive block structures that are common in cryptographic primitives (see the avo SHA-1 example). I would claim that concise code generators written this way are more likely to be reviewable than thousands of lines of assembly (see p256 and AES-GCM).

I would also point out that this isn't a novel idea; in fact, code generation is already used extensively in crypto assembly, specifically via preprocessor macros. The preprocessor is used for "register allocation" in AES-GCM (lines 14 to 21 at 3eab754).
Here's a poor-man's "function" defined with the preprocessor (lines 264 to 275 at 3eab754).
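The referenced snippets aren't reproduced here, but a hypothetical stand-in (not the actual lines cited) shows what both patterns look like in Go assembly:

```asm
#include "textflag.h"

// Pattern 1: #define as manual "register allocation", giving
// architectural registers meaningful names.
#define STATE R8
#define COUNTER R9

// Pattern 2: a multi-line macro as a poor-man's "function". It is
// textually expanded at each use site; the backslashes continue the
// macro across lines.
#define ROUND(a, b) \
	MOVQ a, AX      \
	XORQ b, AX      \
	MOVQ AX, a

TEXT ·demo(SB), NOSPLIT, $0-0
	ROUND(STATE, COUNTER)
	RET
```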
Likewise in P-256: go/src/crypto/elliptic/p256_asm_amd64.s, lines 18 to 25 and lines 1664 to 1683 at 3eab754.
Again in […]; I could go on. My point is that code generation is already being used by Go crypto in some sense, it's just that the preprocessor is a particularly awful form of it (in my opinion). And Go isn't alone here: OpenSSL's perlasm […]. Your point that it's also necessary to review the code generator itself is completely fair and correct. I'd make a few points in response to this, some general, some defending avo specifically. […]
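To make the comparison concrete, here is a minimal avo generator; it is essentially the canonical example from avo's README rather than code from this thread. The Go program below emits a complete Go assembly function along with matching Go stub declarations:

```go
//go:build ignore

package main

import . "github.com/mmcloughlin/avo/build"

func main() {
	// Declare the function with a real Go signature; avo derives the
	// frame layout and argument offsets from it.
	TEXT("Add", NOSPLIT, "func(x, y uint64) uint64")
	Doc("Add adds x and y.")
	x := Load(Param("x"), GP64()) // load x into a fresh 64-bit register
	y := Load(Param("y"), GP64()) // load y into another
	ADDQ(x, y)
	Store(y, ReturnIndex(0))
	RET()
	Generate()
}
```

Run with something like `go run asm.go -out add.s -stubs stub.go`, this writes both the assembly file and the Go declarations, and avo's register allocator removes a whole class of manual bookkeeping.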
I can't speak for @FiloSottile, but based on a conversation we had at Real World Crypto I have some context: it seems that after the HACS workshop there is some chance that the fiat-crypto team will add support for Go output. If so, this could be a great outcome for the Go community, and I assume the intention was for the assembly policy to encompass auto-generated code from formally verified tooling.

@FiloSottile I appreciate you putting this together and I'm in broad agreement. I'm glad to see the emphasis that this is an experiment; we can learn as we go. I wonder if it might be a good idea to concretize this in some specific cases, to see how it might play out. Perhaps we could pick one or more of the critical primitives and ask ourselves how we would have implemented them under the proposed policy (for example P-256, AES-GCM, curve/ed25519). By the way, I'm not suggesting we write any code, just that we sketch out what a policy-compliant implementation would look like.
I've contributed several ARM crypto implementations. It is work I enjoy and am suited to, having written tens of thousands of lines of ARM assembler in the past. However, spending weeks writing assembly code only to not have it merged really sucks! I think this is an excellent initiative. It will require some restraint on the part of the assembly authors, who naturally want to go for as fast as possible.

I like the idea of code generation. I've used the preprocessor extensively, along with register name definitions, which makes life easier. The preprocessor is a poor second to a good macro assembler, though. Alas, avo is x86-only at the moment, isn't it? I'm not sure there are any ARM tools at the moment (I'd love to be corrected if wrong).
As for code generation, there are examples for comparison: https://go-review.googlesource.com/c/go/+/51670 and https://go-review.googlesource.com/c/go/+/136896. The code in CL 51670 is hand-written and has the exact same structure as the s390x implementation. The code generator in CL 136896 has 298 lines. It is clear that I need more time to read those 298 lines of Go than the 124 lines of assembly, and besides, there are another 740 lines of assembly code to review. If I were the reviewer, I would need much more time to review CL 136896.
@crvv According to my measurements these two CLs have substantially different performance. See the following gist for the methodology, and please let me know if I got something wrong: https://gist.github.com/mmcloughlin/70598689ad2da5c6aadf19e30b6642a1. Here's the […]
Note I've included AES-GCM benchmarks partly to show the benchmarks ran in similar environments, but also because it should be possible to write an AES-CTR mode implementation with performance similar to AES-GCM, if not better. I believe this experiment shows that CL 136896 is approximately 2x faster than CL 51670. Therefore this is a classic trade-off: CL 136896 is unquestionably more complicated than CL 51670, but one could argue the complexity is justified by the speedup. CL 136896 uses some of the techniques employed in the GCM implementation, producing performance fairly close to GCM for the 8K benchmark. But I would argue that the code generator used in the CL makes it easier to understand than […]
There is no doubt that the complex one is faster. As for the speed difference between the two CLs, it depends on the CPU architecture.
@crvv wrote: […]
I'd agree in this case. I think you could do everything the code generator does with preprocessor macros in the .s file itself, and it would be clear what is going on, rather than having to work out how the code generator works and then puzzle through how its output matches the generated assembly. I personally think the solution to this is to make the macro system in the Go assembler more powerful, and then mandate that all assembler code is written with that, rather than with third-party or ad-hoc code generation tools.
@crvv Yes, fair point, and I probably agree that CL 51670 wouldn't benefit from code generation. I was reacting to your previous comment, in which it seemed to me you compared the two CLs as if the only difference was that one uses code generation and the other doesn't. One is up to 2x faster than the other; that needs to be mentioned in any comparison.
Yes, CL 136896 isn't optimal either.
To clarify, the goal is to present the assembly in the simplest, most reviewable form. Sometimes it is small and that's a direct .s file. Sometimes not, and a generator is better. And yes, one can write difficult-to-understand generators - don't do that. Strive to find the simplest way to express what you need to express. That's true for all code. I think everyone basically agrees that CL 51670 doesn't need a generator, although I think it's close to the upper limit for that category. And I think basically everyone agrees that CL 136896's generator is much better than the 700 lines of assembly it generates. The question of whether a particular optimization is worth the added complexity is almost always going to be context-dependent. (In that specific case, a factor of two was deemed worthwhile. We're not here to second-guess that.)
From reading the discussion, it sounds like people are generally in favor of the policy @FiloSottile put in the comment atop this issue. Do I have that right? Is anyone opposed to this policy? (There was some discussion about tradeoffs in the specific AES-CTR case, but the policy is not about settling every possible tradeoff.)
As far as I can tell from the discussion, there is pushback against the phrasing "Use higher level programs to generate assembly", in that it appears to suggest code generation is required. As you say, it seems there is consensus here that for small enough programs code generation may be counterproductive. Perhaps a slight softening of that language would be good? It would be nice if we could be more specific about when code generation is required, but it seems quite difficult to strictly define.
I agree with most of what's written, in principle: use asm only when there is no other way; write it as clearly as possible; document well; test well. Those are good rules to follow for any asm, not just crypto. I would just like to point out that much of the policy is written with amd64 in mind. There is no avo for other architectures, and outside amd64 compiler-generated Go code is even more likely to be slower than asm. Writing small functions could also result in more call overhead if the functions can't be inlined.
Thanks everyone for the input and the discussion. Re: code generation, this is definitely an experiment, and I agree with Russ that there can be more and less complex generators, and that they should be weighed against the complexity they save. They can be as simple as some Go functions named after a specific operation (say, field addition) that take parametrized registers and print assembly, so they can be reused and looped. How about the following language? I think "Discuss the implementation strategy on the issue tracker in advance." gives us space to assess case by case as we go forward.
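For instance, a minimal sketch of such a generator (the function name, the 4x64-bit field layout, and the register choices are all made up for illustration):

```go
// gen.go: run as `go run gen.go > fe_amd64.s`.
package main

import "fmt"

// add emits instructions that add the four 64-bit limbs at src into
// the limbs at dst. Since dst and src are just register names, the
// same helper can be reused and looped for any pair of pointers.
// Carries between limbs are deliberately omitted to keep the sketch
// short; a real field addition would use ADCQ and then reduce.
func add(dst, src string) {
	for i := 0; i < 4; i++ {
		fmt.Printf("\tMOVQ %d(%s), AX\n", 8*i, src)
		fmt.Printf("\tADDQ AX, %d(%s)\n", 8*i, dst)
	}
}

func main() {
	fmt.Println(`#include "textflag.h"`)
	fmt.Println()
	fmt.Println("TEXT ·feAdd(SB), NOSPLIT, $0-16")
	fmt.Println("\tMOVQ dst+0(FP), DI")
	fmt.Println("\tMOVQ src+8(FP), SI")
	add("DI", "SI")
	fmt.Println("\tRET")
}
```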
That's definitely not the intention, as that would only solve a fraction of the maintenance problem, so I would love to hear more. Asm is going to be faster than compiler-generated code on amd64 as well, even if the gap might be smaller; the requirement to highlight what the compiler needs to do to replace the asm is meant to close that gap over time. When you say small functions, do you mean asm or Go? The requirement is specifically for asm functions, and I realize sometimes using the Go ABI is too expensive, so it's ok to make simple jumpable units with test adapters.
@FiloSottile, could you please post in a new comment the exact policy that we are now discussing accepting or rejecting. I see an edit a few comments back and just want to make sure everyone agrees how it applies. @dgryski, note that we won't be inlining arbitrary small assembly functions any time soon. Too much subtlety for the compiler to take on. The path we chose instead was good intrinsified APIs.
Hello, if the above policy is passed, does the Go community need to rework existing assembly implementations of more than 100 lines, for example by splitting them into smaller and more numerous assembly or Go functions? Compared with the current assembly implementations, this may lead to performance degradation. Will that be accepted by the Go community?
Yes, eventually we do want to adapt current code to the policy, although that's lower priority than draining the current review queue. Like with new code, performance is not the ultimate goal, but I'd like to think that as we gain experience with the policy, the degradation will be small or null.
Here's the full current text. I also clarified functions vs units. […]
Does anyone object to the policy in the preceding comment by @FiloSottile?
I think the policy looks good. It might be worth mentioning here that the output of the assembly generators needs to be checked in, and should have a comment at the start like `// Code generated by ... DO NOT EDIT.` so both readers and tools know not to edit it by hand.
How much assembly qualifies as a non-trivial amount?
Based on the discussion above, this seems like a likely accept.
No change in consensus, so accepted.
This is now golang.org/wiki/AssemblyPolicy and will apply to the Go 1.16 cycle.
The crypto packages are unfortunately rich in assembly implementations, which are especially hard to review and maintain due to the diversity of architectures, and to the complexity of the optimizations.
Our current assembly policy at golang.org/wiki/AssemblyPolicy has failed to make reviews manageable with the resources the security team currently has (see https://go-review.googlesource.com/q/hashtag:crypto-assembly).
The result is suboptimal for everyone: implementers have to follow more rules, but reviewers still can’t effectively review their CLs, and no one is happy.
I am proposing a new, much stricter policy. This acknowledges the reality that assembly reviews are currently not moving at an acceptable pace, and shifts more of the load onto the implementers, but with a promise that their work won't be wasted in the review queue. It should also progressively increase the maintainability, reliability, and security of the assembly codebases, as well as surface improvement areas for the compiler so that the assembly can eventually be removed.
This policy would apply to all packages in `crypto/...` and `golang.org/x/crypto/...`. Due to its use in cryptographic packages, and to the fact that it's partially maintained by the security team, this policy would also extend to `math/big`.

[…] Use higher level programs to generate assembly, either standalone Go programs or `go get`-able programs, like avo. Output of other reproducible processes (like formally verified code generators) will also be considered. Discuss the implementation strategy on the issue tracker in advance.