M487: Support ECP H/W accelerator #5812
ARM Internal Ref: IOTSSL-2000
/* R = m*P = (multiple of order)*G = 0 */
/* NOTE: If grp->N (order) is a prime, we could detect R = 0 for all m*P cases
 * by just checking if m is a multiple of grp->N. Otherwise, sigh. */
The order `grp->N` is not the order of the entire elliptic curve, but the order of the subgroup `<G>` spanned by `ecp->G`, which is assumed to be prime. Therefore the check is fine and exhausts `R=0` completely, provided one makes the assumption that `P` lies in `<G>` -- whereas without this assumption, the check wouldn't be accurate in the first place. I think `P` belonging to `<G>` is a reasonable yet undocumented assumption the default S/W ECC code also makes at times, but I don't remember details - ping @mpg? Continuing on this assumption, the case of order `2`, i.e. `P = (-P)` but `P != 0`, cannot occur.
What happens to the H/W accelerator if `R=0` is not covered beforehand? It seems that it's not capable of returning non-affine points, or is it?
For the short Weierstrass case, for all curves we currently support, `<G>` is the whole curve, and `mbedtls_ecp_check_pubkey()` (called at the beginning of `mbedtls_ecp_mul()` and hopefully all other public functions that receive both secret input and some untrusted input) ensures that the point lies on the curve, hence that it belongs to `<G>`. If we ever added support for short Weierstrass curves with non-trivial cofactor, we would need to adapt `mbedtls_ecp_check_pubkey()` so that it keeps checking that the point is a multiple of `G`.
Also, `mbedtls_ecp_check_privkey()` ensures that `m` is in the range [1, N) so together with the previous check on `P`, this ensures that `m * P != 0`. Note however that this protection only applies to the scalar multiplication part: when doing the final addition in `R = m * P + n * Q` you still need to be prepared for the `R == 0` case.
> What happens to the H/W accelerator if R=0 is not covered beforehand? It seems that it's not capable of returning non-affine points, or is it?

@hanno-arm If `R = 0`, the H/W accelerator will hang and its finish/error flag won't be set. That's why I must filter out all `R = 0` cases.
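The argument in this thread can be checked on a toy curve. The following is a minimal sketch, not the PR's code: it assumes the textbook curve y^2 = x^3 + 2x + 2 over F_17, whose group has prime order 19 and is generated by (5, 1), and it uses plain integers in place of `mbedtls_mpi`. All `toy_*` and `TOY_*` names are hypothetical. It illustrates that, for `P` in `<G>`, `m*P` is zero exactly when `m` is a multiple of the order, and that a sum `m*P + n*Q` can still be zero when both scalars lie in [1, N).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy parameters (assumption: curve y^2 = x^3 + 2x + 2 over F_17). */
#define TOY_P 17 /* field prime */
#define TOY_A 2  /* curve coefficient a */
#define TOY_N 19 /* prime order of the group generated by TOY_G */

typedef struct { int64_t x, y; bool inf; } toy_point;

static const toy_point TOY_G = { 5, 1, false };

/* Reduce v into [0, TOY_P). */
static int64_t toy_mod(int64_t v)
{
    v %= TOY_P;
    return v < 0 ? v + TOY_P : v;
}

/* Modular inverse via Fermat's little theorem: v^(p-2) mod p. */
static int64_t toy_inv(int64_t v)
{
    int64_t r = 1, b = toy_mod(v);
    assert(b != 0); /* zero has no inverse */
    for (int e = TOY_P - 2; e > 0; e >>= 1) {
        if (e & 1) r = toy_mod(r * b);
        b = toy_mod(b * b);
    }
    return r;
}

/* Affine point addition, covering doubling and the point at infinity. */
toy_point toy_add(toy_point p, toy_point q)
{
    toy_point r = { 0, 0, true };
    int64_t s;
    if (p.inf) return q;
    if (q.inf) return p;
    if (p.x == q.x && toy_mod(p.y + q.y) == 0) return r; /* P + (-P) = 0 */
    if (p.x == q.x)
        s = toy_mod((3 * p.x * p.x + TOY_A) * toy_inv(2 * p.y)); /* tangent */
    else
        s = toy_mod((q.y - p.y) * toy_inv(q.x - p.x));           /* chord */
    r.x = toy_mod(s * s - p.x - q.x);
    r.y = toy_mod(s * (p.x - r.x) - p.y);
    r.inf = false;
    return r;
}

/* Double-and-add scalar multiplication (not constant time; illustration only). */
toy_point toy_mul(uint32_t m, toy_point p)
{
    toy_point r = { 0, 0, true };
    for (; m != 0; m >>= 1) {
        if (m & 1) r = toy_add(r, p);
        p = toy_add(p, p);
    }
    return r;
}

/* Pre-check a driver could run before starting the accelerator.
 * Valid only if P is a nonzero point of the prime-order subgroup. */
bool toy_mul_is_zero(uint32_t m)
{
    return m % TOY_N == 0;
}
```

In a real port the same pre-check would presumably be written with bignum calls such as `mbedtls_mpi_mod_mpi()` and `mbedtls_mpi_cmp_int()`, with `mbedtls_ecp_set_zero()` producing the result; that mapping is an assumption here, not something taken from the PR.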
mbedtls_ecp_point *R,
const mbedtls_mpi *m,
const mbedtls_ecp_point *P,
const mbedtls_mpi *n,
The parameter `n` is unused.
@hanno-arm The parameter `n` is kept to be consistent with `R = m*P + n*Q`. I added a comment for it in 1728037.
@hanno-arm Nuvoton's M480 series is not released yet (about 3/E). Do you have a strong need for the ECP H/W spec for the review?
1. Add comment for unnecessary parameter 'n' in mbedtls_internal_run_eccop
2. Fix warning message with goto which causes `bypass initialization`
3. Fix comment
 * n is kept with unused modifier.
 *
 */
int mbedtls_internal_run_eccop(const mbedtls_ecp_group *grp,
If it's meant to be an internal function, please make it `static` and remove the `mbedtls_` prefix.
MBEDTLS_INTERNAL_MPI_NORM(&Np, *o2, N_, *p);
MBEDTLS_MPI_CHK(mbedtls_internal_mpi_write_eccreg(Np, (uint32_t *) CRPT->ECC_Y1, NU_ECC_BIGNUM_MAXWORD));
} else if (modop == MODOP_DIV) {
MBEDTLS_INTERNAL_MPI_NORM(&Np, *o2, N_, *p);
What happens if the divisor is `0`? Do we need to check for it or will we get a graceful failure?
@hanno-arm I added a zero-divisor check in c9cc357.
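A zero-divisor guard of the kind asked for here could look like the following minimal sketch. It works on 32-bit integers modulo a prime instead of `mbedtls_mpi` values and ECC registers, `toy_mod_div` and `TOY_ERR_DIV_BY_ZERO` are hypothetical names, and it is not the code from c9cc357.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ERR_DIV_BY_ZERO (-1) /* hypothetical error code */

/* Computes (a / b) mod p for a prime p and writes the result to *out.
 * Returns 0 on success, or TOY_ERR_DIV_BY_ZERO when b == 0 (mod p),
 * so a division without an answer is rejected before any work starts. */
int toy_mod_div(uint32_t a, uint32_t b, uint32_t p, uint32_t *out)
{
    uint64_t base, inv = 1;
    uint32_t e;

    assert(p > 1); /* precondition: p is a prime */
    base = b % p;
    if (base == 0)
        return TOY_ERR_DIV_BY_ZERO;

    /* b^(p-2) mod p is the inverse of b by Fermat's little theorem. */
    for (e = p - 2; e != 0; e >>= 1) {
        if (e & 1) inv = (inv * base) % p;
        base = (base * base) % p;
    }
    *out = (uint32_t) (((a % p) * inv) % p);
    return 0;
}
```

The guard reduces the divisor first, so a nonzero multiple of the modulus is rejected as well as a literal zero.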
 *
 * N_: Holds normalized MPI if the passed-in MPI N1 is not
 * Np: Pointer to normalized MPI, which could be N1 or N_
 */
I don't see the benefit of this macro compared to performing a modular reduction on the respective variable in-place. On the functional side it's the same, on the performance side it has the chance to be less costly because it is not enforced that the bignum module will need to make a copy, and it would be more readable (for my taste). Could you elaborate why you chose this path, or change `MBEDTLS_INTERNAL_MPI_NORM(a,b,c,d)` to `MBEDTLS_INTERNAL_MPI_NORM(x,N)` performing a modular reduction on `x` mod `N` in-place if necessary?
Looking at it, I think you can even reduce this to a call to `mbedtls_mpi_mod_mpi( x, x, N )` everywhere, as the latter function checks if already `x < N` before performing any expensive operations.
Oh, I see, many MPIs are unfortunately declared `const` in the fixed API, and we are not allowed to perform modular reduction in-place. That makes the necessity and motivation for the macro clearer.
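The pattern behind the macro, as described by its `N_`/`Np` comment, can be sketched as follows. This is an illustration under assumptions: it uses `uint32_t` in place of `mbedtls_mpi`, and `toy_norm` is a hypothetical helper rather than the PR's `MBEDTLS_INTERNAL_MPI_NORM`.

```c
#include <assert.h>
#include <stdint.h>

/* Returns a pointer to a value equal to *in reduced mod p.
 * If *in is already in [0, p), the input itself is returned and no copy is
 * made; otherwise the reduced value is stored in *scratch and that pointer
 * is returned. The const input is never modified. */
const uint32_t *toy_norm(const uint32_t *in, uint32_t *scratch, uint32_t p)
{
    assert(p != 0); /* precondition: nonzero modulus */
    if (*in < p)
        return in;
    *scratch = *in % p;
    return scratch;
}
```

The caller owns the scratch variable, which plays the role of `N_`, and only ever reads through the returned pointer, which plays the role of `Np`.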
CRPT->ECC_CTL = (pbits << CRPT_ECC_CTL_CURVEM_Pos) | (ECCOP_MODULE | modop) | CRPT_ECC_CTL_FSEL_Msk | CRPT_ECC_CTL_START_Msk;
ecc_done = crypto_ecc_wait();

/* FIXME: Better error code for ECC accelerator error */
An error code for HW acceleration has recently been introduced in Mbed-TLS/mbedtls#1303, but it's not yet in Mbed OS.
@hanno-arm OK. I'll leave it unchanged for now.
The code in this PR looks good to me. However, it is not entirely clear at which level the PR seeks to provide the alternative ECP implementation. I will go into detail now, recalling first the two ways of providing alternative ECP implementations:
1. Provide a full re-implementation of the entire ECP module.
   This is enabled with `MBEDTLS_ECP_ALT` and mandates re-implementing the entire ECP API. This option is usually suitable if the accelerator provides high-level ECP functionality like point-multiplication.
2. Provide a per-function re-implementation of parts of the ECP module.
   This option is what the PR currently uses: A set of functions is picked in the config for re-implementation, and only those need to be provided. These functions are the low-level building blocks at the heart of the ECP comb multiplication method: Mixed addition, Point doubling, modular arithmetic. This kind of replacement can be used to speed up the building blocks while keeping the overall multiplication strategy.
This PR in a sense does both: It provides the main high-level entry point to the ECP module, namely, `mbedtls_ecp_mul_jac`, but after all this function is never called or used! It's because for overwriting this function, you would need a full replacement of the ECP module, i.e. use alternative (1). But the PR does also provide the low-level replacements for doubling, mixed addition etc., which as mentioned looks good and seems to be working.
I am neither approving nor disapproving at this stage, but only want to emphasize that currently it is not clear which path this PR wants to take: The accelerator is capable of accelerating the high-level point multiplication, and code for this is provided, but it is not used.
@ccli8 Could you clarify the path you want to take, and adapt the PR accordingly? This means that if you want full replacement, the fine-grained rewriting is not necessary, and if you want fine-grained replacement, then the high-level `mbedtls_internal_ecp_mul_jac` should be removed.
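For orientation, the two paths correspond roughly to the following choices in the Mbed TLS configuration header. This is an illustrative fragment only: the per-function macros listed are a subset of what Mbed TLS offers, and which ones this PR actually enables is not stated in the thread.

```c
/* Illustrative config fragment: pick ONE of the two options. */

/* Option (1): replace the whole ECP module; the port must then supply
 * the complete ECP API itself. */
/* #define MBEDTLS_ECP_ALT */

/* Option (2): keep the default S/W point multiplication and replace only
 * selected low-level building blocks. */
#define MBEDTLS_ECP_INTERNAL_ALT
#define MBEDTLS_ECP_ADD_MIXED_ALT     /* mixed (Jacobian + affine) addition */
#define MBEDTLS_ECP_DOUBLE_JAC_ALT    /* Jacobian point doubling */
#define MBEDTLS_ECP_NORMALIZE_JAC_ALT /* Jacobian to affine conversion */
```

With option (2) there is no per-function macro for the point multiplication itself, which is why a high-level multiplication routine provided under that configuration is never called.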
MbedTLS doesn't support point multiplication for MBEDTLS_ECP_INTERNAL_ALT acceleration configuration.
@hanno-arm Actually, I would like to go with alternative (2) with point multiplication accelerated.
@ccli8 Please be aware that if your accelerator provides the entire ECP multiplication routine, there's no need for writing acceleration code for the low-level subroutines - these are only used by our default S/W implementation of ECP multiplication. So if high-level acceleration is what you want in the end, you should be able to provide a rather small full-module replacement of ECP without too much work: apart from numerous formatting functions like |
@hanno-arm OK. Please let the PR keep going. I'll consider full-module replacement. If it comes out, I'll send another PR to replace the current one.
/* Implementation that should never be optimized out by the compiler */
void crypto_zeroize32(uint32_t *v, size_t n)
{
    volatile uint32_t *p = (uint32_t*) v;
This should probably be `(uint32_t volatile *) v`.
@hanno-arm I fixed it in cfdc72d.
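With the suggested cast, a complete word-wise zeroize helper might look like the sketch below. `toy_zeroize32` is a hypothetical name and the loop body is an assumption, since the thread only shows the first lines of `crypto_zeroize32()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Overwrites n 32-bit words starting at v with zero. Each store goes
 * through a pointer-to-volatile, which compilers in practice do not
 * remove as a dead store. */
void toy_zeroize32(uint32_t *v, size_t n)
{
    uint32_t volatile *p = (uint32_t volatile *) v;

    assert(v != NULL || n == 0); /* precondition: valid buffer */
    while (n--)
        *p++ = 0;
}
```

Because the helper takes a start pointer and a word count, a caller can clear just the tail of a buffer by passing an offset pointer and the remaining length.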
Thanks!
 *
 * crypto_zeroize32() has excluded optimization doubt, so we can safely set H/W registers to 0 via it.
 */
crypto_zeroize32((uint32_t *) eccreg + i, eccreg_num - i);
Could you change this to `n` instead of `i`, please?
@hanno-arm I fixed it in 03f0ea1.
Only two minor suggestions left - overall the PR looks good to me. Usual caveat: I haven't done any on-target testing.
@mazimkhan Could you give this a final look, too?
/morph build
Build : SUCCESS
Build number : 1184
Triggering tests
/morph test
Exporter Build : SUCCESS
Build number : 861
Description
This PR includes support for cryptography ECP H/W accelerator.