M487: Support ECP H/W accelerator #5812

Merged 9 commits on Feb 20, 2018

Conversation

ccli8
Contributor

@ccli8 ccli8 commented Jan 9, 2018

Description

This PR adds support for the cryptographic ECP H/W accelerator.

@ciarmcom
Member

ciarmcom commented Jan 9, 2018

ARM Internal Ref: IOTSSL-2000

@hanno-becker

@0xc0170 @ccli8 I can't find the specification of the ECP H/W accelerator - can you help?


/* R = m*P = (multiple of order)*G = 0 */
/* NOTE: If grp->N (order) is a prime, we could detect R = 0 for all m*P cases
* by just checking if m is a multiple of grp->N. Otherwise, sigh. */

@hanno-becker hanno-becker Jan 18, 2018

The order grp->N is not the order of the entire elliptic curve, but the order of the subgroup <G> spanned by ecp->G, which is assumed to be prime. Therefore the check is fine and exhausts R=0 completely, provided one makes the assumption that P lies in <G> -- whereas without this assumption, the check wouldn't be accurate in the first place. I think P belonging to <G> is a reasonable yet undocumented assumption the default S/W ECC code also makes at times, but I don't remember details - ping @mpg? Continuing on this assumption, the case of order 2, i.e. P = (-P) but P != 0, cannot occur.

What happens to the H/W accelerator if R=0 is not covered beforehand? It seems that it's not capable of returning non-affine points, or is it?

@mpg mpg Jan 18, 2018

For the short Weierstrass case, for all curves we currently support, <G> is the whole curve, and mbedtls_ecp_check_pubkey() (called at the beginning of mbedtls_ecp_mul() and hopefully all other public functions that receive both secret input and some untrusted input) ensures that the point lies on the curve, hence that it belongs to <G>. If we ever added support for short Weierstrass curves with non-trivial cofactor, we would need to adapt mbedtls_ecp_check_pubkey() so that it keeps checking that the point is a multiple of G.

Also, mbedtls_ecp_check_privkey() ensures that m is in the range [1, N) so together with the previous check on P, this ensures that m * P != 0. Note however that this protection only applies to the scalar multiplication part: when doing the final addition in R = m * P + n * Q you still need to be prepared for the R == 0 case.
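The argument above can be checked on a toy model. The sketch below is not Mbed TLS code: it assumes a subgroup <G> of prime order N and represents each point by its discrete log k (so P = k*G, and the point at infinity is k == 0). All function names are hypothetical. It shows that m*P cannot be zero when both m and k lie in [1, N), while m*P + n*Q still can be.

```c
#include <stdint.h>

/* Toy model only: a point of <G> is its discrete log k, with P = k*G.
 * The point at infinity corresponds to k == 0. N is assumed prime. */

/* Mirrors the range check described for mbedtls_ecp_check_privkey(). */
static int toy_scalar_in_range(uint32_t m, uint32_t N)
{
    return m >= 1 && m < N;
}

/* Discrete log of m*P, where P has discrete log k. */
static uint32_t toy_mul(uint32_t m, uint32_t k, uint32_t N)
{
    return (uint32_t) (((uint64_t) m * k) % N);
}

/* Discrete log of m*P + n*Q, where P and Q have discrete logs kp and kq. */
static uint32_t toy_muladd(uint32_t m, uint32_t kp,
                           uint32_t n, uint32_t kq, uint32_t N)
{
    uint64_t a = ((uint64_t) m * kp) % N;
    uint64_t b = ((uint64_t) n * kq) % N;
    return (uint32_t) ((a + b) % N);
}
```

With N = 11, every product of two values in [1, 11) is non-zero modulo 11, but toy_muladd(3, 4, 10, 1, 11) is zero, which is the case the final addition still has to handle.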

Contributor Author

> What happens to the H/W accelerator if R=0 is not covered beforehand? It seems that it's not capable of returning non-affine points, or is it?

@hanno-arm If R = 0, the H/W accelerator hangs and neither its finish flag nor its error flag is set. That's why I must filter out all R = 0 cases.
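The PR handles the hang by filtering out R = 0 before starting the accelerator. A separate defensive measure, not taken from this PR, would be to bound the wait on the status flag. The sketch below is a hypothetical helper with an assumed status word and mask; whether the accelerator can be recovered after such a timeout is not stated in this thread.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the PR: poll a status word until any bit
 * in `mask` is set, giving up after `max_polls` reads.
 * Returns 1 if the flag was seen, 0 on timeout. */
static int poll_flag_bounded(const volatile uint32_t *status,
                             uint32_t mask, uint32_t max_polls)
{
    while (max_polls-- > 0) {
        if (*status & mask) {
            return 1;   /* done or error flag observed */
        }
    }
    return 0;           /* budget exhausted: treat as a hardware failure */
}
```

A caller would map the 0 return to an error code instead of spinning forever.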

mbedtls_ecp_point *R,
const mbedtls_mpi *m,
const mbedtls_ecp_point *P,
const mbedtls_mpi *n,

The parameter n is unused.

Contributor Author

@hanno-arm The parameter n is kept for consistency with R = m*P + n*Q. I added a comment about it in 1728037.

@ccli8
Contributor Author

ccli8 commented Jan 19, 2018

> I can't find the specification of the ECP H/W accelerator

@hanno-arm Nuvoton's M480 series has not been released yet (expected around 3/E). Do you strongly need the ECP H/W spec for the review?
@0xc0170

ccli8 added 2 commits January 22, 2018 10:51
1. Add comment for unnecessary parameter 'n' in mbedtls_internal_run_eccop
2. Fix warning message with goto which causes `bypass initialization`
3. Fix comment
* n is kept with unused modifier.
*
*/
int mbedtls_internal_run_eccop(const mbedtls_ecp_group *grp,

@hanno-becker hanno-becker Feb 2, 2018

If it's meant to be an internal function, please make it static and remove the mbedtls_ prefix.

Contributor Author

@hanno-arm I removed the mbedtls_ prefix from internal functions in df76e29. For Nuvoton's self-test of the ECP alternative module, I added a macro, NU_CRYPTO_SELF_TEST, that conditionally removes the static modifier, in 2525352.

MBEDTLS_INTERNAL_MPI_NORM(&Np, *o2, N_, *p);
MBEDTLS_MPI_CHK(mbedtls_internal_mpi_write_eccreg(Np, (uint32_t *) CRPT->ECC_Y1, NU_ECC_BIGNUM_MAXWORD));
} else if (modop == MODOP_DIV) {
MBEDTLS_INTERNAL_MPI_NORM(&Np, *o2, N_, *p);

What happens if the divisor is 0? Do we need to check for it or will we get a graceful failure?

Contributor Author

@hanno-arm I added a zero-divisor check in c9cc357.
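The actual check in c9cc357 is not shown in this thread. As a rough illustration of the idea, the toy sketch below rejects a divisor that is congruent to 0 modulo a prime p before attempting the division. It uses 32-bit integers instead of MPIs and computes the inverse in software via Fermat's little theorem; all names are hypothetical.

```c
#include <stdint.h>

/* Toy modular exponentiation: base^exp mod p, by square-and-multiply. */
static uint32_t toy_mod_pow(uint32_t base, uint32_t exp, uint32_t p)
{
    uint64_t result = 1 % p;
    uint64_t b = base % p;
    while (exp > 0) {
        if (exp & 1) {
            result = (result * b) % p;
        }
        b = (b * b) % p;
        exp >>= 1;
    }
    return (uint32_t) result;
}

/* Toy modular division x / y mod p, for prime p (assumption).
 * Returns 0 and writes the quotient to *out, or -1 if y is 0 mod p,
 * in which case no inverse exists and the operation is never started. */
static int toy_mod_div(uint32_t x, uint32_t y, uint32_t p, uint32_t *out)
{
    if (y % p == 0) {
        return -1;                               /* zero divisor: fail early */
    }
    uint32_t inv = toy_mod_pow(y, p - 2, p);     /* y^(p-2) = y^(-1) mod p */
    *out = (uint32_t) (((uint64_t) (x % p) * inv) % p);
    return 0;
}
```

The point of the guard is that the failure is reported to the caller instead of being left to whatever the hardware does with an undefined input.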

*
* N_: Holds normalized MPI if the passed-in MPI N1 is not
* Np: Pointer to normalized MPI, which could be N1 or N_
*/

I don't see the benefit of this macro compared to performing a modular reduction on the respective variable in-place. On the functional side it's the same, on the performance side it has the chance to be less costly because it is not enforced that the bignum module will need to make a copy, and it would be more readable (for my taste). Could you elaborate why you chose this path, or change MBEDTLS_INTERNAL_MPI_NORM(a,b,c,d) to MBEDTLS_INTERNAL_MPI_NORM(x,N) performing a modular reduction on x mod N in-place if necessary?

Looking at it, I think you can even reduce this to a call to mbedtls_mpi_mod_mpi( x, x, N ) everywhere, as the latter function checks if already x < N before performing any expensive operations.

Oh, I see, many MPIs are unfortunately declared const in the fixed API, so we are not allowed to perform the modular reduction in-place. That makes the necessity and motivation for the macro clearer.
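The pattern behind the macro (use the const input directly when it is already reduced, otherwise reduce into scratch storage and point at that) can be sketched with plain integers. This is a toy stand-in, not the macro from the PR: 32-bit values replace MPIs and the function name is hypothetical.

```c
#include <stdint.h>

/* Toy stand-in for the N_/Np pattern. Returns `x` itself when *x is already
 * in [0, n), otherwise stores *x mod n in `scratch` and returns `scratch`.
 * The const input is never modified. Requires n != 0. */
static const uint32_t *norm_or_copy(const uint32_t *x, uint32_t n,
                                    uint32_t *scratch)
{
    if (*x < n) {
        return x;             /* already reduced: no copy needed */
    }
    *scratch = *x % n;        /* reduce into caller-provided storage */
    return scratch;
}
```

The caller then reads through the returned pointer and does not need to know which of the two cases applied.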

CRPT->ECC_CTL = (pbits << CRPT_ECC_CTL_CURVEM_Pos) | (ECCOP_MODULE | modop) | CRPT_ECC_CTL_FSEL_Msk | CRPT_ECC_CTL_START_Msk;
ecc_done = crypto_ecc_wait();

/* FIXME: Better error code for ECC accelerator error */

An error code for HW acceleration has recently been introduced in Mbed-TLS/mbedtls#1303, but it's not yet in Mbed OS.

Contributor Author

@hanno-arm OK. I'll leave it unchanged for now.

@hanno-becker hanno-becker left a comment

The code in this PR looks good to me. However, it is not entirely clear at which level the PR seeks to provide the alternative ECP implementation. I will go into detail now, recalling first the two ways of providing alternative ECP implementations:

  1. Provide a full re-implementation of the entire ECP module.
    This is enabled with MBEDTLS_ECP_ALT and mandates re-implementing the entire ECP API. This option is usually suitable if the accelerator provides high-level ECP functionality like point-multiplication.

  2. Provide a per-function re-implementation of parts of the ECP module.
    This option is what the PR currently uses: a set of functions is picked in the config for re-implementation, and only those need to be provided. These functions are the low-level building blocks at the heart of the ECP comb multiplication method: mixed addition, point doubling, modular arithmetic. This kind of replacement can be used to speed up the building blocks while keeping the overall multiplication strategy.
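In configuration terms, the two options roughly correspond to the fragment below. The macro names are the ones I recall from the Mbed TLS 2.x config.h; which per-function macros this PR actually enables is not shown in the thread, so the selection here is only an example.

```c
/* Option 1: replace the whole ECP module (left disabled in this sketch). */
/* #define MBEDTLS_ECP_ALT */

/* Option 2: keep ecp.c and replace selected internal building blocks. */
#define MBEDTLS_ECP_INTERNAL_ALT
#define MBEDTLS_ECP_ADD_MIXED_ALT       /* mixed Jacobian/affine addition */
#define MBEDTLS_ECP_DOUBLE_JAC_ALT      /* Jacobian point doubling */
#define MBEDTLS_ECP_NORMALIZE_JAC_ALT   /* Jacobian to affine normalization */
```

With option 2, the default comb multiplication in ecp.c stays in charge and only calls out to the replaced building blocks.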

This PR in a sense does both: it provides the main high-level entry point to the ECP module, namely mbedtls_ecp_mul_jac, but this function is never called or used! That is because overriding this function requires a full replacement of the ECP module, i.e. alternative (1). But the PR also provides the low-level replacements for doubling, mixed addition etc., which, as mentioned, look good and seem to be working.

I am neither approving nor disapproving at this stage, but only want to emphasize that currently it is not clear which path this PR wants to take: The accelerator is capable of accelerating the high-level point multiplication, and code for this is provided, but it is not used.

@ccli8 Could you clarify the path you want to take, and adapt the PR accordingly? This means that if you want full replacement, the fine-grained rewriting is not necessary, and if you want fine-grained replacement, then the high-level mbedtls_internal_ecp_mul_jac should be removed.

@ccli8
Contributor Author

ccli8 commented Feb 6, 2018

@hanno-arm Actually, I would like to go with alternative (2) plus accelerated point multiplication. ecp.c has been thoroughly tested, so I can rely on it and focus on the accelerator logic. In my tests with alternative (2) on Nuvoton's ECP accelerator, point multiplication improves by 30X~40X, but point addition and point doubling each improve by only about 2X. I would very much appreciate it if you could also open up a point multiplication acceleration interface in alternative (2). Anyway, I have removed mbedtls_internal_ecp_mul in 95d4110 for now.

@hanno-becker

@ccli8 Please be aware that if your accelerator provides the entire ECP multiplication routine, there's no need for writing acceleration code for the low-level subroutines - these are only used by our default S/W implementation of ECP multiplication. So if high-level acceleration is what you want in the end, you should be able to provide a rather small full-module replacement of ECP without too much work: apart from numerous formatting functions like mbedtls_ecp_tls_read_group that you can copy over, the only public functions related to EC arithmetic are mbedtls_ecp_mul and mbedtls_ecp_muladd, and you have direct acceleration for these primitives it seems.
It is unlikely that there will be functionality for replacing only mbedtls_internal_ecp_mul, as you requested; instead, full-module replacement is what should be used in this case.

@ccli8
Contributor Author

ccli8 commented Feb 8, 2018

@hanno-arm OK. Please let this PR keep going. I'll consider full-module replacement. If that works out, I'll send another PR to replace the current one.

/* Implementation that should never be optimized out by the compiler */
void crypto_zeroize32(uint32_t *v, size_t n)
{
volatile uint32_t *p = (uint32_t*) v;

This should probably be (uint32_t volatile *) v.

Contributor Author

@hanno-arm I fixed it in cfdc72d.

Thanks!
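For reference, a zeroize routine with the cast written as the review suggests might look like the sketch below. The loop body is an assumption, since only the first lines of crypto_zeroize32 are quoted in this thread, and the function is renamed to make clear it is not the PR's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch only: writes n zero words through a volatile-qualified pointer so
 * the compiler cannot drop the stores as dead writes. */
void zeroize32_sketch(uint32_t *v, size_t n)
{
    volatile uint32_t *p = (volatile uint32_t *) v;
    while (n--) {
        *p++ = 0;
    }
}
```

Casting to `(uint32_t *)` and then assigning to a volatile pointer also compiles, but putting volatile in the cast states the intent at the point of conversion.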

*
* crypto_zeroize32() has excluded optimization doubt, so we can safely set H/W registers to 0 via it.
*/
crypto_zeroize32((uint32_t *) eccreg + i, eccreg_num - i);

Could you change this to n instead of i, please?

Contributor Author

@hanno-arm I fixed it in 03f0ea1.

@hanno-becker hanno-becker left a comment

Only two minor suggestions left - overall the PR looks good to me. Usual caveat: I haven't done any on-target testing.

@hanno-becker

@mazimkhan Could you give this a final look, too?

@hanno-becker

@cmonr @0xc0170 Ready to go.

@0xc0170
Contributor

0xc0170 commented Feb 20, 2018

/morph build

@mbed-ci

mbed-ci commented Feb 20, 2018

Build : SUCCESS

Build number : 1184
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/5812/

Triggering tests

/morph test
/morph uvisor-test
/morph export-build
/morph mbed2-build

@mbed-ci

mbed-ci commented Feb 20, 2018

@mbed-ci

mbed-ci commented Feb 20, 2018

@cmonr cmonr merged commit 817f9a5 into ARMmbed:master Feb 20, 2018
@ccli8 ccli8 deleted the nuvoton_crypto branch February 21, 2018 01:45