
[TF FE]: Fix accuracy issue for Prod on complex tensors on ARM #26601

Open
rkazants opened this issue Sep 14, 2024 · 8 comments
Assignees
Labels
category: TF FE OpenVINO TensorFlow FrontEnd no_stale Do not mark as stale

Comments

@rkazants
Contributor

Context

FAILED install/tests/layer_tests/tensorflow_tests/test_tf_ReduceArithmeticOps.py::TestComplexProd::test_reduce[ ie_device:CPU - precision:FP16 - keep_dims:True - params:{'shape': [2, 3, 5], 'axis': 1} ] - AssertionError: Comparing with Framework failed: ie_res={'Imag:0': array([[[ 219.75 , 480. , 372.25 , -128. , -414.25 ]],

   [[-854.     ,   41.84375,  583.5    ,  145.625  ,  471.     ]]],
  dtype=float32), 'Real:0': array([[[ 2.0012500e+02,  2.3217773e-01,  2.4625000e+02, -6.6000000e+01,
     -2.8175000e+02]],

   [[-2.7725000e+02,  8.8050000e+02, -5.8500000e+02, -2.8425000e+02,
     -1.7150000e+02]]], dtype=float32)}; framework_res={'Real:0': array([[[ 200.,    0.,  246.,  -66., -282.]],

   [[-275.,  881., -584., -284., -171.]]], dtype=float32), 'Imag:0': array([[[ 220.,  480.,  372., -128., -414.]],

   [[-855.,   42.,  584.,  146.,  471.]]], dtype=float32)}.

FAILED install/tests/layer_tests/tensorflow_tests/test_tf_ReduceArithmeticOps.py::TestComplexProd::test_reduce[ ie_device:CPU - precision:FP16 - keep_dims:False - params:{'shape': [2, 3, 5], 'axis': 1} ] - AssertionError: Comparing with Framework failed: ie_res={'Imag:0': array([[-550. , 522.5 , -83.75 , -44.65625, 70.0625 ],
[-236.125 , -34.6875 , 27.6875 , -134.25 , -246.875 ]],
dtype=float32), 'Real:0': array([[ -24.453125, -353.75 , -340.5 , -129.125 , -140. ],
[-302.25 , 149.125 , 1089. , -555. , 13.15625 ]],
dtype=float32)}; framework_res={'Real:0': array([[ -25., -354., -340., -129., -140.],
[-302., 149., 1090., -555., 13.]], dtype=float32), 'Imag:0': array([[-550., 522., -85., -45., 70.],
[-236., -35., 30., -135., -247.]], dtype=float32)}.
FAILED install/tests/layer_tests/tensorflow_tests/test_tf_ReduceArithmeticOps.py::TestComplexProd::test_reduce[ ie_device:CPU - precision:FP16 - keep_dims:False - params:{'shape': [3, 1, 2, 4], 'axis': -2} ] - AssertionError: Comparing with Framework failed: ie_res={'Imag:0': array([[[ 42. , 7. , -74. , -21. ]],

   [[  7.9140625,  45.03125  ,  46.       ,  76.       ]],

   [[ 28.03125  ,  41.90625  ,   8.9765625,  77.9375   ]]],
  dtype=float32), 'Real:0': array([[[-14.        ,   0.98046875,  -0.10742188, 105.0625    ]],

   [[ 62.96875   , -34.9375    ,  29.96875   ,  -2.1894531 ]],

   [[ 81.        , -66.0625    ,  32.03125   ,  54.0625    ]]],
  dtype=float32)}; framework_res={'Real:0': array([[[-14.,   1.,   0., 105.]],

   [[ 63., -35.,  30.,  -2.]],

   [[ 81., -66.,  32.,  54.]]], dtype=float32), 'Imag:0': array([[[ 42.,   7., -74., -21.]],

   [[  8.,  45.,  46.,  76.]],

   [[ 28.,  42.,   9.,  78.]]], dtype=float32)}.

= 3 failed, 3507 passed, 83 skipped, 68 xfailed, 481 xpassed, 50 warnings in 46.50s =

What needs to be done?

Fix the accuracy divergence between OpenVINO and TensorFlow for Prod on complex tensors so that the failing tests above pass on ARM.

Example Pull Requests

No response

Resources

Contact points

@openvinotoolkit/openvino-tf-frontend-maintainers
@rkazants

Ticket

No response

@rkazants rkazants added good first issue Good for newcomers category: TF FE OpenVINO TensorFlow FrontEnd no_stale Do not mark as stale labels Sep 14, 2024
@rkazants
Contributor Author

@hub-bla, please take a look.

@hub-bla
Contributor

hub-bla commented Sep 14, 2024

Hi @rkazants, the current formula used by the translator requires many steps to produce the output, which is the root cause of the inaccuracies.

I believe there is no other way of doing this with the current OpenVINO ops.

I think the best idea is to create a new version of the ReduceProd operation that applies the product of two complex numbers based on the following formula: (a + ib)(c + id) = (ac - bd) + i(ad + bc). (That's also how other frameworks like TensorFlow handle this kind of operation.)

Since there are already reduce operations in OpenVINO ops, extending them with this one would not be hard and would be really beneficial, because it might also be used elsewhere in the future.
Sorry for the inconvenience!
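
For a quick sanity check of that formula, here is a tiny Python snippet (illustrative only, not OpenVINO code) verifying the real and imaginary parts of one product against the built-in complex arithmetic:

    a, b = 2.0, 3.0          # first factor:  2 + 3i
    c, d = -1.0, 4.0         # second factor: -1 + 4i
    real = a * c - b * d     # ac - bd = -14
    imag = a * d + b * c     # ad + bc = 5
    assert complex(real, imag) == (a + 1j * b) * (c + 1j * d)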

@cmagapu

cmagapu commented Sep 15, 2024

@rkazants I'm new to Open Source and would like to contribute. Can I work on this?

@rkazants
Contributor Author

@cmagapu, hi! I do not recommend this one for newcomers. Instead, please take a look at the new JAX good first issues if you wish. They are much easier.

Best regards,
Roman

@rkazants
Contributor Author

rkazants commented Sep 15, 2024

Hi @rkazants, the current formula used by the translator requires many steps to produce the output, which is the root cause of the inaccuracies.

I believe there is no other way of doing this with the current OpenVINO ops.

I think the best idea is to create a new version of the ReduceProd operation that applies the product of two complex numbers based on the following formula: (a + ib)(c + id) = (ac - bd) + i(ad + bc). (That's also how other frameworks like TensorFlow handle this kind of operation.)

Since there are already reduce operations in OpenVINO ops, extending them with this one would not be hard and would be really beneficial, because it might also be used elsewhere in the future. Sorry for the inconvenience!

Don't we have a problem with overflow of the angle, for example, in the current implementation?
Btw, do you reproduce this issue on your CPU? Can it be an ARM-specific problem in our CPU plugin?
We have many other operations decomposed into several steps.
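
For reference, a polar-form decomposition (which the question about angle overflow suggests is what the current translator does) looks roughly like the NumPy sketch below; the function name and details are illustrative, not the actual translator code:

    import numpy as np

    def complex_reduce_prod_polar(real, imag, axis):
        # |prod z_k| = prod |z_k|, arg(prod z_k) = sum arg(z_k)
        magnitude = np.sqrt(real * real + imag * imag)
        angle = np.arctan2(imag, real)
        prod_mag = np.prod(magnitude, axis=axis)
        sum_angle = np.sum(angle, axis=axis)   # the summed angle can leave [-pi, pi]
        return prod_mag * np.cos(sum_angle), prod_mag * np.sin(sum_angle)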

@hub-bla
Contributor

hub-bla commented Sep 15, 2024

Don't we have a problem with overflow of angle for example in the current implementation?

I did try applying a modulo operation to the angle, but it did not change the result.

Btw, do you reproduce this issue on your CPU? Can it be arm specific problem in our CPU plugin?

I do not have an ARM processor, but I did get to the point where the translator throws an AssertionError on my CPU here. Also, as far as I remember, we also had an accuracy issue on GPU.

We have many other operations decomposed into several steps.

I think the prod operation is quite a special case, because even slight errors compound during the multiplication.

I did not find another translator with a similar problem, other than Rsqrt, where I already changed the formula once.
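
To illustrate that compounding effect, here is a small NumPy experiment (illustrative only, not taken from the test suite) that multiplies a few hundred FP16 values step by step, the way a decomposed reduction would:

    import numpy as np

    rng = np.random.default_rng(0)
    vals16 = rng.uniform(0.9, 1.1, size=256).astype(np.float16)

    exact = vals16.astype(np.float64).prod()   # double-precision reference
    acc = np.float16(1.0)
    for v in vals16:
        acc = acc * v                          # product is rounded back to FP16 after every multiply

    rel_err = abs(float(acc) - exact) / abs(exact)
    print(exact, float(acc), rel_err)          # the accumulated error is typically much larger than a single FP16 rounding step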

@rkazants rkazants changed the title [Good First Issue][TF FE]: Fix accuracy issue for Prod on complex tensors on ARM [TF FE]: Fix accuracy issue for Prod on complex tensors on ARM Sep 15, 2024
@rkazants rkazants removed the good first issue Good for newcomers label Sep 15, 2024
@rkazants
Contributor Author

Hi @hub-bla,

I propose implementing a decomposition that uses the Loop operation to iteratively multiply the complex tensors along the reduction axes, and replacing the existing solution with it.
If we find a performance issue with such a case on a real model, we will plan a separate operation in our opset.

What do you think?

Best regards,
Roman
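
A rough NumPy equivalent of the proposed iteration scheme (the Loop body carries the accumulated real and imaginary parts and multiplies in one slice per iteration; names are illustrative, not the translator code):

    import numpy as np

    def complex_reduce_prod(real, imag, axis):
        # start from the multiplicative identity 1 + 0i
        acc_re = np.ones(np.delete(real.shape, axis))
        acc_im = np.zeros(np.delete(imag.shape, axis))
        for i in range(real.shape[axis]):      # one Loop iteration per slice
            c = np.take(real, i, axis=axis)
            d = np.take(imag, i, axis=axis)
            acc_re, acc_im = acc_re * c - acc_im * d, acc_re * d + acc_im * c
        return acc_re, acc_im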

@hub-bla
Contributor

hub-bla commented Sep 18, 2024

Oops, I guess I overlooked the Loop operation when I was looking for another solution. Thank you for pointing that out. ;)

I'm currently trying to implement this. However, I could use a hint.

    default_op_checks(node, 2, {"Prod"}, true);
    auto input = node.get_input(0);
    auto axis = node.get_input(1);
    auto keep_dims = node.get_attribute<bool>("keep_dims", false);

    auto complex_type_mark = as_type_ptr<ComplexTypeMark>(input.get_node_shared_ptr());

    if (complex_type_mark) {
        element::Type complex_part_type = complex_type_mark->get_complex_part_type();
        input = complex_type_mark->input_value(0);

        auto gather_index_real = make_shared<v0::Constant>(element::i64, Shape{}, 0);
        auto gather_index_imag = make_shared<v0::Constant>(element::i64, Shape{}, 1);
        auto minus_one = make_shared<v0::Constant>(element::i32, Shape{1}, -1);

        Output<Node> real_part = make_shared<v8::Gather>(input, gather_index_real, minus_one);
        Output<Node> imag_part = make_shared<v8::Gather>(input, gather_index_imag, minus_one);

        Output<Node> example_real_part = std::make_shared<v1::ReduceProd>(real_part, axis, keep_dims);

        Output<Node> real_part_shape = std::make_shared<v0::ShapeOf>(example_real_part);

        auto const_one = create_same_type_const_scalar<float>(real_part, 1);
        auto init_reduced_real = std::make_shared<v1::Broadcast>(const_one, real_part_shape);

        auto const_zero = create_same_type_const_scalar<float>(imag_part, 0);
        auto init_reduced_imag = std::make_shared<v1::Broadcast>(const_zero, real_part_shape);


        auto reduced_real_input = std::make_shared<v0::Parameter>(real_part.get_element_type(), example_real_part.get_partial_shape());
        auto reduced_imag_input = std::make_shared<v0::Parameter>(imag_part.get_element_type(), example_real_part.get_partial_shape());
        auto gather_idx = std::make_shared<v0::Parameter>(element::i64, Shape{});


        auto gather_init = std::make_shared<v0::Constant>(element::i64, Shape{}, 0);
        auto gather_increment =  std::make_shared<v0::Constant>(element::i64, Shape{}, 1);

        auto real_to_compute = std::make_shared<v8::Gather>(real_part, gather_idx, axis);
        auto imag_to_compute = std::make_shared<v8::Gather>(imag_part, gather_idx, axis);

        // (a + ib)(c + id) = (ac - bd) + i(ad + bc)

        auto ac = make_shared<v1::Multiply>(reduced_real_input, real_to_compute);

        auto bd = make_shared<v1::Multiply>(reduced_imag_input, imag_to_compute);

        auto ad = make_shared<v1::Multiply>(reduced_real_input, imag_to_compute);

        auto bc = make_shared<v1::Multiply>(reduced_imag_input, real_to_compute);

        auto new_real = make_shared<v1::Subtract>(ac, bd);
        auto new_imag = make_shared<v1::Add>(ad, bc);
        auto new_gather_idx = make_shared<v1::Add>(gather_idx, gather_increment);

        auto trip_count = std::make_shared<v0::Constant>(element::i32, Shape{}, 11);  // placeholder: should match the dimension size along the reduction axis
        auto exec_condition = std::make_shared<v0::Constant>(element::boolean, Shape{}, true);

        auto loop = std::make_shared<v5::Loop>(trip_count, exec_condition);
        auto body =
                std::make_shared<Model>(OutputVector{new_real, new_imag, new_gather_idx}, ParameterVector{reduced_real_input, reduced_imag_input, gather_idx});

        loop->set_function(body);
        loop->set_merged_input(gather_idx, gather_init, new_gather_idx);
        loop->set_merged_input(reduced_real_input, init_reduced_real, new_real);
        loop->set_merged_input(reduced_imag_input, init_reduced_imag, new_imag);

//        auto real_output = loop->get_iter_value(new_real, -1);
//        auto imag_output = loop->get_iter_value(new_imag, -1);
//        example_real_part
        auto real_output = example_real_part;
        auto imag_output = example_real_part;
        auto real_unsqueeze = make_shared<v0::Unsqueeze>(real_output, minus_one);
        auto imag_unsqueeze = make_shared<v0::Unsqueeze>(imag_output, minus_one);

        auto concat_result = make_shared<v0::Concat>(OutputVector{real_unsqueeze, imag_unsqueeze}, -1);

        set_node_name(node.get_name(), concat_result);

        auto complex_result = make_shared<ComplexTypeMark>(concat_result, complex_part_type);
        return {complex_result};
    }

I'm currently getting this error at the body part of the code, even though it is not used as an output for now:

        auto body =
                std::make_shared<Model>(OutputVector{new_real, new_imag, new_gather_idx}, ParameterVector{reduced_real_input, reduced_imag_input, gather_idx});

Screenshot from 2024-09-18 09-48-31

I don't quite understand why it throws the error about the complex type mark when it has already been taken off at the top of the if statement. Do I need to mark the parameters of the body as complex type?

Thank you in advance!
