Support Binary --> Utf8View casting #6531

Closed
alamb opened this issue Oct 9, 2024 · 2 comments · Fixed by #6592
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog

Comments

@alamb
Contributor

alamb commented Oct 9, 2024

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
While working on apache/datafusion#12788 with StringView upstream in DataFusion, @goldmedal found that casting from BinaryArray --> Utf8View is not supported.

Describe the solution you'd like
Support casting BinaryArray --> Utf8View

Describe alternatives you've considered
Here is a modified test_binary_to_view test that should pass once the cast is supported:

    #[test]
    fn test_binary_to_view() {
        _test_binary_to_view::<i32>();
        _test_binary_to_view::<i64>();
    }

    fn _test_binary_to_view<O>()
    where
        O: OffsetSizeTrait,
    {
        let binary_array = GenericBinaryArray::<O>::from_iter(VIEW_TEST_DATA);

        assert!(can_cast_types(
            binary_array.data_type(),
            &DataType::Utf8View
        ));

        assert!(can_cast_types(
            binary_array.data_type(),
            &DataType::BinaryView
        ));

        let string_view_array = cast(&binary_array, &DataType::Utf8View).unwrap();
        assert_eq!(string_view_array.data_type(), &DataType::Utf8View);

        let binary_view_array = cast(&binary_array, &DataType::BinaryView).unwrap();
        assert_eq!(binary_view_array.data_type(), &DataType::BinaryView);

        let expect_string_view_array = StringViewArray::from_iter(VIEW_TEST_DATA);
        assert_eq!(string_view_array.as_ref(), &expect_string_view_array);

        let expect_binary_view_array = BinaryViewArray::from_iter(VIEW_TEST_DATA);
        assert_eq!(binary_view_array.as_ref(), &expect_binary_view_array);
    }
Full Diff

diff --git a/arrow-cast/src/cast/mod.rs b/arrow-cast/src/cast/mod.rs
index e3fad3da19..f147a9c3f6 100644
--- a/arrow-cast/src/cast/mod.rs
+++ b/arrow-cast/src/cast/mod.rs
@@ -5523,7 +5523,7 @@ mod tests {
     }

     #[test]
-    fn test_bianry_to_view() {
+    fn test_binary_to_view() {
         _test_binary_to_view::<i32>();
         _test_binary_to_view::<i64>();
     }
@@ -5534,14 +5534,25 @@ mod tests {
     {
         let binary_array = GenericBinaryArray::<O>::from_iter(VIEW_TEST_DATA);

+        assert!(can_cast_types(
+            binary_array.data_type(),
+            &DataType::Utf8View
+        ));
+
         assert!(can_cast_types(
             binary_array.data_type(),
             &DataType::BinaryView
         ));

+        let string_view_array = cast(&binary_array, &DataType::Utf8View).unwrap();
+        assert_eq!(string_view_array.data_type(), &DataType::Utf8View);
+
         let binary_view_array = cast(&binary_array, &DataType::BinaryView).unwrap();
         assert_eq!(binary_view_array.data_type(), &DataType::BinaryView);

+        let expect_string_view_array = StringViewArray::from_iter(VIEW_TEST_DATA);
+        assert_eq!(string_view_array.as_ref(), &expect_string_view_array);
+
         let expect_binary_view_array = BinaryViewArray::from_iter(VIEW_TEST_DATA);
         assert_eq!(binary_view_array.as_ref(), &expect_binary_view_array);
     }

The test currently fails with:

assertion failed: can_cast_types(binary_array.data_type(), &DataType::Utf8View)
thread 'cast::tests::test_binary_to_view' panicked at arrow-cast/src/cast/mod.rs:5537:9:
assertion failed: can_cast_types(binary_array.data_type(), &DataType::Utf8View)
stack backtrace:
   0: rust_begin_unwind
             at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/panicking.rs:665:5
   1: core::panicking::panic_fmt
             at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/panicking.rs:74:14
   2: core::panicking::panic
             at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/panicking.rs:148:5
   3: arrow_cast::cast::tests::_test_binary_to_view
             at ./src/cast/mod.rs:5537:9
   4: arrow_cast::cast::tests::test_binary_to_view
             at ./src/cast/mod.rs:5527:9
   5: arrow_cast::cast::tests::test_binary_to_view::{{closure}}
             at ./src/cast/mod.rs:5526:29
   6: core::ops::function::FnOnce::call_once
             at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/ops/function.rs:250:5
   7: core::ops::function::FnOnce::call_once
             at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Additional context

@alamb alamb added arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog labels Oct 9, 2024
@tustvold
Contributor

tustvold commented Oct 9, 2024

It should be relatively straightforward to make the cast kernel do this by first casting to StringArray.
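
The two-step idea (Binary --> Utf8 first, then Utf8 --> Utf8View) can be sketched with plain standard-library types. This is a simplified model, not the arrow-cast implementation: `BinaryValue`, `View`, `Utf8ViewModel`, and the three functions are hypothetical names invented for illustration. The `safe` flag is assumed to mirror the documented behaviour of `CastOptions::safe` (invalid values become nulls when it is true, an error otherwise), and the 12-byte inline limit follows the Arrow view layout.

```rust
/// Hypothetical stand-in for one slot of a binary array; `None` is a null.
type BinaryValue = Option<Vec<u8>>;

/// Values of at most this many bytes are stored inline in the view.
const INLINE_LIMIT: usize = 12;

/// Simplified model of a single view (the real layout is a packed 16-byte value).
#[derive(Debug, PartialEq)]
enum View {
    Inline(String),
    Buffered { offset: usize, len: usize },
}

/// Simplified model of a Utf8View array: one view per slot plus a shared buffer.
#[derive(Debug, Default)]
struct Utf8ViewModel {
    views: Vec<Option<View>>,
    buffer: String,
}

impl Utf8ViewModel {
    /// Returns the string at slot `i`, or `None` for a null or out-of-range slot.
    fn get(&self, i: usize) -> Option<&str> {
        match self.views.get(i)? {
            None => None,
            Some(View::Inline(s)) => Some(s.as_str()),
            Some(View::Buffered { offset, len }) => {
                Some(&self.buffer[*offset..*offset + *len])
            }
        }
    }
}

/// Step 1 (Binary --> Utf8): validate every value as UTF-8.
/// With `safe == true` an invalid value becomes a null; otherwise it is an error.
fn binary_to_utf8(values: &[BinaryValue], safe: bool) -> Result<Vec<Option<String>>, String> {
    let mut out = Vec::with_capacity(values.len());
    for (i, value) in values.iter().enumerate() {
        match value {
            None => out.push(None),
            Some(bytes) => match std::str::from_utf8(bytes) {
                Ok(s) => out.push(Some(s.to_string())),
                Err(_) if safe => out.push(None),
                Err(e) => return Err(format!("invalid UTF-8 at index {}: {}", i, e)),
            },
        }
    }
    Ok(out)
}

/// Step 2 (Utf8 --> Utf8View): the input is already valid UTF-8, so this step
/// only decides where each value lives (inline or in the shared buffer).
fn utf8_to_view(values: Vec<Option<String>>) -> Utf8ViewModel {
    let mut out = Utf8ViewModel::default();
    for value in values {
        let view = match value {
            None => None,
            Some(s) if s.len() <= INLINE_LIMIT => Some(View::Inline(s)),
            Some(s) => {
                let offset = out.buffer.len();
                out.buffer.push_str(&s);
                Some(View::Buffered { offset, len: s.len() })
            }
        };
        out.views.push(view);
    }
    out
}

/// The composed cast: validate first, then build the views.
fn binary_to_utf8_view(values: &[BinaryValue], safe: bool) -> Result<Utf8ViewModel, String> {
    let strings = binary_to_utf8(values, safe)?;
    Ok(utf8_to_view(strings))
}
```

Routing through the string representation keeps UTF-8 validation in one place, so the view-building step never has to deal with invalid bytes. Calling `binary_to_utf8_view(&data, true)` on input that contains invalid UTF-8 yields a null in that slot, while passing `false` returns an error instead.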

@ngli-me
Contributor

ngli-me commented Oct 20, 2024

Hi, I saw this issue looked doable and created a PR to add this enhancement. Thanks @tustvold for the hint, it helped me figure out the direction in this large file!
