lithomas1
Contributor

@lithomas1 lithomas1 commented Mar 13, 2023

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

This gives a little more than a 10% improvement. I'm not posting asv results, since the other benchmarks are really flaky.

The idea here is that we can reject on the first non-string entry, whereas infer_dtype goes through the entire array, wasting comparisons.
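
To make the difference concrete, here is a rough pure-Python sketch of the two strategies. It is not the pandas code: `all_strings_early_exit` and `all_strings_full_scan` are hypothetical names invented for illustration, and the full-scan version only mimics the "look at every element" behaviour described above rather than reproducing infer_dtype.

```python
from typing import Any, Sequence, Tuple


def all_strings_early_exit(values: Sequence[Any]) -> Tuple[bool, int]:
    """Hypothetical helper: stop at the first non-string entry.

    Returns (is_all_strings, number_of_elements_inspected).
    """
    inspected = 0
    for value in values:
        inspected += 1
        if not isinstance(value, str):
            # One non-string is enough to reject; no need to look further.
            return False, inspected
    return True, inspected


def all_strings_full_scan(values: Sequence[Any]) -> Tuple[bool, int]:
    """Hypothetical helper: classify every element before answering.

    This stands in for a general-purpose inference pass that collects
    the kinds of all elements and only then decides.
    """
    inspected = 0
    kinds = set()
    for value in values:
        inspected += 1
        kinds.add(type(value))
    return kinds <= {str}, inspected


# A mixed array whose very first entry is not a string.
mixed = [1] + ["a"] * 999
early_result = all_strings_early_exit(mixed)
full_result = all_strings_full_scan(mixed)
print(early_result, full_result)
```

Both helpers agree on the answer, but on an array like `mixed` the early-exit version inspects one element while the full scan inspects all of them. When every entry is a string, the two do the same amount of work.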

@lithomas1 lithomas1 force-pushed the faster-factorize-object branch from 8712ed3 to ed77ca3 Compare March 13, 2023 00:29
@lithomas1 lithomas1 added the Performance (memory or execution speed performance) and Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff) labels Mar 13, 2023
@lithomas1 lithomas1 marked this pull request as ready for review March 13, 2023 02:13
Contributor

@topper-123 topper-123 left a comment

Nice. LGTM.

@topper-123 topper-123 added this to the 2.1 milestone Mar 13, 2023
@mroeschke mroeschke merged commit f122e2e into pandas-dev:main Mar 13, 2023
@mroeschke
Member

Thanks @lithomas1

@lithomas1 lithomas1 deleted the faster-factorize-object branch March 14, 2023 12:15