Case-Insensitive Search On The Oracle Database
As you know, validating input is crucial for keeping data correct. I have seen
many applications that do not handle user input properly. For example, I remember
an application in which the user could enter a character where a number was
expected, and this corrupted the data.
Another example is data that looks like a number but is not one, such as phone
numbers that start with zero. If such data is stored as a number, the leading
zeros are lost. This gets worse when the number of leading zeros is meaningful,
or when the length of the string is not fixed and we cannot tell how many zeros
were there originally. In that case, the values 000123, 00123, 0123, and 123 are
all treated as one value (123). Sometimes we have to work with data that is
already corrupted, or correct it after the application has been changed to handle
input properly. In this article, I explain one of the most common issues related
to working with such data: we want to run a case-insensitive search on our
existing data. Suppose that we have this table to keep our users' information:
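The table definition itself is not reproduced here. A minimal sketch that is consistent with the queries in this article might look like the following. Only the table name test_case_tbl and the username column come from those queries; the user_id column, the column sizes, and the constraint name are assumptions made for illustration:

```sql
-- Hypothetical layout: only test_case_tbl and username appear in the article.
create table test_case_tbl (
  user_id   number        primary key,           -- assumed surrogate key
  username  varchar2(30)  not null,              -- assumed length
  constraint test_case_tbl_uk unique (username)  -- uniqueness is case-sensitive
);
```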
Clearly, username must be unique, and a unique constraint enforces this.
However, the constraint compares values case-sensitively, so 16 people were able
to register usernames that differ only in letter case, which is not acceptable
for the business:
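The original rows are not listed here. One way to reproduce the situation is sketched below. It assumes that the 16 usernames are the 2^4 upper/lower-case spellings of "amir" and that the table has a numeric user_id column; both are assumptions, not taken from the original data:

```sql
-- Assumption: the 16 rows are all case variants of 'amir' (2^4 = 16).
-- user_id is a hypothetical key column; adjust to the real table layout.
insert into test_case_tbl (user_id, username)
select n + 1,
       decode(bitand(n, 8), 0, 'a', 'A') ||
       decode(bitand(n, 4), 0, 'm', 'M') ||
       decode(bitand(n, 2), 0, 'i', 'I') ||
       decode(bitand(n, 1), 0, 'r', 'R')
from  (select level - 1 as n from dual connect by level <= 16);
```

Each bit of the counter n decides whether one letter is lower or upper case, so n = 0 gives "amir" and n = 15 gives "AMIR". All 16 values are distinct for a case-sensitive unique constraint, which is why the inserts succeed.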
We insert another 10,000 rows for other users so that we can have a better and more
realistic execution plan comparison:
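A possible way to generate the filler rows is a row generator on dual. This is only a sketch: the user_id column, the offset of 16, and the 'user_' prefix are illustrative assumptions:

```sql
-- Filler rows; user_id and the 'user_' prefix are illustrative assumptions.
insert into test_case_tbl (user_id, username)
select 16 + level, 'user_' || level
from   dual
connect by level <= 10000;
```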
SQL> commit;
Now, after a while, for example during an ETL operation, we discover that we
have 16 users with usernames similar to "amir" and that our reporting application
cannot handle them correctly. This query returns only one of them:
SQL> select * from test_case_tbl
where username = 'amir';
So, the request is to report all usernames equal to "amir" regardless of uppercase or
lowercase letters.
Solution 1
An easy approach is to normalize the letter case with one of the SQL character
functions ("lower", "upper", or "initcap"), for example:
SQL> select * from test_case_tbl where lower(username) = 'amir';
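The other two character functions work the same way. The only difference is that the literal on the right must be written in the letter case that the function produces. A sketch of the equivalent predicates:

```sql
-- upper() folds every letter to upper case, so compare with 'AMIR'
select * from test_case_tbl where upper(username) = 'AMIR';

-- initcap() capitalizes the first letter of each word, so compare with 'Amir'
select * from test_case_tbl where initcap(username) = 'Amir';
```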
Although these methods return the desired result, they apply a function to the
column, so the existing index on username cannot be used and the query results in
a "full table scan". Therefore, we can create a function-based index to address
this problem:
SQL> create index USERNAME_IDX1 on test_case_tbl(lower(username));
Please keep in mind that we created this index only for the "lower" function. If
developers use either of the other two functions ("upper" or "initcap"), we will
face a "Full Table Scan" again.
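To check which predicates can use the index, the execution plans can be compared with EXPLAIN PLAN and DBMS_XPLAN.DISPLAY. This is only a sketch; the plan the optimizer actually chooses depends on the statistics and on the database version:

```sql
-- Predicate that matches the indexed expression lower(username)
explain plan for
  select * from test_case_tbl where lower(username) = 'amir';
select * from table(dbms_xplan.display);

-- Predicate on upper(username): USERNAME_IDX1 does not cover this expression
explain plan for
  select * from test_case_tbl where upper(username) = 'AMIR';
select * from table(dbms_xplan.display);
```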
Solution 2
Starting with Oracle Database version 12.2, there is another solution, the
COLLATE operator:
SQL> select * from test_case_tbl
where username collate binary_ci = 'amir';
Note that you can write the string in any letter case on the right of the equal
sign:
SQL> select * from test_case_tbl
where username collate binary_ci = 'AmiR';
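As with Solution 1, it is worth checking the row count and the execution plan for this query. The sketch below assumes the 16 case variants of "amir" described earlier. The index name USERNAME_IDX2 is hypothetical, and the NLSSORT-based linguistic index is an assumption about what may help here: whether the optimizer uses it for the COLLATE predicate should be confirmed from the plan output rather than taken for granted:

```sql
-- Should count every spelling of "amir", whatever the letter case
select count(*) from test_case_tbl
where  username collate binary_ci = 'amir';

-- Hypothetical linguistic index for the BINARY_CI collation
create index USERNAME_IDX2
  on test_case_tbl (nlssort(username, 'NLS_SORT=BINARY_CI'));

-- Inspect the plan to see whether the index is used
explain plan for
  select * from test_case_tbl
  where  username collate binary_ci = 'amir';
select * from table(dbms_xplan.display);
```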
Good Luck
Amirreza Rastandeh