13.5 String Functions
13.5.1 String Comparison Functions
13.5.2 Regular Expressions
Table 13.7 String Operators
Name Description
ASCII() Return numeric value of left-most character
BIN() Return a string containing binary representation of a number
BIT_LENGTH() Return length of argument in bits
CHAR() Return the character for each integer passed
CHAR_LENGTH() Return number of characters in argument
CHARACTER_LENGTH() Synonym for CHAR_LENGTH()
CONCAT() Return concatenated string
CONCAT_WS() Return concatenated string with separator
ELT() Return string at index number
EXPORT_SET() Return a string such that for every bit set in the value bits, you get an on string and for every unset bit, you get an off string
FIELD() Return the index (position) of the first argument in the subsequent arguments
FIND_IN_SET() Return the index position of the first argument within the second argument
FORMAT() Return a number formatted to specified number of decimal places
FROM_BASE64() Decode a base-64 encoded string and return result
HEX() Return a hexadecimal representation of a decimal or string value
INSERT() Insert a substring at the specified position up to the specified number of characters
INSTR() Return the index of the first occurrence of substring
LCASE() Synonym for LOWER()
LEFT() Return the leftmost number of characters as specified
LENGTH() Return the length of a string in bytes
LIKE Simple pattern matching
LOAD_FILE() Load the named file
LOCATE() Return the position of the first occurrence of substring
LOWER() Return the argument in lowercase
LPAD() Return the string argument, left-padded with the specified string
LTRIM() Remove leading spaces
MAKE_SET() Return a set of comma-separated strings that have the corresponding bit in bits set
MATCH Perform full-text search
MID() Return a substring starting from the specified position
NOT LIKE Negation of simple pattern matching
NOT REGEXP Negation of REGEXP
OCT() Return a string containing octal representation of a number
OCTET_LENGTH() Synonym for LENGTH()
ORD() Return character code for leftmost character of the argument
POSITION() Synonym for LOCATE()
QUOTE() Escape the argument for use in an SQL statement
REGEXP Pattern matching using regular expressions
REPEAT() Repeat a string the specified number of times
REPLACE() Replace occurrences of a specified string
REVERSE() Reverse the characters in a string
RIGHT() Return the specified rightmost number of characters
RLIKE Synonym for REGEXP
RPAD() Return the string argument, right-padded with the specified string
RTRIM() Remove trailing spaces
SOUNDEX() Return a soundex string
SOUNDS LIKE Compare sounds
SPACE() Return a string of the specified number of spaces
STRCMP() Compare two strings
SUBSTR() Return the substring as specified
SUBSTRING() Return the substring as specified
SUBSTRING_INDEX() Return a substring from a string before the specified number of occurrences of the delimiter
TO_BASE64() Return the argument converted to a base-64 string
TRIM() Remove leading and trailing spaces
UCASE() Synonym for UPPER()
UNHEX() Return a string containing the bytes represented by pairs of hexadecimal digits
UPPER() Convert to uppercase
WEIGHT_STRING() Return the weight string for a string
String-valued functions return NULL if the length of the result would be greater than the value
of the max_allowed_packet system variable. See Section 9.12.2, “Tuning Server
Parameters”.
For functions that operate on string positions, the first position is numbered 1.
For functions that take length arguments, noninteger arguments are rounded to the nearest
integer.
ASCII(str)
Returns the numeric value of the leftmost character of the string str. Returns 0 if str
is the empty string. Returns NULL if str is NULL. ASCII() works for 8-bit characters.
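A short illustration of the cases just described, using literals only:

```sql
SELECT ASCII('2');   -- returns 50
SELECT ASCII(2);     -- returns 50 (the number is converted to the string '2')
SELECT ASCII('dx');  -- returns 100, the code of the leftmost character 'd'
SELECT ASCII('');    -- returns 0
```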
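BIN(N)
Returns a string containing the binary representation of the number N.
CHAR(N,... [USING charset_name])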
CHAR() interprets each argument N as an integer and returns a string consisting of the
characters given by the code values of those integers. NULL values are skipped.
CHAR() arguments larger than 255 are converted into multiple result bytes. For
example, CHAR(256) is equivalent to CHAR(1,0), and CHAR(256*256) is equivalent to
CHAR(1,0,0):
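The equivalence can be checked by displaying the result bytes with HEX(), as in this small sketch:

```sql
SELECT HEX(CHAR(1,0)), HEX(CHAR(256));        -- returns '0100', '0100'
SELECT HEX(CHAR(1,0,0)), HEX(CHAR(256*256));  -- returns '010000', '010000'
```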
If USING is given and the result string is illegal for the given character set, a warning
is issued. Also, if strict SQL mode is enabled, the result from CHAR() becomes NULL.
CHAR_LENGTH(str)
Returns the length of the string str, measured in characters. A multibyte character
counts as a single character. This means that for a string containing five 2-byte
characters, LENGTH() returns 10, whereas CHAR_LENGTH() returns 5.
CHARACTER_LENGTH(str)
CHARACTER_LENGTH() is a synonym for CHAR_LENGTH().
CONCAT(str1,str2,...)
Returns the string that results from concatenating the arguments. May have one or
more arguments. If all arguments are nonbinary strings, the result is a nonbinary
string. If the arguments include any binary strings, the result is a binary string. A
numeric argument is converted to its equivalent nonbinary string form.
For quoted strings, concatenation can be performed by placing the strings next to each
other:
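For instance, a minimal sketch comparing CONCAT() with adjacent quoted strings:

```sql
SELECT CONCAT('My', 'S', 'QL');   -- returns 'MySQL'
SELECT CONCAT('My', NULL, 'QL');  -- returns NULL
SELECT CONCAT(14.3);              -- returns '14.3'
SELECT 'My' 'S' 'QL';             -- adjacent quoted strings: returns 'MySQL'
```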
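CONCAT_WS(separator,str1,str2,...)
Returns the concatenated string, with the separator given as the first argument placed
between the remaining arguments.
CONCAT_WS() does not skip empty strings. However, it does skip any NULL values
after the separator argument.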
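The difference between empty strings and NULL values can be seen in a short sketch:

```sql
SELECT CONCAT_WS(',', 'First name', 'Second name', 'Last Name');
-- returns 'First name,Second name,Last Name'
SELECT CONCAT_WS(',', 'First name', NULL, 'Last Name');
-- returns 'First name,Last Name' (the NULL is skipped)
SELECT CONCAT_WS(',', 'a', '', 'b');
-- returns 'a,,b' (the empty string is kept)
```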
ELT(N,str1,str2,str3,...)
ELT() returns the Nth element of the list of strings: str1 if N = 1, str2 if N = 2, and so
on. Returns NULL if N is less than 1 or greater than the number of arguments. ELT() is
the complement of FIELD().
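A brief illustration with a four-element list:

```sql
SELECT ELT(1, 'ej', 'Heja', 'hej', 'foo');  -- returns 'ej'
SELECT ELT(4, 'ej', 'Heja', 'hej', 'foo');  -- returns 'foo'
SELECT ELT(5, 'ej', 'Heja', 'hej', 'foo');  -- returns NULL (N exceeds the list)
```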
EXPORT_SET(bits,on,off[,separator[,number_of_bits]])
Returns a string such that for every bit set in the value bits, you get an on string and
for every bit not set in the value, you get an off string. Bits in bits are examined
from right to left (from low-order to high-order bits). Strings are added to the result
from left to right, separated by the separator string (the default being the comma
character “,”). The number of bits examined is given by number_of_bits, which has
a default of 64 if not specified. number_of_bits is silently clipped to 64 if larger than
64. It is treated as an unsigned integer, so a value of −1 is effectively the same as 64.
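The right-to-left bit order is easiest to see with small values, as in this sketch:

```sql
-- 5 is binary 101; bits are read from the low-order end.
SELECT EXPORT_SET(5, 'Y', 'N', ',', 4);   -- returns 'Y,N,Y,N'
-- 6 is binary 110, examined over 10 bits.
SELECT EXPORT_SET(6, '1', '0', ',', 10);  -- returns '0,1,1,0,0,0,0,0,0,0'
```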
FIELD(str,str1,str2,str3,...)
Returns the index (position) of str in the str1, str2, str3, ... list. Returns 0 if str
is not found.
If all arguments to FIELD() are strings, all arguments are compared as strings. If all
arguments are numbers, they are compared as numbers. Otherwise, the arguments are
compared as double.
If str is NULL, the return value is 0 because NULL fails equality comparison with any
value. FIELD() is the complement of ELT().
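A short sketch of the found, not-found, and NULL cases:

```sql
SELECT FIELD('ej', 'Hej', 'ej', 'Heja', 'hej', 'foo');  -- returns 2
SELECT FIELD('fo', 'Hej', 'ej', 'Heja', 'hej', 'foo');  -- returns 0
SELECT FIELD(NULL, 'Hej', 'ej');                        -- returns 0
```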
FORMAT(X,D[,locale])
Returns the number X formatted to D decimal places, as a string.
The optional third parameter enables a locale to be specified to be used for the result
number's decimal point, thousands separator, and grouping between separators.
Permissible locale values are the same as the legal values for the lc_time_names
system variable (see Section 11.7, “MySQL Server Locale Support”). If no locale is
specified, the default is 'en_US'.
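The effect of the decimal-places and locale arguments, as a brief sketch:

```sql
SELECT FORMAT(12332.123456, 4);     -- returns '12,332.1235'
SELECT FORMAT(12332.1, 4);          -- returns '12,332.1000'
SELECT FORMAT(12332.2, 0);          -- returns '12,332'
SELECT FORMAT(12332.2, 2, 'de_DE'); -- returns '12.332,20'
```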
FROM_BASE64(str)
Takes a string encoded with the base-64 encoding rules used by TO_BASE64() and
returns the decoded result as a binary string. The result is NULL if the argument is
NULL or not a valid base-64 string. See the description of TO_BASE64() for details
about the encoding and decoding rules.
HEX(str), HEX(N)
For a string argument str, HEX() returns a hexadecimal string representation of str
where each byte of each character in str is converted to two hexadecimal digits.
(Multibyte characters therefore become more than two digits.) The inverse of this
operation is performed by the UNHEX() function.
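A minimal sketch of HEX() on string and numeric arguments, and of the round trip through UNHEX():

```sql
SELECT HEX('abc');                       -- returns '616263'
SELECT UNHEX(HEX('abc'));                -- returns 'abc'
SELECT HEX(255), CONV(HEX(255), 16, 10); -- returns 'FF', '255'
```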
INSERT(str,pos,len,newstr)
Returns the string str, with the substring beginning at position pos and len
characters long replaced by the string newstr. Returns the original string if pos is not
within the length of the string. Replaces the rest of the string from position pos if len
is not within the length of the rest of the string. Returns NULL if any argument is NULL.
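The three cases above (normal replacement, pos out of range, len too large) can be sketched as:

```sql
SELECT INSERT('Quadratic', 3, 4, 'What');    -- returns 'QuWhattic'
SELECT INSERT('Quadratic', -1, 4, 'What');   -- returns 'Quadratic' (pos out of range)
SELECT INSERT('Quadratic', 3, 100, 'What');  -- returns 'QuWhat' (len past the end)
```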
INSTR(str,substr)
Returns the position of the first occurrence of substring substr in string str. This is
the same as the two-argument form of LOCATE(), except that the order of the
arguments is reversed.
This function is multibyte safe, and is case-sensitive only if at least one argument is a
binary string.
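A short example; note the argument order compared with LOCATE():

```sql
SELECT INSTR('foobarbar', 'bar');  -- returns 4
SELECT INSTR('xbar', 'foobar');    -- returns 0
```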
LCASE(str)
LCASE() is a synonym for LOWER().
In MySQL 5.7, LCASE() used in a view is rewritten as LOWER() when storing the
view's definition. (Bug #12844279)
LEFT(str,len)
Returns the leftmost len characters from the string str, or NULL if any argument is
NULL.
LENGTH(str)
Returns the length of the string str, measured in bytes. A multibyte character counts
as multiple bytes. This means that for a string containing five 2-byte characters,
LENGTH() returns 10, whereas CHAR_LENGTH() returns 5.
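The byte/character distinction can be sketched with a single 2-byte character. The example builds the UTF-8 encoding of 'é' from its bytes so that it does not depend on the client character set; that construction is an illustrative choice, not part of the original text.

```sql
SELECT LENGTH('text');                                  -- returns 4
SET @e = CONVERT(UNHEX('C3A9') USING utf8);             -- 'é' as a utf8 string
SELECT LENGTH(@e), CHAR_LENGTH(@e);                     -- returns 2, 1
```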
LOAD_FILE(file_name)
Reads the file and returns the file contents as a string. To use this function, the file
must be located on the server host, you must specify the full path name to the file, and
you must have the FILE privilege. The file must be readable by all and its size less
than max_allowed_packet bytes. If the secure_file_priv system variable is set to
a nonempty directory name, the file to be loaded must be located in that directory.
If the file does not exist or cannot be read because one of the preceding conditions is
not satisfied, the function returns NULL.
mysql> UPDATE t
           SET blob_col=LOAD_FILE('/tmp/picture')
           WHERE id=1;
LOCATE(substr,str), LOCATE(substr,str,pos)
The first syntax returns the position of the first occurrence of substring substr in
string str. The second syntax returns the position of the first occurrence of substring
substr in string str, starting at position pos. Returns 0 if substr is not in str.
This function is multibyte safe, and is case-sensitive only if at least one argument is a
binary string.
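Both syntaxes, sketched with literals:

```sql
SELECT LOCATE('bar', 'foobarbar');     -- returns 4
SELECT LOCATE('xbar', 'foobar');       -- returns 0
SELECT LOCATE('bar', 'foobarbar', 5);  -- returns 7 (search starts at position 5)
```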
LOWER(str)
Returns the string str with all characters changed to lowercase according to the
current character set mapping. The default is latin1 (cp1252 West European).
For collations of Unicode character sets, LOWER() and UPPER() work according to the
Unicode Collation Algorithm (UCA) version in the collation name, if there is one, and
UCA 4.0.0 if no version is specified. For example, utf8_unicode_520_ci works
according to UCA 5.2.0, whereas utf8_unicode_ci works according to UCA 4.0.0.
See Section 11.1.14.1, “Unicode Character Sets”.
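LOWER() and UPPER() are ineffective when applied to binary strings (BINARY, VARBINARY, BLOB). A minimal sketch of lettercase conversion for a binary string, converting it to a nonbinary string first; the choice of latin1 as the target character set is an assumption that suits the sample data only:

```sql
SELECT LOWER('QUADRATICALLY');                        -- returns 'quadratically'
SET @str = BINARY 'New York';
SELECT LOWER(@str);                                   -- returns 'New York' (unchanged)
SELECT LOWER(CONVERT(@str USING latin1));             -- returns 'new york'
```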
LPAD(str,len,padstr)
Returns the string str, left-padded with the string padstr to a length of len
characters. If str is longer than len, the return value is shortened to len characters.
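Padding and truncation, as a brief sketch:

```sql
SELECT LPAD('hi', 4, '??');  -- returns '??hi'
SELECT LPAD('hi', 1, '??');  -- returns 'h' (shortened to len characters)
```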
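MAKE_SET(bits,str1,str2,...)
Returns a set of comma-separated strings consisting of the strings that have the
corresponding bit in bits set.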
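A short sketch of how bits select strings: str1 corresponds to bit 0, str2 to bit 1, and so on.

```sql
SELECT MAKE_SET(1, 'a', 'b', 'c');                   -- returns 'a'
SELECT MAKE_SET(1 | 4, 'hello', 'nice', 'world');    -- returns 'hello,world'
SELECT MAKE_SET(0, 'a', 'b', 'c');                   -- returns ''
```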
OCT(N)
Returns a string containing the octal representation of the number N.
ORD(str)
If the leftmost character of the string str is a multibyte character, returns the code for
that character, calculated from the numeric values of its constituent bytes using this
formula:
If the leftmost character is not a multibyte character, ORD() returns the same value as
the ASCII() function.
QUOTE(str)
Quotes a string to produce a result that can be used as a properly escaped data value in
an SQL statement. The string is returned enclosed by single quotation marks and with
each instance of backslash (“\”), single quote (“'”), ASCII NUL, and Control+Z
preceded by a backslash. If the argument is NULL, the return value is the word
“NULL” without enclosing single quotation marks.
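For example, quoting a string that contains a single quote:

```sql
SELECT QUOTE('Don\'t!');  -- returns 'Don\'t!' (with the enclosing quotes and the backslash)
SELECT QUOTE(NULL);       -- returns NULL (the word, without quotation marks)
```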
REPEAT(str,count)
Returns a string consisting of the string str repeated count times. If count is less
than 1, returns an empty string. Returns NULL if str or count is NULL.
REPLACE(str,from_str,to_str)
Returns the string str with all occurrences of the string from_str replaced by the
string to_str. REPLACE() performs a case-sensitive match when searching for
from_str.
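A small example of the case-sensitive match:

```sql
SELECT REPLACE('www.mysql.com', 'w', 'Ww');  -- returns 'WwWwWw.mysql.com'
SELECT REPLACE('www.mysql.com', 'W', 'x');   -- returns 'www.mysql.com' (no match)
```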
REVERSE(str)
Returns the string str with the order of the characters reversed.
RIGHT(str,len)
Returns the rightmost len characters from the string str, or NULL if any argument is
NULL.
RPAD(str,len,padstr)
Returns the string str, right-padded with the string padstr to a length of len
characters. If str is longer than len, the return value is shortened to len characters.
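Padding and truncation on the right, as a brief sketch:

```sql
SELECT RPAD('hi', 5, '?');  -- returns 'hi???'
SELECT RPAD('hi', 1, '?');  -- returns 'h' (shortened to len characters)
```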
SOUNDEX(str)
Returns a soundex string from str. Two strings that sound almost the same should
have identical soundex strings. A standard soundex string is four characters long, but
the SOUNDEX() function returns an arbitrarily long string. You can use SUBSTRING()
on the result to get a standard soundex string. All nonalphabetic characters in str are
ignored. All international alphabetic characters outside the A-Z range are treated as
vowels.
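A short sketch showing an arbitrarily long result and how SUBSTRING() reduces it to a standard four-character soundex string:

```sql
SELECT SOUNDEX('Hello');                            -- returns 'H400'
SELECT SOUNDEX('Quadratically');                    -- returns 'Q36324'
SELECT SUBSTRING(SOUNDEX('Quadratically'), 1, 4);   -- returns 'Q363'
```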
Important
We hope to remove these limitations in a future release. See Bug #22638 for
more information.
Note
This function implements the original Soundex algorithm, not the more
popular enhanced version (also described by D. Knuth). The difference is that the
original version discards vowels first and duplicates second, whereas the
enhanced version discards duplicates first and vowels second.
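SPACE(N)
Returns a string consisting of N space characters.
SUBSTRING(str,pos), SUBSTRING(str FROM pos), SUBSTRING(str,pos,len), SUBSTRING(str FROM pos FOR len)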
The forms without a len argument return a substring from string str starting at
position pos. The forms with a len argument return a substring len characters long
from string str, starting at position pos. The forms that use FROM are standard SQL
syntax. It is also possible to use a negative value for pos. In this case, the beginning of
the substring is pos characters from the end of the string, rather than the beginning. A
negative value may be used for pos in any of the forms of this function.
For all forms of SUBSTRING(), the position of the first character in the string from
which the substring is to be extracted is reckoned as 1.
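The forms described above, including negative positions, can be sketched as:

```sql
SELECT SUBSTRING('Quadratically', 5);        -- returns 'ratically'
SELECT SUBSTRING('foobarbar' FROM 4);        -- returns 'barbar'
SELECT SUBSTRING('Quadratically', 5, 6);     -- returns 'ratica'
SELECT SUBSTRING('Sakila', -3);              -- returns 'ila'
SELECT SUBSTRING('Sakila', -5, 3);           -- returns 'aki'
SELECT SUBSTRING('Sakila' FROM -4 FOR 2);    -- returns 'ki'
```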
SUBSTRING_INDEX(str,delim,count)
Returns the substring from string str before count occurrences of the delimiter
delim. If count is positive, everything to the left of the final delimiter (counting from
the left) is returned. If count is negative, everything to the right of the final delimiter
(counting from the right) is returned. SUBSTRING_INDEX() performs a case-sensitive
match when searching for delim.
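Positive and negative counts, as a brief sketch:

```sql
SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2);   -- returns 'www.mysql'
SELECT SUBSTRING_INDEX('www.mysql.com', '.', -2);  -- returns 'mysql.com'
```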
TO_BASE64(str)
Converts the string argument to base-64 encoded form and returns the result as a
character string with the connection character set and collation. If the argument is not
a string, it is converted to a string before conversion takes place. The result is NULL if
the argument is NULL. Base-64 encoded strings can be decoded using the
FROM_BASE64() function.
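A minimal round-trip sketch using a three-byte input:

```sql
SELECT TO_BASE64('abc');               -- returns 'YWJj'
SELECT FROM_BASE64(TO_BASE64('abc'));  -- returns 'abc' (as a binary string)
SELECT TO_BASE64(NULL);                -- returns NULL
```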
Different base-64 encoding schemes exist. These are the encoding and decoding rules
used by TO_BASE64() and FROM_BASE64():
TRIM([{BOTH | LEADING | TRAILING} [remstr] FROM] str), TRIM([remstr FROM] str)
Returns the string str with all remstr prefixes or suffixes removed. If none of the
specifiers BOTH, LEADING, or TRAILING is given, BOTH is assumed. remstr is optional
and, if not specified, spaces are removed.
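Each specifier, sketched with literals:

```sql
SELECT TRIM('  bar   ');                      -- returns 'bar'
SELECT TRIM(LEADING 'x' FROM 'xxxbarxxx');    -- returns 'barxxx'
SELECT TRIM(BOTH 'x' FROM 'xxxbarxxx');       -- returns 'bar'
SELECT TRIM(TRAILING 'xyz' FROM 'barxxyz');   -- returns 'barx'
```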
UCASE(str)
UCASE() is a synonym for UPPER().
In MySQL 5.7, UCASE() used in a view is rewritten as UPPER() when storing the
view's definition. (Bug #12844279)
UNHEX(str)
For a string argument str, UNHEX(str) interprets each pair of characters in the
argument as a hexadecimal number and converts it to the byte represented by the
number. The return value is a binary string.
The characters in the argument string must be legal hexadecimal digits: '0' .. '9',
'A' .. 'F', 'a' .. 'f'. If the argument contains any nonhexadecimal digits, the result
is NULL:
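A short sketch of valid and invalid arguments:

```sql
SELECT UNHEX('4D7953514C');      -- returns 'MySQL'
SELECT UNHEX(HEX('string'));     -- returns 'string'
SELECT HEX(UNHEX('1267'));       -- returns '1267'
SELECT UNHEX('GG');              -- returns NULL ('G' is not a hexadecimal digit)
```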
A NULL result can occur if the argument to UNHEX() is a BINARY column, because
values are padded with 0x00 bytes when stored but those bytes are not stripped on
retrieval. For example, '41' is stored into a CHAR(3) column as '41 ' and retrieved
as '41' (with the trailing pad space stripped), so UNHEX() for the column value
returns 'A'. By contrast '41' is stored into a BINARY(3) column as '41\0' and
retrieved as '41\0' (with the trailing pad 0x00 byte not stripped). '\0' is not a legal
hexadecimal digit, so UNHEX() for the column value returns NULL.
For a numeric argument N, the inverse of HEX(N) is not performed by UNHEX(). Use
CONV(HEX(N),16,10) instead. See the description of HEX().
UPPER(str)
Returns the string str with all characters changed to uppercase according to the
current character set mapping. The default is latin1 (cp1252 West European).
See the description of LOWER() for information that also applies to UPPER(). This
includes information about how to perform lettercase conversion of binary strings
(BINARY, VARBINARY, BLOB) for which these functions are ineffective, and information
about case folding for Unicode character sets.
WEIGHT_STRING(str [AS {CHAR|BINARY}(N)] [LEVEL levels] [flags])
This function returns the weight string for the input string. The return value is a binary
string that represents the sorting and comparison value of the string. It has these
properties:
The input string, str, is a string expression. If the input is a nonbinary (character)
string such as a CHAR, VARCHAR, or TEXT value, the return value contains the collation
weights for the string. If the input is a binary (byte) string such as a BINARY,
VARBINARY, or BLOB value, the return value is the same as the input (the weight for
each byte in a binary string is the byte value). If the input is NULL, WEIGHT_STRING()
returns NULL.
Examples:
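A sketch contrasting nonbinary and binary input. It assumes the latin1_swedish_ci collation, which is case-insensitive, so 'AB' and 'ab' share the same weights; for the binary strings the weights are the byte values themselves.

```sql
SET @s = _latin1 'AB' COLLATE latin1_swedish_ci;
SELECT @s, HEX(@s), HEX(WEIGHT_STRING(@s));  -- returns 'AB', '4142', '4142'

SET @s = _latin1 'ab' COLLATE latin1_swedish_ci;
SELECT @s, HEX(@s), HEX(WEIGHT_STRING(@s));  -- returns 'ab', '6162', '4142'

SET @s = CAST('AB' AS BINARY);
SELECT @s, HEX(@s), HEX(WEIGHT_STRING(@s));  -- returns 'AB', '4142', '4142'

SET @s = CAST('ab' AS BINARY);
SELECT @s, HEX(@s), HEX(WEIGHT_STRING(@s));  -- returns 'ab', '6162', '6162'
```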
The preceding examples use HEX() to display the WEIGHT_STRING() result. Because
the result is a binary value, HEX() can be especially useful when the result contains
nonprinting values, to display it in printable form:
For non-NULL return values, the data type of the value is VARBINARY if its length is
within the maximum length for VARBINARY, otherwise the data type is BLOB.
The AS clause may be given to cast the input string to a nonbinary or binary string and
to force it to a given length:
o AS CHAR(N) casts the string to a nonbinary string and pads it on the right with
spaces to a length of N characters. N must be at least 1. If N is less than the
length of the input string, the string is truncated to N characters. No warning
occurs for truncation.
o AS BINARY(N) is similar but casts the string to a binary string, N is measured
in bytes (not characters), and padding uses 0x00 bytes (not spaces).
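The padding behavior of the two casts can be sketched as follows. The _latin1 introducer is an assumption made here so that the CHAR result does not depend on the connection character set; with latin1_swedish_ci the pad spaces have weight 0x20.

```sql
SELECT HEX(WEIGHT_STRING(_latin1 'ab' AS CHAR(4)));  -- returns '41422020' (space padding)
SELECT HEX(WEIGHT_STRING('ab' AS BINARY(4)));        -- returns '61620000' (0x00 padding)
```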
The LEVEL clause may be given to specify that the return value should contain weights
for specific collation levels.
The levels specifier following the LEVEL keyword may be given either as a list of
one or more integers separated by commas, or as a range of two integers separated by
a dash. Whitespace around the punctuation characters does not matter.
Examples:
LEVEL 1
LEVEL 2, 3, 5
LEVEL 1-3
Any level less than 1 is treated as 1. Any level greater than the maximum for the input
string collation is treated as maximum for the collation. The maximum varies per
collation, but is never greater than 6.
In a list of levels, levels must be given in increasing order. In a range of levels, if the
second number is less than the first, it is treated as the first number (for example, 4-2
is the same as 4-4).
If the LEVEL clause is omitted, MySQL assumes LEVEL 1 - max, where max is the
maximum level for the collation.
If LEVEL is specified using list syntax (not range syntax), any level number can be
followed by these modifiers:
o REVERSE: Return the weights in reverse order (that is, the weights for the
reversed string, with the first character last and the last first).
Examples:
ORDER BY
CAST(SUBSTRING(
Host,
1,
LOCATE('.', Host) - 1)
AS UNSIGNED),
CAST(SUBSTRING(
Host,
LOCATE('.', Host) + 1,
LOCATE('.', Host, LOCATE('.', Host) + 1)
- LOCATE('.', Host) - 1)
AS UNSIGNED),
CAST(SUBSTRING(
Host,
LOCATE('.', Host, LOCATE('.', Host) + 1) + 1,
LOCATE('.', Host,
LOCATE('.', Host, LOCATE('.', Host) + 1) + 1)
- LOCATE('.', Host, LOCATE('.', Host) + 1) - 1)
AS UNSIGNED),
CAST(SUBSTRING(
Host,
LOCATE('.', Host, LOCATE('.', Host,
LOCATE('.', Host) + 1) + 1) + 1,
3)
AS UNSIGNED)
The following formula can be used to extract the Nth item in a delimited list, in this case the
3rd item "ccccc" in the example comma separated list.
The above formula does not need the first item to be handled as a special case and returns
empty strings correctly when the item count is less than the position requested.
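One formula with the properties described, shown here as a sketch; it is not necessarily the original poster's exact expression, and the sample list 'aaa,bbbb,ccccc' is made up for illustration. It takes everything up to the Nth delimiter, skips the part that belongs to the first N-1 items, and strips the leftover delimiter.

```sql
SET @list = 'aaa,bbbb,ccccc', @n = 3;
SELECT REPLACE(
         SUBSTRING(SUBSTRING_INDEX(@list, ',', @n),
                   LENGTH(SUBSTRING_INDEX(@list, ',', @n - 1)) + 1),
         ',', '') AS item;
-- returns 'ccccc'; with @n = 1 it returns 'aaa', and with @n = 5 it returns ''
```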
SELECT
`ip` ,
SUBSTRING_INDEX( `ip` , '.', 1 ) AS a,
SUBSTRING_INDEX(SUBSTRING_INDEX( `ip` , '.', 2 ),'.',-1) AS b,
SUBSTRING_INDEX(SUBSTRING_INDEX( `ip` , '.', -2 ),'.',1) AS c,
SUBSTRING_INDEX( `ip` , '.', -1 ) AS d
FROM log_table
Reverses email, counts the characters from left minus the @. Reverses the reverse and returns
'domain.com'.
Perhaps there is a better/faster/easier way; however, it's not easily found. So here is mine.
Both will return identical results on email addresses, since they only have one @ in them. I
can't believe you didn't think of SUBSTRING_INDEX, even after the previous two
comments used it :)
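The two approaches being compared can be sketched like this; the address is a made-up sample, and the REVERSE() expression is a reconstruction from the description rather than the poster's original code.

```sql
SET @email = 'user@domain.com';
-- Reverse, take the characters to the left of the '@', reverse back.
SELECT REVERSE(LEFT(REVERSE(@email), LOCATE('@', REVERSE(@email)) - 1));
-- returns 'domain.com'
-- The same result with SUBSTRING_INDEX().
SELECT SUBSTRING_INDEX(@email, '@', -1);
-- returns 'domain.com'
```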
This will create an area filled with "+", where the length of each "+" bar equals the number in
column ColName in that row.
70 is an upper bound on the values in ColName; change it to match your actual data.
Posted by Erel Segal on July 24, 2006
Correction to the previous tip: in the current version, EXPORT_SET does not create a string
with more than 64 chars, even if you explicitly ask for 70 chars.
Another problem is that for numbers N > 53, 2^N - 1 equals 2^N because of rounding errors,
so you will not see a bar, only a single "+".
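A possible alternative that sidesteps both the 64-character limit and the rounding problem is REPEAT(). This is a sketch only; tbl_name is a hypothetical table with a non-negative integer column ColName.

```sql
SELECT REPEAT('+', 5);  -- returns '+++++'
-- One bar per row, as long as the value in ColName:
SELECT ColName, REPEAT('+', ColName) AS bar FROM tbl_name;
```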
Posted by Andrew Hanna on August 24, 2006
I created a user-defined function in MySQL 5.0+ similar to PHP's substr_count(), since I
could not find an equivalent native function in MySQL. (If there is one please tell me!!!)
delimiter ||
DROP FUNCTION IF EXISTS substrCount||
CREATE FUNCTION substrCount(s VARCHAR(255), ss VARCHAR(255))
  RETURNS TINYINT(3) UNSIGNED
  LANGUAGE SQL NOT DETERMINISTIC READS SQL DATA
BEGIN
  DECLARE count TINYINT(3) UNSIGNED;
  DECLARE offset TINYINT(3) UNSIGNED;
  DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET s = NULL;
  SET count = 0;
  SET offset = 1;
  REPEAT
    IF NOT ISNULL(s) AND offset > 0 THEN
      -- Find the next occurrence at or after the current offset.
      SET offset = LOCATE(ss, s, offset);
      IF offset > 0 THEN
        SET count = count + 1;
        SET offset = offset + 1;
      END IF;
    END IF;
  UNTIL ISNULL(s) OR offset = 0 END REPEAT;
  RETURN count;
END;
||
delimiter ;
`count` would return 4 in this case. Can be used in such cases where you might want to find
the "depth" of a path, or for many other uses.
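For the path-depth use case, the same count can also be obtained without a stored function by comparing lengths before and after removing the substring. This is a sketch with a made-up path; unlike the function above, it counts non-overlapping occurrences.

```sql
SET @path = '/home/user/docs/file', @sep = '/';
SELECT (CHAR_LENGTH(@path) - CHAR_LENGTH(REPLACE(@path, @sep, '')))
       DIV CHAR_LENGTH(@sep) AS depth;
-- returns 4
```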
Posted by Michael Newton on August 31, 2006
To [name withheld] who suggested a method for turning IP addresses into numbers, I would
suggest that the INET_ATON() function is a little easier to use!
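A minimal sketch of that suggestion; tbl_name is a hypothetical table whose Host column holds dotted-quad IPv4 addresses.

```sql
SELECT INET_ATON('10.0.5.9');  -- returns 167773449
-- Sort addresses numerically rather than as strings:
SELECT Host FROM tbl_name ORDER BY INET_ATON(Host);
```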
Posted by NOT_FOUND NOT_FOUND on August 21, 2008
It's pretty easy to create your own string functions for many examples listed here
## Count substrings
select ucfirst("TEST");
+-----------------+
| ucfirst("TEST") |
+-----------------+
| Test |
+-----------------+
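The definition of ucfirst is not shown above. One way such a function could be written, as a sketch consistent with the output shown (an assumption, not the poster's code):

```sql
CREATE FUNCTION ucfirst(s VARCHAR(255)) RETURNS VARCHAR(255) DETERMINISTIC
  -- Uppercase the first character, lowercase the rest.
  RETURN CONCAT(UPPER(LEFT(s, 1)), LOWER(SUBSTRING(s, 2)));

SELECT ucfirst('TEST');  -- returns 'Test'
```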
##Or a more complicated example, this will repeat an insert after every nth position.
evaluating (field1 <=> NULL) returns 0 (zero) if the field is not null and 1 (one) if the field is
null. Adding 1 (one) to this result provides positional information that fits what 'elt' expects.
elt will return "not null" (position 1) if the evaluation of ((field1 <=> NULL) + 1) = 1
This can be altered to output messages based on any test that I've tried. Just remember that
a comparison returns 0 or 1, while 'elt' positions start at 1, so you need to add 1 (one) to the
result to be able to choose between different messages.
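The technique described can be sketched with literals standing in for field1:

```sql
SELECT ELT(('x' <=> NULL) + 1, 'not null', 'null');   -- returns 'not null'
SELECT ELT((NULL <=> NULL) + 1, 'not null', 'null');  -- returns 'null'
```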
Posted by J Vera on October 26, 2006
As above I couldn't find a function for splitting strings based on a character set rather than
string position, where the results were independent of substring lengths. I used this query to
split out the Swiss-Prot accession numbers from BLAST result subject ID's, which are
bracketed by pipe ('|') characters, but any two relatively unique characters should work.
select left(substring(<columnName>,locate('|',<columnName>)+1),
locate('|',substring(<columnName>,
locate('|',<columnName>)+1))-1)
as '<resultColumnName>' from <table>
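The same split can also be sketched with nested SUBSTRING_INDEX() calls. The subject ID below is a made-up value in the pipe-delimited shape described, not real data.

```sql
SET @id = 'sp|P12345|EXAMPLE_HUMAN';
-- Keep everything up to the second '|', then take the part after the first '|'.
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(@id, '|', 2), '|', -1);
-- returns 'P12345'
```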
Posted by Giovanni Campagnoli on December 20, 2006
This is the php function strip_tags
delimiter ||
UPDATE temp SET string = TRIM(BOTH ',' FROM REPLACE(CONCAT("," , string, ","),
CONCAT(",",'value_to_remove', ",") , ',')) WHERE id=1
Posted by Robert Glover on February 13, 2007
There is a simple way to convert the following Oracle usage of decode into MySql:
Oracle version:
DELIMITER //
DROP FUNCTION IF EXISTS get_mcv;
CREATE FUNCTION get_mcv (list text(10000)) RETURNS text(1000)
BEGIN
DECLARE cnt int(10);
DECLARE iter_cnt int(10);
DECLARE item text(100);
DECLARE f_item text(100);
DECLARE prv_cnt int(10) default 0;
DECLARE nxt_cnt int(10) default 0;
set iter_cnt=iter_cnt+1;
end while;
RETURN f_item;
END
//
------------------------------
DELIMITER $$
until l1 > 4
end repeat;
return diff;
END$$
DELIMITER ;
----------------------
Other DBMSs have this function and I needed one, so I looked: MySQL's online docs
show a DIFFERENCE function, but that was for GIS applications and isn't currently implemented.
Just change the "user@hostname" and the "db.function_name" to reflect your info.
It returns an INT value from 0 to 4, where 0 means the SOUNDEX values of the two strings have
no character in common and 4 means all 4 alphanumeric characters are the same:
returns
so DIFF3("hello", "jello")
returns a 3
while DIFF3("hello","great")
returns a 1
Any numbers that generate more than the number of digits (4 in this case) would be truncated
from the left:
delimiter ||
DROP FUNCTION IF EXISTS locatelast||
CREATE FUNCTION locatelast(s VARCHAR(1000), ss VARCHAR(1000)) RETURNS
TINYINT(3) UNSIGNED LANGUAGE SQL NOT DETERMINISTIC READS SQL DATA
BEGIN
DECLARE last TINYINT(3) UNSIGNED;
DECLARE offset TINYINT(3) UNSIGNED;
DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET s = NULL;
SET last = 0;
SET offset = 1;
REPEAT
IF NOT ISNULL(s) AND offset > 0 THEN
SET offset = LOCATE(ss, s, offset);
IF offset > 0 THEN
SET last = offset;
SET offset = offset + 1;
END IF;
END IF;
UNTIL ISNULL(s) OR offset = 0 END REPEAT;
RETURN last;
END;
Instead of looping through the string to look for the last occurrence, simply reverse() the
string and look for the first occurrence, then subtract the found position from the string
length:
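A sketch of that idea, with the offset arithmetic written out and a guard for the not-found case (the guard and the sample values are additions for illustration):

```sql
SET @s = 'foobarbar', @ss = 'bar';
SELECT IF(LOCATE(REVERSE(@ss), REVERSE(@s)) = 0,
          0,
          CHAR_LENGTH(@s) - LOCATE(REVERSE(@ss), REVERSE(@s))
            - CHAR_LENGTH(@ss) + 2) AS last_pos;
-- returns 7, the start of the last 'bar'
```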
The default is 8192 (bytes), and if the result is bigger, it will be silently cropped, leading to
unexpected results.
Some examples here: http://confronte.com.ar/groupconcat
Posted by Axel Axel on December 14, 2007
If you want to compare an empty string to a numeric value or an integer field, you'll have to
CAST the integer field or value to a string, because for MySQL a zero (the integer
one) is equal to an empty string.
Example :
SELECT 0 = '';
==> 1
This is common when you want to check user input : if a user inputs a "0" for a field, the
check without cast will fail because mysql thinks this is an empty string.
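The comparison with and without the cast, as a short sketch:

```sql
SELECT 0 = '';                -- returns 1 ('' is converted to the number 0)
SELECT CAST(0 AS CHAR) = '';  -- returns 0 (string comparison: '0' vs '')
```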
CREATE TEMPORARY TABLE temp (id TINYINT NOT NULL auto_increment, val
CHAR(20), PRIMARY KEY(id));
SET input=REPLACE(input, ",", "'),('");
SET @dyn_sql=CONCAT("INSERT INTO temp (val) VALUES ('",input,"');");
PREPARE s1 FROM @dyn_sql; EXECUTE s1;
SELECT * FROM temp;
That is a great example. Here is how I used a very similar example to find a contact's last
name from a contacts database by sub string on the last instance of the space:
Something more elegant would be to have the LOCATE function include a
direction option, like:
if you want to find the last occurrence of a particular string, use the tools mysql provides for
you:
select reverse( substring( reverse( field ), locate( reverse( 'xyz' ), reverse( field ) ) + char_length( 'xyz' ) ) )
---
this is way easier to implement and debug
DELIMITER $$
DROP FUNCTION IF EXISTS `initcap`$$
CREATE DEFINER=`root`@`%` FUNCTION `initcap`(x varchar(255)) RETURNS
varchar(255) CHARSET utf8
begin
set @l_str='';
set @r_str='';
else
end if;
end$$
DELIMITER ;
There are several stored procedures (e.g., see post of Grigory Dmitrenko) to transform
string like 'a,b,c' into something to be used like
....WHERE x IN ('a','b','c').
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(url,'://',-1),'/',1)
FROM urls
It works for URLs with and without http(s). But doesn't work if you have local URLs without
a leading slash like "folder/index.html". In that case it extracts "folder" instead of an empty
string.
Posted by Ilde Giron on November 28, 2008
I recently found that after filling a table with info from a csv file created with MS Excel, an
unwanted character went into the end of a field, and it showed up as "^M". So, when I issued
a
mysql> select description from catalog;
I used the following command to remove it (most, but not all, of the rows in the table were
contaminated):
mysql> update catalog set description = left(description,length(description) -1) where
description like "%^M%";
Please note that to replicate that "^M" you must press <ctrl> and v --though no character will
be displayed-- and then <ctrl> and m.
Posted by Steve Klein on February 16, 2009
You can use REVERSE to parse the last token from a string. This can be useful for name
processing, for instance (first name is everything except last token and last name is last
token):
SELECT
REVERSE(SUBSTR(REVERSE(name),INSTR(REVERSE(name),' ')+1)) AS first,
REVERSE(SUBSTR(REVERSE(name),1,INSTR(REVERSE(name),' ')-1)) AS last
FROM tbl_name
Posted by Phil Barone on March 6, 2009
Thanks Ilde for sharing your <control><m> example and especially how to type the
<control><m> into the console.
One minor change, since you are only replacing the one character at the end is, change the
where clause to
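One way to express this without typing the control character is the '\r' escape for carriage return. This is a sketch of the idea against the catalog table from the earlier comment, not the poster's original statement:

```sql
-- Match only rows that END with a carriage return, and strip just that character.
UPDATE catalog
   SET description = TRIM(TRAILING '\r' FROM description)
 WHERE description LIKE '%\r';
```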
-- ***********************************
DELIMITER $$
DROP FUNCTION IF EXISTS `initcap` $$
CREATE FUNCTION `initcap`(x char(30)) RETURNS char(30) CHARSET utf8
BEGIN
SET @str='';
SET @l_str='';
WHILE x REGEXP ' ' DO
SELECT SUBSTRING_INDEX(x, ' ', 1) INTO @l_str;
SELECT SUBSTRING(x, LOCATE(' ', x)+1) INTO x;
SELECT CONCAT(@str, ' ',
CONCAT(UPPER(SUBSTRING(@l_str,1,1)),LOWER(SUBSTRING(@l_str,2)))) INTO
@str;
END WHILE;
RETURN LTRIM(CONCAT(@str, ' ',
CONCAT(UPPER(SUBSTRING(x,1,1)),LOWER(SUBSTRING(x,2)))));
END $$
DELIMITER ;
-- ***********************************
One gotcha to note: this method strips out any leading and trailing spaces from the input,
which really isn't that big of a deal, but something to keep in mind.
Posted by Ed Anderson on May 28, 2009
The ^M character is the DOS EOL character - and you can avoid the entire problem by
dumping the file from Excel to a CSV file - if you're running in UNIX/Linux you can use the
"dos2unix" utility which will strip out the ^M's and leave you with a portable file. Just my
two cents.
Posted by Kim TongHyun on July 23, 2009
I modified the function strSplit(from Chris Stubben) for utf8.
select replace(
substring_index(field, delim, pos),
substring_index(field, delim, pos - 1),
'')
from table;
select substring_index(
substring_index(field, 'xyz', pos)
, 'xyz', -1)
from table;
that will get the last element of the list of x that were found. which should be the one you
want.
SELECT
stringfield,
LENGTH(stringfield)-LENGTH(REPLACE(stringfield,'@',''))
FROM tablename
IF LENGTH(string)>0 THEN
IF LOCATE(' ',string) > 0 OR LOCATE('.',string) OR LOCATE('(',string) > 0 OR
LOCATE('¿',string) THEN
REPEAT
IF upperchar = 1 THEN
SET final_string = CONCAT(final_string,UPPER(SUBSTRING(string,char_index,1)));
SET upperchar = 0;
ELSE
SET final_string = CONCAT(final_string,SUBSTRING(string,char_index,1));
END IF;
IF (SUBSTRING(string,char_index,1) = ' ') OR (SUBSTRING(string,char_index,1) = '.') OR
(SUBSTRING(string,char_index,1) = '(') OR (SUBSTRING(string,char_index,1) = '¿')
THEN
SET upperchar = 1;
END IF;
SET char_index = char_index + 1;
UNTIL char_index > LENGTH(string)
END REPEAT;
ELSE
SET final_string = CONCAT(UPPER(SUBSTRING(string,1,1)),SUBSTRING(string,2));
END IF;
ELSE
SET final_string = string;
END IF;
RETURN final_string;
END
Posted by Claude Warren on February 10, 2010
I needed a way to parse UTF-8 strings into words, not finding any mechanism that would
allow me to specify a list of characters to split on I hit upon using regexp and string
manipulation to parse the string. The following is a function to find regex defined positions in
a string and a procedure to break words out of a string based on regex.
delimiter $$
--
-- This function will return the first position in p_str where the regexp is true
--
drop function if exists regexPos $$
create function regexPos( p_str TEXT, p_regex varchar(250) ) returns int
BEGIN
declare v_pos int;
declare v_len int;
set v_pos=1;
set v_len=1+char_length( p_str );
while (( substr( p_str, 1, v_pos) REGEXP p_regex)=0 and (v_pos<v_len))
do
set v_pos = v_pos + 1;
end while;
return v_pos-1;
end $$
--
-- This procedure parses p_str into words based on the regular expression p_regex.
-- The simplest usage is call ParseWords( "some string", "[[:space:]]" );
-- this will break the string on spaces.
CREATE procedure ParseWords (IN p_str TEXT, IN p_regex varchar(256))
begin
declare v_startPos int;
declare v_strLen int;
declare v_wordLen int;
set v_startPos=1;
set v_strLen=char_length( p_str )+1;
delimiter ;
DELIMITER $$
set v_endpos=1;
set v_len=1+char_length( p_str );
while (( substr( p_str, 1, v_endpos) REGEXP p_regex)=0 and (v_endpos<v_len))
do
set v_endpos = v_endpos + 1;
end while;
return v_endpos;
END $$
DELIMITER ;
Here is a quick and dirty find of start position. It will find the minimal match instead of the
maximal pattern match. Please feel free to modify this to find the maximal pattern match.
DELIMITER $$
return v_startpos;
END $$
DELIMITER ;
The extract uses the above two functions, so it will likewise extract the minimal pattern.
DELIMITER $$
return mid(p_str,startpos,endpos-startpos+1);
end $$
DELIMITER ;
Posted by Vector Thorn on June 7, 2010
I don't know if anyone else ever needed to convert to base32 in a non-standardized way, but
if so, I found a great way to do it here, and they even provide MySQL functions for it:
http://ionisis.com/?a=WCMS_Page_Display&id=27576001275882717
To work around this limitation, one can use nested IF() functions like this:
This inserts 'edient(s)' between the 'g' and ':' in 'Ing:', giving us 'Ingredient(s):'. It also
tests t.field_name to see whether it should be updated. This relies on knowing whether
t.field_name begins with 'Ing:' or not; if it does, we spell it out. You can also expand the
abbreviation anywhere in t.field_name, not just at the start or end as this might suggest. Use
INSTR(t.field_name, 'str_to_expand'), so it would end up looking like:
If you know that the abbreviation occurs in more than one place within a field (aka
column), just run the command again. In both cases the number zero '0' is the key: it tells
the INSERT command not to overwrite any of the following characters, just to insert the
requested sub-string.
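As a rough illustration of the zero-length INSERT() described above, the following sketch uses literal strings in place of t.field_name. The sample values and the @s variable are made up for the example, and this is not necessarily the original poster's statement:

```sql
-- Length 0 means nothing is overwritten; 'edient(s)' is inserted at position 4.
SELECT INSERT('Ing: flour', 4, 0, 'edient(s)') AS expanded;
-- returns 'Ingredient(s): flour'

-- Expanding the abbreviation anywhere in the string: INSTR() finds 'Ing:',
-- and + 3 moves just past the 'g'. IF() leaves strings without 'Ing:' untouched.
SET @s = 'Main Ing: flour';
SELECT IF(INSTR(@s, 'Ing:') > 0,
          INSERT(@s, INSTR(@s, 'Ing:') + 3, 0, 'edient(s)'),
          @s) AS expanded;
-- returns 'Main Ingredient(s): flour'
```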
Posted by Benjamin Bouraada on October 27, 2010
Stringcutting:
DELIMITER |
CREATE PROCEDURE `SPLIT_STRING` (IN `MY_STRING` TEXT, IN
`MY_DELIMITER` TEXT)
LANGUAGE SQL
SQL SECURITY INVOKER
BEGIN
#-------------------------------------------------------------------------------
IF NOT ISNULL(MY_STRING) THEN
IF NOT ISNULL(MY_DELIMITER) THEN
#
SET @SS = TRIM(MY_STRING);
SET @DEL = TRIM(MY_DELIMITER);
#
IF LENGTH(@SS) > 0 THEN
IF LENGTH(@DEL) > 0 THEN
#
SET @DP = (SELECT LOCATE(@DEL, @SS, 1));
IF @DP > 0 THEN
#------------------------CREATE TEMP TABLE-----------------------
DROP TABLE IF EXISTS `TEMPORARY_TABLE_OF_SPLIT_STRINGS`;
#
CREATE TEMPORARY TABLE `TEMPORARY_TABLE_OF_SPLIT_STRINGS` (
`SUB_STRING` text CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL
)
ENGINE=INNODB
CHARACTER SET utf8
COLLATE utf8_general_ci ;
#----------------------------------------------------------------
SET @SS_2 = @SS;
#
REPEAT
#
SET @FIRST_ELEMENT = (SELECT SUBSTRING_INDEX(@SS_2, @DEL, 1));
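# Cut off exactly one leading element plus delimiter. TRIM(LEADING ...) would strip
# repeated matches and lose elements in lists such as 'a,a,b'.
SET @SS_2 = SUBSTRING(@SS_2, CHAR_LENGTH(@FIRST_ELEMENT) + CHAR_LENGTH(@DEL) + 1);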
#
INSERT INTO `TEMPORARY_TABLE_OF_SPLIT_STRINGS` (`SUB_STRING`)
VALUES (@FIRST_ELEMENT);
SET @DP = (SELECT LOCATE(@DEL, @SS, @DP + 1));
#
IF @DP = 0 THEN
SET @LAST_ELEMENT = (SELECT SUBSTRING_INDEX(@SS_2, @DEL, -1));
INSERT INTO `TEMPORARY_TABLE_OF_SPLIT_STRINGS` (`SUB_STRING`)
VALUES (@LAST_ELEMENT);
END IF;
UNTIL @DP = 0
END REPEAT;
#
SELECT * FROM TEMPORARY_TABLE_OF_SPLIT_STRINGS;
#----------------------------------------------------------------
DROP TABLE IF EXISTS `TEMPORARY_TABLE_OF_SPLIT_STRINGS`;
#----------------------------------------------------------------
ELSE
SELECT NULL;
END IF;
ELSE
SELECT NULL;
END IF;
ELSE
SELECT NULL;
END IF;
ELSE
SELECT NULL;
END IF;
ELSE
SELECT NULL;
END IF;
END; |
DELIMITER ;
Posted by Juan Andrés Calleja on February 14, 2011
Hi, I found a useful example in the MySQL forums (http://lists.mysql.com/mysql/199134) for
splitting a varchar for use in the 'IN' clause of a query. I added a new input parameter with
the table name, for use in a dynamic CREATE TABLE statement.
This way the split function can be used more than once in the same stored procedure, because
the table can have any name.
DELIMITER $$
IF cur_position = 0 THEN
SET cur_string = remainder;
ELSE
SET cur_string = LEFT(remainder, cur_position - 1);
END IF;
Here is the difference between the string functions LOCATE and FIND_IN_SET.
Example:
Suppose I need to return 1 if 2 is in the set '1,2,3,4,5'.
SELECT IF(LOCATE(2,'1,2,3,4,5,6,7,8,9')>0,1,0);
This returns 1, because the given set contains the value 2.
So far there is no problem.
SELECT IF(LOCATE(2,'11,12,3,4,5,6,7,8,9')>0,1,0);
Even though 2 is not in the set, this also gives 1.
Here the LOCATE function treats the set as a plain string, not as comma-separated values,
so it matches the '2' inside '12'.
In this situation, use FIND_IN_SET instead, which is a great function for
comma-separated value sets.
Now,
SELECT IF(FIND_IN_SET(2,'11,12,3,4,5,6,7,8,9')>0,1,0);
It returns 0, as we expected, because 2 is not an element of the set.
Note:
1. Use the LOCATE function for alphabetic strings only.
2. LOCATE is also safe for numbers when the set contains only the single digits
0,1,2,3,4,5,6,7,8,9
i.e.,
SELECT IF(LOCATE(input,'0,1,2,3,4,5,6,7,8,9')>0,1,0);
http://thesocialexpo.com/?a=SUBS_Blog_Display&id=13023741560341786
Posted by Lies DJILLALI on August 31, 2011
Thank you, Giovanni, for your strip_tags function.
Here is a patched version, because MySQL crashed when I tried to process a NULL value:
delimiter ||
DELIMITER $$
DROP FUNCTION IF EXISTS `initcap`$$
CREATE FUNCTION `initcap`(x varchar(255)) RETURNS varchar(255) CHARSET utf8
DETERMINISTIC
begin
set @out_str='';
set @l_str='';
set @r_str='';
DELIMITER ;
Posted by Paul Caskey on November 10, 2011
For DATE and DATETIME operations, MySQL demands the year be first, then month, then
day. In the USA, a common date format is month-day-year. This example converts a date
from MM-DD-YYYY format to MySQL's preferred YYYY-MM-DD. It also works on M-D-YY
input, or other shortcut forms. Change the '-' separator to '.' or whatever you need. This
takes 6-13-2011 and returns a STRING of '2011-6-13':
Now you can CAST this to a DATE, and then it will ORDER BY or GROUP BY properly.
E.g. this takes '11.1.2011' and returns a real DATE of 2011-11-01. As usual I'm sure there are
other ways to do this. I was just happy to figure this out without resorting to PHP or Perl.
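The poster's own statements are not reproduced here. As one possible sketch of the conversion described, assuming the input is held in a user variable @d (a name chosen for this example), the three parts can be reordered with SUBSTRING_INDEX() and CONCAT_WS():

```sql
-- Reorder month-day-year into year-month-day; the result is still a string.
SET @d = '6-13-2011';
SELECT CONCAT_WS('-',
         SUBSTRING_INDEX(@d, '-', -1),                          -- year
         SUBSTRING_INDEX(@d, '-', 1),                           -- month
         SUBSTRING_INDEX(SUBSTRING_INDEX(@d, '-', 2), '-', -1)  -- day
       ) AS ymd;
-- returns '2011-6-13'

-- Same idea with '.' as the separator, cast to a real DATE.
SET @d = '11.1.2011';
SELECT CAST(CONCAT_WS('-',
         SUBSTRING_INDEX(@d, '.', -1),
         SUBSTRING_INDEX(@d, '.', 1),
         SUBSTRING_INDEX(SUBSTRING_INDEX(@d, '.', 2), '.', -1)
       ) AS DATE) AS real_date;
-- returns 2011-11-01
```

When the input format is fixed, STR_TO_DATE(@d, '%m.%d.%Y') is another way to get a DATE directly.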
So I ran...
Here "id" is just some unique identifier for each field you're splitting up and "list" is the
list of separated values. The idea behind the query is to join a table to itself, rank the rows
for each id, then show only the rows where the rank is less than the number of occurrences of
the separator in the list you're splitting up. The outermost select then shows the value
between the rank-th and (rank + 1)-th occurrence of the separator in the list.
This may not work if some of the lists don't contain the separator at all.
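The original query is not shown, so the following is only a sketch of the same idea using a different technique: instead of ranking rows from a self-join, it joins against a small derived table of integers (nums, hypothetical) and uses nested SUBSTRING_INDEX() calls. The sample data in t is also made up:

```sql
-- One output row per list element. The join condition limits each id to
-- (number of separators + 1) rows, so a list without any separator still yields one row.
SELECT t.id,
       nums.n AS position,
       SUBSTRING_INDEX(SUBSTRING_INDEX(t.list, ',', nums.n), ',', -1) AS item
FROM (SELECT 1 AS id, 'a,b,c' AS list
      UNION ALL SELECT 2, 'x') AS t
JOIN (SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) AS nums
  ON nums.n <= 1 + CHAR_LENGTH(t.list) - CHAR_LENGTH(REPLACE(t.list, ',', ''))
ORDER BY t.id, nums.n;
-- rows: (1,1,'a'), (1,2,'b'), (1,3,'c'), (2,1,'x')
```

The integer table must contain at least as many rows as the longest list has elements.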
So, if you want to check whether the contents of one column are in another column:
http://www.edmondscommerce.co.uk/mysql/compare-two-columns-in-mysql/
Posted by halászsándor halászsándor on February 13, 2012
I had the same problem that Edmonds described, and for that I used this expression:
-Mo
For example, for the text "0,1,3,5,6", say you want to get the third element. This would do
the trick:
P.S.: This seems to be a simplification of the previous example (which I managed to miss).
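For the "0,1,3,5,6" example, a sketch of the nested SUBSTRING_INDEX() form (the same shape as the expression credited to Mariano Otero further down this page) looks like this:

```sql
-- Inner call keeps the first three elements ('0,1,3');
-- outer call keeps what follows the last comma of that result.
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('0,1,3,5,6', ',', 3), ',', -1) AS third_element;
-- returns '3'
```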
Posted by die manto on February 7, 2013
Tip to compare two columns:
SELECT *
FROM
`table`
WHERE
`col1` LIKE CONCAT('%', `col2`, '%')
OR col2 LIKE CONCAT('%',`col1`,'%')
Posted by http://www.competenciaperfecta.com/
This code updates the column named 'phone_number' in the table called 'user' by
concatenating '0' in front of the new phone_number.
The new phone_number is old phone_number minus the first 4 characters or beginning from
the 5th character.
The update will only be applied to the records with id between 3 and 30 exclusive.
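The statement itself is missing here. A sketch that matches the description, assuming a table `user` with columns id and phone_number as named in the text, might be:

```sql
-- Preview the change first: SUBSTRING(phone_number, 5) drops the first 4 characters.
SELECT id,
       phone_number,
       CONCAT('0', SUBSTRING(phone_number, 5)) AS new_phone_number
FROM `user`
WHERE id > 3 AND id < 30;

-- Apply it. The strict comparisons exclude ids 3 and 30 themselves.
UPDATE `user`
SET phone_number = CONCAT('0', SUBSTRING(phone_number, 5))
WHERE id > 3 AND id < 30;
```

Note that BETWEEN 3 AND 30 would be inclusive, so strict comparisons are used for the "exclusive" range.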
REPEAT
SET @char= MID(p_string,@x,1);
IF @uc=1 THEN
SET @out= CONCAT(@out,UPPER(@char));
ELSE
SET @out= CONCAT(@out,LOWER(@char)) ;
END IF;
SET @x= @x + 1;
UNTIL @x > @len END REPEAT;
RETURN @out;
END
Posted by Shivakumar Durg on August 7, 2014
The following formula can be used to extract the Nth item in a delimited list.
SET x := x + 1;
UNTIL x > input_string_length END REPEAT;
END IF;
RETURN output_string;
END
#########################
## Usage (all lower case input):
select str_titlecase('i am a cat') as title from dual;
## Results:
title
------
I Am A Cat
## Results:
title
------
## Results:
title
------
I Am A Dolphin
## Results:
title
------
I Am The Product Of Your Imagination
REPEAT
SET endpos = LOCATE(delim, haystack, inipos);
SET item = SUBSTR(haystack, inipos, endpos - inipos);
RETURN needleFound;
END
Posted by Jens Walte on April 29, 2015
fastest split() function
/**
* #1: 10-20% faster than #3 and, unlike #2, handles an out-of-range index (returns '')
*
* @example: split('a|bbb|cc', '|', 0) -> 'a'
* @example: split('a|bbb|cc', '|', 1) -> 'bbb'
* @example: split('a|bbb|cc', '|', 2) -> 'cc'
* @example: split('a|bbb|cc', '|', 3) -> ''
*/
substring_index(substring_index(concat(content, delimiter), delimiter, index+1), delimiter, -1);
/**
* #2: faster than #3, but an out-of-range index returns the last entry
*
* @see: Posted by Mariano Otero on June 22 2012 3:43pm
* @example: split('a|bbb|cc', '|', 0) -> 'a'
* @example: split('a|bbb|cc', '|', 1) -> 'bbb'
* @example: split('a|bbb|cc', '|', 2) -> 'cc'
* @example: split('a|bbb|cc', '|', 3) -> 'cc' (unexpected)
*/
substring_index(substring_index(content, delimiter, index+1), delimiter, -1);
/**
* #3: the first split example posted; handles an out-of-range index
*
* @see: Posted by Bob Collins on March 17 2006 8:56pm
* @example: split('a|bbb|cc', '|', 0) -> 'a'
* @example: split('a|bbb|cc', '|', 1) -> 'bbb'
* @example: split('a|bbb|cc', '|', 2) -> 'cc'
* @example: split('a|bbb|cc', '|', 3) -> ''
*/
replace(substring(substring_index(content, delimiter, index+1),
length(substring_index(content, delimiter, index)) + 1), delimiter, '');
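To see the difference between variants #1 and #2 side by side, here is a small sketch that evaluates both for indexes 0 to 3. The user variables @content and @delim and the derived table idx_list are stand-ins chosen for this example, since content, delimiter and index above are placeholders:

```sql
SET @content = 'a|bbb|cc', @delim = '|';

SELECT idx_list.idx,
       -- #1: appending the delimiter makes an out-of-range index return ''
       SUBSTRING_INDEX(
         SUBSTRING_INDEX(CONCAT(@content, @delim), @delim, idx_list.idx + 1),
         @delim, -1) AS variant_1,
       -- #2: an out-of-range index falls back to the last entry
       SUBSTRING_INDEX(
         SUBSTRING_INDEX(@content, @delim, idx_list.idx + 1),
         @delim, -1) AS variant_2
FROM (SELECT 0 AS idx UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS idx_list
ORDER BY idx_list.idx;
-- idx 0: 'a',   'a'
-- idx 1: 'bbb', 'bbb'
-- idx 2: 'cc',  'cc'
-- idx 3: '',    'cc'
```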
He uses this effectively simplified formula to find the last occurrence of a separator (here,
the space " "):
To retrieve the leftmost characters before that position in the given string he uses the
"left" function, but to retrieve the rightmost ones he did:
substr(@string,@loc+1)
The implementation would be more symmetric if the "right" function were used instead.
That is: