How Strings Are Stored
SET ANSI_PADDING { ON | OFF }
Controls the way a column stores values shorter than its defined size, and the way it stores values that have trailing blanks (char and varchar data) or trailing zeros (binary and varbinary data). When values are padded, char columns are padded with blanks and binary columns are padded with zeros. When values are trimmed, char columns have their trailing blanks trimmed and binary columns have their trailing zeros trimmed.
Searching Text
ANSI_PADDING Setting

ON
  char(n) NOT NULL or binary(n) NOT NULL: Pad original value (with trailing blanks for char columns and with trailing zeros for binary columns) to the length of the column.
  char(n) NULL or binary(n) NULL: Follows same rules as for char(n) or binary(n) NOT NULL when SET ANSI_PADDING is ON.
  varchar(n) or varbinary(n): Trailing blanks in character values inserted into varchar columns are not trimmed. Trailing zeros in binary values inserted into varbinary columns are not trimmed. Values are not padded to the length of the column.

OFF
  char(n) NOT NULL or binary(n) NOT NULL: Pad original value (with trailing blanks for char columns and with trailing zeros for binary columns) to the length of the column.
  char(n) NULL or binary(n) NULL: Follows same rules as for varchar or varbinary when SET ANSI_PADDING is OFF.
  varchar(n) or varbinary(n): Trailing blanks in character values inserted into a varchar column are trimmed. Trailing zeros in binary values inserted into a varbinary column are trimmed.
The SET ANSI_PADDING setting does not affect the nchar, nvarchar, ntext, text, image, and large value data types (varchar(max), nvarchar(max), varbinary(max)). They always exhibit the SET ANSI_PADDING ON behavior: trailing spaces and zeros are not trimmed. ANSI_PADDING should always be set to ON.
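The two behaviors can be observed directly. The sketch below assumes a scratch session where temporary tables are allowed; #padding_check is a hypothetical name. It relies on DATALENGTH, which reports the bytes actually stored (LEN would hide trailing blanks), and on the is_ansi_padded column of sys.columns, which records the setting each column was created under.

```sql
-- Sketch: observe ANSI_PADDING ON behavior (the recommended setting).
-- #padding_check is a hypothetical scratch table.
SET ANSI_PADDING ON;

-- SESSIONPROPERTY returns 1 when the option is ON for the current session.
SELECT SESSIONPROPERTY('ANSI_PADDING') AS session_ansi_padding;

IF OBJECT_ID('tempdb..#padding_check') IS NOT NULL
    DROP TABLE #padding_check;

CREATE TABLE #padding_check (
    c char(10)    NULL,
    v varchar(10) NULL
);
INSERT INTO #padding_check VALUES ('abc', 'abc   ');

-- char(10) is padded to 10 bytes; the varchar keeps its 3 trailing blanks (6 bytes).
SELECT DATALENGTH(c) AS char_bytes,
       DATALENGTH(v) AS varchar_bytes
FROM #padding_check;

-- sys.columns records the padding behavior each column was created with.
SELECT name, is_ansi_padded
FROM tempdb.sys.columns
WHERE object_id = OBJECT_ID('tempdb..#padding_check');
```

Running the same script after SET ANSI_PADDING OFF would, per the rules above, store 3 bytes in both columns, because a nullable char column then follows the varchar rules.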
Example
PRINT 'Testing with ANSI_PADDING ON';
SET ANSI_PADDING ON;
GO

CREATE TABLE t1 (
    charcol      CHAR(16)    NULL,
    varcharcol   VARCHAR(16) NULL,
    varbinarycol VARBINARY(8)
);
GO

INSERT INTO t1 VALUES ('No blanks', 'No blanks', 0x00ee);
INSERT INTO t1 VALUES ('Trailing blank ', 'Trailing blank ', 0x00ee00);

SELECT 'CHAR'    = '>' + charcol + '<',
       'VARCHAR' = '>' + varcharcol + '<',
       varbinarycol
FROM t1;
GO
Output
CHAR               VARCHAR            varbinarycol
------------------ ------------------ ------------
>No blanks       < >No blanks<        0x00EE
>Trailing blank  < >Trailing blank <  0x00EE00

(2 row(s) affected)
Example
PRINT 'Testing with ANSI_PADDING OFF';
SET ANSI_PADDING OFF;
GO

CREATE TABLE t2 (
    charcol      CHAR(16)    NULL,
    varcharcol   VARCHAR(16) NULL,
    varbinarycol VARBINARY(8)
);
GO

INSERT INTO t2 VALUES ('No blanks', 'No blanks', 0x00ee);
INSERT INTO t2 VALUES ('Trailing blank ', 'Trailing blank ', 0x00ee00);

SELECT 'CHAR'    = '>' + charcol + '<',
       'VARCHAR' = '>' + varcharcol + '<',
       varbinarycol
FROM t2;
GO

-- Restore the recommended setting and clean up.
SET ANSI_PADDING ON;
GO
DROP TABLE t1;
DROP TABLE t2;
Output
CHAR               VARCHAR            varbinarycol
------------------ ------------------ ------------
>No blanks<        >No blanks<        0x00EE
>Trailing blank<   >Trailing blank<   0x00EE

(2 row(s) affected)
Comparison of Strings
When you compare character string data, the logical sequence of the characters is defined by the collation of the character data. The results of comparison operators such as < and > are controlled by the character sequence defined by the collation. The same SQL collation might have different sorting behavior for Unicode and non-Unicode data.
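As a minimal illustration, the query below compares the same literals under explicitly chosen collations. Latin1_General_CI_AS (case-insensitive), Latin1_General_CS_AS (case-sensitive) and Latin1_General_BIN (binary) are standard SQL Server collation names; these particular ones are picked only for illustration.

```sql
-- Equality: case matters only under the case-sensitive collation.
-- Ordering: dictionary order puts 'a' before 'B'; binary order compares
-- code points (0x61 vs 0x42), so 'B' comes first.
SELECT
    CASE WHEN 'abc' = 'ABC' COLLATE Latin1_General_CI_AS
         THEN 'equal' ELSE 'not equal' END AS case_insensitive,
    CASE WHEN 'abc' = 'ABC' COLLATE Latin1_General_CS_AS
         THEN 'equal' ELSE 'not equal' END AS case_sensitive,
    CASE WHEN 'a' < 'B' COLLATE Latin1_General_CI_AS
         THEN 'a sorts first' ELSE 'B sorts first' END AS dictionary_order,
    CASE WHEN 'a' < 'B' COLLATE Latin1_General_BIN
         THEN 'a sorts first' ELSE 'B sorts first' END AS binary_order;
```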
String Equivalence
Trailing blanks are ignored in comparisons; for example, these are equivalent:
WHERE LastName = 'White'
WHERE LastName = 'White '
WHERE LastName = 'White
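A quick check of this rule using only literals: the = comparison treats the two strings as equal, while DATALENGTH shows that the trailing blanks are really stored. LEN is included because it also ignores trailing blanks, which is easy to confuse with the comparison rule itself.

```sql
-- '=' pads the shorter operand with blanks before comparing,
-- so 'White' and 'White   ' (three trailing blanks) compare as equal.
SELECT
    CASE WHEN 'White' = 'White   ' THEN 'equal' ELSE 'not equal' END AS equality,
    LEN('White   ')        AS len_result,        -- LEN ignores trailing blanks: 5
    DATALENGTH('White   ') AS datalength_result; -- DATALENGTH counts every byte: 8
```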
LIKE
Determines whether a specific character string matches a specified pattern. A pattern can include regular characters and wildcard characters. During pattern matching, regular characters must exactly match the characters specified in the character string. However, wildcard characters can be matched with arbitrary fragments of the character string.
Syntax
match_expression [ NOT ] LIKE pattern [ ESCAPE escape_character ]

match_expression: Is any valid expression of character data type.
pattern: Is the specific string of characters to search for in match_expression, and can include the wildcard characters. pattern can be a maximum of 8,000 bytes.

Returns TRUE if match_expression matches the specified pattern.
Wildcard Characters

% (percent)
  Description: Any string of zero or more characters.
  Example: WHERE title LIKE '%computer%' finds all book titles with the word 'computer' anywhere in the book title.

_ (underscore)
  Description: Any single character.
  Example: WHERE au_fname LIKE '_ean' finds all four-letter first names that end with ean (Dean, Sean, and so on).

[ ]
  Description: Any single character within the specified range ([a-f]) or set ([abcdef]).
  Example: WHERE au_lname LIKE '[C-P]arsen' finds author last names ending with arsen and starting with any single character between C and P, for example Carsen, Larsen, Karsen, and so on.

[^]
  Description: Any single character not within the specified range ([^a-f]) or set ([^abcdef]).
  Example: WHERE au_lname LIKE 'de[^l]%' finds all author last names starting with de where the following letter is not l.

Syntax
escape_character: Is a character that is put in front of a wildcard character to indicate that the wildcard should be interpreted as a regular character and not as a wildcard. escape_character is a character expression that has no default and must evaluate to only one character.
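To try each wildcard, the sketch below loads a few hypothetical rows into a temporary table (#authors; the names are invented for the example) and runs the patterns from the list above. Results for a range such as [C-P] depend on the collation's sort order, so the comments assume a typical Latin1 dictionary-order collation.

```sql
IF OBJECT_ID('tempdb..#authors') IS NOT NULL
    DROP TABLE #authors;

-- Hypothetical sample data.
CREATE TABLE #authors (au_fname varchar(20), au_lname varchar(30));
INSERT INTO #authors VALUES ('Dean',     'Carsen');
INSERT INTO #authors VALUES ('Sean',     'Larsen');
INSERT INTO #authors VALUES ('Jean-Luc', 'Barsen');
INSERT INTO #authors VALUES ('Ann',      'de Vries');
INSERT INTO #authors VALUES ('Maria',    'del Castillo');

-- _ : exactly one character before 'ean' -> Dean, Sean (not Jean-Luc)
SELECT au_fname FROM #authors WHERE au_fname LIKE '_ean';

-- % : any run of characters -> Carsen, Larsen, Barsen
SELECT au_lname FROM #authors WHERE au_lname LIKE '%sen';

-- [ ] : first character in the range C..P -> Carsen, Larsen (B is outside the range)
SELECT au_lname FROM #authors WHERE au_lname LIKE '[C-P]arsen';

-- [^] : third character is not 'l' -> de Vries (not del Castillo)
SELECT au_lname FROM #authors WHERE au_lname LIKE 'de[^l]%';
```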
Example
-- ASCII pattern matching with char column
CREATE TABLE t (col1 char(30));
INSERT INTO t VALUES ('Robert King');
SELECT * FROM t WHERE col1 LIKE '% King';  -- returns 1 row
Example
-- Unicode pattern matching with nchar column
IF OBJECT_ID('t') IS NOT NULL
    DROP TABLE t;  -- the char example above also creates a table named t
CREATE TABLE t (col1 nchar(30));
INSERT INTO t VALUES (N'Robert King');
SELECT * FROM t WHERE col1 LIKE N'% King';  -- no rows returned

-- Unicode pattern matching with nchar column and RTRIM
SELECT * FROM t WHERE RTRIM(col1) LIKE N'% King';  -- returns 1 row
RTRIM
RTRIM ( character_expression )
Returns a character string after truncating all trailing blanks.
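The effect of RTRIM on a fixed-length Unicode value is easiest to see with DATALENGTH. This sketch reuses the 'Robert King' value from the nchar example and needs nothing beyond a SQL Server session; @n is a hypothetical variable name.

```sql
DECLARE @n nchar(30);
SET @n = N'Robert King';

SELECT DATALENGTH(@n)         AS padded_bytes,   -- 60: nchar(30) always holds 30 characters
       DATALENGTH(RTRIM(@n))  AS trimmed_bytes,  -- 22: 'Robert King' is 11 characters x 2 bytes
       '>' + RTRIM('   leading kept   ') + '<' AS only_trailing_removed;  -- leading blanks stay
```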
Remarks
When you perform string comparisons by using LIKE, all characters in the pattern string are significant. This includes leading or trailing spaces. If a comparison in a query is to return all rows with a string LIKE 'abc ' (abc followed by a single space), a row in which the value of that column is abc (abc without a space) is not returned. However, trailing blanks, in the expression to which the pattern is matched, are ignored in ASCII pattern matching. If a comparison in a query is to return all rows with the string LIKE 'abc' (abc without a space), all rows that start with abc and have zero or more trailing blanks are returned.
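The two rules in these remarks can be checked with varchar literals alone, so ASCII pattern matching applies. This is a sketch; the expected results in the comments restate the remarks above rather than independent measurements.

```sql
SELECT
    -- Trailing blank in the pattern is significant: 'abc' does not match 'abc '.
    CASE WHEN 'abc'   LIKE 'abc ' THEN 'match' ELSE 'no match' END AS blank_in_pattern,
    -- Trailing blanks in the matched value are ignored: 'abc  ' matches 'abc'.
    CASE WHEN 'abc  ' LIKE 'abc'  THEN 'match' ELSE 'no match' END AS blanks_in_value;
```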
Remarks
A string comparison using a pattern that contains char and varchar data may not pass a LIKE comparison because of how the data is stored.
Example
USE AdventureWorks;
GO
CREATE PROCEDURE FindEmployee @EmpLName char(20)
AS
SELECT @EmpLName = RTRIM(@EmpLName) + '%';
SELECT c.FirstName, c.LastName, a.City
FROM Person.Contact c
JOIN Person.Address a ON c.ContactID = a.AddressID
WHERE c.LastName LIKE @EmpLName;
GO
EXEC FindEmployee @EmpLName = 'Barb';
GO
Example
In the FindEmployee procedure, no rows are returned because the char variable (@EmpLName) contains trailing blanks whenever the name contains fewer than 20 characters: even after RTRIM(@EmpLName) + '%', assigning the result back to the char(20) variable pads it with blanks again. Because the LastName column is varchar, its values have no trailing blanks. The procedure fails because the trailing blanks in the pattern are significant. However, the following example succeeds because trailing blanks are not added to a varchar variable.
Example
USE AdventureWorks;
GO
-- Drop the char(20) version of the procedure before re-creating it.
IF OBJECT_ID('FindEmployee', 'P') IS NOT NULL
    DROP PROCEDURE FindEmployee;
GO
CREATE PROCEDURE FindEmployee @EmpLName varchar(20)
AS
SELECT @EmpLName = RTRIM(@EmpLName) + '%';
SELECT c.FirstName, c.LastName, a.City
FROM Person.Contact c
JOIN Person.Address a ON c.ContactID = a.AddressID
WHERE c.LastName LIKE @EmpLName;
GO
EXEC FindEmployee @EmpLName = 'Barb';
GO
Output
FirstName  LastName   City
---------- ---------- ----------
Angela     Barbariol  Snohomish
David      Barber     Snohomish

(2 row(s) affected)
NOT LIKE
If preceded by NOT, LIKE returns TRUE if match_expression does not match the pattern.
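A small sketch of NOT LIKE, using a hypothetical #titles table. Note the NULL row: NULL NOT LIKE pattern evaluates to UNKNOWN, so NOT LIKE does not return rows where the column is NULL.

```sql
IF OBJECT_ID('tempdb..#titles') IS NOT NULL
    DROP TABLE #titles;

CREATE TABLE #titles (title varchar(80) NULL);
INSERT INTO #titles VALUES ('Intro to computer networks');
INSERT INTO #titles VALUES ('Cooking for beginners');
INSERT INTO #titles VALUES ('The computer and you');
INSERT INTO #titles VALUES (NULL);

-- Returns only 'Cooking for beginners'; the NULL row is filtered out as UNKNOWN.
SELECT title
FROM #titles
WHERE title NOT LIKE '%computer%';
```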
ESCAPE Characters
You can search for character strings that include one or more of the special wildcard characters. For example, a sample database contains a column named comment that contains the text 30%. To search for any rows that contain the string 30% anywhere in the comment column, specify a WHERE clause such as WHERE comment LIKE '%30!%%' ESCAPE '!'. If ESCAPE and the escape character are not specified, the Database Engine returns any rows with the string 30!.
ESCAPE Characters
The character after the escape character is interpreted literally, not as a wildcard character.
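The 30% example can be reproduced with a hypothetical #feedback table. The last query shows a second, commonly used way to match a wildcard literally: enclosing it in brackets, as in [%].

```sql
IF OBJECT_ID('tempdb..#feedback') IS NOT NULL
    DROP TABLE #feedback;

CREATE TABLE #feedback (comment varchar(100));
INSERT INTO #feedback VALUES ('Got 30% off my order');
INSERT INTO #feedback VALUES ('Delivered in 30 days');
INSERT INTO #feedback VALUES ('Item 304 was missing');

-- '!' marks the following '%' as a literal percent sign: only the first row matches.
SELECT comment FROM #feedback WHERE comment LIKE '%30!%%' ESCAPE '!';

-- Without an escape, the '%' after 30 is a wildcard: all three rows match.
SELECT comment FROM #feedback WHERE comment LIKE '%30%%';

-- A wildcard inside brackets is also matched literally: only the first row matches.
SELECT comment FROM #feedback WHERE comment LIKE '%30[%]%';
```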
Full-text catalog
A full-text catalog contains zero or more full-text indexes. Full-text catalogs must reside on a local hard drive associated with the instance of SQL Server. Each catalog can serve the indexing needs of one or more tables within a database.
Full-Text Engine
Full-Text Search in Microsoft SQL Server 2005 is powered by the Microsoft Full-Text Engine for SQL Server (MSFTESQL). The MSFTESQL service has two roles: indexing support and querying support. It runs as a separate process.
Stemmers
For a given language, a stemmer generates inflectional forms of a particular word based on the rules of that language. Stemmers are language specific.
Filters
When a cell in a varbinary(max) or image column contains a document with a certain file extension, full-text search uses a filter to interpret the binary data. The filter extracts the textual information from the document and submits it for indexing.
Filters
Many document types can be stored in a single varbinary(max) or image column. For each document type, SQL Server chooses the correct filter based on the file extension. Because the file extension is not visible when the file is stored in a varbinary(max) or image column, the file extension must be stored in a separate column in the table, called a type column. This type column can be of any character-based data type and contains the document file extension, such as .doc for a Microsoft Word document.
Filters
In the Document table in AdventureWorks, the Document column is of type varbinary(max), and the FileExtension column is of type nvarchar(8). When creating a full-text index on a varbinary(max) or image column, you must identify a corresponding type column that has the extension information so that SQL Server knows which filter to use.
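To see which filters are registered and which extensions the type column actually holds, a sketch like the following can help. It assumes the AdventureWorks sample database and uses the sys.fulltext_document_types catalog view, which lists the document types (file extensions) for which a filter is registered.

```sql
USE AdventureWorks;
GO
-- Document types (file extensions) that have a registered filter, and the filter DLL.
SELECT document_type, path
FROM sys.fulltext_document_types
ORDER BY document_type;

-- Extensions stored in the type column of the sample table.
SELECT FileExtension, COUNT(*) AS documents
FROM Production.Document
GROUP BY FileExtension;
```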
Index Population
The process of creating and maintaining a full-text index is called index population. Types of full-text index population:
- Full population
- Change tracking-based population
- Incremental timestamp-based population
Full Population
Typically occurs when a full-text catalog or full-text index is first populated. The indexes can then be maintained using change tracking or incremental timestamp-based populations. During a full population of a full-text catalog, index entries are built for all the rows in all the tables covered by the catalog. If a full population is requested for a table, index entries are built for all the rows in that table.
DocumentID  Title
1           Crank Arm and Tire Maintenance
2           Front Reflector Bracket and Reflector Assembly 3
3           Front Reflector Bracket Installation
Index Creation
Two steps:
1. Create a full-text catalog to store full-text indexes.
2. Create full-text indexes.
Example
To create a full-text catalog named AdvWksDocFTCat, use the CREATE FULLTEXT CATALOG statement:
CREATE FULLTEXT CATALOG AdvWksDocFTCat;
Example
CREATE FULLTEXT INDEX ON Production.Document
(
    Document                         --full-text index column name
        TYPE COLUMN FileExtension    --name of column that contains file type information
        LANGUAGE 0X0                 --0X0 is LCID for neutral language
)
KEY INDEX PK_Document_DocumentID     --Unique index
ON AdvWksDocFTCat
WITH CHANGE_TRACKING AUTO            --Population type
GO
Example
CHANGE_TRACKING AUTO Specifies that SQL Server automatically updates the full-text index as the data is modified in the associated tables. AUTO is the default.
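Once the index exists, population can also be controlled explicitly. A minimal sketch, assuming the AdvWksDocFTCat catalog and the Production.Document index created above: ALTER FULLTEXT INDEX ... START FULL POPULATION rebuilds the entries for every row of the table, and FULLTEXTCATALOGPROPERTY reports whether a population is still running.

```sql
USE AdventureWorks;
GO
-- 1 if the table has an active full-text index.
SELECT OBJECTPROPERTY(OBJECT_ID('Production.Document'), 'TableHasActiveFulltextIndex')
       AS has_active_fulltext_index;

-- Full population: index entries are rebuilt for all rows in the table.
ALTER FULLTEXT INDEX ON Production.Document START FULL POPULATION;

-- Catalog status: 0 means idle, 1 means a full population is in progress.
SELECT FULLTEXTCATALOGPROPERTY('AdvWksDocFTCat', 'PopulateStatus') AS populate_status;
```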
Full-Text Searching
To search full-text indexed columns, use the CONTAINS predicate in the WHERE clause of a query. CONTAINS supports:
- Searching for a specific word or phrase (simple term)
- Searching for the inflectional form of a specific word (generation term)
- Performing prefix searches
- Querying varbinary(max) and xml columns
- Searching for words or phrases close to another word or phrase (proximity term)
CONTAINS Syntax
CONTAINS
    ( { column_name | (column_list) | * }
      , '< contains_search_condition >'
      [ , LANGUAGE language_term ]
    )

< contains_search_condition > ::=
    { < simple_term >
    | < prefix_term >
    | < generation_term >
    | < proximity_term > }
  | { ( < contains_search_condition > )
      { < AND > | < AND NOT > | < OR > }
      < contains_search_condition > [ ...n ] }
CONTAINS Syntax
< simple_term > ::=
    word | " phrase "

< prefix_term > ::=
    { "word *" | "phrase *" }

< generation_term > ::=
    FORMSOF ( { INFLECTIONAL | THESAURUS } , < simple_term > [ ,...n ] )

< proximity_term > ::=
    { < simple_term > | < prefix_term > }
    { { NEAR | ~ } { < simple_term > | < prefix_term > } } [ ...n ]
CONTAINS Syntax
{ column_name | (column_list) | * }: Indicates the column, list of columns, or all full-text indexed columns (*) to search.
<simple_term>: Specifies a match for an exact word or a phrase. Examples of valid simple terms are "blue berry", blueberry, and "Microsoft SQL Server". Phrases should be enclosed in double quotation marks (""). Words in a phrase must appear in the database in the same order as specified in <contains_search_condition>.
Simple Term
<simple_term>: The search for characters in the word or phrase is not case sensitive. Noise words (such as a, and, or the) in full-text indexed columns are not stored in the full-text index. If a noise word is used in a single word search, SQL Server returns an error message indicating that the query contains only noise words. Punctuation is ignored. Therefore, CONTAINS(testing, '"computer failure"') matches a row with the value "Where is my computer? Failure to find it would be expensive."
Example
SELECT Name, ListPrice
FROM Production.Product
WHERE ListPrice = 80.99
  AND CONTAINS(Name, 'Mountain');
GO
Example
SELECT Name
FROM Production.Product
WHERE CONTAINS(Name, ' "Mountain" OR "Road" ');
GO
Prefix Term
< prefix_term > ::= { "word *" | "phrase *" }
All entries in the column that contain text beginning with the specified prefix will be returned. For example, to search for all rows that contain the prefix top-, as in topple, topping, and top itself, the query looks like this:
SELECT Description, ProductDescriptionID
FROM Production.ProductDescription
WHERE CONTAINS (Description, ' "top*" ');
GO
Generation Term
< generation_term > Specifies a match of words when the included simple terms include variants of the original word for which to search. INFLECTIONAL Specifies that the language-dependent stemmer is to be used on the specified simple term. Stemmer behavior is defined based on stemming rules of each specific language. The neutral language does not have an associated stemmer. The column language of the column(s) being queried is used to refer to the desired stemmer. If language_term is specified, the stemmer corresponding to that language is used.
Generation Term
THESAURUS Specifies that the thesaurus corresponding to the column full-text language, or the language specified in the query is used. The longest pattern or patterns from the simple_term are matched against the thesaurus and additional terms are generated to expand or replace the original pattern. If a match is not found for all or part of the simple_term, the non-matching portion is treated as a simple_term.
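Both generation forms are written inside the search-condition string. The sketch below assumes that Production.ProductDescription in AdventureWorks has a full-text index on Description (the prefix example above relies on the same index). Whether the THESAURUS query expands anything depends on the thesaurus file configured for the column language; an unedited thesaurus file may add no terms.

```sql
USE AdventureWorks;
GO
-- INFLECTIONAL: the column language's stemmer adds forms such as rides or riding.
SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, ' FORMSOF (INFLECTIONAL, ride) ');

-- THESAURUS: the term is expanded or replaced using the thesaurus for the language.
SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, ' FORMSOF (THESAURUS, metal) ');
```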
Proximity Term
<proximity_term>: Specifies a match of words or phrases that must be close to one another. <proximity_term> operates similarly to the AND operator: both require that more than one word or phrase exist in the column being searched. The closer together the words in <proximity_term> appear, the better the match.
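A sketch of a proximity search, assuming the same full-text index on Production.ProductDescription. CONTAINS only filters rows; to see the "better match" ordering, the second query uses CONTAINSTABLE, which returns the key of each matching row together with a RANK value.

```sql
USE AdventureWorks;
GO
-- Rows in which both words occur near each other.
SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, 'bike NEAR performance');

-- Same condition, ordered by relevance: rows where the terms are closer should rank higher.
SELECT pd.ProductDescriptionID, pd.Description, ct.[RANK]
FROM CONTAINSTABLE(Production.ProductDescription, Description,
                   'bike NEAR performance') AS ct
JOIN Production.ProductDescription AS pd
    ON pd.ProductDescriptionID = ct.[KEY]
ORDER BY ct.[RANK] DESC;
```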