Character Functions: Functions That Change the Case of Characters
Like most programming languages, the SAS System provides an extensive library of "built-in" functions. SAS has more than 190 functions for a variety of programming tasks. This tutorial will cover the syntax for invoking functions, an overview of the functions available, examples of commonly used functions, selected character handling and numeric functions, and some tricks and applications of functions that will surprise you.
Character Functions:
A major strength of SAS is its ability to work with character data, and the SAS character functions are essential to this. The collection of functions and call routines in this chapter allows you to do extensive manipulation on all sorts of character data.
Two old functions, UPCASE and LOWCASE, change the case of characters. A new function (as of Version 9), PROPCASE (proper case), capitalizes the first letter of each word.
Function: UPCASE
Purpose: To change all letters to uppercase. Note: The corresponding function LOWCASE changes uppercase to lowercase.
Syntax: UPCASE(character-value)
where character-value is any SAS character expression. If a length has not been previously assigned, the length of the resulting variable will be the length of the argument.
Program 1:
DATA MIXED;
   LENGTH A B C D E $ 1;
   INPUT A B C D E X Y;
DATALINES;
M f P p D 1 2
m f m F M 3 4
;
DATA UPPER;
   SET MIXED;
   ARRAY ALL_C[*] _CHARACTER_;
   DO I = 1 TO DIM(ALL_C);
      ALL_C[I] = UPCASE(ALL_C[I]);
   END;
   DROP I;
RUN;
PROC PRINT DATA=UPPER NOOBS;
   TITLE 'Listing of Data Set UPPER';
RUN;
Explanation:
Remember that upper- and lowercase values are represented by different internal codes. If you are testing for a value such as Y for a variable and the actual value is y, you will not get a match. Therefore, it is often useful to convert all character values to either upper- or lowercase before doing your logical comparisons. In this program, _CHARACTER_ is used in the ARRAY statement to represent all the character variables in the data set MIXED. A listing of data set UPPER verifies that all lowercase values were changed to uppercase.
Function: LOWCASE
Purpose: To change all letters to lowercase.
Syntax: LOWCASE(character-value)
where character-value is any SAS character expression. Note: The corresponding function UPCASE changes lowercase to uppercase.
Program 1:
Program to capitalize the first letter of the first and last name (using SUBSTR)
DATA CAPITALIZE;
   INFORMAT FIRST LAST $30.;
   INPUT FIRST LAST;
   FIRST = LOWCASE(FIRST);
   LAST = LOWCASE(LAST);
   SUBSTR(FIRST,1,1) = UPCASE(SUBSTR(FIRST,1,1));
   SUBSTR(LAST,1,1) = UPCASE(SUBSTR(LAST,1,1));
DATALINES;
ronald cODy
THomaS eDISON
albert einstein
;
PROC PRINT DATA=CAPITALIZE NOOBS;
   TITLE 'Listing of Data Set CAPITALIZE';
RUN;
Explanation:
This program capitalizes the first letter of the two character variables FIRST and LAST. The same technique could have other applications. The first step is to set all the letters to lowercase using the LOWCASE function. The first letter of each name is then turned back to uppercase: the SUBSTR function (on the right side of the equal sign) selects the first letter of the first and last names, and the UPCASE function capitalizes it. The SUBSTR function on the left side of the equal sign places this letter in the first position of each of the variables.
Function: PROPCASE
Purpose: To capitalize the first letter of each word in a string; the remaining letters of each word are converted to lowercase.
Syntax: PROPCASE(character-value)
Program:
   ...
ronald cODy
THomaS eDISON
albert einstein
;
PROC PRINT DATA=PROPER NOOBS;
   TITLE 'Listing of Data Set PROPER';
RUN;
Explanation:
In this program, you use the PROPCASE function to capitalize the first letter of the first and last names.
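Only the end of the PROPCASE program survives above. The following is a minimal sketch of a complete version. It assumes that each name is read into a single variable called NAME with a $60. informat. Both the variable name and the informat are assumptions and do not come from the original.

```sas
/* Minimal sketch: the variable NAME and the $60. informat are assumptions */
DATA PROPER;
   INPUT NAME $60.;
   NAME = PROPCASE(NAME);   /* first letter of each word uppercase, the rest lowercase */
DATALINES;
ronald cODy
THomaS eDISON
albert einstein
;
PROC PRINT DATA=PROPER NOOBS;
   TITLE 'Listing of Data Set PROPER';
RUN;
```

Because PROPCASE handles every word in the string, no separate SUBSTR or UPCASE steps are needed, unlike the earlier CAPITALIZE program.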
Functions That Remove Characters from Strings
1) COMPBL (compress blanks) can replace multiple blanks with a single blank.
2) COMPRESS can remove not only blanks but also any characters you specify from a string.
Function: COMPBL
Purpose: To replace all occurrences of two or more blanks with a single blank character. This is particularly useful for standardizing addresses and names where multiple blanks may have been entered.
Syntax: COMPBL(character-value)
Program:
   ...
NORTH CITY NY 11518
;
PROC PRINT DATA=SQUEEZE;
   TITLE 'Listing of Data Set SQUEEZE';
   ID NAME;
   VAR ADDRESS CITY STATE ZIP;
RUN;
Explanation:
Each line of the addresses was passed through the COMPBL function to replace any sequence of two or more blanks with a single blank.
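The COMPBL program above is also incomplete. The following is a minimal sketch of a full version that matches the variable names in the surviving PROC PRINT (NAME, ADDRESS, CITY, STATE, ZIP). The three-line record layout, the column positions, and all data lines are assumptions.

```sas
/* Minimal sketch: record layout, columns, and data lines are hypothetical */
DATA SQUEEZE;
   INPUT #1 @1  NAME    $20.
         #2 @1  ADDRESS $30.
         #3 @1  CITY    $15.
            @20 STATE   $2.
            @25 ZIP     $5.;
   NAME    = COMPBL(NAME);      /* collapse each run of blanks to one blank */
   ADDRESS = COMPBL(ADDRESS);
   CITY    = COMPBL(CITY);
DATALINES;
RON   CODY
89   LAZY BROOK    ROAD
FLEMINGTON         NJ   08822
BILL     BROWN
28   CATHY   STREET
NORTH   CITY       NY   11518
;
PROC PRINT DATA=SQUEEZE;
   TITLE 'Listing of Data Set SQUEEZE';
   ID NAME;
   VAR ADDRESS CITY STATE ZIP;
RUN;
```

STATE and ZIP are read from fixed columns, so they are not passed through COMPBL.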
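Function: COMPRESS
Purpose: To remove specified characters from a character value.
Syntax: COMPRESS(character-value [,'compress-list'])
where compress-list contains the characters to remove. If it is omitted, only blanks are removed.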
Program 1:
   ...
(908)235-4490
(201) 555-77 99
;
PROC PRINT DATA=PHONE_NUMBER;
   TITLE 'Listing of Data Set PHONE_NUMBER';
RUN;
Explanation: For the variable PHONE1, the second argument is omitted from the COMPRESS function, so only blanks are removed. For PHONE2, left and right parentheses, dashes, and blanks are listed in the second argument, so all of these characters are removed from the character value.
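Only the data lines and the PROC PRINT of this program survive. The following is a minimal sketch of the complete DATA step, consistent with the explanation above. It assumes that the raw phone number is read into a variable named PHONE in columns 1-15; the variable name and the columns are assumptions.

```sas
/* Minimal sketch: the input variable PHONE and columns 1-15 are assumptions */
DATA PHONE_NUMBER;
   INPUT PHONE $ 1-15;
   PHONE1 = COMPRESS(PHONE);          /* no second argument: remove blanks only */
   PHONE2 = COMPRESS(PHONE,'(-) ');   /* remove parentheses, dashes, and blanks */
DATALINES;
(908)235-4490
(201) 555-77 99
;
PROC PRINT DATA=PHONE_NUMBER;
   TITLE 'Listing of Data Set PHONE_NUMBER';
RUN;
```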
Functions That Search for Characters
Functions in this category allow you to search a string for specific characters or for a character category (such as a digit). Some of these functions can also locate the first position in a string where a character does not meet a particular specification.
The "ANY" Functions (ANYALNUM, ANYALPHA, ANYDIGIT, ANYPUNCT, and ANYSPACE)
This group of functions is described together because of the similarity of their use. New as of Version 9, these functions return the location of the first alphanumeric character, letter, digit, punctuation character, or space in a character string. There are other "ANY" functions besides those presented here; these are the most common ones (see the SAS OnlineDoc 9.1 for a complete list). It may be necessary to use the TRIM function (or the STRIP function) with the "ANY" and "NOT" functions, since leading or, especially, trailing blanks will affect the results. For example, if X = 'ABC   ' (ABC followed by three blanks), Y = NOTALNUM(X) will be 4, the location of the first blank.
Function: ANYALNUM
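Purpose: To locate the first occurrence of an alphanumeric character (any upper- or lowercase letter or number) and return its position. If none is found, the function returns a 0. With the use of an optional parameter, this function can begin searching at any position in the string and can also search from right to left, if desired.
Syntax: ANYALNUM(character-value [,start])
where character-value is any SAS character expression. start is an optional starting position. A positive value searches from that position to the right, and a negative value searches from position ABS(start) to the left. A negative value whose absolute value is greater than the length of the string results in a scan from right to left starting at the end of the string. If the value of start is a positive number greater than the length of the string, or if it is 0, the function returns a 0.
Examples: For these examples, STRING = "ABC 123 ?xyz_n_"
Function               Returns
ANYALNUM(STRING)       1   (the position of "A")
ANYALNUM("??$$%%")     0   (no alphanumeric characters)
ANYALNUM(STRING,5)     5   (the position of "1")
ANYALNUM(STRING,-4)    3   (the position of "C")
ANYALNUM(STRING,6)     6   (the position of "2")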
Function: ANYALPHA
Purpose: To locate the first occurrence of an alpha character (any upper- or lowercase letter) and return its position. If none is found, the function returns a 0. With the use of an optional parameter, this function can begin searching at any position in the string and can also search from right to left, if desired.
Examples: For these examples, STRING = "ABC 123 ?xyz_n_"
Function               Returns
ANYALPHA(STRING)       1   (position of "A")
ANYALPHA("??$$%%")     0   (no alpha characters)
ANYALPHA(STRING,5)     10  (position of "x")
ANYALPHA(STRING,-4)    3   (position of "C")
ANYALPHA(STRING,6)     10  (position of "x")
Function: ANYDIGIT
Purpose: To locate the first occurrence of a digit (numeral) and return its position. If none is found, the function returns a 0. With the use of an optional parameter, this function can begin searching at any position in the string and can also search from right to left, if desired.
Examples: For these examples, STRING = "ABC 123 ?xyz_n_"
Function               Returns
ANYDIGIT(STRING)       5   (position of "1")
ANYDIGIT("??$$%%")     0   (no digits)
ANYDIGIT(STRING,5)     5   (position of "1")
ANYDIGIT(STRING,-4)    0   (no digits from position 4 to 1)
ANYDIGIT(STRING,6)     6   (position of "2")
Function: ANYPUNCT
Purpose: To locate the first occurrence of a punctuation character and return its position. If none is found, the function returns a 0. With the use of an optional parameter, this function can begin searching at any position in the string and can also search from right to left, if desired. In the ASCII character set, the following characters are considered punctuation:
! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
Function: ANYSPACE
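Purpose: To locate the first occurrence of a white-space character (a blank, horizontal or vertical tab, carriage return, line feed, or form feed) and return its position. If none is found, the function returns a 0. With the use of an optional parameter, this function can begin searching at any position in the string and can also search from right to left, if desired.
Examples: For these examples, STRING = "ABC 123 ?xyz_n_"
Function               Returns
ANYSPACE(STRING)       4   (position of the first blank)
ANYSPACE("??$$%%")     0   (no spaces)
ANYSPACE(STRING,5)     8   (position of the second blank)
ANYSPACE(STRING,-4)    4   (position of the first blank)
ANYSPACE(STRING,6)     8   (position of the second blank)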
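The following sketch runs one call of each "ANY" function against the example string used above, so the printed values can be compared with the tables. The data set name ANY_DEMO and the variable names are hypothetical.

```sas
/* Minimal sketch: data set and variable names are hypothetical */
DATA ANY_DEMO;
   STRING = 'ABC 123 ?xyz_n_';
   ALNUM  = ANYALNUM(STRING);     /* 1: the "A"                          */
   ALPHA  = ANYALPHA(STRING,5);   /* 10: the "x", searching from 5        */
   DIGIT  = ANYDIGIT(STRING,-4);  /* 0: no digit in positions 4 down to 1 */
   PUNCT  = ANYPUNCT(STRING);     /* 9: the "?"                          */
   SPACE  = ANYSPACE(STRING,5);   /* 8: the second blank                 */
RUN;
PROC PRINT DATA=ANY_DEMO NOOBS;
   TITLE 'Listing of Data Set ANY_DEMO';
RUN;
```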
Program:
Using the functions ANYDIGIT and ANYSPACE to find the first number in a string
DATA SEARCH_NUM;
   INPUT STRING $60.;
   START = ANYDIGIT(STRING);
   END = ANYSPACE(STRING,START);
   IF START NE 0 THEN
      NUM = INPUT(SUBSTR(STRING,START,END-START),9.);
DATALINES;
This line has a 56 in it
two numbers 123 and 456 in this line
No digits here
;
PROC PRINT DATA=SEARCH_NUM NOOBS;
   TITLE 'Listing of Data Set SEARCH_NUM';
RUN;
Explanation: This program identifies the first number in any line of data that contains a numeric value (followed by one or more blanks). The ANYDIGIT function determines the position of the first digit of the number. The ANYSPACE function then searches for the first blank following the number, starting at the position of the first digit. The SUBSTR function extracts the digits, starting at the value of START, with a length equal to the difference between END and START. Finally, the INPUT function performs the character-to-numeric conversion.
The "NOT" Functions (NOTALNUM, NOTALPHA, NOTDIGIT, and NOTUPPER)
This group of functions is similar to the "ANY" functions (such as ANYALNUM and ANYALPHA). The difference is that each function returns the position of the first character that is not of a particular type (an alphanumeric character, a letter, a digit, or an uppercase character). This is not a complete list of the "NOT" functions. As with the "ANY" functions, an optional parameter specifies where to start the search and in which direction to search.
Function: NOTALNUM
Purpose: To determine the position of the first character in a string that is not an alphanumeric character (any upper- or lowercase letter or a number). If none is found, the function returns a 0. With the use of an optional parameter, this function can begin searching at any position in the string and can also search from right to left, if desired.
Examples (returned values):
4   (position of the 1st blank)
0   (all alphanumeric values)
1   (position of the "?")
8   (position of the 2nd blank)
4   (position of the 1st blank)
9   (position of the "?")
Function: NOTALPHA
Purpose: To determine the position of the first character in a string that is not an upper- or lowercase letter (alpha character). If none is found, the function returns a 0. With the use of an optional parameter, this function can begin searching at any position in the string and can also search from right to left, if desired.
Examples (returned values):
4   (position of the 1st blank)
0   (all alpha characters)
1   (position of the first "?")
5   (position of the "1")
9   (start at position 10 and search left)
Function: NOTDIGIT
Purpose: To determine the position of the first character in a string that is not a digit. If none is found, the function returns a 0. With the use of an optional parameter, this function can begin searching at any position in the string and can also search from right to left, if desired.
Examples (returned values):
1   (position of the "A")
0   (all digits)
1   (position of the "?")
8   (position of the 2nd blank)
4   (position of the 1st blank)
8   (position of the 2nd blank)
Function: NOTUPPER
Purpose: To determine the position of the first character in a string that is not an uppercase letter. If none is found, the function returns a 0. With the use of an optional parameter, this function can begin searching at any position in the string and can also search from right to left, if desired.
Examples (returned values):
5   (position of the "a")
0   (all uppercase characters)
4   (position of the 1st blank)
1   (position of the "?")
5   (position of the "1")
6   (position of the "2")
6   (position of the "2")
Program:
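A minimal sketch of such a program, consistent with the explanation that follows. Only the $5. informat and the last data value "ABC" come from the text. The data set name NOT_DEMO, the variable names, and the first three data lines are hypothetical.

```sas
/* Minimal sketch: NOT_DEMO, variable names, and the first three data lines
   are hypothetical; only the $5. informat and the value ABC come from the text */
DATA NOT_DEMO;
   INPUT STRING $5.;
   ALNUM = NOTALNUM(STRING);   /* first non-alphanumeric character */
   ALPHA = NOTALPHA(STRING);   /* first non-letter                 */
   DIGIT = NOTDIGIT(STRING);   /* first non-digit                  */
   UPPER = NOTUPPER(STRING);   /* first character not uppercase    */
DATALINES;
AB?12
12345
abcde
ABC
;
PROC PRINT DATA=NOT_DEMO NOOBS;
   TITLE 'Listing of Data Set NOT_DEMO';
RUN;
```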
Explanation: This straightforward program demonstrates each of the "NOT" character functions. As with most character functions, be careful with trailing blanks. Notice that the last observation ("ABC") contains only three characters. Because STRING is read with a $5. informat, two trailing blanks follow the letters "ABC". That is why you obtain a value of 4 for all the functions except NOTDIGIT, which returns a 1 (the first character is not a digit).
FIND and FINDC
This pair of functions shares some similarities with the INDEX and INDEXC functions. FIND and INDEX both search a string for a given substring. FINDC and INDEXC both search for individual characters. However, FIND and FINDC have some additional capability over their counterparts. For example, this pair of functions can declare a starting position for the search, set the direction of the search, and ignore case or trailing blanks.
Function: FIND
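Purpose: To locate a substring within a string. With optional arguments, you can define the starting point and the direction of the search and ignore case or trailing blanks.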
Syntax: FIND(character-value, find-string [,'modifiers'] [,start])
where character-value is any SAS character expression. find-string is a character variable or string literal that contains one or more characters that you want to search for. The function returns the first position in the character-value that contains the find-string. If the find-string is not found, the function returns a 0. The following modifiers (in upper- or lowercase), placed in single or double quotation marks, may be used with FIND:
i   ignore case.
t   ignore trailing blanks in both the character variable and the find-string.
start is an optional starting position for the search. A negative value makes the search proceed from right to left.
Examples:
For these examples, STRING1 = "Hello hello goodbye" and STRING2 = "hello"
Function                            Returns
FIND(STRING1, STRING2)              7
FIND(STRING1, STRING2, 'i')         1
FIND(STRING1,"bye")                 17
FIND("abcxyzabc","abc",4)           7
FIND(STRING1, STRING2, "i", -99)    7
Function: FINDC
Purpose: To locate a character that appears or does not appear within a string. With optional arguments, you can define the starting point for the search, set the direction of the search, ignore case or trailing blanks, or look for characters except the ones listed.
Syntax: FINDC(character-value, find-characters [,'modifiers'] [,start])
where character-value is any SAS character expression. find-characters is a list of one or more characters that you want to search for. The function returns the first position in the character-value that contains one of the find-characters. If none of the characters are found, the function returns a 0. With an optional argument, you can have the function return the position in a character string of a character that is not in the find-characters list. The following modifiers (in upper- or lowercase), placed in single or double quotation marks, may be used with FINDC:
i   ignore case.
t   ignore trailing blanks in both the character variable and the find-characters.
v   count only characters that are not in the list of find-characters.
o   process the modifiers and find-characters only once for a specific call to the function. In subsequent calls, changes to these arguments will have no effect.
Examples: For these examples, STRING1 = "Apples and Books" and STRING2 = "abcde"
Function                           Returns
FINDC(STRING1, STRING2)            5
FINDC(STRING1, STRING2, 'i')       1
FINDC(STRING1,"aple",'vi')         6
FINDC("abcxyzabc","abc",4)         7
Program:
Using the FIND and FINDC functions to search for strings and characters
DATA FIND_VOWEL;
   INPUT @1 STRING $20.;
   PEAR = FIND(STRING,'Pear');
   POS_VOWEL = FINDC(STRING,'aeiou','I');
   LOWER_VOWEL = FINDC(STRING,'aeiou');
   NOT_VOWEL = FINDC(STRING,'AEIOU','IV');
DATALINES;
XYZABCabc
XYZ
Apple and Pear
;
PROC PRINT DATA=FIND_VOWEL NOOBS;
   TITLE 'Listing of Data Set FIND_VOWEL';
RUN;
Explanation: The FIND function returns the position of the characters 'Pear' in the variable STRING. Since the i modifier is not used, the search is case-sensitive. The first use of the FINDC function looks for any upper- or lowercase vowel in the string (because of the i modifier). The next statement, without the i modifier, locates only lowercase vowels. Finally, the v modifier in the last FINDC function reverses the search to look for the first character that is not a vowel (in either case, because of the i modifier).
INDEX, INDEXC, and INDEXW
All the functions in this group search a string for a substring of one or more characters. INDEX and INDEXW are similar. The difference is that INDEXW looks for a word (defined as a string bounded by spaces or by the beginning or end of the string), while INDEX simply searches for the designated substring. INDEXC searches for one or more individual characters and always searches from left to right. These three functions are all case-sensitive.
Function: INDEX
Purpose: To locate the starting position of a substring in a string.
Syntax: INDEX(character-value, find-string)
where character-value is any SAS character expression. find-string is a character variable or string literal that contains the substring for which you want to search. The function returns the first position in the character-value that contains the find-string. If the find-string is not found, the function returns a 0.
Examples: For these examples, STRING = "ABCDEFG"
Function               Returns
INDEX(STRING,'C')      3   (the position of the "C")
INDEX(STRING,'DEF')    4   (the position of the "D")
INDEX(STRING,'X')      0   (no "X" in the string)
INDEX(STRING,'ACE')    0   (no "ACE" in the string)
Program:
Converting numeric values of mixed units (e.g., kg and lbs) to a single numeric quantity
DATA HEAVY;
   INPUT CHAR_WT $ @@;
   WEIGHT = INPUT(COMPRESS(CHAR_WT,'KG'),8.);
   /* convert kilograms to pounds: 1 kg = 2.2046 lb */
   IF INDEX(CHAR_WT,'K') NE 0 THEN WEIGHT = 2.2046 * WEIGHT;
   WEIGHT = ROUND(WEIGHT);
   DROP CHAR_WT;
DATALINES;
60KG 155 82KG 54KG 98
;
PROC PRINT DATA=HEAVY NOOBS;
   TITLE 'Listing of Data Set HEAVY';
   VAR WEIGHT;
RUN;
Explanation: The data lines contain numbers in kilograms, followed by the abbreviation KG, or in pounds (no units used). As with most problems of this type, when you are reading a combination of numbers and characters, you usually need to read the value as a character first. Here, the COMPRESS function removes the letters KG from the character value, and the INPUT function does its usual job of character-to-numeric conversion. If the INDEX function returns any value other than 0, the letter K was found in the string, and the WEIGHT value is converted from kilograms to pounds. Finally, the ROUND function rounds the value to the nearest pound.
Function: INDEXC
Purpose: To search a character string for one or more characters. The INDEXC function works in a similar manner to the INDEX function. The difference is that INDEXC can search for any one of a list of character values.
Syntax: INDEXC(character-value, 'char1','char2','char3', ...)
or INDEXC(character-value, 'char1char2char3...')
where character-value is any SAS character expression. char1, char2, ... are individual character values that you wish to search for in the character-value. The INDEXC function returns the position of the first occurrence of any of the char1, char2, etc. values in the string. If none of the characters is found, the function returns a 0.
Program:
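A minimal sketch consistent with the explanation that follows. The data set name TAGS, the $10. informat, the data lines, and the value 'DOMESTIC' are all hypothetical.

```sas
/* Minimal sketch: data set name, data lines, and 'DOMESTIC' are hypothetical */
DATA TAGS;
   INPUT TAG_NUMBER $10.;
   LENGTH DESTINATION $ 13;
   /* one INDEXC call replaces three INDEX calls for 'X', 'Y', and 'Z' */
   IF INDEXC(TAG_NUMBER,'X','Y','Z') GT 0 THEN DESTINATION = 'INTERNATIONAL';
   ELSE DESTINATION = 'DOMESTIC';
DATALINES;
T1234
Y5678
A9X99
;
PROC PRINT DATA=TAGS NOOBS;
   TITLE 'Listing of Data Set TAGS';
RUN;
```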
Explanation: Rather than using three statements with the INDEX function, you can use the INDEXC function, which checks for any one of a number of character values. Here, if an X, Y, or Z is found in the variable TAG_NUMBER, the function returns a number greater than 0, and DESTINATION is set to INTERNATIONAL.
Program:
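A minimal sketch of the date-reading idea described in the explanation that follows. The data set name DATES, the character variable DUMMY, and the data lines are hypothetical.

```sas
/* Minimal sketch: data set name, DUMMY, and the data lines are hypothetical */
DATA DATES;
   INPUT DUMMY $10.;
   /* a slash, dash, or colon signals the mm/dd/yyyy form */
   IF INDEXC(DUMMY,'/-:') NE 0 THEN DATE = INPUT(DUMMY,MMDDYY10.);
   ELSE DATE = INPUT(DUMMY,DATE9.);
   FORMAT DATE DATE9.;
   DROP DUMMY;
DATALINES;
10/21/1946
06-15-1980
03:11:2001
05JAN2001
;
PROC PRINT DATA=DATES NOOBS;
   TITLE 'Listing of Data Set DATES';
RUN;
```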
Explanation: In this somewhat trumped-up example, dates are entered in either mm/dd/yyyy or ddMONyyyy form. Besides a slash, dashes and colons are also used as separators. Any string that includes a slash, dash, or colon is a date that needs the MMDDYY10. informat. Otherwise, the DATE9. informat is used.
Function: INDEXW
Purpose: To search a string for a word, defined as a group of letters separated on both ends by a word boundary (a space, the beginning of the string, or the end of the string). Punctuation is not considered a word boundary.
Syntax: INDEXW(character-value, find-string)
Examples: For these examples, STRING1 = "there is a the here" and STRING2 = "end in the."
Function                  Result
INDEXW(STRING1,"the")     12  (the word "the")
INDEXW("ABABAB","AB")     0   (no word boundaries around "AB")
INDEXW(STRING1,"er")      0   (not a word)
INDEXW(STRING2,"the")     0   (punctuation is not a word boundary)
Program:
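A minimal sketch of a program that compares INDEX and INDEXW, consistent with the explanation that follows. The data set name, the variable names, and the data lines are hypothetical; the first and third lines are modeled on the examples above.

```sas
/* Minimal sketch: data set name, variable names, and data lines are hypothetical */
DATA INDEX_VS_INDEXW;
   INPUT STRING $30.;
   POS_INDEX  = INDEX(STRING,'the');    /* any occurrence of the letters  */
   POS_INDEXW = INDEXW(STRING,'the');   /* only "the" as a separate word  */
DATALINES;
there is a the here
the end
end in the.
no match here
;
PROC PRINT DATA=INDEX_VS_INDEXW NOOBS;
   TITLE 'Listing of Data Set INDEX_VS_INDEXW';
RUN;
```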
Explanation: This program demonstrates the difference between INDEX and INDEXW. In the first observation, the INDEX function returns a 1 because the letters "the", as part of the word "there", begin the string. Since the INDEXW function needs either white space or the beginning or end of a string to delimit a word, it returns a 12, the position of the word "the" in the string. Observation 3 emphasizes the fact that a punctuation mark does not serve as a word separator. Finally, for an observation in which the string "the" does not appear anywhere, both functions return a 0.
Function: VERIFY
Purpose: To check if a string contains any unwanted values.
Syntax: VERIFY(character-value, verify-string)
where character-value is any SAS character expression. verify-string is a SAS character variable or a list of character values in quotation marks. This function returns the first position in the character-value that is not present in the verify-string. If the character-value does not contain any characters other than those in the verify-string, the function returns a 0. Be especially careful to think about trailing blanks when using this function. Suppose you have an 8-byte character variable equal to 'ABC' (followed by five blanks) and a verify string equal to 'ABC'. The VERIFY function returns a 4, the position of the first blank, because a blank is not present in the verify string. Therefore, you may need to use the TRIM function on the character-value, the verify-string, or both.
Examples: For these examples, STRING = "ABCXABD" and V = "ABCDE"
Function                       Returns
VERIFY(STRING,V)               4   ("X" is not in the verify string)
VERIFY(STRING,"ABCDEXYZ")      0   (no "bad" characters in STRING)
VERIFY(STRING,"ACD")           2   (position of the "B")
VERIFY("ABC  ","ABC")          4   (position of the 1st blank)
VERIFY(TRIM("ABC  "),"ABC")    0   (no invalid characters)
Program:
Using the VERIFY function to check for invalid character data values
DATA VERY_FI;
   INPUT ID $ 1-3 ANSWER $ 5-9;
   P = VERIFY(ANSWER,'ABCDE');
   OK = P EQ 0;
DATALINES;
001 ACBED
002 ABXDE
003 12CCE
004 ABC E
;
PROC PRINT DATA=VERY_FI NOOBS;
   TITLE 'Listing of Data Set VERY_FI';
RUN;
Explanation: In this example, the only valid values for ANSWER are the uppercase letters A through E. Whenever there are one or more invalid values, the result of the VERIFY function (variable P) will be a number from 1 to 5. The SAS statement that computes the value of the variable OK needs a word of explanation. First, the logical comparison P EQ 0 returns a value of true or false, which is equivalent to a 1 or a 0. This value is then assigned to the variable OK. Thus, OK is set to 1 for all valid values of ANSWER and to 0 for any invalid values.
Functions That Extract Parts of Strings
The functions described in this section can extract parts of strings. When used on the left-hand side of the equal sign, the SUBSTR function can also insert characters into specific positions of an existing string.
Function: SUBSTR
Purpose: To extract part of a string. When the SUBSTR function is used on the left side of the equal sign, it can place specified characters into an existing string.
Syntax: SUBSTR(character-value, start [,length])
character-value is any SAS character expression. start is the starting position within the string. length, if specified, is the number of characters to include in the substring. If this argument is omitted, the SUBSTR function will return all the characters from the start position to the end of the string. If a length has not been previously assigned, the length of the resulting variable will be the length of the character-value.
Examples: For these examples, let STRING = "ABC123XYZ"
Function                          Returns
SUBSTR(STRING,4,2)                "12"
SUBSTR(STRING,4)                  "123XYZ"
SUBSTR(STRING,LENGTH(STRING))     "Z"
Program:
Extracting portions of a character value and creating a character variable and a numeric value
DATA SUBSTRING;
   INPUT ID $ 1-9;
   LENGTH STATE $ 2;
   STATE = SUBSTR(ID,1,2);
   NUM = INPUT(SUBSTR(ID,7,3),3.);
DATALINES;
NYXXXX123
NJ1234567
;
PROC PRINT DATA=SUBSTRING NOOBS;
   TITLE 'Listing of Data Set SUBSTRING';
RUN;
Explanation: In this example, the ID contains both state and number information. The first two characters of the ID variable contain the state abbreviation, and the last three characters represent numerals that you want to use to create a numeric variable. Extracting the state codes is straightforward. To obtain a numeric value from the last 3 bytes of the ID variable, you first use the SUBSTR function to extract the three characters of interest and then use the INPUT function to do the character-to-numeric conversion.
Program:
Extracting the last two characters from a string, regardless of the length
DATA EXTRACT;
   INPUT @1 STRING $20.;
   LAST_TWO = SUBSTR(STRING,LENGTH(STRING)-1,2);
DATALINES;
ABCDE
AX12345NY
126789
;
PROC PRINT DATA=EXTRACT NOOBS;
   TITLE 'Listing of Data Set EXTRACT';
   VAR STRING LAST_TWO;
RUN;
Explanation: This program demonstrates how you can use the LENGTH and SUBSTR functions together to extract portions of a string when the strings are of different or unknown lengths. To see how this program works, take a look at the first line of data. The LENGTH function returns a 5, and (5 - 1) = 4 is the position of the next-to-last (penultimate) character in STRING. SUBSTR then extracts two characters starting at position 4, so LAST_TWO is "DE".
Program:
   ...
      X[J] = INPUT(SUBSTR(STRING,J,1),1.);
   END;
   DROP J;
RUN;
PROC PRINT DATA=UNPACK NOOBS;
   TITLE 'Listing of Data Set UNPACK';
RUN;
Explanation: Sometimes you want to store a group of one-digit numbers in a compact, space-saving way. In this example, you want to store five one-digit numbers. If you stored each one as an 8-byte numeric, you would need 40 bytes of storage for each observation. By storing the five numbers as a 5-byte character string, you need only 5 bytes of storage. However, you need to use CPU time to turn the character string back into the five numbers. The key is to use the SUBSTR function with the index of a DO loop as its starting value. As you pick off each of the numerals, you can use the INPUT function to do the character-to-numeric conversion. Notice that the ARRAY statement in this program does not include a list of variables; in that case, SAS builds the element names from the array name (X1, X2, and so on).
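Only the end of the UNPACK program survives above. The following is a minimal sketch of a complete version. It assumes that the packed digits are read into STRING from columns 1-5 and that the array has five elements. The input columns and the data lines are hypothetical.

```sas
/* Minimal sketch: columns 1-5 and the data lines are hypothetical */
DATA UNPACK;
   INPUT STRING $ 1-5;
   ARRAY X[5];                                /* no variable list: SAS creates X1-X5 */
   DO J = 1 TO 5;
      X[J] = INPUT(SUBSTR(STRING,J,1),1.);    /* one digit per array element */
   END;
   DROP J;
DATALINES;
12345
97531
;
PROC PRINT DATA=UNPACK NOOBS;
   TITLE 'Listing of Data Set UNPACK';
RUN;
```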
Function: SUBSTR (on the left-hand side of the equal sign)
As mentioned in the description of the SUBSTR function, there is an interesting and useful way it can be used: on the left-hand side of the equal sign.
Purpose: To place one or more characters into an existing string.
Syntax: SUBSTR(character-value, start [,length]) = character-value
Examples:
In these examples, EXISTING = "ABCDEFGH" and NEW = "XY"
Function                       Result
SUBSTR(EXISTING,3,2) = NEW     EXISTING is now "ABXYEFGH"
SUBSTR(EXISTING,3,1) = "*"     EXISTING is now "AB*DEFGH"
Program:
Demonstrating the SUBSTR function on the left-hand side of the equal sign
DATA STARS;
   INPUT SBP DBP @@;
   LENGTH SBP_CHK DBP_CHK $ 4;
   SBP_CHK = PUT(SBP,3.);
   DBP_CHK = PUT(DBP,3.);
   IF SBP GT 160 THEN SUBSTR(SBP_CHK,4,1) = '*';
   IF DBP GT 90 THEN SUBSTR(DBP_CHK,4,1) = '*';
DATALINES;
120 80 180 92 200 110
;
PROC PRINT DATA=STARS NOOBS;
   TITLE 'Listing of Data Set STARS';
RUN;
Explanation: In this program, you want to "flag" high values of systolic and diastolic blood pressure by placing an asterisk after the value. Notice that the LENGTH statement assigns a length of 4 to both SBP_CHK and DBP_CHK. The fourth position needs to be there in case you want to place an asterisk in it to flag the value as abnormal. The PUT function places the numerals of the blood pressures into the first 3 bytes of the corresponding character variables. Then, if the value is above the specified level, an asterisk is placed in the fourth position of these variables.
Function: SUBSTRN
Purpose: This function serves the same purpose as the SUBSTR function, with a few added features. Unlike SUBSTR, SUBSTRN accepts starting position and length arguments that are 0 or negative without causing an error. In particular, if the length is 0, the function returns a string of 0 length. This is particularly useful with regular expression functions, where the length parameter may be 0 when a pattern is not found.
Syntax: SUBSTRN(character-value, start [,length])
character-value is any SAS character expression. start is the starting position in the string. If this value is nonpositive, the function returns a substring starting from the first character in character-value (the length of the substring is computed by counting from the value of start). length is the number of characters in the substring. If this value is nonpositive (in particular, 0), the function returns a string of length 0. If this argument is omitted, the SUBSTRN function will return all the characters from the start position to the end of the string.
Program: Demonstrating the unique features of the SUBSTRN function
DATA HOAGIE;
   STRING = 'ABCDEFGHIJ';
   LENGTH RESULT $ 5;
   RESULT = SUBSTRN(STRING,2,5);
   SUB1 = SUBSTRN(STRING,-1,4);
   SUB2 = SUBSTRN(STRING,3,0);
   SUB3 = SUBSTRN(STRING,7,5);
   SUB4 = SUBSTRN(STRING,0,2);
   FILE PRINT;
   TITLE 'Demonstrating the SUBSTRN Function';
   PUT 'Original String =' @25 STRING /
       'SUBSTRN(STRING,2,5) =' @25 RESULT /
       'SUBSTRN(STRING,-1,4) =' @25 SUB1 /
       'SUBSTRN(STRING,3,0) =' @25 SUB2 /
       'SUBSTRN(STRING,7,5) =' @25 SUB3 /
       'SUBSTRN(STRING,0,2) =' @25 SUB4;
RUN;
Explanation: In data set HOAGIE (sub-strings, get it?), the storage lengths of the variables SUB1 through SUB4 are all equal to the length of STRING, which is 10. Since a LENGTH statement was used to define the length of RESULT, it has a length of 5 and holds "BCDEF". SUBSTRN(STRING,-1,4) counts four positions starting at -1 (positions -1, 0, 1, and 2), so SUB1 is "AB". SUB2 is a zero-length (blank) value. SUB3 is "GHIJ" because STRING ends at position 10. SUB4 covers positions 0 and 1, so it is "A".
There are three call routines and four functions that concatenate character strings. You can use the || concatenation operator in combination with the STRIP, TRIM, or LEFT functions. However, these routines and functions make it much easier to put strings together and, if you wish, to place one or more separator characters between the strings. The three call routines are discussed first, followed by the four concatenation functions.
Call Routines
These three call routines concatenate two or more strings. There are also four concatenation functions (CAT, CATS, CATT, and CATX). The differences among these routines involve the handling of leading and/or trailing blanks, as well as the spacing between the concatenated strings. The traditional concatenation operator (||) is still useful, but it sometimes takes extra work to strip leading and trailing blanks before performing the concatenation. You can do this with the LEFT and TRIM functions or with the new STRIP function.
Function: CALL CATS
Purpose: To concatenate two or more strings, removing both leading and trailing blanks before the concatenation takes place. To help you remember that this call routine strips the leading and trailing blanks before concatenation, think of the S at the end of CATS as "strip blanks." Note: To call our three cats, I usually just whistle loudly.
Syntax: CALL CATS(result, string-1 [,string-n])
where result is the concatenated string. It can be a new variable or, if it is an existing variable, the other strings will be added to it. Be sure that the length of result is long enough to hold the concatenated results. If not, the resulting string will be truncated and you will see an error message in the log. string-1 and string-n are the character strings to be concatenated. Leading and trailing blanks will be stripped prior to the concatenation.
Examples:
For these examples, A = "Bilbo" (no blanks), B = "   Frodo" (leading blanks), C = "Hobbit   " (trailing blanks), and D = "   Gandalf   " (leading and trailing blanks). RESULT is initially blank.
Function                          Returns (RESULT)
CALL CATS(RESULT, A, B)           "BilboFrodo"
CALL CATS(RESULT, B, C, D)        "FrodoHobbitGandalf"
CALL CATS(RESULT, "Hello", D)     "HelloGandalf"
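Function: CALL CATT
Purpose: To concatenate two or more strings, removing only trailing blanks before the concatenation takes place. To help you remember this, think of the T at the end of CATT as "trailing blanks" or "trim blanks."
Syntax: CALL CATT(result, string-1 [,string-n])
where result is the concatenated string. It can be a new variable or, if it is an existing variable, the other strings will be added to it. Be sure that the length of result is long enough to hold the concatenated results. If not, the program will terminate and you will see an error message in the log. string-1 and string-n are the character strings to be concatenated. Trailing blanks will be removed prior to the concatenation.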
Examples:
For these examples, A = "Bilbo" (no blanks), B = "   Frodo" (leading blanks), C = "Hobbit   " (trailing blanks), and D = "   Gandalf   " (leading and trailing blanks). RESULT is initially blank.
Function                          Returns (RESULT)
CALL CATT(RESULT, A, B)           "Bilbo   Frodo"
CALL CATT(RESULT, B, C, D)        "   FrodoHobbit   Gandalf"
CALL CATT(RESULT, "Hello", D)     "Hello   Gandalf"
Function: CALL CATX
Purpose: To concatenate two or more strings, removing both leading and trailing blanks before the concatenation takes place, and to place a separator between each of the strings. The separator can be a single space or one or more characters of your choice. To help you remember this, think of the X at the end of CATX as "add extra blank."
Syntax: CALL CATX(separator, result, string-1 [,string-n])
where separator is one or more characters, placed in single or double quotation marks, that you want to use to separate the strings.
Examples:
For these examples, A = "Bilbo" (no blanks), B = " Frodo" (leading blanks), C = "Hobbit " (trailing blanks), and D = " Gandalf " (leading and trailing blanks). RESULT is blank before each call.
Function                               Returns (value of RESULT)
CALL CATX(" ", RESULT, A, B)           "Bilbo Frodo"
CALL CATX(",", RESULT, B, C, D)        "Frodo,Hobbit,Gandalf"
CALL CATX(":", RESULT, "Hello", D)     "Hello:Gandalf"
CALL CATX(", ", RESULT, "Hello", D)    "Hello, Gandalf"
CALL CATX("***", RESULT, A, B)         "Bilbo***Frodo"
Program:
Explanation: Each of the three call routines performs a concatenation. The CATS call routine strips leading and trailing blanks; the CATT call routine removes only trailing blanks before performing the concatenation; the CATX call routine is similar to the CATS call routine, except that it inserts a separator (specified as the first argument) between each of the concatenated strings.
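No program survives under the Program heading above. The following is a minimal sketch, not the author's program, that runs each call routine once on the example values (one blank wherever the examples show leading or trailing blanks). The data set name CALL_CAT_DEMO and the RES_ variables are illustrative assumptions.

```sas
/* Sketch: the three concatenation call routines (names are illustrative) */
data call_cat_demo;
   /* Give each string exactly the length of its value */
   length a $ 5 b $ 6 c $ 7 d $ 9;
   /* Results must be long enough to hold the joined text */
   length res_cats res_catt res_catx $ 40;
   a = "Bilbo";       /* no blanks                  */
   b = " Frodo";      /* leading blank              */
   c = "Hobbit ";     /* trailing blank             */
   d = " Gandalf ";   /* leading and trailing blank */

   /* The routines append to RESULT, so start from a blank value */
   res_cats = " ";
   res_catt = " ";
   res_catx = " ";

   call cats(res_cats, b, c, d);       /* strip both ends, then join   */
   call catt(res_catt, a, b);          /* trim trailing blanks only    */
   call catx(",", res_catx, b, c, d);  /* strip, then insert the comma */

   put res_cats= $quote42. / res_catt= $quote42. / res_catx= $quote42.;
run;
```

Because each routine appends to RESULT, resetting RESULT to a blank before the call is what makes the output line up with the example tables.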
These four concatenation functions are very similar to the concatenation call routines described above. However, since they are functions and not call routines, you name the new character variable to be created on the left-hand side of the equal sign and place the function, along with its arguments, on the right-hand side.
Function: CAT
Purpose: To concatenate (join) two or more character strings, leaving leading and/or trailing blanks unchanged. This function accomplishes the same task as the concatenation operator (||).
Syntax: CAT(string-1, string-2 <,string-n>)
string-1, string-2, ..., string-n are the character strings to be concatenated. These arguments can also be written as CAT(OF C1-C5), where C1 to C5 are character variables.
Note: It is very important to set the length of the resulting character string, using a LENGTH statement (or another method), before using any of the concatenation functions. Otherwise, the length of the resulting string will default to 200.
Examples:
For these examples, A = "Bilbo" (no blanks), B = " Frodo" (leading blanks), C = "Hobbit " (trailing blanks), and D = " Gandalf " (leading and trailing blanks). C1-C5 are five character variables with the values 'A', 'B', 'C', 'D', and 'E', respectively.
Function              Returns
CAT(A, B)             "Bilbo Frodo"
CAT(B, C, D)          " FrodoHobbit  Gandalf "
CAT("Hello", D)       "Hello Gandalf "
CAT(OF C1-C5)         "ABCDE"
Function: CATS
Purpose: To concatenate (join) two or more character strings, stripping both leading and trailing blanks.
Syntax: CATS(string-1, string-2 <,string-n>)
string-1, string-2, ..., string-n are the character strings to be concatenated. These arguments can also be written as CATS(OF C1-C5), where C1 to C5 are character variables.
Examples:
For these examples, A = "Bilbo" (no blanks), B = " Frodo" (leading blanks), C = "Hobbit " (trailing blanks), and D = " Gandalf " (leading and trailing blanks). C1-C5 are five character variables with the values 'A', 'B', 'C', 'D', and 'E', respectively.
Function              Returns
CATS(A, B)            "BilboFrodo"
CATS(B, C, D)         "FrodoHobbitGandalf"
CATS("Hello", D)      "HelloGandalf"
CATS(OF C1-C5)        "ABCDE"
Function: CATT
Purpose: To concatenate (join) two or more character strings, stripping only trailing blanks.
Syntax: CATT(string-1, string-2 <,string-n>)
Examples:
For these examples, A = "Bilbo" (no blanks), B = " Frodo" (leading blanks), C = "Hobbit " (trailing blanks), and D = " Gandalf " (leading and trailing blanks). C1-C5 are five character variables with the values 'A', 'B', 'C', 'D', and 'E', respectively.
Function              Returns
CATT(A, B)            "Bilbo Frodo"
CATT(B, C, D)         " FrodoHobbit Gandalf"
CATT("Hello", D)      "Hello Gandalf"
CATT(OF C1-C5)        "ABCDE"
Function: CATX
Purpose: To concatenate (join) two or more character strings, stripping both leading and trailing blanks and inserting one or more separator characters between the strings.
Syntax: CATX(separator, string-1, string-2 <,string-n>)
separator is one or more characters, placed in single or double quotation marks, to be used as separators between the concatenated strings.
string-1, string-2, ..., string-n are the character strings to be concatenated. These arguments can also be written as CATX(" ", OF C1-C5), where C1 to C5 are character variables.
Examples:
For these examples, A = "Bilbo" (no blanks), B = " Frodo" (leading blanks), C = "Hobbit " (trailing blanks), and D = " Gandalf " (leading and trailing blanks). C1-C5 are five character variables with the values 'A', 'B', 'C', 'D', and 'E', respectively.
Function                 Returns
CATX(" ", A, B)          "Bilbo Frodo"
CATX(":", B, C, D)       "Frodo:Hobbit:Gandalf"
CATX("***", "Hello", D)  "Hello***Gandalf"
CATX(",", OF C1-C5)      "A,B,C,D,E"
Program:
Explanation: Notice that each of the STRING variables differs with respect to leading and trailing blanks.
The CAT function is identical to the || operator. The CATS function removes both leading and trailing blanks and is equivalent to TRIM(LEFT(string-1)) || TRIM(LEFT(string-2)). The CATT function trims only trailing blanks. The last two statements use the CATX function, which removes leading and trailing blanks. It is just like the CATS function, but it adds one or more separator characters (specified as the first argument) between each of the strings to be joined.
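As with the call routines, the program for these functions is missing. A hedged sketch under the same assumptions (each variable exactly as long as its value; the data set name CAT_FUNCTIONS_DEMO and the result names are invented for illustration) could look like this. The explicit LENGTH statement for the results follows the earlier note instead of relying on the 200-byte default.

```sas
/* Sketch: CAT, CATS, CATT, and CATX side by side (names are illustrative) */
data cat_functions_demo;
   length a $ 5 b $ 6 c $ 7 d $ 9 c1-c5 $ 1;
   /* Explicit result lengths instead of the 200-byte default */
   length r_cat r_bars r_cats r_catt r_catx r_list $ 30;
   a = "Bilbo";
   b = " Frodo";
   c = "Hobbit ";
   d = " Gandalf ";
   c1 = "A"; c2 = "B"; c3 = "C"; c4 = "D"; c5 = "E";

   r_cat  = cat(a, b);                /* keeps every blank          */
   r_bars = a || b;                   /* same result as CAT         */
   r_cats = cats(b, c, d);            /* strips both ends           */
   r_catt = catt("Hello", d);         /* trims trailing blanks only */
   r_catx = catx("***", "Hello", d);  /* strips, adds the separator */
   r_list = catx(",", of c1-c5);      /* OF variable-list form      */
run;
```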
There are times when you want to remove blanks from the beginning or end of a character string. The two functions LEFT and RIGHT merely shift the characters to the beginning or the end of the string, respectively. The TRIM, TRIMN, and STRIP functions are useful when you want to concatenate strings (although the new concatenation functions will do this for you).
LEFT and RIGHT
These two functions left- or right-align text. Remember that the length of a character variable will not change when you use these two functions. If there are leading blanks, the LEFT function will shift the first nonblank character to the first position and move the extra blanks to the end; if there are trailing blanks, the RIGHT function will shift the nonblank text to the right and move the extra blanks to the left.
Function: LEFT
Purpose: To left-align text values. A subtle but important point: LEFT doesn't "remove" the leading blanks; it moves them to the end of the string. Thus, it doesn't change the storage length of the variable, even when you assign the result of LEFT to a new variable. The LEFT function is particularly useful if values were read with the $CHAR informat, which preserves leading blanks. Note that the STRIP function removes both leading and trailing blanks from a string.
Syntax: LEFT(character-value)
character-value is any SAS character expression.
Examples:
In these examples, STRING = " ABC".
Function            Returns
LEFT(STRING)        "ABC "
LEFT(" 123 ")       "123  "
Program:
Using the LEFT function with the $CHAR informat
DATA LEADING;
   INPUT STRING $CHAR15.;
   LEFT_STRING = LEFT(STRING);
DATALINES;
   ABC
XYZ
  Ron Cody
;
PROC PRINT DATA=LEADING NOOBS;
   TITLE "Listing of Data Set LEADING";
   FORMAT STRING LEFT_STRING $QUOTE17.;
RUN;
Explanation: If you want to work with character values, you will usually want to remove any leading blanks first. The $CHARw. informat differs from the $w. informat: $CHARw. maintains leading blanks, while $w. left-aligns the text. Programs involving character variables sometimes fail to work properly because careful attention was not paid to leading or trailing blanks. Notice the use of the $QUOTE. format in the PRINT procedure. This format adds double quotation marks around the character value.
Function: RIGHT
Purpose: To right-align a text string. Note that if the length of a character variable has previously been defined and it contains trailing blanks, the RIGHT function will move the characters to the end of the string and add the blanks to the beginning, so that the final length of the variable remains the same.
Syntax: RIGHT(character-value)
character-value is any SAS character expression.
Examples:
In these examples, STRING = "ABC ".
Function            Returns
RIGHT(STRING)       " ABC"
RIGHT(" 123 ")      "  123"
Program:
Explanation: Data lines one and two both contain three leading blanks; lines one and three contain trailing blanks. Notice the use of the $QUOTE. format in the PRINT procedure. This format adds double quotation marks around the character value. This is especially useful in debugging programs involving character variables, since it allows you to easily identify leading blanks in a character value.
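The RIGHT program is not reproduced above. A minimal sketch, assuming an 8-byte variable named STRING (the data set name ALIGN_DEMO is illustrative), shows that RIGHT and LEFT only move blanks from one end of the value to the other:

```sas
/* Sketch: LEFT and RIGHT move blanks but do not change the length */
data align_demo;
   length string $ 8;
   string = "ABC";                      /* ABC plus 5 trailing blanks   */
   right_string = right(string);        /* blanks moved to the front    */
   left_string  = left(right_string);   /* blanks moved back to the end */
   len_right    = lengthc(right_string);
   put string= $quote10. right_string= $quote10. left_string= $quote10. len_right=;
run;
```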
This group of functions trims trailing blanks (TRIM and TRIMN) and both leading and trailing blanks (STRIP). The two functions TRIM and TRIMN are similar: they both remove trailing blanks from a string. The functions work identically except when the argument contains only blanks. In that case, TRIM returns a single blank (length of 1) and TRIMN returns a null string with a length of 0. The STRIP function removes both leading and trailing blanks.
Function: TRIM
Purpose: To remove trailing blanks from a character value. This is especially useful when you want to concatenate several strings together and each string may contain trailing blanks.
Syntax: TRIM(character-value)
character-value is any SAS character expression. If the result of TRIM is assigned to a new variable, that variable will have the same length as the argument unless the length of this variable has been previously defined. If the result of the TRIM function is assigned to a variable with a length longer than the trimmed argument, the resulting variable will be padded with blanks.
Examples:
For these examples, STRING1 = "ABC " and STRING2 = " XYZ".
Function              Returns
TRIM(STRING1)         "ABC"
TRIM(STRING2)         " XYZ"
TRIM("A B C ")        "A B C"
TRIM("A ")            "A"
TRIM(" ")             " " (a single blank)
Program:
Creating a program to concatenate first, middle, and last names into a single variable
DATA PUT_TOGETHER;
   LENGTH NAME $ 45;
   INFORMAT NAME1-NAME3 $15.;
   INFILE DATALINES MISSOVER;
   INPUT NAME1 NAME2 NAME3;
   NAME = TRIM(NAME1) || ' ' || TRIM(NAME2) || ' ' || TRIM(NAME3);
   WITHOUT = NAME1 || NAME2 || NAME3;
   KEEP NAME WITHOUT;
DATALINES;
Ronald Cody
Julia Child
Henry Ford
Lee Harvey Oswald
;
PROC PRINT DATA=PUT_TOGETHER NOOBS;
   TITLE "Listing Of Data Set PUT_TOGETHER";
RUN;
Explanation:
This program reads in three names, each up to 15 characters in length. Note the use of the INFILE option MISSOVER. This option sets the value of NAME3 to missing when there are only two names. To put the names together, you use the concatenation operator (||). The TRIM function is used to trim trailing blanks from each of the words (which are all 15 bytes in length) before putting them together. Without the TRIM function, there are extra spaces between each of the names (see the variable WITHOUT).
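The same NAME could also be built with CATX. This sketch (hypothetical data set NAMES_DEMO, two of the author's data lines) computes both versions side by side so you can confirm that they agree:

```sas
/* Sketch: the TRIM approach next to CATX (data set name is illustrative) */
data names_demo;
   length name1-name3 $ 15 with_trim with_catx $ 45;
   infile datalines missover;
   input name1 name2 name3;
   with_trim = trim(name1) || ' ' || trim(name2) || ' ' || trim(name3);
   with_catx = catx(' ', name1, name2, name3);
datalines;
Lee Harvey Oswald
Julia Child
;
```

For these records both variables hold the same text. The two approaches part ways when an interior value is blank: TRIM of a blank value still returns one blank, so the || version keeps extra spaces, while CATX skips the blank argument entirely.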
Function: TRIMN
Purpose: To remove trailing blanks from a character value. This is especially useful when you want to concatenate several strings together and each string may contain trailing blanks. The difference between TRIM and TRIMN is that the TRIM function returns a single blank for a blank string, while TRIMN returns a null string (zero blanks).
Syntax: TRIMN(character-value)
character-value is any SAS character expression.
Examples:
For these examples, STRING1 = "ABC " and STRING2 = " XYZ".
Function              Returns
TRIMN(STRING1)        "ABC"
TRIMN(STRING2)        " XYZ"
TRIMN("A B C ")       "A B C"
TRIMN("A ")           "A"
TRIMN(" ")            "" (a null string, length 0)
Program:
DATA ALL_THE_TRIMMINGS;
   A = "AAA";
   B = "BBB";
   LENGTH_AB = LENGTHC(A || B);
   LENGTH_AB_TRIM = LENGTHC(TRIM(A) || TRIM(B));
   LENGTH_AB_TRIMN = LENGTHC(TRIMN(A) || TRIMN(B));
   LENGTH_NULL = LENGTHC(COMPRESS(A,"A") || COMPRESS(B,"B"));
   LENGTH_NULL_TRIM = LENGTHC(TRIM(COMPRESS(A,"A")) || TRIM(COMPRESS(B,"B")));
   LENGTH_NULL_TRIMN = LENGTHC(TRIMN(COMPRESS(A,"A")) || TRIMN(COMPRESS(B,"B")));
   PUT A= B= /
       LENGTH_AB= LENGTH_AB_TRIM= LENGTH_AB_TRIMN= /
       LENGTH_NULL= LENGTH_NULL_TRIM= LENGTH_NULL_TRIMN=;
RUN;
Explanation: First, remember that the LENGTHC function returns the length of its argument, including trailing blanks. As the listing from the SAS log shows, the two functions TRIM and TRIMN yield identical results when there are no null strings involved. When you compress an 'A' from the variable A or a 'B' from the variable B, the result is null. Notice that when you trim these compressed values and concatenate the results, the length is 2 (1 + 1); when you use the TRIMN function, the length is 0.
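A smaller sketch (the data set name TRIM_DEMO is illustrative) isolates the one case where TRIM and TRIMN differ and adds STRIP for comparison:

```sas
/* Sketch: TRIM, TRIMN, and STRIP applied to an all-blank value */
data trim_demo;
   length blank $ 3;
   blank = " ";                          /* stored as three blanks    */
   len_trim  = lengthc(trim(blank));     /* TRIM leaves one blank     */
   len_trimn = lengthc(trimn(blank));    /* TRIMN returns null string */
   len_strip = lengthc(strip(blank));    /* so does STRIP             */
   put len_trim= len_trimn= len_strip=;
run;
```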
Function: STRIP
Purpose: To strip leading and trailing blanks from character variables or strings. STRIP(CHAR) is equivalent to TRIMN(LEFT(CHAR)), but more convenient.
Syntax: STRIP(character-value)
character-value is any SAS character expression.
Note: If the STRIP function is used to create a new variable, the length of that new variable will be equal to the length of the argument of the STRIP function. If leading or trailing blanks were trimmed, trailing blanks will be added to the result to pad out the length as necessary. The STRIP function is useful when using the concatenation operator. However, note that there are several new concatenation functions and call routines that also perform trimming before concatenation.
Examples:
For these examples, let STRING = " abc ".
Function                              Returns
STRIP(STRING)                         "abc" (if the result was previously assigned a length of three; otherwise, trailing blanks would be added)
STRIP(" LEADING AND TRAILING ")       "LEADING AND TRAILING"
Program:
Using the STRIP function to strip both leading and trailing blanks from a string
DATA _NULL_;
   ONE = "   ONE   ";   ***Note: three leading and trailing blanks;
   TWO = "   TWO   ";   ***Note: three leading and trailing blanks;
   CAT_NO_STRIP = ":" || ONE || "-" || TWO || ":";
   CAT_STRIP = ":" || STRIP(ONE) || "-" || STRIP(TWO) || ":";
   PUT ONE= TWO= / CAT_NO_STRIP= / CAT_STRIP=;
RUN;
Explanation: Without the STRIP function, the leading and trailing blanks are maintained in the concatenated string. The STRIP function, as advertised, removes the leading and trailing blanks.
Functions That Compare Strings (Exact and "Fuzzy" Comparisons)
Functions in this section allow you to compare strings that are exactly alike (or alike except for case) or close (not exact matches). Programmers find this latter group of functions useful in matching names that may be spelled differently in separate files.
Function: COMPARE
Purpose: To compare two character strings. When used with one or more modifiers, this function can ignore case, remove leading blanks, truncate the longer string to the length of the shorter string, and strip quotation marks from SAS n-literals. COMPARE returns 0 when the two strings match; otherwise, it returns the position of the leftmost character where they differ, as a negative number if string-1 comes before string-2 in the collating sequence.
Syntax: COMPARE(string-1, string-2 <,'modifiers'>)
where
string-1 is any SAS character expression.
string-2 is any SAS character expression.
modifiers are one or more modifiers, placed in single or double quotation marks, as follows:
i or I ignore case.
l or L remove leading blanks.
n or N remove quotation marks from any argument that is an n-literal and ignore case. An n-literal is a string in quotation marks, followed by an 'n', useful for nonvalid SAS names.
: (colon) truncate the longer string to the length of the shorter string.
Examples: For these examples, STRING1 = "AbC", STRING2 = " ABC", STRING3 = " 'ABC'n", and STRING4 = "ABCXYZ".
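A small self-contained sketch may help show how the modifiers interact. It is a reconstruction for illustration, not the author's program: the data set name COMPARE_DEMO and the three data lines are assumptions, and the EQUAL and COLON variables are built the traditional way (UPCASE with = and =:) that the explanation below describes.

```sas
/* Sketch: UPCASE with = and =: versus COMPARE with modifiers */
data compare_demo;
   infile datalines truncover;
   input @1 string1 $char3. @5 string2 $char10.;
   length equal colon $ 3;
   if upcase(string1) = upcase(string2) then equal = "Yes";
   else equal = "No";
   /* =: compares only as many characters as the shorter value has */
   if upcase(string1) =: upcase(string2) then colon = "Yes";
   else colon = "No";
   /* i = ignore case, l = remove leading blanks, : = truncate */
   compare_il_colon = compare(string1, string2, "il:");
datalines;
Abc ABC
abc ABCDEFGH
123 311
;
```

In the last record the strings differ in position 1 and "1" sorts before "3", so COMPARE returns a negative position there.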
Program:
   COMPARE = COMPARE(STRING1, STRING2);
   COMPARE_IL = COMPARE(STRING1, STRING2, 'IL');
   COMPARE_IL_COLON = COMPARE(STRING1, STRING2, 'IL:');
DATALINES;
Abc ABC
abc ABCDEFGH
123 311
;
PROC PRINT DATA=COMPARE NOOBS;
   TITLE "Listing of Data Set COMPARE";
RUN;
Explanation: The first two variables, EQUAL and COLON, use the UPCASE function to convert all the characters to uppercase before the comparison is made. The colon modifier following the equal sign (the variable COLON) is an instruction to truncate the longer variable to the length of the shorter variable before a comparison is made. The three COMPARE functions demonstrate the coding efficiency of using this function with its many modifiers. Using the i, l, and colon modifiers allows you to compare the two strings ignoring case, removing leading blanks, and truncating the two strings to a length of 3 (the length of STRING1).
CALL COMPCOST, COMPGED, and COMPLEV
The two functions COMPGED and COMPLEV are both used to determine the similarity between two strings. The CALL COMPCOST routine allows you to customize the scoring system when you are using the COMPGED function. COMPGED computes a quantity called generalized edit distance, which is useful in matching names that are not spelled exactly the same. The larger the value, the more dissimilar the two strings. COMPLEV performs a similar function but uses a method called the Levenshtein edit distance. It is more efficient than the generalized edit distance but may not be as useful in name matching.
Function: CALL COMPCOST
Purpose: To set the costs of the operations that the COMPGED function uses when it computes the generalized edit distance between two strings; COMPGED then computes the cost based on the differences between the two strings. You need to call this routine only once in a DATA step. Since this is a very advanced and complicated routine, only a few examples of its use will be explained.
Syntax: CALL COMPCOST('operation-1', cost-1 <,'operation-2', cost-2 ...>)
where
operation is a keyword, placed in quotation marks. A few keywords are listed here for explanation purposes, but see the SAS OnlineDoc 9.1 documentation for a complete list of operations.
Partial list of operations: DELETE=, REPLACE=, SWAP=, TRUNCATE=
cost is a value associated with the operation. Valid values for cost range from -32,767 to +32,767.
Examples:
CALL COMPCOST('REPLACE=', 100, 'SWAP=', 200);
CALL COMPCOST('SWAP=', 150);
Note: Operation can be upper- or lowercase.
Function: COMPGED
Purpose: To compute the similarity between two strings using a method called the generalized edit distance. See SPEDIS for a discussion of the possible uses of this function. This function can be used in conjunction with CALL COMPCOST if you want to alter the default costs for each type of spelling error.
Syntax: COMPGED(string-1, string-2 <,maxcost> <,'modifiers'>)
string-1 is any SAS character expression.
string-2 is any SAS character expression.
maxcost, if specified, is the maximum cost that will be returned by the COMPGED function. If the cost computation results in a value larger than maxcost, the value of maxcost will be returned.
modifiers, placed in single or double quotation marks, are as follows:
i or I ignore case.
l or L remove leading blanks.
n or N remove quotation marks from any argument that is an n-literal and ignore case. An n-literal is a string in quotation marks, followed by an 'n', useful for nonvalid SAS names.
: (colon) truncate the longer string to the length of the shorter string.
Note: If multiple modifiers are used, the order of the modifiers is important. They are applied in the same order as they appear.
Program:
Explanation: This program demonstrates the use of the COMPGED function with a SAS n-literal. Starting with Version 7, SAS variable names could contain characters not normally allowed in SAS names. The system option VALIDVARNAME is set to ANY, and the name is placed in quotation marks followed by the letter N. Using the n modifier (which strips the quotation marks and the 'n' from the string) and the colon modifier (which truncates the longer string to the length of the shorter string) results in a value of 0 for the variable COMP2.
Function: COMPLEV
Purpose: To compute the similarity between two strings using a method called the Levenshtein edit distance. It is similar to the COMPGED function, except that it uses fewer computer resources but may not do as good a job of matching misspelled names.
Syntax: COMPLEV(string-1, string-2 <,maxcost> <,'modifiers'>)
STRING1   STRING2   Function                               Returns
SAME      SAME      COMPLEV(STRING1, STRING2)              0
case      CASE      COMPLEV(STRING1, STRING2)              4
case      CASE      COMPLEV(STRING1, STRING2, 'I')         0
case      CASE      COMPLEV(STRING1, STRING2, 999, 'I')    0
Ron       Run       COMPLEV(STRING1, STRING2)              1
Program:
Changing the effect of the call to COMPCOST on the result from COMPGED
DATA _NULL_;
   TITLE "Program without Call to COMPCOST";
   INPUT @1 STRING1 $CHAR10. @11 STRING2 $CHAR10.;
   DISTANCE = COMPGED(STRING1, STRING2);
   PUT STRING1= STRING2= / DISTANCE=;
DATALINES;
Ron       Run
ABC       AB
;
DATA _NULL_;
   TITLE "Program with Call to COMPCOST";
   INPUT @1 STRING1 $CHAR10. @11 STRING2 $CHAR10.;
   IF _N_ = 1 THEN CALL COMPCOST('APPEND=', 33);
   DISTANCE = COMPGED(STRING1, STRING2);
   PUT STRING1= STRING2= / DISTANCE=;
DATALINES;
Ron       Run
ABC       AB
;
Explanation: The first DATA _NULL_ program is a simple comparison of STRING1 to STRING2 using the COMPGED function. The second DATA _NULL_ program makes a call to COMPCOST (note the use of _N_ = 1) before the COMPGED function is used. In the SAS logs, you can see that the distance for the second observation is 50 in the first program, while in the second program it is 33. That is the result of overriding the default value of 50 points for an appending error and setting it equal to 33.
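To see the two edit distances side by side, here is a minimal sketch (hypothetical data set EDIT_DISTANCE, default COMPGED costs, no CALL COMPCOST) that reuses the Ron/Run and case/CASE pairs from the COMPLEV table and the ABC/AB pair from the program above:

```sas
/* Sketch: COMPLEV and COMPGED on the same pairs (default COMPGED costs) */
data edit_distance;
   length string1 string2 $ 10;
   input string1 string2;
   lev = complev(string1, string2);  /* number of single-character edits */
   ged = compged(string1, string2);  /* edits weighted by their costs    */
   put string1= string2= lev= ged=;
datalines;
Ron Run
ABC AB
case CASE
;
```

COMPLEV simply counts edits, so each pair differs by a small integer; COMPGED weights the same edits by their costs, which is why its values are larger and why CALL COMPCOST can change them.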
Functions That Divide Strings into "Words"
These extremely useful functions and call routines can divide a string into words. Words can be characters separated by blanks or by other delimiters that you specify.
SCAN and SCANQ
The two functions SCAN and SCANQ are similar. They both extract "words" from a string, words being defined as characters separated by a set of specified delimiters. Pay particular attention to the fact that the SCAN and SCANQ functions use different sets of default delimiters. The SCANQ function also has some additional useful features. Programs demonstrating both of these functions follow the definitions.
Function: SCAN
Purpose: Extracts a specified word from a character expression, where word is defined as the characters separated by a set of specified delimiters. The length of the returned variable is 200, unless previously defined.
Syntax: SCAN(character-value, n-word <,'delimiter-list'>)
where
character-value is any SAS character expression.
n-word is the nth "word" in the string. If n is greater than the number of words, the SCAN function returns a value that contains no characters. If n is negative, the character value is scanned from right to left. A value of zero is invalid.
delimiter-list is an optional argument. If it is omitted, the default set of delimiters is (for ASCII environments):
blank . < ( + & ! $ * ) ; ^ - / , % |
For EBCDIC environments, the default delimiters are:
blank . < ( + | & ! $ * ) ; ¬ - / , % ¦ ¢
If you specify any delimiters, only those delimiters will be active. Delimiters before the first word have no effect. Two or more contiguous delimiters are treated as one.
Examples: For these examples, STRING1 = "ABC DEF" and STRING2 = "ONE?TWO THREE+FOUR|FIVE". This is an ASCII example.
Function               Returns
SCAN(STRING1,2)        "DEF"
SCAN(STRING1,-1)       "DEF"
SCAN(STRING1,3)        no characters
SCAN(STRING2,4)        "FIVE"
SCAN(STRING2,2," ")    "THREE+FOUR|FIVE"
SCAN(STRING1,0)        An error in the SAS log
Function: SCANQ
Purpose: To extract a specified word from a character expression, word being defined as characters separated by a set of specified delimiters. The basic differences between this function and the SCAN function are the default set of delimiters (see the syntax below) and the fact that a value of 0 for the word count does not result in an error message. SCANQ also ignores delimiters enclosed in quotation marks (SCAN recognizes them).
Syntax: SCANQ(character-value, n-word <,'delimiter-list'>)
character-value is any SAS character expression.
n-word is the nth "word" in the string (word being defined as one or more characters separated by a set of specified delimiters). If n is negative, the scan proceeds from right to left. If n is greater than the number of words or is 0, the SCANQ function will return a blank value. Delimiters located before the first word or after the last word are ignored. If two or more delimiters are located between two words, they are treated as one. If the character value contains sets of quotation marks, any delimiters within these marks are ignored.
delimiter-list is an optional argument. If it is omitted, the default set of delimiters is the white-space characters (blank, horizontal and vertical tab, carriage return, line feed, and form feed).
Examples: For these examples, STRING1 = "ABC DEF", STRING2 = "ONE TWO THREE FOUR FIVE", STRING3 = "'AB CD' 'X Y'", and STRING4 = "ONE/ ::TWO".
Function                  Returns
SCANQ(STRING1,2)          "DEF"
SCANQ(STRING1,-1)         "DEF"
SCANQ(STRING1,3)          blank
SCANQ(STRING2,4," ")      "FOUR"
SCANQ(STRING3,2)          "'X Y'"
SCANQ(STRING1,0)          blank
SCANQ(STRING4,2," /:")    "TWO"
Program:
   ELSE VALUE = INPUT(INTEGER, 8.) +
                (INPUT(NUMERATOR, 8.) / INPUT(DENOMINATOR, 8.));
   KEEP STOCK VALUE;
DATALINES;
ABC 14 3/8
XYZ 8
TWW 5 1/8
;
PROC PRINT DATA=PRICES NOOBS;
   TITLE "Listing of Data Set PRICES";
RUN;
Explanation: The SCAN function has many uses besides merely extracting selected words from text expressions. In this program, you want to convert numbers such as 23 5/8 into a decimal value (23.625). An elegant way to accomplish this is to use the SCAN function to separate the mixed number into three parts: the integer, the numerator of the fraction, and the denominator. Once this is done, all you need to do is to convert each piece to a numerical value (using the INPUT function) and add the integer portion to the fractional portion. If the number being processed does not have a fractional part, the SCAN function returns a blank value for the two variables NUMERATOR and DENOMINATOR.
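Only the tail of the PRICES program survives above. A minimal sketch of a complete version, assuming fixed columns (the stock symbol in columns 1-3, the mixed number starting in column 5) and using blank and slash as the SCAN delimiters, might look like this. It is a reconstruction, not the original code.

```sas
/* Sketch of a complete mixed-number program (reconstruction) */
data prices;
   infile datalines truncover;
   length integer numerator denominator $ 8;
   input @1 stock $3. @5 mixed $char10.;
   /* Blank and slash both act as delimiters, so 14 3/8 gives 14, 3, 8 */
   integer     = scan(mixed, 1, " /");
   numerator   = scan(mixed, 2, " /");
   denominator = scan(mixed, 3, " /");
   if numerator = " " then value = input(integer, 8.);
   else value = input(integer, 8.) +
                (input(numerator, 8.) / input(denominator, 8.));
   keep stock value;
datalines;
ABC 14 3/8
XYZ 8
TWW 5 1/8
;
```

The TRUNCOVER option keeps the short XYZ line from flowing over into the next record when the $CHAR10. informat reads past its end.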
CALL SCAN and CALL SCANQ
The SCAN and SCANQ call routines are similar to the SCAN and SCANQ functions. However, both call routines return the position and length of the nth word (to be used, perhaps, in a subsequent SUBSTR function) rather than the actual word itself. The differences between CALL SCAN and CALL SCANQ are the same as the differences between the two functions SCAN and SCANQ.
Function: CALL SCAN
Purpose: To break up a string into words, where words are defined as the characters separated by a set of specified delimiters, and to return the starting position and the length of the nth word.
Syntax: CALL SCAN(character-value, n-word, position, length <,'delimiter-list'>)
character-value is any SAS character expression.
n-word is the nth "word" in the string. If n is greater than the number of words, the SCAN call routine returns a value of 0 for position and length. If n is negative, the scan proceeds from right to left.
position is the name of the numeric variable to which the starting position of the nth word in character-value is returned.
length is the name of a numeric variable to which the length of the nth word is returned.
delimiter-list is an optional argument. If it is omitted, the default set of delimiters is (for ASCII environments):
blank . < ( + & ! $ * ) ; ^ - / , % |
For EBCDIC environments, the default delimiters are:
blank . < ( + | & ! $ * ) ; ¬ - / , % ¦ ¢
If you specify any delimiters, only those delimiters will be active. Delimiters are slightly different in ASCII and EBCDIC systems.
Examples: For these examples, STRING1 = "ABC DEF" and STRING2 = "ONE?TWO THREE+FOUR|FIVE".
Function                                     Position   Length
CALL SCAN(STRING1,2,POSITION,LENGTH)         5          3
CALL SCAN(STRING1,-1,POSITION,LENGTH)        5          3
CALL SCAN(STRING1,3,POSITION,LENGTH)         0          0
CALL SCAN(STRING2,1,POSITION,LENGTH)         1          7
CALL SCAN(STRING2,4,POSITION,LENGTH)         20         4
CALL SCAN(STRING2,2,POSITION,LENGTH," ")     9          15
CALL SCAN(STRING1,0,POSITION,LENGTH)         missing    missing
Program:
Explanation: The SCAN routine is called three times in this program, twice with default delimiters and once with the pound sign (#) as the delimiter. Notice that using a negative argument results in a scan from right to left.
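The program for CALL SCAN is also missing. The sketch below (hypothetical data set SCAN_DEMO, ASCII default delimiters assumed) uses STRING2 from the examples to contrast the SCAN function with the call routine, and then passes the returned position and length to SUBSTR:

```sas
/* Sketch: SCAN returns a word, CALL SCAN returns where the word is */
data scan_demo;
   length string2 $ 23 word4 word2_blank word_via_call $ 20;
   string2 = "ONE?TWO THREE+FOUR|FIVE";
   word4       = scan(string2, 4);       /* ? is not a default delimiter  */
   word2_blank = scan(string2, 2, " ");  /* only the blank is a delimiter */
   call scan(string2, 2, pos, len);      /* second word is THREE          */
   word_via_call = substr(string2, pos, len);
   call scan(string2, -2, pos_back, len_back);  /* scan right to left */
run;
```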
Function: CALL SCANQ
Purpose: To break up a string into words, where words are defined to be the characters separated by a set of specified delimiters, and to return the starting position and the length of the nth word. The basic differences between this call routine and CALL SCAN are that CALL SCANQ uses white-space characters as default delimiters and that it can accept a value of 0 for the n-word argument. In addition, the SCANQ call routine ignores delimiters within quotation marks.
Syntax: CALL SCANQ(character-value, n-word, position, length <,'delimiter-list'>)
Examples: For these examples, STRING1 = "ABC DEF", STRING2 = "ONE TWO THREE FOUR FIVE", and STRING3 = "'AB CD' 'X Y'".
Function                                      Position   Length
CALL SCANQ(STRING1,2,POSITION,LENGTH)         5          3
CALL SCANQ(STRING1,-1,POSITION,LENGTH)        5          3
CALL SCANQ(STRING1,3,POSITION,LENGTH)         0          0
CALL SCANQ(STRING2,4,POSITION,LENGTH)         15         4
CALL SCANQ(STRING2,2,POSITION,LENGTH," ")     5          3
CALL SCANQ(STRING1,0,POSITION,LENGTH)         0          0
CALL SCANQ(STRING3,2,POSITION,LENGTH)         9          5
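A short sketch (hypothetical data set SCANQ_DEMO) contrasts SCAN and SCANQ on STRING3 from these examples, where the only difference is how blanks inside quotation marks are treated:

```sas
/* Sketch: SCAN versus SCANQ on quoted text */
data scanq_demo;
   length string3 $ 13 w_scan w_scanq w_zero $ 10;
   string3 = "'AB CD' 'X Y'";
   w_scan  = scan(string3, 2, " ");   /* SCAN splits inside the quotes  */
   w_scanq = scanq(string3, 2);       /* SCANQ keeps the quoted word    */
   call scanq(string3, 2, pos, len);  /* start and length of that word  */
   w_zero  = scanq(string3, 0);       /* 0 is allowed and returns blank */
run;
```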
Functions That Substitute Letters or Words in Strings
TRANSLATE can substitute one character for another in a string. TRANWRD is more flexible: it can substitute a word or several words for one or more words.
Function: TRANSLATE
Purpose: To exchange one character value for another. For example, you might want to change the values 1-5 to the values A-E.
Syntax: TRANSLATE(character-value, to-1, from-1 <,... to-n, from-n>)
character-value is any SAS character expression.
to-n is a single character or a list of character values.
from-n is a single character or a list of characters.
Each character listed in from-n is changed to the corresponding value in to-n. If a character value is not listed in from-n, it will be unaffected.
Examples: In these examples, CHAR = "12X45" and ANS = "Y".
Function                                                      Returns
TRANSLATE(CHAR,"ABCDE","12345")                               "ABXDE"
TRANSLATE(CHAR,'A','1','B','2','C','3','D','4','E','5')       "ABXDE"
TRANSLATE(ANS,"10","YN")                                      "1"
Program:
Explanation: In this example, you want to convert the character values of 1-5 to the letters A-E. The two arguments in this function seem backwards to this author. You would expect the order to be "from-to" rather than the other way around. I suppose others at SAS felt the same way, since a more recent function, TRANWRD (next example), uses the "from-to" order for its arguments. You could also use a format along with a PUT function to do this translation.
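The comparison with a format can be sketched as follows. The format name $YN and the data set name TRANSLATE_DEMO are illustrative; the TRANSLATE calls reproduce the examples above.

```sas
/* Sketch: TRANSLATE next to a PUT function with a format ($YN is illustrative) */
proc format;
   value $yn "Y" = "1"
             "N" = "0";
run;

data translate_demo;
   length char $ 5 ans $ 1;
   char = "12X45";
   ans  = "Y";
   t_list  = translate(char, "ABCDE", "12345");   /* to-list, then from-list */
   t_pairs = translate(char, "A","1", "B","2", "C","3", "D","4", "E","5");
   t_ans   = translate(ans, "10", "YN");          /* Y becomes 1             */
   f_ans   = put(ans, $yn.);                      /* same via the format     */
run;
```

TRANSLATE works character by character, so a single call covers any number of one-to-one substitutions, while the format approach maps whole values and needs a PROC FORMAT step first.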
Program:
Converting the values 'Y' and 'N' to 1s and 0s
DATA YES_NO;
   LENGTH CHAR $ 1;
   INPUT CHAR @@;
   X = INPUT(TRANSLATE(UPCASE(CHAR),'01','NY'),1.);
DATALINES;
N Y n y A B 0 1
;
PROC PRINT DATA=YES_NO NOOBS;
   TITLE "Listing of Data Set YES_NO";
RUN;
Explanation: In this program, the UPCASE function converts the lowercase values 'n' and 'y' to their uppercase equivalents. The TRANSLATE function then converts the Ns and Ys to the characters '0' and '1', respectively. Finally, the INPUT function does the character-to-numeric conversion. Note that the data values '1' and '0' do not get translated but do get converted to numeric values.
Function: TRANWRD
Purpose: To substitute one or more words in a string with a replacement word or words. It works like the find-and-replace feature of most word processors.
Syntax: TRANWRD(character-value, from-string, to-string)
character-value is any SAS character expression.
from-string is one or more characters that you want to replace with the character or characters in the to-string.
to-string is one or more characters that replace the entire from-string.
Examples: For these examples, STRING = "123 Elm Road", FROM = "Road", and TO = "Rd.".
Function                                        Returns
TRANWRD(STRING,FROM,TO)                         "123 Elm Rd."
TRANWRD("Now is the time","is","is not")        "Now is not the time"
TRANWRD("One two three","four","4")             "One two three"
TRANWRD("Mr. Rogers","Mr."," ")                 " Rogers"
TRANWRD("ONE TWO THREE","ONE TWO","A B")        "A B THREE"
Program:
Explanation: TRANWRD is one of the relatively new SAS functions, and it is enormously useful. This example uses it to help standardize a mailing list, substituting abbreviations for full words. Another use for this function is to make to-string a blank, thus allowing you to remove words such as Jr. or Mr. from an address.
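The mailing-list program itself is not shown above. A minimal sketch, with made-up address lines and the hypothetical data set name MAILING, applies TRANWRD once per abbreviation:

```sas
/* Sketch: standardizing an address list with TRANWRD (data are illustrative) */
data mailing;
   infile datalines truncover;
   input address $char40.;
   /* Each call replaces every occurrence of from-string with to-string */
   address = tranwrd(address, "Street", "St.");
   address = tranwrd(address, "Road", "Rd.");
datalines;
123 Elm Road
45 Main Street
;
```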
Function: LENGTH
Purpose: To determine the length of a character value, not counting trailing blanks. A null argument returns a value of 1.
Syntax: LENGTH(character-value)
character-value is any SAS character expression.
Examples: For these examples, CHAR = "ABC   " (three trailing blanks).
Function             Returns
LENGTH("ABC")        3
LENGTH(CHAR)         3
LENGTH(" ")          1
Function: LENGTHC
Purpose: To determine the length of a character value, including trailing blanks.
Syntax: LENGTHC(character-value)
character-value is any SAS character expression.
Examples: For these examples, CHAR = "ABC   " (three trailing blanks).
Function             Returns
LENGTHC("ABC")       3
LENGTHC(CHAR)        6
LENGTHC(" ")         1
Function: LENGTHM
Purpose: To determine the length of a character variable in memory.
Syntax: LENGTHM(character-value)
character-value is any SAS character expression.
Examples: For these examples, CHAR = "ABC   " (three trailing blanks).
Function             Returns
LENGTHM("ABC")       3
LENGTHM(CHAR)        6
LENGTHM(" ")         1
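Function: LENGTHN
Purpose: To determine the length of a character value, not counting trailing blanks. A null argument returns a value of 0.
Syntax: LENGTHN(character-value)
character-value is any SAS character expression.
Examples: For these examples, CHAR = "ABC   " (three trailing blanks).
Function             Returns
LENGTHN("ABC")       3
LENGTHN(CHAR)        3
LENGTHN(" ")         0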
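A one-step sketch (hypothetical data set LENGTH_DEMO) puts the four length functions next to each other on the values used in these examples:

```sas
/* Sketch: the four LENGTH functions on the same values */
data length_demo;
   length char $ 6 blank $ 1;
   char  = "ABC";                 /* ABC plus three trailing blanks */
   blank = " ";
   len         = length(char);    /* trailing blanks not counted    */
   len_c       = lengthc(char);   /* trailing blanks counted        */
   len_m       = lengthm(char);   /* bytes allocated for CHAR       */
   len_n       = lengthn(char);   /* like LENGTH for nonblank text  */
   len_blank   = length(blank);   /* LENGTH of a blank value        */
   len_n_blank = lengthn(blank);  /* LENGTHN of a blank value       */
   put len= len_c= len_m= len_n= len_blank= len_n_blank=;
run;
```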
Functions That Count the Number of Letters or Substrings in a String
The COUNT function counts the number of times a given substring appears in a string. The COUNTC function counts the number of times specific characters occur in a string.
Function5 COUNT Purpose: To count the num$er of times a given su$string appears in a string. 0ith the use of a modifier case can $e ignored. If no occurrences of the su$string are found the function returns a *. Syntax:COUNT1character-value find-string R GmodifiersGS3 character-value is any "A" character e#pression find-string is a character varia$le or "A" string literal to $e counted. The follo+ing modifiers placed in single or dou$le quotation mar!s may $e used +ith COUNT5 G or I ignore case. F or T ignore trailing $lan!s in $oth the character value and the find-string. .#amples5& For these e#amples STRI !1 = "HO^
Function                                Returns
COUNT(STRING1, STRING2)                 3
COUNT(STRING1, STRING2, 'I')            4
COUNT(STRING1, "XX")                    0
COUNT("ding and dong", "g ")            1
COUNT("ding and dong", "g ", "T")       2
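To make the effect of the i and t modifiers easy to check, here is a small Python model of COUNT (a sketch, not SAS code; sas_count is a hypothetical name). It counts occurrences left to right without overlap. The SAS documentation describes the result for overlapping occurrences as undefined, so the model makes no claim about that case, and treating an empty find-string as 0 is an assumption.

```python
# Python model of SAS COUNT (illustration only, not SAS code).
# Occurrences are counted left to right without overlap.

def sas_count(value: str, find: str, modifiers: str = "") -> int:
    mods = modifiers.lower()
    if "t" in mods:                      # t/T: trim trailing blanks from both arguments
        value, find = value.rstrip(" "), find.rstrip(" ")
    if "i" in mods:                      # i/I: ignore case
        value, find = value.lower(), find.lower()
    if not find:                         # assumption: an empty find-string counts as 0
        return 0
    return value.count(find)             # str.count does not count overlaps


STRING1 = "How Now Brown COW"
STRING2 = "ow"
print(sas_count(STRING1, STRING2))              # 3
print(sas_count(STRING1, STRING2, "I"))         # 4
print(sas_count(STRING1, "XX"))                 # 0
print(sas_count("ding and dong", "g "))         # 1
print(sas_count("ding and dong", "g ", "T"))    # 2
```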
Program:
Using the COUNT function to count the number of times the word "the" appears in a string
DATA DRACULA;
   INPUT STRING $CHAR60.;
   NUM = COUNT(STRING,'the');
   NUM_NO_CASE = COUNT(STRING,'the','I');
DATALINES;
The number of times 'the' appears is the question
THE the
None on this line!
There is the map
;
PROC PRINT DATA=DRACULA NOOBS;
   TITLE 'Listing of Data Set Dracula';
RUN;
Explanation: In this program, the COUNT function is used with and without the I (ignore case) modifier. In the first observation, the first "The" has an uppercase T, so it does not match the substring and is not counted for the variable NUM. But when the I modifier is used, it does count. The same holds for the second observation. When there are no occurrences of the substring, as in the third observation, the function returns a 0. The fourth line of data demonstrates that COUNT ignores word boundaries when searching for strings: with the I modifier, the "The" at the start of "There" is counted as well.
Function: COUNTC
Purpose: To count the number of individual characters that appear or do not appear in a string. With the use of a modifier, case can be ignored. Another modifier allows you to count characters that do not appear in the string. If no specified characters are found, the function returns a 0.
Syntax: COUNTC(character-value, characters [,'modifiers'])
character-value is any SAS character expression.
characters is one or more characters to be counted. It may be a string literal (letters in quotation marks) or a character variable.
The following modifiers, placed in quotation marks, may be used with COUNTC:
i or I   ignore case.
o or O   process the characters and modifiers only once. If the COUNTC function is executed again in the same DATA step, the previous character and modifier values are used and the current values are ignored.
t or T   ignore trailing blanks in the character-value or the characters. Note that this modifier is especially important when looking for blanks or when you are using the v modifier (below).
v or V   count only the characters in character-value that do not appear in characters. Remember that this count will include trailing blanks unless the t modifier is used.
Examples: For these examples, STRING1 = "How Now Brown COW" and STRING2 = "wo".
Function                                Returns
COUNTC("AaBbbCDE", "CBA")               3
COUNTC("AaBbbCDE", "CBA", 'i')          6
COUNTC(STRING1, STRING2)                6
COUNTC(STRING1, STRING2, 'i')           8
COUNTC(STRING1, "XX")                   0
COUNTC("ding and dong", "g ")           4 (2 g's and 2 blanks)
COUNTC("ding and dong", "g ", "t")      2 (blanks trimmed)
COUNTC("ABCDEabcde", "BCD", 'vi')       4 (A, E, a, and e)
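The modifier rules above can be checked against the example table with a short Python model of COUNTC (a sketch, not SAS code; sas_countc is a hypothetical name). It covers the i, t, and v modifiers; the o modifier only changes how repeated calls behave inside a DATA step, so it is not modeled.

```python
# Python model of SAS COUNTC with the i, t, and v modifiers (illustration only).

def sas_countc(value: str, characters: str, modifiers: str = "") -> int:
    mods = modifiers.lower()
    if "t" in mods:                      # t: trim trailing blanks from both arguments
        value, characters = value.rstrip(" "), characters.rstrip(" ")
    if "i" in mods:                      # i: ignore case
        value, characters = value.lower(), characters.lower()
    wanted = set(characters)
    if "v" in mods:                      # v: count characters NOT in the list
        return sum(1 for ch in value if ch not in wanted)
    return sum(1 for ch in value if ch in wanted)


STRING1 = "How Now Brown COW"
STRING2 = "wo"
examples = [
    ("AaBbbCDE", "CBA", ""),
    ("AaBbbCDE", "CBA", "i"),
    (STRING1, STRING2, ""),
    (STRING1, STRING2, "i"),
    (STRING1, "XX", ""),
    ("ding and dong", "g ", ""),
    ("ding and dong", "g ", "t"),
    ("ABCDEabcde", "BCD", "vi"),
]
print([sas_countc(*args) for args in examples])  # [3, 6, 6, 8, 0, 4, 2, 4]
```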
Program:
Explanation: This program demonstrates several features of the COUNTC function. The first use of the function simply looks for the number of times the uppercase letter A appears in the string. Next, by adding the i modifier, the number of upper- or lowercase A's is counted. Next, when you place more than one character in the list, the function returns the total number of the listed characters. The v modifier is interesting. The first time it is used, COUNTC is counting the number of characters in the string that are not uppercase A's.
Function: MISSING
Purpose: To determine if the argument is a missing (character or numeric) value. This is a handy function to use, since you don't have to know if the variable you are testing is character or numeric. The function returns a 1 (true) if the value is a missing value and a 0 (false) otherwise.
Syntax: MISSING(variable)
variable is a character or numeric variable or expression.
Examples: For these examples, NUM1 = 5, NUM2 is a numeric missing value, CHAR1 is a nonblank character value, and CHAR2 is blank.
Function            Returns
MISSING(NUM1)       0
MISSING(NUM2)       1
MISSING(CHAR1)      0
MISSING(CHAR2)      1
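The point of MISSING is that one test works for both numeric and character values. A minimal Python model (not SAS) makes that explicit under stated assumptions: a numeric missing value is represented as None or NaN, a character value counts as missing when it is empty or all blanks, and SAS special missing values such as .A through .Z are not modeled. The name sas_missing and the value "ABC" for CHAR1 are hypothetical.

```python
import math

# Python model of SAS MISSING (illustration only).
# Assumptions: numeric missing = None or NaN; character missing = empty or all blanks.

def sas_missing(value) -> int:
    if value is None:
        return 1
    if isinstance(value, str):
        return 1 if value.strip(" ") == "" else 0
    if isinstance(value, float) and math.isnan(value):
        return 1
    return 0


NUM1, NUM2, CHAR1, CHAR2 = 5, None, "ABC", " "
print([sas_missing(v) for v in (NUM1, NUM2, CHAR1, CHAR2)])  # [0, 1, 0, 1]
```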
Function: RANK
Purpose: To obtain the relative position of the ASCII (or EBCDIC) characters. This can be useful if you want to associate each character with a number so that an ARRAY subscript can point to a specific character.
Syntax: RANK(letter)
letter can be a string literal or a SAS character variable. If the literal or variable contains more than one character, the RANK function returns the collating sequence of the first character in the string.
Examples: For these examples, STRING1 = "A" and STRING2 = "XYZ" (the values shown are ASCII).
Function            Returns
RANK(STRING1)       65
RANK(STRING2)       88
RANK("X")           88
RANK("a")           97
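The array-subscript idea mentioned in the RANK purpose can be sketched in Python (not SAS): on an ASCII system, Python's ord() gives the same numbers as RANK for the characters 0 to 127, and subtracting the rank of "A" turns letters into subscripts 1 to 26. Both function names are hypothetical, and treating an empty value as a blank is an assumption.

```python
# Python model of SAS RANK on an ASCII system (illustration only).
# ord() returns a Unicode code point, which equals the ASCII code for
# characters 0-127; an EBCDIC system would give different numbers.

def sas_rank(letter: str) -> int:
    """Return the collating position of the first character (blank if empty)."""
    return ord(letter[0]) if letter else ord(" ")


def letter_subscript(letter: str) -> int:
    """Hypothetical helper: map 'A'..'Z' (either case) to array subscripts 1..26."""
    return sas_rank(letter.upper()) - sas_rank("A") + 1


print(sas_rank("A"), sas_rank("XYZ"), sas_rank("X"), sas_rank("a"))  # 65 88 88 97
print(letter_subscript("c"))  # 3
```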
Function: REPEAT
Purpose: To make multiple copies of a string.
Syntax: REPEAT(character-value, n)
character-value is any SAS character expression.
n is the number of repetitions. The result of this function is the original string plus n repetitions. Thus, if n equals 1, the result will be two copies of the original string. If you do not declare the length of the character variable holding the result of the REPEAT function, it will default to 200.
Examples: For these examples, STRING = "ABC".
Function                Returns
REPEAT(STRING,1)        "ABCABC"
REPEAT("HELLO ",3)      "HELLO HELLO HELLO HELLO "
REPEAT("2",5)           "222222"
Program:
Explanation: The program above underlines each string with the same number of dashes as there are characters in the string. Since you want the line of dashes to be the same length as the string, you subtract one from the length, remembering that the REPEAT function results in n + 1 copies of the original string (the original plus n repetitions). The two important points to remember when using the REPEAT function are: always make sure you have defined a length for the resulting character variable, and the result of the REPEAT function is n + 1 copies of the original string.
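The n + 1 rule is the usual source of off-by-one errors with REPEAT, so here is a minimal Python model (not SAS) of both the function and the underlining technique described above. The names sas_repeat and underline are hypothetical; underline measures the string without trailing blanks, the way LENGTH does.

```python
# Python model of SAS REPEAT (illustration only): the result holds the
# original string plus n more copies, that is, n + 1 copies in total.

def sas_repeat(value: str, n: int) -> str:
    return value * (n + 1)


def underline(text: str) -> str:
    """Hypothetical helper: a line of dashes as long as text (trailing blanks ignored)."""
    length = len(text.rstrip(" "))
    # Subtract 1 because REPEAT already includes the original copy.
    return sas_repeat("-", length - 1) if length > 0 else ""


print(repr(sas_repeat("ABC", 1)))      # 'ABCABC'
print(repr(sas_repeat("HELLO ", 3)))   # 'HELLO HELLO HELLO HELLO '
print(repr(sas_repeat("2", 5)))        # '222222'
print("SAS Functions")
print(underline("SAS Functions"))      # 13 dashes
```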
Function: REVERSE
Purpose: To reverse the order of text of a character value.
Syntax: REVERSE(character-value)
character-value is any SAS character expression.
Examples: For these examples, STRING1 = "ABCDE" and STRING2 = "XYZ ".
Function              Returns
REVERSE(STRING1)      "EDCBA"
REVERSE(STRING2)      " ZYX"
REVERSE("1234")       "4321"
Program:
PROC PRINT DATA=BACKWARDS NOOBS;
   TITLE 'Listing of Data Set BACKWARDS';
RUN;
Explanation: It is important to realize that if you don't specify the length of the result, it will be the same length as the argument of the REVERSE function. Also, if there were trailing blanks in the original string, there will be leading blanks in the reversed string.
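The leading-blank effect described above is easiest to see with a padded value. In this Python sketch (not SAS; sas_reverse is a hypothetical name), a fixed-length SAS variable is imitated by padding with ljust, which is an assumption of the model.

```python
# Python model of SAS REVERSE (illustration only): trailing blanks in the
# argument become leading blanks in the result.

def sas_reverse(value: str) -> str:
    return value[::-1]


print(repr(sas_reverse("ABCDE")))    # 'EDCBA'
print(repr(sas_reverse("XYZ ")))     # ' ZYX'
print(repr(sas_reverse("1234")))     # '4321'

# A blank-padded value (imitating an 8-byte variable) shows the effect clearly.
padded = "XYZ".ljust(8)
print(repr(sas_reverse(padded)))               # '     ZYX'
# Removing the trailing blanks first avoids the leading blanks.
print(repr(sas_reverse(padded.rstrip(" "))))   # 'ZYX'
```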
Date and Time Functions
Functions That Create SAS Date, Datetime, and Time Values
The first three functions in this group create SAS date values, datetime values, and time values from the constituent parts (month, day, year, hour, minute, second). The DATE and TODAY functions are equivalent, and they both return the current date. The DATETIME and TIME functions return the current SAS datetime and time values, respectively.
Function: MDY
Purpose: To create a SAS date from the month, day, and year.
Syntax: MDY(month, day, year)
month is a numeric variable or constant representing the month of the year (a number from 1 to 12).
day is a numeric variable or constant representing the day of the month (a number from 1 to 31).
year is a numeric variable or constant representing the year.
Examples: For these examples, M = 11, D = 15, and Y = 2003.
Function              Returns
MDY(M,D,Y)            16024 (15NOV2003, formatted value)
MDY(10,21,1980)       7599 (21OCT1980, formatted value)
MDY(1,1,1950)         -3652 (01JAN1950, formatted value)
MDY(13,01,2003)       numeric missing value
Program:
Creating a SAS date value from separate variables representing the day, month, and year of the date
data funnydate;
   input @1 Month 2. @7 Year 4. @13 Day 2.;
   Date = mdy(Month, Day, Year);
   format Date mmddyy10.;
datalines;
05    2000  25
;
run;
Explanation: Here the values for month, day, and year were not in a form where any of the standard date informats could be used. Therefore, the day, month, and year values were read into separate variables, and the MDY function was used to create a SAS date.
Program:
Program to read in dates and set the day of the month to 15 if the day is missing from the date
data missing;
   input @1 dummy $10.;
   day = scan(dummy,2,'/');
   if not missing(day) then date = input(dummy,mmddyy10.);
   else date = mdy(input(scan(dummy,1,'/'),2.),
                   15,
                   input(scan(dummy,3,'/'),4.));
   format date date9.;
datalines;
10/21/1946
1/ /2000
01/ /2002
;
title "Listing of MISSING";
proc print data=missing noobs;
run;
Explanation: This program reads in a date and, when the day of the month is missing, uses the 15th of the month. The entire date is first read as a character string as the variable DUMMY. Next, the SCAN function is executed with the slash character (/) as the "word" delimiter. The second word is the day. If this is not missing, the INPUT function is used to convert the character string into a SAS date. If DAY is missing, the MDY function is used to create the SAS date, with the value of 15 representing the day of the month.
Function: DHMS
Purpose: To create a SAS datetime value from a SAS date value and a value for the hour, minute, and second.
Syntax: DHMS(date, hour, minute, second)
date is a SAS date value, either a variable or a date constant.
hour is a numerical value for the hour of the day. If hour is greater than 24, the function will return the appropriate datetime value.
minute is a numerical value for the number of minutes.
second is a numerical value for the number of seconds.
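MDY and DHMS reduce to simple arithmetic once you know the reference point: a SAS date counts days from January 1, 1960, and a SAS datetime counts seconds from midnight of that day. The following Python sketch (not SAS) reproduces the MDY example values and the "hour greater than 24" behavior of DHMS. It assumes four-digit years (the YEARCUTOFF option is not modeled), and the function names are hypothetical.

```python
from datetime import date

# Python model of SAS MDY and DHMS (illustration only).
SAS_EPOCH = date(1960, 1, 1)       # SAS date value 0
SECONDS_PER_DAY = 24 * 60 * 60


def sas_mdy(month: int, day: int, year: int):
    """Return a SAS date, or None where SAS would return a missing value."""
    try:
        return (date(year, month, day) - SAS_EPOCH).days
    except ValueError:  # e.g. month 13 or 31 February
        return None


def sas_dhms(sas_date: int, hour: float, minute: float, second: float) -> float:
    """Return a SAS datetime; hours above 24 simply roll into later days."""
    return sas_date * SECONDS_PER_DAY + hour * 3600 + minute * 60 + second


print(sas_mdy(11, 15, 2003), sas_mdy(10, 21, 1980), sas_mdy(1, 1, 1950))  # 16024 7599 -3652
print(sas_mdy(13, 1, 2003))                        # None
print(sas_dhms(0, 25, 0, 0) == sas_dhms(1, 1, 0, 0))  # True
```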
Function: HMS
Purpose: To create a SAS time value from the hour, minute, and second.
Syntax: HMS(hour, minute, second)
hour is the value corresponding to the number of hours.
minute is the value corresponding to the number of minutes.
second is the value corresponding to the number of seconds.
Examples: For these examples, H = 1, M = 30, and S = 15.
Function          Returns
HMS(H,M,S)        5415 (1:30:15, formatted value)
HMS(0,0,23)       23 (0:00:23, formatted value)
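A SAS time value is just the number of seconds since midnight, which is why HMS(1,30,15) returns 5415. The Python sketch below (not SAS) computes the value and shows it in an H:MM:SS layout; sas_hms is a hypothetical name, and format_time is a hypothetical helper that only approximates how the TIME. format displays nonnegative values.

```python
# Python model of SAS HMS (illustration only): seconds since midnight.

def sas_hms(hour: int, minute: int, second: int) -> int:
    return hour * 3600 + minute * 60 + second


def format_time(seconds: int) -> str:
    """Hypothetical helper approximating the TIME. format display."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


H, M, S = 1, 30, 15
print(sas_hms(H, M, S), format_time(sas_hms(H, M, S)))    # 5415 1:30:15
print(sas_hms(0, 0, 23), format_time(sas_hms(0, 0, 23)))  # 23 0:00:23
```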
Function: DATE and TODAY (equivalent functions)
Purpose: To return the current date.
Syntax: DATE() or TODAY()
Note that the parentheses are needed even though these functions do not take any arguments.
Examples:
Function: DATETIME
Purpose: To return the datetime value for the current date and time.
Syntax: DATETIME()
Examples
Function          Returns
Function: TIME
Purpose: To return the time of day when the program was run.
Syntax: TIME()
Examples
Function
TIME()
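Because DATE(), DATETIME(), and TIME() depend on when the program runs, their example values cannot be fixed in print. This Python sketch (not SAS) shows how the three values relate to the local clock; the function names are hypothetical, and the optional now argument is an addition of the sketch so that a fixed moment gives reproducible results.

```python
from datetime import datetime

# Python model of SAS DATE()/TODAY(), DATETIME(), and TIME() (illustration only).
# Assumption: the local clock is used; pass a fixed "now" for reproducible output.
SAS_EPOCH = datetime(1960, 1, 1)


def sas_today(now=None) -> int:
    now = datetime.now() if now is None else now
    return (now.date() - SAS_EPOCH.date()).days


def sas_datetime(now=None) -> float:
    now = datetime.now() if now is None else now
    return (now - SAS_EPOCH).total_seconds()


def sas_time(now=None) -> float:
    now = datetime.now() if now is None else now
    return now.hour * 3600 + now.minute * 60 + now.second + now.microsecond / 1e6


fixed = datetime(2003, 11, 15, 1, 30, 15)
print(sas_today(fixed), sas_datetime(fixed), sas_time(fixed))  # 16024 1384479015.0 5415.0
```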
Program:
This group of functions takes a SAS date value and returns parts of the date, such as the year, the month, or the day of the week. Since these functions are demonstrated in a single program, let's supply the syntax and examples.
Function: YEAR
Purpose: To extract the year from a SAS date.
Syntax: YEAR(date)
date is a SAS date value.
Examples
Function: QTR
Purpose: To extract the quarter (January to March = 1, April to June = 2, etc.) from a SAS date.
Syntax: QTR(date)
date is a SAS date value.
Examples
Returns 1 4
Function: MONTH
Purpose: To extract the month of the year from a SAS date (1 = January, 2 = February, etc.).
Syntax: MONTH(date)
date is a SAS date value.
Examples
Function                  Returns
MONTH('16AUG2002'd)       8
Function: WEEK
Purpose: To extract the week number of the year from a SAS date (the week-number value is a number from 0 to 53 or 1 to 53, depending on the optional modifier).
Syntax: WEEK([date] [,'modifier'])
date is a SAS date value. If date is omitted, the WEEK function returns the week number of the current date.
modifier is an optional argument that determines how the week-number value is determined. If modifier is omitted, the first Sunday of the year is week 1.
Examples:
Returns 32 0 1 53
Function: WEEKDAY
Purpose: To extract the day of the week from a SAS date (1 = Sunday, 2 = Monday, etc.).
Syntax: WEEKDAY(date)
date is a SAS date value.
Examples
Function: DAY
Purpose: To extract the day of the month from a SAS date, a number from 1 to 31.
Syntax: DAY(date)
date is a SAS date value.
Examples
Function                  Returns
DAY('16AUG2002'd)         16
Program:
Demonstrating the functions YEAR, QTR, MONTH, WEEK, DAY, and WEEKDAY
* Assumes a data set DATES, created earlier, containing the SAS date variables Date1 and Date2;
data date_functions;
   set dates(drop=Date2);
   Year         = year(Date1);
   Quarter      = qtr(Date1);
   Month        = month(Date1);
   Week         = week(Date1);
   Day_of_month = day(Date1);
   Day_of_week  = weekday(Date1);
run;
title 'Listing of Data Set DATE_FUNCTIONS';
proc print data=date_functions noobs;
run;
Explanation: These basic date functions are straightforward. They all take a SAS date as the single argument and return the year, the quarter, the month, the week, the day of the month, or the day of the week. Remember that the WEEKDAY function returns the day of the week, while the DAY function returns the day of the month (it's easy to confuse these two functions).
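To see exactly which numbering conventions these functions use, the following Python sketch (not SAS) models them on a SAS date value: WEEKDAY counts Sunday as 1, and the default WEEK rule starts week 1 on the first Sunday of the year, which is the same rule as Python's strftime("%U"). All function names are hypothetical; the other WEEK modifiers are not modeled.

```python
from datetime import date, timedelta

# Python model of SAS YEAR, QTR, MONTH, WEEK, DAY, and WEEKDAY (illustration only).
SAS_EPOCH = date(1960, 1, 1)


def to_date(sas_date: int) -> date:
    return SAS_EPOCH + timedelta(days=sas_date)


def sas_year(d: int) -> int:
    return to_date(d).year


def sas_qtr(d: int) -> int:
    return (to_date(d).month - 1) // 3 + 1


def sas_month(d: int) -> int:
    return to_date(d).month


def sas_day(d: int) -> int:
    return to_date(d).day


def sas_weekday(d: int) -> int:
    # isoweekday(): Monday=1 ... Sunday=7; SAS WEEKDAY: Sunday=1 ... Saturday=7
    return to_date(d).isoweekday() % 7 + 1


def sas_week(d: int) -> int:
    # Default rule: the first Sunday starts week 1; earlier days are week 0.
    return int(to_date(d).strftime("%U"))


d = (date(2002, 8, 16) - SAS_EPOCH).days  # '16AUG2002'd
print(sas_year(d), sas_qtr(d), sas_month(d), sas_week(d), sas_day(d), sas_weekday(d))
# 2002 3 8 32 16 6
```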
Functions That Extract Hours, Minutes, and Seconds from SAS Datetime and Time Values
The HOUR, MINUTE, and SECOND functions work with SAS datetime or time values in much the same way as the MONTH, YEAR, and WEEKDAY functions work with SAS date values.
Function: HOUR
Purpose: To extract the hour from a SAS datetime or time value.
Syntax: HOUR(time or dt)
time or dt is a SAS time or datetime value.
Examples
For these examples, DT = '02JAN1960:5:10:15'dt and T = '5:8:10't.
Function              Returns
HOUR(DT)              5
HOUR(T)               5
HOUR(HMS(5,8,9))      5
Function: MINUTE
Purpose: To extract the minute value from a SAS datetime or time value.
Syntax: MINUTE(time or dt)
time or dt is a SAS time or datetime value.
Examples
For these examples, DT = '02JAN1960:5:10:15'dt and T = '5:8:10't.
Function              Returns
MINUTE(DT)            10
MINUTE(T)             8
MINUTE(HMS(5,8,9))    8
Function: SECOND
Purpose: To extract the second value from a SAS datetime or time value.
Syntax: SECOND(time or dt)
time or dt is a SAS time or datetime value.
Examples
Function              Returns
SECOND(HMS(5,8,9))    9
Program:
The variable DT is a SAS datetime value (computed as a SAS datetime constant), and T is a SAS time value (computed as a SAS time constant). The program demonstrates that the HOUR, MINUTE, and SECOND functions can take either SAS datetime or time values as arguments.
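The reason one function can accept both a time and a datetime is that both are counts of seconds: taking the remainder modulo one day (or one hour, or one minute) discards everything above the unit of interest. The Python sketch below (not SAS; function names hypothetical) applies that arithmetic to the DT, T, and HMS(5,8,9) values from the examples.

```python
# Python model of SAS HOUR, MINUTE, and SECOND (illustration only).
# The same arithmetic works for SAS time values (seconds since midnight)
# and SAS datetime values (seconds since 01JAN1960:00:00:00).

def sas_hour(value: float) -> int:
    return int(value % 86400 // 3600)


def sas_minute(value: float) -> int:
    return int(value % 3600 // 60)


def sas_second(value: float) -> float:
    return value % 60


DT = 1 * 86400 + 5 * 3600 + 10 * 60 + 15   # '02JAN1960:5:10:15'dt
T = 5 * 3600 + 8 * 60 + 10                 # '5:8:10't
HMS_5_8_9 = 5 * 3600 + 8 * 60 + 9          # HMS(5,8,9)
for v in (DT, T, HMS_5_8_9):
    print(sas_hour(v), sas_minute(v), sas_second(v))
# 5 10 15
# 5 8 10
# 5 8 9
```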
Functions That Extract the Date or Time from SAS Datetime Values
The DATEPART and TIMEPART functions extract either the date or the time from a SAS datetime value (the number of seconds since midnight, January 1, 1960).
Function: DATEPART
Purpose: To compute a SAS date from a SAS datetime value.
Syntax: DATEPART(date-time-value)
date-time-value is a SAS datetime value.
Function: TIMEPART
Purpose: To extract the time part of a SAS datetime value.
Syntax: TIMEPART(date-time-value)
date-time-value is a SAS datetime value.
Program:
data pieces_parts;
   DT   = '01jan1960:5:15:30'dt;
   Date = datepart(DT);
   Time = timepart(DT);
   format DT datetime. Time time. Date date9.;
run;
title 'Listing of Data Set PIECES_PARTS';
proc print data=pieces_parts noobs;
run;
Explanation:
The DATEPART and TIMEPART functions extract the date and the time from the datetime value, respectively. These two functions are especially useful when you import data from other sources.
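Since a SAS datetime counts seconds from the same starting point that SAS dates count days from, DATEPART and TIMEPART amount to a quotient and a remainder by 86,400. This Python sketch (not SAS; names hypothetical) uses floor division so that a datetime just before 1960 falls on the previous day; the negative-value case is a property of the model, shown only for illustration.

```python
# Python model of SAS DATEPART and TIMEPART (illustration only).
SECONDS_PER_DAY = 86400


def sas_datepart(dt: float) -> int:
    # Floor division keeps datetimes before 1960 on the correct (earlier) day.
    return int(dt // SECONDS_PER_DAY)


def sas_timepart(dt: float) -> float:
    return dt % SECONDS_PER_DAY


DT = 5 * 3600 + 15 * 60 + 30  # '01jan1960:5:15:30'dt
print(sas_datepart(DT), sas_timepart(DT))  # 0 18930
print(sas_datepart(-1), sas_timepart(-1))  # -1 86399
```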
Functions That Work with Date, Datetime, and Time Intervals
Functions in this group work with date or time intervals. The INTCK function, when used with date or datetime values, can determine the number of interval boundaries crossed between two dates. When used with SAS time values, it can determine the number of hour, minute, or second boundaries between two time values. The INTNX function, when used with SAS date or datetime values, is used to determine the date after a given number of intervals have passed. When used with SAS time values, it computes the time after a given number of time interval units have passed.
Function: INTCK
Purpose: To return the number of intervals between two dates, two times, or two datetime values. To be more accurate, the INTCK function counts the number of times a boundary has been crossed going from the first value to the second. For example, if the interval is YEAR, the starting date is January 1, 2002, and the ending date is December 31, 2002, the function returns a 0. The reason for this is that the boundary for YEAR is January 1, and even though the starting date is on a boundary, no boundaries are crossed in going from the first date to the second.
Syntax: INTCK('interval[multiple][.shift]', start-value, end-value)
interval can be date units, time units, or datetime units.
multiple is an optional modifier in the interval. You can specify multiples of an interval. For example, MONTH2 specifies two-month intervals; DAY50 specifies 50-day intervals.
.shift is an optional parameter that determines the starting point in an interval. For example, YEAR.4 specifies yearly intervals starting from April 1.
Shift value for SAS date and datetime values:
Interval      Shift Value
YEAR          Month
SEMIYEAR      Month
QTR           Month
MONTH         Month
SEMIMONTH     Semimonth*
TENDAY        Tenday
WEEKDAY       Day
WEEK          Day
DAY           Day
HOUR          Hour*
MINUTE        Minute*
SECOND        Second*
Function: INTNX
Purpose: To return the date after a specified number of intervals have passed.
Syntax: INTNX('interval', start-date, increment [,'alignment'])
interval is one of the same values that are used with the INTCK function (placed in quotation marks).
start-date is a SAS date.
increment is the number of intervals between the start date and the date returned by the function.
alignment is an optional argument and has a value of BEGINNING (B), MIDDLE (M), END (E), or SAMEDAY (S). The default is BEGINNING.
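The boundary-counting rule for INTCK and the alignment rule for INTNX can both be sketched in a few lines of Python (not SAS). This is a partial model under stated assumptions: only the YEAR, QTR, MONTH, and DAY intervals are handled by sas_intck, only MONTH by sas_intnx_month, and multiples, shifts, time intervals, and MIDDLE alignment are not modeled. Clamping SAMEDAY to the last day of a short month is an assumption of the sketch. All names are hypothetical.

```python
import calendar
from datetime import date, timedelta

# Python sketch of INTCK and INTNX for a few date intervals (illustration only).
SAS_EPOCH = date(1960, 1, 1)


def to_date(sas_date: int) -> date:
    return SAS_EPOCH + timedelta(days=sas_date)


def to_sas(d: date) -> int:
    return (d - SAS_EPOCH).days


def sas_intck(interval: str, start: int, end: int) -> int:
    """Count interval boundaries crossed going from start to end."""
    s, e = to_date(start), to_date(end)
    interval = interval.upper()
    if interval == "YEAR":
        return e.year - s.year
    if interval == "QTR":
        return (e.year * 4 + (e.month - 1) // 3) - (s.year * 4 + (s.month - 1) // 3)
    if interval == "MONTH":
        return (e.year * 12 + e.month) - (s.year * 12 + s.month)
    if interval == "DAY":
        return end - start
    raise ValueError(f"interval {interval!r} is not modeled in this sketch")


def sas_intnx_month(start: int, increment: int, alignment: str = "BEGINNING") -> int:
    """INTNX('MONTH', start, increment, alignment) for B, E, and S alignment."""
    s = to_date(start)
    year, month0 = divmod(s.year * 12 + (s.month - 1) + increment, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    kind = alignment.upper()[0]
    if kind == "B":
        day = 1
    elif kind == "E":
        day = last_day
    elif kind == "S":
        day = min(s.day, last_day)  # assumption: clamp to the end of a short month
    else:
        raise ValueError(f"alignment {alignment!r} is not modeled in this sketch")
    return to_sas(date(year, month, day))


jan1, dec31 = to_sas(date(2002, 1, 1)), to_sas(date(2002, 12, 31))
print(sas_intck("YEAR", jan1, dec31))        # 0: no January 1 crossed
print(sas_intck("YEAR", dec31, dec31 + 1))   # 1: one boundary crossed
start = to_sas(date(2003, 1, 31))
for align in ("B", "E", "S"):
    print(align, to_date(sas_intnx_month(start, 1, align)))
# B 2003-02-01
# E 2003-02-28
# S 2003-02-28
```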
Function: YRDIF
Purpose: To return the difference in years between two dates (includes fractional parts of a year).
Syntax: YRDIF(start-date, end-date, 'basis')
start-date is a SAS date value.
end-date is a SAS date value.
basis is an argument that controls how SAS computes the result. The first value is used to specify the number of days in a month; the second value (after the slash) is used to specify the number of days in a year. Commonly used values are '30/360', 'ACT/ACT', 'ACT/360', and 'ACT/365', where ACT means the actual number of days.
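The effect of the basis is easiest to see with a leap year. The following Python sketch (not SAS; yrdif here is a hypothetical function) models three bases: ACT/365 and ACT/360 divide the actual day count by a fixed year length, while ACT/ACT divides the days falling in 366-day years by 366 and the rest by 365. It takes Python dates rather than SAS date numbers, assumes start is not after end, and leaves out the 30/360 basis.

```python
import calendar
from datetime import date

# Python sketch of three YRDIF bases (illustration only; assumes start <= end).

def yrdif(start: date, end: date, basis: str = "ACT/ACT") -> float:
    basis = basis.upper()
    days = (end - start).days
    if basis == "ACT/365":
        return days / 365
    if basis == "ACT/360":
        return days / 360
    if basis == "ACT/ACT":
        # Days falling in a 366-day year are divided by 366, the rest by 365.
        total = 0.0
        current = start
        while current < end:
            chunk_end = min(date(current.year + 1, 1, 1), end)
            year_length = 366 if calendar.isleap(current.year) else 365
            total += (chunk_end - current).days / year_length
            current = chunk_end
        return total
    raise ValueError(f"basis {basis!r} is not modeled in this sketch")


print(yrdif(date(2000, 1, 1), date(2001, 1, 1)))             # 1.0
print(yrdif(date(2000, 1, 1), date(2001, 1, 1), "ACT/365"))  # 366/365, about 1.0027
```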
Program:
Function That Computes Dates of Standard Holidays
Function: HOLIDAY
Purpose: Returns a SAS date, given a holiday name and a year.
Syntax: HOLIDAY(holiday, year)
holiday is a holiday name, such as 'THANKSGIVING' or 'CHRISTMAS'.
year is a numeric variable or constant that represents the year.
Examples:
Functions That Work with Julian Dates
This group of functions involves Julian dates. Julian dates are commonly used in computer applications and represent a date as a two- or four-digit year followed by a three-digit day of the year (1 to 365, or 366 if it is a leap year). For example, January 3, 2003, in Julian notation would be either 2003003 or 03003. December 31, 2003 (a non-leap year), would be either 2003365 or 03365.
Function: DATEJUL
Purpose: To convert a Julian date into a SAS date.
Syntax: DATEJUL(jul-date)
jul-date is a numerical value representing the Julian date in the form yyddd or yyyyddd.
Examples:
For these examples, JDATE = 1960123.
Function              Returns
DATEJUL(1960001)      0 (01JAN1960 formatted)
DATEJUL(2003365)      16070 (31DEC2003 formatted)
DATEJUL(JDATE)        122 (02MAY1960 formatted)
Function: JULDATE
Purpose: To convert a SAS date into a Julian date.
Syntax: JULDATE(date)
date is a SAS date.
Examples: For these examples, DATE = '31DEC2003'd.
Function                  Returns
JULDATE(DATE)             3365
JULDATE('01JAN1960'd)     60001
JULDATE(122)              60123
Function: JULDATE7
Purpose: To convert a SAS date into a seven-digit Julian date.
Syntax: JULDATE7(date)
date is a SAS date.
Examples: For these examples, DATE = '31DEC2003'd.
Function                  Returns
JULDATE7(DATE)            2003365
JULDATE7('01JAN1960'd)    1960001
JULDATE7(122)             1960123
Program:
datalines;
01JAN1960 2003365
15MAY1901 1905001
21OCT1946 5001
;
title 'Listing of Data Set JULIAN';
proc print data=julian noobs;
   var Date Sas_to_jdate Sas_to_jdate7 Jdate Jdate_to_sas;
run;
Explanation: It is important to realize that Julian dates without four-digit years will be converted to SAS dates based on the value of the YEARCUTOFF system option. To avoid any problems, it is best to use seven-digit Julian dates.
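The YEARCUTOFF caution above can be made concrete with a Python sketch (not SAS) of the three Julian conversions. The yearcutoff parameter stands in for the YEARCUTOFF= system option, and its default of 1920 here is an assumption that may not match your SAS session. The sketch treats a value below 1000 years-worth of digits (yyddd) as a two-digit year placed in the 100-year window starting at yearcutoff, and has JULDATE return yyddd only for years inside that window. All function names are hypothetical.

```python
from datetime import date, timedelta

# Python sketch of DATEJUL, JULDATE, and JULDATE7 (illustration only).
# Assumption: yearcutoff stands in for YEARCUTOFF=; 1920 may not match your session.
SAS_EPOCH = date(1960, 1, 1)


def sas_datejul(jul: int, yearcutoff: int = 1920) -> int:
    """Convert yyddd or yyyyddd to a SAS date."""
    year, ddd = divmod(jul, 1000)
    if year < 100:  # two-digit year: place it in the 100-year window
        year += yearcutoff // 100 * 100
        if year < yearcutoff:
            year += 100
    return (date(year, 1, 1) + timedelta(days=ddd - 1) - SAS_EPOCH).days


def sas_juldate7(sas_date: int) -> int:
    """Convert a SAS date to yyyyddd."""
    d = SAS_EPOCH + timedelta(days=sas_date)
    return d.year * 1000 + d.timetuple().tm_yday


def sas_juldate(sas_date: int, yearcutoff: int = 1920) -> int:
    """Return yyddd inside the YEARCUTOFF window, otherwise yyyyddd."""
    jul7 = sas_juldate7(sas_date)
    if yearcutoff <= jul7 // 1000 < yearcutoff + 100:
        return jul7 % 100000
    return jul7


print(sas_datejul(1960001), sas_datejul(2003365), sas_datejul(1960123))  # 0 16070 122
print(sas_juldate(16070), sas_juldate(0), sas_juldate(122))              # 3365 60001 60123
print(sas_juldate7(16070), sas_juldate7(0), sas_juldate7(122))           # 2003365 1960001 1960123
# The same two-digit value 5001 lands in different centuries:
print(sas_datejul(5001) == sas_datejul(2005001))                   # True
print(sas_datejul(5001, yearcutoff=1900) == sas_datejul(1905001))  # True
```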