0% found this document useful (0 votes)
9 views10 pages

(Data & Variable Management) Converting Many String Variables To Numeric Variables

Uploaded by

ipz
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views10 pages

(Data & Variable Management) Converting Many String Variables To Numeric Variables

Uploaded by

ipz
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 10

10/14/2020 How can I quickly convert many string variables to numeric variables?

| Stata FAQ

NUMERIC VARIABLES? | STATA FAQ


There may be times that you receive a file that has many (or all) of the variables defined as strings, that
is, character variables. The variables may contain numeric values, but if they are defined as type string,
there are very few things you can do to analyze the data. You cannot get means, you cannot do a
regression, you cannot do an ANOVA, etc… Sometimes the dataset contains numerical values that are
stored as strings. We will address this scenario first. Then we will address the case where the string
variables actually contain strings, and the goal is to assign each value the string takes on to a numeric
value.

All of the examples on this page use the same dataset, so let’s start by examining the data. The example
dataset, hsbs, is a subset of the High School and Beyond data file with all of the variables as string
variables.

use https://stats.idre.ucla.edu/stat/stata/faq/hsbs, clear

As you see from the describe command below, the variables are all defined as string variables (e.g.,
science is str2, a string of length 2).

describe

Contains data from hsbs.dta


b 20

https://stats.idre.ucla.edu/stata/faq/how-can-i-quickly-convert-many-string-variables-to-numericvariables/#:~:text=One method of converting num… 1/10


10/14/2020 How can I quickly convert many string variables to numeric variables? | Stata FAQ

obs: 20
vars: 6 25 Nov 1999 19:09
size: 400 (99.6% of memory free)
-------------------------------------------------------------------------------
1. id str3 %3s
2. gender str1 %5s
3. race str5 %9s
4. schtyp str3 %5s
5. read str2 %5s
6. science str2 %5s
-------------------------------------------------------------------------------
Sorted by:

Now that we know the variables are string variables, we can use the list command to see what the
strings stored in these variables look like. Although the variable science is defined as str2, you can see
from the list below that it contains just numeric values. Even so, because the variable is defined as str2,
Stata cannot perform any kind of numerical analysis of the variable science. The same is true for the
variable read.

list

id gender race schtyp read science


1 70 1 b 45 47

https://stats.idre.ucla.edu/stata/faq/how-can-i-quickly-convert-many-string-variables-to-numericvariables/#:~:text=One method of converting num… 2/10


10/14/2020 How can I quickly convert many string variables to numeric variables? | Stata FAQ

1. 70 m 1 pub 45 47
2. 121 f 1 pub 68 63
3. 86 m 1 pub 44 58
4. 141 m 1 pub 63 53
5. 172 m 1 pub 47 53
6. 113 m 1 pub 44 63
7. 50 m 3 pub 50 53
8. 11 m 2 pub 34 39
9. 84 m 1 pub 63 .
10. 48 m 3 pub 57 50
11. 75 m 1 pub 60 53
12. 60 m X pub 57 63
13. 95 m 1 pub 73 61
14. 104 m 1 pub 54 55
15. 38 m 3 pub 45 31
16. 115 m 1 pub 42 50
17. 76 m 1 pub 47 50
18. 195 m 1 pri 57
19. 114 m 1 pub 68 55
20. 85 m 1 pub 55 53

Converting string variables with numeric values


One method of converting numbers stored as strings into numerical variables is to use a string function
called real that translates numeric values stored as strings into numeric values Stata can recognize as
such. The first line of syntax reads in the dataset shown above. The second generates a new variable
read_n that is equal to the value of the number stored in the string variable read. The real(s) is the
function that translates the values held as strings, where s is the variable containing strings.

gen read_n = real(read)


list read read_n in 1/5

https://stats.idre.ucla.edu/stata/faq/how-can-i-quickly-convert-many-string-variables-to-numericvariables/#:~:text=One method of converting num… 3/10


10/14/2020 How can I quickly convert many string variables to numeric variables? | Stata FAQ

+---------------+
| read read_n |
|---------------|
1. | 45 45 |
2. | 68 68 |
3. | 44 44 |
4. | 63 63 |
5. | 47 47 |
+---------------+

A second method of achieving the same result is the command destring. Let’s try using the destring
command and see how it works. The first line of syntax loads the dataset again, so that we are starting
with a dataset containing only string variables again. The second line of syntax runs the destring
command.

use https://stats.idre.ucla.edu/stat/stata/faq/hsbs, clear


destring , replace

id has all characters numeric; replaced as int


gender contains non-numeric characters; no replace
race contains non-numeric characters; no replace
schtyp contains non-numeric characters; no replace
read has all characters numeric; replaced as byte
science has all characters numeric; replaced as byte
(2 missing values generated)

As you can see from the describe command below, the destring command converted all of the variables
to numeric, except for race, gender and schtyp. Since these variables had characters in them, the
destring command left such variables alone. If there had been any numeric variables in the dataset, they
would remain unchanged.

describe

Contains data from https://stats.idre.ucla.edu/stat/stata/faq/hsbs.dta


b 20

https://stats.idre.ucla.edu/stata/faq/how-can-i-quickly-convert-many-string-variables-to-numericvariables/#:~:text=One method of converting num… 4/10


10/14/2020 How can I quickly convert many string variables to numeric variables? | Stata FAQ

obs: 20
vars: 6 25 Nov 1999 19:09
size: 340 (98.5% of memory free)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
id int %10.0g
gender str1 %5s
race str5 %9s
schtyp str3 %5s
read byte %10.0g
science byte %10.0g
-------------------------------------------------------------------------------
Sorted by:
Note: dataset has changed since last saved

Both of the techniques described above have attributes that in some situations are advantages and in
other situations may be disadvantages. The command destring can be run on an entire dataset in one
step, the method using the function “real” requires issuing a command for each variable to be converted
(although this can be done with a loop rather than typing out the syntax for each variable). One potential
advantage to using the function “real” (the first method) is that if the function “real” encounters a non-
numeric value, it sets the variable equal to missing in that case and moves on. To some extent destring
can be made to behave similarly, but not identically. In order to convert a string variable containing any
non-numeric value using destring one must list the characters that should be ignored (e.g. “,” or “.”).
Additionally, rather than setting values for those cases containing non-numeric values to missing (what
the function “real” does), destring removes the specified non-numeric characters. destring will extract
the specified strings and then convert, meaning that “a4” can be converted to “4”. destring‘s behavior is
very good if one has numeric values stored as strings that occasionally contain things like commas (e.g.
4,354), but there may be situations where this behavior is undesirable.

Converting string variables with non-numeric values into numeric values


How do we convert gender and schtyp into numeric values? We can use the encode command as
shown below. These commands create gender2 and schtyp2.

encode gender, generate(gender2)


encode schtyp, generate(schtyp2)

https://stats.idre.ucla.edu/stata/faq/how-can-i-quickly-convert-many-string-variables-to-numericvariables/#:~:text=One method of converting num… 5/10


10/14/2020 How can I quickly convert many string variables to numeric variables? | Stata FAQ

Notice in the describe command below that gender2 and schtyp2 are numeric variables and they have
labels associated with them (called gender2 and schtyp2).

describe

Contains data from https://stats.idre.ucla.edu/stat/stata/faq/hsbs.dta


obs: 20
vars: 8 25 Nov 1999 19:09
size: 560 (99.8% of memory free)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
id str3 %3s
gender str1 %5s
race str5 %9s
schtyp str3 %5s
read str2 %5s
science str2 %5s
gender2 long %8.0g gender2
schtyp2 long %8.0g schtyp2
-------------------------------------------------------------------------------
Sorted by:
Note: dataset has changed since last saved

If we list out the data, it appears that gender2 and schtyp2 are identical to gender and schtyp, however
they are really numeric and what you are seeing are the value labels associated with the variables.

list

id gen~r race sch~p read sci~e gender2 schtyp2


1 70 1 b 45 47 b

https://stats.idre.ucla.edu/stata/faq/how-can-i-quickly-convert-many-string-variables-to-numericvariables/#:~:text=One method of converting num… 6/10


10/14/2020 How can I quickly convert many string variables to numeric variables? | Stata FAQ

1. 70 m 1 pub 45 47 m pub
2. 121 f 1 pub 68 63 f pub
3. 86 m 1 pub 44 58 m pub
4. 141 m 1 pub 63 53 m pub
5. 172 m 1 pub 47 53 m pub
6. 113 m 1 pub 44 63 m pub
7. 50 m 3 pub 50 53 m pub
8. 11 m 2 pub 34 39 m pub
9. 84 m 1 pub 63 . m pub
10. 48 m 3 pub 57 50 m pub
11. 75 m 1 pub 60 53 m pub
12. 60 m X pub 57 63 m pub
13. 95 m 1 pub 73 61 m pub
14. 104 m 1 pub 54 55 m pub
15. 38 m 3 pub 45 31 m pub
16. 115 m 1 pub 42 50 m pub
17. 76 m 1 pub 47 50 m pub
18. 195 m 1 pri 57 m pri
19. 114 m 1 pub 68 55 m pub
20. 85 m 1 pub 55 53 m pub

Below we use the nolabel option and you see that gender2 and schtyp2 are really numeric.

list, nolabel

id gen~r race sch~p read sci~e gender2 schtyp2


1 70 1 b 45 47 2 2

https://stats.idre.ucla.edu/stata/faq/how-can-i-quickly-convert-many-string-variables-to-numericvariables/#:~:text=One method of converting num… 7/10


10/14/2020 How can I quickly convert many string variables to numeric variables? | Stata FAQ

1. 70 m 1 pub 45 47 2 2
2. 121 f 1 pub 68 63 1 2
3. 86 m 1 pub 44 58 2 2
4. 141 m 1 pub 63 53 2 2
5. 172 m 1 pub 47 53 2 2
6. 113 m 1 pub 44 63 2 2
7. 50 m 3 pub 50 53 2 2
8. 11 m 2 pub 34 39 2 2
9. 84 m 1 pub 63 . 2 2
10. 48 m 3 pub 57 50 2 2
11. 75 m 1 pub 60 53 2 2
12. 60 m X pub 57 63 2 2
13. 95 m 1 pub 73 61 2 2
14. 104 m 1 pub 54 55 2 2
15. 38 m 3 pub 45 31 2 2
16. 115 m 1 pub 42 50 2 2
17. 76 m 1 pub 47 50 2 2
18. 195 m 1 pri 57 2 1
19. 114 m 1 pub 68 55 2 2
20. 85 m 1 pub 55 53 2 2

What about the variable race? It is still a character variable because our prior destring command saw the
X in the data and did not attempt to convert it because it had non-numeric values. Below we can convert
it to numeric by include the ignore(X) option that tells destring to convert the variable to numeric and
when it encounters X to convert that to a missing value. You can see the results in the list command
below.

destring race, replace ignore(X)


race: characters X removed; replaced as byte
(1 missing value generated)

https://stats.idre.ucla.edu/stata/faq/how-can-i-quickly-convert-many-string-variables-to-numericvariables/#:~:text=One method of converting num… 8/10


10/14/2020 How can I quickly convert many string variables to numeric variables? | Stata FAQ

list

id gen~r race sch~p read sci~e gender2 schtyp2


1. 70 m 1 pub 45 47 m pub
2. 121 f 1 pub 68 63 f pub
3. 86 m 1 pub 44 58 m pub
4. 141 m 1 pub 63 53 m pub
5. 172 m 1 pub 47 53 m pub
6. 113 m 1 pub 44 63 m pub
7. 50 m 3 pub 50 53 m pub
8. 11 m 2 pub 34 39 m pub
9. 84 m 1 pub 63 . m pub
10. 48 m 3 pub 57 50 m pub
11. 75 m 1 pub 60 53 m pub
12. 60 m . pub 57 63 m pub
13. 95 m 1 pub 73 61 m pub
14. 104 m 1 pub 54 55 m pub
15. 38 m 3 pub 45 31 m pub
16. 115 m 1 pub 42 50 m pub
17. 76 m 1 pub 47 50 m pub
18. 195 m 1 pri 57 m pri
19. 114 m 1 pub 68 55 m pub
20. 85 m 1 pub 55 53 m pub

As you have seen, we can use destring to convert string variables that contain numbers into numeric
variables, and it can handle situations where some values are stored as a character (like the X we saw
with race). If you have a character variable that is stored as all characters, you can use encode to
convert the character variable to numeric and it will create value labels that have the values that were
stored with the character variable.

Fore more information, see the help or reference manual about the destring
(http://www.stata.com/help.cgi?destring) and encode (http://www.stata.com/help.cgi?encode)
commands.

Click here to report an error on this page or leave a comment

How to cite this page (https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-how-do-i-cite-web-pages-


and-programs-from-the-ucla-statistical-consulting-group/)

https://stats.idre.ucla.edu/stata/faq/how-can-i-quickly-convert-many-string-variables-to-numericvariables/#:~:text=One method of converting num… 9/10


10/14/2020 How can I quickly convert many string variables to numeric variables? | Stata FAQ

© 2020 UC REGENTS (http://www.ucla.edu/terms-of-use/) HOME (/) CONTACT (/contact)

https://stats.idre.ucla.edu/stata/faq/how-can-i-quickly-convert-many-string-variables-to-numericvariables/#:~:text=One method of converting nu… 10/10

You might also like