In this post, I explain a step-by-step approach to converting string variables to numeric variables in Stata. Stata needs numeric data to perform statistical analysis. However, we sometimes receive data that contains non-numeric characters, or numbers that are stored as text. This guide explains how to deal with such situations.

There are three broad scenarios you may face with your data in Stata:

  1. Your data is numeric, but it is stored as a string type in Stata
  2. Your data is numeric, but some cells contain non-numeric characters
  3. Your data consists entirely of text (string) values

Example Data Description

all string variables in Stata | browser view mode

I have generated the above data to keep things simple. You may run into similar string-variable issues in your own data. In Stata's Data Browser, string variables are displayed in red, while numeric variables are displayed in black. In the data above, the id variable contains only digits, yet it is red. This means that its storage type is string.

The gender and country variables contain text values rather than numbers. The income variable is numeric except for one cell containing the character "X". The math variable contains only numbers but is still stored as a string. The physics variable is also numeric, but it has one empty cell and one cell containing ".". So how do we convert this data into a form that is usable for analysis?
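If you want to reproduce these problems yourself, the sketch below builds a small dataset with the same kinds of issues. The values are hypothetical (the original data appear only in the screenshot), and every variable is deliberately read in as a string.

```stata
* Hypothetical example data (values invented for illustration);
* every variable is read in as a string (str#)
clear
input str2 id str6 gender str5 country str4 income str3 math str3 physics
"1" "Male"   "USA"   "2500" "78" "65"
"2" "Female" "UK"    "3100" "85" ""
"3" "Male"   "China" "X"    "90" "."
"4" "Female" "USA"   "4200" "72" "70"
end

* Every variable should show a str# storage type
describe
```

The remaining examples in this post each build their own small dataset, so they can be run independently.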

We can also check the type of each variable using the following command:

describe
describe command output in Stata for string variables

Look at the storage type column next to each variable: all variables are stored as strings.
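In a larger dataset, scanning the describe output can be tedious. A minimal sketch, using the built-in ds command with its has(type string) option, lists only the string variables. The data and the numeric variable score are hypothetical.

```stata
* Hypothetical data: id and math are strings, score is numeric
clear
input str2 id str3 math score
"1" "78" 65
"2" "85" 70
end

* List only the variables stored as strings; ds saves them in r(varlist)
ds, has(type string)
display "String variables: `r(varlist)'"
```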

There are two methods to change string variables to numeric variables in Stata: the destring command and the real() function. We first explain the destring method and, at the end, the real() function.

Solution: Data is numeric but coded as string type

Your data may be in numeric form but still appear in red when you browse it. You can also inspect the data directly in the Stata Results window using the list command:

list in 1/10
list command in Stata showing string variables

This shows the first 10 rows in the Results window. We can see that the id and math variables contain only numbers, but they are still stored as strings. To convert them to numeric form, use the following command:

destring, replace
destring replace command output in Stata

The above message appears in the Stata Results window and reports the result for each variable. Three variables show the status "all characters numeric; replaced as byte", which means they were successfully converted to numeric form.
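The sketch below reproduces this step on a small hypothetical dataset. It also shows that destring treats empty strings and "." as missing values, which is why a variable like physics converts cleanly, while a text variable like gender is reported as "contains nonnumeric characters; no replace" and left unchanged.

```stata
* Hypothetical data: id and math hold only digits; physics has one empty
* cell and one "."; gender holds text
clear
input str2 id str3 math str3 physics str6 gender
"1" "78" "65" "Male"
"2" "85" ""   "Female"
"3" "90" "."  "Male"
end

* Without a varlist, destring tries every variable in the dataset
destring, replace

* id, math, and physics are now numeric; gender is still a string
describe
```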

Note: destring requires one of two options, replace or generate(). With generate(), destring creates new numeric variables and leaves the original string variables unchanged. With replace, destring overwrites the existing variables with their numeric versions.
Variables after destring command in Stata - browser view mode

We can also confirm in the Data Browser that these three variables are now displayed in black.
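If you prefer to keep the original strings, the generate() option creates new variables instead. In this sketch the new names id_num and math_num are hypothetical; generate() expects one new name per variable, in the same order as the varlist.

```stata
* Hypothetical data
clear
input str2 id str3 math
"1" "78"
"2" "85"
end

* Create numeric copies; id_num and math_num are illustrative names
destring id math, generate(id_num math_num)

* The original strings are kept alongside the new numeric copies
describe
```

Keeping the originals makes it easy to compare the converted values against the raw text before dropping anything.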


Solution: contains nonnumeric characters; no replace

We often encounter the message "contains nonnumeric characters; no replace". It means destring found values it cannot read as numbers, so it left the variable unchanged. When a variable holds text categories, we use the encode command instead. In our dataset, gender and country are still string variables. (income is also still a string, but it needs a different method, which is explained in a later section.)

encode in Stata

To convert the gender and country string variables into numeric variables, run the following commands one by one:

encode gender, generate(gender2)
encode country, generate(country2)
encode command output in Stata | browser view

This generates two new variables, gender2 and country2. As we can see above, their values appear in blue. The encode command assigns a numeric code to each distinct string value and attaches the original text as a value label, so the Data Browser shows the text while the stored values are numbers.
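To see exactly which code each category received, you can list the value label that encode creates. In this sketch on hypothetical data, the codes follow the alphabetical order of the strings, so "Female" becomes 1 and "Male" becomes 2. By default, the value label takes the name of the new variable.

```stata
* Hypothetical data
clear
input str6 gender
"Male"
"Female"
"Male"
end

encode gender, generate(gender2)

* Show the code-to-text mapping stored in the value label gender2
label list gender2
```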

We can confirm this by adding the nolabel option to the list command:

list in 1/10, nolabel
list command in Stata after encode command

The nolabel option hides the value labels and shows only the numeric codes. Note also that encode requires the generate() option to create new variables. Therefore, the original string variables still appear in the table above.
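If you want the encoded variables to take over the original names, one common pattern is to encode into a temporary variable, drop the string, and rename. The sketch below does this in a loop on hypothetical data; the `_tmp` suffix is an illustrative choice, and the label() option names each value label after the original variable.

```stata
* Hypothetical data
clear
input str6 gender str5 country
"Male"   "USA"
"Female" "UK"
"Male"   "China"
end

foreach v of varlist gender country {
    * Encode into a temporary name (the _tmp suffix is illustrative);
    * label() names the value label after the original variable
    encode `v', generate(`v'_tmp) label(`v')
    * Swap the numeric copy in for the original string variable
    drop `v'
    rename `v'_tmp `v'
}

describe
```

Check the results with list before saving, because the string originals are gone once they are dropped.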

Remove characters from string

This is the last scenario you may encounter in Stata: the data is numeric, but some values contain non-numeric characters. In the picture below, income is the only variable still stored as a string; the rest have been converted or encoded. income was not converted because of the "X" it contains. So how do we remove it?

character in numeric series in Stata | browser view

We again use the destring command, this time with the ignore() option, to remove this character:

destring income, replace ignore("X")
character removed in numeric series in Stata | browser view

Now the "X" has been replaced by a missing value, and the whole variable has been converted to numeric form. The ignore() option removes the listed character from every value; the cell that contained only "X" becomes empty and is therefore converted to missing. In this way, we can remove unwanted characters from a string variable and convert it to numeric form.
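Before converting, it can help to list the offending values first. The sketch below, on hypothetical data, uses real() to flag values that cannot be read as numbers, then compares ignore() with destring's force option, which converts any non-numeric value to missing. The new names income1 and income2 are illustrative.

```stata
* Hypothetical data
clear
input str4 income
"2500"
"3100"
"X"
"4200"
end

* List values that cannot be read as numbers (empty cells excluded)
list income if missing(real(income)) & income != ""

* Option 1: strip the character X from every value, then convert
destring income, generate(income1) ignore("X")

* Option 2: convert any value that is not a number to missing
destring income, generate(income2) force
```

Here both options give the same result, but they differ in general: ignore() removes the character wherever it appears, so a value such as "25X0" would silently become 250, whereas force would turn it into a missing value.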

real() function to change string variable to numeric

We can also use the real() function to convert a string variable to a numeric variable. However, this option is less popular among researchers and data analysts, because real() must be applied to each variable separately, whereas destring can convert all string variables in a single command. Further, real() only works for string variables that hold numbers: any value it cannot read as a number becomes missing. To convert a string variable to a numeric variable with real(), start from the original data, in which id is still a string, and use the following command:

generate id_new = real(id)
real() command output in Stata
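Because real() works on one variable at a time, a foreach loop is a common way to apply it to several variables. This sketch, on hypothetical data, creates a numeric copy of each variable, drops the string original, and renames the copy; the `_num` suffix is an illustrative choice.

```stata
* Hypothetical data
clear
input str2 id str3 math
"1" "78"
"2" "85"
end

foreach v of varlist id math {
    * real() returns missing for any string it cannot read as a number
    generate double `v'_num = real(`v')
    * Replace the string original with its numeric copy
    drop `v'
    rename `v'_num `v'
}

describe
```

Storing the result as double avoids the precision loss that the default float type can cause for large or long numbers.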

Conclusion

There are three types of problems we encounter in Stata related to string variables. First, the data is numeric, but the variable is stored as a string. Second, the data is stored as a string because it genuinely contains text rather than numbers. Third, the data is a mix of numbers and characters. The destring command is the best tool for the first and third cases, and the encode command handles the second. We can also use the real() function; however, it is less popular because destring is easier to use.
