Data reshaping in Stata is a common task in data analysis and is often used to prepare data for further analysis or to make it easier to interpret. When we download data from premium databases such as Thomson Reuters Datastream, the data usually does not arrive in an analysis-ready shape. A common example of this type of data is the following:
As you can see above, there are essentially only two variables: id and income. However, the income variable is available for three years, 2020 to 2022, and each year is stored in its own column. This is called the wide format. This layout is often inconvenient for analysis, so we need to reshape the data to long format, where income appears in a single column and a new variable is generated for year. This makes the analysis easier to perform in Stata.
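To make the wide layout concrete, the data described above could be typed into Stata roughly as follows. This is only a sketch: the variable names come from the example, but the three ids and all income values are invented for illustration.

```stata
* Build a small wide-format dataset by hand (values are made up)
clear
input id income2020 income2021 income2022
1 50000 52000 54000
2 61000 60500 63000
3 45000 47000 49500
end

* One row per id, one income column per year
list, noobs
```

Each id occupies a single row, and the year is encoded only in the variable names.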
Types of Reshaping
There are two main types of data reshaping:
- Wide-to-long Reshaping data
- Long-to-Wide Reshaping data
Wide-to-long Reshaping data
Wide-to-long reshaping converts data from wide format into long format. Wide format data typically has many columns, with a separate column for each time period (or other repeated measurement) of a variable. Long format data, on the other hand, has fewer columns but more rows, with each row representing one observation of a unit in a single period. The goal of wide-to-long reshaping is to store each variable in a single column so that the data is easier to analyze. The example data above is in wide format.
Long-to-Wide Reshaping data
Long-to-wide reshaping is the opposite of wide-to-long reshaping and converts data from long format into wide format. Long format data typically has fewer columns but more rows, with each row representing one observation of a unit in a single period. Wide format data, on the other hand, has many columns, with a separate column for each period of a variable. The goal of long-to-wide reshaping is to place each unit on a single row, which can make the data easier to read, and it creates one new variable per period.
Converting wide data to long form
Stata provides a command called reshape that can be used to reshape data. The basic syntax of the command is as follows:
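reshape {long|wide} stubnames, i(varlist) [j(varname) options]
The long keyword converts wide data to long format, and the wide keyword converts long data to wide format. Here stubnames are the common prefixes of the wide variables (such as income), i() names the variable or variables that identify each unit, and j() names the variable that holds the suffix (such as the year).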
For example, suppose you have a dataset in wide format, as in the example above, with variables id, income2020, income2021, and income2022. These variables hold income for each year from 2020 to 2022. To convert this data to long format, you would use the following code:
reshape long income, i(id) j(year)
This replaces income2020, income2021, and income2022 with two variables: income, which holds the income value, and year, which holds the time period taken from the suffix of the original variable names. Each id now appears once per year, so the dataset has one row for each combination of id and year.
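The conversion can be tried end to end with a short sketch like the one below. The income values and the three ids are invented for illustration; only the variable names and the reshape command come from the example above.

```stata
* Hypothetical wide data: 3 ids, income for 2020-2022
clear
input id income2020 income2021 income2022
1 50000 52000 54000
2 61000 60500 63000
3 45000 47000 49500
end

* Stack the three income columns; the suffix becomes the new variable year
reshape long income, i(id) j(year)

* 3 ids x 3 years should give 9 rows
assert _N == 9
list, sepby(id) noobs
```

After the reshape, the dataset contains only id, year, and income, sorted by id and year.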
Converting long data to wide form
Now the data is in long form. To convert it back to wide form, you would use the following code:
reshape wide income, i(id) j(year)
This would create the variables income2020, income2021, and income2022 again, where the suffix of each variable name is the corresponding value of year, as shown below.
Wide format data is less commonly used for data analysis. Analysts mostly work with long form data.
It’s important to note that id and year inside i() and j() are just example variable names; you should replace them with the appropriate variable names in your dataset. In reshape long, j() names the new variable to be created, while in reshape wide it must be a variable that already exists in the data.
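The reverse direction can be sketched in the same way, this time starting from data that is already long. As before, the ids and income values are made up for illustration.

```stata
* Hypothetical long data: one row per id-year pair
clear
input id year income
1 2020 50000
1 2021 52000
1 2022 54000
2 2020 61000
2 2021 60500
2 2022 63000
end

* Spread income across columns, one per value of year
reshape wide income, i(id) j(year)

* 2 ids should give 2 rows, with income2020-income2022 as columns
assert _N == 2
describe
list, noobs
```

Note that reshape wide requires each combination of id and year to appear at most once in the long data; otherwise Stata stops with an error.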
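As an illustration of substituting your own names, the sketch below uses a hypothetical firm-level dataset. The names firm_id, fyear, sales, and profit, and all values, are assumptions made for this example and do not come from the data above. It also shows that more than one stub can be reshaped in a single call.

```stata
* Hypothetical wide data with two stubs: sales and profit
clear
input firm_id sales2019 sales2020 profit2019 profit2020
101 900 950 120 130
102 400 380  35  20
end

* firm_id identifies the unit; fyear is created from the numeric suffix
reshape long sales profit, i(firm_id) j(fyear)

list, sepby(firm_id) noobs
```

Here i() and j() play the same roles as id and year did earlier; only the names differ.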