Correlation analysis in Stata is performed to measure the strength and direction of the association between two variables. Stata is a statistical software package that makes this analysis straightforward. The value of correlation is denoted by r when we interpret it. For example, if the value of correlation is 0.04, we write it as r = 0.04 while interpreting the correlation.
Interpretation of Correlation Values
The value of correlation always ranges from -1 to +1. A positive sign means a positive relationship between the two variables, and a negative sign means a negative relationship. A value of 0 means no linear correlation between the two variables.
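For reference, the coefficient that the correlate command reports is the Pearson product-moment correlation. As a sketch, for n paired observations (x_i, y_i) with sample means \bar{x} and \bar{y} (symbols introduced here only for illustration), it can be written as:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator is positive when the two variables tend to move together and negative when they move in opposite directions, and the denominator rescales the result so that it stays between -1 and +1.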
Correlation Analysis Example in Stata
To illustrate how to do correlation analysis in Stata, I have used the example auto dataset that ships with the Stata software.
The data can be loaded using the following code in Stata:
sysuse auto.dta
To get the correlation table, simply use the following code:
correlate price mpg
In the above table, I have used two variables, price and mpg, to estimate the nature of their relationship. The correlation of mpg with price is -0.4686. First, the negative sign means a negative relationship between these two variables: as mpg increases, price tends to decrease, and vice versa. Second, the magnitude of 0.4686 indicates a medium-strength correlation between these variables.
The value under the price column and price row is 1.000. The correlation between a variable and itself is always equal to 1, which means perfect correlation. The same is true of the value in the mpg row and mpg column.
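If you want to reuse the correlation rather than read it off the table, correlate also stores its results in r(). The following is a minimal sketch, assuming the auto data can be reloaded with the clear option; it displays the stored coefficient and the number of observations used.

```stata
* Load the example data and compute the correlation without printing the table
sysuse auto, clear
quietly correlate price mpg

* correlate leaves the coefficient and the sample size behind in r()
display "r = " %6.4f r(rho)
display "N = " r(N)
```

The displayed coefficient should match the value in the correlation table.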
What is the difference between Pwcorr and Corr in Stata?
Stata has two commands to estimate correlation: correlate (abbreviated corr) and pwcorr. The difference between them is the treatment of missing values in the data. If you have no missing values in your data, you can use either one. If you have missing values, there is still no hard and fast rule for choosing between them.
The corr command deletes missing data listwise, so an observation is dropped if any of the listed variables is missing, while pwcorr deletes missing values pairwise, so each correlation uses every observation available for that pair. The pairwise correlation for the above example is as follows:
pwcorr price mpg
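You can see the result is the same because price and mpg have no missing values. In fact, with only two variables the two commands always agree; they can differ only when three or more variables are listed and some of them have missing values.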
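The two commands can give different results once a variable with missing values is added to the list. As a rough sketch, the auto data contains rep78, which is missing for a few cars, so it can be used to compare the two approaches. The obs and sig options of pwcorr are included here only to show the sample size and p-value behind each coefficient.

```stata
sysuse auto, clear

* rep78 is missing for some cars in the auto data
count if missing(rep78)

* Listwise: every coefficient uses only cars with all three variables observed
correlate price mpg rep78

* Pairwise: each coefficient uses all cars observed on that particular pair
* obs prints the number of observations, sig prints the p-value
pwcorr price mpg rep78, obs sig
```

In the pwcorr table, the price and mpg entry uses the full sample, while the entries involving rep78 use fewer observations. In the correlate table, all entries use the same reduced sample.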
What are the 4 types of correlation?
There are 4 types of correlation:
- Positive correlation
- Negative correlation
- No correlation
- Non-linear correlation
Positive Correlation
Positive correlation means the nature of the relationship between two variables is positive. For example, in the auto dataset, the trunk variable has a positive correlation with price. It means that as the trunk value increases, the price value also tends to increase, and as the trunk value decreases, the price value also tends to decrease.
Negative Correlation
Negative correlation means the nature of the relationship between two variables is negative. For example, in the above table, the mpg variable has a negative correlation with price. It means that as the mpg value increases, the price value tends to decrease, and as the mpg value decreases, the price value tends to increase.
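To see a positive and a negative correlation side by side, trunk can be added to the correlate command. This is a small sketch; the matrix name C is my own choice and is used only to pick single entries out of the stored correlation matrix.

```stata
sysuse auto, clear
correlate price mpg trunk

* Copy the correlation matrix out of r() to look at single entries
matrix C = r(C)
display "price and mpg:   " %7.4f C[2,1]
display "price and trunk: " %7.4f C[3,1]
```

The first value carries a negative sign and the second a positive sign, which matches the two cases described above.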
No Correlation
If the value of correlation is 0, it means that there is no linear correlation between the two variables.
Non-linear Correlation
If two variables are related but the relationship does not follow a straight line, this is called non-linear correlation. For example, the relationship may be positive over one range of values and negative over another. In such cases the correlation value can be close to 0 even though the variables are clearly related.
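A small made-up example can show this. In the sketch below, x and y are hypothetical variables created only for illustration: y is exactly the square of x, so the relationship is negative for negative x and positive for positive x. Note that clear drops whatever data is in memory.

```stata
* Note: clear drops the data currently in memory
clear
set obs 101

* x runs from -50 to 50
generate x = _n - 51

* y depends on x exactly, but not along a straight line
generate y = x^2

correlate x y
```

The correlation comes out as essentially zero, even though y is fully determined by x. This is why a scatter plot is worth checking alongside the correlation table.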
Which method is best for correlation?
There are 2 tools to study correlation. The first is the correlation matrix and the other is the scatter plot. Both lead to the same interpretation of the correlation between two variables. However, the scatter plot is a visual way to understand the correlation quickly. Therefore, it is better to present both to truly understand the correlation between two variables.
How to read correlation on a scatter plot?
The above scatter plot shows the correlation between two variables. To understand the correlation on a scatter plot, it is best practice to also draw a trend line on it, as I have done on the above scatter plot. You can use the following code to get it in Stata:
twoway (scatter price mpg) (lfit price mpg)
The trend line also shows the negative correlation between the price and mpg variables. It slopes downward, which is a sign of negative correlation. If the trend line slopes upward, it shows a positive correlation between the two variables.
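The trend line drawn by lfit is the fitted line from a linear regression of price on mpg, so its direction can also be checked numerically. The following is a minimal sketch using regress; it is one way to confirm the link between the trend line and the correlation, not the only one.

```stata
sysuse auto, clear

* Fit the same straight line that lfit draws
regress price mpg

* A negative slope on mpg matches the downward trend line
display "slope = " _b[mpg]

* With one predictor, R-squared equals the squared correlation
display "sqrt(R-squared) = " %6.4f sqrt(e(r2))
```

The square root of R-squared gives the size of the correlation, and the sign of the slope gives its direction.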
In the following section, I have answered the questions that my students ask me most often.