Correlation analysis measures the strength and direction of the linear association between two variables. Stata is one of the statistical packages that can perform correlation analysis. A correlation value is denoted by r. For example, a correlation of 0.04 is reported as r = 0.04.

Interpretation of Correlation Values

The value of a correlation always lies between -1 and +1. A positive sign indicates a positive relationship between the two variables, and a negative sign indicates a negative relationship. A value of +1 or -1 means a perfect positive or negative correlation, and a value of 0 means no correlation between the two variables.
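For reference, the r reported by Stata's correlate command is Pearson's product-moment correlation. For n paired observations (x_i, y_i) with sample means x̄ and ȳ (this notation is introduced here for illustration and does not appear in the Stata output), it is:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```

By the Cauchy-Schwarz inequality, the numerator can never exceed the denominator in absolute value, which is why r always stays between -1 and +1.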

Correlation Analysis Example in Stata

To illustrate how to do correlation analysis in Stata, I use the auto example dataset that ships with Stata.

The data can be loaded with the following command in Stata:

sysuse auto.dta

To get the correlation table, simply use the following code:

correlate price mpg
[Figure: Correlation Matrix in Stata]

In the table above, I used two variables, price and mpg (miles per gallon), to estimate the nature of their relationship. The correlation between mpg and price is -0.4686. The negative sign indicates a negative relationship: as mpg increases, price tends to decrease, and vice versa. The magnitude, 0.4686, indicates a medium correlation between these variables.
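To see where -0.4686 comes from, the sketch below recomputes r step by step from the Pearson formula and compares the result with correlate. The helper variable names (dxy, dxx, dyy), the macro names and the scalar r_manual are my own choices for illustration; both display lines should agree with the value in the table above.

```stata
* Recompute r for price and mpg directly from the Pearson formula
sysuse auto, clear

* Means of each variable
quietly summarize price
local xbar = r(mean)
quietly summarize mpg
local ybar = r(mean)

* Cross-products and squared deviations from the means
generate double dxy = (price - `xbar') * (mpg - `ybar')
generate double dxx = (price - `xbar')^2
generate double dyy = (mpg - `ybar')^2

* Sum each column (summarize stores the total in r(sum))
quietly summarize dxy
local sxy = r(sum)
quietly summarize dxx
local sxx = r(sum)
quietly summarize dyy
local syy = r(sum)

* r = Sxy / sqrt(Sxx * Syy)
scalar r_manual = `sxy' / sqrt(`sxx' * `syy')
display "manual r    = " %7.4f r_manual

* Compare with Stata's own estimate
quietly correlate price mpg
display "correlate r = " %7.4f r(rho)
```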

The value in the price row and price column is 1.0000 because a variable is always perfectly correlated with itself. The same holds for the value in the mpg row and mpg column.
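The full matrix behind the table is stored by correlate in r(C), so the diagonal of 1s can be checked directly. This is a minimal sketch; the matrix name C is my own choice.

```stata
sysuse auto, clear
quietly correlate price mpg

* r(C) holds the full correlation matrix; copy it so it can be indexed
matrix C = r(C)
matrix list C

* Diagonal elements: each variable with itself
display "price with price: " C[1,1]
display "mpg with mpg:     " C[2,2]

* Off-diagonal elements: the same coefficient appears in both positions
display "price with mpg:   " C[1,2] "   mpg with price: " C[2,1]
```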

What is the difference between pwcorr and corr in Stata?

Stata has two commands for estimating correlation: correlate (which can be abbreviated corr) and pwcorr. The difference between them is how they treat missing values. If your data have no missing values, either command gives the same result. If your data do have missing values, there is no hard-and-fast rule for choosing one over the other.

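The corr command uses listwise deletion: an observation with a missing value in any of the listed variables is dropped from every coefficient. The pwcorr command uses pairwise deletion: each coefficient uses all observations that are non-missing for that particular pair of variables. The pairwise correlation for the example above is: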

pwcorr price mpg
[Figure: Pairwise Correlation Matrix in Stata]

The result is the same because neither price nor mpg has missing values.
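The difference shows up once a variable with missing values is included. In auto.dta, rep78 (repair record) is missing for some cars, so correlate drops those cars from every coefficient and should report 69 observations, while pwcorr still uses all 74 cars for the price and mpg pair. As a result, the price-mpg coefficient can differ between the two outputs. The macro name n_listwise is my own.

```stata
sysuse auto, clear

* rep78 has missing values in auto.dta; price and mpg do not
misstable summarize price mpg rep78

* correlate: listwise deletion, so every coefficient uses the same observations
correlate price mpg rep78
local n_listwise = r(N)
display "correlate used " `n_listwise' " observations for all pairs"

* pwcorr: pairwise deletion; obs prints the number of observations for each pair
pwcorr price mpg rep78, obs
display "pwcorr used " r(N) " observations for price and mpg"
```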

What are the 4 types of correlation?

There are 4 types of correlation:

  • Positive correlation
  • Negative correlation
  • No correlation
  • Non-linear correlation

Positive Correlation

[Figure: Positive Correlation Value in Stata]

Positive correlation means the two variables move in the same direction. For example, in the table above, trunk has a positive correlation with price. This means that as trunk increases, price tends to increase, and as trunk decreases, price tends to decrease.

Negative Correlation

[Figure: Negative Correlation Value in Stata]

Negative correlation means the two variables move in opposite directions. For example, in the table above, mpg has a negative correlation with price. This means that as mpg increases, price tends to decrease, and as mpg decreases, price tends to increase.
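The sign of each coefficient can also be read off programmatically. The sketch below loops over trunk and mpg, stores each correlation with price in a local (r_trunk, r_mpg; names of my own choosing) and labels it positive or negative. Note that cond() labels an exact 0 as "negative" here, which is a simplification of this sketch.

```stata
sysuse auto, clear
foreach v of varlist trunk mpg {
    quietly correlate price `v'
    * Keep each coefficient in a local named r_trunk, r_mpg
    local r_`v' = r(rho)
    * cond() picks "positive" when r > 0 and "negative" otherwise
    local direction = cond(r(rho) > 0, "positive", "negative")
    display "price vs `v': r = " %7.4f r(rho) "  (`direction')"
}
```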

No Correlation

If the value of the correlation is 0, there is no correlation (no linear relationship) between the two variables.
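One way to see what "no correlation" looks like in practice is to simulate two variables that are unrelated by construction. The sample size (1,000) and the seed (12345) are arbitrary choices for this sketch. Because of sampling noise, r should come out close to 0 but will usually not be exactly 0.

```stata
clear
set obs 1000
set seed 12345

* Two variables drawn independently of each other
generate double x = rnormal()
generate double y = rnormal()

correlate x y
```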

Non-linear Correlation

If two variables are related but the relationship does not follow a straight line, the correlation is non-linear. For example, the relationship may be positive over one range of values and negative over another.
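Because Pearson's r only captures straight-line relationships, a non-linear relationship can produce an r near 0. In this made-up example, y = x² for x from -10 to 10: y falls as x rises on the left half and rises with x on the right half, which is the "positive at some points, negative at others" pattern described above. Since the pattern is symmetric, the two halves cancel and r is 0 up to rounding error, even though y is completely determined by x.

```stata
clear
set obs 21

* x takes the integer values -10, -9, ..., 10
generate double x = _n - 11

* y depends on x exactly, but along a U-shaped curve
generate double y = x^2

* The scatter plot shows the clear U-shaped pattern
twoway scatter y x

* Pearson r comes out as (essentially) 0
correlate x y
```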

Which method is best for correlation?

There are two common tools for studying correlation: the correlation matrix and the scatter plot. The matrix gives the exact value of r, while the scatter plot shows the relationship visually, so its direction, and patterns such as non-linearity that a single number can hide, are quick to spot. Therefore, it is good practice to present both to fully understand the correlation between two variables.

How to read correlation on a scatter plot?

[Figure: Scatter Plot with Trend Line in Stata]

The scatter plot above shows the relationship between price and mpg. When reading correlation from a scatter plot, it is good practice to also draw a trend line, as I have done above. You can use the following command to draw both in Stata:

twoway (scatter price mpg, sort) (lfit price mpg, sort)

The trend line also shows the negative correlation between price and mpg: it slopes downward. An upward-sloping trend line would indicate a positive correlation between the two variables.
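The lfit line is the fitted line from a linear regression of price on mpg. With a single predictor, its slope equals r × sd(price) / sd(mpg), and because standard deviations are always positive, the slope always has the same sign as r. The sketch below checks this relationship; the scalar names are my own.

```stata
sysuse auto, clear

* Slope of the fitted line that lfit draws (OLS of price on mpg)
quietly regress price mpg
scalar slope = _b[mpg]

* Pearson r and the standard deviations of both variables
quietly correlate price mpg
scalar r_pm = r(rho)
quietly summarize price
scalar sd_price = r(sd)
quietly summarize mpg
scalar sd_mpg = r(sd)

* Simple regression: slope = r * sd(y) / sd(x), so slope and r share a sign
display "slope from regress:     " slope
display "r * sd(price) / sd(mpg): " r_pm * sd_price / sd_mpg
```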

FAQs related to Correlation Analysis in Stata

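In the following section, I answer the questions my students ask me most often. The strength labels refer to the size of r regardless of its sign, so they apply equally to negative values such as -0.4 or -0.8.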

A correlation of 0.39 is not strong. It is closer to a medium correlation.

A correlation of 0.4 can be considered good. It is close to a medium correlation, which is generally regarded as a good correlation.

A correlation of 0.5 marks a medium correlation between two variables.

A correlation of 0.8 is strong because it is close to 1.

A correlation of 0.92 is strong.

A correlation of 0.09 is weak because it is close to 0.

A correlation of 0.001 is not strong. It is extremely weak, meaning there is almost no linear relationship between the two variables.

A correlation of 0.05 is not strong. It is weak because it is close to 0.

A correlation of 0.29 is weak. It is close to 0.25, the reference point for a weak correlation.

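A correlation of 0.35 is borderline. It lies between the weak (0.25) and medium (0.5) reference points, so it is best described as weak to medium. Note that r = 0.35 does not mean a "35% relationship": the squared value, r² = 0.1225, indicates that about 12% of the variance in one variable is linearly accounted for by the other.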
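To see what the squared correlation means for the price and mpg example, the sketch below squares r and compares it with the R-squared of a one-predictor regression. For a regression with a single predictor and a constant, the two values are identical. The scalar name r_pm is my own.

```stata
sysuse auto, clear

quietly correlate price mpg
scalar r_pm = r(rho)
display "r   = " %6.4f r_pm
display "r^2 = " %6.4f r_pm^2

* In a regression with a single predictor, R-squared equals r squared
quietly regress price mpg
display "R-squared from regress = " %6.4f e(r2)
```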

Any value close to 0 indicates a weak correlation.

A correlation of 0.3 is not strong. The value is close to 0.25, so it is a weak correlation.

A correlation of 0.2 is not strong. The value is close to 0.25, so it is a weak correlation.

A correlation of 0.1 is not a good correlation. The value is close to 0, so it is a weak correlation.

Correlation measures the relationship between only two variables at a time. You can include several variables in a correlation matrix, but each coefficient in the matrix still describes the relationship between a single pair of variables.

You need two variables to do correlation analysis. You can add more variables to a correlation matrix, but each value is interpreted for one pair of variables.

To study the relationship among three variables together, use multiple regression analysis.
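A minimal sketch with three variables from auto.dta; weight is my own choice of a third variable, picked only for illustration. correlate still reports one coefficient per pair, while regress estimates the relationship of price with mpg and weight jointly.

```stata
sysuse auto, clear

* Pairwise correlations among all three variables
correlate price mpg weight

* price regressed on mpg and weight together
regress price mpg weight
```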

Correlation measures the relationship between two variables, while regression can measure the relationship among more than two variables. A regression regresses the dependent variable on one or more independent variables at the same time, whereas a correlation describes the relationship between only two variables.

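Correlation analysis can include a statistical test. The coefficient r itself is a descriptive measure, but you can test whether it differs significantly from 0, for example with the sig option of pwcorr. Keep in mind that correlation measures the relationship between only two variables; to study the relationship among more than two variables, use regression analysis.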
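A minimal sketch of the significance test: the sig option of pwcorr prints the p-value for the null hypothesis that the population correlation is 0, obs prints the number of observations, and star(0.05) marks coefficients significant at the 5% level. Adding trunk in the second command is my own choice for illustration.

```stata
sysuse auto, clear

* sig adds the p-value under each coefficient; obs adds the N used
pwcorr price mpg, sig obs

* star(0.05) flags coefficients significant at the 5% level
pwcorr price mpg trunk, star(0.05)
```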
