The t test in Stata is used to compare means: between two variables, between two groups, or between a single variable and a hypothesized value. It is also called Student's t test. Suppose we have data on two groups, for example the math scores of male and female students, and we want to see whether the mean math score is the same or different across the two groups. The t test analyzes the data and answers this question.
3 types of t test in Stata
There are 3 types of t test. The types are primarily based on the structure of the data. They are as follows:
Definition of One Sample t test
The one sample t test is used when only a single variable is available. For example, only math score data is available, and we want to check whether the mean score is different from a particular number, let's say 50.
Note: The t test compares the means of variables, not their individual values.
Definition of Independent Sample t test
The independent sample t test is used when you have data in the form of two groups. For example, math score data is available for two groups, one male and the other female. We want to examine whether the mean math score differs between male and female students.
Definition of Paired Sample t test
The paired sample t test is used when there are two variables that depend on each other, typically because both are measured on the same subjects. For example, we have data on the math score and Financial Management (FM) score of each student in a class. We want to analyze whether the mean math and FM scores are different from each other.
Things to Know Before doing t test
There are some things an analyst needs to know before starting a mean comparison analysis in Stata.
0 as Base Value
We usually use 0 as the base value when comparing means between two groups or two variables. If the difference between the two means is 0, there is no difference between them. In a sample the difference is almost never exactly 0, so the t test asks whether the observed difference is far enough from 0 to be statistically significant. For a one sample t test, the base value is the hypothesized value instead of 0.
T-statistics and P-value
The mean difference is assessed through the t-statistic and p-value, which Stata calculates automatically. As a rule of thumb, if the absolute value of the t-statistic is greater than 1.96, the result is significant at the 5% significance level; if it is less than 1.96, it is insignificant at the same level. The 1.96 threshold is exact only for large samples; for small samples the critical value is somewhat larger (about 2.09 with 19 degrees of freedom), so the p-value is the safer guide. If the p-value is less than 0.05, the result is significant at the 5% level. If the p-value is greater than 0.05, it is insignificant at the 5% level.
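The 1.96 cutoff comes from the normal distribution, so it is only approximate for small samples. As a minimal sketch, the two-tailed 5% critical value for a given number of degrees of freedom can be looked up with Stata's invttail() function. The degrees of freedom used here (19 and 18) are the ones that appear in the examples later in this article.

```stata
* invttail(df, p) returns the t value whose upper-tail probability is p.
* For a two-tailed test at the 5% level, put 2.5% in each tail.

* One sample and paired examples (20 observations, df = 19): about 2.09
display invttail(19, 0.025)

* Independent sample example (20 observations, df = 18): about 2.10
display invttail(18, 0.025)

* With a very large sample the critical value approaches 1.96
display invttail(100000, 0.025)
```

If the absolute t-statistic exceeds the value printed for your degrees of freedom, the two-tailed test is significant at the 5% level.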
Can we perform t test with three variables?
No. The t test compares two variables, two groups, or a single variable with a hypothesized value. It cannot compare three or more means at once. For mean comparison across more than two groups, we use the ANOVA test.
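For more than two groups, a one-way ANOVA might look like the sketch below. The data in this article has only two gender groups, so this example assumes Stata's built-in auto dataset, which is not part of the original exercise.

```stata
* Assumption: uses Stata's shipped example dataset, not the article's data
sysuse auto, clear

* Compare the mean of mpg across the categories of rep78 (more than two groups).
* The tabulate option also prints the mean and standard deviation of each group.
oneway mpg rep78, tabulate
```

The F test in the output plays the same role as the t test does for two groups: its null hypothesis is that all group means are equal.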
Normal Distribution
The t test assumes that the data are approximately normally distributed. Stata does not check this; it simply assumes normality and runs the mean comparison. It is the analyst's job to check whether the data are normal or not. If your data are not normal, a different (non-parametric) method should be used for the mean comparison.
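One possible way to check normality, and to run a non-parametric alternative if it looks doubtful, is sketched below. It assumes the article's dataset is loaded and that the variables are named math_score and gender, as in the commands used later. The choice of these particular tests is my suggestion, not the only option.

```stata
* Assumes the dataset with math_score and gender is already in memory

* Shapiro-Wilk test: the null hypothesis is that the variable is normally distributed
swilk math_score

* Visual check: histogram with a normal density curve drawn on top
histogram math_score, normal

* Non-parametric alternatives if normality is doubtful:
* Wilcoxon signed-rank test against the hypothesized value 50
signrank math_score = 50

* Wilcoxon rank-sum (Mann-Whitney) test for two independent groups
ranksum math_score, by(gender)
```

A small p-value from swilk is evidence against normality; with a small sample, also look at the histogram rather than relying on the test alone.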
Null and Alternative Hypotheses
Another thing to know is the null and alternative hypotheses. We state these before the analysis and then either reject or do not reject the null hypothesis. In a t test, the null hypothesis is that there is no difference between the two means (or between the mean and the hypothesized value), meaning that the mean difference is 0. The alternative hypothesis is that there is a difference between the two means.
Performing t test in Stata: Data Description
I have used the data which you can download from here. The data is about the math and Financial Management scores of a 13th grade class. These are made-up values which I created for this exercise. There are three variables in the data: gender, math score and financial management score.
Doing One sample t test
The one sample t test is used to test whether the mean of a variable is equal to a particular value. For example, we want to see whether the mean of the math score variable is significantly different from 50. To test this, run the t test in Stata using the following code:
ttest math_score == 50
There are a lot of things in the above output, all of which I will explain below. However, there are two particular things to note in the t test output in Stata: the t value and the p-value.
General Interpretation
The t-statistic is 1.9109 and the p-value is 0.0712. The t-statistic is less than the 1.96 threshold explained earlier (and less than the exact critical value for 19 degrees of freedom, about 2.09). This indicates that the t-statistic is insignificant. Further, the p-value is greater than 0.05, which also shows that the result is insignificant. Therefore, as per the rule of thumb, we do not reject the null hypothesis. This means that there is no statistically significant difference between the mean math score and 50 at the 5% level.
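The two numbers discussed above do not have to be read off the screen. As a small sketch, assuming the dataset is loaded with the variable math_score, ttest leaves its results behind in r(), where they can be listed and reused:

```stata
* Run the test, then list everything ttest stored
ttest math_score == 50
return list

* Pick out the t-statistic, degrees of freedom and two-tailed p-value
display "t = " r(t) ", df = " r(df_t) ", two-tailed p = " r(p)
```

This is handy when the same test is repeated for several variables and the results need to be collected in a table.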
In-depth Interpretation
There are two parts in the t test output in Stata. The first is the summary statistics and the second is the test statistics.
Summary Statistics
The part highlighted in red is the summary statistics. It describes the variable being tested.
Variable. The first column holds the name of the variable for which we have conducted the t test. This is math score in our case.
Obs. The second column is Obs (Observations), which counts the number of observations used in the test.
Mean. It shows the mean of math score, which is 59.6 in our example.
Std. Err. This is the standard error. It is the estimated standard deviation of the sample mean. The logic behind this value is that if we repeatedly drew samples of the same size from the population, the standard deviation of those sample means would be close to this standard error.
Std. Dev. The standard deviation of the variable.
[95% Conf. Interval]. These are the lower and upper bounds of the 95% confidence interval for the mean. It is a range of plausible values for the unknown population parameter, which in our case is the population mean.
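To see how the standard error and the confidence interval in this table relate to each other, they can be rebuilt by hand from the results ttest stores. This is only a sketch: it assumes the dataset is loaded with the variable math_score, and the scalar names se_hand and tcrit are my own.

```stata
* Run the test quietly so only our own display lines are printed
quietly ttest math_score == 50

* Standard error of the mean = standard deviation / sqrt(number of observations)
scalar se_hand = r(sd_1)/sqrt(r(N_1))

* Two-tailed 5% critical value for n - 1 degrees of freedom
scalar tcrit = invttail(r(N_1) - 1, 0.025)

* 95% confidence interval = mean plus or minus critical value times standard error
display "Std. Err. = " se_hand
display "Lower     = " (r(mu_1) - tcrit*se_hand)
display "Upper     = " (r(mu_1) + tcrit*se_hand)
```

The three numbers should agree with the Std. Err. and [95% Conf. Interval] columns of the ttest output.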
Test Statistics
The second part of the t test output in Stata shows the core t test analysis. I have highlighted this above in red.
There are five sub-parts in the test statistics of the t test in Stata.
First part shows what is being tested. In our example, it is the mean of math score. Next to it is the null hypothesis in numeric form. This is mean = 50 in our case. It means that we are testing the null hypothesis that the mean of math score is equal to 50.
Second part indicates the t-statistic and the degrees of freedom. The t-statistic is 1.9109 and the degrees of freedom are 19. The degrees of freedom are calculated as the total number of observations minus one, (n-1). We can see that the t-statistic is less than 1.96 (and less than the exact critical value of about 2.09 for 19 degrees of freedom), which shows that it is insignificant.
Third part shows the one-tailed (left-tailed) t test. Its alternative hypothesis is that the mean is less than 50, and below it is the p-value. The rule of thumb here is the same: if the p-value is less than the threshold value (in our case 0.05), the result is significant and we reject the null hypothesis. The p-value is 0.9644, greater than 0.05, so it is insignificant. This means that we do not reject the null hypothesis in favor of this alternative. To put it more simply, there is no evidence that the mean math score is less than 50.
Fourth part shows the one-tailed (right-tailed) t test. Its alternative hypothesis is that the mean of math score is greater than 50. The p-value is 0.0356, less than the 0.05 threshold. This means that it is significant, so in this one-tailed test we reject the null hypothesis in favor of the alternative. It means that, if we had decided in advance to test only in this direction, the mean math score would be significantly greater than 50.
Note: Remember that the third and fourth parts relate to one-tailed t tests. However, we are interested in the two-tailed t test in our example.
Fifth part is the one we are ultimately interested in, the two-tailed t test. The alternative hypothesis is that the mean of math score is not equal to 50. The p-value here is 0.0712, greater than 0.05. This shows that the t test is insignificant. Therefore, we do not reject the null hypothesis. It means that there is no statistically significant evidence that the mean math score differs from 50.
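The three p-values above all come from the same t-statistic and degrees of freedom; they differ only in which tail of the t distribution is used. A minimal sketch with Stata's ttail() function, plugging in the t-statistic (1.9109) and degrees of freedom (19) reported above:

```stata
* ttail(df, t) returns the upper-tail probability P(T > t)

* Right-tailed test, Ha: mean > 50
display ttail(19, 1.9109)

* Left-tailed test, Ha: mean < 50
display 1 - ttail(19, 1.9109)

* Two-tailed test, Ha: mean != 50 (both tails, so twice the smaller one)
display 2*ttail(19, abs(1.9109))
```

Up to rounding, these should reproduce the 0.0356, 0.9644 and 0.0712 shown in the output. Note that the two-tailed p-value is exactly twice the smaller one-tailed p-value.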
Independent sample t test using Equal Variance
The independent sample t test is used to compare means between two groups. There are two sub-types of the independent t test: assuming equal variance and assuming unequal variance. In this part, we will talk about the independent sample t test assuming equal population variances. For example, we have math score data for male and female students. The following command is used to do the independent t test in Stata:
ttest math_score, by(gender)
General Interpretation
In a t test analysis, we usually note two things: the t-statistic and the p-value. I will explain these two here, and a step-by-step interpretation of all measures in the above table follows below.
In the above test, the t-statistic is -1.0130 and the p-value is 0.3245. The absolute value of the t-statistic (1.0130) is less than the 1.96 threshold. This indicates that the t-statistic is insignificant. Further, the p-value is greater than 0.05, which also shows that the result is insignificant. Therefore, as per the rule of thumb, we do not reject the null hypothesis. This means that there is no statistically significant difference between the mean math scores of male and female students.
In-depth Interpretation
Now I explain the step-by-step interpretation of the independent sample t test. There are also two parts in the independent t test output: summary statistics and test statistics.
Summary Statistics
The highlighted part above shows the summary statistics of the independent t test in Stata.
Group. The first column holds the name of the group. In our case, there are two groups, one male and the other female. Here 0 represents female and 1 represents male.
Obs. The second column is Obs (Observations), which counts the number of observations in each group. In our case, there are 11 observations for female and 9 observations for male.
Mean. It shows the mean of both groups. The female group mean is 55 and the male group mean is 65.22.
Std. Err. This is the standard error. It is the estimated standard deviation of the sample mean. The logic behind this value is that if we repeatedly drew samples of the same size from the population, the standard deviation of those sample means would be close to this standard error.
Std. Dev. The standard deviation of each group.
[95% Conf. Interval]. These are the lower and upper bounds of the 95% confidence interval for the mean. It is a range of plausible values for the unknown population parameter, which in our case is the population mean.
Rows. The table has three kinds of rows. The first rows show the two groups which we are testing (0 and 1). The combined row shows the statistics of both groups taken together. The diff row shows the difference between the two group means.
Test Statistics
The above highlighted part shows the primary test statistics of the independent t test in Stata. There are also five sub-sections in this t test. These are highlighted below again:
First part shows the equation which is being tested: the difference between the mean of group 0 (female) and the mean of group 1 (male). In other words, we are analyzing whether the mean math score is the same or different between male and female students. Next to it is the null hypothesis in numeric form. This is diff = 0 in our case. It means that we are testing the null hypothesis that the difference between the two group means is 0.
Second part indicates the t-statistic and the degrees of freedom. The t-statistic is -1.0130 and the degrees of freedom are 18. In the independent sample t test with equal variances, the degrees of freedom are calculated as the total number of observations minus two, (n-2). We can see that the absolute value of the t-statistic is less than 1.96, which shows that it is insignificant.
Third part shows the one-tailed (left-tailed) t test. Its alternative hypothesis is that the difference is less than 0, that is, the female mean is lower than the male mean. Below it is the p-value. The rule here is also the same: if the p-value is less than the threshold value (in our case 0.05), the result is significant and we reject the null hypothesis. The p-value is 0.1622, greater than 0.05, so it is insignificant. This means that we do not reject the null hypothesis. To put it more simply, there is no significant evidence that the mean math score of the female group is lower than that of the male group.
Fourth part shows the one-tailed (right-tailed) t test. Its alternative hypothesis is that the difference in mean math score between female and male is greater than 0. The p-value is 0.8378, greater than the 0.05 threshold. This means that it is insignificant, so we do not reject the null hypothesis. There is no evidence that the difference in mean math score is greater than 0.
Note: Remember that the third and fourth parts relate to one-tailed t tests. However, we are interested in the two-tailed t test in our example.
Fifth part is the one we are ultimately interested in, the two-tailed t test. The alternative hypothesis is that the difference in mean math score between male and female is not equal to 0. The p-value here is 0.3245, greater than 0.05. This shows that the t test is insignificant. Therefore, we do not reject the null hypothesis. It means there is no statistically significant difference in math score between the male and female groups.
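To make the equal variance assumption concrete, the t-statistic can be rebuilt by hand from the group statistics that ttest stores. This is a sketch only: it assumes the dataset is loaded with the variables math_score and gender, and the scalar names sp2, se_pool and t_hand are my own.

```stata
* Run the test quietly; group 1 in r() is the first group listed (gender == 0)
quietly ttest math_score, by(gender)

* Pooled variance: weighted average of the two group variances
scalar sp2 = ((r(N_1)-1)*r(sd_1)^2 + (r(N_2)-1)*r(sd_2)^2) / (r(N_1) + r(N_2) - 2)

* Standard error of the difference between the two means
scalar se_pool = sqrt(sp2*(1/r(N_1) + 1/r(N_2)))

* t-statistic = difference in means / standard error, with n - 2 degrees of freedom
scalar t_hand = (r(mu_1) - r(mu_2))/se_pool
display "t by hand = " t_hand "   t from ttest = " r(t)
```

Both numbers on the last line should match. The pooled variance step is exactly what changes when the equal variance assumption is dropped.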
Independent sample t test assuming Unequal variance
In this test we will repeat the independent sample t test which we did above. The only difference is that here we assume that the population variances of the two groups are unequal. The rest of the explanation and interpretation is much the same. Use the following code for the t test with unequal variance:
ttest math_score, by(gender) unequal
The interpretation of this test is the same as I have explained in the equal variance section above. The values are slightly different, and the degrees of freedom are no longer simply n-2 because Stata adjusts them for unequal variances; however, their interpretation is the same. To avoid unnecessary detail, I am not writing the interpretation again; you can check the independent t test using equal variance section above.
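Which of the two versions to use depends on whether the group variances look similar. One possible workflow is sketched below; it assumes the dataset is loaded with math_score and gender, and using sdtest to decide is my suggestion rather than a fixed rule.

```stata
* Variance-comparison test: the null hypothesis is that both groups have equal variances
sdtest math_score, by(gender)

* Unequal variances with Satterthwaite's degrees of freedom (Stata's default for unequal)
ttest math_score, by(gender) unequal
display "Adjusted degrees of freedom = " r(df_t)

* Same test using Welch's formula for the degrees of freedom instead
ttest math_score, by(gender) unequal welch
```

If sdtest does not reject equal variances, the equal variance t test is a reasonable choice; many analysts simply use the unequal version by default because it does not rely on that assumption.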
Paired Sample t test
The paired sample t test is used when we have two dependent variables in our data set and we want to compare their means. For example, in our case, each student attempted both the math and financial management subjects. Therefore, the math and financial management scores of a student are related to each other. So, if we want to compare the means of math and financial management, we use the paired sample t test in Stata.
To do the paired sample t test, use the following command:
ttest math_score == FM_score
General Interpretation
The above output is the paired sample t test output in Stata. The interpretation of the output is similar to the other types of t test which we conducted above. However, there is a small difference which I will explain here.
As noted earlier, there are two things to note in a t test: the t-statistic and the p-value. As we can see in the above test, the t-statistic is 0.5664, which is lower than 1.96, indicating that it is insignificant. The p-value is 0.5778, which is greater than 0.05 and also shows insignificance. The insignificant p-value and t-statistic indicate that we do not reject the null hypothesis. It means the mean difference between math and financial management scores is not significantly different from 0. To put it simply, there is no significant difference between the math and financial management scores of students.
In-depth Interpretation
Now, I will explain the paired t test output step by step. As I mentioned earlier, there are two parts in the t test output in Stata: the first is summary statistics and the other is test statistics.
Summary Statistics
The highlighted part above shows the summary statistics of the paired t test in Stata.
Variable. The first column holds the names of both variables which we want to analyze. In our case, they are math score and financial management score.
Obs. The second column is Obs (Observations), which counts the number of observations. In our case, there are 20 observations for both variables.
Mean. It shows the mean of both variables. The math score mean is 59.6 and the financial management (FM) mean is 55.65.
Std. Err. This is the standard error. It is the estimated standard deviation of the sample mean. The logic behind this value is that if we repeatedly drew samples of the same size from the population, the standard deviation of those sample means would be close to this standard error.
Std. Dev. The standard deviation of each variable.
[95% Conf. Interval]. These are the lower and upper bounds of the 95% confidence interval for the mean. It is a range of plausible values for the unknown population parameter, which in our case is the population mean.
Rows. There are two kinds of rows in the paired t test output in Stata. The first rows show the two variables which we are analyzing. The diff row shows the statistics of the difference between the two variables.
Test Statistics
The above highlighted part shows the test statistics of the paired t test in Stata. There are also five sub-sections in the paired t test output. These are highlighted below again:
First part shows the question which is being tested. In our example, we are investigating whether the mean values of math score and financial management score are equal or different. Next to it is the null hypothesis in numeric form. This is mean(diff) = 0 in our case. It means that we are testing the null hypothesis that the mean difference between the two variables is 0.
Second part indicates the t-statistic and the degrees of freedom. The t-statistic is 0.5664 and the degrees of freedom are 19. In the paired sample t test, the degrees of freedom are calculated as the number of pairs minus one, (n-1). We can see that the t-statistic is less than 1.96, which shows that it is insignificant.
Third part shows the one-tailed (left-tailed) t test. Its alternative hypothesis is that the mean difference is less than 0, and below it is the p-value. The rule here is also the same: if the p-value is less than the threshold value (in our case 0.05), the result is significant and we reject the null hypothesis. The p-value is 0.7111, greater than 0.05, so it is insignificant. This means that we do not reject the null hypothesis. To put it more simply, there is no evidence that the mean math score is lower than the mean financial management score.
Fourth part shows the one-tailed (right-tailed) t test. Its alternative hypothesis is that the mean difference between the two variables is greater than 0. The p-value is 0.2889, greater than the 0.05 threshold. This means that it is insignificant, so we do not reject the null hypothesis. There is no significant evidence that the mean math score is greater than the mean financial management score.
Note: Remember that the third and fourth parts relate to one-tailed t tests. However, we are interested in the two-tailed paired t test in our example.
Fifth part is the one we are ultimately interested in, the two-tailed t test. The alternative hypothesis is that the mean difference between math score and financial management score is not equal to 0. The p-value here is 0.5778, greater than 0.05. This shows that the t test is insignificant. Therefore, we do not reject the null hypothesis. It means there is no statistically significant difference between the math score and financial management score of a student.
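The "diff" that appears throughout the paired output is simply the row-by-row difference between the two variables, and the paired t test is a one sample t test on that difference. A short sketch, assuming the dataset is loaded with math_score and FM_score (the variable name score_diff is my own):

```stata
* Difference between the two scores for each student
generate score_diff = math_score - FM_score

* One sample t test of the difference against 0
ttest score_diff == 0
```

The t-statistic, degrees of freedom and p-values should be identical to those from ttest math_score == FM_score.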
Conclusion
The t test is used to test the mean difference between two variables, two groups, or a single variable and a hypothesized value. It assumes that the data are normally distributed. There are two things particularly to note in a t test output in Stata: the first is the t-statistic and the second is the p-value. If these values are significant, we reject the null hypothesis in favor of the alternative hypothesis. If these values are insignificant, we do not reject the null hypothesis.