Inference: Comparison of Means


Learning Objectives


1. Understand the conceptual differences among one sample, two independent sample, and paired tests.


2. Conduct and properly interpret a one sample test.


3. While you will not be asked to calculate a two independent sample test statistic, you should be able to properly interpret a comparison of means test for two independent samples (understand/interpret hypotheses, test statistic and p-value).  For an example of the two sample Welch’s test, click here.


4.  Understand when to use the Student’s t or the z statistic in a comparison of means test.


5. Calculate AND interpret a p-value.



General Steps in Conducting a Comparison of Means Test


1.  Decide type of comparison of means test.

         (one sample, two sample, paired samples)


2.  Decide whether a one- or two-sided test.


3.  Examine the appropriateness of a comparison of means test (based on the assumptions)***.


4.  Establish null and alternative hypotheses.


5.  Decide whether a z-statistic or t-statistic is appropriate.


6.  Calculate sample mean(s).


7.  Calculate standard deviation of sample IF using a t-test.


8.  Calculate standard error.


9.  Calculate z-statistic or t-statistic.


10.  Determine p-value from the test statistic using the appropriate z or t distribution.


11.  Interpret the p-value in terms of the hypotheses established prior to the test.



Type of Comparison of Means Test


There are three major types of comparison of means tests: (1) the one sample test; (2) the two independent sample test; and (3) the paired or repeated measures test.  It is important to be able to differentiate among these three tests.  In each test, we make inferences about a population or populations based on one or two samples.


One sample test:  We make an inference about a population in comparison to some set value.  For example, we might be interested in knowing whether the dissolved oxygen levels in a lake meet a state standard of 5 mg/L.


Two independent sample test:  In this test, we collect two independent samples to test whether there is a difference in means between two populations (or whether one population mean is greater or less than the other).  Comparing GRE scores between men and women is an example of a two independent sample test.


Paired or repeated measures test:  This test compares paired data, such as data collected before and after a treatment.  Example:  a comparison of NOx emissions from randomly selected automobiles before and after an additive is added to the fuel.



One-Sided vs. Two-Sided Comparison of Means Tests


For a comparison of means test, you may use either a one-sided or a two-sided test.  A one-sided test (leading to a one-sided p-value) examines whether one mean is greater than (or less than) the other mean.  If you want to test whether the mean of population A is greater (or less) than the mean of population B, this is a one-sided test.  If you want to test whether there is a difference between two means (without any directionality), then you use a two-sided test (and subsequently a two-sided p-value; see below).  The null and alternative hypotheses should reflect whether you are using a one- or two-sided comparison of means test.


z-stat vs. t-stat



A z-statistic should be calculated when the standard deviation of the population(s) is known.  If the standard deviation is not known, then the standard error must be estimated using the standard deviation of the sample(s).  Because of this estimation, we must use the t-distribution, which has thicker tails than the normal distribution to account for the extra uncertainty in estimating the standard error with the sample standard deviation.  As the sample size (and so the degrees of freedom) increases, the t-distribution approaches the normal distribution.
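The tail behavior described above can be checked numerically.  The following is a minimal sketch, using only the Python standard library, that compares the probability of a value greater than 2 under the standard normal distribution and under t-distributions with different degrees of freedom.  The helper names (t_pdf, t_upper_tail) and the use of Simpson's rule for the integration are illustrative assumptions, not part of this lesson.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) with Simpson's rule on [0, |t|] plus symmetry."""
    if t < 0:
        return 1.0 - t_upper_tail(-t, df, steps)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


normal_tail = 1.0 - NormalDist().cdf(2.0)
print(f"normal:      P(Z > 2) = {normal_tail:.4f}")
for df in (5, 10, 30, 100):
    print(f"t, df = {df:>3}: P(T > 2) = {t_upper_tail(2.0, df):.4f}")
```

With few degrees of freedom the t-distribution puts noticeably more probability beyond 2 than the normal distribution does, and the gap shrinks as the degrees of freedom grow.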

Test Statistic Calculation



In general terms, the test statistic for a comparison of means test equals:


test statistic = (estimate − hypothesized value) / standard error


The standard error is the standard deviation of the sampling distribution of the sample mean.  In essence, the test statistic calculates how many standard errors the sample mean is away from the value that we hypothesize.  The one sample z-statistic is:


z = (xbar − μ0) / (σ / √n)


where xbar is the sample mean, μ0 is the hypothesized value, and n is the sample size.  With the z-statistic, the standard error equals the standard deviation of the population (σ) divided by the square root of the sample size.  With the t-statistic, we assume that we do NOT know the standard deviation of the population (σ), so we estimate the standard error using the sample standard deviation (s).  The one sample t-statistic is:


t = (xbar − μ0) / (s / √n)


with n − 1 degrees of freedom.  A Student's t table can be found at this Texas A & M Statistics website.
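The calculation described above can be sketched in a few lines of Python.  This is a minimal illustration, not a full testing routine: the function names are hypothetical, and the dissolved oxygen numbers in the usage lines are made up for the example.

```python
import math


def standard_error(sd, n):
    """Standard error of the sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


def one_sample_statistic(sample_mean, hypothesized_mean, sd, n):
    """(estimate - hypothesized value) / standard error.

    Pass the population sd (sigma) to get a z-statistic, or the sample
    sd (s) to get a t-statistic with n - 1 degrees of freedom.
    """
    return (sample_mean - hypothesized_mean) / standard_error(sd, n)


# Hypothetical numbers: 36 dissolved oxygen readings with mean 5.4 mg/L
# and sample sd 1.2 mg/L, compared with a 5 mg/L standard.
t_stat = one_sample_statistic(5.4, 5.0, 1.2, 36)
print(f"SE = {standard_error(1.2, 36):.3f}, t = {t_stat:.2f}, df = {36 - 1}")
```

The same function serves both cases because the z- and t-statistics differ only in which standard deviation goes into the standard error and in which distribution is used afterward to find the p-value.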





A one-sided p-value is the probability that the test statistic is greater than (or less than) the calculated value.  For a two-sided test, the two-sided p-value is the probability that the test statistic is at least as far from zero, in either direction, as the calculated value.  It is VERY IMPORTANT to know that the p-value is a conditional probability: a probability conditioned on the assumption that the NULL HYPOTHESIS is true.  In the words of Moore, McCabe and Craig (2012, p. 365), a p-value is:


The probability, assuming H0 is true, that the test statistic would take a value as extreme or more extreme than that actually observed is called the p-value of a test.  The smaller the p-value, the stronger the evidence against H0 provided by the data.
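For a z-statistic, these p-values can be computed directly from the standard normal distribution rather than read from a table.  The sketch below uses Python's built-in statistics.NormalDist; the function name z_p_value and its alternative argument are hypothetical names chosen for this illustration.

```python
from statistics import NormalDist


def z_p_value(z, alternative="greater"):
    """p-value for a z-statistic, computed assuming the null hypothesis is true.

    alternative: "greater" (Ha: mean > value), "less" (Ha: mean < value),
    or "two-sided" (Ha: mean differs from the value).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":
        return 1.0 - std_normal.cdf(z)
    if alternative == "less":
        return std_normal.cdf(z)
    if alternative == "two-sided":
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


for alt in ("greater", "less", "two-sided"):
    print(f"z = 1.5, {alt}: p = {z_p_value(1.5, alt):.4f}")
```

Note that the two-sided p-value is twice the smaller of the two one-sided p-values, which is why the direction of the test must be decided before looking at the data.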


Many statisticians consider a p-value less than 0.05 to be statistically significant (and a p-value less than 0.01 to be highly statistically significant).  As a general rule, the smaller the p-value, the stronger the evidence against the null hypothesis.  If you choose a level of significance (an α (alpha) level) prior to your test, this sets the ‘type I’ error rate.  A type I error occurs when we reject the null hypothesis when it is in fact true; the alpha level is the probability of making this error when the null hypothesis is true.  If the p-value is less than or equal to the a priori alpha level, we can state that the result is statistically significant at the alpha level.  To learn more about Type I errors, view this Khan Academy video.
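One way to see what the alpha level means is to simulate many tests in a situation where the null hypothesis really is true and count how often it is rejected.  The sketch below does this for a one-sided z-test; the true mean, standard deviation, sample size, number of trials and random seed are all hypothetical values chosen for illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def simulate_type_i_rate(alpha=0.05, n=25, trials=2000, seed=1):
    """Share of one-sided z-tests that reject a TRUE null hypothesis."""
    rng = random.Random(seed)
    mu0, sigma = 40.0, 5.0  # hypothetical true mean and known sd
    se = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where the null is exactly true
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / se
        p_value = 1.0 - NormalDist().cdf(z)  # Ha: mean > mu0
        if p_value <= alpha:
            rejections += 1
    return rejections / trials


rate = simulate_type_i_rate()
print(f"Rejected a true null in {rate:.1%} of simulated tests (alpha = 5%)")
```

The rejection rate should land close to the chosen alpha level, with some random variation from one simulation to the next.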


Remember, however, that statistical significance is a different concept from ‘practical’ significance.  Practical significance suggests that a difference between two populations has ‘real world’ meaning.  For example, if we have a very large sample size (e.g., n=1,000), we might be able to detect a very small statistically significant difference in students’ performance on a statistics exam based on gender (e.g., 0.06%).  While the estimated difference between average test scores of 0.06% may be statistically significant (if we have a large n and small standard deviations), a difference of 0.06% between male and female performance on the exam may have no practical meaning.
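The effect of sample size on statistical significance can be illustrated with a short calculation.  The sketch below simplifies the situation to a one sample, two-sided z-test and holds the observed difference fixed at 0.06 while the sample size grows; the standard deviation of 1.0 and the sample sizes are hypothetical values chosen only to show the pattern.

```python
import math
from statistics import NormalDist

difference = 0.06  # fixed observed difference from the hypothesized value
sd = 1.0           # hypothetical known standard deviation


def z_and_p(n):
    """Two-sided z-test for the fixed difference at sample size n."""
    z = difference / (sd / math.sqrt(n))
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


for n in (100, 1_000, 10_000):
    z, p = z_and_p(n)
    print(f"n = {n:>6}: z = {z:.2f}, two-sided p = {p:.4f}")
```

The difference itself never changes, yet the p-value shrinks as n grows because the standard error in the denominator of the test statistic gets smaller.  Whether a difference of that size matters in practice is a separate question that the p-value does not answer.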


Assumptions of a One Sample Comparison of Means Test


In order for the results of our comparison of means test to be valid, we must make a few assumptions.

1.  Population of concern is normally distributed.

2.  Observations are independent (the value of one observation is independent of the value of another observation).  When data are temporally or spatially correlated this assumption is violated.



Example Problems


One Sample Hypothesis Test

(σ is known)



In 1979, the State of North Carolina adopted a chlorophyll a standard of 40 ug/L for its rivers and lakes.  Jordan Lake (actually a reservoir) sits to the south of Chapel Hill, North Carolina and serves as a drinking water supply for much of the Triangle area.  If you would like to learn more about the Jordan Lake nutrient management strategy, check out the NC Division of Water Quality’s link.  We want to determine whether or not the chlorophyll a concentration of the water of Jordan Lake is greater than the state standard (out of compliance).  On the 4th of July, we go out and collect 100 randomly selected water quality samples from the reservoir (we typically wouldn’t collect so many samples on one day, but for statistical simplification for this problem we will!).  Based on our GIS map of Jordan Lake, we randomly sample points in the Jordan Lake polygon based on latitude and longitude values.


1. Because we are comparing a population (chlorophyll a in Jordan Lake) to a specific value (40 ug/L), we use a one sample hypothesis test.


2.  Because we care whether or not the chlorophyll a concentration is greater than the state standard, we will use a one-sided test.


3.  Examine the assumptions of the comparison of means test.  We will return to this at the end of this example.


4.  A comparison of means test assesses the evidence AGAINST the null hypothesis (and in favor of the alternative).  Because we want to determine whether or not the lake exceeds the state standard, we establish the following hypotheses:


Null hypothesis (Ho)


Ho: μ ≤ 40 ug/L  


  -Remember that the hypotheses are about the POPULATION and therefore should contain population parameters and NOT sample statistics (such as xbar).


Alternative hypothesis  (Ha)


Ha: μ > 40 ug/L


5.  We will assume that we know the population standard deviation of chlorophyll a (5.0 ug/L), based on previous monitoring studies.  Because we assume this value is known, we will use the test based on the normal distribution (z-statistic).


6.  We calculate the sample mean of the 100 chlorophyll a observations to be 41.0 ug/L.


7.  Because we are assuming a KNOWN standard deviation of the population, we use the z-statistic. The population standard deviation is given in the problem, so we don’t need to calculate it.

8.  To calculate the z-statistic, we first need to calculate the standard error, which equals σ/√n = 5.0/√100 = 5/10 = 0.50.


9.  As outlined above, the z-statistic equals the estimate minus the value of interest, all divided by the standard error.  In this case, our observed sample mean is 41.0.  We calculate the z as (41.0 – 40)/0.50 which equals 1/0.50=2.


10.  We now look up the z-statistic of 2.0 in the z-table to determine the p-value [p(z>2.0)].  In this case, the p-value is equal to the area to the right of the z-stat of 2.  Using this z-table, we find that the area to the left of a z-stat of 2 is 0.9772.  Because we are interested in the probability to the right of the z-statistic of 2, we need to subtract 0.9772 from one.  We calculate a one-sided p-value of 0.0228.


11.  Interpret the p-value.  We calculated a p-value of 0.0228 above.  A p-value is a conditional probability:  ASSUMING that the null hypothesis is true, the p-value is the probability of getting a test statistic as extreme as, or more extreme than, the one we observed [p(z>2.0)=0.0228].  The smaller the p-value, the stronger the evidence AGAINST the null hypothesis.  In this problem, we conclude that the data suggest that the mean chlorophyll a level (on July 4th) in Jordan Lake is greater than the state standard of 40 ug/L.


12.  In order for these results to be valid, we assume that chlorophyll a concentrations in Jordan Lake are normally distributed.  We also assume independent sampling.  The latter may not hold if the observations are spatially correlated (if nearby observations are correlated with each other, known as spatial autocorrelation).
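The arithmetic in the steps above can be reproduced with a short script.  This is a minimal sketch using Python's built-in statistics.NormalDist in place of the z-table; the variable names are illustrative, and the numbers are the ones given in the example.

```python
import math
from statistics import NormalDist

# Values taken from the worked example above
mu0 = 40.0          # state standard (ug/L), the hypothesized value
sigma = 5.0         # assumed known population standard deviation (ug/L)
n = 100             # number of water quality samples
sample_mean = 41.0  # observed sample mean (ug/L)

standard_error = sigma / math.sqrt(n)
z = (sample_mean - mu0) / standard_error
p_value = 1.0 - NormalDist().cdf(z)  # one-sided: Ha is mean > 40 ug/L

print(f"standard error = {standard_error:.2f}")
print(f"z = {z:.2f}")
print(f"one-sided p-value = {p_value:.4f}")
```

The computed p-value agrees with the table lookup in step 10 to four decimal places.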


Comparison of Means PPT


Sample Problems


1.  True or False:  A p-value is the probability that the null hypothesis is true.


2.  True or False:  A very small p-value (p<0.01)  provides evidence in support of the null hypothesis.


3.  True or False:  All else constant, in a one sample test, the greater the sample size, the greater the positive test statistic or the smaller (more negative) the negative test statistic.


4.  We want to examine the effectiveness of an environmental education seminar.  We randomly select 50 seminar attendees and give them a test on environmental topics both before and after the seminar.  We want to determine if the environmental education seminar improved environmental understanding.  Which of the following is the most appropriate test?

    a.  a paired, two-sided test

    b.  a two independent sample, one-sided test

    c.  a paired, one-sided test



5.  We want to determine whether a certain make and model of car has a fuel efficiency, in miles per gallon (mpg), greater than 50 mpg.  We randomly sample 100 cars of the same make and model.  The mean mpg of the sample is 53 mpg and the sample standard deviation is 5 mpg.

a.  Establish the null and alternative hypotheses.

b.  Calculate the appropriate test statistic.

c.  True or False:  We should calculate a z-statistic.

d.  Calculate the p-value and compare to an alpha level of 0.05 to draw your conclusions.


6.  We sample two forests of loblolly pine trees (n=100 in each forest) and measure their diameter at breast height (dbh) in centimeters.  We establish the following hypotheses:

         Ho:  The mean DBH measurement of loblolly trees in forest A is less than or equal to the mean in forest B.

         Ha:  The mean DBH measurement of loblolly trees in forest A is greater than the mean in forest B.

We use the Welch’s t-test statistic, t = (xbarA − xbarB) / √(sA²/nA + sB²/nB), and get a t-statistic of -3.6.

True or False:  This test statistic provides strong evidence against the null hypothesis.




This page was developed by Elizabeth A. Albright, PhD of the Nicholas School of the Environment, Duke University.
