The comparison of two means in independent groups is one of the most widely used statistical procedures. In the bibliometric study carried out by

In this analysis we have a categorical variable, which can be binary or a factor with

In general terms, the vectors Y1 and Y2 can correspond to an experimental group and a control group, to two experimental conditions, to the responses of men and women, etc. In R, the two response vectors are defined as follows:

> Y1 = c(6, 8, 9, 12, 9, 10, 8, 11, 11, 12, 10, 10)

> Y2 = c(13, 13, 14, 16, 15, 12, 17, 14, 15, 18, 17, 20)

A vector named Y is then created by concatenating Y1 and Y2.

> Y = c(Y1, Y2)

> Y

[1] 6 8 9 12 9 10 8 11 11 12 10 10 13 13 14 16 15 12 17 14 15 18 17 20

The dichotomous variable Grupo is then created to indicate the group to which each response belongs: it takes the value 0 for the responses from Y1 and the value 1 for the responses from Y2.

> Grupo = as.factor(c(rep(0, 12), rep(1, 12)))

> Grupo

[1] 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1

Levels: 0 1

Finally, a data frame named Datos is created with two columns: the first is the vector Y and the second is Grupo.

> Datos = data.frame(Y, Grupo)

The sample means of the two groups are obtained, which allow the comparison of the two population means, as well as the difference of the sample means:
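As a minimal sketch of this step in base R, the group means and their difference can be obtained with `tapply()`. The object names `medias` and `dif` are illustrative and do not come from the original article.

```r
# Data as defined earlier in the article
Y1 <- c(6, 8, 9, 12, 9, 10, 8, 11, 11, 12, 10, 10)
Y2 <- c(13, 13, 14, 16, 15, 12, 17, 14, 15, 18, 17, 20)
Datos <- data.frame(Y = c(Y1, Y2),
                    Grupo = as.factor(c(rep(0, 12), rep(1, 12))))

# Sample mean of Y within each level of Grupo
medias <- tapply(Datos$Y, Datos$Grupo, mean)

# Difference of sample means (group 1 minus group 0)
dif <- unname(medias["1"] - medias["0"])

round(medias, 3)  # group 0: 9.667, group 1: 15.333
round(dif, 3)     # 5.667
```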

The box plot of the two distributions is displayed using the
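The name of the plotting function is not recoverable from the sentence above, so the following sketch assumes base R's `boxplot()` with a formula interface. The axis labels are illustrative choices.

```r
# Data as defined earlier in the article
Y1 <- c(6, 8, 9, 12, 9, 10, 8, 11, 11, 12, 10, 10)
Y2 <- c(13, 13, 14, 16, 15, 12, 17, 14, 15, 18, 17, 20)
Datos <- data.frame(Y = c(Y1, Y2),
                    Grupo = as.factor(c(rep(0, 12), rep(1, 12))))

# One box per level of Grupo; boxplot() also returns the plotted statistics
bp <- boxplot(Y ~ Grupo, data = Datos, xlab = "Grupo", ylab = "Y")

# Third row of bp$stats holds the medians of the two groups
bp$stats[3, ]  # 10 and 15
```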

The following is a non-exhaustive overview of the scenarios a researcher may face when comparing two means in independent groups, and of the procedures to use in R for each scenario. The R syntax and the most relevant results are provided. A broader description and the formulas on which each test is based can be found in

Thus, the Shapiro-Wilk test leads us to retain the null hypothesis of normality in the populations of both groups, and the robust Levene test likewise leads us to retain the null hypothesis of equal population variances.
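The two checks described above can be sketched in base R as follows. The Shapiro-Wilk test uses `shapiro.test()` from the stats package. For the robust Levene test, the function used in the original is not visible here, so this sketch computes the median-centred (Brown-Forsythe) version by hand, as a one-way ANOVA on absolute deviations from each group's median; this is an assumption about which variant was meant.

```r
# Data as defined earlier in the article
Y1 <- c(6, 8, 9, 12, 9, 10, 8, 11, 11, 12, 10, 10)
Y2 <- c(13, 13, 14, 16, 15, 12, 17, 14, 15, 18, 17, 20)
Y <- c(Y1, Y2)
Grupo <- as.factor(c(rep(0, 12), rep(1, 12)))

# Shapiro-Wilk normality test, run separately within each group
sw <- lapply(split(Y, Grupo), shapiro.test)
p_sw <- sapply(sw, function(res) res$p.value)

# Median-centred Levene test computed by hand:
# ANOVA on the absolute deviations from each group's median
absdev <- abs(Y - ave(Y, Grupo, FUN = median))
levene <- anova(lm(absdev ~ Grupo))
p_levene <- levene[["Pr(>F)"]][1]

p_sw      # one p-value per group
p_levene  # p-value for equality of variances
```

Both sets of p-values should exceed 0.05 for these data, in line with the conclusion stated above.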

The assumptions must always be evaluated, since the choice of the correct procedure depends on the results of that evaluation.

The Grupo coefficient is the difference between the means. When Grupo changes from 0 to 1, the mean of Y increases by 5.667, so the coefficient has a positive sign. For this reason, the 95% CI appears with the opposite sign to that of the
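The sign relationship described above can be illustrated with a short sketch that assumes a Student's t-test with equal variances and a linear model with Grupo as the only predictor. The object names `tt` and `fit` are illustrative.

```r
# Data as defined earlier in the article
Y1 <- c(6, 8, 9, 12, 9, 10, 8, 11, 11, 12, 10, 10)
Y2 <- c(13, 13, 14, 16, 15, 12, 17, 14, 15, 18, 17, 20)
Datos <- data.frame(Y = c(Y1, Y2),
                    Grupo = as.factor(c(rep(0, 12), rep(1, 12))))

# Student's t-test: t.test() estimates mean(group 0) - mean(group 1),
# so the statistic and the interval are negative here
tt <- t.test(Y ~ Grupo, data = Datos, var.equal = TRUE)

# Linear model: the Grupo1 coefficient is mean(group 1) - mean(group 0),
# so the same interval appears with the opposite sign
fit <- lm(Y ~ Grupo, data = Datos)

tt$statistic               # about -6.67 on 22 degrees of freedom
tt$conf.int                # negative interval
coef(fit)["Grupo1"]        # 5.667
confint(fit)["Grupo1", ]   # same limits, positive sign
```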

The following table shows the application of the parametric approach:

Effect size is a standardized measure of the difference between means. A very popular measure is

Cohen differentiates between small (
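Assuming the measure referred to above is Cohen's d with a pooled standard deviation, it can be computed by hand in base R as sketched below. The names `s_pooled` and `d` are illustrative.

```r
# Data as defined earlier in the article
Y1 <- c(6, 8, 9, 12, 9, 10, 8, 11, 11, 12, 10, 10)
Y2 <- c(13, 13, 14, 16, 15, 12, 17, 14, 15, 18, 17, 20)

n1 <- length(Y1)
n2 <- length(Y2)

# Pooled standard deviation of the two groups
s_pooled <- sqrt(((n1 - 1) * var(Y1) + (n2 - 1) * var(Y2)) / (n1 + n2 - 2))

# Cohen's d: difference of means in pooled standard deviation units
d <- (mean(Y2) - mean(Y1)) / s_pooled

round(s_pooled, 3)  # 2.082
round(d, 2)         # 2.72
```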

In the following link,

The power of the statistical test is the probability of rejecting the null hypothesis when it is false. The APA recommends always providing this term (

If the alternative hypothesis is one-sided, it is enough to set the parameter alt = "less" or alt = "greater". The power obtained is very high (0.99). If a low power is obtained, for example 0.22 when obtaining a

Therefore, to have a power of 80%, the size of each group should be at least
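The function used for these calculations is not visible in this extract. As a hedged sketch, base R's `power.t.test()` is substituted here; note that its `alternative` argument takes "two.sided" or "one.sided", not "less" or "greater". The second call uses a hypothetical medium effect (d = 0.5), chosen only for illustration and not taken from the article.

```r
# Data as defined earlier in the article
Y1 <- c(6, 8, 9, 12, 9, 10, 8, 11, 11, 12, 10, 10)
Y2 <- c(13, 13, 14, 16, 15, 12, 17, 14, 15, 18, 17, 20)

# Standardized effect size (Cohen's d with pooled SD)
s_pooled <- sqrt((11 * var(Y1) + 11 * var(Y2)) / 22)
d <- (mean(Y2) - mean(Y1)) / s_pooled

# Power of the two-sided, two-sample t-test with 12 cases per group
pw <- power.t.test(n = 12, delta = d, sd = 1, sig.level = 0.05,
                   type = "two.sample", alternative = "two.sided")
pw$power  # very close to 1 for an effect this large

# Hypothetical example: group size needed for 80% power when d = 0.5
ss <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80,
                   type = "two.sample", alternative = "two.sided")
ceiling(ss$n)  # 64 per group
```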

When the assumption of normality is not fulfilled, a non-parametric test can be used. In these tests, the original data are replaced by their ranks.
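The specific test is not named in the surviving text. A common rank-based choice for two independent groups is the Wilcoxon rank-sum (Mann-Whitney) test, sketched below with base R's `wilcox.test()`. Because the data contain ties, `exact = FALSE` is set so that the normal approximation is requested explicitly.

```r
# Data as defined earlier in the article
Y1 <- c(6, 8, 9, 12, 9, 10, 8, 11, 11, 12, 10, 10)
Y2 <- c(13, 13, 14, 16, 15, 12, 17, 14, 15, 18, 17, 20)
Datos <- data.frame(Y = c(Y1, Y2),
                    Grupo = as.factor(c(rep(0, 12), rep(1, 12))))

# Ranks of the pooled responses, which is what the test works with
rangos <- rank(Datos$Y)

# Wilcoxon rank-sum test with normal approximation (ties present)
wt <- wilcox.test(Y ~ Grupo, data = Datos, exact = FALSE)

wt$statistic  # W = 1: only two tied pairs keep it from being 0
wt$p.value    # well below 0.05
```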

You can also get a bootstrap version of the Yuen test, with the
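The function name is cut off in the sentence above, so it is not reproduced here. As a related but simpler illustration, the sketch below builds a percentile bootstrap interval for the difference of 20% trimmed means using only base R. This is not the Yuen bootstrap-t procedure itself; the trimming level, the seed, and the number of resamples `B` are assumptions made for the example.

```r
# Data as defined earlier in the article
Y1 <- c(6, 8, 9, 12, 9, 10, 8, 11, 11, 12, 10, 10)
Y2 <- c(13, 13, 14, 16, 15, 12, 17, 14, 15, 18, 17, 20)

# Observed difference of 20% trimmed means (group 1 minus group 0)
dif_tr <- mean(Y2, trim = 0.2) - mean(Y1, trim = 0.2)

set.seed(123)  # arbitrary seed, for reproducibility only
B <- 2000      # number of bootstrap resamples (illustrative)

# Resample each group with replacement and recompute the difference
boot_dif <- replicate(B, {
  mean(sample(Y2, replace = TRUE), trim = 0.2) -
    mean(sample(Y1, replace = TRUE), trim = 0.2)
})

# Percentile 95% confidence interval
ci <- quantile(boot_dif, c(0.025, 0.975))

dif_tr  # 5.375
ci
```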

The final values of the Huber M-estimators are:

In this case, this function uses the value of the first iteration of the M-estimator
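To make the idea of an iterated M-estimator concrete, here is a minimal base R sketch of a Huber location estimate. The helper `huber_m()` is hypothetical and written for illustration: it holds the scale fixed at the MAD, uses the tuning constant k = 1.5, and iterates to convergence. It is not necessarily the same computation as the functions used in the article, so its values need not match those reported there.

```r
# Data as defined earlier in the article
Y1 <- c(6, 8, 9, 12, 9, 10, 8, 11, 11, 12, 10, 10)
Y2 <- c(13, 13, 14, 16, 15, 12, 17, 14, 15, 18, 17, 20)

# Hypothetical helper: Huber M-estimator of location with fixed MAD scale
huber_m <- function(y, k = 1.5, tol = 1e-8, max_iter = 100) {
  mu <- median(y)  # starting value
  s <- mad(y)      # robust scale, held fixed during the iterations
  for (i in seq_len(max_iter)) {
    # Pull observations beyond mu +/- k * s back to those limits
    yw <- pmin(pmax(y, mu - k * s), mu + k * s)
    mu_new <- mean(yw)
    converged <- abs(mu_new - mu) < tol
    mu <- mu_new
    if (converged) break
  }
  mu
}

h1 <- huber_m(Y1)
h2 <- huber_m(Y2)

c(h1, h2)  # close to the sample means, since there are no outliers
h2 - h1    # robust estimate of the difference
```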

For example, with the

Below are the results of the

This can also be verified by checking that the confidence intervals do not contain zero. The most important thing is to use the best procedure in each case, given the circumstances that arise. In the example presented, the application conditions are met, so the parametric approach is the most powerful choice. Because the data are normal, the non-parametric approach is not required, and because there are no outliers, the robust approach is not required either. The exact approach could be used, since the samples are small, but in this case the parametric approach is more powerful.

In this example, all the procedures give similar results because the application conditions are met; this will not be the case when any of the assumptions is violated.
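The exact approach mentioned above is often implemented as a permutation test. The article's own function for it is not visible in this extract, so the following is only a sketch in base R that approximates the permutation distribution by Monte Carlo sampling rather than full enumeration. The seed and the number of permutations `B` are illustrative.

```r
# Data as defined earlier in the article
Y1 <- c(6, 8, 9, 12, 9, 10, 8, 11, 11, 12, 10, 10)
Y2 <- c(13, 13, 14, 16, 15, 12, 17, 14, 15, 18, 17, 20)
Y <- c(Y1, Y2)

# Observed difference of means (group 1 minus group 0)
obs <- mean(Y2) - mean(Y1)

set.seed(123)  # arbitrary seed, for reproducibility only
B <- 9999      # number of random relabellings (illustrative)

# Randomly reassign 12 responses to "group 0" and recompute the difference
perm <- replicate(B, {
  idx <- sample(length(Y), length(Y1))
  mean(Y[-idx]) - mean(Y[idx])
})

# Two-sided Monte Carlo p-value, counting the observed sample itself
p_perm <- (1 + sum(abs(perm) >= abs(obs))) / (B + 1)
p_perm
```

As with the other procedures, the resulting p-value is far below 0.05 for these data.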

The diagram on the following page provides an overview of the content explained in this article (

Cite this article as: Palmer, A., Sesé, A., Cajal, B., Montaño, J. J., Jiménez, R., & Gervilla, E. (2022). Procedures for comparison of two means in independent groups with R.