Statistical tests in R. Part 3: Quantitative data tests

This is the third article in a series on using R for statistical data analysis. It deals with presenting and testing quantitative data: you will learn how to present data quickly and visually, and how to use the t-test in R.

Part 1: Binary Classification
Part 2: Qualitative Data Analysis

Let's go!

To start, here once again is the scheme from the previous article:

[Figure: scheme from the previous article]

Paired data differ from independent data in that the measurements for the compared groups were obtained from the same subjects. Typical applications of the t-test: the effect of some factor on sales, application speed or device service life, or a comparison of productivity between two groups of people. When there are more than two groups, ANOVA (analysis of variance) models are used instead; they will be discussed in future articles.
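To see why the paired/independent distinction matters, here is a small simulation. It is a sketch with invented numbers: the names baseline, before and after are hypothetical and are not columns of the article's dataset. Each simulated subject has their own baseline, and the second measurement is shifted by about 2 units. A paired test works on within-subject differences, so the large between-subject spread cancels out; an unpaired test on the same numbers does not remove it.

```r
# Hypothetical simulation, not the article's data
set.seed(1)
n <- 20
baseline <- rnorm(n, mean = 70, sd = 10)    # large spread between subjects
before   <- baseline + rnorm(n, sd = 1)     # first measurement
after    <- baseline + 2 + rnorm(n, sd = 1) # same subjects, shifted by about 2

# Correct analysis: both vectors come from the same subjects
p_paired <- t.test(after, before, paired = TRUE)$p.value
# Incorrect analysis: treating the vectors as two independent samples
p_unpaired <- t.test(after, before)$p.value

c(paired = p_paired, unpaired = p_unpaired)
```

In this setup the unpaired p-value is far larger, because its standard error is inflated by the between-subject variation that pairing removes.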

Presenting data before analysis

Let's start with how to present the data. I will use a medical dataset that I already had from my studies. Briefly: people from two groups, a study group and a control group, did some exercises, and their physiological parameters were measured before and after the exercises. We will analyze the pulse and the forced expiratory volume. In addition to the t-tests, I will extend the previous article by showing how to derive qualitative data from numeric data. So:

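tab <- read.csv(file="data1.csv", header=TRUE, sep=",", dec=".")
attach(tab)   # columns such as pulse1, pulse2, FEV1_1 become accessible by name
tab
# New columns: the pulse difference and a two-level factor for FEV1_1.
# cut() builds right-closed intervals (0,2.5] and (2.5,max+0.1];
# na.rm=TRUE keeps max() from returning NA if FEV1_1 has missing values.
tab <- cbind(tab, pulsediff=pulse2-pulse1,
             FEVcut2.5=cut(FEV1_1, c(0, 2.5, max(FEV1_1, na.rm=TRUE)+0.1)))
str(tab)
detach(tab)   # re-attach so that the new columns are also visible by name
attach(tab)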

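Here we add a column with the pulse difference (after minus before) and a column of qualitative data for the forced expiratory volume, which can then be analyzed with the chi-square test. The latter is built with the cut function, which takes the data and the cut points. Because attach() works on a copy of the data frame, we detach and re-attach tab so that the new columns are also accessible by name. The output:

[Screenshot: output of str(tab)]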

Moving on: we calculate the means and standard deviations and build a plot.

mean(pulsediff[group==0], na.rm=T)
mean(pulsediff[group==1], na.rm=T)
sd(pulsediff[group==0], na.rm=T)
sd(pulsediff[group==1], na.rm=T)
boxplot(pulsediff~group, main="Distribution of pulse difference stratified by group",
        names=c("control-group", "exercise-group"), ylab="pulse difference")

One important detail is na.rm = T. If your data contain missing values (NA), R removes them before computing; otherwise mean() and sd() simply return NA. A boxplot is a very good way to visualize a sample: it shows the median, the lower and upper quartiles (the box), and whiskers that by default extend to the most extreme values lying within 1.5 interquartile ranges of the box; points beyond that are drawn individually as potential outliers.
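Both points can be checked on toy data. This sketch uses invented vectors (the values are hypothetical, only the variable names mirror the article). It computes the group statistics in one call with tapply() instead of indexing each group by hand, and uses boxplot.stats(), the function boxplot() relies on, to show what is actually drawn.

```r
# Hypothetical pulse differences with missing values, in two groups
pulsediff <- c(2, -1, NA, 1, 10, 14, 12, NA)
group     <- c(0, 0, 0, 0, 1, 1, 1, 1)

tapply(pulsediff, group, mean)                   # NA in both groups
m <- tapply(pulsediff, group, mean, na.rm = TRUE) # NAs dropped first
s <- tapply(pulsediff, group, sd,   na.rm = TRUE)
m
s

# What a boxplot draws: whiskers, hinges (quartiles) and median, plus outliers
x  <- c(1:10, 50)
bs <- boxplot.stats(x)
bs$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out     # points more than 1.5 box lengths beyond the box
c(mean = mean(x), median = median(x))  # the box marks the median, not the mean
```

The single large value 50 pulls the mean well above the median but is drawn as a separate outlier point rather than stretching the whisker.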

Now let's look at how paired and independent data differ from a statistical point of view. For independent data, the analysis uses a confidence interval for the difference in means: its center is the difference between the two sample means, and its width is based on the standard error of that difference, which is computed with a dedicated formula.
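A standard textbook form of this interval is given below. It is the Welch version, which matches what R's t.test() computes by default for independent samples; it is written here as a reference formula, not necessarily the exact one shown in the original figure. Here x̄_i, s_i and n_i are the mean, standard deviation and size of sample i.

```latex
(\bar{x}_1 - \bar{x}_2) \pm t_{1-\alpha/2,\,\nu}\,\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},
\qquad
\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
           {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
```

The classic Student version replaces the square root with s_p·sqrt(1/n_1 + 1/n_2), where s_p is the pooled standard deviation, and uses n_1 + n_2 - 2 degrees of freedom.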

For paired data, we subtract the values pair by pair and get a new sample of differences, for which we compute the mean and standard deviation. The confidence interval is then built from this single sample, exactly as for a one-sample mean.
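A minimal sketch of the paired interval, computed by hand and checked against t.test(). The before/after pulse values are invented for illustration; the interval is mean(d) ± t(0.975, n-1) · sd(d)/sqrt(n) for the differences d.

```r
# Hypothetical pulse measurements on the same six subjects
before <- c(72, 80, 65, 90, 77, 68)
after  <- c(78, 85, 66, 97, 80, 75)

d  <- after - before        # the new sample of pairwise differences
n  <- length(d)
se <- sd(d) / sqrt(n)       # standard error of the mean difference
tq <- qt(0.975, df = n - 1) # two-sided 95% quantile of Student's t
ci_manual <- mean(d) + c(-1, 1) * tq * se

# t.test with paired = TRUE works on after - before internally
ci_ttest <- t.test(after, before, paired = TRUE)$conf.int
ci_manual
ci_ttest
```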

Application of tests in R

#Paired data
#approach 1:
t.test(pulsediff[group==1])
t.test(pulsediff[group==0])
#approach 2:
t.test(pulse1[group==1], pulse2[group==1], paired=T)
t.test(pulse1[group==0], pulse2[group==0], paired=T)

Here we look at the change in pulse before and after the exercises within each group. t.test can be used in two ways: pass it the vector of differences, or pass the two vectors of measurements with paired=T.

Conclusions: in group 0 there is no significant difference between before and after, while in group 1 there is, because the p-value is well below 5%.
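The two approaches give the same p-value, with one subtlety: pulsediff was defined as pulse2 - pulse1, while t.test(pulse1, pulse2, paired=T) works with pulse1 - pulse2, so the t statistic and the confidence interval change sign. A sketch on invented values (the numbers are hypothetical; the variable names mirror the article):

```r
# Hypothetical before/after values for one group
pulse1 <- c(72, 80, 65, 90, 77, 68)
pulse2 <- c(78, 85, 66, 97, 80, 75)
pulsediff <- pulse2 - pulse1

a1 <- t.test(pulsediff)                     # approach 1: one-sample test of the differences vs 0
a2 <- t.test(pulse1, pulse2, paired = TRUE) # approach 2: paired test on pulse1 - pulse2

c(approach1 = a1$p.value, approach2 = a2$p.value)       # identical p-values
c(approach1 = a1$statistic, approach2 = a2$statistic)   # same magnitude, opposite sign
```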

#Unpaired data
t.test(pulsediff~group)

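Here we analyze the difference between the groups, so the data are independent. Strictly speaking, R runs Welch's t-test by default, which differs slightly from the classic Student's t-test: it does not assume that the two groups have equal variances, and it uses adjusted degrees of freedom. Welch's test is the safer choice when the variances differ. With equal group sizes the two t statistics coincide and only the degrees of freedom differ; when the variances are also similar, the results are practically identical. The classic test is available via var.equal=TRUE.

Conclusions: the difference between the groups is significant, because the p-value is well below 5%.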
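The difference between the two tests can be seen directly. This sketch uses two invented groups of equal size but with very different spreads (the data are hypothetical). It runs R's default Welch test and the classic test with var.equal = TRUE, and recomputes the Welch–Satterthwaite degrees of freedom by hand.

```r
# Hypothetical independent groups: equal sizes, unequal variances
x0 <- c(1, -2, 0, 3, -1, 2)      # e.g. control group
x1 <- c(8, 15, 5, 20, 11, 13)    # e.g. exercise group
d  <- data.frame(value = c(x0, x1), group = rep(c(0, 1), each = 6))

welch   <- t.test(value ~ group, data = d)                   # default: var.equal = FALSE
student <- t.test(value ~ group, data = d, var.equal = TRUE) # classic Student's t-test

# Welch-Satterthwaite degrees of freedom computed by hand
v0 <- var(x0) / length(x0)
v1 <- var(x1) / length(x1)
df_welch <- (v0 + v1)^2 / (v0^2 / (length(x0) - 1) + v1^2 / (length(x1) - 1))

c(welch_df = unname(welch$parameter), manual_df = df_welch,
  student_df = unname(student$parameter))
c(welch_t = unname(welch$statistic), student_t = unname(student$statistic))
```

Because the groups have equal sizes, both tests produce the same t statistic here; Welch's test simply uses fewer degrees of freedom, which gives a slightly larger p-value.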

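#Descriptive analysis:
library(prettyR)
str(FEVcut2.5)
str(sex)
xtab(sex~FEVcut2.5, data=tab)   # contingency table of sex by FEV category
#Inferential analysis:
chisq.test(table(sex, FEVcut2.5), correct=F)   # chi-square without Yates' continuity correction
#Additionally:
chisq.test(table(sex, FEVcut2.5))   # with Yates' correction (the default for 2x2 tables)
fisher.test(table(sex, FEVcut2.5))  # Fisher's exact test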

Here we compare the forced expiratory volume categories (FEVcut2.5) between men and women.
The table (taken from the R GUI, because in RStudio the tables are displayed slightly incorrectly for me):

[Screenshot: contingency table of sex by FEVcut2.5]

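Results:

[Screenshot: output of the three tests]

Here we ran three tests for comparison: the chi-square test without Yates' continuity correction, the chi-square test with it, and Fisher's exact test. Once again, I recommend Fisher's exact test, but keep the others in mind. Conclusions: the tests give different p-values, but all of them are very small, so men and women differ in the distribution of FEV categories.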
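The three tests can be compared side by side on a hypothetical 2x2 table (the counts and the f/m and FEV labels below are invented, not the article's data). For a 2x2 table, Yates' correction shrinks the chi-square statistic, so its p-value is always at least as large as the uncorrected one; Fisher's test computes an exact p-value instead of relying on the chi-square approximation.

```r
# Hypothetical 2x2 table: rows = sex, columns = FEV category
tab2 <- matrix(c(12, 5, 3, 14), nrow = 2,
               dimnames = list(sex = c("f", "m"),
                               FEV = c("(0,2.5]", "(2.5,max]")))

plain  <- chisq.test(tab2, correct = FALSE) # Pearson chi-square, no continuity correction
yates  <- chisq.test(tab2)                  # default for 2x2: Yates' continuity correction
fisher <- fisher.test(tab2)                 # exact conditional test

plain$expected   # expected counts under independence
c(plain = plain$p.value, yates = yates$p.value, fisher = fisher$p.value)
```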

Results

So, today we looked at examples of applying these tests. This is enough to carry out reasonably sound statistical studies, and the methods apply in many fields. Using them will help you avoid mistakes, evaluate your work objectively, and give other people reliable information. There are a few more topics I want to cover: estimating the required sample size, proving the equality of random variables, and ANOVA models.

Source: https://habr.com/ru/post/176795/
