Coursera Data Science Specialization: Statistical Inference Course Project part 2

Part 2 of my Statistical Inference project (part 1 is here), part of the Johns Hopkins Data Science Specialization on Coursera.
You can find the .Rmd (R Markdown) format here, and the html result here.


Statistical Inference Course Project Part 2: Inferential Data Analysis

Massimiliano Figini
October 9, 2016

Analysis of the ToothGrowth data in the R datasets package.
The data are about response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods (orange juice or ascorbic acid).

 

# Basic settings and libraries
knitr::opts_chunk$set(echo = TRUE, tidy.opts=list(width.cutoff=60), tidy=TRUE)
library(pander)
library(ggplot2)
library(dplyr)
library(datasets)   # load the data

 

QUESTION 1 and 2: Load the ToothGrowth data, perform some basic exploratory data analyses and provide a basic summary of the data.

# Show the table
pandoc.table(ToothGrowth)
lensuppdose
4.2VC0.5
11.5VC0.5
7.3VC0.5
5.8VC0.5
6.4VC0.5
10VC0.5
11.2VC0.5
11.2VC0.5
5.2VC0.5
7VC0.5
16.5VC1
16.5VC1
15.2VC1
17.3VC1
22.5VC1
17.3VC1
13.6VC1
14.5VC1
18.8VC1
15.5VC1
23.6VC2
18.5VC2
33.9VC2
25.5VC2
26.4VC2
32.5VC2
26.7VC2
21.5VC2
23.3VC2
29.5VC2
15.2OJ0.5
21.5OJ0.5
17.6OJ0.5
9.7OJ0.5
14.5OJ0.5
10OJ0.5
8.2OJ0.5
9.4OJ0.5
16.5OJ0.5
9.7OJ0.5
19.7OJ1
23.3OJ1
23.6OJ1
26.4OJ1
20OJ1
25.2OJ1
25.8OJ1
21.2OJ1
14.5OJ1
27.3OJ1
25.5OJ2
26.4OJ2
22.4OJ2
24.5OJ2
24.8OJ2
30.9OJ2
26.4OJ2
27.3OJ2
29.4OJ2
23OJ2
ToothGrowth$dose <- as.factor(ToothGrowth$dose)  # dose is factor
str(ToothGrowth)  # Table variables
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: Factor w/ 3 levels "0.5","1","2": 1 1 1 1 1 1 1 1 1 1 ...

The variables are:

  • len: the tooth length
  • supp: the supplement type of vitamin C (VC is ascorbic acid, OJ is orange juice)
  • dose: dose in milligrams/day of vitamin C

 

# Basic summary
summary(ToothGrowth)
##       len        supp     dose   
##  Min.   : 4.20   OJ:30   0.5:20  
##  1st Qu.:13.07   VC:30   1  :20  
##  Median :19.25           2  :20  
##  Mean   :18.81                   
##  3rd Qu.:25.27                   
##  Max.   :33.90

 

# Means for supplement type and dose
TG <- tbl_df(ToothGrowth)
TG2 <- summarize(group_by(TG, supp, dose), mean_of_len = mean(len))
TG2
## Source: local data frame [6 x 3]
## Groups: supp [?]
## 
##     supp   dose mean_of_len
##   <fctr> <fctr>       <dbl>
## 1     OJ    0.5       13.23
## 2     OJ      1       22.70
## 3     OJ      2       26.06
## 4     VC    0.5        7.98
## 5     VC      1       16.77
## 6     VC      2       26.14

 

# Basic exploratory graph
qplot(supp, data = TG2, geom = "bar", weight = mean_of_len, facets = . ~ 
    dose, fill = supp, main = "Mean of tooth lenght for supplement type and quantity", 
    xlab = "", ylab = "Tooth length")

image.png

The lenght seems correlated with dose quantity and, in part, with supplement type.

QUESTION 3: Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose.

We perform a first T-test with significance level of 5%, with this Null Hypotesis: different supplement type hasn't effect with tooth growth.

t.test(len ~ supp, data = ToothGrowth)
## 
##     Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

We can't reject null hypotesis because confidence interval contains zero. We can't say that supplement types have significant impact on tooth growth.

 

Now we perform three tests with significance level of 5%, with this Null Hypotesis: different dose hasn't effect with tooth growth.

# dose = 2 vs dose = 0.5
t.test(ToothGrowth$len[ToothGrowth$dose == 2], ToothGrowth$len[ToothGrowth$dose == 
    0.5])
## 
##     Welch Two Sample t-test
## 
## data:  ToothGrowth$len[ToothGrowth$dose == 2] and ToothGrowth$len[ToothGrowth$dose == 0.5]
## t = 11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  12.83383 18.15617
## sample estimates:
## mean of x mean of y 
##    26.100    10.605
# dose = 2 vs dose = 1
t.test(ToothGrowth$len[ToothGrowth$dose == 2], ToothGrowth$len[ToothGrowth$dose == 
    1])
## 
##     Welch Two Sample t-test
## 
## data:  ToothGrowth$len[ToothGrowth$dose == 2] and ToothGrowth$len[ToothGrowth$dose == 1]
## t = 4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  3.733519 8.996481
## sample estimates:
## mean of x mean of y 
##    26.100    19.735
# dose = 1 vs dose = 0.5
t.test(ToothGrowth$len[ToothGrowth$dose == 1], ToothGrowth$len[ToothGrowth$dose == 
    0.5])
## 
##     Welch Two Sample t-test
## 
## data:  ToothGrowth$len[ToothGrowth$dose == 1] and ToothGrowth$len[ToothGrowth$dose == 0.5]
## t = 6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   6.276219 11.983781
## sample estimates:
## mean of x mean of y 
##    19.735    10.605

All three tests have confidence intervals that don't contains 0. This means that we can accept the null hypotesis. Increasing the dose of vitamin C have impact on tooth growth.

 

QUESTION 4: State your conclusions and the assumptions needed for your conclusions.

We assume that the experiment was done with random assignment of guinea pigs and the sample is representative of the entire population. The variances are assumed to be different for the groups being compared.
If all the assumptions are true, we can say that increasing the dose of vitamin C have a significant impact on tooth growth of the pig: a higher dose level of vitamin C correspond to longer teeth.
We also can say that supplement types of vitamin C (orange juice or ascorbic acid) don't have significant impact on tooth growth.