This page takes its data from the ‘car’ (Companion to Applied Regression) package, because typing data inline makes it hard to supply enough observations for a correlation problem.
The Davis dataset contains self-reported height and weight along with measured height and weight for a group of Canadians.
library(psych)
library(car)
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
str(Davis)
## 'data.frame': 200 obs. of 5 variables:
## $ sex : Factor w/ 2 levels "F","M": 2 1 1 2 1 2 2 2 2 2 ...
## $ weight: int 77 58 53 68 59 76 76 69 71 65 ...
## $ height: int 182 161 161 177 157 170 167 186 178 171 ...
## $ repwt : int 77 51 54 70 59 76 77 73 71 64 ...
## $ repht : int 180 159 158 175 155 165 165 180 175 170 ...
As you can (hopefully) see, there are variables for sex (M or F), measured weight and height (weight, height), and self-reported weight and height (repwt, repht).
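Computing the correlations is a simple request, but all the variables must be numeric, and sex is coded as a factor, so we convert it first; as.numeric() returns the factor's integer level codes (here F = 1, M = 2). Further, because some self-reports are missing, we need to request ‘complete.obs’ if we want listwise deletion, or else request ‘pairwise.complete.obs’ for pairwise deletion. With the default setting, any correlation involving a variable with missing values comes out as NA.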
Davis$sex <- as.numeric(Davis$sex)
cor(Davis, use = 'complete.obs')
##              sex    weight    height     repwt     repht
## sex    1.0000000 0.5753653 0.5824326 0.7178326 0.7381536
## weight 0.5753653 1.0000000 0.1542575 0.8353758 0.6314352
## height 0.5824326 0.1542575 1.0000000 0.6037367 0.7391662
## repwt  0.7178326 0.8353758 0.6037367 1.0000000 0.7618604
## repht  0.7381536 0.6314352 0.7391662 0.7618604 1.0000000
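To make the difference between the two deletion options concrete, here is a minimal sketch on a small made-up data frame (the values are invented for illustration and are not taken from Davis). It uses only base R, and the object names are arbitrary.

```r
# Made-up toy data (not the Davis values); NA marks a missing self-report.
toy <- data.frame(
  weight = c(60, 72, 55, 80, 68, 75),
  height = c(165, 178, 160, 185, 172, 180),
  repwt  = c(61, NA, 54, 79, 68, 74),
  repht  = c(164, 177, NA, 184, 171, 179)
)

# Listwise deletion: every correlation uses only the rows with no NA at all.
r_listwise <- cor(toy, use = "complete.obs")

# Pairwise deletion: each correlation uses all rows complete for that pair.
r_pairwise <- cor(toy, use = "pairwise.complete.obs")

# Number of rows that survive under each scheme for weight and height.
n_listwise <- sum(complete.cases(toy))
n_wt_ht    <- sum(complete.cases(toy[, c("weight", "height")]))
```

In this toy example the weight and height correlation rests on four rows under listwise deletion but on all six rows under pairwise deletion, so the two matrices can differ even for variables that have no missing values themselves.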
The correlation between height and weight (0.15) seems suspiciously low to me, so I plotted them.
plot(Davis$height, Davis$weight)
As you can see, there is an outlier that causes trouble. I had forgotten that one of the observations (row 12) is transposed: its height and weight were entered in each other's columns.
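A transposed row like this can also be located in code rather than by eye. The following is only a sketch on made-up values, and it assumes weight is recorded in kilograms and height in centimetres, so that a weight larger than a height is implausible.

```r
# Made-up toy data: row 3 has height and weight entered in each other's columns.
toy <- data.frame(
  weight = c(77, 58, 166, 68),
  height = c(182, 161, 57, 177)
)

# Assumption: kg and cm, so an adult's weight should be well below their height.
# Rows where weight exceeds height are candidates for a swap.
swapped <- which(toy$weight > toy$height)

# Exchange the two columns for the flagged rows only
# (data frame assignment matches columns by position).
toy[swapped, c("weight", "height")] <- toy[swapped, c("height", "weight")]
```

It is still worth looking at the flagged rows before changing them, since a rule like this only suggests where an entry error might be.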
So we fix the data problem.
Davis$weight[12] <- 57
Davis$height[12] <- 166
plot(Davis$height, Davis$weight)
That looks better. Let’s recompute the correlations.
cor(Davis, use = 'complete.obs')
##              sex    weight    height     repwt     repht
## sex    1.0000000 0.6983927 0.7394358 0.7178326 0.7381536
## weight 0.6983927 1.0000000 0.7684924 0.9861233 0.7486882
## height 0.7394358 0.7684924 1.0000000 0.7827870 0.9755870
## repwt  0.7178326 0.9861233 0.7827870 1.0000000 0.7618604
## repht  0.7381536 0.7486882 0.9755870 0.7618604 1.0000000
These are all substantial correlations: every one is about 0.70 or higher.
To compute conventional significance tests (H0: rho = 0):
corr.test(Davis, use='complete.obs')
## Call:corr.test(x = Davis, use = "complete.obs")
## Correlation matrix
##         sex weight height repwt repht
## sex    1.00   0.70   0.74  0.72  0.74
## weight 0.70   1.00   0.77  0.99  0.75
## height 0.74   0.77   1.00  0.78  0.98
## repwt  0.72   0.99   0.78  1.00  0.76
## repht  0.74   0.75   0.98  0.76  1.00
## Sample Size
##        sex weight height repwt repht
## sex    200    200    200   183   183
## weight 200    200    200   183   183
## height 200    200    200   183   183
## repwt  183    183    183   183   181
## repht  183    183    183   181   183
## Probability values (Entries above the diagonal are adjusted for multiple tests.)
##        sex weight height repwt repht
## sex      0      0      0     0     0
## weight   0      0      0     0     0
## height   0      0      0     0     0
## repwt    0      0      0     0     0
## repht    0      0      0     0     0
##
## To see confidence intervals of the correlations, print with the short=FALSE option
This gives results comparable to SAS. Note that the p-values all print as zero because they are shown to two decimals; they are very small, not exactly zero. The output probably should omit the p-values for the diagonal and show the listwise sample size rather than the pairwise counts, but oh well. We should also note that sex is a binary variable, but its ordinary correlation (the so-called point-biserial correlation) is also computed and printed in this example. Many would prefer logistic regression when the binary variable is the outcome, but if sex is considered to be an independent variable (as it probably would be in this case), then the correlation is mathematically related to the t-test and has intuitive meaning.
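The link between the point-biserial correlation and the t-test can be checked directly. The sketch below uses made-up values (not the Davis data) and base R only; it compares the test of the correlation with a pooled-variance two-sample t-test on the same numbers.

```r
# Made-up toy data: a two-level grouping variable and a numeric outcome.
sex    <- factor(c("F", "F", "F", "F", "F", "M", "M", "M", "M", "M"))
height <- c(160, 165, 158, 170, 162, 178, 182, 175, 180, 185)

# Point-biserial correlation: ordinary Pearson r with the factor coded 1/2.
r_test <- cor.test(as.numeric(sex), height)

# Pooled-variance (classical) two-sample t-test on the same data.
t_test <- t.test(height ~ sex, var.equal = TRUE)

# Extract the two t statistics for comparison.
t_from_r <- unname(r_test$statistic)
t_pooled <- unname(t_test$statistic)
```

The two t statistics agree in absolute value (the sign depends only on which group is subtracted from which), both tests use n - 2 degrees of freedom, and the p-values are the same. Note that this holds for the pooled-variance test, not for the Welch test that t.test() runs by default.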
We may also want to see the scatterplots for all pairs of the continuous variables.
Davis2 <- Davis[,2:5] # subset of the 4 continuous variables
pairs(Davis2)
It appears that people in this sample reported their height and weight rather accurately.
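A high correlation shows that reported and measured values move together, but it would not reveal a constant over- or under-report. One simple complementary check, sketched here on made-up values rather than the Davis data, is to look at the differences between reported and measured values directly.

```r
# Made-up toy data: measured and self-reported weight, one report missing.
toy <- data.frame(
  weight = c(60, 72, 55, 80, 68),
  repwt  = c(61, 70, 55, NA, 68)
)

# Reported minus measured: positive values are over-reports.
diff_wt <- toy$repwt - toy$weight

# Average bias and largest single discrepancy, ignoring the missing report.
mean_diff    <- mean(diff_wt, na.rm = TRUE)
max_abs_diff <- max(abs(diff_wt), na.rm = TRUE)
```

The same two lines could be applied to the real columns (for example repwt and weight, or repht and height) to see whether the sample tends to over- or under-report.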