library(car) # calling the library lets you use the data and functions in it
library(psych)
##
## Attaching package: 'psych'
## The following object is masked from 'package:car':
##
## logit
str(Davis) # list the variables and their classes
## 'data.frame': 200 obs. of 5 variables:
## $ sex : Factor w/ 2 levels "F","M": 2 1 1 2 1 2 2 2 2 2 ...
## $ weight: int 77 58 53 68 59 76 76 69 71 65 ...
## $ height: int 182 161 161 177 157 170 167 186 178 171 ...
## $ repwt : int 77 51 54 70 59 76 77 73 71 64 ...
## $ repht : int 180 159 158 175 155 165 165 180 175 170 ...
table(Davis$sex)
##
## F M
## 112 88
There are 112 women and 88 men.
describe(Davis) # the describe function comes from the 'psych' package
## vars n mean sd median trimmed mad min max range skew
## sex* 1 200 1.44 0.50 1.0 1.43 0.00 1 2 1 0.24
## weight 2 200 65.80 15.10 63.0 64.21 11.86 39 166 127 2.01
## height 3 200 170.02 12.01 169.5 170.32 9.64 57 197 140 -4.00
## repwt 4 183 65.62 13.78 63.0 64.27 11.86 41 124 83 1.03
## repht 5 183 168.50 9.47 168.0 168.19 10.38 148 200 52 0.33
## kurtosis se
## sex* -1.95 0.04
## weight 8.96 1.07
## height 37.07 0.85
## repwt 1.33 1.02
## repht -0.36 0.70
For weight, the mean is 65.80 and the standard deviation is 15.10. The same table gives the corresponding statistics for the other variables.
BodyMassIndex <- Davis$weight/((Davis$height/100)^2)
describe(BodyMassIndex)
## vars n mean sd median trimmed mad min max range skew
## X1 1 200 24.7 34.68 21.84 22.11 2.6 15.82 510.93 495.1 13.77
## kurtosis se
## X1 190.1 2.45
The mean BMI is 24.7, but the maximum of 510.93 and the very large skew suggest that something is wrong with the data.
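The BMI line above divides weight in kilograms by the square of height in metres (the heights are stored in centimetres, hence the division by 100). A minimal sketch of the same arithmetic as a reusable helper follows; the function name `bmi` and its argument names are my own, not part of `car` or `psych`.

```r
# Hypothetical helper: BMI from weight in kg and height in cm
bmi <- function(weight_kg, height_cm) {
  weight_kg / (height_cm / 100)^2
}

bmi(77, 182)   # first person listed in the str() output
bmi(166, 57)   # about 510.93, the implausible maximum reported by describe()
```

With the real data, `bmi(Davis$weight, Davis$height)` should give the same vector as the one-line computation above.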
stem(BodyMassIndex)
##
## The decimal point is 1 digit(s) to the right of the |
##
## 0 | 67778888888888888899999999999
## 2 | 00000000000000000000000000000000001111111111111111111111111111122222+90
## 4 |
## 6 |
## 8 |
## 10 |
## 12 |
## 14 |
## 16 |
## 18 |
## 20 |
## 22 |
## 24 |
## 26 |
## 28 |
## 30 |
## 32 |
## 34 |
## 36 |
## 38 |
## 40 |
## 42 |
## 44 |
## 46 |
## 48 |
## 50 | 1
boxplot(BodyMassIndex)
stem(Davis$height)
##
## The decimal point is 1 digit(s) to the right of the |
##
## 5 | 7
## 6 |
## 7 |
## 8 |
## 9 |
## 10 |
## 11 |
## 12 |
## 13 |
## 14 | 8
## 15 | 0234567777788899
## 16 | 00000111111222222223333333333344444445555555555566666666667777777888+2
## 17 | 000000001111122222333333333444445555555566666777778888888888889999
## 18 | 0000011222233333344445555677899
## 19 | 117
There is an outlier: one person has a BMI of about 511, which is not credible. Looking at the data, it appears that the values of height and weight were transposed for one person when the data were entered. Person 12 has a recorded height of 57 cm and a weight of 166 kg. Assuming these are backwards, the problem can be fixed by swapping the two numbers.
Davis$weight[12] <- 57 # swap the numbers
Davis$height[12] <- 166 # yes, both of them
BodyMassIndex <- Davis$weight/((Davis$height/100)^2) # recompute the BMI
stem(BodyMassIndex) # check the plot
##
## The decimal point is at the |
##
## 15 | 8
## 16 | 9
## 17 | 14578899
## 18 | 111122356699
## 19 | 0234555566667789999
## 20 | 01122222222333334444445666677888999
## 21 | 00001111122334555566677889999
## 22 | 00011122444455557778888999
## 23 | 11223333344456778899
## 24 | 0256667788
## 25 | 001222445667889
## 26 | 013333445556
## 27 | 23358
## 28 | 46
## 29 | 78
## 30 | 12
## 31 |
## 32 |
## 33 |
## 34 |
## 35 |
## 36 | 7
That’s better.
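The fix above hard-codes the row number and both values. As a rough sketch, the same repair could be done by locating the suspicious row from its BMI and swapping the two columns for that row. The toy data frame below uses a few weight and height pairs from the `str()` output plus one row that mimics the transposed entry. The cutoff of 60 is an arbitrary assumption chosen only to flag impossible values, and the names `d`, `bad`, and `bmi_toy` are made up for this illustration.

```r
# Toy data: four pairs from the str() output plus one transposed row
d <- data.frame(
  weight = c(77, 58, 53, 166, 59),
  height = c(182, 161, 161, 57, 157)
)

bmi_toy <- d$weight / (d$height / 100)^2
bad <- which(bmi_toy > 60)   # assumed cutoff: no plausible BMI is this large
bad                          # row number(s) to inspect before changing anything

# Swap weight and height for the flagged row(s); the assignment is by position
d[bad, c("weight", "height")] <- d[bad, c("height", "weight")]

bmi_toy <- d$weight / (d$height / 100)^2   # recompute after the swap
range(bmi_toy)
```

It is still worth looking at the flagged row by eye before swapping, since a huge BMI could also come from some other kind of entry error.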
We could code BMI as underweight, healthy, overweight, or obese and then tabulate the categories as we did for men and women. Alternatively, we can simply select (subset) those who are underweight (BMI below 18.5), which is what I did. We then compute the proportion of underweight people in the sample, which is the probability of drawing an underweight person at random from it.
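The first option, coding BMI into categories, might look like the sketch below. Only the 18.5 cutoff comes from this exercise. The cutoffs at 25 and 30 are the usual WHO conventions and are an assumption here, as are the toy values and the names `bmi_toy` and `bmi_cat`.

```r
# Toy BMI values chosen to fall on both sides of each cutoff
bmi_toy <- c(15.82, 18.47, 18.5, 21.8, 24.9, 25, 29.9, 30, 36.73)

# right = FALSE makes each interval include its lower bound: [18.5, 25), etc.
bmi_cat <- cut(
  bmi_toy,
  breaks = c(-Inf, 18.5, 25, 30, Inf),
  labels = c("under", "healthy", "over", "obese"),
  right  = FALSE
)

table(bmi_cat)              # counts per category
prop.table(table(bmi_cat))  # proportions per category
```

Replacing `bmi_toy` with `BodyMassIndex` would tabulate the full sample. The subsetting approach actually used here follows.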
under <- BodyMassIndex[BodyMassIndex<18.5] # subset the data for those underweight
describe(under) # n in the output gives the number underweight (18); we already know there are 200 people total.
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 18 17.75 0.63 17.88 17.82 0.41 15.82 18.47 2.65 -1.59 2.29
## se
## X1 0.15
p.sample <- 18/200 # find the proportion.
p.sample # print the proportion
## [1] 0.09
The probability of drawing someone who is underweight at random from this sample is .09.
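The count of 18 and the total of 200 were typed in by hand above. A small sketch of a way to get the same proportion directly from the data: a logical comparison gives `TRUE`/`FALSE` values, `sum()` counts the `TRUE`s, and `mean()` gives their proportion. The toy vector and the names `bmi_toy`, `n_under`, and `p_under` are made up for the illustration.

```r
# Toy BMI values; two of the ten are below 18.5
bmi_toy <- c(17.1, 18.4, 19.0, 21.8, 22.3, 24.0, 25.5, 27.2, 30.1, 36.7)

n_under <- sum(bmi_toy < 18.5)    # number underweight
p_under <- mean(bmi_toy < 18.5)   # proportion underweight

n_under
p_under
```

Applied to the real data, `mean(BodyMassIndex < 18.5)` should reproduce the .09 found above.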
describe(BodyMassIndex) # to find the mean and SD of the sample
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 200 22.25 3 21.8 22.07 2.54 15.82 36.73 20.91 0.92 1.94
## se
## X1 0.21
z.pop <- (18.5-22.25)/3 # to find z
z.pop # print z
## [1] -1.25
p.pop <- pnorm(z.pop) # to find the probability
p.pop # print the probability
## [1] 0.1056498
The probability of drawing someone underweight at random from a normally distributed population with mean = 22.25 and SD = 3 is .11. This is pretty close to the proportion observed in our sample (p = .09).
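Converting to z by hand is useful for learning, but `pnorm()` also accepts the mean and standard deviation directly. A brief sketch showing that the two routes agree, using the rounded mean and SD from the table above (the variable names are my own):

```r
m      <- 22.25   # sample mean of BMI, rounded as in the describe() output
s      <- 3       # sample SD of BMI, rounded
cutoff <- 18.5    # underweight threshold

z        <- (cutoff - m) / s
p_z      <- pnorm(z)                        # via the standard normal
p_direct <- pnorm(cutoff, mean = m, sd = s) # same probability, no z needed

all.equal(p_z, p_direct)
```

Using `mean(BodyMassIndex)` and `sd(BodyMassIndex)` in place of the rounded values would give a slightly different answer.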
We could compute the difference between reported and measured weight for each person and then see whether the mean difference (or the mean of the absolute differences) is larger for men or for women. (That would be an independent samples t-test.) Or we could create a scatterplot in which each person is a point, with Y = reported and X = measured weight. We could then see how reported values relate to measured values, and whether that relation is similar for males and females. (That would be analysis of covariance.) You will see how to compute and interpret such statistical tests later in the course.
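As a preview only, here is a rough sketch of the first idea, using just the ten people whose values appear in the `str()` output above. With so few cases the test result means little; the point is the mechanics. The names `d`, `mean_diff`, `abs_diff`, and `tt` are my own.

```r
# First ten rows as shown by str(Davis): sex codes 2 = "M", 1 = "F"
d <- data.frame(
  sex    = factor(c("M", "F", "F", "M", "F", "M", "M", "M", "M", "M")),
  weight = c(77, 58, 53, 68, 59, 76, 76, 69, 71, 65),
  repwt  = c(77, 51, 54, 70, 59, 76, 77, 73, 71, 64)
)

d$diff <- d$repwt - d$weight                  # reported minus measured

mean_diff <- tapply(d$diff, d$sex, mean)      # mean difference by sex
abs_diff  <- tapply(abs(d$diff), d$sex, mean) # mean absolute difference by sex

tt <- t.test(diff ~ sex, data = d)            # Welch two-sample t-test (R's default)

mean_diff
abs_diff
tt
```

In the full data set, `repwt` has missing values (n = 183 in the table above), so `mean()` would need `na.rm = TRUE`.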