In the first example, I will use data I manufactured as an example and present in the accompanying PowerPoint slides. In the second example, I will use the Davis data in the ‘car’ package, where the data are measures from real people.
First, the PowerPoint data - measures are not metric (inches for height and pounds for weight):
library(psych)
Ht <- c(61, 62, 63, 65, 65, 68, 69, 70, 72, 75)
Wt <- c(105, 120, 120, 160, 120, 145, 175, 160, 185, 210)
describe(Ht)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 10 67 4.57 66.5 66.75 5.19 61 75 14 0.26 -1.39 1.45
describe(Wt)
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 10 150 33.99 152.5 148.12 48.18 105 210 105 0.27 -1.4
## se
## X1 10.75
cor(Ht, Wt)
## [1] 0.9368622
As you can see, the means, standard deviations, and correlation match what you saw in the PowerPoint slides.
The code to compute the regression is simple:
res1 <- lm(Wt~Ht) # run simple linear regression
summary(res1) # print detailed information about the regression
##
## Call:
## lm(formula = Wt ~ Ht)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.064 -8.976 -0.984 4.694 23.936
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -316.8617 61.7404 -5.132 0.000894 ***
## Ht 6.9681 0.9196 7.578 6.44e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.61 on 8 degrees of freedom
## Multiple R-squared: 0.8777, Adjusted R-squared: 0.8624
## F-statistic: 57.42 on 1 and 8 DF, p-value: 6.439e-05
Note that lm stands for linear model, and the dependent variable (weight in this model) is listed first, followed by the tilde (is distributed as…), followed by the independent variable.
The summary presents the values of the intercept and slope (estimates), which agree with those presented in the PowerPoint slides. We also get significance tests for each of the estimates as well as R-square for the model, and in the case of a single independent variable, the test for the model is equivalent to the test of the slope for the independent variable.
We can use the object ‘res1’ to find the predicted and residual values as well:
predict(res1) # print the predicted values of Y (weight)
## 1 2 3 4 5 6 7 8
## 108.1915 115.1596 122.1277 136.0638 136.0638 156.9681 163.9362 170.9043
## 9 10
## 184.8404 205.7447
resid(res1) # print the residuals values of Y (weight)
## 1 2 3 4 5 6
## -3.1914894 4.8404255 -2.1276596 23.9361702 -16.0638298 -11.9680851
## 7 8 9 10
## 11.0638298 -10.9042553 0.1595745 4.2553191
These values also agree with those shown in the PowerPoint slides.
We can also create a graph similar to the one shown in the slides:
plot(Ht, Wt, xlab= "Height in Inches", ylab="Weight in Pounds") # scatterplot (X, Y)
abline(lm(Wt~Ht))
abline(v=mean(Ht))
abline(h=mean(Wt))
text(62, 200, "Y'=a+bX")
text(64, 180, "Y'=-316.86+6.97X")
We wouldn’t usually run a regression with just 10 people because the estimate of the line is pretty poor unless we have over 100 observations.
Similar analysis for real height and weight in Davis data in ‘car.’ These are metric measures (centimeters and kilograms). One of the people in the dataset has height and weight reversed, so there is code here to swap them.
library(car)
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
Davis$weight[12] <- 57
Davis$height[12] <- 166
res2 <- lm(Davis$weight~Davis$height)
summary(res2)
##
## Call:
## lm(formula = Davis$weight ~ Davis$height)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.658 -5.381 -0.555 4.807 42.894
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -130.91040 11.52792 -11.36 <2e-16 ***
## Davis$height 1.15009 0.06749 17.04 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.505 on 198 degrees of freedom
## Multiple R-squared: 0.5946, Adjusted R-squared: 0.5925
## F-statistic: 290.4 on 1 and 198 DF, p-value: < 2.2e-16
describe(Davis$weight)
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 200 65.25 13.32 63 64.04 11.86 39 119 80 0.91 0.86
## se
## X1 0.94
describe(Davis$height)
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 200 170.56 8.93 169.5 170.36 9.64 148 197 49 0.22 -0.37
## se
## X1 0.63
plot(Davis$height, Davis$weight, xlab= "Height in Centimeters", ylab="Weight in Kilograms")
abline(lm(Davis$weight~Davis$height))