Simple Regression

Regression of Weight on Height

In the first example, I will use data I manufactured as an example and present in the accompanying PowerPoint slides. In the second example, I will use the Davis data in the ‘car’ package, where the data are measures from real people.

First, the PowerPoint data - measures are not metric (inches for height and pounds for weight):

library(psych)
Ht <- c(61, 62, 63, 65, 65, 68, 69, 70, 72, 75)
Wt <- c(105, 120, 120, 160, 120, 145, 175, 160, 185, 210)
describe(Ht)

##    vars  n mean   sd median trimmed  mad min max range skew kurtosis   se
## X1    1 10   67 4.57   66.5   66.75 5.19  61  75    14 0.26    -1.39 1.45

describe(Wt)

##    vars  n mean    sd median trimmed   mad min max range skew kurtosis
## X1    1 10  150 33.99  152.5  148.12 48.18 105 210   105 0.27     -1.4
##       se
## X1 10.75

cor(Ht, Wt)

## [1] 0.9368622

As you can see, the means, standard deviations, and correlation match what you saw in the PowerPoint slides.

The code to compute the regression is simple:

res1 <- lm(Wt~Ht)             # run simple linear regression
summary(res1)                 # print detailed information about the regression

## 
## Call:
## lm(formula = Wt ~ Ht)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -16.064  -8.976  -0.984   4.694  23.936 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -316.8617    61.7404  -5.132 0.000894 ***
## Ht             6.9681     0.9196   7.578 6.44e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.61 on 8 degrees of freedom
## Multiple R-squared:  0.8777, Adjusted R-squared:  0.8624 
## F-statistic: 57.42 on 1 and 8 DF,  p-value: 6.439e-05

Note that lm stands for linear model, and the dependent variable (weight in this model) is listed first, followed by the tilde (is distributed as…), followed by the independent variable.

The summary presents the values of the intercept and slope (estimates), which agree with those presented in the PowerPoint slides. We also get significance tests for each of the estimates as well as R-square for the model, and in the case of a single independent variable, the test for the model is equivalent to the test of the slope for the independent variable.

We can use the object ‘res1’ to find the predicted and residual values as well:

predict(res1)                 # print the predicted values of Y (weight)

##        1        2        3        4        5        6        7        8 
## 108.1915 115.1596 122.1277 136.0638 136.0638 156.9681 163.9362 170.9043 
##        9       10 
## 184.8404 205.7447

resid(res1)                   # print the residuals values of Y (weight)

##           1           2           3           4           5           6 
##  -3.1914894   4.8404255  -2.1276596  23.9361702 -16.0638298 -11.9680851 
##           7           8           9          10 
##  11.0638298 -10.9042553   0.1595745   4.2553191

These values also agree with those shown in the PowerPoint slides.

We can also create a graph similar to the one shown in the slides:

plot(Ht, Wt, xlab= "Height in Inches", ylab="Weight in Pounds") # scatterplot (X, Y)
     abline(lm(Wt~Ht))
     abline(v=mean(Ht))
     abline(h=mean(Wt))
     text(62, 200, "Y'=a+bX")
     text(64, 180, "Y'=-316.86+6.97X")

We wouldn’t usually run a regression with just 10 people because the estimate of the line is pretty poor unless we have over 100 observations.

Similar analysis for real height and weight in Davis data in ‘car.’ These are metric measures (centimeters and kilograms). One of the people in the dataset has height and weight reversed, so there is code here to swap them.

library(car)

## 
## Attaching package: 'car'

## The following object is masked from 'package:psych':
## 
##     logit

Davis$weight[12] <- 57
Davis$height[12] <- 166
res2 <- lm(Davis$weight~Davis$height)
summary(res2)

## 
## Call:
## lm(formula = Davis$weight ~ Davis$height)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.658  -5.381  -0.555   4.807  42.894 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -130.91040   11.52792  -11.36   <2e-16 ***
## Davis$height    1.15009    0.06749   17.04   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.505 on 198 degrees of freedom
## Multiple R-squared:  0.5946, Adjusted R-squared:  0.5925 
## F-statistic: 290.4 on 1 and 198 DF,  p-value: < 2.2e-16

describe(Davis$weight)

##    vars   n  mean    sd median trimmed   mad min max range skew kurtosis
## X1    1 200 65.25 13.32     63   64.04 11.86  39 119    80 0.91     0.86
##      se
## X1 0.94

describe(Davis$height)

##    vars   n   mean   sd median trimmed  mad min max range skew kurtosis
## X1    1 200 170.56 8.93  169.5  170.36 9.64 148 197    49 0.22    -0.37
##      se
## X1 0.63

plot(Davis$height, Davis$weight, xlab= "Height in Centimeters", ylab="Weight in Kilograms")
     abline(lm(Davis$weight~Davis$height))

Simple Regression

Michael Brannick

12/2/2017

Regression of Weight on Height