
In this example, I use PROC IML, the SAS matrix language, to compute regression results from a small data set. I then run PROC REG on the same data to show that the matrix results and the REG results correspond. The example is meant to help you link the matrix algebra to concrete numbers and to the SAS output that you will use.

Our input data:

X          y

1 4 5 6    5
1 2 3 4    2
1 4 5 4    5
1 2 3 4    3
1 4 3 2    4
1 4 4 5    4
1 2 3 4    3
1 4 5 4    5
1 7 8 9    8
1 5 4 3    5

Note that the design matrix X has a leading column of 1s, which carries the intercept; its other three columns are the raw scores on X1-X3. y is a single column (the last column).

To find the b weights, we need

$b = (X'X)^{-1}X'y$

so

X'X

10 38 43 45

38 166 182 186

43 182 207 216

45 186 216 235

and

$(X'X)^{-1}$

0.9505447 -0.053377 -0.114379 -0.034641

-0.053377 0.2309368 -0.29085 0.0947712

-0.114379 -0.29085 0.5196078 -0.22549

-0.034641 0.0947712 -0.22549 0.1431373

and

X'y

44

189

211

217

 

Therefore, when we multiply, we get

b

0.0847495

0.4945534

0.7026144

-0.130065
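The same arithmetic can be checked outside SAS. What follows is a minimal sketch in plain Python, not the PROC IML program that produced the numbers above. The helper names `transpose`, `matmul`, and `inverse` are illustrative assumptions, and exact fractions are used so that the inverse itself carries no rounding error.

```python
from fractions import Fraction

# Design matrix from the input data: each row is [1, X1, X2, X3].
X = [[1, 4, 5, 6], [1, 2, 3, 4], [1, 4, 5, 4], [1, 2, 3, 4], [1, 4, 3, 2],
     [1, 4, 4, 5], [1, 2, 3, 4], [1, 4, 5, 4], [1, 7, 8, 9], [1, 5, 4, 3]]
y = [5, 2, 5, 3, 4, 4, 3, 5, 8, 5]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)]
            for row in a]


def inverse(m):
    """Gauss-Jordan inversion with exact fractions (assumes m is nonsingular)."""
    n = len(m)
    # Augment m with the identity matrix: [m | I].
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        # Move a row with a nonzero entry in column c into the pivot position.
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        # Scale the pivot row so the pivot equals 1.
        aug[c] = [v / aug[c][c] for v in aug[c]]
        # Clear column c in every other row.
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    # The right half of the augmented matrix is now the inverse.
    return [row[n:] for row in aug]


Xt = transpose(X)
XtX = matmul(Xt, X)                      # X'X
Xty = matmul(Xt, [[v] for v in y])       # X'y as a column vector
XtX_inv = inverse(XtX)                   # (X'X)^-1
b = [float(row[0]) for row in matmul(XtX_inv, Xty)]

print(XtX)
print([round(v, 7) for v in b])
```

The printed X'X and b should agree with the matrices shown above to the number of digits displayed there.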

 

 

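To find the beta weights (the standardized regression weights), we need

$\beta = R^{-1}r$

where R is the matrix of correlations among X1-X3 and r is the vector of correlations of X1-X3 with y.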

R

1 0.8513143 0.5661385

0.8513143 1 0.8395464

0.5661385 0.8395464 1

$R^{-1}$

4.9882353 -6.354649 2.5109908

-6.354649 11.483333 -6.043179

2.5109908 -6.043179 4.6519608

r

0.9495869

0.9387835

0.6747098

beta

0.4653129

0.6686799

-0.15011
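As a rough cross-check on the standardized weights, the sketch below (plain Python, offered as an illustration rather than as the PROC IML code behind these numbers) recomputes the correlations from the raw data and then multiplies the rounded inverse printed above by r. The function name `pearson` and the variable names are assumptions made for this example. Because the inverse is entered to about seven digits, the result should match the beta column only to roughly six decimals.

```python
from math import sqrt

# Raw predictor and criterion scores from the input data.
x1 = [4, 2, 4, 2, 4, 4, 2, 4, 7, 5]
x2 = [5, 3, 5, 3, 3, 4, 3, 5, 8, 4]
x3 = [6, 4, 4, 4, 2, 5, 4, 4, 9, 3]
y = [5, 2, 5, 3, 4, 4, 3, 5, 8, 5]


def pearson(u, v):
    """Pearson correlation between two equal-length lists of scores."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    s_uv = sum((a - mean_u) * (c - mean_v) for a, c in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((c - mean_v) ** 2 for c in v)
    return s_uv / sqrt(s_uu * s_vv)


predictors = [x1, x2, x3]
R = [[pearson(a, c) for c in predictors] for a in predictors]  # predictor correlations
r = [pearson(a, y) for a in predictors]                        # correlations with y

# Inverse of R copied from the text above (rounded values, not recomputed here).
R_inv = [[4.9882353, -6.354649, 2.5109908],
         [-6.354649, 11.483333, -6.043179],
         [2.5109908, -6.043179, 4.6519608]]

# beta = R^-1 r
beta = [sum(c * v for c, v in zip(row, r)) for row in R_inv]

print([round(v, 7) for v in r])
print([round(v, 5) for v in beta])
```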

Predicted values, Y' = Xb (here Y' denotes predicted Y, not a transpose):

4.7956427

2.6614379

5.0557734

2.6614379

3.9106754

4.2230937

2.6614379

5.0557734

7.9969499

4.9777778

Squared correlation between Y and Y':

RSQ

0.9683203
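The predicted values and RSQ can be reproduced in a few lines. This is a sketch in plain Python under the assumption that the b weights are typed in as printed above (rounded), so the results can differ from the listing in the last digit or two. The names `y_hat`, `pearson`, and `rsq` are illustrative, not part of the SAS program.

```python
from math import sqrt

# Design matrix [1, X1, X2, X3] and criterion y from the input data.
X = [[1, 4, 5, 6], [1, 2, 3, 4], [1, 4, 5, 4], [1, 2, 3, 4], [1, 4, 3, 2],
     [1, 4, 4, 5], [1, 2, 3, 4], [1, 4, 5, 4], [1, 7, 8, 9], [1, 5, 4, 3]]
y = [5, 2, 5, 3, 4, 4, 3, 5, 8, 5]

# b weights copied from the text above (rounded values).
b = [0.0847495, 0.4945534, 0.7026144, -0.130065]

# Predicted scores: each row of X times b.
y_hat = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]


def pearson(u, v):
    """Pearson correlation between two equal-length lists of scores."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    s_uv = sum((a - mean_u) * (c - mean_v) for a, c in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((c - mean_v) ** 2 for c in v)
    return s_uv / sqrt(s_uu * s_vv)


# Squared correlation between observed and predicted scores.
rsq = pearson(y, y_hat) ** 2

print([round(v, 5) for v in y_hat])
print(round(rsq, 5))
```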

Residuals (resid = Y - Y'):

0.2043573

-0.661438

-0.055773

0.3385621

0.0893246

-0.223094

0.3385621

-0.055773

0.0030501

0.0222222

 

SAS PROC REG output for the same problem:

 

Model: MODEL1

Dependent Variable: Y

Analysis of Variance

Source    DF   Sum of Squares   Mean Square   F Value   Prob>F
Model      3         23.62702       7.87567    61.132   0.0001
Error      6          0.77298       0.12883
C Total    9         24.40000

Root MSE   0.35893   R-square   0.9683
Dep Mean   4.40000   Adj R-sq   0.9525
C.V.       8.15750
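Every entry in the Analysis of Variance listing except the probability can be rebuilt from the residuals and y. The sketch below is plain Python under stated assumptions: the residuals are typed in as printed earlier (rounded), and the variable names are illustrative rather than anything SAS uses internally.

```python
from math import sqrt

y = [5, 2, 5, 3, 4, 4, 3, 5, 8, 5]

# Residuals (Y - Y') copied from the text above (rounded values).
resid = [0.2043573, -0.661438, -0.055773, 0.3385621, 0.0893246,
         -0.223094, 0.3385621, -0.055773, 0.0030501, 0.0222222]

n = len(y)   # number of observations
p = 3        # number of predictors (X1-X3), excluding the intercept

mean_y = sum(y) / n                              # Dep Mean
ss_total = sum((v - mean_y) ** 2 for v in y)     # C Total sum of squares, df = n - 1
ss_error = sum(e * e for e in resid)             # Error sum of squares, df = n - p - 1
ss_model = ss_total - ss_error                   # Model sum of squares, df = p

ms_model = ss_model / p
ms_error = ss_error / (n - p - 1)
f_value = ms_model / ms_error

root_mse = sqrt(ms_error)
r_square = ss_model / ss_total
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - p - 1)
cv = 100 * root_mse / mean_y                     # C.V.: Root MSE as a percent of Dep Mean

print(round(ss_model, 5), round(ss_error, 5), round(ss_total, 5))
print(round(f_value, 3), round(root_mse, 5))
print(round(r_square, 4), round(adj_r_square, 4), round(cv, 4))
```

Note that R-square computed here as Model SS over C Total SS is the same quantity as the squared correlation between Y and Y' reported by the matrix calculation.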

   

Parameter Estimates

Variable   DF   Parameter Estimate   Standard Error   T for H0: Parameter=0   Prob > |T|   Standardized Estimate
INTERCEP    1             0.084749       0.34994203                   0.242       0.8167              0.00000000
X1          1             0.494553       0.17248702                   2.867       0.0285              0.46531294
X2          1             0.702614       0.25873053                   2.716       0.0348              0.66867988
X3          1            -0.130065       0.13579575                  -0.958       0.3751             -0.15010958
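The Standard Error and T columns also tie back to the matrix results: each standard error is the square root of the error mean square times the matching diagonal element of the inverse of X'X, and each t is the estimate divided by its standard error. The following plain-Python sketch assumes the rounded values printed earlier (the diagonal of the inverse, the b weights, and the error mean square of 0.12883); the variable names are illustrative.

```python
from math import sqrt

# b weights and the diagonal of (X'X)^-1, copied from the text above (rounded).
b = [0.0847495, 0.4945534, 0.7026144, -0.130065]
diag_xtx_inv = [0.9505447, 0.2309368, 0.5196078, 0.1431373]

# Error mean square from the Analysis of Variance listing.
mse = 0.12883

# Standard error of each estimate: sqrt(MSE * diagonal element).
std_err = [sqrt(mse * d) for d in diag_xtx_inv]

# t statistic for H0: parameter = 0.
t_values = [bj / se for bj, se in zip(b, std_err)]

for name, bj, se, t in zip(["INTERCEP", "X1", "X2", "X3"], b, std_err, t_values):
    print(f"{name:<9}{bj:>10.6f}{se:>12.6f}{t:>8.3f}")
```

The Standardized Estimate column, in turn, repeats the beta weights obtained from the correlation matrix.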

The data below are the observation number, X1-X3, Y, Y' and the residuals.

Obs   X1   X2   X3   Y   Y'        Residual
1     4    5    6    5   4.79564    0.20436
2     2    3    4    2   2.66144   -0.66144
3     4    5    4    5   5.05577   -0.05577
4     2    3    4    3   2.66144    0.33856
5     4    3    2    4   3.91068    0.08932
6     4    4    5    4   4.22309   -0.22309
7     2    3    4    3   2.66144    0.33856
8     4    5    4    5   5.05577   -0.05577
9     7    8    9    8   7.99695    0.00305
10    5    4    3    5   4.97778    0.02222