Measures of Relative Importance

The narrative on importance weights (http://faculty.cas.usf.edu/mbrannick/regression/Part3/ImportanceNarrative.html) describes several different measures of importance in regression models when the independent variables are correlated. Here I will show you a couple of different ways to calculate the numbers I showed in that file.

First, I will show you a population correlation matrix and how to calculate regression indices from it. Then I will sample some data from that matrix and use the ‘yhat’ package to compute a dominance analysis (one approach to calculating importance in regression).

#install.packages('yhat')  # for dominance analysis
#install.packages('MASS')  # for multivariate sampling
library(yhat)
library(MASS)
IV.rel.dat1 <- matrix(0,2,2)+diag(2)
IV.rel.dat1
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1

In this first matrix, we have set the correlation between independent variables to be zero.

In the second such matrix, we will set the correlation between the independent variables to .4.

IV.rel.dat2 <- IV.rel.dat1
IV.rel.dat2[2,1] <- .4
IV.rel.dat2[1,2] <- IV.rel.dat2[2,1]
IV.rel.dat2
##      [,1] [,2]
## [1,]  1.0  0.4
## [2,]  0.4  1.0

The correlations between the two independent variables and the dependent variable are .5 and .6, respectively. The standardized regression weights can be found by multiplying the inverse of the matrix of correlations among the predictors by the vector of correlations between Y and each X. So now we can solve for the standardized regression weights (betas) for the same problem, once when the predictors are uncorrelated, and once when they are correlated .4.

beta = R⁻¹ r, where R is the matrix of predictor intercorrelations and r is the vector of correlations of the predictors with Y.

r.inv1 <- solve(IV.rel.dat1)
r.inv2 <- solve(IV.rel.dat2)
y <- rbind(.5, .6)
y
##      [,1]
## [1,]  0.5
## [2,]  0.6
beta1 <- r.inv1%*%y
beta1
##      [,1]
## [1,]  0.5
## [2,]  0.6
rsq1 <- t(beta1)%*%y
rsq1
##      [,1]
## [1,] 0.61
#
beta2 <- r.inv2%*%y
beta2
##           [,1]
## [1,] 0.3095238
## [2,] 0.4761905
rsq2 <- t(beta2)%*%y
rsq2
##           [,1]
## [1,] 0.4404762

The R command ‘solve’ finds the inverse of a matrix. The R command ‘t’ finds the transpose of a matrix. The command ’%*%’ instructs R to perform matrix multiplication.

Notice that when the independent variables are uncorrelated, the regression weights (beta1) are equal to the zero order correlations (.5 and .6). The R2 for the model is .61, which is the sum of the squared zero order correlations (.5^2 + .6^2).

When the independent variables are correlated .4, however, the regression weights are reduced to .31 and .48, and the R2 is reduced to .44.
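
For the two-predictor case, these same betas follow from the familiar closed-form expressions. The short check below is my own addition; it simply plugs the correlations assumed above into those formulas and reproduces the matrix-algebra results.

# Closed-form standardized betas for two predictors (a check on the matrix algebra above)
ry1 <- .5   # correlation of X1 with Y
ry2 <- .6   # correlation of X2 with Y
r12 <- .4   # correlation between X1 and X2
b1 <- (ry1 - ry2*r12)/(1 - r12^2)   # should be about .31
b2 <- (ry2 - ry1*r12)/(1 - r12^2)   # should be about .48
c(b1, b2)
b1*ry1 + b2*ry2                     # R2, should be about .44 (matches rsq2)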

Now that we have the regressions, we can compute the importance measures.

First, the product of beta and r:

prod1 <- beta1*y
prod2 <- beta2*y
prod1
##      [,1]
## [1,] 0.25
## [2,] 0.36
prod2
##           [,1]
## [1,] 0.1547619
## [2,] 0.2857143
sum(prod1) # equals Rsq of .61 when IVs are correlated 0
## [1] 0.61
sum(prod2) # equals Rsq of .44 when IVs are correlated .4
## [1] 0.4404762

In the first case, beta and r are equal, so we have the sum of the squared correlations, which equals .61. In the second case, the betas are reduced, and the sum of beta*r is .44.

Next we can decompose the total R2 into the shared part and the parts unique to each independent variable.

Incremental R2

# 
inc2b <- sum(prod2) - y[1]^2
inc1b <- sum(prod2) - y[2]^2
inc1b
## [1] 0.08047619
inc2b
## [1] 0.1904762
sum.unique <- inc1b+inc2b
sum.unique
## [1] 0.2709524
shared <- rsq2-sum.unique
shared
##           [,1]
## [1,] 0.1695238

To find the unique variance in Y accounted for by each independent variable, we take the total R2 and subtract the zero-order r-squared of the other independent variable, yielding inc1b and inc2b, which are .08 and .19 in this case. Note that if we had more independent variables, we would have to compute extra models, so that the unique part for a given predictor is the R2 for the full model containing all the predictors minus the R2 for the model containing all the predictors except the one of interest.
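
With more than two predictors, that subtraction is easiest to do with fitted models. A minimal sketch, assuming a hypothetical data frame dat with outcome Y and predictors X1, X2, and X3:

# Unique contribution of X1 when there are three predictors (hypothetical data frame 'dat')
full    <- lm(Y ~ X1 + X2 + X3, data = dat)   # model with all predictors
reduced <- lm(Y ~ X2 + X3, data = dat)        # model with every predictor except X1
unique.X1 <- summary(full)$r.squared - summary(reduced)$r.squared
unique.X1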

To find the shared variance, we add the unique variances and subtract that sum from the total R2, which gives us .17 in this case. When we add .08, .19, and .17, we get .44, which is, of course, the total R2.

Dominance Analysis

To find the average increment (general dominance weights), we find the average incremental R2 for the whole model and for each and every subset. In our case, we only have two predictors, so we only have to consider each predictor separately and each as an increment beyond the other. As the number of predictors grows, the number of subset models expands very rapidly (there are 2^p - 1 nonempty subsets of p predictors), and computations can take a while.

# General Dominance Weights (average increments)
domg1 <- (y[1]^2+inc1b)/2
domg2 <- (y[2]^2+inc2b)/2
domg1
## [1] 0.1652381
domg2
## [1] 0.2752381
sum.dom <- domg1+domg2
sum.dom
## [1] 0.4404762

The average increments also add to the total R2, and do not have some of the awkward properties of the beta*r importance measures (namely, the average increment will never be negative and will only be zero if the predictor is uncorrelated with everything).

I have shown you the calculations in some detail so you can see what is happening up close. Typically, we will have raw data rather than a correlation matrix, and we will just want some answers rather than lots of calculations. Fortunately, there are packages in R that provide this. A nice choice is the ‘yhat’ package. Next I will sample some data from the population matrix and run that through yhat so you will see the general dominance weights.

# generate data from the population matrix
pop.mat <- matrix(0,3,3)+diag(3)  # a 3 by 3 matrix of zeros; add ones to the diagonal
pop.mat[2,1] <- .5                # set the correlation between X1 and Y to .5
pop.mat[1,2] <- pop.mat[2,1]
pop.mat[3,1] <- .6                # set the correlation between X2 and Y to .6
pop.mat[1,3] <- pop.mat[3,1]
pop.mat[3,2] <- .4                # set the correlation between X1 and X2 to .4
pop.mat[2,3] <- pop.mat[3,2]
pop.mat
##      [,1] [,2] [,3]
## [1,]  1.0  0.5  0.6
## [2,]  0.5  1.0  0.4
## [3,]  0.6  0.4  1.0
#
set.seed(123) # setting the seed makes the random numbers the same on each run
sample.dat <- mvrnorm(10000,mu=c(0,0,0),Sigma=pop.mat)
# multivariate normal sample of N=10,000 based on the population matrix we specified.
sample.dat <- data.frame(sample.dat)      # set the matrix to a data frame 
names(sample.dat)<- c("Y", "X1", "X2")    # change the names to be what we used earlier
cor(sample.dat)                           # find the correlation matrix to compare to pop
##            Y        X1        X2
## Y  1.0000000 0.4996136 0.5969984
## X1 0.4996136 1.0000000 0.4026643
## X2 0.5969984 0.4026643 1.0000000

As you can see, the sample.dat correlation matrix is close to the population values, which is what we would expect given the large sample size (N=10,000).

Next we call the yhat package to run all possible regressions and the dominance analysis.

apsOut2 <- aps(sample.dat, 'Y', list('X1', 'X2'))   # all-possible-subsets regressions
apsOut2
## $ivID
##    [,1]
## X1    1
## X2    2
## 
## $PredBitMap
##    [,1] [,2] [,3]
## X1    1    0    1
## X2    0    1    1
## 
## $apsBitMap
##       [,1]
## X1       1
## X2       2
## X1,X2    3
## 
## $APSMatrix
##       k        R2
## X1    1 0.2496138
## X2    1 0.3564071
## X1,X2 2 0.4366076
dominance(apsOut2)
## $DA
##       k        R2     X1.Inc    X2.Inc
## X1    1 0.2496138         NA 0.1869938
## X2    1 0.3564071 0.08020051        NA
## X1,X2 2 0.4366076         NA        NA
## 
## $CD
##              X1        X2
## CD:0 0.24961379 0.3564071
## CD:1 0.08020051 0.1869938
## 
## $GD
##        X1        X2 
## 0.1649071 0.2717005

Notice that the results of ‘aps’ show all the possible regressions. At the bottom of the ‘aps’ results, we see R2 values of .25, .36, and .44, which agree with our population values (within sampling error).

Turning to the results of applying the ‘dominance’ analysis, we can see that all possible regressions are again listed, along with the incremental R2 for each variable not already in a given model (the DA part of the output). In the CD (conditional dominance) part of the analysis, you will see the increments at each level, that is, as predictors are added to build up the models. The final section is the GD (general dominance) output, where you will see the average of the incremental R2 values for each variable (here .16 and .27). These agree with our earlier population values to within sampling error.
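
As a quick check on how these pieces fit together, the general dominance weights are just the means of the conditional dominance values for each predictor, and they sum to the full-model R2. A small sketch using the numbers printed above:

# General dominance = average of the conditional dominance values for each predictor
cd.X1 <- c(0.24961379, 0.08020051)   # CD:0 and CD:1 for X1, from the output above
cd.X2 <- c(0.3564071, 0.1869938)     # CD:0 and CD:1 for X2
mean(cd.X1)                          # 0.1649, the GD weight for X1
mean(cd.X2)                          # 0.2717, the GD weight for X2
mean(cd.X1) + mean(cd.X2)            # 0.4366, the R2 for the full model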