Measures of Relative Importance

The narrative on importance weights (http://faculty.cas.usf.edu/mbrannick/regression/Part3/ImportanceNarrative.html) describes several different measures of importance in regression models when the independent variables are correlated. Here I will show you a couple of different ways to calculate the numbers I showed in that file.

First, I will show you a population correlation matrix and how to calculate regression indices from it. Then I will sample some data from that matrix and use the ‘yhat’ package to compute a dominance analysis (one approach to calculating importance in regression).

#install.packages('yhat')  # for dominance analysis
#install.packages('MASS')  # for multivariate sampling
library(yhat)
library(MASS)
IV.rel.dat1 <- matrix(0,2,2)+diag(2)
IV.rel.dat1
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1

In this first matrix, we have set the correlation between independent variables to be zero.

In the second such matrix, we will set the correlation between the independent variables to .4.

IV.rel.dat2 <- IV.rel.dat1
IV.rel.dat2[2,1] <- .4
IV.rel.dat2[1,2] <- IV.rel.dat2[2,1]
IV.rel.dat2
##      [,1] [,2]
## [1,]  1.0  0.4
## [2,]  0.4  1.0

The correlations between the two independent variables and the dependent variable are .5 and .6, respectively. The standardized regression weights can be found by multiplying the inverse of the matrix of correlations among the predictors by the vector of correlations between Y and each X. So now we can solve for the standardized regression weights (betas) for the same problem, once when the predictors are uncorrelated, and once when they are correlated .4.

beta = R⁻¹ r, where R is the matrix of predictor intercorrelations and r is the vector of correlations of the predictors with Y.

r.inv1 <- solve(IV.rel.dat1)
r.inv2 <- solve(IV.rel.dat2)
y <- rbind(.5, .6)
y
##      [,1]
## [1,]  0.5
## [2,]  0.6
beta1 <- r.inv1%*%y
beta1
##      [,1]
## [1,]  0.5
## [2,]  0.6
rsq1 <- t(beta1)%*%y
rsq1
##      [,1]
## [1,] 0.61
#
beta2 <- r.inv2%*%y
beta2
##           [,1]
## [1,] 0.3095238
## [2,] 0.4761905
rsq2 <- t(beta2)%*%y
rsq2
##           [,1]
## [1,] 0.4404762

The R command ‘solve’ finds the inverse of a matrix. The R command ‘t’ finds the transpose of a matrix. The command ’%*%’ instructs R to perform matrix multiplication.

Notice that when the independent variables are uncorrelated, the regression weights (beta1) are equal to the zero order correlations (.5 and .6). The R2 for the model is .61, which is the sum of the squared zero order correlations (.5^2 + .6^2).

When the independent variables are correlated .4, however, the regression weights are reduced to .31 and .48, and the R2 is reduced to .44.
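
For the two-predictor case, these same betas follow from the familiar closed-form expressions. The short check below is my own addition; it simply plugs the correlations assumed above into those formulas and reproduces the matrix-algebra results.

# Closed-form standardized betas for two predictors (a check on the matrix algebra above)
ry1 <- .5   # correlation of X1 with Y
ry2 <- .6   # correlation of X2 with Y
r12 <- .4   # correlation between X1 and X2
b1 <- (ry1 - ry2*r12)/(1 - r12^2)   # should be about .31
b2 <- (ry2 - ry1*r12)/(1 - r12^2)   # should be about .48
c(b1, b2)
b1*ry1 + b2*ry2                     # R2, should be about .44 (matches rsq2)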

Now that we have the regressions, we can compute the importance measures.

First, the product of beta and r:

prod1 <- beta1*y
prod2 <- beta2*y
prod1
##      [,1]
## [1,] 0.25
## [2,] 0.36
prod2
##           [,1]
## [1,] 0.1547619
## [2,] 0.2857143
sum(prod1) # equals Rsq of .61 when IVs are correlated 0
## [1] 0.61
sum(prod2) # equals Rsq of .44 when IVs are correlated .4
## [1] 0.4404762

In the first case, beta and r are equal, so we have the sum of the squared correlations, which equals .61. In the second case, the betas are reduced, and the sum of beta*r is .44.

Next we can decompose the total R2 into the shared part and the parts unique to each independent variable.

Incremental R2

# 
inc2b <- sum(prod2) - y[1]^2
inc1b <- sum(prod2) - y[2]^2
inc1b
## [1] 0.08047619
inc2b
## [1] 0.1904762
sum.unique <- inc1b+inc2b
sum.unique
## [1] 0.2709524
shared <- rsq2-sum.unique
shared
##           [,1]
## [1,] 0.1695238

To find the unique variance in Y accounted for by each independent variable, we take the total R2 and subtract the zero-order r-squared of the other independent variable, yielding inc1b and inc2b, which are .08 and .19 in this case. Note that if we had more independent variables, we would have to compute extra models, so that the unique part for a given predictor is the R2 for the full model containing all the predictors minus the R2 for the model containing all the predictors except the one of interest.
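
With more than two predictors, that subtraction is easiest to do with fitted models. A minimal sketch, assuming a hypothetical data frame dat with outcome Y and predictors X1, X2, and X3:

# Unique contribution of X1 when there are three predictors (hypothetical data frame 'dat')
full    <- lm(Y ~ X1 + X2 + X3, data = dat)   # model with all predictors
reduced <- lm(Y ~ X2 + X3, data = dat)        # model with every predictor except X1
unique.X1 <- summary(full)$r.squared - summary(reduced)$r.squared
unique.X1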

To find the shared variance, we add the unique variances and subtract that sum from the total R2, which gives us .17 in this case. When we add .08, .19, and .17, we get .44, which is, of course, the total R2.

Dominance Analysis

To find the average increment (general dominance weights), we find the average incremental R2 for the whole model and for each and every subset. In our case, we only have two predictors, so we only have to consider each predictor separately and each as an increment beyond the other. As the number of predictors grows, the number of subset models expands very rapidly (there are 2^p - 1 nonempty subsets of p predictors), and computations can take a while.

# General Dominance Weights (average increments)
domg1 <- (y[1]^2+inc1b)/2
domg2 <- (y[2]^2+inc2b)/2
domg1
## [1] 0.1652381
domg2
## [1] 0.2752381
sum.dom <- domg1+domg2
sum.dom
## [1] 0.4404762

The average increments also add to the total R2, and do not have some of the awkward properties of the beta*r importance measures (namely, the average increment will never be negative and will only be zero if the predictor is uncorrelated with everything).

I have shown you the calculations in some detail so you can see what is happening up close. Typically, we will have raw data rather than a correlation matrix, and we will just want some answers rather than lots of calculations. Fortunately, there are packages in R that provide this. A nice choice is the ‘yhat’ package. Next I will sample some data from the population matrix and run that through yhat so you will see the general dominance weights.

# generate data from the population matrix
pop.mat <- matrix(0,3,3)+diag(3)  # a 3 by 3 matrix of zeros; add ones to the diagonal
pop.mat[2,1] <- .5                # set the correlation between X1 and Y to .5
pop.mat[1,2] <- pop.mat[2,1]
pop.mat[3,1] <- .6                # set the correlation between X2 and Y to .6
pop.mat[1,3] <- pop.mat[3,1]
pop.mat[3,2] <- .4                # set the correlation between X1 and X2 to .4
pop.mat[2,3] <- pop.mat[3,2]
pop.mat
##      [,1] [,2] [,3]
## [1,]  1.0  0.5  0.6
## [2,]  0.5  1.0  0.4
## [3,]  0.6  0.4  1.0
#
set.seed(123) # setting the seed makes the random numbers the same on each run
sample.dat <- mvrnorm(10000,mu=c(0,0,0),Sigma=pop.mat)
# multivariate normal sample of N=10,000 based on the population matrix we specified.
sample.dat <- data.frame(sample.dat)      # set the matrix to a data frame 
names(sample.dat)<- c("Y", "X1", "X2")    # change the names to be what we used earlier
cor(sample.dat)                           # find the correlation matrix to compare to pop
##            Y        X1        X2
## Y  1.0000000 0.4996136 0.5969984
## X1 0.4996136 1.0000000 0.4026643
## X2 0.5969984 0.4026643 1.0000000

As you can see, the sample.dat correlation matrix is close to the population values, which is what we would expect given the large sample size (N=10,000).

Next we call the yhat package to run all possible regressions and the dominance analysis.

apsOut2 <- aps(sample.dat, 'Y', list('X1', 'X2'))   # all-possible-subsets regressions
apsOut2
## $ivID
##    [,1]
## X1    1
## X2    2
## 
## $PredBitMap
##    [,1] [,2] [,3]
## X1    1    0    1
## X2    0    1    1
## 
## $apsBitMap
##       [,1]
## X1       1
## X2       2
## X1,X2    3
## 
## $APSMatrix
##       k        R2
## X1    1 0.2496138
## X2    1 0.3564071
## X1,X2 2 0.4366076
dominance(apsOut2)
## $DA
##       k        R2     X1.Inc    X2.Inc
## X1    1 0.2496138         NA 0.1869938
## X2    1 0.3564071 0.08020051        NA
## X1,X2 2 0.4366076         NA        NA
## 
## $CD
##              X1        X2
## CD:0 0.24961379 0.3564071
## CD:1 0.08020051 0.1869938
## 
## $GD
##        X1        X2 
## 0.1649071 0.2717005

Notice that the results of ‘aps’ show all the possible regressions. At the bottom of the ‘aps’ results, we see R2 values of .25, .36, and .44, which agree with our population values (within sampling error).

Turning to the results of applying the ‘dominance’ analysis, we can see that all possible regressions are again listed, along with the incremental R2 for each variable not already in a given model (the DA part of the output). In the CD (conditional dominance) part of the analysis, you will see the increments at each level, that is, as predictors are added to build up the models. The final section is the GD (general dominance) output, where you will see the average of the incremental R2 values for each variable (here .16 and .27). These agree with our earlier population values to within sampling error.
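
As a quick check on how these pieces fit together, the general dominance weights are just the means of the conditional dominance values for each predictor, and they sum to the full-model R2. A small sketch using the numbers printed above:

# General dominance = average of the conditional dominance values for each predictor
cd.X1 <- c(0.24961379, 0.08020051)   # CD:0 and CD:1 for X1, from the output above
cd.X2 <- c(0.3564071, 0.1869938)     # CD:0 and CD:1 for X2
mean(cd.X1)                          # 0.1649, the GD weight for X1
mean(cd.X2)                          # 0.2717, the GD weight for X2
mean(cd.X1) + mean(cd.X2)            # 0.4366, the R2 for the full model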