
Sunday, December 30, 2012

Simple Logistic Regression (GLM and Optim)

We will be using data from the 1998 KDD Cup in the next couple of posts - at least a couple of its columns, which we will re-purpose. Using R to download and unzip the data set, we will keep three fields, renaming all of them and binning the last. We are left with the following fields:

  • Respond: 1 if the prospect donated, 0 otherwise.
  • Amount: The amount the prospect donated.
  • Gifts: We will create three dummy variables that take on the value 1 if the number of gifts is 1, 2, or 3 respectively, with 4+ set as our reference level.

We will imagine here that the Gifts variable is a nominal variable representing 4 levels of an experiment. For concreteness, imagine that we sent prospects a direct mail letter asking for a donation, with a free gift for anyone who donates - say, a free gift card to department store X versus a free gift card to store Y, and so on. We are interested in which of these gifts led to the greatest response. The free gift offers were randomized among the prospects in an unbalanced way (the sample sizes in this data set are quite different).

    #download data set from URL (95,412 records)
    temp <- tempfile()
    download.file("http://kdd.ics.uci.edu/databases/kddcup98/epsilon_mirror/cup98lrn.zip",temp)
    data <- read.table(unz(temp, "cup98LRN.txt"),sep=",",header=TRUE)
    unlink(temp)
    #keep only three variables
    data<-data[,c("TARGET_D","TARGET_B","NGIFTALL")]
    #bin the gifts into 1,2,3,4+
    data$NGIFTALL<-ifelse(data$NGIFTALL>3,4,data$NGIFTALL)
    colnames(data)<-c("Amount","Respond","Gifts")
    #create dummy variables for gifts
    data$Gifts1<-ifelse(data$Gifts==1,1,0)
    data$Gifts2<-ifelse(data$Gifts==2,1,0)
    data$Gifts3<-ifelse(data$Gifts==3,1,0)
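As an aside, the same dummy coding could be left to R's factor machinery, using relevel to make 4+ the reference level and letting glm build the dummy columns itself. A minimal sketch on simulated data (the variable names mirror ours, but the data here are made up):

```r
#sketch on simulated data: code Gifts as a factor with 4+ as the
#reference level and let glm create the Gifts1-Gifts3 dummies
set.seed(1)
df <- data.frame(Gifts = sample(1:4, 400, replace = TRUE))
df$Respond <- rbinom(400, 1, 0.1)
df$Gifts <- relevel(factor(df$Gifts), ref = "4")
mod <- glm(Respond ~ Gifts, data = df, family = binomial)
names(coef(mod)) #"(Intercept)" "Gifts1" "Gifts2" "Gifts3"
```

Hand-built dummies, as above, make the coefficients easy to address individually, which is convenient when we re-fit the model with optim below.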


    To analyze the result of our simple experiment, we decide to fit a logistic regression to the data, examining the effect of the 4 gifts on the probability that the prospect will respond (ignoring the amount donated). We are making Gift 4 the reference level. In this simple case we could use other methods, but a logistic regression will generalize to larger, more complex experiments.

    We can do this easily in R using the glm function. Here we quickly examine the coefficients that were fit via maximum likelihood, their standard errors, the log-likelihood of the data at the MLEs and the variance-covariance matrix of these estimates.

    #logistic regression via glm##############
    ##########################################
    mod_glm<-glm(Respond~Gifts1+Gifts2+Gifts3,data=data, family=binomial(link=logit))
    summary(mod_glm)
    #Coefficients:
    #             Estimate Std. Error  z value Pr(>|z|)
    #(Intercept) -2.82119    0.01636 -172.495  < 2e-16 ***
    #Gifts1      -0.57949    0.05888   -9.842  < 2e-16 ***
    #Gifts2      -0.45892    0.06303   -7.280 3.33e-13 ***
    #Gifts3      -0.38833    0.06322   -6.143 8.11e-10 ***
    logLik(mod_glm) #LL
    #'log Lik.' -19061.75 (df=4)
    vcov(mod_glm)
    #             (Intercept)        Gifts1        Gifts2        Gifts3
    #(Intercept)  0.0002674919 -0.0002674919 -0.0002674919 -0.0002674919
    #Gifts1      -0.0002674919  0.0034667138  0.0002674919  0.0002674919
    #Gifts2      -0.0002674919  0.0002674919  0.0039732911  0.0002674919
    #Gifts3      -0.0002674919  0.0002674919  0.0002674919  0.0039964383
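Since the coefficients are on the log-odds scale, we can pass them through the inverse logit to recover each gift's estimated response rate; with only these dummies in the model, the intercept maps to the raw response rate of the 4+ reference group. The numbers below are copied from the glm output above:

```r
#inverse logit of the fitted coefficients (values copied from the
#glm summary above) to recover each gift's response probability
inv_logit <- function(x) 1/(1+exp(-x))
b0 <- -2.82119 #intercept: log-odds of responding for the 4+ reference level
b <- c(Gifts1=-0.57949, Gifts2=-0.45892, Gifts3=-0.38833)
round(c(inv_logit(b0 + b), Gifts4=inv_logit(b0)), 4)
#Gifts1 Gifts2 Gifts3 Gifts4
#0.0323 0.0363 0.0388 0.0562
```

So all three gift offers are associated with a lower response rate than the 4+ reference level, which is what the negative coefficients say on the log-odds scale.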


    For reasons that will be useful later, we can also use the optim function, along with the log-likelihood of the logistic model, to achieve the same results. Here we minimize the negative log-likelihood and compare the estimates to those from glm, noting they are very nearly identical.

    #negative log-likelihood function
    logistic.lik<-function(par,dat)
    {
      xb<-par[1]*1+par[2]*dat$Gifts1+par[3]*dat$Gifts2+par[4]*dat$Gifts3 #linear predictor
      g_x<-1/(1+exp(-xb)) #P(Y=1 | betas)
      LL<-sum(dat$Respond*log(g_x)+(1-dat$Respond)*log(1-g_x)) #log-likelihood
      return(-LL) #returns the negative LL since by default optim minimizes
    }
    result<-optim(c(0,0,0,0),logistic.lik, dat=data, hessian=TRUE, method="BFGS")
    result$par #MLE of Bo, B1, B2 and B3
    #[1] -2.8211834 -0.5795474 -0.4588734 -0.3883401
    (-1)*result$value #maximized log-likelihood
    #[1] -19061.75
    varCov<-solve(result$hessian) #don't negate the hessian: we minimized the negative log-likelihood, so it is already the observed information
    varCov
    #             [,1]          [,2]          [,3]          [,4]
    #[1,]  0.0002674911 -0.0002674911 -0.0002674911 -0.0002674911
    #[2,] -0.0002674911  0.0034668663  0.0002674911  0.0002674911
    #[3,] -0.0002674911  0.0002674911  0.0039731245  0.0002674911
    #[4,] -0.0002674911  0.0002674911  0.0002674911  0.0039964732
    #As an example, the standard error of B2
    sqrt(varCov[3,3])
    #[1] 0.06303273
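As a further check that the optim fit reproduces glm's inference, we can form Wald z-statistics (estimate divided by standard error) from result$par and the diagonal of varCov. The numbers below are copied from the optim output above, and the resulting z values match the glm summary:

```r
#Wald z-statistics and p-values from the optim fit
#(estimates and variances copied from the output above)
est <- c(-2.8211834, -0.5795474, -0.4588734, -0.3883401) #result$par
se <- sqrt(c(0.0002674911, 0.0034668663, 0.0039731245, 0.0039964732)) #diag(varCov)
z <- est/se
round(z, 3)
#[1] -172.495   -9.843   -7.280   -6.143
p <- 2*pnorm(-abs(z)) #two-sided p-values, as in the glm summary
```

The largest p-value (for Gifts3) agrees with the 8.11e-10 reported by glm.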


    The next post will examine a linear hypothesis test, looking at the difference between Gifts=2 and Gifts=3.

