Wednesday, September 5, 2007

Man Vs Regression

I have been slinking my way through a new book, Super Crunchers. While the book itself has not sent down any truly illuminating beams of light from the heavens for me, it has brought a few interesting tidbits to light.

The basic premise is that statistical experimentation and data mining represent a massive leap in our ability to predict. For anyone who already knows a thing or two about regressions and how they are used, the book weaves its way through some predictable examples of statistically driven models outdoing the best judgment of experts. Where the book really starts to shine is when it puts the light on some more unusual cases of data crunching that resulted in superior predictive results.

One of the more notable examples given is a simple multivariate regression, taking into account just six factors, that was able to out-predict a team of 80+ political science and law experts on how the Supreme Court would vote on various cases. To add insult to injury, each expert only gave an opinion on cases in the field they worked in. A simple formula was able to out-predict a small army of experts in what most would consider to be a very humanistic field.

This "Regression Vs The Law and Political Experts" example is not an isolated incident. In a survey of some 120+ academic studies built around experts competing with a multivariate regression over a vast range of topics, it was found that in only 8 cases did the experts defeat the regression.

As a user of regression analysis, this has got me thinking about where I can apply this sort of statistical thinking beyond the limited engineering optimization experiments that I normally engage in. One example brought up in Super Crunchers of particular interest to me was when a statistician with minimal knowledge of the business put his multivariate regression up against expert buyers for a particular company. He found that his regression was vastly superior in predicting if and when an order would be filled, and if it would come in on budget. While I have little desire (yet) to look at how the company I work for, Ballard, does its buying, it does raise the question of whether other non-engineering aspects of my job might be better served by a regression analysis.

Most notably, I wonder if projecting project completion times might not benefit from the cold and objective analysis that a statistical approach can bring. If there is one theme that Super Crunchers brings up constantly, it is that humans tend to make poorer decisions when presented with more variables and more emotional attachment. Estimating project completion times meets both conditions for poor human judgment. That isn't to say that human judgment needs to be removed, just that predictive capability might be improved if human judgment is just one variable to consider when making a prediction, instead of being the only variable. By letting human judgment be a factor in a regression, you can take into account factors that might otherwise be overlooked, while at the same time leaving the final decision out of human hands.
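
A minimal sketch of that idea, in Python: an ordinary least squares fit in which the expert's own estimate is just one input alongside a couple of other project attributes. Everything here is an assumption made for illustration. The feature names (expert estimate, team size, requirement count), the helper functions, and every number in the data are invented, and nothing comes from the book or from any real project.

```python
# Illustrative sketch only: all data below is made up, not real project history.

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; features may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, targets):
    """Fit least squares with an intercept via the normal equations.

    Returns [intercept, coef_1, coef_2, ...].
    """
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, row):
    """Apply fitted coefficients to one row of features."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))


# Hypothetical past projects:
# (expert estimate in weeks, team size, number of requirements)
past_projects = [
    (4, 3, 10), (6, 2, 15), (8, 5, 30), (3, 2, 8),
    (10, 4, 40), (5, 3, 12), (12, 6, 45), (7, 4, 20),
]
# Hypothetical actual durations in weeks for those same projects.
actual_weeks = [6, 9, 13, 4, 17, 7, 19, 10]

coeffs = fit_ols(past_projects, actual_weeks)

# Forecast a new project; the expert's 9-week guess is one input among three.
forecast = predict(coeffs, (9, 4, 25))
print("coefficients:", [round(c, 3) for c in coeffs])
print("forecast (weeks):", round(forecast, 1))
```

With so few rows the fitted coefficients mean very little. The point is only the shape of the approach: the expert's guess goes in as a feature, and the regression decides how much weight it deserves.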

Super Crunchers has not exactly brought me any new and profound enlightenment, but it has opened my eyes a little to both the accuracy of a statistical, evidence-based approach and to the breadth of areas it can be used in.
