Stepwise selection of variables in regression is Evil
Stepwise variable selection is bad and dangerous, and you shouldn't do it. It increases the rate of false positives. It drops variables that should be in the model. It gives biased estimates of the regression coefficients. The problems are worse with smaller samples, higher correlation between the X variables, and weaker explanatory power of the model for y (i.e. lower R-squared).
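The author's own simulation code isn't shown here. The code below is a minimal sketch of the mechanism, assuming Python with NumPy. It uses a hypothetical forward-selection rule: repeatedly add the predictor with the largest |t| while that |t| exceeds 2. The data are simulated so that only the first three of ten equicorrelated predictors truly matter, each with coefficient 0.3. The sample size, correlation, coefficients and entry rule are all assumptions chosen for illustration. Whenever the first true variable survives selection, the code records its estimate in the final model. It also counts how many pure-noise variables get selected.

```python
import numpy as np


def ols(X, y):
    """OLS with an intercept; returns slope estimates and their t-statistics."""
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (n - Xd.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xd.T @ Xd)))
    return beta[1:], (beta / se)[1:]  # drop the intercept


def forward_stepwise(X, y, t_enter=2.0):
    """Hypothetical greedy rule: add the candidate with the largest |t|
    while that |t| exceeds t_enter. Returns selected column indices."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        best_j, best_t = None, 0.0
        for j in remaining:
            _, t = ols(X[:, selected + [j]], y)
            if abs(t[-1]) > best_t:  # t[-1] is the candidate's t-statistic
                best_j, best_t = j, abs(t[-1])
        if best_t < t_enter:
            break
        selected.append(best_j)
        remaining.remove(best_j)
    return selected


def simulate(n=50, p=10, n_true=3, beta_true=0.3, rho=0.5, reps=300, seed=1):
    rng = np.random.default_rng(seed)
    beta = np.zeros(p)
    beta[:n_true] = beta_true
    # Equicorrelated predictors: every pair of X columns has correlation rho.
    cov = np.full((p, p), rho) + (1 - rho) * np.eye(p)
    chol = np.linalg.cholesky(cov)
    kept = []      # estimates of beta[0] in runs where X0 survives selection
    false_pos = 0  # total pure-noise variables selected across all runs
    for _ in range(reps):
        X = rng.standard_normal((n, p)) @ chol.T
        y = X @ beta + rng.standard_normal(n)
        sel = forward_stepwise(X, y)
        false_pos += sum(j >= n_true for j in sel)
        if 0 in sel:
            b, _ = ols(X[:, sel], y)
            kept.append(b[sel.index(0)])
    return float(np.mean(kept)), len(kept) / reps, false_pos / reps


if __name__ == "__main__":
    mean_kept, keep_rate, fp_per_run = simulate()
    print(f"true coefficient 0.3; mean estimate when kept {mean_kept:.2f}; "
          f"X0 kept in {keep_rate:.0%} of runs; "
          f"{fp_per_run:.2f} noise variables selected per run")
```

A true variable tends to survive when its estimate happens to be large, so the average retained estimate should come out above the true 0.3. The three aggravating factors listed above correspond to shrinking `n`, raising `rho`, and lowering `beta_true` relative to the noise.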
In most of the cases where I see the method used, though (including recent examples distributed by so-called experts as part of their online teaching), the final model is used for interpretation, and I have no doubt the same is true of much published science. Rather than present a series of individual cases, I ran further simulations across a range of sample sizes, correlations between the X variables, and R-squared values. These show how the average bias in the estimated coefficients of the variables that remain in the model depends on those parameters.

Use theory-driven model selection if it's explanation you're after. Bayesian methods are a good complement to that, not least because they force you to think about the problem. For regression-based prediction, use lasso or elastic net regularization.
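In practice you would use an established implementation for the prediction case, such as glmnet in R or scikit-learn's `ElasticNetCV`, with the penalty strength chosen by cross-validation. The code below is a minimal sketch of what those methods compute, assuming only NumPy: cyclic coordinate descent for the elastic net objective on standardised predictors. The function name and the arguments `lam` (penalty strength) and `alpha` (L1/L2 mix, with `alpha=1` giving the lasso) are hypothetical. They loosely follow glmnet's lambda/alpha convention; scikit-learn calls the mix `l1_ratio`. Soft-thresholding sets some coefficients exactly to zero and shrinks the rest. This is how the lasso selects variables without the all-or-nothing inclusion that biases stepwise estimates.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; exactly zero if |z| <= gamma."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def elastic_net(X, y, lam, alpha=1.0, max_iter=1000, tol=1e-8):
    """Cyclic coordinate descent for
        (1/(2n)) * ||yc - Xs b||^2 + lam * (alpha*||b||_1 + (1-alpha)/2 * ||b||^2)
    where Xs has standardised columns and yc is centred y.
    lam is fixed by hand here (real use would pick it by cross-validation).
    Returns slopes on the original scale of X; the intercept is implied by centring."""
    n, p = X.shape
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mu) / sd   # each column: mean 0, mean square exactly 1
    r = y - y.mean()     # residual when all coefficients are zero
    b = np.zeros(p)
    for _ in range(max_iter):
        max_step = 0.0
        for j in range(p):
            old = b[j]
            # Correlation of column j with the partial residual (excluding j),
            # using (1/n) * Xs[:, j] @ Xs[:, j] == 1.
            z = Xs[:, j] @ r / n + old
            b[j] = soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            step = b[j] - old
            if step != 0.0:
                r -= step * Xs[:, j]  # keep the residual in sync with b
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b / sd
```

With `lam=0.0` this reproduces the ordinary least squares slopes. Increasing `lam` drives more coefficients to exactly zero.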