Adventures in Imbalanced Learning and Class Weight
Finally turning that stone
I set up a rudimentary imbalanced classification pipeline with scikit-learn's make_classification and DecisionTreeClassifier, and created an empirical version of the above plot, using class_sep as a proxy for the tradeoff curve. As for why the plot looks different for larger values of \(\alpha\), my hunch is that the tradeoff curve isn't symmetric, which lets the classifier achieve decent recall without sacrificing precision entirely. After publishing the post, it was pointed out to me that there are tutorials specifically demonstrating how inverse-proportion weighting (or stratified under-/oversampling, which is roughly equivalent) improves imbalanced classification performance.
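A pipeline along these lines can be sketched as follows. This is not my exact setup; the imbalance ratio, class_sep value, and the grid of \(\alpha\) values (the minority-class weight passed to class_weight) are illustrative assumptions:

```python
# Sketch (illustrative parameters): sweep the minority-class weight alpha
# and watch precision trade off against recall on an imbalanced dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import precision_score, recall_score

# 95/5 class imbalance; class_sep controls how separable the classes are
X, y = make_classification(
    n_samples=5000, weights=[0.95, 0.05], class_sep=1.0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for alpha in [1, 5, 20, 100]:  # assumed grid of minority-class weights
    clf = DecisionTreeClassifier(class_weight={0: 1, 1: alpha}, random_state=0)
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    print(f"alpha={alpha:>3}  "
          f"precision={precision_score(y_te, pred):.3f}  "
          f"recall={recall_score(y_te, pred):.3f}")
```

Setting class_weight={0: 1, 1: alpha} multiplies the loss contribution of minority-class samples by \(\alpha\); with alpha equal to the inverse class proportion (here about 19) this matches the inverse-proportion weighting the tutorials describe.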