
Weird Kaggle and the Superiority of Books


I recently entered a Kaggle competition to brush up on some modeling skills. The analysis problem is a pretty typical clinical prediction question. My final product is, well, "final product" is an extremely charitable description of it. But despite this it was a fascinating and worthwhile experience, full of interesting questions to ponder, such as "what is the fundamental difference between statistics and machine learning?" and "do they realize their evaluation metric is pretty silly?" The following are some notes from my project log.

Wait, you want me to maximize what?

As I read the rules, they started off normally.

The competition evaluates submissions based on concordance, which is just a scoring method that rewards the model for correctly ranking who will survive the longest (it doesn't matter if you are wrong about how much longer).

So to sum up, it seems like the statistics approach is more about carefully crafted models that you try to understand more deeply: you can apply background knowledge (and priors), and you can mathematically express theories and test them out.

The problem is that the loss function is no longer an independent calculation for each data point (it becomes non-decomposable), which is not supported by the library and likely comes with a big cost in terms of optimizing the computation.
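
To make the ranking idea concrete, here is a minimal sketch of a concordance calculation in plain Python. It is not the competition's actual scoring code. The function name `concordance_index`, its arguments, and the toy data are my own assumptions for illustration, and the handling of censoring and ties is deliberately simplified (pairs with tied survival times are skipped). Note that the score is built from pairs of patients, which is why it cannot be written as a sum of independent per-row losses.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Fraction of comparable pairs that the risk scores rank correctly.

    times:       observed survival or censoring time per patient
    events:      True/1 if the event was observed, False/0 if censored
    risk_scores: higher score means the model expects a shorter survival
    (All three names are hypothetical, chosen for this sketch.)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification: skip tied times, and skip pairs where the
        # shorter time is censored (we cannot tell who outlived whom).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # correctly ranked
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied prediction counts as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: four patients, all events observed.
times = [2, 4, 6, 8]
events = [1, 1, 1, 1]

perfect = concordance_index(times, events, [0.9, 0.7, 0.5, 0.1])
rescaled = concordance_index(times, events, [100, 3, 2, 1])
reversed_order = concordance_index(times, events, [0.1, 0.5, 0.7, 0.9])
```

In this sketch `perfect` and `rescaled` come out identical, because only the ordering of the risk scores matters, not how far apart they are. That is the "it doesn't matter if you are wrong about how much longer" property in action.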
