
I want to break some laws too


I made an automated pipeline to clean data. The idea started from a paper called MiniPile, which led me down a rabbit hole: if you’re careful about the data you use for training, you can break the scaling laws. Who knew being a data snob could be so rewarding?

[Figure: Model Performance vs Labels Seen (axis is in log scale)]

You can even see a kind of trend line in this second image if you squint hard enough. The good thing is that the authors of this paper proposed a method to select the hardest or the easiest examples without any human supervision. I was using conda because I wanted to replicate this as closely as possible, but I think a better idea for next time would be to just create an empty environment and install everything directly.
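To make the idea of picking hard or easy examples without labels more concrete, here is a minimal sketch of one common way to do it: cluster the example embeddings and treat distance to the nearest cluster centroid as a difficulty score. This is an assumption on my part about the general shape of such a method, not a reproduction of the paper's code. The function names (`kmeans`, `difficulty_scores`, `prune`), the choice of `k`, and the use of plain NumPy k-means are all hypothetical, and the embeddings are assumed to come from some pretrained model.

```python
import numpy as np


def kmeans(embeddings, k, n_iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, dim) array of centroids."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct random examples.
    init_idx = rng.choice(len(embeddings), size=k, replace=False)
    centroids = embeddings[init_idx].astype(float)
    for _ in range(n_iters):
        # Distance from every example to every centroid: shape (n, k).
        dists = np.linalg.norm(
            embeddings[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = embeddings[labels == j]
            if len(members) > 0:  # leave empty clusters where they are
                centroids[j] = members.mean(axis=0)
    return centroids


def difficulty_scores(embeddings, centroids):
    """Assumed difficulty proxy: distance to the nearest centroid.

    Examples close to a centroid are treated as 'easy' (prototypical),
    examples far from every centroid as 'hard'.
    """
    dists = np.linalg.norm(
        embeddings[:, None, :] - centroids[None, :, :], axis=2
    )
    return dists.min(axis=1)


def prune(embeddings, keep_fraction, k=2, keep="hard"):
    """Return indices of the examples to keep, with no labels involved."""
    centroids = kmeans(embeddings, k)
    scores = difficulty_scores(embeddings, centroids)
    n_keep = max(1, int(round(keep_fraction * len(embeddings))))
    order = np.argsort(scores)  # easiest first, hardest last
    return order[-n_keep:] if keep == "hard" else order[:n_keep]
```

Calling `prune(embeddings, 0.2, k=4, keep="hard")` would return the indices of the 20% of examples that sit farthest from any centroid. Whether keeping the hard or the easy end helps more is something to test on your own data rather than assume.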
