Why are your models so big? (2023)
I don’t understand why today’s LLMs are so large. Some of the smallest models getting coverage sit at 2.7B parameters, but even that seems pretty big to me. If you need generalizability, I totally get it. Things like chat applications require a high level of semantic awareness, and the model has to respond in a way its users find convincing. In cases where you want the LLM to produce something human-like, it makes sense that the brains would need to be a little juiced up.
