How large are large language models?
LLaMA was officially stated to use the Books3 source as a dataset. This is a very important dataset, one that has been pivotal in lawmaking regarding the training of AIs on large amounts of copyrighted and potentially pirated material.

From the Llama 3 paper:

> Empirically, we find that annealing (see Section 3.4.3) on small amounts of high-quality code and mathematical data can boost the performance of pre-trained models on key benchmarks.

Attempts to match GPT-3-level performance with downloadable weights were hindered by this, and genuinely, I do not think people understood that a raw model size comparable to 175B parameters was required.
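To make the raw-size point concrete, here is a back-of-the-envelope calculation of the storage needed just for the weights of a 175B-parameter model at common precisions (activations, optimizer state, and KV cache would all come on top of this):

```python
# Storage for the weights alone of a 175B-parameter model.
# fp16/bf16 is the usual distribution format; int8/int4 are
# common quantized formats for running on smaller hardware.
PARAMS = 175e9

for name, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)]:
    gb = PARAMS * bytes_per_param / 1e9
    print(f"{name:>9}: ~{gb:,.0f} GB")
```

Even at fp16 that is roughly 350 GB of weights, which is why "downloadable" did not mean "runnable" for most people at that scale.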
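As for the annealing recipe quoted above, the following is a minimal sketch of the idea under my own assumptions; the mixture weights, step count, and learning rate here are illustrative placeholders, not the actual Llama 3 values. The point is that the learning rate is decayed toward zero at the end of pre-training while the data mixture is upweighted toward a small pool of high-quality code and math documents.

```python
# Minimal sketch of an end-of-pretraining annealing phase.
# All numbers below are hypothetical, chosen only to illustrate the shape.

def annealing_lr(step: int, total_anneal_steps: int, peak_lr: float) -> float:
    """Linear decay from peak_lr down to 0 over the annealing phase."""
    frac = min(step / total_anneal_steps, 1.0)
    return peak_lr * (1.0 - frac)

# Hypothetical data mixture for the annealing phase: mostly the original
# pre-training mix, with high-quality code and math upweighted.
ANNEAL_MIXTURE = {
    "pretrain_web": 0.60,
    "high_quality_code": 0.25,
    "high_quality_math": 0.15,
}

if __name__ == "__main__":
    total_steps, peak_lr = 1000, 3e-5
    for step in (0, 250, 500, 750, 1000):
        print(f"step {step:4d}: lr = {annealing_lr(step, total_steps, peak_lr):.2e}")
```

Linear decay to zero is only one possible schedule; the essential design choice is that the final, low-learning-rate steps see a disproportionate share of the high-quality data.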