Domain Adaptation of Base Models + ShadowdarkQA Bench
Investigating the effects of continued pre-training for learning precise mechanical rules of TTRPGs.
I purchased a copy of Shadowdark RPG’s base rules, a 332-page document full of stylish formatting, thematic (read: challenging) fonts, and images. In particular, the ability to read in rules tables cleanly in markdown format, as opposed to a random series of numbers and symbols I would have to strip out of the dataset, was hugely valuable. The specific lecture cited a lot of facts that would commonly appear in pretraining, such as famous presidents (you would have a tough time scraping the whole internet and not knowing who George Washington was).