Using reinforcement learning and $4.80 of GPU time to find the best HN post
Convert expensive LLM prompts into fast, cheap fine-tuned models
OpenPipe is a managed fine-tuning service that makes it easy to build your own LLMs that achieve very high accuracy on a specific task.
For certain applications, such as chatbots or recommendation systems, you can proactively offer the user several potential outputs to choose from and use the option they prefer as your feedback signal.
There are some diamonds in the rough there that probably should have sparked more discussion, such as dealing with Machiavellian co-founders, pushing through burnout, and recording a signal from a single electron.