Self-hosting keeps your private data out of AI models
Last week, Slack’s users realized that under the company’s terms of service, their private data could be used to train artificial-intelligence models (Slack’s privacy principles, May 17, 2024). This came as a shock: chat messages convey sensitive company data, and LLMs (large language models, the category …
In creating the GitHub Copilot AI model, Microsoft and OpenAI blatantly ignored these license restrictions, and are now defending that practice as “fair use” in a class-action lawsuit. And if you do find out that your data has been misused to train an LLM, convincing a court to compensate you for the damage to your business will be an arduous and uncertain journey. When running everything on your own hardware isn’t practical, self-hosting key applications in a public cloud should be safe enough for most organizations: both technical and PR considerations make it highly unlikely that a hosting provider would break into a customer’s database to access data.