Unlocking a Million Times More Data for AI
How a new ARPANET-style program could solve the data accessibility problem
Unlike web-scraped data, these datasets are continuously validated for accuracy because the organizations that hold them depend on them, creating natural quality controls that make even a small fraction of this massive pool extraordinarily valuable for specialized AI applications.

These privacy-enhancing technologies (PETs) work in concert to create structured transparency: neither training data, model weights, nor user queries are ever exposed in unencrypted form to unauthorized parties, while cryptographic attestation maintains attribution chains.

Just as IBM, Bell Telephone, and Microsoft didn't necessarily have the right incentives to bring the nation's supercomputers together under the banner of TCP/IP, WWW, and HTTP, today's AI titans are naturally focused on their individual competitive advantages rather than on building shared infrastructure.
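To make the attestation idea concrete, here is a minimal sketch, not drawn from the program described in the article, of a hash-chained attribution record: each contribution is hashed, signed, and linked to the previous record, so provenance can be checked without exposing the underlying payload. The attest/verify helpers, the HMAC signing key, and the contributor names are hypothetical placeholders; a real PET stack would use asymmetric keys, trusted hardware, or similar machinery.

```python
# Hypothetical sketch of a hash-chained attestation record (not the article's design).
import hashlib
import hmac
import json

SIGNING_KEY = b"contributor-secret-key"  # placeholder; a real system would use asymmetric keys or a TEE

def attest(payload: bytes, contributor: str, prev_digest: str = "") -> dict:
    """Produce a record linking a data payload to its contributor and to the
    previous record, forming an attribution chain."""
    record = {
        "contributor": contributor,
        "payload_sha256": hashlib.sha256(payload).hexdigest(),
        "prev_record_sha256": prev_digest,
    }
    body = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(SIGNING_KEY, body, hashlib.sha256).hexdigest()
    record["record_sha256"] = hashlib.sha256(body).hexdigest()
    return record

def verify(record: dict) -> bool:
    """Check the record's signature without needing the payload itself."""
    body = json.dumps(
        {k: record[k] for k in ("contributor", "payload_sha256", "prev_record_sha256")},
        sort_keys=True,
    ).encode()
    expected = hmac.new(SIGNING_KEY, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])

# Chain two contributions: the second record points at the first.
r1 = attest(b"hospital imaging batch 001", "hospital-a")
r2 = attest(b"derived training shard 001", "lab-b", prev_digest=r1["record_sha256"])
assert verify(r1) and verify(r2)
```

Running the snippet chains two contributions and verifies both records; the verifier only ever sees hashes and signatures, never the contributors' raw data, which is the property the attribution chain is meant to preserve.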