Some of the things they'll be looking to find: How can chatbots be manipulated to cause harm? Will they share the private information we confide in them with other users? And why do they assume a doctor is a man and a nurse is a woman?
“This is why we need thousands of people,” said Rumman Chowdhury, lead coordinator of the mass hacking event planned for this summer's DEF CON hacker convention in Las Vegas that's expected to draw several thousand people.
“We need a lot of people with a wide range of lived experiences, subject matter expertise and backgrounds hacking at these models and trying to find problems that can then go be fixed,” Chowdhury added.
The idea of a mass hack caught the attention of U.S. government officials in March at the South by Southwest festival in Austin, Texas, where Sven Cattell, founder of DEF CON’s long-running AI Village, and Austin Carson, president of responsible AI nonprofit SeedAI, helped lead a workshop inviting community college students to hack an AI model.
Carson said those conversations eventually blossomed into a proposal to test AI language models following the guidelines of the White House's blueprint for an AI Bill of Rights, a set of principles to limit the impacts of algorithmic bias, give users control over their data, and ensure that automated systems are used safely and transparently.
There’s already a community of users trying their best to trick chatbots and highlight their flaws. Some are official “red teams” authorized by the companies to “prompt attack” the AI models to discover their vulnerabilities. Many others are hobbyists showing off humorous or disturbing outputs on social media until they get banned for violating a product’s terms of service.
Chowdhury, now the co-founder of AI accountability nonprofit Humane Intelligence, said it's not just about finding flaws but about figuring out ways to fix them.
“This is a direct pipeline to give feedback to companies,” she said. “It’s not like we’re just doing this hackathon and everybody’s going home. We’re going to be spending months after the exercise compiling a report, explaining common vulnerabilities, things that came up, patterns we saw.”