AI image generators get a new safety test for hidden toxic text in memes

Generative AI models can be prompted with just a few words to insert offensive or discriminatory text into the images they generate. Aditya Kumar from the SPRINT-ML Lab at the CISPA Helmholtz Center for Information Security is investigating how such outputs can be reliably prevented. To this end, he developed ToxicBench, a benchmark dataset that evaluates how well image-generating AI systems handle offensive inputs, along with a fine-tuning strategy to adapt the models accordingly.