New research from EPFL demonstrates that even the most recent large language models (LLMs), despite undergoing safety training, remain vulnerable to simple input manipulations that can cause them to behave in unintended or harmful ways.
Can we convince AI to answer harmful requests?