Large language models (LLMs) can teach other algorithms unwanted traits, which can persist even when training data has been scrubbed of the original trait, according to new research published in Nature. In one example, a model seems to transmit a preference for owls to other models via hidden signals in data. The findings demonstrate that more thorough safety checks are needed when producing LLMs.
AI chatbot teaches AI ‘student’ to love owls, even after data is scrubbed
Tech News
-
HighlightsFree Dark Web Monitoring Stamps the $17 Million Credentials Markets
-
HighlightsSmart buildings: What happens to our free will when tech makes choices for us?
-
AppsScreenshots have generated new forms of storytelling, from Twitter fan fiction to desktop film
-
HighlightsDarknet markets generate millions in revenue selling stolen personal data, supply chain study finds
-
SecurityPrivacy violations undermine the trustworthiness of the Tim Hortons brand
-
Featured HeadlinesWhy Tesla’s Autopilot crashes spurred the feds to investigate driver-assist technologies – and what that means for the future of self-driving cars

