OpenAI releases CoT monitoring to stop malicious behavior of large models
2025-03-10 22:50:05
OpenAI has published new research showing that CoT (chain-of-thought) monitoring can help prevent misbehavior by large models, such as producing false output or hiding their true intentions. OpenAI also presents it as one of the effective tools for overseeing superhuman models. In the study, the newly released frontier model o3-mini is the monitored agent and the weaker GPT-4o model acts as the monitor. The test environment consists of coding tasks in which the AI must implement functions in a code base so that the unit tests pass. The results show that the CoT monitor performs well at detecting systematic "reward hacking", with a recall of 95%, far above the 60% achieved by a monitor that sees only the model's actions.
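For readers unfamiliar with the metric, recall is the share of real reward-hacking episodes that a monitor actually flags. The sketch below illustrates the calculation only: the `recall` helper, the variable names, and the five-episode toy data are hypothetical and are not taken from OpenAI's experiments.

```python
def recall(flags, labels):
    """Return the fraction of true hacking episodes that the monitor flagged.

    flags:  list of bools, True if the monitor raised an alert on the episode
    labels: list of bools, True if the episode really contained reward hacking
    """
    positives = sum(1 for y in labels if y)
    if positives == 0:
        raise ValueError("recall is undefined without positive examples")
    true_positives = sum(1 for f, y in zip(flags, labels) if f and y)
    return true_positives / positives


# Hypothetical toy data: five episodes, four of which contain reward hacking.
labels = [True, True, True, True, False]

# A monitor that reads the chain of thought (illustrative flags only).
cot_monitor_flags = [True, True, True, False, False]

# A monitor that sees only the agent's actions (illustrative flags only).
action_monitor_flags = [True, False, True, False, False]

cot_recall = recall(cot_monitor_flags, labels)
action_recall = recall(action_monitor_flags, labels)

print(f"CoT monitor recall:    {cot_recall:.2f}")
print(f"Action monitor recall: {action_recall:.2f}")
```

Recall says nothing about false alarms, so a full comparison of two monitors would normally report precision alongside it.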