At 2 a.m. today, OpenAI open-sourced BrowseComp, a benchmark designed to test the web-browsing abilities of AI agents. The benchmark is very difficult: OpenAI's own GPT-4o and GPT-4.5 reach accuracies of only 0.6% and 0.9% respectively, close to zero, and even GPT-4o with browsing enabled scores just 1.9%. By contrast, OpenAI's latest agent model, Deep Research, reaches 51.5% accuracy and performs strongly in autonomous search, information synthesis, and accuracy calibration. (AIGC Open Community)