News list for "browsecomp"

OpenAI Open-Sources BrowseComp, Reshaping Agent Browsing Evaluation

At 2 am today, OpenAI open-sourced BrowseComp, a benchmark dedicated to testing the browsing capabilities of AI agents. The benchmark is very difficult: OpenAI's own GPT-4o and GPT-4.5 reach accuracies of only 0.6% and 0.9%, close to zero, and even GPT-4o with browsing enabled reaches only 1.9%. By contrast, OpenAI's latest agent model, Deep Research, reaches an accuracy of 51.5% and performs strongly in autonomous search, information integration, and accuracy calibration. (AIGC Open Community)

2025-04-10 20:46:09