Researchers developed a tool called "AgentBench" to benchmark large language models (LLMs) as agents. Nearly two dozen researchers from Tsinghua University, Ohio State University, and the University of California, Berkeley collaborated to create a method for measuring the capabilities of LLMs as real-world agents. LLMs such as OpenAI’s ChatGPT and Anthropic’s Claude…