A few months after launching Qwen3-VL, Alibaba has released a detailed technical report on the open multimodal model. The data shows the system excels at image-based math tasks and can analyze hours ...
Microsoft is adding new AI shopping tools to its Edge browser in the US. The built-in Copilot can now surface price comparisons, price histories, and cashback options right inside the browser. Users ...
The White House has reportedly put a hold on a draft executive order that would have let federal law override state-level AI regulations. According to Reuters, the draft called for the Department of ...
The latest pre-release version of Google's Gemini 2.5 Pro language model brings major improvements for front-end development and complex programming tasks. Google has launched an updated preview of ...
The company has hired Aaron Saunders, the former Chief Technology Officer of Boston Dynamics, as Vice President of Hardware Engineering—a move that strengthens its hardware expertise as it aims to ...
A new international study highlights major problems with large language model (LLM) benchmarks, showing that most current evaluation methods have serious flaws. After reviewing 445 benchmark papers ...
Anthropic co-founder and Head of Policy Jack Clark offers a look at how Silicon Valley AI leaders are thinking about the future of AI. To explain the moment AI systems develop situational awareness, ...
A new study analyzing 25 language models finds that most do not fake safety compliance, though not due to a lack of capability. Only a handful, including Claude 3 Opus, Claude 3.5 Sonnet, Llama 3 ...
Amazon has announced a major investment in its AI footprint for federal work, saying it will spend up to $50 billion to expand AI and supercomputing infrastructure for U.S. government agencies. The ...
It replaces the Gemini 2.5 Flash Image model from August and is built to handle complex scenes with consistent physics, render text accurately, and use real-time information as input. It also appears ...
New research from Anthropic shows how reward hacking in AI models can trigger more dangerous behaviors. When models learn to trick their reward systems, they can spontaneously drift into deception, ...
Salesforce's new CRMArena-Pro benchmark reveals major challenges for AI agents in business contexts. Even top models like Gemini 2.5 Pro manage just a 58 percent success rate on single turns. When the ...