
AI News: OpenAI Launches New Benchmark To Tackle AI Factuality

OpenAI is testing the limits of its models as it seeks to measure correctness with SimpleQA, a new open-source benchmark

Highlights

  • OpenAI has introduced a factuality benchmark for AI models
  • The benchmark is named SimpleQA and it is open source
  • The release is part of OpenAI’s sweeping expansion following its valuation boom

Artificial Intelligence (AI) firm OpenAI has introduced SimpleQA, a factuality benchmark. According to its description, the tool measures the ability of language models to answer short, fact-seeking questions. The new benchmark marks another attempt by the AI giant to restore trust in its flagship products.


SimpleQA Challenges Frontier Models

A common problem faced by AI platforms is training models to provide responses that are factually correct.

These models sometimes produce false outputs or give answers that are not backed by evidence, a problem generally referred to as “hallucination.” Consequently, users gravitate towards the models that give more accurate responses with fewer hallucinations.

To address this, OpenAI developed the SimpleQA benchmark, which measures the factuality of language models. The firm noted that measuring factuality is challenging, so SimpleQA focuses on short, fact-seeking queries. This design narrows the scope of the benchmark, but it also makes measuring factuality much more tractable.
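To illustrate why short, single-answer questions are easier to score, here is a minimal sketch that assumes a simple normalized string comparison. The function names and the three sample questions are hypothetical, and this is not the grader OpenAI uses for SimpleQA.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())


def is_correct(predicted: str, reference: str) -> bool:
    """Hypothetical grading rule: answers match after normalization."""
    return normalize(predicted) == normalize(reference)


def accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of questions whose predicted answer matches the reference."""
    if not references:
        return 0.0
    matches = sum(is_correct(p, r) for p, r in zip(predictions, references))
    return matches / len(references)


# Toy data for illustration only; these are not SimpleQA questions.
references = ["Paris", "1969", "Marie Curie"]
predictions = ["paris.", "1968", "Marie  Curie"]
score = accuracy(predictions, references)
print(f"accuracy: {score:.2f}")
```

Because each question has one short reference answer, a grade reduces to a yes-or-no comparison, which would be much harder to do for long, open-ended responses.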

The team behind the benchmark focused on high correctness, diversity, and a good researcher experience (UX). Unlike earlier benchmarks such as TriviaQA, which have become saturated, OpenAI’s SimpleQA was built to challenge frontier models, including GPT-4o, which currently scores less than 40%. While building the dataset, the team ensured that each question met certain criteria.

“As a final verification of quality, we had a third AI trainer answer a random sample of 1,000 questions from the dataset. We found that the third AI trainer’s answer matched the original agreed answers 94.4% of the time, with a 5.6% disagreement rate,” the ChatGPT maker wrote.
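The figures in the quote can be checked with a short calculation. This is a minimal sketch: the sample size of 1,000 comes from the quote, while the count of 944 matching answers is derived here from the 94.4% figure and is not a number OpenAI reported directly. The function name is hypothetical.

```python
def agreement_rate(matches: int, total: int) -> float:
    """Share of sampled questions where the verifier matched the agreed answer."""
    if total <= 0:
        raise ValueError("total must be positive")
    return matches / total


sample_size = 1000  # from the quoted statement
matching = 944      # assumption: implied by 94.4% of 1,000 questions

rate = agreement_rate(matching, sample_size)
disagreement = 1 - rate
print(f"{rate:.1%} agreement, {disagreement:.1%} disagreement")
```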


OpenAI’s Valuation Surges to $157 Bln

At the beginning of October, the AI firm saw its valuation reach $157 billion after it secured $6.6 billion in funding from investors. These investors include Thrive Capital, which led the round, Microsoft Corporation, and AI giant NVIDIA. The ascent of the Sam Altman-led firm hinges on its plans to bolster its position in frontier AI research.

A week after raising the funds, the firm revealed that it is opening new offices in the United States, Europe, and Asia, marking another stride in its global expansion.

The offices will be located in New York City, Seattle, Paris, Brussels, and Singapore, adding to those already in San Francisco, London, Dublin, and Tokyo. The launch of SimpleQA is part of a product expansion push that followed the spike in OpenAI’s valuation.

 

Godfrey Benjamin

Benjamin Godfrey is a blockchain enthusiast and journalist who relishes writing about the real-life applications of blockchain technology and innovations to drive general acceptance and worldwide integration of the emerging technology. His desire to educate people about cryptocurrencies inspires his contributions to renowned blockchain-based media and sites. Benjamin Godfrey is a lover of sports and agriculture. Follow him on X and LinkedIn.

Why trust CoinGape: CoinGape has covered the cryptocurrency industry since 2017, aiming to provide informative insights to our readers. Our journalists and analysts bring years of experience in market analysis and blockchain technology to ensure factual accuracy and balanced reporting. By following our Editorial Policy, our writers verify every source, fact-check each story, rely on reputable sources, and attribute quotes and media correctly. We also follow a rigorous Review Methodology when evaluating exchanges and tools. From emerging blockchain projects and coin launches to industry events and technical developments, we cover all facets of the digital asset space with unwavering commitment to timely, relevant information.
Investment disclaimer: The content reflects the author’s personal views and current market conditions. Please conduct your own research before investing in cryptocurrencies, as neither the author nor the publication is responsible for any financial losses.
Ad Disclosure: This site may feature sponsored content and affiliate links. All advertisements are clearly labeled, and ad partners have no influence over our editorial content.
