OpenAI has announced the launch of IndQA, a new benchmark designed to measure how well artificial-intelligence models understand and reason about Indian languages, culture and everyday life. The benchmark covers questions in twelve languages and spans ten cultural domains, reflecting a significant expansion of evaluation criteria beyond existing multilingual tests.
IndQA comprises 2,278 questions, each authored by domain experts from India. The twelve languages covered are Hindi, Bengali, Tamil, Telugu, Gujarati, Marathi, Odia, Malayalam, Punjabi, Kannada, Hinglish and English. The ten cultural domains are architecture and design, arts and culture, everyday life, food and cuisine, history, law and ethics, literature and linguistics, media and entertainment, religion and spirituality, and sports and recreation.
According to OpenAI, the need for this benchmark arises from limitations in current multilingual evaluation tools. Many of the existing benchmarks focus on translation or multiple-choice formats and have become saturated, meaning top models cluster near perfect scores. IndQA aims to fill the gap by presenting reasoning-heavy, culturally grounded questions that challenge models’ deeper understanding within Indian contexts.
The creation of IndQA involved 261 domain experts across India, including journalists, linguists, artists, editors and academics specialising in regional languages and culture. Each question pairs a native-language prompt with an English translation for auditability, an ideal answer and a detailed rubric specifying the evaluation criteria. Models are graded on how well their answers satisfy the criteria the experts defined.
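To make the grading setup concrete, here is a minimal Python sketch of how rubric-based scoring of this kind might be wired up. The data layout, the `weight` field and the pluggable `judge` checker are illustrative assumptions based on the description above, not OpenAI's published schema or implementation.

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    description: str  # what a correct answer must contain, per the expert
    weight: float     # relative importance; weighting is an assumption here

@dataclass
class IndQAItem:
    prompt_native: str         # question in the original Indian language
    prompt_english: str        # English translation kept for auditability
    ideal_answer: str          # expert-written reference answer
    criteria: list[Criterion]  # the detailed grading rubric

def grade(answer: str, item: IndQAItem, judge) -> float:
    """Score an answer as the weighted share of rubric criteria it satisfies.

    `judge(answer, criterion_description, ideal_answer)` stands in for
    whatever checker (human or model-based) decides whether one criterion
    is met; it returns True or False.
    """
    total = sum(c.weight for c in item.criteria)
    met = sum(c.weight for c in item.criteria
              if judge(answer, c.description, item.ideal_answer))
    return met / total if total else 0.0
```

The design point worth noting is that credit is earned per criterion, so a partially correct answer is rewarded rather than forced into an all-or-nothing match against the ideal answer.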
OpenAI points out that India is a critical market for its technologies, ranking as the second-largest market for ChatGPT by user base. The company states that for AI systems to serve all of humanity they must work well across languages and cultural contexts, and India’s diverse linguistic environment makes it a logical focus area.
Early performance results shared by OpenAI indicate that even advanced models have substantial room for improvement on IndQA. While the benchmark is not designed as a direct leaderboard across languages, initial results show that newer models such as GPT-5 outperform previous versions but still score in the mid-30 percent range on this dataset.
For AI developers and researchers, IndQA offers a fresh tool for measuring progress in multilingual and cultural understanding, rather than focusing only on English or broadly defined multilingual tasks. The benchmark’s culturally grounded questions encourage model training and evaluation that reflect local-language reasoning, not just translation capability.
From a strategic perspective, the launch of IndQA signals OpenAI’s ongoing commitment to the Indian ecosystem. By building region-specific benchmarks and engaging local expertise, the company is positioning its technologies for markets that traditional benchmarks have not sufficiently addressed. Analysts view this as part of a broader shift towards localisation and cultural alignment in AI development.
However, challenges remain. The varying linguistic features, dialects, scripts and cultural contexts across Indian languages mean that model performance will likely vary significantly by language and domain. Moreover, because the benchmark was intentionally built from questions that prior top models struggled with, low scores partly reflect this adversarial construction and should be interpreted accordingly rather than compared directly with scores on unfiltered benchmarks.
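For readers who want to picture that filtering step, the sketch below keeps a candidate question only if every strong reference model scores poorly on it. The cutoff value, the all-models rule and the function names are hypothetical illustrations; OpenAI has not published this procedure as code.

```python
def adversarially_filter(candidates, reference_models, score, keep_below=0.5):
    """Keep only candidate questions that strong existing models still miss.

    `reference_models` are callables mapping a native-language prompt to an
    answer string; `score(answer, item)` returns a rubric score in [0, 1].
    The 0.5 cutoff and the every-model rule are assumptions for illustration.
    """
    kept = []
    for item in candidates:
        results = [score(model(item.prompt_native), item)
                   for model in reference_models]
        if max(results) < keep_below:  # every reference model struggled
            kept.append(item)
    return kept
```

Combined with the grader sketched earlier, `score` could be `lambda a, q: grade(a, q, judge)`. The practical consequence is the one noted above: absolute scores on the surviving questions are depressed by construction.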
In the Indian AI ecosystem this benchmark is likely to drive both academic and industry interest in improving the multilingual and cultural competence of models. Start-ups, research labs and global players may use IndQA to benchmark their systems, identify gaps and refine training data or architectures accordingly. Over time it may contribute to AI products and services that better understand Indian languages, cultural references and domain-specific reasoning.
In the wider global context, IndQA could serve as a model for similar region-oriented benchmarks, reflecting the need to move beyond English-centred evaluation. As AI adoption increases in emerging markets, culturally aware benchmarks may become standard practice for verifying model readiness and fairness.
In summary, by launching IndQA, OpenAI has introduced a new benchmark that emphasises Indian languages and culture as integral to AI evaluation. The initiative provides a pathway for measuring model competence beyond translation, focusing instead on reasoning, context and cultural nuance. As AI models evolve, IndQA will likely play a key role in guiding how systems are built and assessed for diverse global audiences.