Who Owns AI? The Global Battle Over Data, Copyright and Sovereignty

The world’s most powerful artificial intelligence systems are being built on a question no one has fully answered yet: who owns AI?

For technology companies, AI is the outcome of engineering, compute infrastructure, model design and billions of dollars of investment. For publishers, artists, authors, researchers and media companies, it is also the outcome of years of human-created work that has been scraped, processed and absorbed into large language models. For governments, the question goes even further. If AI systems shape knowledge, public discourse, consumer behaviour and economic decision-making, should they be controlled by a handful of global companies, or should nations have sovereign control over the intelligence layer of their own digital economies?

What began as a copyright dispute is now becoming a much larger battle over data ownership, economic power and digital sovereignty.

In India, the issue came into sharp focus after Asian News International filed a copyright case against OpenAI in the Delhi High Court in 2024. ANI alleged that OpenAI used its copyrighted news content without authorisation to train AI systems and also raised concerns around false attribution. OpenAI has denied wrongdoing and has argued that its practices are lawful. According to court documents reported by legal observers, OpenAI also submitted that ANI’s domain had been blocklisted from future training as of October 2024.

The case is significant because it is among India’s first major legal tests of generative AI, copyright and jurisdiction. It asks questions that go beyond one news agency or one technology company. Can publicly accessible content be used to train commercial AI models without permission? Does storing, processing or reproducing copyrighted material inside an AI system amount to infringement? If a foreign AI company has no servers in India but its product is available to Indian users, can Indian courts claim jurisdiction?

The Delhi High Court has reserved judgment in the matter, according to recent legal commentary. Whatever the outcome, the case is likely to influence how Indian publishers, platforms and AI companies think about licensing, scraping, attribution and model accountability.

India is not alone. In the United States, The New York Times sued OpenAI and Microsoft in 2023, alleging that millions of its articles were used without permission to train AI systems that now compete with the newspaper’s own products. OpenAI has rejected the allegations and called the case baseless. In a 2025 statement on a related data preservation dispute, OpenAI said it “strongly” believed the demand to retain user content was an overreach and argued that it risked user privacy without helping resolve the lawsuit.

The case has become one of the defining copyright battles of the AI era because it involves not just training data, but the economic future of journalism. At the World News Media Congress, New York Times publisher A.G. Sulzberger described unauthorised AI use of creative content as “brazen theft” and warned that the issue threatens not only news but the wider creative economy, including books, music, research and entertainment.

The conflict between publishers and AI companies widened further in May 2026 when CNN filed a lawsuit against AI search company Perplexity in a New York federal court. According to Reuters, CNN alleged that Perplexity unlawfully copied and reused thousands of CNN stories, videos and images to power its AI services and distribute similar or identical content in competition with CNN. Perplexity has argued in response to publisher complaints that facts cannot be copyrighted, while publishers contend that the expression, structure and reporting behind those facts are protected.

This is the heart of the AI ownership debate. AI companies argue that models do not simply copy content. They learn patterns, relationships and language structures to generate new outputs. Many developers compare this process to human learning. Publishers and creators reject that analogy. They argue that machines trained at industrial scale are not like readers learning from a newspaper or a book. They are commercial systems that can absorb the value of copyrighted work, generate substitutes and weaken the market for original content.

The courts are beginning to draw early lines, but those lines remain uneven.

In 2025, U.S. courts delivered important rulings in cases involving Thomson Reuters, Anthropic and Meta. In the Thomson Reuters case, a federal judge rejected a fair use defence by Ross Intelligence, which had used Westlaw material to build a competing AI legal research product. Reuters reported that the ruling was one of the first major decisions on fair use in an AI-related copyright dispute.

Other decisions were more favourable to AI developers, at least in part. In Bartz v. Anthropic, Judge William Alsup found that training AI on legally obtained books could be considered highly transformative, but he did not excuse the alleged use of pirated copies. In Kadrey v. Meta, a different judge also treated AI training as potentially fair use under specific circumstances. The emerging message is not that AI training is automatically legal or illegal. It is that sourcing, market harm, outputs and competitive substitution will matter.

That distinction is crucial for enterprises and marketers. The debate is no longer theoretical. Brands are using generative AI for campaign ideation, content production, customer service, search, media planning, design and analytics. If the legal foundation of these systems is uncertain, enterprises may increasingly ask vendors where their models were trained, what data was used, whether copyrighted content was licensed, and how risks are allocated in contracts.

For the media industry, the stakes are even higher. Publishers depend on traffic, subscriptions, licensing and advertising. If AI answer engines summarise or reproduce reporting without sending users back to the original source, the economic model of journalism becomes weaker. This is why the AI copyright battle is also a battle over distribution. The question is not only whether content was used in training. It is whether AI systems will become the new interface through which audiences consume news, research and knowledge.

This is where copyright merges with sovereignty.

Nvidia CEO Jensen Huang has been one of the most visible voices arguing that countries need sovereign AI capabilities. At the World Governments Summit in Dubai in 2024, Huang said, “Every country needs to own the production of their own intelligence.” He also warned, “You cannot allow that to be done by other people.” His argument was not limited to national pride. It was about economic competitiveness, culture, language and control over infrastructure.

Huang later described AI as infrastructure, comparable to electricity and the internet. At Computex 2025, according to Nvidia, he said, “AI is now infrastructure, and this infrastructure, just like the internet, just like electricity, needs factories.” In the AI economy, those factories are data centres, chips, cloud systems, models and software stacks.

That framing explains why governments are no longer treating AI as just another software product. AI is becoming a strategic layer of national capability. The countries that control compute, data, models and deployment infrastructure will have greater influence over education, healthcare, defence, manufacturing, language technologies and digital public services.

India has already moved in this direction. The IndiaAI Mission, approved in 2024, was allocated more than Rs 10,300 crore over five years to build AI infrastructure, support startups, improve data availability and develop indigenous AI capabilities. According to the Press Information Bureau, India’s common compute capacity had crossed 34,000 GPUs by May 2025. A February 2026 government update said the IndiaAI Compute Portal provides access to more than 38,000 GPUs and 1,050 TPUs at subsidised rates.

The objective is clear. India does not want to remain only a consumer of foreign AI models. It wants to build AI systems that understand Indian languages, local contexts, public priorities and sectoral needs. The government has also backed indigenous foundation model initiatives focused on India-specific data.

This is where the question “who owns AI?” becomes deeply political. If a model is trained mostly on Western data, hosted on foreign cloud infrastructure, governed by foreign laws and controlled by foreign companies, can it fully reflect the needs of Indian users, Indian businesses and Indian institutions? If Indian enterprises build customer experiences, marketing automation and decision systems on such models, how much strategic dependency are they creating?

The European Union has taken a regulatory route. The EU AI Act is the world’s first comprehensive AI legal framework, according to the European Commission. Its rules for general-purpose AI models began to apply in August 2025. These rules require providers of such models to maintain documentation, comply with copyright-related obligations and publish summaries of training content. The European Commission has also developed a template for providers to summarise the data used to train their models.

For publishers and rights holders, transparency is a minimum demand. Without visibility into training data, they cannot know whether their content was used. Without licensing frameworks, they cannot be compensated. Without attribution, their role in creating value disappears.

For AI companies, however, full transparency is difficult. Training datasets are massive, often assembled from multiple sources and filtered through complex pipelines. Companies also argue that revealing too much about training data could expose trade secrets or create security risks. This creates a further tension. Regulators want accountability, companies want flexibility, and creators want control.

There is no easy settlement ahead.

A licensing-led model may emerge for premium content, particularly in news, legal, financial and scientific domains. Some publishers have already signed deals with AI companies, while others have chosen litigation. Open models may rely more heavily on curated datasets, public domain material and consent-based repositories. Governments may push for national datasets and sovereign compute. Enterprises may demand indemnity and auditability from AI vendors.

Yet the deeper question will remain unresolved: is AI a product, a platform, a public utility, or a knowledge system built on the collective output of society?

If AI is treated only as a product, ownership will sit largely with companies. If it is treated as infrastructure, governments will demand control. If it is treated as a derivative of human creativity, creators and publishers will demand rights and compensation. If it is treated as a public knowledge layer, the world may need new governance models that go beyond traditional copyright law.

For marketers, publishers and enterprises, this debate will shape the next phase of the AI economy. The first phase was about adoption. The second is about accountability. The third will be about ownership.

The answer to who owns AI may not be one group. Companies may own models. Creators may own protected expression. Governments may control national infrastructure. Users may demand rights over their data. Enterprises may insist on governance over deployment.

But one thing is clear. AI is no longer just a technology story. It is a power story.

And the battle over who owns AI has only just begun.

Disclaimer: All data points and statistics are attributed to published research studies and verified market research. All quotes are either sourced directly or attributed to public statements.