OpenAI Seeks Contractors’ Real Work Examples to Measure AI Model Capabilities

OpenAI has reportedly begun asking third-party contractors to submit real examples of work they performed in previous or current jobs as part of an effort to train its next-generation artificial intelligence models and evaluate their performance. The initiative reflects a broader industry push to move beyond synthetic or publicly available data and towards more representative, real-world professional workflows.

According to information shared with contractors, the company is seeking actual job deliverables such as documents, spreadsheets, presentations, images or code, along with the original instructions or briefs that guided the work. The goal is to create benchmarks that reflect how humans complete complex tasks, which can then be used to assess how closely AI systems are able to replicate or support similar work processes.

The programme is being conducted with the support of data-labelling and workforce partner Handshake AI. Contractors participating in the exercise are required to submit authentic work samples rather than recreated or simulated examples. These materials are expected to represent genuine professional output across a range of roles and industries.

OpenAI has instructed contributors to remove or anonymise any sensitive or personally identifiable information before uploading files. Contractors have reportedly been directed to use internal tools designed to help scrub confidential data from documents. The company has emphasised that responsibility for ensuring submissions are free of proprietary or identifying information lies with the individual contributor.
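The details of OpenAI's internal scrubbing tools have not been disclosed. As a purely illustrative sketch, the kind of automated redaction pass described above can be approximated with pattern matching; the patterns and placeholder format below are assumptions, not a description of any actual tool:

```python
import re

# Illustrative only: a minimal redaction pass, NOT OpenAI's internal tool.
# The patterns below are assumed examples covering common identifier formats.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub(text: str) -> str:
    """Replace each pattern match with a bracketed placeholder label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

sample = "Contact Jane at jane.doe@example.com or +1 (555) 010-1234."
print(scrub(sample))
# → Contact Jane at [EMAIL REDACTED] or [PHONE REDACTED].
```

Regex-based scrubbing like this illustrates why the article's experts flag residual risk: pattern matching catches structured identifiers but cannot judge whether, say, a client name or project figure is confidential, which is why responsibility still falls on the contributor.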

The effort is understood to be part of OpenAI’s focus on developing AI agents that can carry out multi-step tasks rather than simply generate text or respond to prompts. By analysing how humans complete real assignments, the company aims to improve its models’ ability to reason, plan and execute work in a structured and contextual manner.

Examples shared internally illustrate the type of material being collected. In one case, a contractor submitted a real travel itinerary prepared for a client as part of a professional role, along with the original task description. Similar submissions reportedly span fields such as administration, research, creative services and technical work.

OpenAI has not publicly disclosed how many contractors are involved in the programme or how the collected data will be weighted within its broader training and evaluation processes. The company has also not confirmed whether the materials will be used directly for model training, benchmarking, or internal testing.

The initiative has drawn attention from legal and data governance experts, particularly around issues of confidentiality and intellectual property. While contractors are asked to remove sensitive details, some observers note that determining what constitutes confidential information can be subjective, especially when dealing with complex professional documents.

Legal analysts have pointed out that placing the burden of anonymisation on individual contributors may carry risks if proprietary information is inadvertently shared. In such cases, both the contractor and the platform receiving the data could potentially face legal challenges related to breaches of nondisclosure agreements or misuse of trade secrets.

The move also comes amid ongoing scrutiny of how artificial intelligence companies source and use training data. In recent years, developers have faced criticism and legal action over the use of copyrighted material and the lack of transparency around data collection practices. As models become more advanced, the demand for higher-quality and more specialised data has increased.

Industry analysts say the shift toward real professional work samples reflects the limitations of traditional training datasets, which often rely on publicly available text or artificially generated examples. These sources may not adequately capture the complexity, decision-making and contextual judgement involved in real workplace tasks.

The rise of AI agents capable of navigating software tools, managing workflows and completing end-to-end assignments has further intensified the need for realistic benchmarks. Developers are increasingly focused on evaluating whether AI systems can perform at levels comparable to human workers in practical settings.

At the same time, the programme highlights the growing role of contractors in the AI development ecosystem. Data labelling, evaluation and content generation have become a significant segment of the technology workforce, with thousands of contributors supporting model development across the industry.

As competition among AI companies intensifies, access to high-quality data is emerging as a key differentiator. Initiatives like OpenAI’s contractor outreach underscore how firms are experimenting with new ways to measure progress and improve model reliability while navigating ethical and legal considerations.

The long-term implications of using real work samples remain to be seen. While such data could help improve AI performance and usefulness, it also raises questions about data ownership, consent and the boundaries between human labour and machine learning.

For now, OpenAI’s approach reflects the evolving strategies being adopted across the artificial intelligence sector as companies seek to build systems that operate more effectively in real-world business environments. As these efforts continue, debates around transparency, responsibility and data governance are likely to remain central to the industry’s development.