

Key Highlights & Major Trends:
1. Generative AI Mainstream Adoption:
- 1 in 8 global workers uses AI monthly; 90% of growth occurred in the last 6 months.
- AI-native apps now generate billions in annual revenue.
2. Exponential Scaling:
- Model metrics (cost, intelligence, context windows) improve >10x YoY.
- The duration of human tasks that models can complete doubles every 7 months (e.g., from 1 second in 2019 to 1 hour in 2025; worked sketch after this list).
- Context windows: 2–8k tokens (2023) → ~1M tokens (2025).
3. Economic Paradox:
- Revenue Growth: OpenAI and Anthropic are growing revenue by >$1B per year.
- Cost Challenges: Frontier model training costs near $500M; models become obsolete in 3 weeks due to open-source competition.
4. Reasoning Models:
- Models now "think before speaking," leveraging reinforcement learning and reward models.
- Smaller models (e.g., 3B params) outperform larger ones (70B) given "thinking time."
5. Open vs. Closed Source:
- Open-source (Meta, Mistral, Alibaba) now competitive with proprietary models.
- Example: DeepSeek-VL (open, <$10M training) rivals GPT-4 ($100M+ training).
6. AI in Professions:
- Copilots/agents handle high-value tasks in engineering, law, healthcare, and creative fields.
- Example: LLMs outperform doctors in diagnostics and solve geometry problems better than 99.999% of humans.
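As a rough illustration of the 7-month doubling claim in point 2, here is a minimal Python sketch. The clean exponential form and the 1-second 2019 baseline are simplifying assumptions taken from the bullet above, not measured data; it only shows how that rate compounds from seconds to roughly an hour over about seven years.
```python
def task_duration_seconds(months_since_2019: float,
                          baseline_seconds: float = 1.0,
                          doubling_months: float = 7.0) -> float:
    """Task length a model can handle, assuming pure exponential growth."""
    return baseline_seconds * 2 ** (months_since_2019 / doubling_months)

# ~7 years of 7-month doublings turns a 1-second task into a ~1-hour task.
for years in (0, 2, 4, 6, 6.9):
    secs = task_duration_seconds(years * 12)
    print(f"{years:>4} years -> ~{secs:,.0f} s ({secs / 3600:.2f} h)")
```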
Technical Innovations
1. Models & Architecture
- Mixture-of-Experts (MoE):
- Efficient routing (e.g., DeepSeek v2/v3, Mixtral, rumored GPT-4).
- Balances active vs. total parameters (e.g., Llama 4 Scout: 17B active, 109B total; routing sketch after this list).
- Tokenization Challenges:
- Root cause of LLM weaknesses (spelling, arithmetic, non-English support).
- Solution: Byte-level tokenization (e.g., Byte Latent Transformer; toy comparison after this list).
- Multimodality & Robotics:
- Video: Veo hits "ChatGPT moment."
- Robotics: Generalized models (e.g., Physical Intelligence) perform novel tasks in unseen environments.
- DNA Models: Evo 2 enables mutation prediction and genome design.
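A minimal sketch of the top-k routing idea behind MoE layers, referenced above. The expert count, dimensions, and random weights are hypothetical, not DeepSeek's, Mixtral's, or Llama 4's actual configuration; the point is only why total parameters (all experts) can far exceed active parameters (the top-k experts that run for a given token).
```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, N_EXPERTS, TOP_K = 64, 8, 2   # only TOP_K experts are "active" per token
experts = [rng.standard_normal((D_MODEL, D_MODEL)) * 0.02 for _ in range(N_EXPERTS)]
router = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.02   # learned gating matrix

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts and mix their outputs by gate weight."""
    logits = x @ router                            # (tokens, N_EXPERTS)
    out = np.zeros_like(x)
    for t, token in enumerate(x):
        top = np.argsort(logits[t])[-TOP_K:]       # indices of the k highest-scoring experts
        gates = np.exp(logits[t][top])
        gates /= gates.sum()                       # softmax over the selected experts only
        for g, e in zip(gates, top):
            out[t] += g * (token @ experts[e])     # only k of N expert matmuls run per token
    return out

tokens = rng.standard_normal((4, D_MODEL))
print(moe_layer(tokens).shape)                     # (4, 64)
```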
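And a toy comparison of subword vs. byte-level views of the same word, illustrating why tokenization gets blamed for spelling and counting failures. The subword split below is made up for the example, not output from a real BPE tokenizer.
```python
word = "strawberry"

# A subword tokenizer may emit opaque chunks, so individual letters are hidden:
toy_subwords = ["straw", "berry"]        # hypothetical split, not a real BPE vocabulary

# A byte-level model (the approach named above) sees every character as an input unit:
byte_tokens = list(word.encode("utf-8"))

print(toy_subwords)                      # ['straw', 'berry']
print(byte_tokens)                       # [115, 116, 114, 97, 119, 98, 101, 114, 114, 121]
print("r's visible at byte level:", sum(b == ord("r") for b in byte_tokens))   # 3
```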
2. Post-Training & Reasoning
- Shift from Pre-training:
- Data scarcity ("fossil fuel of AI") limits scaling; focus shifts to synthetic data, agents, and inference-time compute.
- Reasoning Techniques:
- Best-of-N: Generate multiple responses and select the best via a verifier (sketch after this list).
- Search Algorithms: Beam search, tree search for complex problem-solving.
- Verifiers & Reward Models:
- Critical for safety/accuracy (e.g., procedural verifiers for code/math, learned verifiers for generalization).
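A minimal Best-of-N sketch, as referenced above. `generate` and `verifier_score` are hypothetical stand-ins for an LLM sampling call and a reward model or procedural checker (e.g., running unit tests); beam and tree search extend the same idea by scoring partial reasoning steps instead of only finished answers.
```python
import random

random.seed(0)

def generate(prompt: str) -> str:
    """Stand-in for sampling one candidate answer from an LLM."""
    return f"{prompt} -> candidate {random.randint(0, 9)}"

def verifier_score(prompt: str, candidate: str) -> float:
    """Stand-in for a reward model or procedural verifier (e.g., run the tests)."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    """Sample n candidates and keep the one the verifier scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: verifier_score(prompt, c))

print(best_of_n("Solve 12 * 17"))
```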
3. Market Dynamics
Investment & Revenue
- VC Funding:
- 10% of 2024 VC dollars ($33B) went to foundation model companies.
- 2025: >50% of all VC funding targets AI.
- Revenue Leaders:
- OpenAI: $3.7B projected 2025 revenue (1/3 from agents).
- Anthropic: $2B annualized revenue (Q1 2025).
- AI Apps: >$1.2B ARR from 20+ companies (e.g., Midjourney, Cursor, ElevenLabs).
4. Competitive Landscape
- OpenAI vs. Anthropic:
- OpenAI: 73% of revenue from ChatGPT subscriptions (consumer-focused).
- Anthropic: 83% of revenue from the API (B2B-focused).
- Incumbent Disruption:
- Startups beat giants despite advantages (e.g., Cursor vs. GitHub Copilot, Krea vs. Adobe Firefly).
- Risks:
- Novelty Effect: Revenue spikes, then drops (e.g., Lensa AI).
- High Burn: Companies spend $50M+/year on training without product-market fit.
5. GPU Ecosystem
- NVIDIA Dominance: Stock up 1,440% in 5 years; inference tokens surge 10x YoY.
- GPU Clouds (e.g., CoreWeave): Focus on raw compute access vs. bundled services (unlike AWS/GCP).
Use Cases & Applications
1. Dominant Categories
a. Search & Synthesis: Vertical startups (e.g., Harvey for legal, OpenEvidence for healthcare).
b. Software Engineering:
- Copilots = $2B/year market; Cursor hits $1B ARR fastest ever.
- AI touches entire SDLC (coding, testing, migration, documentation).
c. Agents:
- Constrained agents thrive (e.g., Lovable, Dosu); general agents struggle.
- Success traits: Human-machine balance, workflow specificity, "show your work" UX.
d. Creative Tools: Runway (video), Suno (music), generative design tools.
2. Emerging Opportunities
- Therapy/Life Organization: Top consumer use case (HBR survey).
- Voice Agents: Early but growing (e.g., ElevenLabs, Phonic).
- Labor Automation: Dropzone AI (security), LightTable (translation).
3. Product Development Insights
Systems Over Models
- Compound AI Systems: Combine multiple models/tools (e.g., Apple Intelligence’s hybrid architecture).
- RAG > Long Context:
- RAG outperforms 1M-token models in cost, latency, accuracy (e.g., 677 ms vs. 68 s latency).
- Context Engineering:
- Prioritize/reduce context (e.g., semantic deduplication) to fit ~60k tokens from 1M+ relevant tokens (greedy-packing sketch after this list).
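A rough sketch of context engineering as greedy packing, referenced above: rank retrieved snippets, drop near-duplicates, and stop at the token budget. The Jaccard-overlap dedup, the 4-characters-per-token estimate, and the default budget are illustrative assumptions, not any particular product's method.
```python
def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)                  # crude chars-per-token estimate

def jaccard(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def build_context(snippets: list[tuple[float, str]],
                  budget_tokens: int = 60_000,
                  dedup_threshold: float = 0.8) -> list[str]:
    """Greedy pack: highest-relevance first, skip near-duplicates, respect the budget."""
    chosen: list[str] = []
    used = 0
    for _score, text in sorted(snippets, reverse=True):
        if any(jaccard(text, kept) >= dedup_threshold for kept in chosen):
            continue                               # near-duplicate of something already packed
        cost = approx_tokens(text)
        if used + cost > budget_tokens:
            continue                               # over budget; try smaller snippets
        chosen.append(text)
        used += cost
    return chosen

docs = [(0.90, "Refund policy: customers may return items within 30 days."),
        (0.89, "Refund policy: customers can return items within 30 days."),   # near-duplicate
        (0.40, "Shipping is free on orders over $50.")]
print(build_context(docs, budget_tokens=50))
```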
4. UX & Design
- Personality Matters: Base models outperform aligned models in creativity/randomness (critical for design/therapy apps).
- Agent Interfaces: Tool-specific integrations beat generic protocols (e.g., MCP).
5. Data Curation
- High-quality data boosts efficiency:
- Curated datasets can cut training compute to ~13% of baseline, a ~7.7x training speedup.
Long-Term Shifts & Predictions
1. AI-Native Organizations:
- Flatter teams of generalists; specialists devalued.
- Example: Shopify mandates AI proficiency: "Learn AI or leave."
2. Labor Transformation:
- Managers shift to "AI fleet management"; designer and engineer roles blur.
- Example: CTOs spend 100% of their time reviewing agent output.
3. Infrastructure Renaissance:
- AI-Provisioned Infra: Databases (Neon), browsers (BrowserBase) rebuilt for AI agents.
- Semiconductor Innovation: Transformer-focused chips (Etched, d-Matrix).
4. AGI Timeline:
- 14% of AI researchers predict AGI in 3 years.
Conclusions & Strategic Implications
Winners:
- NVIDIA/GPU Clouds: Irreplaceable compute backbone.
- Vertical AI Apps: Solve specific problems (e.g., healthcare, legal).
- Data-Centric Startups: High-quality data curation/synthetic data.
Risks:
- Incumbents in "AI line of fire" (e.g., CRM, creative tools).
- Middle management erosion; specialist roles obsolete.
Future Build Areas:
- AI Code Gen Downstream: Reinvent SDLC, testing, and observability.
- Science Foundation Models: Biology, materials, climate (e.g., Orbital Materials).
- Closed-Loop Systems: Generate + verify (e.g., AI scientists).
Final Takeaway: Foundation models are reshaping work, creativity, and infrastructure. Survival requires embracing AI-native workflows, context-aware systems, and strategic open-source leverage. The era of "thinking AI" has begun: post-training now outweighs pre-training.