Apple Language Model Protocol: Use Any LLM in Your iOS App (2026)


Apple’s Language Model Protocol (LMP) is the most significant shift in how iOS developers work with AI since Core ML. Announced at WWDC 2026 (Session 339), it’s an open protocol that lets any LLM provider (Claude, Gemini, custom models, or Apple’s own Foundation Models) plug into a single Swift API. No more juggling five different SDKs. No more rewriting your networking layer every time you switch providers.

If you’ve been building AI features on iOS, you know the pain. Each provider has its own SDK, its own auth flow, its own streaming format, its own way of handling tool calls. LMP eliminates all of that by defining a standard interface that any model can conform to.

Let’s break down what it is, how it works, and how to implement it today.

What Is the Language Model Protocol?

The Language Model Protocol is Apple’s answer to a fragmented AI ecosystem. It’s a Swift protocol, specifically LanguageModelExecutor, that defines how any language model communicates with your app. Whether the model runs on-device (Apple’s Foundation Models), in Apple’s Private Cloud Compute, or on a third-party server (Anthropic, Google, OpenAI, or your own infrastructure), the API surface is identical.

Think of it this way: Core ML standardized how you run inference on-device. LMP standardizes how you talk to any language model, anywhere.

The protocol supports:

  • Text generation with streaming responses
  • Tool calling (function calling) with structured schemas
  • Image input for multimodal models
  • Structured output with type-safe decoding
  • Conversation history management

Apple has committed to open-sourcing the framework later in 2026, which means third-party providers can ship conforming implementations without reverse-engineering anything.

How It Works: The Architecture

The architecture is simple. Your app talks to LanguageModelExecutor, a protocol that any provider can conform to. There are three kinds of executor, two that Apple ships and one that you or a provider supply:

  1. On-device executor: runs Apple’s Foundation Models directly on the Neural Engine
  2. PCC executor: routes to Apple’s Private Cloud Compute for larger models
  3. Third-party executor: your custom implementation for any external provider

Here’s the key insight: your application code doesn’t change regardless of which executor is active. You can swap from an on-device model to Claude to Gemini by changing one line of configuration.

This is fundamentally different from how MCP (Model Context Protocol) works. MCP standardizes how models connect to tools and data sources. LMP standardizes how your app connects to models. They’re complementary: you can use both together. Your app uses LMP to talk to the model, and the model uses MCP to talk to external tools.

Implementing LanguageModelExecutor

Here’s what a basic third-party executor looks like. Let’s say you want to integrate Claude:

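import Foundation
import FoundationModels

struct ClaudeExecutor: LanguageModelExecutor {
    let apiKey: String
    let model: String = "claude-sonnet-4-20250514"

    // buildRequest, buildStreamRequest, decodeResponse, and parseSSELine
    // are helpers you supply; they are not shown here.

    func generate(
        prompt: LanguageModelPrompt,
        options: GenerationOptions
    ) async throws -> LanguageModelResponse {
        let request = buildRequest(from: prompt, options: options)
        // The URLResponse is unused in this minimal version.
        let (data, _) = try await URLSession.shared.data(for: request)
        return try decodeResponse(data)
    }

    func stream(
        prompt: LanguageModelPrompt,
        options: GenerationOptions
    ) -> AsyncThrowingStream<LanguageModelChunk, Error> {
        AsyncThrowingStream { continuation in
            let task = Task {
                do {
                    let request = buildStreamRequest(from: prompt, options: options)
                    let (bytes, _) = try await URLSession.shared.bytes(for: request)
                    for try await line in bytes.lines {
                        if let chunk = parseSSELine(line) {
                            continuation.yield(chunk)
                        }
                    }
                    continuation.finish()
                } catch {
                    // Surface network and parsing errors to the consumer
                    // instead of leaving the stream open forever.
                    continuation.finish(throwing: error)
                }
            }
            // Stop the request if the consumer stops iterating.
            continuation.onTermination = { _ in task.cancel() }
        }
    }
}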
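
The executor above leans on a parseSSELine helper that isn’t shown. Here is a rough sketch of what such a helper might look like, assuming Anthropic-style streaming where text arrives in content_block_delta events on data: lines. TextChunk and DeltaEvent are stand-ins invented for this example; a real executor would return whatever chunk type the protocol defines.

```swift
import Foundation

// Hypothetical stand-in for a streamed text chunk; not a framework type.
struct TextChunk: Equatable {
    let text: String
}

// Minimal shape of an Anthropic-style `content_block_delta` payload.
// Keys we don't need (such as `index`) are simply ignored by the decoder.
struct DeltaEvent: Decodable {
    struct Delta: Decodable {
        let text: String?
    }
    let type: String
    let delta: Delta?
}

/// Parses one line of a server-sent-events stream.
/// Returns a chunk only for `data:` lines that carry a text delta.
func parseSSELine(_ line: String) -> TextChunk? {
    // Payload lines start with "data:"; skip event names, comments, blanks.
    guard line.hasPrefix("data:") else { return nil }
    let payload = line.dropFirst("data:".count)
        .trimmingCharacters(in: .whitespaces)
    guard let json = payload.data(using: .utf8),
          let event = try? JSONDecoder().decode(DeltaEvent.self, from: json),
          event.type == "content_block_delta",
          let text = event.delta?.text
    else { return nil }
    return TextChunk(text: text)
}
```

Returning nil for anything that isn’t a text delta keeps the streaming loop simple: it can yield whatever comes back and ignore the rest.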

The beauty is in the consumption side. Your app code looks like this regardless of the provider:

let session = LanguageModelSession(executor: ClaudeExecutor(apiKey: key))

// Simple generation
let response = try await session.generate("Explain this Swift error")

// Streaming
for try await chunk in session.stream("Write a unit test for this function") {
    outputView.append(chunk.text)
}

Switch to Apple’s on-device model? Change one line:

let session = LanguageModelSession(executor: .onDevice)

Tool Calling With LMP

Tool calling is where LMP really shines. Instead of each provider having its own function-calling format, you define tools once using Swift’s type system:

@Tool
struct SearchDocumentation {
    @Parameter(description: "The search query")
    var query: String
    
    @Parameter(description: "Maximum results to return")
    var limit: Int = 5
    
    func execute() async throws -> [DocumentResult] {
        // Your implementation
        return try await docSearch.search(query, limit: limit)
    }
}

let session = LanguageModelSession(
    executor: executor,
    tools: [SearchDocumentation.self]
)

The @Tool macro generates the JSON schema automatically. The model receives it in whatever format it expects (Claude’s tool_use, OpenAI’s function calling, Gemini’s function declarations), because the executor handles translation. Your tool definition stays the same.
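
To make the translation step concrete, here is a minimal sketch of how an executor might turn one neutral tool description into two provider request formats. ToolSpec and the function names are hypothetical and exist only for this example. The output shapes follow Anthropic’s Messages API (name, description, input_schema) and OpenAI’s Chat Completions API (a function object wrapping name, description, parameters).

```swift
import Foundation

// Hypothetical provider-neutral tool description; not a framework type.
struct ToolSpec {
    let name: String
    let description: String
    let parameters: [String: Any]   // a JSON Schema object
}

// Anthropic's Messages API expects `name`, `description`, `input_schema`.
func anthropicTool(_ spec: ToolSpec) -> [String: Any] {
    [
        "name": spec.name,
        "description": spec.description,
        "input_schema": spec.parameters,
    ]
}

// OpenAI's Chat Completions API wraps the same schema in a `function` object.
func openAITool(_ spec: ToolSpec) -> [String: Any] {
    [
        "type": "function",
        "function": [
            "name": spec.name,
            "description": spec.description,
            "parameters": spec.parameters,
        ] as [String: Any],
    ]
}

// Example spec mirroring the SearchDocumentation tool's two parameters.
func searchDocumentationSpec() -> ToolSpec {
    ToolSpec(
        name: "search_documentation",
        description: "Search the documentation",
        parameters: [
            "type": "object",
            "properties": [
                "query": ["type": "string", "description": "The search query"],
                "limit": ["type": "integer", "description": "Maximum results to return"],
            ],
            "required": ["query"],
        ]
    )
}
```

The schema itself is identical in both outputs; only the envelope around it changes, which is why the tool definition can stay the same across providers.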

This approach is similar to how AI agents work in other frameworks, defining capabilities that the model can invoke autonomously, but with the compile-time type safety that Swift’s type system provides.

Which Providers Support It?

As of June 2026, the following providers have shipped or announced LanguageModelExecutor conformances:

Provider            Status      Models Available
Apple (on-device)   Shipping    AFM 3B, AFM 7B
Apple (PCC)         Shipping    AFM Cloud, AFM Cloud Pro
Anthropic           Shipping    Claude Sonnet 4, Opus 4
Google              Beta        Gemini 3.5 Flash, 3.1 Pro
OpenAI              Announced   GPT-5.5, GPT-5
Mistral             Beta        Medium 3.5, Large 2
Ollama              Community   Any local model

The community implementation for Ollama is particularly interesting for developers who want to run models locally on Apple Silicon. You get the same API whether you’re hitting Claude’s servers or a Llama model running on your Mac’s GPU.

LMP vs MCP: Different Problems, Same Ecosystem

This is the question I see most often, so let’s clarify. MCP and LMP solve different problems:

MCP = How models connect to tools and data (model-to-world)
LMP = How apps connect to models (app-to-model)

In practice, you’ll use both. Your iOS app uses LMP to send a prompt to Claude. Claude uses MCP to call your app’s search tool. The tool result flows back through MCP to Claude, and Claude’s response flows back through LMP to your app.

They compose naturally because Apple designed LMP with MCP in mind. The @Tool macro in LMP can also expose tools via MCP if you want external agents to call them.

Why This Matters for the Ecosystem

Before LMP, building a multi-model architecture on iOS was painful. You’d have Anthropic’s SDK for one feature, Google’s AI SDK for another, and OpenAI’s for a third. Each brought its own dependencies, its own error handling, its own streaming format.

LMP makes using multiple AI models trivial. Route simple queries to the on-device model (free, fast, private). Send complex reasoning tasks to Claude or Gemini. Fall back gracefully when a provider is down. All through one API, one error type, one streaming format.
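
The fallback idea can be sketched in a few lines. This is an illustration under stated assumptions, not the framework’s API: TextGenerating is a hypothetical stand-in for anything that can answer a prompt, and FallbackGenerator and AllProvidersFailed are names invented for this example.

```swift
// Hypothetical stand-in for "something that can answer a prompt";
// it is not the framework's executor protocol.
protocol TextGenerating {
    func generate(_ prompt: String) async throws -> String
}

// Thrown when every candidate failed; keeps each underlying error.
struct AllProvidersFailed: Error {
    let underlying: [Error]
}

// Tries each generator in order and returns the first successful answer.
struct FallbackGenerator: TextGenerating {
    let candidates: [any TextGenerating]

    func generate(_ prompt: String) async throws -> String {
        var failures: [Error] = []
        for candidate in candidates {
            do {
                return try await candidate.generate(prompt)
            } catch {
                // Remember why this one failed, then try the next.
                failures.append(error)
            }
        }
        throw AllProvidersFailed(underlying: failures)
    }
}
```

Because FallbackGenerator conforms to the same protocol as its candidates, the calling code doesn’t need to know whether it is talking to one provider or a chain of them. A production version would probably want to stop early on cancellation rather than moving on to the next candidate.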

For developers evaluating the build vs buy decision for AI features, LMP significantly reduces the “build” cost. You’re not building provider integrations anymore; you’re just picking executors.

Privacy and On-Device First

Apple’s approach remains privacy-first. The on-device executor processes everything locally on the Neural Engine. No data leaves the device. For the PCC executor, Apple’s Private Cloud Compute guarantees are in play: encrypted in transit, processed in secure enclaves, no data retention.

For third-party executors, LMP includes a PrivacyManifest requirement. Each executor must declare what data it sends, where it goes, and how long it’s retained. This integrates with iOS’s existing privacy labels in the App Store.

If you’re building apps that handle sensitive data, this matters. You can use the on-device model for PII-adjacent tasks and only route to cloud models for tasks where you’ve obtained user consent. That puts GDPR-relevant routing decisions at the architecture level rather than leaving them as an afterthought.
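
One way to express that routing rule is as a small pure function that decides where a request may go before any executor is chosen. The types and names here (Route, RequestTraits, chooseRoute) are hypothetical, and the policy is just the one described above: sensitive data stays on the device, and the cloud is used only when the task needs it and the user has consented.

```swift
// Hypothetical routing targets, for illustration only.
enum Route: Equatable {
    case onDevice
    case cloud
}

// Hypothetical description of a request; your app decides how to fill it in.
struct RequestTraits {
    var containsSensitiveData: Bool
    var needsLargeModel: Bool
    var userConsentedToCloud: Bool
}

// Prefers on-device. Sensitive data never leaves the device, and the cloud
// is used only when the task needs a larger model and the user has agreed.
func chooseRoute(for traits: RequestTraits) -> Route {
    if traits.containsSensitiveData { return .onDevice }
    if traits.needsLargeModel && traits.userConsentedToCloud { return .cloud }
    return .onDevice
}
```

Keeping the decision separate from the executors makes the policy easy to unit test and easy to audit.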

Getting Started Today

To start building with LMP today:

  1. Update to Xcode 18: LMP is part of the Foundation Models framework
  2. Target iOS 26+: The protocol requires the latest runtime
  3. Watch Session 339: Apple’s WWDC session covers implementation in detail
  4. Start with on-device: Get your prompts working with the free, local model first
  5. Add providers incrementally: Drop in third-party executors as you need more capability

The simplest path is to build your features against the on-device model, then swap in cloud executors when you need more power. Because the API is identical, this requires zero code changes in your feature logic.

The Bigger Picture

LMP represents Apple’s bet that AI model providers will proliferate, not consolidate. By making it trivial to swap between models, Apple ensures that no single provider gains lock-in on their platform. Developers benefit from competition on quality and pricing. Users benefit from apps that can use the best model for each task.

The open-source commitment (planned for later 2026) means this isn’t just an Apple-ecosystem standard. If LMP gains adoption as an open protocol, it could become the standard way any Swift application (macOS, visionOS, even server-side Swift) talks to language models.

For now, if you’re building iOS apps with AI features, LMP is the path forward. One API, any model, type-safe tools, privacy-first architecture. The fragmentation era of iOS AI development is over.

Frequently Asked Questions

Is the Language Model Protocol the same as MCP?

No. LMP standardizes how your app talks to models (app-to-model communication). MCP standardizes how models talk to external tools and data sources (model-to-world communication). They’re complementary protocols: use LMP to connect your app to any LLM, and MCP to give that LLM access to tools.

Can I use Language Model Protocol with models running locally on my Mac?

Yes. The community has already built LanguageModelExecutor conformances for Ollama, which means any model you can run locally on Apple Silicon (Llama, Qwen, Mistral, etc.) works through the same API. For on-device iOS inference, Apple’s built-in executor handles their Foundation Models natively.

Do I need iOS 26 to use LMP?

Yes. The FoundationModels framework and LanguageModelExecutor protocol require iOS 26 or later. If you need to support older iOS versions, you’ll need to maintain separate code paths or wait until your minimum deployment target catches up.
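
A minimal sketch of the separate-code-paths approach uses Swift’s standard availability check. The three function names are placeholders invented for this example, and the bodies stand in for real implementations.

```swift
// Placeholder for the model-backed path; only callable on iOS 26 or later.
@available(iOS 26, *)
func summarizeWithLanguageModel(_ text: String) -> String {
    "model summary of: " + text
}

// Placeholder fallback for older systems: keep only the first sentence.
func summarizeWithLegacyHeuristic(_ text: String) -> String {
    String(text.prefix(while: { $0 != "." }))
}

// Picks the code path at runtime based on the OS version.
func summarize(_ text: String) -> String {
    if #available(iOS 26, *) {
        return summarizeWithLanguageModel(text)
    } else {
        return summarizeWithLegacyHeuristic(text)
    }
}
```

Callers only ever see summarize, so the version check lives in one place and can be deleted once the deployment target catches up.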

Is LMP free to use, or does Apple charge for it?

The protocol itself is free. On-device inference through Apple’s Foundation Models is free, since it runs on the user’s hardware. Apple’s Private Cloud Compute (PCC) models are included for users with Apple Intelligence enabled. Third-party models (Claude, Gemini, etc.) require your own API keys and you pay those providers directly.

When will LMP be open-sourced?

Apple announced at WWDC 2026 that the framework will be open-sourced later in 2026. No exact date has been given. The open-source release will allow the protocol to be used outside Apple platforms, potentially in server-side Swift applications.

How does tool calling work across different providers?

You define tools once using the @Tool macro in Swift. The macro generates a provider-agnostic schema. Each LanguageModelExecutor implementation translates that schema into whatever format its model expects (Claude’s tool_use blocks, OpenAI’s function calling, etc.). Your tool code is written once and works with every provider.