Intro
Recently, Google released the A2A protocol, which complements the Model Context Protocol (MCP) proposed by Anthropic last year. Before diving into A2A, I wanted to take some time to document my notes on MCP.
📌 This article focuses on more advanced MCP topics, including MCP Agents, Sampling, Composability, Registry, and future developments.
Friendly reminder: this article turned out longer than expected, so it is split into two parts. Feel free to jump to the sections you are most interested in 🥸
MCP Agents and System Architecture
An MCP agent framework provides the following core capabilities:
- A simple and flexible context injection mechanism
- Declarative frameworks for implementing various workflows
- A rich set of building blocks for constructing agents
Taking LastMile AI’s MCP Agent as an example, it provides a lightweight framework for building agents using MCP 🤖
System Architecture Considerations
| Component | Responsibilities |
| --- | --- |
| Client | Does not need to handle retry logic or manage logging details |
| Server | Closer to the final application; has greater control over system-level interactions |
Scalability and Limitations
Current model capacity constraints:
- Standard models (e.g., Claude): ~50–100 tools
- Advanced models: several hundred tools
Tool Management Strategies
To manage large toolsets effectively, the following approaches may be useful:
- Tool search systems
- A RAG-style abstraction layer over tools
- Fuzzy search over a tool repository
- Hierarchical system design (for example: financial tools, data reading tools, data writing tools)
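As a small illustration of the "fuzzy search over a tool repository" idea, here is a minimal sketch using only the standard library. The tool names and descriptions are hypothetical; a production system would likely use embeddings or a dedicated search index instead of `difflib`.

```python
from difflib import SequenceMatcher

# Hypothetical tool repository: name -> one-line description.
TOOLS = {
    "stock_price_lookup": "Fetch the latest stock price for a ticker symbol",
    "read_csv_file": "Read rows from a CSV file on disk",
    "write_csv_file": "Append rows to a CSV file on disk",
    "currency_convert": "Convert an amount between two currencies",
}

def search_tools(query: str, top_k: int = 3) -> list[str]:
    """Rank tools by fuzzy similarity between the query and name+description."""
    def score(item: tuple) -> float:
        name, desc = item
        return SequenceMatcher(None, query.lower(), f"{name} {desc}".lower()).ratio()
    ranked = sorted(TOOLS.items(), key=score, reverse=True)
    return [name for name, _ in ranked[:top_k]]

print(search_tools("read rows from a csv file"))
```

Only the top few matches would then be surfaced to the model, keeping the active toolset well under the capacity limits mentioned above.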
Implementation Guidelines
A recommended path for building an MCP server:
Start with basic tools to understand the MCP Server architecture → move on to prompt design → configure resource management
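To make the "start with basic tools" step concrete, here is a toy stand-in for an MCP server's tool layer, written with plain dicts rather than the official SDK. The `tools/list` and `tools/call` method names mirror MCP's JSON-RPC vocabulary; everything else (the registry, the `add` tool) is illustrative.

```python
# Minimal stand-in for an MCP server's tool layer (not the official SDK).
# Prompts and resources can later be layered on with the same pattern.
TOOLS = {}

def tool(name: str, description: str):
    """Register a function as a tool with a name and description."""
    def decorator(fn):
        TOOLS[name] = {"description": description, "fn": fn}
        return fn
    return decorator

@tool("add", "Add two integers")
def add(a: int, b: int) -> int:
    return a + b

def handle(request: dict) -> dict:
    """Dispatch a tools/list or tools/call request, JSON-RPC style."""
    if request["method"] == "tools/list":
        return {"tools": [{"name": n, "description": t["description"]}
                          for n, t in TOOLS.items()]}
    if request["method"] == "tools/call":
        t = TOOLS[request["params"]["name"]]
        return {"result": t["fn"](**request["params"]["arguments"])}
    return {"error": f"unknown method: {request['method']}"}

print(handle({"method": "tools/call",
              "params": {"name": "add", "arguments": {"a": 2, "b": 3}}}))
```

Once the tool layer is understood, prompt design and resource management follow the same register-then-dispatch shape.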
Automation and Integration
One noteworthy feature is automatic MCP server generation: tools such as Cline can generate an MCP server in real time.
Maintenance and Evolution
Key principles for managing system changes:
- Adhere to the MCP protocol to ensure baseline functionality
- Allow tools to evolve dynamically
- Maintain consistency in resource prompts
- Preserve standardized tool invocation patterns
Building Effective Agents with MCP
Next, we introduce two critical MCP concepts: Sampling and Composability
1. Sampling
In MCP, Sampling is a powerful capability that allows servers to request large language model (LLM) generations through the client. This enables servers to dynamically request model inference while executing tools, prompts, or workflows—without directly managing model access—while preserving security and privacy.
🤔 Why Use Sampling?
Traditionally, servers that rely on LLMs must manage API keys, model selection, and cost controls themselves. MCP Sampling fundamentally changes this:
- Servers do not directly access LLMs: they request inference through the client without hosting or directly invoking models
- Users retain control: clients can review and modify requests to ensure privacy and security compliance
- Flexible model selection: clients choose the most appropriate model based on server preferences and available resources
🔁 Sampling Workflow
1. Server sends a request: the server issues a `sampling/createMessage` request to the client, including conversation history, system prompts, and model preferences.
2. Client reviews the request: the client may inspect or modify the request to enforce user control and privacy.
3. LLM inference execution: the client selects an appropriate model and performs inference.
4. Review of generated output: the client inspects the output to ensure it meets expectations.
5. Return results to the server: the client sends the generation back to the server for further processing or user response.
This design ensures human-in-the-loop control while enabling nested LLM calls within multi-step agent workflows.
Request Format
Sampling requests use a standardized message format that includes:
- `messages`: Conversation history, with roles (`user`, `assistant`) and content (text or images)
- `modelPreferences`: Model hints plus cost, speed, and intelligence priorities
- `systemPrompt`: Optional system-level instruction
- `includeContext`: Scope of included context (`none`, `thisServer`, `allServers`)
- `temperature`: Controls randomness (0.0–1.0)
- `maxTokens`: Maximum number of tokens to generate
- `stopSequences`: Optional stop sequences
- `metadata`: Provider-specific parameters
Official example request:
```json
{
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      {
        "role": "user",
        "content": {
          "type": "text",
          "text": "What files are in the current directory?"
        }
      }
    ],
    "systemPrompt": "You are a helpful file system assistant.",
    "includeContext": "thisServer",
    "maxTokens": 100
  }
}
```
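The client-side review step can be sketched as a small policy function over such requests. This uses plain dicts, not the official SDK, and the specific policies (a 500-token cap, refusing the `allServers` context scope) are made-up examples of the kinds of limits a client might enforce.

```python
# Sketch of the client-side review step for a sampling/createMessage request.
# The cap and allowed scopes below are illustrative policy choices.
MAX_TOKENS_CAP = 500
ALLOWED_CONTEXT = {"none", "thisServer"}  # policy: never share other servers' context

def review_sampling_request(request: dict) -> dict:
    """Inspect, clamp, or reject a server's sampling request before inference."""
    if request.get("method") != "sampling/createMessage":
        raise ValueError("not a sampling request")
    params = request["params"]
    if params.get("includeContext", "none") not in ALLOWED_CONTEXT:
        raise PermissionError("context scope rejected by client policy")
    # Clamp the server's token budget to the user's configured cap.
    params["maxTokens"] = min(params.get("maxTokens", MAX_TOKENS_CAP), MAX_TOKENS_CAP)
    return request

req = {
    "method": "sampling/createMessage",
    "params": {"messages": [], "includeContext": "thisServer", "maxTokens": 2000},
}
print(review_sampling_request(req)["params"]["maxTokens"])  # clamped to 500
```

The key point is that the server only ever expresses preferences; the client applies its own limits before any model is invoked.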
🔐 Security and User Control
The design of Sampling emphasizes a human-in-the-loop philosophy, ensuring that users retain control over both model inputs and outputs. Through the client, users can:
- Review and modify server requests to prevent potentially malicious or inappropriate usage
- Enforce limits on model usage, such as daily quotas or restrictions on specific models
- Inspect generated outputs to ensure they meet user expectations and safety standards
These mechanisms ensure that, even in multi-agent systems, the client retains final authority over all interactions with the model.
→ By centralizing all LLM interactions on the client side, requests can be collectively managed. In open-source setups, clients may even self-host their own LLMs, retaining full freedom over which model types are actually used.
- For example, a server may request inference using specific parameters, such as preferred models. It might say, “I strongly prefer this particular version of Claude,” or “I need either a large or small model—please satisfy this if possible.” The server can pass along system prompts, task prompts, temperature, maximum token limits, and other parameters. However, the client is not obligated to comply. It may reject the request if it appears malicious, or impose strict limits based on privacy or cost considerations, including throttling request frequency.
2. Composability
Earlier, we mentioned that MCP separates clients and servers. However, this separation is logical rather than physical.
In other words, any application, API, or agent can simultaneously act as both an MCP Client and an MCP Server.
MCP Agent System Overview
Within the MCP architecture, agents can assume dual roles. Consider a real-world example: when a user interacts with Claude Desktop (an LLM), they may request a research agent to gather information. This research agent acts as both an MCP server and an MCP client. It can invoke multiple resources—such as file systems, data ingestion services, or web search tools—process the retrieved information, and return structured results.
Agent Chaining Architecture
This design forms a chained architecture:
User → Client/Server Combination A → Client/Server Combination B → …
Such chaining enables multi-layered LLM systems, where each layer focuses on a specific responsibility or task.
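The chained architecture can be illustrated with a toy class in which each node acts as a "server" for the layer above it and as a "client" toward the layer below. The node names and string-based handling are purely illustrative, not MCP wire behavior.

```python
# Toy illustration of composability: each Node answers requests (server role)
# and forwards sub-requests downstream (client role). Names are hypothetical.
class Node:
    def __init__(self, name: str, downstream: "Node | None" = None):
        self.name = name
        self.downstream = downstream

    def handle(self, task: str) -> str:
        # Act as a server: do local work, then act as a client if a
        # downstream node exists.
        local = f"{self.name} processed '{task}'"
        if self.downstream is not None:
            return f"{local} -> {self.downstream.handle(task)}"
        return local

# User -> research agent (client+server) -> web search agent (server)
chain = Node("research-agent", downstream=Node("web-search-agent"))
print(chain.handle("find MCP docs"))
```

Each layer stays focused on its own responsibility while the chain as a whole completes the task.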
Common Questions and Answers
| Category | Question | Answer |
| --- | --- | --- |
| Error handling | How are cascading errors handled in multi-layer systems? | This is a general multi-node challenge and depends on how each agent processes information flow. MCP itself neither simplifies nor complicates this issue. |
| Protocol choice | Why use MCP instead of HTTP? | MCP offers richer capabilities, including resource notifications, bidirectional communication, and structured data requests, supporting complex asynchronous multi-step interactions. |
| Intelligent services | Why convert traditional services into MCP servers? | Doing so grants services agent-like capabilities, enabling more autonomous and intelligent task handling. |
System Control and Observability
- LLMs reside at the application layer and control rate limits and interaction flow
- The system behaves largely as a black box; observability depends on the implementation and surrounding ecosystem
Debugging and Security
- Use the Inspector tool to view logs and monitor connections
- Supports server-side debugging
- Authorization and authentication are defined by the server implementer
- Tool annotations enable fine-grained read/write permission control
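As a sketch of annotation-based permission control: the MCP spec defines hints such as `readOnlyHint` and `destructiveHint` on tools, which a client can use to filter what a restricted session may call. The tools and the policy below are hypothetical.

```python
# Filter tools by their annotations under a read-only or read-write policy.
# Tool names and the exact policy are illustrative.
tools = [
    {"name": "list_files", "annotations": {"readOnlyHint": True}},
    {"name": "delete_file", "annotations": {"readOnlyHint": False,
                                            "destructiveHint": True}},
    {"name": "append_log", "annotations": {"readOnlyHint": False,
                                           "destructiveHint": False}},
]

def allowed_tools(tools: list, allow_writes: bool) -> list:
    """Return the names of tools a client may call under the given policy."""
    names = []
    for t in tools:
        ann = t.get("annotations", {})
        if ann.get("readOnlyHint") or allow_writes:
            names.append(t["name"])
    return names

print(allowed_tools(tools, allow_writes=False))  # only read-only tools
```

Note that annotations are hints supplied by the server, so a cautious client treats them as advisory rather than as a security boundary.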
3. Sampling + Composability
Sampling and Composability can also be combined to chain multiple agents together while ensuring that the client application maintains full control over the inference process.
- In some cases, these agents may exist on the public internet or be developed by third parties
- MCP allows you to connect to such agents while preserving privacy, security, and control guarantees
This raises a natural question: Couldn’t we just use RESTful APIs instead? Let’s compare the two approaches.
MCP vs. RESTful APIs
| Dimension | MCP | RESTful API |
| --- | --- | --- |
| Design goal | Orchestrates LLM reasoning and tool interactions with contextual logic | Transfers data via CRUD operations |
| Data processing | Supports complex transformations, context composition, and prompt logic | Primarily structured data with fixed formats |
| Interactivity | Multi-step, logic-driven conversations and tool chains | Stateless request-response |
| LLM integration | Deep integration for multi-step reasoning | Minimal; usually passes inputs and outputs only |
| Use cases | Multi-agent systems, intelligent assistants, context-aware applications | Simple APIs and backend services |
| State management | Maintains context and workflow state | Stateless by design |
What’s Next for MCP
As MCP gains adoption across agent applications and LLM ecosystems, new challenges emerge. To improve scalability and stability, the MCP team is advancing several key initiatives—most notably remote server support, formal authentication, and the MCP Registry API.
Remote MCP Servers and Authentication
MCP deployment is no longer limited to local processes (stdio). Many teams want to host MCP servers in the cloud or internal networks to support cross-system integration.
Remote MCP Servers + Auth
- The Inspector tool now officially supports authentication
- Server-Sent Events (SSE) are the recommended remote transport: SSE carries server-to-client messages, paired with HTTP POST for the client-to-server direction
Official MCP Registry API (In Development)
Why a Registry?
Today, MCP servers are scattered across GitHub, npm, PyPI, Rust ecosystems, and more—leading to:
- Poor discoverability
- Opaque transport details
- No version tracking or trust verification
What Is the MCP Registry API?
The MCP team is building an open, centrally hosted Registry API to address this fragmentation.
| Feature | Description |
| --- | --- |
| Central hosting | Maintained by the MCP team; schemas and development are fully open-source |
| Ecosystem integration | Works with npm, PyPI, Cargo, Go Modules |
| Metadata queries | Protocols, transports, authors, and version history |
| Version diffing | Tracks added tools, updated descriptions, and API changes |
Example: If Shopify publishes an official MCP server, developers can verify its authenticity, supported transports, authentication requirements, and available tools via the Registry.
FAQ
| Topic | Question | Summary Answer |
| --- | --- | --- |
| Self-hosted Registry support | Can organizations run their own MCP Registry? | ✅ Yes. In addition to the public registry, organizations can deploy private registries and integrate them with development environments such as VS Code and Cursor. The Registry API only provides data interfaces; the UI is unrestricted. |
| Open vs. closed | Is MCP open-source? Does it support multiple model providers? | ✅ Fully open. MCP is an open protocol. Claude is not the only client, and other LLM providers are free to implement it. Healthy competition benefits both users and developers. |
| Server-initiated interactions | Can MCP servers proactively trigger inference or interact with clients? | ✅ Partially supported. Servers can currently notify clients of resource updates. Proactive sampling is not yet supported but is on the roadmap. Servers may also act as clients and use their own LLMs to implement advanced interaction and composition logic. |
| Model participation | Does every MCP interaction require an LLM? | ❌ Not necessarily. Clients can directly invoke tools or resources without involving a model, enabling fully deterministic control flows. |
| Server-to-server communication | Can MCP servers communicate directly with each other? | ✅ Flexible but not the default. By default, interactions are mediated by a client to ensure security and separation of concerns. However, the architecture allows implementers to design direct server-to-server communication if needed. |
MCP Registry: Enabling Self-Evolving AI Agents 🦾
The MCP Registry fundamentally changes how agents are designed. Instead of relying on predefined tools, agents can dynamically discover, load, and use new capabilities at runtime.
How Do Agents “Evolve”?
Traditionally, agents must know in advance which tools are available. With the MCP Registry, agents can instead:
- Search the Registry to find the tools they need (for example, a Google Maps MCP server)
- Verify that the source is trusted, then connect via transports such as SSE
- Dynamically use the tool to complete tasks such as geolocation, distance calculation, or route planning
✅ No pre-packaging and no manual configuration. Agents can “grow stronger” based on the task at hand 🦾
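The discovery step above can be sketched as a lookup against a registry. Since the real Registry API is still in development, the entries and fields here (`name`, `transport`, `verified`, `capabilities`) are hypothetical.

```python
# Toy registry lookup: entries and field names are illustrative, since the
# official Registry API is still in development.
REGISTRY = [
    {"name": "google-maps-mcp", "transport": "sse", "verified": True,
     "capabilities": ["geocode", "route"]},
    {"name": "random-maps-mcp", "transport": "sse", "verified": False,
     "capabilities": ["geocode"]},
]

def discover(capability: str, require_verified: bool = True) -> "dict | None":
    """Find the first registry entry offering a capability, preferring trusted sources."""
    for entry in REGISTRY:
        if capability in entry["capabilities"]:
            if entry["verified"] or not require_verified:
                return entry
    return None

server = discover("route")
print(server["name"] if server else "no trusted server found")
```

An agent would then open a connection over the advertised transport and call the discovered tools, with no pre-packaged integration required.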
🔐 Security and Access Control
The MCP team proposes DevOps-style mechanisms for governance and safety:
| Control Mechanism | Description |
| --- | --- |
| Private registries | Control which MCP servers are available to agents |
| Whitelisted queries | Restrict agents to searching and accessing only approved sources |
| Official verification badges | Allow agents to identify trusted providers such as Shopify or Grafana |
| Middleware filters | Add an additional filtering layer to block unauthorized server access |
Summary
| Traditional Agent Design | MCP-Based Design |
| --- | --- |
| Tools are hard-coded and difficult to extend | Tools can be dynamically discovered, loaded, and used |
| Manual integration and deployment | Agents automatically discover and connect to tools |
| No verification or access control | Registry provides verification, whitelists, and security filtering |
→ The MCP Registry transforms agents from passive tool users into active, intelligent decision-makers.
Beyond the Registry: Integrating .well-known with MCP for Smarter Agents
As MCP pushes AI agents toward more dynamic and composable tool ecosystems, another key complementary mechanism emerges:
`.well-known/mcp.json`. It not only addresses limitations of the MCP Registry, but also lays the groundwork for a web that intelligent agents can understand and access.
What Is .well-known/mcp.json?
`.well-known` is a standardized way for websites to expose metadata under a fixed URL path (for example, `.well-known/security.txt` or `.well-known/openid-configuration`). In the context of MCP, it allows a website to explicitly publish its MCP endpoint and available tools.
Example: Suppose Shopify exposes https://shopify.com/.well-known/mcp.json with the following content:

```json
{
  "mcp_endpoint": "https://api.shopify.com/mcp",
  "tools": ["order.read", "product.update"],
  "auth": {
    "type": "oauth2",
    "token_url": "https://shopify.com/oauth/token"
  }
}
```

This means that if a user tells an agent, “Help me manage my Shopify store,” the agent can directly query this .json file and obtain everything it needs to integrate, without prior knowledge of the tools, APIs, or authentication mechanisms.
Registry vs. .well-known
| Dimension | MCP Registry | .well-known/mcp.json |
| --- | --- | --- |
| Discovery model | Search and explore unknown servers from a central platform | Direct lookup for a known website |
| Use case | Discover tools from scratch | Quickly integrate with a known service (e.g., shopify.com) |
| Perspective | Bottom-up (discovering tools) | Top-down (site advertises its capabilities) |
| Relationship | Global index and verification layer | Site-specific interface exposure and integration guide |
These two approaches are complementary, allowing agents both to explore new tools and to integrate quickly with known platforms.
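As a small illustration, a manifest like the Shopify example above can be parsed with the standard library alone. A real agent would fetch it over HTTPS; here we parse a local string, and the field names follow the hypothetical manifest shown earlier.

```python
import json

# Parse a (hypothetical) .well-known/mcp.json document. A real agent would
# fetch this over HTTPS before parsing.
raw = """
{
  "mcp_endpoint": "https://api.shopify.com/mcp",
  "tools": ["order.read", "product.update"],
  "auth": {"type": "oauth2", "token_url": "https://shopify.com/oauth/token"}
}
"""

def parse_mcp_manifest(text: str) -> dict:
    """Extract the endpoint, tool list, and auth scheme an agent needs to connect."""
    doc = json.loads(text)
    return {
        "endpoint": doc["mcp_endpoint"],
        "tools": doc.get("tools", []),
        "auth_type": doc.get("auth", {}).get("type", "none"),
    }

print(parse_mcp_manifest(raw))
```

From the parsed manifest, the agent knows where to connect, which tools exist, and how to authenticate, all without any prior integration work.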
Combined with Computer Use Models: Toward General-Purpose Agents
In October 2024, Anthropic introduced the Computer Use Model, enabling agents to:
- Interact with user interfaces when APIs are unavailable
- Log in, click buttons, and navigate pages automatically
- Operate interfaces they have never seen before
A Hybrid Agent Strategy
We can envision agents operating as follows:
- If a website provides `.well-known/mcp.json`, the agent uses MCP-based API integration (fast, stable, secure)
- If not, the agent falls back to UI-level interaction via Computer Use models, simulating human behavior
Combining MCP with Computer Use models may be the future solution for integrating long-tail websites.
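The decision logic of this hybrid strategy is simple to sketch. The manifest store and strategy labels below are illustrative; in practice the agent would perform a live HTTPS lookup of the well-known path.

```python
# Sketch of the hybrid strategy: prefer MCP integration when a site publishes
# .well-known/mcp.json, otherwise fall back to UI automation. The domains and
# manifest store are hypothetical stand-ins for a live lookup.
WELL_KNOWN = {
    "shopify.com": {"mcp_endpoint": "https://api.shopify.com/mcp"},
    # legacy.example.com publishes no manifest
}

def choose_strategy(domain: str) -> str:
    """Pick MCP integration if a manifest exists, else fall back to computer use."""
    manifest = WELL_KNOWN.get(domain)
    if manifest is not None:
        return f"mcp:{manifest['mcp_endpoint']}"
    return "computer-use"  # simulate human UI interaction as a fallback

print(choose_strategy("shopify.com"))
print(choose_strategy("legacy.example.com"))
```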
Final Takeaway: Teaching Agents to Check .well-known Is a Step Toward a Web-Native AI
- `.well-known/mcp.json` allows websites to “introduce themselves” to agents
- It complements the discovery layer of the MCP Registry, enabling both proactive exploration and passive registration
- When combined with Computer Use models, agents can:
- Use APIs when available
- Fall back to UI automation when APIs do not exist
As a result, agents are no longer limited to built-in toolkits. They can truly navigate the web, discover tools, and integrate services autonomously.
Closing Thoughts
This article on MCP is long, but there is still much more to explore. I hope these notes were helpful.
Writing this for the first time made me realize how challenging it is—not only in terms of time and mental effort, but also in ensuring technical accuracy and clarity.
