Note: MCP Ep.2

Created: Apr 18, 2025 03:35 AM
Tags: AI, Note

Intro

Recently, Google released the A2A protocol, extending the Model Context Protocol (MCP) proposed by Anthropic last year. Before diving into A2A, I wanted to take some time to document my notes on MCP.
📌 This article focuses on more advanced MCP topics, including MCP Agents, Sampling, Composability, Registry, and future developments.
Friendly reminder: this article turned out longer than expected, so it is split into two parts. Feel free to jump to the sections you are most interested in 🥸

MCP Agents and System Architecture

An MCP agent framework provides the following core capabilities:
  • A simple and flexible context injection mechanism
  • Declarative frameworks for implementing various workflows
  • A rich set of building blocks for constructing agents
Taking LastMile AI’s MCP Agent as an example, it provides a lightweight framework for building agents using MCP 🤖
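For a feel of the shape of such a framework, here is a rough "hello world" sketch. It is written from memory of mcp-agent's documented usage, so treat every import path and method name below as an assumption rather than the library's exact API.

import asyncio
from mcp_agent.app import MCPApp          # assumed import path
from mcp_agent.agents.agent import Agent  # assumed import path

app = MCPApp(name="hello_agent")

async def main():
    async with app.run():
        # Declarative setup: context injection happens via the instruction,
        # and the listed MCP servers supply the building blocks (tools).
        finder = Agent(
            name="finder",
            instruction="You can read files and fetch URLs on request.",
            server_names=["filesystem", "fetch"],  # configured elsewhere
        )
        async with finder:
            tools = await finder.list_tools()  # aggregated from the servers
            print([t.name for t in tools.tools])

asyncio.run(main())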

System Architecture Considerations

Component responsibilities:
  • Client
    • Does not need to handle retry logic
    • Does not need to manage logging details
  • Server
    • Closer to the final application
    • Has greater control over system-level interactions

Scalability and Limitations

Current model capacity constraints:
  • Standard models (e.g., Claude): ~50–100 tools
  • Advanced models: several hundred tools

Tool Management Strategies

To manage large toolsets effectively, the following approaches may be useful:
  • Tool search systems (see the sketch after this list)
    • A RAG-style abstraction layer over tools
    • Fuzzy search over a tool repository
  • Hierarchical system design
    • For example: financial tools, data reading tools, data writing tools
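To make the first strategy concrete, here is a minimal sketch of a tool search layer in pure Python. It ranks tools by keyword overlap between the query and each tool's description; a production system would swap in embeddings for true RAG-style retrieval, but the abstraction (a searchable repository sitting in front of the model) is the same. All tool names are hypothetical.

from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    description: str

# A hypothetical repository of (potentially hundreds of) registered tools.
REPOSITORY = [
    Tool("read_csv", "read rows from a csv data file"),
    Tool("write_db", "write records into the finance database"),
    Tool("fx_rates", "look up current foreign exchange rates"),
]

def search_tools(query: str, top_k: int = 5) -> list[Tool]:
    """Fuzzy search: rank tools by word overlap with the query."""
    q = set(query.lower().split())
    scored = [
        (len(q & set(t.description.lower().split())), t) for t in REPOSITORY
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for score, t in scored[:top_k] if score > 0]

# Only the matching subset is exposed to the model, keeping the active
# tool list within the ~50-100 tool budget mentioned above.
print([t.name for t in search_tools("read the finance data file")])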

Implementation Guidelines

A recommended path for building an MCP server:
Start with basic tools to understand the MCP Server architecture → move on to prompt design → configure resource management
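A minimal sketch of that progression, using the official MCP Python SDK's FastMCP helper; the finance-flavored tool, prompt, and resource are illustrative placeholders, not part of any real server.

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("finance-demo")

# Step 1: start with a basic tool.
@mcp.tool()
def get_balance(account_id: str) -> str:
    """Return the balance for an account (stubbed for this sketch)."""
    return f"Account {account_id}: $1,234.56"

# Step 2: move on to prompt design.
@mcp.prompt()
def monthly_report(month: str) -> str:
    """A reusable prompt template the client can surface to users."""
    return f"Write a spending summary for {month} using the available tools."

# Step 3: configure resource management.
@mcp.resource("accounts://{account_id}/profile")
def account_profile(account_id: str) -> str:
    """Expose read-only context the client can attach to conversations."""
    return f"Profile data for account {account_id}"

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport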

Automation and Integration

One noteworthy development is automatic MCP server generation: tools such as Cline can generate a new MCP server on the fly, in real time.

Maintenance and Evolution

Key principles for managing system changes:
  • Adhere to the MCP protocol to ensure baseline functionality
  • Allow tools to evolve dynamically
  • Maintain consistency in resource prompts
  • Preserve standardized tool invocation patterns

Building Effective Agents with MCP

Next, we introduce two critical MCP concepts: Sampling and Composability.

1. Sampling

In MCP, Sampling is a powerful capability that allows servers to request large language model (LLM) generations through the client. This enables servers to dynamically request model inference while executing tools, prompts, or workflows—without directly managing model access—while preserving security and privacy.

🤔 Why Use Sampling?

Traditionally, servers that rely on LLMs must manage API keys, model selection, and cost controls themselves. MCP Sampling fundamentally changes this:
  • Servers do not directly access LLMs
    • Servers request inference through the client without hosting or directly invoking models.
  • Users retain control
    • Clients can review and modify requests to ensure privacy and security compliance.
  • Flexible model selection
    • Clients choose the most appropriate model based on server preferences and available resources.

🔁 Sampling Workflow

  1. Server sends a request
     The server issues a sampling/createMessage request to the client, including conversation history, system prompts, and model preferences.
  2. Client reviews the request
     The client may inspect or modify the request to enforce user control and privacy.
  3. LLM inference execution
     The client selects an appropriate model and performs inference.
  4. Review of generated output
     The client inspects the output to ensure it meets expectations.
  5. Return results to the server
     The client sends the generation back to the server for further processing or user response.
This design ensures human-in-the-loop control while enabling nested LLM calls within multi-step agent workflows.

Request Format

Sampling requests use a standardized message format that includes:
  • messages: Conversation history, with roles (user, assistant) and content (text or images)
  • modelPreferences: Model hints, cost, speed, and intelligence priorities
  • systemPrompt: Optional system-level instruction
  • includeContext: Scope of included context (none, thisServer, allServers)
  • temperature: Controls randomness (0.0–1.0)
  • maxTokens: Maximum token count
  • stopSequences: Optional stop tokens
  • metadata: Provider-specific parameters
Official example request:
{ "method": "sampling/createMessage", "params": { "messages": [ { "role": "user", "content": { "type": "text", "text": "What files are in the current directory?" } } ], "systemPrompt": "You are a helpful file system assistant.", "includeContext": "thisServer", "maxTokens": 100 } }

🔐 Security and User Control

The design of Sampling emphasizes a human-in-the-loop philosophy, ensuring that users retain control over both model inputs and outputs. Through the client, users can:
  • Review and modify server requests to prevent potentially malicious or inappropriate usage
  • Enforce limits on model usage, such as daily quotas or restrictions on specific models
  • Inspect generated outputs to ensure they meet user expectations and safety standards
These mechanisms ensure that, even in multi-agent systems, the client retains final authority over all interactions with the model.
→ By centralizing all LLM interactions on the client side, requests can be collectively managed. In open-source setups, clients may even self-host their own LLMs, retaining full freedom over which model types are actually used.
  • For example, a server may request inference using specific parameters, such as preferred models. It might say, “I strongly prefer this particular version of Claude,” or “I need either a large or small model—please satisfy this if possible.” The server can pass along system prompts, task prompts, temperature, maximum token limits, and other parameters. However, the client is not obligated to comply. It may reject the request if it appears malicious, or impose strict limits based on privacy or cost considerations, including throttling request frequency.
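A client could enforce these policies with a thin review layer in front of its model. The sketch below is illustrative only: the request shape follows the sampling/createMessage format above, while the quota numbers, model allowlist, and call_llm helper are all hypothetical.

DAILY_QUOTA = 200                           # hypothetical per-user budget
ALLOWED_MODELS = {"small-local", "large-hosted"}

requests_today = 0

def call_llm(model: str, request: dict) -> str:
    """Placeholder for the client's actual model invocation."""
    raise NotImplementedError

def handle_sampling_request(request: dict) -> str:
    """Review a sampling/createMessage request before running inference."""
    global requests_today
    params = request["params"]

    # 1. Throttle: the client, not the server, owns the budget.
    if requests_today >= DAILY_QUOTA:
        raise PermissionError("daily sampling quota exhausted")

    # 2. Clamp server-supplied limits to client policy.
    params["maxTokens"] = min(params.get("maxTokens", 100), 1000)

    # 3. Honor model *hints* only if they pass the allowlist;
    #    otherwise fall back to a client-chosen default.
    hinted = params.get("modelPreferences", {}).get("hints", [])
    model = next(
        (h["name"] for h in hinted if h.get("name") in ALLOWED_MODELS),
        "small-local",
    )

    requests_today += 1
    return call_llm(model, request)  # output can be reviewed before returning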

2. Composability

Earlier, we mentioned that MCP separates clients and servers. However, this separation is logical rather than physical.
In other words, any application, API, or agent can simultaneously act as both an MCP Client and an MCP Server.

MCP Agent System Overview

Within the MCP architecture, agents can assume dual roles. Consider a real-world example: when a user interacts with Claude Desktop (an MCP client application), they may request a research agent to gather information. This research agent acts as both an MCP server and an MCP client. It can invoke multiple resources—such as file systems, data ingestion services, or web search tools—process the retrieved information, and return structured results.

Agent Chaining Architecture

This design forms a chained architecture:
User → Client/Server Combination A → Client/Server Combination B → …
Such chaining enables multi-layered LLM systems, where each layer focuses on a specific responsibility or task.
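A sketch of one link in that chain: a FastMCP server whose tool internally opens an MCP client session to a downstream server, making it both server and client at once. The imports follow the official Python SDK; the downstream command and the web_search tool are hypothetical.

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("research-agent")  # a server toward the user-facing client

@mcp.tool()
async def research(topic: str) -> str:
    """Serve the upstream request by acting as a *client* downstream."""
    downstream = StdioServerParameters(
        command="python", args=["web_search_server.py"]  # hypothetical server
    )
    async with stdio_client(downstream) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Hypothetical tool exposed by the downstream MCP server.
            result = await session.call_tool("web_search", arguments={"query": topic})
            return str(result.content)

if __name__ == "__main__":
    mcp.run()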

Common Questions and Answers

  • Error handling
    Q: How are cascading errors handled in multi-layer systems?
    A: This is a general multi-node challenge and depends on how each agent processes information flow. MCP itself neither simplifies nor complicates this issue.
  • Protocol choice
    Q: Why use MCP instead of HTTP?
    A: MCP offers richer capabilities, including resource notifications, bidirectional communication, and structured data requests, supporting complex asynchronous multi-step interactions.
  • Intelligent services
    Q: Why convert traditional services into MCP servers?
    A: Doing so grants services agent-like capabilities, enabling more autonomous and intelligent task handling.

System Control and Observability

  • LLMs reside at the application layer and control rate limits and interaction flow
  • The system behaves largely as a black box; observability depends on the implementation and surrounding ecosystem

Debugging and Security

  • Supports server-side debugging
  • Authorization and authentication are defined by the server implementer
  • Tool annotations enable fine-grained read/write permission control

3. Sampling + Composability

Sampling and Composability can also be combined to chain multiple agents together while ensuring that the client application maintains full control over the inference process.
  • In some cases, these agents may exist on the public internet or be developed by third parties
  • MCP allows you to connect to such agents while preserving privacy, security, and control guarantees
This raises a natural question: Couldn’t we just use RESTful APIs instead? Let’s compare the two approaches.

MCP vs. RESTful APIs

  • Design goal
    MCP: Orchestrates LLM reasoning and tool interactions with contextual logic
    RESTful API: Transfers data via CRUD operations
  • Data processing
    MCP: Supports complex transformations, context composition, and prompt logic
    RESTful API: Primarily structured data with fixed formats
  • Interactivity
    MCP: Multi-step, logic-driven conversations and tool chains
    RESTful API: Stateless request-response
  • LLM integration
    MCP: Deep integration for multi-step reasoning
    RESTful API: Minimal; usually passes inputs and outputs only
  • Use cases
    MCP: Multi-agent systems, intelligent assistants, context-aware applications
    RESTful API: Simple APIs and backend services
  • State management
    MCP: Maintains context and workflow state
    RESTful API: Stateless by design

What’s Next for MCP

As MCP gains adoption across agent applications and LLM ecosystems, new challenges emerge. To improve scalability and stability, the MCP team is advancing several key initiatives—most notably remote server support, formal authentication, and the MCP Registry API.

Remote MCP Servers and Authentication

MCP deployment is no longer limited to local processes (stdio). Many teams want to host MCP servers in the cloud or internal networks to support cross-system integration.

  • Server-Sent Events (SSE), paired with HTTP POST for client-to-server messages, is the recommended transport for low-latency remote communication
  • A formal authorization mechanism based on OAuth 2.1 is being specified alongside the remote transports
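Connecting to such a remote server from a Python client might look like the sketch below. It assumes the official Python SDK's SSE client helper (mcp.client.sse.sse_client); the endpoint URL is a placeholder.

import asyncio
from mcp import ClientSession
from mcp.client.sse import sse_client  # SSE transport helper in the Python SDK

async def main():
    # Placeholder endpoint; a real deployment would also attach auth headers.
    async with sse_client("https://mcp.example.com/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])

asyncio.run(main())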

Official MCP Registry API (In Development)


Why a Registry?

Today, MCP servers are scattered across GitHub, npm, PyPI, Rust ecosystems, and more—leading to:
  • Poor discoverability
  • Opaque transport details
  • No version tracking or trust verification

What Is the MCP Registry API?

The MCP team is building an open, centrally hosted Registry API to address this fragmentation.
  • Central hosting: Maintained by the MCP team; schemas and development are fully open-source
  • Ecosystem integration: Works with npm, PyPI, Cargo, Go Modules
  • Metadata queries: Protocols, transports, authors, and version history
  • Version diffing: Tracks added tools, updated descriptions, and API changes
Example: If Shopify publishes an official MCP server, developers can verify its authenticity, supported transports, authentication requirements, and available tools via the Registry.

FAQ

  • Self-hosted Registry support
    Q: Can organizations run their own MCP Registry?
    A: ✅ Yes. In addition to the public registry, organizations can deploy private registries and integrate them with development environments such as VS Code and Cursor. The Registry API only provides data interfaces; the UI is unrestricted.
  • Open vs. closed
    Q: Is MCP open-source? Does it support multiple model providers?
    A: ✅ Fully open. MCP is an open protocol. Claude is not the only client, and other LLM providers are free to implement it. Healthy competition benefits both users and developers.
  • Server-initiated interactions
    Q: Can MCP servers proactively trigger inference or interact with clients?
    A: ✅ Partially supported. Servers can currently notify clients of resource updates. Proactive sampling is not yet supported but is on the roadmap. Servers may also act as clients and use their own LLMs to implement advanced interaction and composition logic.
  • Model participation
    Q: Does every MCP interaction require an LLM?
    A: ❌ Not necessarily. Clients can directly invoke tools or resources without involving a model, enabling fully deterministic control flows.
  • Server-to-server communication
    Q: Can MCP servers communicate directly with each other?
    A: ✅ Flexible but not the default. By default, interactions are mediated by a client to ensure security and separation of concerns. However, the architecture allows implementers to design direct server-to-server communication if needed.

MCP Registry: Enabling Self-Evolving AI Agents 🦾

The MCP Registry fundamentally changes how agents are designed. Instead of relying on predefined tools, agents can dynamically discover, load, and use new capabilities at runtime.

How Do Agents “Evolve”?

Traditionally, agents must know in advance which tools are available. With the MCP Registry, agents can instead:
  1. Search the Registry to find the tools they need (for example, a Google Maps MCP server)
  2. Verify that the source is trusted, then connect via transports such as SSE
  3. Dynamically use the tool to complete tasks such as geolocation, distance calculation, or route planning
✅ No pre-packaging and no manual configuration. Agents can “grow stronger” based on the task at hand 🦾
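Since the Registry API is still under development, everything registry-specific in the sketch below (the base URL, the /servers search endpoint, the verified flag, the sse_url field) is an assumption; only the overall loop mirrors the three steps above.

import asyncio
import httpx
from mcp import ClientSession
from mcp.client.sse import sse_client

REGISTRY = "https://registry.example.com/api"  # hypothetical base URL

async def discover_and_connect(task_query: str) -> list[str]:
    # 1. Search the registry for a suitable server (hypothetical API shape).
    async with httpx.AsyncClient() as http:
        resp = await http.get(f"{REGISTRY}/servers", params={"q": task_query})
        server = resp.json()["servers"][0]  # hypothetical response field

    # 2. Verify trust before connecting (hypothetical "verified" flag).
    if not server.get("verified"):
        raise RuntimeError("refusing to use an unverified server")

    # 3. Connect over SSE and use whatever tools it advertises.
    async with sse_client(server["sse_url"]) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            return [t.name for t in tools.tools]

print(asyncio.run(discover_and_connect("google maps route planning")))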

🔐 Security and Access Control

The MCP team proposes DevOps-style mechanisms for governance and safety:
  • Private registries: Control which MCP servers are available to agents
  • Whitelisted queries: Restrict agents to searching and accessing only approved sources
  • Official verification badges: Allow agents to identify trusted providers such as Shopify or Grafana
  • Middleware filters: Add an additional filtering layer to block unauthorized server access

Summary

Traditional agent design → MCP-based design:
  • Tools are hard-coded and difficult to extend → tools can be dynamically discovered, loaded, and used
  • Manual integration and deployment → agents automatically discover and connect to tools
  • No verification or access control → the Registry provides verification, whitelists, and security filtering
→ The MCP Registry transforms agents from passive tool users into active, intelligent decision-makers.

Beyond the Registry: Integrating .well-known with MCP for Smarter Agents

As MCP pushes AI agents toward more dynamic and composable tool ecosystems, another key complementary mechanism emerges: .well-known/mcp.json. It not only addresses limitations of the MCP Registry, but also lays the groundwork for a more understandable and accessible web for intelligent agents.

What Is .well-known/mcp.json?

.well-known is a standardized location where websites expose metadata under a fixed URL prefix (for example, /.well-known/openid-configuration or /.well-known/security.txt). In the context of MCP, it allows a website to explicitly publish its MCP endpoint and available tools.

Example:

Suppose Shopify exposes https://shopify.com/.well-known/mcp.json with the following content:
{ "mcp_endpoint": "https://api.shopify.com/mcp", "tools": ["order.read", "product.update"], "auth": { "type": "oauth2", "token_url": "https://shopify.com/oauth/token" } }
This means that if a user tells an agent, “Help me manage my Shopify store,” the agent can directly query this .json file and obtain everything it needs to integrate—without prior knowledge of the tools, APIs, or authentication mechanisms.
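Concretely, step one for the agent can be a single HTTP fetch. The sketch below uses httpx; the Shopify URL is the article's hypothetical example, so the request will not actually resolve.

import httpx

def discover_mcp(site: str) -> dict | None:
    """Fetch a site's self-published MCP description, if it has one."""
    try:
        resp = httpx.get(f"https://{site}/.well-known/mcp.json", timeout=5)
        resp.raise_for_status()
        return resp.json()
    except httpx.HTTPError:
        return None  # site does not advertise an MCP endpoint

manifest = discover_mcp("shopify.com")  # hypothetical, as in the example above
if manifest:
    print(manifest["mcp_endpoint"])     # e.g. https://api.shopify.com/mcp
    print(manifest["tools"])            # e.g. ["order.read", "product.update"]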

Registry vs. .well-known

  • Discovery model
    Registry: Search and explore unknown servers from a central platform
    .well-known: Direct lookup for a known website
  • Use case
    Registry: Discover tools from scratch
    .well-known: Quickly integrate with a known service (e.g., shopify.com)
  • Perspective
    Registry: Bottom-up (discovering tools)
    .well-known: Top-down (site advertises its capabilities)
  • Relationship
    Registry: Global index and verification layer
    .well-known: Site-specific interface exposure and integration guide
These two approaches are complementary, allowing agents both to explore new tools and to integrate quickly with known platforms.

Combined with Computer Use Models: Toward General-Purpose Agents

In October 2024, Anthropic introduced the Computer Use Model, enabling agents to:
  • Interact with user interfaces when APIs are unavailable
  • Log in, click buttons, and navigate pages automatically
  • Operate interfaces they have never seen before

A Hybrid Agent Strategy

We can envision agents operating as follows:
  • If a website provides .well-known/mcp.json, the agent uses MCP-based API integration (fast, stable, secure)
  • If not, the agent falls back to UI-level interaction via Computer Use models, simulating human behavior
Combining MCP with Computer Use models may be the future solution for integrating long-tail websites.
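A sketch of that decision logic, reusing the hypothetical discover_mcp helper from the .well-known section above; run_mcp_task and ui_automation_fallback are placeholders for MCP integration and a Computer Use-style driver, respectively.

import httpx

def discover_mcp(site: str) -> dict | None:
    """Same helper as in the .well-known sketch above."""
    try:
        resp = httpx.get(f"https://{site}/.well-known/mcp.json", timeout=5)
        resp.raise_for_status()
        return resp.json()
    except httpx.HTTPError:
        return None

def run_mcp_task(endpoint: str, task: str) -> str:
    """Placeholder: drive the declared MCP endpoint (hypothetical helper)."""
    raise NotImplementedError

def ui_automation_fallback(site: str, task: str) -> str:
    """Placeholder: drive the site's UI with a Computer Use-style model."""
    raise NotImplementedError

def integrate(site: str, task: str) -> str:
    manifest = discover_mcp(site)
    if manifest:
        # Fast, stable, secure: the site told us exactly how to integrate.
        return run_mcp_task(manifest["mcp_endpoint"], task)
    # Long-tail site with no published endpoint: simulate human interaction.
    return ui_automation_fallback(site, task)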

Final Takeaway: Teaching Agents to Check .well-known Is a Step Toward a Web-Native AI

  • .well-known/mcp.json allows websites to “introduce themselves” to agents
  • It complements the discovery layer of the MCP Registry, enabling both proactive exploration and passive registration
  • When combined with Computer Use models, agents can:
    • Use APIs when available
    • Fall back to UI automation when APIs do not exist
As a result, agents are no longer limited to built-in toolkits. They can truly navigate the web, discover tools, and integrate services autonomously.

Closing Thoughts

This article on MCP is long, but there is still much more to explore. I hope these notes were helpful.
This was my first time writing up notes like these, and it made me realize how challenging the process is: not only the time and mental effort, but also ensuring technical accuracy and clarity.