Note: MCP Ep.2

Created: Apr 18, 2025 03:35 AM
Tags: AI, Note

Intro

Recently, Google released the A2A protocol, extending the Model Context Protocol (MCP) proposed by Anthropic last year. Before diving into A2A, I wanted to take some time to document my notes on MCP.
📌 This article focuses on more advanced MCP topics, including MCP Agents, Sampling, Composability, Registry, and future developments.
Friendly reminder: this article turned out longer than expected, so it is split into two parts. Feel free to jump to the sections you are most interested in 🥸

MCP Agents and System Architecture

An MCP agent framework provides the following core capabilities:
  • A simple and flexible context injection mechanism
  • Declarative frameworks for implementing various workflows
  • A rich set of building blocks for constructing agents
Taking LastMile AI’s MCP Agent as an example, it provides a lightweight framework for building agents using MCP 🤖
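For a feel of the shape of such a framework, here is a rough "hello world" sketch. It is written from memory of mcp-agent's documented usage, so treat every import path and method name below as an assumption rather than the library's exact API.

import asyncio
from mcp_agent.app import MCPApp          # assumed import path
from mcp_agent.agents.agent import Agent  # assumed import path

app = MCPApp(name="hello_agent")

async def main():
    async with app.run():
        # Declarative setup: context injection happens via the instruction,
        # and the listed MCP servers supply the building blocks (tools).
        finder = Agent(
            name="finder",
            instruction="You can read files and fetch URLs on request.",
            server_names=["filesystem", "fetch"],  # configured elsewhere
        )
        async with finder:
            tools = await finder.list_tools()  # aggregated from the servers
            print([t.name for t in tools.tools])

asyncio.run(main())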

System Architecture Considerations

Component responsibilities:
  • Client
    • Does not need to handle retry logic
    • Does not need to manage logging details
  • Server
    • Closer to the final application
    • Has greater control over system-level interactions

Scalability and Limitations

Current model capacity constraints:
  • Standard models (e.g., Claude): ~50–100 tools
  • Advanced models: several hundred tools

Tool Management Strategies

To manage large toolsets effectively, the following approaches may be useful:
  • Tool search systems (see the sketch after this list)
    • A RAG-style abstraction layer over tools
    • Fuzzy search over a tool repository
  • Hierarchical system design
    • For example: financial tools, data reading tools, data writing tools
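To make the first strategy concrete, here is a minimal sketch of a tool search layer in pure Python. It ranks tools by keyword overlap between the query and each tool's description; a production system would swap in embeddings for true RAG-style retrieval, but the abstraction (a searchable repository sitting in front of the model) is the same. All tool names are hypothetical.

from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    description: str

# A hypothetical repository of (potentially hundreds of) registered tools.
REPOSITORY = [
    Tool("read_csv", "read rows from a csv data file"),
    Tool("write_db", "write records into the finance database"),
    Tool("fx_rates", "look up current foreign exchange rates"),
]

def search_tools(query: str, top_k: int = 5) -> list[Tool]:
    """Fuzzy search: rank tools by word overlap with the query."""
    q = set(query.lower().split())
    scored = [
        (len(q & set(t.description.lower().split())), t) for t in REPOSITORY
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for score, t in scored[:top_k] if score > 0]

# Only the matching subset is exposed to the model, keeping the active
# tool list within the ~50-100 tool budget mentioned above.
print([t.name for t in search_tools("read the finance data file")])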

Implementation Guidelines

A recommended path for building an MCP server:
Start with basic tools to understand the MCP Server architecture → move on to prompt design → configure resource management
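A minimal sketch of that progression, using the official MCP Python SDK's FastMCP helper; the finance-flavored tool, prompt, and resource are illustrative placeholders, not part of any real server.

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("finance-demo")

# Step 1: start with a basic tool.
@mcp.tool()
def get_balance(account_id: str) -> str:
    """Return the balance for an account (stubbed for this sketch)."""
    return f"Account {account_id}: $1,234.56"

# Step 2: move on to prompt design.
@mcp.prompt()
def monthly_report(month: str) -> str:
    """A reusable prompt template the client can surface to users."""
    return f"Write a spending summary for {month} using the available tools."

# Step 3: configure resource management.
@mcp.resource("accounts://{account_id}/profile")
def account_profile(account_id: str) -> str:
    """Expose read-only context the client can attach to conversations."""
    return f"Profile data for account {account_id}"

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport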

Automation and Integration

One noteworthy development is automatic MCP server generation: tools such as Cline can generate a new MCP server on the fly, in real time.

Maintenance and Evolution

Key principles for managing system changes:
  • Adhere to the MCP protocol to ensure baseline functionality
  • Allow tools to evolve dynamically
  • Maintain consistency in resource prompts
  • Preserve standardized tool invocation patterns

Building Effective Agents with MCP

Next, we introduce two critical MCP concepts: Sampling and Composability.

1. Sampling

In MCP, Sampling is a powerful capability that allows servers to request large language model (LLM) generations through the client. This enables servers to dynamically request model inference while executing tools, prompts, or workflows—without directly managing model access—while preserving security and privacy.

🤔 Why Use Sampling?

Traditionally, servers that rely on LLMs must manage API keys, model selection, and cost controls themselves. MCP Sampling fundamentally changes this:
  • Servers do not directly access LLMs
    • Servers request inference through the client without hosting or directly invoking models.
  • Users retain control
    • Clients can review and modify requests to ensure privacy and security compliance.
  • Flexible model selection
    • Clients choose the most appropriate model based on server preferences and available resources.

🔁 Sampling Workflow

  1. Server sends a request
     The server issues a sampling/createMessage request to the client, including conversation history, system prompts, and model preferences.
  2. Client reviews the request
     The client may inspect or modify the request to enforce user control and privacy.
  3. LLM inference execution
     The client selects an appropriate model and performs inference.
  4. Review of generated output
     The client inspects the output to ensure it meets expectations.
  5. Return results to the server
     The client sends the generation back to the server for further processing or user response.
This design ensures human-in-the-loop control while enabling nested LLM calls within multi-step agent workflows.

Request Format

Sampling requests use a standardized message format that includes:
  • messages: Conversation history, with roles (user, assistant) and content (text or images)
  • modelPreferences: Model hints, cost, speed, and intelligence priorities
  • systemPrompt: Optional system-level instruction
  • includeContext: Scope of included context (none, thisServer, allServers)
  • temperature: Controls randomness (0.0–1.0)
  • maxTokens: Maximum token count
  • stopSequences: Optional stop tokens
  • metadata: Provider-specific parameters
Official example request:
{ "method": "sampling/createMessage", "params": { "messages": [ { "role": "user", "content": { "type": "text", "text": "What files are in the current directory?" } } ], "systemPrompt": "You are a helpful file system assistant.", "includeContext": "thisServer", "maxTokens": 100 } }

🔐 Security and User Control

The design of Sampling emphasizes a human-in-the-loop philosophy, ensuring that users retain control over both model inputs and outputs. Through the client, users can:
  • Review and modify server requests to prevent potentially malicious or inappropriate usage
  • Enforce limits on model usage, such as daily quotas or restrictions on specific models
  • Inspect generated outputs to ensure they meet user expectations and safety standards
These mechanisms ensure that, even in multi-agent systems, the client retains final authority over all interactions with the model.
→ By centralizing all LLM interactions on the client side, requests can be collectively managed. In open-source setups, clients may even self-host their own LLMs, retaining full freedom over which model types are actually used.
  • For example, a server may request inference using specific parameters, such as preferred models. It might say, “I strongly prefer this particular version of Claude,” or “I need either a large or small model—please satisfy this if possible.” The server can pass along system prompts, task prompts, temperature, maximum token limits, and other parameters. However, the client is not obligated to comply. It may reject the request if it appears malicious, or impose strict limits based on privacy or cost considerations, including throttling request frequency.
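A client could enforce these policies with a thin review layer in front of its model. The sketch below is illustrative only: the request shape follows the sampling/createMessage format above, while the quota numbers, model allowlist, and call_llm helper are all hypothetical.

DAILY_QUOTA = 200                           # hypothetical per-user budget
ALLOWED_MODELS = {"small-local", "large-hosted"}

requests_today = 0

def call_llm(model: str, request: dict) -> str:
    """Placeholder for the client's actual model invocation."""
    raise NotImplementedError

def handle_sampling_request(request: dict) -> str:
    """Review a sampling/createMessage request before running inference."""
    global requests_today
    params = request["params"]

    # 1. Throttle: the client, not the server, owns the budget.
    if requests_today >= DAILY_QUOTA:
        raise PermissionError("daily sampling quota exhausted")

    # 2. Clamp server-supplied limits to client policy.
    params["maxTokens"] = min(params.get("maxTokens", 100), 1000)

    # 3. Honor model *hints* only if they pass the allowlist;
    #    otherwise fall back to a client-chosen default.
    hinted = params.get("modelPreferences", {}).get("hints", [])
    model = next(
        (h["name"] for h in hinted if h.get("name") in ALLOWED_MODELS),
        "small-local",
    )

    requests_today += 1
    return call_llm(model, request)  # output can be reviewed before returning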

2. Composability

Earlier, we mentioned that MCP separates clients and servers. However, this separation is logical rather than physical.
In other words, any application, API, or agent can simultaneously act as both an MCP Client and an MCP Server.

MCP Agent System Overview

Within the MCP architecture, agents can assume dual roles. Consider a real-world example: when a user interacts with Claude Desktop (an MCP client application), they may request a research agent to gather information. This research agent acts as both an MCP server and an MCP client. It can invoke multiple resources—such as file systems, data ingestion services, or web search tools—process the retrieved information, and return structured results.

Agent Chaining Architecture

This design forms a chained architecture:
User → Client/Server Combination A → Client/Server Combination B → …
Such chaining enables multi-layered LLM systems, where each layer focuses on a specific responsibility or task.
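A sketch of one link in that chain: a FastMCP server whose tool internally opens an MCP client session to a downstream server, making it both server and client at once. The imports follow the official Python SDK; the downstream command and the web_search tool are hypothetical.

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("research-agent")  # a server toward the user-facing client

@mcp.tool()
async def research(topic: str) -> str:
    """Serve the upstream request by acting as a *client* downstream."""
    downstream = StdioServerParameters(
        command="python", args=["web_search_server.py"]  # hypothetical server
    )
    async with stdio_client(downstream) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Hypothetical tool exposed by the downstream MCP server.
            result = await session.call_tool("web_search", arguments={"query": topic})
            return str(result.content)

if __name__ == "__main__":
    mcp.run()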

Common Questions and Answers

  • Error handling
    Q: How are cascading errors handled in multi-layer systems?
    A: This is a general multi-node challenge and depends on how each agent processes information flow. MCP itself neither simplifies nor complicates this issue.
  • Protocol choice
    Q: Why use MCP instead of HTTP?
    A: MCP offers richer capabilities, including resource notifications, bidirectional communication, and structured data requests, supporting complex asynchronous multi-step interactions.
  • Intelligent services
    Q: Why convert traditional services into MCP servers?
    A: Doing so grants services agent-like capabilities, enabling more autonomous and intelligent task handling.

System Control and Observability

  • LLMs reside at the application layer and control rate limits and interaction flow
  • The system behaves largely as a black box; observability depends on the implementation and surrounding ecosystem

Debugging and Security

  • Supports server-side debugging
  • Authorization and authentication are defined by the server implementer
  • Tool annotations enable fine-grained read/write permission control

3. Sampling + Composability

Sampling and Composability can also be combined to chain multiple agents together while ensuring that the client application maintains full control over the inference process.
  • In some cases, these agents may exist on the public internet or be developed by third parties
  • MCP allows you to connect to such agents while preserving privacy, security, and control guarantees
This raises a natural question: Couldn’t we just use RESTful APIs instead? Let’s compare the two approaches.

MCP vs. RESTful APIs

  • Design goal
    MCP: Orchestrates LLM reasoning and tool interactions with contextual logic
    RESTful API: Transfers data via CRUD operations
  • Data processing
    MCP: Supports complex transformations, context composition, and prompt logic
    RESTful API: Primarily structured data with fixed formats
  • Interactivity
    MCP: Multi-step, logic-driven conversations and tool chains
    RESTful API: Stateless request-response
  • LLM integration
    MCP: Deep integration for multi-step reasoning
    RESTful API: Minimal; usually passes inputs and outputs only
  • Use cases
    MCP: Multi-agent systems, intelligent assistants, context-aware applications
    RESTful API: Simple APIs and backend services
  • State management
    MCP: Maintains context and workflow state
    RESTful API: Stateless by design

What’s Next for MCP

As MCP gains adoption across agent applications and LLM ecosystems, new challenges emerge. To improve scalability and stability, the MCP team is advancing several key initiatives—most notably remote server support, formal authentication, and the MCP Registry API.

Remote MCP Servers and Authentication

MCP deployment is no longer limited to local processes (stdio). Many teams want to host MCP servers in the cloud or internal networks to support cross-system integration.

  • Server-Sent Events (SSE), paired with HTTP POST for client-to-server messages, is the recommended transport for low-latency remote communication
  • A formal authorization mechanism based on OAuth 2.1 is being specified alongside the remote transports
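Connecting to such a remote server from a Python client might look like the sketch below. It assumes the official Python SDK's SSE client helper (mcp.client.sse.sse_client); the endpoint URL is a placeholder.

import asyncio
from mcp import ClientSession
from mcp.client.sse import sse_client  # SSE transport helper in the Python SDK

async def main():
    # Placeholder endpoint; a real deployment would also attach auth headers.
    async with sse_client("https://mcp.example.com/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])

asyncio.run(main())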

Official MCP Registry API (In Development)


Why a Registry?

Today, MCP servers are scattered across GitHub, npm, PyPI, Rust ecosystems, and more—leading to:
  • Poor discoverability
  • Opaque transport details
  • No version tracking or trust verification

What Is the MCP Registry API?

The MCP team is building an open, centrally hosted Registry API to address this fragmentation.
  • Central hosting: Maintained by the MCP team; schemas and development are fully open-source
  • Ecosystem integration: Works with npm, PyPI, Cargo, Go Modules
  • Metadata queries: Protocols, transports, authors, and version history
  • Version diffing: Tracks added tools, updated descriptions, and API changes
Example: If Shopify publishes an official MCP server, developers can verify its authenticity, supported transports, authentication requirements, and available tools via the Registry.

FAQ

  • Self-hosted Registry support
    Q: Can organizations run their own MCP Registry?
    A: ✅ Yes. In addition to the public registry, organizations can deploy private registries and integrate them with development environments such as VS Code and Cursor. The Registry API only provides data interfaces; the UI is unrestricted.
  • Open vs. closed
    Q: Is MCP open-source? Does it support multiple model providers?
    A: ✅ Fully open. MCP is an open protocol. Claude is not the only client, and other LLM providers are free to implement it. Healthy competition benefits both users and developers.
  • Server-initiated interactions
    Q: Can MCP servers proactively trigger inference or interact with clients?
    A: ✅ Partially supported. Servers can currently notify clients of resource updates. Proactive sampling is not yet supported but is on the roadmap. Servers may also act as clients and use their own LLMs to implement advanced interaction and composition logic.
  • Model participation
    Q: Does every MCP interaction require an LLM?
    A: ❌ Not necessarily. Clients can directly invoke tools or resources without involving a model, enabling fully deterministic control flows.
  • Server-to-server communication
    Q: Can MCP servers communicate directly with each other?
    A: ✅ Flexible but not the default. By default, interactions are mediated by a client to ensure security and separation of concerns. However, the architecture allows implementers to design direct server-to-server communication if needed.

MCP Registry: Enabling Self-Evolving AI Agents 🦾

The MCP Registry fundamentally changes how agents are designed. Instead of relying on predefined tools, agents can dynamically discover, load, and use new capabilities at runtime.

How Do Agents “Evolve”?

Traditionally, agents must know in advance which tools are available. With the MCP Registry, agents can instead:
  1. Search the Registry to find the tools they need (for example, a Google Maps MCP server)
  2. Verify that the source is trusted, then connect via transports such as SSE
  3. Dynamically use the tool to complete tasks such as geolocation, distance calculation, or route planning
✅ No pre-packaging and no manual configuration. Agents can “grow stronger” based on the task at hand 🦾
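Since the Registry API is still under development, everything registry-specific in the sketch below (the base URL, the /servers search endpoint, the verified flag, the sse_url field) is an assumption; only the overall loop mirrors the three steps above.

import asyncio
import httpx
from mcp import ClientSession
from mcp.client.sse import sse_client

REGISTRY = "https://registry.example.com/api"  # hypothetical base URL

async def discover_and_connect(task_query: str) -> list[str]:
    # 1. Search the registry for a suitable server (hypothetical API shape).
    async with httpx.AsyncClient() as http:
        resp = await http.get(f"{REGISTRY}/servers", params={"q": task_query})
        server = resp.json()["servers"][0]  # hypothetical response field

    # 2. Verify trust before connecting (hypothetical "verified" flag).
    if not server.get("verified"):
        raise RuntimeError("refusing to use an unverified server")

    # 3. Connect over SSE and use whatever tools it advertises.
    async with sse_client(server["sse_url"]) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            return [t.name for t in tools.tools]

print(asyncio.run(discover_and_connect("google maps route planning")))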

🔐 Security and Access Control

The MCP team proposes DevOps-style mechanisms for governance and safety:
  • Private registries: Control which MCP servers are available to agents
  • Whitelisted queries: Restrict agents to searching and accessing only approved sources
  • Official verification badges: Allow agents to identify trusted providers such as Shopify or Grafana
  • Middleware filters: Add an additional filtering layer to block unauthorized server access

Summary

Traditional agent design → MCP-based design:
  • Tools are hard-coded and difficult to extend → tools can be dynamically discovered, loaded, and used
  • Manual integration and deployment → agents automatically discover and connect to tools
  • No verification or access control → the Registry provides verification, whitelists, and security filtering
→ The MCP Registry transforms agents from passive tool users into active, intelligent decision-makers.

Beyond the Registry: Integrating .well-known with MCP for Smarter Agents

As MCP pushes AI agents toward more dynamic and composable tool ecosystems, another key complementary mechanism emerges: .well-known/mcp.json. It not only addresses limitations of the MCP Registry, but also lays the groundwork for a more understandable and accessible web for intelligent agents.

What Is .well-known/mcp.json?

.well-known is a standardized location where websites expose metadata under a fixed URL prefix (for example, /.well-known/openid-configuration or /.well-known/security.txt). In the context of MCP, it allows a website to explicitly publish its MCP endpoint and available tools.

Example:

Suppose Shopify exposes https://shopify.com/.well-known/mcp.json with the following content:
{ "mcp_endpoint": "https://api.shopify.com/mcp", "tools": ["order.read", "product.update"], "auth": { "type": "oauth2", "token_url": "https://shopify.com/oauth/token" } }
This means that if a user tells an agent, “Help me manage my Shopify store,” the agent can directly query this .json file and obtain everything it needs to integrate—without prior knowledge of the tools, APIs, or authentication mechanisms.
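Concretely, step one for the agent can be a single HTTP fetch. The sketch below uses httpx; the Shopify URL is the article's hypothetical example, so the request will not actually resolve.

import httpx

def discover_mcp(site: str) -> dict | None:
    """Fetch a site's self-published MCP description, if it has one."""
    try:
        resp = httpx.get(f"https://{site}/.well-known/mcp.json", timeout=5)
        resp.raise_for_status()
        return resp.json()
    except httpx.HTTPError:
        return None  # site does not advertise an MCP endpoint

manifest = discover_mcp("shopify.com")  # hypothetical, as in the example above
if manifest:
    print(manifest["mcp_endpoint"])     # e.g. https://api.shopify.com/mcp
    print(manifest["tools"])            # e.g. ["order.read", "product.update"]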

Registry vs. .well-known

  • Discovery model
    Registry: Search and explore unknown servers from a central platform
    .well-known: Direct lookup for a known website
  • Use case
    Registry: Discover tools from scratch
    .well-known: Quickly integrate with a known service (e.g., shopify.com)
  • Perspective
    Registry: Bottom-up (discovering tools)
    .well-known: Top-down (site advertises its capabilities)
  • Relationship
    Registry: Global index and verification layer
    .well-known: Site-specific interface exposure and integration guide
These two approaches are complementary, allowing agents both to explore new tools and to integrate quickly with known platforms.

Combined with Computer Use Models: Toward General-Purpose Agents

In October 2024, Anthropic introduced the Computer Use Model, enabling agents to:
  • Interact with user interfaces when APIs are unavailable
  • Log in, click buttons, and navigate pages automatically
  • Operate interfaces they have never seen before

A Hybrid Agent Strategy

We can envision agents operating as follows:
  • If a website provides .well-known/mcp.json, the agent uses MCP-based API integration (fast, stable, secure)
  • If not, the agent falls back to UI-level interaction via Computer Use models, simulating human behavior
Combining MCP with Computer Use models may be the future solution for integrating long-tail websites.
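A sketch of that decision logic, reusing the hypothetical discover_mcp helper from the .well-known section above; run_mcp_task and ui_automation_fallback are placeholders for MCP integration and a Computer Use-style driver, respectively.

import httpx

def discover_mcp(site: str) -> dict | None:
    """Same helper as in the .well-known sketch above."""
    try:
        resp = httpx.get(f"https://{site}/.well-known/mcp.json", timeout=5)
        resp.raise_for_status()
        return resp.json()
    except httpx.HTTPError:
        return None

def run_mcp_task(endpoint: str, task: str) -> str:
    """Placeholder: drive the declared MCP endpoint (hypothetical helper)."""
    raise NotImplementedError

def ui_automation_fallback(site: str, task: str) -> str:
    """Placeholder: drive the site's UI with a Computer Use-style model."""
    raise NotImplementedError

def integrate(site: str, task: str) -> str:
    manifest = discover_mcp(site)
    if manifest:
        # Fast, stable, secure: the site told us exactly how to integrate.
        return run_mcp_task(manifest["mcp_endpoint"], task)
    # Long-tail site with no published endpoint: simulate human interaction.
    return ui_automation_fallback(site, task)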

Final Takeaway: Teaching Agents to Check .well-known Is a Step Toward a Web-Native AI

  • .well-known/mcp.json allows websites to “introduce themselves” to agents
  • It complements the discovery layer of the MCP Registry, enabling both proactive exploration and passive registration
  • When combined with Computer Use models, agents can:
    • Use APIs when available
    • Fall back to UI automation when APIs do not exist
As a result, agents are no longer limited to built-in toolkits. They can truly navigate the web, discover tools, and integrate services autonomously.

Closing Thoughts

This article on MCP is long, but there is still much more to explore. I hope these notes were helpful.
This was my first time writing up notes like these, and it made me realize how challenging the process is: not only the time and mental effort, but also ensuring technical accuracy and clarity.