mcp2cli

E2e Conformance Testing

Build structured, reproducible test suites that validate your MCP server against the full protocol specification — using nothing but bash and mcp2cli.


The Problem

You’ve built an MCP server. How do you know it actually conforms to the spec? How do you prevent regressions? Writing a custom MCP client in code just to test your server is circular — you’d be testing your test client as much as the server.

mcp2cli is already a fully compliant MCP client. Use it as a test harness.

graph LR
    T["Test Script<br/>(bash)"] -->|"invokes"| CLI["mcp2cli<br/>(compliant MCP client)"]
    CLI -->|"MCP protocol"| SERVER["Your MCP Server"]
    CLI -->|"JSON output"| T
    T -->|"assert"| RESULT["Pass / Fail"]

Test Framework in 30 Lines

Here’s a minimal bash test framework that wraps mcp2cli:

#!/bin/bash
# mcp-test-framework.sh — source this in your test scripts
PASS=0; FAIL=0; SKIP=0; ERRORS=""
SERVER="${MCP_TEST_SERVER:-}" # Set via --url or --stdio
_mcp() { mcp2cli $SERVER --json --timeout "${MCP_TIMEOUT:-30}" "$@" 2>/dev/null; }
assert_ok() {
local name="$1"; shift
if output=$(_mcp "$@"); then
echo " ✅ $name"; ((PASS++))
else
echo " ❌ $name (exit: $?)"; ((FAIL++))
ERRORS+="FAIL: $name\n"
fi
}
assert_fail() {
local name="$1"; shift
if _mcp "$@" >/dev/null 2>&1; then
echo " ❌ $name (expected failure, got success)"; ((FAIL++))
ERRORS+="FAIL: $name\n"
else
echo " ✅ $name (correctly rejected)"; ((PASS++))
fi
}
assert_json() {
local name="$1" jq_expr="$2"; shift 2
local output
if output=$(_mcp "$@"); then
if echo "$output" | jq -e "$jq_expr" >/dev/null 2>&1; then
echo " ✅ $name"; ((PASS++))
else
echo " ❌ $name (assertion failed: $jq_expr)"; ((FAIL++))
ERRORS+="FAIL: $name — jq: $jq_expr\n"
fi
else
echo " ❌ $name (command failed)"; ((FAIL++))
ERRORS+="FAIL: $name\n"
fi
}
skip_if() {
local reason="$1" name="$2"
echo " ⏭️ $name (skipped: $reason)"; ((SKIP++))
}
summary() {
echo ""
echo "═══════════════════════════════════════"
echo " Results: $PASS passed, $FAIL failed, $SKIP skipped"
echo "═══════════════════════════════════════"
if [ -n "$ERRORS" ]; then
echo ""; echo "Failures:"; echo -e "$ERRORS"
fi
exit $FAIL
}

Conformance Test Suite

A complete conformance suite organized by MCP spec section:

1. Lifecycle & Initialization

#!/bin/bash
source ./mcp-test-framework.sh
SERVER="--url http://localhost:3001/mcp"
echo "=== 1. Lifecycle & Initialization ==="
# 1.1 Server responds to initialize
assert_json "initialize succeeds" \
'.data' \
inspect
# 1.2 Server reports valid protocol version
assert_json "protocol version is 2025-11-25 or compatible" \
'.data.protocol_version' \
inspect
# 1.3 Server declares capabilities
assert_json "server declares capabilities object" \
'.data.capabilities | type == "object"' \
inspect
# 1.4 Server provides name and version
assert_json "server info has name" \
'.data.server_info.name | length > 0' \
inspect
# 1.5 Ping works after init
assert_ok "ping after initialization" ping
summary

2. Discovery

Terminal window
echo "=== 2. Discovery ==="
# 2.1 tools/list returns array
assert_json "tools/list returns items array" \
'.data.items | type == "array"' \
ls --tools
# 2.2 Each tool has required fields
assert_json "tools have id and kind" \
'[.data.items[] | select(.id and .kind)] | length == (.data.items | length)' \
ls --tools
# 2.3 resources/list returns array
assert_json "resources/list returns items array" \
'.data.items | type == "array"' \
ls --resources
# 2.4 prompts/list returns array
assert_json "prompts/list returns items array" \
'.data.items | type == "array"' \
ls --prompts
# 2.5 Filtered discovery works
TOOL_COUNT=$(_mcp ls --tools | jq '.data.items | length')
assert_ok "discovery returns $TOOL_COUNT tools" ls --tools
summary

3. Tool Invocation

Terminal window
echo "=== 3. Tool Invocation ==="
# Dynamically test each tool
TOOLS=$(_mcp ls --tools | jq -r '.data.items[].id')
for tool in $TOOLS; do
assert_ok "tool call: $tool" "$tool"
done
# 3.1 Required argument enforcement
# Find a tool with required args and test without them
TOOL_WITH_REQUIRED=$(_mcp ls --tools | jq -r '
.data.items[] | select(.summary | test("required"; "i")) | .id
' | head -1)
if [ -n "$TOOL_WITH_REQUIRED" ]; then
assert_fail "required args enforced on $TOOL_WITH_REQUIRED" \
"$TOOL_WITH_REQUIRED"
else
skip_if "no tool with known required args" "required arg enforcement"
fi
summary

4. Resource Reading

Terminal window
echo "=== 4. Resource Reading ==="
# 4.1 Read each concrete resource
RESOURCES=$(_mcp ls --resources | jq -r '
.data.items[] | select(.kind == "resource") | .id
')
for uri in $RESOURCES; do
assert_ok "resource read: $uri" get "$uri"
done
# 4.2 Invalid URI returns error
assert_fail "reject invalid resource URI" get "invalid://nonexistent/resource"
summary

5. Prompts

Terminal window
echo "=== 5. Prompt Execution ==="
PROMPTS=$(_mcp ls --prompts | jq -r '.data.items[].id')
for prompt in $PROMPTS; do
assert_ok "prompt: $prompt" "$prompt"
done
summary

6. Capability-Gated Features

Terminal window
echo "=== 6. Capability-Gated Features ==="
# Check if specific capabilities are advertised
CAPS=$(_mcp inspect | jq '.data.capabilities')
# 6.1 Logging
if echo "$CAPS" | jq -e '.logging' >/dev/null 2>&1; then
assert_ok "set log level: debug" log debug
assert_ok "set log level: warn" log warn
else
skip_if "logging not advertised" "log level setting"
fi
# 6.2 Completions
if echo "$CAPS" | jq -e '.completions' >/dev/null 2>&1; then
assert_ok "completion request" complete ref/prompt "" "" ""
else
skip_if "completions not advertised" "completion request"
fi
# 6.3 Resource subscriptions
if echo "$CAPS" | jq -e '.resources.subscribe' >/dev/null 2>&1; then
FIRST_RESOURCE=$(_mcp ls --resources | jq -r '.data.items[0].id // empty')
if [ -n "$FIRST_RESOURCE" ]; then
assert_ok "subscribe to resource" subscribe "$FIRST_RESOURCE"
assert_ok "unsubscribe from resource" unsubscribe "$FIRST_RESOURCE"
fi
else
skip_if "subscribe not advertised" "resource subscription"
fi
summary

Flow Testing: Script Multi-Step Scenarios

Beyond individual method conformance, test realistic multi-step workflows that exercise state and sequencing:

Auth → Discover → Invoke → Verify Flow

#!/bin/bash
source ./mcp-test-framework.sh
SERVER="--url http://localhost:3001/mcp"
echo "=== Flow: Auth → Discover → Invoke → Verify ==="
# Step 1: Health check
assert_ok "1. server is healthy" doctor
# Step 2: Discovery succeeds and returns non-empty
assert_json "2. discovery finds at least 1 capability" \
'.data.items | length > 0' \
ls
# Step 3: Pick the first tool
FIRST_TOOL=$(_mcp ls --tools | jq -r '.data.items[0].id // empty')
if [ -z "$FIRST_TOOL" ]; then
echo " ⏭️ No tools found, skipping invoke flow"
summary
fi
# Step 4: Invoke it
assert_ok "3. invoke first tool: $FIRST_TOOL" "$FIRST_TOOL"
# Step 5: Verify result has content
assert_json "4. result has content array" \
'.data.content | type == "array"' \
"$FIRST_TOOL"
# Step 6: Invoke again — idempotency check
RESULT1=$(_mcp "$FIRST_TOOL" | jq -S '.data.content')
RESULT2=$(_mcp "$FIRST_TOOL" | jq -S '.data.content')
if [ "$RESULT1" = "$RESULT2" ]; then
echo " ✅ 5. idempotent tool invocation"; ((PASS++))
else
echo " ⚠️ 5. non-idempotent tool (not necessarily a bug)"; ((PASS++))
fi
summary

Discovery Cache Invalidation Flow

Terminal window
echo "=== Flow: Cache Invalidation ==="
# Step 1: First discovery (populates cache)
assert_ok "1. initial discovery" ls
# Step 2: Second discovery (should use cache — fast)
START=$(date +%s%N)
assert_ok "2. cached discovery" ls
END=$(date +%s%N)
CACHED_MS=$(( (END - START) / 1000000 ))
echo " cache hit latency: ${CACHED_MS}ms"
# Step 3: Verify capabilities are consistent across calls
COUNT1=$(_mcp ls --tools | jq '.data.items | length')
COUNT2=$(_mcp ls --tools | jq '.data.items | length')
if [ "$COUNT1" = "$COUNT2" ]; then
echo " ✅ 3. capability count stable ($COUNT1 tools)"; ((PASS++))
else
echo " ❌ 3. capability count changed ($COUNT1 vs $COUNT2)"; ((FAIL++))
fi
summary

Background Job Lifecycle Flow

Terminal window
echo "=== Flow: Background Job Lifecycle ==="
# Only if server supports tasks
CAPS=$(_mcp inspect | jq '.data.capabilities')
if ! echo "$CAPS" | jq -e '.tasks' >/dev/null 2>&1; then
echo " ⏭️ Server does not advertise tasks capability"
summary
fi
# Step 1: Submit background job
FIRST_TOOL=$(_mcp ls --tools | jq -r '.data.items[0].id // empty')
assert_ok "1. submit background job" "$FIRST_TOOL" --background
# Step 2: List jobs
assert_json "2. jobs list is array" \
'type == "array" or .data | type == "array"' \
jobs list
# Step 3: Show latest job
assert_ok "3. show latest job" jobs show --latest
# Step 4: Wait for completion
assert_ok "4. wait for job" jobs wait --latest
summary

Running the Full Suite

All-in-One Runner

#!/bin/bash
# run-conformance.sh — execute all conformance test files
set -euo pipefail
TARGET="${1:?Usage: $0 <--url URL | --stdio CMD>}"
shift
export MCP_TEST_SERVER="$TARGET $*"
export MCP_TIMEOUT=15
echo "╔══════════════════════════════════════════╗"
echo "║ MCP Conformance Test Suite ║"
echo "║ Target: $MCP_TEST_SERVER"
echo "╚══════════════════════════════════════════╝"
echo ""
TOTAL_PASS=0; TOTAL_FAIL=0; TOTAL_SKIP=0
for test_file in tests/conformance/*.sh; do
echo ""
echo "━━━ $(basename "$test_file") ━━━"
if output=$(bash "$test_file" 2>&1); then
# Parse results from last line
pass=$(echo "$output" | grep -oP '\d+ passed' | grep -oP '\d+')
fail=$(echo "$output" | grep -oP '\d+ failed' | grep -oP '\d+')
skip=$(echo "$output" | grep -oP '\d+ skipped' | grep -oP '\d+')
TOTAL_PASS=$((TOTAL_PASS + ${pass:-0}))
TOTAL_FAIL=$((TOTAL_FAIL + ${fail:-0}))
TOTAL_SKIP=$((TOTAL_SKIP + ${skip:-0}))
fi
echo "$output"
done
echo ""
echo "╔══════════════════════════════════════════╗"
echo "║ TOTAL: $TOTAL_PASS pass, $TOTAL_FAIL fail, $TOTAL_SKIP skip"
echo "╚══════════════════════════════════════════╝"
exit $TOTAL_FAIL

Usage

Terminal window
# Against an HTTP server
./run-conformance.sh --url http://localhost:3001/mcp
# Against a stdio server
./run-conformance.sh --stdio "npx @modelcontextprotocol/server-everything"
# With custom timeout
MCP_TIMEOUT=60 ./run-conformance.sh --url https://staging.api/mcp

CI/CD Integration

GitHub Actions

name: MCP Conformance
on: [push, pull_request]
jobs:
conformance:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Start MCP server
run: |
cargo build --release -p my-mcp-server
./target/release/my-mcp-server &
# Wait for server to be ready
for i in $(seq 1 30); do
curl -sf http://localhost:3001/mcp > /dev/null && break
sleep 1
done
- name: Install mcp2cli
run: cargo install --path .
- name: Run conformance suite
run: ./run-conformance.sh --url http://localhost:3001/mcp
- name: Upload test results
if: always()
uses: actions/upload-artifact@v4
with:
name: conformance-results
path: test-results/

Pre-Commit Hook

.git/hooks/pre-push
#!/bin/bash
echo "Running MCP conformance tests..."
if ! ./run-conformance.sh --stdio "./target/debug/my-server" > /tmp/mcp-conformance.log 2>&1; then
echo "❌ Conformance tests failed. See /tmp/mcp-conformance.log"
exit 1
fi
echo "✅ All conformance tests passed"

Spec Coverage Matrix

Track which spec sections your server implements:

#!/bin/bash
# coverage-matrix.sh — generate a spec coverage report
source ./mcp-test-framework.sh
SERVER="--url http://localhost:3001/mcp"
echo "MCP Spec Coverage Matrix"
echo "========================"
echo ""
# Core
echo "## Core Protocol"
CAPS=$(_mcp inspect)
echo " initialize: ✅ (mcp2cli connected)"
echo " ping: $(mcp2cli $SERVER --timeout 5 ping >/dev/null 2>&1 && echo '✅' || echo '❌')"
# Capabilities
CAP_JSON=$(echo "$CAPS" | jq -r '.data.capabilities // {}')
check_cap() {
local name="$1" path="$2"
if echo "$CAP_JSON" | jq -e "$path" >/dev/null 2>&1; then
echo " $name:$(printf '%*s' $((20 - ${#name})) '')✅ advertised"
else
echo " $name:$(printf '%*s' $((20 - ${#name})) '')— not advertised"
fi
}
echo ""
echo "## Server Capabilities"
check_cap "tools" ".tools"
check_cap "resources" ".resources"
check_cap "prompts" ".prompts"
check_cap "logging" ".logging"
check_cap "completions" ".completions"
check_cap "tasks" ".tasks"
check_cap "resources.subscribe" ".resources.subscribe"
echo ""
echo "## Discovery"
TOOL_COUNT=$(_mcp ls --tools 2>/dev/null | jq '.data.items | length' 2>/dev/null || echo 0)
RES_COUNT=$(_mcp ls --resources 2>/dev/null | jq '.data.items | length' 2>/dev/null || echo 0)
PROMPT_COUNT=$(_mcp ls --prompts 2>/dev/null | jq '.data.items | length' 2>/dev/null || echo 0)
echo " tools: $TOOL_COUNT"
echo " resources: $RES_COUNT"
echo " prompts: $PROMPT_COUNT"

Output:

MCP Spec Coverage Matrix
========================
## Core Protocol
initialize: ✅ (mcp2cli connected)
ping: ✅
## Server Capabilities
tools: ✅ advertised
resources: ✅ advertised
prompts: ✅ advertised
logging: ✅ advertised
completions: — not advertised
tasks: — not advertised
resources.subscribe: ✅ advertised
## Discovery
tools: 12
resources: 5
prompts: 3

Regression Testing Pattern

Capture golden outputs and diff against them to detect regressions:

#!/bin/bash
# regression-test.sh — compare against golden baselines
BASELINE_DIR="./test-baselines"
RESULT_DIR="./test-results"
mkdir -p "$RESULT_DIR"
SERVER="--url http://localhost:3001/mcp"
# Capture current state
mcp2cli $SERVER --json ls --tools | jq -S '.data.items | map({id, kind})' > "$RESULT_DIR/tools.json"
mcp2cli $SERVER --json ls --resources | jq -S '.data.items | map({id, kind})' > "$RESULT_DIR/resources.json"
mcp2cli $SERVER --json inspect | jq -S '.data.capabilities' > "$RESULT_DIR/capabilities.json"
# Compare against baseline
DIFFS=0
for file in tools.json resources.json capabilities.json; do
if [ ! -f "$BASELINE_DIR/$file" ]; then
echo "⚠️ No baseline for $file — creating"
cp "$RESULT_DIR/$file" "$BASELINE_DIR/$file"
elif ! diff -q "$BASELINE_DIR/$file" "$RESULT_DIR/$file" >/dev/null 2>&1; then
echo "❌ REGRESSION: $file"
diff --color "$BASELINE_DIR/$file" "$RESULT_DIR/$file"
((DIFFS++))
else
echo "✅ $file matches baseline"
fi
done
if [ "$DIFFS" -gt 0 ]; then
echo ""
echo "To update baselines: cp $RESULT_DIR/*.json $BASELINE_DIR/"
exit 1
fi

Negative Testing

Test that your server correctly rejects invalid inputs:

Terminal window
echo "=== Negative Tests ==="
# Invalid JSON-RPC (via raw bridge commands)
assert_fail "reject empty tool name" tool call --name ""
assert_fail "reject nonexistent tool" tool call --name "this-tool-does-not-exist"
assert_fail "reject invalid resource URI" resource read --uri "://broken"
# Type mismatches (if tools have typed schemas)
# e.g., passing string to integer parameter
assert_fail "reject string for integer param" add --a "not-a-number" --b 3
summary

Performance Benchmarking

Use mcp2cli to measure server performance:

#!/bin/bash
# bench.sh — simple latency benchmarks
SERVER="--url http://localhost:3001/mcp"
ITERATIONS=20
echo "=== MCP Server Performance ==="
echo ""
bench() {
local name="$1"; shift
local total=0
for i in $(seq 1 $ITERATIONS); do
start=$(date +%s%N)
mcp2cli $SERVER --timeout 30 "$@" >/dev/null 2>&1
end=$(date +%s%N)
elapsed=$(( (end - start) / 1000000 ))
total=$((total + elapsed))
done
avg=$((total / ITERATIONS))
echo " $name: avg ${avg}ms ($ITERATIONS iterations)"
}
bench "ping" ping
bench "discovery" ls
bench "tool:echo" echo --message test
echo ""
echo "Daemon comparison:"
mcp2cli daemon start bench-server 2>/dev/null
bench "ping (daemon)" ping
bench "tool:echo (daemon)" echo --message test
mcp2cli daemon stop bench-server 2>/dev/null

See Also