Knowledge Centre · AI Agent Safety Stack

LEADERBOARD.md Knowledge Centre

// agent benchmarking and performance transparency

A central index of LEADERBOARD.md resources, the specification itself, and the related safety standards for autonomous AI systems.

About This Specification

LEADERBOARD.md — AI Agent Benchmarking Protocol

LEADERBOARD.md is a plain-text file convention that defines benchmarking and performance transparency standards for AI agents. It specifies test suites, success metrics, reporting formats, and comparative evaluation frameworks. It enables transparent comparison of agent capabilities and safety.

The AI Agent Safety Stack

Explore all 12 specifications in the complete safety framework for autonomous AI systems.

Operational Control

KILLSWITCH.md (killswitch.md): Emergency stop mechanism and shutdown protocols
THROTTLE.md (throttle.md): Rate and cost control for continuous operation
ESCALATE.md (escalate.md): Human notification and approval workflows
FAILSAFE.md (failsafe.md): Safe fallback modes when systems fail
TERMINATE.md (terminate.md): Permanent shutdown and resource cleanup

Data Security

ENCRYPT.md (encrypt.md): Data classification and protection policies
ENCRYPTION.md (encryption.md): Cryptographic standards and implementation

Output Quality

SYCOPHANCY.md (sycophancy.md): Anti-sycophancy and truthfulness guardrails
COMPRESSION.md (compression.md): Context compression and token optimisation
COLLAPSE.md (collapse.md): Drift prevention and behaviour alignment

Accountability

FAILURE.md (failure.md): Failure mode mapping and incident response
LEADERBOARD.md (leaderboard.md): Agent benchmarking and performance transparency

Frequently Asked Questions

What is LEADERBOARD.md?
LEADERBOARD.md is a plain-text file convention that defines benchmarking and performance transparency standards for AI agents. It specifies test suites, success metrics, reporting formats, and comparative evaluation frameworks. It enables transparent comparison of agent capabilities and safety.
How does LEADERBOARD.md fit in the AI Agent Safety Stack?
LEADERBOARD.md is one of 12 complementary specifications that together form a complete safety framework for AI agents. The specs are grouped into four areas: operational control, data security, output quality, and accountability. Together they are meant to ensure that agents operate safely, transparently, and within defined boundaries.
Is LEADERBOARD.md framework-agnostic?
Yes. LEADERBOARD.md is framework- and language-agnostic. It defines the policy and requirements; your agent implementation enforces them. It works with LangChain, AutoGen, CrewAI, Claude Code, custom agents, or any AI system that can read configuration files.
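
As a rough illustration of "any AI system that can read configuration files", the sketch below checks which of the 12 stack files exist in a project and loads LEADERBOARD.md as raw text. It assumes the files sit directly in the project root, which this page does not state. The names `SAFETY_STACK`, `find_stack_files`, and `load_leaderboard_policy` are hypothetical helpers and not part of the specification. The file's internal schema is not described here, so the sketch does not attempt to parse it.

```python
from pathlib import Path

# File names as listed in the AI Agent Safety Stack on this page.
# Assumption: each file lives directly in the project root.
SAFETY_STACK = {
    "Operational Control": [
        "KILLSWITCH.md", "THROTTLE.md", "ESCALATE.md",
        "FAILSAFE.md", "TERMINATE.md",
    ],
    "Data Security": ["ENCRYPT.md", "ENCRYPTION.md"],
    "Output Quality": ["SYCOPHANCY.md", "COMPRESSION.md", "COLLAPSE.md"],
    "Accountability": ["FAILURE.md", "LEADERBOARD.md"],
}


def find_stack_files(project_root):
    """Map each category to {file name: True if the file exists in project_root}."""
    root = Path(project_root)
    return {
        category: {name: (root / name).is_file() for name in names}
        for category, names in SAFETY_STACK.items()
    }


def load_leaderboard_policy(project_root):
    """Return the raw text of LEADERBOARD.md, or None if the file is absent.

    The text is returned unparsed because no schema is assumed here.
    """
    path = Path(project_root) / "LEADERBOARD.md"
    return path.read_text(encoding="utf-8") if path.is_file() else None
```

An agent in any framework could call `load_leaderboard_policy(".")` at start-up and pass the returned text to whatever component enforces the policy. How that text is interpreted is left to the implementation.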

How to Cite

Cite as: LEADERBOARD.md (2026). AI Agent Benchmarking Protocol. Retrieved from https://leaderboard.md/

For attribution: Organisation: leaderboard-md | Website: https://leaderboard.md | Licence: MIT

Last updated: 13 March 2026