EvolutionToolSet#

The EvolutionToolSet provides evolutionary code optimization using iterative LLM-guided mutations and evaluations.

Overview#

Key features:

  • Single-File Evolution: Optimize individual code files

  • Codebase Evolution: Optimize multi-file projects

  • Island-Based Evolution: Parallel populations for diversity

  • Custom Evaluators: User-defined fitness functions

  • Progress Tracking: Monitor evolution status and history
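The island idea in the list above can be illustrated with a toy example. The sketch below is not the toolset's implementation: it evolves plain floats instead of code, and every name in it (`evolve_islands`, `migrate_every`, and so on) is hypothetical. It only shows how several populations evolve separately and occasionally exchange their best member.

```python
import random


def evolve_islands(fitness, islands=3, pop_size=8, generations=30,
                   migrate_every=5, seed=0):
    """Toy island model: maximize fitness(x) over floats."""
    rng = random.Random(seed)
    # One independent population per island.
    pops = [[rng.uniform(-10, 10) for _ in range(pop_size)]
            for _ in range(islands)]
    for gen in range(1, generations + 1):
        for i, pop in enumerate(pops):
            # Mutate every individual, then keep the best pop_size
            # out of parents and children (elitist selection).
            children = [x + rng.gauss(0, 1) for x in pop]
            pops[i] = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
        if gen % migrate_every == 0:
            # Ring migration: each island's best replaces the worst
            # member of the next island.
            bests = [pop[0] for pop in pops]
            for i, best in enumerate(bests):
                pops[(i + 1) % islands][-1] = best
    # Best individual across all islands.
    return max((pop[0] for pop in pops), key=fitness)


best = evolve_islands(lambda x: -(x - 3) ** 2)
```

Keeping the populations separate between migrations lets each one explore a different region before good candidates spread, which is the diversity benefit the feature list refers to.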

Basic Usage#

import asyncio

from pantheon.agent import Agent
from pantheon.toolsets import EvolutionToolSet


async def main():
    # Create evolution toolset
    evolution_tools = EvolutionToolSet(
        name="evolution",
        workdir="/path/to/workspace",
        default_iterations=50,
        default_islands=3
    )

    # Create agent and add toolset at runtime
    agent = Agent(
        name="optimizer",
        instructions="You help users optimize their code through evolution."
    )
    await agent.toolset(evolution_tools)

    # Start an interactive chat session
    await agent.chat()


# `await` is only valid inside a coroutine when run as a script
asyncio.run(main())

Constructor Parameters#

Parameter            Type         Description
-------------------  -----------  ----------------------------------------------------
name                 str          Name of the toolset (default: "evolution")
workdir              str | None   Working directory for evolution workspaces
default_iterations   int          Default number of evolution iterations (default: 50)
default_islands      int          Default number of evolution islands (default: 3)

Tools Reference#

evolve_code#

Evolve and optimize a single piece of code.

result = await evolution_tools.evolve_code(
    code="def sort(arr): ...",
    evaluator_code='''
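def evaluate(workspace_path):
    import time
    namespace = {}
    with open(f"{workspace_path}/main.py") as f:
        exec(f.read(), namespace)  # candidate's definitions land in `namespace`
    # Run benchmarks...
    return {"combined_score": 0.85, "speed": 1.2}  # placeholder values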
''',
    objective="Optimize for speed while maintaining correctness",
    iterations=50,
    islands=3,
    model="normal"
)

Parameters:

  • code: The initial code to optimize (single file content)

  • evaluator_code: Python code defining an evaluate(workspace_path) function

  • objective: Natural language description of the optimization goal

  • iterations: Maximum iterations (default: 50)

  • islands: Number of parallel populations (default: 3)

  • model: Model for mutation generation

Returns:

{
    "success": True,
    "best_code": "def sort(arr): ...",  # Optimized code
    "best_score": 0.95,
    "initial_score": 0.70,
    "improvement": 0.25,
    "total_iterations": 45,
    "improvements_found": 12,
    "summary": "Evolution completed..."
}
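One way to consume this return value is sketched below. The helper name `save_best_code` is hypothetical, and the sample dict only mimics the documented shape. The helper writes `best_code` to disk only when the run succeeded and reported a positive `improvement`.

```python
from pathlib import Path


def save_best_code(result: dict, out_file: str) -> bool:
    """Write result["best_code"] to out_file if the run improved on the start.

    Returns True when a file was written.
    """
    if not result.get("success"):
        return False
    # In the documented example, "improvement" equals
    # best_score - initial_score.
    if result.get("improvement", 0.0) <= 0.0:
        return False
    Path(out_file).write_text(result["best_code"], encoding="utf-8")
    return True


# Sample dict shaped like the documented return value (illustrative values).
example_result = {
    "success": True,
    "best_code": "def sort(arr):\n    return sorted(arr)\n",
    "best_score": 0.95,
    "initial_score": 0.70,
    "improvement": 0.25,
}
```

In real use, `result` would be the value returned by `evolve_code` rather than a hand-written dict.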

evolve_codebase#

Evolve an entire multi-file codebase.

result = await evolution_tools.evolve_codebase(
    codebase_path="/path/to/project",
    evaluator_code='''
def evaluate(workspace_path):
    import subprocess
    result = subprocess.run(
        ["pytest", workspace_path],
        capture_output=True
    )
    passed = result.returncode == 0
    return {"combined_score": 1.0 if passed else 0.0}
''',
    objective="Improve test coverage and performance",
    include_patterns=["**/*.py"],
    output_path="/path/to/output"
)

Parameters:

  • codebase_path: Path to the directory containing the codebase

  • evaluator_code: Python code defining an evaluate(workspace_path) function

  • objective: Natural language description of the optimization goal

  • include_patterns: Glob patterns for files to include (default: ["**/*.py"])

  • iterations: Maximum iterations

  • islands: Number of parallel populations

  • model: Model for mutation generation

  • output_path: Optional path to save the best result
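To preview which files a set of `include_patterns` might select, the patterns can be tried locally with `pathlib`. This is a sketch under the assumption that the toolset resolves patterns relative to `codebase_path` in the same way as `Path.glob`; the source does not confirm this, and `collect_files` is a hypothetical helper.

```python
from pathlib import Path


def collect_files(codebase_path: str, include_patterns: list[str]) -> list[str]:
    """Return sorted relative paths under codebase_path matching any pattern."""
    root = Path(codebase_path)
    matched = set()
    for pattern in include_patterns:
        # "**" matches the root itself and any depth of subdirectories.
        for path in root.glob(pattern):
            if path.is_file():
                matched.add(path.relative_to(root).as_posix())
    return sorted(matched)
```

For example, `collect_files("/path/to/project", ["**/*.py"])` lists every Python file in the project, at any depth, as a path relative to the project root.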

Returns:

{
    "success": True,
    "best_score": 0.92,
    "initial_score": 0.75,
    "improvement": 0.17,
    "total_iterations": 50,
    "improvements_found": 8,
    "files_evolved": ["src/main.py", "src/utils.py"],
    "output_path": "/path/to/output",
    "summary": "Codebase evolution completed..."
}

get_evolution_status#

Get the status of a saved evolution database.

result = await evolution_tools.get_evolution_status(
    database_path="/path/to/evolution.db"
)

Returns:

{
    "success": True,
    "total_programs": 150,
    "best_score": 0.95,
    "avg_fitness": 0.82,
    "num_islands": 3,
    "archive_size": 25,
    "best_program_id": "prog_abc123",
    "best_generation": 42
}
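A status dict like this can be condensed into a single log line. The helper below is a hypothetical convenience and not part of the toolset; it uses only the keys shown in the example above.

```python
def summarize_status(status: dict) -> str:
    """Format an evolution status dict as a one-line summary."""
    if not status.get("success"):
        return "status unavailable"
    # Gap between the best program and the population average.
    gap = status["best_score"] - status["avg_fitness"]
    return (
        f"{status['total_programs']} programs on {status['num_islands']} islands; "
        f"best {status['best_score']:.2f} (gen {status['best_generation']}), "
        f"avg {status['avg_fitness']:.2f}, gap {gap:.2f}"
    )


# Values copied from the documented example.
example_status = {
    "success": True,
    "total_programs": 150,
    "best_score": 0.95,
    "avg_fitness": 0.82,
    "num_islands": 3,
    "archive_size": 25,
    "best_program_id": "prog_abc123",
    "best_generation": 42,
}
line = summarize_status(example_status)
```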

Writing Evaluators#

An evaluator is a Python function named evaluate that takes the workspace path, scores the code found there, and returns a dict of metrics:

def evaluate(workspace_path):
    """
    Evaluate code quality.

    Args:
        workspace_path: Directory containing the code to evaluate

    Returns:
        dict with at least "combined_score" (0-1 scale)
    """
    import time

    # Read the code
    with open(f"{workspace_path}/main.py") as f:
        code = f.read()

    # Run tests, benchmarks, or analysis
    score = 0.0

    # Check correctness: the code must at least execute without raising
    namespace = {}
    try:
        exec(code, namespace)  # fresh namespace, separate from the evaluator's
        correctness = 0.5
    except Exception:
        correctness = 0.0
    score += correctness

    # Measure performance
    start = time.perf_counter()
    # ... run benchmark ...
    elapsed = time.perf_counter() - start

    if elapsed < 0.1:
        score += 0.3

    # Check code quality
    if len(code) < 1000:  # Prefer concise code
        score += 0.2

    return {
        "combined_score": score,     # Required
        "correctness": correctness,  # Optional metrics
        "speed": elapsed,
        "complexity": len(code)
    }

Examples#

Optimizing a Sorting Algorithm#

initial_code = '''
def sort(arr):
    for i in range(len(arr)):
        for j in range(i + 1, len(arr)):
            if arr[i] > arr[j]:
                arr[i], arr[j] = arr[j], arr[i]
    return arr
'''

evaluator = '''
def evaluate(workspace_path):
    import time
    import random

    exec(open(f"{workspace_path}/main.py").read(), globals())

    # Test correctness
    test_cases = [
        ([3, 1, 2], [1, 2, 3]),
        ([5, 4, 3, 2, 1], [1, 2, 3, 4, 5]),
        ([], []),
    ]

    correct = all(sort(list(t[0])) == t[1] for t in test_cases)
    if not correct:
        return {"combined_score": 0.0}

    # Benchmark speed
    test_data = [random.randint(0, 1000) for _ in range(1000)]
    start = time.perf_counter()
    sort(test_data)
    elapsed = time.perf_counter() - start

    # Score: faster is better (<= 0.01s scores 1.0, 0.1s scores 0.1, 1s scores 0.01)
    # max() guards against division by zero on very fast runs
    speed_score = min(1.0, 0.01 / max(elapsed, 1e-9))

    return {"combined_score": speed_score, "time": elapsed}
'''

result = await evolution_tools.evolve_code(
    code=initial_code,
    evaluator_code=evaluator,
    objective="Optimize sorting speed while maintaining correctness",
    iterations=100
)

Best Practices#

  1. Start with working code: Initial code should pass basic tests

  2. Clear objectives: Be specific about optimization goals

  3. Robust evaluators: Handle errors gracefully in evaluator code

  4. Use multiple islands: Increases diversity and exploration

  5. Monitor progress: Use get_evolution_status to check intermediate results

  6. Save outputs: Pass output_path to evolve_codebase to preserve the best result
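For the "robust evaluators" practice, one possible pattern is a wrapper that turns any evaluator failure into the worst score and clamps `combined_score` to the 0-1 range. This is a sketch, not toolset behavior; `make_safe` and the `failed` metric name are hypothetical.

```python
import math


def make_safe(evaluate):
    """Wrap an evaluate(workspace_path) function so it never raises."""

    def safe_evaluate(workspace_path):
        try:
            metrics = dict(evaluate(workspace_path))
            score = float(metrics["combined_score"])
        except Exception:
            # Any failure (crash, missing key, non-numeric score)
            # counts as the worst possible result.
            return {"combined_score": 0.0, "failed": 1.0}
        if math.isnan(score):
            score = 0.0
        # Clamp to the documented 0-1 scale.
        metrics["combined_score"] = min(1.0, max(0.0, score))
        return metrics

    return safe_evaluate
```

Because `evaluator_code` is passed as source text, the wrapper would have to be included in that string, with the wrapped function bound to the name `evaluate`.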

Security Warning#

Evolution executes arbitrary code during evaluation. Always:

  • Run in sandboxed environments (Docker, VM)

  • Limit evaluator permissions

  • Monitor resource usage

  • Avoid untrusted code inputs
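As a small complement to the measures above, an evaluator can run candidate code in a child process with a time limit instead of calling `exec` in its own process. The sketch below uses only the standard library, and `run_with_timeout` is a hypothetical helper. It bounds runtime and isolates crashes, but it is not a sandbox and does not replace Docker or a VM.

```python
import subprocess
import sys


def run_with_timeout(script_path: str, timeout_s: float = 10.0) -> dict:
    """Run a Python file in a child process and report how it ended."""
    try:
        proc = subprocess.run(
            # -I: isolated mode, ignores PYTHON* environment variables
            # and the user site directory.
            [sys.executable, "-I", script_path],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "reason": "timeout"}
    if proc.returncode != 0:
        # Keep only the tail of stderr to bound the size of the result.
        return {"ok": False, "reason": "nonzero exit", "stderr": proc.stderr[-500:]}
    return {"ok": True, "stdout": proc.stdout}
```

An evaluator could call this on `f"{workspace_path}/main.py"` and return a `combined_score` of 0.0 whenever `ok` is False.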