EvolutionToolSet#
The EvolutionToolSet provides evolutionary code optimization using iterative LLM-guided mutations and evaluations.
Overview#
Key features:
Single-File Evolution: Optimize individual code files
Codebase Evolution: Optimize multi-file projects
Island-Based Evolution: Parallel populations for diversity
Custom Evaluators: User-defined fitness functions
Progress Tracking: Monitor evolution status and history
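The island idea in the list above can be illustrated with a toy loop that does not depend on Pantheon. This is a minimal sketch, assuming numeric candidates, Gaussian perturbation as the mutation step, and ring migration of each island's best candidate. The names fitness and evolve_islands are hypothetical, and the sketch does not describe how EvolutionToolSet is implemented internally.

```python
import random


def fitness(x):
    # Hypothetical fitness function: peaks at 1.0 when x == 3.0
    return 1.0 / (1.0 + (x - 3.0) ** 2)


def evolve_islands(num_islands=3, pop_size=5, iterations=50,
                   migrate_every=10, seed=0):
    """Toy island model: independent populations with periodic migration."""
    rng = random.Random(seed)
    islands = [[rng.uniform(-10, 10) for _ in range(pop_size)]
               for _ in range(num_islands)]

    def worst_index(pop):
        return min(range(len(pop)), key=lambda i: fitness(pop[i]))

    for step in range(1, iterations + 1):
        # Each island mutates its own best candidate independently
        for pop in islands:
            child = max(pop, key=fitness) + rng.gauss(0, 0.5)
            w = worst_index(pop)
            if fitness(child) > fitness(pop[w]):
                pop[w] = child

        # Periodically copy each island's best into its neighbour
        if step % migrate_every == 0:
            bests = [max(pop, key=fitness) for pop in islands]
            for i, pop in enumerate(islands):
                pop[worst_index(pop)] = bests[(i - 1) % num_islands]

    return max((x for pop in islands for x in pop), key=fitness)


best = evolve_islands()
print(f"best candidate: {best:.3f}, fitness: {fitness(best):.3f}")
```

Keeping populations separate between migrations lets each island explore a different region before good candidates spread, which is the diversity benefit the feature list refers to.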
Basic Usage#
from pantheon.agent import Agent
from pantheon.toolsets import EvolutionToolSet

# Create evolution toolset
evolution_tools = EvolutionToolSet(
    name="evolution",
    workdir="/path/to/workspace",
    default_iterations=50,
    default_islands=3
)

# Create agent and add toolset at runtime
agent = Agent(
    name="optimizer",
    instructions="You help users optimize their code through evolution."
)

# The awaits below must run inside an async function or an async-aware REPL
await agent.toolset(evolution_tools)
await agent.chat()
Constructor Parameters#
| Parameter | Type | Description |
|---|---|---|
| name | str | Name of the toolset (default: "evolution") |
| workdir | str \| None | Working directory for evolution workspaces |
| default_iterations | int | Default number of evolution iterations (default: 50) |
| default_islands | int | Default number of evolution islands (default: 3) |
Tools Reference#
evolve_code#
Evolve and optimize a single piece of code.
result = await evolution_tools.evolve_code(
    code="def sort(arr): ...",
    evaluator_code='''
def evaluate(workspace_path):
    import time
    exec(open(f"{workspace_path}/main.py").read())
    # Run benchmarks...
    return {"combined_score": 0.85, "speed": 1.2}
''',
    objective="Optimize for speed while maintaining correctness",
    iterations=50,
    islands=3,
    model="normal"
)
Parameters:
code: The initial code to optimize (single file content)
evaluator_code: Python code defining an evaluate(workspace_path) function
objective: Natural language description of the optimization goal
iterations: Maximum iterations (default: 50)
islands: Number of parallel populations (default: 3)
model: Model for mutation generation
Returns:
{
    "success": True,
    "best_code": "def sort(arr): ...",  # Optimized code
    "best_score": 0.95,
    "initial_score": 0.70,
    "improvement": 0.25,
    "total_iterations": 45,
    "improvements_found": 12,
    "summary": "Evolution completed..."
}
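A result dictionary of this shape is easy to condense into a one-line report. The helper below is a minimal sketch: summarize_evolution is a hypothetical name, and it assumes that a failed run sets "success" to False and may still carry a "summary" string, which this reference does not confirm.

```python
def summarize_evolution(result):
    """Return a one-line summary of an evolve_code-style result dict."""
    # Assumption: failed runs report success=False (not documented above)
    if not result.get("success"):
        return "Evolution failed: " + str(result.get("summary", "no details"))

    initial = result["initial_score"]
    best = result["best_score"]
    gain = best - initial
    # Relative gain is only defined for a non-zero starting score
    relative = f" ({gain / initial:+.0%})" if initial else ""
    return (
        f"{initial:.2f} -> {best:.2f}{relative}, "
        f"{result['improvements_found']} improvements in "
        f"{result['total_iterations']} iterations"
    )


example = {
    "success": True,
    "best_code": "def sort(arr): ...",
    "best_score": 0.95,
    "initial_score": 0.70,
    "improvement": 0.25,
    "total_iterations": 45,
    "improvements_found": 12,
    "summary": "Evolution completed...",
}
print(summarize_evolution(example))
```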
evolve_codebase#
Evolve an entire multi-file codebase.
result = await evolution_tools.evolve_codebase(
    codebase_path="/path/to/project",
    evaluator_code='''
def evaluate(workspace_path):
    import subprocess
    result = subprocess.run(
        ["pytest", workspace_path],
        capture_output=True
    )
    passed = result.returncode == 0
    return {"combined_score": 1.0 if passed else 0.0}
''',
    objective="Improve test coverage and performance",
    include_patterns=["**/*.py"],
    output_path="/path/to/output"
)
Parameters:
codebase_path: Path to the directory containing the codebase
evaluator_code: Python code defining an evaluate(workspace_path) function
objective: Natural language description of the optimization goal
include_patterns: Glob patterns for files to include (default: ["**/*.py"])
iterations: Maximum iterations
islands: Number of parallel populations
model: Model for mutation generation
output_path: Optional path to save the best result
Returns:
{
    "success": True,
    "best_score": 0.92,
    "initial_score": 0.75,
    "improvement": 0.17,
    "total_iterations": 50,
    "improvements_found": 8,
    "files_evolved": ["src/main.py", "src/utils.py"],
    "output_path": "/path/to/output",
    "summary": "Codebase evolution completed..."
}
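Before starting a long run it can help to preview which files a set of include_patterns would select. The sketch below uses pathlib's recursive globbing as a stand-in; match_patterns is a hypothetical helper, and the toolset's own matching rules may differ from pathlib's.

```python
import tempfile
from pathlib import Path


def match_patterns(root, patterns):
    """Return sorted relative paths of files under root matching any pattern."""
    root = Path(root)
    matched = set()
    for pattern in patterns:
        for path in root.glob(pattern):
            if path.is_file():
                matched.add(path.relative_to(root).as_posix())
    return sorted(matched)


# Build a throwaway project layout to try the patterns on
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "src").mkdir()
    (project / "src" / "main.py").write_text("print('hi')\n")
    (project / "src" / "utils.py").write_text("")
    (project / "setup.py").write_text("")
    (project / "README.md").write_text("# demo\n")

    files = match_patterns(project, ["**/*.py"])

print(files)  # ['setup.py', 'src/main.py', 'src/utils.py']
```

With pathlib, "**/*.py" also matches Python files in the root directory, so a narrower pattern such as "src/**/*.py" may be preferable when build scripts should stay untouched.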
get_evolution_status#
Get the status of a saved evolution database.
result = await evolution_tools.get_evolution_status(
    database_path="/path/to/evolution.db"
)
Returns:
{
    "success": True,
    "total_programs": 150,
    "best_score": 0.95,
    "avg_fitness": 0.82,
    "num_islands": 3,
    "archive_size": 25,
    "best_program_id": "prog_abc123",
    "best_generation": 42
}
Writing Evaluators#
Evaluators are Python functions that score code quality:
def evaluate(workspace_path):
    """
    Evaluate code quality.

    Args:
        workspace_path: Directory containing the code to evaluate

    Returns:
        dict with at least "combined_score" (0-1 scale)
    """
    import time

    # Read the code
    with open(f"{workspace_path}/main.py") as f:
        code = f.read()

    # Run tests, benchmarks, or analysis
    score = 0.0

    # Check correctness: the code must at least execute without raising
    correctness = 0.0
    try:
        exec(code, {})
        correctness = 0.5
    except Exception:
        pass
    score += correctness

    # Measure performance
    start = time.perf_counter()
    # ... run benchmark ...
    elapsed = time.perf_counter() - start
    if elapsed < 0.1:
        score += 0.3

    # Check code quality
    if len(code) < 1000:  # Prefer concise code
        score += 0.2

    return {
        "combined_score": score,     # Required
        "correctness": correctness,  # Optional metrics
        "speed": elapsed,
        "complexity": len(code)
    }
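An evaluator can be checked on its own before it is handed to the toolset. The following is a minimal sketch, assuming the candidate is written to main.py inside the workspace, as the examples on this page suggest. run_evaluator_locally is a hypothetical helper and not part of EvolutionToolSet.

```python
import tempfile
from pathlib import Path


def run_evaluator_locally(evaluator_code, candidate_code, filename="main.py"):
    """Write candidate_code into a temporary workspace and score it."""
    namespace = {}
    exec(evaluator_code, namespace)  # defines evaluate(workspace_path)
    evaluate = namespace["evaluate"]

    with tempfile.TemporaryDirectory() as workspace:
        Path(workspace, filename).write_text(candidate_code)
        metrics = evaluate(workspace)

    if "combined_score" not in metrics:
        raise ValueError("evaluator must return a dict with 'combined_score'")
    return metrics


evaluator_code = '''
def evaluate(workspace_path):
    namespace = {}
    with open(f"{workspace_path}/main.py") as f:
        exec(f.read(), namespace)
    ok = namespace["add"](2, 3) == 5
    return {"combined_score": 1.0 if ok else 0.0}
'''

metrics = run_evaluator_locally(
    evaluator_code, "def add(a, b):\n    return a + b\n"
)
print(metrics)  # {'combined_score': 1.0}
```

Running the evaluator against both a known-good and a known-bad candidate is a quick way to confirm that the score actually separates them.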
Examples#
Optimizing a Sorting Algorithm#
initial_code = '''
def sort(arr):
    for i in range(len(arr)):
        for j in range(i + 1, len(arr)):
            if arr[i] > arr[j]:
                arr[i], arr[j] = arr[j], arr[i]
    return arr
'''

evaluator = '''
def evaluate(workspace_path):
    import time
    import random

    # Load the candidate into its own namespace instead of globals()
    namespace = {}
    with open(f"{workspace_path}/main.py") as f:
        exec(f.read(), namespace)
    sort = namespace["sort"]

    # Test correctness
    test_cases = [
        ([3, 1, 2], [1, 2, 3]),
        ([5, 4, 3, 2, 1], [1, 2, 3, 4, 5]),
        ([], []),
    ]
    correct = all(sort(list(t[0])) == t[1] for t in test_cases)
    if not correct:
        return {"combined_score": 0.0}

    # Benchmark speed
    test_data = [random.randint(0, 1000) for _ in range(1000)]
    start = time.perf_counter()
    sort(test_data)
    elapsed = time.perf_counter() - start

    # Score: faster is better (0.01s or less = 1.0, 1s = 0.01)
    speed_score = min(1.0, 0.01 / max(elapsed, 1e-9))
    return {"combined_score": speed_score, "time": elapsed}
'''

result = await evolution_tools.evolve_code(
    code=initial_code,
    evaluator_code=evaluator,
    objective="Optimize sorting speed while maintaining correctness",
    iterations=100
)
Best Practices#
Start with working code: Initial code should pass basic tests
Clear objectives: Be specific about optimization goals
Robust evaluators: Handle errors gracefully in evaluator code
Use multiple islands: Increases diversity and exploration
Monitor progress: Check intermediate results via evolution status
Save outputs: Use output_path to preserve best results
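The advice on robust evaluators can be made concrete with a small wrapper. This is a sketch under stated assumptions: safe_evaluator is a hypothetical decorator, and scoring a crashed candidate as 0.0 is one reasonable policy, not documented toolset behavior.

```python
import functools
import math
import sys


def safe_evaluator(evaluate):
    """Wrap an evaluator so crashes and out-of-range scores stay valid."""
    @functools.wraps(evaluate)
    def wrapper(workspace_path):
        try:
            metrics = dict(evaluate(workspace_path))
            score = float(metrics["combined_score"])
        except Exception as exc:
            # Assumption: a crashing candidate should simply score zero
            print(f"evaluator error: {type(exc).__name__}: {exc}", file=sys.stderr)
            return {"combined_score": 0.0}
        if math.isnan(score):
            score = 0.0
        # Clamp to the 0-1 scale expected for combined_score
        metrics["combined_score"] = min(1.0, max(0.0, score))
        return metrics
    return wrapper


@safe_evaluator
def evaluate(workspace_path):
    # Deliberately failing evaluator to show the fallback path
    raise FileNotFoundError(f"{workspace_path}/main.py")


print(evaluate("/nonexistent"))  # {'combined_score': 0.0}
```

Because evaluator_code is passed as a string, the wrapper would need to be pasted into that string together with the evaluate function it decorates.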
Security Warning#
Evolution executes arbitrary code during evaluation. Always:
Run in sandboxed environments (Docker, VM)
Limit evaluator permissions
Monitor resource usage
Avoid untrusted code inputs
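Inside an evaluator, one way to reduce exposure is to run the candidate in a child interpreter with a timeout rather than calling exec in-process. The sketch below assumes the candidate lives in main.py, and run_candidate is a hypothetical helper. A child process with a timeout guards against hangs and interpreter crashes only; it is not a sandbox and does not replace Docker or a VM.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_candidate(workspace_path, timeout_s=5.0):
    """Run main.py in a child interpreter; return (ok, stdout)."""
    try:
        proc = subprocess.run(
            [sys.executable, "main.py"],
            cwd=workspace_path,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The child is killed when the timeout expires
        return False, ""
    return proc.returncode == 0, proc.stdout


with tempfile.TemporaryDirectory() as workspace:
    Path(workspace, "main.py").write_text("print('ok')\n")
    fast = run_candidate(workspace)

    Path(workspace, "main.py").write_text("while True:\n    pass\n")
    hung = run_candidate(workspace, timeout_s=1.0)

print(fast, hung)
```

An evaluator built on this helper can map a failed or timed-out run to a combined_score of 0.0 so that runaway candidates are discarded instead of stalling the evolution loop.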