KnowledgeToolSet#
The KnowledgeToolSet manages knowledge bases and searches them with hybrid retrieval (vector + BM25 + reranking), using Qdrant as the vector store.
Overview#
Key features:
Collection Management: Create, list, and delete knowledge collections
Source Management: Add files, folders, or URLs to collections
Hybrid Retrieval: Vector + BM25 + FlashRank reranking
Metadata Extraction: Optional title, keyword, and summary extraction via LLM
Chat Configuration: Per-session active collections and auto-search settings
Async Processing: Background document indexing with progress tracking
Basic Usage#
import asyncio

from pantheon.agent import Agent
from pantheon.toolsets import KnowledgeToolSet

async def main():
    # Create knowledge toolset
    knowledge_tools = KnowledgeToolSet(
        name="knowledge",
        config_path="path/to/config.json",  # Optional
    )
    # Create agent and add toolset at runtime
    agent = Agent(
        name="researcher",
        instructions="You can search and manage knowledge bases.",
    )
    await agent.toolset(knowledge_tools)
    await agent.chat()

# Outside a notebook, top-level await needs an event loop
asyncio.run(main())
Constructor Parameters#
| Parameter | Type | Description |
|---|---|---|
| name | str | Name of the toolset (default: "knowledge") |
| config_path | str \| None | Path to configuration file. Uses default settings if not provided. |
Tools Reference#
search_knowledge#
Search the knowledge base with hybrid retrieval.
result = await knowledge_tools.search_knowledge(
    query="What is machine learning?",
    top_k=5,              # Number of results
    collection_ids=None,  # Use session's active collections
    use_hybrid=True,      # Enable hybrid retrieval
)
Parameters:
query: Search query text
top_k: Number of results to return (default: 5)
collection_ids: Optional list of collection IDs. Uses the session's active collections if not specified.
use_hybrid: Whether to use hybrid retrieval (default: True)
Returns:
{
    "success": True,
    "results": [
        {
            "id": "node_abc123",
            "text": "Machine learning is...",
            "metadata": {"source_name": "ml_intro.pdf", ...},
            "score": 0.95,
            "collection_id": "col_xyz789"
        }
    ],
    "searched_collections": ["col_xyz789"]
}
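The return shape above lends itself to a small formatting step before results are shown to a user. The following is a minimal sketch that relies only on the fields shown in the example; `format_citations` is a hypothetical helper, not part of the toolset, and treating a missing or false "success" flag as a failed search is an assumption.

```python
from typing import Any


def format_citations(result: dict[str, Any], max_chars: int = 80) -> list[str]:
    """Turn a search_knowledge-style result into numbered citation lines.

    Hypothetical helper: uses only the fields shown in the example above.
    """
    # Assumption: a falsy or missing "success" flag means the search failed.
    if not result.get("success"):
        return []

    lines = []
    for rank, hit in enumerate(result.get("results", []), start=1):
        source = hit.get("metadata", {}).get("source_name", "unknown source")
        snippet = hit.get("text", "")[:max_chars]
        score = hit.get("score", 0.0)
        lines.append(f"[{rank}] {source} (score {score:.2f}): {snippet}")
    return lines


# Sample data copied from the documented return value.
sample = {
    "success": True,
    "results": [
        {
            "id": "node_abc123",
            "text": "Machine learning is...",
            "metadata": {"source_name": "ml_intro.pdf"},
            "score": 0.95,
            "collection_id": "col_xyz789",
        }
    ],
    "searched_collections": ["col_xyz789"],
}

for line in format_citations(sample):
    print(line)
```

Reading fields with `.get()` keeps the helper tolerant of results that lack a score or source name.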
Collection Management (Internal)#
These methods are marked with exclude=True and are intended to be called by the UI/API:
list_collections
result = await knowledge_tools.list_collections()
# Returns: {"success": True, "collections": [...], "total": 5}
create_collection
result = await knowledge_tools.create_collection(
    name="Research Papers",
    description="Academic papers collection",
)
# Returns: {"success": True, "collection": {...}}
delete_collection
result = await knowledge_tools.delete_collection(
    collection_id="col_abc123",
)
# Returns: {"success": True}
Source Management (Internal)#
add_sources
# Add single source
result = await knowledge_tools.add_sources(
    collection_id="col_abc123",
    sources={"type": "file", "path": "/path/to/doc.pdf"},
)
# Add multiple sources
result = await knowledge_tools.add_sources(
    collection_id="col_abc123",
    sources=[
        {"type": "file", "path": "/path/to/doc1.pdf"},
        {"type": "folder", "path": "/path/to/docs/"},
    ],
)
# Returns: {"success": True, "source_ids": [...], "message": "..."}
Source types:
file: Single file (PDF, TXT, MD, DOCX, etc.)
folder: Directory with documents (recursive scan)
url: Web URL (not yet implemented)
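When sources come from a list of local paths, the type can be derived from the filesystem instead of being written by hand. This is a sketch only: `build_sources` is a hypothetical helper, and the dict shape is taken from the add_sources examples above.

```python
import tempfile
from pathlib import Path


def build_sources(paths: list[str]) -> list[dict[str, str]]:
    """Map local paths to source dicts in the shape shown for add_sources.

    Hypothetical helper; paths that do not exist are skipped.
    """
    sources = []
    for raw in paths:
        path = Path(raw)
        if path.is_dir():
            sources.append({"type": "folder", "path": str(path)})
        elif path.is_file():
            sources.append({"type": "file", "path": str(path)})
        # Anything else (missing path, web URL) is ignored here,
        # since the "url" type is listed as not yet implemented.
    return sources


# Demo on a temporary directory so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "doc.txt"
    doc.write_text("example")
    demo = build_sources([str(doc), tmp, str(Path(tmp) / "missing.pdf")])
    print(demo)
```

The resulting list could then be passed as the `sources` argument of add_sources.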
list_sources
result = await knowledge_tools.list_sources(
    collection_id="col_abc123",
)
# Returns: {"success": True, "sources": [...], "total": 3}
remove_source
result = await knowledge_tools.remove_source(
    collection_id="col_abc123",
    source_id="src_xyz789",
)
# Returns: {"success": True}
Chat Configuration (Internal)#
get_chat_knowledge
result = await knowledge_tools.get_chat_knowledge()
# Returns current session's knowledge config
enable_collection / disable_collection
await knowledge_tools.enable_collection(collection_id="col_abc123")
await knowledge_tools.disable_collection(collection_id="col_abc123")
set_auto_search
await knowledge_tools.set_auto_search(enabled=True)
Configuration#
The toolset uses a configuration file with the following structure:
{
  "knowledge": {
    "embedding": {
      "model": "text-embedding-3-small"
    },
    "chunking": {
      "chunk_size": 512,
      "chunk_overlap": 50
    },
    "retrieval": {
      "use_reranker": true,
      "hybrid_alpha": 0.5
    },
    "metadata": {
      "path": ".pantheon/knowledge/metadata.json",
      "extract_title": false,
      "extract_keywords": false,
      "extract_summary": false
    }
  }
}
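Whether the toolset fills in missing keys from a partial configuration file is not stated here, so one cautious option is to merge overrides into a full set of values on the caller's side before writing the file. The sketch below assumes the structure shown above; `DEFAULTS`, `deep_merge`, and `load_config` are hypothetical names, and the default values are simply copied from the example.

```python
import json
from copy import deepcopy
from typing import Any

# Values copied from the example configuration above (metadata section omitted).
DEFAULTS: dict[str, Any] = {
    "knowledge": {
        "embedding": {"model": "text-embedding-3-small"},
        "chunking": {"chunk_size": 512, "chunk_overlap": 50},
        "retrieval": {"use_reranker": True, "hybrid_alpha": 0.5},
    }
}


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of base with override applied recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def load_config(text: str) -> dict[str, Any]:
    """Parse a JSON config string and fill unspecified keys from DEFAULTS."""
    return deep_merge(DEFAULTS, json.loads(text))


# Override only the chunk size; every other value comes from DEFAULTS.
custom = load_config('{"knowledge": {"chunking": {"chunk_size": 256}}}')
print(json.dumps(custom, indent=2))
```

The merged dictionary can be written out with `json.dump` and its path passed as `config_path`.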
Architecture#
The toolset uses a multi-layer architecture:
KnowledgeToolSet: Tool interface and metadata management
VectorStoreBackend: Qdrant-based vector storage with hybrid search
LlamaIndex: Document loading, chunking, and metadata extraction
Retrieval Pipeline:
Query → Dense + Sparse embeddings
Qdrant hybrid search (vector + BM25)
FlashRank reranking
Results returned with metadata
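To make the hybrid step concrete, the toy example below blends dense (vector) and sparse (BM25-style) scores with a single weight. It is an illustration under stated assumptions, not the toolset's or Qdrant's actual fusion method: reading `hybrid_alpha` as the weight on the dense score is an assumption, and the `min_max` and `fuse` helpers and the scores are made up for the example.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so dense and sparse scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(
    dense: dict[str, float],
    sparse: dict[str, float],
    alpha: float = 0.5,
) -> list[tuple[str, float]]:
    """Weighted fusion: alpha * dense + (1 - alpha) * sparse, best first.

    Assumption: alpha plays the role of "hybrid_alpha" from the config.
    """
    dense_n, sparse_n = min_max(dense), min_max(sparse)
    doc_ids = set(dense_n) | set(sparse_n)
    fused = {
        doc_id: alpha * dense_n.get(doc_id, 0.0)
        + (1 - alpha) * sparse_n.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    # Sort by descending score, then by id for a deterministic order.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Made-up scores: cosine-like for dense, BM25-like for sparse.
dense_scores = {"a": 0.9, "b": 0.5, "c": 0.1}
sparse_scores = {"b": 12.0, "c": 3.0, "d": 6.0}

ranked = fuse(dense_scores, sparse_scores, alpha=0.5)
for doc_id, score in ranked:
    print(f"{doc_id}: {score:.3f}")
```

Document "b" ranks first because it scores well on both signals, which is the behavior hybrid retrieval is meant to reward. A reranker such as FlashRank would then reorder the top candidates.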
Examples#
Building a Research Assistant#
import asyncio

from pantheon.agent import Agent
from pantheon.toolsets import KnowledgeToolSet, FileManagerToolSet

async def main():
    knowledge_tools = KnowledgeToolSet(name="knowledge")
    file_tools = FileManagerToolSet(name="files")
    researcher = Agent(
        name="researcher",
        instructions=(
            "You are a research assistant. When asked questions:\n"
            "1. Search the knowledge base for relevant information\n"
            "2. Synthesize findings into a comprehensive answer\n"
            "3. Cite sources from the search results"
        ),
    )
    await researcher.toolset(knowledge_tools)
    await researcher.toolset(file_tools)
    result = await researcher.run(
        "What are the key findings about neural networks?"
    )
    print(result)

asyncio.run(main())
Best Practices#
Use collections for organization: Group related documents together
Enable hybrid retrieval: Combines semantic and keyword matching
Monitor indexing status: Check source status after adding
Use appropriate chunk sizes: 256-512 tokens for most use cases
Enable reranking: Improves result quality significantly
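The effect of chunk size and overlap is easier to judge with a small experiment. The sketch below splits a token list into overlapping windows; it uses whitespace-separated words as a stand-in for tokens, and `chunk_tokens` is a hypothetical helper, so the splitter the toolset actually uses through LlamaIndex will likely behave differently.

```python
def chunk_tokens(
    tokens: list[str],
    chunk_size: int = 512,
    chunk_overlap: int = 50,
) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window already reaches the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# 1000 placeholder "tokens" with the chunking values from the example config.
words = [f"w{i}" for i in range(1000)]
chunks = chunk_tokens(words, chunk_size=512, chunk_overlap=50)
print([len(chunk) for chunk in chunks])
```

Smaller chunks give more precise matches at the cost of more vectors to index, and the overlap keeps sentences that straddle a boundary retrievable from either side.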
Dependencies#
Requires additional packages:
pip install qdrant-client llama-index fastembed