KnowledgeToolSet

KnowledgeToolSet#

The KnowledgeToolSet provides comprehensive knowledge base management with hybrid retrieval (vector + BM25 + reranking) using Qdrant.

Overview#

Key features:

Collection Management: Create, list, and delete knowledge collections
Source Management: Add files, folders, or URLs to collections
Hybrid Retrieval: Vector + BM25 + FlashRank reranking
Metadata Extraction: Optional title, keyword, and summary extraction via LLM
Chat Configuration: Per-session active collections and auto-search settings
Async Processing: Background document indexing with progress tracking

Basic Usage#

from pantheon.agent import Agent
from pantheon.toolsets import KnowledgeToolSet

# Create knowledge toolset
knowledge_tools = KnowledgeToolSet(
    name="knowledge",
    config_path="path/to/config.json"  # Optional
)

# Create agent and add toolset at runtime
agent = Agent(
    name="researcher",
    instructions="You can search and manage knowledge bases."
)
await agent.toolset(knowledge_tools)

await agent.chat()

Constructor Parameters#

Parameter	Type	Description
`name`	str	Name of the toolset (default: “knowledge”)
`config_path`	str \| None	Path to configuration file. Uses default settings if not provided.

Tools Reference#

search_knowledge#

Search the knowledge base with hybrid retrieval.

result = await knowledge_tools.search_knowledge(
    query="What is machine learning?",
    top_k=5,                    # Number of results
    collection_ids=None,        # Use session's active collections
    use_hybrid=True             # Enable hybrid retrieval
)

Parameters:

query: Search query text
top_k: Number of results to return (default: 5)
collection_ids: Optional list of collection IDs. Uses session’s active collections if not specified.
use_hybrid: Whether to use hybrid retrieval (default: True)

Returns:

{
    "success": True,
    "results": [
        {
            "id": "node_abc123",
            "text": "Machine learning is...",
            "metadata": {"source_name": "ml_intro.pdf", ...},
            "score": 0.95,
            "collection_id": "col_xyz789"
        }
    ],
    "searched_collections": ["col_xyz789"]
}

Collection Management (Internal)#

These methods are marked exclude=True and used by UI/API:

list_collections

result = await knowledge_tools.list_collections()
# Returns: {"success": True, "collections": [...], "total": 5}

create_collection

result = await knowledge_tools.create_collection(
    name="Research Papers",
    description="Academic papers collection"
)
# Returns: {"success": True, "collection": {...}}

delete_collection

result = await knowledge_tools.delete_collection(
    collection_id="col_abc123"
)
# Returns: {"success": True}

Source Management (Internal)#

add_sources

# Add single source
result = await knowledge_tools.add_sources(
    collection_id="col_abc123",
    sources={"type": "file", "path": "/path/to/doc.pdf"}
)

# Add multiple sources
result = await knowledge_tools.add_sources(
    collection_id="col_abc123",
    sources=[
        {"type": "file", "path": "/path/to/doc1.pdf"},
        {"type": "folder", "path": "/path/to/docs/"},
    ]
)
# Returns: {"success": True, "source_ids": [...], "message": "..."}

Source types:

file: Single file (PDF, TXT, MD, DOCX, etc.)
folder: Directory with documents (recursive scan)
url: Web URL (not yet implemented)

list_sources

result = await knowledge_tools.list_sources(
    collection_id="col_abc123"
)
# Returns: {"success": True, "sources": [...], "total": 3}

remove_source

result = await knowledge_tools.remove_source(
    collection_id="col_abc123",
    source_id="src_xyz789"
)
# Returns: {"success": True}

Chat Configuration (Internal)#

get_chat_knowledge

result = await knowledge_tools.get_chat_knowledge()
# Returns current session's knowledge config

enable_collection / disable_collection

await knowledge_tools.enable_collection(collection_id="col_abc123")
await knowledge_tools.disable_collection(collection_id="col_abc123")

set_auto_search

await knowledge_tools.set_auto_search(enabled=True)

Configuration#

The toolset uses a configuration file with the following structure:

{
  "knowledge": {
    "embedding": {
      "model": "text-embedding-3-small"
    },
    "chunking": {
      "chunk_size": 512,
      "chunk_overlap": 50
    },
    "retrieval": {
      "use_reranker": true,
      "hybrid_alpha": 0.5
    },
    "metadata": {
      "path": ".pantheon/knowledge/metadata.json",
      "extract_title": false,
      "extract_keywords": false,
      "extract_summary": false
    }
  }
}

Architecture#

The toolset uses a multi-layer architecture:

KnowledgeToolSet: Tool interface and metadata management
VectorStoreBackend: Qdrant-based vector storage with hybrid search
LlamaIndex: Document loading, chunking, and metadata extraction

Retrieval Pipeline:

Query → Dense + Sparse embeddings
Qdrant hybrid search (vector + BM25)
FlashRank reranking
Results returned with metadata

Examples#

Building a Research Assistant#

from pantheon.agent import Agent
from pantheon.toolsets import KnowledgeToolSet, FileManagerToolSet

knowledge_tools = KnowledgeToolSet(name="knowledge")
file_tools = FileManagerToolSet(name="files")

researcher = Agent(
    name="researcher",
    instructions="""You are a research assistant. When asked questions:
    1. Search the knowledge base for relevant information
    2. Synthesize findings into a comprehensive answer
    3. Cite sources from the search results"""
)
await researcher.toolset(knowledge_tools)
await researcher.toolset(file_tools)

result = await researcher.run(
    "What are the key findings about neural networks?"
)

Best Practices#

Use collections for organization: Group related documents together
Enable hybrid retrieval: Combines semantic and keyword matching
Monitor indexing status: Check source status after adding
Use appropriate chunk sizes: 256-512 tokens for most use cases
Enable reranking: Improves result quality significantly

Dependencies#

Requires additional packages:

pip install qdrant-client llama-index fastembed