KnowledgeToolSet#

The KnowledgeToolSet provides comprehensive knowledge base management with hybrid retrieval (vector + BM25 + reranking) using Qdrant.

Overview#

Key features:

  • Collection Management: Create, list, and delete knowledge collections

  • Source Management: Add files, folders, or URLs to collections

  • Hybrid Retrieval: Vector + BM25 + FlashRank reranking

  • Metadata Extraction: Optional title, keyword, and summary extraction via LLM

  • Chat Configuration: Per-session active collections and auto-search settings

  • Async Processing: Background document indexing with progress tracking

Basic Usage#

from pantheon.agent import Agent
from pantheon.toolsets import KnowledgeToolSet

# Create knowledge toolset
knowledge_tools = KnowledgeToolSet(
    name="knowledge",
    config_path="path/to/config.json"  # Optional
)

# Create agent and add toolset at runtime
agent = Agent(
    name="researcher",
    instructions="You can search and manage knowledge bases."
)
await agent.toolset(knowledge_tools)

await agent.chat()

Constructor Parameters#

Parameter

Type

Description

name

str

Name of the toolset (default: “knowledge”)

config_path

str | None

Path to configuration file. Uses default settings if not provided.

Tools Reference#

search_knowledge#

Search the knowledge base with hybrid retrieval.

result = await knowledge_tools.search_knowledge(
    query="What is machine learning?",
    top_k=5,                    # Number of results
    collection_ids=None,        # Use session's active collections
    use_hybrid=True             # Enable hybrid retrieval
)

Parameters:

  • query: Search query text

  • top_k: Number of results to return (default: 5)

  • collection_ids: Optional list of collection IDs. Uses session’s active collections if not specified.

  • use_hybrid: Whether to use hybrid retrieval (default: True)

Returns:

{
    "success": True,
    "results": [
        {
            "id": "node_abc123",
            "text": "Machine learning is...",
            "metadata": {"source_name": "ml_intro.pdf", ...},
            "score": 0.95,
            "collection_id": "col_xyz789"
        }
    ],
    "searched_collections": ["col_xyz789"]
}

Collection Management (Internal)#

These methods are marked exclude=True and used by UI/API:

list_collections

result = await knowledge_tools.list_collections()
# Returns: {"success": True, "collections": [...], "total": 5}

create_collection

result = await knowledge_tools.create_collection(
    name="Research Papers",
    description="Academic papers collection"
)
# Returns: {"success": True, "collection": {...}}

delete_collection

result = await knowledge_tools.delete_collection(
    collection_id="col_abc123"
)
# Returns: {"success": True}

Source Management (Internal)#

add_sources

# Add single source
result = await knowledge_tools.add_sources(
    collection_id="col_abc123",
    sources={"type": "file", "path": "/path/to/doc.pdf"}
)

# Add multiple sources
result = await knowledge_tools.add_sources(
    collection_id="col_abc123",
    sources=[
        {"type": "file", "path": "/path/to/doc1.pdf"},
        {"type": "folder", "path": "/path/to/docs/"},
    ]
)
# Returns: {"success": True, "source_ids": [...], "message": "..."}

Source types:

  • file: Single file (PDF, TXT, MD, DOCX, etc.)

  • folder: Directory with documents (recursive scan)

  • url: Web URL (not yet implemented)

list_sources

result = await knowledge_tools.list_sources(
    collection_id="col_abc123"
)
# Returns: {"success": True, "sources": [...], "total": 3}

remove_source

result = await knowledge_tools.remove_source(
    collection_id="col_abc123",
    source_id="src_xyz789"
)
# Returns: {"success": True}

Chat Configuration (Internal)#

get_chat_knowledge

result = await knowledge_tools.get_chat_knowledge()
# Returns current session's knowledge config

enable_collection / disable_collection

await knowledge_tools.enable_collection(collection_id="col_abc123")
await knowledge_tools.disable_collection(collection_id="col_abc123")

set_auto_search

await knowledge_tools.set_auto_search(enabled=True)

Configuration#

The toolset uses a configuration file with the following structure:

{
  "knowledge": {
    "embedding": {
      "model": "text-embedding-3-small"
    },
    "chunking": {
      "chunk_size": 512,
      "chunk_overlap": 50
    },
    "retrieval": {
      "use_reranker": true,
      "hybrid_alpha": 0.5
    },
    "metadata": {
      "path": ".pantheon/knowledge/metadata.json",
      "extract_title": false,
      "extract_keywords": false,
      "extract_summary": false
    }
  }
}

Architecture#

The toolset uses a multi-layer architecture:

  1. KnowledgeToolSet: Tool interface and metadata management

  2. VectorStoreBackend: Qdrant-based vector storage with hybrid search

  3. LlamaIndex: Document loading, chunking, and metadata extraction

Retrieval Pipeline:

  1. Query → Dense + Sparse embeddings

  2. Qdrant hybrid search (vector + BM25)

  3. FlashRank reranking

  4. Results returned with metadata

Examples#

Building a Research Assistant#

from pantheon.agent import Agent
from pantheon.toolsets import KnowledgeToolSet, FileManagerToolSet

knowledge_tools = KnowledgeToolSet(name="knowledge")
file_tools = FileManagerToolSet(name="files")

researcher = Agent(
    name="researcher",
    instructions="""You are a research assistant. When asked questions:
    1. Search the knowledge base for relevant information
    2. Synthesize findings into a comprehensive answer
    3. Cite sources from the search results"""
)
await researcher.toolset(knowledge_tools)
await researcher.toolset(file_tools)

result = await researcher.run(
    "What are the key findings about neural networks?"
)

Best Practices#

  1. Use collections for organization: Group related documents together

  2. Enable hybrid retrieval: Combines semantic and keyword matching

  3. Monitor indexing status: Check source status after adding

  4. Use appropriate chunk sizes: 256-512 tokens for most use cases

  5. Enable reranking: Improves result quality significantly

Dependencies#

Requires additional packages:

pip install qdrant-client llama-index fastembed