WebToolSet

The WebToolSet gives agents web search through DuckDuckGo and web page content retrieval through the crawl4ai crawler.

Overview

Key features:

  • Web Search: Search the web using DuckDuckGo (no API key required)

  • Web Crawling: Fetch and extract content from URLs as markdown

  • Concurrent Fetching: Fetch multiple URLs in parallel

  • Timeout Support: Configurable per-call timeouts (web_crawl takes a timeout in seconds) so slow pages do not stall a request
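Concurrent fetching with a per-request timeout is commonly built on `asyncio.gather` and `asyncio.wait_for`. The sketch below illustrates that general pattern only; it is not WebToolSet's actual implementation, and `fetch_all` and the single-URL `fetch` callable are hypothetical stand-ins for whatever crawl4ai call the toolset makes internally. A failed or timed-out URL yields an empty string so the results stay aligned with the input order.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical single-URL fetcher (a stand-in for the real crawl call)
Fetcher = Callable[[str], Awaitable[str]]


async def fetch_all(urls: list[str], fetch: Fetcher, timeout: float = 20.0) -> list[str]:
    """Fetch every URL concurrently and return the results in input order."""

    async def fetch_one(url: str) -> str:
        try:
            # Bound each fetch by its own timeout
            return await asyncio.wait_for(fetch(url), timeout=timeout)
        except Exception:
            # Timeouts and fetch errors become empty results instead of failing the batch
            return ""

    # gather returns results in the same order as its arguments
    return await asyncio.gather(*(fetch_one(u) for u in urls))


async def demo() -> list[str]:
    async def fake_fetch(url: str) -> str:
        if "slow" in url:
            await asyncio.sleep(1.0)  # longer than the timeout used below
        return f"# {url}"

    return await fetch_all(["https://a.test", "https://slow.test"], fake_fetch, timeout=0.1)


print(asyncio.run(demo()))  # ['# https://a.test', '']
```

Returning an empty string rather than raising keeps one slow page from discarding the results of every other URL in the batch.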

Basic Usage

import asyncio

from pantheon.agent import Agent
from pantheon.toolsets import WebToolSet


async def main():
    # Create the web toolset
    web_tools = WebToolSet(name="web")

    # Create an agent and attach the toolset at runtime
    agent = Agent(
        name="researcher",
        instructions="You can search the web and analyze content."
    )
    await agent.toolset(web_tools)

    # Start a chat session with the agent
    await agent.chat()


asyncio.run(main())

Constructor Parameters

Parameter    Type    Description
name         str     Name of the toolset

Tools Reference

web_crawl

Fetch and extract content from URLs as markdown.

contents = await web_tools.web_crawl(
    urls=["https://example.com/page1", "https://example.com/page2"],
    timeout=20.0          # Optional: default 20 seconds
)

Parameters:

  • urls: List of URLs to crawl

  • timeout: Request timeout in seconds (default: 20.0)

Returns:

A list of markdown strings, one per URL, in the same order as urls:

[
    "# Page 1 Title\n\nContent from page 1...",
    "# Page 2 Title\n\nContent from page 2...",
]
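Because the returned list follows the order of `urls` (as the Content Extraction example relies on), pairing each URL with its content takes a single `zip`. The helper below is a hypothetical convenience, not part of WebToolSet. It also collects URLs whose content came back empty, under the assumption that a page that fails to load yields an empty or whitespace-only string; the toolset may signal failures differently.

```python
def split_results(urls: list[str], contents: list[str]) -> tuple[dict[str, str], list[str]]:
    """Map each URL to its markdown and collect URLs with no usable content.

    Assumption (hypothetical): a failed fetch yields an empty or whitespace-only string.
    """
    if len(urls) != len(contents):
        raise ValueError("expected one result per URL")
    pages: dict[str, str] = {}
    failed: list[str] = []
    for url, text in zip(urls, contents):
        if text and text.strip():
            pages[url] = text
        else:
            failed.append(url)
    return pages, failed


urls = ["https://example.com/page1", "https://example.com/page2"]
contents = ["# Page 1 Title\n\nContent from page 1...", ""]
pages, failed = split_results(urls, contents)
print(list(pages))  # ['https://example.com/page1']
print(failed)       # ['https://example.com/page2']
```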

Examples

Research Workflow

from pantheon.agent import Agent
from pantheon.toolsets import WebToolSet

web_tools = WebToolSet(name="web")

research_agent = Agent(
    name="researcher",
    instructions="""Conduct thorough research:
    1. Search for multiple sources
    2. Visit relevant links
    3. Extract and synthesize information
    4. Provide comprehensive analysis with citations"""
)
await research_agent.toolset(web_tools)

result = await research_agent.run(
    "Research the impact of AI on healthcare with at least 3 sources"
)

News Monitoring

news_agent = Agent(
    name="news_monitor",
    instructions="""Monitor news on specific topics:
    1. Search for recent news using time_limit
    2. Fetch full articles
    3. Summarize key developments"""
)
await news_agent.toolset(web_tools)

# Find news from the past week
result = await news_agent.run(
    "Find news about renewable energy from the past week"
)

Content Extraction

# Fetch multiple pages concurrently
contents = await web_tools.web_crawl(
    urls=[
        "https://docs.python.org/3/tutorial/",
        "https://numpy.org/doc/stable/user/",
        "https://pandas.pydata.org/docs/getting_started/"
    ],
    timeout=30.0
)

# contents[0] = Python tutorial markdown
# contents[1] = NumPy docs markdown
# contents[2] = Pandas docs markdown
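When several pages are crawled at once, a short label per page makes the results easier to log or hand to an agent. The sketch below pulls the first ATX-style markdown heading (`# Title`, `## Title`, ...) from each result, matching the shape of the sample web_crawl output. `page_title` is a hypothetical helper, and real crawled markdown may open with navigation text or contain no heading at all, so it falls back to the URL. The content strings in the usage example are made-up placeholders, not real crawl output.

```python
import re

# An ATX heading: up to 3 leading spaces, 1-6 '#', at least one space or tab, then the text
_HEADING = re.compile(r"^ {0,3}#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def page_title(markdown: str, fallback: str) -> str:
    """Return the text of the first markdown heading, or `fallback` if there is none."""
    match = _HEADING.search(markdown)
    return match.group(1) if match else fallback


urls = [
    "https://docs.python.org/3/tutorial/",
    "https://numpy.org/doc/stable/user/",
    "https://pandas.pydata.org/docs/getting_started/",
]
# Placeholder content standing in for web_crawl results
contents = [
    "# Tutorial\n\nIntro text...",
    "Skip to content\n\n## User Guide\n\nBody...",
    "",
]
for url, text in zip(urls, contents):
    print(page_title(text, fallback=url))
# Tutorial
# User Guide
# https://pandas.pydata.org/docs/getting_started/
```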

Best Practices

  1. Use time_limit for recent content: filter search results by recency

  2. Batch URL fetches: pass several URLs to one web_crawl call so they are fetched concurrently

  3. Set appropriate timeouts: use longer timeouts for slow or complex pages

  4. Verify sources: cross-reference information across multiple sources

  5. Handle empty results: some pages may fail to load, so check each returned item before using it
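Practices 2, 3 and 5 above combine naturally: crawl the URLs as one batch, then re-crawl only the ones that came back empty, this time with a longer timeout. The sketch below is hypothetical. `crawl_with_retry` is not part of WebToolSet, and it assumes a batch crawler called as `crawl(urls=..., timeout=...)` that returns one string per URL in input order, with an empty string marking a failed page. In real code you would pass `web_tools.web_crawl`; a fake crawler keeps the example self-contained.

```python
import asyncio
from typing import Awaitable, Callable

# Assumed shape of a batch crawler: crawl(urls=[...], timeout=...) -> one string per URL
Crawler = Callable[..., Awaitable[list[str]]]


async def crawl_with_retry(
    crawl: Crawler,
    urls: list[str],
    timeout: float = 20.0,
    retry_timeout: float = 60.0,
) -> list[str]:
    """Crawl all URLs once, then retry empty results a single time with a longer timeout."""
    results = list(await crawl(urls=urls, timeout=timeout))
    # Indices of pages that came back empty (assumed to mean the fetch failed)
    missing = [i for i, text in enumerate(results) if not text.strip()]
    if missing:
        retried = await crawl(urls=[urls[i] for i in missing], timeout=retry_timeout)
        for i, text in zip(missing, retried):
            results[i] = text
    return results


async def demo() -> tuple[list[str], list[float]]:
    timeouts_seen: list[float] = []

    async def fake_crawl(urls: list[str], timeout: float) -> list[str]:
        # Pretend "slow" pages only load when given at least 30 seconds
        timeouts_seen.append(timeout)
        return [f"# {u}" if ("slow" not in u or timeout >= 30) else "" for u in urls]

    results = await crawl_with_retry(fake_crawl, ["https://a.test", "https://slow.test"])
    return results, timeouts_seen


print(asyncio.run(demo()))
# (['# https://a.test', '# https://slow.test'], [20.0, 60.0])
```

Retrying only the missing URLs avoids paying the longer timeout for pages that already loaded.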

Limitations

  • JavaScript-heavy sites may not render fully when crawled with crawl4ai

  • Some sites block automated access

  • DuckDuckGo may rate-limit frequent searches

  • Pages that require authentication cannot be accessed

  • Only publicly accessible content can be retrieved
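This page does not say how DuckDuckGo rate limits surface through the toolset, so the following is only a generic client-side pattern: retry a failing async call with exponentially growing delays. `retry_with_backoff` is hypothetical, and so is the `RuntimeError` raised by the fake search; with a real call you would catch whatever exception it actually raises. The `sleep` parameter is injectable so the delays can be checked without waiting.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    call: Callable[[], Awaitable[T]],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Await call(), retrying up to `retries` times with delays base, 2*base, 4*base, ..."""
    for attempt in range(retries + 1):
        try:
            return await call()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            await sleep(base_delay * 2 ** attempt)
    raise ValueError("retries must be >= 0")


async def demo() -> tuple[str, list[float]]:
    delays: list[float] = []
    attempts = 0

    async def record_sleep(seconds: float) -> None:
        delays.append(seconds)  # record the delay instead of actually waiting

    async def flaky_search() -> str:
        nonlocal attempts
        attempts += 1
        if attempts < 3:
            raise RuntimeError("rate limited")  # hypothetical failure mode
        return "results"

    result = await retry_with_backoff(flaky_search, sleep=record_sleep)
    return result, delays


print(asyncio.run(demo()))  # ('results', [1.0, 2.0])
```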