WebToolSet

WebToolSet#

The WebToolSet provides web search and content retrieval capabilities through DuckDuckGo search and web crawling using crawl4ai.

Overview#

Key features:

Web Search: Search the web using DuckDuckGo (no API key required)
Web Crawling: Fetch and extract content from URLs as markdown
Concurrent Fetching: Fetch multiple URLs in parallel
Timeout Support: Configurable timeouts for reliable operation

Basic Usage#

from pantheon.agent import Agent
from pantheon.toolsets import WebToolSet

# Create web toolset
web_tools = WebToolSet(name="web")

# Create agent and add toolset at runtime
agent = Agent(
    name="researcher",
    instructions="You can search the web and analyze content."
)
await agent.toolset(web_tools)

await agent.chat()

Constructor Parameters#

Parameter	Type	Description
`name`	str	Name of the toolset

Tools Reference#

duckduckgo_search#

Search the web using DuckDuckGo.

results = await web_tools.duckduckgo_search(
    query="machine learning tutorials",
    max_results=10,       # Optional: default 10
    time_limit="w"        # Optional: d/w/m/y for day/week/month/year
)

Parameters:

query: The search query
max_results: Maximum number of results (default: 10)
time_limit: Time filter - “d” (day), “w” (week), “m” (month), “y” (year)

Returns:

List of search results with title, href, and body:

[
    {
        "title": "Introduction to Machine Learning",
        "href": "https://example.com/ml-intro",
        "body": "A comprehensive guide to..."
    },
    ...
]

web_crawl#

Fetch and extract content from URLs as markdown.

contents = await web_tools.web_crawl(
    urls=["https://example.com/page1", "https://example.com/page2"],
    timeout=20.0          # Optional: default 20 seconds
)

Parameters:

urls: List of URLs to crawl
timeout: Request timeout in seconds (default: 20.0)

Returns:

List of markdown content from each URL:

[
    "# Page 1 Title\n\nContent from page 1...",
    "# Page 2 Title\n\nContent from page 2...",
]

Examples#

Research Workflow#

from pantheon.agent import Agent
from pantheon.toolsets import WebToolSet

web_tools = WebToolSet(name="web")

research_agent = Agent(
    name="researcher",
    instructions="""Conduct thorough research:
    1. Search for multiple sources
    2. Visit relevant links
    3. Extract and synthesize information
    4. Provide comprehensive analysis with citations"""
)
await research_agent.toolset(web_tools)

result = await research_agent.run(
    "Research the impact of AI on healthcare with at least 3 sources"
)

News Monitoring#

news_agent = Agent(
    name="news_monitor",
    instructions="""Monitor news on specific topics:
    1. Search for recent news using time_limit
    2. Fetch full articles
    3. Summarize key developments"""
)
await news_agent.toolset(web_tools)

# Find news from the past week
result = await news_agent.run(
    "Find news about renewable energy from the past week"
)

Content Extraction#

# Fetch multiple pages concurrently
contents = await web_tools.web_crawl(
    urls=[
        "https://docs.python.org/3/tutorial/",
        "https://numpy.org/doc/stable/user/",
        "https://pandas.pydata.org/docs/getting_started/"
    ],
    timeout=30.0
)

# contents[0] = Python tutorial markdown
# contents[1] = NumPy docs markdown
# contents[2] = Pandas docs markdown

Best Practices#

Use time_limit for recent content: Filter results by recency
Batch URL fetches: Use web_crawl with multiple URLs for efficiency
Set appropriate timeouts: Longer for complex pages
Verify sources: Cross-reference information from multiple sources
Handle empty results: Some pages may fail to load

Limitations#

JavaScript-heavy sites may not render fully (uses crawl4ai)
Some sites block automated access
Rate limits may apply to DuckDuckGo
Cannot handle authentication-required pages
Limited to publicly accessible content