Feeding the Agents: Bulk Ingestion and the MCP Marketplace

Even the strongest agent running the best model fails the moment a user asks about a document it has never seen. Enterprise knowledge does not live in chat transcripts; it lives in file shares, shared mailboxes, and line-of-business systems. Closing that gap has been one of our largest investments over the last quarter.

Two systems now carry most of that weight: bulk ingestion and the MCP marketplace.

Bulk Ingestion from the Systems You Already Use

Athena now ingests documents at scale from three canonical enterprise sources: OneDrive folders, SharePoint libraries, and Microsoft 365 email inboxes. Administrators point the platform at a folder or a mailbox, choose whether to poll on a schedule or sync once, and Athena handles the rest — browsing the drive, enumerating files, and chunking content for indexing.
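To make the chunking step concrete, here is a minimal sketch of an overlapping word-window chunker. It is illustrative only: the function name, the window size, and the overlap are assumptions, and nothing here describes how Athena actually splits documents.

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word windows that overlap, so context survives chunk boundaries.

    Hypothetical helper: max_words and overlap are illustrative defaults.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap trades a little extra storage for a better chance that a sentence split across two chunks is still retrievable from at least one of them.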

Supported formats include PDF, Word (.docx), PowerPoint (including extracted images), plain text, and Markdown. Every chunk is embedded via Azure OpenAI's text-embedding-3-large and made searchable through hybrid vector plus full-text search. The same ingestion pass auto-generates knowledge-graph entities, so facts extracted from a SharePoint policy document immediately become queryable through the graph as well as through semantic search.
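Hybrid search has to merge two ranked lists, one from the vector index and one from full-text search. One common way to do that is reciprocal rank fusion (RRF). The sketch below shows the idea under assumptions: this post does not state which fusion method Athena uses, and the function name and the constant k = 60 are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk ids into one ranking.

    Each list contributes 1 / (k + rank) per chunk, so chunks that rank well
    in more than one list rise to the top. Hypothetical helper, not Athena's API.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by chunk id for a stable order.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


vector_hits = ["a", "b", "c"]    # assumed output of the vector search
fulltext_hits = ["b", "d", "a"]  # assumed output of the full-text search
merged = reciprocal_rank_fusion([vector_hits, fulltext_hits])
```

Here "b" and "a" appear in both lists and therefore outrank "d" and "c", each of which was found by only one retriever.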

Privacy by Default

Polling a shared mailbox is a powerful capability and a serious responsibility. Athena treats it accordingly. Email polling is opt-in per connection — there is an explicit autoPoll checkbox that administrators must set before the platform will read a single message. Scheduled sync runs surface prominently in the ingestion dashboard with pause, resume, and purge controls, and the audit log captures every fetch.
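The opt-in rule can be pictured as a guard that refuses to fetch unless the flag is set, and records every fetch it does perform. This is a minimal sketch under assumptions: only the autoPoll setting comes from the product; the class, function, and log format are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MailboxConnection:
    mailbox: str
    auto_poll: bool = False  # mirrors the autoPoll checkbox: off unless an admin enables it


class PollingNotEnabled(Exception):
    """Raised when a poll is attempted on a connection that has not opted in."""


def poll_mailbox(
    conn: MailboxConnection,
    fetch: Callable[[str], list[str]],  # hypothetical fetcher: mailbox -> message ids
    audit_log: list[str],
) -> list[str]:
    # Check the opt-in flag before touching the mailbox at all.
    if not conn.auto_poll:
        raise PollingNotEnabled(f"autoPoll is off for {conn.mailbox}")
    messages = fetch(conn.mailbox)
    # Record the fetch so it can be reviewed later.
    audit_log.append(f"fetched {len(messages)} message(s) from {conn.mailbox}")
    return messages
```

Defaulting the flag to False means a connection created without an explicit choice can never be polled by accident.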

Incremental sync keeps ongoing cost low. After a source's initial ingestion, subsequent runs process only new or changed content, deduplicated at the message and file level. You pay for the first full crawl; after that, cost scales with what has changed rather than with the size of the source.
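A minimal sketch of the incremental idea, assuming change detection by content hash. The post does not say how Athena detects changes (a connector could equally rely on change tokens from the source system), so the function names and the fingerprint store are illustrative.

```python
import hashlib


def content_fingerprint(data: bytes) -> str:
    """Stable fingerprint of a file or message body."""
    return hashlib.sha256(data).hexdigest()


def incremental_sync(items: dict[str, bytes], seen: dict[str, str]) -> list[str]:
    """Return the ids that are new or changed since the last run.

    `seen` maps item id -> fingerprint from earlier runs and is updated in place.
    Hypothetical helper: a real connector would persist `seen` between runs.
    """
    to_process: list[str] = []
    for item_id, data in items.items():
        fingerprint = content_fingerprint(data)
        if seen.get(item_id) != fingerprint:  # never seen, or content changed
            to_process.append(item_id)
            seen[item_id] = fingerprint
    return to_process
```

Running the same source twice yields an empty work list the second time, which is why repeat runs are cheap compared with the first full crawl.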

The MCP Marketplace

Bulk ingestion solves read access. Taking action requires a different system entirely. For that, we adopted the Model Context Protocol, the emerging open standard for wiring tools into AI agents, and built a marketplace on top of it.

The marketplace is a paginated, governed registry of MCP-compatible skills: CRM connectors for SuiteCRM, HubSpot, and DealCloud; email and calendar actions; file operations; and custom internal tools your team publishes for shared use. Administrators browse, preview, and install skills without a deployment cycle. Skills carry their own capability manifests, so once installed, agents can be granted access to specific tools rather than the entire skill surface.

Test Before You Trust

Installing a new skill used to mean wiring credentials in production and hoping for the best. The marketplace now includes a Test Configuration flow that runs the skill against ephemeral credential overrides before committing to a permanent deployment. Administrators can validate that a CRM integration actually reaches the right tenant, that an email skill resolves to the right mailbox, and that the tool calls are shaped the way the agent expects — all before a single end user touches it.
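One way to picture ephemeral credential overrides: merge the overrides into a temporary, read-only view, run a probe against it, and leave the stored configuration untouched. The sketch below is an assumption-laden illustration; the function names, the shape of the credentials, and the probe callback are all hypothetical.

```python
from contextlib import contextmanager
from types import MappingProxyType
from typing import Callable, Iterator, Mapping


@contextmanager
def ephemeral_credentials(
    stored: Mapping[str, str], overrides: Mapping[str, str]
) -> Iterator[Mapping[str, str]]:
    """Yield a read-only merged view; `stored` is never modified."""
    merged = {**stored, **overrides}  # overrides win, but only inside this copy
    yield MappingProxyType(merged)


def run_test_configuration(
    stored: Mapping[str, str],
    overrides: Mapping[str, str],
    probe: Callable[[Mapping[str, str]], bool],  # hypothetical check, e.g. "reaches the right tenant"
) -> bool:
    with ephemeral_credentials(stored, overrides) as creds:
        try:
            return bool(probe(creds))
        except Exception:
            # For this sketch, any probe failure counts as a failed test.
            return False
```

Because the probe only ever sees the merged copy, a failed or abandoned test leaves nothing behind to clean up.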

Per-Agent Tool Selection

A cluster of ten specialist agents does not need the same ten tools on every agent. Athena now supports per-MCP-server tool selection at the agent level: when you grant a skill to an agent, you choose exactly which of its tools that agent can invoke. The result is a smaller blast radius, clearer agent identities, and fewer accidental tool invocations muddying the neural-path trace.
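Per-agent tool selection amounts to an allowlist keyed by skill. Here is a minimal sketch, assuming a skill exposes a fixed set of tool names; the classes, functions, and the example tool names are hypothetical and do not reflect Athena's data model.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass(frozen=True)
class Skill:
    name: str
    tools: frozenset[str]  # every tool the skill exposes


@dataclass
class Agent:
    name: str
    grants: dict[str, set[str]] = field(default_factory=dict)  # skill name -> allowed tools


def grant_tools(agent: Agent, skill: Skill, tools: Iterable[str]) -> None:
    """Grant only the named tools; reject anything the skill does not expose."""
    requested = set(tools)
    unknown = requested - skill.tools
    if unknown:
        raise ValueError(f"{skill.name} has no tool(s): {sorted(unknown)}")
    agent.grants.setdefault(skill.name, set()).update(requested)


def can_invoke(agent: Agent, skill_name: str, tool: str) -> bool:
    """Deny by default: a tool is callable only if it was explicitly granted."""
    return tool in agent.grants.get(skill_name, set())


crm = Skill("crm", frozenset({"search_contacts", "update_deal", "delete_contact"}))
researcher = Agent("researcher")
grant_tools(researcher, crm, ["search_contacts"])
```

In this example the researcher agent can search contacts but cannot update deals or delete contacts, even though the same skill is installed for it.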

Putting It Together

An enterprise AI platform is only as good as its context and its reach. Bulk ingestion extends Athena's context across the systems your teams already rely on. The MCP marketplace extends its reach without forcing you to wait for a vendor release cycle. Together they turn Athena from a chat surface into a platform your organization's knowledge and workflows actually flow through.

And every step — the ingestion runs, the test configurations, the skill installations, the per-agent tool grants — is captured in the audit log. The system is powerful because it can see and do a great deal. It is trustworthy because every action it takes is governed and reviewable.