Architecture

A deep dive into how runpiper works under the hood.

Overview

runpiper is an enterprise-grade AI agent runtime built in Rust. It uses code execution as the primary primitive instead of traditional tool calls, enabling significant cost and latency improvements.

Core Innovation

Traditional agent frameworks use multiple LLM calls for tool execution:

Traditional:
  LLM → tool → LLM → tool → LLM → tool
  (5-10 round trips)

runpiper:
  LLM → JavaScript code block
  (1 round trip)

Result: roughly 97% lower cost and 97% lower latency for mechanical operations (see Cost Comparison for the underlying figures).
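To make the contrast concrete, here is a minimal sketch of the kind of code block an LLM might emit in a single response. The `web` and `memory` objects are stand-in mocks written for this example (in runpiper they would be backed by host functions), and the `hostCalls` counter exists only to show how many operations one round trip can cover.

```javascript
// Stand-in mocks for the SDK; in runpiper these would be host-backed.
let hostCalls = 0;
const notes = [];
const web = {
  search(query) {
    hostCalls += 1;
    return [1, 2, 3].map((n) => ({ title: `${query} #${n}`, url: `https://example.com/${n}` }));
  },
  fetch(url) {
    hostCalls += 1;
    return { status: 200, body: `contents of ${url}` };
  },
};
const memory = {
  add(content) {
    hostCalls += 1;
    notes.push(content);
  },
};

// --- The code block the LLM would generate (one LLM round trip) ---
const results = web.search("wasm performance");
for (const result of results) {
  const page = web.fetch(result.url);
  if (page.status === 200) {
    memory.add(`Read ${result.title} (${page.body.length} chars)`);
  }
}
// -------------------------------------------------------------------

console.log(`${hostCalls} operations, 1 LLM round trip`);
```

In a tool-call design, each of those seven operations would typically need its own LLM turn; here the loop and the status check run inside the sandbox.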

System Architecture

Request Flow

┌─────────────────────────────────────────────────────────────────┐
│                   Request Flow                                  │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  LLM generates JS code                                          │
│         │                                                       │
│         ▼                                                       │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │  WASM/QuickJS Instance (per session)                   │    │
│  │                                                         │    │
│  │  Agent code:                                            │    │
│  │    web.search(q)     → __host("web.search", q)         │    │
│  │                             │                           │    │
│  │                             ▼                           │    │
│  │                      Rust: brave_api.search(q)          │    │
│  │                                                         │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
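The routing in the diagram can be sketched as a thin shim: the agent calls an ordinary-looking function, and the shim forwards the call by name through a single `__host` entry point. The host side is Rust in runpiper; the lookup table below is a JavaScript stand-in so the example runs on its own, and the error message format is an assumption.

```javascript
// Hypothetical host side: a plain lookup table standing in for the Rust host.
const hostFunctions = new Map([
  ["web.search", (q) => [{ title: `Result for ${q}`, url: "https://example.com", snippet: "..." }]],
]);

// Single entry point from sandboxed code into the host (name taken from the diagram).
function __host(name, ...args) {
  const fn = hostFunctions.get(name);
  if (!fn) throw new Error(`capability not available: ${name}`);
  return fn(...args);
}

// Thin SDK shim: the agent sees ordinary functions, each one forwards to __host.
const web = {
  search: (q) => __host("web.search", q),
};

const hits = web.search("wasm performance");
console.log(hits[0].title);
```

Keeping one named entry point means the host can decide, per call, whether a capability exists for the caller at all.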

Components

WASM Sandbox

runpiper uses a sandboxed JavaScript runtime to execute agent code safely.

Properties:

Feature         Value
Cold Start      ~200µs
Per Turn        ~20µs
Instance Size   ~1-5MB
Isolation       Strong (WASM)

Architecture:

  • One universal WASM module compiled at startup (~1s, once)
  • One instance per active session (~200µs to create)
  • Host functions bound per-tenant with their credentials/permissions
  • Agent code executes against configured capabilities only

REPL Sessions

Variables persist across turns within a session using a persistent REPL:

// Turn 1
const results = web.search("wasm performance");

// Turn 2 - results still in scope
const page = web.fetch(results[0].url);

// Turn 3 - both still available
memory.add("Found: " + results[0].title + " (status " + page.status + ")");

Performance: ~20µs per turn, because every turn reuses the same session context instead of creating a new one.
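The persistence behaviour can be imitated outside runpiper with Node's built-in `vm` module, where top-level declarations survive across separate evaluations in the same context. This is only an analogy for the per-session QuickJS instance, not how runpiper is implemented; the mock `web` and `memory` objects are assumptions for the sketch.

```javascript
const vm = require("node:vm");

// Mock SDK objects (assumptions for this sketch).
const notes = [];
const sdk = {
  web: {
    search: (q) => [{ title: `About ${q}`, url: "https://example.com/wasm" }],
    fetch: (url) => ({ status: 200, body: `contents of ${url}` }),
  },
  memory: { add: (content) => notes.push(content) },
};

// One context per session; it plays the role of the per-session instance.
const session = vm.createContext(sdk);

const turns = [
  'const results = web.search("wasm performance");',
  "const page = web.fetch(results[0].url);",
  'memory.add("Found: " + results[0].title + " (" + page.status + ")");',
];
for (const code of turns) {
  vm.runInContext(code, session); // each turn is a separate evaluation
}
console.log(notes[0]);

// A different session starts with an empty scope.
const otherSession = vm.createContext({});
console.log(vm.runInContext("typeof results", otherSession));
```

The third turn reads both `results` and `page`, which only works because the earlier declarations are still in scope.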

Multi-Tenant Model

Shared:       WASM module, engine
Per-tenant:   Credentials, DB pools, rate limits
Per-session:  WASM instance, REPL state

Security is enforced at the host function binding layer: each tenant's SDK calls are routed only to that tenant's resources.
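One way to picture binding-layer enforcement is a `__host` closure created per session and bound to exactly one tenant. The tenant records, field names, and error text below are hypothetical; the point is only that shared implementations never see a tenant they were not bound to.

```javascript
// Hypothetical tenant records; field names are assumptions for this sketch.
const tenants = {
  acme: { apiKey: "acme-key", capabilities: ["web.search"] },
  globex: { apiKey: "globex-key", capabilities: ["web.search", "db.query"] },
};

// Shared across tenants: the host function implementations.
const implementations = {
  "web.search": (tenant, q) => ({ usedKey: tenant.apiKey, query: q }),
  "db.query": (tenant, sql) => ({ usedKey: tenant.apiKey, sql }),
};

// Per session: a __host closure bound to one tenant's credentials and permissions.
function bindHost(tenantId) {
  const tenant = tenants[tenantId];
  const allowed = new Set(tenant.capabilities);
  return function __host(name, ...args) {
    if (!allowed.has(name)) throw new Error(`${tenantId}: ${name} is not enabled`);
    return implementations[name](tenant, ...args);
  };
}

const acmeHost = bindHost("acme");
const globexHost = bindHost("globex");
console.log(acmeHost("web.search", "q").usedKey);
console.log(globexHost("web.search", "q").usedKey);
```

Agent code never passes a tenant identifier, so there is nothing for it to forge; the identity is fixed when the closure is created.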

Memory Model

┌─────────────────────────────────────────────────────────────────┐
│                        Memory Layers                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Session State (REPL)     │  Memory (cross-session)            │
│  ─────────────────────    │  ──────────────────────            │
│  - Variables in scope     │  - Past conversations              │
│  - Defined functions      │  - Learned facts                   │
│  - Current turn context   │  - User preferences                │
│                           │                                     │
│  Dies when session ends   │  Persists forever                  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
  • Agent explicitly stores memories via memory.add()
  • System auto-captures conversation history (configurable)
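The split between the two layers can be sketched with a store that outlives sessions and a per-session variable map that does not. `MemoryStore`, `startSession`, and the substring search are illustrative stand-ins, not runpiper's actual storage or retrieval logic.

```javascript
// Cross-session store: outlives any single session (in-memory stand-in here).
class MemoryStore {
  constructor() {
    this.items = [];
  }
  add(content, metadata = {}) {
    this.items.push({ content, metadata });
  }
  // Naive substring match, used only to keep the sketch short.
  search(query) {
    const q = query.toLowerCase();
    return this.items.filter((item) => item.content.toLowerCase().includes(q));
  }
}

// Session state: created per session and discarded when the session ends.
function startSession(memory) {
  return { vars: new Map(), memory };
}

const store = new MemoryStore();

let firstSession = startSession(store);
firstSession.vars.set("draft", "temporary value");
firstSession.memory.add("User prefers metric units");
firstSession = null; // session ends: its variables are gone

const nextSession = startSession(store);
console.log(nextSession.vars.has("draft"));
console.log(nextSession.memory.search("metric").length);
```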

SDK Generation

The agent’s SDK is generated dynamically from capabilities configuration:

[agent.capabilities]
web = true
memory = true
database = "my-database"
api = { base_url = "https://api.acme.com", auth = "{{env.ACME_KEY}}" }
tasks = ["summarize"]

Agent sees:

// Built-in
web.search(query);
web.fetch(url);

memory.search(query);
memory.add(content, metadata);

// Database
db.query(sql, params);
db.execute(sql, params);

// HTTP
http.api.get(path);
http.api.post(path, body);

// Tasks
tasks.summarize(input);
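A minimal sketch of how such a surface could be derived from the capabilities table: each enabled capability contributes a namespace whose functions forward to the host. The `generateSdk` function and the `host(name, ...args)` bridge signature are assumptions for illustration.

```javascript
// Parsed form of the [agent.capabilities] table shown above.
const capabilities = {
  web: true,
  memory: true,
  database: "my-database",
  api: { base_url: "https://api.acme.com", auth: "{{env.ACME_KEY}}" },
  tasks: ["summarize"],
};

// `host` stands in for the sandbox-to-host bridge; its signature is assumed.
function generateSdk(caps, host) {
  const call = (name) => (...args) => host(name, ...args);
  const sdk = {};
  if (caps.web) sdk.web = { search: call("web.search"), fetch: call("web.fetch") };
  if (caps.memory) sdk.memory = { search: call("memory.search"), add: call("memory.add") };
  if (caps.database) sdk.db = { query: call("db.query"), execute: call("db.execute") };
  if (caps.api) sdk.http = { api: { get: call("http.api.get"), post: call("http.api.post") } };
  if (Array.isArray(caps.tasks) && caps.tasks.length > 0) {
    sdk.tasks = Object.fromEntries(caps.tasks.map((t) => [t, call(`tasks.${t}`)]));
  }
  return sdk;
}

const calls = [];
const sdk = generateSdk(capabilities, (name, ...args) => {
  calls.push({ name, args });
  return null;
});

sdk.tasks.summarize("long text");
console.log(Object.keys(sdk).join(", "));
console.log(calls[0].name);
```

A capability that is absent from the configuration simply produces no namespace, so agent code that references it fails immediately rather than reaching the host.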

Performance

Scaling Characteristics

1000 concurrent sessions:
  Memory: ~1-5 GB
  CPU idle: ~0 (idle sessions only hold memory)
  CPU active: ~20µs per turn
  Bottleneck: External I/O (LLM, DB, HTTP)

Cost Comparison

Scenario                      LLM Calls   Cost      Latency
Traditional (50 operations)   50+         ~$0.16    ~42s
runpiper (50 operations)      1           ~$0.005   ~1s
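The figures in this section can be cross-checked with a few lines of arithmetic. The inputs below are copied from the tables above; nothing else is assumed.

```javascript
// Figures copied from the tables above.
const traditional = { costUsd: 0.16, latencySec: 42 };
const runpiper = { costUsd: 0.005, latencySec: 1 };

// Percentage reduction going from `before` to `after`.
const reduction = (before, after) => (1 - after / before) * 100;

const costReduction = reduction(traditional.costUsd, runpiper.costUsd);
const latencyReduction = reduction(traditional.latencySec, runpiper.latencySec);
console.log(`cost reduction: ${costReduction.toFixed(1)}%`);
console.log(`latency reduction: ${latencyReduction.toFixed(1)}%`);

// Memory for 1000 sessions at ~1-5 MB per instance.
const sessions = 1000;
const [lowMb, highMb] = [1, 5].map((mb) => mb * sessions);
console.log(`${lowMb / 1000}-${highMb / 1000} GB for ${sessions} sessions`);
```

The cost and latency reductions come out at about 96.9% and 97.6%, which is where the rounded 97% figure comes from.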

Security Model

Encryption at Rest

runpiper encrypts all sensitive data by default:

  • Input/output encryption - Task inputs, outputs, and errors encrypted before storage
  • Secrets protection - Organization secrets and webhook tokens never stored as plaintext
  • Deployment security - Deployment definitions and callback payloads encrypted
  • Envelope encryption - Per-object encryption keys using AES-256-GCM
  • Per-tenant keys - Unique encryption keys derived per organization
  • Master key isolation - Master key encrypted with passphrase, stored outside database

Key Management

  • Passphrase protection - Server requires master passphrase to start
  • Separate storage - Master key stored on disk or in S3, never in the database
  • Key zeroization - Keys cleared from memory after use
  • In-memory derivation - Tenant keys derived per request, never persisted
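The combination of per-tenant keys and per-object envelope encryption can be sketched with Node's built-in `crypto` module. The source names AES-256-GCM but does not say how tenant keys are derived, so HKDF-SHA256 is an assumption here, as are the function names, the `info` label, and the record layout.

```javascript
const nodeCrypto = require("node:crypto");

// Assumption: tenant keys are derived with HKDF-SHA256 from the master key.
function deriveTenantKey(masterKey, orgId) {
  return Buffer.from(nodeCrypto.hkdfSync("sha256", masterKey, Buffer.alloc(0), `tenant:${orgId}`, 32));
}

function aesGcmEncrypt(key, plaintext) {
  const iv = nodeCrypto.randomBytes(12); // fresh 96-bit nonce per encryption
  const cipher = nodeCrypto.createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

function aesGcmDecrypt(key, { iv, ciphertext, tag }) {
  const decipher = nodeCrypto.createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}

// Envelope encryption: a random per-object key encrypts the data,
// and the tenant key encrypts (wraps) that per-object key.
function sealObject(tenantKey, plaintext) {
  const dataKey = nodeCrypto.randomBytes(32);
  const sealed = {
    data: aesGcmEncrypt(dataKey, Buffer.from(plaintext, "utf8")),
    wrappedKey: aesGcmEncrypt(tenantKey, dataKey),
  };
  dataKey.fill(0); // zeroize the plaintext data key after use
  return sealed;
}

function openObject(tenantKey, sealed) {
  const dataKey = aesGcmDecrypt(tenantKey, sealed.wrappedKey);
  const plaintext = aesGcmDecrypt(dataKey, sealed.data).toString("utf8");
  dataKey.fill(0);
  return plaintext;
}

const masterKey = nodeCrypto.randomBytes(32); // stand-in for the unlocked master key
const acmeKey = deriveTenantKey(masterKey, "org-acme");
const sealed = sealObject(acmeKey, "task input: quarterly numbers");
console.log(openObject(acmeKey, sealed));
```

Only `sealed` would be written to the database in this sketch; without the master key there is no way to re-derive `acmeKey`, and a key derived for another organization fails GCM authentication.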

Why Encryption Matters

Many agent platforms store sensitive data in plaintext or rely on weak encryption. This means:

  • Database breaches expose all customer data, secrets, and inputs
  • Agents with vulnerabilities can access stored plaintext secrets
  • No protection against insider threats or compromised infrastructure

runpiper’s approach ensures:

  • The database alone is not enough to decrypt customer data
  • Agents cannot access the master passphrase or the encrypted master key
  • Even a complete database exposure is useless to an attacker without the master key
  • Tenant isolation prevents cross-tenant data leakage

Sandboxing

  • WASM isolation - Strong code execution sandbox
  • Host function binding - Controlled access to external systems
  • Per-tenant isolation - Credentials bound at session level

Authentication & Authorization

  • API tokens - Secure authentication for all requests
  • Per-task permissions - Capabilities granted per task/agent
  • Organization isolation - Complete separation of tenants

Data Privacy

  • Self-hosted option - Full control over your data
  • Encryption by default - All sensitive data encrypted at rest
  • Audit logs - Complete trail of all operations

Technology Stack

Core

  • Language - Rust
  • Runtime - WASM/QuickJS for JavaScript execution
  • Database - PostgreSQL
  • API - Axum web framework

LLM Integration

  • Anthropic - Claude models
  • OpenAI - GPT models
  • OpenAI Compatible - Custom model support

Infrastructure

  • Docker - Containerization
  • PostgreSQL - Data persistence
  • Redis (optional) - Caching layer

Agent Execution Flow

1. Agent receives goal
   │
2. LLM generates plan
   │
3. LLM generates JavaScript code
   │
4. Code executes in WASM sandbox
   │
5. SDK calls route to host functions
   │
6. Host functions execute operations
   - web.search()
   - db.query()
   - http.api.get()
   - etc.
   │
7. Results returned to agent
   │
8. If more work needed → repeat from 3
   │
9. Otherwise → done() with final answer
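The flow above amounts to a loop over three tool calls (`plan`, `execute_code`, `done`). The sketch below uses a scripted stand-in for the LLM and a stub executor in place of the WASM sandbox; `runAgent`, the action object shape, and the `maxTurns` guard are assumptions made for illustration.

```javascript
// Scripted stand-in for the LLM: returns one tool call per request.
function makeScriptedLlm(steps) {
  let i = 0;
  return () => steps[i++];
}

// Stub executor; in runpiper this is where the sandbox would run the code.
function executeCode(code, state) {
  state.executed.push(code);
  return { ok: true, output: `ran ${code.length} chars` };
}

function runAgent(goal, llm, maxTurns = 10) {
  const state = { goal, plan: null, executed: [], history: [] };
  for (let turn = 0; turn < maxTurns; turn += 1) {
    const action = llm(state);
    if (!action) throw new Error("LLM returned no action");
    switch (action.tool) {
      case "plan": // step 2: create or update the plan
        state.plan = action.plan;
        break;
      case "execute_code": // steps 3-7: run code, keep the result for the next turn
        state.history.push(executeCode(action.code, state));
        break;
      case "done": // step 9: finish with the final answer
        return { answer: action.answer, state };
      default:
        throw new Error(`unknown tool: ${action.tool}`);
    }
  }
  throw new Error(`no answer after ${maxTurns} turns`);
}

const llm = makeScriptedLlm([
  { tool: "plan", plan: ["search", "summarize"] },
  { tool: "execute_code", code: 'const results = web.search("wasm");' },
  { tool: "execute_code", code: "const summary = tasks.summarize(results);" },
  { tool: "done", answer: "Summary ready" },
]);
const { answer, state } = runAgent("Research WASM", llm);
console.log(answer, state.executed.length);
```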

System Prompt Generation

runpiper automatically generates a system prompt for agents based on:

  1. Available tools - plan, execute_code, done
  2. SDK documentation - Generated from capabilities
  3. Approach guidelines - Best practices for task completion

Example system prompt:

You are an autonomous agent. You accomplish goals by writing JavaScript code.

## Your Tools

1. **plan** - Create or update your task plan
2. **execute_code** - Run JavaScript with SDK access
3. **done** - Complete the task with final answer

## Available SDK

```javascript
web.search(query)              // Returns: [{ title, url, snippet }]
web.fetch(url)                 // Returns: { status, body }
db.query(sql, params?)         // Returns: Row[]
memory.search(query)           // Returns: [{ content, score }]
memory.add(content, metadata?) // Stores for future recall
tasks.summarize(input)         // Returns: task output
```

## Approach

1. Start with `plan` to break down the goal
2. Use `execute_code` to accomplish each step
3. Call `done` with the final answer

Variables persist across execute_code calls. Do not narrate. Use tools.
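The three inputs listed for system prompt generation could be assembled roughly as follows. The `buildSystemPrompt` function and the lookup tables are hypothetical; the strings are taken from the example prompt, and only a subset of SDK entries is included to keep the sketch short.

```javascript
const TOOLS = [
  ["plan", "Create or update your task plan"],
  ["execute_code", "Run JavaScript with SDK access"],
  ["done", "Complete the task with final answer"],
];

// SDK documentation lines keyed by capability name.
const SDK_DOCS = {
  web: [
    "web.search(query)  // Returns: [{ title, url, snippet }]",
    "web.fetch(url)  // Returns: { status, body }",
  ],
  memory: [
    "memory.search(query)  // Returns: [{ content, score }]",
    "memory.add(content, metadata?)  // Stores for future recall",
  ],
  database: ["db.query(sql, params?)  // Returns: Row[]"],
};

const GUIDELINES = [
  "Start with `plan` to break down the goal",
  "Use `execute_code` to accomplish each step",
  "Call `done` with the final answer",
];

function buildSystemPrompt(capabilities) {
  const fence = "`".repeat(3); // built at runtime to avoid nesting fences
  const numbered = (lines) => lines.map((line, i) => `${i + 1}. ${line}`);
  const sdkLines = Object.keys(SDK_DOCS)
    .filter((cap) => capabilities[cap]) // only document enabled capabilities
    .flatMap((cap) => SDK_DOCS[cap]);
  return [
    "You are an autonomous agent. You accomplish goals by writing JavaScript code.",
    "",
    "## Your Tools",
    "",
    ...numbered(TOOLS.map(([name, desc]) => `**${name}** - ${desc}`)),
    "",
    "## Available SDK",
    "",
    `${fence}javascript`,
    ...sdkLines,
    fence,
    "",
    "## Approach",
    "",
    ...numbered(GUIDELINES),
  ].join("\n");
}

const prompt = buildSystemPrompt({ web: true, memory: false, database: "my-database" });
console.log(prompt);
```

Because the SDK section is filtered by the same capabilities that define the SDK, the prompt never advertises a function the agent cannot call.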

Monitoring & Observability

Metrics Collected

  • Request latency
  • LLM costs
  • Token usage
  • Error rates
  • Concurrency levels

Logging

  • Structured JSON logging
  • Request/response correlation
  • Error stack traces
  • Performance metrics

Debugging

  • Full execution traces for agents
  • SDK call logs
  • REPL state inspection
  • Session replay

Deployment Patterns

Cloud-Managed

  • Zero operational overhead
  • Automatic scaling
  • Managed updates
  • Included monitoring

Self-Hosted

  • Full data control
  • Custom configurations
  • On-premises deployment
  • Enterprise security
