Architecture

Deep dive into how runpiper works under the hood.

Overview

runpiper is an enterprise-grade AI agent runtime built in Rust. It uses code execution as the primary primitive instead of traditional tool calls: the LLM emits a JavaScript code block that performs many operations in a single round trip, which lowers both cost and latency.

Core Innovation

Traditional agent frameworks make a separate LLM call around every tool invocation, so a multi-step task needs several round trips:

Traditional:
  LLM → tool → LLM → tool → LLM → tool
  (5-10 round trips)

runpiper:
  LLM → JavaScript code block
  (1 round trip)

Result: roughly 97% lower cost and 97% lower latency for mechanical operations (see Cost Comparison).
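
To make the contrast concrete, here is a minimal sketch of what a single generated code block might look like. The `web` object and the `pages` data are hypothetical in-memory stand-ins so the example runs on its own; they are not the runpiper SDK implementation.

```javascript
// Hypothetical in-memory stand-in for the sandbox SDK (assumption, not runpiper code).
const pages = {
  "https://a.example": { status: 200, body: "WASM cold starts are fast." },
  "https://b.example": { status: 200, body: "QuickJS is a small JS engine." },
};
const web = {
  search: (query) => Object.keys(pages).map((url) => ({ title: url, url, snippet: query })),
  fetch: (url) => pages[url],
};

// What the LLM would emit as ONE code block: the loop and the filtering are
// ordinary JavaScript, so no LLM round trip is needed between operations.
function agentCodeBlock() {
  const results = web.search("wasm performance");
  const bodies = results
    .map((r) => web.fetch(r.url))
    .filter((p) => p.status === 200)
    .map((p) => p.body);
  return { operations: 1 + results.length, bodies };
}

const outcome = agentCodeBlock();
console.log(outcome.operations, outcome.bodies.length);
```

One search and two fetches run inside a single block here. In the traditional pattern, each of those operations would be wrapped in its own LLM call.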

System Architecture

Request Flow

┌─────────────────────────────────────────────────────────────────┐
│                   Request Flow                                  │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  LLM generates JS code                                          │
│         │                                                       │
│         ▼                                                       │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │  WASM/QuickJS Instance (per session)                   │    │
│  │                                                         │    │
│  │  Agent code:                                            │    │
│  │    web.search(q)     → __host("web.search", q)         │    │
│  │                             │                           │    │
│  │                             ▼                           │    │
│  │                      Rust: brave_api.search(q)          │    │
│  │                                                         │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
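
The routing in the diagram can be sketched as a thin shim that forwards every SDK call through one `__host` entry point. In runpiper the host side is Rust; the `hostFunctions` table and `hostCalls` log below are hypothetical JavaScript stand-ins used only to make the call path visible.

```javascript
// Hypothetical host side. In runpiper this is Rust; here it is a plain
// JavaScript table so the routing can be run and inspected.
const hostFunctions = {
  "web.search": (q) => [{ title: "stub", url: "https://example.com", snippet: q }],
};
const hostCalls = [];

// Single entry point the sandbox exposes to agent code, as in the diagram.
function __host(name, ...args) {
  const fn = hostFunctions[name];
  if (!fn) throw new Error(`unknown host function: ${name}`);
  hostCalls.push(name);
  return fn(...args);
}

// Thin SDK shim: agent-facing methods only forward to __host.
const web = { search: (q) => __host("web.search", q) };

const hits = web.search("wasm performance");
console.log(hostCalls, hits.length);
```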

Components

WASM Sandbox

runpiper uses a sandboxed JavaScript runtime to execute agent code safely.

Properties:

Property	Value
Cold Start	~200µs
Per Turn	~20µs
Instance Size	~1-5MB
Isolation	Strong (WASM)

Architecture:

One universal WASM module compiled at startup (~1s, once)
One instance per active session (~200µs to create)
Host functions bound per-tenant with their credentials/permissions
Agent code executes against configured capabilities only
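
The compile-once, instantiate-per-session pattern can be illustrated with the standard WebAssembly JavaScript API. This is only an analogy: runpiper's host is Rust, and the hand-assembled module, `createSession`, and the `tenant.id` import below are assumptions made for the sketch.

```javascript
// Tiny hand-assembled WASM module: imports env.host () -> i32 and exports
// run(), which calls the import and returns its result.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic + version
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,             // type section: () -> i32
  0x02, 0x0c, 0x01, 0x03, 0x65, 0x6e, 0x76,             // import section: module "env"
  0x04, 0x68, 0x6f, 0x73, 0x74, 0x00, 0x00,             //   field "host", func, type 0
  0x03, 0x02, 0x01, 0x00,                               // function section: one func of type 0
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01, // export "run" = func index 1
  0x0a, 0x06, 0x01, 0x04, 0x00, 0x10, 0x00, 0x0b,       // code section: call 0; end
]);

// Compiled once and shared by every session.
const sharedModule = new WebAssembly.Module(bytes);

// One instance per session, with host imports bound to that session's tenant.
function createSession(tenant) {
  const imports = { env: { host: () => tenant.id } };
  return new WebAssembly.Instance(sharedModule, imports);
}

const sessionA = createSession({ id: 1 });
const sessionB = createSession({ id: 2 });
console.log(sessionA.exports.run(), sessionB.exports.run());
```

Both sessions share the same compiled module, yet each one reaches only the host function it was instantiated with.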

REPL Sessions

Variables persist across turns within a session using a persistent REPL:

// Turn 1
const results = web.search("wasm performance");

// Turn 2 - results still in scope
const page = web.fetch(results[0].url);

// Turn 3 - both still available
memory.add("Found: " + page.title);

Performance: ~20µs per turn when the session's existing context is reused.

Multi-Tenant Model

Shared:       WASM module, engine
Per-tenant:   Credentials, DB pools, rate limits
Per-session:  WASM instance, REPL state

Security is enforced at the host function binding layer: each session's SDK calls are bound to its tenant's credentials, pools, and limits, so agent code can only reach that tenant's resources.
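
A minimal sketch of binding at the host function layer, assuming a closure per session. The `tenants` records, the `operations` table, and `bindHost` are hypothetical names chosen for illustration; the real binding happens in Rust.

```javascript
// Hypothetical tenant records; field names are illustrative only.
const tenants = {
  acme: { apiKey: "acme-secret", capabilities: new Set(["web.search"]) },
  globex: { apiKey: "globex-secret", capabilities: new Set(["web.search", "db.query"]) },
};

// Stand-ins for host-side operations. Each receives the tenant's credentials
// from the binding, never from agent code.
const operations = {
  "web.search": (tenant, q) => ({ usedKey: tenant.apiKey, query: q }),
  "db.query": (tenant, sql) => ({ usedKey: tenant.apiKey, sql }),
};

// Returns the __host function for one session, closed over a single tenant.
function bindHost(tenantId) {
  const tenant = tenants[tenantId];
  return function __host(name, ...args) {
    if (!tenant.capabilities.has(name)) {
      throw new Error(`capability not granted: ${name}`);
    }
    return operations[name](tenant, ...args);
  };
}

const acmeHost = bindHost("acme");
const globexHost = bindHost("globex");
console.log(acmeHost("web.search", "wasm").usedKey);
```

Because the tenant is captured when the session is created, agent code has no parameter through which it could name another tenant.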

Memory Model

┌─────────────────────────────────────────────────────────────────┐
│                        Memory Layers                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Session State (REPL)     │  Memory (cross-session)            │
│  ─────────────────────    │  ──────────────────────            │
│  - Variables in scope     │  - Past conversations              │
│  - Defined functions      │  - Learned facts                   │
│  - Current turn context   │  - User preferences                │
│                           │                                     │
│  Dies when session ends   │  Persists forever                  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

The agent stores memories explicitly via memory.add().
The system captures conversation history automatically (configurable).
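
The difference between the two layers can be sketched as follows. The in-memory `memoryStore`, the substring matching, and the constant `score` are assumptions for the example; the source does not describe how memory search is implemented.

```javascript
// Cross-session memory: a hypothetical in-memory store standing in for
// whatever persistent backend a deployment uses.
const memoryStore = [];
const memory = {
  add: (content, metadata = {}) => {
    memoryStore.push({ content, metadata });
  },
  // Naive substring match; the scoring is an assumption for this sketch.
  search: (query) =>
    memoryStore
      .filter((m) => m.content.toLowerCase().includes(query.toLowerCase()))
      .map((m) => ({ content: m.content, score: 1 })),
};

// Session state: lives only as long as the session object does.
function createSession() {
  return { variables: new Map() };
}

let session = createSession();
session.variables.set("page", { title: "WASM performance notes" });
memory.add("Found: " + session.variables.get("page").title);

// Ending the session drops its variables; the stored memory remains.
session = createSession();
const recalled = memory.search("wasm");
console.log(session.variables.size, recalled.length);
```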

SDK Generation

The agent’s SDK is generated dynamically from its capabilities configuration:

[agent.capabilities]
web = true
memory = true
database = "my-database"
api = { base_url = "https://api.acme.com", auth = "{{env.ACME_KEY}}" }
tasks = ["summarize"]

With this configuration, the agent sees:

// Built-in
web.search(query);
web.fetch(url);

memory.search(query);
memory.add(content, metadata);

// Database
db.query(sql, params);
db.execute(sql, params);

// HTTP
http.api.get(path);
http.api.post(path, body);

// Tasks
tasks.summarize(input);
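
One way to picture the generation step is a function that walks the capabilities and emits only the namespaces that are enabled. This is a sketch under assumptions: `buildSdk`, the `host` callback, and the dotted host function names are hypothetical, and the capabilities object is written as plain JavaScript instead of being parsed from TOML.

```javascript
// Capabilities as they might look after parsing the configuration.
const capabilities = {
  web: true,
  memory: true,
  database: "my-database",
  api: { base_url: "https://api.acme.com" },
  tasks: ["summarize"],
};

// Builds the agent-facing SDK; every method forwards to a host(name, ...args)
// callback, which is a stand-in for the real host function boundary.
function buildSdk(caps, host) {
  const sdk = {};
  const forward = (name) => (...args) => host(name, ...args);
  if (caps.web) sdk.web = { search: forward("web.search"), fetch: forward("web.fetch") };
  if (caps.memory) sdk.memory = { search: forward("memory.search"), add: forward("memory.add") };
  if (caps.database) sdk.db = { query: forward("db.query"), execute: forward("db.execute") };
  if (caps.api) sdk.http = { api: { get: forward("http.api.get"), post: forward("http.api.post") } };
  if (caps.tasks) {
    sdk.tasks = Object.fromEntries(caps.tasks.map((t) => [t, forward(`tasks.${t}`)]));
  }
  return sdk;
}

const calls = [];
const sdk = buildSdk(capabilities, (name, ...args) => {
  calls.push([name, args]);
  return null;
});
sdk.http.api.get("/users");
sdk.tasks.summarize("long text");
console.log(Object.keys(sdk), calls.map((c) => c[0]));
```

A capability that is absent from the configuration produces no namespace at all, so agent code cannot call it by accident.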

Performance

Scaling Characteristics

1000 concurrent sessions:
  Memory: ~1-5 GB
  CPU idle: ~0 (idle sessions only hold memory)
  CPU active: ~20µs per turn
  Bottleneck: External I/O (LLM, DB, HTTP)

Cost Comparison

Scenario	LLM Calls	Cost	Latency
Traditional (50 operations)	50+	~$0.16	~42s
runpiper (50 operations)	1	~$0.005	~1s
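
The headline percentages follow directly from the table. As a quick check, using only the figures above (`reductionPercent` is a helper name introduced for this example):

```javascript
// Figures copied from the table above.
const traditional = { cost: 0.16, latencySeconds: 42 };
const runpiper = { cost: 0.005, latencySeconds: 1 };

// Percentage reduction going from `before` to `after`, rounded to one decimal.
function reductionPercent(before, after) {
  return Number((((before - after) / before) * 100).toFixed(1));
}

const costReduction = reductionPercent(traditional.cost, runpiper.cost);
const latencyReduction = reductionPercent(traditional.latencySeconds, runpiper.latencySeconds);
console.log(costReduction, latencyReduction);
```

This gives about 96.9% for cost and 97.6% for latency, which is where the rounded 97% figure comes from.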

Security Model

Encryption at Rest

runpiper encrypts all sensitive data by default:

Input/output encryption - Task inputs, outputs, and errors encrypted before storage
Secrets protection - Organization secrets and webhook tokens never stored as plaintext
Deployment security - Deployment definitions and callback payloads encrypted
Envelope encryption - Per-object encryption keys using AES-256-GCM
Per-tenant keys - Unique encryption keys derived per organization
Master key isolation - Master key encrypted with passphrase, stored outside database

Key Management

Passphrase protection - Server requires master passphrase to start
Separate storage - Master key stored via disk or S3, never in database
Key zeroization - Keys cleared from memory after use
In-memory derivation - Tenant keys derived per request, never persisted
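
The envelope scheme described above can be sketched with Node's built-in `crypto` module. The source names AES-256-GCM but not the key derivation function, so the use of HKDF-SHA256 here is an assumption, as are the function names and the `tenant:` info string.

```javascript
const crypto = require("node:crypto");

// Assumption: tenant keys are derived with HKDF-SHA256. The source does not name the KDF.
function deriveTenantKey(masterKey, orgId) {
  return Buffer.from(crypto.hkdfSync("sha256", masterKey, Buffer.alloc(0), `tenant:${orgId}`, 32));
}

function aesGcmEncrypt(key, plaintext) {
  const iv = crypto.randomBytes(12); // fresh 96-bit nonce per encryption
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

function aesGcmDecrypt(key, { iv, ciphertext, tag }) {
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}

// Envelope encryption: a random per-object data key encrypts the payload,
// and the tenant key encrypts (wraps) the data key.
function sealObject(tenantKey, plaintext) {
  const dataKey = crypto.randomBytes(32);
  const sealed = {
    payload: aesGcmEncrypt(dataKey, plaintext),
    wrappedKey: aesGcmEncrypt(tenantKey, dataKey),
  };
  dataKey.fill(0); // best-effort zeroization of the data key
  return sealed;
}

function openObject(tenantKey, sealed) {
  const dataKey = aesGcmDecrypt(tenantKey, sealed.wrappedKey);
  try {
    return aesGcmDecrypt(dataKey, sealed.payload);
  } finally {
    dataKey.fill(0);
  }
}

const masterKey = crypto.randomBytes(32); // stands in for the passphrase-protected master key
const keyA = deriveTenantKey(masterKey, "org-a");
const sealed = sealObject(keyA, Buffer.from("task input"));
console.log(openObject(keyA, sealed).toString());
```

Only `sealed` would be written to the database. Without the master key, neither the tenant key nor the data key can be recovered from it.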

Why Encryption Matters

Many agent platforms store sensitive data in plaintext or protect it only with weak encryption. This means:

Database breaches expose all customer data, secrets, and inputs
Agents with vulnerabilities can access stored plaintext secrets
No protection against insider threats or compromised infrastructure

runpiper’s approach ensures:

Database contents alone are not enough to decrypt customer data
Agents cannot access the master passphrase or the encrypted master key
Even a complete database exposure is useless to an attacker without the master key
Tenant isolation prevents cross-tenant data leakage

Sandboxing

WASM isolation - Strong code execution sandbox
Host function binding - Controlled access to external systems
Per-tenant isolation - Credentials bound at session level

Authentication & Authorization

API tokens - Secure authentication for all requests
Per-task permissions - Capabilities granted per task/agent
Organization isolation - Complete separation of tenants

Data Privacy

Self-hosted option - Full control over your data
Encryption by default - All sensitive data encrypted at rest
Audit logs - Complete trail of all operations

Technology Stack

Core

Language - Rust
Runtime - WASM/QuickJS for JavaScript execution
Database - PostgreSQL
API - Axum web framework

LLM Integration

Anthropic - Claude models
OpenAI - GPT models
OpenAI Compatible - Custom model support

Infrastructure

Docker - Containerization
PostgreSQL - Data persistence
Redis (optional) - Caching layer

Agent Execution Flow

1. Agent receives goal
   │
2. LLM generates plan
   │
3. LLM generates JavaScript code
   │
4. Code executes in WASM sandbox
   │
5. SDK calls route to host functions
   │
6. Host functions execute operations
   - web.search()
   - db.query()
   - http.api.get()
   - etc.
   │
7. Results returned to agent
   │
8. If more work needed → repeat from 3
   │
9. Otherwise → done() with final answer
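
The flow above can be sketched as a simple loop. Everything here is a stand-in: the scripted model replaces the LLM, `new Function` replaces the WASM sandbox (it provides no isolation), and `maxTurns` is a safety limit added for the example that the source does not mention.

```javascript
// Scripted stand-in for the LLM: returns one tool call per invocation.
function makeScriptedModel(steps) {
  let i = 0;
  return () => steps[i++];
}

// Runs the loop from the flow above. `runCode` stands in for the WASM sandbox.
function runAgent(goal, nextAction, runCode, maxTurns = 10) {
  const trace = [];
  let lastResult = null;
  for (let turn = 0; turn < maxTurns; turn++) {
    const action = nextAction(goal, lastResult);
    trace.push(action.tool);
    if (action.tool === "plan") continue; // step 2
    if (action.tool === "execute_code") { // steps 3-7
      lastResult = runCode(action.code);
      continue; // step 8: repeat
    }
    if (action.tool === "done") return { answer: action.answer, trace }; // step 9
    throw new Error(`unknown tool: ${action.tool}`);
  }
  throw new Error("turn limit reached");
}

// Hypothetical SDK stub and executor. NOT a sandbox, for illustration only.
const web = { search: (q) => [{ title: "t", url: "https://example.com", snippet: q }] };
const runCode = (code) => new Function("web", code)(web);

const model = makeScriptedModel([
  { tool: "plan", steps: ["search", "answer"] },
  { tool: "execute_code", code: 'return web.search("wasm").length;' },
  { tool: "done", answer: "found 1 result" },
]);
const result = runAgent("research wasm", model, runCode);
console.log(result.trace, result.answer);
```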

System Prompt Generation

runpiper automatically generates a system prompt for agents based on:

Available tools - plan, execute_code, done
SDK documentation - Generated from capabilities
Approach guidelines - Best practices for task completion

Example system prompt:

You are an autonomous agent. You accomplish goals by writing JavaScript code.

## Your Tools

1. **plan** - Create or update your task plan
2. **execute_code** - Run JavaScript with SDK access
3. **done** - Complete the task with final answer

## Available SDK

```javascript
web.search(query)              // Returns: [{ title, url, snippet }]
web.fetch(url)                 // Returns: { status, body }
db.query(sql, params?)         // Returns: Row[]
memory.search(query)           // Returns: [{ content, score }]
memory.add(content, metadata?) // Stores for future recall
tasks.summarize(input)         // Returns: task output
```

## Approach

1. Start with `plan` to break down the goal
2. Use `execute_code` to accomplish each step
3. Call `done` with the final answer

Variables persist across execute_code calls. Do not narrate. Use tools.
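
A prompt like the example above could be assembled from the capabilities configuration along these lines. This is a sketch, not runpiper's generator: `SDK_DOCS` and `buildSystemPrompt` are hypothetical, and only part of the prompt is produced.

```javascript
// Hypothetical doc lines per capability, mirroring the SDK listing in the example prompt.
const SDK_DOCS = {
  web: [
    "web.search(query)              // Returns: [{ title, url, snippet }]",
    "web.fetch(url)                 // Returns: { status, body }",
  ],
  database: ["db.query(sql, params?)         // Returns: Row[]"],
  memory: [
    "memory.search(query)           // Returns: [{ content, score }]",
    "memory.add(content, metadata?) // Stores for future recall",
  ],
};

// Emits the intro line plus SDK documentation for enabled capabilities only.
function buildSystemPrompt(capabilities) {
  const sdkLines = Object.keys(SDK_DOCS)
    .filter((name) => capabilities[name])
    .flatMap((name) => SDK_DOCS[name]);
  for (const task of capabilities.tasks ?? []) {
    sdkLines.push(`tasks.${task}(input)`.padEnd(31) + "// Returns: task output");
  }
  return [
    "You are an autonomous agent. You accomplish goals by writing JavaScript code.",
    "",
    "## Available SDK",
    "",
    ...sdkLines,
  ].join("\n");
}

const prompt = buildSystemPrompt({ web: true, tasks: ["summarize"] });
console.log(prompt);
```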

Monitoring & Observability

Metrics Collected

Request latency
LLM costs
Token usage
Error rates
Concurrency levels

Logging

Structured JSON logging
Request/response correlation
Error stack traces
Performance metrics

Debugging

Full execution traces for agents
SDK call logs
REPL state inspection
Session replay

Deployment Patterns

Cloud-Managed

Zero operational overhead
Automatic scaling
Managed updates
Included monitoring

Self-Hosted

Full data control
Custom configurations
On-premises deployment
Enterprise security

Next Steps

Tasks: Learn about tasks
Agents: Build autonomous agents
Self-Hosted Deployment: Deploy your own instance