Deploy AI apps, agents, evals, and model-backed services
Runtime is the programmable control plane behind run.apothic.ai. Define AI workloads with apothic-client, deploy inference services and background agent jobs, coordinate workers with queues and dictionaries, and keep humans in the loop with typed watch streams. Schedules, volumes, secrets, and account-scoped billing are managed through the same control plane.
Current surface
- AI apps, inference services, and background agent jobs through one account-scoped control plane.
- Queues, dictionaries, schedules, volumes, secrets, and sandboxes for real AI coordination workflows.
- Typed watch streams for jobs, apps, deployments, queues, and dictionaries with snapshot support.
- Managed CPU and GPU capacity for everything from low-latency APIs to long-running eval and automation loops.
More than a model endpoint
Apothic Runtime is the layer that accepts deploy, invoke, spawn, and operator requests from the SDK and CLI. It keeps durable records for deployments, functions, jobs, schedules, workers, and coordination primitives so your AI workloads can be inspected, watched, paused, resumed, or cleaned up later.
The user-facing workflow starts in Python with apothic-client, but the runtime is what turns that code into a shared execution substrate for agent systems, model-backed services, and background automation with real operational visibility.
Runtime building blocks
These primitives are already reflected in the website docs, the Python SDK, and the control plane API surface exposed from run.apothic.ai.
AI workloads in a few lines of Python
Runtime is designed for AI applications that need more than a single synchronous model call. You can deploy agent backends, `liv`-generated eval jobs, retrieval services, and long-running pipelines with the same SDK and then observe them through job and deployment watch streams.
- Agent evaluation jobs with typed progress events and durable job handles.
- Deploy-time generated implementations with `@apothic.liv(...)` for functions and endpoints.
- Inference APIs and model-backed services via endpoint, ASGI, FastAPI, WSGI, or web_server shapes.
- Background extraction, routing, reranking, and enrichment tasks that need retries and visibility.
- Shared worker coordination for multi-agent pipelines through Queue, Dict, schedules, secrets, and volumes.
```python
import apothic

app = apothic.App("agent-evals")


# The body is generated by `liv` before deploy; the stub below is a placeholder.
@apothic.liv(
    self_debug=True,
    cache="account",
    examples=[(
        ("How do refunds work?", "Refunds post in 3-5 business days."),
        {},
    )],
)
@app.function(cpu=2, memory_mb=4096, timeout_s=600)
def grade_response(prompt: str, answer: str) -> dict[str, object]:
    """Return a JSON object with keys `score` and `reason`.
    Grade whether `answer` correctly addresses `prompt`.
    `score` must be a float between 0 and 1.
    """
    raise NotImplementedError


@app.local_entrypoint()
def main() -> None:
    deployment_id = app.deploy()
    remote = apothic.RemoteFunction.from_name("agent-evals", "grade_response")
    # spawn() starts a background job and returns a job handle.
    job = remote.spawn(
        "How do refunds work?",
        "Refunds post in 3-5 business days.",
    )
    print("deployment:", deployment_id)
    # Stream the job's typed events as they arrive.
    for event in job.watch(snapshot=True):
        print(event.event, event.data)
```
Everything an AI workload needs after the first deploy
Runtime is where packaging, execution, shared state, and operator visibility come together. The point is not just to run code remotely. It is to keep AI services, agents, and eval loops inspectable and scriptable end to end.
Deploy apps, not loose functions
Package inference endpoints, agent workers, schedules, class resources, and local entrypoints into a named App deployment.
Runtime keeps deployment records that can be listed, watched, stopped, and deleted later instead of treating every call as a stateless one-off.
Request/response and background jobs
Use `remote()` for immediate AI responses or `spawn()` when you want durable eval jobs, logs, status, retries, and async waiting.
Typed job events and snapshot-aware streams make long-running AI workflows visible to CLIs, dashboards, and operator UIs.
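The difference between the two call styles can be illustrated with a local analogy. The sketch below does not use apothic-client at all: it relies on Python's standard `concurrent.futures` to contrast a blocking call with one that returns a handle, and `grade` is a hypothetical stand-in for a deployed function.

```python
from concurrent.futures import ThreadPoolExecutor


def grade(prompt: str, answer: str) -> dict[str, object]:
    """Hypothetical stand-in for a deployed grading function."""
    if answer:
        return {"score": 1.0, "reason": "non-empty answer"}
    return {"score": 0.0, "reason": "empty answer"}


# Blocking style (analogous to `remote()`): the caller waits for the result.
result = grade("How do refunds work?", "Refunds post in 3-5 business days.")

# Handle style (analogous to `spawn()`): the caller receives a handle right
# away and collects the result later.
with ThreadPoolExecutor(max_workers=1) as pool:
    handle = pool.submit(grade, "How do refunds work?", "")
    spawned_result = handle.result(timeout=5)
```

A local future disappears with the process, which is the gap a durable job record is meant to close: status, logs, and retries stay available after the caller has gone away.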
Multiple service shapes
Expose model APIs, retrieval services, ASGI apps, FastAPI handlers, WSGI apps, or full local web servers through the same deploy path.
This lets teams bring existing AI application servers into the runtime without rewriting them around a single narrow serving model.
Shared coordination primitives
Queue and Dict provide blocking reads, claims, acknowledgements, leases, compare-and-set, locks, leader election, and watch streams.
You get worker coordination and live state sharing for multi-agent and multi-stage AI pipelines without building a bespoke control plane.
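To make the claim, acknowledgement, lease, and compare-and-set terms concrete, here is a minimal in-memory sketch. It is not the Runtime `Queue` or `Dict` API: the class names `LeaseQueue` and `CasDict`, their methods, and the explicit `now` parameter are all assumptions chosen for illustration, and items are assumed to be unique strings.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LeaseQueue:
    """Illustrative queue with claim / acknowledge / lease-expiry semantics."""
    lease_s: float = 30.0
    _pending: list = field(default_factory=list)
    _claimed: dict = field(default_factory=dict)  # item -> lease deadline

    def put(self, item: str) -> None:
        self._pending.append(item)

    def claim(self, now: Optional[float] = None) -> Optional[str]:
        now = time.monotonic() if now is None else now
        # Expired leases go back to pending so another worker can retry them.
        for item, deadline in list(self._claimed.items()):
            if deadline <= now:
                del self._claimed[item]
                self._pending.append(item)
        if not self._pending:
            return None
        item = self._pending.pop(0)
        self._claimed[item] = now + self.lease_s
        return item

    def ack(self, item: str) -> bool:
        # Acknowledging removes the item for good; False means no active claim.
        return self._claimed.pop(item, None) is not None


class CasDict:
    """Illustrative dictionary with compare-and-set."""

    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, key: str) -> object:
        return self._data.get(key)

    def compare_and_set(self, key: str, expected: object, new: object) -> bool:
        # Write only if the current value still matches what the caller saw.
        if self._data.get(key) != expected:
            return False
        self._data[key] = new
        return True


q = LeaseQueue(lease_s=10.0)
q.put("doc-1")
first = q.claim(now=0.0)     # worker A claims the item
nothing = q.claim(now=5.0)   # still leased, nothing to hand out
again = q.claim(now=11.0)    # lease expired, worker B receives it
acked = q.ack("doc-1")

state = CasDict()
won = state.compare_and_set("leader", None, "worker-a")
lost = state.compare_and_set("leader", None, "worker-b")
```

The last two lines show why compare-and-set is enough for a simple leader election: only the first writer that still sees the expected value succeeds.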
Persistent runtime resources
Attach named volumes, secrets, and cloud bucket mount specs directly to deploy manifests and operator flows.
The same stack also supports named sandboxes and schedule rows, so stateful AI workflows do not stop at plain function invocation.
Deploy-time generation with liv
Generated implementations are materialized before deploy and then shipped as frozen Python for the remote worker to execute.
That keeps the execution path deterministic while still giving developers an AI-assisted generation workflow inside ordinary app definitions.
Architecture for serious AI applications
The runtime split is intentional: the control plane handles API access, scheduling, autoscaling, and durable event streams while your AI workloads run behind a single consistent runtime surface. That keeps operator workflows, billing, and observability centralized even as workload shape changes.
1. run.apothic.ai
- Accepts deploy, invoke, spawn, and operator requests from apothic-client and the CLI for AI services, agent workers, and eval jobs.
- Associates runtime resources with the caller account for ownership, billing, and auth enforcement.
- Exposes the surface for apps, deployments, jobs, schedules, queues, dictionaries, volumes, and secrets.
2. scheduler
- Polls queued AI jobs, applies retry policy objects, and re-enqueues work with backoff when needed.
- Materializes cron and period schedules into ordinary runtime jobs through the existing queue path.
- Can coalesce compatible queued requests into one batched invocation and split the results back out.
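Two of the scheduler behaviors above, retry with backoff and request coalescing, can be sketched in plain Python. This is an illustration under stated assumptions rather than the scheduler's implementation: the function names, the exponential-with-cap policy, and the fixed `max_batch` size are all hypothetical.

```python
from typing import Callable, Sequence


def backoff_delays(max_attempts: int, base_s: float = 1.0,
                   factor: float = 2.0, cap_s: float = 60.0) -> list:
    """Delay before each retry: exponential growth, capped at cap_s."""
    return [min(cap_s, base_s * factor ** i) for i in range(max_attempts - 1)]


def run_batched(requests: Sequence[int],
                batch_fn: Callable[[list], list],
                max_batch: int) -> list:
    """Coalesce requests into batches, call batch_fn once per batch,
    and split the outputs back out in the original request order."""
    results: list = []
    for start in range(0, len(requests), max_batch):
        batch = list(requests[start:start + max_batch])
        outputs = batch_fn(batch)
        if len(outputs) != len(batch):
            raise ValueError("batch function must return one output per input")
        results.extend(outputs)
    return results


delays = backoff_delays(max_attempts=5, base_s=1.0, factor=2.0, cap_s=5.0)

batch_sizes: list = []


def double_all(xs: list) -> list:
    batch_sizes.append(len(xs))  # record how many requests shared one call
    return [2 * x for x in xs]


doubled = run_batched([1, 2, 3, 4, 5], double_all, max_batch=2)
```

Here five requests are served by three invocations, and each caller still gets its own result back in order.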
3. autoscaler
- Adds or removes compute capacity as demand changes across inference APIs, agent backends, and background processing.
- Keeps execution concerns behind the runtime boundary instead of exposing infrastructure details directly to callers.
- Supports cost-aware operation by reconciling capacity against real workload demand.
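Reconciling capacity against demand usually reduces to computing a target and comparing it with what is running. The sketch below shows one simple way to do that. The formula, the `jobs_per_worker` parameter, and the worker bounds are assumptions for illustration, not the autoscaler's actual policy.

```python
import math


def desired_workers(queued_jobs: int, in_flight: int, jobs_per_worker: int,
                    min_workers: int = 0, max_workers: int = 10) -> int:
    """Target worker count for the current demand, clamped to the bounds."""
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    demand = queued_jobs + in_flight
    needed = math.ceil(demand / jobs_per_worker)
    return max(min_workers, min(max_workers, needed))


def reconcile(current: int, target: int) -> int:
    """Positive result: add that many workers. Negative: remove them."""
    return target - current


target = desired_workers(queued_jobs=7, in_flight=1, jobs_per_worker=4)
delta = reconcile(current=3, target=target)
```

With eight jobs of demand and four jobs per worker, the target is two workers, so a pool of three would shrink by one.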
4. durable runtime state
- Stores deployments, functions, jobs, schedules, workers, scaling events, queue items, and dictionary entries.
- Keeps durable resource event rows for queue and dictionary watch semantics plus richer deployment and job inspection.
- Persists the metadata that operator flows need for inspection, pause/resume, cleanup, and billing.
Execution flow
1. A developer defines an AI app in Python with inference functions, services, schedules, or shared primitives.
2. apothic-client packages the manifest and sends it to run.apothic.ai.
3. Runtime stores deployment and function metadata, then resolves the correct execution path for model services, agents, or queued jobs.
4. The scheduler and autoscaling layer place the work onto managed compute capacity based on the workload profile.
5. Job, deployment, queue, and dictionary events stream back to clients so dashboards and AI operators can react in real time.
How teams ship AI workloads day to day
The flow is straightforward: package the AI workload, deploy it, watch what it does, then operate the resulting shared state and live services without switching to a different system for every step.
Define the AI workload
Compose model-backed functions, agent services, schedules, queues, dictionaries, or sandboxes inside a named App using apothic-client.
Deploy from code or CLI
Ship the app with `app.deploy()` or `apothic deploy ...`, then invoke low-latency inference calls or spawn durable background jobs for evals and automations.
Observe live runtime state
Use `job.watch()`, deployment watches, queue watches, and dictionary watches with snapshot support to follow live model and agent progress.
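One plausible reading of "snapshot support" is that a watcher joining late first receives the current state and then only the events that follow. The in-memory sketch below models that reading. `JobLog` and its methods are hypothetical, and the real stream semantics may differ; only the `event.event` and `event.data` attribute names mirror the SDK example.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Event:
    event: str
    data: dict


class JobLog:
    """Illustrative event log that can serve snapshot-first watch streams."""

    def __init__(self) -> None:
        self._events: list = []
        self._state: dict = {}

    def append(self, event: str, data: dict) -> None:
        self._events.append(Event(event, dict(data)))
        self._state.update(data)  # fold each event into the current state

    def watch(self, snapshot: bool = False) -> Iterator[Event]:
        index = 0
        if snapshot:
            # Skip history: emit the folded state, then only newer events.
            index = len(self._events)
            yield Event("snapshot", dict(self._state))
        while index < len(self._events):
            yield self._events[index]
            index += 1


log = JobLog()
log.append("started", {"status": "running"})
log.append("progress", {"done": 1})

stream = log.watch(snapshot=True)
first = next(stream)                       # current state in one event
log.append("completed", {"status": "succeeded"})
rest = list(stream)                        # only events after the snapshot
```

Without `snapshot=True` the same watcher replays the full history, which is the trade-off a dashboard avoids when it only needs the current state plus updates.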
Operate shared resources
Pause or resume schedules, inspect deployments, acknowledge claimed work, rotate secrets, and clean up older AI deployments through the same control plane.
Build AI products without building your own runtime layer
If your team needs deployable AI services, durable jobs, watchable state, and account-scoped billing in one surface, Runtime is the product layer that ties it together. Start with the SDK, then move up into schedules, shared coordination, and operator tooling as your agents, evals, and model services grow.
