Madrid · AI systems · public notes

Petru Arakiss.

I build AI systems that need evidence, permissions, traces, and boring operational checks. I write about the parts that fail after the demo, and about the cost of putting work in public.

Writing
Production AI, agent harnesses, indie hacking, trust
Public work
gommage, nahuali, traceframe, eldr, vestig
Current focus
Retrieval, runtimes, evals, traces, operator screens
Base
Madrid · remote across EU

01 / start

Writing and public repositories.

The useful entry points are the writing and the public repos. The CV exists for formal processes, after the work has made the conversation worth having.

Since 2006 · Professional software work across product, backend, platform, and AI systems.

ML since 2015.

Now · Retrieval, agent runtimes, guardrails, evals, traces, and operator screens.

Private production work gives me constraints: regulated finance, document workflows, retrieval quality, permissions, cost, latency, and review paths. Those constraints shape what I write and what I build in public.

Public work is where the claims can be checked: repos for agent policy, memory, traces, logging, local tooling, and the surrounding harnesses that make AI output inspectable.

Read this way
Writing

Notes on production AI, agent harnesses, attention, indie hacking, and the cost of publishing work that can be judged.

Nahuali

This year's quiet building-in-public attempt: a memory and evidence system for long-running AI work.

Production constraints

Private work supplies the boring parts: permissions, latency, cost, retrieval quality, review paths, and operator screens.

02 / constraints

Production constraints.

The implementation details are private. The shape is public enough to explain the problems: document ingestion and retrieval, workflow runtime behavior, and an internal assistant with guardrails and citations.

01

BIFROST

document intelligence

Document intelligence for the documents finance actually runs on: ingestion quality gates, semantic and visual chunking, pgvector/HNSW search, caching, source-quality scoring, analytics, and explicit no-answer paths when evidence is weak.

02

ORVIAN

workflow runtime

Multi-tenant AI workflow runtime: context assembly, durable memory, deterministic and cached execution tiers, queue processing, idempotency, run events, and human-review metadata when automation should stop.

03

Polaris

internal assistant

Internal assistant that combines BIFROST retrieval with guardrails, citations, streaming UX, suggestion revalidation, and operator analytics. One surface for support, sales, and product teams.

03 / open source

Public engineering work.

Public work I can point to directly: agent policy, traces, governed memory, structured observability, local systems tooling, and developer environments.

04 / what i work on

The parts I usually own.

Most of the value sits between a source document and the person using the result. That path includes indexing, permissions, runtime state, evals, guardrails, latency, and the operator workflow for failure handling.

Retrieval

Chunking, metadata, permissions, source quality, citations, and caching, with explicit refusal when retrieved sources are weak, contradictory, missing, or out of scope.

Agent runtimes

Tool boundaries, stopping conditions, traces, handoffs, evaluators, queues, and cost control, defined before an agent reaches production.

Evaluation

Eval sets, abstain logic, and regression traces. The discipline of checking whether a change helped before treating it as an improvement.

Compliance-aware operations

Permissions, audit trails, PII handling, and human-in-the-loop review for AI running inside regulated finance, where a fluent wrong answer is still a defect.

Harness engineering

Repository context, executable plans, browser checks, CI gates, and review logs around coding agents, so their output stays verifiable inside a real codebase.

Product workflows

Next.js, TypeScript, streaming UX, and operator screens for model failures that show the retrieved evidence, failure state, audit trail, escalation path, and next valid action.


05 / how i work

How I work.

I work async and stay accountable for retrieval quality, runtime behavior, latency, cost, permissions, and operator workflows. If there is a serious production AI problem behind the conversation, email is the shortest path.

Production stack

  • Python · FastAPI · TypeScript · Next.js
  • PostgreSQL · pgvector · Redis · Supabase
  • OpenAI · Anthropic · Vercel AI SDK
  • Evals · traces · guardrails · observability

Good fit when

  • Teams shipping AI inside regulated or operationally heavy environments
  • Roles that need retrieval, runtime, evals, and product judgment in one architect
  • Products where an AI feature is already close to users and now needs reliability, cost control, and permissions

06 / questions

Common questions.

Quick facts on writing, public work, current constraints, and when a direct email makes sense.


07

Start with a concrete problem.

The best conversations start with a broken retrieval path, an agent that does too much, an eval that catches too little, or a product surface where users cannot tell why the model answered.

Email works best with a specific problem attached