petruarakiss

Open to Staff · Principal · Architect roles · Madrid · Remote EU

I build AI systems that hold up in production.

I'm Petru Arakiss, an AI systems engineer in Madrid. Twenty years in software, in ML since 2015, and full-time on production AI these last few years. I design and ship retrieval, agent runtimes, guardrails, and evals: the parts that decide whether a system holds up after launch, not just in a demo.

Currently: AI Engineering Lead · Atlax360
Experience: 20 yrs software · ML since 2015
Focus: Production LLM · RAG · agents · evals · guardrails
Based: Madrid · Remote across EU

01 / background

Making AI reliable is mostly ordinary engineering.

I've built software for twenty years and worked in machine learning since 2015, well before the current wave. Before AI, I shipped for banks and retailers, where a bug cost money and a missed deadline stayed missed. That work taught me the layer around the model is what matters: retrieval quality, runtimes that behave the same way twice, guardrails, and interfaces an operator can still trust when a prediction comes out wrong.

20 years

Full-stack architecture and platform work behind financial and B2B products, before moving into production AI.

At Atlax360 I lead AI engineering across three systems: BIFROST (document intelligence), ORVIAN (a multi-tenant workflow runtime), and Polaris (an internal knowledge assistant). They run inside live financial operations; I set the retrieval, evaluation, and guardrail standards the three share, and keeping them reliable under load is the job.

I use coding agents and language models heavily in my own work, and I know where they break. The bar stays the same: safe defaults, traces you can inspect, and software that keeps running when the input is malformed, the permissions are tight, or the API is slow.

Worked with
Banks & retailers

Led engineering teams and full rewrites inside regulated financial environments and high-traffic retail.

12–20 engineer teams

Ran cross-functional platform teams through complex shipping schedules and legacy transitions.

IBM & Linux Foundation

IBM Machine Learning Professional and Linux Foundation Node.js developer certifications.

Clients: BBVA · Santander · Bankinter · Decathlon · El Corte Inglés

02 / current work

Three systems I lead at Atlax360.

Most of what I build sits behind enterprise firewalls. These three show the kind of architecture I work on: ingestion pipelines, a multi-tenant runtime, and a guardrailed assistant.

01

BIFROST

document intelligence

Document intelligence for the documents finance actually runs on: ingestion quality gates, semantic and visual chunking, pgvector/HNSW search, caching, source-quality scoring, analytics, and explicit no-answer paths when evidence is weak.

02

ORVIAN

workflow runtime

Multi-tenant AI workflow runtime: context assembly, durable memory, deterministic and cached execution tiers, queue processing, idempotency, run events, and human-review metadata when automation should stop.
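
To illustrate the idempotency idea in general terms, here is a minimal sketch. It is not ORVIAN's code: the class name, the key format, and the in-memory dictionary (standing in for a durable store) are all assumptions made for the example.

```python
from typing import Any, Callable, Dict, List


class IdempotentRunner:
    """Runs a step at most once per idempotency key.

    Hypothetical helper: an in-memory dict stands in for durable storage.
    """

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}

    def run(self, key: str, step: Callable[[], Any]) -> Any:
        # A repeated key returns the stored result instead of re-executing.
        if key in self._results:
            return self._results[key]
        result = step()
        self._results[key] = result
        return result


calls: List[str] = []


def charge_invoice() -> str:
    # Side effect we must not repeat when a queue redelivers the job.
    calls.append("charged")
    return "receipt-1"


runner = IdempotentRunner()
first = runner.run("tenant-a:invoice-42:charge", charge_invoice)
second = runner.run("tenant-a:invoice-42:charge", charge_invoice)  # retried delivery
```

The second call returns the stored receipt without running the step again, which is the property a queue with at-least-once delivery needs.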

03

Polaris

internal assistant

Internal assistant that combines BIFROST retrieval with guardrails, citations, streaming UX, suggestion revalidation, and operator analytics. One surface for support, sales, and product teams.

03 / open source

Deterministic utilities, shared publicly.

The patterns behind my production work, extracted into Rust and TypeScript libraries: permission boundaries, trace evidence, governed memory, and structured observability.

gommage / Rust

Deterministic policy engine for AI coding agents: maps tool calls to capabilities, evaluates YAML rules, and signs every decision in a verifiable audit log, with hard-stops that policy can't bypass.

github.com/Arakiss →
nahuali / Rust

Self-inspecting, auditable memory for AI agents: surfaces the evidence, provenance, and health behind each recall so callers can see which memory to trust, with an optional Ed25519-signed tamper-evident ledger. Local-first.

github.com/Arakiss →
traceframe / Rust

Local-first trace recorder for AI agent runs: append-only, verifiable evidence of what the agent called, what it was allowed, and what failed, with hook ingestion for Codex/OMX harnesses.

github.com/Arakiss →
vestig / TypeScript

Runtime-agnostic structured logging with automatic PII sanitization (GDPR/HIPAA/PCI-DSS) and native W3C tracing. Zero dependencies; runs on Node, Bun, Deno, Edge, and the browser.

github.com/Arakiss →
greco / Rust

Research project exploring whether a coding-agent harness can measurably improve itself through typed, layered modifications validated against operator-defined evals within strict budgets.

github.com/Arakiss →

04 / what i work on

From index to interface.

I design and ship the whole path for AI that has to work past the demo: retrieval index, runtime, evals, guardrails, and the operator screen. The glue between an API and a UI is the small part.

Retrieval

Chunking, metadata, permissions, source quality, citations, and caching, plus the unglamorous work of returning “I don't know” when the index is wrong or the document is ambiguous.
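
A minimal sketch of the "I don't know" path, under stated assumptions: similarity scores are taken to lie in [0, 1], and the threshold value, the `Hit` and `Answer` types, and the function name are hypothetical. A real system would tune the threshold against an eval set.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    source: str
    text: str
    score: float  # assumed similarity in [0, 1], higher is better


@dataclass
class Answer:
    text: Optional[str]
    citations: List[str]
    abstained: bool


def answer_or_abstain(hits: List[Hit], min_score: float = 0.75) -> Answer:
    """Return the best-supported passage, or abstain when evidence is weak."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    # No hits, or the best hit is below the bar: say "I don't know".
    if not ranked or ranked[0].score < min_score:
        return Answer(text=None, citations=[], abstained=True)
    best = ranked[0]
    return Answer(text=best.text, citations=[best.source], abstained=False)


weak = answer_or_abstain([Hit("policy.pdf#3", "Fees may apply.", 0.41)])
strong = answer_or_abstain([
    Hit("policy.pdf#3", "Fees may apply.", 0.41),
    Hit("fees.pdf#1", "The transfer fee is 2 EUR.", 0.91),
])
```

The abstain result carries no text and no citations, so the caller cannot accidentally render a low-confidence passage as an answer.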

Agent runtimes

Tool boundaries, stopping conditions, traces, handoffs, evaluators, queues, and cost control. The brakes matter more than the autonomy.
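
The "brakes" can be sketched as three checks that run before every tool call. This is an illustration only: the tool registry, the cost figures, and the fixed `plan` list (standing in for decisions a model would make) are assumptions, not a real runtime.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry: anything not listed here is out of bounds.
ALLOWED_TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for {query}",
}


def run_agent(
    plan: List[Tuple[str, str]],
    max_steps: int = 3,
    budget_cents: int = 100,
    cost_per_call_cents: int = 40,
) -> List[Tuple[str, str]]:
    """Execute planned tool calls, stopping on the first violated limit."""
    trace: List[Tuple[str, str]] = []
    spent = 0
    for step, (tool, arg) in enumerate(plan):
        if step >= max_steps:  # stopping condition
            trace.append(("stop", "max_steps"))
            break
        if tool not in ALLOWED_TOOLS:  # tool boundary
            trace.append(("stop", f"tool_not_allowed:{tool}"))
            break
        if spent + cost_per_call_cents > budget_cents:  # cost control
            trace.append(("stop", "budget"))
            break
        spent += cost_per_call_cents
        trace.append((tool, ALLOWED_TOOLS[tool](arg)))
    return trace


over_budget = run_agent([("search", "a"), ("search", "b"), ("search", "c")])
blocked = run_agent([("search", "a"), ("delete_records", "all")])
```

Each stop reason lands in the trace, so an operator can see why a run ended rather than only that it did.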

Evaluation

Eval sets, abstain logic, and regression traces. The discipline of proving a change helped instead of assuming it did.
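
One way to make "proving a change helped" concrete is to compare a baseline and a candidate case by case rather than by aggregate score alone. The sketch below assumes exact-match grading and uses toy dictionary lookups as stand-ins for real systems; all names and cases are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# (question, expected answer); None means the system should abstain.
EvalCase = Tuple[str, Optional[str]]
System = Callable[[str], Optional[str]]


def score(system: System, cases: List[EvalCase]) -> float:
    """Fraction of cases answered (or abstained on) correctly."""
    correct = sum(1 for question, expected in cases if system(question) == expected)
    return correct / len(cases)


def regressions(baseline: System, candidate: System, cases: List[EvalCase]) -> List[str]:
    """Questions the baseline got right and the candidate now gets wrong."""
    return [
        question
        for question, expected in cases
        if baseline(question) == expected and candidate(question) != expected
    ]


cases: List[EvalCase] = [
    ("capital of France?", "Paris"),
    ("2 + 2?", "4"),
    ("account owner for unknown id?", None),  # correct behavior is to abstain
]

baseline: System = {"capital of France?": "Paris", "2 + 2?": "5"}.get
candidate: System = {
    "capital of France?": "Paris",
    "2 + 2?": "4",
    "account owner for unknown id?": "J. Doe",  # hallucinated instead of abstaining
}.get

same_score = score(baseline, cases) == score(candidate, cases)
broken = regressions(baseline, candidate, cases)
```

Here both systems score two out of three, yet the candidate has stopped abstaining on the unanswerable question, which only the per-case regression list reveals.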

Compliance-aware operations

Permissions, audit trails, PII handling, and human-in-the-loop review for AI running inside regulated finance, where “mostly right” isn't good enough.
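
As a rough illustration of PII handling combined with an audit trail, the sketch below redacts values before they are written to an audit entry. The two regular expressions are deliberately simple assumptions, not exhaustive or compliance-grade detection, and the entry fields are hypothetical.

```python
import re
from typing import Dict

# Illustrative patterns only: real PII detection needs far broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IBAN = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")


def redact(text: str) -> str:
    """Replace email addresses and IBAN-like strings with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return IBAN.sub("[IBAN]", text)


def audit_entry(actor: str, action: str, detail: str, needs_review: bool) -> Dict[str, object]:
    # Redaction happens before the entry exists, so raw PII never reaches the log.
    return {
        "actor": actor,
        "action": action,
        "detail": redact(detail),
        "needs_review": needs_review,  # flag for human-in-the-loop review
    }


entry = audit_entry(
    actor="agent-7",
    action="draft_reply",
    detail="Refund to ES9121000418450200051332 for ana@example.com",  # sample values
    needs_review=True,
)
```

Redacting at the point of construction, rather than when the log is read, keeps the unmasked values out of storage altogether.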

Harness engineering

Repository context, executable plans, browser checks, CI guardrails, and review loops around Codex and Claude, so their output stays legible and verifiable inside a real codebase.

Product surfaces

Next.js, TypeScript, streaming UX, and the operator screen someone uses when the model misfires. In my systems, that screen is where adoption is won or lost, long before the benchmark.

Read field notes

05 / how i work

How I work.

High autonomy, async by default, and I own the production outcome. I'm in Madrid and work remote-first across European time zones. I'm looking for Staff, Principal, Architect, and Forward Deployed roles where production AI is core to the product, with the reliability work that implies.

Production stack

  • Python · FastAPI · TypeScript · Next.js
  • PostgreSQL · pgvector · Redis · Supabase
  • OpenAI · Anthropic · Vercel AI SDK
  • Evals · traces · guardrails · observability

Good fit when

  • Teams shipping AI inside regulated or operationally heavy environments
  • Roles that need retrieval, runtime, evals, and product judgment in one architect
  • Organizations past the demo phase and into reliability, cost, and permissions

06 / questions

Common questions.

Quick facts on stack, roles, and fit. The CV has the full timeline.


07

Let's talk.

I'm looking for teams that treat language models as product infrastructure, where retrieval quality, cost, latency, and safety are engineering problems with owners.

Available for select conversations | Madrid / Remote across EU
© 2026 Petru Arakiss · Madrid
AI engineer · Staff / Principal · Remote EU