What do you build?

Production AI systems: retrieval infrastructure (RAG, pgvector, document intelligence), agent and workflow runtimes, guardrails, eval loops, traces, and operator-facing product surfaces in Next.js and TypeScript. My current live systems are BIFROST, ORVIAN, and Polaris at Atlax360.

What roles are you looking for?

Staff and Principal roles where production AI is the core infrastructure: AI Architect, Forward Deployed / Solutions Engineer, LLM Platform Architect. The work is retrieval, runtimes, evals, and guardrails, not a one-off integration. I'm in Madrid and work remote-first across the EU.

How long have you worked in AI?

I've been in machine learning since 2015, well before the current LLM wave, on top of twenty years building production software. The last few years have been full-time on production LLM, RAG, and agentic systems inside regulated finance.

What's in your production stack?

Python and FastAPI for retrieval and ML services; TypeScript, Next.js, and Hono for platforms and runtimes; PostgreSQL with pgvector for semantic search; Redis and queues for execution tiers; OpenAI and Anthropic APIs with eval and trace tooling around them.

How is this different from prompt engineering?

The model is rarely the bottleneck in production. Context quality, permissions, orchestration boundaries, eval coverage, failure handling, and operator UX decide whether a system holds up once real users hit it.

petruarakiss

Open to Staff · Principal · Architect roles · Madrid · Remote EU

I build AI systems that hold up in production.

I'm Petru Arakiss, an AI systems engineer in Madrid. Twenty years in software, in ML since 2015, and full-time on production AI these last few years. I design and ship retrieval, agent runtimes, guardrails, and evals: the parts that decide whether a system holds up after launch, not just in a demo.

Download CV · LinkedIn · GitHub

Currently: AI Engineering Lead · Atlax360
Experience: 20 yrs software · ML since 2015
Focus: Production LLM · RAG · agents · evals · guardrails
Based: Madrid · Remote across EU

01 / background

Making AI reliable is mostly ordinary engineering.

I've built software for twenty years and worked in machine learning since 2015, well before the current wave. Before AI, I shipped for banks and retailers, where a bug cost money and a missed deadline stayed missed. That work taught me the layer around the model is what matters: retrieval quality, runtimes that behave the same way twice, guardrails, and interfaces an operator can still trust when a prediction comes out wrong.

20 years

Full-stack architecture and platform work behind financial and B2B products, before moving into production AI.

At Atlax360 I lead AI engineering across three systems: BIFROST (document intelligence), ORVIAN (a multi-tenant workflow runtime), and Polaris (an internal knowledge assistant). They run inside live financial operations; I set the retrieval, evaluation, and guardrail standards the three share, and keeping them reliable under load is the job.

I use coding agents and language models heavily in my own work, and I know where they break. The bar stays the same: safe defaults, traces you can inspect, and software that keeps running when the input is malformed, the permissions are tight, or the API is slow.

Full background · Project inventory

Worked with

Banks & retailers

Led engineering teams and full rewrites inside regulated financial environments and high-traffic retail.

12–20 engineer teams

Ran cross-functional platform teams through complex shipping schedules and legacy transitions.

IBM & Linux Foundation

IBM Machine Learning Professional and Linux Foundation Node.js developer certifications.

Clients: BBVA · Santander · Bankinter · Decathlon · El Corte Inglés

02 / current work

Three systems I lead at Atlax360.

Most of what I build sits behind enterprise firewalls. These three show the kind of architecture I work on: ingestion pipelines, a multi-tenant runtime, and a guardrailed assistant.

BIFROST

document intelligence

Document intelligence for the documents finance actually runs on: ingestion quality gates, semantic and visual chunking, pgvector/HNSW search, caching, source-quality scoring, analytics, and explicit no-answer paths when evidence is weak.
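
As an illustration only, an explicit no-answer path can be sketched in a few lines of Python. This is not BIFROST's code: the `Chunk` type, the `select_evidence` and `answer` names, the similarity scale, and the 0.75 / two-chunk thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str
    text: str
    score: float  # assumed scale: similarity in [0, 1], higher is better


def select_evidence(chunks, min_score=0.75, min_chunks=2):
    """Return chunks strong enough to cite, or None to signal 'no answer'."""
    strong = sorted(
        (c for c in chunks if c.score >= min_score),
        key=lambda c: c.score,
        reverse=True,
    )
    # Abstain rather than answer from thin evidence (thresholds are hypothetical).
    if len(strong) < min_chunks:
        return None
    return strong


def answer(question, chunks):
    """Route to a no-answer result before any model call is made."""
    evidence = select_evidence(chunks)
    if evidence is None:
        return {"status": "no_answer", "citations": []}
    return {"status": "answerable", "citations": [c.source for c in evidence]}
```

The point of the sketch is that abstaining is an ordinary return value, so callers and operator screens can handle it like any other state.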

ORVIAN

workflow runtime

Multi-tenant AI workflow runtime: context assembly, durable memory, deterministic and cached execution tiers, queue processing, idempotency, run events, and human-review metadata when automation should stop.
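
To make the idempotency idea concrete, here is a minimal Python sketch of deduplicating queue deliveries by key. It is an assumption-laden illustration, not ORVIAN's implementation: `IdempotentRunner`, the key format, and the in-memory dictionary are hypothetical, and a real runtime would keep results in a durable store.

```python
import hashlib
import json


class IdempotentRunner:
    """Runs each logical job at most once, keyed by tenant and payload."""

    def __init__(self):
        # In-memory stand-in for a durable result store (assumption).
        self._results = {}

    @staticmethod
    def key(tenant_id, payload):
        # Canonical JSON so equal payloads always hash to the same key.
        body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(f"{tenant_id}:{body}".encode("utf-8")).hexdigest()

    def run(self, tenant_id, payload, handler):
        """Return (result, replayed); handler runs only on first delivery."""
        k = self.key(tenant_id, payload)
        if k in self._results:
            return self._results[k], True
        result = handler(payload)
        self._results[k] = result
        return result, False
```

Including the tenant in the key keeps one tenant's retries from ever being served another tenant's cached result.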

Polaris

internal assistant

Internal assistant that combines BIFROST retrieval with guardrails, citations, streaming UX, suggestion revalidation, and operator analytics. One surface for support, sales, and product teams.

03 / open source

Deterministic utilities, shared publicly.

The patterns behind my production work, extracted into Rust and TypeScript libraries: permission boundaries, trace evidence, governed memory, and structured observability.

gommage / Rust

Deterministic policy engine for AI coding agents: maps tool calls to capabilities, evaluates YAML rules, and signs every decision in a verifiable audit log, with hard-stops that policy can't bypass.

github.com/Arakiss →

nahuali / Rust

Self-inspecting, auditable memory for AI agents: surfaces the evidence, provenance, and health behind each recall so callers can see which memory to trust, with an optional Ed25519-signed tamper-evident ledger. Local-first.

github.com/Arakiss →

traceframe / Rust

Local-first trace recorder for AI agent runs: append-only, verifiable evidence of what the agent called, what it was allowed, and what failed, with hook ingestion for Codex/OMX harnesses.
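
One common way to make an append-only log verifiable is a hash chain, sketched below in Python. This illustrates the general technique only and is not traceframe's format or API: the `append` and `verify` functions, the entry fields, and the event shape are assumptions for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that anchors the first entry


def _digest(prev_hash, event):
    # Canonical JSON keeps the hash stable across key ordering.
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log, event):
    """Append an event, chaining it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify(log):
    """Recompute the chain; an edited, reordered, or mid-chain-deleted entry fails.

    Truncating the tail is not detected without an external anchor,
    such as a separately stored or signed head hash.
    """
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```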

github.com/Arakiss →

vestig / TypeScript

Runtime-agnostic structured logging with automatic PII sanitization (GDPR/HIPAA/PCI-DSS) and native W3C tracing. Zero dependencies; runs on Node, Bun, Deno, Edge, and the browser.

github.com/Arakiss →

greco / Rust

Research harness exploring whether a coding-agent harness can measurably improve itself through typed, layered modifications validated against operator-defined evals within strict budgets.

github.com/Arakiss →

04 / what i work on

From index to interface.

I design and ship the whole path for AI that has to work past the demo: retrieval index, runtime, evals, guardrails, and the operator screen. The glue between an API and a UI is the small part.

Retrieval

Chunking, metadata, permissions, source quality, citations, and caching, plus the unglamorous work of returning “I don't know” when the index is wrong or the document is ambiguous.

Agent runtimes

Tool boundaries, stopping conditions, traces, handoffs, evaluators, queues, and cost control. The brakes matter more than the autonomy.

Evaluation

Eval sets, abstain logic, and regression traces. The discipline of proving a change helped instead of assuming it did.
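
A small Python sketch of what "proving a change helped" can look like: compare per-case results from a baseline run and a candidate run over the same eval set. The function names, the result shape (case id mapped to pass/fail), and the acceptance rule are assumptions chosen for the example, not a description of any specific tool.

```python
def pass_rate(results):
    """Fraction of eval cases that passed."""
    return sum(results.values()) / len(results)


def compare_runs(baseline, candidate):
    """Compare per-case pass/fail results for the same eval set."""
    if not baseline or baseline.keys() != candidate.keys():
        raise ValueError("runs must cover the same non-empty set of eval cases")
    regressions = sorted(c for c in baseline if baseline[c] and not candidate[c])
    fixes = sorted(c for c in baseline if not baseline[c] and candidate[c])
    return {
        "baseline_rate": pass_rate(baseline),
        "candidate_rate": pass_rate(candidate),
        "regressions": regressions,
        "fixes": fixes,
        # Hypothetical rule: accept only if nothing that used to pass now fails.
        "accept": not regressions and pass_rate(candidate) >= pass_rate(baseline),
    }
```

Listing regressions by case, rather than comparing two averages, is what catches a change that fixes one case while quietly breaking another.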

Compliance-aware operations

Permissions, audit trails, PII handling, and human-in-the-loop review for AI running inside regulated finance, where “mostly right” isn't good enough.

Harness engineering

Repository context, executable plans, browser checks, CI guardrails, and review loops around Codex and Claude, so their output stays legible and verifiable inside a real codebase.

Product surfaces

Next.js, TypeScript, streaming UX, and the operator screen someone uses when the model misfires. In my systems, that screen is where adoption is won or lost, long before the benchmark.

Read field notes

05 / how i work

How I work.

High autonomy, async by default, and I own the production outcome. I'm in Madrid and work remote-first across European time zones. I'm looking for Staff, Principal, Architect, and Forward Deployed roles where production AI is core to the product, with the reliability work that implies.

Production stack

Python · FastAPI · TypeScript · Next.js
PostgreSQL · pgvector · Redis · Supabase
OpenAI · Anthropic · Vercel AI SDK
Evals · traces · guardrails · observability

Good fit when

Teams shipping AI inside regulated or operationally heavy environments
Roles that need retrieval, runtime, evals, and product judgment in one architect
Organizations past the demo phase and into reliability, cost, and permissions

06 / questions

Common questions.

Quick facts on stack, roles, and fit. The CV has the full timeline.

Download CV

Let's talk.

I'm looking for teams that treat language models as product infrastructure, where retrieval quality, cost, latency, and safety are engineering problems with owners.

Available for select conversations

Madrid / Remote across EU

Get in touch · CV
