I build AI products with evaluation, fallback handling, and uncertainty-aware UX

How an AI product behaves matters as much as how accurate it is

  • The pipeline works in demos, but production exposes behaviour no one designed for

  • Users can't tell when to trust the output, or what to do when it's wrong

  • No one on the team owns AI behaviour end-to-end

  • Better accuracy alone won't fix any of it

Book Free Intro Call

Portrait of Alfred Persson, freelance AI engineer

About me

Hi, I'm Alfred. I'm a freelance AI engineer and I work with product teams shipping AI products: assistants, copilots, RAG over internal data, agentic workflows.

If your AI product behaves well in demos and unpredictably in production, that's where I come in. The pipeline might work, but the AI confuses users, fails silently, or breaks trust in ways no one designed for. Better accuracy doesn't fix that. Designing how the AI should behave does.

I studied interaction technology and design at Umeå University, a programme that combines software engineering with UX and usability. After that I moved into backend engineering, where I spent five years building production distributed systems that had to stay reliable under unpredictable conditions. The AI focus came in the last stretch: first building demos and RAG pipelines at ChromaWay, using the built-in LLM inference and vector database support of ChromaWay's platform, followed by Datalumina's six-week production AI program. AI products fail unpredictably by nature, and the engineering layer that decides how they should behave when they fail tends to fall through the cracks between ML, product, and design teams. That mix of usability and reliable systems engineering is what I bring to AI work now.

Why work with me?

Usability + Engineering Background · I studied interaction technology and design at Umeå University, a programme that combines software engineering with UX and usability. I think about both the pipeline and how the user reads its output, including what happens when the model is wrong. I diagnose and spec what the UI should do; your design team handles how it looks.

5 Years in Production Systems · I spent five years building production distributed systems that had to stay reliable under unpredictable conditions. Production-grade shipping habits transfer directly to AI products, which fail in unpredictable ways by nature.

End-to-end ownership of behaviour · I build the retrieval pipeline, LLM orchestration, and evaluation framework, and I spec how each response type should behave in the UI: confidence states, escalation flows, error handling. Most AI engineers stop at the pipeline.

Behaviour over accuracy · I assess AI products across five behaviour dimensions: Product-Journey Fit, UX & Trust, Output Quality, Measurement & Feedback, Ops & Ownership. Accuracy sits inside Output Quality, so it is one part of one dimension, not the whole picture.

How I work

AI Feature Audit · A two-week diagnostic of an existing AI product across the five behaviour dimensions. You get a maturity score per dimension, a prioritised roadmap of named fixes, and a walkthrough session. Fixed price, €3,000-5,000.

Build · Greenfield AI product, end-to-end. Retrieval, LLM orchestration, evaluation framework, fallback paths, and UI behaviour spec. 1-3 months, project-based or part-time embedded.

Embedded · Part-time AI engineer on your team, typically 2-3 days a week for 2-6 months. Owns AI behaviour, evaluation, and integration spec across multiple features.

Evaluation & Measurement · Most teams iterating on prompts have no way to tell if a change actually improved anything. I run error analysis on real traces with your domain expert to build a failure taxonomy, then ship the deterministic asserts and LLM judges that catch each failure mode in CI. Available as a one-week starter that ships the taxonomy and first asserts, or a longer engagement that delivers a validated judge suite, annotation interface, and CI loop your team owns after handoff.
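To make "deterministic assert" concrete, here is a minimal sketch of one such check in Python. Everything in it is assumed for illustration: the `Trace` shape, the `[doc-N]` citation format, and the "phantom citation" failure mode are hypothetical, not taken from any particular engagement. It flags answers that cite a source the retrieval step never returned.

```python
import re
from dataclasses import dataclass


@dataclass
class Trace:
    """One logged request (hypothetical shape): retrieved sources and the answer."""
    retrieved_ids: list[str]
    answer: str


# Assumed citation format for this sketch: the answer cites sources as [doc-<number>].
CITATION = re.compile(r"\[(doc-\d+)\]")


def phantom_citations(trace: Trace) -> list[str]:
    """Return cited source IDs that were never retrieved for this request."""
    cited = CITATION.findall(trace.answer)
    return [doc_id for doc_id in cited if doc_id not in trace.retrieved_ids]


def check_trace(trace: Trace) -> None:
    """Fail loudly if the answer cites a source the pipeline did not retrieve."""
    phantom = phantom_citations(trace)
    assert not phantom, f"answer cites sources that were not retrieved: {phantom}"


if __name__ == "__main__":
    good = Trace(retrieved_ids=["doc-1", "doc-2"], answer="Refunds take five days [doc-1].")
    bad = Trace(retrieved_ids=["doc-1"], answer="See [doc-1] and [doc-9].")
    check_trace(good)              # passes silently
    print(phantom_citations(bad))  # lists the ID that was cited but not retrieved
```

A check like this needs no model call, so it can run over a fixed set of saved traces on every prompt change. Failure modes that cannot be expressed as a rule are where an LLM judge would come in.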

Frequently asked questions

What kind of products do you work on?

AI products where the AI is user-facing. Typical shapes: assistants and copilots, RAG over internal data, agentic workflows, intelligent search, document processing. Typical clients: B2B SaaS companies adding AI to existing products, AI-native startups too small to have a dedicated AI engineering team, and teams building ops tooling or internal knowledge platforms. The common thread is that the AI behaviour matters to a real user, and someone needs to own how it should behave end-to-end.

What's your background?

Short version: interaction technology and design at Umeå University (software engineering combined with UX and usability), then five years building production distributed systems. The AI focus came in the last stretch, with RAG pipelines at ChromaWay and Datalumina's six-week production AI program. The mix I work from is production engineering plus user-side thinking about how the AI should behave, including when it's wrong.

Do you work as a contractor or on fixed-scope projects?

Both. Embedded work makes sense if you have ongoing AI work and want me iterating alongside your engineers. A fixed-scope engagement makes sense if you want a specific AI product built, shipped, and handed off. We figure out what fits during the intro call.

What does a typical engagement look like?

Depends on scope. An AI Feature Audit takes two weeks. A Build is 1-3 months from architecture to deployment. Embedded work is 2-3 days per week over 2-6 months. We start with a free intro call to scope the work and make sure it's a good fit.

What's your tech stack?

Python, FastAPI, OpenAI, Claude, LangChain/pydantic-ai, ChromaDB, Pinecone, pgvector, sentence-transformers, PostgreSQL, and Docker.

Where are you based and how does GDPR work?

Fully remote from Mauritius, on EU working hours. GDPR data handling is covered by a standard Data Processing Agreement and EU Standard Contractual Clauses, with a Transfer Impact Assessment available on request.

Let's grab a virtual coffee

Want to see if we're a good fit? Let's have a chat. Book a free 30-minute intro call and we can talk through what you're working on.