I’ve been building one of the most sophisticated prompt engineering systems I’ve worked on so far: a production-grade prompt engine that powers AI interactions across multiple languages, contexts, and safety requirements.
This isn’t just another “prompt management” system. It’s a complete architecture that handles:
- Dynamic prompt construction based on context and conditions
- Component-based architecture for reusable, maintainable prompts
- Multi-layer guardrails with stop, caution, and allow decisions
- Structured output with source tracking and moderation
- Parallel processing for performance optimization
Whether you’re building AI agents, workflow automation systems, or chat assistants, this architecture provides a robust foundation for managing complex prompt engineering at scale.
Note on Examples
The examples in this article are derived from the project but are not directly related to the actual product. They’ve been adapted for illustrative purposes. This article is meant to be useful to anyone building AI agents, workflow automation systems, or chat-based assistants.
The Problem: Why Prompt Engineering Needs an Engine
When you’re building AI applications at scale, hardcoding prompts quickly stops working.
You need:
- Context-aware prompts that adapt to entities, languages, and user situations
- Reusable components so you don’t copy-paste the same instructions everywhere
- Safety systems that can evaluate content before and after generation
- Maintainability so prompts can change without shipping new code
- Observability to understand what prompts are being used and how they perform
I needed a system that could do all of this—and do it reliably in production.
The Architecture: A Three-Layer System
The prompt engine follows a three-layer architecture that separates concerns cleanly:
- Prompt Definitions – when and how prompts should be used
- Prompt Templates – the structure of the prompt
- Prompt Components – the reusable content building blocks
Layer 1: Prompt Definitions
The “when and how” layer
Prompt definitions are the central configuration layer that determines when and how a prompt should be used.

Each definition can specify:
- Target action – e.g. `answer_question`, `guardrail`, `summarize`, etc.
- Conditions – e.g. POI categories, languages, time ranges
- Priority – a scoring system (e.g. 1–10) to break ties
- LLM configuration – model, temperature, reasoning effort
- Moderation settings – pre- and post-moderation rules
- Template reference – which template to use
Definitions are basically the routing and decision layer of the engine.
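To make this concrete, here is a minimal sketch of what a definition could look like. The field names, values, and dict shape are illustrative assumptions, not the actual product schema.

```python
# Illustrative prompt definition; every field name here is an assumption.
prompt_definition = {
    "id": "museum_poi_answer_v1",
    "target_action": "answer_question",      # what this definition is for
    "conditions": {                           # when it applies
        "poi_categories": ["museum", "gallery"],
        "languages": ["en", "de"],
    },
    "priority": 8,                            # higher wins when several definitions match
    "llm": {"model": "gpt-4o", "temperature": 0.3, "reasoning_effort": "low"},
    "moderation": {"pre": True, "post": True},
    "template": "answer_question_default",    # which template to render
}
```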
Layer 2: Prompt Templates
The “structure” layer
Templates define the shape of the prompt. They contain the layout (system, user, tool messages) and placeholders that get filled at runtime.

Templates support three main kinds of placeholders:
- Component placeholders – e.g. `{{comp:source_knowledge_instructions}}` → inject reusable prompt components
- Function placeholders – e.g. `{{current_date()}}`, `{{language()}}`, `{{chat_history()}}` → inject dynamic values such as dates, language, or chat history
- Database placeholders – e.g. `{{poi.name}}`, `{{meta.description}}` → read values directly from the underlying data
Templates give you a consistent skeleton, while still allowing dynamic content.
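As a rough sketch, a template might look like the following. The placeholder syntax mirrors the examples above, but the role/message layout and the `{{user_question}}` placeholder are assumptions.

```python
# Illustrative template; the role/message structure is an assumption.
prompt_template = {
    "id": "answer_question_default",
    "messages": [
        {
            "role": "system",
            "content": (
                "You are a helpful guide for {{poi.name}}.\n"               # database placeholder
                "Today is {{current_date()}}. Answer in {{language()}}.\n"  # function placeholders
                "{{comp:source_knowledge_instructions}}"                    # component placeholder
            ),
        },
        {"role": "user", "content": "{{user_question}}"},  # hypothetical placeholder for the question
    ],
}
```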
Layer 3: Prompt Components
The “content” layer
Components are reusable content fragments that can be injected into templates. They’re the building blocks of your prompt system.

Components can include:
- Conditions – only apply in certain contexts (language, POI, time, etc.)
- Priority – multiple matching components can be ranked
- Random selection – useful for A/B testing or variation
- Language-aware variants – different content per language
Together, definitions + templates + components give a flexible but structured system.
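Here is a hedged sketch of two language-aware variants of the same component; the condition keys and field names are assumptions.

```python
# Illustrative components: two language-aware variants of the same logical component.
prompt_components = [
    {
        "id": "source_knowledge_instructions",
        "conditions": {"languages": ["en"]},
        "priority": 5,
        "content": "Only answer from the provided sources. If you are unsure, say so.",
    },
    {
        "id": "source_knowledge_instructions",
        "conditions": {"languages": ["de"]},
        "priority": 5,
        "content": "Antworte nur auf Basis der bereitgestellten Quellen.",
    },
]
```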
The Request Pipeline: How Everything Fits Together
Here’s how a single request flows through the engine:

Key ideas:
- Definition selection decides what to run
- Template + components build the actual prompt
- Moderation and guardrails wrap safety around the whole thing
- Parallelization keeps latency down
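Here is a pseudocode-level sketch of that flow. Every helper below (`select_definition`, `render_prompt`, `evaluate_guardrail`, `call_llm`, `write_log`) is a hypothetical callable, not a real API from the project, and the guardrail step is shown sequentially for readability.

```python
from typing import Callable

# Flow sketch only; all helpers are hypothetical and passed in as callables.
def handle_request(
    question: str,
    context: dict,
    select_definition: Callable[[str, dict], dict],
    render_prompt: Callable[[dict, dict, str], str],
    evaluate_guardrail: Callable[[str, dict], dict],
    call_llm: Callable[[dict, str], dict],
    write_log: Callable[..., None],
) -> dict:
    definition = select_definition("answer_question", context)   # 1. decide what to run
    prompt = render_prompt(definition, context, question)        # 2. template + components

    guardrail = evaluate_guardrail(question, context)            # 3. runs in parallel in practice
    if guardrail["decision"] == "stop":
        return {"answer": guardrail["stop_message"], "sources": []}
    if guardrail["decision"] == "caution":
        prompt += "\n\n" + guardrail["caution_message"]

    response = call_llm(definition["llm"], prompt)               # 4. model call
    write_log(definition=definition, prompt=prompt,
              guardrail=guardrail, response=response)            # 5. observability
    return response
```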
Guardrails: A Three-Decision Safety System
One of the most important parts of the architecture is the guardrail system. Every user question can be evaluated before any AI response is generated.

There are three possible decisions:
- STOP – block the response and return a safe message
- CAUTION – continue, but inject a cautionary message into the system prompt
- ALLOW – proceed normally
Guardrails themselves are defined using the same three-layer system (definitions, templates, components) with target_action: "guardrail".
They evaluate:
- Safety risks
- Policy compliance
- Educational value
- Conversation context
- User intent
And they output a structured decision, e.g.:
```json
{
  "decision": "stop",
  "stop_message": "I can't provide instructions for creating dangerous devices. I can, however, help with educational or historical information about related topics.",
  "caution_message": ""
}
```
or:
```json
{
  "decision": "caution",
  "stop_message": "",
  "caution_message": "This topic involves sensitive and distressing historical events. The following information is presented for educational purposes."
}
```
STOP, CAUTION, ALLOW – When Each Is Used
- STOP – For clear safety / policy violations
  - e.g. self-harm instructions, illegal activities, explicit hate, dangerous devices
- CAUTION – For sensitive but still educational topics
  - e.g. wars, atrocities, trauma topics, controversial social issues
- ALLOW – For safe, standard questions
  - e.g. museum content, general knowledge, factual Q&A
The CAUTION path is the key bridge between safety and usefulness: it allows content while enforcing framing and context.
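A minimal sketch of how the three decisions could translate into behavior; the framing text and dict shape are assumptions, not the project's actual implementation.

```python
# Sketch: turning a guardrail decision into concrete behavior.
def apply_decision(decision: dict, system_prompt: str) -> tuple[bool, str]:
    """Return (proceed, system_prompt) for a guardrail decision."""
    if decision["decision"] == "stop":
        return False, system_prompt                    # caller replies with decision["stop_message"]
    if decision["decision"] == "caution":
        return True, system_prompt + "\n\nImportant: " + decision["caution_message"]
    return True, system_prompt                         # "allow": proceed unchanged

proceed, prompt = apply_decision(
    {"decision": "caution", "stop_message": "",
     "caution_message": "Present this topic with appropriate historical context."},
    "You are a museum guide.",
)
```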
Parallel Guardrail Execution: Safety Without Latency
The interesting part: guardrails don’t have to slow everything down.
Guardrail evaluation runs in parallel with prompt construction and (optionally) the first model call.

This gives you:
- Safety → evaluated for every request
- Performance → no unnecessary blocking
- Efficiency → expensive calls cancelled early when needed
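Here is a minimal asyncio sketch of the pattern, with stubbed-out calls standing in for the guardrail and main model requests; the caution path (re-shaping the prompt before the main call) is omitted for brevity.

```python
import asyncio

# Stand-ins for the real calls; delays are arbitrary.
async def evaluate_guardrail(question: str) -> dict:
    await asyncio.sleep(0.2)   # fast guardrail model
    return {"decision": "allow", "stop_message": "", "caution_message": ""}

async def generate_answer(question: str) -> str:
    await asyncio.sleep(1.0)   # slower main LLM call
    return "Here is the answer..."

async def handle(question: str) -> str:
    guardrail_task = asyncio.create_task(evaluate_guardrail(question))
    answer_task = asyncio.create_task(generate_answer(question))

    guardrail = await guardrail_task          # usually finishes first
    if guardrail["decision"] == "stop":
        answer_task.cancel()                  # cancel the expensive call early
        return guardrail["stop_message"] or "Sorry, I can't help with that."
    return await answer_task

print(asyncio.run(handle("How do I reset my password?")))
```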
Observability: Prompt Logs for Full Visibility
Every execution is logged in a Prompt Log, which captures what happened, how, and why.
You can use these logs to:
- Debug problematic responses
- Analyze performance bottlenecks
- See which definitions / templates are most used
- Track guardrail decisions and safety behavior
- Monitor user feedback (thumbs up/down, comments)
Observability is what turns this from “prompt glue” into an operable system.
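The shape of a log entry might look roughly like this; every field name here is illustrative, not the actual schema.

```python
# Illustrative prompt log entry; all field names are assumptions.
prompt_log_entry = {
    "request_id": "req_12345",
    "definition_id": "museum_poi_answer_v1",
    "template_id": "answer_question_default",
    "components_used": ["source_knowledge_instructions"],
    "guardrail": {"decision": "allow", "stop_message": "", "caution_message": ""},
    "llm": {"model": "gpt-4o", "latency_ms": 840,
            "tokens": {"prompt": 1200, "completion": 180}},
    "feedback": {"thumbs": "up", "comment": None},
}
```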
Advanced Features
A few more features that make this engine production-ready:
Conditional Component Resolution
Components can be tied to context, such as:
- Specific POI categories
- Certain languages
- Time ranges or events
- Field existence in the database
This keeps prompts both contextual and maintainable.
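A simplified sketch of how conditional resolution could work; the condition keys and context fields are assumptions.

```python
# Sketch of conditional component resolution; condition keys are illustrative.
def component_matches(component: dict, context: dict) -> bool:
    cond = component.get("conditions", {})
    if "languages" in cond and context.get("language") not in cond["languages"]:
        return False
    if "poi_categories" in cond and context.get("poi_category") not in cond["poi_categories"]:
        return False
    if "requires_field" in cond and not context.get(cond["requires_field"]):
        return False
    return True

def resolve_components(components: list[dict], context: dict) -> list[dict]:
    matching = [c for c in components if component_matches(c, context)]
    return sorted(matching, key=lambda c: c.get("priority", 0), reverse=True)
```

Random selection and time-range conditions would slot into the same matching and ranking step.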
Multi-Language Support
The engine supports multi-language behavior through:
- Language-specific templates
- Language-specific components
- The `{{language()}}` helper
- Multi-language fields resolved from the database
This ensures the same architecture can serve users across different locales.
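For multi-language database fields, a small resolver with a fallback language is usually enough; the field shape here is an assumption.

```python
# Sketch: resolving a multi-language field, assuming a {"en": ..., "de": ...} shape.
def resolve_field(field: dict, language: str, fallback: str = "en") -> str:
    return field.get(language) or field.get(fallback, "")

poi_name = {"en": "City Museum", "de": "Stadtmuseum"}
print(resolve_field(poi_name, "de"))   # -> "Stadtmuseum"
```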
Structured Output with Sources
AI responses aren’t just raw text. Each response can include:
- `answer` – the user-facing answer
- `sources` – where the information came from
- Relevance scores, IDs, and types
This is useful for:
- audit trails
- debugging
- transparency
- downstream logic
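A structured response might look like this; the exact fields beyond `answer` and `sources` are assumptions.

```python
# Illustrative structured response with source tracking.
structured_response = {
    "answer": "The museum opens at 9:00 and closes at 18:00.",
    "sources": [
        {"id": "poi_42", "type": "poi_description", "relevance": 0.91},
        {"id": "faq_7", "type": "faq_entry", "relevance": 0.74},
    ],
}
```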
A Real-World Style Example (Simplified)
Imagine a customer support assistant.
User asks:
“How do I reset my password?”
The engine:
- Selects the best prompt definition for `answer_user_question` in `en`
- Chooses the relevant template
- Resolves all components (tone, context, instructions, entity fields)
- Evaluates guardrails (almost always ALLOW here) in parallel
- Builds the final prompt and calls the LLM
- Logs everything into the Prompt Logs
Final response to the user might be:
To reset your password, go to the account settings page and click on “Security”. Then select “Change Password” and follow the prompts…
Behind the scenes, you also have:
- Structured JSON output
- Source tracking
- Guardrail decision (`"allow"`)
- Performance metrics
- Full prompt + model call for replay
Technical Challenges I Had to Solve
A few of the big ones:
- Nested placeholders & functions
e.g.{{entity_content('name', '{{language()}}')}}
→ Resolve inner functions first, then outer. - Complex conditional logic
POI filters, visitor filters, temporal filters, content filters all combined. - Performance at scale
Doing all of this without making response times explode. - Multi-language everything
Templates, components, and data all needing language-aware behavior. - Guardrail integration
Making guardrails first-class citizens without killing performance.
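For the nested-placeholder case, here is a minimal sketch of innermost-first resolution. The function registry, regexes, and context shape are all assumptions, and the naive argument parsing ignores edge cases such as commas inside quoted strings.

```python
import re

# Innermost-first resolution for nested placeholders such as
# {{entity_content('name', '{{language()}}')}}. Names and context shape are illustrative.
FUNCTIONS = {
    "language": lambda ctx: ctx["language"],
    "current_date": lambda ctx: ctx["date"],
    "entity_content": lambda ctx, field, lang: ctx["entity"][field][lang],
}

INNERMOST = re.compile(r"\{\{([^{}]+)\}\}")   # a {{...}} with no nested braces inside
CALL = re.compile(r"(\w+)\((.*)\)$")          # e.g. entity_content('name', 'en')

def resolve_placeholders(text: str, context: dict) -> str:
    def replace(match: re.Match) -> str:
        expr = match.group(1).strip()
        if expr in context:                                   # database placeholders, e.g. poi.name
            return str(context[expr])
        call = CALL.match(expr)
        if call and call.group(1) in FUNCTIONS:               # function placeholders
            raw = call.group(2)
            args = [a.strip().strip("'\"") for a in raw.split(",")] if raw else []
            return str(FUNCTIONS[call.group(1)](context, *args))
        return match.group(0)                                 # leave unknown placeholders as-is

    while INNERMOST.search(text):
        resolved = INNERMOST.sub(replace, text)
        if resolved == text:                                  # nothing left we can resolve
            break
        text = resolved
    return text

context = {"language": "de", "date": "2025-01-15",
           "entity": {"name": {"en": "City Museum", "de": "Stadtmuseum"}}}
print(resolve_placeholders("{{entity_content('name', '{{language()}}')}}", context))  # Stadtmuseum
```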
Results
This architecture gives me a prompt engine that is:
- Production-ready
  - Deployed in a real product
  - Handles real traffic and real users
- Maintainable
  - Prompt changes don't require code changes
  - Components and templates are reusable and composable
- Safe
  - Guardrails evaluated per request
  - Stop/Caution/Allow system with structured decisions
- Observable
  - Prompt logs with full traceability
  - Performance, usage, and quality all measurable
- Flexible
  - Easy to add new prompt types, languages, and safety rules
Key Takeaways
If you’re thinking about building something similar, here are the main things I’d keep in mind:
- Separate definitions, templates, and components
- Make everything conditional and context-aware
- Treat guardrails as first-class citizens, not an afterthought
- Use structured output for predictable downstream behavior
- Log everything – prompts, decisions, performance, guardrails
- Lean heavily on parallelization where possible
- Design for multi-language support from the start
This isn’t just about “prompt engineering” anymore—it’s about prompt architecture.
And having an engine like this changes how you think about building AI systems.