Introduction

One of the first things you learn when building a useful AI agent is that the hard part is not calling the model. The harder part is deciding what the model should be allowed to see.

I recently ran into this while building an agent to manage Questions, Risks, and Issues (QRIs) in an Oracle APEX project management application. The agent can search existing QRIs, find people related to the project, create new QRIs, update existing ones, and help users navigate the project content.

That sounds straightforward until you remember that a project can contain a lot of data. Users naturally ask broad questions like:

"Show me all open issues."
"What are the risks for this project?"
"Find anything related to payment terms."

Those are reasonable requests. However, if the agent responds by dumping every matching row into the prompt, the UI, or the conversation history, the experience quickly deteriorates.

In this post, I will walk through some of the context management lessons from building this agent, how Retrieval-Augmented Generation (RAG) helped limit the result set, and a few practical patterns I would recommend for anyone building agents inside APEX or database-backed applications.

More Context Is Not Always Better

A common early instinct with AI agents is to give the model as much context as possible and let it figure it out. If the model has more data, surely it can give a better answer.

For an application agent, context has a cost:

it consumes tokens
it slows down responses
it gives the model more opportunities to focus on the wrong thing
it makes debugging harder
it increases the chance of exposing internal identifiers or unrelated business data
it can overwhelm the user when the response mirrors the size of the input

In my QRI agent, I had two context problems.

First, the model needed sufficient information to answer questions and perform actions safely. If a user asked to update a QRI, the agent needed to resolve which QRI they meant, preserve the correct workflow status, and avoid using IDs supplied directly by the user.

Second, users often wanted to "see more" than was useful. A user may request all matching data, but that does not mean the agent should become an export feature. The agent's goal is to help the user act, not to replace every report in the application.

Start With Runtime Context, Not Database Dumps

The agent prompt includes a small runtime context block with values such as:

logged-in user display name
logged-in user person ID
active project ID
current date, time, and time zone

This is the kind of context that should always be explicit and controlled. The agent should not infer that the project is active from the user's message. It should not trust a project ID typed into the chat. It should not accept person IDs from the user.

In the prompt, I made the runtime context authoritative:

Always pass the active project ID from the runtime context. Never use a user-supplied project ID.

That may sound simple, but it is an important boundary. The model can be flexible with language, but the application must be strict with authority.

That said, in APEX applications, the session already knows who the user is and which page or business object they are working with. The agent tools should always use that server-side context. It should not let the chat box become a backdoor for switching tenants, projects, users, or source records.

In APEX terms, the model should not become the authority for session state. APEX session state should supply the active project context. Page items, authorization schemes, application context, and PL/SQL APIs should still define what the user can see and change.

Authorization schemes should guard the target pages and actions. PL/SQL APIs should validate access independently of the model. Opaque links such as qri:// can be resolved by the application into safe APEX URLs using APEX_PAGE.GET_URL. Debug logs should go into an application-specific AI interaction table, not just transient APEX debug output.

Use Tools as Context Gates

The QRI agent does not receive all project data up front. Instead, it receives tools that expose narrowly scoped slices of data:

search subsections
list project team
search QRIs
get project details
create QRI
update QRI

This is one of the most useful mental models for agent design: tools are not only capabilities but also context gates.

A tool defines what the agent can ask for, which filters it must provide, how many rows it can receive, and which fields are returned. That is much safer than letting the model query arbitrary SQL or injecting large JSON payloads into every conversation turn.

For example, the QRI search tool supports structured filters such as type, status grouping, owner, assignee, priority, section, subsection, and semantic search text. The prompt tells the agent to use the narrowest filters available before retrieval.

That instruction matters because users do not always phrase requests as filters, but the agent can often infer them:

"open issues" means type_code = ISSUE and status_code = OPEN
"high priority risks" means type_code = RISK and priority_code = HIGH
"assigned to me" means the assignee should come from the runtime context

This keeps the prompt smaller and the answer more relevant.

RAG as a Limiting Mechanism

The most useful change was using RAG to limit QRI search results.

For semantic searches, the agent does not retrieve and return every QRI. It generates an embedding of the user's search text and compares it against vector chunks representing questions, risks, and issues in the active project. It then selects the closest candidates, applies a distance threshold, and passes only matching QRI IDs into the main relational query.

In the package, this is controlled with constants like:

gc_qri_vector_candidate_max CONSTANT PLS_INTEGER := 50;
gc_qri_vector_max_distance  CONSTANT NUMBER      := 0.55;   -- Cosine
gc_qri_search_result_max    CONSTANT PLS_INTEGER := 125;
gc_qri_content_max_chars    CONSTANT PLS_INTEGER := 300;

The exact numbers are application-specific, but the pattern is the important part.

The vector search is not the final answer. It is a narrowing step. Once the candidate QRI IDs are identified, the normal relational query still applies the project ID, QRI type, status, owner, assignee, priority, and section filters.

That combination is powerful:

vector search handles fuzzy user language
SQL filters enforce business rules
row limits prevent oversized responses
field truncation keeps each result compact
the final ordering is predictable

This is where RAG is easy to misunderstand. It is not just a way to "make the AI smarter." In business applications, RAG is also a way to avoid giving the AI too much.

Return Counts, Not Just Rows

The QRI search result includes values like:

total_count
returned_count
max_allowed
has_more_matches

The prompt then asks the agent to state how many matches were made overall and how many were returned. If more matches exist, the agent should show the returned records and suggest narrowing the filters.

This avoids a poor user experience in which the agent silently drops results. It also avoids the opposite problem, where the agent tries to be helpful by offering pagination inside the chat.

For this agent, I explicitly told it not to paginate results or offer the next range. That was intentional. If a user needs to review a large result set, the application should provide a report. The agent should help narrow and act.

There is a difference between "I found 125 of 600 matching records; narrow by section, priority, or status" and "Here are the first 125, would you like the next 125?" The second version turns the agent into a slow report viewer.

Truncate Content Before It Reaches the Model

Another practical choice was truncating QRI content and responses before returning them to the model. In the tool logic, QRI content is capped to a small number of characters using DBMS_LOB.SUBSTR after stripping HTML.

This is not only about token savings. It also changes the agent's behavior.

When the model receives short result summaries, it is more likely to summarize, compare, and guide the user. When it receives full long-form content for many records, it is more likely to drown in details or repeat them back.

For search results, the agent usually needs sufficient information to identify the item and determine its relevance. It does not need every character of every answer.

If the user truly needs the full record, the application can provide a link. In this agent, QRI results include an opaque qri:// link that the UI converts into an APEX page URL using APEX_PAGE.GET_URL. The model can display the link, but it is instructed not to expose raw database IDs.

That gives users a way to drill down without sending the full payload to the chat.

Separate Model Context From UI Context

One design detail I liked in this implementation was the separation of what is stored, what is sent back to the model, and what is shown in the UI.

Tool results are stored in a log table, but the visible conversation only shows a reduced preview, such as "Tool result captured." The next model turn gets the structured llm_context, not necessarily the entire raw display payload.

The UI response also undergoes redaction to remove business IDs, including project, section, subsection, person, QRI, and client IDs.

That may seem defensive, but it is worth doing. Models are very good at repeating what they see. If internal IDs appear in tool results, prompts, or hidden messages, they will eventually make it into a user-facing response unless you actively prevent it.

A better pattern is:

use internal IDs inside tool calls
return user-facing references in responses
expose links as opaque application links
redact accidental ID leakage before rendering
validate all IDs again in PL/SQL before writes

Redaction is useful, but it should be treated as a fallback. The better design is to avoid putting raw IDs into model-visible or user-visible text unless the model truly needs them.

Put Write Operations Behind Confirmation

Context management is not only about reads. Writes need even stricter control.

For create and update tools, the agent queues the requested action and returns a confirmation request to the UI. The write is only executed after the user confirms. The tools also prevent mixing create and update requests in the same batch.

This creates a clean boundary:

The model interprets the request.
The application prepares a pending action.
The user confirms.
PL/SQL executes the write after validating access, status, people, source location, and IDs.

That pattern reduces the risk of the model acting on ambiguous context. It also gives the user a compact preview of what will change without dumping the entire record set into the conversation.

Keep the Prompt Opinionated

The agent prompt is fairly detailed. It defines authority, tool rules, retrieval behavior, status mappings, people resolution, section targeting, QRI references, write safety, and response style.

This is necessary because agent behavior is partly application behavior. If you leave too much open-ended, the model will choose differently from one turn to the next.

A few prompt rules that proved useful:

use runtime context as authoritative
never trust user-supplied IDs
ask one concise clarification when required fields are missing
do not partially write a batch
preserve tool result ordering
state returned counts
do not expose raw IDs
use the narrowest retrieval filters
do not invent statuses, people, counts, or document facts

These are not personality instructions. They are application rules.

Other Context Management Practices

First, treat conversation history as a liability after a certain point. Keep enough recent history to maintain continuity, but do not blindly send the entire conversation forever. Older tool results can be summarized, referenced, or re-fetched when needed.

Second, design tools around user intent, not tables. A project_qri_search tool is easier for an agent to use safely than a generic run_sql tool because it encodes the business boundary.

Third, return structured JSON to the model. The model can work with prose, but structured fields reduce ambiguity and make the prompt rules easier to enforce.

Fourth, use separate limits for different surfaces. A report may show 1,000 rows. A tool may return 125 compact rows. A confirmation message may preview only 8 items. Those are different jobs.

Fifth, log enough to debug the agent loop without storing more sensitive payload than necessary. In this package, each major step logs elapsed time: building chat messages, preparing prompt context, calling the model, parsing tool calls, and executing tools. When agents behave strangely, these timings and payload boundaries are extremely useful.

Finally, remember that security still belongs in the database and application layer. The model can be instructed to behave, but PL/SQL should still assert project access, validate people, check statuses, and reject invalid source locations.

Conclusion

Building this agent reminded me that context management is one of the core design skills for practical AI applications.

The goal is not to give the model everything. The goal is to give it the smallest useful slice of information, at the right time, through a tool that enforces the same business rules your application already depends on.

RAG helped because it limited broad semantic searches to relevant candidates before the relational filters and row limits were applied. But RAG was only part of the answer. The full solution also needed runtime authority, structured tools, truncation, count reporting, ID redaction, confirmation gates, and server-side validation.

That may sound like a lot of plumbing, but this is the difference between a demo agent and an application agent. A demo can be impressive with a large prompt and a few lucky examples. A real agent needs boundaries.

AI Agents Need Boundaries, Not Bigger Prompts

Introduction

More Context Is Not Always Better

Start With Runtime Context, Not Database Dumps

Use Tools as Context Gates

RAG as a Limiting Mechanism

Return Counts, Not Just Rows

Truncate Content Before It Reaches the Model

Separate Model Context From UI Context

Put Write Operations Behind Confirmation

Keep the Prompt Opinionated

Other Context Management Practices

Conclusion

Comments

More from this blog

Using an APEX AI Agent to Turn Purchase Orders into JSON

APEX AI Agent Handlers: Intercepting & Updating Requests and Responses

APEX AI Agent Logging with Request & Response Handlers

The anatomy of an APEX 26.1 APEXlang file

Command Palette

Introduction

More Context Is Not Always Better

Start With Runtime Context, Not Database Dumps

Use Tools as Context Gates

RAG as a Limiting Mechanism

Return Counts, Not Just Rows

Truncate Content Before It Reaches the Model

Separate Model Context From UI Context

Put Write Operations Behind Confirmation

Keep the Prompt Opinionated

Other Context Management Practices

Conclusion

Comments

More from this blog