When I was working on an NLP-based chatbot, we occasionally encountered Out of Memory (OOM) issues. These were not frequent enough to disrupt the service entirely, but happened often enough to raise concern. Since we were building the application from scratch, we hadn’t initially included the appropriate Java arguments to generate a heap dump, which made it difficult to debug the issue when it happened.
These OOM errors occurred sporadically, and during that time the system was serving more than 50,000 concurrent users. Since none of our customers reported issues, a deeper investigation wasn't immediately prioritized. Eventually, we added the necessary Java arguments to generate a heap dump when an OOM occurs and waited for the issue to happen again. Waiting may not sound like the best strategy, but in the meantime we did our best to reproduce the problem under controlled scenarios.
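For quick reference, these are the standard JVM flags that write a heap dump automatically when an OOM occurs; the dump path and jar name below are only placeholders, not our actual setup:

```
java -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/var/dumps/chatbot \
     -jar chatbot-service.jar
```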
If you're interested in learning how to capture and analyze heap dumps, I’ve written a dedicated post about it here: How to Capture and Analyze Heap Dump.
Pre-Context:
Our product allows users to create a set of "skills." Each skill contains multiple "actions," and every action is defined with more than three trigger sentences.
For example, I owned three different entities:
- Banking
- Insurance
- Housing Finance
Each entity is mapped to a skill. For instance, the Banking skill includes actions such as:
- Balance Check
- Add Beneficiary
The Insurance skill might include actions like:
- Maturity Date
- Next Payment Date
- Pending Tenure
To keep things simple, let’s focus only on the Balance Check action from the Banking skill.
This action has several trigger sentences like:
- "What is my current balance?"
- "What is my saving balance?"
- "What is my balance?"
- (For this one, the chatbot asks a follow-up question to clarify whether the user means current or saving)
- "Show my current balance"
- "Fetch my saving balance"
In these sentences, terms like "current" and "saving" are identified as values of the same field and are marked as mandatory by the developer who owns the skill. This structured input is then passed to an in-house function that works similarly to AWS Lambda.
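To make this structure a little more concrete, here is a rough sketch of how the hierarchy could be modeled; the class and field names are purely illustrative, not our actual schema:

```java
import java.util.List;

// Illustrative sketch only -- not the real data model.
class TriggerSentence {
    String text;                  // e.g. "What is my current balance?"
    List<String> mandatoryFields; // fields marked mandatory by the skill owner, e.g. the current/saving field
}

class Action {
    String name;                  // e.g. "Balance Check"
    List<TriggerSentence> triggerSentences;
}

class Skill {
    String name;                  // e.g. "Banking"
    List<Action> actions;
}
```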
Context:
Initially, we were chunking all trigger sentences (across all actions inside a skill) every single time a user input was received. We used Apache OpenNLP for this chunking process. However, performing this operation repeatedly consumed a large amount of memory, which eventually led to the OOM issues.
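To give an idea of what that chunking step involves, here is a minimal Apache OpenNLP sketch (tokenize, POS-tag, then chunk) using the standard pre-trained English models. The model file paths are illustrative and our actual pipeline was wired differently, but this is the kind of work that was being repeated for every trigger sentence of every action on every user message:

```java
import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.chunker.ChunkerModel;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;

public class ChunkExample {
    public static void main(String[] args) throws Exception {
        // Standard pre-trained OpenNLP model files (paths are illustrative).
        try (InputStream tokStream = new FileInputStream("en-token.bin");
             InputStream posStream = new FileInputStream("en-pos-maxent.bin");
             InputStream chunkStream = new FileInputStream("en-chunker.bin")) {

            TokenizerME tokenizer = new TokenizerME(new TokenizerModel(tokStream));
            POSTaggerME posTagger = new POSTaggerME(new POSModel(posStream));
            ChunkerME chunker = new ChunkerME(new ChunkerModel(chunkStream));

            String sentence = "What is my current balance?";
            String[] tokens = tokenizer.tokenize(sentence);      // token per word
            String[] posTags = posTagger.tag(tokens);            // POS tag per token
            String[] chunkTags = chunker.chunk(tokens, posTags); // chunk label per token (B-NP, I-NP, ...)

            for (int i = 0; i < tokens.length; i++) {
                System.out.println(tokens[i] + "\t" + posTags[i] + "\t" + chunkTags[i]);
            }
        }
    }
}
```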
Solution:
To solve this, we introduced a new column in our data model. During the skill creation or update phase, we now preprocess the trigger sentences, chunk them once, and store the result in this column. When a user sends an input, we only chunk the user input and compare it against the preprocessed chunks.
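Here is a minimal sketch of that pattern, assuming a `ChunkPipeline` abstraction over the OpenNLP steps above and a `SkillRepository` standing in for the data store with the new column; all names are hypothetical, not our actual code:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the "chunk once, reuse many times" pattern. Names are illustrative.
public class TriggerChunkCache {

    public interface ChunkPipeline {
        List<String> chunk(String sentence);
    }

    public interface SkillRepository {
        void savePrecomputedChunks(String skillId, List<List<String>> chunks);
        List<List<String>> loadPrecomputedChunks(String skillId);
    }

    private final ChunkPipeline pipeline;
    private final SkillRepository repository;

    public TriggerChunkCache(ChunkPipeline pipeline, SkillRepository repository) {
        this.pipeline = pipeline;
        this.repository = repository;
    }

    // Called on skill create/update: chunk every trigger sentence once
    // and persist the result in the new column.
    public void onSkillSaved(String skillId, List<String> triggerSentences) {
        List<List<String>> precomputed = new ArrayList<>();
        for (String trigger : triggerSentences) {
            precomputed.add(pipeline.chunk(trigger));
        }
        repository.savePrecomputedChunks(skillId, precomputed);
    }

    // Called per user request: only the user input is chunked;
    // the stored chunks are read back and compared.
    public List<String> matchUserInput(String skillId, String userInput) {
        List<String> userChunks = pipeline.chunk(userInput);
        for (List<String> triggerChunks : repository.loadPrecomputedChunks(skillId)) {
            if (triggerChunks.equals(userChunks)) {
                return triggerChunks; // matched a stored trigger
            }
        }
        return null; // no match
    }
}
```

The real comparison logic was more involved than a plain equality check, but the important part is that the expensive chunking pass over all trigger sentences now runs once per skill create or update instead of on every incoming message.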
Yes, the solution looks simple in hindsight. We could have designed it this way from the beginning. But as my professor always said, "Get a working model first, and then improve it." In many cases, it's more important to deliver a functional product early and iterate based on real-world use than to aim for a perfect product upfront.