Out of Memory Issue in Production


When I was working on an NLP-based chatbot, we occasionally encountered Out of Memory (OOM) issues. These were not frequent enough to disrupt the service entirely, but happened often enough to raise concern. Since we were building the application from scratch, we hadn’t initially included the appropriate Java arguments to generate a heap dump, which made it difficult to debug the issue when it happened.

These OOM errors occurred sporadically, and during that time, the system was serving more than 50,000 concurrent users. Since none of the customers reported issues, we couldn't immediately prioritize deeper investigation. Eventually, we added the necessary Java arguments to generate a heap dump when an OOM occurs and waited for the issue to happen again. While waiting may not sound like the best strategy, we did our best to reproduce the problem under controlled scenarios.
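
For reference, the JVM options that enable this are standard HotSpot flags. A minimal example (the dump path and jar name here are illustrative, not our actual setup):

  java -XX:+HeapDumpOnOutOfMemoryError \
       -XX:HeapDumpPath=/var/log/chatbot/heapdumps \
       -jar chatbot.jar

With these flags, the JVM writes a java_pid<pid>.hprof file to the given directory the first time an OutOfMemoryError is thrown, which can then be opened in a tool like Eclipse MAT.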

If you're interested in learning how to capture and analyze heap dumps, I’ve written a dedicated post about it here: How to Capture and Analyze Heap Dump.

Pre-Context:

Our product allows users to create a set of "skills." Each skill contains multiple "actions," and every action is defined with more than three trigger sentences.
For example, I owned three different entities:
  1. Banking
  2. Insurance
  3. Housing Finance
Each entity is mapped to a skill. For instance, the Banking skill includes actions such as:
  • Balance Check
  • Add Beneficiary
The Insurance skill might include actions like:
  • Maturity Date
  • Next Payment Date
  • Pending Tenure
To keep things simple, let’s focus only on the Balance Check action from the Banking skill.

This action has several trigger sentences like:
  • "What is my current balance?"
  • "What is my saving balance?"
  • "What is my balance?"
    • (The chatbot will ask a follow-up question to clarify whether it's current or saving)
  • "Show my current balance"
  • "Fetch my saving balance"

In these sentences, terms like "current" and "saving" are identified as values of the same field and are marked as mandatory by the developer who owns the skill. This structured input is then passed to an in-house function that works similarly to AWS Lambda.
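
To make the shape of this concrete, here is a rough sketch of the data model in Java (class and field names are hypothetical, simplified for illustration):

  import java.util.List;
  import java.util.Map;

  // Hypothetical, simplified model of the structures described above.
  class Skill {
      String name;                    // e.g. "Banking"
      List<Action> actions;           // e.g. Balance Check, Add Beneficiary
  }

  class Action {
      String name;                    // e.g. "Balance Check"
      List<String> triggerSentences;  // e.g. "What is my current balance?"
      Map<String, Boolean> fields;    // e.g. "accountType" -> mandatory?
  }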

Context:

Initially, we were chunking all trigger sentences (across all actions inside a skill) every single time a user input was received. We used Apache OpenNLP for this chunking process. However, performing this operation repeatedly consumed a large amount of memory, which eventually led to the OOM issues.
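
Roughly, the hot path looked like the sketch below. The OpenNLP calls (SimpleTokenizer, POSTaggerME, ChunkerME) are the real API; the class name, model file names, and surrounding structure are illustrative:

  import opennlp.tools.chunker.ChunkerME;
  import opennlp.tools.chunker.ChunkerModel;
  import opennlp.tools.postag.POSModel;
  import opennlp.tools.postag.POSTaggerME;
  import opennlp.tools.tokenize.SimpleTokenizer;

  import java.io.FileInputStream;
  import java.util.List;

  class OpenNlpChunker {
      private final POSTaggerME tagger;
      private final ChunkerME chunker;

      OpenNlpChunker() throws Exception {
          tagger = new POSTaggerME(new POSModel(new FileInputStream("en-pos-maxent.bin")));
          chunker = new ChunkerME(new ChunkerModel(new FileInputStream("en-chunker.bin")));
      }

      // Tokenize, POS-tag, then chunk a single sentence.
      String[] chunk(String sentence) {
          String[] tokens = SimpleTokenizer.INSTANCE.tokenize(sentence);
          String[] tags = tagger.tag(tokens);
          return chunker.chunk(tokens, tags);   // chunk labels such as B-NP, I-NP, B-VP
      }

      // Called on EVERY user input: re-chunks every trigger sentence of every action.
      void matchNaively(String userInput, List<String> allTriggerSentences) {
          String[] userChunks = chunk(userInput);
          for (String trigger : allTriggerSentences) {
              String[] triggerChunks = chunk(trigger);   // repeated, memory-hungry work
              // ... compare userChunks against triggerChunks ...
          }
      }
  }

With many skills, each holding several trigger sentences per action, that inner chunk(trigger) call ran on every single request and allocated heavily.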

Solution:

To solve this, we introduced a new column in our data model. During the skill creation or update phase, we now preprocess the trigger sentences, chunk them once, and store the result in this column. When a user sends an input, we only chunk the user input and compare it against the preprocessed chunks.
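
In sketch form, reusing the hypothetical types from above (the persistence interface is invented for illustration; the real code stores the chunks in a database column), the fix moves the expensive call out of the request path:

  import java.util.Map;

  // Hypothetical persistence layer backing the new "chunks" column.
  interface TriggerChunkStore {
      void save(String actionName, String trigger, String[] chunks);
      Map<String, String[]> loadAll();   // trigger sentence -> precomputed chunks
  }

  class PrecomputedMatcher {
      // Runs once, during skill creation or update (not per user request).
      static void onSkillSaved(Skill skill, OpenNlpChunker nlp, TriggerChunkStore store) {
          for (Action action : skill.actions) {
              for (String trigger : action.triggerSentences) {
                  store.save(action.name, trigger, nlp.chunk(trigger));
              }
          }
      }

      // Runs per user request: exactly one chunking call, however many triggers exist.
      static void match(String userInput, OpenNlpChunker nlp, TriggerChunkStore store) {
          String[] userChunks = nlp.chunk(userInput);
          for (Map.Entry<String, String[]> entry : store.loadAll().entrySet()) {
              // ... compare userChunks against entry.getValue() ...
          }
      }
  }

The chunking cost now scales with writes (skill edits are rare) instead of reads (user messages arrive constantly), which removed the allocation pressure that had been driving the OOMs.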

Yes, the solution looks simple in hindsight. We could have designed it this way from the beginning. But as my professor always said, "Get a working model first, and then improve it." In many cases, it's more important to deliver a functional product early and iterate based on real-world use rather than aim for a perfect product upfront.
