Bringing AI solution to the end-user

The most valuable insight in this technical deep dive isn't about the latest model architecture, but a stark admission that often gets buried in hype: "An AI solution is useless until it's deployed. Period." While much of the industry chases theoretical performance, this piece from NO BS AI cuts through the noise to expose the gritty reality of getting a system running in the real world without bankrupting the client. It is a rare, grounded look at the "last mile" of artificial intelligence, where the magic of the lab often crashes into the hard constraints of budget and legacy software.

The Reality of Deployment

The article opens by dismantling the romanticized view of AI development, arguing that the "most exciting part" of experimenting with technology is often overshadowed by the "most daunting—and often overlooked—challenge" of production. NO BS AI reports, "In theory, the RAG (Retrieval-Augmented Generation) space offers ready-to-use building blocks for deployment. However, in our experience, they lack the flexibility we need." This is a crucial distinction for any organization looking to adopt these tools; the off-the-shelf solutions often fail because they cannot handle the "messy knowledge base" or integrate with specific tools like HubSpot that businesses already rely on.
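To make the "messy knowledge base" problem concrete, here is a toy sketch of custom preprocessing plus retrieval. This is not the team's actual pipeline; the cleaning rules are hypothetical, and a naive keyword-overlap score stands in for the vector similarity a real RAG system would use. The point is that the cleaning step, which off-the-shelf components skipped, is what makes retrieval usable at all.

```python
import re

def clean_chunk(raw: str) -> str:
    """Normalize a messy knowledge-base fragment: strip leftover HTML
    tags and collapse runs of whitespace (hypothetical rules)."""
    text = re.sub(r"<[^>]+>", " ", raw)       # drop HTML remnants
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return text

def score(query: str, chunk: str) -> float:
    """Naive keyword-overlap score standing in for embedding similarity."""
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / max(len(q), 1)

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Clean every chunk, then return the k best-scoring ones."""
    cleaned = [clean_chunk(c) for c in chunks]
    return sorted(cleaned, key=lambda c: score(query, c), reverse=True)[:k]

# Tiny simulated knowledge base with the kind of markup debris
# that trips up a too-basic off-the-shelf RAG component.
kb = ["<p>Refunds are processed   within 14 days.</p>",
      "<div>Shipping to the EU takes 3-5 business days.</div>"]
print(retrieve("how long do refunds take", kb, k=1))
```

Swapping the overlap score for real embeddings changes the ranking function, not the shape of the pipeline; the preprocessing stays bespoke either way.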

The piece details a specific failure mode where standard components "returned incorrect answers to the questions sent to customer support," falling "well below the acceptance threshold." This highlights a critical gap in the current market: the tools are too rigid for complex, real-world data. The editors note that the project would have failed entirely if they had stuck with these generic frameworks. Instead, they made a conscious choice to build custom logic, prioritizing "safety and savings" over the convenience of a black-box solution.

"We opted to deploy our own custom code rather than relying on frameworks. We find that many frameworks lack transparency, making it difficult to understand what's happening under the hood."

This stance is a refreshing counter-narrative to the "no-code" movement. By rejecting opaque frameworks, the team maintained full control, ensuring that the system could actually handle the specific volume of 500–600 emails per month without unnecessary bloat. Critics might argue that building from scratch increases the initial development burden and creates long-term maintenance debt, but the authors justify this by noting that "premature optimizations were made for future scaling that may never be needed." In an era of over-engineering, this restraint is a strategic asset.

The Economics of Intelligence

Perhaps the most striking section of the coverage is the explicit focus on cost constraints as a primary design driver. The team set a hard ceiling: "500 dollars per month of fixed costs is the upper limit." This is not a theoretical exercise; it is a survival strategy for small businesses where "a couple of hundred dollars per month can be substantial in the budget." NO BS AI points out that "Azure deployments can be expensive," and many developers mistakenly assume their costs will be dominated by the AI model itself, when in reality, "the costs of some Azure tools can be surprisingly high" due to the infrastructure required to run them.
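The $500 ceiling is easy to operationalize as a running check. The line items below are purely hypothetical (the article does not publish a cost breakdown); the sketch only illustrates treating the budget as a hard constraint rather than an aspiration.

```python
MONTHLY_BUDGET_USD = 500  # the hard ceiling stated in the article

# Hypothetical fixed-cost line items; the real breakdown is not published.
fixed_costs = {
    "vpn_gateway": 140.0,
    "container_instances": 120.0,
    "functions_plan": 20.0,
    "storage_and_misc": 40.0,
}

total = sum(fixed_costs.values())
headroom = MONTHLY_BUDGET_USD - total
print(f"fixed total: ${total:.2f}, headroom: ${headroom:.2f}")
assert total <= MONTHLY_BUDGET_USD, "architecture exceeds the fixed-cost ceiling"
```

A check like this belongs in a design review or CI script, so any proposed infrastructure change is priced before it ships.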

The article outlines a series of architectural trade-offs designed to keep the system within this budget. For instance, they chose to use polling via Azure Durable Functions rather than event-driven solutions, acknowledging that "polling introduces latency" but accepting it as a necessary compromise for cost efficiency. Similarly, they hosted their vector database on Azure Container Instances (ACI) despite it being "relatively expensive for persistent workloads," because it offered the only viable path to run a containerized database within a secure Virtual Network (VNet).
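The polling trade-off can be sketched outside Azure. This is not the Durable Functions API, just the generic pattern it implements: on each cycle, fetch anything new and process it, accepting that the worst-case added latency is one full polling interval. The interval and the in-memory inbox are assumptions for illustration.

```python
POLL_INTERVAL_S = 300  # hypothetical 5-minute cadence

def poll_once(fetch_new_emails, handle):
    """One polling cycle: fetch whatever arrived since the last run
    and hand each item to the processing pipeline. Worst-case added
    latency for any single email is one POLL_INTERVAL_S."""
    for email in fetch_new_emails():
        handle(email)

# Simulated run: an in-memory inbox stands in for the HubSpot API.
inbox = [{"id": 1, "subject": "Refund request"}]
handled = []
poll_once(lambda: inbox, handled.append)
print(f"processed {len(handled)} email(s); max added latency {POLL_INTERVAL_S}s")
```

At a few hundred emails per month, a five-minute delay is invisible to users, which is exactly why the team could trade latency for the lower fixed cost of polling.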

"While the VPN Gateway is expensive, it was a necessary trade-off for security and compliance."

This admission underscores a vital truth: security and cost are often in direct tension. The piece argues that while some choices "come with limitations, they were made consciously with the current operational scope in mind." By interviewing end-users to understand their workflow, the team ensured the technology didn't disrupt human operations, creating a system where agents could review AI-generated responses before sending them. This human-in-the-loop approach is a pragmatic solution to the "correctness" problem, acknowledging that AI is an assistant, not a replacement, in high-stakes customer support.
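The human-in-the-loop gate described above reduces to a small state machine: the AI produces a draft, the draft sits in "pending_review", and only an explicit agent approval makes it sendable. A minimal sketch, with all names hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Draft:
    email_id: int
    ai_reply: str
    status: str = "pending_review"  # the agent must approve before sending

def approve(draft: Draft) -> Draft:
    """Agent reviews the AI-generated reply and signs off on it."""
    draft.status = "approved"
    return draft

outbox: list[Draft] = []

def send(draft: Draft) -> None:
    """Hard gate: unreviewed AI output can never reach a customer."""
    assert draft.status == "approved", "never send unreviewed AI output"
    outbox.append(draft)

d = Draft(1, "Hi, your refund is on its way.")
send(approve(d))
print(len(outbox), outbox[0].status)
```

Making the gate an invariant in code, rather than a policy in a wiki, is what turns "AI as assistant, not replacement" from a slogan into a system property.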

Bottom Line

The strongest part of this argument is its unflinching focus on the economic and operational realities of AI deployment, proving that a "robust, yet cost-efficient system" is possible without over-engineering. Its biggest vulnerability lies in the assumption that custom-built solutions are always the answer; for larger enterprises, the maintenance overhead of bypassing established frameworks could eventually outweigh the initial savings. Readers should watch for how this "safety and savings" philosophy scales as the volume of data grows beyond the current 600-email threshold.

Sources

Bringing AI solution to the end-user

by Various · NO BS AI

The most exciting part? Experimenting with AI and developing the core technical solution.

The most daunting—and often overlooked—challenge? Deploying it to production.

An AI solution is useless until it’s deployed. Period. That’s why we knew from the start that we needed MLOps expertise to deliver value from end to end.

In theory, the RAG (Retrieval-Augmented Generation) space offers ready-to-use building blocks for deployment. However, in our experience, they lack the flexibility we need:

No built-in integration with HubSpot, our client's tool of choice.

The off-the-shelf RAG component was too basic and didn’t account for our required preprocessing and messy knowledge base.

The architecture was rigid, limiting our ability to customize it for the client’s needs.

We tested off-the-shelf solutions, but they simply didn't work: they returned incorrect answers to the questions sent to customer support. The correctness level was well below the customer's acceptance threshold, and the project would have failed if we had not moved off the ready-made solutions.

After reading this article, you will have a clear idea of how to deploy a RAG application that is more complex than the basic tutorial examples.

We outline our architectural decisions, trade-offs, and the rationale behind them, demonstrating that while some choices come with limitations, they were made consciously with the current operational scope in mind.

Our goal was to create a robust, yet cost-efficient system tailored to the customer's needs while avoiding overengineering.

The system was designed with a clear set of assumptions to ensure efficiency, cost-effectiveness, and rapid implementation.

With an estimated volume of 500–600 emails per month, the focus was on scaling to meet this demand rather than over-engineering for hypothetical future growth. This strategic decision helped avoid unnecessary costs and complexity while ensuring a robust and stable system. The primary goal was to rapidly implement and validate the system’s usefulness, prioritizing deployment speed and real-world client feedback over building a long-term, future-proof solution.
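The stated volume assumption is worth a back-of-the-envelope check, since it drives every scaling decision above. Using the upper end of the estimate and a hypothetical five-minute polling cadence:

```python
# Back-of-the-envelope load check for the stated volume assumption.
emails_per_month = 600   # upper end of the article's 500-600 estimate
poll_interval_min = 5    # hypothetical polling cadence

polls_per_month = 30 * 24 * 60 // poll_interval_min
emails_per_day = emails_per_month / 30
emails_per_poll = emails_per_month / polls_per_month

print(f"{emails_per_day:.0f} emails/day, ~{emails_per_poll:.3f} per poll cycle")
```

At roughly 20 emails a day, almost every poll comes back empty, which is why optimizing for hypothetical future throughput would have been pure over-engineering.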

A key principle was to avoid over-engineering: no premature optimizations were made for future scaling that may never be needed. Instead, we made conscious technological choices with two principles in mind: safety and savings.

It turns out that Azure deployments can be expensive and if you are a small business a couple of hundred dollars per month can be substantial in your budget. We agreed that 500 dollars per month of fixed costs is the upper limit. It meant that we ...