Why QA Might Be the Most Important Team in AI Right Now

A Conversation with JB Mesquita (n8n Ambassador) on Testing, n8n, and the Future of AI Workflows

I’ve had to watch over the years as QA took a backseat to development. In terms of perception, QA was often treated like the underdog: the last stop in the software development lifecycle, the final checkbox before production. But AI is changing that dynamic fast.

When your systems are powered by probabilistic outputs instead of deterministic logic, validation becomes the product. Suddenly the question isn’t just “Did it run?” but “Did it produce the right outcome?” And with that shift, quality engineering has moved front and center.

Honestly? I’m here for it.

I recently sat down with JB, an engineer @ Ovidius AI, whose background spans software development, data engineering, and AI automation. What started as a conversation about n8n quickly turned into a much bigger discussion around observability, testing AI systems, and why the future of engineering may belong to the people who know how to validate what AI produces.

The Rise of n8n and Why Engineers Are Moving Toward It

At the center of JB’s workflow stack is n8n, an open-source workflow orchestration platform that’s rapidly gaining traction in the AI automation space.

Unlike older automation tools like Zapier or Make, JB believes n8n was built with AI-native workflows in mind rather than having AI added later as a feature layer.

For non-technical users, tools like Zapier made automation approachable. But for power users building production-grade systems, JB argues the cracks start to show quickly: debugging limitations, connector dependency, pricing models tied to node executions, and lack of visibility into what’s actually happening inside a workflow.

n8n solves a lot of that.

“You can inspect everything node by node,” JB explained. “You can pin outputs, freeze data at certain steps, and see exactly what’s flowing through the system.”

That visibility matters more than ever in AI workflows where outputs are variable by nature.

JB also pointed to n8n’s flexibility as a major differentiator. While many automation platforms rely heavily on official integrations, modern AI workflows increasingly revolve around APIs, MCPs, and raw HTTP requests. In other words: if a tool exposes an endpoint, n8n can probably connect to it.

And for advanced users, self-hosting changes the economics entirely.

JB self-hosts n8n using Docker, which removes execution limits and gives him more control over logging, scaling, and security. That level of ownership is something he believes more engineers need to think about as AI systems become deeply embedded into company operations.

As someone with a non-technical background, I’ve personally experimented with both Make and n8n and I understood exactly what JB meant.

I’ve tested both the free cloud version of n8n and a locally hosted setup using Docker, which, honestly, was a pain for me to get running at first. I also spent a lot of time watching tutorials from creators like Ben van Sprundel @ BenAI92. But one thing JB said really stuck with me: watching someone build a workflow and actually building one yourself are two completely different skills.

n8n is definitely more technical than Make, but that tradeoff comes with flexibility. You can go much deeper with it once you understand the architecture and logic behind the workflows. The last time I used it, n8n had an AI assistant built into the platform that helped guide workflow creation, which made the learning curve feel less intimidating. Using that, I was able to build a few simple outsourced staffing sales workflows for account reps and actually get them running well.

If you’re curious about AI automation and willing to experiment a little, I’d genuinely recommend checking it out. My friends love it.

Why AI Changes the Entire Testing Conversation

One of the most interesting parts of our conversation was JB’s framing of how testing changes once AI enters the stack.

Traditional software automation is deterministic. If X happens, then Y should happen. Inputs and outputs are predictable.

AI breaks that model.

The same prompt can generate different outputs depending on the LLM, the configuration settings, model updates, temperature tuning, or even subtle context changes.

That means testing can’t just focus on whether a workflow executed successfully. It has to evaluate whether the result itself was correct, useful, safe, or aligned with expectations.
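To make that concrete, here is a minimal Python sketch of checking properties of an output rather than matching exact text. It isn’t JB’s setup: the function name `evaluate_summary`, the JSON shape, and the word limit are all assumptions made up for illustration.

```python
import json

def evaluate_summary(raw_output, required_keys=("summary", "action_items"), max_words=50):
    """Return a list of failed checks; an empty list means the output passed.

    The expected JSON shape and the limits are hypothetical examples.
    """
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]

    failures = []
    for key in required_keys:
        if key not in data:
            failures.append(f"missing key: {key}")

    summary = data.get("summary", "")
    if not isinstance(summary, str) or not summary.strip():
        failures.append("summary is empty")
    elif len(summary.split()) > max_words:
        failures.append("summary is too long")
    return failures

# Two differently worded outputs for the same prompt can both pass,
# because the checks target properties rather than exact wording.
output_a = '{"summary": "Invoice approved by finance.", "action_items": ["Send receipt"]}'
output_b = '{"summary": "Finance signed off on the invoice.", "action_items": []}'
output_bad = '{"summary": ""}'

for candidate in (output_a, output_b, output_bad):
    print(evaluate_summary(candidate))
```

Checks like these only cover structure and basic sanity. Whether a result is useful or safe usually needs richer evaluation on top.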

That’s where observability comes in.

JB repeatedly emphasized the importance of logging everything: execution history, outputs, errors, retries, token usage, timestamps, and behavioral drift over time. His workflow architecture reflects his data engineering background: every automation is designed with monitoring in mind.

He uses Supabase to store execution logs from his workflows, creating a searchable history of failures and outputs that can even be fed back into Claude for debugging and iteration.

“You can show Claude the logs and create a new version,” he said.
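As a rough illustration of what “logging everything” can look like, the Python sketch below writes one row per node execution and then pulls back the failures. It uses an in-memory SQLite database purely as a stand-in for a Supabase table so it runs anywhere. The table name, columns, and function names are my assumptions, not JB’s actual schema.

```python
import json
import sqlite3
from datetime import datetime, timezone

def open_log_store():
    # In-memory SQLite stands in for a hosted Postgres table (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE execution_logs (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               workflow TEXT NOT NULL,
               node TEXT NOT NULL,
               status TEXT NOT NULL,
               attempt INTEGER NOT NULL,
               tokens_used INTEGER,
               output TEXT,
               error TEXT,
               logged_at TEXT NOT NULL
           )"""
    )
    return conn

def log_execution(conn, workflow, node, status, attempt=1,
                  tokens_used=None, output=None, error=None):
    # Store the output as JSON text so it stays searchable later.
    conn.execute(
        "INSERT INTO execution_logs "
        "(workflow, node, status, attempt, tokens_used, output, error, logged_at) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (workflow, node, status, attempt, tokens_used,
         json.dumps(output) if output is not None else None,
         error, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()

def recent_failures(conn, workflow):
    # The kind of history you could hand to an assistant when debugging.
    return conn.execute(
        "SELECT node, attempt, error FROM execution_logs "
        "WHERE workflow = ? AND status = 'error' ORDER BY id",
        (workflow,),
    ).fetchall()

conn = open_log_store()
log_execution(conn, "meeting-digest", "summarize", "success",
              tokens_used=412, output={"summary": "ok"})
log_execution(conn, "meeting-digest", "send_email", "error",
              attempt=2, error="timeout")
print(recent_failures(conn, "meeting-digest"))
```

The design choice worth noticing is that retries, token usage, and timestamps are separate columns, so drift over time can be queried instead of dug out of free-text logs.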

That loop (build, test, log, refine) is becoming the new normal in AI engineering.

And according to JB, QA can no longer exist as a separate final-stage process.

“QA is almost walking hand by hand with development now,” he told me. “I can’t build something and only test it when I move to production.”

The Problem With “Just Throw It Into Claude”

One topic we spent time on was the growing trend of people trying to “vibe code” entire systems directly inside tools like Claude Code or Cursor without understanding the underlying architecture.

JB doesn’t dismiss AI coding tools at all. He uses Claude every single day.

But he strongly pushes back against the idea that prompting alone replaces engineering.

His concern isn’t whether AI can generate code. It’s whether the human behind the keyboard understands what the AI might be missing.

Security gaps. Authentication issues. Retry handling. Logging. CI/CD considerations. Data privacy laws like GDPR and LGPD. Edge cases. Long-term maintainability.

AI won’t automatically protect you from problems you don’t know to ask about.

One of the sharpest moments in the conversation came when JB asked:

“Who is driving the car? Are you ready to maintain that car?”

That’s really the heart of the debate.

The issue isn’t whether Claude is powerful. It’s whether organizations are building systems they can realistically support six months from now after the original builder leaves.

JB’s answer isn’t anti-AI. It’s a hybrid architecture.

“The sweet spot is to combine both,” he said. “Use Claude as the brain to manage workflows but use n8n as the backbone.”

In practice, that means giving Claude MCP access to an n8n environment so the AI can help create, edit, and test workflows in natural language while n8n handles execution, orchestration, logging, retries, and observability.

AI provides the speed. The workflow infrastructure provides the reliability.

How JB Tests n8n Workflows

JB’s testing methodology was one of the most practical parts of the conversation.

His process starts with node-by-node validation during development. Instead of running an entire workflow and hoping for the best, he pins outputs at each step and validates what’s moving through the pipeline before continuing downstream.

That prevents cascading failures and makes debugging dramatically easier.
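The same idea can be sketched outside n8n. In the Python example below, a fixed sample input plays the role of pinned data, and each step’s output is validated before the next step runs. The step names, checks, and sample records are hypothetical. In n8n itself you would pin data in the editor rather than write code like this.

```python
def require(condition, message):
    # Raise instead of using assert so the check still runs under `python -O`.
    if not condition:
        raise ValueError(message)

def extract_emails(raw):
    # Step 1: keep only records that have an email field.
    return [r for r in raw if r.get("email")]

def normalize(records):
    # Step 2: strip whitespace and lowercase each address.
    return [{**r, "email": r["email"].strip().lower()} for r in records]

def check_extracted(records):
    require(records, "extract_emails produced no records")

def check_normalized(records):
    require(
        all(r["email"] == r["email"].strip().lower() for r in records),
        "email not normalized",
    )

# A frozen sample input, playing the role of pinned data.
PINNED_INPUT = [{"email": " Ana@Example.com "}, {"name": "no email"}]

def run_pipeline(raw):
    data = raw
    for step, check in [(extract_emails, check_extracted),
                        (normalize, check_normalized)]:
        data = step(data)
        check(data)  # stop here, before bad data reaches the next step
    return data

result = run_pipeline(PINNED_INPUT)
print(result)
```

Because the input is frozen, a failure points at one specific step instead of at the workflow as a whole.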

From there, every workflow feeds execution logs into Supabase for long-term monitoring.

He also stressed the importance of configuring retry logic and explicit error handling, something many builders skip entirely.

According to JB, silent failures are one of the biggest risks in production automation today.

If an API times out, a token expires, or a downstream system changes format, workflows can quietly stop producing results while appearing operational on the surface.

His philosophy is simple: never let workflows fail invisibly.
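Here is one way that philosophy might look in code: a small Python retry wrapper that reports every failed attempt and raises loudly once retries run out. This is a generic sketch, not n8n’s built-in retry settings. `run_with_retries`, `WorkflowStepError`, and the flaky call are names I made up for the example.

```python
import time

class WorkflowStepError(Exception):
    """Raised when a step still fails after all retries."""

def run_with_retries(step, max_attempts=3, base_delay=1.0, on_failure=print):
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception as exc:
            last_error = exc
            # Report every failed attempt so nothing fails silently.
            on_failure(f"attempt {attempt}/{max_attempts} failed: {exc}")
            if attempt < max_attempts:
                # Exponential backoff: base_delay, then 2x, then 4x, ...
                time.sleep(base_delay * 2 ** (attempt - 1))
    # Out of retries: fail loudly instead of returning an empty result.
    raise WorkflowStepError(
        f"step failed after {max_attempts} attempts"
    ) from last_error

# Hypothetical flaky call: times out twice, then succeeds.
calls = {"count": 0}

def flaky_api_call():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("API timed out")
    return {"status": "ok"}

messages = []
# base_delay=0 keeps the demo instant; a real workflow would wait between tries.
result = run_with_retries(flaky_api_call, base_delay=0, on_failure=messages.append)
print(result, messages)
```

In a real workflow the `on_failure` hook is where an alert or a log write would go, so a human finds out the moment something stops working.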

Advice for Engineers Trying to Learn AI

Toward the end of the conversation, I asked JB what advice he’d give QA engineers or developers trying to break into AI automation.

His answer was refreshingly practical.

Start by building something you’ll actually use.

Not a tutorial project. Not a copied YouTube workflow. Something tied to your real daily life.

For example:

  • Create a Gmail and calendar summary
  • Build an AI-powered meeting digest
  • Automate repetitive reporting tasks
  • Scrape information you regularly search for

“The goal is to hit real blockers,” JB explained. “That’s how you learn.”

From there, he recommends experimenting with different LLMs and configurations. Swap Claude for Gemini. Change the temperature settings. Compare outputs. Learn how model behavior changes.
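If you want a starting point for that kind of comparison, the Python sketch below runs the same prompt several times per temperature setting and counts how many distinct outputs come back. `fake_model` is a stand-in I wrote so the example runs without an API key. In practice you would swap in a real call to whichever model you are testing, and real models may still vary at temperature 0.

```python
import random

def fake_model(prompt, temperature, rng):
    """Stand-in for a real LLM call (hypothetical).

    Assumption of this stub: temperature 0 always returns the same text,
    and anything higher picks among a few phrasings.
    """
    variants = [
        "Meeting moved to 3pm.",
        "The meeting is now at 3pm.",
        "3pm is the new meeting time.",
    ]
    if temperature == 0:
        return variants[0]
    return rng.choice(variants)

def compare_configs(prompt, temperatures, runs=5, seed=0):
    # A seeded generator keeps the demo repeatable from run to run.
    rng = random.Random(seed)
    report = {}
    for temperature in temperatures:
        outputs = [fake_model(prompt, temperature, rng) for _ in range(runs)]
        report[temperature] = {
            "distinct_outputs": len(set(outputs)),
            "samples": outputs,
        }
    return report

report = compare_configs("Summarize: meeting moved to 3pm", temperatures=[0, 1.0])
for temperature, stats in report.items():
    print(temperature, stats["distinct_outputs"])
```

Counting distinct outputs is a crude measure, but it gives you a number to watch while you change one setting at a time.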

That experimentation mindset matters because AI engineering isn’t just about prompts anymore. It’s about evaluation.

And that brings the conversation full circle.

For years, QA was treated as the final checkpoint before release. But in AI systems, validation isn’t the last step. It’s embedded into every step.

The engineers who know how to test, observe, debug, and validate AI behavior may end up becoming some of the most valuable people in the room.

Huge thanks to JB for sitting down with me and openly sharing both the technical side and the real-world lessons behind building AI workflows in production.
