AI QA Interviews Are Testing Systems Thinking, Not Just Frameworks

I recently had a conversation with Gregory Goldshteyn, a QA engineering leader at Fox Corporation working on streaming playback and quality initiatives across platforms like Apple TV, Roku, and Fire TV.

The biggest takeaway?

The hiring market for AI-QA roles is shifting quickly, and many candidates and recruiters are still focused on the wrong signals.

Ironically, some of the strongest candidates for these roles may not have “AI” all over their resumes.

Why? Because engineering teams are starting to separate people who can prompt models from people who can engineer reliable systems around them.

That distinction came up repeatedly throughout our conversation.

Greg mentioned that a large portion of the market still approaches “AI testing” at a surface level. A polished UI, a GPT integration, or asking ChatGPT to generate Playwright tests is often where the conversation stops. But once teams start discussing the infrastructure underneath (eval loops, CI integration, structured outputs, grounding, reliability, and how quality is actually measured in production), the gap becomes obvious.

One line from the conversation stuck with me:

“Engineering beats prompting every time, and that’s the muscle most people haven’t built.”

A lot of engineers trying to transition into AI-focused QA work are still treating the LLM as the product itself instead of one component inside a larger system. Prompting is the easy part. The hard part is everything around the model: tool calling, deterministic harnesses, eval design, drift detection, structured outputs, CI integration, and building systems that continue working once the demo environment disappears.
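To make “structured outputs” and “deterministic harnesses” a little more concrete, here is a minimal Python sketch of the idea as I understand it: the model's reply is treated as untrusted input and checked against a fixed contract before anything downstream uses it. The `Verdict` type and its field names (`test_id`, `passed`, `confidence`) are my own hypothetical choices for illustration, not something Greg described.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """Hypothetical contract for what a model must return about one check."""
    test_id: str
    passed: bool
    confidence: float


def parse_verdict(raw: str) -> Verdict:
    """Parse model output strictly; raise ValueError on anything off-contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    test_id = data.get("test_id")
    passed = data.get("passed")
    confidence = data.get("confidence")

    if not isinstance(test_id, str) or not test_id:
        raise ValueError("test_id must be a non-empty string")
    if not isinstance(passed, bool):
        raise ValueError("passed must be true or false")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")

    return Verdict(test_id=test_id, passed=passed, confidence=float(confidence))
```

The point of a wrapper like this is that the rest of the pipeline never sees free-form prose. A reply that drifts off-contract fails loudly in CI instead of being silently misread.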

That shift also changes how teams think about testing.

Traditional QA lives in a pass or fail world. Assert equals. Deterministic outputs. AI systems introduce probability and confidence scoring into the equation. Instead of asking “did this pass?” teams are increasingly asking:

• Is the system reliable enough?
• Is the model drifting?
• Is this actual signal or just noise?
• How do we evaluate something that is right 97 percent of the time?

That mindset shift is much larger than learning a new framework or adding “AI” to a LinkedIn headline.
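One hedged way to approach the “right 97 percent of the time” question is to stop treating the observed pass rate as a fact and put an interval around it. The Python sketch below uses the Wilson score interval, which is a standard statistical technique and my own choice of illustration rather than something from the conversation. The `required` threshold of 0.95 is an arbitrary assumption.

```python
import math


def wilson_interval(passes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate; z=1.96 is roughly 95% confidence."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= passes <= trials:
        raise ValueError("passes must be between 0 and trials")
    p = passes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def meets_threshold(passes: int, trials: int, required: float = 0.95) -> bool:
    """Pass only if even the pessimistic end of the interval clears the bar."""
    lower, _upper = wilson_interval(passes, trials)
    return lower >= required
```

Under these assumptions, 97 passes out of 100 runs does not clear a 0.95 bar, because the lower bound sits below it, while 970 out of 1,000 does. Same headline number, different conclusion, which is roughly what “signal or noise” means in practice.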

Another point that stood out was around hiring itself.

Traditional QA reqs typically screened for frameworks, languages, and manual versus automation experience. AI-QA hiring is starting to screen for systems thinking, evaluation design, and the ability to reason about non-deterministic behavior.

The interview questions are evolving too.

Less:
“Write a test for this function.”

More:
“Here’s a flaky agent. How would you measure whether it’s actually good?”

According to Greg, that single question separates people who understand production systems from people who only understand demos.
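As a rough illustration of what a first answer to that question might look like, here is a small Python sketch that runs the same cases many times and reports a pass rate per case instead of a single pass or fail. Everything in it is hypothetical: `simulated_agent` is a stand-in that is deliberately wrong about 10 percent of the time, and a real harness would call the actual agent and use a more forgiving comparison than string equality.

```python
import random
from typing import Callable

Agent = Callable[[str, random.Random], str]

# Hypothetical evaluation cases: prompt -> expected answer.
CASES = {"2+2": "4", "capital of France": "Paris"}


def simulated_agent(prompt: str, rng: random.Random) -> str:
    """Stand-in for a flaky agent: returns the right answer about 90% of the time."""
    return CASES[prompt] if rng.random() < 0.9 else "unsure"


def measure(agent: Agent, cases: dict[str, str], runs: int, seed: int = 0) -> dict[str, float]:
    """Run every case `runs` times and return the observed pass rate per case."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    rng = random.Random(seed)  # fixed seed keeps the simulation reproducible
    rates: dict[str, float] = {}
    for prompt, expected in cases.items():
        passes = sum(agent(prompt, rng) == expected for _ in range(runs))
        rates[prompt] = passes / runs
    return rates


def below_bar(rates: dict[str, float], minimum: float) -> list[str]:
    """List the cases whose pass rate falls under the agreed minimum."""
    return sorted(prompt for prompt, rate in rates.items() if rate < minimum)


if __name__ == "__main__":
    rates = measure(simulated_agent, CASES, runs=200)
    for prompt, rate in rates.items():
        print(f"{prompt!r}: {rate:.1%} over 200 runs")
    print("below 95%:", below_bar(rates, minimum=0.95))
```

The design choice worth noticing is that the unit of measurement becomes a rate over repeated runs, tracked per case, so a regression shows up as a number moving rather than as one unlucky red build.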

There was also an interesting observation about recruiting and resume screening.

A lot of hiring workflows are still keyword matching around “AI” and “ML,” but some of the strongest candidates are actually engineers with deep infrastructure experience who understand feedback loops, measurement, reliability, and production systems, even if they’ve never labeled themselves as “AI engineers.”

That matters because generating test cases is often the easiest part.

The difficult part is grounding the model in the actual state of the application, designing useful interactions, handling app changes, building self-healing mechanisms, and integrating everything without requiring constant human intervention.

One of the most practical pieces of advice Greg shared for job seekers was simple:

Build something real.

Not a notebook. Not a prompt library. Not another “ChatGPT generated test case” demo.

Build a small end-to-end system that can actually catch a failure reliably and explain how you measure whether it’s working correctly.

That’s the type of work hiring managers seem to be paying attention to now.

Big shoutout to Gregory for taking the time to share insight into both the engineering and hiring sides of the market. It was a really interesting perspective on where QA, AI infrastructure, and evaluation systems are headed next. I highly recommend checking out the coaching and coursework initiatives he’s building around AI testing and production-grade QA systems.

Traditional QA interviews tested frameworks. AI-QA interviews are starting to test systems thinking.
