Using AI in Technical Assessments: Why One QA Candidate Failed

I had a really interesting conversation this week with Ryan Rossbach while we went through LoopQA technical evaluations and reviewed one candidate’s Playwright project submission.

And honestly, one part of the conversation stuck with me more than anything else.

One candidate openly admitted they used AI heavily throughout the assessment. To be clear, that is completely allowed. Candidates are explicitly encouraged to use AI during the exercise.

The surprising part was this:

They seemed genuinely confused that they didn’t pass.

From their perspective, the project worked.

The tests ran. The walkthrough video showed everything functioning. Nothing was visibly “broken.”

But Ryan made a point during the discussion that I think a lot of people entering this AI-assisted development era are going to run into very quickly:

AI wrote it. But it’s still your code.

That ended up turning into a much bigger conversation around what the LoopQA technical evaluation is actually testing, why certain candidates fail it, and why “working code” and “good engineering” are not the same thing.

What The LoopQA Technical Evaluation Actually Is

The assessment itself is honestly pretty straightforward.

Candidates receive:

A target website
Login credentials
A set of test cases
Instructions to automate them using Playwright

But there’s an important caveat baked into the exercise:

The provided scenarios are not supposed to be treated like isolated one-off tests.

Candidates are specifically told to structure the framework in a reusable, scalable way because more test cases could, in theory, be added later.

One of the major requirements is that the tests be data-driven.

Meaning:

test data should live separately from the test logic
scenarios should be extendable
future engineers should be able to add coverage without rewriting the framework itself
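
To make that concrete, here is a minimal TypeScript sketch of the separation. Everything in it is an assumption for illustration: the `TagScenario` shape, the sample rows, and the `defineTagTests` helper are hypothetical and not taken from the actual LoopQA assessment. The `register` callback stands in for a test runner so the sketch runs without Playwright installed.

```typescript
// Hypothetical scenario shape; the real assessment's fields are not shown in this post.
interface TagScenario {
  title: string;          // human-readable test name
  itemName: string;       // which item on the page to inspect
  expectedTags: string[]; // the exact tags that item should carry
}

// Data layer: invented sample rows. In a real project these would usually
// live in their own file (for example a JSON file) next to the specs.
const scenarios: TagScenario[] = [
  {
    title: "Example item A shows its tags",
    itemName: "Example item A",
    expectedTags: ["Feature", "High Priority"],
  },
  {
    title: "Example item B shows its tags",
    itemName: "Example item B",
    expectedTags: ["Bug"],
  },
];

// Stand-in for a test runner's registration function (an assumption).
type RegisterTest = (title: string, body: () => Promise<void>) => void;

// Logic layer: written once, and left alone when a scenario is added.
function defineTagTests(
  data: TagScenario[],
  register: RegisterTest,
  check: (scenario: TagScenario) => Promise<void>,
): void {
  for (const scenario of data) {
    register(scenario.title, () => check(scenario));
  }
}
```

In a real Playwright spec, the loop would typically call `test(scenario.title, async ({ page }) => { ... })` directly. Either way, adding coverage means adding a row to `scenarios`, not writing a new test.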

Ryan made a point during our conversation that I thought was important:

This is not some overly academic engineering puzzle. It mirrors real-world QA automation work pretty closely. Anyone who has worked in automation long enough has probably solved some version of this exact problem before.

Which honestly is part of why the evaluation works so well.

It exposes who understands scalable automation design versus who is just trying to get something running fast enough to submit.

Why Candidates Commonly Fail

According to Ryan, there are a few recurring patterns that show up over and over again in weaker submissions.

They Ignore The Data-Driven Requirement

This one apparently happens constantly.

Candidates hardcode test data directly into the test logic even though the instructions explicitly say not to.

The framework might technically work, but the structure immediately signals maintainability problems.

The whole point is to design something another engineer can extend later without ripping apart the implementation.

They Miss The Actual Verification Requirement

Another common issue:

Candidates automate navigation correctly but skip the most important validation step entirely.

Part of the exercise requires validating exact tag sets associated with specific items on the page.

Ryan mentioned candidates will often:

find the right page
find the right item
but never actually validate the tags themselves

Which is the actual point of the test case.

Again, the code “runs.” But the test itself is incomplete.
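
A rough sketch of what “validating the exact tag set” can mean in code is below. The helper names (`tagSetDiff`, `assertExactTags`) are my own, and I am assuming tags are compared as plain strings with order ignored. In a Playwright test, the `actual` list would come from the page, for example from something like `locator.allTextContents()`.

```typescript
interface TagDiff {
  missing: string[];    // expected but not found on the page
  unexpected: string[]; // found on the page but not expected
}

// Compares two tag lists as sets: order is ignored, but any missing
// or extra tag shows up in the result. Whitespace is trimmed first.
function tagSetDiff(actual: string[], expected: string[]): TagDiff {
  const actualTags = actual.map((tag) => tag.trim());
  const expectedTags = expected.map((tag) => tag.trim());
  const actualSet = new Set(actualTags);
  const expectedSet = new Set(expectedTags);
  return {
    missing: expectedTags.filter((tag) => !actualSet.has(tag)),
    unexpected: actualTags.filter((tag) => !expectedSet.has(tag)),
  };
}

// Fails loudly when the sets differ, naming exactly what is wrong.
function assertExactTags(actual: string[], expected: string[]): void {
  const { missing, unexpected } = tagSetDiff(actual, expected);
  if (missing.length > 0 || unexpected.length > 0) {
    throw new Error(
      `Tag mismatch. Missing: [${missing.join(", ")}]. Unexpected: [${unexpected.join(", ")}].`,
    );
  }
}
```

A test that finds the right item but never calls something like `assertExactTags` passes no matter which tags the page shows, which is exactly the gap Ryan was describing.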

They Use AI Without Reviewing The Output

This was probably the most interesting part of the conversation.

Ryan said he can often tell when candidates leaned heavily on AI but never really reviewed or refined what it produced.

And to be fair, the issue usually is not that the code completely fails.

The issue is that it lacks engineering maturity.

Things like:

no page object model
no separation of concerns
brittle locator strategies
poor maintainability
no meaningful comments
duplicated logic everywhere

One thing Ryan said really stood out to me:

Just because it works doesn’t mean it’s done well.

That feels like one of the defining engineering conversations of the AI era right now.

The Candidate Submission We Reviewed

Earlier this week, I had Ryan review a junior candidate’s submission and walk me through his feedback live.

To the candidate’s credit, everything worked.

That already puts them ahead of a decent number of technical submissions people see in hiring processes.

But once Ryan started reviewing the structure itself, several issues surfaced pretty quickly.

The Test Cases Were Embedded Directly Inside The Tests

The first thing Ryan pointed out was that the scenarios themselves lived in the same files as the test logic.

His recommendation was simple:

Pull the test cases into fixtures or separate test case files entirely.

That way, if another engineer wants to add a new scenario later, they only update the data layer instead of modifying the implementation itself.

That’s scalable framework thinking.

There Was No Page Object Model

For people outside QA automation:

A Page Object Model (POM) is basically a reusable abstraction layer for UI interactions.

Instead of redefining selectors and page actions inside every test, you centralize them into reusable page files.

In the reviewed submission:

login URLs were hardcoded
locators lived directly in the tests
reusable interactions weren’t abstracted

Ryan explained that if the UI changes later, you now have to update every individual test instead of one centralized file.

Again:
the code worked.

But long-term maintainability was weak.
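
For readers who have not seen one, a page object for the login flow might look roughly like this. It is a sketch, not the candidate’s code or Ryan’s: the URL and selectors are placeholders, and `PageLike` is a hypothetical stand-in for the small part of Playwright’s `Page` that the class uses, so the example compiles on its own.

```typescript
// Minimal structural stand-in for the Playwright Page methods used below.
// A real Playwright `Page` should satisfy this shape.
interface PageLike {
  goto(url: string): Promise<unknown>;
  fill(selector: string, value: string): Promise<void>;
  click(selector: string): Promise<void>;
}

class LoginPage {
  // Placeholder URL and selectors; the real ones depend on the site under test.
  private readonly url = "https://example.test/login";
  private readonly usernameInput = "#username";
  private readonly passwordInput = "#password";
  private readonly submitButton = "button[type='submit']";
  private readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
  }

  // The one place that knows how logging in works.
  async login(username: string, password: string): Promise<void> {
    await this.page.goto(this.url);
    await this.page.fill(this.usernameInput, username);
    await this.page.fill(this.passwordInput, password);
    await this.page.click(this.submitButton);
  }
}
```

A test then calls `new LoginPage(page).login(username, password)` and never mentions a URL or selector. If the login form changes, only this class changes.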

The Locator Logic Was Brittle

This was one of my favorite examples from the conversation because it perfectly captures the larger issue.

The candidate repeatedly used .first() in their locator strategy because the desired item happened to appear first during that particular execution.

Meaning: the tests passed.

But if the ordering changes tomorrow, the automation either breaks or silently validates the wrong item.

That’s exactly the kind of thing experienced QA engineers look for immediately.
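
The difference is easy to show without a browser. The sketch below models the items on the page as a plain array; the `Card` shape and both helper functions are hypothetical. In Playwright itself, the sturdier version usually means narrowing the locator by something that identifies the item, for example with `locator.filter({ hasText: ... })`, instead of reaching for `.first()`.

```typescript
// Hypothetical model of an item rendered on the page.
interface Card {
  title: string;
  tags: string[];
}

// Brittle: returns whatever happens to be rendered first.
function firstCard(cards: Card[]): Card | undefined {
  return cards[0];
}

// Sturdier: selects by an identifying attribute and fails loudly
// when zero or several cards match, instead of guessing.
function cardByTitle(cards: Card[], title: string): Card {
  const matches = cards.filter((card) => card.title === title);
  if (matches.length !== 1) {
    throw new Error(`Expected exactly one card titled "${title}", found ${matches.length}.`);
  }
  return matches[0];
}

// Invented sample data: the same two cards in two different orders.
const originalOrder: Card[] = [
  { title: "Example item A", tags: ["Feature"] },
  { title: "Example item B", tags: ["Bug"] },
];
const reordered: Card[] = [originalOrder[1], originalOrder[0]];
```

With `originalOrder`, both helpers return “Example item A”. With `reordered`, `firstCard` quietly returns “Example item B”, while `cardByTitle(reordered, "Example item A")` still returns the right card.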

The Bigger AI Conversation

One thing I asked Ryan directly was whether these issues came from poor AI usage or lack of underlying engineering knowledge.

His answer was basically:
probably the second one.

AI can absolutely help accelerate development.

But getting strong output still requires enough technical knowledge to:

guide the architecture
recognize bad patterns
request better abstractions
identify maintainability problems
evaluate whether the generated solution is actually good

Ryan mentioned that when he approached the same evaluation himself, one of the first things he instructed the AI to build was the Page Object Model structure, before even touching the tests.

That difference in prompting alone changes the quality of the final framework significantly.

And honestly, I think that’s the real conversation the industry is starting to have now.

Not:
“Did you use AI?”

Almost everyone does at this point.

The more important question is:
Can you evaluate what AI produced?

Because shipping generated code you don’t fully understand is probably going to create the same kinds of quality gaps we discussed throughout this entire review session.

Big thanks again to Ryan Rossbach for taking the time to walk through the evaluations and the broader engineering discussion around AI-assisted development. Honestly one of the more educational QA conversations I’ve had in a while.
