I had a really interesting conversation this week with Ryan while going through LoopQA technical evaluations and reviewing one candidate’s Playwright project submission.
And honestly, one part of the conversation stuck with me more than anything else.
One candidate openly admitted they used AI heavily throughout the assessment. Which, to be clear, is completely allowed. Candidates are literally encouraged to use AI during the exercise.
The surprising part was this:
They seemed genuinely confused that they didn’t pass.
From their perspective, the project worked.
The tests ran. The walkthrough video functioned. Nothing was visibly “broken.”
But Ryan made a point during the discussion that I think a lot of people entering this era of AI-assisted development are going to have to learn very quickly:
AI wrote it. But it’s still your code.
That ended up turning into a much bigger conversation around what the LoopQA technical evaluation is actually testing, why certain candidates fail it, and why “working code” and “good engineering” are not the same thing.
What The LoopQA Technical Evaluation Actually Is
The assessment itself is honestly pretty straightforward.
Candidates receive:
- A target website
- Login credentials
- A set of test cases
- Instructions to automate them using Playwright
But there’s an important caveat baked into the exercise:
The provided scenarios are not supposed to be treated like isolated one-off tests.
Candidates are specifically told to structure the framework in a reusable, scalable way because theoretically more test cases could be added later.
One of the major requirements is that the tests be data-driven.
Meaning:
- test data should live separately from the test logic
- scenarios should be extendable
- future engineers should be able to add coverage without rewriting the framework itself
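Here is a minimal TypeScript sketch that makes the data-driven idea concrete. It is an illustration only, not the actual exercise or anyone’s submission. The Scenario shape, the sample rows, and the registerTest helper are hypothetical. registerTest only stands in for a real test runner so the example runs without a browser. In a Playwright suite, the same loop would call test() once per data row, which is the pattern Playwright’s own docs use for parameterized tests.

```typescript
// Hypothetical scenario shape; field names are illustrative, not from the exercise.
interface Scenario {
  id: string;
  itemName: string;
  expectedTags: string[];
}

// Data layer: in a real project this array would live in its own file.
const scenarios: Scenario[] = [
  { id: "TC-01", itemName: "Item A", expectedTags: ["urgent", "backend"] },
  { id: "TC-02", itemName: "Item B", expectedTags: ["design"] },
];

// Minimal stand-in for a test runner's registration function
// (in Playwright you would call test() here instead).
type TestBody = () => void;
const registry: { title: string; body: TestBody }[] = [];
function registerTest(title: string, body: TestBody): void {
  registry.push({ title, body });
}

// Logic layer: one loop generates one test per data row.
// Adding coverage means adding a row to `scenarios`, not editing this loop.
scenarios.forEach((s) => {
  registerTest(`${s.id}: ${s.itemName} has tags ${s.expectedTags.join(", ")}`, () => {
    // Real steps (log in, open the page, read the item's tags) would go here.
  });
});

registry.forEach((t) => console.log(t.title));
```

The loop never mentions a specific item or tag. A new scenario is one more row in the data, which is exactly the kind of extension the instructions ask candidates to support.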
Ryan stressed something during our conversation that I think is important:
This is not some overly academic engineering puzzle. It mirrors real-world QA automation work pretty closely. Anyone who has worked in automation long enough has probably solved some version of this exact problem before.
Which honestly is part of why the evaluation works so well.
It exposes who understands scalable automation design versus who is just trying to get something running fast enough to submit.
Why Candidates Commonly Fail
According to Ryan, a few patterns show up over and over again in weaker submissions.
They Ignore The Data-Driven Requirement
This one apparently happens constantly.
Candidates hardcode test data directly into the test logic even though the instructions explicitly say not to.
The framework might technically work, but the structure immediately signals maintainability problems.
The whole point is to design something another engineer can extend later without ripping apart the implementation.
They Miss The Actual Verification Requirement
Another common issue:
Candidates automate navigation correctly but skip the most important validation step entirely.
Part of the exercise requires validating exact tag sets associated with specific items on the page.
Ryan mentioned candidates will often:
- find the right page
- find the right item
- but never actually validate the tags themselves
Which is the actual point of the test case.
Again, the code “runs.” But the test itself is incomplete.
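Here is a hedged sketch of what validating an exact tag set can look like. It is again plain TypeScript so it runs on its own. The function names and sample tags are hypothetical, and in a real test the actual array would be read from the page. The important detail is checking both directions. A test that only confirms the expected tags are present would still pass if the item also carried an extra, wrong tag. Because the comparison uses sets, tag order is ignored and duplicate tags collapse into one. If duplicates matter, compare sorted arrays instead.

```typescript
// Compares the tags shown on the page with the expected tags as sets:
// order does not matter, but missing or extra tags both count as failures.
interface TagDiff {
  missing: string[]; // expected but not shown
  unexpected: string[]; // shown but not expected
}

function diffTags(actual: string[], expected: string[]): TagDiff {
  // Trim whitespace so " urgent" and "urgent" compare equal.
  const norm = (tags: string[]) => new Set(tags.map((t) => t.trim()));
  const actualSet = norm(actual);
  const expectedSet = norm(expected);
  return {
    missing: Array.from(expectedSet).filter((t) => !actualSet.has(t)),
    unexpected: Array.from(actualSet).filter((t) => !expectedSet.has(t)),
  };
}

function assertExactTags(itemLabel: string, actual: string[], expected: string[]): void {
  const d = diffTags(actual, expected);
  if (d.missing.length > 0 || d.unexpected.length > 0) {
    throw new Error(
      `${itemLabel}: missing [${d.missing.join(", ")}], unexpected [${d.unexpected.join(", ")}]`,
    );
  }
}

// Passes: same tags, different order.
assertExactTags("Item A", ["backend", "urgent"], ["urgent", "backend"]);

// A "contains the expected tags" check would pass here; the exact-set check fails.
try {
  assertExactTags("Item A", ["urgent", "backend", "legacy"], ["urgent", "backend"]);
} catch (e) {
  console.log((e as Error).message); // Item A: missing [], unexpected [legacy]
}
```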
They Use AI Without Reviewing The Output
This was probably the most interesting part of the conversation.
Ryan said he can often tell when candidates leaned heavily on AI but never really reviewed or refined what it produced.
And to be fair, the issue usually is not that the code completely fails.
The issue is that it lacks engineering maturity.
Things like:
- no page object model
- no separation of concerns
- brittle locator strategies
- poor maintainability
- no meaningful comments
- duplicated logic everywhere
One thing Ryan said really stood out to me:
Just because it works doesn’t mean it’s done well.
That feels like one of the defining engineering conversations of the AI era right now.
The Candidate Submission We Reviewed
Earlier this week, I had Ryan review a junior candidate’s submission and walk me through his feedback live.
To the candidate’s credit:
everything worked.
That already puts them ahead of a decent number of the submissions that come through technical hiring processes.
But once Ryan started reviewing the structure itself, several issues surfaced pretty quickly.
The Test Cases Were Embedded Directly Inside The Tests
The first thing Ryan pointed out was that the scenarios themselves lived in the same files as the test logic.
His recommendation was simple:
Pull the test cases into fixtures or separate test case files entirely.
That way, if another engineer wants to add a new scenario later, they only update the data layer instead of modifying the implementation itself.
That’s scalable framework thinking.
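Here is one way the separate-file idea could look, as a sketch. The JSON string stands in for the contents of a hypothetical data file such as test-data/tag-cases.json. A real suite would read or import the file itself. parseTagCases is an illustrative helper, not a Playwright API. Validating every row up front means an engineer who adds a malformed case gets an error pointing at the data, not a confusing failure halfway through a test.

```typescript
// Stand-in for the contents of a separate, hypothetical data file
// (e.g. test-data/tag-cases.json). A real suite would read or import the file.
const tagCasesJson = `[
  { "id": "TC-01", "itemName": "Item A", "expectedTags": ["urgent", "backend"] },
  { "id": "TC-02", "itemName": "Item B", "expectedTags": ["design"] }
]`;

interface TagCase {
  id: string;
  itemName: string;
  expectedTags: string[];
}

// Validates each row so a typo in the data file fails fast with a clear message.
function parseTagCases(raw: string): TagCase[] {
  const data: unknown = JSON.parse(raw);
  if (!Array.isArray(data)) throw new Error("Test data must be an array of cases");
  return data.map((row: any, i: number) => {
    const ok =
      row !== null &&
      typeof row === "object" &&
      typeof row.id === "string" &&
      typeof row.itemName === "string" &&
      Array.isArray(row.expectedTags) &&
      row.expectedTags.every((t: unknown) => typeof t === "string");
    if (!ok) throw new Error(`Invalid test case at index ${i}`);
    return { id: row.id, itemName: row.itemName, expectedTags: row.expectedTags };
  });
}

const tagCases = parseTagCases(tagCasesJson);
console.log(tagCases.map((c) => c.id).join(", ")); // TC-01, TC-02
```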
There Was No Page Object Model
For people outside QA automation:
A Page Object Model (POM) is basically a reusable abstraction layer for UI interactions.
Instead of redefining selectors and page actions inside every test, you centralize them into reusable page files.
In the reviewed submission:
- login URLs were hardcoded
- locators lived directly in the tests
- reusable interactions weren’t abstracted
Ryan explained that if the UI changes later, you now have to update every individual test instead of one centralized file.
Again:
the code worked.
But long-term maintainability was weak.
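Here is a rough sketch of a page object for a login screen. To keep it runnable without a browser, it codes against two small hypothetical interfaces, PageLike and LocatorLike. They mirror only the few Playwright calls used here: goto, getByLabel, getByRole, fill and click. In a real project you would type the constructor parameter as Playwright’s own Page. The /login path, the base URL, and the label texts are placeholders, not details of the LoopQA target site.

```typescript
// Narrow, hypothetical interfaces covering only the calls this sketch uses.
// In a real project you would use Playwright's Page and Locator types instead.
interface LocatorLike {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
}
interface PageLike {
  goto(url: string): Promise<unknown>;
  getByLabel(text: string): LocatorLike;
  getByRole(role: "button", options: { name: string }): LocatorLike;
}

// Page object: the login URL and locators are defined once, here.
// The path and label texts are placeholders.
class LoginPage {
  static readonly path = "/login";
  private readonly page: PageLike;
  private readonly baseUrl: string;

  constructor(page: PageLike, baseUrl: string) {
    this.page = page;
    this.baseUrl = baseUrl;
  }

  private get username(): LocatorLike { return this.page.getByLabel("Username"); }
  private get password(): LocatorLike { return this.page.getByLabel("Password"); }
  private get submit(): LocatorLike { return this.page.getByRole("button", { name: "Log in" }); }

  // The one reusable action tests call instead of repeating selectors.
  async login(user: string, pass: string): Promise<void> {
    await this.page.goto(this.baseUrl + LoginPage.path);
    await this.username.fill(user);
    await this.password.fill(pass);
    await this.submit.click();
  }
}

// Fake page that records calls, standing in for a browser in this sketch.
const calls: string[] = [];
const fakePage: PageLike = {
  goto: async (url) => { calls.push(`goto ${url}`); },
  getByLabel: (text) => ({
    fill: async (v) => { calls.push(`fill ${text}=${v}`); },
    click: async () => { calls.push(`click ${text}`); },
  }),
  getByRole: (role, opts) => ({
    fill: async (v) => { calls.push(`fill ${role}:${opts.name}=${v}`); },
    click: async () => { calls.push(`click ${role}:${opts.name}`); },
  }),
};

const loginDone = new LoginPage(fakePage, "https://example.test")
  .login("qa-user", "secret")
  .then(() => console.log(calls.join("\n")));
```

Tests then call login() on the page object and never touch a selector directly. A renamed field or a moved login URL then means changing one class instead of every test.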
The Locator Logic Was Brittle
This was one of my favorite examples from the conversation because it perfectly captures the larger issue.
The candidate repeatedly used .first() in their locator strategy because the desired item happened to appear first during that particular execution.
Meaning: the tests passed.
But if the ordering changes tomorrow, the automation either breaks or silently validates the wrong item.
That’s exactly the kind of thing experienced QA engineers look for immediately.
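The .first() trap is easy to demonstrate without a browser. In this hypothetical sketch, pickFirst trusts position the way the reviewed submission did. pickByTitle selects by identity and refuses to continue unless exactly one item matches. That second behavior mirrors Playwright’s default: locators are strict, so actions throw when more than one element matches. The methods .first(), .last() and .nth() exist to opt out of that check. Inside Playwright, the usual alternative is to narrow the locator by whatever identifies the item, for example getByRole with a name option or locator.filter({ hasText }).

```typescript
// Simulated item cards as a page might render them (hypothetical data).
interface ItemCard {
  title: string;
  tags: string[];
}

// Brittle: trusts position. Correct only while the target happens to render first.
function pickFirst(cards: ItemCard[]): ItemCard | undefined {
  return cards[0];
}

// Robust: selects by identity and insists on exactly one match.
function pickByTitle(cards: ItemCard[], title: string): ItemCard {
  const matches = cards.filter((c) => c.title === title);
  if (matches.length !== 1) {
    throw new Error(`Expected exactly one card titled "${title}", found ${matches.length}`);
  }
  return matches[0];
}

const todayOrder: ItemCard[] = [
  { title: "Item A", tags: ["urgent"] },
  { title: "Item B", tags: ["design"] },
];
// Same data, different order (e.g. the default sort changes).
const tomorrowOrder: ItemCard[] = todayOrder.slice().reverse();

console.log(pickFirst(todayOrder)?.title); // Item A
console.log(pickFirst(tomorrowOrder)?.title); // Item B: silently the wrong item
console.log(pickByTitle(tomorrowOrder, "Item A").title); // Item A
```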
The Bigger AI Conversation
One thing I asked Ryan directly was whether these issues came from poor AI usage or a lack of underlying engineering knowledge.
His answer was basically:
probably the second one.
AI can absolutely help accelerate development.
But getting strong output still requires enough technical knowledge to:
- guide the architecture
- recognize bad patterns
- request better abstractions
- identify maintainability problems
- evaluate whether the generated solution is actually good
Ryan mentioned that when he approached the same evaluation himself, one of the first things he instructed the AI to build was the Page Object Model structure, before even touching the tests.
That difference in prompting alone changes the quality of the final framework significantly.
And honestly, I think that’s the real conversation the industry is starting to have now.
Not:
“Did you use AI?”
Almost everyone does at this point.
The more important question is:
Can you evaluate what AI produced?
Because shipping generated code you don’t fully understand is probably going to create the same kinds of quality gaps we discussed throughout this entire review session.
Big thanks again to Ryan Rossbach for taking the time to walk through the evaluations and the broader engineering discussion around AI-assisted development. It was honestly one of the more educational QA conversations I’ve had in a while.
