Many recruiters are still trying to figure out how to vet automation skills.
Now add AI into the mix.
So I asked a senior QE how they’re actually using tools like Claude Code and Cursor day to day. The answer wasn’t “it writes code for me.”
It was:
• Writing test plans, test strategies, test cases, one-pagers, and automation roadmaps
• Analyzing repositories and documenting frameworks
• Creating and modifying Cypress and Playwright tests
• Debugging failures and tracing root causes
• Validating implementation decisions before involving engineers
The common theme: AI handles structure and execution. The QE provides context and judgment.
Then we got more specific and looked at Cursor.
How Cursor is actually being used
Cursor is being used for three core workflows: research, test case creation, and running and debugging tests.
1. Research (repo understanding)
Instead of manually digging through a codebase, Cursor is used to:
- Scan entire repositories
- Break down what the automation framework does
- Surface structure, patterns, and dependencies
- Generate high-level documentation
- Create summaries for team or leadership discussions
This turns a new or unfamiliar repo into something you can reason about quickly, instead of slowly reverse-engineering it.
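To make the "surface structure, patterns, and dependencies" step concrete, here is a minimal sketch of the kind of repo summary a QE might produce by hand to sanity-check what the AI reports. It is not how Cursor works internally. The function name `summarize_repo` is made up for illustration, and the marker list assumes the default Cypress and Playwright config file names; a given repo may use others.

```python
from collections import Counter
from pathlib import Path

# Default config file names (assumption: the repo has not renamed them).
FRAMEWORK_MARKERS = {
    "cypress.config.js": "Cypress",
    "cypress.config.ts": "Cypress",
    "playwright.config.js": "Playwright",
    "playwright.config.ts": "Playwright",
}


def summarize_repo(root):
    """Count files by extension and list any test frameworks detected."""
    root = Path(root)
    extensions = Counter()
    frameworks = set()
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        # Skip directories and installed dependencies.
        if not path.is_file() or "node_modules" in rel.parts:
            continue
        extensions[path.suffix or "(none)"] += 1
        marker = FRAMEWORK_MARKERS.get(path.name)
        if marker:
            frameworks.add(marker)
    return {"extensions": dict(extensions), "frameworks": sorted(frameworks)}
```

A summary like this gives a quick baseline to compare against the AI-generated documentation before it is shared with the team or leadership.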
2. Test case creation and modification
Cursor is used to accelerate test development by:
- Generating Cypress or Playwright specs based on existing patterns
- Cloning and adapting existing test cases
- Modifying tests to match new functionality
- Getting tests running locally faster for validation
The goal isn’t to replace test design but to remove repetitive implementation work so QEs can focus on coverage and edge cases.
3. Running tests and debugging failures
This is where Cursor starts to function like a co-developer.
Typical flow:
- Run a test case
- Identify what is failing
- Trace failure back to UI, DOM, or data layer
- Cross-reference repo structure
- Suggest and apply fixes directly in code
Before making any changes, Cursor first outlines what it plans to modify. That step keeps the QE in control and prevents blind edits.
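The flow above can be sketched in a few lines: first guess which layer a failure points to, then apply only the edits a reviewer approves. This is an illustration under stated assumptions, not Cursor's actual logic. The keyword hints and the names `triage_failure` and `apply_approved` are hypothetical, and real failures still need human judgment.

```python
# Hypothetical keyword hints per layer; tune these for your own suite.
LAYER_HINTS = {
    "dom": ("selector", "locator", "element", "not visible", "detached"),
    "data": ("fixture", "status code", "500", "404", "schema"),
    "ui": ("timeout", "navigation", "screenshot", "viewport"),
}


def triage_failure(message):
    """Guess the failing layer by counting keyword hints in the message."""
    text = message.lower()
    scores = {
        layer: sum(hint in text for hint in hints)
        for layer, hints in LAYER_HINTS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def apply_approved(source, planned_edits, approve):
    """Apply (old, new) replacements, but only those the reviewer approves."""
    for old, new in planned_edits:
        if approve(old, new):
            source = source.replace(old, new)
    return source
```

The `approve` callback stands in for the "outline the plan first" step: nothing changes in the code unless a person (or a stricter check) says yes to that specific edit.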
Model usage (important nuance)
Cursor can run on models like Claude Opus and Claude Sonnet.
In practice:
- Sonnet is sufficient for most QE workflows
- Sonnet is faster, cheaper, and strong enough for test generation and debugging
- Opus is only needed for deeper architectural or complex reasoning tasks
Most teams overuse heavier models where lighter ones would work fine.
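One way to picture the rule of thumb is a simple routing helper that defaults to the lighter tier and escalates only for a short list of heavy tasks. This is a sketch, not a Cursor feature: `pick_model_tier` and the task labels are hypothetical, and "sonnet" and "opus" are tier labels here, not real API model identifiers.

```python
# Tier labels only; these are not real API model identifiers.
DEFAULT_TIER = "sonnet"

# Hypothetical task labels that justify the heavier tier.
HEAVY_TASKS = {"architecture_review", "complex_reasoning"}


def pick_model_tier(task_type):
    """Default to the lighter tier; escalate only for listed heavy tasks."""
    return "opus" if task_type in HEAVY_TASKS else DEFAULT_TIER
```

For example, `pick_model_tier("test_generation")` stays on the default tier, while `pick_model_tier("architecture_review")` escalates.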
Starter questions recruiters can ask if they’re trying to vet AI usage
1. What LLM do you use most often and why?
The strongest answers are usually Sonnet, sometimes paired with Haiku depending on the task.
If someone immediately defaults to Opus for everything, it can be a signal they haven’t spent much time balancing cost, speed, and capability in real-world workflows.
2. How do you use Cursor with an LLM to perform coding tasks?
Listen for specifics.
Strong candidates will describe using AI to analyze repositories, generate tests, modify automation frameworks, debug failures, and validate ideas before escalating to engineers.
The key is whether they use AI as a coding partner rather than simply a chatbot.
3. What have you built with AI?
It doesn’t need to be production software.
A bug generator, test utility, internal tool, side project, or workflow automation all count.
Building forces people to learn prompting, context management, validation, and iteration. If they’ve never built anything, that’s often a signal their experience is surface-level.
4. How do you give AI context before asking it to produce work?
This is where the difference between junior and senior usage becomes obvious.
Look for answers involving repository context, documentation, examples, requirements, architecture details, or existing test patterns.
Strong outputs usually come from strong context.
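For recruiters who want a picture of what "giving context" looks like, here is a minimal sketch that puts labeled context sections ahead of the task. The function name `build_prompt` and the section labels are made up for illustration; candidates will each have their own way of doing this.

```python
def build_prompt(task, context):
    """Assemble a prompt that lists labeled context before the task itself."""
    sections = [
        f"## {label}\n{body.strip()}"
        for label, body in context.items()
        if body.strip()  # skip empty context sections
    ]
    return "\n\n".join(sections + [f"## Task\n{task.strip()}"])


prompt = build_prompt(
    "Write a Playwright test for the login form.",
    {
        "Requirements": "Users log in with email and password.",
        "Existing test pattern": "Specs live under tests/ and use page objects.",
    },
)
```

A candidate who works this way can usually name which context sections they include and why, which is the signal this question is looking for.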
5. Tell me about a time AI gave you a wrong answer. How did you catch it?
Every experienced QE has examples.
The best candidates will talk about validating AI-generated code, testing assumptions, reviewing outputs, and identifying hallucinations before they become defects.
If they’ve never caught AI being wrong, they’re probably not reviewing its work closely enough.
Where QEs get it wrong
1. Using Opus when Sonnet is enough
Sonnet covers:
- Test generation
- Framework modifications
- Debugging workflows
- Repo-level analysis
Using Opus everywhere increases cost without meaningful benefit in most QE work.
2. Not using AI as a co-developer
This is the bigger issue.
Cursor with Claude is effectively:
- A developer
- An API engineer
- A frontend/backend assistant
- A documentation and analysis layer
But only if it’s given real repo context and used deliberately.
Too many QEs still default to:
- Asking engineers for things AI can surface quickly
- Manually tracing issues that could be accelerated
- Treating AI as a tool instead of a collaborator
The skill is no longer just writing automation.
It’s knowing how to direct an AI through a codebase, give it enough context, and validate what it produces.
Closing
The gap in QE performance today isn’t tooling. It’s depth of usage.
Some teams are still using AI to “help write tests.”
Others are using it as a live layer across the codebase to research, debug, generate, and validate work end-to-end.
That gap is widening quickly.
Thanks to Jonathan for sharing how his team is actually applying these tools in practice. It helped ground this in real workflows rather than theory for us recruiters 🙂. I highly recommend QE professionals check out his guide HERE.
