
What SolveLab's analytics actually show you after an assessment

Santiago Alvarez·Feb 26, 2026·5 min read

A score tells you someone performed well. It doesn't tell you why. For most hiring assessments, that's an acceptable tradeoff. For AI skills, the "why" is the entire point.

Whether a candidate arrived at the right answer matters less than how they got there. Did they over-rely on AI? Did they validate outputs? Did they give up after one prompt or iterate toward something better? A single number can't answer those questions. SolveLab's analytics dashboard is built to answer them.

What you see when you open the results

The first thing on the results page is an overall score from 0 to 100. This is a weighted average across all scoring dimensions, adjusted for the question types in the AI hiring assessment. It's a reasonable sorting metric when you have twelve candidates and need to narrow to four, but it's not where you should stop.

Watch: navigating the analytics dashboard after a completed AI hiring assessment.

Below the overall score, each question shows a breakdown across behavioral dimensions. For a typical assessment, these include context quality, prompt precision, output validation, iteration strategy, and efficiency. Each dimension has a score and, more importantly, evidence: specific observations from the candidate's responses and Copilot interactions that explain the score.

If a candidate scored low on output validation for a particular question, you'll see the exact moment where they accepted incorrect information from the AI without checking it. If they scored high on iteration strategy, you'll see the sequence of prompts where they refined a vague initial output into something specific and useful.

What each metric actually tells you

Context quality measures whether the candidate gave the AI enough information to produce a useful response. A candidate who sets up the scenario, specifies the audience, and defines constraints before prompting will score higher than one who pastes a block of text with "help me with this." This maps directly to day-to-day AI effectiveness. People who provide good context get better results from AI tools at work.

Prompt precision captures whether the requests were specific enough. Vague prompts produce vague outputs. Candidates who ask for "a summary" get generic summaries. Candidates who ask for "a three-sentence summary for the VP of Engineering focusing on the timeline risk" get something they can actually use.

Output validation is arguably the most important dimension for hiring decisions. It measures whether the candidate checked the AI's work. Did they accept a hallucinated statistic? Did they notice when a recommendation contradicted the scenario's constraints? Candidates who score high here have developed the habit of treating AI output as a draft, not a final product.

Iteration strategy looks at how the candidate responded when the AI's first answer wasn't good enough. Did they refine their approach? Did they build on what worked and discard what didn't? Or did they give up and submit whatever came back first?

How two similar scores can look completely different

Imagine two candidates who both score 72 overall on a product manager assessment.

Candidate A used the Copilot extensively. They prompted for every section of their response, accepted most outputs with minor edits, and completed the assessment quickly. Their prompt precision was high, but their output validation was low. They missed two factual errors in AI-generated responses and included a recommendation that contradicted information in the scenario.

Candidate B used the Copilot selectively. They wrote their strategic analysis independently, used AI to help draft a stakeholder communication, and then spent time verifying the AI's suggestions against the scenario details. They were slower, but their output validation score was significantly higher. They caught every error.

Both candidates got 72. But Candidate A would need coaching on verification habits, while Candidate B would need encouragement to use AI more broadly. The score alone can't tell you that. The dimensional breakdown can.
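The arithmetic behind that degeneracy is worth making concrete. The profiles below are invented for illustration (they are not SolveLab data, and the dimensions are averaged with equal weights for simplicity): two very different candidates land on the same overall number, and the largest per-dimension gap points at the coaching conversation.

```python
# Two made-up dimension profiles that average to the same overall score.
# "A" leans on prompt precision; "B" leans on output validation.

candidate_a = {"context_quality": 75, "prompt_precision": 90,
               "output_validation": 45, "iteration_strategy": 70,
               "efficiency": 80}
candidate_b = {"context_quality": 70, "prompt_precision": 65,
               "output_validation": 90, "iteration_strategy": 80,
               "efficiency": 55}

def mean(scores: dict) -> float:
    return sum(scores.values()) / len(scores)

# Identical headline number...
assert mean(candidate_a) == mean(candidate_b) == 72.0

# ...but the biggest gap between the two is the dimension that
# drives the hiring decision.
gaps = {d: candidate_b[d] - candidate_a[d] for d in candidate_a}
print(max(gaps, key=lambda d: abs(gaps[d])))  # output_validation
```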

What hiring teams do with this data

The most common use is sharpening the interview that follows the assessment. Instead of asking generic questions about AI experience, interviewers can ask about specific moments from the assessment. "In question three, you accepted the AI's market sizing estimate without adjusting it. Walk me through your thinking." That conversation is more productive than any abstract discussion about AI skills.

Teams also use the dimensional profiles to match candidates to roles. A candidate with high prompt precision but low output validation might be a good fit for a role with strong review processes. A candidate who scores high on independence but low on AI utilization might be better suited for a role that's just beginning to adopt AI tools.

The analytics aren't a replacement for judgment. They're the raw material that makes judgment possible.

See AI skills assessments in action

SolveLab builds custom assessments tailored to your roles. Try it free — no credit card needed.
