What is Friction?
Friction is an AI-powered usability testing tool. It accepts a Figma design link or a live website URL, generates a set of realistic user personas, and simulates each persona attempting a task on that design - producing a full report with heatmaps, usability scores, and prioritised findings.
The simulation methodology is grounded in Jakob Nielsen's think-aloud protocol and the 12-step usability testing framework - the same approach used in professional UX research labs. The difference is speed and cost: a Friction session takes seconds and runs entirely in the browser without recruiting participants.
Under the hood, Friction uses two Anthropic models. The full session analysis runs on Claude Opus 4.6 - Anthropic's most capable model - for its reasoning depth and ability to maintain persona consistency across a structured multi-output response. Persona generation and task suggestion use Claude Haiku 4.5, a faster, lighter model suited to those simpler structured tasks.
Your link
Friction accepts two input types, detected automatically from the URL you paste.
Figma design link
A direct link to a frame or screen in a Figma file. The URL must include a node-id parameter pointing to the specific frame. Friction reads the design and simulates interaction with it without requiring Figma API access.
Live website URL
Any publicly accessible https URL. Friction analyses the live page as a browser would render it. The URL must use https or http - file:// and localhost URLs are not supported in production.
URL detection
Detection checks two things: (1) the protocol must be https: or http:, and (2) if the hostname includes figma.com it is classified as a Figma link; otherwise it is classified as a website. The urlType value ("figma" | "website") is passed through to every API call so the AI can adapt its simulation context accordingly.
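The two-part check above can be sketched in TypeScript. This is a minimal illustration of the logic as described, not Friction's actual source; the function name is an assumption.

```typescript
// Sketch of the two-part URL detection check described above.
// detectUrlType is an illustrative name, not Friction's actual implementation.
type UrlType = "figma" | "website";

function detectUrlType(raw: string): UrlType | null {
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    return null; // not a parseable URL at all
  }
  // (1) the protocol must be https: or http:
  if (parsed.protocol !== "https:" && parsed.protocol !== "http:") {
    return null;
  }
  // (2) figma.com hostnames are Figma links; everything else is a website
  return parsed.hostname.includes("figma.com") ? "figma" : "website";
}
```

The `null` return covers both invalid URLs and unsupported protocols such as `file://`.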
Task suggestion
When a valid URL is entered, the client fires a debounced POST to /api/suggest-task after 800ms of inactivity. This calls Haiku with the URL and type, asking it to produce one goal-based task in 12–20 words. The suggestion appears inline and can be accepted or ignored - it does not run automatically.
If you leave the task field blank, the session prompt instructs Opus to infer the most realistic primary task from the URL and persona goals. In most cases, providing a task produces more focused and accurate results.
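The 800ms debounce described above can be sketched with a generic helper. The `requestSuggestion` wrapper and its body shape are assumptions based on this document, not Friction's actual source.

```typescript
// Minimal debounce helper of the kind used for the 800ms task-suggestion timer.
// requestSuggestion and its request body are illustrative sketches.
function debounce<T extends unknown[]>(fn: (...args: T) => void, ms: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: T) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), ms);
  };
}

// Fires only after the user has stopped typing for 800ms.
const requestSuggestion = debounce((url: string, urlType: string) => {
  fetch("/api/suggest-task", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url, urlType }),
  });
}, 800);
```

Because the timer resets on every keystroke, only the final URL the user settles on triggers a request.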
User personas
Friction generates exactly 3 personas per session via a POST to /api/suggest-personas. The prompt asks Haiku to produce three meaningfully distinct personas who would realistically attempt this task on this product.
Personas are varied across three axes that directly influence how each person experiences a design: technical literacy (Low / Medium / High), product familiarity (First-time / Returning), and emotional state (their mindset entering the session). These three axes were chosen because they are the dimensions most likely to produce divergent scoring - a low-literacy first-time user and a high-literacy returning user will have genuinely different experiences even on a well-designed page.
Persona schema
{
  "id": "persona-1",
  "name": "Sarah Okafor",
  "age": 58,
  "occupation": "Semi-retired teacher",
  "techLiteracy": "Low",
  "productFamiliarity": "First-time",
  "emotionalState": "Cautious - wants to book correctly, anxious about making a mistake",
  "goal": "Find a double room with breakfast included for a weekend stay"
}

Personas are visible on the session page before the analysis runs. You can review the generated set and re-run if the personas don't match your target audience. The personas are passed as a JSON array directly into the main session prompt.
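The same shape can be written as a TypeScript interface. This is inferred from the JSON example above; the interface name and union types are assumptions.

```typescript
// Persona shape inferred from the JSON example above. Field names match the
// sample; the interface name itself is illustrative.
interface Persona {
  id: string;
  name: string;
  age: number;
  occupation: string;
  techLiteracy: "Low" | "Medium" | "High";
  productFamiliarity: "First-time" | "Returning";
  emotionalState: string; // free-text mindset entering the session
  goal: string;           // what this persona is trying to accomplish
}
```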
The simulation
The full session runs as a single POST to /api/run-session using Claude Opus 4.6 with a 4,000 token output limit. The entire report - simulations, heatmaps, scores, and findings - is produced in one structured JSON response. Opus was chosen over Sonnet for its stronger instruction-following on complex nested schemas and its more nuanced persona voice differentiation.
The system prompt
The system prompt embeds several layers of constraint that shape the output quality:
The model is instructed to inhabit each persona fully - their age, literacy, emotional state, and goal - and narrate their interaction using first-person internal monologue. Internal monologue must sound like a real person, not a UX audit.
Personas think in goals, not features. The prompt explicitly prohibits action descriptions like "click the button labelled Reserve" - instead it must be "I need to pay, where do I go from here?" This keeps the simulation grounded in user intent.
The prompt distinguishes three hesitation types. Decision hesitation (weighing options) does not penalise clarity or flow. Comprehension hesitation (unclear label or concept) penalises clarity. Orientation hesitation (can't find where to go) penalises flow.
Scores are persona-relative - each persona is scored on what they actually experience. A well-designed page should score high for all personas. Scores only diverge when a specific persona genuinely encounters friction others would not, such as jargon that confuses a novice but not an expert.
The model is forbidden from suggesting further user testing, recruiting participants, running A/B experiments, or any other form of meta-recommendation. Every output must be a concrete design or functionality change.
Step structure
Each persona produces exactly 3 steps - a constraint imposed by the 4,000 token budget across all personas, heatmaps, scores, and findings in a single response. Each step has strict word limits enforced in the prompt:
action · max 12 words · What the persona physically does - scrolls, clicks, reads, abandons.
internalMonologue · max 25 words · First-person think-aloud voice, specific to this UI at this moment.
attention · max 10 words · What they are focused on - not what they should focus on.
outcome · enum · "success" | "hesitation" | "failure" - no middle ground.
outcomeDetail · max 12 words · Exactly what happened at this step, tied to the outcome enum.
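The step fields map onto a straightforward TypeScript shape. The interface name is an assumption; the word limits live in the prompt, not the type system.

```typescript
// Step shape implied by the field list above. Word limits are enforced in the
// prompt, not by the type system; the interface name is illustrative.
interface SimulationStep {
  action: string;            // max 12 words
  internalMonologue: string; // max 25 words
  attention: string;         // max 10 words
  outcome: "success" | "hesitation" | "failure";
  outcomeDetail: string;     // max 12 words
}
```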
Heatmaps
Heatmaps are generated per persona and represent predicted attention distribution across the page. Each heatmap produces exactly 4 zones - positioned as percentage coordinates (x, y, width, height as 0–100 values relative to the viewport) - with an attention level and an eye-tracking rationale.
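A zone as described above can be modelled and bounds-checked in a few lines. The interface and validator are illustrative sketches based on this description, not Friction's source.

```typescript
// Heatmap zone shape implied by the description above: percentage coordinates
// plus an attention level and rationale. Names are illustrative.
interface HeatmapZone {
  x: number;      // 0-100, % of viewport width
  y: number;      // 0-100, % of viewport height
  width: number;  // 0-100
  height: number; // 0-100
  attention: "Hot" | "Warm" | "Cool" | "Dead";
  rationale: string; // eye-tracking justification for this zone
}

// A zone is valid only if it stays within the 0-100 viewport bounds.
function isValidZone(z: HeatmapZone): boolean {
  const inRange = (v: number) => v >= 0 && v <= 100;
  return (
    inRange(z.x) && inRange(z.y) &&
    inRange(z.width) && inRange(z.height) &&
    z.x + z.width <= 100 && z.y + z.height <= 100
  );
}
```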
The model is instructed to apply established eye-tracking scan patterns based on the page type. These patterns come from decades of research by the Nielsen Norman Group, the Poynter Institute, and academic eye-tracking studies:
F-pattern: Users scan a full horizontal strip across the top, then a second shorter strip further down, then a vertical strip down the left edge. Content in the right gutter of a text-heavy page is frequently ignored.
Z-pattern: The eye travels across the top (logo → CTA), diagonally down-left to the next anchor, then across the bottom. Works well for pages with a single primary action and minimal competing content.
Gutenberg diagram: Attention concentrates in the top-left primary optical area and the bottom-right terminal area. The top-right fallow area and bottom-left strong fallow area receive minimal attention.
Attention levels
Hot
Hero headlines, primary CTAs, prices, lead images. First-fixation zones.
Warm
Secondary content scanned en route to the goal - subheadings, supporting text.
Cool
Elements that exist in the layout but rarely receive deliberate attention.
Dead
Ignored entirely - footers, sidebars, elements below the fold for impatient users.
Each heatmap also includes a missedCritical array - elements that should have received high attention given the task (e.g. a "Breakfast included" column header when the task is to find breakfast options) but were predicted to be missed entirely. These often surface the most actionable design findings.
Usability scores
Each persona is scored 1–10 across five dimensions. The persona's overall score is the mean of its five dimension scores, and averageScore at the report level is the mean of all persona overall scores.
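The averaging above is plain arithmetic; as a sketch (function names are illustrative):

```typescript
// Scoring arithmetic as described: a persona's overall score is the mean of
// its five dimension scores, and averageScore is the mean of the overalls.
function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function overallScore(dimensionScores: number[]): number {
  return mean(dimensionScores); // five 1-10 dimension scores
}

function reportAverageScore(personaOveralls: number[]): number {
  return mean(personaOveralls); // one overall per persona
}
```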
Scores are persona-relative - calibrated to what this specific persona experiences, not an absolute standard. The prompt explicitly enforces that a Low literacy first-time user and a High literacy returning user on the same page must differ by more than 1 point across relevant dimensions. Compressed scores (e.g. two contrasting personas landing at 6.2 and 6.8) indicate the simulation is not differentiating correctly.
Calibration bands
Near-flawless usability. Industry-leading UX. Extremely rare on first iteration.
Clear, functional, minor friction only. A well-designed commercial site or Figma prototype typically lands here.
Usable but with noticeable friction. Some tasks require effort. Common on first-launch MVPs.
Significant friction. Multiple confusion points or task failures. Needs substantial design work.
Core tasks cannot be completed. Fundamental UX failures present. Must be addressed before any launch.
Dimensions
How clearly does the page communicate what it is and what to do next - for this specific persona. A Low literacy user may score a complex options matrix at 3/10 while a High literacy user scores it 7/10. Poor clarity means users must infer intent rather than read it.
How cleanly the correct path is laid out. Penalised by backtracking, wrong turns, and disorientation - not by speed or deliberation. A persona who reads carefully and follows the correct route without deviation scores high on flow even if they pause frequently.
How well the design works for this persona's specific technical literacy, device type, and cognitive load tolerance. Accounts for dense information architecture on mobile, small touch targets, and jargon that assumes domain knowledge the persona doesn't have.
How clearly the interface communicates the result of every action. Covers loading states, error messages, success confirmations, and inline validation. A form that submits silently - with no visual response - scores 1–2 on feedback regardless of other qualities.
How well the tone, visual language, and content register for this persona emotionally. Includes anxiety from high-stakes decisions (financial risk, irreversible actions) even when the UI is technically clear. A non-refundable booking without reassurance copy scores lower for an anxious first-time buyer than for a seasoned traveller.
Your report
All findings are organised into four categories, capped at 2 items each. The cap is intentional - it forces prioritisation and prevents the report from becoming a noise-heavy laundry list. The most impactful issues surface; minor variations of the same issue are merged.
Every finding must name the specific element and its precise impact. The prompt explicitly flags "navigation is confusing" as an example of an unacceptable finding. A valid equivalent would be: "The checkout CTA sits below the fold on the initial viewport - both low-literacy personas scrolled past it without clicking."
Critical issues
Blockers that prevent task completion. Rated 0–4 on the Nielsen severity scale. Severity 4 means the task is impossible; severity 3 means it seriously impedes completion. Each finding includes the specific element, its location, which personas were affected, and the observed impact.
Friction points
Elements that slow users down or cause confusion without fully blocking them. Includes unclear labels, non-obvious interactive elements, information overload, and layout patterns that violate convention. Each includes a suggested fix (concrete design change, not a generic recommendation).
What's working
Genuinely effective elements - clear hierarchy, well-labelled CTAs, appropriate feedback, natural flow. These are as important as the failures: they tell you what not to change, and give you a baseline for what good looks like in this design.
Recommendations
Specific, prioritised design changes. Every recommendation names the exact element to change, the change to make, and the rationale tied back to observed behaviour. Ranked High / Medium / Low by impact. Critically: Friction never recommends further user testing, A/B experiments, or recruiting participants - only design and functionality changes.
Finding schema
Critical issues and friction points carry structured metadata that maps directly to actionable next steps:
severity · 0–4 integer · Nielsen severity scale. Applied to every critical and friction finding. Severity 4 = must fix before launch.
affectedPersonas · string[] · Which personas encountered this issue. Tells you whether this is universal or segment-specific.
location · string · Where on the page the issue occurs. On critical issues, this is the specific element (e.g. "Checkout CTA, below fold on mobile viewport").
suggestedFix · string · Concrete design or copy change - on friction findings. 15 words max, always a specific action.
rationale · string · Link to the observed evidence. Ties the fix back to what a specific persona actually experienced.
priority · High | Medium | Low · On recommendations only. Severity beats frequency - one severity-4 issue outranks ten severity-1 issues.
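The metadata above implies the following shape, together with the stated ordering rule that severity beats frequency. The interface name and sort helper are illustrative sketches.

```typescript
// Finding metadata shape implied by the field list above. The sort helper
// encodes the stated rule: severity beats frequency. Names are illustrative.
interface Finding {
  severity: 0 | 1 | 2 | 3 | 4;          // Nielsen severity scale
  affectedPersonas: string[];
  location: string;
  suggestedFix?: string;                // friction findings only
  rationale: string;
  priority?: "High" | "Medium" | "Low"; // recommendations only
}

// One severity-4 issue outranks any number of severity-1 issues.
function bySeverity(findings: Finding[]): Finding[] {
  return [...findings].sort((a, b) => b.severity - a.severity);
}
```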
Architecture
For developers and technical UX practitioners who want to understand what's happening under the hood.
Tech stack
Next.js 15 · App Router, API routes, server components
TypeScript · Strict mode throughout
Tailwind CSS · Utility-first styling, custom design tokens
shadcn/ui · Headless component primitives
Phosphor Icons · Icon system
Anthropic SDK · Claude API calls - server-side only
Zod · API route request validation
localStorage · Session persistence - client only, no server DB
API routes
POST /api/suggest-task · Haiku 4.5 · 120 tokens · Called on a debounced 800ms timer when a valid URL is entered. Returns one goal-based task suggestion.
POST /api/suggest-personas · Haiku 4.5 · 900 tokens · Called once at the start of a session. Returns exactly 3 persona objects as a JSON array.
POST /api/run-session · Opus 4.6 · 4,000 tokens · The main session call. Produces the complete report - simulations, heatmaps, scores, and findings - as a single structured JSON response.
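The routes validate request bodies with Zod. As a dependency-free sketch of the equivalent check for /api/run-session - the field set follows the data flow described in this document, and the names are assumptions:

```typescript
// Hand-rolled equivalent of the request validation on /api/run-session.
// Field names follow the data flow described in this document; this is a
// sketch, not Friction's actual Zod schema.
interface RunSessionBody {
  url: string;
  urlType: "figma" | "website";
  task?: string;
  personas: unknown[]; // the 3 persona objects from /api/suggest-personas
}

function isRunSessionBody(body: unknown): body is RunSessionBody {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  return (
    typeof b.url === "string" &&
    (b.urlType === "figma" || b.urlType === "website") &&
    (b.task === undefined || typeof b.task === "string") &&
    Array.isArray(b.personas) && b.personas.length === 3
  );
}
```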
Session schema
interface Session {
  id: string;                      // UUID v4 - generated client-side at session creation
  url: string;                     // The tested URL
  urlType: "figma" | "website";
  task?: string;                   // User-provided or AI-inferred
  createdAt: string;               // ISO 8601 - used to compute 10-day expiry
  personas: Persona[];             // Populated after /api/suggest-personas
  report: SessionReport | null;    // Populated after /api/run-session
  status: "pending" | "complete" | "error";
}

Data flow
User pastes URL → client detects type and begins 800ms debounce
Debounce fires → POST /api/suggest-task → task suggestion displayed inline
User clicks Start test → UUID generated, session written to localStorage with status: pending
Client navigates to /session/[id] → POST /api/suggest-personas → 3 personas rendered
POST /api/run-session fires with URL + task + personas → Opus 4.6 responds with full JSON report
Report parsed and written to localStorage → session status updated to complete
UI renders heatmaps, scores, and findings from the stored report
Data & privacy
Friction has no server-side database. Here's the exact data handling:
Sessions live in localStorage only
The session list and all report data are stored under Friction-specific keys in your browser's localStorage. Nothing is written to any server. Clearing your browser data removes all sessions permanently.
Expiry is calculated client-side
Each session stores a createdAt ISO timestamp. On load, the client filters out sessions where Date.now() − createdAt > 10 days. There is no server-side TTL - expiry is enforced by the client on each page visit.
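The client-side expiry check described above reduces to a single comparison; as a sketch (constant and function names are illustrative):

```typescript
// The client-side expiry check described above: sessions older than 10 days
// are filtered out on load. Names are illustrative.
const TEN_DAYS_MS = 10 * 24 * 60 * 60 * 1000;

function isExpired(createdAt: string, now: number = Date.now()): boolean {
  return now - Date.parse(createdAt) > TEN_DAYS_MS;
}
```

On page load, the session list is filtered with `sessions.filter((s) => !isExpired(s.createdAt))`.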
What is sent to the Claude API
The API request includes: the URL you're testing, the urlType, the task description, and the persona array. No browser metadata, IP address, or user identifier is included. The API key is server-side only and never exposed to the client.
Anthropic's data handling
Requests sent to the Claude API are subject to Anthropic's API usage policy. By default, Anthropic does not use API inputs to train models. Refer to Anthropic's privacy policy for full details.
No accounts, no analytics, no cookies
Friction has no login system, no session tracking, no analytics scripts, and no third-party cookies. The only external request the client makes is to Anthropic's API - via the server-side route, not directly from the browser.
