Improving Search with AI

Increased search click-through rate by 4%, improved customer satisfaction by 8%, and reduced engineering maintenance time by 42%. The first step was making "relevance" mean something specific.

Overview

A cross-team initiative to rebuild HSE's core search experience, one of the highest-intent journeys on the platform, across web and app.

Search in action.

My direct contribution

Strategy, framing & design direction

Defined the four search intent types (product number, brand, category, open query) and the success criteria for each; this became the shared rubric for Design, Engineering, Data, and Research
Led the initial analysis of top-100 queries and zero-result failure patterns that shaped the redesign brief
Directed the information hierarchy on result pages: what gets promoted, what gets demoted, and why visual prominence had to follow purchase signal strength
Pushed to include engineering maintenance time as a success metric (it ultimately fell by 42%); most teams only measured customer-facing KPIs

Team & collaboration

Cross-team alignment & delivery

Coached and mentored the search team's designer throughout the project lifecycle, providing strategic direction on key decisions
Worked closely with the search PM to ensure both business goals and user satisfaction shaped the success metrics; search touched Merchandising, Engineering, Design, and Data simultaneously
Kept design involved in platform decisions early, not just the UI layer
Coordinated with the User Researcher on qualitative interview framing

Duration: 2024 · Channels: Web shop, main app · Platform: Elasticsearch

Impact

+4% click-through rate on search results

+8% customer satisfaction for search

−42% engineering maintenance time

The Problem

The previous search behaved like a black box. It offered limited control over ranking, made it difficult to explain why results appeared, was hard to tune for different user intents, and was too expensive for engineering to maintain. Every attempt to improve it made something else worse.

The harder problem was upstream: nobody had agreed on what "good search" actually meant. Engineering measured index performance. Merchandising measured product visibility. Customer service measured complaint volume. Design was measuring nothing. Before anything could improve, that fragmentation had to be resolved.

The decision that unlocked everything: defining relevance first

My most important contribution to this project happened before any screen was designed. I pushed the team to define "relevance" as four distinct scenarios, not one global metric.

HSE customers search in four fundamentally different ways: by product number shown on live TV (exact match, high intent), by brand (brand loyalty, browsing), by category (exploratory), and by open-ended query (uncertain intent, needs guidance). Each requires a different definition of a good result: different ranking logic, different result page hierarchy, different filter behaviour, different handling of zero results.
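To make the four intent types concrete, here is a minimal sketch of how a query might be routed to one of them. It is an illustration only, not the classifier the team shipped: the brand and category vocabularies, the assumed product-number format (4 to 8 digits), and the names `Intent` and `classify_intent` are all hypothetical.

```python
import re
from enum import Enum


class Intent(Enum):
    PRODUCT_NUMBER = "product_number"
    BRAND = "brand"
    CATEGORY = "category"
    OPEN_QUERY = "open_query"


# Hypothetical vocabularies; a real system would derive these from the catalog.
KNOWN_BRANDS = {"acme", "examplebrand"}
KNOWN_CATEGORIES = {"jewellery", "kitchen", "fashion"}

# Assumption: product numbers are purely numeric and 4 to 8 digits long.
PRODUCT_NUMBER_PATTERN = re.compile(r"\d{4,8}")


def classify_intent(query: str) -> Intent:
    """Map a raw query to one of the four intent types using simple rules."""
    normalized = query.strip().lower()
    if PRODUCT_NUMBER_PATTERN.fullmatch(normalized):
        return Intent.PRODUCT_NUMBER  # exact match, high intent
    if normalized in KNOWN_BRANDS:
        return Intent.BRAND
    if normalized in KNOWN_CATEGORIES:
        return Intent.CATEGORY
    # Anything unrecognised is treated as an open-ended query.
    return Intent.OPEN_QUERY
```

The point of the sketch is the shape, not the rules: once a query carries an intent label, ranking, page hierarchy, filters, and zero-result handling can each be chosen per intent instead of globally.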

Collapsing all four into a single "relevance score" was what had made the previous system so hard to improve. Every change that helped one intent type hurt another. Splitting them created four separate improvement tracks, and four separate success metrics that Product, Engineering, and Data could all align on.
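A small sketch shows why a single global number can hide this effect. The figures are invented purely for illustration and are not project data; click-through rate stands in for whatever per-intent metric a team chooses, and the helper names are hypothetical.

```python
from collections import defaultdict


def make_events(intent, searches, clicks):
    """Build toy log rows: (intent, was_a_result_clicked)."""
    return [(intent, i < clicks) for i in range(searches)]


def overall_ctr(events):
    """Share of all searches that led to a click, ignoring intent."""
    if not events:
        return 0.0
    return sum(1 for _, clicked in events if clicked) / len(events)


def ctr_by_intent(events):
    """Click-through rate computed separately for each intent label."""
    searches = defaultdict(int)
    clicks = defaultdict(int)
    for intent, clicked in events:
        searches[intent] += 1
        if clicked:
            clicks[intent] += 1
    return {intent: clicks[intent] / searches[intent] for intent in searches}


# Invented numbers: a ranking change helps one intent and hurts another.
before = make_events("product_number", 4, 2) + make_events("open_query", 4, 2)
after = make_events("product_number", 4, 3) + make_events("open_query", 4, 1)

print(overall_ctr(before), overall_ctr(after))  # 0.5 0.5
print(ctr_by_intent(after))  # {'product_number': 0.75, 'open_query': 0.25}
```

In this toy case the global rate does not move at all, while the per-intent view shows one track improving and the other regressing. That is the argument for separate improvement tracks in miniature.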

What qualitative research added

Before finalising the success metrics, the User Researcher ran interviews with customers who had recently experienced search failures. The goal wasn't to discover new intent types; those we already knew. It was to understand the language customers used to describe failure.

That language mattered for two reasons. First, it revealed that customers didn't distinguish between "no results" and "wrong results"; both felt like the product was broken. Second, it told us which failure modes were most damaging to purchase intent, which shaped our prioritisation order.

What we built

Intent-based result page hierarchy: result prominence follows purchase signal, not just keyword match
Improved filter and refinement patterns matched to how customers actually think about product categories
Better zero and near-zero result states: recovery paths instead of dead ends
Consistent behaviour across web and mobile; previously they had diverged significantly

Reflection

The platform decision, Elasticsearch, was made before design joined the initiative. That constrained some options and closed off certain architectural directions before anyone had framed the problem properly. A cleaner version of this project starts with a cross-functional problem audit before any technology is chosen.

Some zero-result failures we surfaced were catalog gaps, not search UI failures: the product simply did not exist. Design could surface those gaps but not fix them. Early in the project I treated all zero-result failures as design problems. They were not. I drew that boundary more clearly as the project progressed, and the brief reflected it.

I now push for design involvement at the problem-framing stage, before platform decisions close off the option space. That is not always winnable. When it is not, the brief should state what design can and cannot change.

Takeaway

Redesigning search starts before the first screen. Once design has defined what relevance means for each of the four distinct intents, engineering, data, and product can finally pull in the same direction.