eCommerce Lever

On-Site Search Quality in eCommerce

Alla Vovnenko on April 15, 2026
On-site Search
7 Min Read

Table of Contents

  1. Two Levels of Search Evaluation
  2. The Three Criteria of Good On-Site Search
    2.1 Completeness
    2.2 Ranking
    2.3 Relevance
  3. How to Evaluate Search Quality in Practice
  4. What “Good Enough” Looks Like

On-site search plays very different roles depending on the type of eCommerce business.

In smaller stores and in categories like apparel, search is often used sparingly. Many users prefer browsing through categories, filters, and collections. In some cases, search usage can stay in the low double digits or even below 10% of sessions.

But as catalogs grow, behavior changes. In large eCommerce sites with thousands or millions of products, search becomes a primary navigation method. It’s common to see 30–60% of users relying on search, especially when they know what they are looking for or when navigation becomes too complex.

This shift matters because expectations change with it. Search is not just another feature; it’s a direct expression of intent. And because of that, even small issues in search quality become immediately visible.

When users browse, they explore.
When users search, they expect precision.

So, how do you measure the quality of internal search results?

Two Levels of Search Evaluation

Evaluating on-site search works best when separated into two distinct stages.

First stage

The first stage focuses on the search system itself. The goal is to understand whether the results are correct and whether they fully reflect the product catalog. This goes beyond simple keyword matching. It includes how well search captures intent — for example, whether “running shoes” returns appropriate products even if naming varies across the catalog.

At this stage, evaluation is manual and relies on a strong understanding of the assortment. The question is straightforward: does search return what it should?

Second stage

Once this baseline is established, the second stage looks at how users interact with those results. This is where behavioral data from analytics becomes useful. Metrics help identify patterns, edge cases, and opportunities for refinement.

To make the distinction clearer:

| Aspect | Level 1: Search Quality | Level 2: Behavioral Performance |
| --- | --- | --- |
| Goal | Validate correctness of results | Improve performance based on user behavior |
| Focus | Completeness, ranking, relevance | CTR, conversions, zero-result rate |
| Input | Product catalog knowledge, real queries | Analytics data, user interactions |
| Nature | Manual, qualitative evaluation | Quantitative, data-driven analysis |
| Key question | Does search return the right products? | How do users interact with results? |
| When to use | First, as a foundation | After search quality is reliable |

This separation helps keep the evaluation grounded. Search first needs to work as a system before it can be optimized as a performance channel.

The Three Criteria of Good On-Site Search

These three criteria are not evaluated in parallel. They follow a specific order.

The foundation is completeness. First, it is necessary to ensure that all relevant products are present in the results.

Once that baseline is reached, the focus shifts to ranking, or how those products are ordered.

Only after both are reliable does it make sense to look at relevance more strictly, refining which products should or should not appear at all.

Each step builds on the previous one, so the sequence matters.

Completeness

Completeness is met when all products relevant to a search query are present in the results.

This is the foundation of search quality. If relevant items are missing, the system cannot be considered reliable.

Evaluating completeness requires a strong understanding of the product catalog. It is necessary to know what should appear for a given query, including variations in naming, attributes, and categorization.

In practice, this means testing multiple queries and comparing the results against the actual assortment. The goal is to confirm that search consistently surfaces the full set of relevant products, not just a portion of them.

Different types of queries should be used during testing, for example:

  • product names;
  • SKUs;
  • product groups;
  • categories and subcategories;
  • product types;
  • combinations like product group plus attribute;
  • models;
  • brands.

Testing across these variations helps ensure that search works reliably for different ways users express intent, not just for a narrow set of queries.
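This kind of completeness testing can be sketched as a small script. The catalog, the naive keyword-matching `search()` function, and the expected SKU sets below are all hypothetical stand-ins for a real engine and real catalog knowledge; the point is the check itself, which reports any expected product missing from the results.

```python
# Completeness check: every expected product must appear in the results.
# CATALOG, search(), and the expected SKUs are illustrative placeholders.

CATALOG = [
    {"sku": "RS-01", "name": "Trail running shoes", "category": "running shoes"},
    {"sku": "RS-02", "name": "Road running sneakers", "category": "running shoes"},
    {"sku": "HB-01", "name": "Hiking boots", "category": "hiking"},
]

def search(query):
    """Naive keyword match over name and category (placeholder engine)."""
    terms = query.lower().split()
    return [p for p in CATALOG
            if all(t in (p["name"] + " " + p["category"]).lower() for t in terms)]

def check_completeness(query, expected_skus):
    """Return the set of expected SKUs missing from the results."""
    found = {p["sku"] for p in search(query)}
    return set(expected_skus) - found

# "running shoes" should surface both running products, including the one
# named "sneakers" -- naming variations are exactly what this test catches.
missing = check_completeness("running shoes", {"RS-01", "RS-02"})
print(missing)  # set() -> nothing missing
```

In practice, `search()` would call the live engine, and the expected sets would come from manual review of the assortment.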

Ranking

Ranking defines the order in which products appear in search results.

Once completeness is achieved, the focus shifts to how those results are organized. All relevant products may be present, but if the most relevant ones are buried lower in the list, the overall quality of search is still poor.

The goal of ranking is to ensure that products that best match the query appear at the top. Less relevant items can still be present, but they should not compete with stronger matches.

For example, for the query “black running shoes”:

  • black running shoes should appear at the top
  • running shoes in other colors may appear lower

All of these products can be relevant to some extent, but their position should reflect how closely they match the query.

Evaluating ranking follows the same approach as completeness. It requires testing queries and comparing the order of results against expectations based on product knowledge. The key question is not whether the right products exist in the list, but whether they appear in the right positions.

Well-functioning ranking makes search feel accurate immediately, without requiring users to scan or refine results.
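One way to spot-check ranking is to assert that the best-matching products occupy the leading positions. The sketch below assumes results come back as an ordered list of SKUs; the helper and the sample data for "black running shoes" are hypothetical.

```python
def check_ranking(result_skus, expected_top):
    """True if the best-matching SKUs fill the top positions, in any order."""
    k = len(expected_top)
    return set(result_skus[:k]) == set(expected_top)

# Hypothetical ordered results for the query "black running shoes":
results = ["BRS-01", "BRS-02", "WRS-01", "RRS-01"]  # black first, other colors after

print(check_ranking(results, {"BRS-01", "BRS-02"}))  # True: black shoes lead
print(check_ranking({"then": 0} and ["BRS-01", "WRS-01", "BRS-02"], {"BRS-01", "BRS-02"}))  # False: one black shoe buried
```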

Relevance

Relevance defines which products should not appear in the results.

After completeness and ranking are in place, the focus shifts to refining the result set by removing items that do not match the user’s intent. Even if a product shares keywords with the query, it does not necessarily mean it belongs in the results.

For example, for the query “charcoal”:

  • products like charcoal bags or briquettes are relevant
  • charcoal grills may appear due to keyword match, but they do not match the intent and should be excluded

This distinction is important because keyword matching alone often introduces noise. Without filtering irrelevant results, the search may technically return matching items but still feel inaccurate.

Evaluating relevance requires understanding what the user is actually looking for behind the query, not just how the query maps to product data. The goal is to keep the result set focused, so every product shown is a valid answer to the search.
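A relevance check works in the opposite direction: instead of looking for missing products, it flags results that should not be there. The per-query exclusion list below is a hypothetical, manually curated artifact for the "charcoal" example; real engines typically express this through synonym rules, negative boosts, or curations.

```python
# Flag results that match keywords but not intent.
# The exclusion list is a hypothetical manual curation step.
EXCLUDED = {
    "charcoal": {"GRILL-01"},  # charcoal grills match the keyword, not the intent
}

def irrelevant_results(query, result_skus):
    """Return SKUs that appear in results but are excluded for this query."""
    return [sku for sku in result_skus if sku in EXCLUDED.get(query, set())]

raw_results = ["COAL-01", "COAL-02", "GRILL-01"]  # raw keyword matches
print(irrelevant_results("charcoal", raw_results))  # ['GRILL-01'] should be excluded
```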

How to Evaluate Search Quality in Practice

The evaluation process starts with a fixed set of queries.

This list should cover different query types and reflect how users search across the catalog. It becomes a baseline for testing and should be reused consistently. Every time search settings are adjusted, the same queries are used to review how results change.

The next step is iterative testing.

Search engines usually provide multiple ways to tune results. The most impactful controls are typically related to fields and their values, such as product name, attributes, categories, or other structured data. Adjusting how these fields are weighted or interpreted directly affects completeness, ranking, and relevance.
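In engines such as Elasticsearch, field weighting is typically expressed as boosts on a multi-field query. The query body below is only an illustration: the field names assume a hypothetical catalog schema, and the boost values are not tuned recommendations.

```python
# Illustrative Elasticsearch query body: weight the product name above
# category, and category above description. Field names and boost values
# are assumptions about a hypothetical schema, not recommendations.
query_body = {
    "query": {
        "multi_match": {
            "query": "black running shoes",
            "fields": ["name^3", "category^2", "description"],
        }
    }
}

print(query_body["query"]["multi_match"]["fields"])
```

Changing these boosts and re-running the fixed query set is one of the most direct levers for shifting completeness, ranking, and relevance at once.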

The process is straightforward:

  • define a set of queries;
  • review results;
  • adjust search settings;
  • review results again.

This cycle is repeated until results consistently meet the three criteria.

The goal is not to rely on a single change, but to gradually shape the search system through controlled iterations, using the same queries as a reference point.
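The review loop above can be automated as a small regression harness: run the fixed query set after each settings change and report which queries fail which criterion. The `search()` stub, expectations, and SKUs below are placeholders for real engine calls and real catalog knowledge.

```python
# Minimal regression harness for a fixed query set.
# search(), the expectations, and all SKUs are illustrative placeholders.

FIXED_QUERIES = {
    # query: (expected SKUs, expected top SKUs, excluded SKUs)
    "running shoes": ({"RS-01", "RS-02"}, {"RS-01"}, set()),
    "charcoal": ({"COAL-01"}, {"COAL-01"}, {"GRILL-01"}),
}

def search(query):
    """Placeholder: in practice this would call the live search engine."""
    fake = {"running shoes": ["RS-01", "RS-02"],
            "charcoal": ["COAL-01", "GRILL-01"]}
    return fake.get(query, [])

def run_suite():
    """Return (query, failed_criterion) pairs for the fixed query set."""
    failures = []
    for q, (expected, top, excluded) in FIXED_QUERIES.items():
        results = search(q)
        if not expected <= set(results):
            failures.append((q, "completeness"))
        if not top <= set(results[:len(top)]):
            failures.append((q, "ranking"))
        if excluded & set(results):
            failures.append((q, "relevance"))
    return failures

print(run_suite())  # [('charcoal', 'relevance')] -- the grill still leaks in
```

An empty failure list after a tuning change means the change did not regress any query in the baseline set.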

What “Good Enough” Looks Like

Search is not something that can be fully completed and left unchanged.

Product catalogs evolve, new items are added, attributes change, and the way users search also shifts over time. In large catalogs, there can be thousands or even millions of unique queries. Because of that, search tuning is an ongoing process with constant opportunities for improvement.

At the same time, a practical baseline is needed.

For large catalogs, a useful rule is to consider search “good” at the first level when all testing queries consistently meet the three criteria: completeness, ranking, and relevance. If the predefined set of queries produces correct results, the system can be considered stable enough to move forward.

There are also edge cases that highlight limitations of search as a tool.

For example, a broad query like “shoes” may correctly return all relevant products. If the catalog contains 300 shoes, search returning all 300 is technically correct, but not useful. Users are unlikely to explore such a large result set.

In situations like this, search alone is not the right solution. Broad queries are better handled as structured landing pages with filters and navigation options.

This is why working with on-site search goes beyond tuning results. It requires continuous adjustments and, in some cases, rethinking how certain queries should be handled altogether.


About Me

Alla Vovnenko

eCommerce Mechanic

  • Let’s connect on LinkedIn

This blog is based on my experience building and working on eCommerce websites, from scratch to revenue-generating stores. I work across everything eCommerce involves, but especially love analytics, SEO, and conversion optimization, and have a soft spot for on-site search.
