The Problem with SQL's LIKE
If you've ever tried to build a search feature using a SQL database, you probably reached for the LIKE operator: SELECT * FROM articles WHERE content LIKE '%search term%';. While this works for simple cases, it quickly falls apart:
- It's Slow: A leading wildcard (%...) prevents the database from using an ordinary B-tree index, forcing a full table scan that gets slower as the table grows.
- It's Not "Smart": It has no concept of relevance. A document that mentions the search term once is treated the same as one that mentions it 20 times.
- It's Inflexible: It can't handle typos (search temr), synonyms (search word), or different forms of a word (plurals, verb tenses like searching vs. searched).
To solve these problems, you need a dedicated full-text search engine.
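The limitations above can be mimicked in a few lines of JavaScript. This is only a sketch: the `articles` array and the `likeContains` helper are hypothetical stand-ins for a table and a `LIKE '%...%'` filter, not real database code.

```javascript
// Toy rows standing in for an `articles` table (hypothetical data).
const articles = [
  { id: 1, content: 'A search term appears once here.' },
  { id: 2, content: 'search term, search term, and again: search term.' },
  { id: 3, content: 'This row is about searching for terms.' },
];

// Rough equivalent of: WHERE content LIKE '%<needle>%'
// Every row is inspected (a "full scan") and the result carries no ranking.
// Case sensitivity of LIKE varies by database; this sketch is case-sensitive.
function likeContains(rows, needle) {
  return rows.filter((row) => row.content.includes(needle));
}

console.log(likeContains(articles, 'search term').map((r) => r.id)); // [ 1, 2 ]
console.log(likeContains(articles, 'search temr').map((r) => r.id)); // []
```

Rows 1 and 2 come back with no indication that row 2 mentions the term three times, row 3 is missed because it uses different word forms, and the typo returns nothing at all.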
What is Elasticsearch?
Elasticsearch is a distributed, open-source search and analytics engine built on top of a library called Apache Lucene. It's designed from the ground up to solve the problems listed above.
Think of it like the index card catalog in a library, but on steroids. Instead of just looking up book titles, it analyzes the entire content of every book, understands the language, and can instantly find not just exact matches, but the most relevant passages, even if you make a spelling mistake.
Core Concepts
- Document: A JSON object that represents a single piece of data you want to make searchable (e.g., a product, a user profile, a log entry).
- Index: A collection of documents with a similar structure. It's roughly analogous to a table in a SQL database.
- Inverted Index: This is the secret sauce. Instead of a regular index that maps a Document ID to its content, an inverted index maps each word to a list of Document IDs where that word appears. When you search for a word, Elasticsearch looks it up in this massive dictionary and gets the matching documents directly, without scanning the content of every document.
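To make the inverted index idea concrete, here is a minimal in-memory sketch. It is a toy illustration, not how Lucene stores its index on disk; the function names `buildInvertedIndex` and `lookup` and the sample documents are made up for this example.

```javascript
// Build a toy inverted index: term -> Set of document IDs.
function buildInvertedIndex(docs) {
  const index = new Map();
  for (const [id, text] of Object.entries(docs)) {
    // Naive tokenization: lowercase, then split on non-alphanumeric characters.
    const terms = text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
    for (const term of terms) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(id);
    }
  }
  return index;
}

// Look up a single term and return the matching document IDs, sorted.
function lookup(index, term) {
  return [...(index.get(term.toLowerCase()) ?? [])].sort();
}

const docs = {
  doc1: 'The quick brown fox',
  doc2: 'A quick laptop',
  doc3: 'The lazy dog',
};
const index = buildInvertedIndex(docs);

console.log(lookup(index, 'quick')); // [ 'doc1', 'doc2' ]
console.log(lookup(index, 'dog'));   // [ 'doc3' ]
```

A lookup is a single map access on the term, so its cost does not depend on how much text each document contains.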
The Two-Step Process: Indexing and Searching
Working with Elasticsearch involves two main actions.
1. Indexing: Sending Your Data to Elasticsearch
Before you can search for data, you have to put it into an Elasticsearch index. This process is called indexing. During this step, Elasticsearch runs each text field through an analyzer. Depending on the analyzer configured for the field, analysis can include:
- Tokenization: Breaks text down into individual words (tokens). "The quick brown fox" -> [the, quick, brown, fox].
- Lowercasing: Converts all tokens to lowercase.
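- Stop Word Removal: Removes common words like the, a, is. The default standard analyzer does not remove stop words unless you configure it to.
- Stemming: Reduces words to their root form. searching, searched, searches all become search. This requires a stemming analyzer such as the built-in english analyzer; the default standard analyzer does not stem.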
This analysis is what makes the search so "smart" and flexible.
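The four steps can be sketched as a small pipeline. This is a toy illustration only: `STOP_WORDS`, `toyStem`, and `analyze` are hypothetical names, and the suffix-stripping stemmer is far cruder than the stemmers Elasticsearch ships with.

```javascript
// A tiny stop word list, taken from the examples in the text.
const STOP_WORDS = new Set(['the', 'a', 'is']);

// Toy stemmer: strips a few common suffixes if at least 3 characters remain.
// Real stemmers apply many more rules and exceptions.
function toyStem(token) {
  for (const suffix of ['ing', 'ed', 'es', 's']) {
    if (token.endsWith(suffix) && token.length - suffix.length >= 3) {
      return token.slice(0, -suffix.length);
    }
  }
  return token;
}

function analyze(text) {
  return text
    .split(/[^A-Za-z0-9]+/)            // tokenization
    .filter(Boolean)                   // drop empty tokens
    .map((t) => t.toLowerCase())       // lowercasing
    .filter((t) => !STOP_WORDS.has(t)) // stop word removal
    .map(toyStem);                     // stemming
}

console.log(analyze('The quick brown fox'));           // [ 'quick', 'brown', 'fox' ]
console.log(analyze('Searching, searched, searches')); // [ 'search', 'search', 'search' ]
```

Because the same function runs over both the stored text and the query text, a query for "searches" and a document containing "searching" end up with the same token.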
Example of indexing a document using cURL:
Bash
curl -X PUT "localhost:9200/products/_doc/1" -H 'Content-Type: application/json' -d'
{
  "name": "Super Fast Laptop",
  "description": "A very quick laptop for all your development and gaming needs.",
  "price": 1299.99,
  "in_stock": true
}
'
2. Searching: Querying Your Index
Once your data is indexed, you can run queries against it. Elasticsearch provides a rich JSON-based Query DSL (Domain-Specific Language). The most common query is the match query, which performs a full-text search on a field.
Example of a search query using cURL:
Bash
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "description": "quick develop"
    }
  }
}
'
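The match query analyzes the query string with the field's analyzer and, by default, returns documents that contain any of the resulting terms. With the default analyzer, the laptop document matches on "quick"; "develop" also matches "development" only if the description field uses a stemming analyzer such as english. The results are returned sorted by a relevance score (_score), which reflects how well each document matches the query.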
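The match query also accepts options that change how terms are combined and whether typos are tolerated. The sketch below only builds request bodies as plain objects; `buildMatchQuery` is a hypothetical helper, while `operator` and `fuzziness` are standard match query options.

```javascript
// Builds a search request body containing a single `match` query.
// `field` and `text` come from the caller; `options` is merged into the query.
function buildMatchQuery(field, text, options = {}) {
  return {
    query: {
      match: {
        [field]: { query: text, ...options },
      },
    },
  };
}

// Require every term to be present instead of the default "any term" behavior.
const strict = buildMatchQuery('description', 'quick laptop', { operator: 'and' });

// Tolerate small typos such as "quik" through edit-distance matching.
const fuzzy = buildMatchQuery('description', 'quik laptop', { fuzziness: 'AUTO' });

console.log(JSON.stringify(strict, null, 2));
console.log(JSON.stringify(fuzzy, null, 2));
```

Either object can be sent as the JSON body of a _search request in place of the body used in the cURL example.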
Code Snippet: Using the Elasticsearch Client in Node.js
JavaScript
// Uses ES module syntax and the 8.x client API; run as an .mjs file or with "type": "module".
import { Client } from '@elastic/elasticsearch';

// --- Setup ---
const client = new Client({ node: 'http://localhost:9200' });

async function runExample() {
  const indexName = 'products';

  // --- 1. Index a document ---
  console.log('Indexing a document...');
  await client.index({
    index: indexName,
    id: '1',
    document: {
      name: 'Super Fast Laptop',
      description: 'A very quick laptop for all your development needs.',
      price: 1299.99
    }
  });

  // Force a refresh so the new document is visible to search right away
  await client.indices.refresh({ index: indexName });

  // --- 2. Search for the document ---
  console.log('Searching for documents...');
  const { hits } = await client.search({
    index: indexName,
    query: {
      match: {
        // "quick" matches as-is; "develop" matches "development" only if
        // the field uses a stemming analyzer (the default analyzer does not stem)
        description: 'quick develop'
      }
    }
  });

  console.log('Search results:');
  console.log(hits.hits); // The 'hits' array contains the search results
}

runExample().catch(console.error);
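For stemming to apply to the description field, the index needs an explicit mapping that selects a stemming analyzer before any documents are indexed. The following is a sketch under assumptions: `buildProductsIndexBody` is a hypothetical helper, the field types are guesses based on the sample document, and `english` is one of Elasticsearch's built-in language analyzers.

```javascript
// Request body for creating the index with an explicit mapping.
// Field names follow the sample product document; the types are assumptions.
function buildProductsIndexBody() {
  return {
    mappings: {
      properties: {
        name: { type: 'text' },
        // The english analyzer lowercases, removes English stop words, and stems.
        description: { type: 'text', analyzer: 'english' },
        price: { type: 'float' },
        in_stock: { type: 'boolean' },
      },
    },
  };
}

// Hypothetical usage with the client from the snippet above. It has to run
// before the first document is indexed, because the analyzer of an existing
// field cannot be changed afterwards:
// await client.indices.create({ index: 'products', ...buildProductsIndexBody() });

console.log(JSON.stringify(buildProductsIndexBody(), null, 2));
```

The same object can be sent with cURL as the body of a PUT request to localhost:9200/products.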