Building a Fast, Typo-Tolerant Search Engine in Python with Typesense

Typesense is a fast, open-source, typo-tolerant search engine that is easy to set up and use with Python.
You define collections with schemas, then index documents in batches to get low-latency, highly relevant search results.
Features like typo tolerance, faceting, and relevance weights let you fine-tune the user search experience.
With a simple Flask API, you can turn your Typesense setup into a real-world search service for your app.

What Is Typesense?

Typesense is a fast, open-source search engine that delivers relevant, typo-tolerant results for full-text search. Unlike many other search engines, it is designed to be easy to use without giving up robust capabilities.

It's a great option for e-commerce platforms, content-rich websites, and real-time search apps that need to give users a strong, typo-tolerant search experience.

What Makes Typesense Unique?

Although Elasticsearch and Algolia are popular alternatives, Typesense stands out for being simple to set up, offering very fast searches, and providing typo tolerance right out of the box. It can be deployed in containers or used as a hosted service (Typesense Cloud), which makes it easy to get started in different environments.

Why Use Python with Typesense?

Python is a great choice for creating apps because of its simplicity and adaptability, and Typesense's Python client facilitates smooth integration.

If you want quick search functionality in a Python project, whether it's in a data processing script, Django, or Flask, Typesense provides:

Low-latency searches: Typesense is optimized for speed.
Typo tolerance: It handles typos and partial word matches.
Relevance tuning: Configuring how results are ranked is simple.

How to Begin Using Typesense?

Let's get started by configuring Typesense and integrating it with Python. We will start the Typesense server first and then connect to it from a Python project.

Setting Up Typesense Locally

The quickest way to run Typesense locally is via Docker. Run the following command to start a Typesense container:

docker run -p 8108:8108 -v /typesense-data:/data typesense/typesense:27.1 --data-dir /data --api-key=xyz --enable-cors

Using the API key xyz, this command launches a Typesense server on localhost:8108. The image tag is pinned to a specific release (27.1 is used here as an example), so check Docker Hub for the current version. In production, don't forget to substitute a secure key for xyz.
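Before moving on, it can help to confirm that the container is actually up. The sketch below polls the server's /health endpoint using only the standard library. It assumes the host and port from the Docker command above, and the helper names (health_url, parse_health, is_healthy) are made up for this example.

```python
import json
import urllib.error
import urllib.request


def health_url(host: str = "localhost", port: int = 8108, protocol: str = "http") -> str:
    """Build the URL of the Typesense health endpoint."""
    return f"{protocol}://{host}:{port}/health"


def parse_health(body: bytes) -> bool:
    """Return True if the health response body reports {"ok": true}."""
    try:
        return json.loads(body).get("ok") is True
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object
        return False


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if the server answers the health check, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return parse_health(response.read())
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status
        return False


if __name__ == "__main__":
    print("Typesense healthy:", is_healthy(health_url()))
```

If this prints False right after starting the container, waiting a second or two and retrying is usually enough.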

Installing the Typesense Python Client

Install the typesense-python client to interact with Typesense from Python:

pip install typesense

Connecting to Typesense with Python

With the Typesense server running, you can connect to it from your Python script or app.

import typesense

client = typesense.Client({
    'nodes': [{
        'host': 'localhost',
        'port': '8108',
        'protocol': 'http'
    }],
    'api_key': 'xyz',
    'connection_timeout_seconds': 2
})

This code initializes a Typesense client connected to your local server. Now, you are ready to create collections and index data.
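Hard-coding the host and API key is fine for a local experiment, but outside of one you will probably want to read them from the environment. Here is a minimal sketch of that idea. The TYPESENSE_* variable names are hypothetical, not something the client library defines, and the defaults simply mirror the local setup used in this post.

```python
import os
from typing import Mapping, Optional


def build_client_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build a Typesense client configuration from environment variables.

    The TYPESENSE_* variable names are hypothetical; use whatever your
    deployment defines.
    """
    env = os.environ if env is None else env
    return {
        "nodes": [{
            "host": env.get("TYPESENSE_HOST", "localhost"),
            "port": env.get("TYPESENSE_PORT", "8108"),
            "protocol": env.get("TYPESENSE_PROTOCOL", "http"),
        }],
        # Falls back to the placeholder key for local development only
        "api_key": env.get("TYPESENSE_API_KEY", "xyz"),
        "connection_timeout_seconds": 2,
    }


if __name__ == "__main__":
    config = build_client_config({"TYPESENSE_HOST": "search.internal"})
    print(config["nodes"][0])
```

The resulting dictionary has the same shape as the one passed to typesense.Client above, so it can be used as typesense.Client(build_client_config()).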

Creating and Managing Collections

Typesense stores data in collections. Each collection has a schema that specifies its fields and their types, which lets Typesense build efficient indexes for search.

Defining a Collection Schema

Let's create a products collection for an online store. We will define four fields: title, description, price, and categories:

product_schema = {
    "name": "products",
    "fields": [
        {"name": "title", "type": "string"},
        {"name": "description", "type": "string"},
        {"name": "price", "type": "float"},
        {"name": "categories", "type": "string[]", "facet": True}
    ]
}

client.collections.create(product_schema)

This schema includes a title, description, price, and a categories field for faceting (filtering by categories).
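One practical wrinkle: running the create call a second time fails because the collection already exists. A small guard, sketched below, lists the existing collections first and only creates the one that is missing. ensure_collection is a hypothetical helper name, and the sketch assumes client.collections.retrieve() returns a list of collection definitions that each carry a "name" key.

```python
def ensure_collection(client, schema: dict) -> bool:
    """Create the collection unless one with the same name already exists.

    Returns True if the collection was created, False if it was already there.
    """
    # Assumes each listed collection is a dict with a "name" key
    existing_names = {collection["name"] for collection in client.collections.retrieve()}
    if schema["name"] in existing_names:
        return False
    client.collections.create(schema)
    return True
```

Calling ensure_collection(client, product_schema) in place of the direct create call makes a setup script safe to re-run.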

Adding and Updating Records

With our schema set, we can add records to the collection:

products = [
    {"title": "Wireless Earbuds", "description": "High-quality earbuds", "price": 29.99, "categories": ["electronics"]},
    {"title": "Smart Watch", "description": "Feature-rich smartwatch", "price": 99.99, "categories": ["wearables", "electronics"]}
]

client.collections['products'].documents.import_(products, {'action': 'upsert'})

This code imports product data and performs an "upsert" action, adding or updating records as needed.
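A bulk import can succeed as a whole while individual documents are rejected, so it is worth inspecting the return value. The sketch below assumes that, for a list input, import_ returns one result dictionary per document with a "success" flag and, on failure, an "error" message. failed_imports is a hypothetical helper, and the sample data is hand-written for illustration rather than real server output.

```python
from typing import Iterable, List


def failed_imports(import_results: Iterable[dict]) -> List[dict]:
    """Return the per-document results that did not report success."""
    return [result for result in import_results if not result.get("success")]


if __name__ == "__main__":
    # Hand-written sample in the assumed response shape, not real server output
    sample = [
        {"success": True},
        {"success": False, "error": "illustrative error message"},
    ]
    for failure in failed_imports(sample):
        print("Import failed:", failure.get("error"))
```

In the example above, you would pass the value returned by import_ straight into failed_imports and log or retry whatever comes back.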

Performing Searches

Now, let’s search our products collection for items related to "watch."

results = client.collections['products'].documents.search({
    'q': 'watch',
    'query_by': 'title,description',
    'facet_by': 'categories'
})

print(results)

With Typesense, you can search in several fields (in this case, title and description) and use facets to narrow down the results.
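Printing the raw response is fine for a first look, but most code only needs the matching documents and the facet counts. The helpers below pull those out. They assume the usual response layout ("hits" containing a "document" each, and "facet_counts" entries with "field_name" and "counts"). The function names are made up for this sketch and the sample response is hand-written.

```python
from typing import Dict, List


def hit_titles(results: dict) -> List[str]:
    """Return the title of every matching document, in ranked order."""
    return [hit["document"]["title"] for hit in results.get("hits", [])]


def facet_counts(results: dict, field: str) -> Dict[str, int]:
    """Return {facet value: count} for one faceted field."""
    for facet in results.get("facet_counts", []):
        if facet.get("field_name") == field:
            return {entry["value"]: entry["count"] for entry in facet.get("counts", [])}
    return {}


if __name__ == "__main__":
    # Hand-written sample in the assumed response shape, not real server output
    sample = {
        "found": 1,
        "hits": [{"document": {"title": "Smart Watch", "price": 99.99}}],
        "facet_counts": [{
            "field_name": "categories",
            "counts": [{"value": "electronics", "count": 1},
                       {"value": "wearables", "count": 1}],
        }],
    }
    print(hit_titles(sample))
    print(facet_counts(sample, "categories"))
```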

Advanced Search Features

Typesense offers robust search customization options:

Typo Tolerance: Built-in typo tolerance returns results even when the query contains small spelling errors.
Relevance Tuning: Adjust which fields carry more weight in the search to improve relevance.
Faceting and Filtering: Add faceting to fields such as categories so users can filter results by category.

For example, adding relevance tuning is as simple as:

results = client.collections['products'].documents.search({
    'q': 'watch',
    'query_by': 'title,description',
    'query_by_weights': '3,1'  # title is weighted more heavily than description
})
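Filtering and typo tolerance are controlled through search parameters in the same way. The sketch below combines them in one place using the filter_by, num_typos, and sort_by parameters. build_search_params is a hypothetical helper, and it assumes category values are simple words without commas or other special characters that would need escaping in a filter expression.

```python
from typing import Optional


def build_search_params(query: str,
                        category: Optional[str] = None,
                        max_price: Optional[float] = None,
                        num_typos: int = 2) -> dict:
    """Build search parameters with optional category and price filters."""
    params = {
        "q": query,
        "query_by": "title,description",
        "query_by_weights": "3,1",
        "num_typos": num_typos,  # maximum typos tolerated per query word
        "sort_by": "_text_match:desc,price:asc",  # best match first, then cheapest
    }
    filters = []
    if category is not None:
        filters.append(f"categories:={category}")
    if max_price is not None:
        filters.append(f"price:<={max_price}")
    if filters:
        # Filter clauses are combined with a logical AND
        params["filter_by"] = " && ".join(filters)
    return params


if __name__ == "__main__":
    print(build_search_params("watch", category="electronics", max_price=150))
```

The returned dictionary can be passed directly to documents.search. Setting num_typos to 0 turns typo tolerance off for queries that must match exactly.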

Tips for Performance Optimization

As you build more complex applications, it’s important to consider optimizations:

Batch Importing: Adding documents in batches with import_ is more efficient than indexing records one at a time.
Index Tuning: Design your schema carefully, indexing and faceting only the fields you actually need.
Caching: If the same queries are issued frequently, consider caching their results to reduce search load.
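The batch-importing tip can be sketched as follows: split a large iterable of documents into fixed-size chunks and upsert each chunk with import_. Both helper names are hypothetical, and the batch size of 1000 is an arbitrary starting point rather than a recommended value. documents_api stands for client.collections['products'].documents.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(items: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def import_in_batches(documents_api, items: Iterable[dict], size: int = 1000) -> List[dict]:
    """Upsert items batch by batch and collect the per-document results."""
    results: List[dict] = []
    for batch in chunked(items, size):
        # One request per batch instead of one request per document
        results.extend(documents_api.import_(batch, {"action": "upsert"}))
    return results
```

Because chunked works on any iterable, the documents can come from a generator that streams rows out of a database or file, so the full dataset never has to sit in memory at once.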

Real-World Example: Building a Search API

To illustrate Typesense in action, let’s create a basic Flask API for product search.

from flask import Flask, request, jsonify
import typesense


app = Flask(__name__)

client = typesense.Client({
    'nodes': [{'host': 'localhost', 'port': '8108', 'protocol': 'http'}],
    'api_key': 'xyz',
    'connection_timeout_seconds': 2
})


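@app.route('/search')
def search():
    # Reject requests that omit the query instead of passing None to Typesense
    query = request.args.get('q', '').strip()
    if not query:
        return jsonify({'error': "Missing required query parameter 'q'"}), 400

    search_parameters = {
        'q': query,
        'query_by': 'title,description',
        'facet_by': 'categories'
    }

    results = client.collections['products'].documents.search(search_parameters)
    return jsonify(results)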


if __name__ == '__main__':
    app.run()

This straightforward Flask application exposes a /search endpoint that accepts a query in the q parameter and returns the Typesense results as JSON.
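A real endpoint usually also needs pagination and some input validation. One way to keep that out of the route handler is a small pure function that turns the raw query-string values into search parameters, as sketched below. parse_search_args and MAX_PER_PAGE are hypothetical names, and the cap of 250 results per page is an assumption you should check against your server's limit.

```python
from typing import Mapping

MAX_PER_PAGE = 250  # assumed upper bound; adjust to your server's limit


def parse_search_args(args: Mapping[str, str]) -> dict:
    """Turn raw query-string values into Typesense search parameters.

    Raises ValueError if the query is missing or a number is malformed.
    """
    query = (args.get("q") or "").strip()
    if not query:
        raise ValueError("missing required parameter 'q'")
    page = int(args.get("page", 1))
    per_page = int(args.get("per_page", 10))
    return {
        "q": query,
        "query_by": "title,description",
        "facet_by": "categories",
        "page": max(page, 1),  # pages are 1-based
        "per_page": min(max(per_page, 1), MAX_PER_PAGE),
    }


if __name__ == "__main__":
    print(parse_search_args({"q": "watch", "page": "2", "per_page": "20"}))
```

Inside the Flask view, you could call parse_search_args(request.args), pass the result to documents.search, and answer with a 400 status when it raises ValueError. Since the function only needs a mapping, it can be unit-tested with plain dictionaries.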

Final Thoughts

Typesense makes it simple to add search functionality to your project. Its typo tolerance, speed, and relevance tuning make it a good option for applications that need fast, accurate search.

Frequently Asked Questions

What is a schema in Typesense?

A schema in Typesense defines the structure of data in a collection, specifying the fields each document will have and the data type of each field. This helps Typesense optimize search functionality and relevance. By defining a schema, you can tailor how Typesense indexes and searches your data. For example, a schema for a product collection might include fields like:

String fields (e.g., title and description) for text data that users will search.
Numeric fields (e.g., price) for filtering and sorting by number values.
Array fields (e.g., categories) for storing lists of tags or categories and enabling faceted search.

A schema also underpins relevance tuning: the fields it declares can be weighted more heavily at search time, which helps keep Typesense's results relevant to user queries.