Implementation of Elastic Search in Django

Introduction

In the first article, we looked at how Elasticsearch works under the hood.

In this article, we will implement Elasticsearch in a Django application.

This article is intended for readers who are already familiar with Django, so we will not explain the setup in depth or cover structures such as models and views.

Setup

Clone this repository into a folder of your choosing.

git clone git@github.com:robinmuhia/elasticSearchPOC.git .

We need three libraries, which abstract away much of the work needed to implement Elasticsearch:

django-elasticsearch-dsl==8.0
elasticsearch==8.0.0
elasticsearch-dsl==8.12.0

Create a virtual environment, activate it and install the dependencies listed in the requirements.txt file.

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Your project structure should now match the layout of the cloned repository.

Now we’re ready to go.

Understanding the project

Settings file

The project is a simple Django application. It has your usual setup structure.

In the config folder, we have our settings.py file.
For the purposes of this project, our Elasticsearch settings are simple, as shown below:

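import os  # at the top of settings.py; needed for os.getenv below

ELASTICSEARCH_DSL = {
    "default": {
        # Fall back to a local node when ELASTICSEARCH_URL is not set.
        "hosts": [os.getenv("ELASTICSEARCH_URL", "http://localhost:9200")],
    },
}
# Re-index a document as soon as its model instance is saved or deleted.
ELASTICSEARCH_DSL_SIGNAL_PROCESSOR = "django_elasticsearch_dsl.signals.RealTimeSignalProcessor"
ELASTICSEARCH_DSL_INDEX_SETTINGS = {}
ELASTICSEARCH_DSL_AUTOSYNC = True
ELASTICSEARCH_DSL_AUTO_REFRESH = True
ELASTICSEARCH_DSL_PARALLEL = False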

In a production-ready application, I would recommend using the CelerySignalProcessor. The RealTimeSignalProcessor re-indexes documents as soon as a change is made to a model, as part of the same request. The CelerySignalProcessor handles the re-indexing asynchronously, so our users do not experience added latency when they modify any of our models. You would have to set up Celery for this.
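One way to switch between the two processors per environment is sketched below. This is only an illustration: the USE_CELERY_INDEXING environment variable and the pick_signal_processor helper are assumptions made for this example and are not part of the repository. Celery itself still has to be configured separately.

```python
import os

REALTIME = "django_elasticsearch_dsl.signals.RealTimeSignalProcessor"
CELERY = "django_elasticsearch_dsl.signals.CelerySignalProcessor"


def pick_signal_processor(environ=os.environ):
    """Return the dotted path of the signal processor to use.

    USE_CELERY_INDEXING is a hypothetical flag invented for this sketch.
    """
    flag = environ.get("USE_CELERY_INDEXING", "").lower()
    return CELERY if flag in {"1", "true", "yes"} else REALTIME


# In settings.py, the result would replace the hard-coded string:
ELASTICSEARCH_DSL_SIGNAL_PROCESSOR = pick_signal_processor()
```

With the flag unset, the project keeps the real-time behaviour used in this article, so local development needs no Celery worker.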

Read more about the nuances of settings here.

Models

from django.db import models


class GenericMixin(models.Model):
    """Generic mixin to be inherited by all models."""

    id = models.AutoField(primary_key=True, editable=False, unique=True)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    class Meta:
        abstract = True
        # Most recently updated rows first.
        ordering = ("-updated_at", "-created_at")


class Country(GenericMixin):
    name = models.CharField(max_length=200)

    def __str__(self):
        return self.name


class Genre(GenericMixin):
    name = models.CharField(max_length=100)

    def __str__(self):
        return self.name


class Author(GenericMixin):
    name = models.CharField(max_length=200)

    def __str__(self):
        return self.name


class Book(GenericMixin):
    title = models.CharField(max_length=100)
    description = models.TextField()
    # Each related_name is the reverse accessor, e.g. genre.genres.all() returns books.
    genre = models.ForeignKey(Genre, on_delete=models.CASCADE, related_name="genres")
    country = models.ForeignKey(Country, on_delete=models.CASCADE, related_name="countries")
    author = models.ForeignKey(Author, on_delete=models.CASCADE, related_name="authors")
    year = models.IntegerField()
    rating = models.FloatField()

    def __str__(self):
        return self.title

The GenericMixin holds fields that every model should inherit. For a production application, I would recommend using a UUID as the primary key, but we will use a normal auto-incrementing integer field as it is easier for this project.

The models are fairly self-explanatory. We will be indexing and querying the Book model. Our goals are to search for a book by its title, description, genre and author, while also being able to filter by genre, author, year and rating.
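To make these goals concrete, here is a rough sketch of the kind of Elasticsearch request body they could translate to, written as a plain Python dict so it runs without a cluster. The build_book_query helper and its parameters are hypothetical and are not taken from the repository. The field paths assume genre is indexed as an object field and author as a nested field, which is how this project's documents.py maps them.

```python
import json


def build_book_query(text, genre=None, author=None, year=None, min_rating=None):
    """Build a bool query: scoring clauses in `should`, non-scoring ones in `filter`."""
    should = [
        # title, description and the object field genre.name share one multi_match.
        {"multi_match": {"query": text, "fields": ["title", "description", "genre.name"]}},
        # author is a nested field, so it has to be wrapped in a nested query.
        {"nested": {"path": "author", "query": {"match": {"author.name": text}}}},
    ]
    filters = []
    if genre is not None:
        filters.append({"match_phrase": {"genre.name": genre}})
    if author is not None:
        filters.append(
            {"nested": {"path": "author", "query": {"match_phrase": {"author.name": author}}}}
        )
    if year is not None:
        filters.append({"term": {"year": year}})
    if min_rating is not None:
        filters.append({"range": {"rating": {"gte": min_rating}}})
    return {"query": {"bool": {"should": should, "minimum_should_match": 1, "filter": filters}}}


# Sample values are purely illustrative.
body = build_book_query("desert planet", year=1965, min_rating=4.0)
print(json.dumps(body, indent=2))
```

Keeping the filters in the filter clause means they narrow the results without affecting the relevance score, which is driven only by the text match.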

Documents file

We have a documents.py file in the books folder.
This file is important and must be named documents.py, because django-elasticsearch-dsl looks for a module with that name in each installed app. Our documents will be written here. For our Book model, the code is shown below:

from django_elasticsearch_dsl import Document, fields
from django_elasticsearch_dsl.registries import registry

from elastic_search.books.models import Author, Book, Country, Genre


@registry.register_document
class BookDocument(Document):
    # Related models are indexed as sub-documents exposing only their name.
    genre = fields.ObjectField(
        properties={
            "name": fields.TextField(),
        }
    )
    country = fields.NestedField(
        properties={
            "name": fields.TextField(),
        }
    )
    author = fields.NestedField(
        properties={
            "name": fields.TextField(),
        }
    )

    class Index:
        name = "books"

    class Django:
        model = Book
        fields = [
            "title",
            "description",
            "year",
            "rating",
        ]

        # Changes to these models trigger a re-index of the affected books.
        related_models = [Genre, Country, Author]

    def get_queryset(self):
        # Fetch the related rows in the same query to avoid one query per book.
        return super().get_queryset().select_related("genre", "author", "country")

    def get_instances_from_related(self, related_instance):
        # Return the books that reference the changed related instance.
        if isinstance(related_instance, Genre):
            return related_instance.genres.all()
        elif isinstance(related_instance, Country):
            return related_instance.countries.all()
        elif isinstance(related_instance, Author):
            return related_instance.authors.all()
        else:
            return []

Import Statements:
We import necessary modules and classes from django_elasticsearch_dsl and our Django models.

Document Definition:
We define a BookDocument class which inherits from Document, provided by django_elasticsearch_dsl.

Registry Registration:
We register the BookDocument class with the registry using the @registry.register_document decorator. This tells the Elasticsearch DSL library to manage this document.

Index Configuration:
We specify the name of the Elasticsearch index for this document as “books”. This index name should be unique within the Elasticsearch cluster.

Django Model Configuration:
Under the Django class nested within BookDocument, we link the document to the Django model (Book) and specify which fields of the model should be indexed.

Fields Mapping:
Inside the BookDocument class, we define fields for the Elasticsearch document. These fields map to the fields in the Django model. The related models are indexed as sub-documents: genre is an object field, while country and author are nested fields.
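As a rough illustration of this mapping, the sketch below builds the shape a single book would have in the index. It uses plain dataclasses as stand-ins for the Django models so that it runs on its own. The to_document helper is hypothetical and is not the library's serializer.

```python
from dataclasses import dataclass


@dataclass
class Named:
    """Stand-in for Genre, Country and Author, which only expose a name."""
    name: str


@dataclass
class Book:
    """Stand-in for the Book model (not a Django model)."""
    title: str
    description: str
    year: int
    rating: float
    genre: Named
    country: Named
    author: Named


def to_document(book):
    """Mirror BookDocument: four plain fields plus three related sub-documents."""
    doc = {f: getattr(book, f) for f in ("title", "description", "year", "rating")}
    for related in ("genre", "country", "author"):
        # Only `name` is listed under `properties`, so only `name` is indexed.
        doc[related] = {"name": getattr(book, related).name}
    return doc
```

The resulting dict has the same shape for object and nested fields. The difference is on the Elasticsearch side: nested values are stored as separate hidden documents and have to be searched with a nested query, while object values can be searched directly through a dotted path such as genre.name.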

Related Models Handling:
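We list the related models (Genre, Country, Author) whose data is indexed along with the Book model. When an instance of one of these models changes, the library needs to know which books to re-index, so for each related model we define how to retrieve the affected Book instances. Which fields of a related model are indexed is controlled by the properties of the corresponding document field.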

Queryset Configuration:
We override the get_queryset method to specify how the queryset should be retrieved. In this case, we use select_related to fetch related objects efficiently.

Instances from Related:
We define the get_instances_from_related method to handle instances from related models. This method is used to retrieve instances related to the main model for indexing purposes.
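The branching in get_instances_from_related can be tried out without a database using the sketch below. The classes here are simplified stand-ins invented for this example, not the Django models: each one only carries a fake reverse manager under the same related_name used in models.py.

```python
class RelatedManager:
    """Stand-in for a Django reverse manager; supports only .all()."""

    def __init__(self, books):
        self._books = list(books)

    def all(self):
        return list(self._books)


class Genre:
    def __init__(self, books):
        self.genres = RelatedManager(books)  # related_name="genres"


class Country:
    def __init__(self, books):
        self.countries = RelatedManager(books)  # related_name="countries"


class Author:
    def __init__(self, books):
        self.authors = RelatedManager(books)  # related_name="authors"


def get_instances_from_related(related_instance):
    """Same branching as BookDocument.get_instances_from_related."""
    if isinstance(related_instance, Genre):
        return related_instance.genres.all()
    elif isinstance(related_instance, Country):
        return related_instance.countries.all()
    elif isinstance(related_instance, Author):
        return related_instance.authors.all()
    return []


# Renaming this author would mean both of their books need re-indexing.
author = Author(books=["Book A", "Book B"])
books_to_reindex = get_instances_from_related(author)
```

Returning an empty list for unknown types keeps the method safe if it is ever called with a model that is not listed in related_models.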
