Django¶
In Identify Distribution Strategy we discussed the framework-agnostic database changes required for using Citus in the multi-tenant use case. Here we investigate specifically how to migrate multi-tenant Django applications to a Citus storage backend with the help of the django-multitenant library.
This process will be in 5 steps:
Introducing the tenant column to models missing it that we want to distribute
Changing the primary keys of distributed tables to include the tenant column
Updating the models to use the
TenantModelMixin
Distributing the data
Updating the Django Application to scope queries
Preparing to scale-out a multi-tenant application¶
Initially you’ll start with all tenants placed on a single database node. To be able to scale out django, some simple changes will have to be made to your models.
Let’s consider this simplified model:
from django.utils import timezone
from django.db import models
class Country(models.Model):
name = models.CharField(max_length=255)
class Account(models.Model):
name = models.CharField(max_length=255)
domain = models.CharField(max_length=255)
subdomain = models.CharField(max_length=255)
country = models.ForeignKey(Country, on_delete=models.SET_NULL)
class Manager(models.Model):
name = models.CharField(max_length=255)
account = models.ForeignKey(Account, on_delete=models.CASCADE,
related_name='managers')
class Project(models.Model):
name = models.CharField(max_length=255)
account = models.ForeignKey(Account, related_name='projects',
on_delete=models.CASCADE)
managers = models.ManyToManyField(Manager)
class Task(models.Model):
name = models.CharField(max_length=255)
project = models.ForeignKey(Project, on_delete=models.CASCADE,
related_name='tasks')
The tricky thing with this pattern is that in order to find all tasks for an account, you’ll have to query for all of an account’s project first. This becomes a problem once you start sharding data, and in particular when you run UPDATE or DELETE queries on nested models like task in this example.
1. Introducing the tenant column to models belonging to an account¶
1.1 Introducting the column to models belonging to an account
In order to scale out a multi-tenant model, it’s essential for queries to quickly locate all records that belong to an account. Consider an ORM call such as:
Project.objects.filter(account_id=1).prefetch_related('tasks')
It generates these underlying SQL queries:
SELECT *
FROM myapp_project
WHERE account_id = 1;
SELECT *
FROM myapp_task
WHERE project_id IN (1, 2, 3);
However the second query would go faster with an extra filter:
-- the AND clause identifies the tenant
SELECT *
FROM myapp_task
WHERE project_id IN (1, 2, 3)
AND account_id = 1;
This way you can easily query the tasks belonging to one account.
The easiest way to achieve this is to simply add a account_id
column on every object that belongs to an account.
In our case:
class Task(models.Model):
name = models.CharField(max_length=255)
project = models.ForeignKey(Project, on_delete=models.CASCADE,
related_name='tasks')
account = models.ForeignKey(Account, related_name='tasks',
on_delete=models.CASCADE)
Create a migration to reflect the change: python manage.py makemigrations
.
1.2. Introduce a column for the account_id on every ManyToMany model that belongs to an account
The goal is the same as previously. We want to be able to have ORM calls and queries routed to one account. We also want to be able to distribute the ManyToMany relationship related to an account on the account_id.
So the calls generated by:
Project.objects.filter(account_id=1).prefetch_related('managers')
Can include in their WHERE
clause the account_id
like this:
SELECT *
FROM "myapp_project" WHERE "myapp_project"."account_id" = 1;
SELECT *
FROM myapp_manager manager
INNER JOIN myapp_projectmanager projectmanager
ON (manager.id = projectmanager.manager_id
AND projectmanager.account_id = manager.account_id)
WHERE projectmanager.project_id IN (1, 2, 3)
AND manager.account_id = 1;
For that we need to introduce through
models. In our case:
class Project(models.Model):
name = models.CharField(max_length=255)
account = models.ForeignKey(Account, related_name='projects',
on_delete=models.CASCADE)
managers = models.ManyToManyField(Manager, through='ProjectManager')
class ProjectManager(models.Model):
project = models.ForeignKey(Project, on_delete=models.CASCADE)
manager = models.ForeignKey(Manager, on_delete=models.CASCADE)
account = models.ForeignKey(Account, on_delete=models.CASCADE)
Create a migration to reflect the change: python manage.py makemigrations
.
2. Include the account_id in all primary keys and unique constraints¶
Primary-key and unique constraints on values other than the tenant_id will present a problem in any distributed system, since it’s difficult to make sure that no two nodes accept the same unique value. Enforcing the constraint would require expensive scans of the data across all nodes.
To solve this problem, for the models which are logically related to an account (the tenant for our app), you should add account_id to the primary keys and unique constraints, effectively scoping objects unique inside a given account. This helps add the concept of tenancy to your models, thereby making the multi-tenant system more robust.
2.1 Including the account_id to primary keys
Django automatically creates a simple “id” primary key on models, so we will need to circumvent that behavior with a custom migration of our own. Run python manage.py makemigrations appname --empty --name remove_simple_pk
, and edit the result to look like this:
from django.db import migrations
class Migration(migrations.Migration):
dependencies = [
# leave this as it was generated
]
operations = [
# Django considers "id" the primary key of these tables, but
# we want the primary key to be (account_id, id)
migrations.RunSQL("""
ALTER TABLE myapp_manager
DROP CONSTRAINT myapp_manager_pkey CASCADE;
ALTER TABLE myapp_manager
ADD CONSTRAINT myapp_manager_pkey
PRIMARY KEY (account_id, id);
"""),
migrations.RunSQL("""
ALTER TABLE myapp_project
DROP CONSTRAINT myapp_project_pkey CASCADE;
ALTER TABLE myapp_project
ADD CONSTRAINT myapp_product_pkey
PRIMARY KEY (account_id, id);
"""),
migrations.RunSQL("""
ALTER TABLE myapp_task
DROP CONSTRAINT myapp_task_pkey CASCADE;
ALTER TABLE myapp_task
ADD CONSTRAINT myapp_task_pkey
PRIMARY KEY (account_id, id);
"""),
migrations.RunSQL("""
ALTER TABLE myapp_projectmanager
DROP CONSTRAINT myapp_projectmanager_pkey CASCADE;
ALTER TABLE myapp_projectmanager
ADD CONSTRAINT myapp_projectmanager_pkey PRIMARY KEY (account_id, id);
"""),
]
2.2 Including the account_id to unique constraints
The same thing needs to be done for UNIQUE
constraints. You can have explicit constraints that you might have set in your model with unique=True
or unique_together
like:
class Project(models.Model):
name = models.CharField(max_length=255, unique=True)
account = models.ForeignKey(Account, related_name='projects',
on_delete=models.CASCADE)
managers = models.ManyToManyField(Manager, through='ProjectManager')
class Task(models.Model):
name = models.CharField(max_length=255)
project = models.ForeignKey(Project, on_delete=models.CASCADE,
related_name='tasks')
account = models.ForeignKey(Account, related_name='tasks',
on_delete=models.CASCADE)
class Meta:
unique_together = [('name', 'project')]
For these constraints, you can simply change in the models the constraints:
class Project(models.Model):
name = models.CharField(max_length=255)
account = models.ForeignKey(Account, related_name='projects',
on_delete=models.CASCADE)
managers = models.ManyToManyField(Manager, through='ProjectManager')
class Meta:
unique_together = [('account', 'name')]
class Task(models.Model):
name = models.CharField(max_length=255)
project = models.ForeignKey(Project, on_delete=models.CASCADE,
related_name='tasks')
account = models.ForeignKey(Account, related_name='tasks',
on_delete=models.CASCADE)
class Meta:
unique_together = [('account', 'name', 'project')]
Then generate the migration with:
python manage.py makemigrations
Some UNIQUE
constraints are created by the ORM and you will need to explicitily drop them.
This is the case for OneToOneField
and ManyToMany
fields.
For these cases you will need to: 1. Find the constraints 2. Do a migration to drop them 3. Re-create constraints including the account_id field
To find the constraints, connect to your database using psql
and run \d+ myapp_projectmanager
You will see the ManyToMany
(or OneToOneField
) constraint:
"myapp_projectmanager" UNIQUE CONSTRAINT myapp_projectman_project_id_manager_id_bc477b48_uniq,
btree (project_id, manager_id)
Drop this constraint in a migration:
from django.db import migrations
class Migration(migrations.Migration):
dependencies = [
# leave this as it was generated
]
operations = [
migrations.RunSQL("""
DROP CONSTRAINT myapp_projectman_project_id_manager_id_bc477b48_uniq;
"""),
Then change your models to have a unique_together
including the account\_id
class ProjectManager(models.Model):
project = models.ForeignKey(Project, on_delete=models.CASCADE)
manager = models.ForeignKey(Manager, on_delete=models.CASCADE)
account = models.ForeignKey(Account, on_delete=models.CASCADE)
class Meta:
unique_together=(('account', 'project', 'manager'))
And finally apply the changes by creating a new migration to generate these constraints:
python manage.py makemigrations
3. Updating the models to use TenantModelMixin and TenantForeignKey¶
Next, we’ll use the django-multitenant library to add account_id to foreign keys, and make application queries easier later on.
In requirements.txt for your Django application, add
django_multitenant>=2.0.0, <3
Run pip install -r requirements.txt
.
In settings.py, change the database engine to the customized engine provied by django-multitenant:
'ENGINE': 'django_multitenant.backends.postgresql'
3.1 Introducing the TenantModelMixin and TenantManager
The models will now not only inherit from models.Model
but also from the TenantModelMixin
.
To do that in your models.py
file you will need to do the following imports
from django_multitenant.mixins import *
Previously our example models inherited from just models.Model, but now we need
to change them to also inherit from TenantModelMixin. The models in real
projects may inherit from other mixins too like django.contrib.gis.db
,
which is fine.
You will also, at this point, introduce the tenant_id to define which column is the distribution column.
class TenantManager(TenantManagerMixin, models.Manager):
pass
class Account(TenantModelMixin, models.Model):
...
tenant_id = 'id'
objects = TenantManager()
class Manager(TenantModelMixin, models.Model):
...
tenant_id = 'account_id'
objects = TenantManager()
class Project(TenantModelMixin, models.Model):
...
tenant_id = 'account_id'
objects = TenantManager()
class Task(TenantModelMixin, models.Model):
...
tenant_id = 'account_id'
objects = TenantManager()
class ProjectManager(TenantModelMixin, models.Model):
...
tenant_id = 'account_id'
objects = TenantManager()
3.2 Handling ForeignKey constraints
For ForeignKey
and OneToOneField
constraint, we have a few different cases:
Foreign keys (or One to One) between distributed tables, for which you should use the
TenantForeignKey
(orTenantOneToOneField
).Foreign keys between a distributed table and a reference table don’t require a change.
Foreign keys between a distributed table and a local table, which require to drop the constraint by using
models.ForeignKey(MyModel, on_delete=models.CASCADE, db_constraint=False)
.
Finally your models should look like this:
from django.db import models
from django_multitenant.fields import TenantForeignKey
from django_multitenant.mixins import *
class Country(models.Model): # This table is a reference table
name = models.CharField(max_length=255)
class TenantManager(TenantManagerMixin, models.Manager):
pass
class Account(TenantModelMixin, models.Model):
name = models.CharField(max_length=255)
domain = models.CharField(max_length=255)
subdomain = models.CharField(max_length=255)
country = models.ForeignKey(Country, on_delete=models.SET_NULL) # No changes needed
tenant_id = 'id'
objects = TenantManager()
class Manager(TenantModelMixin, models.Model):
name = models.CharField(max_length=255)
account = models.ForeignKey(Account, related_name='managers',
on_delete=models.CASCADE)
tenant_id = 'account_id'
objects = TenantManager()
class Project(TenantModelMixin, models.Model):
account = models.ForeignKey(Account, related_name='projects',
on_delete=models.CASCADE)
managers = models.ManyToManyField(Manager, through='ProjectManager')
tenant_id = 'account_id'
objects = TenantManager()
class Task(TenantModelMixin, models.Model):
name = models.CharField(max_length=255)
project = TenantForeignKey(Project, on_delete=models.CASCADE,
related_name='tasks')
account = models.ForeignKey(Account, on_delete=models.CASCADE)
tenant_id = 'account_id'
objects = TenantManager()
class ProjectManager(TenantModelMixin, models.Model):
project = TenantForeignKey(Project, on_delete=models.CASCADE)
manager = TenantForeignKey(Manager, on_delete=models.CASCADE)
account = models.ForeignKey(Account, on_delete=models.CASCADE)
tenant_id = 'account_id'
objects = TenantManager()
3.3 Handling ManyToMany constraints
In the second section of this article, we introduced the fact that with citus, ManyToMany
relationships require a through
model with the tenant column. Which is why we have the model:
class ProjectManager(TenantModelMixin, models.Model):
project = TenantForeignKey(Project, on_delete=models.CASCADE)
manager = TenantForeignKey(Manager, on_delete=models.CASCADE)
account = models.ForeignKey(Account, on_delete=models.CASCADE)
tenant_id = 'account_id'
objects = TenantManager()
After installing the library, changing the engine, and updating the models, run
python manage.py makemigrations
. This will produce a migration to make the foreign keys composite when necessary.
4. Distribute data in Citus¶
We need one final migration to tell Citus to mark tables for distribution. Create a new migration python manage.py makemigrations appname --empty --name distribute_tables
. Edit the result to look like this:
from django.db import migrations
class Migration(migrations.Migration):
dependencies = [
# leave this as it was generated
]
operations = [
migrations.RunSQL(
"SELECT create_distributed_table('myapp_account','id')"
),
migrations.RunSQL(
"SELECT create_distributed_table('myapp_manager','account_id')"
),
migrations.RunSQL(
"SELECT create_distributed_table('myapp_project','account_id')"
),
migrations.RunSQL(
"SELECT create_distributed_table('myapp_projectmanager','account_id')"
),
migrations.RunSQL(
"SELECT create_distributed_table('myapp_task','account_id')"
),
]
With all the migrations created from the steps so far, apply them to the database with python manage.py migrate
.
At this point the Django application models are ready to work with a Citus backend. You can continue by importing data to the new system and modifying controllers as necessary to deal with the model changes.
Updating the Django Application to scope queries¶
The django-multitenant library discussed in the previous section is not only useful for migrations, but for simplifying application queries. The library allows application code to easily scope queries to a single tenant. It automatically adds the correct SQL filters to all statements, including fetching objects through relations.
For instance, in a controller simply set_current_tenant
and all the queries or joins afterward will include a filter to scope results to a single tenant.
# set the current tenant to the first account
s = Account.objects.first()
set_current_tenant(s)
# now this count query applies only to Project for that account
Project.objects.count()
# Find tasks for very import projects in the current account
Task.objects.filter(project__name='Very important project')
In the context of an application controller, the current tenant object can be stored as a SESSION variable when a user logs in, and controller actions can set_current_tenant
to this value. See the README in django-multitenant for more examples.