Implementing sharding in a multitenant Rails application

Jun 9, 2018 · 8 minute read · Comments
postgres scaling

As Enectiva, an energy management solution, grows, the amount of data we need to store grows as well. We’re definitely not in terabytes but some of the tables are becoming pretty hefty so we decided to implement sharding. Our biggest pain point are pre-calculated consumption data so we naturally started there.

Basic concept

Due to the nature of the data and the access patterns, we decided to go with traditional sharding by customer. We didn’t want to go to the extreme of creating separate tables for each customer and instead opted for grouping customers into a fixed number of shards. This means that each customer’s data lives only in one shard but one shard contains data of multiple customers.

Objectives

split large tables into many small ones, making management and ops tasks easier and faster
make it easy for the application to select the appropriate shard for any given operation
keep everything in the same database (for now)
gradually transition from a non-sharded storage to a sharded storage without downtime

Considerations and decisions

The data had already been split by different types of objects (meters, buildings etc.) and different energies resulting in 18 tables. Since we wanted to keep the data in the same database we looked into performance impacts of a large number of tables in Postgres. There doesn’t seem to be any specific rule but the general sense is don’t create millions of tables and whine about poor performance.

Number of shards

Postgres generally doesn’t care about the number of tables and doesn’t impose any hard limit. However, each table means some bookkeeping in tables in pg_catalog. This is where Postgres keeps information about all the user defined schemas, tables, indexes, functions etc. Therefore more tables mean that every time Postgres looks up a table to see whether it can perform a query on it, it will take slightly more time. Every extra table leads to additional minuscule penalty for almost all queries.

Another consideration which came up in our research was to plan for the future when we might want to split the load between multiple databases (running on different server instances). At that point, it would be nice to have roughly the same load on each of them and an easy proxy for load is the number of shards. Going from one instance to two should mean migrating half of the shards to the new instance, so we probably should have an even number of them. OK, let’s imagine it works and the data grows, what do we do when it reaches unreasonable values again? We want to split it once more. We can either go from two to three but that would mean an uneven split or a need to re-shuffle the shards (probably incurring some unwanted downtime). Let’s split it in half again. And again, and again… This plan leads to starting with a number of shards which is a power of two; the process of changing underlying physical mapping while keeping the logical mapping is nicely described in an article by Craig Kerstiens. In the end, we decided to go with 256 shards.

Organization

A tiny practical nugget was to leverage Postgres schemas, i.e. making each shard a separate schema. The main benefit is that schemas are physically separated even in a single database. A side benefit while everything lives in a single database is that most GUIs respect schemas so we don’t end up with a gigantic list of tables in public.

Shard assignment and selection

Ensuring each shard could be selected easily was a choice we initially struggled with. In the earliest stages, we toyed with the idea of sharding by time (but it wouldn’t fit our access patterns) so we explored the built-in partitioning support in Postgres. It is getting significantly better (especially Postgres 10 includes quite a few improvements in the automated management of partitioning) but there is one requirement we don’t satisfy: Postgres can partition (shard) data only based on columns which are part of the sharded table (and to get performance benefits a query needs to include a condition on this column so only one shard is searched). However, we don’t store the customer association directly in the affected table and adding customer ID to the table would mean quite a lot of useless duplicated data. Alternatively, we could shard based on an existing column such as meter ID. This would be OK initially but once we separate the shards to physically different instances, data of one customer would be scattered all over. There are quite a few queries which operate on different tables of one customer so preserving co-location on this data is beneficial.

With all this, the decision is clear: application level sharding based on customer, each shard is a Postgres schema and we have a clear path for changing the physical organization while keeping the logical structure. For the first step, we decided to have a naive algorithm for shard selection customer_id % 256. Of course, this can put multiple large customers in a single shard while grouping several small customers into another. Still better than one huge pile and the algorithm can be improved later so let’s count it as an improvement.

Transition path

The last objective was to achieve gradual transition without a downtime. With all the previous decisions, the path we had to take was rather clear. The data in question is derivative (basically a cache) with a pretty high write activity. The last time we changed it significantly, the volume was so small we could afford to throw it away and recalculate everything from scratch within minutes. This is no longer an option, we needed to keep the old storage format while preparing the new one and then switch between them.

The first step was straightforward: create the shards. We use Rails migrations for managing the structure of the database so the code was pretty easy and just few dozen lines of code. Still, it generated over four thousand tables and structure.sql (file in Rails with SQL to recreate the database structure) grew by roughly ten thousand lines making for a giant diff.¹

Then we needed to populate the new storage with data. We considered jump starting this by running a series of INSERT INTO shard SELECT * FROM nonshard WHERE id % 256 = 0 but because the data is changing all the time and not all changes are predictable (or tracked in a useful way) we decided to re-calculate the whole history bringing the new storage up to date. Then we could run tests, verify everything is correct and eventually start switching over.

This need for gradually changing behaviour is a classic use case for feature flags. We prepared them with the code capable handling all steps in the transition (writing to & reading from only the old structure, writing to both & reading from the new, writing to both & reading from the new, writing to & reading from only the new structure). Due to the organization of data, there were more flags in reality split across two applications. Because some of the calculations are memory intensive, we had migrated them from our Rails codebase to workers in Go some time ago. This works well but it made this transition slightly more complex. We invested quite a bit of time into ensuring the flags are easy to operate (default to the old, safe behaviour) and check for nonsensical combinations. We didn’t end up running into any problems but it was comforting to have a safety net. Also, we made sure that the shard selection lives only in the Rails application and the workers get the shard as an argument.

Feature flags require quite a bit of code and the affected code paths multiply. In this situation, most components had only two alternatives but we had a component with four different paths through it. It was very tiresome to write tests for all of them but it paid off and the commit which deleted most of them after the transition was glorious.

D-Day

The actual transition took several days because we throttled the volume of data being calculated in order to keep the application running (under higher load but still operational). The transition went according to plan. We didn’t run into any issues with the feature flags and the alternative code paths. We ran into two edge cases on staging but those had been fixed by the time code went to production.

The recalculation served as a load test for our whole infrastructure; there were fires that had to be put down during the whole process. We expected it and took it as an opportunity to see how far we can push our code and where are the deficiencies. It’s much better to find them when you’re expecting something to happen than to hit them during a normal operation. We ended up with a list of various improvements to make Enectiva resilient and performant.

Sharded data organization works great and we have observed performance improvements in both access and management operations.

We tried to split the structure.sql into multiple files based on the Postgres schema in order to keep the public less noisy. Unfortunately, Rails puts all schemas into one file which is hardcoded in ActiveRecord. ↩︎

We're looking for developers to help us save energy

If you're interested in what we do and you would like to help us save energy, drop us a line at jobs@enectiva.cz.

Štěpán Pilař

Developing Enectiva.cz