Painless Conversion of a MySQL App to Multi-Tenant

Let me first start by saying that the technique described below was not invented by me. I found inspiration from blogs and forums and have refined the process over the years. I am sorry that I cannot refer to the original source as the bookmarks have been lost to the sands of time. There are many ways to architect a multi-tenant application. This is just one way that met my needs, but it might not meet yours. I would love to hear your solutions to multi-tenant architecture in the comments or email.

This post refers to MySQL, but all of the queries should work equally well in the wonderful MariaDB.

What is a “Multi-Tenant” Application?

A multi-tenant application is an application for which a single deployment of the software serves separate and distinct tenants with each tenant’s data and existence is kept securely invisible to other tenants.

If you ran a store called DogMart and created a web application for managing your inventory and point-of-sale system, then you have a single-tenant application.  This is true even if all of your employees have separate logins to this system or possibly even if you had multiple DogMart locations using the application.  Users and locations are not what we are referring to when we say ‘tenants’.  A tenant typically refers to a group of users, a business, or a business unit, in this case, the DogMart business.

If you knew the folks running CatMart and BirdMart needed the same kind of software you might want to sell them your inventory software.  Maybe it’s time to sell DogMart and run PetSoft selling this inventory management software.  Now you need to support three tenants and you need to make sure tenants can’t see other tenants data or potentially even know other tenants exist.  You could set up different servers and databases to run a copy of the software for DogMart, CatMart, and BirdMart, but this can be a real challenge to maintain when this scales to dozens, hundreds, or thousands of tenants.

Multi-tenant applications solve this by using a single deployment of the application to serve separate customers where the data is securely partitioned where each tenant can only see and modify their own data.  BirdMart.com, CatMart.com, and DogMart.com might be pointed at the same application server, but show radically different content which is made possible by this data separation.

The challenge with multi-tenant is designing a system that actually solves the complexity issues you’re trying to avoid with multiple single-tenant deployments.  This article describes one such pattern for MySql multi tenant.

Background

Many years ago I was faced with the challenge to take a legacy web application intended for one customer (tenant) and turn it into a multi-tenant SaaS application. The app itself was large, but not particularly complex. It used around 60 tables in a MySQL database and did not use views, triggers, or stored procedures. At the start, the application was accessible using a single domain. The request was to allow each “tenant” to access the software from a different domain without any sharing of data.

The most obvious solution was to simply have a distinct instance of the database for each tenant. As I was going to be the only engineer maintaining these databases I shivered at the thought of having to make a database change to hundreds or thousands of databases during an upgrade. Not to mention, having to maintain all those backups. Instead, I decided to keep the single database, but to segment the data such that one tenant couldn’t see another tenant’s records.

The tricky part was ensuring that it would not be possible (within reason) for one tenant to see another tenant’s data and, most importantly, not have to modify every SQL statement in the app to make it happen. As it worked out, I was able to convert the application to multi-tenant in two days. It’s been running this way for several years with almost no change to this logic since.

In the end, if we had two tenants with the hostnames www.tenant_a.com and www.tenant_b.com they can both be pointed to (CNAME records for) app.example.com and both show completely different data despite using the same database and app servers.

A Solution Design Pattern

The chosen solution involves the following:

  • Creating a database user for each tenant
  • Renaming every table to a different and unique name (e.g. using a prefix ‘someprefix_’)
  • Adding a text column called ‘id_tenant’ to every table to store the name of the tenant the row belongs to
  • Creating a trigger for each table to automatically store the current database username to the id_tenant column before inserting a new row
  • Creating a view for each table with the original table name with all the columns except id_tenant. The view will only return rows where (id_tenant = current_database_username)
  • Only grant permission to the views (not tables) to each tenant’s database user

Then, the only part of the application that needs to change is the database connection logic. When someone connects to the SaaS, the application would need to:

  • Determine the hostname
  • Lookup the correct tenant database username for that particular hostname using some non-database method (e.g. a datafile or a separate database/cache)
  • Connect to the database as that tenant-specific username

What this means is that the application doesn’t really know or need to know that you’re multi-tenant. Once the app has it’s DB connection it no longer cares which tenant it is. If your application did a ‘SELECT * FROM invoices” query before the change, it can do so after the change despite the fact that the real invoices table could contain other tenant’s data. In this case, it’s because ‘invoices’ is now a VIEW of the original table renamed to someprefix_invoices and it’s automagically filtering out other tenant’s data.

Pseudo-Code Examples

Let’s walk through a pseudo-code example of the scripts necessary to convert to this pattern. This is written such that it can be re-run each time a schema change is made. I actually run it each and every time I update the software regardless if there is a schema change. All of the code in this post is intentionally language neutral. You will need to write the real code in your language of choice.

First, we need a main loop to find actually tables (not views) and make sure they are converted to our pattern. Fortunately, MariaDB/MySQL provides a meta-structure to inspect your schema to distinguish between tables and views.

main(schema) {
query('SHOW FULL TABLES');
for (queryResults as tableInfo) {
// If it's an actual table, not a view
if (tableType == 'BASE_TABLE') {
// Check to see if the tableName has our special prefix
if (tableName starts with 'someprefix_') {
// we've already converted this table previously
viewName = sub string of tableName to remove the prefix;
} else {
// need to convert this table
viewName = tableName;
tableName = "someprefix_{tableName}";
renameTable(viewName, tableName);
}
addTenantColumnIfMIssing(schema, tableName);
createOrReplaceTrigger(tableName);
createOrReplaceView(viewName, tableName);
}
}
}

Now let’s break down those important worker functions that do the heavy lifting.

The first one is pretty easy. Renaming a table is pretty straightforward. This won’t get called if the table is already named with the prefix.

function renameTable(oldTableName, newTableName) {
query("RENAME TABLE {oldTableName} TO {newTableName}");
}

Next, there is the function to make sure that every table has an ‘id_tenant’ field. Remember, this is a text field that will contain the tenant-specific database user id each row belongs to. This is written to only add the column if it doesn’t already exist.

function addTenantColumnIfMissing(schema, realTableName) {
if (!columnExists(schema, realTableName, 'id_tenant')) {
query("
ALTER TABLE {realTableName}
ADD COLUMN id_tenant VARCHAR(64)
NOT NULL DEFAULT 'NULL'
");
}
}
function columnExists(schema, table, column) {
query("
SELECT *
FROM information_schema.COLUMNS
WHERE
TABLE_SCHEMA = '{schema}'
AND TABLE_NAME = '{table}'
AND COLUMN_NAME = '{column}'
");
return true if row returned
}

Next, here is the function to make sure each table has a trigger to automatically populate the id_tenant column before each row insert. In MariaDB/MySQL database users look like email addresses, therefore we need to extract only the user part before the ‘@’. Note that if you’re logged into the database as ‘root’ you will be responsible for setting the id_tenant field appropriately.

function createOrReplaceTrigger(realTableName) {
triggerName = "bi_{realTableName}";
query("DROP TRIGGER IF EXISTS {triggerName}");
query("
CREATE TRIGGER {triggerName}
BEFORE INSERT ON {realTableName}
FOR EACH ROW BEGIN
IF ('root' != SUBSTRING_INDEX(USER(), '@', 1)) THEN
SET NEW.id_tenant = SUBSTRING_INDEX(USER(), '@', 1);
END IF;
END;
");
}

Finally, we need to create a view that will filter out other tenant’s data. This is the key to segmenting each tenant’s data. Also note that the view uses the id_tenant column to do the filtering, but this column is left out of the view. Since the tenant database user only has access to the views they are unable to change the value of the id_tenant column and it is auto-populated by the trigger when new rows are inserted.

function createOrReplaceView(viewName, tableName) {
// generate a list of columns on the table for the view, skipping id_tenant
columnNames = [];
query("SHOW COLUMNS FROM {tableName}");
for (columns as columnInfo) {
fieldName = columnInfo['Field'];
if (fieldName != 'id_tenant') {
columnNames.push("{fieldName} AS {fieldName}");
}
}
// join column names into one comma separated string
fieldList = implode(', ', columnNames);
query("
CREATE OR REPLACE VIEW {viewName}
AS
SELECT {fieldList}
FROM {tableName}
WHERE (id_tenant = SUBSTRING_INDEX(USER( ),'@',1))
");
}

That takes care of the database conversion, but now we need to create all those tenant database users and ensure they only have access to our views. Below is a pseudo-code function to create a new tenant database user and set the appropriate permissions.

function createOrUpdateTenantDBUser(schema, user, pass) {
// group all tables/views into two buckets, allowed & banned
allowedTables = [];
bannedTables = [];
query('SHOW FULL TABLES');
foreach (tables as tableInfo) {
tableName = tableInfo[0];
if (tableType != 'BASE_TABLE') {
// this is a view, it's ok
allowedTables.push(tableName);
} else {
// this is a table, which is not ok
bannedTables.push(tableName);
}
}
// create the user. You should make this soft-fail and continue
// that way the GRANTs and REVOKEs below will execute on the existing user
query("CREATE USER {user} IDENTIFIED BY '{pass}'");
// grant them access
query("GRANT LOCK TABLES ON {schema}.* TO {user}");
for (allowedTables as table) {
query("GRANT SELECT, UPDATE, INSERT, DELETE ON {schema}.{table} TO {user}");
}
// out of paranoia, revoke all perms on the actual tables
for (bannedTables as table) {
query("REVOKE ALL ON {schema}.{table} FROM {user}");
}
// make sure the database uses the latest perms
query("FLUSH PRIVILEGES");
}

There you have it. Using the pattern above you can convert many existing single-tenant applications to multi-tenant in a matter of hours. This pattern is useful in cases where there is an existing codebase that would be too time-intensive to take other approaches. Here are some pros/cons to think about:

Pros:

  • Single database instance to maintain and backup for all tenants
  • Almost no change to application code
  • Easy to scale-up
  • If many tenants have a small amount of data this is significaly more efficient than having a separate database per tenant.

Cons:

  • Hard to scale-out. DB connection pooling logic could get complex over time. You might need to assign specific app servers to specific tenants to avoid individual app servers needing too many different DB connections for each tenant.
  • AUTO INCREMENT counters are shared across all tenants. Tenant “A” might have invoices 1, 2, and 4, while tenant “B” has invoice 3 as they are all rows in the same base table.
  • You cannot restore a specific tenant’s data from a backup without impacting all tenants. You have one backup for all tenants and the sequence numbers could collide if you try to get creative.
  • You are ultimately limited by how far you can scale-up MariaDB/MySQL for a single database.

There are several enhancements to this pattern that can help with more complex applications:

  • use another prefix for tables that should not be converted and should all for direct read-only access. For example, use the ‘common_’ prefix for a read-only table containing US zip codes and cities.
  • use another prefix for tables that should not be converted and should not be visible to any tenant database user id. For example, you could use a prefix of ‘hidden_’ for tables used by an internal administration app to add/remove tenants.
  • use a special id_tenant value to represent shared rows. For example, the views could filter down to all rows where id_tenant is USER() OR ‘common’. The trick here is to ensure through other means that tenants can’t modify those rows.
  • Adding a unique and unusual prefix to your actual table names can also have a security side-benefit. If the prefix is never expected as real data you can check web query data for the existence of the prefix in requests. If the prefix is suitably unique it should never be part of a valid HTTP request, therefore if it does appear in a request it’s probably malicious. Another note, with MySQL/MariaDB apps, it’s probably a good idea to also test HTTP request parameters for ‘information_schema’ as this internal is often exploited by hackers.

As I mentioned, I have been using this pattern for several years. I would love to hear suggestions for improvements, alternative design patterns, and feedback if this is useful to you.

I am available for consulting engagements and custom software development projects.  You can reach me at rob@opensource.io.

About the Author

Before joining OSA, Rob had a sixteen year career as a Senior Software Engineer and Manager at Texas Instruments, Inc. His extensive experience with enterprise web, mobile, and embedded solutions are helping our customers succeed in the digital economy. Rob can be reached at rob@opensource.io

  • Abhijit says:

    Great article. Thanks for sharing your experience.
    In article, you specifically mention about database not having stored procs/triggers. Does the solution approach change if db already has stored procs? Any suggestions for such scenario?

    • Rob Jenks says:

      I don’t think it would be an issue other than you would need to audit and test based on your situation. The main issue is to make sure that these stored procedures and triggers are all called within the context of a MySQL user based on the appropriate tenant. My note about this was more to make sure you tested them after the conversion to make sure you don’t end up with records without a proper tenant id or mixing tenant data inadvertently.

  • Art says:

    It is nice to hear there are others using such approach… 🙂

    We have used this in the past for a more complex multitenant app. We had to give up due to the cons you have mentioned plus high mysql load caused by views usage.

    We haved now moved the separation to database connector class and we are using pregenerated triggers for inserts / deletes / updates based on session variables.

  • >