Multi-tenant Hive data warehouse
2024-07-11
1. Concept
Multi-tenancy is the counterpart of single-tenancy. This article focuses on multi-tenancy; single-tenancy is covered only to help explain it.
1.1 Multi-tenancy
Multi-tenancy is a software architecture technique, widely used in SaaS (Software as a Service). It lets many users share the same system or program components while keeping each user's data isolated. The users here are generally enterprises. Put simply, a single application instance runs on a server and serves multiple tenants (customers). The definition leads to a short summary: multi-tenancy is an architecture in which multiple users run the same set of programs while their data stays isolated from one another.
1.2 Single Tenant
A single-tenant architecture differs from a multi-tenant one in that it gives each customer its own instance of the software application and its own supporting environment. Single-tenant SaaS is widely used when customers need customized applications, either because of regional requirements or because they need tighter security control. In the single-tenant model, each customer has its own database and operating system. These run either on a separate server or in a virtual network environment isolated by strong security measures.
1.3 The difference between single-tenant and multi-tenant
- Security control level differs. A multi-tenant database stores data from multiple independent tenants. Even with security isolation in place, its level of security control is still lower than that of a single-tenant system. A single tenant has an independent software and hardware environment, and its database stores only that tenant's data, so the risk of leaking data to other tenants is ruled out at the technical level. For this reason, the single-tenant architecture is sometimes the better fit for industries with strict security-control or legal-compliance requirements.
- Data backup complexity differs. A single tenant has an independent database, so backing up and restoring that customer's database is straightforward. Multiple tenants, however, share a database, and their data is both isolated and shared. As a result, the system cannot easily produce an independent daily backup for each tenant (enterprise).
- Control over upgrade timing differs. A multi-tenant system is cheap to maintain because an upgrade only has to be applied once. Operators do not need to update each customer separately, which saves a great deal of operations and maintenance cost. This works well when all customers use the system in the same way. However, every tenant is upgraded at the same moment, so an upgrade that lands during a particular enterprise's busy period will inevitably affect that enterprise's users.
Application scenario: multi-tenancy is well suited to multiple companies (or departments) within the same group (or company). Even if data leaks between tenants, it stays inside the organization and is not exposed to outside parties.
2. Multi-tenant data isolation solution
- Independent database
- Shared database, independent schema
- Shared database, shared schema, shared data table
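The three options differ in where the tenant boundary sits when a table is read. The sketch below illustrates this and is not a Hive API. The helper `scoped_query`, the database names `db_<tenant>` and `warehouse`, and the `tenant_id` column are all hypothetical.

```python
from enum import Enum


class IsolationMode(Enum):
    """The three data isolation solutions listed above."""
    INDEPENDENT_DATABASE = 1   # one database per tenant
    SHARED_DB_OWN_SCHEMA = 2   # one database, one schema per tenant
    SHARED_TABLE = 3           # one database, one schema, tenant_id column


def scoped_query(mode: IsolationMode, tenant: str, table: str) -> tuple[str, str, tuple]:
    """Return (target database, SQL, parameters) for reading `table` as `tenant`.

    Hypothetical helper: "db_<tenant>", "warehouse" and "tenant_id" are
    illustrative names, not part of any real system.
    """
    if not tenant.isidentifier():
        # Tenant names are spliced into SQL identifiers, so reject anything odd.
        raise ValueError(f"invalid tenant name: {tenant!r}")
    if mode is IsolationMode.INDEPENDENT_DATABASE:
        # Isolation comes from connecting to a different database.
        return f"db_{tenant}", f"SELECT * FROM {table}", ()
    if mode is IsolationMode.SHARED_DB_OWN_SCHEMA:
        # Same database; the schema name qualifies the table.
        return "warehouse", f"SELECT * FROM {tenant}.{table}", ()
    # Same database and schema; every query must filter on tenant_id.
    return "warehouse", f"SELECT * FROM {table} WHERE tenant_id = ?", (tenant,)
```

Moving down the list, isolation shifts from the connection, to the table name, to a `WHERE` clause that every query must remember to include.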
2.1 Independent Database
This is the first solution: one database per tenant. It offers the highest level of user data isolation and the best security, but at a higher cost.
- Advantages: Giving each tenant an independent database makes it simpler to extend the data model and meet each tenant's unique needs. If a failure occurs, restoring the data is relatively simple.
- Disadvantages: More database installations are required, which raises maintenance and purchase costs.
This solution resembles the traditional model of one customer, one set of data, one deployment. The only difference is that the operator deploys the software centrally. For tenants such as banks and hospitals that require a very high level of data isolation, this model is a good choice and justifies a higher rental price. For low-priced products, operators generally cannot afford this solution.
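Here is a minimal sketch of the one-database-per-tenant routing. SQLite in-memory databases stand in for what would really be separate database instances. The class name, the tenant names and the `orders` table are hypothetical.

```python
import sqlite3


class PerTenantDatabases:
    """One database per tenant (solution 1), sketched with SQLite.

    Each tenant gets a private in-memory database; a real deployment would
    use separate database instances instead (an assumption of this sketch).
    """

    def __init__(self) -> None:
        self._conns: dict[str, sqlite3.Connection] = {}

    def connect(self, tenant: str) -> sqlite3.Connection:
        # Lazily create the tenant's private database on first use.
        if tenant not in self._conns:
            conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit
            conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
            self._conns[tenant] = conn
        return self._conns[tenant]

    def backup(self, tenant: str) -> sqlite3.Connection:
        # Backing up one tenant copies exactly one database and nothing else.
        target = sqlite3.connect(":memory:")
        self.connect(tenant).backup(target)
        return target


dbs = PerTenantDatabases()
dbs.connect("bank").execute("INSERT INTO orders (amount) VALUES (100.0)")
dbs.connect("hospital").execute("INSERT INTO orders (amount) VALUES (7.5)")

# Each tenant sees only the rows stored in its own database.
bank_rows = dbs.connect("bank").execute("SELECT amount FROM orders").fetchall()
hospital_rows = dbs.connect("hospital").execute("SELECT amount FROM orders").fetchall()
print(bank_rows, hospital_rows)  # [(100.0,)] [(7.5,)]
```

The `backup` method shows why recovery is simple in this model: a tenant's backup is just a copy of its own database.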
2.2 Shared database, independent schema
This is the second solution: multiple or all tenants share one database, but each tenant has its own schema (also called a user). In underlying databases such as DB2 or Oracle, one database can contain multiple schemas.
- Advantages: It gives tenants with higher security requirements a degree of logical data isolation, though not complete isolation. Each database can also support more tenants.
- Disadvantages: If a failure occurs, data recovery is difficult because restoring the database also affects other tenants' data. Producing cross-tenant statistics is also harder.
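The sketch below models one schema per tenant inside a single shared database. It uses SQLite's `ATTACH DATABASE`, which exposes each attached database under a schema name, as a stand-in for DB2/Oracle schemas. In Hive itself, the `SCHEMA` and `DATABASE` keywords are interchangeable. The schema names and the `orders` table are hypothetical, and SQLite caps the number of attachments (10 by default), so this only illustrates the idea.

```python
import sqlite3

# One shared connection; each tenant's schema is an attached database.
conn = sqlite3.connect(":memory:", isolation_level=None)

TENANTS = ["tenant_a", "tenant_b"]  # hypothetical tenant schema names
for schema in TENANTS:
    conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")
    conn.execute(f"CREATE TABLE {schema}.orders (id INTEGER PRIMARY KEY, amount REAL)")

conn.execute("INSERT INTO tenant_a.orders (amount) VALUES (10.0), (20.0)")
conn.execute("INSERT INTO tenant_b.orders (amount) VALUES (5.0)")

# Per-tenant queries are isolated by qualifying the table with the schema.
a_total = conn.execute("SELECT SUM(amount) FROM tenant_a.orders").fetchone()[0]

# Cross-tenant statistics need one branch per schema, stitched with UNION ALL.
union_sql = " UNION ALL ".join(
    f"SELECT '{s}' AS tenant, amount FROM {s}.orders" for s in TENANTS
)
per_tenant = conn.execute(
    f"SELECT tenant, SUM(amount) FROM ({union_sql}) GROUP BY tenant ORDER BY tenant"
).fetchall()
print(a_total, per_tenant)  # 30.0 [('tenant_a', 30.0), ('tenant_b', 5.0)]
```

The `UNION ALL` query must be regenerated whenever a tenant is added. That is one concrete reason cross-tenant statistics are awkward in this model.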
2.3 Shared database, shared schema, shared data table
This is the third solution: tenants share the same database and schema, and each table gets a TenantID field that identifies the owning tenant. This mode has the highest degree of sharing and the lowest level of isolation.
In other words, every inserted row must carry a customer (tenant) identifier. That identifier is how different customers' data is told apart within the same table.
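A minimal sketch of the shared-table model follows, with SQLite standing in for the warehouse. The hypothetical `TenantScopedOrders` wrapper stamps `tenant_id` on every insert and filters on it in every read. Application code therefore cannot forget the identifier. The table, column and tenant names are assumptions.

```python
import sqlite3


class TenantScopedOrders:
    """Data access object bound to one tenant (solution 3).

    Hypothetical wrapper: every INSERT stamps tenant_id and every SELECT
    filters on it, so callers never handle the identifier themselves.
    """

    def __init__(self, conn: sqlite3.Connection, tenant_id: str) -> None:
        self.conn = conn
        self.tenant_id = tenant_id

    def add(self, amount: float) -> None:
        self.conn.execute(
            "INSERT INTO orders (tenant_id, amount) VALUES (?, ?)",
            (self.tenant_id, amount),
        )

    def amounts(self) -> list[float]:
        rows = self.conn.execute(
            "SELECT amount FROM orders WHERE tenant_id = ? ORDER BY id",
            (self.tenant_id,),
        )
        return [amount for (amount,) in rows]


conn = sqlite3.connect(":memory:", isolation_level=None)
# One shared table; tenant_id is on every row and is indexed for filtering.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, amount REAL)"
)
conn.execute("CREATE INDEX idx_orders_tenant ON orders (tenant_id)")

acme = TenantScopedOrders(conn, "acme")
globex = TenantScopedOrders(conn, "globex")
acme.add(12.0)
globex.add(99.0)
acme.add(3.5)
print(acme.amounts(), globex.amounts())  # [12.0, 3.5] [99.0]
```

Putting the tenant filter in one place reduces the risk of a query that leaks other tenants' rows. In a Hive table, `tenant_id` could also be declared as a partition column (`PARTITIONED BY`), so that a per-tenant query reads only that tenant's partition. Whether that is practical depends on how many tenants there are.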
- Advantages: Of the three solutions, this one has the lowest maintenance and acquisition costs and lets each database support the largest number of tenants.
- Disadvantages: It has the lowest isolation level and the weakest security, so extra security work is needed during design and development. Data backup and recovery are also the hardest, requiring table-by-table and row-by-row backup and restoration.
This solution fits best when you want to serve the largest number of tenants with the fewest servers, and the tenants are willing to accept a lower isolation level in exchange for lower costs.
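The next sketch shows what table-by-table, row-by-row backup and restore of a single tenant can look like in the shared-table model. SQLite again stands in for the warehouse. The table list, the functions `backup_tenant` and `restore_tenant`, and the sample data are hypothetical. The restore runs inside one transaction, so either every table is restored or none is.

```python
import sqlite3

SHARED_TABLES = ["orders", "invoices"]  # hypothetical shared tables


def backup_tenant(conn: sqlite3.Connection, tenant_id: str) -> dict[str, list[tuple]]:
    """Copy one tenant's rows out of every shared table, table by table."""
    snapshot = {}
    for table in SHARED_TABLES:
        cur = conn.execute(f"SELECT * FROM {table} WHERE tenant_id = ?", (tenant_id,))
        snapshot[table] = cur.fetchall()
    return snapshot


def restore_tenant(conn: sqlite3.Connection, tenant_id: str,
                   snapshot: dict[str, list[tuple]]) -> None:
    """Replace one tenant's rows with the snapshot; other tenants are untouched."""
    with conn:  # commits on success, rolls back on error
        for table, rows in snapshot.items():
            conn.execute(f"DELETE FROM {table} WHERE tenant_id = ?", (tenant_id,))
            if rows:
                placeholders = ", ".join("?" * len(rows[0]))
                conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)


conn = sqlite3.connect(":memory:")  # default mode: DML opens a transaction
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, tenant_id TEXT, amount REAL)")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, tenant_id TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "acme", 10.0), (2, "globex", 20.0)])
conn.execute("INSERT INTO invoices VALUES (1, 'acme', 10.0)")
conn.commit()

snap = backup_tenant(conn, "acme")
conn.execute("UPDATE orders SET amount = 0 WHERE tenant_id = 'acme'")  # simulated damage
conn.commit()
restore_tenant(conn, "acme", snap)
print(conn.execute("SELECT * FROM orders ORDER BY id").fetchall())
# [(1, 'acme', 10.0), (2, 'globex', 20.0)]
```

The restore has to be repeated for every shared table that holds the tenant's data. By contrast, solution 1 restores a tenant with a single database copy.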
2.4 Summary
A key question when implementing SaaS is how to design application data to support multi-tenancy. The design aims to balance data sharing, security isolation and performance. The characteristics of the three modes can be summarized in a single figure.
