MySQL多表查詢，mybatis+mysql分庫分表_一種簡單易懂的 MyBatis 分庫分表方案-基礎知識庫-匯編語言學習筆記

MySQL多表查詢，mybatis+mysql分庫分表_一種簡單易懂的 MyBatis 分庫分表方案

2023-11-30 阅读 35 评论 0

摘要：數據庫分庫分表除了使用中間件來代理請求分發之外，另外一種常見的方法就是在客戶端層面來分庫分表 —— 通過適當地包裝客戶端代碼使得分庫分表的數據庫訪問操作代碼編寫起來也很方便。本文的分庫分表方案基于 MyBatis 框架，但是又不同于市面上常用的方案࿰

數據庫分庫分表除了使用中間件來代理請求分發之外，另外一種常見的方法就是在客戶端層面來分庫分表 —— 通過適當地包裝客戶端代碼使得分庫分表的數據庫訪問操作代碼編寫起來也很方便。本文的分庫分表方案基于 MyBatis 框架，但是又不同于市面上常用的方案，它們一般都是通過編寫復雜的 MyBatis 插件來重寫 SQL 語句，這樣的插件代碼會巨復雜無比，可能最終只有插件的原作者自己可以完全吃透相關代碼，給項目的維護性帶來一定問題。本文的方案非常簡單易懂，而且也不失使用上的便捷性。它的設計哲學來源于 Python —— Explicit is better than Implicit，也就是顯式優于隱式，它不會將分庫分表的過程隱藏起來。

很多分庫分表的設計在實現上會盡量將分庫分表的邏輯隱藏起來，其實這是毫無必要的。使用者必須知道背后確實進行了分庫分表，否則他怎么會無法進行全局的索引查找？他怎么會無法隨意進行多表的 join 操作。如果你真的將它當成單表來用，到上線時必然會出大問題。

接下來我們來看看在本文的方案之下，數據庫操作代碼的形式是怎樣的

帖子表一共分出來 64 個表，不同的記錄會各自分發到其中一個表，可以是按 hash 分發，也可以按照日期分發，分發邏輯由用戶代碼自己來決定。在不同的環境中可以將分表數量設置為不同的值，比如在單元測試下分表設為 4 個，而線上可能需要設置為 64 個。

@Configuration

public class PartitionConfig {

private int post = 64;

public int post() {

return post;

}

public void post(int post) {

this.post = post;

}

帖子表又會被分配到多個庫，這里就直接取模分配。假設有 4 個帖子庫，帖子表總共分出來 64 個表，分別是 post_0、post_1、post_2 一直到 post_63。那么 post_0、post_4、post_8 等分配到 0 號庫，post_1、post_5、post_9 等分配到 1 號庫，post_2、post_6、post_10 等分配到 2 號庫，post_3、post_5、post_11 等分配到 4 號庫。

從配置文件中構建 MySQLGroupStore 數據庫組對象，這個對象是我們執行 MySQL 操作的入口，通過它可以找到具體的物理的 MySQL 主從數據源。

@Configuration

public class RepoConfig {

@Autowired

private Environment env;

private MySQLGroupBuilder mysqlGroupBuilder = new MySQLGroupBuilder();

@Bean

@Qualifier("post")

public MySQLGroupStore replyMySQLGroupStore() {

MySQLGroupStore store = mysqlGroupBuilder.buildStore(env, "post");

store.prepare(factory -> {

factory.getConfiguration().addMapper(PostMapper.class);

});

return store;

}

配置文件 application.properties 如下

mysql.post0.master.addrWeights=localhost:3306

mysql.post0.master.db=sample

mysql.post0.master.user=sample

mysql.post0.master.password=123456

mysql.post0.master.poolSize=10

mysql.post0.slave.addrWeights=localhost:3307=100&localhost:3308=100

mysql.post0.slave.db=sample

mysql.post0.slave.user=sample

mysql.post0.slave.password=123456

mysql.post0.slave.poolSize=10

mysql.post1.master.addrWeights=localhost:3309

mysql.post1.master.db=sample

mysql.post1.master.user=sample

mysql.post1.master.password=123456

mysql.post1.master.poolSize=10

mysql.post1.slave.addrWeights=localhost:3310=100&localhost:3311=100

mysql.post1.slave.db=sample

mysql.post1.slave.user=sample

mysql.post1.slave.password=123456

mysql.post1.slave.poolSize=10

mysqlgroup.post.nodes=post0,post1

mysqlgroup.post.slaveEnabled=true

這里的數據庫組是由多個對等的 Master-Slaves 對構成，每個 Master-Slaves 是由一個主庫和多個不同權重的從庫構成，Master-Slaves 對的數量就是分庫的數量。

mysqlgroup 還有一個特殊的配置選項 slaveEnabled 來控制是否需要從庫，從而關閉讀寫分離，默認是關閉的，這樣就不會去構建從庫實例相關對象。

post_k 這張表后綴 k 我們稱之為 partition number，也就是后續代碼中到處在用的 partition 變量，表明當前的記錄被分配到對應物理數據表的序號。我們需要根據記錄的內容計算出 partition number，再根據 partition number 決定出這條記錄所在的物理表屬于那個物理數據庫，然后對這個物理數據庫進行相應的讀寫操作。

在本例中，帖子表按照 userId 字段 hash 出 64 張表，平均分配到 2 對物理庫中，每個物理庫包含一個主庫和2個從庫。

有了 MySQLGroupStore 實例，我們就可以盡情操縱所有數據庫了。

@Repository

public class PostMySQL {

@Autowired

private PartitionConfig partitions;

@Autowired

@Qualifier("post")

private MySQLGroupStore mysql;

public void createTables() {

for (int i = 0; i < partitions.post(); i++) {

int k = i;

mysql.master(k).execute(session -> {

PostMapper mapper = session.getMapper(PostMapper.class);

mapper.createTable(k);

});

}

public void dropTables() {

for (int i = 0; i < partitions.post(); i++) {

int k = i;

mysql.master(k).execute(session -> {

PostMapper mapper = session.getMapper(PostMapper.class);

mapper.dropTable(k);

});

}

public Post getPostFromMaster(String userId, String id) {

Holder holder = new Holder<>();

int partition = this.partitionFor(userId);

mysql.master(partition).execute(session -> {

PostMapper mapper = session.getMapper(PostMapper.class);

holder.value(mapper.getPost(partition, id));

});

return holder.value();

}

public Post getPostFromSlave(String userId, String id) {

Holder holder = new Holder<>();

int partition = this.partitionFor(userId);

mysql.slave(partition).execute(session -> {

PostMapper mapper = session.getMapper(PostMapper.class);

holder.value(mapper.getPost(partition, id));

});

return holder.value();

}

public void savePost(Post post) {

int partition = this.partitionFor(post);

mysql.master(partition).execute(session -> {

PostMapper mapper = session.getMapper(PostMapper.class);

Post curPost = mapper.getPost(partition, post.getId());

if (curPost != null) {

mapper.updatePost(partition, post);

} else {

mapper.insertPost(partition, post);

}

});

}

public void deletePost(String userId, String id) {

int partition = this.partitionFor(userId);

mysql.master(partition).execute(session -> {

PostMapper mapper = session.getMapper(PostMapper.class);

mapper.deletePost(partition, id);

});

}

private int partitionFor(Post post) {

return Post.partitionFor(post.getUserId(), partitions.post());

}

private int partitionFor(String userId) {

return Post.partitionFor(userId, partitions.post());

}

從上面的代碼中可以看出所有的讀寫、創建、刪除表操作的第一步都是計算出 partition number，然后根據它來選出目標主從庫再進一步對目標的數據表進行操作。這里我默認開啟了autocommit，所以不需要顯式來 session.commit() 了。

mysql.master(partition)

mysql.slave(partition)

// 如果沒有分庫mysql.master()

mysql.slave()

// 如果既沒有分庫也沒有讀寫分離mysql.db()

// 操作具體的表時要帶 partitionmapper.getPost(partition, postId)

mapper.savePost(partition, post)

在對數據表的操作過程中，又需要將具體的 partition number 傳遞過去，如此 MyBatis 才能知道具體操作的是哪個分表。

public interface PostMapper {

@Update("create table if not exists post_#{partition}(id varchar(128) primary key not null, user_id varchar(1024) not null, title varchar(1024) not null, content text, create_time timestamp not null) engine=innodb")

public void createTable(int partition);

@Update("drop table if exists post_#{partition}")

public void dropTable(int partition);

@Results({@Result(property = "createTime", column = "create_time"),

@Result(property = "userId", column = "user_id")})

@Select("select id, user_id, title, content, create_time from post_#{partition} where id=#{id}")

public Post getPost(@Param("partition") int partition, @Param("id") String id);

@Insert("insert into post_#{partition}(id, user_id, title, content, create_time) values(#{p.id}, ${p.userId}, #{p.title}, #{p.content}, #{p.createTime})")

public void insertPost(@Param("partition") int partition, @Param("p") Post post);

@Update("update post_#{partition} set title=#{p.title}, content=#{p.content}, create_time=#{p.createTime} where id=#{p.id}")

public void updatePost(@Param("partition") int partition, @Param("p") Post post);

@Delete("delete from post_#{partition} where id=#{id}")

public void deletePost(@Param("partition") int partition, @Param("id") String id);

}

在每一條數據庫操作中都必須帶上 partition 參數，你可能會覺得這有點繁瑣。但是這也很直觀，它明確地告訴我們目前正在操作的是哪一個具體的分表。

在 MyBatis 的注解 Mapper 類中，如果方法含有多個參數，需要使用 @Param 注解進行名稱標注，這樣才可以在 SQL 語句中直接使用相應的注解名稱。否則你得使用默認的變量占位符名稱 param0、param1 來表示，這就很不直觀。

我們將分表的 hash 算法寫在實體類 Post 中，這里使用 CRC32 算法進行 hash。

public class Post {

private String id;

private String userId;

private String title;

private String content;

private Date createTime;

public Post() {}

public Post(String id, String userId, String title, String content, Date createTime) {

this.id = id;

this.userId = userId;

this.title = title;

this.content = content;

this.createTime = createTime;

}

public String getId() {

return id;

}

public void setId(String id) {

this.id = id;

}

public String getUserId() {

return userId;

}

public void setUserId(String userId) {

this.userId = userId;

}

public String getTitle() {

return title;

}

public void setTitle(String title) {

this.title = title;

}

public String getContent() {

return content;

}

public void setContent(String content) {

this.content = content;

}

public Date getCreateTime() {

return createTime;

}

public void setCreateTime(Date createTime) {

this.createTime = createTime;

}

public int partitionFor(int num) {

return partitionFor(userId, num);

}

public static int partitionFor(String userId, int num) {

CRC32 crc = new CRC32();

crc.update(userId.getBytes(Charsets.UTF8));

return (int) (Math.abs(crc.getValue()) % num);

}

代碼中的 partitionFor 方法的參數 num 就是一共要分多少表。如果是按日期來分表，這個參數可能就不需要，直接返回日期的整數就行比如 20190304。

還有最后一個問題是多個帶權重的從庫是如何做到概率分配的。這里就要使用到 spring-jdbc 自帶的 AbstractRoutingDataSource —— 帶路由功能的數據源。它可以包含多個子數據源，然后根據一定的策略算法動態挑選出一個數據源來，這里就是使用權重隨機。

但是有個問題，我這里只需要這一個類，但是需要引入整個 spring-boot-jdbc-starter 包，有點拖泥帶水的感覺。我研究了一下 AbstractRoutingDataSource 類的代碼，發現它的實現非常簡單，如果就仿照它自己實現了一個簡單版的，這樣就不需要引入整個包代碼了。

public class RandomWeightedDataSource extends DataSourceAdapter {

private int totalWeight;

private Set sources;

private Map sourceMap;

public RandomWeightedDataSource(Map srcs) {

this.sources = new HashSet<>();

this.sourceMap = new HashMap<>();

for (Entry entry : srcs.entrySet()) {

// 權重值不宜過大 int weight = Math.min(10000, entry.getValue());

for (int i = 0; i < weight; i++) {

sourceMap.put(totalWeight, entry.getKey());

totalWeight++;

}

this.sources.add(entry.getKey());

}

private PooledDataSource getDataSource() {

return this.sourceMap.get(ThreadLocalRandom.current().nextInt(totalWeight));

}

public void close() {

for (PooledDataSource ds : sources) {

ds.forceCloseAll();

}

@Override

public Connection getConnection() throws SQLException {

return getDataSource().getConnection();

}

@Override

public Connection getConnection(String username, String password) throws SQLException {

return getDataSource().getConnection(username, password);

}

還需進一步深入理解其實現代碼的可以將 shardino 代碼倉庫拉到本地跑一跑

git clone https://github.com/pyloque/shardino.git

里面有單元測試可以運行起來，運行之前需要確保本機安裝了 docker 環境

docker-compose up -d

這條指令會啟動2對主從庫，各1主兩從。

在本例中雖然用到了 springboot ，其實也只是用了它方便的依賴注入和單元測試功能，shardino 完全可以脫離 springboot 而獨立存在。

shardino 并不是一個完美的開源庫，它只是一份實現代碼的樣板，如果讀者使用的是其它數據庫或者 MySQL 的其它版本，那就需要自己微調一下代碼來適配了。

原文链接：https://hbdhgg.com/5/186702.html

发表评论: