2020-12-11
Configuring real-time MySQL replication into clickhouse-server with the MaterializeMySQL engine
How the replication works:
ClickHouse attaches itself as a replica of the MySQL server and synchronizes the data in real time, first with a full snapshot and then incrementally from the binlog. On top of this it is easy to build an enterprise-grade solution that fuses OLTP and OLAP.
MySQL 5.6/5.7/8.0 are currently supported, as are Delete/Update statements and most common DDL operations.
Installed software versions:
[root@node220 clickhouse-server]# rpm -qa | grep clickhouse
clickhouse-server-20.10.3.30-2.noarch
clickhouse-client-20.10.3.30-2.noarch
clickhouse-common-static-20.10.3.30-2.x86_64
[root@node221 ~]# rpm -qa | grep mysql
mysql-community-client-5.7.18-1.el7.x86_64
mysql-community-libs-5.7.18-1.el7.x86_64
mysql-community-server-5.7.18-1.el7.x86_64
mysql-community-common-5.7.18-1.el7.x86_64
mysql-community-libs-compat-5.7.18-1.el7.x86_64
Operating system:
[root@node221 ~]# uname -a
Linux node221 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@node221 ~]# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
MySQL master configuration:
Enable binlog and GTID:
vim /etc/my.cnf
#binlog
log_bin=mysql-bin
server_id=221
binlog_row_image=full
#Gtid
gtid_mode=on
enforce_gtid_consistency=on
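MaterializeMySQL also relies on row-based binlog events. MySQL 5.7 uses binlog_format=ROW by default, which can be verified with:
mysql> show variables like 'binlog_format';
-- expected value: ROW; if it differs, add binlog_format=ROW to /etc/my.cnf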
Create the account that clickhouse-server will use to connect to MySQL:
create user ch_reader identified by 'hx123.';
GRANT SELECT, RELOAD, PROCESS, FILE, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT, EVENT ON *.* TO 'ch_reader'@'%';
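To confirm the account and its privileges, a quick check on the MySQL side:
mysql> show grants for 'ch_reader'@'%';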
ClickHouse slave configuration:
Replication is currently done per database, and different databases can come from different MySQL masters, so multiple MySQL sources can be synchronized into one ClickHouse instance for OLAP analysis.
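For example, a sketch of pulling two databases from two different MySQL masters into one ClickHouse instance (the second host address, the database names and the masked passwords are placeholders):
node220 :) CREATE DATABASE db_a ENGINE = MaterializeMySQL('172.16.1.221:3306', 'db_a', 'ch_reader', '***');
node220 :) CREATE DATABASE db_b ENGINE = MaterializeMySQL('172.16.1.230:3306', 'db_b', 'ch_reader', '***');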
Enable the feature switch:
[root@node220 clickhouse-server]# pwd
/etc/clickhouse-server
[root@node220 clickhouse-server]# ls
config.d config.xml preprocessed users.d users.xml
[root@node220 clickhouse-server]# head users.xml
<?xml version="1.0"?>
<yandex>
<!-- Profiles of settings. -->
<profiles>
<!-- Default settings. -->
<default>
<!-- Maximum memory usage for processing single query, in bytes. -->
<max_memory_usage>10000000000</max_memory_usage>
<allow_experimental_database_materialize_mysql>1</allow_experimental_database_materialize_mysql>
<max_partitions_per_insert_block>1000</max_partitions_per_insert_block>
<!-- Use cache of uncompressed blocks of data. Meaningful only for processing many of very short queries. -->
Restart clickhouse-server:
systemctl restart clickhouse-server
Or change it on the running instance:
Check the current value first:
node220 :) select name,value changed from system.settings where name ='allow_experimental_database_materialize_mysql';
┌─name──────────────────────────────────────────┬─changed─┐
│ allow_experimental_database_materialize_mysql │ 1 │
└───────────────────────────────────────────────┴─────────┘
1 rows in set. Elapsed: 0.006 sec.
If the value is 0, set it as follows:
node220 :) set allow_experimental_database_materialize_mysql=1;
Create a replication channel:
node220 :) CREATE DATABASE hexin ENGINE = MaterializeMySQL('172.16.1.221:3306', 'hexin', 'ch_reader', 'Ch_reader123.');
Ok.
Check ClickHouse's sync position:
[root@node220 hexin]# cat /var/lib/clickhouse/metadata/hexin/.metadata
Version: 2
Binlog File: mysql-bin.000006
Executed GTID: b1ed20bd-191e-11eb-be28-000c297e52fe:1-22
Binlog Position: 5401
Data Version: 4
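The position can be cross-checked against the master; the Executed_Gtid_Set and position reported by MySQL should line up with the metadata above:
mysql> show master status\G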
List the tables ClickHouse has replicated from MySQL:
node220 :) use hexin;
USE hexin
Ok.
0 rows in set. Elapsed: 0.002 sec.
node220 :) show tables;
SHOW TABLES
┌─name─┐
│ rsc │
│ test │
└──────┘
2 rows in set. Elapsed: 0.008 sec.
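A simple way to verify the initial sync is to compare row counts on both sides, e.g. for the rsc table:
mysql> select count(*) from hexin.rsc;
node220 :) select count() from hexin.rsc;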
Check the connection info on the MySQL side:
mysql> show processlist;
+----+-----------+--------------------+-------+------------------+------+---------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-----------+--------------------+-------+------------------+------+---------------------------------------------------------------+------------------+
| 6 | root | localhost | hexin | Sleep | 1469 | | NULL |
| 14 | ch_reader | 172.16.1.220:35546 | hexin | Sleep | 741 | | NULL |
| 15 | ch_reader | 172.16.1.220:35548 | NULL | Binlog Dump GTID | 741 | Master has sent all binlog to slave; waiting for more updates | NULL |
| 17 | root | localhost | NULL | Query | 0 | starting | show processlist |
+----+-----------+--------------------+-------+------------------+------+---------------------------------------------------------------+------------------+
4 rows in set (0.00 sec)
Real-time replication of the MySQL database into clickhouse-server is now configured.
Notes:
Tables in MySQL must have a primary key; otherwise replication fails with an error, no data is copied to ClickHouse, and even data created or modified in MySQL afterwards will no longer be synchronized (a minimal table sketch follows the error message below).
JSON columns are not supported; when a table with a JSON column is created in MySQL, replication on the ClickHouse side stops.
The following error is reported:
2020.10.29 11:43:52.139725 [ 17746 ] {} <Error> MaterializeMySQLSyncThread: Code: 48, e.displayText() = DB::Exception: The hexin.gmy cannot be materialized, because there is no primary keys., Stack trace (when copying this message, always include the lines below):
Reference: https://www.cnblogs.com/dbtech/p/13499555.html
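For reference, a minimal sketch of a MySQL table definition that satisfies the primary-key requirement (table and column names are made up for illustration):
mysql> create table hexin.demo_pk (id int not null auto_increment primary key, name varchar(64)) engine=InnoDB;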
GTID mode must be enabled on the MySQL side (this covers a single instance; a master-slave topology needs additional configuration):
1. Make sure the MySQL version is 5.6 or later.
2. MySQL 5.7 supports enabling GTID mode online, i.e. without stopping the service.
Steps:
SET GLOBAL ENFORCE_GTID_CONSISTENCY = 'WARN';
SET GLOBAL ENFORCE_GTID_CONSISTENCY = 'ON';
SET GLOBAL GTID_MODE = 'OFF_PERMISSIVE';
SET GLOBAL GTID_MODE = 'ON_PERMISSIVE';
SET GLOBAL GTID_MODE = 'ON';
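Before the final SET GLOBAL GTID_MODE = 'ON' above, the MySQL manual recommends waiting until no anonymous (non-GTID) transactions remain; this can be checked with:
mysql> show status like 'ONGOING_ANONYMOUS_TRANSACTION_COUNT';
-- proceed to GTID_MODE = 'ON' once this reports 0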
Verify:
mysql> show variables like 'gtid_mode';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| gtid_mode | ON |
+---------------+-------+
1 row in set (0.00 sec)
mysql> show variables like 'ENFORCE_GTID_CONSISTENCY';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| enforce_gtid_consistency | ON |
+--------------------------+-------+
1 row in set (0.00 sec)
Data tests:
Delete
First run a delete on the MySQL master:
mysql> delete from t1 where a=1;
Query OK, 1 row affected (0.01 sec)
Then check the record on the ClickHouse slave side:
clickhouse :) select * from t1;
SELECT *
FROM t1
┌─a─┬─b─┐
│ 2 │ 2 │
└───┴───┘
1 rows in set. Elapsed: 0.032 sec.
At this point the Data Version in the metadata has advanced to 2:
cat ckdatas/metadata/hexin/.metadata
Version: 1
Binlog File: mysql-bin.000001
Binlog Position: 1171
Data Version: 2
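On the ClickHouse side the delete is applied as an insert with _sign = -1 into the underlying ReplacingMergeTree table; regular SELECTs filter those rows out automatically, but the _sign/_version virtual columns of the surviving rows can still be projected for debugging (a diagnostic sketch):
clickhouse :) select a, b, _sign, _version from t1;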
Update
MySQL Master:
mysql> select * from t1;
+---+------+
| a | b |
+---+------+
| 2 | 2 |
+---+------+
1 row in set (0.00 sec)
mysql> update t1 set b=b+1;
mysql> select * from t1;
+---+------+
| a | b |
+---+------+
| 2 | 3 |
+---+------+
1 row in set (0.00 sec)
ClickHouse Slave:
clickhouse :) select * from t1;
SELECT *
FROM t1
┌─a─┬─b─┐
│ 2 │ 3 │
└───┴───┘
1 rows in set. Elapsed: 0.023 sec.
Data layout on disk:
Metadata location:
[root@node220 tmp]# pwd
/var/lib/clickhouse/metadata/tmp    # tmp is the database name
[root@node220 tmp]# cat hexin_erp_package_less_detail.sql
ATTACH TABLE hexin_erp_package_less_detail
(
`id` Int32,
`less_no` String,
`sku` String,
`spec_id` Int32,
`less_num` Int32,
`storage_id` Int32,
`package_no` String,
`create_time` DateTime,
`update_time` DateTime,
`package_create_time` DateTime,
`storage_code` String,
`type` UInt8,
`inventory_id` Int32,
`inventory_status` Int32,
`uuid` Int32,
`uuid_name` String,
`pick_type` Int8,
`is_urgent` Int8,
`platform_id` Int32,
`_sign` Int8 MATERIALIZED 1,
`_version` UInt64 MATERIALIZED 1
)
ENGINE = ReplacingMergeTree(_version)
PARTITION BY intDiv(id, 4294967)
ORDER BY (sku, package_no, storage_code, inventory_id, id)
SETTINGS index_granularity = 8192
Database definition file:
[root@node220 metadata]# pwd
/var/lib/clickhouse/metadata
[root@node220 metadata]# cat tmp.sql
ATTACH DATABASE tmp
ENGINE = MaterializeMySQL('192.168.66.33:3306', 'tmp', 'ch_reader', 'Ch_reader123.')
Table data location:
[root@node220 tmp]# pwd
/var/lib/clickhouse/data/tmp
[root@node220 tmp]# ls
hexin_erp_package_less_detail hexin_erp_product_sku hexin_erp_storage_position_sku
hexin_erp_package_sending_detail hexin_erp_storage hexin_erp_storage_stock
[root@node220 tmp]# cd hexin_erp_package_less_detail/
[root@node220 hexin_erp_package_less_detail]# ls
0_1_67_5 0_68_84_3_91 0_85_87_2_91 0_88_88_0_91 0_90_90_0 format_version.txt
0_1_67_5_91 0_85_86_1 0_87_87_0 0_89_89_0 0_90_90_0_91 mutation_91.txt
0_68_84_3 0_85_87_2 0_88_88_0 0_89_89_0_91 detached
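The part directories listed above can also be inspected through system.parts, which shows which intDiv(id, 4294967) partition each part belongs to (a read-only diagnostic query):
node220 :) select partition, name, rows, active from system.parts where database = 'tmp' and table = 'hexin_erp_package_less_detail';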
Performance tests:
Test environment:
MySQL: 8C16G cloud VM, 192.168.0.3, base data set of 10,188,183 rows
ClickHouse: 8C16G cloud VM, 192.168.0.4
benchyou: 8C8G cloud VM, 192.168.0.5, 256 concurrent write threads, https://github.com/xelabs/benchyou
Performance depends heavily on the hardware; these numbers were taken on cloud VMs and are for reference only.
Full (initial) sync performance
8c16G-vm :) create database sbtest engine=MaterializeMySQL('192.168.0.3:3306', 'sbtest', 'test', '123');
8c16G-vm :) watch lv1;
WATCH lv1
┌─count()─┬───────────────now()─┬─_version─┐
│ 0 │ 2020-07-29 06:36:04 │ 1 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 1113585 │ 2020-07-29 06:36:05 │ 2 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 2227170 │ 2020-07-29 06:36:07 │ 3 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 3340755 │ 2020-07-29 06:36:10 │ 4 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 4454340 │ 2020-07-29 06:36:13 │ 5 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 5567925 │ 2020-07-29 06:36:16 │ 6 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 6681510 │ 2020-07-29 06:36:18 │ 7 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 7795095 │ 2020-07-29 06:36:22 │ 8 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 8908680 │ 2020-07-29 06:36:25 │ 9 │
└─────────┴─────────────────────┴──────────┘
┌──count()─┬───────────────now()─┬─_version─┐
│ 10022265 │ 2020-07-29 06:36:28 │ 10 │
└──────────┴─────────────────────┴──────────┘
┌──count()─┬───────────────now()─┬─_version─┐
│ 10188183 │ 2020-07-29 06:36:28 │ 11 │
└──────────┴─────────────────────┴──────────┘
← Progress: 11.00 rows, 220.00 B (0.16 rows/s., 3.17 B/s.)
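The lv1 live view watched above is not defined in the capture; a sketch of how it might be created, assuming the benchyou data lands in a table called sbtest.benchyou0 (the table name is an assumption) and that the experimental live view feature is enabled:
8c16G-vm :) set allow_experimental_live_view = 1;
8c16G-vm :) create live view lv1 as select count(), now() from sbtest.benchyou0;
8c16G-vm :) watch lv1;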
On this hardware the full sync runs at roughly 424,507 rows per second, i.e. about 420k rows/s (10,188,183 rows in the roughly 24 seconds between 06:36:04 and 06:36:28).
Since the rows in the full dump have no dependencies on each other, this step could be further parallelized to speed up the sync.
Full-sync performance directly determines how quickly a failed ClickHouse slave can be rebuilt: with 1 billion rows in MySQL, a rebuild would finish in roughly 40 minutes at this rate.
Incremental performance (real-time sync)
With the current configuration, the single-threaded replay on the ClickHouse slave consumes events faster than the MySQL master can produce them with 256 concurrent writers, and the test shows the two sides staying in sync in real time.
benchyou load test, about 21k transactions per second (MySQL TPS cannot go any higher in this environment):
./bin/benchyou --mysql-host=192.168.0.3 --mysql-user=test --mysql-password=123 --oltp-tables-count=1 --write-threads=256 --read-threads=0
time thds tps wtps rtps
[13s] [r:0,w:256,u:0,d:0] 19962 19962 0
time thds tps wtps rtps
[14s] [r:0,w:256,u:0,d:0] 20415 20415 0
time thds tps wtps rtps
[15s] [r:0,w:256,u:0,d:0] 21131 21131 0
time thds tps wtps rtps
[16s] [r:0,w:256,u:0,d:0] 21606 21606 0
time thds tps wtps rtps
[17s] [r:0,w:256,u:0,d:0] 22505 22505 0
Single-threaded replay on the ClickHouse side, about 21k transactions per second, keeping up in real time:
┌─count()─┬───────────────now()─┬─_version─┐
│ 150732 │ 2020-07-30 05:17:15 │ 17 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 155477 │ 2020-07-30 05:17:16 │ 18 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 160222 │ 2020-07-30 05:17:16 │ 19 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 164967 │ 2020-07-30 05:17:16 │ 20 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 169712 │ 2020-07-30 05:17:16 │ 21 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 174457 │ 2020-07-30 05:17:16 │ 22 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 179202 │ 2020-07-30 05:17:17 │ 23 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 183947 │ 2020-07-30 05:17:17 │ 24 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 188692 │ 2020-07-30 05:17:17 │ 25 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 193437 │ 2020-07-30 05:17:17 │ 26 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 198182 │ 2020-07-30 05:17:17 │ 27 │
└─────────┴─────────────────────┴──────────┘