Loading... # MySQL主从复制搭建实战:数据同步零丢失配置指南 # 一、概述 ## 1. 背景介绍 在生产环境中运维MySQL数据库十年,主从复制是遇到最频繁的架构需求之一。无论是为了读写分离提升性能,还是为了数据容灾保障业务连续性,主从复制都是MySQL高可用架构的基石。 2019年的一次惨痛经历至今记忆犹新:某电商平台大促期间,单机MySQL扛不住流量直接宕机,由于没有从库,数据恢复花了整整6个小时,直接损失数百万。这次事故之后,主从复制成为所有核心业务数据库的标配。 ### A. 核心原理 MySQL主从复制的核心原理是:主库将数据变更写入二进制日志(Binary Log),从库通过I/O线程读取主库的binlog并写入本地的中继日志(Relay Log),然后SQL线程读取中继日志并在从库上重放这些变更,从而实现数据同步。 ## 2. 技术特点 MySQL 8.0/8.4 LTS版本在复制功能上有了重大改进: ### A. 基于GTID的复制 - 全局事务标识符(GTID)让每个事务都有唯一标识 - 故障切换时自动定位复制位点,无需手动指定binlog文件和位置 - 支持自动故障转移,大幅简化运维复杂度 ### B. 增强的半同步复制 - 主库提交事务前必须收到至少一个从库的确认 - 配合rpl_semi_sync_source_wait_for_replica_count参数可要求多个从库确认 - 有效防止主库故障时的数据丢失 ### C. 并行复制优化 - 基于WriteSet的并行复制,大幅提升从库回放效率 - replica_parallel_type=LOGICAL_CLOCK配合binlog_transaction_dependency_tracking=WRITESET - 实测可将复制延迟从分钟级降到秒级 ### D. 克隆插件支持 - MySQL 8.0.17+引入Clone Plugin - 从库初始化无需mysqldump或xtrabackup - 自动完成数据同步和复制配置 ## 3. 适用场景 | 场景 | 说明 | 推荐配置 | |------|------|----------| | 读写分离 | 主库写、从库读,分担主库压力 | 异步复制 + 多从库 | | 数据容灾 | 主库故障时快速切换到从库 | 半同步复制 + GTID | | 数据备份 | 在从库执行备份,不影响主库性能 | 异步复制 + 延迟从库 | | 数据分析 | 报表查询在从库执行 | 异步复制 + 专用从库 | | 跨地域部署 | 多数据中心数据同步 | 异步复制 + 压缩传输 | ## 4. 环境要求 | 组件 | 版本要求 | 说明 | |------|----------|------| | MySQL Server | 8.0.35+ 或 8.4 LTS | 推荐使用8.4 LTS长期支持版本 | | 操作系统 | Rocky 9 / Ubuntu 24.04 | 内核版本5.14+ | | 内存 | 最低4GB,推荐16GB+ | InnoDB Buffer Pool需要足够内存 | | 磁盘 | SSD,IOPS 3000+ | binlog和数据文件分盘存储 | | 网络 | 主从之间延迟<1ms | 同机房部署最佳 | 本文实验环境: - 主库:192.168.1.100(master.example.com) - 从库:192.168.1.101(slave.example.com) - 系统:Rocky Linux 9.3 - MySQL:8.4.0 LTS # 二、准备工作 ## 1. 系统环境配置 在主库和从库服务器上执行相同的系统配置: ```bash # 关闭SELinux(或配置相应策略) sudo setenforce 0 sudo sed -i 's/SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config # 配置防火墙 sudo firewall-cmd --permanent --add-port=3306/tcp sudo firewall-cmd --reload # 配置时间同步(主从时间必须一致) sudo dnf install -y chrony sudo systemctl enable --now chronyd # 验证时间同步状态 chronyc tracking # 配置主机名解析 echo "192.168.1.100 master.example.com master" | sudo tee -a /etc/hosts echo "192.168.1.101 slave.example.com slave" | sudo tee -a /etc/hosts # 优化系统参数 cat << 'EOF' | sudo tee /etc/sysctl.d/mysql.conf # 网络优化 net.core.somaxconn = 65535 net.ipv4.tcp_max_syn_backlog = 65535 net.ipv4.tcp_fin_timeout = 10 net.ipv4.tcp_tw_reuse = 1 net.ipv4.tcp_keepalive_time = 600 # 内存优化 vm.swappiness = 1 vm.dirty_ratio = 10 vm.dirty_background_ratio = 5 # 文件描述符 fs.file-max = 6815744 fs.aio-max-nr = 1048576 EOF sudo sysctl -p /etc/sysctl.d/mysql.conf # 配置用户资源限制 cat << 'EOF' | sudo tee /etc/security/limits.d/mysql.conf mysql soft nofile 65535 mysql hard nofile 65535 mysql soft nproc 65535 mysql hard nproc 65535 EOF ``` ## 2. 安装MySQL 8.4 LTS ```bash # 添加MySQL官方仓库 sudo dnf install -y https://dev.mysql.com/get/mysql84-community-release-el9-1.noarch.rpm # 安装MySQL Server sudo dnf install -y mysql-community-server mysql-community-client # 查看安装版本 mysql --version # mysql Ver 8.4.0 for Linux on x86_64 (MySQL Community Server - GPL) # 创建数据目录(建议数据和日志分盘) sudo mkdir -p /data/mysql/{data,logs,binlog,relaylog,tmp} sudo chown -R mysql:mysql /data/mysql sudo chmod 750 /data/mysql ``` ## 3. 生成server-id MySQL主从复制要求每个节点有唯一的server-id。推荐使用IP地址最后两段作为server-id: ```bash # 主库(192.168.1.100) # server-id = 1100 # 从库(192.168.1.101) # server-id = 1101 ``` # 三、核心配置 ## 1. 主库配置 创建主库配置文件 /etc/my.cnf: ```ini [mysqld] # 基础配置 user = mysql port = 3306 basedir = /usr datadir = /data/mysql/data tmpdir = /data/mysql/tmp socket = /var/lib/mysql/mysql.sock pid-file = /var/run/mysqld/mysqld.pid # 字符集配置 character-set-server = utf8mb4 collation-server = utf8mb4_unicode_ci init_connect = 'SET NAMES utf8mb4' # 主从复制核心配置 server-id = 1100 log-bin = /data/mysql/binlog/mysql-bin binlog_format = ROW binlog_row_image = FULL max_binlog_size = 1G binlog_expire_logs_seconds = 604800 sync_binlog = 1 # GTID配置(强烈推荐) gtid_mode = ON enforce_gtid_consistency = ON binlog_gtid_simple_recovery = ON # 半同步复制配置 # MySQL 8.0.26+使用新参数名 rpl_semi_sync_source_enabled = ON rpl_semi_sync_source_timeout = 10000 rpl_semi_sync_source_wait_for_replica_count = 1 rpl_semi_sync_source_wait_point = AFTER_SYNC # InnoDB配置 innodb_buffer_pool_size = 8G innodb_buffer_pool_instances = 8 innodb_log_file_size = 2G innodb_log_buffer_size = 64M innodb_flush_log_at_trx_commit = 1 innodb_flush_method = O_DIRECT innodb_file_per_table = ON innodb_io_capacity = 4000 innodb_io_capacity_max = 8000 innodb_read_io_threads = 8 innodb_write_io_threads = 8 # 连接配置 max_connections = 2000 max_connect_errors = 100000 wait_timeout = 600 interactive_timeout = 600 # 日志配置 log_error = /data/mysql/logs/error.log slow_query_log = ON slow_query_log_file = /data/mysql/logs/slow.log long_query_time = 1 log_queries_not_using_indexes = ON # 性能优化 table_open_cache = 4000 table_definition_cache = 2000 thread_cache_size = 128 sort_buffer_size = 4M join_buffer_size = 4M read_buffer_size = 2M read_rnd_buffer_size = 8M # 安全配置 local_infile = OFF skip_symbolic_links = ON sql_mode = STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION [client] socket = /var/lib/mysql/mysql.sock default-character-set = utf8mb4 [mysql] socket = /var/lib/mysql/mysql.sock default-character-set = utf8mb4 prompt = "\\u@\\h [\\d]> " ``` ## 2. 从库配置 创建从库配置文件 /etc/my.cnf: ```ini [mysqld] # 基础配置 user = mysql port = 3306 basedir = /usr datadir = /data/mysql/data tmpdir = /data/mysql/tmp socket = /var/lib/mysql/mysql.sock pid-file = /var/run/mysqld/mysqld.pid # 字符集配置 character-set-server = utf8mb4 collation-server = utf8mb4_unicode_ci init_connect = 'SET NAMES utf8mb4' # 主从复制核心配置 server-id = 1101 log-bin = /data/mysql/binlog/mysql-bin binlog_format = ROW binlog_row_image = FULL max_binlog_size = 1G binlog_expire_logs_seconds = 604800 sync_binlog = 1 # 从库开启binlog(用于级联复制或切换为主库) log_replica_updates = ON # 中继日志配置 relay_log = /data/mysql/relaylog/relay-bin relay_log_recovery = ON relay_log_info_repository = TABLE relay_log_purge = ON # GTID配置 gtid_mode = ON enforce_gtid_consistency = ON binlog_gtid_simple_recovery = ON # 半同步复制配置(从库端) rpl_semi_sync_replica_enabled = ON # 并行复制配置(大幅降低复制延迟) replica_parallel_type = LOGICAL_CLOCK replica_parallel_workers = 16 replica_preserve_commit_order = ON binlog_transaction_dependency_tracking = WRITESET transaction_write_set_extraction = XXHASH64 # 复制过滤(按需配置) # replicate_do_db = app_db # replicate_ignore_db = test # replicate_wild_do_table = app_%.% # replicate_wild_ignore_table = %.tmp_% # 从库只读配置 read_only = ON super_read_only = ON # InnoDB配置 innodb_buffer_pool_size = 8G innodb_buffer_pool_instances = 8 innodb_log_file_size = 2G innodb_log_buffer_size = 64M innodb_flush_log_at_trx_commit = 2 innodb_flush_method = O_DIRECT innodb_file_per_table = ON innodb_io_capacity = 4000 innodb_io_capacity_max = 8000 innodb_read_io_threads = 8 innodb_write_io_threads = 8 # 连接配置 max_connections = 2000 max_connect_errors = 100000 wait_timeout = 600 interactive_timeout = 600 # 日志配置 log_error = /data/mysql/logs/error.log slow_query_log = ON slow_query_log_file = /data/mysql/logs/slow.log long_query_time = 1 log_queries_not_using_indexes = ON # 性能优化 table_open_cache = 4000 table_definition_cache = 2000 thread_cache_size = 128 sort_buffer_size = 4M join_buffer_size = 4M read_buffer_size = 2M read_rnd_buffer_size = 8M # 安全配置 local_infile = OFF skip_symbolic_links = ON sql_mode = STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION [client] socket = /var/lib/mysql/mysql.sock default-character-set = utf8mb4 [mysql] socket = /var/lib/mysql/mysql.sock default-character-set = utf8mb4 prompt = "\\u@\\h [\\d]> " ``` # 四、启动和验证 ## 1. 初始化并启动主库 ```bash # 初始化数据目录 sudo mysqld --initialize --user=mysql --datadir=/data/mysql/data # 获取临时密码 sudo grep 'temporary password' /data/mysql/logs/error.log # 2024-01-15T10:30:45.123456Z 6 [Note] [MY-010454] [Server] A temporary password is generated for root@localhost: Abc123!@#xyz # 启动MySQL服务 sudo systemctl start mysqld sudo systemctl enable mysqld # 修改root密码并进行安全配置 mysql -uroot -p'Abc123!@#xyz' --connect-expired-password << 'EOF' ALTER USER 'root'@'localhost' IDENTIFIED BY 'YourStrongPassword@2024'; DELETE FROM mysql.user WHERE User=''; DROP DATABASE IF EXISTS test; DELETE FROM mysql.db WHERE Db='test' OR Db='test\_%'; FLUSH PRIVILEGES; EOF # 安装半同步复制插件 mysql -uroot -p'YourStrongPassword@2024' << 'EOF' INSTALL PLUGIN rpl_semi_sync_source SONAME 'semisync_source.so'; SET GLOBAL rpl_semi_sync_source_enabled = ON; SHOW PLUGINS WHERE Name LIKE '%semi%'; EOF ``` ## 2. 创建复制用户 在主库上创建专用的复制用户: ```sql -- 登录主库 mysql -uroot -p'YourStrongPassword@2024' -- 创建复制用户 CREATE USER 'repl_user'@'192.168.1.%' IDENTIFIED WITH caching_sha2_password BY 'ReplPassword@2024'; -- 授予复制权限 GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'repl_user'@'192.168.1.%'; -- 如果使用Clone插件初始化从库,还需要额外权限 GRANT BACKUP_ADMIN ON *.* TO 'repl_user'@'192.168.1.%'; FLUSH PRIVILEGES; -- 查看主库状态 SHOW MASTER STATUS\G /* *************************** 1. row *************************** File: mysql-bin.000003 Position: 856 Binlog_Do_DB: Binlog_Ignore_DB: Executed_Gtid_Set: 3e11fa47-c5ab-11ee-9c88-000c29123456:1-15 */ -- 查看GTID执行状态 SELECT @@GLOBAL.GTID_EXECUTED; ``` ## 3. 初始化并启动从库 ### A. 方法一:使用Clone插件(推荐,MySQL 8.0.17+) ```bash # 从库执行初始化(空数据目录) sudo mysqld --initialize --user=mysql --datadir=/data/mysql/data # 获取临时密码并启动 sudo grep 'temporary password' /data/mysql/logs/error.log sudo systemctl start mysqld # 修改root密码 mysql -uroot -p'临时密码' --connect-expired-password << 'EOF' ALTER USER 'root'@'localhost' IDENTIFIED BY 'YourStrongPassword@2024'; EOF ``` ```sql -- 登录从库 mysql -uroot -p'YourStrongPassword@2024' -- 安装Clone插件 INSTALL PLUGIN clone SONAME 'mysql_clone.so'; -- 配置克隆源(主库地址) SET GLOBAL clone_valid_donor_list = '192.168.1.100:3306'; -- 执行克隆操作(会覆盖从库所有数据并重启) CLONE INSTANCE FROM 'repl_user'@'192.168.1.100':3306 IDENTIFIED BY 'ReplPassword@2024'; -- 克隆完成后MySQL会自动重启,重新登录 mysql -uroot -p'YourStrongPassword@2024' -- 安装半同步复制插件 INSTALL PLUGIN rpl_semi_sync_replica SONAME 'semisync_replica.so'; SET GLOBAL rpl_semi_sync_replica_enabled = ON; ``` ### B. 方法二:使用mysqldump(适合小型数据库) ```bash # 在主库执行导出 mysqldump -uroot -p'YourStrongPassword@2024' \ --all-databases \ --single-transaction \ --routines \ --triggers \ --events \ --set-gtid-purged=ON \ --source-data=2 \ > /tmp/full_backup.sql # 传输到从库 scp /tmp/full_backup.sql slave:/tmp/ # 在从库导入 mysql -uroot -p'YourStrongPassword@2024' < /tmp/full_backup.sql ``` ## 4. 配置从库复制 ```sql -- 登录从库 mysql -uroot -p'YourStrongPassword@2024' -- 配置复制(使用GTID模式) CHANGE REPLICATION SOURCE TO SOURCE_HOST = '192.168.1.100', SOURCE_PORT = 3306, SOURCE_USER = 'repl_user', SOURCE_PASSWORD = 'ReplPassword@2024', SOURCE_AUTO_POSITION = 1, SOURCE_CONNECT_RETRY = 10, SOURCE_RETRY_COUNT = 3, GET_SOURCE_PUBLIC_KEY = 1; -- 启动复制 START REPLICA; -- 查看复制状态 SHOW REPLICA STATUS\G ``` ## 5. 验证复制状态 ```sql -- 在从库执行,检查关键指标 SHOW REPLICA STATUS\G /* 关键字段说明: Replica_IO_Running: Yes -- I/O线程运行状态 Replica_SQL_Running: Yes -- SQL线程运行状态 Seconds_Behind_Source: 0 -- 复制延迟(秒) Retrieved_Gtid_Set: ... -- 已接收的GTID集合 Executed_Gtid_Set: ... -- 已执行的GTID集合 Auto_Position: 1 -- GTID自动定位 Source_SSL_Allowed: Yes -- SSL状态 */ -- 检查半同步状态(主库) SHOW STATUS LIKE 'Rpl_semi_sync%'; /* Rpl_semi_sync_source_status | ON Rpl_semi_sync_source_clients | 1 Rpl_semi_sync_source_yes_tx | 156 Rpl_semi_sync_source_no_tx | 0 */ -- 检查半同步状态(从库) SHOW STATUS LIKE 'Rpl_semi_sync%'; /* Rpl_semi_sync_replica_status | ON */ ``` ## 6. 测试数据同步 ```sql -- 主库创建测试数据 CREATE DATABASE test_repl; USE test_repl; CREATE TABLE users ( id INT AUTO_INCREMENT PRIMARY KEY, username VARCHAR(50) NOT NULL, email VARCHAR(100), created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ) ENGINE=InnoDB; INSERT INTO users (username, email) VALUES ('user1', 'user1@example.com'), ('user2', 'user2@example.com'), ('user3', 'user3@example.com'); -- 从库验证数据 USE test_repl; SELECT * FROM users; /* +----+----------+-------------------+---------------------+ | id | username | email | created_at | +----+----------+-------------------+---------------------+ | 1 | user1 | user1@example.com | 2024-01-15 10:45:30 | | 2 | user2 | user2@example.com | 2024-01-15 10:45:30 | | 3 | user3 | user2@example.com | 2024-01-15 10:45:30 | +----+----------+-------------------+---------------------+ */ ``` # 五、架构模式 ## 1. 一主多从架构 生产环境中常见的一主多从架构,通常配置2-3个从库分担读压力: ```mermaid graph TD M[Master<br/>192.168.1.100] --> S1[Slave1<br/>192.168.1.101<br/>读请求] M --> S2[Slave2<br/>192.168.1.102<br/>读请求] M --> S3[Slave3<br/>192.168.1.103<br/>备份专用] ```   从库1和从库2的配置基本相同,从库3作为备份专用,可配置为延迟从库: ```ini # 从库3配置(延迟复制,用于误操作恢复) [mysqld] # ...其他配置同上... # 延迟复制配置:延迟1小时 # 在CHANGE REPLICATION SOURCE时指定 # SOURCE_DELAY = 3600 ``` 配置延迟从库: ```sql CHANGE REPLICATION SOURCE TO SOURCE_HOST = '192.168.1.100', SOURCE_PORT = 3306, SOURCE_USER = 'repl_user', SOURCE_PASSWORD = 'ReplPassword@2024', SOURCE_AUTO_POSITION = 1, SOURCE_DELAY = 3600; -- 延迟1小时 START REPLICA; ``` ## 2. 级联复制架构 当从库数量较多时,可采用级联复制减轻主库压力: ```mermaid graph TD M[Master<br/>192.168.1.100] --> RS[Relay Slave<br/>192.168.1.101] RS --> S2[Slave2<br/>192.168.1.102] RS --> S3[Slave3<br/>192.168.1.103] ```   中继从库配置要点: ```ini # 中继从库关键配置 [mysqld] server-id = 1101 log-bin = /data/mysql/binlog/mysql-bin # 必须开启此选项,将接收到的事务写入自己的binlog log_replica_updates = ON # 其他配置同普通从库 ``` # 六、实际应用案例 ## 1. 电商平台读写分离 某电商平台日活百万,订单系统数据库面临高并发压力。 ### A. 需求分析 - 写入QPS:约500 - 读取QPS:约5000 - 数据量:200GB - 可用性要求:99.99% ### B. 架构设计 ```mermaid sequenceDiagram participant App as 应用服务器 participant M as 主库<br/>写操作 participant S1 as 从库1<br/>读操作 participant S2 as 从库2<br/>读操作 App->>M: 写入订单 M-->>App: 写入成功 App->>S1: 查询订单 S1-->>App: 返回数据 App->>S2: 统计报表 S2-->>App: 返回结果 ```   ### C. 应用层配置示例(Python + SQLAlchemy) ```python from sqlalchemy import create_engine from sqlalchemy.orm import sessionmaker import random class MySQLRouter: """MySQL主从路由器""" def __init__(self): # 主库连接(写操作) self.master = create_engine( 'mysql+pymysql://app_user:password@192.168.1.100:3306/order_db', pool_size=20, max_overflow=10, pool_pre_ping=True, pool_recycle=3600 ) # 从库连接池(读操作) self.slaves = [ create_engine( 'mysql+pymysql://app_user:password@192.168.1.101:3306/order_db', pool_size=30, max_overflow=20, pool_pre_ping=True, pool_recycle=3600 ), create_engine( 'mysql+pymysql://app_user:password@192.168.1.102:3306/order_db', pool_size=30, max_overflow=20, pool_pre_ping=True, pool_recycle=3600 ) ] def get_read_connection(self): """获取读连接(负载均衡)""" return random.choice(self.slaves) def get_write_connection(self): """获取写连接""" return self.master # 使用示例 router = MySQLRouter() # 写操作 with router.get_write_connection().connect() as conn: conn.execute(text("INSERT INTO orders (user_id, amount) VALUES (:uid, :amt)"), {"uid": 1001, "amt": 99.99}) conn.commit() # 读操作 with router.get_read_connection().connect() as conn: result = conn.execute(text("SELECT * FROM orders WHERE user_id = :uid"), {"uid": 1001}) orders = result.fetchall() ``` ## 2. 跨机房容灾部署 某金融系统要求同城双活,异地灾备: ```mermaid graph LR subgraph 北京机房 M[Master<br/>192.168.1.100] --> SS[Sync Slave<br/>半同步复制] end subgraph 上海机房 M -->|异步复制<br/>30ms| DR[DR Slave<br/>192.168.2.100] end ```   异地从库特殊配置: ```ini # 异地从库配置 [mysqld] server-id = 2100 # 压缩传输减少带宽消耗 replica_compressed_protocol = ON # 适当增大超时时间(跨机房网络延迟) replica_net_timeout = 120 # 增大中继日志空间 relay_log_space_limit = 20G # 允许较大的复制延迟 # 通过监控而非配置限制 ``` # 七、最佳实践 ## 1. 性能优化 ### A. 并行复制调优 MySQL 8.0的并行复制能显著降低复制延迟,关键参数调优: ```sql -- 查看当前并行复制配置 SHOW VARIABLES LIKE '%replica_parallel%'; SHOW VARIABLES LIKE '%binlog_transaction_dependency%'; -- 推荐配置(从库执行) SET GLOBAL replica_parallel_type = 'LOGICAL_CLOCK'; SET GLOBAL replica_parallel_workers = 16; -- 通常设为CPU核数 SET GLOBAL replica_preserve_commit_order = ON; SET GLOBAL binlog_transaction_dependency_tracking = 'WRITESET'; -- 监控并行复制效果 SELECT * FROM performance_schema.replication_applier_status_by_worker; ``` ### B. 网络传输优化 ```ini # 主库配置 [mysqld] # 增大binlog缓存 binlog_cache_size = 4M binlog_stmt_cache_size = 4M # 批量发送优化 replica_net_timeout = 60 ``` ```ini # 从库配置(跨机房时启用) [mysqld] # 启用压缩传输 replica_compressed_protocol = ON ``` ## 2. 安全加固 ### A. 网络安全 ```sql -- 复制用户仅允许特定IP段访问 CREATE USER 'repl_user'@'192.168.1.0/255.255.255.0' IDENTIFIED WITH caching_sha2_password BY 'StrongPassword@2024' REQUIRE SSL; -- 强制SSL连接 GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'192.168.1.0/255.255.255.0'; ``` ### B. SSL加密复制 ```bash # 生成SSL证书 mkdir -p /etc/mysql/ssl cd /etc/mysql/ssl # CA证书 openssl genrsa 2048 > ca-key.pem openssl req -new -x509 -nodes -days 3650 -key ca-key.pem -out ca-cert.pem \ -subj "/C=CN/ST=Beijing/L=Beijing/O=Example/CN=MySQL-CA" # 主库证书 openssl req -newkey rsa:2048 -nodes -keyout master-key.pem -out master-req.pem \ -subj "/C=CN/ST=Beijing/L=Beijing/O=Example/CN=master.example.com" openssl x509 -req -in master-req.pem -days 3650 -CA ca-cert.pem -CAkey ca-key.pem \ -set_serial 01 -out master-cert.pem # 从库证书 openssl req -newkey rsa:2048 -nodes -keyout slave-key.pem -out slave-req.pem \ -subj "/C=CN/ST=Beijing/L=Beijing/O=Example/CN=slave.example.com" openssl x509 -req -in slave-req.pem -days 3650 -CA ca-cert.pem -CAkey ca-key.pem \ -set_serial 02 -out slave-cert.pem # 设置权限 chown mysql:mysql *.pem chmod 600 *-key.pem chmod 644 *-cert.pem ca-cert.pem ``` ```ini # 主库SSL配置 [mysqld] ssl-ca = /etc/mysql/ssl/ca-cert.pem ssl-cert = /etc/mysql/ssl/master-cert.pem ssl-key = /etc/mysql/ssl/master-key.pem require_secure_transport = ON ``` ```sql -- 从库配置SSL复制 CHANGE REPLICATION SOURCE TO SOURCE_HOST = '192.168.1.100', SOURCE_USER = 'repl_user', SOURCE_PASSWORD = 'StrongPassword@2024', SOURCE_AUTO_POSITION = 1, SOURCE_SSL = 1, SOURCE_SSL_CA = '/etc/mysql/ssl/ca-cert.pem', SOURCE_SSL_CERT = '/etc/mysql/ssl/slave-cert.pem', SOURCE_SSL_KEY = '/etc/mysql/ssl/slave-key.pem', SOURCE_SSL_VERIFY_SERVER_CERT = 1; ``` ## 3. 配置注意事项 1. server-id必须唯一:整个复制拓扑中,每个MySQL实例的server-id必须不同 2. GTID模式限制: - 不能使用CREATE TABLE ... SELECT语句 - 不能在事务中混合使用事务表和非事务表 - 不能使用CREATE TEMPORARY TABLE(事务内) 3. binlog_format选择: - ROW:记录每行数据变化,数据一致性最好,推荐使用 - STATEMENT:记录SQL语句,binlog体积小,但某些函数可能导致不一致 - MIXED:自动选择,不推荐 ## 4. 常见错误处理 | 错误码 | 错误描述 | 原因分析 | 解决方案 | |-------|----------|----------|----------| | 1062 | Duplicate entry for key | 从库已存在相同主键数据 | 使用pt-table-sync同步数据 | | 1032 | Can't find record | 从库缺少要更新/删除的数据 | 检查数据一致性,使用pt-table-checksum校验 | | 1236 | Binary log position error | binlog位置无效或已被清理 | 重新初始化从库或使用GTID重建复制 | | 1045 | Access denied | 复制用户权限不足 | 检查用户权限和密码 | | 1593 | Server_id not set | server-id未配置 | 配置唯一的server-id | | 1794 | Slave SQL thread retried | 复制冲突重试 | 检查并发写入或网络问题 | | 13117 | GTID_PURGED can't be set | GTID集合冲突 | 使用RESET MASTER清理后重新配置 | # 八、故障排查和监控 ## 1. 关键监控指标 | 指标名称 | 说明 | 告警阈值 | 采集方法 | |----------|------|----------|----------| | Seconds_Behind_Source | 复制延迟秒数 | >10s告警,>60s严重 | SHOW REPLICA STATUS | | Replica_IO_Running | I/O线程状态 | !=Yes告警 | SHOW REPLICA STATUS | | Replica_SQL_Running | SQL线程状态 | !=Yes告警 | SHOW REPLICA STATUS | | Rpl_semi_sync_source_clients | 半同步从库数 | <1告警 | SHOW STATUS | | Rpl_semi_sync_source_no_tx | 降级异步次数 | >0告警 | SHOW STATUS | | Relay_Log_Space | 中继日志大小 | >10GB告警 | SHOW REPLICA STATUS | ## 2. 常见问题诊断 ### A. 复制延迟过大 ```sql -- 查看复制延迟 SHOW REPLICA STATUS\G -- 分析延迟原因 -- 1. 检查从库负载 SHOW PROCESSLIST; SELECT * FROM performance_schema.threads WHERE TYPE = 'BACKGROUND' AND NAME LIKE '%replica%'; -- 2. 检查是否有大事务 SELECT * FROM performance_schema.events_transactions_current WHERE STATE = 'ACTIVE'\G -- 3. 检查并行复制效率 SELECT WORKER_ID, LAST_SEEN_TRANSACTION, APPLYING_TRANSACTION, LAST_APPLIED_TRANSACTION FROM performance_schema.replication_applier_status_by_worker; -- 4. 检查是否有锁等待 SELECT * FROM performance_schema.data_lock_waits; ``` ### B. I/O线程停止 ```sql -- 查看I/O线程状态 SHOW REPLICA STATUS\G -- 关注:Replica_IO_Running, Last_IO_Error, Last_IO_Errno -- 常见原因: -- 1. 网络连接问题 -- 2. 复制用户权限问题 -- 3. 主库binlog被清理 -- 4. 主库server-id变更 -- 重新连接测试 STOP REPLICA IO_THREAD; START REPLICA IO_THREAD; ``` ### C. SQL线程停止 ```sql -- 查看SQL线程状态 SHOW REPLICA STATUS\G -- 关注:Replica_SQL_Running, Last_SQL_Error, Last_SQL_Errno -- 查看最后执行的事务 SELECT * FROM performance_schema.replication_applier_status_by_coordinator\G SELECT * FROM performance_schema.replication_applier_status_by_worker WHERE LAST_ERROR_NUMBER != 0\G -- 数据不一致时的处理 -- 1. 使用pt-table-checksum校验 pt-table-checksum --user=root --password=xxx --host=192.168.1.100 \ --databases=mydb --tables=mytable -- 2. 使用pt-table-sync修复 pt-table-sync --print --user=root --password=xxx \ --sync-to-master h=192.168.1.101,D=mydb,t=mytable ``` ## 3. 备份与恢复 ### A. 从库备份策略 利用从库进行备份,不影响主库性能: ```bash #!/bin/bash # 文件:/opt/scripts/backup_from_slave.sh # 功能:从库热备份脚本 BACKUP_DIR="/backup/mysql" DATE=$(date +%Y%m%d_%H%M%S) MYSQL_USER="backup" MYSQL_PASS="BackupPass@2024" RETENTION_DAYS=7 # 创建备份目录 mkdir -p ${BACKUP_DIR}/${DATE} # 停止复制SQL线程,确保数据一致性 mysql -u${MYSQL_USER} -p${MYSQL_PASS} -e "STOP REPLICA SQL_THREAD;" # 记录复制位置 mysql -u${MYSQL_USER} -p${MYSQL_PASS} -e "SHOW REPLICA STATUS\G" > ${BACKUP_DIR}/${DATE}/replica_status.txt # 执行备份 xtrabackup --backup \ --user=${MYSQL_USER} \ --password=${MYSQL_PASS} \ --target-dir=${BACKUP_DIR}/${DATE}/data \ --slave-info \ --safe-slave-backup # 恢复复制 mysql -u${MYSQL_USER} -p${MYSQL_PASS} -e "START REPLICA SQL_THREAD;" # 压缩备份 cd ${BACKUP_DIR} tar -czf ${DATE}.tar.gz ${DATE} rm -rf ${DATE} # 清理过期备份 find ${BACKUP_DIR} -name "*.tar.gz" -mtime +${RETENTION_DAYS} -delete echo "Backup completed: ${BACKUP_DIR}/${DATE}.tar.gz" ``` ### B. 使用备份重建从库 ```bash # 解压备份 cd /backup/mysql tar -xzf 20240115_100000.tar.gz # 准备备份 xtrabackup --prepare --target-dir=/backup/mysql/20240115_100000/data # 停止MySQL systemctl stop mysqld # 清空数据目录 rm -rf /data/mysql/data/* # 恢复数据 xtrabackup --copy-back --target-dir=/backup/mysql/20240115_100000/data # 修改权限 chown -R mysql:mysql /data/mysql/data # 启动MySQL systemctl start mysqld # 配置复制(从xtrabackup_slave_info获取位置信息) cat /backup/mysql/20240115_100000/data/xtrabackup_slave_info # CHANGE REPLICATION SOURCE TO SOURCE_AUTO_POSITION = 1 mysql -uroot -p << 'EOF' CHANGE REPLICATION SOURCE TO SOURCE_HOST = '192.168.1.100', SOURCE_USER = 'repl_user', SOURCE_PASSWORD = 'ReplPassword@2024', SOURCE_AUTO_POSITION = 1; START REPLICA; EOF ``` # 九、总结 ## 1. 技术要点回顾 1. GTID是现代MySQL复制的标配:简化故障切换,避免手动指定binlog位置 2. 半同步复制保障数据安全:rpl_semi_sync_source_wait_point=AFTER_SYNC配合超时设置,在性能和安全间取得平衡 3. 并行复制大幅降低延迟:WRITESET依赖追踪配合多个worker线程,将复制延迟从分钟级降到秒级 4. Clone插件简化从库初始化:无需依赖外部工具,自动完成数据同步和复制配置 5. 监控告警是高可用基础:复制延迟、线程状态、半同步状态必须纳入监控体系 ## 2. 进阶学习方向 - MySQL Group Replication:多主架构,自动故障转移 - MySQL InnoDB Cluster:官方高可用解决方案 - MySQL Router:自动读写分离和故障切换 - Orchestrator:复制拓扑管理和自动化故障恢复 - ProxySQL:高性能MySQL代理 ## 3. 命令速查表 | 命令 | 说明 | |------|------| | SHOW MASTER STATUS | 查看主库binlog位置 | | SHOW REPLICA STATUS\G | 查看从库复制状态 | | START REPLICA | 启动复制 | | STOP REPLICA | 停止复制 | | RESET REPLICA ALL | 清除所有复制配置 | | SHOW BINARY LOGS | 查看binlog文件列表 | | PURGE BINARY LOGS TO 'file' | 清理指定文件之前的binlog | | SELECT @@GLOBAL.GTID_EXECUTED | 查看已执行的GTID集合 | | SHOW PROCESSLIST | 查看当前连接和复制线程 | | SHOW VARIABLES LIKE '%replica%' | 查看复制相关配置 | ## 4. 术语表 | 术语 | 英文 | 说明 | |------|------|------| | 主库 | Master/Source | 提供写入的数据库节点 | | 从库 | Slave/Replica | 复制主库数据的数据库节点 | | 二进制日志 | Binary Log | 记录数据变更的日志文件 | | 中继日志 | Relay Log | 从库缓存主库binlog的日志文件 | | GTID | Global Transaction ID | 全局事务标识符 | | 半同步复制 | Semi-sync Replication | 主库等待从库确认的复制模式 | | 异步复制 | Async Replication | 主库不等待从库确认的复制模式 | | 复制延迟 | Replication Lag | 从库数据落后主库的时间 | | I/O线程 | I/O Thread | 从库接收binlog的线程 | | SQL线程 | SQL Thread | 从库回放事务的线程 | *** ## 参考资料 1. [MySQL主从复制搭建实战:数据同步零丢失配置指南](https://mp.weixin.qq.com/s/s6oe68S2paspWu4DKV8JgQ) 最后修改:2026 年 01 月 15 日 © 允许规范转载 赞 如果觉得我的文章对你有用,请随意赞赏