MySQL db03:33060+ ssl JS > c.status()
{
    "clusterName": "main",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "db03.luis.local:3306",
        "ssl": "REQUIRED",
        "status": "NO_QUORUM",
        "statusText": "Cluster has no quorum as visible from 'db03.luis.local:3306' and cannot process write transactions. 2 members are not active",
        "topology": {
            "db01.luis.local:3306": {
                "address": "db01.luis.local:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "(MISSING)"
            },
            "db02.luis.local:3306": {
                "address": "db02.luis.local:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "UNREACHABLE",
                "version": "8.0.21"
            },
            "db03.luis.local:3306": {
                "address": "db03.luis.local:3306",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.21"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "db03.luis.local:3306"
}
Whether you try rejoin or anything else, it fails, because there is no valid quorum.
MySQL db03:33060+ ssl JS > c.rejoinInstance('db03.luis.local:3306')
Cluster.rejoinInstance: There is no quorum to perform the operation (RuntimeError)
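Why does everything fail here? Group Replication only accepts writes while a strict majority of the members in the current group view is reachable. The sketch below is a hypothetical helper, `summarizeCluster` (not a MySQL Shell API), written in the same JavaScript as the session. It reads an object shaped like the `c.status()` output above and approximates that majority check, assuming that `(MISSING)` members have dropped out of the group view while `UNREACHABLE` ones are still counted in it. It also assumes the status object can be read like a plain JS object.

```javascript
// Hypothetical helper (not part of MySQL Shell): approximate the quorum check
// from a status object shaped like the c.status() output.
// Assumption: "(MISSING)" members are outside the current group view, while
// UNREACHABLE members are still in the view but cannot be contacted.
function summarizeCluster(status) {
  const topology = status.defaultReplicaSet.topology;
  const all = Object.keys(topology);
  // Members that are part of the group view as seen by the queried node
  const inGroup = all.filter((m) => topology[m].status !== "(MISSING)");
  // Members of that view which are actually reachable
  const reachable = inGroup.filter(
    (m) => topology[m].status === "ONLINE" || topology[m].status === "RECOVERING"
  );
  return {
    clusterStatus: status.defaultReplicaSet.status,
    missing: all.filter((m) => topology[m].status === "(MISSING)"),
    // Quorum needs a strict majority of the view to be reachable
    hasQuorum: reachable.length > inGroup.length / 2,
  };
}
```

Fed the first status output, this reports `hasQuorum: false` (1 reachable member out of 2 in the view). Fed the status taken after `forceQuorumUsingPartitionOf`, where db03 is the only member left in the view, it reports `true` (1 of 1).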
One healthy node remains (db03.luis.local:3306), so restore the cluster from the partition that node belongs to.
MySQL db03:33060+ ssl JS > c.forceQuorumUsingPartitionOf('root@db03.luis.local:3306', 'password')
Restoring cluster 'main' from loss of quorum, by using the partition composed of [db03.luis.local:3306]
Restoring the InnoDB cluster ...
The InnoDB cluster was successfully restored using the partition from the instance 'root@db03.luis.local:3306'.
WARNING: To avoid a split-brain scenario, ensure that all other members of the cluster are removed or joined back to the group that was restored.
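Occasionally this fails, but it goes through if you wait a few minutes and retry (perhaps the member in UNREACHABLE status is to blame?).
Maybe it succeeded after a few minutes because connectivity to db02.luis.local:3306 had come back? A failed attempt looks like this: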
MySQL db03:33060+ ssl JS > c.forceQuorumUsingPartitionOf('root@db03.luis.local:3306', 'password')
Restoring cluster 'main' from loss of quorum, by using the partition composed of [db03.luis.local:3306]
Restoring the InnoDB cluster ...
Cluster.forceQuorumUsingPartitionOf: db03.luis.local:3306: Variable 'group_replication_force_members' can't be set to the value of 'db03.luis.local:33061' (RuntimeError)
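Since the fix in practice was "wait a few minutes and try again", the retry can be scripted. The following is a minimal sketch, assuming a hypothetical helper `forceQuorumWithRetry` (not a MySQL Shell API) that receives the Cluster object (`c` above). The pause function is passed in as the `sleep` parameter because this sketch does not assume any particular sleep facility in your MySQL Shell version. Only retry when you are sure the other members are really down: the Shell itself warns about split-brain after forcing quorum.

```javascript
// Hypothetical helper (not part of MySQL Shell): call
// forceQuorumUsingPartitionOf() up to maxAttempts times, pausing between
// failures, and rethrow the last error if every attempt fails.
// `sleep` is injected (assumption): supply whatever pause works in your setup.
function forceQuorumWithRetry(cluster, instance, password, maxAttempts, waitSeconds, sleep) {
  if (maxAttempts < 1) {
    throw new Error("maxAttempts must be >= 1");
  }
  let lastError = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      cluster.forceQuorumUsingPartitionOf(instance, password);
      return attempt; // number of attempts that were needed
    } catch (err) {
      // e.g. the group_replication_force_members RuntimeError shown above
      lastError = err;
      if (attempt < maxAttempts) {
        sleep(waitSeconds); // give the network / other members time to settle
      }
    }
  }
  throw lastError;
}
```

In the session this would be called as `forceQuorumWithRetry(c, 'root@db03.luis.local:3306', 'password', 5, 60, yourSleep)`, where `yourSleep` is a placeholder for a pause function you provide.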
NO_QUORUM has been resolved.
MySQL db03:33060+ ssl JS > c.status()
{
    "clusterName": "main",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "db03.luis.local:3306",
        "ssl": "REQUIRED",
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures. 2 members are not active",
        "topology": {
            "db01.luis.local:3306": {
                "address": "db01.luis.local:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "(MISSING)"
            },
            "db02.luis.local:3306": {
                "address": "db02.luis.local:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "(MISSING)"
            },
            "db03.luis.local:3306": {
                "address": "db03.luis.local:3306",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.21"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "db03.luis.local:3306"
}
All that remains is to rejoin the (MISSING) nodes.
MySQL db03:33060+ ssl JS > c.rejoinInstance('root@db01.luis.local')
Rejoining the instance to the InnoDB cluster. Depending on the original problem that made the instance unavailable, the rejoin operation might not be successful and further manual steps will be needed to fix the underlying problem.
Please monitor the output of the rejoin operation and take necessary action if the instance cannot rejoin.
Rejoining instance to the cluster ...
The instance 'db01.luis.local' was successfully rejoined on the cluster.

MySQL db03:33060+ ssl JS > c.rejoinInstance('root@db02.luis.local')
Rejoining the instance to the InnoDB cluster. Depending on the original problem that made the instance unavailable, the rejoin operation might not be successful and further manual steps will be needed to fix the underlying problem.
Please monitor the output of the rejoin operation and take necessary action if the instance cannot rejoin.
Rejoining instance to the cluster ...
The instance 'db02.luis.local' was successfully rejoined on the cluster.
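With more nodes, the two `rejoinInstance` calls above generalize to a loop over every `(MISSING)` member. This is a sketch built around a hypothetical helper, `rejoinMissing` (not a MySQL Shell API). It assumes the same `root` account works on every host, as in the commands above, and that the `c.status()` result can be read like a plain JS object. It passes the full `host:3306` address from the status output, whereas the commands above omit the port; 3306 is the default either way.

```javascript
// Hypothetical helper (not part of MySQL Shell): rejoin every member that a
// status object (shaped like c.status()) reports as "(MISSING)".
// An exception thrown by rejoinInstance() stops the loop and propagates.
function rejoinMissing(cluster, status, user) {
  const topology = status.defaultReplicaSet.topology;
  const rejoined = [];
  for (const address of Object.keys(topology)) {
    if (topology[address].status !== "(MISSING)") {
      continue; // ONLINE / RECOVERING members need no action
    }
    cluster.rejoinInstance(user + "@" + address);
    rejoined.push(address);
  }
  return rejoined; // addresses that were rejoined, in topology order
}
```

In the session this would be `rejoinMissing(c, c.status(), 'root')`.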
MySQL db03:33060+ ssl JS > c.status()
{
    "clusterName": "main",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "db03.luis.local:3306",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "db01.luis.local:3306": {
                "address": "db01.luis.local:3306",
                "mode": "R/O",
                "readReplicas": {},
                "recovery": {
                    "state": "ON"
                },
                "recoveryStatusText": "Distributed recovery in progress",
                "role": "HA",
                "status": "RECOVERING",
                "version": "8.0.21"
            },
            "db02.luis.local:3306": {
                "address": "db02.luis.local:3306",
                "mode": "R/O",
                "readReplicas": {},
                "recovery": {
                    "cloneStartTime": "2020-04-11 12:37:09.240",
                    "cloneState": "Completed",
                    "currentStage": "RECOVERY",
                    "currentStageState": "Completed"
                },
                "recoveryStatusText": "Cloning in progress",
                "role": "HA",
                "status": "RECOVERING",
                "version": "8.0.21"
            },
            "db03.luis.local:3306": {
                "address": "db03.luis.local:3306",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.21"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "db03.luis.local:3306"
}
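Right after the rejoin, db01 and db02 are still RECOVERING (distributed recovery and clone), so they only become full members once they switch to ONLINE. Polling for that can be sketched with a hypothetical helper, `waitUntilAllOnline` (not a MySQL Shell API). The status source and the pause function are both passed in. In the Shell, `getStatus` would be `() => c.status()`, and `sleep` is whatever pause you supply (an assumption, as in the retry sketch).

```javascript
// Hypothetical helper (not part of MySQL Shell): poll a status source until
// every member in the topology reports ONLINE.
// Returns true as soon as all members are ONLINE, false after maxPolls checks.
function waitUntilAllOnline(getStatus, sleep, intervalSeconds, maxPolls) {
  for (let poll = 1; poll <= maxPolls; poll++) {
    const topology = getStatus().defaultReplicaSet.topology;
    const notOnline = Object.keys(topology).filter(
      (m) => topology[m].status !== "ONLINE"
    );
    if (notOnline.length === 0) {
      return true;
    }
    if (poll < maxPolls) {
      sleep(intervalSeconds); // wait before asking again
    }
  }
  return false;
}
```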