Deploying a Zookeeper Cluster on Docker + CentOS with Kubernetes Operator Automation

Environment

| Host IP | Hostname | Node Role | Data Directory | Kubernetes Node Label |
|---|---|---|---|---|
| 192.168.10.100 | zk1 | Master | /opt/zookeeper/data | zk-cluster=true |
| 192.168.10.101 | zk2 | Worker | /opt/zookeeper/data | zk-cluster=true |
| 192.168.10.102 | zk3 | Worker | /opt/zookeeper/data | zk-cluster=true |
| 192.168.10.103 | zk4 | Worker | /opt/zookeeper/data | zk-cluster=true |
| 192.168.10.104 | zk5 | Worker | /opt/zookeeper/data | zk-cluster=true |

I. Base Environment Setup (All Nodes)

1. System Configuration

```bash
# Set the hostname (run the matching command on each host)
sudo hostnamectl set-hostname zk1

# Add all nodes to the hosts file
sudo tee -a /etc/hosts <<EOF
192.168.10.100 zk1
192.168.10.101 zk2
192.168.10.102 zk3
192.168.10.103 zk4
192.168.10.104 zk5
EOF

# Disable SELinux
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

# Tune kernel parameters
sudo tee -a /etc/sysctl.conf <<EOF
net.core.somaxconn=65535
net.ipv4.tcp_max_syn_backlog=65535
vm.swappiness=1
EOF
sudo sysctl -p
```

2. Installing Docker

```bash
# Install prerequisites
sudo dnf install -y yum-utils device-mapper-persistent-data lvm2

# Add the Docker repository
sudo yum-config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo

# Install Docker
sudo dnf install -y docker-ce docker-ce-cli containerd.io

# Configure Docker (the systemd cgroup driver matches the kubelet default)
sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}
EOF

# Start Docker
sudo systemctl start docker
sudo systemctl enable docker
```

3. Installing the Kubernetes Components

```bash
# Disable swap
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

# Install kubeadm/kubelet/kubectl
# (this legacy Google repo is deprecated; newer releases are published at pkgs.k8s.io)
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF

sudo dnf install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
sudo systemctl enable --now kubelet

# Initialize the master node (on zk1 only)
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 \
  --control-plane-endpoint="zk1:6443" \
  --upload-certs \
  --apiserver-advertise-address=192.168.10.100

# Configure kubectl (on zk1)
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Install the network plugin
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

# Join the worker nodes (run on zk2-zk5)
# Use the join command printed by kubeadm init
kubeadm join zk1:6443 --token <token> --discovery-token-ca-cert-hash <hash>
```
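The ZookeeperCluster manifest later in this guide schedules pods through the `zk-cluster=true` node label from the environment table, so that label has to be applied once all nodes have joined. A minimal sketch; the taint-removal line is only needed if Zookeeper pods should also land on the zk1 control-plane node, and the taint key varies by Kubernetes version:

```shell
# Apply the zk-cluster=true label from the environment table to every node
for node in zk1 zk2 zk3 zk4 zk5; do
  kubectl label node "$node" zk-cluster=true --overwrite
done

# Optional: allow scheduling on the control-plane node zk1
# (older releases use the key node-role.kubernetes.io/master instead)
kubectl taint node zk1 node-role.kubernetes.io/control-plane- || true
```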


II. Deploying the Zookeeper Operator

1. Install the Zookeeper Operator

```bash
# Create the namespace
kubectl create ns zookeeper-operator

# Deploy the Operator
kubectl apply -f https://raw.githubusercontent.com/pravega/zookeeper-operator/master/deploy/all_ns/rbac.yaml
kubectl apply -f https://raw.githubusercontent.com/pravega/zookeeper-operator/master/deploy/all_ns/operator.yaml

# Verify the Operator is running
kubectl get pods -n zookeeper-operator
```

2. Create the ZookeeperCluster Resource

zookeeper-cluster.yaml:

```yaml
apiVersion: zookeeper.pravega.io/v1beta1
kind: ZookeeperCluster
metadata:
  name: zookeeper-cluster
  namespace: default
spec:
  replicas: 5
  image:
    repository: zookeeper
    tag: 3.8.0
  persistence:
    storageClassName: local-storage
    volumeReclaimPolicy: Retain
    size: 20Gi
  config:
    initLimit: 15
    syncLimit: 5
    tickTime: 2000
    autopurge:
      snapRetainCount: 10
      purgeInterval: 24
  pod:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - zookeeper
          topologyKey: kubernetes.io/hostname
    nodeSelector:
      zk-cluster: "true"
    securityContext:
      runAsUser: 1000
      fsGroup: 1000
    resources:
      requests:
        memory: "2Gi"
        cpu: "1"
      limits:
        memory: "4Gi"
        cpu: "2"
  # Note: field support (e.g. autopurge, security, metrics) varies by operator
  # version; validate against the CRD schema of the release you deploy.
  security:
    enable: true
    jaasConfig:
      secretRef: zk-jaas-secret
    tlsConfig:
      enable: true
      secretRef: zk-tls-secret
  metrics:
    enable: true
    port: 7000
```

3. Create the Security Configuration

```bash
# JAAS authentication config (single quotes avoid clashing with the inner double quotes)
kubectl create secret generic zk-jaas-secret \
  --from-literal=jaas-config='Server {
    org.apache.zookeeper.server.auth.DigestLoginModule required
    user_admin="adminpassword"
    user_appuser="apppassword";
};'

# TLS certificate config
# (generate keystore.jks beforehand)
kubectl create secret generic zk-tls-secret \
  --from-file=keystore.jks=keystore.jks \
  --from-literal=keystore-password=changeit
```
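The step above assumes keystore.jks already exists. For a test environment, a self-signed keystore can be generated with the JDK's keytool; the alias, CN, and passwords below are placeholders, and production clusters should use CA-issued certificates:

```shell
# Generate a self-signed keystore for testing only (placeholder names/passwords)
keytool -genkeypair \
  -alias zookeeper \
  -keyalg RSA -keysize 2048 -validity 365 \
  -dname "CN=zookeeper-cluster" \
  -keystore keystore.jks \
  -storepass changeit -keypass changeit
```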

4. Create the Storage Class

local-storage.yaml:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
```
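Because `kubernetes.io/no-provisioner` does not create volumes, a PersistentVolume must be pre-created for each node. A sketch for zk1 using the data directory from the environment table (the PV name is an assumption; repeat with adjusted name and node for zk2-zk5):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: zk-data-zk1
spec:
  capacity:
    storage: 20Gi          # matches persistence.size in the ZookeeperCluster spec
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /opt/zookeeper/data
  nodeAffinity:            # pin the volume to the node that owns the directory
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values: ["zk1"]
```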

5. Deploy the Cluster

```bash
kubectl apply -f local-storage.yaml
kubectl apply -f zookeeper-cluster.yaml

# Check cluster status
kubectl get zookeepercluster
kubectl get pods -l app=zookeeper
```


III. Automated Operations

1. Automatic Scaling

```bash
# Horizontal scaling
kubectl patch zk zookeeper-cluster --type='merge' -p '{"spec":{"replicas":7}}'

# Vertical scaling
kubectl patch zk zookeeper-cluster --type='merge' -p '{"spec":{"pod":{"resources":{"limits":{"memory":"8Gi"}}}}}'
```

2. Automated Backup and Restore

zk-backup-job.yaml:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: zk-backup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: zookeeper:3.8.0
            env:
            - name: ZK_SERVER
              value: zookeeper-cluster-client   # assumed client Service created by the operator
            command: ["/bin/sh", "-c"]
            args:
              - |
                echo "Connecting to ${ZK_SERVER}"
                # health check before archiving ("ruok" is a valid four-letter command)
                echo "ruok" | nc ${ZK_SERVER} 2181
                tar czf /backup/$(date +%Y%m%d).tar.gz -C /data .
            volumeMounts:
            - name: backup-volume
              mountPath: /backup
            - name: data-volume
              mountPath: /data
          restartPolicy: OnFailure
          volumes:
          - name: backup-volume
            persistentVolumeClaim:
              claimName: zk-backup-pvc
          - name: data-volume
            persistentVolumeClaim:
              # claimName is not variable-expanded; use the concrete PVC name
              # (check with `kubectl get pvc`)
              claimName: data-zookeeper-cluster-0
```
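The CronJob covers backup only; restoring is the reverse: quiesce the ensemble, unpack an archive into the data volume, and restart. A hypothetical restore Job mirroring the backup above (the PVC names and archive date are assumptions to replace):

```yaml
# Scale the StatefulSet to 0 before running this Job, then back up afterwards.
apiVersion: batch/v1
kind: Job
metadata:
  name: zk-restore
spec:
  template:
    spec:
      containers:
      - name: restore
        image: zookeeper:3.8.0
        command: ["/bin/sh", "-c"]
        args:
          - tar xzf /backup/20240101.tar.gz -C /data   # replace with the archive to restore
        volumeMounts:
        - name: backup-volume
          mountPath: /backup
        - name: data-volume
          mountPath: /data
      restartPolicy: Never
      volumes:
      - name: backup-volume
        persistentVolumeClaim:
          claimName: zk-backup-pvc
      - name: data-volume
        persistentVolumeClaim:
          claimName: data-zookeeper-cluster-0   # assumed PVC name
```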

3. Monitoring and Alerting

prometheus-monitoring.yaml:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: zookeeper-monitor
spec:
  selector:
    matchLabels:
      app: zookeeper
  endpoints:
  - port: metrics
    interval: 15s
  namespaceSelector:
    any: true
```
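The ServiceMonitor only scrapes metrics; alerting additionally needs a PrometheusRule. A minimal sketch, assuming the Prometheus Operator is installed and the scrape job carries the label `job="zookeeper"`:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: zookeeper-alerts
spec:
  groups:
  - name: zookeeper
    rules:
    - alert: ZookeeperDown
      expr: up{job="zookeeper"} == 0   # assumed job label; adjust to your scrape config
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Zookeeper instance {{ $labels.instance }} is down"
```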

4. Automatic Certificate Rotation

```bash
# Trigger a rolling restart after the certificate is updated
kubectl patch zk zookeeper-cluster --type='merge' -p '{"spec":{"tlsConfig":{"certUpdated":true}}}'
```
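The patch above assumes `zk-tls-secret` is refreshed by some external process. One way to automate that is cert-manager, which can also emit a JKS keystore alongside the PEM data; a sketch assuming cert-manager is installed and a ClusterIssuer named `cluster-issuer` exists:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: zk-tls
  namespace: default
spec:
  secretName: zk-tls-secret
  duration: 2160h     # 90 days
  renewBefore: 360h   # rotate 15 days before expiry
  dnsNames:
  - "*.zookeeper-cluster-headless.default.svc.cluster.local"   # assumed headless Service name
  issuerRef:
    name: cluster-issuer   # assumed ClusterIssuer
    kind: ClusterIssuer
  keystores:
    jks:
      create: true               # emit keystore.jks alongside the PEM secret
      passwordSecretRef:
        name: jks-password       # assumed Secret holding the keystore password
        key: password
```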


IV. Security Compliance and Disaster Recovery

1. Security Hardening

```yaml
# Additional security settings in the ZookeeperCluster resource
spec:
  security:
    enable: true
    jaasConfig:
      secretRef: zk-jaas-secret
    tlsConfig:
      enable: true
      secretRef: zk-tls-secret
    networkPolicy:
      enabled: true
      allowedClients:
      - 192.168.10.0/24
```
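The `networkPolicy` block above depends on operator support; the same restriction can be enforced with a native Kubernetes NetworkPolicy (this requires a CNI that implements the API — flannel alone does not enforce policies). A sketch using the client CIDR from the configuration above:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: zk-allow-clients
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: zookeeper
  policyTypes:
  - Ingress
  ingress:
  # Client traffic from the allowed subnet
  - from:
    - ipBlock:
        cidr: 192.168.10.0/24
    ports:
    - protocol: TCP
      port: 2181
  # Quorum and leader-election traffic between ensemble members
  - from:
    - podSelector:
        matchLabels:
          app: zookeeper
    ports:
    - protocol: TCP
      port: 2888
    - protocol: TCP
      port: 3888
```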

2. Cross-Cluster Disaster Recovery

```yaml
apiVersion: zookeeper.pravega.io/v1beta1
kind: ZookeeperCluster
metadata:
  name: zookeeper-dr
spec:
  replicas: 3
  config:
    # Run the DR members as observers
    peerType: observer
    # Connect to the primary ensemble
    initConfig: |
      server.1=zk1:2888:3888:participant;2181
      server.2=zk2:2888:3888:participant;2181
      server.3=zk3:2888:3888:participant;2181
      server.4=dr-zk1:2888:3888:observer;2181
      server.5=dr-zk2:2888:3888:observer;2181
      server.6=dr-zk3:2888:3888:observer;2181
```


V. Routine Operations

1. Cluster Health Checks

```bash
# Check cluster status
kubectl get zookeepercluster
kubectl describe zk zookeeper-cluster

# Check a node's role
kubectl exec zookeeper-cluster-0 -- zkServer.sh status
```
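With five members, checking roles one pod at a time is tedious; a small loop covers the whole ensemble:

```shell
# Report the role (leader/follower) of every ensemble member
for i in 0 1 2 3 4; do
  echo "--- zookeeper-cluster-$i ---"
  kubectl exec "zookeeper-cluster-$i" -- zkServer.sh status
done
```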

2. Log Management

```bash
# Tail live logs
kubectl logs -f zookeeper-cluster-0

# Log archiving is managed automatically by the Operator
```

3. Hot Configuration Updates

```bash
# Trigger an update after changing the config (tickTime is an integer field)
kubectl patch zk zookeeper-cluster --type='merge' -p '{"spec":{"config":{"tickTime":3000}}}'
```


VI. Scaling and Upgrades

1. Cluster Upgrade Procedure

```bash
# Rolling upgrade to a new version
kubectl patch zk zookeeper-cluster --type='merge' -p '{"spec":{"image":{"tag":"3.9.0"}}}'

# Watch upgrade progress
kubectl get pods -w -l app=zookeeper
```

2. Multi-Cluster Management

```bash
# Deploy multiple Zookeeper clusters
kubectl apply -f zookeeper-cluster-app1.yaml
kubectl apply -f zookeeper-cluster-app2.yaml

# Unified monitoring
kubectl apply -f zookeeper-global-monitor.yaml
```


VII. Backup and Recovery

1. Full Cluster Backup with Velero

```bash
# Install Velero
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.0.0 \
  --bucket zk-backups \
  --secret-file ./credentials-velero \
  --use-restic

# Create a backup
velero backup create zk-full-backup --include-namespaces default --selector app=zookeeper

# Disaster recovery
velero restore create --from-backup zk-full-backup
```
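`velero backup create` is a one-off; Velero can also run backups on a schedule. A sketch with an assumed 30-day retention:

```shell
# Recurring backup at 03:00 daily, retained for 30 days (720h)
velero schedule create zk-daily \
  --schedule="0 3 * * *" \
  --include-namespaces default \
  --selector app=zookeeper \
  --ttl 720h
```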

2. Data Migration

```bash
# Using the zkTransfer tool
# (not part of the standard Zookeeper distribution; must be present in the image)
kubectl exec zookeeper-cluster-0 -- zkTransfer.sh \
  --source zk1:2181 \
  --target new-zk1:2181 \
  --path /critical_data \
  --parallel 8
```


Operations Checklist

| Check | Frequency | Command / Method |
|---|---|---|
| Cluster health | Daily | `kubectl get zk` |
| Node resource usage | Daily | `kubectl top pods` |
| Certificate expiry | Monthly | `keytool -list -v -keystore` |
| Backup/restore test | Quarterly | Velero restore drill |
| Security vulnerability scan | Monthly | Trivy image scan |
| Failover drill | Semi-annually | Simulate node failure |
| Performance load test | Annually | ZK benchmark tools |

With the Kubernetes Operator managing the full Zookeeper lifecycle, Velero handling disaster recovery, and Prometheus handling monitoring, day-to-day operations become significantly more efficient. For production, additionally consider:

  1. Managing secrets with HashiCorp Vault

  2. Deploying across multiple availability zones

  3. Enforcing policy with Open Policy Agent

  4. Managing configuration through a GitOps workflow (Argo CD)

 
