Raising the ETCD Storage Quota Limit

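This document was written for the current open-source release, Rainbond V5.2. Open-source users who installed Rainbond before this release are advised to read it.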

Why raise the ETCD storage quota

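Earlier versions left the ETCD storage quota at its default value, which is small. If ETCD's storage usage exceeds the space quota, ETCD raises a cluster-wide alarm and puts the cluster into maintenance mode, in which it only accepts key reads and deletes. This disrupts normal operation of the platform, so the quota must be raised manually. Later versions apply this change during installation, so no manual action is needed there.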
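Before changing anything, you can check how close the backend database is to its quota. When no quota is configured, upstream etcd uses a default of 2 GiB (2147483648 bytes). The sketch below is a hedged example, not part of Rainbond: it assumes the rbd-etcd-0 pod image ships etcdctl and that etcd serves its default client endpoint 127.0.0.1:2379 without TLS. The helper names show_etcd_space and quota_used_pct are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting etcd space usage in a Rainbond cluster.

# Print DB size and any active alarms (e.g. NOSPACE) from the rbd-etcd pod.
# ASSUMPTION: etcdctl exists in the image and etcd serves 127.0.0.1:2379
# without TLS. ETCDCTL_API=3 is set because older etcdctl releases default
# to the v2 API, which has no "endpoint status" or "alarm list".
show_etcd_space() {
  kubectl exec -n rbd-system rbd-etcd-0 -- sh -c \
    'ETCDCTL_API=3 etcdctl endpoint status --write-out=table && ETCDCTL_API=3 etcdctl alarm list'
}

# Integer percentage (rounded down) of the quota already used:
#   quota_used_pct DB_BYTES QUOTA_BYTES
quota_used_pct() {
  echo $(( $1 * 100 / $2 ))
}

# Example: a 1.5 GiB database against the 2 GiB default quota.
quota_used_pct 1610612736 2147483648   # prints 75
```

Read the DB SIZE column from show_etcd_space and feed it (in bytes) to quota_used_pct; values close to 100 mean writes are about to be refused.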

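Once ETCD's data store reaches the quota, no more data can be written, and the logs show errors like the following (note the "alarm NOSPACE" line):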

$ kubectl logs -f -l name=rbd-etcd -n rbd-system
2020-05-31 06:03:06.144891 W | etcdserver: read-only range request "key:\"/rainbond/nodes/10.0.8.153\" " with result "range_response_count:1 size:1551" took too long (801.141729ms) to execute
2020-05-31 09:23:39.035503 W | wal: sync duration of 1.212082876s, expected less than 1s
2020-05-31 10:54:39.149146 W | etcdserver: read-only range request "key:\"/rainbond/nodes/10.0.8.153\" " with result "range_response_count:1 size:1551" took too long (547.360999ms) to execute
2020-05-31 10:54:39.149219 W | etcdserver: read-only range request "key:\"/traefik/backends/event_log_event_http/servers\" range_end:\"/traefik/backends/event_log_event_http/servert\" " with result "range_response_count:1 size:110" took too long (1.061533919s) to execute
2020-05-31 14:09:11.819257 W | etcdserver: read-only range request "key:\"/traefik/backends/event_log_event_http/servers\" range_end:\"/traefik/backends/event_log_event_http/servert\" " with result "range_response_count:1 size:110" took too long (124.522335ms) to execute
2020-05-31 14:12:30.307642 W | wal: sync duration of 1.231021859s, expected less than 1s
2020-06-01 06:18:08.886320 I | wal: segmented wal file rbd-etcd.etcd/member/wal/000000000000001a-00000000000fcd95.wal is created
2020-06-01 06:18:36.830198 I | pkg/fileutil: purged file rbd-etcd.etcd/member/wal/0000000000000015-00000000000cc4d8.wal successfully
2020-06-01 09:54:47.891290 W | etcdserver: alarm NOSPACE raised by peer dd52d8de9e6a3ec2
2020-06-02 05:03:01.433339 W | etcdserver: read-only range request "key:\"/rainbond/endpoint\" range_end:\"/rainbond/endpoinu\" " with result "range_response_count:0 size:6" took too long (218.222717ms) to execute
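The line that identifies the quota problem is "alarm NOSPACE"; the surrounding warnings report slow requests and slow disk syncs. A minimal sketch for scanning the logs for that alarm follows. The helper name has_nospace_alarm is hypothetical. Note that kubectl logs with a label selector returns only the last 10 lines per container unless --tail is set, so the usage line passes --tail=-1.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the log text on stdin
# contains etcd's NOSPACE alarm.
has_nospace_alarm() {
  grep -q 'alarm NOSPACE'
}

# Usage against the live pod; --tail=-1 returns the full log, because
# kubectl logs with a label selector defaults to the last 10 lines.
#   kubectl logs -l name=rbd-etcd -n rbd-system --tail=-1 | has_nospace_alarm \
#     && echo "etcd quota exceeded"
```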

How to change it

Two settings of the etcd StatefulSet need to be changed. Open it for editing:

kubectl edit sts rbd-etcd -n rbd-system
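  1. In the etcd container under spec.template.spec.containers, add the following environment variable (4294967296 bytes = 4 GiB):
        env:
        - name: ETCD_QUOTA_BACKEND_BYTES
          value: "4294967296"
  2. In the same container's volumeMounts, change the mountPath of the data volume to /var/run/etcd, then save and exit:
        volumeMounts:
        - mountPath: /var/run/etcd
          name: data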
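ETCD_QUOTA_BACKEND_BYTES is the environment-variable form of etcd's --quota-backend-bytes flag and takes a plain byte count; 4294967296 is exactly 4 GiB. If you choose a different size, the small sketch below computes the value. The helper name gib_to_bytes is hypothetical; etcd's documentation suggests 8 GiB as the maximum for normal environments and warns at startup above it.

```shell
#!/bin/sh
# Hypothetical helper: convert a size in GiB to the byte count that
# ETCD_QUOTA_BACKEND_BYTES (--quota-backend-bytes) expects.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 4   # prints 4294967296, the value used in this document
gib_to_bytes 8   # prints 8589934592, etcd's suggested upper limit
```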


After saving, check that the etcd pod has restarted and is running normally:

$ kubectl get po -l name=rbd-etcd -n rbd-system
NAME         READY   STATUS    RESTARTS   AGE
rbd-etcd-0   1/1     Running   0          9m31s
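A Running pod does not by itself prove the new quota was picked up. One way to check is to read the container's environment. This sketch assumes the etcd image provides the env command; the helper name quota_env_ok is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the environment listing on stdin sets
# ETCD_QUOTA_BACKEND_BYTES to the expected value (default 4294967296).
quota_env_ok() {
  grep -qx "ETCD_QUOTA_BACKEND_BYTES=${1:-4294967296}"
}

# Usage against the live pod. "rollout status" for a StatefulSet only
# works with the RollingUpdate update strategy.
#   kubectl rollout status statefulset/rbd-etcd -n rbd-system
#   kubectl exec -n rbd-system rbd-etcd-0 -- env | quota_env_ok && echo "quota applied"
```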
  3. Restart the following components:

rbd-eventlog rbd-node rbd-worker rbd-chaos rbd-api

Taking rbd-eventlog as an example, restart it by deleting its pod; Kubernetes then recreates it:

kubectl delete po -l name=rbd-eventlog -n rbd-system
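The same delete command can be applied to all five components in one loop. This sketch assumes each component's pods carry the label name=<component>, as rbd-etcd and rbd-eventlog do in the commands above; confirm with kubectl get po --show-labels -n rbd-system before running it. The function name restart_rbd_components and the RUN switch are hypothetical. By default it only prints the commands; set RUN=1 to execute them.

```shell
#!/bin/sh
# Components listed in this document that must be restarted after the change.
RBD_COMPONENTS="rbd-eventlog rbd-node rbd-worker rbd-chaos rbd-api"

# Print one delete command per component, or execute them when RUN=1.
# ASSUMPTION: every component's pods are labelled name=<component>.
restart_rbd_components() {
  for c in $RBD_COMPONENTS; do
    if [ "${RUN:-0}" = "1" ]; then
      kubectl delete po -l "name=$c" -n rbd-system
    else
      echo "kubectl delete po -l name=$c -n rbd-system"
    fi
  done
}

restart_rbd_components          # dry run: only prints the commands
# RUN=1 restart_rbd_components  # actually deletes the pods
```

Defaulting to a dry run lets you review the exact commands before any pod is deleted.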

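Then build an application on the platform to confirm that the problem is resolved.

This procedure fixes the following problems:

1. In the web console, clicking Build reports a build error:

{"msg":"deploy app error","code":507,"data":{"bean":{},"list":[]},"msg_show":"构建异常"}

2. On the backend, grctl cluster reports the node status as unknown.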