
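What to do when a machine running BlueKing loses power and reboots?

Power-loss reboot, manual reboot, operator error, machine hang: what happens to the BlueKing processes?

0. Prerequisites:

This article builds on "How to install BlueKing's saas-o app bk_nodeman?"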

https://blog.csdn.net/haoding205/article/details/82784686

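1. Introduction:

In the previous article, during a quick BlueKing deployment, the first machine hung while installing the saas-o app bk_nodeman because it ran out of memory, and it had to be rebooted. That calls for a plan: what happens to the BlueKing processes after the reboot?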

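2. How BlueKing recovers after a machine reboot

After the machine has rebooted:

Confirm that the first nameserver in /etc/resolv.conf is 127.0.0.1 and that the options line does not contain rotate.

Check the rebooted machine's crontab for the entry that restarts processes automatically: crontab -l | grep process_watch. Automatic recovery after a reboot relies mainly on crontab.

On the control machine, check the status of all processes: ./bkcec status all. Normally every process should have been brought back up and show RUNNING. If any show EXIT, try starting them manually. For the exact procedure, see the documentation on starting and stopping components.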
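The first check above can be scripted. The following is a minimal sketch, assuming a conventional resolv.conf layout (one "nameserver" entry per line, options on a line starting with "options"). check_resolv_conf is a hypothetical helper name and is not part of BlueKing.

```shell
# Hypothetical helper, not part of BlueKing.
# Usage: check_resolv_conf [path]   (defaults to /etc/resolv.conf)
check_resolv_conf() {
    local file="${1:-/etc/resolv.conf}"
    local first_ns

    # Take the address of the first "nameserver" entry in the file.
    first_ns=$(awk '$1 == "nameserver" { print $2; exit }' "$file")
    if [ "$first_ns" != "127.0.0.1" ]; then
        echo "first nameserver is '${first_ns}', expected 127.0.0.1"
        return 1
    fi

    # Fail if any "options" line lists the rotate option.
    if awk '$1 == "options" { for (i = 2; i <= NF; i++) if ($i == "rotate") found = 1 }
            END { exit !found }' "$file"; then
        echo "options line contains rotate"
        return 1
    fi

    echo "ok"
}
```

Called with no argument, the function inspects /etc/resolv.conf on the current machine. It only reports problems and does not modify the file.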

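If all machines of a Community Edition deployment reboot at the same time, many processes will very likely fail to start. The time each machine needs to recover its components cannot be controlled, so a component may start before the components it depends on are up, fail, and set off a chain reaction. In that case, follow the same startup order as during installation:

Start the databases first
Start the other open-source components and services that BlueKing depends on
Start the BlueKing products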
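The ordering rule above can be sketched as a small wrapper that starts modules one at a time and stops at the first failure, so that a later module is never started on top of a dependency that did not come up. This is only an illustration: start_in_order is a hypothetical helper, and it assumes the control script accepts "start <module>" in the same way as the ./bkcec start commands shown in this article.

```shell
# Hypothetical helper, not part of bkcec.
# Usage: start_in_order <control command> <module> [<module> ...]
# Modules are started strictly in the order given.
start_in_order() {
    local ctl="$1"
    local module
    shift

    for module in "$@"; do
        echo "starting ${module}"
        # Abort on the first failure instead of starting dependent modules.
        if ! "$ctl" start "$module"; then
            echo "failed to start ${module}, stopping here"
            return 1
        fi
    done

    echo "all modules started"
}
```

A call might look like start_in_order ./bkcec mysql redis paas, where the three module names are assumptions for illustration. Use the module names from your own installation, listed with databases first, then dependencies, then BlueKing products.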
           

If SaaS apps have already been deployed, start them manually:

./bkcec start saas-o # production environment
./bkcec start saas-t # test environment
           

3. Rebooting a BlueKing machine in practice

3.1. Confirm that the crontab entry exists on the machine


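3.2. Shut down the machine and add memory

3.3. Start the machine

Wait for the crontab job to bring the application processes back up automatically, then check the result: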
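Instead of re-running the status command by hand while waiting, the check can be polled. The sketch below assumes the status output format shown in this article, where each line starts with a bracketed host address and healthy entries contain RUNNING. wait_all_running is a hypothetical helper, not part of bkcec.

```shell
# Hypothetical helper, not part of bkcec.
# Usage: wait_all_running <attempts> <interval_seconds> <status command...>
# Re-runs the status command until every per-host line reports RUNNING.
wait_all_running() {
    local attempts="$1"
    local interval="$2"
    local i=1
    local output not_running
    shift 2

    while [ "$i" -le "$attempts" ]; do
        output=$("$@" 2>/dev/null) || true
        # Keep only per-host lines that do not report RUNNING.
        not_running=$(printf '%s\n' "$output" | awk '/^\[[0-9.]+\]/ && !/RUNNING/')
        case "$output" in
            *RUNNING*)
                # Succeed only if something is RUNNING and nothing is not.
                if [ -z "$not_running" ]; then
                    echo "all processes RUNNING after ${i} check(s)"
                    return 0
                fi
                ;;
        esac
        sleep "$interval"
        i=$((i + 1))
    done

    echo "still not RUNNING after ${attempts} check(s):"
    echo "$not_running"
    return 1
}
```

For example, wait_all_running 10 30 ./bkcec status all would check up to ten times, thirty seconds apart. Both numbers are arbitrary choices. Empty status output is treated as "not ready", so a failing status command does not count as success.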

[[email protected] install]# ./bkcec status all  # check status
[192.168.1.101] consul: RUNNING
[192.168.1.101] nginx: RUNNING
[192.168.1.101] zk: RUNNING
[192.168.1.101] rabbitmq: RUNNING
[192.168.1.101] paas_agent()    paas_agent                       RUNNING   pid 4670, uptime 0:01:34
[192.168.1.101] nginx: RUNNING
[192.168.1.101] paas_agent(T)    paas_agent                       RUNNING   pid 4670, uptime 0:01:39
[192.168.1.101] nginx: RUNNING
[192.168.1.101] es: RUNNING
[192.168.1.101] kafka: RUNNING
---------------------------------------------------------------------------------------------------------
[192.168.1.101] dataapi     dataapi                          RUNNING   pid 3691, uptime 0:01:55
[192.168.1.101] dataapi     dataapi-celery-1                 RUNNING   pid 3688, uptime 0:01:55
[192.168.1.101] dataapi     dataapi-celery-2                 RUNNING   pid 3689, uptime 0:01:55
[192.168.1.101] dataapi     dataapi-celery-3                 RUNNING   pid 3690, uptime 0:01:55
[192.168.1.101] monitor     collect:collect0                   RUNNING   pid 5689, uptime 0:01:34
[192.168.1.101] monitor     common:logging                     RUNNING   pid 5690, uptime 0:01:34
[192.168.1.101] monitor     common:scheduler                   RUNNING   pid 5692, uptime 0:01:34
[192.168.1.101] monitor     converge:converge0                 RUNNING   pid 5694, uptime 0:01:34
[192.168.1.101] monitor     detect_cron                        RUNNING   pid 5681, uptime 0:01:34
[192.168.1.101] monitor     kernel:cron                        RUNNING   pid 5683, uptime 0:01:34
[192.168.1.101] monitor     kernel:match_alarm0                RUNNING   pid 5685, uptime 0:01:34
[192.168.1.101] monitor     kernel:qos                         RUNNING   pid 5686, uptime 0:01:34
[192.168.1.101] monitor     run_data_access:run_data_access0   RUNNING   pid 5680, uptime 0:01:34
[192.168.1.101] monitor     run_detect_new:run_detect_new0     RUNNING   pid 5679, uptime 0:01:34
[192.168.1.101] monitor     run_poll_alarm:run_poll_alarm0     RUNNING   pid 5678, uptime 0:01:34
[192.168.1.101] databus     databus_es                       RUNNING   pid 4990, uptime 0:01:41
[192.168.1.101] databus     databus_etl                      RUNNING   pid 4992, uptime 0:01:41
[192.168.1.101] databus     databus_jdbc                     RUNNING   pid 4989, uptime 0:01:41
[192.168.1.101] databus     databus_redis                    RUNNING   pid 4993, uptime 0:01:41
[192.168.1.101] databus     databus_tsdb                     RUNNING   pid 4991, uptime 0:01:41
[192.168.1.101] fta         common:apiserver                 RUNNING   pid 3685, uptime 0:02:02
[192.168.1.101] fta         common:jobserver                 RUNNING   pid 3683, uptime 0:02:02
[192.168.1.101] fta         common:logging                   RUNNING   pid 3687, uptime 0:02:02
[192.168.1.101] fta         common:polling0                  RUNNING   pid 3684, uptime 0:02:02
[192.168.1.101] fta         common:qos                       RUNNING   pid 3681, uptime 0:02:02
[192.168.1.101] fta         common:scheduler0                RUNNING   pid 3682, uptime 0:02:02
[192.168.1.101] fta         common:webserver                 RUNNING   pid 3686, uptime 0:02:02
[192.168.1.101] fta         fta:collect0                     RUNNING   pid 3678, uptime 0:02:02
[192.168.1.101] fta         fta:converge0                    RUNNING   pid 3673, uptime 0:02:02
[192.168.1.101] fta         fta:job                          RUNNING   pid 3675, uptime 0:02:02
[192.168.1.101] fta         fta:match_alarm0                 RUNNING   pid 3679, uptime 0:02:02
[192.168.1.101] fta         fta:match_alarm1                 RUNNING   pid 3680, uptime 0:02:02
[192.168.1.101] fta         fta:match_alarm2                 RUNNING   pid 3677, uptime 0:02:02
[192.168.1.101] fta         fta:match_alarm3                 RUNNING   pid 3676, uptime 0:02:02
[192.168.1.101] fta         fta:poll_alarm                   RUNNING   pid 3672, uptime 0:02:02
[192.168.1.101] fta         fta:solution                     RUNNING   pid 3674, uptime 0:02:02

[192.168.1.102] consul: RUNNING
[192.168.1.102] mysqld: RUNNING
[192.168.1.102] mongod: RUNNING
[192.168.1.102] zk: RUNNING
[192.168.1.102] paas_agent()    paas_agent                       RUNNING   pid 3714, uptime 1 day, 0:32:50
[192.168.1.102] nginx: RUNNING
[192.168.1.102] paas_agent(O)    paas_agent                       RUNNING   pid 3714, uptime 1 day, 0:32:52
[192.168.1.102] nginx: RUNNING
[192.168.1.102] es: RUNNING
[192.168.1.102] kafka: RUNNING
[192.168.1.102] beanstalk: RUNNING

[192.168.1.103] consul: RUNNING
[192.168.1.103] license: RUNNING
[192.168.1.103] redis: RUNNING
---------------------------------------------------------------------------------------------------------
[192.168.1.103] open_paas    appengine                        RUNNING   pid 13809, uptime 2 days, 5:03:29
[192.168.1.103] open_paas    esb                              RUNNING   pid 13808, uptime 2 days, 5:03:29
[192.168.1.103] open_paas    login                            RUNNING   pid 13806, uptime 2 days, 5:03:29
[192.168.1.103] open_paas    paas                             RUNNING   pid 13805, uptime 2 days, 5:03:29
[192.168.1.103] gse_alarm: RUNNING
[192.168.1.103] gse_ops: RUNNING
[192.168.1.103] gse_opts: RUNNING
[192.168.1.103] gse_api: RUNNING
[192.168.1.103] gse_btsvr: RUNNING
[192.168.1.103] gse_data: RUNNING
[192.168.1.103] gse_dba: RUNNING
[192.168.1.103] gse_task: RUNNING
[192.168.1.103] cmdb-nginx: RUNNING
[192.168.1.103] server      cmdb_adminserver                 RUNNING   pid 16427, uptime 1 day, 2:01:20
[192.168.1.103] server      cmdb_apiserver                   RUNNING   pid 16416, uptime 1 day, 2:01:20
[192.168.1.103] server      cmdb_auditcontoller              RUNNING   pid 16412, uptime 1 day, 2:01:20
[192.168.1.103] server      cmdb_datacollection              RUNNING   pid 16426, uptime 1 day, 2:01:20
[192.168.1.103] server      cmdb_eventserver                 RUNNING   pid 16417, uptime 1 day, 2:01:20
[192.168.1.103] server      cmdb_hostcontroller              RUNNING   pid 16406, uptime 1 day, 2:01:20
[192.168.1.103] server      cmdb_hostserver                  RUNNING   pid 16407, uptime 1 day, 2:01:20
[192.168.1.103] server      cmdb_objectcontroller            RUNNING   pid 16409, uptime 1 day, 2:01:20
[192.168.1.103] server      cmdb_proccontroller              RUNNING   pid 16428, uptime 1 day, 2:01:20
[192.168.1.103] server      cmdb_procserver                  RUNNING   pid 16411, uptime 1 day, 2:01:20
[192.168.1.103] server      cmdb_toposerver                  RUNNING   pid 16408, uptime 1 day, 2:01:20
[192.168.1.103] server      cmdb_webserver                   RUNNING   pid 16410, uptime 1 day, 2:01:20
[192.168.1.103] zk: RUNNING
[192.168.1.103] job: RUNNING
[192.168.1.103] es: RUNNING
[192.168.1.103] kafka: RUNNING
[192.168.1.103] influxdb: RUNNING
[[email protected] install]#

           

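4. Conclusion

The test succeeded: after the reboot, crontab brought the application processes back up automatically, and the status script shows every process running normally, as expected.

We can now go back to "How to install BlueKing's saas-o app bk_nodeman?"

https://blog.csdn.net/haoding205/article/details/82784686

and continue installing bk_nodeman until it reaches 100% and completes successfully.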

5. Other references

http://docs.bk.tencent.com/bkce_install_guide/maintain.html#migrate_module

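Well, clever as you are, you now know what to do when a machine running BlueKing loses power and reboots. Pleased?

If you have other questions, leave them in the comments, or scan the QR code to add the author for resources or to ask questions.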
