lain ps reports an inaccurate healthy status #64

Open
cloudfly opened this issue Jun 15, 2016 · 0 comments
@cloudfly
Contributor

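Sometimes, when a container dies on its own while running, lain ps cannot immediately report the container's correct status.

deployd's internal logic is not strict enough, so it does not learn about the container's state change right away. The sequence is as follows: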

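  1. deployd starts the container successfully and considers it healthy.
  2. The container dies, but lain ps still shows it as healthy.
  3. 90s later, deployd runs its periodic inspection, finds the container dead, and tries to restart it. The restart succeeds, so the container is still marked healthy.
  4. The container dies again, and lain ps still shows it as healthy.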
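
The sequence above can be illustrated with a small model. This is only a sketch of the behaviour described in this issue, not deployd's actual code: `Container`, `CachedStatus`, and their methods are hypothetical names, and the assumption is that the reported status is a cached flag that is refreshed only on start and on inspection.

```python
from dataclasses import dataclass


@dataclass
class Container:
    # Hypothetical stand-in for a real container; only tracks liveness.
    running: bool = False


class CachedStatus:
    """Hypothetical model: the flag changes only on start and on inspection."""

    def __init__(self, container):
        self.container = container
        self.healthy = False

    def start(self):
        # Step 1: a successful start is recorded as healthy.
        self.container.running = True
        self.healthy = True

    def inspect(self):
        # Step 3: the periodic inspection restarts a dead container and,
        # because the restart succeeds, records it as healthy again.
        if not self.container.running:
            self.container.running = True
        self.healthy = True

    def ps(self):
        # What `lain ps` would show: the cached flag, not the live state.
        return "healthy" if self.healthy else "unhealthy"


c = Container()
status = CachedStatus(c)
status.start()
c.running = False                # step 2: the container dies on its own
print(status.ps(), c.running)    # cached flag vs. live state
status.inspect()                 # step 3: inspection restarts it
c.running = False                # step 4: the container dies again
print(status.ps(), c.running)
```

In this model the cached flag never leaves "healthy", even though the container is dead between inspections.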

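Interim improvements:

  1. Improve deployd's inspection logic: shorten the inspection interval, and after 3 failed attempts to keep the container up by restarting it, mark the container as unhealthy and stop managing it.
  2. Whenever deployd handles an HTTP request, sync the latest state from swarm (the frequency must be limited, e.g. at most one sync per second). This only syncs state information and does not restart containers.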
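
A minimal sketch of how the two improvements could fit together, under stated assumptions. `ContainerTracker`, `fetch_running`, and `restart` are hypothetical names and not part of deployd or swarm. The sketch also assumes the restart counter is never reset, which the issue does not specify.

```python
import time

MAX_RESTARTS = 3         # from the issue: give up after 3 restart attempts
MIN_SYNC_INTERVAL = 1.0  # from the issue: sync at most once per second


class ContainerTracker:
    def __init__(self, fetch_running, restart, clock=time.monotonic):
        # fetch_running: hypothetical callable, True if swarm reports the
        # container as running. restart: hypothetical callable that tries to
        # restart the container and returns True on success.
        self.fetch_running = fetch_running
        self.restart = restart
        self.clock = clock
        self.restarts = 0
        self.abandoned = False
        self.running = True
        self.last_sync = None

    def inspect(self):
        """Periodic inspection: the only code path that restarts containers."""
        if self.abandoned:
            return
        self.running = self.fetch_running()
        self.last_sync = self.clock()
        if self.running:
            return
        if self.restarts >= MAX_RESTARTS:
            # Restart budget is used up: mark unhealthy and stop managing it.
            self.abandoned = True
            return
        self.restarts += 1
        self.running = self.restart()

    def status(self):
        """Called while handling an HTTP request: syncs state, never restarts."""
        now = self.clock()
        if self.last_sync is None or now - self.last_sync >= MIN_SYNC_INTERVAL:
            self.running = self.fetch_running()
            self.last_sync = now
        healthy = self.running and not self.abandoned
        return "healthy" if healthy else "unhealthy"
```

Keeping restarts out of `status()` means a burst of `lain ps` requests can refresh the displayed state but cannot trigger restart storms. The injectable `clock` is there only to make the rate limit easy to test.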
@cloudfly cloudfly added this to the LAIN 2.1.0 milestone Jun 15, 2016
@cloudfly cloudfly added the bug label Jun 15, 2016
@fossilet fossilet added the ready label May 25, 2017
@fossilet fossilet modified the milestone: LAIN 2.1.0 Jul 6, 2017