Address in Use on Worker Restart? #121

Open
jeffv opened this issue Jul 15, 2011 · 6 comments

Comments

@jeffv

jeffv commented Jul 15, 2011

This only happens once or twice per month and has been very difficult to debug, and nigh on impossible to replicate. Seeing as the issue seems to originate deep inside cluster, I thought I'd ping you and get your opinion.

What seems to be happening based on the log output is that cluster is attempting to restart the master process, and I'm not sure why.

RELEVANT CODE:
var proc = cluster(app).set('workers', numberOfWorkers).listen(port);

if (!proc.isWorker) {
  console.log(
    ':::',
    ' Redman Mobile '.bold,
    mvc.version.magenta,
    ', loading on port ',
    String(port).yellow,
    ' with ',
    String(numberOfWorkers).magenta,
    ' worker(s):::'
  );
  require('./lib/staticBuilder.js');
  require('./lib/mobileServer/tasks.js');
}

ERROR LOG:

::: Redman Mobile 1.1.0 , loading on port 9296 with 2 worker(s):::

node.js:134
throw e; // process.nextTick error, or 'error' event on first tick
^
Error: EADDRINUSE, Address already in use
at listen (/www/node/redman-mobile/node_modules/cluster/lib/master.js:791:17)
at Master.startListening (/www/node/redman-mobile/node_modules/cluster/lib/master.js:784:5)
at Master.<anonymous> (/www/node/redman-mobile/node_modules/cluster/lib/master.js:294:12)
at Master.emit (events.js:81:20)
at Server.<anonymous> (/www/node/redman-mobile/node_modules/cluster/lib/master.js:326:10)
at Server.emit (events.js:61:17)
at Server._startWatcher (net.js:1088:8)
at Array.<anonymous> (net.js:1121:10)
at EventEmitter._tickCallback (node.js:126:26)

@tj
Contributor

tj commented Jul 15, 2011

hmmmm only the original master should be bind()ing

@jeffv
Author

jeffv commented Jul 15, 2011

That's what I thought. If the master dies it should just die, right?

For some reason I'm getting the echo that's only intended to show when the process isn't a worker. It just repeats ad nauseam in our logs, which aren't very good. The whole thing doesn't go down either: at least one thread is still running, even though the others are returning a 500-class error (I forget exactly which).

I've turned on cluster logging in hopes it'll provide more clues in the meantime.

@tj
Contributor

tj commented Jul 15, 2011

yeah if master is killed by anything but SIGKILL (obviously) the rest should crumble
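
One way to make that "crumble" explicit (a hypothetical guard, not code from this thread or from the cluster module itself) is for the master to trap catchable termination signals and take its workers down before exiting, so no orphaned child keeps the port bound:

```javascript
// Hypothetical master-side guard; `workers` would hold the
// child_process handles the master spawned.
var workers = [];

function shutdown() {
  workers.forEach(function (w) { w.kill('SIGTERM'); });
  process.exit(0);
}

// SIGKILL cannot be trapped (hence "obviously" above), but the
// ordinary termination signals can.
['SIGTERM', 'SIGINT'].forEach(function (sig) {
  process.on(sig, shutdown);
});
```
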

@tj
Contributor

tj commented Jul 15, 2011

hmm yeah they don't seem to be, i'll look into it

@tj
Contributor

tj commented Jul 15, 2011

ff941d5 should help

@jeffv
Author

jeffv commented Jul 18, 2011

Yah. That certainly looks promising. I'm not in a position to deploy it just yet to test it, unfortunately. Still trying to deal with our own uncaught exceptions that are triggering the situation in the first place.

Here's the log output from a crash (in case it gives you any other fix ideas):
[Sun, 17 Jul 2011 20:36:36 GMT] INFO master started
[Sun, 17 Jul 2011 20:36:36 GMT] INFO master started
[Sun, 17 Jul 2011 20:36:37 GMT] INFO master started
[Sun, 17 Jul 2011 20:36:37 GMT] INFO master started
[Sun, 17 Jul 2011 20:36:38 GMT] INFO master started
[Sun, 17 Jul 2011 20:36:38 GMT] INFO master started
[Sun, 17 Jul 2011 20:36:39 GMT] INFO master started
[Sun, 17 Jul 2011 20:36:39 GMT] INFO master started
[Sun, 17 Jul 2011 20:36:39 GMT] INFO master started
[Sun, 17 Jul 2011 20:36:40 GMT] INFO master started
[Sun, 17 Jul 2011 20:36:40 GMT] INFO master started
[Sun, 17 Jul 2011 20:37:17 GMT] INFO master started
[Sun, 17 Jul 2011 20:37:17 GMT] INFO spawned worker 0
[Sun, 17 Jul 2011 20:37:17 GMT] INFO spawned worker 1
[Sun, 17 Jul 2011 20:37:47 GMT] WARNING shutting down master
[Sun, 17 Jul 2011 20:37:53 GMT] INFO master started
[Sun, 17 Jul 2011 20:37:53 GMT] INFO spawned worker 0
[Sun, 17 Jul 2011 20:37:53 GMT] INFO spawned worker 1
[Sun, 17 Jul 2011 20:38:21 GMT] ERROR worker 0 died
[Sun, 17 Jul 2011 20:38:21 GMT] INFO spawned worker 0
[Sun, 17 Jul 2011 20:38:25 GMT] ERROR worker 1 died
[Sun, 17 Jul 2011 20:38:25 GMT] INFO spawned worker 1
[Sun, 17 Jul 2011 20:39:00 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:00 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:01 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:01 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:02 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:02 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:02 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:03 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:03 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:04 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:04 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:05 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:05 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:06 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:06 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:07 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:07 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:07 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:08 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:08 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:39 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:39 GMT] INFO spawned worker 0
[Sun, 17 Jul 2011 20:39:39 GMT] INFO spawned worker 1

Nothing before, and nothing after. At 20:39:39 GMT the Monit daemon detected the crash and restarted the app.
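
Since the trigger here is an uncaught exception, one mitigation (a sketch of a common pattern, not something prescribed anywhere in this thread) is a last-resort handler that logs the failure and exits deliberately, leaving the restart to the supervisor, Monit in this case:

```javascript
// Last-resort handler: log the uncaught exception and exit with a
// failure code so the supervisor (Monit here) sees a clean crash and
// restarts the app, instead of the process dying mid-bind.
process.on('uncaughtException', function (err) {
  console.error('uncaught exception:', err.stack || err);
  process.exit(1);
});
```
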
