Address in Use on Worker Restart? #121

Open
jeffv opened this issue Jul 15, 2011 · 6 comments

Comments

@jeffv

jeffv commented Jul 15, 2011

This only happens once or twice per month and has been very difficult to debug, and nigh on impossible to replicate. Seeing as the issue seems to originate deep inside cluster, I thought I'd ping you and get your opinion.

What seems to be happening based on the log output is that cluster is attempting to restart the master process, and I'm not sure why.

RELEVANT CODE:
var proc = cluster(app).set('workers', numberOfWorkers).listen(port);

if (!proc.isWorker) {
  console.log(
    ':::',
    ' Redman Mobile '.bold,
    mvc.version.magenta,
    ', loading on port ',
    String(port).yellow,
    ' with ',
    String(numberOfWorkers).magenta,
    ' worker(s):::'
  );
  require('./lib/staticBuilder.js');
  require('./lib/mobileServer/tasks.js');
}

ERROR LOG:

::: Redman Mobile 1.1.0 , loading on port 9296 with 2 worker(s):::

node.js:134
throw e; // process.nextTick error, or 'error' event on first tick
^
Error: EADDRINUSE, Address already in use
at listen (/www/node/redman-mobile/node_modules/cluster/lib/master.js:791:17)
at Master.startListening (/www/node/redman-mobile/node_modules/cluster/lib/master.js:784:5)
at Master.<anonymous> (/www/node/redman-mobile/node_modules/cluster/lib/master.js:294:12)
at Master.emit (events.js:81:20)
at Server.<anonymous> (/www/node/redman-mobile/node_modules/cluster/lib/master.js:326:10)
at Server.emit (events.js:61:17)
at Server._startWatcher (net.js:1088:8)
at Array.<anonymous> (net.js:1121:10)
at EventEmitter._tickCallback (node.js:126:26)

@tj
Contributor

tj commented Jul 15, 2011

hmmmm only the original master should be bind()ing

@jeffv
Author

jeffv commented Jul 15, 2011

That's what I thought. If the master dies it should just die, right?

For some reason I'm getting the echo that's only intended to show when the process isn't a worker. It just repeats ad nauseam in our logs, which aren't very good. The whole thing doesn't go down either: at least one thread is still running, even though the others are returning a 500-class error (I forget exactly which).

I've turned on cluster logging in hopes it'll provide more clues in the meantime.

@tj
Contributor

tj commented Jul 15, 2011

yeah if master is killed by anything but SIGKILL (obviously) the rest should crumble
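
One way to make that "crumble" explicit (a hypothetical guard, not code from this thread or from the cluster module itself) is for the master to trap catchable termination signals and take its workers down before exiting, so no orphaned child keeps the port bound:

```javascript
// Hypothetical master-side guard; `workers` would hold the
// child_process handles the master spawned.
var workers = [];

function shutdown() {
  workers.forEach(function (w) { w.kill('SIGTERM'); });
  process.exit(0);
}

// SIGKILL cannot be trapped (hence "obviously" above), but the
// ordinary termination signals can.
['SIGTERM', 'SIGINT'].forEach(function (sig) {
  process.on(sig, shutdown);
});
```
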

@tj
Contributor

tj commented Jul 15, 2011

hmm yeah they don't seem to be, i'll look into it

@tj
Contributor

tj commented Jul 15, 2011

ff941d5 should help

@jeffv
Author

jeffv commented Jul 18, 2011

Yah. That certainly looks promising. I'm not in a position to deploy it just yet to test it, unfortunately. Still trying to deal with our own uncaught exceptions that are triggering the situation in the first place.

Here's the log output from a crash (in case it gives you any other fix ideas):
[Sun, 17 Jul 2011 20:36:36 GMT] INFO master started
[Sun, 17 Jul 2011 20:36:36 GMT] INFO master started
[Sun, 17 Jul 2011 20:36:37 GMT] INFO master started
[Sun, 17 Jul 2011 20:36:37 GMT] INFO master started
[Sun, 17 Jul 2011 20:36:38 GMT] INFO master started
[Sun, 17 Jul 2011 20:36:38 GMT] INFO master started
[Sun, 17 Jul 2011 20:36:39 GMT] INFO master started
[Sun, 17 Jul 2011 20:36:39 GMT] INFO master started
[Sun, 17 Jul 2011 20:36:39 GMT] INFO master started
[Sun, 17 Jul 2011 20:36:40 GMT] INFO master started
[Sun, 17 Jul 2011 20:36:40 GMT] INFO master started
[Sun, 17 Jul 2011 20:37:17 GMT] INFO master started
[Sun, 17 Jul 2011 20:37:17 GMT] INFO spawned worker 0
[Sun, 17 Jul 2011 20:37:17 GMT] INFO spawned worker 1
[Sun, 17 Jul 2011 20:37:47 GMT] WARNING shutting down master
[Sun, 17 Jul 2011 20:37:53 GMT] INFO master started
[Sun, 17 Jul 2011 20:37:53 GMT] INFO spawned worker 0
[Sun, 17 Jul 2011 20:37:53 GMT] INFO spawned worker 1
[Sun, 17 Jul 2011 20:38:21 GMT] ERROR worker 0 died
[Sun, 17 Jul 2011 20:38:21 GMT] INFO spawned worker 0
[Sun, 17 Jul 2011 20:38:25 GMT] ERROR worker 1 died
[Sun, 17 Jul 2011 20:38:25 GMT] INFO spawned worker 1
[Sun, 17 Jul 2011 20:39:00 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:00 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:01 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:01 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:02 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:02 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:02 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:03 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:03 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:04 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:04 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:05 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:05 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:06 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:06 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:07 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:07 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:07 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:08 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:08 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:39 GMT] INFO master started
[Sun, 17 Jul 2011 20:39:39 GMT] INFO spawned worker 0
[Sun, 17 Jul 2011 20:39:39 GMT] INFO spawned worker 1

Nothing before, and nothing after. At 20:39:39 GMT the Monit daemon detected the crash and restarted the app.
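
Since the trigger here is an uncaught exception, one mitigation (a sketch of a common pattern, not something prescribed anywhere in this thread) is a last-resort handler that logs the failure and exits deliberately, leaving the restart to the supervisor, Monit in this case:

```javascript
// Last-resort handler: log the uncaught exception and exit with a
// failure code so the supervisor (Monit here) sees a clean crash and
// restarts the app, instead of the process dying mid-bind.
process.on('uncaughtException', function (err) {
  console.error('uncaught exception:', err.stack || err);
  process.exit(1);
});
```
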
