Add the load balancer for evenly distributing the tasks across workers #211
base: devel
1. The default balancer is "stepwise" for `bplapply` and "sequential" for `bpiterate`
2. Add two options, `lapplyBalancer` and `iterateBalancer`, to `bpoptions`
3. Use snake case
Can you give some more examples of how the different balancers work? Suppose we have 10 tasks numbered 1 through 10, and there are 3 workers, labeled A, B, and C (deliberately chosen to have a non-integer ratio). Can you show how the tasks will be assigned by sequential and stepwise in these cases? For random, does the balancer ensure that an approximately equal number of tasks are sent to each worker, or does it randomly select a worker for each task independently of other tasks?
Sure, for the sequential balancer, the task dispatching plan is
For the stepwise balancer, it is
The random balancer randomly partitions the tasks into three sets, with cardinalities 3, 3, and 4, respectively.
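For readers following along, the three strategies can be sketched outside of R. The Python snippet below is only an illustration, not the BiocParallel implementation: the function names are hypothetical, and the sequential and random plans assume ceiling-sized contiguous chunks (sizes 4, 4, 2 for ten tasks and three workers, in line with the correction posted later in this thread). The stepwise rule follows the round-robin description in the pull request text.

```python
import math
import random


def sequential(tasks, n_workers):
    """Contiguous chunks of size ceil(n / workers) (assumed chunking rule)."""
    size = math.ceil(len(tasks) / n_workers)
    return [tasks[i * size:(i + 1) * size] for i in range(n_workers)]


def stepwise(tasks, n_workers):
    """Round-robin: task 1 to worker 1, task 2 to worker 2, and so on, then wrap."""
    return [tasks[w::n_workers] for w in range(n_workers)]


def random_balancer(tasks, n_workers, seed=None):
    """Shuffle the tasks, then cut them into the same chunk sizes as `sequential`."""
    shuffled = list(tasks)
    random.Random(seed).shuffle(shuffled)
    return sequential(shuffled, n_workers)


tasks = list(range(1, 11))          # tasks 1..10
plan_seq = sequential(tasks, 3)     # workers A, B, C
plan_step = stepwise(tasks, 3)
plan_rand = random_balancer(tasks, 3, seed=1)

print(plan_seq)                     # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
print(plan_step)                    # [[1, 4, 7, 10], [2, 5, 8], [3, 6, 9]]
print([len(c) for c in plan_rand])  # [4, 4, 2]
```

Under these assumptions the random plan fixes the chunk sizes up front and only randomizes which tasks land in each chunk, rather than picking a worker independently for every task.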
The stepwise balancer performs well in this circumstance because of how the computation scales with task number. But doesn't the 'random' balancer have lower expected evaluation time, inasmuch as we don't know the distribution of task evaluation times?
Yes, the random balancer has the lowest expected evaluation time, but also the highest variance (when you rerun the same apply function many times). If we do not know the task evaluation times in advance, the performance of the stepwise and random balancers should be comparable in most cases. It is more of a tradeoff between expectation and variance. I'm not a fan of randomization, so I chose the stepwise balancer as the default, but I kept the random balancer as an option in case users know that the stepwise balancer will suffer in their apply function.
A couple of points:
Thanks. I suppose that if the distribution of task evaluation times is independent of task order, then really any balancer has the same expected time? I suppose (??) that the next most likely case is that task evaluation times are ordered (from low to high, or high to low), perhaps not intentionally? And then what is the optimal evaluation order? If I had seven tasks 1:7 taking 1:7 seconds, and 4 workers, then assigning worker: tasks as 1: 7; 2: 1, 6; 3: 2, 5; 4: 3, 4 would be optimal. But I don't think any of the balancers satisfies that?
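To make the seven-task example concrete, here is a small Python sketch (an illustration only, not package code) that computes the finishing time of the slowest worker for the hand-made assignment above and for two other plans. The `round_robin` and `contiguous` plans are assumptions about how a stepwise-style and a sequential-style split would look for 7 tasks and 4 workers; the real balancers may chunk differently.

```python
def makespan(plan, durations):
    """Total time of the busiest worker, given a plan of task ids per worker."""
    return max(sum(durations[t] for t in chunk) for chunk in plan)


# Task t takes t seconds, for t = 1..7.
durations = {t: t for t in range(1, 8)}

hand_plan = [[7], [1, 6], [2, 5], [3, 4]]    # assignment proposed in the comment
round_robin = [[1, 5], [2, 6], [3, 7], [4]]  # assumed stepwise-style split
contiguous = [[1, 2], [3, 4], [5, 6], [7]]   # assumed sequential-style split

span_hand = makespan(hand_plan, durations)
span_rr = makespan(round_robin, durations)
span_contig = makespan(contiguous, durations)

print(span_hand, span_rr, span_contig)  # 7 10 11
```

The total work is 28 seconds, so with 4 workers no plan can finish in under 7 seconds; the hand-made assignment reaches that bound.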
In my point 3 above, I'm assuming each task takes an equal amount of time. The reason for wanting to assign different numbers of tasks to each worker is that each worker takes so long to get started that by the time worker C has started, worker A has already been running long enough to run 4 or 5 tasks. For example, imagine that starting a worker takes 1 minute and each task takes 30 seconds to run.
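The startup scenario can be simulated with a short Python sketch. It rests on assumptions that are not stated in the thread: workers are started one after another, one minute apart (A ready at 60 s, B at 120 s, C at 180 s), and each 30-second task is handed to whichever worker becomes free first. The function name is hypothetical.

```python
import heapq


def greedy_counts(n_tasks, ready_times, task_time):
    """Give each task to the worker that is free earliest; return counts and end time."""
    # Heap entries are (time the worker is next free, worker index).
    heap = [(t, i) for i, t in enumerate(ready_times)]
    heapq.heapify(heap)
    counts = [0] * len(ready_times)
    finish = 0
    for _ in range(n_tasks):
        free_at, w = heapq.heappop(heap)
        done = free_at + task_time
        counts[w] += 1
        finish = max(finish, done)
        heapq.heappush(heap, (done, w))
    return counts, finish


# Assumed staggered startup: A, B, C become ready at 60, 120, 180 seconds.
counts, finish = greedy_counts(n_tasks=10, ready_times=[60, 120, 180], task_time=30)
print(counts, finish)  # [6, 3, 1] 240
```

Under these assumptions worker A completes four tasks before worker C is even ready, which is why an equal split is not necessarily the natural choice when startup is slow.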
Hello @DarwinAwardWinner , for your comments
During the parallel evaluation, the function Once you have defined
This can let |
Regarding @mtmorgan's comment, I think that if we know the task evaluation times in advance, we can provide a customized balancer to reach the optimal performance. It is not very hard to implement. I plan to add a vignette that gives a formal introduction to the balancer, along with the other advanced features we have added recently.
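As a rough idea of what such a customized balancer could do when task times are known, the Python sketch below uses the longest-processing-time-first (LPT) greedy heuristic. This is a swapped-in, well-known scheduling heuristic, not something proposed in this pull request; `lpt_plan` is a hypothetical name, and LPT is not guaranteed to be optimal in general.

```python
import heapq


def lpt_plan(durations, n_workers):
    """Assign tasks, longest first, to the worker with the smallest load so far."""
    heap = [(0, w) for w in range(n_workers)]  # (current load, worker index)
    heapq.heapify(heap)
    plan = [[] for _ in range(n_workers)]
    for task in sorted(durations, key=durations.get, reverse=True):
        load, w = heapq.heappop(heap)
        plan[w].append(task)
        heapq.heappush(heap, (load + durations[task], w))
    return plan


# Seven tasks taking 1..7 seconds, four workers (the example from the thread).
durations = {t: t for t in range(1, 8)}
plan = lpt_plan(durations, 4)
loads = [sum(durations[t] for t in chunk) for chunk in plan]

print(plan)   # [[7], [6, 1], [5, 2], [4, 3]]
print(loads)  # [7, 7, 7, 7]
```

For this particular input the heuristic reproduces the balanced assignment described earlier in the conversation, with every worker busy for 7 seconds.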
I made a mistake in my previous comment. If we have 10 tasks and 3 workers, the actual task sizes are 4, 4, 2. I think this is better than 3, 3, 4: the former has most workers take on more tasks, while the latter gives all the extra tasks to a single worker (imagine we have 109 tasks and 10 workers: one worker would get 19 tasks). There is no need to update the commit.
Hello Martin, I wonder if you can merge this pull request. It looks like we have some new feature requests these days.
This pull request enables the load balancer in the apply function.
There are three built-in balancers for `bplapply`, namely "sequential", "stepwise", and "random". The sequential balancer is the one used in the master branch. However, I changed the default balancer to the stepwise balancer in this branch. The stepwise balancer sends the 1st element of `X` to the 1st worker, the 2nd to the 2nd worker, and so on down to the last worker. Then it starts again, sending the next element of `X` to the 1st worker, and so on. The cost of the stepwise balancer is marginal, and its performance is better than that of the sequential balancer. Here is an example