Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

workaround: lock ttl/expiration #20

Closed
wants to merge 2 commits into from
Closed

workaround: lock ttl/expiration #20

wants to merge 2 commits into from

Conversation

mxmCherry
Copy link
Contributor

@mxmCherry mxmCherry commented Sep 30, 2021

Addresses / Solves #13


To work around that, `FLEISS_LOCK_TTL` (seconds) ENV variable can be set. This should be larger than maximum expected job perform time.

Jobs that are started but not finished in `FLEISS_LOCK_TTL` seconds can be picked up by other workers.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better English suggestions are welcome ^^

@@ -1,6 +1,6 @@
Gem::Specification.new do |s|
s.name = 'fleiss'
s.version = '0.4.4'
s.version = '0.4.5'
Copy link
Contributor Author

@mxmCherry mxmCherry Sep 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No incompatible changes. This is a new, optional feature to be explicitly opted-in by setting FLEISS_LOCK_TTL.

@@ -4,6 +4,8 @@ class ActiveRecord
module Concern
extend ActiveSupport::Concern

DEFAULT_LOCK_TTL = ENV['FLEISS_LOCK_TTL']&.to_i # seconds
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No dynamic / official config (per-worker or so).

Considering this a tweak (even hack).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so now I see it, I don't like this. Let's add a proper heartbeart instead. How about:

  1. add a lock_expires_at column - not sure how to deal with the migration of this?
  2. document a def max_run_time callback (like we do with def ttl)
  3. set lock_expires_at when job.max_run_time is set - much like https://github.com/bsm/fleiss/blob/main/lib/fleiss/backend/active_record/concern.rb#L46
  4. fix def pending scope

Copy link
Contributor Author

@mxmCherry mxmCherry Oct 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find suggested solution viable but I don't like it too much (still sound workaround-ish):

  • if you call it max_run_time - that sounds like you imply that job can be killed when this is reached (Time.now > started_at + max_run_time)
  • if you call it proper lock_ttl (which better be done globally, per worker) - this is kinda closer, but also not worth much without a proper heartbeat in a separate thread (maybe one per-worker thread, not per job)

I tend to implement second option (one heartbeat thread per worker).

UPD: looks like https://www.rubydoc.info/gems/concurrent-ruby/Concurrent/TimerTask is the thing for it.

UPD2: actually, it cannot be one per worker, but like one per pooled thread because job "owner" is based on thread ID... How much deeper is the rabbit hole? :)

Btw, migrations are solved by create_table(...) unless table_exists?(...) and add_column(...) unless column_exists?(...).

lib/fleiss/backend/active_record/concern.rb Show resolved Hide resolved
spec/spec_helper.rb Show resolved Hide resolved
@@ -4,6 +4,8 @@ class ActiveRecord
module Concern
extend ActiveSupport::Concern

DEFAULT_LOCK_TTL = ENV['FLEISS_LOCK_TTL']&.to_i # seconds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so now I see it, I don't like this. Let's add a proper heartbeart instead. How about:

  1. add a lock_expires_at column - not sure how to deal with the migration of this?
  2. document a def max_run_time callback (like we do with def ttl)
  3. set lock_expires_at when job.max_run_time is set - much like https://github.com/bsm/fleiss/blob/main/lib/fleiss/backend/active_record/concern.rb#L46
  4. fix def pending scope

.order(priority: :desc)
.order(scheduled_at: :asc)

pending.or(lock_expired(now, DEFAULT_LOCK_TTL))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's not right. if something is e.g. finished or expired, you don't want to include it!!
it should be:

not_finished
  .not_exipred
  .where(arel_table[:scheduled_at].lteq(now))
  .merge(not_started(now).or(lock_expired(now))
  .order(priority: :desc, scheduled_at: :asc)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦

@mxmCherry
Copy link
Contributor Author

I'll close this one - too much to change, better re-submit once ready.

@mxmCherry mxmCherry closed this Oct 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants