-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(legacy-refunds): support going the spending path on refund #2280
base: dev
Are you sure you want to change the base?
Conversation
modularize these funcs and introdcue a new error type to filter errors and possible retrys. tests are not yet adapted.
not finished and on successful checking swaps are removed. these tests were used to check that we error on recover funds when: - swap is not finished yet: there is not reason not to try recover funds still - swap was successful (determined the existance of taker/maker payment spend): we should still attempt to recover funds in this case as the tx might be re-orged or some weird thing happned. there is no downside of retrying. also test_recover_funds_maker_swap_maker_payment_refunded was used to test that we check local data to determine that the swap failed. again this isn't the followed approach here. even if we store maker_payment_refund tx locally (so we should have sent it when the swap was running) we will still attempt to run recover funds and look for the tx on-chain. this test could have been adapted but then it looks exactly like test_recover_funds_maker_payment_refund_already_refunded, so there is no point of that.
namely, we do try to refund (maker payment) or spend (taker payment), which ever is possible. we might want to change refund_maker_payment name to recover or something
same as what's done with the maker. thought much more useful here because on the maker side we can't really miss spending the taker's payment while getting our own payment spent :p
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, looks cleaner. My only note is regarding error handling.
@mariocynicys could you please fix PR lint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this fix, here my notes
sorry, but amending this would break the links i just shared :). thanks for your undertanding, reader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. First review iteration!
A few suggestions/discussion points:
- Maybe we should add start/stop swap rpc for @cipig to use to stop endlessly failing swaps until we fix the bugs that lead to this.
- We will need this for TPU, please add it to an issue checklist.
.maker_coin | ||
.can_refund_htlc(maker_payment_lock) | ||
.await | ||
.map_err(RecoverSwapError::Irrecoverable)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can_refund_htlc
can sometimes return a recoverable error, please check utxo implementation for this ref.
let mtp = coin.get_current_mtp().await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Roll back to confirming the maker payment spend. | ||
RecoveredSwapAction::SpentOtherPayment => { | ||
info!("Refund canceled. Maker payment spend tx {:02x}", tx_ident.tx_hash); | ||
// TODO: We prepared for refund but didn't finalize refund. This must be breaking something for lightning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we ever hit this code in lightning? I guess it can happen for taker swap code but it's not a big problem anyways since the finalize refund is just to allow the maker get his payment back instantly instead of waiting for the timelock to expire, and if we spend it, it means it can't be failed backwards. For maker swap code it's a different case, maker fails the htlc backwards before this step, so we will never hit this code.
Usually such things in code is a cue for a needed refactor but not in this PR of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like this is only relevant for lightning receiving maker. and yeah if the maker fails the HTLC then we can't hit SpentOtherPayment
.
removed the todo 70afcde
// FIXME: We should try to `recover_funds` again after locktime. The taker payment might have been | ||
// spent by the maker at this point and we should go for spending the maker payment instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, wait_for_htlc_refund
for lightning doesn't depend on locktime, but this is a general code and we should handle this case. For the lightning case, if we got payment successful from here
komodo-defi-framework/mm2src/coins/lightning.rs
Lines 824 to 828 in f401497
_ => Ready(MmError::err(RefundError::Internal(ERRL!( | |
"Payment {} has an invalid status of {} in the db", | |
payment_hex, | |
payment.status | |
)))), |
It means that we should be spending the maker payment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think that makes sense in the general sense. if the wait_for_htlc_refund
fails we can try to recover again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is one case that comes to my mind where it would be bad to retry endlessly: if the initial tx was reverted because of too low gas (so basically a misconfiguration from our side)... if you retry that tx, you will spend gas again, but the result will be the same if you haven't increased gas limits... so you loose the txfees every time you try
idk if we want/should account for this case, but thought i mention it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that's a good reason to have a start/stop
rpc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A start/stop rpc doesn't solve this, as too much gas might have been already spent until the user or @cipig is aware of this, this could drain the user's wallet which is really bad. We need to mark swap as failed for this and this is one of the situations where we will still require manual recover funds usage. @cipig are you aware of other situations like this one, we shouldn't release this PR/fix unless we are sure it's completely safe to retry forever and all edge cases are covered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is this #1567, but there you don't loose any fees when you retry forever
i guess reverted "out of gas" txes on EVM chains are the only problem where you loose money if you retry forever
there may be other reasons why a EVM chain reverts your tx, but i haven't encountered any other that would always fail
so short answer is "no", just this case :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
covered here: a28c768
for the gas issue with reverted txs: what about retrying the refund in an exponential backoff manner? this greatly reduces the amount of failed recoveries and doesn't hurt the perf/speed that much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly speaking, 'reverted' means that tx was cancelled due to errors during the contract execution (for e.g. locktime has not been passed yet), so the tx may be retried yet.
A tx can be cancelled due to out of gas. To prevent retries in this case, I think, we may analyse the tx receipt in eth code and return some unrecoverable error to the swap code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A tx can be cancelled due to out of gas. To prevent retries in this case, I think, we may analyse the tx receipt in eth code and return some unrecoverable error to the swap code.
gas spikes on Ethereum are typically temporary nope? and a transaction being "out of gas" is not necessarily a permanent failure.
I would suggest to mark swap failed in this case and stop the process, but allow user to retry spend-or-refun process later and start it manually. And also allow to stop if it takes too long time for them. If retry faced "out of gas" again, then mark as failed and stop process.
I suppose you're referencing to this issue #1895 ? We have lots of lists, so I just want to clarify. I also added eth coin todos here. |
Moved target for this to 2.4.0-beta release as there will not be enough time to test it for next release. |
@@ -1375,13 +1364,13 @@ impl MakerSwap { | |||
Ok((swap, command)) | |||
} | |||
|
|||
pub async fn recover_funds(&self) -> Result<RecoveredSwap, String> { | |||
async fn try_spend_taker_payment(selfi: &MakerSwap, secret_hash: &[u8]) -> Result<TransactionEnum, String> { | |||
pub async fn recover_funds(&self) -> MmResult<RecoveredSwap, RecoverSwapError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a place which calls recover_funds
taker and maker methods wrapped in try_s
macro
komodo-defi-framework/mm2src/mm2_main/src/lp_swap/saved_swap.rs
Lines 111 to 120 in 10e4192
match self { | |
SavedSwap::Maker(saved) => { | |
let (maker_swap, _) = try_s!(MakerSwap::load_from_saved(ctx, maker_coin, taker_coin, saved)); | |
Ok(try_s!(maker_swap.recover_funds().await)) | |
}, | |
SavedSwap::Taker(saved) => { | |
let (taker_swap, _) = try_s!(TakerSwap::load_from_saved(ctx, maker_coin, taker_coin, saved).await); | |
Ok(try_s!(taker_swap.recover_funds().await)) | |
}, | |
} |
as try_s!
appends error path + MmErr also contains error path, Im afraid there will be unreadable result error message with error path duplications
Ok(try_s!(maker_swap.recover_funds().await))
Ok(try_s!(taker_swap.recover_funds().await))
I would suggest to extract MmErr inner value and put it into ERR!
in SavedSwap.recover_funds()
.
Or return MmResult
in recover_funds
from SavedSwap
and in RPC method fn recover_funds_of_swap
, as recover_funds
from SavedSwap
is used inly in recover_funds_of_swap
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ummmm, i imagined we would have both paths as in kind of nesting, which is good (and was the behaviour before this PR anyways).
i will try to trigger this and see if it looks complicated then i'll simplify it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
upd: its non blocking and not urgent note. I suppose we have other places which already have same issue. I think it will be fixed in refactored kdf repo.
this field was convereted to a hashmap instead of a vector for easy access to the swap. also we now manually delete the swap from running swaps when the swap is finished/inturepted (memory leak fix). as a consequence to manuallly deleting the swap from running_swaps, we can now store them as arcs instead of weakrefs, which simplifies a lot of .upgrade calls.
10e4192
to
245ea93
Compare
This reverts commit 5c1bb6f.
))); | ||
}, | ||
}; | ||
let taker_coin = match swap.taker_coin_ticker() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code repetition for finding maker_coin and taker_coin.
(BTW could we just use lp_coinfind_or_err in all such cases?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks. i didn't like it either but needed some assertion xD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reverts commit 245ea93.
@@ -516,7 +532,7 @@ struct LockedAmountInfo { | |||
} | |||
|
|||
struct SwapsContext { | |||
running_swaps: Mutex<Vec<Weak<dyn AtomicSwap>>>, | |||
running_swaps: Mutex<Vec<(Weak<dyn AtomicSwap>, AbortOnDropHandle)>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you tell why do we need this change?
Is it related to some review note? may be I missed smth
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nah actually, not discussed in review. @shamardy just told me we need the swaps to be stoppable via rpc (since we now do a run forever recovery), that's why we record their abort handles to be able to stop them mid-recover (or even mid-swap).
}; | ||
// Run the swap in an abortable task and wait for it to finish. | ||
let (swap_ended_notifier, swap_ended_notification) = oneshot::channel(); | ||
let abortable_swap = spawn_abortable(async move { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we create an abortable future to run the swap loop (for using this in the new stop rpc I believe).
But run_taker_swap fn, where we are now, is also wrapped in an abortable future. Could we use this one in the stop rpc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but how will we grab that abort handle and supply it to run_taker_swap
in the first place?
also kickstart_thread_handler
uses run_taker_swap
inline (without spawning it), well eventually if u follow the usage ofc u will find kickstart_thread_handler
is itself being spawned somewhere. but it's hard to propagate these handles all the way down to the swap runner methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a bit odd when we use different futures to stop the same swap in different use cases (abort_all and stop_swap_rpc). Maybe we could get the handle for a future from the abortable_system for stop_swap_rpc, so we would use same abort technique in both cases. Maybe some modification for AbortableQueue is needed for that
test_mm2_stops_immediately seems not working btw. I guess clear_running_swaps() is needed here too? |
yup. I'll just wait till we finish reviewing the mem leak PR and merge it and then merge this with dev. |
_touch = touch_loop => unreachable!("Touch loop can not stop!"), | ||
}; | ||
// Run the swap in an abortable task and wait for it to finish. | ||
let (swap_ended_notifier, swap_ended_notification) = oneshot::channel(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we simplify this code to eliminate oneshot::channel?
- let (swap_ended_notifier, swap_ended_notification) = oneshot::channel();
- let abortable_swap = spawn_abortable(async move {
+ let fut_with_touch = async move {
select! {
_swap = swap_fut => (), // swap finished normally
_touch = touch_loop => unreachable!("Touch loop can not stop!"),
- }
- if swap_ended_notifier.send(()).is_err() {
- error!("Swap listener stopped listening!");
- }
- });
- swap_ctx.running_swaps.lock().unwrap().push((weak_ref, abortable_swap));
- // Halt this function until the swap has finished (or interrupted, i.e. aborted/panic).
- swap_ended_notification.await.error_log_with_msg("Swap interrupted!");
+ };
+ };
+ let (abortable, handle) = abortable(fut_with_touch);
+ swap_ctx.running_swaps.lock().unwrap().push((weak_ref, handle.into()));
+ if let Err(Aborted) = abortable.await {
+ info!("Swap uuid={} interrupted!", uuid);
+ }
@mariocynicys pr started to have conflicts |
This PR attempts to follow a more robust recovery process by attempting both refunding OR spending if the swap goes the ugly way.
This is done by replacing the refund-only logic with refund-or-spend-logic (recovery,
recover_funds
).Also
recover_funds
has been adapted to return more structured error types for information about retrials and such. Also removing some pre-checks that makes us fail early based on local data (e.g. is_swap_finished), as we should still query the rpc to make sure our recovery tx isn't lost or something.