Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(legacy-refunds): support going the spending path on refund #2280

Open
wants to merge 19 commits into
base: dev
Choose a base branch
from

Conversation

mariocynicys
Copy link
Collaborator

This PR attempts to follow a more robust recovery process by attempting both refunding OR spending if the swap goes the ugly way.
This is done by replacing the refund-only logic with refund-or-spend-logic (recovery, recover_funds).

Also recover_funds has been adapted to return more structured error types for information about retrials and such. Also removing some pre-checks that makes us fail early based on local data (e.g. is_swap_finished), as we should still query the rpc to make sure our recovery tx isn't lost or something.

modularize these funcs and introdcue a new error type to filter errors and possible retrys.
tests are not yet adapted.
not finished and on successful checking swaps are removed. these tests were used to check that we error on recover funds when:
- swap is not finished yet: there is not reason not to try recover funds still
- swap was successful (determined the existance of taker/maker payment spend): we should still attempt to recover funds in this case as the tx might be re-orged or some weird thing happned. there is no downside of retrying.

also test_recover_funds_maker_swap_maker_payment_refunded was used to test that we check local data to determine that the swap failed. again this isn't the followed approach here. even if we store maker_payment_refund tx locally (so we should have sent it when the swap was running) we will still attempt to run recover funds and look for the tx on-chain.
this test could have been adapted but then it looks exactly like test_recover_funds_maker_payment_refund_already_refunded, so there is no point of that.
namely, we do try to refund (maker payment) or spend (taker payment), which ever is possible.
we might want to change refund_maker_payment name to recover or something
same as what's done with the maker.
thought much more useful here because on the maker side we can't really miss spending the taker's payment while getting our own payment spent :p
Copy link
Member

@borngraced borngraced left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, looks cleaner. My only note is regarding error handling.

mm2src/mm2_main/src/lp_swap/maker_swap.rs Outdated Show resolved Hide resolved
mm2src/mm2_main/src/lp_swap/taker_swap.rs Outdated Show resolved Hide resolved
@laruh
Copy link
Member

laruh commented Nov 25, 2024

@mariocynicys could you please fix PR lint

@mariocynicys mariocynicys changed the title optimization(legacy-swap): support going the spending path on refund fix(legacy-refunds): support going the spending path on refund Nov 25, 2024
Copy link
Member

@laruh laruh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this fix, here my notes

mm2src/mm2_main/src/lp_swap/maker_swap.rs Outdated Show resolved Hide resolved
mm2src/mm2_main/src/lp_swap/maker_swap.rs Outdated Show resolved Hide resolved
mm2src/mm2_main/src/lp_swap/taker_swap.rs Outdated Show resolved Hide resolved
mm2src/mm2_main/src/lp_swap/taker_swap.rs Outdated Show resolved Hide resolved
mm2src/mm2_main/src/lp_swap/maker_swap.rs Outdated Show resolved Hide resolved
borngraced
borngraced previously approved these changes Dec 2, 2024
Copy link
Collaborator

@shamardy shamardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. First review iteration!

A few suggestions/discussion points:

  • Maybe we should add start/stop swap rpc for @cipig to use to stop endlessly failing swaps until we fix the bugs that lead to this.
  • We will need this for TPU, please add it to an issue checklist.

Comment on lines 1515 to 1518
.maker_coin
.can_refund_htlc(maker_payment_lock)
.await
.map_err(RecoverSwapError::Irrecoverable)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can_refund_htlc can sometimes return a recoverable error, please check utxo implementation for this ref.

let mtp = coin.get_current_mtp().await?;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mm2src/mm2_main/src/lp_swap.rs Show resolved Hide resolved
// Roll back to confirming the maker payment spend.
RecoveredSwapAction::SpentOtherPayment => {
info!("Refund canceled. Maker payment spend tx {:02x}", tx_ident.tx_hash);
// TODO: We prepared for refund but didn't finalize refund. This must be breaking something for lightning.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we ever hit this code in lightning? I guess it can happen for taker swap code but it's not a big problem anyways since the finalize refund is just to allow the maker get his payment back instantly instead of waiting for the timelock to expire, and if we spend it, it means it can't be failed backwards. For maker swap code it's a different case, maker fails the htlc backwards before this step, so we will never hit this code.
Usually such things in code is a cue for a needed refactor but not in this PR of course.

Copy link
Collaborator Author

@mariocynicys mariocynicys Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this is only relevant for lightning receiving maker. and yeah if the maker fails the HTLC then we can't hit SpentOtherPayment.
removed the todo 70afcde

Comment on lines 1967 to 1968
// FIXME: We should try to `recover_funds` again after locktime. The taker payment might have been
// spent by the maker at this point and we should go for spending the maker payment instead.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, wait_for_htlc_refund for lightning doesn't depend on locktime, but this is a general code and we should handle this case. For the lightning case, if we got payment successful from here

_ => Ready(MmError::err(RefundError::Internal(ERRL!(
"Payment {} has an invalid status of {} in the db",
payment_hex,
payment.status
)))),

It means that we should be spending the maker payment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think that makes sense in the general sense. if the wait_for_htlc_refund fails we can try to recover again.

Copy link
Member

@cipig cipig Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is one case that comes to my mind where it would be bad to retry endlessly: if the initial tx was reverted because of too low gas (so basically a misconfiguration from our side)... if you retry that tx, you will spend gas again, but the result will be the same if you haven't increased gas limits... so you loose the txfees every time you try
idk if we want/should account for this case, but thought i mention it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that's a good reason to have a start/stop rpc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A start/stop rpc doesn't solve this, as too much gas might have been already spent until the user or @cipig is aware of this, this could drain the user's wallet which is really bad. We need to mark swap as failed for this and this is one of the situations where we will still require manual recover funds usage. @cipig are you aware of other situations like this one, we shouldn't release this PR/fix unless we are sure it's completely safe to retry forever and all edge cases are covered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is this #1567, but there you don't loose any fees when you retry forever
i guess reverted "out of gas" txes on EVM chains are the only problem where you loose money if you retry forever
there may be other reasons why a EVM chain reverts your tx, but i haven't encountered any other that would always fail
so short answer is "no", just this case :-)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

covered here: a28c768

for the gas issue with reverted txs: what about retrying the refund in an exponential backoff manner? this greatly reduces the amount of failed recoveries and doesn't hurt the perf/speed that much.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking, 'reverted' means that tx was cancelled due to errors during the contract execution (for e.g. locktime has not been passed yet), so the tx may be retried yet.
A tx can be cancelled due to out of gas. To prevent retries in this case, I think, we may analyse the tx receipt in eth code and return some unrecoverable error to the swap code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A tx can be cancelled due to out of gas. To prevent retries in this case, I think, we may analyse the tx receipt in eth code and return some unrecoverable error to the swap code.

gas spikes on Ethereum are typically temporary nope? and a transaction being "out of gas" is not necessarily a permanent failure.
I would suggest to mark swap failed in this case and stop the process, but allow user to retry spend-or-refun process later and start it manually. And also allow to stop if it takes too long time for them. If retry faced "out of gas" again, then mark as failed and stop process.

@laruh
Copy link
Member

laruh commented Dec 5, 2024

  • We will need this for TPU, please add it to an issue checklist.

I suppose you're referencing to this issue #1895 ?

We have lots of lists, so I just want to clarify. I also added eth coin todos here.

@shamardy
Copy link
Collaborator

shamardy commented Dec 9, 2024

Moved target for this to 2.4.0-beta release as there will not be enough time to test it for next release.

@@ -1375,13 +1364,13 @@ impl MakerSwap {
Ok((swap, command))
}

pub async fn recover_funds(&self) -> Result<RecoveredSwap, String> {
async fn try_spend_taker_payment(selfi: &MakerSwap, secret_hash: &[u8]) -> Result<TransactionEnum, String> {
pub async fn recover_funds(&self) -> MmResult<RecoveredSwap, RecoverSwapError> {
Copy link
Member

@laruh laruh Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a place which calls recover_funds taker and maker methods wrapped in try_s macro

match self {
SavedSwap::Maker(saved) => {
let (maker_swap, _) = try_s!(MakerSwap::load_from_saved(ctx, maker_coin, taker_coin, saved));
Ok(try_s!(maker_swap.recover_funds().await))
},
SavedSwap::Taker(saved) => {
let (taker_swap, _) = try_s!(TakerSwap::load_from_saved(ctx, maker_coin, taker_coin, saved).await);
Ok(try_s!(taker_swap.recover_funds().await))
},
}

as try_s! appends error path + MmErr also contains error path, Im afraid there will be unreadable result error message with error path duplications

Ok(try_s!(maker_swap.recover_funds().await))
Ok(try_s!(taker_swap.recover_funds().await))

I would suggest to extract MmErr inner value and put it into ERR! in SavedSwap.recover_funds().
Or return MmResult in recover_funds from SavedSwap and in RPC method fn recover_funds_of_swap, as recover_funds from SavedSwap is used inly in recover_funds_of_swap.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ummmm, i imagined we would have both paths as in kind of nesting, which is good (and was the behaviour before this PR anyways).
i will try to trigger this and see if it looks complicated then i'll simplify it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

upd: its non blocking and not urgent note. I suppose we have other places which already have same issue. I think it will be fixed in refactored kdf repo.

this field was convereted to a hashmap instead of a vector for easy access to the swap.
also we now manually delete the swap from running swaps when the swap is finished/inturepted (memory leak fix). as a consequence to manuallly deleting the swap from running_swaps, we can now store them as arcs instead of weakrefs, which simplifies a lot of .upgrade calls.
)));
},
};
let taker_coin = match swap.taker_coin_ticker() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code repetition for finding maker_coin and taker_coin.
(BTW could we just use lp_coinfind_or_err in all such cases?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. i didn't like it either but needed some assertion xD

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -516,7 +532,7 @@ struct LockedAmountInfo {
}

struct SwapsContext {
running_swaps: Mutex<Vec<Weak<dyn AtomicSwap>>>,
running_swaps: Mutex<Vec<(Weak<dyn AtomicSwap>, AbortOnDropHandle)>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you tell why do we need this change?

Is it related to some review note? may be I missed smth

Copy link
Collaborator Author

@mariocynicys mariocynicys Dec 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nah actually, not discussed in review. @shamardy just told me we need the swaps to be stoppable via rpc (since we now do a run forever recovery), that's why we record their abort handles to be able to stop them mid-recover (or even mid-swap).

@laruh laruh added the priority: medium Moderately important tasks that should be completed but are not urgent. label Jan 10, 2025
};
// Run the swap in an abortable task and wait for it to finish.
let (swap_ended_notifier, swap_ended_notification) = oneshot::channel();
let abortable_swap = spawn_abortable(async move {
Copy link
Collaborator

@dimxy dimxy Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we create an abortable future to run the swap loop (for using this in the new stop rpc I believe).
But run_taker_swap fn, where we are now, is also wrapped in an abortable future. Could we use this one in the stop rpc?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but how will we grab that abort handle and supply it to run_taker_swap in the first place?
also kickstart_thread_handler uses run_taker_swap inline (without spawning it), well eventually if u follow the usage ofc u will find kickstart_thread_handler is itself being spawned somewhere. but it's hard to propagate these handles all the way down to the swap runner methods.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a bit odd when we use different futures to stop the same swap in different use cases (abort_all and stop_swap_rpc). Maybe we could get the handle for a future from the abortable_system for stop_swap_rpc, so we would use same abort technique in both cases. Maybe some modification for AbortableQueue is needed for that

@dimxy
Copy link
Collaborator

dimxy commented Jan 14, 2025

test_mm2_stops_immediately seems not working btw. I guess clear_running_swaps() is needed here too?

@mariocynicys
Copy link
Collaborator Author

test_mm2_stops_immediately seems not working btw. I guess clear_running_swaps() is needed here too?

yup. I'll just wait till we finish reviewing the mem leak PR and merge it and then merge this with dev.

_touch = touch_loop => unreachable!("Touch loop can not stop!"),
};
// Run the swap in an abortable task and wait for it to finish.
let (swap_ended_notifier, swap_ended_notification) = oneshot::channel();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we simplify this code to eliminate oneshot::channel?

-    let (swap_ended_notifier, swap_ended_notification) = oneshot::channel();
-    let abortable_swap = spawn_abortable(async move {
+    let fut_with_touch = async move {
         select! {
             _swap = swap_fut => (), // swap finished normally
             _touch = touch_loop => unreachable!("Touch loop can not stop!"),
-        }
-        if swap_ended_notifier.send(()).is_err() {
-            error!("Swap listener stopped listening!");
-        }
-    });
-    swap_ctx.running_swaps.lock().unwrap().push((weak_ref, abortable_swap));
-    // Halt this function until the swap has finished (or interrupted, i.e. aborted/panic).
-    swap_ended_notification.await.error_log_with_msg("Swap interrupted!");
+        };
+    };
+    let (abortable, handle) = abortable(fut_with_touch);
+    swap_ctx.running_swaps.lock().unwrap().push((weak_ref, handle.into()));
+    if let Err(Aborted) = abortable.await {
+        info!("Swap uuid={} interrupted!", uuid);
+    }

@laruh laruh added priority: high Important tasks that need attention soon. and removed priority: medium Moderately important tasks that should be completed but are not urgent. labels Jan 16, 2025
@laruh
Copy link
Member

laruh commented Jan 16, 2025

@mariocynicys pr started to have conflicts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.4.0-beta priority: high Important tasks that need attention soon. status: pending review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MM2 doesn't retry querying electrums on failed RPC requests
7 participants