Merge pull request #792 from jieyouxu/bors-firefighting
bors: document how to properly resync bors queue
pietroalbini authored Jan 7, 2025
2 parents 10d9cd3 + d78ded7 commit 0251359
Showing 3 changed files with 150 additions and 4 deletions.
1 change: 1 addition & 0 deletions src/SUMMARY.md
@@ -88,6 +88,7 @@
- [AWS regions](./infra/docs/aws-regions.md)
- [Bastion server](./infra/docs/bastion.md)
- [Bors](./infra/docs/bors.md)
- [Fixing bors queue](./infra/docs/bors/queue-resync.md)
- [CDN](./infra/docs/cdn.md)
- [Crater agents](./infra/docs/crater-agents.md)
- [Dev Desktops](./infra/docs/dev-desktop.md)
8 changes: 4 additions & 4 deletions src/infra/docs/bors.md
@@ -12,10 +12,10 @@ from the [rust-lang/homu] repository onto our [ECS cluster][ecs].
### Fixing inconsistencies in the queue

Homu is quite buggy, and it might happen that the queue doesn't reflect the
actual state in the repositories. This can be fixed by pressing the
"Synchronize" button in the queue page. Note that the synchronization process
itself is a bit buggy, and it might happen that PRs which were approved but
failed are re-approved again on their own.
actual state in the repositories.

See [Fixing inconsistencies in the bors queue](./bors/queue-resync.md) for
instructions on how to do this properly.

### Adding a new repository to bors

145 changes: 145 additions & 0 deletions src/infra/docs/bors/queue-resync.md
@@ -0,0 +1,145 @@
# Fixing inconsistencies in the bors queue

bors queue page: <https://bors.rust-lang.org/queue/rust>.

<div class="warning">
**WARNING**: You should only do this if you have bors `r+` permissions on the
rust-lang/rust repo. Please do not synchronize if you do not have `r+` permissions
even if you have write access to the repo, as you will be unable to perform the
required cleanup steps.

This is a **destructive** operation. If someone syncs, they need to
baby-sit the queue for around 45 minutes: around 30 minutes to wait for PRs to
be recollected, and 15 minutes after that to kick out PRs that should not be in
the tree.

**DO NOT CLICK THIS BUTTON IF YOU ARE NOT ABLE TO HANDLE THE CLEANUP.**
</div>

Sometimes you have to do a bors queue sync for various reasons. This is not
trivial and requires you to be very careful: a careless sync can cause bors to
merge PRs into `master` (or even `beta`) that should not have been merged.

## Steps

### Step 0: Announce your intention

Let T-infra (and other reviewers) know that you plan to close the tree. Open a
new [T-infra Zulip
thread](https://rust-lang.zulipchat.com/#narrow/channel/242791-t-infra) to let
other contributors know about the bors queue resync.

### Step 1: Close the tree

Find a PR that's currently being tested (or any open PR in the queue, if the
queue is *really* messed up).

Issue `@bors treeclosed=1000` on that PR, along with a brief explanation of why
you are closing the tree, so other reviewers (especially people doing rollups)
have some context.

Example:

```text
Closing the tree due to a resync.
cc <https://rust-lang.zulipchat.com/#narrow/channel/242791-t-infra/topic/try.20jobs.20not.20kicking.20off>.
@bors treeclosed=1000
```
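
The comment layout above can be sketched as a small helper. This is only an illustration: `tree_close_comment` is a hypothetical name, not part of bors, and the priority `1000` is simply the value used in this guide.

```python
def tree_close_comment(reason: str, thread_url: str, priority: int = 1000) -> str:
    """Build a tree-closing comment body (hypothetical helper, not a bors API).

    The layout mirrors the example in this guide: a short reason, a link to
    the discussion thread, and the `@bors treeclosed=` command on its own line.
    """
    if priority <= 0:
        raise ValueError("priority must be a positive integer")
    return "\n".join(
        [
            reason,
            f"cc <{thread_url}>.",
            f"@bors treeclosed={priority}",
        ]
    )


if __name__ == "__main__":
    # The URL is a placeholder; use the thread opened in Step 0.
    print(tree_close_comment("Closing the tree due to a resync.", "https://example.invalid/thread"))
```

Keeping the command on its own line makes it easy to find and edit later, once the tree has been reopened.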

### Step 2: Click the "synchronize" button

As a courtesy, you can record which PRs had try jobs starting on them. After a
sync, the distinction between regular jobs and try jobs will be lost, so you'll
have to kick out all "pending" PRs.

Click the "synchronize" button in the [bors queue page][bors-queue]. Then,
immediately start performing the next step.

### Step 3: Kick out all actively tested PRs

Find *all* actively tested PRs. This includes both "auto" builds (full CI),
which show up as "pending", and "try" builds, which show up as "pending (try)"
(but sometimes bors forgets the distinction, and try jobs can show up as
"pending" too).

On each "pending" or "pending (try)" PR, write:

```text
@bors retry r- (sync)
```

to suspend the current job and take it out of the queue (bors can confuse try
jobs with full CI jobs).
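
As a rough sketch of the selection rule in this step, the snippet below filters a hand-copied snapshot of the queue. The `(pr_number, status)` pairs and the function name `prs_to_kick_out` are assumptions made for illustration; this guide does not describe any machine-readable export of the queue page.

```python
# Status labels as they appear on the queue page, per the text above.
ACTIVE_STATUSES = {"pending", "pending (try)"}
KICK_OUT_COMMENT = "@bors retry r- (sync)"


def prs_to_kick_out(queue_rows):
    """Return the PR numbers whose status marks them as actively tested.

    `queue_rows` is a hypothetical list of (pr_number, status) pairs copied
    by hand from the queue page right after clicking "synchronize".
    """
    return sorted(
        pr for pr, status in queue_rows if status.strip().lower() in ACTIVE_STATUSES
    )


if __name__ == "__main__":
    # Made-up PR numbers, for illustration only.
    rows = [(101, "approved"), (102, "pending"), (103, "pending (try)"), (104, "")]
    for pr in prs_to_kick_out(rows):
        print(f"#{pr}: {KICK_OUT_COMMENT}")
```

Both labels are treated the same way on purpose, since after a sync a try job may be listed as plain "pending".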

### Step 4: Wait

Wait for around **30-45 minutes** to allow bors to recollect all the PRs. **Do
not** reopen the tree beforehand, as it will cause bors to have an inconsistent
view of the PRs, which will lead to unspecified behavior.
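
A tiny timing sketch for this wait, assuming you note the moment you clicked "synchronize". The helper names are hypothetical, and the 45-minute value is just the upper end of the range given above.

```python
from datetime import datetime, timedelta, timezone

# Upper bound of the 30-45 minute window mentioned above (a conservative choice).
RECOLLECT_WAIT = timedelta(minutes=45)


def recollection_deadline(sync_clicked_at: datetime) -> datetime:
    """Earliest time at which the queue is assumed to be fully recollected."""
    return sync_clicked_at + RECOLLECT_WAIT


def may_proceed(sync_clicked_at: datetime, now: datetime) -> bool:
    """True once the conservative wait has elapsed; cleanup can start then."""
    return now >= recollection_deadline(sync_clicked_at)


if __name__ == "__main__":
    clicked = datetime.now(timezone.utc)
    print("Start Step 5 no earlier than", recollection_deadline(clicked).isoformat())
```

Passing this check only means the wait is over. The tree stays closed until the cleanup steps that follow are done.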

### Step 5: Kick out ineligible PRs

Check "approved" PRs in the queue. Some of them will actually not be eligible
for merge, due to reasons such as:

- Merge conflicts
- Significant changes since last review
- Never having been approved

However, bors sometimes forgets these distinctions after a sync. You'll need to
manually visit each of the "approved" PRs and check their eligibility.

For "approved" PRs that are not actually eligible, you should kick them out of
the queue via `@bors r-`. Prefer to be cautious if you are not sure, and
unapprove the PR in case of ambiguity.

```text
@bors r- (sync)
```
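
The "unapprove when in doubt" rule can be written down as a small checklist. This is a sketch under stated assumptions: the `ManualCheck` record is hypothetical and is filled in by hand after visiting each PR; nothing here queries bors or GitHub.

```python
from dataclasses import dataclass
from typing import Optional

UNAPPROVE_COMMENT = "@bors r- (sync)"


@dataclass
class ManualCheck:
    """A reviewer's findings for one "approved" PR; None means "not sure"."""

    number: int
    has_merge_conflicts: Optional[bool]
    significant_changes_since_review: Optional[bool]
    was_ever_approved: Optional[bool]


def should_unapprove(check: ManualCheck) -> bool:
    """A PR keeps its approval only if every finding is known and clean."""
    clean = (
        check.has_merge_conflicts is False
        and check.significant_changes_since_review is False
        and check.was_ever_approved is True
    )
    return not clean


if __name__ == "__main__":
    # Made-up PR numbers, for illustration only.
    checks = [ManualCheck(201, False, False, True), ManualCheck(202, None, False, True)]
    for check in checks:
        if should_unapprove(check):
            print(f"#{check.number}: {UNAPPROVE_COMMENT}")
```

Treating `None` as a reason to unapprove encodes the cautious default: an unapproved PR can be re-approved later, while a wrongly merged PR cannot be undone as easily.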

### Step 6: Double-check approved PRs

Do another review pass of "approved" PRs in the queue, to make sure all approved
PRs are actually eligible for merge.

### Step 7: Re-open the tree on the same PR where you closed the tree

Reopen the tree on the same PR where you issued the `treeclosed` command:

```text
@bors treeclosed-
```

Closely monitor bors' behavior for around 5 minutes, to ensure that bors is
correctly testing a PR that's eligible for merge. Update the relevant T-infra
Zulip thread as appropriate.

### Step 8: Re-queue try jobs

If you had to kick out try jobs in Step 3, you can requeue them on the PRs that
you recorded in Step 2 as having try jobs started on them.

Use the normal try-job command:

```text
@bors try
```

and not `@bors retry`.
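
The bookkeeping behind this step can be sketched as a set intersection. Both input lists and the name `prs_to_requeue` are assumptions for illustration: they stand for notes you took by hand before the sync and while kicking PRs out.

```python
TRY_COMMENT = "@bors try"


def prs_to_requeue(recorded_try_prs, kicked_out_prs):
    """PRs that had a try job before the sync and were kicked out afterwards."""
    return sorted(set(recorded_try_prs) & set(kicked_out_prs))


if __name__ == "__main__":
    # Made-up PR numbers, for illustration only.
    recorded = [103, 105]      # noted before clicking "synchronize"
    kicked_out = [102, 103]    # received `@bors retry r- (sync)` afterwards
    for pr in prs_to_requeue(recorded, kicked_out):
        print(f"#{pr}: {TRY_COMMENT}")
```

PRs that were kicked out but never had a try job recorded (such as `102` in the sample data) are deliberately left out of the result.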

### Step 9: Edit your `treeclosed` commands to prevent bors from picking them up

Edit the `@bors treeclosed=xxxx` and `@bors treeclosed-` commands like

```text
~~@/bors treeclosed=xxxx~~
EDIT(ferris): edited to prevent bors from picking up command in a future sync
```

Do this only AFTER the tree has been reopened, so that bors does not pick the
commands up in a future sync.
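
The edit shown above can be expressed as a text transformation. This is a sketch only: `neutralize_treeclosed` is a hypothetical helper that rewrites a comment body you have copied out; it does not talk to GitHub, and the edit itself is still made by hand.

```python
import re

# A line consisting of an `@bors treeclosed...` command, as used in the steps above.
_TREECLOSED = re.compile(r"^@bors (treeclosed\S*)[ \t]*$", re.MULTILINE)


def neutralize_treeclosed(comment: str, editor: str) -> str:
    """Rewrite `@bors treeclosed...` lines into the struck-through `@/bors` form.

    Returns the comment unchanged if it contains no such command line.
    """
    edited, count = _TREECLOSED.subn(r"~~@/bors \1~~", comment)
    if count == 0:
        return comment
    note = f"EDIT({editor}): edited to prevent bors from picking up command in a future sync"
    return f"{edited}\n{note}"


if __name__ == "__main__":
    print(neutralize_treeclosed("Closing the tree due to a resync.\n@bors treeclosed=1000", "ferris"))
```

The pattern only matches lines that start with the command, so prose that merely mentions the command in the middle of a sentence is left untouched.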

[bors-queue]: https://bors.rust-lang.org/queue/rust
