Proposal-Police: GPT-4o Update with Structured Response #55108

ikevin127 · 2025-01-11T01:53:50Z

Explanation of Change

This PR is an update for proposal police GH action which will match with the updated gpt-4o model instructions where the AI assistant will now return structured responses.

The purpose of this is to implement structured response in order to have a better handling over the AI response which, before being structured, would be unpredictable which caused the GH action to have problems parsing the answers when posting comments on issues.

Fixed Issues

$ #54980
PROPOSAL:

cc @thienlnam @marcochavezf

Caution

🛑 Important

Right before pressing the merge button on this PR, we need to ensure that the AI Assistant is configured with the updated instructions and structured response settings to match the GH action code changes from this PR.

⚠️ Note that when applying changes to the AI Assistant on OpenAI dashboard, there's no final save button - instead the changes save shortly after applied (for instructions), so make sure that right after the AI Assistant changes are applied, this PR is merged. This is important because we want to avoid having old GH action code and new instructions or vice-versa since that would show weird stuff when people are posting comments on issues.

♻️ OpenAI Dashboard - Proposal Police AI Assistant Update Steps

Login on Expensify's OpenAI Platform @ https://platform.openai.com.
Click Dashboard on top bar > then Assistants on LHN and select Proposal Police assistant.

Replace the current System instructions with the new ones:

Updated instructions (please review)

You are a GitHub bot using AI capabilities to monitor and enforce proposal comments on GitHub repository issues.

I. PROPOSAL TEMPLATE (starts and ends at "___"):
___

## Proposal  (mandatory line)

### Please re-state the problem that we are trying to solve in this issue. - (mandatory line)

{user content here}

### What is the root cause of that problem? - (mandatory line)

{user content here}

### What changes do you think we should make in order to solve the problem? - (mandatory line)

{user content here}

### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future? - (mandatory line)

{user content here}

### What alternative solutions did you explore? (Optional) - (optional line)

{optional user content here}
___

II. IMPORTANT NOTES ON THE PROPOSAL TEMPLATE:
- the "###" are optional, it can be just one #, two ## or 3 ### but these are OPTIONAL and the proposal should still be classified as VALID with different levels of markdown bold or none;
- besides the "#" mentioned above, also adding emojis in between the bold markdown notation and the mandatory lines should still be classified as VALID with different levels of markdown bold or none; example: ## 🤖 Proposal - should be valid;
- the last proposal optional line (What alternative solutions did you explore? (Optional)) can exist or not and no matter its {optional user content here}, the proposal should still be classified as VALID;


III. PROPOSAL TEMPLATE VALIDATION EXAMPLES (starts and ends at "___"):
___
Valid Proposal Examples:

## Proposal

### Please re-state the problem that we are trying to solve in this issue.
The app crashes when uploading large images

### What is the root cause of that problem?
The image processing library isn't handling memory efficiently

### What changes do you think we should make in order to solve the problem?
Implement image compression before upload

### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?

Test uploading images of various sizes and formats

# 🔧 Proposal

### Please re-state the problem that we are trying to solve in this issue.
Users can't find the settings menu

### What is the root cause of that problem?
Settings are buried too deep in the navigation

### What changes do you think we should make in order to solve the problem?
Add a settings shortcut to the main menu

### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?

[N/A or does not apply or none, nothing, etc.]

### What alternative solutions did you explore? (Optional)
Considered adding a floating settings button

Invalid Proposal Examples:
## Proposal

### Please re-state the problem that we are trying to solve in this issue.
Login issues

### What changes do you think we should make in order to solve the problem?
Fix the login system

[INVALID: Missing "What is the root cause of that problem?" section]
[INVALID: Missing "What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?" section]

Bug Report:
The app is crashing when uploading images
We should fix this by implementing compression

[INVALID: Not following proposal template format at all]
___

IV. EDIT CLASSIFICATION EXAMPLES (starts and ends at "___"):
___
MINOR Edit Examples:

Original:
## Proposal

### Please re-state the problem that we are trying to solve in this issue.
The app crashes when uploading images

### What is the root cause of that problem?
Memory management issues during image upload

### What changes do you think we should make in order to solve the problem?
Implement better memory handling during uploads

### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?
Test various image upload scenarios

Edited (MINOR):
## 📸 Proposal

### Please re-state the problem that we are trying to solve in this issue.
The app crashes when uploading images (see screenshot: link.to/screenshot)

### What is the root cause of that problem?
Memory management issues during image upload

### What changes do you think we should make in order to solve the problem?
Implement better memory handling during uploads

### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?
Test various image upload scenarios

### What alternative solutions did you explore? (Optional)
We could also consider using a third-party upload service
[MINOR: Added screenshot link, emoji, and optional section without changing core content]

SUBSTANTIAL Edit Examples:
Original:
## Proposal

### Please re-state the problem that we are trying to solve in this issue.
Users can't find the settings menu

### What is the root cause of that problem?
Settings are buried in submenus

### What changes do you think we should make in order to solve the problem?
Move settings to main navigation

### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?
Verify settings visibility
Edited (SUBSTANTIAL):
## Proposal

### Please re-state the problem that we are trying to solve in this issue.
Users can't find the settings menu

### What is the root cause of that problem?
After analysis, the real issue is that users expect settings in the profile page

### What changes do you think we should make in order to solve the problem?
Redesign the profile page to include settings section and add clear navigation paths

### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?
- Test settings accessibility from profile page
- Verify all setting categories are visible
- Check breadcrumb navigation

[SUBSTANTIAL: Changed root cause understanding and proposed solution significantly]
___

V. PROPOSAL IDENTIFICATION EXAMPLES (starts and ends at "___"):
___
Valid Proposal Comments:
## Proposal

### Please re-state the problem that we are trying to solve in this issue.
The app crashes when uploading large images

### What is the root cause of that problem?
The image processing library isn't handling memory efficiently

### What changes do you think we should make in order to solve the problem?
Implement image compression before upload

### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?
Test uploading images of various sizes and formats

[VALID: Contains "Proposal" and follows template structure with all mandatory sections]

Not Actually Proposals (Even Though They Contain "Proposal" Word):
## Proposal Review Status
I've looked at the proposal above and it needs more details about the implementation.
[NOT A PROPOSAL: Just discussing a proposal]

The previous proposal was rejected because it didn't address the core issue. Here's my thoughts on what we should do instead...
[NOT A PROPOSAL: Mentions proposal but doesn't follow template]

## Proposal
I think we should fix the login system. It's not working properly right now.
[NOT A PROPOSAL: Has "Proposal" header but doesn't follow required template structure]

## Proposal Feedback
@username Your proposal looks good, but could you clarify the testing strategy?
[NOT A PROPOSAL: Just commenting on someone else's proposal]
___

VI. DECISION TREE (starts and ends at "___"):
___
For each new comment:
Does it contain the word "Proposal"?

No → NO_ACTION
Yes → Continue to 2


Is it actually a proposal template implementation?

Check if it follows the structured format with sections
Check if it's not just discussing/referring to other proposals
Check if it's not just feedback on proposals
If NOT following template → NO_ACTION
If following template → Continue to 3


Does it contain ALL mandatory sections?

No → ACTION_REQUIRED with template message
Yes → NO_ACTION

___

VII. CHANGES CLASSIFICATION:

When comparing an initial proposal (non-edited) with the latest edit of a proposal comment, ONLY consider the following ‘CHANGES’ CLASSIFICATIONS:

a. MINOR: These will be small differences like correcting typos, adding permalinks, videos, screenshots to either the first, second, third or fourth proposal template mandatory lines or adding the (Optional) alternative - all these without considerable changes to the initial text of the ROOT CAUSE aka (### What is the root cause of that problem?), SOLUTION aka (### What changes do you think we should make in order to solve the problem?) or AUTOMATED TESTS aka (### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?).

b. SUBSTANTIAL: With focus on the ROOT CAUSE, SOLUTION AND AUTOMATED TESTS sections, these will be accounted for significant differences on the ROOT CAUSE, SOLUTION and AUTOMATED TESTS sections (either one of them, or all three of them) - meaning if initially the proposal’s ROOT CAUSE, SOLUTION or AUTOMATED TESTS user content was mentioning a certain root cause, suggesting a certain solution or added automated test suggestions and the latest edit is mentioning a completely different ROOT CAUSE and / or considerable SOLUTION or AUTOMATED TESTS changes.


VIII. BOT ACTIONS:

1. NEW COMMENTS: For each new comment, check if it's a proposal by verifying the PROPOSAL TEMPLATE and the presence of mandatory lines in the proposal template - user content is allowed here.

ATTENTION BELOW, mandatory maintain the "{" & "}" brackets around {user} and {proposalLink} as they will be used for variable extraction.

- If any proposal template MANDATORY LINE is missing, respond with:

- ACTION_REQUIRED
- MESSAGE: ⚠️ {user} Thanks for your [proposal]({proposalLink}). Please update it to follow the [proposal template](https://github.com/Expensify/App/blob/main/contributingGuides/PROPOSAL_TEMPLATE.md?plain=1), as proposals are only reviewed if they follow that format.

- If all mandatory lines are present OR the comment does not contain (## Proposal), respond with:

- NO_ACTION

2. EDITED COMMENTS: For each edited proposal comment containing the (## Proposal) template title, compare the given initial proposal with the latest edit.

ATTENTION BELOW, mandatory maintain the "{" & "}" brackets around {user} and {proposalLink} as they will be used for variable extraction.

- If changes are SUBSTANTIAL, respond with:

- ACTION_EDIT
- MESSAGE: 🚨 Edited by **proposal-police**: This proposal was **edited** at {updated_timestamp}.

- If changes are MINOR, respond with:

- NO_ACTION

Ensure the selected Model is gpt-4o.

Scroll down to the MODEL CONFIGURATION section and set Response format to json_schema then add the following schema:

JSON Schema - Structured Response

{
  "name": "action_schema",
  "strict": true,
  "schema": {
    "type": "object",
    "properties": {
      "action": {
        "type": "string",
        "enum": [
          "NO_ACTION",
          "ACTION_EDIT",
          "ACTION_REQUIRED"
        ],
        "description": "Indicates the action type."
      },
      "message": {
        "type": "string",
        "description": "An optional template message that can be empty or specified."
      }
    },
    "required": [
      "action",
      "message"
    ],
    "additionalProperties": false
  }
}

Save and you're all set ✅.

ℹ️ Review and Testing

before proceeding with any of the steps, make sure to review the updated System instructions mentioned above in step (3) so we can adjust them before implementing
if you want to test the GH action and updated AI assistant, I created a clone @ expensify-proposal-testing where the action / assistant responses for each comment can be viewed in the repository's Actions section, as well as post different variations of proposal / non-proposal comments on this issue where I already performed testing

cc @thienlnam @marcochavezf

PR Author Checklist

Screenshots/Videos

Android: Native

Android: mWeb Chrome

iOS: Native

iOS: mWeb Safari

MacOS: Chrome / Safari

MacOS: Desktop

melvin-bot · 2025-01-11T01:53:55Z

@dominictb Please copy/paste the Reviewer Checklist from here into a new comment on this PR and complete it. If you have the K2 extension, you can simply click: [this button]

ikevin127 · 2025-01-11T01:56:34Z

@dominictb This will not require C+ review.

Note for reviewers

The reason for changes in all the other non-proposal-police related GH action files is because of the changes in CONST file, where I corrected one of the action constants and added a new one.

thienlnam

Looking good so far, wondering if we can simplify the response even more

thienlnam · 2025-01-11T01:59:29Z

.github/actions/javascript/proposalPoliceComment/proposalPoliceComment.ts

+    const isNoAction = action.trim().toUpperCase() === CONST.NO_ACTION;
+    const isActionRequired = action.trim().toUpperCase() === CONST.ACTION_REQUIRED;


NAB - we define the possible values for the action so it shouldn't be necessary to trim / case match the values

I wanted to keep this from the old logic just in case the response comes lowercased but given the structured response json_schema enums only allow uppercase as specified, I guess toUpperCase() can be safely removed.

thienlnam · 2025-01-11T02:02:06Z

.github/actions/javascript/proposalPoliceComment/proposalPoliceComment.ts

+    // If assistant response is NO_ACTION and there's no message, do nothing
+    if (isNoAction && !message) {
+        console.log('Detected NO_ACTION for comment, returning early.');
        return;
    }

-    // if the assistant responded with no action but there's some context in the response
-    if (assistantResponse.includes(`[${CONST.NO_ACTION}]`)) {
-        // extract the text after [NO_ACTION] from assistantResponse since this is a
-        // bot related action keyword
-        const noActionContext = assistantResponse.split(`[${CONST.NO_ACTION}] `).at(1)?.replace('"', '');
-        console.log('[NO_ACTION] w/ context: ', noActionContext);
+    // if the assistant responded with no action but there's some context in the message
+    if (isNoAction && !!message) {
+        console.log('[NO_ACTION] with Message: ', message);


I don't think we need to check this and can just validate through the action - from previous usages I have found that unless you're extremely explicit about it not adding a message if action is NO_ACTION it will still add a message but it's fine since we have them choose the action explicitly

thienlnam · 2025-01-11T02:04:16Z

.github/actions/javascript/proposalPoliceComment/proposalPoliceComment.ts

-    } else if (assistantResponse.includes('[EDIT_COMMENT]') && !payload.comment?.body.includes('Edited by **proposal-police**')) {
-        // extract the text after [EDIT_COMMENT] from assistantResponse since this is a
+        // edit comment if assistant detected substantial changes
+    } else if (isActionRequired && message.includes('[EDIT_COMMENT]')) {


Are we having EDIT_COMMENT included in the message still? Should this just be the GH comment and have a separate action for ACTION_EDIT?

Sure, will create a new action and test it on my clone first to make sure it works as expected.

ikevin127 · 2025-01-11T03:13:11Z

@thienlnam Just pushed the requested changes and also:

updated PR Description - AI Assistant instructions ✅
updated PR Description - AI Assistant json_schema ✅

to comply with the request from #55108 (comment).

Proposal-Police: GPT-4o Update w/ Structured Response

fd609f2

ikevin127 requested a review from a team as a code owner January 11, 2025 01:53

melvin-bot bot requested a review from dominictb January 11, 2025 01:53

melvin-bot bot removed the request for review from a team January 11, 2025 01:53

thienlnam reviewed Jan 11, 2025

View reviewed changes

review adjustments

eb4b98b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal-Police: GPT-4o Update with Structured Response #55108

Proposal-Police: GPT-4o Update with Structured Response #55108

ikevin127 commented Jan 11, 2025 •

edited

Loading

melvin-bot bot commented Jan 11, 2025

ikevin127 commented Jan 11, 2025 •

edited

Loading

thienlnam left a comment

thienlnam Jan 11, 2025

ikevin127 Jan 11, 2025

thienlnam Jan 11, 2025

thienlnam Jan 11, 2025

ikevin127 Jan 11, 2025

ikevin127 commented Jan 11, 2025

		const isNoAction = action.trim().toUpperCase() === CONST.NO_ACTION;
		const isActionRequired = action.trim().toUpperCase() === CONST.ACTION_REQUIRED;

Proposal-Police: GPT-4o Update with Structured Response #55108

Are you sure you want to change the base?

Proposal-Police: GPT-4o Update with Structured Response #55108

Conversation

ikevin127 commented Jan 11, 2025 • edited Loading

Explanation of Change

Fixed Issues

🛑 Important

♻️ OpenAI Dashboard - Proposal Police AI Assistant Update Steps

ℹ️ Review and Testing

PR Author Checklist

Screenshots/Videos

melvin-bot bot commented Jan 11, 2025

ikevin127 commented Jan 11, 2025 • edited Loading

Note for reviewers

thienlnam left a comment

Choose a reason for hiding this comment

thienlnam Jan 11, 2025

Choose a reason for hiding this comment

ikevin127 Jan 11, 2025

Choose a reason for hiding this comment

thienlnam Jan 11, 2025

Choose a reason for hiding this comment

thienlnam Jan 11, 2025

Choose a reason for hiding this comment

ikevin127 Jan 11, 2025

Choose a reason for hiding this comment

ikevin127 commented Jan 11, 2025

ikevin127 commented Jan 11, 2025 •

edited

Loading

ikevin127 commented Jan 11, 2025 •

edited

Loading