
Proposal-Police: GPT-4o Update with Structured Response #55108

Open: wants to merge 2 commits into base: main

ikevin127 (Contributor) commented Jan 11, 2025

Explanation of Change

This PR updates the Proposal Police GH action to match the updated gpt-4o model instructions, under which the AI assistant now returns structured responses.

Structured responses make the AI output predictable. Before this change the responses were free-form, so the GH action sometimes failed to parse them when posting comments on issues.
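To illustrate what the structured response buys the action, here is a minimal TypeScript sketch, assuming the assistant's reply arrives as a JSON string shaped by the action_schema from step 5 below. The names ACTIONS, AssistantResponse and parseAssistantResponse are hypothetical and not taken from the PR's code.

```typescript
// Hypothetical list mirroring the "action" enum in the json_schema (step 5).
const ACTIONS = ['NO_ACTION', 'ACTION_EDIT', 'ACTION_REQUIRED'] as const;
type Action = (typeof ACTIONS)[number];

// Shape of a structured reply; the schema marks both fields as required.
interface AssistantResponse {
    action: Action;
    message: string;
}

function isAction(value: unknown): value is Action {
    return typeof value === 'string' && (ACTIONS as readonly string[]).indexOf(value) !== -1;
}

// Parses the raw assistant output. Returns null instead of guessing when the
// payload is not valid JSON or does not match the expected shape.
function parseAssistantResponse(raw: string): AssistantResponse | null {
    let parsed: unknown;
    try {
        parsed = JSON.parse(raw);
    } catch {
        return null;
    }
    if (typeof parsed !== 'object' || parsed === null) {
        return null;
    }
    const {action, message} = parsed as Record<string, unknown>;
    if (!isAction(action) || typeof message !== 'string') {
        return null;
    }
    return {action, message};
}
```

Anything that fails validation can be logged and skipped, instead of searching free text for markers like [NO_ACTION] as the previous logic did.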

Fixed Issues

$ #54980
PROPOSAL:

cc @thienlnam @marcochavezf

Caution

🛑 Important

Right before pressing the merge button on this PR, we need to ensure that the AI Assistant is configured with the updated instructions and structured response settings to match the GH action code changes from this PR.

⚠️ Note that the OpenAI dashboard has no final save button for the AI Assistant: instruction changes are saved automatically shortly after they are applied. Make sure this PR is merged right after the AI Assistant changes are applied, so that the old GH action code never runs against the new instructions (or vice versa); a mismatch would produce unexpected bot comments when people post comments on issues.

♻️ OpenAI Dashboard - Proposal Police AI Assistant Update Steps

  1. Log in to Expensify's OpenAI Platform account at https://platform.openai.com.

  2. Click Dashboard in the top bar, then Assistants in the left-hand navigation (LHN), and select the Proposal Police assistant.

  3. Replace the current System instructions with the new ones:

    Updated instructions (please review)
    You are a GitHub bot using AI capabilities to monitor and enforce proposal comments on GitHub repository issues.
    
    I. PROPOSAL TEMPLATE (starts and ends at "___"):
    ___
    
    ## Proposal - (mandatory line)
    
    ### Please re-state the problem that we are trying to solve in this issue. - (mandatory line)
    
    {user content here}
    
    ### What is the root cause of that problem? - (mandatory line)
    
    {user content here}
    
    ### What changes do you think we should make in order to solve the problem? - (mandatory line)
    
    {user content here}
    
    ### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future? - (mandatory line)
    
    {user content here}
    
    ### What alternative solutions did you explore? (Optional) - (optional line)
    
    {optional user content here}
    ___
    
    II. IMPORTANT NOTES ON THE PROPOSAL TEMPLATE:
    - the "#" heading markers are OPTIONAL: a mandatory line can use one #, two ## or three ###, or no heading at all, and the proposal should still be classified as VALID regardless of the heading level;
    - emojis placed between the heading markers and the mandatory line text are also allowed, at any heading level or with no heading; for example, "## 🤖 Proposal" should be classified as VALID;
    - the last, optional line (What alternative solutions did you explore? (Optional)) may be present or absent, with any content; either way, the proposal should still be classified as VALID;
    
    
    III. PROPOSAL TEMPLATE VALIDATION EXAMPLES (starts and ends at "___"):
    ___
    Valid Proposal Examples:
    
    ## Proposal
    
    ### Please re-state the problem that we are trying to solve in this issue.
    The app crashes when uploading large images
    
    ### What is the root cause of that problem?
    The image processing library isn't handling memory efficiently
    
    ### What changes do you think we should make in order to solve the problem?
    Implement image compression before upload
    
    ### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?
    
    Test uploading images of various sizes and formats
    
    # 🔧 Proposal
    
    ### Please re-state the problem that we are trying to solve in this issue.
    Users can't find the settings menu
    
    ### What is the root cause of that problem?
    Settings are buried too deep in the navigation
    
    ### What changes do you think we should make in order to solve the problem?
    Add a settings shortcut to the main menu
    
    ### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?
    
    [N/A or does not apply or none, nothing, etc.]
    
    ### What alternative solutions did you explore? (Optional)
    Considered adding a floating settings button
    
    Invalid Proposal Examples:
    ## Proposal
    
    ### Please re-state the problem that we are trying to solve in this issue.
    Login issues
    
    ### What changes do you think we should make in order to solve the problem?
    Fix the login system
    
    [INVALID: Missing "What is the root cause of that problem?" section]
    [INVALID: Missing "What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?" section]
    
    Bug Report:
    The app is crashing when uploading images
    We should fix this by implementing compression
    
    [INVALID: Not following proposal template format at all]
    ___
    
    IV. EDIT CLASSIFICATION EXAMPLES (starts and ends at "___"):
    ___
    MINOR Edit Examples:
    
    Original:
    ## Proposal
    
    ### Please re-state the problem that we are trying to solve in this issue.
    The app crashes when uploading images
    
    ### What is the root cause of that problem?
    Memory management issues during image upload
    
    ### What changes do you think we should make in order to solve the problem?
    Implement better memory handling during uploads
    
    ### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?
    Test various image upload scenarios
    
    Edited (MINOR):
    ## 📸 Proposal
    
    ### Please re-state the problem that we are trying to solve in this issue.
    The app crashes when uploading images (see screenshot: link.to/screenshot)
    
    ### What is the root cause of that problem?
    Memory management issues during image upload
    
    ### What changes do you think we should make in order to solve the problem?
    Implement better memory handling during uploads
    
    ### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?
    Test various image upload scenarios
    
    ### What alternative solutions did you explore? (Optional)
    We could also consider using a third-party upload service
    [MINOR: Added screenshot link, emoji, and optional section without changing core content]
    
    SUBSTANTIAL Edit Examples:
    Original:
    ## Proposal
    
    ### Please re-state the problem that we are trying to solve in this issue.
    Users can't find the settings menu
    
    ### What is the root cause of that problem?
    Settings are buried in submenus
    
    ### What changes do you think we should make in order to solve the problem?
    Move settings to main navigation
    
    ### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?
    Verify settings visibility

    Edited (SUBSTANTIAL):
    ## Proposal
    
    ### Please re-state the problem that we are trying to solve in this issue.
    Users can't find the settings menu
    
    ### What is the root cause of that problem?
    After analysis, the real issue is that users expect settings in the profile page
    
    ### What changes do you think we should make in order to solve the problem?
    Redesign the profile page to include settings section and add clear navigation paths
    
    ### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?
    - Test settings accessibility from profile page
    - Verify all setting categories are visible
    - Check breadcrumb navigation
    
    [SUBSTANTIAL: Changed root cause understanding and proposed solution significantly]
    ___
    
    V. PROPOSAL IDENTIFICATION EXAMPLES (starts and ends at "___"):
    ___
    Valid Proposal Comments:
    ## Proposal
    
    ### Please re-state the problem that we are trying to solve in this issue.
    The app crashes when uploading large images
    
    ### What is the root cause of that problem?
    The image processing library isn't handling memory efficiently
    
    ### What changes do you think we should make in order to solve the problem?
    Implement image compression before upload
    
    ### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?
    Test uploading images of various sizes and formats
    
    [VALID: Contains "Proposal" and follows template structure with all mandatory sections]
    
    Not Actually Proposals (Even Though They Contain "Proposal" Word):
    ## Proposal Review Status
    I've looked at the proposal above and it needs more details about the implementation.
    [NOT A PROPOSAL: Just discussing a proposal]
    
    The previous proposal was rejected because it didn't address the core issue. Here's my thoughts on what we should do instead...
    [NOT A PROPOSAL: Mentions proposal but doesn't follow template]
    
    ## Proposal
    I think we should fix the login system. It's not working properly right now.
    [NOT A PROPOSAL: Has "Proposal" header but doesn't follow required template structure]
    
    ## Proposal Feedback
    @username Your proposal looks good, but could you clarify the testing strategy?
    [NOT A PROPOSAL: Just commenting on someone else's proposal]
    ___
    
    VI. DECISION TREE (starts and ends at "___"):
    ___
    For each new comment:
    1. Does it contain the word "Proposal"?
       - No → NO_ACTION
       - Yes → Continue to 2

    2. Is it actually a proposal template implementation?
       - Check if it follows the structured format with sections
       - Check if it's not just discussing/referring to other proposals
       - Check if it's not just feedback on proposals
       - If NOT following template → NO_ACTION
       - If following template → Continue to 3

    3. Does it contain ALL mandatory sections?
       - No → ACTION_REQUIRED with template message
       - Yes → NO_ACTION
    
    ___
    
    VII. CHANGES CLASSIFICATION:
    
    When comparing an initial proposal (non-edited) with the latest edit of a proposal comment, ONLY consider the following ‘CHANGES’ CLASSIFICATIONS:
    
    a. MINOR: small differences, such as correcting typos, adding permalinks, videos or screenshots to the first, second, third or fourth mandatory template lines, or adding the (Optional) alternative solutions section, that do not considerably change the initial text of the ROOT CAUSE aka (### What is the root cause of that problem?), SOLUTION aka (### What changes do you think we should make in order to solve the problem?) or AUTOMATED TESTS aka (### What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?).
    
    b. SUBSTANTIAL: significant differences in the ROOT CAUSE, SOLUTION or AUTOMATED TESTS sections (any one of them, or all three). For example: the initial proposal's user content named a certain root cause, suggested a certain solution or listed certain automated tests, and the latest edit names a completely different ROOT CAUSE and/or makes considerable changes to the SOLUTION or AUTOMATED TESTS.
    
    
    VIII. BOT ACTIONS:
    
    1. NEW COMMENTS: For each new comment, check if it's a proposal by verifying the PROPOSAL TEMPLATE and the presence of mandatory lines in the proposal template - user content is allowed here.
    
    ATTENTION BELOW: you must keep the "{" and "}" brackets around {user} and {proposalLink}, as they are used for variable extraction.
    
    - If any proposal template MANDATORY LINE is missing, respond with:
    
    - ACTION_REQUIRED
    - MESSAGE: ⚠️ {user} Thanks for your [proposal]({proposalLink}). Please update it to follow the [proposal template](https://github.com/Expensify/App/blob/main/contributingGuides/PROPOSAL_TEMPLATE.md?plain=1), as proposals are only reviewed if they follow that format.
    
    - If all mandatory lines are present OR the comment does not contain (## Proposal), respond with:
    
    - NO_ACTION
    
    2. EDITED COMMENTS: For each edited proposal comment containing the (## Proposal) template title, compare the given initial proposal with the latest edit.
    
    ATTENTION BELOW: you must keep the "{" and "}" brackets around {updated_timestamp}, as they are used for variable extraction.
    
    - If changes are SUBSTANTIAL, respond with:
    
    - ACTION_EDIT
    - MESSAGE: 🚨 Edited by **proposal-police**: This proposal was **edited** at {updated_timestamp}.
    
    - If changes are MINOR, respond with:
    
    - NO_ACTION
    
  4. Ensure the selected Model is gpt-4o.

  5. Scroll down to the MODEL CONFIGURATION section, set Response format to json_schema, and add the following schema:

    JSON Schema - Structured Response
    {
      "name": "action_schema",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "action": {
            "type": "string",
            "enum": [
              "NO_ACTION",
              "ACTION_EDIT",
              "ACTION_REQUIRED"
            ],
            "description": "Indicates the action type."
          },
          "message": {
            "type": "string",
            "description": "An optional template message that can be empty or specified."
          }
        },
        "required": [
          "action",
          "message"
        ],
        "additionalProperties": false
      }
    }
  6. Save and you're all set ✅.
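The step 3 instructions rely on the action filling in the {user}, {proposalLink} and {updated_timestamp} placeholders that the assistant leaves in its MESSAGE. A minimal sketch of that substitution, assuming plain {name} tokens; fillPlaceholders is a hypothetical name rather than the action's actual helper, and the example values are made up.

```typescript
// Hypothetical helper: replaces {name} tokens (e.g. {user}, {proposalLink},
// {updated_timestamp}) with supplied values. Tokens without a value are left
// untouched so a missing variable stays visible in the posted comment.
function fillPlaceholders(template: string, values: Record<string, string>): string {
    return template.replace(/\{(\w+)\}/g, (match: string, key: string) =>
        Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match,
    );
}

// Usage with a shortened version of the ACTION_REQUIRED message.
const reminder = fillPlaceholders('⚠️ {user} Thanks for your [proposal]({proposalLink}).', {
    user: '@contributor',
    proposalLink: 'https://example.com/comment',
});
console.log(reminder); // ⚠️ @contributor Thanks for your [proposal](https://example.com/comment).
```

Leaving unknown tokens in place, rather than replacing them with an empty string, makes a mismatch between the instructions and the action code easy to spot in a test comment.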

ℹ️ Review and Testing

  • Before proceeding with any of the steps, please review the updated System instructions from step 3 so we can adjust them before they are applied.
  • To test the GH action and the updated AI assistant, use my clone at expensify-proposal-testing: the action/assistant response for each comment can be viewed in the repository's Actions section, and you can post different variations of proposal and non-proposal comments on this issue, where I have already performed testing.
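When posting test variations on the clone, it can help to know which mandatory lines a comment actually contains. The sketch below encodes the template rules from sections I and II of the step 3 instructions (optional heading markers, emojis allowed, optional alternatives line) as a deterministic check. It is an illustration only: the PR leaves this classification to the assistant, and deciding whether a comment is really a proposal (section VI, step 2) is not covered. MANDATORY_LINES, normalizeLine and missingMandatoryLines are hypothetical names.

```typescript
// Mandatory lines from section I of the instructions; the optional
// "alternative solutions" line is deliberately not listed.
const MANDATORY_LINES: string[] = [
    'Proposal',
    'Please re-state the problem that we are trying to solve in this issue.',
    'What is the root cause of that problem?',
    'What changes do you think we should make in order to solve the problem?',
    'What specific scenarios should we cover in automated tests to prevent reintroducing this issue in the future?',
];

// Drops everything before the first letter (heading marks, emojis, spaces)
// and trailing '*' or whitespace, so "## 🤖 Proposal" and "Proposal" match.
function normalizeLine(line: string): string {
    return line.replace(/^[^A-Za-z]+/, '').replace(/[*\s]+$/, '');
}

// Returns the mandatory lines that do not appear anywhere in the comment.
function missingMandatoryLines(comment: string): string[] {
    const lines = comment.split('\n').map(normalizeLine);
    return MANDATORY_LINES.filter((required) => lines.indexOf(required) === -1);
}
```

For the invalid example in section III, this reports the root cause and automated tests lines as missing, which is the case where the instructions ask for ACTION_REQUIRED.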

cc @thienlnam @marcochavezf

PR Author Checklist

  • I linked the correct issue in the ### Fixed Issues section above
  • I wrote clear testing steps that cover the changes made in this PR
    • I added steps for local testing in the Tests section
    • I added steps for the expected offline behavior in the Offline steps section
    • I added steps for Staging and/or Production testing in the QA steps section
    • I added steps to cover failure scenarios (i.e. verify an input displays the correct error message if the entered data is not correct)
    • I turned off my network connection and tested it while offline to ensure it matches the expected behavior (i.e. verify the default avatar icon is displayed if app is offline)
    • I tested this PR with a High Traffic account against the staging or production API to ensure there are no regressions (e.g. long loading states that impact usability).
  • I included screenshots or videos for tests on all platforms
  • I ran the tests on all platforms & verified they passed on:
    • Android: Native
    • Android: mWeb Chrome
    • iOS: Native
    • iOS: mWeb Safari
    • MacOS: Chrome / Safari
    • MacOS: Desktop
  • I verified there are no console errors (if there's a console error not related to the PR, report it or open an issue for it to be fixed)
  • I followed proper code patterns (see Reviewing the code)
    • I verified that any callback methods that were added or modified are named for what the method does and never what callback they handle (i.e. toggleReport and not onIconClick)
    • I verified that comments were added to code that is not self-explanatory
    • I verified that any new or modified comments were clear, correct English, and explained "why" the code was doing something instead of only explaining "what" the code was doing.
    • I verified any copy / text shown in the product is localized by adding it to src/languages/* files and using the translation method
      • If any non-English text was added/modified, I used JaimeGPT to get English > Spanish translation. I then posted it in #expensify-open-source and it was approved by an internal Expensify engineer. Link to Slack message:
    • I verified all numbers, amounts, dates and phone numbers shown in the product are using the localization methods
    • I verified any copy / text that was added to the app is grammatically correct in English. It adheres to proper capitalization guidelines (note: only the first word of header/labels should be capitalized), and is either coming verbatim from figma or has been approved by marketing (in order to get marketing approval, ask the Bug Zero team member to add the Waiting for copy label to the issue)
    • I verified proper file naming conventions were followed for any new files or renamed files. All non-platform specific files are named after what they export and are not named "index.js". All platform-specific files are named for the platform the code supports as outlined in the README.
    • I verified the JSDocs style guidelines (in STYLE.md) were followed
  • If a new code pattern is added I verified it was agreed to be used by multiple Expensify engineers
  • I followed the guidelines as stated in the Review Guidelines
  • I tested other components that can be impacted by my changes (i.e. if the PR modifies a shared library or component like Avatar, I verified the components using Avatar are working as expected)
  • I verified all code is DRY (the PR doesn't include any logic written more than once, with the exception of tests)
  • I verified any variables that can be defined as constants (ie. in CONST.ts or at the top of the file that uses the constant) are defined as such
  • I verified that if a function's arguments changed that all usages have also been updated correctly
  • If any new file was added I verified that:
    • The file has a description of what it does and/or why it is needed at the top of the file if the code is not self-explanatory
  • If a new CSS style is added I verified that:
    • A similar style doesn't already exist
    • The style can't be created with an existing StyleUtils function (i.e. StyleUtils.getBackgroundAndBorderStyle(theme.componentBG))
  • If the PR modifies code that runs when editing or sending messages, I tested and verified there is no unexpected behavior for all supported markdown - URLs, single line code, code blocks, quotes, headings, bold, strikethrough, and italic.
  • If the PR modifies a generic component, I tested and verified that those changes do not break usages of that component in the rest of the App (i.e. if a shared library or component like Avatar is modified, I verified that Avatar is working as expected in all cases)
  • If the PR modifies a component related to any of the existing Storybook stories, I tested and verified all stories for that component are still working as expected.
  • If the PR modifies a component or page that can be accessed by a direct deeplink, I verified that the code functions as expected when the deeplink is used - from a logged in and logged out account.
  • If the PR modifies the UI (e.g. new buttons, new UI components, changing the padding/spacing/sizing, moving components, etc) or modifies the form input styles:
    • I verified that all the inputs inside a form are aligned with each other.
    • I added Design label and/or tagged @Expensify/design so the design team can review the changes.
  • If a new page is added, I verified it's using the ScrollView component to make it scrollable when more elements are added to the page.
  • I added unit tests for any new feature or bug fix in this PR to help automatically prevent regressions in this user flow.
  • If the main branch was merged into this PR after a review, I tested again and verified the outcome was still expected according to the Test steps.

Screenshots/Videos

Android: Native
Android: mWeb Chrome
iOS: Native
iOS: mWeb Safari
MacOS: Chrome / Safari
MacOS: Desktop

ikevin127 requested a review from a team as a code owner January 11, 2025 01:53
melvin-bot requested a review from dominictb January 11, 2025 01:53

melvin-bot commented Jan 11, 2025

@dominictb Please copy/paste the Reviewer Checklist from here into a new comment on this PR and complete it. If you have the K2 extension, you can simply click: [this button]

melvin-bot removed the request for review from a team January 11, 2025 01:53

ikevin127 (Contributor, Author) commented Jan 11, 2025

@dominictb This will not require C+ review.

Note for reviewers

The other, non-Proposal-Police GH action files changed only because of the CONST file update, where I corrected one of the action constants and added a new one.

thienlnam (Contributor) left a comment

Looking good so far; wondering if we can simplify the response even more.

Comment on lines 63 to 64
const isNoAction = action.trim().toUpperCase() === CONST.NO_ACTION;
const isActionRequired = action.trim().toUpperCase() === CONST.ACTION_REQUIRED;
Contributor

NAB: we define the possible values for the action, so it shouldn't be necessary to trim or case-match the values.

Contributor Author

I kept this from the old logic in case the response came back lowercased, but since the structured-response json_schema enum only allows the uppercase values as specified, I think toUpperCase() can safely be removed.

Comment on lines 66 to 74
// If assistant response is NO_ACTION and there's no message, do nothing
if (isNoAction && !message) {
console.log('Detected NO_ACTION for comment, returning early.');
return;
}

// if the assistant responded with no action but there's some context in the response
if (assistantResponse.includes(`[${CONST.NO_ACTION}]`)) {
// extract the text after [NO_ACTION] from assistantResponse since this is a
// bot related action keyword
const noActionContext = assistantResponse.split(`[${CONST.NO_ACTION}] `).at(1)?.replace('"', '');
console.log('[NO_ACTION] w/ context: ', noActionContext);
// if the assistant responded with no action but there's some context in the message
if (isNoAction && !!message) {
console.log('[NO_ACTION] with Message: ', message);
Contributor

I don't think we need to check this; we can just validate through the action. From previous usage, I've found that unless you're extremely explicit about not adding a message when the action is NO_ACTION, the model will still add one, but that's fine since we have it choose the action explicitly.
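Taking both review points together (the schema's enum already fixes the casing, and only the action should drive behavior), the handler can branch on the action with strict equality and treat any message attached to NO_ACTION as noise. A sketch under those assumptions; BotStep and decideStep are hypothetical names, not the code pushed in this PR.

```typescript
type Action = 'NO_ACTION' | 'ACTION_EDIT' | 'ACTION_REQUIRED';

interface AssistantResponse {
    action: Action;
    message: string;
}

// Hypothetical description of what the workflow should do next.
type BotStep =
    | {kind: 'skip'}
    | {kind: 'postComment'; body: string}
    | {kind: 'editComment'; notice: string};

// Decides purely from `action`; no trim/toUpperCase, since the enum fixes
// the casing. A stray message on NO_ACTION is only logged, never posted.
function decideStep(response: AssistantResponse): BotStep {
    switch (response.action) {
        case 'NO_ACTION':
            if (response.message) {
                console.log('[NO_ACTION] ignoring message:', response.message);
            }
            return {kind: 'skip'};
        case 'ACTION_REQUIRED':
            return {kind: 'postComment', body: response.message};
        case 'ACTION_EDIT':
            return {kind: 'editComment', notice: response.message};
        default: {
            // Exhaustiveness check: a new enum value must be handled above.
            const unhandled: never = response.action;
            throw new Error(`Unhandled action: ${String(unhandled)}`);
        }
    }
}
```

The never check in the default branch makes the compiler flag any enum value added to the schema later but not handled here.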

} else if (assistantResponse.includes('[EDIT_COMMENT]') && !payload.comment?.body.includes('Edited by **proposal-police**')) {
// extract the text after [EDIT_COMMENT] from assistantResponse since this is a
// edit comment if assistant detected substantial changes
} else if (isActionRequired && message.includes('[EDIT_COMMENT]')) {
Contributor

Are we still including EDIT_COMMENT in the message? Should the message just be the GH comment, with a separate ACTION_EDIT action?

Contributor Author

Sure, I'll create a new action and test it on my clone first to make sure it works as expected.
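With a dedicated ACTION_EDIT, the assistant's message can serve directly as the edit notice. A minimal sketch, assuming the notice is appended to the end of the edited proposal comment (the real placement is not shown here) and reusing the `Edited by **proposal-police**` marker from the guard in the diff above so a comment is not flagged twice; applyEditNotice and EDIT_MARKER are hypothetical names.

```typescript
// Marker string taken from the existing guard in the diff above.
const EDIT_MARKER = 'Edited by **proposal-police**';

// Returns the new comment body, or null when no edit should be made.
// Assumption: the notice goes after a blank line at the end of the body.
function applyEditNotice(body: string, notice: string): string | null {
    if (!notice || body.indexOf(EDIT_MARKER) !== -1) {
        // Nothing to add, or the comment was already flagged once.
        return null;
    }
    return `${body}\n\n${notice}`;
}
```

Returning null instead of the unchanged body lets the caller skip the GitHub API call entirely when there is nothing to edit.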

ikevin127 (Contributor, Author)

@thienlnam Just pushed the requested changes and also updated the PR description:

  • AI Assistant instructions
  • AI Assistant json_schema

to comply with the request from #55108 (comment).
