
feat(openai): added support for o1 reasoning models #1618

Merged
merged 7 commits into from
Dec 9, 2024

Conversation

KeganHollern
Contributor

@KeganHollern KeganHollern commented Dec 7, 2024

Summary

  1. Fixes a bug with systemRoleSupported for the OpenAI endpoint.
  2. Uses max_completion_tokens instead of max_tokens when querying the chat_completions endpoint.

The bugfix is necessary because the o1-preview and o1-mini models do not support "system" message types.
The token configuration change is necessary because o1-preview and o1-mini no longer accept the now-deprecated max_tokens field.
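The switch described here can be sketched as a small helper that picks which field to put in the request body. This is an illustrative sketch, not chat-ui's actual code; the function and type names are hypothetical.

```typescript
// Hypothetical sketch: choose between the deprecated `max_tokens` field and
// the newer `max_completion_tokens` field for a chat_completions request body.
type TokenParams = { max_tokens?: number; max_completion_tokens?: number };

function buildTokenParams(useCompletionTokens: boolean, maxNewTokens: number): TokenParams {
  // o1-family models reject the deprecated `max_tokens` field, so callers
  // targeting them should pass useCompletionTokens = true.
  return useCompletionTokens
    ? { max_completion_tokens: maxNewTokens }
    : { max_tokens: maxNewTokens };
}
```

The resulting object would then be spread into the chat_completions request payload alongside the messages.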

Considerations

I added a new endpoint configuration field called useCompletionTokens and defaulted it to true.
This changes the default chat_completions call to use max_completion_tokens instead of max_tokens.
There may be unintended consequences for users, as this could increase their overall token usage per query (since the default would change).
It may be prudent to start with this field set to false and later switch it to true as part of a planned deprecation of max_tokens altogether.
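For reference, the new field would be set on the endpoint entry of a model's configuration. This is a hypothetical fragment based on the field name described in this PR; the exact placement in chat-ui's config may differ.

```json
{
  "endpoints": [{
    "type": "openai",
    "useCompletionTokens": true
  }]
}
```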

Another consideration: when the system prompt is empty, my change for system messages will leave a {"role": "user", "content": ""} entry. We may want to drop the system message entirely if its content is blank and system messages are not supported.
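The behavior discussed above (downgrading system messages to user messages, and dropping them when blank) could be sketched as follows. Names here are illustrative, not chat-ui's actual implementation.

```typescript
// Hypothetical sketch of system-message handling for models that do not
// support the "system" role (e.g. o1-preview, o1-mini).
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

function adaptSystemMessages(
  messages: ChatMessage[],
  systemRoleSupported: boolean
): ChatMessage[] {
  if (systemRoleSupported) return messages;
  return messages.flatMap((m): ChatMessage[] => {
    if (m.role !== "system") return [m];
    // Drop blank system prompts entirely rather than emitting an empty user turn.
    if (m.content.trim() === "") return [];
    // Downgrade non-empty system prompts to a user message.
    return [{ role: "user", content: m.content }];
  });
}
```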

Reference

For future readers wanting to introduce o1 reasoning models based on this PR, here is an example model configuration for o1-mini:

{
  "name": "o1-mini",
  "displayName": "o1 mini test",
  "multimodal": false,
  "description": "ChatGPT o1-mini",
  "systemRoleSupported": false,
  "parameters": {
    "max_new_tokens": 2048
  },
  "endpoints": [{
    "type": "openai"
  }]
}

@KeganHollern KeganHollern marked this pull request as ready for review December 7, 2024 02:59
@evalstate
Contributor

This is cool - I had a crack at this a while ago (#1540 (comment)), at which point streaming wasn't supported, so I had to do some extra messing around (I capture token usage too).

@nsarrazin added the enhancement (new feature or request), back (Svelte backend/DB), and models (model performance/reliability) labels Dec 9, 2024
@nsarrazin
Collaborator

Hi! Thanks for the contrib. I made a couple changes:

  • Added an example in the doc
  • Made the parameter false by default (like you said, probably better to not change the default behaviour for now)
  • Fixed the linting

Overall this looks great, tried it locally and it seems to work well. Will merge once the checks are done. 🚀

@nsarrazin nsarrazin merged commit 9542b2c into huggingface:main Dec 9, 2024
4 checks passed