server : (refactoring) do not rely on JSON internally #10643
@ggerganov Because this refactoring changes quite a lot of code, I think it would be better to split it into 2 parts:
So the current I'm also tagging @slaren in case you want to leave some suggestions on my approach to polymorphism in this PR. Thank you!
I don't have a good overall view of how the server is implemented and what this is doing, but there are several red flags that don't look right to me.
Again, I do not have a good overall view of the server implementation to make specific recommendations, but that's not what I would expect from a class hierarchy. Generally, you should look into abstracting the interface into a few functions, and implement these in the derived classes. Casts from the base class to the derived class should never be necessary.
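To make the recommendation above concrete, here is a minimal sketch of a result hierarchy where callers only use virtual functions on the base type. All names (`server_task_result`, `result_completion`, `result_error`, `serialize_all`) are hypothetical and chosen for illustration, and a plain `std::string` stands in for the server's JSON type so the sketch has no dependencies. It is not the code from this PR.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base type: callers only ever see this interface.
struct server_task_result {
    virtual ~server_task_result() = default;
    virtual bool is_error() const { return false; }
    // std::string stands in for a JSON value; no escaping is done in this sketch.
    virtual std::string to_json() const = 0;
};

// Hypothetical derived type for a successful completion.
struct result_completion : server_task_result {
    std::string content;
    std::string to_json() const override {
        return "{\"content\":\"" + content + "\"}";
    }
};

// Hypothetical derived type for an error.
struct result_error : server_task_result {
    std::string message;
    bool is_error() const override { return true; }
    std::string to_json() const override {
        return "{\"error\":\"" + message + "\"}";
    }
};

using result_ptr = std::unique_ptr<server_task_result>;

// The caller works purely through the base interface: no cast to a derived type.
std::string serialize_all(const std::vector<result_ptr> & results) {
    std::string out = "[";
    for (std::size_t i = 0; i < results.size(); ++i) {
        if (i > 0) {
            out += ",";
        }
        out += results[i]->to_json();
    }
    return out + "]";
}
```

With this shape, adding a new result type means adding a new derived struct; code like `serialize_all` does not need to change or inspect the concrete type.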
The goal to limit the use of JSON object in the server implementation is good, but the proposed implementation has some deficiencies. @slaren highlighted most of the issues. IMO the server-result polymorphism is not warranted in this case and introduces unnecessary complexity. I would recommend to have a single
Honestly I'm pretty new to cpp polymorphism and thanks to the points that @slaren highlighted, I understand it more clearly now.
I think having prefixed may be worse to manage than the current JSON approach. Having nested struct can be cleaner, but I think it's kinda polymorphism, which better to do with proper cpp Anw, I'll try to implement
Keep in mind that if you end up needing
So I've been able to refactor all JSON-related function into I do still use
Hint: You can compile and run the tests in a single command, which is useful for local development:

```shell
cmake --build build -j --target llama-server && ./examples/server/tests/tests.sh
```
@ggerganov FYI, the change in `tests.sh` should allow you to run the test script from anywhere, without needing to `cd tests` first.
    }

    json to_json_oai_compat() {
        std::string finish_reason = "length";
@Nero7991 I'm gonna merge this PR soon. You can adapt your PR #10645 to take advantage of this `to_json_oaicompat()`. Please note that for the `/completion` endpoint, `oaicompat_chat` will be false.
Since this is a private function, you can even refactor this function into `to_json_oaicompat` and `to_json_oaicompat_chat` for these 2 different cases, then have an if..else branch in `to_json` to select the correct one.
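The suggested split could look roughly like the sketch below. The struct name `completion_result_sketch` is hypothetical, the JSON shapes are heavily simplified, and `std::string` stands in for the server's `json` type so the example has no dependencies. Only the names `to_json`, `to_json_oaicompat`, `to_json_oaicompat_chat` and `oaicompat_chat` come from the comment above.

```cpp
#include <string>

// Hypothetical result type illustrating the if..else dispatch in to_json().
struct completion_result_sketch {
    std::string content;
    // Per the comment above, this is false for the /completion endpoint.
    bool oaicompat_chat = false;

    // Public entry point: selects the private helper for the current case.
    std::string to_json() const {
        if (oaicompat_chat) {
            return to_json_oaicompat_chat();
        } else {
            return to_json_oaicompat();
        }
    }

private:
    // Simplified text-completion shape (assumption; no escaping is done).
    std::string to_json_oaicompat() const {
        return "{\"choices\":[{\"text\":\"" + content + "\"}]}";
    }

    // Simplified chat-completion shape (assumption; no escaping is done).
    std::string to_json_oaicompat_chat() const {
        return "{\"choices\":[{\"message\":{\"content\":\"" + content + "\"}}]}";
    }
};
```

Keeping both helpers private means callers only depend on `to_json()`, so the two response shapes can evolve independently.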
server : (refactoring) reduce usage of json internally
* move all response types to struct
* wip [no ci]
* many fixes
* add virtual function
* fix index
* minor style fix
* add std::move
* refactor handle_completions_generic
* add virtual functions
* remove server.hpp
* clarify server_sent_event RFC specs
* apply review comments
* fix model_alias and completion_probabilities
* small clean up
* remove virtual for to_json_oai_compat()
* naming oai_compat --> oaicompat
* fix unwanted recursive call
* update docs
hmm ok seems like will fix that as soon as I get home
Motivation

Currently, the internal code of `server.cpp` depends too much on the `json` type, to the point that we're kinda abusing JSON to circumvent doing proper `struct` in the code.

Here is how the server processes input/output data currently:
`launch_slot_with_task` to put correct data into `slot`
Proposal
In this PR, I propose that we only handle JSON in HTTP thread:
struct
json
struct
struct
to JSON

Changes to the API
- `/slots` and `/completions`: `stopped_eos`, `stopped_word`, `stopped_limit` are replaced by an enum string `stop_type`
- `/chat/completions`: fixed `finish_reason` returning an incorrect value. If generation is stopped due to a stop word or EOS, `finish_reason="stop"`. Otherwise, `finish_reason="length"`
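As an illustration of the two API changes above, the sketch below replaces the three booleans with one enum and derives `finish_reason` from it. The enum name `stop_type_sketch`, its enumerators, the string values (`"none"`, `"eos"`, `"word"`, `"limit"`) and both helper functions are assumptions made for this example; only the `"stop"` / `"length"` rule is taken from the description.

```cpp
#include <string>

// Hypothetical enum replacing the stopped_eos / stopped_word / stopped_limit booleans.
enum class stop_type_sketch {
    none,
    eos,
    word,
    limit,
};

// Assumed string form reported as `stop_type` in the response.
std::string stop_type_to_str(stop_type_sketch t) {
    switch (t) {
        case stop_type_sketch::eos:   return "eos";
        case stop_type_sketch::word:  return "word";
        case stop_type_sketch::limit: return "limit";
        case stop_type_sketch::none:  break;
    }
    return "none";
}

// Rule from the description: a stop word or EOS gives "stop", anything else gives "length".
std::string finish_reason_for(stop_type_sketch t) {
    if (t == stop_type_sketch::eos || t == stop_type_sketch::word) {
        return "stop";
    }
    return "length";
}
```

A single enum also rules out contradictory states, such as two of the old booleans being true at once.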
TODO: