-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Missing OpenAI functionality #116
Comments
Hey @Tostino, thanks for looking into the project.
This is something we might look post merge of the vectorizer branch. We've noticed some things from the openAI library as well, that might steer us in the direction of using http calls made by ourselves directly. What tests are giving you issues? Can you share the logs. We want to improve devex regarding the repo, so any feedback is welcome. We are merging this 2 projects together (vectorizer and extension) into one repository so things might be a little cumbersome while we iron out the wrinkles. Thanks. |
I don't know if that is necessary. The python OpenAI library is comprehensive and I haven't found anything that isn't possible using their client, even with third party open source inference servers. Even if you need to pass in extra parameters for your specific environment or use case. What specifically are you seeing that is steering you in the direct http call direction? e.g. if I want to use Beam Search with vLLM, I can pass that into the API call using the extra_body parameter like so:
This is hitting a vLLM server started using the following command (for reproducibility, this ran on 1x 3090):
As far as failing tests, they are here: https://github.com/Tostino/pgai/actions/runs/11337534000/job/31529286375 Specifically the Would be nice if there was a simple way to output a new version of the files if there were changes and get the changed version in place to be committed...but that would really only work if you can run the tests locally (which I can't yet).
As far as my issues running tests with podman, there are two I found so far... The first was this line |
The library uses a lot of typing mechanism and casting that increase the CPU consumption under heavy load. As a first test we just remove the library and it decrease CPU consumption considerably. We parked this small research for now. I think there's a function in the openAI library that returns the raw response directly, but we didn't try anything further. BTW we've seen your other issues and comments in the Discord, we are super focus on finishing the Vectorizer feature. That's why there's been a little bit of radio silence on our part. I apologize for that. |
No problem at all.
A good portion of the overhead time is spent creating the client for each and every request. I had already planned on doing some work on caching the client between requests, because I know there is some serious overhead in setting that up for every call from a past project. I spent some time on that today, as well as writing some benchmark code to quantify the changes. Edit: I had a bunch of benchmark results here, but I was in a bit of a rush getting them out and overlooked some issues in my benchmark code with the threading / db connections that were introducing an artificial bottleneck...will work on getting a correct benchmark written first and then will repost results. |
So I spent way too much time benchmarking and digging into issues with the async client when in the plpython environment and outside of it. There is a case to be made for overhead from the OpenAI wrapper for actually making the chat completion call. The best I could get with using the OpenAI client was ~2.5-3ms per call for the chat completion endpoint. I think that is reasonably acceptable, considering the overhead for re-creating the client on every call is ~20-30ms. I am still having issues though. I am getting inconsistent performance inside the plpython environment. My first call to the endpoint is quick (3ms), and then every call after it is very delayed (40ms). Need to figure out what is happening here. I cannot reproduce outside of plpython. |
Hey @Tostino can you open a PR with your changes so that we can review and discuss them? |
@alejandrodnm Working on it...the rebase is a bit of a pain. Looks like you changed how the API keys were handled, so that's taking a bit for me to integrate. |
Alright, open: #219 Keep in mind, this still has the unexplained 2nd run latency in this PR. I hadn't had a chance to dig into why. |
Just wanted to mention that I am looking at the
vectorizor
branch, as that seems to be the active development branch as far as I can tell.The current OpenAI API is not really compliant with the python version of the API, restricting it's use.
The extension is missing a bunch of arguments for the openai endpoints, and not all of the arguments that are there behave in-line with the API spec. Some of the missing arguments are needed for using proxy solutions or custom samplers on open source inference servers for example,
The return types for the embed endpoints also seems like it should just be json, because there are options to return the data in either base64 or floating point numbers for example.
I think the functions that provide a "simple" wrapper should be designed on top of the raw functions that return json rather than strict data types. Probably in a bit more structured a way than the current "simple" chat completion function. I haven't gotten to this yet. Having a full featured base set of functions was my priority first.
Either way, I put in a good bit of work into a branch, but cannot get the test suite to pass due to white-space issues with the
expected
files. I'm at my wits end with messing with the test suite considering I cannot run it locally because your build scripts aren't compatible with podman and that's what I use on my dev machine.Comparison with branch: https://github.com/timescale/pgai/compare/vectorizer...Tostino:pgai:fix-openai-api?expand=1
The last major argument I haven't taken into consideration here is
timeout
which will need a little more thought because it needs to be supported at both the client level as well as the function level.Was planning on doing that after getting some feedback.
The text was updated successfully, but these errors were encountered: