Add support for OSS models via HuggingFace endpoints #22

Open
jgpruitt opened this issue Jun 10, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@jgpruitt
Collaborator

No description provided.

@jgpruitt jgpruitt added the enhancement New feature or request label Jun 10, 2024
@Tostino

Tostino commented Sep 11, 2024

Yup, the only thing missing is passing a base URL into the OpenAI client. If it did that, this extension would be compatible with open source endpoints like vLLM.
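A minimal sketch of what that could look like, not the extension's actual code. It assumes the OpenAI Python client (v1 or later), whose constructor accepts a `base_url` keyword. The helper names `build_client_kwargs` and `make_client` are hypothetical, and the URL in the usage note is only an illustrative local address.

```python
from typing import Optional


def build_client_kwargs(api_key: Optional[str], base_url: Optional[str] = None) -> dict:
    """Collect keyword arguments for openai.OpenAI(...).

    Hypothetical helper: values that are not set are left out, so the
    client falls back to its own defaults for them.
    """
    kwargs = {}
    if api_key is not None:
        kwargs["api_key"] = api_key
    if base_url:
        kwargs["base_url"] = base_url
    return kwargs


def make_client(api_key: Optional[str] = None, base_url: Optional[str] = None):
    # Imported lazily so build_client_kwargs stays usable without the package.
    from openai import OpenAI

    return OpenAI(**build_client_kwargs(api_key, base_url))
```

Pointing the client at a self-hosted server would then be something like `make_client(api_key="EMPTY", base_url="http://localhost:8000/v1")`, assuming the server exposes an OpenAI-compatible API at that address.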

@Tostino

Tostino commented Oct 6, 2024

@jgpruitt I whipped up an implementation tonight that gets all the OpenAI endpoints working with a base_url parameter and a matching setting (similar to api_key).

Tested and working with vLLM's endpoint.

One thing I think I still need to do is support sending extra parameters to the client through a JSON payload (or something similar). This would let users pass in any custom settings the open source endpoints support (e.g. custom sampler settings like beam search). After I get that going and add some tests, I'll open a PR.
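One way the JSON payload idea could be sketched, assuming the OpenAI Python client (v1 or later), whose request methods accept an `extra_body` dict that is merged into the request body. The function name `build_chat_request` and the `extra_params` argument are hypothetical, and which extra keys a given server honors (for example `use_beam_search`) is an assumption that depends on that server.

```python
import json
from typing import Optional


def build_chat_request(model: str, messages: list, extra_params: Optional[str] = None) -> dict:
    """Build keyword arguments for client.chat.completions.create(...).

    extra_params is a JSON object (as text) holding server-specific
    settings; it is forwarded untouched through extra_body.
    """
    kwargs = {"model": model, "messages": messages}
    if extra_params:
        extra = json.loads(extra_params)
        if not isinstance(extra, dict):
            raise ValueError("extra_params must be a JSON object")
        kwargs["extra_body"] = extra
    return kwargs
```

The result would be used as `client.chat.completions.create(**build_chat_request(...))`. Keeping the extras in a single JSON argument means the SQL function signature does not have to change each time an endpoint adds a setting.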

@Tostino

Tostino commented Oct 12, 2024

@jgpruitt I'd like to pick your brain about the parameters and return types for pgai, if that's alright. I am trying to get the Postgres wrapper to fully match the OpenAI Python client as specified (other than things like streaming, which can't be supported), and I see some differences I'd like clarification on.

For example, why do the embedding functions return text or a table, depending on the input parameters, rather than consistent JSON? Why follow the JSON return type for the chat completions endpoint but not for the embed endpoint?

I'd really like to get some consistency here, and I would rather upstream the changes than create a fork or new project to get it... but I also understand not breaking compatibility if that is a requirement at this point.

Another note: the embed endpoint has an encoding_format parameter, which can be either base64 or float (the default), and base64 breaks when the function returns a vector type.

IMO, the best way to deal with this is to have two versions of each function: one that returns the raw json(b) from the call, and another that calls the json(b) function and parses out the data to return something a bit friendlier to use from SQL.
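To make the "parse the raw JSON" half concrete, here is a rough sketch of a parser that accepts either encoding_format. It assumes the usual embeddings response shape (a `data` list whose items carry `index` and `embedding`) and assumes the base64 form is a little-endian float32 array; both function names are hypothetical.

```python
import base64
import json
import struct
from typing import List


def decode_embedding(value) -> List[float]:
    """Return one embedding as floats, whichever encoding_format was used."""
    if isinstance(value, str):
        # Assumption: base64 text wraps packed little-endian float32 values.
        raw = base64.b64decode(value)
        count = len(raw) // 4
        return list(struct.unpack(f"<{count}f", raw))
    # encoding_format="float": already a JSON array of numbers.
    return [float(x) for x in value]


def embeddings_from_response(response_json: str) -> List[List[float]]:
    """Parse a raw embeddings response into vectors ordered by input index."""
    payload = json.loads(response_json)
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [decode_embedding(item["embedding"]) for item in items]
```

With a split like this, the raw function can pass any encoding_format through unchanged, and only the friendly wrapper needs to know how to turn the result into a vector.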

@Tostino

Tostino commented Oct 12, 2024

@jgpruitt Oh... I just dug into some of the other branches before submitting a PR. Looks like you already handled this, as well as a few of the other things I was working on. If I want to migrate my non-conflicting changes onto the branch that will be used for the 0.4.0 release, which branch should I use?

@nicoscordialo

Hey @Tostino, we'd also very much appreciate being able to use HuggingFace models here. Do you have any updates on where this is at? Cheers!

@BW-Projects

@Tostino Same here. It would be nice to have OpenAI-compatible APIs work, such as https://github.com/michaelfeil/infinity

@Tostino

Tostino commented Dec 26, 2024

Here is where it sits @nicoscordialo and @BW-Projects: #219

There is currently a known performance issue that is blocking the merge. If anyone wants to attempt a fix, it would push things along. I am out of free time for a while, and when I tried to spend some time on the problem today, the project's build system had changed enough to break the build for me.

4 participants