Triton inference server support #26
base: main
Conversation
from langchain.chains import LLMChain
from langchain import Cohere, PromptTemplate

# Optional imports
from googleapiclient.discovery import build


class Tool:
Needs an update function that can be called; the retrieval and calendar tools will need this.
What should the update do?
Pass in the data; the update will use the text, URL, etc. to set up for the generation.
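One possible shape for the update hook discussed above, as a minimal sketch. The method name, fields, and the `RetrievalTool` subclass are assumptions for illustration, not part of this PR:

```python
class Tool:
    """Base tool; subclasses override update() to ingest fresh data."""

    def __init__(self):
        self.data = None

    def update(self, data):
        # Store the latest text/URL/etc. so later generations can use it.
        self.data = data


class RetrievalTool(Tool):
    # Hypothetical subclass: a real retrieval tool would re-index here.
    def update(self, data):
        self.data = {"documents": data}
```

A calendar tool could similarly override `update()` to refresh its current-date context before generation.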
selected_start_tokens = probs[:, insert_api_at, start_tokens].argmax().item()

for i in range(m):
    _, api_calls = await model(
Need to add in the tool here before sending, e.g. [Calendar(
hmm, shouldn't that already be in the annotation prompt?
async def sample_and_filter_api_calls(tool, text, top_k, n_gen):
    async for tool_use in sample_api_calls(
Should chunk this; otherwise we're cutting off everything past 512 (?) tokens.
Needs a tool update function for current data
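A minimal sketch of the chunking suggested above, so nothing past the context limit is silently dropped. The function name, the 512-token limit, and the overlap size are assumptions for illustration:

```python
def chunk_tokens(tokens, max_len=512, overlap=64):
    """Split a long token sequence into overlapping windows.

    Overlap keeps context across chunk boundaries; every token
    appears in at least one chunk, so nothing is cut off.
    """
    step = max_len - overlap
    return [
        tokens[i:i + max_len]
        for i in range(0, max(len(tokens) - overlap, 1), step)
    ]
```

Each chunk could then be passed through `sample_and_filter_api_calls` independently and the results merged.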
Cool with putting this into a dev branch and working on the stuff I found independently if that sounds good, or we can update this.
This PR adds support for the Triton inference server. Might be worth @dmahan93 or @conceptofmind trying it out to verify. Also maybe worth adding a non-Triton `infer_model` function that just loads the `.pt` file and runs it in process.
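A rough sketch of what the suggested non-Triton path could look like, assuming the `.pt` checkpoint is a TorchScript archive loadable via `torch.jit.load`; the function name and signature are assumptions, and the repo's actual checkpoint format may differ:

```python
import torch


def infer_model(checkpoint_path, device="cpu"):
    """Load a serialized .pt model and run it in-process,
    as an alternative to routing requests through Triton."""
    model = torch.jit.load(checkpoint_path, map_location=device)
    model.eval()  # inference mode: disable dropout, etc.
    return model
```

Callers would then invoke the returned module directly, e.g. `infer_model("model.pt")(input_ids)`, instead of issuing a Triton request.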