Triton inference server support #26
base: main
Conversation
from langchain.chains import LLMChain
from langchain import Cohere, PromptTemplate

# Optional imports
from googleapiclient.discovery import build


class Tool:
Needs an update function that can be called; the retrieval and calendar tools will need this.
What should the update do?
Pass in the data; the update will use the text, URL, etc. to set up for the generation.
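One possible shape for the update hook discussed above, as a minimal sketch. The method name, fields, and the `RetrievalTool` subclass are assumptions for illustration, not part of this PR:

```python
class Tool:
    """Base tool; subclasses override update() to ingest fresh data."""

    def __init__(self):
        self.data = None

    def update(self, data):
        # Store the latest text/URL/etc. so later generations can use it.
        self.data = data


class RetrievalTool(Tool):
    # Hypothetical subclass: a real retrieval tool would re-index here.
    def update(self, data):
        self.data = {"documents": data}
```

A calendar tool could similarly override `update()` to refresh its current-date context before generation.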
selected_start_tokens = probs[:, insert_api_at, start_tokens].argmax().item()

for i in range(m):
    _, api_calls = await model(
Need to add in the tool here before sending, e.g. [Calendar(
hmm, shouldn't that already be in the annotation prompt?
async def sample_and_filter_api_calls(tool, text, top_k, n_gen):
    async for tool_use in sample_api_calls(
Should chunk this; otherwise we're cutting off everything past 512 (?) tokens.
Needs a tool update function for current data
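A minimal sketch of the chunking suggested above, so nothing past the context limit is silently dropped. The function name, the 512-token limit, and the overlap size are assumptions for illustration:

```python
def chunk_tokens(tokens, max_len=512, overlap=64):
    """Split a long token sequence into overlapping windows.

    Overlap keeps context across chunk boundaries; every token
    appears in at least one chunk, so nothing is cut off.
    """
    step = max_len - overlap
    return [
        tokens[i:i + max_len]
        for i in range(0, max(len(tokens) - overlap, 1), step)
    ]
```

Each chunk could then be passed through `sample_and_filter_api_calls` independently and the results merged.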
Cool with putting this into a dev branch and working on the stuff I found independently if that sounds good, or we can update this.
This PR adds support for the Triton inference server. Might be worth @dmahan93 or @conceptofmind trying it out to verify. Also maybe worth adding a non-Triton `infer_model` function that just loads the `.pt` file and runs it in process.
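A rough sketch of what the suggested non-Triton path could look like, assuming the `.pt` checkpoint is a TorchScript archive loadable via `torch.jit.load`; the function name and signature are assumptions, and the repo's actual checkpoint format may differ:

```python
import torch


def infer_model(checkpoint_path, device="cpu"):
    """Load a serialized .pt model and run it in-process,
    as an alternative to routing requests through Triton."""
    model = torch.jit.load(checkpoint_path, map_location=device)
    model.eval()  # inference mode: disable dropout, etc.
    return model
```

Callers would then invoke the returned module directly, e.g. `infer_model("model.pt")(input_ids)`, instead of issuing a Triton request.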