All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Added new DeepSeek V3 model via DeepSeek (alias `dschat` or simply `ds`, because they are in a category of their own), Fireworks.ai and Together.ai (`fds` and `tds` for hosted DeepSeek V3, respectively). Added Qwen 2.5 Coder 32B (alias `fqwen25c` or `tqwen25c` for Fireworks.ai and Together.ai, respectively).
- Added the reasoning Qwen QwQ 32B hosted on Together.ai.
- Added OpenAI's new o1 model to the model registry (alias `o1`).
- Added an assertion in `response_to_message` for a missing `:tool_calls` key in the response message. It's a model failure, but it wasn't obvious from the original error.
- Fixed an error with usage information sent in CamelCase by OpenAI-compatible servers (the Gemini proxy now sends it in CamelCase).
- Added a new Gemini 2.0 Flash Experimental model (`gemini-2.0-flash-exp`) and updated the alias `gem20f` with it.
- Added a new `cache=:all_but_last` cache strategy for Anthropic models to enable caching of the entire conversation except for the last user message (useful for longer conversations that you want to re-use, but not continue). See the docstrings for more information on which cache strategy to use.
- Added a new Gemini Experimental model from December 2024 (`gemini-exp-1206`) and updated the `gemexp` alias to point to it.
- Added support for Groq's new Llama 3.3 models. Updated the `gllama370`, `gl70`, `glm` aliases to `llama-3.3-70b-versatile` and added the `gl70s`, `glms` aliases for `llama-3.3-70b-specdec` (faster with speculative decoding).
- Fixed a bug in `extract_docstring` where it would not correctly block "empty" docstrings on Julia 1.11.
- Removed unnecessary printing to `stdout` during precompilation in `precompile.jl`.
- Fixed a "bug-waiting-to-happen" in tool use. `to_json_type` now requires users to provide concrete types, because abstract types can lead to errors during JSON3 deserialization.
- Flowed through a bug fix in `StreamCallback` where the usage information was being included in the response even when `usage=nothing`. The lower bound of `StreamCallbacks` was bumped to `0.5.1`.
- Changed the official ENV variable for MistralAI API from `MISTRALAI_API_KEY` to `MISTRAL_API_KEY` to be compatible with the Mistral docs.
- Added a new Gemini Experimental model from November 2024 (`gemini-exp-1121` with alias `gemexp`).
- Added a new `AnnotationMessage` type for keeping human-only information in message chains. See `?annotate!` on how to use it.
- Added a new `ConversationMemory` type (exported) to enable long multi-turn conversations with a truncated memory of the conversation history. Truncation works in "batches" so that it does not prevent caching. See `?ConversationMemory` and `get_last` for more information.
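
The batching idea behind `ConversationMemory` can be illustrated with a small sketch. This is not the package's implementation. `truncate_in_batches`, `max_len`, and `batch_size` are hypothetical names, and plain strings stand in for messages, with the first one playing the system message. Old turns are dropped in whole batches, so the start of the kept history stays the same for several turns in a row. A prefix-based prompt cache needs exactly that to keep hitting.

```julia
# Hypothetical helper (not the ConversationMemory implementation):
# keep the first (system) message and drop older turns in whole batches,
# so the kept prefix only changes once every `batch_size` new turns.
function truncate_in_batches(msgs::AbstractVector, max_len::Int, batch_size::Int)
    @assert max_len >= batch_size >= 1 "need max_len >= batch_size >= 1"
    isempty(msgs) && return collect(msgs)
    system, rest = msgs[1], msgs[2:end]
    excess = length(rest) - max_len
    excess <= 0 && return collect(msgs)          # nothing to truncate yet
    # Round the number of dropped turns up to a whole multiple of batch_size
    ndrop = cld(excess, batch_size) * batch_size
    return vcat([system], rest[(ndrop + 1):end])
end
```

With `max_len = 10` and `batch_size = 5`, every history of 11 to 15 turns keeps turn 6 as its first message after the system message. The cut only moves forward, to turn 11, once the history reaches 16 turns.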
- Changed the ENV variable for MistralAI API from `MISTRALAI_API_KEY` to `MISTRAL_API_KEY` to be compatible with the Mistral docs.
- Added support for images in `aitools` to enable passing screenshots via the `image_path` argument (extended to both OpenAI and Anthropic APIs; uses `UserMessageWithImages` internally, see `?UserMessageWithImages`).
- Added the latest Gemini Experimental model via OpenAI compatibility mode (`gemini-exp-1114` with alias `gemexp`).
- Added support for Google's Gemini API via OpenAI compatibility mode (`GoogleOpenAISchema`). Use model aliases `gem15p` (Gemini 1.5 Pro), `gem15f` (Gemini 1.5 Flash), and `gem15f8` (Gemini 1.5 Flash 8b). Set your ENV API key `GOOGLE_API_KEY` to use it.
- Thanks to @sixzero, added support for Google Flash via OpenRouter and Qwen 72b models.
- Fixed a bug in `tool_call_signature` where hidden fields were not hidden early enough, so it would fail if a Dict argument was provided. Previously the fields were masked after processing, but Dicts cannot be processed that way, so the fields are now masked upfront.
- Added a new Claude 3.5 Haiku model (`claude-3-5-haiku-latest`) and updated the alias `claudeh` with it.
- Added support for xAI's Grok 2 beta model (`grok-beta`) and updated the alias `grok` with it. Set your ENV API key `XAI_API_KEY` to use it.
- Added a new `extras` field to `ToolRef` to enable additional parameters in the tool signature (eg, `display_width_px`, `display_height_px` for the `:computer` tool).
- Added a new kwarg `unused_as_kwargs` to `execute_tool` to enable passing unused args as kwargs (see `?execute_tool` for more information). Helps with using kwarg-based functions.
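
Here is a rough sketch of what "passing unused args as kwargs" means, in plain Julia. `call_with_unused_as_kwargs` and `area` are hypothetical. The real `execute_tool` works on tool definitions rather than an explicit list of argument names. In this sketch, names listed in `argnames` are passed positionally and every remaining entry becomes a keyword argument.

```julia
# Hypothetical helper illustrating the idea behind `unused_as_kwargs`:
# arguments named in `argnames` are passed positionally (in that order),
# every other entry in `args` is forwarded as a keyword argument.
function call_with_unused_as_kwargs(f, argnames::Vector{Symbol}, args::AbstractDict{Symbol})
    positional = [args[name] for name in argnames]
    unused = Pair{Symbol,Any}[k => v for (k, v) in args if !(k in argnames)]
    return f(positional...; unused...)
end

# A kwarg-based function that a tool call might target
area(w, h; unit = "cm") = "$(w * h) $unit"

call_with_unused_as_kwargs(area, [:w, :h], Dict{Symbol,Any}(:w => 2, :h => 3, :unit => "m"))
```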
- Updated the compat bounds for `StreamCallbacks` to enable both v0.4 and v0.5 (fixes Julia 1.9 compatibility).
- Updated the return type of `tool_call_signature` to `Dict{String, AbstractTool}` to enable better interoperability with different tool types.
- Added new Claude 3.5 Sonnet model (`claude-3-5-sonnet-latest`) and updated the aliases `claude` and `claudes` with it.
- Added support for Ollama streaming with schema `OllamaSchema` (see `?StreamCallback` for more information). Schema `OllamaManaged` is NOT supported (it's legacy and will be removed in the future).
- Moved the implementation of streaming callbacks to a new `StreamCallbacks` package.
- Added new error types for tool execution to enable better error handling and reporting (see `?AbstractToolError`).
- Added support for Anthropic's new pre-trained tools via `ToolRef` (see `?ToolRef`). To enable the feature, use the `:computer_use` beta header (eg, `aitools(..., betas = [:computer_use])`).
- Fixed a bug in `call_cost` where the cost was not calculated if any non-AIMessages were provided in the conversation.
- Fixed a bug in multi-turn tool calls for OpenAI models where an empty tools array could have been sent, causing an API error.
- New field `name` introduced in `AbstractChatMessage` and `AIToolRequest` messages to enable role-based workflows. It initializes to `nothing`, so it is backward compatible.
- Extends support for structured extraction with multiple "tools" definitions (see `?aiextract`).
- Added new primitives `Tool` (to re-use tool definitions) and a function `aitools` to support mixed structured and non-structured workflows, eg, agentic workflows (see `?aitools`).
- Added a field `name` to `AbstractChatMessage` and `AIToolRequest` messages to enable role-based workflows.
- Added support for partial argument execution with the `execute_tool` function (provide your own context to override the arg values).
- Added support for SambaNova hosted models (set your ENV `SAMBANOVA_API_KEY`).
- Added many new models from Mistral, Groq, SambaNova, OpenAI.
- Renamed `function_call_signature` to `tool_call_signature` to better reflect that it's used for tools, but kept a link to the old name for back-compatibility.
- Improves structured extraction for Anthropic models (now you can use the `tool_choice` keyword argument to specify which tool to use or re-use your parsed tools).
- When log probs are requested, we will now also log the raw information in the `AIMessage.extras[:log_prob]` field (previously we logged only the full sum). This enables more nuanced log-probability calculations for individual tokens.
- Added support for Cerebras hosted models (set your ENV `CEREBRAS_API_KEY`). Available model aliases: `cl3` (Llama3.1 8bn), `cl70` (Llama3.1 70bn).
- Added a kwarg to `aiclassify` to provide a custom token ID mapping (`token_ids_map`) to work with custom tokenizers.
- Improved the implementation of `airetry!` to concatenate feedback from all ancestor nodes ONLY IF `feedback_inplace=true` (because otherwise the LLM can see it in the message history).
- Fixed a potential bug in `airetry!` where the `aicall` object was not properly validated to ensure it has been `run!` first.
- Support for Azure OpenAI API. Requires two environment variables to be set: `AZURE_OPENAI_API_KEY` and `AZURE_OPENAI_HOST` (i.e. https://.openai.azure.com).
- Removed accidental INFO log in Anthropic's `aigenerate`.
- Changed internal logging in `streamcallback` to use `@debug` when printing raw data chunks.
- Enabled streaming for OpenAI-compatible APIs (eg, DeepSeek Coder).
- If streaming to stdout, also print a newline at the end of streaming (to separate multiple outputs).
- Relaxed the type-assertions in `StreamCallback` to allow for more flexibility.
- Added support for OpenAI's JSON mode for `aiextract` (just provide kwarg `json_mode=true`). Reference Structured Outputs.
- Added support for OpenRouter's API (you must set ENV `OPENROUTER_API_KEY`) to provide access to more models like Cohere Command R+ and OpenAI's o1 series. Reference OpenRouter.
- Added new OpenRouter hosted models to the model registry (prefixed with `or`): `oro1` (OpenAI's o1-preview), `oro1m` (OpenAI's o1-mini), `orcop` (Cohere's command-r-plus), `orco` (Cohere's command-r). The `or` prefix avoids conflicts with existing models and OpenAI's aliases; after it, the goal is to use 2 letters for each model and 1 letter for an additional qualifier (eg, "p" for plus, "m" for mini) -> `orcop` (OpenRouter cohere's COmmand-r-Plus).
- Updated FAQ with instructions on how to access new OpenAI o1 models via OpenRouter.
- Updated FAQ with instructions on how to add custom APIs (with an example `examples/adding_custom_API.jl`).
- Fixed a bug in `aiclassify` for the OpenAI GPT4o models that have a different tokenizer. Unknown model IDs will throw an error.
- Improved the performance of BM25/Keywords-based indices for >10M documents. Introduced new kwargs `min_term_freq` and `max_terms` in `RT.get_keywords` to reduce the size of the vocabulary. See `?RT.get_keywords` for more information.
- Added beta headers to enable long outputs (up to 8K tokens) with Anthropic's Sonnet 3.5 (see `?anthropic_extra_headers`).
- Added a kwarg to prefill (`aiprefill`) AI responses with Anthropic's models to improve steerability (see `?aigenerate`).
- Updated the documentation of `aigenerate` to make it clear that if `streamcallback` is provided WITH `flavor` set, there is no automatic configuration and the user must provide the correct `api_kwargs`.
- Grouped Anthropic's beta headers as a comma-separated string as per the latest API specification.
- Added a new EXPERIMENTAL `streamcallback` kwarg for `aigenerate` with the OpenAI and Anthropic prompt schemas to enable custom streaming implementations. The simplest usage is `streamcallback=stdout`, which will print each text chunk into the console. The system is modular, enabling custom callbacks and allowing you to inspect received chunks. See `?StreamCallback` for more information. It does not support tools yet.
- Added more flexible structured extraction with `aiextract` -> now you can simply provide the field names and, optionally, their types without specifying the struct itself (in `aiextract`, provide the fields like `return_type = [:field_name => field_type]`).
- Added a way to attach field-level descriptions to the generated JSON schemas to improve structured extraction (see `?update_schema_descriptions!` for the syntax), which was not possible with struct-only extraction.
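
To give a feel for what the field-pair syntax carries, here is a hypothetical sketch that turns `[:field_name => field_type]` pairs into a JSON-schema-like `Dict`. `fields_to_schema` and the small type table are assumptions for illustration only. The schema that `aiextract` actually generates, and the types it supports, may differ.

```julia
# Assumed subset of Julia-type -> JSON-schema-type names, for illustration only
const JSON_TYPE_NAMES = Dict{Type,String}(
    String => "string", Int => "integer", Float64 => "number", Bool => "boolean")

# Hypothetical: build a JSON-schema-like Dict from `[:field_name => field_type]` pairs
function fields_to_schema(fields::AbstractVector{<:Pair{Symbol}})
    properties = Dict{String,Any}()
    for (name, T) in fields
        properties[string(name)] = Dict("type" => JSON_TYPE_NAMES[T])
    end
    required = [string(first(f)) for f in fields]   # every listed field is required here
    return Dict{String,Any}("type" => "object", "properties" => properties, "required" => required)
end

fields_to_schema([:name => String, :age => Int])
```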
- `AIMessage` and `DataMessage` now have a new field `extras` to hold any API-specific metadata in a simple dictionary. The change is backward-compatible (defaults to `nothing`).
- Added EXPERIMENTAL support for Anthropic's new prompt cache (see `?aigenerate` and look for the `cache` kwarg). Note that the COST estimate will be wrong (it ignores the caching discount for now).
- Added a new `extras` field to `AIMessage` and `DataMessage` to hold any API-specific metadata in a simple dictionary (eg, used for reporting on the cache hit/miss).
- Added OpenAI's new model "chatgpt-4o-latest" to the model registry with alias "chatgpt". This model represents the latest version of ChatGPT-4o tuned specifically for ChatGPT.
- Implements the new OpenAI structured output mode for `aiextract` (just provide kwarg `strict=true`). Reference blog post.
- Added a new specialized method for `hcat(::DocumentTermMatrix, ::DocumentTermMatrix)` to allow for combining large DocumentTermMatrices (eg, 1M x 100K).
- Increased the compat bound for HTTP.jl to 1.10.8 to fix a bug with Julia 1.11.
- Fixed a bug in `vcat_labeled_matrices` where an extremely large DocumentTermMatrix could run out of memory.
- Fixed a bug in `score_to_unit_scale` where empty score vectors would error (now returns the empty array back).
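
Min-max scaling is one common way to put scores from different retrievers on a common [0, 1] scale. The sketch below assumes that is the goal of `score_to_unit_scale` but is not its implementation. `unit_scale_sketch` is a hypothetical name, and mapping constant scores to 1.0 is an arbitrary choice made here to avoid dividing by zero. Empty input is returned unchanged, matching the fix described above.

```julia
# Hypothetical min-max scaling of scores into [0, 1]
function unit_scale_sketch(x::AbstractVector{<:Real})
    isempty(x) && return x                       # empty input is returned as-is
    lo, hi = extrema(x)
    hi == lo && return ones(Float64, length(x))  # assumption: constant scores map to 1.0
    return (x .- lo) ./ (hi - lo)
end

unit_scale_sketch([2.0, 4.0, 6.0])
```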
- Added a new model `gpt-4o-2024-08-06` to the model registry (alias `gpt4ol` with `l` for latest). It's the latest version of GPT4o, which is faster and cheaper than the previous version.
- `getindex(::MultiIndex, ::MultiCandidateChunks)` now returns sorted chunks by default (`sorted=true`) to guarantee that the potential `context` (= `chunks`) is sorted by descending similarity score across different sub-indices.
- Updated the `hcat` implementation in `RAGTools.get_embeddings` to reduce memory allocations for large embedding batches (c. 3x fewer allocations, see `hcat_truncate`).
- Updated the `length_longest_common_subsequence` signature to work only for pairs of `AbstractString`, so that it does not fail silently when wrong arguments are provided.
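
The allocation savings come from writing every batch into a single preallocated output instead of building intermediate matrices. The sketch below shows that pattern. `hcat_truncate_sketch` is an illustrative name, not the package's `hcat_truncate`, whose signature may differ. `dim` is the number of embedding rows to keep.

```julia
# Hypothetical: concatenate embedding batches column-wise into ONE preallocated
# matrix, keeping only the first `dim` rows of each batch.
function hcat_truncate_sketch(batches::AbstractVector{<:AbstractMatrix{T}}, dim::Int) where {T}
    @assert !isempty(batches) "need at least one batch"
    @assert all(size(b, 1) >= dim for b in batches) "dim exceeds embedding size"
    out = Matrix{T}(undef, dim, sum(size(b, 2) for b in batches))  # single allocation
    col = 1
    for b in batches
        n = size(b, 2)
        out[:, col:(col + n - 1)] .= @view b[1:dim, :]   # copy without a temporary slice
        col += n
    end
    return out
end
```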
- Changed the default behavior of `getindex(::MultiIndex, ::MultiCandidateChunks)` to always return sorted chunks for consistency with other similar functions and correct `retrieve` behavior. This was accidentally changed in v0.40 and is now reverted to the original behavior.
- Added Mistral Large 2 and Mistral-Nemo to the model registry (alias `mistral-nemo`).
- Fixed a bug where `wrap_string` would not correctly split very long Unicode words.
- Added Llama 3.1 registry records for Fireworks.ai (aliases `fllama3`, `fllama370`, `fllama3405` and `fls`, `flm`, `fll` for small/medium/large, similar to the other providers).
- Registered new Meta Llama 3.1 models hosted on GroqCloud and Together.ai (eg, Groq-hosted `gllama370` has been updated to point to the latest available model and the 405b model now has alias `gllama3405`). Because that's quite clunky, I've added abbreviations based on sizes small/medium/large (that is 8b, 70b, 405b) under `gls`/`glm`/`gll` for Llama 3.1 hosted on GroqCloud (similarly, we now have `tls`/`tlm`/`tll` for Llama 3.1 on Together.ai).
- Generic model aliases for Groq and Together.ai for Llama3 models have been updated to point to the latest available models (Llama 3.1).
- Added Gemma2 9b model hosted on GroqCloud to the model registry (alias `ggemma9`).
- Minor optimizations to `SubDocumentTermMatrix` to reduce memory allocations and improve performance.
- Introduced a "view" of `DocumentTermMatrix` (= `SubDocumentTermMatrix`) to allow views of Keyword-based indices (`ChunkKeywordsIndex`). It's not a pure view (the TF matrix is materialized to prevent performance degradation).
- Fixed a bug in `find_closest(finder::BM25Similarity, ...)` where the view of `DocumentTermMatrix` (ie, `view(DocumentTermMatrix(...), ...)`) was undefined.
- Fixed a bug where a view of a view of a `ChunkIndex` wouldn't intersect the positions (it was returning only the latest requested positions).
- Introduces `RAGTools.SubChunkIndex` to allow projecting `views` of various indices. Useful for pre-filtering your data (faster and more precise retrieval). See `?RT.SubChunkIndex` for more information and how to use it.
- `CandidateChunks` and `MultiCandidateChunks` intersection methods updated to be an order of magnitude faster (useful for large sets like tag filters).
- Fixed a bug in `find_closest(finder::BM25Similarity, ...)` where the `minimum_similarity` kwarg was not implemented.
- Changed the default model for `ai*` chat functions (`PT.MODEL_CHAT`) from `gpt3t` to `gpt4om` (GPT-4o-mini). See the LLM-Leaderboard results and the release blog post.
- Added the new GPT-4o-mini to the model registry (alias `gpt4om`). It's the smallest and fastest GPT-4-based model, and it is cheaper than GPT-3.5 Turbo.
- Added a new tagging filter `RT.AllTagFilter` to `RT.find_tags`, which requires all tags to be present in a chunk.
- Added an option in `RT.get_keywords` to set the minimum length of the keywords.
- Added a new method for `reciprocal_rank_fusion` and a utility for standardizing candidate chunk scores (`score_to_unit_scale`).
- Fixed a bug in `CohereReranker` where it wouldn't correctly handle `CandidateChunks`.
- Increased the compat bound for FlashRank to 0.4.
- Added a prompt template for RAG query expansion for BM25 (`RAGQueryKeywordExpander`).
- Fixed a small bug in the truncation step of RankGPT's `permutation_step!` (bad indexing of string characters).
- Fixed a bug where a certain combination of `rank_start` and `rank_end` would not produce the last sliding window.
- Fixed a bug where a partially filled `RAGResult` would fail pretty-printing with `pprint`.
- Added a utility function to RAGTools, `reciprocal_rank_fusion`, as a principled way to merge multiple rankings. See `?RAGTools.Experimental.reciprocal_rank_fusion` for more information.
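
Reciprocal rank fusion scores each item by summing `1 / (k + rank)` over every ranking it appears in. Items ranked highly by several retrievers therefore rise to the top without needing comparable raw scores. The sketch below assumes plain string IDs and the commonly used constant `k = 60`; the package's `reciprocal_rank_fusion` may use a different default and signature. `rrf_sketch` is a hypothetical name.

```julia
# Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank(d))
function rrf_sketch(rankings::Vector{Vector{String}}; k::Real = 60)
    scores = Dict{String,Float64}()
    for ranking in rankings
        for (rank, doc) in enumerate(ranking)   # ranks are 1-based
            scores[doc] = get(scores, doc, 0.0) + 1 / (k + rank)
        end
    end
    fused = sort(collect(keys(scores)); by = doc -> scores[doc], rev = true)
    return fused, scores
end

fused, scores = rrf_sketch([["a", "b", "c"], ["b", "c", "a"]])
```

Here `b` wins because it is first in one list and second in the other, which beats `a`'s first-and-third.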
- Added a `RankGPT` implementation for the RAGTools chunk re-ranking pipeline. See `?RAGTools.Experimental.rank_gpt` for more information and the corresponding reranker type `?RankGPTReranker`.
- Added back accidentally dropped DBKS keys.
- Fixed loading `RAGResult` when one of the candidate fields was `nothing`.
- Utility type checks like `isusermessage`, `issystemmessage`, `isdatamessage`, `isaimessage`, `istracermessage` do not throw errors when given any arbitrary input types (previously they only worked for `AbstractMessage` types). It's an `isa` check, so it should work for all input types.
- Changed preference loading to use typed `global` instead of `const`, to fix issues with API keys not being loaded properly on start. You can now also call `PromptingTools.load_api_keys!()` to re-load the API keys (and ENV variables) manually.
- Added registry record for Anthropic Claude 3.5 Sonnet with ID `claude-3-5-sonnet-20240620` (read the blog post). Aliases "claude" and "claudes" have been linked to this latest Sonnet model.
- Changed behavior of `RAGTools.rerank(::FlashRanker,...)` to always dedupe input chunks (to reduce compute requirements).
- Fixed a bug in verbose INFO log in `RAGTools.rerank(::FlashRanker,...)`.
- Improved the implementation of `RAGTools.unpack_bits` to be faster with fewer allocations.
- The return type of `RAGTools.find_tags(::NoTagger,...)` is now `::Nothing` instead of `CandidateChunks`/`MultiCandidateChunks` with all documents.
- `Base.getindex(::MultiIndex, ::MultiCandidateChunks)` now always returns sorted chunks for consistency with the behavior of other `getindex` methods on `*Chunks`.
- Cosine similarity search now uses `partialsortperm` for better performance on large datasets.
- Skipped unnecessary work when the tagging functionality in the RAG pipeline is disabled (`find_tags` with `NoTagger` always returns `nothing`, which improves the compiled code).
- Changed the default behavior of `getindex(::MultiIndex, ::MultiCandidateChunks)` to always return sorted chunks for consistency with other similar functions. Note that you should always use re-ranking anyway (see `FlashRank.jl`).
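
The benefit of `partialsortperm` is that only the top `k` positions need to be ordered, rather than the whole score vector. The sketch below assumes embeddings are stored as columns that are already L2-normalized, so cosine similarity reduces to a dot product. `topk_cosine` is a hypothetical helper, not the package's `find_closest`.

```julia
using LinearAlgebra

# Hypothetical top-k cosine search; columns of `emb` are assumed unit-norm.
function topk_cosine(emb::AbstractMatrix{<:Real}, query::AbstractVector{<:Real}, k::Int)
    scores = emb' * (query ./ norm(query))          # one cosine score per column
    k = min(k, length(scores))
    idx = partialsortperm(scores, 1:k; rev = true)  # orders only the top-k positions
    return collect(idx), scores[idx]
end

emb = [1.0 0.0 0.6; 0.0 1.0 0.8]   # three unit-norm 2-d embeddings as columns
topk_cosine(emb, [2.0, 0.0], 2)
```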
- Fixed a bug on Julia 1.11 beta by adding REPL stdlib as a direct dependency.
- Fixed overly restrictive argument types for the `RAGTools.build_tags` method.
- Added package extension for FlashRank.jl to support local ranking models. See `?RT.FlashRanker` for more information or `examples/RAG_with_FlashRank.jl` for a quick example.
- Added Mistral's coding-oriented Codestral to the model registry, aliased as `codestral` or `mistralc`. It's very fast, performant and much cheaper than similar models.
- Added a keyword-based search similarity to RAGTools to serve both for baseline evaluation and for advanced performance (by having a hybrid index with both embeddings and BM25). See `?RT.KeywordsIndexer` and `?RT.BM25Similarity` for more information. To build one, use `build_index(KeywordsIndexer(), texts)` or convert an existing embeddings-based index with `ChunkKeywordsIndex(index)`.
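
The sketch below shows the standard Okapi BM25 score with `k1 = 1.2`, `b = 0.75` and a smoothed IDF. It is an illustration only. `bm25_scores` is a hypothetical function, and documents are assumed to be pre-tokenized vectors of words. `RT.BM25Similarity` may use different parameters, IDF variants, or tokenization.

```julia
# Hypothetical Okapi BM25 scoring over pre-tokenized documents
function bm25_scores(docs::Vector{Vector{String}}, query::Vector{String}; k1 = 1.2, b = 0.75)
    N = length(docs)
    avgdl = sum(length, docs) / N                   # average document length
    df = Dict{String,Int}()                         # document frequency per term
    for d in docs, t in unique(d)
        df[t] = get(df, t, 0) + 1
    end
    scores = zeros(N)
    for (i, d) in enumerate(docs)
        dl = length(d)
        for t in query
            n = get(df, t, 0)
            n == 0 && continue                      # term absent from the corpus
            tf = count(==(t), d)                    # term frequency in this document
            idf = log(1 + (N - n + 0.5) / (n + 0.5))
            scores[i] += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
        end
    end
    return scores
end

docs = [["julia", "is", "fast"], ["python", "is", "popular"], ["julia", "julia", "rocks"]]
bm25_scores(docs, ["julia"])
```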
- For naming consistency, `ChunkIndex` in RAGTools has been renamed to `ChunkEmbeddingsIndex` (with an alias `ChunkIndex` for backwards compatibility). There are now two main index types, `ChunkEmbeddingsIndex` and `ChunkKeywordsIndex` (=BM25), which can be combined into a `MultiIndex` to serve as a hybrid index.
- Fixed a rare bug where prompt templates created on macOS would come with metadata that breaks the prompt loader. From now on, it ignores any dotfiles (hidden files starting with ".").
- Fixed a bug where the utility `length_longest_common_subsequence` was not working with complex Unicode characters.
- Added new field `meta` to `TracerMessage` and `TracerMessageLike` to hold metadata in a simple dictionary. The change is backward-compatible.
- Changed behaviour of `aitemplates(name::Symbol)` to look for the exact match on the template name, not just a partial match. This is a breaking change for the `aitemplates` function only. The motivation is that having multiple matches could have introduced subtle bugs when looking up valid placeholders for a template.
- Improved support for `aiclassify` with OpenAI models (you can now encode up to 40 choices).
- Added a template for routing questions, `:QuestionRouter` (to be used with `aiclassify`).
- Improved tracing by `TracerSchema` to automatically capture crucial metadata such as any LLM API kwargs (`api_kwargs`) and the prompt template used, with its version. The information is captured in the `meta(tracer)` dictionary. See `?TracerSchema` for more information.
- New tracing schema `SaverSchema` automatically serializes all conversations. It can be composed with other tracing schemas, eg, `TracerSchema`, to automatically capture the necessary metadata and serialize. See `?SaverSchema` for more information.
- Updated options for binary embeddings (refer to release v0.18 for motivation). Adds utility functions `pack_bits` and `unpack_bits` to move between binary and UInt64 representations of embeddings. RAGTools adds the corresponding `BitPackedBatchEmbedder` and `BitPackedCosineSimilarity` for fast retrieval on these Bool<->UInt64 embeddings (credit to domluna's tinyRAG).
- Fixed a bug where `aiclassify` would not work when returning the full conversation for choices with extra descriptions.
- Added model registry record for the latest OpenAI GPT4 Omni model (`gpt4o`); it's as good as GPT4, but faster and cheaper.
- Added support for DeepSeek models via the `dschat` and `dscode` aliases. You can set the `DEEPSEEK_API_KEY` environment variable to your DeepSeek API key.
- Added new prompt templates for "Expert" tasks like `LinuxBashExpertAsk`, `JavascriptExpertTask`, etc.
- Added new prompt templates for self-critiquing agents like `ChiefEditorTranscriptCritic`, `JuliaExpertTranscriptCritic`, etc.
- Extended `aicodefixer_feedback` methods to work with `AICode` and `AIGenerate`.
- Added support for Groq, the fastest LLM provider out there. It's free for now, so you can try it out; you just need to set your `GROQ_API_KEY`. We've added Llama3 8b (alias "gllama3"), 70b (alias "gllama370") and Mixtral 8x7b (alias "gmixtral"). For the shortcut junkies, we also added a shorthand Llama3 8b = "gl3" (first two letters and the last digit), Llama3 70b = "gl70" (first two letters and the last two digits).
- New models added to the model registry: Llama3 8b on Ollama (alias "llama3" for convenience) and on Together.ai (alias "tllama3", "t" stands for Together.ai), also adding the llama3 70b on Together.ai (alias "tllama370") and the powerful Mixtral-8x22b on Together.ai (alias "tmixtral22").
- Fixed a bug where pretty-printing `RAGResult` would forget a newline between the sources and context sections.
- Fixed `truncate_dimension` to ignore a value of 0 (previously it would throw an error).
- Added a few new open-weights models hosted by Fireworks.ai to the registry (DBRX Instruct, Mixtral 8x22b Instruct, Qwen 72b). If you're curious about how well they work, try them!
- Added basic support for observability downstream. Created custom callback infrastructure with `initialize_tracer` and `finalize_tracer`, and dedicated types `TracerMessage` and `TracerMessageLike`. See `?TracerMessage` for more information and the corresponding `aigenerate` docstring.
- Added `MultiCandidateChunks`, which can hold candidates for retrieval across many indices (it's a flat structure to be similar to `CandidateChunks` and easy to reason about).
- JSON serialization support extended for `RAGResult`, `CandidateChunks`, and `MultiCandidateChunks` to increase observability of RAG systems.
- Added a new search refiner `TavilySearchRefiner`. It will search the web via the Tavily API to try to improve on the RAG answer (see `?refine!`).
- Introduced a few small utilities for manipulation of nested kwargs (necessary for RAG pipelines): check out `getpropertynested`, `setpropertynested`, `merge_kwargs_nested`.
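
A nested kwargs merge can be sketched as a recursive `merge` of `NamedTuple`s, where inner `NamedTuple`s are merged rather than overwritten. `merge_nested_sketch` and the keys in the example are hypothetical. The sketch only illustrates the idea; `merge_kwargs_nested` may handle other container types differently.

```julia
# Hypothetical recursive merge: values in `b` win, except that when both sides
# hold a NamedTuple under the same key, those inner NamedTuples are merged too.
function merge_nested_sketch(a::NamedTuple, b::NamedTuple)
    merged = merge(a, b)
    names = keys(merged)
    vals = map(names) do k
        both_nested = haskey(a, k) && haskey(b, k) && a[k] isa NamedTuple && b[k] isa NamedTuple
        both_nested ? merge_nested_sketch(a[k], b[k]) : merged[k]
    end
    return NamedTuple{names}(vals)
end

defaults = (retrieve = (top_k = 10, verbose = false), model = "small")
overrides = (retrieve = (top_k = 5,), temperature = 0.1)
merge_nested_sketch(defaults, overrides)
```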
- [BREAKING] Changed `CandidateChunks` so that it's no longer allowed to be nested (ie, `cc.positions` being a list of several `CandidateChunks`). This is a breaking change for the `RAGTools` module only. We have introduced a new `MultiCandidateChunks` type that can refer to `CandidateChunks` across many indices.
- Changed the default model for `RAGTools.CohereReranker` to "cohere-rerank-english-v3.0".
- The `wrap_string` utility now correctly splits only on spaces. Previously it would split on newlines, which would remove the natural formatting of prompts/messages when displayed via `pprint`.
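
A greedy wrapper that splits only on spaces, and keeps existing newlines, might look like the sketch below. `wrap_on_spaces` is a hypothetical stand-in for `wrap_string`. It measures widths with `length`, which counts characters rather than bytes. It does not break words longer than `width`; the package's handling of very long words may differ.

```julia
# Hypothetical greedy word wrap that splits only on spaces;
# newlines already present in the text are kept as-is.
function wrap_on_spaces(text::AbstractString, width::Int)
    io = IOBuffer()
    linelen = 0                                   # characters on the current line
    for (i, word) in enumerate(split(text, ' '))
        segments = split(word, '\n')
        if i > 1
            if linelen > 0 && linelen + 1 + length(first(segments)) > width
                print(io, '\n')                   # word does not fit: start a new line
                linelen = 0
            else
                print(io, ' ')
                linelen += 1
            end
        end
        print(io, word)
        # `length` counts characters, so multi-byte Unicode is measured correctly
        linelen = length(segments) == 1 ? linelen + length(word) : length(last(segments))
    end
    return String(take!(io))
end
```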
- [BREAKING CHANGE] The default GPT-4 Turbo model alias ("gpt4t") now points to the official GPT-4 Turbo endpoint ("gpt-4-turbo").
- Adds references to `mistral-tiny` (7bn parameter model from MistralAI) to the model registry for completeness.
- Adds the new GPT-4 Turbo model (`"gpt-4-turbo-2024-04-09"`), but you can simply use alias `"gpt4t"` to access it.
- Adds support for binary embeddings in RAGTools (dispatch type for `find_closest` is `finder=BinaryCosineSimilarity()`), but you can also just convert the embeddings to binary yourself (always choose `Matrix{Bool}` for speed, not `BitMatrix`) and use them without any changes (very little performance difference at the moment).
- Added Ollama embedding models to the model registry ("nomic-embed-text", "mxbai-embed-large") and versioned MistralAI models.
- Added template for data extraction with Chain-of-thought reasoning: `:ExtractDataCoTXML`.
- Added data extraction support for Anthropic models (Claude 3) with `aiextract`. Try it with Claude-3 Haiku (`model="claudeh"`) and the Chain-of-thought template (`:ExtractDataCoTXML`). See `?aiextract` for more information and check Anthropic's recommended practices.
- Fixed a bug in `print_html` where the custom kwargs were not being passed to the `HTML` constructor.
- Added support for `aigenerate` with the Anthropic API. Preset model aliases are `claudeo`, `claudes`, and `claudeh`, for Claude 3 Opus, Sonnet, and Haiku, respectively.
- Enabled the GoogleGenAI extension since `GoogleGenAI.jl` is now officially registered. You can use `aigenerate` by setting the model to `gemini` and providing the `GOOGLE_API_KEY` environment variable.
- Added utilities to make preparation of finetuning datasets easier. You can now export your conversations in JSONL format with ShareGPT formatting (eg, for Axolotl). See `?PT.save_conversations` for more information.
- Added `print_html` utility for the RAGTools module to print HTML-styled RAG answer annotations for web applications (eg, Genie.jl). See `?PromptingTools.Experimental.RAGTools.print_html` for more information and examples.
- Fixed a bug where `set_node_style!` was not accepting any Stylers except for the vanilla `Styler`.
- Added pretty-printing via `PT.pprint` that does NOT depend on Markdown and splits text to adjust to the width of the output terminal. It is useful in notebooks to add new lines.
- Added support annotations for RAGTools (see `?RAGTools.Experimental.annotate_support` for more information) to highlight which parts of the generated answer come from the provided context versus the model's knowledge base. It's useful for transparency and debugging, especially in the context of AI-generated content. You can experience it if you run the output of `airag` through pretty printing (`PT.pprint`).
- Added utility `distance_longest_common_subsequence` to find the normalized distance between two strings (or a vector of strings). It always returns a number between 0 and 1, where 0 means the strings are identical and 1 means they are completely different. It's useful for comparing the similarity between the context provided to the model and the generated answer.
- Added a new documentation section "Extra Tools" to highlight key functionality in various modules, eg, the available text utilities, which were previously hard to discover.
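
A longest-common-subsequence distance can be computed with the classic dynamic-programming table. In this sketch, `lcs_length` and `lcs_distance` are hypothetical names. Normalizing by the length of the longer string is an assumption, and `distance_longest_common_subsequence` may normalize differently. Collecting the strings into `Char` vectors keeps multi-byte Unicode characters intact.

```julia
# Classic DP for the length of the longest common subsequence
function lcs_length(a::AbstractString, b::AbstractString)
    x, y = collect(a), collect(b)       # Char vectors: one entry per Unicode character
    m, n = length(x), length(y)
    dp = zeros(Int, m + 1, n + 1)       # dp[i+1, j+1] = LCS of x[1:i] and y[1:j]
    for i in 1:m, j in 1:n
        dp[i + 1, j + 1] = x[i] == y[j] ? dp[i, j] + 1 : max(dp[i, j + 1], dp[i + 1, j])
    end
    return dp[m + 1, n + 1]
end

# Hypothetical normalization: 0 = identical, 1 = nothing in common
function lcs_distance(a::AbstractString, b::AbstractString)
    longest = max(length(a), length(b))
    longest == 0 && return 0.0
    return 1 - lcs_length(a, b) / longest
end
```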
- Extended documentation FAQ with tips on tackling rate limits and other common issues with OpenAI API.
- Extended documentation with all available prompt templates. See section "Prompt Templates" in the documentation.
- Added new RAG interface underneath `airag` in `PromptingTools.RAGTools.Experimental`. Each step now has a dedicated function and a type that can be customized to achieve arbitrary logic (via defining methods for your own types). `airag` is split into two main steps: `retrieve` and `generate!`. You can use them separately or together. See `?airag` for more information.
- Renamed `split_by_length` text splitter to `recursive_splitter` to make it easier to discover and understand its purpose. `split_by_length` is still available as a deprecated alias.
- Fixed a bug where the `LOCAL_SERVER` default value was not getting picked up. Now, it defaults to `http://localhost:10897/v1` if not set in the preferences, which is the address of the OpenAI-compatible server started by Llama.jl.
- Fixed a bug in multi-line code annotation, which was assigning overly optimistic scores to the generated code. Now the score of the chunk is the length-weighted score of the "top" source chunk divided by the full length of score tokens (much more robust and demanding).
- Added experimental support for image generation with OpenAI DALL-E models, eg, `msg = aiimage("A white cat on a car")`. See `?aiimage` for more details.
- Added a new documentation section "How it works" to explain the inner workings of the package. It's a work in progress, but it should give you a good idea of what's happening under the hood.
- Improved template loading, so if you load your custom templates once with `load_templates!("my/template/folder")`, it will remember your folder for all future re-loads.
- Added convenience function `create_template` to create templates on the fly without having to deal with `PT.UserMessage` etc. If you specify the keyword argument `load_as = "MyName"`, the template will be immediately loaded to the template registry. See `?create_template` for more information and examples.
- Added initial support for Google Gemini models for `aigenerate` (requires environment variable `GOOGLE_API_KEY` and package GoogleGenAI.jl to be loaded). It must be added explicitly as it is not yet registered.
- Added a utility to compare any two string sequences (and other iterators), `length_longest_common_subsequence`. It can be used to fuzzy match strings (eg, detecting context/sources in an AI-generated response or fuzzy matching an AI response to some preset categories). See the docstring for more information: `?length_longest_common_subsequence`.
- Rewrote `aiclassify` to classify into an arbitrary list of categories (including with descriptions). It's a quick and easy option for "routing" and similar use cases, as it exploits the logit bias trick and outputs only 1 token. Currently, only `OpenAISchema` is supported. See `?aiclassify` for more information.
- Initial support for multiple completions in one request for OpenAI-compatible API servers. Set via API kwarg `n=5` and it will request 5 completions in one request, saving the network communication time and paying for the prompt tokens only once. It's useful for majority voting, diversity, or challenging agentic workflows.
- Added new fields to `AIMessage` and `DataMessage` types to simplify tracking in complex applications. Added fields:
  - `cost`: the cost of the query (summary per call, so count it only once if you requested multiple completions in one call)
  - `log_prob`: summary log probability of the generated sequence; set API kwarg `logprobs=true` to receive it
  - `run_id`: ID of the AI API call
  - `sample_id`: ID of the sample in the batch if you requested multiple completions, otherwise `sample_id==nothing` (samples from one call share the same `run_id`)
  - `finish_reason`: the reason why the AI stopped generating the sequence (eg, "stop", "length") to provide more visibility for the user
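
With several completions per request, a simple majority vote picks the most frequent answer. The sketch below is hypothetical and operates on plain answer strings, for example the text content of the returned messages. It lightly normalizes case and surrounding whitespace before counting.

```julia
# Hypothetical majority vote over several completions of the same prompt
function majority_vote(answers::AbstractVector{<:AbstractString})
    counts = Dict{String,Int}()
    for a in answers
        key = lowercase(strip(a))        # light normalization before counting
        counts[key] = get(counts, key, 0) + 1
    end
    votes, winner = findmax(counts)      # ties are broken arbitrarily
    return winner, votes
end

majority_vote(["Paris", " paris", "London"])
```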
- Support for Fireworks.ai and Together.ai providers for fast and easy access to open-source models. Requires environment variables `FIREWORKS_API_KEY` and `TOGETHER_API_KEY` to be set, respectively. See `?FireworksOpenAISchema` and `?TogetherOpenAISchema` for more information.
- Added an `extra` field to the `ChunkIndex` object for RAG workloads to allow additional flexibility with metadata for each document chunk (assumed to be a vector of the same length as the document chunks).
- Added the `airetry` function to `PromptingTools.Experimental.AgentTools` to allow "guided" automatic retries of AI calls (eg, `AIGenerate`, which is the "lazy" counterpart of `aigenerate`) if a given condition fails. It's useful for robustness and reliability in agentic workflows. You can provide conditions as functions, and the same holds for the feedback to the model. See a guessing game example in `?airetry`.
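
The retry-with-feedback loop behind `airetry` can be illustrated without any API calls. The sketch below uses a hypothetical `retry_with_feedback` helper (it is not the `airetry` signature) and a deterministic stand-in for the model that reacts to "higher"/"lower" hints, in the spirit of the guessing-game example mentioned above.

```julia
# Hypothetical helper illustrating guided retries; NOT the `airetry` API.
function retry_with_feedback(generate, condition, feedback; max_retries::Int = 3)
    hint = nothing
    result = generate(hint)
    retries = 0
    while !condition(result) && retries < max_retries
        hint = feedback(result)   # turn the failed attempt into a hint
        result = generate(hint)   # try again, this time with the hint
        retries += 1
    end
    return result, retries
end

# Deterministic stand-in for a model: a bisection guesser over 1:10
secret = 7
lo, hi, prev = Ref(1), Ref(10), Ref(0)
guesser = hint -> begin
    hint == "higher" && (lo[] = prev[] + 1)
    hint == "lower" && (hi[] = prev[] - 1)
    prev[] = div(lo[] + hi[], 2)
end

guess, n_retries = retry_with_feedback(guesser, g -> g == secret,
    g -> g < secret ? "higher" : "lower"; max_retries = 5)
```

With this stand-in the guesses go 5, 8, 6, 7, so the loop succeeds after three retries. Keeping the condition and the feedback as separate functions mirrors the description of `airetry`, where both are user-supplied.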
- Updated names of endpoints and prices of Mistral.ai models as per the latest announcement and pricing. Eg, `mistral-small` -> `mistral-small-latest`. In addition, the latest Mistral model has been added: `mistral-large-latest` (aliased as `mistral-large` and `mistrall`, same for the others). `mistral-small-latest` and `mistral-large-latest` now support function calling, which means they will work with `aiextract` (you need to explicitly provide `tool_choice`, see the docs `?aiextract`).
- Removed package extension for GoogleGenAI.jl, as it's not yet registered. Users must load the code manually for now.
- Added more specific kwargs in `Experimental.RAGTools.airag` to give more control over each type of AI call (ie, `aiembed_kwargs`, `aigenerate_kwargs`, `aiextract_kwargs`).
- Moved up compat bounds for OpenAI.jl to 0.9.
- Fixed a bug where obtaining an API_KEY from ENV would get precompiled as well, causing an error if the ENV was not set at the time of precompilation. Now, we save the `get(ENV...)` into a separate variable to avoid it being compiled away.
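
The underlying pitfall is that a value computed at the top level of a package module is evaluated during precompilation and baked into the cache. The entry above describes the package's own workaround; a common alternative Julia idiom, sketched here with a hypothetical module name `KeyDemo`, is to read the environment in `__init__` (which runs when the module is loaded, not while it is precompiled) or lazily at call time.

```julia
module KeyDemo  # hypothetical module, for illustration only

# Pitfall (inside a precompiled package): `const KEY = get(ENV, "OPENAI_API_KEY", "")`
# would capture whatever was set when the package was precompiled.

# Mutable container filled in at load time instead
const API_KEY_AT_LOAD = Ref{String}("")

function __init__()
    # `__init__` runs when the module is loaded at runtime, after precompilation
    API_KEY_AT_LOAD[] = get(ENV, "OPENAI_API_KEY", "")
end

# Alternatively, read the variable lazily whenever it is needed
current_api_key() = get(ENV, "OPENAI_API_KEY", "")

end # module
```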
- Support for Databricks Foundation Models API. Requires two environment variables to be set: `DATABRICKS_API_KEY` and `DATABRICKS_HOST` (the part of the URL before `/serving-endpoints/`).
- Experimental support for API tools to enhance your LLM workflows: the `Experimental.APITools.create_websearch` function can execute and summarize a web search (incl. filtering on specific domains). It requires `TAVILY_API_KEY` to be set in the environment. Get your own key from Tavily; the free tier allows about 1,000 searches/month, which should be more than enough to get started.
- Added an option to reduce the "batch size" for the embedding step in building the RAG index (`build_index`, `get_embeddings`). Set `embedding_kwargs = (; target_batch_size_length=10_000, ntasks=1)` if you're running into limits with your provider.
- Better error message if RAGTools are only partially imported (requires `LinearAlgebra` and `SparseArrays` to load the extension).
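
To picture what a length-based "batch size" does, the sketch below greedily packs documents into batches whose total character count stays within a target, so each embedding request stays small. `batch_by_length` is hypothetical and counts characters; the package's `target_batch_size_length` logic may measure and split differently.

```julia
# Hypothetical sketch of length-based batching; not the RAGTools implementation.
function batch_by_length(docs::AbstractVector{<:AbstractString}, target::Int)
    batches = Vector{Vector{String}}()
    current, current_len = String[], 0
    for doc in docs
        len = length(doc)
        # Close the current batch if adding this document would exceed the target
        if !isempty(current) && current_len + len > target
            push!(batches, current)
            current, current_len = String[], 0
        end
        push!(current, doc)          # an oversized document still gets its own batch
        current_len += len
    end
    isempty(current) || push!(batches, current)
    return batches
end

batches = batch_by_length(["aaaa", "bbbb", "cc", "d"], 6)
# [["aaaa"], ["bbbb", "cc"], ["d"]]
```

Smaller batches mean more requests, which is why pairing a lower target with `ntasks=1` (fewer parallel requests) helps when a provider pushes back.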
- [BREAKING CHANGE] The default embedding model (`MODEL_EMBEDDING`) changes to "text-embedding-3-small", effective immediately (lower cost, higher performance). The default chat model (`MODEL_CHAT`) will be changed by OpenAI to 0125 (from 0613) by mid-February. If you have older embeddings or rely on the exact chat model version, please set the model explicitly in your code or in your preferences.
- New OpenAI models added to the model registry (see the release notes).
  - "gpt4t" refers to whichever is the latest GPT-4 Turbo model ("gpt-4-0125-preview" at the time of writing)
  - "gpt3t" refers to the latest GPT-3.5 Turbo model version 0125, which is 25-50% cheaper and has updated knowledge (available from February 2024; you will get an error in the interim)
  - "gpt3" still refers to the general endpoint "gpt-3.5-turbo", which OpenAI will move to version 0125 by mid-February (ie, "gpt3t" will then be the same as "gpt3"). We have reflected the approximate cost in the model registry, but note that it will be incorrect in the transition period.
  - "emb3small" refers to the small version of the new embedding model (dim=1536), which is 5x cheaper than Ada and promises higher quality
  - "emb3large" refers to the large version of the new embedding model (dim=3072), which is only 30% more expensive than Ada
- Improved AgentTools: added more information and specific methods to `aicode_feedback` and `error_feedback` to pass more targeted feedback/tips to the AIAgent.
- Improved detection of which lines were the source of an error during `AICode` evaluation, and forced the error details to be printed in `AICode(...).stdout` for downstream analysis.
- Improved detection of Base/Main method overrides in `AICode` evaluation (it only warns about them), but you can use `detect_base_main_overrides(code)` for custom handling.
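
Detecting `Base`/`Main` overrides can be done on the parsed syntax tree rather than on raw text. The sketch below walks the expression returned by `Meta.parse` and collects method definitions whose name is qualified with `Base` or `Main`; `find_base_main_overrides` is a hypothetical helper, and the package's `detect_base_main_overrides` may work differently.

```julia
# Hypothetical sketch; the package's `detect_base_main_overrides` may work differently.
function find_base_main_overrides(code::AbstractString)
    found = String[]
    function visit(ex)
        ex isa Expr || return nothing
        if ex.head in (:function, :(=)) && !isempty(ex.args)
            sig = ex.args[1]
            # Unwrap `where` clauses and return-type annotations to reach the call
            while sig isa Expr && sig.head in (:where, Symbol("::"))
                sig = sig.args[1]
            end
            if sig isa Expr && sig.head == :call
                name = sig.args[1]
                # Qualified names like `Base.sum` parse as Expr(:., :Base, QuoteNode(:sum))
                if name isa Expr && name.head == :. && name.args[1] in (:Base, :Main)
                    push!(found, string(name))
                end
            end
        end
        foreach(visit, ex.args)   # recurse into nested expressions
        return nothing
    end
    # Wrap in `begin ... end` so several top-level definitions parse as one expression
    visit(Meta.parse("begin\n" * code * "\nend"))
    return found
end

find_base_main_overrides("Base.length(x::MyType) = 0")   # ["Base.length"]
```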
- Fixed typos in the documentation
- Fixed a bug where API keys set in ENV would not be picked up by the package (caused by inlining of the `get(ENV,...)` during precompilation).
- Fixed string interpolation to be correctly escaped when evaluating `AICode`.
- Split `Experimental.RAGTools.build_index` into smaller functions to make sharing with other packages easier (`get_chunks`, `get_embeddings`, `get_metadata`).
- Added support for a Cohere-based RAG re-ranking strategy (and introduced the associated `COHERE_API_KEY` global variable and ENV variable).
- Fixed `split_by_length` to not mutate the `separators` argument (appeared in RAG use cases where we repeatedly apply splits to different documents).
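
One common way such a mutation bug arises is a recursive splitter that consumes its separator list in place (eg, with `popfirst!`), so later documents are split with fewer separators. The hypothetical `split_text` below shows a non-mutating alternative that slices the vector instead; for brevity it drops separators at chunk boundaries, and it is not the package's `split_by_length`.

```julia
# Hypothetical sketch; not PromptingTools' `split_by_length`.
function split_text(text::AbstractString, separators::Vector{String}; max_length::Int = 100)
    length(text) <= max_length && return [String(text)]
    isempty(separators) && return [String(text)]   # nothing left to split on
    # Slicing creates a new vector, so the caller's `separators` is never mutated
    sep, rest = separators[1], separators[2:end]
    chunks = String[]
    current = ""
    for piece in split(text, sep)
        candidate = isempty(current) ? String(piece) : current * sep * piece
        if length(candidate) <= max_length
            current = candidate
        else
            isempty(current) || push!(chunks, current)
            if length(piece) > max_length
                # Still too long: recurse with the finer-grained separators
                append!(chunks, split_text(piece, rest; max_length))
                current = ""
            else
                current = String(piece)
            end
        end
    end
    isempty(current) || push!(chunks, current)
    return chunks
end

seps = ["\n\n", ". "]
chunks = split_text("aaa. bbb\n\ncccccc. dd", seps; max_length = 8)
# chunks == ["aaa. bbb", "cccccc", "dd"], and `seps` is unchanged
```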
- Initial support for Llama.jl and other local servers. Once your server is started, simply use `model="local"` to route your queries to the local server, eg, `ai"Say hi!"local`. Option to permanently set the `LOCAL_SERVER` (URL) added to preference management. See `?LocalServerOpenAISchema` for more information.
- Added a new template `StorytellerExplainSHAP` (see the metadata).
- Repeated calls to Ollama models were failing due to a missing `prompt_eval_count` key in subsequent calls.
- Added new Experimental sub-module AgentTools introducing `AICall` (incl. `AIGenerate`) and `AICodeFixer` structs. The `AICall` struct provides a "lazy" wrapper for `ai*` functions, enabling efficient and flexible AI interactions and building agentic workflows.
- Added the first AI Agent: `AICodeFixer`, which iteratively analyzes and improves any code provided by an LLM by evaluating it in a sandbox. It allows a lot of customization (templated responses, feedback function, etc.). See `?AICodeFixer` for more information on usage and `?aicodefixer_feedback` for the example implementation of the feedback function.
- Added the `@timeout` macro to allow for limiting the execution time of a block of code in `AICode` via the `execution_timeout` kwarg (prevents infinite loops, etc.). See `?AICode` for more information.
- Added the `preview(conversation)` utility that allows you to quickly preview the conversation in a Markdown format in your REPL. Requires the `Markdown` package for the extension to be loaded.
- Added the `ItemsExtract` convenience wrapper for `aiextract` when you want to extract one or more of a specific `return_type` (eg, `return_type = ItemsExtract{MyMeasurement}`).
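
The idea behind limiting execution time can be sketched with plain Julia tasks: start the work in a task and poll for completion with `timedwait`. `run_with_timeout` is hypothetical and is not how the `@timeout` macro is implemented. Note that Julia cannot forcibly stop a running task, so in this sketch a timed-out task keeps running in the background.

```julia
# Hypothetical sketch of a time limit using tasks; not the package's `@timeout`.
function run_with_timeout(f, seconds::Real)
    task = @async f()
    # Poll until the task finishes or the time limit passes
    status = timedwait(() -> istaskdone(task), Float64(seconds))
    # On timeout, return `nothing`; the task itself is NOT cancelled
    status === :ok || return nothing
    return fetch(task)   # rethrows (as a TaskFailedException) if `f` errored
end

run_with_timeout(() -> sum(1:10), 2.0)           # 55
run_with_timeout(() -> (sleep(3); :done), 0.5)   # nothing
```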
- Fixed `aiembed` to accept any `AbstractVector` of documents (eg, a view of a vector of documents).
- `@ai_str` macros now support multi-turn conversations. The `ai"something"` call will automatically remember the last conversation, so you can simply reply with `ai!"my-reply"`. If you send another message with `ai""`, you'll start a new conversation. Same for the asynchronous versions `aai""` and `aai!""`.
- Created a new default schema for Ollama models, `OllamaSchema` (replacing `OllamaManagedSchema`), which allows multi-turn conversations and conversations with images (eg, with Llava and Bakllava models). `OllamaManagedSchema` has been kept for compatibility and as an example of a schema where one provides the prompt as a string (not dictionaries like the OpenAI API).
- Removed template `RAG/CreateQAFromContext` because it's a duplicate of `RAG/RAGCreateQAFromContext`.
- Experimental sub-module RAGTools providing basic Retrieval-Augmented Generation functionality. See `?RAGTools` for more information. It's all nested inside of `PromptingTools.Experimental.RAGTools` to signify that it might change in the future. Key functions are `build_index` and `airag`, but it also provides a suite to make evaluation easier (see `?build_qa_evals` and `?run_qa_evals` or just see the example `examples/building_RAG.jl`).
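
At the core of retrieval in a RAG pipeline is ranking stored chunk embeddings by similarity to the query embedding. The toy sketch below ranks the columns of an embedding matrix by cosine similarity using only the `LinearAlgebra` standard library; `top_k_chunks` is hypothetical, and RAGTools' own retrieval (and its data layout) may differ.

```julia
using LinearAlgebra

# Hypothetical toy retrieval step; not the RAGTools implementation.
# Each column of `chunk_embs` is the embedding of one document chunk; assumes k <= number of chunks.
function top_k_chunks(query_emb::AbstractVector, chunk_embs::AbstractMatrix, k::Int)
    q = query_emb ./ norm(query_emb)
    # Cosine similarity between the query and every chunk
    scores = [dot(q, c) / norm(c) for c in eachcol(chunk_embs)]
    # Indices of the k highest-scoring chunks, best first
    return collect(partialsortperm(scores, 1:k; rev = true))
end

chunk_embs = [1.0 0.0 0.7;
              0.0 1.0 0.7]
top_k_chunks([1.0, 0.1], chunk_embs, 2)   # [1, 3]
```

The retrieved chunk indices would then select the context passed to the generation step.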
- Stricter code parsing in `AICode` to avoid false positives (code blocks must end with `` "```\n" `` to catch comments inside text).
- Introduced an option `skip_invalid=true` for `AICode`, which allows you to include only code blocks that parse successfully (useful when the code definition is good, but the subsequent examples are not), and an option `capture_stdout=false` to avoid capturing stdout if you want to evaluate `AICode` in parallel (the `Pipe()` that we use is NOT thread-safe).
- `OllamaManagedSchema` was passing an incorrect model name to the Ollama server, often serving the default llama2 model instead of the requested model. This is now fixed.
- Fixed a bug in kwarg `model` handling when leveraging `PT.MODEL_REGISTRY`.
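
The `skip_invalid` idea, keeping only the code blocks that parse, can be approximated with `Meta.parse`: a syntax error is thrown as a `Meta.ParseError`, while incomplete input comes back as an `Expr` with head `:incomplete`. `parses` and `valid_blocks` are hypothetical helpers, not the `AICode` internals.

```julia
# Hypothetical sketch of filtering code blocks by parseability; not AICode's internals.
function parses(code::AbstractString)
    ex = try
        # Wrap in `begin ... end` so multi-expression blocks parse as one expression
        Meta.parse("begin\n" * code * "\nend")
    catch err
        err isa Meta.ParseError || rethrow()
        return false
    end
    # Incomplete input is returned as Expr(:incomplete, ...) rather than thrown
    return !(ex isa Expr && ex.head in (:error, :incomplete))
end

valid_blocks(blocks) = filter(parses, blocks)

valid_blocks(["x = 1", "y = )", "f(x) = x^2"])   # ["x = 1", "f(x) = x^2"]
```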
- Improved AICode parsing and error handling (eg, capture more REPL prompts, detect parsing errors earlier, parse more code fence types), including the option to remove unsafe code (eg, `Pkg.add("SomePkg")`) with `AICode(msg; skip_unsafe=true, verbose=true)`.
- Added new prompt templates: `JuliaRecapTask`, `JuliaRecapCoTTask`, `JuliaExpertTestCode` and updated `JuliaExpertCoTTask` to be more robust against early stopping for smaller OSS models.
- Added support for the MistralAI API via `MistralOpenAISchema()`. All their standard models have been registered, so you should be able to just use `model="mistral-tiny"` in your `aigenerate` calls without any further changes. Remember to either provide `api_kwargs.api_key` or ensure you have the ENV variable `MISTRALAI_API_KEY` set. Note: This has since been changed to `MISTRAL_API_KEY` to be compatible with the Mistral docs; refer to the versions after v0.65.
- Added support for any OpenAI-compatible API via `schema=CustomOpenAISchema()`. All you have to do is provide your `api_key` and `url` (base URL of the API) in the `api_kwargs` keyword argument. This option is useful if you use Perplexity.ai, Fireworks.ai, or any other similar services.
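
To make the `skip_unsafe` option concrete, the sketch below drops lines that match a small, hypothetical list of package-management patterns before evaluation. The real option's notion of "unsafe" may be broader or different; `UNSAFE_PATTERNS`, `is_unsafe`, and `remove_unsafe_lines` are illustrative names only.

```julia
# Hypothetical patterns; the package's own list of "unsafe" code may differ.
const UNSAFE_PATTERNS = [
    r"\bPkg\.(add|rm|update|activate|develop)\(",
    r"^\s*(using|import)\s+Pkg\b",
]

is_unsafe(line::AbstractString) = any(p -> occursin(p, line), UNSAFE_PATTERNS)

# Keep only the lines that match none of the patterns
remove_unsafe_lines(code::AbstractString) =
    join(filter(!is_unsafe, split(code, '\n')), '\n')

remove_unsafe_lines("using Pkg\nPkg.add(\"SomePkg\")\nx = 1 + 1")   # "x = 1 + 1"
```

Line-based filtering is deliberately simple; it can miss calls spread over several lines, which is one reason a real implementation might inspect parsed expressions instead.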
- Introduced a set of utilities for working with generated Julia code (eg, extract code-fenced Julia code with `PromptingTools.extract_code_blocks`) or simply apply `AICode` to the AI messages. `AICode` tries to extract, parse, and eval Julia code; if it fails, both stdout and errors are captured. It is useful for generating Julia code and, in the future, creating self-healing code agents.
- Introduced the ability to have multi-turn conversations. Set the keyword argument `return_all=true` and `ai*` functions will return the whole conversation, not just the last message. To continue a previous conversation, provide it to the keyword argument `conversation`.
- Introduced schema `NoSchema` that does not change the message format; it merely replaces the placeholders with user-provided variables. It serves as the first pass of the schema pipeline and allows more code reuse across schemas.
- Support for project-based and global user preferences with Preferences.jl. See the `?PREFERENCES` docstring for more information. It allows you to persist your configuration and model aliases across sessions and projects (eg, if you would like to default to Ollama models instead of OpenAI's).
- Refactored `MODEL_REGISTRY` around the `ModelSpec` struct, so you can record the name, schema(!) and token cost of new models in a single place. The biggest benefit is that your `ai*` calls will now automatically look up the right model schema, eg, no need to define the schema explicitly for your Ollama models! See `?ModelSpec` for more information and `?register_model!` for an example of how to register a new model.
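
The extract-parse-eval flow described for `AICode` can be sketched in a few lines: pull out Julia fenced blocks with a regex, then evaluate them in a fresh anonymous module so they don't pollute `Main`. `extract_julia_blocks` is hypothetical (it is not `PromptingTools.extract_code_blocks`), and this sketch does none of AICode's error capture or safety checks.

```julia
# Hypothetical sketch; not `PromptingTools.extract_code_blocks` or `AICode`.
const FENCE = "`"^3   # a Markdown code fence, built programmatically

function extract_julia_blocks(text::AbstractString)
    # Non-greedy match from an opening "julia" fence to the next closing fence;
    # the "s" flag lets `.` match newlines so blocks can span several lines
    pattern = Regex(FENCE * "julia\\n(.*?)\\n" * FENCE, "s")
    return [String(m.captures[1]) for m in eachmatch(pattern, text)]
end

msg = "Sure:\n" * FENCE * "julia\nx = 20\n" * FENCE *
      "\nand then\n" * FENCE * "julia\ny = x + 22\n" * FENCE

blocks = extract_julia_blocks(msg)                     # ["x = 20", "y = x + 22"]
sandbox = Module(:Sandbox)                             # fresh module that imports Base
result = include_string(sandbox, join(blocks, "\n"))   # 42
```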
- Changed the type of global `PROMPT_SCHEMA::AbstractPromptSchema` for an easier switch to local models as a default option.
- `API_KEY` global variable has been renamed to `OPENAI_API_KEY` to align with the name of the environment variable and preferences.
- Added support for prompt templates with the `AITemplate` struct. Search for suitable templates with `aitemplates("query string")` and then simply use them with `aigenerate(AITemplate(:TemplateABC); variableX = "some value") -> AIMessage` or use a dispatch on the template name as a `Symbol`, eg, `aigenerate(:TemplateABC; variableX = "some value") -> AIMessage`. Templates are saved as JSON files in the folder `templates/`. If you add new templates, you can reload them with `load_templates!()` (notice the exclamation mark to override the existing `TEMPLATE_STORE`).
- Added the `aiextract` function to extract structured information from text quickly and easily. See `?aiextract` for more information.
- Added `aiscan` for image scanning (ie, image comprehension tasks). You can transcribe screenshots or reason over images as if they were text. Images can be provided either as a local file (`image_path`) or as a URL (`image_url`). See `?aiscan` for more information.
- Added support for Ollama.ai's local models. Only the `aigenerate` and `aiembed` functions are supported at the moment.
- Added a few non-coding templates, eg, verbatim analysis (see `aitemplates("survey")`) and meeting summarization (see `aitemplates("meeting")`), and supporting utilities (non-exported): `split_by_length` and `replace_words` to make it easy to work with smaller open source models.
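
Templates are filled by substituting the user-provided variables (like `variableX` above) for placeholders in the template text. Assuming a mustache-style `{{variable}}` placeholder syntax (an assumption here; check the JSON files in `templates/` for the exact format), the substitution step can be sketched as below. `render_placeholders` is hypothetical and is not the `AITemplate` machinery.

```julia
# Hypothetical sketch; assumes `{{name}}` placeholders. Not PromptingTools' renderer.
function render_placeholders(template::AbstractString; kwargs...)
    out = String(template)
    for (key, value) in kwargs
        # Replace every occurrence of `{{key}}` with the stringified value
        out = replace(out, "{{$(key)}}" => string(value))
    end
    return out
end

render_placeholders("Summarize {{text}} in {{n}} words."; text = "the meeting notes", n = 50)
# "Summarize the meeting notes in 50 words."
```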