Ollama bug fixes (#667)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add saving to llama.cpp GGML in save.py.

* Fix the conversion command and the path of the convert-to-GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix #2 for saving lora

* Test fix #3 to save the lora adapters for conversion to GGML

* Removed unwanted tokenizer saving for conversion to GGML and added a few print statements.

* The tokenizer was needed for saving, so added it back; also made it more Unsloth-style by using positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to an older version of the code and added a few comments.

* Test fix 1 for arch

* Test fix 2 for a new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add the option to choose a directory for saving local GGML.

* Fix small variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone it with `--recursive`, it works.

Co-authored-by: Daniel Han <[email protected]>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <[email protected]>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <[email protected]>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Auto check rope scaling

* Update llama.py

* Update llama.py

* Update llama.py

* GPU support

* Typo

* Update gemma.py

* gpu

* Multiple GGUF saving

* Update save.py

* Update save.py

* check PEFT and base

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update chat_templates.py

* Fix breaking bug in save.py with interpreting quantization_method as a string when saving to gguf (#651)

* Nightly (#649)

* Fix bug in save.py where interpreting quantization_method as a string prevented GGUF from saving

* Implemented better list handling, then fixed a follow-up bug where the new list variable was never actually used

* Check the type of the given quantization method and raise a TypeError if it is neither a list nor a string (see the sketch after this list)

* Update save.py
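
For reference, a minimal sketch of the string-or-list handling these bullets describe (illustrative only; `normalize_quantization_method` is a hypothetical helper, and the `ALLOWED_QUANTS` values shown are an assumed subset of what save.py actually accepts):

```python
# Illustrative sketch of the behaviour described above, not the project's exact code.
ALLOWED_QUANTS = {"not_quantized", "fast_quantized", "quantized", "q4_k_m", "q5_k_m", "q8_0"}

def normalize_quantization_method(quantization_method):
    # Accept a single string or a list of strings; reject anything else early.
    if isinstance(quantization_method, str):
        quantization_method = [quantization_method]
    elif not isinstance(quantization_method, list):
        raise TypeError(
            "quantization_method must be a string or a list of strings, "
            f"but got {type(quantization_method).__name__}."
        )
    for quant_method in quantization_method:
        if quant_method not in ALLOWED_QUANTS:
            raise RuntimeError(f"Unsloth: '{quant_method}' is not a supported quantization method.")
    return quantization_method

print(normalize_quantization_method("q8_0"))              # ['q8_0']
print(normalize_quantization_method(["q4_k_m", "q8_0"]))  # ['q4_k_m', 'q8_0']
```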

---------

Co-authored-by: Daniel Han <[email protected]>
Co-authored-by: Michael Han <[email protected]>
Co-authored-by: Eliot Hall <[email protected]>
Co-authored-by: Rickard Edén <[email protected]>
Co-authored-by: XiaoYang <[email protected]>
Co-authored-by: Oseltamivir <[email protected]>
Co-authored-by: mahiatlinux <[email protected]>
Co-authored-by: Sébastien De Greef <[email protected]>
Co-authored-by: Alberto Ferrer <[email protected]>
Co-authored-by: Thomas Viehmann <[email protected]>
Co-authored-by: Walter Korman <[email protected]>

* Revert "Fix breaking bug in save.py with interpreting quantization_method as …" (#652)

This reverts commit 30605de.

* Revert "Revert "Fix breaking bug in save.py with interpreting quantization_me…" (#653)

This reverts commit e2b2083.

* Update llama.py

* peft

* patch

* Update loader.py

* retrain

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* offload

* Update llama.py

* Create a starter script for command-line training to integrate into ML ops pipelines. (#623)

* Update chat_templates.py

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Ollama

* Update chat_templates.py

* ollama

* Update mapper.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

---------

Co-authored-by: Michael Han <[email protected]>
Co-authored-by: Eliot Hall <[email protected]>
Co-authored-by: Rickard Edén <[email protected]>
Co-authored-by: XiaoYang <[email protected]>
Co-authored-by: Oseltamivir <[email protected]>
Co-authored-by: mahiatlinux <[email protected]>
Co-authored-by: Sébastien De Greef <[email protected]>
Co-authored-by: Alberto Ferrer <[email protected]>
Co-authored-by: Thomas Viehmann <[email protected]>
Co-authored-by: Walter Korman <[email protected]>
Co-authored-by: ArcadaLabs-Jason <[email protected]>
12 people authored Jun 19, 2024
1 parent 8770308 commit c053e42
Showing 3 changed files with 126 additions and 65 deletions.
103 changes: 47 additions & 56 deletions unsloth/chat_templates.py
@@ -23,7 +23,6 @@
"apply_chat_template",

"test_construct_chat_template",
"create_ollama_modelfile",
]

from transformers import StoppingCriteria, StoppingCriteriaList
@@ -1079,14 +1078,29 @@ def construct_chat_template( \
)
pass

# Check tokenizer types
tokenizer_name = tokenizer.name_or_path.lower()
if tokenizer_name.startswith(("unsloth/llama-3-8b-instruct", "unsloth/llama-3-70b-instruct")):
# Add <|eot_id|>
extra_eos_tokens.append("<|eot_id|>")
elif ("<|eot_id|>" in extra_eos_tokens or "<|eot_id|>" in chat_template) and \
tokenizer_name.startswith(("unsloth/llama-3-8b", "unsloth/llama-3-70b")):
# Warn
logger.warning(
"Unsloth: Base llama-3 models did not train <|eot_id|>.\n"\
"Please use the instruct version or use <|end_of_text|>"
)
pass
extra_eos_tokens = list(set(extra_eos_tokens))

count_eos = 0
for eos in extra_eos_tokens:
count_eos += len(re.findall(r"{OUTPUT}" + eos.encode("unicode-escape").decode("utf-8"), chat_template))
count_eos += len(re.findall(r"{OUTPUT}" + re.escape(eos), chat_template))
pass
if count_eos == 0:
logger.warning("Unsloth: We automatically added an EOS token to stop endless generations.")
eos = extra_eos_tokens[0]
chat_template = re.sub(r"{OUTPUT}", r"{OUTPUT}" + eos.encode("unicode-escape").decode("utf-8"), chat_template)
chat_template = re.sub(r"{OUTPUT}", r"{OUTPUT}" + eos, chat_template)
pass

# O(N^2) search finding 2 repeated pieces of text
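
A quick aside on the `re.escape` change in the hunk above: EOS strings such as `<|eot_id|>` contain the regex metacharacter `|`, so splicing the raw token into a pattern turns it into an alternation and miscounts matches. A small standalone illustration (not code from the diff):

```python
import re

chat_template = "{INPUT}...{OUTPUT}<|eot_id|>"
eos = "<|eot_id|>"

# Raw token: '|' acts as alternation, so the pattern splits into
# '{OUTPUT}<', 'eot_id' and '>' and matches three fragments instead of one.
print(re.findall(r"{OUTPUT}" + eos, chat_template))             # ['{OUTPUT}<', 'eot_id', '>']

# Escaped token: the EOS string is matched literally, exactly once.
print(re.findall(r"{OUTPUT}" + re.escape(eos), chat_template))  # ['{OUTPUT}<|eot_id|>']
```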
@@ -1151,7 +1165,9 @@ def construct_chat_template( \
# Check bos_token is in system prompt
ollama_system = system_part
has_bos_token = False
always_bos_token = False
if tokenizer("A").input_ids[0] == getattr(tokenizer, "bos_token_id", None):
always_bos_token = True
if ollama_system.startswith(tokenizer.bos_token):
has_bos_token = True
ollama_system = ollama_system[len(tokenizer.bos_token):]
@@ -1166,11 +1182,6 @@ def construct_chat_template( \
input_modelfile = "{{ if .Prompt }}" + input_part .replace("{INPUT}", "{{ .Prompt }}") + "{{ end }}"
output_modelfile = output_part.replace("{OUTPUT}", "{{ .Response }}")

# Check if EOS token is at the end of the output
if not output_modelfile.endswith(tuple(extra_eos_tokens)):
output_modelfile += "{__EOS_TOKEN__}"
pass

# Ollama EOS
ollama_eos = get_ollama_eos_tokens(tokenizer, extra_eos_tokens)
ollama_eos = '\n'.join(f'PARAMETER stop "{eos}"' for eos in ollama_eos)
@@ -1215,32 +1226,30 @@ def process(part, which, content = "message['content']"):
partial_system = process(system_part, "{SYSTEM}", "messages[0]['content']")
partial_system = partial_system.replace("{SYSTEM}", "")

# If {SYSTEM} is non existent, simply just use the content
if "{SYSTEM}" not in partial_system:
partial_system = "messages[0]['content']"
else:
if "{SYSTEM}" in partial_system:
if default_system_message is None:
raise RuntimeError("Unsloth: Please specify a default system message!")
pass

# Separate the BOS
if has_bos_token:
partial_system = partial_system.replace(tokenizer.bos_token, "", 1)
system_part = system_part .replace(tokenizer.bos_token, "", 1)
pass

partial_system = \
"{% if messages[0]['role'] == 'system' %}"\
"{{ " + partial_system + " }}"\
"{% set loop_messages = messages[1:] %}"
if default_system_message is not None:
full_system = system_part.replace("{SYSTEM}", default_system_message)
if "{SYSTEM}" in system_part:
modelfile += '\nSYSTEM: "' + default_system_message + '"'
pass
partial_system += "{% else %}"\
"{{ '" + full_system + "' }}"\
"{% set loop_messages = messages %}"\
"{% endif %}"

# Add to modelfile
modelfile += '\nSYSTEM "' + full_system + '"'
else:
partial_system += "{% endif %}"
pass
@@ -1251,6 +1260,22 @@ def process(part, which, content = "message['content']"):
jinja_template = "{{ bos_token }}" + jinja_template
pass

# Check if system part is the same!
jinja_template = re.sub(
r"\{\% if messages\[0\]\['role'\] \=\= 'system' \%\}\{\{ '(.+?)' \}\}"\
r"\{\% set loop\_messages \= messages\[1\:\] \%\}"\
r"\{\% else \%\}\{\{ '\1' \}\}\{\% set loop\_messages \= messages \%\}\{\% endif \%\}"\
r"\{\% for message in loop\_messages \%\}",
r"{{ '\1' }}{% for message in messages %}",
jinja_template, flags = re.MULTILINE | re.DOTALL,
)

# Check jinja template for bos
if always_bos_token:
if not jinja_template.startswith("{{ bos_token }}"):
jinja_template = "{{ bos_token }}" + jinja_template
pass

return modelfile, jinja_template
pass

@@ -1260,7 +1285,7 @@ def test_construct_chat_template():
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", token = token)

template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
chat_template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{SYSTEM}<|eot_id|><|start_header_id|>user<|end_header_id|>
@@ -1277,7 +1302,11 @@ def test_construct_chat_template():

extra_eos_tokens = None

modelfile, jinja_template = construct_chat_template(template, default_system_message, extra_eos_tokens)
modelfile, jinja_template = construct_chat_template(
tokenizer = tokenizer,
chat_template = chat_template,
extra_eos_tokens = extra_eos_tokens,
)

messages = [
{"role": "system", "content": "You are an assistant"},
@@ -1291,7 +1320,6 @@ def test_construct_chat_template():

tokenizer.chat_template = jinja_template
new_output = tokenizer.apply_chat_template(messages, tokenize = False, add_generation_prompt = True)

assert(correct_output == new_output)
pass
pass
@@ -1344,43 +1372,6 @@ def formatting_prompts_func(examples):
pass


def create_ollama_modelfile(tokenizer, gguf_location):
"""
Creates an Ollama Modelfile.
Use ollama.create(model = "new_ollama_model", modelfile = modelfile)
"""
modelfile = getattr(tokenizer, "_ollama_modelfile", None)
if modelfile is None:
raise RuntimeError(
"Unsloth: Tokenizer does not have a `ollama_modelfile` attribute.\n"\
"Please use get_chat_template(...)."
)
pass

system_message = getattr(tokenizer, "_system_message", None)
if system_message is None:
__SYSTEM_MESSAGE__ = ""
else:
__SYSTEM_MESSAGE__ = f'SYSTEM """{system_message}"""'
pass

modelfile = modelfile\
.replace("{{", "⚫@✅#🦥")\
.replace("}}", "⚡@🦥#⛵")\
.format(
__FILE_LOCATION__ = gguf_location,
__SYSTEM_MESSAGE__ = __SYSTEM_MESSAGE__,
__EOS_TOKEN__ = tokenizer.eos_token,
)\
.replace("⚫@✅#🦥", "{{")\
.replace("⚡@🦥#⛵", "}}")\
.rstrip()
pass

return modelfile
pass


def create_stopping_criteria(tokenizer, stop_word = "eos_token"):
class StoppingCriteriaSub(StoppingCriteria):
__slots__ = "stop_token", "single_match", "length",
2 changes: 2 additions & 0 deletions unsloth/models/mapper.py
@@ -47,9 +47,11 @@
"TinyLlama/TinyLlama-1.1B-Chat-v1.0",
),
"unsloth/mistral-7b-instruct-v0.1-bnb-4bit" : (
"unsloth/mistral-7b-instruct-v0.1",
"mistralai/Mistral-7B-Instruct-v0.1",
),
"unsloth/mistral-7b-instruct-v0.2-bnb-4bit" : (
"unsloth/mistral-7b-instruct-v0.2",
"mistralai/Mistral-7B-Instruct-v0.2",
),
"unsloth/llama-2-7b-chat-bnb-4bit" : (
86 changes: 77 additions & 9 deletions unsloth/save.py
@@ -891,10 +891,10 @@ def save_to_gguf(
# Map quant methods
new_quantization_method = []
for quant_method in quantization_method:
if quant_method == "not_quantized": quantization_method = model_dtype
elif quant_method == "fast_quantized": quantization_method = "q8_0"
elif quant_method == "quantized": quantization_method = "q4_k_m"
elif quant_method is None: quantization_method = "q8_0"
if quant_method == "not_quantized": quant_method = model_dtype
elif quant_method == "fast_quantized": quant_method = "q8_0"
elif quant_method == "quantized": quant_method = "q4_k_m"
elif quant_method is None: quant_method = "q8_0"

# Check if wrong method
if quant_method not in ALLOWED_QUANTS.keys():
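
Of the changed lines above, the removed ones rebound `quantization_method` (the list being iterated) instead of the loop variable `quant_method`, so the loop variable was never remapped and the list itself was clobbered with a single string. A tiny standalone illustration of the difference (not project code):

```python
# Buggy pattern (like the removed lines): rebinding the list's name, not the loop variable.
methods = ["not_quantized", "q8_0"]
for m in methods:
    if m == "not_quantized":
        methods = "f16"   # 'm' is unchanged; 'methods' is now the string "f16"
print(methods)            # f16  -- later code expecting a list of methods now sees one string

# Fixed pattern (like the added lines): rebind the loop variable and collect the results.
remapped = []
for m in ["not_quantized", "q8_0"]:
    if m == "not_quantized":
        m = "f16"
    remapped.append(m)
print(remapped)           # ['f16', 'q8_0']
```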
@@ -978,6 +978,11 @@ def save_to_gguf(
pass
pass

# If only q8_0:
if len(quantization_method) == 1 and quantization_method[0] == "q8_0":
strength = 0
pass

if strength >= 3: first_conversion = "f32"
elif strength >= 2: first_conversion = "f16"
elif strength >= 1: first_conversion = "bf16"
@@ -1008,7 +1013,7 @@ def save_to_gguf(
n_cpus *= 2
# Concurrency from https://rentry.org/llama-cpp-conversions#merging-loras-into-a-model

final_location = f"./{model_directory}-unsloth.{first_conversion.upper()}.gguf"
final_location = f"./{model_directory}/unsloth.{first_conversion.upper()}.gguf"

print(f"Unsloth: [1] Converting model at {model_directory} into {first_conversion} GGUF format.\n"\
f"The output location will be {final_location}\n"\
@@ -1072,12 +1077,12 @@ def save_to_gguf(

full_precision_location = final_location

all_saved_locations = []
all_saved_locations = [full_precision_location,]
# Convert each type!
for quant_method in quantization_method:
if quant_method != first_conversion:
print(f"Unsloth: [2] Converting GGUF 16bit into {quant_method}. This will take 20 minutes...")
final_location = f"./{model_directory}-unsloth.{quant_method.upper()}.gguf"
final_location = f"./{model_directory}/unsloth.{quant_method.upper()}.gguf"

command = f"./{quantize_location} {full_precision_location} "\
f"{final_location} {quant_method} {n_cpus}"
@@ -1365,6 +1370,29 @@ def fix_tokenizer_bos_token(tokenizer):
pass


def create_ollama_modelfile(tokenizer, gguf_location):
"""
Creates an Ollama Modelfile.
Use ollama.create(model = "new_ollama_model", modelfile = modelfile)
"""
modelfile = getattr(tokenizer, "_ollama_modelfile", None)
if modelfile is None: return None

modelfile = modelfile\
.replace("{{", "⚫@✅#🦥")\
.replace("}}", "⚡@🦥#⛵")\
.format(
__FILE_LOCATION__ = gguf_location,
)\
.replace("⚫@✅#🦥", "{{")\
.replace("⚡@🦥#⛵", "}}")\
.rstrip()
pass

return modelfile
pass
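
A hedged usage sketch for the relocated helper above (file paths and model names are placeholders; the `ollama.create` call follows the docstring's suggestion and assumes the `ollama` Python client is installed and the tokenizer was prepared with `get_chat_template(...)` so that `_ollama_modelfile` is set):

```python
modelfile = create_ollama_modelfile(tokenizer, "model/unsloth.Q8_0.gguf")

if modelfile is not None:
    # Write the Modelfile next to the GGUF, as the save functions below now do.
    with open("model/Modelfile", "w") as file:
        file.write(modelfile)

    # Then register the model locally, e.g. via the Python client:
    import ollama
    ollama.create(model = "my_unsloth_model", modelfile = modelfile)
```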


def unsloth_save_pretrained_gguf(
self,
save_directory : Union[str, os.PathLike],
@@ -1500,10 +1528,21 @@ def unsloth_save_pretrained_gguf(
new_save_directory, quantization_method, first_conversion, makefile,
)

# Save Ollama modelfile
modelfile = create_ollama_modelfile(tokenizer, all_file_locations[0])
modelfile_location = None
if modelfile is not None:
modelfile_location = os.path.join(new_save_directory, "Modelfile")
with open(modelfile_location, "w") as file:
file.write(modelfile)
pass
print(f"Unsloth: Saved Ollama Modelfile to {modelfile_location}")
pass

if fix_bos_token:
logger.warning(
f"Unsloth: ##### The current model auto adds a BOS token.\n"\
"Unsloth: ##### We removed in GGUF's chat template for you."
"Unsloth: ##### We removed it in GGUF's chat template for you."
)
pass

@@ -1520,6 +1559,15 @@ def unsloth_save_pretrained_gguf(
new_save_directory.lstrip('/.')
print(f"Saved GGUF to https://huggingface.co/{link}")
pass

# Save modelfile
if modelfile_location is not None:
username = upload_to_huggingface(
self, save_directory, token,
"GGUF converted", "gguf", modelfile_location, old_username, private,
)
print(f"Saved Ollama Modelfile to https://huggingface.co/{link}")
pass
pass
pass

@@ -1654,6 +1702,17 @@ def unsloth_push_to_hub_gguf(
new_save_directory, quantization_method, first_conversion, makefile,
)

# Save Ollama modelfile
modelfile = create_ollama_modelfile(tokenizer, all_file_locations[0])
modelfile_location = None
if modelfile is not None:
modelfile_location = os.path.join(new_save_directory, "Modelfile")
with open(modelfile_location, "w") as file:
file.write(modelfile)
pass
print(f"Unsloth: Saved Ollama Modelfile to {modelfile_location}")
pass

for file_location in all_file_locations:
print("Unsloth: Uploading GGUF to Huggingface Hub...")
username = upload_to_huggingface(
@@ -1667,10 +1726,19 @@ def unsloth_push_to_hub_gguf(
print(f"Saved GGUF to https://huggingface.co/{link}")
pass

# Save modelfile
if modelfile_location is not None:
username = upload_to_huggingface(
self, repo_id, token,
"GGUF converted", "gguf", modelfile_location, old_username, private,
)
print(f"Saved Ollama Modelfile to https://huggingface.co/{link}")
pass

if fix_bos_token:
logger.warning(
f"Unsloth: ##### The current model auto adds a BOS token.\n"\
"Unsloth: ##### We removed in GGUF's chat template for you."
"Unsloth: ##### We removed it in GGUF's chat template for you."
)
pass
pass
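
Taken together, the save.py changes mean a GGUF export can now also produce an Ollama Modelfile alongside the quantized files. A hedged end-to-end sketch (model name, output directory, and quant choices are placeholders, not taken from this commit):

```python
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

model, tokenizer = FastLanguageModel.from_pretrained("unsloth/llama-3-8b-Instruct-bnb-4bit")

# A chat template is what attaches the Ollama Modelfile template to the tokenizer;
# without it, create_ollama_modelfile() returns None and no Modelfile is written.
tokenizer = get_chat_template(tokenizer, chat_template = "llama-3")

# quantization_method may be a single string or a list (the #651 fix); each entry
# produces ./model/unsloth.<QUANT>.gguf, and ./model/Modelfile is saved next to them.
model.save_pretrained_gguf("model", tokenizer, quantization_method = ["q4_k_m", "q8_0"])
```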
