Adding simple batch example #1038
Conversation
FWIW, here's the GGUF that was giving me fits: gemma-2-9b-it. The model card says their prompt template is one thing, but looking at the metadata in the GGUF, that's not what's there, which clearly isn't right. Looks like I'm not the only one - https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/discussions/12. No idea why the weird template polluted the other conversations, though. Theoretically there could be an actual bug in there, and this template just triggers the race condition quickly thanks to it being a bit of a chaos monkey.
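(Side note, not from the original comment: one way to see which template is actually baked into a GGUF is to read the standard `tokenizer.chat_template` metadata key. The sketch below assumes LLamaSharp exposes the loaded model's metadata as a string dictionary; the property name is an assumption, so check the current API.)

```csharp
// Sketch: inspect the chat template stored in the GGUF's metadata.
// "tokenizer.chat_template" is the standard GGUF key; the Metadata property
// on the loaded weights is assumed here and may be named differently.
using var weights = LLamaWeights.LoadFromFile(parameters);
if (weights.Metadata.TryGetValue("tokenizer.chat_template", out var template))
    Console.WriteLine(template);   // compare this against the model card's template
else
    Console.WriteLine("No chat template embedded in this GGUF.");
```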
Actually, looking at my video, it is curious that the Paris answer mentioned using arithmetic to get the capital of France...
Those replies with the bad template look a lot like there's some kind of bug leaking context from one conversation to another!
Yeah, that's why I had to get about as simple as possible when walking through the code. One thing I noticed was that I'd sometimes see the same id for the chain being used in the sampler, even though they are all created independently. But I'll admit total ignorance about what I'm looking at. Not gonna stop me from poking around more tonight after the kids go to bed, though.
I'm happy to merge this as-is. There's just one thing I thought I'd mention, and you can add it if you like - I'm mostly mentioning it since you're learning things and it'll help to know!

Batching can share tokens (e.g. if you prompt two sequences with the same text), so a shared prompt only has to be evaluated once.
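(A minimal sketch of that idea, not part of the original comment: it assumes LLamaSharp's batched Conversation API with Create, Prompt, Infer and Fork; the exact calls may differ, so check the batched executor examples in the repo.)

```csharp
// Sketch only: evaluate a shared prompt once, then fork so two conversations
// reuse the same cached prompt tokens instead of decoding the prompt twice.
// Create/Prompt/Infer/Fork are assumed from LLamaSharp's batched API.
using var conversation = executor.Create();
conversation.Prompt(executor.Context.Tokenize(sharedPrompt));
await executor.Infer();

using var forked = conversation.Fork();   // both now share the prompt's tokens
// From here, conversation and forked can be prompted and sampled independently.
```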
I've cleaned up my code and added some better checking on your suggestion. Ready to merge if you don't see anything else. Once it is in there, and if you get an itch to look at the bug, it does seem to be quite reproducible with this Gemma model and the first three prompts from this sample. I'll create a new issue, although with it theoretically being low-level, it could be something that has already been caught in the llama.cpp updates since the last sync.
I'll try to make some time to look into the issue this weekend. If it is a bug, I think it's most likely a bug on our end inside the batching code.
I've been investigating this issue today; it reproduces locally by running this example, so that's been very helpful! I can work around the issue by disabling one of the features of LLamaBatch. For each token the batch stores the token itself, its position, and the set of sequences it belongs to.

The C# LLamaBatch automatically finds identical tokens and shares them, so if you add the same token at the same position for two sequences, it will automatically add 2 sequences for the same entry, instead of 2 completely independent entries. If I disable that feature, so it doesn't share any tokens between sequences, it works around the issue. As I understand it, though, that shouldn't be necessary as long as the very last token (i.e. the one that produces logits) is not shared. I haven't worked out what the issue is any further than that. It may be a bug in llama.cpp, so I'm hoping to test this out again once the next binary update is done to see if we can reproduce it.
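(To make the sharing behaviour concrete, here's a small sketch that was not part of the original comment. It assumes an `LLamaBatch.Add(token, position, sequence, logits)` overload roughly like LLamaSharp's; treat the exact signature and the int conversions as assumptions.)

```csharp
// Assumed signature: LLamaBatch.Add(LLamaToken token, LLamaPos pos, LLamaSeqId seq, bool logits).
var seqA = (LLamaSeqId)0;
var seqB = (LLamaSeqId)1;

// Same token at the same position for both sequences: LLamaBatch can collapse
// these into a single entry that lists both sequence ids, rather than keeping
// two independent entries.
for (var i = 0; i < promptTokens.Length; i++)
{
    batch.Add(promptTokens[i], i, seqA, false);
    batch.Add(promptTokens[i], i, seqB, false);
}

// The final token of each sequence is the one that requests logits, so it is
// added per sequence and should not be shared.
batch.Add(lastTokenA, promptTokens.Length, seqA, true);
batch.Add(lastTokenB, promptTokens.Length, seqB, true);
```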
I tested this with another model, Llama-3.2-3B, and it did not have any issues. It really does seem to be something wrong with that model's template, which is bizarre!
Interesting. I can get Llama 3.2 3B (Q8) to start getting confused, but it takes a bit more work. With these questions:

var messages = new[]
{
"What's 2+2?",
"Where is the coldest part of Texas?",
"What's the capital of France?",
"What's a one word name for a food item with ground beef patties on a bun?",
"What are two toppings for a pizza?",
"What american football play are you calling on a 3rd and 8 from our own 25?",
"What liquor should I add to egg nog?",
"I have two sons, Bert and Ernie. What should I name my daughter?",
"What day comes after Friday?",
"What color shoes should I wear with dark blue pants?",
"What's 2 * 3?",
"Should I order the mozzarella sticks or the potato wedges?",
"What liquor should I add to sprite?",
"What's the recipe for an old fashioned?",
"Does miller lite taste great or less filling?"
};
It's not going off the rails quite as badly without throwing a lot at it at once, but it does stay in crazy land, ordering pepperoni-flavored vodka and insisting on telling me about the weather in Texas when I ask about mixed drinks. Here are the results from Qwen 2.5.1 Coder 7B Instruct. Far less crazy, but it becomes insistent on answering what the coldest part of something is here and there.

As you said, this might be low-level enough that just getting llama.cpp updated sorts things out. If not, I might go down the route of trying to also reproduce this in raw llama.cpp, if my brain still remembers any C++ from 25 years ago. Oh, and thanks again for taking the time to look.
I was really struggling to wrap my head around batched execution and got caught up in the weeds on an issue that turned out to be a bad prompt template in a GGUF I was using. Anyway, I created a straightforward "run these prompts as a batch" sample that was quite helpful to me while debugging (a rough outline of the idea is sketched below), so I figured I'd toss it out there for inclusion with the samples if you see fit. If not, no worries, it has already served its purpose for me.
Recording.2025-01-08.140929.mp4
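(A rough outline of what such a sample might look like, added here only for orientation. It assumes LLamaSharp's batched Conversation API - Create, Prompt, Infer - and uses a hypothetical SampleNextToken helper as a stand-in for the real sampling calls; see the batched executor examples in the repo for the exact API.)

```csharp
// Hypothetical outline of a "run these prompts as a batch" sample.
// Create/Prompt/Infer are assumed from LLamaSharp's batched API;
// SampleNextToken is a made-up placeholder for the real sampler calls.
var conversations = prompts
    .Select(p =>
    {
        var c = executor.Create();                  // one sequence per prompt
        c.Prompt(executor.Context.Tokenize(p));     // queue the prompt tokens
        return c;
    })
    .ToList();

for (var i = 0; i < maxTokens; i++)
{
    await executor.Infer();                         // one decode advances every conversation

    foreach (var c in conversations)
    {
        var token = SampleNextToken(c);             // placeholder: sample from this conversation's logits
        c.Prompt(token);                            // feed the sampled token back in for the next step
    }
}
```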