Model conditions & context
Today we are looking at the smallest member of Mistral’s 3rd generation family: the Ministral 3B. As tiny as it is, its weight is actually close to 4B. Despite its shortcomings, its propensity for bullet points, and its constant eagerness to be of service and helpful, Mistral models are some of my favorites. They are chatty and have this peculiar flair. Let’s dive right in and see how this tiny giant did.
I picked a very simple prompt that mixes code generation, some reasoning logic, and writing prowess. I’m looking at a number of metrics of interest to me: VRAM usage, GPU utilization, GPU wattage and temperature, CPU utilization, token throughput, total tokens written, and total time to response. All of these matter because they not only help me match the best model to my hardware, but also tell me which model provides the best UX for me as an end user. I focus on the overall quality of the answer, but most importantly on the reasoning and on how well the model explains things to a novice in the field. My goal here is to show whether the model is usable and good enough to help someone who is learning to code or write code.
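For readers who want to collect the same GPU-side numbers, here is a minimal sketch of how I could poll them with `nvidia-smi`. The query fields are standard `nvidia-smi` options; the parsing assumes the `csv,noheader,nounits` output format, and the helper names are my own, not part of any tool.

```python
import subprocess

# Fields roughly matching the metrics discussed above:
# VRAM used, GPU utilization, power draw, temperature.
QUERY = "memory.used,utilization.gpu,power.draw,temperature.gpu"

def parse_gpu_stats(csv_line: str) -> dict:
    """Turn one 'csv,noheader,nounits' line into a labeled dict."""
    mem, util, power, temp = [v.strip() for v in csv_line.split(",")]
    return {
        "vram_mb": float(mem),
        "gpu_util_pct": float(util),
        "power_w": float(power),
        "temp_c": float(temp),
    }

def read_gpu_stats() -> dict:
    """Run nvidia-smi once and parse its single output line."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return parse_gpu_stats(out)
```

Calling `read_gpu_stats()` in a loop during a generation run gives a rough utilization trace; a proper monitor like `nvtop` does the same job interactively.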
| Specs | Value |
|---|---|
| Linux Distro | Ubuntu Server 24.04.4 LTS |
| Linux Kernel | 6.8.0-101 |
| CPU | Intel Core i7-14700K (14th Gen), Cores: 8P/12E, Threads: 28 |
| Motherboard | MSI PRO B660M-A |
| RAM | 80 GB DDR4 (32+16+32+16) |
| SSD | Crucial NVME 1TB |
| GPU | MSI NVidia GeForce RTX 5060Ti Shadow 2X OC PCIe 5.0×8 |
| CUDA Cores | 4,608 |
| VRAM | 16 GB GDDR7 128-bit 448 GB/s |
| GPU Driver | NVidia 590.48.01 |
| CUDA version | 13.1 |
| Ollama version | 0.23.0 |
| Model | Ministral 3 3B |
| Quantization | Q4 K M |
The prompt
Write a simple Python function that checks if a number is prime. Explain how it works in plain English, like you're teaching a beginner.
The results
As I’ve said above, Ministral-3 3B is the smallest member of Mistral’s Ministral-3 family, and it is surprisingly capable. At Q4_K_M quantization it loads in just 4.3GB of VRAM, leaving the bulk of my RTX 5060Ti free for other tasks (cough cough… other models… LOL). It averaged 155 tokens per second across three consecutive runs, faster than any model we have benchmarked to date, including the Nemotron Mini from a few days ago.
| Model | Quant | Run | Tokens/sec | Total Time | Tokens Written | VRAM | GPU Util |
|---|---|---|---|---|---|---|---|
| Ministral 3 3B | Q4_K_M | 1 | 159.09 | 3.5s | 523 | 4.3GB | 94% |
| Ministral 3 3B | Q4_K_M | 2 | 154.61 | 3.7s | 570 | 4.3GB | 94% |
| Ministral 3 3B | Q4_K_M | 3 | 150.52 | 4.7s | 712 | 4.3GB | 94% |
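As a note on methodology: Ollama’s API reports `eval_count` (tokens generated) and `eval_duration` (generation time in nanoseconds) in its response, and throughput figures like the ones in the table can be derived from those two fields. The `eval_duration` value below is an illustrative number I picked to roughly match run 1, not the actual logged value.

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Throughput from Ollama's response fields: tokens / seconds."""
    return eval_count / (eval_duration_ns / 1e9)

# Run 1 wrote 523 tokens; ~3.29s of pure generation time is assumed here.
run1 = {"eval_count": 523, "eval_duration": 3_287_000_000}
print(round(tokens_per_second(run1["eval_count"], run1["eval_duration"]), 2))  # → 159.11
```

Note that the table’s “Total Time” also includes prompt evaluation, which is why it is slightly longer than `eval_duration` alone.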
What makes this result even more interesting is the context behavior. Each run accumulated tokens from previous turns, growing from 621 to 1,174 to 1,774 input tokens respectively. Speed dropped from 159 to 154 to 150 tokens per second, roughly a 5% decline as the context nearly tripled. For a 3B model, that is what I’d consider exceptional context scaling. It holds up. And let’s remind ourselves that even that last run at 150 t/s is still stupidly fast.
Output quality held up across all three runs. The model produced correct, well-structured Python with appropriate optimization; at least that is what Claude is trying to convince me of, apparently approving of Mistral’s creations. Run 2 produced a simpler, unoptimized version, but run 3 recovered with the best output of the set: more examples, a clearer explanation, and the full sqrt optimization. And that right there rang a big bell for me! I’ve seen this before. Its bigger siblings did something in that vein: a simpler second run, then a home run on the third. As far as the context window is concerned, the bolts are holding: the model held up well, with no hallucinations or loss of task awareness, and it was pinpoint accurate. In the interest of full disclosure, I did not stretch it that far; even though it has an insane maximum context window ceiling of 262k tokens, I had OWUI set to a measly 4k. Never reached it.
The conclusion – TL;DR
Ministral-3 3B: 4.3GB of VRAM, 155 t/s, a 256K context, vision and tool use included. What else do you want? It is the fastest model we’ve benchmarked. If you need a small, fast, capable local model, this is it right here and you should give it a try. Adjust your expectations if you want it to be a generalist with a lot of trivia knowledge. It may or may not take you on an acid trip; I haven’t tried to push it to hallucinate. But for lightweight tasks, quick coding help, or fast local inference where VRAM is at a premium, Ministral-3 3B is a serious contender.
It is not every day that you see a 3B model with vision capability. That means you can feed it images or screenshots of your spreadsheets or PDF documents and it will extract the content.
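For the curious, Ollama’s chat API accepts base64-encoded images alongside the prompt. The sketch below only builds the request payload rather than sending it; the model tag and the question are examples, not something from my benchmark runs.

```python
import base64
import json

def build_vision_request(image_bytes: bytes, question: str) -> str:
    """Assemble an Ollama /api/chat payload with an attached image."""
    payload = {
        "model": "ministral-3:3b",  # example tag; use whatever tag you pulled
        "messages": [{
            "role": "user",
            "content": question,
            # Ollama expects images as base64 strings in the message.
            "images": [base64.b64encode(image_bytes).decode("ascii")],
        }],
        "stream": False,
    }
    return json.dumps(payload)

req = build_vision_request(b"\x89PNG...", "What totals are in this spreadsheet?")
```

POSTing that JSON to `http://localhost:11434/api/chat` would send the screenshot along with the question.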
That is invaluable. And that gargantuan 256k-token context window? I’ll just say this: consider a scenario where you have a very long text document, or better yet a procedure or policy library, that you need to read and overhaul, and where you need to find relationships between various policies. Lots of text. Lots.
A model like this could ingest that data, read it, and help you figure it out, all while running on a simple laptop with a 4GB graphics card. Times are changing…
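A quick back-of-the-envelope check shows how much text that window holds. The 4-characters-per-token ratio is a rough rule of thumb for English prose, not an exact tokenizer count, and the page-size figure is an assumption of mine:

```python
CONTEXT_WINDOW = 262_144  # ~256K tokens

def fits_in_context(text: str, reserve_for_reply: int = 4_096) -> bool:
    """Rough estimate: ~4 characters per token for English text."""
    est_tokens = len(text) // 4
    return est_tokens + reserve_for_reply <= CONTEXT_WINDOW

# A 300-page policy library at an assumed ~3,000 characters per page:
library = "x" * (300 * 3_000)
print(fits_in_context(library))  # → True
```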
Anyone who despises AI and rejects it will later resent that decision. I’ve been saying it for a few years now: AI is not here to replace our jobs. But it will replace the jobs of those who refuse to embrace it for the sake of efficiency.
AI is not the end-all, be-all. And right now it is an overblown bubble. It will burst and self-correct, but it isn’t going anywhere. It’s here to stay. And now, having such a capable model running on an ordinary laptop?
That makes counter-arguments sound like excuses.
You get my point…
The AI output (pasted & unedited)
Simple Prime Number Checker (Beginner-Friendly!)
Here’s a super easy Python function to check if a number is prime, explained step-by-step like we’re teaching a kid:
Python Code:
```python
def is_prime(num):
    if num <= 1:          # 1 or smaller? NOPE, not prime!
        return False
    if num == 2:          # 2 is the ONLY even prime number!
        return True
    if num % 2 == 0:      # If divisible by 2, it’s NOT prime (e.g., 4, 6, 8)
        return False
    # Check divisors from 3 up to sqrt(num) (only odd numbers)
    for i in range(3, int(num**0.5) + 1, 2):
        if num % i == 0:  # If divisible by ANY number, it’s NOT prime!
            return False
    return True           # If no divisors found, IT’S PRIME!
```
How It Works (Plain English):
Imagine you’re checking if 17 is prime:
- First, we rule out easy cases:
- If the number is 1 or less → Not prime (e.g., 0, 1, 4).
- If the number is 2 → Prime! (2 is the only even prime number).
- If the number is even and bigger than 2 (like 4, 6, 8) → Not prime (because it’s divisible by 2).
- Now, we check divisors:
- For 17, we check numbers from 3 up to √17 (≈4.12), but only odd numbers (3, 5, etc.).
- 17 ÷ 3? No (remainder is 2) → Keep going.
- 17 ÷ 5? No (remainder is 2) → STOP! No divisors found!
- If no divisors divide evenly, the number is prime!
- Since 17 isn’t divisible by any number except itself, it’s prime!
Example Runs:
```python
print(is_prime(7))   # True (7 is prime)
print(is_prime(10))  # False (10 is not prime)
print(is_prime(1))   # False (1 is not prime)
print(is_prime(13))  # True (13 is prime)
print(is_prime(15))  # False (15 is not prime)
```
Why This Works (Optimized Version):
- Faster than checking every number (e.g., for 100,000, it only checks up to 316).
- Skips even numbers (since they can’t be prime except for 2).
- Stops at the square root (because if a number has a factor bigger than its square root, it already has a smaller factor).
Try It Yourself!
Run the code and test it with different numbers. Want me to explain any part in more detail? 😊
Need Help?
Need help streamlining your processes or solving tricky business problems? I offer one-on-one consultations to get you unstuck fast. Book a free consultation with me today at goarcherdynamics.com.
Want more practical tips and workflow hacks? I publish them regularly on my blog — check it out and subscribe for newsletter updates: goarcherdynamics.com