Conditions & Context
Today we are going back to France! On the table is Mistral Small 3.1, with a decent 24B parameters. It will be a tight squeeze onto a 16GB GPU, so I expect some CPU cores to light up, but let’s see whether it does as badly as Qwen. Let’s dive right in!
I picked a very simple prompt that mixes code generation, some reasoning, and writing prowess. I’m tracking several metrics that matter to me: how much VRAM the model uses, GPU utilization, GPU wattage and temperature, CPU utilization, token throughput, total number of tokens written, and total time to response. These numbers matter because they not only help me match the best model to my hardware, but also determine the quality of the UX I get as an end user. I look at the overall quality of the answer, but most importantly at the reasoning and at how well it explains things to a novice in the field. My goal here is to show whether the model is usable and good enough to help someone who is learning to code or looking for help writing code.
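If you want to collect throughput numbers like these yourself, Ollama’s REST API reports them with each response. The following is a minimal sketch and not necessarily how the figures in this post were gathered. It assumes Ollama is listening on its default `http://localhost:11434`, and the model tag `mistral-small3.1:24b` is an assumption, so replace it with whatever `ollama list` shows on your machine. A non-streaming `/api/generate` response includes `eval_count` (tokens generated) plus `eval_duration` and `total_duration` (both in nanoseconds), which is enough to derive tokens per second and wall time. VRAM, wattage, and temperature are not part of that response. Running `nvidia-smi` in a second terminal covers those.

```python
import json
import urllib.request

# Assumptions: default Ollama endpoint and an assumed model tag;
# check `ollama list` for the exact tag installed on your machine.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "mistral-small3.1:24b"

PROMPT = (
    "Write a simple Python function that checks if a number is prime. "
    "Explain how it works in plain English, like you're teaching a beginner."
)


def summarize_metrics(resp: dict) -> dict:
    """Derive throughput figures from an Ollama /api/generate response.

    Ollama reports all durations in nanoseconds.
    """
    tokens = resp["eval_count"]                # tokens generated
    eval_s = resp["eval_duration"] / 1e9       # generation time only
    total_s = resp["total_duration"] / 1e9     # wall time for the request
    return {
        "tokens_written": tokens,
        "tokens_per_s": round(tokens / eval_s, 2),
        "total_time_s": round(total_s, 1),
    }


def run_once(prompt: str = PROMPT) -> dict:
    """Send one non-streaming request and return the summarized metrics."""
    payload = json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as r:
        return summarize_metrics(json.load(r))
```

With Ollama running, calling `run_once()` three times in a row gives you tokens written, tokens/s, and total time for each run, in the same shape as the results table further down. `eval_duration` covers generation only. `total_duration` also includes model loading and prompt processing, so a cold first call will show a longer total time.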
| Specs | Value |
|---|---|
| Linux Distro | Ubuntu Server 24.04.4 LTS |
| Linux Kernel | 6.8.0-101 |
| CPU | Intel Core i7-14700K (14th Gen), 20 cores (8P + 12E), 28 threads |
| Motherboard | MSI PRO B660M-A |
| RAM | 80 GB DDR4 (32+16+32+16) |
| SSD | Crucial NVMe 1TB |
| GPU | MSI NVIDIA GeForce RTX 5060 Ti Shadow 2X OC, PCIe 5.0 ×8 |
| CUDA Cores | 4,608 |
| VRAM | 16 GB GDDR7, 128-bit, 448 GB/s |
| GPU Driver | NVIDIA 590.48.01 |
| CUDA version | 13.1 |
| Ollama version | 0.17.4 |
| Model | Mistral Small 3.1 24B |
| Quantization | Q4_K_M |
The Prompt
Write a simple Python function that checks if a number is prime. Explain how it works in plain English, like you're teaching a beginner.
The Results
Let’s face it: a nearly 15GB model on a 16GB graphics card is pushing the limits, and then some. It has not quite been thrown over the fence yet, but it is teetering gingerly on top, unable to decide which way to fall. This model is a chugger on my 5060 Ti, no bones about it. It barely ekes out 20 tokens a second, while GPT-OSS 20B runs nearly five times as fast. But… there is a big “but”.
This model is GOOD!
| Model | Quant | Run | Tokens/s | Total Time (s) | Tokens Written | VRAM (GB) | GPU Util (%) |
|---|---|---|---|---|---|---|---|
| Mistral Small 3.1 | Q4_K_M | 1 | 20.17 | 29 | 593 | 14.4 | 68 |
| Mistral Small 3.1 | Q4_K_M | 2 | 20.06 | 33 | 656 | 14.4 | 68 |
| Mistral Small 3.1 | Q4_K_M | 3 | 19.91 | 31 | 614 | 14.4 | 68 |
14.4GB of VRAM, locked and immovable across all three runs. On a 16GB card, that leaves roughly 1.6GB of breathing room for active context and the KV cache. And let’s face it: you will want to stick to the default 2k context window in Open WebUI (OWUI). Go any bigger and you are toast: the model jets off to CPU land and you go for a coffee break. The model ran at a steady 68% GPU utilization, well below the rail-pinned 97% I saw from Ministral 8B. The GPU isn’t slacking. The real bottleneck is memory bandwidth, not compute. Token throughput held rock solid too: 20.17, 20.06, and 19.91 tokens/s. There was no warmup lag and no degradation. You get exactly what it says on the label, every single run. That makes it predictable, and that is good news.
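A quick back-of-the-envelope calculation shows how the bandwidth argument works. During generation, each new token requires reading roughly all of the model’s weights from VRAM. Dividing memory bandwidth by the bytes read per token therefore gives a theoretical ceiling on tokens per second. The sketch below is an estimate, not a measurement. It uses the measured 14.4GB footprint as a stand-in for bytes read per token and ignores KV-cache traffic, any layers offloaded to the CPU, and kernel overheads.

```python
# Back-of-the-envelope decode ceiling for a memory-bandwidth-bound GPU.
# Assumption: each generated token reads roughly the whole resident
# footprint once; KV-cache reads, CPU offload and overheads are ignored.

CARD_VRAM_GB = 16.0        # RTX 5060 Ti, from the spec table
BANDWIDTH_GB_S = 448.0     # 128-bit GDDR7, from the spec table
MODEL_FOOTPRINT_GB = 14.4  # measured VRAM use, from the results table
MEASURED_TOK_S = [20.17, 20.06, 19.91]  # the three runs


def headroom_gb(card_gb: float, used_gb: float) -> float:
    """Nominal VRAM left over after the model is loaded."""
    return round(card_gb - used_gb, 1)


def decode_ceiling_tok_s(bandwidth_gb_s: float, bytes_per_token_gb: float) -> float:
    """Upper bound on tokens/s if every token streams the weights once."""
    return bandwidth_gb_s / bytes_per_token_gb


ceiling = decode_ceiling_tok_s(BANDWIDTH_GB_S, MODEL_FOOTPRINT_GB)
mean_measured = sum(MEASURED_TOK_S) / len(MEASURED_TOK_S)

print(f"headroom: {headroom_gb(CARD_VRAM_GB, MODEL_FOOTPRINT_GB)} GB")  # 1.6 GB
print(f"ceiling:  {ceiling:.1f} tok/s")                                  # 31.1 tok/s
print(f"measured: {mean_measured:.2f} tok/s ({mean_measured / ceiling:.0%} of ceiling)")  # 20.05 tok/s (64% of ceiling)
```

Under these assumptions, the card tops out around 31 tokens/s, and the measured ~20 tokens/s is about 64% of that ceiling. The estimate is crude in both directions. The footprint includes buffers that are not read for every token, while KV-cache reads and any CPU offload are ignored. Treat it as an order-of-magnitude check rather than a prediction.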
Token counts were equally steady: 593 → 656 → 614. There was no progressive wordiness and none of the warmup effect behind Ministral 8B’s climbing runs. This model finds its output level and parks there. The shocker is the quality: clean code, a structured walkthrough, and an accurate step-by-step explanation aimed squarely at a beginner. For a model pressed this close to the VRAM ceiling, it has no business being this competent on my kind of GPU. Speed is not Mistral Small 3.1’s strength on a measly 16GB card, but quality is!
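The “accurate” verdict is easy to spot-check. In the sketch below, `is_prime` is copied from the model’s unedited output at the end of this post. `is_prime_naive` is a hypothetical brute-force reference written only for this comparison: it tries every divisor from 2 up to n − 1. Both functions run over every integer from −10 to 4,999, which covers the negative, 0, 1, and even-number edge cases a prime checker has to handle.

```python
def is_prime(n):
    """Model's answer, copied from the output below (comments trimmed)."""
    if n <= 1:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    for i in range(3, int(n**0.5) + 1, 2):
        if n % i == 0:
            return False
    return True


def is_prime_naive(n):
    """Hypothetical reference: try every divisor from 2 to n - 1."""
    return n > 1 and all(n % d != 0 for d in range(2, n))


# Compare both versions on every integer from -10 to 4,999.
mismatches = [n for n in range(-10, 5_000) if is_prime(n) != is_prime_naive(n)]
print("mismatches:", mismatches)  # mismatches: []
```

An empty list means the two versions agree across the whole range. For very large `n`, `int(n**0.5)` goes through floating point, and Python’s `math.isqrt` would avoid that. At beginner scale, it makes no difference.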
The Conclusion – TL;DR
Speed is not everything. Not always. There are times when a model is rather slow on your hardware and its replies feel meh. They just don’t set your soul on fire. So you just **sudo rm [fill-in-the-blank]** and move on to better things.
But then there are models that, albeit slower, have pizzazz. They sound good, they are accurate, and they go beyond what other models of this size usually bring to the table. In some situations, they can sound as convincing as the 600B frontier models you are used to chatting with.
Is this such a unicorn? I don’t know, but I will try to tweak this Mistral with a system prompt and see what it has to offer.
After all, humans are mostly comfortable reading text at about 15 tokens/sec. So 20 tokens/sec, while slow next to something like GPT-OSS 20B, is still well within the comfort zone. Would I want this model to give me long answers? Hell no!
That would take a coffee break.
But as a daily driver for short queries, knowing it is surprisingly accurate and capable? That is what I’m setting out to discover. I sense a sequel coming on this one…
P.S.:
This model reminded me in a heavy-handed way of the age-old conundrum we local model runners all face:
The moment you unbox your brand-spanking-new GPU, you wish you had bought one with more VRAM! This never ever ends, does it?
The Output (unedited)
Sure! Let’s write a simple Python function to check if a number is prime. A prime number is a number that is greater than 1 and has no divisors other than 1 and itself.
Here’s the function:
```python
def is_prime(n):
    """Check if a number is prime."""
    if n <= 1:
        return False
    if n == 2:
        return True  # 2 is the only even prime number
    if n % 2 == 0:
        return False  # Any other even number is not prime

    # Check for factors from 3 to the square root of n
    for i in range(3, int(n**0.5) + 1, 2):
        if n % i == 0:
            return False
    return True
```
Explanation in Plain English:
- Check if the number is less than or equal to 1:
  - If `n` is 1 or less, it’s not a prime number. So, we return `False`.
- Check if the number is 2:
  - If `n` is 2, it’s a prime number. So, we return `True`.
- Check if the number is even and greater than 2:
  - If `n` is an even number and greater than 2, it’s not a prime number. So, we return `False`.
- Check for factors from 3 to the square root of `n`:
  - We only need to check for factors up to the square root of `n` because if `n` has a factor larger than its square root, it must also have a corresponding factor smaller than its square root.
  - We start checking from 3 and skip even numbers by incrementing by 2 each time (`range(3, int(n**0.5) + 1, 2)`).
  - If we find any number `i` that divides `n` evenly (i.e., `n % i == 0`), then `n` is not a prime number. So, we return `False`.
- If no factors are found:
  - If we didn’t find any factors, then `n` is a prime number. So, we return `True`.
Example Usage:
```python
print(is_prime(11))  # Output: True (11 is a prime number)
print(is_prime(4))   # Output: False (4 is not a prime number)
print(is_prime(2))   # Output: True (2 is a prime number)
print(is_prime(1))   # Output: False (1 is not a prime number)
```
This function should work well for checking if a number is prime in a straightforward and efficient manner.
Need Help?
Need help streamlining your processes or solving tricky business problems? I offer one-on-one consultations to get you unstuck fast. Book a free consultation with me today at goarcherdynamics.com.
Want more practical tips and workflow hacks? I publish them regularly on my blog — check it out and subscribe for newsletter updates: goarcherdynamics.com
