Ministral 3 14B – Can Now See!

Model conditions & context

Today we are looking at Mistral’s very fresh model (I think this is from December 2025): Ministral 3 14B, Instruct variant, in Q4_K_M quantization. So, this is a much newer model than the 12B Mistral I recently tested and really liked. Let’s dive right in and see how it did!

As in all my tests, I use the same prompt, the same hardware, and the same methodology. I’m looking at the same set of metrics across every model: VRAM usage, GPU utilization, CPU load, token throughput, tokens written, and total response time. These matter to me because they reveal whether a model is actually usable on consumer hardware — not just in theory, but in practice.
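For readers who want to reproduce the throughput metric, here is a minimal sketch. It assumes the numbers are taken from the `eval_count` and `eval_duration` fields that Ollama returns with a generate response (durations are reported in nanoseconds); the post does not say exactly how the figures were collected, and the sample values below are made up to resemble Run 1.

```python
def tokens_per_second(response: dict) -> float:
    """Generation throughput from an Ollama-style response dict."""
    eval_count = response["eval_count"]            # tokens generated
    eval_duration_ns = response["eval_duration"]   # generation time in nanoseconds
    return eval_count / (eval_duration_ns / 1e9)


# Hypothetical response values, chosen to resemble Run 1 in this post.
sample = {"eval_count": 601, "eval_duration": 12_190_000_000}
print(round(tokens_per_second(sample), 1))
```

Dividing by the generation time only, rather than the total wall-clock time, keeps model load and prompt processing out of the tokens/sec figure.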

Specs	Value
Linux Distro	Ubuntu Server 24.04.4 LTS
Linux Kernel	6.8.0-101
CPU	Intel CORE i7 14th Gen 14700K Cores: 8P/12E Threads: 28
Motherboard	MSI PRO B660M-A
RAM	80 GB DDR4 (32+16+32+16)
SSD	Crucial NVME 1TB
GPU	MSI NVidia GeForce RTX 5060Ti Shadow 2X OC PCIe 5.0×8
CUDA Cores	4,608
VRAM	16 GB GDDR7 128-bit 448 GB/s
GPU Driver	NVidia 590.48.01
CUDA version	13.1
Ollama version	0.17.4
Model	Ministral 3 14B Instruct
Quantization	Q4_K_M

The prompt

			
Write a simple Python function that checks if a number is prime.
Explain how it works in plain English, like you're teaching 
a beginner.

The results

Ministral 3 14B uses 4-bit Q4_K_M quantization. This allows it to fit comfortably onto a 16 GB GPU and perhaps even a 12 GB one. Performance was rock solid. From a power perspective, the model ran the GPU at full tilt (98% utilization), never overflowed the VRAM (10 GB out of 16 GB), and drew a reasonable 150 W out of 180 W. We will dive into the numbers in a bit. First I wanted to mention how unique this model was! The word that comes to mind is “sassy”.
I said hey, but fat-fingered it into “ihey”, and the reply came back blazing fast (this model runs like a champ on a 5060Ti card) and made me giggle. It was picking on me! Right off the bat. First interaction. Its chatty personality became apparent right away.

From a performance perspective, the output came in at a respectable 48 tokens/sec or so, which is to be expected for a mid-weight model on this kind of GPU. Still, that is a very good speed and you will not be sitting there waiting. The replies start flowing in nearly instantaneously and generate at a rapid clip.

Model	Quant	Run	Tokens/sec	Total Time	Tokens Written	VRAM	GPU Util
Ministral 3 14B	Q4_K_M	1	49.3	12s	601	10GB	98%
Ministral 3 14B	Q4_K_M	2	48.46	18s	868	10GB	98%
Ministral 3 14B	Q4_K_M	3	47.57	21s	973	10GB	98%
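A rough sanity check on these speeds, offered as a sketch under a common rule of thumb rather than as a measurement: when a model generates one token at a time, the GPU has to read roughly the whole set of weights from VRAM for each token, so memory bandwidth divided by tokens/sec approximates how much data is read per token. The 448 GB/s comes from the specs table and the speeds from the three runs above; the rule of thumb itself is my assumption.

```python
MEMORY_BANDWIDTH_GB_S = 448.0  # from the specs table

# Measured tokens/sec per run, from the results table.
measured = {1: 49.30, 2: 48.46, 3: 47.57}


def implied_gb_per_token(bandwidth_gb_s: float, tokens_per_sec: float) -> float:
    """Data read per generated token if decoding is purely bandwidth-bound."""
    return bandwidth_gb_s / tokens_per_sec


for run, tps in measured.items():
    gb = implied_gb_per_token(MEMORY_BANDWIDTH_GB_S, tps)
    print(f"Run {run}: ~{gb:.1f} GB read per token")
```

All three runs land a little above 9 GB per token, which sits plausibly under the 10 GB VRAM footprint in the table. That hints that memory bandwidth, not raw compute, is what caps the speed on this card, though the estimate ignores caches and other overhead.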

This is what stood out to me: a steady climb in output length from 601 to 973 tokens across three runs. And I need to reiterate that I merely pasted the above prompt three times in a row, with no other interaction, explanation, or prompt in between.

The model went into a slight drift. Not the kind that comes from randomness, but what looked like a deliberate pattern: Run 1 delivered a clean, efficient prime check using the √n bound. By Run 2, though, the model veered toward a less efficient num//2 method before correcting itself by Run 3. It’s not a flaw; it’s a pattern. I repeated these 3 runs and saw the exact same phenomenon: the 2nd run always generated a less efficient solution, and the 3rd run went back to the better one.
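To make the difference between the two approaches concrete, here is a small sketch that counts how many trial divisions each one performs. The Run 2 code is not reproduced in this post, so the `n // 2` version below is my own reconstruction of the general idea, and both function names are made up for illustration.

```python
import math


def is_prime_sqrt(n: int) -> tuple[bool, int]:
    """Trial division up to sqrt(n). Returns (is_prime, divisions_tried)."""
    if n < 2:
        return False, 0
    checks = 0
    for d in range(2, math.isqrt(n) + 1):
        checks += 1
        if n % d == 0:
            return False, checks
    return True, checks


def is_prime_half(n: int) -> tuple[bool, int]:
    """Trial division up to n // 2. Same answer, far more work for primes."""
    if n < 2:
        return False, 0
    checks = 0
    for d in range(2, n // 2 + 1):
        checks += 1
        if n % d == 0:
            return False, checks
    return True, checks


n = 10_007  # a prime, so both loops run to completion
print(is_prime_sqrt(n))  # (True, 99)
print(is_prime_half(n))  # (True, 5002)
```

Both versions give the same answer. The gap only shows up when n is prime (or has only large factors), and it widens quickly as n grows.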

The 14B version retains the playful, typo-catching wit of the 8B but with a subtler restraint. After putting both 8B and 14B through the wringer (same prompt, same hardware, same relentless repetition), one thing became crystal clear to me: scale isn’t everything. The 8B model was more consistent and outperformed its larger sibling in critical ways. It didn’t drift, it didn’t second-guess itself, it ran twice as fast, and it delivered the optimal prime-checking algorithm every single time.

Now, I know that asking a model to generate Python code for checking prime numbers may be a bad case of me lobbing softballs at these models. It is such a trivial and frequent problem that they were all bound to have encountered it during training. So, there is an argument to be made here that the smaller 8B may have outperformed its big sister because of this: it stuck to the learned script. Maybe? IDK.

The vision bonus! It’s got OCR! Finally!

This freshly released model now has image recognition and text parsing! Thank goodness! I was struggling without this, and honestly it was one of the most significant pain points for me. I often need to show the model my screen or snippets when I’m debugging code on my Linux servers, installing and configuring services, or just need it to interpret screenshots of tables and data from Excel. Both Ministral 3 8B and 14B got the goods!

I was able to copy data from Excel and paste it into the chat, where the model readily interpreted it and repeated it back. I was also able to take a screen clipping of a table as a JPG, paste it into the chat, and have the model correctly write it back in markdown format.
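A markdown table like the one the model returns can be turned into CSV for a spreadsheet with a few lines of Python. This is only a sketch of one way to do it, not the workflow used in this post: the function name is made up, the sample table reuses numbers from the results above, and escaped pipe characters inside cells are not handled.

```python
import csv
import io


def markdown_table_to_csv(markdown: str) -> str:
    """Convert a simple pipe-delimited markdown table to CSV text."""
    rows = []
    for line in markdown.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore anything that is not a table row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the header separator row, e.g. |-----|:---:|
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


table = """
| Run | Tokens/sec | Tokens Written |
|-----|------------|----------------|
| 1   | 49.3       | 601            |
| 2   | 48.46      | 868            |
"""
print(markdown_table_to_csv(table))
```

The `csv` module takes care of quoting, so cells that contain commas still paste cleanly into Excel.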

From there I exported it as CSV and pasted it back into Excel. Easy peasy! I cannot overstate how important this feature is. This was the single biggest omission in all the models I’ve tested and a real barrier for me. When you do a lot of debugging and installations, and the worst of them all, Linux logs? Ooff! Having an AI read and interpret them is such a time saver. A significant time saver. I’ve done cat /var/log/somelogname.log more times than I can count.
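When a log is too long to paste whole, it can help to trim it to the most recent suspicious lines first. The sketch below is one hedged way to do that: the function name, the keyword list, and the sample log lines are all made up for illustration, and real logs may need different keywords.

```python
from collections import deque
from typing import Iterable


def last_matching_lines(
    lines: Iterable[str],
    keywords: tuple = ("error", "fail", "warn"),
    limit: int = 50,
) -> list:
    """Keep only the most recent `limit` lines that mention any keyword."""
    recent = deque(maxlen=limit)  # older matches fall off automatically
    for line in lines:
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            recent.append(line.rstrip("\n"))
    return list(recent)


# Made-up log lines standing in for a real file under /var/log.
sample_log = [
    "Jan 01 10:00:00 host svc[1]: started\n",
    "Jan 01 10:00:05 host svc[1]: ERROR could not bind port\n",
    "Jan 01 10:00:06 host svc[1]: retrying\n",
    "Jan 01 10:00:07 host svc[1]: Failed to start\n",
]
for line in last_matching_lines(sample_log, limit=10):
    print(line)
```

Because the function accepts any iterable of lines, an open file handle can be passed in directly instead of the sample list.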

The conclusion – TL;DR

The 14B version retains the playful, typo-catching wit of the Ministral 3 8B but with a more subtle flair. It is not as brash and edgy as its smaller sibling, but still more unfiltered than I’m used to. I would liken it to the difference between a tool that wants to give you a quick answer and a true collaborator that thinks about it, refines it, and then delivers the goods in a fun way.

If you’re building a rocket, you’ll want the most powerful engine you can get. But if you’re writing a script or debugging a code snippet, the lean, mean, whimsical 8B model, with its mischievous Loki personality, might just be the MVP you didn’t know you needed. But if you want more polish, don’t mind half the speed (which is still crazy fast!), and want a good all-rounder to run at home, this 14B Ministral 3 may be it. And hey, if it does start talking back and talking naughty? That’s just the 14B showing off its personality. That’s what the Start New Chat button is for! No?

Welcome to the future where AI has opinions.
What a concept!

PS:
This Ministral 3 14B has become my primary home model, the one I am using as we speak. Why? It’s good, and the cherry on top: it has a whopping 256k-token context window (an order of magnitude larger than some comparable models), so that helps in prolonged conversations. It matters more than I’d have thought.

The AI output (pasted & unedited)

Here’s a simple Python function to check if a number is prime, along with a plain-English explanation:

Python Function:

			
def is_prime(n):
    """Check if a number is prime."""
    if n <= 1:  # Numbers ≤ 1 are not prime
        return False
    if n == 2:  # 2 is the only even prime
        return True
    if n % 2 == 0:  # Even numbers > 2 are not prime
        return False
    # Check divisibility from 3 up to √n (skipping even numbers)
    for i in range(3, int(n**0.5) + 1, 2):
        if n % i == 0:
            return False
    return True

		

How It Works (Plain English Explanation):

Edge Cases First:
- If the number is ≤ 1 (like 0, 1, or negative numbers), it’s not prime → return False.
- If the number is 2, it’s the only even prime → return True.
- If the number is even and > 2, it’s not prime → return False.
Check for Divisors (Odd Numbers Only):
- A prime number has no divisors other than 1 and itself.
- Instead of checking all numbers up to n-1, we only check up to √n (because if n has a factor > √n, the other factor must be < √n).
- We skip even numbers (since we already ruled them out) and only check odd divisors (3, 5, 7, etc.).
Loop Through Possible Divisors:
- For each odd number i from 3 to √n, check if n % i == 0 (i.e., if i divides n evenly).
- If it does, n is not prime → return False.
- If the loop finishes without finding any divisors, n is prime → return True.

Example Usage:

			
print(is_prime(7))   # True (prime)
print(is_prime(10))  # False (divisible by 2 & 5)
print(is_prime(1))   # False (not prime)
print(is_prime(13))  # True (prime)

Why This Works Efficiently:

- No need to check even numbers after ruling out 2.
- Only checks up to √n (saves time—e.g., for n=100, we only check up to 10 instead of 99).
- Skips even divisors (faster than checking all numbers).

Would you like me to explain any part in more detail? 😊

Archer Dynamics