Conditions & context
Today we are looking at Google’s Gemma3 4B IT QAT model, which is lightweight and looks promising. Does it deliver on that promise? Let’s dive in…
As in all my tests, I use the same prompt, the same hardware, and the same methodology. I’m looking at the same set of metrics across every model: VRAM usage, GPU utilization, CPU load, token throughput, tokens written, and total response time. These matter to me because they reveal whether a model is actually usable on consumer hardware — not just in theory, but in practice.
| Specs | Value |
|---|---|
| Linux Distro | Ubuntu Server 24.04.4 LTS |
| Linux Kernel | 6.8.0-101 |
| CPU | Intel Core i7-14700K (14th Gen), Cores: 8P/12E, Threads: 28 |
| Motherboard | MSI PRO B660M-A |
| RAM | 80 GB DDR4 (32+16+32+16) |
| SSD | Crucial NVME 1TB |
| GPU | MSI NVIDIA GeForce RTX 5060 Ti Shadow 2X OC, PCIe 5.0 ×8 |
| CUDA Cores | 4,608 |
| VRAM | 16 GB GDDR7 128-bit 448 GB/s |
| GPU Driver | NVIDIA 590.48.01 |
| CUDA version | 13.1 |
| Ollama version | 0.17.4 |
| Model | Gemma3 4B IT QAT |
| Quantization | QAT |
The prompt
Write a simple Python function that checks if a number is prime. Explain how it works in plain English, like you're teaching a beginner.
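For anyone who wants to reproduce a run, here is a minimal sketch of how the prompt could be sent to a local Ollama server and the headline numbers pulled out of the response. It is an illustration, not the exact harness behind the numbers in this post. The model tag `gemma3:4b-it-qat` and the default endpoint on port 11434 are assumptions about the setup, and `run_prompt` and `summarize` are names made up for this example. The duration fields in Ollama's response are reported in nanoseconds.

```python
import json
import urllib.request

# Assumptions: Ollama is listening on its default local port and the model
# was pulled under this tag. Check `ollama list` for the exact name.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "gemma3:4b-it-qat"

PROMPT = (
    "Write a simple Python function that checks if a number is prime. "
    "Explain how it works in plain English, like you're teaching a beginner."
)


def run_prompt(prompt, model=MODEL_TAG, url=OLLAMA_URL):
    """Send one non-streaming generate request and return the parsed JSON reply."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize(result):
    """Convert Ollama's nanosecond counters into the metrics used in this post."""
    eval_seconds = result["eval_duration"] / 1e9
    return {
        "tokens_written": result["eval_count"],
        "tokens_per_sec": result["eval_count"] / eval_seconds,
        "total_seconds": result["total_duration"] / 1e9,
    }
```

Calling `summarize(run_prompt(PROMPT))` three times in a row would give one row per run. VRAM and GPU utilization are not part of this response and have to be read separately.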
The results
This was a surprising test! Gone were the boring model runs, which all looked the same — number-wise. This Gemma3 4B model shook things up a bit by doing two things:
- Gradually lowering the GPU utilization with each subsequent run
- Repeating the same reasoning error in every run! Huh?
Yes! After a promising start and the fastest token throughput I’ve ever measured on my machine, it went off a cliff. Throughput stayed nearly identical and output length was similar across runs, but GPU utilization tanked and total time climbed. The model was getting progressively slower with each run, and it kept handing more and more work from the GPU to the CPU, again lighting up all cores of my 14th-gen Core i7. All 28 threads were busy in htop.
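A drop like this is easier to pin down with a log than by watching a terminal. The sketch below is one way GPU utilization and VRAM could be sampled while a run is in progress, using the `nvidia-smi` query interface. It is not necessarily how the figures in this post were collected, and `parse_gpu_line` and `sample_gpu` are hypothetical helper names.

```python
import subprocess

# nvidia-smi prints one CSV line per GPU, e.g. "93, 5939" (percent, MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_line(line):
    """Turn one 'utilization, memory' CSV line into a small dict."""
    util, mem = (field.strip() for field in line.split(","))
    return {"gpu_util_percent": int(util), "vram_used_mib": int(mem)}


def sample_gpu():
    """Take a single reading from the first GPU (assumes nvidia-smi is on PATH)."""
    output = subprocess.run(
        QUERY, capture_output=True, text=True, check=True
    ).stdout
    return parse_gpu_line(output.strip().splitlines()[0])
```

Calling `sample_gpu()` once a second during each run would show whether utilization falls gradually or all at once. Running `ollama ps` alongside it should also report how the loaded model is split between CPU and GPU, which would help confirm or rule out offloading.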
What about the error? Well, the code looked all right, but the explanation Gemma3 4B gave did not. The model didn’t actually understand the algorithm it was explaining, because it described steps that would never happen if the script were actually executed. So: the right code, explained with the wrong reasoning. And that raises a question:
Does this model really understand the code it is writing? Or did it simply memorize this code from its training data? I’m asking because if the model wrote correct Python code that it understood, why is it explaining it incorrectly?
| Run | Quant | Tokens/sec | Total Time | Tokens Written | VRAM | GPU Util |
|---|---|---|---|---|---|---|
| 1 | QAT | 105.96 | 12s | 1,204 | 5.8GB | 93% |
| 2 | QAT | 103.20 | 18s | 1,468 | 5.8GB | 50% |
| 3 | QAT | 100.64 | 26s | 1,376 | 5.8GB | 22% |
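The table hides an interesting split. Dividing tokens written by tokens per second gives the time spent actually generating, and whatever is left of the total went somewhere else. The sketch below does that arithmetic, assuming the Tokens/sec column is a generation-only rate and Total Time is wall-clock time rounded to whole seconds. `split_time` is a name made up for this example.

```python
# Figures copied from the results table above.
runs = [
    {"run": 1, "tokens_per_sec": 105.96, "total_s": 12, "tokens": 1204},
    {"run": 2, "tokens_per_sec": 103.20, "total_s": 18, "tokens": 1468},
    {"run": 3, "tokens_per_sec": 100.64, "total_s": 26, "tokens": 1376},
]


def split_time(run):
    """Return (seconds spent generating, seconds not explained by generation)."""
    generation_s = run["tokens"] / run["tokens_per_sec"]
    return generation_s, run["total_s"] - generation_s


for run in runs:
    generation_s, other_s = split_time(run)
    print(f"Run {run['run']}: ~{generation_s:.1f}s generating, ~{other_s:.1f}s elsewhere")
```

This prints roughly 11.4s, 14.2s, and 13.7s of generation against about 0.6s, 3.8s, and 12.3s of other time. Under those assumptions, generation speed barely moved and the growing total time came from something outside token generation. The arithmetic does not say what that something is.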
The conclusion – TL;DR
This one is a hard no for me. Testing this small model got me wondering whether it was even optimized to run on the CUDA cores of a regular GPU. Having learned my lesson from Gemma3 27B and its glacial pace, I did not have high expectations for Gemma3 4B. It started off very fast, but the fact that it kept getting slower with every run was odd.
What sealed its fate was that the model’s output was factually wrong. It had an error in its reasoning about why 7 is a prime number. The Python code was correct, but the explanation was very confident, yet wrong! The code would have confirmed that 7 is prime, yet the model went off on a tangent, claiming iterations of i=3, then 5, then 7, and concluding that the function returns False because 7 is divisible by 7. What???
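The mismatch is easy to check by running the model's own function and listing the values the loop would visit. The sketch below restates the function from the pasted output with the comments trimmed. `loop_values` is a helper added here purely for illustration.

```python
def is_prime(number):
    """Same logic as the function in the model's output, comments trimmed."""
    if number <= 1:
        return False
    if number == 2:
        return True
    if number % 2 == 0:
        return False
    for i in range(3, int(number**0.5) + 1, 2):
        if number % i == 0:
            return False
    return True


def loop_values(number):
    """List the values of i the for loop would iterate over for this input."""
    return list(range(3, int(number**0.5) + 1, 2))


print(loop_values(7))   # [] because int(7**0.5) + 1 == 3, so range(3, 3, 2) is empty
print(is_prime(7))      # True
print(loop_values(49))  # [3, 5, 7]
print(is_prime(49))     # False, because 49 % 7 == 0
```

For 7 the loop body never executes at all, so none of the three iterations in the model's walkthrough can happen. The trace it described (3, then 5, then 7, then `return False`) is what the code does for 49, not for 7.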
This is a shining example of the distinction between “code copied correctly from training data” and “explanation generated fresh, without understanding, and feeling good about it”. This is genuinely the most insightful thing I’ve discovered in this test, and you should pay attention to it before you start trusting this model. It is confident, all right! But would you trust it? After what I’ve seen, I most certainly would not. It’s a self-confident confabulator.
A pass for me.
The output (pasted & unedited)
Okay, let’s break down how to write a Python function that checks if a number is prime. I’ll explain it step-by-step, just as if you’re a complete beginner to programming!
    def is_prime(number):
        """
        This function checks if a number is prime.
        A prime number is only divisible by 1 and itself.

        Args:
            number: The number to check.

        Returns:
            True if the number is prime, False otherwise.
        """
        if number <= 1:
            # Numbers less than or equal to 1 are NOT prime.
            return False
        if number == 2:
            # 2 is the *only* even prime number.
            return True
        if number % 2 == 0:
            # If the number is even (and not 2), it's not prime.
            return False

        # Now, let's check if it's divisible by any odd numbers from 3 up to the square root.
        # This is a clever trick to make the checking faster!
        for i in range(3, int(number**0.5) + 1, 2):
            if number % i == 0:
                # If it's divisible by any 'i', it's not prime.
                return False

        # If we get here, it means no divisors were found, so it *is* prime!
        return True
Let’s break it down part by part:
- `def is_prime(number):`: This line defines the function. It’s like creating a little recipe. We’re calling this function `is_prime`, and it takes one ingredient – `number` – as input. Think of `number` as a variable that will hold the number we want to test.
- `"""..."""`: This is a docstring. It’s a description of what the function does. It’s helpful for anyone reading your code to understand it. It’s a good habit to always include one!
- `if number <= 1:`: This is the first check. Prime numbers have to be greater than 1. So, if `number` is less than or equal to 1 (meaning it’s 0, 1, or a negative number), it can’t be prime. We immediately `return False` – which is like saying “this number is not prime, so stop here!”
- `if number == 2:`: This is a special case. The number 2 is the only even prime number. So, if `number` is exactly 2, we `return True`.
- `if number % 2 == 0:`: The modulo operator (`%`) gives you the remainder after dividing. So, `number % 2` is the remainder when `number` is divided by 2. If the remainder is 0, that means `number` is perfectly divisible by 2. If the number is even (and not 2), it’s not prime. We `return False`.
- `for i in range(3, int(number**0.5) + 1, 2):`: This is where the real checking happens. Let’s break this down:
  - `range(3, int(number**0.5) + 1, 2)`: This creates a sequence of numbers. It’s like saying, “Give me a list of numbers starting at 3, going up to the square root of the number, but only include odd numbers, stepping by 2 each time.”
    - `3`: We start at 3 (because we’ve already handled even numbers).
    - `int(number**0.5) + 1`: This is the square root of the number.
      - `number**0.5`: This means “number raised to the power of 0.5” (which is the same as taking the square root).
      - `int(...)`: This converts the square root to an integer (a whole number). We add 1 because `range()` stops before the end value.
    - `2`: This is the step – how much we increase `i` each time. We’re only checking odd numbers, so we skip even numbers.
  - `for i in ...`: This loops through all the odd numbers in the sequence we created. `i` will take on each of those values, one at a time.
- `if number % i == 0:`: Inside the loop, this checks if `number` is divisible by `i`. If it is (meaning the remainder is 0), then `i` is a divisor of `number`, and therefore `number` is not prime. We immediately `return False`.
- `return True`: If the loop finishes without finding any divisors, that means `number` is only divisible by 1 and itself. Therefore, it is prime. We `return True`.
Example:
Let’s check if 7 is prime:
- `is_prime(7)` is called.
- `7 <= 1` is false.
- `7 == 2` is false.
- `7 % 2 == 0` is false (remainder is 1).
- The loop runs:
  - `i` starts at 3. `7 % 3 == 0` is false (remainder is 1).
  - `i` becomes 5. `7 % 5 == 0` is false (remainder is 2).
  - `i` becomes 7. `7 % 7 == 0` is true (remainder is 0)! We `return False`.
- The function returns `False` because 7 is divisible by 7.
Do you want me to explain any specific part of this code in more detail, or would you like me to give you another example? Perhaps you’d like to try writing a few test cases yourself?
