AI@Home – Gemma3 4B

Conditions & context

Today we are looking at Google’s Gemma3 4B IT QAT model, which is lightweight and looks promising. Does it deliver on that promise? Let’s dive in…

As in all my tests, I use the same prompt, the same hardware, and the same methodology. I’m looking at the same set of metrics across every model: VRAM usage, GPU utilization, CPU load, token throughput, tokens written, and total response time. These matter to me because they reveal whether a model is actually usable on consumer hardware — not just in theory, but in practice.
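The post doesn’t say how these numbers were collected. Here is one possible way to capture token throughput, tokens written and total response time. It is a minimal sketch that assumes a local Ollama server on its default port (11434) and relies on the timing fields that Ollama’s /api/generate endpoint returns, with durations in nanoseconds. The model tag and the helper names summarize and run_prompt are illustrative assumptions and are not part of the original setup.

```python
import json
import urllib.request

# Assumptions: Ollama listens on its default local port, and this model tag
# matches what you pulled (check `ollama list`). Both are illustrative.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "gemma3:4b-it-qat"

NS_PER_S = 1_000_000_000  # Ollama reports durations in nanoseconds


def summarize(resp: dict) -> dict:
    """Turn Ollama's raw timing fields into the metrics used in this series."""
    return {
        "tokens_written": resp["eval_count"],
        # Generation speed only: excludes model load and prompt processing.
        "tokens_per_sec": resp["eval_count"] / (resp["eval_duration"] / NS_PER_S),
        # Server-side total for the whole request, including any model load.
        "total_time_s": resp["total_duration"] / NS_PER_S,
    }


def run_prompt(prompt: str) -> dict:
    """Send one non-streaming request and return the summarized metrics."""
    payload = json.dumps(
        {"model": MODEL_TAG, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return summarize(json.load(resp))
```

Calling run_prompt() with the test prompt returns a small dictionary of metrics. Note the design choice: tokens_per_sec uses eval_duration, which covers generation only, while total_time_s can also include model loading and prompt processing. The two figures can therefore drift apart.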

| Spec | Value |
|---|---|
| Linux distro | Ubuntu Server 24.04.4 LTS |
| Linux kernel | 6.8.0-101 |
| CPU | Intel Core i7-14700K (14th Gen), 8P + 12E cores, 28 threads |
| Motherboard | MSI PRO B660M-A |
| RAM | 80 GB DDR4 (32 + 16 + 32 + 16) |
| SSD | Crucial NVMe 1 TB |
| GPU | MSI NVIDIA GeForce RTX 5060 Ti Shadow 2X OC, PCIe 5.0 x8 |
| CUDA cores | 4,608 |
| VRAM | 16 GB GDDR7, 128-bit, 448 GB/s |
| GPU driver | NVIDIA 590.48.01 |
| CUDA version | 13.1 |
| Ollama version | 0.17.4 |
| Model | Gemma3 4B IT QAT |
| Quantization | QAT (quantization-aware training) |

The prompt

Write a simple Python function that checks if a number is prime.
Explain how it works in plain English, like you're teaching
a beginner.

The results

This was a surprising test! Gone were the boring model runs that, number-wise, all looked the same. Gemma3 4B shook things up a bit by doing two things: it got slower with every run, and it made an error.

Slower? Yes! It started with the fastest token throughput I’ve ever measured on my machine, and then things went off a cliff. Throughput stayed nearly identical and response length was on par across runs, but GPU utilization tanked and total time kept climbing. The model got progressively slower and kept handing more and more work from the GPU to the CPU, once again lighting up every core of my 14th Gen Core i7: all 28 threads were busy in htop.
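GPU utilization and VRAM can also be watched from a script. The post mentions htop, not this script, so treat the following as an assumption about tooling. It is a minimal sketch that polls nvidia-smi’s CSV query mode once per interval while a prompt runs. The helper names parse_gpu_line and sample_gpu are hypothetical, and only the first GPU is read.

```python
import subprocess
import time

# Assumption: nvidia-smi is on PATH. Only the first GPU's line is read.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_line(line: str) -> tuple[int, int]:
    """Parse one 'util, mem' CSV line into (GPU util %, VRAM used in MiB)."""
    util, mem = (int(field.strip()) for field in line.split(","))
    return util, mem


def sample_gpu(seconds: float = 10.0, interval: float = 1.0) -> list[tuple[int, int]]:
    """Poll nvidia-smi every `interval` seconds for roughly `seconds` seconds."""
    samples = []
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        samples.append(parse_gpu_line(out.splitlines()[0]))
        time.sleep(interval)
    return samples
```

Call sample_gpu(30) from a Python session while the model is answering. If the utilization values fall from run to run, that would mirror the drop in GPU utilization described in this section.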

What about the error? Well, the code looked all right, but the explanation Gemma3 4B gave was not. The model didn’t actually understand the algorithm it was explaining. Its walkthrough relied on loop iterations that would never happen if the script were actually executed, and it ended with the wrong verdict. So: the right code, with the wrong reasoning behind it. And that raises a question:

Does this model really understand the code it writes? Or was this code simply absorbed from its training data and baked into its weights? I’m asking because if the model understood the correct Python code it wrote, why does it explain it the wrong way?

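| Run | Quant | Tokens/sec | Total time | Tokens written | VRAM | GPU util |
|---|---|---|---|---|---|---|
| 1 | QAT | 105.96 | 12 s | 1,204 | 5.8 GB | 93% |
| 2 | QAT | 103.20 | 18 s | 1,468 | 5.8 GB | 50% |
| 3 | QAT | 100.64 | 26 s | 1,376 | 5.8 GB | 22% |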
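The reported tokens/sec barely moves, yet total time more than doubles between run 1 and run 3. Dividing tokens written by total time gives an effective rate for each run. This is simple arithmetic on the table. It assumes that Total Time is the full wall-clock time for each response, and the variable names are just for illustration.

```python
# (run, reported tokens/sec, total time in s, tokens written), from the results table
runs = [
    (1, 105.96, 12, 1204),
    (2, 103.20, 18, 1468),
    (3, 100.64, 26, 1376),
]

# Effective rate: tokens written divided by total time for the response.
effective = {run: tokens / total_s for run, _, total_s, tokens in runs}

for run, reported, total_s, tokens in runs:
    print(f"run {run}: reported {reported:.2f} tok/s, effective {effective[run]:.1f} tok/s")
```

This gives roughly 100.3, 81.6 and 52.9 tokens/s for runs 1 to 3. The effective rate falls by nearly half, while the reported rate stays above 100.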

The conclusion – TL;DR

This one is a hard no for me. Testing this small model made me wonder whether it is even optimized to run on the CUDA cores of a regular consumer GPU. Having learned my lesson from Gemma3 27B and its glacial pace, I didn’t have high expectations for Gemma3 4B. It started off very fast, but getting slower and slower with every run was odd.

What sealed its fate was that the model’s output was factually wrong: its walkthrough of whether 7 is prime contained a reasoning error. The Python code was correct, and the explanation was very confident, yet wrong! The code would confirm that 7 is prime. int(7**0.5) + 1 evaluates to 3, so range(3, 3, 2) is empty, the loop never runs, and the function returns True. Yet the model went off on a tangent, claiming the loop checked i=3, then 5, then 7, and concluded that 7 is not prime. What???
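To see which divisors the loop really tests, the following minimal sketch reproduces only the model’s range() expression in a hypothetical helper, trace_odd_divisors. It applies to odd numbers above 2, because even numbers return before the loop starts.

```python
def trace_odd_divisors(number):
    """Hypothetical helper: the odd divisors is_prime's loop would test."""
    # Same expression as the model's code: odd i from 3 up to int(sqrt(number)).
    return list(range(3, int(number**0.5) + 1, 2))


for n in (7, 9, 49, 97):
    print(n, trace_odd_divisors(n))
# 7 []
# 9 [3]
# 49 [3, 5, 7]
# 97 [3, 5, 7, 9]
```

For 7 the list is empty, so the loop body never runs. Because the loop stops at the square root, a number is never tested against itself. The value i = 7 only appears for odd inputs of 49 and above.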

This is a shining example of the distinction between “code copied correctly from training data” and “an explanation generated fresh, without understanding, and feeling good about it”. It is genuinely the most insightful thing I discovered in this test, and you should keep it in mind before you start trusting this model. It is confident, all right! But would you trust it? After what I’ve seen, I most certainly would not. It’s a self-confident confabulator.

A pass for me.

The output (pasted & unedited)

Okay, let’s break down how to write a Python function that checks if a number is prime. I’ll explain it step-by-step, just as if you’re a complete beginner to programming!

def is_prime(number):
    """
    This function checks if a number is prime.
    A prime number is only divisible by 1 and itself.
    Args:
        number: The number to check.
    Returns:
        True if the number is prime, False otherwise.
    """
    if number <= 1: # Numbers less than or equal to 1 are NOT prime.
        return False
    if number == 2: # 2 is the *only* even prime number.
        return True
    if number % 2 == 0: # If the number is even (and not 2), it's not prime.
        return False
    # Now, let's check if it's divisible by any odd numbers from 3 up to the square root.
    # This is a clever trick to make the checking faster!
    for i in range(3, int(number**0.5) + 1, 2):
        if number % i == 0: # If it's divisible by any 'i', it's not prime.
            return False
    # If we get here, it means no divisors were found, so it *is* prime!
    return True

Let’s break it down part by part:

  1. def is_prime(number):: This line defines the function. It’s like creating a little recipe. We’re calling this function is_prime, and it takes one ingredient – number – as input. Think of number as a variable that will hold the number we want to test.
  2. """...""": This is a docstring. It’s a description of what the function does. It’s helpful for anyone reading your code to understand it. It’s a good habit to always include one!
  3. if number <= 1:: This is the first check. Prime numbers have to be greater than 1. So, if number is less than or equal to 1 (meaning it’s 0, 1, or a negative number), it can’t be prime. We immediately return False – which is like saying “this number is not prime, so stop here!”
  4. if number == 2:: This is a special case. The number 2 is the only even prime number. So, if number is exactly 2, we return True.
  5. if number % 2 == 0:: The modulo operator (%) gives you the remainder after dividing. So, number % 2 is the remainder when number is divided by 2. If the remainder is 0, that means number is perfectly divisible by 2. If the number is even (and not 2), it’s not prime. We return False.
  6. for i in range(3, int(number**0.5) + 1, 2):: This is where the real checking happens. Let’s break this down:
    • range(3, int(number**0.5) + 1, 2): This creates a sequence of numbers. It’s like saying, “Give me a list of numbers starting at 3, going up to the square root of the number, but only include odd numbers, stepping by 2 each time.”
      • 3: We start at 3 (because we’ve already handled even numbers).
      • int(number**0.5) + 1: This is the square root of the number.
        • number**0.5: This means “number raised to the power of 0.5” (which is the same as taking the square root).
        • int(...): This converts the square root to an integer (a whole number). We add 1 because range() stops before the end value.
      • 2: This is the step – how much we increase i each time. We’re only checking odd numbers, so we skip even numbers.
    • for i in ...: This loops through all the odd numbers in the sequence we created. i will take on each of those values, one at a time.
  7. if number % i == 0:: Inside the loop, this checks if number is divisible by i. If it is (meaning the remainder is 0), then i is a divisor of number, and therefore number is not prime. We immediately return False.
  8. return True: If the loop finishes without finding any divisors, that means number is only divisible by 1 and itself. Therefore, it is prime. We return True.

Example:

Let’s check if 7 is prime:

  1. is_prime(7) is called.
  2. 7 <= 1 is false.
  3. 7 == 2 is false.
  4. 7 % 2 == 0 is false (remainder is 1).
  5. The loop runs:
    • i starts at 3.
    • 7 % 3 == 0 is false (remainder is 1).
    • i becomes 5.
    • 7 % 5 == 0 is false (remainder is 2).
    • i becomes 7.
    • 7 % 7 == 0 is true (remainder is 0)! We return False.
  6. The function returns False because 7 is divisible by 7.

Do you want me to explain any specific part of this code in more detail, or would you like me to give you another example? Perhaps you’d like to try writing a few test cases yourself?
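This note sits outside the pasted output. To back up the claim that the code itself is fine, the following sketch runs the model’s is_prime against a brute-force reference. The function is copied with its logic unchanged and its comments trimmed. The reference, is_prime_slow, is a hypothetical helper that tries every divisor from 2 to n - 1.

```python
def is_prime(number):
    # The model's function, logic unchanged (comments trimmed).
    if number <= 1:
        return False
    if number == 2:
        return True
    if number % 2 == 0:
        return False
    for i in range(3, int(number**0.5) + 1, 2):
        if number % i == 0:
            return False
    return True


def is_prime_slow(n):
    """Hypothetical reference check: try every possible divisor from 2 to n - 1."""
    return n > 1 and all(n % d != 0 for d in range(2, n))


# Compare both on a range that covers negatives, 0, 1, 2 and many small primes.
mismatches = [n for n in range(-5, 1000) if is_prime(n) != is_prime_slow(n)]
print("mismatches:", mismatches)
print("is_prime(7) ->", is_prime(7))
```

An empty mismatch list means the two functions agree on every input tested. is_prime(7) returns True, not False as the model’s walkthrough claimed.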

Need Help?

Need help streamlining your processes or solving tricky business problems? I offer one-on-one consultations to get you unstuck fast. Book a free consultation with me today at goarcherdynamics.com.

Want more practical tips and workflow hacks? I publish them regularly on my blog — check it out and subscribe for newsletter updates: goarcherdynamics.com

Posted by Jiri Krecek
