AI@Home – Gemma3 4B – Archer Dynamics

Conditions & context

Today we are looking at Google’s Gemma3 4B IT QAT model, which is lightweight and looks promising. Does it deliver on that promise? Let’s dive in…

As in all my tests, I use the same prompt, the same hardware, and the same methodology. I look at the same set of metrics for every model: VRAM usage, GPU utilization, CPU load, token throughput, tokens written, and total response time. These matter to me because they reveal whether a model is actually usable on consumer hardware, not just in theory but in practice.
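
For anyone who wants to reproduce the GPU-side numbers, here is a minimal sketch of one way to sample GPU utilization and VRAM usage while a model is generating. It assumes nvidia-smi is on the PATH. The helper names parse_gpu_sample and read_gpu_sample are made up for illustration, the sample line is invented, and this is not necessarily how the figures in this post were collected.

```python
import subprocess

# nvidia-smi query: GPU utilization (%) and used VRAM (MiB), as bare CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]

def parse_gpu_sample(line):
    """Turn a CSV line such as '93, 5939' into (util_percent, vram_mib)."""
    util, vram = (int(field.strip()) for field in line.split(","))
    return util, vram

def read_gpu_sample():
    """Run nvidia-smi once and parse the first GPU's line (needs an NVIDIA GPU)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_sample(out.splitlines()[0])

# Parsing an invented sample line, so this part runs without a GPU:
print(parse_gpu_sample("93, 5939"))
```

Calling read_gpu_sample() in a loop during a run, say once per second, would give a utilization curve rather than a single figure.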

Specs	Value
Linux Distro	Ubuntu Server 24.04.4 LTS
Linux Kernel	6.8.0-101
CPU	Intel Core i7-14700K (14th Gen), Cores: 8P/12E, Threads: 28
Motherboard	MSI PRO B660M-A
RAM	80 GB DDR4 (32+16+32+16)
SSD	Crucial NVMe 1TB
GPU	MSI NVIDIA GeForce RTX 5060 Ti Shadow 2X OC, PCIe 5.0 ×8
CUDA Cores	4,608
VRAM	16 GB GDDR7 128-bit 448 GB/s
GPU Driver	NVIDIA 590.48.01
CUDA version	13.1
Ollama version	0.17.4
Model	Gemma3 4B IT QAT
Quantization	QAT

The prompt

			
Write a simple Python function that checks if a number is prime.
Explain how it works in plain English, like you're teaching
a beginner.

The results

This was a surprising test! Gone were the boring model runs that all looked the same, number-wise. This Gemma3 4B model shook things up by doing two things:

Gradually lowering the GPU utilization with each subsequent run
Repeating the same reasoning error in every run! Huh?

Yes! After a promising start and the fastest token throughput I’ve ever measured on my machine, it went off a cliff. Throughput stayed nearly identical (it dipped from about 106 to about 101 tokens/sec) and the output length was in the same range on every run, but GPU utilization and total time tanked. Each run took longer than the last, and the work appeared to shift from the GPU to the CPU, lighting up all cores of my 14th-gen Core i7. All 28 threads were busy in htop.

What about the error? The code looked fine, but the explanation Gemma3 4B gave did not. Its walkthrough described loop iterations that would never happen if the script were actually executed. So: right code, wrong explanation. And that raises a question:

Does this model really understand the code it is writing? Or was the code simply memorized from its training data? I’m asking because, if the model wrote correct Python code that it understood, why does it explain that code incorrectly?

Run	Quant	Tokens/sec	Total Time	Tokens Written	VRAM	GPU Util
1	QAT	105.96	12s	1,204	5.8GB	93%
2	QAT	103.20	18s	1,468	5.8GB	50%
3	QAT	100.64	26s	1,376	5.8GB	22%
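
A quick sanity check on the table: dividing tokens written by tokens/sec gives the time spent purely on generating text, and the rest of the total time went somewhere else. The sketch below assumes the Tokens/sec column is the generation rate and that Total Time is wall-clock time for the whole request. Total Time is rounded to whole seconds, so the results are approximate.

```python
# Values copied from the results table:
# (run, tokens/sec, total time in seconds, tokens written)
runs = [
    (1, 105.96, 12, 1204),
    (2, 103.20, 18, 1468),
    (3, 100.64, 26, 1376),
]

def generation_seconds(tokens, tokens_per_sec):
    """Time spent purely emitting tokens at the reported throughput."""
    return tokens / tokens_per_sec

for run, tps, total, tokens in runs:
    gen = generation_seconds(tokens, tps)
    # Whatever is left over was spent outside token generation.
    print(f"run {run}: generation ~{gen:.1f}s, unaccounted ~{total - gen:.1f}s")
```

Under those assumptions, generation took roughly 11.4s, 14.2s and 13.7s, while the unaccounted remainder grew from about 0.6s to about 3.8s to about 12.3s. That would fit the pattern described above: the slowdown is not in the token rate itself but in time spent before or around generation.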

The conclusion – TL;DR

This one is a hard no for me. Testing this small model got me wondering whether it was even optimized to run on the CUDA cores of a regular GPU. Having learned my lesson from Gemma3 27B and its glacial pace, I did not have high expectations for Gemma3 4B. It started off very fast, but the fact that it kept getting slower with every run was odd.

What sealed its fate was that the model’s explanation was factually wrong. Its walkthrough of is_prime(7) did not match what the code does. The Python code is correct and returns True for 7: int(7**0.5) + 1 is 3, so range(3, 3, 2) is empty and the loop body never runs. Yet the model confidently described iterations with i=3, then 5, then 7, and concluded that the function returns False. What???

This is a textbook example of the difference between “code reproduced correctly from training data” and “explanation generated fresh, without understanding, yet delivered with full confidence”. It is the most useful thing I learned in this test, and you should keep it in mind before you start trusting this model. It is strongly confident, alright! But would you trust it? After what I’ve seen, I most certainly would not. It’s a self-confident confabulator.
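
To check this for yourself, here is a small sketch that reuses the function from the model’s output (comments stripped) and prints which values of i the loop actually visits. The helper loop_values is a hypothetical addition for illustration and is not part of the model’s answer.

```python
def is_prime(number):
    # Same logic as the function in the model's output.
    if number <= 1:
        return False
    if number == 2:
        return True
    if number % 2 == 0:
        return False
    for i in range(3, int(number**0.5) + 1, 2):
        if number % i == 0:
            return False
    return True

def loop_values(number):
    """Hypothetical helper: the values of i the for loop would visit."""
    return list(range(3, int(number**0.5) + 1, 2))

print(loop_values(7), is_prime(7))    # [] True
print(loop_values(49), is_prime(49))  # [3, 5, 7] False
```

For 7 the loop visits nothing and the function returns True. The sequence 3, 5, 7 ending in a return of False is what the code does for 49, not for 7.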

A pass for me.

The output (pasted & unedited)

Okay, let’s break down how to write a Python function that checks if a number is prime. I’ll explain it step-by-step, just as if you’re a complete beginner to programming!

			
def is_prime(number):
  """
  This function checks if a number is prime.
  A prime number is only divisible by 1 and itself.
  Args:
    number: The number to check.
  Returns:
    True if the number is prime, False otherwise.
  """
  if number <= 1:  # Numbers less than or equal to 1 are NOT prime.
    return False
  if number == 2:  # 2 is the *only* even prime number.
    return True
  if number % 2 == 0:  # If the number is even (and not 2), it's not prime.
    return False
  # Now, let's check if it's divisible by any odd numbers from 3 up to the square root.
  # This is a clever trick to make the checking faster!
  for i in range(3, int(number**0.5) + 1, 2):
    if number % i == 0:  # If it's divisible by any 'i', it's not prime.
      return False
  # If we get here, it means no divisors were found, so it *is* prime!
  return True

		

Let’s break it down part by part:

def is_prime(number):: This line defines the function. It’s like creating a little recipe. We’re calling this function is_prime, and it takes one ingredient – number – as input. Think of number as a variable that will hold the number we want to test.
"""...""": This is a docstring. It’s a description of what the function does. It’s helpful for anyone reading your code to understand it. It’s a good habit to always include one!
if number <= 1:: This is the first check. Prime numbers have to be greater than 1. So, if number is less than or equal to 1 (meaning it’s 0, 1, or a negative number), it can’t be prime. We immediately return False – which is like saying “this number is not prime, so stop here!”
if number == 2:: This is a special case. The number 2 is the only even prime number. So, if number is exactly 2, we return True.
if number % 2 == 0:: The modulo operator (%) gives you the remainder after dividing. So, number % 2 is the remainder when number is divided by 2. If the remainder is 0, that means number is perfectly divisible by 2. If the number is even (and not 2), it’s not prime. We return False.
for i in range(3, int(number**0.5) + 1, 2):: This is where the real checking happens. Let’s break this down:
- range(3, int(number**0.5) + 1, 2): This creates a sequence of numbers. It’s like saying, “Give me a list of numbers starting at 3, going up to the square root of the number, but only include odd numbers, stepping by 2 each time.”
  - 3: We start at 3 (because we’ve already handled even numbers).
  - int(number**0.5) + 1: This is the square root of the number.
    - number**0.5: This means “number raised to the power of 0.5” (which is the same as taking the square root).
    - int(...): This converts the square root to an integer (a whole number). We add 1 because range() stops before the end value.
  - 2: This is the step – how much we increase i each time. We’re only checking odd numbers, so we skip even numbers.
- for i in ...: This loops through all the odd numbers in the sequence we created. i will take on each of those values, one at a time.
if number % i == 0:: Inside the loop, this checks if number is divisible by i. If it is (meaning the remainder is 0), then i is a divisor of number, and therefore number is not prime. We immediately return False.
return True: If the loop finishes without finding any divisors, that means number is only divisible by 1 and itself. Therefore, it is prime. We return True.

Example:

Let’s check if 7 is prime:

is_prime(7) is called.
7 <= 1 is false.
7 == 2 is false.
7 % 2 == 0 is false (remainder is 1).
The loop runs:
- i starts at 3.
- 7 % 3 == 0 is false (remainder is 1).
- i becomes 5.
- 7 % 5 == 0 is false (remainder is 2).
- i becomes 7.
- 7 % 7 == 0 is true (remainder is 0)! We return False.
The function returns False because 7 is divisible by 7.

Do you want me to explain any specific part of this code in more detail, or would you like me to give you another example? Perhaps you’d like to try writing a few test cases yourself?

Need Help?

Need help streamlining your processes or solving tricky business problems? I offer one-on-one consultations to get you unstuck fast. Book a free consultation with me today at goarcherdynamics.com.

Want more practical tips and workflow hacks? I publish them regularly on my blog — check it out and subscribe for newsletter updates: goarcherdynamics.com
