Model conditions & context
So, today I’m looking at IBM’s models again. This time it is one of their tiny series, but unlike the 2B dense model I tested last week, this is a Mixture of Experts model. And that always gets my attention (no pun intended). Let’s dive right in.
I picked a very simple prompt that mixes code generation, some reasoning, and writing. I track a number of metrics: VRAM usage, GPU utilization, GPU wattage and temperature, CPU utilization, token throughput, total tokens written, and total time to response. These matter to me because they tell me which model best fits my hardware and which one gives me the best experience as an end user. I look at the overall quality of the answer, but most of all at the reasoning and explanation as they would read to a novice in the field. My goal here is to show whether the model is usable and good enough to help someone who is learning how to code.
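For anyone who wants to collect the throughput numbers themselves, here is a minimal sketch. It assumes Ollama's default local REST endpoint (`http://localhost:11434/api/generate`) and the timing fields it returns with a non-streaming response (`eval_count`, `eval_duration`, `total_duration`, with durations in nanoseconds). The helper names `summarize_run` and `run_prompt` are made up for illustration, and this is not necessarily how the numbers in this post were gathered.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint


def summarize_run(response: dict) -> dict:
    """Turn Ollama's timing fields (nanoseconds) into readable metrics."""
    eval_seconds = response["eval_duration"] / 1e9
    return {
        "tokens_written": response["eval_count"],
        "tokens_per_sec": response["eval_count"] / eval_seconds,
        "total_seconds": response["total_duration"] / 1e9,
    }


def run_prompt(model: str, prompt: str) -> dict:
    """Send one non-streaming request and return the summarized metrics."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return summarize_run(json.load(reply))
```

Calling `run_prompt("<model tag>", "<prompt>")` with whatever tag `ollama list` shows on your machine returns a small dict with tokens written, tokens per second, and total time. GPU-side numbers (VRAM, wattage, temperature) are not part of this response and have to come from the GPU tooling.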
| Specs | Value |
|---|---|
| Linux Distro | Ubuntu Server 24.04.4 LTS |
| Linux Kernel | 6.8.0-101 |
| CPU | Intel Core i7-14700K (14th Gen), 8P/12E cores, 28 threads |
| Motherboard | MSI PRO B660M-A |
| RAM | 80 GB DDR4 (32+16+32+16) |
| SSD | Crucial NVME 1TB |
| GPU | MSI NVidia GeForce RTX 5060Ti Shadow 2X OC PCIe 5.0×8 |
| CUDA Cores | 4,608 |
| VRAM | 16 GB GDDR7 128-bit 448 GB/s |
| GPU Driver | NVidia 590.48.01 |
| CUDA version | 13.1 |
| Ollama version | 0.23.0 |
| Model | IBM Granite 3 MoE |
| Quantization | Q4_K_M |
The prompt
Write a simple Python function that checks if a number is prime. Explain how it works in plain English, like you're teaching a beginner.
The results
Three runs on this model yielded nearly identical results. Actually, scratch the “nearly”: they were identical. As I said in my previous review, this kind of consistency is rare. As with the prior Granite, my guess is that either the model’s default temperature is set fairly low, or the model is so small that it has fewer weights to draw from and therefore less variance in its output. That should not overshadow the fact that it ran great, ran fast, and gave a good answer.
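To illustrate the temperature guess, here is a toy sketch of how temperature reshapes next-token probabilities. The scores are made up and this is plain softmax math, not anything pulled from Granite or Ollama; it only shows why a low temperature pushes a model toward repeating the same answer.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens them."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next tokens (illustrative only).
logits = [2.0, 1.0, 0.5]

for temperature in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

As the temperature drops, nearly all of the probability lands on the top-scoring token, so repeated runs tend to pick the same words in the same order.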
| Model | Quant | Run | Tokens/sec | Total Time | Tokens Written | VRAM | GPU Util |
|---|---|---|---|---|---|---|---|
| Granite 3 MoE | Q4_K_M | 1 | 216.34 | 2s | 462 | 2.8GB | 80% |
| Granite 3 MoE | Q4_K_M | 2 | 213.77 | 6s | 451 | 2.8GB | 80% |
| Granite 3 MoE | Q4_K_M | 3 | 209.68 | 2s | 458 | 2.8GB | 80% |
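A quick sanity check on the table: dividing tokens written by tokens per second gives the time spent purely on generation. The sketch below uses only the numbers from the table. Any gap between that figure and the reported total (run 2 in particular) is not explained by the data here; model load or prompt-processing overhead is a plausible but unverified explanation.

```python
# (run, tokens_per_sec, tokens_written, reported_total_seconds) from the results table
runs = [
    (1, 216.34, 462, 2),
    (2, 213.77, 451, 6),
    (3, 209.68, 458, 2),
]


def generation_seconds(tokens_written, tokens_per_sec):
    """Time spent purely on generating tokens."""
    return tokens_written / tokens_per_sec


for run, tps, tokens, reported in runs:
    gen = generation_seconds(tokens, tps)
    print(f"run {run}: ~{gen:.2f}s generating, {reported}s reported total")

mean_tps = sum(tps for _, tps, _, _ in runs) / len(runs)
print(f"mean throughput: {mean_tps:.2f} tokens/sec")
```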
Unsurprisingly, this 2.8GB featherweight ran at 216 tokens a second and dropped only very slightly across the three runs. That is expected. Its MoE (Mixture of Experts) architecture routes each token through only a fraction of the model’s parameters per forward pass, not the entire set of weights. This lets the model run at speeds similar to a 1B dense model. One caveat: every expert still has to be loaded, so the VRAM footprint follows the total parameter count rather than the active one. At 2.8GB, though, that is small enough to fit on almost any recent card.
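The routing idea can be sketched in a few lines of plain Python. This is a toy top-k router, assuming the common "score every expert, keep the best k, blend their outputs" scheme. The expert count, the value of k, and the scalar "experts" are all illustrative stand-ins, not Granite's real configuration.

```python
import math
import random


def top_k_route(router_scores, k):
    """Pick the k highest-scoring experts and normalize their gate weights."""
    ranked = sorted(range(len(router_scores)), key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:k]
    peak = max(router_scores[i] for i in chosen)
    exps = {i: math.exp(router_scores[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: exps[i] / total for i in chosen}


def moe_forward(token_value, experts, router_scores, k):
    """Run only the chosen experts and blend their outputs by gate weight."""
    gates = top_k_route(router_scores, k)
    return sum(weight * experts[i](token_value) for i, weight in gates.items())


# Toy setup: 8 "experts" that are just scalar functions (hypothetical stand-ins
# for feed-forward blocks). All 8 exist in memory, but only k of them run.
random.seed(0)
experts = [lambda x, scale=random.uniform(0.5, 1.5): scale * x for _ in range(8)]
router_scores = [random.gauss(0, 1) for _ in range(8)]

gates = top_k_route(router_scores, k=2)
print("experts used:", sorted(gates), "of", len(experts))
print("output:", moe_forward(1.0, experts, router_scores, k=2))
```

The list of experts is fully built even though only two are called per token, which mirrors the trade-off: less compute per token, same memory for the weights.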
In practice, my 16 GB GPU did not even break a sweat. The runs finished within about 2 seconds, leaving me almost no time to screenshot the nvidia-smi utilization and wattage. So take the 80% utilization figure with a grain of salt: the utility samples only once a second, and with a 2-second run it was nearly impossible to catch a representative number. My 5060 Ti, when fully maxed out, reaches ~190W. Here the card barely climbed to 83W before the run finished.
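One way around the screenshot problem is to log samples instead of watching the screen. The sketch below assumes `nvidia-smi` is on the PATH and uses its `--query-gpu`, `--format=csv,noheader,nounits`, and `--loop-ms` options; the helper names are made up for illustration. Note that the utilization counter is itself averaged over a driver-side sample period, so faster polling mostly helps with power, temperature, and memory readings.

```python
import csv
import io
import subprocess

FIELDS = ["utilization.gpu", "power.draw", "temperature.gpu", "memory.used"]


def parse_samples(csv_text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts."""
    samples = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != len(FIELDS):
            continue  # skip blank or malformed lines
        try:
            samples.append({name: float(value) for name, value in zip(FIELDS, row)})
        except ValueError:
            continue  # skip rows with non-numeric readings such as "[N/A]"
    return samples


def peak(samples, field):
    """Highest value seen for one field across all samples."""
    return max(sample[field] for sample in samples)


def sample_gpu(seconds=5, interval_ms=100):
    """Poll nvidia-smi every `interval_ms` for roughly `seconds`, then parse."""
    command = [
        "nvidia-smi",
        f"--query-gpu={','.join(FIELDS)}",
        "--format=csv,noheader,nounits",
        f"--loop-ms={interval_ms}",
    ]
    process = subprocess.Popen(command, stdout=subprocess.PIPE, text=True)
    try:
        output, _ = process.communicate(timeout=seconds)
    except subprocess.TimeoutExpired:
        process.terminate()  # stop the polling loop once the window is over
        output, _ = process.communicate()
    return parse_samples(output)
```

Start `sample_gpu()` in a second terminal or a background thread just before sending the prompt, then read off `peak(samples, "power.draw")` and `peak(samples, "utilization.gpu")` once the run is done.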
The conclusion – TL;DR
What do I think? Well, it’s difficult to say. This model is very lightweight, incredibly fast, and accurate for simple tasks, and its modern MoE architecture makes it efficient. But it is a year and a half old, which in the AI space is an eternity.
But, it works!!!
And here is one thing many people undersell or do not fully appreciate: licensing. This model is released under Apache 2.0. Fully open, high quality, no licensing strings attached, commercially usable. That is a big deal. Not all open-weight models are equal from this standpoint, and Granite 3 3B MoE is very permissive when licensing is considered.
Sometimes, the pursuit of raw numbers and benchmark scores is not where the real story is written.
The AI output (pasted & unedited)
All code from the AI@Home series is available in the Archer Dynamics GitHub repo.
Sure, I’d be happy to help! A prime number is a positive integer greater than 1 that has no positive divisors other than 1 and itself. In simpler terms, it’s a number that can only be divided by 1 and itself without leaving a remainder.
Now, let’s write a simple Python function to check if a number is prime. Here’s how you might do it:
    def is_prime(n):
        # Check if the number is less than or equal to 1
        if n <= 1:
            return False
        # Check if the number is only divisible by 1 and itself
        for i in range(2, int(n**0.5) + 1):
            if n % i == 0:
                return False
        # If no factors were found, the number is prime
        return True
Let’s break it down step by step:
- We start by checking if the number `n` is less than or equal to 1. If it is, we immediately know that it’s not prime because 1 is not considered a prime number.
- Next, we loop through all numbers from 2 to the square root of `n` (inclusive). This is because if `n` is divisible by any number greater than its square root, it would also be divisible by a smaller number in that range.
- For each number `i` in this range, we check if `n` is divisible by `i`. If it is, then `n` is not prime because it has factors other than 1 and itself.
- If no numbers from 2 to the square root of `n` divide evenly into `n`, then `n` is prime.
This function works by checking all possible factors of n, starting from the smallest possible factor (which is 2) up to the square root of n. This is because if n has any factors other than 1 and itself, it must have a factor less than or equal to its square root. By checking up to the square root, we can avoid unnecessary computations.
