AI@Home – Granite 3B MoE

Model conditions & context

So, today I’m looking at IBM’s models again. This time it is one from their tiny series, but unlike the 2B dense model I tested last week, this one is a Mixture of Experts model. And that always gets my attention (no pun intended). Let’s dive right in.

I picked a very simple prompt that mixes code generation, some reasoning logic, and writing prowess. I’m tracking a number of metrics: how much VRAM the model uses, GPU utilization, GPU wattage and temperature, CPU utilization, token throughput, total number of tokens written, and total time to response. All of these matter to me because they show not only which model best fits my hardware, but also which one gives me the best experience as an end user. I focus on the overall quality of the answer, but most importantly on the reasoning and explanation aimed at someone who is a novice in the field. My goal here is to show whether the model is usable and good enough to help someone who is learning how to write code.

| Specs | Value |
| --- | --- |
| Linux Distro | Ubuntu Server 24.04.4 LTS |
| Linux Kernel | 6.8.0-101 |
| CPU | Intel Core i7 14th Gen 14700K (Cores: 8P/12E, Threads: 28) |
| Motherboard | MSI PRO B660M-A |
| RAM | 80 GB DDR4 (32+16+32+16) |
| SSD | Crucial NVMe 1TB |
| GPU | MSI NVIDIA GeForce RTX 5060Ti Shadow 2X OC PCIe 5.0×8 |
| CUDA Cores | 4,608 |
| VRAM | 16 GB GDDR7 128-bit 448 GB/s |
| GPU Driver | NVIDIA 590.48.01 |
| CUDA version | 13.1 |
| Ollama version | 0.23.0 |
| Model | IBM Granite 3 MoE |
| Quantization | Q4_K_M |

The prompt

Write a simple Python function that checks if a number
is prime. Explain how it works in plain English,
like you're teaching a beginner.

The results

Doing three runs on this model yielded nearly identical results. Well, actually, scratch the “nearly”. They were identical. As I said in my previous review, this kind of consistency is rare. As with the prior Granite, I would guess that either this model’s temperature is set fairly low, or the model is so small that it has fewer weights to draw from and therefore less variance in its output. That should not overshadow the fact that it ran great, ran fast, and gave a good answer.
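To illustrate the temperature guess, here is a minimal sketch of temperature-scaled softmax. The logits are made up for the example and have nothing to do with Granite’s real values. The point is only that as temperature drops, probability piles onto the top-scoring token, so repeated runs tend to pick the same tokens.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens (assumed values).
example_logits = [2.0, 1.0, 0.5]

for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(example_logits, t)
    print(f"temperature={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

At a temperature of 0.1 nearly all of the probability lands on the first token, which is why a low setting would explain run-to-run consistency.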

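| Model | Quant | Run | Tokens/sec | Total Time | Tokens Written | VRAM | GPU Util |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Granite 3 MoE | Q4_K_M | 1 | 216.34 | 2s | 462 | 2.8GB | 80% |
| Granite 3 MoE | Q4_K_M | 2 | 213.77 | 6s | 451 | 2.8GB | 80% |
| Granite 3 MoE | Q4_K_M | 3 | 209.68 | 2s | 458 | 2.8GB | 80% |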
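For reference, a tokens-per-second figure can be derived from the timing fields Ollama returns with a non-streamed response: eval_count, and eval_duration in nanoseconds. This is a rough sketch. The sample duration below is an assumed value chosen to resemble run 1, not captured output, and the helper name is made up.

```python
def tokens_per_second(eval_count, eval_duration_ns):
    """Generation throughput: tokens produced divided by generation time in seconds."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration_ns must be positive")
    return eval_count / (eval_duration_ns / 1e9)

# Hypothetical response fields, shaped like Ollama's /api/generate output.
sample_response = {
    "eval_count": 462,               # tokens written
    "eval_duration": 2_135_000_000,  # nanoseconds spent generating (assumed value)
}

rate = tokens_per_second(sample_response["eval_count"], sample_response["eval_duration"])
print(f"{rate:.2f} tokens/sec")

# Average of the three measured throughput figures from the table above.
measured = [216.34, 213.77, 209.68]
print(f"mean: {sum(measured) / len(measured):.2f} tokens/sec")
```

Note that this measures generation only. Model load and prompt processing are reported separately, so total time to response can be longer than tokens divided by throughput.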

Unsurprisingly, this 2.8GB featherweight ran at 216 tokens a second and dropped only very slightly across the 3 runs. That is expected. Its MoE (mixture of experts) architecture routes each token through only a fraction of the model’s parameters per forward pass, not the entire weight system. This allows the model to operate at speeds similar to 1B dense models. One caveat: every expert normally still has to be loaded, so the VRAM footprint reflects the full model rather than the active fraction. At 2.8GB, that still fits comfortably on a small card.
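To make the routing idea concrete, here is a toy sketch of top-k expert selection. The expert count, router scores, and k are invented for illustration. This is not Granite’s actual router, just the general shape of the technique.

```python
import math

def route_token(router_scores, top_k):
    """Pick the top_k highest-scoring experts and renormalize their weights."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:top_k]
    peak = max(router_scores[i] for i in chosen)
    exps = {i: math.exp(router_scores[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: exps[i] / total for i in chosen}

def moe_layer(token_value, experts, router_scores, top_k):
    """Run only the selected experts and blend their outputs by router weight."""
    weights = route_token(router_scores, top_k)
    return sum(w * experts[i](token_value) for i, w in weights.items())

# Eight hypothetical "experts"; each tiny function stands in for a feed-forward block.
experts = [lambda x, scale=s: x * scale for s in range(1, 9)]
scores = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]  # made-up router scores

active = route_token(scores, top_k=2)
print(f"active experts: {sorted(active)} out of {len(experts)}")
print(f"layer output: {moe_layer(1.0, experts, scores, top_k=2):.3f}")
```

Only two of the eight experts do any work for this token, which is where the speed comes from. All eight still exist in memory.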

In practice, my 16 GB GPU barely had to try. The runs finished within 2 seconds, leaving almost no time to screenshot the nvidia-smi utilization and wattage. So take the 80% utilization with a grain of salt. The utility samples only once a second, and with a 2-second run it was impossible to catch a reliable number. My 5060 Ti, when fully maxed out, reaches ~190W. In this case the card barely jumped to 83W before the run finished.
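One way around the screenshot problem is to poll nvidia-smi from a script while the prompt runs. This is a minimal sketch, assuming nvidia-smi is on the PATH, a single GPU, and that every queried field returns a number. The function names and the 0.2-second interval are illustrative choices, and polling faster does not guarantee the driver refreshes its counters any faster.

```python
import subprocess
import time

# Fields requested from nvidia-smi, in the order parse_sample expects them.
QUERY = "utilization.gpu,memory.used,power.draw,temperature.gpu"

def parse_sample(line):
    """Parse one CSV line of nvidia-smi output into a dict of floats."""
    util, mem, power, temp = (float(v.strip()) for v in line.split(","))
    return {"util_pct": util, "mem_mib": mem, "power_w": power, "temp_c": temp}

def read_gpu():
    """Ask nvidia-smi for a single reading (first GPU only)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sample(out.strip().splitlines()[0])

def poll(duration_s=3.0, interval_s=0.2):
    """Collect readings for duration_s seconds, pausing interval_s between calls."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        samples.append(read_gpu())
        time.sleep(interval_s)
    return samples

def peak(samples, key):
    """Highest value seen for one metric across all samples."""
    return max(s[key] for s in samples)

# Demo on two made-up readings so the parsing can be tried without a GPU.
fake_lines = ["12, 2710, 41.50, 39", "80, 2867, 83.12, 45"]
fake_samples = [parse_sample(line) for line in fake_lines]
print("peak util:", peak(fake_samples, "util_pct"), "peak power:", peak(fake_samples, "power_w"))
```

On a machine with the GPU, calling poll() just before sending the prompt and then taking peak() over the result would give a peak figure without racing a screenshot.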

The conclusion – TL;DR

What do I think? Well, it’s difficult to say. This model is very lightweight, incredibly fast, and accurate for simple tasks, and its modern MoE architecture makes it efficient. But this model is a year and a half old, which in the AI space is an eternity.
But, it works!!!
And here is one thing many people undersell or do not fully appreciate: licensing. This model is Apache 2.0. Fully open, high quality, no licensing strings attached, commercially usable. That is a big deal. Not all open-weight models are equal from this standpoint, and Granite 3 3B MoE is very permissive when licensing is considered.
Sometimes the pursuit of raw numbers and benchmark scores is not where the real story is written.

The AI output (pasted & unedited)

All code from the AI@Home series is available in the Archer Dynamics GitHub repo.

Sure, I’d be happy to help! A prime number is a positive integer greater than 1 that has no positive divisors other than 1 and itself. In simpler terms, it’s a number that can only be divided by 1 and itself without leaving a remainder.

Now, let’s write a simple Python function to check if a number is prime. Here’s how you might do it:

def is_prime(n):
    # Check if the number is less than or equal to 1
    if n <= 1:
        return False
    # Check if the number is only divisible by 1 and itself
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    # If no factors were found, the number is prime
    return True

Let’s break it down step by step:

  1. We start by checking if the number n is less than or equal to 1. If it is, we immediately know that it’s not prime because 1 is not considered a prime number.
  2. Next, we loop through all numbers from 2 to the square root of n (inclusive). This is because if n is divisible by any number greater than its square root, it would also be divisible by a smaller number in that range.
  3. For each number i in this range, we check if n is divisible by i. If it is, then n is not prime because it has factors other than 1 and itself.
  4. If no numbers from 2 to the square root of n divide evenly into n, then n is prime.

This function works by checking all possible factors of n, starting from the smallest possible factor (which is 2) up to the square root of n. This is because if n has any factors other than 1 and itself, it must have a factor less than or equal to its square root. By checking up to the square root, we can avoid unnecessary computations.
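A quick way to sanity-check the answer above (this part is an addition, not model output) is to compare the function against a deliberately naive check that tries every divisor. The sketch below re-types the model’s function with its comments dropped. The brute-force helper and the tested range are hypothetical choices made for the example.

```python
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

def is_prime_brute_force(n):
    """Reference check: try every divisor from 2 to n - 1."""
    return n > 1 and all(n % d != 0 for d in range(2, n))

# Compare both versions over a small range, including negatives, 0, and 1.
mismatches = [n for n in range(-5, 2000) if is_prime(n) != is_prime_brute_force(n)]
print("mismatches:", mismatches)
print("primes below 30:", [n for n in range(30) if is_prime(n)])
```

An empty mismatch list over this range supports the claim that the square-root shortcut gives the same answers as checking every possible divisor.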

Need Help?

Need help streamlining your processes or solving tricky business problems? I offer one-on-one consultations to get you unstuck fast. Book a free consultation with me today at goarcherdynamics.com.

Want more practical tips and workflow hacks? I publish them regularly on my blog — check it out and subscribe for newsletter updates: goarcherdynamics.com

Posted by Jiri Krecek
