Model conditions & context
Today we are looking at another model from the lighter-weight family, and this time it is from IBM. Should be interesting. Clocking in at a puny 2B, this Granite 3 2B model is one of the lightest, if not the lightest, I’ve tested. Let’s dive in to see how it did with Python.
I picked a very simple prompt that mixes code generation, some reasoning logic, and writing prowess. I’m tracking a number of metrics: VRAM usage, GPU utilization, GPU wattage and temperature, CPU utilization, token throughput, total number of tokens written, and total time to response. All of these matter to me because they not only help me match the best model to my hardware, but also determine the quality of UX for me as an end user. I look at the overall quality of the answer, but most importantly at the reasoning and explanation aimed at a novice in the field. My goal here is to show whether the model is usable and good enough to help someone who is learning how to code.
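For readers who want to collect the same GPU metrics themselves, here is a minimal sketch, assuming `nvidia-smi` is on the PATH. It is not necessarily how the numbers in this post were gathered; the helper names `parse_gpu_sample` and `sample_gpu` are made up for illustration, and only the `--query-gpu` fields and the `csv,noheader,nounits` format come from `nvidia-smi` itself.

```python
import subprocess

# Fields accepted by `nvidia-smi --query-gpu`; which ones to track is this sketch's choice.
FIELDS = ["memory.used", "utilization.gpu", "power.draw", "temperature.gpu"]


def parse_gpu_sample(csv_line):
    """Turn one 'csv,noheader,nounits' line into a dict of floats.

    Assumes every field is numeric; some GPUs report '[N/A]' for power draw,
    which this sketch does not handle.
    """
    values = [v.strip() for v in csv_line.split(",")]
    return dict(zip(FIELDS, (float(v) for v in values)))


def sample_gpu():
    """Query the first GPU once (hypothetical helper, needs an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_sample(out.splitlines()[0])
```

Calling `sample_gpu()` in a loop while the model generates gives a rough picture of VRAM (in MiB), utilization (%), power (W), and temperature (°C) over time.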
| Specs | Value |
|---|---|
| Linux Distro | Ubuntu Server 24.04.4 LTS |
| Linux Kernel | 6.8.0-101 |
| CPU | Intel CORE i7 14th Gen 14700K Cores: 8P/12E Threads: 28 |
| Motherboard | MSI PRO B660M-A |
| RAM | 80 GB DDR4 (32+16+32+16) |
| SSD | Crucial NVME 1TB |
| GPU | MSI NVidia GeForce RTX 5060Ti Shadow 2X OC PCIe 5.0×8 |
| CUDA Cores | 4,608 |
| VRAM | 16 GB GDDR7 128-bit 448 GB/s |
| GPU Driver | NVidia 590.48.01 |
| CUDA version | 13.1 |
| Ollama version | 0.23.0 |
| Model | Granite 3 Dense 2B |
| Quantization | Q4_K_M |
The prompt
Write a simple Python function that checks if a number is prime. Explain how it works in plain English, like you're teaching a beginner.
The results
First off, IBM calls this 2B, but the model has 2.6 billion parameters. IBM’s rounding it down to “2B” is generous. Granite 3 is a dense model family and my RTX 5060 Ti card treats it accordingly. All three runs completed in under 2 seconds, with output speeds holding remarkably steady, slipping only from 205 t/s to 197 t/s as input context grew eightfold. A drop that small is essentially no factor: the model barely notices the extra baggage. Space-wise it eats 2.1 GB of VRAM, so it fits very comfortably on an old 4 GB GPU. On a 16 GB card, you could stack several tasks simultaneously without breaking a sweat. Heck, it was so fast that my server did not even get a chance to spool up its fans on any of the runs!
| Model | Quant | Run | Tokens/sec | Total Time | Tokens Written | VRAM | GPU Util |
|---|---|---|---|---|---|---|---|
| Granite 3 Dense 2B | Q4_K_M | 1 | 205.68 | 2s | 355 | 2.1 GB | 94% |
| Granite 3 Dense 2B | Q4_K_M | 2 | 200.03 | 2s | 354 | 2.1 GB | 94% |
| Granite 3 Dense 2B | Q4_K_M | 3 | 197.61 | 2s | 354 | 2.1 GB | 94% |
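The arithmetic behind the table can be sketched in a few lines. This is a minimal illustration, assuming the figures were derived the way Ollama reports them: its API response includes `eval_count` (tokens generated) and `eval_duration` (nanoseconds). The function names below are invented for this example.

```python
def tokens_per_second(eval_count, eval_duration_ns):
    """Throughput from Ollama-style counters (duration is in nanoseconds)."""
    return eval_count / (eval_duration_ns / 1e9)


def generation_seconds(tokens, tps):
    """Time spent generating, given a token count and a throughput."""
    return tokens / tps


def percent_drop(first, last):
    """Relative slowdown between two throughput readings, in percent."""
    return (first - last) / first * 100


# (tokens written, tokens/sec) taken from the table above
runs = [(355, 205.68), (354, 200.03), (354, 197.61)]
for tokens, tps in runs:
    print(f"{tokens} tokens at {tps} t/s: {generation_seconds(tokens, tps):.2f} s")
print(f"slowdown from run 1 to run 3: {percent_drop(205.68, 197.61):.1f}%")
```

Each run works out to roughly 1.7 to 1.8 seconds of generation, and the slowdown from the first run to the third is a little under 4%.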
I saw identical output across all three runs: word for word, line for line. That level of determinism is unusual and worth flagging, since most models introduce at least minor variation between runs, especially with growing context. Granite didn’t. That may hint at a pretty low temperature setting, or the model was simply trained on this specific exercise and, given its tiny weight pool, may have only a single answer it can generate. Or it may be a combination of both.
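One way to confirm "word for word" without eyeballing the text is to compare hashes of each run's output. The sketch below is only an illustration of that check, not the method used for this post. The `DETERMINISTIC_OPTIONS` dict is an assumption: Ollama accepts `temperature` and `seed` in a request's `options`, but the values shown are arbitrary and were not necessarily used here.

```python
import hashlib

# Assumed request options for reproducible sampling; values are illustrative only.
DETERMINISTIC_OPTIONS = {"temperature": 0, "seed": 42}


def fingerprint(text):
    """SHA-256 digest of a model response, used as a compact identity."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def all_identical(outputs):
    """True when every response in the list is byte-for-byte the same."""
    return len({fingerprint(o) for o in outputs}) <= 1
```

Passing the three saved responses to `all_identical` returns `True` only if not a single character differs, which makes it easy to repeat the comparison after changing the temperature or seed.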
The code itself is correct and production-ready: proper 6k±1 optimization, clean edge case handling for values under 4, and a step-by-step explanation that any junior developer or learner like me could follow. IBM’s enterprise DNA pokes through in the explanation section, though, because this model writes like someone who has lived in the corporate world for a while. The explanation is almost suspiciously thorough for a 2.6B model. The 4096 token context ceiling is the only real constraint. It is tighter than most modern competitors in this weight class, so no long, deep conversations before context collapses.
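The correctness claim is easy to check by comparing the model's function against a deliberately slow brute-force test. The sketch below does that; `is_prime` is the model's function as it appears in the output section, while `is_prime_naive` and the range of 2,000 are choices made for this illustration. It also checks the 6k±1 idea directly: every prime greater than 3 should leave a remainder of 1 or 5 when divided by 6.

```python
def is_prime(n):
    """The model's function, re-indented from the output section."""
    if n <= 1:
        return False
    elif n <= 3:
        return True
    elif n % 2 == 0 or n % 3 == 0:
        return False
    i = 5
    while i * i <= n:
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True


def is_prime_naive(n):
    """Reference check: try every possible divisor from 2 to n - 1."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, n))


# Numbers where the two functions disagree (should be empty).
mismatches = [n for n in range(-10, 2000) if is_prime(n) != is_prime_naive(n)]

# Primes above 3 that are NOT of the form 6k - 1 or 6k + 1 (should be empty).
off_pattern = [p for p in range(5, 2000) if is_prime(p) and p % 6 not in (1, 5)]

print("mismatches:", mismatches)
print("primes off the 6k±1 pattern:", off_pattern)
```

Both lists come back empty for this range, which supports the claim within the tested bounds; it is a spot check, not a proof.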
The conclusion – TL;DR
Granite 3 Dense 2B provides good output quality and polished explanations while sipping GPU resources. At 2.1 GB and 200+ t/s, it’s one of the fastest usable coding models you can run locally. The Apache 2.0 license makes it commercially clean and worry-free, and IBM’s training shows maturity: no hallucinated drama. It won’t win any creativity awards. It can clearly write code and explain it well, but this is not a reasoning model. And it’s dense, not a lighter MoE kind, nor can it do visuals.
One thing that got my attention is its tooling capability: it supports tools and RAG, and that vastly changes the perspective. This could be a great assistant model for your in-house apps, since it has the speed and the tools for summarizations, translations, and some light code error checking or QA. But given its 4k context window, do not expect it to handle complicated tasks.
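To get a feel for how quickly a 4k window fills up, here is a rough budget sketch. It assumes the whole conversation history is resent on every turn and nothing is trimmed, and the 30-token prompt size is a guess for illustration; only the 4096 limit and the roughly 355-token reply length come from this post.

```python
CONTEXT_LIMIT = 4096  # context ceiling quoted in this post


def remaining_tokens(used_tokens, limit=CONTEXT_LIMIT):
    """Tokens still available after the conversation so far."""
    return limit - used_tokens


def turns_that_fit(tokens_per_turn, limit=CONTEXT_LIMIT):
    """Whole prompt-plus-reply turns that fit before the window is full."""
    return limit // tokens_per_turn


# Assumed turn size: a ~30-token prompt (hypothetical) plus a ~355-token reply.
tokens_per_turn = 30 + 355
print("turns before the window is full:", turns_that_fit(tokens_per_turn))
print("tokens left after one turn:", remaining_tokens(tokens_per_turn))
```

Under these assumptions only about ten short exchanges fit, and pasting in a sizeable file or document for RAG would shrink that further.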
The AI output (pasted & unedited)
All code from the AI@Home series is available in the Archer Dynamics GitHub repo.
Here’s a simple Python function that checks if a number is prime:
    def is_prime(n):
        if n <= 1:
            return False
        elif n <= 3:
            return True
        elif n % 2 == 0 or n % 3 == 0:
            return False
        i = 5
        while i * i <= n:
            if n % i == 0 or n % (i + 2) == 0:
                return False
            i += 6
        return True
This function works by checking divisibility starting from 2 up to the square root of the number. It skips even numbers and checks multiples of 3, as well as the next two numbers in that sequence, because all primes are of the form 6k ± 1. This optimization makes the function more efficient.
Here’s a step-by-step breakdown:
- If the number is less than or equal to 1, it’s not prime.
- If the number is less than or equal to 3, it’s prime.
- If the number is divisible by 2 or 3, it’s not prime.
- Start checking divisibility from 5 (the next odd number after 3).
- Check if the number is divisible by the current number or the next number in the sequence (n % i == 0 or n % (i + 2) == 0).
- If the number is divisible, it’s not prime.
- If the loop completes without finding any divisors, the number is prime.
