AI@Home: IBM Granite3 2B

May 15, 2026

Model conditions & context

Today we are looking at another model from the family of lighter weights — and this time it is from IBM. Should be interesting. clocking at a puny 2B this Granite 3 2B model is one of the lightest, if not the lightest, I’ve tested. Let’s dive in to see how it did with Python.

I picked a very simple prompt which contains a mixture of code generation and some reasoning logic, and writing prowess. What I’m looking for is a number of metrics of interest to me: how much VRAM the model uses, utilization of GPU, wattage and temperature of GPU, CPU utilization, token throughput, total number of tokens written, total time to response. All these are important to me as not only do they match the best model for my hardware, but also provide the best quality of UX for me as an end user. I focus on overall quality of the answer, but most importantly on the reasoning and explanation to someone who is a novice in the field. My goal here is to show whether the model is usable and good enough to help someone who is seeking assistance in learning how to code or write code.

Specs	Value
Linux Distro	Ubuntu Server 24.04.4 LTS
Linux Kernel	6.8.0-101
CPU	Intel CORE i7 14th Gen 14700K Cores: 8P/12E Threads: 28
Motherboard	MSI PRO B660M-A
RAM	80 GB DDR4 (32+16+32+16)
SSD	Crucial NVME 1TB
GPU	MSI NVidia GeForce RTX 5060Ti Shadow 2X OC PCIe 5.0×8
CUDA Cores	4,608
VRAM	16 GB GDDR7 128-bit 448 GB/s
GPU Driver	NVidia 590.48.01
CUDA version	13.1
Ollama version	0.23.0
Model	Granite 3 Dense 2B
Quantization	Q4_K_M

The prompt

			
Write a simple Python function that checks if a number
is prime. Explain how it works in plain English, like
you're teaching a beginner.

The results

First off, IBM calls this 2B, but the model has 2.6 billion parameters. IBM’s rounding it down to “2B” is generous. Granite 3 is a dense model family and my RTX 5060 Ti card treats it accordingly. All three runs completed in under 2 seconds, with output speeds holding remarkably steady from 205 t/s down to 197 t/s as input context grew eightfold. That negligible drop across the three runs while the context window grew some 8 times is essentially no factor — the model barely notices this extra baggage. Space-wise it eats 2.1GB of VRAM, so very comfortably fitting onto some old 4 gig GPU. Or on a 16 GB card. you could stack several tasks simultaneously without breaking a sweat. Heck, it was so fast that my server did not even get a chance to spool up its fans on any of the runs!

Model	Quant	Run	Tokens/sec	Total Time	Tokens Written	VRAM	GPU Util
Granite 3 Dense 2B	Q4_K_M	1	205.68	2s	355	2.1 GB	94%
Granite 3 Dense 2B	Q4_K_M	2	200.03	2s	354	2.1 GB	94%
Granite 3 Dense 2B	Q4_K_M	3	197.61	2s	354	2.1 GB	94%

I saw Identical output across all three runs — word for word, line for line. That level of determinism is unusual and worth flagging: most models introduce at least minor variation between runs, especially with growing context. Granite didn’t. That may be hinting at a pretty low temperature setting, or the model was simply trained on this specific exercise and given its tiny weight pool to draw from it may have only a single set of results it can generate. Or it may be a combination of both.
The code itself is correct and production-ready — proper 6k±1 optimization, clean edge case handling for values under 4, and a step-by-step explanation that any junior developer learner like me could follow. IBM’s enterprise DNA pokes through in the explanation section, through, because this model writes like someone who has lived in the corporate world for a while. The explanation is almost suspiciously thorough for a 2.6B model. The 4096 token context ceiling is the only real constraint — tighter than most modern competitors at this weight class, so no long deep conversations, before context collapses.

The conclusion – TL;DR

Granite 3 Dense 2B provides good output quality and polished explanations, while sipping GPU resources. At 2.1GB and 200+ t/s, it’s one of the fastest usable coding models you can run locally. The enterprise Apache 2.0 license makes it commercially clean and worry-free, and IBM’s training show maturity, no hallucinated drama. It won’t win any creativity awards, it can clearly write code, can do good explanations, but this is not a reasoning model. And its dense, not a lighter MoE kind, nor can it do visuals.
One thing which got my attention is its tooling capability – it supports tools and RAG, so that vastly changes the perspective — this could be a great assistant model for your in-house apps – it has the speed and tools to do so – summarizations, translations, some light code error checking or QA. But given its 4k context window, do not expect it to do complicated tasks.

The AI output (pasted & unedited)

All code from the AI@Home series is available in the Archer Dynamics GitHub repo.

Here’s a simple Python function that checks if a number is prime:

			
def is_prime(n):
    if n <= 1:
        return False
    elif n <= 3:
        return True
    elif n % 2 == 0 or n % 3 == 0:
        return False
    i = 5
    while i * i <= n:
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True

		

This function works by checking divisibility starting from 2 up to the square root of the number. It skips even numbers and checks multiples of 3, as well as the next two numbers in that sequence, because all primes are of the form 6k ± 1. This optimization makes the function more efficient.

Here’s a step-by-step breakdown:

If the number is less than or equal to 1, it’s not prime.
If the number is less than or equal to 3, it’s prime.
If the number is divisible by 2 or 3, it’s not prime.
Start checking divisibility from 5 (the next odd number after 3).
Check if the number is divisible by the current number or the next number in the sequence (n % i == 0 or n % (i + 2) == 0).
If the number is divisible, it’s not prime.
If the loop completes without finding any divisors, the number is prime.

Need Help?

Need help streamlining your processes or solving tricky business problems? I offer one-on-one consultations to get you unstuck fast. Book a free consultation with me today at goarcherdynamics.com.

Want more practical tips and workflow hacks? I publish them regularly on my blog — check it out and subscribe for newsletter updates: goarcherdynamics.com

Archer Dynamics