Unlocking Granite 3.2
Most users who download an AI model and run a basic chat interface are only scratching the surface of what that model can do. AI models ship with a broad set of latent capabilities that are only accessible if you know how to invoke them: the right prompt structures, the right system instructions, the right activation patterns. Archivist is built to expose all of that, invisibly, behind simple controls.
IBM's Granite 3.2 is a compelling example of this. It is a capable, compact model designed explicitly for enterprise use — privacy-respecting, efficient, and packed with features that most off-the-shelf applications never touch.
What the Model Card Doesn't Advertise
Granite 3.2's model card documents several specialized capabilities that require specific prompt structures and system-level instructions to activate. If you run the model through a generic chat UI, you get a capable assistant. If you read the documentation carefully and wire things up correctly, you get something more targeted.
Archivist implements three of these capabilities directly:
Extractive vs. Abstractive Responses
When answering a question from your documents, there are two fundamentally different things a model can do:
- Extractive — The model locates the most relevant passage in your retrieved documents and surfaces it directly, grounding the answer in the exact text. Useful when you need to know exactly what your document says.
- Abstractive — The model uses the retrieved passages as context and generates a synthesized, reasoned response in its own words. Useful when you want the model to interpret and explain, not just quote.
Granite 3.2 supports both modes, but each is activated by a specific prompt structure. Archivist handles this entirely behind the scenes: the AI Settings panel shows two radio buttons, and switching between them rewrites the underlying prompt automatically.
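The switch itself can be as simple as swapping the system prompt before the request is sent. A minimal sketch of that idea follows; the instruction wording here is a paraphrase for illustration, not the exact text from the Granite 3.2 model card, and `build_system_prompt` is a hypothetical helper:

```python
# Sketch: selecting a response mode by rewriting the system prompt.
# The instruction strings are paraphrases, NOT the model card's exact
# prompt structures -- check the Granite 3.2 documentation for those.

EXTRACTIVE_INSTRUCTION = (
    "Answer using only direct quotations from the provided documents. "
    "Surface the most relevant passage verbatim and identify its source."
)

ABSTRACTIVE_INSTRUCTION = (
    "Use the provided documents as context and answer in your own words, "
    "synthesizing across passages where needed."
)

def build_system_prompt(mode: str) -> str:
    """Return the system prompt for the chosen response mode."""
    if mode == "extractive":
        return EXTRACTIVE_INSTRUCTION
    if mode == "abstractive":
        return ABSTRACTIVE_INSTRUCTION
    raise ValueError(f"unknown response mode: {mode!r}")
```

The radio buttons in the UI map directly onto the `mode` argument; nothing else in the request pipeline needs to change.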
Extended Thinking
Granite 3.2 supports a reasoning mode that instructs the model to work through a problem step by step before producing its final answer. This "thinking" phase is not shown to the user — it happens internally — but it meaningfully improves accuracy on complex, multi-part questions.
The tradeoff is speed: thinking mode takes longer. For quick lookups, it is overkill. For nuanced analytical questions across a large document set, it earns its cost. Archivist exposes this as a simple toggle. The prompt scaffolding that activates the thinking mode is handled automatically.
RAG-Aware Grounding
Granite 3.2's model card includes instructions for structuring retrieval-augmented prompts so that the model treats the provided passages as authoritative context rather than as background knowledge to be blended with its training data. This distinction matters: a generic prompt that simply appends document text invites the model to mix retrieved content with its own priors, while the Granite-specific structure discourages hallucination and keeps the model tethered to what you actually uploaded.
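The general shape of such a grounded prompt can be sketched as follows. This is an illustration of the idea, not the model card's actual template: the delimiters and instruction wording here are assumptions, and a real deployment should use the exact structure the Granite 3.2 documentation specifies (or the chat template's document-passing mechanism):

```python
# Sketch: rendering retrieved passages as clearly delimited, numbered
# blocks ahead of the question, with a grounding instruction on top.
# Delimiters and wording are illustrative assumptions, not the
# Granite-specific template from the model card.

def build_grounded_prompt(question: str, passages: list[str]) -> str:
    """Assemble a RAG prompt that marks passages as the sole source of truth."""
    blocks = "\n\n".join(
        f"[Document {i + 1}]\n{text}" for i, text in enumerate(passages)
    )
    grounding = (
        "Answer using only the documents below. If they do not contain "
        "the answer, say so rather than guessing."
    )
    return f"{grounding}\n\n{blocks}\n\nQuestion: {question}"
```

The key property is that the passages are explicitly framed as the source of truth, not silently concatenated into the conversation.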
Why This Matters
Most RAG applications treat the model as a black box: embed, retrieve, prompt, response. Archivist treats the model as a collaborator with documented capabilities worth understanding and using. The result is a more honest, more controllable application — one where the user's choice of response mode actually changes how the model thinks, not just what it says.
The controls are simple on purpose. The complexity lives in the implementation.