ai/granite-4.0-h-small


Updated 4 months ago

32B long-context instruct model with RL alignment, instruction following (IF), tool use, and enterprise optimization.


ai/granite-4.0-h-small repository overview

Granite-4.0-h-Small


Description

Granite-4.0-H-Small is a 32B-parameter, long-context instruct model finetuned from Granite-4.0-H-Small-Base using a combination of permissively licensed open-source instruction datasets and internally collected synthetic datasets. The model was developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment via reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction-following (IF) and tool-calling capabilities, making them more effective in enterprise applications.

Characteristics

| Attribute | Details |
|---|---|
| Provider | Granite Team, IBM |
| Architecture | granitehybrid |
| Cutoff date | Not disclosed |
| Languages | English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese (extensible via finetuning) |
| Tool calling | Yes |
| Input modalities | Text |
| Output modalities | Text |
| License | Apache 2.0 |

Available model variants

| Model variant | Parameters | Quantization | Context window | VRAM¹ | Size |
|---|---|---|---|---|---|
| ai/granite-4.0-h-small:latest (also tagged :32B and :32B-Q4_K_M) | 32.21 B | MOSTLY_Q4_K_M | 1M tokens | 18.80 GiB | 18.14 GB |

¹: VRAM estimated based on model characteristics.
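As a rough sanity check, the 18.14 GB download size is consistent with Q4_K_M quantization, which averages roughly 4.5 bits per weight (an approximate figure for llama.cpp-style mixed quantization, not stated on this page):

```python
# Rough size estimate for a Q4_K_M-quantized 32.21B-parameter model.
# 4.5 bits/weight is an approximate Q4_K_M average (assumption; the
# exact mix of quant types varies per tensor).
params = 32.21e9          # parameter count, from the variants table
bits_per_weight = 4.5     # approximate Q4_K_M average
size_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB
print(f"{size_gb:.1f} GB")  # ≈ 18.1 GB, close to the listed 18.14 GB
```

The small remainder over the weights alone is accounted for by GGUF metadata and non-quantized tensors (e.g., embeddings are often kept at higher precision).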


Use this AI model with Docker Model Runner

docker model run ai/granite-4.0-h-small
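Docker Model Runner also exposes an OpenAI-compatible API, so the model can be called programmatically. Below is a minimal sketch of the chat-completions request body; the endpoint URL shown is an assumption based on Model Runner's default host TCP port, so verify it against your local configuration:

```python
import json

# OpenAI-compatible chat-completions request for Docker Model Runner.
# The URL below is an assumed default; check your Model Runner setup.
url = "http://localhost:12434/engines/v1/chat/completions"

payload = {
    "model": "ai/granite-4.0-h-small",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the Apache 2.0 license in one sentence."},
    ],
}
body = json.dumps(payload)

# Send with any HTTP client, e.g.:
#   requests.post(url, data=body, headers={"Content-Type": "application/json"})
```

Because the API is OpenAI-compatible, existing OpenAI client libraries can also be pointed at the Model Runner base URL instead of hand-building requests.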

Considerations

  • Optimized for instruction following, tool/function calling, and long-context (up to 128K tokens) scenarios.
  • Strong generalist capabilities: summarization, classification, extraction, QA/RAG, coding, function-calling, and multilingual dialogue.
  • Multilingual: best performance in English; a few-shot approach or light finetuning can help close gaps for other languages.
  • Safety & reliability: despite alignment, the model can still produce inaccurate or biased outputs—apply domain-specific evaluation and guardrails.
  • Infrastructure note: trained on NVIDIA GB200 NVL72 at CoreWeave; use acceleration libraries (e.g., accelerate, optimized attention/KV cache settings) for efficient inference.
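The tool/function-calling capability noted above uses OpenAI-style tool schemas passed alongside the chat messages. A minimal sketch of such a request, where the `get_weather` function and its parameters are hypothetical illustrations rather than anything defined by this model:

```python
import json

# Hypothetical tool definition in the OpenAI-style "tools" format
# used for function calling; the function name and parameters are
# illustrative only.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical function
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

request = {
    "model": "ai/granite-4.0-h-small",
    "messages": [{"role": "user", "content": "What's the weather in Lisbon?"}],
    "tools": tools,
}
body = json.dumps(request)
```

When the model decides a tool is needed, the response contains a tool call with JSON arguments matching the declared parameter schema, which your application executes and feeds back as a `tool` message.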

Benchmark performance

| Category | Metric | Granite-4.0-h-Small |
|---|---|---|
| General Tasks | MMLU (5-shot) | 78.44 |
| General Tasks | MMLU-Pro (5-shot, CoT) | 55.47 |
| General Tasks | BBH (3-shot, CoT) | 81.62 |
| General Tasks | AGI EVAL (0-shot, CoT) | 70.63 |
| General Tasks | GPQA (0-shot, CoT) | 40.63 |
| Alignment Tasks | AlpacaEval 2.0 | 42.48 |
| Alignment Tasks | IFEval (Instruct, Strict) | 89.87 |
| Alignment Tasks | IFEval (Prompt, Strict) | 85.22 |
| Alignment Tasks | IFEval (Average) | 87.55 |
| Alignment Tasks | ArenaHard | 46.48 |
| Math Tasks | GSM8K (8-shot) | 87.27 |
| Math Tasks | GSM8K Symbolic (8-shot) | 87.38 |
| Math Tasks | Minerva Math (0-shot, CoT) | 74.00 |
| Math Tasks | DeepMind Math (0-shot, CoT) | 59.33 |
| Code Tasks | HumanEval (pass@1) | 88.00 |
| Code Tasks | HumanEval+ (pass@1) | 83.00 |
| Code Tasks | MBPP (pass@1) | 84.00 |
| Code Tasks | MBPP+ (pass@1) | 71.00 |
| Code Tasks | CRUXEval-O (pass@1) | 50.25 |
| Code Tasks | BigCodeBench (pass@1) | 46.23 |
| Tool Calling Tasks | BFCL v3 | 64.69 |
| Multilingual Tasks | MULTIPLE (pass@1) | 57.37 |
| Multilingual Tasks | MMMLU (5-shot) | 69.69 |
| Multilingual Tasks | INCLUDE (5-shot) | 63.97 |
| Multilingual Tasks | MGSM (8-shot) | 38.72 |
| Safety | SALAD-Bench | 97.30 |
| Safety | AttaQ | 86.64 |

Tag summary

Content type: Model
Digest: sha256:7a1eb06e9
Size: 18.1 GB
Last updated: 4 months ago

docker model pull ai/granite-4.0-h-small:32B
