What’s inside the report
If you’re working on Arabic LLMs or deploying AI in Arabic-speaking markets, this report gives you a real-world view of what works and where it works best.
It offers an independent, multi-task evaluation of 12 top-performing large language models on Arabic. Download the full report now.
This report is intended only for recipients who accessed it through their aiXplain subscription. To request approval for further distribution, please contact care@aixplain.com. We are happy to support your use of this report.
12 LLMs evaluated, including open and closed models like SILMA, Jais, and ALLaM
Benchmarked on 11 real-world tasks such as question answering, reasoning, summarization, and translation
Introduces LLM-as-a-Judge, a new metric using Gemini 2.5 Flash to assess coherence and semantic quality
SILMA 9B and GPT-4.1 lead overall, with ALLaM 7B and Qwen3 14B excelling in reasoning and code