What’s inside the report
If you’re working on Arabic LLMs or deploying AI in Arabic-speaking markets, this report gives you a real-world view of what works and where it works best.
This report offers an independent, multi-task evaluation of 9 top-performing large language models on Arabic. Download the full report now.
This report is intended only for recipients who accessed it through their aiXplain subscription. To approve further distribution, please contact care@aixplain.com. We are happy to support your use of this report.
Llama 4 Maverick and Scout included—tested across 11 core NLP tasks
GPT-4o mini dominates in some tasks while Command R+ leads in text classification
Smaller open-source models like Gemma 2 and Qwen2.5 show surprising strength
All models evaluated on real Arabic data using ROUGE-L and BLEU metrics