Benchmark Report: Multi-Task Evaluation of Large Language Models on Arabic Datasets

aiXplain Arabic LLM Benchmark Report
The May 2025 Arabic LLM benchmark report is now live.

This report offers an independent, multi-task evaluation of 18 top-performing large language models on Arabic-language tasks. Download the full report now.

This report is intended only for recipients who accessed it through their aiXplain subscription. To request approval for further distribution, please contact care@aixplain.com. We are happy to support your use of this report.

    What’s inside the report

    If you’re working on Arabic LLMs or deploying AI in Arabic-speaking markets, this report gives you a real-world view of which models work and where they perform best.

    18 LLMs benchmarked, both open and closed, including Arabic-optimized models such as Fanar, ALLaM, and LFM

    Tested across 11 real-world Arabic NLP tasks, from QA to translation

    Smaller models like ALLaM 7B and Gemma 2 often outperform much larger ones

    Creative writing and text classification are less sensitive to model size or architecture

    Previous Arabic LLM Benchmark reports

    Curious about the results?

    Fill out the form to download the detailed report.