Community Story: ABLA – Revolutionizing Arabic Learning with AI Agents

Team Haa (ح) proudly secured 1st place at the HALA!: Hackathon for Arabic Language in Abu Dhabi, hosted by aiXplain. The hackathon was kicked off at the Arabic NLP Winter School co-located with the 31st International Conference on Computational Linguistics (COLING 2025). Our solution, the Arabic Bilingual Learning Agent (ABLA), demonstrated the power of AI agents in learning the Arabic language and its rich nuances.
Problem: Arabic Language Learning Journey
Arabic is one of the most widely spoken languages globally, with deep historical roots and diverse dialects. However, traditional language learning methods often fail to capture its beauty or showcase its linguistic richness. Learners often struggle with Arabic’s intricate script, grammar, and cultural nuances due to a lack of personalized, accessible teaching resources.
Traditional Arabic learning methods pose several challenges:
- Grammar complexity: Arabic grammar is highly intricate, with rules that shift based on context, dialect, and sentence structure. Learners often struggle to grasp its nuances without detailed, interactive explanations.
- Dialect diversity: The Arabic language spans numerous dialects, each with distinct vocabulary, pronunciation, and usage. This variation makes it difficult for learners to navigate and communicate effectively across different regions.
- Historical and literary richness: Arabic has a vast and deep literary heritage, particularly in poetry, which plays a significant role in the language’s history and cultural identity. However, many learners lack access to structured resources that explore these aspects.
- Limited access to modern learning tools: Many educational platforms rely on traditional textbook-based learning rather than incorporating AI-driven solutions that enhance accessibility, engagement, and understanding. The absence of intelligent tools limits learners from experiencing Arabic more interactively and intuitively.
These limitations inspired the development of ABLA, an intelligent language-learning assistant that simplifies complex Arabic concepts and provides personalized, context-aware explanations.
Goal: Making Arabic More Accessible to Learners
Language learning is often a journey filled with challenges, especially when tackling a language as rich and intricate as Arabic. Our solution reimagines learning the Arabic language using a very simple chatbot interface powered by the latest advancements in natural language processing and generative AI.
With the ability to answer intricate linguistic questions, provide historical context, and guide learners through cultural insights, our solution paves the way for learning languages with AI agents. ABLA understands both Arabic and English, as well as most Arabic dialects, so it can serve native speakers deepening their understanding and beginners taking their first steps alike. ABLA makes learning Arabic more accessible than ever, revealing its beauty to the world.
Solution: ABLA

In several Arabic dialects, “ABLA” means “teacher”. Our solution ABLA, also short for “Arabic Bilingual Learning Agent”, is an AI agent designed to transform the way people engage with the Arabic language. ABLA is built with four main technologies: machine translation, dialect classification, speech synthesis, and retrieval-augmented generation (RAG). We found that improving Arabic-capable LLMs relied on reinforcing these key subsystems, as they address fundamental challenges in Arabic language processing and education.

- Machine translation: ABLA was developed as a language learning agent, requiring a machine translation system for users who don’t speak Arabic.
- Dialect classification: Arabic is the official language in many countries, with Modern Standard Arabic (MSA) being the standard, but dialects are more commonly spoken. A system that identifies these dialects helps learners navigate the language more effectively.
- Speech synthesis: Pronunciation is a major hurdle in learning Arabic due to its phonetic complexity. A high-quality text-to-speech (TTS) system allows learners to improve their listening skills and pronunciation.
- RAG system: Arabic poetry, known for its significant impact on the culture, plays a key role in learning the language. A retrieval-augmented generation (RAG) system was developed to enhance LLMs’ understanding of Arabic poetry and its complex nature.
Key Features
ABLA’s key features consist of:
- Smart natural language interaction: AI-powered language models interpret queries and generate precise, easy-to-understand responses.
- Dialect identification and translation: Users can input sentences, and ABLA will identify the dialect (e.g., Egyptian, Levantine, or Gulf Arabic) while providing translations and variations.
- Historical and cultural insights: From the origins of the Arabic script to the first scholar who added diacritical marks, ABLA brings history to life with engaging narratives.
- AI-enhanced poetry and literary analysis: Users can ask ABLA to interpret classical Arabic poetry or explore the meaning behind literary texts.
- User-friendly interface: A responsive, interactive interface ensures an engaging experience, making Arabic learning accessible to all levels.
Technology Overview
Tech stack
- Languages: Python
- Frameworks: aiXplain, Streamlit
- Tools: Machine translation system, speech synthesis system, dialect classifier powered by ADIDA’s API, RAG pipeline
- LLMs: GPT-4o
Methodology
Developing ABLA involved building an Arabic language teaching application on top of an agentic system developed with aiXplain’s framework. The methodology includes setting up dependencies, importing the necessary libraries, using off-the-shelf models for machine translation and speech synthesis, creating a utility model that calls an external API for dialect classification, and creating a RAG pipeline for Arabic poetry.
Step 1: Setting up the environment
The first step involves installing the required Python libraries and frameworks.
# Install required libraries
pip install aixplain
Step 2: Setting up subsystems
Next, we set up the subsystems the agent needs. First comes the machine translation (MT) system, which enables automatic translation between Arabic and other languages. It relies on a pre-trained translation model to provide accurate text conversion. Here, an off-the-shelf model is attached to the agent as a model tool through aiXplain’s AgentFactory.
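# Import the agent factory and the enums used to select a function and supplier
from aixplain.enums import Function, Supplier
from aixplain.factories import AgentFactory
# Define a translation tool for the agent
mt_tool = AgentFactory.create_model_tool(
    function=Function.TRANSLATION,
    supplier=Supplier.MICROSOFT,
)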
Second, we set up the speech synthesis system, which is responsible for converting Arabic text into spoken words using a state-of-the-art text-to-speech (TTS) model from the aiXplain framework. This is useful for pronunciation assistance and auditory learning.
# Define the Arabic speech synthesis tool
ss_tool = AgentFactory.create_model_tool(
    function=Function.SPEECH_SYNTHESIS,
    supplier=Supplier.MICROSOFT,
)
Third, we set up a dialect classifier that identifies the Arabic dialect of any written text. Instead of training a model from scratch, we leverage the ADIDA API, which supports real-time inference and provides probabilities for different dialects. To call this API from the agent, we wrap it in an aiXplain UtilityModel as follows:
from aixplain.factories import ModelFactory
# Define the function that calls the ADIDA API
def main(text: str) -> str:
    # <- code that calls ADIDA API goes here
    return dialects_probs
# Define a UtilityModel that wraps the function above
dialect_model = ModelFactory.create_utility_model(
    name="Arabic Dialect Expert",
    code=main,
    output_examples="Arabic Dialect Expert"
)
# Expose the Arabic Dialect Expert to the agent as a tool
dialect_tool = AgentFactory.create_model_tool(model=dialect_model.id)
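The body of `main` is elided above, so the following is only a sketch of one way the classifier's output could be turned into the string the utility model returns. It assumes, hypothetically, that the API call has already produced a dictionary mapping dialect labels to probabilities. The function name `format_dialect_probs`, the labels, and the numbers are illustrative and are not taken from ADIDA's actual response format.

```python
def format_dialect_probs(probs: dict[str, float], top_k: int = 3) -> str:
    """Return the top_k most likely dialects as a readable string.

    `probs` is an assumed shape (label -> probability); the real ADIDA
    response may look different.
    """
    # Sort labels from most to least probable
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return ", ".join(f"{label}: {p:.1%}" for label, p in ranked[:top_k])


# Hypothetical scores, for illustration only
example = {"Cairo": 0.62, "Beirut": 0.21, "MSA": 0.12, "Rabat": 0.05}
summary = format_dialect_probs(example, top_k=2)
```

Returning a short, ranked string rather than the full distribution is a design choice that may make the tool output easier for the LLM to quote back to the learner.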
Lastly, we set up the Arabic Poetry RAG system, which uses indexed data from the Arabic Poetry Dataset. This system enables the retrieval of Arabic poetry verses based on user queries. It first searches a poetry corpus and then generates contextually relevant responses using a language model.
from tqdm import tqdm
from aixplain.factories import IndexFactory, PipelineFactory
from aixplain.modules.agent import PipelineTool
# Start with a document retrieval system
index_model = IndexFactory.create(
    name="Arabic Poetry RAG",
    description="Arabic Poetry RAG system"
)
# Insert records from the dataset in batches of 20
# (`documents` holds the records prepared from the dataset; preparation not shown)
start = 0
for batch_idx in tqdm(range(start, len(documents), 20), desc="Indexing"):
    try:
        index_model.upsert(documents[batch_idx : batch_idx + 20])
    except Exception as err:
        # Report the failing batch and continue with the next one
        print(f"Error on batch {batch_idx}: {err}")
# Create the RAG pipeline
pipeline = PipelineFactory.init("Arabic Poetry RAG System")
OPENAI_GPT4O_MINI_ASSET = "669a63646eb56306647e1091"
llm_node = pipeline.text_generation(asset_id=OPENAI_GPT4O_MINI_ASSET)
search_node = pipeline.search(asset_id=index_model.id)
# <- Connect pipeline nodes here
# Save the pipeline as an asset
pipeline.save(save_as_asset=True)
# Create the RAG tool
poetry_rag_tool = PipelineTool(
    pipeline=pipeline.id,
    description="Question-Answering System about Arabic Poetry"
)
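The indexing loop above prints an error and moves on when a batch fails. As a minimal sketch of the same batching idea, the helper below also records which batches failed so they can be retried later. The names `batches`, `upsert_in_batches`, and `fake_upsert` are hypothetical, and `fake_upsert` is only a stand-in for `index_model.upsert`; nothing here calls aiXplain.

```python
from typing import Callable, Iterator, Sequence


def batches(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upsert_in_batches(
    items: Sequence, upsert: Callable[[Sequence], None], size: int = 20
) -> list[int]:
    """Call `upsert` on each batch and return the start offsets that failed."""
    failed = []
    for idx, batch in enumerate(batches(items, size)):
        try:
            upsert(batch)
        except Exception:
            # Remember where the failing batch starts so it can be retried
            failed.append(idx * size)
    return failed


def fake_upsert(batch: Sequence) -> None:
    """Stand-in for index_model.upsert: rejects batches containing None."""
    if None in batch:
        raise ValueError("bad record")


records = list(range(45))
records[25] = None  # simulate one malformed record
failed_offsets = upsert_in_batches(records, fake_upsert, size=20)
```

With a real index, the returned offsets could be fed back into the loop's `start` value or retried individually.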
Step 3: Combine all subsystems into an agent
Now that all the needed systems are created, let’s combine them into an AI agent.
from aixplain.factories import AgentFactory
# Define the agent role
ROLE = ""  # <- Agent role goes here
agent = AgentFactory.create(
    name="ABLA Arabic Teacher",
    description=ROLE,
    tools=[
        poetry_rag_tool,
        mt_tool,
        ss_tool,
        dialect_tool
    ]
)
Step 4: Run and deploy the application
The last step is to embed the agent we just created in a Streamlit application and make it available to users. The app can run locally or be deployed with a containerized approach. Here, we walk you through how to run our agent locally:
- Clone our GitHub repository and install dependencies
git clone https://github.com/aiXplainHackathon/2025-Team-Haa
# install dependencies
pip install -r requirements.txt
- Add your aiXplain token to the apis.json file
- Run the Streamlit app
streamlit run Main_Chat.py
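For readers wondering how a token stored in a JSON file typically reaches the SDK, here is a hedged sketch. The layout of apis.json is not shown in this post, so the field name `aixplain` is purely hypothetical, and exporting the value as the `TEAM_API_KEY` environment variable is an assumption about how the aiXplain SDK picks up credentials. Check the repository for the actual key name before relying on this.

```python
import json
import os
from pathlib import Path


def load_api_key(path: str, field: str = "aixplain") -> str:
    """Read `field` from a JSON file and export it as TEAM_API_KEY.

    Both the field name and the environment variable are assumptions.
    """
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    key = config.get(field, "")
    if not key:
        raise ValueError(f"No '{field}' entry found in {path}")
    # Export the key so libraries that read the environment can find it
    os.environ["TEAM_API_KEY"] = key
    return key
```

Failing early with a clear error when the token is missing tends to be easier to debug than an authentication failure on the first request.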
How aiXplain Empowers ABLA
Leveraging state-of-the-art AI models via the aiXplain API, ABLA provides intelligent responses that go beyond surface-level translation. It integrates a GPT-4o-powered agent with the following AI tools:
- A machine translation system for translating non-Arabic questions.
- A speech synthesis system for reading out Arabic text when prompted by the user.
- A dialect classifier powered by ADIDA’s API.
- A Retrieval-Augmented Generation (RAG) mechanism built on the Arabic Poetry Dataset, which contains over 70,000 classic poems with rich metadata and gives the agent the cultural background to answer poetry-related questions.
This seamless fusion of technologies enables ABLA to provide detailed, context-aware answers. Additionally, chain-of-thought reasoning enhances its ability to break down complex Arabic concepts into clear, digestible explanations.
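To make the retrieve-then-generate idea concrete, the toy sketch below covers only the retrieval and prompt-assembly half of RAG. It ranks passages by simple word overlap, which is a deliberately naive swap-in and not how aiXplain's index search works, and it stops before calling any LLM. The function names and the two placeholder passages are hypothetical and are not drawn from the Arabic Poetry Dataset.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and keep its word tokens; enough for a toy example."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, corpus: list[str], top_k: int = 1) -> list[str]:
    """Rank corpus entries by how many words they share with the query."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Place the retrieved passages ahead of the question for the LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Placeholder entries, for illustration only
corpus = [
    "poem A: a verse about the desert night",
    "poem B: a verse about the sea and ships",
]
question = "Which verse mentions the sea?"
prompt = build_prompt(question, retrieve(question, corpus))
```

In a full system, `prompt` would be passed to the language model, so the answer is grounded in the retrieved verses rather than in the model's memory alone.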

Results
ABLA’s core technologies have enhanced existing off-the-shelf models and systems. The following comparisons show the agentic system before and after the enhancements:
Arabic poetry understanding
- Agent (without RAG)
- Agent (with RAG)
Dialect classification
Example 1:
- Agent (without ADIDA integration)
- Agent (with ADIDA integration)
Example 2:
- Agent (without ADIDA integration)
- Agent (with ADIDA integration)
ABLA’s chatbot interface
The following is a simple conversation with ABLA to showcase the different features it offers.

Conclusion
The ABLA project was an exciting exploration of AI-driven language learning, combining machine translation, dialect classification, speech synthesis, and retrieval-augmented generation (RAG) for Arabic poetry. The seamless integration of diverse models, systems, and pipelines through aiXplain’s agentic platform was the driving force behind bringing ABLA to life. One of the key learnings from this project was the importance of handling multilingual NLP challenges. We also gained experience in session management, optimizing real-time AI inference, and deploying scalable AI applications.
Overall, this project showcased the potential of AI in language education and provided valuable insights into developing intelligent, interactive learning tools. Future improvements may include faster response times, broader dialect coverage, and more interactive learning features.
Future Work
Our journey with ABLA is just beginning. Future enhancements will include the following:
- Personalized learning paths: AI-driven recommendations based on user progress.
- Multimedia support: Integration of images, videos, and interactive maps to provide a rich, immersive exploration of Arabic culture, geography, and heritage.
- Expanded historical and literary resources: Deeper dives into Arabic’s linguistic evolution.
- Real-time dialect translation: Instant conversion between dialects to help learners navigate regional variations.