From Concept to Creation: Building the BaristaAgent


Building an AI Agent That Knows Its Coffee

In part 1, I walked through my journey of understanding AI agents, from grappling with concepts like function calling and LLMs to exploring how agents combine reasoning, action, observation, and response to solve tasks. With the theory clear, I couldn’t wait to move from understanding what agents are to actually building one.

That’s when I turned to aiXplain’s Agentic Framework—a platform I work with every day, designed to make creating agents surprisingly straightforward. The idea of building my own agent felt both exciting and a little daunting, so I decided to keep it simple and personal.

As a self-proclaimed coffee connoisseur (read: someone who can’t survive mornings without caffeine), I’ve always had a love for brewing the perfect cup. I even have an espresso machine at home and occasionally attempt latte art—though my creations usually look more like abstract art than actual designs. One day, I hope to channel my inner barista and serve up café-worthy masterpieces. Until then, I thought, why not start small and build a digital version of my barista dreams—a BaristaAgent?

The goal was straightforward: create an agent that, given a set of ingredients, invents a unique coffee recipe of its own and also pulls similar existing recipes from Google Search. Its output had to be structured, clearly separating its own suggestion from the search results, and saved in JSON format.

How to Build the BaristaAgent

Building the BaristaAgent was a hands-on experience that brought together everything I’d learned about AI agents.

The full code is available in this Google Colab.

Step 1: Setting up the framework

The first step involved setting up the development environment. Using aiXplain’s Agentic Framework, I started by importing the necessary libraries and configuring my API key:

# Install aiXplain package
# pip install aixplain

# Import libraries and set API key
import os
os.environ["TEAM_API_KEY"] = "your-api-key"
from aixplain.factories.agent_factory import AgentFactory
from aixplain.modules.agent import ModelTool
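Hardcoding the key in a notebook makes it easy to leak when the notebook is shared. A minimal alternative, assuming only that the SDK reads `TEAM_API_KEY` from the environment as in the setup above, is to take the key from the environment and prompt for it only when it is missing. `ensure_api_key` is a hypothetical helper, not part of the aiXplain SDK.

```python
import os
from getpass import getpass


def ensure_api_key(var_name: str = "TEAM_API_KEY") -> str:
    """Return the API key from the environment, prompting for it only if missing.

    Hypothetical helper (not part of the aiXplain SDK). Assumes the SDK reads
    the key from this environment variable, as in the setup snippet.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        # getpass does not echo the typed key, so it stays out of notebook output
        key = getpass(f"Enter your {var_name}: ").strip()
        if not key:
            raise RuntimeError(f"{var_name} is not set")
        os.environ[var_name] = key
    return key
```

Call `ensure_api_key()` before the `from aixplain...` imports, in place of the hardcoded `os.environ["TEAM_API_KEY"] = "your-api-key"` line.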

Step 2: Defining the agent’s capabilities

The core of the agent is its ability to suggest recipes and fetch similar ones using Google Search. I defined this functionality in the agent’s description:

# Define the agent's capabilities.
# model_tool is the Google Search tool; it must be created before this call
# (it is not defined in the snippets shown here).
agent = AgentFactory.create(
    name="Barista Recipe GPT4o",
    description="""You are a professional barista. Given a set of ingredients, you should:
        1. Suggest your own unique coffee recipe using the provided ingredients
        2. Use Google Search to find 1-2 similar existing recipes
        3. Format your response as:

        MY SUGGESTED RECIPE:
        [Your unique recipe with clear steps and measurements]

        SIMILAR RECIPES FROM SEARCH:
        [1-2 recipes you found, with credit to sources]""",
    tools=[model_tool],  # Google Search tool
    llm_id="6646261c6eb563165658bbb1"  # GPT-4o
)

This description acts as the blueprint for the agent’s behavior, ensuring its outputs align with the intended format.
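The description is an instruction to the LLM, not a hard constraint, so a given run may still drift from the requested format. A minimal sketch of a check, assuming the output is plain text that contains the two section headers from the description verbatim. `follows_format` is a hypothetical helper, not part of the aiXplain SDK.

```python
# Section headers copied from the agent description
SUGGESTION_HEADER = "MY SUGGESTED RECIPE:"
SEARCH_HEADER = "SIMILAR RECIPES FROM SEARCH:"


def follows_format(output: str) -> bool:
    """True if each header appears exactly once, suggestion first, and neither section is empty."""
    if output.count(SUGGESTION_HEADER) != 1 or output.count(SEARCH_HEADER) != 1:
        return False
    start = output.index(SUGGESTION_HEADER)
    mid = output.index(SEARCH_HEADER)
    if start > mid:
        return False
    suggestion = output[start + len(SUGGESTION_HEADER):mid].strip()
    search = output[mid + len(SEARCH_HEADER):].strip()
    return bool(suggestion) and bool(search)
```

If the check fails, one option is to re-run the query or tighten the wording of the description.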

Step 3: Generating the recipe

With the agent ready, it was time to run a query and see the results. Using a predefined set of ingredients, I asked the agent to suggest a recipe and retrieve similar recipes from Google Search:

query = "What kind of coffee can I make with cardamom, nutmeg, Ethiopian blend coffee beans, milk, and sugar?"
response = agent.run(query)
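To try other ingredient sets without retyping the question, the query can be built from a list. This is a small hypothetical helper (`build_query` is not part of the SDK) that reproduces the exact wording of the query above, including the serial comma.

```python
def build_query(ingredients: list[str]) -> str:
    """Turn an ingredient list into the question format used for the BaristaAgent."""
    if not ingredients:
        raise ValueError("at least one ingredient is required")
    if len(ingredients) == 1:
        listed = ingredients[0]
    elif len(ingredients) == 2:
        listed = f"{ingredients[0]} and {ingredients[1]}"
    else:
        # Serial comma before the last item, as in the original query
        listed = ", ".join(ingredients[:-1]) + f", and {ingredients[-1]}"
    return f"What kind of coffee can I make with {listed}?"
```

For example, `agent.run(build_query(["cardamom", "nutmeg", "Ethiopian blend coffee beans", "milk", "sugar"]))` sends the same question as the query above.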

The output was formatted as follows:

MY SUGGESTED RECIPE:

Cardamom Nutmeg Ethiopian Coffee

Ingredients:
- 1/4 cup Ethiopian blend coffee beans
- 1/2 teaspoon ground cardamom
- 1/4 teaspoon ground nutmeg
- 1 cup milk
- 2 teaspoons sugar (adjust to taste)

Instructions:
1. Grind the Ethiopian coffee beans to a medium-coarse consistency.
2. In a saucepan, combine the milk, cardamom, nutmeg, and sugar. Heat over medium heat until the mixture is warm but not boiling.
3. Brew the coffee using your preferred method (e.g., French press, drip coffee maker) with the ground coffee beans.
4. Once the coffee is ready, pour it into a mug.
5. Froth the spiced milk mixture using a frother or whisk until it becomes frothy.
6. Pour the frothed milk over the brewed coffee.
7. Stir gently and enjoy your aromatic Cardamom Nutmeg Ethiopian Coffee.

SIMILAR RECIPES FROM SEARCH:

1. Cardamom Coffee Recipe: This recipe uses cardamom pods, cinnamon sticks, and Yrgacheffe coffee beans to create a spiced coffee blend. [Source: Google Search]
2. Spiced Ethiopian Coffee: A recipe that combines Ethiopian coffee with spices like cardamom and nutmeg for a rich, aromatic drink. [Source: Google Search]
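The agent returns this as a single block of text. To keep the suggestion and the search results apart in the saved JSON, the text can be split on the two headers. A minimal sketch, assuming the output follows the format above and that each search result sits on its own numbered line, as it does here. `split_sections` and the dictionary keys are hypothetical choices.

```python
import re

SUGGESTION_HEADER = "MY SUGGESTED RECIPE:"
SEARCH_HEADER = "SIMILAR RECIPES FROM SEARCH:"


def split_sections(output: str) -> dict:
    """Split the agent's text into its own recipe and a list of search results."""
    start = output.find(SUGGESTION_HEADER)
    mid = output.find(SEARCH_HEADER)
    if start == -1 or mid == -1 or start > mid:
        raise ValueError("output does not follow the expected format")
    suggestion = output[start + len(SUGGESTION_HEADER):mid].strip()
    search_part = output[mid + len(SEARCH_HEADER):]
    results = []
    for line in search_part.splitlines():
        # Keep only lines that start with "1. ", "2. ", ... and drop the numbering
        match = re.match(r"\s*\d+\.\s+(.+)", line)
        if match:
            results.append(match.group(1).strip())
    return {"my_suggested_recipe": suggestion, "similar_recipes": results}
```

The returned dictionary can be passed to `json.dump` in place of the raw text, so each part of the answer gets its own field.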

Step 4: Saving the results

To store the results for further analysis, I saved the agent’s response in a JSON file. This made it easy to revisit and evaluate the agent’s performance later:

import json

# Extract the agent's text output from the response
output_text = response["data"]["output"]

# Store the query and response under the model's name
response_data = {
    "gpt4o": {
        "query": query,
        "response": output_text
    }
}

# Save the response to a JSON file
with open('barista_responses_gpt4o.json', 'w') as f:
    json.dump(response_data, f, indent=4)
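The file above holds a single `"gpt4o"` entry and is overwritten on each run. When the same query is run with several LLMs, one option is to merge each model's result into a shared file instead. A minimal sketch that keeps the same `{"<model>": {"query": ..., "response": ...}}` layout; `save_response`, the file name, and any model keys other than `"gpt4o"` are hypothetical.

```python
import json
from pathlib import Path


def save_response(path: str, model_key: str, query: str, output_text: str) -> dict:
    """Add or overwrite one model's entry in a shared JSON file and return all entries."""
    file = Path(path)
    # Start from the existing entries, if the file is already there
    data = json.loads(file.read_text(encoding="utf-8")) if file.exists() else {}
    data[model_key] = {"query": query, "response": output_text}
    file.write_text(json.dumps(data, indent=4), encoding="utf-8")
    return data
```

For example, `save_response("barista_responses.json", "gpt4o", query, output_text)` after each run collects every model's answer side by side.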

Conclusion

One challenge I faced was ensuring the agent’s output was well-structured and easy to understand. Refining the prompt in the agent’s description helped address this. Integrating the Google Search tool also required careful testing to ensure the results were relevant and accurate.

Building the BaristaAgent showed me how accessible it is to create a functional AI system with the right tools. Using aiXplain’s Agentic Framework, I could focus on designing the agent’s behavior and capabilities while the framework handled the underlying plumbing, such as connecting the LLM to the Google Search tool. If you want to build a similar agent, aiXplain’s Agentic Framework and SDK cover the whole process, from defining an agent’s goals to integrating tools like Google Search.

However, creating an agent is just the first step. How well does it perform in real-world scenarios? In part 3, I’ll take a closer look by running the BaristaAgent with different LLMs and comparing key metrics like runtime and credit usage. Stay tuned to see how they stack up in terms of efficiency and practicality.