How to Build Your Own AI Agent

Coding Tutorial

Use the power of advanced technology to automate and streamline your tasks. AI agents are the current trend in artificial intelligence research and application for a good reason. This article will be a good introduction for those who want to upskill themselves in the AI agent field.

AI is everywhere. Whether we scroll through social media or talk with family members, AI topics find their way into the conversation one way or another.

But why have AI topics become so popular these days? It’s because AI has the potential to automate tasks that usually take a long time and to complete them quickly.

Since the launch of ChatGPT, the potential of Large Language Model (LLM) applications for people’s work has been in the spotlight. This has become even more apparent with the introduction of stronger reasoning models, which can “think” better than previous iterations.

The topic of AI Agents has gained momentum as models have improved their reasoning abilities. While the concept of agents existed long before LLMs became popular, it’s only recently that they’ve surged in popularity—thanks to their ability to perceive their environment and take actions to achieve specific goals.

As Agents become essential in the technology and business world, we will explore what Agents are and how to build your own in this article.

What is an AI Agent?

Before we proceed with the technical tutorial, let’s first understand what an AI agent is.

As stated previously, an agent can perceive its environment and act on that environment to achieve a goal. Conceptually, an agent is an entity that can make decisions independently given its situation. The image below shows how agents work at a high level.

Imagine a human as the agent. A human can perceive external stimuli from the environment through sensors (eyes, ears, etc.) and act on the environment using actuators (hands, feet, etc.). To process all this information and act upon it, the agent requires a core that can make decisions (the brain).
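The perceive, decide, and act cycle described above can be sketched with a toy thermostat. This is only an illustration: the Room and ThermostatAgent classes and the run function are hypothetical names invented for this sketch, and the decision core is a simple rule rather than an AI model.

```python
from dataclasses import dataclass


@dataclass
class Room:
    """The environment: a room whose temperature the agent can sense and change."""
    temperature: float


class ThermostatAgent:
    """A toy agent with a sensor, a decision core, and an actuator."""

    def __init__(self, target: float) -> None:
        self.target = target

    def perceive(self, room: Room) -> float:
        # Sensor: read the current state of the environment.
        return room.temperature

    def decide(self, reading: float) -> str:
        # Core (the "brain"): choose an action based on the perception.
        if reading < self.target - 0.5:
            return "heat"
        if reading > self.target + 0.5:
            return "cool"
        return "idle"

    def act(self, room: Room, action: str) -> None:
        # Actuator: change the environment.
        if action == "heat":
            room.temperature += 1.0
        elif action == "cool":
            room.temperature -= 1.0


def run(agent: ThermostatAgent, room: Room, max_steps: int = 20) -> list:
    """Repeat perceive -> decide -> act until the agent has nothing left to do."""
    history = []
    for _ in range(max_steps):
        action = agent.decide(agent.perceive(room))
        history.append(action)
        if action == "idle":
            break
        agent.act(room, action)
    return history


room = Room(temperature=18.0)
history = run(ThermostatAgent(target=21.0), room)
print(history, room.temperature)
```

An AI Agent follows the same loop, except that the decide step is delegated to a model instead of fixed rules.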

In computer science, this concept is formalized as software or systems that can act autonomously to achieve goals. It predates the idea of AI Agents, in which Artificial Intelligence serves as the Agent’s core. One of the earliest examples often cited is ELIZA, a conversational program that uses pattern matching to simulate human-like dialogue.
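To make the pattern-matching idea concrete, here is a minimal ELIZA-style responder. It is a sketch under stated assumptions: the three rules and the default reply are hypothetical examples written for this article, not the original ELIZA script.

```python
import re

# Hypothetical rules for illustration; the original ELIZA script was far larger.
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bbecause\b", re.IGNORECASE), "Is that the real reason?"),
]
DEFAULT_REPLY = "Please tell me more."


def respond(message: str) -> str:
    """Return the reply of the first rule whose pattern matches the message."""
    for pattern, template in RULES:
        match = pattern.search(message)
        if match:
            # Strip trailing punctuation from captured text before reusing it.
            groups = [group.rstrip(".!?") for group in match.groups()]
            return template.format(*groups)
    return DEFAULT_REPLY


print(respond("I need a holiday."))
print(respond("The weather is nice."))
```

There is no reasoning here at all: the program only reflects the user’s words back through templates, which is the gap that LLM-based agents fill.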

As technology evolves, so do AI agents. In a recent publication, the Google team introduced a clear definition of AI Agents and outlined their key components. The team defines AI Agents in this document as “Autonomous systems that can observe the environment, reason about their goals, and take action using external tools.”

This means that the Agent is a system capable of reasoning about its actions based on input.

The core of an AI Agent doesn’t run by itself but relies on a cognitive architecture made up of three components:

  1. Model: The language model (LM) that serves as the central decision-making engine.
  2. Tools: Interfaces that allow the model to interact with external systems.
  3. Orchestration: The layer that governs how the agent takes in information and that maintains memory, state, reasoning, and planning.

If we illustrate the architecture, it will look like the image below.



Each component complements how the AI Agents work, but not every component necessarily needs to be present. For example, AI Agents can still perform without access to the tools.
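The interplay of model, tools, and orchestration can be sketched in plain Python. Everything here is an assumption made for illustration: stub_model stands in for a real LLM, the "TOOL:" and "FINAL:" reply format is invented for this sketch, and calculator is a toy tool. It also shows that the same loop still works when no tools are supplied.

```python
from typing import Callable, Dict, Optional

# A "tool" maps a text argument to a text result.
Tool = Callable[[str], str]


def stub_model(prompt: str) -> str:
    """Stand-in for a language model. A real agent would call an LLM here."""
    if "Observation:" in prompt:
        # The prompt already contains a tool result, so answer with it.
        return "FINAL: " + prompt.split("Observation:", 1)[1].strip()
    if "calculator" in prompt and "sum" in prompt.lower():
        # Ask the orchestrator to run a tool.
        return "TOOL: calculator 2+3"
    return "FINAL: answered from the model's own knowledge"


def calculator(expression: str) -> str:
    """A tiny tool that only handles 'a+b' with integers."""
    left, right = expression.split("+")
    return str(int(left) + int(right))


def orchestrate(question: str, model: Callable[[str], str],
                tools: Optional[Dict[str, Tool]] = None, max_turns: int = 3) -> str:
    """Orchestration layer: keeps the running prompt (state) and routes tool calls."""
    tools = tools or {}
    prompt = f"Available tools: {sorted(tools)}\nQuestion: {question}"
    for _ in range(max_turns):
        reply = model(prompt)
        if reply.startswith("FINAL: "):
            return reply[len("FINAL: "):]
        # Otherwise the model asked for a tool: "TOOL: <name> <argument>".
        _, name, argument = reply.split(" ", 2)
        prompt += f"\nObservation: {tools[name](argument)}"
    return "No answer within the turn limit."


question = "What is the sum of 2 and 3?"
print(orchestrate(question, stub_model, {"calculator": calculator}))
print(orchestrate(question, stub_model))
```

Frameworks such as CrewAI take care of this loop for us, so in practice we only describe the agent, its tools, and its tasks.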

That is a quick introduction to AI Agents. Let’s move on to the central part of the article, which is developing your own AI Agent.

Develop AI Agent with CrewAI

Several AI Agent development frameworks have already been released—some have been around for a few years, while others are still in the experimental stage. However, I believe it’s not about which framework you use but how you leverage it to solve the problem.

For the tutorial on developing your own AI Agent, we will use the CrewAI framework, as it is one of the more popular libraries and has an easy-to-use API.

To start, we will install all the libraries required for the tutorial. You can use the following command to do it.

pip install crewai crewai-tools langchain langchain-community duckduckgo-search arxiv

Once the libraries are installed, we can begin building the agent. 

Develop Individual Agent

Let’s create an individual agent using CrewAI that is designed to perform a specific task.

First, we need to decide which LLM to use. Since CrewAI integrates multiple LLM providers through LiteLLM, any model supported by LiteLLM can be used as the agent’s core model. We will use the Gemini-1.5-Flash model for our tutorial, so remember to acquire an API key from Google AI Studio.

Let’s start by initiating the model using the following code.

from crewai import Agent, LLM

# Replace the placeholder with your own key from Google AI Studio.
llm = LLM(
    model='gemini/gemini-1.5-flash',
    api_key='GEMINI-API-KEY',
)
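Hard-coding a key in the script is fine for a quick test, but it is easy to leak. One possible alternative is to read the key from an environment variable. The helper name load_api_key and the variable name GEMINI_API_KEY are assumptions chosen for this sketch; use whatever name fits your setup.

```python
import os


def load_api_key(variable: str = "GEMINI_API_KEY") -> str:
    """Read an API key from the environment and fail early if it is missing."""
    key = os.environ.get(variable, "").strip()
    if not key:
        raise RuntimeError(f"Set the {variable} environment variable first.")
    return key
```

The result can then be passed to the model setup above, for example as api_key=load_api_key().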

Next, we will start to set up the agent. In this example, we will have an agent that can assist us in learning a new language. To do that, we will use the following code.

language_tutor_agent = Agent(
    role="Language Tutor",
    goal="To assist users in learning a new language by providing practice exercises.",
    # Adjacent string literals are joined, which keeps long text readable.
    backstory=(
        "A former language teacher with experience in helping students achieve "
        "fluency in various languages."
    ),
    verbose=True,
    llm=llm,
)

With CrewAI, we only need to describe the agent using the given parameters (role, goal, and backstory). Give as many details as possible so the agent can work as expected.

Then, we must create a task and assign it to the agent. A task is essentially the specific action the agent needs to perform. In CrewAI, agents can be assigned multiple tasks. We will create a task where the agent needs to provide a language practice plan based on the user’s proficiency.

from crewai import Task

language_practice_task = Task(
    # {language} and {proficiency} are filled in from the inputs at kickoff.
    description=(
        "Provide language {language} practice plan exercises based on the "
        "user's proficiency level: {proficiency}, "
        "including vocabulary, grammar, and conversation practice."
    ),
    expected_output="A set of tailored language exercises with detailed explanations",
    agent=language_tutor_agent,
)

In the code above, we provide two placeholders for user input in the task description: “{language}” and “{proficiency}.” We can name the placeholders anything we want, as long as the names are meaningful and match the keys we pass as inputs later. We also assign the task to the agent.
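To see what placeholder substitution amounts to, here is a minimal sketch using Python’s standard string formatting. It only illustrates the idea and is not CrewAI’s internal implementation; the helper names placeholders and fill are hypothetical.

```python
import string

description = (
    "Provide language {language} practice plan exercises based on the "
    "user's proficiency level: {proficiency}."
)


def placeholders(template: str) -> set:
    """Collect the {name} fields that appear in a template string."""
    return {name for _, name, _, _ in string.Formatter().parse(template) if name}


def fill(template: str, inputs: dict) -> str:
    """Substitute inputs into the template, reporting any missing keys clearly."""
    missing = placeholders(template) - set(inputs)
    if missing:
        raise KeyError(f"Missing inputs: {sorted(missing)}")
    return template.format(**inputs)


print(sorted(placeholders(description)))
print(fill(description, {"language": "Indonesian", "proficiency": "Beginner"}))
```

Checking for missing keys up front gives a clearer error than a bare formatting failure when a placeholder name and an input key don’t match.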

The agent and task are ready to kick off. We must wrap them in a group with the CrewAI Crew object. As we are only testing one individual agent, we will not see any collaboration between agents for now. To start the process, we only need to execute the following code:

from crewai import Crew, Process

crew = Crew(
    agents=[language_tutor_agent],
    tasks=[language_practice_task],
    verbose=True,
    process=Process.sequential,
)

output = crew.kickoff(inputs={"language": "Indonesian", "proficiency": "Beginner"})

The result can vary between runs because we neither limit the output tokens nor specify an exact output structure in the task. Printing the raw output shows the learning plan the Agent produced.

print(output.raw)

Congratulations! You just ran an AI Agent. However, the one we executed does not truly show the Agent’s strength and is not that different from a regular LLM call.

Develop Agent with Tool

Let’s add another key component of the Agent: the Tool. As discussed earlier, a Tool enables the Agent to interact with external systems and acquire real-time information, enhancing its capabilities beyond its standard LLM.

For example, let’s give the agent access to a Web Search tool. To achieve this, we can use a tool provided by LangChain. The search process can be executed using the code below:

from langchain_community.tools import DuckDuckGoSearchRun

search = DuckDuckGoSearchRun()
search.invoke("Who was elected president of the United States in 2024?")

CrewAI doesn’t accept tools directly from LangChain, so we need to wrap the tool within a CrewAI tool, as shown in the code below.

from pydantic import BaseModel

from crewai.tools.structured_tool import CrewStructuredTool
from langchain_community.tools import DuckDuckGoSearchRun


class DuckDuckGoSearchInput(BaseModel):
    # Schema for the tool arguments: a single search query string.
    query: str


def duckduckgo_search_wrapper(*args, **kwargs):
    # Read the query from the keyword arguments and run the search.
    query = kwargs["query"]
    search = DuckDuckGoSearchRun()
    result = search.invoke(query)
    return result


def create_duckduckgo_search_tool():
    return CrewStructuredTool.from_function(
        name="duckduckgo_search_tool",
        description=(
            "Searches the web using DuckDuckGo and returns "
            "relevant information for a given query."
        ),
        args_schema=DuckDuckGoSearchInput,
        func=duckduckgo_search_wrapper,
    )


SearchTool = create_duckduckgo_search_tool()
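Conceptually, a structured tool bundles a name, a description, an argument schema, and a function, and checks the arguments before calling the function. The sketch below mimics that idea with the standard library only. The classes SearchInput and StructuredTool and the function fake_search are hypothetical stand-ins made up for this illustration; they are not CrewAI classes and do not touch the network.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable


@dataclass
class SearchInput:
    """Argument schema: the tool expects exactly one string field, `query`."""
    query: str


@dataclass
class StructuredTool:
    """Hypothetical stand-in for a structured tool: name, description, schema, function."""
    name: str
    description: str
    args_schema: type
    func: Callable[..., Any]

    def invoke(self, **kwargs: Any) -> Any:
        # Validate the arguments against the schema before calling the function.
        expected = {f.name: f.type for f in fields(self.args_schema)}
        if set(kwargs) != set(expected):
            raise ValueError(f"{self.name} expects arguments {sorted(expected)}")
        for key, value in kwargs.items():
            if not isinstance(value, expected[key]):
                raise TypeError(f"{key} must be {expected[key].__name__}")
        return self.func(**kwargs)


def fake_search(query: str) -> str:
    # Offline placeholder so the sketch runs without network access.
    return f"results for: {query}"


tool = StructuredTool("search", "Searches the web.", SearchInput, fake_search)
print(tool.invoke(query="Indonesian greetings"))
```

The description matters as much as the function: it is the text the model reads when deciding whether the tool is relevant to the task.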

Then, we can assign it to the Agent by giving them access to a specific tool we just created.

language_tutor_agent = Agent(
    role="Language Tutor",
    goal="To assist users in learning a new language by providing practice exercises.",
    backstory=(
        "A former language teacher with experience in helping students achieve "
        "fluency in various languages."
    ),
    verbose=True,
    llm=llm,
    tools=[SearchTool],  # the agent may now call the web search tool
)

The agent now has access to the Web Search Tool. However, we need to tweak the previous task, as it doesn’t explicitly tell the agent to search the internet. For that reason, let’s add another placeholder, “{internet},” so the user can tell the agent whether or not to search the internet.

from crewai import Task

language_practice_task = Task(
    description=(
        "Provide language {language} practice plan exercises based on the "
        "user's proficiency level: {proficiency}, "
        "including vocabulary, grammar, and conversation practice. "
        "Find the reference on internet: {internet}"
    ),
    expected_output="A set of tailored language exercises with detailed explanations",
    agent=language_tutor_agent,
)

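Because the crew created earlier still references the previous agent and task objects, we rebuild it with the updated ones and then ask the agent to run once more with the following code.

crew = Crew(agents=[language_tutor_agent], tasks=[language_practice_task],
            verbose=True, process=Process.sequential)
output = crew.kickoff(inputs={"language": "Indonesian",
                              "proficiency": "Beginner", "internet": "Yes"})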

Going through the output, we can see that the model accesses the Internet via the tool and performs a search using a query it considers helpful for composing the learning plan.

That’s how you let the Agent access information using a Tool. You can create as many tools as you want, and the Agent will decide whether to use them based on the input it receives (provided the LLM can reason properly).

Develop Multi-Agent System

A single agent is powerful, but its effectiveness drops as the number of tasks it has to handle alone grows. That’s why multi-agent systems exist, in which several agents work collaboratively.

A multi-agent system expands the possibilities of what AI Agents can achieve by developing several specialized individual agents and orchestrating them within an environment where they can collaborate.

When discussing Multi-Agent systems, it’s not about a single structure that works for every system and use case. Instead, we can consider many architectural approaches when building the system.

Two of the most common architectures are Network and Supervisor.

Network architecture is the Multi-Agent system in which agents can communicate with each other and decide which other agent to call. In contrast, supervisor architecture uses a single supervisor agent to decide which agent should act next.
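The difference between the two routing styles can be sketched with plain functions. This is a simplified illustration, not a framework API: the names run_network, run_supervisor, and the toy agents are hypothetical, and real agents would call an LLM to decide who acts next.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A network agent returns (its output, the name of the next agent or None to stop).
AgentFn = Callable[[str], Tuple[str, Optional[str]]]


def run_network(agents: Dict[str, AgentFn], start: str, task: str,
                max_steps: int = 10) -> List[str]:
    """Network: each agent names the agent that should act next."""
    trace: List[str] = []
    current: Optional[str] = start
    for _ in range(max_steps):
        if current is None:
            break
        output, next_agent = agents[current](task)
        trace.append(f"{current}: {output}")
        current = next_agent
    return trace


def run_supervisor(workers: Dict[str, Callable[[str], str]],
                   supervisor: Callable[[List[str]], Optional[str]],
                   task: str, max_steps: int = 10) -> List[str]:
    """Supervisor: one coordinator picks the next worker, or None to stop."""
    trace: List[str] = []
    for _ in range(max_steps):
        name = supervisor(trace)
        if name is None:
            break
        trace.append(f"{name}: {workers[name](task)}")
    return trace


network_agents: Dict[str, AgentFn] = {
    "web": lambda task: (f"searched '{task}'", "tutor"),
    "tutor": lambda task: (f"wrote plan for '{task}'", None),
}
workers = {
    "web": lambda task: f"searched '{task}'",
    "tutor": lambda task: f"wrote plan for '{task}'",
}


def fixed_supervisor(trace: List[str]) -> Optional[str]:
    # Fixed policy for illustration: web first, then tutor, then stop.
    order = ["web", "tutor"]
    return order[len(trace)] if len(trace) < len(order) else None


print(run_network(network_agents, "web", "Indonesian"))
print(run_supervisor(workers, fixed_supervisor, "Indonesian"))
```

Both runs produce the same trace here; what differs is where the routing decision lives, inside each agent or in a single coordinator.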

There are many other architectures to choose from, and you can even create your own. For example, I want to develop a system where a language planning agent gathers information from a web search agent and a research paper agent before creating the plan, to enhance the quality of the output.

We must set up the agents, tasks, and tools to develop the multi-agent system described above. Let’s start with the research tool, where we’ll use Arxiv paper search to find relevant research papers.

from pydantic import BaseModel

from crewai.tools.structured_tool import CrewStructuredTool
from langchain_community.tools import ArxivQueryRun


class ArxivInput(BaseModel):
    # Schema for the tool arguments: a single search query string.
    query: str


def arxiv_search_wrapper(*args, **kwargs):
    # Read the query from the keyword arguments and search Arxiv.
    query = kwargs["query"]
    arxiv_tool = ArxivQueryRun()
    result = arxiv_tool.invoke(query)
    return result


def create_arxiv_tool():
    return CrewStructuredTool.from_function(
        name="arxiv_paper_tool",
        description="Finds academic papers on Arxiv based on the user's query.",
        args_schema=ArxivInput,
        func=arxiv_search_wrapper,
    )


arxiv_tool = create_arxiv_tool()

Then, we will initiate the web agent and research agent with the following code.

web_agent = Agent(
    role="Web Agent",
    goal="To gather general information from the web using a search engine.",
    backstory=(
        "A curious researcher with expertise in finding "
        "accurate information from various web sources."
    ),
    verbose=True,
    llm=llm,
    tools=[SearchTool],  # the DuckDuckGo tool created earlier
)

web_task = Task(
    description=(
        "Perform a web search using DuckDuckGo to find the most relevant and "
        "up-to-date information for language {language} studying for {proficiency} proficiency."
    ),
    expected_output="A summary of web search results, including key facts and references.",
    agent=web_agent,
    async_execution=True,
)

research_agent = Agent(
    role="Research Agent",
    goal=(
        "To assist users in finding relevant academic papers "
        "and summarizing research insights."
    ),
    backstory=(
        "A former academic researcher passionate about "
        "spreading knowledge from scientific literature."
    ),
    verbose=True,
    llm=llm,
    tools=[arxiv_tool],  # the parameter is `tools`, not `tool`
)

research_task = Task(
    description=(
        "Search for relevant academic papers on Arxiv based on "
        "language {language} studying for {proficiency} proficiency."
    ),
    expected_output="A list of academic papers with summaries of their key contributions.",
    agent=research_agent,
    async_execution=True,
)

We assign the appropriate tool to each agent. As in our previous example, each task is designed to allow user input. Additionally, we allow asynchronous execution for both tasks so the agents don’t need to wait for each other.
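The benefit of running independent tasks without waiting for each other can be illustrated with Python’s asyncio. This is a sketch of the general idea only, not of how CrewAI implements async_execution internally; the coroutines web_search and paper_search are hypothetical and simulate I/O with a short sleep.

```python
import asyncio


async def web_search(topic: str) -> str:
    # Simulated I/O wait; a real task would call a search tool here.
    await asyncio.sleep(0.05)
    return f"web results for {topic}"


async def paper_search(topic: str) -> str:
    # Simulated I/O wait; a real task would query a paper index here.
    await asyncio.sleep(0.05)
    return f"papers about {topic}"


async def gather_context(topic: str) -> list:
    """Start both tasks together and wait until both have finished."""
    results = await asyncio.gather(web_search(topic), paper_search(topic))
    return list(results)


context = asyncio.run(gather_context("Indonesian for beginners"))
print(context)
```

Because the two waits overlap, the total time is close to the slower of the two tasks rather than their sum, and the results come back in the order the tasks were passed in.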

Lastly, we initiate the language tutor agent with the following code.

language_tutor_agent = Agent(
    role="Language Tutor",
    goal=(
        "To assist users in learning a new language by "
        "providing practice exercises and corrections."
    ),
    backstory=(
        "A former language teacher with experience "
        "in helping students achieve fluency in various languages."
    ),
    verbose=True,
    llm=llm,
)

language_practice_task = Task(
    description=(
        "Provide language {language} practice plan exercises based on the "
        "user's proficiency level: {proficiency}, including vocabulary, "
        "grammar, and conversation practice."
    ),
    expected_output="A set of tailored language exercises with detailed explanations",
    agent=language_tutor_agent,
    context=[web_task, research_task],  # outputs of these tasks feed this one
)

The difference now is the context parameter: the language practice task waits for both the web and research tasks to finish and uses their outputs before the language tutor agent starts.
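The dependency idea can be sketched in a few lines of plain Python: a task runs only after the tasks it depends on, and it receives their outputs. The SimpleTask class and the execute function are hypothetical names for this illustration and do not reflect CrewAI’s internals.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SimpleTask:
    """Hypothetical task: a name, a worker function, and the tasks it depends on."""
    name: str
    run: Callable[[List[str]], str]
    context: List["SimpleTask"] = field(default_factory=list)


def execute(task: SimpleTask, done: Dict[str, str]) -> str:
    """Run the task's context tasks first, then pass their outputs to the task."""
    if task.name not in done:
        inputs = [execute(dependency, done) for dependency in task.context]
        done[task.name] = task.run(inputs)
    return done[task.name]


web = SimpleTask("web", lambda ctx: "web summary")
research = SimpleTask("research", lambda ctx: "paper list")
plan = SimpleTask("plan", lambda ctx: "plan based on: " + ", ".join(ctx), [web, research])

done: Dict[str, str] = {}
print(execute(plan, done))
print(list(done))
```

The done dictionary records each output once, so a task shared by several dependents is not executed twice.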

With everything in place, we will execute our multi-agent system to produce a language learning plan.

crew = Crew(
    agents=[
        web_agent,
        research_agent,
        language_tutor_agent,
    ],
    tasks=[
        web_task,
        research_task,
        language_practice_task,
    ],
    verbose=True,
    process=Process.sequential,
)

output = crew.kickoff(inputs={"language": "Indonesian", "proficiency": "Beginner"})

The result is a learning plan enriched with resources from the web search and the research papers. You can tweak the context to make the agent network intertwine even more.

Lastly, all the examples above use a sequential process, meaning the crew executes the tasks in the order they are listed, moving from one agent to the next until the final task is complete. This behavior is set with the process=Process.sequential parameter.

If you want a supervisor that decides which agent acts next and when to stop, you can change the Crew process parameter and add a manager that supervises everything, for example by passing a manager LLM.

    process=Process.hierarchical,
    manager_llm=manager_llm,  # an LLM instance for the manager, defined beforehand

That’s all you need to understand about creating your own AI Agent. We still haven’t touched upon many topics, such as the reasoning process, memory, and many more. But let’s keep that for a much more in-depth article.