AI agents explained: What they are, what they can do and why they matter
This is your one-stop, 'evergreen' resource on the technology that continues to preoccupy developers... and has yet to be fully cracked
Artificial intelligence (AI) agents represent a fundamental shift in how computers work. Unlike conventional software that follows fixed instructions, an AI agent can autonomously pursue goals, create multi-step plans, and adapt its approach when circumstances change. This capacity for independent decision-making distinguishes agents from simpler AI tools like chatbots, and it explains why major technology firms are racing to deploy them across industries from finance to scientific research. For businesses and individuals, this matters because agents promise to automate complex tasks that currently require human judgment. Where traditional automation breaks when faced with the unexpected, agents can navigate uncertainty. However, this autonomy introduces new risks around security, privacy, and reliability that demand careful consideration before deployment.
In One Minute
What they are: AI agents are software programmes that independently work towards goals you set. They can plan multi-step tasks, remember past interactions, and use external tools like databases or web browsers to complete complex work.
What makes them different: Unlike chatbots (which simply respond to prompts) or traditional automation (which follows rigid scripts), agents make their own decisions about how to achieve objectives and adapt when plans fail.
What they can do: Book entire business trips whilst respecting company policy, analyse financial data and flag discrepancies, handle complex customer service queries, or modernise legacy application code.
Why they matter: Agents can automate tasks that previously required human problem-solving, but their autonomy creates new risks around data security, regulatory compliance, and unintended actions.
The bottom line: This technology is powerful but immature. Organisations must balance its productivity benefits against the need for robust oversight and security measures.
What AI Agents Are (and What They Are Not)
An AI agent is a software programme powered by a large language model (LLM) that can autonomously pursue goals set by a human. The critical word is "autonomously". Give an agent a high-level objective such as "prepare a quarterly sales analysis for the board", and it will independently determine what steps are needed, execute them, and adapt if something goes wrong, all without requiring constant human supervision.
This distinguishes agents from two other types of software that are often confused with them.
AI Agents vs Chatbots
A chatbot like ChatGPT processes individual prompts and generates responses, but it does not take independent action. Ask a chatbot, "What is the weather in London?", and it provides an answer. It does not remember your preferences, plan ahead, or use tools to complete a multi-step task on your behalf. Chatbots are non-agentic: they lack autonomy, memory, and the ability to orchestrate external tools.
AI Agents vs Traditional Workflow Automation
Traditional automation, such as a system that sends a templated email when a spreadsheet is updated, follows a rigid, predefined sequence of steps. If the environment changes in a way the script did not anticipate, the system fails or stops. Agents, by contrast, dynamically navigate changing environments. If a database is temporarily unavailable, an advanced agent can adapt its plan, perhaps by querying an alternative data source or notifying a human for guidance.
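To make the contrast concrete, here is a minimal Python sketch. The data sources and field names are hypothetical; the point is that the rigid script has exactly one path, while the agent-style version detects the failure and works around it.

```python
# Illustrative contrast only; all data sources and keys are hypothetical.

def rigid_report(primary_db: dict) -> str:
    # Traditional automation: one hard-coded path, no recovery.
    return f"Sales total: {primary_db['sales_total']}"  # KeyError if missing

def adaptive_report(primary_db: dict, backup_db: dict) -> str:
    # Agent-style behaviour: detect the failure and re-plan around it.
    try:
        return f"Sales total: {primary_db['sales_total']}"
    except KeyError:
        if "sales_total" in backup_db:
            return f"Sales total (from backup): {backup_db['sales_total']}"
        return "Could not retrieve sales total; escalating to a human."

try:
    rigid_report(primary_db={})
except KeyError:
    print("rigid script failed outright")

print(adaptive_report(primary_db={}, backup_db={"sales_total": 42_000}))
```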
The Four Core Capabilities That Define an AI Agent
AI agents are built around a foundation model (an LLM such as GPT, Claude, or Gemini) that serves as their reasoning engine. This core is enhanced by four critical capabilities that, when combined, enable autonomous, intelligent behaviour.
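Before examining each capability in turn, a highly simplified sketch may help show how they fit together. The `call_llm` function below is a stand-in for a real foundation-model API, and every class and tool name is illustrative.

```python
# A toy agent skeleton, for illustration only. `call_llm` is a placeholder
# for a real foundation-model call; planning, memory, and tool use wrap
# around that reasoning core, and `run` executes end-to-end (autonomy).

from typing import Callable

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. via a vendor SDK)."""
    return f"[model response to: {prompt}]"

class Agent:
    def __init__(self, tools: dict[str, Callable[[str], str]]):
        self.tools = tools               # tool use: named external actions
        self.short_term: list[str] = []  # memory: context for this task

    def plan(self, goal: str) -> list[str]:
        # Planning: decompose the goal (stubbed; a real agent asks the model).
        self.short_term.append(goal)
        return [f"step derived from: {goal}"]

    def run(self, goal: str) -> str:
        # Autonomy: carry out the plan without further human prompts.
        for step in self.plan(goal):
            self.short_term.append(self.tools["search"](step))
        return call_llm(f"Summarise: {self.short_term}")

agent = Agent(tools={"search": lambda q: f"results for {q}"})
print(agent.run("prepare a quarterly sales analysis"))
```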
1. Autonomy
Autonomy means the agent can act independently to achieve a goal. Rather than providing step-by-step instructions, you give the agent a high-level objective, and it operates without constant human supervision.
Example: A financial agent might autonomously flag that an incoming invoice is missing a purchase order number, then search internal systems to locate the correct reference before processing the payment, all without human intervention.
2. Planning
Planning is how an agent translates a broad goal into a concrete sequence of actions. This process, known as task decomposition, involves breaking a complex problem into smaller, manageable steps.
If a step fails (perhaps a required database is offline), an advanced agent can re-plan by adapting its strategy. This ability to create and revise plans dynamically is what allows agents to handle complexity that would defeat traditional automation.
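A minimal sketch of this re-planning behaviour might look as follows. In a real agent the plan and fallbacks would be generated by the foundation model; here they are hard-coded for clarity.

```python
# Task decomposition plus re-planning, illustrated with a hard-coded plan.
# `fallbacks` maps a failed step to an alternative (all names hypothetical).

def execute(step: str) -> bool:
    # Pretend the primary database is offline.
    return step != "query primary database"

plan = ["query primary database", "compute metrics", "write summary"]
fallbacks = {"query primary database": "query backup database"}

for step in plan:
    if execute(step):
        print(f"done: {step}")
    elif step in fallbacks:
        # Re-plan: swap in the alternative step instead of aborting.
        alt = fallbacks[step]
        print(f"failed: {step}; re-planned as: {alt}")
        assert execute(alt)
    else:
        print(f"failed: {step}; no fallback, asking a human")
        break
```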
3. Memory
Large language models are inherently stateless: they do not remember past interactions. Memory is an external component that gives an agent context and the ability to learn.
Short-term memory holds information relevant to the current task, much like a computer's RAM. If you ask an AI travel agent to book a flight to Paris next Tuesday, its short-term memory retains those details for the duration of the conversation.
Long-term memory allows an agent to retain information across multiple sessions, enabling personalisation and continuous improvement. It is typically stored in an external database such as Redis (see the sketch after the list below). There are three types of long-term memory:
- Episodic memory stores specific past events. For instance, "Last time I booked travel for this user, they chose a window seat and a hotel near the city centre."
- Semantic memory stores general facts and knowledge, such as "London is in the United Kingdom" or "A valid passport is required for international travel."
- Procedural memory stores learned skills and processes, such as the most efficient way to book a flight that complies with company travel policy.
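As a rough illustration of the two memory tiers, the sketch below keeps short-term memory as an in-process list and writes episodic long-term memory to Redis. It assumes a local Redis server and the redis-py client, and the key names are invented; the expiration call also previews the memory-management practice discussed later in this article.

```python
# Two memory tiers, sketched. Assumes a local Redis server and redis-py
# (`pip install redis`); key and field names are illustrative only.

import redis

r = redis.Redis(decode_responses=True)

short_term: list[str] = []                 # per-task context, like RAM
short_term.append("destination=Paris; date=next Tuesday")

# Episodic long-term memory: persists across sessions in Redis.
r.hset("user:42:episodic", mapping={
    "seat_preference": "window",
    "hotel_preference": "near city centre",
})
# Expiration policy: let stale memories age out (here, 90 days).
r.expire("user:42:episodic", 60 * 60 * 24 * 90)

print(r.hgetall("user:42:episodic"))
```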
4. Tool Use
Tools are what connect an agent to the outside world, allowing it to go beyond generating text to performing real actions. An agent's planning module determines which tool to use, and the foundation model formats the request.
Example: A customer service agent might have access to a customer relationship management (CRM) API to look up order history, a knowledge base search to find troubleshooting articles, and an email function to send follow-up messages. When a customer asks, "Where is my recent order?", the agent's planner identifies the need for order information, selects the CRM tool, retrieves the shipping status, and formulates a response.
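A stripped-down version of this tool-selection pattern might look like the following. The "planner" here is a trivial keyword match standing in for the model's reasoning, and both tool functions are hypothetical stubs.

```python
# Tool registry and dispatch, illustrated. In a real agent the foundation
# model selects the tool; this keyword stub stands in for that step.

def crm_lookup(customer_id: str) -> str:
    return f"order for {customer_id}: shipped, arriving Friday"

def kb_search(query: str) -> str:
    return f"top article for '{query}'"

TOOLS = {"crm_lookup": crm_lookup, "kb_search": kb_search}

def choose_tool(message: str) -> str:
    # Stand-in for the planner's tool-selection decision.
    return "crm_lookup" if "order" in message else "kb_search"

message = "Where is my recent order?"
tool_name = choose_tool(message)
arg = "customer-123" if tool_name == "crm_lookup" else message
print(TOOLS[tool_name](arg))
```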
How an AI Agent Works: A Step-by-Step Example
Imagine you ask an AI agent to "find the best-performing marketing campaign from last quarter and prepare a summary."
Step 1: Understanding the Goal
The agent uses its foundation model to interpret your request. It recognises that "best-performing" likely means highest return on investment (ROI), and "last quarter" means the three-month period that just ended.
Step 2: Creating a Plan
The agent's planning module breaks the task into discrete steps:
- Access the marketing analytics database.
- Query for all campaigns run in the previous quarter.
- For each campaign, retrieve key metrics: cost, reach, conversion rate, and ROI.
- Compare the ROI for all campaigns.
- Identify the campaign with the highest ROI.
- Summarise the findings in a report.
Step 3: Executing with Tools
The agent selects the appropriate tool (a database query function) and retrieves the campaign data. If the database is temporarily unavailable, the agent adapts: it might wait and retry, query a backup data source, or alert you to the problem.
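Sketched in code, that adapt-on-failure behaviour could look like this. Both query functions are invented stubs; the structure (retry with backoff, fall back to another source, then escalate) is the point.

```python
# Retry the primary source with a short backoff, fall back to a backup,
# and escalate only if both fail. Query functions are hypothetical stubs.

import time

def query_primary() -> list[dict]:
    raise ConnectionError("analytics database unavailable")

def query_backup() -> list[dict]:
    return [{"campaign": "Spring Sale", "roi": 2.4}]

def fetch_campaign_data(retries: int = 2) -> list[dict]:
    for attempt in range(retries):
        try:
            return query_primary()
        except ConnectionError:
            time.sleep(2 ** attempt)   # brief exponential backoff
    try:
        return query_backup()          # adapt: alternative data source
    except ConnectionError:
        raise RuntimeError("All sources failed; notify the user")

print(fetch_campaign_data())
```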
Step 4: Using Memory
The agent consults its long-term semantic memory to recall that "ROI" is calculated as (revenue minus cost) divided by cost. If it has served you before, its episodic memory might remember that you prefer reports formatted as bullet points rather than prose.
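As a worked example of that formula, the snippet below applies roi = (revenue - cost) / cost to two invented campaigns and picks the winner.

```python
# Worked example of the ROI formula from semantic memory, applied to
# illustrative campaign figures.

campaigns = [
    {"name": "Spring Sale", "cost": 10_000, "revenue": 34_000},
    {"name": "Summer Push", "cost": 8_000, "revenue": 22_000},
]

for c in campaigns:
    c["roi"] = (c["revenue"] - c["cost"]) / c["cost"]

best = max(campaigns, key=lambda c: c["roi"])
print(f"Best campaign: {best['name']} (ROI {best['roi']:.0%})")
# Spring Sale: (34000 - 10000) / 10000 = 2.4, i.e. 240%
```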
Step 5: Delivering the Result
The agent compiles the analysis, formats it according to your preferences, and presents the summary. It stores this interaction in episodic memory so that future requests can be handled even more efficiently.
What AI Agents Can Do: Five Real-World Use Cases
AI agents are being deployed across consumer, enterprise, and research settings to automate processes that previously required human expertise.
1. The Personalised Travel Agent (Consumer)
An AI travel agent can manage the entire process of planning and booking a work trip. By leveraging long-term memory, it remembers your preferences: direct flights, your preferred airline, hotel amenities like a gym. It uses tools to search for flights and hotels, compares them against company policy stored in its semantic memory, makes the bookings, and adds the itinerary to your calendar. This transforms a multi-hour manual task into a single request.
2. The Autonomous Financial Analyst (Enterprise)
In a finance department, an AI agent can monitor accounts payable. It autonomously scans incoming invoices, uses tools to cross-reference them with purchase orders in the enterprise resource planning (ERP) system, flags discrepancies or missing data, and even searches other internal documents to find missing information. Once validated, it schedules the payment, reducing manual effort and minimising errors.
3. The Proactive Customer Support Agent (Enterprise)
A sophisticated contact centre agent can handle complex queries that go beyond simple frequently asked questions (FAQs). It uses a CRM tool to access the customer's history, a knowledge base tool to find technical solutions, and a diagnostics tool to check service status. If it detects from the conversation history that the customer is becoming frustrated, it can proactively escalate the issue to a human supervisor, providing full context for a seamless handover.
4. Automated Application Modernisation (Developer)
Platforms like AWS Transform use AI agents to help developers modernise legacy applications. The agent analyses an old codebase, maps its dependencies, identifies outdated components, and generates a plan for refactoring the code to a modern, cloud-native architecture. This automates a highly complex and time-consuming task that requires deep technical expertise.
5. The AI Co-Scientist (Research)
The Paper2Agent framework demonstrates how a research paper can be converted into an interactive AI agent. In one case study, a genomics paper was transformed into an agent that could interpret genomic variants. This "AI co-scientist" was then used to analyse new data and successfully identified a novel splicing variant associated with attention deficit hyperactivity disorder (ADHD) risk, a discovery made possible by turning static research into a dynamic tool.
Common Myths About AI Agents: Facts vs Fiction
| Myth | Reality |
|---|---|
| AI agents are just chatbots with extra features. | Agents possess autonomy, planning, and tool use capabilities that chatbots lack. A chatbot answers questions; an agent independently executes multi-step plans to achieve goals. |
| Once set up, AI agents require no human oversight. | Agents require robust governance and human oversight, especially for high-stakes decisions. Their autonomy increases risk; it does not eliminate human responsibility. |
| AI agents always make rational, optimal decisions. | Agents can make errors due to flawed reasoning, corrupted memory, or misinterpreting goals. They are probabilistic systems, not infallible logic engines. |
| Traditional automation and AI agents are interchangeable. | Traditional automation follows fixed scripts and fails when environments change. Agents adapt dynamically, making them suited to complex, unpredictable tasks. |
| AI agents can learn and improve entirely on their own. | Whilst agents have learning mechanisms, they require carefully designed memory systems and often human feedback to improve reliably without drift or degradation. |
| Deploying AI agents is mainly a technical challenge. | Legal, ethical, and regulatory challenges are equally significant. UK businesses must navigate the UK GDPR and sector-specific regulations, and ensure accountability for autonomous decisions. |
The Risks: Privacy, Security, and Reliability Concerns
The autonomy that makes AI agents powerful also introduces risks that organisations must actively manage. These span technical vulnerabilities, compliance challenges, and operational failures.
Security Risks: The OWASP Top 10 for Agentic AI
The Open Worldwide Application Security Project (OWASP) has identified ten critical security risks specific to agentic applications. Unlike passive AI systems that only generate text, agents can take direct action, creating unique vulnerabilities:
- Agent Goal Hijack: Malicious actors manipulate an agent's core objectives to serve their own purposes, such as tricking a purchasing agent into ordering unauthorised goods.
- Tool Misuse and Exploitation: An attacker tricks an agent into using its legitimate tools for destructive actions, such as using a file management tool to delete critical data instead of reading it.
- Identity and Privilege Abuse: Compromising an agent's credentials to gain unauthorised access to connected systems, potentially escalating privileges across an organisation's infrastructure.
- Agentic Supply Chain Vulnerabilities: Exploiting weaknesses in third-party components, models, or data sources that the agent relies upon, such as poisoned training data or compromised APIs.
- Unexpected Code Execution: Using carefully crafted natural language prompts to trick an agent into running malicious code, bypassing traditional security controls.
- Memory and Context Poisoning: Corrupting an agent's long-term memory with false information to influence its future decisions, such as inserting fake "learned" procedures that benefit an attacker.
- Insecure Inter-Agent Communication: In multi-agent systems, intercepting or spoofing messages between agents to cause coordinated failures or data breaches.
- Cascading Failures: An error in a single agent propagating through an interconnected system, leading to widespread disruption across business operations.
- Human-Agent Trust Exploitation: An agent generating misleading but plausible explanations that trick a human operator into approving a harmful action.
- Rogue Agents: A compromised or misaligned agent taking self-directed actions that conflict with its intended purpose, potentially causing significant harm before detection.
Privacy and Data Protection Risks
AI agents pose significant challenges under the UK General Data Protection Regulation (GDPR). The autonomy of agents complicates the traditional controller/processor framework:
Accountability: A business remains the data controller and is legally responsible for the agent's actions. If an agent processes personal data in ways that were not intended (due to memory corruption or unexpected goal-seeking behaviour), it could breach GDPR principles like data minimisation and purpose limitation.
Transparency: GDPR requires that individuals understand how their data is being processed. When an agent makes autonomous decisions or takes unforeseen actions, explaining the logic becomes challenging, potentially violating transparency obligations.
Automated Decision-Making: Under Article 22 of GDPR, if an agent makes decisions that have a legal or similarly significant effect on an individual (such as in credit scoring, recruitment, or insurance), the business must ensure there is a lawful basis, provide meaningful information about the logic involved, and offer the individual the right to obtain human review and contest the decision.
Reliability and Operational Risks
Goal Misalignment: Agents pursue the goals they are given, but if those goals are poorly specified, the agent may take actions that are technically correct but practically harmful. This is a modern manifestation of the classic "be careful what you wish for" problem.
Memory Degradation: Over time, an agent's long-term memory can become corrupted, outdated, or bloated with irrelevant information, leading to poor decisions. Organisations must implement memory management strategies, including expiration policies and periodic audits.
Unpredictable Interactions: In multi-agent systems where several agents collaborate, emergent behaviours can arise that were not anticipated during design. These interactions can lead to deadlocks, conflicts, or inefficient resource use.
Lack of Explainability: The decision-making process of agents powered by LLMs can be opaque. When an agent makes an error or unexpected choice, understanding why it happened is often difficult, complicating debugging and accountability.
UK Regulatory Landscape: What Businesses Must Know
The UK has adopted a "pro-innovation," principles-based approach to AI regulation, distinct from the European Union's prescriptive AI Act. Rather than creating a single new law, the UK relies on existing regulators to apply five cross-sectoral principles to AI within their domains.
The Five UK AI Principles
- Safety: AI systems should function safely and securely throughout their lifecycle.
- Transparency: Organisations should be clear about when and how AI is being used.
- Fairness: AI should not discriminate unlawfully or create unfair outcomes.
- Accountability: There must be clear governance and responsibility for AI systems.
- Contestability: Individuals should be able to challenge and seek redress for AI decisions.
Sector-Specific Oversight
Regulators such as the Information Commissioner's Office (ICO), Competition and Markets Authority (CMA), and Financial Conduct Authority (FCA) are responsible for interpreting and enforcing these principles within their sectors. Businesses deploying agents must:
- Conduct sector-specific risk assessments.
- Establish clear governance structures that assign accountability for agent actions.
- Implement robust human oversight, particularly for high-stakes decisions.
- Maintain documentation that demonstrates compliance with transparency and fairness principles.
The UK government has acknowledged the unique risks of highly capable and agentic AI, and has not ruled out future legislation if voluntary measures prove insufficient.
What to Do Next: A Practical Checklist for Organisations
If you are considering deploying AI agents within your organisation, use this checklist to ensure a responsible and effective implementation.
Before Procurement
- Define clear use cases: Identify specific tasks where autonomy and adaptability provide value over traditional automation.
- Assess regulatory requirements: Determine which UK regulators oversee your sector and how the five AI principles apply to your use case.
- Conduct a data protection impact assessment (DPIA): Evaluate how the agent will process personal data and identify GDPR compliance risks.
- Establish governance structures: Assign clear accountability for the agent's actions and decisions.
Evaluating Vendors
Use this checklist when assessing vendor solutions (adapted from Google Cloud's architectural guidance):
- Foundation model: Which LLM powers the agent? Can you choose between multiple models? Is there model routing for cost optimisation?
- Planning capability: How does the agent decompose goals into plans? Can it re-plan dynamically if a step fails?
- Memory system: Does the agent support both short-term and long-term memory? What types (episodic, semantic, procedural)? How is memory stored and secured?
- Tool integration: What pre-built tools are available? How easy is it to integrate custom tools for your internal systems? How are tool credentials managed?
- Security and guardrails: What protections exist against goal hijacking, tool misuse, and memory poisoning? Can you configure safety limits?
- Human oversight: What mechanisms allow for human review and intervention? Can critical actions require approval?
- Observability: What tools are provided for monitoring behaviour, tracing decisions, and debugging failures?
- Compliance support: How does the solution help you meet UK regulatory requirements and GDPR obligations?
During Deployment
- Start with low-risk tasks: Deploy agents initially in non-critical areas to understand their behaviour and limitations.
- Implement robust logging: Ensure all agent actions, tool uses, and decisions are logged for audit and accountability (see the sketch after this list).
- Set clear boundaries: Define which tools the agent can access and which actions require human approval.
- Establish monitoring: Set up alerts for anomalous behaviour, such as unusual tool usage patterns or high error rates.
- Train staff: Ensure employees understand how to work alongside agents, including when to intervene and how to escalate issues.
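To illustrate two of the items above (robust logging and human approval for critical actions), here is a minimal pattern. The log format, action names, and approval mechanism are illustrative only, not a compliance standard.

```python
# A minimal audit-log and approval-gate sketch. Action names, the log
# format, and the approval field are all invented for illustration.

import json, time

CRITICAL_ACTIONS = {"send_payment", "delete_record"}

def audit_log(event: dict) -> None:
    # JSON-lines audit trail: one structured record per agent action.
    event["timestamp"] = time.time()
    with open("agent_audit.log", "a") as f:
        f.write(json.dumps(event) + "\n")

def perform_action(action: str, approved_by: str | None = None) -> None:
    # Gate critical actions behind explicit human approval.
    if action in CRITICAL_ACTIONS and approved_by is None:
        audit_log({"action": action, "status": "blocked_pending_approval"})
        return
    audit_log({"action": action, "status": "executed",
               "approved_by": approved_by})

perform_action("kb_search")                        # low-risk: runs at once
perform_action("send_payment")                     # blocked until approved
perform_action("send_payment", approved_by="j.smith")
```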
Ongoing Management
- Regular audits: Periodically review agent performance, decision quality, and compliance with policies.
- Memory management: Implement processes to audit, clean, and expire outdated information in long-term memory.
- Incident response plan: Have a clear protocol for responding to agent failures, security incidents, or unintended actions.
- Continuous improvement: Use logged data to refine agent behaviour, update tools, and improve guardrails.
- Stay informed: Monitor developments in UK AI regulation and industry best practices, adjusting your approach as the landscape evolves.
Sources
This article draws on authoritative sources from technology providers, legal experts, cybersecurity organisations, and academic research:
- What is agentic architecture? - IBM
- What are AI Agents? - Amazon Web Services
- Choose Your Agentic AI Architecture Components - Google Cloud
- Build smarter AI agents: Manage short-term and long-term memory with Redis - Redis
- What is AI agent memory? - IBM
- Paper2Agent: Reimagining Research Papers As Interactive and Reliable AI Agents - arXiv
- Agentic AI: what businesses need to know to comply in the UK and EU - Kennedys Law
- UK's Context-Based AI Regulation Framework: The Government's Response - White & Case
- Cybersecurity Snapshot: OWASP Ranks Top Agentic AI App Risks - Tenable
Article published: 19 December 2025