StudioMeyer

LLM Integration · Agentic AI

LLMs that actually do work. Not just talk.

Function calling and tool use with Claude and GPT. Your AI agent books appointments, sends emails, writes invoices, queries your database and calls external APIs. All with guardrails, approval workflows for sensitive actions and a complete audit trail. This is what sets us apart from a chatbot shop.

What you get

Six building blocks for a production-grade LLM agent

Building an agent for a demo is easy. Building one that works correctly for months and doesn't suddenly send out wrong invoices is engineering. We build the full stack, including the safety net.

Tool inventory with permission map
Together with you we go through the actions the agent is allowed to take. Reading is usually fine, writing needs approval, deleting never happens without a human. For each tool we define scope, inputs, rate limits and sandbox behaviour.
Function schemas (OpenAI plus Anthropic)
Clean JSON schemas for every tool, with clear descriptions so the agent picks the right tool at the right time. Schema tests catch hallucinated or malformed parameters.
Guardrails and approval workflows
What the agent may do on its own and what needs approval. Example: it books appointments without asking, sends invoices after one-click approval, and never transfers money without human sign-off. Configurable per tool.
Multi-tool orchestration
Complex tasks need multiple tools in sequence. We build the orchestration logic plus error handling: when tool 3 fails, the agent knows how to undo what tools 1 and 2 did.
Audit trail and logging
Every agent action is logged with timestamp, input, output and confidence. You can trace why the agent took action Y in situation X. This matters for compliance.
Fallback to human
When the agent is unsure, faces too much complexity or runs into an unusual situation, it escalates to you or a designated employee, handing over all collected data and its reasoning so far.
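The permission map and the three guardrail levels described above can be pictured as a small lookup that runs before every tool call. This is a minimal sketch, assuming three levels (auto, approval, never) and a per-tool hourly rate limit; the tool names, limits and the `decide` function are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    AUTO = "auto"          # agent may act alone, e.g. reading or booking appointments
    APPROVAL = "approval"  # needs human sign-off, e.g. sending an invoice
    NEVER = "never"        # never executed by the agent, e.g. money transfers


@dataclass(frozen=True)
class ToolPolicy:
    level: Level
    max_calls_per_hour: int  # hypothetical per-tool rate limit


# Hypothetical permission map; tool names and limits are illustrative only.
PERMISSIONS = {
    "read_calendar": ToolPolicy(Level.AUTO, 200),
    "book_appointment": ToolPolicy(Level.AUTO, 50),
    "send_invoice": ToolPolicy(Level.APPROVAL, 20),
    "transfer_money": ToolPolicy(Level.NEVER, 0),
}


def decide(tool: str, calls_this_hour: int) -> str:
    """Map a requested tool call to 'execute', 'ask_human' or 'reject'."""
    policy = PERMISSIONS.get(tool)
    if policy is None or policy.level is Level.NEVER:
        return "reject"  # tools not in the map are denied by default
    if calls_this_hour >= policy.max_calls_per_hour:
        return "ask_human"  # assumption: a hit rate limit escalates to a person
    if policy.level is Level.APPROVAL:
        return "ask_human"
    return "execute"
```

Denying unknown tools by default means a hallucinated tool name is rejected rather than guessed at.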

What it's used for

Five concrete agentic AI setups

Travel agency with complex multi-step bookings

The agent books flight, hotel, rental car and transfer, sends the confirmation to the customer, creates the booking in the CRM and schedules the follow-up call. All in one conversation with the customer; a human steps in only for special cases.
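A trip booking is exactly where the undo logic from multi-tool orchestration matters: if the rental car fails after flight and hotel are booked, those two have to be cancelled again. Below is a minimal sketch of that compensation pattern (often called a saga), assuming every step comes with its own undo function; `run_with_compensation` and the booking steps are hypothetical stand-ins that call no real system.

```python
from typing import Callable, List, Tuple

# (name, action, compensating action)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            log.append(f"failed:{name} ({exc})")
            for done_name, _, undo in reversed(completed):
                undo()
                log.append(f"undone:{done_name}")
            return False
        completed.append(step)
        log.append(f"done:{name}")
    return True


def rental_car_unavailable() -> None:
    raise RuntimeError("no cars available")  # simulated supplier error


log: List[str] = []
trip: List[Step] = [
    ("book_flight", lambda: None, lambda: None),  # hypothetical booking calls
    ("book_hotel", lambda: None, lambda: None),
    ("book_rental_car", rental_car_unavailable, lambda: None),
    ("book_transfer", lambda: None, lambda: None),
]
ok = run_with_compensation(trip, log)
# ok is False; the hotel and then the flight were undone
```

Undoing in reverse order keeps dependencies intact. A production version would also need a plan for an undo step that itself fails, for example escalating to a human.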

Law firm with client onboarding

The agent registers the new client in the accounting system, creates the initial data processing agreement from a template, sends the document for signature and schedules the kick-off meeting. Sensitive actions such as invoicing need approval from the responsible lawyer.

Online shop with order cancellation

The customer says *cancel my order*. The agent cancels the order in the shop, initiates the refund via Stripe or Klarna, notifies the warehouse and sends a confirmation email. All within 30 seconds, no human needed.

Medical practice software with appointment pipeline

The agent registers the new patient, checks the insurance status via the insurer's API, books a matching slot in the practice calendar and sends an SMS reminder 24 hours before the appointment. For private patients it escalates to reception.
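The escalation to reception follows the fallback-to-human idea: the agent hands over the case together with the data it collected and its reasoning so far. A minimal sketch, assuming a hypothetical `insurance_type` field and an illustrative confidence floor of 0.8; neither value comes from a real practice system.

```python
from dataclasses import dataclass
from typing import List, Optional

CONFIDENCE_FLOOR = 0.8  # hypothetical threshold below which the agent hands over


@dataclass
class Handover:
    assignee: str
    reason: str
    collected_data: dict
    reasoning_so_far: List[str]


def route_patient(patient: dict, confidence: float,
                  reasoning: List[str]) -> Optional[Handover]:
    """Return a Handover for reception, or None if the agent may continue alone."""
    if patient.get("insurance_type") == "private":
        return Handover("reception", "private patient", dict(patient), list(reasoning))
    if confidence < CONFIDENCE_FLOOR:
        return Handover("reception", f"low confidence ({confidence:.2f})",
                        dict(patient), list(reasoning))
    return None
```

Copying the data and reasoning into the handover means the person at reception sees exactly what the agent knew at the moment it stopped.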

Trade business with material ordering

The site manager says *order material X for site Y*. The agent checks stock, looks up the part at three suppliers, compares price and delivery time, places the order and schedules delivery in the site plan. Orders above 1,000 EUR need the owner's approval.
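The comparison and the 1,000 EUR rule can be sketched as two small functions. This assumes one possible selection rule (cheapest quote that arrives within a deadline, ties broken by faster delivery); the `Quote` type, supplier names and prices are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

APPROVAL_THRESHOLD_EUR = 1_000  # from the example: owner approval above 1,000 EUR


@dataclass(frozen=True)
class Quote:
    supplier: str
    unit_price_eur: float
    delivery_days: int


def choose_quote(quotes: List[Quote], max_delivery_days: int) -> Optional[Quote]:
    """Cheapest quote that arrives in time; ties broken by faster delivery."""
    in_time = [q for q in quotes if q.delivery_days <= max_delivery_days]
    if not in_time:
        return None  # nothing fits the site schedule: hand back to a human
    return min(in_time, key=lambda q: (q.unit_price_eur, q.delivery_days))


def next_step(quote: Quote, quantity: int) -> str:
    total = quote.unit_price_eur * quantity
    if total > APPROVAL_THRESHOLD_EUR:
        return "request_owner_approval"
    return "place_order"


quotes = [
    Quote("Supplier A", 12.50, 3),  # hypothetical quotes
    Quote("Supplier B", 11.90, 7),
    Quote("Supplier C", 13.00, 2),
]
best = choose_quote(quotes, max_delivery_days=5)
# best is Supplier A: B is cheaper but too slow
```

With Supplier A, 100 units come to 1,250 EUR and go to the owner; 80 units are exactly 1,000 EUR and are ordered directly, since only amounts above the threshold need approval.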

How it works

Four phases from action inventory to production

  1. Action inventory with risk assessment

    Duration: one week. Together with you and one or two key employees, we go through every action the agent could take. For each action we assess the value if it goes right, the harm if it goes wrong, and whether it can be reversed. From that we derive the approval logic.

  2. Function schemas and guardrails

    We build the function schemas for all approved actions. Each schema is tested against typical mis-calls (wrong parameters, missing fields, hallucinated values). The guardrail configuration defines what the agent may do alone, what needs approval and what is never allowed.

  3. Sandbox test with real data

    We run the agent in a sandbox with test data. You give us 20 to 50 scenarios (including edge cases), and we test each one. Success criteria: at least 90% correct actions and zero harmful actions without approval.

  4. Production rollout with audit trail

    The agent goes live, with the audit trail running from day one. For the first 14 days we stay close (daily log review), then move to weekly monitoring. If we see drift or anomalies, we tune the guardrails.
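The sandbox success criteria from phase 3 translate into a simple pass/fail check over the scenario results. A minimal sketch, assuming each scenario is scored by hand or by a test harness into the two flags shown; `ScenarioResult` and `sandbox_passed` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScenarioResult:
    name: str
    correct: bool             # the agent took the expected action(s)
    harmful_unapproved: bool  # a harmful action ran without approval


def sandbox_passed(results: List[ScenarioResult], min_correct: float = 0.90) -> bool:
    """At least 90% correct actions and zero harmful actions without approval."""
    if not results:
        return False
    correct_rate = sum(r.correct for r in results) / len(results)
    no_harm = not any(r.harmful_unapproved for r in results)
    return correct_rate >= min_correct and no_harm
```

With 20 scenarios this means at most two may go wrong, and none of them may involve a harmful action without approval; a single such case fails the run regardless of the correct rate.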

Pricing

From 2,500 EUR setup per use case plus 199-499 EUR/mo hosting

Simple setups (two to four tools, one clear workflow): from 2,500 EUR one-off plus 199 EUR/mo hosting. More complex multi-tool agents (five to ten tools, multiple workflows, approval hierarchy): from 4,500 EUR setup plus 299-499 EUR/mo hosting. LLM usage is billed separately to your own OpenAI or Anthropic account; we give you a cost projection beforehand.


FAQ

Common questions about agentic AI

What's the difference from a normal chatbot?

A chatbot answers with text. An agent executes actions. The chatbot tells you the office hours; the agent actually books you an appointment. Function calling is the technical foundation: the LLM receives tool schemas and, instead of only generating text, can return a structured call to one of those tools, which your application then executes. Claude and GPT-4 both do this well, GPT-3.5 only to a limited extent.
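To make "the LLM receives tool schemas" concrete: the same JSON Schema can be wrapped in Anthropic's tool format (`name`, `description`, `input_schema`) and OpenAI's Chat Completions format (`type: "function"` with `name`, `description`, `parameters`). The sketch below assumes a hypothetical `book_appointment` tool and adds a deliberately small hand-written argument check, not a full JSON Schema validator (a library such as `jsonschema` covers the whole specification).

```python
from typing import Dict, List

# Hypothetical tool, described once as JSON Schema and reused for both providers.
PARAMS = {
    "type": "object",
    "properties": {
        "customer_email": {"type": "string", "description": "Customer's email address"},
        "start": {"type": "string", "description": "Start time in ISO 8601, e.g. 2025-03-14T10:00"},
        "duration_minutes": {"type": "integer", "description": "Length in minutes"},
    },
    "required": ["customer_email", "start", "duration_minutes"],
    "additionalProperties": False,
}
DESCRIPTION = "Book an appointment. Only call this after the customer has confirmed a slot."

# Anthropic Messages API: passed as tools=[anthropic_tool]
anthropic_tool = {"name": "book_appointment", "description": DESCRIPTION, "input_schema": PARAMS}

# OpenAI Chat Completions API: passed as tools=[openai_tool]
openai_tool = {
    "type": "function",
    "function": {"name": "book_appointment", "description": DESCRIPTION, "parameters": PARAMS},
}

JSON_TYPES = {"string": str, "integer": int}


def check_args(args: Dict[str, object], schema: Dict) -> List[str]:
    """Small hand-written check of model-supplied arguments (not full JSON Schema)."""
    errors = [f"missing field: {f}" for f in schema["required"] if f not in args]
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            errors.append(f"unknown field: {key}")  # e.g. a hallucinated parameter
        elif isinstance(value, bool) or not isinstance(value, JSON_TYPES[spec["type"]]):
            errors.append(f"wrong type for {key}")  # bool is excluded: it subclasses int
    return errors
```

Anthropic returns a tool call's arguments as an already parsed object (`input`), OpenAI as a JSON string (`function.arguments`) that needs `json.loads` first. In both cases the check runs before anything is executed, and a failed check can be sent back to the model as an error so it can correct the call.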

Which models do function calling cleanly?

Top tier: Claude Sonnet 4.6, GPT-4 Turbo and GPT-4o. Both model families are very reliable with simple tools and good at multi-tool orchestration. Mid tier: Claude Haiku 4.5 and GPT-4o-mini (good for simple workflows, less reliable on complex tasks). Local: Llama 3.3 70B or Mistral Large can do it, but with a significantly higher hallucination rate. For production we recommend Claude or GPT-4.

What if the agent takes a wrong action?

Three safety layers: 1) guardrails determine which actions are allowed at all; 2) approval workflows require human sign-off for sensitive actions; 3) the audit trail documents every action so it can be traced and, if needed, reversed. In addition, we make operations reversible wherever possible (e.g. soft-delete instead of hard-delete).
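Layer 3 and the soft-delete principle work together: the audit log records what the agent did and why, and a soft-delete only hides a record, so a human can restore it. A minimal sketch, assuming an in-memory `CustomerStore` and an append-only list of JSON lines as the log; the class, field and tool names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


def now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass
class AuditEntry:
    tool: str
    tool_input: Dict
    tool_output: Dict
    confidence: float
    reasoning: str
    approved_by: Optional[str] = None
    timestamp: str = field(default_factory=now)


class CustomerStore:
    """Hypothetical store: deletions only hide a record, so they can be undone."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict] = {}
        self.audit: List[str] = []  # append-only, one JSON line per action

    def _log(self, entry: AuditEntry) -> None:
        self.audit.append(json.dumps(asdict(entry)))

    def soft_delete(self, customer_id: str, reasoning: str, confidence: float) -> None:
        self.rows[customer_id]["deleted_at"] = now()  # record stays in place
        self._log(AuditEntry("soft_delete_customer", {"id": customer_id},
                             {"status": "hidden"}, confidence, reasoning))

    def restore(self, customer_id: str, approved_by: str) -> None:
        self.rows[customer_id]["deleted_at"] = None
        # human action, so confidence is set to 1.0 (assumption)
        self._log(AuditEntry("restore_customer", {"id": customer_id},
                             {"status": "active"}, 1.0, "reversal by a human", approved_by))

    def active_ids(self) -> List[str]:
        return [cid for cid, row in self.rows.items() if row.get("deleted_at") is None]
```

Because the log holds input, output, confidence and reasoning for each action, the reversal is itself traceable.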

How does the approval workflow work in practice?

Example: the agent wants to send an invoice to a customer. Instead of sending it directly, it sends the invoice details to the responsible employee via Slack, Telegram or email. They reply *yes*, *no* or *change X*. Only after a *yes* does the agent execute the action. Approval latency per action: seconds to minutes, depending on the setup.
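The approval step can be modelled as a parked action that only runs after a *yes*. A minimal sketch, assuming the `notify` callback stands in for the Slack, Telegram or email message and `execute` for the real tool call; `PendingAction`, `request_approval` and `handle_reply` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class PendingAction:
    tool: str
    args: Dict
    status: str = "pending"  # pending -> approved | rejected | needs_change
    change_request: Optional[str] = None


def request_approval(tool: str, args: Dict, notify: Callable[[str], None]) -> PendingAction:
    """Park the action and message the responsible person instead of executing."""
    action = PendingAction(tool, args)
    notify(f"Approve {tool} {args}? Reply yes, no or change ...")
    return action


def handle_reply(action: PendingAction, reply: str,
                 execute: Callable[[str, Dict], None]) -> None:
    if action.status != "pending":
        return  # each approval is used at most once
    text = reply.strip()
    word = text.lower()
    if word == "yes":
        action.status = "approved"
        execute(action.tool, action.args)  # only now does the action run
    elif word == "no":
        action.status = "rejected"
    elif word.startswith("change"):
        action.status = "needs_change"
        action.change_request = text[len("change"):].strip()  # goes back to the agent
    # any other reply leaves the action pending
```

A *change X* reply does not execute anything; the request goes back to the agent, which prepares a revised action and asks again.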

Who's liable when the agent does something wrong?

From a GDPR and liability perspective, the agent is a tool and you are the responsible party. That's why guardrails and approval workflows matter so much: they show you took reasonable measures. We document the setup in an auditable way. For regulated industries (tax, medical, law) we bring in a specialist lawyer before production; that's part of the package.

What about GDPR with external tool calls?

When the agent calls OpenAI or Anthropic, their data processing agreements and EU hosting options apply. For API calls to your own systems, the data stays in your infrastructure. For third-party APIs (e.g. an external booking system) the data goes to that provider; we do a GDPR mapping per tool before production and disclose the relevant data processing agreements.

Next step

Free 30-minute intro call.

We look at which workflows you want to automate, which actions are sensitive and whether agentic AI is the right lever for your case. An honest assessment instead of a sales pitch.
