AgentKit In Action: Build A ChatGPT Agent That Clicks, Books, And Buys

A calendar reminder pops up while you are pouring coffee. In the next five minutes, your assistant scans flights, reserves a hotel near the venue, books dinner at a spot that fits your diet, and checks out with a gift for the host. No tabs. No checkout forms. The work just happens, with your thumbs-up at the right moments. That is the promise of AgentKit in Action: Build a ChatGPT Agent That Clicks, Books, & Buys.

OpenAI’s recent wave of agent features moves this from hype to buildable reality. AgentKit gives you building blocks for real agents. Operator shows a computer-using model that can click, type, and scroll like a person. Instant Checkout introduces an open protocol for buying inside ChatGPT. Together, they create a practical path for an agent that can navigate the web, confirm plans, and securely purchase what you approve. (OpenAI)

This guide shows how to design, ship, and tune a working concierge agent for daily life and small business. The stack is simple on purpose. The workflows are opinionated. The prompts are tested. Everything aims at one outcome: an agent you can trust to click, book, and buy with you in the loop.

What “clicks, books, and buys” means in practice

Clicks means web control. The agent can press buttons, fill forms, and navigate. You can do this two ways: by connecting to a computer-using agent in ChatGPT, or by driving a headless browser from your app using Playwright. Operator demonstrates the first path inside ChatGPT. Playwright covers your custom flows when an API is missing. (OpenAI)
Books means confirmed reservations. Flights, hotels, restaurants, haircuts, repair visits, and coworking desks. Booking works best through official APIs when available, with browser control as the fallback for legacy sites.
Buys means a purchase you authorize. Inside ChatGPT, Instant Checkout routes orders through the Agentic Commerce Protocol that OpenAI co-developed with Stripe and merchants. Your agent passes only the details required, and you confirm the order before it goes through. (OpenAI)

The golden rule is human-in-the-loop at the right decision points. You approve, the agent executes, and you stay in control.

The build stack at a glance

AgentKit for visual workflow design, evals, and an embeddable chat UI called ChatKit. Agent Builder helps you compose multi-step logic, wire tools, add guardrails, and version changes. Evals and reinforcement fine-tuning push reliability higher over time. (OpenAI)
Function calling so the model chooses your tools at the right time. Your app decides whether to execute the tool call, which keeps safety, logging, and approvals under your control. (OpenAI Platform)
Browser automation with Playwright for sites without APIs. Playwright generates trusted user events and auto-waits for elements so the agent acts like a real user. (Playwright)
Agentic commerce inside ChatGPT through Instant Checkout and the Agentic Commerce Protocol for confirmed purchases. (OpenAI)

Optional add-ons: Google Calendar, Maps, OpenTable, airline and hotel APIs, Stripe server webhooks, Slack or email notifications, and a tiny SQLite or Postgres log for auditable trails.

Architecture blueprint

Intent to action pipeline

User goal in chat: “Book a one-night room near the convention center on Jan 14 with parking, then buy a thank-you gift under 50 dollars and ship it to the host.”
Classification inside AgentKit: parse goal into subgoals: hotel booking, gift purchase, confirmations.
Planning agent picks tool chain: search → book_hotel API or BrowserTask → confirm → buy_gift via Instant Checkout.
Approval gates surface summaries and costs.
Execution with retries and safe fallbacks.
Receipts saved to Notes and emailed or Slacked.
Evals log traces and outcomes for improvement.
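The approval gate in the pipeline above can be sketched as a pure function that turns a plan into a numbered summary with costs. This is a minimal sketch: the `PlanStep` shape and `renderApprovalSummary` are hypothetical names for illustration, not AgentKit types, and the prices are made-up example data.

```typescript
// Hypothetical plan shape; AgentKit does not define these types.
type PlanStep = {
  tool: string;               // e.g. "book_hotel"
  description: string;        // human-readable summary of the step
  estimatedCostCents: number; // 0 for steps that cost nothing
  irreversible: boolean;      // true for bookings and purchases
};

function formatUsd(cents: number): string {
  return "$" + (cents / 100).toFixed(2);
}

// Render the numbered plan a user sees at the approval gate.
function renderApprovalSummary(steps: PlanStep[]): string {
  const lines = steps.map((step, i) => {
    const cost = step.estimatedCostCents > 0 ? ` (${formatUsd(step.estimatedCostCents)})` : "";
    const flag = step.irreversible ? " [needs approval]" : "";
    return `${i + 1}. ${step.description}${cost}${flag}`;
  });
  const total = steps.reduce((sum, step) => sum + step.estimatedCostCents, 0);
  lines.push(`Estimated total: ${formatUsd(total)}`);
  return lines.join("\n");
}

// Example data only.
const examplePlan: PlanStep[] = [
  { tool: "search_hotels", description: "Search hotels near the convention center for Jan 14", estimatedCostCents: 0, irreversible: false },
  { tool: "book_hotel", description: "Book one night with parking", estimatedCostCents: 18900, irreversible: true },
  { tool: "instant_checkout_buy", description: "Buy a thank-you gift under $50", estimatedCostCents: 4500, irreversible: true },
];
console.log(renderApprovalSummary(examplePlan));
```

Keeping money in integer cents avoids floating-point drift when the summary adds up several steps.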

Security posture

Only call tools after an explicit plan is visible to the user.
Always ask for confirmation before any irreversible action.
Scope tokens per vendor.
Store nothing sensitive without consent.
Keep a signed, immutable event log of actions.

Step 1: Sketch the flow in Agent Builder

Open Agent Builder and drop three core nodes:

Planner: takes the user goal and maps it to steps.
Doer: runs tool calls and browser tasks.
Verifier: checks results and flags anomalies.

Connect the nodes with two approval gates:

After plan creation.
Before any purchase.

Add a Guardrail to detect PII leakage, prompt injection, and risky instructions. Agent Builder includes guardrails support and versioned templates so you can iterate without breaking production flows. (OpenAI)

Design note: keep early versions linear. Branches explode complexity. Aim for one happy path and two graceful fallbacks.
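To make the "linear first" advice concrete, here is a rough sketch of the flow as a tiny state machine with the two approval gates. Agent Builder is a visual tool, so this is not its API; the stage names and the `advance` function are assumptions that only illustrate the shape of the logic.

```typescript
// Hypothetical stage names for a linear flow with two approval gates.
const FLOW = ["plan", "approve_plan", "execute", "approve_purchase", "finalize", "done"] as const;
type Stage = (typeof FLOW)[number] | "cancelled";

const GATES: ReadonlySet<Stage> = new Set<Stage>(["approve_plan", "approve_purchase"]);

// Move one step forward. Gates only open when the user approved;
// anything else cancels the run instead of guessing.
function advance(stage: Stage, userApproved = false): Stage {
  if (stage === "done" || stage === "cancelled") return stage;
  if (GATES.has(stage) && !userApproved) return "cancelled";
  return FLOW[FLOW.indexOf(stage) + 1];
}

console.log(advance("plan"));               // moves to the first gate
console.log(advance("approve_plan", true)); // gate opens after approval
console.log(advance("approve_purchase"));   // no approval, run is cancelled
```

A single ordered list of stages is easy to log and replay, which is the main reason to delay branching until the happy path is stable.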

Step 2: Define tools with function calling

Your agent needs tools for search, booking, payments, and notifications. Function calling lets the model propose a tool call with structured arguments. Your server decides whether to execute it. (OpenAI Platform)

TypeScript example: tool schema and execution

// tools.ts
import { z } from "zod";

// Each tool pairs a description with a Zod schema for its arguments.
export const tools = {
  search_hotels: {
    description: "Find hotels by city and date with filters",
    parameters: z.object({
      city: z.string(),
      checkIn: z.string(), // ISO date
      checkOut: z.string(), // ISO date
      maxPrice: z.number().optional(),
      parkingRequired: z.boolean().optional()
    })
  },

  book_hotel: {
    description: "Book a selected hotel room",
    parameters: z.object({
      hotelId: z.string(),
      roomType: z.string(),
      guestName: z.string(),
      email: z.string().email(),
      holdCardToken: z.string() // a vendor token, never a raw card number
    })
  },

  instant_checkout_buy: {
    description: "Purchase an item via agentic commerce",
    parameters: z.object({
      productId: z.string(),
      quantity: z.number().int().default(1),
      shipTo: z.object({
        name: z.string(),
        line1: z.string(),
        city: z.string(),
        state: z.string(),
        postal: z.string()
      })
    })
  },

  notify_user: {
    description: "Send user a summary and ask for approval",
    parameters: z.object({
      summary: z.string(),
      cost: z.number(),
      requiresApproval: z.boolean().default(true)
    })
  }
} as const;

Server-side function executor

// executor.ts
import type { ToolCall } from "./types";
import { tools } from "./tools";
import { searchHotels, bookHotel } from "./vendors/hotels";
import { buyWithInstantCheckout } from "./vendors/agentic-commerce";
import { notify } from "./vendors/notify";

// Validate the model's arguments against the schema before any vendor call.
// parse() throws on invalid input, so bad arguments never reach a vendor.
export async function executeTool(call: ToolCall) {
  switch (call.name) {
    case "search_hotels":
      return await searchHotels(tools.search_hotels.parameters.parse(call.arguments));
    case "book_hotel":
      return await bookHotel(tools.book_hotel.parameters.parse(call.arguments));
    case "instant_checkout_buy":
      return await buyWithInstantCheckout(tools.instant_checkout_buy.parameters.parse(call.arguments));
    case "notify_user":
      return await notify(tools.notify_user.parameters.parse(call.arguments));
    default:
      throw new Error("Unknown tool: " + call.name);
  }
}

Why this pattern works
The model suggests, your app disposes. That separation gives you approvals, logging, and recovery without sacrificing speed. It is also the recommended mental model in the docs. (OpenAI Platform)
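One way to picture "the model suggests, your app disposes" is a small decision function that runs before the executor. This is a sketch under assumptions: the `TOOL_POLICY` table, the `Decision` type, and `decide` are hypothetical names, and which tools count as irreversible is a choice you make for your own product.

```typescript
type ProposedCall = { name: string; arguments: Record<string, unknown> };

type Decision =
  | { action: "execute" }
  | { action: "ask_user"; reason: string }
  | { action: "reject"; reason: string };

// Hypothetical policy table: known tools and whether they are irreversible.
const TOOL_POLICY = new Map<string, { irreversible: boolean }>([
  ["search_hotels", { irreversible: false }],
  ["book_hotel", { irreversible: true }],
  ["instant_checkout_buy", { irreversible: true }],
  ["notify_user", { irreversible: false }],
]);

// The model only proposes a call; this function decides what happens next.
function decide(call: ProposedCall, approvedTools: ReadonlySet<string>): Decision {
  const policy = TOOL_POLICY.get(call.name);
  if (!policy) return { action: "reject", reason: `Unknown tool: ${call.name}` };
  if (policy.irreversible && !approvedTools.has(call.name)) {
    return { action: "ask_user", reason: `${call.name} needs explicit approval` };
  }
  return { action: "execute" };
}

console.log(decide({ name: "book_hotel", arguments: {} }, new Set()));
```

Because the decision is a pure function, it is easy to log every outcome and to unit test the policy without touching a vendor.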

Step 3: Add browser superpowers for “clicks”

APIs are clean, but many booking and vendor sites still force you through a UI. Use Playwright to do it safely, with real trusted events and auto-wait so you avoid flaky timing. (Playwright)

Playwright task to reserve a table on a legacy site

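// browserTasks.ts
import { chromium } from "playwright";

type ReservationRequest = { url: string; date: string; time: string; partySize: number };

// Convert "19:00" to the "7:00 pm" style label this site is assumed to display.
function toTwelveHour(time: string): string {
  const [hours, minutes] = time.split(":").map(Number);
  const suffix = hours >= 12 ? "pm" : "am";
  const hour = hours % 12 === 0 ? 12 : hours % 12;
  return `${hour}:${String(minutes).padStart(2, "0")} ${suffix}`;
}

export async function reserveTableLegacySite({ url, date, time, partySize }: ReservationRequest) {
  const browser = await chromium.launch({ headless: true });
  const ctx = await browser.newContext();
  const page = await ctx.newPage();

  try {
    await page.goto(url, { waitUntil: "domcontentloaded" });
    await page.getByLabel("Date").fill(date); // 2025-12-10
    await page.getByLabel("Time").fill(time); // 19:00
    await page.getByLabel("Party size").selectOption(String(partySize));

    // The button might appear after async pricing loads; click() auto-waits for it
    await page.getByRole("button", { name: /search|find a table/i }).click();

    // Pick the slot that matches the requested time (text match is case-insensitive)
    await page.getByText(toTwelveHour(time)).first().click();
    await page.getByRole("button", { name: /continue|reserve/i }).click();

    const summary = await page.locator("#reservation-summary").innerText();
    return { ok: true, summary };
  } finally {
    // Always release the browser, even when a step throws
    await browser.close();
  }
}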

Wrap every browser task in a timeout and two retries. Always include a screenshot on failure, plus a clean fallback to “ask the user to choose another time.”
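A minimal sketch of that wrapper, assuming nothing about your task beyond "it returns a promise". The names `withTimeout`, `retryDelays`, and `runWithRetries` are made up for this example, and the `onFailure` hook is where a screenshot call would go.

```typescript
// Reject if the task does not settle within timeoutMs.
function withTimeout<T>(task: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${timeoutMs} ms`)), timeoutMs);
  });
  return Promise.race([task, timeout]).finally(() => clearTimeout(timer));
}

// Delay before each retry: exponential backoff starting at baseMs.
function retryDelays(retries: number, baseMs: number): number[] {
  return Array.from({ length: retries }, (_, i) => baseMs * 2 ** i);
}

type RetryOptions = {
  retries?: number;     // extra attempts after the first one
  timeoutMs?: number;   // per-attempt limit
  baseDelayMs?: number; // first backoff delay
  onFailure?: (error: unknown, attempt: number) => Promise<void>; // e.g. save a screenshot
};

type TaskResult<T> = { ok: true; value: T } | { ok: false; error: string; attempts: number };

async function runWithRetries<T>(task: () => Promise<T>, options: RetryOptions = {}): Promise<TaskResult<T>> {
  const { retries = 2, timeoutMs = 30_000, baseDelayMs = 1_000, onFailure } = options;
  const delays = retryDelays(retries, baseDelayMs);
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return { ok: true, value: await withTimeout(task(), timeoutMs) };
    } catch (error) {
      lastError = error;
      if (onFailure) await onFailure(error, attempt + 1);
      if (attempt < retries) await new Promise((resolve) => setTimeout(resolve, delays[attempt]));
    }
  }
  // Caller falls back to "ask the user to choose another time"
  return { ok: false, error: String(lastError), attempts: retries + 1 };
}
```

Note that the timeout only stops waiting; it does not cancel the underlying work, so the task itself still has to close its browser in a `finally` block. With Playwright, `onFailure` could call `page.screenshot({ path })` if the task exposes its page.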

Tip: keep one curated list of selectors per site. Review it weekly and regenerate any selectors that broke after a site change.

Step 4: Make purchases simple and safe

Inside ChatGPT, shopping is easiest with Instant Checkout. Your agent shows relevant products, asks you to confirm, and then passes details to the merchant through the Agentic Commerce Protocol. Stripe powers the flow so merchants keep using their systems while your payment stays secure. The experience is quick for users and low-friction for sellers. (OpenAI)

If you sell your own digital products, you can implement the protocol on your backend to accept agentic orders in the same chat. At launch the flow supports single-item purchases, with multi-item carts planned next. (OpenAI)

Step 5: System prompts that keep the agent honest

Well-written instructions reduce errors more than any single trick. Start with a strict role, constraints, and approval language.

Prompt: You are a booking and shopping concierge. You plan first, then execute. Always show a numbered plan and ask for approval before any purchase or reservation. Summaries list vendor, date, item, price, and refund policy. If an API exists, use it. If no API exists, use a browser task. On risky actions, pause and ask for guidance. Never guess credentials. Never store payment data. Always produce a receipt.

For the browser side, add a micro-instruction that prevents sloppy clicks.

Prompt: When using a browser task, wait for visible and enabled elements. Confirm each navigation with a current URL check. If a selector fails, try the next best semantic selector. Fall back to human confirmation if three retries fail.

For shopping, keep costs clear.

Prompt: Before checking out, show the exact total including tax and shipping. Ask the user to confirm with “Approve purchase” or “Cancel.”
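A prompt alone should not be the only guard on checkout. The sketch below backs the shopping prompt with code: it computes the exact total in cents and treats only the two literal replies as decisions. Function names and example prices are assumptions for illustration.

```typescript
type OrderLine = { name: string; unitPriceCents: number; quantity: number };

// Exact total including tax and shipping, in integer cents.
function orderTotalCents(lines: OrderLine[], taxCents: number, shippingCents: number): number {
  const subtotal = lines.reduce((sum, line) => sum + line.unitPriceCents * line.quantity, 0);
  return subtotal + taxCents + shippingCents;
}

function formatCents(cents: number): string {
  return "$" + (cents / 100).toFixed(2);
}

type ApprovalReply = "approve" | "cancel" | "unclear";

// Only the exact phrases from the prompt count; anything else means "ask again".
function parseApprovalReply(reply: string): ApprovalReply {
  const normalized = reply.trim().toLowerCase();
  if (normalized === "approve purchase") return "approve";
  if (normalized === "cancel") return "cancel";
  return "unclear";
}

// Example data only.
const totalCents = orderTotalCents([{ name: "Board game", unitPriceCents: 3499, quantity: 1 }], 289, 599);
console.log(`Total: ${formatCents(totalCents)}. Reply "Approve purchase" or "Cancel".`);
```

Treating a vague "ok" as unclear costs one extra message and removes a whole class of accidental purchases.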

Step 6: Glue it together with ChatKit

You need a chat that feels fast and native. ChatKit lets you drop a production-quality chat into your site or app and wire it to the workflow you built in Agent Builder. It handles streaming, threads, and agent UI so you do not lose weeks rebuilding chat from scratch. (OpenAI)

Minimal ChatKit bootstrap

// App.tsx
import { ChatKit } from "@openai/chatkit-react";
import { useEffect } from "react";

// Note: the props below are illustrative. Check the ChatKit documentation
// for the component API in the package version you install.
export default function ConciergeChat() {
  useEffect(() => {
    // Optional: hydrate session or fetch prior thread
  }, []);

  return (
    <ChatKit
      agentId={process.env.NEXT_PUBLIC_AGENT_ID!}
      theme="system"
      inputPlaceholder="Tell me what to handle…"
      onToolCall={(call) => console.log("Tool requested", call)}
      onApproval={(summary) => console.log("Approval requested", summary)}
    />
  );
}

Step 7: Approvals, receipts, and logs

Approvals make or break trust. Keep them predictable.

The plan requires a thumbs-up before any action.
Purchases require a second confirm with totals.
Every action writes a signed log entry with who, what, when, where, and cost.
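As a rough sketch of the signed log entry, the code below chains entries together with an HMAC so that editing or reordering any record breaks verification. It uses Node's built-in `crypto` module; the entry fields and function names are assumptions. A shared-secret HMAC detects tampering but is not true non-repudiation, which would need asymmetric signatures.

```typescript
import { createHmac } from "node:crypto";

type LogEntry = { who: string; what: string; when: string; where: string; costCents: number };
type SignedEntry = LogEntry & { prevSignature: string; signature: string };

// Sign the entry together with the previous signature to form a chain.
function signEntry(entry: LogEntry, prevSignature: string, secret: string): string {
  const payload = JSON.stringify([entry.who, entry.what, entry.when, entry.where, entry.costCents, prevSignature]);
  return createHmac("sha256", secret).update(payload).digest("hex");
}

// Append without mutating the existing log.
function appendEntry(log: SignedEntry[], entry: LogEntry, secret: string): SignedEntry[] {
  const prevSignature = log.length > 0 ? log[log.length - 1].signature : "";
  return [...log, { ...entry, prevSignature, signature: signEntry(entry, prevSignature, secret) }];
}

// Recompute every signature in order; any edit, removal, or reorder fails.
function verifyLog(log: SignedEntry[], secret: string): boolean {
  let prev = "";
  for (const entry of log) {
    if (entry.prevSignature !== prev) return false;
    if (entry.signature !== signEntry(entry, prev, secret)) return false;
    prev = entry.signature;
  }
  return true;
}
```

In practice the secret would live in a secrets manager and the log in an append-only table, so the application that writes entries cannot silently rewrite them.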

Example approval payload

{
  "summary": "Hilton Garden Inn, Jan 14 one night, King room, parking included. Total 189.00.",
  "actions": [
    {"type": "book_hotel", "hotelId": "hgi-001", "roomType": "king"},
    {"type": "instant_checkout_buy", "productId": "gift-flowers-rose", "quantity": 1}
  ],
  "requiresApproval": true
}

Receipt format

Subject: “Confirmed: Hotel + Gift for Jan 14”
Body: items with totals, policies, links, and a simple “Dispute this charge” button.

Step 8: Evals that improve real performance

Add evals that reflect real user journeys. Use trace grading to catch failure modes like “booked the right date but wrong city” or “clicked the sponsor instead of the organic result.” Then use reinforcement fine-tuning to teach the model to call your tools more reliably. AgentKit ships new eval tools and RFT hooks to close this loop. (OpenAI)

Starter eval set

Plan completeness score.
Tool choice accuracy.
Selector robustness across site updates.
Checkout confirmation present.
User satisfaction emoji after each job.
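Two of those metrics are simple enough to compute yourself from recorded traces. The sketch below is not the AgentKit evals API; the `Trace` shape and both scoring functions are hypothetical, meant only to show what "tool choice accuracy" and "checkout confirmation present" could mean as numbers.

```typescript
type Trace = {
  expectedTools: string[];    // tools a reviewer says the job needed, in order
  calledTools: string[];      // tools the agent actually called, in order
  checkoutConfirmed: boolean; // was an explicit approval recorded before purchase?
};

// Fraction of traces whose tool sequence exactly matches the expected one.
function toolChoiceAccuracy(traces: Trace[]): number {
  if (traces.length === 0) return 0;
  const correct = traces.filter(
    (t) =>
      t.expectedTools.length === t.calledTools.length &&
      t.expectedTools.every((tool, i) => tool === t.calledTools[i])
  ).length;
  return correct / traces.length;
}

// Fraction of purchasing traces that recorded a confirmation first.
function checkoutConfirmationRate(traces: Trace[]): number {
  const purchases = traces.filter((t) => t.calledTools.includes("instant_checkout_buy"));
  if (purchases.length === 0) return 1; // nothing purchased, nothing violated
  return purchases.filter((t) => t.checkoutConfirmed).length / purchases.length;
}
```

Exact sequence matching is strict on purpose. Loosen it only after you have looked at the failures and decided which reorderings are harmless.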

A weekend concierge you can ship today

Scenario: family visit in a new city. You need a hotel near the stadium, a family-friendly dinner, and a small gift for the host.

Flow

Agent searches hotels within 1 mile of the stadium for your dates.
It proposes two options with parking and breakfast.
You approve one.
Agent books the room, then opens dinner options for 6 people at 6:30.
You pick the Italian place with gluten-free options.
Agent schedules the reservation and adds it to your calendar.
It shows three gifts under your budget, and you approve a board game.
Agent checks out and emails a receipt bundle.

Realistic friction

The restaurant site blocks automation. Agent switches to OpenTable or another API.
The hotel price fluctuates by 10 dollars. Agent asks if the change is fine.
The board game is out of stock. Agent alerts you with two in-stock alternatives.

Cost control

Every vendor has a hard cap.
Time boxing: 2 minutes per search task, then ask for guidance.
One-click cancel inside the chat.
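The cost controls above fit in a few lines of code. This sketch assumes per-vendor caps in cents and the 2-minute time box from the list; `createSpendGuard` and `timeBoxExpired` are hypothetical helper names.

```typescript
// Hypothetical spend guard: every vendor needs an explicit cap, in cents.
function createSpendGuard(capsCents: Map<string, number>) {
  const spent = new Map<string, number>();

  // A vendor without a configured cap is treated as cap 0, so it is always refused.
  const canSpend = (vendor: string, amountCents: number): boolean => {
    const cap = capsCents.get(vendor) ?? 0;
    const used = spent.get(vendor) ?? 0;
    return amountCents > 0 && used + amountCents <= cap;
  };

  // Record a charge, refusing anything that would break the cap.
  const record = (vendor: string, amountCents: number): void => {
    if (!canSpend(vendor, amountCents)) throw new Error(`Cap exceeded for ${vendor}`);
    spent.set(vendor, (spent.get(vendor) ?? 0) + amountCents);
  };

  return { canSpend, record };
}

// Time box for search tasks: 2 minutes by default, then ask for guidance.
function timeBoxExpired(startedAtMs: number, nowMs: number, limitMs = 120_000): boolean {
  return nowMs - startedAtMs >= limitMs;
}
```

Defaulting unknown vendors to a cap of zero means a new integration cannot spend anything until someone sets its limit deliberately.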

Developer quickstart

1) Project setup

pnpm create next-app agentkit-concierge

cd agentkit-concierge

pnpm add openai zod playwright @openai/chatkit-react

npx playwright install

2) Environment

OPENAI_API_KEY=…

NEXT_PUBLIC_AGENT_ID=…

STRIPE_PUBLIC_KEY=…

STRIPE_SECRET_KEY=…

3) API route for chat with tools

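// /pages/api/chat.ts
import type { NextApiRequest, NextApiResponse } from "next";
import { OpenAI } from "openai";
import { z } from "zod";
import { tools } from "../../server/tools";
import { executeTool } from "../../server/executor";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// The API expects JSON Schema, not a Zod object. z.toJSONSchema ships with Zod 4;
// on Zod 3, use a converter such as the zod-to-json-schema package instead.
const toolDefs = Object.entries(tools).map(([name, t]) => ({
  type: "function" as const,
  function: {
    name,
    description: t.description,
    parameters: z.toJSONSchema(t.parameters) as Record<string, unknown>
  }
}));

const MAX_TOOL_ROUNDS = 8; // stop runaway tool loops

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  const { messages } = req.body;

  // Keep tools attached on every turn so the model can chain calls
  const ask = () =>
    client.chat.completions.create({ model: "gpt-5", messages, tools: toolDefs, tool_choice: "auto" });

  let result = await ask();

  // Handle tool calls iteratively until the model answers in plain text
  for (let round = 0; round < MAX_TOOL_ROUNDS; round++) {
    const message = result.choices[0].message;
    const calls = message.tool_calls ?? [];
    if (calls.length === 0) break;

    messages.push(message);

    // Every tool_call_id needs a matching tool message before the next request
    for (const call of calls) {
      if (call.type !== "function") continue;
      let content: string;
      try {
        const toolResult = await executeTool({
          name: call.function.name,
          arguments: JSON.parse(call.function.arguments)
        });
        content = JSON.stringify(toolResult);
      } catch (err) {
        // Report the failure to the model instead of crashing the request
        content = JSON.stringify({ error: String(err) });
      }
      messages.push({ role: "tool", tool_call_id: call.id, content });
    }

    result = await ask();
  }

  res.status(200).json(result);
}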

4) Playwright task runner

// /server/runBrowserTask.ts
import { reserveTableLegacySite } from "./browserTasks";

// Dispatch a named browser task to its Playwright implementation.
export async function runBrowserTask(taskName: string, args: any) {
  if (taskName === "reserve_table_legacy") {
    return await reserveTableLegacySite(args);
  }
  throw new Error("Unknown browser task");
}

5) AgentKit workflow wiring

In Agent Builder, create nodes: Plan → Approve → Execute Tools → Approve Purchase → Finalize.
Attach guardrails: PII mask, jailbreak detection, site safety checks.
Publish to dev. Deploy ChatKit frontend to your Next.js app. (OpenAI)

Risk, policy, and trust

Agents that can buy must be boring by design. Predictability is a feature.

Confirm totals every time.
Never store raw card data.
Hand control back on CAPTCHAs, logins, and MFA. Operator uses take-over prompts for sensitive steps inside ChatGPT, and you should mirror that behavior in your product. (OpenAI)
Log everything with non-repudiation.
Limit scope of tokens and OAuth grants.
Kill switch for stuck loops.
User first on refunds and errors.
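The kill switch for stuck loops can start as a simple check over the tool-call history. This is a minimal sketch with assumed thresholds; `shouldHalt` and the `CallRecord` shape are hypothetical names.

```typescript
type CallRecord = { name: string; arguments: unknown };

// Halt when the run is too long overall, or when the same call
// (same tool, same serialized arguments) repeats too many times.
function shouldHalt(history: CallRecord[], maxTotalCalls = 20, maxRepeats = 3): boolean {
  if (history.length >= maxTotalCalls) return true;
  const counts = new Map<string, number>();
  for (const call of history) {
    // JSON.stringify is key-order sensitive, which is good enough for a sketch
    const key = `${call.name}:${JSON.stringify(call.arguments)}`;
    const count = (counts.get(key) ?? 0) + 1;
    if (count >= maxRepeats) return true;
    counts.set(key, count);
  }
  return false;
}
```

Run the check before each tool execution. When it trips, stop the run, tell the user what the agent was trying to do, and hand control back.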

Troubleshooting playbook

It clicked the wrong button
Improve selectors. Prefer role, label, and text over brittle CSS. Add URL assertions between steps.
It booked the right date in the wrong city
Add a pre-flight consistency check that compares plan variables to the final summary page.
The checkout fails near the end
Cache the cart. On retry, reopen directly to the confirm page. Keep a cooldown between retries.
The model calls tools too often
Tighten your system prompt and raise the threshold in your planner. Use Evals trace grading to penalize over-calling. (OpenAI)
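The pre-flight consistency check from the playbook above can be as small as a substring comparison between plan variables and the summary page text. This is a sketch: `preflightMismatches` is a hypothetical helper, the city names are example data, and a real check would normalize dates and vendor names more carefully.

```typescript
// Return one message per plan value that is missing from the summary text.
function preflightMismatches(expected: Record<string, string>, summaryText: string): string[] {
  const haystack = summaryText.toLowerCase();
  return Object.entries(expected)
    .filter(([, value]) => !haystack.includes(value.toLowerCase()))
    .map(([field, value]) => `${field}: expected "${value}" in summary`);
}

// Example data only: the plan says Denver, the summary page says Austin.
const problems = preflightMismatches(
  { city: "Denver", date: "Jan 14" },
  "Hilton Garden Inn, Austin, Jan 14 one night, King room"
);
console.log(problems);
```

An empty array means the booking can proceed to the approval gate. Anything else should pause the run and show the mismatch to the user.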

Case study: “Weekend Auto-Concierge” for a busy parent

Audience: IT dads and small business owners who need a fast planner.

Goal: one prompt handles travel, dinner, and a gift.

Outcome: median time from request to receipts under 3 minutes. Two approvals. One final message with links and calendar invites.

Why it works

Clear constraints keep the agent inside budget.
Browser tasks fill the gaps where APIs are missing.
Instant Checkout removes the payment friction. (OpenAI)

FAQ

Can I run this entirely outside ChatGPT?
Yes. Use function calling, your own UI, and Playwright for browser tasks. If you want integrated purchasing inside ChatGPT, use Instant Checkout.

Do I need special models?
Any current top model with strong tool calling works. GPT-5 offers better planning and tool selection, and AgentKit includes RFT hooks to tune your agent on real traces. (OpenAI)

What about crypto purchases or onchain flows?
You can extend the pattern with onchain agent kits if your use case needs it, but start with traditional checkout. Keep scope tight until your evals pass at scale.

Will websites block the agent?
Some will. Prefer official APIs. When you must automate, respect robots.txt and each site's terms of service. Keep a manual path for when automation is blocked.

Checklist before you go live

System prompt enforces plan → approve → execute.
Tool schemas are explicit and validated.
All irreversible actions require approval.
Playwright tasks have retries, timeouts, and screenshots.
Instant Checkout integrated for purchases inside ChatGPT with a clear approval message. (OpenAI)
Evals run nightly with trace grading and a failure digest. (OpenAI)
One-click cancel and refund instructions are present.
Signed audit logs enabled.

Copy-paste prompts to speed up your build

Prompt: Summarize the user goal into a numbered plan with tool placeholders. Each step names the exact tool and arguments you will request. Include an approval message with clear totals and policies.

Prompt: Before calling any tool, extract constraints: dates, budget, location, dietary needs, and vendor preferences. Confirm the constraints back to the user in one short paragraph.

Prompt: For browser tasks, generate a selector plan with three options per element. After each action, assert the current URL or a unique text snippet to verify progress.

Prompt: For purchases, prepare an order summary with item, price, shipping method, tax, and grand total. Ask for a plain “Approve purchase” reply to proceed.

Bringing it all together

You now have a realistic pattern for putting a ChatGPT agent that clicks, books, and buys into production. Use Agent Builder to sketch and version the flow. Use function calling to keep control of tools. Use Playwright for the stubborn sites. Use Instant Checkout for purchases that finish inside ChatGPT with a clean approval step. Across the stack, log every action, test with evals, and tune for reliability. The result is a helpful agent that moves at your speed and respects your wallet. (OpenAI)

What to build next

Ship a narrow concierge
Pick one city and one use case. For example, “book a hotel near the convention center, dinner for six, and a host gift under 50 dollars.” Wire the exact vendors and selectors. Measure everything.
Add evals and tune
Record traces. Grade tool choices and outcomes. Use AgentKit’s eval features and reinforcement fine-tuning hooks to push accuracy higher. (OpenAI)
Turn it into a product
Embed ChatKit on your site, add Instant Checkout for products you sell, and offer a paid tier for priority tasks and guaranteed response times. (OpenAI)

