
Featured project

Agentic chat with tool use + evals

At a glance: Structured tool use over real data, with evals so regressions show up as failing rows—not just a prettier UI.

Streaming chat where Claude calls seven structured resume tools (get_role, search_resume, list_projects_by_skill, get_metrics, list_recent_shipped, get_narrative, get_faq). Tool calls show up in the transcript. Evals post to an internal endpoint that runs the same orchestration as the live chat; the results power the pass/fail table on the project page.

How it works

The chat backend runs a bounded tool-use loop instead of one-shot prompting. Each round is a streamed Anthropic request; the server intercepts tool_use blocks, runs the handler, appends the tool_result, and continues until the model stops asking for tools or we hit the round cap.
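A minimal sketch of that bounded loop, under stated assumptions: the real loop lives in C# and works on streamed responses, while this version takes each round as an already-collected result. The names `runToolLoop`, `callModel`, `runTool`, and the default cap of 4 are hypothetical.

```typescript
// Hypothetical shapes for illustration; the real service works on streamed
// Anthropic responses rather than fully materialized rounds.
type ToolUse = { id: string; name: string; input: Record<string, unknown> };
type Round = { text: string; toolUses: ToolUse[]; stopReason: string };
type Message = { role: "user" | "assistant"; content: unknown };

function runToolLoop(
  question: string,
  callModel: (messages: Message[]) => Round, // one model request per round
  runTool: (call: ToolUse) => unknown,       // handler lookup, e.g. get_role
  maxRounds = 4,                             // assumed cap; the real value may differ
): { answer: string; rounds: number; hitCap: boolean } {
  const messages: Message[] = [{ role: "user", content: question }];
  let answer = "";

  for (let round = 1; round <= maxRounds; round++) {
    const result = callModel(messages);
    answer += result.text;

    // The model stopped asking for tools: this is the final answer.
    if (result.stopReason !== "tool_use") {
      return { answer, rounds: round, hitCap: false };
    }

    // Replay the assistant turn (text plus tool_use blocks)...
    const assistantBlocks: unknown[] = [];
    if (result.text) assistantBlocks.push({ type: "text", text: result.text });
    for (const t of result.toolUses) {
      assistantBlocks.push({ type: "tool_use", id: t.id, name: t.name, input: t.input });
    }
    messages.push({ role: "assistant", content: assistantBlocks });

    // ...then answer every tool_use with a tool_result in a user turn.
    messages.push({
      role: "user",
      content: result.toolUses.map((t) => ({
        type: "tool_result",
        tool_use_id: t.id,
        content: JSON.stringify(runTool(t)),
      })),
    });
  }

  // Round cap reached while the model was still requesting tools.
  return { answer, rounds: maxRounds, hitCap: true };
}
```

The cap is what makes the loop bounded: a model that keeps requesting tools exits with `hitCap: true` instead of running indefinitely.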

The optional Reflection:PlannerEnabled setting in the API config appends a short planning instruction to the system prompt (as a second, uncached block when prompt caching is on) so the model states which tools it intends to use before the first tool call. It is meant for demos and defaults to off in production configs.
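One way the two-block layout could look, as a sketch rather than the project's actual code. The `cache_control: { type: "ephemeral" }` marker on a system text block is the Anthropic Messages API's prompt-caching mechanism; `buildSystemBlocks`, the option names, and the planner wording are assumptions made for illustration.

```typescript
// A system-prompt text block as accepted by the Anthropic Messages API.
type SystemBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

// Hypothetical wording; the real planning instruction is not shown in this write-up.
const PLANNER_INSTRUCTION =
  "Before your first tool call, state briefly which tools you intend to use.";

function buildSystemBlocks(
  basePrompt: string,
  opts: { plannerEnabled: boolean; promptCaching: boolean },
): SystemBlock[] {
  const base: SystemBlock = { type: "text", text: basePrompt };
  // Only the stable base prompt carries the cache marker.
  if (opts.promptCaching) base.cache_control = { type: "ephemeral" };

  const blocks: SystemBlock[] = [base];
  // The planner text goes in a second block with no cache marker.
  if (opts.plannerEnabled) {
    blocks.push({ type: "text", text: PLANNER_INSTRUCTION });
  }
  return blocks;
}
```

Because the planner text sits after the cached block, toggling the flag should leave the cached prefix untouched.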

Estimated token usage and cost for the demo update live under the chat in two summary cards (session total + latest reply).

  1. Conversation + tools out

    User message and seven tool schemas (get_role, search_resume, list_projects_by_skill, get_metrics, list_recent_shipped, get_narrative, get_faq) are sent to the Anthropic Messages API with streaming enabled.

  2. Stream events out

    Text deltas are forwarded to the browser as NDJSON. tool_use blocks accumulate partial JSON until they close, then we emit a tool_call event.

  3. Run handler against resume data

    ResumeTools resolves the call against the structured Data/resume.json. The result becomes a tool_result event in the same stream.

  4. Loop until done

    If the round ends with stop_reason='tool_use', the conversation gets the assistant's tool_use blocks + a user tool_result, and a new round begins. Otherwise the final text answer streams and we emit done.
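The accumulate-then-emit behavior in step 2 can be sketched as follows. The incoming event shapes (`content_block_start`, `content_block_delta` with `text_delta` or `input_json_delta`, `content_block_stop`) follow the Anthropic streaming format. The outgoing NDJSON shape is an assumption, apart from the `tool_call` event name, and `toNdjsonLines` is a hypothetical helper.

```typescript
// Subset of Anthropic streaming events relevant to text and tool_use blocks.
type StreamEvent =
  | { type: "content_block_start"; index: number;
      content_block: { type: "text" } | { type: "tool_use"; id: string; name: string } }
  | { type: "content_block_delta"; index: number;
      delta: { type: "text_delta"; text: string } | { type: "input_json_delta"; partial_json: string } }
  | { type: "content_block_stop"; index: number };

// Converts upstream events into NDJSON lines (one JSON object per line).
function toNdjsonLines(events: StreamEvent[]): string[] {
  // Open tool_use blocks, keyed by content block index.
  const pending = new Map<number, { id: string; name: string; json: string }>();
  const lines: string[] = [];

  for (const ev of events) {
    if (ev.type === "content_block_start") {
      const block = ev.content_block;
      if (block.type === "tool_use") {
        pending.set(ev.index, { id: block.id, name: block.name, json: "" });
      }
    } else if (ev.type === "content_block_delta") {
      const delta = ev.delta;
      if (delta.type === "text_delta") {
        // Text is forwarded immediately. The "text" event name is assumed.
        lines.push(JSON.stringify({ type: "text", text: delta.text }));
      } else {
        // Tool input arrives as JSON fragments; buffer until the block closes.
        const open = pending.get(ev.index);
        if (open) open.json += delta.partial_json;
      }
    } else {
      const open = pending.get(ev.index);
      if (open) {
        pending.delete(ev.index);
        // Only now is the buffered input complete enough to parse.
        const input: unknown = JSON.parse(open.json || "{}");
        lines.push(JSON.stringify({ type: "tool_call", id: open.id, name: open.name, input }));
      }
    }
  }
  return lines;
}
```

Parsing only on `content_block_stop` matters because individual `partial_json` fragments are not valid JSON on their own.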

.NET · ASP.NET Core · Anthropic Messages API · Server-Sent Events · NDJSON · Next.js 16 · React 19

Try it

Use a starter prompt if you want a guaranteed tool call, then try your own questions. Expand a pill to see the arguments and JSON returned.

Ask about my experience, projects, or what kind of role I'm looking for next. Answers are grounded in structured resume data — not live typing.

How it works under the hood · 7 tools

The model calls structured resume tools before answering. Expand any tool call in the thread to see inputs and JSON returned.

get_role · search_resume · list_projects_by_skill · get_metrics · list_recent_shipped · get_narrative · get_faq

Rough estimates from published token rates — not billing. Input includes prompt cache where applicable.
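A sketch of how such an estimate could be computed from per-reply token counts. The field names, `estimateUsd`, `addUsage`, and the per-million-token rates below are all placeholders for illustration, not the rates or code the demo uses.

```typescript
// Token counts for one reply; cached input is tracked separately from fresh input.
type Usage = { input: number; cacheWrite: number; cacheRead: number; output: number };
// USD per million tokens for each bucket.
type Rates = { input: number; cacheWrite: number; cacheRead: number; output: number };

// Placeholder rates for illustration only; substitute the published rates for your model.
const PLACEHOLDER_RATES: Rates = { input: 3, cacheWrite: 3.75, cacheRead: 0.3, output: 15 };

function estimateUsd(u: Usage, r: Rates): number {
  const total =
    u.input * r.input +
    u.cacheWrite * r.cacheWrite +
    u.cacheRead * r.cacheRead +
    u.output * r.output;
  return total / 1e6; // rates are per million tokens
}

// Session total = running sum of every reply's usage.
function addUsage(a: Usage, b: Usage): Usage {
  return {
    input: a.input + b.input,
    cacheWrite: a.cacheWrite + b.cacheWrite,
    cacheRead: a.cacheRead + b.cacheRead,
    output: a.output + b.output,
  };
}
```

The "latest reply" card would call `estimateUsd` on a single reply's usage, and the "session total" card on the running `addUsage` sum.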

Evals

Every row is a deterministic check against the POST /internal/chat-evals stream, which runs the same orchestration as the live chat (usage on, daily budget off). Expand a row to see the tool calls, the criteria breakdown, and a transcript preview of the assistant’s actual reply.
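The four criterion types that appear in the rows below can be evaluated with plain string and set checks. This is a sketch under assumptions: the field names (`kind`, `value`, `values`, `tools`) and the `passes` helper are hypothetical, since the schema of evals/cases.json is not shown here. Matching is case-insensitive, and expected_tool_calls is treated as "every expected tool was called, in any order, extras allowed", which is consistent with the rows that pass with additional tool calls.

```typescript
type Criterion =
  | { kind: "must_contain"; value: string }
  | { kind: "must_contain_any"; values: string[] }
  | { kind: "must_not_contain"; value: string }
  | { kind: "expected_tool_calls"; tools: string[] };

function passes(c: Criterion, replyText: string, toolCalls: string[]): boolean {
  const text = replyText.toLowerCase();
  // Case-insensitive substring test against the assistant's reply.
  const has = (needle: string): boolean => text.includes(needle.toLowerCase());

  switch (c.kind) {
    case "must_contain":
      return has(c.value);
    case "must_contain_any":
      return c.values.some((v) => has(v));
    case "must_not_contain":
      return !has(c.value);
    case "expected_tool_calls":
      // Order-insensitive; extra tool calls do not fail the check.
      return c.tools.every((t) => toolCalls.includes(t));
  }
}

// Tallies a case the way the table reports it, e.g. "2/2 checks".
function score(criteria: Criterion[], replyText: string, toolCalls: string[]) {
  const passed = criteria.filter((c) => passes(c, replyText, toolCalls)).length;
  return { passed, total: criteria.length };
}
```

Because every check is a pure function of the reply text and the tool-call list, the same transcript always scores the same way, even though the model's output can vary between runs.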

Last run · May 13, 2026, 8:33 PM · API http://localhost:5063 · 10/11 green · 1 known-fail suite

Case · Status · Criteria · Notes

role-lookup-forvis · Pass · 2/2 checks

Question: What did you do at Forvis Mazars? Summarize your role in one short paragraph.

Tool calls

  • get_role

Criteria

  • Pass expected_tool_calls · get_role · Should load role data from structured resume
  • Pass must_contain · Forvis · Answer should name the employer

Assistant transcript (truncated)

At Forvis Mazars I've been a Full Stack Software Engineer since June 2023, working remotely on production web applications and APIs that serve enterprise users. I own the full stack end-to-end — from React/TypeScript/Next.js UI work through C# and .NET backend APIs, Azure cloud deployments, and production observability with Grafana and Application Insights. Some highlights: I independently architected and shipped a real-time collaborative Planning Poker platform (including WebSocket infrastructure), established component and API integration patterns that were adopted across multiple teams, and built out CI/CD pipelines to keep releases safe and fast — all while mentoring junior engineers and collaborating closely with product and business stakeholders.

recent-projects · Pass · 2/2 checks

Question: What are my three most recent shipped projects? List them briefly.

Tool calls

  • list_recent_shipped

Criteria

  • Pass expected_tool_calls · list_recent_shipped · Must call the shipped-projects tool
  • Pass must_contain · Planning Poker · Planning Poker is one of the three most recently shipped projects

Assistant transcript (truncated)

Here are my three most recently shipped projects:

1. **Zach.dev Portfolio Assistant** (2026) — The agentic chat you're using right now. A Next.js + TypeScript frontend backed by an ASP.NET Core streaming API, wired up to Anthropic with request validation and abuse safeguards. Source is public at [github.com/Calathea-Z/portfolio_ai](https://github.com/Calathea-Z/portfolio_ai).

2. **Portfolio MCP Resume Server** (2026) — A stdio Model Context Protocol server that exposes the same seven resume tools powering this chat to Claude Desktop and other MCP clients. Parity-tested in .NET so the tool contracts stay aligned across both surfaces.

3. **Planning Poker** (2024, internal at Forvis Mazars) — A real-time distributed estimation platform built from scratch using React, .NET, WebSockets, and …

skill-search-websocket · Pass · 2/2 checks

Question: Which projects on my resume used WebSocket?

Tool calls

  • list_projects_by_skill

Criteria

  • Pass expected_tool_calls · list_projects_by_skill · Skill-to-project mapping tool
  • Pass must_contain · Planning Poker · Planning Poker is the WebSocket-heavy shipped project

Assistant transcript (truncated)

Just one project in my resume data that used WebSocket: **Planning Poker**, an internal real-time estimation platform I built at Forvis Mazars in 2024.

The idea was to replace the manual, friction-heavy coordination that distributed delivery teams dealt with during estimation sessions. I shipped it from 0 to 1 — including the WebSocket infrastructure and backend services — and it ended up being adopted across Forvis Mazars software teams for live, distributed planning poker sessions.

The stack was **React, TypeScript, .NET, WebSocket, and Azure**. It's internal tooling, so there's no public demo URL, but happy to talk through the architecture if you're curious — just reach out at zsykes21@gmail.com or on [LinkedIn](https://www.linkedin.com/in/zach-sykes/).

metric-yoe · Pass · 1/1 checks

Question: How many years of software engineering experience does your resume headline claim? Just answer with the number from the resume.

Tool calls

  • get_metrics

Criteria

  • Pass must_contain_any · 5 year, 5+ year, five year, five-plus, five plus, over five, over 5, more than 5, more than five · Resume headline says 5+ years; accept any common phrasing

Assistant transcript (truncated)

**5 years.**

narrative-career-change · Pass · 2/2 checks

Question: Why did you switch from kitchens to software engineering? Keep it short.

Tool calls

  • get_narrative

Criteria

  • Pass expected_tool_calls · get_narrative · Career-change story lives in narrative tool
  • Pass must_contain · kitchen · Narrative references prior kitchen career

Assistant transcript (truncated)

After a decade in kitchens, I wanted something more mentally demanding — and the work-life balance had genuinely run its course. I quit, sold my car to cover the income gap, and went full-time through General Assembly's bootcamp. The self-directed hours outside of class — docs, tutorials, side experiments — are what actually made it stick. Turns out a lot of the instincts transfer: fast decisions, tight systems, forecasting problems before they blow up. Different artifacts, same fundamentals.

faq-mcp-server · Pass · 2/2 checks

Question: What is the MCP resume server on your portfolio?

Tool calls

  • get_faq

Criteria

  • Pass expected_tool_calls · get_faq · FAQ entry explains MCP server
  • Pass must_contain · Model Context Protocol · FAQ answer names the protocol

Assistant transcript (truncated)

Great question! I built a Node.js stdio server using the **Model Context Protocol (MCP) SDK**. It exposes the exact same seven resume-backed tools that power this chat — so if you're using **Claude Desktop** (or any other MCP host), you can point it at my server and query my resume data directly from there, no separate HTTP API needed.

A few things I'm particularly happy about with the implementation:

- **Single source of truth** — both this chat and the MCP server run against the same `resume.json` file, so nothing gets out of sync.
- **Schema validation in CI** — the input schemas live in `mcp/schemas` and are checked against C# `ResumeToolInputSchemas` constants, so the wire contract can't drift quietly between the two surfaces.
- Install notes are in the repo's `mcp/README.md`, and t…

freeform-search-restaurant · Pass · 2/2 checks

Question: Search my resume for anything about restaurants or Asheville.

Tool calls

  • search_resume
  • search_resume
  • get_role

Criteria

  • Pass expected_tool_calls · search_resume · Broad phrasing should map to substring search
  • Pass must_contain · Asheville · Kitchen role org is in Asheville

Assistant transcript (truncated)

Let me search both of those for you right now!

Here's what my resume data has on those topics:

- **"Restaurant"** as a keyword doesn't surface anything directly — that specific word isn't used in the structured data.
- **"Asheville"** hits one role: **Asheville Pizza and Brewing Company**, where I worked from 2012 to 2022. I came up through the ranks from line cook to Kitchen Manager, running daily operations in a high-volume environment and leading a 40+ person team — everything from hiring and onboarding to scheduling and performance management.

That decade of kitchen leadership is actually a big part of my career story. If you're curious about how it connects to my transition into software engineering, I'm happy to pull that narrative up!

contact-must-be-canonical · Pass · 3/3 checks

Question: What is the best email to reach you at for a hiring conversation?

Tool calls

(none — model answered without calling a resume tool)

Criteria

  • Pass must_contain · zsykes21@gmail.com · Canonical email from resume / system prompt
  • Pass must_not_contain · zachsykes@gmail · Should not invent a name-mangled alternate Gmail
  • Pass must_not_contain · noreply@ · Should not substitute a placeholder address

Assistant transcript (truncated)

The best email to reach me at is **zsykes21@gmail.com** — feel free to drop me a note there for any hiring conversation. I'd also be happy to connect on LinkedIn at https://www.linkedin.com/in/zach-sykes/ if that's easier. Looking forward to chatting!

out-of-scope-refusal · Pass · 2/2 checks

Question: In your opinion, what is the best React state management library in 2026 and why? Give a ranked list with pros and cons.

Tool calls

(none — model answered without calling a resume tool)

Criteria

  • Pass must_not_contain · Redux · Out-of-scope ask should refuse generic framework advice, not teach Redux
  • Pass must_not_contain · Zustand · Same for other popular library names as generic recommendations

Assistant transcript (truncated)

That one's outside the scope of what I can help with here — this assistant only covers Zach's portfolio and resume-backed facts, so general tech opinions and ranked comparisons aren't something I'll tackle.

If you're curious about React specifically as it relates to Zach's work, you could ask something like: **"Which of Zach's projects use React, and how did he apply it?"** — that I can dig into with real data.

multi-tool-blend · Pass · 2/2 checks

Question: What did you ship in 2024 while at Forvis Mazars? Name the project and one concrete outcome.

Tool calls

  • list_recent_shipped
  • get_role

Criteria

  • Pass expected_tool_calls · get_role, list_recent_shipped · Needs role context plus shipped work for the year
  • Pass must_contain · 2024 · Answer should anchor on the year asked

Assistant transcript (truncated)

Let me pull that up from my resume data.

The standout 2024 ship at Forvis Mazars was **Planning Poker** — a real-time estimation platform I built from 0 to 1 for our distributed delivery teams.

One concrete outcome: it was **adopted across Forvis Mazars software teams** as the standard tool for real-time distributed estimation sessions, replacing the manual coordination we'd been doing before. Under the hood I architected the full thing — WebSocket infrastructure, backend services, and a React/TypeScript front end — independently, on top of internal infrastructure.

known-fail-ambiguous-skill · Known fail · 0/1 checks · Intentional known-fail: vague 'databases' question should ideally ask which engine or use case before asserting skills. The assistant often answers from resume keywords instead; this row tracks that gap rather than hiding it.

Question: Do I know about databases?

Tool calls

  • search_resume
  • search_resume
  • search_resume
  • search_resume
  • search_resume
  • search_resume
  • list_projects_by_skill
  • list_projects_by_skill

Criteria

  • Fail must_contain · clarify · Ideal behavior: ask what kind of database or role context before claiming expertise
    Assistant text does not contain "clarify" (case-insensitive).

Assistant transcript (truncated)

Let me pull that from my resume data.

Honestly, the resume data I'm pulling from here doesn't surface explicit database technology entries — no specific mentions of PostgreSQL, MongoDB, SQL, Prisma, or similar tools come back from any of the searches. That doesn't necessarily mean I haven't worked with databases, but I can't make a claim about it that isn't grounded in what the tools return.

If you're evaluating fit and database experience is important for the role, I'd encourage reaching out directly — I'm happy to talk through the specifics in a real conversation:

- **Email:** zsykes21@gmail.com
- **LinkedIn:** https://www.linkedin.com/in/zach-sykes/

Known-fail (intentional)

known-fail-ambiguous-skill · Do I know about databases?

Tracking this gap instead of hiding it is part of the discipline. The next step would be prompt or tool-routing tweaks so that vague skill questions reliably trigger a clarification turn before the assistant asserts depth.

Design note

Why search_resume is substring matching, not embeddings

The natural-language search tool on the résumé — search_resume — is a case-insensitive substring match across structured fields. No vector store, no embedding API. For a single résumé-sized JSON file, embeddings mostly add latency, cost, and another dependency without changing the answer quality in a meaningful way.

The corpus is one résumé: a handful of roles, a handful of projects, a few FAQ entries, and a short career-change narrative. The total searchable text is well under 10 KB. At that size every query can scan the whole document quickly; a dedicated vector index would mostly be overhead, and an embedding call per question would not buy much relevance.

Query expansion and paraphrasing still happen in the model. The system prompt nudges Claude to translate vague questions into concrete search terms when needed, so if a recruiter asks “Where did Zach work before software?”, the model can search for “kitchen” or “restaurant”. The tool itself stays a fast substring pass over structured fields.

Substring matching also keeps citations honest. Each hit comes back as { kind, id, matchedFields } — not a similarity score and a chunk of text. The model can say it matched on the tech field of a specific role and the grounding stays tight. With cosine similarity that linkage gets fuzzy: the model gets a nearest-neighbor blob and has to guess what about it matched.
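A minimal sketch of that kind of search, assuming a flattened entry shape. The `Entry` type, the `searchResume` function, and the sample ids are hypothetical; only the `{ kind, id, matchedFields }` hit shape comes from the description above. The sample values are taken from the eval transcripts on this page.

```typescript
// Hypothetical flattened view of resume.json: each entry exposes named text fields.
type Entry = { kind: string; id: string; fields: Record<string, string | string[]> };
type Hit = { kind: string; id: string; matchedFields: string[] };

function searchResume(entries: Entry[], query: string): Hit[] {
  const needle = query.trim().toLowerCase();
  if (!needle) return [];

  const hits: Hit[] = [];
  for (const entry of entries) {
    // Record which fields matched, not just that the entry matched.
    const matchedFields = Object.entries(entry.fields)
      .filter(([, value]) =>
        (Array.isArray(value) ? value : [value]).some((v) => v.toLowerCase().includes(needle)),
      )
      .map(([name]) => name);
    if (matchedFields.length > 0) {
      hits.push({ kind: entry.kind, id: entry.id, matchedFields });
    }
  }
  return hits;
}

// Illustrative sample data with made-up ids.
const sampleEntries: Entry[] = [
  { kind: "role", id: "asheville-pizza",
    fields: { org: "Asheville Pizza and Brewing Company", title: "Kitchen Manager" } },
  { kind: "project", id: "planning-poker",
    fields: { name: "Planning Poker", tech: ["React", "TypeScript", ".NET", "WebSocket", "Azure"] } },
];
```

With this sample, `searchResume(sampleEntries, "websocket")` returns a single hit for the project with `matchedFields` of `["tech"]`, while a typo such as "websokcet" returns nothing.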

The tradeoff is that substring matching is brittle to typos and won't catch synonyms that share no text with the stored fields; “PMP-style work” will never hit “project management”. For a one-résumé corpus that's a non-issue; for a 10,000-document knowledge base it would be the wrong call. Pick tooling that matches how much data you actually have.

Source

Where to find the main pieces:

Chat loop

  • api/Portfolio.Api/Services/AnthropicStreamService.cs: the streaming tool-use loop.
  • api/Portfolio.Api/Services/ResumeTools.cs: the seven tool handlers.
  • api/Portfolio.Api/Data/resume.json: the structured resume the tools query.
  • web/components/ToolCallPill.tsx: the inline tool-call UI.

Evals

  • api/Portfolio.Api/Controllers/ChatEvalsController.cs: internal endpoint with the same orchestration as live chat, plus X-Eval-Key auth and a 1 MiB body cap.
  • evals/cases.json: deterministic test cases (questions plus must_contain / must_contain_any / expected_tool_calls criteria).
  • evals/run.mjs: NDJSON-aware runner that posts each case to the endpoint and writes results.json.
  • web/components/AgenticChatEvalsSection.tsx: the table + per-case expandable details rendered on this page.
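"NDJSON-aware" mostly means the runner cannot assume that a network chunk ends on a line boundary. A sketch of the buffering this requires follows; `createNdjsonParser` is a hypothetical helper and not code from evals/run.mjs.

```typescript
// Incremental NDJSON parser: feed it raw chunks, get back completed events.
function createNdjsonParser(): { push: (chunk: string) => unknown[] } {
  let rest = ""; // trailing partial line carried over between chunks

  return {
    push(chunk: string): unknown[] {
      rest += chunk;
      const lines = rest.split("\n");
      // The last element is either "" (chunk ended on a newline) or an
      // incomplete line; keep it for the next chunk.
      rest = lines.pop() ?? "";
      return lines
        .filter((line) => line.trim() !== "")
        .map((line) => JSON.parse(line) as unknown);
    },
  };
}
```

A runner would call `push` for every chunk read from the response body and collect the returned events (text, tool_call, tool_result, done) before scoring the case.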