<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Ray Kao | Applied Research on Ray Kao</title><link>https://www.raykao.io/</link><description>Recent content in Ray Kao | Applied Research on Ray Kao</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Thu, 07 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://www.raykao.io/index.xml" rel="self" type="application/rss+xml"/><item><title>We Are Legion, We Are Building the Bobiverse: What the Bobiverse Tells Us About the Agentic Software Industry</title><link>https://www.raykao.io/posts/we-are-legion-we-are-building-the-bobiverse/</link><pubDate>Thu, 07 May 2026 00:00:00 +0000</pubDate><guid>https://www.raykao.io/posts/we-are-legion-we-are-building-the-bobiverse/</guid><description>&lt;p>I&amp;rsquo;ve finally gotten around to reading the Bobiverse series by Dennis E. Taylor, and somewhere around the third time Bob replicates himself to explore a new solar system, I had one of those &amp;ldquo;wait a minute&amp;rdquo; moments. Not about the books - they&amp;rsquo;re great, funny, and hold up well. The moment was about the software industry.&lt;/p>
&lt;p>We are building the Bobiverse. Not intentionally, not with that in mind, but converging on the same architecture because the problems are the same. That&amp;rsquo;s what I want to talk about.&lt;/p></description></item><item><title>The AI Control Plane for Software Engineering</title><link>https://www.raykao.io/posts/ai-control-plane/</link><pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate><guid>https://www.raykao.io/posts/ai-control-plane/</guid><description>&lt;p>Modern software engineering is a coordination problem masquerading as a technical one. The raw capability to write, test, deploy, and observe software is well-understood. What is largely unsolved is how to coordinate the flow of intent, context, and action across the systems that execute these activities.&lt;/p>
&lt;p>Think about what a senior engineer actually does when resolving a production incident. They move from an alert in their observability platform, to a log query interface, to a source code repository, to a deployment system, to a communication thread, and back to code. Each transition requires them to carry context that no system holds. The engineer is the integration layer.&lt;/p></description></item><item><title>Cluster Doctor Configuration: Agentic Platform Engineering - Part 2</title><link>https://www.raykao.io/posts/cluster-doctor-agentic-platform-engineering-part-2/</link><pubDate>Wed, 01 Apr 2026 00:00:00 +0000</pubDate><guid>https://www.raykao.io/posts/cluster-doctor-agentic-platform-engineering-part-2/</guid><description>&lt;p>&lt;em>Published on the &lt;a href="https://www.youtube.com/@microsoftglobalblackbelts">Microsoft Global Black Belts YouTube channel&lt;/a>.&lt;/em>&lt;/p>
&lt;p>Part 2 picks up where the architecture walkthrough left off. This one is about configuration - how you wire the agent to your actual cluster, tune its diagnostic scope, and manage the output so it is useful to the operator rather than overwhelming.&lt;/p>
&lt;p>The goal was to show that agentic tooling does not have to be a black box. With the right configuration surface, a human can stay in control of what the agent looks at and how it escalates.&lt;/p></description></item><item><title>Cluster Doctor: Agentic Platform Engineering - Part 1</title><link>https://www.raykao.io/posts/cluster-doctor-agentic-platform-engineering-part-1/</link><pubDate>Wed, 01 Apr 2026 00:00:00 +0000</pubDate><guid>https://www.raykao.io/posts/cluster-doctor-agentic-platform-engineering-part-1/</guid><description>&lt;p>&lt;em>Published on the &lt;a href="https://www.youtube.com/@microsoftglobalblackbelts">Microsoft Global Black Belts YouTube channel&lt;/a>.&lt;/em>&lt;/p>
&lt;p>Cluster Doctor is a demo I built to show what agentic platform engineering looks like beyond the toy examples. The agent takes a broken or degraded Kubernetes cluster, runs diagnostics, correlates signals across logs and metrics, and produces an actionable diagnosis - without a human manually triaging each layer.&lt;/p>
&lt;p>Part 1 covers the core agent architecture and the initial diagnostic loop. Part 2 goes into configuration and tuning.&lt;/p></description></item><item><title>m365 WorkIQ: GitHub Copilot Spec Kit Demo</title><link>https://www.raykao.io/posts/m365-workiq-github-copilot-spec-kit-demo/</link><pubDate>Wed, 01 Apr 2026 00:00:00 +0000</pubDate><guid>https://www.raykao.io/posts/m365-workiq-github-copilot-spec-kit-demo/</guid><description>&lt;p>&lt;em>Published on the &lt;a href="https://www.youtube.com/@microsoftglobalblackbelts">Microsoft Global Black Belts YouTube channel&lt;/a>.&lt;/em>&lt;/p>
&lt;p>This demo walks through using GitHub Copilot&amp;rsquo;s spec kit to accelerate the requirements and design phase of a real project. The idea is that writing a good spec is itself a bottleneck - and AI can help structure intent into something an engineering team can actually execute against.&lt;/p>
&lt;p>WorkIQ was the project context here, but the pattern applies broadly: use Copilot to get from a vague brief to a crisp, reviewable spec faster than you would writing from scratch.&lt;/p></description></item><item><title>When Infrastructure Scales but Understanding Doesn't</title><link>https://www.raykao.io/posts/when-infrastructure-scales-but-understanding-doesnt/</link><pubDate>Wed, 02 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.raykao.io/posts/when-infrastructure-scales-but-understanding-doesnt/</guid><description>&lt;p>&lt;em>Co-authored with &lt;a href="https://www.linkedin.com/in/diegocasati/">Diego Casati&lt;/a> on the &lt;a href="https://azureglobalblackbelts.com/2025/07/02/when-infrastructure-scales-but-understanding-doesnt">Azure Global Black Belts blog&lt;/a>.&lt;/em>&lt;/p>
&lt;p>The scenario in this piece is familiar to anyone who has been on-call for a distributed system: a deployment fails at 2am, and resolving it means correlating logs across seventeen microservices, three monitoring systems, and a service mesh configuration change from last week. The technology scaled. The humans did not.&lt;/p>
&lt;p>The argument here is that platform engineering&amp;rsquo;s real job is not building more tools - it is making the tools we have more humane. Golden paths, opinionated defaults, AI as the conversation layer between what a developer needs and what the platform knows. Not replacing expertise. Amplifying it and making it accessible at the moment it is needed.&lt;/p></description></item><item><title>The Human Scale Problem in Platform Engineering</title><link>https://www.raykao.io/posts/human-scale-problem-platform-engineering/</link><pubDate>Tue, 24 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.raykao.io/posts/human-scale-problem-platform-engineering/</guid><description>&lt;p>&lt;em>Co-authored with &lt;a href="https://www.linkedin.com/in/diegocasati/">Diego Casati&lt;/a> on the &lt;a href="https://azureglobalblackbelts.com/2025/06/24/human-scale-problem-in-platform-engineering">Azure Global Black Belts blog&lt;/a>.&lt;/em>&lt;/p>
&lt;p>Every step forward in the infrastructure stack has produced the same pattern: we solve the immediate problem and discover a harder one waiting behind it. Containers solved manual server config. Kubernetes solved container sprawl. And now we are coordinating distributed systems across teams who can barely talk to each other.&lt;/p>
&lt;p>The piece argues the real problem is not technical - it is that our solutions keep outpacing our ability to operate them at human scale. Silos are not the enemy. The inability to help specialists communicate across silos is.&lt;/p></description></item><item><title>Introduction to GitHub Advanced Security</title><link>https://www.raykao.io/posts/introduction-to-github-advanced-security/</link><pubDate>Wed, 10 Apr 2024 00:00:00 +0000</pubDate><guid>https://www.raykao.io/posts/introduction-to-github-advanced-security/</guid><description>&lt;p>&lt;em>Streamed live on the &lt;a href="https://www.youtube.com/@microsoftglobalblackbelts">Microsoft Global Black Belts YouTube channel&lt;/a>.&lt;/em>&lt;/p>
&lt;p>GitHub Advanced Security covers the three areas that matter most for supply chain security: secret scanning, code scanning (via CodeQL), and dependency review. This live session is a practical introduction - not a feature tour, but a walkthrough of how these capabilities fit into a real developer and security workflow, and what you actually need to configure to get value from them.&lt;/p></description></item><item><title>Fixing Missing C++ Libraries in GitHub Codespaces Using Miniconda3</title><link>https://www.raykao.io/posts/fixing-cpp-libraries-github-codespaces/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://www.raykao.io/posts/fixing-cpp-libraries-github-codespaces/</guid><description>&lt;p>&lt;em>Published on the &lt;a href="https://www.youtube.com/@microsoftglobalblackbelts">Microsoft Global Black Belts YouTube channel&lt;/a>.&lt;/em>&lt;/p>
&lt;p>One of the persistent friction points with cloud development environments is dependency resolution - especially native libraries that assume a specific system configuration. This short walkthrough shows a clean way to use Miniconda3 inside GitHub Codespaces to resolve missing C++ libraries without fighting the container&amp;rsquo;s base image.&lt;/p>
&lt;p>Small fix, but the kind of thing that costs an hour if you haven&amp;rsquo;t seen it before.&lt;/p></description></item><item><title>GitHub Advanced Security, Azure DevOps, and Microsoft Defender for Cloud DevOps Security</title><link>https://www.raykao.io/posts/github-advanced-security-azure-devops-defender/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://www.raykao.io/posts/github-advanced-security-azure-devops-defender/</guid><description>&lt;p>&lt;em>Published on the &lt;a href="https://www.youtube.com/@microsoftglobalblackbelts">Microsoft Global Black Belts YouTube channel&lt;/a>.&lt;/em>&lt;/p>
&lt;p>Security tooling in the DevOps space has fragmented over the years, and a lot of teams end up with overlapping tools that don&amp;rsquo;t talk to each other well. This session walks through how GitHub Advanced Security, Azure DevOps, and Microsoft Defender for Cloud DevOps Security can be connected into a unified security posture - covering secret scanning, code scanning, dependency review, and pipeline security together rather than in isolation.&lt;/p></description></item><item><title>MLflow Codespaces</title><link>https://www.raykao.io/posts/mlflow-codespaces/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://www.raykao.io/posts/mlflow-codespaces/</guid><description>&lt;p>&lt;em>Published on the &lt;a href="https://www.youtube.com/@microsoftglobalblackbelts">Microsoft Global Black Belts YouTube channel&lt;/a>.&lt;/em>&lt;/p>
&lt;p>MLflow is one of the most useful tools in the ML practitioner&amp;rsquo;s toolkit for tracking experiments, but getting it running cleanly in a Codespaces environment has some gotchas. This demo walks through a working setup - reproducible, shareable, and ready to use without fighting local environment drift.&lt;/p>
&lt;p>The broader point: if you want ML work to be reviewable and reproducible, the dev environment needs to be treated as a first-class artifact.&lt;/p></description></item><item><title>Power Apps: Makers - Video Series</title><link>https://www.raykao.io/posts/power-apps-makers-series/</link><pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate><guid>https://www.raykao.io/posts/power-apps-makers-series/</guid><description>&lt;p>&lt;em>Published on the &lt;a href="https://www.youtube.com/@microsoftglobalblackbelts">Microsoft Global Black Belts YouTube channel&lt;/a>.&lt;/em>&lt;/p>
&lt;p>Power Apps lowers the barrier to building apps, but makers often hit a wall when they need to connect to external data or APIs. This short series fills in the foundational concepts - not for developers, but for makers who want to understand what&amp;rsquo;s actually happening under the hood.&lt;/p>
&lt;p>Three episodes:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://www.youtube.com/watch?v=paN9mN2TBvE">EP 01 - What&amp;rsquo;s an API?&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.youtube.com/watch?v=EwzJ0dUG4XE">EP 02 - What&amp;rsquo;s JSON?&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.youtube.com/watch?v=muevOE9o4nQ">EP 03 - What&amp;rsquo;s REST?&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Azure Kubernetes Service Production Baseline: Intro</title><link>https://www.raykao.io/posts/aks-production-baseline-intro/</link><pubDate>Thu, 17 Sep 2020 00:00:00 +0000</pubDate><guid>https://www.raykao.io/posts/aks-production-baseline-intro/</guid><description>&lt;p>&lt;em>Published on the &lt;a href="https://www.youtube.com/@microsoftglobalblackbelts">Microsoft Global Black Belts YouTube channel&lt;/a>.&lt;/em>&lt;/p>
&lt;p>Running Kubernetes in production is a different problem from running it in a lab. This video introduces the AKS production baseline - the set of architectural and configuration decisions you need to make before you call a cluster production-ready. Networking, security boundaries, identity, observability: the baseline is about getting those foundations right before you ship anything on top of them.&lt;/p>
&lt;p>&lt;a href="https://youtu.be/-Hjyqxn1cqI">Watch on YouTube →&lt;/a>&lt;/p></description></item><item><title>Microsoft AKS Public Office Hours: Private Clusters</title><link>https://www.raykao.io/posts/aks-office-hours-private-clusters/</link><pubDate>Thu, 10 Sep 2020 00:00:00 +0000</pubDate><guid>https://www.raykao.io/posts/aks-office-hours-private-clusters/</guid><description>&lt;p>&lt;em>Published on the &lt;a href="https://www.youtube.com/@microsoftglobalblackbelts">Microsoft Global Black Belts YouTube channel&lt;/a>.&lt;/em>&lt;/p>
&lt;p>This AKS Office Hours session focused on private clusters - one of the more commonly misunderstood configurations at the time. The live format meant real questions from practitioners actually running into the problems: how do you handle DNS resolution, what happens to your kubectl access, how do you reach the API server from a pipeline runner? Good companion to the standalone private clusters video.&lt;/p></description></item><item><title>Azure Kubernetes - Private Clusters</title><link>https://www.raykao.io/posts/azure-kubernetes-private-clusters/</link><pubDate>Mon, 10 Feb 2020 00:00:00 +0000</pubDate><guid>https://www.raykao.io/posts/azure-kubernetes-private-clusters/</guid><description>&lt;p>&lt;em>Published on the &lt;a href="https://www.youtube.com/@microsoftglobalblackbelts">Microsoft Global Black Belts YouTube channel&lt;/a>.&lt;/em>&lt;/p>
&lt;p>Private AKS clusters keep the Kubernetes API server accessible only within your virtual network, which is a meaningful security improvement for production workloads. The trade-off is operational complexity - you need to think carefully about how your CI/CD pipelines, tooling, and operators reach the cluster. This video walks through the setup and the networking patterns that make private clusters manageable in practice.&lt;/p></description></item><item><title>About</title><link>https://www.raykao.io/about/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.raykao.io/about/</guid><description>&lt;img src="https://github.com/raykao.png?size=240" alt="Ray Kao" class="profile-photo">
&lt;h2 id="ray-kao">Ray Kao&lt;/h2>
&lt;p>I’m a Principal Solutions Engineer at Microsoft, part of the Cloud &amp;amp; AI Platforms Global Black Belt (GBB) team. The GBB team has historically been a specialized field engineering and product incubation team that works directly with enterprise customers to help solve complex business needs via novel technical solutions.&lt;/p>
&lt;p>I’ve dedicated 10 years of my career to this role. Most of this time has been spent working alongside large engineering organizations internally at Microsoft and within some of the world’s top enterprises, working through difficult real-world problems with the dedicated people and teams responsible for the outcomes.&lt;/p></description></item></channel></rss>