David Koleczek

I am a Senior Applied Scientist in Microsoft’s Office of the CTO, advising executives on the future of AI, its progress, and its implications, grounded in science. I build demos, write technical content, and contribute to open source. Previously, I worked on M365 Copilot and Microsoft Loop and was a member of MAIDAP.

Prior to Microsoft, I was a data scientist at MassMutual and at ISO New England. At ISO New England, I created forecasts to reliably predict energy demand for New England, which, to my knowledge, are still used daily (yes, I’m proud of that).

2022 -

Senior Applied Scientist

I started at Microsoft as an Applied Scientist in the MAIDAP program, where I helped lead efforts to open-source a "Guided Conversations" agent framework as a demo in Semantic Kernel, create Loop Copilot, advance Copilot technology for Microsoft Federal, and automate cloud incident root cause analysis for Azure. Then, as a member of the AI team for Microsoft Loop, I helped develop features for Copilot Pages, such as editing pages from Copilot Chat. I am now in my current role in the Office of the CTO.

2020 - 2022

Data Scientist

I was a Data Scientist at MassMutual as a member of the Data Science Development Program, working on various projects in Investment & Finance and Cybersecurity & Fraud. I also completed my Master's in CS during this time.

2020 - 2022

M.S. Computer Science

2017 - 2020

Data Science Intern

I was an intern on the Day-Ahead Forecasting and Related Markets team at ISO New England. I worked on a variety of projects, including a machine learning system to forecast day-ahead energy demand. As of August 2021, it is being used as part of daily electric grid operations for the entire New England electric grid!

2016 - 2020

B.S. Computer Science

Portfolio

Open Source

agent-core

agent-core is a Python library providing common building blocks for creating AI agents. It uses InteropRouter as a unified interface for managing different AI model providers.

Open Source

agent-tui

agent-tui is a terminal user interface for AI agents built with Textual. It is built on top of agent-core and InteropRouter.

Open Source

InteropRouter

InteropRouter is designed to seamlessly interoperate between the most common AI providers at a high level of quality. It uses the OpenAI Responses API types as a common denominator for inputs and outputs, allowing you to switch between providers with minimal code changes.

Open Source

Microsoft Eval Recipes

Eval Recipes is a library dedicated to making it easier to keep up with the state of the art in evaluating AI agents. It currently has two main components: a benchmarking harness for evaluating CLI agents (GitHub Copilot CLI, Claude Code, etc.) on real-world tasks via containers, and an online evaluation framework for LLM chat assistants. The common thread between these components is the concept of recipes, which are a mix of code and LLM calls to achieve a desired tradeoff between flexibility and quality.

Open Source

Microsoft Amplifier

Amplifier brings AI assistance to your command line with a modular, extensible architecture. My contributions include evaluation and building out core provider modules.

Patent

December 2025

US Patent Application US-20250378320-A1

US Patent Application US-20250378320-A1 - "GENERATIVE AGENT GUIDED CONVERSATIONS FOR ARTIFACT COMPLETION" (Pending)

Game Dev

October 2025

Trash Dash

A game where you knock pieces of trash into the air and collect them in your dump truck. Avoid the obstacles and get a high score! Built for Ludum Dare 58.

Open Source

Semantic Workbench Document Assistant

The Document Assistant is an AI assistant in Microsoft's Semantic Workbench that aims to be easy for everyone to use. Its core feature is reliable document creation and editing, grounded in all of your context across files and the conversation.

Open Source

Chat Context Toolkit

The Chat Context Toolkit is a Python library, currently a part of Microsoft's Semantic Workbench, designed to efficiently manage context for most AI agents. Read more on LinkedIn.

The Chat Context Toolkit provides these three core, modular components:

Message History Management: Applies context engineering techniques to ensure that messages fit within a token budget.

Archive: A task for archiving and processing chunks of the message history that may no longer fit within a token budget to ensure older data can still be considered.

Virtual Filesystem: Creates a common abstraction for LLMs to read, edit, and explore files coming from a variety of disparate sources.
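To make the first two components concrete, here is a minimal sketch of keeping the newest messages within a token budget and handing the older remainder to an archive step. It is not the Chat Context Toolkit's API: `Message`, `count_tokens`, and `fit_to_budget` are hypothetical names, and the whitespace word count is only a stand-in for a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    content: str


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for real tokenizer tokens.
    return len(text.split())


def fit_to_budget(
    messages: list[Message], budget: int
) -> tuple[list[Message], list[Message]]:
    """Return (kept, archived): the newest messages that fit, and the older rest."""
    kept: list[Message] = []
    used = 0
    # Walk from newest to oldest so the most recent turns are kept first.
    for index in range(len(messages) - 1, -1, -1):
        cost = count_tokens(messages[index].content)
        if used + cost > budget:
            # This message and everything older is handed to the archive step.
            return list(reversed(kept)), messages[: index + 1]
        kept.append(messages[index])
        used += cost
    return list(reversed(kept)), []
```

A real implementation would likely summarize or compress the archived messages rather than drop them, so older information can still be considered.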

Project

July 2025

TinkerTasker

TinkerTasker is an open-source, local-first CLI agent similar to Claude Code and Codex. The project let me focus on teaching important AI technology like the Model Context Protocol (MCP) while still having unique aspects: it is simple and modular, which makes it fully hackable, and I developed it to run completely locally without any APIs at all.

Open Source

not-again-ai

not-again-ai is a collection of various building blocks that come up over and over again when developing AI products. The key goals of this package are to have simple, yet flexible interfaces and to minimize dependencies.

Patent

April 2025

US Patent Application US-20250131289-A1

US Patent Application US-20250131289-A1 - "Knowledge Graph Extraction" (Pending)

Game Dev

April 2025

Diggity Diggity Diggity (Dash)

DIGGITY DIGGITY DIGGITY IT'S TIME TO GO RACING IN THE DEPTHS OF MOLE HILLS!!!

You are a mole who is in D.O.W.N.S.P.E.E.D. (Digging Operators with Notable Speed Pioneering Earth Excavation & Depth). Your goal is to win the championship and take home the cup! Built for Ludum Dare 57.

Project

December 2024

ReDoodle

ReDoodle is a "daily" web puzzle game where you are given a starting image, and your goal is to transform it into a goal image through a series of prompts.

Figure: An example of how prompt formatting affects the performance of GPT-3.5-turbo-16k-0613, based on our experiments with multiple-choice questions on international law from the MMLU benchmark.

Publication

November 2024

Does Prompt Formatting Have Any Impact on LLM Performance?

Jia He, Mukund Rungta, David Koleczek, Arshdeep Sekhon, Franklin X Wang, Sadid Hasan

In the realm of Large Language Models (LLMs), prompt optimization is crucial for model performance. Although previous research has explored aspects like rephrasing prompt contexts, using various prompting techniques (like in-context learning and chain-of-thought), and ordering few-shot examples, our understanding of LLM sensitivity to prompt templates remains limited. Therefore, this paper examines the impact of different prompt templates on LLM performance. We formatted the same contexts into various human-readable templates, including plain text, Markdown, JSON, and YAML, and evaluated their impact across tasks like natural language reasoning, code generation, and translation using OpenAI's GPT models. Experiments show that GPT-3.5-turbo's performance varies by up to 40% in a code translation task depending on the prompt template, while larger models like GPT-4 are more robust to these variations. Our analysis highlights the need to reconsider the use of fixed prompt templates, as different formats can significantly affect model performance.
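As a rough illustration of what "formatting the same contexts into various templates" can look like, the sketch below renders one set of prompt fields as plain text, Markdown, JSON, and YAML. This is not the paper's code: the field names and the `render_*` helpers are hypothetical, and the YAML renderer is a hand-rolled approximation that only handles flat string values.

```python
import json


def render_plain(fields: dict[str, str]) -> str:
    # One "Key: value" line per field.
    return "\n".join(f"{key}: {value}" for key, value in fields.items())


def render_markdown(fields: dict[str, str]) -> str:
    # Each field becomes a second-level heading followed by its text.
    return "\n\n".join(f"## {key}\n{value}" for key, value in fields.items())


def render_json(fields: dict[str, str]) -> str:
    return json.dumps(fields, indent=2)


def render_yaml(fields: dict[str, str]) -> str:
    # Assumption: flat string values only. JSON-quoted strings are used as
    # YAML double-quoted scalars, which avoids a third-party YAML dependency.
    return "\n".join(f"{key}: {json.dumps(value)}" for key, value in fields.items())


# Hypothetical prompt fields; the content is identical across all templates.
fields = {
    "Instruction": "Answer the multiple-choice question.",
    "Question": "What is 2 + 2?",
    "Options": "A) 3  B) 4  C) 5",
}

prompts = {
    "plain": render_plain(fields),
    "markdown": render_markdown(fields),
    "json": render_json(fields),
    "yaml": render_yaml(fields),
}
```

Each entry in `prompts` carries the same information, so any difference in model output across them can be attributed to the template rather than the content.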

Game Dev

October 2024

Bacteria Bashers

Use your tiny bacteria to fight foes and grow into something bigger than the sum of your tiny parts! Built for Ludum Dare 56.

Open Source

Guided Conversations

Guided Conversations is a framework in Semantic Kernel for building AI agents that lead goal-driven conversations with defined constraints, where the agent initiates dialogue, follows a structured conversation flow, exercises judgment to stay on track, and generates artifacts like notes and forms throughout the interaction. Common use cases include teaching scenarios, customer service interactions, and any situation where a "creator" defines conversation goals and collects information semi-autonomously through an AI assistant.

Game Dev

April 2024

Soul Food

Soul Food is a game where you summon monsters to satiate your hungry customers! Built for Ludum Dare 55.

Game Dev

October 2023

Societies Stranding

Societies around the galaxy are running out of space. You are the last hope to delay the stranding. Deliver pods from overpopulated planets (denoted with a red icon) to growing planets (denoted with blue icons). Get a high score before all the planets run out of space! Built for Ludum Dare 54.

Game Dev

May 2023

Courier Crusaders

Enter a realm of magical deliveries in this fantasy RPG management game. Assemble your elite team of couriers and strive to thrive in the cutthroat world of delivery services. Built for Ludum Dare 53.

Game Dev

January 2023

Barn Busters

Barn Busters is a physics-based tower defense game inspired by Fall Guys, built for Ludum Dare 52. It placed in the top 20 for both innovation and fun and in the top 10% overall out of over 1,000 submissions.

Game Dev

October 2022

Big Block Mode

Big Block Mode is another take on the classic tetromino puzzler, built for Ludum Dare 51. It placed 67th for Innovation and in the top 20% overall.

Publication

July 2022

UMass PCL at SemEval-2022 Task 4: Pre-trained Language Model Ensembles for Detecting Patronizing and Condescending Language

David Koleczek, Alex Scarlatos, Siddha Karakare, Preshma Linet Pereira

The 16th International Workshop on Semantic Evaluation (SemEval-2022)

Patronizing and condescending language (PCL) is everywhere, but rarely is the focus on its use by media towards vulnerable communities. Accurately detecting PCL of this form is a difficult task due to limited labeled data and how subtle it can be. In this paper, we describe our system for detecting such language which was submitted to SemEval 2022 Task 4: Patronizing and Condescending Language Detection. Our approach uses an ensemble of pre-trained language models, data augmentation, and optimizing the threshold for detection. Experimental results on the evaluation dataset released by the competition hosts show that our work is reliably able to detect PCL, achieving an F1 score of 55.47% on the binary classification task and a macro F1 score of 36.25% on the fine-grained, multi-label detection task.
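The abstract mentions optimizing the detection threshold. One common way to do this, sketched below under assumptions, is to sweep candidate thresholds over held-out predicted probabilities and keep the one with the best F1. This is a generic illustration, not the paper's implementation; `f1_score` and `best_threshold` are hypothetical helpers and the data is made up.

```python
def f1_score(labels: list[int], predictions: list[int]) -> float:
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        # No true positives means precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(
    labels: list[int], scores: list[float], candidates: list[float]
) -> tuple[float, float]:
    """Return the candidate threshold with the highest F1, and that F1."""
    best_t, best_f1 = candidates[0], -1.0
    for threshold in candidates:
        predictions = [1 if score >= threshold else 0 for score in scores]
        score = f1_score(labels, predictions)
        # Strict comparison keeps the first candidate in case of ties.
        if score > best_f1:
            best_t, best_f1 = threshold, score
    return best_t, best_f1


# Made-up validation labels and ensemble probabilities, for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
threshold, f1 = best_threshold(labels, scores, [0.2, 0.3, 0.5])
```

The threshold should be chosen on validation data and then applied unchanged to the test set, otherwise the reported F1 is optimistic.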

Game Dev

April 2022

Cabbage Crashers

Cabbage Crashers is a cabbage farming simulation game built for Ludum Dare 50.

Publication

February 2022

On Optimizing Interventions in Shared Autonomy

Weihao Tan, David Koleczek, Siddhant Pradhan, Nicholas Perello, Vivek Chettiar, Nan Ma, Aaslesha Rajaram, Vishal Rohra, Soundar Srinivasan, H M Sajjad Hossain, Yash Chandak

Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI 2022)

Shared autonomy refers to approaches for enabling an autonomous agent to collaborate with a human with the aim of improving human performance. However, besides improving performance, it may often also be beneficial that the agent concurrently accounts for preserving the user's experience or satisfaction of collaboration. We propose two model-free reinforcement learning methods that can account for both hard and soft constraints on the number of interventions. We show that not only does our method outperform the existing baseline, but also eliminates the need to manually tune a black-box hyperparameter for controlling the level of assistance. Code available at: https://github.com/DavidKoleczek/human_marl

Publication

July 2021

Intervention Aware Shared Autonomy

Weihao Tan, David Koleczek, Siddhant Pradhan, Nicholas Perello, Vivek Chettiar, Nan Ma, Aaslesha Rajaram, Vishal Rohra, Soundar Srinivasan, H M Sajjad Hossain, Yash Chandak

HumanAI workshop @ Thirty-eighth International Conference on Machine Learning (ICML 2021)

Project

May 2019 - March 2023

mlfeed.tech

An NLP-powered web application to automatically curate tweets from the machine learning community on Twitter. The content was also reposted on Twitter.
