Selected Work

Projects

Each project here started with a question about where current AI systems fall short — and an attempt to build something that actually addresses it.


01 / 06

Lantern Intelligence

Demo on Request
Python, FastAPI, RAG, ChromaDB, Ollama, SQLite

Why it matters

Most financial AI either hallucinates numbers or ignores the actual data entirely. Lantern solves that by grounding every response in SQL execution before the LLM is ever involved — making outputs auditable and trustworthy for real business decisions.

A multi-agent AI accounting assistant that lets small businesses query their financial data in plain English. The system runs three simulated company databases simultaneously — the same question produces different, grounded answers across each one.
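The grounding idea can be illustrated with a minimal sketch. This is not Lantern's actual code: the `invoices` schema, the company names, the helper functions, and the hard-coded SQL are all hypothetical, and a real system would presumably have an agent generate the query. The point it shows is that the query runs first, and only the returned rows are handed to the model.

```python
import sqlite3


def make_company_db(invoices):
    """Build an in-memory SQLite database for one simulated company (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE invoices (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO invoices VALUES (?, ?)", invoices)
    conn.commit()
    return conn


def grounded_context(conn, sql):
    """Execute the SQL before any LLM call and keep the query alongside its result."""
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    return {"sql": sql, "columns": columns, "rows": cursor.fetchall()}


def build_prompt(question, context):
    """Assemble a prompt that contains only numbers that came out of the database."""
    return (
        f"Question: {question}\n"
        f"SQL executed: {context['sql']}\n"
        f"Columns: {context['columns']}\n"
        f"Rows: {context['rows']}\n"
        "Answer using only the rows above."
    )


# Two simulated companies with made-up data; the same question is asked of each.
companies = {
    "acme": make_company_db([("Northwind", 100.0), ("Globex", 250.0)]),
    "birch": make_company_db([("Initech", 40.0)]),
}
QUESTION = "What is our total invoiced revenue?"
SQL = "SELECT SUM(amount) AS total_revenue FROM invoices"  # hard-coded for the sketch

contexts = {name: grounded_context(conn, SQL) for name, conn in companies.items()}
prompts = {name: build_prompt(QUESTION, ctx) for name, ctx in contexts.items()}
```

Because each prompt carries the executed SQL and its rows, an answer can be audited by rerunning the query against the same database.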


02 / 06

Lumen

In Development
Python, FastAPI, LLM Eval, LLM-as-judge

Why it matters

Most teams deploying LLMs have no systematic way to know when their model regresses, drifts, or fails on edge cases. Lumen treats evaluation as ongoing infrastructure, not a one-time check — because the real risk isn't the first deployment, it's the tenth.

A black-box LLM evaluation system that tests any model through its inputs and outputs alone, with no access to model internals required. It works with any provider and requires no infrastructure changes from the user.
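A rough sketch of what black-box evaluation can look like, under stated assumptions: the model is treated as a plain function from prompt to text, scoring is a simple substring check standing in for an LLM-as-judge, and every name here (`Case`, `run_suite`, `regressed`, the stub models) is hypothetical rather than Lumen's API.

```python
from typing import Callable, List, NamedTuple


class Case(NamedTuple):
    prompt: str
    expected: str  # substring a correct answer should contain


def run_suite(model: Callable[[str], str], cases: List[Case]) -> float:
    """Score a model using only its inputs and outputs; returns the pass rate."""
    passed = sum(
        1 for case in cases if case.expected.lower() in model(case.prompt).lower()
    )
    return passed / len(cases)


def regressed(baseline: float, current: float, tolerance: float = 0.05) -> bool:
    """Flag a regression when the pass rate drops by more than the tolerance."""
    return current < baseline - tolerance


CASES = [
    Case("What is 2 + 2?", "4"),
    Case("Capital of France?", "paris"),
]


# Stub "models" standing in for two deployments of any provider's endpoint.
def model_v1(prompt: str) -> str:
    return {"What is 2 + 2?": "The answer is 4.", "Capital of France?": "Paris"}.get(prompt, "")


def model_v2(prompt: str) -> str:
    return {"What is 2 + 2?": "The answer is 5.", "Capital of France?": "Paris"}.get(prompt, "")


baseline_score = run_suite(model_v1, CASES)
current_score = run_suite(model_v2, CASES)
has_regressed = regressed(baseline_score, current_score)
```

Running the same suite on every deployment and comparing against a stored baseline is what makes evaluation ongoing rather than a one-time check.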


03 / 06

Abductive Reasoning with LLMs

Python, PyTorch, Transformers, Contrastive Learning, RST

Why it matters

LLMs are surprisingly bad at choosing the most plausible explanation for an event — they pattern-match rather than reason causally. When two hypotheses look nearly identical in embedding space, models can't reliably pick the right one. This research identifies exactly where that failure begins.

A research project exploring abductive inference through a dual-hypothesis framework — contrasting gold explanations, evidence-derived hypotheses, and deliberately inverted hypotheses to probe the geometry of abductive space. Submitted to SemEval 2026 Task 12. Co-authored with Yifei Zhang and Echo Canaday at CU Boulder.


04 / 06

Housing Price Forecasting

Time-series Regression, XGBoost, Elastic Net, Python, Zillow API

Why it matters

Deciding whether to buy or rent is one of the most consequential financial decisions people make, yet most analysis of it is surface-level. This project treats the buy vs. rent question as a rigorous forecasting problem, modeling the macroeconomic drivers that actually move prices, not just the prices themselves.

A time-series study predicting rent and mortgage costs across Denver, Boulder, and Fort Collins using macroeconomic indicators from Zillow, the Federal Reserve, and the Bureau of Labor Statistics. Twelve models were developed and compared; XGBoost and Elastic Net emerged as the strongest performers.


05 / 06

Multimodal Interview Outcome Predictor

TF-IDF, Word2Vec, Prosodic Features, Random Forest, SHAP, EBM

Why it matters

Prediction accuracy alone isn't enough when the decision affects people. This project was built around interpretability from the start — understanding exactly which signals drive each prediction so the model's behavior can be audited and trusted.


06 / 06

Wolfie — Emotion-Aware Music Generation

In Design
Generative AI, Deep Learning, Emotion Modeling, Music

Why it matters

Most generative music AI optimizes for statistical plausibility — it sounds like music, but it doesn't feel like anything in particular. Wolfie is built around the opposite goal: emotional coherence first, with harmonic structure serving the feeling rather than the other way around.

Deferred pending hardware — design phase complete