01 / ABOUT

I'm a Master's student in Computer Science at Northeastern, focused on GenAI — RAG pipelines, agents, and the eval harnesses and guardrails that decide whether an LLM feature is trustworthy enough to ship.

Underneath that is a backend engineer: I build the distributed systems — Go services, AWS infra, the production plumbing — that turn a model into something people can actually rely on.

When I'm not at a keyboard, I'm probably on a chairlift at Whistler, behind a camera, or going way too deep into NASA space documentaries.

02 / EDUCATION
2024 — Now

MS, Computer Science · Northeastern University

Focused on machine learning and the cloud infrastructure that AI systems run on. Coursework: Machine Learning, Cloud Computing, Distributed Systems, Algorithms.

2020 — 2024

BSc, Economics & Finance · Minzu University of China

Pivoted into software toward the end of undergrad; that finance background still shapes how I think about engineering tradeoffs.

View Full Résumé
03 / PROJECTS
[ Bedrock RAG · 2-region AWS ]
AI · BACKEND

CanPlan — RAG Task Planner

A production RAG backend on AWS that turns a daily-living goal into grounded, step-by-step plans, each citing its source. An eval harness caught the guardrail over-flagging 96% of outputs; three measured rounds of fixes brought the pipeline back to 100% valid citations and a grade-6 reading level.

TypeScript · AWS CDK · Bedrock · OpenSearch · DynamoDB · RAG Eval
[ F1 0.83 · 1.31% params ]
AI · MACHINE LEARNING

Financial Sentiment Analysis & Agent

Fine-tuned a small language model to read financial news as bullish, bearish, or neutral, reaching F1 0.83 while training only 1.31% of its parameters (a 3.4 MB adapter), then wrapped it in a chat agent that pulls live headlines end-to-end.

Live Demo
Python · LLM · Fine-tuning · LangChain
[ ~1,884 req/s · p95 −21% ]
BACKEND

Distributed Order Processing System

An e-commerce order backend split into small services that never oversell stock, even under load — sustaining ~1,884 req/s with p95 latency cut 21% (140 → 110 ms) and zero oversell.

Go · Gin · RabbitMQ · AWS · Prometheus · Grafana · Docker
[ 8.7× smaller · 0.05 MB ]
MACHINE LEARNING

VAD Model Compression

Compressed a CRDNN speech-detection model 8.7× (0.435 → 0.050 MB) at near-baseline F1 so it runs on phones and embedded chips, then dug into exactly where a few predictions quietly slipped.

PyTorch · SpeechBrain · Model Compression
[ 30 GB+ ETL · 26 workers ]
BACKEND

Distributed Book Recommendation

A book recommender that chews through 30 GB+ of public library data across 26 workers / 27 partitions, distilling a 3,574-feature catalog down ~14× to ~50K — figuring out which steps actually benefit from going parallel.

Python · Spark · AWS · MongoDB
[ Full-stack · React · Express ]
BACKEND

Movify — Movie Social Network

A full-stack movie social network where people register, follow each other, and post reviews — then like and vote on both movies and reviews. A React + Redux front end talks to an Express / MongoDB API that handles sessions and the social graph, and proxies live movie data from the TMDB API.

React · Redux Toolkit · TypeScript · Express · MongoDB · TMDB API
04 / CONTACT

Let's build something great

Looking for an AI engineer for '27 — or just want to talk RAG, agents, and ski lines? My inbox is open.

Email me
© 2026 Theodore Pei · Vancouver, BC