I build high-throughput backend systems and AI retrieval pipelines — architecting services that stay fast and reliable at 10M+ requests a day.
I'm a software engineer who lives in the backend — the queues, caches, and data pipelines that have to stay fast when traffic spikes and stay correct when things fail.
At Samsung Electronics America, I lead a core authentication service handling 120K+ requests a day and a pricing engine serving 10M+. Lately I've gone deep on AI infrastructure: RAG pipelines, vector search, LLM tooling, and MCP agents that do real work in production.
I care about the unglamorous parts — sub-100ms p99s, fault-tolerant ingestion, observability you can actually debug with. The kind of engineering you only notice when it's missing.
A production-grade 5-layer RAG system with async ingestion, OCR fallback, and hierarchical chunking. Resumable embedding means no data is lost when a worker fails. An 8-step retrieval pipeline with cross-encoder reranking lifted answer quality by ~60%, and TTL caching cut latency on repeat queries by 95%+.
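The TTL-caching idea can be sketched roughly as follows. This is a minimal illustration, not the project's code: `TTLCache`, `cached_answer`, and the `retrieve` callback are hypothetical names, and the key normalization (lowercasing and collapsing whitespace) is an assumption about what counts as a "repeat" query.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire ttl_seconds after being set."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._store[key] = (self.clock() + self.ttl, value)


def cached_answer(cache: TTLCache, query: str,
                  retrieve: Callable[[str], Any]) -> Any:
    """Return a cached answer for a repeat query, else run the full pipeline."""
    # Assumed normalization: case-insensitive, whitespace-insensitive keys.
    key = " ".join(query.lower().split())
    hit = cache.get(key)
    if hit is not None:
        return hit
    answer = retrieve(query)  # the expensive retrieval + rerank path
    cache.set(key, answer)
    return answer
```

A repeat query inside the TTL window skips retrieval and reranking entirely, which is where a large drop in repeat latency would come from. The trade-off is that answers can be up to one TTL out of date.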
Celery workers ingest movies, generate embeddings, and persist them to pgvector; an LLM agent then re-ranks the cosine-similarity candidates so results better match natural-language queries. A Transactional Outbox ensures no record is lost between ingestion and vector storage if a worker fails.
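For readers unfamiliar with the Transactional Outbox pattern, here is a minimal sketch. It uses the standard-library `sqlite3` module as a stand-in for Postgres so it runs anywhere. The table names, `ingest_movie`, `relay_once`, and the `embed_and_store` callback are all hypothetical and not taken from the project.

```python
import json
import sqlite3
from typing import Callable


def init_db(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            payload TEXT NOT NULL,
            processed INTEGER NOT NULL DEFAULT 0
        );
    """)


def ingest_movie(conn: sqlite3.Connection, movie_id: int, title: str) -> None:
    """Write the record and its outbox event in one transaction."""
    # `with conn` commits both inserts together, or rolls both back on error.
    with conn:
        conn.execute("INSERT INTO movies (id, title) VALUES (?, ?)",
                     (movie_id, title))
        conn.execute("INSERT INTO outbox (payload) VALUES (?)",
                     (json.dumps({"movie_id": movie_id, "title": title}),))


def relay_once(conn: sqlite3.Connection,
               embed_and_store: Callable[[dict], None]) -> int:
    """Process pending outbox rows in order; return how many were completed."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE processed = 0 ORDER BY id"
    ).fetchall()
    done = 0
    for outbox_id, payload in rows:
        # If this raises (e.g. the worker dies), the row stays pending
        # and is picked up again on the next relay pass.
        embed_and_store(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET processed = 1 WHERE id = ?",
                         (outbox_id,))
        done += 1
    return done
```

Because a row is marked processed only after the downstream step succeeds, delivery is at-least-once. That means the embedding write should be idempotent, for example an upsert keyed on the movie id.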
A hybrid threading + asyncio model parallelizes real-time price streams across instruments. Event-driven per-instrument task queues enable low-latency RSI signal processing and automated order execution — without shared-state contention.
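The per-instrument queue idea might look roughly like the sketch below. It covers only the asyncio side (no threads, no order execution), and `rsi`, `instrument_worker`, and `process_ticks` are hypothetical names. The RSI here uses a simple average over the last `period` changes rather than Wilder's smoothing, and returning 50.0 for a completely flat window is an assumed convention.

```python
import asyncio
from collections import deque
from typing import Deque, Dict, Iterable, List, Optional, Tuple


def rsi(prices: Iterable[float], period: int = 14) -> Optional[float]:
    """Simple-average RSI over the last `period` price changes."""
    window = list(prices)[-(period + 1):]
    if len(window) < period + 1:
        return None  # not enough data yet
    changes = [b - a for a, b in zip(window, window[1:])]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    if gains == 0 and losses == 0:
        return 50.0  # flat window: assumed neutral value
    if losses == 0:
        return 100.0
    rs = gains / losses
    return 100.0 - 100.0 / (1.0 + rs)


async def instrument_worker(queue: "asyncio.Queue[Optional[float]]",
                            period: int) -> List[float]:
    """Consume one instrument's ticks; price history is local to this task."""
    prices: Deque[float] = deque(maxlen=period + 1)
    signals: List[float] = []
    while True:
        price = await queue.get()
        if price is None:  # sentinel: stream closed
            return signals
        prices.append(price)
        value = rsi(prices, period)
        if value is not None:
            signals.append(value)


async def process_ticks(ticks: Iterable[Tuple[str, float]],
                        period: int = 14) -> Dict[str, List[float]]:
    """Route each tick to its instrument's queue and collect RSI values."""
    queues: Dict[str, asyncio.Queue] = {}
    tasks: Dict[str, asyncio.Task] = {}
    for symbol, price in ticks:
        if symbol not in queues:
            queues[symbol] = asyncio.Queue()
            tasks[symbol] = asyncio.create_task(
                instrument_worker(queues[symbol], period))
        await queues[symbol].put(price)
    for q in queues.values():
        await q.put(None)  # tell every worker its stream has ended
    results = await asyncio.gather(*tasks.values())
    return dict(zip(tasks.keys(), results))
```

Each instrument's price history lives inside its own task, so no locks are needed. A slow instrument only backs up its own queue.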
Built with Next.js (SSR + App Router) and Pusher pub-sub for real-time bidirectional events — no polling overhead. Hybrid persistence via NeonDB + local storage, with CDN edge caching for sub-50ms global asset delivery.
Things I've learned building systems under load — RAG internals, latency hunting, and architecture decisions.
// Posts coming soon — check back shortly.