Efficient ML Systems, Quantization & Compression
Compression and quantization-aware methods for ML workloads where data movement is a bottleneck, with emphasis on practical representations that reduce bandwidth and memory pressure without changing model behavior.
- Low-precision and entropy-aware representations for ML tensors
- Compact codec state and high-throughput compression
Representative: Shannonic; Atalanta; Every Bit Matters
Hardware/Software Co-Design for AI Data Movement
Systems work that connects compression algorithms and quantized data formats to real implementation constraints, including codec state, cache behavior, throughput, hardware cost, and deployment across memory, network, and edge–cloud links.
- Codec-aware software and hardware implementations
- Data movement reduction across system boundaries
Representative: Shannonic; Atalanta
Evaluation Methodology for Efficient Training
Evaluation methods for efficient training systems, with emphasis on fair comparisons, reproducible baselines, and metrics that reflect practical end-to-end training behavior.
- Wall-clock and time-to-target evaluation
- Benchmarking methodology for training acceleration
Representative: Selective Workload Skipping (IHIET-AI 2025)
Computationally Efficient Graphics
Systems work on efficient graphics pipelines, with emphasis on compression, latency, quality, and resource-aware deployment for emerging visual representations.
- Computationally optimal Gaussian splatting
- Rate–distortion–latency trade-offs
Representative: COGS; Gaussian splatting systems
AI-Assisted Educational Systems
Human-in-the-loop AI-assisted grading workflows for large programming courses, focusing on rubric alignment, grading consistency, human oversight, and responsible deployment.
- Rubric-grounded feedback and grade suggestions
- Instructor- and TA-controlled grading workflows
Current work: AI-assisted grading for large programming courses
Selected Prior & Collaborative Work
Broader systems work for on-device neural interfaces, with emphasis on reducing resource requirements while preserving application-level quality.
- Scalable spike sorting for untethered brain-machine interfaces
- Energy- and throughput-aware deployment
Representative: Marple (ASPLOS 2024)