Jiakai Chen

About

I am a Master of Computer Science student at the University of Illinois Urbana-Champaign, with broad interests in multimodal representation learning, robot learning, and embodied AI. I am currently a research intern at the Rehg Lab at UIUC, where I work on pose estimation from real-world visual data.

Prior to UIUC, I was admitted to the PhD program in Finance at the University of Rochester, where I later earned a master’s degree. Deciding to master out and start down a new path was one of the most difficult, and most important, decisions of my academic journey. My training at Rochester gave me a strong foundation in research, quantitative reasoning, and principled problem formulation. Over time, I realized that my core interests lie in AI and robotics, with the goal of building intelligent systems that can perceive, reason, and act in the real world. This experience not only shaped my research perspective but also gave me the clarity and conviction to pursue the problems that matter most to me.

I received my bachelor’s degree in Financial Technology, with a second major in Computer Science, from the University of Hong Kong, where I built a strong foundation in quantitative analysis and computing, and developed an early interest in machine learning through coursework.

jiakaic3 [at] illinois [dot] edu · @jiaka1chen · LinkedIn · Resume

News

  • [Apr 2026] I will be attending ICLR 2026 and would be happy to connect in person. I am actively looking for research opportunities in robot learning and embodied AI.
  • [Mar 2026] ProCLIP: Product Space Multimodal Contrastive Alignment was accepted to the GRaM Workshop @ ICLR 2026.

Research Interest

My research interests lie in multimodal representation learning, robot learning, and embodied AI, with an emphasis on building unified frameworks that connect perception, representation, and action. I focus on geometry-aware and structure-preserving representations for modeling heterogeneous multimodal data. My ultimate goal is to enable scalable transfer from visual data to real-world robotic systems, advancing generalizable and physically grounded embodied intelligence.

  • Learning transferable action representations from video: Addressing data scarcity in robot learning by extracting reusable motion priors from large-scale human videos, including pose estimation and latent action abstraction for cross-embodiment transfer.
  • Geometry-aware multimodal alignment: Developing representation spaces that capture heterogeneous semantics via mixed geometries, and enabling dynamic alignment across modalities at multiple levels.
  • Towards unified representation and generation: Exploring frameworks that integrate representation learning with generative modeling to capture both structure and dynamics of the world.

Selected Projects

ProCLIP: Product Space Multimodal Contrastive Alignment

Jiakai Chen and Hangke Sui. GRaM Workshop @ ICLR 2026. [paper]

This work challenges the standard single-manifold assumption in multimodal learning by introducing a mixed-curvature product space that models hierarchical, angular, and continuous semantics in a unified framework. By replacing cosine similarity with a geometry-aware metric, it provides a principled and lightweight drop-in improvement to CLIP-style models. The approach consistently outperforms single-geometry baselines on image–text retrieval tasks.
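To make the idea concrete, here is a minimal PyTorch sketch of what a product-space similarity could look like. It illustrates the general technique rather than the paper's implementation: the component split (256 Euclidean, 128 spherical, 128 hyperbolic dimensions), the curvature, the component weights, and the projection onto the Poincaré ball are all hypothetical choices.

```python
import torch
import torch.nn.functional as F

def to_ball(x, max_norm=0.9, eps=1e-6):
    # Hypothetical smooth projection of arbitrary vectors into the open unit ball.
    n = x.norm(dim=-1, keepdim=True).clamp_min(eps)
    return x * (max_norm * torch.tanh(n) / n)

def poincare_dist(u, v, c=1.0, eps=1e-6):
    # Geodesic distance on the Poincare ball of curvature -c.
    sq = lambda t: t.pow(2).sum(-1)
    num = sq(u - v)
    den = (1 - c * sq(u)).clamp_min(eps) * (1 - c * sq(v)).clamp_min(eps)
    return torch.acosh((1 + 2 * c * num / den).clamp_min(1 + eps)) / c ** 0.5

def sphere_dist(u, v, eps=1e-6):
    # Geodesic (angular) distance on the unit sphere.
    u, v = F.normalize(u, dim=-1), F.normalize(v, dim=-1)
    return torch.acos((u * v).sum(-1).clamp(-1 + eps, 1 - eps))

def product_similarity(x, y, dims=(256, 128, 128), w=(1.0, 1.0, 1.0)):
    # Split each (here 512-d) embedding into Euclidean / spherical / hyperbolic
    # factors and return the negative product-space distance as a logit.
    xe, xs, xh = torch.split(x, list(dims), dim=-1)
    ye, ys, yh = torch.split(y, list(dims), dim=-1)
    d_e = (xe - ye).norm(dim=-1)
    d_s = sphere_dist(xs, ys)
    d_h = poincare_dist(to_ball(xh), to_ball(yh))
    # Squared-sum combination: the standard product-manifold metric.
    return -(w[0] * d_e**2 + w[1] * d_s**2 + w[2] * d_h**2).sqrt()

def clip_style_loss(img, txt, tau=0.07):
    # Symmetric InfoNCE with the geometry-aware logit in place of cosine similarity.
    logits = product_similarity(img.unsqueeze(1), txt.unsqueeze(0)) / tau
    targets = torch.arange(img.size(0), device=img.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

# Usage: img, txt = torch.randn(8, 512), torch.randn(8, 512); loss = clip_style_loss(img, txt)
```

Since only the similarity inside the InfoNCE logits changes, such a metric can slot into an existing CLIP-style training loop without touching the encoders, which is what makes a geometry-aware metric a lightweight drop-in modification.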