Jinbin Bai

I received my high school diploma from the Affiliated High School of Shanxi University and my B.S. in Computer Science from Nanjing University. After that, I studied in the Department of Computer Science at the National University of Singapore and founded MeissonFlow Research (see the Organization Card for more details) to develop the masking paradigm in generative modeling.

I am trying to find ways to build interactive models and algorithms for content creation. I want to build the world from a visual prior, though I sadly agree that the language prior dominates current unified models. I love imagination and I love astronomy, so I made an analogy in the figure below.

Email / Google Scholar / GitHub / Hugging Face / X


Jinbin in Cambridge, UK, 2025.

Research

Universal gravity pulls matter into wells. Gradient descent pulls models into minima. Optimization landscapes are just the gravity wells of learning. Some minima (like Mercury) are just... hard to reach.

Why "noyii"

The name noyii is a small, personal echo of Noÿs from Isaac Asimov’s The End of Eternity. In the final chapters of the novel, Noÿs reveals what Eternity’s obsession with “optimizing” history really does to humanity: by surgically removing risk and deviation, it also removes the branches on which any ambitious future could grow.

In the novel, Noÿs explains that while humanity remained safely on Earth for 125,000 Centuries, younger civilizations had caught up, passed them, and colonized the entire Galaxy. When humanity finally looked outward, they found the stars barred against them.

“When we moved out into space, the signs were up. Occupied! No Trespassing! Clear Out! Mankind drew back its exploratory feelers, remained at home. But now he knew Earth for what it was: a prison surrounded by an infinity of freedom . . . And mankind died out!”
Excerpt From: Isaac Asimov. “The End of Eternity.”

It wasn't a sudden cataclysm but a slow suffocation of the soul, caused by the lack of a frontier. Noÿs describes the ultimate fate of this "safe" humanity:

“They didn’t just die out. It took thousands of Centuries. There were ups and downs but, on the whole, there was a loss of purpose, a sense of futility, a feeling of hopelessness that could not be overcome. Eventually there was one last decline of the birth rate and finally, extinction. Your Eternity did that.”
Excerpt From: Isaac Asimov. “The End of Eternity.”

noyii stands for the alternative. It aligns with Elon Musk's philosophy that we must take risks now to ensure that we become the "first true colonists" later.

That is why, for a long time, I have anchored my readme.md with these two defining quotations, a testament to the timeline I choose to inhabit:

“With that disappearance, he knew, even as Noÿs moved slowly into his arms, came the end, the final end of Eternity.

– And the beginning of Infinity.”

Excerpt From: Isaac Asimov. “The End of Eternity.”

“In the course of things children will be born, and families raised on Mars—the first true colonists of a new branch of human civilization.”

Excerpt From: Robert Zubrin. “The Case for Mars.”

News

  • 2025-09   Two papers accepted by NeurIPS 2025.
  • 2025-06   Two papers accepted by ICCV 2025.
  • 2025-04   One paper accepted by CVPR 2025 AI for Content Creation Workshop.
  • 2025-04   One paper accepted by IJCAI 2025.
  • 2025-04   Invited Talk from Riot Video Games.
  • 2025-03   Received the Frontier Top Ten Young Scholars Award (1st place) from Century Frontier Asset Management.
  • 2025-03   Invited Talk from University of Illinois Urbana-Champaign (UIUC).
  • 2025-02   One paper accepted by CVPR 2025.
  • 2025-01   One paper accepted by ICLR 2025, see you in Singapore!
  • 2024-12   One paper accepted by AAAI 2025.
  • 2024-11   Invited Talk from Safe SuperIntelligence (SSI) Club.
  • 2024-04   One paper accepted by IJCAI 2024, see you in Jeju!
  • 2023-08   One paper accepted by BMVC 2023.
  • 2023-07   Two papers accepted by ACM MM 2023.
  • 2023-07   Two papers accepted by ICCV 2023.
  • 2023-06   Taming Diffusion Models for Music-driven Conducting Motion Generation accepted by the AAAI 2023 Summer Symposium and received the Best Paper Award.
  • 2023-05   One paper accepted by ICIP 2023, see you in Kuala Lumpur!
  • 2023-02   Translating natural language to planning goals with large-language models now on arXiv.
  • 2022-11   One paper accepted by ACCV 2022.
  • 2022-06   LaT: Latent Translation with Cycle-Consistency for Video-Text Retrieval now on arXiv.
  • 2021-03   Named Outstanding Graduate of Nanjing University.
  • 2019-03   Named Outstanding Student of Nanjing University.

Selected Publications

From Masks to Worlds: A Hitchhiker’s Guide to World Models
Jinbin Bai, Yu Lei, Hecong Wu, Yuchen Zhu, Shufan Li, Yi Xin, Xiangtai Li, Molei Tao, Aditya Grover, Ming-Hsuan Yang
Technical Report 2025
[Paper] [GitHub]
A Hitchhiker’s guide for those who want to build worlds. We follow one clear road: from early masked models, to unified architectures that share a single paradigm, then to interactive generative models, and finally to memory-augmented systems that sustain consistent worlds over time.

Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
Alpha VLLM Team
Technical Report 2025
[Paper] [Model] [Code]
Lumina-DiMOO is a unified masked diffusion model that not only generates high-resolution images but also supports multimodal capabilities including text-to-image, image-to-image, and image understanding. It achieves SOTA performance and enables a novel application: Interactive Retouching!

Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model
MeissonFlow Research
Technical Report 2025
[Paper] [Model] [Code]
Muddit (officially Meissonic II) is a unified masked diffusion model that not only generates high-resolution images but also supports multimodal capabilities including text-to-image, image-to-text, and VQA. We verified that a unified model can be trained from the visual prior learned by Meissonic!

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
MeissonFlow Research
ICLR 2025
[Paper] [Model] [Code] [Demo] [Discord_Discussion] [Tutorial_EN] [Tutorial_JA] [Media_Report_CN]
Meissonic is a text-to-image masked diffusion model that generates high-resolution images and is designed to run on consumer graphics cards. The figure on the left was generated by Meissonic.

Miscellaneous

  • I am a huge fan of Cities: Skylines, and I love designing and simulating cities. I couldn't wait for the release of Cities: Skylines II on October 24, 2023! I also attended the World Cities Summit (WCS) 2024.
  • My favorite movie of recent years is Free Guy, and I dream of designing a game like the one in it.
  • I enjoy traveling and have visited 13 countries. Guess where I have been!
  • I like swimming, diving, surfing, and sunny beaches.