Accelerated Likelihood Maximization for Diffusion-based Versatile Content Generation

¹ECE, Seoul National University; ²INMC & IPAI, Seoul National University
ECCV 2026
ALM teaser showing versatile content generation results across multiple tasks
TL;DR: ALM is a training-free sampling strategy that directly optimizes unobserved regions during diffusion sampling, turning pretrained models into versatile content generators.

Abstract

Generating diverse, coherent, and plausible content from partially given inputs remains a fundamental challenge for diffusion models. Existing approaches face clear limitations: training-based approaches offer strong task-specific results but require costly computation, and they generalize poorly across tasks. Training-free approaches offer better efficiency, but they do not explicitly optimize over unobserved variables, leading to globally inconsistent results. To address these limitations, we introduce Accelerated Likelihood Maximization (ALM), a novel training-free sampling strategy integrated into the reverse diffusion process that significantly extends the applicability of diffusion models beyond simple generation tasks. Unlike previous methods that implicitly influence missing regions through pre-generated region constraints, we directly optimize the unobserved region during the sampling process, enabling globally coherent and plausible generation. Furthermore, we incorporate an acceleration strategy that significantly improves computational efficiency without sacrificing performance. Experimental results demonstrate that ALM consistently outperforms state-of-the-art methods in various data domains and tasks, establishing a powerful paradigm for versatile content generation.

Introduction

  • Versatile content generation goes beyond generate-from-scratch settings by conditioning on partially observed or pre-generated inputs, covering inpainting, outpainting, motion completion, and 3D view-consistent generation.
  • Training-based methods can achieve strong task-specific performance, but training is costly and the resulting models remain domain-limited; training-free synchronization methods are efficient, but often guide missing content only indirectly.
  • ALM directly optimizes the unobserved region inside reverse diffusion by combining contextual consistency with full-sample realism, while keeping the pretrained model fixed.
  • A one-step acceleration strategy makes this optimization practical while preserving strong performance.

Method

ALM modifies only the reverse diffusion sampler. At each denoising step, it keeps the pre-generated content fixed as context, blends it with the current unobserved variable, and asks the pretrained denoiser how the missing region should move.
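The blending step described above can be sketched as follows. This is a minimal illustration in NumPy under stated assumptions, not the authors' implementation: `blend`, `toy_denoiser`, `masked_step`, the step size `eta`, and the noise schedule (variance equal to `t`) are all hypothetical choices made here, and the toy denoiser stands in for a pretrained network.

```python
import numpy as np

def blend(context, unobserved, mask):
    """Compose a full sample: mask == 1 marks fixed context, mask == 0 the unobserved region."""
    return mask * context + (1.0 - mask) * unobserved

def toy_denoiser(x, t):
    """Hypothetical stand-in for a pretrained noise predictor.

    Assumes clean data is standard normal and the noise variance is t,
    in which case the exact noise prediction is sqrt(t) * x / (1 + t).
    A real model would be a neural network.
    """
    return np.sqrt(t) * x / (1.0 + t)

def masked_step(context, unobserved, mask, t, eta=0.1):
    """Move only the unobserved region, using the denoiser's view of the blended sample."""
    x = blend(context, unobserved, mask)
    eps = toy_denoiser(x, t)
    # Standard conversion from predicted noise to a score estimate (sigma_t = sqrt(t) assumed).
    score = -eps / np.sqrt(t)
    # Only entries with mask == 0 are updated; the context is never modified.
    return unobserved + eta * (1.0 - mask) * score

context = np.ones(4)
unobserved = 2.0 * np.ones(4)
mask = np.array([1.0, 1.0, 0.0, 0.0])
updated = masked_step(context, unobserved, mask, t=1.0)
full_sample = blend(context, updated, mask)
```

In this toy setting the context entries of `full_sample` stay at their original values, while the unobserved entries move a small step in the direction the denoiser indicates.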

ALM is based on score-based likelihood maximization: the update follows score estimates of a composite objective over the unobserved region. It combines a conditional likelihood term for contextual consistency with a joint log-density term for realism under the diffusion prior. The acceleration strategy collapses iterative optimization into one update per step, making the same mechanism practical for versatile content generation scenarios.
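This page does not state the objective in symbols. One plausible way to write it is given below, with notation assumed here rather than taken from the paper: $x_o$ is the observed (pre-generated) region, $x_u$ the unobserved region, $\lambda$ a weighting factor, and $\eta$ a step size. The direction of the conditional term is also an assumption.

```latex
% Notation and weighting are assumed for illustration; the paper's exact formulation may differ.
\begin{align}
\mathcal{J}(x_u) &= \underbrace{\log p(x_o \mid x_u)}_{\text{contextual consistency}}
  + \lambda \, \underbrace{\log p(x_u, x_o)}_{\text{realism of the full sample}}, \\
x_u &\leftarrow x_u + \eta \, \nabla_{x_u} \mathcal{J}(x_u), \\
\nabla_{x} \log p_t(x) &\approx -\frac{\epsilon_\theta(x, t)}{\sigma_t}.
\end{align}
```

The last line is the standard relation between a noise-prediction network $\epsilon_\theta$ and the score of the noised distribution, which is what allows a fixed pretrained denoiser to supply the gradients. Under this reading, the acceleration strategy corresponds to applying the update in the second line once per denoising step rather than iterating it.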

Method overview. ALM optimizes the unobserved region during reverse diffusion with a one-step accelerated update.

Experimental Results

We evaluate ALM on (1) image inpainting, (2) wide image generation, (3) 3D mesh texturing, (4) long video generation, and (5) human motion completion.

Image Inpainting

ALM directly optimizes the unobserved image region while preserving the pre-generated content, improving global consistency across diverse masks and prompts.

Image inpainting comparison. ALM harmonizes missing regions with pre-generated context.
Additional image inpainting results. ALM handles diverse layouts and contexts, producing high-quality results.
Results on diverse backbone architectures. ALM works across diverse backbones, including unconditional diffusion, SDXL, and FLUX.

Wide Image Generation

For wide image generation, ALM performs autoregressive outpainting while explicitly optimizing newly generated regions to remain compatible with previous patches.
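The autoregressive layout can be sketched as a sliding window over the canvas. This is a hedged illustration only: `outpainting_windows`, the fixed patch width, and the overlap parameter are hypothetical and not taken from the paper. For each patch, the function reports how many leading columns were already generated (the context) so the rest can be treated as the unobserved region.

```python
def outpainting_windows(total_width, patch_width, overlap):
    """Return (start, end, context_width) for each patch, left to right.

    context_width is the number of leading columns already generated by
    earlier patches; the remaining columns are the unobserved region.
    """
    if not 0 <= overlap < patch_width:
        raise ValueError("overlap must satisfy 0 <= overlap < patch_width")
    if total_width < patch_width:
        raise ValueError("total_width must be at least patch_width")
    stride = patch_width - overlap
    windows = []
    start = 0
    while True:
        # Clamp the last patch so it ends exactly at the canvas edge.
        start = min(start, total_width - patch_width)
        covered = windows[-1][1] if windows else 0
        context = max(covered - start, 0)
        windows.append((start, start + patch_width, context))
        if start + patch_width >= total_width:
            break
        start += stride
    return windows

windows = outpainting_windows(total_width=160, patch_width=64, overlap=16)
```

The first patch has no context and is generated from scratch; every later patch conditions on the columns it shares with what has already been produced.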

Wide image generation comparison. ALM reduces discontinuities, blur, and color inconsistency.
Additional wide image generation results. ALM extends scenes with coherent global structure.

3D Mesh Texturing

ALM generates multi-view consistent textures by optimizing each new view with respect to the already generated context, then aggregating the resulting views into a mesh texture.
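The page does not say how the views are aggregated. A common choice, assumed here purely for illustration, is a per-texel weighted average in which each view contributes only where it sees the surface. The function name `aggregate_views` and the array layout are hypothetical.

```python
import numpy as np

def aggregate_views(view_colors, view_weights):
    """Blend per-view texel colors into one texture by weighted averaging.

    view_colors:  (V, T, 3) color each of V views assigns to each of T texels
    view_weights: (V, T) non-negative weights, 0 where a texel is not visible
    Returns a (T, 3) texture; texels seen by no view stay at zero.
    """
    view_colors = np.asarray(view_colors, dtype=float)
    view_weights = np.asarray(view_weights, dtype=float)
    total = view_weights.sum(axis=0)                                 # (T,)
    weighted = (view_weights[..., None] * view_colors).sum(axis=0)   # (T, 3)
    # Avoid division by zero for texels that no view covers.
    safe_total = np.where(total > 0, total, 1.0)
    return weighted / safe_total[:, None]

red = np.tile([1.0, 0.0, 0.0], (3, 1))    # view 0 paints every texel red
blue = np.tile([0.0, 0.0, 1.0], (3, 1))   # view 1 paints every texel blue
weights = np.array([[1.0, 1.0, 0.0],      # view 0 sees texels 0 and 1
                    [1.0, 0.0, 0.0]])     # view 1 sees texel 0 only
texture = aggregate_views(np.stack([red, blue]), weights)
```

Texels covered by several views receive a blend, so any disagreement between views shows up as blur or seams, which is why consistency across views matters before aggregation.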

Comparison with Baselines

Additional Results

Long Video Generation

ALM extends short text-to-video generations into longer videos by autoregressively generating future segments that remain consistent with previous frames.

Human Motion Completion

ALM also applies to human motion completion, including first-half, middle-half, and last-half prediction. Given frames are shown in orange, and generated frames are shown in blue.
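The three settings differ only in which frames are treated as unobserved. The sketch below builds such a mask, assuming that "first-half prediction" means the first half of the sequence is generated and the rest is given, and likewise for the other two modes. That reading, the function name `generated_frame_mask`, and the mode strings are assumptions made here.

```python
import numpy as np

def generated_frame_mask(num_frames, mode):
    """Boolean mask over frames: True = frame to generate, False = given frame."""
    half = num_frames // 2
    mask = np.zeros(num_frames, dtype=bool)
    if mode == "first":
        mask[:half] = True
    elif mode == "middle":
        # Center a window of length `half` inside the sequence.
        start = (num_frames - half) // 2
        mask[start:start + half] = True
    elif mode == "last":
        mask[num_frames - half:] = True
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return mask

first = generated_frame_mask(8, "first")
middle = generated_frame_mask(8, "middle")
last = generated_frame_mask(8, "last")
```

In the visualizations, the `False` entries would correspond to the given frames and the `True` entries to the generated ones.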

Comparison with Baselines

Additional Results

Ablation Study

The ablation study isolates the contribution of each component: optimizing the unobserved region at all, the conditional likelihood term, the joint log-density term, and the acceleration strategy.

Ablation study. The full objective improves consistency and plausibility; acceleration preserves quality.
