LINR Bridge: Vector Graphic Animation via Neural Implicits and Video Diffusion Priors

Wenshuo Gao, Xicheng Lan, Luyao Zhang, Shuai Yang
Wangxuan Institute of Computer Technology, Peking University, Beijing, China

Abstract

Vector graphics, known for their scalability and user-friendliness, provide a unique approach to visual content compared to traditional pixel-based images. Animation of these graphics, driven by the motion of their elements, offers enhanced comprehensibility and controllability but often requires substantial manual effort. To automate this process, we propose a novel method that integrates implicit neural representations with text-to-video diffusion models for vector graphic animation. Our approach employs layered implicit neural representations to reconstruct vector graphics, preserving their inherent properties such as infinite resolution and precise color and shape constraints, which effectively bridges the large domain gap between vector graphics and diffusion models. The neural representations are then optimized using video score distillation sampling, which leverages motion priors from pretrained text-to-video diffusion models. Finally, the vector graphics are warped to match the representations, resulting in smooth animation. Experimental results validate the effectiveness of our method in generating vivid and natural vector graphic animations, demonstrating significant improvement over existing techniques that suffer from limitations in flexibility and animation quality.
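For orientation, the score distillation sampling gradient that VSDS builds on can be written in its standard form as below. This is a sketch under assumptions: the symbols are illustrative and not taken from the paper (θ for the parameters of the neural representations, g for the renderer that produces a multi-frame video x, y for the text prompt, ε_φ for the pretrained text-to-video denoiser, w(t) for a timestep weighting), and the exact variant used by the method (for example, operating in a latent space or with classifier-free guidance) may differ.

```latex
% Standard SDS gradient applied to a rendered video x = g(\theta).
% All symbols are illustrative assumptions, not the paper's notation.
\begin{aligned}
x_t &= \alpha_t\, x + \sigma_t\, \epsilon,
  \qquad \epsilon \sim \mathcal{N}(0, I), \\
\nabla_\theta \mathcal{L}_{\mathrm{VSDS}}
  &= \mathbb{E}_{t,\epsilon}\!\left[
       w(t)\,\bigl(\epsilon_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
       \frac{\partial x}{\partial \theta}
     \right].
\end{aligned}
```

Intuitively, the denoiser's residual indicates how the noised video should change to look more plausible under the prompt, and that signal is backpropagated through the renderer into the representation parameters.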

Pipeline

The LINR Bridge pipeline takes an SVG and a text prompt as input and produces an animated SVG as output. It consists of three steps: (1) Vector Graphics Reconstruction: optimize a layered implicit neural representation (LINR) network to reconstruct the SVG. (2) Coarse Animation Generation: replicate the network multiple times to construct a multi-frame video, then feed the frames and the text prompt into the text-to-video model and optimize with video score distillation sampling (VSDS) to obtain a coarse animation. (3) Animation Refinement: warp the SVG based on optical flow to match the coarse animation, yielding the final animated SVG.
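Step (3) can be illustrated with a minimal sketch: given dense optical flow between consecutive frames of the coarse animation, each SVG control point is displaced by the flow sampled at its position. Everything here is an assumption made for illustration and not the authors' implementation: the function names `sample_flow` and `warp_control_points` are hypothetical, the flow is assumed to be stored as an `(H, W, 2)` array of per-pixel `(dx, dy)` displacements, and the actual refinement step may instead optimize the SVG parameters against the flow.

```python
import numpy as np


def sample_flow(flow, points):
    """Bilinearly sample a dense flow field at floating-point positions.

    flow:   (H, W, 2) array, flow[y, x] = (dx, dy) in pixels (assumed layout).
    points: (N, 2) array of (x, y) positions.
    Returns an (N, 2) array of displacements.
    """
    h, w, _ = flow.shape
    # Clamp to the image so points on or beyond the border stay valid.
    x = np.clip(points[:, 0], 0, w - 1)
    y = np.clip(points[:, 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx = (x - x0)[:, None]
    fy = (y - y0)[:, None]
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - fx) * flow[y0, x0] + fx * flow[y0, x1]
    bottom = (1 - fx) * flow[y1, x0] + fx * flow[y1, x1]
    return (1 - fy) * top + fy * bottom


def warp_control_points(points, flows):
    """Propagate control points through a sequence of frame-to-frame flows.

    points: (N, 2) control points of the input SVG, in pixel coordinates.
    flows:  list of K arrays of shape (H, W, 2); flows[k] maps frame k to k+1.
    Returns a (K + 1, N, 2) array with one set of control points per frame.
    """
    frames = [np.asarray(points, dtype=float)]
    for flow in flows:
        current = frames[-1]
        frames.append(current + sample_flow(flow, current))
    return np.stack(frames)
```

Because only the control points move, every frame remains a valid vector graphic with the original paths and colors; the per-frame point sets could then be written out as keyframes of an SVG animation.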

Results

Input Image 6, Result 6: "Anime style, Vector 2D art, Yellow spotted butterfly flies and flaps its wings."
Input Image 4, Result 4: "Anime style, Vector 2D art, A woman in a yellow dress is dancing on her feet and hands."
Input Image 1, Result 1: "Anime style, Vector 2D art, A man is waving his hands smoothly."
Input Image 1, Result 1: "Anime style, Vector 2D art, A man is walking on his feet."
Input Image 2, Result 2: "Anime style, Vector 2D art, A cartoon blue dolphin swims and flexes its body smoothly."
Input Image 3, Result 3: "Anime style, Vector 2D art, A cartoon yellow dog is walking."
Input Image 5, Result 5: "The blue round clock with white board and two blue pointers rotates clockwise smoothly."
Input Image 7, Result 7: "Anime style, Vector 2D art, A tree is swinging left and right in the wind."
Input Image 8, Result 8: "Anime style, Vector 2D art, A woman in yellow hair is rowing a boat in the river."
Input Image 9, Result 9: "Anime style, Vector 2D art, A bat is flying and flapping its wings."
Input Image 10, Result 10: "Anime style, Vector 2D art, A balloon is swinging left and right in the wind."

Comparisons

LiveSketch [1]

LiveSketch Results 1-3

AniClipart [2]

AniClipart Results 1-3

Ours

Our Results 1-3

References

1. Gal R, Vinker Y, Alaluf Y, et al. Breathing Life Into Sketches Using Text-to-Video Priors[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024: 4325-4336.

2. Wu R, Su W, Ma K, et al. AniClipart: Clipart Animation with Text-to-Video Priors[J]. arXiv preprint arXiv:2404.12347, 2024.