frablock

AI & ML interests

None yet

Recent Activity

upvoted a paper 5 days ago

Towards Interactive Intelligence for Digital Humans

upvoted a paper 5 days ago

Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition

reacted to Xenova's post with 🔥 10 days ago

Okay this is insane... WebGPU-accelerated semantic video tracking, powered by DINOv3 and Transformers.js! 🤯
Demo (+ source code): https://huggingface.co/spaces/webml-community/DINOv3-video-tracking

This will revolutionize AI-powered video editors... which can now run 100% locally in your browser, no server inference required (costs $0)! 😍

How does it work? 🤔
1️⃣ Generate and cache image features for each frame
2️⃣ Create a list of embeddings for selected patch(es)
3️⃣ Compute cosine similarity between each patch and the selected patch(es)
4️⃣ Highlight those whose score is above some threshold
... et voilà! 🥳

You can also make selections across frames to improve temporal consistency! This is super useful if the object changes its appearance slightly throughout the video.

Excited to see what the community builds with it!
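
Steps 2 to 4 of the post above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the demo's actual code (which runs in the browser with Transformers.js): the toy 2-D vectors stand in for cached DINOv3 patch features, the names `cosine_similarity` and `highlight_patches` are hypothetical, the threshold of 0.6 is arbitrary, and taking the maximum score over several selected patches is one plausible way to combine them.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def highlight_patches(
    frame_patches: Sequence[Vector],
    selected: Sequence[Vector],
    threshold: float = 0.6,  # hypothetical value, not taken from the demo
) -> List[bool]:
    """Return one flag per patch: True if it resembles any selected patch."""
    mask = []
    for patch in frame_patches:
        # Assumption: with several selections, keep the best (maximum) score.
        score = max(
            (cosine_similarity(patch, ref) for ref in selected),
            default=-1.0,  # no selection means nothing is highlighted
        )
        mask.append(score > threshold)
    return mask


# Toy stand-ins for cached per-frame patch features (step 1 is not shown).
frame = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
selection = [[1.0, 0.0]]
mask = highlight_patches(frame, selection)
print(mask)
```

Appending embeddings picked from other frames to `selection` corresponds to the cross-frame selections the post mentions: a patch is highlighted if it is close to any of the reference embeddings.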

Organizations

None yet

upvoted 2 papers 5 days ago

Towards Interactive Intelligence for Digital Humans

Paper • 2512.13674 • Published 13 days ago • 11

Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition

Paper • 2512.15603 • Published 11 days ago • 56

upvoted 2 papers 19 days ago

Preserving Source Video Realism: High-Fidelity Face Swapping for Cinematic Quality

Paper • 2512.07951 • Published 20 days ago • 47

Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance

Paper • 2512.08765 • Published 19 days ago • 126

upvoted a paper 4 months ago

Lumen: Consistent Video Relighting and Harmonious Background Replacement with Video Generative Models

Paper • 2508.12945 • Published Aug 18 • 14

upvoted 6 papers 5 months ago

MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning

Paper • 2507.16812 • Published Jul 22 • 63

PUSA V1.0: Surpassing Wan-I2V with $500 Training Cost by Vectorized Timestep Adaptation

Paper • 2507.16116 • Published Jul 22 • 11

NeuralOS: Towards Simulating Operating Systems via Neural Generative Models

Paper • 2507.08800 • Published Jul 11 • 80

The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs

Paper • 2507.11097 • Published Jul 15 • 64

NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining

Paper • 2507.14119 • Published Jul 18 • 58

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction

Paper • 2507.15852 • Published Jul 21 • 38

upvoted 4 papers 6 months ago

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1 • 246

Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

Paper • 2506.23918 • Published Jun 30 • 89

WebSailor: Navigating Super-human Reasoning for Web Agent

Paper • 2507.02592 • Published Jul 3 • 123

LongAnimation: Long Animation Generation with Dynamic Global-Local Memory

Paper • 2507.01945 • Published Jul 2 • 76

upvoted a paper 9 months ago

Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model

Paper • 2504.08685 • Published Apr 11 • 130