RichardForests's Collections

Multimodal

updated Feb 24, 2024

  • Stable Video Diffusion 1.1

    Space • Running on Zero • MCP • Featured • 1.99k

    Generate a video from a single image

  • Generative Multimodal Models are In-Context Learners

    Paper • 2312.13286 • Published Dec 20, 2023 • 36

  • COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

    Paper • 2401.00849 • Published Jan 1, 2024 • 17

  • TheBloke/Sonya-7B-GPTQ

    Text Generation • 7B • Updated Dec 31, 2023 • 13 • 2

  • TextDiffuser 2

    Space • Runtime error • Featured • 142

    Generate images from text prompts with layout planning

  • AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

    Paper • 2402.12226 • Published Feb 19, 2024 • 45