Wan – Open-source alternative to VEO 3

(github.com)

220 points | by modinfo 240 days ago

13 comments

bsenftner
240 days ago
If you want to play with this, as in really play, with over a dozen variant models with acceleration loras and a vibrant community, ya gotta check out:
https://github.com/deepbeepmeep/Wan2GP
And the discord community: https://discord.gg/g7efUW9jGV
"Wan2GP" is AI video and images "for the GPU poor": you can get all this running with as little as 6GB of VRAM, Nvidia only.
- diggan
  240 days ago
  On the other side, are there any projects focusing on performance instead? I have the VRAM available to run Wan2.1, but it still takes minutes per frame. Basically something like what vLLM is for running local LLM weights, but for video/WAN?
  - bsenftner
    240 days ago
    This person here has accelerator loras that reduce the compute from 30+ steps to 4 and 8 steps with minimal quality loss: https://huggingface.co/Kijai/WanVideo_comfy
    There are a lot of people focused on performance, using various methods, just as there are a lot of people focused on non-performance issues like fine-tunes that add aspects the models lack, such as terminology linking professional media terms to the model, pop culture terminology the model does not know, accuracy of body posture during fight, dance, gymnastic, and sports activity, and then less flashy but pragmatic actions like proper use of tableware, chopsticks, keyboards and musical instruments: complex actions that stand out when done incorrectly or never shown. The model's knowledge is broad but has limits, which people are extending.
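    As a rough back-of-the-envelope sketch of what that step reduction means: assuming sampling time scales roughly linearly with the number of denoising steps (which ignores fixed costs such as text encoding and VAE decoding), the ratio of model evaluations looks like this. The function name is made up for illustration.

```python
def step_speedup(baseline_steps: int, accelerated_steps: int) -> float:
    """Ratio of denoising-model evaluations, baseline vs. accelerated run."""
    if baseline_steps <= 0 or accelerated_steps <= 0:
        raise ValueError("step counts must be positive")
    return baseline_steps / accelerated_steps


# 30 steps is the lower bound of the "30+" figure mentioned above.
for steps in (8, 4):
    ratio = step_speedup(30, steps)
    print(f"30 -> {steps} steps: about {ratio:.2f}x fewer model evaluations")
```

    Actual wall-clock gains depend on the rest of the pipeline, so treat these ratios as an upper bound on the speedup from step reduction alone.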
    - bsenftner
      240 days ago
      There is also a ton of Wan video activity in the ComfyUI community. Every day for a while, about two weeks ago, ComfyUI shipped updates specific to Wan 2.2 video integration in the standard installation. ComfyUI is a significantly more complex application than Wan2GP, though.
- bobajeff
  240 days ago
  If having only 6GB VRAM is GPU poor then I must be GPU destitute.
  - hirako2000
    240 days ago
    It's hard to find a consumer Nvidia card with less than 12GB of VRAM, and not just these days.
    By GPU poor they didn't mean GPU-less or a GPU from the previous decade. The readme states that only Nvidia is supported.
    - giancarlostoro
      239 days ago
      That doesn't stop Mac / iPhone from using these models. I've built videos with Wan 2.2 on my wife's M4 Mac Mini w/ 24GB of RAM. It might take a little longer to render though ;)
      - zakki
        239 days ago
        Nice. Any reference for the installation?
          - giancarlostoro
            239 days ago
            The Draw Things app in the App Store is free, open source and holds your hand.
    - hypercube33
      240 days ago
      I wish they'd state suggested or required hardware upfront.
      It's also disappointing that I haven't seen anything target the new Ryzen AI chips that can do 96GB, since they seem pretty capable. I'm not sure how much memory on an M4 Pro on the Apple side can be utilized for this, but it seems like the typical machines are 48 or 64GB these days. A lot more bang for your buck than an Nvidia card, on paper?
      - pkroll
        239 days ago
        Well, they sort of do: they keep referring to the 4090 on their GitHub and primary promotional pages (https://wan.video/).
        But all the various video models really want an 80+ GB VRAM card to run comfortably. The contortions the ComfyUI community goes through to get things running at a reasonable speed on the current, dinky-sized VRAM consumer cards are impressive.
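        For a rough sense of why these cards feel small, here is a back-of-the-envelope sketch of weight memory alone, using the parameter counts quoted elsewhere in this thread (5B, 14B per expert, 27B total). It assumes memory is simply parameters times bytes per parameter and ignores activations, the text encoder and the VAE, so real usage will be higher. The function name is made up for illustration.

```python
def weights_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate weight size in GB (10^9 bytes): parameters * bytes per parameter."""
    return params_billions * bits_per_param / 8


# Parameter counts as quoted in the thread; precisions are common choices, not a spec.
for name, params in (("5B", 5), ("one 14B expert", 14), ("both experts (27B)", 27)):
    sizes = ", ".join(
        f"{bits}-bit: {weights_gb(params, bits):.1f} GB" for bits in (16, 8, 4)
    )
    print(f"{name}: {sizes}")
```

        Under these assumptions a single 14B expert at 16-bit precision already exceeds a 24 GB card on weights alone, which is consistent with people reaching for quantization and CPU offloading.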
  - giancarlostoro
    239 days ago
    Try FramePack... never mind, even that needs at least 6GB VRAM...
    https://github.com/lllyasviel/FramePack
cubefox
240 days ago
Arguably the most interesting facts about the new Wan 2.2 model:
- they are now using a 27B MoE architecture (with two 14B experts, for low level and high level detail), which were usually only used for autoregressive LLMs rather than diffusion models
- the smaller 5B model supports up to 720p24 video and runs on 24 GB of VRAM, e.g. an RTX 4090, a consumer graphics card
- if their benchmarks are reliable, the model performance is SOTA even compared to closed source models
- liuliu
  240 days ago
  Some facts are wrong:
  - The 27B "MoE" is not the MoE commonly referred to in the LLM world. It is not MoE on FFN layers. It simply means two different models used for different denoising timestep ranges (exactly the same as SDXL-Base / SDXL-Refiner). Calling it MoE is not technically wrong. But claiming "which were usually only used for autoregressive LLMs rather than diffusion models" is just wrong (not to mention that HiDream I1 is a diffusion model that actually incorporates MoE layers in its FFN layers).
  - The A14B models can run on 24GiB VRAM too, with CPU offloading and quantization.
  - Yes, it is SotA even including some closed source models.
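  A minimal sketch of the "two models for different denoising timestep ranges" idea described above. Everything here is a toy stand-in: the experts are plain functions on a float, and the boundary value 0.5 is a placeholder, not the value Wan 2.2 actually uses.

```python
from typing import Callable, List

# A denoiser takes (latent, timestep) and returns an updated latent.
Denoiser = Callable[[float, float], float]

calls: List[str] = []  # records which expert handled each step


def high_noise_expert(latent: float, t: float) -> float:
    """Toy stand-in for the expert used at early, high-noise timesteps."""
    calls.append("high")
    return latent * 0.5


def low_noise_expert(latent: float, t: float) -> float:
    """Toy stand-in for the expert used at late, low-noise timesteps."""
    calls.append("low")
    return latent * 0.9


def make_router(high: Denoiser, low: Denoiser, boundary: float) -> Denoiser:
    """Pick an expert purely from the timestep; there is no learned per-token gate."""
    def route(latent: float, t: float) -> float:
        expert = high if t >= boundary else low
        return expert(latent, t)
    return route


def sample(denoise: Denoiser, latent: float, timesteps: List[float]) -> float:
    """Run the denoiser over a descending timestep schedule."""
    for t in timesteps:
        latent = denoise(latent, t)
    return latent


router = make_router(high_noise_expert, low_noise_expert, boundary=0.5)
result = sample(router, 1.0, [1.0, 0.75, 0.5, 0.25])
print(calls, result)
```

  Only one expert runs per step, so per-step compute is that of a single model even though both sets of weights have to be available. This differs from LLM-style MoE, where a learned gate routes individual tokens inside the FFN layers.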
- mandeepj
  240 days ago
  > - the smaller 5B model supports up to 720p24 video and runs on 24 GB of VRAM, e.g. an RTX 4090, a consumer graphics card
  Seems like you can run it on 2 GPUs, each with 12 GB of VRAM. At least, a breakdown on their GitHub page implied so.
  - cubefox
    239 days ago
    That would be a lot cheaper than an RTX 4090.
CosmicShadow
240 days ago
Wan2.1 was great, but Wan2.2 is really awesome! Here are some samples I made locally with my 5090:
- https://imgur.com/a/VeTn4Ej
- https://imgur.com/a/CujxVX3
Those were both Image to Video and then I upscaled them to 4k. I made the images using Flux Dev Krea.
Took about 3-4 minutes per video to generate and another 2-3 to upscale. Images took 20-40s to generate.
- scroogey
  240 days ago
  What did you use to upscale them?
  - CosmicShadow
    239 days ago
    One was with Topaz Video, the other was with SeedVR2.
    - scroogey
      239 days ago
      Thanks!
franky47
240 days ago
Quick, someone make a UI for this and call it Obi.
- tmikaeld
  239 days ago
  The Obi for your Wan
- mprivat
  240 days ago
  [dead]
ahmedhawas123
240 days ago
Are there video generation benchmarks similar to how there are benchmarks for LLMs? Reason I ask is because with lots of these models you have to go through a long cycle to get them up and running before you see an output, and often they will break with basic tasks requiring physics, state, etc. Would love to see some comparison of models across basic things like that.
cuuupid
240 days ago
I’ve been using this via Replicate for a while and it’s honestly amazing while being way cheaper. China is definitely leading on open source
- danielbln
  240 days ago
  *open weights
ProofHouse
240 days ago
How can they manage that but not the website?
harbingerofdoom
239 days ago
[dead]
ychnlt
240 days ago
[dead]
cyb0rg1
240 days ago
[flagged]
ivape
240 days ago
Censored?
esseph
240 days ago
Ugh hate they used this name
- yorwba
  240 days ago
  You can call it Wanxiang (万相, ten thousand pictures) if you want. Similarly, Qwen is Qianwen (千问, one thousand questions).
  - CapsAdmin
    240 days ago
    Its original name was WanX, but the gen ai community found that to be too funny / unfortunate, so they changed it to just Wan.
    - arresin
      240 days ago
      It’s probably a more appropriate name to be fair.
  - latentsea
    240 days ago
    They should just pretend it's an acronym. Wide Art Network.
  - qiine
    240 days ago
    ha TIL, very cool names!
- diggan
  240 days ago
  Why "hate" this name more than any other name? At least justify your semi-spam.
  - esseph
    240 days ago
    https://en.wikipedia.org/wiki/Wide_area_network
    - diggan
      240 days ago
      I'm familiar with that, but would people really confuse a video generation model with a type of computer network?
      - esseph
        239 days ago
        That assumes you know what VEO 3 is by reading the title.
        But, I guess sometimes you use a plane to build a plane while the material is aligned to a particular plane.
- ProofHouse
  240 days ago
  HATE