The beta update made the MDX23C model usable in UVR5 too, but it seems a lot of people still don't know about it, so I'm sharing.



https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_7_11_23_20_51_BETA.exe
First, update UVR5 to the beta using the patch installer above.




https://github.com/TRvlvr/model_repo/releases/download/all_public_uvr_models/MDX23C_D1581.ckpt
Then download the model above and put it in the MDX_Net_Models folder.
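If you'd rather script the download step, something like this should do it. It's only a rough sketch: the install path in the comment is a guess (point it at wherever your own UVR5 keeps MDX_Net_Models), and `model_destination` / `download_model` are just helper names I made up.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

MODEL_URL = (
    "https://github.com/TRvlvr/model_repo/releases/download/"
    "all_public_uvr_models/MDX23C_D1581.ckpt"
)


def model_destination(models_dir, url=MODEL_URL):
    """Return where the checkpoint should land inside the models folder."""
    filename = Path(urlparse(url).path).name  # last path component of the URL
    return Path(models_dir) / filename


def download_model(models_dir, url=MODEL_URL):
    """Download the checkpoint unless a file with that name is already there."""
    dest = model_destination(models_dir, url)
    if dest.exists():
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, str(dest))
    return dest


# Example call (the path is an assumption, adjust it to your install):
# download_model(Path.home() / "Ultimate Vocal Remover" / "models" / "MDX_Net_Models")
```

Restart UVR5 afterwards so the model shows up in the MDX-Net model list.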


Note that this is not a full-band model. The full-band model is reportedly coming out within a few months.


So I'd recommend using this model as part of an Ensemble rather than on its own.

In practice, the vocal SDR comes out higher when it's combined with other models such as voc ft and Inst HQ3 than when it's used alone:

https://mvsep.com/quality_checker/entry/4420

https://mvsep.com/quality_checker/multisong_leaderboard?sort=vocals
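For anyone wondering why combining models can raise SDR at all, here's a toy illustration. It assumes the usual definition SDR = 10·log10(Σ ref² / Σ (ref − est)²) and uses plain sample-wise averaging as the ensemble. That's my simplification: UVR5's ensemble algorithms aren't necessarily this, and real models' errors aren't independent random noise like in this fake data, so actual gains are smaller.

```python
import math
import random


def sdr(reference, estimate):
    """Signal-to-distortion ratio in dB for two equally long sample lists."""
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10 * math.log10(signal / error)


def average_ensemble(estimates):
    """Sample-wise mean of several equally long estimates."""
    return [sum(samples) / len(estimates) for samples in zip(*estimates)]


# Toy data: a sine wave stands in for the true vocal, and each "model"
# output is that vocal plus its own independent noise (an assumption).
rng = random.Random(0)
vocal = [math.sin(2 * math.pi * 5 * n / 1000) for n in range(1000)]
estimates = [[v + rng.gauss(0, 0.1) for v in vocal] for _ in range(3)]

single_scores = [sdr(vocal, est) for est in estimates]
ensemble_score = sdr(vocal, average_ensemble(estimates))
print("single:", single_scores)
print("ensemble:", ensemble_score)
```

The averaged estimate scores higher than any single one here because the independent errors partly cancel out.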


The UVR5 beta also adds some new options, and they matter more than you'd think.

According to the developer, they raise SDR by a lot.


For the details on each option, read through the developer's release notes:


  • Full ability to run the MDXNET23 models: You can find the models via the following links  - mdx_AB, cdx, mdx_C - Please note, all you need are the checkpoints, that's it. You must change the model names in "mdx_C" from "ckpt" to model1.ckpt, model2.ckpt, & model3.ckpt. Also, we've trained some very good models on this network. We have yet to work out a release date for them, but we will keep you updated! The good news is this patch is 100% compatible with them.
  • Segmentation: The original MDX-NET code had its own built-in chunking mechanism in place since the beginning. This is why vocal chops persisted despite the introduction of batch mode a few months ago. Due to the use of onnx, I had to make some tweaks to add the ability to change the native segment/chunk size (the default has been 256). The previous method we used for chunking was essentially nesting the code's native chunks, leading to double chunking (and worse conversion results). Larger segment sizes lead to better results, higher SDR scores, and more RAM/V-RAM usage.
  • Overlap: This feature helps eliminate vocal chops, almost entirely, depending on your setting. Overlap works with the native chunking mechanism. I modeled it after Demucs' overlap feature. It increases the SDR by a lot!
  • Pitch-shift Conversion: You can now change the pitch of the input for conversion. Each whole number represents a semitone. For example, setting -2 is minus 2 semitones, 0 is native pitch, and 2 is plus 2 semitones. Pitch shift is compatible with all the networks and models except VR Arch.
  • You can find more details on the rest of the changes in the change log within the GUI.
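To make the Segmentation, Overlap and Pitch-shift notes above a bit more concrete, here's a rough sketch of the general idea. This is not UVR5's actual code: the function names are made up, overlap is treated as a fraction of the segment, and overlapping outputs are simply averaged. The only hard fact in here is that one semitone is a frequency factor of 2^(1/12) in equal temperament.

```python
def chunk_starts(total, segment, overlap):
    """Start offsets of windows of length `segment` that cover `total` frames.

    `overlap` is the fraction (0 <= overlap < 1) shared by neighbouring windows.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    hop = max(1, int(segment * (1 - overlap)))
    starts = list(range(0, max(total - segment, 0) + 1, hop))
    if starts[-1] + segment < total:
        starts.append(total - segment)  # extra window so the tail is covered
    return starts


def process_with_overlap(signal, segment, overlap, process):
    """Run `process` on overlapping chunks and average where they overlap."""
    total = len(signal)
    acc = [0.0] * total
    hits = [0] * total
    for start in chunk_starts(total, segment, overlap):
        out = process(signal[start:start + segment])
        for i, value in enumerate(out):
            acc[start + i] += value
            hits[start + i] += 1
    return [a / h for a, h in zip(acc, hits)]


def semitone_ratio(semitones):
    """Frequency ratio for a pitch shift of `semitones` (equal temperament)."""
    return 2 ** (semitones / 12)


# With no overlap every frame is processed once; with 0.5 most frames are
# processed twice, which smooths the seams but costs more compute.
print(chunk_starts(1000, 256, 0.0))
print(chunk_starts(1000, 256, 0.5))
print(semitone_ratio(-2), semitone_ratio(2))
```

A bigger segment means fewer seams and a higher overlap means each seam gets blended across more windows, which fits the release notes' point that both help quality at the cost of more RAM/VRAM and processing time.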

Personally, I use the three-model ensemble of MDX23C, voc ft, and Inst HQ3
with roughly Segmentation 1760, Overlap 0.8, and MDXNET23 Overlap 8.

Lastly, give the Emilia idol cover a listen before you go.