Unfortunately, there is no universally "recommended" value for these settings, because they depend on factors such as the size and complexity of your dataset, the architecture of your model, and the computational resources available.

As a general guideline, start with a relatively low learning rate (e.g. 1e-4 or 1e-5) and raise it gradually only if the model is not making progress. For the UNet learning rate, around 1e-4 is a common starting point, and the text encoder is usually given a lower rate than the UNet (for example, about half). Network alpha is a scaling factor in LoRA training: the learned low-rank update is multiplied by alpha divided by the network rank (dimension). It is typically set somewhere between 1 and the rank; a lower alpha damps the update, so it usually needs a higher learning rate to compensate.
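To make the role of network alpha concrete, here is a minimal sketch. It assumes "network alpha" means the LoRA setting of that name, where the low-rank update is scaled by alpha / rank before being added to the base weight. The function names (lora_scale, apply_lora) and the tiny matrices are hypothetical and only for illustration; they are not taken from any particular training script.

```python
def lora_scale(network_alpha: float, network_dim: int) -> float:
    """Scale applied to the low-rank update: alpha / rank (assumed LoRA convention)."""
    if network_dim <= 0:
        raise ValueError("network_dim must be a positive integer")
    return network_alpha / network_dim


def apply_lora(base_weight, lora_down, lora_up, network_alpha):
    """Return base_weight + (alpha / rank) * (lora_up @ lora_down).

    base_weight: out x in, lora_down: rank x in, lora_up: out x rank,
    all given as plain nested lists to keep the sketch dependency-free.
    """
    rank = len(lora_down)
    scale = lora_scale(network_alpha, rank)
    merged = []
    for i, row in enumerate(base_weight):
        new_row = []
        for j, w in enumerate(row):
            # One entry of the low-rank product lora_up @ lora_down.
            delta = sum(lora_up[i][r] * lora_down[r][j] for r in range(rank))
            new_row.append(w + scale * delta)
        merged.append(new_row)
    return merged
```

With a rank of 32, an alpha of 16 gives a scale of 0.5 and an alpha of 1 gives a scale of about 0.03, which is why a small alpha is usually paired with a larger learning rate.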

Ultimately, the best way to determine the optimal settings is through experimentation and cross-validation, trying out different combinations of settings and evaluating the results.
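The "try different combinations and evaluate" advice can be sketched as a small grid sweep. This is only an outline under stated assumptions: train_and_score is a hypothetical callable you would supply yourself (it trains with the given settings and returns a validation score where lower is better), and the candidate values in search_space are illustrative, not recommendations.

```python
from itertools import product

# Illustrative candidate values only; adjust to your own setup.
search_space = {
    "unet_lr": [1e-4, 5e-5],
    "text_encoder_lr": [5e-5, 1e-5],
    "network_alpha": [1, 8, 16],
}


def sweep(space, train_and_score):
    """Evaluate every combination in `space` and return (score, settings)
    pairs sorted best-first, assuming a lower score is better."""
    names = list(space)
    results = []
    for values in product(*(space[name] for name in names)):
        settings = dict(zip(names, values))
        # train_and_score is the user-supplied (hypothetical) training run.
        results.append((train_and_score(settings), settings))
    results.sort(key=lambda pair: pair[0])
    return results
```

A full grid grows quickly (this one already has 12 runs), so in practice it may be cheaper to vary one setting at a time around a reasonable starting point.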


So it only gives me the default values and tells me to tune them myself through cross-validation.