Full log:
C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loggers\test_tube.py:104: LightningDeprecationWarning: The TestTubeLogger is deprecated since v1.5 and will be removed in v1.7. We recommend switching to the `pytorch_lightning.loggers.TensorBoardLogger` as an alternative.
rank_zero_deprecation(
Monitoring val/loss_simple_ema as checkpoint metric.
Merged modelckpt-cfg:
{'target': 'pytorch_lightning.callbacks.ModelCheckpoint', 'params': {'dirpath': 'logs\\MyProject2022-11-02T10-19-32_nahida\\checkpoints', 'filename': '{epoch:03}-{global_step:05}', 'verbose': True, 'save_last': True, 'monitor': 'val/loss_simple_ema', 'save_top_k': 3, 'every_n_epochs': 1, 'save_on_train_epoch_end': False}}
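(For anyone reading along: that merged dict is just a plain PyTorch Lightning ModelCheckpoint. Roughly what it instantiates to, as a sketch, with the dirpath rewritten with forward slashes for readability:)

from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_callback = ModelCheckpoint(
    dirpath="logs/MyProject2022-11-02T10-19-32_nahida/checkpoints",
    filename="{epoch:03}-{global_step:05}",
    verbose=True,
    save_last=True,                 # always keep a last.ckpt
    monitor="val/loss_simple_ema",  # the checkpoint metric named above
    save_top_k=3,                   # keep the 3 best checkpoints
    every_n_epochs=1,
    save_on_train_epoch_end=False,  # checkpoint after validation instead
)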
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
**** Loading data set: data_root: training_samples\MyProject, as set: train
**** Loaded 26 images from training_samples\MyProject
**** Loading data set: data_root: training_samples\MyProject, as set: train
**** Loaded 26 images from training_samples\MyProject
#### Data #####
train, WrappedDataset, 130
validation, WrappedDataset, 26
accumulate_grad_batches = 1
++++ NOT USING LR SCALING ++++
Setting learning rate to 1.00e-06
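(As I understand it, the stock latent-diffusion main.py only multiplies the base rate up when LR scaling is on; with it off, as here, base_learning_rate is used as-is. A small sketch of that rule:)

def effective_lr(base_lr, scale_lr, accumulate_grad_batches, ngpu, bs):
    """Mirror of the LR rule in the stock latent-diffusion main.py (sketch)."""
    if scale_lr:
        return accumulate_grad_batches * ngpu * bs * base_lr
    return base_lr  # "NOT USING LR SCALING": base rate used verbatim

# With the values from this log: scaling off, batch_size 1, 1 GPU.
print(effective_lr(1.0e-06, scale_lr=False, accumulate_grad_batches=1, ngpu=1, bs=1))  # -> 1e-06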
C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\configuration_validator.py:275: LightningDeprecationWarning: The `on_keyboard_interrupt` callback hook was deprecated in v1.5 and will be removed in v1.7. Please use the `on_exception` callback hook instead.
rank_zero_deprecation(
C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\configuration_validator.py:284: LightningDeprecationWarning: Base `LightningModule.on_train_batch_start` hook signature has changed in v1.5. The `dataloader_idx` argument will be removed in v1.7.
rank_zero_deprecation(
C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\configuration_validator.py:291: LightningDeprecationWarning: Base `Callback.on_train_batch_end` hook signature has changed in v1.5. The `dataloader_idx` argument will be removed in v1.7.
rank_zero_deprecation(
C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\core\datamodule.py:469: LightningDeprecationWarning: DataModule.setup has already been called, so it will not be called again. In v1.6 this behavior will change to always call DataModule.setup.
rank_zero_deprecation(
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
LatentDiffusion: Also optimizing conditioner params!
Project config
model:
  base_learning_rate: 1.0e-06
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    reg_weight: 1.0
    linear_start: 0.00085
    linear_end: 0.012
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: image
    cond_stage_key: caption
    image_size: 64
    channels: 4
    cond_stage_trainable: true
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: false
    embedding_reg_weight: 0.0
    unfreeze_model: true
    model_lr: 1.0e-06
    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        image_size: 32
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions:
        - 4
        - 2
        - 1
        num_res_blocks: 2
        channel_mult:
        - 1
        - 2
        - 4
        - 4
        num_heads: 8
        use_spatial_transformer: true
        transformer_depth: 1
        context_dim: 768
        use_checkpoint: true
        legacy: false
    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          double_z: true
          z_channels: 4
          resolution: 512
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity
    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
    ckpt_path: animefull-final-pruned.ckpt
data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 1
    num_workers: 6
    wrap: false
    train:
      target: ldm.data.every_dream.EveryDreamBatch
      params:
        size: 512
        set: train
        repeats: 5
    validation:
      target: ldm.data.personalized.PersonalizedBase
      params:
        size: 512
        set: val
        repeats: 1
Lightning config
modelcheckpoint:
  params:
    every_n_epochs: 1
    save_on_train_epoch_end: false
callbacks:
  image_logger:
    target: main.ImageLogger
    params:
      batch_frequency: 200
      max_images: 16
      increase_log_steps: false
trainer:
  benchmark: true
  max_epochs: 300
  max_steps: 20000
  gpus: 0,
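(For reference, these YAML blocks become live objects through the repo's instantiate_from_config helper in ldm/util.py; roughly like this, with a made-up config filename:)

from omegaconf import OmegaConf
from ldm.util import instantiate_from_config

config = OmegaConf.load("my_project.yaml")     # hypothetical path to the dumped config
model = instantiate_from_config(config.model)  # builds the LatentDiffusion module
data = instantiate_from_config(config.data)    # builds the DataModuleFromConfig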
| Name | Type | Params
---------------------------------------------------------
0 | model | DiffusionWrapper | 859 M
1 | first_stage_model | AutoencoderKL | 83.7 M
2 | cond_stage_model | FrozenCLIPEmbedder | 123 M
---------------------------------------------------------
982 M Trainable params
83.7 M Non-trainable params
1.1 B Total params
4,264.941 Total estimated model params size (MB)
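(Those numbers check out: trainable = 859 M UNet + 123 M CLIP text encoder, i.e. the 982 M, since cond_stage_trainable and unfreeze_model are both true; the size line is just the total count at 4 bytes per fp32 param:)

trainable = 859e6 + 123e6  # DiffusionWrapper (UNet) + FrozenCLIPEmbedder -> the "982 M"
frozen = 83.7e6            # AutoencoderKL stays frozen
print((trainable + frozen) * 4 / 1e6)  # ~4263 MB, matching the ~4,264.941 above up to rounding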
Validation sanity check: 0%| | 0/2 [00:00<?, ?it/s]C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\utilities\data.py:59: UserWarning: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 19. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.
warning_cache.warn(
Global seed set to 23
Epoch 0: 0%| | 0/156 [00:00<?, ?it/s]C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\utilities\data.py:59: UserWarning: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 1. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.
warning_cache.warn(
C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\connectors\logger_connector\result.py:227: UserWarning: You called `self.log('global_step', ...)` in your `training_step` but the value needs to be floating point. Converting it to torch.float32.
warning_cache.warn(
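(Those two UserWarnings look harmless, and the fixes Lightning suggests are simple; a sketch of both inside the LightningModule's training_step, with a made-up loss helper:)

def training_step(self, batch, batch_idx):
    loss = self.shared_step(batch)  # hypothetical loss computation
    # pass batch_size explicitly so Lightning does not have to infer it
    self.log("train/loss", loss, batch_size=batch["image"].shape[0])
    # log global_step as float so Lightning does not have to convert it
    self.log("global_step", float(self.global_step))
    return loss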
Summoning checkpoint.
Training complete. max_steps or max_epochs reached, or we blew up.
Traceback (most recent call last):
File "main.py", line 740, in <module>
trainer.fit(model, data)
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 740, in fit
self._call_and_handle_interrupt(
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 685, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 777, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1199, in _run
self._dispatch()
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1279, in _dispatch
self.training_type_plugin.start_training(self)
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 202, in start_training
self._results = trainer.run_stage()
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1289, in run_stage
return self._run_train()
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1319, in _run_train
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\base.py", line 145, in run
self.advance(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\fit_loop.py", line 234, in advance
self.epoch_loop.run(data_fetcher)
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\base.py", line 145, in run
self.advance(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\epoch\training_epoch_loop.py", line 193, in advance
batch_output = self.batch_loop.run(batch, batch_idx)
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\base.py", line 145, in run
self.advance(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\batch\training_batch_loop.py", line 88, in advance
outputs = self.optimizer_loop.run(split_batch, optimizers, batch_idx)
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\base.py", line 145, in run
self.advance(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 215, in advance
result = self._run_optimization(
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 266, in _run_optimization
self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 378, in _optimizer_step
lightning_module.optimizer_step(
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\core\lightning.py", line 1652, in optimizer_step
optimizer.step(closure=optimizer_closure)
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\core\optimizer.py", line 164, in step
trainer.accelerator.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 336, in optimizer_step
self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\plugins\precision\precision_plugin.py", line 163, in optimizer_step
optimizer.step(closure=closure, **kwargs)
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\torch\optim\optimizer.py", line 88, in wrapper
return func(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\torch\autograd\grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\torch\optim\adamw.py", line 92, in step
loss = closure()
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\plugins\precision\precision_plugin.py", line 148, in _wrap_closure
closure_result = closure()
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 160, in __call__
self._result = self.closure(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 155, in closure
self._backward_fn(step_output.closure_loss)
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 327, in backward_fn
self.trainer.accelerator.backward(loss, optimizer, opt_idx)
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 311, in backward
self.precision_plugin.backward(self.lightning_module, closure_loss, *args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\plugins\precision\precision_plugin.py", line 91, in backward
model.backward(closure_loss, optimizer, *args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\core\lightning.py", line 1434, in backward
loss.backward(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\torch\_tensor.py", line 307, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\torch\autograd\__init__.py", line 154, in backward
Variable._execution_engine.run_backward(
RuntimeError: CUDA out of memory. Tried to allocate 3.23 GiB (GPU 0; 23.99 GiB total capacity; 4.33 GiB already allocated; 16.53 GiB free; 4.44 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
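(The error message itself points at max_split_size_mb / PYTORCH_CUDA_ALLOC_CONF for fragmentation; for reference, that is an environment variable that has to be set before CUDA initializes, e.g. at the top of main.py or exported in the shell. The 512 here is a guess, not a tuned value:)

import os
# must run before the first CUDA call; 512 MB split cap is only a guess
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"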
It dies right there every time. If anyone can tell what the problem is, I'd appreciate the help.