Full log



C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loggers\test_tube.py:104: LightningDeprecationWarning: The TestTubeLogger is deprecated since v1.5 and will be removed in v1.7. We recommend switching to the `pytorch_lightning.loggers.TensorBoardLogger` as an alternative.

  rank_zero_deprecation(

Monitoring val/loss_simple_ema as checkpoint metric.

Merged modelckpt-cfg:

{'target': 'pytorch_lightning.callbacks.ModelCheckpoint', 'params': {'dirpath': 'logs\\MyProject2022-11-02T10-19-32_nahida\\checkpoints', 'filename': '{epoch:03}-{global_step:05}', 'verbose': True, 'save_last': True, 'monitor': 'val/loss_simple_ema', 'save_top_k': 3, 'every_n_epochs': 1, 'save_on_train_epoch_end': False}}

GPU available: True, used: True

TPU available: False, using: 0 TPU cores

IPU available: False, using: 0 IPUs

**** Loading data set: data_root: training_samples\MyProject, as set: train

**** Loaded 26 images fromt training_samples\MyProject

**** Loading data set: data_root: training_samples\MyProject, as set: train

**** Loaded 26 images fromt training_samples\MyProject

#### Data #####

train, WrappedDataset, 130

validation, WrappedDataset, 26

accumulate_grad_batches = 1

++++ NOT USING LR SCALING ++++

Setting learning rate to 1.00e-06

C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\configuration_validator.py:275: LightningDeprecationWarning: The `on_keyboard_interrupt` callback hook was deprecated in v1.5 and will be removed in v1.7. Please use the `on_exception` callback hook instead.

  rank_zero_deprecation(

C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\configuration_validator.py:284: LightningDeprecationWarning: Base `LightningModule.on_train_batch_start` hook signature has changed in v1.5. The `dataloader_idx` argument will be removed in v1.7.

  rank_zero_deprecation(

C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\configuration_validator.py:291: LightningDeprecationWarning: Base `Callback.on_train_batch_end` hook signature has changed in v1.5. The `dataloader_idx` argument will be removed in v1.7.

  rank_zero_deprecation(

C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\core\datamodule.py:469: LightningDeprecationWarning: DataModule.setup has already been called, so it will not be called again. In v1.6 this behavior will change to always call DataModule.setup.

  rank_zero_deprecation(

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

LatentDiffusion: Also optimizing conditioner params!

Project config

model:

  base_learning_rate: 1.0e-06

  target: ldm.models.diffusion.ddpm.LatentDiffusion

  params:

    reg_weight: 1.0

    linear_start: 0.00085

    linear_end: 0.012

    num_timesteps_cond: 1

    log_every_t: 200

    timesteps: 1000

    first_stage_key: image

    cond_stage_key: caption

    image_size: 64

    channels: 4

    cond_stage_trainable: true

    conditioning_key: crossattn

    monitor: val/loss_simple_ema

    scale_factor: 0.18215

    use_ema: false

    embedding_reg_weight: 0.0

    unfreeze_model: true

    model_lr: 1.0e-06

    unet_config:

      target: ldm.modules.diffusionmodules.openaimodel.UNetModel

      params:

        image_size: 32

        in_channels: 4

        out_channels: 4

        model_channels: 320

        attention_resolutions:

        - 4

        - 2

        - 1

        num_res_blocks: 2

        channel_mult:

        - 1

        - 2

        - 4

        - 4

        num_heads: 8

        use_spatial_transformer: true

        transformer_depth: 1

        context_dim: 768

        use_checkpoint: true

        legacy: false

    first_stage_config:

      target: ldm.models.autoencoder.AutoencoderKL

      params:

        embed_dim: 4

        monitor: val/rec_loss

        ddconfig:

          double_z: true

          z_channels: 4

          resolution: 512

          in_channels: 3

          out_ch: 3

          ch: 128

          ch_mult:

          - 1

          - 2

          - 4

          - 4

          num_res_blocks: 2

          attn_resolutions: []

          dropout: 0.0

        lossconfig:

          target: torch.nn.Identity

    cond_stage_config:

      target: ldm.modules.encoders.modules.FrozenCLIPEmbedder

    ckpt_path: animefull-final-pruned.ckpt

data:

  target: main.DataModuleFromConfig

  params:

    batch_size: 1

    num_workers: 6

    wrap: falsegit

    train:

      target: ldm.data.every_dream.EveryDreamBatch

      params:

        size: 512

        set: train

        repeats: 5

    validation:

      target: ldm.data.personalized.PersonalizedBase

      params:

        size: 512

        set: val

        repeats: 1


Lightning config

modelcheckpoint:

  params:

    every_n_epochs: 1

    save_on_train_epoch_end: false

callbacks:

  image_logger:

    target: main.ImageLogger

    params:

      batch_frequency: 200

      max_images: 16

      increase_log_steps: false

trainer:

  benchmark: true

  max_epochs: 300

  max_steps: 20000

  gpus: 0,


  | Name              | Type               | Params

---------------------------------------------------------

0 | model             | DiffusionWrapper   | 859 M

1 | first_stage_model | AutoencoderKL      | 83.7 M

2 | cond_stage_model  | FrozenCLIPEmbedder | 123 M

---------------------------------------------------------

982 M     Trainable params

83.7 M    Non-trainable params

1.1 B     Total params

4,264.941 Total estimated model params size (MB)

Validation sanity check:   0%|                                                                   | 0/2 [00:00<?, ?it/s]C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\utilities\data.py:59: UserWarning: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 19. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.

  warning_cache.warn(

Global seed set to 23

Epoch 0:   0%|                                                                                 | 0/156 [00:00<?, ?it/s]C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\utilities\data.py:59: UserWarning: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 1. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.

  warning_cache.warn(

C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\connectors\logger_connector\result.py:227: UserWarning: You called `self.log('global_step', ...)` in your `training_step` but the value needs to be floating point. Converting it to torch.float32.

  warning_cache.warn(

Summoning checkpoint.

Training complete. max_steps or max_epochs, reached or we blew up.


Traceback (most recent call last):

  File "main.py", line 740, in <module>

    trainer.fit(model, data)

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 740, in fit

    self._call_and_handle_interrupt(

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 685, in _call_and_handle_interrupt

    return trainer_fn(*args, **kwargs)

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 777, in _fit_impl

    self._run(model, ckpt_path=ckpt_path)

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1199, in _run

    self._dispatch()

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1279, in _dispatch

    self.training_type_plugin.start_training(self)

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 202, in start_training

    self._results = trainer.run_stage()

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1289, in run_stage

    return self._run_train()

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1319, in _run_train

    self.fit_loop.run()

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\base.py", line 145, in run

    self.advance(*args, **kwargs)

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\fit_loop.py", line 234, in advance

    self.epoch_loop.run(data_fetcher)

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\base.py", line 145, in run

    self.advance(*args, **kwargs)

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\epoch\training_epoch_loop.py", line 193, in advance

    batch_output = self.batch_loop.run(batch, batch_idx)

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\base.py", line 145, in run

    self.advance(*args, **kwargs)

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\batch\training_batch_loop.py", line 88, in advance

    outputs = self.optimizer_loop.run(split_batch, optimizers, batch_idx)

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\base.py", line 145, in run

    self.advance(*args, **kwargs)

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 215, in advance

    result = self._run_optimization(

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 266, in _run_optimization

    self._optimizer_step(optimizer, opt_idx, batch_idx, closure)

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 378, in _optimizer_step

    lightning_module.optimizer_step(

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\core\lightning.py", line 1652, in optimizer_step

    optimizer.step(closure=optimizer_closure)

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\core\optimizer.py", line 164, in step

    trainer.accelerator.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 336, in optimizer_step

    self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\plugins\precision\precision_plugin.py", line 163, in optimizer_step

    optimizer.step(closure=closure, **kwargs)

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\torch\optim\optimizer.py", line 88, in wrapper

    return func(*args, **kwargs)

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\torch\autograd\grad_mode.py", line 28, in decorate_context

    return func(*args, **kwargs)

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\torch\optim\adamw.py", line 92, in step

    loss = closure()

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\plugins\precision\precision_plugin.py", line 148, in _wrap_closure

    closure_result = closure()

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 160, in __call__

    self._result = self.closure(*args, **kwargs)

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 155, in closure

    self._backward_fn(step_output.closure_loss)

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 327, in backward_fn

    self.trainer.accelerator.backward(loss, optimizer, opt_idx)

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 311, in backward

    self.precision_plugin.backward(self.lightning_module, closure_loss, *args, **kwargs)

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\plugins\precision\precision_plugin.py", line 91, in backward

    model.backward(closure_loss, optimizer, *args, **kwargs)

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\core\lightning.py", line 1434, in backward

    loss.backward(*args, **kwargs)

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\torch\_tensor.py", line 307, in backward

    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)

  File "C:\ProgramData\Anaconda3\envs\ldm\lib\site-packages\torch\autograd\__init__.py", line 154, in backward

    Variable._execution_engine.run_backward(

RuntimeError: CUDA out of memory. Tried to allocate 3.23 GiB (GPU 0; 23.99 GiB total capacity; 4.33 GiB already allocated; 16.53 GiB free; 4.44 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

It dies like this every time. If anyone can tell what the problem is, I'd appreciate the help.
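
For reference, the OOM message at the bottom itself suggests trying max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF when reserved memory is much larger than allocated memory. A minimal sketch of how that could be set before CUDA initializes (the 512 value is just an illustrative guess on my part, not something taken from this run):

    import os

    # Must be set before the CUDA caching allocator is initialized,
    # i.e. before the first CUDA allocation, so put it at the very top
    # of the entry script (e.g. main.py). The 512 MiB split size is an
    # assumption for illustration only.
    os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:512")

    import torch  # imported after the environment variable is in place

The same thing can be done from the shell before launching main.py, e.g. set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 in a Windows cmd session. Whether it actually helps here depends on how fragmented the 4.44 GiB the allocator has reserved really is.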