Pytorch: RuntimeError: CUDA out of memory. Tried to allocate 12.50 MiB (GPU 0; 10.92 GiB total capacity; 8.57 MiB already allocated; 9.28 GiB free; 4.68 MiB cached)

Opened on Jan 27, 2019  ·  91 comments  ·  Source: pytorch/pytorch

CUDA out of memory error, but the CUDA memory is almost empty.

I am currently training a lightweight model on a very large amount of text data (about 70 GiB of text).
To do so I am using a machine on a cluster ('grele' on the grid5000 cluster network).

์ด ๋งค์šฐ ์ด์ƒํ•œ CUDA ๋ฉ”๋ชจ๋ฆฌ ๋ถ€์กฑ ์˜ค๋ฅ˜ ๋ฉ”์‹œ์ง€๋ฅผ 3์‹œ๊ฐ„ ๋™์•ˆ ๊ต์œกํ•œ ํ›„ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๋ฉ”์‹œ์ง€๊ฐ€ ๋‚˜ํƒ€๋‚ฉ๋‹ˆ๋‹ค.
RuntimeError: CUDA out of memory. Tried to allocate 12.50 MiB (GPU 0; 10.92 GiB total capacity; 8.57 MiB already allocated; 9.28 GiB free; 4.68 MiB cached) .
๋ฉ”์‹œ์ง€์— ๋”ฐ๋ฅด๋ฉด ํ•„์š”ํ•œ ๊ณต๊ฐ„์ด ์žˆ์ง€๋งŒ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ํ• ๋‹นํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

์ด ๋ฌธ์ œ์˜ ์›์ธ์ด ๋ฌด์—‡์ธ์ง€ ์•„์‹ญ๋‹ˆ๊นŒ?

For information, my preprocessing relies on torch.multiprocessing.Queue and an iterator over the lines of my source data to preprocess the data on the fly.
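As a rough sketch of that on-the-fly pattern (not the actual training code: the names and the tokenizing step are made up here, and a stdlib queue.Queue fed by a thread stands in for torch.multiprocessing.Queue, which exposes the same put/get API across processes):

```python
import queue
import threading

def preprocess(line):
    # hypothetical preprocessing step: tokenize one line of source text
    return line.strip().split()

def producer(lines, q):
    # push items as they become ready; a sentinel marks the end of the stream
    for line in lines:
        q.put(preprocess(line))
    q.put(None)

lines = ["hello world", "cuda out of memory"]
q = queue.Queue(maxsize=8)  # bounded, so preprocessing cannot run far ahead of training
t = threading.Thread(target=producer, args=(lines, q))
t.start()

batches = []
while True:
    item = q.get()
    if item is None:
        break
    batches.append(item)  # in training, each item would feed one step
t.join()
```

The bounded queue keeps at most a few preprocessed items in flight, so preprocessing memory stays constant no matter how large the source corpus is.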

์ „์ฒด ์Šคํƒ ์ถ”์ 

Traceback (most recent call last):
  File "/home/emarquer/miniconda3/envs/pytorch/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/emarquer/miniconda3/envs/pytorch/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/emarquer/miniconda3/envs/pytorch/lib/python3.6/site-packages/memory_profiler.py", line 1228, in <module>
    exec_with_profiler(script_filename, prof, args.backend, script_args)
  File "/home/emarquer/miniconda3/envs/pytorch/lib/python3.6/site-packages/memory_profiler.py", line 1129, in exec_with_profiler
    exec(compile(f.read(), filename, 'exec'), ns, ns)
  File "run.py", line 293, in <module>
    main(args, save_folder, load_file)
  File "run.py", line 272, in main
    trainer.all_epochs()
  File "/home/emarquer/papud-bull-nn/trainer/trainer.py", line 140, in all_epochs
    self.single_epoch()
  File "/home/emarquer/papud-bull-nn/trainer/trainer.py", line 147, in single_epoch
    tracker.add(*self.single_batch(data, target))
  File "/home/emarquer/papud-bull-nn/trainer/trainer.py", line 190, in single_batch
    result = self.model(data)
  File "/home/emarquer/miniconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/emarquer/papud-bull-nn/model/model.py", line 54, in forward
    emb = self.emb(input)
  File "/home/emarquer/miniconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/emarquer/miniconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/sparse.py", line 118, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/home/emarquer/miniconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/functional.py", line 1454, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA out of memory. Tried to allocate 12.50 MiB (GPU 0; 10.92 GiB total capacity; 8.57 MiB already allocated; 9.28 GiB free; 4.68 MiB cached)

needs reproduction

Most helpful comment

๋ฐ์ดํ„ฐ์˜ ๋ฏธ๋‹ˆ ๋ฐฐ์น˜๊ฐ€ GPU ๋ฉ”๋ชจ๋ฆฌ์— ๋งž์ง€ ์•Š๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. ๋ฐฐ์น˜ ํฌ๊ธฐ๋ฅผ ์ค„์ด๋ฉด ๋ฉ๋‹ˆ๋‹ค. cifar10 ๋ฐ์ดํ„ฐ ์„ธํŠธ์— ๋Œ€ํ•ด ๋ฐฐ์น˜ ํฌ๊ธฐ = 256์„ ์„ค์ •ํ•  ๋•Œ ๋™์ผํ•œ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ–ˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฐ ๋‹ค์Œ ๋ฐฐ์น˜ ํฌ๊ธฐ = 128๋กœ ์„ค์ •ํ•˜๋ฉด ํ•ด๊ฒฐ๋ฉ๋‹ˆ๋‹ค.

All 91 comments

๋™์ผํ•œ ๋Ÿฐํƒ€์ž„ ์˜ค๋ฅ˜๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค.

Traceback (most recent call last):
  File "carn\train.py", line 52, in <module>
    main(cfg)
  File "carn\train.py", line 48, in main
    solver.fit()
  File "C:\Users\Omar\Desktop\CARN-pytorch\carn\solver.py", line 95, in fit
    psnr = self.evaluate("dataset/Urban100", scale=cfg.scale, num_step=self.step)
  File "C:\Users\Omar\Desktop\CARN-pytorch\carn\solver.py", line 136, in evaluate
    sr = self.refiner(lr_patch, scale).data
  File "C:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Omar\Desktop\CARN-pytorch\carn\model\carn.py", line 74, in forward
    b3 = self.b3(o2)
  File "C:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Omar\Desktop\CARN-pytorch\carn\model\carn.py", line 30, in forward
    c3 = torch.cat([c2, b3], dim=1)
RuntimeError: CUDA out of memory. Tried to allocate 195.25 MiB (GPU 0; 4.00 GiB total capacity; 2.88 GiB already allocated; 170.14 MiB free; 2.00 MiB cached)

@EMarquer @OmarBazaraa could you give us a minimal repro example that we can run?

๋” ์ด์ƒ ๋ฌธ์ œ๋ฅผ ์žฌํ˜„ํ•  ์ˆ˜ ์—†์œผ๋ฏ€๋กœ ๋ฌธ์ œ๋ฅผ ์ข…๋ฃŒํ•ฉ๋‹ˆ๋‹ค.
RAM์— ์ „์ฒ˜๋ฆฌ๋œ ๋ฐ์ดํ„ฐ ์ €์žฅ์„ ์ค‘๋‹จํ•˜๋ฉด ๋ฌธ์ œ๊ฐ€ ์‚ฌ๋ผ์กŒ์Šต๋‹ˆ๋‹ค.

@OmarBazaraa, I do not think your problem is the same as mine:

  • I try to allocate 12.50 MiB with 9.28 GiB free
  • you try to allocate 195.25 MiB with only 170.14 MiB free

์ด ๋ฌธ์ œ์— ๋Œ€ํ•œ ๋‚˜์˜ ์ด์ „ ๊ฒฝํ—˜์— ๋”ฐ๋ฅด๋ฉด CUDA ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ํ•ด์ œํ•˜์ง€ ์•Š๊ฑฐ๋‚˜ CUDA์— ๋„ˆ๋ฌด ๋งŽ์€ ๋ฐ์ดํ„ฐ๋ฅผ ๋„ฃ์œผ๋ ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.
CUDA ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ํ•ด์ œํ•˜์ง€ ์•Š์Œ์œผ๋กœ์จ ๋” ์ด์ƒ ์‚ฌ์šฉํ•˜์ง€ ์•Š๋Š” CUDA์˜ ํ…์„œ์— ๋Œ€ํ•œ ์ฐธ์กฐ๊ฐ€ ์—ฌ์ „ํžˆ ์žˆ์„ ์ˆ˜ ์žˆ์Œ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” ํ…์„œ๋ฅผ ์‚ญ์ œํ•˜์—ฌ ํ• ๋‹น๋œ ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ํ•ด์ œ๋˜๋Š” ๊ฒƒ์„ ๋ฐฉ์ง€ํ•ฉ๋‹ˆ๋‹ค.
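That reference-lifetime point can be sketched without a GPU: PyTorch can only reuse a tensor's memory once the last Python reference to it is gone. A minimal stand-in (FakeCudaTensor and the history list are hypothetical; a real-world culprit is e.g. logging loss itself instead of loss.item()):

```python
import gc
import weakref

class FakeCudaTensor:
    # stand-in for a CUDA tensor; no GPU needed for the demonstration
    pass

t = FakeCudaTensor()
alive = weakref.ref(t)   # lets us observe when the object is actually freed

history = [t]            # e.g. appending `loss` (not `loss.item()`) to a log list
del t
gc.collect()
still_held = alive() is not None   # True: the list still references the tensor

history.clear()          # drop the last reference
gc.collect()
freed = alive() is None  # True: only now could the allocator reuse the memory
```

So `del` alone is not enough; every container, closure, or autograd graph that still points at the tensor keeps its GPU memory pinned.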

์ผ๋ฐ˜์ ์ธ ํ•ด๊ฒฐ์ฑ…์ด ์žˆ์Šต๋‹ˆ๊นŒ?

CUDA out of memory. Tried to allocate 196.00 MiB (GPU 0; 2.00 GiB total capacity; 359.38 MiB already allocated; 192.29 MiB free; 152.37 MiB cached)

@aniks23 we are working on a patch that I believe will give a better experience in this case. Stay tuned

๋‚ด ์‹œ์Šคํ…œ์ด ์ฒ˜๋ฆฌํ•  ์ˆ˜ ์žˆ๋Š” ๋ชจ๋ธ์ด๋‚˜ ๋„คํŠธ์›Œํฌ์˜ ํฌ๊ธฐ๋ฅผ ์•Œ ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•์ด ์žˆ์Šต๋‹ˆ๊นŒ?
์ด ๋ฌธ์ œ์— ๋ถ€๋”ชํžˆ์ง€ ์•Š๊ณ ?

On Fri, Feb 1, 2019, 3:55 AM Francisco Massa [email protected]
wrote:

@aniks23 https://github.com/aniks23 we are working on a patch that I
believe will give a better experience in this case. Stay tuned

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/pytorch/pytorch/issues/16417#issuecomment-459530332 ,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AUEJD4SYN4gnRkrLgFYEKY6y14P1TMgLks5vI21wgaJpZM4aUowv
.

์ด ๋ฉ”์‹œ์ง€๋„ ๋ฐ›์•˜์Šต๋‹ˆ๋‹ค.

RuntimeError: CUDA out of memory. Tried to allocate 32.75 MiB (GPU 0; 4.93 GiB total capacity; 3.85 GiB already allocated; 29.69 MiB free; 332.48 MiB cached)

It happened when I was trying to run the Fast.ai Lesson1 Pets https://course.fast.ai/ (cell 31).

์ €๋„ ๊ฐ™์€ ์˜ค๋ฅ˜์— ๋น ์ ธ์žˆ์Šต๋‹ˆ๋‹ค. ๋‚ด ๋ชจ๋ธ์€ ์ด์ „์— ์ •ํ™•ํ•œ ์„ค์ •์œผ๋กœ ์ž‘์—…ํ–ˆ์ง€๋งŒ ์ง€๊ธˆ์€ ๊ด€๋ จ์ด ์—†์–ด ๋ณด์ด๋Š” ์ผ๋ถ€ ์ฝ”๋“œ๋ฅผ ์ˆ˜์ •ํ•œ ํ›„ ์ด ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค.

RuntimeError: CUDA out of memory. Tried to allocate 1.34 GiB (GPU 0; 22.41 GiB total capacity; 11.42 GiB already allocated; 59.19 MiB free; 912.00 KiB cached)

I am not sure whether my scenario relates to the original issue, but I resolved my problem (the OOM error in the previous message disappeared) by breaking up the nn.Sequential layers in my model, e.g.

self.input_layer = nn.Sequential(
    nn.Conv3d(num_channels, 32, kernel_size=3, stride=1, padding=0),
    nn.BatchNorm3d(32),
    nn.ReLU()
)

output = self.input_layer(x)

to

self.input_conv = nn.Conv3d(num_channels, 32, kernel_size=3, stride=1, padding=0)
self.input_bn = nn.BatchNorm3d(32)

output = F.relu(self.input_bn(self.input_conv(x)))

๋‚ด ๋ชจ๋ธ์—๋Š” ์ด ์ค‘ ํ›จ์”ฌ ๋” ๋งŽ์€ ๊ฒƒ์ด ์žˆ์Šต๋‹ˆ๋‹ค(์ •ํ™•ํžˆ 5๊ฐœ ์ด์ƒ). nn.Sequential์„ ์‚ฌ์šฉํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๊นŒ? ์•„๋‹ˆ๋ฉด ์ด๊ฒƒ์ด ๋ฒ„๊ทธ์ž…๋‹ˆ๊นŒ? @yf225 @fmassa

I get a similar error:

CUDA out of memory. Tried to allocate 196.50 MiB (GPU 0; 15.75 GiB total capacity; 7.09 GiB already allocated; 20.62 MiB free; 72.48 MiB cached)

@treble-maker123, were you able to conclusively prove that nn.Sequential was the problem?

I have a similar issue. I am using the pytorch dataloader. It says I should have over 5 GB free, but it gives 0 bytes free.

RuntimeError                              Traceback (most recent call last)
~ in
     22
     23 data, inputs = states_inputs
---> 24 data, inputs = Variable(data).float().to(device), Variable(inputs).float().to(device)
     25 print(data.device)
     26 enc_out = encoder(data)

RuntimeError: CUDA out of memory. Tried to allocate 11.00 MiB (GPU 0; 6.00 GiB total capacity; 448.58 MiB already allocated; 0 bytes free; 942.00 KiB cached)

์•ˆ๋…•ํ•˜์„ธ์š”, ์ €๋„์ด ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ–ˆ์Šต๋‹ˆ๋‹ค.

 File "xxx", line 151, in __call__
    logits = self.model(x_hat)
  File "anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "unet.py", line 67, in forward
    x = up(x, blocks[-i-1])
  File "anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "unet.py", line 120, in forward
    out = self.conv_block(out)
  File "anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "unet.py", line 92, in forward
    out = self.block(x)
  File "anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "anaconda3/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 320, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 8.00 MiB (GPU 1; 11.78 GiB total capacity; 10.66 GiB already allocated; 1.62 MiB free; 21.86 MiB cached)

์Šฌํ”„๊ฒŒ๋„ ๋‚˜๋Š” ๊ฐ™์€ ๋ฌธ์ œ๋ฅผ ๋งŒ๋‚ฌ์Šต๋‹ˆ๋‹ค.

RuntimeError: CUDA out of memory. Tried to allocate 1.33 GiB (GPU 1; 31.72 GiB total capacity; 5.68 GiB already allocated; 24.94 GiB free; 5.96 MiB cached)

์„œ๋ฒ„ ํด๋Ÿฌ์Šคํ„ฐ์—์„œ ๋‚ด ๋ชจ๋ธ์„ ํ›ˆ๋ จํ–ˆ๋Š”๋ฐ ๋‚ด ์„œ๋ฒ„ ์ค‘ ํ•˜๋‚˜์— ์˜ˆ๊ธฐ์น˜ ์•Š๊ฒŒ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ–ˆ์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ ์ด๋Ÿฌํ•œ ์œ ์„  ์˜ค๋ฅ˜๋Š” ๋‚ด ํ›ˆ๋ จ ์ „๋žต ์ค‘ ํ•˜๋‚˜์—์„œ๋งŒ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์œ ์ผํ•œ ์ฐจ์ด์ ์€ ๋ฐ์ดํ„ฐ ๋ณด๊ฐ• ์ค‘์— ์ฝ”๋“œ๋ฅผ ์ˆ˜์ • ํ•˜๊ณ  ๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ๋ฅผ ๋‹ค๋ฅธ ๊ฒƒ๋ณด๋‹ค ๋ณต์žกํ•˜๊ฒŒ ๋งŒ๋“ ๋‹ค๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ ์ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์ž˜ ๋ชจ๋ฅด๊ฒ ์Šต๋‹ˆ๋‹ค.

๋‚˜๋Š” ๋˜ํ•œ์ด ๋ฌธ์ œ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ํ•ด๊ฒฐ ๋ฐฉ๋ฒ•??? RuntimeError: CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 4.00 GiB total capacity; 2.94 GiB already allocated; 10.22 MiB free; 18.77 MiB cached)

์—ฌ๊ธฐ๋„ ๊ฐ™์€ ๋ฌธ์ œ RuntimeError: CUDA out of memory. Tried to allocate 54.00 MiB (GPU 0; 11.00 GiB total capacity; 7.89 GiB already allocated; 7.74 MiB free; 478.37 MiB cached)

@fmassa ์ด๊ฒƒ์— ๋Œ€ํ•œ ๋” ๋งŽ์€ ์ •๋ณด๊ฐ€ ์žˆ์Šต๋‹ˆ๊นŒ?

https://github.com/pytorch/pytorch/issues/16417#issuecomment-484264163

๋‚˜์—๊ฒŒ ๊ฐ™์€ ๋ฌธ์ œ
์นœ์• ํ•˜๋Š”, ์†”๋ฃจ์…˜์„ ์–ป์—ˆ์Šต๋‹ˆ๊นŒ?
(base) F:\Suresh\st-gcn>python main1.py recognition -c config/st_gcn/ntu-xsub/train.yaml --device 0 --work_dir ./work_dir
C:\Users\cudalab10\Anaconda3lib\site-packages\torch\cuda__init__.py:117: UserWarning:
    Found GPU0 TITAN Xp which is of cuda capability 1.1.
    PyTorch no longer supports this GPU because it is too old.

warnings.warn(old_gpu_warn % (d, name, major, capability[1]))
[05.22.19|12:02:41] Parameters:
{'base_lr': 0.1, 'ignore_weights': [], 'model': 'net.st_gcn.Model', 'eval_interval': 5, 'weight_decay': 0.0001, 'work_dir': './work_dir', 'save_interval': 10, 'model_args': {'in_channels': 3, 'dropout': 0.5, 'num_class': 60, 'edge_importance_weighting': True, 'graph_args': {'strategy': 'spatial', 'layout': 'ntu-rgb+d'}}, 'debug': False, 'pavi_log': False, 'save_result': False, 'config': 'config/st_gcn/ntu-xsub/train.yaml', 'optimizer': 'SGD', 'weights': None, 'num_epoch': 80, 'batch_size': 64, 'show_topk': [1, 5], 'test_batch_size': 64, 'step': [10, 50], 'use_gpu': True, 'phase': 'train', 'print_log': True, 'log_interval': 100, 'feeder': 'feeder.feeder.Feeder', 'start_epoch': 0, 'nesterov': True, 'device': [0], 'save_log': True, 'test_feeder_args': {'data_path': './data/NTU-RGB-D/xsub/val_data.npy', 'label_path': './data/NTU-RGB-D/xsub/val_label.pkl'}, 'train_feeder_args': {'data_path': './data/NTU-RGB-D/xsub/train_data.npy', 'debug': False, 'label_path': './data/NTU-RGB-D/xsub/train_label.pkl'}, 'num_worker': 4}

[05.22.19|12:02:41] Training epoch: 0
Traceback (most recent call last):
  File "main1.py", line 31, in <module>
    p.start()
  File "F:\Suresh\st-gcn\processor\processor.py", line 113, in start
    self.train()
  File "F:\Suresh\st-gcn\processor\recognition.py", line 91, in train
    output = self.model(data)
  File "C:\Users\cudalab10\Anaconda3lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "F:\Suresh\st-gcn\net\st_gcn.py", line 82, in forward
    x, _ = gcn(x, self.A * importance)
  File "C:\Users\cudalab10\Anaconda3lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "F:\Suresh\st-gcn\net\st_gcn.py", line 194, in forward
    x, A = self.gcn(x, A)
  File "C:\Users\cudalab10\Anaconda3lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "F:\Suresh\st-gcn\net\utils\tgcn.py", line 60, in forward
    x = self.conv(x)
  File "C:\Users\cudalab10\Anaconda3lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\cudalab10\Anaconda3lib\site-packages\torch\nn\modules\conv.py", line 320, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 1.37 GiB (GPU 0; 12.00 GiB total capacity; 8.28 GiB already allocated; 652.75 MiB free; 664.38 MiB cached)

๋ฐ์ดํ„ฐ์˜ ๋ฏธ๋‹ˆ ๋ฐฐ์น˜๊ฐ€ GPU ๋ฉ”๋ชจ๋ฆฌ์— ๋งž์ง€ ์•Š๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. ๋ฐฐ์น˜ ํฌ๊ธฐ๋ฅผ ์ค„์ด๋ฉด ๋ฉ๋‹ˆ๋‹ค. cifar10 ๋ฐ์ดํ„ฐ ์„ธํŠธ์— ๋Œ€ํ•ด ๋ฐฐ์น˜ ํฌ๊ธฐ = 256์„ ์„ค์ •ํ•  ๋•Œ ๋™์ผํ•œ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ–ˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฐ ๋‹ค์Œ ๋ฐฐ์น˜ ํฌ๊ธฐ = 128๋กœ ์„ค์ •ํ•˜๋ฉด ํ•ด๊ฒฐ๋ฉ๋‹ˆ๋‹ค.

Yes, @balcilar is right. I reduced the batch size and now it works.

I have a similar issue:

RuntimeError: CUDA out of memory. Tried to allocate 11.88 MiB (GPU 4; 15.75 GiB total capacity; 10.50 GiB already allocated; 1.88 MiB free; 3.03 GiB cached)

8 V100์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ชจ๋ธ์„ ํ›ˆ๋ จํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ํ˜ผ๋ž€์Šค๋Ÿฌ์šด ๋ถ€๋ถ„์€ ์—ฌ์ „ํžˆ โ€‹โ€‹3.03GB๊ฐ€ ์บ์‹œ๋˜์–ด ์žˆ์œผ๋ฉฐ 11.88MB์— ํ• ๋‹นํ•  ์ˆ˜ ์—†๋‹ค๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

๋ฐฐ์น˜ ํฌ๊ธฐ๋ฅผ ๋ณ€๊ฒฝํ–ˆ์Šต๋‹ˆ๊นŒ? ๋ฐฐ์น˜ ํฌ๊ธฐ๋ฅผ ์ ˆ๋ฐ˜์œผ๋กœ ์ค„์ด์‹ญ์‹œ์˜ค. ๋ฐฐ์น˜๋ฅผ ๋งํ•˜๋‹ค
๊ตฌํ˜„ํ•˜๊ธฐ ์œ„ํ•ด ํฌ๊ธฐ๋Š” 16์ž…๋‹ˆ๋‹ค. ๋ฐฐ์น˜ ํฌ๊ธฐ 8์„ ์‚ฌ์šฉํ•ด ๋ณด๊ณ  ์ž‘๋™ํ•˜๋Š”์ง€ ํ™•์ธํ•˜์‹ญ์‹œ์˜ค.

์ฆ๊ธฐ๋‹ค

On Mon, Jun 10, 2019, 2:10 AM magic282 [email protected] wrote:

I have a similar issue:

RuntimeError: CUDA out of memory. Tried to allocate 11.88 MiB (GPU 4; 15.75 GiB total capacity; 10.50 GiB already allocated; 1.88 MiB free; 3.03 GiB cached)

I am training models using 8 V100s. The confusing part is that there is
still 3.03 GB cached and it cannot be allocated for 11.88 MB.

๋ฐฐ์น˜ ํฌ๊ธฐ๋ฅผ ์ค„์ด๋ ค๊ณ  ์‹œ๋„ํ–ˆ์ง€๋งŒ ํšจ๊ณผ๊ฐ€์žˆ์—ˆ์Šต๋‹ˆ๋‹ค. ํ˜ผ๋ž€์Šค๋Ÿฌ์šด ๋ถ€๋ถ„์€ ์บ์‹œ๋œ ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ํ• ๋‹นํ•  ๋ฉ”๋ชจ๋ฆฌ๋ณด๋‹ค ํฌ๋‹ค๋Š” ์˜ค๋ฅ˜ ๋ฉ”์‹œ์ง€์ž…๋‹ˆ๋‹ค.

I get the same problem on a pretrained model when I use predict. So reducing the batch size does not work.

์ตœ์‹  ๋ฒ„์ „์˜ PyTorch๋กœ ์—…๋ฐ์ดํŠธํ•˜๋ฉด ์ด์™€ ๊ฐ™์€ ์˜ค๋ฅ˜๊ฐ€ ๋œ ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์˜ค๋ฅ˜์˜ ์ˆซ์ž๊ฐ€ ํ•ฉ์‚ฐ๋˜์ง€ ์•Š๋Š” ์ด์œ ๋ฅผ ์—ฌ์ญค๋ด๋„ ๋ ๊นŒ์š”?!
๋‚˜๋Š” (์—ฌ๋Ÿฌ๋ถ„ ๋ชจ๋‘์™€ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ) ๋‹ค์Œ์„ ์–ป์Šต๋‹ˆ๋‹ค.
Tried to allocate 20.00 MiB (GPU 0; 1.95 GiB total capacity; 763.17 MiB already allocated; 6.31 MiB free; 28.83 MiB cached)
๋‚˜์—๊ฒŒ ๊ทธ๊ฒƒ์€ ๋‹ค์Œ์ด ๋Œ€๋žต ์‚ฌ์‹ค์ด์–ด์•ผ ํ•จ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.
1.95 (GB total) - 20 (MiB needed) == 763.17 (MiB already used) + 6.31 (MiB free) + 28.83 (MiB cached)
ํ•˜์ง€๋งŒ ๊ทธ๋ ‡์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ๋‚ด๊ฐ€ ๋ฌด์—‡์„ ์ž˜๋ชป ์•Œ๊ณ  ์žˆ์Šต๋‹ˆ๊นŒ?
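One likely reason the numbers don't add up (a sketch using the figures from the message above): the message only reports this process's caching-allocator state, so the CUDA context, memory held by other processes, and fragmentation are invisible in it:

```python
# All figures from the error message above, converted to MiB.
total_mib     = 1.95 * 1024   # "1.95 GiB total capacity"
allocated_mib = 763.17        # "already allocated"
free_mib      = 6.31          # "free" (free to *this* process's allocator)
cached_mib    = 28.83         # "cached"

# The remainder is what the message does not itemize: the CUDA context,
# other processes (visible in nvidia-smi), and allocator fragmentation.
unaccounted_mib = total_mib - (allocated_mib + free_mib + cached_mib)
print(round(unaccounted_mib, 2))  # ~1198.49 MiB
```

So roughly 1.2 GiB of the 1.95 GiB card is consumed outside the numbers the message reports, which is consistent with checking nvidia-smi alongside the error.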

U-net์„ ํ›ˆ๋ จํ•  ๋•Œ๋„ ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ–ˆ์Šต๋‹ˆ๋‹ค. ์บ์‹œ๋Š” ์ถฉ๋ถ„ํ•˜์ง€๋งŒ ์—ฌ์ „ํžˆ ์ถฉ๋Œํ•ฉ๋‹ˆ๋‹ค.

๋‚˜๋Š” ๊ฐ™์€ ์˜ค๋ฅ˜๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค ...
๋Ÿฐํƒ€์ž„ ์˜ค๋ฅ˜: CUDA ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ๋ถ€์กฑํ•ฉ๋‹ˆ๋‹ค. 312.00MiB ํ• ๋‹น ์‹œ๋„(GPU 0, 10.91GiB ์ด ์šฉ๋Ÿ‰, 1.07GiB ์ด๋ฏธ ํ• ๋‹น๋จ, 109.62MiB ์—ฌ์œ  ๊ณต๊ฐ„, 15.21MiB ์บ์‹œ๋จ)

ํฌ๊ธฐ๋ฅผ ์ค„์ด์‹ญ์‹œ์˜ค(๊ฒฐ๊ณผ๋ฅผ ๋ณ€๊ฒฝํ•˜์ง€ ์•Š๋Š” ๋ชจ๋“  ํฌ๊ธฐ)๊ฐ€ ์ž‘๋™ํ•ฉ๋‹ˆ๋‹ค.

ํฌ๊ธฐ๋ฅผ ์ค„์ด์‹ญ์‹œ์˜ค(๊ฒฐ๊ณผ๋ฅผ ๋ณ€๊ฒฝํ•˜์ง€ ์•Š๋Š” ๋ชจ๋“  ํฌ๊ธฐ)๊ฐ€ ์ž‘๋™ํ•ฉ๋‹ˆ๋‹ค.

Hi, I changed batch_size to 1, but it does not work!

๋‹ค๋ฅธ ํฌ๊ธฐ๋กœ ๋ณ€๊ฒฝํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

On Sun, Jul 14, 2019, 9:50 PM Bcw93 [email protected] wrote:

ํฌ๊ธฐ๋ฅผ ์ค„์ด์‹ญ์‹œ์˜ค(๊ฒฐ๊ณผ๋ฅผ ๋ณ€๊ฒฝํ•˜์ง€ ์•Š๋Š” ๋ชจ๋“  ํฌ๊ธฐ)๊ฐ€ ์ž‘๋™ํ•ฉ๋‹ˆ๋‹ค.

์•ˆ๋…•ํ•˜์„ธ์š”, batch_size๋ฅผ 1๋กœ ๋ณ€๊ฒฝํ–ˆ์ง€๋งŒ ์ž‘๋™ํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค!

Getting this error:
RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 7.94 GiB total capacity; 7.33 GiB already allocated; 1.12 MiB free; 40.48 MiB cached)

nvidia-smi
Thu Aug 22 21:05:52 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.40       Driver Version: 430.40       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro M4000        Off  | 00000000:09:00.0  On |                  N/A |
| 46%   37C    P8    12W / 120W |     71MiB /  8126MiB |     10%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 105...  Off  | 00000000:41:00.0  On |                  N/A |
| 29%   33C    P8    N/A /  75W |    262MiB /  4032MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1909      G   /usr/lib/xorg/Xorg                            50MiB |
|    1      1909      G   /usr/lib/xorg/Xorg                           128MiB |
|    1      5236      G   ...quest-channel-token=9884100064965360199   130MiB |
+-----------------------------------------------------------------------------+

OS: Ubuntu 18.04 bionic
Kernel: x86_64 Linux 4.15.0-58-generic
Uptime: 29m
Packages: 2002
Shell: bash 4.4.20
Resolution: 1920x1080 1080x1920
DE: LXDE
WM: OpenBox
GTK Theme: Lubuntu-default [GTK2]
Icon Theme: Lubuntu
Font: Ubuntu 11
CPU: AMD Ryzen Threadripper 2970WX 24-Core @ 48x 3GHz [61.8°C]
GPU: Quadro M4000, GeForce GTX 1050 Ti
RAM: 3194MiB / 64345MiB

์ด๊ฑฐ ๊ณ ์ณ์กŒ๋‚˜์š”? ํฌ๊ธฐ์™€ ๋ฐฐ์น˜ ํฌ๊ธฐ๋ฅผ ๋ชจ๋‘ 1๋กœ ์ค„์˜€์Šต๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์— ๋‹ค๋ฅธ ์†”๋ฃจ์…˜์ด ์—†์ง€๋งŒ ์ด ํ‹ฐ์ผ“์€ ๋‹ซํ˜€ ์žˆ์Šต๋‹ˆ๋‹ค. Cuda 10.1 Windows 10, Pytorch 1.2.0์—์„œ ๋™์ผํ•œ ๋ฌธ์ œ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค.

@hughkf ์ฝ”๋“œ์˜ ์–ด๋””์—์„œ batch_size๋ฅผ ๋ณ€๊ฒฝํ•ฉ๋‹ˆ๊นŒ?

@aidoshacks, it depends on your code. But here is one example. This is one of the notebooks that reliably causes this issue on my machine: https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson3-camvid-tiramisu.ipynb. I change the following line:

bs,size = 8,src_size//2 to bs,size = 1,1 but I still get the out-of-memory problem.

For me, changing the batch_size from 128 to 64 worked, but that doesn't seem like a published solution to me, or am I missing something?

Was this issue solved? I ran into the same problem. I didn't change my code, but after running it many times I get the following error:

"RuntimeError: CUDA out of memory. Tried to allocate 40.00 MiB (GPU 0; 15.77 GiB total capacity; 13.97 GiB already allocated; 256.00 KiB free; 824.57 MiB cached)"

์—ฌ์ „ํžˆ ์ด ๋ฌธ์ œ๊ฐ€ ์žˆ๋Š” ๊ฒฝ์šฐ ์ƒํƒœ๊ฐ€ ํ•ด๊ฒฐ๋˜์ง€ ์•Š์Œ์œผ๋กœ ๋ณ€๊ฒฝ๋˜๋ฉด ์ข‹์„ ๊ฒƒ์ž…๋‹ˆ๋‹ค.

Edit:
It hardly seemed related to batch size, since I saw it with batch size 1. Restarting the kernel fixed it for me and it hasn't happened since.

So what is the solution for examples like the one below (i.e., lots of free memory while trying to allocate very little, which is actually different from some of the examples here)?

RuntimeError: CUDA out of memory. Tried to allocate 1.33 GiB (GPU 1; 31.72 GiB total capacity; 5.68 GiB already allocated; 24.94 GiB free; 5.96 MiB cached)

I don't know why the issue was moved to 'Closed' status, since it still happens on the latest pytorch version (1.2) and the latest NVIDIA GPU (V-100).

Thanks!

In most cases where you get this particular error message from the fastai package, it's because you are using an unusually small GPU. I fixed this issue by restarting my kernel and using a smaller batch size for the path you provide.

์—ฌ๊ธฐ์— ๊ฐ™์€ ๋ฌธ์ œ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. pytorch0.4.1, ๋ฐฐ์น˜ ํฌ๊ธฐ=4๋ฅผ ์‚ฌ์šฉํ•  ๋•Œ ๊ดœ์ฐฎ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ pytorch1.3์œผ๋กœ ๋ณ€๊ฒฝํ•˜๊ณ  ๋ฐฐ์น˜ ํฌ๊ธฐ๋ฅผ 1๋กœ ์„ค์ •ํ•ด๋„ oom ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค.

I solved it by updating my pytorch to the latest version... conda update pytorch

๋ฐ์ดํ„ฐ์˜ ๋ฏธ๋‹ˆ ๋ฐฐ์น˜๊ฐ€ GPU ๋ฉ”๋ชจ๋ฆฌ์— ๋งž์ง€ ์•Š๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. ๋ฐฐ์น˜ ํฌ๊ธฐ๋ฅผ ์ค„์ด๋ฉด ๋ฉ๋‹ˆ๋‹ค. cifar10 ๋ฐ์ดํ„ฐ ์„ธํŠธ์— ๋Œ€ํ•ด ๋ฐฐ์น˜ ํฌ๊ธฐ = 256์„ ์„ค์ •ํ•  ๋•Œ ๋™์ผํ•œ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ–ˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฐ ๋‹ค์Œ ๋ฐฐ์น˜ ํฌ๊ธฐ = 128๋กœ ์„ค์ •ํ•˜๋ฉด ํ•ด๊ฒฐ๋ฉ๋‹ˆ๋‹ค.

๋•๋ถ„์— ์ด ๋ฐฉ๋ฒ•์œผ๋กœ ์˜ค๋ฅ˜๋ฅผ ํ•ด๊ฒฐํ–ˆ์Šต๋‹ˆ๋‹ค.

I reduced the batch_size to 8 and it works fine. The idea is to use a small batch_size.

ํŠน์ • ๋ ˆ์ด์–ด๊ฐ€ ์ฒ˜๋ฆฌํ•˜๋Š” ์ด ์ž…๋ ฅ ํฌ๊ธฐ์— ๋”ฐ๋ผ ๋‹ค๋ฅด๋‹ค๊ณ  ์ƒ๊ฐํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, 256(32x32) ์ด๋ฏธ์ง€์˜ ๋ฐฐ์น˜๊ฐ€ ๋ ˆ์ด์–ด์—์„œ 128๊ฐœ์˜ ํ•„ํ„ฐ๋ฅผ ํ†ต๊ณผํ•˜๋Š” ๊ฒฝ์šฐ ์ด ์ž…๋ ฅ ํฌ๊ธฐ๋Š” 256x32x32x128 = 2^25์ž…๋‹ˆ๋‹ค. ์ด ์ˆซ์ž๋Š” ํŠน์ • ์ž„๊ณ„๊ฐ’๋ณด๋‹ค ๋‚ฎ์•„์•ผ ํ•˜๋ฉฐ ์ด๋Š” ์‹œ์Šคํ…œ์— ๋”ฐ๋ผ ๋‹ค๋ฆ…๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด AWS p3.2xlarge์˜ ๊ฒฝ์šฐ 2^26์ž…๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ CuDA ๋ฉ”๋ชจ๋ฆฌ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ•˜๋Š” ๊ฒฝ์šฐ ๋ฐฐ์น˜ ํฌ๊ธฐ ๋˜๋Š” ํ•„ํ„ฐ ์ˆ˜๋ฅผ ์ค„์ด๊ฑฐ๋‚˜ ์ŠคํŠธ๋ผ์ด๋“œ ๋˜๋Š” ํ’€๋ง ๋ ˆ์ด์–ด์™€ ๊ฐ™์€ ๋” ๋งŽ์€ ๋‹ค์šด์ƒ˜ํ”Œ๋ง์„ ๋„ฃ์–ด๋ณด์‹ญ์‹œ์˜ค.

๊ฐ™์€ ๋ฌธ์ œ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค:
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.93 GiB total capacity; 0 bytes already allocated; 3.83 GiB free; 0 bytes cached)
Latest pytorch (1.3) and cuda (10.1) versions. nvidia-smi also shows the GPU half empty, so the amount of free memory in the error message is accurate. Can't reproduce it with simple code yet.

์ปค๋„ ์žฌ์„ค์ •๋„ ์ €์—๊ฒŒ ํšจ๊ณผ์ ์ด์—ˆ์Šต๋‹ˆ๋‹ค! ๋‚ด๊ฐ€ ํ•  ๋•Œ๊นŒ์ง€ ๋ฐฐ์น˜ ํฌ๊ธฐ = 1๋กœ๋„ ์ž‘๋™ํ•˜์ง€ ์•Š์•˜์Šต๋‹ˆ๋‹ค.

Guys, I solved my problem by reducing the batch size by half.

RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 3.95 GiB total capacity; 0 bytes already allocated; 2.02 GiB free; 0 bytes cached)

Fixed after rebooting.

The problem was solved by changing the batch_size from 64 (rtx 2080 ti) to 32 (rtx 2060). But I'd like to know other ways of solving this kind of problem.
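One common alternative (my addition, not from this thread) is gradient accumulation: run several micro-batches that do fit, scale each mean gradient by micro/full batch size, and sum. For a loss averaged over the batch this reproduces the full-batch gradient exactly (batch-statistics layers like BatchNorm aside), as a tiny one-parameter least-squares check shows:

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
w = 0.5

def mean_grad(pairs):
    # d/dw of mean((w*x - y)^2) over the given samples
    return sum(2 * x * (w * x - y) for x, y in pairs) / len(pairs)

data = list(zip(xs, ys))
full_grad = mean_grad(data)          # gradient of the full batch of 4

micro = 2                            # micro-batch size that *does* fit in memory
accum_grad = sum(mean_grad(data[i:i + micro]) * (micro / len(data))
                 for i in range(0, len(data), micro))

print(abs(full_grad - accum_grad) < 1e-12)  # True: same update, smaller peak memory
```

In PyTorch the equivalent pattern is to call backward() on each scaled micro-batch loss and only step/zero the optimizer every few iterations, since backward() accumulates into .grad.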

์ด๊ฒƒ์€ ๋‚ด๊ฐ€ ์˜ˆ์ธก์„ ํ•  ๋•Œ ๋‚˜์—๊ฒŒ ์ผ์–ด๋‚˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค!
๋ฐฐ์น˜ ํฌ๊ธฐ๋ฅผ 1024์—์„œ 8๋กœ ๋ณ€๊ฒฝํ–ˆ๋Š”๋ฐ ํ…Œ์ŠคํŠธ ์„ธํŠธ์˜ 82%๊ฐ€ ํ‰๊ฐ€๋  ๋•Œ ์—ฌ์ „ํžˆ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค.

The problem was solved when I added with torch.no_grad():

test_loader = init_data_loader(X_test, y_test, torch.device('cpu'), batch_size, num_workers=0)

print("Starting inference ...")
result = []
model.eval()
valid_loss = 0

with torch.no_grad():
    for batch_x, batch_y in tqdm(test_loader):
        batch_x, batch_y = batch_x.to(device), batch_y.to(device)
        output = model(batch_x)
        result.extend(output[:, 0, 0])
        loss = torch.sqrt(criterion(output, batch_y))
        valid_loss += loss.item()  # accumulate a Python float so no CUDA tensors are kept alive

valid_loss /= len(test_loader)
print("Done!")

I solved the problem by changing

loader = DataLoader(dataset, batch_size=128, shuffle=True, num_workers=4)
to
loader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=4)

๋‚˜๋Š” ๊ฐ™์€ ๋ฌธ์ œ๊ฐ€ ์žˆ์—ˆ๊ณ  ๋‚ด ์ปดํ“จํ„ฐ์˜ GPU ์‚ฌ์šฉ๋ฅ ์„ ํ™•์ธํ–ˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฏธ ๋งŽ์ด ์‚ฌ์šฉ๋˜์—ˆ๊ณ  ๋งค์šฐ ์ ์€ ์–‘์˜ ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ๋‚จ์•˜์Šต๋‹ˆ๋‹ค. jupyter ๋…ธํŠธ๋ถ์„ ์ข…๋ฃŒํ•˜๊ณ  ๋‹ค์‹œ ์‹œ์ž‘ํ–ˆ์Šต๋‹ˆ๋‹ค. ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ์ž์œ ๋กœ์›Œ์ง€๊ณ  ์ž‘์—…์ด ์‹œ์ž‘๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์•„๋ž˜์—์„œ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

nvidia-smi - To check the memory utilization on GPU
ps -ax | grep jupyter - To get PID of jupyter process
sudo kill PID

์ด ๋ฉ”์‹œ์ง€๋„ ๋ฐ›์•˜์Šต๋‹ˆ๋‹ค.

RuntimeError: CUDA out of memory. Tried to allocate 32.75 MiB (GPU 0; 4.93 GiB total capacity; 3.85 GiB already allocated; 29.69 MiB free; 332.48 MiB cached)

It happened when I was trying to run the Fast.ai Lesson1 Pets https://course.fast.ai/ (cell 31).

ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ์˜ ๋ฐฐ์น˜ ํฌ๊ธฐ(bs)๋ฅผ ์ค„์ด์‹ญ์‹œ์˜ค.
๋‹น์‹ ์—๊ฒŒ ํšจ๊ณผ๊ฐ€ ๋ฌด์—‡์ธ์ง€๋ณด์‹ญ์‹œ์˜ค.

๋ฐฐ์น˜ ํฌ๊ธฐ๋ฅผ ์กฐ์ •ํ•˜์ง€ ์•Š๊ณ ๋„ ์ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ์Œ์„ ๋ฐœ๊ฒฌํ–ˆ์Šต๋‹ˆ๋‹ค.

ํ„ฐ๋ฏธ๋„ ๋ฐ ํŒŒ์ด์ฌ ํ”„๋กฌํ”„ํŠธ ์—ด๊ธฐ

import torch
torch.cuda.empty_cache()

Python ์ธํ„ฐํ”„๋ฆฌํ„ฐ๋ฅผ ์ข…๋ฃŒํ•˜๊ณ  ์›๋ž˜ PyTorch ๋ช…๋ น์„ ๋‹ค์‹œ ์‹คํ–‰ํ•˜๋ฉด CUDA ๋ฉ”๋ชจ๋ฆฌ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ•˜์ง€ ์•Š์•„์•ผ ํ•ฉ๋‹ˆ๋‹ค.

I noticed that my machine usually has this problem when it uses too much CPU RAM. So you can try to reduce CPU RAM usage when you want a larger batch size.

I had a similar issue.
Reducing the batch size and restarting the kernel helped me solve it.

์ œ ๊ฒฝ์šฐ์—๋Š” Adam ์˜ตํ‹ฐ๋งˆ์ด์ €๋ฅผ SGD ์˜ตํ‹ฐ๋งˆ์ด์ €๋กœ ๊ต์ฒดํ•ด๋„ ๋™์ผํ•œ ๋ฌธ์ œ๊ฐ€ ํ•ด๊ฒฐ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

Well, in my case I used with torch.no_grad(): (train model), output.to("cpu"), and torch.cuda.empty_cache(), and this issue was solved.

RuntimeError: CUDA out of memory. Tried to allocate 54.00 MiB (GPU 0; 3.95 GiB total capacity; 2.65 GiB already allocated; 39.00 MiB free; 87.29 MiB cached)

์†”๋ฃจ์…˜์„ ์ฐพ์•˜๊ณ  batch_size ๊ฐ’์„ ์ค„์˜€์Šต๋‹ˆ๋‹ค.

์‚ฌ์šฉ์ž ์ง€์ • ๋ฐ์ดํ„ฐ ์„ธํŠธ์—์„œ Darknet53 ๊ฐ€์ค‘์น˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ YOLOv3์„ ํ›ˆ๋ จํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ๋‚ด GPU๋Š” NVIDIA RTX 2080์ด๊ณ  ๋™์ผํ•œ ๋ฌธ์ œ์— ์ง๋ฉดํ–ˆ์Šต๋‹ˆ๋‹ค. ๋ฐฐ์น˜ ํฌ๊ธฐ๋ฅผ ๋ณ€๊ฒฝํ•˜๋ฉด ํ•ด๊ฒฐ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

I get this error during inference time....i'm ru
CUDA out of memory. Tried to allocate 102.00 MiB (GPU 0; 15.78 GiB total capacity; 14.54 GiB already allocated; 48.44 MiB free; 14.67 GiB reserved in total by PyTorch)

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
| N/A   35C    P0    41W / 300W |  16112MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     13978      C
+-----------------------------------------------------------------------------+

๋ฐ์ดํ„ฐ์˜ ๋ฏธ๋‹ˆ ๋ฐฐ์น˜๊ฐ€ GPU ๋ฉ”๋ชจ๋ฆฌ์— ๋งž์ง€ ์•Š๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. ๋ฐฐ์น˜ ํฌ๊ธฐ๋ฅผ ์ค„์ด๋ฉด ๋ฉ๋‹ˆ๋‹ค. cifar10 ๋ฐ์ดํ„ฐ ์„ธํŠธ์— ๋Œ€ํ•ด ๋ฐฐ์น˜ ํฌ๊ธฐ = 256์„ ์„ค์ •ํ•  ๋•Œ ๋™์ผํ•œ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ–ˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฐ ๋‹ค์Œ ๋ฐฐ์น˜ ํฌ๊ธฐ = 128๋กœ ์„ค์ •ํ•˜๋ฉด ํ•ด๊ฒฐ๋ฉ๋‹ˆ๋‹ค.

Thank you, you were right.

This is a particular case where there is enough GPU memory but the error still occurs. In my case, I solved it by reducing the number of workers in the data loader.

Background

py36, pytorch1.4, tf2.0, conda
fine-tuning RoBERTa

Problem

Same problem as @EMarquer: pycharm shows there is still enough memory, but the memory allocation fails with out of memory.

What I tried

  1. "batch_size = 1" failed
  2. "torch.cuda.empty_cache()" failed
  3. CUDA_VISIBLE_DEVICES="0" python Run.py failed
  4. I don't use jupyter, so there was no kernel to restart

What worked

  1. nvidia-smi
    [image]
    [image]
  2. The truth is that what pycharm shows differs from what "nvidia-smi" shows (sorry, I didn't save the pycharm screenshot); there actually is not enough memory.
  3. Processes 6123 and 32644 had been started earlier from a terminal.
  4. sudo kill -9 6123
  5. sudo kill -9 32644

What simply worked for me:

import gc 

# Your code with pytorch using GPU

gc.collect() 

๋ฐฐ์น˜ ํฌ๊ธฐ๋ฅผ ์กฐ์ •ํ•˜์ง€ ์•Š๊ณ ๋„ ์ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ์Œ์„ ๋ฐœ๊ฒฌํ–ˆ์Šต๋‹ˆ๋‹ค.

ํ„ฐ๋ฏธ๋„ ๋ฐ ํŒŒ์ด์ฌ ํ”„๋กฌํ”„ํŠธ ์—ด๊ธฐ

import torch
torch.cuda.empty_cache()

Python ์ธํ„ฐํ”„๋ฆฌํ„ฐ๋ฅผ ์ข…๋ฃŒํ•˜๊ณ  ์›๋ž˜ PyTorch ๋ช…๋ น์„ ๋‹ค์‹œ ์‹คํ–‰ํ•˜๋ฉด CUDA ๋ฉ”๋ชจ๋ฆฌ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ•˜์ง€ ์•Š์•„์•ผ ํ•ฉ๋‹ˆ๋‹ค.

๋‚ด ๊ฒฝ์šฐ์—๋Š” ๋‚ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•ฉ๋‹ˆ๋‹ค.

Make sure you are using the GPU in slot 0 with --device_ids 0.

I know I'm butchering the terminology, but it worked. I assume that if you don't select an id, it uses the CPU instead of the GPU.
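A hedged sketch of that idea: selecting the device explicitly and falling back to CPU makes it obvious which device the model actually runs on (`--device_ids` is this commenter's own script flag, not a PyTorch option; the model below is a placeholder):

```python
import torch

# Pick GPU 0 when available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(8, 2).to(device)
x = torch.randn(4, 8).to(device)
print(model(x).device)  # confirms where the computation really happened
```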

๋™์ผํ•œ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค.
๋Ÿฐํƒ€์ž„ ์˜ค๋ฅ˜: CUDA ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ๋ถ€์กฑํ•ฉ๋‹ˆ๋‹ค. 4.84GiB ํ• ๋‹น ์‹œ๋„(GPU 0, 7.44GiB ์ด ์šฉ๋Ÿ‰, 5.22GiB ์ด๋ฏธ ํ• ๋‹น๋จ, 1.75GiB ์—ฌ์œ  ๊ณต๊ฐ„, 18.51MiB ์บ์‹œ๋จ)

Restarting the cluster or changing the batch size works, but I don't like this solution. I even tried torch.cuda.empty_cache(), but it did not work for me. Is there another, more efficient way to solve this?

I don't know whether my scenario is related to the original issue, but I solved my problem (the OOM error in the previous message disappeared) by decomposing the nn.Sequential layers in my model, i.e.

self.input_layer = nn.Sequential(
    nn.Conv3d(num_channels, 32, kernel_size=3, stride=1, padding=0),
    nn.BatchNorm3d(32),
    nn.ReLU()
)

output = self.input_layer(x)

to

self.input_conv = nn.Conv3d(num_channels, 32, kernel_size=3, stride=1, padding=0)
self.input_bn = nn.BatchNorm3d(32)

output = F.relu(self.input_bn(self.input_conv(x)))

๋‚ด ๋ชจ๋ธ์—๋Š” ์ด ์ค‘ ํ›จ์”ฌ ๋” ๋งŽ์€ ๊ฒƒ์ด ์žˆ์Šต๋‹ˆ๋‹ค(์ •ํ™•ํžˆ 5๊ฐœ ์ด์ƒ). nn.Sequential์„ ์‚ฌ์šฉํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๊นŒ? ์•„๋‹ˆ๋ฉด ์ด๊ฒƒ์ด ๋ฒ„๊ทธ์ž…๋‹ˆ๊นŒ? @yf225 @fmassa

I seem to have solved a similar error, but in the opposite direction to you.
I changed all of

self.input_conv = nn.Conv3d(num_channels, 32, kernel_size=3, stride=1, padding=0)
self.input_bn = nn.BatchNorm3d(32)

output = F.relu(self.input_bn(self.input_conv(x)))

to

self.input_layer = nn.Sequential(
    nn.Conv3d(num_channels, 32, kernel_size=3, stride=1, padding=0),
    nn.BatchNorm3d(32),
    nn.ReLU()
)

output = self.input_layer(x)

For me, neither changing batch_size nor the given solutions helped. But it turned out that my .cfg file had a wrong class value and a wrong filter count for one layer. So if nothing else helps, double-check your .cfg.

ํ„ฐ๋ฏธ๋„ ์—ด๊ธฐ

์ฒซ ๋ฒˆ์งธ ์œ ํ˜•
์—”๋น„๋””์•„-smi

๊ทธ๋Ÿฐ ๋‹ค์Œ python ๋˜๋Š” anaconda ๊ฒฝ๋กœ์— ํ•ด๋‹นํ•˜๋Š” PID๋ฅผ ์„ ํƒํ•˜๊ณ  ์ž‘์„ฑํ•˜์‹ญ์‹œ์˜ค.
sudo kill -9 PID

๋‚˜๋Š”์ด ๋ฒ„๊ทธ๋ฅผ ํ•œ๋™์•ˆ ๊ฒช์—ˆ์Šต๋‹ˆ๋‹ค. ์ €์—๊ฒŒ๋Š” ๋ชจ๋ธ ๊ฒฐ๊ณผ๋ฅผ ์ฐธ์กฐํ•˜๋Š” python ๋ณ€์ˆ˜(์˜ˆ: ํ† ์น˜ ํ…์„œ)๋ฅผ ๊ณ„์† ๋ณด์œ ํ•˜๊ณ  ์žˆ์œผ๋ฏ€๋กœ ์ฝ”๋“œ๊ฐ€ ์—ฌ์ „ํžˆ ์•ก์„ธ์Šคํ•  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ ์•ˆ์ „ํ•˜๊ฒŒ ํ•ด์ œํ•  ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค.

My code looked like this:

predictions = []
for batch in dataloader:
     p = model(batch.to(torch.device("cuda:0")))
     predictions.append(p)

์ด์— ๋Œ€ํ•œ ์ˆ˜์ •์€ p ๋ฅผ ๋ชฉ๋ก์œผ๋กœ ์ „์†กํ•˜๋Š” ๊ฒƒ์ด์—ˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์ฝ”๋“œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์•„์•ผ ํ•ฉ๋‹ˆ๋‹ค.

predictions = []
for batch in dataloader:
     p = model(batch.to(torch.device("cuda:0")))
     predictions.append(p.tolist())

This way, predictions keeps the values in main memory rather than as tensors on the GPU.
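The effect above can be checked on any machine: appending the raw output keeps the tensor (and its autograd graph) alive, while `.tolist()` stores plain host-side values. The model and batches below are stand-ins:

```python
import torch

model = torch.nn.Linear(8, 2)
dataloader = [torch.randn(4, 8) for _ in range(3)]  # stand-in batches

predictions = []
for batch in dataloader:
    p = model(batch)                # a live tensor; on GPU it would pin memory
    predictions.append(p.tolist())  # plain Python floats, no tensor retained

print(len(predictions), len(predictions[0]))  # 3 batches of 4 rows each
```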

pytorch์— ์˜์กดํ•˜๋Š” fastai.vision ๋ชจ๋“ˆ์„ ์‚ฌ์šฉํ•˜์—ฌ ์ด ๋ฒ„๊ทธ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ๋‚˜๋Š” CUDA 10.1์„ ์‚ฌ์šฉํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค

training_args = TrainingArguments(
    output_dir="./",
    overwrite_output_dir=True,
    num_train_epochs=5,
    per_gpu_train_batch_size=4,  # 4 and 8 fit; 16 runs out of memory
    save_steps=10_000,
    save_total_limit=2,
)

Reducing per_gpu_train_batch_size from 16 to 8 solved my problem.

์ตœ์‹  ๋ฒ„์ „์˜ PyTorch๋กœ ์—…๋ฐ์ดํŠธํ•˜๋ฉด ์ด์™€ ๊ฐ™์€ ์˜ค๋ฅ˜๊ฐ€ ๋œ ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์ •๋ง, ์™œ ๊ทธ๋ ‡๊ฒŒ ๋งํ•ฉ๋‹ˆ๊นŒ

์ด ๋ฌธ์ œ์˜ ์ฃผ์š” ์งˆ๋ฌธ์€ ์—ฌ์ „ํžˆ โ€‹โ€‹๋ฏธํ•ด๊ฒฐ ๋ฌธ์ œ์ž…๋‹ˆ๋‹ค. ๋™์ผํ•œ ์ด์ƒํ•œ CUDA ๋ฉ”๋ชจ๋ฆฌ ๋ถ€์กฑ ๋ฉ”์‹œ์ง€๊ฐ€ ๋‚˜ํƒ€๋‚ฉ๋‹ˆ๋‹ค. 4.08GiB์— 2.26GiB๋ฅผ ๋ฌด๋ฃŒ๋กœ ํ• ๋‹นํ•˜๋ ค๊ณ  ํ–ˆ์Šต๋‹ˆ๋‹ค. ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ์ถฉ๋ถ„ํ•ด ๋ณด์ด์ง€๋งŒ ํ• ๋‹น์— ์‹คํŒจํ•ฉ๋‹ˆ๋‹ค.
ํ”„๋กœ์ ํŠธ ์ •๋ณด: ๋ฐฐ์น˜ ํฌ๊ธฐ๊ฐ€ 4์ธ activitynet ๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ ํ†ตํ•ด resnet 10์„ ํ›ˆ๋ จํ•˜๋ฉด ์ฒซ ๋ฒˆ์งธ epoch์˜ ๋งˆ์ง€๋ง‰์—์„œ ์‹คํŒจํ•ฉ๋‹ˆ๋‹ค.
ํŽธ์ง‘๋จ: ์ผ๋ถ€ ์ธ์‹: RAM ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์ฒญ์†Œํ•˜๊ณ  Python ์ฝ”๋“œ๋งŒ ๊ณ„์† ์‹คํ–‰ํ•˜๋ฉด ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. GPU์— ์ถฉ๋ถ„ํ•œ ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ์žˆ์ง€๋งŒ RAM ๋ฉ”๋ชจ๋ฆฌ๋Š” ๋‹ค๋ฅธ ๋ชจ๋“  ์ฒ˜๋ฆฌ ๋‹จ๊ณ„๋ฅผ ์ฒ˜๋ฆฌํ•  ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค.
์ปดํ“จํ„ฐ ์ •๋ณด: Dell G5 - i7 9th - GTX 1660Ti 6GB - 16GB RAM
EDITED2: 4๋ช…์˜ ์ž‘์—…์ž์™€ ํ•จ๊ป˜ "_MultiProcessingDataLoaderIter"๋ฅผ ์‚ฌ์šฉํ•˜๊ณ  ์žˆ์—ˆ๋Š”๋ฐ ์ „๋‹ฌ ํ˜ธ์ถœ์—์„œ ๋ฉ”๋ชจ๋ฆฌ ๋ถ€์กฑ ๋ฉ”์‹œ์ง€๊ฐ€ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค. ์ž‘์—…์ž ์ˆ˜๋ฅผ 1๋กœ ์ค„์ด๋ฉด ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ์ž‘์—…์ž๊ฐ€ 1์ธ ๊ฒฝ์šฐ ๋žจ ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ๋Ÿ‰์€ 11/16GB๋กœ ์œ ์ง€๋˜๊ณ  4์ธ ๊ฒฝ์šฐ 14.5/16GB๋กœ ์ฆ๊ฐ€ํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์‹ค์ œ๋กœ 1๋ช…์˜ ์ž‘์—…์ž๋กœ ๋ฐฐ์น˜ ํฌ๊ธฐ๋ฅผ 32๋กœ ๋Š˜๋ฆด ์ˆ˜ ์žˆ๊ณ  GPU ๋ฉ”๋ชจ๋ฆฌ๋ฅผ 3.5GB/6GB๋กœ ๋†’์ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

RuntimeError: CUDA out of memory. Tried to allocate 2.26 GiB (GPU 0; 6.00 GiB total capacity; 209.63 MiB already allocated; 4.08 GiB free; 246.00 MiB reserved in total by PyTorch)

Full error message

์—ญ์ถ”์ (๊ฐ€์žฅ ์ตœ๊ทผ ํ˜ธ์ถœ ๋งˆ์ง€๋ง‰):
ํŒŒ์ผ "main.py", 450ํ–‰,
opt.distributed์ธ ๊ฒฝ์šฐ:
main_worker์˜ ํŒŒ์ผ "main.py", 409ํ–‰
opt.device, current_lr, train_logger,
"D:\Guilherme\Google Drive\Profissional\Cursos\Mestrado\Pesquisa\HMDB51\training.py" ํŒŒ์ผ, 37ํ–‰, train_epoch
์ถœ๋ ฅ = ๋ชจ๋ธ(์ž…๋ ฅ)
ํŒŒ์ผ "D:\Guilherme\Google Drive\Profissional\Cursos\Mestrado\Pesquisa\HMDB51\envlib\site-packages\torch\nn\modules\module.py", 532ํ–‰, __call__
๊ฒฐ๊ณผ = self.forward( ์ž…๋ ฅ, * kwargs)
ํŒŒ์ผ "D:\Guilherme\Google Drive\Profissional\Cursos\Mestrado\Pesquisa\HMDB51\envlib\site-packages\torch\nnparallel\data_parallel.py", 150ํ–‰, ์•ž์œผ๋กœ
๋ฐ˜ํ™˜ self.module( ์ž…๋ ฅ[0], * kwargs[0])
ํŒŒ์ผ "D:\Guilherme\Google Drive\Profissional\Cursos\Mestrado\Pesquisa\HMDB51\envlib\site-packages\torch\nn\modules\module.py", 532ํ–‰, __call__
๊ฒฐ๊ณผ = self.forward( ์ž…๋ ฅ, * kwargs)
ํŒŒ์ผ "D:\Guilherme\Google Drive\Profissional\Cursos\Mestrado\Pesquisa\HMDB51\models\resnet.py", 205ํ–‰, ์•ž์œผ๋กœ
x = self.layer3(x)
ํŒŒ์ผ "D:\Guilherme\Google Drive\Profissional\Cursos\Mestrado\Pesquisa\HMDB51\envlib\site-packages\torch\nn\modules\module.py", 532ํ–‰, __call__
๊ฒฐ๊ณผ = self.forward( ์ž…๋ ฅ, * kwargs)
ํŒŒ์ผ "D:\Guilherme\Google Drive\Profissional\Cursos\Mestrado\Pesquisa\HMDB51\envlib\site-packages\torch\nn\modules\container.py", 100ํ–‰, ์•ž์œผ๋กœ
์ž…๋ ฅ = ๋ชจ๋“ˆ(์ž…๋ ฅ)
ํŒŒ์ผ "D:\Guilherme\Google Drive\Profissional\Cursos\Mestrado\Pesquisa\HMDB51\envlib\site-packages\torch\nn\modules\module.py", 532ํ–‰, __call__
๊ฒฐ๊ณผ = self.forward( ์ž…๋ ฅ, * kwargs)
ํŒŒ์ผ "D:\Guilherme\Google Drive\Profissional\Cursos\Mestrado\Pesquisa\HMDB51\models\resnet.py", 51ํ–‰, ์•ž์œผ๋กœ
out = self.conv2(out)
ํŒŒ์ผ "D:\Guilherme\Google Drive\Profissional\Cursos\Mestrado\Pesquisa\HMDB51\envlib\site-packages\torch\nn\modules\module.py", 532ํ–‰, __call__
๊ฒฐ๊ณผ = self.forward( ์ž…๋ ฅ, * kwargs)
ํŒŒ์ผ "D:\Guilherme\Google Drive\Profissional\Cursos\Mestrado\Pesquisa\HMDB51\envlib\site-packages\torch\nn\modules\conv.py", 480ํ–‰, ์•ž์œผ๋กœ
self.padding, self.dilation, self.groups)
๋Ÿฐํƒ€์ž„ ์˜ค๋ฅ˜: CUDA ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ๋ถ€์กฑํ•ฉ๋‹ˆ๋‹ค. 2.26GiB ํ• ๋‹น ์‹œ๋„(GPU 0, 6.00GiB ์ด ์šฉ๋Ÿ‰, 209.63MiB ์ด๋ฏธ ํ• ๋‹น๋จ, 4.08GiB ์‚ฌ์šฉ ๊ฐ€๋Šฅ, 246.00MiB ์˜ˆ์•ฝ๋จ)
PyTorch์— ์˜ํ•ด ์ด๊ณ„)

(screenshots omitted)

A smaller batch size works.

๋‚˜๋Š”์ด ๋ฒ„๊ทธ๋ฅผ ํ•œ๋™์•ˆ ๊ฒช์—ˆ์Šต๋‹ˆ๋‹ค. ์ €์—๊ฒŒ๋Š” ๋ชจ๋ธ ๊ฒฐ๊ณผ๋ฅผ ์ฐธ์กฐํ•˜๋Š” python ๋ณ€์ˆ˜(์˜ˆ: ํ† ์น˜ ํ…์„œ)๋ฅผ ๊ณ„์† ๋ณด์œ ํ•˜๊ณ  ์žˆ์œผ๋ฏ€๋กœ ์ฝ”๋“œ๊ฐ€ ์—ฌ์ „ํžˆ ์•ก์„ธ์Šคํ•  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ ์•ˆ์ „ํ•˜๊ฒŒ ํ•ด์ œํ•  ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค.

๋‚ด ์ฝ”๋“œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

predictions = []
for batch in dataloader:
     p = model(batch.to(torch.device("cuda:0")))
     predictions.append(p)

์ด์— ๋Œ€ํ•œ ์ˆ˜์ •์€ p ๋ฅผ ๋ชฉ๋ก์œผ๋กœ ์ „์†กํ•˜๋Š” ๊ฒƒ์ด์—ˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์ฝ”๋“œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์•„์•ผ ํ•ฉ๋‹ˆ๋‹ค.

predictions = []
for batch in dataloader:
     p = model(batch.to(torch.device("cuda:0")))
     predictions.append(p.tolist())

์ด๋ ‡๊ฒŒ ํ•˜๋ฉด predictions ๊ฐ€ GPU์˜ ํ…์„œ๊ฐ€ ์•„๋‹ˆ๋ผ ์ฃผ ๋ฉ”๋ชจ๋ฆฌ์— ๊ฐ’์„ ์œ ์ง€ํ•ฉ๋‹ˆ๋‹ค.

@abdelrahmanhosny Thanks for pointing this out. I faced the same problem in PyTorch 1.5.0: I had no OOM issues during training, but during inference I kept holding Python variables (i.e. torch tensors) referencing the model output in memory, so the GPU ran out of memory after a certain number of batches.

In my case, however, sending the predictions to a list did not work, since I generate images with the network. I had to do:

predictions.append(p.detach().cpu().numpy()) 

That solved the problem!

์ผ๋ฐ˜์ ์ธ ํ•ด๊ฒฐ์ฑ…์ด ์žˆ์Šต๋‹ˆ๊นŒ?

CUDA ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ๋ถ€์กฑํ•ฉ๋‹ˆ๋‹ค. 196.00MiB ํ• ๋‹น ์‹œ๋„(GPU 0, 2.00GiB ์ด ์šฉ๋Ÿ‰, 359.38MiB ์ด๋ฏธ ํ• ๋‹น๋จ, 192.29MiB ์—ฌ์œ  ๊ณต๊ฐ„, 152.37MiB ์บ์‹œ๋จ)

์ผ๋ฐ˜์ ์ธ ํ•ด๊ฒฐ์ฑ…์ด ์žˆ์Šต๋‹ˆ๊นŒ?

CUDA ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ๋ถ€์กฑํ•ฉ๋‹ˆ๋‹ค. 196.00MiB ํ• ๋‹น ์‹œ๋„(GPU 0, 2.00GiB ์ด ์šฉ๋Ÿ‰, 359.38MiB ์ด๋ฏธ ํ• ๋‹น๋จ, 192.29MiB ์—ฌ์œ  ๊ณต๊ฐ„, 152.37MiB ์บ์‹œ๋จ)

๋‚˜๋Š”์ด ๋ฒ„๊ทธ๋ฅผ ํ•œ๋™์•ˆ ๊ฒช์—ˆ์Šต๋‹ˆ๋‹ค. ์ €์—๊ฒŒ๋Š” ๋ชจ๋ธ ๊ฒฐ๊ณผ๋ฅผ ์ฐธ์กฐํ•˜๋Š” python ๋ณ€์ˆ˜(์˜ˆ: ํ† ์น˜ ํ…์„œ)๋ฅผ ๊ณ„์† ๋ณด์œ ํ•˜๊ณ  ์žˆ์œผ๋ฏ€๋กœ ์ฝ”๋“œ๊ฐ€ ์—ฌ์ „ํžˆ ์•ก์„ธ์Šคํ•  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ ์•ˆ์ „ํ•˜๊ฒŒ ํ•ด์ œํ•  ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค.
๋‚ด ์ฝ”๋“œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

predictions = []
for batch in dataloader:
     p = model(batch.to(torch.device("cuda:0")))
     predictions.append(p)

์ด์— ๋Œ€ํ•œ ์ˆ˜์ •์€ p ๋ฅผ ๋ชฉ๋ก์œผ๋กœ ์ „์†กํ•˜๋Š” ๊ฒƒ์ด์—ˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์ฝ”๋“œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์•„์•ผ ํ•ฉ๋‹ˆ๋‹ค.

predictions = []
for batch in dataloader:
     p = model(batch.to(torch.device("cuda:0")))
     predictions.append(p.tolist())

์ด๋ ‡๊ฒŒ ํ•˜๋ฉด predictions ๊ฐ€ GPU์˜ ํ…์„œ๊ฐ€ ์•„๋‹ˆ๋ผ ์ฃผ ๋ฉ”๋ชจ๋ฆฌ์— ๊ฐ’์„ ์œ ์ง€ํ•ฉ๋‹ˆ๋‹ค.

@abdelrahmanhosny ์ง€์ ํ•ด ์ฃผ์…”์„œ ๊ฐ์‚ฌํ•ฉ๋‹ˆ๋‹ค. ๋‚˜๋Š” PyTorch 1.5.0์—์„œ ๋˜‘๊ฐ™์€ ๋ฌธ์ œ์— ์ง๋ฉดํ–ˆ๊ณ  ํ›ˆ๋ จ ์ค‘์— OOM ๋ฌธ์ œ๊ฐ€ ์—†์—ˆ์ง€๋งŒ ์ถ”๋ก ํ•˜๋Š” ๋™์•ˆ ๋ฉ”๋ชจ๋ฆฌ์— ๋ชจ๋ธ ๊ฒฐ๊ณผ๋ฅผ ์ฐธ์กฐํ•˜๋Š” ํŒŒ์ด์ฌ ๋ณ€์ˆ˜(์˜ˆ: ํ† ์น˜ ํ…์„œ)๋ฅผ ๊ณ„์† ์œ ์ง€ํ•˜์—ฌ GPU๊ฐ€ ๋ฉ”๋ชจ๋ฆฌ ๋ถ€์กฑ์„ ์ดˆ๋ž˜ํ–ˆ์Šต๋‹ˆ๋‹ค. ์ผ์ • ์ˆ˜์˜ ๋ฐฐ์น˜ ํ›„.

๊ทธ๋Ÿฌ๋‚˜ ์ œ ๊ฒฝ์šฐ์—๋Š” ๋„คํŠธ์›Œํฌ๋กœ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•  ๋•Œ ์˜ˆ์ธก์„ ๋ชฉ๋ก์œผ๋กœ ์ „์†กํ•˜๋Š” ๊ฒƒ์ด ์ž‘๋™ํ•˜์ง€ ์•Š์•˜์œผ๋ฏ€๋กœ ๋‹ค์Œ์„ ์ˆ˜ํ–‰ํ•ด์•ผ ํ–ˆ์Šต๋‹ˆ๋‹ค.

predictions.append(p.detach().cpu().numpy()) 

๊ทธ๋Ÿฌ๋ฉด ๋ฌธ์ œ๊ฐ€ ํ•ด๊ฒฐ๋˜์—ˆ์Šต๋‹ˆ๋‹ค!

I have the same problem with a ParallelWaveGAN model, and I used the solution from #16417, but it does not work:

y = self.model_gan(*x).view(-1).detach().cpu().numpy()
gc.collect()
torch.cuda.empty_cache()

I had the same problem during training.
Collecting garbage and emptying the CUDA cache after each epoch solved it for me:

gc.collect()
torch.cuda.empty_cache()
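To make the placement concrete, here is a minimal sketch (model, optimizer, and batches are placeholders) of where those two calls go in a training loop:

```python
import gc

import torch

model = torch.nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()
batches = [(torch.randn(4, 8), torch.randn(4, 2)) for _ in range(2)]

for epoch in range(2):
    for x, y in batches:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    # After each epoch: drop unreachable Python objects, then release
    # cached CUDA blocks back to the driver (a no-op on CPU-only machines).
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```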


๊ฐ์‚ฌํ•ฉ๋‹ˆ๋‹ค!! ๋‚˜๋Š” ๊ณ ์–‘์ด์™€ ๊ฐœ ์˜ˆ์ œ๋ฅผ ์‹คํ–‰ํ•˜๋Š” ๋ฐ ๋ฌธ์ œ๊ฐ€ ์žˆ์—ˆ๊ณ  ์ด๊ฒƒ์ด ๋‚˜๋ฅผ ์œ„ํ•ด ์ผํ–ˆ์Šต๋‹ˆ๋‹ค.


๋‚˜์—๊ฒŒ๋„ ๋งˆ์ฐฌ๊ฐ€์ง€

๋ฐฐ์น˜ ํฌ๊ธฐ๋ฅผ ์ค„์ด๊ณ  ์—ํฌํฌ๋ฅผ ๋Š˜๋ฆฝ๋‹ˆ๋‹ค. ๊ทธ๊ฒƒ์ด ๋‚ด๊ฐ€ ํ•ด๊ฒฐํ•œ ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค.

@areebsyed Check the RAM. I had this problem when setting up many workers in parallel.

๋‹จ์ผ epoch๋ฅผ ์™„๋ฃŒํ•˜์ง€ ์•Š๊ณ  Colab์˜ pytorch์—์„œ ์‚ฌ์ „ ํ›ˆ๋ จ๋œ bert2bert EncoderDecoderModel์„ ๋ฏธ์„ธ ์กฐ์ •ํ•˜๋Š” ๋™์•ˆ์—๋„ ๋™์ผํ•œ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค.

RuntimeError: CUDA out of memory. Tried to allocate 96.00 MiB (GPU 0; 15.90 GiB total capacity; 13.77 GiB already allocated; 59.88 MiB free; 14.98 GiB reserved in total by PyTorch)

@Aakash12980 Try reducing the batch size.

@areebsyed Yes, I reduced the batch size to 4 and it worked.

๊ฐ™์€

RuntimeError                              Traceback (most recent call last)
<ipython-input-116-11ebb3420695> in <module>
     28         landmarks = landmarks.view(landmarks.size(0),-1).cuda()
     29 
---> 30         predictions = network(images)
     31 
     32         # clear all the gradients before calculating them

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

<ipython-input-112-174da452c85d> in forward(self, x)
     13         ##out = self.first_conv(x)
     14         x = x.float()
---> 15         out = self.model(x)
     16         return out

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

~/anaconda3/lib/python3.7/site-packages/torchvision/models/resnet.py in forward(self, x)
    218 
    219     def forward(self, x):
--> 220         return self._forward_impl(x)
    221 
    222 

~/anaconda3/lib/python3.7/site-packages/torchvision/models/resnet.py in _forward_impl(self, x)
    204         x = self.bn1(x)
    205         x = self.relu(x)
--> 206         x = self.maxpool(x)
    207 
    208         x = self.layer1(x)

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/pooling.py in forward(self, input)
    157         return F.max_pool2d(input, self.kernel_size, self.stride,
    158                             self.padding, self.dilation, self.ceil_mode,
--> 159                             self.return_indices)
    160 
    161 

~/anaconda3/lib/python3.7/site-packages/torch/_jit_internal.py in fn(*args, **kwargs)
    245             return if_true(*args, **kwargs)
    246         else:
--> 247             return if_false(*args, **kwargs)
    248 
    249     if if_true.__doc__ is None and if_false.__doc__ is not None:

~/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py in _max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode, return_indices)
    574         stride = torch.jit.annotate(List[int], [])
    575     return torch.max_pool2d(
--> 576         input, kernel_size, stride, padding, dilation, ceil_mode)
    577 
    578 max_pool2d = boolean_dispatch(

RuntimeError: CUDA out of memory. Tried to allocate 80.00 MiB (GPU 0; 7.80 GiB total capacity; 1.87 GiB already allocated; 34.69 MiB free; 1.93 GiB reserved in total by PyTorch)

๊ฐ™์€

RuntimeError                              Traceback (most recent call last)
<ipython-input-116-11ebb3420695> in <module>
     28         landmarks = landmarks.view(landmarks.size(0),-1).cuda()
     29 
---> 30         predictions = network(images)
     31 
     32         # clear all the gradients before calculating them

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

<ipython-input-112-174da452c85d> in forward(self, x)
     13         ##out = self.first_conv(x)
     14         x = x.float()
---> 15         out = self.model(x)
     16         return out

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

~/anaconda3/lib/python3.7/site-packages/torchvision/models/resnet.py in forward(self, x)
    218 
    219     def forward(self, x):
--> 220         return self._forward_impl(x)
    221 
    222 

~/anaconda3/lib/python3.7/site-packages/torchvision/models/resnet.py in _forward_impl(self, x)
    204         x = self.bn1(x)
    205         x = self.relu(x)
--> 206         x = self.maxpool(x)
    207 
    208         x = self.layer1(x)

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/pooling.py in forward(self, input)
    157         return F.max_pool2d(input, self.kernel_size, self.stride,
    158                             self.padding, self.dilation, self.ceil_mode,
--> 159                             self.return_indices)
    160 
    161 

~/anaconda3/lib/python3.7/site-packages/torch/_jit_internal.py in fn(*args, **kwargs)
    245             return if_true(*args, **kwargs)
    246         else:
--> 247             return if_false(*args, **kwargs)
    248 
    249     if if_true.__doc__ is None and if_false.__doc__ is not None:

~/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py in _max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode, return_indices)
    574         stride = torch.jit.annotate(List[int], [])
    575     return torch.max_pool2d(
--> 576         input, kernel_size, stride, padding, dilation, ceil_mode)
    577 
    578 max_pool2d = boolean_dispatch(

RuntimeError: CUDA out of memory. Tried to allocate 80.00 MiB (GPU 0; 7.80 GiB total capacity; 1.87 GiB already allocated; 34.69 MiB free; 1.93 GiB reserved in total by PyTorch)

@monajalal Reduce the batch size or the input dimension size.

Then what is the solution for examples like the one below (i.e. lots of _free_ memory and an attempt to allocate far less — unlike _some_ examples in this thread where there really is almost no free memory and nothing is actually wrong)?

RuntimeError: CUDA out of memory. Tried to allocate _1.33 GiB_ (GPU 1; 31.72 GiB total capacity; 5.68 GiB already allocated; _24.94 GiB free_; 5.96 MiB cached)

I don't know why this issue was moved to 'closed', as it still happens on a recent pytorch version (1.2) and a recent NVIDIA GPU (V-100).

๊ฐ์‚ฌ ํ•ด์š”!

์˜ˆ, ๋Œ€๋ถ€๋ถ„์˜ ์‚ฌ๋žŒ๋“ค์ด ๋ฌธ์ œ๊ฐ€ ๋‹จ์ˆœํžˆ OOM์ด ์•„๋‹ˆ๋ผ OOM์ด ์žˆ๋‹ค๋Š” ๊ฒƒ์„ ๊นจ๋‹ซ์ง€ ๋ชปํ•˜๋Š” ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค. ์˜ค๋ฅ˜์—๋Š” ์—ฌ์œ  ๊ณต๊ฐ„์ด ์ถฉ๋ถ„ํ•˜๋‹ค๋Š” ์˜ค๋ฅ˜๊ฐ€ ํ‘œ์‹œ๋ฉ๋‹ˆ๋‹ค. Windows์—์„œ๋„ ์ด ๋ฌธ์ œ์— ์ง๋ฉดํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ํ•ด๊ฒฐ์ฑ…์„ ์ฐพ์œผ์…จ์Šต๋‹ˆ๊นŒ?

์ด ํŽ˜์ด์ง€๊ฐ€ ๋„์›€์ด ๋˜์—ˆ๋‚˜์š”?
0 / 5 - 0 ๋“ฑ๊ธ‰
