Pytorch: CPU memory gradually leaks when num_workers > 0 in the DataLoader

Created on 29 Oct 2018  ·  79 comments  ·  Source: pytorch/pytorch

Editor's note: The known workaround for this issue is to avoid Python lists and to use numpy arrays or tensors directly instead.

🐛 Bug

CPU memory leaks when the DataLoader num_workers > 0.

To Reproduce

Run the following snippet:

from torch.utils.data import Dataset, DataLoader
from PIL import Image
from torchvision import transforms
import os

class DataIter(Dataset):
    def __init__(self):
        path = "path/to/data"
        self.data = []

        for cls in os.listdir(path):
            for img in os.listdir(os.path.join(path, cls)):
                self.data.append(os.path.join(path, cls, img))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        with Image.open(self.data[idx]) as img:
            img = img.convert('RGB')
            return transforms.functional.to_tensor(img)


train_data = DataIter()
train_loader = DataLoader(train_data, batch_size=300,
                          shuffle=True,
                          drop_last=True,
                          pin_memory=False,
                          num_workers=18)

for i, item in enumerate(train_loader):
    if i % 200 == 0:
        print(i)

Expected behavior

CPU memory gradually increases and eventually fills up the whole RAM. For example, the process starts at around 15GB and fills up the entire 128GB available on the system.
When num_workers=0, RAM usage is constant.

Environment

PyTorch version: 1.0.0.dev20181028
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Ubuntu 16.04.4 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
CMake version: version 3.5.1

Python version: 3.5
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration: 
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
GPU 2: GeForce GTX 1080 Ti

Nvidia driver version: 390.67
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.1.4

Versions of relevant libraries:
[pip] Could not collect
[conda] Could not collect

PIL.__version__
'5.3.0'

Additional info

The dataset contains around 24 million images, and all image paths are loaded into a single list as shown in the code snippet above.

I have also tried multiple Pytorch versions (0.4.0 and 0.4.1) and the effect is the same.

cc @ezyang @gchanan @zou3519 @SsnL

high priority dataloader memory usage molly-guard multiprocessing triaged

All 79 comments

Do you see memory usage increasing when iterating, or before you even start to iterate?

@SsnL During the iteration only.

When we fix #13243 we should check if this one gets fixed too.

I have been experiencing something similar: memory usage climbs continuously until an OOM is triggered when using a batch_sampler with num_workers>0.

To Reproduce

import math

from torch.utils.data import DataLoader


class Sampler:
    def __init__(self, n=100000, batch_size=32):
        self.n = n
        self.batch_size = batch_size

    def __len__(self):
        return math.ceil(float(self.n)/self.batch_size)

    def __iter__(self):
        batch = []
        for i in range(self.n):
            batch.append(i)
            if len(batch) == self.batch_size:
                yield batch
                batch = []
        if batch:
            yield batch


N = 100000000
train_data = list(range(N))


def ok():
    train_sampler = Sampler(len(train_data))
    train_loader = DataLoader(train_data,
                              num_workers=0,
                              batch_sampler=train_sampler)

    for i, item in enumerate(train_loader):
        if i % 10000 == 0:
            print(i)


def leaky():
    train_sampler = Sampler(len(train_data))
    train_loader = DataLoader(train_data,
                              num_workers=8,
                              batch_sampler=train_sampler)

    for i, item in enumerate(train_loader):
        if i % 10000 == 0:
            print(i)


print('Starting ok')
ok()
print('ok done, starting leaky()')
leaky()
print('leaky done')

Environment

$ python3 collect_env.py
Collecting environment information...
PyTorch version: 0.4.0
Is debug build: No
CUDA used to build PyTorch: 9.1.85

OS: Ubuntu 16.04.5 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
CMake version: version 3.5.1

Python version: 3.5
Is CUDA available: Yes
CUDA runtime version: 9.1.85
GPU models and configuration: GPU 0: GeForce GTX 1050 Ti with Max-Q Design
Nvidia driver version: 390.77
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.1.2
/usr/lib/x86_64-linux-gnu/libcudnn_static_v7.a

Versions of relevant libraries:
[pip] Could not collect
[conda] Could not collect

@ezyang

When we fix #13243 we should check if this one gets fixed too.

This issue is still present in 1.0.0.dev20181105, where #13243 is fixed.

After some more investigation, I have found the exact scenario in which the leak occurs. Consider the code example below:

from torch.utils.data import Dataset, DataLoader
import numpy as np
import torch


class DataIter(Dataset):
    def __init__(self):
        self.data_np = np.array([x for x in range(24000000)])
        self.data = [x for x in range(24000000)]

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        data = self.data[idx]
        data = np.array([data], dtype=np.int64)
        return torch.tensor(data)


train_data = DataIter()
train_loader = DataLoader(train_data, batch_size=300,
                          shuffle=True,
                          drop_last=True,
                          pin_memory=False,
                          num_workers=18)

for i, item in enumerate(train_loader):
    if i % 1000 == 0:
        print(i)

If the self.data variable, which is a standard Python list of ints, is used, the leak occurs. If the self.data_np variable is used instead, which holds the same data in the form of a Numpy array, the leak does not occur.
Another observation is that the leak is much less severe when shuffle=False in the DataLoader.

I faced a similar issue, but in my case it occurs with numpy arrays too. I am using Python 3.7 and the PyTorch nightly release.

I don't know how multiprocessing really works under the hood of pytorch, but we have discussed this "memory leak" issue (which probably isn't a memory leak!) extensively on the fast.ai forums (https://forums.fast.ai/t/runtimeerror-dataloader-worker-is-killed-by-signal/31277/55?u=marcmuc). Preliminary findings that may add some insight here (please comment if this does NOT apply!):

Python Multiprocessing: There is no way to store arbitrary Python objects (even simple lists) in shared memory in Python without triggering copy-on-write, because refcounts are updated every time something reads from these objects. The refcounts are touched memory page by memory page, which is why the consumption grows slowly. The processes (workers) end up having all or most of the memory copied over bit by bit, which is why we get the memory overflow problem. The best description of this behavior is here (SO).

Possible solution:
Using multiprocessing as we do now: for Python multiprocessing to work without these refcount effects, the objects have to be made "compatible" and wrapped in multiprocessing.Array before the process pool is created and the workers are forked. This supposedly ensures that the memory is really shared and that no copy-on-write happens. One linked answer explains how to do this for numpy arrays, and another explains the reasoning behind it again. Don't be confused by some false statements, even from the authors of these good answers, claiming that copy-on-write makes all of this unnecessary; that is not true. One comment also points this out:

"Just to note, on Python fork() actually means copy on access (because just accessing the object will change its ref-count)."

I am not familiar with the torch.multiprocessing drop-in replacement that I understand pytorch uses, but I would assume it also cannot remove the core Python refcount issue.
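The copy-on-access remark quoted above can be checked without any multiprocessing at all. The following is a minimal sketch, assuming CPython (where `sys.getrefcount` is available); the helper name `refcount_delta_on_access` is made up for this illustration. It shows that merely binding a list element to a local name changes that element's reference count, which is a write to the memory holding the object.

```python
import sys


def refcount_delta_on_access(items, idx):
    """Return how much an element's refcount changes while it is held in a local name."""
    before = sys.getrefcount(items[idx])
    held = items[idx]  # a plain read: we do not modify any data ourselves
    after = sys.getrefcount(items[idx])
    del held
    return after - before


# Fresh objects are used because some interpreters keep constant refcounts
# for small ints and other shared singletons.
paths = [object() for _ in range(4)]
delta = refcount_delta_on_access(paths, 2)
```

In a forked worker, that refcount update lands in the memory page that holds the object, so the operating system has to give the worker its own copy of that page.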

@mprostock torch.multiprocessing is simply Python multiprocessing with a custom pickler. Whenever the custom pickler encounters a torch.tensor, it automatically moves it to shared memory, so at least for torch.tensor objects no copy-on-write happens.

Thanks for the explanation! I have experimented with @bfreskura's reproduction example and I think I can now pinpoint the problem.

The reproduction example by bfreskura above showed the difference between a regular Python list and a numpy array. But the problem is not (only) the Python list itself; the same happens with a numpy array of type object. Python lists store only references to objects, and the objects are kept separately in memory. Every object has a refcount, therefore every item in the list has a refcount.

Numpy arrays (of standard np types) are stored as contiguous blocks in memory and are only ONE object with one refcount.

This changes if you explicitly make the numpy array of type object, which makes it start behaving like a regular Python list (only storing references to (string) objects). The same "problems" with memory consumption now appear.

This would explain why we see the "memory leak" with regular lists (or numpy arrays of type object). It is not actually a memory leak but the copy-on-access problem of forked Python processes caused by changing refcounts.
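The distinction described above can be inspected directly on the dtype. This is a minimal sketch, assuming NumPy is installed; the file names are made-up sample data. A fixed-width string array keeps its characters in one contiguous buffer, while an object array only holds references to separate Python str objects.

```python
import numpy as np

names = ["img_0001.jpg", "img_0002.jpg", "img_0003.jpg"]  # hypothetical sample data

fixed = np.array(names)                # fixed-width unicode dtype
boxed = np.array(names, dtype=object)  # array of references to Python str objects

# hasobject is True when the elements are Python objects with their own refcounts.
fixed_has_objects = fixed.dtype.hasobject
boxed_has_objects = boxed.dtype.hasobject

# The object array stores the very same str objects that the list refers to.
same_object = boxed[0] is names[0]
```

Checking `arr.dtype.hasobject` on the arrays held by a Dataset is a quick way to see whether they fall into the case that shows the growth.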

So the problem probably (often) has nothing to do with tensors or actual torch objects, but rather with the lists of filenames and dicts of labels that are commonly used within dataloaders/datasets.

I have created a notebook gist in case someone wants to try it quickly.
Look at the memory consumption (quick and dirty total system memory, so there are minor influences from other processes; I tried to keep the system clean).

[Figure: memory consumption in GB with a fixed-length string array]

[Figure: memory consumption in GB with an object array (the only change)]

I am facing the same issue. It fills up my RAM very fast if num_workers > 0.
I am deleting the variables in my code that I think are no longer needed and calling gc.collect() in every iteration, but nothing helps.
Any workarounds?

Switching from dict to pandas and from lists to numpy arrays helps me.

Thanks for the reply. I will try that and hopefully it works.

May I ask for the solution to this issue? I tried @samgd's code on the latest nightly build of pytorch, and it was still leaking.

@Godricly See @mprostock's and @soumith's comments above. This is not really a leak, but an unfortunate behavior of using native Python lists. Using either a torch tensor or an np array will solve this memory problem.

@mprostock Do you mean that it is the copies created by copy-on-access that use up the memory, and not something else? And aren't the copies released after use?

Someone needs to step up and write a proper augmentation op, at least for image datasets. The whole reason for all of this multiprocessing trickery is that vision datasets have to decode and crop images on multiple cores. If there were an op that handled decoding and geometric image transforms (resize, crop, flip, shear, affine) and produced batch tensors directly, there would be no need to use multiprocessing at all, and further, non-geometric augmentation steps (colors, whitening/normalization, noise) could use intra-op parallelism to rip through the entire tensor. Care needs to be taken when designing such an op to expose the transform parameters for each sample in the tensor to the outside, in order to enable parallel transformation of annotations (bounding boxes, masks, keypoints, etc.).
Or better yet, make this a server, so that several processes (as well as other DL frameworks) could use it too.

@mprostock thank you for the great explanation!

However, no solution has been proposed yet. Storing a list of filenames in the Dataset object seems fair, so how can one use it? Has anyone figured it out?

@1e100 I believe @fmassa is working on adding native image augmentation ops to torchvision, which should help with this problem.

Any updates on this issue?

Using lots of shared memory fixed the issue for me. Here is a hack to set the shared memory from inside your script, in case you are running your code inside a docker container and cannot set the shared memory otherwise:

os.system(f"mount -o remount,size={args.shared_memory_size} /dev/shm")

The shared memory size can be, e.g., half of your total RAM, for example '80G' for a biiig machine.

While memory _still_ creeps up to a certain point, I have found a workaround for the "unable to open shared memory object </torch_22291_1137042840> in read-write mode" error related to this issue, by changing the allowed number of file descriptors.

To see the number of allowed file descriptors, type ulimit -a in bash; it is listed under the -n tag. To raise this limit for the current shell (i.e. if you don't have permissions on the server), run:
bash: ulimit -n NEW_VALUE

To change it for the whole system, see here.
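The same soft limit can also be read and raised from inside the training script. This is a minimal sketch, assuming a Unix-like system where Python's standard `resource` module is available; the helper name `raise_fd_soft_limit` is hypothetical. An unprivileged process can only raise its soft limit up to the hard limit, which is why the target is capped.

```python
import resource


def raise_fd_soft_limit(target):
    """Raise this process's soft open-file limit towards target, capped by the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft_before, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
# Asking for the current value is a no-op; pass a larger number to actually raise it.
soft_after, _ = raise_fd_soft_limit(soft_before)
```

The call would have to run before the DataLoader starts its workers, so that the worker processes inherit the new limit.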

So if I understand correctly, a worker process creates a copy of the long list of file paths every time it accesses the list? But shouldn't this temporary copy go out of scope (and consequently be destroyed) as soon as the __getitem__ function for that process returns? Why do we see the RAM consumption increasing without bounds?

It would be nice if somebody wrote a short guide with a few best practices on how to avoid this problem. For numeric values, Python lists can easily be replaced with NumPy arrays, but it is not clear how to mitigate the issue for (variable-sized) strings.

In my case I have a list of custom class objects, created and populated in the constructor. They basically contain just a set of file paths. Then, inside __getitem__, I load those images, perform some preprocessing, convert them to torch tensors, and explicitly call del on the loaded images before returning. The problem is that adding a seemingly harmless preprocessing step causes this out-of-bounds memory usage problem.

mp.shared_memory from Py 3.8 might provide a sufficient workaround for sharing many non-tensor/nparray objects, e.g. a shared list: https://docs.python.org/3.8/library/multiprocessing.shared_memory.html#multiprocessing.shared_memory.ShareableList . :)

Disclaimer: I didn't really read the doc in detail.
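For reference, a minimal sketch of what the suggestion above could look like, assuming Python 3.8 or newer; the paths are made-up sample data and this has not been tried against the DataLoader in this thread. As far as the Python documentation describes it, a ShareableList has a fixed length and only accepts basic built-in values (int, float, bool, str, bytes, None).

```python
from multiprocessing import shared_memory

paths = ["train/cat/0001.jpg", "train/dog/0002.jpg"]  # hypothetical sample data

# The values are packed into one shared memory block instead of separate str objects.
shared_paths = shared_memory.ShareableList(paths)
try:
    # Another process could attach with ShareableList(name=shared_paths.shm.name).
    first = shared_paths[0]
    count = len(shared_paths)
finally:
    shared_paths.shm.close()
    shared_paths.shm.unlink()  # release the block once every process is done with it
```

Each read unpacks a new str from the shared block, so there is no per-item Python object whose refcount the workers would touch.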

Is there any action that we can take here? Are there enough image transformations supported in torchvision to document moving some use cases over to that?

To clarify the point here: implementing what @1e100 proposed is something we have on the roadmap for torchvision, but it is not at the top of our list, and it will probably require nested tensor support first.

That being said, this is not a general fix for this problem; it merely bypasses the need for multiprocessing in data loading by using a different approach (e.g., transformations on the GPU).

cc @cpuhrsch , as I saw someone mention nested tensors. (And by the way, @cpuhrsch , could you create a module label for nested tensors and add it on https://github.com/pytorch/pytorch/issues/24422 ?)

Why has this bug gone unresolved for a year?

@IMLHF See the first line of this issue description or the discussion above. It is because this is not really a leak, but an unfortunate Python design that is out of our hands. Both pytorch and numpy have tried to work around it by implementing custom serialization for tensors and ndarrays. However, we cannot account for general data structures. The issue is kept open because we are implementing more utilities to help users work around it.

Adding torch.cuda.empty_cache() at the end of each iteration helps with this issue. After adding it, the memory usage fluctuates instead of accumulating.

Perhaps we should add a warning.

@VitalyFedyunin do you have bandwidth to look at this? Can we at least figure out whether this is the same issue as https://github.com/pytorch/pytorch/issues/17499 ?

I think I solved this problem in my project by using ndarrays instead of tensors.

My previous code was

def df2var(x):
    return (torch.LongTensor(token2id(x['Query'], max_char = max_length_char)), 
            torch.tensor(coll2id[x['Agg_Coll']], dtype = torch.long))

class Making_Dataset(Dataset):
    def __init__(self, input_dataframe):
        self.dataset = input_dataframe.apply(lambda x : df2var(x), axis = 1)

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, data_index):
        return self.dataset[data_index]

And I modified the code as follows:

class Making_Dataset(Dataset):
    def __init__(self, input_dataframe):
        self.text = np.array([token2id(q, max_char = max_length_char) for q in input_dataframe.Query])
        self.labels = np.array([coll2id[coll] for coll in input_dataframe.Agg_Coll])

    def __len__(self):
        return len(self.text)

    def __getitem__(self, data_index):
        return self.text[data_index], self.labels[data_index]

After modifying the code, the memory growth at each epoch disappeared in my project.
I don't know exactly what causes this problem, so any comments on this are welcome!

I have a similar problem with Torch 1.3.0 with CUDA 10 on Ubuntu 18.04. This was not a problem on an AWS system with 64GB RAM, but on a local system with 128GB RAM and 128GB swap I cannot even get through 150 epochs. Memory usage keeps creeping up from a few GB (expected) to 128+ GB.

Update

My problem was a sneaky little bug. While logging training statistics, I was storing gradient information along with the pure values. This information is unnecessary and adds to the memory footprint with each epoch.

mp.shared_memory from Py 3.8 might provide a sufficient workaround for sharing many non-tensor/nparray objects, e.g. a shared list.
https://github.com/pytorch/pytorch/issues/13246#issuecomment-513480017

Using lots of shared memory fixed the issue for me.
https://github.com/pytorch/pytorch/issues/13246#issuecomment-487042977

Hi, I am a bit late to this topic, but I am facing the same issue with dictionaries.
Has anyone had success with these hacks?

This is still a valid issue. Can somebody provide a list of best practices on how to use multiple workers in DataLoaders without causing memory leaks?

@marrrcin In my opinion the best practice is to treat tensors as expensive, so you should be careful about how often you use them, especially if they are likely to carry gradient information.

For instance, it is a good idea to store everything as lists or numpy.ndarray until you need to perform torch operations on it.

@AudreyBeard thanks for the answer. In my dataset code everything is stored as numpy/lists/strings/ints, and the only places where I use tensors are __getitem__ and later collate_fn (to apply padding). Should I create the tensors with requires_grad set to false? When my code runs with num_workers>0, the memory leak starts.

Sorry for the formatting, I am on mobile.

@marrrcin I typically only cast input data (such as images or signals) to a tensor in __getitem__. My labels etc. are usually returned as lists. I am not sure what kind of data you are using, or whether you are using a special kind of padding, but I typically use torchvision.transforms in __getitem__. For what it's worth, I rarely implement a custom collate_fn.

Thoughts: I originally posted here because I was experiencing what I thought was a memory leak. I was holding on to unnecessary data every epoch, and it showed the symptoms of a leak when it was really subtle variable management on my part. It took me a while to pin down exactly what was happening.

@AudreyBeard my case is not related to images/torchvision. I have to use collate_fn because I apply padding to tokens extracted from variable-length text.

class PaddingCollateFn:
    def __call__(self, batch):
        sorted_batch = sorted(batch, key=lambda x: x[0].shape[0], reverse=True)
        sequences = [x[0] for x in sorted_batch]
        sequences_padded = torch.nn.utils.rnn.pad_sequence(sequences, batch_first=True)

        attention_masks = [torch.tensor([1 for _ in x[0]]) for x in sorted_batch]

        attention_masks_padded = torch.nn.utils.rnn.pad_sequence(
            attention_masks, batch_first=True
        )
        lengths = torch.tensor([len(x) for x in sequences])
        labels = torch.tensor([x[1] for x in sorted_batch])

        return (sequences_padded, lengths, attention_masks_padded), labels

Should I delete the original tensors after padding (e.g. using del)? I thought that once collate_fn finishes, those items would go out of scope and be removed, since there would be no references to them.

I ran into this in Pytorch version 1.3.1 when training ImageNet. Does anyone have an idea?
In my case I use 24 num_workers. Epoch 1 usually needs about 110G of memory, but while training the second epoch all memory gets used up and the system kills the dataloader. I don't know why.

For me, the problem was that I was already converting numpy arrays to torch tensors in the data loader's __getitem__.

Numpy arrays should only be converted to torch tensors in the trainer loop, just before being sent to the model. Otherwise the tensors cause shared memory to grow out of bounds.

You can monitor shared memory by running the command watch -n .3 df -h
Shared memory corresponds to the /dev/shm line.
The used amount should not increase after each epoch.
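The same check can be logged from inside the training script, for example once per epoch. This is a minimal sketch using only the standard library; the helper name `shm_usage_gib` is made up, and `/dev/shm` as the shared memory mount point is the Linux convention mentioned above.

```python
import shutil


def shm_usage_gib(path="/dev/shm"):
    """Return (used, total) in GiB for the filesystem that contains path."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return usage.used / gib, usage.total / gib


# "." is used here only so the sketch runs anywhere; pass "/dev/shm" on Linux.
used, total = shm_usage_gib(".")
```

Printing the first value at the end of each epoch gives the same signal as watching the df output: it should stay roughly flat.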

I have the same problem.

This bug is not solved in pytorch 1.4.0.

I have the same problem too.

I am also facing the same issue. I have tried:
1) Deleting all unnecessary variables
2) Using numpy arrays instead of lists
3) Using gc.collect()

@annukkaa and others: using np.array(list_of_paths) is not enough, because it stores the list of strings as many objects. Use np.array(list_of_paths).astype(np.string_) to cast the array to a rectangular (fixed-width) byte array (you then have to convert from bytes to str when you actually use a string). That should help. Also set the shared memory to a high value (e.g. 100GB).
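A minimal sketch of the cast suggested above, assuming NumPy is installed and the paths contain only ASCII characters; the paths and the helper name `get_path` are made up for illustration. It uses `np.bytes_`, which to my knowledge is the same type that older NumPy releases also exposed under the name `np.string_` used in the comment.

```python
import numpy as np

list_of_paths = ["train/cat/0001.jpg", "train/dog/000002.jpg"]  # hypothetical sample data

# One contiguous buffer of fixed-width byte strings, padded to the longest path.
paths_bytes = np.array(list_of_paths).astype(np.bytes_)


def get_path(idx):
    """Decode one fixed-width bytes entry back into a Python str."""
    return paths_bytes[idx].decode("utf-8")


first_path = get_path(0)
```

Inside a Dataset, `paths_bytes` would be built once in `__init__` and `get_path` would be called from `__getitem__`. The padding means memory use is the number of paths times the longest path length.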

I haven't seen this explicitly mentioned in this thread, so I thought I would share my solution.
In my case, my dataset held custom class objects and lists of strings that were accessed on every iteration and quickly exhausted CPU memory.
I was able to eliminate the memory leak by wrapping the classes and lists with a multiprocessing manager object, which handles the shared state.

Tied to the minimal example, the code would look like this:

from torch.utils.data import Dataset, DataLoader
import torch
from multiprocessing import Manager


class DataIter(Dataset):
    def __init__(self):
        manager = Manager()
        self.data = manager.list([x for x in range(24000000)])

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        data = self.data[idx]
        return torch.tensor(data)


train_data = DataIter()
train_loader = DataLoader(train_data, batch_size=300,
                          shuffle=True,
                          drop_last=True,
                          pin_memory=False,
                          num_workers=18)

for i, item in enumerate(train_loader):
    if i % 1000 == 0:
        print(i)

There is some overhead because the objects are pickled, but it is a good alternative to memory blowing up.

Is this ever going to be fixed????

The issue still seems to be open.

Using ndarrays doesn't help. CPU RAM usage goes up to about 4x what it is with 0 workers.
I tried del, but there was no meaningful improvement.

Hi all,

I tried a workaround for this and it works like an absolute beauty.

In my case, I store the imagenet data as numpy arrays saved locally.

I wrote my custom dataset as follows ---

import torch
from torch.utils import data
import numpy as np


class DataSetBuilder(data.Dataset):
    """TinyImagenet dataset."""

    def __init__(self, rootpath, train=True, transform=None):
        """
        Args:
            rootpath (string): Path prefix of the .npy files.
            train (bool): Load the train split if True, else the test split.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.path = rootpath
        self.transform = transform
        self.train = train
        # Load input data
        if self.train:
            self.X_ = np.load(self.path + 'x_train.npy')
        else:
            self.X_ = np.load(self.path + 'x_test.npy')
        # Load target data
        if self.train:
            self.y_ = np.load(self.path + 'y_train.npy')
        else:
            self.y_ = np.load(self.path + 'y_test.npy')

    def __len__(self):
        # Use the array loaded in __init__ instead of reloading the file
        return self.X_.shape[0]

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()
        X = self.X_[idx, :, :, :]
        y = self.y_[idx]
        if self.transform is not None:
            X = self.transform(X)
        # as_tensor also accepts NumPy scalars, which from_numpy rejects
        return X, torch.as_tensor(y, dtype=torch.long)

Instead of loading the data in __getitem__, I load it when the object is built. In other words, the same numpy array is not loaded into memory on every call; it is loaded once, at object creation.

Hope this helps!

Leave a comment if this works for you... :-)

Hi @varinder-singh,
Glad you found a solution. I don't see how this differs from the numpy example provided earlier by @bfreskura. Your __getitem__ also slices data from a numpy array.
I may be misreading the code. Could you explain why it has a different effect on memory consumption?

Having run into this issue in my current project, and after reading this thread, I think it may be useful to add my thoughts and offer a somewhat different solution.

First things first:

1) Considering everything I could observe here, the diagnosis by @mprostock is correct. Thanks to your work I saved a lot of time digging on my own.
2) The response by @soumith is of course also correct, but it does not apply in this case for the reasons given in @mprostock's later post on object arrays.

This is not a pyTorch problem. It is a Python problem, so it should be solved there. However, because the problem is caused by reference counting, which is an integral part of Python's memory management, that may not happen anytime soon. Some of the workarounds proposed above are interesting, but why go to such lengths? Assuming the task is to jointly access multiple variable-length sequences such as filenames, there is no need to invent anything new. Just use numpy to pack the sequences and perform indirect lookups. To see what I mean, refer to the code below, which avoids the problem discussed in this thread entirely.

@mprostock and @smolendawid: strings are essentially sequences of integers, a type that numpy handles easily. The example below is tailored to sharing a list of strings (e.g. filenames of images) between multiple data loaders.
@marrrcin You asked for best practices. This is robust and works with any list of variable-length sequences. In my current project I use a slightly more elaborate variant of this for multi-dimensional data with variable length in each dimension.
@SsnL This implicitly solves the problem you discussed with @zhiweifang in /issues/20433 without using fancy Python 3.8 constructs.

import numpy as np
import torch
from typing import Union

# --- UTILITY FUNCTIONS ---
def string_to_sequence(s: str, dtype=np.int32) -> np.ndarray:
    return np.array([ord(c) for c in s], dtype=dtype)

def sequence_to_string(seq: np.ndarray) -> str:
    return ''.join([chr(c) for c in seq])

def pack_sequences(seqs: Union[np.ndarray, list]) -> (np.ndarray, np.ndarray):
    values = np.concatenate(seqs, axis=0)
    offsets = np.cumsum([len(s) for s in seqs])
    return values, offsets

def unpack_sequence(values: np.ndarray, offsets: np.ndarray, index: int) -> np.ndarray:
    off1 = offsets[index]
    if index > 0:
        off0 = offsets[index - 1]
    elif index == 0:
        off0 = 0
    else:
        raise ValueError(index)
    return values[off0:off1]


# --- OUR DATASET CODE STARTS HERE ---
class MyDataset(torch.utils.data.Dataset):

    def __init__(self):
        strings = [
            'I like', # You can use np.int8 for ASCII strings.
            'chocolate',
            '我喜欢', # If you use anything that is not standard ASCII,
            '巧克力', # need to use np.int16, or even np.int32.
        ]

        # Convert each string to sequence of codepoints (integer),
        # and then pack them into a numpy array.
        seqs = [string_to_sequence(s) for s in strings]
        self.strings_v, self.strings_o = pack_sequences(seqs)

    def __len__(self): return 4

    def __getitem__(self, i):
        # Use indirect lookup to fetch the i-th sequence. This only uses integer numpy
        # array lookups, which avoids that the objects are subsequently replicated by
        # child processes.
        seq = unpack_sequence(self.strings_v, self.strings_o, i)
        string = sequence_to_string(seq)
        # ACTION NEEDED: You probably do not want to return the string itself ;-).
        return string


m = MyDataset()
for i in range(len(m)):
    print(i, '=', m[i])

# Output
# -------
# 0 = I like
# 1 = chocolate
# 2 = 我喜欢
# 3 = 巧克力

๋‚˜๋Š” ์ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ์—ˆ๊ณ  ๋‚ด 2์„ผํŠธ๋ฅผ ๊ณต์œ ํ•˜๊ณ  ์‹ถ์Šต๋‹ˆ๋‹ค. ๋‚˜๋Š” ๊ธฐ๋ณธ์ ์œผ๋กœ ๋ฌธ์ž์—ด์ด ๋ฌธ์ œ๋ผ๊ณ  @harpone ๊ณผ ๋‹ค๋ฅธ ์‚ฌ๋žŒ๋“ค์ด ์ง€์ ํ•œ ์•„์ด๋””์–ด๋ฅผ ๋”ฐ๋ž์Šต๋‹ˆ๋‹ค. ๋‚ด Dataset ํด๋ž˜์Šค์— 2๊ฐœ์˜ ๋ฌธ์ œ๊ฐ€ ์žˆ๋Š” ์ธ์ˆ˜๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค.

  1. numpy ๋ฌธ์ž์—ด ๋ฐฐ์—ด(.astype(str)์„ ์‚ฌ์šฉํ•œ ์บ์ŠคํŒ…์€ ๋„์›€์ด ๋˜์ง€ ์•Š์•˜์Šต๋‹ˆ๋‹ค)
  2. ๋ฌธ์ž์—ด์—์„œ numpy ๋ฒกํ„ฐ๋กœ์˜ ์‚ฌ์ „

๋ฉ”๋ชจ๋ฆฌ ๋ˆ„์ˆ˜๋ฅผ ๋ง‰์œผ๋ ค๋ฉด 1๊ณผ 2๋ฅผ ๋ชจ๋‘ ์ˆ˜์ •ํ•ด์•ผ ํ–ˆ์Šต๋‹ˆ๋‹ค. 1์˜ ๊ฒฝ์šฐ ๋‚ด ๋ฌธ์ž์—ด์€ ์‹ค์ œ๋กœ ์‚ฌ์ „์˜ numpy ๋ฒกํ„ฐ์— ์•ก์„ธ์Šคํ•˜๊ธฐ ์œ„ํ•œ ํ•ด์‹œ์ด๋ฏ€๋กœ ๊ณ ์ • ํฌ๊ธฐ ์‚ฌ์ „์ด ์žˆ์œผ๋ฏ€๋กœ ๋ชจ๋“  ๋ฌธ์ž์—ด์„ ์ •์ˆ˜๋กœ ๋ณ€ํ™˜ํ–ˆ์Šต๋‹ˆ๋‹ค.

2์˜ ๊ฒฝ์šฐ ์ •์ˆ˜ ํ‚ค๋ฅผ ์‚ฌ์šฉํ•˜๋„๋ก ์‚ฌ์ „์„ ๋ณ€ํ™˜ํ–ˆ์ง€๋งŒ ๋ฉ”๋ชจ๋ฆฌ ๋ˆ„์ˆ˜๋Š” ์—ฌ์ „ํžˆ ์ง€์†๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์‹ค์ œ๋กœ ์ž‘๋™ํ•œ ๊ฒƒ์€ ์‚ฌ์ „์„ Dataset ํด๋ž˜์Šค์— ์ „ํ˜€ ์ „๋‹ฌํ•˜์ง€ ์•Š๊ณ  __getitem___์—์„œ interger ํ‚ค๋ฅผ ๋ฐ˜ํ™˜ํ•˜๊ณ  Pytorch ํ…์„œ๋กœ ์‚ฌ์ „ ์ธ๋ฑ์‹ฑ/์ด๋™/๋‚ด ๊ธฐ์ฐจ ๋ฃจํ”„์—์„œ GPU๋กœ ์Šน๊ฒฉํ•˜๋Š” ๊ฒƒ์ด์—ˆ์Šต๋‹ˆ๋‹ค.
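
A minimal sketch of that arrangement, under assumptions that are not in the comment above: the names KeyDataset, vectors, and batch_keys are hypothetical, the lookup table is a plain dict of numpy vectors, and the class is written as a bare map-style dataset (in real use it would presumably subclass torch.utils.data.Dataset and be wrapped in a DataLoader). The dataset holds only a numpy integer array and returns integer keys; the dictionary stays in the main process.

```python
import numpy as np

# Hypothetical lookup table; it lives only in the main process.
vectors = {k: np.full(3, float(k), dtype=np.float32) for k in range(5)}


class KeyDataset:
    """Map-style dataset that stores and returns integer keys only."""

    def __init__(self, keys):
        # One contiguous int64 buffer instead of many Python objects.
        self.keys = np.asarray(keys, dtype=np.int64)

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, idx):
        # Return a plain int; no dictionary is touched here.
        return int(self.keys[idx])


dataset = KeyDataset([4, 0, 2])

# Stand-in for the training loop: resolve the keys after loading.
batch_keys = [dataset[i] for i in range(len(dataset))]
batch = np.stack([vectors[k] for k in batch_keys])
print(batch.shape)
```

The design choice is that worker processes never hold a reference to the dictionary, so there is nothing in it for them to touch.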

Is there a way to make the data loader processes re-initialize themselves every epoch and clean up any leaked memory?

@Pozimek They already re-initialize every epoch.

So what is the best approach right now?

@wangchust: the solution proposed by @bashimao worked beautifully, even on a moderately large dataset (over 25 million text sequences).

Me too. @bashimao's solution works very well.

Hi everyone, I'm back. Has anyone run into "OverflowError: cannot serialize a bytes object larger than 4GiB" when the data loader workers are forked from the main process?

@wangchust If you are serializing, you are probably doing something wrong. Each process deserializes and reconstructs the serialized object, whether it is 4 gigabytes or however large. So you are replicating memory, and with many parallel processes you will eventually run out of it. The whole point of the measures proposed by me and others in this thread is to avoid replicating memory. As I said in my first sentence, I believe you are probably doing something wrong at a fairly fundamental level.

A custom tensor-backed string array seems to help: https://gist.github.com/vadimkantorov/86c3a46bf25bed3ad45d043ae86fff57

import torch

class TensorBackedImmutableStringArray:
    def __init__(self, strings, encoding = 'utf-8'):
        encoded = [torch.ByteTensor(torch.ByteStorage.from_buffer(s.encode(encoding))) for s in strings]
        self.cumlen = torch.cat((torch.zeros(1, dtype = torch.int64), torch.as_tensor(list(map(len, encoded)), dtype = torch.int64).cumsum(dim = 0)))
        self.data = torch.cat(encoded)
        self.encoding = encoding

    def __getitem__(self, i):
        return bytes(self.data[self.cumlen[i] : self.cumlen[i + 1]]).decode(self.encoding)

    def __len__(self):
        return len(self.cumlen) - 1

    def __list__(self):
        return [self[i] for i in range(len(self))]

Perhaps something like this is worth including in core PyTorch.

Has anyone had any luck getting a dictionary to work without leaking?
I saw this post above, but instead of doing the work outside the workers as that comment suggests, I would like to access some kind of hash table inside the workers.

I am considering one of the following:

  • a multiprocessing Manager dict
  • shared memory or mmap
  • a home-grown numpy-based hash table with no PyObjects.

Shared memory looks like the most promising option and the one most native to Python. I am curious why you use a dict. The common pattern here is to have a list of items (usually strings) and generate indices.

For me it was originally a list of dicts (a list of example metadata, where every example was a dict).

Got it. In general dicts make this harder because their memory access pattern is not sequential. I am thinking of adding support for fork-safe data structures.

@VitalyFedyunin Thanks for the tip. I may try shared memory first.
The reason for the dict is O(1) lookup of elements for a random sampling function in the data generation step. More specifically, it is "triplet mining", where the dict is keyed on user_id and each value is a list of positive examples associated with that user. See here for an example.

@marrrcin I usually only cast the input data (images, signals, etc.) to a tensor in __getitem__. My labels and so on are usually returned as lists. I am not sure what kind of data you are using or whether you use a special kind of padding, but I usually use torchvision.transforms in __getitem__. For what it's worth, I rarely implement a custom collate_fn.

A thought: I originally posted here because I was experiencing what I thought was a memory leak. I was holding on to unnecessary data every epoch, so I saw the symptoms of a leak when it was really subtle variable mismanagement on my part. It took me a while to work out exactly what was going on.

@AudreyBard Thank you. This helped and solved my problem.

What I am curious about is (1) why shuffle has such a large effect on memory consumption and (2) why the total memory usage seems to be far more than number of processes * size of the data attribute.

In @bfreskura's example, self.data holds 24e7 integers, which is roughly 1.83GB. If I lower that to 24e5 so the script runs quickly, the data object is roughly 18.92MB.

For the Python list, with shuffle=False I measure the processes consuming 298.17MB. Then I set shuffle=True and measure the processes consuming 1.44GB.

So, across 18 workers + 1 main parent process, even if all the data were copied into every process, there should be at most 359.48MB of additional RAM. How does it end up being nearly 4 times that amount when shuffle=True? I think it must have something to do with sequential vs. random memory access and the resulting page faults, but I wonder if anyone can explain more precisely what is happening here.
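
The arithmetic behind those two figures can be checked with a short sketch. The numbers are the ones quoted above; the only assumption added here is that 1GB is taken as 1024MB.

```python
# Figures quoted in the comment above.
data_mb = 18.92                    # size of self.data with 24e5 integers
n_procs = 18 + 1                   # 18 workers plus the main parent process
measured_shuffle_mb = 1.44 * 1024  # 1.44GB measured with shuffle=True (assumes 1GB = 1024MB)

# Upper bound if every process held its own full copy of the data.
full_copy_mb = n_procs * data_mb
ratio = measured_shuffle_mb / full_copy_mb

print(f"{full_copy_mb:.2f} MB")
print(f"{ratio:.1f}x")
```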

For my modifications to @bfreskura's script (CLI execution + memory consumption reporting), see:

https://gist.github.com/Erotemic/3f017de31529dc64c1a54948f37da1d5

Random access makes Python write object counters back to memory, which triggers copy-on-write of the memory frames. Sequential access can potentially be optimized by not writing unchanged counters (depending on GC cycles). Also, for now it is much safer to estimate the maximum usage as number of workers * (size of all objects + number of objects * size of Python object pointer + counter). We are currently working on a solution that avoids the full memory copy, but it requires a significant redesign and will take time.
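
A small sketch of the counter write mentioned above, assuming CPython (where the reference count is stored in the object header). The list of file names is made up for illustration. Taking one more reference to an element, which is all a read in __getitem__ does, changes the count stored inside that object; in a forked worker that write lands on a page shared with the parent, so the page gets copied.

```python
import sys

# Fresh runtime strings, standing in for a list of file names.
data = ["img_%06d.jpg" % i for i in range(10)]

before = sys.getrefcount(data[0])
alias = data[0]  # a read-only use: just take another reference
after = sys.getrefcount(data[0])

# The count lives in the object header, so the object was written to.
print(after - before)
```

A numpy integer or byte array avoids this because its elements are raw values in one buffer, not separate Python objects with their own counters.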

@VitalyFedyunin Thanks for the explanation, but I don't think I fully understand it yet :smile:

I solved the above problem by using numpy arrays instead of lists, for example square numpy byte arrays of type np.string_, but now I am running into a problem with webdataset (https://github.com/tmbdev/webdataset/issues/24#issuecomment-709101119). Apparently I am not running out of shm, but as @tmbdev pointed out earlier in the webdataset thread, the problem may be the _number_ of shared memory segments...

Any tips on how to debug this issue, and/or on related temporary hacks? I tried ipcs, but it did not show me anything useful (I think). lsof /dev/shm shows information about shm objects and their sizes, but I am not sure what it means...

For me, measuring the proportional set size (pss in psutil) helped gauge the size of the problem. I solved this problem with custom StringArray and DictArray classes.

Sorry, I may be missing something about how GitHub works, but I don't see @bashimao's solution in this thread, only comments. Can anyone point me to it?

Casting to np.string_ is much simpler (not str etc.). Say you have

strings = ['hello', 'world']

then do

strings_byte = np.array(strings).astype(np.string_)

The result is then a single square byte array (note the dtype):

array([b'hello', b'world'], dtype='|S5')

Then you need to decode back to a string when you pick one out, like str(strings_byte[0], encoding='utf-8').

This does not work:

strings_byte = np.array(strings).astype(str)

Note the dtype:

array(['hello', 'world'], dtype='<U5')

That is not a square byte array, i.e. not a single object.
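
A runnable version of the byte-array cast, as a sketch rather than the poster's exact code. One detail is an addition here: the np.string_ alias was removed in NumPy 2.0, so the sketch uses np.bytes_, which names the same type and also exists in older releases. The object-dtype array at the end is included only for contrast.

```python
import numpy as np

strings = ['hello', 'world']

# np.bytes_ is the current name; the np.string_ alias was removed in NumPy 2.0.
strings_byte = np.array(strings).astype(np.bytes_)
print(strings_byte.dtype)

# Decode a single element back to str when it is needed.
first = strings_byte[0].decode('utf-8')
print(first)

# By contrast, dtype=object stores a reference to one Python str per element.
as_objects = np.array(strings, dtype=object)
print(as_objects.dtype)
```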

Given that this issue persists, and given how many times my colleagues and I have run into it, it would help to have a way to determine whether it is the cause. After reading this thread thoroughly, there seem to be good suggestions for mitigating the problem (https://github.com/pytorch/pytorch/issues/13246#issuecomment-436632186, https://github.com/pytorch/pytorch/issues/13246#issuecomment-612396143), and also some confusing behavior (https://github.com/pytorch/pytorch/issues/13246#issuecomment-708067670).

  1. To rule this issue out, is it enough to run just the data loader in a while True loop and watch the memory usage? I assume that if memory grows as the loop runs, I can conclude that either the dataset object has some pathological behavior that accumulates objects, or this issue is occurring.
  2. What I really cannot understand from this thread is why this would be a problem for a dataset class that holds only a few MB of data. If I understand correctly, the issue described here is that every Python object accessed in the dataset eventually gets copied into the memory of the worker threads. If I have a simple dataset class that loads videos from a list of paths stored as a field of the class, why would this be a problem? The shared data is negligibly small. And why does using shuffle=True in the data loader lead to higher memory usage, as described in https://github.com/pytorch/pytorch/issues/13246#issuecomment-708067670?

A solution that works for me - https://t.me/snakers4/2577

This is nice! The only advantage of my method at https://gist.github.com/vadimkantorov/86c3a46bf25bed3ad45d043ae86fff57 is that you can use DDP primitives to share tensor-backed objects between DDP workers (i.e. read a huge dataset object in only one thread, then scatter the tensor-backed dataset object to the other DDP ranks). In the same way, a DDP master worker can gather tensor-backed string arrays from the DDP ranks.

Another real-world occurrence of this bug: https://github.com/NVIDIA/NeMo/issues/1467
