Requests: Memory leak

Created on April 20, 2018  Β·  22 comments  Β·  Source: psf/requests

Summary.

μ˜ˆμƒ κ²°κ³Ό

μ •μƒμ μœΌλ‘œ μ‹€ν–‰λ˜λŠ” ν”„λ‘œκ·Έλž¨

Actual Result

μž‘λ™μ„ 멈좜 λ•ŒκΉŒμ§€ λͺ¨λ“  λž¨μ„ μ†Œλͺ¨ν•˜λŠ” ν”„λ‘œκ·Έλž¨

λ²ˆμ‹ 단계

μ˜μ‚¬ μ½”λ“œ:

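import requests


def function(url, proxy):
    # `url` and `proxy` are supplied by the caller
    proxies = {
        'https': proxy
    }
    session = requests.Session()
    session.headers.update({'User-Agent': 'user - agent'})
    try:
        login = session.get(url, proxies=proxies)  # HERE IS WHERE MEMORY LEAKS
    except requests.RequestException:  # network/proxy errors only, not a bare except
        return -1
    return 0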

System Information

$ python -m requests.help
{
  "chardet": {
    "version": "3.0.4"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "2.6"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.6.3"
  },
  "platform": {
    "release": "10",
    "system": "Windows"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.18.4"
  },
  "system_ssl": {
    "version": "100020bf"
  },
  "urllib3": {
    "version": "1.22"
  },
  "using_pyopenssl": false
}
Labels: Needs Info, Propose Close

Most helpful comment

Similar problem here. Requests eats memory when running in threads. Code to reproduce:

import gc
from concurrent.futures import ThreadPoolExecutor, as_completed
import requests
from memory_profiler import profile

def run_thread_request(sess, run):
    response = sess.get('https://www.google.com')
    return

@profile
def main():
    sess = requests.session()
    with ThreadPoolExecutor(max_workers=1) as executor:
        print('Starting!')
        tasks = {executor.submit(run_thread_request, sess, run):
                    run for run in range(50)}
        for _ in as_completed(tasks):
            pass
    print('Done!')
    return

@profile
def calling():
    main()
    gc.collect()
    return

if __name__ == '__main__':
    calling()

μœ„μ— 주어진 μ½”λ“œμ—μ„œ μ„Έμ…˜ 개체λ₯Ό μ „λ‹¬ν•˜μ§€λ§Œ requests.get μ‹€ν–‰ν•˜λŠ” κ²ƒμœΌλ‘œ κ΅μ²΄ν•˜λ©΄ 아무 것도 λ³€κ²½λ˜μ§€ μ•ŠμŠ΅λ‹ˆλ‹€.

The output is:

➜  thread-test pipenv run python run.py
Starting!
Done!
Filename: run.py

Line #    Mem usage    Increment   Line Contents
================================================
    10     23.2 MiB     23.2 MiB   @profile
    11                             def main():
    12     23.2 MiB      0.0 MiB       sess = requests.session()
    13     23.2 MiB      0.0 MiB       with ThreadPoolExecutor(max_workers=1) as executor:
    14     23.2 MiB      0.0 MiB           print('Starting!')
    15     23.4 MiB      0.0 MiB           tasks = {executor.submit(run_thread_request, sess, run):
    16     23.4 MiB      0.0 MiB                       run for run in range(50)}
    17     25.8 MiB      2.4 MiB           for _ in as_completed(tasks):
    18     25.8 MiB      0.0 MiB               pass
    19     25.8 MiB      0.0 MiB       print('Done!')
    20     25.8 MiB      0.0 MiB       return


Filename: run.py

Line #    Mem usage    Increment   Line Contents
================================================
    22     23.2 MiB     23.2 MiB   @profile
    23                             def calling():
    24     25.8 MiB      2.6 MiB       main()
    25     25.8 MiB      0.0 MiB       gc.collect()
    26     25.8 MiB      0.0 MiB       return

And the Pipfile:

[[source]]
url = "https://pypi.python.org/simple"
verify_ssl = true

[requires]
python_version = "3.6"

[packages]
requests = "==2.21.0"
memory-profiler = "==0.55.0"

All 22 comments

Please provide the output of

python -m requests.help

If that command is not available in your version of Requests, please provide some basic information about your system (Python version, operating system, etc.).

@sigmavirus24 Done.

@munroc, a few quick questions about your threading implementation, since it isn't included in the pseudocode:

  • Are you creating a new session for every thread, and what size is the thread pool you're using?

  • What tools are you using to determine where the leak is coming from? Could you share the results?

There have been hints of memory leaks around sessions for a while, but I'm not sure we've ever found a smoking gun or any actually confirmed impact.

@nateprewitt Hi. Yes, I'm creating a new session for every thread. The thread pool size is 30, but I've tried anywhere from 2 to 200 threads and memory leaks every time. I haven't used any tools; instead I changed the function like this:
if I put return 0 before login = session.get, there is no memory leak. If I put return 0 after login = session.get, the memory leak starts. If you want, I can send you my source code; it's not too big.

@Munroc I think it would be easier to isolate the actual cause with the full code. But based on the code gist you provided, I think it's very hard to conclude that there is a memory leak.

As you mentioned, if you return right before calling session.get, only the proxies and session objects exist in memory (oversimplified, but I hope you get the idea :smile:). However, once you call session.get(url, proxies=proxies), the HTML of url is fetched and stored locally in the login variable. So each session.get call may "look like" a memory leak, but it is actually working as expected: memory grows linearly with the size of the url's response.

But suppose you are using threads and calling .join() right after. In that case, I think you'd need to check how the threads are managed and whether they are properly closed/cleaned up.
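To check the thread-management point, a minimal sketch could start one thread per item, join every one of them, and then confirm that none is still alive. The names fetch_all and work are hypothetical stand-ins; in the reporter's setup, work would be the function that creates a session and calls session.get.

```python
import threading


def fetch_all(items, work):
    """Run work(item) in one thread per item and wait for all of them to finish."""
    results = {}
    lock = threading.Lock()

    def worker(item):
        value = work(item)
        with lock:  # protect the shared dict
            results[item] = value

    threads = [threading.Thread(target=worker, args=(item,)) for item in items]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # join *every* thread, not just the last one created
    return results, threads


if __name__ == "__main__":
    results, threads = fetch_all(range(5), lambda n: n * n)
    print(results)
    print("still alive:", [t.name for t in threads if t.is_alive()])
```

If worker threads are still alive after this point, whatever they reference (sessions, responses) cannot be freed either.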

@LeoSZN κ·€ν•˜μ˜ νŠΉμ • μ˜ˆμ—μ„œ urls μš”μ†Œλ‹Ή μ—¬λŸ¬ Process μƒμ„±ν•œ ν›„ λ§ˆμ§€λ§‰ Process 개체만 λ‹«λŠ”λ‹€ κ³  μƒκ°ν•©λ‹ˆλ‹€.

p.daemon = True μ‚¬μš©ν•˜μ—¬ 데λͺ¬ν™”λ₯Ό μ‹œλ„ν•˜κ³  μ‹€ν–‰ν•  수 μžˆμŠ΅λ‹ˆκΉŒ(메인 μŠ€λ ˆλ“œκ°€ μ’…λ£Œλ˜λ©΄ μƒμ„±λœ λͺ¨λ“  μžμ‹ ν”„λ‘œμ„ΈμŠ€λ„ 죽도둝)? 그렇지 μ•ŠμœΌλ©΄ μƒμ„±λœ ν”„λ‘œμ„ΈμŠ€λ₯Ό λ³„λ„μ˜ 배열에 μ €μž₯ν•˜κ³  루프λ₯Ό μ‚¬μš©ν•˜μ—¬ λͺ¨λ“  ν”„λ‘œμ„ΈμŠ€λ₯Ό λ‹«μ•„μ•Ό ν•©λ‹ˆλ‹€.

@initbar

Should I set p.daemon = True inside the loop before p.join(), or outside the loop? Also, do I still need p.join() after setting p.daemon = True?

_Ok, I was redirected here from my new topic, so I'll join yours.
Hopefully this adds more information and helps get the problem solved..._

I'm running a Telegram bot and noticed that free memory degrades when the bot runs for a long time. At first I suspected my code, then the bot, and finally requests. :)
I tracked the problem down using len(gc.get_objects()).

μ˜ˆμƒ κ²°κ³Ό

len(gc.get_objects()) λŠ” λͺ¨λ“  루프 λ°˜λ³΅μ—μ„œ λ™μΌν•œ κ²°κ³Όλ₯Ό μ œκ³΅ν•΄μ•Ό ν•©λ‹ˆλ‹€.

μ‹€μ œ κ²°κ³Ό

len(gc.get_objects()) 의 값은 루프가 반볡될 λ•Œλ§ˆλ‹€ μ¦κ°€ν•©λ‹ˆλ‹€.

Test N2
GetObjects len: 27959
Test N3
GetObjects len: 27960
Test N4
GetObjects len: 27961
Test N5
GetObjects len: 27962
Test N6
GetObjects len: 27963
Test N7
GetObjects len: 27964
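The measurement above can be wrapped in a small helper that forces a collection and records len(gc.get_objects()) after each iteration; a count that climbs by the same amount every time points to objects that are never released. This is a sketch with hypothetical names (object_counts, leaky, clean); note that gc.get_objects() only counts GC-tracked container objects, so plain strings or bytes held by a leak would not show up in it.

```python
import gc


def object_counts(fn, iterations=10):
    """Call fn repeatedly and return the GC-tracked object count after each call."""
    counts = []
    for _ in range(iterations):
        fn()
        gc.collect()  # drop unreachable cycles so only live objects are counted
        counts.append(len(gc.get_objects()))
    return counts


_kept = []


def leaky():
    _kept.append([])  # lists are GC-tracked, so each call adds one live object


def clean():
    [].append(1)  # temporary list, freed as soon as the statement finishes


if __name__ == "__main__":
    print("leaky:", object_counts(leaky))
    print("clean:", object_counts(clean))
```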

λ²ˆμ‹ 단계

token = "XXX:XXX"
chat_id = '111'
proxy = {'https':'socks5h://ZZZ'} #You may need proxy to run this in Russia

from time import sleep
import gc, requests

def garbage_info():
    res = ""
    res += "\nGetObjects len: " + str(len(gc.get_objects()))
    return res

def tester():
    count = 0
    while(True):
        sleep(1)
        count += 1
        msg = "\nTest N{0}".format(count) + garbage_info()
        print(msg)

        method_url = r'sendMessage'
        payload = {'chat_id': str(chat_id), 'text': msg}

        request_url = "https://api.telegram.org/bot{0}/{1}".format(token, method_url)
        method_name = 'get'

        session = requests.session()
        req = requests.Request(
            method=method_name.upper(),
            url=request_url,
            params=payload
        )
        prep = session.prepare_request(req)

        settings = session.merge_environment_settings(
            prep.url, None, None, None, None)
#            prep.url, proxy, None, None, None)  #Change the line to enable proxy
        send_kwargs = {
            'timeout': None,
            'allow_redirects': None,
        }
        send_kwargs.update(settings)
        resp = session.send(prep, **send_kwargs)

        # For more clean output
        gc.collect()

tester()

System Information

{
  "chardet": {
    "version": "3.0.4"
  },
  "cryptography": {
    "version": "2.3.1"
  },
  "idna": {
    "version": "2.7"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.6.6"
  },
  "platform": {
    "release": "4.15.0-36-generic",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "1010009f",
    "version": "17.5.0"
  },
  "requests": {
    "version": "2.19.1"
  },
  "system_ssl": {
    "version": "1010007f"
  },
  "urllib3": {
    "version": "1.23"
  },
  "using_pyopenssl": true
}

_Same behavior with Python 3.5.3 on Windows 10._

@LeoSZN

@initbar

Should I set p.daemon = True inside the loop before p.join(), or outside the loop? Also, do I still need p.join() after setting p.daemon = True?

from multiprocessing import Process

# ..
for i in urls:
    p = Process(target=main, args=(i,))
    p.daemon = True  # must be set before `.start()`
    p.start()
# ..

μž‘μ€ λ…ΈνŠΈλ‘œμ„œ, 당신은 아직 ν•  수 .join ν”„λ‘œμ„ΈμŠ€ 데λͺ¬ - μžμ‹ μ˜ λΆ€λͺ¨ ν”„λ‘œμ„ΈμŠ€μ˜ μ’…λ£Œκ°€ (그듀은 μ–΄λ–»κ²Œ λ“  λ ν•˜μ§€ μ•ŠλŠ” μ‹€μˆ˜ κ³ μ•„ κ·ΈλŸ¬λ‚˜ 그듀이 μ‚΄ν•΄λ˜λŠ” 거의 보μž₯;!ν•˜λŠ” 경우 μ£Όμ‹œκΈ° λ°”λžλ‹ˆλ‹€ λ‚˜ λ‚΄κ°€ μ•Œκ³  그것에 λŒ€ν•΄ 더 많이 배우고 μ‹ΆμŠ΅λ‹ˆλ‹€).

그렇지 μ•ŠμœΌλ©΄ Process 개체λ₯Ό λ°°μ—΄λ‘œ λ³„λ„λ‘œ μ €μž₯ν•˜κ³  λ§ˆμ§€λ§‰μ— 쑰인할 수 μžˆμŠ΅λ‹ˆλ‹€.

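from multiprocessing import Process

# ..
processes = [
    Process(target=main, args=(i,))
    for i in urls
]
# start the process activity.
for p in processes:
    p.start()
# then join all of them at the end.
for p in processes:
    p.join()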

μ˜ˆμƒ κ²°κ³Ό

len(gc.get_objects()) λŠ” λͺ¨λ“  루프 λ°˜λ³΅μ—μ„œ λ™μΌν•œ κ²°κ³Όλ₯Ό μ œκ³΅ν•΄μ•Ό ν•©λ‹ˆλ‹€.

이 λ™μž‘μ˜ 원인은 "μš”μ²­" μΊμ‹œ λ©”μ»€λ‹ˆμ¦˜μ—μ„œ 찾을 수 μžˆμŠ΅λ‹ˆλ‹€.

μ˜¬λ°”λ₯΄μ§€ μ•Šκ²Œ μž‘λ™ν•©λ‹ˆλ‹€(μ˜μ‹¬λ¨): Telegram API URL에 λŒ€ν•œ λͺ¨λ“  ν˜ΈμΆœμ— μΊμ‹œ λ ˆμ½”λ“œλ₯Ό μΆ”κ°€ν•©λ‹ˆλ‹€(ν•œ 번 μΊμ‹±ν•˜λŠ” λŒ€μ‹ ). κ·ΈλŸ¬λ‚˜ μΊμ‹œ 크기가 20으둜 μ œν•œλ˜κ³  이 μ œν•œμ— λ„λ‹¬ν•œ ν›„ μΊμ‹œκ°€ μž¬μ„€μ •λ˜κ³  μ¦κ°€ν•˜λŠ” 개체 μˆ˜κ°€ 초기 κ°’μœΌλ‘œ λ‹€μ‹œ κ°μ†Œν•˜κΈ° λ•Œλ¬Έμ— λ©”λͺ¨λ¦¬ λˆ„μˆ˜λ‘œ 이어지지 μ•ŠμŠ΅λ‹ˆλ‹€.
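The suspected behavior can be illustrated with a toy cache that, like the one described, stores one entry per distinct URL and wipes itself once it holds 20 entries, so the number of live entries oscillates instead of growing forever. BoundedCache is a hypothetical sketch of that pattern, not the actual caching code of requests or any other library; the token in the URL is the same placeholder used in the report.

```python
MAX_CACHE_SIZE = 20  # limit taken from the comment above


class BoundedCache:
    """Dict cache that is cleared completely once it reaches max_size entries."""

    def __init__(self, max_size=MAX_CACHE_SIZE):
        self.max_size = max_size
        self._data = {}

    def get(self, key, compute):
        if key in self._data:
            return self._data[key]
        if len(self._data) >= self.max_size:
            self._data.clear()  # reset everything instead of evicting one entry
        value = self._data[key] = compute(key)
        return value

    def __len__(self):
        return len(self._data)


if __name__ == "__main__":
    cache = BoundedCache()
    sizes = []
    for n in range(1, 46):
        # every message text differs, so every URL is new and misses the cache
        url = f"https://api.telegram.org/botXXX:XXX/sendMessage?text=Test+N{n}"
        cache.get(url, lambda u: u.split("?", 1))
        sizes.append(len(cache))
    print(sizes)  # climbs 1..20, drops back to 1, climbs again
```

A cache like this makes object counts rise for a while and then fall, which looks like a leak in a short run but stays bounded over a long one.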

FWIW, I'm also seeing a memory leak similar to @jotunskij's:

https://github.com/nicolargo/glances/issues/1447

λ˜ν•œ μŠ€λ ˆλ”©κ³Ό ν•¨κ»˜ requests.get을 μ‚¬μš©ν•˜λ©΄ μ‹€μ œλ‘œ μš”μ²­λ‹Ή μ•½ 0.1 - 0.9 λ©”λͺ¨λ¦¬λ₯Ό μ†Œλͺ¨ν•˜κ³  μš”μ²­ 후에 자체적으둜 "μ§€μš°κΈ°"κ°€ μ•„λ‹ˆλΌ μ €μž₯ν•˜λŠ” 것과 λ™μΌν•œ λ¬Έμ œκ°€ μžˆμŠ΅λ‹ˆλ‹€.

여기도 λ§ˆμ°¬κ°€μ§€μž…λ‹ˆλ‹€. ν•΄κ²° 방법이 μžˆλ‚˜μš”?

νŽΈμ§‘ν•˜λ‹€
λ‚΄ λ¬Έμ œλŠ” μš”μ²­μ—μ„œ verify=False λ₯Ό μ‚¬μš©ν•˜κΈ° λ•Œλ¬Έμ— λ°œμƒν•œ 것 κ°™μŠ΅λ‹ˆλ‹€. #5215μ—μ„œ 버그λ₯Ό μ œκΈ°ν–ˆμŠ΅λ‹ˆλ‹€.


I have the same problem. I have a simple script that spawns a thread. The thread calls a function that runs a while loop: it queries an API to check a status value, sleeps for 10 seconds, and then loops again until the script is stopped.

When using requests.get, I can see the memory usage of the spawned process slowly increase in Task Manager.

However, if I remove the requests.get call from the loop, or use urllib3 directly to make the GET request, memory usage barely moves.

I watched this for 2 hours in both cases: with requests.get, memory usage was 1GB+ after about 2 hours, whereas with urllib3 it was about 20MB after 2 hours.

Python 3.7.4 and requests 2.22.0

Requests still seems to be in beta with memory leaks like this. Come on guys, patch this! πŸ˜‰πŸ‘

Any update on this? A simple POST request with a file upload also causes a similar memory leak.

Same for me... the leak during thread pool execution also happens on Windows with python38.
requests 2.22.0

Same for me

Here's my memory leak issue. Can anyone help? https://stackoverflow.com/questions/59746125/memory-keep-growing-when-using-mutil-thread-download-file

Calling Session.close() and Response.close() avoids the memory leak.
And since SSL consumes more memory, the leak is more noticeable when requesting https URLs.

First, I set up 4 test cases:

  1. requests + SSL (https://)
  2. requests + non-SSL (http://)
  3. aiohttp + SSL (https://)
  4. aiohttp + non-SSL (http://)

μ˜μ‚¬ μ½”λ“œ:

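from concurrent.futures import ThreadPoolExecutor, wait

import requests

urls = []  # about 5k urls of public websites (list not included here)


def run(url):
    session = requests.session()
    response = session.get(url)


thread_pool = ThreadPoolExecutor(max_workers=10)  # thread pool, size=10
while True:
    # submit one pass over all urls, then wait for it to finish so the
    # executor's work queue does not grow without bound
    wait([thread_pool.submit(run, url) for url in urls])

# in another thread, record memory usage every second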

Memory usage graphs (y-axis: MB, x-axis: time). requests uses a lot of memory and its memory grows very quickly, while aiohttp's memory usage stays stable.

[Figures: requests-non-ssl, requests-ssl, aiohttp-non-ssl, aiohttp-ssl]

그런 λ‹€μŒ Session.close() ν•˜κ³  λ‹€μ‹œ ν…ŒμŠ€νŠΈν•©λ‹ˆλ‹€.

def run(url):
    session = requests.session()
    response = session.get(url)
    session.close()  # close session !!

λ©”λͺ¨λ¦¬ μ‚¬μš©λŸ‰μ΄ 크게 κ°μ†Œν–ˆμ§€λ§Œ μ‹œκ°„μ΄ 지남에 따라 λ©”λͺ¨λ¦¬ μ‚¬μš©λŸ‰μ€ 계속 μ¦κ°€ν•©λ‹ˆλ‹€.

[Figures: requests-non-ssl-close-session, requests-ssl-close-session]

Finally, I also added Response.close() and tested again.

def run(url):
    session = requests.session()
    response = session.get(url)
    session.close()  # close session !!
    response.close()  # close response !!

λ©”λͺ¨λ¦¬ μ‚¬μš©λŸ‰μ΄ λ‹€μ‹œ κ°μ†Œν•˜κ³  μ‹œκ°„μ΄ μ§€λ‚˜λ„ μ¦κ°€ν•˜μ§€ μ•ŠμŠ΅λ‹ˆλ‹€.

[Figures: requests-non-ssl-close-all, requests-ssl-close-all]

Comparing aiohttp and requests shows that the memory leak is not caused by SSL, but by connection resources that are never closed.
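If unclosed connection resources are the culprit, CPython can help find them: a socket that is garbage-collected without being closed emits a ResourceWarning, which is hidden by default. A minimal sketch, assuming CPython, that counts such warnings for a plain socket.socket; the helper unclosed_socket_warnings is hypothetical, and whether a given requests/urllib3 code path triggers the warning depends on its sockets actually being collected. Running a whole script with python -W always::ResourceWarning enables the same warnings globally.

```python
import gc
import socket
import warnings


def unclosed_socket_warnings(close_it):
    """Create a socket, optionally close it, drop it, and count ResourceWarnings."""
    with warnings.catch_warnings(record=True) as caught:
        # ResourceWarning is ignored by default; show every occurrence here
        warnings.simplefilter("always", ResourceWarning)
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        if close_it:
            sock.close()
        del sock      # last reference gone: CPython finalizes the socket here
        gc.collect()  # also covers the case where it sits in a reference cycle
    return sum(issubclass(w.category, ResourceWarning) for w in caught)


if __name__ == "__main__":
    print("closed:  ", unclosed_socket_warnings(close_it=True))
    print("unclosed:", unclosed_socket_warnings(close_it=False))
```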

Useful scripts:

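import os
import time
from threading import Thread

import pandas as pd
import psutil


class MemoryReporter:
    def __init__(self, name):
        self.name = name
        os.makedirs('memoryleak', exist_ok=True)  # make sure the output directory exists
        self.file = open(f'memoryleak/memory_{name}.txt', 'w')
        self.thread = None

    def _get_memory(self):
        # resident set size (RSS) of this process, in bytes
        return psutil.Process().memory_info().rss

    def main(self):
        # write one "timestamp,rss_bytes" line per second
        while True:
            t = time.time()
            v = self._get_memory()
            self.file.write(f'{t},{v}\n')
            self.file.flush()
            time.sleep(1)

    def start(self):
        # daemon thread, so it doesn't keep the process alive on exit
        self.thread = Thread(target=self.main, name=self.name, daemon=True)
        self.thread.start()


def plot_memory(name):
    filepath = 'memoryleak/memory_{}.txt'.format(name)
    df_mem = pd.read_csv(filepath, index_col=0, names=['t', 'v'])
    df_mem.index = pd.to_datetime(df_mem.index, unit='s')
    df_mem['v'] = df_mem['v'] / 1024 / 1024  # bytes -> MiB
    df_mem.plot(figsize=(16, 8))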

System information:

$ python -m requests.help
{
  "chardet": {
    "version": "3.0.4"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "2.8"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.7.4"
  },
  "platform": {
    "release": "18.0.0",
    "system": "Darwin"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.22.0"
  },
  "system_ssl": {
    "version": "1010104f"
  },
  "urllib3": {
    "version": "1.25.6"
  },
  "using_pyopenssl": false
}

The SSL leak comes from the OpenSSL packaged with Python <= 3.7.4 on Windows and OSX, which doesn't properly free memory from the context.

https://github.com/VeNoMouS/cloudscraper/issues/143#issuecomment-613092377
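To see whether an environment falls into the range described above, a quick check could print the interpreter version, the OpenSSL build the ssl module is linked against, and the platform. The "<= 3.7.4 on Windows/macOS" rule is taken from the comment above and not independently verified; may_be_affected is a hypothetical helper.

```python
import platform
import ssl
import sys


def may_be_affected(version_info, system):
    """Rule of thumb from the comment: Python <= 3.7.4 on Windows or macOS."""
    return tuple(version_info[:3]) <= (3, 7, 4) and system in ("Windows", "Darwin")


if __name__ == "__main__":
    print("Python :", platform.python_version())
    print("OpenSSL:", ssl.OPENSSL_VERSION)
    print("System :", platform.system())  # "Darwin" on macOS
    print("Possibly affected:", may_be_affected(sys.version_info, platform.system()))
```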

이 νŽ˜μ΄μ§€κ°€ 도움이 λ˜μ—ˆλ‚˜μš”?
0 / 5 - 0 λ“±κΈ‰