We are using Django Rest Framework with MongoEngine, Redis, Celery and Kombu, and we are getting the following error in our logs:
[2017-03-22 06:26:01,702: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/celery/worker/consumer/consumer.py", line 318, in start
blueprint.start(self)
File "/usr/local/lib/python2.7/dist-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/usr/local/lib/python2.7/dist-packages/celery/worker/consumer/consumer.py", line 594, in start
c.loop(*c.loop_args())
File "/usr/local/lib/python2.7/dist-packages/celery/worker/loops.py", line 88, in asynloop
next(loop)
File "/usr/local/lib/python2.7/dist-packages/kombu/async/hub.py", line 345, in create_loop
cb(*cbargs)
File "/usr/local/lib/python2.7/dist-packages/kombu/transport/redis.py", line 1039, in on_readable
self.cycle.on_readable(fileno)
File "/usr/local/lib/python2.7/dist-packages/kombu/transport/redis.py", line 337, in on_readable
chan.handlers[type]()
File "/usr/local/lib/python2.7/dist-packages/kombu/transport/redis.py", line 667, in _receive
ret.append(self._receive_one(c))
File "/usr/local/lib/python2.7/dist-packages/kombu/transport/redis.py", line 678, in _receive_one
response = c.parse_response()
File "/usr/local/lib/python2.7/dist-packages/redis/client.py", line 2183, in parse_response
return self._execute(connection, connection.read_response)
File "/usr/local/lib/python2.7/dist-packages/redis/client.py", line 2176, in _execute
return command(*args)
File "/usr/local/lib/python2.7/dist-packages/redis/connection.py", line 577, in read_response
response = self._parser.read_response()
File "/usr/local/lib/python2.7/dist-packages/redis/connection.py", line 238, in read_response
response = self._buffer.readline()
File "/usr/local/lib/python2.7/dist-packages/redis/connection.py", line 168, in readline
self._read_from_socket()
File "/usr/local/lib/python2.7/dist-packages/redis/connection.py", line 143, in _read_from_socket
(e.args,))
ConnectionError: Error while reading from socket: ('Connection closed by server.',)
[2017-03-22 06:26:01,868: INFO/MainProcess] Connected to redis://:**********/1
python==2.7.12
redis==3.2.3
kombu==4.0.2
Django==1.10.3
celery==4.0.2
amqp==2.1.1
billiard==3.5.0.2
pytz==2016.7
We have HAProxy in front of the Redis cluster, and connections and everything else work without any issues. Could you please help us troubleshoot this error,
or point us to what to look for and where?
We really appreciate your help,
Thanks,
Darsana
@dharanpdarsana Do you have Redis running? Here is how you can validate it:
$ redis-cli
redis 127.0.0.1:6379> ping
PONG
redis 127.0.0.1:6379> set mykey somevalue
OK
redis 127.0.0.1:6379> get mykey
"somevalue"
@jpatel3 redis is running and 90% of the connections are going through without any issue
I am also (suddenly) experiencing a broken pipe error.
It was working perfectly fine before. I've forgotten what I installed or changed that might have triggered this.
[2017-05-22 17:18:10,939: INFO/MainProcess] Connected to amqp://user123:**@192.168.99.100:5672//
[2017-05-22 17:18:10,955: INFO/MainProcess] mingle: searching for neighbors
[2017-05-22 17:18:10,965: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
Traceback (most recent call last):
File "/Users/richmond/Projects/backend/env/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 318, in start
blueprint.start(self)
File "/Users/richmond/Projects/backend/env/lib/python3.6/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/Users/richmond/Projects/backend/env/lib/python3.6/site-packages/celery/worker/consumer/mingle.py", line 38, in start
self.sync(c)
File "/Users/richmond/Projects/backend/env/lib/python3.6/site-packages/celery/worker/consumer/mingle.py", line 42, in sync
replies = self.send_hello(c)
File "/Users/richmond/Projects/backend/env/lib/python3.6/site-packages/celery/worker/consumer/mingle.py", line 55, in send_hello
replies = inspect.hello(c.hostname, our_revoked._data) or {}
File "/Users/richmond/Projects/backend/env/lib/python3.6/site-packages/celery/app/control.py", line 129, in hello
return self._request('hello', from_node=from_node, revoked=revoked)
File "/Users/richmond/Projects/backend/env/lib/python3.6/site-packages/celery/app/control.py", line 81, in _request
timeout=self.timeout, reply=True,
File "/Users/richmond/Projects/backend/env/lib/python3.6/site-packages/celery/app/control.py", line 436, in broadcast
limit, callback, channel=channel,
File "/Users/richmond/Projects/backend/env/lib/python3.6/site-packages/kombu/pidbox.py", line 315, in _broadcast
serializer=serializer)
File "/Users/richmond/Projects/backend/env/lib/python3.6/site-packages/kombu/pidbox.py", line 290, in _publish
serializer=serializer,
File "/Users/richmond/Projects/backend/env/lib/python3.6/site-packages/kombu/messaging.py", line 181, in publish
exchange_name, declare,
File "/Users/richmond/Projects/backend/env/lib/python3.6/site-packages/kombu/messaging.py", line 194, in _publish
[maybe_declare(entity) for entity in declare]
File "/Users/richmond/Projects/backend/env/lib/python3.6/site-packages/kombu/messaging.py", line 194, in <listcomp>
[maybe_declare(entity) for entity in declare]
File "/Users/richmond/Projects/backend/env/lib/python3.6/site-packages/kombu/messaging.py", line 102, in maybe_declare
return maybe_declare(entity, self.channel, retry, **retry_policy)
File "/Users/richmond/Projects/backend/env/lib/python3.6/site-packages/kombu/common.py", line 125, in maybe_declare
return _maybe_declare(entity, declared, ident, channel, orig)
File "/Users/richmond/Projects/backend/env/lib/python3.6/site-packages/kombu/common.py", line 131, in _maybe_declare
entity.declare(channel=channel)
File "/Users/richmond/Projects/backend/env/lib/python3.6/site-packages/kombu/entity.py", line 185, in declare
nowait=nowait, passive=passive,
File "/Users/richmond/Projects/backend/env/lib/python3.6/site-packages/amqp/channel.py", line 630, in exchange_declare
wait=None if nowait else spec.Exchange.DeclareOk,
File "/Users/richmond/Projects/backend/env/lib/python3.6/site-packages/amqp/abstract_channel.py", line 64, in send_method
conn.frame_writer(1, self.channel_id, sig, args, content)
File "/Users/richmond/Projects/backend/env/lib/python3.6/site-packages/amqp/method_framing.py", line 174, in write_frame
write(view[:offset])
File "/Users/richmond/Projects/backend/env/lib/python3.6/site-packages/amqp/transport.py", line 269, in write
self._write(s)
File "/Users/richmond/Projects/backend/env/lib/python3.6/site-packages/eventlet/greenio/base.py", line 397, in sendall
tail = self.send(data, flags)
File "/Users/richmond/Projects/backend/env/lib/python3.6/site-packages/eventlet/greenio/base.py", line 391, in send
return self._send_loop(self.fd.send, data, flags)
File "/Users/richmond/Projects/backend/env/lib/python3.6/site-packages/eventlet/greenio/base.py", line 378, in _send_loop
return send_method(data, *args)
BrokenPipeError: [Errno 32] Broken pipe
[2017-05-22 17:18:11,032: INFO/MainProcess] Connected to amqp://user123:**@192.168.99.100:5672//
[2017-05-22 17:18:11,046: INFO/MainProcess] mingle: searching for neighbors
[2017-05-22 17:18:12,075: INFO/MainProcess] mingle: all alone
[2017-05-22 17:18:12,104: INFO/MainProcess] [email protected] ready.
$ pip freeze
alabaster==0.7.10
alembic==0.9.1
amqp==2.1.4
aniso8601==1.2.0
appdirs==1.4.3
argh==0.26.2
Babel==2.4.0
billiard==3.5.0.2
blinker==1.4
bumpversion==0.5.3
cached-property==1.3.0
celery==4.0.2
cffi==1.10.0
click==6.7
colorama==0.3.8
coverage==4.1
cryptography==1.7
docker==2.2.1
docker-compose==1.11.2
docker-pycreds==0.2.1
dockerpty==0.4.1
docopt==0.6.2
docutils==0.13.1
ecdsa==0.13
enum-compat==0.0.2
eventlet==0.21.0
factory-boy==2.8.1
Faker==0.7.11
flake8==2.6.0
Flask==0.12
Flask-Babel==0.11.2
Flask-Cors==3.0.2
Flask-GraphQL==1.4.0
Flask-Mail==0.9.1
Flask-Migrate==2.0.3
Flask-RESTful==0.3.5
Flask-Script==2.0.5
Flask-SocketIO==2.8.6
Flask-SQLAlchemy==2.1
future==0.16.0
graphene==1.1.3
graphene-sqlalchemy==1.1.1
graphql-core==1.1
graphql-relay==0.4.5
greenlet==0.4.12
gunicorn==19.7.1
idna==2.5
imagesize==0.7.1
inflection==0.3.1
iso8601==0.1.11
itsdangerous==0.24
Jinja2==2.9.6
jsonschema==2.6.0
kombu==4.0.2
Mako==1.0.6
MarkupSafe==1.0
mccabe==0.5.3
mod-wsgi==4.5.15
ndg-httpsclient==0.4.2
packaging==16.8
pathtools==0.1.2
pluggy==0.3.1
promise==2.0
py==1.4.33
pyasn1==0.2.3
pycodestyle==2.0.0
pycparser==2.17
pycrypto==2.6.1
pyflakes==1.2.3
Pygments==2.2.0
PyMySQL==0.7.10
pyOpenSSL==17.0.0
pyparsing==2.2.0
python-dateutil==2.6.0
python-editor==1.0.3
python-engineio==1.4.0
python-jose==1.3.2
python-socketio==1.7.4
pytz==2017.2
PyYAML==3.11
raven==5.32.0
requests==2.11.1
singledispatch==3.4.0.3
six==1.10.0
snowballstemmer==1.2.1
Sphinx==1.4.8
SQLAlchemy==1.1.5
texttable==0.8.8
tox==2.3.1
typing==3.6.1
vine==1.1.3
virtualenv==15.1.0
voluptuous==0.9.3
watchdog==0.8.3
websocket-client==0.40.0
Werkzeug==0.12.1
I have the same problem and I'm using the same versions and I can't find a solution :(
I am using kombu and running into the same situation. I'm not sure, but I think it may be an issue with HAProxy.
When connected through HAProxy, which routes connections to several RabbitMQ nodes, both producers and consumers hit the same "connection to broker lost" issue; when connected directly to a RabbitMQ node, the issue never occurs.
Also seeing this, and also behind haproxy:
kombu==4.0.2
celery==4.0.2
redis==2.10.5
This may also happen with SQS brokers:
boto==2.47.0
celery==4.0.2
kombu==4.0.2
The same problem here, Celery with RabbitMQ and eventlet.
amqp==2.2.1
billiard==3.5.0.3
celery==4.1.0
Django==1.8.5
eventlet==0.21.0
kombu==4.1.0
pytz==2017.2
vine==1.1.4
[2017-08-10 06:35:39,629: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 320, in start
blueprint.start(self)
File "/usr/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/usr/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 596, in start
c.loop(*c.loop_args())
File "/usr/lib/python2.7/site-packages/celery/worker/loops.py", line 118, in synloop
connection.drain_events(timeout=2.0)
File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 301, in drain_events
return self.transport.drain_events(self.connection, **kwargs)
File "/usr/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 103, in drain_events
return connection.drain_events(**kwargs)
File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 485, in drain_events
while not self.blocking_read(timeout):
File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 490, in blocking_read
frame = self.transport.read_frame()
File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 240, in read_frame
frame_header = read(7, True)
File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 415, in _read
s = recv(n - len(rbuf))
File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 360, in recv
return self._recv_loop(self.fd.recv, b'', bufsize, flags)
File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 354, in _recv_loop
self._read_trampoline()
File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 325, in _read_trampoline
timeout_exc=socket_timeout('timed out'))
File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 207, in _trampoline
mark_as_closed=self._mark_as_closed)
File "/usr/lib/python2.7/site-packages/eventlet/hubs/__init__.py", line 163, in trampoline
return hub.switch()
File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 295, in switch
return self.greenlet.switch()
File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 347, in run
self.wait(sleep_time)
File "/usr/lib/python2.7/site-packages/eventlet/hubs/poll.py", line 84, in wait
presult = self.do_poll(seconds)
File "/usr/lib/python2.7/site-packages/eventlet/hubs/epolls.py", line 61, in do_poll
return self.poll.poll(seconds)
File "/usr/lib/python2.7/site-packages/celery/apps/worker.py", line 333, in restart_worker_sig_handler
safe_say('Restarting celery worker ({0})'.format(' '.join(sys.argv)))
File "/usr/lib/python2.7/site-packages/celery/apps/worker.py", line 89, in safe_say
print('\n{0}'.format(msg), file=sys.__stderr__)
IOError: [Errno 32] Broken pipe
[2017-08-10 06:35:39,820: INFO/MainProcess] Connected to amqp://guest:**@127.0.0.1:5672//
[2017-08-10 06:35:39,940: INFO/MainProcess] mingle: searching for neighbors
[2017-08-10 06:35:40,975: INFO/MainProcess] mingle: all alone
[2017-08-10 06:35:41,115: INFO/MainProcess] pidbox: Connected to amqp://guest:**@127.0.0.1:5672//.
I'm having this same problem with RabbitMQ running on localhost. It's odd that local connections would time-out or lose connection. I do have some pretty intensive tasks, and my CPU is at 100%, but my memory usage is low. Would that cause dropped connections?
celery==4.0.2
kombu==4.0.2
Django==1.8.6
RabbitMQ==3.2.4
@chrisspen, were you able to resolve the issue? I have a similar scenario: while performing a CPU-intensive task, the connection to Redis gets dropped. The following is one of the many exceptions I receive:
[2017-10-11 13:34:57,244: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
Traceback (most recent call last):
File "/usr/local/python/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 318, in start
blueprint.start(self)
File "/usr/local/python/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/usr/local/python/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 594, in start
c.loop(*c.loop_args())
File "/usr/local/python/lib/python2.7/site-packages/celery/worker/loops.py", line 88, in asynloop
next(loop)
File "/usr/local/python/lib/python2.7/site-packages/kombu/async/hub.py", line 345, in create_loop
cb(*cbargs)
File "/usr/local/python/lib/python2.7/site-packages/kombu/transport/redis.py", line 1039, in on_readable
self.cycle.on_readable(fileno)
File "/usr/local/python/lib/python2.7/site-packages/kombu/transport/redis.py", line 337, in on_readable
chan.handlers[type]()
File "/usr/local/python/lib/python2.7/site-packages/kombu/transport/redis.py", line 714, in _brpop_read
**options)
File "/usr/local/python/lib/python2.7/site-packages/redis/client.py", line 585, in parse_response
response = connection.read_response()
File "/usr/local/python/lib/python2.7/site-packages/redis/connection.py", line 577, in read_response
response = self._parser.read_response()
File "/usr/local/python/lib/python2.7/site-packages/redis/connection.py", line 238, in read_response
response = self._buffer.readline()
File "/usr/local/python/lib/python2.7/site-packages/redis/connection.py", line 168, in readline
self._read_from_socket()
File "/usr/local/python/lib/python2.7/site-packages/redis/connection.py", line 143, in _read_from_socket
(e.args,))
ConnectionError: Error while reading from socket: ('Connection closed by server.',)
I also had a "Connection to broker lost" error using RabbitMQ + Celery under high CPU usage; not sure if it is related.
This might be an issue with HAProxy.
@auvipy I doubt it. I do not use HAProxy and have been trying to use Celery 4.0.3 to migrate from 3.1.25.
I still get this error.
Let's wait for the 4.2 release.
I'm having this same problem with Redis running on localhost. The timeout value in my Redis config is 5. When too many tasks run for longer than 5 seconds, this exception results:
[2018-03-28 10:13:09,557: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
Traceback (most recent call last):
File "/usr/local/python2.7/lib/python2.7/site-packages/celery-4.2.0rc1-py2.7.egg/celery/worker/consumer/consumer.py", line 322, in start
blueprint.start(self)
File "/usr/local/python2.7/lib/python2.7/site-packages/celery-4.2.0rc1-py2.7.egg/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/usr/local/python2.7/lib/python2.7/site-packages/celery-4.2.0rc1-py2.7.egg/celery/worker/consumer/consumer.py", line 598, in start
c.loop(*c.loop_args())
File "/usr/local/python2.7/lib/python2.7/site-packages/celery-4.2.0rc1-py2.7.egg/celery/worker/loops.py", line 91, in asynloop
next(loop)
File "/usr/local/python2.7/lib/python2.7/site-packages/kombu/async/hub.py", line 354, in create_loop
cb(*cbargs)
File "/usr/local/python2.7/lib/python2.7/site-packages/kombu/transport/redis.py", line 1040, in on_readable
self.cycle.on_readable(fileno)
File "/usr/local/python2.7/lib/python2.7/site-packages/kombu/transport/redis.py", line 337, in on_readable
chan.handlers[type]()
File "/usr/local/python2.7/lib/python2.7/site-packages/kombu/transport/redis.py", line 714, in _brpop_read
**options)
File "/usr/local/python2.7/lib/python2.7/site-packages/redis/client.py", line 680, in parse_response
response = connection.read_response()
File "/usr/local/python2.7/lib/python2.7/site-packages/redis/connection.py", line 624, in read_response
response = self._parser.read_response()
File "/usr/local/python2.7/lib/python2.7/site-packages/redis/connection.py", line 284, in read_response
response = self._buffer.readline()
File "/usr/local/python2.7/lib/python2.7/site-packages/redis/connection.py", line 216, in readline
self._read_from_socket()
File "/usr/local/python2.7/lib/python2.7/site-packages/redis/connection.py", line 191, in _read_from_socket
(e.args,))
ConnectionError: Error while reading from socket: ('Connection closed by server.',)
[2018-03-28 10:13:09,558: WARNING/MainProcess] Restoring 7 unacknowledged message(s)
I also found that if the argument passed to a Celery task is too long, such as a string of length 65000, the connection between Celery and Redis is closed and cannot be re-established.
Here is an example:
```python
# filename: task.py
# test for celery task
import time
from celery import Celery

app = Celery('tasks', broker='redis://:@127.0.0.1:6379/2')


@app.task
def test(data):
    # Simulate a long-running task.
    time.sleep(20)
```
Start a Celery worker with concurrency 2:
```bash
celery -A task worker -c 2 -l info
```
```python
# filename: test.py
# produce celery task
from task import test

# Queue 40 tasks, each with a 65000-character argument.
for i in range(40):
    test.delay('1' * 65000)
```
Run the test:
```bash
python test.py
```
Wait for about 2 minutes, then run this command and you will find that no connection between Celery and Redis is ESTABLISHED:
```bash
netstat -natp | grep 6379 | grep ESTABLISHED
```
This indicates that the connections between Celery and Redis were closed and that reconnecting failed. So what causes the dropped connections and the failed reconnects?
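For anyone who wants to repeat the netstat check from a script, here is a minimal sketch. It assumes the Linux `netstat -natp` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program). The function name `count_established` and the sample text are hypothetical and only illustrate the idea; they are not taken from a real run.

```python
def count_established(netstat_output, port=6379):
    """Count ESTABLISHED TCP rows whose local or remote port equals `port`."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP row with a state column.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        ports = {local.rsplit(":", 1)[-1], remote.rsplit(":", 1)[-1]}
        if state == "ESTABLISHED" and str(port) in ports:
            count += 1
    return count


# Hypothetical sample, shaped like `netstat -natp` output on Linux.
SAMPLE = """\
tcp        0      0 127.0.0.1:6379          0.0.0.0:*               LISTEN      1234/redis-server
tcp        0      0 127.0.0.1:51234         127.0.0.1:6379          ESTABLISHED 4321/python
tcp        0      0 127.0.0.1:6379          127.0.0.1:51234         ESTABLISHED 1234/redis-server
tcp        0      0 127.0.0.1:51300         127.0.0.1:6379          TIME_WAIT   -
"""

print(count_established(SAMPLE))  # prints 2
```

Like the `grep` pipeline, this counts both ends of a local connection, so a result of 0 corresponds to the situation described above.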
celery-4.1.0 or celery-4.2.0rc1
kombu==4.1.0
billiard==3.5.0.3
pytz==2018.3
amqp==2.2.2
vine==1.1.4
redis 4.0.2
I don't know why this happens, but if timeout in redis.conf is set to 0, the problem disappears.
Thank you.
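To see which idle timeout a given Redis server is using, something like the following sketch may help. It assumes a redis-py style client whose `config_get(pattern)` returns a dict of setting names to string values. The helper `idle_timeout_report` and the `FakeClient` stand-in are hypothetical names introduced here so the sketch runs without a server.

```python
def idle_timeout_report(client):
    """Return (timeout_seconds, message) for the server's `timeout` setting.

    `client` only needs a `config_get(pattern)` method returning a dict.
    """
    timeout = int(client.config_get("timeout").get("timeout", 0))
    if timeout == 0:
        # In redis.conf, 0 disables the idle-client timeout.
        return timeout, "idle clients are never disconnected by the server"
    return timeout, "the server closes clients idle for %d s" % timeout


class FakeClient(object):
    """Hypothetical stand-in for a Redis client, for illustration only."""

    def __init__(self, settings):
        self._settings = settings

    def config_get(self, pattern):
        return {k: v for k, v in self._settings.items() if k == pattern}


print(idle_timeout_report(FakeClient({"timeout": "5"})))
print(idle_timeout_report(FakeClient({"timeout": "0"})))
```

Against a real server, passing a `redis.Redis(host="127.0.0.1", port=6379)` instance instead of `FakeClient` should work the same way, provided the server permits `CONFIG GET`.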
I have the same problem with amqp broker and:
kombu==4.2.1
celery==4.2.0
Same issue with celery==4.2.0 and kombu==4.2.1
anyone had any fix in practice?
Same issue with celery == 4.2.1
how do I resolve this ?
same issue, I am using celery==4.2.0, kombu==4.2.0, and Rabbitmq cluster as broker
any solution ?
I also have this problem. My logs are constantly spammed by these exceptions.
Same here with latest celery, any fix available?
I tried the following to suppress the logs. It works fine for me.
from billiard.exceptions import (
    SoftTimeLimitExceeded as CelerySoftTimeLimitExceeded,
)
from botocore.exceptions import (
    ClientError as BotocoreClientError,
    HTTPClientError as BotocoreHTTPClientError,
)

# `app` is the project's existing Celery application instance.
@app.task(
    # Retry the task automatically when one of these exceptions is raised.
    autoretry_for=(
        BotocoreClientError,
        BotocoreHTTPClientError,
        CelerySoftTimeLimitExceeded,
    ),
)
def my_task():
    pass
Hi.
I have the same problem with celery 4.2.1 and amqp 2.4.0.
With celery 4.2.1 and amqp 2.3.2 it seems to work well.
I hope this helps somebody who has the same problem.
@auvipy, I can see that this issue was closed but I am not sure if there is a fix for it yet or if one is planned... It would be great if you could confirm what the current status is, thanks.
Could you please try celery and kombu from master and report again?
I'm also having this issue. I tried the solution from @shaoeChen above using amqp 2.3.2 and celery 4.2.1, but I am still having this problem.
Same issue,
celery 4.2.1
kombu 4.3.0
amqp 2.4.1
redis 2.10.5
Fixed when downgrading to amqp 2.3.2 as per @shaoeChen's answer.
In the special case of AWS ELB + cluster of Rabbit MQ, this configuration seems to solve it:
Default idle timeout is 60.
Same issue
redis==3.2.1
celery[redis]==4.3.0
Restarting Celery fixes it.
try celery==4.4.0rc5
@auvipy would be good if you reference the change in celery==4.4.0rc5
that fixes this issue for posterity.
FWIW, bumping to 4.4.0rc5 seems to have stabilized this issue for me as well.
celery==4.4.0 is the current stable release. try that & report back.
celery==4.4.0
kombu==4.6.7
works (better).
Thanks for referencing the 4.4.x milestone.
UPDATE: Whereas before it was dropping connections frequently (roughly every minute), it now seems to drop connections much less often (about once an hour). I'd have to log it and do some deeper log analysis for a clearer signal.
I am facing the same issue
[2020-04-22 05:31:09,243: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
Traceback (most recent call last):
File "c:\users\administrator\code\user\cel_env\lib\site-packages\redis\connection.py", line 700, in send_packed_command
sendall(self._sock, item)
File "c:\users\administrator\code\user\cel_env\lib\site-packages\redis\_compat.py", line 8, in sendall
return sock.sendall(*args, **kwargs)
File "c:\users\administrator\code\user\cel_env\lib\site-packages\eventlet\greenio\base.py", line 403, in sendall
tail = self.send(data, flags)
File "c:\users\administrator\code\user\cel_env\lib\site-packages\eventlet\greenio\base.py", line 397, in send
return self._send_loop(self.fd.send, data, flags)
File "c:\users\administrator\code\user\cel_env\lib\site-packages\eventlet\greenio\base.py", line 384, in _send_loop
return send_method(data, *args)
ConnectionResetError: An existing connection was forcibly closed by the remote host
my pip list
billiard==3.6.3.0
BTrees==4.4.1
celery==4.4.2
kombu==4.6.8
redis==3.4.1
I'm facing the same issue as reported in https://github.com/apache/airflow/issues/11622. This started to occur after https://github.com/apache/airflow/pull/11336
The interesting thing is that it happens only when we try to run the Celery worker in a daemonized way.
celery==4.4.7
kombu==4.6.11
billiard==3.6.3.0
redis==3.5.3
FYI: I tried upgrading to 5.0 but no success.
I had a similar issue using the HAProxy provided by the redis-ha Helm chart. It turns out its default `timeout client` was `30s`, and Redis' default TCP keepalive interval is "about 5 minutes". After raising `timeout client` and `timeout server` both to `330s`, the problem seems to have been fixed.
Should we add this to the docs?
Well, I'm not sure whether it's the right solution, but I changed my EC2 instance on Amazon from one with 1 GB of memory to one with 2 GB (your needs may differ, so adjust accordingly). I run celery + rabbitmq + gunicorn + supervisor on it for long tasks (up to several hours each) with thousands of messages, 24/7.
As the syslog file in Ubuntu showed, OOM errors were killing the broker and other processes, and increasing the server's memory solved this.
Now my logs are clear of errors,
and there was no need to change my code.
very practical!