Faraday: Distinguish TimeoutErrors for open and read timeouts

Created on 9 Aug 2017 · 32 Comments · Source: lostisland/faraday

In faraday/adapter/rack.rb, TimeoutError is raised for both open and read timeouts:

    timeout  = env[:request][:timeout] || env[:request][:open_timeout]
    response = if timeout
      Timer.timeout(timeout, Faraday::Error::TimeoutError) { execute_request(env, rack_env) }
    else
      ...
    end

According to https://stackoverflow.com/questions/10322283/what-is-timeout-and-open-timeout-in-faraday, open_timeout applies to opening the TCP connection and timeout applies to reading the response.

It would be nice to have separate exception types for these timeouts. Then we could determine whether or not to retry the request. Does adding something like Faraday::Error::OpenTimeoutError and Faraday::Error::ResponseTimeoutError and using those here make sense?

feature help wanted

coberlin

Most helpful comment

Hi @coberlin I believe this might be a nice addition, I'm just scared about backwards compatibility.
However, a possible solution for this might be to have OpenTimeoutError and ResponseTimeoutError inherit from TimeoutError, so that existing rescues will keep working as expected.
It's definitely worth some testing 😃

iMacTia on 10 Aug 2017

👍5
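
The backwards-compatibility idea in the comment above can be sketched in plain Ruby. The `Sketch` module and the two helper methods are hypothetical stand-ins written for illustration, not Faraday's code; only the error class names come from the thread.

```ruby
# Hypothetical stand-in for Faraday's error namespace; not the library's code.
module Sketch
  class TimeoutError < StandardError; end

  # Proposed subclasses: each one is still a TimeoutError.
  class OpenTimeoutError < TimeoutError; end
  class ResponseTimeoutError < TimeoutError; end
end

# Existing application code that only knows about TimeoutError.
def legacy_call
  yield
  :ok
rescue Sketch::TimeoutError
  :timed_out
end

# Newer code that wants to tell the two cases apart.
def classify
  yield
  :ok
rescue Sketch::OpenTimeoutError
  :open_timeout
rescue Sketch::ResponseTimeoutError
  :read_timeout
end

puts legacy_call { raise Sketch::OpenTimeoutError, 'connect timed out' }  # timed_out
puts classify { raise Sketch::ResponseTimeoutError, 'read timed out' }    # read_timeout
```

Because both new classes descend from the existing one, an old `rescue Sketch::TimeoutError` keeps matching, while new code can rescue the more specific class.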

The rack_adapter might be the wrong place for this feature. I think Rack applications don't necessarily distinguish between open and read timeouts. Perhaps this feature would work in the HTTPClient adapter or other adapters? From adapter/httpclient.rb:

    @app.call env
  rescue ::HTTPClient::TimeoutError, Errno::ETIMEDOUT
    raise Faraday::Error::TimeoutError, $!
  rescue ::HTTPClient::BadResponseError => err
    if err.message.include?('status 407')
      raise Faraday::Error::ConnectionFailed, %{407 "Proxy Authentication Required "}
    else
      raise Faraday::Error::ClientError, $!
    end
  rescue Errno::ECONNREFUSED, IOError, SocketError
    raise Faraday::Error::ConnectionFailed, $!
  rescue => err
    if defined?(OpenSSL) && OpenSSL::SSL::SSLError === err
      raise Faraday::SSLError, err
    else
      raise
    end

::HTTPClient::TimeoutError has 3 subclasses: ConnectTimeoutError, ReceiveTimeoutError and SendTimeoutError; see e.g. http://www.rubydoc.info/gems/httpclient/2.1.5.2/HTTPClient/TimeoutError

Faraday has Faraday::Error::ConnectionFailed already. Is that appropriate for ConnectTimeoutError? Faraday::Error::TimeoutError could be subclassed into Faraday::Error::ReceiveTimeoutError and Faraday::Error::SendTimeoutError.

coberlin on 15 Aug 2017

Faraday has Faraday::Error::ConnectionFailed already. Is that appropriate for ConnectTimeoutError?

This makes sense, but it wouldn't be backwards compatible. We have to keep in mind that people are already catching Faraday::Error::TimeoutError in their applications, so switching to ConnectionFailed will break those cases.
What we want to do, instead, is define 2 subclasses of Faraday::Error::TimeoutError whose names should be as generic as possible:

Faraday::Error::OpenTimeoutError
Faraday::Error::ReadTimeoutError

Next step is to go into each adapter and map the adapter exceptions accordingly. E.g. for the HTTPClient:

HTTPClient::ConnectTimeoutError ==> Faraday::Error::OpenTimeoutError
HTTPClient::ReceiveTimeoutError ==> Faraday::Error::ReadTimeoutError
HTTPClient::TimeoutError ==> Faraday::Error::TimeoutError (this will also catch SendTimeoutError, which I'm not sure has a corresponding mapping in Faraday or a specific setting)

Finally, tests should be added where possible :)

iMacTia on 16 Aug 2017
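
The mapping proposed in the comment above could look roughly like the sketch below. `FakeHTTPClient` and `FakeFaraday` are stand-in modules so the example runs without either gem, and `call_adapter` is a hypothetical wrapper rather than the real adapter method; only the exception names are taken from the thread.

```ruby
# Stand-ins for the HTTPClient exception classes named in the thread.
module FakeHTTPClient
  class TimeoutError < StandardError; end
  class ConnectTimeoutError < TimeoutError; end
  class ReceiveTimeoutError < TimeoutError; end
  class SendTimeoutError < TimeoutError; end
end

# Stand-ins for the proposed Faraday error classes.
module FakeFaraday
  class TimeoutError < StandardError; end
  class OpenTimeoutError < TimeoutError; end
  class ReadTimeoutError < TimeoutError; end
end

# Hypothetical adapter wrapper. The most specific rescues come first,
# because Ruby uses the first rescue clause that matches.
def call_adapter
  yield
rescue FakeHTTPClient::ConnectTimeoutError => e
  raise FakeFaraday::OpenTimeoutError, e.message
rescue FakeHTTPClient::ReceiveTimeoutError => e
  raise FakeFaraday::ReadTimeoutError, e.message
rescue FakeHTTPClient::TimeoutError => e
  # Catch-all for the remaining timeouts, including SendTimeoutError.
  raise FakeFaraday::TimeoutError, e.message
end

begin
  call_adapter { raise FakeHTTPClient::ConnectTimeoutError, 'connect timed out' }
rescue FakeFaraday::TimeoutError => e
  puts e.class  # FakeFaraday::OpenTimeoutError
end
```

If the generic `TimeoutError` clause were listed first, it would swallow the two specific cases, so the ordering is part of the design.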

Hey guys.

We had this discussion a "little" time ago (https://github.com/lostisland/faraday/pull/324).

I'm giving it another try (https://github.com/mistersourcerer/faraday/tree/718_mrsrcr_timeout-wrapping-2nd-chance), will try and open a new PR as soon as I have some progress on it.

mistersourcerer on 5 Oct 2017

Hi @mistersourcerer, thanks for the nudge, I was totally unaware that discussion took place.
I'm a bit confused as I see the PR closed, but the change is in the code. Happy to know you got your change merged somehow in the end 😄
Your help would be appreciated in this case as I think you're already comfortable with Timeout testing from your previous work (even though we're talking about 3 years ago!).
I hope my explanation on the OpenTimeoutError and ReadTimeoutError is clear, but if that's not the case then please let me know.
Take your time and open a pull request once you're done 👍

iMacTia on 5 Oct 2017

Hey @iMacTia.

If I remember correctly, we didn't manage to solve the situation back then. But I'm not sure exactly why.
The main problem was to write a test that failed consistently among all the adapters. So, I don't think my code was merged at all at the time.
Anyways, I have an idea for this some years after haha, let's see how it goes.

Right now, the tests for _EMSynchrony_ are failing on Travis, but not locally. Trying to figure it out. I'm even thinking of opening an "early" PR so maybe we can discuss this.

And your explanation is crystal clear, seems the perfect way to go with it.

Thanks for the awesome work on this, man.

mistersourcerer on 5 Oct 2017

Thank you @mistersourcerer!

RE your changes: I'm not really sure of what happened, but I see @mislav finally merged your changes here: https://github.com/lostisland/faraday/commit/f73d13ee09814fa68b37efa7bddafa47331948c2

So rejoice, Errno::ETIMEDOUT is already wrapped under Faraday::Error::TimeoutError on most (if not all) adapters 😄

iMacTia on 5 Oct 2017

Thanks for working on this @mistersourcerer!

Looking at your commit here, I wonder if for backwards compatibility, we need 2 new subclasses: OpenConnectionError < ConnectionError for Net::HTTP and OpenTimeoutError < TimeoutError for HttpClient?

coberlin on 5 Oct 2017

There appears to be some confusion around this issue.
The reason is that a decision was taken on #438 to handle "open timeout" errors as ConnectionFailed. That is arguably the best decision, but the reality is that someone decided to go down that route.
Now, this doesn't affect only the Rack adapter but also all other adapters, and their behaviour is probably not even consistent.
I'm planning to standardise them on the same behaviour with v1.0 and I'll keep this issue as a reference.

iMacTia on 13 Nov 2017

Follow-up to my previous comment.

Basically, we're currently raising a Faraday::ConnectionFailed error in case of an open timeout, while we raise a Faraday::TimeoutError for a read timeout. Although different adapters are currently behaving in different ways, this seems to be the most common behaviour.
This was decided something like 3 years ago, but here we're discussing having a Faraday::TimeoutError for the former case as well (with proper subclasses to distinguish between open and read).

On one side I understand that would be closer to reality, but if I analyse the issue from an implementation point of view, I find it hard to justify this change.
If I call a service and I get back a ConnectionFailed, I know that my call can't possibly have been processed. I probably didn't reach the server, or couldn't resolve the hostname, or something else happened.
If I get back a TimeoutError, then my request might have been processed, or partially processed, and I might have missed the response. That's a completely different case and requires double-checking what happened with the server I was calling.

Making the open timeout a sub-category of TimeoutError means taking a simple situation (request not processed) under a more complex domain, and surely requires additional checks to decide what to do: was it an open timeout or a read timeout?

We need to:

Decide how to bubble-up open timeouts
Standardise all adapters to the same behaviour

@coberlin @erik-escobedo @mislav @mistersourcerer would like to hear your thoughts after considering the above 😄

iMacTia on 13 Nov 2017

Going with ConnectionFailed for open timeout errors makes sense to me and would provide what I was hoping to get by distinguishing the open timeout errors from the other timeout errors. For adapter consistency, this would mean, for example, that Net::HTTP is ok as it is, but HTTPClient would change, with ConnectTimeoutErrors mapping to ConnectionFailed instead of to TimeoutError.

coberlin on 13 Nov 2017

That's OK, once we decide we'll standardise all adapters to the same behaviour (in v1.0 obviously, as this will be backward incompatible)

iMacTia on 13 Nov 2017

Throwing a use-case into the ring:

At work we've been suffering from some Open Timeouts due to Nginx + Kubernetes failing to route to hanging pods (or something). Anyway, NetHTTP used to throw OpenTimeout and ReadTimeout errors, and that was really handy for us debugging which was which.

Now that we've switched to Typhoeus, we sadly have all timeouts munged together, and it's tough for us to tell whether our work on the nginx + kuber problems has improved things, or whether we're just now successfully making more requests to an increasingly struggling system. Either way the number of timeouts is about the same, and without getting them separated we're kinda stuck guessing.

I don't think just adding Faraday::OpenTimeoutError is enough, we should have Faraday::OpenTimeoutError and Faraday::ReadTimeoutError extending from Faraday::TimeoutError IMO.

philsturgeon on 14 Nov 2017

👍1

@philsturgeon and what about the other proposed solution, would that help as well?

Open timeout -> Faraday::ConnectionFailed
Read timeout -> Faraday::TimeoutError

That should be the behaviour on all adapters, but unfortunately some are not behaving as expected (i.e. Typhoeus)

iMacTia on 15 Nov 2017

I feel like those are different things.

ConnectionFailed seems like "I have no idea how to talk to this server", like an invalid DNS/IP etc.

OpenTimeout is "I know where this server is, I'm just waiting for it to do a thing"

philsturgeon on 15 Nov 2017

OpenTimeout is "I know where this server is, I'm just waiting for it to do a thing"

I disagree with that, I'd rather say:

Open Timeout: I'm trying to contact the server, but I can't reach it (Note: connection not established or "opened" yet).
Read Timeout: I've established a connection with the server but I'm waiting for it to do a thing (reading the output).

A faulty firewall/proxy/load_balancer are just simple examples of how you might get an open timeout, but in all these cases the connection to the server has not started yet. That's the most important bit for me. "ConnectionFailed" to me simply means: I couldn't connect to the server. And it perfectly suits these cases.

If you still think that a specific Faraday::OpenTimeoutError should exist, then I'd suggest that it inherit from ConnectionFailed rather than TimeoutError, but I agree that would be a bit confusing and I'm not sure how it would help in practice.
Please see my previous comment (https://github.com/lostisland/faraday/issues/718#issuecomment-343957963) on how this might actually help to manage the error.

Does it make sense? I would like to find a solution that fits everyone

iMacTia on 15 Nov 2017

I accept your more accurate definitions for open timeout but I come to a different conclusion.

You consider an open timeout to be a connection failure, as the amount of time you're willing to wait for that connection is considered part of the connection. "Failed to make a connection in 5s" certainly makes sense if you explain it like that, but that's not how a lot of people think.

For many, open timeout just means it has not happened yet. That makes it less of a definitive statement than most connection failures, which is "The server is down" or "This DNS is garbage".

I suppose it doesn't much matter, as connection failures and open timeouts should both be retried, whereas a read timeout might be considered grounds to back off?

philsturgeon on 16 Nov 2017

I agree, we can argue as much as we want about the reading one can apply to it, but the practicality of my point is what you said as well: if you get an Open Timeout it means you can retry the request; if you get a Read Timeout it means you have to be VERY careful, as your request might have been processed (entirely or partially). Coincidentally, the practical meaning of an open timeout matches that of a failed connection, hence I would make it inherit from there.

Today people are catching ConnectionFailed and TimeoutError exceptions, and the logic behind that very probably reflects what we said before. If we introduce the new exception as a subclass of ConnectionFailed, then chances are high that most (if not all) applications won't need any change.

I understand (and agree) from a semantic point of view though that an OpenTimeout is just another type of Timeout.

But hey, what if we call it ConnectionTimedOut instead?

iMacTia on 16 Nov 2017

There would be some confusion around open_timeout: X being the name of the property that says how long to wait until throwing a ConnectionTimedOut.

philsturgeon on 16 Nov 2017

Good point 😞

iMacTia on 16 Nov 2017

Call it ConnectionOpenTimeout? It makes it clear it's a connection problem and keeps it clear that it's an opening timeout. I think this name keeps understanding in line with the "Failed to make a connection in X seconds" meaning, even though some people might still wonder why a timeout is not a timeout. 😅

philsturgeon on 16 Nov 2017

Sounds good to me 👍!

iMacTia on 16 Nov 2017

@iMacTia hey, if you could give me some pointers on where to start, I could have a go at doing this.

philsturgeon on 8 May 2018

Thanks @philsturgeon, that would be great! Allow me to recap the main points around this:

All changes will need to be done against v1.0 branch (as they'll be backwards incompatible).
Timeout management behaviour is inconsistent across adapters, so we need to standardise it.
The agreed behaviour is the following:
In case of OPEN timeout, we'll raise a ConnectionOpenTimeout error that will inherit from ConnectionFailed.
In case of READ timeout, we'll raise a TimeoutError.

Did I miss anything?

iMacTia on 9 May 2018
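
The behaviour recapped above, and the retry reasoning from earlier in the thread, can be sketched as follows. `AgreedSketch` and `next_step` are hypothetical names for illustration and are not part of Faraday; the sketch assumes only the hierarchy described in the recap.

```ruby
# Hypothetical stand-ins mirroring the agreed hierarchy; not Faraday's code.
module AgreedSketch
  class ConnectionFailed < StandardError; end
  class ConnectionOpenTimeout < ConnectionFailed; end
  class TimeoutError < StandardError; end
end

# Decide what a caller might do after a failed request.
def next_step
  yield
  :done
rescue AgreedSketch::ConnectionFailed
  # Also matches ConnectionOpenTimeout: the request never reached the server.
  :retry
rescue AgreedSketch::TimeoutError
  # Read timeout: the request may have been fully or partially processed.
  :verify_with_server
end

puts next_step { raise AgreedSketch::ConnectionOpenTimeout, 'open timeout' }  # retry
puts next_step { raise AgreedSketch::TimeoutError, 'read timeout' }           # verify_with_server
```

Code that already rescues `ConnectionFailed` picks up open timeouts without changes, which is the compatibility argument made for this hierarchy.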

Will ConnectionOpenTimeout be added to the default Faraday::Request::Retry handled exceptions?

mjhoy on 21 Aug 2018

@mjhoy that's a good point but at the moment the Retry middleware doesn't retry in case of connection problems. It only retries the request if the connection was successful but there was a timeout. In fact I'm not sure it makes sense to retry a request if the service you're calling is not reachable at all, you may prefer to get the exception back and do something else in that case.

However, the Faraday::Request::Retry is configurable so nothing stops you from adding ConnectionOpenTimeout or even ConnectionFailed to the list of exceptions you want it to handle.

I'd like to see some more "community opinion" before adding those to the list of default exceptions

iMacTia on 22 Aug 2018
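
To illustrate what "adding an exception to the list" means, here is a minimal retry helper with a configurable exception list. It is a hypothetical stand-in written for this sketch, not the Faraday::Request::Retry middleware, and the names `with_retries`, `max`, `exceptions` and `OpenTimeoutStandIn` are this example's own.

```ruby
# Hypothetical helper, not the Faraday retry middleware: re-runs the block
# up to `max` extra times, but only for the listed exception classes.
def with_retries(max:, exceptions:)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue *exceptions
    retry if attempts <= max
    raise
  end
end

# Stand-in for an open-timeout error class.
class OpenTimeoutStandIn < StandardError; end

result = with_retries(max: 2, exceptions: [OpenTimeoutStandIn]) do |attempt|
  raise OpenTimeoutStandIn, 'open timeout' if attempt < 3
  "succeeded on attempt #{attempt}"
end
puts result  # succeeded on attempt 3
```

An exception that is not in the list propagates on the first attempt, so whether open timeouts are retried depends entirely on what the list contains.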

We're running into OpenTimeout errors with an API endpoint occasionally that need to be retried; it seems from your logic above that open timeouts are safe to retry (more safe than a read timeout). We also had assumed that with the retry middleware, both open and read timeouts would be retried; the documentation reads, "By default, it retries 2 times and handles only timeout exceptions." The default exceptions handled are Errno::ETIMEDOUT, Timeout::Error, Error::TimeoutError, and Net::OpenTimeout is a subclass of Timeout::Error; it wasn't particularly clear that Faraday was treating them differently. So perhaps the documentation should be updated? In any case, yes, we configured the middleware; I'm just wondering if the default makes sense.

mjhoy on 22 Aug 2018

👍1

@mjhoy You're right in saying that Timeout::Error includes Net::OpenTimeout as well, so with the current implementation it seems like the open timeout should also be retried. Moreover, Timeout::Error is rescued and re-raised by the adapter under normal circumstances, so its presence in the list of exceptions might be unnecessary or just for extra safety.

Once we're done with the exceptions refactoring, Net::OpenTimeout will be raised as a new exception, which, as I said in my comment, changes the current default behaviour.

I still believe that shouldn't be part of the defaults, but it's definitely something to consider while doing the work.

Thanks for raising this 😄

iMacTia on 7 Sep 2018
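
The class relationship discussed in the last two comments can be checked with the standard library alone. The helper method name below is made up for this check.

```ruby
require 'net/http'
require 'timeout'

# Both Net::HTTP timeout classes descend from Timeout::Error.
puts Net::OpenTimeout < Timeout::Error  # true
puts Net::ReadTimeout < Timeout::Error  # true

# So a rescue list containing Timeout::Error also matches an open timeout.
def rescued_by_timeout_error?(error_class)
  raise error_class, 'simulated'
rescue Timeout::Error
  true
rescue StandardError
  false
end

puts rescued_by_timeout_error?(Net::OpenTimeout)  # true
puts rescued_by_timeout_error?(ArgumentError)     # false
```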

Hey, sorry this languished in my team's backlog for a year and now our priorities have changed a bunch. I won't be doing any work on this issue, but good luck!

philsturgeon on 21 Sep 2018

@iMacTia Is anyone working on this change? I don't mind taking this up for v2.0.

ragav0102 on 7 Oct 2019

Hi @ragav0102, thanks for the support!
No one is working on this yet, as we're still pushing to get v1.0 out of the door.

We'd definitely appreciate the help, but we don't have a plan yet for v2.0 so I can't tell when it will be released, so your changes may need to wait months before they can be used.

If you need this in one of your projects, then that's probably not feasible.
If you were just passing by and would like to contribute, I'd suggest you pick something scheduled for v1.0, as it will be released much sooner 😄

iMacTia on 8 Oct 2019

Got it!

ragav0102 on 9 Oct 2019
