Faraday: response.body is ASCII-8BIT when Content-Type is text/xml; charset=utf-8

Created on 16 Apr 2012  ·  9 comments  ·  Source: lostisland/faraday

First time using Faraday, so I might be doing things incorrectly, but the encoding of response.body in the following is ASCII-8BIT:

  def self.search(term)
    connection = Faraday.new(url: 'https://en.wikipedia.org')
    response = connection.get do |req|
      req.options = { timeout: 5, open_timeout: 3 }
      req.url '/w/api.php', action: 'opensearch', format: 'xml', search: term
    end
    # Prints ASCII-8BIT even though the server declares charset=utf-8
    puts response.body.encoding
  end

In Ruby 1.9.2 this causes REXML to raise an Encoding::CompatibilityError.
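For reference, this kind of error can be reproduced without REXML or Faraday. The sketch below assumes the parser ends up combining the binary-tagged body with a UTF-8 string at some point; the sample strings are made up for illustration.

```ruby
# UTF-8 bytes for "café", tagged ASCII-8BIT the way a raw socket read would be.
binary = "caf\xC3\xA9".b
# A UTF-8 string that also contains a non-ASCII character.
utf8 = "na\u00EFve"

error = nil
begin
  utf8 + binary # both sides hold non-ASCII bytes under different encodings
rescue Encoding::CompatibilityError => e
  error = e
end

# A binary string that is pure ASCII combines without complaint, which is
# why the problem only shows up once the response contains non-ASCII text.
combined = "<xml/>".b + utf8
```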

I couldn't find a way to force Faraday to provide response.body in UTF-8.

What is the preferred solution to this?

All 9 comments

Just encountered the same issue. Any ideas?

Workaround I used is:

response.body.force_encoding('utf-8')

Yehuda has a dissertation about the problem here.
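A small sketch of what this workaround does and does not do, using made-up sample bytes rather than a real response: force_encoding only changes the encoding tag on the string, so it is correct only when the bytes really are in the encoding you name.

```ruby
# UTF-8 bytes for "café", tagged ASCII-8BIT like the response body in the report.
body = "caf\xC3\xA9".b
bytes_before = body.bytes

body.force_encoding('utf-8') # relabels in place; no bytes are rewritten
same_bytes = (body.bytes == bytes_before)
valid = body.valid_encoding?

# If the server actually sent Latin-1, the UTF-8 label is simply wrong.
mislabeled = "caf\xE9".b.force_encoding('utf-8')
mislabeled_valid = mislabeled.valid_encoding?

# In that case the bytes need a real conversion, not just a new label.
converted = "caf\xE9".b.force_encoding('ISO-8859-1').encode('UTF-8')
```

Checking valid_encoding? after relabeling is a cheap way to catch a body that does not match its declared charset.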

I'm pretty sure Faraday just passes on the response body from the underlying adapter. I'm not sure I want Faraday to raise errors or perform lossy conversions of the data. That can be done in a custom middleware if you really need it.
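A rough sketch of what such a custom middleware might look like. The class name ForceCharsetEncoding and its regex are hypothetical, not part of Faraday; it only relies on the call(env) / on_complete convention and the :body and :response_headers keys of the response env, and it has not been tested against any particular Faraday version.

```ruby
# Hypothetical middleware: relabels the body with the charset from Content-Type.
class ForceCharsetEncoding
  CHARSET = /charset\s*=\s*"?([^\s;"]+)/i

  def initialize(app)
    @app = app
  end

  def call(env)
    @app.call(env).on_complete do |response_env|
      relabel(response_env[:body], response_env[:response_headers])
    end
  end

  private

  def relabel(body, headers)
    return unless body.is_a?(String) && !body.frozen?

    charset = headers && headers['Content-Type'].to_s[CHARSET, 1]
    return unless charset

    original = body.encoding
    body.force_encoding(Encoding.find(charset))
    # Not lossy: if the bytes do not match the declared charset, undo the relabel.
    body.force_encoding(original) unless body.valid_encoding?
  rescue ArgumentError
    # Unknown charset name: leave the body untouched.
  end
end
```

It would be registered inside the Faraday.new block with `use ForceCharsetEncoding`, ahead of the adapter. Because it only relabels and never transcodes, it avoids the lossy conversion mentioned above.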

Fair enough. If the problem is elsewhere, as it appears, I guess it will be cleaned up in time. It's not a show stopper for me.

Closing because it's not a bug with Faraday.

I'm not sure the underlying adapter (at least net/http) does any encoding transformation. You can set Ruby's Encoding.default_external to something like 'US-ASCII', then hit an endpoint with Content-Type = '...; charset=utf-8': net/http will parse the charset string and make it available, but it does nothing to the encoding of the body string. Maybe net/http should be responsible for that, but as long as it isn't, the ParseJson middleware (for example) can blow up.
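To illustrate the "parses the charset and makes it available" part: net/http exposes the Content-Type parameters through type_params. The response below is built by hand to avoid a network call, and the body string is a stand-in for what res.body would return, so treat this as a sketch rather than a trace of a real request.

```ruby
require 'net/http'

# Hand-built response object; a real one would come from Net::HTTP#get.
res = Net::HTTPOK.new('1.1', '200', 'OK')
res['Content-Type'] = 'text/xml; charset=utf-8'

media_type = res.content_type        # the type without its parameters
charset = res.type_params['charset'] # the declared charset, as a plain string

# Stand-in for the body: bytes tagged as binary, as described above.
body = "caf\xC3\xA9".b

# Applying the declared charset is left to the caller.
body.force_encoding(charset) if charset
```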

Did some more research on this - some of the underlying adapters handle the Content-Type charset, some don't:

EM-HTTP-Request does. [commit].
Patron does. [commit].
HTTPClient does [[commit](https://github.com/nahi/httpclient/commit/e5efea5afb3b5cf6ead3a131644dee71be1ee5e9)] [[issue](https://github.com/nahi/httpclient/issues/26)].
Typhoeus and Excon (and net/http) don't appear to.

I guess the nicest thing to do would be to offer an optional middleware for adapters that don't handle the charset themselves, but, yeah, I'd agree, this probably shouldn't be Faraday's responsibility.

@chrismo you're my hero. Thanks for doing that research!
