Request: accented characters? � � �

Created on 26 Nov 2011  ·  17 Comments  ·  Source: request/request

Hi!

I'm trying to scrape a web page with accented characters (á é ó ú ê ã, etc.). I tried encoding: 'utf-8', but I'm still getting these ��� characters in the result.

request.get({
    uri: url,
    encoding: 'utf-8'
    // ...

All 17 comments

Well, what's the encoding the page uses? You can't just throw a utf8 parser at ISO-whatever.

@thejh The page encoding is iso-8859-1. I've also tried:

request.get({
    uri: url,
    encoding: 'iso-8859-1'
    // ...

and I got:

Error: Unknown encoding

But then I read this issue, https://github.com/mikeal/request/issues/27, and the Node docs at http://nodejs.org/docs/v0.6.0/api/http.html#request.setEncoding, which say:

Set the encoding for the request body. Either 'utf8' or 'binary'. Defaults to null, which means that the 'data' event will emit a Buffer object.

So I set encoding to 'binary', and it worked.

Have a look at the iconv library.

Okay... but do you know why binary worked?

Because it just takes the raw buffer's data. Also, the string still isn't utf8, so don't do it.
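
A minimal sketch of what is going on here, using only Node's built-in Buffer. The three bytes are an assumed example of how an ISO-8859-1 server would send "áéó"; the variable names are illustrative. Decoding them as UTF-8 produces replacement characters, while 'binary' (an alias for 'latin1' in current Node versions) maps each byte to the code point with the same value.

```javascript
// Assumed sample: the bytes for "áéó" as an ISO-8859-1 server would send them.
const body = Buffer.from([0xe1, 0xe9, 0xf3]);

// These bytes are not a valid UTF-8 sequence, so the decoder
// substitutes U+FFFD (the replacement character) for them.
const asUtf8 = body.toString('utf8');

// 'binary' turns each byte into the code point of the same value,
// which for these bytes matches ISO-8859-1.
const asBinary = body.toString('binary');

console.log(asUtf8.includes('\ufffd'));          // true
console.log(asBinary === '\u00e1\u00e9\u00f3');  // true
```

This only lines up because ISO-8859-1 happens to map bytes one-to-one onto the first 256 code points; for other legacy encodings the same trick would give wrong characters.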

But in this case, what is the proper value for encoding?

No encoding. Take it as a buffer, then stuff it into iconv.
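
A rough sketch of the "take it as a buffer, then convert" approach. To keep it free of third-party modules, it swaps in Node's built-in TextDecoder where the comment above suggests iconv, and it assumes a Node build with full ICU (the default for official builds). The helper names charsetFromContentType and decodeBody, and the 'utf-8' fallback, are hypothetical.

```javascript
// Hypothetical helper: pull the charset label out of a Content-Type header value.
function charsetFromContentType(contentType) {
  const match = /charset\s*=\s*"?([^\s;"]+)/i.exec(contentType || '');
  return match ? match[1] : 'utf-8'; // assumed default when no charset is given
}

// Hypothetical helper: decode a raw body Buffer using the declared charset.
// TextDecoder stands in for iconv here; it throws a RangeError for labels
// it does not know.
function decodeBody(bodyBuffer, contentType) {
  const decoder = new TextDecoder(charsetFromContentType(contentType));
  return decoder.decode(bodyBuffer);
}

// With request, passing encoding: null makes `body` a Buffer, so the
// helper could be called roughly like this (not executed here):
//
//   request.get({ uri: url, encoding: null }, function (error, response, body) {
//     var html = decodeBody(body, response.headers['content-type']);
//   });

// Assumed sample: "café" encoded as ISO-8859-1.
const raw = Buffer.from([0x63, 0x61, 0x66, 0xe9]);
console.log(decodeBody(raw, 'text/html; charset=ISO-8859-1'));
```

Pages that declare their encoding only in a meta tag, or declare it wrongly, would need extra handling that this sketch leaves out.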

the confusion appears to be over "binary" and Buffer, which is also binary.

"binary" is, mostly, a legacy encoding from the node 0.1.x days where we encoded all binary into strings.

in node.js 0.2 we got a Buffer object, which is a raw allocation of memory outside of v8's heap. the object is not a string, and can hold raw binary data you get out of a file descriptor and send it to another file descriptor without suffering conversion to string.

in request, you can pipe() a request object to any stream and all the buffers will be sent to the destination stream. if all you're doing is taking binary data from an http request and sending to a file, socket, or http response, you should just use pipe().
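
As an illustration of the point about pipe(), here is a small sketch built only on Node's stream module. A PassThrough stream stands in for the request object and a collecting Writable stands in for the file, socket, or http response; both stand-ins and all names are assumptions. The bytes arrive at the destination unchanged because they are never converted to a string along the way.

```javascript
const { PassThrough, Writable } = require('stream');

// Assumed sample: "á é" in ISO-8859-1 (not valid UTF-8).
const latin1Bytes = Buffer.from([0xe1, 0x20, 0xe9]);

// Stand-in for the request object: a readable stream of Buffers.
const source = new PassThrough();

// Stand-in for the destination stream: collects the chunks it receives.
const chunks = [];
const destination = new Writable({
  write(chunk, encoding, callback) {
    chunks.push(chunk);
    callback();
  }
});

destination.on('finish', () => {
  // The destination saw exactly the bytes that were sent.
  console.log(Buffer.concat(chunks).equals(latin1Bytes)); // true
});

source.pipe(destination);
source.end(latin1Bytes);
```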

How can I use pipe with the request module?

@mikeal Awesome!

I need to scrape more than one URL in the same HTTP request (it's a web app) and then send all of the data in the response.

I can't send it like this:

request.get({
        uri: url1
}).pipe(res);

request.get({
        uri: url2
}).pipe(res);

Is there any other way to do it instead of

var writeStream = fs.createWriteStream('./output');
request.get({
        uri: url1
}).pipe(writeStream);

request.get({
        uri: url2
}).pipe(writeStream);

// after all pipes finish I send writeStream content to the response

?

Which stream can I use with pipe?

you can use any Stream :)

HTTP Server responses, you can use it as the body of another request object, you can open a file write stream. anything :)
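
For the earlier question about sending several pages into one response, one possible approach (not something the thread itself confirms) is to pipe the sources one after another with the `end: false` option, so the destination stays open until the last source has finished. The sketch below uses PassThrough streams as stand-ins for request.get(...) and a collecting Writable as a stand-in for res; pipeInSequence and fakeResponse are hypothetical names.

```javascript
const { PassThrough, Writable } = require('stream');

// Hypothetical helper: pipes each source into `dest` in turn.
// `factories` are functions that return a readable stream, so each
// source is only created when its turn comes.
function pipeInSequence(factories, dest, done) {
  let index = 0;
  function next() {
    if (index === factories.length) {
      dest.end(); // end the destination only after the last source
      return;
    }
    const source = factories[index++]();
    source.once('error', done);
    source.once('end', next);
    source.pipe(dest, { end: false }); // keep dest open for the next source
  }
  dest.once('finish', () => done(null));
  next();
}

// Stand-in for request.get({ uri: ... }): a stream that emits some text.
function fakeResponse(text) {
  const stream = new PassThrough();
  stream.end(text);
  return stream;
}

// Stand-in for the HTTP response object.
const received = [];
const res = new Writable({
  write(chunk, encoding, callback) {
    received.push(chunk);
    callback();
  }
});

pipeInSequence(
  [() => fakeResponse('first;'), () => fakeResponse('second')],
  res,
  (err) => {
    if (err) throw err;
    console.log(Buffer.concat(received).toString()); // first;second
  }
);
```

Running the sources in sequence keeps their output from interleaving; fetching them in parallel would need the bodies buffered first and written in order afterwards.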

request({url: "http://www.example.com", encoding: "latin1"}, function (error, response, html) {
    console.log('error:', error);
});

@vickygill69 Thanks, your answer resolved my problem.

setting encoding to null and then using the response buffer with iconv worked for me. Thanks!
