Hi!
I'm trying to scrape a web page with accented characters (á é ó ú ê ã etc.). I tried encoding: 'utf-8', but I'm still getting these ��� characters in the result.
request.get({
uri: url,
encoding: 'utf-8'
// ...
Well, what's the encoding the page uses? You can't just throw a utf8 parser at ISO-whatever.
Btw, he cross-posted to SO: http://stackoverflow.com/questions/8332500/module-request-how-to-properly-retrieve-accented-characters
@thejh the page encoding is iso-8859-1, I've also tried:
request.get({
uri: url,
encoding: 'iso-8859-1'
// ...
and I got:
Error: Unknown encoding
But after reading this issue, https://github.com/mikeal/request/issues/27, and then the docs at http://nodejs.org/docs/v0.6.0/api/http.html#request.setEncoding:
Set the encoding for the request body. Either 'utf8' or 'binary'. Defaults to null, which means that the 'data' event will emit a Buffer object.
I tried encoding: 'binary'. It worked.
Have a look at the iconv library.
Okay... but do you know why binary worked?
Because it just takes the raw buffer data. Also, the resulting string still isn't UTF-8, so don't do it.
But in this case, what is the proper value for encoding?
No encoding. Take it as a buffer, then stuff it into iconv.
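To make the advice above concrete, here is a minimal sketch of what happens to ISO-8859-1 bytes under each decoding. It is an illustration only: the three bytes are a made-up stand-in for a response body, and it uses Node's built-in 'latin1' Buffer encoding instead of iconv so that it runs without dependencies. With request, the Buffer would come from setting encoding to null, as suggested in the thread.

```javascript
// Bytes for "áéó" as an ISO-8859-1 server would send them (one byte per character).
// This buffer is a hypothetical stand-in for a body fetched with encoding: null.
const body = Buffer.from([0xe1, 0xe9, 0xf3]);

// Decoding as UTF-8 fails: these bytes are not valid UTF-8 sequences,
// so the undecodable input is replaced with U+FFFD (the "�" seen in the output).
const asUtf8 = body.toString('utf8');

// Decoding as latin1 maps each byte to the code point with the same value.
const asLatin1 = body.toString('latin1');

// In current Node.js, 'binary' is an alias for 'latin1',
// which is why it appeared to work for this page.
const asBinary = body.toString('binary');

console.log(asUtf8.includes('\ufffd')); // true
console.log(asLatin1);                  // áéó
console.log(asBinary === asLatin1);     // true
```

For encodings that Buffer does not support natively, a conversion library such as iconv is still the way to go, which is what the comment above recommends.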
the confusion appears to be over "binary" and Buffer, which is also binary.
"binary" is, mostly, a legacy encoding from the node 0.1.x days, when we encoded all binary data into strings.
in node.js 0.2 we got a Buffer object, which is a raw allocation of memory outside of v8's heap. the object is not a string; it can hold raw binary data you get out of a file descriptor, and you can send it to another file descriptor without suffering conversion to string.
in request, you can pipe() a request object to any stream and all the buffers will be sent to the destination stream. if all you're doing is taking binary data from an http request and sending to a file, socket, or http response, you should just use pipe().
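A rough sketch of the pipe() idea, using only Node's built-in stream module so it runs on its own. The source and destination here are hypothetical stand-ins: in practice the source would be a request object and the destination a file write stream, socket, or HTTP response. The point is that the bytes arrive at the destination untouched, with no string conversion in between.

```javascript
const { Readable, Writable } = require('stream');

// Stand-in for an HTTP response body: two chunks of raw ISO-8859-1 bytes.
const source = Readable.from([Buffer.from([0xe1, 0xe9]), Buffer.from([0xf3])]);

// Stand-in for a file, socket, or HTTP response: it collects whatever chunks arrive.
const received = [];
const destination = new Writable({
  write(chunk, encoding, callback) {
    received.push(chunk);
    callback();
  },
});

// Resolves with all received bytes once the destination has finished.
const piped = new Promise((resolve, reject) => {
  destination.on('finish', () => resolve(Buffer.concat(received)));
  destination.on('error', reject);
});

// pipe() forwards every Buffer from source to destination and ends it when done.
source.pipe(destination);

piped.then((bytes) => console.log(bytes)); // <Buffer e1 e9 f3>
```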
How can I use pipe with request module?
@phstc docs dude! https://github.com/mikeal/request/blob/master/README.md
@mikeal Awesome!
I need to scrape more than one URL in the same HTTP request (it's a webapp) and then send all of this data to the response.
I can't send it like that:
request.get({
uri: url1
}).pipe(res);
request.get({
uri: url2
}).pipe(res);
Is there any other way to do it instead of
var writeStream = fs.createWriteStream('./output');
request.get({
uri: url1
}).pipe(writeStream);
request.get({
uri: url2
}).pipe(writeStream);
// after all pipes finish I send writeStream content to the response
?
Which stream can I use with pipe?
you can use any Stream :)
HTTP Server responses, you can use it as the body of another request object, you can open a file write stream. anything :)
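For the question above about sending several sources to one response, one possible approach (not something the thread itself confirms) is to pipe the sources one after another with pipe's `{ end: false }` option, so the destination stays open until the last source is done. The sketch below uses built-in streams as stand-ins. `pipeAll` is a hypothetical helper name, and the assumption is that each source emits 'end' like an ordinary readable stream; with request, the sources would be the request objects and the destination would be `res`.

```javascript
const { PassThrough, Readable } = require('stream');

// Pipe each source into dest in order, ending dest only after the last one.
// pipeAll is a hypothetical helper, not part of request or Node.js.
function pipeAll(sources, dest) {
  return new Promise((resolve, reject) => {
    const next = (i) => {
      if (i === sources.length) {
        dest.end();
        return resolve();
      }
      sources[i]
        .once('error', reject)
        .once('end', () => next(i + 1))
        .pipe(dest, { end: false }); // keep dest open for the next source
    };
    next(0);
  });
}

// Stand-in destination that collects everything written to it.
const dest = new PassThrough();
const collected = new Promise((resolve) => {
  const chunks = [];
  dest.on('data', (chunk) => chunks.push(chunk));
  dest.on('end', () => resolve(Buffer.concat(chunks).toString('utf8')));
});

// Stand-in sources; each plays the role of one fetched page.
pipeAll(
  [Readable.from([Buffer.from('first page\n')]), Readable.from([Buffer.from('second page\n')])],
  dest
);

collected.then((text) => console.log(JSON.stringify(text))); // "first page\nsecond page\n"
```

Piping sequentially keeps the pages in order. Piping both sources at once would interleave their chunks, and the first source to finish would end the response.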
request({url: "http://www.example.com", encoding: "latin1"}, function (error, response, html) {
  console.log('error:', error);
  // html is the body decoded from latin1 (ISO-8859-1) into a JavaScript string
});
@vickygill69 Thanks, your answer resolved my problem.
setting encoding to null and then using the response buffer with iconv worked for me. Thanks!