Phantomjs: PhantomJS cannot display some UTF-8 characters

Created on 16 Mar 2014  ·  3 comments  ·  Source: ariya/phantomjs

I wrote url2src.js to convert an HTML file into another HTML file in which all of the JavaScript has already been executed.
I found that when the input contains certain UTF-8 characters (for example, the em dash, U+2014), the output is wrong.

I am using the Chinese version of Windows 8.1. I ran the following commands in Windows cmd:

d:\epub\components>phantomjs  --output-encoding=utf8 --script-encoding=utf8 url2src.js activities2.html activities2-processed.html
d:\epub\components>phantomjs --version
1.9.7

The content of url2src.js is as follows:

var page = require('webpage').create(),
    system = require('system'),
    t, address, output;

if (system.args.length !== 3) {
    console.log('Usage: url2src.js <some URL> <output File path>');
    phantom.exit();
}

t = Date.now();
address = system.args[1];
output = system.args[2];
page.open(address, function (status) {
    if (status !== 'success') {
        console.log('FAIL to load the address : ' + address);
    } else {
        t = Date.now() - t;
        //console.log('Loading time ' + t + ' ms');
        var js = page.evaluate(function () {
            return document;
        });
        //console.log(js.all[0].outerHTML); 
        var fs = require('fs');
        try {
            fs.write(output, js.all[0].outerHTML, 'w');
        } catch(e) {
            console.log(e);
        }
    }
    phantom.exit();
});

More details are in the attachments here:
https://groups.google.com/forum/#!topic/phantomjs/oqvK8mkk6aY

Any help would be appreciated.

All 3 comments

I now know the reason: the input HTML document was being treated as ISO-8859-1 by default.
Adding a meta tag that sets charset=utf-8 solves the problem.

<html>
<head>
 <meta http-equiv="content-type" content="text/html; charset=UTF-8">
 </head>
 <body>
    <p> The user interface for an activity is provided by a hierarchy of views—objects derived from the <code>View</code> class. </p> 
 </body>
</html>
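
The mis-decoding described in this comment can be reproduced outside PhantomJS. The following is a minimal sketch that assumes Node.js (its `Buffer` API is not available inside a PhantomJS script). It takes the UTF-8 bytes of the em dash from the sample paragraph and decodes them once as UTF-8 and once as ISO-8859-1.

```javascript
// Node.js sketch (an assumption; this is not PhantomJS code).
var emDash = '\u2014';                     // the character used in the sample paragraph
var bytes = Buffer.from(emDash, 'utf8');   // the bytes as stored in a UTF-8 file

var asUtf8 = bytes.toString('utf8');       // decoded with the correct charset
var asLatin1 = bytes.toString('latin1');   // decoded as ISO-8859-1, one character per byte

console.log(bytes.toString('hex'));        // e28094
console.log(asUtf8 === emDash);            // true
console.log(asLatin1.length);              // 3
```

Read as ISO-8859-1, the single character becomes three unrelated ones, which matches the kind of corruption reported in the output file.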

You are right, but I cannot change the HTML that other people give me. What should I do?

I have documents with no header information (they come from a third-party app)... A way to force the encoding would be nice.
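
One possible workaround when the incoming HTML cannot be edited in place is to preprocess a copy before PhantomJS opens it. The sketch below is not a PhantomJS feature and does not force an encoding inside PhantomJS. `ensureUtf8Meta` is a hypothetical helper name, and it assumes the file really is UTF-8 encoded and has already been read into a string. It adds the same meta tag shown in the first comment only when the markup declares no charset of its own.

```javascript
// Hypothetical helper (not part of PhantomJS): returns the HTML with a
// UTF-8 charset declaration added when the markup declares no charset.
function ensureUtf8Meta(html) {
    var meta = '<meta http-equiv="content-type" content="text/html; charset=UTF-8">';

    // Leave the markup untouched if any <meta ... charset=...> is present.
    if (/<meta[^>]+charset\s*=/i.test(html)) {
        return html;
    }
    // Prefer inserting directly after an existing <head ...> tag.
    if (/<head(\s[^>]*)?>/i.test(html)) {
        return html.replace(/<head(\s[^>]*)?>/i, function (tag) {
            return tag + meta;
        });
    }
    // No <head>: create one right after <html ...> if that tag exists.
    if (/<html(\s[^>]*)?>/i.test(html)) {
        return html.replace(/<html(\s[^>]*)?>/i, function (tag) {
            return tag + '<head>' + meta + '</head>';
        });
    }
    // Bare fragment: put the declaration in front of everything else.
    return meta + html;
}
```

Under Node.js, for example, the copy could be produced with `fs.writeFileSync(copyPath, ensureUtf8Meta(fs.readFileSync(srcPath, 'utf8')))`, where `srcPath` and `copyPath` are placeholder names, and url2src.js would then be pointed at the copy instead of the original file.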
