Tesseract: C# Tesseract 3.02: how to access each character of a word in an image

Created on 12 Jan 2014  ·  3 comments  ·  Source: charlesw/tesseract

Hello, I'm a beginner here.
First, I need to draw a rectangle around each character of a word in an image.
With the previous version of tesseract, I know we could use

foreach (tesnet2.Character c in word.CharList)
e.Graphics.DrawRectangle ..........


But now I'm working in a C# WinForms application with Tesseract 3.02.

TesseractEngine a = new TesseractEngine(@"./tessdata", "eng", EngineMode.TesseractAndCube);
Tesseract.Page page1 = a.Process(image);
foreach (....... in page1)
{
// draw a rectangle at (the bounding box of each character)
}

Question 1: How do I access each character in page1?

I tried many approaches, such as PageIteratorLevel, and got parts of the page like the first line, first word, or first block, but I can't get the first character.
I can see from the hOCR text output of page1 that each element (word, line, block) has bounding box values.

Question 2: How do I get the bounding box values of each element? (I only found one method, "TryGetBoundingBox", and it only returns a bool.)

Thank you.


Most helpful comment

Answer to question 1:

Check the provided console sample, since it includes an example of how to iterate over the results. However, something like the following should work:

using (var iter = page.GetIterator()) {
    // start from the first element of the page
    iter.Begin();
    do {
        do {
            do {
                do {
                    do {
                        if (iter.IsAtBeginningOf(PageIteratorLevel.Block)) {
                            // do whatever you need to do when a block (top most level result) is encountered.
                        }
                        if (iter.IsAtBeginningOf(PageIteratorLevel.Para)) {
                            // do whatever you need to do when a paragraph is encountered.
                        }
                        if (iter.IsAtBeginningOf(PageIteratorLevel.TextLine)) {
                            // do whatever you need to do when a line of text is encountered.
                        }
                        if (iter.IsAtBeginningOf(PageIteratorLevel.Word)) {
                            // do whatever you need to do when a word is encountered.
                        }

                        // get bounding box for symbol
                        Rect symbolBounds;
                        if (iter.TryGetBoundingBox(PageIteratorLevel.Symbol, out symbolBounds)) {
                            // do whatever you want with bounding box for the symbol
                        }
                    // Next(level, element) advances to the next element and returns false at the end of the enclosing level
                    } while (iter.Next(PageIteratorLevel.Word, PageIteratorLevel.Symbol));
                } while (iter.Next(PageIteratorLevel.TextLine, PageIteratorLevel.Word));
            } while (iter.Next(PageIteratorLevel.Para, PageIteratorLevel.TextLine));
        } while (iter.Next(PageIteratorLevel.Block, PageIteratorLevel.Para));
    } while (iter.Next(PageIteratorLevel.Block));
}

The general result hierarchy is as follows:

Block -> Para -> TextLine -> Word -> Symbol

That is, a result set can contain many blocks, each of which can in turn contain many paragraphs, and so on down to symbols.
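
If only the characters matter, the hierarchy can also be walked flat at the Symbol level. The following is a minimal sketch of that idea for the original WinForms use case, not code from the library's samples. The class and method names (SymbolBoxes, Collect, Draw) are hypothetical, and it assumes the wrapper's Rect exposes X1, Y1, Width and Height and that the single-argument Next(PageIteratorLevel) overload is available.

```csharp
using System.Collections.Generic;
using System.Drawing;
using Tesseract;

// Hypothetical helper; the names are illustrative only.
public static class SymbolBoxes
{
    // Collects one System.Drawing.Rectangle per recognized symbol on the page.
    public static List<Rectangle> Collect(Page page)
    {
        var boxes = new List<Rectangle>();
        using (var iter = page.GetIterator())
        {
            iter.Begin();
            do
            {
                Rect bounds;
                // The bool result says whether a box exists; the box itself arrives via the out parameter.
                if (iter.TryGetBoundingBox(PageIteratorLevel.Symbol, out bounds))
                {
                    boxes.Add(new Rectangle(bounds.X1, bounds.Y1, bounds.Width, bounds.Height));
                }
            } while (iter.Next(PageIteratorLevel.Symbol)); // step symbol by symbol across the whole page
        }
        return boxes;
    }

    // Draws the collected boxes, for example from a Paint event handler.
    public static void Draw(Graphics g, IEnumerable<Rectangle> boxes)
    {
        using (var pen = new Pen(Color.Red, 1f))
        {
            foreach (var box in boxes)
            {
                g.DrawRectangle(pen, box);
            }
        }
    }
}
```

The boxes are in the pixel coordinates of the processed image, so if the control scales the image when displaying it, the rectangles would need the same scaling before drawing.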

Answer to question 2:

As shown above, the TryGetBoundingBox method returns the bounds through its out parameter; the bool return value only indicates whether a bounding box was available. This is very similar to Dictionary.TryGetValue.
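
For readers new to this convention, here is a small sketch of the same Try-pattern using only Dictionary.TryGetValue from the standard library. TryPatternDemo and Describe are hypothetical names, and the dictionary of widths is made-up illustration data rather than anything produced by Tesseract.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo of the Try-pattern; not part of the Tesseract wrapper.
public static class TryPatternDemo
{
    public static string Describe(Dictionary<string, int> widths, string key)
    {
        int width;
        // The bool return value reports success; the value comes back through the out parameter.
        if (widths.TryGetValue(key, out width))
        {
            return key + ": width " + width;
        }
        return key + ": no box";
    }

    public static void Main()
    {
        var widths = new Dictionary<string, int> { { "A", 12 } };
        Console.WriteLine(Describe(widths, "A")); // prints "A: width 12"
        Console.WriteLine(Describe(widths, "B")); // prints "B: no box"
    }
}
```

TryGetBoundingBox is used the same way: test the returned bool, then read the Rect from the out variable.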

Hi Charles,

I hope you are doing well.

I'm not familiar with this stuff. I can get the text I need from small or test pictures, but not from real photos.

  1. How can I extract the BIB # from a photo?
    [image: NotWorking]

  2. How can I recognize the BIB # region in a full photo?
    [image: H1764]

Thanks.

Use OpenCV to find and crop the region. Someone has a demo written in Python that shouldn't be hard to translate to .NET.
