Notepad-plus-plus: Notepad++ 7.7 shows "????????" instead of Cyrillic characters in ANSI encoding

Created on 20 May 2019  ·  86 Comments  ·  Source: notepad-plus-plus/notepad-plus-plus

Notepad++ v7.7 (32-bit)
Build time : May 19 2019 - 13:08:20
Path : C:\Users\Uzeer\Downloads\npp.7.7.bin.minimalist\notepad++.exe
Admin mode : ON
Local Conf mode : ON
OS : Windows 7 (64-bit)
Plugins : none

Notepad++ v7.7 (64-bit)
Build time : May 19 2019 - 13:05:35
Path : C:\Users\Uzeer\Downloads\npp.7.7.bin.minimalist.x64\notepad++.exe
Admin mode : ON
Local Conf mode : ON
OS : Windows 7 (64-bit)
Plugins : none


Confirmed.



The screenshot is old, but the problem still exists.

Debug Info

Notepad++ v7.7 (64-bit)
Build time : May 19 2019 - 13:05:35
Path : D:\Install\Office Programs\Notepad++\notepad++.exe
Admin mode : ON
Local Conf mode : ON
OS : Windows 7 (64-bit)
Plugins : AutoSave.dll ComparePlugin.dll ShtirlitzNppPlugin.dll TakeNotes.dll VisualStudioLineCopy.dll

@donho
I have tested all versions of SciTE: starting with Scintilla 3.6.7, the problems with Cyrillic begin.


@rddim & @andrecool-68
So what's the native language of your OS?

Can anyone reproduce it under Windows 10?

@donho
I have the Russian version of Win 7 x64.

@donho

Just in case you see the same as I do:
this will be f. tricky to solve for a non-native Russian or Bulgarian speaker, if possible at all.

I was never able to insert Cyrillic text into ANSI files on English, Spanish or German Windows 10 and 7, regardless of the Notepad++ version.
Tested and re-tested on 7.5.5 to 7.7.
Russian text in UTF-8 works on all versions.
(see screencast below)

[Screencast: Cyrillic on ANSI, 7.6.6]

greetings.

@andrecool-68

I have tested all versions of SciTE: starting with Scintilla 3.6.7, the problems with Cyrillic begin.

So Scintilla 3.6.6 works for you?
Typing Cyrillic in ANSI is broken in version 3.6.7 and later versions, is that correct?

@donho
That's right, the problems affect version 3.6.7 and higher!
Windows 7 x64


Windows 10 x64 (VirtualBox)


@donho
Both machines (office laptop and home PC) run Win7 Pro x64 SP1 English. I cannot test on Win10.
I confirm that the problem starts with SciTE v3.6.7 and that SciTE v3.6.6 is not broken.

@donho
I'm not sure if this will help you, but: https://sourceforge.net/p/scintilla/bugs/2093/#3ee4

@donho
I tried making these changes, and Cyrillic appeared.
file: ScintillaWin.cxx
+ case SC_CHARSET_DEFAULT: return documentCodePage;
- case SC_CHARSET_DEFAULT: return documentCodePage ? documentCodePage : 1252;

But after each attempt to type a Cyrillic character, an error occurs)))


@andrecool-68 that's strange: case SC_CHARSET_DEFAULT: return documentCodePage; is the old code in Scintilla before 3.6.6 (used by npp before 7.7).

Can you test or debug Notepad2 from https://github.com/zufuliu/notepad2/releases (in both GDI and D2D mode, Settings -> Rendering Technology)?
When the debug assertion fails, choose Break, then look at the stack trace to see where it failed.
With the stack trace we may know how to fix it.

@donho I can confirm the bug also manifests itself on Windows 10 x64.
@zufuliu I've tested your Notepad2 builds and the text renders correctly in both D2D and GDI mode.


Edited to add:
If you manually select Windows-1251 codepage from the Encoding menu of NP++, the text renders correctly.

@rddim do you set the encoding to ANSI before typing (via menu: File -> Encoding -> ANSI)? The default encoding is UTF-8. The status bar shows the encoding name before the EOL mode, such as CR+LF.

Also try a different scheme: under Scheme -> Text File, test both Text File and 2nd Text File. Or enable Scheme -> Use Default Code Style (which uses a monospaced font like Consolas) and then disable it (which uses a proportional font like Segoe UI).

@zufuliu

Everything works as expected in Notepad2

[Screenshot: notepad2_ansi_cyrillic]

@rddim thanks.

The screenshot from https://github.com/notepad-plus-plus/notepad-plus-plus/issues/5671#issuecomment-495190280 points to the _chvalidator function. According to the comment above that function, some ctype functions were called with out-of-range characters.
It's unknown where the call is; Scintilla doesn't call ctype functions directly (except in lexers).

@andrecool-68 can you run your debug build of NPP under VS (click Local Windows Debugger) and take a screenshot of the Call Stack when the assertion fails?

OK, I found that this code (already reported as issue #5280) causes an assertion failure when typing non-ASCII characters (in both ANSI and UTF-8 code pages).

static bool isAllDigits(const generic_string &str)
{
    return std::all_of(str.begin(), str.end(), ::isdigit);
}
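For reference, a minimal sketch of an assertion-safe variant. The failure comes from handing ::isdigit a value that is neither EOF nor representable as unsigned char (for example a Cyrillic wchar_t such as U+0434), which the MSVC debug CRT checks in _chvalidator, the function named in the screenshot mentioned above. The sketch assumes generic_string is std::wstring, as in Notepad++ Unicode builds, and that only ASCII digits should count; it is not necessarily the fix adopted for #5280.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Assumption: generic_string is std::wstring, as in Notepad++ Unicode builds.
using generic_string = std::wstring;

// Returns true if every character is an ASCII digit '0'..'9'
// (and, like std::all_of, true for an empty string).
// A plain range comparison never calls ::isdigit, so characters
// above 0xFF (e.g. Cyrillic letters) cannot trip the CRT range check.
static bool isAllDigits(const generic_string &str)
{
    return std::all_of(str.begin(), str.end(), [](wchar_t ch) {
        return ch >= L'0' && ch <= L'9';
    });
}
```

Comparing against a fixed range also avoids locale-dependent results, which seems appropriate if the caller only wants to recognise numbers.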

@zufuliu

OK, I found that this code (already reported as issue #5280) causes an assertion failure when typing non-ASCII characters (in both ANSI and UTF-8 code pages).

It's used by Notepad++'s auto-completion, but it has nothing to do with the Russian input failure.
Any idea about the source of the problem?

@andrecool-68 @rddim
Could you disable auto-completion and then try again?

@donho

Same result: кирилица (UTF-8) => ???????? (ANSI)

I think the main reason is the change in Scintilla's CodePageFromCharSet:

-   case SC_CHARSET_DEFAULT: return documentCodePage;
+   case SC_CHARSET_DEFAULT: return documentCodePage ? documentCodePage : 1252;

Since SC_CHARSET_DEFAULT is used, code page 1252 instead of 1251 is used to convert the typed Cyrillic characters, which map to rubbish.
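A simplified model of why code page 1252 turns typed Cyrillic into question marks: converting UTF-16 input to a single-byte code page substitutes a default character ('?') for anything the code page cannot represent. This is not Scintilla or Windows code; encodeSbcs is a hypothetical helper, and it models only ASCII, the Cyrillic block U+0410..U+044F for Windows-1251 (bytes 0xC0..0xFF) and the U+00A0..U+00FF range for Windows-1252, ignoring the rest of both tables.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified model: encode UTF-16 text into a single-byte code page.
// Only a subset of each table is modelled; everything else becomes '?'.
std::string encodeSbcs(const std::u16string &text, unsigned codePage)
{
    std::string out;
    for (char16_t ch : text) {
        if (ch < 0x80) {
            out += static_cast<char>(ch);                 // ASCII is shared by both code pages
        } else if (codePage == 1251 && ch >= 0x0410 && ch <= 0x044F) {
            out += static_cast<char>(ch - 0x0410 + 0xC0); // А..я -> 0xC0..0xFF in Windows-1251
        } else if (codePage == 1252 && ch >= 0x00A0 && ch <= 0x00FF) {
            out += static_cast<char>(ch);                 // this range is identical in Windows-1252
        } else {
            out += '?';                                   // not representable: default character
        }
    }
    return out;
}
```

Under these assumptions the eight letters of кирилица come out as ???????? with code page 1252, but as eight distinct bytes with code page 1251, which matches what was observed in this thread.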

The discussion on bug https://sourceforge.net/p/scintilla/bugs/2093/#3ee4
suggests setting the font charset to SC_CHARSET_RUSSIAN in this case.
But in my experience, setting a locale-dependent charset requires that the font being used actually supports that charset.
https://sourceforge.net/p/scintilla/bugs/2093/#263b/5bac/7f06

Reverting to case SC_CHARSET_DEFAULT: return documentCodePage is possibly the simplest fix, because we know the code pages we set in Scintilla are only UTF-8, DBCS ANSI code pages (932, 949, 950 and 1361) and SBCS ANSI code pages (0, CP_ACP).
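A reduced model of the two versions of the SC_CHARSET_DEFAULT branch quoted above, showing why returning documentCodePage fixes ANSI input: for an ANSI document the code page is 0 (CP_ACP), which Windows resolves to the system ANSI code page (1251 on Russian or Bulgarian Windows), while the newer fallback forces 1252. The constant values are those of Scintilla.h (SC_CP_UTF8 = 65001, SC_CHARSET_ANSI = 0, SC_CHARSET_DEFAULT = 1, SC_CHARSET_RUSSIAN = 204); the function name and the newBehaviour flag are hypothetical, and every other charset case of the real CodePageFromCharSet is omitted.

```cpp
#include <cassert>

// Values as in Scintilla.h (DWORD/UINT replaced by unsigned for portability).
constexpr unsigned SC_CP_UTF8 = 65001;
constexpr unsigned SC_CHARSET_ANSI = 0;
constexpr unsigned SC_CHARSET_DEFAULT = 1;
constexpr unsigned SC_CHARSET_RUSSIAN = 204;

// Hypothetical reduced model of CodePageFromCharSet, old vs. new behaviour.
unsigned codePageFromCharSet(unsigned characterSet, unsigned documentCodePage, bool newBehaviour)
{
    if (documentCodePage == SC_CP_UTF8)
        return SC_CP_UTF8;                     // UTF-8 documents ignore the font charset
    switch (characterSet) {
    case SC_CHARSET_ANSI:
        return 1252;
    case SC_CHARSET_DEFAULT:
        if (newBehaviour)                      // the line quoted from newer Scintilla
            return documentCodePage ? documentCodePage : 1252;
        return documentCodePage;               // old line: 0 means CP_ACP (system ANSI code page)
    case SC_CHARSET_RUSSIAN:
        return 1251;
    default:
        return documentCodePage;               // simplification for this sketch only
    }
}
```

DBCS documents (for example 932) keep their own code page in both versions; only the SBCS ANSI case (0) changes, which is consistent with Chinese ANSI input still working in 7.7.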

@donho I suggest upgrading to 4.1.5, because it fixes the bug with typing DBCS characters: https://sourceforge.net/p/scintilla/bugs/2093

@donho
@zufuliu

I compiled the original Scintilla 4.1.5 and copied SciLexer.dll to the Notepad++ directory. The result has not changed: "?????" instead of Cyrillic.


@andrecool-68 with the change to Scintilla's CodePageFromCharSet (in ScintillaWin.cxx)?

+   case SC_CHARSET_DEFAULT: return documentCodePage;
-   case SC_CHARSET_DEFAULT: return documentCodePage ? documentCodePage : 1252;

The assertion failure is caused by auto-completion; you can disable auto-completion before testing.

@zufuliu I have not changed anything; it's the original Scintilla downloaded from the official site.

@andrecool-68 please try changing CodePageFromCharSet to case SC_CHARSET_DEFAULT: return documentCodePage;.

or add these lines at the beginning of void ScintillaEditView::defineDocType(LangType typeDoc) (line 1315 in PowerEditor\src\ScitillaComponent\ScintillaEditView.cpp)

    execute(SCI_STYLESETCHARACTERSET, STYLE_DEFAULT, SC_CHARSET_RUSSIAN);
    execute(SCI_STYLECLEARALL);

@donho
@zufuliu

@andrecool-68 with the change to Scintilla's CodePageFromCharSet (in ScintillaWin.cxx)?

+   case SC_CHARSET_DEFAULT: return documentCodePage;
-   case SC_CHARSET_DEFAULT: return documentCodePage ? documentCodePage : 1252;

The assertion failure is caused by auto-completion; you can disable auto-completion before testing.

In the Notepad++ debug build it worked!

@andrecool-68 please try changing CodePageFromCharSet to case SC_CHARSET_DEFAULT: return documentCodePage;.

or add these lines at the beginning of void ScintillaEditView::defineDocType(LangType typeDoc) (line 1315 in PowerEditor\src\ScitillaComponent\ScintillaEditView.cpp)

    execute(SCI_STYLESETCHARACTERSET, STYLE_DEFAULT, SC_CHARSET_RUSSIAN);
    execute(SCI_STYLECLEARALL);

Both options work!
But I think it is better to make the correction in ScintillaWin.cxx, because it is not known how the second option will affect other languages.


I can't test the release build: without a signed certificate it doesn't work, and I don't know how to sign it)))

See my comment in https://github.com/notepad-plus-plus/notepad-plus-plus/issues/5671#issuecomment-496003800

I prefer to change Scintilla's CodePageFromCharSet.

@andrecool-68 @rddim
Could you try the newly compiled SciLexer.dll (without boost's PCRE)?
https://notepad-plus-plus.org/temp/

@zufuliu

See my comment in #5671 (comment)

I prefer to change Scintilla's CodePageFromCharSet.

I said the same thing in the previous comment))

@zufuliu Thank you for your info.
Can you reproduce the bug (which is fixed in Scintilla 4.1.5) in Notepad++ 7.7?
If you can, could you describe how to reproduce it?

@donho

@andrecool-68 @rddim
Could you try the newly compiled SciLexer.dll (without boost's PCRE)?
https://notepad-plus-plus.org/temp/

With this file Notepad++ does not start at all; it gives the error "not found SciLexer.dll"

@donho This is not fixed in 4.1.5; the line case SC_CHARSET_DEFAULT: return documentCodePage ? documentCodePage : 1252; has not changed since 2016.

Scintilla 4.1.5 fixed a bug with typing DBCS characters in DBCS code pages.

@donho
I changed only one line

namespace Scintilla {

UINT CodePageFromCharSet(DWORD characterSet, UINT documentCodePage) {
    if (documentCodePage == SC_CP_UTF8) {
        return SC_CP_UTF8;
    }
    switch (characterSet) {
    case SC_CHARSET_ANSI: return 1252;
    case SC_CHARSET_DEFAULT: return documentCodePage;
    // case SC_CHARSET_DEFAULT: return documentCodePage ? documentCodePage : 1252;

C:\Users\Uzzer\Downloads\notepad-plus-plus-master\scintilla\win32\ScintillaWin.cxx
It doesn't matter whether the version is 4.1.4 or 4.1.5.

With my SciLexer.dll in the debug build, everything works for me, but only when auto-completion is disabled.

@zufuliu

Scintilla 4.1.5 fixed a bug with typing DBCS characters in DBCS code pages.

I have tested Notepad++ 7.7 under the Chinese version of Windows 7.
Chinese input in ANSI mode works in v7.7.
So for me there's no DBCS issue, at least for Chinese.
Do you have a reliable way to reproduce the DBCS issue in Notepad++ v7.7?

@donho

Edit: I didn't find any other debug version of N++ on the website

@donho see the bug report at https://sourceforge.net/p/scintilla/bugs/2093/
It can be reproduced with NPP 7.7 binary.

@donho the bug (Typing DBCS) at https://sourceforge.net/p/scintilla/bugs/2093/ and another bug (Inline IME) at https://sourceforge.net/p/scintilla/bugs/2038/ (not fixed) will both affect auto-completion.
I think NPP can simply ignore any ch > 0x7F in DBCS code pages or when ch comes from the IME; auto-completion for CJK words is meaningless.
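A minimal sketch of the guard suggested above. shouldFeedAutoCompletion and its parameters are hypothetical, not Notepad++'s actual API; it assumes the auto-completion code already knows whether the document uses a DBCS code page and whether the character came from the IME.

```cpp
#include <cassert>

// Hypothetical helper: decide whether a typed character should feed auto-completion.
// ch is the value received for the keystroke (e.g. from a character-added notification).
bool shouldFeedAutoCompletion(int ch, bool isDbcsCodePage, bool fromIme)
{
    if (ch >= 0 && ch <= 0x7F)
        return true;      // plain ASCII is always accepted
    if (isDbcsCodePage || fromIme)
        return false;     // DBCS bytes or IME input: skipped, as suggested above
    return true;          // e.g. Cyrillic in an SBCS ANSI code page or in UTF-8
}
```

Negative values (a sign-extended char) fall through to the DBCS/IME check rather than being treated as ASCII.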

@andrecool-68 @rddim
Please try this x64 build. It works only in Notepad++ debug mode, since it's not signed.

@donho

Another try, this time with https://notepad-plus-plus.org/pluginListTestTools/notepad++.debug.x86.zip and SciLexer.32.dll, gives me the same result as the screenshot in https://github.com/notepad-plus-plus/notepad-plus-plus/issues/5671#issuecomment-495190280

I don't know how to run N++ in debug mode. If you mean something like from VS, I don't have VS.

@donho


@andrecool-68 @rddim
Thank you for your tests. It seems that what @zufuliu suggested is not the solution:

+   case SC_CHARSET_DEFAULT: return documentCodePage;
-   case SC_CHARSET_DEFAULT: return documentCodePage ? documentCodePage : 1252;

That's also interesting: it works in debug mode (https://github.com/notepad-plus-plus/notepad-plus-plus/issues/5671#issuecomment-496128852) but not in release mode?

At least Notepad2 uses this change: https://github.com/zufuliu/notepad2/blob/master/scintilla/win32/ScintillaWin.cxx#L1292

@andrecool-68 the first execute(SCI_STYLECLEARALL); line can be omitted.

@zufuliu
I did not notice the duplicated line... I need to take a smoke break)))

@donho

After a smoke break too: with notepad++.debug.x86.zip, SciLexer.32.dll and Auto-Completion disabled, everything works as expected. With Auto-Completion enabled it gives me the previously mentioned error.

@donho How can I turn off the "scintilla" certificate verification,
so that I can test the release builds?
I will not distribute these releases; I will only test them.

After a smoke break too: with notepad++.debug.x86.zip, SciLexer.32.dll and Auto-Completion disabled, everything works as expected. With Auto-Completion enabled it gives me the previously mentioned error.

So smoking is bad for the health, but good for program testing? :D

How can I turn off the "scintilla" certificate verification?

You can't. I will provide you guys the 32 & 64 signed release binary to make sure everything is OK.

@zufuliu So your solution works. Thank you. However, does this modification create any side effects?

@donho Truth is born in any dispute
Thanks to everyone and my dog
Only she can carry my computer

@zufuliu thanks for your help
Does your editor support localization?

Woohoo, it's alive :D Now about 275 million people can type Cyrillic in ANSI.
Thank you very much @andrecool-68 @rddim @donho @MetaChuh @zufuliu

[Screenshot: npp_ansi_solved]

It doesn't matter to me at all ... but my friends need 1255 and 1251.

@andrecool-68
lol ... after what we have read today, it's better to put a black censor bar on your posted image, to make sure we don't offend anybody 😂

@rddim
I'm the bad cop, so no need to thank me.
It is Don's private initiative to pursue this issue with all your collective help, despite newer Scintilla versions being the cause.

best regards.

@MetaChuh
My dog is very tolerant to motorcycle drivers))

@MetaChuh
You are cunning ... you have something from a Jew
When the fight ended .. the boy came on a motorcycle ... it's not fair
When I want to turn my dog's tail ... I don’t ask anyone for help
But if there are problems with Notepad++, I want to solve them.

@zufuliu thanks for your help
Does your editor support localization?

Sorry, no plans for i18n.


@zufuliu So your solution works. Thank you. However, does this modification create any side effects?

What solution? Changing Scintilla's CodePageFromCharSet, or using SCI_STYLESETCHARACTERSET? The former returns the ACP (0) and should have a side effect; the latter does have one, see https://github.com/notepad-plus-plus/notepad-plus-plus/issues/5671#issuecomment-496003800.

I had a similar issue with Notepad++ 7.7 32-bit, Czech language (windows-1250).
I was unable to type or paste some Czech letters in ANSI encoding.

Replacing SciLexer.dll with the one provided (SciLexer32.signed.dll 27-May-2019 22:00) helped; the issue is gone.

Will there be an update for npp?

@lehha

Will there be an update for npp?

No, it won't, not yet at least. It's not a complete solution and it will have side effects, as mentioned by @zufuliu.

@andrecool-68 & @rddim
So the new SciLexer.dll works for you, but you have to disable Auto-Completion, right?
What happens if Auto-Completion is ON?

SciLexer.32.dll and Auto-Completion disabled, everything works as expected. With Auto-Completion enabled it gives me the previously mentioned error

What's the "previously mentioned error"? Could you make it clearer for me?

@donho
~I don’t see any errors with Cyrillic.~
Auto-complete enabled.
https://notepad-plus-plus.org/temp/cyrillacPb/


Sorry, there are errors. After refreshing the window, Notepad++ drops the first character and the encoding changes.

Here's another error: characters change, their case changes, and the encoding changes.

Thanks! I can't see direct links above, so here they are:

https://notepad-plus-plus.org/temp/cyrillacPb/SciLexer32.signed.dll
or
https://notepad-plus-plus.org/temp/cyrillacPb/SciLexer64.signed.dll

The file must replace SciLexer.dll in C:\Program Files (x86)\Notepad++

@donho

With the SciLexers from https://notepad-plus-plus.org/temp/cyrillacPb/ it works with Auto-Completion enabled, i.e. no problems. It doesn't work with the combination from this comment https://github.com/notepad-plus-plus/notepad-plus-plus/issues/5671#issuecomment-496189871, which also answers the "previously mentioned error" question.

@rddim So for you the SciLexers from https://notepad-plus-plus.org/temp/cyrillacPb/ have fixed the problem without any side effects?

@donho I test it again to be sure and:

  • the x64 SciLexer works perfectly: no issues so far, Cyrillic works and Auto-Completion works in ANSI
    [Screenshot: npp_x64_auto-compl]
  • the x32 SciLexer: I can type in Cyrillic, but Auto-Completion doesn't work; it works with Latin chars
    [Screenshot: npp_x32_no_auto-compl]

@rddim Restart your notepad++ and re-open this file ... and what do you see?

@andrecool-68 both x32 and x64 are readable after reopen

[Screenshot: npp_reopen]

@rddim Is Autodetect character encoding enabled?

@andrecool-68 Yes, default settings; that's why it shows Windows-1251 and not ANSI

@rddim And I don’t understand anything at all)))


@andrecool-68 @donho
In x32, when a word begins with a lowercase letter, Auto-Completion doesn't work for it, but not always (new 3). I think the other issues are related to Autodetect character encoding.

[Screenshot: npp_x32_cyr1]

Edit: the issue with Auto-Completion exists in 7.6.6 x32

@rddim The fourth tab turned out to be Hebrew))
What plugin do you use to insert the ready-made text?

@andrecool-68 external clipboard manager - CLCL

@rddim
How can I disable these lines? I cannot find the option in the settings.


@andrecool-68
Write me an email, you can find it in bulgarian.xml. I can answer you here of course, but this is an N++ issue, not for other programs :)

Can somebody help me too?
I think my issue is similar.
I posted in the npp community (in the 7.7 version's thread), but I wasn't able to get the support I requested; I never figured out why.

anyway,
I'm re-posting here:

I receive SQL queries by email, which I open with npp,
and I then copy the file content to an SQL manager program in order to execute them.
Well, with 7.7 Greek characters are displayed like Chinese; I rolled back to 7.6.6 and they display properly.

"autodetect character encoding" is disabled for me,
following another issue I had in the past with it on,
so I would prefer it to stay off

edit: I tried the above scilexer (32bit):
https://notepad-plus-plus.org/temp/cyrillacPb/SciLexer32.signed.dll
and it worked for me too!

@patrickdrd @rddim @andrecool-68
Could you guys test the following binaries and confirm that the bug is fixed without regressions, please (with auto-completion enabled)?
32 bits:
https://notepad-plus-plus.org/temp/cyrillacPb/npp.7.7.bin.zip
64 bits:
https://notepad-plus-plus.org/temp/cyrillacPb/npp.7.7.bin.x64.zip

I've just tested the 32-bit build and it looks good,
although it will be some hours until I am able to test the same scenario

OK, the original scenario works too.

@donho
There were no problems when entering Cyrillic characters.
When you reopen the file (ANSI), the initial characters disappear
(ANSI turns into Macintosh).
"Auto-detect character encoding" is most likely to blame
(Autodetect character encoding and auto-completion are both enabled).


@andrecool-68 I need your confirmation, after testing both binaries I provided, for the ANSI mode input issue. I saw your post, and that problem is a different one. Let's fix the problems one after another; otherwise you just confuse everybody and it doesn't help at all!

@donho Maybe you do not understand my words... excuse me, but I write through Google Translate))
I wanted to say that I tested both files, and Cyrillic was typed without errors.
As for the artifacts when re-opening the file, I don't know myself whether this is a continuation of the old error or a new one.
Whatever I notice when testing, I will report to you.

@donho
I compiled a Notepad++ debug build with "scintilla416" (with the same change):

// case SC_CHARSET_DEFAULT: return documentCodePage ? documentCodePage : 1252;
case SC_CHARSET_DEFAULT: return documentCodePage;

With automatic encoding detection disabled, there are no problems with Cyrillic.

If "automatic encoding detection" is enabled, I get exactly the same encoding artifacts as described in my previous post.

Maybe this will help you in some way.

If "automatic encoding detection" is enabled, I get exactly the same encoding artifacts as described in my previous post.

So it's the issue of "automatic encoding detection". Please create a new issue for that.

@donho sorry for the late answer, I was very busy and away from home.
The fix works perfect. Thanks

This problem still persists, I'm afraid, guys:
a text document with Greek characters isn't displayed properly on my Windows 7 32-bit desktop,
while it shows fine on Windows 8 64-bit and in another editor as well.

Edit: maybe it's a different problem, because I rolled back to 7.6.6 and it was still there.
I even tried to type Greek in npp, but it seems impossible!
I can type Greek in Notepad (Windows), but it seems I can't in npp.
