terminal 🚀 - Feature Request: sixel graphics support

While implementing Sixel, it is important to test with images that contain transparency.
Transparency can be achieved by drawing pixels of different colors but not drawing some pixels in any of the Sixel colors, leaving the background color as is.
I believe this is the only way to properly draw non-rectangular Sixels, and would be especially nice with the background acrylic transparency in the new Windows Terminal.

Testing using WSL with Ubuntu for example, in mlterm such images are properly rendered as having a transparency mask and the background color is kept, while in xterm -ti vt340, untouched pixels are drawn black, even though the background is white, which seems to imply they render sixels on a memory bitmap initialized as black without transparency mask or alpha before blitting them into the terminal window.

PhMajerus on 7 May 2019

👍2

OOh. Sixel is very cool stuff.

I've decided that I need that. NEED.

fearthecowboy on 7 May 2019

👍11 😄2

I'll happily review a PR :)

zadjii-msft on 7 May 2019

😄6 👍4

Caught the Build 2019 interview today that mentioned this request. I still maintain that Xorg on sixel is just wrong. So _very very wrong_.

The ffmpeg-sixel "Steve Ballmer Sells CS50" demo never gets tired tho. Gotta say, it is a little disappointing the video lacks sound (sound really makes the video). Consoles already have sound, naturally. They totally beep. Precedent set. What we really _need_ is a new CSI sequence for the opus clips interleaved with the frames, amirite?

therealkenc on 9 May 2019

Ken, I truly deserve this for mentioning Sixels ;)

DHowett-MSFT on 9 May 2019

❤3

Related: #120

DHowett-MSFT on 9 May 2019

👍1

Need.

needthis

chadbr on 9 May 2019

👍4 👎1

LOL I was watching the stream and I just thought to myself "here's my boss assigning me work live in front of a studio audience".

zadjii-msft on 9 May 2019

😄1

Please make this a priority for v1.0!

WSLUser on 10 May 2019

👍2

3d animations can be v1.5 😛

WSLUser on 10 May 2019

OMG

gh28 on 15 May 2019

Upvoting this request, Sixels would be such an amazing thing to have in the Terminal.

lofcz on 7 Jun 2019

This weekend I finished implementing sixel read support for my MIT-licensed Java-based TUI library, and it was surprisingly straightforward. The code to convert a string of sixel data to a bitmap image is here, and the client code for the Sixel class is here.

I have done very little for performance on the decoder. But when using the Swing backend, performance is still OK, as seen here. (The snake image looks bad only because byzanz used a poor palette creating the demo gif.) I was a bit taken aback how quickly it came together. It's very fair to say that the "decode sixel into bitmap" part is the easy bit, the hard bit is the "stick image data into a text cell, and when that is present blit the image to screen rather than the character".
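For readers who want to see why the "decode sixel into bitmap" part is the easy bit, here is a minimal sketch in Python. It is not the Java code referred to above; the function name `decode_sixel` and the sparse-dictionary output are assumptions made for illustration. It ignores colour definitions and raster attributes and only tracks which colour register was used for each drawn pixel.

```python
import re
from typing import Dict, List, Tuple

_PARAMS = re.compile(r"[\d;]*")  # numeric parameters such as "1;2;100;0;0"


def _params(data: str, i: int) -> Tuple[List[int], int]:
    """Read semicolon-separated numbers starting at index i."""
    m = _PARAMS.match(data, i)
    return [int(p) for p in m.group().split(";") if p], m.end()


def decode_sixel(data: str) -> Dict[Tuple[int, int], int]:
    """Decode a sixel body (the part between 'q' and ST) into a sparse
    {(x, y): colour_register} map. Pixels that were never drawn are absent,
    which is what lets a renderer treat them as transparent."""
    pixels: Dict[Tuple[int, int], int] = {}
    x = y = 0
    color = 0
    repeat = 1
    i = 0
    while i < len(data):
        ch = data[i]
        i += 1
        if ch == "#":                      # colour select (definition params ignored)
            nums, i = _params(data, i)
            if nums:
                color = nums[0]
        elif ch == "!":                    # repeat the next data character n times
            nums, i = _params(data, i)
            repeat = nums[0] if nums else 1
        elif ch == '"':                    # raster attributes, skipped in this sketch
            _, i = _params(data, i)
        elif ch == "$":                    # return to the start of the current band
            x = 0
        elif ch == "-":                    # move down to the next six-pixel band
            x = 0
            y += 6
        elif "?" <= ch <= "~":             # data character: six vertical pixels
            bits = ord(ch) - 63
            for _ in range(repeat):
                for dy in range(6):
                    if bits & (1 << dy):
                        pixels[(x, y + dy)] = color
                x += 1
            repeat = 1
    return pixels
```

For example, `decode_sixel("#1~$#2@")` draws a full six-pixel column in register 1 and then overprints only the top pixel in register 2.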

Just want to mention it to other folks interested in terminal support for sixel, and hoping it could help you out.

klamonte on 5 Aug 2019

👍2

I'll upvote if someone else writes a Jupyter notebook client ;)

kfarmer-msft on 6 Aug 2019

👍2

We already have an example of Sixel support in mintty which is written in C (vice java). Only thing needed is a refactor to C++ (at least for initial support). Still always good to see how it's been implemented in other projects.

WSLUser on 6 Aug 2019

😄1

We already have an example of Sixel support in mintty which is written in C (vice java). Only thing needed is a refactor to C++ (at least for initial support). Still always good to see how it's been implemented in other projects.

Any issues with mintty's license (GPLv3 or later)?

klamonte on 6 Aug 2019

https://github.com/mintty/mintty/blob/master/LICENSE

WSLUser on 6 Aug 2019

https://github.com/mintty/mintty/blob/master/LICENSE

From that link:

Sixel code (sixel.c) is relicensed under GPL like mintty with the
permission of its author (kmiya@culti)

If you transliterate that exact code to C++, the derivative work would need to be licensed GPLv3 or later, as per its terms, or not distributed at all. (One could also ask kmiya@culti if they are willing to offer sixel.c under a different license, or if it was once available under something else find a copy from that source.)

I don't know what is acceptable or not for inclusion in Windows Terminal -- my quick glance at Windows Terminal says it is MIT licensed, so depending on how it is linked/loaded using a direct descendant of mintty's GPLv3+ sixel.c could lead to a license issue.

Anyway, sorry to be bugging someone else's project here, heading back to the cave now...

klamonte on 6 Aug 2019

There is a sixel capable, humble terminal emulator widget written in C/C++ for Windows/Linux, and it has a SixelRenderer class which you can use, (though it needs some optimization), and it has a BSD-3 license. Arguably its biggest downside is that it is written for a specific C++ framework. Still, IMO the SixelRenderer's code is translatable with little effort. (I know this because I am its author. :) )

https://github.com/ismail-yilmaz/upp-components/tree/master/CtrlLib/Terminal

ismail-yilmaz on 20 Aug 2019

😄2 👍2

While implementing Sixel, it is important to test with images that contain transparency.
Transparency can be achieved by drawing pixels of different colors but not drawing some pixels in any of the Sixel colors, leaving the background color as is.
I believe this is the only way to properly draw non-rectangular Sixels, and would be especially nice with the background acrylic transparency in the new Windows Terminal.

Testing using WSL with Ubuntu for example, in mlterm such images are properly rendered as having a transparency mask and the background color is kept, while in xterm -ti vt340, untouched pixels are drawn black, even though the background is white, which seems to imply they render sixels on a memory bitmap initialized as black without transparency mask or alpha before blitting them into the terminal window.

hmm. the VT340 i'm in front of honors the P2 parameter in the DCS P1 ; P2 ; P3 ; q sequence that initiates the SIXEL sequence. Xterm, on the other hand, seems to ignore it. But if you use the raster attributes sequence ( " Pan ; Pad ; Ph ; Pv ) and give it a height and width, it will clear the background so you get a black pixel.
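As a rough illustration of the two knobs mentioned here, the sketch below builds the DCS introducer with a chosen P2 value and, optionally, the raster attributes. The helper names `sixel_header` and `wrap_sixel` are made up for this example. On DEC hardware, P2=1 is documented as leaving zero-valued pixels at their current colour, while 0 or 2 sets them to the background colour; emulators may differ, as noted above.

```python
from typing import Optional

ESC = "\x1b"
ST = ESC + "\\"  # string terminator that ends the DCS sequence


def sixel_header(transparent: bool,
                 width: Optional[int] = None,
                 height: Optional[int] = None) -> str:
    """Build 'DCS P1 ; P2 ; P3 q' plus optional raster attributes.

    P1 and P3 are left at 0 (defaults); only P2 is varied here."""
    p2 = 1 if transparent else 0
    header = f"{ESC}P0;{p2};0q"
    if width is not None and height is not None:
        # " Pan ; Pad ; Ph ; Pv  (1:1 pixel aspect ratio, then width and height)
        header += f'"1;1;{width};{height}'
    return header


def wrap_sixel(body: str, **kwargs) -> str:
    """Wrap raw sixel data in the introducer and the string terminator."""
    return sixel_header(**kwargs) + body + ST


if __name__ == "__main__":
    # Red stripes with untouched columns in between: '~' draws all six
    # pixels of a column, '?' draws none.
    body = "#1;2;100;0;0#1" + "~?" * 6
    print(wrap_sixel(body, transparent=True))
```

Printing the same body with `transparent=False` and a width and height is a quick way to compare how different terminals treat the untouched pixels.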

i was thinking about getting the free trial of the ttwin emulator and checking out how its behaviour differs from the VT340 and the Xterm acting as a VT340.

But... +1 on the idea of supporting SIXEL in general and +10 for the idea of coming up with compatibility tests.

OhMeadhbh on 18 Jan 2020

We could add support for iTerm2 Inline Images Protocol once we are there... At least it should be easier to implement, since it only needs a path to the image and does everything on its own.

One doubt I have with both systems is, what happens with alignment? If the image's width or height is a multiple of the character width or height, everything is OK, but if not, should padding be added only on the lower and right sides, or should the image be centered, with padding added to all sides?

piranna on 18 Jan 2020

👍5 ❤2

Hey here are some relevant links for research:

"Basics for a Good Image Protocol" on terminal-wg (and a linked earlier discussion)
this massive thread about sixel support

zadjii-msft on 4 Jun 2020

👍1

We could add support for iTerm2 Inline Images Protocol once we are there... At least it should be easier to implement, since it only needs a path to the image and does everything on its own.

That probably should be a different task. Sixel and ReGIS are explicitly for in-band graphical or character data. I'm not saying it's a bad idea, I'm just saying it should be treated as a different feature.

One doubt I have with both systems is, what happens with alignment? If the image's width or height is a multiple of the character width or height, everything is OK, but if not, should padding be added only on the lower and right sides, or should the image be centered, with padding added to all sides?

Alignment of Sixel and ReGIS graphical data is described (poorly) in various manuals. Sixel images are aligned on character cell boundaries. If you want a black border around an image, you have to add those black pixels yourself; there's no concept of anything like HTML's margin or padding. Each line of sixel data describes a stripe six pixels high. If you're trying to align sixel image data with text characters on a terminal emulator, this can be frustrating as the software generating the sixel data may not know how many pixels high each character glyph is. If you have an old-school xterm handy, you can see this by starting it up in vt340 mode, specifying different font sizes (to give you different character cell sizes) and then printing out some sixel data that tries to align image data with text data. (Here's a simple test file that looks correct when I tell the font server to use 96DPI and I specify a 15 point font. Modifying the font size causes images to increasingly come out of alignment with the text. https://gist.github.com/OhMeadhbh/3d63f8b8aa4080d4de40586ffff819de )

The original vt340s didn't have this problem because (of course) you didn't get to specify a font size when turning the terminal on.

The other thing you can see from that image, that isn't well described in the sixel documentation is that printing a line of sixel data establishes a "virtual left margin" for the image data. If you do the moral equivalent of a CR or CRLF using the '$' or '-' characters, the next line is printed relative to this virtual left margin, not the real left margin at the left side of the terminal.
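To make the six-pixel stripes concrete, here is a small Python sketch that turns a one-colour bitmap into sixel data. The function name `encode_bands` and the list-of-rows input format are assumptions for illustration only. Each output character covers one column of the current stripe (bit 0 is the top pixel, offset by 63), and `-` moves down to the next stripe.

```python
from typing import List


def encode_bands(rows: List[List[int]], color: int = 1) -> str:
    """Encode a 0/1 bitmap (list of pixel rows) as sixel data in one colour.

    The result is only the data portion; it still needs the DCS introducer
    and string terminator around it before a terminal will draw it."""
    width = len(rows[0]) if rows else 0
    bands = []
    for top in range(0, len(rows), 6):          # one stripe per six pixel rows
        chars = []
        for x in range(width):
            bits = 0
            for dy in range(6):
                y = top + dy
                if y < len(rows) and rows[y][x]:
                    bits |= 1 << dy             # bit 0 = top pixel of the stripe
            chars.append(chr(63 + bits))        # '?' = nothing drawn, '~' = all six
        bands.append(f"#{color}" + "".join(chars))
    # '-' starts the next stripe. A '$' would instead return to the start of
    # the same stripe (the "virtual left margin") to overprint another colour.
    return "-".join(bands)
```

A bitmap seven rows high therefore produces two stripes, and the second one only uses its top bit, which is exactly where the mismatch with text cell heights tends to show up.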

Hope this helps.

OhMeadhbh on 4 Jun 2020

Finally scrolling back to read this. Sorry for the tardy reply.

Testing using WSL with Ubuntu for example, in mlterm such images are properly rendered as having a transparency mask and the background color is kept, while in xterm -ti vt340, untouched pixels are drawn black, even though the background is white, which seems to imply they render sixels on a memory bitmap initialized as black without transparency mask or alpha before blitting them into the terminal window.

It shouldn't be too hard to support transparency in xterm. I've been digging around in the code for other reasons. I fear that someone, somewhere is depending on this behaviour of Xterm so would recommend putting it behind a compatibility flag, which also should be straight-forward. But then there's the question of the default value. What should be the default? Black or transparent.

Do we know what the original VT240, 241, 330 and 340's did? Could I suggest trying to faithfully represent the experience of an actual VT as the default behaviour? You could test this by printing inverted space characters, then layering sixel graphics above them and seeing what color unspecified pixels render as.

I don't know that I care too much what the default is for the msft terminal as long as there's the capability of behaving like Xterm emulating a VT340. The code I've written to do loglines over ssh in the terminal sort of assumes the "unspecified pixels are black" behaviour described above. I'd have to rewrite that code if we make this change.

OhMeadhbh on 4 Jun 2020

If you're trying to align sixel image data with text characters on a terminal emulator, this can be frustrating as the software generating the sixel data may not know how many pixels high each character glyph is.

The original vt340s didn't have this problem because (of course) you didn't get to specify a font size when turning the terminal on.

Is there any reason why a terminal emulator couldn't just scale the image to exactly match the behaviour of the original DEC terminals? So if the line height on a VT340 was 20 pixels, then a image that is 200px in height should cover exactly 10 lines, regardless of the font size. That seems to me the only way you could remain reasonably compatible with legacy software, which is kind of the point of a terminal emulator.

I can understand wanting to extend that behaviour to render images at a higher resolution, but that should be an optional extension I think (or just use one of the existing proprietary formats). So ideally I'd like the default for Sixel to be as close as possible to what you would have gotten on an actual DEC terminal.

j4james on 4 Jun 2020

Hey here are some relevant links for research:
"Basics for a Good Image Protocol" on terminal-wg

Sixel is broken because it cannot be supported by tmux with side-by-side panes.

therealkenc on 5 Jun 2020

👍6

font-resize

therealkenc on 5 Jun 2020

👍4

It took some work (actually a lot of work), but with sixel one can perform nearly all of the "images in a terminal" tricks one can imagine:

Layered per-cell-masked images in a terminal: https://jexer.sourceforge.io/images/sixel_many_images.png
A floating (multiplexed) terminal window in a terminal that is using sixel for VT100-style double-width support: https://jexer.sourceforge.io/screenshots/jexer_sixel_in_sixel.png
"tmux-style" tiled terminals with images: https://gitlab.com/klamonte/jexer/-/wikis/uploads/7603381f82414ef9ae214bfcf759c064/example_tilingwm2_1.png
Multi-headed shared terminal session with differing text cell sizes showing the same plot: https://jexer.sourceforge.io/screenshots/multiscreen_2b.png
The use of sixel to render CJK and emoji that are not present in the main terminal's font: https://jexer.sourceforge.io/screenshots/xterm_sixel_cjk.png

I have included some other remarks at the referenced "good" protocol thread that might be of interest.

If nothing else, sixel is a good stepping stone to working out the terminal side infrastructure of mixed pictures-and-text. Speaking from direct experience, the terminal side (storing/displaying images) is about 1/4 as hard as the multiplexer/application side (tmux/mc et al).

klamonte on 8 Jun 2020

👍3

sixels are indeed the ideal solution for in-band graphics (for example over ssh): as they are supported by many existing tools, they are ready to use for practical purposes like plotting timestamp sync issues on the go.

As illustrated by therealkenc and further explained by klamonte in 640292222 everything can be handled with sixels, even side-by-side images, but it requires some work.

A while ago I was working with a few other people on a fallback mode for tmux, using advanced unicode graphics to represent sixel images in terminals that do not support sixel.

It is a bit like automated ANSI art, taking advantage of special block characters that are present in most fonts: this equivalent color unicode representation could be substituted for the sixels, then later overwritten by the actual sixel image (or not!). It would also solve the problem of keeping all the sixel pictures for scrolling back, by substituting them with low fidelity unicode placeholders (for example, to save memory), and having placeholders for sixel images when they can't be displayed for whatever reason.

The code was public domain. It could be usable immediately as a first step towards sixel support:

detect when sixel sequences are transmitted, then compute the unicode text replacement
display this unicode sequence, which is already supported by Windows Terminal
later, when sixels are implemented, render the sixel sequence on top.
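
A rough idea of what such a unicode replacement could look like (this is not the public domain code mentioned above, just a minimal sketch with an invented function name): each character cell shows two vertically stacked pixels using the upper half block U+2580, with the foreground colour for the top pixel and the background colour for the bottom one. It assumes the image has already been decoded to RGB and that the terminal accepts 24-bit SGR colours.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
UPPER_HALF_BLOCK = "\u2580"
RESET = "\x1b[0m"


def to_half_blocks(pixels: List[List[RGB]]) -> str:
    """Render rows of RGB pixels as text, two pixel rows per text line."""
    lines = []
    for y in range(0, len(pixels), 2):
        top = pixels[y]
        # With an odd number of rows, repeat the last row for the bottom half.
        bottom = pixels[y + 1] if y + 1 < len(pixels) else top
        cells = []
        for (tr, tg, tb), (br, bg, bb) in zip(top, bottom):
            cells.append(
                f"\x1b[38;2;{tr};{tg};{tb}m"    # foreground: top pixel
                f"\x1b[48;2;{br};{bg};{bb}m"    # background: bottom pixel
                f"{UPPER_HALF_BLOCK}"
            )
        lines.append("".join(cells) + RESET)
    return "\n".join(lines)


if __name__ == "__main__":
    red, blue = (255, 0, 0), (0, 0, 255)
    print(to_half_blocks([[red, blue], [blue, red]]))
```

A real fallback would also need to downscale the image to the cell grid first; that step is left out here.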

Would you be interested?

BTW I recognize here my familiar gnuplot x^2 sin and 10 sin(x) plots I'm happy it provided some inspiration 😄

csdvrx on 25 Jun 2020

👍2

Please.

QvQQ on 28 Aug 2020

@DHowett Is acac350 a first step toward actually rendering sixel graphics? I'm getting requests for sixel support in Microsoft Terminal from folks using ssh and wanting to view directories of images using my lsix program.

hackerb9 on 10 Sep 2020

Sorta. We now have the ability to handle incoming DCS sequences. We haven't hooked up any handlers yet, but having the infrastructure to do so was pretty important. :smile:

DHowett on 10 Sep 2020

👍6

Here's some updates. I have a working branch here. An early screenshot looks like this:

Contrary to what I originally thought, the most difficult part of rendering sixel images is actually the conpty layer. Sixel images are supposed to be inline objects. The rendering of sixel images depends on the rendering size of a character. However, due to the extra conpty layer, we cannot actually get the rendering size of a character when processing sixel sequences. This sounds very abstract and vague. Anyone who's interested in this can check out my branch and see how it's done.

Overall, the conpty layer makes it very difficult to handle scrolling and resizing of sixel images. In my branch it works if you only need to display it. But both scrolling and resizing are completely broken.

skyline75489 on 9 Oct 2020

❤5 👀3

Didn't look yet but can you use pass-through mode to implement in Terminal itself? I would still add it in OpenConsole but sounds like sharing code isn't possible. Since Windows Terminal needs to be decoupled from OpenConsole at some point, you're best off simply duplicating the code for both. Also are you basing it on yours and j4james PRs for parameters? That would likely help as well.

WSLUser on 9 Oct 2020

@WSLUser Thanks for the attention. This screenshot is actually from about a month ago, when the fantastic parameters PR from j4james did not even exist. My work is entirely inside Windows Terminal, not conhost. I showed this PR to the Console team internally and made some progress since then. But I'm stuck because of the conpty problem.

skyline75489 on 9 Oct 2020

Yeah I'd rebase off of master and add https://github.com/microsoft/terminal/pull/7578 and https://github.com/microsoft/terminal/pull/7799. From there, maybe see what's missing in ConPTY for pass-through mode. I wonder if Mintty is using pass-through for ConPTY mode.

WSLUser on 9 Oct 2020

I wonder if Mintty is using pass-through for ConPTY mode.

Pretty sure mintty isn't using conpty at all 😜

The trick here with conpty is that the console (conpty) will need to know about the cells that are filled with sixel contents, so as not to accidentally clear that content out from the connected Terminal. Maybe conpty could be enlightened to ignore painting cells with sixel graphics, and just assume that the connected Terminal will leave those cells alone.

That might mess up some of our optimizations (like we can't EraseLine rows that have sixel data), but it might be a good enough start

zadjii-msft on 9 Oct 2020

Maybe conpty could be enlightened to ignore painting cells with sixel graphics, and just assume that the connected Terminal will leave those cells alone.

This had been my original plan as well, and it may well be the best solution with the current conpty architecture, but there are a number of complications.

How would this interact with DCS streaming (which I don't think we've even got a solution for yet). I'm assuming we'd need some kind of split stream concept that passed the byte stream through to conpty at the same time as it's sent to the conhost buffer, but that seems like it would add a lot of unnecessary overhead to the process.
This would only work if you know the pixel cell size of the conpty terminal. I've mentioned before I think the best solution for Sixel is to match the cell size of the original VT terminals, and if we were doing that this wouldn't be an issue. However, as far as I'm aware, no other terminal emulators do that, so it wouldn't work with anyone else.

j4james on 10 Oct 2020

The second issue @j4james brought up becomes even more complicated with the consideration of different font, different font size and font resizing. So generally I think there's 3 aspects of the issue:

First, conpty will need to know about the cells that are filled with sixel contents. Without this, the backing buffer in conpty and the drawing buffer in WT will inevitably be out of sync.
In order to do that, conpty will need to know pixel cell size in the drawing context, which is handled by the drawing layer in WT. There is a huge gap between conpty and the actual DXRenderer, which makes this a difficult task.
Besides, when the font or the font size changes, ideally the sixel image should change correspondingly.
And finally deal with other things like pane, alternative buffer, differential drawing, scrolling, etc.

skyline75489 on 10 Oct 2020

The second issue @j4james brought up becomes even more complicated with the consideration of different font, different font size and font resizing. So generally I think there's 3 aspects of the issue:

Just to be clear, my point was that none of that would be a problem if we exactly matched the behaviour of a VT340, so a 10x20 pixel image would occupy exactly one character cell, regardless of font size. It's only an issue if we want to match the behaviour of other terminal emulators, and that could always be an option that is left for later. There would still be complications with this approach, but I personally think they're less of a concern.

My bigger concern is that you seem to be ignoring the DCS streaming issue, which I expect could fundamentally change the architecture of the solution. The steps I would like to have seen are: 1. Resolve #7316; 2. Agree on a solution for cell pixel size; 3. Get something working in conhost; 4. Once all the complications are worked out in conhost, only then consider how we make it work over conpty.

j4james on 10 Oct 2020

Sorry for leaving out the DCS streaming issue. In my current implementation I just store the entire string and pass it to the engine. This introduces a performance issue when the sequence is large. But at least it works. So my comments above are largely based on it.

But you are right. The DCS streaming issue is actually the top priority if someone else wants to get their hands dirty on this.

skyline75489 on 10 Oct 2020

Per discussion in https://github.com/microsoft/terminal/issues/57, I thought conpty doesn't care about fonts at all?

wrt resizing I think the most natural way to do it is to "anchor down" the image into character cells once the image arrives, and re-calculate image size based on the anchor geometry. Anything else will cause inconsistency in image vs. character cells.

yatli on 13 Oct 2020

@yatli Yes. That's also what makes the issue tricky.

10x20 pixel image would occupy exactly one character cell

This is unfortunately wrong, at least for my current font setting.

Correct me if I'm wrong, but for pixel perfect image display, I think we do need to care about fonts.

skyline75489 on 13 Oct 2020

@skyline75489 pls see my updated comment about the "anchor"

yatli on 13 Oct 2020

The cell data structure needs to be updated as char | sixel anchor

The sixel anchor should contain information about:

A pointer to the image object
The char cell region it occupies, in floating numbers (e.g. 5.2 lines x 7.8 cols)
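
Sketched in Python purely for illustration (the real buffer is C++, and every name below is hypothetical), the idea might look like this: the image is pinned to a fractional cell region when it arrives, and its pixel size is recomputed from that region whenever the cell size changes.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class SixelImage:
    """Decoded image; only its pixel dimensions matter for this sketch."""
    width_px: int
    height_px: int


@dataclass
class SixelAnchor:
    image: SixelImage      # "pointer" to the image object
    cols: float            # region occupied, in character cells
    rows: float

    def pixel_size(self, cell_w: int, cell_h: int) -> Tuple[int, int]:
        """Size to draw at for the current cell size (e.g. after a font change)."""
        return round(self.cols * cell_w), round(self.rows * cell_h)


@dataclass
class Cell:
    """A buffer cell holds either a character or a sixel anchor."""
    content: Union[str, SixelAnchor] = " "


def anchor_image(image: SixelImage, cell_w: int, cell_h: int) -> SixelAnchor:
    """Pin an image to the cell grid using the cell size at arrival time."""
    return SixelAnchor(image, image.width_px / cell_w, image.height_px / cell_h)
```

With a 10x20 cell, a 78x104 image is anchored as 7.8 cols by 5.2 lines; if the cell later becomes 20x40, `pixel_size` gives 156x208 for the same region.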

yatli on 13 Oct 2020

It's a good idea but the implementation details were killing me, due to the extra translation in conpty layer. To avoid spamming people with email, feel free to reach me on Teams @yatli if you're interested.

skyline75489 on 13 Oct 2020

😄1 👍1

10x20 pixel image would occupy exactly one character cell

This is unfortunately wrong, at least for my current font setting.

What I'm suggesting is that you should make that the case. If you create a 10x20 pixel image and output it on a real DEC VT320 terminal, it's going to take exactly one character (at least in 80 column mode). So if we're trying to emulate that terminal, then we should be doing the same thing. If your current font happens to be 30x60, then you need to scale the image up. If your font is smaller, then you scale the image down.

This guarantees that you can output a Sixel image at any font size and always get the same layout. If you want it to cover a certain area of the screen, or you want to draw a border around it with text characters, you know exactly how much space the image will occupy.
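
The arithmetic behind this is small enough to sketch. The following Python is only an illustration of the proposal, assuming the 10x20 cell size discussed in this thread; the function names are invented.

```python
import math
from typing import Tuple

# Cell size assumed for the emulated terminal (80-column mode), as discussed above.
VT_CELL_W, VT_CELL_H = 10, 20


def cells_covered(img_w: int, img_h: int) -> Tuple[int, int]:
    """Columns and rows an image occupies, independent of the actual font."""
    return math.ceil(img_w / VT_CELL_W), math.ceil(img_h / VT_CELL_H)


def scaled_size(img_w: int, img_h: int, font_w: int, font_h: int) -> Tuple[int, int]:
    """Pixel size to draw the image at so its layout matches the virtual cells."""
    return round(img_w * font_w / VT_CELL_W), round(img_h * font_h / VT_CELL_H)
```

So a 10x20 image always covers one cell: with a 30x60 font cell it is drawn at 30x60, and a 100x200 image on an 8x16 font cell is drawn at 80x160 while still covering 10 columns by 10 rows.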

Correct me if I'm wrong, but for pixel perfect image display, I think we do need to care about fonts.

It's true that you're not going to get "pixel perfect" images this way, but I don't think that should be the primary goal. Many modern computers have high dpi displays where it's routine for images to be scaled up, so it's not like this is a strange concept. And if we want to keep the layout consistent when the user changes their font size, we're going to have to scale the image at some point anyway, so you might as well do it from the start and get all the benefits of a predictable size.

And of course the other benefit of doing things this way is that it could feasibly be implemented over conpty. I don't see how you can make conpty work if the area occupied by the image is dependent on the font size, which you can't possibly know.

I'm not going to pretend this approach won't have any downsides, but I think the positives outweigh the negatives.

j4james on 13 Oct 2020

What if the font has a different aspect ratio than 10:20?

PetterS on 13 Oct 2020

What if the font has a different aspect ratio than 10:20?

May I suggest reading this long - and somewhat "brutal"- discussion about the general problems regarding the inline images in terminal emulators.

It can give you the general idea.

Best regards

ismail-yilmaz on 13 Oct 2020

What if the font has a different aspect ratio than 10:20?

The image may be a bit stretched or squished, but I don't think that's the end of the world.

Let me demonstrate with a real world example. Imagine I'm a Bond villain, and I've got an old security system using a VT340 as the frontend. Now because of the coronavirus, I'm in lockdown and working from home, so I'm logging into the system remotely with Windows Terminal. If we exactly match the VT340 this is no problem - the terminal looks like this:

But maybe I prefer fonts with a weird aspect ratio. So let's see what it would look like with _Miriam Fixed_, which is wider than most. The image of Bond now looks a bit squished, but he is still easily recognisable.

The alternative would be to go with a pixel perfect image (not currently feasible with conpty, but let's pretend for a second). Bond no longer looks squished, but now the image is only a fraction of the size it was expected to be. And the higher the resolution of your monitor, the worse this is going to look.

Maybe this is a matter of personal preference, but I know I'd definitely choose option 1 over option 2.

Also note that there is no reason we couldn't have options to tweak the exact behaviour when the font aspect ratio isn't 1:2. One option could be to center the image within the cells it was expected to occupy. Or we could expand the image so it covers the full area, but clip the edges that overflow the boundaries. Any of these choices would be better than an exact pixel rendering in my opinion.
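
The three behaviours mentioned here (stretch, center, and expand-and-clip) differ only in how the draw rectangle is computed. A minimal Python sketch, with an invented function name and mode names, assuming the target box is the pixel area of the cells the image was expected to occupy:

```python
from typing import Tuple


def fit_image(img_w: int, img_h: int, box_w: int, box_h: int,
              mode: str = "stretch") -> Tuple[int, int, int, int]:
    """Return (draw_w, draw_h, offset_x, offset_y) relative to the box origin.

    'stretch' fills the box and may distort the image,
    'center'  keeps the aspect ratio and leaves padding,
    'crop'    keeps the aspect ratio and overflows (negative offsets are clipped).
    """
    if mode == "stretch":
        return box_w, box_h, 0, 0
    if mode not in ("center", "crop"):
        raise ValueError(f"unknown mode: {mode}")
    scale_x, scale_y = box_w / img_w, box_h / img_h
    scale = min(scale_x, scale_y) if mode == "center" else max(scale_x, scale_y)
    draw_w, draw_h = round(img_w * scale), round(img_h * scale)
    return draw_w, draw_h, (box_w - draw_w) // 2, (box_h - draw_h) // 2
```

For a 100x200 image whose cells cover 120x200 pixels in a wider font, `stretch` gives `(120, 200, 0, 0)`, `center` gives `(100, 200, 10, 0)`, and `crop` gives `(120, 240, 0, -20)`.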

j4james on 13 Oct 2020

❤3 😄3 👎1

Maybe this is a matter of personal preference, but I know I'd definitely choose option 1 over option 2.

Me too, although it would be better to know when the font has a different aspect ratio, so the image can adjust itself and keep the correct one.

One option could be to center the image within the cells it was expected to occupy. Or we could expand the image so it covers the full area, but clip the edges that overflow the boundaries

I think it's better to center them.

piranna on 13 Oct 2020

Maybe I'm misreading this thread. Are we actually talking about the terminal faking 10:20 characters for sixel image? I think that will cause many problems like the Bond distortion. Doing it the right way may be more difficult, but, in my humble opinion, a modern terminal should be font agnostic and leave it up to application programmers to deal with sixels and character cells.

Using escape sequences a user run program can determine the character cell size in pixels and decide how to intelligently deal with distortion for that application. The image viewing program I use works exactly like that. As I change font family or size, the displayed thumbnail updates to always be precisely five text lines high. The width is scaled proportionally for the image, unless it would be larger than a certain (in this case, rather large) maximum. By basing the image size on the character cell, it works automatically on high-DPI screens.

While the VT340 is a noble goal to emulate, fixing character cell resolution at 10:20 (and thus limiting resolution for the entire screen) is a mistake. The VT340 was only one of several sixel implementations, so its font size isn't necessarily more correct.

Forcing 10:20 will also lead to ugly kludges. (E.g., how to respond to a request for the size of the terminal window in pixels. Tell the truth, presuming they'll be positioning windows on the screen? Or, always return 800x480, presuming the user is scaling images for sixel output?)

hackerb9 on 13 Oct 2020

👍1

Are we actually talking about the terminal faking 10:20 characters for sixel images?

Yes.

a modern terminal should be font agnostic

This proposal is font agnostic. The application doesn't need to know anything about the font. That's the whole point.

Using escape sequences a user run program can determine the character cell size in pixels and decide how to intelligently deal with distortion for that application.

I'm not exactly sure what method you're using, but the way I've seen this done before is with a proprietary XTerm query to get the window pixel size, and another query to get the window cell size, and then using that data to calculate the actual cell pixel size. The downsides of such an approach are:

1. It's proprietary, so wouldn't work on a real terminal, or any terminal emulator that exactly matched a real terminal.
2. If the user changes their font size while your application is running, then your calculations will no longer be correct, and images will be rendered at the wrong size (unless you're continuously recalculating the font size which seems impractical).
3. If the user has a high resolution display, and/or large font size, you're forced to send through a massive image to try and match that resolution. Considering how inefficient Sixel is to start with, that can amount to a lot of bandwidth.

That said, I understand that this is a mode that some people may wish to use, and I think we should at least have an option to support it one day (for reasons discussed above, this just isn't possible at the moment). But in my opinion, this is not the best approach for Sixel.

j4james on 14 Oct 2020

👎1

I have 300+ VT340's in nuclear power plants that I would like to eventually replace.

There are commercial terminal emulation packages we could use, but I think all but one have been EoL'd.

We have replaced some of them with Linux PCs running XTerm (or less frequently, Win10 + Hummingbird + WSL running XTerm), because it has a half-way decent open source sixel implementation and a sort of bad, but open sourced ReGIS implementation.

The likelihood that we will be writing new software for the part of this system that generates the sixel octet stream is NIL.

If your objective is to send graphics over an inline octet stream, there are other options. But if you want to support sixel graphics, you should support sixel graphics in a way that is halfway similar to previous implementations. This, unfortunately, means you should emulate the behaviour of exemplar systems (i.e. VT240, VT241, VT330 and VT340 terminals) even when it comes to integrating graphics with text.

This is a mock-up of the kind of thing I'm talking about. It would be very nice if any new Sixel implementation maintains compatibility with existing implementations so images do not run off the edge of the screen or only fill half the screen.

https://vimeo.com/user32814426/review/467991744/ac5892fa7e

OhMeadhbh on 14 Oct 2020

👍4 😄1

a modern terminal should be font agnostic

This proposal is font agnostic. The application doesn't need to know anything about the font. That's the whole point.

I meant the _terminal_ should be font agnostic instead of imposing 10:20 on every font. The application should be able to know the actual font size, if it wishes, since it's the application that knows the domain of what it is trying to show and can figure out the best way to present text and graphics together.

Using escape sequences a user run program can determine the character cell size in pixels and decide how to intelligently deal with distortion for that application.

I'm not exactly sure what method you're using, but the way I've seen this done before is with a proprietary XTerm query to get the window pixel size, and another query to get the window cell size, and then using that data to calculate the actual cell pixel size.

Yup, that's about right. There's also a query to directly get the character cell size, but I don't think that's as widely supported as just getting the screen size and dividing by ROWS and COLUMNS.
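
As a rough illustration of the "divide by ROWS and COLUMNS" approach, the sketch below parses the two XTWINOPS replies and computes the cell size. It assumes the replies have the forms `CSI 4 ; height ; width t` (answer to `\e[14t`) and `CSI 8 ; rows ; cols t` (answer to `\e[18t`). The helper names are hypothetical, and reading the reply from the tty in raw mode is left out.

```python
import re

# Matches replies such as ESC [ 4 ; 480 ; 800 t
REPORT = re.compile(r"\x1b\[(\d+);(\d+);(\d+)t")

def parse_report(reply):
    """Return (kind, height, width) from an XTWINOPS reply, or None."""
    m = REPORT.search(reply)
    if m is None:
        return None
    kind, height, width = (int(g) for g in m.groups())
    return kind, height, width

def cell_size(pixel_reply, chars_reply):
    """Compute (cell_width, cell_height) in pixels, or None if unknown."""
    px = parse_report(pixel_reply)   # expected kind 4: text area in pixels
    ch = parse_report(chars_reply)   # expected kind 8: text area in cells
    if px is None or ch is None or px[0] != 4 or ch[0] != 8:
        return None
    _, pix_h, pix_w = px
    _, rows, cols = ch
    if rows == 0 or cols == 0:
        return None
    return pix_w // cols, pix_h // rows

# An 80x24 window that is 800x480 pixels has 10x20 cells.
print(cell_size("\x1b[4;480;800t", "\x1b[8;24;80t"))
```

A terminal that does not answer the queries simply yields `None` here, and the program can fall back to a guess.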

The downsides of such an approach are:

1. It's proprietary, so wouldn't work on a real terminal, or any terminal emulator that exactly matched a real terminal.

That's not a downside. It only means the program has to fall back on doing what it would have done anyway: presume $TERM=="VT340" means character cells are 10:20, "VT240" means 10:10, "mskermit" means 8:8, and so on.
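
A minimal sketch of that fallback, using only the three values named above. The table name, the function name and the case-insensitive match are assumptions made for illustration.

```python
import os

# Cell sizes (width, height) taken from the examples above.
FALLBACK_CELL_SIZES = {
    "vt340": (10, 20),
    "vt240": (10, 10),
    "mskermit": (8, 8),
}

def fallback_cell_size(term=None):
    """Guess the cell size from $TERM; return None when it is not listed."""
    if term is None:
        term = os.environ.get("TERM", "")
    # Assumption: match case-insensitively, since TERM values vary in case.
    return FALLBACK_CELL_SIZES.get(term.lower())

print(fallback_cell_size("VT340"))
```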

Also, it's not an xterm proprietary sequence. Getting the screen size is called a "dtterm" escape sequence, but it was actually first implemented in SunView (SunOS, 1986). I believe it was later documented in the PHIGS Programming Manual (1992). Try sending "\e[14t" to a few terminal emulators and you'll see it is widely implemented.

2. If the user changes their font size while your application is running, then your calculations will no longer be correct, and images will be rendered at the wrong size (unless you're continuously recalculating the font size which seems impractical).

This is not a problem. The program simply traps SIGWINCH and only recalculates if the window has actually changed.
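
A sketch of what that could look like on Unix. The handler only sets a flag, and the main loop re-measures and compares against the last known size. The `ResizeWatcher` class and the `get_size` callback are hypothetical. It also assumes the terminal actually delivers SIGWINCH when the font changes.

```python
import signal

class ResizeWatcher:
    """Recalculate the cell size only when SIGWINCH reports a real change."""

    def __init__(self, get_size):
        self.get_size = get_size      # callable returning the current cell size
        self.last = get_size()
        self.dirty = False

    def on_winch(self, signum, frame):
        # Keep the handler tiny: just note that something may have changed.
        self.dirty = True

    def poll(self):
        """Return True if the size changed since the last call."""
        if not self.dirty:
            return False
        self.dirty = False
        size = self.get_size()
        if size == self.last:
            return False              # window signalled, but nothing changed
        self.last = size
        return True

def install(watcher):
    """Register the watcher as the SIGWINCH handler (Unix only)."""
    signal.signal(signal.SIGWINCH, watcher.on_winch)
```

The application would call `poll()` from its main loop and redraw images only when it returns `True`.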

3. If the user has a high resolution display, and/or large font size, you're forced to send through a massive image to try and match that resolution. Considering how inefficient Sixel is to start with, that can amount to a lot of bandwidth.

Yes, sixel is extremely inefficient. But on modern computers, sending full screen images is quite usable, even over ssh. Does the Microsoft Terminal have some sort of baudrate limitation?

By the way, I believe sixel does have a "high DPI" mode where every dot is doubled in width and height. I've never used it and I don't think xterm even implements it, but perhaps that would alleviate concerns about bandwidth.

That said, I understand that this is a mode that some people may wish to use, and I think we should at least have an option to support it one day (for reasons discussed above, this just isn't possible at the moment).

This "mode" is simply having characters and graphics aligned just like the various historical sixel terminals did and current emulators do. I admit, I don't understand why it is not possible to do the same in Microsoft Terminal. If you say this 10:20 kludge is the best that can be done, I will trust that you are correct and thank you for doing it. A distorted picture is much better than nothing.

hackerb9 on 14 Oct 2020

Using escape sequences a user run program can determine the character cell size in pixels and decide how to intelligently deal with distortion for that application.

@hackerb9, what's the actual escape sequence to get the font dimensions?

piranna on 14 Oct 2020

The relevant XTerm sequences can be found here: https://invisible-island.net/xterm/ctlseqs/ctlseqs.html -- look for XTWINOPS.

Additionally, on Unix you can typically get the terminal's internal pixel size along with the cell size using the TIOCGWINSZ ioctl. With openssh this works remotely too.
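
For reference, a minimal sketch of the ioctl route. `struct winsize` holds four unsigned shorts (rows, columns, x pixels, y pixels). The function names are hypothetical, and the pixel fields may be 0 on terminals that do not fill them in.

```python
import fcntl
import struct
import termios

def window_size(fd=1):
    """Return (rows, cols, xpixel, ypixel) for the terminal on fd."""
    packed = fcntl.ioctl(fd, termios.TIOCGWINSZ, b"\0" * 8)
    return struct.unpack("HHHH", packed)

def cell_size_from_winsize(rows, cols, xpixel, ypixel):
    """Derive (cell_width, cell_height); None if any field is unset."""
    if 0 in (rows, cols, xpixel, ypixel):
        return None
    return xpixel // cols, ypixel // rows

if __name__ == "__main__":
    try:
        print(cell_size_from_winsize(*window_size()))
    except OSError:
        print("stdout is not a terminal")
```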

Just as a data point, the sixel branch for libvte is taking the cell size-agnostic route @hackerb9 is talking about. It treats incoming sixel data as "pixel perfect" and rescales previously received images across zoom levels and font sizes to cover a consistent cell extent. When merged, this implementation will be available to a large share of Linux terminal emulators, including GNOME Terminal, the XFCE Terminal, Terminator, etc. Superficially this seems to be interoperable with at least XTerm and mlterm.

Since libvte records a per-image virtual cell size, it'd be trivial to make this work with a fixed virtual 10x20 cell size too for interoperation. However, we'd need a way for programs to communicate their expected pixel:cell ratios to the terminal (e.g. by extending the DCS parameters). That could be very useful in general, since it'd also provide a form of pixel density control in bandwidth-constrained environments, as you touched on above.
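
To illustrate the idea (this is not libvte's actual code), a per-image record could store the cell size that was in effect when the image arrived, then rescale to keep the same cell extent. All names below are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class StoredImage:
    width: int    # image size in pixels as received
    height: int
    cell_w: int   # cell size recorded with the image (real or virtual)
    cell_h: int

    def cell_extent(self):
        """Columns and rows the image covers, fixed for its lifetime."""
        return (math.ceil(self.width / self.cell_w),
                math.ceil(self.height / self.cell_h))

    def scaled_size(self, new_cell_w, new_cell_h):
        """Pixel size needed to cover the same cells after a font change."""
        return (round(self.width * new_cell_w / self.cell_w),
                round(self.height * new_cell_h / self.cell_h))

# Pixel-perfect at receive time, then the user doubles the font size.
img = StoredImage(200, 100, cell_w=10, cell_h=20)
print(img.cell_extent(), img.scaled_size(20, 40))

# Interoperation mode: record a fixed virtual 10x20 cell instead.
vt = StoredImage(800, 480, cell_w=10, cell_h=20)
print(vt.cell_extent(), vt.scaled_size(9, 18))
```

Under this sketch, a program-supplied pixel:cell ratio would only change which `cell_w` and `cell_h` are recorded with the image.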

hpjansson on 14 Oct 2020

Additionally, on Unix you can typically get the terminal's internal pixel size along with the cell size using the TIOCGWINSZ ioctl. With openssh this works remotely too.

The Linux console always returns 0... they should fix that, but it seems they are not willing to :-/

piranna on 14 Oct 2020
