Js-beautify: "unformatted" paradigm broken, "unformatted" and "inline" are not the same

Created on 13 Jan 2016  ·  6Comments  ·  Source: beautify-web/js-beautify

The whole approach js-beautify takes in determining what to wrap is conceptually broken, if I understand it correctly.

Let's say I have the following HTML:

    <li>Install at least one Git client. <span class="tip">Recommended: <a

         href="https://www.sourcetreeapp.com/">SourceTree</a> (Windows, Mac)</span>all Git commands.</li>

Note all that blank space and newlines inside the <a> tag attributes. If I put span as one of the unformatted elements, then js-beautify doesn't change anything! It leaves all that ugly whitespace!

If on the other hand I remove span from the unformatted elements, then js-beautify gives me this:

    <li>Install at least one Git client.
        <span class="tip">Recommended: <a href="https://www.sourcetreeapp.com/">SourceTree</a> (Windows, Mac)</span>all Git commands.</li>

Note that js-beautify now adds a newline before <span>! This too is completely incorrect --- <span> is an inline phrasing element, and there is no need to break here.

The problem seems to be that js-beautify is using the idea of "unformatted" to be equivalent to "inline" element. This is the wrong approach and does not match HTML/CSS semantics!! The idea if block vs inline is completely orthogonal to whether elements should be wrapped.

js-beautify should determine whether an element is block or inline. Block elements should have newlines before them; inline elements should not.

The idea of "unformatted" is a distinct concept; it should tell js-beautify to leave it alone, regardless of whether the element is block or inline.

As it is now, there is no way to tell js-beautify that an element should _not_ have a newline, yet its contents should be formatted.

The entire design is broken. Please fix this fundamental issue. (Or explain to me that I am mistaken, and that js-beautify is not conflating the concepts of "block/inline" and "formatted/unformatted".)

html enhancement

Most helpful comment

I'm not sure you're getting what I'm saying. All elements should be formatted! It's just that some are inline, and some are block elements. This is how it has been with XML/HTML/CSS for a couple of decades now.

If it's a block element, then it creates a new block visually. These things, when formatting the HTML source code, almost always have a newline before them (both in the source code and in the rendering). An inline element does not have a newline before it (both in the source code and in the display). But that doesn't mean it its _contents_ shouldn't be formatted. (It _does_ mean that its contents shouldn't have newlines added, however, unless you're wrapping lines.)

So this is not a tweak. This is a recognition that the entire paradigm js-beautify uses for formatting is broken. (I'm not being mean---I'm just being objective, and trying to help you by explaining.) You need to distinguish among three different things:

  • "formatted" - Is this element formatted? (By default all elements should be formatted, unless someone turns it off.)
  • "block/display" - Is this a block or a display element? If it is a block element, it should have a newline _prepended_ before the beginning tag, and it should be indented to the current indent level. If not, it should have no newline in front of it.
  • "container" - Is this thing normally only used for containing other block elements? If so, then a newline should be generated _after_ the beginning tag, after the ending tag, and the contents should be indented at a new indent level. This is the most controversial, that is, this will vary most depending on users' tastes. One obvious candidate is <ul>; its content (e.g. <li>) should be indented. Some people will want to add <p> to this list so that the <p> tags will be on separate lines. Others (e.g. me) will not want them on separate lines.

I suggest you review the CSS box model and understand how block vs inline works. And then rewrite the js-beautify HTML formatting logic, taking the CSS box model into account (because the source code should reflect how the rendering eventually will be done).

By default nothing should be "unformatted". Everything should be formatted --- just formatted appropriately. "unformatted" should be a last-resort setting for a user to say, "hey, I don't like what js-beautify is doing, so just don't touch this element and let me deal with it.

All 6 comments

Your proposal of how it should work sounds reasonable. Please list which elements should be inline vs which should be unformatted. Then please implement and submit a pull request.

I'm not sure you're getting what I'm saying. All elements should be formatted! It's just that some are inline, and some are block elements. This is how it has been with XML/HTML/CSS for a couple of decades now.

If it's a block element, then it creates a new block visually. These things, when formatting the HTML source code, almost always have a newline before them (both in the source code and in the rendering). An inline element does not have a newline before it (both in the source code and in the display). But that doesn't mean it its _contents_ shouldn't be formatted. (It _does_ mean that its contents shouldn't have newlines added, however, unless you're wrapping lines.)

So this is not a tweak. This is a recognition that the entire paradigm js-beautify uses for formatting is broken. (I'm not being mean---I'm just being objective, and trying to help you by explaining.) You need to distinguish among three different things:

  • "formatted" - Is this element formatted? (By default all elements should be formatted, unless someone turns it off.)
  • "block/display" - Is this a block or a display element? If it is a block element, it should have a newline _prepended_ before the beginning tag, and it should be indented to the current indent level. If not, it should have no newline in front of it.
  • "container" - Is this thing normally only used for containing other block elements? If so, then a newline should be generated _after_ the beginning tag, after the ending tag, and the contents should be indented at a new indent level. This is the most controversial, that is, this will vary most depending on users' tastes. One obvious candidate is <ul>; its content (e.g. <li>) should be indented. Some people will want to add <p> to this list so that the <p> tags will be on separate lines. Others (e.g. me) will not want them on separate lines.

I suggest you review the CSS box model and understand how block vs inline works. And then rewrite the js-beautify HTML formatting logic, taking the CSS box model into account (because the source code should reflect how the rendering eventually will be done).

By default nothing should be "unformatted". Everything should be formatted --- just formatted appropriately. "unformatted" should be a last-resort setting for a user to say, "hey, I don't like what js-beautify is doing, so just don't touch this element and let me deal with it.

P.S. I already gave you a start in #840 indicating what constitutes an inline element in HTML5. I don't have the time right now to rewrite your entire formatting engine, though. If I do get that time, it would be more efficient just to write my own. So in the meantime, I'm trying to help you so that you can get yours working, which would be best for us both. Good luck, and let me know what further information I can provide.

I completely understand what you said originally. You believe there is a bug in the html beautifier. You would like to describe it as "the entire paradigm js-beautify uses for formatting [html] is broken". I didn't design or implement html beautifier module, and it has seen the least improvement in my time as maintainer of this project. It definitely needs attention and significant reworking, and so does the CSS beautifier for that matter.

The html beautifier does, however, do some things in a reasonably passable way despite it's flaws. And I question your evaluation that handling #840 in particular would require a "rewrite" or a change in "paradigm". It is an enhancement like any other. There are probably a number of ways to implement it, from an ugly bit of hackery, up to a full redesign and rewrite.

Whatever the scope, I think the change you've described is worth doing eventually but I already have a sizable list of concrete, well-described, highly-demanded bug fixes and features that have overall higher priority for the project. I must spend what little time I have for this project on those items. If you really believe this is worth fixing now, you will need to spend your time doing so. I welcome pull requests.

You believe there is a bug in the html beautifier.

No, I'm saying that there is no way to tell js-beautify not to put a line break _before_ an element without at the same time telling js-beautify not to format anything _inside_ the element. Is this true or is this not true?

You would like to describe it as "the entire paradigm js-beautify uses for formatting [html] is broken".

If js-beautify does not know the difference between "things that must not have line breaks added before them" and "things that I must ignore altogether", then yes, its paradigm is inadequate for even simple HTML. After all, I don't want a line break before <dfn> (because this is an inline element), but if the <dfn> contains an <img>, I don't want js-beautify to forego formatting the <img> tag. (This is a contrived example; it applies to any inline image.)

So you tell me. Can js-beautify format an <img> inside a <dfn> without putting a line feed before the <dfn>?

The point of what I'm saying is that "don't format the contents of this element" is different than "don't put a newline before this element." So let me ask you again. Does js-beautify understand the distinction between these two things, or does it conflate these concepts? If it does not understand the difference, its fundamental model is broken.

But sure, sometimes you can work around fundamental flaws. So I come back and ask you again: how would I work around this flaw? How would I tell js-beautify not to put a line feed before <code> but to still format an <img> element inside the <code>?

@garretwilson This is available in 1.8.0-rc2.

Was this page helpful?
0 / 5 - 0 ratings