Fish-shell: special-casing PATH/MANPATH/CDPATH is weird; we need a more general solution like zsh "tied" variables

Created on 11 Dec 2012  ·  52Comments  ·  Source: fish-shell/fish-shell

In expand.h:

/** Character for separating two array elements. We use 30, i.e. the ascii record separator since that seems logical. */
#define ARRAY_SEP 0x1e

/** String containing the character for separating two array elements */
#define ARRAY_SEP_STR L"\x1e"

This results in:

xiaq@blackie ~> set a (printf 'a\x1eb')
xiaq@blackie ~> count $a
2
xiaq@blackie ~> set a (printf 'a\x1fb')
xiaq@blackie ~> count $a
1

It's clear that the char \x1e is treated specially as element delimiter.

enhancement

All 52 comments

Is the concern that there's no way to represent \x1e? Or are you thinking about architectural improvements?

I think the array separator is mainly used in persisting arrays in places that only take strings, like universal variables or environment variables.

在 2012-12-12 上午2:09,"ridiculousfish" [email protected]写道

Is the concern that there's no way to represent \x1e? Or are you thinking
about architectural improvements?

I'd say I have both conerns in mind. For the former, just think about a
filename with embeded \x1e. POSIX says anything but \0 is allowed in
filenames, so it's perfectly possible. Using \0 as array delimiters may be
a slightly better choice, but that leads to the second concern - it's
fragile and plainly wrong.

I think the array separator is mainly used in persisting arrays in places
that only take strings, like universal variables or environment variables.

If persisting means "serialized", no. Arrays are always stored in
\x1e-delimited form. Exported environment arrays are joined by ":". They
are used in universal variables though - which IMHO should be implemented
with proper escaping instead.


Reply to this email directly or view it on GitHub.

How about using a private use area character as separator? Fish already uses some of those in some cases.

@JanKanis That's no better; it's perfectly possible in filenames (considering filesystems using non-utf8, native encodings) and other strings.

I would say that among others, "\0" is a _slightly_ better choice among others, since that's the only character UNIX forbids in filenames, but it still has the bad smell to me. Also, we are already doing a lot of splitting and assembling of arrays, I would expect implementing true arrays to result in both better architecture and less code.

Fish already handles private use characters and invalid bytes when it encodes external strings to wchars. These special values are encoded byte by byte into a specific set of private use chars that fish also decodes again on output, so in principle using another private use char could work. However I agree using true arrays is much better. There is one complication in that communication between fish and fishd happens over a socket using utf8 strings, and there fish uses (I think) the escape sequence "\x1e" (rather than an 0x1e byte) to separate array items. But that could probably be solved by using e.g. a private unused escape sequence.

I share xiaq's concerns, but (for the practical ones) I find the implicit splitting on \n way more offensive:

a@raspeball ~> count (printf 'a\x1eb')
1
a@raspeball ~> count (printf 'a\nb')
2

The (newline delimited) array interpretation of subprocess output should be explicit & optional!

But that has probably nothing to do with the underlying storage of arrays…

@xiaq: What's wrong with using \0? xargs seems to use it as its "reliable" option.

You can't use \0 in environment variables, so it would make it hard to export arrays to child shells.

I'm migrating from zsh where I use this sequence to define the $LESS env var in a sane fashion by leveraging its "tied" variables feature:

typeset -xT LESS less ' '
less=(
    --HILITE-UNREAD
    --LONG-PROMPT
)

I've omitted the complete list of options for brevity. In zsh that results in the $LESS env var being a space separated list of the options in the $less array. The equivalent in fish results in the elements being separated by the record separator (\x1e) character. Despite the documentation saying that the elements of the array will be separated by spaces (modulo the special arrays such as PATH). I have to explicitly do an assignment that interpolates the values into a single string to get the expected result:

set -x LESS --HILITE-UNREAD \
    --LONG-PROMPT
set LESS "$LESS"

At the moment I don't really care if \x1e is used internally for serializing arrays rather than \x00. I do care that exported arrays have their elements separated by \x1e. That's just broken, wrong, fubar. Pick your adjective. It's also inconsistent with the aforementioned workaround and documented behavior. This issue should be tagged as a bug IMHO.

P.S., Nowhere in the documentation is the use of the record separator character (\x1e) mentioned. Which is another problem.

@krader1961 Thanks for sharing this. There's no standard Unix convention for list-like environment variables - some are colon delimited, others are space delimited. fish uses \x1e so it can distinguish its own arrays.

Can you please point us at the erroneous documentation?

How do you think arrays be exported - colons, spaces, newlines, something else? Should fish tokenize environment variables on this character as well?

It looks like less expects space-delimited arguments. Probably the simplest workaround is set -x LESS '--HILITE-UNREAD --LONG-PROMPT', etc.

There's no standard for list like environment variables because by definition they're an arbitrary sequence of bytes composed of a key and value separated by an equal sign and terminated by a null byte. They don't even have to be printable characters. The only widely accepted convention for a higher level of abstraction is the one established by the execlp() function for the PATH env var.

The documentation is erroneous in as far as it makes no mention of using \x1E, \036, 30, or the "record separator" character to separate elements of an array when exporting a var with more than one element. The documentation does state that

..., and array variables will be concatenated using the space character.

That's from the section "Variable expansion" in http://fishshell.com/docs/current/index.html. It's reasonable to infer that statement also applies to exported vars that are not special-cased as documented in the "Arrays" and "Special variables" sections of that same document.

It's my feeling that fish should not automatically tokenize env vars into a list outside of the colon-delimited special case vars such as PATH. There should, however, be a robust means by which a user can tokenize a var into an array on an arbitrary character.

Absent a mechanism for configuring the character to be used on a var by var basis (ala the zsh "typeset -T" command) a space should be used when concatenating the elements of the array (again, excluding the colon separated special-case vars). Obviously this does not apply to private data stores such as the storage of universal variables.

Lastly, I couldn't find any uses in the standard fish functions where an env var is used to pass an array containing more than one element to another function or script. Such use cases may exist but it should require the scripts to explicitly cooperate in the serialization/deserialization of the data rather than rely on fish to implicitly reconstruct arrays from vars whose strings contain the record separator character.

Thanks for your thoughtful response. The section you quoted about concatenation using space is specifically for double-quoted strings. We should add some discussion of what happens with exported arrays.

Users can tokenize strings with e.g. set SOMEPATH (string split : $SOMEPATH).

The downside of space-concatenating exported variables is that they get changed when fish is run recursively. Today this works:

> set -x list 1 2 3
> count $list
3
> fish -c 'count $list'
3

But if we exported with spaces, this would show 1 for the recursive call. As you say we don't rely on this, but it's nice from a consistency standpoint.

Thanks for your thoughtful response.

I'll have to second that! Always nice to have a fresh perspective on things.

For those new to this discussion, I think I'll have to bring in some of the things that are related to this.

What comes to mind immediately is the listify whitelist, which shows up in issues like #2090.

This means that for $PATH, $CDPATH and $MANPATH, they'll appear as lists/arrays to fish, but when exported, will be joined with ":" again. Then a fish-inside-a-fish will split them again. This operates on colons, not \x1e. From my understanding of the code it appears to do it on every colon, with no chance for escaping, so it might break on $PATH entries with a colon inside them - which UNIX allows inside filepaths, though it seems broken for $PATH at least. This scheme is also used for e.g. PYTHONPATH and GOPATH.

I'd love to have something slightly more explicit for splitting environment variables than the implicit always-split-on-\x1e-except-for-these-three-split-them-on-colon, because this is actually two different schemes in one and exporting a list currently will always confuse everything but fish.

My preferred solution would be a function like splitenv var1 var2 var3:

function splitenv --no-scope-shadowing
    set -e IFS # string split doesn't have superpowers, so unset IFS lest we split on both : and \n
    for arg in $argv
        set -q $arg; and set $arg (string split ":" $$arg)
    end
end

(If string split had superpowers, this would be a bit simpler)

All lists would then be joined-with-colons when exported, so a user can explicitly unjoin them with splitenv (though I'm not dead-set on a helper function, I do believe making this trivial is a good thing to do). For backwards-compatibility, splitenv PATH CDPATH MANPATH would be executed on startup. If a user wishes to export it differently, string join is available.

All of this means that we no longer need \x1e, we have a scheme that at least has a fighting chance of being understood by other programs, but the (rather exotic IMHO) fish-inside-fish now becomes fish -c 'splitenv list; count $list'.

The problem is of course that, as mentioned, the usual colon-separated-list scheme has no way of escaping a colon, and if we wanted to add one, string split doesn't have a "--unescaped" option to only split on unescaped separators.

Am I making any sense?

@faho I think that idea has merit. The worst part of the old scheme was implicitly splitting on colons, which would mangle variables that should not be split. In your idea this is (almost) always explicit so I think it's quite safe.

Regarding escaping, not escaping colon in PATH is intentional per the link you found. I doubt PYTHONPATH, CLASSPATH, etc. are any more consistent in this regard. Since you can't use a colon in these paths, we can choose whether or not we escape it; but if we escape a colon we need to escape backslashes, and I'll bet you can have a backslash in PATH. We may need a "don't escape" whitelist (ugh).

Alternatively we don't worry about it, and just let any colon act as delimiters. I think I lean towards this for simplicity and familiarity with other shells.

We still are faced with the problem that some list-like variables are space-delimited, and others are colon delimited. One possibility is that splitenv accepts a delimiter, remembers it, and uses it to rebuild the value on export:

splitenv --on ' ' LESS
splitenv --on ':' PYTHONPATH

These calls now play the dual roles of importing any existing variable, and marking how it gets exported. What do you think?

Also, is there a way to do this without editing config.fish? Maybe as part of universal variables?

We still are faced with the problem that some list-like variables are space-delimited, and others are colon delimited. One possibility is that splitenv accepts a delimiter, remembers it, and uses it to rebuild the value on export:

Sounds good. Though at that point making splitenv a script probably wouldn't help, since we'd need cooperation from the C++ side anyway.

These calls now play the dual roles of importing any existing variable, and marking how it gets exported.

It's possible that now "splitenv" isn't the perfect name anymore (it was when I thought of it, of course :laughing: ) - I've also considered "listify".

Though it's bugging me that I can't remember where we've had a related discussion before - I think I'll need to scour the issues again tonight.

Users can tokenize strings with e.g. set SOMEPATH (string split : $SOMEPATH).

The string command is not documented anywhere I can find. Also, man string shows the string(3) man page that documents the string manipulation functions on BSD (and Mac OS X).

But if we exported with spaces, this would show 1 for the recursive call. As you say we don't rely on this, but it's nice from a consistency standpoint.

That behavior is, however, surprising. I'm willing to bet that if you ask 100 people what happens when a var with more than one element is exported 90 of them will say the values are concatenated with space as a separator. A few might say comma or other char is used as the separator. And the two persons who ran env will say the values are concatenated with no separator because unless you filter the output through something like cat -evt the record separator character is invisible.

which shows up in issues like #2090

I'm sorry but I don't see any merit to that user's complaint. The problem is trivially worked around by explicitly testing whether MANPATH is already set. Which, it seems to me, is something you have to do in any event given the semantics of leading versus trailing colons.

so it might break on $PATH entries with a colon inside them

It is at least thirty years too late to fix that. We should not implement escaping of colons (and by extension the escape character) as that would be non-standard behavior. Until recently I spent 20+ years as a UNIX support specialist. I have never heard someone complain that the presence of a colon in a directory embedded in $PATH or a similar variable was an issue.

My preferred solution would be a function like splitenv var1 var2 var3

That's fine although it's not clear why the (undocumented) string split isn't sufficient. Regardless of whether we need a new function we definitely should not add any new auto-split env vars. The only two that are common enough to warrant that behavior are PATH and CDPATH (and MANPATH since its already special-cased). Other vars like PYTHONPATH can be explicitly split by a user if they find that useful.

However, having said that there certainly should be a way to register that a given var (e.g., PYTHONPATH) should have its elements concatenated with a specific separator char when being exported. The most natural way of doing this is via a new option for the set command. For example,

set -x -S ':' PYTHONPATH dir1 dir2 dir3

This would not affect how the var is stored in the universal var data store where the record-separator char would still be used and it would be auto-split when loaded from that data store. To be determined is whether the registered separator char for export should also affect string interpolation. My feeling is that it should. That is, if the above "set" command is executed then a subsequent

echo "PYTHONPATH is $PYTHONPATH"

should use a colon rather than a space for concatenating the values of PYTHONPATH. The default separator is a space to preserve existing semantics and minimize surprise for the user. Note that the special-cased vars like PATH would also use a colon in that example. Which is incompatible with the current behavior but is consistent with the new semantics and less surprising. In other words, why are the elements of $PATH separated by colons in the exported environment but spaces in the output of

echo "PATH is $PATH"

The string command is not documented anywhere I can find. Also, man string shows the string(3) man page that documents the string manipulation functions on BSD (and Mac OS X).

Easy, tiger. It's in the development versions - see https://github.com/fish-shell/fish-shell/blob/master/doc_src/string.txt

That's fine although it's not clear why the (undocumented) string split isn't sufficient.

My original idea was that it's a convenience function so this operation becomes completely trivial. With @ridiculousfish's proposal it becomes something more and adjusts some kind of store so the variable will also be joined on that character when exported. string split is just a command that splits a string - basically our version of cut.

The most natural way of doing this is via a new option for the set command.

That's another option, though I'm not completely sold on the semantics. E.g. set -S ':' PYTHONPATH. Would that set PYTHONPATH to the empty list or would it just split the existing PYTHONPATH? So far all set options have done the former, so you'd have to do set -S ':' PYTHONPATH $PYTHONPATH. Or we'd make it _not_ do that and have inconsistency within the same tool.

In other words, why are the elements of $PATH separated by colons in the exported environment but spaces in the output of echo "PATH is $PATH"

That's actually a good question. Of course you wouldn't expect the separator to show up in say for p in $PATH; echo $p; end, but joining it with the per-variable separator char might be the right thing to do. Of course there's string join to do it manually.

That behavior is, however, surprising. I'm willing to bet that if you ask 100 people what happens when a var with more than one element is exported 90 of them will say the values are concatenated with space as a separator.

There's a general problem with doing design-by-survey and fish. Because the surveyed people would frequently have knowledge of bash (and to a lesser extent other POSIXy shells) while the very idea of fish is to do something _better_ by abandoning at least some of POSIX.

That's not to say it's completely worthless, it's just something to keep in mind - if we stuck to these kinds of ideas, we'd have bash's word-splitting behavior and if-fi.

Would set -S ':' PYTHONPATH set PYTHONPATH to the empty list or would it just split the existing PYTHONPATH?

It would set it to an empty list. If the user wants to retain the existing value they have to explicitly include it (see below).

We already have all the needed capabilities with the exception of a means to configure the character (or the empty string) to be used when concatenating array elements of a given var for export or interpolation. If someone wants to manipulate a var like PYTHONPATH they can either treat it as a simple string:

set PYTHONPATH "/a/new/dir:$PYTHONPATH"

Or they can treat it as an array:

set -S ":" PYTHONPATH /a/new/dir (string split ":" $PYTHONPATH)

Note that my proposal to use the split/concatenation character rather than a space when interpolating into a string provides consistent behavior regardless of whether or not the user splits the var into an array.

I am definitely not suggesting design by committee. That way lies madness and bogosities like zsh. I'm simply pointing out that when given two or more options with no other reason to choose one over another then picking the option that will least surprise a user of the shell is the best choice. It's also why I'm (for the moment) opposed to introducing new commands or behaviors such as auto-splitting vars (other than PATH and CDPATH, of course). This is the sort of thing that is done infrequently and usually only in config.fish and a few specialized functions like Anaconda's "activate" script. And the way to make the latter behave correctly regardless of whether or not the user has already split the var into an array in his config.fish is to always treat it as a string that needs to be split. For example, if PYTHONPATH needed to be amended it might do something like this:

# Hypothetical snippet from the Anaconda activate script.
if test (count PYTHONPATH) -gt 1
    set -S ':' PYTHONPATH /activated/python/tree $PYTHONPATH
else
    set PYTHONPATH "/activated/python/tree:$PYTHONPATH"
end

Or, more simply,

# Hypothetical snippet from the Anaconda activate script.
set -S ':' PYTHONPATH /activated/python/tree (string split ':' $PYTHONPATH)

Yes, that potentially turns what may have been a simple string into an array. But with my rule that the character specified by the -S switch is used when exporting and interpolating that conversion into an array should not matter in practice. There is, however, one corner case. What happens if the user has not explicitly converted the var to an array in his config.fish and he then runs something like the hypothetical script above. The var potentially becomes a multi-element array which means that if they subsequently do

for elem in $PYTHONPATH
   echo $elem
end

That won't execute the body of the for loop just once with the value in the form of colon-separated directories as the user might expect since they were unaware of the splitting done by the hypothetical "activate" script. I think we can live with that as it would be perverse for a user to do something like that.

tl;dr I think lists should "remember" their delimiter and below is why.


I agree with a lot of the most of the above. One thing that still seems tediuous though is that the commands above still seem overly verbose; i.e., it's sometimes simpler to describe some of these commands in plain English.

As an example (and I'm not focusing on length as much as on the number of repeated things):

  • fish: set -S ':' PYTHONPATH /activated/python/tree (string split ':' $PYTHONPATH)
  • English: "append /activated/python/tree to PYTHONPATH (:-delimited`)"

There are two repeated things here: PYTHONPATH and the delimiter :. That PYTHONPATH should be repeated is arguably okay for two reasons, and neither of these two reasons apply to the delimiter.

  1. It's not hard to intuit what is going on when someone says set PYTHONPATH /activated/python/tree $PYTHONPATH, because this is very similar to things like i = 2 + i, which is a very familiar concept/idiom. (But we still have shortcuts like +=, which is why I propose the --append flag below.) On the other hand, when people think about appending to a list, they do not think about splitting and joining on a delimiter. They don't think about converting the entire list to some other format, doing the real operation to it, and putting it back. In their mind, they naturally read the delimiter as a delimiter instead of changing it to some "internal" or "preferred" delimiter.
  2. Using a plain set command for appending saves adding another command for joining two different lists. On the other hand, converting from one delimiter to another is something that we ideally never want the user to do manually, mostly for the reason above.

In contrast, I suggest another way to specify the delimiter on lists: associate it with the list indefinitely. So the example above might be done as follows:

# Changes the delimiter for this list. This might be done in some global config file for common lists as this one.
set -S ':' --no-modify PYTHONPATH
# or, workaround if you don't want to add extra options to set:
set -S ':' PYTHONPATH $PYTHONPATH

# The actual append operation
set --prepend PYTHONPATH /activated/python/tree
# or, workaround if you don't want to add extra options to set:
set PYTHONPATH /activated/python/tree $PYTHONPATH

Implications/follow-up questions/etc:

  1. This is fairly compatible with the current suggestion. Here are the changes required:

    • Make the delimiter stick (possibly by using another variable as __fish_sep_PYTHONPATH)

    • (optional) Add a flag I'm currently calling --no-modify, which tells fish to change the delimiter of a list without changing its contents. Possibly also add --append and --prepend flags because of reason (1) above. Anyway this one isn't required, as the workaround above shows, à la how appending and prepending are done in fish today.

  2. In fish, lists must at least _be treated_ like first-class citizens. This means that changing the delimiter should change the string representation, not the list representation (unless the delimiter is present in one of the elements, in which case splitting there is inevitable). For example, if you're changing delimiters from , to :, ["0:1", "2"] should become ["0", "1", "2"] and not ["0", "1,2"] (which is what would happen if you simply change the delimiter without changing the string backing the list). This behavior should maximize compatibility with the current behavior and the fact that there is currently a default, immutable delimiter.

Here's the bottom line:

  • This involves a lot less tokens, and almost nothing is repeated.
  • This parallels the mental model many users have. Users think in these terms: "prepend", "set the delimiter", "don't modify".
  • This looks like the only right way to do this task (the old way now looks extra clumsy), and so these additions do not destroy orthogonality.
set --no-modify -S : PYTHONPATH
set --prepend PYTHONPATH /activated/python/tree

Thank you, @szhu, for the detailed comment regarding my proposal. However, there are many problems with your proposed solution. For example, the addition of the --no-modify option does in fact modify the variable by converting it into a list and thus does modify the variable. While I reject nearly all the elements of your proposal it did make me think about a more straightforward solution that would address most, if not all, of your points. Perhaps there should be a mechanism for telling fish that a given env var should always be automatically split and reconstituted on a given token (e.g., ":" or " "). This might be called an auto-array designation and when executed any existing value would be immediately split if it was not already an array.

A new option could be added to the set command to activate this behavior. However, I'm concerned that doing so is ambiguous and could be interpreted as defining a variable with no value. Would adding a -A token option to the set command be unambiguous? For example:

set -x -A ':' PYTHONPATH

Presumably that would immediately convert any existing PYTHONPATH env var to an array after being split on a colon. It would conversely result in the values being concatenated on a colon when exported or interpolated into a string. Similarly, even if PYTHONPATH did not exist at the time that command was executed the auto-array specification would be remembered and subsequent uses would be affected. For example, this would obviously create an array:

set PYTHONPATH /a/path /b/path /c:/d/path

But what about that last argument? Should it be automatically split into two tokens?

Note that this behavior should only apply to exported vars and an error would be raised otherwise. There are also some corner cases that need to be spelled out. For example, what if the original auto-split declaration includes values as in this example:

set -x -A ':' PYTHONPATH 'hello:goodbye' $PYTHONPATH

Should those values be split on the auto-split token? Or should it result in an error and require modifying the value be done in a separate statement? And whichever syntax is chosen you still have the issue of what to do about values that contain the auto-split token. The devil is in the details. Which is to say there may be other ramifications of this proposal I haven't thought about. My original proposal with a more verbose syntax avoids those issues as far as I can tell.

@krader1961, thanks for your response. However, you seem to think that I'm converting variables from strings to lists. I think you're misunderstanding one important concept in fish: every variable is a list of strings. The variables that appear to be strings are actually lists of length 1. fish treats them no differently than lists of length 0 or 2 or any other length.

Also, note that while the underlying string used to pass around environment variables might change when you change delimiters, one of the fortes of fish is that the user typically does not need to think about delimiters at all. This is why I recommend that all the -S option does is specify how this list should be converted to a string when it is _exported outside_ fish. -S should not change the in-fish list representation (except in cases where it's impossible to represent that list using the target delimiter).

By the way, here is one example that shows how clean my proposal is. Here is code for converting a variable $L back to the default delimiter of \x1e. It will have absolutely no effect on any variable (any scope, any number of items) that is able to be created in fish today.

set -S \x1e L $L

One more thing: the --no-modify family of arguments are just shortcuts. Here are their equivalents:

| shortcut | equivalent |
| --- | --- |
| set [other args] --no-modify L | set [other args] L $L |
| set [other args] --prepend L $TOADD... | set [other args] L $TOADD... $L |
| set [other args] --append L $TOADD... | set [other args] L $L $TOADD... |

(I've stated the following before but I think I can do a better job explaining it now.) By emphasizing how "dumb" these three arguments are, some may question if they're needed at all. One may cite that fish has a design principle of orthogonality. When all things are orthogonal, this means that for any big task you want to do, it should be obvious which set of features to pick to do that task -- there should be only one right way to do it. Here, I indeed add another way to prepend to/append to/prevent modification to a list, but this is only because I think the equivalents being replaced are unnecessarily verbose; they should not be the right ways to append modify lists. One way to convince yourself of this is to think about how you think of appending to a list. You probably think "append $TOADD to $L" rather than "set $L to $L $TOADD".

Let me know what you think, and if this makes a more convincing case for my proposal. (Also it's pretty common for me to misunderstand things so feel free to correct me.)

@szhu I am quite aware that all vars in fish are lists of zero, one, or more values. You also apparently did not read my earlier comments where I clearly state that the associated delimiter should not affect the internal representation or how the values are persisted to the universal data store (other than storing the delimiter). You also did not address my previous points. Consider your last example:

set -S \x1e L $L

What should that do if L already contains two or more values? Presumably nothing other than changing the associated delimiter. Would the $L argument be optional in that case? Or should it first convert the existing array into a simple string (presumably concatenating using the existing delimiter) then split that string on the new delimiter? As I said before, the devil is in the details.

Ultimately the established designers and maintainers will decide whether or not your --no-modify family of functions should be added but I vote no as they don't add enough value in my opinion relative to their cost.

Sorry, this thread is long, I must have missed your acknowledgement of this above; good to know we're on the same page! I think I've addressed most of your concerns above as well, but not all of it. I'll specifically address each of your concerns below.


1. Is the $L optional in set -S \x1e L $L?

The existing behavior of set will not change. Under current behavior, set L $L does not change L and set L makes L an empty list. Same with set -S \x1e L $L and set -S \x1e L.

1.1 Doesn't set -S \x1e L $L seem overly verbose for just changing the delimiter?

Slightly. That's why I propose the --no-modify option as a shortcut for this.

But my plan won't fall apart if this shortcut does not exist. We already deal with this issue every day when we append lists: set PATH ~/.bin $PATH. That's why, for the same reason, I propose --prepend and --append as well.

2. How would the conversion process take place?

Let's say our old delimiter is \x1e and our new one is :, and that we have a fish list ["hello", "world"] (exported as hello\x1eworld). There two basic ways to do conversion ("conversion options"):

  1. Use ["hello", "world"] and convert it to​ ["hello", "world"] (exported as hello,world)
    Advantage: The list representation doesn't change.
  2. Use hello\x1eworld and convert it to ["hello\x1eworld"] (exported as hello\x1eworld)
    Advantage: The exported value representation doesn't change.

Note that this is from a UI perspective, not an implementation perspective; we're talking about what it appears like to the user. I'll cover implementation in the next question. _Note: the rest of this answer is new stuff that I have not addressed above, prompted by your questions. Thanks!_

Within fish. First, if we are working entirely in fish, lists are first-class and you should never have to worry about delimiters, and so neither are required. (Again, "should" is from a user's perspective, as developers, it is our responsibility to make this true.)

Changing export format of vars. Thus, the only reason a user will need to change delimiters is to change the exported string for programs that read environment variables. For lists that are created in fish, we will definitely use conversion option 1, because the the variable's meaning as a list is important and well-defined, so we need to preserve the list representation.

Changing import format of vars. However, for environment variables like PATH that are initially created outside of fish, we need to, for a list that already has a delimiter, tell fish what that delimiter is. For this we can use conversion option 2.

2.1 How would this be implemented?

Fish even though the user should not need to know this, fish actually stores lists as strings. The variable x is stored as hello\x1eworld. Under my proposal, there would be another variable, __fish_delimiter_x, that specifies the delimiter of variable x. It does not exist right now and so we use the default delimiter, \x1e.

For conversion option 1:

  1. Split the variable on old the delimiter, resulting in a true list in the implementation language (C++).
  2. Join the list using the new delimiter, resulting in a new string.
  3. Save the new delimiter into __fish_delimiter_x.

For conversion option 1, an equivalent implementation:

  1. In the variable, replace all occurrences of the old delimiter with the new one.
  2. Save the new delimiter into __fish_delimiter_x.

For conversion option 2:

  1. Save the new delimiter into __fish_delimiter_x.

2.2 If we need both conversion options, how would the user specify which to use?

Perhaps we can have two options: -D or --convert-delimiter for option 1 and -d or --set-delimiter for option 2.

2.3 Do we really need both options?

Under current fish we choose to assume that we won't see \x1e in the wild outside of fish. If we keep this as the default delimiter and keep this assumption, conversion option 1 is sufficient to both convert and set the delimiter and we will not need conversion option 2. (A easy way to be convinced of this is to realize that if the assumption is true, when converting lists created externally, conversion option 1's step "replace all occurrences of the old delimiter with the new one" will do nothing, reducing the entire conversion to conversion option 2.)


@ridiculousfish, I would appreciate your feedback on this too, specifically regarding the UI and implementation details. Thanks!

It seems that there are two issues here. Let's just talk about the first one?

+1 for true array

Is that separator trick really needed for fish? Ping #627

Revisiting this in light of my fix for issue #2106 wherein I noticed there were two incompatible ways to convert the string representation for variable values to arrays. One of which incorrectly elided empty elements. The core issue is that class env_var_t is based on wcstring rather than a vector of wcstring. Whether or not changing that is worth the effort is open for debate.

If you're following this issue I encourage you to take a look at, and perhaps test, PR #4082. I decided the best way to address this is "tied" variables similar to the feature by the same name in zsh.

It seems like the questions surrounding these colon-exported lists and what envvars should be included in this whitelist, is something yet undecided. What is the current status on this matter? Can we hope for a final solution to this topic? Just today, I again run into the trap that LD_LIBRARY_PATH is not in the whitelist...

To summarize: Unix environment variables are strings, so array-like env vars must use a delimiter. Most use colons ($PATH); though not all do ($LESS). We would like to just use colons to delimit all arrays on import/export (and indeed fish used to work this way); the problem is some flat variables contain colons ($DISPLAY, $SSH_CONNECTION).

The goal is to make delimited variables work naturally with fish's array support. @szhu suggested enhancing set to track a delimiter, but IMO enhancing set is the wrong place to attach this:

  • Annoying interaction between setting the variable and setting its delimiter (motivating --no-modify).
  • Tricky questions around how delimiters interact with variable scope.
  • Problems around universal variables. We'd have to teach uvars to remember the delimiter, and also about --no-modify since there's no way to set a variable to its current value with uvars.

On review I favor @faho's splitenv idea. splitenv name would split the existing variable on colons, that's it. It's wonderfully simple. Variables that don't use colons are rare enough that we don't need special support for them.

The downside is that fish exported arrays would be re-imported as a colon-delimited string; in practice I think this will be rare.

We should not inflict this splitenv complication on users if we can avoid it. So I want to go one step further, and expand the colon whitelist to all environment variables whose name ends in PATH, e.g. LD_LIBRARY_PATH, PYTHONPATH, etc. In practice this should do the right thing almost all the time; anyone bitten by it can use string join to fix the value up.

So what I propose (really faho's proposal):

  • Export all arrays using colons; no more ASCII record separator.
  • Implement a splitenv function that splits a variable on colons. This can be written in fish script.
  • Enhance the colon-separated whitelist to all variables ending in PATH.

I believe this will address most of the pain associated with colon-delimited arrays in a minimalist and compatible way.

regarding the splitenv idea:

Let's say I wanted fish to export a list whose items contain colons, for example: ["item1", "item:2"]. (I don't think this is a rare occurrence, especially when arrays are used to store user input.)

Will the list be exported as item1:item:2? If so, it will be impossible to recreate the original list after export.

Also, having an immutable whitelist for variables feels wrong, although having the whitelist be *?PATH feels less wrong. (That was another reason for my proposal of storing the delimiter as a variable -- the whitelist can be changed by setting a variable.)

@szhu You're right. Exported-lists-can't-contain-colons is a problem that Unix already suffers from:

Since \

so I don't feel too bad about introducing the same limitation for fish (for exported variables only).

Also this problem only occurs when exporting an array to a recursively-invoked fish instance, which I think will be rare. If this does prove common, we could either attach sidecar data in another variable, or use a different array separator when invoking fish recursively. My guess is we won't need to go that far.

I agree that the edge case where the proposal doesn't work well isn't very common, but I'm afraid that it would be really bad if people do run into it.

Here's an example that is only partially-contrived:

set -x TEST color:red 
set -x TEST2 color:red font:serif
set -x TEST_PATH color:red 
set -x TEST2_PATH color:red font:serif
exec fish
echo $TEST #=> color:red
echo $TEST2 #=> color:red:font:serif
echo $TEST_PATH #=> color red 
echo $TEST2_PATH #=> color red font serif

I can imagine a lot of new users being confused after observing the above.

I think the following behavior would be significantly more pleasant:

set -x TEST color:red 
set -x TEST2 color:red font:serif
set -x TEST_PATH color:red 
set -x TEST2_PATH color:red font:serif
exec fish
echo $TEST # color:red
echo $TEST2 # color:red font:serif
echo $TEST_PATH #=> color:red 
echo $TEST2_PATH #=> color:red font:serif

I'd like your and the community's thoughts on this one.

Why not escape existing colons? That would preserve the distinction.

Escaping makes sense to me.

Could it still be confusing that whether a variable is an array won't be remembered?

set -x TEST2 color:red font:serif
set -x TEST2_PATH color:red font:serif
exec fish
echo $TEST2 # color\:red:font\:serif
echo $TEST2_PATH #=> color:red font:serif

Yes. My hypothesis is that exporting arrays is not common, outside of path lists.

MANPATH has a special meaning for double colons (::) - see #2090 - does this throw a spanner in the works?

I'd also argue that having a sidecar variable FISH_ARRAY_VARIABLES=FOO\t:\nBAR\t;\nBAZ\t-\n in any case would be a good hint for instances of Fish being invoked to pick up the array variables again, without disturbing other processes and without requiring a "we're involing fish now" checker..

re: https://github.com/fish-shell/fish-shell/issues/436#issuecomment-392409659 @zanchey
I haven't read #2090 in detail, but I believe converting between the colon-delimited-string and array forms is completely seamless (except for when colon ~does not appears~ appears in array items).

To include a double-colon in MANPATH, just add an empty string where the double colon should appear:

$ set -x MANPATH 1 2 '' 3
# Check if it's set
$ bash -c 'echo $MANPATH'
1:2::3

To start MANPATH with a colon, just add an empty string item to the beginning:

$ set -x MANPATH '' 1 2 3
# Check if it's set
$ bash -c 'echo $MANPATH'
:1:2:3

I haven't followed everything here, but as a user I want to advocate for "no configuration".
I think set -S and splitenv are forms of configuration. Some users would do them in fish.config and handle PYTHONPATH as array. Others would not and handle PYTHONPATH as single colon-delimited word. Copy-pasting stackoverflow advice and running scripts manipulating PYTHONPATH from one user to another would then not always work...

A fixed "if it ends with PATH" rule is configuration-free and sounds as perfect as you can get :+1:
(I don't have opinion on whether it's worth the backward incompatibility)
Yes, set -x TEST2_PATH color:red font:serif would be imported as color red font serif array but that's the deal with exporting variables. You can't really set an exported var to an array without understanding how it works.

Yes. My hypothesis is that exporting arrays is not common, outside of path lists.

@ridiculousfish that may be true in current shells, but I imagine that as fish gains more traction, users might want to leverage fish's ability to send lists to child fish shells. I can imagine there may eventually be programs/plugins that manage the state of a fish session (I'll check back on this comment in a few years to see if this is true), and being able to universally auto-de/serialize lists will make that code more clean and less workaround-y.


Sort of similar but slightly different thought: Treating PATH as special case is an anachronistic edge case that users probably understand only if they have a history on typical use cases of shells. This limits fish's ability to be used as a general scripting language, and limits some potential future use cases.

@ridiculousfish I think one possible solution is to associate each environment variable/array with its own separator (and you can keep '\x1e' or ' ' or ':' as the default one), and the user who creates the environment variable is responsible for choosing appropriate separators to avoid conflicts. The command may be like: set --separator ':' TMP 1 2 3. Thus for those well-known environment variables, users can just choose the corresponding well-known separators which can also be recognized by other programs and can make fish more compatible with more programs (such as Python).

For those who are only reading recent comments, just a note that @thuzhf's set --separator recommendation is the same as the set -S recommendation mentioned repeatedly throughout this thread. To get more context on that discussion, you can grep this page for set -S.

@szhu Sorry for not noticing the previous set -S. That's basically what I want too. I also noticed that there were several concerns others had on this new option. I can give my thoughts on these concerns below (since fish's set haven't used -s as an option, I will use -s to refer to --separator hereafter):

  1. --no-modify does modify something. Yes, and you should change the name to be explicit, e.g., --change-separator.
  2. There are some corner/tricky cases. This is basically due to the not-well-defined syntax, and can be avoided naturally if we give a strict definition of the syntax. For example:

    1. Basic idea: every var (list) is associated with its own separator when defined (default is ' '). This separator will be used when this var is created from string and when it is converted to string (this is a common idea in some languages such as Python's join() function). A var is converted to string when exported or when the user wants to do this.

    2. How to create env vars



      1. set ENV_VAR a b c. Without -s, we choose ' ' as the default separator.


      2. set -s ':' ENV_VAR. In this case, ENV_VAR is set to be an empty list.


      3. set -s ':' ENV_VAR a b:c d e:f. In this case, users who write this code should clearly understand ':' is the separator and understand ENV_VAR will be an array like ['a b', 'c d e', 'f'] and will be exported as 'a b:c d e:f'. What if you want the exported ENV_VAR starts with spaces and ends with spaces? You should use escapes like: set -s ':' ENV_VAR \ a b:c d e:f\. Then ENV_VAR will be [' a b', 'c d e', 'f '] and will be exported as ' a b:c d e:f '.


      4. set -s ':' ENV_VAR a b:c d e:f $ENV_VAR. In this case, it depends on how $ works. If it is defined as extracting the string value of ENV_VAR instead of the list, then this command will be the same as just replacing $ENV_VAR with its string value converted from the underneath list, and in this case, set -s ':' ENV_VAR a b:c d e:f:$ENV_VAR is probably what you really want (notice the : after f); if it is defined as extracting the variable of ENV_VAR (which is a list instead of string), then this should become a list extending operation just as in python. E.g., in the latter case, if ENV_VAR is ['x', 'y'] before, then after this operation ENV_VAR will become ['a b', 'c d e', 'f', 'x', 'y']. What if ENV_VAR's previous separator is not ':'? In the former case, it's your responsibility to make sure you are doing the right thing, e.g., you probably should use consistent separator by changing original separator to ':' or by changing current separator to be the original one. In the latter case, this will set the separator of this array from the original one (no matter what it is) to ':'.



    3. How to change separator



      1. set --change-separator ':' ENV_VAR. If ENV_VAR doesn't exist, the program should exit with error code other than 0. Easy and explicit enough.



    4. How to view separator



      1. set --view-separator ENV_VAR.



Besides, I really think this problem is obvious and urgent and a big pain point for users and hope this problem can be solved as soon as possible because this really affects user experience greatly. Actually I haven't encountered other (even very small) problems for now using fish except this so big one.

I really think this problem is obvious and urgent

@thuzhf: I'd say you're overestimating that.

One reason being that your problem in #5169 was with $LD_LIBRARY_PATH, but that's not actually a list in fish! You should set it like set LD_LIBRARY_PATH "$LD_LIBRARY_PATH:/some/path", just like in other shells.

Fish turns exactly three inherited/exported variables into lists automatically:

$PATH, $MANPATH and $CDPATH. And exactly this list will have a ":" separator when exported.

Other "standardized" variables like $LD_LIBRARY_PATH should not be handled as lists in fishscript, so you don't have this issue. Variables that aren't standardized you can handle however you want, since other programs won't do anything with them anyway, so the separator is non-critical.

@faho Thanks for your clear explanation. That really makes a lot sense to me. OK, I can say this problem is solved for me.

I took a look at the MANPATH issue described in #2090. The scenario is to append to manpath such that it continues using system paths.

In bash one would write this as export MANPATH="$MANPATH:/new/path". If MANPATH is set, this will append to it. If unset, this will prepend a colon, which is a man-specific indication to use system directories. This syntax doesn't work in fish; the problem is that MANPATH is an array and so "$MANPATH" will have spaces instead of colons.

A "tied variables" approach would allow us to have e.g. fish_manpath as an array that mirrors MANPATH as a colon-separated string. This could be built entirely in fish script. However we would want to do this for all path-like variables, not just MANPATH, and that would be a significant compatibility break which it's not clear how to manage. Also it has the same problems, e.g. the manpath array-variable in zsh is awkward to append to, so it's not clear why it even exists.

My proposal here doesn't make the MANPATH situation better or worse; I think the thing to do is punt and just have an easy story for appending to MANPATH, which is this:

set -q MANPATH || set MANPATH ''
set -x MANPATH $MANPATH /new/path

That is not too painful to paste into config.fish.

My proposal here doesn't make the MANPATH situation better or worse; I think the thing to do is punt and just have an easy story for appending to MANPATH, which is this:

@ridiculousfish: I've been thinking of going one step further, actually: Split these special variables on ":" also on assignment, and join them with ":" instead of space in quoted expansion.

That means when you do set -gx MANPATH "$MANPATH:/new/path", fish goes and performs the splitting automatically, resulting in the equivalent of set -gx MANPATH "" /new/path.

Now, this means that ":" can't appear in a path in $MANPATH (and $PATH, and $CDPATH), but they can't do that anyway because it'd break non-fish utilities!

That would also allow us to maybe one day remove the special handling, because it adds a cross-compatible way of handling it - you'd just have to assign with the :, and use it with (string split : -- $MANPATH), and it would work even if that handling were removed.

@faho I'm warming to the idea - how would the user mark a variable as receiving this special treatment? Would splitenv do it?

how would the user mark a variable as receiving this special treatment?

My idea was actually to not allow marking at all - just leave it as special behavior for $PATH et al. Which would let us get away from listifying at some point in the future.

However, I've since come to understand that allowing this for other variables helps us with other variables as well - e.g. I've said before that my $EDITOR is set as one element (set EDITOR "emacs -nw") for compatibility with external tools, but fish would like it better if it were a list.

So I'd probably default to _space_ as a delimiter, unless it's a PATH-like (and assuming that it is if the name ends in PATH is probably okay).

Would splitenv do it?

I don't really love introducing another builtin for this, so I'd probably go with the argument-to-set option.

I agree that "special-casing PATH/MANPATH/CDPATH is weird; we need a more general solution".

I propose that we STOP special-casing PATH/MANPATH/CDPATH. They would be treated (by the fish end-user) the way they are in other shells. $PATH (and the others) would be a single string (or in fish jargon a list with a length of 1) with colons in it. Note that I am referring to the fish user experience, not how these things are handled internally; I don't know what the implementation inside fish would look like--I rely on others to point out any problems there.

Granted, it would have the disadvantage of a backward incompatibility, but I think it would be worth it as a big gain in simplicity and elegance. I think it would address #2090 too.

What does everyone think?

5245 has been merged, so this seems solved.

Was this page helpful?
0 / 5 - 0 ratings