Julia: API consistency review

Created on 2 Feb 2017  ·  131 comments  ·  Source: JuliaLang/julia

I'm starting this as a place to leave notes about things to make sure to consider when checking for API consistency in Julia 1.0.

  • [x] Convention prioritization. Listing and prioritizing our what-comes-first conventions in terms of function arguments for do-blocks, IO arguments for functions that print, outputs for in-place functions, etc (https://github.com/JuliaLang/julia/issues/19150).

  • [ ] Positional vs keyword arguments. Long ago we didn't have keyword arguments. They're still sometimes avoided for performance considerations. We should make this choice based on what makes the best API, not on that kind of historical baggage (keyword performance issues should also be addressed so that this is no longer a consideration).

  • [ ] Metaprogramming tools. We have a lot of tools like @code_xxx that are paired with underlying functions like code_xxx. These should behave consistently: paired macros and functions should have similar signatures, and functions with similar signatures should have similar macro versions. Ideally, they should all return values rather than some returning values and others printing results, although that might be hard for things like LLVM code and assembly code.

  • [ ] IO <=> file name equivalence. We generally allow file names as strings to be passed in place of IO objects and the standard behavior is to open the file in the appropriate mode, pass the resulting IO object to the same function with the same arguments, and then ensure that the IO object is closed afterwards. Verify that all appropriate IO-accepting functions follow this pattern.

  • [ ] Reducer APIs. Make sure reducers have consistent behaviors – all take a map function before reduction; congruent dimension arguments, etc.

  • [ ] Dimension arguments. Consistent treatment of "calculate across this [these] dimension[s]" input arguments, what types are allowed etc, consider whether doing these as keyword args might be desired.

  • [ ] Mutating/non-mutating pairs. Check that non-mutating functions are paired with mutating functions where it makes sense and vice versa.

  • [ ] Tuple vs. vararg. Check that there is general consistency between whether functions take a tuple as the last argument or a vararg.

  • [ ] Unions vs. nullables vs. errors. Consistent rules on when functions should throw errors, and when they should return Nullables or Unions (e.g. parse/tryparse, match, etc.).

  • [ ] Support generators as widely as possible. Make sure any function that could sensibly work with generators does so. We're pretty good about this already, but I'm guessing we've missed a few.

  • [ ] Output type selection. Be consistent about whether "output type" API's should be in terms of element type or overall container type (ref #11557 and #16740).

  • [x] Pick a name. There are a few functions/operators with aliases. I think this is fine in cases where one of the names is non-ASCII and the ASCII version is provided so people can still write pure-ASCII code, but there are also cases like <: which is an alias for issubtype where both names are ASCII. We should pick one and deprecate the other. We deprecated is in favor of === and should do similarly here.

  • [ ] Consistency with DataStructures. It's somewhat beyond the scope of Base Julia, but we should make sure that all of the collections in DataStructures have consistent APIs with those provided by Base. The connection in the other direction is that some of those types may inform how we end up designing the APIs in Base since we want them to extend smoothly and consistently.

  • [ ] NaNs vs. DomainErrors. See https://github.com/JuliaLang/julia/issues/5234 – have a policy for when to do which and make sure it is followed consistently.

  • [ ] Collection <=> generator. Sometimes you want a collection, sometimes you want a generator. We should go through all our APIs and make sure there's an option for both where it makes sense. Once upon a time, there was a convention to use an uppercase name for the generator version and a lowercase name for the version that's eager and returns a new collection. But no one ever paid any attention to that, so maybe we need a new convention.

  • [ ] Higher order functions on associatives. Currently some higher order functions iterate over associative collections with signature (k,v) – e.g. map, filter. Others iterate over pairs, i.e. with signature kv, requiring the body to explicitly destructure the pair into k and v – e.g. all, any. This should be reviewed and made consistent.

  • [x] Convert vs. construct. Allow conversion where appropriate. E.g. there have been multiple issues/questions about convert(String, 'x'). In general, conversion is appropriate when there is a single canonical transformation. Conversion of strings into numbers in general isn't appropriate because there are many textual ways to represent numbers, so we need to parse instead, with options. There's a single canonical way to represent version numbers as strings, however, so we may convert those. We should apply this logic carefully and universally.

  • [ ] Review completeness of collections API. We should look at the standard library functions for collections provided by other languages and make sure we have a way of expressing the common operations they have. For example, we don't have a flatten function or a concat function. We probably should.

  • [ ] Underscore audit.
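The IO <=> file name equivalence item describes a common pattern. A minimal sketch of it follows; `countwords` is a hypothetical function used only for illustration. The string method opens the file, forwards the resulting IO object to the IO method, and relies on the `open(f, filename)` form to close the file afterwards.

```julia
# Hypothetical example function: the IO method does the real work.
function countwords(io::IO)
    n = 0
    for line in eachline(io)
        n += length(split(line))
    end
    return n
end

# The file-name method opens the file (read mode by default), passes the IO
# object to the IO method, and open(f, path) closes it once f returns.
countwords(filename::AbstractString) = open(countwords, filename)

# Usage: write a small temporary file, then count its words both ways.
path, tmpio = mktemp()
write(tmpio, "one two\nthree\n")
close(tmpio)
n_file = countwords(path)
n_io = countwords(IOBuffer("one two\nthree\n"))
```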


Most helpful comment

Underscore audit

The following is an analysis of all symbols exported from Base which contain underscores, are not deprecated, and are not string macros. The main thing to note here is that these are exported names only; this does not include unexported names that we tell people to call qualified.

I've separated things out by category. Hopefully that's more useful than it is annoying.

Reflection

We have the following macros with corresponding functions:

  • [ ] @code_llvm, code_llvm
  • [ ] @code_lowered, code_lowered
  • [ ] @code_native, code_native
  • [ ] @code_typed, code_typed
  • [ ] @code_warntype, code_warntype

Whatever change is applied to the macros, if any, should be similarly applied to the functions.
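A small sketch of the current macro/function pairing, assuming Julia 1.x, where the @code_* macros live in the InteractiveUtils standard library. The function form takes a function and a tuple of argument types, the macro form takes a call expression, and code_llvm prints rather than returns, so its text is captured here through an IO argument.

```julia
using InteractiveUtils  # provides the @code_* macros and code_llvm outside the REPL

# Function form: a function plus a tuple of argument types; returns a vector
# of lowered code objects, one per matching method.
lowered = code_lowered(+, (Int, Int))

# Macro form: a call expression; the macro extracts the argument types itself
# and unwraps the result when there is exactly one match.
lowered_m = @code_lowered 1 + 2

# code_llvm prints instead of returning, so capture its output in a buffer.
buf = IOBuffer()
code_llvm(buf, +, (Int, Int))
llvm_ir = String(take!(buf))
```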

  • [x] module_name -> nameof (#25622)
  • [x] module_parent -> parentmodule (#25629, see #25436 for a previous attempt at renaming)
  • [x] method_exists -> hasmethod (#25615)
  • [x] object_id -> objectid (#25615)
  • [ ] pointer_from_objref

pointer_from_objref could perhaps do with a more descriptive name, maybe something like address?

Aliases for C interop

The type aliases containing underscores are C_NULL, Cintmax_t, Cptrdiff_t, Csize_t, Cssize_t, Cuintmax_t, and Cwchar_t. Those that end in _t should stay, as they're named to be consistent with their corresponding C types.

C_NULL is the odd one out here, being the only C alias containing an underscore that isn't mirrored in C (since in C this is just NULL). We could consider calling this CNULL.

  • [ ] C_NULL

Bit counting

  • [ ] count_ones
  • [ ] count_zeros
  • [ ] trailing_ones
  • [ ] trailing_zeros
  • [ ] leading_ones
  • [ ] leading_zeros

For a discussion of renaming these, see #23531. I very much favor removing the underscores for these, as well as some of the proposed replacements in that PR. I think it should be reconsidered.
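For reference, this is what the bit-counting functions listed above compute, sketched on 8-bit values so the bit patterns are easy to read (an 8-digit binary literal such as `0b00001011` is a UInt8).

```julia
x = 0b00001011             # UInt8 with bit pattern 0000_1011

c1 = count_ones(x)         # number of 1 bits
c0 = count_zeros(x)        # number of 0 bits in the 8-bit value
t1 = trailing_ones(x)      # run of 1s at the least-significant end
t0 = trailing_zeros(0x08)  # 0000_1000
l0 = leading_zeros(x)      # run of 0s at the most-significant end
l1 = leading_ones(0xf0)    # 1111_0000
```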

Unsafe operations

  • [ ] unsafe_copyto!
  • [ ] unsafe_load
  • [ ] unsafe_pointer_to_objref
  • [ ] unsafe_read
  • [ ] unsafe_store!
  • [ ] unsafe_string
  • [ ] unsafe_trunc
  • [ ] unsafe_wrap
  • [ ] unsafe_write

It's probably okay to keep these as-is; the ugliness of the underscore further underscores their unsafety.

Indexing

  • [ ] broadcast_getindex
  • [ ] broadcast_setindex!
  • [ ] to_indices

Apparently broadcast_getindex and broadcast_setindex! exist. I don't understand what they do. Perhaps they could use a more descriptive name?

Interestingly, the single index version of to_indices, Base.to_index, is not exported.

Traces

  • [ ] catch_backtrace
  • [x] catch_stacktrace -> stacktrace(catch_backtrace()) (#25615)

Presumably these are the catch block equivalents of backtrace and stacktrace, respectively.
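A minimal sketch of that pairing in Julia 1.x: inside a catch block, catch_backtrace() returns the raw backtrace of the exception being handled, and stacktrace converts it into StackFrame objects, which is what the deprecated catch_stacktrace used to do in one call.

```julia
# Inside a catch block: the backtrace of the exception being handled.
frames = try
    error("boom")
catch
    stacktrace(catch_backtrace())
end

# Outside a catch block: stacktrace() describes the current call stack.
here = stacktrace()
```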

Tasks, processes, and signals

  • [ ] current_task
  • [ ] task_local_storage
  • [ ] disable_sigint
  • [ ] reenable_sigint
  • [ ] process_exited
  • [ ] process_running

Streams

  • [ ] redirect_stderr
  • [ ] redirect_stdin
  • [ ] redirect_stdout
  • [x] nb_available -> bytesavailable (#25634)

It would be nice to have a more general IO -> IO redirection function into which all of these could be combined, e.g. redirect(STDOUT, io), thereby removing both underscores and exports.

Promotion

  • [ ] promote_rule
  • [ ] promote_shape
  • [ ] promote_type

See #23999 for a relevant discussion regarding promote_rule.
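To make the roles of these three names concrete, here is a sketch using a hypothetical `Wrapped` number type. promote_rule declares the common type for a pair of types, promote_type queries it, and promote converts values to it; promote_shape, despite the shared prefix, combines array shapes rather than types.

```julia
# Hypothetical number type, used only to illustrate the promotion hooks.
struct Wrapped <: Number
    x::Float64
end

# Declaring one argument order is enough: promote_type tries both orders.
Base.promote_rule(::Type{Wrapped}, ::Type{<:Real}) = Wrapped

# promote uses convert to move each value into the common type.
Base.convert(::Type{Wrapped}, r::Real) = Wrapped(r)

T = promote_type(Wrapped, Int)     # consults promote_rule
p = promote(Wrapped(1.5), 2)       # converts both arguments to Wrapped
s = promote_shape((2, 3), (2, 3))  # checks that two array shapes are compatible
```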

Printing

  • [x] print_with_color -> printstyled (see #25522)
  • [ ] print_shortest (see #25745)
  • [ ] escape_string (see #25620)
  • [ ] unescape_string

escape_string and unescape_string are a little odd in that they can print to a stream or return a string. See #25620 for a proposal to move/rename these.
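The dual behavior looks like this in practice: the same function either returns a new String or, when given an IO as its first argument, writes the escaped text to that stream.

```julia
s = "tab\there\n"

# Returning form: builds and returns a new String.
e = escape_string(s)

# Streaming form: writes the same escaped text to an IO argument instead.
io = IOBuffer()
escape_string(io, s)
e_streamed = String(take!(io))

# unescape_string reverses the transformation.
roundtrip = unescape_string(e)
```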

Code loading

  • [ ] include_dependency
  • [ ] include_string

include_dependency. Is this even used outside of Base? I can't think of a situation where you would want this instead of include in any typical scenario.

include_string. Isn't this just an officially sanctioned version of eval(parse())?
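One practical difference, sketched for Julia 1.x: include_string evaluates every top-level expression in a source string (in the given module) and returns the value of the last one, while Meta.parse reads a single expression, so eval(parse(...)) only covers the one-expression case. The variable names below are illustrative.

```julia
src = """
a_demo = 2
b_demo = a_demo + 1
"""

# Evaluates both top-level expressions and returns the last value.
last_value = include_string(@__MODULE__, src)

# Meta.parse reads one expression; Core.eval then evaluates just that one.
single = Core.eval(@__MODULE__, Meta.parse("a_demo * 10"))
```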

Things I didn't bother categorizing

  • [x] gc_enable -> GC.enable (#25616)
  • [ ] get_zero_subnormals
  • [ ] set_zero_subnormals
  • [ ] time_ns

get_zero_subnormals and set_zero_subnormals could do with more descriptive names. Do they need to be exported?

All 131 comments

Apologies if this isn't the appropriate place to mention this, but it would be nice to be more consistent with underscores in function names going forward.

No, this is a good place for that. And yes, we should strive to eliminate all names where underscores are necessary :)

  • consistent treatment of "calculate across this [these] dimension[s]" input arguments, what types are allowed etc, consider whether doing these as keyword args might be desired
  • listing and prioritizing our what-comes-first conventions in terms of function arguments for do-blocks, IO arguments for functions that print, outputs for in-place functions, etc (edit: thought there might already be one open for this)

For @tkelman's second point, see https://github.com/JuliaLang/julia/issues/19150

There was also a recent Julep regarding the API for find and related functions: https://github.com/JuliaLang/Juleps/blob/master/Find.md

Should we deprecate put! and take! on channels (and maybe do the same for futures) since we have push! and shift! on them? Just suggesting removing 2 redundant words in the API.

I am suspicious of shift! being user friendly. A candidate is fetch!; we already have fetch, which is the non-mutating version of take!.

ref #13538 #12469

@amitmurthy @malmaud

Edit: It would even make sense to reuse send and recv on channels. (I'm surprised that these are only used for UDPSockets at the moment)

+1 for replacing put!/take! with push!/fetch!
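For context, this is how the existing Channel verbs relate on a buffered channel: put! enqueues, fetch returns the first item without removing it, and take! removes it.

```julia
c = Channel{Int}(4)      # buffered channel with room for 4 items

put!(c, 1)               # enqueue (blocks only if the buffer is full)
put!(c, 2)

peeked = fetch(c)        # non-mutating: returns the first item, leaves it queued
first_taken = take!(c)   # mutating: removes and returns the first item
second_taken = take!(c)
close(c)
```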

I'll add renaming @inferred to @test_inferred.

Double-check that specializations are consistent with the more generic functions, i.e. not something like #20233.

Review all exported functions to check if any can be eliminated by replacing them with multiple dispatch, e.g. print_with_color.

The typical pairing is push! and shift! when working with a queue-like data structure.

If we're not going to use the typical name pairing for this kind of data structure because we're worried that the operation entails communication overhead that isn't adequately conveyed by those names, then I don't think push! makes sense either. send and recv really might be better.

Maybe double-check that there is general consistency between whether functions take a tuple as the last argument or a vararg.

Perhaps too big for this issue, but it would be good to have consistent rules on when functions should throw errors, and when they should return Nullables or Unions (e.g. parse/tryparse, match, etc.)
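For reference, the convention as it stands in Julia 1.x, where Nullable was removed from Base: parse throws on invalid input, while tryparse and match signal failure by returning `nothing`.

```julia
# parse throws when the text is not a valid number...
parse_failed = try
    parse(Int, "4x2")
    false
catch err
    err isa ArgumentError
end

# ...while tryparse signals failure by returning `nothing`.
ok  = tryparse(Int, "42")
bad = tryparse(Int, "4x2")

# match follows the same convention: `nothing` when there is no match.
m = match(r"\d+", "abc")
```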

No issue too big, @simonbyrne – this is the laundry list.

Btw: this isn't really for specific changes (e.g. renaming specific functions) – it's more about kinds of things we can review. For specific proposed changes, just open an issue proposing that change.

We have a lot of tools like @code_xxx that are paired with underlying functions like code_xxx

Not sure if this is what you're talking about, but see CreateMacrosFrom.jl

  • Whether "output type" API's should be in terms of element type or overall container type (ref #11557 and #16740)
  • Document all exported functions (including doctests)

Document all exported functions (including doctests)

If this is in scope, then maybe also: remember to label your tests with the issue/PR number. It makes it a lot easier to understand why that test is there. I know how git blame works, but when adding testsets (just to give an example) it's sometimes a bit of a mystery what is being tested, and it would be great if the issue/PR number were always there.

@dpsanders: and exported macros! e.g. @fastmath has no docstring.

This is very minor, but the string and Symbol functions do almost the same thing and have different capitalization. I think symbol would make more sense.

@amellnik The difference is that Symbol is a type constructor and string is a regular function. IIRC we used to have symbol but it was deprecated in favor of the type constructor. I'm not convinced a change is necessary for this, but if anything I think we should use the String constructor in place of string.

if anything I think we should use the String constructor in place of string.

No, they are different functions and shouldn't be merged

julia> String(UInt8[])
""

julia> string(UInt8[])
"UInt8[]"

No, they are different functions and shouldn't be merged

This looks like a situation where string(args...) should just be deprecated in favor of sprint(print, args...), then; having both string and String is confusing. We could specialize on sprint(::typeof(print), args...) to recover any lost performance. Along these lines, it might also make sense to deprecate repr(x) in favor of sprint(showall, x).

That sounds ok although calling string to turn something into a string seems pretty standard....

calling string to turn something into a string seems pretty standard

Yes, but that's where the disconnect between String and string comes in.

sprint(print, ...) feels redundant. If we get rid of string, we can rename sprint to string so we get string(print, foo) and string(showall, foo) which reads well in my opinion.

This might be a case where consistency is overrated. I think it's fine to have string(x) for "just give me a string representation of x". If it's going to be more complicated than that, e.g. requiring you to specify which printing function to use, then using another name like sprint makes sense.

It would also be ok with me to rename String(UInt8[]) to something else, and use String instead of string. string gives us a bit more flexibility in the future to change what type of string we return, but that doesn't seem likely to happen.
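The relationships discussed above can be checked directly. In this sketch, string(args...) prints and concatenates its arguments, sprint(f, args...) calls f(io, args...) on a fresh buffer, and Symbol(args...) concatenates the same way but returns a Symbol. It uses show instead of showall, since showall is not available in Julia 1.x.

```julia
# string(args...) prints each argument and concatenates the results.
s1 = string(1, "a", :b)

# sprint(print, args...) produces the same text via an IO buffer.
s2 = sprint(print, 1, "a", :b)

# Choosing show instead of print gives repr-style output.
s3 = sprint(show, "a")
r  = repr("a")

# Symbol(args...) concatenates the same way, but returns a Symbol.
sym = Symbol("a", 1)
```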

Does reinterpret(String, ::Vector{UInt8}) make sense at all, or is this a pun on reinterpret?

That does seem to make sense.

An issue is that this function is sometimes copying, so that name is somewhat misleading.

True, but strings are supposed to be immutable, so we can probably get away with that.

There is also a String(::IOBuffer) method, but it looks like that could be deprecated to readstring.
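For comparison, the Julia 1.x spellings: readstring was replaced by read(io, String), and String(v::Vector{UInt8}) is documented as taking over v's data rather than copying it, so passing a copy keeps the original vector usable.

```julia
bytes = UInt8[0x68, 0x69]   # the bytes of "hi"

# String(v) may take ownership of v's memory, so copy v if it must stay intact.
s = String(copy(bytes))

# Reading a whole IOBuffer into a String.
io = IOBuffer("abc")
t = read(io, String)
```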

I've thought about your proposed API change as well, but the interface of string(a, b...) is that it stringifies and concatenates its arguments, and this would make an annoying gotcha exception for callable first arguments. If we remove concatenation from string then it could be made to work.

Yes, agreed; consistency and avoiding gotchas is most important.

Noting issues #18326 and #3893 in the "dimension arguments" category.

If I can tack on another item: making sure the behavior of containers of mutables is both documented and consistent.

@JaredCrean2: can you elaborate on what you mean by that?

I certainly hope it doesn't involve making lots of "defensive copies".

For example, if I have an array of mutable types and I call sort on it, does the returned array point to the same objects as the input array, or does it copy the objects and make the returned array point to them?

The same objects. I'm pretty sure all our collection sorting, getindex, filtering, searching, etc. methods follow this rule, no?

I don't think there's any lack of clarity or consistency on that point – it's always the same objects.

In fact, I think the only standard function where that's not the case is deepcopy where the whole point is that you get all new objects.
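A sketch of the behavior described here, using a hypothetical mutable `Box` type: sort returns a new top-level array that holds the very same element objects, and only deepcopy produces new elements.

```julia
mutable struct Box       # hypothetical mutable element type
    n::Int
end

xs = [Box(3), Box(1), Box(2)]
ys = sort(xs; by = b -> b.n)     # a new top-level array...

same_elements = ys[1] === xs[2]  # ...holding the same Box objects
ys[1].n = 100                    # so mutating through ys is visible via xs

zs = deepcopy(xs)                # deepcopy also copies the Box objects
```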

Is that documented somewhere?

No – we could but I'm not sure where it would be best to document it. Why would functions make copies unnecessarily? Where did you get the impression that they might?

Hello. I don't believe I've seen any remarks about data serialization.

Sooner or later, Julia programs will be written and run publicly, and their data will pile up, sometimes for years. Data serialization, e.g. the chain from object to bytes driven by type (maybe over JSON or ...), has to be built to be resilient over time. Semantic versioning and web APIs may count too.

Could we expect the serialization for user data to stay close to https://github.com/JuliaLang/julia/blob/v0.5.1/base/serialize.jl ?

Why would functions make copies unnecessarily? Where did you get the impression that they might?

I don't know whether they do or not. As far as I can tell, the behavior is undefined. From @JeffBezanson's comment, there are people who advocate making defensive copies, which he opposes. So the documentation should address the question of defensive copies somewhere.

You seem to be implying some kind of least-action principle, but depending on the details of the algorithm, what counts as the "least action" becomes ambiguous. To get consistency across the API, I think more specific guidance is required.

@o314: this is an API consistency review issue, I'm not sure how serialization relates.

@JaredCrean2: whether the top-level object is copied or not does certainly need to be documented. What I'm saying is that deeper objects are never copied, except by deepcopy (obviously).

What I'm saying is that deeper objects are never copied, except by deepcopy (obviously).

There was a recent discussion about this in the context of copy for some of the array wrappers, e.g. SubArray and SparseMatrixCSC but also Symmetric, LowerTriangular. It seems to me that under the above mentioned policy, copy would be a noop for such wrapper types. Is the policy you mention the right level of abstraction here? E.g. I think it implies that if Arrays were implemented in Julia (wrapping a buffer), the behavior of copy on Arrays should then change to a noop.

If the convention is that deeper objects are never copied, then all that remains is to document it. Documentation is a really important part of an API. This behavior may seem obvious to you (perhaps because you wrote parts of the code), but from an outside perspective it's not so obvious.

Edit: didn't see Andreas post. That's an interesting consideration.

@StefanKarpinski I agree with your point.
And all the main topics raised here are very good and clever.

But I sometimes worry a bit about the balance between process and data in Julia:

We can call Fortran or C easily, for sure.
But will code be as easy to deploy in a modern datacenter, e.g. on AWS Lambda with its function-as-a-service pattern? Will code be easily callable over the internet, through an open API?

Sometimes one has to lower the functional load to scale (no generics, no varargs in function signatures, no higher-order functions in the public API) and bind data behind it more systematically (JSON Schema / OpenAPI).

I have seen some very good Python libraries sink this way, and that's a pity.

I think it's a crucial point for a 1.0 language to keep data and functions balanced and modular, so that code can easily be deployed on the web. And for this, function interfaces have to be less "pet" and more "cattle" oriented when needed.

Maybe that's not the point of this topic.

@StefanKarpinski I may have misunderstood your post. When you said

whether the top-level object is copied or not does certainly need to be documented

what does "top-level object" mean? If I have x::Vector{MyMutableType}, is the top level object x or the elements of x?

Top-level object refers to x itself, not the elements of x.

@andreasnoack The notion of top-level object should refer to the abstract structure implemented, not implementation details.

Maybe add float and other similar functions that act on both types and values?

Going over the 0.6 release notes, it seems odd that iszero(A::Array{T}) is introduced, while many other functions (eg sumabs, isinteger, isnumber) over arrays are deprecated in favor of all(f,A).

There are zero arrays, and they are the zero elements of their vector space. iszero generically tests whether something is the additive identity, which zero arrays are.
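A quick check of that reading: for arrays, iszero(A) asks whether A is the zero element of its space, which agrees with the elementwise reduction all(iszero, A).

```julia
A = zeros(2, 2)
B = [0.0 1.0; 0.0 0.0]

z_a = iszero(A)   # A is the zero matrix
z_b = iszero(B)   # B has a nonzero entry

# The array method agrees with the elementwise reduction.
agree = iszero(B) == all(iszero, B)
same_as_zero = iszero(A) == (A == zero(A))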

Re. consistency of underscores in function names, here's a breadcrumb to count_ones and count_zeros.

It seems to me that if you want to keep the Julia API consistently consistent, you'll need to have some software that lets you (a) specify what the API rules/conventions are, (b) perform static analysis of Julia code to detect deviations from those rules/conventions, and (c) offer suggestions. Such a tool would benefit both Julia Base and all Julia Packages. A new Julia package may do the trick. (Tongue in cheek: the first thing this package should do is to suggest its own name; APICheck.jl, ApiCheck.jl, API_Check.jl, APIChecker.jl, JuliaAPIChecker.jl, etc.) I'm rather new to Julia, so don't want to take the lead on such a thing. However, I wouldn't mind contributing. Any suggestions for how to get this going?

We'd love to have that in Lint.jl!

num2hex and hex2num (#22031 and #22088)

I too believe that Lint.jl is the right package for Julia API consistency checking. But if we go that route, Stefan's original list must be made much more precise. For example when he writes "we should make sure that all of collections in DataStructures have consistent APIs" questions that come to mind are:

  • What makes a data structure a collection? (List of collections in base)
  • What are the functions that a collection MUST support? (A spreadsheet of functions by collections)
  • What should the signature of each of those functions be?
  • Are the APIs provided by Base for collections actually consistent already?

To manage such inventories and analyses we might want to add a project to the Julia repository (project #8), or to the JuliaPraxis repository (suggested offline by TotalVerb) or to the Lint repository. In that case we'd need to sort out who would own such a project, which people should be involved from the beginning, and who should make the final decisions on what the Julia conventions actually are (for the purpose of linting).

But before progressing more along these lines, I like to ask @StefanKarpinski: what are your ideas about working your list of Julia API consistency issues?

I agree that specing this out specifically is a good idea. Figuring out what that list should be is part of the work here – if you'd like to take a crack at it, that would be great.

Do we really need a Base.datatype_module and a Base.function_module?

A unified function "module" (maybe getmodule) dispatching upon datatype and function seems more consistent to me.

What about the underscores in @code_typed and friends?

That's a nice opportunity for refactoring (the stated reason for the underscore ban). You could have a @code macro with the first argument being the kind of code you want.

(@bramtayl, please remember to put backticks around macros as this pings the github user "code" if not; @code)

FWIW, tab completion only works with function names. Being able to do @code_<TAB> is nice....

If refactoring is considered but rejected, then whether or not there are underscores is moot, because the only point of the underscore ban is to encourage refactoring. In fact, in that case, it seems like underscores should be encouraged for making the language clearer

Underscore audit.

counter-proposal: we still audit these names, but we instead add more underscores where it would make the code easier to read (codetyped vs code_typed, isos2 vs is_os2).

I'm not absolutist about this. I think code_typed is fine, usefully tab-completes as @KristofferC points out, and there isn't really anything obvious to pass as an argument to select which output you want.

To me it seems rather like the ship has sailed on adding more underscores, as we'd have to deprecate basically half of Base. As an example, there are 74 predicate functions that begin with is and only 6 that begin with is_. Which makes more sense, deprecating 6 or 74?

Ok, there's several conflicting goals here:

1) Making names more readable
2) Reducing code churn
3) Encouraging refactoring

Eliminating underscores by colliding words together fails on all 3 fronts.

That show methods accepting a stream are not !-terminated seems inconsistent with the usual convention? Ref. https://github.com/JuliaLang/julia/pull/22604/commits/db9d70a279763ded5088016d9c3d4439a49e3fca#r125115063. Best! (Edit: I suppose this matches write methods accepting a stream.)

There are inconsistencies with the traits API. Some traits are computed by calling the trait like
TypeArithmetic(Float64)
while others this function must be spelled in lowercase:
iteratorsize(Vector{Float64})

Consider renaming size->shape (xref #22665)

Array{T,1}() should probably be deprecated too:

julia> Array{Int,1}()                                                                                                                  
0-element Array{Int64,1}                                                                                                               

julia> Array{Int,2}()                                                                                                                  
WARNING: Matrix{T}() is deprecated, use Matrix{T}(0, 0) instead.                                                                       

Have been thinking about collections. We basically have three kinds:

  • Simple set-like (or bag-like) collections, which just contain some values.
  • Array-like collections, which add an index alongside the data.
  • Dict-like collections, which have indices as part of the data, i.e. k=>v pairs.

I've become skeptical of the dict-like behavior. The canary in the coalmine is map, where operating on key-value pairs isn't natural, since you usually don't want to change the set of keys. It's also possible for arrays and dicts to implement the same interface:

  • keys corresponds to eachindex
  • mapindexed and filterindexed would be useful for both dicts and arrays. These are like map and filter except also pass your function the index of the item in question.
  • Arrays and dicts (and named tuples, and maybe other things) could use a pairs iterator, which is basically a shortcut for zip(keys(c), values(c)).

Consider renaming ind2sub and sub2ind, which are matlabisms apparently, and which have a non-julian and strange name for a non-matlab user. Possible names would be indice and linearindice respectively. I didn't dare to make a PR as I'm not sure what people think about that, but will do if there is support.

Same thing with rad2deg and deg2rad.

Ref. #22791 (select -> partialsort). Best!

One thing I haven't seen here: do optional positional arguments go first or last? Sometimes optional positional arguments go first, as in sum(f, itr) and rand([rng,] ..). But elsewhere they go last, e.g. in median(v[, region]) or split(s::AbstractString[, chars]). Sometimes they can go first or last, but not both! (For example, mean can take a function first or a dimension last, but not both.)

The current language semantics force optional arguments to go last: you can write f(a, b=1) but not f(b=1, a). But if all the optional arguments go last, what happens to convenient do blocks?

If nothing else, it's a minor wart that the language has to define methods like so, from rand.jl: shuffle!(a::AbstractVector) = shuffle!(GLOBAL_RNG, a). The optional positional argument syntax should take care of exactly this use case.

Maybe should go in a separate issue, but it does seem possible to move the optional arguments wherever you want. So for example, f(a = 1, b, c = 2) would define f(x) = f(1, x, 2) and f(x, y) = f(x, y, 2)

xref #22460 for an (unpopular) attempt at enabling default args at any position.

Maybe rename warn to warning (matlab also uses this), not a big deal, but thought I'd mention ?

I like warn since it's a verb, like throw.

I just got very confused by this:

julia> f(;a=1,b=1) = a+b                                                                                                                              
f (generic function with 1 method)                                                                                                                    

julia> f(a=4,5)            # I intended to write f(a=4,b=5)                                                                                                                           
ERROR: MethodError: no method matching f(::Int64; a=4)                                                                                                
Closest candidates are:
  f(; a, b) at REPL[13]:1

I suggest only allowing keywords last when calling functions, similar to when defining keyword functions.

I suggest only allowing keywords last when calling functions, similar to when defining keyword functions.

There are a lot of APIs where passing keywords in other positions is both useful and ergonomic.

Out of curiosity, is there a defined evaluation order for positional and keyword arguments? I saw a few older issues and https://docs.julialang.org/en/latest/manual/functions/#Evaluation-Scope-of-Default-Values-1 talks about scopes, but nothing I've found states whether e.g. arguments are evaluated left to right, or all positional arguments are evaluated before all keyword arguments, or if there's no defined evaluation order (or something else).

@yurivish, for keywords see docs (also https://github.com/JuliaLang/julia/issues/23926). For optional ones the story is a bit more complicated, maybe read here. (Note though that questions are better asked on https://discourse.julialang.org/)

Doesn't seem worth it's own issue, but it always seems odd to me that bits(1) returns a String, it seems like it should be BitVector or Vector{Bool}.

I'd like to suggest a review of method tables that dispatch on Function or Callable. Let's try to make sure these APIs are the way we want them… and that we aren't missing an opportunity to allow for restructuring the tables such that we could allow for duck-calling any object.

As for all and any, those seem straightforward to deprecate to all(f(x) for x in xs), which is already eta-reduced to all(Generator(f, xs)) and so should have no overhead.

Not sure if it's what you meant but I figured it's worth stating just in case: I'm hardcore against deprecating any functional-style APIs for generators. We have any(f, x) and all(f, x) and they're widely used; -10000000 for removing those (or any such methods, really).

Im pro generator. Seems like a fundamental building block of lazy programming and it should be exported. all(Generator(f, xs)) sometimes more convenient for defined functions than all(f(x) for x in xs). Also +10000000 to restore the balance

I prefer having less syntax here if possible. If it's easy and readable and performant to express the idea of "take xs, apply f to everything, and return true iff all of those are true", then why should we put it into one verb?

If the concern is the needless noun x in f(x) for x in xs, then @bramtayl's suggestion of exporting Generator (perhaps using a better name like Map?) makes sense.

Things like all(isnull, x) are quite simpler than all(isnull(v) for v in x). We've deprecated the allnull function from NullableArrays in favor of all(isnull, x); if that syntax went away we'd probably have to reintroduce it.
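The spellings being debated can be compared side by side. This is a small sketch assuming Julia 1.x, where Nullable and isnull have moved out of Base and `ismissing` plays a similar role; every call below returns true.

```julia
xs = [2, 4, 6]

# Functional form: predicate first, then the collection.
a = all(iseven, xs)

# Generator form: the same test with an explicit loop variable.
b = all(iseven(x) for x in xs)

# Explicit generator object; Base.Generator is not exported, hence the prefix.
c = all(Base.Generator(iseven, xs))

# The case raised above, with `ismissing` standing in for the old `isnull`.
d = all(ismissing, [missing, missing])
```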

How about renaming strwidth to stringwidth? I think this is the only exported string-manipulation function that abbreviates string to str.

Actually it's been renamed to textwidth (https://github.com/JuliaLang/julia/pull/23667).

IMO, this issue is too broad to have the 1.0 milestone on it, given we want to get to feature freeze shortly. We may need multiple owners and assign sets of functions for review if we are to do this.

Also, this is another place where FemtoCleaner can auto-update many things post 1.0, even though it would be nice to get it all right.

Just a comment on textwidth:

INFO: Testing Cairo
Test Summary:   | Pass  Total
Image Surface   |    7      7
Test Summary:   | Pass  Total
Conversions     |    4      4
Test Summary:   | Pass  Total
TexLexer        |    1      1
WARNING: both Compat and Cairo export "textwidth"; uses of it in module Main must be qualified
Samples        : Error During Test
  Got an exception of type LoadError outside of a @test
  LoadError: UndefVarError: textwidth not defined
  Stacktrace:
   [1] include_from_node1(::String) at .\loading.jl:576
   [2] include(::String) at .\sysimg.jl:14
   [3] macro expansion at C:\Users\appveyor\.julia\v0.6\Cairo\test\runtests.jl:86 [inlined]
   [4] macro expansion at .\test.jl:860 [inlined]
   [5] anonymous at .\<missing>:?
   [6] include_from_node1(::String) at .\loading.jl:576
   [7] include(::String) at .\sysimg.jl:14
   [8] process_options(::Base.JLOptions) at .\client.jl:305
   [9] _start() at .\client.jl:371
  while loading C:\Users\appveyor\.julia\v0.6\Cairo\samples\sample_pango_text.jl, in expression starting on line 28

Isn't the error message pretty clear about what the problem is? Cairo needs to extend the Base method.

help?>  Base.textwidth
  No documentation found.

  Binding Base.textwidth does not exist.

julia> versioninfo()
Julia Version 0.6.0
julia> Compat.textwidth
textwidth (generic function with 2 methods)

I had no doubt that the message (which I got via Travis) is OK, but why does Compat export textwidth if it's not in 0.6?

Because that's the point of Compat? This discussion is going very much out of scope though so I suggest we can continue it on discourse or slack.
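The fix suggested earlier in this exchange, extending the Base function instead of exporting a second function with the same name, looks roughly like the sketch below. It assumes Julia 0.7 or later, where Base itself defines textwidth. `MyGraphics` and `Surface` are hypothetical names, not Cairo's actual code.

```julia
# Hypothetical stand-in for a package such as Cairo.
module MyGraphics

struct Surface
    scale::Float64
end

# Add a method to Base.textwidth instead of defining and exporting a new
# function of the same name, which is what produces the
# "both ... export textwidth" warning.
Base.textwidth(s::Surface, str::AbstractString) = s.scale * textwidth(str)

end # module

using .MyGraphics

w = textwidth(MyGraphics.Surface(2.0), "abc")   # 6.0
plain = textwidth("abc")                        # 3: the existing method still works
```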

I suggest renaming copy to shallowcopy and deepcopy to copy. It recently took me quite some time to realise that copy is a "shallow" copy and that the function I wrote was mutating the array of arrays. I think it would be much more intuitive if copy did a "deep" copy and something like shallowcopy were used for shallow copies. I now know when to use deepcopy, but I think many other users will run into the same issue.
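The behaviour described here is easy to reproduce: copy duplicates only the outer container, so the inner arrays are shared. A minimal example, assuming Julia 1.x:

```julia
outer = [[1, 2], [3, 4]]

shallow = copy(outer)     # new outer array, but the same inner arrays
deep = deepcopy(outer)    # new outer array and new inner arrays

shallow[1][1] = 100       # writes through to outer[1], which is shared

outer[1][1]   # 100
deep[1][1]    # 1
```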

Let's please try to keep this issue for API consistency, not a grab-bag of "specific things I don't like".

The name Associative seems quite inconsistent with all the other type names in Julia.

Firstly, types are not usually adjectives. The noun is Association, which is used at least in some Mathematica docs I found.

I think that AbstractDict would be vastly more consistent with other types like AbstractArray, AbstractRange, AbstractSet and AbstractString, each of which has a prototypical concrete type: Dict, Array, Range, Set and String, respectively.
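For the record, Associative was renamed AbstractDict in Julia 0.7, following the pattern described here. Below is a sketch of generic code written against the abstract name, assuming Julia 1.x. `describe` is a hypothetical function.

```julia
# Dispatch on the abstract supertype to accept any dictionary-like container.
describe(d::AbstractDict) = "$(length(d)) entries, key type $(keytype(d))"

describe(Dict("a" => 1, "b" => 2))     # "2 entries, key type String"
describe(IdDict(:x => 1, :y => 2))     # "2 entries, key type Symbol"
```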

Our exception types are a bit all over the place: some are named FooError, others are named BarException; a few are exported, most aren't. This could use a pass through for consistency.

So what would be preferred, FooError or BarException? Exported or not?

For me, BarException implies a raise/catch pattern somewhere.

I, and some others in the functional world, much prefer the Some/None pattern (*), where control flow is more direct and predictable.

So +1 for FooError.

(*) Some/Void, formerly Optional, in Julia (#23642).
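To make the two styles concrete, here is a sketch assuming Julia 1.x, where Void has become Nothing and `tryparse`/`something` support the optional-value style. `FooParseError` and `parse_foo` are hypothetical names. For comparison, Base itself currently has names with both suffixes, such as ArgumentError and InterruptException.

```julia
# Hypothetical exception type following the FooError naming convention.
struct FooParseError <: Exception
    msg::String
end
Base.showerror(io::IO, e::FooParseError) = print(io, "FooParseError: ", e.msg)

# Exception style: failure is signalled by throwing, and callers catch.
function parse_foo(s::AbstractString)
    n = tryparse(Int, s)
    n === nothing && throw(FooParseError("not an integer: $(repr(s))"))
    return n
end

# Optional-value style: tryparse returns `nothing` on failure, and
# `something` supplies a fallback without any try/catch.
value = something(tryparse(Int, "x1"), 0)

err = try
    parse_foo("x1")
catch e
    e
end
```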

Are these things still on the table given the feature freeze? I'd especially like to tackle optional vs. keyword arguments, but the list of functions with multiple optional arguments (the clearest case for using keyword arguments instead) is pretty long.
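A small sketch of the trade-off, assuming Julia 1.x, where Base's own `string(n; base, pad)` already takes keywords. `format_pos` and `format_kw` are hypothetical wrappers.

```julia
# Hypothetical formatter with several optional positional arguments:
# to change `pad`, a caller must also restate `base`.
format_pos(n, base = 10, pad = 1) = string(n; base = base, pad = pad)

# The same function with keyword arguments: each option can be set on its own.
format_kw(n; base = 10, pad = 1) = string(n; base = base, pad = pad)

format_pos(5, 10, 3)     # "005"
format_kw(5; pad = 3)    # "005"
format_kw(5; base = 2)   # "101"
```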

Please take a look! I haven't gotten a chance to go through these issues systematically.

BTW, I've noted an inconsistency in the naming of traits: we have iteratorsize, iteratoreltype, but IndexStyle, TypeRangeStep, TypeArithmetic and TypeOrder. Looks like the CamelCase variants are more numerous and more recent, so maybe we should adopt that convention everywhere?
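For what it's worth, the iterator traits did end up CamelCase in Julia 0.7 (Base.IteratorSize and Base.IteratorEltype). Here is a sketch of declaring one for a custom iterator, assuming Julia 1.x. The `Collatz` type is hypothetical.

```julia
# Hypothetical iterator over a Collatz sequence (assumes start >= 1);
# its length is not known in advance.
struct Collatz
    start::Int
end

# Yields n, then the next Collatz term; 0 is used as a stop marker after 1.
Base.iterate(c::Collatz, n = c.start) =
    n == 0 ? nothing : (n, n == 1 ? 0 : iseven(n) ? n ÷ 2 : 3n + 1)

# Trait methods are defined on the type, not on instances.
Base.IteratorSize(::Type{Collatz}) = Base.SizeUnknown()
Base.eltype(::Type{Collatz}) = Int

collect(Collatz(6))   # [6, 3, 10, 5, 16, 8, 4, 2, 1]
```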

Those should definitely be made consistent. Do you want to make a PR?

This is mostly done or can be done in 1.x releases. I can update the checkboxes, but we just went through them on the triage call and everything but #25395 and the underscore audit are done.

Underscore audit

The following is an analysis of all symbols exported from Base which contain underscores, are not deprecated, and are not string macros. The main thing to note here is that these are exported names only; this does not include unexported names that we tell people to call qualified.

I've separated things out by category. Hopefully that's more useful than it is annoying.

Reflection

We have the following macros with corresponding functions:

  • [ ] @code_llvm, code_llvm
  • [ ] @code_lowered, code_lowered
  • [ ] @code_native, code_native
  • [ ] @code_typed, code_typed
  • [ ] @code_warntype, code_warntype

Whatever change is applied to the macros, if any, should be similarly applied to the functions.
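To make the pairing concrete: the function takes a function plus a tuple of argument types, while the macro takes a call expression. This sketch assumes Julia 1.x, where `code_lowered` is in Base and the macros live in the InteractiveUtils standard library. The normalisation step hedges against the two forms returning differently shaped results.

```julia
using InteractiveUtils   # home of the @code_xxx macros outside the REPL

square(x) = x * x

# Function form: the function plus a tuple of argument types.
f_form = code_lowered(square, (Int,))   # a Vector of CodeInfo, one per matching method

# Macro form: write the call to inspect and let the macro extract the types.
m_form = @code_lowered square(3)

# The macro may unwrap a single result, so normalise before comparing.
m_info = m_form isa AbstractVector ? first(m_form) : m_form
```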

  • [x] module_name -> nameof (#25622)
  • [x] module_parent -> parentmodule (#25629, see #25436 for a previous attempt at renaming)
  • [x] method_exists -> hasmethod (#25615)
  • [x] object_id -> objectid (#25615)
  • [ ] pointer_from_objref

pointer_from_objref could perhaps do with a more descriptive name, maybe something like address?
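For quick reference, here are the Julia 1.x spellings of the checked renames, alongside pointer_from_objref, which kept its name. `Box` is a hypothetical type.

```julia
# Replacements for the renamed reflection functions.
nameof(Base.Math)               # :Math  (was module_name)
parentmodule(Base.Math)         # Base   (was module_parent)
hasmethod(+, Tuple{Int, Int})   # true   (was method_exists)
objectid("abc")                 # a UInt derived from object identity (was object_id)

# pointer_from_objref only works on mutable objects; keep the object rooted
# while its address is in use.
mutable struct Box   # hypothetical
    x::Int
end
b = Box(1)
addr = GC.@preserve b UInt(pointer_from_objref(b))
```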

Aliases for C interop

The type aliases containing underscores are C_NULL, Cintmax_t, Cptrdiff_t, Csize_t, Cssize_t, Cuintmax_t, and Cwchar_t. Those that end in _t should stay, as they're named to be consistent with their corresponding C types.

C_NULL is the odd one out here, being the only C alias containing an underscore that isn't mirrored in C (since in C this is just NULL). We could consider calling this CNULL.

  • [ ] C_NULL
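To illustrate the distinction: C_NULL is a constant of type Ptr{Cvoid}, while the `_t`-suffixed names are aliases for Julia integer types that are mostly used to spell C signatures. A sketch assuming Julia 1.x on a platform where libc's `strlen` is visible to `ccall`:

```julia
# C_NULL is a value (a null pointer); the C-prefixed names are type aliases.
C_NULL isa Ptr{Cvoid}      # true
C_NULL == Ptr{Cvoid}(0)    # true
Csize_t === UInt           # true
Cptrdiff_t === Int         # true

# The aliases describe C signatures in ccall, e.g. libc's strlen.
n = ccall(:strlen, Csize_t, (Cstring,), "hello")   # 5, as a Csize_t
```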

Bit counting

  • [ ] count_ones
  • [ ] count_zeros
  • [ ] trailing_ones
  • [ ] trailing_zeros
  • [ ] leading_ones
  • [ ] leading_zeros

For a discussion of renaming these, see #23531. I very much favor removing the underscores for these, as well as some of the proposed replacements in that PR. I think it should be reconsidered.
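For readers unfamiliar with this family, here is what each function computes on small UInt8 values, assuming Julia 1.x:

```julia
x = 0b0010_1100        # UInt8, decimal 44

count_ones(x)          # 3
count_zeros(x)         # 5
trailing_zeros(x)      # 2
leading_zeros(x)       # 2
trailing_ones(0b0000_0111)   # 3
leading_ones(0xf0)           # 4
```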

Unsafe operations

  • [ ] unsafe_copyto!
  • [ ] unsafe_load
  • [ ] unsafe_pointer_to_objref
  • [ ] unsafe_read
  • [ ] unsafe_store!
  • [ ] unsafe_string
  • [ ] unsafe_trunc
  • [ ] unsafe_wrap
  • [ ] unsafe_write

It's probably okay to keep these as-is; the ugliness of the underscore further underscores their unsafety.
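As a reminder of why the prefix is earned, here is a sketch assuming Julia 1.x: these functions skip bounds checks and do not keep the underlying object alive, so the caller has to root it, here with GC.@preserve.

```julia
a = [10, 20, 30]

second = GC.@preserve a begin
    p = pointer(a)               # raw Ptr{Int} to a's data; valid only while `a` is rooted
    unsafe_store!(p, 99, 3)      # like a[3] = 99, but with no bounds check
    unsafe_load(p, 2)            # like a[2]; the index is 1-based
end

a        # [10, 20, 99]
second   # 20
```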

Indexing

  • [ ] broadcast_getindex
  • [ ] broadcast_setindex!
  • [ ] to_indices

Apparently broadcast_getindex and broadcast_setindex! exist. I don't understand what they do. Perhaps they could use a more descriptive name?

Interestingly, the single index version of to_indices, Base.to_index, is not exported.
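As I understand its docstring, broadcast_getindex(A, inds...) broadcasts the index arrays to a common shape and returns A[ks...] at each position, i.e. pointwise rather than Cartesian indexing. Assuming that reading, the same result can be written with ordinary broadcasting in Julia 1.x, which also shows how it differs from A[I, J]:

```julia
A = [10 20;
     30 40]

I = [1, 2]
J = [2, 1]

# Pointwise indexing: the result has the broadcast shape of I and J,
# with element k equal to A[I[k], J[k]]. Ref(A) makes A act as a scalar.
getindex.(Ref(A), I, J)    # [20, 30]

# Ordinary (Cartesian) indexing, for contrast: all combinations of I and J.
A[I, J]                    # [20 10; 40 30]
```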

Traces

  • [ ] catch_backtrace
  • [x] catch_stacktrace -> stacktrace(catch_backtrace()) (#25615)

Presumably these are the catch block equivalents of backtrace and stacktrace, respectively.
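Assuming Julia 1.x, catch_backtrace() returns the backtrace of the exception currently being handled, and stacktrace(...) converts it into StackFrame objects, which is what catch_stacktrace() used to do. A minimal sketch:

```julia
risky() = error("boom")

frames = try
    risky()
catch
    # Only meaningful inside `catch`: the raw backtrace of the exception
    # being handled, converted to StackFrame objects.
    stacktrace(catch_backtrace())
end

any(fr -> fr.func === :risky, frames)   # true: the failing function appears in the trace
```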

Tasks, processes, and signals

  • [ ] current_task
  • [ ] task_local_storage
  • [ ] disable_sigint
  • [ ] reenable_sigint
  • [ ] process_exited
  • [ ] process_running
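Two of these are easiest to understand by example. This is a sketch assuming Julia 1.x; the `:request_id` key is arbitrary.

```julia
# The do-block form sets a key in the current task's storage for the
# duration of the block, then restores the previous state.
inside, seen_elsewhere = task_local_storage(:request_id, 42) do
    # Storage is per task: a freshly spawned task does not see the key.
    t = @async haskey(task_local_storage(), :request_id)
    (task_local_storage(:request_id), fetch(t))
end

after = haskey(task_local_storage(), :request_id)   # false: the key was removed again
current_task() isa Task                              # true
```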

Streams

  • [ ] redirect_stderr
  • [ ] redirect_stdin
  • [ ] redirect_stdout
  • [x] nb_available -> bytesavailable (#25634)

It would be nice to have a more general IO -> IO redirection function into which all of these could be combined, e.g. redirect(STDOUT, io), thereby removing both underscores and exports.
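A rough sketch of the proposed single entry point, layered over the existing functions. `redirect` here is hypothetical, not a Base function, and it takes a Symbol rather than the stream object itself to keep the example short. It assumes Julia 1.x, where STDOUT is spelled stdout and the redirect_* functions accept a function to run while redirected.

```julia
# Hypothetical unified API; not part of Base.
function redirect(f::Function, which::Symbol, io)
    which === :stdout && return redirect_stdout(f, io)
    which === :stderr && return redirect_stderr(f, io)
    throw(ArgumentError("unsupported stream: $which"))
end

path = tempname()
open(path, "w") do io
    redirect(:stdout, io) do
        println("captured")   # goes to the file, not the terminal
    end
end

read(path, String)   # "captured\n"
```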

Promotion

  • [ ] promote_rule
  • [ ] promote_shape
  • [ ] promote_type

See #23999 for a relevant discussion regarding promote_rule.
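For context, here is a sketch assuming Julia 1.x: promote_type answers "what common type do these promote to?", and promote_rule is the hook a new type defines to influence that answer. `Half` is a hypothetical type used only to show the hook.

```julia
promote_type(Int8, Float32)   # Float32
promote_type(Int, Float64)    # Float64

# Hypothetical number type: a promote_rule declared in one order is enough,
# because promote_type consults promote_rule in both argument orders.
struct Half <: Real
    val::Float64
end
Base.promote_rule(::Type{Half}, ::Type{Int}) = Half

promote_type(Half, Int)   # Half
promote_type(Int, Half)   # Half
```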

Printing

  • [x] print_with_color -> printstyled (see #25522)
  • [ ] print_shortest (see #25745)
  • [ ] escape_string (see #25620)
  • [ ] unescape_string

escape_string and unescape_string are a little odd in that they can print to a stream or return a string. See #25620 for a proposal to move/rename these.
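The dual behaviour mentioned here looks like this in Julia 1.x. This is a sketch using a string that contains only a tab and a newline as special characters.

```julia
s = "tab:\t newline:\n"

# Returning form: builds and returns a new String.
e = escape_string(s)        # "tab:\\t newline:\\n"

# Streaming form: writes the same escaped text to an IO.
buf = IOBuffer()
escape_string(buf, s)
streamed = String(take!(buf))

streamed == e              # true
unescape_string(e) == s    # true
```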

Code loading

  • [ ] include_dependency
  • [ ] include_string

include_dependency. Is this even used outside of Base? I can't think of a situation where you would want this instead of include in any typical scenario.

include_string. Isn't this just an officially sanctioned version of eval(parse())?
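One difference from eval(parse(...)) is that include_string evaluates every top-level expression in the string, in order, and returns the last value, whereas Meta.parse reads a single expression. A sketch assuming Julia 1.x, evaluating into a throwaway module:

```julia
m = Module()   # throwaway anonymous module to evaluate into

code = """
a = 1
b = a + 1
"""

# include_string evaluates every top-level expression in order and
# returns the value of the last one.
last_value = include_string(m, code)        # 2

# Meta.parse reads only one expression, so evaluating text this way
# takes one parse/eval step per statement.
sum_ab = Core.eval(m, Meta.parse("a + b"))  # 3
```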

Things I didn't bother categorizing

  • [x] gc_enable -> GC.enable (#25616)
  • [ ] get_zero_subnormals
  • [ ] set_zero_subnormals
  • [ ] time_ns

get_zero_subnormals and set_zero_subnormals could do with more descriptive names. Do they need to be exported?
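For reference, here is what these do in practice, as a sketch assuming Julia 1.x. set_zero_subnormals asks the CPU to flush subnormal floating-point values to zero and returns whether that mode is supported, so code that changes it should restore the old setting. time_ns, from the same list, is a nanosecond timer.

```julia
old = get_zero_subnormals()
supported = set_zero_subnormals(true)   # false if the hardware lacks the mode
set_zero_subnormals(old)                # restore the previous setting

t0 = time_ns()                          # nanosecond timer, as a UInt64
s = sum(abs2, 1:1000)
elapsed = time_ns() - t0                # elapsed nanoseconds, also a UInt64
```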

+1 for method_exists => methodexists and object_id => objectid. It's also kind of silly that catch_stacktrace even exists. It can be deprecated to its definition, stacktrace(catch_backtrace()).

How do we feel about de-underscoring C_NULL? I've gotten pretty used to it, but I also buy the argument that none of the other C* names have an underscore.

The other C names are types, while C_NULL is a constant. I think it's good how it is and follows naming guidelines.

and follows naming guidelines.

How so?

Constants are often all caps with underscores – C_NULL follows that. As @iamed2 said, it's a value, not a type, so the Cfoo naming convention doesn't necessarily apply.

I mistakenly thought https://github.com/JuliaLang/julia/blob/master/doc/src/manual/variables.md#stylistic-conventions referenced constants but it doesn't. It probably should.

I suggest a consistent, mathematically sound, interface for general Hilbert spaces in which vectors are not Julia Arrays. Function names like vecdot, vecnorm, etc. could well be replaced by the general concepts of inner and norm as discussed in https://github.com/JuliaLang/julia/issues/25565.
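For reference, Julia 0.7 folded vecdot and vecnorm into dot and norm in the LinearAlgebra standard library, and those generic functions can be extended to vector types that are not arrays. A sketch assuming Julia 1.x; `Coeffs` is a hypothetical type.

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]
B = [0.0 1.0; 1.0 0.0]

dot(A, B)    # 5.0: sum of elementwise products, the old vecdot
norm(A)      # sqrt(30): Frobenius norm, the old vecnorm
opnorm(A)    # induced 2-norm, the old matrix meaning of norm

# Hypothetical vector type that is not an AbstractArray.
struct Coeffs
    c::Vector{Float64}
end
LinearAlgebra.dot(x::Coeffs, y::Coeffs) = dot(x.c, y.c)
LinearAlgebra.norm(x::Coeffs) = sqrt(dot(x, x))

norm(Coeffs([3.0, 4.0]))   # 5.0
```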

As I've said a few times, this is not a catchall issue for things one wants to change.

I believe the only items remaining under this umbrella for 1.0 are #25501 and #25717.

I'd like to do something with (get|set)_zero_subnormals but maybe the best short-term solution is to just unexport them.

Something we should probably review is how numbers are treated by collection operations like map and collect. It was pointed out that the former returns a scalar while the latter returns a 0-dimensional array.
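The difference is easy to see in a sketch assuming Julia 1.x:

```julia
map(x -> x + 1, 3)    # 4: map applies f directly to a Number and returns a scalar
collect(3)            # a 0-dimensional Array{Int, 0} holding 3
size(collect(3))      # (): zero dimensions
```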
