For Unicode Character Categories see http://www.fileformat.info/info/unicode/category/index.htm
Haskell implements the type "GeneralCategory" and a function to determine a character's "GeneralCategory".
Their implementation is in the Data.Char source: http://haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/src/Data-Char.html
I propose writing a Python script that does something similar.
Having such a type and function in Rust would let us correctly implement the functions in the "char" module.
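As a rough illustration of what such a type and function might look like in Rust, here is a minimal sketch. The enum lists only five of the general categories, named after Haskell's constructors. The range table is a hand-written, hypothetical excerpt covering a few ASCII and C1 ranges; a real table would be generated from the Unicode data files by a script. The names `TABLE` and `general_category` are assumptions made for illustration, not an existing API.

```rust
use std::cmp::Ordering;

/// A handful of Unicode general categories, named after Haskell's
/// `GeneralCategory` constructors. The full list is much longer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum GeneralCategory {
    UppercaseLetter, // Lu
    LowercaseLetter, // Ll
    DecimalNumber,   // Nd
    Space,           // Zs
    Control,         // Cc
}

/// Hypothetical excerpt of a generated table: inclusive code point
/// ranges, sorted by start and non-overlapping, each with its category.
const TABLE: &[(u32, u32, GeneralCategory)] = &[
    (0x00, 0x1F, GeneralCategory::Control),
    (0x20, 0x20, GeneralCategory::Space),
    (0x30, 0x39, GeneralCategory::DecimalNumber),
    (0x41, 0x5A, GeneralCategory::UppercaseLetter),
    (0x61, 0x7A, GeneralCategory::LowercaseLetter),
    (0x7F, 0x9F, GeneralCategory::Control),
];

/// Looks up a character by binary search over the sorted ranges.
/// Returns `None` for code points this excerpt does not cover.
fn general_category(c: char) -> Option<GeneralCategory> {
    let cp = c as u32;
    TABLE
        .binary_search_by(|&(lo, hi, _)| {
            if hi < cp {
                Ordering::Less
            } else if lo > cp {
                Ordering::Greater
            } else {
                Ordering::Equal
            }
        })
        .ok()
        .map(|i| TABLE[i].2)
}
```

With this excerpt, `general_category('A')` yields `Some(GeneralCategory::UppercaseLetter)`, while a character outside the table, such as `'!'`, yields `None` only because the sketch is incomplete.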
The module-in-progress called 'unicode::' in libstd is where I was going to sketch out an interface to libicu. The decision is not actually very simple for most of the character classes, and ICU has this well handled. I guess we can expose it under core::char if everyone's cool with adopting a dependency on libicu?
libicu provides many additional desirable features, and it is probably present on most computers (Python uses it, so it should be fine for us).
Do we want to provide public libicu bindings, or just use it internally in modules like "char", "str", etc.?
I lean toward the latter.
To implement the functions in Rust's "char" correctly using libicu, I think we only need to call functions like "u_isspace()", "u_isdigit()", "u_forDigit()" (http://icu-project.org/apiref/icu4c/uchar_8h.html).
We wouldn't need full libicu bindings (including the many constant definitions) yet.
I think we should go for the libicu route. See #1370
Can we re-open this? We don't depend on libicu any more, but there's still no easy way of finding a character's category.
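To make the gap concrete, here is a small sketch of about how far the standard library's `char` predicates get you. They answer yes/no questions about Unicode properties, but nothing returns the category itself. `CoarseClass` and `coarse_class` are hypothetical names used only for this illustration, and the buckets are not the real general categories (for example, `is_uppercase` tests the `Uppercase` property, which is broader than `Lu`).

```rust
/// Hypothetical coarse buckets built only from std's `char` predicates.
/// These are approximations, not Unicode general categories.
#[derive(Debug, PartialEq, Eq)]
enum CoarseClass {
    Control,
    Whitespace,
    Uppercase,
    Lowercase,
    OtherAlphabetic,
    Numeric,
    Other,
}

fn coarse_class(c: char) -> CoarseClass {
    // Control is checked first because characters such as '\n'
    // are both control characters and whitespace.
    if c.is_control() {
        CoarseClass::Control
    } else if c.is_whitespace() {
        CoarseClass::Whitespace
    } else if c.is_uppercase() {
        CoarseClass::Uppercase
    } else if c.is_lowercase() {
        CoarseClass::Lowercase
    } else if c.is_alphabetic() {
        CoarseClass::OtherAlphabetic
    } else if c.is_numeric() {
        CoarseClass::Numeric
    } else {
        // Punctuation, symbols, marks, and so on all land here,
        // which is exactly the information a category lookup would add.
        CoarseClass::Other
    }
}
```

Everything that falls into `Other` (punctuation, symbols, format characters) cannot be told apart this way, which is why a proper category function is still wanted.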
Sorry to comment on such an old thread, but I actually just implemented much of the UCD (v9.0.0) here. It doesn't depend on libicu or on the standard library, so hopefully it should be easy to use in projects (though it's probably not as reliable as ICU).