Rust: "char": add a type and function for Unicode Character Categories

Created on 20 Dec 2011  ·  6 Comments  ·  Source: rust-lang/rust

For Unicode Character Categories see http://www.fileformat.info/info/unicode/category/index.htm

Haskell implements the type "GeneralCategory" and a function to determine a character's "GeneralCategory".
Their implementation goes like this:

I propose writing a Python script that does something similar.
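As a rough illustration of what such a generator has to read, here is a minimal sketch that assumes the input is the Unicode Character Database file UnicodeData.txt, where fields are separated by `;`, field 0 is the code point in hex, and field 2 is the general category abbreviation. The function name `parse_unicode_data_line` is hypothetical, and the sketch is written in Rust rather than Python only to keep it close to the target language.

```rust
/// Hypothetical helper: parses one line of UnicodeData.txt into
/// (code point, general category abbreviation such as "Lu" or "Nd").
/// Returns None if the line has too few fields or a malformed code point.
pub fn parse_unicode_data_line(line: &str) -> Option<(u32, &str)> {
    let mut fields = line.split(';');
    // Field 0: code point as hexadecimal, without a "U+" prefix.
    let code_point = u32::from_str_radix(fields.next()?, 16).ok()?;
    // Field 1: character name, not needed for the category table.
    let _name = fields.next()?;
    // Field 2: two-letter general category abbreviation.
    let category = fields.next()?;
    Some((code_point, category))
}
```

This sketch ignores the `<..., First>` / `<..., Last>` line pairs that UnicodeData.txt uses to describe large ranges; a real generator would have to expand or preserve those ranges.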

Having such a type and function in Rust would let us correctly implement the functions in the "char" module. For comparison, see the source of Haskell's Data.Char: http://haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/src/Data-Char.html
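To make the idea concrete, here is a minimal sketch of what such a type and function could look like. All names (`GeneralCategory`, `general_category`, `is_decimal_digit`, `TABLE`) are assumptions for illustration, not an existing Rust API. The table is hand-written and covers only a few Latin-1 ranges; a real one would be generated from the Unicode data files.

```rust
use std::cmp::Ordering;

/// A small subset of the Unicode general categories (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GeneralCategory {
    UppercaseLetter, // Lu
    LowercaseLetter, // Ll
    DecimalNumber,   // Nd
    SpaceSeparator,  // Zs
    Control,         // Cc
    Unlisted,        // placeholder: not covered by this sketch's table
}

/// Hypothetical lookup table of inclusive code point ranges, sorted by start.
const TABLE: &[(u32, u32, GeneralCategory)] = &[
    (0x00, 0x1F, GeneralCategory::Control),
    (0x20, 0x20, GeneralCategory::SpaceSeparator),
    (0x30, 0x39, GeneralCategory::DecimalNumber),
    (0x41, 0x5A, GeneralCategory::UppercaseLetter),
    (0x61, 0x7A, GeneralCategory::LowercaseLetter),
    (0x7F, 0x9F, GeneralCategory::Control),
    (0xA0, 0xA0, GeneralCategory::SpaceSeparator),
];

/// Looks up the category of `c` by binary search over the sorted ranges.
pub fn general_category(c: char) -> GeneralCategory {
    let cp = c as u32;
    TABLE
        .binary_search_by(|&(lo, hi, _)| {
            if hi < cp {
                Ordering::Less // range lies entirely below the code point
            } else if lo > cp {
                Ordering::Greater // range lies entirely above the code point
            } else {
                Ordering::Equal // code point falls inside this range
            }
        })
        .map(|i| TABLE[i].2)
        .unwrap_or(GeneralCategory::Unlisted)
}

/// Example of a "char"-style predicate defined in terms of the category.
pub fn is_decimal_digit(c: char) -> bool {
    general_category(c) == GeneralCategory::DecimalNumber
}
```

With this shape, predicates in the "char" module become one-line comparisons against the category, and correctness depends only on the generated table.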

All 6 comments

The module-in-progress called 'unicode::' in libstd is where I was going to sketch out an interface to libicu. The decision is not actually very simple for most of the character classes, and ICU has this well handled. I guess we can expose it under core::char if everyone's cool with adopting a dependency on libicu?

libicu provides many additional desirable features, and it is probably present on most computers (Python uses it, so it should be fine for us).

Do we want to provide public libicu bindings, or just use libicu internally in modules like "char", "str", etc.?
I lean toward the latter.

To implement the functions in Rust's "char" correctly using libicu, I think we only need to call functions like "u_isspace()", "u_isdigit()", "u_forDigit()" (http://icu-project.org/apiref/icu4c/uchar_8h.html).

We wouldn't need full libicu bindings (including the many constant definitions) yet.

I think we should go for the libicu route. See #1370

Can we re-open this? We don't depend on libicu any more, but there's still no easy way of finding a character's category.

Sorry to comment on such an old thread, but I just implemented much of the UCD (v9.0.0) here. It doesn't depend on libicu or the standard library, so hopefully it should be easy to use in other projects (though it's probably not as reliable as ICU).