Additions to std.utf.
Detect whether c is a UTF-8 continuation byte.
Detect whether c is a UTF-16 lead surrogate, a trail surrogate, or not a surrogate at all.
Detect whether c is the first code unit in a sequence.
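The checks above come down to a few bit and range tests. A minimal sketch follows; the function names are hypothetical and are not the names this module exports. It also assumes well-formed input, so bytes that can never start a valid UTF-8 sequence (such as 0xFF) are still reported as sequence starts.

```d
// Hypothetical names for illustration; not the module's actual identifiers.

/// True if `c` is a UTF-8 continuation byte (bit pattern 10xxxxxx).
bool isContinuation(char c) pure nothrow @nogc @safe
{
    return (c & 0xC0) == 0x80;
}

/// True if `c` is a UTF-16 lead (high) surrogate, U+D800..U+DBFF.
bool isLead(wchar c) pure nothrow @nogc @safe
{
    return c >= 0xD800 && c <= 0xDBFF;
}

/// True if `c` is a UTF-16 trail (low) surrogate, U+DC00..U+DFFF.
bool isTrail(wchar c) pure nothrow @nogc @safe
{
    return c >= 0xDC00 && c <= 0xDFFF;
}

/// A UTF-8 code unit starts a sequence unless it is a continuation byte.
bool startsSequence(char c) pure nothrow @nogc @safe
{
    return !isContinuation(c);
}

/// A UTF-16 code unit starts a sequence unless it is a trail surrogate.
bool startsSequence(wchar c) pure nothrow @nogc @safe
{
    return !isTrail(c);
}

/// Every UTF-32 code unit is a complete sequence on its own.
bool startsSequence(dchar c) pure nothrow @nogc @safe
{
    return true;
}
```

For example, "Э" is encoded in UTF-8 as the bytes 0xD0 0xAD, so only its first byte starts a sequence.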
Adjust idx to point at the start of a UTF sequence or at the end of str.
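One way to read this adjustment for UTF-8 is sketched below. It assumes the index moves forward (which is why it can end up at the end of str); the name `adjustToStart` is hypothetical, and the module's own function may differ in name, direction, and supported string types.

```d
/// Hypothetical helper: advances `idx` past UTF-8 continuation bytes until it
/// points at the start of a sequence or equals `str.length`.
void adjustToStart(const(char)[] str, ref size_t idx) pure nothrow @nogc @safe
{
    // A continuation byte has the bit pattern 10xxxxxx.
    while (idx < str.length && (str[idx] & 0xC0) == 0x80)
        ++idx;
}
```

An index that already points at the start of a sequence is left unchanged.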
Returns the minimum/maximum possible length of the result of converting a string to another Unicode Transformation Format.
import std.range;
import std.utf;
const str = "abc-ЭЮЯ";
const wlen = toUTF16(str).length;
const dlen = walkLength(str);
assert(wlen >= minLength!wchar(str) && wlen <= maxLength!wchar(str));
assert(dlen >= minLength!dchar(str) && dlen <= maxLength!dchar(str));
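Such bounds can be derived from the source length alone. The sketch below does this for valid UTF-8 input. The helper names are hypothetical, and the bounds the module actually returns may be computed differently. A code point takes 1 to 4 UTF-8 code units; 1 to 3 units become one UTF-16 unit and 4 units become two, so n UTF-8 units yield between ceil(n / 3) and n UTF-16 units, and between ceil(n / 4) and n UTF-32 units.

```d
import std.range : walkLength;
import std.utf : toUTF16;

// Hypothetical helpers; they assume `s` is valid UTF-8.

/// Fewest UTF-16 code units: at most 3 UTF-8 units per UTF-16 unit.
size_t minUTF16Length(const(char)[] s) pure nothrow @nogc @safe
{
    return (s.length + 2) / 3;
}

/// Most UTF-16 code units: pure ASCII maps one to one.
size_t maxUTF16Length(const(char)[] s) pure nothrow @nogc @safe
{
    return s.length;
}

/// Fewest UTF-32 code units: at most 4 UTF-8 units per code point.
size_t minUTF32Length(const(char)[] s) pure nothrow @nogc @safe
{
    return (s.length + 3) / 4;
}

/// Most UTF-32 code units: pure ASCII maps one to one.
size_t maxUTF32Length(const(char)[] s) pure nothrow @nogc @safe
{
    return s.length;
}

unittest
{
    string str = "abc-ЭЮЯ"; // 10 UTF-8 code units, 7 code points
    const wlen = toUTF16(str).length;
    const dlen = walkLength(str);
    assert(wlen >= minUTF16Length(str) && wlen <= maxUTF16Length(str));
    assert(dlen >= minUTF32Length(str) && dlen <= maxUTF32Length(str));
}
```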
Copies text from source to buff, converting it to a different Unicode Transformation Format if needed.
buff must be large enough to hold the result.
const str = "abc-ЭЮЯ";
wchar[100] wsbuff;
assert(copyEncoded(str, wsbuff) == "abc-ЭЮЯ"w);
Copies as much text from the beginning of source to buff as the latter can hold, converting it to a different Unicode Transformation Format if needed.
source will be set to its uncopied slice.
import std.array: empty;
const(char)[] buff = ...;
wchar[n] wbuff = void;
while(!buff.empty)
    f(buff.copySomeEncoded(wbuff)); // `f` accepts at most `n` wide characters
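The partial-copy behaviour can be sketched for the UTF-8 to UTF-16 case using only `decode` and `encode` from std.utf. The name `copySomeToUTF16` is hypothetical, and this is an illustration of the idea rather than the module's implementation: it copies whole code points while they fit and leaves the remainder in source.

```d
import std.utf : decode, encode;

/// Hypothetical sketch: converts as many whole code points from the start of
/// `source` as fit into `buff`, sets `source` to the uncopied slice, and
/// returns the filled part of `buff`.
wchar[] copySomeToUTF16(ref const(char)[] source, wchar[] buff)
{
    size_t read = 0;    // UTF-8 code units consumed so far
    size_t written = 0; // UTF-16 code units produced so far
    while (read < source.length)
    {
        size_t next = read;
        const dchar c = decode(source, next); // advances `next` past one code point
        wchar[2] units;
        const len = encode(units, c);         // 1 or 2 UTF-16 code units
        if (written + len > buff.length)
            break;                            // never split a surrogate pair
        buff[written .. written + len] = units[0 .. len];
        written += len;
        read = next;
    }
    source = source[read .. $];
    return buff[0 .. written];
}
```

With this sketch the buffer needs room for at least two wide characters; otherwise a code point outside the BMP never fits and a loop like the one above makes no progress.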