ReFreezed edited this page Jun 1, 2021 · 2 revisions

Note: The documentation has moved to the LuaWebGen website. Information here may be out of date!

[v1.3] The UTF-8 module, available through the utf8 global, contains UTF-8-related helper functions.

Note: Positions and lengths are given in bytes, unless otherwise specified.


codepointToString

string = utf8.codepointToString( codepoint )
utf8.codepointToString( codepoint, outputArray )

Convert a single Unicode codepoint to a string, optionally adding the result to an array. Raises an error if the codepoint is outside the valid range.
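To illustrate what such a conversion involves, here is a minimal pure-Lua sketch of UTF-8 encoding. It is not LuaWebGen's implementation: the name encodeCodepoint is hypothetical, the valid range is assumed to be 0 to 0x10FFFF, and surrogate codepoints are not rejected.

```lua
-- Hypothetical helper for illustration only; not part of LuaWebGen.
-- Encodes one codepoint as a UTF-8 byte string using plain arithmetic,
-- so it works without a bit library.
local function encodeCodepoint(cp)
    -- Assumption: the valid range is 0..0x10FFFF (whole numbers only).
    if type(cp) ~= "number" or cp % 1 ~= 0 or cp < 0 or cp > 0x10FFFF then
        error("codepoint out of range", 2)
    end
    local floor = math.floor
    if cp < 0x80 then
        return string.char(cp)
    elseif cp < 0x800 then
        return string.char(0xC0 + floor(cp / 0x40),
                           0x80 + cp % 0x40)
    elseif cp < 0x10000 then
        return string.char(0xE0 + floor(cp / 0x1000),
                           0x80 + floor(cp / 0x40) % 0x40,
                           0x80 + cp % 0x40)
    else
        return string.char(0xF0 + floor(cp / 0x40000),
                           0x80 + floor(cp / 0x1000) % 0x40,
                           0x80 + floor(cp / 0x40) % 0x40,
                           0x80 + cp % 0x40)
    end
end

print(encodeCodepoint(0xDC) == "\195\156") -- true (U+00DC is Ü, two bytes)
print(#encodeCodepoint(0x20AC))            -- 3 (U+20AC is the euro sign)
```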

getCharacterLength

length = utf8.getCharacterLength( string [, position=1 ] )

Get the number of bytes that the character at position takes up (between 1 and 4). Returns nil if the string is invalid at position. Example:

local s = "aÜx"
print(utf8.getCharacterLength(s, 1)) -- 1 (a)
print(utf8.getCharacterLength(s, 2)) -- 2 (Ü)
print(utf8.getCharacterLength(s, 4)) -- 1 (x)

getCodepointAndLength

codepoint, length = utf8.getCodepointAndLength( string [, position=1 ] )

Get the codepoint of the character at position, and the number of bytes it takes up. Returns nil if the string is invalid at position.
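The following sketch shows one way a codepoint and its byte length can be decoded at a byte position. It is an illustration under stated assumptions, not LuaWebGen's code: decodeAt is a hypothetical name, and the checks are simplified (overlong three- and four-byte forms and surrogates are not rejected).

```lua
-- Hypothetical helper for illustration only; not part of LuaWebGen.
-- Returns codepoint and byte length, or nil if the bytes at pos
-- do not look like a UTF-8 character (simplified validation).
local function decodeAt(s, pos)
    pos = pos or 1
    local b1 = s:byte(pos)
    if not b1 then return nil end
    if b1 < 0x80 then return b1, 1 end -- ASCII

    -- The lead byte tells how many bytes the character uses.
    local len, cp
    if     b1 >= 0xC2 and b1 <= 0xDF then len, cp = 2, b1 - 0xC0
    elseif b1 >= 0xE0 and b1 <= 0xEF then len, cp = 3, b1 - 0xE0
    elseif b1 >= 0xF0 and b1 <= 0xF4 then len, cp = 4, b1 - 0xF0
    else return nil end -- continuation byte or invalid lead byte

    -- Each continuation byte (0x80..0xBF) contributes six bits.
    for i = pos + 1, pos + len - 1 do
        local c = s:byte(i)
        if not c or c < 0x80 or c > 0xBF then return nil end
        cp = cp * 0x40 + (c - 0x80)
    end
    return cp, len
end

local s = "a\195\156x" -- "aÜx" written with explicit bytes
print(decodeAt(s, 1)) -- 97, 1
print(decodeAt(s, 2)) -- 220, 2
print(decodeAt(s, 3)) -- nil (position 3 is in the middle of Ü)
```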

getLength

length = utf8.getLength( string [, startPosition=1 ] )

Get the total length of a string in characters starting at startPosition. Returns nil and the first error position if the string isn't a valid UTF-8 string. Example:

print(utf8.getLength("aÜx"))    -- 3
print(utf8.getLength("a\255x")) -- nil, 2
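The difference between byte length and character length can be sketched in plain Lua as follows. This is an assumed, simplified approach and not LuaWebGen's implementation: countCharacters is a hypothetical name, and the validation only checks lead-byte ranges and continuation bytes.

```lua
-- Hypothetical helper for illustration only; not part of LuaWebGen.
-- Counts characters from startPos, or returns nil plus the byte
-- position where the first invalid sequence begins.
local function countCharacters(s, startPos)
    local pos   = startPos or 1
    local count = 0
    while pos <= #s do
        local b = s:byte(pos)
        local len
        if     b < 0x80                 then len = 1
        elseif b >= 0xC2 and b <= 0xDF then len = 2
        elseif b >= 0xE0 and b <= 0xEF then len = 3
        elseif b >= 0xF0 and b <= 0xF4 then len = 4
        else return nil, pos end
        -- All following bytes of the character must be 0x80..0xBF.
        for i = pos + 1, pos + len - 1 do
            local c = s:byte(i)
            if not c or c < 0x80 or c > 0xBF then return nil, pos end
        end
        count = count + 1
        pos   = pos + len
    end
    return count
end

local s = "a\195\156x" -- "aÜx" written with explicit bytes
print(#s)                        -- 4 (bytes)
print(countCharacters(s))        -- 3 (characters)
print(countCharacters("a\255x")) -- nil, 2
```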

getStartOfCharacter

startPosition = utf8.getStartOfCharacter( string, position )

Get the position where the character at position begins. Returns nil if the string is invalid at position. Example:

print(utf8.getStartOfCharacter("aÜx", 3)) -- 2
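Finding the start of a character from a position inside it amounts to stepping back over continuation bytes until a lead byte is found. The sketch below shows that idea. It is a simplified illustration, not LuaWebGen's implementation: startOfCharacterAt is a hypothetical name, and bytes after the given position are not validated.

```lua
-- Hypothetical helper for illustration only; not part of LuaWebGen.
-- Returns the byte position where the character covering pos begins,
-- or nil if no plausible lead byte is found.
local function startOfCharacterAt(s, pos)
    local b = s:byte(pos)
    if not b then return nil end

    -- Step back over continuation bytes (0x80..0xBF), at most three.
    local start = pos
    while b >= 0x80 and b <= 0xBF do
        start = start - 1
        if start < 1 or pos - start > 3 then return nil end
        b = s:byte(start)
    end

    -- The lead byte must announce enough bytes to reach pos.
    local len
    if     b < 0x80                 then len = 1
    elseif b >= 0xC2 and b <= 0xDF then len = 2
    elseif b >= 0xE0 and b <= 0xEF then len = 3
    elseif b >= 0xF0 and b <= 0xF4 then len = 4
    else return nil end
    if start + len - 1 < pos then return nil end
    return start
end

local s = "a\195\156x" -- "aÜx" written with explicit bytes
print(startOfCharacterAt(s, 3)) -- 2
print(startOfCharacterAt(s, 4)) -- 4
```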