Module:Roman
Summary
This module contains functions for working with Roman numerals. Currently used by:
Using this module from templates
Numeral
This function converts an Arabic numeral into a Roman numeral. It works for values between 0 (N) and 4999999999 (M̿M̿M̿M̿C̿M̿X̿C̿I̿X̿C̅M̅X̅C̅I̅X̅CMXCIX), which includes the whole range of unsigned 32-bit integers. The output string no longer contains HTML tags; if needed, you can add external CSS formatting such as a serif font family or a small-caps font variant. Arabic numeral zero is output as 'N' (from the Classical Latin adverbs "nec" or "non"), as in the standard CLDR data.
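In plain text, a single overline (multiplying a letter by 1000) is just a combining character placed after each letter, and a double overline multiplies by 1 000 000. The standalone Lua sketch below assumes U+0305 COMBINING OVERLINE, the form the module's code comments mark as recommended; the helper name `overline` is hypothetical and not part of the module.

```lua
-- Hypothetical helper, not part of Module:Roman.
-- U+0305 COMBINING OVERLINE, written as its two UTF-8 bytes (0xCC 0x85).
local OVERLINE = '\204\133'

-- Append the combining overline after every ASCII letter of a Roman numeral.
local function overline(roman)
	-- gsub returns two values; the parentheses keep only the resulting string
	return (roman:gsub('%a', '%0' .. OVERLINE))
end

local fourThousand = overline('IV')           -- renders as I̅V̅, the module's form for 4000
local fiveThousandTwo = overline('V') .. 'II' -- renders as V̅II = 5 x 1000 + 2
```

Because the marks are separate code points rather than precomposed letters, removing them again recovers the plain letters.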
If the input does not look like it contains a number or the number is outside of the supported range, an error message is returned. If an error message is returned, the error message will contain code to categorize pages into Category:Errors reported by Module Roman.
Usage:
{{#invoke:Roman|Numeral|''value''}}
Example:
{{#invoke:Roman|Numeral|8}}
produces VIII.
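The conversion works one decimal digit at a time: each digit indexes a per-position table of Roman strings, and the pieces are concatenated from the most significant position down. The following standalone Lua sketch shows the idea for the four lowest positions only. The names `digitTables` and `toRoman` are hypothetical, and the thousands row stops at 'MMM' (so the sketch covers 0 to 3999), whereas the module continues with overlined letters up to 4 999 999 999.

```lua
-- Hypothetical sketch: table-driven Arabic-to-Roman conversion for 0..3999.
-- Each row maps one decimal digit (0-9) of a given position to its Roman form.
local digitTables = {
	{ [0] = '', 'I', 'II', 'III', 'IV', 'V', 'VI', 'VII', 'VIII', 'IX' }, -- units
	{ [0] = '', 'X', 'XX', 'XXX', 'XL', 'L', 'LX', 'LXX', 'LXXX', 'XC' }, -- tens
	{ [0] = '', 'C', 'CC', 'CCC', 'CD', 'D', 'DC', 'DCC', 'DCCC', 'CM' }, -- hundreds
	{ [0] = '', 'M', 'MM', 'MMM' },                                       -- thousands
}

local function toRoman(value)
	if type(value) ~= 'number' or value ~= math.floor(value)
			or value < 0 or value > 3999 then
		return nil -- outside the range this sketch supports
	end
	if value == 0 then
		return 'N' -- same convention as the module
	end
	local parts = {}
	for position = 1, 4 do
		local digit = value % 10
		value = math.floor(value / 10)
		-- prepend, because digits are extracted from least to most significant
		table.insert(parts, 1, digitTables[position][digit])
	end
	return table.concat(parts)
end

print(toRoman(8))    --> VIII
print(toRoman(1994)) --> MCMXCIV
```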
Arabic
This function converts a Roman numeral into an Arabic numeral. It works for values between 0 (N) and 4999999999 (M̿M̿M̿M̿C̿M̿X̿C̿I̿X̿C̅M̅X̅C̅I̅X̅CMXCIX): this includes the whole range of unsigned 32-bit integers.
If the input does not look like it contains a number or the number is outside of the supported range, an error message is returned. If an error message is returned, the error message will contain code to categorize pages into Category:Errors reported by Module Roman.
Usage:
{{#invoke:Roman|Arabic|''value''}}
Example:
{{#invoke:Roman|Arabic|viii}}
produces 8.
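Parsing goes the other way. As a hedged sketch of the approach used by the private `convertRomanToArabic` function in the code below, this standalone Lua version scans the symbols from right to left, subtracts a symbol whose value is smaller than the largest value already seen to its right, and treats a combining overline (U+0305) as a ×1000 multiplier for the letter it follows. It only knows the seven basic capital letters, 'N' and the single overline; the names `values`, `OVERLINE` and `fromRoman` are hypothetical, and plain Lua string functions stand in for `mw.ustring`.

```lua
-- Hypothetical sketch of right-to-left Roman numeral parsing (plain Lua, no mw.ustring).
local OVERLINE = '\204\133' -- U+0305 COMBINING OVERLINE encoded in UTF-8

local values = { N = 0, I = 1, V = 5, X = 10, L = 50, C = 100, D = 500, M = 1000 }

local function fromRoman(roman)
	-- Split the string into UTF-8 characters so the 2-byte overline stays whole.
	local chars = {}
	for ch in roman:gmatch('[\1-\127\194-\244][\128-\191]*') do
		chars[#chars + 1] = ch
	end
	if #chars == 0 then
		return nil
	end
	local result, largest, multiplier = 0, 0, 1
	for i = #chars, 1, -1 do
		local ch = chars[i]
		if ch == OVERLINE then
			-- The mark follows its letter, so when scanning backwards it is seen first.
			multiplier = multiplier * 1000
		else
			local v = values[ch]
			if v == nil then
				return nil -- unknown symbol
			end
			v, multiplier = v * multiplier, 1
			if v < largest then
				result = result - v -- subtractive notation, e.g. the I in IX
			else
				result = result + v
				largest = v
			end
		end
	end
	return result
end

print(fromRoman('VIII'))                 --> 8
print(fromRoman('MCMXCIV'))              --> 1994
print(fromRoman('V' .. OVERLINE .. 'I')) --> 5001
```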
Using this module from Lua code
To use the functions in this module from another Lua module, you first have to import it with require.
Example:
local roman = require('Module:Roman')
_Numeral
This function converts an Arabic numeral into a Roman numeral. It works for values between 0 and 4999999999. The output string is plain text (overlines are written with Unicode combining characters, not HTML tags), and Arabic numeral zero is output as 'N'. If the input does not look like it contains a number or the number is outside of the supported range, an error message is returned. If an error message is returned, it will contain code to categorize pages into Category:Errors reported by Module Roman.
Usage:
roman_value = roman._Numeral(value)
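The input contract described above (a number or numeric string, integer, from 0 to 4 999 999 999) can be sketched as a small validation step in front of the converter. This is an illustration under assumptions: `validateNumeral` and `MAX_VALUE` are hypothetical names, and the sketch returns `nil` plus a short reason instead of the module's wiki-formatted error markup.

```lua
-- Hypothetical sketch of the documented input contract; not the module's own code.
local MAX_VALUE = 4999999999

-- Returns the validated number, or nil plus a short reason.
local function validateNumeral(value)
	if value == nil then
		return nil, 'missing value'
	end
	if type(value) == 'string' then
		value = tonumber(value) -- "8" and " 42 " convert; "abc" gives nil
	end
	if type(value) ~= 'number' then
		return nil, 'unsupported value'
	end
	-- written as a negated range test so that NaN is rejected too
	if not (value >= 0 and value <= MAX_VALUE and value == math.floor(value)) then
		return nil, 'value out of range'
	end
	return value
end

local ok, reason = validateNumeral('-3')
-- ok is nil and reason is 'value out of range'
```

Writing the range check as `not (a and b and c)` rather than as separate `<` and `>` tests is a deliberate choice: every comparison with NaN is false, so NaN falls into the error branch instead of slipping through.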
isRoman
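Tests if the input is a valid Roman numeral. Returns true if so, false if not. Surrounding whitespace is trimmed first, and for the purposes of this function the empty string is not a valid Roman numeral. The test is lenient: any string made only of recognized numeral symbols (including 'N') is accepted, so non-canonical forms such as 'IIII' also return true.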
Usage:
if roman.isRoman(roman_value) then
	arabic_value = roman.toArabic(roman_value)
end
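Because the test only asks whether every character is a recognized numeral symbol, it does not check ordering. A minimal standalone sketch of that lenient behavior, restricted to ASCII capitals; the name `looksRoman` and its symbol set are hypothetical (the module also accepts lowercase letters, overlines and the Unicode Roman numeral symbols):

```lua
-- Hypothetical, ASCII-only sketch of a lenient "is this a Roman numeral?" test.
local function looksRoman(s)
	if type(s) ~= 'string' then
		return false
	end
	s = s:match('^%s*(.-)%s*$') -- trim surrounding whitespace
	-- Only membership is checked, not ordering; an empty string does not match '+'.
	return s:find('^[NIVXLCDM]+$') ~= nil
end

print(looksRoman(' XIV ')) --> true
print(looksRoman('IIII'))  --> true
print(looksRoman('XIV2'))  --> false
```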
toArabic
This function converts a Roman numeral into an Arabic numeral. It works for values between 0 and 4999999999. The string 'N' is converted to zero, and a Lua number is returned unchanged. If the input is not a valid Roman numeral, this function attempts to parse it as an Arabic number and returns nil if that also fails.
Usage:
arabic_value = roman.toArabic(value)
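The fallback order matters: Roman parsing is attempted first, so 'N' yields 0, and only then is the input read as a decimal number. A self-contained sketch of that order, assuming a minimal ASCII-only parser; `parseRoman` and `toArabicSketch` are hypothetical names, not the module's API.

```lua
local values = { N = 0, I = 1, V = 5, X = 10, L = 50, C = 100, D = 500, M = 1000 }

-- Minimal right-to-left parser for ASCII capitals; returns nil on unknown symbols.
local function parseRoman(s)
	if s == '' then
		return nil
	end
	local result, largest = 0, 0
	for i = #s, 1, -1 do
		local v = values[s:sub(i, i)]
		if v == nil then
			return nil
		end
		if v < largest then
			result = result - v
		else
			result, largest = result + v, v
		end
	end
	return result
end

-- Roman first, then decimal; numbers pass through unchanged; anything else gives nil.
local function toArabicSketch(input)
	if type(input) == 'number' then
		return input
	elseif type(input) ~= 'string' then
		return nil
	end
	input = input:match('^%s*(.-)%s*$') -- trim surrounding whitespace
	return parseRoman(input) or tonumber(input)
end
```

Using `or` is safe here because 0 is truthy in Lua, so a parsed 'N' does not fall through to `tonumber`.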
See Also
Modules related to internationalization (i18n) of dates
Code
--[[
This module converts Arabic numerals into Roman numerals.
It currently works for any non-negative integer below 5 billion (up to 4 999 999 999).
Zero is represented as "N" (from Classical Latin adverbs "nec" or "non"), like in standard CLDR data.
From 4000 upward, this version no longer generates any HTML+CSS, only plain text:
standard Unicode combining diacritics are used for overlines (U+0305 COMBINING OVERLINE for the
first level, U+033F COMBINING DOUBLE OVERLINE for the second level); when parsing, U+0304
COMBINING MACRON is treated the same as U+0305.
From 4 billion upward, it still uses four letters M with double overlines, because
triple overlines are not supported in plain text (this is acceptable, just as "MMMM" is
acceptable for 4000, although this version chooses the shorter "IV" with a single overline).
The Roman numeral parser accepts more notations than it generates (all valid notations except
the apostrophic/Claudian/lunate ones using a reversed C) and correctly converts them to Arabic numbers.
Please do not modify this code without applying the changes first at Module:Roman/sandbox and testing
at Module:Roman/sandbox/testcases and Module talk:Roman/sandbox/testcases.
Authors and maintainers:
* User:RP88, User:Verdy_p
]]
require('strict');
local concat = table.concat
local floor = math.floor
local trim = mw.text.trim
local len = mw.ustring.len
local sub = mw.ustring.sub
local find = mw.ustring.find
local getArgs = require('Module:Arguments').getArgs
local p = {}
--[============[
Private data
--]============]
-- See CLDR data /common/rbnf/root.xml for "roman-upper" rules. However we still don't
-- use the rarely supported Roman extension digits after 'M' (in U+2160..2188), but use
-- the more common notation with diacritical overlines ('ↁ' = 'V̅', 'ↂ' = 'X̅', etc.).
-- Please avoid using HTML with "text-decoration:overline" style, but use plain-text
-- combining characters (U+0304 or U+0305 for single overline, U+033F for double).
-- In this table, combining overlines (U+0305) are preferred to macrons (U+0304), as
-- they align with double overlines (U+033F). None of them are precombined, so you
-- can easily detect and remove them from generated strings. Likewise, the other
-- compatibility forms (listed in the next table) are avoided in the generated strings.
local decimalRomans = {
d0 = { [0] = '', 'I', 'II', 'III', 'IV', 'V', 'VI', 'VII', 'VIII', 'IX' },
d1 = { [0] = '', 'X', 'XX', 'XXX', 'XL', 'L', 'LX', 'LXX', 'LXXX', 'XC' },
d2 = { [0] = '', 'C', 'CC', 'CCC', 'CD', 'D', 'DC', 'DCC', 'DCCC', 'CM' },
d3 = { [0] = '', 'M', 'MM', 'MMM', 'I̅V̅', 'V̅', 'V̅I̅', 'V̅I̅I̅', 'V̅I̅I̅I̅', 'I̅X̅' },
d4 = { [0] = '', 'X̅', 'X̅X̅', 'X̅X̅X̅', 'X̅L̅', 'L̅', 'L̅X̅', 'L̅X̅X̅', 'L̅X̅X̅X̅', 'X̅C̅' },
d5 = { [0] = '', 'C̅', 'C̅C̅', 'C̅C̅C̅', 'C̅D̅', 'D̅', 'D̅C̅', 'D̅C̅C̅', 'D̅C̅C̅C̅', 'C̅M̅' },
d6 = { [0] = '', 'M̅', 'M̅M̅', 'M̅M̅M̅', 'I̿V̿', 'V̿', 'V̿I̿', 'V̿I̿I̿', 'V̿I̿I̿I̿', 'I̿X̿' },
d7 = { [0] = '', 'X̿', 'X̿X̿', 'X̿X̿X̿', 'X̿L̿', 'L̿', 'L̿X̿', 'L̿X̿X̿', 'L̿X̿X̿X̿', 'X̿C̿' },
d8 = { [0] = '', 'C̿', 'C̿C̿', 'C̿C̿C̿', 'C̿D̿', 'D̿', 'D̿C̿', 'D̿C̿C̿', 'D̿C̿C̿C̿', 'C̿M̿' },
d9 = { [0] = '', 'M̿', 'M̿M̿', 'M̿M̿M̿', 'M̿M̿M̿M̿' },
}
local romanDecimals = {
-- Basic Latin capital letters
C = 100,
D = 500, -- TODO: add Medieval "apostrophic/Claudian/lunate" notations like 'IƆ'
I = 1,
J = 1, -- = 'I'. Modern distinct form of the formerly unified Roman letter 'I'
L = 50,
M = 1000,
N = 0, -- Abbreviated "nec" or "non" adverb in Classical Latin
O = 0, -- = 'N'.
U = 5, -- = 'V'. Modern distinct form of the formerly unified Roman letter 'V'
V = 5,
W = 10, -- = 'VV' = 'X'. Modern distinct form of the former Roman archaic digraph 'VV' used as a numeral before the adoption of 'X'
X = 10,
Y = 2, -- = 'IJ' = 'II'. Uncommon ligature (borrowed from the lowercased form)
-- Basic Latin small letters (not used in Classical Latin, but added in Medieval Latin)
c = 100, -- = 'C'.
d = 500, -- = 'D'. TODO: add Medieval "apostrophic/Claudian/lunate" notations like 'IƆ'
i = 1, -- = 'I'.
j = 1, -- = 'J' = 'i'. Modern distinct form of the formerly unified Roman letter 'i'
l = 50, -- = 'L'.
m = 1000, -- = 'M'.
n = 0, -- = 'N'. Abbreviated "nec" or "non" adverb in Classical Latin
o = 0, -- = 'O' = 'n'.
u = 5, -- = 'U' = 'v'. Modern distinct form of the formerly unified Roman letter 'v'
v = 5, -- = 'V'.
w = 10, -- = 'W' = 'vv' = 'x'. Modern distinct form of the former Roman archaic digraph 'vv' used as a numeral before the adoption of 'x'
x = 10, -- = 'X'.
y = 2, -- = 'Y' = 'ij' = 'ii'. Uncommon ligature
-- U+012A .. U+012B : LATIN LETTER I/i WITH COMBINING MACRON, canonically equivalent to 'I' and U+0304
['Ī'] = 1000, ['ī'] = 1000,
-- U+016A .. U+016B : LATIN LETTER U/u WITH COMBINING MACRON, canonically equivalent to 'U' and U+0304, modern distinct form of the formerly unified Roman letter 'V'
['Ū'] = 5000, ['ū'] = 5000,
-- U+026A : LATIN SMALL CAPITAL LETTER I
['ɪ'] = 1, -- = 'I'
-- U+029F : LATIN SMALL CAPITAL LETTER L
['ʟ'] = 50, -- = 'L'
-- U+0304 .. U+0305 : COMBINING MACRON/OVERLINE
['\204\132'] = -1000, -- (0xCC,0x84 in UTF-8) multiplier (thousand), MACRON (compatibility)
['\204\133'] = -1000, -- (0xCC,0x85 in UTF-8) multiplier (thousand), OVERLINE (recommended)
-- U+033F : COMBINING DOUBLE OVERLINE
['\204\191'] = -1000000, -- (0xCC,0xBF in UTF-8) multiplier (million)
-- U+1D04 : LATIN SMALL CAPITAL LETTER C
['ᴄ'] = 100, -- = 'C'
-- U+1D05 : LATIN SMALL CAPITAL LETTER D
['ᴅ'] = 500, -- = 'D'
-- U+1D0A : LATIN SMALL CAPITAL LETTER J
['ᴊ'] = 1, -- = 'J' = 'I'
-- U+1D0D : LATIN SMALL CAPITAL LETTER M
['ᴍ'] = 1000, -- = 'M'
-- U+1D1C : LATIN SMALL CAPITAL LETTER U
['ᴜ'] = 5, -- = 'U' = 'V'
-- U+1D20 : LATIN SMALL CAPITAL LETTER V
['ᴠ'] = 5, -- = 'V'
-- U+2160 .. U+216F : Roman capital digit symbols (compatibility, monospaced in CJK fonts)
['Ⅰ'] = 1, ['Ⅱ'] = 2, ['Ⅲ'] = 3, ['Ⅳ'] = 4, ['Ⅴ'] = 5, ['Ⅵ'] = 6,
['Ⅶ'] = 7, ['Ⅷ'] = 8, ['Ⅸ'] = 9, ['Ⅹ'] = 10, ['Ⅺ'] = 11, ['Ⅻ'] = 12,
['Ⅼ'] = 50, ['Ⅽ'] = 100, ['Ⅾ'] = 500, ['Ⅿ'] = 1000,
-- U+2170 .. U+217F : Roman lowercase digit symbols (compatibility, monospaced in CJK fonts)
['ⅰ'] = 1, ['ⅱ'] = 2, ['ⅲ'] = 3, ['ⅳ'] = 4, ['ⅴ'] = 5, ['ⅵ'] = 6,
['ⅶ'] = 7, ['ⅷ'] = 8, ['ⅸ'] = 9, ['ⅹ'] = 10, ['ⅺ'] = 11, ['ⅻ'] = 12,
['ⅼ'] = 50, ['ⅽ'] = 100, ['ⅾ'] = 500, ['ⅿ'] = 1000,
-- U+2180 .. U+2182 : Old Roman symbols (these have no case pairs)
['ↀ'] = 1000, -- = 'I̅' = 'M'. TODO: add Medieval "apostrophic/Claudian/lunate" notations like 'CIƆ'; do not confuse it with 'CD' (400=500-100)
['ↁ'] = 5000, -- = 'V̅'. TODO: add Medieval "apostrophic/Claudian/lunate" notations like 'DƆ' and 'IƆƆ'
['ↂ'] = 10000, -- = 'X̅'. TODO: add Medieval "apostrophic/Claudian/lunate" notations like 'CCIƆƆ'
-- U+2183..U+2184 : ROMAN DIGIT (CAPITAL|LOWER) REVERSED C.
-- TODO: add for "apostrophic/Claudian/lunate" notations (and support 'Ɔ' LATIN OPEN O as aliases)
-- The "reversed C" ('Ɔ') is a trailing multiplier by 10, but if it is not paired with a leading 'C', the enclosed value is divided by 2:
-- * "I" = 1, but if followed by 'Ɔ', it takes the value 100:
-- * when followed by a first 'Ɔ' it multiplies it by 10 giving 1000 (assuming 'CIƆ'), but if not prefixed by a pairing 'C', gives 500 for 'IƆ' = 'D'.
-- * when followed by a second 'Ɔ' it multiplies it by 10 giving 10000 (assuming 'CCIƆƆ'), but if not prefixed by a pairing 'C', gives 5000 for 'IƆƆ' = 'DƆ'.
-- * for higher multiples, using overlines is highly preferred for noting multipliers by 1000.
-- U+2185: ROMAN NUMERAL SIX LATE FORM (borrowed in Latin, from Greek Final sigma in capital form, similar to 'C' with a leg)
['ↅ'] = 6, -- = 'VI'
-- U+2186: ROMAN NUMERAL FIFTY EARLY FORM ('VI' overstriking letters)
['ↆ'] = 50, -- = 'L'
-- U+2187 .. U+2188: ROMAN NUMERAL (FIFTY|ONE HUNDRED) THOUSAND (Archaic, rarely supported in fonts)
['ↇ'] = 50000, -- = 'L̅'. TODO: add Medieval "apostrophic/Claudian/lunate" notations like 'DƆƆ' and 'IƆƆƆ'
['ↈ'] = 100000, -- = 'C̅'. TODO: add Medieval "apostrophic/Claudian/lunate" notations like 'CCCDƆƆ' and 'CCCIƆƆƆ'
-- TODO: map mathematical symbols (Latin letters in designated styles), enclosed Latin letters, CJK wide variants
}
--[=================[
Private functions
--]=================]
--[==[
This function returns a string containing the input value formatted as a Roman numeral.
It works for non-negative integers below 5 billion (up to 4 999 999 999, which covers
all unsigned 32-bit integers); otherwise it returns the number formatted using Latin
digits. The result is a UTF-8-encoded plain-text alphabetic string.
]==]--
local function convertArabicToRoman(value)
	if value == nil then
		return ''
	elseif value == 0 then
		return 'N' -- for adverbs "nec" or "non" in Classical Latin (which had no zero)
	elseif value >= 1 and value <= 4999999999 and value == floor(value) then
		-- Split off decimal digits, least significant first; value ends up holding the billions digit.
		local d0, d1, d2, d3, d4, d5, d6, d7, d8
		d0, value = value % 10, floor(value / 10)
		d1, value = value % 10, floor(value / 10)
		d2, value = value % 10, floor(value / 10)
		d3, value = value % 10, floor(value / 10)
		d4, value = value % 10, floor(value / 10)
		d5, value = value % 10, floor(value / 10)
		d6, value = value % 10, floor(value / 10)
		d7, value = value % 10, floor(value / 10)
		d8, value = value % 10, floor(value / 10)
		return concat({
			decimalRomans.d9[value],
			decimalRomans.d8[d8],
			decimalRomans.d7[d7],
			decimalRomans.d6[d6],
			decimalRomans.d5[d5],
			decimalRomans.d4[d4],
			decimalRomans.d3[d3],
			decimalRomans.d2[d2],
			decimalRomans.d1[d1],
			decimalRomans.d0[d0],
		})
	else
		return tostring(value)
	end
end
--[==[
This function converts a plain-text string containing a Roman numeral to an integer.
It works for values between 0 (N) and 4 999 999 999 (M̿M̿M̿M̿C̿M̿X̿C̿I̿X̿C̅M̅X̅C̅I̅X̅CMXCIX).
]==]--
local function convertRomanToArabic(roman)
	if roman == '' then return nil end
	local result, prevRomanDecimal, multiplier = 0, 0, 1
	-- Scan from right to left, so a combining overline is seen before the letter it modifies.
	for i = len(roman), 1, -1 do
		local currentRomanDecimal = romanDecimals[sub(roman, i, i)]
		if currentRomanDecimal == nil then
			return nil -- unknown symbol: not a Roman numeral
		elseif currentRomanDecimal < 0 then
			-- combining overline: multiply the next letter (to the left) by 1000 or 1000000
			multiplier = -currentRomanDecimal * multiplier
		else
			currentRomanDecimal, multiplier = currentRomanDecimal * multiplier, 1
			if currentRomanDecimal < prevRomanDecimal then
				-- subtractive notation, e.g. the 'I' in 'IX'
				result = result - currentRomanDecimal
			else
				result = result + currentRomanDecimal
				prevRomanDecimal = currentRomanDecimal
			end
		end
	end
	return result
end
--[==[
This function converts a string containing a Roman numeral to an integer.
It works for values between 0 and 4999999999.
It was meant to accept HTML tags using style="text-decoration:overline" (not recommended),
but that parsing is currently disabled: the string is parsed as a plain-text Roman numeral,
with a fallback to tonumber().
]==]--
local function convertRomanHTMLToArabic(roman)
	local result = convertRomanToArabic(roman)
	if result == nil then
		result = tonumber(roman)
	end
	return result
	--[==[ DISABLED FOR NOW, NOT REALLY NEEDED AND NOT CORRECTLY TESTED
	local result = 0
	local overline_start_len = len(overline_start)
	if sub(roman, 1, overline_start_len) == overline_start then
		local end_tag_start, end_tag_end = find(roman, overline_end, overline_start_len, true)
		if end_tag_start ~= nil then
			local roman_high = sub(roman, overline_start_len + 1, end_tag_start - 1)
			local roman_low = sub(roman, end_tag_end + 1, len(roman)) or ''
			if find(roman_high, "^[mdclxvi]+$") and find(roman_low, "^[mdclxvi]*$") then
				result = convertRomanToArabic(roman_high) * 1000 + convertRomanToArabic(roman_low)
			end
		end
	end
	return result
	]==]
end
--[==[
Helper function to handle error messages.
]==]--
local function outputError(message)
	return concat({
		'<strong class="error">Roman Module Error: ', message,
		'</strong>[[Category:Errors reported by Module Roman]]'
	})
end
--[================================[
Public functions for Lua modules
--]================================]
--[==[
isRoman
Tests if the trimmed input is a valid Roman numeral. Returns true if so, false if not.
For the purposes of this function, the empty string (after trimming whitespace) is not a Roman numeral.
Parameters:
s: string to test if it is a valid Roman numeral
Error Handling:
If the input is not a valid Roman numeral this function returns false.
]==]--
function p.isRoman(s)
	return type(s) == 'string' and convertRomanToArabic(trim(s)) ~= nil
end
--[==[
toArabic
This function converts a Roman numeral into an Arabic numeral.
It works for values between 0 and 4999999999.
'N' is converted to 0 and the empty string is converted to nil.
Parameters:
roman: string containing the value to convert into an Arabic numeral (a number is returned unchanged)
Error Handling:
If the input is neither a valid Roman numeral nor a decimal number, this function returns nil.
]==]--
local function toArabic(roman)
	if type(roman) == 'string' then
		roman = trim(roman)
		local result = convertRomanToArabic(roman)
		if result == nil then
			result = tonumber(roman)
		end
		return result
	elseif type(roman) == 'number' then
		return roman
	else
		return nil
	end
end
p.toArabic = toArabic
--[==[
_Numeral
This function returns a string containing the input value formatted as a Roman numeral.
It works for values between 0 and 4999999999.
Parameters:
value: integer or string containing value to convert into a Roman numeral
Error Handling:
If the input does not look like it contains a number
or the number is outside of the supported range,
an error message is returned.
]==]--
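local function _Numeral(value)
	if value == nil then
		return outputError('missing value')
	end
	if type(value) == 'string' then
		value = tonumber(value)
		if value == nil then
			return outputError('invalid value')
		end
	elseif type(value) ~= 'number' then
		return outputError('unsupported value')
	end
	-- Reject negative, fractional, too large, infinite or NaN values, as documented above.
	if not (value >= 0 and value <= 4999999999 and value == floor(value)) then
		return outputError('value out of range')
	end
	return convertArabicToRoman(value)
end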
p._Numeral = _Numeral
--[==============================[
Public functions for MediaWiki
--]==============================]
--[==[
Arabic
This function for MediaWiki converts a Roman numeral into an Arabic numeral.
It works for values between 0 and 4999999999.
'N' is converted to 0.
Usage:
{{#invoke:Roman|Arabic|<value>}}
{{#invoke:Roman|Arabic}} - uses the caller's parameters
Parameters:
1: Value to convert into an Arabic numeral. Must be a valid Roman numeral (a plain decimal number is also accepted).
Error Handling:
If the input is missing, does not look like it contains a number,
or the number is outside of the supported range,
an error message is returned.
]==]--
function p.Arabic(frame)
	-- getArgs reads the #invoke arguments, then falls back to the parent template's arguments
	local args = getArgs(frame)
	if args[1] == nil then
		return outputError('missing value')
	end
	local result = toArabic(args[1])
	if result == nil then
		return outputError('invalid value')
	end
	if not (result >= 0 and result <= 4999999999) then
		return outputError('value out of range')
	end
	return result
end
--[==[
Numeral
This function for MediaWiki converts an Arabic numeral into a Roman numeral.
It works for values between 0 and 4999999999 (includes the whole range of unsigned 32-bit integers).
Arabic numeral zero is output as 'N' (for Latin negation adverbs "nec" or "non").
Usage:
{{#invoke:Roman|Numeral|<value>}}
{{#invoke:Roman|Numeral}} - uses the caller's parameters
Parameters:
1: Value to convert into a Roman numeral. Must be an integer of at least 0 and less than 5,000,000,000.
Error Handling:
If the input does not look like it contains a number
or the number is outside of the supported range,
an error message is returned.
]==]--
function p.Numeral(frame)
	local args = getArgs(frame)
	return p._Numeral(args[1])
end
return p