User:Cmglee/extract lang.py
This Python 3 script by user:cmglee extracts a monolingual SVG file from a multilingual SVG file and writes it to disk, so that a language version can be previewed in a Web browser during development. (The alternative way to view a non-default language of a multilingual SVG in a Web browser is to change the browser's language, installing it first if necessary, and restart the browser.)
Output filenames are the original filename with "-<ISO639_CODE>" added before the last ".".
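As a minimal sketch of this naming rule, the language code can be placed before the final extension as follows. The helper name `output_path` is hypothetical and is not part of the script, and `os.path.splitext` is used here instead of the script's regular expression.

```python
import os

def output_path(path_in, lang):
    """Return path_in with "-<lang>" inserted before the last "."."""
    ## os.path.splitext splits at the last "." of the final path component
    root, ext = os.path.splitext(path_in)
    return '%s-%s%s' % (root, lang, ext)

print(output_path('map.svg', 'de'))          ## map-de.svg
print(output_path('my.world.map.svg', 'fr')) ## my.world.map-fr.svg
```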
Usage
python3 extract_lang.py <SVG_FILENAME> [<ISO639_CODE> <ISO639_CODE> ...]
ISO639_CODE is as listed on commons:template:list of supported languages. If no codes are provided, all languages found in the file (and the default) are output.
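The set of languages output when no codes are given can be illustrated with a short sketch. It reuses the `systemLanguage` pattern from the source code; the SVG fragment `svg_sample` is made up for illustration.

```python
import re

## Same pattern as the script's re_lang: captures the value of each systemLanguage attribute
re_lang = re.compile(r'\s*systemLanguage\s*=\s*"\s*([^\s"]+)"', flags=re.I)

## Hypothetical multilingual SVG fragment, for illustration only
svg_sample = '''<svg xmlns="http://www.w3.org/2000/svg">
 <switch>
  <text systemLanguage="de">Hallo</text>
  <text systemLanguage="fr">Bonjour</text>
  <text>Hello</text>
 </switch>
</svg>'''

## All codes found in the file, plus the pseudo-code "default" for the fallback
langs = sorted(set(re_lang.findall(svg_sample) + ['default']))
print(langs) ## ['de', 'default', 'fr']
```

For this fragment, running the script without codes would therefore write three files, one each for `de`, `fr` and `default`.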
Source code
As Wikimedia does not allow general executable files to be uploaded, the source code is provided below.
#!/usr/bin/env python3
## Extract and write a monolingual SVG from a multilingual SVG, to preview in a browser, by CMG Lee.
## Usage: python3 extract_lang.py <SVG_FILENAME> [<ISO639_CODE> <ISO639_CODE> ...] (all if omitted)
import re, sys
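def extract_lang(svg_all, lang):
    ## Split the content of one <switch> into its top-level children, keyed by language
    svg_langs    = {} ## svg_langs[code] = source (key None holds the default child)
    svg_currents = [] ## parts of the child currently being collected
    level        = 1  ## DOM level under switch
    for svg_part in re.findall(r'.*?>', svg_all, flags=re.DOTALL):
        svg_currents.append(svg_part)
        if re.findall(r'<\s*/', svg_part): level -= 1       ## closing tag
        elif not re.findall(r'/\s*>', svg_part): level += 1 ## opening tag that is not self-closing
        if level == 1: ## back at the top level, so one child of the switch is complete
            findall_lang = re_lang.findall(svg_currents[0])
            lang_current = findall_lang[0] if len(findall_lang) > 0 else None
            svg_langs[lang_current] = ''.join(svg_currents)
            svg_currents = []
    ## Return the requested child, else the default child, else nothing (avoids a KeyError
    ## when a switch has no default), with systemLanguage attributes stripped
    return re_lang.sub('', svg_langs.get(lang, svg_langs.get(None, '')))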
re_lang = re.compile(r'\s*systemLanguage\s*=\s*"\s*([^\s"]+)"', flags=re.I)
path_in = sys.argv[1]
with open(path_in, encoding='utf-8', newline='') as f: svg_in = f.read()
for lang in sys.argv[2:] if len(sys.argv) > 2 else set(re_lang.findall(svg_in) + ['default']):
    ## Insert "-<lang>" before the last "." of the filename (not the first "." of the path)
    path_out = re.sub(r'(\.[^./\\]*)$', r'-%s\1' % (lang), path_in)
    print(path_out)
    ## Strip comments (including multi-line ones), then replace each <switch>...</switch>
    ## with the child selected for lang
    svg_out = re.sub(r'(<\s*switch[^>]*>)(.*?)(\s*<\s*/\s*switch[^>]*>)',
                     lambda matchs: extract_lang(matchs.group(2), lang),
                     re.sub(r'<!--.*?-->', '', svg_in, flags=re.DOTALL), flags=re.I | re.DOTALL)
    with open(path_out, 'w', encoding='utf-8', newline='') as f: f.write(svg_out)
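For comparison, the same selection can be sketched with Python's standard XML parser instead of regular expressions. This is not the author's method: the function name `extract_lang_et` and the sample SVG are hypothetical, the `<switch>` wrapper is kept rather than removed, and `systemLanguage` is assumed to hold a single code that is matched exactly, as in the script.

```python
import xml.etree.ElementTree as ET

SVG_NS = 'http://www.w3.org/2000/svg'

def extract_lang_et(svg_text, lang):
    """Keep, in every <switch>, only the child for lang (or the default child)."""
    ET.register_namespace('', SVG_NS) ## serialise without an ns0: prefix
    root = ET.fromstring(svg_text)
    ## Collect the switches first so that the tree is not modified while iterating
    for switch in list(root.iter('{%s}switch' % SVG_NS)):
        children = list(switch)
        matches  = [c for c in children if c.get('systemLanguage') == lang]
        defaults = [c for c in children if c.get('systemLanguage') is None]
        keep = (matches or defaults)[:1] ## first match, else first default, else none
        for child in children:
            if child not in keep: switch.remove(child)
        for child in keep: child.attrib.pop('systemLanguage', None)
    return ET.tostring(root, encoding='unicode')

## Hypothetical multilingual SVG, for illustration only
svg_sample = '''<svg xmlns="http://www.w3.org/2000/svg">
 <switch>
  <text systemLanguage="de">Hallo</text>
  <text systemLanguage="fr">Bonjour</text>
  <text>Hello</text>
 </switch>
</svg>'''

svg_de      = extract_lang_et(svg_sample, 'de')      ## keeps only "Hallo"
svg_default = extract_lang_et(svg_sample, 'default') ## no match, so keeps "Hello"
print(svg_de)
```

Unlike the script, a parser-based approach fails on input that is not well-formed XML and discards comments and the XML declaration when parsing, so it is better suited to checking results than to replacing the script.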