path: root/src/backend/utils/mb/Unicode
Age | Commit message | Author
2023-01-02 | Update copyright for 2023 | Bruce Momjian
Backpatch-through: 11
2022-01-08 | Update copyright for 2022 | Bruce Momjian
Backpatch-through: 10
2021-10-04 | Make Unicode makefile parallel-safe | Peter Eisentraut
Fix the rules so that each rule is parallel safe, using the same trickery that we use elsewhere in the tree for rules that produce more than one output file. Refactor the whole makefile so that there is less repetition. Discussion: https://www.postgresql.org/message-id/18e34084-aab1-1b4c-edd1-c4f9fb04f714%40enterprisedb.com
2021-10-04 | Update Unicode map text files | Peter Eisentraut
A couple of newer ones are available. There are no functional differences, but let's get them in anyway, so that there is no surprise diff next time someone wants to do some actual work in this area.
2021-07-22 | Fix typo in comment | Peter Eisentraut
Author: Kyotaro Horiguchi <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/20210716.170209.175434392011070182.horikyota.ntt%40gmail.com
2021-07-19 | Remove some whitespace in generated C output | Peter Eisentraut
It doesn't match the normal coding style. Reviewed-by: Kyotaro Horiguchi <[email protected]> Reviewed-by: Heikki Linnakangas <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/[email protected]
2021-07-19 | Make UCS_to_most.pl process encodings in sorted order | Peter Eisentraut
This just makes the progress output easier to follow. Reviewed-by: Kyotaro Horiguchi <[email protected]> Reviewed-by: Heikki Linnakangas <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/[email protected]
2021-05-12 | Initial pgindent and pgperltidy run for v14. | Tom Lane
Also "make reformat-dat-files". The only change worthy of note is that pgindent messed up the formatting of launcher.c's struct LogicalRepWorkerId, which led me to notice that that struct wasn't used at all anymore, so I just took it out.
2021-01-02 | Update copyright for 2021 | Bruce Momjian
Backpatch-through: 9.5
2020-11-14 | Fix some typos | Michael Paquier
Author: Daniel Gustafsson Discussion: https://postgr.es/m/[email protected]
2020-07-22 | Fix conversion table generator scripts. | Thomas Munro
convutils.pm used implicit conversion of an undefined value to integer zero. Some of the conversion scripts are susceptible to regexp greediness. Fix, avoiding whitespace changes in the output. Also update ICU URLs that moved. No need to back-patch, because the output of these scripts is also in the source tree so we shouldn't need to rerun them on back-branches. Author: Kyotaro Horiguchi <[email protected]> Discussion: https://postgr.es/m/CA%2BhUKGJ7SEGLbj%3D%3DTQCcyKRA9aqj8%2B6L%3DexSq1y25TA%3DWxLziQ%40mail.gmail.com
2020-04-13 | Use perl warnings pragma consistently | Andrew Dunstan
We've had a mixture of the warnings pragma, the -w switch on the shebang line, and no warnings at all. This patch removes the -w switch and adds the warnings pragma to all perl sources missing it. It raises the severity of the TestingAndDebugging::RequireUseWarnings perlcritic policy to level 5, so that we catch any future violations. Discussion: https://postgr.es/m/[email protected]
2020-01-09 | Add support for automatically updating Unicode derived files | Peter Eisentraut
We currently have several sets of files generated from data provided by Unicode. These all have ad hoc rules and instructions for updating when new Unicode versions appear, and it's not done consistently. This patch centralizes and automates the process and makes it part of the release checklist. The Unicode and CLDR versions are specified in Makefile.global.in. There is a new make target "update-unicode" that downloads all the relevant files and runs the generation script. There is also a new script for generating the table of combining characters for ucs_wcwidth(). That table is now in a separate include file rather than hardcoded into the middle of other code. This is based on the script that was used for generating d8594d123c155aeecd47fc2450f62f5100b2fbf0, but the script itself wasn't committed at that time. Reviewed-by: John Naylor <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/[email protected]
2020-01-01 | Update copyrights for 2020 | Bruce Momjian
Backpatch-through: update all files in master, backpatch legal files through 9.4
2019-10-13 | Update unicode.org URLs | Peter Eisentraut
Use https, consistent host name, remove references to ftp. Also update the URLs for CLDR, which has moved from Trac to GitHub.
2019-01-02 | Update copyright for 2019 | Bruce Momjian
Backpatch-through: certain files through 9.4
2018-05-27 | Avoid use of unportable hex constant in convutils.pm | Andrew Dunstan
Discussion: https://postgr.es/m/[email protected]
2018-05-27 | Don't fall off the end of perl functions | Andrew Dunstan
This complies with the perlcritic policy Subroutines::RequireFinalReturn, which is a severity 4 policy. Since we only currently check at severity level 5, the policy is raised to that level until we move to level 4 or lower, so that any new infringements will be caught. A small cosmetic piece of tidying of the pgperlcritic script is included. Mike Blackwell Discussion: https://postgr.es/m/CAESHdJpfFm_9wQnQ3koY3c91FoRQsO-fh02za9R3OEMndOn84A@mail.gmail.com
2018-05-09 | Restrict vertical tightness to parentheses in Perl code | Andrew Dunstan
The vertical tightness settings collapse vertical whitespace between opening and closing brackets (parentheses, square brackets and braces). This can make data structures in particular harder to read, and is not very consistent with our style in non-Perl code. This patch restricts that setting to parentheses only, and reformats all the perl code accordingly. Not applying this to parentheses has some unfortunate effects, so the consensus is to keep the setting for parentheses and not for the others. The diff for this patch does highlight some places where structures should have trailing commas. They can be added manually, as there is no automatic tool to do so. Discussion: https://postgr.es/m/[email protected]
2018-04-27 | perltidy: Add option --nooutdent-long-comments | Peter Eisentraut
2018-04-27 | perltidy: Add option --nooutdent-long-quotes | Peter Eisentraut
2018-02-24 | Update headers of generated files | Peter Eisentraut
The scripts were changed in c98c35cd084a25c6cf9b08c76de8b89facd75fe7, but the output files were not updated to reflect the script changes.
2018-02-24 | Add current directory to Perl include path | Peter Eisentraut
Recent Perl versions don't have the current directory in the module include path anymore, so we need to add it here explicitly to make these scripts continue to work.
2018-02-24 | Use croak instead of die in Perl code when appropriate | Peter Eisentraut
2018-01-03 | Update copyright for 2018 | Bruce Momjian
Backpatch-through: certain files through 9.3
2017-12-21 | Avoid putting build-location-dependent strings into generated files. | Tom Lane
Various Perl scripts we use to generate files were in the habit of printing things like "generated by $0" into their output files. That looks like a fine idea at first glance, but it results in non-reproducible output, because in VPATH builds $0 won't be just the name of the script file, but a full path for it. We'd prefer that you get identical results whether using VPATH or not, so this is a bad thing. Some of these places also printed their input file name(s), causing an additional hazard of the same type. Hence, establish a policy that thou shalt not print $0, nor input file pathnames, into output files (they're still allowed in error messages, though). Instead just write the script name verbatim. While we are at it, we can make these annotations more useful by giving the script's full relative path name within the PG source tree, eg instead of Gen_fmgrtab.pl let's print src/backend/utils/Gen_fmgrtab.pl. Not all of the changes made here actually affect any files shipped in finished tarballs today, but it seems best to apply the policy everyplace so that nobody copies unsafe code into places where it could matter. Christoph Berg and Tom Lane Discussion: https://postgr.es/m/[email protected]
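The policy in this commit can be illustrated with a minimal sketch. It is written in Python rather than the Perl of the actual scripts, and the function names `header_from_argv0` and `header_fixed` are hypothetical; only the script path is taken from the commit message.

```python
# Fixed name written verbatim, as the commit's policy asks for.
# The path is the example given in the commit message.
SCRIPT_NAME = "src/backend/utils/Gen_fmgrtab.pl"


def header_from_argv0(argv0):
    """Unsafe variant: embeds however the script was invoked.

    In an out-of-tree (VPATH-style) build the invocation path differs,
    so the generated file differs too.
    """
    return f"/* generated by {argv0} */"


def header_fixed():
    """Reproducible variant: the same bytes regardless of build location."""
    return f"/* generated by {SCRIPT_NAME} */"


if __name__ == "__main__":
    # Two invocations of the same script produce different headers.
    print(header_from_argv0("Gen_fmgrtab.pl"))
    print(header_from_argv0("/some/build/dir/../src/backend/utils/Gen_fmgrtab.pl"))
    # The fixed header never changes.
    print(header_fixed())
```

The invocation path stays useful in error messages, where reproducibility of the output file is not at stake.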
2017-10-19 | UCS_to_most.pl: Process encodings in sorted order | Peter Eisentraut
Otherwise the order depends on the Perl hash implementation, making it cumbersome to scan the output when debugging.
2017-09-05 | Remove unnecessary parentheses in return statements | Peter Eisentraut
The parenthesized style has only been used in a few modules. Change that to use the style that is predominant across the whole tree. Reviewed-by: Michael Paquier <[email protected]> Reviewed-by: Ryan Murphy <[email protected]>
2017-05-17 | Post-PG 10 beta1 pgperltidy run | Bruce Momjian
2017-04-07 | Remove duplicate assignment. | Heikki Linnakangas
Harmless, but clearly wrong. Kyotaro Horiguchi
2017-03-13 | Include array size in forward declaration. | Heikki Linnakangas
Some compilers require it. At least Visual Studio, according to the buildfarm, and gcc with the -pedantic flag.
2017-03-13 | Use radix tree for character encoding conversions. | Heikki Linnakangas
Replace the mapping tables used to convert between UTF-8 and other character encodings with new radix tree-based maps. Looking up an entry in a radix tree is much faster than a binary search in the old maps. As a bonus, the radix tree representation is also more compact, making the binaries slightly smaller. The "combined" maps work the same as before, with binary search. They are much smaller than the main tables, so it doesn't matter so much. However, the "combined" maps are now stored in the same .map files as the main tables. This seems more clear, since they're always used together, and generated from the same source files. Patch by Kyotaro Horiguchi, with a lot of hacking by me at various stages. Reviewed by Michael Paquier and Daniel Gustafsson. Discussion: https://www.postgresql.org/message-id/20170306.171609.204324917.horiguchi.kyotaro%40lab.ntt.co.jp
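To make the speed argument concrete, here is a minimal sketch contrasting the two lookup strategies. It is in Python, not the C of the real maps, and it does not reproduce PostgreSQL's actual map layout: the two-level table, the names `build_radix`, `radix_lookup` and `binary_lookup`, and the sample pairs are all assumptions for illustration.

```python
from bisect import bisect_left

# Illustrative two-byte codes -> Unicode code points (sample data for the demo only).
PAIRS = sorted([(0xA1A1, 0x3000), (0xA1A2, 0x3001), (0xB0A1, 0x554A)])
KEYS = [code for code, _ in PAIRS]


def binary_lookup(code):
    """Old style: binary search over a sorted array, O(log n) key comparisons."""
    i = bisect_left(KEYS, code)
    if i < len(KEYS) and KEYS[i] == code:
        return PAIRS[i][1]
    return None


def build_radix(pairs):
    """Build a two-level table: the first byte selects a leaf, the second indexes it."""
    root = [None] * 256
    for code, ucs in pairs:
        hi, lo = code >> 8, code & 0xFF
        if root[hi] is None:
            root[hi] = [0] * 256  # 0 marks "no mapping"
        root[hi][lo] = ucs
    return root


def radix_lookup(root, code):
    """Radix style: one array index per input byte, no key comparisons at all."""
    leaf = root[code >> 8]
    if leaf is None:
        return None
    return leaf[code & 0xFF] or None


if __name__ == "__main__":
    root = build_radix(PAIRS)
    for code in (0xA1A2, 0xB0A1, 0xA1A3):
        print(hex(code), binary_lookup(code), radix_lookup(root, code))
```

The lookup cost in the radix version depends only on the number of input bytes, not on the number of entries. This sketch makes no attempt at the compactness the commit mentions; full 256-slot leaves are used purely to keep the indexing obvious.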
2017-03-13 | Remove obsolete references to JIS0201.TXT JIS0208.TXT. | Heikki Linnakangas
We don't use those files anymore, since commit 1de9cc0dcc.
2017-02-02 | Add KOI8-U map files to Makefile. | Heikki Linnakangas
These were left out by mistake back when support for KOI8-U encoding was added. Extracted from Kyotaro Horiguchi's larger patch.
2017-02-01 | Small fixes to the Perl scripts to create unicode conversion tables. | Heikki Linnakangas
Add missing semicolons in UCS_to_* perl scripts. For consistency, use "$hashref->{key}" style everywhere. Kyotaro Horiguchi Discussion: https://www.postgresql.org/message-id/[email protected]
2017-01-03 | Update copyright via script for 2017 | Bruce Momjian
2016-11-30 | Make all unicode perl scripts to use strict, rearrange logic for clarity. | Heikki Linnakangas
The loops were a bit difficult to understand, due to breaking out of them early. Also fix things that perlcritic complained about. Daniel Gustafsson
2016-11-30 | Rewrite the perl scripts to produce our Unicode conversion tables. | Heikki Linnakangas
Generate EUC_CN mappings from gb-18030-2000.xml, because GB2312.TXT is no longer available. Get UHC from windows-949-2000.xml, it's more up-to-date. Plus tons more small changes. With these changes, the perl scripts faithfully produce the *.map files we have in the repository, from the external source files. In passing, fix the Makefile to also download CP932.TXT and CP950.TXT. Based on patches by Kyotaro Horiguchi, reviewed by Daniel Gustafsson. Discussion: https://postgr.es/m/[email protected]
2016-11-30 | Remove leading zeros, for consistency with other map files. | Heikki Linnakangas
The common style is to pad to 4 digits. Running the current perl scripts to generate these map files would override this change, but the next commit will rewrite the perl scripts to produce this style. I'm doing this as a separate commit, to make it more clear what non-cosmetic changes the next commit makes to the map files. Discussion: https://postgr.es/m/[email protected]
2016-11-30 | Remove code points < 0x80 from character conversion tables. | Heikki Linnakangas
PostgreSQL treats characters with < 0x80 leading byte as plain ASCII, and they are not even passed to the conversion routines. There is no point in having them in the conversion tables. Everything in the tables was a direct ASCII-ASCII mapping, except for two entries:
* SHIFT_JIS_2004 code point 0x5C (backslash in ASCII) was mapped to Unicode YEN SIGN character.
* Unicode 0x5C (backslash again) was mapped to "REVERSE SOLIDUS" in SHIFT_JIS_2004
These mappings never had any effect, so there's no functional change from removing them. Discussion: https://postgr.es/m/[email protected]
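Why such table entries could never take effect can be shown with a small model. This is a simplified single-byte sketch in Python, not PostgreSQL's conversion code; the function `convert` and the contents of `table` are hypothetical.

```python
def convert(data, table):
    """Convert bytes to a list of code points.

    Bytes below 0x80 are treated as plain ASCII and copied through
    without consulting the table, so any table entry keyed on such a
    byte is dead data.
    """
    out = []
    for b in data:
        if b < 0x80:
            out.append(b)         # ASCII fast path: the table is never consulted
        else:
            out.append(table[b])  # only high bytes are looked up
    return out


# Hypothetical table that (pointlessly) maps 0x5C to YEN SIGN, U+00A5,
# next to one ordinary high-byte entry.
table = {0x5C: 0x00A5, 0xA4: 0x20AC}

if __name__ == "__main__":
    # The backslash comes out as 0x5C, not U+00A5: the entry is unreachable.
    print([hex(cp) for cp in convert(b"A\\", table)])
    print([hex(cp) for cp in convert(bytes([0xA4]), table)])
```

Deleting the 0x5C entry from `table` changes nothing in the output, which mirrors the commit's claim that the removal is not a functional change.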
2016-11-15 | Fix broken statement in UCS_to_most.pl. | Robert Haas
This has been wrong for a very long time, and it's puzzling to me how it ever worked for anyone. Kyotaro Horiguchi
2016-11-01 | Add make rules to download raw Unicode mapping files | Peter Eisentraut
This serves as implicit documentation and is handy if someone wants to tweak things. The rules are not part of a normal build, like this entire directory.
2016-10-07 | Remove bogus mapping from UTF-8 to SJIS conversion table. | Heikki Linnakangas
0xc19c is not a valid UTF-8 byte sequence. It doesn't do any harm, AFAICS, but it's surely not intentional. No backpatching though, just to be sure. In passing, also add a file header comment to the file, like the UCS_to_SJIS.pl script would produce. (The file was originally created with UCS_to_SJIS.pl, but has been modified by hand since then. That's questionable, but I'll leave fixing that for later.) Kyotaro Horiguchi Discussion: <[email protected]>
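The claim about 0xc19c can be checked with any strict UTF-8 decoder. A minimal sketch using Python's built-in decoder follows; the helper name `is_valid_utf8` is hypothetical. The lead byte 0xC1 could only start a two-byte encoding of a code point below 0x80 (here the payload bits work out to 0x5C), and UTF-8 forbids such overlong forms.

```python
def is_valid_utf8(raw):
    """Return True if raw decodes under Python's strict UTF-8 decoder."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def two_byte_payload(raw):
    """Extract the code point bits a two-byte sequence would carry, ignoring validity."""
    return ((raw[0] & 0x1F) << 6) | (raw[1] & 0x3F)


if __name__ == "__main__":
    bogus = bytes([0xC1, 0x9C])
    print(is_valid_utf8(bogus))                # rejected: overlong form
    print(hex(two_byte_payload(bogus)))        # the bits it would carry
    print(is_valid_utf8(bytes([0xC2, 0x9C])))  # a well-formed two-byte sequence
```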
2016-06-12 | Finish pgindent run for 9.6: Perl files. | Noah Misch
2016-03-16 | UCS_to_EUC_JIS_2004.pl: Turn off "test" mode by default | Peter Eisentraut
It produces debugging output files that are of no further use, so we don't need that by default.
2016-03-16 | Make spacing and punctuation consistent | Peter Eisentraut
2016-03-04 | Add prerequisite for KOI8-U.TXT | Peter Eisentraut
This was missed when the encoding was added.
2016-03-04 | Make some adjustments in variable assignments | Peter Eisentraut
These variables aren't really used for anything interesting, but it seems the existing grouping was somewhat nonsensical.
2016-03-04 | Add missing rules related to EUC_JIS_2004 and SHIFT_JIS_2004 encodings | Peter Eisentraut
This was apparently forgotten in commit 75c6519ff68dbb97f73b13e9976fb8075bbde7b8.
2016-03-01 | Add Unicode map generation scripts as rule prerequisites | Peter Eisentraut
That way, the rules will trigger when the scripts change.