Talk:Windows-1252
Outlining of differences from ISO-8859-1
The article says: "The following table shows Windows-1252, with differences from ISO-8859-1 outlined." How are they outlined? I can't see it.
—Preceding unsigned comment added by 88.79.125.76 (talk) 07:19, 9 July 2009 (UTC)
The introduction to this page needs to state whether Windows-1252 is CP1252 or what relation they have to each other. Only in the table caption are the two terms associated, and this page is the #1 Google hit for CP1252. RCanine 21:14, 17 October 2007 (UTC)
Why can the codepage Windows-1252 not be used on MacOS or Linux? --84.61.69.103 16:10, 8 June 2006 (UTC)
- I'm not sure exactly what you mean by that; some software, even in non-Windows operating systems, might recognize this character encoding and properly render documents transmitted or stored using it; however, as its name implies, it's an encoding which was devised by Microsoft for use in Windows, and is not one of the standard, platform-neutral ones such as iso-8859-1 or utf-8. *Dan T.* 16:47, 8 June 2006 (UTC)
- Most operating systems use some Unicode encoding and/or a small selection of legacy encodings as an internal format. What is supported by the conversion libs and apps that communicate over open protocols is generally a much wider selection, and the windows-125x encodings certainly get this level of support on all major platforms. Windows-1252 is also the unofficial default character set of the web (officially it's ISO-8859-1, but since the C1 control codes are banned anyway...) Plugwash 22:42, 8 June 2006 (UTC)
- I'm not sure exactly what you mean by that either, but I guess it is this: for example, in Linux you cannot mount FAT32 volumes with 'iocharset=cp1252', and too often 'locales' make things worse. In the end, it's almost impossible to interchange files between Windows and Linux without messing up the filenames (at least for me, an average Linux user outside the USA).
Is Windows-1252 a proprietary codepage? --84.61.43.60 16:44, 31 August 2006 (UTC)
- Well, it was introduced by a vendor without any standards body approval, but there is nothing preventing anyone who wants to from implementing support for it. As simple statements of fact about an encoding method, I'm pretty sure raw codepage tables are not copyrightable, and information on the code page is freely available to anyone who wants it from Microsoft themselves, unicode.org and countless other places. Plugwash 16:14, 1 September 2006 (UTC)
"AFAIK, only IE and other MS products do this"
The above was an edit reason for the addition of a {{fact}} tag. I don't have a cite, but I tried it with IE and Firefox on Windows and Firefox and Konqueror on Linux, and they all did it. Plugwash 15:51, 6 September 2006 (UTC)
- Whoops, got my facts screwed up. IE and MS products do this for ISO-8859-15. The others do it for ISO-8859-1 only. Removing request for citation. --ChrisRuvolo (t) 19:03, 6 September 2006 (UTC)
- Ineligible for copyright. --▦Frogger3140▦ (talk) 12:20, 26 September 2008 (UTC)
0x80-0x9F was not used in ISO Latin-1
The article says, "The encoding is a superset of ISO 8859-1, but differs by using displayable characters rather than control characters in the 0x80 to 0x9F range." But the ISO 8859-1 page has it that 0x80-0x9F is left unused. Which is right? --Apantomimehorse 02:44, 11 September 2006 (UTC)
- I've made a minor correction to this article now. Plugwash 00:25, 27 September 2006 (UTC)
- This mistake is still present. Did someone revert it, and why? Danadocus (talk) 18:25, 12 May 2010 (UTC)
- ISO_8859-1:1987 (better known as the MIME type ISO-8859-1 — note extra dash) does map these characters as control codes. See ISO 8859-1#ISO-8859-1. You are in a maze of twisty encodings, all alike. --ChrisRuvolo (t) 19:19, 12 May 2010 (UTC)
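For anyone who wants to check the 0x80 to 0x9F range themselves, here is a minimal sketch. It assumes Python's built-in "latin-1" codec (which behaves like the IANA ISO-8859-1 charset and maps these bytes to C1 control codes) and its "cp1252" codec (which leaves five positions undefined). The helper names are made up for illustration, and other implementations may treat the undefined positions differently.

```python
import unicodedata


def decode_or_none(byte_value, codec):
    """Decode a single byte with the given codec; return None if it is undefined."""
    try:
        return bytes([byte_value]).decode(codec)
    except UnicodeDecodeError:
        return None


def compare_c1_range():
    """Return (byte, latin-1 result, cp1252 result) for every byte in 0x80-0x9F."""
    return [
        (b, decode_or_none(b, "latin-1"), decode_or_none(b, "cp1252"))
        for b in range(0x80, 0xA0)
    ]


for b, latin1_char, cp1252_char in compare_c1_range():
    latin1_kind = unicodedata.category(latin1_char)  # "Cc" means control character
    cp1252_text = "undefined" if cp1252_char is None else f"U+{ord(cp1252_char):04X}"
    print(f"0x{b:02X}  latin-1: U+{ord(latin1_char):04X} ({latin1_kind})  cp1252: {cp1252_text}")
```

With these codecs every byte in the range is a control character under "latin-1", while "cp1252" yields printable characters such as the euro sign at 0x80.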
Table
The table in this article is really poor; for instance, the difference between zero and O is almost invisible, and there is no difference between lower case l and upper case i. Worse, the difference between the various types of quotes is completely lost. I think we should follow the nice example at Code page 437, or at least use a font that can convey the differences between the characters. AxelBoldt 21:00, 6 October 2006 (UTC)
- The FF hex character code (lower right corner of table) is AFAIK only defined in Windows ANSI, not in Latin-1. - Alf—The preceding unsigned comment was added by 81.191.161.87 (talk) 22:39, 18 February 2007 (UTC).
What annoys me is that there is no legend explaining what the different background colors actually mean. --JVersteeg 17:29, 24 September 2007 (UTC)
Copy.
Anyway, I just wanted to make this table look like the one in windows-1250, which omits the lower half.
This provides the same information in a more concise way so I guess that's nicer.
MaxDZ8 talk 13:01, 8 April 2008 (UTC)
I am going to revert the change that took out the bottom half. It is convenient to have it all available for anyone looking for this. While this view shows what is explicitly different, the rest of the table is part of the code page. Perhaps the page for 1250 should be changed? This is especially relevant because many MS products still use 1252. An example of this is VBA in Excel. tseabrooks —Preceding unsigned comment added by 64.221.222.142 (talk) 16:06, 10 April 2008 (UTC)
I understand. I'm not 100% sure this metric is valuable; after all, US-ASCII is well defined. Although a consistent look would be desirable, 1250 has been that way for a while and it seemed to work.
MaxDZ8 talk 07:21, 12 April 2008 (UTC)
Alt key input
In fact, this method enters characters from "ANSI" and "OEM" codepages associated with current keyboard language/layout, not just 1251 and 437. Thus switching keyboard between say Russian and Norwegian one can enter different sets of characters.
Subset
Wow! I thought about making my change a minor one; I did not think anyone could seriously object to my modifications. I am not an experienced Wikipedia contributor, so I wanted to play it safe ... well.
I am not convinced by your arguments. It doesn't matter whether people use the C1 control codes of ISO 8859-1 or not. The octets in question are defined in both encodings, with different meanings. The term "subset" is, thus, wrong. One can argue that the average user benefits more from the extra glyphs he can produce by using Windows-1252 than from the control codes of ISO 8859-1. However, this is a technical matter we are talking about; it helps to be precise. More so because readability was not disturbed and no information was lost due to my editing. Traxer 17:04, 19 February 2007 (UTC)
- One could say that the printable characters (the ones with visible glyphs) of windows-1252 are a superset of those in iso-8859-1, but not the complete set of characters (printable or control) in both encodings. The encodings themselves aren't "sets" (or "subsets" or "supersets"), because, strictly speaking, a mathematical set has no ordinal numbers assigned to its components (there is just a cardinal number of the set's membership), while a character encoding consists of a series of ordinal pairings between numbers and characters. *Dan T.* 19:50, 19 February 2007 (UTC)
- The thing is, for the purposes of this page, abstract mathematical set theory and the Zermelo-Fraenkel axioms (or whatever) are all completely irrelevant. What matters is what people commonly mean when they use "superset" in the context of computer software and character "sets". However, if you want to assuage your mathematical conscience, you can reflect that the official ISO/IEC 8859-1 specification technically doesn't define the "C1" control area -- see the green areas in the table on that article page. AnonMoos 04:54, 20 February 2007 (UTC)
- "The encodings themselves aren't sets"? Hogwash. An encoding isn't a set of characters, but it is a set of (character, code) pairs, and in that sense one encoding can certainly be a subset of another. Mhkay 17:55, 12 September 2007 (UTC)
- If the ISO/IEC 8859-1 specification does not define the code values 0x80 to 0x9F, then the phrase "using displayable characters rather than control characters in the 0x80 to 0x9F range" is misleading. Traxer 17:08, 21 February 2007 (UTC)
- Note the distinction between the standard "ISO/IEC 8859-1" and the IANA charset "ISO-8859-1": the former does not define any control codes, the latter does. Plugwash 20:34, 21 February 2007 (UTC)
- The text compares Windows-1252 and ISO/IEC 8859-1, not Windows-1252 and IANA's ISO-8859-1. AnonMoos is right on that point, 0x80 to 0x9F are not defined. Traxer 10:27, 22 February 2007 (UTC)
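The "set of (code, character) pairs" view discussed in this thread can be tried out directly. This is only a sketch under stated assumptions: Python's "latin-1" codec stands in for the IANA charset ISO-8859-1 (with C1 controls), restricting it to 0x20-0x7E and 0xA0-0xFF stands in for the graphic characters of ISO/IEC 8859-1, and the function and variable names are hypothetical.

```python
def code_char_pairs(codec, byte_values):
    """Build the set of (byte, character) pairs a codec defines over byte_values."""
    pairs = set()
    for b in byte_values:
        try:
            pairs.add((b, bytes([b]).decode(codec)))
        except UnicodeDecodeError:
            pass  # position is undefined in this codec, so it contributes no pair
    return pairs


ALL_BYTES = range(256)
# Graphic positions only: no C0 controls, no DEL, no C1 range.
GRAPHIC_BYTES = [b for b in range(256) if 0x20 <= b <= 0x7E or 0xA0 <= b <= 0xFF]

iana_latin1 = code_char_pairs("latin-1", ALL_BYTES)
iso_8859_1_graphic = code_char_pairs("latin-1", GRAPHIC_BYTES)
windows_1252 = code_char_pairs("cp1252", ALL_BYTES)

# With C1 controls included, Windows-1252 is not a superset.
print(iana_latin1 <= windows_1252)
# With only the graphic characters, it is.
print(iso_8859_1_graphic <= windows_1252)
```

Under these assumptions, whether "superset" holds depends on which of the two definitions of ISO 8859-1 is being compared, which is the point Plugwash and Traxer make above.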
Historical Accuracy
This statement strikes me as post-hoc rationalisation rather than historical fact:
"The name has been taken from an early ANSI draft, that later, was modified and became ISO-8859-1."
It seems unlikely because ISO-8859-1 was developed within ECMA and was an ISO standard before it was an ANSI standard. Personally, I have always suspected that Microsoft initially called it ANSI because they were proposing to implement the ANSI standard (that is, the ISO standard which is published in the US rebranded as ANSI), and they continued to refer to it internally as ANSI after they started making changes to it.
Mhkay 17:50, 12 September 2007 (UTC)
- If you can show that ISO-8859-1 was developed internally within ECMA and that Microsoft had no access to its drafts, you will have disproven the statement you cite. Note however that Microsoft is an ECMA member.
- I cannot figure out exactly what you are saying in the second half of the post. Are you saying perhaps that Microsoft intended to submit a modified version of the ISO standard as an ANSI proposal? If so, do you have a citation for that? — Preceding unsigned comment added by 82.139.87.39 (talk) 06:23, 2 October 2011 (UTC)
- I think that the line of development that led to ISO-8859-1 went through several incarnations, and wasn't a Europe-only thing. The DEC VT220 character set seems to have been somewhat influential at the beginning. AnonMoos (talk) 12:02, 2 October 2011 (UTC)
ansinew
The following statement is very imprecise, to say the least:
- In LaTeX packages, it is referred to as ansinew.
To be precise, there is fundamentally one package for dealing with input encoding, namely inputenc.sty, and in it Windows-1252 can be referred to either as cp1252 or as ansinew; of the latter the documentation literally says "Windows 3.1 ANSI encoding, extension of Latin-1 (synonym for cp1252.)" --Blazar.writeto() 23:00, 19 August 2008 (UTC)
Outlines
The text says: "The following table shows Windows-1252, with differences from ISO-8859-1 outlined." That's not true at this moment, apparently because of a MediaWiki limitation. The part that says style="border-width:3px" gets overridden, rather than added to, by the template {{chset-color-punct}}, which as of this writing expands to style="background:#DFF7FF;". How should that be fixed? The two solutions I see are to report it to MediaWiki hoping for a fix, or to create new templates, e.g. {{chset-color-punct-outlined}} expanding to style="border-width:3px;background:#DFF7FF;". The latter has the advantage of not having to deal with style=... tags. The former would have the advantage of being able to combine templates, e.g. {{chset-color-punct}}{{outlined}} --pgimeno (talk) 07:49, 14 April 2009 (UTC)
- That's annoying; it's also affecting ISO/IEC 8859-9 (they used to work). AnonMoos (talk) 09:27, 14 April 2009 (UTC)
HTML
The most recent edit removed the interesting fact that HTML 5 says that the character encoding ISO-8859-1 should be handled as though it were CP1252. Can somebody confirm this is true? If so, I think the text should be reverted. The current text just says that HTML 5 can accept CP1252 as a character encoding, which is nowhere near as interesting or useful a fact. Spitzak (talk) 21:19, 5 February 2010 (UTC)
- There is info confirming this in an HTML5 draft here: [1]. However, it notes that this is a willful violation of the W3C spec. For that reason, the final HTML5 spec may not include this section, and it would be inappropriate to mention it until finalized, IMO. --ChrisRuvolo (t) 19:13, 12 May 2010 (UTC)
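To make the behaviour under discussion concrete, here is a rough sketch of what "handle ISO-8859-1 as though it were CP1252" could look like in a decoder. It assumes the rule as described in the draft mentioned above. The label set is a small illustrative subset, not the spec's list, and resolve_label and decode_page_bytes are hypothetical names, not any browser's or library's API.

```python
# Illustrative subset of declared labels that get decoded as Windows-1252.
LABELS_TREATED_AS_CP1252 = {"iso-8859-1", "latin1", "windows-1252", "cp1252"}


def resolve_label(declared_label):
    """Normalise a declared charset label and apply the ISO-8859-1 -> CP1252 override."""
    label = declared_label.strip().lower()
    if label in LABELS_TREATED_AS_CP1252:
        return "cp1252"
    return label


def decode_page_bytes(data, declared_label):
    """Decode page bytes using the resolved label; undefined bytes become U+FFFD."""
    return data.decode(resolve_label(declared_label), errors="replace")


raw = b"\x93quoted\x94"
# Strict ISO-8859-1 reading: the two quote bytes become invisible C1 controls.
print(repr(raw.decode("latin-1")))
# With the override, the same bytes become curly quotation marks.
print(repr(decode_page_bytes(raw, "ISO-8859-1")))
```

The practical effect is that pages labelled ISO-8859-1 but authored on Windows display their curly quotes and dashes instead of control characters.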
Codepage layout asterisks
In the Codepage layout section there are asterisks after decimal values 128 to 159. It's not clear why these are there. Below the table are two paragraphs. The first paragraph is about the color coding and the second is about character positions 80, 81, 8D, 8F, 90, and 9D.
I'm assuming the asterisks are there to lead the reader to the second paragraph, but it's confusing to me and, I suspect, to other readers. I'm not sure what the goal of the asterisks is.
- The asterisks are next to decimal values and yet the decimal values are never mentioned in the paragraph. Obviously easy to fix but that's not the core issue.
- The paragraph mentions six specific code positions, 80, 81, 8D, 8F, 90, and 9D (hex), and yet there are asterisks next to 27 (decimal) positions in the table.
- Perhaps the asterisks are about the entire C1 control code range. If so, then why don't cells 129, 141, 143, 144, and 157 (decimal) have numbers and asterisks?
- The reference to C1 control code on the second paragraph escapes me (pun intended). Windows-1252 defines glyphs for various points in the 80-9F positions. I do not believe it concerns itself at all with the C1 control codes. --Marc Kupper|talk 05:22, 12 August 2011 (UTC)
- It says above the chart "differences from ISO-8859-1 marked with thick borders and asterisks". Not sure whether both are needed (the asterisks were added because of a past technical problem with the borders, see section "Outlines" above), but that's why it's there... AnonMoos (talk) 11:54, 12 August 2011 (UTC)
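The counts in this thread (27 marked cells, five cells without numbers) can be cross-checked with a short sketch. It assumes Python's "cp1252" codec, which leaves the unassigned positions undefined; other implementations may map them to something else. The function name is made up for illustration.

```python
def c1_range_positions(codec="cp1252"):
    """Split decimal positions 128-159 into those the codec defines and those it does not."""
    defined, undefined = [], []
    for position in range(128, 160):
        try:
            bytes([position]).decode(codec)
            defined.append(position)
        except UnicodeDecodeError:
            undefined.append(position)
    return defined, undefined


defined, undefined = c1_range_positions()
print(len(defined), "defined positions")
print("undefined (decimal):", undefined)
print("undefined (hex):", [f"{p:02X}" for p in undefined])
```

With this codec the undefined positions come out as 129, 141, 143, 144 and 157 in decimal, i.e. 81, 8D, 8F, 90 and 9D in hex, which lines up with the cells that carry no number or asterisk.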
Characters 0...31
It states on the Latin-1 page that Unicode codepoints in the Latin-1 range are often interpreted as 1252 by software. According to this page characters 0...31 are empty / control characters. But when I actually print them on Windows (using TextOutW or TextOutA) they show up as code page 437 except for {1...6 16 21...23 25} which are box-drawing characters and {0 9 10 13 28...31} which are empty or possibly control (but TextOut doesn't interpret control codes).
Does anyone know why this is? Are the first 31 characters of 1252 dual-purpose as in 437, are these characters of a different code page (handled specially by TextOut) or something else? — Preceding unsigned comment added by 82.139.87.39 (talk) 08:44, 1 October 2011 (UTC)
- That's probably a Windows API thing. I don't think that Microsoft would have made the "MS LineDraw" font if they had incorporated CP437 specials into Windows-1252... AnonMoos (talk) 12:08, 2 October 2011 (UTC)
- Even though MS LineDraw contains a subset of 437, it is still a superset of the characters talked about above by a far margin. — Preceding unsigned comment added by 82.139.87.39 (talk) 23:44, 27 January 2012 (UTC)
Windows-1252.svg legend
The legend for Windows-1252.svg only mentions the blue content. The reader can't be sure about the green and red content, nor about the black, though that should simply be unchanged content. So I'm assuming the red content was removed and the green added in going from ANSI to Windows-1252. Even if my assumption is right, not everyone will make the same assumption, so a complete legend should be made. — Preceding unsigned comment added by DynV (talk • contribs) 20:20, 30 July 2012 (UTC)
"Microsoft-affiliated bloggers"?
"Details" section, last sentence: Microsoft-affiliated bloggers now state that “The term ANSI as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community.”
"Microsoft-affiliated bloggers"? What's that supposed to mean? — Preceding unsigned comment added by Tharos (talk • contribs) 12:55, 20 August 2013 (UTC)
- Well, a quick Google search finds an MSDN blog ( http://blogs.msdn.com/b/oldnewthing/archive/2004/05/31/144893.aspx ) which links to a document ( http://download.microsoft.com/download/5/6/8/56803da0-e4a0-4796-a62c-ca920b73bb17/21-Unicode_WinXP.pdf ) supposedly from "Cathy Wissink Program Manager, Windows Globalization Microsoft Corporation" which contains that sentence. Presumably this is what was referred to. Plugwash (talk) 13:40, 20 August 2013 (UTC)
Time of origin
I've tried to figure out when 1252 originated. The closest I've come is a table from the Windows 1.0 Programmer's Reference (cited by Charles Petzold in Programming Windows, the old version, not the new one which isn't really about Windows proper any more) that shows what later would become 1252, although many characters are still missing.
1) The × and ÷ symbols weren't added yet, though they did exist in 8859-1.
2) Charles Petzold says the NBSP and SHY didn't exist yet, but I think he was mistaken. Their symbols are present in the table and it'd be silly to assume them to be duplicates of SP and -.
3) DEL is present in 1252 but not in 8859-1.
4) All characters in the ranges 0x, 1x, 8x, 9x were classified as unsupported. Clearly Windows 1.0 did have CR and LF and such. I can only assume the author of the table didn't consider such control bytes part of the character encoding proper. In any case, many characters that are present in 1252 now are still missing. These characters aren't present in 8859-1.
Since Windows 1.0 came out in 1987, I think this has important implications for the origin of the ANSI misnomer. Back then, 1252 pretty much was 8859-1 apart from the control codes and two symbols which were added later anyway. I think this set was already referred to as ANSI before any talk of Unicode in 1990 or so in order to differentiate it from the OEM (=IBM) set.
So when the name ANSI for 1252 originated, it was probably correct.