This template falls within the scope of WikiProject Writing systems, a WikiProject interested in improving the encyclopaedic coverage and content of articles relating to writing systems on Wikipedia. If you would like to help out, you are welcome to drop by the project page and/or leave a query at the project's talk page.
Although generally keen on char, I'd need to be convinced in this case. Char is used to "isolate" a glyph under discussion from the associated running text. In the output of unichar, that is usually clear.
The only argument in favour that I can see is that, at present, unichar identifies the glyph by increasing its size and maybe the faint box used by char would be better? But conversely magnification makes it easier to "read".
It's clear to anyone who's familiar with the format, but I'm not sure it's as clear to a general reader, especially one who doesn't know what the "U+ stuff" means. I haven't noticed any specific problems that this would solve; I just think it's good to have a consistent format for "inline character literals" on Wikipedia. jlwoodwa (talk) 08:19, 9 July 2023 (UTC)
So how would we handle this example: U+20E0 ⃠ COMBINING ENCLOSING CIRCLE BACKSLASH (which is already not handled terribly well)? Likewise, Asiatic scripts present issues that don't occur to those of us only familiar with alphabetic scripts. A lot of development work has gone into this template to deal with these issues, so changing it would not be trivial, given the need to verify many, many test cases and rewrite to resolve anomalies. Annoyingly, one of the recent main developers, user:DePiep, is no longer available to advise. --𝕁𝕄𝔽 (talk) 10:20, 9 July 2023 (UTC)
⃠ seems to work just fine. I understand the difficulty of modifying such a convoluted and widely used template, though. Since it sounds like it's not obviously a bad idea, I'll try the "obvious implementation" in the sandbox and give an update here when it's working. jlwoodwa (talk) 10:35, 9 July 2023 (UTC)
Combining diacritics are displaying as tofu on Android - fault may be in cwith= handling?
I don't know if this is new. The argument cwith=◌ or cwith=β is used heavily to display combining diacritics. I'm editing in Android right now and the symbol displays correctly, but in articles like diacritic, it has more tofu than a Japanese restaurant. Is there a style serif somewhere that is blocking the last-resort substitution? --𝕁𝕄𝔽 (talk) 13:22, 21 September 2023 (UTC)
No solution suggested: it is an implementation defect in Android. So unless someone has a back-channel to Google, we just have to grin and bear it. --𝕁𝕄𝔽 (talk) 16:33, 22 September 2023 (UTC)
Requirement: when cwith=◌ is invoked, wrap the output in <span style="font-family: serif">{{1}}</span>, where {{1}} is the sequence dotted circle + combining diacritic. Is there a doctor in the house? --𝕁𝕄𝔽 (talk) 16:32, 23 September 2023 (UTC)
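The requirement is a conditional wrap around the glyph. A minimal Python sketch, not the template's actual wikitext/Lua, assuming the glyph is emitted as the cwith base followed by the mark; `cwith_markup` is a hypothetical name:

```python
DOTTED_CIRCLE = "\u25CC"  # U+25CC DOTTED CIRCLE


def cwith_markup(mark: str, cwith: str = DOTTED_CIRCLE) -> str:
    """Return the displayed glyph, wrapped in a serif span when cwith is the dotted circle."""
    glyph = cwith + mark  # base first, then the combining mark
    if cwith == DOTTED_CIRCLE:
        return f'<span style="font-family: serif">{glyph}</span>'
    return glyph


print(cwith_markup("\u0301"))  # COMBINING ACUTE ACCENT on a dotted circle, in serif
```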
Will this work? {{unichar |0301 |combining acute accent |cwith=◌|use=script|use2=serif}} gives U+0301 ◌́ COMBINING ACUTE ACCENT. I just extended the template to support "serif" as a use2 param if you set use to "script". This might also work: {{unichar |0301 |combining acute accent |cwith=◌|use=script|use2=noto}} gives U+0301 ◌́ COMBINING ACUTE ACCENT. Andre🚐 20:03, 23 September 2023 (UTC)
Yes, that would work. I hate to be ungrateful, but employing that solution would create a lot of work: many, many articles would have to be updated to use it and, when Google discards Roboto as its default sans font, would all have to be undone again. AFAIK, this is the only use case for cwith=◌, so it would not have any deleterious effect elsewhere (and would be easy to back out). [BTW, we couldn't have use2=noto because it would break Bing and Safari.] --𝕁𝕄𝔽 (talk) 22:11, 23 September 2023 (UTC)
Can anyone explain (better still fix) this phenomenon:
U+0360 ◌◌͠ COMBINING DOUBLE TILDE, a tilde diacritic that spans a pair of adjacent characters: ◌͠◌; no markup: ◌͠◌
Just using the characters directly puts the diacritic in the right place, but unichar fails (the placement is offset), at least when using Chrome on a Chromebook.
|cwith=◌◌ puts the dotted circles before the diacritic, but the diacritic is supposed to be between them. I don't know how it should be fixed, though. Eru·tuon 19:21, 22 September 2023 (UTC)
Ah, of course. Obvious really. <blush> There are very few of these two-character diacritics, so I don't really see it being worth anyone's while hacking the template to fix it. I'll just add a note to the documentation to say that it doesn't work and handcrafting is required. --𝕁𝕄𝔽 (talk) 19:37, 22 September 2023 (UTC)
I have added this text. It is not quite right, the display of the U+0360 is not exactly as produced by the template but does it matter?
** Note that cwith=◌◌ does not provide the desired result if the intention is to display a diacritic that spans two characters (such as those in the range U+035C to U+0362): the diacritic will be offset. In such cases, editors must emulate the template output by hand, because the correct HTML sequence is "first-character + combining-diacritic + second-character". Thus, for example, to show the combining double tilde U+0360, write U+0360 ◌͠◌ then, in {{small}}, COMBINING DOUBLE TILDE. This produces U+0360 ◌͠◌ COMBINING DOUBLE TILDE.
Really this needs a "print this instead" for the character. All this size/font/cwith stuff could be put into that instead of trying to fool the automatic text generator into producing the desired result. Spitzak (talk) 21:50, 23 September 2023 (UTC)
I meant that there could be a parameter, perhaps show, so that if invoked with show=foobar then instead of showing the character it shows "foobar". This could then contain any wiki or HTML markup desired and any trick needed to get the character to be correctly visible. In this example it would contain the two circles and the combining diacritic. Spitzak (talk) 00:08, 28 December 2023 (UTC)
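In code-point terms the note above comes down to ordering. The Python sketch below is illustrative only (the helper names are hypothetical, and the U+035C to U+0362 range is just the example range from the note, not an exhaustive list): cwith= effectively emits all base characters before the mark, whereas a spanning diacritic has to sit between its two bases.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"  # U+25CC DOTTED CIRCLE


def is_double_diacritic(mark: str) -> bool:
    """True for the U+035C..U+0362 example range from the note."""
    return 0x035C <= ord(mark) <= 0x0362


def cwith_order(mark: str, base: str) -> str:
    """What cwith= effectively emits: all base characters first, then the mark."""
    return base + mark


def spanning_order(mark: str, base: str = DOTTED_CIRCLE) -> str:
    """Order a double diacritic needs: first base, mark, second base."""
    return base + mark + base


tilde = "\u0360"
print(unicodedata.name(tilde), is_double_diacritic(tilde))
print(cwith_order(tilde, DOTTED_CIRCLE * 2))  # tilde lands after both circles
print(spanning_order(tilde))                  # tilde spans the two circles
```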
Enhancement request: sanity check or lazy invocation
At Copyright sign, a vandal changed {{unichar|25|Percent sign|html=}} to {{unichar|26|Percent sign|html=}}. No error was generated, though inspection shows that the name doesn't match the new, wrong glyph. The template really should do a sanity check that the name actually matches the code point and display an error status if not. For familiar glyphs like % and &, it is obvious but not if it is a j
Better still, don't ask for any text, indeed ignore any provided. A simple {{unichar|25}} should fetch the official name and not expect editors to do make-work.
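Python's standard `unicodedata` module holds the same name table a template would need, so it is a convenient way to sketch the proposed behaviour: fetch the official name from the code point and, while a free-text name is still accepted, flag any mismatch. The function names and warning text are hypothetical.

```python
import unicodedata
from typing import Optional


def canonical_name(hex_code: str) -> str:
    """Official Unicode name for a hex code point such as '25'."""
    return unicodedata.name(chr(int(hex_code, 16)))


def name_mismatch(hex_code: str, supplied: str) -> Optional[str]:
    """Return a warning if the editor-supplied name disagrees with the code point."""
    official = canonical_name(hex_code)
    given = supplied.strip().upper()
    if given != official:
        return f"Warning: U+{int(hex_code, 16):04X} is {official}, not {given}"
    return None


print(canonical_name("25"))                 # PERCENT SIGN
print(name_mismatch("26", "Percent sign"))  # flags the vandalised call
```

Note that `unicodedata.name` raises `ValueError` for code points with no name (controls, unassigned), so a real implementation would need a fallback for those.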
It seems this has fallen through the cracks. I'm going to see if I can wrangle a modification to this template that will simply allow one to print the canonical Unicode name for a given code point. I would prefer it being the default or only behavior, but I am curious is this would be a problem for anyone. Remsenseθ―12:58, 5 April 2024 (UTC)[reply]
The issue being, it seems we need a data module of 150k entries that the module has to be searched every timeβif we want to prevent vandalism, anywayβand that's about three orders of magnitude more entries than I've seen a module on here work with, so I am worried by the potential server load. Remsenseθ―18:16, 5 April 2024 (UTC)[reply]
Doy, you're completely right on the latter point. Had the current flowing the wrong way in my brain there. I'll poke the pump. Remsenseθ―18:27, 5 April 2024 (UTC)[reply]
The sooner we can put this live, the better. There's a lot of it about! (Kudos to Nickps for spotting this one in such a high-profile article but such basic stuff should't depend on eagle eyes to keep clean.) --πππ½ (talk) 10:33, 7 April 2024 (UTC)[reply]
I am not sure of a particular reason why it can't, I just didn't want to be rash about doing so. It's not like it was a particularly technical change, if you'd like to do the honors? Remsenseθ―10:38, 7 April 2024 (UTC)[reply]
The template should certainly ignore the text given but maybe we should start with a green warning to say that the template has done so. One like the error message you get if you accidently type firdt=John in a CS1/2 citation. We could do it silently and let those who have been taking advantage of the failure to check come and read the (to be revised) documentation which will tell them that the free text field is no more. πππ½ (talk) 12:55, 7 April 2024 (UTC)[reply]
Revising the doc, I noticed that calling the template with no text generated just omitted it. I can't see why anyone would want to do that but we had best add a name=none option? πππ½ (talk) 13:10, 7 April 2024 (UTC)[reply]
I think it's nice to have just because I often am too lazy to tab to a template's documentation so I try all the things (=none? could it be =false? how about =no? Surely it will no longer confound me if I try =""βthere we go!) Remsenseθ―13:13, 7 April 2024 (UTC)[reply]
This is usually the pragmatist's move with a binary parameter. I swear there's a thing that lets you check all the ways a user wants to say no or yes to something. Remsenseθ―14:09, 7 April 2024 (UTC)[reply]
I probably don't deserve praise for that one considering I'm the one who made the mistake in the first place [1] but thanks, I guess. Nickps (talk) 11:06, 7 April 2024 (UTC)[reply]
In Unicode, the majuscule Ζ’ is encoded in the Latin Extended-B block at U+01A2 and the minuscule Ζ£ is encoded at U+01A3.[1] The assigned names, "LATIN CAPITAL LETTER OI" and "LATIN SMALL LETTER OI" respectively, are acknowledged by the Unicode Consortium to be mistakes, as gha is unrelated to the letters O and I.[2] The Unicode Consortium therefore has provided the character name aliases "LATIN CAPITAL LETTER GHA" and "LATIN SMALL LETTER GHA".[1]
Right now, we have
U+01A2 Ƣ LATIN CAPITAL LETTER OI
We need an alias=, as in alias=LATIN CAPITAL LETTER GHA, as suggested by Chatul at the Village Pump. There are a very few such cases where an error was made in the original standard that will never be changed. --𝕁𝕄𝔽 (talk) 13:49, 7 April 2024 (UTC)
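A sketch of how an explicit alias= and an automatic fallback could coexist, in Python for illustration. `CORRECTION_ALIASES` is a hypothetical hard-coded subset of the "correction" entries in Unicode's NameAliases.txt, and `display_name` is a hypothetical helper. `unicodedata.name` only ever returns the formal name, although `unicodedata.lookup` does accept aliases (Python 3.3+).

```python
import unicodedata
from typing import Optional

# Hypothetical subset of the "correction" aliases in Unicode's NameAliases.txt.
CORRECTION_ALIASES = {
    0x01A2: "LATIN CAPITAL LETTER GHA",
    0x01A3: "LATIN SMALL LETTER GHA",
}


def display_name(cp: int, alias: Optional[str] = None) -> str:
    """Name to show: explicit alias= first, then a correction alias, then the formal name."""
    if alias:
        return alias.upper()
    if cp in CORRECTION_ALIASES:
        return CORRECTION_ALIASES[cp]
    return unicodedata.name(chr(cp))


print(unicodedata.name("\u01A2"))  # LATIN CAPITAL LETTER OI (the formal, mistaken name)
print(display_name(0x01A2))        # LATIN CAPITAL LETTER GHA
```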
I think it would be ok for arg 1 to continue to work. Instead, find all the invocations of this template and remove arg 1 unless it is actually necessary. Spitzak (talk) 19:07, 8 April 2024 (UTC)
In principle, you are absolutely right, but in practice that would be a huge task, wildly out of proportion to the tiny number of cases where the Unicode Consortium admits it made an error. This is the most practicable solution to this specific problem. Meanwhile, ignoring the supplied 2= in favour of the canonical text immediately resolves the far more numerous cases of spelling errors and vandalism. --𝕁𝕄𝔽 (talk) 20:25, 8 April 2024 (UTC)
I have seen a lot of nlink=<blank>; indeed, I confess to having been a major perpetrator ("monkey see, monkey do"). It works (worked), and there was (is?) no error message to say No data supplied with nlink=, ignored. So we need ...
first: a list of articles that use nlink= with no data, so that someone (aka me, since I know many of them are my fault) can go round and correct them. (I believe that the template already has such an exceptions report, though whether anyone has been checking since DePiep got canned must be doubtful.) Then we can reinstate the change.
second, add some code to say (for all the optional parameters), No data supplied with <param>=, ignored
PS sorry to have dropped the bombshell and not been around until now to help with the cleanup; officially I was otherwise engaged and shouldn't have been in a position to spot the error. <blush> --πππ½ (talk) 23:01, 8 April 2024 (UTC)[reply]
My "first" wouldn't be needed if the current interception of nlink=<blank> were changed so that it linked to the U+XXXX or the target character rather than some name? Which adds support to the question of "do we even need nlink=Β ?". --πππ½ (talk) 23:58, 8 April 2024 (UTC)[reply]
Don't apologize at all! Nothing about this is particularly burdensome. I am leaning towards linking to the character itself, are there cases where this is going to break? Remsenseθ―00:03, 9 April 2024 (UTC)[reply]
So, do you think directly linking to the character itself is the best move? That's where I am presently unless there are edge cases (e.g. I can think of high-range code points and non-printable ones, and maybe we can define those manually). Remsenseθ―02:26, 9 April 2024 (UTC)[reply]
It looks to be a neat solution. The only catch that I can see is that these U+XXXX aren't well watched and may be subject to vandalism. It is not an obvious vector for a "bad actor" so I guess it is a reasonable risk. The problem is that the attack won't be obvious and someone following a link to a Gardiner's sign list entity will have no idea how it happened. --πππ½ (talk) 23:01, 8 April 2024 (UTC)[reply]
Are there any cases of nlink=target-name#section-name? I can't think why there would but if it is possible (as it is), someone somewhere will have done it. <sigh> --πππ½ (talk) 23:58, 8 April 2024 (UTC)[reply]
Though there are cases where the nlink goes to a broad concept article (such as Gardiner's sign list) when there is no specific article. So nlink=<something other than one codepoint> is certainly valid and useful.
So to solve the current problem, we just need to change the behaviour of nlink=<nothing> so that it links to the target character article rather than its Unicode name. As you proposed already, I think? But we can't dispense with nlink= completely and just link everything willy-nilly, since many code points (e.g., Chinese characters) don't have their own articles. --𝕁𝕄𝔽 (talk) 08:11, 9 April 2024 (UTC)
Testcases
As a template editor, I find it helpful, when people point out exceptions and cases like this, to put them in the testcases page so that future editors do not have to remember them. β Jonesey95 (talk) 21:52, 8 April 2024 (UTC)[reply]
Which testcases? I'm planning on ensuring there's an adequate library of them there once I'm done with this round of updates. Remsenseθ―21:54, 8 April 2024 (UTC)[reply]
Per above...is there actually a purpose to being able to set a custom link rather than create easter eggs? I say we just have it link in most cases to Ζ’ i.e. the page for the character itself most of the time. Remsenseθ―21:57, 8 April 2024 (UTC)[reply]
Almost there
Great to see it working again, thank you. Just one left on the to-do list, I think?
name=none so that {{unichar|0123|name=none}} produces just plain U+0123 Δ£
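One way to read name=none, sketched in Python: any other free text in the name slot is ignored in favour of the canonical name, while a handful of "off" spellings suppress it. The set of accepted spellings is an assumption, drawn from the values tried further down this thread (none, no, false, empty); `unichar_name` is a hypothetical name.

```python
import unicodedata
from typing import Optional

# Assumed "switch the name off" spellings.
NAME_OFF = {"none", "no", "false", ""}


def unichar_name(hex_code: str, name: Optional[str] = None) -> str:
    """Code point and glyph, plus the canonical name unless name= switches it off."""
    ch = chr(int(hex_code, 16))
    prefix = f"U+{ord(ch):04X} {ch}"
    if name is not None and name.strip().lower() in NAME_OFF:
        return prefix
    # Any other text in name= is ignored in favour of the canonical name.
    return f"{prefix} {unicodedata.name(ch)}"


print(unichar_name("0123", name="none"))
print(unichar_name("0123"))
```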
It looks a lot like the use of the alias can be automatic, by just checking the alias database and using it instead of the real one if there is an entry. Is there a reason you did not do this? Spitzak (talk) 09:44, 10 April 2024 (UTC)[reply]
Knew I should've just looked at the page that definitely exists where they tell me what characters can't be used as article titles. Remsenseθ―19:44, 9 April 2024 (UTC)[reply]
Some you win, some you lose. I just came back to say it must be something to do with that character because these work:
I see that it is also a problem with Latin script. In the example of "q with circumflex" below, the template fails to align the circumflex correctly over the q. --𝕁𝕄𝔽 (talk) 18:52, 21 April 2024 (UTC)
The cwith character is printed first. Also, you should not try to use this to show a character that is not a single code point. Spitzak (talk) 08:03, 22 April 2024 (UTC)
Suppose that somewhere there exists a letter q with circumflex, q̂. Before we enhanced the template to assert the canonical name (and only the canonical name), it was possible to write {{unichar|0071|cwith=̂|Latin small letter q with circumflex}} and get U+0071 q̂ LATIN SMALL LETTER Q WITH CIRCUMFLEX. Which of course was false: U+0071 is a common or garden q. The new arrangement is questionably better, producing U+0071 ̂q LATIN SMALL LETTER Q, which is a different kind of lie: the grapheme shown is not U+0071 and it is not (just) a Latin small letter q.
So I would like to propose that, when cwith=<combining diacritic>, we expose that fact in the description.
Thus, for example, {{unichar|0071|cwith=̂}} should produce U+0071 q LATIN SMALL LETTER Q with U+0302 ̂ COMBINING CIRCUMFLEX ACCENT : q̂
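The proposed description can be assembled from two name lookups. A Python sketch; `describe_with_mark` is hypothetical, and showing the mark on a dotted circle in the middle column is an assumption rather than part of the proposal:

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"  # U+25CC DOTTED CIRCLE


def describe_with_mark(base_hex: str, mark: str) -> str:
    """Describe base + combining mark honestly: both code points, both names, then the result."""
    base = chr(int(base_hex, 16))
    return (
        f"U+{ord(base):04X} {base} {unicodedata.name(base)} "
        f"with U+{ord(mark):04X} {DOTTED_CIRCLE}{mark} {unicodedata.name(mark)}"
        f" : {base}{mark}"
    )


print(describe_with_mark("0071", "\u0302"))  # q with COMBINING CIRCUMFLEX ACCENT
```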
Yes, I agree that the dotted circle should be the only valid option. Perhaps way back in the early developments, it also supported a coloured block to show the various forms of space character? These are now hardcoded but I guess there are too many combining diacritics to do the same here too.
I will revise the documentation accordingly.
As for all the other bells and whistles, it would take a full search of existing usage to determine where and why they are used. That is not a trivial task. 𝕁𝕄𝔽 (talk) 08:34, 22 April 2024 (UTC)
I have revised the documentation to formally restrict the base character to β and to deprecate any other usage. Please review.
Is it possible to determine whether it is combining from the Unicode character database? If so, maybe just ignore the field entirely and use that. Spitzak (talk) 07:15, 25 April 2024 (UTC)
Do we know how, or whether, that would work with non-Western scripts? Interestingly (at least on ChromeOS), this Devanagari combiner comes with a dotted circle out of the box: U+0942 ू DEVANAGARI VOWEL SIGN UU. I don't know how typical that is. --𝕁𝕄𝔽 (talk) 17:10, 26 April 2024 (UTC)
The docs say that |nlink= with no argument is deprecated, but in my opinion it is a useful feature that we should try to support. The problematic characters are easy to fix simply by linking to the names instead of the characters. I have already written how this can be done in the sandbox (the diff). The only problem with the way it's currently done is that I have to special-case the underscore, because low line is a disambiguation page. I don't like hardcoding things like that, but I don't think anyone plans to move underscore any time soon, so it should be fine. Nickps (talk) 14:26, 14 June 2024 (UTC)
It was only deprecated because it is a bit of a bear trap. Not every Unicode canonical name has a matching article, I think? And just because an article of that name exists does not mean it necessarily relates to the character.
|nlink= does not link to the canonical name by default. It links to the character itself. See {{unichar/sandbox|32|nlink=}} -> U+0032 2 DIGIT TWO for an example (digit two does not exist, 2 obviously does). My proposal is that the name should be linked if and only if the character is not allowed in a title. Nickps (talk) 20:05, 14 June 2024 (UTC)
To actually explain what my change is: if the character is any of # < > [ ] { } | : _, which are the characters not allowed in titles, then I link to the name (except low line, which is disambiguated to underscore); otherwise, nothing changes. Nickps (talk) 20:29, 14 June 2024 (UTC)
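The character database does record this, though the obvious field is the wrong one. In Python, `unicodedata.combining` returns the canonical combining class, which is 0 for both U+0942 and U+20E0; the general category (Mn, Mc or Me for marks) is the more reliable test. A sketch, with `is_combining_mark` as a hypothetical helper:

```python
import unicodedata

# General categories for combining marks: nonspacing, spacing, enclosing.
MARK_CATEGORIES = {"Mn", "Mc", "Me"}


def is_combining_mark(ch: str) -> bool:
    """True if the Unicode character database classes ch as a combining mark."""
    return unicodedata.category(ch) in MARK_CATEGORIES


for cp in (0x0302, 0x0942, 0x20E0, 0x0071):
    ch = chr(cp)
    # code point, general category, canonical combining class, mark test
    print(f"U+{cp:04X}", unicodedata.category(ch),
          unicodedata.combining(ch), is_combining_mark(ch))
```

Whether a given font draws its own dotted circle, as observed on ChromeOS, is a rendering matter that the database does not record.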
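The rule as described can be sketched in a few lines of Python. The character set and the Underscore special case are taken from the comment above; capitalising the Unicode name to form the title is an assumption, as is the helper name `nlink_target`:

```python
import unicodedata

# Characters listed above as not allowed in page titles.
FORBIDDEN_IN_TITLES = set("#<>[]{}|:_")
# Hard-coded exception: "Low line" is a disambiguation page.
SPECIAL_TARGETS = {"_": "Underscore"}


def nlink_target(ch: str) -> str:
    """Link target for an empty |nlink=: the character itself unless it can't be a title."""
    if ch in SPECIAL_TARGETS:
        return SPECIAL_TARGETS[ch]
    if ch in FORBIDDEN_IN_TITLES:
        return unicodedata.name(ch).capitalize()  # e.g. NUMBER SIGN -> Number sign
    return ch


for ch in "2#_|":
    print(repr(ch), "->", nlink_target(ch))
```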
Rereading the discussions about the last big change, it does seem to be the case that it was just these forbidden characters that caused the barf (specific example was full stop). Your proposed revision resolves that problem and seems lightweight enough not to cause any problems.
Ok, that makes sense, you never know how these things can break. I also need to write testcases anyway, so there's no rush to merge. Nickps (talk) 09:01, 15 June 2024 (UTC)
This makes perfect sense. I cannot figure out why a huge change to make it not use a user-defined name was somehow accompanied by a change that forced a user-defined name for the link. I would implement this ASAP, as somebody is busy adding text to the nlink in every instance, which is backwards. Spitzak (talk) 14:19, 15 June 2024 (UTC)
@Spitzak I'd suggest you ask them to stop their edits and comment here. I want to undeprecate the empty nlink parameter, but apparently this editor disagrees and should be given a chance to explain their reasons. Nickps (talk) 14:50, 15 June 2024 (UTC)
Did you mean me? After the big change (when we discovered the anomaly that Nickps is now fixing), I certainly went round clearing nlink=nothing because I didn't know the full extent of the problem. That was a month ago. Has someone else resumed? 𝕁𝕄𝔽 (talk) 18:06, 15 June 2024 (UTC)
Now, to be clear, that page used to be at TM:Unichar/sandbox/doc but since it was only used by {{Unichar/hexformat/sandbox}}, I moved it to its current title. Still, I can't understand the purpose of that page. To me it looks more like a bunch of notes for personal use than a documentation page. Does anyone have any idea what it's supposed to say, or should it just go to TfD? Nickps (talk) 01:07, 25 June 2024 (UTC)
Make |cwith=| a valid option, to save us having to dig out a dotted circle every time?
Since, as documented, the only valid parameter for cwith= is the dotted circle, can anyone see a reason to demand the parameter in the first place? Surely we can just have |cwith=| (a null parameter) as a valid option, with the dotted circle being supplied automatically. 𝕁𝕄𝔽 (talk) 16:26, 16 August 2024 (UTC)
I think it should also be possible to automatically add the dotted circle if the Unicode attributes indicate the character is combining, so no cwith is needed at all.
If it is wrong, I really recommend an attribute be added that is the "print this instead" attribute. It can contain any markup wanted, and would replace all the stuff to set the font and size and cwith, and the image option, and so on. Spitzak (talk) 17:33, 16 August 2024 (UTC)
I think most of the current parameters could be replaced with a single optional parameter. If that parameter is given, its value is used to show the character. This would get rid of the need for the image and a lot of other controls for messing with the font. Popular substitutions could eventually be put in the template itself. Spitzak (talk) 03:02, 17 August 2024 (UTC)
But the only character we ever want to show is the canonical glyph and canonical name? (With the sole exception of combining diacritics, which need the support of a dotted circle for clarity.) [Caution: many Devanagari diacritics come with the dotted circle 'as standard'.] I'm still not following you.
Or do you mean an option to use serif rather than the default sans, since some glyphs are difficult to "read" without the hinting supplied by serif?
Or am I still missing your point? (Though if it is that there is a surfeit of bells and whistles that are never used and should go, I agree, subject to survey.) 𝕁𝕄𝔽 (talk) 15:43, 17 August 2024 (UTC)
I think you really need to give an example. I assume you don't mean anything horrible like getting U+005E ^ CIRCUMFLEX ACCENT to display U+005E ^ CARET SIGN? --𝕁𝕄𝔽 (talk) 18:32, 4 September 2024 (UTC)
Assuming the new field is called "as", I propose that {{unichar|0040|as="FooBar"}} display as U+0040 FooBar COMMERCIAL AT instead of U+0040 @ COMMERCIAL AT. Spitzak (talk) 18:39, 4 September 2024 (UTC)
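The two cwith ideas above (a null |cwith=| supplying the circle, and detection from the Unicode attributes when cwith is absent) could combine like this. A Python sketch only; `glyph_for_display` is hypothetical, and None stands for "parameter not given":

```python
import unicodedata
from typing import Optional

DOTTED_CIRCLE = "\u25CC"  # U+25CC DOTTED CIRCLE
MARK_CATEGORIES = {"Mn", "Mc", "Me"}  # combining-mark general categories


def glyph_for_display(ch: str, cwith: Optional[str] = None) -> str:
    """Resolve cwith=.

    None -> parameter absent: add a dotted circle only if ch is a combining mark.
    ""   -> null parameter (|cwith=|): supply the dotted circle automatically.
    else -> use the supplied base as given.
    """
    if cwith is None:
        if unicodedata.category(ch) in MARK_CATEGORIES:
            return DOTTED_CIRCLE + ch
        return ch
    return (cwith or DOTTED_CIRCLE) + ch


print(glyph_for_display("\u0301", ""))  # null parameter
print(glyph_for_display("\u0301"))      # detected as a combining mark
print(glyph_for_display("q"))           # ordinary letter, unchanged
```

One caveat raised in this thread: some fonts already draw a dotted circle for many Devanagari marks, so automatic insertion could produce two.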
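The proposed parameter only swaps the middle column; the code point and canonical name stay authoritative. Since `as` is a reserved word in Python, the sketch calls it `as_`; the function name `unichar_as` is hypothetical:

```python
import unicodedata
from typing import Optional


def unichar_as(hex_code: str, as_: Optional[str] = None) -> str:
    """Code point, displayed glyph (or the as= override), then the canonical name."""
    ch = chr(int(hex_code, 16))
    shown = ch if as_ is None else as_  # as= replaces only what is displayed
    return f"U+{ord(ch):04X} {shown} {unicodedata.name(ch)}"


print(unichar_as("0040"))
print(unichar_as("0040", as_="FooBar"))
```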
Width bug
Recently this has been adding a lot of whitespace at the end of the small-caps name. It is most obvious if the link is enabled, as the underline is also extended under this whitespace. Spitzak (talk) 17:53, 4 September 2024 (UTC)
Occasionally, I'd like to use the Unicode character itself as the parameter. For instance, for 🎴, I'd like {{unichar|🎴}} to produce U+1F3B4 🎴 FLOWER PLAYING CARDS. This would occasionally save me a short but slightly tedious round trip looking up the code of a character I already have, and having the computer do this mapping for me seems quite doable using software (I don't know much about Wikipedia templates, though). Single characters 0-F/f can be exempt from this, of course, if their capacity to represent single-digit hexadecimal numbers from 0 to 15 is still important (although maybe it isn't, since most people write those like 000F or 0F anyway?).
While looking into this, I was reminded that the unichar template doesn't let you add the U+ prefix to the code in parameter 1. So, for instance, passing U+1F3B4 as parameter 1 is an error. Apparently this is a common error for people to make, so maybe it should be detected and the U+ prefix simply stripped internally? Dingolover6969 (talk) 07:02, 19 October 2024 (UTC)
I'm not sure how a reverse lookup like that could be easily accomplished in a Wikipedia template. It seems like something that ought to be possible, since the computer obviously has this information, but I don't think you have access to the table that you'd need to do that. The best idea that comes to mind would be to generate a magic template list with a script or bot of some kind that hardcodes the table, and then look it up from that. Andre🚐 07:08, 19 October 2024 (UTC)
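Both requests are simple to express where a character database is available, as in this Python sketch (`parse_code_point` is hypothetical). Single hex digits stay hex, as suggested above. For a Lua module, Scribunto's `mw.ustring.codepoint` returns the code point of a character, so the reverse direction may not need a separate table; how that would fit the template's current structure is untested here.

```python
import unicodedata

HEX_DIGITS = set("0123456789abcdefABCDEF")


def parse_code_point(arg: str) -> int:
    """Accept 'U+1F3B4', 'u+1f3b4', '1F3B4' or the literal character itself."""
    arg = arg.strip()
    if arg[:2].upper() == "U+":
        arg = arg[2:]  # tolerate the common U+ prefix
    if len(arg) == 1 and arg not in HEX_DIGITS:
        return ord(arg)  # a single non-hex character is taken literally
    return int(arg, 16)  # otherwise hex; raises ValueError if it isn't


for arg in ("U+1F3B4", "\U0001F3B4", "F"):
    cp = parse_code_point(arg)
    print(f"U+{cp:04X}", unicodedata.name(chr(cp), "<unnamed>"))
```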