Template talk:Lang
lang-my outputs tofu on my browser (FF)
I've been removing lang-my where I come across it because it turns Burmese script into tofu. Not sure what the problem is, but I assume it's forcing a script to display that I don't have installed. I have a number of Burmese scripts, though, including generic ones like Noto, so display shouldn't be a problem. — kwami (talk) 09:26, 31 August 2024 (UTC)
—Trappist the monk (talk) 19:20, 7 November 2024 (UTC)
merge language-specific templates
For years I've wanted to create a We might:
No doubt I've missed something here, not the least of which is community approval to make this change. —Trappist the monk (talk)
Replace and delete the approximately 1145 {{lang-??}} templates listed at [[]] with the single template {{langx}}.
The
Like
Break
I've come across Template:Lang-az-Cyrl and Template:Lang-lmo-IT that aren't in Category:Lang-x templates (not sure why) and aren't in the TfD nomination. Should they be? --Gonnym (talk) 09:05, 30 September 2024 (UTC)
Moldovan Cyrillic
An editor moved {{Lang-mo-Cyrl}} to {{Moldovan Cyrillic}}, which broke the documentation nicely. Many of the transclusions were replaced with the new name. It may need some special attention during the migration. – Jonesey95 (talk) 14:23, 9 October 2024 (UTC)
Italics in foreign-language text
I'm struggling with what to do with foreign-language text containing italic text while following default rules on foreign-language italicization. Specifically, I'm working on Template:Translated blockquote. The default rules are described at Template:Lang#Automatic italics and defined at Module:Lang#L-996.
I have edited Template:Lang/with italics (permalink) as a proof-of-concept that can accept the following kinds of markup:
My implementation is really klunky, so this isn't an edit request. It just seemed easier for me to implement in the template rather than the Lua module. Questions:
Daask (talk) 20:08, 18 September 2024 (UTC)
Is it applied to transliterations?
Please see Talk:Kompromat#Why_is_the_word_so_small?. Two issues: (2) the complaint about font size and (1) my question: is the usage {{lang|ru|Kompromat}} (Kompromat) valid, or does only {{lang|ru|компромат}} make sense? --Altenmann >talk 23:54, 4 October 2024 (UTC)
Georgian italics
In Langx, Georgian (code "ka") is currently italicized by default but shouldn't be, per WP:FOREIGNITALIC. — Goszei (talk) 22:54, 12 October 2024 (UTC)
Lua error in Module:Lang at line 1422: attempt to concatenate a nil value
This error shows on the page Wicked City (1987 film). 118.3.227.103 (talk) 15:40, 13 October 2024 (UTC)
Error when displaying Japanese text
I don't know if this is the right place for a bug report, but instead of the Japanese text and romaji equivalent I get this message: "Lua error in Module:Lang at line 1422: attempt to concatenate a nil value.". The text was displaying correctly until I clicked on the donate button with the scroll-wheel (which opened the page in a new tab). Now any page I go on has this error message instead of the Japanese text, even when I refresh or close and reopen a page. I am using Firefox and Ecosia. Luu-meer (talk) 15:44, 13 October 2024 (UTC)
Tracking categories
Could you add the following tracking categories to the module?
Gonnym (talk) 08:30, 14 October 2024 (UTC)
lang-en
Moved from User talk:Trappist the monk § lang-en
{{langx}} shouldn't say "The non-English text to display." when |en| is allowed (as it should be, since lang-en is being merged with it). Or at least "Text" shouldn't be a "Required field" as I can put "Literal translation". Web-julio (talk) 03:31, 19 October 2024 (UTC)
Helpful guide for when to use this versus |
Template | Languages | Scripts | Transliterations | Translation | Labels |
---|---|---|---|---|---|
{{Hani}} | Any | — | — | No | No |
{{CJKV}} | | | | Yes | Always |
{{lang-zh}} | Chinese | | | Yes | Optional |
{{Nihongo}} | Japanese | Japanese writing system[a] | Hepburn | Yes | Optional |
{{Nihongo2}} | Japanese | Japanese writing system[a] | — | No | No |
{{Korean}} | Korean | | | Yes | Optional |
{{Hanja}} | Korean | Hanja | — | No | Always |
{{Vi-nom}} | Vietnamese | Chữ Nôm | — | No | No |
{{Lang}} | Any | Any | Any | No | No |
{{Langx}} | Any | Any | Any | Yes | Optional |
- ^ a b c No parameter for giving a kana transcription; mixed orthography can be used.
- ^ A single "Korean" parameter—suitable for giving a Hangul transcription of a written word used in multiple languages, but not transcribing hanja in a Korean-specific context.
- ^ A single "Vietnamese" parameter—suitable for giving a transcription of a written word used in multiple languages, but not transcribing in a Vietnamese-specific context.
--HarJIT (talk) 13:54, 25 October 2024 (UTC)
Typo in "Langx |italic= parameter operation" section
In the Italic=value (last section of table), in the second entry, we see {{Langx|ru|''тундра''|italic=}invert}}. There appears to be an extra right-brace right after "italic=". Tarl N. (discuss) 13:19, 28 October 2024 (UTC)
- I didn't realize my request was going to Template talk:Lang. The typo I'm referring to is in the Template:Langx section "Langx |italic= parameter operation" section. Why does the talk page for langx drop one here? Tarl N. (discuss) 02:21, 31 October 2024 (UTC)
- That error was fixed with this edit. This talk page is the centralized discussion page for several related templates and modules.
- —Trappist the monk (talk) 02:52, 31 October 2024 (UTC)
- Ah, thanks. Tarl N. (discuss) 00:26, 1 November 2024 (UTC)
- Further, I think the section name should be changed to "Lang" instead of "Langx" on the Template:Lang article, right? Kilvin the Futz-y Enterovirus (talk) 07:28, 12 January 2025 (UTC)
Missing languages
We need the ability to feed languages outside ISO, such as Old West Norse, Old East Norse, Old Swedish, Early Modern Swedish, Late Modern Swedish, etc. Blockhaj (talk) 08:27, 30 October 2024 (UTC)
- No we do not, in my opinion. Remsense ‥ 论 09:19, 30 October 2024 (UTC)
- Your reasoning? Why limit ourselves? Blockhaj (talk) 10:34, 30 October 2024 (UTC)
- Same reason as always: it serves insufficient concrete benefit to editors or readers, while increasing technical, conceptual, and potentially epistemological complexity. At this level of diachronic granularity, whose schemas are we meant to use? There's a reason ISO took on the task of producing a standard for this to begin with, wouldn't you agree? Remsense ‥ 论 10:44, 30 October 2024 (UTC)
- I disagree with the argument "insufficient concrete benefit to editors or readers". Current limits are limiting in a bad way. I feel we should instead strive for commonality with Wiktionary, whose expanded schemas I propose we use. Blockhaj (talk) 11:23, 30 October 2024 (UTC)
- As you are locking in the argument that there are concrete issues to be solved, would you mind directly articulating what they are? Remsense ‥ 论 22:20, 30 October 2024 (UTC)
- If there is sufficient need, we can create IETF private use tags for languages not directly supported in the IANA language-subtag-registry file. The list of currently supported private-use tags is at Template:Lang § Private-use language tags.
- Language templates based on Module:Lang will not adopt the mishmash of nonstandard tags that are supported at Wiktionary.
- —Trappist the monk (talk) 13:15, 30 October 2024 (UTC)
extra params?
|anglicization= / |anglisation= and |romanization= / |romanisation= would be useful. |translation=, |transliteration=, and |lit= provide a translation, transliteration, and literal meaning; but if something has an older anglicization, that should also be available (e.g. Crackow), as should a romanized form that is different from the transliteration, whether because of some oddball or non-English choices in letter/character use, or because the language uses both Latin and non-Latin script, making the Latin-script version not a transliteration; the same applies to mapping extended Latin alphabets to basic Latin alphabetic forms. -- 65.92.246.77 (talk) 11:32, 30 October 2024 (UTC)
Private-use language tags
I propose the addition of the following private-use tags:
- Old East Norse: non-x-east
- Old Norwegian: nor-x-old
- Middle Norwegian: nor-x-middle
- Old West Norse: non-x-west
- Old Swedish: swe-x-old
- Early Modern Swedish: swe-x-ems
- Late Modern Swedish: swe-x-lms
- Middle Danish: dan-x-middle
- Modern Danish: dan-x-modern
Blockhaj (talk) 17:18, 1 November 2024 (UTC)
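As a rough illustration of the shape these proposed tags share, here is a minimal Python sketch that splits a tag of the form `<language>-x-<private subtags>` into its parts. It assumes only the general BCP 47 rule that private-use subtags follow the singleton `x` and are 1 to 8 alphanumeric characters each; the function name `parse_private_use` is hypothetical, the pattern deliberately ignores script, region, and variant subtags, and it says nothing about whether Module:Lang would accept a given tag.

```python
import re

# Simplified shape (an assumption for illustration): a 2-3 letter primary
# language subtag, the singleton "x", then one or more private-use subtags
# of 1-8 alphanumeric characters each.
PRIVATE_USE_RE = re.compile(
    r"^([a-z]{2,3})-x-([a-z0-9]{1,8}(?:-[a-z0-9]{1,8})*)$"
)


def parse_private_use(tag):
    """Return (language, [private subtags]) or None if the shape does not match."""
    match = PRIVATE_USE_RE.match(tag.lower())
    if match is None:
        return None
    return match.group(1), match.group(2).split("-")


if __name__ == "__main__":
    for tag in ("non-x-east", "swe-x-ems", "dan-x-middle", "sr-Cyrl"):
        print(tag, parse_private_use(tag))
```

A tag such as `sr-Cyrl` is rejected because it has no `x` singleton, and a private subtag longer than eight characters is rejected as malformed.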
tracking sr usage with issues
@Trappist the monk I noticed {{lang-sr}} was deleted after the bot replaced its usage, but it also had a couple of semantic problems previously discussed at Template talk:Lang-sr and Talk:Romanization of Serbian that were never resolved:
- a lot of text is marked as just "Serbian" but we don't know if it's Latin, in which case it should be italicized, or if it's Cyrillic, in which case it shouldn't
- for example the lead section of Belgrade has:
- Serbian: Београд / Beograd
- and the latter part of that fails MOS:FOREIGNITALIC
- its third parameter was sometimes used to show the other script, but would mark it as "romanization", which may or may not be good - when discussing 500-year-old sources it's probably fine, but when discussing something from the last 50 years it's basically very weird
- for example as it was before this fix:
- and there is no "romanization" in the latter half of the 20th century, the company's name in Latin was of the same significance as its name in Cyrillic
How can we address these now with langx? Can we get at least some tracking categories if these symptoms are detected, so they can be checked? --Joy (talk) 09:54, 8 November 2024 (UTC)
- If this is such a problem, why wasn't {{lang-sr}} deleted long ago? Didn't we create {{lang-x2}}, {{lang-sr-Cyrl-Latn}}, and {{lang-sr-Latn-Cyrl}} specifically to address this issue? Also, {{lang-sr-Cyrl}} and {{lang-sr-Latn}}?
- This crude search finds about 4900 articles that use {{langx|sr|...}} and this crude search finds about 1500 articles that have {{langx|sr|<parameter>|<another parameter>|...}}. <another parameter> could be a named parameter or a 'transliteration'.
- I am opposed to one-off special-case code. Module:Lang/langx has a list of unsupported language tags. Use of {{langx}} with any of those tags adds the page to Category:Langx uses unsupported language tag. I will add sr to that list. In future, some of the currently unsupported language tags will be converted to supported private use tags. After that, I expect that the module will be tweaked so that the remaining unsupported language tags will cause the module to emit error messages.
- —Trappist the monk (talk) 14:26, 8 November 2024 (UTC) 15:19, 8 November 2024 (UTC) additional templates
- I would guess the reason is that nobody in the know really wanted to create a TfD that would have required a check and possibly a change to 5k articles when lang-sr can be perfectly fine if the input text is only one Cyrillic parameter. We don't want to emit error messages to readers for that. How can we best manage this process of converting to different tags?
- BTW I also noticed that the old template had code to add Category:Instances of Lang-sr using second unnamed parameter since 2016, so the removal of this part is a bit of a regression. --Joy (talk) 16:55, 8 November 2024 (UTC)
- The day after I created Module:Lang/langx, I made myself a TODO-note wondering if {{langx}} couldn't auto-italicize in a manner similar to {{lang}}. Sometime later I wrote a hack to do just that. I have moved that hack into Module:Lang/sandbox. What the {{langx/sandbox}} renderings look like compared to the live {{lang}} and {{langx}} template renderings can be seen in this version of my sandbox (permalink). The hack should probably be rewritten so that Module:Lang will work for those other-language wikis that don't / won't support {{langx}}. Any {{lang-??}} templates that remain after the conversion will need to be checked to ensure that they continue to work as they were intended.
- —Trappist the monk (talk) 20:44, 8 November 2024 (UTC)
- OK so if I read that right, overall the outcome would be that Serbian Latin would be italicized, and combinations still need manual interventions en masse? --Joy (talk) 21:41, 8 November 2024 (UTC)
- Of course, but you knew that. The new:
{{langx|sr|Београд / Beograd|lit=White City}}
- is just as broken as the old:
{{lang-sr|Београд / Beograd|lit=White City}}
- which is why you wrote {{lang-sr-Cyrl-Latn}} and its companions:
- I imagine that you might write an awb script that is sufficiently clever to create {{lang-sr-Cyrl-Latn}} from {{langx|sr|Београд|Beograd|White City}}. Mayhaps even from {{langx|sr|Београд / Beograd|lit=White City}}.
- —Trappist the monk (talk) 23:01, 8 November 2024 (UTC)
- Okay, but none of this addresses my original point - how do we find them first. This issue may affect sr, sh, cnr and uz IIRC, can't we just have a tracking category for this whole class of lang-x2 languages? --Joy (talk) 10:19, 9 November 2024 (UTC)
- Did I not suggest how to find articles that use {{langx|sr|...}}? Repeating the second of those suggested searches here with similar searches for the other three language tags:
- I am opposed to special-case code.
- —Trappist the monk (talk) 19:41, 9 November 2024 (UTC)
- I mean we can genericize it even further - Category:Articles containing Serbian-language text shows 20.5k, why wouldn't we simply distinguish those 1.5k... and in turn why not have a tracking category for labeled vs. not labeled for each language. Is there a particular cost to having two subcategories instead of just one? --Joy (talk) 21:33, 9 November 2024 (UTC)
- I have written a simple awb script that trawls the search results above and lists those articles that have {{langx}} templates that are candidates for conversion to {{lang-<tag>-Cyrl-Latn}}. I have put the four lists in your user space; see User:Joy/candidate articles for lang-xx-Cyrl-Latn.
- —Trappist the monk (talk) 19:05, 10 November 2024 (UTC)
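The awb script itself is not shown in this discussion. Purely as an illustration of the kind of check described, the following Python sketch scans wikitext for {{langx}} calls with the tags sr, sh, cnr, or uz whose first positional parameter holds Cyrillic and Latin text separated by " / ", or whose first two positional parameters are Cyrillic then Latin. Everything here is an assumption for the sketch: the names `find_candidates` and `script_of` are hypothetical, script detection relies on Unicode character names rather than whatever Module:Lang uses, and nested templates are not handled.

```python
import re
import unicodedata

TAGS = ("sr", "sh", "cnr", "uz")

# Matches {{langx|<tag>|...}} with no nested braces in the parameters.
TEMPLATE_RE = re.compile(
    r"\{\{\s*langx\s*\|\s*(" + "|".join(TAGS) + r")\s*\|([^{}]*)\}\}",
    re.IGNORECASE,
)


def script_of(text):
    """Classify the letters in text as 'Cyrl', 'Latn', 'mixed', or 'none'."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name.startswith("CYRILLIC"):
                scripts.add("Cyrl")
            elif name.startswith("LATIN"):
                scripts.add("Latn")
            else:
                scripts.add("other")
    if not scripts:
        return "none"
    return scripts.pop() if len(scripts) == 1 else "mixed"


def find_candidates(wikitext):
    """Return (tag, cyrillic, latin) for each template that looks convertible."""
    found = []
    for match in TEMPLATE_RE.finditer(wikitext):
        params = [p.strip() for p in match.group(2).split("|")]
        # Crude: anything containing "=" is treated as a named parameter.
        positional = [p for p in params if "=" not in p]
        if not positional:
            continue
        if " / " in positional[0]:
            left, right = [s.strip() for s in positional[0].split(" / ", 1)]
        elif len(positional) >= 2:
            left, right = positional[0], positional[1]
        else:
            continue
        if script_of(left) == "Cyrl" and script_of(right) == "Latn":
            found.append((match.group(1).lower(), left, right))
    return found


if __name__ == "__main__":
    sample = "{{langx|sr|Београд / Beograd|lit=White City}}"
    print(find_candidates(sample))
```

The output is only a list of candidates; each one would still need a human check before conversion, as the discussion above notes.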
Next steps?
Nice job with clearing and deleting all the templates from the TfD!
From the left over templates, we have
- those at Category:Lang-x templates with other than ISO 639. I think that if we aren't planning on deleting them, then we should support them with the private use range.
- templates with IPA support like Template:Lang-rus. Can we add |ipa= support built-in to the module?
Another question which I have is regarding the script templates at Category:Script–font templates. If font support is needed for specific languages, why don't we support it via the module? Is the text less clear with us not always using it? Are some of these outdated with newer Unicode support?
Regarding #Tracking categories, I think making the difference between lang and langx only being the label is the right way to handle this, as the label=no situation is not only unnecessary code in text, but it also disables all other labels. Gonnym (talk) 10:11, 12 November 2024 (UTC)
- Of the templates originally in Category:Lang-x templates with other than ISO 639, several have been converted to be usable by {{lang}} and {{langx}}:
- Template:Lang-ast-leo → Leonese: text → added ast-ES language tag (used internally by {{lang}}) to Module:Lang/data
- Template:Lang-az-Arab → [text] Error: {{Langx}}: Latn text/non-Latn script subtag mismatch (help) → az-Arab is a properly formed IETF language tag that was ignored by wrapped {{Language with name}} template
- Template:Lang-fr-gallo → Gallo: text → added fr-gallo to Module:Lang/data
- Template:Lang-fra-que → Quebec French: text → added fr-CA to Module:Lang/data
- Template:Lang-ku-Cyrl → [text] Error: {{Langx}}: Latn text/non-Latn script subtag mismatch (help) → ku-Cyrl is a properly formed IETF language tag; converted from {{Language with name}}
- Template:Lang-lmo-IT → Bergamasque: text → added lmo-x-berg to Module:Lang/data
- Template:Lang-oc-gascon → Gascon: text → oc-gascon is a properly formed IETF language tag ignored by wrapped {{Language with name}}
- which leaves us with these:
- Template:Lang-1ca – Old Anatolian Turkish is a defunct Turkic language; private use tag might be possible: trk-x-oldanat; don't know if trk is the right base tag
- Template:Lang-est-sea – Seto is a dialect of Estonian; private use tag might be possible: et-x-seto
- Template:Lang-fra-frc – private use tag might be possible: fr-x-frainc
- These are not languages so we probably ought not support them with Module:Lang; that being the case, these templates don't belong in Category:Lang-x templates with other than ISO 639:
- Template:Lang-sq-definite – definiteness is a linguistic construct
- Template:Lang-uniturk – Uniform Turkic Alphabet is a writing system
- Template:Lang-vi-chunom – chữ Nôm is a writing system; applies custom styling with {{Vi-nom}}
- Template:Lang-vi-hantu – chữ Hán is a writing system; applies custom styling with {{Vi-nom}}
- I don't currently have an opinion about styling templates. I suspect that there are editors who will demand styling because they prefer the styled form over the default form:
- I suspect that there would be a deal of work to be done were we to attempt to consolidate the various scripts and their (sometimes) attendant css files.
- I don't really understand what you mean by your #Tracking categories comment. And, if that comment was a continuation of that other discussion, doesn't the comment belong there?
- —Trappist the monk (talk) 19:06, 12 November 2024 (UTC)
- Nice, good job again on shortening the list!
- Regarding 1ca, looking at the article, trk seems the most correct.
- The language used in the article for est-sea seems a bit confusing. It says it's a South Estonian but that article's infobox does not list Estonian as a parent (the lead does though). Its most recent parent according to the infobox is Võro language. Not sure if et-x-seto is the most correct.
- fr-x-frainc seems good.
- Regarding script templates, I thought the reason was not just visible preference but because it renders it correctly, but maybe that isn't true, or always true. I think though that it's probably better for the wiki if we use consistent fonts so we don't have instances of the above Hebrew translation which look different, even on the same page. It will also make for shorter code on the pages themselves if we don't need to apply the script template manually.
- Some templates that use script and currently can't be merged: Template:Lang-ku-Arab
- Others:
- Template:Lang-ka and Template:Transl-grc usages can be converted if we support an automatic transliteration (|auto=yes or something), which will call their respective templates (if they exist).
- Template:Lang-rus can be converted if we support |IPA= or if we remove support of IPA from outside that template. In general though, I don't think it's smart of us to have lang-rus around as that's an opening for yet another batch of templates created in similar style.
- Gonnym (talk) 10:59, 13 November 2024 (UTC)
auto italics for {{langx}}
At present, {{langx}} uses a list of language tags scraped from those now deleted {{lang-??}} templates that called lang_xx_inherit(). That function sets the initial rendering style for a {{lang-??}} template to upright. The list of tags is in Module:Lang/langx at lines 1–536 (permalink).
In the sandbox, I have adapted the auto italics code used by {{lang}} so that we aren't limited by the hard-coding in the inherit_t list. Serbian is a good example. That language gives equal status to Cyrillic and Latin text. Currently, the live version of {{langx|sr|<text>}} renders <text> in an upright font regardless of script. The proposed sandbox version renders Cyrillic <text> in an upright font and Latin <text> in an italic font. {{lang}} renderings here for reference:
- српски језик ← {{lang|sr|српски језик}}
- Serbian: српски језик ← {{langx|sr|српски језик}}
- Serbian: српски језик ← {{langx/sandbox|sr|српски језик}}
- srpski jezik ← {{lang|sr|srpski jezik}}
- Serbian: srpski jezik ← {{langx|sr|srpski jezik}}
- Serbian: srpski jezik ← {{langx/sandbox|sr|srpski jezik}}
Without objection, I shall update the live version of the module to support auto italics.
—Trappist the monk (talk) 23:28, 12 November 2024 (UTC)
- Good idea on making the source of information of both styles the same. Gonnym (talk) 11:03, 13 November 2024 (UTC)
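The rule described in this section (Latin text italic, other scripts upright) can be sketched in a few lines of Python. This is not the Lua in Module:Lang; it is a simplified model under stated assumptions: Latin detection uses Unicode character names, the hypothetical `italic` argument stands in for an explicit override and takes only True, False, or None, and the output uses wiki italic markup rather than the HTML the module actually emits.

```python
import unicodedata


def is_wholly_latin(text):
    """True when text has at least one letter and every letter is Latin."""
    letters = [ch for ch in text if ch.isalpha()]
    return bool(letters) and all(
        unicodedata.name(ch, "").startswith("LATIN") for ch in letters
    )


def render(text, italic=None):
    """Apply the auto-italic rule unless an explicit override is given.

    italic=None -> decide from the script of the text (assumed default)
    italic=True -> force italic; italic=False -> force upright
    """
    use_italic = is_wholly_latin(text) if italic is None else italic
    return "''" + text + "''" if use_italic else text


if __name__ == "__main__":
    print(render("srpski jezik"))
    print(render("српски језик"))
```

With this rule the same language tag yields italic Latin text and upright Cyrillic text, which is the behavior proposed for Serbian above.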
lang error that currently can't be fixed within the template
At Adoptionism#Ebionites (and I've seen this issue in many other places), the code used is {{lang|hbo|אביונים|ebyonim}}, which produces an error as {{lang}} does not support transliteration. This can be fixed by changing to use {{langx}}, however the label it will produce for the language isn't wanted there. |label=none can be used, but then it also removes the label for the romanization, which is wanted there. One can move the transliteration outside the template, but that just defeats the purpose of the template.
What should happen in my opinion, and I've said this somewhere in one of the above sections, is that {{lang}} and {{langx}} should have the same secondary features regarding transliteration and literal translation, with the difference being that Langx produces a language label and Lang does not (but does produce labels for the other parts). Gonnym (talk) 17:00, 14 November 2024 (UTC)
Broken usage of langx
I'm not sure how this template works, but this page is complaining about a missing parameter "p", and I'm not sure how to fix it. x42bn6 Talk Mess 18:24, 15 November 2024 (UTC)
- The page was calling {{lang-ru}} with |p=. The template has been deleted, so I don't know if |p= (for "pronunciation", possibly) was a valid parameter. An admin will be able to check. – Jonesey95 (talk) 18:51, 15 November 2024 (UTC)
- Some history – I didn't go back to the very beginning:
- changed from {{lang-ru}} to {{lang-rus}} at this edit – {{lang-rus}} supports the |p= parameter
- changed from {{lang-rus}} to {{lang-ru}} at this edit – {{lang-ru}} ignored the unsupported |p= parameter
- changed from {{lang-ru}} to {{langx|ru|...}} at this edit – {{langx}} ignored the unsupported |p= parameter until just a day or so ago; now it emits an error message when editors give it parameters that it does not support.
- —Trappist the monk (talk) 19:00, 15 November 2024 (UTC)
- So it looks like one possible fix is to change the template transclusion back to {{lang-rus}}. Or is that creating more work in the future? This error is present in other articles, such as Denis Cheryshev. – Jonesey95 (talk) 19:17, 15 November 2024 (UTC)
- For now changing is the fix. I did however propose that we either disentangle the unsupported features from -rus or add support for them so other languages can use. There is really almost no reason at all for any specific-language template to stay after the creation of langx. Gonnym (talk) 19:24, 15 November 2024 (UTC)
- Pending more granular tracking categories or sorting within the category, an insource search shows 63 articles with this particular error. Most appear to be using lang|ru, but at least a few are using lang|zh, which I have not investigated. – Jonesey95 (talk) 14:32, 16 November 2024 (UTC)
- It looks like there is also an error message with "sc", which presumably refers to script. Mellk (talk) 13:35, 22 November 2024 (UTC)
- Thanks, but it is not necessary for you to report each instance of unknown parameters causing error messages. They are all collected in Category:Lang and lang-xx template errors which at present lists 191 pages.
- —Trappist the monk (talk) 13:57, 22 November 2024 (UTC)
- Since this is related to lang-rus, the issue is not just "p=". Mellk (talk) 14:06, 22 November 2024 (UTC)
- The 'issue' is {{lang}} and {{langx}} with parameters that are not known to those templates. The issue is not confined to {{lang-rus}} or {{lang-zh}} templates that have been improperly changed to {{lang}} or {{langx}}. Here are searches that are not parameter specific for both templates:
- {{lang}} ~680 articles
- {{langx}} ~190 articles
- Yep, there is a lot of junk out there. You still don't need to make a report here for every subgroup of errors that you encounter out there.
- —Trappist the monk (talk) 14:43, 22 November 2024 (UTC)
- I did not plan to make a report for every error. I also did not say that the errors are confined to lang-rus (that is pretty obvious when the search above showed that it was not just ru). I was referring to the fix suggested above. Mellk (talk) 14:59, 22 November 2024 (UTC)
- I think it is also possible to move pronunciation to the IPA template. I was under the impression that lang-rus would eventually be replaced, but it seems like this is not the case yet? Mellk (talk) 09:38, 22 November 2024 (UTC)
Lang error category without error message?
Church Slavonic is in Category:Lang and lang-xx template errors, but I am unable to find a red error message. Maybe I just can't see it. – Jonesey95 (talk) 19:30, 15 November 2024 (UTC)
Do you see it here:[a]
[ⱌⱃⰽⰲⰰⱀⱁⱄⰾⱁⰲⱑⱀⱄⰽⱜ ⰵⰸⰻⰽⱜ] Error: {{Langx}}: invalid parameter: |script= (help)
Fixing the deprecated |script= parameter (cu → cu-Glab) resolves the problem.[a]
Croatian Church Slavonic: ⱌⱃⰽⰲⰰⱀⱁⱄⰾⱁⰲⱑⱀⱄⰽⱜ ⰵⰸⰻⰽⱜ, romanized: crkavnoslověnskь jezikь
- ^ Croatian Church Slavonic: ⱌⱃⰽⰲⰰⱀⱁⱄⰾⱁⰲⱑⱀⱄⰽⱜ ⰵⰸⰻⰽⱜ, romanized: crkavnoslověnskь jezikь
It has been a while, but I've seen these before and if my failing memory is correct, always associated with {{efn}}. I was never able to figure out why the invalid error message gets sandwiched into and corrupts the maintenance message.
—Trappist the monk (talk) 20:04, 15 November 2024 (UTC)
- No, I do not see an error message in this talk page section. Maybe my custom CSS is suppressing it? When I inspect the page, I see
<span class="lang-comment" style="font-style: normal; display: none; color: #33aa33; margin-left: 0.3em;">{{langx}} uses deprecated parameter(s) </span>
Note the display:none. – Jonesey95 (talk) 14:33, 16 November 2024 (UTC)
- I can see error messages above now, and in the 20 October 2024 version of Church Slavonic. This appears to be resolved. – Jonesey95 (talk) 18:47, 22 November 2024 (UTC)
Use in headers
If there is non-English text in section headers, should we use this template? E.g. == Hello ({{lang|ko|안녕}}) ==
seefooddiet (talk) 23:31, 15 November 2024 (UTC)
- Isn't this a question for the appropriate WP:MOS talk page? Templates and wikilinks are discouraged in section headings; see MOS:HEADINGS. {{lang}} is a template and, unless |nocat=yes is set, will create a category wikilink. I can imagine that we could make {{lang}} subst-able in a way that it knows that it is being subst'd so won't emit a category. Once subst'd you'd end up with a header that looks like this:
== Hello (<span title="Korean-language text"><span lang="ko">안녕</span></span>) ==
- I don't know if there are any rules regarding html markup in headings so posing your question elsewhere would be a good idea. Start at WT:MOS?
- —Trappist the monk (talk) 00:24, 16 November 2024 (UTC)
text/script mismatch
I've been picking away at Category:Langx deprecated parameters and noticed multiple instances of {{lang}} and {{langx}} templates where <text> does not match the script specified by the script subtag. For example, this:
{{langx|tly-Latn|Фәхрәддин Әбосзодә}} → [Фәхрәддин Әбосзодә] Error: {{Langx}}: Non-latn text (pos 1: Ф)/Latn script subtag mismatch (help)
In that template, <text> is clearly not Latn script but {{langx}} doesn't notice and so incorrectly renders <text> in italic form.
So, in the sandbox, I've fixed that, at least partially. To support auto-italics, Module:Lang evaluates <text> to see if it is wholly Latn script. When it is not, <text> is rendered upright (unless overridden by |italic=). Since we know that <text> is or is not Latn script, we can check the script subtag (if present) to see that it is appropriate. In the example above, the Cyrillic <text> does not match the -Latn subtag.
Conversely, when <text> is Latn script, a mismatch exists when the script subtag is not -Latn:
{{langx|tly-Cyrl|Text}} → [Text] Error: {{Langx}}: Latn text/non-Latn script subtag mismatch (help)
Again {{langx}} does not notice so <text> is incorrectly rendered in upright form.
Fixed in the ~/sandbox:
{{langx/sandbox|tly-Latn|Фәхрәддин Әбосзодә}} → [Фәхрәддин Әбосзодә] Error: {{Langx}}: Non-latn text (pos 1: Ф)/Latn script subtag mismatch (help)
{{langx/sandbox|tly-Cyrl|Text}} → [Text] Error: {{Langx}}: Latn text/non-Latn script subtag mismatch (help)
Same applies to {{lang}} so:
{{lang/sandbox|tly-Latn|Фәхрәддин Әбосзодә}} → [Фәхрәддин Әбосзодә] Error: {{Lang}}: Non-latn text (pos 1: Ф)/Latn script subtag mismatch (help)
{{lang/sandbox|tly-Cyrl|Text}} → [Text] Error: {{Lang}}: Latn text/non-Latn script subtag mismatch (help)
Without objection, I shall implement this in the live module.
—Trappist the monk (talk) 16:15, 17 November 2024 (UTC)
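For readers who want to experiment with the text/script-subtag check outside of Lua, here is a minimal Python sketch. It is not the Module:Lang code. It guesses a character's script from the start of its Unicode name, which is an assumption made for brevity (the module consults Module:Unicode data), and the names char_script, is_latn and check_subtag are hypothetical. The returned strings are only modelled on the error messages quoted above.

```python
import unicodedata

def char_script(ch):
    """Rough script guess from the Unicode character name.
    Assumption for this sketch only; not how Module:Lang does it."""
    name = unicodedata.name(ch, "")
    for prefix, script in (("LATIN", "Latn"), ("CYRILLIC", "Cyrl"), ("GREEK", "Grek")):
        if name.startswith(prefix):
            return script
    return "Zyyy"  # everything else is treated as Common here

def is_latn(text):
    """True when text has at least one Latin character and nothing from
    another named script (Common characters such as spaces are allowed)."""
    scripts = {char_script(c) for c in text}
    return "Latn" in scripts and scripts <= {"Latn", "Zyyy"}

def check_subtag(tag, text):
    """Return a mismatch description, or None when tag and text agree
    (or when the tag has no script subtag)."""
    parts = tag.split("-")
    # In an IETF tag a four-letter alphabetic subtag is the script subtag.
    script = next((p.title() for p in parts[1:] if len(p) == 4 and p.isalpha()), None)
    if script is None:
        return None
    latn = is_latn(text)
    if script == "Latn" and not latn:
        return "Non-latn text/Latn script subtag mismatch"
    if script != "Latn" and latn:
        return "Latn text/non-Latn script subtag mismatch"
    return None

print(check_subtag("tly-Latn", "Фәхрәддин Әбосзодә"))
print(check_subtag("tly-Cyrl", "Text"))
```

The two calls mirror the two {{langx}} examples in this thread; a tag without a script subtag, such as tly, is never reported by this sketch.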
Category renames
Now that almost all lang-xx have been deleted, the categories should be renamed to "Lang and langx".
Also, Template:My has ended in deletion, so if the bot can help with that replacement it would be great. Gonnym (talk) 16:20, 18 November 2024 (UTC)
- Switching {{my}} to {{lang}} is outside of the Monkbot/task 20 remit. One might write an AWB task to do the job though I notice that there are others already doing the work. Unless {{my}} lingers for longer than it should (don't know how long that is) I guess I wouldn't worry about it.
- Yeah, categories should be renamed. I suppose that can happen at any time so long as it happens at about the same time that we update Module:Lang to use the new names. The module should continue to support the existing names for those wikis that don't support {{langx}}.
- —Trappist the monk (talk) 17:09, 18 November 2024 (UTC)
fn lang_xx_inherit parameter values removed
Trappist the monk recently did a major overhaul of Module:Lang in order to implement Template:Langx. (Thanks!) In the process, he removed |fn=lang_xx_inherit, |fn=lang_xx_italic, and |fn=lang from Module:Lang as "no longer required". However, this broke Template:Translated blockquote, which depended on this feature, and is used in mainspace articles.
Based on the old documentation, I believe this documents the equivalent replacements:
Old code | New code |
---|---|
{{Lang |
{{Lang |
{{Lang |
{{Langx |
{{Lang |
{{Langx |
Please correct me if the old and new code columns above are not exactly equivalent. I thought I would document this here in case any other template editors experienced similar errors from the removal of this functionality. I have yet to fix Template:Translated blockquote but plan on it in the next few days. Daask (talk) 21:03, 18 November 2024 (UTC)
- Restored in Module:Lang/sandbox. |fn=lang_xx_inherit and |fn=lang_xx_italic were created so that editors didn't have to create yet another {{lang-??}} template; |fn=lang just came along for the ride. With the advent of {{langx}} that generic use is no longer required.
- We don't check parameter use for the useful utilities: |fn=is_ietf_tag, |fn=is_lang_name, |fn=name_from_tag, and |fn=tag_from_name; name_from_tag shown here for completeness.
- Test the fix in {{Translated blockquote/sandbox}} by switching {{lang}} to {{lang/sandbox}}.
- —Trappist the monk (talk) 00:15, 19 November 2024 (UTC)
- @Trappist the monk: Template:Lang/sandbox, and Template:Translated blockquote/sandbox, which now uses it, work as expected. Do you intend to restore these features to Template:Lang? Daask (talk) 14:24, 19 November 2024 (UTC)
- Yeah, I think I have to. Some version of Module:Lang is used on ~160 MediaWiki sites. There may be sites that rely on |fn=.
- —Trappist the monk (talk) 15:15, 19 November 2024 (UTC)
Putting lang inside of langx?
I was curious if there is any point in putting lang inside of langx? For examples, see any of these. These are all single nestings, but I have also seen cases with multiple {{lang}} inside of one {{langx}}. Frietjes (talk) 15:44, 19 November 2024 (UTC)
- None that I can think of unless the editor felt that the tool-tip was a requirement. Regardless, such constructs result in improper html and pointless category link duplication. For example:
{{langx|ain|{{lang|ain-Kana|アィヌ}}}}
→[[Ainu language|Ainu]]: <span lang="ain"><span title="Ainu (Japan)-language text"><span lang="ain-Kana">アィヌ</span></span>[[Category:Articles containing Ainu (Japan)-language text]]</span>[[Category:Articles containing Ainu (Japan)-language text]]
- the first category link (in English) is marked up as Ainu.
- The above was a conversion from:
{{lang-ain|アィヌ}}, {{transl|ain|Aynu}}
- to:
{{lang-ain|{{lang|ain-Kana|アィヌ}}, {{lang|ain-Latn|Aynu}}
- at this edit by SrpskiAnonimac.
- I can see no real useful reason why {{lang}}/{{langx}} should be nested. Don't do that.
- The fix for the above, as it currently exists in Ainu people § Names, is:
{{langx|ain-Kana|アィヌ}}
→ Ainu: アィヌ
- For others like this one from Roman province § Republican period where the two language tags are different:
{{langx|el|{{lang|grc|ἐπαρχίᾱ}}}}
- the fix is to use the language tag that directly wraps the text (no doubt there will be exceptions):
{{langx|grc|ἐπαρχίᾱ}}
→ Ancient Greek: ἐπαρχίᾱ
- —Trappist the monk (talk) 16:51, 19 November 2024 (UTC)
Errors in the template documentation
I am seeing what I believe are new errors in the template documentation. In the table headed "Langx |italic= parameter operation", I see many cells with output like "script= (help)". I suspect that an unescaped pipe in the error message output may be causing something unwanted to happen. – Jonesey95 (talk) 15:06, 20 November 2024 (UTC)
- Yep, fixed.
- —Trappist the monk (talk) 15:24, 20 November 2024 (UTC)
- Much better; thank you. – Jonesey95 (talk) 19:04, 20 November 2024 (UTC)
Non-latn text/Latn script subtag mismatch errors in ancient Iranian articles
Articles regarding ancient Iranian society like Mithra, Mantra (Zoroastrianism)#Etymology and Saoshyant#Etymology are showing this error recently, and I'm not sure how to fix them. —CX Zoom[he/him] (let's talk • {C•X}) 13:09, 26 November 2024 (UTC)
- Do you really mean to romanize Miθra and Miθraʰ with 'θ' (Greek small letter theta)? Do you really mean to romanize Astwat̰-әrәta and astvat-әrәta with 'ә' (Cyrillic small letter schwa)?
- Apparently there is no unicode for Latin theta so that may require some sort of modification to Module:Lang if, in fact, you did really mean to use the Greek theta character. There is a Latin small letter schwa: 'ə'. Wouldn't that be the correct choice when romanizing Astwat̰-әrәta and astvat-әrәta?
- —Trappist the monk (talk) 15:15, 26 November 2024 (UTC)
- Sorry, I don't know much about how romanization works, but I believe you are correct about the schwa symbol. For Latin theta, I think there needs to be an exception. Or maybe {{transliteration}} would fit better here? I saw it work fine in some other articles. —CX Zoom[he/him] (let's talk • {C•X}) 17:41, 26 November 2024 (UTC)
- {{transliteration|ae|Miθra}} should emit an error message because Greek theta is not Latin theta and in the rendering, 'Miθra' is marked up as Latin text:
<span title="Avestan-language romanization"><i lang="ae-Latn">Miθra</i></span>
- Miθra
- For the same reason, were we using {{langx}}, there should be an error message:
{{langx|ae|𐬨𐬌𐬚𐬭𐬀|Miθra}}
[[Avestan language|Avestan]]: <span lang="ae" dir="rtl">𐬨𐬌𐬚𐬭𐬀</span>, <small>romanized: </small><span title="Avestan-language romanization"><i lang="ae-Latn">Miθra</i></span>
- Avestan: 𐬨𐬌𐬚𐬭𐬀, romanized: Miθra
- These need to be fixed.
- I think that I have a solution to the {{lang|ae-Latn|Miθra}} where 'θ' is the Greek form but I'll hold off on implementing that until I've fixed the missing transliteration error messaging.
- —Trappist the monk (talk) 19:21, 26 November 2024 (UTC)
- I have tweaked the sandbox so that when the Greek theta (U+03B8) is the only non-Latin character in a string of text, it is assumed to represent the non-existent (in Unicode) Latin theta. Here are a variety of illustrations:
- For {{lang}}:
{{Lang/sandbox|ae-Latn|Miθraʰ}}
→ Miθraʰ – assume Latin theta because Latn script specified and all other characters in <text> are Latin script
{{Lang/sandbox|ae-Cyrl|Miθraʰ}}
→ [Miθraʰ] Error: {{Lang}}: Latn text/non-Latn script subtag mismatch (help) – assume Latin theta because all other characters in <text> are Latin script; script/text mismatch: Cyrl script specified but <text> is Latin script
{{Lang/sandbox|ae|Miθraʰ}}
→ Miθraʰ – assume Latin theta because all other characters in <text> are Latin script
- When theta is the only character in <text>:
{{Lang/sandbox|ae-Latn|θ}}
→ θ – assume Latin theta because Latn script specified
{{Lang/sandbox|ae-Cyrl|θ}}
→ [θ] Error: {{Lang}}: Latn text/non-Latn script subtag mismatch (help) – assume Cyrillic theta because Cyrl script specified – Greek/Cyrillic Unicode mismatch not checked
{{Lang/sandbox|ae|θ}}
→ θ – assume Greek theta because script not specified
- For {{langx}}:
{{Langx/sandbox|ae-Latn|Miθraʰ}}
→ Avestan: Miθraʰ – assume Latin theta because Latn script specified and all other characters in <text> are Latin script
{{Langx/sandbox|ae-Cyrl|Miθraʰ}}
→ [Miθraʰ] Error: {{Langx}}: Latn text/non-Latn script subtag mismatch (help) – assume Latin theta because all other characters in <text> are Latin script; script/text mismatch: Cyrl script specified but <text> is Latin script
{{Langx/sandbox|ae|Miθraʰ}}
→ Avestan: Miθraʰ – assume Latin theta because all other characters in <text> are Latin script
- When theta is the only character in <text>:
{{Langx/sandbox|ae-Latn|θ}}
→ Avestan: θ – assume Latin theta because Latn script specified
{{Langx/sandbox|ae-Cyrl|θ}}
→ [θ] Error: {{Langx}}: Latn text/non-Latn script subtag mismatch (help) – assume Cyrillic theta because Cyrl script specified – Greek/Cyrillic Unicode mismatch not checked
{{Langx/sandbox|ae|θ}}
→ Avestan: θ – assume Greek theta because script not specified
- For {{langx}} with <translit>:
- For {{transliteration}}:
{{transliteration/sandbox|ae|Miθra}}
→ Miθra – assume latin theta because <code> is a language tag
{{transliteration/sandbox|ae|θ}}
→ θ – assume latin theta because <code> is a language tag
{{transliteration/sandbox|latn|θ}}
→ [θ] Error: {{Transliteration}}: transliteration text not Latin script (pos 1: θ) (help) – assume latin theta because <code> is a script tag
{{transliteration/sandbox|cyrl|θ}}
→ [θ] Error: {{Transliteration}}: transliteration text not Latin script (pos 1: θ) (help) – assume latin theta because <code> is a script tag
{{transliteration/sandbox|ru|ш}}
→ [ш] Error: {{Transliteration}}: transliteration text not Latin script (pos 1: ш) (help) – error because <translit> not latn script
{{transliteration/sandbox|cyrl|ш}}
→ [ш] Error: {{Transliteration}}: transliteration text not Latin script (pos 1: ш) (help) – error because <translit> not latn script
- Without objection, I shall update the live module.
- —Trappist the monk (talk) 20:38, 27 November 2024 (UTC)
- Updated.
- —Trappist the monk (talk) 17:33, 28 November 2024 (UTC)
Category:Transliteration template errors $2
The article First Sino-Japanese War, in the sidebar box entitled "First Sino-Japanese War", contains a transliteration error and also appears to be assigning the nonexistent category Category:Transliteration template errors $2. I suspect that recent changes to this module or one of its subpages has caused this new, nonexistent category to appear. – Jonesey95 (talk) 18:45, 28 November 2024 (UTC)
- Fixed I think; the miscoding (on my part) also added articles to Category:Lang and lang-xx template errors $2. The article count in Category:Lang and lang-xx template errors was going down, which is an expected result of the change. On the other hand, Category:Transliteration template errors was not changing so I was beginning to wonder why. Now I know why.
- —Trappist the monk (talk) 19:35, 28 November 2024 (UTC)
- I figured it was a small typo like this. I don't go looking for these things, but I look at a lot of pages with errors in my travels, and I often stumble across new entries in error reports and categories that are caused by template and module changes. – Jonesey95 (talk) 21:03, 28 November 2024 (UTC)
Lone Common-script letter causes the non-Latin error
The lone {{transl|ar|ʾ}} (U+02BE ʾ MODIFIER LETTER RIGHT HALF RING) triggers the error as in DIN 31635. Looks like it works okay in longer words with other letters present, but not alone. – MwGamera (talk) 21:02, 28 November 2024 (UTC)
- To determine if <text> is Latin script, Module:Lang uses Module:Unicode data. U+02BE is:
{{#invoke:Unicode data|lookup|script|02BE}}
→ Zyyy
- For a Latn determination, the <text> must contain at least one Latn-script character and then may contain one or more characters from Zinh (Code for inherited script), Zyyy (Code for undetermined script), Zzzz (Code for uncoded script) scripts.
- Giving {{transliteration}} an okina (U+02BB), an apostrophe (U+0027) – or any other punctuation – will cause the same error message return:
{{transl|ar|ʻ}}
→ ʻ – okina: Zyyy ← {{#invoke:Unicode data|lookup|script|02BB}}
{{transl|ar|'}}
→ ' – apostrophe: Zyyy ← {{#invoke:Unicode data|lookup|script|0027}}
- —Trappist the monk (talk) 23:41, 28 November 2024 (UTC)
- I mean, I can see that this is happening, that's why I mentioned it being of the Common script (aliased to the ISO code Zyyy here), but the result is clearly undesirable. Maybe the template wasn't meant to be used with single letters, but if the usage is appropriate (and it seems to be to me) then the check is incorrect. I'm sure it might help catching some mistakes, but the Script property of characters used and the language tag to mark it up with are conceptually related but different things. Since I'm not sure what exactly was intended, I'm just pointing out another place where the current solution falls short and needs someone's attention. – MwGamera (talk) 13:26, 29 November 2024 (UTC)
- This needs to be fixed ASAP, as editors are responding by just removing the template. Remsense ‥ 论 04:25, 1 December 2024 (UTC)
- Is there any value in placing single punctuation in a language tag? Do screen readers read these differently? Gonnym (talk) 09:35, 1 December 2024 (UTC)
- I have no idea what screen readers do with symbols like that, but it affects font choice and other styling. I would consider it desirable to have all transliterations (or transcriptions) consistently marked up the same way no matter if they are of just a single letter (or phoneme) or of a longer word. – MwGamera (talk) 14:31, 1 December 2024 (UTC)
- I agree. I wrote the original is_Latin function (back when Module:Unicode data wasn't restricted to template editors) and I think in view of the cases of lone modifier letters, Module:Lang should use a different function that checks that there are no non-Latin characters (for instance, no Cyrillic or Greek characters), but permits Common and Inherited characters. That might not be sufficient as I think some Greek characters are used in the orthography of Latin-script languages and have no Latin-script equivalents (I can look for specific cases if there is interest), but it's an improvement. Lone Common-script characters should have the correct markup, and I should have thought of these cases when I was creating the function. — Eru·tuon 05:40, 29 December 2024 (UTC)
- In fact, I believe that when I wrote the is_Latin function, it was only being used to decide whether to italicize foreign-language text (MOS:FOREIGN). I didn't intend it to decide whether {{transl}} should display an error. — Eru·tuon 00:36, 30 December 2024 (UTC)
- Another example of the same problem with {{lang|zh-Bopo|ˊ}} in Special:Diff/1270491769 which will need to be reverted once the module is fixed. Like Remsense noted, the bogus error misleads editors into removing the template despite it being used correctly. – MwGamera (talk) 10:46, 21 January 2025 (UTC)
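The difference between the two checks discussed in this thread (at least one Latn character required, versus merely no non-Latin characters) can be shown with a small Python sketch. It is an illustration only: rough_script guesses the script from the Unicode character name and combining class, which is an assumption standing in for Module:Unicode data, and both function names are hypothetical.

```python
import unicodedata

def rough_script(ch):
    """Approximate Script value; a stand-in for a real Unicode script lookup."""
    if unicodedata.combining(ch):
        return "Zinh"  # combining marks inherit the script of their base
    name = unicodedata.name(ch, "")
    for prefix, script in (("LATIN", "Latn"), ("GREEK", "Grek"), ("CYRILLIC", "Cyrl")):
        if name.startswith(prefix):
            return script
    return "Zyyy"  # Common, for the purposes of this sketch

ALLOWED = ("Latn", "Zinh", "Zyyy")

def is_latn_strict(text):
    """Rule described above: needs at least one Latn character."""
    scripts = [rough_script(c) for c in text]
    return "Latn" in scripts and all(s in ALLOWED for s in scripts)

def is_latn_lenient(text):
    """Proposed alternative: only reject characters from other scripts."""
    return all(rough_script(c) in ALLOWED for c in text)

half_ring = "\u02be"  # MODIFIER LETTER RIGHT HALF RING
print(is_latn_strict(half_ring), is_latn_lenient(half_ring))
```

Under the strict rule the lone half ring is rejected because nothing in the string is Latn; under the lenient rule it passes, while a Cyrillic letter is still rejected by both.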
lang sandbox edits
@Gonnym: Something about this edit broke Module:Lang/sandbox so that the testcases fail.
Also: maker_error_span() should be make_error_span()?
—Trappist the monk (talk) 15:41, 30 November 2024 (UTC)
- Fixed both. make_error_span() could probably be replaced with make_error_msg() which also handles the span. I just created it to have that code be in one place while it was there. Gonnym (talk) 22:37, 30 November 2024 (UTC)
Issue with use in links
Discussion at Wikipedia:Main Page/Errors#Friday's FA has identified an issue with (some browsers') display of the title attribute for code like:
''[[École Polytechnique massacre|{{Lang|fr|École Polytechnique|italic=no}} massacre]]''
where the displayed link contains text in more than one language (arguable in this case, but the point is general).
This could be remedied by allowing suppression of the title attribute, by writing, say:
''[[École Polytechnique massacre|{{Lang|fr|École Polytechnique|italic=no|title=no}} massacre]]''
or possibly better still by simply removing the title attribute completely.
Why do we need that attribute?
Can we apply one or other solution? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:26, 30 November 2024 (UTC)
- Correct me if I'm wrong, but isn't this fixed by the nocat parameter? Not sure about the title attribute though.
- Also, @Trappist the monk, I believe the documentation has a typo when it references a hyphenated no-cat parameter as this resulted in an error message when I tried it on my end. Kilvin the Futz-y Enterovirus (talk) 07:44, 12 January 2025 (UTC)
Wrong font for lzh (Literary Chinese)
When using "lang|lzh" for Literary Chinese texts, it seems to be using a Taiwanese font?
For example, 有 is typically written as 月 which is also seen in historical texts such as in the Kangxi dictionary (inherited glyphs). But in the Taiwanese standard, they prefer to write it as ⺼ which is modern orthography (Traditional Chinese characters ≠ Literary Chinese characters). Another example would be 遣 where the radical ⻌ would be written as ⻍ according to the inherited glyphs, while the Taiwanese standard is ⻎. The template uses ⻎ instead of ⻍. How would one change it so that the template would use fonts (such as I.Ming) that are based on the inherited glyphs rather than the Taiwanese Traditional characters fonts (which are based on handwriting and their own standard)? Lachy70 (talk) 07:25, 4 December 2024 (UTC)
- This is only related to the fonts your system picks to render specific languages, and has nothing to do with Wikipedia. Remsense ‥ 论 08:04, 4 December 2024 (UTC)
- It doesn't use a Taiwanese font for me. Unfortunately browsers allow configuring default fonts only for a handful of languages (if at all) and lzh isn't among them. And system configuration might be difficult. The easiest way is to add something like [lang]:lang(lzh){font-family:"I.Ming"} to your user style either in your browser, or just for when you're logged in to Wikipedia at common.css or global.css. The template documentation already covers that at § Applying styles. But if you think of something like overriding default fonts for everyone regardless of their system configuration, then this is something that definitely should not be done (MOS:FONTFAMILY). – MwGamera (talk) 05:51, 5 December 2024 (UTC)
Update to Module:Lang/sandbox
I've modified Module:Lang/sandbox to allow {{Wikt-lang}} to use the language html attribute logic instead of having to duplicate the entire code. Testcases at Module talk:Lang/testcases have all passed so nothing seems to have been broken. Let me know if you have any comments before I update. Gonnym (talk) 09:58, 16 December 2024 (UTC)
Transliteration whitelist
@Trappist the monk I don't think having a blanket whitelist of arbitrary non-Latin script characters makes sense, and especially not one which is as random as [ʻʼʾʿΔαβγδθσφχϑьᾱῑ῾上入去平]. This is totally unsustainable, since it will constantly need to be expanded (e.g. I can already see that ъ is missing, which crops up in various Slavicist transcriptions), and it also opens the door to false-negatives, because most of these will not be acceptable characters in the vast majority of languages. This seems like an artificially-imposed maintenance burden for increasingly little gain.
What I suggest is:
- Convert to form NFD before checking, which removes the need to have precomposed characters like ᾱῑ.
- Allow all common script characters.
- Allow any characters marked with Latn in the Unicode ScriptExtensions.txt file.
- Generate a warning message via mw.message instead of a big error message, as it's overkill.
- Create a maintenance category, and add all transcriptions containing non-Latin-script characters to it by default.
- Allow language-specific exceptions, specified in the data somewhere. These should only be added for really common cases.
- Implement an override, which can be specified using a parameter. This should be used in all other cases. Suggest it in the warning, too ("If this is correct, please...").
Theknightwho (talk) 16:34, 2 January 2025 (UTC)
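To make the NFD suggestion concrete, here is a short Python sketch using the standard unicodedata module. It assumes that the whitelist's ᾱ and ῑ are the precomposed code points U+1FB1 and U+1FD1; the helper name nfd_codepoints is hypothetical.

```python
import unicodedata

def nfd_codepoints(text):
    """List the code points of text after canonical decomposition (NFD)."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", text)]

# Assumed precomposed forms of the two whitelist entries.
for ch in ("\u1fb1", "\u1fd1"):
    print(unicodedata.name(ch), "->", nfd_codepoints(ch))
```

After decomposition each letter becomes a plain Greek base letter followed by U+0304 COMBINING MACRON, so a check that runs on NFD text only has to reason about the base letters plus combining marks rather than listing every precomposed combination.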
- Yeah, I know, really crude. I did that for the avoidance of conflict.
- Hadn't thought about NFD and ScriptExtensions; I will.
- mw.message? Not sure how that would be used. My experience with mw.message() is limited to rendering error messages with $1, $2, etc. replacements. Can it be used to render messages someplace other than directly in the rendered article? Or were you perhaps thinking of mw.addWarning()?
- Maintenance categories are problematic because quite often, {{transliteration}} is used in wikilinks and {{ill}} templates:
- Emitting a category wikilink inside another wikilink breaks the rendering.
- Yep, overrides are necessary because stuff like this: {{transl|ja|Ama Kakeru ミ☆ Jōshikōsei}}.
- —Trappist the monk (talk) 19:57, 2 January 2025 (UTC)
- I strongly agree with User:Theknightwho on the problems with the whitelist. I think underlying problem with this breaking change stems from mixing two separate uses of the term 'Latn', without being clear about transliteration requirements.
- -Latn is the script portion of the IETF language tag, which is used to set the lang= attribute (RFC-4646), which affects the display style of the inline text containing element (among other things, as noted by Template:Lang#Rationale). It is important that a single transliterated string has a consistent display style across all its characters, and with other transliterations in the same document. It's a sensible requirement for an en-wiki transliteration template, where 'romanization' is a near synonym, to use a 'Latn' display style.
- Latn is also used in Unicode for the "predominant" [script value] of a single code-point: "if the predominant use of the character is in one script, but it is also used in others, then it takes the Script property value associated with that predominant use". This is a different (glyph level) classification, and doesn't directly relate to transliteration.
- It's hard to find a concrete example in the specs, so this could perhaps be explained better, but it is in fact completely reasonable to have a Greek theta character displayed side-by-side with Latn characters, all using the same Latn display style. This is what is required for Etruscan transliterations, and all the other non-Latn Unicode-script-class examples previously mentioned, including the "modifier" half circles used for Arabic, and the ъ mentioned above.
- The same string could be displayed using Greek display rules, but it would look wrong. It would also be wrong to use mixed styles in the same string. A 'Latin theta' is a semantically different symbol, which is why it has a different Unicode code point, and is also incorrect to substitute.
- The number of characters, or the Unicode script classification of any adjacent characters, are irrelevant for the display purposes if the transliteration is valid. Single character transliterations are totally valid. Ironically, the most obvious use is in transliteration tables.
- [ʻʼʾʿΔαβγδθσφχϑьᾱῑ῾上入去平] demonstrates that Unicode Script classification of individual glyphs is a different concern from a consistent transliteration display style. I have no idea what the CJK glyphs are doing there, I cannot verify any of it. It looks like nonsense. I know what the Greek symbols are, and don't even doubt those CJK are valid in some transliteration of something, but this partial list has no value AFAICT.
- The current IS-LATIN whitelist function is misnamed. It's more of a is-valid-transliteration-string/char, but as stated above is of little value, impossible to maintain, and additionally seems to be based on misunderstandings.
- Not only is it prone to false-positives, but every "true"-positive error it catches mis-characterises the problem. It's not that the string contains a non-"Unicode Script = Latn" character, rather the character possibly is not a valid transliteration symbol. At best, this is a heuristic for maintenance purposes, but even then it needs to be considerably smarter and have a better idea of what is and isn't valid transliteration. It is not appropriate for this to be raising error messages. Warnings at most, but it'd still be annoyingly noisy.
- Template:Transliteration/testcases are appallingly light and most of these basic transliteration cases that broke and seem to be a total surprise should be covered. Salpynx (talk) 21:08, 3 January 2025 (UTC)
- @Salpynx Just FYI, 上入去平 refer to the four tones of Middle Chinese, which are of fundamental importance in Chinese linguistics, so it's not that weird that they've come up. No modern variety has retained the Middle Chinese tone system (Mandarin having 4 tones is a coincidence - it's not a one-to-one conversion), and they're diaphonemic anyway (so IPA is out), so you sometimes see them given next to readings in a similar fashion to the tone numbers used in Wade-Giles or Jyutping. Theknightwho (talk) 02:16, 4 January 2025 (UTC)
- The re-use of 'is-latin' with the unjustified whitelist now breaks more things:
The Chinese character {{lang|und-Hani|上}} has 3 strokes
(modelled after an example from Template:Lang#Undetermined_language)
- The Chinese character 上 has 3 strokes
{{lang|und-Grek|σαβαθ}}
- σαβαθ
{{lang|ota-Grek|χαβα}} / {{lang|ota-Arab|هوا}} (Weather)
- χαβα / هوا (Weather)
- It's not clear what the original change was for, but it broke things without pre-discussion (AFAICT), despite the warnings on Template:Lang and Template:Transliteration. Patching it up piecemeal doesn't seem to be helping. Salpynx (talk) 05:30, 5 January 2025 (UTC)
How do I include a non-literal translation?
The langx template has a translation parameter, but it produces "lit. [text]". What should I do if I want to include a non-literal translation? TryKid [dubious – discuss] 18:21, 2 January 2025 (UTC)
- You don't have to use the translation parameter:
{{langx|es|casa}}, 'dwelling'
→ Spanish: casa, 'dwelling'
- Include punctuation and any descriptive text as you see fit. Of course, if you do sommat like that, some helpful editor is likely to come along and 'fix' your carefully crafted non-literal translation...
- —Trappist the monk (talk) 20:07, 2 January 2025 (UTC)
- I see. It's strange that the parameter automatically defaults to "literal translation". I think most of the useful translations included on Wikipedia aren't literal, but are cited to sources which make thoughtful decisions on how to translate something (e.g. Haravijaya, the reason I asked this question). Having a "lit." parameter seems like a magnet inviting original research from editors to translate something themselves.
- Any chance of changing this to something more sensible, maybe two separate lit and translation parameters? regards, TryKid [dubious – discuss] 21:23, 2 January 2025 (UTC)
- That is exactly what we do on Wiktionary, so I agree that it's a good idea. The difference is especially relevant if you're dealing with idioms: e.g. Greek ξεβράκωτος στ' αγγούρια (xevrákotos st’ angoúria, "caught with one's pants down; unprepared", lit. "pantsless among the cucumbers"). Theknightwho (talk) 02:22, 4 January 2025 (UTC)
x2 swap weirdness
For some reason, Lake Grahovo lead use of an x2-using template is not rendering, it says '<text1> is not Latin script', but this is a swapped variant, text2 is supposed to be Latin...? --Joy (talk) 21:46, 8 January 2025 (UTC)
- This is about this template:
{{lang-cnr-Cyrl-Latn|Граховско језеро|Grahovsko јezero}}
- {{lang-cnr-Cyrl-Latn}} wraps {{lang-x2}}. {{lang-x2}} requires Latn-script text in positional parameter {{{1}}}, Cyrl-script text in positional parameter {{{2}}}. I suppose for consistency, {{lang-cnr-Cyrl-Latn}} requires Cyrl-script text in positional parameter {{{1}}} and Latn-script text in positional parameter {{{2}}}. It then swaps them to the order required by {{lang-x2}} by way of |_alias_map=1:text2, 2:text1 and applies |swap=yes to render Cyrl-script text first.
- The error message is supplied by {{lang-x2}}. Because {{lang-x2}} is wrapped by {{lang-cnr-Cyrl-Latn}}, {{lang-x2}} uses {{lang-cnr-Cyrl-Latn}} in the error message. Editors looking to fix the error won't find {{lang-x2}} in the article wikitext so using the name of the wrapping template helps to locate the source of the error. {{lang-x2}} cannot know that {{lang-cnr-Cyrl-Latn}} swaps the inputs; all it sees is non-Latn-script text where Latn-script text ought to be.
- For this template, {{lang-cnr-Cyrl-Latn}} parameter {{{2}}} needs fixing.
- —Trappist the monk (talk) 00:14, 9 January 2025 (UTC)
- I don't understand. How does it work with {{lang-cnr-Cyrl-Latn|Скадарско језеро|Skadarsko jezero}} at Lake Skadar?
- Here's both of them evaluated here:
- Error: {{lang-cnr-Cyrl-Latn}}: <text1> is not Latin script (pos 11) (help)
- Montenegrin: Скадарско језеро, Skadarsko jezero
- What's the actual difference? --Joy (talk) 09:11, 9 January 2025 (UTC)
- I noticed it also complains in the inverse variant:
- Error: {{lang-cnr-Latn-Cyrl}}: <text1> is not Latin script (pos 11) (help)
- Montenegrin: Skadarsko jezero, Скадарско језеро
- Is it possible that one of those letters looks like a Latin letter but is actually Cyrillic as well? I think at some point I saw an error message saying "at position XY" but it's not showing that right now... I wonder what are the best tools we would have for editors to detect this. Something like od -a but in Mediawiki syntax? --Joy (talk) 09:13, 9 January 2025 (UTC)
- The ostensibly Latin letter was "ј" (I used an external program to find it). With that retyped using a Latin keyboard layout, it works:
- Montenegrin: Grahovsko jezero, Граховско језеро
- --Joy (talk) 09:16, 9 January 2025 (UTC)
- I've added position indicator to the error message.
- My favorite tool for this is Uniview.
- —Trappist the monk (talk) 14:22, 9 January 2025 (UTC)
- Thanks! --Joy (talk) 10:37, 11 January 2025 (UTC)
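For editors without an external tool to hand, a few lines of Python can locate a look-alike letter of the kind found in this thread. This is a hedged sketch: the function name first_non_latin is hypothetical, and it treats any letter whose Unicode name does not begin with LATIN as suspect, which is a cruder test than the one Module:Lang applies.

```python
import unicodedata

def first_non_latin(text):
    """Return (1-based position, character, Unicode name) for the first
    letter that is not named LATIN ..., or None if there is none."""
    for pos, ch in enumerate(text, start=1):
        if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN"):
            return pos, ch, unicodedata.name(ch, "UNKNOWN")
    return None

# The Lake Grahovo romanization, with U+0458 written as an escape so the
# Cyrillic letter cannot be silently retyped as a Latin j.
print(first_non_latin("Grahovsko \u0458ezero"))
print(first_non_latin("Grahovsko jezero"))
```

The first call reports position 11 and CYRILLIC SMALL LETTER JE, which agrees with the "(pos 11)" in the error message quoted above; the corrected string yields None.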
Two languages?
Does langx support two languages in a single template setting? Like this: ({{langx|bn|ক|en|K}}) = (Bengali: ক, English: K). It would be great. Mehedi Abedin 19:58, 10 January 2025 (UTC)
- It does not.
- —Trappist the monk (talk) 20:54, 10 January 2025 (UTC)
Error from Yin and Yang
I think 阳 and 阴 passing as 'Latn' here (via whitelist) is generating an error in Yin and Yang. It is already defined that {{Lang|zh-Hans|{{{s}}}}} in {{Infobox Chinese/Chinese}}. I think 'zh-Hans' in the template conflicts with 'Latn' in the whitelist here when [[wikt:]] (appears to be mixed). But I am not sure ...
GKNishimoto (talk) 08:10, 13 January 2025 (UTC)
- You may be right. I plucked those two characters from the infobox in Guanyin. There, they are used in transliterations of Suzhounese variety of Wu Chinese text in conjunction with some of the Wu tone indicator characters:
| suz = Kue<sup>阴平</sup> In<sup>阴平</sup>
| suz2 = Kue<sup>阴平</sup> Syu<sup>阴去</sup> In<sup>阴平</sup>
- Also found at Morris Chang in this thing:
|wuu=Jiann<sup>阴平去</sup> Zong<sup>阴平去</sup>mœü<sup>阳舒</sup><small> (urban [[Ningbo]])</small><br> Jia<sup>阴上</sup> Zong<sup>阴平去</sup>mœü<sup>阳舒</sup><small> (rural Ningbo)</small><br> Jjia<sup>阳舒</sup> Zong<sup>阴平去</sup>mœü<sup>阳舒</sup><small> (rural Ningbo)</small><br> Jiann<sup>阴平</sup> Jiong<sup>阴平</sup>miu<sup>阳平</sup><small> ([[Ninghai County|Ninghai]])</small><br> Jjiann<sup>阳平</sup> Jiong<sup>阴平</sup>miu<sup>阳平</sup><small> (Ninghai)</small>
- Are 阳 and 阴 valid tone indicators in Suzhounese Wu romanizations? What about 舒 which also appears in that Morris Chang romanization?
- Looking at Romanization of Wu Chinese, only one of the four tone characters that frequently appear in Wu transliterations, 去, appears in that article. For me that begs the question: Should en.wiki be using those four 'tone' characters (上, 入, 去, 平) as tone indicators when our own article on Wu romanization does not use them as such?
- —Trappist the monk (talk) 13:49, 13 January 2025 (UTC)
- Among us (speakers of at least one of the languages of the CJK triad) these characters are like "drawings/diagrams" expressing ideas/concepts. Even when we don't know how to pronounce them (due to dialect variations), we can deduce their meanings just by superficially visualizing the ideogram. We still have those cases (usually in the Chinese part) in which they are used as tone indicators (perhaps "a form of IPA"). I do think it's interesting that this type of detail can be visualized next to the romanizations, because it makes it bidirectionally easier for bilinguals (the "same" romanization can have completely different meanings, when we don't "get lost from there to here", "get lost from here to there", but we can assimilate the line of reasoning by putting the parts together). I'm not "yet" a fluent speaker of English, Japanese, and Chinese. I prefer not to commit to the working method of linguists.
- My proposal regarding the errors generated is that a routine be implemented in the module code that does not take into account the parts that are explicitly flagged (such as creating a parameter |ignore=<part that will not be checked> or a |skip_subscript_tag_check), "only" for mixed cases. It would apply only at the moment when the value of the argument is checked and compared with the script subtag. After the check, the text would pass through unchanged, exactly as it came in. I know it is not easy. I analyze, study, try to understand, and customize several of the modules that you have edited recently. Last night I saw this issue and decided to flag it here. The issue of the infobox in Yin and Yang can be easily "remedied" by removing the wikilink. It is the part related to programming (of the module) that, from my point of view, seems educationally interesting for those who have the knowledge and time available.
- Thank you for your attention (and the good work you are doing). I will meditate (read/study) a little on your message.
- GKNishimoto (talk) 19:11, 13 January 2025 (UTC)
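One way to picture the proposal above is a check that sets aside explicitly marked parts before testing the script. This is a hedged Python sketch only: Module:Lang is Lua and has no such option. The function name is_latin_ignoring_sup is hypothetical, and skipping the contents of <sup>...</sup> stands in for the proposed |ignore= parameter. It also assumes that a Unicode name starting with LATIN is an adequate test for Latin script.

```python
import re
import unicodedata

# Assumption: tone indicators are wrapped in <sup>...</sup>, as in the
# Suzhounese examples quoted earlier in this section.
SUP_TAG = re.compile(r"<sup>.*?</sup>", re.DOTALL)


def is_latin_ignoring_sup(text):
    """Return True if every letter outside <sup>...</sup> is a Latin letter."""
    checked_part = SUP_TAG.sub("", text)
    return all(
        unicodedata.name(ch, "").startswith("LATIN")
        for ch in checked_part
        if ch.isalpha()
    )


print(is_latin_ignoring_sup("Kue<sup>阴平</sup> In<sup>阴平</sup>"))
print(is_latin_ignoring_sup("Kue阴平"))
```

The input is only inspected, never rewritten, so the original text would pass through unchanged after the check, as the proposal asks.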
- Yes, they are tone indicators, as you can see at Suzhou dialect#Tones. Please stop disrupting the encyclopedia by insisting that editors come here and explain the use of non-Latin-script characters in transliterations, which were never a problem before your unilateral decision to contrive a Latin-script-checking function that was never intended for this template's purpose. How many more new page sections here and at Template talk:Transliteration do we need before this is over? You actually don't know what you're doing. Hftf (talk) 10:51, 15 January 2025 (UTC)
Template-protected edit request on 16 January 2025
This edit request to Module:Lang/data/iana languages has been answered. Set the |answered= or |ans= parameter to no to reactivate your request.
change "["zkt"] = {"Kitan"}," to "["zkt"] = {"Khitan"}," Hanzlan (talk) 15:37, 16 January 2025 (UTC)
i18n request
Can you move common parameters like |code=, |text=, |translit=, |translation=, |lit=, |label= to Module:Lang/configuration so that it's easier to internationalize the module? Right now I'm trying to implement the module on ku.wiki and having problems with the translation. Thank you! Wikihez (talk) 20:39, 16 January 2025 (UTC)
What on earth is User:Trappist the monk trying to accomplish with the IS LATIN check?
It has broken many transliteration templates again on Etruscan language, and presumably elsewhere. The previous whitelist version also had left many invalid error messages on pages for months. Now it is worse. It's proving difficult to have a constructive conversation about this, before or after the breaking changes.
Users raised specific problems with this IS LATIN check on this page and Template:Transliteration, but fundamentally the check and error message are misguided. It's not clear (from the documentation or code) how the current IS LATIN (by checking individual Unicode codepoint primary script classes) code relates to languages or transliteration, or what useful feature it is trying to add -- other than a naive attempt to validate something that is trivially more complex than that. It seems like it is there for some wiki-gnome homoglyph fixing task for Category:Transliteration_template_errors, but it's not appropriate for an Error message on the module. And, as implemented, is not even fit for a warning or bot fixer task. It requires too much human confirmation.
Spamming fake error messages on multiple pages for months is more disruptive than leaving visually identical characters in strings that are unnoticeable to humans, and correctly indexed and matched due to character folding. What is this even for? I am disappointed in how this has been handled. Salpynx (talk) 23:21, 18 January 2025 (UTC)
- Thank you User:Trappist the monk for fixing those broken templates on the one page I mentioned with your edit Special:Diff/1270316092. It looks like a typo fix in an overly complex implementation of an unwarranted feature. I am glad my comments are having an incremental effect on improving Wikipedia.
- The next page to fix is Proto-Semitic language, if we have to play the "fix specific issues character by character" game. Again, I don't know how to offer specific constructive advice on this one, because non-Latn theta is valid on the Etruscan language page, why is it not valid on Proto-Semitic language? Do we need to have the single character transliterations are fine discussion again?
- It's great that there are so many live examples where we can find exceptions to the "must be Latn" falsehood. Testing in production is clearly fine. Salpynx (talk) 22:42, 19 January 2025 (UTC)
- User:Trappist the monk I don't understand how Proto-Semitic language came right. Category:Transliteration template errors has changed from around 200 listings to 115 over the course of today, which may be a good indicator of how many of these were spurious errors, and how much noise this is generating. Next live issue as of this message: Scythian languages, with the script theta which was raised in November last year at Template_talk:Transliteration#Underdocumented_error_type. Salpynx (talk) 05:13, 20 January 2025 (UTC)
- User:Trappist the monk, your continued recent edits to Module:Lang/data/is_latn_data are what are removing and changing the count of reported transliteration "errors" without leaving a clear history. I figured this out by digging through code. Category:Transliteration template errors now lists 94 errors -- it has more than halved in the last few days with these arbitrary rule changes. There are more unorganised exceptions encoded in the incomplete list than there were "real" problem cases in the first place. This is not a good use of your time, or mine.
- Scythian languages no longer has errors. Ghadamès language still does.
- We are almost at the point where the IS LATIN check is reporting no errors on existing wiki pages (what was the point?). Now every new use of the transliteration template could potentially require new, illogically framed, exceptions to be added to the code. This is totally unmaintainable (as pointed out previously).
- You are using the technical term 'Latn' in a way that was already confusing to other editors, and have now created a totally idiosyncratic personal definition where the 'Latn' class of a symbol depends on "the language" it is transliterating. Not only is this dumb because this is not what 'Latn' is in its own context. It is doubly dumb because it misses the transliteration 101 fact that transliterations apply primarily to scripts. For every language exception currently in the exception list, we need corresponding script exceptions. There are also valid reasons where `und` and `mis` would be appropriate in some cases. It also seems to ignore that there are potentially multiple transliteration schemes for any given script.
- dand-y This has invalid characters in the transliteration. One is arguable, depending on context. This feature does nothing to help.
- [θaurχ] Error: {{Transliteration}}: transliteration text not Latin script (pos 1: θ) (help)
- [θaurχ] Error: {{Transliteration}}: transliteration text not Latin script (pos 1: θ) (help)
- [σa- ςa- sa-] Error: {{Transliteration}}: transliteration text not Latin script (pos 5: ς) (help)
- [λτλβ] Error: {{Transliteration}}: transliteration text not Latin script (pos 4: β) (help)
- [λτλβ] Error: {{Transliteration}}: transliteration text not Latin script (pos 1: λ) (help) The different errors on these two examples are confusing and unhelpful. Can you figure out what is correct?
- This continues to be disruptive, and is silently baking in an entire misguided data module into the template architecture. AFAICT all WP:CONSENSUS has been against the breakages this has caused with transliteration, and there have been no concrete examples of anything positive from this for anyone else to support.
- Can someone else help here with WP:DISPUTE? Every stupid nitpick I've made above has resulted in User:Trappist the monk adding code to resolve an individual observation on how wrong this is. I can keep generating examples, so we will have code exceptions to handle valid transliterations that only exist on Wikipedia talk pages for the purpose of criticising Module:Lang/data/is_latn_data.
- The specific error message added in Special:Diff/1260409561 and all supporting code which affects transliteration templates is:
- Invalid
- Unhelpful (demonstrably has broken more than it has "correctly" flagged)
- Has and continues to be disruptive with red error text where none should be, and in the mental overhead of figuring out what is actually correct, and why the errors are appearing at all
- Introduces a mess of overly-complex code and future maintenance burden, with no clear purpose or documentation
- Is creating new wiki-only definitions of standard externally verifiable terms like "Latn" and "transliteration" as a result of one user's Cognitive dissonance. Surely this is not what we want? Salpynx (talk) 22:14, 20 January 2025 (UTC)
- Stop sounding entitled and maybe editors will want to respond to your wall of text. There were obviously errors in ways editors used language templates. Trappist the monk tried fixing some of them. They might have had some bugs in their code, that's how code works. Stop with the wall of text and instead plainly write this:
- Error:
- Expected result:
- Gonnym (talk) 10:44, 21 January 2025 (UTC)
- Nobody *wants* any of this BS except for Trappist the monk — though actually even Trappist the monk seems to not want this either as he continues to not actually engage in any conversation and silently continues with his unilateral epicycle-drawing two months on.
- Sure, there were a handful of obviously erroneous characters (the only example I even remember at this point was a Cyrillic O with macron used in "Kо̄sen") BUT THE NUMBER OF SUCH ERRORS WAS NOT ANYWHERE NEAR TO THE SCALE that introducing disruptive red error messages for thousands of false positives across the encyclopedia was ever an appropriate solution!!! Why do we even need this code in a submodule called by millions of pages. You can just write a program to periodically search for them and fix the tiny number of issues which need to be reviewed by human hands ANYWAY. How do people not understand this??? Hftf (talk) 11:05, 21 January 2025 (UTC)
- It's not surprising to see people getting annoyed when most of the time the error appears is because of incorrect assumptions in the code rather than the inappropriate use of the template. It would be a different story if it were just a maintenance category or a message like cs1-maint or cs1-hidden-error, but this outright breaks the previously correct rendering for all readers. It looks like simply scanning dumps for inappropriate uses of lang/transl/etc would require less person-hours to get the actual misuses fixed. – MwGamera (talk) 11:25, 21 January 2025 (UTC)
- Error: Category:Transliteration_template_errors: "{{transliteration}} expects <text> to be written using Latn-script characters." does not apply generally to transliterations, as evidenced by the many transliterations which require non-Latn / non-Latin characters, so should not be reported as an error.
- Expected result:
- Error reports on templates should reliably represent actual errors, and have accurate descriptions.
- Personal code to help fix a few specific transliteration homoglyphs belongs outside Lang module code used on ~1.5M pages.
- Transliteration template code should be protected by more test cases to prevent breakages on basic examples.
- Feature changes to widely used templates should be discussed first per the Template:High use warning on Template:Transliteration
- Developers show willingness and responsiveness to address bug reports and comments on high use template and module code when valid concerns are raised, ideally by reverting breaking changes immediately until a correct approach can be worked out.
- Given User:Trappist the monk's otherwise good work on Module:Lang, I assume they can read and understand my text. I clearly struggle to be succinct when discussing nuances of Module and Template coding that is required to cover transliterations of every language and script. I've tried to be informative and accurate.
- Concessions to readability weren't made to me (or other editors) as I had to read through hidden, rapidly changing, and insufficiently documented Lua code across multiple templates and module files to process responses to my points. Salpynx (talk) 23:23, 21 January 2025 (UTC)
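The alternative raised several times in this thread, scanning wikitext offline and handing a short list to human reviewers, could look roughly like the following. This is a Python sketch under stated assumptions, not an existing tool. The function names script_of and review_candidates are hypothetical; it assumes the template is invoked as {{transliteration|...}} or {{transl|...}} with no nested templates, that the last unnamed parameter is the transliterated text, and that the first word of a character's Unicode name is a usable script label.

```python
import re
import unicodedata

# Assumption: simple, non-nested template calls only.
TRANSLIT = re.compile(
    r"\{\{\s*(?:transliteration|transl)\s*\|([^{}]*)\}\}", re.IGNORECASE
)


def script_of(ch):
    """Rough script label: the first word of the Unicode character name."""
    return unicodedata.name(ch, "UNKNOWN").split(" ")[0]


def review_candidates(wikitext):
    """Return (text, scripts) for transliterations mixing Latin with another script."""
    candidates = []
    for match in TRANSLIT.finditer(wikitext):
        positional = [p for p in match.group(1).split("|") if "=" not in p]
        text = positional[-1] if positional else ""
        scripts = {script_of(ch) for ch in text if ch.isalpha()}
        if "LATIN" in scripts and len(scripts) > 1:
            candidates.append((text, sorted(scripts)))
    return candidates


# Cyrillic o (U+043E) plus combining macron inside an otherwise Latin word
sample = "{{transliteration|ja|K\u043e\u0304sen}} and {{transl|ja|K\u014dsen}}"
print(review_candidates(sample))
```

A scan like this also lists legitimate mixed-script transliterations such as θaurχ, so its output is a review queue for a human, not an error report shown to readers.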
quotation marks and apostrophes
When writing prose, and there's a need to nest quotations, we have specialized templates (e.g. {{single+double}}) we can use to distinguish apostrophes and quotation marks. For example,
Anthony heard on the subway the other day, "She actually said, 'Oh my god, Beckie, look at her butt.'"
When using a citation template's |quote= parameter, if the text begins or ends with an apostrophe, the citation template also automatically applies that distinguishing spacing, too. For example,
Mix-a-Lot, Sir. "Baby Got Back". Mack Daddy.
'It is so big. She looks like one of those rap guys' girlfriends', she continued.
However, when using this template, combining the translation and nested quoting, we get both punctuations abutting each other. For example,
French: Elle a exprimé sa confusion en précisant : « Qui comprend ces gars du rap ? », lit. 'She expressed her confusion, elaborating, "Who understands those rap guys?"'
Can such automatic quotation mark-apostrophe-spacing be implemented in this template, too? — Fourthords | =Λ= | 14:52, 22 January 2025 (UTC)
Non-English language
I'm trying to implement this Module:lang/documentor tool, but it keeps showing me message Lua error in Module:Lang/documentor_tool at line 330: bad argument #1 to 'gsub' (string expected, got nil). What to do? Enkhsaihan2005 (talk) 13:24, 28 January 2025 (UTC)
- Where are you seeing this?
- —Trappist the monk (talk) 14:16, 28 January 2025 (UTC)
- For example, https://mn.wikipedia.org/wiki/%D0%90%D0%BD%D0%B3%D0%B8%D0%BB%D0%B0%D0%BB:%D0%90%D0%B1%D1%85%D0%B0%D0%B7-%D1%85%D1%8D%D0%BB_%D0%B4%D1%8D%D1%8D%D1%80_%D1%82%D0%B5%D0%BA%D1%81%D1%82_%D0%B0%D0%B3%D1%83%D1%83%D0%BB%D1%81%D0%B0%D0%BD_%D3%A9%D0%B3%D2%AF%D2%AF%D0%BB%D1%8D%D0%BB Enkhsaihan2005 (talk) 14:22, 28 January 2025 (UTC)
- If I am to believe Google translator, Ангилал:Абхаз-хэл дээр текст агуулсан өгүүлэл has <language name> at the beginning of the category name. If that is the case then the pattern split_title at line 345 is not correct. That pattern expects <text><language name>-<text> ('Articles containing German-language text' at en.wiki for example). You might change the pattern to %%s([^,]*). That should work for LANGUAGE_TEXT but may not work for your as-yet undefined LANGUAGES_COLLECTIVE_TEXT and MONGOLIAN.
- If you change the pattern so that it has only one capture, line 360 isn't needed for LANGUAGE_TEXT. If it is needed for LANGUAGES_COLLECTIVE_TEXT or MONGOLIAN, you might change it to: page_title_modified = page_title_modified:gsub(part2 or '', "")
- —Trappist the monk (talk) 15:46, 28 January 2025 (UTC)
- Thanks, now it has lua error at line 390 Enkhsaihan2005 (talk) 15:56, 28 January 2025 (UTC)
- Apparently there is a bug at line 390. "LANG" should be "TEMPLATE". I'll fix that here. But, you shouldn't be landing at line 390. You are because at line 725 in Module:Lang, you have whitespace after the colon in '[[Ангилал: '.
- —Trappist the monk (talk) 17:12, 28 January 2025 (UTC)
- What should I do? Enkhsaihan2005 (talk) 17:16, 28 January 2025 (UTC)
- Did I not just say what needs to be done? In Module:Lang/documentor_tool at line 390 change: "LANG" to "TEMPLATE". In Module:Lang at line 725 change '[[Ангилал: ' to '[[Ангилал:'
- —Trappist the monk (talk) 17:20, 28 January 2025 (UTC)
- Ok, it works now. Thank you very much Enkhsaihan2005 (talk) 17:22, 28 January 2025 (UTC)
ISO [und] means the language of a text or recording is unknown, not that there is no language.
For example, a text that hasn't been deciphered, or a recording where the label has been lost. It doesn't mean a glyph without connection to a particular language.
We give the example 'The Chinese character {lang|und-Hani|字} has 6 strokes.' But this is incorrect - there is no language here that remains to be identified, so the 'lang' template is inappropriate. The second model is the right approach - 'The Chinese character {script|Hani|字} has 6 strokes.' — kwami (talk) 01:49, 29 January 2025 (UTC)
- I am not exactly sure why you assert this narrower sense: und in ISO 639 indicates "a language or languages [that] must be indicated but the language cannot be identified"—it cannot be identified because it's indeterminate. Otherwise, text would simply be coded as English, which is not correct. That's what I understand to be the case, anyway. Remsense ‥ 论 03:47, 29 January 2025 (UTC)
- But it's not a language. Without being a language, it can't be an unidentified language. Otherwise we'd need to tag a photo of a dog as 'und', because we can't say what language it is. — kwami (talk) 08:39, 29 January 2025 (UTC)
- zxx exists for dogs. If you're talking just about a graphical shape of a character, I'd say it's applicable as well. It's a matter of a convention and there's no universal one as far as I know. The inline use of {{Script}} should be always discouraged. – MwGamera (talk) 12:07, 29 January 2025 (UTC)
- I didn't mean a recording of a dog, but of a photo of a dog, as it would be graphical like a letter. But you think we should use [zxx] 'no linguistic content' for graphemes? That makes sense to me. — kwami (talk) 21:23, 29 January 2025 (UTC)
- But there is linguistic content, right? I suppose it's possible I've conflated undetermined and indeterminate. Remsense ‥ 论 18:01, 30 January 2025 (UTC)
- Out of context, 字 or the letter 'A' have no language content. There's no associated language that would warrant an ISO language code. They might get an ISO script code, but not a language code.
- If used for a particular language, they should be labeled with the ISO code of that language.
- If we know it's a language, but we can't tell which one -- and the source itself wasn't able to ID the language -- then it would be code [und].
- If we can determine the language, and just haven't looked up its ISO code yet, then we should leave it blank. We might tag it for 'needs ISO code'. That's exactly what the ISO-xx template was - 'put the ISO code where the xx is when we're ready for it.'
- We could create a private-use code like [qqq] for 'has an ISO code but we haven't gotten around to entering it yet.' — kwami (talk) 19:04, 30 January 2025 (UTC)
- What about contexts where a character is being discussed in a manner expressly applicable to any one, or any combination thereof, of a specific set of languages? This is very often the case with logographs. They cannot be said to have "no language content", because they have e.g. the distinct meaning 字 → 'character'. Remsense ‥ 论 19:11, 30 January 2025 (UTC)
- That's also very often the case with alphabetic letters, and is what the ISO script codes are for. The language codes are for languages. 字 [or A] has no language code, only the language it transcribes does. If it doesn't transcribe a language, it would be inaccurate to claim that it does.
- Evidently we've decided not to tag scripts. I haven't seen the discussion, but if that's so, then we don't tag scripts. So 字 or 'A' wouldn't get any tag. — kwami (talk) 19:17, 30 January 2025 (UTC)
- I don't want to lead the discussion in circles or create undue headaches, but I suppose that to me would still seem to explicate—not merely imply—that passages left untagged as such are in English, when they are specifically being described as text in one or more languages other than English. Remsense ‥ 论 19:25, 30 January 2025 (UTC)
- If it were a passage, sure. But the letter 'A' isn't a passage, it's just a letter. In a mathematics text, we wouldn't tag the digit '3' for a language either. Sure, someone might be counting in English, but '3 + 2 = 5' is divorced from any particular language. Same with 'the letter A is the first letter of the alphabet' or 'the IPA uses the letter ʔ for the glottal stop.' — kwami (talk) 19:57, 30 January 2025 (UTC)
- If there is a language involved, then it obviously should be tagged with that language. I'd say it applies to any meaningful unit, even if it's a single-letter morpheme that cannot appear standalone. E.g. I would want -s to be tagged as English when discussing English plurals (obviously a bad example for English Wikipedia where the body of text is already in English, but you get the idea). However, I would like to remind you that tagging serves a mundane purpose rather than some imaginary goal of accuracy in representation: font selection, screen readers, etc. If multiple languages use an identical graphical form and either would work, I think it's ok to just tag one. I think zxx is a good way to make it explicit it's a mention of a letter as a letter rather than a use of it to write a single-letter word. But which sounds better: The Chinese character [tsɿ˥˧] has six strokes or The Chinese character [dʑi] has six strokes or The Chinese character Chinese letter has six strokes? The und might be a stretch, zxx is reasonable to me, but I would see no fault in using simple zh either. It's a part of formatting. If it needs to be communicated to the reader that it applies to multiple languages, it needs to be done with actual words rather than just tagging. – MwGamera (talk) 00:56, 31 January 2025 (UTC)
- User:Remsense's first response is broadly correct. The wiki templates are effectively defining {lang|und-Hani|字} and {script|Hani|字} as identical because they have the same effect on generated markup; a consequence of the lang module handling both lang and script output. The ISO und code appears unfortunately, but deliberately, flexible on this so that "undetermined and indeterminate" aren't distinguishable. The definition is in terms of the practical: "must be indicated but ..."
- I do agree with the original post that {script|Hani|字} is semantically more accurate, so where it is important to distinguish say, an artefact with an unknown-language 字 marking (use lang=und-Hani) vs. the character 字 applicable to any and all language, use Script directly... but the rendered output is likely the same. Any further objections should be in terms of expected display rendering or screen-reader handling. The original example is a specific example of using the lang template for its script-only rendering, so is, I believe, correct for that purpose.
- Also, I disagree that zxx is better here than und. zxx applies more to asemic writing or glossolalia etc. {lang|zxx-Hani|字} feels wrong, because if 字 is really non-linguistic, it's also not Hani. It's just a different symbol with no meaning that happens to look like an existing Hani symbol. Salpynx (talk) 02:10, 31 January 2025 (UTC)
- You're essentially arguing that it cannot be fully detached from the language. I don't think I agree, but I do agree it isn't really wrong to use und. Just make sure we all stick to one. But claiming that {{lang|und-Hani|字}} and {{script|Hani|字}} are identical is strictly incorrect. The markup is completely different. The latter is a display kludge that exists solely to change fonts and it leaves it marked up in the language of the surrounding text. (It's a no-op in case of Hani. It's also less useful than it sounds. Most of the time the script can be inferred from the characters used. But you can't differentiate between different standards of Hant or Hani, for example, so it can't help with something like § Wrong font for lzh (Literary Chinese) in its current form. Compare with {{vi-nom}} which is like some of the {{script}}'s subtemplates that do both things.) It's completely orthogonal to {{lang}} and most, if not all, uses of {{script}} I've seen go against my understanding of MOS (i.e. the editor just liked the template's Hebrew font better than their system's default or they wrongly thought Persian must always use Nastaliq). It might be reasonable to have a template specifically for marking up graphemes as graphemes and it would probably need to do no more than what {{script}} already does. It's not obvious to me if it needs to be additionally delimited as a different language/non-language, but I see advantages of doing that as I showed in the example with the pronunciation of the Chinese character. {{Script}} doesn't do it. – MwGamera (talk) 14:32, 31 January 2025 (UTC)
- (edit conflict)
- "The wiki templates are effectively defining {lang|und-Hani|字} and {script|Hani|字} as identical because they have the same effect on generated markup"
- Umm, not true. At the top of every rendered en.wiki page is this: <html class="client-nojs" lang="en" dir="ltr">
- which sets the page default language to English. {{lang}} overrides that declaration, in this case for und (undetermined language):
  - {{lang|und-Hani|字}} → <span title="undetermined-language text"><span lang="und-Hani">字</span></span> → 字
- {{script}} does not override the default language so the wrapped character is treated as English by browsers and screen readers:
  - {{script|Hani|字}} → '"`UNIQ--templatestyles-00000124-QINU`"'<span class="Unicode">字</span> → 字
- "a [consequence] of the lang module handling both lang and script output."
- Also not true. Module:Lang has nothing to do with {{script}}.
- —Trappist the monk (talk) 16:03, 31 January 2025 (UTC)
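The difference described above can be checked mechanically by working out which lang value each piece of text inherits. This is a small Python sketch using the standard library's html.parser; the class name LangTracker and the function effective_langs are hypothetical, and the markup is a simplified version of the two expansions quoted above, with the templatestyles strip marker left out.

```python
from html.parser import HTMLParser


class LangTracker(HTMLParser):
    """Record the inherited lang attribute in effect for each run of text."""

    def __init__(self):
        super().__init__()
        self.stack = [None]  # language in effect at each nesting level
        self.runs = []       # (effective language, text) pairs

    def handle_starttag(self, tag, attrs):
        # An element without its own lang attribute inherits from its parent.
        self.stack.append(dict(attrs).get("lang") or self.stack[-1])

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.runs.append((self.stack[-1], data))


def effective_langs(html):
    tracker = LangTracker()
    tracker.feed(html)
    tracker.close()
    return tracker.runs


page = (
    '<html class="client-nojs" lang="en" dir="ltr">'
    '<span title="undetermined-language text"><span lang="und-Hani">字</span></span> '
    '<span class="Unicode">字</span>'
    "</html>"
)
print(effective_langs(page))
```

Under these assumptions the first 字 resolves to und-Hani and the second to en, the page default, which is the distinction being drawn between {{lang}} and {{script}}.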
- Thanks for correcting me on the implementation details. I remain confused why {{lang}}'s use of script seems different from {{script}}'s. {{lang}} seems more formal and fits better with semantic markup, and for display and accessibility. It'd make sense for {{script}} and {{lang}} to handle "script" the same way, since I believe it's the same concept.
- I still think my description of the intent and definition was correct for the old version, which changed with Special:Diff/1272522517. The previous version was coherent and practical, but so is the new version. The use of the {{lang}} template is cleaner now; so I guess that's good. Unfortunately {{script}} seems considerably less sophisticated, and it remains unclear how best to mark-up text for both language and script.
- I'd expect {{script|Hani|字}} → '"`UNIQ--templatestyles-0000012A-QINU`"'<span class="Unicode">字</span> to include Hani as important semantic info... and I think the way to do it would be via und-Hani for anything that could interpret it. You are right here, but {{script}} seems lacking. Salpynx (talk) 07:06, 1 February 2025 (UTC)
- If we are to believe this search, there are about 140 {{script}} sub-templates. Of those, there are 18 that explicitly set the html lang= attribute or allow users to set it via a parameter. {{script}} is a styling template so those 18 sub-templates are in violation of Template:Script § Usage and should probably be fixed so that they comply.
- —Trappist the monk (talk) 13:14, 29 January 2025 (UTC)
- No. The accidental wording of template docs shouldn't override valid use cases of other wiki-editors. For Template:Script/Arabic and Template:Script/Cuneiform, language is relevant in determining the correct display style. Salpynx (talk) 02:29, 31 January 2025 (UTC)
- Accidental? Really? Explain how this unchallenged edit by Editor Ineffablebookkeeper was somehow 'accidental'.
- The rendering of {{Script/Arabic}}, when a font specified in Template:Script/styles arabic.css exists on the local machine, is not modified by the |lang= attribute:
  - {{Script/Arabic|أَكْتُبُ}} → <span class="script-arabic script-Arab" dir="rtl" style="font-size: 125%; " >أَكْتُبُ</span> → أَكْتُبُ
  - {{Script/Arabic|أَكْتُبُ|lang=ar}} → <span class="script-arabic script-Arab" lang="ar" dir="rtl" style="font-size: 125%; " >أَكْتُبُ</span> → أَكْتُبُ
- It is true that {{Script/Cuneiform}} does use the first positional parameter to select certain specific fonts:
  - 8 → Santakku
  - 11 → Assurbanipal
  - 12 → Esagil
  - ana → UllikummiA – what is ana supposed to really be? ana is the ISO 639-3 tag for Andaqui language
  - or with 4–7, 9, 10, elx (Elamite), sux (Sumerian), and xeb (Eblan) to inhibit the use of the template's default fonts (why?)
- And, for the sake of completeness, the other sixteen {{script}} subtemplates that can or do set the lang= attribute:
  - these use {{{1}}} as the language in a lang="{{{1}}}" attribute when {{{2}}} is present:
    - {{Script/Brahmi}} – 44 transclusions
    - {{Script/Glagolitic}} – 32 transclusions
    - {{Script/Kharosthi}} – 3 transclusions
    - {{Script/Slavonic}} – 185 transclusions
  - these unconditionally set the lang= attribute:
    - {{Script/Balinese}} – 53 transclusions – lang="ban"
    - {{Script/Cham}} – 13 transclusions – lang="cjm"
    - {{Script/Classical and Medieval Latin}} – 59 transclusions – lang="und-Latn"
    - {{Script/Limbu}} – 17 transclusions – lang="lif"
    - {{Script/Runic}} – 175 transclusions – lang="gem-Runr"
    - {{Script/Sundanese}} – 39 transclusions – lang="su"
  - this is similar to {{Script/Arabic}} but uses Template:Script/alkalami-regular.css:
    - {{Script/Hausawi}} – 8 transclusions – optional lang="{{{lang}}}"
  - for these, the language tag has a default that may be overridden but the script subtag is unconditionally set:
    - {{Script/Sylheti}} – 16 transclusions – defaults to lang="syl-syloti" (script syloti is forced; invalid script is invalid; should be Sylo?)
    - {{Script/Syriac}} – 2 transclusions – defaults to lang="syc-Syrc" (script Syrc is forced)
    - {{Script/Eastern Syriac}} – 25 transclusions – defaults to lang="syc-Syrn" (script Syrn is forced)
    - {{Script/Estrangelo Syriac}} – 47 transclusions – defaults to lang="syc-Syre" (script Syre is forced)
    - {{Script/Western Syriac}} – 25 transclusions – defaults to lang="syc-Syrj" (script Syrj is forced)
- I haven't yet tested them all, but of those that I have tested, none rendered differently when given a language tag. Your operating system may produce different results.
- —Trappist the monk (talk) 17:03, 31 January 2025 (UTC)
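The remark in the list above that syloti is not a valid script subtag can be checked mechanically. What follows is only a minimal Python sketch, assuming that a BCP 47 script subtag is exactly four letters and, as a simplification, that any second subtag is meant to be a script. The helper name has_malformed_script is invented for this illustration and does not come from any template or module discussed here.

```python
import re

# A BCP 47 script subtag is exactly four ASCII letters, e.g. "Latn" or "Syrc".
SCRIPT_SHAPE = re.compile(r"[A-Za-z]{4}")


def has_malformed_script(tag: str) -> bool:
    """True when a second subtag exists but lacks script-subtag shape.

    Simplification (an assumption of this sketch): every second subtag is
    treated as an intended script subtag. That fits the tags listed above,
    but not language tags in general ("en-GB" carries a region there).
    """
    parts = tag.split("-")
    return len(parts) > 1 and SCRIPT_SHAPE.fullmatch(parts[1]) is None


# Tags copied from the list of sub-templates above.
tags = ["ban", "cjm", "und-Latn", "lif", "gem-Runr", "su",
        "syl-syloti", "syc-Syrc", "syc-Syrn", "syc-Syre", "syc-Syrj"]

flagged = [tag for tag in tags if has_malformed_script(tag)]
print(flagged)
```

A shape check like this only says whether a subtag could be a script code. It does not confirm that the code is actually registered, so Sylo would still need to be looked up separately.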
- I'm not entirely sure what's being argued here, but for the avoidance of doubt, that edit was me trying very hard to unify the whys and wherefores of a number of various language templates, and how they should be used. I'm not sure it was my best work, to be honest.
- It's also worth noting that I did this before the more recent changes to change 'transl' to 'transliteration', and all of the separate instances of lang-[insert ISO code here] templates to just be lang-x|[ISO code]. They have been unified a little further this way, but it's still not perfect.—Ineffablebookkeeper (talk) ({{ping}} me!) 20:52, 31 January 2025 (UTC)
- For my part, I think that the 18 {{script}} sub-templates should not be setting the lang= html attribute; that is the duty of {{lang}} etc. Fixing them so that they do not set the lang= attribute will bring them into line with Template:Script § Usage and in line with the other 120-ish {{script}} sub-templates.
- —Trappist the monk (talk) 00:53, 1 February 2025 (UTC)
- My point was that a sub-class of a class can extend the parent, so it's not immediately wrong that a subclass of {{script}} might need language. It seems reasonable that {{Script/Arabic}} should take language into consideration (I'm surprised that it doesn't), and {{Script/Cuneiform}} does. The most common use case, I imagine, is that an editor wants to set the script style of text in a particular script and language. Generally both will be known and should be marked up. Then readers can choose how best to display, given their local fonts. {{Template:Script/Hebrew}} is clear about "Hebrew script" not being "Hebrew text"... but then it's not clear how or why you would ever use that template. Perhaps nesting the {{lang}} within the {{script}} for lang=he or lang=ji (Yiddish) is the correct way? It'd make sense to do it in one template to give the desired CSS and correct markup for semantic and accessibility purposes. Currently the two things seem separate. I understand why you'd want the CSS, but not why you'd want it without the other correct, more useful, markup.
- The list above is clearly inconsistent. Many cases look like they will happen to produce correctly marked up HTML despite any logical inconsistency conflating script and language. Simultaneously setting language and script seems to be a common use-case. The degree of interdependence of the two can vary depending on language and script. There should be some clear way to do it properly and efficiently. I understand the unification efforts are ongoing, and that's what we are discussing. Salpynx (talk) 08:01, 1 February 2025 (UTC)
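To make the nesting idea concrete, here is a rough Python sketch of what a consumer of the HTML would see if a language span sat inside a script-styling span. The nested markup is hypothetical: it reuses the class names from the {{Script/Arabic}} output quoted earlier in this thread, but no template is claimed to emit exactly this. LangTracker and effective_markup are names invented for the illustration, and void elements such as br are not handled.

```python
from html.parser import HTMLParser


class LangTracker(HTMLParser):
    """Record each text node with the lang and classes in effect around it."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one (lang, classes) entry per open element
        self.found = []   # (text, effective_lang, classes) tuples

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        parent_lang, parent_classes = self.stack[-1] if self.stack else (None, ())
        # An element without its own lang takes the nearest ancestor's value.
        lang = attrs.get("lang", parent_lang)
        classes = parent_classes + tuple((attrs.get("class") or "").split())
        self.stack.append((lang, classes))

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            lang, classes = self.stack[-1] if self.stack else (None, ())
            self.found.append((data.strip(), lang, classes))


def effective_markup(html):
    tracker = LangTracker()
    tracker.feed(html)
    tracker.close()
    return tracker.found


# Hypothetical nesting: script-styling span outside, language span inside.
nested = ('<span class="script-arabic script-Arab" dir="rtl">'
          '<span lang="ar">أَكْتُبُ</span></span>')
result = effective_markup(nested)
print(result)
```

Under these assumptions the text node ends up carrying both the script classes and a language, which is the combined outcome asked for above. Whether that should come from nesting two templates or from a single template is the open question.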
- Oh. Yeah, I'd have to argue that it does make sense to make {{script}} a subset of {{lang}}. I can't think of any instance where you'd be writing Arabic or Hebrew text but not need a language template of some sort. And 120-ish sub-templates for {{script}} seems a little silly to have when it could be a function of a larger template. —Ineffablebookkeeper (talk) ({{ping}} me!) 09:51, 1 February 2025 (UTC)