If you are expecting {{ISO 639 name/sandbox}} to return the same values for non-ISO 639 'codes' as some of the existing templates do, you shall be disappointed. IETF language tags are not ISO 639 codes so the ~/sandbox returns an error message. The same is true when given a language name because again, a language name is not an ISO 639 code.
The module sandbox has a new function iso_639() that attempts to return either the code or the language name depending on what it is given (still working on this so these may or may not be working when you read this):
No list. And, it isn't immediately obvious how to make such a list. Both of intitle:"ISO 639 name" insource:/#[Rr][Ee][Dd][Ii][Rr][Ee][Cc][Tt]/ and insource:/#[Rr][Ee][Dd][Ii][Rr][Ee][Cc][Tt]/ prefix:"ISO 639 name" searches return nothing.
Special:PrefixIndex/Template:ISO 639 name (alas) won't show redirects only (they can be hidden ...). I suppose one might take that list and make a lua module. This appears to be a start:
function p.redirect()
	local data = {
		'ISO 639 name Arabic',
		'ISO 639 name AR',
		'ISO 639 name ar',
		'ISO 639 name ja-Latn',
		}
	local redirects = {};
	local name;
	for _, v in ipairs (data) do
		name = 'Template:' .. v
		if mw.title.new (name).isRedirect then
			table.insert (redirects, name);
		end
	end
	return mw.dumpObject (redirects)
end
You reverted yourself with the edit summary nevermind, that works. Does it? Don't you want the names of the templates that are redirects? Why would you want the names of the articles that use those redirects? Article names seem to be what Category:Pages using ISO 639 name redirect templates is collecting.
I needed the redirect templates to get the pages so I can replace the parameter used in these articles to a correct ISO 639 one, so this method also works. The only problem is that I have to manually go over ~1200 articles ({{Cleanup translation}}, {{Rough translation}} and {{ISO 639 name}} transclusions) and do with AWB an empty save which takes a lot of time since it stops me every few seconds. Do you have a bot that can do this faster? --Gonnym (talk) 00:16, 11 August 2020 (UTC)
If you can give me a list of articles, I have a null-edit bot task. Of course, you can just let the job queue take its course because there is no hurry.
I know there is a bug that sometimes makes the queue take forever to work. It's been 7 hours and it still hasn't started. Anyways, if you have time, I created a list of pages here. -- Preceding unsigned comment added by Gonnym (talk • contribs) 07:50, 11 August 2020 (UTC)
Done. These didn't get done because they are protected:
@Gonnym: I have updated {{ISO 639 name}} to use Module:ISO 639 name. The update included function name changes. I think that I have found and fixed all of the templates that were temporarily broken because of the function name changes. I have also updated Template:ISO 639 name/doc and {{ISO 639 name/doc/row}}. Your redirect-detection code remains in {{ISO 639 name}}; it is the reason that the Western Frisian category link is broken on the ~/doc page.
So where to now? Figuring out where all of those 1100-ish templates in Category:ISO 639 name from code templates are transcluded looks like it will be painful. We might want to add some sort of category mechanism to those templates that would categorize their use: Category:ISO 639 name templates with valid codes for proper ISO 639-1, -2, -3, or -5 codes and Category:ISO 639 name templates with invalid codes for all others. But that isn't really easy. We could replace the plain-text language name in the templates with a call to {{ISO 639 name|<code-from-template-name>}} – relatively easy to accomplish with AWB. That gives us a test of all of the 'codes' being used; Module:ISO 639 name will categorize the 'errors' (IETF-like language tags, proper names, etc) so those can be fixed where they appear in article space (and perhaps in other namespaces). Once that is done, Special:WhatLinksHere/Template:ISO 639 name will list where the template is transcluded so instances of {{ISO 639 name <code>}} can be replaced with direct calls to {{ISO 639 name|<code>}}.
All of this is, of course, predicated on a successful outcome at WP:TfD. Have I over-thought this? Under-thought this?
Great job! I think you exactly-thought this. I'll be nominating these templates today as I got another editor to help with his bot for tagging all these. Then we can do what you suggested with changing the template plain text to catch errors and see where they are used, and once we're done with that we can convert the template to use the module directly. I believe the tracking category I added has done what it needed doing as I fixed the calls I could find, so I can remove it (unless I'm mistaken?). --Gonnym (talk) 09:14, 14 August 2020 (UTC)
Yesterday I created an AWB script that gets the language code from the template name, writes the appropriate {{ISO 639 name|<code>}} template, and comments out the plain-text language name. I need to rework it a bit to account for the tagging that is being added.
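The rewrite that the AWB script performs can be sketched in Python. This is only an illustration, not the actual AWB script: the function name `rewrite_template_page` is hypothetical, and it assumes that the whole body of each per-code template page is the plain-text language name (no TfD tag, no noinclude section).

```python
import re

# Assumption: per-code templates are titled "Template:ISO 639 name <code>".
TITLE_RE = re.compile(r"^Template:ISO 639 name (?P<code>\S+)$")


def rewrite_template_page(title: str, body: str) -> str:
    """Return new wikitext for one {{ISO 639 name <code>}} template page."""
    match = TITLE_RE.match(title)
    if match is None:
        return body  # not a per-code template; leave it untouched
    code = match.group("code")
    plain_name = body.strip()
    # Call the module-backed template and keep the old text as a comment.
    return "{{ISO 639 name|" + code + "}}<!-- " + plain_name + " -->"


print(rewrite_template_page("Template:ISO 639 name ar", "Arabic\n"))
```

Handling the TfD tagging mentioned above would need an extra step that this sketch deliberately leaves out.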
Yeah, languages supported by MediaWiki seems the better choice for {{proofreader needed}}. That template used to use the {{#ifexist:}} parser function to look for one of the now WP:TfD'd {{ISO 639 name <code>}} templates. My 25 October 2018 edit switched it to use Module:ISO 639 name which has a replacement function for {{#ifexist:}} (then: iso_639_name_exists(), now: iso_639_code_exists()). There isn't an equivalent facility for the #language: magic word. I can think of a couple of ways around that limitation:
As long as x in {{ISO 639 name x}} is a valid ISO 639-1, -2, -3, -5 code (not an IETF language tag) then the fix is: {{ISO 639 name x}} → {{ISO 639 name|x}}
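As a rough illustration of that fix, here is a Python find-and-replace sketch. The helper name `fix_calls` is hypothetical, and the pattern is only a syntactic filter (two or three lowercase letters); whether a code is actually assigned in ISO 639-1, -2, -3, or -5 still has to be checked against the module's data.

```python
import re

# Syntactic check only: two or three lowercase ASCII letters before "}}".
# IETF-like tags (ja-Latn) and names (Arabic) do not match and are left alone.
CALL_RE = re.compile(r"\{\{\s*ISO 639 name ([a-z]{2,3})\s*\}\}")


def fix_calls(wikitext: str) -> str:
    """Rewrite {{ISO 639 name x}} as {{ISO 639 name|x}} for code-like x."""
    return CALL_RE.sub(r"{{ISO 639 name|\1}}", wikitext)


print(fix_calls("{{ISO 639 name fr}} and {{ISO 639 name ja-Latn}}"))
```

Calls that are left unchanged are the ones that need a human decision (IETF tags, language names, wrong case).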
The likely fix for {{lang2iso}} is {{#invoke:ISO 639 name|iso_639_name_to_code|<name>}}. But, {{lang2iso}} has about 6600 transclusions so some testing is warranted before we just replace all with the module invoke. I'll do that.
Apparently there are only two Template:ISO 639 code ... templates (search) so it isn't really clear to me why that bit of code is still there. ~/inner core evaluates {{{2}}} for length and when 2, uses {{iso639-1}} to create a link to Library of Congress or when 3 uses the ISO639-3 interwiki mapping to link to sil.org. So, all of that being said, I suspect that it might be best to add a function to Module:ISO 639 name: iso_639_name_exists(). If we do that then, in ~/core we write:
Yeah, TfD those 2 lone templates is a good idea. What should they be replaced with? {{#invoke:ISO 639 name|iso_639_name_to_code|{{{1}}}}} or {{Lang2iso}}? --Gonnym (talk) 16:46, 25 August 2020 (UTC)
Nothing overly surprising here. I'm inclined to update the template to use the module and fix what needs fixing. Of the templates that use {{lang2iso}}, these appear to be the mode used:
comparison {{lang2iso}} against {{#invoke:ISO 639 name|iso_639_name_to_code|<name>}}
{| class="wikitable"
! Expected? !! lang2iso !! iso_639_name_to_code !! Language name !! comment
|-
| als || gsw || gsw || Tosk Albanian || lang2iso is wrong; alemannic is Alemannic German ISO 639-2, -3 code gsw
|-
| diq || zza || zza || Dimli || dimli is an exact match for one of the names associated with ISO 639-2 zza; ISO 639-2, -3 name for diq is 'Dimli (individual language)'
|-
| nb || nb || nb || Norwegian Bokmål || bug: override namelists are searched first; norwegian bokmål overrides ISO 639-2 primary 'Bokmål, Norwegian'
|-
| ne || nep || nep || Nepali || nepali is an exact match for ISO 639-2 nep; ISO 639-1 ne is 'Nepali (macrolanguage)'
|-
| st || st || st || Sotho || ISO 639-2 override list to override 'Sotho, Southern' has code sot; ISO 639-1 st is 'Southern Sotho' for consistency should be overridden to 'Sotho'; ISO 639-3 'Southern Sotho' is overridden to 'Sotho'
|-
| sw || swa || swa || Swahili || exact match for ISO 639-2 swa; ISO 639-1 (sw), -3 (swa) name is 'Swahili (macrolanguage)'
|-
| ms || msa || msa || Malay || exact match for ISO 639-2 may(B) and msa(T); ISO 639-1, -3 name is 'malay (macrolanguage)'
|-
| or || ory || ory || Oriya || exact match for ISO 639-3 ory; ISO 639-1 name is 'odia (macrolanguage)'
|-
| or || ori || ori || Oriya || exact match for ISO 639-2 ori; ISO 639-1 or and -3 ori name is 'oriya (macrolanguage)'; ISO 639-3 ory name is 'oriya (individual language)'
|-
| roh || rm || rm || Romansh || exact match for ISO 639-1 rm and ISO 639-2, 3 roh
|-
| ruq || ro || ro || Megleno Romanian || lang2iso is wrong; there are two #switch cases for romanian: <nowiki>| romanian=ro</nowiki> precedes <nowiki>| romanian=ruq</nowiki>
|-
| sdc || sc || sc || Sassarese Sardinian || lang2iso is wrong; there are two #switch cases for sardinian: <nowiki>| sardinian=sc</nowiki> precedes <nowiki>| sardinian=sdc</nowiki>
|}
Why are the above not producing the same result? And assuming both are correct, does it matter for the output? Template doc says it returns a two letter language code, but some of its return values are three letters, so I'm not sure what is going on there. Maybe it prefers to return a two letter language code if available, if not then it returns a three letter one? --Gonnym (talk) 09:30, 25 August 2020 (UTC)
I've added explanations to your table. There is a disclaimer in the module doc about the usefulness of iso_639_name_to_code(): <language name> must exactly match the name in the data tables.
The bug is that the code searches all of the override tables before searching the individual parts. It should search override 1 then part 1, override 2 then part 2, ... I'll fix that and add 'Sotho' as a part 1 override.
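The intended search order (override 1, part 1, override 2, part 2, ...) can be sketched in Python. The function `name_to_code` and the tiny tables are hypothetical stand-ins for the module's data, chosen to mirror the Norwegian Bokmål row of the comparison.

```python
def name_to_code(name, parts, overrides):
    """Find a code for name, searching override N and then part N in turn.

    parts and overrides are lists of {code: language name} dicts in part
    order (toy stand-ins for the real data modules).
    """
    wanted = name.lower()
    for override, part in zip(overrides, parts):
        for table in (override, part):
            for code, lang_name in table.items():
                if lang_name.lower() == wanted:
                    return code
    return None


# Toy data: part 1 has the name natively; part 2 only has it as an override.
parts = [{"nb": "Norwegian Bokmål"}, {"nob": "Bokmål, Norwegian"}]
overrides = [{}, {"nob": "Norwegian Bokmål"}]

print(name_to_code("norwegian bokmål", parts, overrides))
```

With this order the part 1 code wins; searching every override table first would have returned the part 2 code instead.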
So to change the template, it should go from {{safesubst<noinclude/>:#switch: {{safesubst<noinclude/>:lc:{{{1}}}}} to {{#invoke:ISO 639 name|iso_639_name_to_code|{{{1}}}}}? Is the safesubst needed? Is the lowercase needed or does the module handle this? --Gonnym (talk) 10:07, 27 August 2020 (UTC)
Using {{ISO 639 name}} as a model, it would seem that this is what you want:
{{rough translation}} was looking for a return from a template called {{ISO 639 name hu}} but that template no longer exists so {{rough translation}} fell back to using the raw value {{{1}}} in the category name. I think that this has been remedied. Editor Gonnym last edited {{rough translation}} at 0611Z; {{ISO 639 name hu}} was deleted at 2150Z, 15h39m later. The red-linked translation categories did not start accumulating articles until after {{ISO 639 name hu}} was deleted. Your revert of Gonnym's edit was unwarranted because, as this case shows, it is not always true that what you see is the result of the last edit made to a template.
I'll tweak the module to properly handle names that end with 'Language'. I'll probably also tweak Module:Lang. Indus Valley Language redirects to Harappan language so that can be handled in the override data.
this is the English Wikipedia so English names are probably best; these can be overridden to use en.wiki preferred names
these can be overridden to use en.wiki preferred names
sgn is an ISO 639-2 collective code so Sign Languages is correct; the singular form would, I think, convey a different meaning. If it becomes an issue, a redirect to the correct article is probably the correct answer
With regard to overrides: just because we can doesn't mean that we should. If there is a need then, of course, override; no point in adding an override if it is never used.
When doing code-to-name look-ups there is minimal pain; when doing name-to-code lookup, the current code must read each k/v pair in an override table (pairs()) until it stumbles upon the input name. Large tables take longer. I have thought about modifying Module:Language/data/ISO 639 name to code/make so that it includes override data in Module:Language/data/ISO 639 name to code. Doing that would change the laborious name-search to a simple lookup.
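The difference between the name search and a prebuilt lookup can be sketched in Python. This is not how the /make module is written; `build_name_index` is a hypothetical name and the tables are toy data, with the first table taking priority.

```python
def build_name_index(tables):
    """Build a {lowercased name: code} index from code-to-name tables.

    tables are listed in priority order; the first table that supplies a
    name wins, so override tables go before the table they override.
    """
    index = {}
    for table in tables:
        for code, name in table.items():
            index.setdefault(name.lower(), code)
    return index


# Toy data: a part 1 override followed by the part 2 table.
tables = [{"st": "Sotho"}, {"sot": "Sotho, Southern"}]
index = build_name_index(tables)

# One dictionary lookup replaces a walk over every k/v pair.
print(index.get("sotho"))
```

Building the index once, when the data module is generated, moves the cost of the walk out of every name-to-code call.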
I see. Well, your idea for the refactor sounds good, so it can wait til whenever you do that. Are the dab fixes not being done also for the same reasons? --Gonnym (talk) 12:22, 27 August 2020 (UTC)
Partly; if there isn't a need ... Additionally, the dab list is complicated so is a pain to figure out. For example:
{{#invoke:ISO 639 name/sandbox|iso_639_code_to_name|link=yes|amy}} → Ami – probably override to Marranj language
{{#invoke:ISO 639 name/sandbox|iso_639_code_to_name|link=yes|hbo}} → Ancient Hebrew – no clearly obvious article as a link target
{{#invoke:ISO 639 name/sandbox|iso_639_code_to_name|link=yes|toi}} → Tonga – override in the article_name{} table
I suppose that one way around the dab issue is to modify |link=: if |link=<article title> (any value that is not yes) link with that.
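A minimal Python sketch of that |link= idea follows. The function `format_name` is hypothetical, and the default target used for |link=yes ("<name> language") is an assumption made for illustration; the module may pick its target differently.

```python
def format_name(name: str, link: str = "") -> str:
    """Render a language name, optionally wikilinked.

    link == ""   : plain text
    link == "yes": link to an assumed default target, "<name> language"
    anything else: treated as the article title to link to
    """
    if not link:
        return name
    target = name + " language" if link == "yes" else link
    return "[[" + target + "|" + name + "]]"


print(format_name("Ami", "Marranj language"))
print(format_name("Tonga", "yes"))
```

The appeal of this shape is that a one-off dab fix can be made at the call site without adding an entry to the override data.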
It is unbelievable to say that your user page does not exist, you operate a bot and are an administrator. I will present you with one of the biggest honours ever.
Of the categories accumulating in Category:Lang and lang-xx template errors, some are valid errors while others (most?) are ISO 639-2/B codes. These -2/B codes are not found in Module:Language/data/iana languages because the -2/T codes are preferred. The fix for these categories is to replace the -2/B code with its matching -2/T code.
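The -2/B to -2/T replacement can be expressed as a small lookup. The Python sketch below lists the twenty current ISO 639-2 languages whose bibliographic and terminology codes differ; the function name `to_terminology` is made up for this example.

```python
# ISO 639-2 bibliographic (B) code -> terminology (T) code.
B_TO_T = {
    "alb": "sqi", "arm": "hye", "baq": "eus", "bur": "mya", "chi": "zho",
    "cze": "ces", "dut": "nld", "fre": "fra", "geo": "kat", "ger": "deu",
    "gre": "ell", "ice": "isl", "mac": "mkd", "mao": "mri", "may": "msa",
    "per": "fas", "rum": "ron", "slo": "slk", "tib": "bod", "wel": "cym",
}


def to_terminology(code: str) -> str:
    """Return the -2/T code for a -2/B code; other codes pass through."""
    return B_TO_T.get(code.lower(), code)


print(to_terminology("cze"), to_terminology("fre"), to_terminology("fra"))
```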
I was wondering what the point is even in listing the other ISOs as they don't seem to do anything or appear in the text. See Category:Articles containing Korean-language text which has 5 codes but only 1 appears in the text. But if possible, we can even go further and eliminate the need to list any ISOs. If we take the name appearing after the "containing" and before the "-language text", and pass it to the module, it should be able to give us whatever data we needed. So if we only show one parameter, we can request the ISO code by the language name. Did I miss something here? --Gonnym (talk) 18:19, 28 August 2020 (UTC)
Oh, aye, I wondered something similar though didn't pursue it as far as you did. Were it me, I think that {{Category articles containing non-English-language text/inner core}} would have one positional parameter: an IETF language tag (language tags are easily internationalized, language names not so much).
Could you explain the internationalization issue? {{ISO 639 name|fn=iso_639_name_to_code|Literary Chinese}} -> lzh so this already works. Or did you mean something else? --Gonnym (talk) 08:49, 29 August 2020 (UTC)
A rather large number of templates get copied from en.wiki to other-language.wikis. I suspect that it is unlikely that {{Category articles containing non-English-language text/inner core}} will be one of those, but as a general principle, because the Wikipedia community is multi-lingual, we should be in the habit of writing templates and modules with that in mind. ISO 639 language codes are standardized and international so we should use them in preference to the English names for languages.
I guess I don't understand the point you are trying to make. zh-classical is not a valid IETF language tag so {{lang}} will never add Category:Articles containing Classical Chinese-language text to any article. 'Classical Chinese' is not a language name known to {{lang}}. Similarly, 'Classical Gaelic' is not a language name known to {{lang}}. Categories using the {{Category articles containing non-English-language text}} that cannot be populated by {{lang}} serve no purpose so should be CfD'd. The act of CfDing these may have some benefit. If, for example, in the eyes of en.wiki, 'Classical Gaelic' is a synonym for 'Hiberno-Scottish Gaelic', we can fix that and whatever associated redirects to get readers to the appropriate article (those decisions are outside my bailiwick). Further, because 'Classical Chinese' and 'Classical Gaelic' are not known to {{lang}} we cannot fetch a code from the name:
{{#invoke:lang|tag_from_name|Classical Chinese}} → Error: language: Classical Chinese not found – ha! a bug because:
{{#invoke:lang|name_from_tag|zh-classical}} → Error: unrecognized language tag: zh-classical
I agree with what you wrote above and you seem to be agreeing with what I said, so I'll try and explain again. The category name should match the ISO code. We currently have categories such as the Classical Chinese and Classical Gaelic which don't match. I wanted to have a piece of code that will add them to the error category so it will be easier to find them (instead of browsing a list of 1k+ categories) and then CfD them. The code you added does not currently accomplish this. What it checks is if the user who added the template used a correct ISO code. However, that code can be correct and the category name incorrect, or both can be correct but not belonging to each other. In both of those situations the category won't appear in the error category. The only way (I see) to do this, is to check if the category name itself is correct. That is why I proposed we check if the language name in the title appears in an ISO list. That might have internationalization issues, but it's better than not being able to find these errors. Did I explain it better? --Gonnym (talk) 11:44, 29 August 2020 (UTC)
"The category name should match the ISO code." No. The category name should match what en.wiki editors have determined is the appropriate name. While I would prefer that articles and categories all use the name defined in the standards, editors here don't agree. This is why we have Module:Lang/data so that Module:lang can override standardized names with en.wiki preferred names.
When you proposed the test, I did not understand that you wanted a 'language-name test'. I'll think about that.
When I first mentioned internationalization, I was referring to a rewrite of {{Category articles containing non-English-language text}} so that as input it would accept only one positional parameter, an IETF language tag from which the template would get the en.wiki sanctioned language name.
"This is why we have Module:Lang/data so that Module:lang can override standardized names with en.wiki preferred names." - I'll correct myself. I meant that. My point was that the category name should match the category in which Module:lang places the pages using the specific code. So if I create a category called "Articles containing Trappist-language text" and add {{Category articles containing non-English-language text|fr}}, which is a valid ISO, the page still gets placed in the error category as {{lang|fr}} does not populate the "Articles containing Trappist-language text" category. Is that more clear? --Gonnym (talk) 13:15, 29 August 2020 (UTC)
The instances that I checked in Category:ISO 639 name template errors (I only checked a few) used {{lang2iso|}}. We could probably use awb to change that form of the template to {{lang2iso|undetermined}} or we could change {{lang2iso}} to include |{{{1|undetermined}}}
Which one do you think is better? I'll note that if we add that as default to the template then we won't get the error "error: language name required" if that is important. Also, if we go AWB, can your bot help? As 2.3k is a bit too much for me to do manually. --Gonnym (talk) 11:51, 29 August 2020 (UTC)
The awb solution because that will give future editors (at least those conscientious editors who preview their edits) a chance to fix whatever they are doing before they publish. I could write a bot task to make those changes but for 2k or 3k edits it hardly seems worth it. Bot tasks take time to get through WP:BRFA. The job could be finished (using the same code) before BRFA gets started.
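The AWB-style replacement discussed above amounts to one find-and-replace rule. Here is a Python sketch of it; `fill_empty_lang2iso` is a hypothetical name, and the pattern only targets calls whose first parameter is empty or whitespace.

```python
import re

# Matches {{lang2iso|}} (either capitalisation) with an empty parameter.
EMPTY_CALL_RE = re.compile(r"(\{\{\s*[Ll]ang2iso\s*\|)\s*(\}\})")


def fill_empty_lang2iso(wikitext: str) -> str:
    """Turn {{lang2iso|}} into {{lang2iso|undetermined}}."""
    return EMPTY_CALL_RE.sub(r"\1undetermined\2", wikitext)


print(fill_empty_lang2iso("{{lang2iso|}} {{lang2iso|Russian}}"))
```

Calls that already supply a name are untouched, so an editor who later leaves the parameter empty still gets the "language name required" error on preview.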
{{#language:{{lang2iso|ru}}}} → <span style="font-size:100%;" class="error show_639_err_msgs">error: ru not found in iso 639-1, -2, -2b, -3, -5 list ([[template:iso 639 name|help]])</span>[[category:iso 639 name template errors]]
No. The {{rough translation}} template documentation (which, like most template documentation, sucks) says that {{{1}}} is for the language name – doesn't say whether that name is the English-language form of the name, the exonym ... Optional {{{2}}} is (presumably, because the documentation isn't explicit about this either) the endonym. The code:
{{#language:{{lang2iso|{{{1}}}}}}}
does work when given the correct input:
{{#language:{{lang2iso|Russian}}}} → русский
I have created iso_639_name_exists() in Module:ISO 639 name and is_name() in Module:Mw lang. I should also do that for Module:Lang. I should also harmonize function names across the three modules. I'm not feeling particularly motivated to fix {{rough translation}}. I'll leave that for someone else...
Found Module:Language/name which seems to duplicate Module:ISO 639 name. Am I correct? If so, is there anything to merge or just a simple replacement? I'll TfD it based on your response.
Doesn't the answer to this question require knowledge of where the replacement is to be? If the template that uses Module:Language/name is Wikipedia specific then the answer would be {{mw lang}}. In other use cases, it might be {{lang}} or it might be {{ISO 639 name}}. Looking at that sentence, it strikes me that {{ISO 639 name}} should be renamed {{ISO 639 lang}} ...
So that when desired, editors may ask iso_639_name_to_code() to return either the -2B or -2T code for a language that has both. Module:Language/data/ISO 639 name to code lists one -2 code for languages that have -2B and -2T codes. As it is right now, editors can't be expected to know which of -2B or -2T will be returned. Here are two examples:
{{ISO 639 name|fn=iso_639_name_to_code|Albanian|2}} → sqi a -2T code
{{ISO 639 name|fn=iso_639_name_to_code|Czech|2}} → cze a -2B code
Module:Language/data/ISO 639-2 still has both because I don't want to break anything while I still haven't yet figured out how the various module functions will use the -2B module.
For Module:Language/data/ISO 639 name to code, I want to figure this out before I modify Module:Language/data/ISO 639 name to code/make to include the -2B data and the override data in the table that it creates.
Is it possible to add a parameter so pages like Template:ISO 639 name/doc won't show up in Category:ISO 639 name template errors? I'm trying to clear out that category and since the /doc page has an example of an error it shows up there. |hide-err= won't help here, as the /doc wants to show the error, I just don't want it to cause the page to appear in the tracking category.
Template:Lang-eml is unused so valid for TfD, but I was wondering if that is even a valid code? The template error says it isn't but I'm double checking.
The Template:User mo family of templates is showing an error because it can't find "mo", since "mo" is a deprecated language code of Moldovan language. I was looking at this and this and I'm wondering if these codes should still be in our files? Or maybe in a case-by-case situation? The LoC says for "mo/l" The identifiers mo and mol are deprecated, leaving ro and ron (639-2/T) and rum (639-2/B) the current language identifiers to be used for the variant of the Romanian language also known as Moldavian and Moldovan in English and moldave in French. The identifiers mo and mol will not be assigned to different items, and recordings using these identifiers will not be invalid., while for "sr" it says ISO 639-2/B code deprecated in favor of ISO 639-2/T code. What do you think?
Uses of {{Lang2iso}} without a parameter are starting to appear in the category. I know it's being added via Template:Afc decline but I can't find where the template is used so I can set the default value. Can you help me locate it?
yes; I also want to add a help link to the template doc page
eml is the second-level domain name for the Emilian-Romagnol Wikipedia; consequently, because {{#language:eml|en}} → Emiliano-Romagnolo, it looks like a valid ISO 639 code; it is not. TfD
added mo to Module:Language/data/ISO 639 override; I'll add the other 639-2, -3 codes presently. For sr, the 639-2/B code scc doesn't appear to be used anywhere. Do you know of some place where it is used? If there is a need, scc can be added to the override
Part of the difficulty is because {{Afc decline}} is normally substed? In that template, {{lang2iso}} is used to create an inter-language interwiki-link so {{Afc decline}} should properly be using the data behind {{#language}}. Time for a new template/module pair perhaps. Module:Citation/CS1 does this so I can adapt code from that to this task. I suppose it would be a parallel to {{ISO 639 name}}
I TfD'd Lang-eml. I don't know of any other deprecated codes, including sr, that are in use, but they might be. 'mo' was in the error category which is why I saw it. I'm setting |cat=no but the page is still in the error category. What am I doing wrong? --Gonnym (talk) 17:54, 27 August 2020 (UTC)
Fixed I think (sometimes the brain just doesn't work) so I think that the error handling is done for now. Nor does the brain remember stuff. In March this year I wrote Module:Sandbox/trappist the monk/mw lang. I don't remember doing that ...
Good job with the fast fix though :) OK, it seems this is a never-ending whack-a-mole game. Found a few more templates that still used the ISO 639 name x version and fixed the easier ones.
Template:Uselanguage can you tell how to replace the code there? As a related note, it seems a very old version of that template had messed up code and placed the entire switch in the user pages such as User talk:Qbamin~enwiki. I'd guess that a majority of the transclusions of Template:Iso2lang is because of that.
I don't know how to get the verification if the wiki language is correct (#ifexist:Template:ISO 639 name {{{1|notarealpage}}}) to work. Proofreader doesn't have this check so it just creates an invalid link. Any ideas?
This older version of {{proofreader needed}} used the {{#ifexist:Template:ISO 639 name {{{2|}}}|...|...}} construct. So, wouldn't the pertinent bit be written:
{{SAFESUBST:<noinclude />#ifeq:{{#language:{{{1|}}}|en}}|{{{1|}}}|[[Image:Information.svg|25px|alt=|link=]]Please do not contribute text in a foreign language to English Wikipedia. Your contributions are more than welcome at a [[List of Wikipedias|Wikipedia in your language]].|Please do not contribute text in {{SAFESUBST:<noinclude />#language:{{{1}}}|en}} to English Wikipedia. Your contributions are more than welcome at [[:{{{1}}}:Main Page|{{SAFESUBST:<noinclude />ISO 639 name {{{1}}}}} Wikipedia]].|}}
That is why we have TfD. Let the community have their say and we shall act accordingly.
Didn't we discuss this at §Proofreader? It ain't perfect (another reason for me to finish Module:Sandbox/trappist_the_monk/mw_lang). {{#language:<lang code>}} returns the language name when <lang code> is a recognized code; when not a recognized code, returns <lang code>. With {{#ifeq:...}} we can test the returned value:
--user sandbox deleted -
The problematic one here is French though it won't break {{Uselanguage}}.
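The {{#language:}} plus {{#ifeq:}} test described above can be mimicked in a few lines of Python. The `KNOWN` table is a toy stand-in for MediaWiki's language data and both function names are hypothetical; only the echo-on-failure behaviour comes from the discussion.

```python
# Toy stand-in for the data behind {{#language:<code>}}.
KNOWN = {"ru": "русский", "fr": "français", "de": "Deutsch"}


def language(code: str) -> str:
    """Mimic {{#language:<code>}}: the name if recognized, else the input."""
    return KNOWN.get(code, code)


def is_recognized(code: str) -> bool:
    """Mimic {{#ifeq:{{#language:x}}|x|no|yes}}."""
    return language(code) != code


print(is_recognized("ru"), is_recognized("notacode"))
```

The test is imperfect for the same reason the template version is: a code whose returned name happens to equal the code itself would be reported as unrecognized.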
I have no reason to believe that Module:Language/name/data is not working correctly. Modules other than Module:Lang use it (Module Lang accounts for 1.1 million of those transclusions). Were it up to me I would get rid of Module:Language/data/wp languages because its provenance is unknown.
Completely forgot about that discussion. I (think) I fixed both Proofreader and Uselanguage. Regarding Module:Language/data/wp languages, I'm not sure how to check where it's used because of the high amount of transclusions from the /data sub module. --Gonnym (talk) 12:37, 28 August 2020 (UTC)
User talk:Keeper76/Archive 13 done. Can't do the other because I don't have (or want) interface admin rights. You'll have to talk to the page owner or get an en.wiki interface admin to tweak the code. JavaScript is not my thing so I can't explain why {{language}} is being executed (must be or the error cat would not exist).
Left him a message. Only errors left now are your talk and sandbox, so whenever you don't need those with errors feel free to fix them :) --Gonnym (talk) 16:19, 1 September 2020 (UTC)
Arbitrary break
OK, so we've done quite a lot so far. Current issues below:
Found Module:Language/text which says it's meant as a replacement for all lang-x templates, which Module:Lang does. Am I correct? If so, I'll TfD it also.
I'll fix the old usages of Template:Uselanguage which were incorrectly substed in articles.
I'm 2/3 of the way done mapping the languages that lead to dab pages to their correct article location.
Mind if I refactor our discussions on this page to move discussions into their separate sections? Will make it easier to find and comment (and I know this is my fault it's like this :) ). --Gonnym (talk) 11:01, 29 August 2020 (UTC)
I think that you are probably correct that Module:Language is Wiktionary-specific. That module has a whole passel of subpages that may or may not be used by that module. I know that Module:Wiktionary was an attempt to resolve issues around italicization in {{Wiktionary}}; that process got mired in the squabble over how to represent the defined word ... A well advertised discussion is warranted if you think that this is an important issue.
I guess I don't have a clear sense for what should be done, if anything. It's sort of messy but it works. Almost any data module that Module:Lang, Module:ISO 639 name, and Module:Language need is there. I can imagine splitting the content so that there are separate structures for ISO 639 data modules and IANA data modules. Do we need to do that? What other language data is going to be needed in the future?
I suspect that data modules that are specific to an executable module, should be segregated as subpages of their associated executable modules. Module:Lang/data and Module:Lang/ISO 639 synonyms are already segregated for Module:Lang. Module:Language/data/ISO 639 override should probably be moved so that it is a subpage of Module:ISO 639 name.
Lately I've been developing a bot (using a simple find and replace pywikibot) to help solve some problems with the CS1 module in SqWiki. Currently, what it does is that:
It converts English language names into universal 2 characters ISO codes, solving the "unknown language error" and aiding in i18n;
It removes deprecated parameters like "ref=harv", "ref=harvnb";
It removes CS1 categories added "manually" (not automatically by the module, what we talked the last time);
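For readers who want a concrete picture of the first two tasks, here is a minimal Python sketch. It is not the bot's actual code: the function `clean_citation`, the regular expressions, and the four-entry `NAME_TO_CODE` table are all assumptions made for illustration.

```python
import re

# Toy name -> ISO 639-1 table; a real bot would need a complete list.
NAME_TO_CODE = {"english": "en", "french": "fr", "german": "de", "albanian": "sq"}

# |language=<value> up to the next "|" or "}".
LANG_PARAM_RE = re.compile(r"(\|\s*language\s*=\s*)([^|}]+?)(\s*)(?=[|}])")
# |ref=harv or |ref=harvnb, plus the whitespace in front of it.
HARV_RE = re.compile(r"\s*\|\s*ref\s*=\s*harv(?:nb)?(?=\s*[|}])")


def clean_citation(wikitext: str) -> str:
    """Convert known language names to codes and drop |ref=harv(nb)."""

    def swap(match):
        code = NAME_TO_CODE.get(match.group(2).strip().lower())
        if code is None:
            return match.group(0)  # unknown name: leave it for a human
        return match.group(1) + code + match.group(3)

    return HARV_RE.sub("", LANG_PARAM_RE.sub(swap, wikitext))


print(clean_citation("{{cite book |title=X |language=French |ref=harv}}"))
```

Names missing from the table are left as they are, so anything the bot cannot map still shows up in the module's error tracking.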
Now, regarding this subject, I have a few questions:
I'm still not totally sure what to do with 3 characters ISO code languages. Do they somehow require any kind of maintenance? Why are we tracking them? Strange question: To your knowledge/beliefs, is there any hope for "total standardization" in the future among the number of characters in ISO codes AND among some language codes used by Wikimedia (therefore removing the need for remapping and hacking)?
Currently the bot converts ONLY English language names into 2 characters ISO codes (those that can be converted into 2 characters, as I said, I'm not yet sure how to deal with the 3 characters ones). That's because most of our articles come from EnWiki. The ideal solution would be for it to be able to convert all language names in ALL languages into 2 characters ISO codes. Therefore it would provide maintenance even to articles coming from other languages (and maybe it could do its job even on other wikis? - Not sure if that function would be needed among other wikis). The problem is that I simply don't know how other languages are called in other languages. This is the list I used in designing the conversion: Codes for the Representation of Names of Languages. By using the same list, I can make the French and German conversion possible but that's where I stop. And I can probably fill in the Albanian list by using that template you've suggested I import before but I don't know of any way how I can fill in the remaining languages. Any thoughts on that?
I'm a little surprised a bot like this has not been created yet in EnWiki (to my knowledge) considering the vast number of bots you have running here. For me, I was motivated to create it after spending countless hours just converting English language names into ISO codes to solve those errors, but maybe that's not a big problem here. Anyway, the question is: Do you think this bot would be useful outside of SqWiki? Do you think it's a good thing if we globally strive to use ISO codes for languages in our Wiki communities so it helps (a bit) with i18n? I mean, of course, in communities that already rely on the CS1 module for most of their citations. Or would that approach cause problems I haven't thought of yet? Of course, the other fixing functions mentioned above could also be imported globally, no? The fact that there is not yet any bot helping globally with CS1 module problems (I emphasize: to my knowledge) makes me think that there are problems in that approach I haven't thought of, but maybe it's just that no one has found the time yet to deal with it. What do you think?
And lastly, what kinds of CS1 problems (errors and maintenance) could be fixed with a bot apart from what I'm already solving? I mean, of course a simple bot can't really supply a missing original title (when the translated one is there) because that goes beyond simple regexes, but maybe it can help with some other problems? Maybe the bot could go beyond simple regexes somehow and help even with problems like those? I'm open to brainstorming (and to learning, as I've only recently started dealing with bot design). - Klein Muçi (talk) 13:51, 31 August 2020 (UTC)
@Jonesey95: thank you! Apparently there are more than date fixes there. I'll take a thorough look at the code it uses to hopefully import it into my bot too. Hopefully, it won't require too many changes/adaptations. - Klein Muçi (talk) 15:47, 31 August 2020 (UTC)
Answers:
Hoping that WikiMedia will ever standardize on ISO 639 language codes and names is, I think, a forlorn hope. Therefore, it will always be necessary to remap and hack.
The Library of Congress list is a good list for ISO 639-1 and ISO 639-2 codes and names, but cs1|2 only accepts ISO 639-2 codes and names that are also supported by MediaWiki. I suspect that the English-language version of the data behind the {{#language:}} magic word is the most complete mapping of codes ↔ names at MediaWiki so I think that those data should be the data source for your bot. "The ideal solution would be for it to be able to convert all language names in ALL languages into 2-character ISO codes." Ha! Not likely. There are 100-ish ISO 639-5 codes, 7600-ish ISO 639-3 codes, 500-ish ISO 639-2 codes, but only 180-ish two-character ISO 639-1 codes. It is not possible to map all of those three-character codes to two-character codes. I have tweaked the en.wiki copy of Module:cs1 documentation support so that function lang_list() accepts |lang=<code>. With that, you can get a list of what MediaWiki thinks are the language names for the <code>'s language: Albanian in Spanish is (according to MediaWiki) albanés. This is not a perfect system. If a MediaWiki list of language names for a particular language is incomplete, MediaWiki will fall back to some other language list (probably the English list).
At the moment, Category:CS1 maint: unrecognized language doesn't have many articles listed (384) – someone is keeping its count low. For such a low article-count, I don't see a need for a bot. Use your bot outside of sq.wiki? I don't know, that is really an issue for those other-language wikis. I don't see a use for it here. I do think that in cs1|2 and any other template that does anything with language names, the preferred method for communicating the name to the template should be with IETF language tags so that i18n isn't an issue. I would wish that MediaWiki would adopt such a standard but I won't hold my breath (forlorn hope that I mentioned earlier). Global bot? I think this unlikely. Perhaps if all installations of the cs1|2 module suite were identical except for what can be configured in ~/Configuration then I suppose some sort of bot is possible. Perhaps when (if) global modules/templates become a reality... I think that the only pseudo-global bot working on some facet of cs1|2 problems is IAbot and that really isn't about cs1|2 problems per se, but about deceased urls that can be coupled to an |archive-url=.
I don't have an answer. The missing language param category is your biggest. Maybe you can derive cues about the language from |title= by inspecting the words that make up the title; of course you don't really know if English words in |title= actually reflect the language in which the source is written ...
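As a rough illustration of the "inspect the words in |title=" idea, here is a minimal sketch. The stop-word lists, the `min_hits` threshold and the function name are all assumptions for the example, and the caveat above still applies: a match only says something about the title, not necessarily about the language of the source.

```python
import re

# Hypothetical, very small stop-word lists (assumptions for illustration only).
STOPWORDS = {
    "en": {"the", "of", "and", "in", "to", "a", "for"},
    "fr": {"le", "la", "les", "de", "des", "et", "un", "une"},
    "de": {"der", "die", "das", "und", "von", "ein", "eine"},
}


def guess_title_language(title, min_hits=2):
    """Return a language code suggested by stop words in the title, or None."""
    # Split into lowercase runs of letters (Unicode-aware, no digits).
    words = re.findall(r"[^\W\d_]+", title.lower())
    scores = {
        code: sum(word in stop for word in words)
        for code, stop in STOPWORDS.items()
    }
    ranked = sorted(scores.values(), reverse=True)
    # Refuse to guess when evidence is thin or two languages tie.
    if ranked[0] < min_hits or ranked[0] == ranked[1]:
        return None
    return max(scores, key=scores.get)
```

A guess like this would at best be a cue for a human reviewer; it returns None rather than a code whenever the evidence is weak.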
It looks like I still have a lot to study about language names and their conventions regarding corresponding codes in technical environments. I'll try to make use of your new tweaking, maybe that's exactly what I needed. (Most likely I'll be back here asking for help on how to do that exactly but I'll try it alone first. :P )
Yes, I was thinking the same about global bots and IABot (being the only one). If global versions of CS1 modules are all updated, I think bots helping on a global scale with minor fixes could help keep the error/maint categories empty. But keeping the modules up to date globally would be the biggest challenge to overcome. Maybe I'm getting a bit too enthusiastic here but how about we start keeping track of CS1 modules on a global scale at Meta? Maybe we could notify communities en masse when a new update is coming and keep track of how that goes globally, the specific changes they ask for, etc. Then we could talk with more confidence about global problems and solutions. Do you think that could be possible and within your scope of interest?
Yes, that's the biggest problem in SqWiki and the first motivation for creating the bot. It was this discussion that inspired the bot. Unfortunately what's described there is outside my current technical capabilities so I was forced to at least deal with the unknown languages.
Finally, can you explain to me, very briefly, why we keep track of articles using 3-character ISO codes? Maybe you already did so above, but I was bombarded with new information regarding languages and apparently I have missed this. - Klein Muçi (talk) 17:15, 31 August 2020 (UTC)
While I would like to see cs1|2 be more globally useful, I am not much interested in the bureaucracy and drama that comes from disparate user-interests bickering over some point or other. This is why I almost never participate in RfCs and other such community driven decision making.
Why do we categorize anything? Because someone might find it useful? It isn't harmful, so I don't see much point in withdrawing the categorization.
I fully understand what you mean. And I didn't mean to remove it. I just thought that maybe it was there because it needed some attention to get some kind of fix. As always, thank you for your answers! :)) - Klein Muçi (talk) 01:47, 2 September 2020 (UTC)
I know, but it's used in an infobox and the input a user used was the one without the "languages" part and the article also doesn't use it, which was why I asked. --Gonnym (talk) 09:56, 3 September 2020 (UTC)
This is about Mithila (ancient city), {{Infobox historic site}}, and {{ConvertAbbrev}}? Use Prakrit languages. I don't think that we should override the names for collective codes unless there is a very compelling reason to do so.
{{Infobox historic site}} should probably not be using {{lang2iso}} to feed {{lang}} and/or {{lang-??}} templates because they rely on similar but different data sets. Instead, the infobox should be calling {{lang}} functions so that the same data set is used throughout:
{{lang|fn=tag_from_name|Prakrit languages}} → pra
Yeah, I know, small quibble ...
I have voiced this before at some other infobox talk page: a utility that accepts native-name-of-thing with matching native-language name or native-language code that then calls Module:Lang to render native-name-of-thing appropriately. There is a first hack at that utility in Module:Lang/utilities/sandbox native_name_lang(). Infoboxen have a variety of ways that they handle native-names-of-things; that variety could/should be standardized.
Yes, it was regarding those pages you listed above. Looking at Module:Language/data/ISO 639-5 I am wondering if maybe we can adjust how it works to something similar to what you do in the lang_name_get function. Search for either the name or name without the word "languages". Could something like that work? Also, why are the datasets for the lang family different? --Gonnym (talk) 11:47, 3 September 2020 (UTC)
We strip parenthetical disambiguation from language names because ISO 639 disambiguators are unlikely to match any disambiguators or redirects that en.wiki has:
So, yeah, we could strip languages from collective-code names but what is the very compelling reason to do so?
{{Lang}} doesn't understand either of ISO 639-2B and -5 and derives almost all of its code/name pairs from IANA's language-subtag-registry file; the mapping there is sometimes different from the ISO 639 sources. I think that in the future, {{lang}} should stop (will stop) using Module:Language/name/data which combines IANA with ISO 639-3 and Module:Language/data/wp languages so that {{lang}} relies solely on the IANA data – after all, the primary purpose of {{lang}} is correct html markup and html specifies the IANA data.
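For illustration only, the lookup being discussed (try the name as given, then the name with "languages" appended, after stripping any parenthetical disambiguation) might look something like the sketch below. It is not the Module:Lang implementation; the `COLLECTIVE_NAMES` table is a hypothetical three-entry excerpt and the function names are made up for the example.

```python
import re

# Hypothetical excerpt of a name-to-code table for ISO 639-5 collective codes.
COLLECTIVE_NAMES = {
    "prakrit languages": "pra",
    "germanic languages": "gem",
    "slavic languages": "sla",
}


def strip_disambiguation(name):
    """Drop a trailing parenthetical disambiguator, e.g. 'Foo (bar)' -> 'Foo'."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", name)


def collective_code(name):
    """Look up a collective code by name, with or without 'languages'."""
    key = strip_disambiguation(name).strip().lower()
    if key in COLLECTIVE_NAMES:
        return COLLECTIVE_NAMES[key]
    # Fall back to the name with ' languages' appended; None if still unknown.
    return COLLECTIVE_NAMES.get(key + " languages")
```

With this table both "Prakrit" and "Prakrit languages" resolve to pra. Whether that leniency is worth having is exactly the "very compelling reason" question raised above; the sketch only shows that the mechanics are simple.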
Sir, the account "Divya Agarwal" has been redirected to another page and protected. She is a notable actress. I want to create a proper web page for her. Since you are an administrator, it's my humble request that you either delete this page or remove its protection. Thank you Rjidindiana (talk) 09:07, 7 September 2020 (UTC)
Before it can be done there needs to be an article to replace the redirect. See WP:YFA for how to do that. If the article is accepted, editors who accept your draft can help you with the redirect and protection issues.
Sir I do have the article ready to replace but the problem is that an article already exists of that name which has no details. It is also protected because of which I cannot make edits.
If the new draft, which can replace it, is accepted, it will be issued under my username and not "Divya Agarwal". That's why I am requesting you to delete this page. Kindly help Rjidindiana (talk) 14:12, 7 September 2020 (UTC)
Hi, Trappist the monk, I need your help regarding a script. You know the general formatting script changes crore/lakhs to million, but for Indian articles that's not preferable. I derived a script from it; see User:Empire AS/General formatting.js. However, I wanted to remove the changing of crores into millions. I've tried to do it manually, but I know nothing about developing scripts. Can you tell me what to remove or add to just remove this unit changing? Thank you. Waiting for your reply. --Empire AS Talk! 02:13, 5 September 2020 (UTC)
You know general formatting script changes crore/lakhs to million... Actually, I don't know that; what makes you think that I do? I don't know why you are asking this question of me; I am not the author of that script. Shouldn't questions about a particular script be directed to the author of the script?
Trappist the monk, the script author hasn't been active for a while. As you replied to me on the talk page of the MOSNUM dates script, I thought that you would know about this one too. I came here as I think that you may know how to correct a script and how to remove that unit changing. That's all. Thank you. Empire AS Talk! 02:50, 7 September 2020 (UTC)
The correct person to help you with that script is the script's author. It does not make sense to me that there would be two versions of the script that do markedly different things. Because others will be using the unmodified script, it is best to get assistance from the script's author. Failing that, perhaps discuss at WP:IANB.
Trappist the monk, I won't say it's my own work; it's derived from the original script. The two scripts will work the same except for the unit changing. And the work is so small that you only have to tell me which section does the unit changing and should be removed. Thank you. Empire AS Talk! 11:28, 7 September 2020 (UTC)
But that's the problem, isn't it? The two scripts will not work the same way because your script will be different. You visit an article and use your script to modify the article (retaining crore/lakhs). Then someone else comes along and uses the original script and modifies the article again (converting crore/lakhs). If the original script doesn't do the right thing, talk to the script's author and get the original script repaired else you and others will find yourselves repairing articles that shouldn't need repairing.
I fixed my error on this page; it was simply a typo, and I smacked myself for not using the Show Preview button. (I also sent another reply by ping, but in case that doesn't work because you don't have a regular user page, I have sent this one too. You can just delete it otherwise.) The page should now look the same as before I touched it.