User talk:Butlerblog/Archives/2024/July


Strange and weird

Butlerblog, I've wondered for months why you follow me to articles and then edit them after I do. I see the edits you make at these articles because they are on my watchlist. By and large, these are articles you've never edited previously, which means they probably aren't on your watchlist before I have edited them. In order for you to see my edits at these articles, you have to be keeping tabs on my editing activities through the user contributions page assigned to my account. When you first started behaving this way many months ago, I figured it was because you were still upset that I challenged you about edits where we disagreed. It occurred to me that you would eventually get tired of watching my editing and stop the behavior. When you started behaving in this manner, it looked and felt at the time as if you were chest-pounding, marking territory, asserting some kind of power over me so I would know my place and leave "your articles" alone. So, here we are about eight months later. You haven't grown tired of following me and now it just seems even more strange and weird. Obsessive, even. What reasonable explanation can you give, after all this time, for watching what I do in Wikipedia and showing up to edit articles I have edited, when you've never edited them previously? A4M2 Alaska4Me2 (talk) 14:58, 5 July 2024 (UTC)

Don't overthink it. For the most part, it's nothing more than cleaning up WP:BAREURL refs. ButlerBlog (talk) 15:42, 5 July 2024 (UTC)
No overthinking going on here. Simply seven months of observation via my watchlist which has led to an obvious conclusion through the application of Occam's razor. A4M2 Alaska4Me2 (talk) 15:49, 5 July 2024 (UTC)

Category:Pages using infobox film with nonstandard dates

If your bot needs a new challenge, Category:Pages using infobox film with nonstandard dates. Gonnym (talk) 15:28, 25 April 2024 (UTC)

@Gonnym: Certainly! I think I can put something together that would take a bite out of that. ButlerBlog (talk) 16:17, 25 April 2024 (UTC)
It didn't take much to put together an initial bot for this. It can format single dates and should be able to clean up around half of the existing maintenance category. My expectation is that while running, I'll monitor for entries that the regex doesn't apply to and see if we can address any additional entries.
ButlerBlog (talk) 17:31, 25 April 2024 (UTC)
Nice! I'll also keep watch and see if I find any strange edits. Gonnym (talk) 19:09, 25 April 2024 (UTC)
@Gonnym: While we're waiting on bot task approval, I'm wondering if you might refine the maintenance category a little bit. While I was doing a test run on the bot, I noticed the category includes some of the following:
  • Empty |released=
  • |released= containing only a comment (i.e. <!-- {{Film date|Year|Month|Day|Location}} -->)
  • at least a couple entries had no |released= parameter included in the infobox
ButlerBlog (talk) 16:20, 26 April 2024 (UTC)
Yes, currently the category is also capturing instances where the date is also missing as that should be added. I might move that to a different category if you think that's better. Gonnym (talk) 20:01, 26 April 2024 (UTC)
It might be good to have those excluded just to have a more accurate number. But that's just a personal preference. It won't make a difference for the bot task either way. (In other words, only do it if you think there's a benefit and I'll go along with that.) ButlerBlog (talk) 20:23, 26 April 2024 (UTC)
Category split. Gonnym (talk) 18:04, 27 April 2024 (UTC)
@Gonnym: the bot task is approved. I'll start running it today. While I was waiting for the approval, I noticed the edit you corrected from the test run. It did have an exception for {{Infobox album}} but that same regex did not skip {{Infobox video game}}. I have since added an exception for that. In my testing, I did not notice any other issues, but if you notice anything wrong during the run, let me know as soon as you can - or trigger the shut off, and I'll address it right away. Thanks! ButlerBlog (talk) 12:35, 3 May 2024 (UTC)
addendum to the above - I'm pausing for now. I ran through around 700 articles on AWB manual just to be confident. Of those, I manually skipped 14 because it made a change (did not auto-skip as no change) and the change wasn't what it should be. I was able to refine my regexes to eliminate some of those situations, but not all of them. It's still at about a 1% error rate.
Errors that break things I believe I have eliminated - the regex has been refined to only explicitly hit {{Infobox film}} (avoiding any issues as noted previously). What's left are changes that do not completely change the contents of the parameter. In a few instances where there are 3 or more dates, the third and onward are not formatted. There are a couple of odd formats that I have caught as well where the regex was not able to discern the type of format to apply. In these situations (which is all of what is left in the "errors" category), at least one date is formatted with {{Film date}}, but not the remaining dates. My concern on those is that they would likely result in the article being cleared from the maintenance category, but still not fully formatted.
I am going to work with the 9 remaining from my test run log that were manually skipped and see if I can eliminate those from either being changed or to make sure they are fully changed. Most of them are fairly similar. Once I have that refined, I'll do another manual run for review and if the error rate is acceptable at that point, I'll run it completely. (So at least another update to come on this - thanks for your patience) ButlerBlog (talk) 15:17, 3 May 2024 (UTC)
Wow, good job on catching all those! I'll keep an eye on the changes and if I catch something I'll let you know. Gonnym (talk) 16:53, 3 May 2024 (UTC)
I've worked through those manual skips and I could solve all but 2. But that would be a 0.2% error rate. I'd like it to be zero, but I also think the two that were not picked up are likely "one-offs" where there are not others like them. I'll do another test run manually and see if we're closer to zero and if so, I'll start it automatically. ButlerBlog (talk) 20:37, 3 May 2024 (UTC)
Another thing to watch for and maybe skip is pages with "franchise" or "film series" disambiguation. These are technically not using the correct infobox so I'm pretty sure that the data itself is also not in a style that can work with the film date template. Gonnym (talk) 05:27, 4 May 2024 (UTC)
Thanks! That may help weed out some of the questionable items. ButlerBlog (talk) 12:51, 4 May 2024 (UTC)
The fix at Asylum (2005 film) wasn't complete. I removed the second date but the festival should have been added to the template. Not a big issue though. Gonnym (talk) 14:03, 4 May 2024 (UTC)
Atlas Shrugged (film series) also didn't work. Gonnym (talk) 15:10, 4 May 2024 (UTC)
Thanks! I'll look those over before the next run. ButlerBlog (talk) 18:41, 4 May 2024 (UTC)
I'm still getting around 1% (or less) that match an existing regex to trigger a change, but are not formatted in the way the regex is expecting, so they get changed but not correctly (and because they match a regex, even imperfectly, they are not skipped). Each time, I add an adjustment to account for these, which of course adds to the level of complexity. So... in "going back to basics" what I'm going to do is take out very exact parts of the existing approved bot and run just for those to get all the "low hanging fruit" as it were. I'm going to start with the "year only" entries, then I'll add back the regex for single dates with no location or reference. That should clear a significant chunk. Then I'll return to the more complex regexes. ButlerBlog (talk) 16:27, 6 May 2024 (UTC)
Yeah, that sounds like a good way to handle this. Each time takes us one step closer. Gonnym (talk) 16:30, 6 May 2024 (UTC)
Just finishing up the simplified run through. It just did single dates where they were MDY, month year, or year only. That seems to have cleared almost half of the maintenance cat. Keeping with the concept of keeping it simplified, I'll add back one or two steps today and do another run through and see what that gets us. ButlerBlog (talk) 11:25, 7 May 2024 (UTC)
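As an illustration of the simplified pass described above (single dates that are MDY, month year, or year only), here is a minimal Python sketch. It is not the bot's actual AWB rule set; the function name `to_film_date` and the choice to anchor every pattern to the whole parameter value are assumptions made for the example. Anchoring to the whole value is what makes anything unexpected get skipped rather than partially changed.

```python
import re

# Month names mapped to their numbers (1-12).
MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}
MONTH_RE = "|".join(MONTHS)

# Each pattern must match the entire value, so values with anything
# extra (a second date, a location, a reference) are left alone.
MDY = re.compile(rf"^({MONTH_RE}) (\d{{1,2}}), (\d{{4}})$")
MONTH_YEAR = re.compile(rf"^({MONTH_RE}) (\d{{4}})$")
YEAR_ONLY = re.compile(r"^(\d{4})$")


def to_film_date(value):
    """Return a {{Film date}} call for a simple single date, else None."""
    value = value.strip()
    match = MDY.match(value)
    if match:
        month, day, year = match.groups()
        return "{{Film date|%s|%d|%d}}" % (year, MONTHS[month], int(day))
    match = MONTH_YEAR.match(value)
    if match:
        month, year = match.groups()
        return "{{Film date|%s|%d}}" % (year, MONTHS[month])
    match = YEAR_ONLY.match(value)
    if match:
        return "{{Film date|%s}}" % match.group(1)
    return None  # not a simple single date: skip


print(to_film_date("May 7, 2024"))
print(to_film_date("2024"))
```

Returning `None` for everything else mirrors the "skip if no change" behaviour mentioned earlier in the thread.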
Another easy fix is converting start date like in Ballistic Kiss. And maybe also one date with country like in Ballet Mécanique. Gonnym (talk) 11:31, 7 May 2024 (UTC)
That was originally on my list. However... That's one that's a little tricky. There are some instances where there is another infobox that has |released= and depending on the formatting of the film date's released param, I've had instances where the first occurrence is skipped and it tries to change the second instance (which is the wrong infobox). But that will get worked back in at some point (when I can work out my regex to ignore the occurrences that I know it should be ignoring). ButlerBlog (talk) 12:22, 7 May 2024 (UTC)
Doh! I just discovered the "in template call" rule. That must be new (or "newer"). I know they've been working to update the program and I have installed a couple of updates the last few months after being behind for a while, so I think that's a new option. Anyway, knowing that's there now is a major shift in what I can do (starting with this start date change). ButlerBlog (talk) 14:40, 7 May 2024 (UTC)

I'm doing another complete run-through now with some additional (from the original) regex patterns. Cleaning up {{start date}}, DMY, MDY, and year only. The previous run was looking for some DMY and MDY that were similar, but this picks up additional patterns. It also hits some year only entries that the previous run excluded (the ones that had HTML comments, which I'm actually removing because they are not necessarily consistent). ButlerBlog (talk) 15:14, 7 May 2024 (UTC)

So far, this is editing about half of the remaining articles. If that holds true through the remainder, we should get this down to less than 9k remaining. ButlerBlog (talk) 15:20, 7 May 2024 (UTC)
It ended up more like 2:1 edits:skips, so we got it down to around 6k to go. ButlerBlog (talk) 02:11, 8 May 2024 (UTC)
Wow, great job! I tried finding an easy pattern left in those 6k but so far didn't find any. Gonnym (talk) 07:15, 8 May 2024 (UTC)
Seems like Jayam Manade (1986 film), Jeepers Creepers (1939 animated film), Jeet (1949 film), Jeeva (1986 film), In the Blood (1923 film), The Hypnotist (1957 film) should have been easy fixes for the bot. Any idea why it didn't catch these? Gonnym (talk) 11:01, 10 May 2024 (UTC)
Those all fit patterns that I haven't actually done a full run through with yet. Some of the single date with something after it (location or ref) patterns I haven't done a run through with yet. Most of what ran through completely so far is single dates with nothing after it, although I did do some of the more complex patterns attended while the unattended version was running - but I think I stopped at titles starting with "C". I've got a few more simple patterns that can run through yet which I'll do today. ButlerBlog (talk) 12:23, 10 May 2024 (UTC)
Now that I'm working through it again today, I remember why I wasn't doing ones like Jeepers Creepers (1939 animated film). The same regex that changes that would pick up Between Us (2011 film) incorrectly - at least the way I currently have it. ButlerBlog (talk) 13:18, 10 May 2024 (UTC)

I did some additional refining and I'm doing another unattended run with some additional simple patterns. Still don't have MDY & DMY with locations/refs quite refined enough to run unattended yet. There are a lot of entries that fit that, but what my patterns still get wrong is that my single date patterns still pick up entries that have two dates and locations - essentially grabbing the end of the second date as part of the first. Until I get that worked out (which I will eventually), I'm holding off on those. ButlerBlog (talk) 16:43, 10 May 2024 (UTC)

OK - I was able to work out a regex pattern that picks up the single date with locations (i.e. 11 December 1911 (South Africa)) while ignoring similar results that have two dates. I think that will allow me to do another full run. ButlerBlog (talk) 19:54, 10 May 2024 (UTC)
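The pattern itself is not posted in this thread, so the following is only a sketch of one way to get the behaviour described: match a single DMY date followed by a parenthesised location, and refuse to match when a second date follows. The function name `single_dmy_with_location` is hypothetical. The approach assumed here is to require the match to cover the whole value and to forbid parentheses inside the location, so a trailing second date cannot be absorbed.

```python
import re

# Month names mapped to their numbers (1-12).
MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}
MONTH_RE = "|".join(MONTHS)

# One DMY date, one space, one "(location)" and nothing else.
# The location may not contain parentheses, angle brackets, braces or
# pipes, which keeps it from swallowing markup or a second date.
SINGLE_DMY_LOCATION = re.compile(
    rf"(\d{{1,2}}) ({MONTH_RE}) (\d{{4}}) \(([^()<>{{}}|]+)\)")


def single_dmy_with_location(value):
    """Return a {{Film date}} call, or None if the value should be skipped."""
    match = SINGLE_DMY_LOCATION.fullmatch(value.strip())
    if match is None:
        return None
    day, month, year, location = match.groups()
    return "{{Film date|%s|%d|%d|%s|df=y}}" % (
        year, MONTHS[month], int(day), location.strip())


print(single_dmy_with_location("11 December 1911 (South Africa)"))
```

A value such as `11 December 1911 (South Africa)<br />5 January 1912 (United Kingdom)` returns `None` here, because the full match fails once anything follows the first closing parenthesis.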
Update to all(?) of the above: I reviewed and refined a number of patterns based on the above. Articles like Jeet (1949 film) & Jeeva (1986 film) were still not getting picked up, which I could not seem to figure out because I had a specific regex pattern for {{Start date|YYYY|MM|DD|df=y}} which is what they had. Turned out to be a closing parenthesis in the wrong place - it didn't throw an error, it just produced a different result. So that was corrected. I also reworked the patterns that handled single dates with refs (previously was only doing year only to see how that went). All of these I was able to get down to a zero (or near zero?) error rate, and it should pick up everything that is a simple single date. In working through these, I learned a few new features in AWB as well as improved my regex knowledge - both of which will be well served when I go back to refine the original tv infobox date bot (I think I can simplify some of the more complex patterns based on what I did here). My next step will be working on some of the straightforward multiple dates. (Also, I need to add in a pattern for {{start date}} if the value includes a reference or location) Thanks for bearing with me! ButlerBlog (talk) 11:55, 11 May 2024 (UTC)
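For reference, a conversion of the {{Start date|YYYY|MM|DD|df=y}} form mentioned above could look roughly like the Python sketch below. This is an assumed reconstruction, not the corrected AWB pattern; the function name `start_date_to_film_date` is made up for the example. It handles only a bare {{Start date}} call and returns `None` when a reference or location follows, which matches the case noted above as still to be added.

```python
import re

# A bare {{Start date|YYYY|MM|DD}} call with an optional |df=y or |df=yes.
# Template names are case-insensitive in their first letter only.
START_DATE = re.compile(
    r"\{\{\s*[Ss]tart date\s*\|\s*(\d{4})\s*\|\s*(\d{1,2})\s*\|\s*(\d{1,2})\s*"
    r"(\|\s*df\s*=\s*y(?:es)?\s*)?\}\}")


def start_date_to_film_date(value):
    """Convert a lone {{Start date}} call to {{Film date}}, else None."""
    match = START_DATE.fullmatch(value.strip())
    if match is None:
        return None  # extra content (ref, location, second date): skip
    year, month, day, day_first = match.groups()
    suffix = "|df=y" if day_first else ""
    return "{{Film date|%s|%d|%d%s}}" % (year, int(month), int(day), suffix)


print(start_date_to_film_date("{{Start date|1949|03|05|df=y}}"))
```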
Great job! Regarding the ref, this didn't fix it with the |ref1= parameter which causes the ref to be on a different line. Again, not a big issue and other editors will eventually fix it. Gonnym (talk) 12:23, 11 May 2024 (UTC)
Aw, nuts! I didn't realize that put the ref on a different line. There are a bunch of those. Unfortunately, with the {{Film date}} change, it removes them from the maintenance cat. Would it be possible to create a maintenance category for articles using {{Film date}} that have the ref outside the template? Then I could go back and fix those. ButlerBlog (talk) 12:40, 11 May 2024 (UTC)
Not at the moment, no. Infobox film does not use my complex behind the scenes validation and I don't see me creating that anytime soon, so the best there is just a check if it uses film date or not, not how it is used. But I don't see this as a big issue and that is a small cost to pay for the 36k page mess that was before your bot run. Gonnym (talk) 12:49, 11 May 2024 (UTC)
OK - well, I've adjusted it for anything going forward from here. ButlerBlog (talk) 12:51, 11 May 2024 (UTC)
I think I can use the "user contributions" setting to build a list of the bot's edits in AWB to go back over what I did so far and fix those. ButlerBlog (talk) 12:56, 11 May 2024 (UTC)

I reworked the patterns with refs to move the ref to |ref1=. Then I used the "user contributions" setting in AWB to get a list of the bot's edits and go back through to fix any ref/ref1 mistakes. I think I got them all. I have added some additional simple patterns and am doing another runthrough with those added now. I expect it to be just under a 1:2 ratio of changes:skips, so hoping to trim another 1K or so off the list. ButlerBlog (talk) 12:49, 13 May 2024 (UTC)

@Gonnym: I was away (on a road trip) for the last week or so, which didn't give me much opportunity to work on the bot. Now that I'm back, I'm working on trying to get more of the film category cleared. I did some yesterday and today, and the revised patterns removed a few hundred entries. It's down to 1570 remaining as of this writing. I have noticed there are a lot of film series/franchise articles in the remaining list. I don't have a good count on how many, but it is a significant number. ButlerBlog (talk) 14:58, 28 May 2024 (UTC)
Welcome back! Yeah, I noticed that also. I was waiting for your runs to end before seeing how many we have left. I'll post on WP:WikiProject Film about moving away from {{Infobox film}} to either {{Infobox media franchise}} or a new {{Infobox film series}} with more specific parameters for a film series and without the unnecessary parameters that are used for films (director, writer, etc.). Hopefully other editors agree and we can move forward. Gonnym (talk) 15:02, 28 May 2024 (UTC)

@Gonnym: Sorry for going dark on this. My brain fried on doing minor regexes and I stepped aside for a bit. Then I got distracted on the Wikipedia:WikiProject Reliability/June 2024 Drive (I worked on sourcing Westerns mostly). I'm ready to come back around to what's left on this - if anything. When I left it, I felt most of what was left was oddball stuff that wasn't going to be well handled by a regex - or at least more than half anyway. Any thoughts on where the category stands at this point? ButlerBlog (talk) 13:53, 10 July 2024 (UTC)

TV category

Also, if you have a chance, can you run the bot on the TV category (notice that its logic has expanded to the additional date parameters) so I can see what pages remain that need manual fixing? Gonnym (talk) 10:38, 8 May 2024 (UTC)

For sure! I've had it on hold for the time being because I need to make adjustments based on what's in there currently. ButlerBlog (talk) 19:14, 8 May 2024 (UTC)

@Gonnym: I had totally paused things last month because with the changes, the bot needs to be updated. I'm not sure where things are at this point, but I'll work on updating the TV bot and see if I can get it running again. ButlerBlog (talk) 13:53, 10 July 2024 (UTC)

A lot of the work is manually done by Aspects. I haven't touched the franchise/film series issue yet. A lot of other issues to fix before opening that can of worms. Gonnym (talk) 08:43, 11 July 2024 (UTC)

Next challenge?

Another challenge if you want... Once you're done with the above film and TV cleanup, there is Category:Pages using infobox television with missing dates. Some pages there are films (such as Abducted: The Mary Stauffer Story), which can be checked by seeing if the film category exists on the page (such as Category:2019 television films). For those, replace |first_aired= with |released=. Gonnym (talk) 12:24, 29 May 2024 (UTC)

I'm always up for a challenge. That would be a good one. ButlerBlog (talk) 13:53, 29 May 2024 (UTC)
@Gonnym: I am still up for this. I'm going to start looking into it. This might be a little more complicated and require something other than AWB - possibly a Python bot. I can work on doing that standalone, or as another AWB bot with an external processing module. Either way, I think it's more than just AWB. Primarily, this response is just to pull this back out of the archive so I remember it. ButlerBlog (talk) 13:56, 10 July 2024 (UTC)
Yeah it seems a bit more complicated. In the meantime User:Aspects has been fixing various issues so I'm sure we'll get there :) Gonnym (talk) 07:06, 11 July 2024 (UTC)
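To make the proposed check concrete, here is a rough Python sketch of the core text transformation only: if the page's wikitext carries a "YYYY television films" category, rename |first_aired= to |released=. No approach had been settled on in this thread, so everything here is an assumption, including the function name `rename_first_aired` and the idea of detecting the category from the wikitext itself rather than through the API. It also does not confirm that the parameter sits inside {{Infobox television}}, which a real task would need to do.

```python
import re

# A category link such as [[Category:2019 television films]],
# optionally with a sort key after a pipe.
FILM_CATEGORY = re.compile(
    r"\[\[\s*Category\s*:\s*\d{4} television films\s*(\|[^\]]*)?\]\]",
    re.IGNORECASE)

# The parameter name, keeping the surrounding pipe, spacing and "=".
FIRST_AIRED = re.compile(r"(\|\s*)first_aired(\s*=)")


def rename_first_aired(wikitext):
    """Rename |first_aired= to |released= on pages in a TV film category."""
    if not FILM_CATEGORY.search(wikitext):
        return wikitext  # not categorised as a television film: no change
    # Only the first occurrence is renamed, as a cautious default.
    return FIRST_AIRED.sub(r"\1released\2", wikitext, count=1)


page = (
    "{{Infobox television\n"
    "| name = Example\n"
    "| first_aired = 2019\n"
    "}}\n"
    "[[Category:2019 television films]]"
)
print(rename_first_aired(page))
```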

Thanks!

A Barnstar!
Thanks for participating in the June 2024 backlog drive!

You scored 2018 points while adding citations to articles during WikiProject Reliability's first {{citation needed}} backlog drive, earning you this Order of the Superior Scribe of Wikipedia (ossw). Thanks for helping out!

Pichpich (talk) 21:32, 10 July 2024 (UTC)

Your GA nomination of The Chosen (TV series)

The article The Chosen (TV series) you nominated as a good article has passed; see Talk:The Chosen (TV series) for comments about the article, and Talk:The Chosen (TV series)/GA2 for the nomination. Well done! If the article is eligible to appear in the "Did you know" section of the Main Page, you can nominate it within the next seven days. Message delivered by ChristieBot, on behalf of David Fuchs -- David Fuchs (talk) 17:04, 12 July 2024 (UTC)