This is an archive of past discussions on Wikipedia:STiki. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
I have no issue with the concept. However, I assume you want the position numbers to be dynamic? Can someone provide an example of another template that does this so I can see how they do it? I can imagine an opt-in list for participation, whereby the nightly leaderboard script would write to a sub-user-page just the position number, and then this page/number could be transcluded via the template. However, all this seems a bit burdensome, and I'd like to know if there is a cleaner way. Thanks, West.andrew.g (talk) 13:50, 6 June 2013 (UTC)
It can, but like I said, forcing someone to opt-in to place a user-box is a bit of a burden. Then, STiki has to make tens (hundreds?) of user-space edits a day to update everyone's position. This is inelegant. Other userboxes do this, (e.g., position on edit count list). How do they do it? or are they manually updated? Thanks, West.andrew.g (talk) 15:03, 6 June 2013 (UTC)
Upon further digging, this looks to be something virtually all other templates handle manually. The work in making this automatic is non-trivial, but it is something that I will begin iterating on. Thanks, West.andrew.g (talk) 15:17, 6 June 2013 (UTC)
I went ahead and asked there. We'll see what the outcome is, and if negative, I am prepared to hack it out the inelegant way. Thanks, West.andrew.g (talk) 16:39, 6 June 2013 (UTC)
Yes, thank you! I am just trying to clean up how the "example" looks at WP:STiki (since the basepage "STiki" is not an entry in the leaderboard). Thanks, West.andrew.g (talk) 21:39, 9 June 2013 (UTC)
You are welcome to maintain the box in your own userspace, but as Yaris mentioned, I think the exact count version is sufficient for the WP:STiki main page. Regardless, thanks for your efforts.
JRE Version 7 is definitely sufficient. For a while, we compiled back so Java 5 would work, though I wouldn't be surprised if Java 6 was now required to run STiki. However, it is worth mentioning that Java 6 was released in *2006*. If you'd provide some details of what is not working for you, we'd be happy to help you troubleshoot. It would be ideal if you could try to start the JAR from terminal/prompt/command-line so any error messages would be printed to the console (but if this makes no sense to you, just ignore it and describe your issue). Thanks, West.andrew.g (talk) 13:46, 9 June 2013 (UTC)
It works fine on my home computer. I have Java 7 Update 21, the latest, and the program works fine. The only problem is at school. Double clicking on the program just does nothing. One computer I tried says it has Java 7 Update 11. When you say command line, do you mean cmd.exe? Or something else? -- (T) Numbermaniac (C)01:17, 10 June 2013 (UTC)
Yes, cmd.exe in the Windows case. However, if you are running into the issue only at school, I suspect your network is blocking/firewalling the communications ports that STiki needs to communicate over. This is not uncommon, although STiki is usually able to detect this and display an information dialogue indicating as much. Thanks, West.andrew.g (talk) 02:05, 10 June 2013 (UTC)
I do believe the school computers have Symantec. Does that block it? Even so, Symantec usually says something. Also, how would I run it from cmd? -- (T) Numbermaniac (C)03:08, 10 June 2013 (UTC)
I can't say for sure, but outbound access on port 22 is sometimes blocked. See if the cmd line gives you any helpful information: [1]. Thanks, West.andrew.g (talk) 13:20, 10 June 2013 (UTC)
I have rollback and reviewer rights plus enough wiki experience (article editing, watching and verifying edits and changes). I think STiki will be a useful tool for me. Please grant me access to STiki. Thanks. Zyma (talk) 20:31, 9 June 2013 (UTC)
Hi Zyma,
If you have rollback rights you don't need any additional permission. Just download STiki and run it.
Started and I got this error. I think it's because of port blocking by my ISP. Is there any way to change the STiki port (3306 to another unblocked port)? Or other troubleshooting? Zyma (talk) 19:28, 10 June 2013 (UTC)
Are you in some institutional/corporate/educational setting? It is atypical for home connections to be configured this way (or at least you should be able to fix the firewall to your liking). STiki needs to communicate over 3306 (mysql). There is a feature request to switch to standard communications over HTTP/80, but this won't be fixed any time soon. I think there has been some luck with HTTP tunnelling efforts, but I haven't experimented enough to speak confidently on them. Thanks, West.andrew.g (talk) 19:43, 10 June 2013 (UTC)
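For anyone wanting to check whether a network blocks the port before digging further, a quick reachability test can help. The sketch below is only illustrative: the function name `port_reachable` and the host name in the usage comment are placeholders, not the real STiki server address. Only the port number 3306 comes from the discussion above.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Hypothetical usage (replace the placeholder with the actual server host):
# print(port_reachable("stiki-server.example.org", 3306))
```

A False result from a school or office network, combined with a True result at home, would point towards a firewall rather than a Java problem.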
Meh, I don't know. I think that being able to start a new thread and coherently state your request without the box is a good measure of whether or not someone should be granted access, especially in edge cases--communication skills, no matter how underplayed, are still an important part of counter-vandalism. Theopolisme(talk)04:32, 11 June 2013 (UTC)
Basically what was said above. I wasn't too excited with the format or wording -- and I was terribly tired -- so I just wanted to revert and have some talk page discussion before/if it was implemented. The main WP:STiki page does make it pretty obvious one should apply here and we don't need to solicit requests from those who don't know what is going on. If we were to proceed with such a button, it should take on the form of a template/application that eases the task of making access decisions from my perspective (i.e., a quick link to an edit counter, ask the user to estimate revert count, another question or two etc.). Thanks, West.andrew.g (talk) 00:12, 12 June 2013 (UTC)
The user has 5,000 STiki classifications. This includes classifying edits as innocents, which will result in no edit. Yaris678 (talk) 12:41, 11 June 2013 (UTC)
Greetings. I have rollback rights and, among other tasks, patrol recent changes. I came here to request access to STiki, but have just seen from an earlier post on this talkpage that rollbackers apparently don't need to formalise a request. If that's the case, please consider this a courtesy call, together with a thankyou for making this necessary tool available. Cheers! --Technopat (talk) 12:32, 11 June 2013 (UTC)
Hello, most of my contributions are based on reverting to the original content and keeping the pages up to date with accurate announcements and notices. I would love to be a part of the helping team. I always try to keep everything I'm involved in accurate and user friendly, that's why I'm requesting this. Thanks in advance ((Argento1985)13:36, 11 June 2013 (UTC))
Not done --- Hi Argento1985. While I appreciate your recently increased participation in Wikipedia, I think it best to wait a little while before granting STiki access. Most of your history seems focused on the article of a single soccer club. While such dedication is admirable, it hasn't granted you the broad experience needed of a vandalism patroller. Moreover, your recent help/Teahouse requests would seem to indicate you aren't completely familiar with the formal structure of warning/AIV/etc. This is not a criticism of your work. Instead, as was suggested on your talk page, the WP:CVUA process can quickly bring you up to speed on our norms and train you for such work. If you find a mentor and complete that process, return here, and I will happily approve your account. Thanks, West.andrew.g (talk) 00:29, 12 June 2013 (UTC)
As I mentioned in a just-archived thread, I will be unable to get to the issue any time in the foreseeable future. It is a goal of mine to get the code onto GitHub in a way that will allow the STiki community to become more independent/diverse in whom it relies on for its maintenance and changes. I am not jumping ship, I just have a whole new set of professional priorities I need to accommodate. Thanks, West.andrew.g (talk) 19:31, 13 June 2013 (UTC)
Hi, I reverted an edit (see here), and was surprised that STiki didn't automatically welcome new users, such as (I think) Twinkle does. Would it be feasible to have an automatic welcome of new users whose edits have had to be reverted? Thanks, Matty.00718:52, 14 June 2013 (UTC)
Is it feasible? Yes, trivially. Is it a good idea? I'll leave that for the STiki community to discuss. I can almost recall this coming up at some point in the past (one can peruse the archives if they feel so inclined). I will point out that WP:Snuggle and others (the Teahouse?) are taking much more formal approaches to the welcoming process. Rather than welcoming those we pseudo-welcome with a level-1 warning, it might be more interesting to welcome those new users we DON'T revert (but again, this might be a feed we could provide the Snuggle folks, a project led by another wiki researcher with whom I am familiar). Thanks, West.andrew.g (talk) 20:03, 14 June 2013 (UTC)
It does in some cases-"The welcome is issued only if the user is notified about the deletion, and only if their talk page does not already exist." Thanks for the ideas, User:West.andrew.g. Matty.00709:32, 16 June 2013 (UTC)
STiki permissions
I would like permissions to use STiki please. . .I love the value of information I am able to get from Wikipedia. For nearly 5 or 6 years now, I have used Wikipedia almost daily to look up anything I can imagine. Lately, in the last few weeks/months I've gotten much more serious about Wikipedia, as I came to appreciate just how important keeping the articles as up to date and accurate is. Thank you for your time. EzPz (talk) 19:53, 15 June 2013 (UTC)
Not done -- Similar to the admin's response to your recent rollback request, your above text says very little about your ability to distinguish innocent/good-faith/vandalism edits. Looking at your contribution history, you seem to only be getting started learning about anti-damage tools and the norms of our warning process. While I appreciate your renewed interest and participation in Wikipedia -- there is still much to be learned. As I suggested to Argento1985 above, the WP:CVUA can guide you in this process. With their help, you can quickly become a more accurate and efficient editor. Thanks, West.andrew.g (talk) 18:24, 18 June 2013 (UTC)
Greetings STiki Team. I just did a quick search through the archives to see if other users had reported similar hiccups/hiccoughs, but everybody's circs. are their own, so am raising mine. My first few incursions with STiki ran perfectly smoothly (it's always taken a few seconds for the system to kick in, but I took that for granted), but the last half dozen times that I've logged on, the tool has started freezing/hanging after a while (I'd say around 20 minutes, but I've never actually timed it). It's no big deal, 'cos I just log out and log back on again, but thought I'd bring it up in case it's a solvable.
Now that I'm here, another thing is that when I click on "Current page" in "Edit Properties" I get sent to a non-logged-in version of the Wikipedia page. Again, it's no big deal 'cos I just log in, but one does aim for that time-and-motion and/or user-friendly efficiency :) BTW, if I haven't already done so, I'd also like to thank you for making this great tool available to the community. Cheers! --Technopat (talk) 11:33, 20 June 2013 (UTC)
Greetings, I'd be happy to help with these issues, but they are a little vague to address directly at this stage (not your fault). I'll assume you are running the most recent version of the software. Are you able to run STiki from a command-line or terminal environment? Sometimes STiki will spew out some helpful debugging information that isn't popped as an informational dialogue.
Regarding the first issue, is there any particular classification/action that tends to trigger these hangs?
Regarding the second, all the other links send you to a logged-in version of WP except for that one? Does this happen for anyone else? Could it have something to do with the inclusion of the "twinkle, article=" parameter in the last update?
As a note to the community, I am no longer able to debug STiki during normal business hours (although I can occasionally do the talk pages). Thanks, West.andrew.g (talk) 14:19, 20 June 2013 (UTC)
Thanks for the reply West.andrew.g. Sorry about the vagueness... despite the monicker, I'm a mere user, not a technician, and have no idea what information could be of use to you. There's no rush to look for solutions during normal business hours: as I mentioned, it's no big deal and I solve it by logging out/on. Having posted the above, of course when I logged on today everything ran smoothly, but I'll try to bear your "first issue" question in mind if/when it freezes. As for the links, I've just checked and it happens to the 5 in "edit properties" and to the 3 in "last revert". As for Twinkle, I am a TW-user. Cheers! --Technopat (talk) 15:18, 20 June 2013 (UTC)
Towards the second issue, does your browser clear all its cookies upon exit? i.e., if you open the browser after restarting your computer, will you need to log in to Wikipedia in order to have access for that session? All Java is doing is asking your browser to open and pull a URL. It is not passing your username/password/session data with that request. When I click through via STiki, I am logged in by virtue of the fact my browser has remembered my credentials (via a cookie). The fact this is occurring for you with all GUI links makes me confident the issue is on your end (as this would drive STiki's users crazy if it normally worked this way). Thanks, West.andrew.g (talk) 15:46, 20 June 2013 (UTC)
Technopat, Here is a question that may help to diagnose the logging-in issue. If you keep your browser open once you have logged in to STiki, do you need to log in to Wikipedia again, next time you open a wiki page from STiki? Yaris678 (talk) 16:38, 20 June 2013 (UTC)
Greetings Yaris678. I'll start with your question, 'cos it's easier to answer. No, I don't need to log in again to Wikipedia, i.e., I don't lose my existing connection, even if I link from the 5 links in "edit properties" and the 3 in "last revert" (these open up to the corresponding page in a new window but in an un-logged mode.) Cheers! --Technopat (talk) 10:17, 21 June 2013 (UTC)
It is a setting in your web browser that is causing you to have to log in each time a link is clicked within STiki; most likely: (1) cookies being cleared at browser exit, or (2) your not checking the "remember me" option at the time of sign-in. Thanks, West.andrew.g (talk) 23:44, 21 June 2013 (UTC)
(outdent) Thanks for that. I'll check it out now. If it's as simple as that, I won't be back to bother y'all, so thanks again. Cheers! --Technopat (talk) 18:57, 22 June 2013 (UTC)
Congratulations to STiki! 1 million classifications
Just about a day or so ago, STiki crossed the 1 million classification threshold! It was the review of this edit by User:Widr (which he classified as "innocent") which was precisely number one million. The fact our current revert count sits at around 332k also indicates that STiki has had around a 33% "revert rate" throughout its existence (highly unscientific!). This is an accomplishment that speaks to the contributions of all members of the STiki community, and I hope to have your continued support as we reach for 500k reverts and beyond! Thanks, West.andrew.g (talk) 13:50, 24 June 2013 (UTC)
That's great. Congratulations to the developers and the fellow STiki users. Hope that we carry on this cruising progress. Faizan13:54, 24 June 2013 (UTC)
I have a database of every classification done using the tool.
Different queueing/scoring mechanisms use different criteria in choosing what edits to display. CBNG uses primarily language probabilities and may not take into account the permissions of the user making an edit. The "metadata" queue does. Thanks, West.andrew.g (talk) 13:35, 25 June 2013 (UTC)
Change requested
Hey! I've been using STiki a lot these days and the software is amazing. I saw that some of the edits IPs made were constructive. Now, giving credit where credit is due, I suggest we add a feature to thank such IPs for their contribution and welcome them to Wikipedia. Suggestions welcome! —Avenue X at Cicero (t·c) sends his regards @ 14:43, 29 June 2013 (UTC)
This is an interesting idea. I know that in the past Andrew has pointed out that WP:Snuggle is much better equipped for looking after new users and said that he'd prefer STiki to concentrate on vandalism.
I agree with the point about STiki being focused but I also notice that Snuggle only deals with new logged in users.
It makes me wonder. Presumably a Snuggle-like system for IPs would have to be different. IP addresses can be shared. Then again, a lot of them aren't.
Going back to Avenue X at Cicero's point, about thanking... there is the new thank functionality. This gives a "Thank" link next to "Undo" when looking at an edit. Does everyone get that or is it just reviewers or something? The problem is... that can only be used to thank logged-in users!
I am comfortable with implementing this (along with most other suggestions), I just do not want to duplicate the work of the (seemingly) more robust Snuggle system. It appears we might have found a novel niche here -- and the fact we actually have a human classification (and not just a machine generated score) is an asset in the welcoming process. I don't know how the Teahouse does their welcoming work, if someone wants to look into that. If the community can arrive at a criteria and template to support this welcoming, I will encode it as an option within STiki. Some thought also needs to be given to how this is presented to STiki users. Thanks, West.andrew.g (talk) 16:29, 29 June 2013 (UTC)
I don't know exactly how the Snuggle UI works, either. However, there is always the possibility we produce an IRC feed of "innocent classifications of new IP edits". If Snuggle wants to consume this and do their thing, cool. If they don't or we can do it better, I'll just build it directly into STiki. West.andrew.g (talk) 16:38, 29 June 2013 (UTC)
Putting aside last month's queue hanging issue, I've had a number of problems using the CBNG metadata: old edits (as old as 460+ days) appearing after leaving an edit unclassified for a specific period of time (approx 2-4 mins, this does not happen with the STiki metadata), and a lower vandalism catch rate when compared to the STiki metadata. What gives? hmssolent\You rang?ship's log02:49, 26 June 2013 (UTC)
@HMSSolent: One or more people have been getting through classifications at a tremendous rate recently. Whereas usually there are a couple of hundred classifications per day, in the last fortnight or so, there have been thousands. (In the last 24 hours ClueBot NG - 6928(18.00%) Metadata - 893(6.00%)) This is a much faster rate than feeds in, and therefore the backlog has been clearing. I'm putting it down to school holidays. (Andrew, am I right? ) 930913(Congratulate) 03:08, 26 June 2013 (UTC)
Technically all is fine. However, an influx of new and/or exuberant users are now handling upwards of 10,000 classifications per day (at peak). This isn't faster than the feeds come in, but we're certainly pushing to queue depths we haven't seen in the past. Overall hit-rates have been as low as 10-11% as of late (compared to 33% probably a month ago). Most of this traffic is in the CBNG queue, so hit-rates are probably a little denser in the "metadata" queue (but as soon as I say that...). As I've stated, STiki's popularity is a catch-22: the more vandal fighters we have, the less incentive there is for everyone to keep participating. Thanks, West.andrew.g (talk) 03:17, 26 June 2013 (UTC)
One way to improve the hit rate is to widen the whitelist. Currently, edits are ignored if they are by a bot or are a STiki revert. You could easily widen that to include admins and reviewers and maybe even rollbackers.
I know you prefer to let the machine learning speak for itself. However, consider the following two points:
It pragmatically makes sense. i.e. it improves the hit rate for little extra effort and negligible increased risk of missing an actual vandal.
I am fine with doing this and writing other simple filters atop the CBNG logic. Do these cases come up that often? This takes little time investment on my end, but does eat bandwidth, so I'd like to keep the filter relatively narrow (the metadata and CBNG queues are in completely different logical silos). Do we imagine this simple permission rule being the most effective? We're often so concerned with identifying features that best identify vandalism, that we rarely consider the opposite viewpoint (i.e., what features best identify innocent edits) -- as these are not strictly complementary/negation sets. Thanks, West.andrew.g (talk) 23:31, 26 June 2013 (UTC)
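To make the whitelist proposal in this thread concrete, here is a minimal sketch of such a filter. It is not STiki's actual code: the function `skip_edit`, its parameters, and the two sets are hypothetical. The group identifiers are the usual MediaWiki names for bots, admins, reviewers and rollbackers, and which of them to trust is exactly what is under discussion.

```python
# Groups whose edits are skipped today, per the thread: bots only.
CURRENT_SKIP_GROUPS = {"bot"}
# Widened set proposed in the thread (an assumption, not a decided policy).
PROPOSED_SKIP_GROUPS = {"bot", "sysop", "reviewer", "rollbacker"}


def skip_edit(author_groups, is_stiki_revert, skip_groups):
    """Return True if an edit should be kept out of the review queue.

    author_groups   -- iterable of group names held by the edit's author
    is_stiki_revert -- True if the edit is itself a STiki revert
    skip_groups     -- set of group names treated as trusted
    """
    if is_stiki_revert:
        return True
    # Skip when the author holds at least one trusted group.
    return bool(set(author_groups) & skip_groups)
```

Under the current rule an admin's edit would still be queued (`skip_edit(["sysop"], False, CURRENT_SKIP_GROUPS)` is False), whereas the widened set would filter it out.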
Just fired up STiki this morning and the CBNG queue has started showing up 400+ day old edits again - all being constructive edits. None of the edits I've received are anywhere below 24hrs. What's going on? hmssolent\You rang?ship's log03:22, 1 July 2013 (UTC)
Hi everyone - new STiki user here. Is there a way to determine how other STiki users have classified edits that I pass on? It seems like it would improve the learning curve. If there isn't, and other editors think it would be useful, please consider this a request. :-) Arc de Ciel (talk) 08:00, 30 June 2013 (UTC)
I can't do this using available data, but it would not be difficult to implement. How do you envision this information being transmitted to an end user? A rote listing to a wiki page is not the most intuitive, and I imagine it would grow very-very large. West.andrew.g (talk) 22:17, 30 June 2013 (UTC)
Thanks for the responses! I'm really not sure what the best option would be, but I was imagining a window in the STiki interface which would show maybe the 50 most recent classifications you passed on. I don't imagine the data would need to be stored for longer than that. It doesn't even need to identify who made the change, if editors prefer it that way. Arc de Ciel (talk) 03:26, 1 July 2013 (UTC)
Perhaps something as simple as "Of your passes [in x time, y of which haven't been reclassified yet], 40% were classified as vandalism and 60% were classified as innocent, with 30% being passed on again." It could give useful information, such as if it told people that 80% of their passes were innocent. 930913(Congratulate) 09:19, 1 July 2013 (UTC)
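A summary along those lines could be computed roughly as follows. This is a sketch only: it assumes a per-user list of what later happened to each passed edit, which Andrew notes is not available from the current data, and the function name and label strings are made up for illustration.

```python
from collections import Counter


def summarize_passes(later_outcomes):
    """Summarize what other users did with edits this user passed on.

    later_outcomes -- one entry per passed edit: the label another user
    later gave it ("vandalism", "innocent", "agf", "pass"), or None if
    nobody has looked at it yet.
    """
    seen = [o for o in later_outcomes if o is not None]
    pending = len(later_outcomes) - len(seen)
    counts = Counter(seen)
    # Percentages are taken over the edits that have been looked at again.
    shares = {}
    if seen:
        shares = {label: 100.0 * n / len(seen) for label, n in counts.items()}
    return {"total": len(later_outcomes), "pending": pending, "shares": shares}
```

Whether the percentages should be taken over all passes or only over those already reclassified is a design choice the thread leaves open. The sketch uses the latter.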
Also, how should I treat edits which are unconstructive but already reverted, or which I changed through the normal interface because I didn't think any of the four options applied? Thanks. Arc de Ciel (talk) 08:00, 30 June 2013 (UTC)
Classifications made in the panel after an "external edit" has been made will have no effect on the encyclopaedia. That is, making a change in your browser and then pressing "Vandalism (revert)" will *not* revert your edit. However, those classifications are recorded for learning purposes. This way, if someone beats you to a vandalism revert, we can still learn that it is vandalism. If innocent/vandalism/AGF don't apply in your opinion, the "pass" option seems most appropriate -- this is basically a null choice. West.andrew.g (talk) 22:17, 30 June 2013 (UTC)
I see - so I assume that in general, STiki won't take an action if any change is made that would normally prevent an undo due to edit conflict? Then I also assume that in such cases, it won't present the same change to another editor even when I pass? Arc de Ciel (talk) 03:26, 1 July 2013 (UTC)
Yes. If an edit has been made to an article after the one shown in STiki then STiki will not change the encyclopedia and will not show the edit to another user. The new edit will go into the STiki queue, instead of the one that is currently being shown. (If the new edit is not vandalism then the chances are that it is so far down the queue that it will never be seen by a STiki user) Yaris678 (talk) 11:37, 1 July 2013 (UTC)
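The behaviour described in this thread can be summarised as a small decision function. This is an illustrative sketch, not STiki's implementation: the names are hypothetical, and it assumes that both "vandalism" and "agf" classifications lead to a revert and that passes are not stored for learning.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    record_for_learning: bool  # store the label as training data
    revert_on_wiki: bool       # actually change the encyclopedia
    requeue_for_others: bool   # show the same edit to another user


def handle_classification(label, shown_rev_id, latest_rev_id):
    """Decide what happens when a user classifies the edit shown to them."""
    # If a newer revision exists, the shown edit is no longer the current one.
    is_current = shown_rev_id == latest_rev_id
    if label == "pass":
        # A pass is a null choice; the edit goes to someone else only if
        # it has not been superseded in the meantime.
        return Outcome(False, False, is_current)
    # Assumption: "vandalism" and "agf" are the reverting classifications.
    revert = is_current and label in ("vandalism", "agf")
    return Outcome(True, revert, False)
```

For example, `handle_classification("vandalism", 10, 11)` records the label but does not revert, because revision 11 has already superseded the edit that was shown.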
Congratulations, STiki! You're receiving this barnstar because you've made {{{1}}} classification threshold using STiki in January,2025. We thank you both for your contributions to Wikipedia at-large and your use of the tool. We hope you continue your ascent up the leaderboard and stay in touch at the talk page. Thank you and keep up the good work! West.andrew.g (developer) and ~~~~
{{User:Pratyya Ghosh/STiki Tireless Contributors Barnstar}}
You need to add the count by posting like this.
{{User:Pratyya Ghosh/STiki Tireless Contributors Barnstar|3000}} will produce this.
Congratulations, STiki! You're receiving this barnstar because you've made 3000 classification threshold using STiki in January,2025. We thank you both for your contributions to Wikipedia at-large and your use of the tool. We hope you continue your ascent up the leaderboard and stay in touch at the talk page. Thank you and keep up the good work! West.andrew.g (developer) and ~~~~
It'll run from the first day of the month to the last day of the month. What's your opinion?
(1) If you want to glean this from the "last 30 days" column of the leaderboard, that is fine. A little manual math and daily diffs could help you adjust for months with 31 days (and the real kicker, February). I, however, will not be writing a specialized statistical dump for this purpose. (2) The grammar/wording is quite poor in your proposed barnstar. Our boilerplate text is also becoming quite tiresome. Fix these/that and think of something more creative to say. (3) I won't be facilitating this in any official fashion -- but this being a collaborative community, members are allowed to give awards however they see fit. I do think we should be mindful about diluting their significance -- but if this is a single barnstar each month, I do not see the big deal. West.andrew.g (talk) 14:12, 3 July 2013 (UTC)
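One way to do the "manual math" for calendar months is to keep a daily snapshot of a user's running total and subtract. The sketch below assumes such cumulative snapshots are being saved by whoever hands out the award. The function name and data layout are hypothetical and not something the leaderboard provides directly.

```python
from datetime import date


def classifications_in_month(snapshots, year, month):
    """Count classifications made within one calendar month.

    snapshots -- dict mapping a date to the user's cumulative
    classification total at the end of that day.
    """
    in_month = [d for d in snapshots if d.year == year and d.month == month]
    before = [d for d in snapshots if (d.year, d.month) < (year, month)]
    if not in_month or not before:
        raise ValueError("need a snapshot inside the month and one before it")
    # Last total inside the month minus last total before the month began.
    return snapshots[max(in_month)] - snapshots[max(before)]
```

Because the difference is taken between dated snapshots, months of 28, 30 or 31 days need no special handling.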
If it is to be given, I will recommend it only for the table topper each month, so as to keep a value in the award. However it may become bit repetitive as winner will be always one of the two :). On a different note, Andrew do you have any data or graph which shows the total Stiki classifications per day or per month. --Vigyanitalkਯੋਗਦਾਨ10:59, 8 July 2013 (UTC)
I've pasted raw data here. At right is also a CDF of STiki usage produced a couple months back for purposes of my dissertation. The increasing slope basically shows the growing popularity (which has even sharpened over the past few months). Thanks, West.andrew.g (talk) 14:27, 8 July 2013 (UTC)
I don't see the benefit of this. It would add another column to an already wide leaderboard. After all, the percentage (and quantity) of innocent classifications can be derived from other columns on the board: (Vandalism + AGF + Innocent) = 100%. Recall, when "pass" is clicked it is not really a classification but the absence of a classification. Passes are not helpful from a machine-learning standpoint, are difficult to audit, and should not be included in the total due to gamesmanship concerns. West.andrew.g (talk) 14:17, 3 July 2013 (UTC)
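The derivation mentioned here is a single subtraction. As a sketch, assuming the leaderboard gives a user's total classifications (passes excluded) together with the vandalism and AGF counts (the parameter names are illustrative, not the actual column headers):

```python
def innocent_stats(total, vandalism, agf):
    """Derive the innocent count and percentage from the other columns.

    Relies on Vandalism + AGF + Innocent = 100% of classifications,
    with passes not counted in the total.
    """
    innocent = total - vandalism - agf
    return innocent, 100.0 * innocent / total
```

For a user with 1000 classifications, 300 of them vandalism and 30 AGF, this gives 670 innocent classifications, or 67%.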
Critical backend connection failure
Many people confirming the fact the STiki server is down...
Andrew, I'm getting a complete failure to connect to the STiki backend on two different networks. I checked my internet connection (obviously), port 3306 is open to the best of my knowledge, I'm not seeing an update to the software, so this leads me to believe that something is down on the STiki server end. Just wanted to notify you of this. Please let me know if I'm completely wrong. As you know, I use STiki very frequently, though, and I haven't made any firewall adjustments... --Jackson Peebles (talk) 04:41, 18 July 2013 (UTC)
Yep. If anyone feels they need a fix of anti-vandalism work, I recommend you try Huggle, if you haven't already. Not as good as STiki in many ways, but at least its an option. Yaris678 (talk) 13:49, 19 July 2013 (UTC)
A power failure in the server room is to blame (partially the fault of this heat wave, I have to imagine). My contacts estimate everything should be back online within two hours, and I'll set about regenerating the awards and all that stuff fairly quickly. Thanks, West.andrew.g (talk) 14:24, 19 July 2013 (UTC)
It will be fixed within the day. Coincidentally, I am travelling to Philadelphia (where the machine lives) and will get the chance to investigate what keeps going wrong. I can't do anything about the occasional power or networking blip, but the crux of the issue seems to be that the machine will not auto-reconnect to the network when these things occur. Quite literally, these outages are fixed by having a colleague log in to the machine and press a "Connect" button (total time: about 5 seconds). The settings are enabled for auto-reconnect -- they just don't work. Thanks, West.andrew.g (talk) 13:06, 23 July 2013 (UTC)
Done -- I brought the server back up ~8 hours ago. However, the most recent power loss seems to have occurred when STiki was writing to the database. This corrupted the indices on some tables. I am assuming this is what caused the weird/old edits being popped from the queue. A repair action was successful, but I assume it locked the database while it worked, and this is why some people were reporting subsequent difficulty. All works fine for me as of this moment and all statistical updates have been conducted. I also switched network managers on the machine in the hope the machine will come back online more smoothly in the future (it auto-restarts from a power perspective, but the network connection has been the fail point recently). West.andrew.g (talk) 23:38, 24 July 2013 (UTC)
I guess these recent issues are an argument to move the STiki server on to a Foundation-run machine. e.g. at Wikimedia Labs. The Foundation seems to be able to run a less interrupted service.
Another argument in favour is that it might make it easier to do a non-English STiki. (Obviously there would be other obstacles to that, but one less obstacle is good).
Is there currently an obstacle to moving the STiki server onto a Wikimedia Labs machine? e.g. Do you have an account with them, Andrew?
I am not opposed to this suggestion. I do (or at least, did in the context of the copyright stuff) have Wikimedia Labs access. My hesitancy comes from the fact that STiki has a non-trivial footprint and dependency set. It has a 50GB MySQL server behind it, must have port 3306 un-firewalled for communication, and needs IRC on the machine. None of these are severe but there may be a bit of bureaucracy involved. Perhaps we could get the ball rolling with an admin over there to see what might be involved. I'd *really* like to get some of my statistical stuff running over there (assuming they have TB of space to spare) -- so I can start doing other languages -- as I get almost weekly requests for WP:5000 like things for other languages (yesterday it was Russian).
In addition to the additional legwork, I also admit to being stubborn because this simply shouldn't be happening. If I can solve the software issue so that the network interface auto-reconnects, these would be 2-minute outages instead of 2-day ones. However, I also recognize that in the meantime these outages are unacceptable for a tool that is trying to grow, so I am earmarking a transition for investigation. Thanks, West.andrew.g (talk) 17:28, 25 July 2013 (UTC)
Rollback
Hi Andrew. As one of the regular admins who have worked at Wikipedia:Requests for permissions/Rollback for the past couple of years, I'm beginning to feel some concern about how use of Stiki is being accorded. We get requests for the Rollback right from editors who are already using Stiki, but who clearly are still not aware what constitutes vandalism and/or what Rollback and Stiki may be used for. I realise that there may be other criteria for obtaining the use of Stiki, but I have been declining too many applications for Rollback from Stiki users for comfort. Kudpung กุดผึ้ง (talk) 08:42, 26 July 2013 (UTC)
Hi Kudpung, do you have some links that give examples of such declined requests? Perhaps these will show how we need to tighten up the rules. Yaris678 (talk) 10:30, 26 July 2013 (UTC)
I don't think it's a really big issue just for the moment, so I won't delve into the complex system of archiving at PERM, but I'll mention any new instances that come to light. But when someone says: Rollback would be helpful in the instance of mass reverting non-consensual/uncited genre changes by IPs ... I am also using STiki to revert cases of vandalism already, it's kind of worrying. Kudpung กุดผึ้ง (talk) 12:55, 26 July 2013 (UTC)
We are always open to discussion on how to best modify the permissions system, assuming there is evidence to support those suggestions. I'll note that since we changed to the current format I can't think of anyone who has come to WT:STiki to complain about a user, and I can only think of one individual we had to reprimand ourselves. I regularly check the talk pages of STiki users (especially new ones) to ensure they aren't getting complaints from others. I think we are certainly achieving a net positive with those users who use STiki w/o rollback. West.andrew.g (talk) 13:55, 26 July 2013 (UTC)
Hi Andrew. Thanks for that. If you're happy with the way you are according access to Stiki, that's fine with me. I will however continue to decline applications for Rollback if/when a user is not clear on what it is for. Kudpung กุดผึ้ง (talk) 03:59, 27 July 2013 (UTC)
About using STiki
I have 1070+ mainspace edits, and I have been dealing with vandalism for a fairly long time. Can I use STiki now for reversion of vandalism? --AsceticRosé 14:58, 30 July 2013 (UTC)
Stuff that makes it less likely that something is vandalism
I have been thinking about what Andrew said at Wikipedia talk:STiki/Archive 11#CBNG queue. We were discussing the possibility of having a whitelist based on access level (e.g. admins or admins and reviewers). Andrew said
Do we imagine this simple permission rule being the most effective? We're often so concerned with identifying features that best identify vandalism, that we rarely consider the opposite viewpoint (i.e., what features best identify innocent edits) -- as these are not strictly complementary/negation sets.
I think the best way to investigate this is to do a plot similar to Figure 6.5 from Andrew's dissertation. The plots to the right show the sort of thing I mean.
These plots show how the actual revert rate differs from the calculated probability of being vandalism. You could take the raw probabilities from STiki or CBNG or take them after they have been modified for time enqueued. It would be worth doing separate plots for the STiki and CBNG probabilities as I suspect that the CBNG ones will show more change.
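The data behind such a plot could be gathered along the following lines. This is only a rough sketch: the function name, the bin count and the sample data are all invented for illustration, and it assumes each classified edit can be reduced to a (calculated probability, was it reverted) pair.

```python
from collections import defaultdict

def revert_rate_by_bin(edits, num_bins=10):
    """Group (probability, was_reverted) pairs into equal-width probability
    bins and return {bin_index: (mean_probability, revert_rate, count)}."""
    totals = defaultdict(lambda: [0.0, 0, 0])  # sum of probs, reverts, count
    for prob, was_reverted in edits:
        # Clamp so that a probability of exactly 1.0 lands in the last bin.
        idx = min(int(prob * num_bins), num_bins - 1)
        totals[idx][0] += prob
        totals[idx][1] += 1 if was_reverted else 0
        totals[idx][2] += 1
    return {
        idx: (s / n, r / n, n)
        for idx, (s, r, n) in sorted(totals.items())
    }

# Made-up sample data, for illustration only.
sample = [(0.05, False), (0.08, False), (0.07, True),
          (0.85, True), (0.82, True), (0.88, False)]
for idx, (mean_p, rate, n) in revert_rate_by_bin(sample).items():
    print(f"bin {idx}: mean p={mean_p:.2f}, revert rate={rate:.2f}, n={n}")
```

Plotting the revert rate against the mean probability for each bin would show where the two diverge. Running it once on STiki scores and once on CBNG scores would give the separate plots suggested above.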
The second of the two plots uses a very basic and easy to calculate measure of the experience of a registered user: The minimum of the number of edits and the account age in days. Obviously, there are variations on this, such as min(num edits, 2*account age (days)) or min(num edits to NS0, account age (days)). But I suspect the measure given will be pretty good.
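To make the measure concrete, here is a small sketch. The function names and the example account are made up; only the two formulas come from the paragraph above.

```python
from datetime import date

def experience(num_edits, registered_on, today):
    """Minimum of the edit count and the account age in days."""
    account_age_days = (today - registered_on).days
    return min(num_edits, account_age_days)

def experience_doubled_age(num_edits, registered_on, today):
    """Variant from the discussion: min(num edits, 2 * account age in days)."""
    account_age_days = (today - registered_on).days
    return min(num_edits, 2 * account_age_days)

# Made-up account: 1070 edits, registered 1 January 2013, measured 30 July 2013.
print(experience(1070, date(2013, 1, 1), date(2013, 7, 30)))              # 210
print(experience_doubled_age(1070, date(2013, 1, 1), date(2013, 7, 30)))  # 420
```

The point of taking the minimum is that an account needs both a long history and a lot of edits to score highly; neither a very old dormant account nor a brand-new prolific one does.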
Another thing you could do while in the database is see what proportion of classified edits are by Rollbackers etc. and work this out for different calculated vandalism probabilities. I suspect you will find that for a calculated probability of 80% the proportion of rollbackers will be very low, but for a calculated probability of 5% it will be fairly low but significant. This would imply that filtering out rollbackers etc. adds more value as the popularity of STiki increases and the number of low-probability edits being inspected increases too. Then again, maybe the proportion is always very low and it's just that people remember such instances when they happen. This is why looking at the data is good. We may find that filtering by access level isn't worth the CPU cycles but filtering by min(num edits, account age (days)) is.
Going beyond whitelists, I can imagine queue prioritisation based on something that multiplies the CBNG probability by some function of some measure of experience would be an improvement. A figure like "Revert rate by number of edits or account age" would tell you if this is worthwhile and would also indicate what function to use.
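As a rough illustration of what "multiply the probability by some function of experience" might look like: the discount function below and its scale parameter are pure assumptions chosen for the sketch, not anything STiki or CBNG does, and the whole point of the proposed figure is to find out what the function should really be.

```python
def experience_discount(experience, scale=100.0):
    """Hypothetical discount: 1.0 for a brand-new account, shrinking
    towards 0 as experience grows. Shape and scale are assumptions."""
    return 1.0 / (1.0 + experience / scale)

def queue_priority(cbng_probability, experience, scale=100.0):
    """Queue priority = calculated probability * discount for experience."""
    return cbng_probability * experience_discount(experience, scale)

# Same calculated probability, different (made-up) experience values.
for exp in (0, 100, 1000):
    print(exp, round(queue_priority(0.5, exp), 3))
```

Unlike a whitelist, this never removes an edit from the queue outright; an experienced editor's edit with a very high calculated probability can still outrank a newcomer's edit with a low one.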
A whitelist is a good idea. I'd also suggest edits made by rollbackers - sometimes I see a Huggle edit and it obviously was reverting vandalism, or an admin making edits, and they're definitely not vandalism. -- kikichugirl inquire 19:01, 30 July 2013 (UTC)
Yaris, I like these suggestions and will eventually get these graphs produced -- they are a bit more involved than simple/straightforward queries. Should be interesting, though. I'm just a touch overwhelmed at the moment and I have promised some other WikiProjects some data processing first. Contributing to that stress, I will be in Hong Kong for Wikimania should any other users be there and like to have a chat/drink. Thanks, West.andrew.g (talk) 23:24, 30 July 2013 (UTC)
Thanks Andrew. Looking forward to those graphs. I appreciate you are very busy.
I won't be in Hong Kong for Wikimania 2013. However, it is very likely that I will be at Wikimania 2014 in London. (Since it's so close I would be silly not to go.)
Hi! I read your dissertation after it was mentioned in this week's "Signpost", and that set me thinking.
For over two years now I've been checking the recent changes feed for the "help" and "portal" namespaces. No bots patrol those edits, and I *think* I am the only editor checking them systematically. So when I check in at 6-7am in the morning (I'm in the UK), it's not unusual to find that highly-visible pages have been overwritten hours ago with stuff that ClueBot would have reverted from articles in seconds.
Now, Cluebot won't touch pages outside the main namespace because it hasn't been trained. Could these edits be fed into the STiki queues somehow?
Here are examples of my reverts in the Portal and Help namespaces.
One "gotcha" is that the page view statistics are not reliable for portal sub-pages. For example, the statistics for Portal:Technology/Intro claim that it is viewed very rarely, but it is actually transcluded into Portal:Technology and is viewed 2000 times a day.
Oh goodness, someone has actually *read* that? The namespaces you mention seem to be very valid candidates for autonomous inspection. STiki and its metadata model are perhaps better poised to do this than the CBNG folks (I imagine the "help" namespace in particular has some very different language conventions than normal article text). There are some barriers to accuracy and bootstrapping, e.g., the reputation metrics for these articles need to accumulate. The one you mention about page popularity statistics is an insignificant factor, though.
There will be some minor issues and we probably won't be able to reach the performance benchmarks achieved with vandalism. However, if it's just a matter of piping these edits through existing models to provide an improvement over the status quo, then this shouldn't be terribly hard from a code perspective. I'd prefer to dump these edits into an alternate queue for inspection. I'll put this on my investigation list (and add to the feature table); but I'll be honest that I am a bit backlogged right now with stuff for WP:Turnitin and some page popularity stuff. Thanks, West.andrew.g (talk) 16:59, 5 August 2013 (UTC)
Yes. This seems like a sensible improvement on the status quo and doing it as a separate queue makes sense.
Would it make sense to also include the Wikipedia: namespace? I think this will be similar to Portal: and Help: in the important respects. Or is the Wikipedia: namespace better covered by other things?
The User: namespace would be an interesting one... I think it would be basically the same but we might want to exclude edits made by a user in their own user space.
Templates might also be similar in terms of the learning from metadata. However, it might be worth putting the Template: namespace in a separate queue, just because some STiki users would be more familiar than others with the template language.
And then you've got all the associated talk pages. I suspect these will be different in some way.
I hope I haven't just made things unnecessarily complicated. You could, of course, just start with the Portal: and Help: namespaces and see how that goes before considering the other possibilities.
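For what it's worth, the routing described in the last few paragraphs could be sketched roughly as below. The queue names and the function are hypothetical, and the namespace numbers are the standard ones on the English Wikipedia (User 2, Wikipedia 4, Template 10, Help 12, Portal 100). Articles and talk pages are deliberately left out.

```python
NS_USER, NS_PROJECT, NS_TEMPLATE, NS_HELP, NS_PORTAL = 2, 4, 10, 12, 100

def pick_queue(namespace, title, editor):
    """Return a (hypothetical) queue name for an edit, or None if this
    sketch does not route it (articles, talk pages, own-userspace edits)."""
    if namespace in (NS_PORTAL, NS_HELP, NS_PROJECT):
        return "non-article"
    if namespace == NS_TEMPLATE:
        # Separate queue for reviewers comfortable with template syntax.
        return "template"
    if namespace == NS_USER:
        # "User:Example/sandbox" -> owner "Example"
        _, _, rest = title.partition(":")
        owner = rest.split("/", 1)[0]
        if owner == editor:
            return None  # users editing their own user space are excluded
        return "non-article"
    return None

print(pick_queue(NS_PORTAL, "Portal:Technology/Intro", "Example"))
print(pick_queue(NS_USER, "User:Example/sandbox", "Example"))
```

Starting with only Portal: and Help: would just mean shrinking the first tuple.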
I have been notified by network administrators that there will be planned network downtime on Friday, August 16th, between 7 AM and 10 AM (Eastern time of the United States, i.e., NYC time). This should be a decent opportunity to test the machine's ability to bring itself back online. I will be watching the status and my contact in PHL is also aware of the possibility things might not go as planned (in which case transitioning to Labs infrastructure will gain some priority on my TODO list). Thanks, West.andrew.g (talk) 17:32, 6 August 2013 (UTC)
Evaluation of STiki
I see some editors making hundreds of reverts in a day or two using your tool (mostly edits by IPs, by the way) and I'm left with some questions:
Have you evaluated your software/script to see how many acts of "vandalism" that you detect are actually vandalism?
Are you currently evaluating your success rate in detecting edits that deserve to be reverted? Do you spot check examples to see whether edits you highlight as bad or vandalism are not valid?
Is there some kind of bragging rights that go to editors who complete the most reverts over a short period of time? Do you think quantity of reverts is somehow more important than the amount of thought and consideration that should be given before reverting another editor's work?
Do you target edits from IP users and are they subjected to more scrutiny than registered users?
Are there actually users who are getting blocked solely because your tool says they are vandalizing Wikipedia?
STiki doesn't do any reverts itself. It merely shows edits to users and allows them to revert the edit if they so choose. In terms of blocking, there are more people involved, not least the administrator that makes the block. An admin will always look at the facts before blocking. If an admin has used poor judgement, that is not a STiki issue.
STiki will serve edits to users based on its calculated probability that the edit is vandalism. Of course, this is just a probability, which is why the users are important. STiki can serve edits by IPs and registered users. If the greatest proportion of these is from IP edits, the most likely explanation is that the greatest proportion of vandalism comes from IP edits. However, it is possible that the two proportions (of served edits and of edits that are vandalism) are not equal due to the different information available (IPs give location, accounts give age of account, for example).
The danger of some editors getting "edit-count-itis" has been noted before. Editors are encouraged to use proper judgement. If you think a particular editor is not taking enough time over decisions it is probably best to take it up on their user-talk page. If that doesn't help then you can bring it up here and if necessary they can be blocked from using the tool or blocked altogether.
Yaris has done a fine job in his general answer, so I'll be brief in providing a more technical response.
Have you evaluated your software/script to see how many acts of "vandalism" that you detect are actually vandalism?
Yes, we have used established anti-vandalism corpora to train and evaluate our classifiers under cross-validation. The WP:STiki page references the academic writings that contain these results. As a tool, however, STiki makes no guarantees that it is displaying vandalism. The tool permits orderly access to queues sorted by vandalism probability. If enough people are using the tool, it may be likely they are *not* seeing vandalism (but probabilistically speaking, the edit is more likely vandalism than those not yet inspected by tool users). It does not make classifications along binary terms.
Are you currently evaluating your success rate in detecting edits that deserve to be reverted? Do you spot check examples to see whether edits you highlight as bad or vandalism are not valid?
Indeed we do, particularly those of new users. It should be mentioned however that there are entry requirements for users who use the tool so these are not complete novices. If they are screwing up here, users are just as likely to be erring elsewhere. Neither is acceptable.
Is there some kind of bragging rights that go to editors who complete the most reverts over a short period of time? Do you think quantity of reverts is somehow more important than the amount of thought and consideration that should be given before reverting another editor's work?
Reverts are never rewarded. We do encourage participation via the "classifications" metric. Quality is generally more important than quantity, and that is why the "pass" option exists so users who feel inclined to move quickly do not need to muck up the borderline cases.
Do you target edits from IP users and are they subjected to more scrutiny than registered users?
If "targeting" means using prior statistical evidence as the basis for predictions, then "yes". If my memory is correct, something on the order of 80% of vandalism is caused by IP users. However, there are also reputation and geo-location metrics that branch off of IP address that could lessen or increase the scrutiny given to any individual IP address.
Are there actually users who are getting blocked solely because your tool says they are vandalizing Wikipedia?
Again, my tool claims nothing is vandalism in the binary sense. It only shows the most likely vandalism that has not already been inspected. The burden falls to humans to decide what is vandalism. Even then, a set of users disjoint from the tool determines whether abusive histories are deserving of block actions.
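For readers who find code easier than prose, the "most likely vandalism that has not already been inspected" behaviour described in these answers can be pictured as a priority queue. The sketch below is a toy illustration with invented names (EditQueue, next_edit and so on); it is not STiki's actual implementation, which sits on a database.

```python
import heapq

class EditQueue:
    """Toy queue that serves edits in descending order of calculated
    vandalism probability, skipping anything already inspected."""

    def __init__(self):
        self._heap = []          # entries are (-probability, rev_id)
        self._inspected = set()

    def enqueue(self, rev_id, probability):
        # heapq is a min-heap, so negate the probability.
        heapq.heappush(self._heap, (-probability, rev_id))

    def mark_inspected(self, rev_id):
        """Record that someone has already looked at this edit."""
        self._inspected.add(rev_id)

    def next_edit(self):
        """Return (rev_id, probability) for the most likely vandalism
        not yet inspected, or None if nothing is left."""
        while self._heap:
            neg_p, rev_id = heapq.heappop(self._heap)
            if rev_id not in self._inspected:
                self._inspected.add(rev_id)
                return rev_id, -neg_p
        return None

queue = EditQueue()
queue.enqueue(1, 0.2)
queue.enqueue(2, 0.9)
queue.enqueue(3, 0.5)
queue.mark_inspected(2)     # e.g. already handled by another user
print(queue.next_edit())    # the human then decides: revert, pass or innocent
```

Note that nothing in the queue labels an edit as vandalism; it only decides the order in which humans see edits.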
Anything unusual going on, or is it really that no-one has passed a milestone in the last 6 days? Just thought it was a bit mysterious. -- tnumbermaniac c 07:16, 19 August 2013 (UTC)
I have no evidence that anything unusual is going on, and technically all is okay with the reporting. Non-scientific user-page browsing would suggest that lots of our additions over the past weeks/months were students possibly on Summer holiday. Presumably we will see fewer of these team members as classes resume, and fewer new ones altogether. I'm always a fan of recruitment, but it must be carefully done to avoid spam-like behaviors. West.andrew.g (talk) 15:19, 19 August 2013 (UTC)
I think WP:FULL is overkill. Lots of non-admins have contributed much to the authoring of that page. It has been under WP:SEMI before when it got some bursty vandalism, and I think that is appropriate as necessary. One could change the links to something malicious, I suppose, but this is true of many links on Wikipedia. STiki is probably a far less prominent threat than many others. In the extreme case one could place links on a separate page under full protection and then transclude them into the larger document. West.andrew.g (talk) 02:06, 20 August 2013 (UTC)
See the above post (two up) about anticipated downtime. It's apparently stretched on a bit longer than expected. I have a contact at UPenn that can confirm when everything has broadly come back up for the University so we can confirm that STiki should be doing the same. Thanks, West.andrew.g (talk) 17:00, 16 August 2013 (UTC)
That speaks to difficulty on your end, then. The server was known to be up and working less than 30 hours ago, because I have operational reports of that fact -- and it managed to update the milestones/leaderboard/counter at 1AM (local to me) last night without issue (about 12 hours ago). Everything I have here suggests it came down around 7AM when the maintenance was planned to begin. Thanks, West.andrew.g (talk) 18:03, 16 August 2013 (UTC)