User talk:J.smith/pseudocode

overview

Yes I think the concept is sound, but theres one key aspect missing from the code currently. As is it checks to see if the page contains the string, but doesn't know if it was added in that exact edit. As a result, this code will ALWAYS run it's max length (which isn't too bad at log(n) but still). If it checked the diff and the page it would know if it found the right one first try and get to stop. I might be making an issue of something that isn't an issue though. Vicarious 06:04, 29 December 2006 (UTC)[reply]

Without loading DIFFs insteed of OLDIDs there would be no way to know if that was the first version or not... and I don't think there is a way to load diffs from Query.php.
I was playing with a second idea... running instead of at 50%... running it at 10% or 90% depending on user input. Also, I was planning on adding a "choke" - An option to cut off how many oldids to load. (if you know it's in the last 200 edits that reduces the number of checks by a ton.) ---J.S (T/C) 06:12, 29 December 2006 (UTC)[reply]
This search will be running on a relatively small array. Logarithmic searching is quick on trillions of items, running it on 10,000 or so items should be very fast. The computing power required for this alogrithm is negligble relative to searching the page for the string, and that will only need to be done 10-20 times. Also, specifying the number of edits to check isn't as helpful as it would seem. Searching the last 200 edits takes 8 queries (256), searching 10,000 takes 14 (16,384). I think any benefit to searchtime would be offset by the time it takes the user to type the number. Vicarious 06:49, 29 December 2006 (UTC)[reply]
Well, one thing I'm concerned about is the actual download time from a slow wikipedia server. If the page is 100k the run-time at a busy time of day might be a few minutes. Then again, I'm sure most of the time we'll only be dealing with 1000 edits or less. ---J.S (T/C) 07:03, 29 December 2006 (UTC)[reply]

final output

I just ran through this code on paper with a sample array of length 12 and 8 (and the answer was in 3). With 12 it stopped when pages left was 1 and gave a result of 2. With 8 it stopped when pagesleft was 2 and the result was 3. By changing loop so it goes until pagesleft is 1 the array of 8 looped once more and ended at 2 also. Basically, we've made a function that finds the last page that does NOT contain the string. Vicarious 06:28, 29 December 2006 (UTC)[reply]

Hmmm interesting. If we changed it from +1 to +2 it would return the first oldID then? With both the last oldID and first oldID we could generate a diffLink for the user of the software. ---J.S (T/C) 06:31, 29 December 2006 (UTC)[reply]
Ugh, I spoke too soon, currently the search will unpredictably find either the first page with it or the last page without it, but that's easy to fix, give me a sec. Vicarious 06:34, 29 December 2006 (UTC)[reply]
There, now it always returns the pagenumber of the first page containing the searchstring. To make a diff you'd simply compare it to pagenumber-1 Vicarious 06:39, 29 December 2006 (UTC)[reply]
How are you rounding in your tests? Nearest or always down? ---J.S (T/C) 06:45, 29 December 2006 (UTC)[reply]
I assumed intiger math which cuts the decimal, so I've been flooring everything. Vicarious 06:49, 29 December 2006 (UTC)[reply]
Perfect. You read my C++ mind. I'll let you know when I have something like actual code writen for this... (sometime tommorow). ---J.S (T/C) 06:59, 29 December 2006 (UTC)[reply]

tiny code

Although this drastically reduces readability, it occured to me how small this code could be. If CheckPage returns an int of positive or negative 1 instead of a bool then this:

	PageToCheck = DiffList[PageNumberToCheck, 0]
	IsFound = CheckPage(PageToCheck, SearchString)

	If (IsFound = TRUE)			//If "checkpage" resulted in a "positive"
		PageNumberToCheck = PageNumberToCheck - (PageLeft/2)
	Else			
		PageNumberToCheck = PageNumberToCheck + (PagesLeft/2)

could be turned into:

	PageNumberToCheck += (CheckPage(DiffList[PageNumberToCheck, 0], SearchString)*(PageLeft/4))

This is completely irrelavent, but still makes me smile. Vicarious 07:13, 29 December 2006 (UTC)[reply]

"page"

Perhaps you should use "revision" instead of "page" where relevant. -- Jmax- 09:20, 29 December 2006 (UTC)[reply]

Sure. It would make the intent a bit clearer. ---J.S (T/C) 16:36, 29 December 2006 (UTC)[reply]