User talk:AndrewRT/Metrics


Musings on page quality [4 October 2015]

I've been musing on metrics, and my thoughts may intersect with yours, especially with regard to your section "Number of Sourced, Dated Person pages". Comparing WeRelate to other web sites on the number of person pages represents a tilted playing field. From what I can see, many of those web sites are full of unsourced assertions, partial or incorrect data, and numerous duplicates. So a raw page/person count is really meaningless; it's like comparing our cooking skills based on an eating contest. In my mind, the strength of WeRelate is that it represents the [The WikiWay] -- that by collective effort we are gradually improving the quality of genealogical information.

So, we should have a metric which measures quality. Your section "Number of Sourced, Dated Person pages" represented a good start, but I thought of a more comprehensive measurement: a quality score for each page. I put together some rudimentary code to go over the WeRelate data and generate such a score for each person page. My initial calculation is something like this: +1 for each event, +1 for each source ref on that event, +1 for each source, +1 for referencing a Source: page (rather than a MySource), +1 for text in the source, +1 for each image, and +1 for each 256 bytes of text. The specifics of that formula are up for debate, but I think you can see the intent. I'm still working the kinks out of my script, so it hasn't gotten all the way through the data yet, but thus far the average score for person pages is about 4.1. Clearly something needs tuning, though, as the page with the highest score is Person:Elhanen Wakefield (1).
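The scoring rules above can be sketched roughly as follows. This is only an illustration of the intent, not the actual script: the field names (`events`, `source_refs`, `is_source_page`, and so on) are hypothetical stand-ins for whatever structure a parser of the WeRelate dump would produce.

```python
def quality_score(person: dict) -> int:
    """Toy version of the proposed per-page quality score.

    Rules mirrored from the description above:
      +1 per event, +1 per source ref on an event,
      +1 per source (+1 if it cites a Source: page rather than a
      MySource, +1 if it carries text), +1 per image,
      +1 per 256 bytes of page text.
    """
    score = 0

    # Events and the source refs attached to them.
    for event in person.get("events", []):
        score += 1
        score += len(event.get("source_refs", []))

    # Source citations on the page.
    for source in person.get("sources", []):
        score += 1
        if source.get("is_source_page"):   # Source: namespace, not MySource
            score += 1
        if source.get("text"):             # citation carries supporting text
            score += 1

    # Images and free text.
    score += len(person.get("images", []))
    score += len(person.get("text", "").encode("utf-8")) // 256

    return score


# Hypothetical example page: 2 events (one with a source ref),
# 1 Source:-page citation with text, 1 image, ~300 bytes of text.
sample = {
    "events": [{"source_refs": ["S1"]}, {"source_refs": []}],
    "sources": [{"is_source_page": True, "text": "p. 12"}],
    "images": ["portrait.jpg"],
    "text": "x" * 300,
}
print(quality_score(sample))  # 3 + 3 + 1 + 1 = 8
```

One tuning knob this makes visible: long free-text pages accumulate score linearly, which may explain why a single verbose page can top the rankings; capping or log-scaling the text term would be an easy fix.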

Anyway, I hope that all made sense. --Trentf 12:59, 3 October 2015 (UTC)

Hi Trentf. Thanks for your comments. As you can see, my enthusiasm for WeRelate has waned somewhat - I have sadly concluded that the culture here means it will never be a success. For me, the success of both wikis and pandos comes down to volume (however measured), and volume isn't sufficiently valued here.
Regarding your suggestion - yes, it makes a certain sense, but the more complicated you make the measure, the harder it becomes to compare different sites. That ultimately is key: WeRelate has to define its USP by saying "we are the best at x". It's no good if "x" is something you can measure for WeRelate but can't for, say, WikiTree. AndrewRT 17:06, 3 October 2015 (UTC)
Thanks for your feedback. I figured it was a weird idea resulting from my pervasive ignorance of most of the issues involved (aside from the programming issues). I've spent the last couple of months generating metrics at work, so perhaps I had hold of Maslow's hammer :) Anyway, I'm having fun messing around with code to gather various information from the WeRelate dump file. I'll continue to do so, and eventually it will show up on GitHub. Thanks again. --Trentf 00:39, 5 October 2015 (UTC)