User:AndrewRT/Metrics

The purpose of this page is to set out some simple proposed metrics for measuring and tracking the success of WeRelate. The metrics are intended to be listed in order of importance. Each metric is described and then followed by a rationale and a suggestion for how it could be measured, or by similar existing measures. Although this page is in my userspace, please feel free to edit it with your ideas.


Primary measures

All metrics would be measured at a point in time and then the measurement repeated periodically to reveal change over time.

Number of Sourced, Dated Person pages

Rationale: As stated at WeRelate:About, the goal is to be the "#1 community website for genealogy". This is taken to mean covering more people to a good standard than any other community website. This site is also based on the concept of a Pando, i.e. a single tree that unites all people interested in the same line. Pandos come into their own when a large number of people are covered, because that increases the chance that researchers come across others researching the same line. The coverage has to be of good quality, because otherwise you cannot correctly merge people or address inconsistencies.

Measurement: Number of Person: pages with at least one Source and at least one date.

Mechanisms: The number of Person pages is given in the top right corner of this query. Sourced pages might be shown with the following query: [1]

I don't know how this could be filtered to show only sourced or dated pages.
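As a rough sketch of the filter itself, assuming Person pages could be exported and parsed into simple records (the `PersonPage` structure and the example titles below are hypothetical, not an existing WeRelate export format), the count might look like this:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PersonPage:
    # Hypothetical, simplified view of a Person page. Real pages are wiki
    # text, so these fields would first have to be parsed out of each page.
    title: str
    sources: List[str] = field(default_factory=list)
    dates: List[str] = field(default_factory=list)  # birth/death/event dates


def is_sourced_and_dated(page: PersonPage) -> bool:
    """True if the page cites at least one source and records at least one date."""
    return len(page.sources) >= 1 and len(page.dates) >= 1


def count_sourced_dated(pages) -> int:
    return sum(1 for p in pages if is_sourced_and_dated(p))


pages = [
    PersonPage("Person:John Smith (1)", sources=["Source:1851 Census"], dates=["1820"]),
    PersonPage("Person:Mary Jones (2)", dates=["1825"]),                      # no source
    PersonPage("Person:Ann Brown (1)", sources=["Source:Parish Register"]),  # no date
]
print(count_sourced_dated(pages))  # 1
```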

Target: The target could be determined by comparing with the number of people covered on other genealogy websites (e.g. ancestry.com has 4,000,000,000 according to [2], familysearch 200,000,000 [3], rootsweb WorldConnect 640,000,000 [4], FamilyPedia 127,000 [5]; note that many of these could be duplicates).

Comments:

You could also measure number of Family pages, Place pages or Source pages, but ultimately these are indirectly linked to the number of Person pages.

I think we have two different hypotheses (and therefore measurables) here:
If WR covers (i.e., has pages for) a large number of people, then we will unite all people interested in the same line. Not sure the logic flows well on this one.
If WR's coverage is good quality, then we can correctly merge duplicates and address inconsistencies.
And yet another implicit hypothesis which perhaps should be more explicit:
If WR's coverage is good quality, then more people will come and contribute (which, based on the earlier hypotheses above, leads to WR being the #1 community genealogy site)
Yeah, I don't think this is going to get you what you want. Perhaps there's a way to count non-blank fields on a person page; that assumes the following hypothesis:
The more fields are filled in, the more likely the quality is high. That's a pretty iffy hypothesis.
How would we recognize quality coverage? Is it the ratio of REF tags (or otherwise footnoted material) to all text in a given profile page?
Jillaine 18:15, 16 December 2012 (EST)
I've reworded my text in "target" as that may be confusing. As I've said above, my hypothesis is that the more sources and dates are filled in the easier it would be to identify and correct errors and merge duplicates. I think that's a reliable hypothesis. I've avoided the "quality" measures as I agree they would be hard to measure - but please feel free to add one if you can think of one that would work. AndrewRT 18:36, 16 December 2012 (EST)
My main hypothesis is as above: I see the "#1 community website" as meaning more good quality people covered than other community websites. A secondary hypothesis is that the benefit of pandos only starts to emerge when you have large numbers of people covered. The rule of thumb is: how likely are you to come across a distant cousin who has already researched your line further than you have? For me, this has happened many times on ancestry.com (which covers ca. 500,000,000 people), but never on WeRelate (which has 2,300,000). Your fourth hypothesis could work both ways: a high bar for quality could as easily put people off contributing. AndrewRT 18:36, 16 December 2012 (EST)
So... given what you've just said, how, if at all, does that change your thinking about what you'd measure? Jillaine 18:57, 16 December 2012 (EST)
I've changed the priority and clarified the rationale set out above. AndrewRT 20:06, 16 December 2012 (EST)

Number of active users

Rationale: As stated at WeRelate:About, "WeRelate.org is about social networking, sharing research, and collaboration". The goal is to be the "#1 community website for genealogy". This is only possible if there is a large number of individuals actively contributing to research on WeRelate.

a) What does success look like? If WR were the #1 community website for genealogy, what would that look like? Answering this will help clarify what needs to be measured.
b) Hypothesis that emerges from above: "If a large number of individuals actively contribute to research on WR, then WR will be the #1 community website for genealogy." Check the hypothesis; does it accurately reflect what you want to convey?
Jillaine 18:15, 16 December 2012 (EST)
Thanks for the comments. Broadly yes, but I see my 'hypothesis' slightly differently. I see the "#1 community website" as meaning more good quality people covered than other community websites. Number of users is a leading indicator of this, i.e. if you get the highest number of people involved, you will end up with the largest number of people covered. AndrewRT 18:25, 16 December 2012 (EST)
Andrew, this is exactly why I do this work-- to help people articulate their hypotheses. I propose a hypothesis based on what I see in the words, then check it with its initiator, who then-- as you did-- clarifies their thinking/communication.
So then, taking your hypothesis to the next level, are you saying that "If WR has the largest number of people covered, it will become the #1 community genealogy website?" Jillaine 18:55, 16 December 2012 (EST)

Measurement: The number of registered users (currently 38,968) can be misleading, as it includes people who came along, were active for a short time and then dropped off. The number of people who make 5 or more edits in a given month is more useful, as it shows active users. The Wikimedia Foundation (which runs Wikipedia) uses this as a key metric, as can be seen from its report card.

Mechanisms: It could be calculated by extracting the usernames from Special:Recentchanges (which would need roughly the last 20,000 edits) or from Special:Newpages.
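A minimal sketch of the "5 or more edits in a month" count, assuming the usernames and timestamps have already been extracted from Special:Recentchanges into (username, timestamp) pairs. The function name and sample data are hypothetical:

```python
from collections import Counter
from datetime import datetime


def active_users(edits, year, month, threshold=5):
    """Return the usernames with at least `threshold` edits in the given month.

    `edits` is an iterable of (username, timestamp) pairs, e.g. as extracted
    from Special:Recentchanges.
    """
    counts = Counter(
        user for user, ts in edits if ts.year == year and ts.month == month
    )
    return {user for user, n in counts.items() if n >= threshold}


edits = [("Alice", datetime(2012, 11, d)) for d in range(1, 7)]    # 6 edits
edits += [("Bob", datetime(2012, 11, d)) for d in range(1, 4)]     # 3 edits
edits += [("Carol", datetime(2012, 10, d)) for d in range(1, 10)]  # other month
print(sorted(active_users(edits, 2012, 11)))  # ['Alice']
```

Counting per calendar month keeps the figure comparable from one period to the next, which is what the repeated measurement described at the top of the page needs.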

Target: [unsure] Could a target be based on a measure of the number of people active in genealogy in the world, or number of active users for other websites (e.g. ancestry.com, familysearch, wikipedia)?

New user retention

Rationale: The site will only grow if we are able to recruit and retain new contributors. Analysis for Wikipedia has shown that this is a particular issue for wikis, especially after they pass through their initial growth stage.

Hypothesis: If (and only if) we recruit and retain new contributors, then the site will grow.
Hypothesis embedded above: If the site grows, then WR will become the #1 community genealogy web site.
Jillaine 18:15, 16 December 2012 (EST)
I would go slightly further than this, in that I would say (following #1 above) that the target number of users is significantly higher than the current number of users, and therefore the only way to reach target #1 is through more targeted measurement of #2. AndrewRT 18:27, 16 December 2012 (EST)

Measurement: The number of new users who are still making at least one edit per month 12 months after their first edit. This is the measure that Wikipedia uses. Given that WeRelate is younger, there could be a case for reducing the time period to 6 months, which would also show the benefits of any activities done to improve retention.

Mechanisms: Not sure how you work out when someone joins without going through the full edit log.
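One possible way to compute this, assuming the full edit log is available as (username, timestamp) pairs: each user's join month is taken to be the month of their first edit, and a user counts as retained if they make at least one edit in the calendar month 12 months later. Users who joined too recently to have reached that month are left out. This is one interpretation of the measure, not necessarily Wikipedia's exact definition, and the function names are hypothetical:

```python
from datetime import datetime


def month_index(ts):
    """Map a timestamp to a running month number so months can be subtracted."""
    return ts.year * 12 + (ts.month - 1)


def retention_rate(edits, as_of, months_later=12):
    """Share of eligible users who edit in the month `months_later` after joining.

    Assumption: `edits` is the *full* edit log, since a user's first edit
    can only be found by scanning all of it.
    """
    first = {}   # user -> month of first edit
    active = {}  # user -> set of months with at least one edit
    for user, ts in edits:
        m = month_index(ts)
        first[user] = min(first.get(user, m), m)
        active.setdefault(user, set()).add(m)

    cutoff = month_index(as_of)
    # Only users old enough to have reached the check month are counted.
    eligible = [u for u, m0 in first.items() if m0 + months_later <= cutoff]
    if not eligible:
        return 0.0
    retained = sum(1 for u in eligible if first[u] + months_later in active[u])
    return retained / len(eligible)


edits = [
    ("Alice", datetime(2011, 1, 5)), ("Alice", datetime(2012, 1, 9)),  # retained
    ("Bob", datetime(2011, 1, 20)),                                    # drive-by
    ("Carol", datetime(2011, 3, 1)), ("Carol", datetime(2011, 4, 2)),  # stopped
    ("Dave", datetime(2012, 6, 1)),                                    # too new
]
print(round(retention_rate(edits, datetime(2012, 12, 16)), 3))  # 0.333
```

Passing `months_later=6` gives the shorter 6-month variant suggested above.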

Target: Wikipedia has around 10%. Inevitably with a wiki you will get a large proportion of "drive-bys" who try it and then leave, but I think 50% retention would be a real positive sign of success.

Page views

Rationale: This is a good, indirect measure of the site's usefulness, influence and quality. People seeing the site are more likely to convert into new users. It also reflects how many other sites link to WeRelate and how it ranks in search engine results.

Hypothesis: If more people view the site, then they will convert to new users (... and lead to WR being the #1 community genealogy site...) Jillaine 18:15, 16 December 2012 (EST)
No, that isn't my hypothesis. My hypothesis is that if more people see the page this is an indication of the value of the information contained in the page. The focus is on "readers" - the increase in contributors is a secondary benefit. I think it's important that we do focus on readers at some stage - recognise that we are trying to create a product that is of value to people visiting the site, not just providing a repository for contributors to store their information. AndrewRT 18:40, 16 December 2012 (EST)
Ah, thanks for the clarification. I might word the hypothesis the other way around: "If the information contained on the page is high quality, then more people will visit the site...".
But yes, there's another piece here about readership (vs participation). What I think I'm understanding from you is that the value of WR (and hence its ability to achieve #1 status) is tied to two things:
1) quantity and quality of participation
2) quantity of readership
Each of those has two very different measurables.
Thanks for letting me play here. Jillaine 19:02, 16 December 2012 (EST)

Measurement: Unique page views per month

Mechanisms: Site administrators presumably would have access to this.

Target: [unsure] Could be linked to the number of page views for other genealogy websites (e.g. ancestry.com, familysearch, rootsweb)?

Seems like page views and server stats have become far more complex in recent years. Can WR distinguish between humans and "spiders" that "view" pages? Over the more recent years, I've come to no longer rely much on such stats. Just not sure they'll provide the value you need to measure success. Jillaine 18:15, 16 December 2012 (EST)
I agree we would need to distinguish between the two but I believe some tools can do this. I'm afraid it's out of my field so I would leave others to comment. AndrewRT 18:40, 16 December 2012 (EST)
Yeah, I used to be in that field (online communications), but veered away in recent years. I haven't kept up on that technology. Jillaine 19:02, 16 December 2012 (EST)
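For illustration only, a naive sketch of counting unique page views while skipping obvious spiders, assuming parsed server log entries of the form (visitor id, page, timestamp, user agent). "Unique" is taken here to mean distinct visitor/page pairs per month, and the user-agent pattern is a crude assumption; dedicated log-analysis tools use much more thorough bot detection:

```python
import re
from datetime import datetime

# Assumption: a crude pattern for common crawler markers in the User-Agent.
# Real bot detection needs a maintained list and behavioural checks.
BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def unique_page_views(hits, year, month):
    """Count distinct (visitor, page) pairs in a month, skipping likely spiders.

    `hits` is an iterable of (visitor_id, page, timestamp, user_agent) tuples,
    a hypothetical shape for parsed web-server log lines.
    """
    seen = set()
    for visitor, page, ts, agent in hits:
        if ts.year != year or ts.month != month:
            continue
        if BOT_PATTERN.search(agent):
            continue
        seen.add((visitor, page))
    return len(seen)


hits = [
    ("1.2.3.4", "Person:John Smith (1)", datetime(2012, 11, 3), "Mozilla/5.0"),
    ("1.2.3.4", "Person:John Smith (1)", datetime(2012, 11, 4), "Mozilla/5.0"),  # repeat
    ("5.6.7.8", "Person:John Smith (1)", datetime(2012, 11, 4), "Mozilla/5.0"),
    ("9.9.9.9", "Place:England", datetime(2012, 11, 5), "Googlebot/2.1"),        # spider
]
print(unique_page_views(hits, 2012, 11))  # 2
```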

Number of edits

Rationale: Number of edits is a simple measure that indicates activity on the site. Although not readily comparable to other sources, the change over time indicates whether the site is growing or tailing off.

Measurement: Edits per month, exclusive of GEDCOM uploads, to content pages (Person, Family, Source, Place).

Mechanisms: Can be derived from a query on Special:Recentchanges. Not sure how GEDCOM edits would be excluded.
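A sketch of the monthly count, assuming each edit can be represented with its page title, timestamp and a flag saying whether it came from a GEDCOM upload. How WeRelate actually marks GEDCOM-generated edits is not established here, so the `from_gedcom` field (like the rest of the hypothetical `Edit` record) is an assumption:

```python
from collections import Counter
from datetime import datetime
from typing import NamedTuple

# Namespaces counted as content pages, per the measurement above.
CONTENT_NAMESPACES = {"Person", "Family", "Source", "Place"}


class Edit(NamedTuple):
    # Hypothetical record; the from_gedcom flag is an assumption.
    user: str
    title: str            # e.g. "Person:John Smith (1)"
    timestamp: datetime
    from_gedcom: bool


def edits_per_month(edits):
    """Count non-GEDCOM edits to content pages, keyed by (year, month)."""
    counts = Counter()
    for e in edits:
        namespace = e.title.split(":", 1)[0]
        if namespace in CONTENT_NAMESPACES and not e.from_gedcom:
            counts[(e.timestamp.year, e.timestamp.month)] += 1
    return dict(counts)


edits = [
    Edit("Alice", "Person:John Smith (1)", datetime(2012, 11, 2), False),
    Edit("Alice", "Family:John Smith and Mary Jones (1)", datetime(2012, 11, 3), False),
    Edit("Bob", "Person:Ann Brown (1)", datetime(2012, 11, 3), True),  # GEDCOM upload
    Edit("Bob", "User talk:Alice", datetime(2012, 11, 4), False),      # not content
    Edit("Carol", "Place:England", datetime(2012, 12, 1), False),
]
print(edits_per_month(edits))  # {(2012, 11): 2, (2012, 12): 1}
```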

Target: Could be based on current levels plus accelerating growth?

Discussion:

  • Number of edits? Number of edits exclusive of GEDCOM uploads? Last month, last 3 months, etc. I would think year over year these numbers should be increasing somewhat. : --jrm03063 19:17, 16 December 2012 (EST)
I agree this is a good measure of activity and relatively easy to generate, which is important. However, I don't think it's good as an ultimate measure because, to borrow a cliche, we could just be digging holes and then filling them in. In our case, we could be making edits and then reverting them, or uploading GEDCOMs and then making lots of edits fixing the mistakes. That's why I'd propose having it as a primary measure but at the bottom of the list. AndrewRT 19:59, 16 December 2012 (EST)

Other measures

Other measures that could be added or integrated into the above:

  • Number of new users per month
  • Site usability [how measured?]
  • Site usefulness [how measured?]
  • Quality, e.g. average sources per person/family, dates per person/family, better consistency
  • Connectivity

Comments:

Maybe a couple of other metrics...
  • Connectivity. This is expensive to compute - but would be interesting. Of our 2M+ person pages, what's the largest clique? The next? How many cliques to get to 90% of all people?
  • Net changes in metrics over time?
--jrm03063 19:17, 16 December 2012 (EST)
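In graph terms, what the comment calls a "clique" is closer to a connected component: a group of Person pages all reachable from one another through family links. A hedged sketch using union-find, assuming the links between pages (e.g. people sharing a Family page) have already been extracted as pairs of page ids. Union-find runs in close to linear time, so this step may be cheaper than feared, though extracting the links is a job in itself. Names and sample data are hypothetical:

```python
from collections import Counter


def component_sizes(people, links):
    """Sizes of connected groups of Person pages, largest first.

    `people` is an iterable of page ids; `links` is an iterable of (a, b)
    pairs meaning the two pages are directly connected. Every id in `links`
    must also appear in `people`.
    """
    parent = {p: p for p in people}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two groups

    sizes = Counter(find(p) for p in parent)
    return sorted(sizes.values(), reverse=True)


def groups_needed(sizes, share=0.9):
    """How many of the largest groups together cover `share` of all people."""
    total = sum(sizes)
    covered = 0
    for i, size in enumerate(sizes, start=1):
        covered += size
        if covered >= share * total:
            return i
    return len(sizes)


people = list("ABCDEFGHIJ")
links = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("G", "H")]
sizes = component_sizes(people, links)
print(sizes)                 # [6, 2, 1, 1]
print(groups_needed(sizes))  # 3
```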

Process

If this is to be meaningful, it would need to move from being just the thoughts of a few people to something that is adopted by the site as a whole. I would appreciate any thoughts on what process that should follow. AndrewRT 18:46, 16 December 2012 (EST)

We should probably get some notion of success as it might be defined by our gracious hosts at Allen County. Is it enough to have an active wiki forum for genealogy - to go with their existing genealogy offerings? Or do they need to see some level of use/influence/etc.?
--jrm03063 19:17, 16 December 2012 (EST)