User talk:Jrm03063/Source-side reference creation


Response and Suggestions [18 September 2016]

First I’d like to say that I appreciate what you are attempting to do. If I understand this correctly, this attempts to solve a reuse issue that is not natively solved in WeRelate (or in GEDCOM or in other genealogical software that I am familiar with). We can reuse source names by referencing a centralized list, but we cannot reuse citation text that is excerpted from the source. If I understand correctly, you propose to solve this problem by:

  • Marking up the source transcript with references to WeRelate person/family pages
  • Using this markup and some software to automatically select a section of the source transcript to be used as an excerpted citation text
  • Adding this excerpted citation text to the referenced person/family page
  • Periodically updating the citation text on the person/family pages (because WeRelate does not support an active link to a piece of citation text) – this keeps the citation from becoming stale if the original transcript is updated, essentially mimicking the functionality of a centralized reused piece of citation text
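
To check that I have the first two steps right, here is a minimal sketch of how I picture the excerpt selection. It rests on assumptions that are mine, not taken from the actual transcript: that tags are ordinary wiki links of the form [[Person:Title|display text]], and that each line of the transcript is one excerpt unit. The name excerpts_by_page and the sample page titles are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Assumed markup (hypothetical): a wiki link such as [[Person:John Example (1)|JOHN]].
TAG = re.compile(r"\[\[((?:Person|Family):[^\]|]+)\|([^\]]+)\]\]")


def excerpts_by_page(transcript: str) -> Dict[str, List[str]]:
    """Map each tagged page title to the transcript lines that mention it."""
    found = defaultdict(list)
    for line in transcript.splitlines():
        titles = {m.group(1) for m in TAG.finditer(line)}
        if not titles:
            continue  # untagged lines are never excerpted
        # Replace each tag with its display text so the excerpt reads as plain prose.
        plain = TAG.sub(r"\2", line).strip()
        for title in titles:
            found[title].append(plain)
    return dict(found)


# Made-up sample data, for illustration only.
sample = (
    "[[Person:John Example (1)|JOHN]], of Sampletown, "
    "m. [[Person:Jane Sample (1)|Jane]].\n"
    "An untagged line that belongs to nobody.\n"
)
excerpts = excerpts_by_page(sample)
```

Every page tagged on a line receives the same plain-text line as its excerpt. A real excerpting rule would presumably be more selective than "the whole line".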

Opposition has arisen, as I recall, for 3 reasons:

  • In some cases, there might be considerably better sources for the person/family, and adding this particular source (whatever it might happen to be), could be misleading or confusing (or simply clutter).
  • In some cases, the source has already been manually added to the page, with a manually selected excerpted citation text and possibly comments. If so, there is a strong desire not to overwrite this information.
  • A general wariness of automation in the world of genealogy, where we know that human judgement is extremely valuable.

Here are my thoughts:

  • I have mixed feelings about including every possible source on every page. In a scholarly world, it would be appropriate to include all sources, and to explicitly refute those that are incorrect. However, I do not feel that as amateur genealogists, we should feel obliged to refute every incorrect assertion about every person (think of all the amateur trees out there that have been copied 20 times over, and also disasters like OneWorldTree).
  • On the other hand, I agree that Savage, while old, still shows up in enough amateur trees to give a sense that it is still being used, and is worth acknowledging – even if to refute it. (My mantra is “the best defense against bad data is good data”, which in this case, means that if we refute Savage errors up front, we are less likely to have someone use Savage to update a page with bad data in the future.)
  • Having said that, should Savage (specifically) be added to each person/family page that is identified in the markup of the transcript? Personally, I would have no problem with adding it where it does not already exist, because I know that the transcript has been tagged where it has been shown to be incorrect. Since this tagging exists, I assume you could add the defect note (from the Transcript Talk page) as a note attached to the source citation on the person page. Maybe you are already planning to do that – I did not spot an example like this.
  • With regard to replacing existing Savage citations with ones pulled from the transcript, I agree with others that this is not appropriate. There are many reasons why someone would want to use a different excerpt than the automated excerpt – they might want to manually select which text to include, or they might want to include their own observations.
Consider the following example from something I added to WeRelate today (this is a census transcription, and it occurs to me that your overall solution for reusability of transcript excerpts might come in handy for census transcriptions). I just added this transcription to family page Daniel Boardman and Mary Olds:
Daniel Boardman:
- 1 male aged 60-69 [Daniel]
- 1 female aged 50-59 [Mary]
- 1 male aged 20-29 [likely George or Sidney]
- 1 female aged 20-29 [likely Sarah]
- 1 male aged 15-19 [likely Daniel]
- 2 males aged 10-14 [Franklin and Henry]
- 1 female aged 10-14 [likely Catherine, although she would have been 19 if the birth year on her gravestone is correct; alternately, this might have been a servant and Catherine may have been married or elsewhere]
- 2 females aged 5-9 [likely Emily and Amanda's daughter Frances, who might have been only 4]
- 2 males aged under 5 [likely Amanda's 2 sons, Daniel and George]
The actual census record does not (of course) include any of the info in brackets. So let’s assume that you have transcribed large sections of the 1830 census and marked it up to show that this household refers to the Daniel and Mary (Olds) Boardman family. You have not included the information in brackets, because it is analysis, not transcription. I have added this citation to the family page, along with the text in brackets, because I found it useful in estimating one or more birth years in the family. I certainly don’t want an automated program to overwrite my citation. Nor does it make sense for the automated program to add another copy of the transcript, without my analysis.

So what can be done? I would suggest that you could do the following (these suggestions are meant to initiate a conversation which may lead to the modification of any part of these suggestions):

  • Only add the source and citation text to pages where a reference to the source does not already exist.
Further refinement: To address other issues that have been raised (incorrect tagging in the source transcript, obscure references to individuals, removal of the "no sources" template), I would refine this further, as follows:
  • As a rule, do not automate the process of adding transcript excerpts to pages. Instead, add a (to be developed) template to the person/family page at the time the source transcript is being tagged (when, presumably, the person/family page is already accessed). This allows manual control of when to use the automated excerpt, and also allows the user to tie the source to facts. It would be best if the preview could immediately show the results (that is, resolve the template), but I assume that would require a change to WeRelate (or maybe not - I don't know how sophisticated templates can be). Alternatively, if the templates were resolved in a weekly batch run and the watchers were notified, they could check the citations in a timely manner, and decide to remove the template if it showed obscure references, or correct the source tagging if the wrong person was identified.
I believe that this would achieve the goal of making citation text reusable, without compromising manual control over citations on a given person/family page.
  • Recognizing that this would be a lot of work for Savage, where the tagging has already been done, I would be prepared to accept a one-time automated addition of this new template to pages identified in the transcript that do not already reference Savage - provided it was run in very small batches to begin with, to give people a chance to see and evaluate the impact. If there were a lot of tagging errors, a lot of obscure references, or the job did not run as anticipated and overwrote existing citations, then the job would be stopped and we would revert to manual addition of the new template. (The job should also remove the "no sources" template where it is encountered.)
  • To address the opinion (with which I agree) that lists of children belong on the family page, a second template could be developed that would copy an entire sketch onto a page, and could be used on family pages. This template would have to be added manually. An entire Savage sketch might be overkill, but I'm not sure how else to reliably recognize the end of the relevant section - unless we just want the first X characters and a link to the full sketch.
  • Add it after existing sources (I assume you would anyway).
  • On updates, only update the citation text if it was originally added through automation and has not been manually changed since. There might be a couple of ways of determining this – one would be to tag it on the person/family page as an automated citation and then look through the page’s history to see if the citation has been manually changed; another would be to see if the citation exactly matches the previous version of the transcript excerpt (which you would have to store or regenerate). Either might present performance problems, but that is another topic. (Not just the citation text, but corresponding notes also have to be considered – I haven’t given this enough thought to propose a solution, because we need to have agreement on the basics first.)
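
The add, append and update rules above can be sketched roughly as follows. This is only an illustration under assumptions of my own: the Citation and Page classes, the automated flag and the previous_text argument are hypothetical stand-ins, not anything WeRelate provides today. It uses the second detection method (comparing against the previous version of the excerpt) together with an automation tag.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Citation:
    source: str
    text: str
    automated: bool = False  # hypothetical tag, set only when the job adds the citation


@dataclass
class Page:
    title: str
    citations: List[Citation] = field(default_factory=list)


def apply_excerpt(page: Page, source: str, new_text: str,
                  previous_text: Optional[str] = None) -> str:
    """Apply one excerpt to one page; return 'added', 'updated' or 'skipped'."""
    existing = [c for c in page.citations if c.source == source]
    if not existing:
        # Rule 1: add only where the source is not yet cited, after existing sources.
        page.citations.append(Citation(source, new_text, automated=True))
        return "added"
    for citation in existing:
        # Rule 2: update only automated citations that still match the previous
        # excerpt exactly, i.e. nobody has edited them by hand since.
        untouched = citation.automated and citation.text == previous_text
        if untouched and new_text != previous_text:
            citation.text = new_text
            return "updated"
    # Anything else (manual citation, hand-edited text, no change) is left alone.
    return "skipped"
```

A manually added Savage citation is never tagged as automated, so the job always skips it, and so is an automated citation once someone adds their own analysis to the text. Notes attached to the citation are not handled here.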

I think these suggestions would go a long way towards leveraging the Savage transcript as you would like to, while alleviating some of the concerns of others. There will still be some who do not want to see Savage on every relevant page, and might remove it – but the pages are shared and anyone could decide to add such a reference manually, so it is likely better to demote the reference (to the bottom of the list) and/or refute it than to prevent it from being on the page.

Apart from all the above, I have another concern with your approach, and that is that the algorithm to excerpt the text might not pick the most significant portion of the text. In some cases, particularly in sources other than Savage, a whole paragraph might be relevant. So here is another suggestion:

When you add the excerpted citation onto a person/family page, add a (templated) note such as “This text has been excerpted from a WeRelate transcript. To see this excerpt in context, select here.” (I’m sure this can be improved, but you get the idea.) Then “here” should take you to the start of the Savage sketch. I realize that the Savage transcript does not currently allow jumping to the start of a sketch, but I assume it would not be terribly difficult to add links where the sketch tags exist. Especially with something as cryptic as Savage, being able to jump to the start of a sketch would be a lot more useful than just jumping to the page – in fact, it would seem to me that this would be one of the benefits of having our own transcript.
Including the link would be comparable to including a source on a page without a text citation, but with a link to the relevant external text/page (such as a page of a digitized book). The person would expect to have to follow the link to see the citation. I am seeing this more commonly in WeRelate lately.
Using a templated note would also allow us to find all automatically generated source citations on person/family pages, which would probably come in handy sometime in the future.
If users understood the use of the template, they could remove it when manually replacing the automated source citation, which would prevent the citation from being considered for future automated update.
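
As a rough illustration of how a templated note could double as a marker, here is a small sketch. The template name AutoExcerpt, its single parameter and the helper names are all hypothetical. I am assuming only that the note would appear in the page text in the usual double-brace template form.

```python
import re
from typing import Dict, List, Optional

# Hypothetical template: {{AutoExcerpt|<link target for the start of the sketch>}}
MARKER = re.compile(r"\{\{AutoExcerpt\|([^|}]+)\}\}")


def build_marker(sketch_target: str) -> str:
    """Build the templated note that accompanies an automated citation."""
    return "{{AutoExcerpt|" + sketch_target + "}}"


def sketch_target(page_text: str) -> Optional[str]:
    """Return the 'see this excerpt in context' target, or None if no marker is present."""
    match = MARKER.search(page_text)
    return match.group(1) if match else None


def automated_pages(pages: Dict[str, str]) -> List[str]:
    """List the titles of pages that still carry the marker."""
    return sorted(title for title, text in pages.items() if MARKER.search(text))
```

Once a user removes the template from a page, sketch_target returns None for that page and it drops out of automated_pages, which is the opt-out behaviour described above.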

If this last suggestion is implemented, and if manually added notes are handled appropriately (I don’t profess to know quite what would be appropriate), it is just possible that over time, users will find that this standardized approach meets their needs and many of the automated citations will be left “as is”. But in the meantime, please do not overwrite any existing Savage citations, as this will only lead to opposition to the whole idea, as you have already seen.--DataAnalyst 19:55, 17 September 2016 (UTC)


I've read through the text, trying to understand what Savage is and what solution it would provide for what problem.

What I like about Wikimedia in general, and WeRelate specifically, is suggestions. One example is when I find the mother's last name: WR then suggests on the person page and also on the family page that 'This page can be renamed' (from xxx_unknown to xxx_lastname), and I only need to click it. The most important thing about suggestions, I feel, is that they give me, the user, the feeling that I am in control, and that the software is helping me, encouraging me, not forcing me, to improve the quality of the page that I am editing at that moment.

Perhaps suggestions would also be valuable when someone edits a page that could benefit from Savage (still not sure what it is)?

So I think I understand and value your suggestions, but my proposal would be to not statically update any page. Instead, when nobody views the page, does the page exist? I would therefore propose the dynamics of a 'This page can be changed' type of solution while in *edit* mode.

Thx Ron woepwoep 03:56, 18 September 2016 (UTC)


THANK YOU so VERY MUCH for your thoughtful reply! I know it takes time to put together what you've said - and I deeply appreciate the investment.
I owe you a longer response, but my wife seems to think the weekend includes more than me hacking away on a keyboard! --jrm03063 16:32, 18 September 2016 (UTC)