WeRelate talk:Suggestions/Content Language Neutrality via Wikipedia/Wikidata

Below is the subsequent discussion which appeared on the WaterCooler.... --jrm03063 18:26, 6 January 2014 (UTC)

I'm looking at wikidata and reaching out to some folks over there. If anyone has contacts over there, would appreciate an introduction. Thanks! --jrm03063 14:21, 3 December 2013 (UTC)

Earlier discussions on this topic indicated that the different wikipedias are different.... So not interchangeable like that. --Jrich 18:51, 3 December 2013 (UTC)
Yes, they are different. It's also not certain that any given English language WP page will have corresponding biographies in other versions of WP - as well as which other language versions may be present. Finally, it's also true that different versions may be written quite differently - and in rare cases - will offer different facts.
Nevertheless, I believe that - when present - they are close enough for the purpose at hand. --jrm03063 19:00, 3 December 2013 (UTC)

I want to expand a little on why I think doing this is important, and not seem to simply blow off the concerns expressed above. The rationale, as I presently understand it, came from discussions I had w/werebear. His observation was that we're doing ourselves a disservice as far as reaching different language populations. For example, it would be natural to expect that people from France will be most comfortable seeing the French wikipedia and (presumably) that version will have the best coverage of French Aristocracy and Nobility. However, when folks who aren't native English speakers come to WeRelate - they'll find us to be relatively unfriendly to non-English speakers. This is unfortunate and - for purposes of famous people - possibly unnecessary.

I would expect that, in general, the French WP is going to have very good equivalent coverage (relative to English WP) for the people of greatest interest to folks from France. A mere software transformation of our database (along with obtaining the other language versions of WP) could almost instantly make our content (related to famous French folks) accessible and welcoming. Presuming that they follow conventions, and that we develop procedures that are reciprocal - when folks from France associate a French WP page with a WR page - those of us who are stuck with English have a chance of seeing a page we'll understand too. Likewise for other languages.

It's certainly true that there can be an arbitrary number of discrepancies between pages - associated with a single individual or place - when that information is retrieved from different language versions of Wikipedia. But on the whole, those discrepancies should be minimal and - over time - they should have a tendency to diminish. In the meantime, we can leverage a large English WP time investment - to become - essentially overnight - vastly more welcoming to speakers of other languages. --jrm03063 22:10, 3 December 2013 (UTC)
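
A minimal sketch of the kind of lookup described above, offered only as an illustration and not as anything WeRelate actually runs. It assumes each WR page has already been associated with a Wikidata item, and that the item's sitelinks have been fetched into a plain mapping keyed the way Wikidata keys them ("enwiki", "frwiki", and so on). The function name pick_sitelink, the sample mapping, and the English fallback are all hypothetical; the two titles are simply the name forms mentioned elsewhere in this discussion.

```python
from typing import Mapping, Optional, Sequence, Tuple


def pick_sitelink(
    sitelinks: Mapping[str, str],
    preferred_langs: Sequence[str],
    fallback: str = "en",
) -> Optional[Tuple[str, str]]:
    """Return (language, page title) for the first language that has a page.

    sitelinks maps Wikidata site keys such as "frwiki" to page titles.
    preferred_langs is the reader's language preference, best first.
    The fallback language is tried last; None means no usable page exists.
    """
    for lang in list(preferred_langs) + [fallback]:
        title = sitelinks.get(f"{lang}wiki")
        if title:
            return lang, title
    return None


# Hypothetical sitelinks for one person page (illustrative data only).
example_sitelinks = {
    "enwiki": "John I of Aragon",
    "eswiki": "Juan I de Aragón",
}

# A reader who prefers French, then Spanish, gets the Spanish page here,
# because this sample record has no French entry.
choice = pick_sitelink(example_sitelinks, ["fr", "es"])
```

Under these assumptions, a reader whose preferred languages have no page at all still falls back to the English extract, so nobody sees less than they do today.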

The fact that the different wikipedias give different facts, and that under the proposed mechanism one might be including a version of text that one has not reviewed, argues against the use of wikipedia as a source in the first place. To accept different languages blindly without knowing what is going to show up on the page, given its rather mediocre track record with regard to genealogy facts, would not seem very wise. --Jrich 03:56, 4 December 2013 (UTC)
While I have a very strong interest in making maximum use of open scholarship, such as the various language forms of Wikipedia, I have never claimed that they represent a preferred source for the strictest genealogical research. They are a very useful starting point for those new to the material and offer much that is of general interest. To the extent that they have hard genealogical research value - it may be as an unambiguous designation of "Who" we are talking about. In a sense - a superior AFN.
Moving toward a focus on wikidata (where possible) - may actually do a better job stressing the unambiguous identification aspect of this. For example - instead of claiming Wikipedia as a source - it might be preferable (when available) to indicate the wikidata object id as a post-mortem fact (rather like LDS events, but absent any faith bias). Inclusion of a language-specific wiki extract at the top of any page serves to make the page more immediately accessible to different populations - but only as a matter of general interest.
While oblique to the discussion at hand, I see two justifications for use of various language forms of WP as an actual WeRelate "source entry" (which isn't the same in my mind as a strict genealogical source).
  • Sometimes, we're looking at a page or person for which we don't have anything else - or we're trying to cover large expanses of genealogy on an interim basis. It represents a better start by far - than nothing at all (which was the case in the medieval morass of our early days).
  • Preservation of the connection to open-source content in exported GEDCOMs. There is no harm in having a "source entry" that isn't referred to by any facts (which is what you would have in a great page - with good sources for the genealogical facts). Such material is bibliographic by definition - and is more apt to be maintained in subsequent GEDCOM dump->load->dump cycles, through various software, if it exists as a structured record and not simply as free-form text in the narrative. --jrm03063 14:38, 4 December 2013 (UTC)
I never implied you said that wikipedia was a preferred source. I said its use as a source of any kind was NOT recommended, given the several foreign language versions that may say different things (and which one may not realize when one reviews and cites the version in one's native language).
I have a very strong interest in accurate genealogy. For people that care about accuracy, knowing the basis for a fact is as important as knowing the fact. Without the basis, there is no fact, there is something that is probably a copying error, or a misinterpretation, or naive name matching, or assumption presented as fact, etc. I believe it is wrong to say that the medieval morass was based on "nothing at all". I'm sure all that medieval data was all dutifully copied from somewhere. It wasn't created out of the blue. Instead, the problem is that the people copied it without knowing if it had any validity or not, i.e., without knowing how the facts were known. They just assumed the author of the data saw something that gave that data, therefore it must be correct. In reality, you can find multiple facts for nearly every person on the Internet, and after one or two generations of picking and choosing which facts you like the best (much less after 15 or 20 generations for medieval stuff), it is no surprise that it bears no resemblance to reality. Undoubtedly the medieval morass was cleaned up by referring to a book of some authority, which was cited as a source, of course.
Based on recent page creations, it may be worthwhile to point out that this same reliance on consulting and citing sources applies to more modern genealogy, as well, so we don't create a morass there, too.
The need to know the basis for facts is a large part of my antipathy towards wikipedia as a source. When it is correct, the item of interest and usefulness to the reader of a WeRelate page is going to be the source it cites. This is what will get the reader closer to the basis of the fact. When it is not correct, which happens with some frequency, it can be misleading and harmful. Wikipedia is not a genealogy website, and it will not view discovering the basis of facts as part of its mission. Instead, its authors, probably untrained in genealogy, will merely relay facts they find in some other source. If it is a good wikipedia page, they might happen to choose a good source, instead of just reaching for the handiest one, and citing wikipedia as the source will probably result in ignoring the quality source that is the real source of the data. If it is not a good wikipedia page, not even that much. --Jrich 19:02, 4 December 2013 (UTC)
The risks you observe in use of any version of Wikipedia are real, but I think the community has already judged them relative to the perceived benefits. For the instant matter - risk in the use of different language versions of WP - there is a similar balance to be struck. What is the risk of making semi-automatic use of different language versions of WP extracts - compared against the risks of discouraging participation by folks who are not comfortable speakers of English? The former seems pretty modest - the latter pretty horrible. There are also some pretty big risks in just letting things move forward on their current relatively sluggish path. --jrm03063 01:13, 5 December 2013 (UTC)

I am probably misunderstanding the discussion here, but I feel I should add my two cents since a question I asked is tied up with this. My question was something like "Is there a consensus on Werelate on which form of a name should go in the Name (rather than Alt Name) space, where the person in question has variant forms of his or her name in different languages?" I noted that Englished versions of the names of French aristocrats would probably discourage potential French-speaking contributors, for example. I haven't given as much thought to the Personal History section, but I guess the same issue arises. Should not the personal history be written in French, if possible?

Just wanted to double-check the answers to these questions, before (ideally without) getting into the discussion about the implications of the unreliability of publicly edited sites like Wikipedia and Werelate.

BTW, thanks jrm03063 for taking the trouble to come up with some ideas to address my concerns. --Werebear 02:47, 5 December 2013 (UTC)

You're welcome.
I would also add that I understand the preferred form of a name to be the one closest to what that person would have used (or at least been formally known by) in their lifetime, and in their native language. Further, that post-mortem translations into other languages can (and should) probably be omitted. If there's evidence for use of different language name forms during the individual's lifetime - THEN - alt forms are warranted. Mind you - we're so far from following this practice it's just pathetic - but it's what I understand to be preferred if we should ever get there. Use of a translation is acceptable as a starting point, in preference to having nothing. I don't know that we ever established any of this as formal policy - but I don't think any of this was considered to be controversial. Others may want to mull this - the oversight committee may contemplate whether a formal policy statement is appropriate (maybe there is one - and I've just never seen it).
I was contemplating Werebear's concerns that we're not doing enough to encourage contributions by other language speakers. I've always considered contributions in any language legitimate - but there's a heavy practical English bias that other language speakers might not get past. I tried to imagine ways that we might do something a little more dramatic to reach out to such folks, and started to mentally sketch out the process I describe above for "WP Pages".
I do recall that we once discussed that we didn't want to have different language versions of WeRelate - and there have been numerous observations that different language versions of WP may have superior content - or otherwise be preferred - compared to the English WP - but that was all in the context of pretty informal discussions (and subject to potential defects in my memory!). --jrm03063 15:23, 5 December 2013 (UTC)
On names, I am not sure whether it should be preferable to use the name the person used or was widely known by in his or her lifetime, or, at least for prominent people, the most generally accepted form of the name currently used in scholarship in the modern language most closely associated with the person. I guess there is something to be said for either, and both bring problems. Yesterday, I changed the primary name for John I of Aragon to "Juan I de Aragón" (Spanish), but why not "Chuan" (Aragonese), "Joan" (Catalan), or whatever the medieval Navarro-Aragonese version was? Or even Latin, if we are considering what form was used in writing at the time.--Werebear 17:58, 5 December 2013 (UTC)
I guess I shouldn't forget Occitan...--Werebear 18:07, 5 December 2013 (UTC)