WeRelate talk:Current Source Projects


Connection to Past Source Projects [10 February 2011]

Dallan, just curious how you envision that this project will intertwine with or differ from past source project discussions at WeRelate talk:Source Committee and WeRelate talk:Source review. --BobC 01:18, 10 February 2011 (EST)

I'm obviously not Dallan, but I think the idea is to link some WeRelate people and some new users from Family Search who are working on updating and "curating" source pages to a joint project discussion page. As I've said before, I've been working for the last year on reviewing Maine related sources. This will work as a cross-fertilization conversation page, and will hopefully build on our past decisions (rather than rehashing them). I think this will translate back into some improved help pages and WeRelate source pages.

--Brenda (kennebec1) 07:57, 10 February 2011 (EST)

Brenda covered it all. I thought we should start a new page rather than continue on the old pages, since they're somewhat dated now.--Dallan 11:26, 10 February 2011 (EST)

Continuation of conversation on source navigation/categories from main page [17 February 2011]

Dallan, when you propose to remove categories, to be clear, you mean automatically generated ones, right? So we would no longer have categories like Category:Smith surname or Category:Church records? Since one of MediaWiki's features is the ability to categorize, we're going to have some people trying to categorize pages regardless, and we still won't have an organizational scheme - it will be up to each individual to do as they wish. Right now, there have been about 7 users who have either edited a page to add a category, or edited a category to add a parent in the last week. --Jennifer (JBS66) 16:28, 16 February 2011 (EST)

Yes.--Dallan 17:30, 16 February 2011 (EST)

Regarding whether we should do search-based or category-based navigation, in the interest of limited development time, I'd like to implement one solution, not both. Either

  1. I can enhance search so that it works better for navigation, allowing more results per page and displaying them in a more-compact form for example, and add search-templates to the bottom of source, place, person, family, image, etc. pages so that we no longer need automatically-generated categories. In search-based navigation, the facets on the left of the search screen would take the place of sub-categories. Search-based navigation has the advantage of displaying more information about a page if we want -- the birth and death years for example; whereas category-based navigation displays only the page title. The one issue that I can think of is that there are too many surnames for the facets. So if you're looking at the Place page for Cook, Illinois, United States for example and click on the "People in Cook, Illinois, United States" search-link, you'd see a list of all people, but you wouldn't see navigation facets for individual surnames (like Jones or Smith) because there would be too many of them. Category-based navigation would show you a list of surname-based categories: Jones in Cook, Illinois, United States, Smith in Cook, Illinois, United States for example.
  2. I can enhance categories so they work better.

It will probably take the same amount of time either way. I think it's more a question of which approach gets us closer to what we want.--Dallan 17:30, 16 February 2011 (EST)

Although it may be an either/or for development time, I'm not sure it is necessary to eliminate one or the other. As noted above, if we decide to enhance search vs enhancing auto categorization, we will still have people seeking to categorize, and will still need to use the category outline we have now (along with modifications) to manage that in some way. If we decide to use enhanced categorization, search isn't going to go away (nor will it stop being one of those items people will have questions about, at least, if not requests for enhancements).
As I noted on the project page, I think search is where people intuitively start in their efforts to find and organize the information they need. And I like the search template enhancement that Jennifer created, as well as the suggestions you had for me regarding title sorts with inexact searches. Search improvements and interfaces will have the most immediate user impact, I suspect, and as I said before, tend to be the most flexible in responding to the vastly different ways people think about and approach information.
But overall, categorization really is useful, even in the broad, incomplete and sometimes chaotic ways we have it organized now. If categorization automation in the broad sense had to wait, I think it could. But I would be sad to see the automation to date disappear; taxonomy is an important part of understanding and organizing information. Even if most users never look beyond what shows up at the bottom of the page (automatically), we here in the background/admin side really do find taxonomy helpful in creating a common understanding of what we're doing (i.e. what gets included or excluded, how places and people relate to each other).
So, the last thing you said, Dallan, is "which approach gets us closer to what we want." What do we want? What's the immediate goal?
To the extent this conversation is an outgrowth of the source review project, sources generally (as in a library) like to be categorized, and those of us who work with sources tend to be quite obliging....
But if this conversation is about the user experience, I tend to think search enhancements and alternate search methodologies will have the most impact.
After all, people and places don't need that much categorization, beyond their basic definition (i.e. people = surnames and Places require the hierarchy of geographic names, which luckily for us has mostly already been defined by others). I'm not sure we have to undo that basic categorization work just because we aren't quite sure how topics and areas of interest (Sources) should be intertwined with People and Places.--Brenda (kennebec1) 21:02, 16 February 2011 (EST)

The suggestions for enhancing search-based navigation seem unnecessary to me. You have already improved the search functionality, I don't think we're asking for additions beyond that.

  1. Adding a navigational template to Place pages only. The original idea was to better tie together Place and Source pages and provide an easy, visual access to possible sources for a place. Since this is a template, it can be edited in one place as our needs change (ie. adding or removing a source subject). As I see it, this involves automatically adding the template to all place pages. What is the time/difficulty level to accomplish this?
  2. Tweak the category names for automatically-created categories. The goal is to no longer dump all pages directly into their parent category, but put them into themed "buckets". So, church records for Maine will be in a different "bucket" than Surnames in Maine.
  3. Fix the bug for categorizing renamed place pages
  4. Consider automating the surname pages only ie: Smith surname gets put into the Surnames category. This will help to reduce the number of wanted categories. We may also want to consider adding the TOC template to these.
  5. Write clear guidelines for the structure and proper use of categories.
  6. Decide whether we want the categories to be visible or hidden (as they are now).
  7. Perhaps we could not create a Surname in Place category when the place is red-linked. This would eliminate quite a bit of "junk" in the wanted categories, like Category:Merle in WI in St. Matthew's Lutheran Church or Category:Boehm in 1217.

Beyond this, I would not suggest automating additional processes at this point. This is an evolving process, one that will need to be molded and adapted over time. So, development time wise, you're not needing to change how search works - but adding additional category and nav. box functionality. --Jennifer (JBS66) 07:26, 17 February 2011 (EST)

I really like the visual format! It's nice to have something concrete. Two questions:

  1. What about sources that contain both surnames and places? Do we want to have "Sources for Smith Surname in Place" categories? Maybe not. Maybe it would be best to just put those sources in place-oriented categories and in the "Sources for Smith Surname" category without trying to create categories that combine the two.
  2. Earlier you had mentioned creating a "Sources of Place" category as a subset of the Place category, and then putting the "Church records of Place" and "Vital records of Place" categories as sub-categories of that. That seemed good to me, since the Place category could then have sub-categories for various namespaces: sources, repositories, images, people (surnames), etc.
I uploaded a new version of the chart to include Sources of Place subcats. --Jennifer (JBS66) 15:45, 17 February 2011 (EST)

Regarding the question of search vs categories, here's why I think they overlap. Suppose we have a category for the place, and that category has links for "Church records of Place", "Vital Records of Place", etc. How is this different from a search box at the bottom of the source with links that will end up showing the same sources but in a search results list instead of a category list? So why have both? If we're adding search-links because of some deficiency in the category-based approach, then let's either fix the category approach or adopt the search-links approach for navigation.

I don't think that we need to eliminate one or the other. Search is about as good as it's going to get for sources at this point. But we're saying that browsing/navigating sources could be better. The question is what do we want to rely upon for navigation: custom search-links, or categories. I believe that any list of sources that you can think of a category for, I can come up with a search-link for that will show you the same results but optionally with more information. Certainly all of the boxes in Jennifer's visual format (which I like very much) could be implemented as search-result lists. We could add an option to search so that search results from bottom-of-page search-links looked exactly like the category page format if you like that format; for example, I could move the subject facets from the left-hand side to the top of the page.

Similarly, the search-links you mention could be replaced with links to the appropriate category pages instead, or a link to the category for the place (which we already have) and the category page could have sub-category links to the subject-based categories. Since we already have a link to the category for the place, and we're talking about adding subject-based sources for the place as sub-categories to that page, why do we feel that we also need bottom-of-page links to search results that will show exactly the same set of sources? I figure that there must be something about the category pages that isn't working for you, which is why I bring up the question.

I think this question is even more interesting when you think about the surname-in-place categories. Should they be implemented as real category pages or as search-result links for that surname in that place?

Not automating category creation and maintenance doesn't work for me. I don't want to require admins to create and maintain categories. I'd rather have a few inappropriate categories to fix than many missing/mis-categorized ones. It seems that once we have a well-defined browse structure, we can automate its maintenance.

Let's continue developing the browse structure. We can also talk about whether we want to implement it using search result lists or system-maintained category pages.--Dallan 12:42, 17 February 2011 (EST)

Search-links navigation example [18 February 2011]

Here's an example of what I'm talking about with search-links navigation.

In the pages referenced below, you can click on links in the upper part of the page to navigate to super- and sub-categories. For this demonstration you can only click on the blue links. All links (including the red links) would be implemented in a real system; I just got tired of creating mocked-up wiki pages for this demonstration.

The pages you see would not be category pages; they would be generated dynamically by the search system. So nothing to create, nothing to maintain.

  • A Person page for John Smith born in Illinois would have this bottom-of-page search-link in place of the automatically-generated smith-in-illinois category:

Smith in Illinois, United States

  • A Source page for a death index in Illinois would have this bottom-of-page search-link in place of the automatically-generated vital-records-in-illinois category:

Vital Records in Illinois, United States

  • The Place page for Illinois would have this bottom-of-page search-link in place of the automatically-generated illinois category:

Illinois, United States
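As a rough illustration of how the three bottom-of-page search-links above might be generated from a page's own fields, here is a minimal Python sketch. The base path, the parameter names, and the `search_link` helper are all hypothetical; nothing in this discussion specifies the real search URL format.

```python
from urllib.parse import urlencode

# Hypothetical base path; the real search URL is not given in this discussion.
SEARCH_BASE = "/search"


def search_link(place, namespace=None, surname=None, subject=None):
    """Return (label, url) for a bottom-of-page search-link.

    All parameter names are assumptions made for illustration only.
    """
    params = {"place": place}
    if namespace:
        params["namespace"] = namespace
    if surname:
        params["surname"] = surname
    if subject:
        params["subject"] = subject
    # The label mirrors the examples above: "Smith in <place>",
    # "Vital Records in <place>", or just "<place>".
    lead = surname or subject
    label = f"{lead} in {place}" if lead else place
    return label, f"{SEARCH_BASE}?{urlencode(params)}"


# The three links described above, built from page fields.
person_link = search_link("Illinois, United States", namespace="Person", surname="Smith")
source_link = search_link("Illinois, United States", namespace="Source", subject="Vital Records")
place_link = search_link("Illinois, United States")
```

Because the link is computed from fields already on the page, there is no category page to create or keep in sync when a place is renamed.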

A First step [10 June 2011]

Is it possible for us to look at what is involved in taking the first step towards better organizing categories? A good place to start may be with sources. They currently clutter up the main place categories and make it difficult, I believe, for users to see the benefits of a category structure.

So... how involved would it be to change the automatically-generated categories for Source pages:

  1. to a non-hidden category. This would allow users to change them as necessary
  2. to reflect a Subject of/in Place hierarchy. These may be refined further in the future, as Brenda is doing for Maine, but it's a good starting point towards better organization.

I can work on a proposed scheme if this is feasible. --Jennifer (JBS66) 08:57, 26 April 2011 (EDT)

My current thinking is to replace the automated categories with the approach described in the previous section. I believe you'll get everything that the automated categories give you, and more, and we won't need to keep babysitting them to keep them up to date.--Dallan 10:51, 28 April 2011 (EDT)
I'm all in favor of automating things as much as possible, but there is a major (IMO) drawback to the automated categories as shown in the previous section. Sources, as they are currently, allow for only one subject. Unfortunately, some sources, such as this one, could (and should) have multiple subjects. (In this example, the current subject of Cemetery records as well as Military records.) Unless we can add multiple subjects to a source (which I think we should do anyway), the automated system is going to omit a lot of relevant sources. -- Amy (Ajcrow) 11:09, 28 April 2011 (EDT)
If you're in favor of doing it anyway, then let's do it :-). Adding multiple subjects to a source is a fairly easy change, much easier than it would be to try to maintain the automatically-generated categories as place pages get changed. One other thing to consider: We have 3 sub-subjects that show up only if certain subjects are selected: Ethnicity, Church, and Occupation. Once we allow multiple subjects to be selected, it would simplify the UI if we did away with these sub-subjects. I don't know how often they're used; you can't search on them for example. How about making the subject drop-down multi-select, and removing the sub-subjects?--Dallan 11:22, 28 April 2011 (EDT)
I'm sorry, I'm not following... now we're talking about removing all automated categories, not just the surname ones? In the last two paragraphs we're talking about source categories, different searching, and the difficulty with place page categories. What exactly are you referring to when you say "then let's just do it"? --Jennifer (JBS66) 11:31, 28 April 2011 (EDT)
I think (hope!) Dallan means let's go ahead and allow multiple subjects for a source. -- Amy (Ajcrow) 11:35, 28 April 2011 (EDT)
Well, I agree with allowing multiple subjects for sources. The sub-subjects are important in one example I can think of. The kerkelijke (church records) for the Netherlands. The page titles were imported with too little detail, and the only way to know which records they really are (without going to FHLC) is the sub-subject field (Dutch reformed, Mennonite, Catholic, etc). Any way we can record that information in the text field before we dispose of it? --Jennifer (JBS66) 11:52, 28 April 2011 (EDT)

I can do that. As for the "let's just do it", yes, I was referring to multiple subjects.

I don't plan to make any other enhancements to the automatically-generated categories, but I don't plan to replace them until we have an agreed-upon alternative. It's just not feasible to maintain them as places are changed. Until we come up with a way to navigate without using automatically-generated categories, the automatically-generated categories will stay as they are. Once we do, then we'll switch over to that and I'll remove the automatically-generated categories.--Dallan 19:01, 28 April 2011 (EDT)

So this means that a valuable navigational tool that is commonly used on wikis is going to be discarded... that would be unfortunate. Projects currently underway such as Category:Veterans, Category:Sources of Maine, United States, and Category:Cemeteries of Maine, United States would lose their place category counterparts.
One suggestion is for us to put thought into a more usable category structure, recategorize the pages automatically, and make these categories unhidden. Then, users can take over much of the responsibility from there. If a few of the admins could utilize AWB, then perhaps we could take on the tasks of correcting categories when place hierarchies change. --Jennifer (JBS66) 10:13, 29 April 2011 (EDT)
No, it means that automatically-generated categories (the surname-in-place categories) are going to be represented using an approach that's more flexible. Why waste what will become hundreds of hours of admin time maintaining something that can be maintained automatically using a different approach? I'd much rather see the time spent improving help pages and creating tutorials. Human-generated categories will continue as-is. In fact they'll get more visibility because they won't be lost among the thousands of automatically-generated categories.
What do you think you will lose with the search-based categories?--Dallan 14:33, 3 May 2011 (EDT)
I have a question about the auto-generated categories, specifically the source-in-place, as you have shown above. I'm not sure I'm understanding the implementation of it. On a Source page, such as Source:Graham, Albert Adams. History of Fairfield and Perry Counties, Ohio, will there be a link at the bottom of that page for "History of/in Fairfield, Ohio, United States", which would then go to something like you have mocked up (link in previous section)? Or is this something that the user reaches via search? -- Amy (Ajcrow) 14:51, 3 May 2011 (EDT)
I'm thinking that we would list the automatically-generated category "Sources of Fairfield, Ohio, United States" either at the bottom of the page below the user-generated categories in a box labeled something like "Browse" or "More like this", or else in the left-hand sidebar above the "Watchers".--Dallan 15:13, 3 May 2011 (EDT)
When you say automatically-generated categories, and based upon your Search-links navigation example, I picture the implementation of this to be similar to that which exists at family search (a drill-down search of sorts). So, my impression when you've described a link to these 'categories' is that they would be a link to this search, much like I included in the last sentence. Am I picturing this correctly? --Jennifer (JBS66) 15:21, 3 May 2011 (EDT)
We could do it that way, with the facets displayed on the left, though I was thinking more along the lines of User:Dallan/SourcesInIllinois, with the facets displayed at the top, more like a traditional category page. Either way is fine. Also, I'd probably display pages in the category using one or maybe two lines per page (list the person's name and their birth & death dates or years for example) instead of the 3-4 lines per page like we do in search, so that showing 100-200 pages per screen wouldn't be so long.--Dallan 15:29, 3 May 2011 (EDT)

Dallan, you asked what I think will be lost with search-based categories. Here are a few thoughts that come to mind:

  1. Lose the ability to add customized categories that will tie in together - rather than one system for "search-based categories" and another system for "human generated categories"
  2. Lose the ability to reorganize and customize categories as needed, rather than being based solely upon the page's fill-in fields. Take Repositories as an example. Right now, there is no type field so they are all grouped together. Categories could be added to distinguish Libraries from Courthouses without the need for an added field.
  3. When I mentioned above about categories losing their place category counterparts - what would the human generated Category:American Civil War veterans connect into? Right now the plan is for it to connect into a system of county place categories (ie, show me the veterans of Hartford, Connecticut, United States). When we remove the automated categories on pages, we would lose the structure this will tie into.
Maybe there's a way to preserve the benefits. Here are two possible ideas:
  • What if we had an optional place field on category pages? Then when someone was looking at a search-based category for all pages of a particular place, human-generated category pages for that place would appear in the list as well. So when someone was browsing the search-category for Virginia, the list of Category pages (e.g., Courthouses in Virginia, Libraries in Virginia) would be included. And if I navigated to the search-category for one of the counties in Virginia, I'd be shown a list of the Category pages having that county in their place field. Going the other way, from category pages to search-based categories: when someone was looking at a Category page, we could include a link back to the search-category for the place. So if I'm looking at the Category page Courthouses in Virginia, I'd see a link to the search-category for Virginia.
  • An alternate approach is to encourage categories to be more like tags, and to make category another "facet" of the search-based categories, along with place and namespace. With this approach you'd just have categories like "Courthouse" and "Library" instead of "Courthouses in Virginia" and "Libraries in Virginia". When looking at a search-category, you'd see a list of tags appearing on the pages in that category, ordered by frequency, so that big categories were first in the list. So if I'm looking at the search-category for Virginia, I'd see a list of tags: "Early Settlers (1000), Courthouses (51), Libraries (40)" etc. If I then selected the "Repository" namespace, the list of tags would narrow down: "Early Settlers" category would drop out because there wouldn't be any Repository pages in the "Early Settlers" category. But I'd still see the categories for courthouses and libraries. Clicking on one of those categories would show me all repositories in Virginia in that category.
The second approach would be my preference. If you wanted to have a new tag appear in the search-based categories, you'd just start assigning pages to that category. You wouldn't have to create Category pages for "Courthouses", "Courthouses in Virginia", "Courthouses in County X, Virginia", etc. Just create the category "Courthouses". Similarly, you'd have just one "American Civil War veterans" category. You wouldn't need to maintain separate county-level categories. Clicking on the "American Civil War veterans" tag in the search-based categories would let you then navigate up and down the place hierarchy, with the list filtered to include just those in the "American Civil War veterans" category and in whatever place the user had navigated to.
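The tags-as-facet idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page records, field names, and the `browse` and `in_place` helpers are made up for the example, and the sample data is invented rather than taken from the site.

```python
from collections import Counter

# Invented sample pages; field names are assumptions for this sketch.
PAGES = [
    {"title": "Person 1", "namespace": "Person",
     "place": "County X, Virginia, United States", "categories": ["Early Settlers"]},
    {"title": "Person 2", "namespace": "Person",
     "place": "County Y, Virginia, United States", "categories": ["Early Settlers"]},
    {"title": "Person 3", "namespace": "Person",
     "place": "Virginia, United States", "categories": ["Early Settlers"]},
    {"title": "Courthouse A", "namespace": "Repository",
     "place": "County X, Virginia, United States", "categories": ["Courthouses"]},
    {"title": "Courthouse B", "namespace": "Repository",
     "place": "County Y, Virginia, United States", "categories": ["Courthouses"]},
    {"title": "Library A", "namespace": "Repository",
     "place": "County Y, Virginia, United States", "categories": ["Libraries"]},
    {"title": "Courthouse C", "namespace": "Repository",
     "place": "Cook, Illinois, United States", "categories": ["Courthouses"]},
]


def in_place(page_place, place):
    """True if page_place is the given place or is located within it."""
    return page_place == place or page_place.endswith(", " + place)


def browse(pages, place, namespace=None, tag=None):
    """Filter pages by place, namespace, and tag; return (matches, tag counts).

    Tag counts are ordered by frequency, so big categories come first.
    """
    matches = [
        p for p in pages
        if in_place(p["place"], place)
        and (namespace is None or p["namespace"] == namespace)
        and (tag is None or tag in p["categories"])
    ]
    tags = Counter(c for p in matches for c in p["categories"])
    return matches, tags.most_common()


# Browsing Virginia shows every tag used there, largest first.
_, virginia_tags = browse(PAGES, "Virginia, United States")
# Narrowing to the Repository namespace drops "Early Settlers".
_, repository_tags = browse(PAGES, "Virginia, United States", namespace="Repository")
# One "Courthouses" category serves every level of the place hierarchy.
county_x_courthouses, _ = browse(PAGES, "County X, Virginia, United States", tag="Courthouses")
```

In this sketch a single "Courthouses" tag is enough; the state-level and county-level lists fall out of the place filter rather than from separate category pages.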

Regarding the usability of the Search-links navigation example (I'm not attempting to compare with categories here): If I compare this to something like this site, I can remove a filter by clicking the X and it will recalculate the results. Say I specify down to People in Illinois, but once I'm there, I realize I want Sources in Illinois instead. Is there a way to get there without traversing back through the options?

Right now I'm thinking that we'd implement breadcrumbs in the section for each facet. If you were currently looking at the search-category for people in Illinois, you'd see two sections:
All namespaces > People
All places > United States > Illinois (a list of the counties of Illinois would appear next in this section)
If you wanted to switch to sources in Illinois, you'd click first on "All namespaces", then on the "Sources" link that would then appear in that section. Similarly, if you wanted to switch to Virginia, you'd click on United States, then on the "Virginia" link that would appear in that section.
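A minimal Python sketch of the per-facet breadcrumbs described above follows. The `breadcrumbs` and `render` helpers are hypothetical names, and representing each facet's current selection as a list of labels is an assumption made for the example.

```python
def breadcrumbs(root, path):
    """Return (label, selection) pairs for one facet section.

    Clicking a crumb would reset that facet to the paired selection;
    the root crumb ("All ...") clears the facet entirely.
    """
    crumbs = [(root, ())]
    for depth, label in enumerate(path, start=1):
        crumbs.append((label, tuple(path[:depth])))
    return crumbs


def render(crumbs):
    """Join crumb labels in the 'A > B > C' style used above."""
    return " > ".join(label for label, _ in crumbs)


# The two sections for the search-category "people in Illinois".
namespace_crumbs = breadcrumbs("All namespaces", ["People"])
place_crumbs = breadcrumbs("All places", ["United States", "Illinois"])
namespace_line = render(namespace_crumbs)
place_line = render(place_crumbs)
```

Each facet keeps its own trail, so clearing the namespace crumb leaves the place selection untouched, which is what lets a user move from People in Illinois to Sources in Illinois in two clicks.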

I am keeping an open mind about faceted search. The Faceted Wikipedia Search offers some interesting possibilities... The question I'd want to consider is: is faceted navigation the best choice for the contents and types of users WR has? --Jennifer (JBS66) 09:50, 4 May 2011 (EDT)

I think so. The issue is we have a multi-rooted hierarchy. People might want to browse "Sources in Illinois", "Sources in County X, Illinois", "People in County X, Illinois", "People in Illinois", etc. It's difficult to maintain Category pages that make it easy to jump from "Sources in County X, Illinois" to both "People in County X, Illinois" or "Sources in Illinois". I know it's possible by putting the right super-categories everywhere, but that's asking a lot. It seems simple enough to show a category-like page with the main facets at the top (or maybe down the left, like they are in search, I don't know): namespaces, places, and tags (categories). If you're browsing sources, maybe you get a couple of additional facets: source-subject and availability. If you want to filter the results further, then use categories. If you do use categories, you don't have to create category pages for every single county -- just create one category page. The category will show up as a facet value whenever you're browsing pages assigned to that category.--Dallan 19:56, 17 May 2011 (EDT)

Forgive me for intruding into this discussion at a late point. I am new to WeRelate but I am not new to the issues being discussed on this page. In my real life I have built a complex engineering database with millions of components with complex attribute, classification and inheritance structures. Based on that experience I will make a few comments.

1) To build and maintain a robust and scalable classification (you call them categories) system you will need to be able to create multiple hierarchical classification taxonomies - Places and Sources being the two examples under discussion here however there may be more.

2) I would centralize the creation of the taxonomy structure - some of this work has been done for you regarding Places and Sources

3) I would create a structure for delegating administration of sub-trees of the classification hierarchy to different Subject Matter Experts for different topics, regions, etc - there is no way you will be able to scale with just your original SMEs - this may be a longer-term problem but you might think about it sooner rather than later. Opening this up to all members will ultimately create a huge mess - for example I have encountered both gaps and errors in your Places database but you do not want me to take on trying to correct this so I have chosen not to do so. It is not clear to me who is or should be the SME for maintaining the Places hierarchy for central North Dakota - or for what is now Ukraine but has at various times past been parts of Russia, Poland, etc.

4) I would NOT automatically generate hidden categories from data contents - eg "Surname in Place". I was referred to this page by Jennifer when I queried her regarding how to avoid or correct a lot of bogus "Surname in New York" categories created merely by the fact that someone entered the US via Ellis Island although nobody remained there for more than a few days.

5) It is not practical to anticipate all the different ways users may wish to associate your objects. Some parts/product data management platforms are moving to faceted search as a way of dealing with this. I think it is wise for you to consider this also.

6) In my opinion faceted search will not replace the need for categories where there is a relevant hierarchy of categories with a well-defined taxonomy - Places and Sources being a good example. However for arbitrary attributes of People like surnames or given names, which are essentially "flat" attributes and for which there can be many alternates and variants, I think any attempt to group them into categories is doomed to fail and you will be better served by richer search capabilities.--Jhamstra 11:14, 10 June 2011 (EDT)

Thank you for the input. I'm working on an approach that removes system-generated categories and includes places and user-generated categories as search facets; we'll see how that works out. I don't recall asking anyone to not fill in gaps or correct the place database. I think it would be great if more SME's would volunteer to review and correct the place database.--Dallan 12:56, 10 June 2011 (EDT)