[tcs-lc] Misspelled Names and Orthographic Variants (Issue 005)

Roger Hyam roger at hyam.net
Fri Apr 29 02:08:46 PDT 2005


Three levels of nested comments here - never a good thing...

>>1) All names that look, sound and smell like a scientific name should be
>>created as NameObjects. This is because:
>>a) Someone may have used them as part of a concept or concepts somewhere.
>>b) They may or may not be erroneous. We can't say don't mark up the
>>erroneous ones as NameObjects because you might not know they are
>>erroneous.
>>    
>>
>
>That's what I was afraid of -- and the specific reason I sent that list.
>So, if I understand you correctly, NameObject=NameString (where NameString
>is any unique sequence of Unicode characters, which somewhat resembles an
>attempt at a scientific name).
>
>Is this correct?
>  
>
No. If I had meant that a NameObject was just a string I would have 
written that. It has to be a set of characters (not all the Unicode 
ones, just those acceptable under the codes) that someone might believe 
to be a published scientific name of a biological organism governed by 
one of the codes. Yes, a name object could be constructed as a random set 
of characters, but there would be little point in doing it. One has to 
have an intent to use the construct for something.
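To illustrate the difference between an arbitrary Unicode string and a candidate NameObject, here is a minimal sketch of a character-level filter. The allowed character set (unaccented Latin letters, spaces, hyphens and periods for rank abbreviations) is an assumption made for illustration, not what any code actually permits, and `could_be_name_object` is a hypothetical name. It cannot check intent, which is the part that matters above.

```python
import re

# Assumed, simplified character set: unaccented Latin letters, spaces,
# hyphens and periods (e.g. "var."). The real rules differ between codes.
_ALLOWED = re.compile(r"[A-Za-z][A-Za-z .\-]*")


def could_be_name_object(text: str) -> bool:
    """Return True if `text` uses only the assumed name characters.

    This is only a character filter; it cannot tell whether anyone
    believes the string to be a published scientific name.
    """
    return bool(_ALLOWED.fullmatch(text.strip()))


print(could_be_name_object("Aus bus"))      # True
print(could_be_name_object("Aus bus 42!"))  # False
```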

>>2) The PublicationStatus element can be used as a human readable note in
>>NameObjects to indicate that a name is an orthographic variant of another
>>name. Other than this there should not be name-name links to indicate
>>misspellings.
>>    
>>
>
>But, but, but....I thought the whole point of making names as objects was to
>allow direct name-name relationships???
>
>  
>
You can put name-name relationships in the PublicationStatus element as 
well as many of the other elements in the schema.

>>3) To mark up a misspelling one should create a link between two
>>TaxonConcepts.
>>    
>>
>
>Well...does that mean that all misspellings must be attached to "Defined
>Concepts"?  Or, does it mean that not all misspellings will be represented
>as TCS objects?
>
>  
>
If you read the next three points, you'll see I explain...

>>a) For an author to misspell a name they must have used it to refer to a
>>concept of some kind that it would be useful to reason about.
>>    
>>
>
>Agreed -- but I thought that not all name-usages rose to the level of
>"defined concepts"?
>  
>
Who misspelled the name? If the person who did it was circumscribing a 
taxon, then you create a concept and possibly a NameObject etc. You could 
create an empty concept with them as the AccordingTo if you don't know 
anything about their circumscription. On the other hand, if they 
misspelled it in handwriting, on a determination slip, on a Friday 
afternoon 120 years ago, and you don't believe anyone else has used that 
spelling, I would be tempted not to create a concept or a name. This kind 
of thing is more likely to be recorded in ABCD anyhow. If you believe 
that the handwritten scrawl could be a 'real name', in that it might 
actually have been published somewhere, then you could create a 
NameObject and possibly a nominal concept if you want to attach the 
specimen's determination to it.

It is horses for courses on this. That is why I would like to start 
talking about real-life examples now.
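The rules of thumb above can be read as a small decision procedure. This is one possible reading only, since the point is that it is a judgement call; `MisspellingSource`, its two flags and the `Action` labels are hypothetical names introduced for the sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONCEPT = "create a TaxonConcept (empty if circumscription unknown) and possibly a NameObject"
    NAME_AND_NOMINAL_CONCEPT = "create a NameObject and possibly a nominal concept"
    NOTHING = "create neither a concept nor a name"


@dataclass
class MisspellingSource:
    """Hypothetical summary of where a misspelled name turned up."""
    circumscribing_taxon: bool  # the person was circumscribing a taxon
    may_be_real_name: bool      # spelling might have been published or used by others


def what_to_create(src: MisspellingSource) -> Action:
    if src.circumscribing_taxon:
        # The person who misspelled the name becomes the AccordingTo.
        return Action.CONCEPT
    if src.may_be_real_name:
        # e.g. a handwritten scrawl that might actually be a published name
        return Action.NAME_AND_NOMINAL_CONCEPT
    # A one-off slip on a handwritten label: leave it to specimen data (ABCD).
    return Action.NOTHING


print(what_to_create(MisspellingSource(circumscribing_taxon=False, may_be_real_name=False)).value)
# prints "create neither a concept nor a name"
```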

>>b) The person who misspelled the name should be in the according to.
>>    
>>
>
>I'm happy with that -- provided that there is a liberal allowance for
>representing usages as concepts (that are mapped congruently to other more
>well-defined concepts).
>
>In fact, part of the reason for my previous query was to get at exactly this
>issue:  I think if you are going to define "names" as Objects, there should
>be a 1:1 ratio between unique NameObject instances, and
>Basionyms+NewCombinations (botanical perspective), or perhaps even a 1:1
>ratio between NameObject instances and terminal epithets of Basionyms alone
>(zoological perspective).  That way, misspellings & such are captured in a
>human-readable "VerbatimSpelling" element of each TC instance, but point to
>a well-defined NameObject that excludes misspellings (i.e., "AccordingTo
>Author used this text string, but really meant this NameObject").
>
>  
>
The only trouble is you can't assume that someone knows they have a 
wrongly spelled name. They may mark it up as an accepted name because 
they think it is. Then we have to deal with it, and the way we deal with 
it depends on who has concepts relating to it.

>It makes no sense to me to create a full structure for top-level NameObjects
>if NameObject=NameString.  A text string is better represented as a simple
>element within the TC substructure -- you don't need a defined object for
>that.  The whole point of defining NameObjects (I thought) was to treat them
>as complex properties, with myriad Name-Name relationships, not to be
>confused with TaxonConcepts, which have only one or a very few Concept-Name
>relationships, but potentially many Concept-Concept relationships.
>
>  
>
It is, and that is what we have done, I think. Check out the ReferenceType 
complex type.

>>c) If the author misspelled a name when they initially published it
>>(e.g. wrong gender) there may be some concepts that use the incorrect
>>spelling and some that use the correct spelling. All these concepts
>>should be capable of being related to each other in terms of set
>>relationships.
>>    
>>
>
>Exactly!!!!  But there is no special need to relate the "Aus bea" concepts
>to other "Aus bea" concepts, exclusive of "Aus bus" concepts.  So it makes
>no sense to me to define two separate name objects ("Aus bea" and "Aus
>bus").  Rather, there should be ONE Name Object (which has at minimum
>attributes detailing original orthography and Code-correct orthography, but
>not necessarily all possible orthographies), and then all "Aus bea" and "Aus
>bus" TC objects would point to the SAME NameObject.  Whether the AccordingTo
>author spelled the species epithet "bea" or "bus" is trivial both
>nomenclaturally and conceptually, and is therefore relegated to a
>"VerbatimSpelling" element within each TC instance (which would probably be
>the element used for text-match searches).  What really matters is that the
>"Aus bea" authors and the "Aus bus" authors intended to refer to the same
>"name object" -- and it just seems like a no-brainer to me that you would
>represent this fact by linking both sets of TC instances to the same
>NameObject instance.
>
>  
>
That would be great. All we need to do is know ahead of time whether a 
name is a misspelling so we could decide not to mark it up, then tell 
everyone it is a misspelling of another name so they don't mark it up 
either (we could invent an exchange mechanism to do this :). In reality 
these objects will be created. They can be linked together if need be, 
either at the concept level (if we are comparing taxa based on the 
names) or at the nomenclatural level. I just can't see what your problem 
is here.
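Linking at the nomenclatural level might look roughly like this: both spellings get marked up as separate NameObjects, and a name-name link is added once someone knows better. The class is a simplified stand-in rather than the TCS schema; the `name_links` field, the `link` method and the label "is orthographic variant of" are hypothetical, and "Aus bus" is assumed to be the correct spelling.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NameObject:
    # Simplified stand-in for a NameObject; fields are illustrative only.
    spelling: str
    # Hypothetical name-name links as (relationship label, target) pairs.
    name_links: list[tuple[str, NameObject]] = field(default_factory=list)

    def link(self, relationship: str, other: NameObject) -> None:
        self.name_links.append((relationship, other))


# Both spellings are marked up independently, because whoever marked up
# "Aus bea" did not know it was a variant.
correct = NameObject("Aus bus")  # assumed correct spelling
variant = NameObject("Aus bea")

# Later, someone who knows better links them at the nomenclatural level.
variant.link("is orthographic variant of", correct)

for label, target in variant.name_links:
    print(f"{variant.spelling} {label} {target.spelling}")
# prints "Aus bea is orthographic variant of Aus bus"
```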

>>In order to do 3 there needs to be a 'is misspelling of' concept
>>relationship type that is currently missing from the schema.
>>    
>>
>
>I think it would be a mistake to go that route.  We don't need that level of
>complexity, when a simple "VerbatimSpelling" would both capture the
>human-readable reality of the text string that appeared in the publication,
>and serve as the perfect field to search through for text matches.
>  
>
We need the misspelling relationship because two Concepts (with 
different circumscriptions) may have names that have that relationship. 
We can't say they are the same thing simply because they both use 
different versions of the same name. We have to relate them with 'is 
congruent' or 'overlaps' or something, but it would also be useful to 
say the equivalent of "we believe this taxon uses a different spelling 
of the same scientific name as that taxon". You could do this without 
even creating a name object for one of the TaxonConcepts if you like. 
The thing is pretty damn flexible.
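A sketch of the proposed concept-level relationship sitting alongside the set relationships, with one concept carrying no NameObject at all. The enum values and class shape are assumptions for illustration, not the schema's actual types; IS_MISSPELLING_OF stands for the 'is misspelling of' type described as missing from the schema, and the authors "Smith 1890" and "Jones 1905" are invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ConceptRelationship(Enum):
    # Set relationships between circumscriptions (illustrative subset).
    IS_CONGRUENT = "is congruent"
    OVERLAPS = "overlaps"
    # The proposed relationship type, not yet in the schema.
    IS_MISSPELLING_OF = "is misspelling of"


@dataclass
class TaxonConcept:
    according_to: str
    name: Optional[str] = None  # a concept need not have a NameObject
    relationships: list[tuple[ConceptRelationship, TaxonConcept]] = field(
        default_factory=list
    )

    def relate(self, kind: ConceptRelationship, other: TaxonConcept) -> None:
        self.relationships.append((kind, other))


smith = TaxonConcept(according_to="Smith 1890", name="Aus bus")
jones = TaxonConcept(according_to="Jones 1905")  # no NameObject created

# Different circumscriptions, so not simply "the same thing"...
jones.relate(ConceptRelationship.OVERLAPS, smith)
# ...but we believe Jones used a different spelling of the same name.
jones.relate(ConceptRelationship.IS_MISSPELLING_OF, smith)

for kind, other in jones.relationships:
    print(f"{jones.according_to} {kind.value} {other.according_to}")
```

Keeping the spelling relationship separate from the set relationships means a reader can state both facts independently: how the circumscriptions compare, and whether the names are variants of one another.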

Things will become clear as we work through real examples. I am 
currently doing the ones on the LC wiki and will post them when 
complete. I can't guarantee I will get through them all, though, as it 
is quite time-consuming.

Roger

-- 

==============================================
 Roger Hyam
----------------------------------------------
 Biodiversity Informatics
 Independent Web Development 
----------------------------------------------
 http://www.hyam.net  roger at hyam.net
----------------------------------------------
 2 Janefield Rise, Lauder, TD2 6SP, UK.
 T: +44 (0)1578 722782 M: +44 (0)7890 341847
==============================================

