unitDictionary additions made; should we do a supplemental release?

Dan Higgins higgins at nceas.ucsb.edu
Wed Mar 26 12:02:07 PST 2003


Matt,
    I think you pinpointed the problem: the Morpho failure came from 
special characters pasted in from Word!
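
A minimal Python sketch of that failure mode, assuming the pasted degree
sign was written out as a single Windows-1252 byte while the XML prolog
still declared UTF-8. The one-element sample document and the helper
name parses_cleanly are hypothetical and are not taken from Morpho or
from eml-unitDictionary.xml:

```python
import xml.etree.ElementTree as ET

# Hypothetical one-unit document; the prolog declares UTF-8.
DOC = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<unit id="degree" abbreviation="\u00b0"/>'
)


def parses_cleanly(raw: bytes) -> bool:
    """Return True if the XML parser accepts the raw bytes as written."""
    try:
        ET.fromstring(raw)
        return True
    except ET.ParseError:
        return False


# Written out as real UTF-8: U+00B0 becomes the two bytes C2 B0.
good = DOC.encode("utf-8")

# Written out by an editor using Windows-1252: U+00B0 becomes the lone
# byte B0, which is not a valid UTF-8 sequence on its own.
bad = DOC.encode("cp1252")

print(parses_cleanly(good))
print(parses_cleanly(bad))
```

Run as written, this should report True for the UTF-8 bytes and False
for the Windows-1252 bytes, which is the same kind of disagreement
between the declared and the actual encoding described in the quoted
message below.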

Dan

Matt Jones wrote:

> I bet that 'invalid UTF-8 encoding' actually means that someone used a 
> not-so-smart text editor on a UTF-8 encoded XML file, such that the 
> xml declaration in the prolog still says that it should be UTF-8 
> encoded, but the UTF-8 encoding was not actually used when the file 
> was written out.  Basically the metadata/data disagree :)  The degree 
> symbol is a valid Unicode character with code point U+00B0, which 
> UTF-8 encodes as the two bytes 0xC2 0xB0. You can find the Unicode 
> symbols at:
>     http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
> and a good overview of characters, character sets, and character 
> encodings at:
> http://technocage.com/~ray/notespage.jsp?pageName=charenc&pageTitle=Character+Encoding
>
> Matt
>
> Dan Higgins wrote:
>
>> Matt,
>>    The problem is that special characters like the degree symbol do 
>> not map properly to UTF-8 characters for some reason. The error is 
>> 'invalid UTF-8 encoding'. There is a bug report in Bugzilla.
>>
>> Dan
>>
>> Matt Jones wrote:
>>
>>> Dan,
>>>
>>> Actually, we made an explicit decision to use a UTF-8 encoding for 
>>> the eml-unitDictionary.xml file specifically so that we could use 
>>> non-ASCII characters like superscripts, degree symbols, and others 
>>> that are common symbols for units.  Any compliant XML parser should 
>>> be able to deal fine with UTF-8 and UTF-16 (the XML specification 
>>> requires support for both), and I think Morpho needs to be adjusted 
>>> to accept any character that is legal in a well-formed XML document.
>>>
>>> Matt
>>>
>>> Dan Higgins wrote:
>>>
>>>> Scott,
>>>>    I was looking over your message and noted your use of "m²". It 
>>>> is of interest that the superscript '²' is not an ASCII character 
>>>> (i.e. the upper bit of its 8-bit representation is set, while 
>>>> ASCII uses only the lower 7 bits). This may not be a problem in 
>>>> most cases, but we ran into a similar issue with the special 
>>>> character for 'degrees' in Morpho, where some of the Unicode 
>>>> handling in Java did not recognize such high-order-bit characters. 
>>>> (I think the main problem was with the Xalan XSLT processor not 
>>>> working when such special characters were in the document that was 
>>>> being transformed.)
>>>>
>>>>    It took us a good bit of effort to diagnose the problem, and we 
>>>> are recommending that special 8-bit characters be avoided. So just 
>>>> a word of warning, and a suggestion that maybe we should avoid 
>>>> such characters in EML docs.
>>>>
>>>> Dan Higgins
>>>> NCEAS
>>>>
>>>>
>>>>
>>>> Scott Chapal wrote:
>>>>
>>>>> [For Solar Radiation], does this look reasonable?
>>>>>
>>>>>  <!--powerFlux-->
>>>>>  <unitType id="powerFlux" name="powerFlux"> 
>>>>> <!--wattsPerMeterSquared-->
>>>>>     <dimension name="power"/>
>>>>>        <dimension name="length" power="-2"/>
>>>>>  </unitType>
>>>>>
>>>>>  <!--energyFlux-->
>>>>>  <unitType id="energyFlux" name="energyFlux"> 
>>>>> <!--kiloJoulesPerMeterSquared-->
>>>>>        <dimension name="energy"/>
>>>>>        <dimension name="length" power="-2"/>
>>>>>  </unitType>
>>>>>
>>>>>  <unit id="wattsPerMeterSquared" name="wattsPerMeterSquared"
>>>>>        abbreviation="W/m²"
>>>>>        multiplierToSI="1">
>>>>>    <description>Watts per Meter Squared</description>
>>>>>  </unit>
>>>>>
>>>>>  <unit id="kiloJoulesPerMeterSquared" 
>>>>> name="kiloJoulesPerMeterSquared"
>>>>>        abbreviation="kJ/m²"
>>>>>        multiplierToSI="1000">
>>>>>    <description>Kilo Joules per Meter Squared</description>
>>>>>  </unit>
>>>>>
>>>>> BTW, this exercise makes me wonder if we aren't re-inventing the
>>>>> wheel, OR inventing a wheel that we shouldn't have to.   Has anybody
>>>>> reviewed:
>>>>>
>>>>> http://www.unc.edu/~rowlett/units/index.html
>>>>>
>>>>> He certainly seems to have some expertise in the area.
>>>>>
>>>>> And, what about NIST itself, or some other government standards body?
>>>>> Why are WE having to do this and keep it correct and up to date?  I
>>>>> definitely believe the unit dictionary should be de-coupled from EML
>>>>> in the next release.
>>>>>
>>>>> -Scott
>>>>>
>>>>> Scott Chapal <scott.chapal at jonesctr.org> writes:
>>>>>
>>>>>  
>>>>>
>>>>>> Has the thinking on the unit-dictionary progressed?
>>>>>>
>>>>>> What is discussed below seems a bit heavyweight to me.  Why can't a
>>>>>> versioned unit dictionary exist as a simple stand alone schema
>>>>>> document, referencable via a namespace declaration?  Considerations
>>>>>> for backward compatibility would obviously apply.
>>>>>>
>>>>>> Working with our climate data, I found the need for:
>>>>>>
>>>>>> kiloPascal
>>>>>> wattsPerMeterSquared
>>>>>> kiloJoulePerMeterSquared
>>>>>> Fuel Moisture % - percentWaterContentByWeight ??
>>>>>>
>>>>>> Relative Humidity is presumably unitless?
>>>>>>
>>>>>> -Scott
>>>
>>> _______________________________________________
>>> eml-dev mailing list
>>> eml-dev at ecoinformatics.org
>>> http://www.ecoinformatics.org/mailman/listinfo/eml-dev


-- 
*******************************************************************
Dan Higgins                                  higgins at nceas.ucsb.edu
http://www.nceas.ucsb.edu/    Ph: 805-892-2531
National Center for Ecological Analysis and Synthesis (NCEAS) 
735 State Street - Room 205
Santa Barbara, CA 93195
*******************************************************************




