unitDictionary additions made; should we do a supplemental release?
Dan Higgins
higgins at nceas.ucsb.edu
Wed Mar 26 12:02:07 PST 2003
Matt,
I think you pinpointed the problem: the Morpho error came from
special characters pasted from Word!
Dan
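
A minimal sketch of the mismatch Matt describes in the quoted message
below: a file whose XML declaration says UTF-8 but whose bytes were
written in a single-byte encoding. It is an illustration only, not code
from Morpho. Python is used for brevity, and the choice of Latin-1 as
the editor's encoding, the sample element, and the helper name
is_valid_utf8 are all assumptions.

```python
# Illustration only: assumes the "not-so-smart" editor saved the file as
# Latin-1 (ISO-8859-1) while leaving the UTF-8 declaration in place.

DEGREE = "\u00b0"  # degree sign, Unicode code point U+00B0

PROLOG = b'<?xml version="1.0" encoding="UTF-8"?>'


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# In UTF-8 the degree sign takes two bytes; in Latin-1 it is one byte.
utf8_bytes = DEGREE.encode("utf-8")      # b'\xc2\xb0'
latin1_bytes = DEGREE.encode("latin-1")  # b'\xb0'

# Same declaration, different bytes actually written to disk.
good_doc = PROLOG + b"<abbreviation>" + utf8_bytes + b"C</abbreviation>"
bad_doc = PROLOG + b"<abbreviation>" + latin1_bytes + b"C</abbreviation>"

print(is_valid_utf8(good_doc))  # True
print(is_valid_utf8(bad_doc))   # False: a lone 0xB0 byte is not valid UTF-8
```

Re-saving the file in the encoding its declaration names, or changing
the declaration to match the bytes, makes the two agree again.
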
Matt Jones wrote:
> I bet that 'invalid UTF-8 encoding' actually means that someone used a
> not-so-smart text editor on a UTF-8 encoded XML file, such that the
> xml declaration in the prolog still says that it should be UTF-8
> encoded, but the UTF-8 encoding was not actually used when the file
> was written out. Basically the metadata/data disagree :) The degree
> symbol is a valid Unicode character (code point U+00B0) that UTF-8
> can encode. You can find the Unicode symbols at:
> http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
> and a good overview of characters, character sets, and character
> encodings at:
> http://technocage.com/~ray/notespage.jsp?pageName=charenc&pageTitle=Character+Encoding
>
> Matt
>
> Dan Higgins wrote:
>
>> Matt,
>> The problem is that special characters like the degree symbol do
>> not properly map to UTF-8 characters for some reason. The error is
>> 'invalid UTF-8 encoding'. There is an entry for it in Bugzilla.
>>
>> Dan
>>
>> Matt Jones wrote:
>>
>>> Dan,
>>>
>>> Actually, we made an explicit decision to use a UTF-8 encoding for
>>> the eml-unitDictionary.xml file specifically so that we could use
>>> non-ASCII characters like superscripts, degree symbols, and others
>>> that are common symbols for units. Any compliant XML parser should
>>> be able to handle UTF-8, UTF-16, and other Unicode encodings, and
>>> I think Morpho needs to be adjusted to accept any character that
>>> is legal in a well-formed XML document.
>>>
>>> Matt
>>>
>>> Dan Higgins wrote:
>>>
>>>> Scott,
>>>> I was looking over your message and noted your use of "m²". It
>>>> is of interest that the superscript '²' is not an ASCII character
>>>> (i.e. the upper bit of its 8-bit representation is set, while
>>>> ASCII uses only the lower 7 bits). This may not be a problem in
>>>> most cases, but we ran into a similar issue with the special
>>>> character for 'degrees' in Morpho, where some Unicode/Java code
>>>> did not recognize such high-order-bit characters. (I think the
>>>> main problem was that the Xalan XSLT processor failed when such
>>>> special characters were in the document being transformed.)
>>>>
>>>> It took us a good bit of effort to diagnose the problem, and we
>>>> are recommending that any special 8-bit characters be avoided. So
>>>> just a word of warning, and a suggestion that maybe we should
>>>> avoid such characters in EML docs.
>>>>
>>>> Dan Higgins
>>>> NCEAS
>>>>
>>>>
>>>>
>>>> Scott Chapal wrote:
>>>>
>>>>> [For Solar Radiation], does this look reasonable?
>>>>>
>>>>> <!--powerFlux-->
>>>>> <unitType id="powerFlux" name="powerFlux">
>>>>> <!--wattsPerMeterSquared-->
>>>>> <dimension name="power"/>
>>>>> <dimension name="length" power="-2"/>
>>>>> </unitType>
>>>>>
>>>>> <!--energyFlux-->
>>>>> <unitType id="energyFlux" name="energyFlux">
>>>>> <!--kiloJoulesPerMeterSquared-->
>>>>> <dimension name="energy"/>
>>>>> <dimension name="length" power="-2"/>
>>>>> </unitType>
>>>>>
>>>>> <unit id="wattsPerMeterSquared" name="wattsPerMeterSquared"
>>>>> abbreviation="W/m²"
>>>>> multiplierToSI="1">
>>>>> <description>Watts per Meter Squared</description>
>>>>> </unit>
>>>>>
>>>>> <unit id="kiloJoulesPerMeterSquared"
>>>>> name="kiloJoulesPerMeterSquared"
>>>>> abbreviation="kJ/m²"
>>>>> multiplierToSI="1000">
>>>>> <description>Kilo Joules per Meter Squared</description>
>>>>> </unit>
>>>>>
>>>>> BTW, this exercise makes me wonder if we aren't re-inventing the
>>>>> wheel, OR inventing a wheel that we shouldn't have to. Has anybody
>>>>> reviewed:
>>>>>
>>>>> http://www.unc.edu/~rowlett/units/index.html
>>>>>
>>>>> He certainly seems to have some expertise in the area.
>>>>>
>>>>> And, what about NIST itself, or some other government standards body?
>>>>> Why are WE having to do this and keep it correct and up to date? I
>>>>> definitely believe the unit dictionary should be de-coupled from EML
>>>>> in the next release.
>>>>>
>>>>> -Scott
>>>>>
>>>>> Scott Chapal <scott.chapal at jonesctr.org> writes:
>>>>>
>>>>>
>>>>>
>>>>>> Has the thinking on the unit-dictionary progressed?
>>>>>>
>>>>>> What is discussed below seems a bit heavyweight to me. Why can't a
>>>>>> versioned unit dictionary exist as a simple standalone schema
>>>>>> document, referenceable via a namespace declaration?
>>>>>> Considerations for backward compatibility would obviously apply.
>>>>>>
>>>>>> Working with our climate data, I found the need for:
>>>>>>
>>>>>> kiloPascal
>>>>>> wattsPerMeterSquared
>>>>>> kiloJoulePerMeterSquared
>>>>>> Fuel Moisture % - percentWaterContentByWeight ??
>>>>>>
>>>>>> Relative Humidity is presumably unitless?
>>>>>>
>>>>>> -Scott
>>>>>>
>>> _______________________________________________
>>> eml-dev mailing list
>>> eml-dev at ecoinformatics.org
>>> http://www.ecoinformatics.org/mailman/listinfo/eml-dev
--
*******************************************************************
Dan Higgins higgins at nceas.ucsb.edu
http://www.nceas.ucsb.edu/ Ph: 805-892-2531
National Center for Ecological Analysis and Synthesis (NCEAS)
735 State Street - Room 205
Santa Barbara, CA 93195
*******************************************************************