Measurement scale in EML
Matt Jones
jones at nceas.ucsb.edu
Mon Feb 28 09:31:01 PST 2005
Hi Xiaoping,
As Peter mentioned, the issues you raise have come up before. Below are
some additional recommendations, from my personal perspective, beyond
those in Peter's reply.
Xiaoping Wang wrote:
> Dear Matt and Peter:
>
> I have seen a lot of discussion recently about measurement scale and
> temporal coverage. It has been very helpful for understanding EML.
> The following are questions and concerns that came up during my work
> on our EML-based metadata.
>
> 1. About the Measurement scale
>
> The measurementScale element is a little confusing. I spent a lot of
> time working on the measurementScale for nominal data. Here is an
> example of how I use measurementScale to describe nominal data in our
> dataset, so you can check whether my implementation reflects a correct
> understanding of this element.
>
> We have a data table with four columns (attributes): recordID,
> variable_name, variable_unit, and variable_value. The values in the
> variable_name column are names of chemical and physical properties of
> sea water, such as temperature, salinity, nitrate, and so on. The
> following is a sample piece of my EML file for this dataset.
>   <attribute>
>     <attributeName>varName</attributeName>
>     <attributeDefinition>Name of chemical or physical property measured</attributeDefinition>
>     <storageType>string</storageType>
>     <measurementScale>
>       <nominal>
>         <nonNumericDomain>
>           <enumeratedDomain>
>             <codeDefinition>
>               <code>T</code>
>               <definition>Temperature, unit: C</definition>
>             </codeDefinition>
>             <codeDefinition>
>               <code>S</code>
>               <definition>Salinity, unit: PPT</definition>
>             </codeDefinition>
>             <codeDefinition>
>               <code>ST</code>
>               <definition>Sigma-T, unit: KG/M**3</definition>
>             </codeDefinition>
>           </enumeratedDomain>
>         </nonNumericDomain>
>       </nominal>
>     </measurementScale>
>   </attribute>
>   <attribute>
>     <attributeName>varUnit</attributeName>
>     <attributeDefinition>Unit of chemical or physical property measured</attributeDefinition>
>     <storageType>string</storageType>
>     <measurementScale>
>       <nominal>
>         <nonNumericDomain>
>           <textDomain>
>             <definition>*</definition>
>           </textDomain>
>         </nonNumericDomain>
>       </nominal>
>     </measurementScale>
>   </attribute>
>
> My questions / concerns are:
> (1) Is it suitable to use the enumeratedDomain element to describe varName?
Yes, that is fine, although free text would be OK too if you prefer
(just use textDomain instead of enumeratedDomain). Encoding the unit
information in the code definitions for varName is redundant if the same
unit information is already in the varUnit column.
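A minimal sketch of the two options for varName, written in Python with the standard library's xml.etree.ElementTree (not part of the original thread). The helper nominal_scale is hypothetical; it emits either an enumeratedDomain or a textDomain under the same measurementScale/nominal/nonNumericDomain path used in the example above, and the definitions leave the units out so they are not duplicated from varUnit.

```python
import xml.etree.ElementTree as ET


def nominal_scale(codes=None, text_definition=None):
    """Build an EML <measurementScale><nominal> element (hypothetical helper).

    Pass `codes` (dict of code -> definition) for an enumeratedDomain,
    or `text_definition` for a free-text textDomain; exactly one is required.
    """
    if (codes is None) == (text_definition is None):
        raise ValueError("give exactly one of codes or text_definition")
    scale = ET.Element("measurementScale")
    domain = ET.SubElement(ET.SubElement(scale, "nominal"), "nonNumericDomain")
    if codes is not None:
        # Closed list of allowed codes, each with its own definition.
        enum = ET.SubElement(domain, "enumeratedDomain")
        for code, definition in codes.items():
            cd = ET.SubElement(enum, "codeDefinition")
            ET.SubElement(cd, "code").text = code
            ET.SubElement(cd, "definition").text = definition
    else:
        # Free text: a single definition describing what the values are.
        text = ET.SubElement(domain, "textDomain")
        ET.SubElement(text, "definition").text = text_definition
    return scale


enum_scale = nominal_scale(codes={"T": "Temperature", "S": "Salinity", "ST": "Sigma-T"})
print(ET.tostring(enum_scale, encoding="unicode"))
```

The enumerated form is more useful for validation and discovery; the text form is simpler when the set of variable names keeps growing.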
>
> (2) For varUnit, I don't think the measurementScale element is
> necessary. However, since measurementScale is a required field, I have
> to put something there in order to pass EML validation, so I put a "*"
> in the definition element. I have seen other, similar cases in which
> EML metadata developers use a "*" for the definition element.
> Obviously, the measurementScale content here tells nothing useful
> about varUnit.
The use of the '*' is inappropriate. The field is required because the
authors of EML thought the information was important. In this case, I
think you should put something in the definition that indicates that the
values are names of units. One major thing missing here is that you
don't use the EML Unit Dictionary when choosing your unit definitions.
That gives up one of EML's major advantages: the ability to provide
quantitative information about units. If there is a 1:1 correspondence
between your units and the EML unit dictionary, I think it would be good
to define varUnit as an enumerated domain and, for each of your units,
give the EML standard name for the unit in the definition. This would
help in translating, although it is unlikely that anyone could use it in
automated systems because it's such a non-standard use of the EML
descriptors.
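To make the enumerated-unit suggestion concrete, here is a hypothetical Python sketch (standard-library ElementTree) that builds the varUnit enumeratedDomain from a mapping of local unit labels to EML unit dictionary names. The three standard names in LOCAL_TO_EML_UNIT are assumptions and should be checked against the EML unit dictionary before use; check_units_in_data is a hypothetical helper that flags varUnit values the mapping does not cover, so a 1:1 correspondence can be confirmed first.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: candidate EML unit dictionary names for the local labels in
# the varUnit column; verify each one against the unit dictionary.
LOCAL_TO_EML_UNIT = {
    "C": "celsius",
    "PPT": "partsPerThousand",
    "KG/M**3": "kilogramPerMeterCubed",
}


def unit_enumeration(mapping):
    """Build an enumeratedDomain whose codes are the local unit labels and
    whose definitions name the corresponding EML standard unit."""
    enum = ET.Element("enumeratedDomain")
    for local, eml_name in mapping.items():
        cd = ET.SubElement(enum, "codeDefinition")
        ET.SubElement(cd, "code").text = local
        ET.SubElement(cd, "definition").text = f"Name of a unit; EML standard unit: {eml_name}"
    return enum


def check_units_in_data(values, mapping):
    """Return, sorted, the varUnit values that have no entry in the mapping."""
    return sorted(set(values) - set(mapping))


print(ET.tostring(unit_enumeration(LOCAL_TO_EML_UNIT), encoding="unicode"))
```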
In general, this model of variable name, unit, and value is a
non-standard use of the relational model, because the attributes do not
really represent a single type. The relational model is generally
intended to have attributes that contain a semantically homogeneous set
of values. In your case this is not true, except at a meta-level; you
are essentially using the relational model as a schema language itself.
This significantly complicates use of the data in standard analytical
systems (e.g., SAS, S-Plus, R, Matlab), which basically all require
different views of the data, as described in Peter's note. Personally, I
think that documenting these more traditional views, if you have them,
would be far more useful to scientists who wish to analyze the data.
That would have the added benefit of being better described by EML
structures. Documenting your "meta-level" schema isn't particularly
informative because the information in one attribute is so
heterogeneous.
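The more traditional view described above can often be derived mechanically. The following Python sketch is an illustration, not something from the thread: pivot_wide and the sample rows are hypothetical, and it assumes recordID identifies one sample so that each (recordID, variable_name) pair occurs at most once. It turns the long table into one row per recordID with one column per variable, and refuses to build a column whose rows mix units, since such a column would not be homogeneous.

```python
from collections import defaultdict

# Hypothetical long-format rows: (recordID, variable_name, variable_unit, variable_value).
LONG_ROWS = [
    (1, "T", "C", 12.4),
    (1, "S", "PPT", 33.1),
    (2, "T", "C", 11.9),
    (2, "S", "PPT", 33.4),
    (2, "ST", "KG/M**3", 25.6),
]


def pivot_wide(rows):
    """Return (columns, units, wide): one row per recordID, one column per variable.

    Missing measurements become None. Raises ValueError if a variable appears
    with more than one unit or twice for the same record.
    """
    units = {}
    table = defaultdict(dict)
    for record_id, name, unit, value in rows:
        if units.setdefault(name, unit) != unit:
            raise ValueError(f"variable {name!r} has mixed units: {units[name]!r} vs {unit!r}")
        if name in table[record_id]:
            raise ValueError(f"duplicate {name!r} for record {record_id}")
        table[record_id][name] = value
    columns = sorted(units)
    wide = [
        [record_id] + [table[record_id].get(name) for name in columns]
        for record_id in sorted(table)
    ]
    return columns, units, wide


columns, units, wide = pivot_wide(LONG_ROWS)
print(["recordID"] + columns)
for row in wide:
    print(row)
```

Each resulting column (temperature, salinity, sigma-T) can then be documented as an ordinary EML attribute with its own measurementScale and a single standard unit.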
>
> 2. About the information of metadata itself
>
> Based on my understanding of the EML schemas, the only information
> about the metadata itself is the information about the metadata
> provider(s). However, my supervisors and I think it is important to
> provide other information about the metadata, such as when the
> metadata document was created, whether further updates to the metadata
> are needed, and if so, the update frequency and the date of the last
> update. This information is particularly important when the endDate
> value for a dataset from an ongoing project is going to change,
> because, first, it can remind metadata providers and developers when
> they should update their metadata, and second, it can tell metadata
> users whether the metadata document provides the most current
> information about the dataset described.
Sure. In hindsight, I think we should have included fields like these,
particularly the timestamp fields. But we do have some related fields
that describe ongoing data collection. Take a look at
/eml/dataset/maintenance/description and
/eml/dataset/maintenance/maintenanceUpdateFrequency; the latter is
probably what you want. Any fields that you want but that don't exist in
the schema can be put in the "/eml/additionalMetadata" field, so you
always have that as a recourse. If you have specific recommendations for
fields that are needed, you could send them to
eml-dev at ecoinformatics.org and we'll try to get them into plans for a
future release.
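A hypothetical Python sketch of both recourses, built with the standard library's ElementTree. build_maintenance fills /eml/dataset/maintenance with a description paragraph and a maintenanceUpdateFrequency value; "continually" is used as an example and is assumed to be one of the schema's allowed values. build_additional_metadata carries metadata timestamps in made-up elements (metadataDates, created, lastUpdated) that are not part of EML. It follows the EML 2.1 layout, with a metadata wrapper inside additionalMetadata; earlier EML versions place the custom element differently, so check the schema version you validate against.

```python
import xml.etree.ElementTree as ET


def build_maintenance(description, frequency):
    """Build a <maintenance> element (placed under /eml/dataset)."""
    maint = ET.Element("maintenance")
    desc = ET.SubElement(maint, "description")
    ET.SubElement(desc, "para").text = description
    # ASSUMPTION: `frequency` is one of the schema's enumerated values.
    ET.SubElement(maint, "maintenanceUpdateFrequency").text = frequency
    return maint


def build_additional_metadata(created, last_updated):
    """Build an <additionalMetadata> block holding HYPOTHETICAL, non-EML
    elements that record when the metadata was created and last updated."""
    extra = ET.Element("additionalMetadata")
    wrapper = ET.SubElement(extra, "metadata")  # EML 2.1-style wrapper
    dates = ET.SubElement(wrapper, "metadataDates")
    ET.SubElement(dates, "created").text = created
    ET.SubElement(dates, "lastUpdated").text = last_updated
    return extra


m = build_maintenance("New data are appended to this table as they arrive.", "continually")
print(ET.tostring(m, encoding="unicode"))
print(ET.tostring(build_additional_metadata("2005-01-10", "2005-02-28"), encoding="unicode"))
```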
>
> 3. About the temporal coverage
>
> We have many metadata records with an uncertain endDate because new
> data are continuously loaded into the dataset. Whenever new data are
> loaded, we have to change the values for end date, number of records,
> and/or size of the table. I am wondering when you can provide a
> solution for this issue.
Personally, I think updating the metadata this way is a good thing. At
any given point in time there is a finite amount of data available, and
the metadata should describe that. If you have an automated data
collection process, then you would simply update your metadata as part
of that process. The number of records, table size, and checksum let
people who download your data verify that they received it without
error. The end date for temporal coverage provides valuable discovery
information and should simply be made to match the data that you
release.
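A minimal sketch of that update step in Python (standard library only), assuming the released table is a delimited text file with one header line and that the EML tree already contains dataTable/numberOfRecords, physical/size, physical/authentication, and a temporalCoverage endDate. The function name refresh_table_metadata and its header_lines parameter are hypothetical, and MD5 is used only as an example checksum method.

```python
import hashlib
import xml.etree.ElementTree as ET


def refresh_table_metadata(root, data, end_date, header_lines=1):
    """Update numberOfRecords, size, MD5 checksum, and endDate in place.

    `root` is a parsed EML (or dataset) element, `data` the raw bytes of the
    released data file, and `end_date` an ISO date such as "2005-02-28".
    Hypothetical helper: assumes a single dataTable and that all four target
    elements already exist in the tree.
    """
    def required(path):
        node = root.find(path)
        if node is None:
            raise ValueError(f"missing element: {path}")
        return node

    # Count non-blank lines, minus the header lines, as data records.
    records = [line for line in data.splitlines() if line.strip()]
    required(".//dataTable/numberOfRecords").text = str(len(records) - header_lines)

    size = required(".//dataTable/physical/size")
    size.set("unit", "byte")
    size.text = str(len(data))

    auth = required(".//dataTable/physical/authentication")
    auth.set("method", "MD5")
    auth.text = hashlib.md5(data).hexdigest()

    required(".//temporalCoverage/rangeOfDates/endDate/calendarDate").text = end_date
```

In an automated loader, this would run after each append, followed by writing the tree back out (for example with ET.ElementTree(root).write(path)), so the released metadata always matches the released data.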
>
> In addition, I learned from John's email that you held a KNB data
> management workshop early this year. I am very interested in this kind
> of workshop, particularly workshops on the use of Metacat. If you hold
> this type of workshop in the future, please let me know.
Yeah, we had one in February. We announce these opportunities on
various web sites and mailing lists. You should subscribe to
ecoinfo at ecoinformatics.org and watch http://seek.ecoinformatics.org in
particular for announcements.
Like Peter, I also recommend that you get involved in the ongoing
improvements to EML. Your feedback and contributions would be extremely
valuable. Good luck, and let us know if you have more questions.
Matt
>
> Thank you very much for your support!
>
> Xiaoping Wang
>
> PMEL/NOAA
--
-------------------------------------------------------------------
Matt Jones jones at nceas.ucsb.edu
http://www.nceas.ucsb.edu/ Fax: 425-920-2439 Ph: 907-789-0496
National Center for Ecological Analysis and Synthesis (NCEAS)
University of California Santa Barbara
Interested in ecological informatics? http://www.ecoinformatics.org
-------------------------------------------------------------------