[seek-dev] (no subject)

Matt Jones jones at nceas.ucsb.edu
Fri Sep 24 10:46:50 PDT 2004


Well, actually that datatype information is available already.  The 
query document lists the namespace for the concept space (i.e., 
eml://ecoinformatics.org/eml-2.0.1), so the system that does query 
translation can look up the datatype for every term in the query using 
the xsd schema.  We discussed putting datatype redundantly in the query 
doc too, but we decided it 1) was unnecessary because the info is 
already available, 2) makes creating the query harder, and 3) could 
potentially conflict with the original datatype as defined in the 
concept schema.  For all of these reasons we decided to leave it out.

Matt

Bing Zhu wrote:
> I believe we have the similar issue in SRB when I tried to compose more
> complicated queries for WhyWhere data and had problem dealing with 'integer'
> type meta data.
> 
> Note the following condition example which a typical query part in our
> query document.
> 
>    <condition operator="LESS THAN OR EQUALS"
> concept="northBoundingCoordinate">31.5</condition>
> 
> The missing piece of information to translate it into underneath data
> management
> system such as SRB is the attribute "datatype". If we can have this
> information
> in our query document, the problem of "northBoundingCoordinate<=31.5" can be
> avoided.
> 
> 
> Bing
> 
> 
> 
> 
> 
> -----Original Message-----
> From: seek-dev-admin at ecoinformatics.org
> [mailto:seek-dev-admin at ecoinformatics.org]On Behalf Of Jing Tao
> Sent: Thursday, September 23, 2004 7:49 PM
> To: jones at nceas.ucsb.edu
> Cc: seek-dev at ecoinformatics.org
> Subject: [seek-dev] (no subject)
> 
> 
> Hi, matt, steve and chris:
> 
> This afternoon I played a little bit with the query steve submit to
> ecogrid node and found something about number and string.
> 
> Metacat stores the metadata in "nodedata" field in xml_nodes table. The
> data type for "nodedata" is char. If you want to compare a number in
> query, db should do casting.
> 
> basically, steve's query will look like:
> AND
>  condition1 northBoundingCoordinate<=31.5
>  condition2 southBoundingCoordinate>=10.0
>  condition3 surname=joye
> AND
> 
> In metacat log, the time consumer is the main query, taking 271 seconds
> (extened  query is trivial, only take 3 seconds). I post the main query
> to sql client, it took 267.4 seconds to run. So the result is almost same.
> 
> Then I did some other tests and try to figure it out. Here is some testing:
> 
> Query 1: northBoundingCoordinate<=31.5
> 
> SQL query look like(simplied, only show different part to quey 2,3):
> SELECT DISTINCT docid FROM xml_nodes WHERE nodedata <= 31.5   AND
> parentnodeid IN (SELECT nodeid FROM xml_index WHERE path LIKE
> 'geographicCoverage/boundingCoordinates/northBoundingCoordinate'
> 
> Time: 236 seconds(No too much faster than whole query) appro 100 documents
> 
> 
> Query 2: southBoundingCoordinate>=10.0
> SQL query look like(simplied, only show differenc to query 1, 3)
>  SELECT DISTINCT docid FROM xml_nodes WHERE nodedata >= 10.0  AND
> parentnodeid IN (SELECT nodeid FROM xml_index WHERE path LIKE
> 'geographicCoverage/boundingCoordinates/southBoundingCoordinate')
> 
> Time: 114.9 seconds appro 100 docs
> 
> Query 3:
> Condition surname=joye
> 
> SQL query looks like(only show different part to query 1, 2)
> SELECT DISTINCT docid FROM xml_nodes WHERE UPPER(nodedata) = 'JOYE' AND
> parentnodeid IN (SELECT nodeid FROM xml_index WHERE path LIKE
> '/eml/dataset/creator/individualName/surName'
> 
> Time: 2.8 seconds(Much faster) 4 docs
> 
> Oh, Third one(2.8 secs) is much faster than first one(236 secs) and second
> one(114.9). Postgres have some problem in casting :)
> 
> I also run following query:
> 
> Query 4: northBoundingCoordinate<=31.5
> The condition is as same as query 1, but the sql query are little bit
> different. It add something to get rid of non-number cells (this is
> metacat does now):
> SELECT DISTINCT docid FROM xml_nodes WHERE nodedata <= 31.5  AND
> UPPER(nodedata) = LOWER(nodedata) AND LTRIM(nodedata) != ' '  AND nodedata
> IS NOT NULL AND parentnodeid IN (SELECT nodeid FROM xml_index WHERE path
> LIKE 'geographicCoverage/boundingCoordinates/northBoundingCoordinate'
> 
> Time: 247.1 seconds(Almost to query 1 - 236 seconds)
> 
> 
> Query 5: southBoundingCoordinate>=10.0
> The condition is as same as query 2, but the sql query are little bit
> different. It add some thing got get rid of nonnumber of cells.
> 
> SELECT DISTINCT docid FROM xml_nodes WHERE nodedata >= 10.0 AND
> UPPER(nodedata) = LOWER(nodedata)  AND nodedata IS NOT NULL AND
> LTRIM(nodedata) != ' ' parentnodeid IN (SELECT nodeid FROM xml_index WHERE
> path LIKE
> 'geographicCoverage/boundingCoordinates/southBoundingCoordinate'
> 
> Time: 14.1 seconds (much faster than query 2 - 114.9 secs)
> 
> The difference between query 2 and query 5 is, query 5 gets rid of
> non-number cell in search space. And query 5(14 secs) is much faster than
> query 2(114 secs). So it seems progres has a hard time to ignore
> non-number char during the casting(from char to number).
> 
> But why query 4 (247 seconds) and query 1(236 seconds) no big difference
> even we add something to get rid of nonnumber cell in query 4? I guess the
> our adding is not completed, it doesn't get rid of all non-number char in
> cell. We need fixed that.
> 
> I also try to run those queries in Oracle in ecoinfo. But I got a
> "ORA-01722: invalid number" error. Durane got the same problem in
> numberica search. Those prove our metacat query couldn't remove all
> non-number character too.
> 
> It seems posgres can ingore non-number character in cell during
> string-number casting, but it is very expensive. Oracle just throw a sql
> excpeiton.
> 
> Hope this is helpful.
> 
> Jing
> 
> 
> --
> Jing Tao
> National Center for Ecological
> Analysis and Synthesis (NCEAS)
> 735 State St. Suite 204
> Santa Barbara, CA 93101
> 
> _______________________________________________
> seek-dev mailing list
> seek-dev at ecoinformatics.org
> http://www.ecoinformatics.org/mailman/listinfo/seek-dev

-- 
-------------------------------------------------------------------
Matt Jones                                     jones at nceas.ucsb.edu
http://www.nceas.ucsb.edu/    Fax: 425-920-2439    Ph: 907-789-0496
National Center for Ecological Analysis and Synthesis (NCEAS)
University of California Santa Barbara
Interested in ecological informatics? http://www.ecoinformatics.org
-------------------------------------------------------------------



More information about the Seek-dev mailing list