[kepler-dev] Attributes for Kepler (and Ptolemy) Tokens

Christopher Brooks cxh at eecs.berkeley.edu
Tue Mar 25 13:25:04 PDT 2008


Hi Dan,
Right, I had forgotten about the Java Dictionary class.  I think of
things as Maps . . .

Perhaps we could refactor the unit system work so that Token had a
reference to an object that could be both your proposed Dictionary/Map
attribute and the unit system.  I see the unit system as additional type
information, which is very similar to the attribute.

To do this refactoring, we would add a reference to Token and remove
the unit system reference in ScalarToken.  This would handle my object
concerns rather nicely, though it would require some heavy lifting
with the unit system.  Still, designing the attribute extension to
Token to be flexible enough to handle the unit system would make the
unit system work possible.
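As a rough sketch of the shape that refactoring might take (all class and method names below are hypothetical, not the actual Ptolemy API): a single lazily-created side object carries both the proposed attribute dictionary and the unit exponents, so Token itself only pays for one, usually null, reference.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: one side object holds both the proposed
// attribute dictionary and the unit information currently stored in
// ScalarToken, so Token gains just a single (usually null) reference.
class TokenMetadata {
    private Map<String, Object> _attributes;   // the proposed dictionary
    int[] unitCategoryExponents;               // lifted up from ScalarToken

    public Object getAttribute(String key) {
        return (_attributes == null) ? null : _attributes.get(key);
    }

    public void setAttribute(String key, Object value) {
        if (_attributes == null) {
            _attributes = new HashMap<String, Object>();
        }
        _attributes.put(key, value);
    }
}

class Token {
    // Null for the vast majority of tokens; allocated only on first use,
    // so existing workflows see no behavioral change.
    private TokenMetadata _metadata;

    public TokenMetadata metadata() {
        if (_metadata == null) {
            _metadata = new TokenMetadata();
        }
        return _metadata;
    }
}
```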

_Christopher

--------

    Hi Christopher,
    
        My terminology may be a bit confusing, since different languages use 
    different terms for what is basically the same thing. For instance, in 
    Java, Hashtable is derived from java.util.Dictionary, but the JavaDocs 
    now say that the Dictionary class is obsolete and one should use the 
    newer Map interface. So I do think of a RecordToken as a dictionary 
    object. [TreeMap appears to basically just be a Map sorted by key values]
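For what it's worth, the "Map sorted by key values" behavior of TreeMap, versus an insertion-ordered map, is easy to see with plain JDK classes (nothing Ptolemy-specific here):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapOrderDemo {
    // Returns the keys in the order the map iterates them.
    public static String keyOrder(Map<String, ?> map) {
        StringBuilder sb = new StringBuilder();
        for (String key : map.keySet()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same insertions, different iteration orders.
        Map<String, Integer> sorted = new TreeMap<String, Integer>();
        Map<String, Integer> insertion = new LinkedHashMap<String, Integer>();
        for (Map<String, Integer> m : java.util.List.of(sorted, insertion)) {
            m.put("species", 1);
            m.put("count", 2);
            m.put("site", 3);
        }
        System.out.println(keyOrder(sorted));     // count,site,species
        System.out.println(keyOrder(insertion));  // species,count,site
    }
}
```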
        And, yes, I agree that even a null value in a member would increase 
    the space needed for Tokens, so that might be a disadvantage, but I don't 
    know how big a disadvantage it would be. It still seems useful to me to 
    be able to add attributes to any token, and R and Python include that 
    ability for any of their objects.
    
    Dan
    
    Christopher Brooks wrote:
    > Hi Dan,
    > Hmm, interesting idea.
    >
    > The dictionary sounds a bit like a RecordToken.  RecordTokens use
    > a TreeMap as the inner data structure.  Perhaps attaching a RecordToken
    > to a Token might help with data management and operations on the metadata.
    > I don't fully understand the DataFrame example, but it does not sound
    > like RecordToken would help there.
    >
    > One issue with adding to Token is that even if the reference to a
    > dictionary is null, it will still add space to Token.  Can anyone
    > confirm this?
    >
    > Right now, I don't think Tokens have any data; the data is part of the
    > subclass.
    >
    > It might be worth looking at how the unit system in ptolemy/data/unit
    > is implemented.  It looks like we ended up making ScalarToken larger
    > by adding:
    >     protected int[] _unitCategoryExponents = null;
    >
    > The notion of adding metadata to a token is of interest to us; Edward
    > might have some input.
    >
    > _Christopher
    > --------
    >
    >     
    >     Hi All,
    >     
    >         I have been spending some time lately learning Python with the 
    >     particular goal of using the Python/Jython actor in Kepler. One thing 
    >     that I have noted is that Python has some interesting similarities to 
    >     R. In particular, both languages have the ability to attach 
    >     'attributes' to arbitrary objects. It strikes me that this is a very 
    >     useful way to attach various types of metadata to data objects - a 
    >     capability that is the basis of knb/eml data packages that are stored 
    >     in the NCEAS Metacat and used in Kepler EML data source actors.
    >     
    >         Kepler passes data between actors as Tokens, which I think of as 
    >     references to the actual data (one level of abstraction from the 
    >     actual data). However, at least as far as I understand it, there is 
    >     no way to attach attributes to Tokens. *I would like to propose 
    >     adding a 'Dictionary' member (i.e. a Hashtable) to the base Token 
    >     class*. This would allow any Kepler token to carry a named list of 
    >     'attributes'. Example labels (keys) for these attributes might be a 
    >     'name', 'unit', or some more complex named metadata element (e.g. an 
    >     XML fragment). The default value of this Dictionary member could be 
    >     null so that it would have no effect on existing workflows using 
    >     existing tokens, and it would have minimal effect on new workflows 
    >     unless it was deliberately populated with attributes of interest.
    >     
    >         Any comments/thoughts on this?
    >     
    >     Dan Higgins
    >     
    >     Some additional thoughts:
    >         One item that led to these thoughts is the R dataframe object 
    >     that is very useful in R for manipulating table-like structures. In 
    >     R, a dataframe is an ordered list of column data. The columns are 
    >     basically arrays of the same length but not necessarily of the same 
    >     data type - i.e. one might be strings, another doubles, etc. The 
    >     columns (and rows) can be named. A dataframe is thus very similar to 
    >     a relational database table, and functions for subsetting, 
    >     searching, and other RDB-like operations exist in R.
    >         How would one pass dataframe objects between arbitrary actors in 
    >     Kepler using Kepler tokens? My first thought would be as Ptolemy 
    >     RecordTokens where each item (i.e. column) in the Record is an 
    >     ArrayToken. The columns in the Record each have an associated label 
    >     (name), but they are not ordered except by the alphabetical order of 
    >     the names (since a RecordToken is just a dictionary or hash table). 
    >     To get the ordering of the dataframe, one could create a 
    >     DataframeToken that was an array of column arrays, but then how does 
    >     one attach names (and other metadata) to each column array?
    >         So you can see that the idea of including a Dictionary member in 
    >     Token is driven in part by the desire to create a 'dataframe-like' 
    >     token for Kepler.
    >     
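The 'dataframe-like' token described above can be sketched with plain JDK collections (DataFrameSketch is a hypothetical name, not a Ptolemy or Kepler class). The point of the sketch is that an insertion-ordered map keeps the column order, which is exactly what a TreeMap-backed RecordToken alphabetizes away:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a dataframe-like structure: named columns of
// equal length, not necessarily of the same element type. A LinkedHashMap
// keeps the columns in insertion order - the ordering a TreeMap-backed
// RecordToken would lose.
public class DataFrameSketch {
    private final Map<String, List<?>> _columns =
            new LinkedHashMap<String, List<?>>();

    public void addColumn(String name, List<?> values) {
        if (!_columns.isEmpty()
                && values.size() != _columns.values().iterator().next().size()) {
            throw new IllegalArgumentException("column length mismatch");
        }
        _columns.put(name, values);
    }

    // Column names in their original order, not alphabetized.
    public List<String> columnNames() {
        return new ArrayList<String>(_columns.keySet());
    }

    public List<?> column(String name) {
        return _columns.get(name);
    }
}
```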
    > --------
    >   
--------


More information about the Kepler-dev mailing list