
File Format

Files in this syntax must have names with the extension .cef to assist identification.

An exclamation mark is used as a comment marker: everything to the right of this marker, up to the newline character, is ignored on input.
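The comment rule can be sketched in a few lines of Python. This is only an illustration: the function name strip_comment is hypothetical, the line is assumed to have had its end of line marker removed already, and the sketch assumes that `!' acts as a comment marker wherever it occurs (it makes no attempt to protect an exclamation mark inside quoted text).

```python
def strip_comment(line: str) -> str:
    """Return the part of a line that precedes the first '!' marker.

    Assumption: the line carries no end of line marker, and '!' is
    treated as a comment marker even if it appears inside quotes.
    """
    return line.split("!", 1)[0]


# Illustrative header line; the entry name is taken from the examples
# in this section, the value is made up.
print(repr(strip_comment("Depend_0 = time_tags  ! dependency on time")))
```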

A header provides sufficient metadata to describe the data and its formatting, and is specified in detail below. The header must be attached and precede the data records. Apart from quoted text data, and variable names, header information is CASE INSENSITIVE, so that, for example, entries `Depend_0', `DEPEND_0', and `DepEnD_0' are equivalent.


Table: Case sensitivity. The general rule is that anything not included in quotes is case insensitive. This table shows exceptions to this rule.

| Item | Ref. | Rule | Comments |
| --- | --- | --- | --- |
| CEF file extension | This section | .cef is written in lower case on UNIX and Linux systems. | Be careful when copying files from systems which do not distinguish case (PC, Mac, VMS). |
| Header information | This section | Case insensitive, apart from the exceptions listed below. | For example, entries `Depend_0', `DEPEND_0', and `DepEnD_0' are equivalent. |
| Text strings | This section | Anything written between quotes is case sensitive. | CAA will suppress case sensitivity when searching for keywords. |
| Time stamps | Sect. 2.1 | The characters `T' and `Z' of the ISO time code are case insensitive. | |
| FILE_TYPE_VERSION | Sect. 2.4 | CEF should be in upper case. | "CEF-2.0" will be parsed. |
| File names | Sect. 2.5 | Case sensitive. | To be compatible with the extension .cef, lower case is recommended. |
| Global attribute names | Sect. 2.6 | Case sensitive. | Exception to the general rule. |
| Variable names | Sect. 2.7 | Case sensitive. | Exception to the general rule. |
| Enumerated metadata | | Case sensitive (data may be parsed). | The values proposed in the CAA Metadata Dictionary, CAA-CDPP-TN-0002, are to be used. |
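As an illustration of the general rule, a reader might compare header keywords without regard to case while leaving variable names, global attribute names and quoted text untouched. The sketch below shows only the keyword comparison; the function name same_keyword is hypothetical and is not part of any CEF tool.

```python
def same_keyword(a: str, b: str) -> bool:
    """Compare two header keywords, ignoring case.

    Assumption: this is applied only to header keywords, never to
    variable names, global attribute names or quoted text, which
    remain case sensitive.
    """
    return a.casefold() == b.casefold()


print(same_keyword("Depend_0", "DEPEND_0"))  # True
print(same_keyword("Depend_0", "DepEnD_0"))  # True
```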


A line of metadata may be continued onto the next line by placing `\' as a continuation marker after one of the commas that separate a list of values.
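A reader could fold continued lines back into single logical lines before interpreting them. The following is a minimal sketch, assuming that comments have already been removed and that the continuation marker is the last non-blank character on the line; the function name join_continuations and the sample entries are hypothetical.

```python
def join_continuations(lines):
    """Merge lines ending in a backslash with the line(s) that follow."""
    joined, parts = [], []
    for line in lines:
        text = line.strip()
        if text.endswith("\\"):
            # Drop the marker and keep collecting pieces of this entry.
            parts.append(text[:-1].rstrip())
        else:
            parts.append(text)
            joined.append(" ".join(parts))
            parts = []
    if parts:
        # Input ended while a continuation was still open.
        joined.append(" ".join(parts))
    return joined


header = [
    'ENTRY = "a", \\',
    '        "b", \\',
    '        "c"',
    "OTHER = 1",
]
for logical_line in join_continuations(header):
    print(logical_line)
```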

The data records must be immediately preceded by a line which specifies how the end of the data is identified. This takes the form
DATA_UNTIL = yyy
where yyy is a quoted string that will be found at the beginning of a line following the last data record in the file, or takes the value EOF (not in quotes) if the data is to be read until the end of file is reached. Spaces, tabs and non-printing characters are not permitted within the string yyy.
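The two forms of DATA_UNTIL can be handled as in the sketch below. It is an illustration rather than a reference implementation: the function names and the marker text END_OF_DATA are hypothetical, and accepting EOF in any case is an assumption based on the general case-insensitivity rule for header information.

```python
def parse_data_until(line):
    """Return the end-of-data marker string, or None for EOF."""
    key, _, value = line.partition("=")
    if key.strip().upper() != "DATA_UNTIL":
        raise ValueError("not a DATA_UNTIL line")
    value = value.strip()
    if value.upper() == "EOF":  # unquoted EOF: read to end of file
        return None
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        marker = value[1:-1]
        # Spaces, tabs and non-printing characters are not permitted.
        if not marker or any(c.isspace() or not c.isprintable() for c in marker):
            raise ValueError("invalid end-of-data marker")
        return marker
    raise ValueError("DATA_UNTIL value must be EOF or a quoted string")


def read_records(lines, marker):
    """Yield data lines until one begins with the marker (if any)."""
    for line in lines:
        if marker is not None and line.startswith(marker):
            break
        yield line


marker = parse_data_until('DATA_UNTIL = "END_OF_DATA"')
print(list(read_records(["1, 2", "3, 4", "END_OF_DATA", "ignored"], marker)))
```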

All data files are record oriented and homogeneous: they have a sequence of records, each with the same variables in the same order. Variables that are multi-dimensional take the natural C ordering, and one entry is required for each element. In time series data the records are ordered on the monotonically increasing time variable. Each record is ended by a record_delimiter, which by default is a new line character (\n). The record_delimiter may not be the comment marker (exclamation mark, `!'), an ampersand (`&') or a non-printing character. Entries in a record are comma separated and white space surrounding delimiters is ignored.
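To make the ordering of multi-dimensional variables concrete, the sketch below takes "natural C ordering" to mean row-major order, in which the last index varies fastest. That reading, and the helper names c_order_offset and c_order_indices, are assumptions made for illustration.

```python
from itertools import product


def c_order_offset(index, shape):
    """Position of an element within the flattened (row-major) entry list."""
    offset = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError("index out of range for shape")
        offset = offset * n + i
    return offset


def c_order_indices(shape):
    """All indices of an array of the given shape, last index fastest."""
    return list(product(*(range(n) for n in shape)))


# A 2 x 3 variable contributes six entries to each record, in this order:
print(c_order_indices((2, 3)))
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
print(c_order_offset((1, 2), (2, 3)))  # 5
```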

The record delimiter and all carriage return (\r), new line (\n) and white space characters at the start and end of records and surrounding delimiters are to be deleted from data records on read. Thus data may safely be formatted with white space and end of line markers for readability. This also allows for easier exchange between platforms where the end of line marker is variously \r\n under DOS, \n under Unix and \r under MacOS. All these end of line markers are removed together with white space (including tab characters, \t).
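The clean-up on read described above might look like the following sketch. It assumes the data contains no quoted text (so every comma is a delimiter), and the function name split_records and the `$' delimiter in the second example are purely illustrative.

```python
def split_records(text, record_delimiter="\n"):
    """Split raw data into records, then into trimmed entries."""
    records = []
    for chunk in text.split(record_delimiter):
        # str.strip() removes spaces, tabs, \r and \n at both ends.
        fields = [field.strip() for field in chunk.strip().split(",")]
        if fields != [""]:  # skip chunks that held only white space
            records.append(fields)
    return records


# DOS-style line endings with the default delimiter:
print(split_records("1, 2 ,3\r\n4,\t5, 6\r\n"))
# [['1', '2', '3'], ['4', '5', '6']]

# A record wrapped over two lines, ended by a custom delimiter:
print(split_records("1, 2,\n 3 $\n4, 5, 6 $\n", record_delimiter="$"))
# [['1', '2', '3'], ['4', '5', '6']]
```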

Note that, in order to protect white space and commas in text data, this data type is enclosed within double quotes (" "). Quoted text is used when the text itself is to be used as the value of a parameter. When the text refers to an object, such as a variable name or an entry in a list of reserved names, it is not quoted. Variable names are case sensitive, but other values which refer to lists of reserved values, such as data types, are case insensitive. Such case insensitive parameters and values are shown throughout this document in uppercase characters.
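A splitter that respects quoted text could be sketched as follows. It assumes that a double quote never appears inside quoted text, since no escaping rule is described here; the function names split_fields and unquote and the sample record are hypothetical.

```python
def split_fields(record):
    """Split a record on commas that lie outside double quotes."""
    fields, current, in_quotes = [], [], False
    for ch in record:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)  # keep quotes so callers can tell text from names
        elif ch == "," and not in_quotes:
            fields.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    fields.append("".join(current).strip())
    return fields


def unquote(field):
    """Remove one pair of enclosing double quotes, if present."""
    if len(field) >= 2 and field[0] == field[-1] == '"':
        return field[1:-1]
    return field


fields = split_fields('2001-01-01T00:00:00Z, "north, quiet" , 3.5')
print(fields)
print([unquote(f) for f in fields])
```

Keeping the quotes in the output of split_fields lets a caller distinguish quoted text (case sensitive) from unquoted names and reserved values before deciding how to compare them.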

Blank lines (those whose first character is the newline character or the end of record character, or which contain only white space characters) are ignored.



Anthony Allen 2009-10-19