The Madrigal data model and metadata files

Understanding the Madrigal data model is an important step in understanding how Madrigal works. There is a correspondence between each level of the data model and the metadata files that are found in the MADROOT/metadata directory. In this section we describe each level of the Madrigal data model and the corresponding metadata file.

Madrigal site
Instrument
Instrument type
Experiment
Experiment files
Data parameters
- Parameter explanations
Parameter categories
Data types (kindat)
Instrument parameters
Instrument kindats
Instrument data
File data

Madrigal site - siteTab.txt

The highest level of Madrigal is a Madrigal site. A Madrigal site is a particular web site, run by a particular group, that holds that group's own data. While each Madrigal site stores its own data locally, it also shares metadata with all the other sites. This makes it possible for users to search for data at all Madrigal sites at once, no matter which site they visit, and simply follow links to the Madrigal site that has the data they are interested in.

Metadata about all sites is stored in MADROOT/metadata/siteTab.txt. When new Madrigal sites are added, this table is updated and all Madrigal sites are notified so they can update this file. If a site is running Madrigal 2.5 or higher, this file is updated automatically unless the administrator has manually modified it in a way not reported to the OpenMadrigal administrator. This file contains the following comma-separated fields:

Site ID (e.g., 1)
Site Name (e.g., Millstone Hill Observatory)
Madrigal server (e.g., www.haystack.mit.edu)
Madrigal document root relative to server (e.g., madrigal)
Madrigal CGI directory relative to server (e.g., cgi-bin/madrigal)
Madrigal servlet directory relative to server (e.g., madrigal/servlets) - This field is no longer used.
Contact name (e.g., John M. Holt)
Contact Address 1 (e.g., MIT Haystack Observatory)
Contact Address 2 (e.g., Route 40)
Contact Address 3 (e.g., "" )
Contact City (e.g., Westford)
Contact State/Province (e.g., MA)
Contact Postal Code (e.g., 01886)
Contact Country (e.g., USA)
Contact Telephone (e.g., 1-617-981-5624)
Contact email (e.g., mailto:jmh@haystack.mit.edu). Multiple addresses may be listed if separated by semicolons.
Site version (e.g., 2.6): the version of Madrigal installed, as period-separated integers. If not given, the default is 2.6.
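As an illustration of the field layout above, the sketch below parses one siteTab.txt line. It assumes each line is a single CSV record (so that the quoted empty Contact Address 3 is handled), and that a missing version field means 2.6. The `Site` class and `parse_site_line` function are hypothetical names for this example, not part of Madrigal's own Python API.

```python
import csv
from dataclasses import dataclass
from typing import List, Tuple

DEFAULT_VERSION = (2, 6)  # used when the optional version field is absent


@dataclass
class Site:
    site_id: int
    name: str
    server: str
    doc_root: str
    cgi_dir: str
    contact_emails: List[str]
    version: Tuple[int, ...]


def parse_site_line(line: str) -> Site:
    """Parse one siteTab.txt line (hypothetical helper)."""
    # csv copes with quoted empty fields such as "" in Contact Address 3
    fields = next(csv.reader([line]))
    if len(fields) < 16:
        raise ValueError(f"expected at least 16 fields, got {len(fields)}")
    version_text = fields[16].strip() if len(fields) > 16 else ""
    version = (tuple(int(p) for p in version_text.split("."))
               if version_text else DEFAULT_VERSION)
    # Field 5 (servlet directory) is no longer used and is skipped here
    return Site(
        site_id=int(fields[0]),
        name=fields[1],
        server=fields[2],
        doc_root=fields[3],
        cgi_dir=fields[4],
        contact_emails=[e.strip() for e in fields[15].split(";") if e.strip()],
        version=version,
    )


line = ('1,Millstone Hill Observatory,www.haystack.mit.edu,madrigal,'
        'cgi-bin/madrigal,madrigal/servlets,John M. Holt,'
        'MIT Haystack Observatory,Route 40,"",Westford,MA,01886,USA,'
        '1-617-981-5624,mailto:jmh@haystack.mit.edu')
site = parse_site_line(line)
print(site.site_id, site.version)  # 1 (2, 6)
```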

Instrument - instTab.txt

The next layer of the Madrigal data model is the instrument. All data in Madrigal is associated with one and only one instrument. Any given Madrigal site will hold data from one or more instruments. Since Madrigal focuses on ground-based instruments, most instruments have a particular location associated with them. However, some Madrigal data is based on measurements from multiple instruments, and so has no particular location. Examples are "EISCAT Scientific Association IS Radars", which combines data from the multiple EISCAT radars, and "World-wide GPS Receiver Network", which consists of over a thousand individual GPS receivers distributed around the globe.

Metadata about all instruments is stored in MADROOT/metadata/instTab.txt. When new Madrigal instruments are added, this table is updated and all Madrigal sites are notified so they can update this file. If a site is running Madrigal 2.5 or higher, this file is updated automatically unless the administrator has manually modified it in a way not reported to the OpenMadrigal administrator. This instrument code list is usually consistent with the Cedar instrument list. This file contains the following comma-separated fields:

Instrument Code (e.g., 30)
Instrument 3-letter mnemonic; either of the last two characters may also be a digit (e.g., mlh or p4p)
Instrument Name (e.g., Millstone Hill Incoherent Scatter Radar)
Latitude (e.g., 42.5)
Longitude (e.g., -71.9)
Altitude in km above sea level (e.g., 0.146)
Contact name (e.g., John M. Holt)
Contact Address 1 (e.g., MIT Haystack Observatory)
Contact Address 2 (e.g., Route 40)
Contact Address 3 (e.g., "" )
Contact City (e.g., Westford)
Contact State/Province (e.g., MA)
Contact Postal Code (e.g., 01886)
Contact Country (e.g., USA)
Contact Telephone (e.g., 1-617-981-5625)
Contact email (e.g., mailto:jmh@haystack.mit.edu)
Instrument category id (e.g., 6)
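A minimal sketch of loading instTab.txt into a dictionary keyed by instrument code, checking the mnemonic rule and code uniqueness described above. The lowercase-only mnemonic pattern and the assumption that the location fields are always numeric are inferred from the examples, not confirmed; `Instrument` and `load_instruments` are hypothetical names.

```python
import csv
import re
from typing import Dict, Iterable, NamedTuple

# Assumption: a lowercase letter followed by two lowercase letters or digits,
# matching the examples "mlh" and "p4p" in the field list above.
MNEMONIC_RE = re.compile(r"^[a-z][a-z0-9]{2}$")


class Instrument(NamedTuple):
    mnemonic: str
    name: str
    latitude: float
    longitude: float
    altitude_km: float
    category_id: int


def load_instruments(lines: Iterable[str]) -> Dict[int, Instrument]:
    """Index instTab.txt lines by instrument code (hypothetical helper)."""
    instruments: Dict[int, Instrument] = {}
    for fields in csv.reader(lines):
        if not fields:
            continue  # skip blank lines
        code = int(fields[0])
        mnemonic = fields[1].strip()
        if not MNEMONIC_RE.match(mnemonic):
            raise ValueError(f"bad mnemonic {mnemonic!r} for instrument {code}")
        if code in instruments:
            raise ValueError(f"duplicate instrument code {code}")
        # Assumes the location fields are always numeric, even for
        # multi-instrument entries with no particular location.
        lat, lon, alt = (float(x) for x in fields[3:6])
        instruments[code] = Instrument(mnemonic, fields[2], lat, lon, alt,
                                       int(fields[16]))
    return instruments


sample = ('30,mlh,Millstone Hill Incoherent Scatter Radar,42.5,-71.9,0.146,'
          'John M. Holt,MIT Haystack Observatory,Route 40,"",Westford,MA,'
          '01886,USA,1-617-981-5625,mailto:jmh@haystack.mit.edu,6')
print(load_instruments([sample])[30].category_id)  # 6
```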

Instrument type - instType.txt

The instrument type table lists categories of instruments, to allow the user to search instruments more easily. If a site is running Madrigal 2.5 or higher, this file is updated automatically unless the administrator has manually modified it in a way not reported to the OpenMadrigal administrator. This file contains the following comma-separated fields:

Instrument category id (e.g., 6)
Instrument category description (e.g., Incoherent Scatter Radar)
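The category id is the link between instTab.txt and instType.txt. The sketch below groups instrument codes under their category description, which is the kind of lookup a category-based instrument search needs. The instrument code 999 and category 99 in the usage lines are hypothetical values, and the helper names are illustrative only.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List


def load_instrument_types(lines: Iterable[str]) -> Dict[int, str]:
    """Map category id -> description from instType.txt lines."""
    return {int(f[0]): f[1].strip() for f in csv.reader(lines) if f}


def group_by_category(inst_categories: Dict[int, int],
                      types: Dict[int, str]) -> Dict[str, List[int]]:
    """Group instrument codes under their category description.

    inst_categories maps instrument code -> category id (the last
    instTab.txt field). Unknown ids are grouped under "Unknown".
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for code, cat_id in sorted(inst_categories.items()):
        groups[types.get(cat_id, "Unknown")].append(code)
    return dict(groups)


types = load_instrument_types(["6,Incoherent Scatter Radar"])
# 999 and 99 are hypothetical, to show an unmatched category
print(group_by_category({30: 6, 999: 99}, types))
# {'Incoherent Scatter Radar': [30], 'Unknown': [999]}
```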

Experiment - expTab.txt

All the data from a given instrument is organized into experiments. An experiment consists of data from a single instrument covering a limited period of time, and, as a rule, is meant to address a particular scientific goal. Madrigal makes the assumption that instruments may be run in different modes, and so the data generated may vary from one experiment to another. By organizing one instrument's data into experiments, the purpose and limitations of each experiment can be made clearer. As a Madrigal administrator, you can also provide users with supplemental plots and documentation about that experiment, in addition to the standard Madrigal data files. See the section on creating and updating Madrigal experiments for more information.

Madrigal has a number of security codes for different types of experiments. Most experiments are public, which means everyone can access them. An experiment can also be made private, which means that only users within configured ranges of IP addresses can access it. (See Set private versus public access.) Private experiments are never shared with other Madrigal sites. There is also a hidden experiment state that completely removes an experiment from access by Madrigal (if, for example, the data is discovered to be corrupt but might be fixed in the future).

There are also two experiment states for archiving experiments: public archive and private archive. These states support archiving Madrigal data at a central Madrigal site, such as the one at NCAR. Archived experiments are duplicates of experiments found at other Madrigal sites. In general, archived experiments are ignored by any part of the user interface that searches all sites, because users will want to find only the main data source. However, when the user interface is accessing only local data, archived experiments will appear. A private archived experiment is subject to the same restrictions as a regular private experiment.

Metadata about all experiments is stored in MADROOT/metadata/expTab.txt. This file is automatically generated from individual expTab.txt files located in each experiment directory, as will be described in the next section on experiment organization. This file contains the following comma-separated fields:

Experiment ID (auto-generated)
Experiment URL (e.g., http://www.haystack.mit.edu/cgi-bin/madtoc/1997/mlh/03dec97g). Note that this URL is historical and no longer works.
Experiment Name (e.g., Wide Latitude Substorm Study)
Site ID (e.g., 1)
Start Date (YYYYMMDD) (e.g., 19971203)
Start Time (HHMMSS) (e.g., 011356)
End Date (YYYYMMDD) (e.g., 19971205)
End Time (HHMMSS) (e.g., 123525)
Instrument Code (e.g., 30)
Security Code (e.g., 0) - 0 for public, 1 for private, -1 for completely ignored, 2 for public archive, 3 for private archive.
Principal Investigator (e.g., Phil Erickson) - optional; used to override the instrument PI
Principal Investigator email (e.g., perickson@haystack.mit.edu) - optional; the PI must also be given
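The sketch below parses one expTab.txt line, combining the separate date and time fields into datetimes and translating the security code. It assumes dates and times are written exactly as in the examples above; `Experiment`, `parse_exp_line` and the experiment ID used in the usage lines are illustrative, not Madrigal's own API or a real table entry.

```python
import csv
from datetime import datetime
from typing import NamedTuple, Optional

# Security codes as listed in the expTab.txt field description above.
SECURITY_CODES = {
    0: "public",
    1: "private",
    -1: "ignored",
    2: "public archive",
    3: "private archive",
}


class Experiment(NamedTuple):
    exp_id: int
    name: str
    site_id: int
    start: datetime
    end: datetime
    kinst: int
    security: str
    pi: Optional[str]
    pi_email: Optional[str]


def _stamp(date_text: str, time_text: str) -> datetime:
    # zfill guards against a time written without its leading zero
    return datetime.strptime(date_text.strip() + time_text.strip().zfill(6),
                             "%Y%m%d%H%M%S")


def parse_exp_line(line: str) -> Experiment:
    """Parse one expTab.txt line (hypothetical helper)."""
    f = next(csv.reader([line]))
    pi = f[10].strip() if len(f) > 10 and f[10].strip() else None
    # The PI email is only used when a PI name is also given.
    pi_email = f[11].strip() if pi and len(f) > 11 and f[11].strip() else None
    return Experiment(
        exp_id=int(f[0]),
        name=f[2],
        site_id=int(f[3]),
        start=_stamp(f[4], f[5]),
        end=_stamp(f[6], f[7]),
        kinst=int(f[8]),
        security=SECURITY_CODES[int(f[9])],
        pi=pi,
        pi_email=pi_email,
    )


line = ("10000125,http://www.haystack.mit.edu/cgi-bin/madtoc/1997/mlh/03dec97g,"
        "Wide Latitude Substorm Study,1,19971203,011356,19971205,123525,30,0")
exp = parse_exp_line(line)
print(exp.security, exp.end - exp.start)  # public 2 days, 11:21:29
```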

There is also a file called MADROOT/metadata/expTabAll.txt, which is also generated automatically. It differs from expTab.txt in that it contains experiment metadata from all Madrigal sites, not just the local one. Any remote experiment with a non-zero security code is excluded.

Experiment Files - fileTab.txt

The data from a given experiment is stored in one or more experiment files. There are a number of reasons there may be more than one file for a given experiment. The first is that different kinds of data may be stored in different files. Also, the experimental data may be analyzed in more than one way, leading to files with different sets of measured parameters. For these two cases, each file should have its own kindat code (see below). Another reason for multiple files is that older, historical files can be kept on-line for reference purposes.

With Madrigal 3, these files are in Hdf5 format as defined by the CEDAR Madrigal Hdf5 file format. Each file may contain only one kindat. The category field is used to distinguish files that are of historical interest only, e.g. a file that has been superseded by a file with an improved electron density calibration. In some cases there may be more than one up-to-date variant of a file, e.g. when different analysis options have been chosen. In this case one of these files is designated the default, and the others are designated variants.

Metadata about all experiment files is stored in MADROOT/metadata/fileTab.txt. This file is automatically generated from individual fileTab.txt files located in each experiment directory, as will be described in the next section on experiment organization. This file contains the following comma-separated fields:

File Name (e.g., mil971203g.002)
Experiment ID (e.g., 10000125)
Data Type (e.g., 3001)
Category (1=default, 2=variant, 3=history, 4=real-time)
Size of File (e.g., 241920)
File contains at least one Catalog Record File (0 for no, 1 for yes)
File contains at least one Header Record File (0 for no, 1 for yes)
Analysis/modify Date (YYYYMMDD) (e.g., 19980101)
Analysis/modify Time (HHMMSS) (e.g., 115131)
File processing status description (preliminary, final, or any other description)
Permission flag: 0 for public, 1 for private
File analyst - who did the analysis of the data and created this file (e.g. John Holt) optional
File analyst email (e.g., jholt@haystack.mit.edu) optional - File analyst must also be given
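To show how the category and permission fields are meant to be used, the sketch below lists the up-to-date (non-history) files for each experiment, optionally hiding private ones. The second sample line describes a hypothetical superseded file invented for illustration, and `current_files` is not a Madrigal API.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Category values from the fileTab.txt field description above.
CATEGORIES = {1: "default", 2: "variant", 3: "history", 4: "real-time"}

FileEntry = Tuple[str, int, str]  # (file name, kindat, category name)


def current_files(lines: Iterable[str],
                  include_private: bool = False) -> Dict[int, List[FileEntry]]:
    """Group non-history files by experiment ID (hypothetical helper)."""
    result: Dict[int, List[FileEntry]] = defaultdict(list)
    for f in csv.reader(lines):
        if not f:
            continue
        name, exp_id, kindat = f[0], int(f[1]), int(f[2])
        category = CATEGORIES[int(f[3])]
        is_private = int(f[10]) == 1  # permission flag
        if category == "history":
            continue  # kept on-line for reference only
        if is_private and not include_private:
            continue
        result[exp_id].append((name, kindat, category))
    return dict(result)


lines = [
    "mil971203g.002,10000125,3001,1,241920,1,1,19980101,115131,final,0",
    # Hypothetical superseded file, for illustration only
    "mil971203g.001,10000125,3001,3,241920,1,1,19971215,090000,preliminary,0",
]
print(current_files(lines))
# {10000125: [('mil971203g.002', 3001, 'default')]}
```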

There is also a file called MADROOT/metadata/fileTabAll.txt, which is also generated automatically. It differs from fileTab.txt in that it contains experiment file metadata from all Madrigal sites, not just the local one.

Data parameters - parmCodes.txt

Any given file is made up of a series of records holding measured parameters. Based on which parameters are in the file, Madrigal will also automatically derive a large number of other parameters, such as Kp and magnetic field strength, that are not in the file itself. In the web browser, measured parameters are shown in bold and derived parameters in normal font.

The metadata file parmCodes.txt lists the Madrigal and Cedar parameters that are supported. It replaces the column-delimited parameter code file used in Madrigal 2, because that format limited the size of id codes. If a parameter has a parameter code of 0, it cannot be stored in a Cedar file and is meant to be a derived value only. All Madrigal mnemonics must be unique. All non-zero parameter codes must be unique. If a new parameter is desired, it should be added in coordination with Bill Rideout at brideout@haystack.mit.edu.

The file parmCodes.txt contains the following comma-separated fields:

Parameter Code (may be zero)
Description (cannot include a comma)
Units
Mnemonic
Format (C-style format string)
Width
Category Id (see madCatTab.txt)
Mnemonic has Html description (1 or 0) - used to display extra information about the parameter
Err mnemonic has Html description (1 or 0) - used to display extra information about the error parameter
Is duplicate of code (optional; used because the old Cedar format had duplicate parameters to represent different dynamic ranges)
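The sketch below loads parmCodes.txt lines while enforcing the two uniqueness rules stated above (unique mnemonics, unique non-zero codes), and applies the C-style Format field to a value. Case-insensitive mnemonic comparison is an assumption, and the two sample rows are hypothetical, not real entries from parmCodes.txt.

```python
import csv
from typing import Dict, Iterable, NamedTuple


class Parm(NamedTuple):
    code: int
    description: str
    units: str
    mnemonic: str
    fmt: str
    width: int
    category_id: int


def load_parms(lines: Iterable[str]) -> Dict[str, Parm]:
    """Load parmCodes.txt lines (hypothetical helper).

    Assumption: mnemonics are compared case-insensitively.
    """
    parms: Dict[str, Parm] = {}
    codes = set()
    for f in csv.reader(lines):
        if not f:
            continue
        p = Parm(int(f[0]), f[1].strip(), f[2].strip(), f[3].strip(),
                 f[4].strip(), int(f[5]), int(f[6]))
        key = p.mnemonic.lower()
        if key in parms:
            raise ValueError(f"duplicate mnemonic {p.mnemonic}")
        if p.code != 0:  # code 0 marks a derived-only parameter
            if p.code in codes:
                raise ValueError(f"duplicate parameter code {p.code}")
            codes.add(p.code)
        parms[key] = p
    return parms


def format_value(parm: Parm, value: float) -> str:
    # Apply the C-style format string, then right-justify to the column width.
    return (parm.fmt % value).rjust(parm.width)


# Hypothetical rows, not real entries from parmCodes.txt
parms = load_parms(["9999,Example measured parameter,K,exparm,%8.1f,8,1,0,0",
                    "0,Example derived parameter,m,exderiv,%9.2f,9,1,0,0"])
print(repr(format_value(parms["exparm"], 1234.5)))  # '  1234.5'
```

Several derived parameters may share code 0, which is why only non-zero codes are checked for duplicates.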

Parameter explanations

For parameters that cannot be fully described by the short description in parmCodes.txt, additional explanation about the parameter or its corresponding error parameter can be added to the file MADROOT/doc/parmDesc.html. Simply create a new named anchor in that file, where the anchor name is the parameter mnemonic in all capitals. Following that, give a description of arbitrary length in html. Then change the corresponding Html-description flag (mnemonic or error mnemonic) from 0 to 1 for that parameter in parmCodes.txt to let Madrigal know that this explanation exists. In general, the parameter order in parmDesc.html matches that of parmCodes.txt, but that is not a functional requirement.
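Because the Html-description flag in parmCodes.txt and the anchor in parmDesc.html must agree, a quick consistency check can help. The sketch below uses the standard-library `html.parser` to collect anchor names and reports flagged mnemonics with no upper-case anchor. It only checks the main mnemonic flag, since the anchor naming for error parameters is not described here; the function names and the HTML snippet are hypothetical.

```python
from html.parser import HTMLParser
from typing import Iterable, List


class AnchorCollector(HTMLParser):
    """Collect the name (or id) attributes of <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key in ("name", "id") and value:
                    self.names.add(value)


def missing_explanations(html_text: str,
                         flagged_mnemonics: Iterable[str]) -> List[str]:
    """Return flagged mnemonics that have no upper-case anchor in the HTML."""
    collector = AnchorCollector()
    collector.feed(html_text)
    collector.close()
    return sorted(m for m in flagged_mnemonics
                  if m.upper() not in collector.names)


# Hypothetical parmDesc.html fragment
html = '<a name="TI"></a><p>Longer explanation of this parameter.</p>'
print(missing_explanations(html, ["ti", "te"]))  # ['te']
```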

Parameter categories - madCatTab.txt

The Madrigal category metadata file (madCatTab.txt) defines the categories that Madrigal parameters belong to. The categories are similar to the Cedar categories, but do not follow them exactly. This file does not change. This file contains the following comma-separated fields:

Category id (integers, starting at 0)
Category name
Minimum parameter code (integer). Used only for codes not listed in parmCodes.txt. If category doesn't have a range of codes, use -1.
Maximum parameter code (integer). Used only for codes not listed in parmCodes.txt. If category doesn't have a range of codes, use -1.
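The minimum and maximum code fields let Madrigal place a code that is not listed in parmCodes.txt into a category. The sketch below implements that range lookup, skipping categories whose bounds are -1. The sample rows are hypothetical, and the helper names are illustrative only.

```python
import csv
from typing import Iterable, List, Optional, Tuple

CategoryRange = Tuple[int, str, int, int]  # (id, name, min code, max code)


def load_category_ranges(lines: Iterable[str]) -> List[CategoryRange]:
    return [(int(f[0]), f[1].strip(), int(f[2]), int(f[3]))
            for f in csv.reader(lines) if f]


def category_for_code(code: int,
                      cats: List[CategoryRange]) -> Optional[Tuple[int, str]]:
    """Find the category whose code range contains code.

    Only meant for codes that are not listed in parmCodes.txt; categories
    with -1 bounds have no range and are skipped.
    """
    for cat_id, name, low, high in cats:
        if low == -1 or high == -1:
            continue
        if low <= code <= high:
            return cat_id, name
    return None


# Hypothetical rows, for illustration only
cats = load_category_ranges(["0,Category without range,-1,-1",
                             "1,Category with range,100,199"])
print(category_for_code(150, cats))  # (1, 'Category with range')
```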

Data type (kindat) table - typeTab.txt

The Madrigal data type (also called kind of data, or kindat) metadata file (typeTab.txt) contains a list of all data types in the database. The purpose of the kindat is to uniquely identify the data processing algorithm used to compute the parameters in the associated Madrigal file. Each individual Madrigal site is free to update this table, as long as there are no duplicate kindats in its own file. It is best to keep the kindat codes in ascending order to avoid any chance of a duplicate. The kindat description must not contain a comma. This file contains the following comma-separated fields:

Data Type Code (e.g., 3001)
Data Type Brief Description (e.g., Basic Derived Parameters )
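Since each site edits typeTab.txt by hand, a small checker can catch the problems described above: duplicate kindats, codes out of ascending order (advisory only), and commas in the description. This sketch deliberately splits on commas instead of using `csv`, so that a stray comma shows up as an extra field. `check_kindats` and the sample descriptions are hypothetical.

```python
from typing import Iterable, List


def check_kindats(lines: Iterable[str]) -> List[str]:
    """Report typeTab.txt problems (hypothetical checker)."""
    problems: List[str] = []
    seen = set()
    previous = None
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue
        fields = raw.split(",")
        if len(fields) != 2:
            problems.append(f"line {lineno}: expected 2 fields, got {len(fields)}")
            continue
        code = int(fields[0])
        if code in seen:
            problems.append(f"line {lineno}: duplicate kindat {code}")
        elif previous is not None and code < previous:
            # Not an error, but ascending order makes duplicates easy to spot
            problems.append(f"line {lineno}: kindat {code} not in ascending order")
        seen.add(code)
        previous = code
    return problems


rows = ["3001,Basic Derived Parameters",
        "13210,Hypothetical description",
        "3408,Hypothetical description",
        "3001,Duplicate entry",
        "5000,Bad, description"]
for problem in check_kindats(rows):
    print(problem)
```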

Instrument parameter table - instParmTab.txt

The instrument parameter metadata file (instParmTab.txt) lists the measured parameters found in the data for each instrument. This data is used to support the global database query web page, and is rebuilt by updateMaster. This file contains the following comma-separated fields:

Instrument Code (e.g., 30)
Parameter mnemonic list (e.g., range rangei az1 az2 el1 el2 pl snp3 chisq mhdqc1 systmp systmi power tfreq popl dpopl ti dti tr dtr vo dvo ph+ dph+ pm dpm fa dfa pnrmd pnrmdi vdopp dvdopp)
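A global query page needs the reverse of this table: which instruments measure a given parameter. The sketch below builds that index from instParmTab.txt lines, assuming the mnemonic list is space-separated as in the example. `instruments_by_parameter` is a hypothetical helper, not the code used by updateMaster.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Set


def instruments_by_parameter(lines: Iterable[str]) -> Dict[str, Set[int]]:
    """Map parameter mnemonic -> set of instrument codes that measure it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for f in csv.reader(lines):
        if len(f) < 2:
            continue
        kinst = int(f[0])
        for mnemonic in f[1].split():  # mnemonics are space-separated
            index[mnemonic].add(kinst)
    return dict(index)


idx = instruments_by_parameter(["30,range ti dti tr dtr"])
print(sorted(idx["ti"]))  # [30]
```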

Instrument kindat table - instKindatTab.txt

The instrument kindat metadata file (instKindatTab.txt) lists the kindat codes used with each instrument. This data is used to support the global database query web page, and is rebuilt by updateMaster. This file contains the following comma-separated fields:

Instrument Code (e.g., 30)
Kindat code list (e.g., 3408 13204 13210 3001)
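Combined with typeTab.txt, this table lets a page show the kinds of data available for an instrument. The sketch below looks up the kindats for one instrument and attaches descriptions where they are known; the description dictionary here is a partial, hypothetical stand-in for a loaded typeTab.txt.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def kindats_for_instrument(inst_kindat_lines: Iterable[str], kinst: int,
                           descriptions: Dict[int, str]) -> List[Tuple[int, str]]:
    """Return [(kindat, description), ...] for one instrument code."""
    for f in csv.reader(inst_kindat_lines):
        if f and int(f[0]) == kinst:
            return [(k, descriptions.get(k, "unknown"))
                    for k in map(int, f[1].split())]
    return []


# Partial description table, for illustration only
descriptions = {3001: "Basic Derived Parameters"}
print(kindats_for_instrument(["30,3408 13204 13210 3001"], 30, descriptions))
```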

Instrument data table - instData*.txt

The instrument data metadata files (instData.txt and instDataPriv.txt) record, for each instrument, the years for which a Madrigal site has data. Effectively these files are summaries of the data in expTabAll.txt, and exist only to speed up the user interface. Note that there will be only one line per kinst value. If multiple sites have data for the same instrument, the rules for which site ID is used are:

Test experiments are ignored
If the local site has that instrument, the local site id is always used (thus archive sites will show data as local).
Non-local archived experiments are ignored in favor of the main site.
If the instrument is at multiple non-archived, non-local sites, the site with the largest number of experiments is used.
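The four rules above can be sketched as a single selection function. How an experiment is recognized as a test experiment is not described here, so the caller supplies that flag; the tie-break between sites with equal experiment counts (lowest site ID wins) is also an assumption, since the rules do not specify one. `ExpSummary` and `choose_site` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class ExpSummary(NamedTuple):
    site_id: int
    kinst: int
    is_test: bool      # supplied by the caller (detection not specified)
    is_archive: bool   # security code 2 or 3


def choose_site(exps: Iterable[ExpSummary], kinst: int,
                local_site_id: int) -> Optional[int]:
    # Rule 1: ignore test experiments.
    candidates = [e for e in exps if e.kinst == kinst and not e.is_test]
    # Rule 2: the local site always wins, even if its copies are archives.
    if any(e.site_id == local_site_id for e in candidates):
        return local_site_id
    # Rule 3: ignore non-local archived experiments.
    remote = [e for e in candidates if not e.is_archive]
    if not remote:
        return None
    # Rule 4: pick the site with the most experiments
    # (assumed tie-break: lowest site ID).
    counts = Counter(e.site_id for e in remote)
    return max(counts, key=lambda site: (counts[site], -site))


exps = [ExpSummary(1, 30, False, False),
        ExpSummary(2, 30, False, False),
        ExpSummary(2, 30, False, False),
        ExpSummary(5, 30, False, True),
        ExpSummary(5, 30, False, True),
        ExpSummary(5, 30, False, True)]
print(choose_site(exps, 30, local_site_id=9))  # 2: archives at site 5 ignored
print(choose_site(exps, 30, local_site_id=5))  # 5: local archive site wins
```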

The two files instData.txt and instDataPriv.txt differ only in that private data is included in instDataPriv.txt, whereas public data is included in both.

This data is used to enhance the performance of various web pages, and is rebuilt by updateMaster.

Each of these files contains the following comma-separated fields:

Site ID (e.g., 10)
Instrument Code (e.g., 30)
List of years for which there is data (e.g., 1998 1999 2002)
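The year list is a summary of experiment date ranges. The sketch below builds one instData-style line from the start and end dates of a site's experiments for an instrument. Counting every calendar year an experiment touches (so an experiment spanning New Year adds both years) is an assumption, as are the sample dates and the helper names.

```python
from datetime import date
from typing import Iterable, List, Tuple

DateRange = Tuple[date, date]  # (experiment start, experiment end)


def years_covered(ranges: Iterable[DateRange]) -> List[int]:
    """Sorted years touched by any experiment (assumed rule)."""
    years = set()
    for start, end in ranges:
        years.update(range(start.year, end.year + 1))
    return sorted(years)


def format_inst_data_line(site_id: int, kinst: int,
                          ranges: Iterable[DateRange]) -> str:
    years = " ".join(str(y) for y in years_covered(ranges))
    return f"{site_id},{kinst},{years}"


# Hypothetical experiment dates
ranges = [(date(1998, 3, 1), date(1998, 3, 5)),
          (date(1999, 12, 30), date(2000, 1, 2)),
          (date(2002, 6, 1), date(2002, 6, 2))]
print(format_inst_data_line(10, 30, ranges))  # 10,30,1998 1999 2000 2002
```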

File data

The bottom level of the Madrigal data model is, of course, the data itself. With Madrigal 3.0 and beyond, the file format is Hdf5, following the rules defined in the CEDAR Madrigal Hdf5 format document. A Madrigal file is made up of a series of records, each with a start and stop time representing the integration period of the measurement (Madrigal tries to enforce the idea that all measurements take a finite time, but sometimes the start time equals the stop time).

Each Cedar parameter can also have an associated error value. This error value can take the special values "missing", "assumed", or "known bad". If an error value is "assumed", the implication is that the value itself is assumed rather than measured. If the error value is "known bad", the measured data is known to have a problem.
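The sketch below models the two ideas in this section: a record with a start and stop time whose difference is the integration period (zero allowed, negative rejected), and classification of an error value into the special states. The numeric sentinels for "assumed" and "known bad" are hypothetical placeholders, not the values defined by the CEDAR Madrigal Hdf5 format document; representing "missing" as NaN or None is also an assumption.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class ErrorStatus(Enum):
    MISSING = "missing"
    ASSUMED = "assumed"
    KNOWN_BAD = "known bad"
    VALID = "valid"


# Hypothetical sentinels, for illustration only; consult the CEDAR Madrigal
# Hdf5 format document for the real encoding.
ASSUMED_SENTINEL = -1.0
KNOWN_BAD_SENTINEL = -2.0


def classify_error(err: Optional[float]) -> ErrorStatus:
    if err is None or math.isnan(err):
        return ErrorStatus.MISSING
    if err == ASSUMED_SENTINEL:
        return ErrorStatus.ASSUMED   # the value itself is assumed
    if err == KNOWN_BAD_SENTINEL:
        return ErrorStatus.KNOWN_BAD
    return ErrorStatus.VALID


@dataclass
class Record:
    start: datetime
    stop: datetime

    def __post_init__(self) -> None:
        # start == stop is tolerated; stop before start is not
        if self.stop < self.start:
            raise ValueError("record stop time precedes start time")

    @property
    def integration_seconds(self) -> float:
        return (self.stop - self.start).total_seconds()


rec = Record(datetime(1997, 12, 3, 1, 13, 56), datetime(1997, 12, 3, 1, 14, 56))
print(rec.integration_seconds, classify_error(float("nan")).value)  # 60.0 missing
```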