Tips for formatting your data to be served by CMarZ

The Data Management Office can serve your data more quickly if it does not need extensive reformatting before it is added to our database.  Every data set is unique, so treat the guidelines below as general help when preparing a submission, and please contact the CMarZ Data Management Office with specific questions.

Example dataset formats are given in Appendices A and B, below.  Useful information:

·      Data fields should not be color coded.

·      Empty data fields should be marked as containing no data (‘nd’), or filled with a zero where zero is a real observed value. Do not use '999's to mean “no data.” All cells should contain observed values or “nd.” Do not leave the cells below an entry blank to indicate that the same entry is repeated; repeat the entry in every row.

·      Relevant dates, vessel names, identification notations, etc. should be repeated in columns, not rows.

·      Comments within a data sheet should be confined to a single column and should not be repeated across several different columns.

·      Please try to avoid the characters ' ( ) ‘ ’ ? ° and & in your data. These characters have special meanings to web software.

·      Species should be named in a consistent manner.
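Several of the guidelines above can be checked mechanically before a file is submitted. The following is a minimal Python sketch, not a tool provided by the Data Management Office. The function names (fill_down, find_problems), the column names in the usage lines, and the rule that any run of three or more 9s counts as a '999' placeholder are all assumptions made for illustration.

```python
import re

# Characters the guidelines ask submitters to avoid: ' ( ) ‘ ’ ? ° &
DISCOURAGED = set("'()\u2018\u2019?\u00b0&")

# Assumption: a cell made only of three or more 9s (optionally negative)
# is a "no data" placeholder rather than a real observation.
SENTINEL = re.compile(r"-?9{3,}")


def fill_down(rows, columns):
    """Repeat the last non-blank value in blank cells of the named columns.

    rows is a list of dicts (for example from csv.DictReader). A new list
    is returned; the input rows are not modified.
    """
    last = {}
    filled = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            value = (new_row.get(col) or "").strip()
            if value:
                last[col] = value
            elif col in last:
                new_row[col] = last[col]
        filled.append(new_row)
    return filled


def find_problems(rows):
    """Return (row_number, column, message) for each cell that needs review."""
    problems = []
    for number, row in enumerate(rows, start=1):
        for col, raw in row.items():
            value = (raw or "").strip()
            if value == "":
                problems.append((number, col, "blank cell: enter nd or an observed value"))
                continue
            if SENTINEL.fullmatch(value):
                problems.append((number, col, "looks like a 999 placeholder: use nd"))
            bad = sorted(DISCOURAGED.intersection(value))
            if bad:
                problems.append((number, col, "discouraged characters: " + " ".join(bad)))
    return problems
```

A typical use would be to read the sheet with csv.DictReader, call fill_down on the columns where blanks mean "same as above" (for example a hypothetical "cruise" column), and then review whatever find_problems reports. Only the data owner can decide whether a remaining blank should become a zero or "nd", so the sketch reports blanks rather than filling them.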

Two examples of data formats that are easily served from the CMarZ database:

Example 1: good dataset

Example 2: good dataset

Example 3: Dataset that is NOT easily servable without extensive reformatting (it uses colors that give meaning to the data and has blank cells). Species names are in a good place here: they should be down the side rather than across the top, as is sometimes done.

Example 4: Not a good dataset for serving. In this example, the colors are not a problem because they are for ease of viewing rather than carrying a meaning such as the validity of the data. See a more database-friendly version below. The problems with this data include:

- the species names across the top should go down the side.
- the species and stages should be split up (see example below).
- the rows in the first 14 columns need to be repeated for each species/stage.
- there are blanks that could mean either that zero individuals were found or that they were not looked for in this sample set.
- there are characters that the database has problems with, including '(' and ')'.
- the fractions are not readable by the database. We would prefer either a decimal value or just the denominator (see example below).
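The reshaping described in the list above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure the Data Management Office uses: the function names (denominator, wide_to_long) are hypothetical, each species/stage column header is assumed to end with the stage as its last word (for example "Calanus finmarchicus C5"), and the caller must say what a blank count means, because only the data owner knows whether it is a zero or "nd".

```python
def denominator(fraction):
    """Return the denominator of a fraction written as 'a/b', as text.

    Values without a '/' (for example a decimal) are returned unchanged.
    """
    text = fraction.strip()
    if "/" in text:
        return text.split("/", 1)[1].strip()
    return text


def wide_to_long(header, rows, n_meta, blank_means, fraction_col=None):
    """Turn one column per species/stage into one row per species/stage.

    header       list of column names; the first n_meta describe the sample,
                 the rest are assumed to look like 'Genus species stage'
    rows         list of lists of cell text, aligned with header
    blank_means  text to enter for a blank count ('0' or 'nd'), chosen by
                 the person who knows the data
    fraction_col optional index of a sample column holding a fraction
    """
    long_header = header[:n_meta] + ["species", "stage", "count"]
    long_rows = []
    for row in rows:
        meta = list(row[:n_meta])
        if fraction_col is not None:
            meta[fraction_col] = denominator(meta[fraction_col])
        for name, cell in zip(header[n_meta:], row[n_meta:]):
            # Split the header at its last space: species before, stage after.
            species, _, stage = name.rpartition(" ")
            count = cell.strip() or blank_means
            # The sample columns are repeated on every species/stage row.
            long_rows.append(meta + [species, stage, count])
    return long_header, long_rows
```

For instance, calling wide_to_long with n_meta=14 would repeat the first 14 columns of a sheet like Example 4 on every output row. Headers that do not follow the assumed pattern would need to be split by hand.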

example of data4

Example 5: The perfect dataset. This is how it looked after we reformatted a similar dataset. The first 15 columns repeat for all the species/stages. The stages are in their own column, and the species run down the side in rows. The fraction is converted to just the denominator. Zeros are entered for the counts rather than left blank. No cell is empty.

data file