Version: 6.0.0

How to structure the samples metadata file

Background

SODA helps you prepare the samples metadata file. Although SODA automatically generates the file in the required structure, this page explains how the file must be structured under the SPARC rules, so that you can better understand the file SODA generates.

How to

  • Format: The samples file is accepted in xlsx, csv, or json format. SODA generates it in the xlsx format based on the template provided by the Curation Team.
  • Location in the dataset: The samples file is typically expected in the high-level dataset folder.
  • Content: The subject_id and sample_id are mandatory (highlighted in bold and italic below) for all datasets and must each be provided with one value. The other experimental setup elements (highlighted in bold-only below) are also mandatory when available.
    • subject_id: Lab-based schema for identifying each subject. This field should match the sub-folder names in the primary folder. The subject_id must be unique.
    • sample_id: Lab-based schema for identifying each sample. The sample_id must be unique across the whole dataset.
    • wasDerivedFromSample: This is the sample_id of the sample from which the current sample was derived (e.g., slice, tissue punch, biopsy, etc.).
    • pool_id: If data is collected on multiple subjects at the same time, include the identifier of the pool where the data file will be found. If this is included, it should be the name of the top-level folder inside primary.
    • experimental group: This field refers to the experimental group that a subject is assigned to in the research project.
    • Specimen type: This refers to the physical type of the specimen from which the data were extracted.
    • Specimen anatomical location: This is the organ, or sub-region of organ from which the data were extracted.
    • Sex: This is the sex of the subject. Leave it empty if unknown.
    • Species: This is the species of the subject. When users start typing to search for a species, SODA provides species suggestions based on the NCBI taxonomy.
    • Strain: This is the organism strain of the subject.
    • RRID for strain: This is the Research Resource Identifier (RRID) for the strain of the subject. SODA uses SciCrunch to identify the RRID of the strain users provide.
    • Additional Fields (e.g. MINDS): Provide any additional fields that you would like to include in your samples.xlsx file.
    • Age: Age of the subject (e.g., hours, days, weeks, years old). Leave it empty if unknown. For your convenience, SODA separates this entry into two fields: a number field (e.g., 1, 2, 3) and a unit field (e.g., hours, days, weeks). If an ISO format is expected for this entry, enter the ISO-formatted text in the number field and select N/A for the unit field.
    • Age category: The age category that the subject belongs to. A search field with suggestions based on a list derived from the UBERON life cycle stage is provided in the interface for your convenience.
    • Age range (min): This is the minimal age (youngest) of the research subjects. The format for this field is numerical value + space + unit (spelled out).
    • Age range (max): This is the maximal age (oldest) of the research subjects. The format for this field is numerical value + space + unit (spelled out).
    • Handedness: This refers to the preference of the subject to use the right or left hand, whenever applicable.
    • Genotype: This refers to the genetic makeup of genetically modified alleles in transgenic animals belonging to the same subject group. Ignore this field if the RRID is already provided.
    • Reference atlas: Enter the reference atlas and organ here.
    • Protocol title: This field refers to the title of the protocol within Protocols.io once the research protocol is uploaded to Protocols.io. In SODA, users can connect to their Protocols.io account by clicking on Help me with my protocol information under the Protocol Information tab. A login interface will instruct users to sign in to their account in the browser at protocols.io. An access token is required for automatic extraction of the protocol titles and links, and can be obtained from the provided website once users are signed in. Once users successfully connect their account with SODA, they can search in the input field for their protocol titles.
    • Protocol.io location: This refers to the Protocols.io URL for the protocol title. Once the protocol is uploaded to Protocols.io, the protocol must be shared with the SPARC group and the Protocols.io URL noted in this field. In SODA, when users select a protocol title in the previous field (Protocol title), the protocol location or link is filled out automatically for this field.
    • Experimental log file name: This is a file containing experimental records for each sample, whenever applicable.
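
The rules above can be checked with a short script. The following is a minimal sketch, not part of SODA, that assumes the samples file has been exported to csv with a header row. It checks only three of the rules listed above: subject_id and sample_id are present, sample_id is unique across the dataset, and the age range fields follow the "numerical value + space + unit" format. The function names and the exact column labels "age range (min)" and "age range (max)" are assumptions made for illustration; the SPARC template may label them differently.

```python
import csv
import io
import re

# Assumed reading of "numerical value + space + unit (spelled out)",
# e.g. "10 days" or "1.5 years".
AGE_RANGE_PATTERN = re.compile(r"^\d+(\.\d+)? [A-Za-z]+$")

MANDATORY_FIELDS = ("subject_id", "sample_id")
# Hypothetical column labels; adjust to match the actual template.
AGE_RANGE_FIELDS = ("age range (min)", "age range (max)")


def check_samples(rows):
    """Return a list of human-readable problems found in the sample rows."""
    problems = []
    seen_sample_ids = set()
    for number, row in enumerate(rows, start=1):
        # Mandatory fields must be present and non-empty.
        for field in MANDATORY_FIELDS:
            if not (row.get(field) or "").strip():
                problems.append(f"row {number}: missing {field}")
        # sample_id must be unique across the whole dataset.
        sample_id = (row.get("sample_id") or "").strip()
        if sample_id:
            if sample_id in seen_sample_ids:
                problems.append(f"row {number}: duplicate sample_id {sample_id!r}")
            seen_sample_ids.add(sample_id)
        # Age range fields are optional, but must be well-formed when filled in.
        for field in AGE_RANGE_FIELDS:
            value = (row.get(field) or "").strip()
            if value and not AGE_RANGE_PATTERN.match(value):
                problems.append(f"row {number}: {field} {value!r} is not 'number unit'")
    return problems


def check_samples_csv(text):
    """Parse csv text with a header row and check every data row."""
    return check_samples(list(csv.DictReader(io.StringIO(text))))
```

Calling check_samples_csv on the text of an exported samples file returns an empty list when none of these three rules is violated. Row numbers in the messages count data rows, not the header.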
