
Files and Metadata: Annotation and Organization


Overview

This guide explains how to document and organize your research data using metadata: descriptive information that helps others find, understand, and reuse your work.

The process typically takes 1-2 weeks, depending on:

  • Amount of data to document

  • Complexity of your experimental setup

  • Number of templates needed

  • Any validation issues that need addressing

The process involves collaboration between you (the contributor) and MC2 Center staff so your data is:

  • Well-documented with standardized descriptions

  • Properly organized for easy access

  • Linked together in meaningful ways

  • Ready for others to discover and reuse

This approach follows FAIR principles, making your data:

  • Findable: Others can discover your data

  • Accessible: Clear access requirements

  • Interoperable: Uses standard formats

  • Reusable: Well-documented for reuse

CRITICAL: Working with Templates

Throughout this process, you'll work with metadata templates provided as Google Sheets. These templates help capture important information about your data in a standardized format. To ensure your metadata can be processed correctly:

ONLY record metadata in templates linked in your Synapse Project

Why This Process Matters

Each step in this process serves an important purpose:

  1. Folder Organization

  • Makes data easy to find

  • Supports automated processing

  2. Metadata Templates

  • Captures essential information

  • Supports data discovery

  3. Submission Order

  • Maintains data relationships

  • Prevents missing links

  • Supports validation

  4. Component IDs

  • Create clear relationships

  • Support data tracking

  • Allow future updates

Understanding Metadata Types

We use two primary types of metadata:

  1. Record-based Metadata

  • Describes things like:

    • Study information

    • Human participants or model systems (e.g., patient demographics, cell line information)

    • Biospecimens (e.g., sample processing methods)

    • Experimental details (e.g., imaging parameters)

  2. File-based Metadata

  • Describes the actual data files such as:

    • FASTQ files from sequencing

    • Microscopy images

    • Analysis results

    • Supporting documentation

These two metadata types work together to tell the complete story of your research:

Study
  ├── Participants/Models ─┐
  ├── Biospecimens ────────┼──> Data Files
  └── Experimental Setup ──┘

Required Access

For Contributors:

  • Access to your grant-specific Synapse project

  • Access to metadata templates (provided by MC2 Center)

For MC² Center Staff:

  • Administrative access to Synapse

  • Access to schematic CLI tools

  • Access to validation scripts

Instructions

  1. Understand Your Data Organization

🔷 Contributor Role: Review how your Synapse project is structured.

Content should be organized in a standard folder structure:

Project/
├── data/                  # Your research data files
├── studies/               # Study information
├── biospecimens/          # Sample metadata
├── models/                # Model system and cell line metadata
├── individuals/           # Human patient metadata
├── sharing_plans/         # Data sharing info
├── governance/            # Governance documentation
├── publications/          # Metadata for publications
├── datasets/              # Metadata for released datasets
├── tools/                 # Metadata for released tools
├── education/             # Metadata for released educational resources
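
As a rough illustration, the folder list above can be checked with a few lines of Python. This is a minimal sketch, not an MC2 Center tool: the function name `missing_folders` is hypothetical, and how you obtain the list of folder names from your project is left outside the example.

```python
# Folder names copied from the standard structure listed above.
EXPECTED_FOLDERS = [
    "data", "studies", "biospecimens", "models", "individuals",
    "sharing_plans", "governance", "publications", "datasets",
    "tools", "education",
]


def missing_folders(present):
    """Return the expected folder names that do not appear in `present`."""
    # Accept names written with or without a trailing slash.
    present_set = {name.strip("/") for name in present}
    return [name for name in EXPECTED_FOLDERS if name not in present_set]


# Hypothetical project that only has three of the folders so far.
print(missing_folders(["data/", "studies/", "biospecimens/"]))
```

An empty list means every expected folder is present.
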

  2. Access Your Templates

🔷 Contributor Role: Get your metadata templates.


To access your available metadata templates, navigate to the relevant folder in your Synapse project and select the linked sheet.

Example:

  • To access the Biospecimen template, open the biospecimens/ folder

  • To access the File View template, open the data/ folder

The linked template will be named according to the format: [grant number]_[data type]_[version]

Example: CA123456_Biospecimen_v10.0.0
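
If you need to sort or check template names in a script, the naming format can be split into its three parts. The sketch below is illustrative only: it assumes the grant number contains no underscore and that the version always starts with "v" followed by dot-separated numbers, as in the example above. The function name `parse_template_name` is hypothetical.

```python
import re

# [grant number]_[data type]_[version], with the version written like v10.0.0.
TEMPLATE_NAME = re.compile(
    r"^(?P<grant>[^_]+)_(?P<data_type>.+)_v(?P<version>\d+(?:\.\d+)*)$"
)


def parse_template_name(name):
    """Split a template name into grant number, data type, and version."""
    match = TEMPLATE_NAME.match(name)
    if match is None:
        raise ValueError(f"Unexpected template name: {name!r}")
    return match.groupdict()


print(parse_template_name("CA123456_Biospecimen_v10.0.0"))
# {'grant': 'CA123456', 'data_type': 'Biospecimen', 'version': '10.0.0'}
```
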

Your Data Sharing Plan will be used to document which metadata templates to complete for your datasets.

  • For metadata types that apply to more than one dataset (e.g., Biospecimen, File View, Individual, Model), additional rows will be added to your Data Sharing Plan

  • For all Data Sharing Plan entries, the applicable metadata templates will be linked in column Y, “DSP Dataset Metadata”

  3. Follow the Completion Order

🔷 Contributor Role: Complete templates in this order:

  1. Model/Individual information (if applicable)

  • Describes your experimental system

  • Must come before Biospecimen data

  2. Biospecimen information (if applicable)

  • Links samples to models/individuals

  • Must come before file metadata

  3. File View metadata

  • Describes your actual data files

  • Links files to samples and study

  4. Assay-specific metadata

  • Additional details about specific methods

  • Example: Imaging or sequencing parameters

  • Please contact the MC2 Center for guidance on preparing assay-specific metadata

  5. Resource metadata (can be submitted independently of metadata listed above)

  • Information about publications, datasets, computational tools, and educational resources associated with your grant

Why this order matters:

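  • Templates refer to one another through “Key”-type attributes: biospecimens link to a model or individual, files link to biospecimens and the study, and assay-specific metadata typically links files to that information.

  • Preparing and submitting templates in this order means each key refers to a record that already exists, so content can be linked correctly.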

Study ID
  ├── Model/Individual ID
  │     └── Biospecimen ID
  │           └── File ID
  └── Dataset ID
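
The completion order can be thought of as a small dependency graph. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to order whichever templates apply to a dataset. The template labels and the `completion_order` helper are hypothetical shorthand for the steps above, not names used by the MC2 Center.

```python
from graphlib import TopologicalSorter

# Each template maps to the templates that must be completed before it.
PREREQUISITES = {
    "Model/Individual": set(),
    "Biospecimen": {"Model/Individual"},
    "File View": {"Biospecimen"},
    "Assay-specific": {"File View"},
    "Resource": set(),  # independent of the others
}


def completion_order(needed):
    """Order the needed templates so that prerequisites come first."""
    needed_set = set(needed)
    # Ignore prerequisites that do not apply to this dataset.
    graph = {name: PREREQUISITES[name] & needed_set for name in needed}
    return list(TopologicalSorter(graph).static_order())


print(completion_order(["File View", "Biospecimen", "Model/Individual"]))
```

Templates with no dependency between them, such as Resource metadata, may appear anywhere in the result.
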

  4. Record Your Metadata

🔷 Contributor Role: Fill out your templates.

For each template:

  1. Look for field descriptions in column headers

  • Hover over column names for detailed descriptions

  • Required fields are highlighted in blue

  • Optional fields provide additional context

  2. Check “Sheet 2” for valid values:

  • Click “Sheet 2” tab at bottom (unhide if needed)

  • Find your column of interest

  • Use exact values from "Valid Values" column

  3. Use comma-separated lists for multiple values

Example: "RNA-seq, ATAC-seq, ChIP-seq"
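
Before submitting, you can spot-check a cell against the "Valid Values" column yourself. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the list of valid values is made up for the example, and matching is exact and case-sensitive because the template asks for exact values.

```python
def split_values(cell):
    """Split a comma-separated cell into trimmed, non-empty values."""
    return [part.strip() for part in cell.split(",") if part.strip()]


def invalid_values(cell, valid_values):
    """Return the values in `cell` that are not in the valid-values list."""
    allowed = set(valid_values)
    return [value for value in split_values(cell) if value not in allowed]


# Hypothetical valid values, as they might be copied from "Sheet 2".
assay_valid = ["RNA-seq", "ATAC-seq", "ChIP-seq"]

print(invalid_values("RNA-seq, ATAC-seq, chip-seq", assay_valid))
# ['chip-seq']
```
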

Common information sources:

  • Data files themselves

  • Quality control reports

  • Lab notebooks

  • Protocol documents

  • Analysis outputs

  • Publications

Upload any reference documents you used to the documentation folder

  5. Understanding Component IDs

🔷 Contributor Role: Each entry needs a unique ID.

IDs follow these patterns:

  • Study: [Grant number]-[Journal/Type]-[Date]

  • Model: [Grant number]-M[Number]

  • Individual: [Grant number]-IND[Number]

  • Biospecimen: [Parent ID]-B[Number]

Example flow:

Study: GRANT123-CELL-2024
   └── Model: GRANT123-M1
        └── Biospecimen: GRANT123-M1-B1
             └── File: syn789012 (Synapse ID)
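
The ID patterns above can be expressed as regular expressions for a quick self-check. This is only a sketch: it assumes grant numbers are plain letters and digits, that the [Journal/Type] part is letters only, and that the [Date] part is digits (as in GRANT123-CELL-2024). The actual rules applied during validation may differ, and the function names are hypothetical.

```python
import re

SEGMENT = r"[A-Za-z0-9]+"  # assumed shape of a grant number

ID_PATTERNS = {
    # Biospecimen: [Parent ID]-B[Number], where the parent is a Model or Individual ID.
    "Biospecimen": re.compile(rf"^(?P<parent>{SEGMENT}-(?:M|IND)\d+)-B\d+$"),
    "Model": re.compile(rf"^{SEGMENT}-M\d+$"),
    "Individual": re.compile(rf"^{SEGMENT}-IND\d+$"),
    "Study": re.compile(rf"^{SEGMENT}-[A-Za-z]+-\d+$"),
}


def classify_id(component_id):
    """Return the ID type that `component_id` matches, or None."""
    for kind, pattern in ID_PATTERNS.items():
        if pattern.match(component_id):
            return kind
    return None


def biospecimen_parent(component_id):
    """Return the Model/Individual ID embedded in a Biospecimen ID, or None."""
    match = ID_PATTERNS["Biospecimen"].match(component_id)
    return match.group("parent") if match else None


print(classify_id("GRANT123-M1-B1"), biospecimen_parent("GRANT123-M1-B1"))
```

Files are identified by their Synapse ID (for example, syn789012), so they are not covered by these patterns.
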

  6. Submit for Validation

🔷 Contributor Role: Let MC² Center know when you're done.

If you are a contributor, notify the MC² Center that your templates are complete and STOP here.


🔶 MC2 Center Role: We will:

  1. Download your completed templates

  2. Run validation checks to ensure:

  • All required fields are complete

  • Values match expected formats

  • IDs and keys are properly linked

  • Relationships between records are valid

  3. Provide feedback if updates are needed

  4. Upload validated metadata to Synapse

If validation fails:

  1. We'll provide detailed feedback about:

  • Which fields need attention

  • What the specific issues are

  • How to correct the problems

  2. You can then:

  • Make the requested updates

  • Ask questions if anything is unclear

  • Resubmit for validation

Common validation issues to watch for:

  • Missing required fields

  • Incorrect date formats

  • Invalid ID patterns

  • Missing relationships between records

  • Incorrect terms or spellings

  • Values not from approved list
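
Two of these issues, missing required fields and missing relationships between records, are easy to pre-check on your own rows before you hand off. The sketch below is not the MC2 Center's validation script. The function `find_issues` and the column names "Biospecimen Key" and "Model Key" are hypothetical and used only for illustration.

```python
def _text(value):
    """Normalize a cell to a trimmed string, treating None as empty."""
    return "" if value is None else str(value).strip()


def find_issues(rows, required_fields, parent_field=None, known_parent_ids=()):
    """List missing required fields and references to unknown parent IDs."""
    issues = []
    known = set(known_parent_ids)
    for row_number, row in enumerate(rows, start=1):
        for field in required_fields:
            if not _text(row.get(field)):
                issues.append(f"row {row_number}: missing required field '{field}'")
        if parent_field is not None:
            parent = _text(row.get(parent_field))
            if parent and parent not in known:
                issues.append(f"row {row_number}: unknown {parent_field} '{parent}'")
    return issues


# Hypothetical biospecimen rows that refer to model records.
rows = [
    {"Biospecimen Key": "GRANT123-M1-B1", "Model Key": "GRANT123-M1"},
    {"Biospecimen Key": "", "Model Key": "GRANT123-M9"},
]
for issue in find_issues(rows, ["Biospecimen Key"], "Model Key", ["GRANT123-M1"]):
    print(issue)
```
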

  7. After Validation

🔶 MC2 Center Role: Once validation is successful, we:

  1. Convert metadata to proper format

  2. Upload to Synapse project

  3. Apply metadata to relevant files

  4. Update portal database

  5. Prepare for eventual release

This process ensures your data is:

  • Properly documented

  • Correctly linked

  • Ready for discovery

  • Prepared for sharing

Example Workflows

Example 1: Imaging Dataset

A researcher wants to share microscopy data with some participant information:

  1. Complete templates in order:

    1. Individual template (participant info)

    2. Biospecimen template (sample info)

    3. File View template (file info)

    4. Imaging Channel template

    5. Imaging Level 2 template

  2. Link everything together:

Study (GRANT123-IMG-2024)
├── Individual (GRANT123-IND1)
│     └── Biospecimen (GRANT123-IND1-B1)
│           └── Image Files (syn789012)
└── Dataset (syn456789)

Example 2: GeoMx Dataset

A researcher wants to share spatial genomics data:

  1. Upload supporting files first:

  • Experimental config file

  • Probe config file

  • Lab worksheet

  • ROI coordinate files, if applicable

  2. Complete templates in order:

    1. Individual template (participant info)

    2. Biospecimen template (sample info)

    3. File View template

    4. Imaging Channel template

    5. ROI/segment template

    6. GeoMx Auxiliary files template

    7. GeoMx Level 1 template

    8. GeoMx Level 2 template

    9. GeoMx Level 3 template

    10. GeoMx Imaging template

  3. Example organization:

Project
  ├── biospecimens
  │     └── Biospecimen metadata template
  ├── individuals
  │     └── Individual metadata template
  ├── imaging_channel
  │     └── Imaging channel metadata template
  └── [study_id]/
        ├── File View metadata template
        ├── Auxiliary files/
        │     └── Auxiliary files metadata template
        ├── ROI Data/
        │     └── ROI metadata template
        ├── GeoMx Level 1/
        │     └── GeoMx Level 1 metadata template
        ├── GeoMx Level 2/
        │     └── GeoMx Level 2 metadata template
        ├── GeoMx Level 3/
        │     └── GeoMx Level 3 metadata template
        └── GeoMx Imaging/
              └── GeoMx Imaging metadata template

Validation Process Timeline

The validation process typically takes:

  • Initial review: 1-2 business days

  • Each revision cycle: 1-2 business days

  • Final validation: 1-2 business days

Factors that can affect timing:

  • Number of templates to validate

  • Complexity of relationships

  • Number of validation issues

  • Response time for revisions

Example Templates

Resource metadata example templates:

Need Help?

We're here to support you through this process. Don't hesitate to Contact Us if you have questions or need guidance at any step.

Additional Resources
