Files and Metadata: Annotation and Organization
Overview
This guide explains how to document and organize your research data using metadata: descriptive information that helps others find, understand, and reuse your work.
The process typically takes 1-2 weeks, depending on:
Amount of data to document
Complexity of your experimental setup
Number of templates needed
Any validation issues that need addressing
The process involves collaboration between you (the contributor) and MC2 Center staff so your data is:
Well-documented with standardized descriptions
Properly organized for easy access
Linked together in meaningful ways
Ready for others to discover and reuse
This approach follows FAIR principles, making your data:
Findable: Others can discover your data
Accessible: Clear access requirements
Interoperable: Uses standard formats
Reusable: Well-documented for reuse
CRITICAL: Working with Templates
Throughout this process, you'll work with metadata templates provided as Google Sheets. These templates help capture important information about your data in a standardized format. To ensure your metadata can be processed correctly:
ONLY record metadata in templates linked in your Synapse Project
Why This Process Matters
Each step in this process serves an important purpose:
Folder Organization
Makes data easy to find
Supports automated processing
Metadata Templates
Capture essential information
Enable data discovery
Submission Order
Maintains data relationships
Prevents missing links
Supports validation
Component IDs
Create clear relationships
Support data tracking
Allow future updates
Understanding Metadata Types
We use two primary types of metadata:
Record-based Metadata
Describes things like:
Study information
Human participants or model systems (for example, patient demographics or cell line information)
Biospecimens (for example, sample processing methods)
Experimental details (for example, imaging parameters)
File-based Metadata
Describes the actual data files such as:
FASTQ files from sequencing
Microscopy images
Analysis results
Supporting documentation
These two metadata types work together to tell the complete story of your research:
Study
├── Participants/Models ─┐
├── Biospecimens ────────┼──> Data Files
└── Experimental Setup ──┘
Required Access
For Contributors:
Access to your grant-specific Synapse project
Access to metadata templates (provided by MC2 Center)
For MC2 Center Staff:
Administrative access to Synapse
Access to schematic CLI tools
Access to validation scripts
Instructions
Understand Your Data Organization
🔷 Contributor Role: Review how your Synapse project is structured.
Content should be organized in a standard folder structure:
Project/
├── data/ # Your research data files
├── studies/ # Study information
├── biospecimens/ # Sample metadata
├── models/ # Model system and cell line metadata
├── individuals/ # Human patient metadata
├── sharing_plans/ # Data sharing info
├── governance/ # Governance documentation
├── publications/ # Metadata for publications
├── datasets/ # Metadata for released datasets
├── tools/ # Metadata for released tools
└── education/ # Metadata for released educational resources
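If you want a quick way to confirm that your project follows this layout, the check can be sketched in a few lines of Python. This is only an illustration: the folder names come from the structure above, while the function name `missing_folders` and the example listing are hypothetical, and the sketch does not connect to Synapse.

```python
# Folder names taken from the standard structure shown above.
EXPECTED_FOLDERS = {
    "data", "studies", "biospecimens", "models", "individuals",
    "sharing_plans", "governance", "publications", "datasets",
    "tools", "education",
}


def missing_folders(present):
    """Return the expected top-level folders not found in `present`, sorted."""
    # Tolerate trailing slashes, stray whitespace, and capitalization.
    normalized = {name.strip().strip("/").lower() for name in present}
    return sorted(EXPECTED_FOLDERS - normalized)


# Hypothetical listing of the top-level folders in a project.
example_listing = ["data/", "studies/", "biospecimens/", "individuals/"]
print(missing_folders(example_listing))
```

An empty list means every expected folder is present. If folders are missing, the MC2 Center can help you work out whether they apply to your project.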
Access Your Templates
🔷 Contributor Role: Get your metadata templates.
To access your available metadata templates, navigate to the relevant folder in your Synapse project and select the linked sheet.
Examples:
To access the Biospecimen template, open the biospecimens/ folder
To access the File View template, open the data/ folder
The linked template will be named according to the format: [grant number]_[data type]_[version]
Example: CA123456_Biospecimen_v10.0.0
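To show how the naming format breaks down, here is a minimal Python sketch that splits a template name into its three parts. The regular expression is an assumption based on the single example above: it expects a grant number without underscores and a version written as "v" followed by dot-separated numbers. The function name `parse_template_name` is hypothetical.

```python
import re

# Assumed shape: [grant number]_[data type]_v[version], e.g. CA123456_Biospecimen_v10.0.0
TEMPLATE_NAME = re.compile(
    r"^(?P<grant>[^_]+)_(?P<data_type>.+)_v(?P<version>\d+(?:\.\d+)*)$"
)


def parse_template_name(name):
    """Split a template name into grant, data type, and version, or return None."""
    match = TEMPLATE_NAME.match(name.strip())
    if match is None:
        return None
    return match.groupdict()


print(parse_template_name("CA123456_Biospecimen_v10.0.0"))
# {'grant': 'CA123456', 'data_type': 'Biospecimen', 'version': '10.0.0'}
```

A result of `None` suggests that the sheet you opened is not one of the templates linked in your Synapse project.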
Your Data Sharing Plan documents which metadata templates to complete for your datasets.
For metadata types that apply to more than one dataset (e.g., Biospecimen, File View, Individual, Model), additional rows will be added to your Data Sharing Plan.
For all Data Sharing Plan entries, the applicable metadata templates will be linked in column Y, “DSP Dataset Metadata”.
Follow the Completion Order
🔷 Contributor Role: Complete templates in this order:
Model/Individual information (if applicable)
Describes your experimental system
Must come before Biospecimen data
Biospecimen information (if applicable)
Links samples to models/individuals
Must come before file metadata
File View metadata
Describes your actual data files
Links files to samples and study
Assay-specific metadata
Additional details about specific methods
Example: Imaging or sequencing parameters
Please contact the MC2 Center for guidance on preparing assay-specific metadata
Resource metadata (can be submitted independently of metadata listed above)
Information about publications, datasets, computational tools, and educational resources associated with your grant
Why this order matters:
Assay-specific metadata typically includes “Key”-type attributes that link files to related records.
Completing and submitting templates in this order means each ID already exists before another template refers to it, so content can be linked correctly.
Study ID
├── Model/Individual ID
│   └── Biospecimen ID
│       └── File ID
└── Dataset ID
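The reason for this order can be illustrated with a small Python sketch that looks for records pointing at a parent that has not been recorded yet. The record layout (dictionaries with `id` and `parent_id` keys) and the function name `find_broken_links` are hypothetical and used only for illustration; they are not the actual template columns or the MC2 Center's validation code.

```python
def find_broken_links(children, parents, key):
    """Return child records whose `key` value does not match any parent ID."""
    parent_ids = {parent["id"] for parent in parents}
    return [child for child in children if child.get(key) not in parent_ids]


# Hypothetical records: one model has been recorded so far.
models = [{"id": "GRANT123-M1"}]
biospecimens = [
    {"id": "GRANT123-M1-B1", "parent_id": "GRANT123-M1"},
    {"id": "GRANT123-M2-B1", "parent_id": "GRANT123-M2"},  # model M2 is not recorded yet
]

broken = find_broken_links(biospecimens, models, "parent_id")
print([record["id"] for record in broken])
# ['GRANT123-M2-B1']
```

Recording models and individuals first keeps this list empty by the time you reach the Biospecimen template. The same idea applies one level down, when file metadata refers to biospecimens.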
Record Your Metadata
🔷 Contributor Role: Fill out your templates.
For each template:
Look for field descriptions in column headers
Hover over column names for detailed descriptions
Required fields are highlighted in blue
Optional fields provide additional context
Check “Sheet 2” for valid values:
Click “Sheet 2” tab at bottom (unhide if needed)
Find your column of interest
Use exact values from "Valid Values" column
Separate multiple values with commas
Example: "RNA-seq, ATAC-seq, ChIP-seq"
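As a rough illustration of why exact values matter, the following Python sketch splits a comma-separated cell and reports any entry that is not in a list of valid values. The set `VALID_ASSAYS` is a made-up stand-in for the "Valid Values" column of a template, and the function name `invalid_values` is hypothetical. The sketch also assumes that no valid value contains a comma itself.

```python
def invalid_values(cell, valid_values):
    """Return the entries of a comma-separated cell that are not valid values."""
    entries = [entry.strip() for entry in cell.split(",") if entry.strip()]
    # The comparison is exact, so capitalization and spelling must match.
    return [entry for entry in entries if entry not in valid_values]


# Hypothetical valid values; use the list in "Sheet 2" of your own template.
VALID_ASSAYS = {"RNA-seq", "ATAC-seq", "ChIP-seq"}

print(invalid_values("RNA-seq, ATAC-seq, ChIP-seq", VALID_ASSAYS))
# []
print(invalid_values("RNA-seq, rna-seq", VALID_ASSAYS))
# ['rna-seq']
```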
Common information sources:
Data files themselves
Quality control reports
Lab notebooks
Protocol documents
Analysis outputs
Publications
Upload any reference documents you used to the documentation folder
Understanding Component IDs
🔷 Contributor Role: Each entry needs a unique ID.
IDs follow these patterns:
Study: [Grant number]-[Journal/Type]-[Date]
Model: [Grant number]-M[Number]
Individual: [Grant number]-IND[Number]
Biospecimen: [Parent ID]-B[Number]
Example flow:
Study: GRANT123-CELL-2024
└── Model: GRANT123-M1
    └── Biospecimen: GRANT123-M1-B1
        └── File: syn789012 (Synapse ID)
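The ID patterns above can be expressed as regular expressions if you would like to check your IDs before submitting. The Python sketch below is an approximation under stated assumptions: it treats the grant number as a run of letters and digits, assumes a Biospecimen's parent is a Model or Individual ID, and leaves out the Study pattern because the expected date format is not specified here. The names `ID_PATTERNS`, `matches_pattern`, and `biospecimen_parent` are hypothetical.

```python
import re

GRANT = r"[A-Za-z0-9]+"  # assumption: grant numbers contain only letters and digits

ID_PATTERNS = {
    "Model": re.compile(rf"^{GRANT}-M\d+$"),
    "Individual": re.compile(rf"^{GRANT}-IND\d+$"),
    # Assumption: the parent of a biospecimen is a Model or Individual ID.
    "Biospecimen": re.compile(rf"^{GRANT}-(?:M|IND)\d+-B\d+$"),
}


def matches_pattern(component, identifier):
    """Check whether an ID has the expected shape for its component type."""
    return ID_PATTERNS[component].match(identifier) is not None


def biospecimen_parent(identifier):
    """Recover the parent ID by removing the trailing -B[Number]."""
    return re.sub(r"-B\d+$", "", identifier)


print(matches_pattern("Biospecimen", "GRANT123-M1-B1"))  # True
print(matches_pattern("Individual", "GRANT123-M1"))      # False
print(biospecimen_parent("GRANT123-M1-B1"))              # GRANT123-M1
```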
Submit for Validation
🔷 Contributor Role: Let the MC2 Center know when you're done.
If you are a contributor, notify the MC2 Center that your templates are complete and STOP here.
🔶 MC2 Center Role: We will:
Download your completed templates
Run validation checks to ensure:
All required fields are complete
Values match expected formats
IDs and keys are properly linked
Relationships between records are valid
Provide feedback if updates are needed
Upload validated metadata to Synapse
If validation fails:
We'll provide detailed feedback about:
Which fields need attention
What the specific issues are
How to correct the problems
You can then:
Make the requested updates
Ask questions if anything is unclear
Resubmit for validation
Common validation issues to watch for:
Missing required fields
Incorrect date formats
Invalid ID patterns
Missing relationships between records
Incorrect terms or spellings
Values not from approved list
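Two of these issues, missing required fields and incorrect date formats, are easy to catch yourself before submitting. The Python sketch below shows one way to do so for a single row. It is not the MC2 Center's validation script: the column names, the function name `check_row`, and the `YYYY-MM-DD` date format are all assumptions made for illustration, so check your template for the format it actually expects.

```python
from datetime import datetime


def _text(value):
    """Return a cell value as stripped text, treating None as empty."""
    return "" if value is None else str(value).strip()


def check_row(row, required_fields, date_fields=(), date_format="%Y-%m-%d"):
    """Return a list of human-readable issues found in one metadata row."""
    issues = []
    for field in required_fields:
        if not _text(row.get(field)):
            issues.append(f"missing required field: {field}")
    for field in date_fields:
        value = _text(row.get(field))
        if not value:
            continue  # empty dates are reported only if the field is required
        try:
            datetime.strptime(value, date_format)
        except ValueError:
            issues.append(f"incorrect date format in {field}: {value!r}")
    return issues


# Hypothetical row with hypothetical column names.
row = {"Biospecimen ID": "GRANT123-M1-B1", "Tissue": "", "Collection Date": "03/15/2024"}
for issue in check_row(row, ["Biospecimen ID", "Tissue"], ["Collection Date"]):
    print(issue)
```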
After Validation
🔶 MC2 Center Role: Once validation is successful, we:
Convert metadata to proper format
Upload to Synapse project
Apply metadata to relevant files
Update portal database
Prepare for eventual release
This process ensures your data is:
Properly documented
Correctly linked
Ready for discovery
Prepared for sharing
Example Workflows
Example 1: Imaging Dataset
A researcher wants to share microscopy data with some participant information:
Complete templates in order:
Individual template (participant info)
Biospecimen template (sample info)
File View template (file info)
Imaging Channel template
Imaging Level 2 template
Link everything together:
Study (GRANT123-IMG-2024)
├── Individual (GRANT123-IND1)
│   └── Biospecimen (GRANT123-IND1-B1)
│       └── Image Files (syn789012)
└── Dataset (syn456789)
Example 2: GeoMx Dataset
A researcher wants to share spatial genomics data:
Upload supporting files first:
Experimental config file
Probe config file
Lab worksheet
ROI coordinate files, if applicable
Complete templates in order:
Individual template (participant info)
Biospecimen template (sample info)
File View template
Imaging Channel template
ROI/segment template
GeoMx Auxiliary files template
GeoMx Level 1 template
GeoMx Level 2 template
GeoMx Level 3 template
GeoMx Imaging template
Example organization:
Project
├── biospecimens
│   └── Biospecimen metadata template
├── individuals
│   └── Individual metadata template
├── imaging_channel
│   └── Imaging channel metadata template
└── [study_id]/
    ├── File View metadata template
    ├── Auxiliary files/
    │   └── Auxiliary files metadata template
    ├── ROI Data/
    │   └── ROI metadata template
    ├── GeoMx Level 1/
    │   └── GeoMx Level 1 metadata template
    ├── GeoMx Level 2/
    │   └── GeoMx Level 2 metadata template
    ├── GeoMx Level 3/
    │   └── GeoMx Level 3 metadata template
    └── GeoMx Imaging/
        └── GeoMx Imaging metadata template
Validation Process Timeline
The validation process typically takes:
Initial review: 1-2 business days
Each revision cycle: 1-2 business days
Final validation: 1-2 business days
Factors that can affect timing:
Number of templates to validate
Complexity of relationships
Number of validation issues
Response time for revisions
Example Templates
Resource metadata example templates:
Need Help?
We're here to support you through this process. Don't hesitate to contact us if you have questions or need guidance at any step.