Files and Metadata: Packaging and Release
Overview
This guide explains how to prepare and release datasets in Synapse. It covers organizing metadata, setting up access controls, and making data available to the research community. The process ensures data is well-documented, properly controlled, and ready for reuse.
The process involves:
Organizing metadata into shareable formats
Setting up appropriate access controls
Validating all relationships
Releasing data for access
Required Access
For MC2 Center Staff:
Administrative access to Synapse
Access to metadata templates
Access to validation tools
Governance team contact
Instructions
Mint Dataset Digital Object Identifiers (DOIs)
🔶 MC2 Center Role: Mint a DOI for each Dataset
DOIs will be recorded in the DatasetView manifest during the next step
When creating the DOI, include at minimum the first author and principal investigator
If other authors are known from a preprint or the provided Study metadata, include their names as well
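As an illustration of the author rule above, here is a minimal sketch of how a creator list for a DOI might be assembled. The helper name build_doi_creators, the example names, and the ordering (first author, then other known authors, then principal investigator) are assumptions made for this example and are not part of any Synapse tooling.

```python
def build_doi_creators(first_author, principal_investigator, other_authors=None):
    """Return an ordered, de-duplicated list of creator names for a DOI record.

    Hypothetical helper: the ordering used here is an assumption, not a rule
    stated in this guide.
    """
    if not first_author.strip() or not principal_investigator.strip():
        raise ValueError("first author and principal investigator are both required")

    creators = []
    seen = set()
    # First author leads, other known authors follow, the PI closes the list.
    for name in [first_author, *(other_authors or []), principal_investigator]:
        cleaned = " ".join(name.split())  # collapse stray whitespace
        key = cleaned.casefold()          # compare names case-insensitively
        if cleaned and key not in seen:
            seen.add(key)
            creators.append(cleaned)
    return creators


# Placeholder names, for illustration only.
creators = build_doi_creators(
    first_author="Ada Example",
    principal_investigator="Grace Sample",
    other_authors=["Lin  Placeholder", "ada example"],
)
print(creators)  # ['Ada Example', 'Lin Placeholder', 'Grace Sample']
```

The duplicate "ada example" entry is dropped because names are compared case-insensitively after whitespace is normalized.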
Prepare Dataset Annotations
🔶 MC2 Center Role: Prepare Dataset View metadata for contributor review
Once datasets are ready for release, Dataset View records are entered in the Dataset View metadata template linked in your Synapse project and added to the Cancer Complexity Knowledge Portal database.
🔷 Contributor Role: Review Dataset View metadata
Apply Dataset Annotations
🔶 MC2 Center Role: Run the table_to_annotations.py script to annotate Datasets with Dataset View metadata
python table_to_annotations.py -t [Dataset Synapse Id] -v [DatasetView metadata table Synapse Id]
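The internals of table_to_annotations.py are not shown in this guide. As a rough sketch of the general idea only, the snippet below turns one row of a metadata table into an annotations dictionary. The function name row_to_annotations, the column names, and the comma-splitting rule are hypothetical. Values are stored as lists on the assumption that an annotation key may hold several values.

```python
def row_to_annotations(row, list_delimiter=","):
    """Convert one metadata row (column -> cell value) into an annotations dict.

    Hypothetical sketch; this is not the logic of table_to_annotations.py.
    """
    annotations = {}
    for column, value in row.items():
        if value is None:
            continue  # no value recorded for this column
        text = str(value).strip()
        if not text:
            continue  # skip empty cells rather than writing blank annotations
        # Split delimited cells into individual values, dropping empty pieces.
        parts = [part.strip() for part in text.split(list_delimiter) if part.strip()]
        annotations[column] = parts
    return annotations


# Column names below are placeholders, not the real DatasetView schema.
row = {
    "DatasetName": "Example dataset",
    "DatasetAssay": "RNA-seq, ATAC-seq",
    "DatasetDoi": "",
    "DatasetPubmedId": None,
}
print(row_to_annotations(row))
# {'DatasetName': ['Example dataset'], 'DatasetAssay': ['RNA-seq', 'ATAC-seq']}
```

Splitting on commas is a simplification: free-text fields that legitimately contain commas would need to be excluded from splitting.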
Annotate Files with Record Metadata
🔶 MC2 Center Role: Run the table_to_annotations.py script to extract metadata and apply it to the files contained in Datasets
python table_to_annotations.py -t [Dataset Synapse Id] -f [File View metadata table Synapse Id] -s [Biospecimen metadata table Synapse Id] -i [Individual metadata table Synapse Id] -m [Model metadata table Synapse Id]
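The command above takes several metadata tables at once, which suggests that file records are combined with the Biospecimen, Individual, and Model records they point to. The following is a minimal sketch of that kind of key-based merge, not the script's actual behavior. The key column names (BiospecimenKey, IndividualKey), the other column names, and both helper functions are assumptions for illustration.

```python
def index_by(rows, key):
    """Index a list of row dicts by the value of one key column."""
    return {row[key]: row for row in rows if row.get(key)}


def annotate_file_record(file_row, lookups):
    """Merge linked component records into one flat dict for a file.

    `lookups` maps a key column in the file row to an index of linked records.
    """
    merged = dict(file_row)
    for key_column, index in lookups.items():
        linked = index.get(file_row.get(key_column))
        if linked is None:
            continue  # unresolved key: keep the file row unchanged
        for column, value in linked.items():
            merged.setdefault(column, value)  # file-level values take precedence
    return merged


# Placeholder records, for illustration only.
biospecimens = [{"BiospecimenKey": "B1", "IndividualKey": "I1", "TissueType": "tumor"}]
individuals = [{"IndividualKey": "I1", "Species": "Homo sapiens"}]
file_row = {"FileName": "sample.fastq.gz", "BiospecimenKey": "B1", "IndividualKey": "I1"}

lookups = {
    "BiospecimenKey": index_by(biospecimens, "BiospecimenKey"),
    "IndividualKey": index_by(individuals, "IndividualKey"),
}
result = annotate_file_record(file_row, lookups)
print(result)
```

Using setdefault means a value recorded directly on the file is never overwritten by a value from a linked record. That precedence is a design choice of this sketch.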
Adjust Dataset schemas
Remove from Dataset schema:
StudyKey
Id
FileViewId
EntityId
Component
Any other Component_Id attributes that were applied (retain Component Keys only)
Any other redundant fields
From automatically applied annotations, only retain the following:
Id
Name
Path
currentVersion
dataFileSizeBytes
dataFileMD5Hex
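The two lists above can be applied mechanically to a list of column names. The sketch below assumes the schema can be split into metadata columns and automatically applied columns, and that Component_Id attributes can be recognized by an "_id" suffix. Both are assumptions of this example, as are the function name prune_schema and the sample column names. The "other redundant fields" item still requires manual review.

```python
# Columns to drop from the metadata portion of the Dataset schema (from this guide).
REMOVE_FROM_METADATA = {"StudyKey", "Id", "FileViewId", "EntityId", "Component"}

# Automatically applied annotations to keep (from this guide).
RETAIN_FROM_DEFAULTS = {
    "Id", "Name", "Path", "currentVersion", "dataFileSizeBytes", "dataFileMD5Hex",
}


def prune_schema(metadata_columns, default_columns):
    """Return the column names to keep, defaults first, preserving input order."""
    kept_metadata = [
        column
        for column in metadata_columns
        if column not in REMOVE_FROM_METADATA
        # Assumption: Component_Id attributes end in "_id" (case-insensitive).
        and not column.lower().endswith("_id")
    ]
    kept_defaults = [c for c in default_columns if c in RETAIN_FROM_DEFAULTS]
    return kept_defaults + kept_metadata


# Placeholder column names, for illustration only.
metadata_columns = ["Component", "StudyKey", "DatasetView_id", "DatasetKey", "FileFormat"]
default_columns = [
    "Id", "Name", "Path", "createdOn",
    "currentVersion", "dataFileSizeBytes", "dataFileMD5Hex",
]
print(prune_schema(metadata_columns, default_columns))
# ['Id', 'Name', 'Path', 'currentVersion', 'dataFileSizeBytes', 'dataFileMD5Hex',
#  'DatasetKey', 'FileFormat']
```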
Configure Access Controls
🔶 MC2 Center Role: Set up data access:
Review sharing requirements:
Check Data Sharing Plan
Verify IRB documentation
Confirm institutional certification
Bind the appropriate access requirement (AR) JSON schema to the Synapse project or incorporate the AR JSON schema into data validation schemas.
This is only required if the data will be released under access requirements.
Access Levels:
├── Open Access
│   ├── Anyone on the internet can view
│   └── Registered Synapse users can download
└── Access Requirements
    ├── User must accept conditions of use to gain access
    │   to files (Conditional Access)
    └── User must submit an access request or provide
        documentation to gain access to files (Controlled Access)
Ensure annotations on Datasets and data storage folders align with the access requirement schema:
Tag Datasets and data storage folders with appropriate annotations
Configure folder-level restrictions
Set file-specific controls if needed
Work with Governance to verify that access controls have been accurately applied
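The access levels listed above can be summarized as a small decision function, which may help when writing test cases for a release. This is a simplified local model and makes no calls to Synapse. The names AccessLevel and can_download are hypothetical. The sketch also assumes that downloads under access requirements need a registered account, and that a Controlled Access request must be approved before files can be downloaded.

```python
from enum import Enum


class AccessLevel(Enum):
    OPEN = "Open Access"
    CONDITIONAL = "Conditional Access"
    CONTROLLED = "Controlled Access"


def can_download(level, registered=False, accepted_conditions=False,
                 request_approved=False):
    """Return True if a user in the given state should be able to download files."""
    if not registered:
        # Assumption: every download requires a registered Synapse account.
        return False
    if level is AccessLevel.OPEN:
        return True
    if level is AccessLevel.CONDITIONAL:
        return accepted_conditions
    if level is AccessLevel.CONTROLLED:
        # Assumption: the submitted request or documentation has been approved.
        return request_approved
    raise ValueError(f"unknown access level: {level!r}")


print(can_download(AccessLevel.OPEN))                         # False (anonymous)
print(can_download(AccessLevel.OPEN, registered=True))        # True
print(can_download(AccessLevel.CONDITIONAL, registered=True)) # False
print(can_download(AccessLevel.CONTROLLED, registered=True,
                   request_approved=True))                    # True
```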
Validate Release Package
🔶 MC2 Center Role: Perform final checks:
Metadata validation:
All required fields present
Keys properly linked
Relationships valid
No missing connections
Access control validation:
DUO codes properly applied
AR schema bound correctly
Permissions working as expected
Test access paths
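The metadata checks in this list (required fields present, keys properly linked) could be scripted along the following lines. This is a sketch under stated assumptions, not one of the validation tools referenced in this guide. The function name validate_records, the component names, and the key columns in the example are hypothetical.

```python
def validate_records(records, required_fields, key_links):
    """Return a list of human-readable problems found in the metadata.

    records:         component name -> list of row dicts
    required_fields: component name -> list of required column names
    key_links:       list of (child component, key column, parent component)
    """
    problems = []

    # Check 1: every required field has a non-empty value.
    for component, fields in required_fields.items():
        for i, row in enumerate(records.get(component, [])):
            for field in fields:
                value = row.get(field)
                if value is None or not str(value).strip():
                    problems.append(f"{component}[{i}]: missing required field '{field}'")

    # Check 2: every key in a child record resolves to a parent record.
    for child, key_column, parent in key_links:
        parent_keys = {row.get(key_column) for row in records.get(parent, [])}
        for i, row in enumerate(records.get(child, [])):
            value = row.get(key_column)
            if value and value not in parent_keys:
                problems.append(f"{child}[{i}]: {key_column} '{value}' not found in {parent}")

    return problems


# Placeholder records, for illustration only.
records = {
    "Individual": [{"IndividualKey": "I1"}],
    "Biospecimen": [
        {"BiospecimenKey": "B1", "IndividualKey": "I1"},
        {"BiospecimenKey": "B2", "IndividualKey": "I9"},  # dangling key
        {"BiospecimenKey": "", "IndividualKey": "I1"},    # missing required value
    ],
}
problems = validate_records(
    records,
    required_fields={"Biospecimen": ["BiospecimenKey", "IndividualKey"]},
    key_links=[("Biospecimen", "IndividualKey", "Individual")],
)
for problem in problems:
    print(problem)
```

An empty list means both checks passed. The access control checks in this section still need to be tested against Synapse itself.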
Release Data
🔶 MC2 Center Role: Make data available:
For Open Access data:
Set permissions to public for Datasets and data storage folders
For Controlled Access data:
Verify AR implementation
Test access request process
Document approval workflow
Set permissions to public for Datasets and data storage folders
Final verification:
Test all access paths
Verify download functionality
Check permission inheritance
Ensure DatasetView metadata has been integrated into the Cancer Complexity Knowledge Portal staging tables
Timeline Expectations
Metadata Organization: 2-3 business days
Metadata extraction and entity annotation
Relationship verification
Schema configuration
Access Setup: 1-2 business days
AR configuration
Permission testing
Validation: 1-2 business days
Metadata checks
Access verification
Final testing
Common Issues and Solutions
Metadata Relationships
Issue: Missing key relationships
Solution: Use validation scripts to identify gaps
Prevention: Follow key naming conventions
Access Controls
Issue: Incorrect permission inheritance
Solution: Check folder hierarchy permissions
Prevention: Test access at each level
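To reason about inheritance problems before testing in Synapse, the folder hierarchy can be modeled locally. In Synapse, an entity without its own sharing settings inherits them from an ancestor, referred to as its benefactor. The sketch below finds, for each entity, the nearest ancestor with local settings and reports entities that do not inherit from the project. The Synapse IDs, the dictionary layout, and both function names are hypothetical, and no Synapse API is called.

```python
def find_benefactor(entity_id, parents, local_acls):
    """Walk up the hierarchy to the nearest entity that defines its own ACL."""
    current = entity_id
    seen = set()
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        if current in local_acls:
            return current
        current = parents.get(current)  # None once the root has been passed
    return None


def find_overrides(parents, local_acls, project_id):
    """List entities whose permissions do not come from the project."""
    return [
        entity
        for entity in sorted(parents)
        if find_benefactor(entity, parents, local_acls) != project_id
    ]


# Placeholder hierarchy: syn1 is the project, syn2 is a folder with its own
# sharing settings, syn3 sits inside syn2, syn4 sits directly in the project.
parents = {"syn2": "syn1", "syn3": "syn2", "syn4": "syn1"}
local_acls = {"syn1": {"public": "READ"}, "syn2": {"team": "DOWNLOAD"}}

print(find_benefactor("syn3", parents, local_acls))  # syn2
print(find_overrides(parents, local_acls, "syn1"))   # ['syn2', 'syn3']
```

Entities reported by find_overrides are the ones to inspect first when a permission does not behave as expected.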
Schema Display
Issue: Hidden required fields
Solution: Review schema configuration
Prevention: Use schema templates
Support
We're here to support you through this process. Don't hesitate to contact us if you have questions or need guidance at any step.