Overview
This guide explains how to prepare and release datasets in Synapse. It covers organizing metadata, setting up access controls, and making data available to the research community. The process ensures data is well-documented, properly controlled, and ready for reuse.
The process involves:
- Organizing metadata into shareable formats
- Setting up appropriate access controls
- Validating all relationships
- Releasing data for access
Required Access
For MC2 Center Staff:
- Administrative access to Synapse
- Access to metadata templates
- Access to validation tools
- Governance team contact
Instructions
- Mint Dataset Digital Object Identifiers (DOIs)
  🔶 MC2 Center Role: Prepare Dataset View metadata for contributor review
  - DOIs will be recorded in the DatasetView manifest during the next step
  - When creating the DOI, include at minimum the first author and the principal investigator
  - If other authors are known (for example, from a preprint or the provided Study metadata), include their names as well
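A minimal Python sketch of the author rule above, assuming names are plain strings. `build_doi_creators` is a hypothetical helper, not part of Synapse or the MC2 scripts. It lists the principal investigator last (a common senior-author convention, assumed here) and drops duplicates, so a PI who is also the first author appears only once.

```python
from typing import Iterable, List


def build_doi_creators(
    first_author: str,
    principal_investigator: str,
    other_authors: Iterable[str] = (),
) -> List[str]:
    """Return an ordered, de-duplicated list of DOI creator names (hypothetical helper).

    Assumed order: first author, other known authors, then the principal
    investigator. Names are compared case-insensitively; blanks are skipped.
    """
    creators: List[str] = []
    seen = set()
    for name in [first_author, *other_authors, principal_investigator]:
        cleaned = name.strip()
        if cleaned and cleaned.casefold() not in seen:
            seen.add(cleaned.casefold())
            creators.append(cleaned)
    return creators


if __name__ == "__main__":
    # Other authors might come from a preprint or the Study metadata.
    print(build_doi_creators("Ada Lovelace", "Grace Hopper", ["Alan Turing"]))
    # ['Ada Lovelace', 'Alan Turing', 'Grace Hopper']
```

Adjust the ordering if your study uses a different author convention.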
- Prepare Dataset Annotations
  🔶 MC2 Center Role: Prepare Dataset View metadata for contributor review
  Once datasets are ready for release, their Dataset View records are entered in the Dataset View metadata template linked in your Synapse project and added to the Cancer Complexity Knowledge Portal database.
  🔷 Contributor Role: Review Dataset View metadata
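Before handing Dataset View records to contributors for review, it can help to flag rows with blank required fields. This Python sketch is hypothetical: the column names in `REQUIRED_COLUMNS` are placeholders for the required columns of your template, and `load_dataset_view_rows` assumes the synapseclient package (`Synapse.login`, `Synapse.tableQuery(...).asDataFrame()`) with credentials already configured.

```python
import math
from typing import Dict, List, Mapping, Sequence

# Hypothetical column names: replace with the required columns of your
# Dataset View metadata template.
REQUIRED_COLUMNS = ("DatasetName", "DatasetDescription", "DatasetDoi")


def is_blank(value: object) -> bool:
    """Treat None, NaN (how pandas shows empty cells) and whitespace as blank."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return str(value).strip() == ""


def find_incomplete_rows(
    rows: Sequence[Mapping[str, object]],
    required: Sequence[str] = REQUIRED_COLUMNS,
) -> Dict[int, List[str]]:
    """Map each row index to the required columns that are missing or blank."""
    problems: Dict[int, List[str]] = {}
    for index, row in enumerate(rows):
        missing = [column for column in required if is_blank(row.get(column))]
        if missing:
            problems[index] = missing
    return problems


def load_dataset_view_rows(table_id: str) -> List[Dict[str, object]]:
    """Read the Dataset View metadata table with the Synapse Python client."""
    import synapseclient  # imported here so the checks above need no dependencies

    syn = synapseclient.Synapse()
    syn.login()  # relies on credentials already configured for the client
    query = syn.tableQuery(f"SELECT * FROM {table_id}")
    return query.asDataFrame().to_dict(orient="records")
```

Usage would look like `find_incomplete_rows(load_dataset_view_rows("<DatasetView table Synapse Id>"))`, where the ID is a placeholder.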
- Apply Dataset Annotations
  🔶 MC2 Center Role: Run the table_to_annotations.py script to annotate Datasets with Dataset View metadata:
  python table_to_annotations.py -t [Dataset Synapse Id] -v [DatasetView metadata table Synapse Id]
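The internals of table_to_annotations.py are not described here, so the following is only a hypothetical Python sketch of what the `-v` step amounts to: copying one Dataset View row onto the Dataset as annotations. It assumes the synapseclient package, whose `Synapse.get_annotations` and `Synapse.set_annotations` methods read and write an entity's annotations; `row_to_annotations` and `annotate_dataset` are illustrative names, not functions from the MC2 script.

```python
import math
from typing import Dict, Iterable, Mapping

# Bookkeeping columns a Synapse table query can return alongside the data.
ROW_METADATA = ("ROW_ID", "ROW_VERSION", "ROW_ETAG")


def row_to_annotations(
    row: Mapping[str, object],
    exclude: Iterable[str] = ROW_METADATA,
) -> Dict[str, object]:
    """Turn one Dataset View row into annotations, skipping empty cells.

    Skipping blanks keeps an empty cell from overwriting an existing annotation.
    """
    skip = set(exclude)
    annotations: Dict[str, object] = {}
    for key, value in row.items():
        if key in skip or value is None:
            continue
        if isinstance(value, float) and math.isnan(value):
            continue
        if isinstance(value, str) and not value.strip():
            continue
        annotations[key] = value
    return annotations


def annotate_dataset(syn, dataset_id: str, row: Mapping[str, object]) -> None:
    """Merge the row into the Dataset's current annotations and save them.

    `syn` is a logged-in synapseclient.Synapse instance.
    """
    annotations = syn.get_annotations(dataset_id)
    annotations.update(row_to_annotations(row))
    syn.set_annotations(annotations)
```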
- Annotate Files with Record Metadata
  🔶 MC2 Center Role: Run the table_to_annotations.py script to extract metadata and apply it to the files contained in Datasets:
  python table_to_annotations.py -t [Dataset Synapse Id] -f [File View metadata table Synapse Id] -s [Biospecimen metadata table Synapse Id] -i [Individual metadata table Synapse Id] -m [Model metadata table Synapse Id]
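As a rough sketch of the record-level step (hypothetical, not the script's actual logic), each file's metadata can be joined to the biospecimen, individual, and model records it references through Key columns. The column names `BiospecimenKey`, `IndividualKey`, and `ModelKey`, and the table labels, are assumptions; substitute the Keys used in your manifests.

```python
from typing import Dict, Mapping, Sequence

Record = Mapping[str, object]

# Hypothetical Key column -> metadata table it points into. Order matters:
# a biospecimen record can supply the IndividualKey used by the next lookup.
LINKS = {
    "BiospecimenKey": "biospecimen",
    "IndividualKey": "individual",
    "ModelKey": "model",
}


def index_by_key(rows: Sequence[Record], key: str) -> Dict[object, Record]:
    """Index metadata records by their Key column, ignoring rows without one."""
    return {row[key]: row for row in rows if row.get(key) is not None}


def merge_file_metadata(
    file_row: Record,
    tables: Mapping[str, Mapping[object, Record]],
) -> Dict[str, object]:
    """Return the file's metadata plus the fields of every record it links to."""
    merged: Dict[str, object] = dict(file_row)
    for key_column, table_name in LINKS.items():
        key_value = merged.get(key_column)
        if key_value is None:
            continue  # this file is not linked to that component
        linked = tables.get(table_name, {}).get(key_value)
        if linked is None:
            raise KeyError(f"{key_column}={key_value!r} not found in {table_name} table")
        for field, value in linked.items():
            merged.setdefault(field, value)  # values already on the file win
    return merged
```

Raising on an unmatched Key, rather than silently skipping it, surfaces broken relationships before any annotations are written.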
- Adjust Dataset schemas
  - Remove from the Dataset schema:
    - StudyKey
    - Id
    - FileViewId
    - EntityId
    - Component
    - Any other Component_Id attributes that were applied (retain Component Keys only)
    - Any other redundant fields
  - From the automatically applied annotations, retain only the following:
    - Id
    - Name
    - Path
    - currentVersion
    - dataFileSizeBytes
    - dataFileMD5Hex
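The two schema rules above can be expressed as a small planning function. This Python sketch is hypothetical: it assumes you have already listed the Dataset's columns and can tell manifest-derived columns apart from the ones Synapse applied automatically (treating the two groups separately is how it reconciles `Id` appearing in both lists). The regular expression for Component_Id attributes is a guess at the naming pattern, and the result is a plan to review, not an automatic schema change.

```python
import re
from typing import List, Sequence, Tuple

# Manifest-derived columns this guide says to drop.
DROP_COLUMNS = {"StudyKey", "Id", "FileViewId", "EntityId", "Component"}

# Automatically applied columns this guide says to keep (compared case-insensitively).
KEEP_DEFAULT_COLUMNS = {
    name.lower()
    for name in ("Id", "Name", "Path", "currentVersion", "dataFileSizeBytes", "dataFileMD5Hex")
}

# Assumed pattern for per-component ID attributes such as "Biospecimen_id" or
# "BiospecimenId"; Key columns such as "BiospecimenKey" do not match.
COMPONENT_ID = re.compile(r"^[A-Za-z]+(_[Ii]d|Id)$")


def plan_dataset_schema(
    custom_columns: Sequence[str],
    default_columns: Sequence[str],
) -> Tuple[List[str], List[str]]:
    """Split Dataset columns into (keep, remove) following the rules above.

    custom_columns: columns that came from manifest metadata.
    default_columns: columns Synapse added automatically.
    "Other redundant fields" still need a manual review.
    """
    keep: List[str] = []
    remove: List[str] = []
    for column in default_columns:
        (keep if column.lower() in KEEP_DEFAULT_COLUMNS else remove).append(column)
    for column in custom_columns:
        if column in DROP_COLUMNS or COMPONENT_ID.match(column):
            remove.append(column)
        else:
            keep.append(column)
    return keep, remove
```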
- Configure Access Controls
  🔶 MC2 Center Role: Set up data access:
  - Review sharing requirements:
    - Check the Data Sharing Plan
    - Verify IRB documentation
    - Confirm institutional certification
  - Bind the appropriate access requirement (AR) JSON schema to the Synapse project, or incorporate the AR JSON schema into the data validation schemas.
    This step is required only if the data will be released under access requirements.
    Access Levels:
    ├── Open Access
    │   ├── Anyone on the internet can view
    │   └── Registered Synapse users can download
    └── Access Requirements
        ├── User must accept conditions of use to gain access
        │   to files (Conditional Access)
        └── User must submit an access request or provide
            documentation to gain access to files (Controlled Access)
  - Ensure annotations on Datasets and data storage folders align with the access requirement schema:
    - Tag Datasets and data storage folders with the appropriate annotations
    - Configure folder-level restrictions
    - Set file-specific controls if needed
  - Work with the Governance team to verify that access controls have been applied correctly
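One way to keep the access level and the Dataset and folder annotations consistent is to derive the level from the sharing requirements and then flag entities whose annotation disagrees. This Python sketch follows the access levels tree above; the `accessLevel` annotation key and its string values are hypothetical and should be replaced with whatever your AR JSON schema defines.

```python
from enum import Enum
from typing import List, Mapping


class AccessLevel(str, Enum):
    OPEN = "Open Access"
    CONDITIONAL = "Conditional Access"
    CONTROLLED = "Controlled Access"


def classify_access(requires_acceptance: bool, requires_request: bool) -> AccessLevel:
    """Pick an access level from the sharing requirements (mirrors the tree above)."""
    if requires_request:
        return AccessLevel.CONTROLLED  # access request or documentation needed
    if requires_acceptance:
        return AccessLevel.CONDITIONAL  # user accepts conditions of use
    return AccessLevel.OPEN


def misaligned_entities(
    annotations_by_entity: Mapping[str, Mapping[str, object]],
    expected: AccessLevel,
    key: str = "accessLevel",  # hypothetical annotation key
) -> List[str]:
    """Return IDs of Datasets/folders whose access annotation disagrees with `expected`."""
    misaligned = []
    for entity_id, annotations in annotations_by_entity.items():
        value = annotations.get(key)
        if isinstance(value, list) and len(value) == 1:
            value = value[0]  # annotation values may come back as one-item lists
        if value != expected.value:
            misaligned.append(entity_id)
    return sorted(misaligned)
```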
- Validate Release Package
  🔶 MC2 Center Role: Perform final checks:
  - Metadata validation:
    - All required fields present
    - Keys properly linked
    - Relationships valid
    - No missing connections
  - Access control validation:
    - DUO (Data Use Ontology) codes properly applied
    - AR schema bound correctly
    - Permissions working as expected
    - Test access paths
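The "Keys properly linked" and "No missing connections" checks can be approximated by comparing Key values across metadata tables. This Python sketch is hypothetical (the `IndividualKey` column in the usage is an assumed example): `find_dangling_keys` reports child records whose Key is missing or has no matching parent, and `find_unreferenced` lists parent records that nothing points to, which may or may not be an error depending on the study.

```python
from typing import List, Mapping, Sequence, Tuple

Record = Mapping[str, object]


def find_dangling_keys(
    child_rows: Sequence[Record],
    key_column: str,
    parent_rows: Sequence[Record],
) -> List[Tuple[int, object]]:
    """Return (row index, key) for child rows whose key is missing or unmatched."""
    parent_keys = {row[key_column] for row in parent_rows if row.get(key_column) is not None}
    return [
        (index, row.get(key_column))
        for index, row in enumerate(child_rows)
        if row.get(key_column) not in parent_keys
    ]


def find_unreferenced(
    parent_rows: Sequence[Record],
    key_column: str,
    child_rows: Sequence[Record],
) -> List[object]:
    """Return parent keys that no child row refers to."""
    used = {row.get(key_column) for row in child_rows}
    return [
        row[key_column]
        for row in parent_rows
        if row.get(key_column) is not None and row[key_column] not in used
    ]


if __name__ == "__main__":
    individuals = [{"IndividualKey": "I1"}, {"IndividualKey": "I2"}]
    biospecimens = [{"IndividualKey": "I1"}, {"IndividualKey": "I3"}, {}]
    print(find_dangling_keys(biospecimens, "IndividualKey", individuals))  # [(1, 'I3'), (2, None)]
    print(find_unreferenced(individuals, "IndividualKey", biospecimens))  # ['I2']
```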
- Release Data
  🔶 MC2 Center Role: Make data available:
  - For Open Access data:
    - Set permissions to public for Datasets and data storage folders
  - For Controlled Access data:
    - Verify the AR implementation
    - Test the access request process
    - Document the approval workflow
    - Set permissions to public for Datasets and data storage folders
  - Final verification:
    - Test all access paths
    - Verify download functionality
    - Check permission inheritance
    - Ensure DatasetView metadata has been integrated into the Cancer Complexity Knowledge Portal staging tables
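In Synapse, making an entity public in the sense of the access levels tree means granting view access to the PUBLIC group (anyone on the web) and download access to AUTHENTICATED_USERS (registered users). A minimal Python sketch with the synapseclient package, assuming its `Synapse.setPermissions(entity, principalId, accessType)` method and the well-known principal IDs 273949 (PUBLIC) and 273948 (AUTHENTICATED_USERS); verify both against your client version. The plan is built separately so it can be reviewed before anything changes.

```python
from typing import List, Sequence, Tuple

PUBLIC = 273949               # Synapse group: anyone on the web (anonymous)
AUTHENTICATED_USERS = 273948  # Synapse group: any registered, logged-in user

Grant = Tuple[str, int, List[str]]


def public_release_plan(entity_ids: Sequence[str]) -> List[Grant]:
    """List the grants that open each Dataset or data storage folder:
    anyone can view, registered users can also download."""
    plan: List[Grant] = []
    for entity_id in entity_ids:
        plan.append((entity_id, PUBLIC, ["READ"]))
        plan.append((entity_id, AUTHENTICATED_USERS, ["READ", "DOWNLOAD"]))
    return plan


def apply_release_plan(syn, plan: Sequence[Grant]) -> None:
    """Apply the grants; `syn` is a logged-in synapseclient.Synapse instance."""
    for entity_id, principal_id, access_types in plan:
        syn.setPermissions(entity_id, principalId=principal_id, accessType=access_types)
```

Setting permissions on a Dataset or folder that currently inherits its ACL gives it a local ACL, so it no longer follows the project's sharing settings afterwards; this is one reason to check permission inheritance during final verification.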
Timeline Expectations
- Metadata Organization: 2-3 business days
  - Metadata extraction and entity annotation
  - Relationship verification
  - Schema configuration
- Access Setup: 1-2 business days
  - AR configuration
  - Permission testing
- Validation: 1-2 business days
  - Metadata checks
  - Access verification
  - Final testing
Common Issues and Solutions
- Metadata Relationships
  - Issue: Missing key relationships
  - Solution: Use validation scripts to identify gaps
  - Prevention: Follow key naming conventions
- Access Controls
  - Issue: Incorrect permission inheritance
  - Solution: Check folder hierarchy permissions
  - Prevention: Test access at each level
- Schema Display
  - Issue: Hidden required fields
  - Solution: Review schema configuration
  - Prevention: Use schema templates
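For the permission inheritance issue, it helps to remember how Synapse resolves access: an entity without its own ACL inherits from its nearest ancestor that has one (its benefactor), and a project always has one. This offline Python sketch models that rule so you can compare each entity's actual benefactor with the one you intended; the parent map and the set of entities with local ACLs are hypothetical inputs you would gather from Synapse yourself.

```python
from typing import Dict, Mapping, Optional, Set


def find_benefactor(
    entity_id: str,
    parents: Mapping[str, Optional[str]],
    has_local_acl: Set[str],
) -> str:
    """Return the nearest entity, starting from `entity_id` itself, that has its own ACL."""
    current: Optional[str] = entity_id
    visited: Set[str] = set()
    while current is not None and current not in visited:
        if current in has_local_acl:
            return current
        visited.add(current)
        current = parents.get(current)
    raise ValueError(f"No ACL found above {entity_id}; check the parent map")


def unexpected_benefactors(
    expected: Mapping[str, str],
    parents: Mapping[str, Optional[str]],
    has_local_acl: Set[str],
) -> Dict[str, str]:
    """Return {entity: actual benefactor} wherever it differs from `expected`."""
    mismatches: Dict[str, str] = {}
    for entity_id, wanted in expected.items():
        actual = find_benefactor(entity_id, parents, has_local_acl)
        if actual != wanted:
            mismatches[entity_id] = actual
    return mismatches
```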
Support
We're here to support you through this process. Don't hesitate to contact us if you have questions or need guidance at any step.