
Files and Metadata: Packaging and Release


Overview

This guide explains how to prepare and release datasets in Synapse. It covers organizing metadata, setting up access controls, and making data available to the research community. The process ensures data is well-documented, properly controlled, and ready for reuse.

The process involves:

  • Organizing metadata into shareable formats

  • Setting up appropriate access controls

  • Validating all relationships

  • Releasing data for access

Required Access

For MC2 Center Staff:

  • Administrative access to Synapse

  • Access to metadata templates

  • Access to validation tools

  • Governance team contact

Instructions

  1. Mint Dataset Digital Object Identifiers (DOIs)

🔶 MC2 Center Role: Mint a DOI for each Dataset

  • DOIs will be recorded in the DatasetView manifest during the next step

  • When creating the DOI, include at minimum the first author and the primary investigator

  • If other authors are known from a pre-print or from the provided Study metadata, include their names as well
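
The author rules above can be sketched as a small helper that assembles the creator list before a DOI is minted. This is a minimal illustration only: the function name `build_doi_creators` and the ordering (first author, then other known authors, then the primary investigator) are assumptions, not part of the MC2 Center tooling.

```python
def build_doi_creators(first_author, primary_investigator, other_authors=None):
    """Return an ordered, de-duplicated creator list for a DOI record."""
    if not first_author or not primary_investigator:
        raise ValueError("First author and primary investigator are both required")

    creators = []
    seen = set()
    # Assumed ordering: first author, other known authors, then the PI.
    for name in [first_author, *(other_authors or []), primary_investigator]:
        cleaned = name.strip()
        key = cleaned.casefold()  # compare names case-insensitively
        if key and key not in seen:
            seen.add(key)
            creators.append(cleaned)
    return creators
```

Calling the helper without a first author or primary investigator raises an error, which mirrors the minimum requirement stated above.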

  2. Prepare Dataset Annotations

🔶 MC2 Center Role: Prepare Dataset View metadata for contributor review

Once datasets are ready for release, record each Dataset View entry in the Dataset View metadata template linked in your Synapse project. These records are then added to the Cancer Complexity Knowledge Portal database.

🔷 Contributor Role: Review Dataset View metadata

  3. Apply Dataset Annotations

🔶 MC2 Center Role: Run the table_to_annotations.py script to annotate Datasets with Dataset View metadata

CODE
python table_to_annotations.py -t [Dataset Synapse Id] -v [DatasetView metadata table Synapse Id]

  4. Annotate Files with Record Metadata

🔶 MC2 Center Role: Run the table_to_annotations.py script to extract metadata and apply it to the files contained in Datasets

CODE
python table_to_annotations.py -t [Dataset Synapse Id] -f [File View metadata table Synapse Id] -s [Biospecimen metadata table Synapse Id] -i [Individual metadata table Synapse Id] -m [Model metadata table Synapse Id]
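
When the two commands above are run for many Datasets, it can help to assemble them programmatically and catch malformed IDs before anything is executed. The sketch below is one possible wrapper, not part of table_to_annotations.py itself. It uses only the flags shown above (`-t`, `-v`, `-f`, `-s`, `-i`, `-m`); the function names are hypothetical, and the check assumes that Synapse IDs take the form `syn` followed by digits.

```python
import re

# Assumption: a Synapse ID is "syn" followed by digits, e.g. syn123.
SYNAPSE_ID = re.compile(r"syn\d+")


def _checked(synapse_id):
    """Return the ID unchanged, or raise if it does not look like a Synapse ID."""
    if not SYNAPSE_ID.fullmatch(synapse_id):
        raise ValueError(f"Not a Synapse ID: {synapse_id!r}")
    return synapse_id


def dataset_annotation_command(dataset_id, dataset_view_table_id):
    """Build the command that annotates a Dataset with Dataset View metadata."""
    return ["python", "table_to_annotations.py",
            "-t", _checked(dataset_id),
            "-v", _checked(dataset_view_table_id)]


def file_annotation_command(dataset_id, file_view_id, biospecimen_id,
                            individual_id, model_id):
    """Build the command that annotates files with record metadata."""
    return ["python", "table_to_annotations.py",
            "-t", _checked(dataset_id),
            "-f", _checked(file_view_id),
            "-s", _checked(biospecimen_id),
            "-i", _checked(individual_id),
            "-m", _checked(model_id)]
```

Each function returns an argument list that can be passed to `subprocess.run(command, check=True)` from the directory that contains the script.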

  5. Adjust Dataset schemas

    • Remove from Dataset schema:

      • StudyKey

      • Id

      • FileViewId

      • EntityId

      • Component

      • Any other Component_Id attributes that were applied (retain only the Component Keys)

      • Any other redundant fields

    • From automatically applied annotations, only retain the following:

      • Id

      • Name

      • Path

      • currentVersion

      • dataFileSizeBytes

      • dataFileMD5Hex
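
The schema rules above can be expressed as a simple column filter. This is a rough sketch under stated assumptions: it treats metadata columns and automatically applied columns as two separate lists, and it assumes that "Component_Id attributes" are columns whose names end in `_id` (for example, a hypothetical `Biospecimen_id`). "Any other redundant fields" still needs manual review.

```python
# Columns named in the guide for removal from the Dataset schema.
REMOVE_FROM_SCHEMA = {"StudyKey", "Id", "FileViewId", "EntityId", "Component"}

# Automatically applied annotations that the guide says to retain.
RETAIN_AUTOMATIC = {"Id", "Name", "Path", "currentVersion",
                    "dataFileSizeBytes", "dataFileMD5Hex"}


def is_component_id(column):
    """Assumption: Component_Id attributes are columns ending in '_id'."""
    return column.lower().endswith("_id")


def adjust_schema(metadata_columns, automatic_columns):
    """Return the columns to keep, automatic columns first."""
    kept_metadata = [c for c in metadata_columns
                     if c not in REMOVE_FROM_SCHEMA and not is_component_id(c)]
    kept_automatic = [c for c in automatic_columns if c in RETAIN_AUTOMATIC]
    return kept_automatic + kept_metadata
```

Component Keys such as a hypothetical `BiospecimenKey` pass through the filter untouched, which matches the instruction to retain keys only.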

  6. Configure Access Controls

🔶 MC2 Center Role: Set up data access:

  • Review sharing requirements:

    • Check Data Sharing Plan

    • Verify IRB documentation

    • Confirm institutional certification

  • Bind the appropriate access requirement (AR) JSON schema to the Synapse project or incorporate the AR JSON schema into data validation schemas.

This is only required if the data will be released under access requirements.

CODE
Access Levels:
├── Open Access
│   ├── Anyone on the internet can view
│   └── Registered Synapse users can download
└── Access Requirements
    ├── User must accept conditions of use to gain access
    │   to files (Conditional Access)
    └── User must submit an access request or provide
        documentation to gain access to files (Controlled Access)

  • Ensure annotations on Datasets and data storage folders align with the access requirement schema:

    • Tag Datasets and data storage folders with appropriate annotations

    • Configure folder-level restrictions

    • Set file-specific controls if needed

  • Work with Governance to verify that access controls have been accurately applied
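
Before involving Governance, a quick local check can flag Datasets or folders whose annotations do not match what the access requirement expects. The sketch below is a plain-Python stand-in for that comparison, not JSON Schema validation and not the actual AR schema: the `required` mapping, the annotation key `accessType`, and its values are all hypothetical.

```python
def find_annotation_gaps(annotations, required):
    """Compare an entity's annotations with a simplified requirement mapping.

    `required` maps an annotation key to the set of values allowed for it.
    Returns a list of human-readable problems; an empty list means no gaps.
    """
    problems = []
    for key, allowed in required.items():
        if key not in annotations:
            problems.append(f"missing annotation: {key}")
        elif annotations[key] not in allowed:
            problems.append(f"unexpected value for {key}: {annotations[key]!r}")
    return problems


# Hypothetical requirement: every entity declares one of two access types.
required = {"accessType": {"Conditional Access", "Controlled Access"}}
print(find_annotation_gaps({"accessType": "Controlled Access"}, required))
print(find_annotation_gaps({}, required))
```

An empty result for every Dataset and data storage folder is a reasonable precondition for asking Governance to verify the controls.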

  7. Validate Release Package

🔶 MC2 Center Role: Perform final checks:

  • Metadata validation:

    • All required fields present

    • Keys properly linked

    • Relationships valid

    • No missing connections

  • Access control validation:

    • DUO codes properly applied

    • AR schema bound correctly

    • Permissions working as expected

    • Test access paths
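
The "keys properly linked" and "no missing connections" checks above amount to confirming that every key used in one metadata table exists in the table it points to. A minimal sketch follows, assuming each table has been loaded as a list of dictionaries; the key name `BiospecimenKey` and the file names are illustrative only.

```python
def find_missing_links(child_rows, parent_rows, key):
    """Return child rows whose `key` value is absent from the parent table.

    Rows with no value for `key` are also reported, since they have
    no connection at all.
    """
    parent_keys = {row[key] for row in parent_rows if row.get(key)}
    return [row for row in child_rows if row.get(key) not in parent_keys]


# Illustrative data: one valid link, one dangling key, one missing key.
files = [
    {"name": "a.bam", "BiospecimenKey": "B1"},
    {"name": "b.bam", "BiospecimenKey": "B9"},
    {"name": "c.bam"},
]
biospecimens = [{"BiospecimenKey": "B1"}]

for row in find_missing_links(files, biospecimens, "BiospecimenKey"):
    print("unlinked file:", row["name"])
```

The same function can be reused for each pair of related tables by changing the key name.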

  8. Release Data

🔶 MC2 Center Role: Make data available:

  • For Open Access data:

    • Set permissions to public for Datasets and data storage folders

  • For Controlled Access data:

    • Verify AR implementation

    • Test access request process

    • Document approval workflow

    • Set permissions to public for Datasets and data storage folders

  • Final verification:

    • Test all access paths

    • Verify download functionality

    • Check permission inheritance

    • Ensure DatasetView metadata has been integrated into the Cancer Complexity Knowledge Portal staging tables
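
Checking permission inheritance means finding, for each file or folder, the nearest ancestor that carries its own sharing settings. The sketch below models that lookup on a toy hierarchy, assuming the usual model in which an entity without local sharing settings inherits them from its parent. It does not call Synapse; the entity names and the `parents` and `local_acls` structures are hypothetical.

```python
def effective_acl_owner(entity_id, parents, local_acls):
    """Walk up the hierarchy until an entity with its own ACL is found.

    `parents` maps each entity to its parent (None for the project root).
    `local_acls` is the set of entities that define their own permissions.
    """
    current = entity_id
    while current is not None:
        if current in local_acls:
            return current
        current = parents.get(current)
    raise LookupError(f"No ACL found at or above {entity_id}")


# Toy hierarchy: project > folder > file.
parents = {"file": "folder", "folder": "project", "project": None}

# Only the project has sharing settings, so the file inherits from it.
print(effective_acl_owner("file", parents, {"project"}))

# A folder-level restriction takes over for everything beneath the folder.
print(effective_acl_owner("file", parents, {"project", "folder"}))
```

If the reported owner is not the level where the restriction was meant to apply, review the folder hierarchy before release.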

Timeline Expectations

  1. Metadata Organization: 2-3 business days

  • Metadata extraction and entity annotation

  • Relationship verification

  • Schema configuration

  2. Access Setup: 1-2 business days

  • AR configuration

  • Permission testing

  3. Validation: 1-2 business days

  • Metadata checks

  • Access verification

  • Final testing
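
For planning, the phase estimates above can be added up. The short calculation below assumes the three phases run one after another with no overlap, which the guide does not state explicitly.

```python
# Phase estimates from the timeline above, in business days (low, high).
PHASES = {
    "Metadata Organization": (2, 3),
    "Access Setup": (1, 2),
    "Validation": (1, 2),
}


def total_range(phases):
    """Sum the low and high estimates across all phases."""
    low = sum(lo for lo, _ in phases.values())
    high = sum(hi for _, hi in phases.values())
    return low, high


print(total_range(PHASES))  # (4, 7)
```

Under that assumption, a release takes roughly 4 to 7 business days from the start of metadata organization.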

Common Issues and Solutions

  1. Metadata Relationships

  • Issue: Missing key relationships

  • Solution: Use validation scripts to identify gaps

  • Prevention: Follow key naming conventions

  2. Access Controls

  • Issue: Incorrect permission inheritance

  • Solution: Check folder hierarchy permissions

  • Prevention: Test access at each level

  3. Schema Display

  • Issue: Hidden required fields

  • Solution: Review schema configuration

  • Prevention: Use schema templates

Support

We're here to support you through this process. Don't hesitate to contact us if you have questions or need guidance at any step.

Resources
