Cryo2StructData : Trained Model and Data Splits (Small Subset) (doi:10.7910/DVN/DTV4JF)

Document Description

Citation

Title:

Cryo2StructData : Trained Model and Data Splits (Small Subset)

Identification Number:

doi:10.7910/DVN/DTV4JF

Distributor:

Harvard Dataverse

Date of Distribution:

2023-10-26

Version:

1

Bibliographic Citation:

Giri, Nabin; Wang, Liguo; Cheng, Jianlin, 2023, "Cryo2StructData : Trained Model and Data Splits (Small Subset)", https://doi.org/10.7910/DVN/DTV4JF, Harvard Dataverse, V1

Study Description

Citation

Title:

Cryo2StructData : Trained Model and Data Splits (Small Subset)

Subtitle:

Cryo2StructData: A Large Labeled Cryo-EM Density Map Dataset for AI-based Modeling of Protein Structures

Identification Number:

doi:10.7910/DVN/DTV4JF

Authoring Entity:

Giri, Nabin (University of Missouri System)

Wang, Liguo (Brookhaven National Laboratory)

Cheng, Jianlin (University of Missouri System)

Other identifications and acknowledgements:

Giri, Nabin

Grant Number:

R01GM146340

Distributor:

Harvard Dataverse

Access Authority:

Giri, Nabin

Access Authority:

Cheng, Jianlin

Depositor:

Giri, Nabin

Date of Deposit:

2023-10-24

Holdings Information:

https://doi.org/10.7910/DVN/DTV4JF

Study Scope

Keywords:

Computer and Information Science, Medicine, Health and Life Sciences, Other, cryo-electron microscopy

Abstract:

This repository includes the trained transformer-based models for the small subset of the Cryo2StructData dataset, together with the training and validation split files. The split files group density maps, identified by EMD-ID, into low-, medium-, and high-resolution categories. The training and validation sets contain 1680 and 187 density maps, respectively, a split ratio of approximately 90:10.
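As a quick check of the stated split, the two counts from the abstract can be turned into percentages. This is only an arithmetic sketch; the variable names are illustrative and do not come from the dataset files.

```python
# Counts stated in the abstract of this record.
n_train = 1680
n_val = 187

total = n_train + n_val
train_pct = 100 * n_train / total
val_pct = 100 * n_val / total

print(f"total density maps: {total}")
print(f"train: {train_pct:.1f}%  validation: {val_pct:.1f}%")
```

The percentages round to 90.0 and 10.0, which is consistent with the 90:10 ratio given above.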

Notes:

The trained model checkpoints predict amino acid types, secondary structure types, and backbone atom types. During inference, backbone_atom_prediction.ckpt classifies each voxel into one of four classes: the three backbone atoms (Cα, C, and N) and the absence of any backbone atom. Similarly, amino_acid_type_prediction.ckpt classifies each voxel into one of twenty-one classes: the twenty amino acids and one class for an unknown or absent amino acid. Finally, secondary_structure_prediction.ckpt classifies each voxel into one of four classes: the three secondary structure types (coils, α-helices, and β-strands) and the absence of any secondary structure.

The inference code that uses these checkpoints is available in the Cryo2StructData GitHub repository: https://github.com/BioinfoMachineLearning/cryo2struct
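The per-voxel classification described in these notes can be illustrated with a small sketch that maps a vector of class scores to a label. The class counts (4, 21, and 4) come from this record. The index ordering, the label strings, the use of the twenty standard three-letter residue codes, and the helper `predict_class` are all assumptions made for illustration; the actual encoding is defined by the inference code in the GitHub repository.

```python
# Hypothetical label orderings: this record states how many classes each
# checkpoint predicts, but not which index corresponds to which class.
BACKBONE_CLASSES = ["none", "CA", "C", "N"]
SECONDARY_STRUCTURE_CLASSES = ["none", "coil", "alpha-helix", "beta-strand"]
AMINO_ACIDS = [
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
]
AMINO_ACID_CLASSES = ["unknown_or_none"] + AMINO_ACIDS


def predict_class(scores, classes):
    """Return the label whose score is highest (argmax over one voxel)."""
    if len(scores) != len(classes):
        raise ValueError("expected one score per class")
    best = max(range(len(scores)), key=scores.__getitem__)
    return classes[best]


# Made-up scores for a single voxel, one value per backbone class.
voxel_scores = [0.05, 0.80, 0.10, 0.05]
label = predict_class(voxel_scores, BACKBONE_CLASSES)
print(label)
```

The same helper applies to the other two checkpoints by passing a 21-element or 4-element score vector with the matching class list.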

Methodology and Processing

Sources Statement

Documentation and Access to Sources:

The source code and instructions for using the trained models for inference are freely available at https://github.com/BioinfoMachineLearning/cryo2struct

Data Access

Other Study Description Materials

Related Materials

The source code and instructions for using the trained models for inference are freely available at https://github.com/BioinfoMachineLearning/cryo2struct

Other Study-Related Materials

Label:

ResolutionBasedSplits.xlsx

Text:

Training and validation splits, based on resolution, for the small subset of the Cryo2StructData dataset.

Notes:

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

Other Study-Related Materials

Label:

Trained Models.zip

Text:

Trained transformer-based models for predicting amino acid types, secondary structure types, and backbone atom types.

Notes:

application/zip
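After downloading Trained Models.zip, one way to confirm that the three checkpoints named in the Notes are present is to list the archive members ending in .ckpt. The sketch below uses only the Python standard library. The helper `list_checkpoints` is hypothetical, and because the internal layout of the real archive is not described in this record, the demonstration runs on a stand-in archive built in memory with placeholder contents.

```python
import io
import zipfile


def list_checkpoints(zip_source):
    """Return the sorted names of .ckpt members in a zip archive.

    zip_source may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return sorted(n for n in archive.namelist() if n.endswith(".ckpt"))


# Stand-in archive so the sketch runs without the real download.
# The flat layout and the extra text file are assumptions for illustration.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    for name in (
        "backbone_atom_prediction.ckpt",
        "amino_acid_type_prediction.ckpt",
        "secondary_structure_prediction.ckpt",
    ):
        archive.writestr(name, b"placeholder")
    archive.writestr("placeholder.txt", b"not a checkpoint")
demo.seek(0)

found = list_checkpoints(demo)
print(found)
```

With the real file, the call would be `list_checkpoints("Trained Models.zip")`; members stored inside a subfolder would appear with their folder prefix.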