Part 1: Document Description
|
Citation |
|
---|---|
Title: |
Cryo2StructData : Trained Model and Data Splits (Small Subset) |
Identification Number: |
doi:10.7910/DVN/DTV4JF |
Distributor: |
Harvard Dataverse |
Date of Distribution: |
2023-10-26 |
Version: |
1 |
Bibliographic Citation: |
Giri, Nabin; Wang, Liguo; Cheng, Jianlin, 2023, "Cryo2StructData : Trained Model and Data Splits (Small Subset)", https://doi.org/10.7910/DVN/DTV4JF, Harvard Dataverse, V1 |
Citation |
|
Title: |
Cryo2StructData : Trained Model and Data Splits (Small Subset) |
Subtitle: |
Cryo2StructData: A Large Labeled Cryo-EM Density Map Dataset for AI-based Modeling of Protein Structures |
Identification Number: |
doi:10.7910/DVN/DTV4JF |
Authoring Entity: |
Giri, Nabin (University of Missouri System) |
Wang, Liguo (Brookhaven National Laboratory) |
|
Cheng, Jianlin (University of Missouri System) |
|
Other identifications and acknowledgements: |
Giri, Nabin |
Grant Number: |
R01GM146340 |
Distributor: |
Harvard Dataverse |
Access Authority: |
Giri, Nabin |
Access Authority: |
Cheng, Jianlin |
Depositor: |
Giri, Nabin |
Date of Deposit: |
2023-10-24 |
Holdings Information: |
https://doi.org/10.7910/DVN/DTV4JF |
Study Scope |
|
Keywords: |
Computer and Information Science, Medicine, Health and Life Sciences, Other, cryo-electron microscopy |
Abstract: |
This repository includes the trained transformer-based model for the small subset of the Cryo2StructData dataset, together with the training and validation split files. The split files group density-map EMD-IDs into low-, medium-, and high-resolution categories. The training and validation sets contain 1680 and 187 density maps, respectively, a split ratio of approximately 90:10. |
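As a quick check of the stated ratio, the arithmetic can be sketched as follows. The two counts are taken from the abstract; the variable names are illustrative only.

```python
# Counts as stated in the abstract above.
train_maps = 1680
val_maps = 187

total = train_maps + val_maps
train_pct = 100 * train_maps / total
val_pct = 100 * val_maps / total

print(f"total={total}, train={train_pct:.1f}%, validation={val_pct:.1f}%")
# prints: total=1867, train=90.0%, validation=10.0%
```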
Notes: |
The trained model checkpoints predict amino acid types, secondary structure types, and backbone atom types. During inference, backbone_atom_prediction.ckpt classifies each voxel into one of four classes: the three backbone atoms (Cα, C, and N) or the absence of any backbone atom. Similarly, amino_acid_type_prediction.ckpt classifies each voxel into one of twenty-one classes: the twenty amino acids, or unknown/absent amino acid. Finally, secondary_structure_prediction.ckpt classifies each voxel into one of four classes: the three secondary structure types (coils, α-helices, and β-strands) or the absence of any secondary structure. The inference code that uses these checkpoints is available in the Cryo2StructData GitHub repository: https://github.com/BioinfoMachineLearning/cryo2struct |
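The class layout described in the notes above can be illustrated with a minimal sketch. Only the class counts (4, 21, and 4) come from this record; the index order, the label strings, and the helper `classify_voxel` are assumptions made for illustration and may not match the encoding used by the released checkpoints or the inference code.

```python
# ASSUMPTION: index order and label names are hypothetical; only the class
# counts (4 backbone, 21 amino acid, 4 secondary structure) come from the notes.
BACKBONE_CLASSES = ["none", "CA", "C", "N"]
SECONDARY_STRUCTURE_CLASSES = ["none", "coil", "alpha_helix", "beta_strand"]
AMINO_ACID_CLASSES = ["unknown_or_none"] + [
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
]


def classify_voxel(scores, class_names):
    """Return the label with the highest score for a single voxel."""
    if len(scores) != len(class_names):
        raise ValueError("expected one score per class")
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return class_names[best_index]


# Example with made-up scores for one voxel.
print(classify_voxel([0.1, 0.7, 0.1, 0.1], BACKBONE_CLASSES))
```

In this sketch each voxel receives one score per class and is assigned the highest-scoring label; the actual post-processing is defined by the inference code in the GitHub repository.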
Methodology and Processing |
|
Sources Statement |
|
Documentation and Access to Sources: |
The source code and instructions to use trained models for inference are freely available at https://github.com/BioinfoMachineLearning/cryo2struct |
Data Access |
|
Other Study Description Materials |
|
Related Materials |
|
The source code and instructions to use trained models for inference are freely available at https://github.com/BioinfoMachineLearning/cryo2struct |
|
Label: |
ResolutionBasedSplits.xlsx |
Text: |
Training and validation splits, based on resolution, for the small subset of the Cryo2StructData dataset. |
Notes: |
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet |
Label: |
Trained Models.zip |
Text: |
Trained transformer-based models for predicting amino acid types, secondary structure types, and backbone atom types. |
Notes: |
application/zip |
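After downloading the archive, one way to confirm which checkpoint files it contains is to list its entries with Python's standard `zipfile` module. This is a minimal sketch: the internal folder layout of the archive is not documented in this record, so the function simply filters entries by the `.ckpt` extension, and the local path in the usage comment is hypothetical.

```python
import zipfile


def list_checkpoints(archive_path):
    """Return the sorted names of all .ckpt entries in a ZIP archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return sorted(
            name for name in archive.namelist() if name.endswith(".ckpt")
        )


# Usage (hypothetical local path to the downloaded file):
# for name in list_checkpoints("Trained Models.zip"):
#     print(name)
```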