Recommendations

Note: Clicking any link within the readthedocs site will not open a new web browser tab. If you want to keep your docs open, either middle-click or right-click and choose open in new tab for the links you would like to follow.


1. About this Document

This document highlights common recommendations for using the Collection 3165 data.

2. The BIDS Participants Files and Matched Groups

Demographic and socioeconomic variables relating to the ABCD participants included in Collection 3165 can be found in the participants.tsv spreadsheet. A data dictionary further explaining each variable is also included. Both are available for download on the main NDA Collection 3165 page. A high-level overview of these variables is given below.

  1. participant_id: NDA unique pGUID, starting with sub-
  2. session_id: Participant's session ID (all data within this first release are ses-baselineYear1Arm1)
  3. collection_3165: Whether the subject is present in the data uploaded to NDA Collection 3165
  4. site: ABCD site location
  5. scanner_manufacturer: GE, Philips, or Siemens scanner
  6. scanner_model: Scanner model name
  7. scanner_software: Scanner software description
  8. matched_group: Carefully matched similar groups
  9. sex: Sex
  10. demo_race_a_p___10: White
  11. demo_race_a_p___11: Black/African American
  12. demo_race_a_p___12: Native American
  13. demo_race_a_p___13: Alaska Native
  14. demo_race_a_p___14: Native Hawaiian
  15. demo_race_a_p___15: Guamanian
  16. demo_race_a_p___16: Samoan
  17. demo_race_a_p___17: Other Pacific Islander
  18. demo_race_a_p___18: Asian Indian
  19. demo_race_a_p___19: Chinese
  20. demo_race_a_p___20: Filipino
  21. demo_race_a_p___21: Japanese
  22. demo_race_a_p___22: Korean
  23. demo_race_a_p___23: Vietnamese
  24. demo_race_a_p___24: Other Asian
  25. demo_ethn_p: Latinx
  26. demo_race_a_p___25: Other Race
  27. demo_race_a_p___77: Refuse To Answer
  28. demo_race_a_p___99: Don't Know
  29. age: Age in months
  30. handedness: Handedness
  31. siblings_twins: Family member status
  32. income: Combined income
  33. participant_education: Participant grade in school
  34. parental_education: Highest level of parental education
  35. anesthesia_exposure: History of participant anesthesia exposure
  36. neurocog_pc1.bl:
  37. neruocog_pc2.bl:
  38. neurocog_pc3.bl:
  39. released: Participants with updated fast track data based on revised QC (see: known issues)
  40. updated_dwi_input_json: Participants scanned on GE with MR Software release DV25.0_R02_1549.b (see: known issues)

The matched_group field is the product of comparisons across site, age, sex, ethnicity, grade, highest level of parental education, handedness, combined family income, exposure to anesthesia, and family-relatedness, none of which shows a significant difference between the ABCD-1 and ABCD-2 groups. Comparing the counts and means for each of these factors shows that ABCD-1 and ABCD-2 are negligibly different samples. Gender shows the largest absolute difference, at 2.5 percent. No other demographic variable differs by more than 1 percent. See the table below.

Matched groups

A full-resolution version of this table can be found here.
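
As a minimal sketch of how the participants.tsv columns described in this section might be used, the following Python snippet counts the values of one column within each matched_group. It uses only the standard library. The function name tabulate_by_group and the sample rows are illustrative assumptions; the actual values stored in matched_group and sex should be checked against the data dictionary.

```python
import csv
import io
from collections import Counter, defaultdict


def tabulate_by_group(tsv_file, column):
    """Count the values of `column` within each matched_group value."""
    table = defaultdict(Counter)
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        table[row["matched_group"]][row[column]] += 1
    return table


# Made-up rows for illustration only; real values come from participants.tsv.
sample = io.StringIO(
    "participant_id\tmatched_group\tsex\n"
    "sub-A\t1\tF\n"
    "sub-B\t1\tM\n"
    "sub-C\t2\tF\n"
)
table = tabulate_by_group(sample, "sex")
for group, counts in sorted(table.items()):
    print(group, dict(counts))
```

With the downloaded file, the same function can be called on an open file handle, for example `with open("participants.tsv", newline="") as f: tabulate_by_group(f, "sex")`.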

3. The BIDS Quality Control File

This Quality Control (QC) file contains QC metrics for data from this collection and is available for download on the main NDA Collection 3165 page. Version 1.0.1 contains brain coverage scores for all runs of the derivatives.func.runs_task-(MID|nback|rest|SST)_volume data subsets. The fields currently available in the QC file are:

  1. participant_id: NDA unique pGUID, starting with sub-
  2. session_id: Participant's session ID, starting with ses-
  3. data_subset: Collection 3165 data subset
  4. task: fMRI task name, starting with task-
  5. run: Chronological run number, starting with run-
  6. path: Relative path from the root of the data set
  7. brain_coverage_score: Overlap of the functional run time series mean with the atlas mask
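
The fields above lend themselves to simple filtering. The sketch below selects runs whose brain_coverage_score meets a caller-supplied minimum. It assumes the QC file is tab-separated like participants.tsv, which the text does not state, and the function names, threshold, and sample rows are all illustrative. Check the scale of the score in the real file (fraction or percent) before choosing a threshold.

```python
import csv
import io


def parse_score(text):
    """Return the score as a float, or None if the cell is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def passing_runs(qc_file, min_score):
    """Yield (participant_id, task, run, path) for runs scoring >= min_score."""
    # Assumption: the QC file is tab-separated with a header row.
    for row in csv.DictReader(qc_file, delimiter="\t"):
        score = parse_score(row["brain_coverage_score"])
        if score is not None and score >= min_score:
            yield row["participant_id"], row["task"], row["run"], row["path"]


# Made-up rows for illustration only.
sample = io.StringIO(
    "participant_id\tsession_id\tdata_subset\ttask\trun\tpath\tbrain_coverage_score\n"
    "sub-A\tses-X\tsubset\ttask-rest\trun-1\ta/run-1.nii.gz\t0.99\n"
    "sub-A\tses-X\tsubset\ttask-rest\trun-2\ta/run-2.nii.gz\t0.80\n"
    "sub-B\tses-X\tsubset\ttask-MID\trun-1\tb/run-1.nii.gz\tn/a\n"
)
kept = list(passing_runs(sample, min_score=0.95))
print(kept)
```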

Brain Coverage Score

The brain coverage score estimates how much of the MNI atlas mask is covered by the volumes of an fMRI run. It is calculated by first taking the temporal mean of the 4-dimensional fMRI time series. The resulting 3-dimensional mean volume is then binarized using fslmaths and masked using MNI152_T1_2mm_brain_mask.nii.gz. The score, expressed as a percentage, is the number of non-zero voxels left in the binarized volume divided by the number of non-zero voxels in the MNI mask.
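
The calculation described above can be sketched with NumPy arrays. This is not the code used to produce the released QC file, which relied on fslmaths; it is a minimal illustration that assumes the run is already in the same space and grid as the mask, and that any non-zero mean value counts as covered. The function returns a fraction, which can be multiplied by 100 to give a percentage.

```python
import numpy as np


def brain_coverage_score(timeseries, atlas_mask):
    """Fraction of the atlas mask covered by the temporal mean of a 4-D run.

    timeseries: array of shape (x, y, z, t), on the same grid as the mask.
    atlas_mask: array of shape (x, y, z); non-zero voxels are inside the mask.
    """
    mean_volume = timeseries.mean(axis=-1)   # temporal mean: 4-D -> 3-D
    covered = mean_volume != 0               # binarize (assumption: non-zero = covered)
    in_mask = atlas_mask != 0
    # Voxels that are both covered and inside the mask, over all mask voxels.
    return np.count_nonzero(covered & in_mask) / np.count_nonzero(in_mask)


# Toy data: a 2x2x2 grid with three time points; four voxels are in the mask.
mask = np.zeros((2, 2, 2))
mask[0] = 1
run = np.zeros((2, 2, 2, 3))
run[0, 0, 0, :] = [1, 2, 3]   # inside the mask, covered
run[0, 0, 1, :] = 5           # inside the mask, covered
run[0, 1, 0, :] = [0, 0, 4]   # inside the mask, covered
run[1, 1, 1, :] = 7           # outside the mask, ignored
score = brain_coverage_score(run, mask)
print(score)  # 0.75: three of the four mask voxels are covered
```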

4. Downloading and Unpacking Data

There are two ways to download ABCD Study data and get BIDS inputs or derivatives:

  1. (PREFERRED) Downloading from NDA Collection 3165 will provide a "data structure manifest" spreadsheet with AWS S3 links and other key information. DCAN Labs has designed a GitHub repository for selectively downloading only parts of the BIDS input and derivative data, the "nda-abcd-s3-downloader".
  2. ABCD Fast Track Data on the NDA can alternatively be downloaded and unpacked into BIDS with the ABCD-STUDY abcd-dicom2bids GitHub repository. Use this route if you specifically need the DICOM files.

nda-abcd-s3-downloader

This downloader can parallelize downloads and lets you request only the data subsets you are interested in.

abcd-dicom2bids

This tool pulls DICOMs and E-Prime files from the NDA's "fast-track" data. It also unpacks, converts, and BIDS-standardizes the fast-track data so that it is BIDS-compliant and matches what is uploaded to Collection 3165.

5. MATLAB Motion Mask Files

In order to make an accurate correlation matrix, use the MATLAB motion mask file described in release document 4, Derivatives, under the Motion MAT File heading.

6. Interacting with Output Data Types

Along with GIFTIs, released data follows the standards defined by the Human Connectome Project, such as reporting different metrics in standard grayordinate space and saving data using CIFTI standard file formats.

A couple of great blog posts can be read online for more detailed coverage of CIFTI data types and interaction. These topics will only be briefly discussed in this document.

The following data types, listed by file name extension, are available in this collection's BIDS derivatives.

  1. .dlabel.nii: "Dense label files" contain the "labels" (a.k.a. parcels) within parcellations.
  2. .dscalar.nii: "Dense scalar files" contain things like cortical thickness, curvature, and myelin maps on a scalar value per surface vertex basis.
  3. .dtseries.nii: "Dense time series" contain functional time series from fMRI runs in surface space on a vector time series per surface vertex basis.
  4. .ptseries.nii: "Parcellated time series" contain the dense time series parcellated by the corresponding dense label file.
  5. .surf.gii: "GIFTI surface files" contain the "geometry" surface delineations/definitions of a particular surface, like the midthickness surface for example.

Dense and Parcellated Time Series

The dense and parcellated time series files should generally be analyzed together with their corresponding motion files. Periods of high motion should be censored before standard connectivity/correlation matrix analysis.

Correlation Matrices

Correlation matrices should be generated from either the dense or parcellated time series using frame censoring from the aforementioned MATLAB motion mask files. We recommend the DCAN-Labs/cifti-connectivity tools, which let you choose a framewise displacement threshold, set a minimum number of remaining minutes of data, and output either dense (.dconn.nii) or parcellated (.pconn.nii) connectivity matrices.
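
To illustrate the idea of frame censoring, here is a minimal NumPy sketch that computes a correlation matrix from retained frames only and refuses to run when too little data remains. It is not the cifti-connectivity implementation. The function name, the boolean keep mask, and the tr_seconds and min_minutes parameters are assumptions for this example; how the keep mask is derived from the MATLAB motion mask file is not covered here.

```python
import numpy as np


def censored_correlation(timeseries, keep, tr_seconds, min_minutes):
    """Pearson correlation matrix computed over retained frames only.

    timeseries: array of shape (n_rows, n_frames), one row per parcel
                (or grayordinate).
    keep:       boolean sequence of length n_frames, True = retain the frame.
    """
    keep = np.asarray(keep, dtype=bool)
    if keep.shape[0] != timeseries.shape[1]:
        raise ValueError("mask length must equal the number of frames")
    remaining_minutes = keep.sum() * tr_seconds / 60.0
    if remaining_minutes < min_minutes:
        raise ValueError(
            f"only {remaining_minutes:.2f} minutes remain after censoring"
        )
    # np.corrcoef treats each row as one variable.
    return np.corrcoef(timeseries[:, keep])


# Synthetic data: 4 parcels, 600 frames, with 50 frames censored.
rng = np.random.default_rng(0)
ts = rng.standard_normal((4, 600))
keep = np.ones(600, dtype=bool)
keep[100:150] = False
conn = censored_correlation(ts, keep, tr_seconds=0.8, min_minutes=5.0)
print(conn.shape)
```

Raising an error when too few minutes remain, rather than returning a matrix computed from very little data, keeps low-quality runs from slipping silently into a group analysis.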

Connectome Workbench

For visualization of all of these CIFTI files, use Connectome Workbench.

7. DCAN Labs Software

We have built tools for working with this data using our recommended methods. Read on for a description of each publicly hosted, open-source GitHub repository from DCAN-Labs.

ABCD-BIDS Pipeline: https://github.com/ABCD-STUDY/abcd-hcp-pipeline

See these release notes' document 3: Pipeline.

Custom Clean: https://github.com/DCAN-Labs/CustomClean

Custom clean is a general-purpose tool for defining which common output files to delete across similarly structured folders. This need arises often in data processing, where processing an input dataset produces a similar set of output files for every job, following some output folder convention.

First, you use a graphical user interface (GUI) to teach custom clean which files should be routinely cleaned. The custom clean GUI outputs a "cleaning JSON file" that contains the definitions of all files to be cleaned. After that, you can call the custom clean script with the cleaning JSON file as many times as you like on as many similar folders as you like.

File Mapper: https://github.com/DCAN-Labs/file-mapper

File mapper is another general-purpose tool. It defines a common output folder/file hierarchy from a template set of files to be mapped and an output hierarchy to map them to. We use it to conform the commonly output "Human Connectome Project-styled" processed folders into BIDS-compliant derivative folders.

As with custom clean, you define a JSON file that specifies how to map a file from some common input to some common output in order to "reshape" your data outputs.
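
The general idea of a JSON-driven mapping can be sketched in a few lines of Python. This is a conceptual illustration only: the JSON layout, the map_files function, and the file names are hypothetical and do not reflect the actual schema or interface of DCAN-Labs/file-mapper, whose own documentation should be consulted.

```python
import json
import shutil
import tempfile
from pathlib import Path


def map_files(mapping, source_root, dest_root):
    """Copy each existing source file to its mapped destination path."""
    copied = []
    for src_rel, dst_rel in mapping.items():
        src = Path(source_root) / src_rel
        dst = Path(dest_root) / dst_rel
        if not src.is_file():
            continue  # skip entries whose source file is absent
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied


# Hypothetical mapping (source relative path -> destination relative path).
# This is NOT the JSON schema used by file-mapper.
mapping = json.loads('{"in/a.txt": "out/renamed_a.txt", "in/missing.txt": "out/b.txt"}')

with tempfile.TemporaryDirectory() as tmp:
    source_root = Path(tmp) / "processed"
    dest_root = Path(tmp) / "derivatives"
    (source_root / "in").mkdir(parents=True)
    (source_root / "in" / "a.txt").write_text("example")
    copied = map_files(mapping, source_root, dest_root)
    copied_names = [p.name for p in copied]
    copied_exists = all(p.is_file() for p in copied)

print(copied_names)
```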

8. BIDS Folder Layout

Your final BIDS folder structure will look like this tree if you download everything. Full descriptions of these BIDS input and BIDS derivative data are located in these release notes' documents 2 and 4, Inputs and Derivatives respectively.

ABCD-BIDS Layout

A full-resolution version of this picture, complete with descriptions, can be found here.