PF19: Genetics Question 2

For complex pancreatic disease POPULATION STUDIES, what genetic information shall be included in the genotyping method, analysis pipeline, and annotation?

Chairs: Agnieszka Rygiel MD and TBN

Delegates: Balázs Németh MD PhD, Eszter Hegyi MD PhD (email request to Dr. Whitcomb or Dr. Sahin-Toth)

Considerations: Should researchers include a detailed description of the population, ancestry, SNP chip (minimal standard for high-quality imputation), and sequencing (minimal depth and quality of key loci?) in the study design or publication? Should recommendations include other information, such as details of the analytic pipeline?

6 Replies to “PF19: Genetics Question 2”

  1. Proposed STATEMENT: The data processing and analysis flow chart for sequencing-based association studies (whole genome sequencing, whole exome sequencing, targeted-region resequencing) should include the following steps:

    1. Platform description
    2. Variant calling – description of the bioinformatics programs used to generate VCF files.
    3. QC (quality control)
    – check for DNA contamination (checking the level of heterozygosity, as contaminated samples have high levels of heterozygosity),
    – global QC (quality description of the analyzed regions), which should contain the minimum, maximum, and average region depth (coverage),
    – per-variant QC – coverage (DePristo MA, 2011, Nat. Genet., 491-498)

    4. Variant annotation
    – description of the bioinformatics tools used for variant annotation, including gene level, position of the variant in the gene, and in silico prediction of the effect on protein function
    5. Filtering process. This step should contain:
    – description of the frequency-based filtering (MAF % cutoff) and which genomic databases were used, e.g. dbSNP, 1000 Genomes, ExAC, gnomAD, etc.
    – clinical databases used to assess pathogenicity status (genotype-phenotype correlation)
    6. Replication of the results in independent cohorts.
    This was based on Seunggeun Lee, 2014 (PMID: 24995866).
    Please comment on this.
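
    The QC and filtering steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a recommended pipeline: the function names, the genotype encoding (allele pairs, with None for a missing call), and the 1% MAF cutoff are all assumptions made for the example, and real studies would compute these values from VCF/BAM files with established tools.

```python
from statistics import mean


def heterozygosity_rate(genotypes):
    """Fraction of called genotypes that are heterozygous.

    genotypes: list of (allele1, allele2) tuples; None marks a missing call
    (hypothetical encoding chosen for this sketch).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    het = sum(1 for a, b in called if a != b)
    return het / len(called)


def depth_summary(region_depths):
    """Minimum, maximum and average depth across the analyzed regions."""
    return {
        "min": min(region_depths),
        "max": max(region_depths),
        "mean": mean(region_depths),
    }


def passes_maf_filter(maf_by_database, cutoff=0.01):
    """True if the variant is below the MAF cutoff in every listed database.

    maf_by_database: dict such as {"gnomAD": 0.002, "1000 Genomes": None};
    None means the variant was not observed and is treated as rare here.
    The default cutoff of 0.01 is an assumption, not a recommendation.
    """
    return all(maf is None or maf < cutoff for maf in maf_by_database.values())


sample_genotypes = [(0, 1), (0, 0), (1, 1), None, (0, 1)]
print(heterozygosity_rate(sample_genotypes))   # 2 het out of 4 called
print(depth_summary([30, 45, 60]))
print(passes_maf_filter({"gnomAD": 0.002, "1000 Genomes": None}))
```

    Reporting the cutoff and the database names alongside such a filter is the point of step 5: the same variant list can change considerably when either is altered.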

  2. Regarding the description of the population

    Proposed STATEMENT: The publication should describe the ethnic background of the analyzed cases. The use of an ethnically matched control group is essential in case-control studies.

    The question is whether public sequencing databases such as the Genome Aggregation Database (gnomAD) might be used as "the control group" for comparison purposes. If so, what kind of bioinformatic analysis should be performed to correct for the lack of individual-level data, differences in ancestry, and differences in sequencing platform and data processing? (Suggested reference: PMID: 30269813)

  3. Additional point regarding the pipeline analysis and study design

    The NGS data should be supplied with a BED file in which the chromosomal input regions are specified by chromosome number, start position, and end position. This is very important for whole exome and targeted resequencing methods, since there are a number of different kits for preparing the DNA library and the actual chromosomal input regions may differ between them.
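
    To illustrate why the BED file matters, here is a minimal Python sketch that reads target regions and checks whether a variant position was actually covered by the capture design. The function names and the coordinates are hypothetical. The sketch relies on the BED convention that start positions are 0-based and end positions are exclusive, whereas VCF positions are 1-based, and it assumes the regions do not overlap.

```python
def parse_bed(lines):
    """Parse the first three BED columns into (chrom, start, end) tuples."""
    regions = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and optional header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def target_size(regions):
    """Total number of targeted bases (assumes non-overlapping regions)."""
    return sum(end - start for _, start, end in regions)


def in_target(regions, chrom, pos_1based):
    """True if a 1-based (VCF-style) position lies inside any target region.

    BED intervals are 0-based and half-open, so a 1-based position p is
    covered when start < p <= end.
    """
    return any(c == chrom and start < pos_1based <= end
               for c, start, end in regions)


# Hypothetical coordinates, for illustration only.
bed_lines = [
    "chr7\t142749000\t142749500",
    "chr5\t147820000\t147820300",
]
regions = parse_bed(bed_lines)
print(target_size(regions))
print(in_target(regions, "chr7", 142749001))
print(in_target(regions, "chr7", 142749000))
```

    A variant absent from a dataset can then be distinguished from a position that was never targeted by the kit in the first place.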

  4. Phil Greer MS (from the Whitcomb group) suggested:

    Platform: Micro-Array, WES, WGS, etc

    Informatics: reference genome build; software used for processing and the pipeline version; database build dates; for WES, the platform/library; for all NGS, depth/coverage and QC results
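
    One way to make such a list checkable is to treat it as a structured record. The Python sketch below is only an assumption about how that could look: the class name, field names, and platform labels are invented for illustration and are not an agreed standard.

```python
from dataclasses import dataclass, fields
from typing import Optional

# Platform labels treated as NGS in this sketch (an assumption).
NGS_PLATFORMS = {"WES", "WGS", "targeted"}


@dataclass
class GenotypingReport:
    platform: Optional[str] = None              # e.g. "Micro-Array", "WES", "WGS"
    reference_build: Optional[str] = None       # e.g. "GRCh38"
    pipeline_software: Optional[str] = None
    pipeline_version: Optional[str] = None
    database_build_dates: Optional[str] = None
    capture_library: Optional[str] = None       # required for WES only
    mean_depth: Optional[float] = None          # required for NGS only
    qc_summary: Optional[str] = None


def missing_fields(report):
    """Names of the required fields that have not been filled in."""
    missing = []
    for f in fields(report):
        if f.name == "capture_library" and report.platform != "WES":
            continue
        if f.name == "mean_depth" and report.platform not in NGS_PLATFORMS:
            continue
        if getattr(report, f.name) is None:
            missing.append(f.name)
    return missing


report = GenotypingReport(platform="WES", reference_build="GRCh38")
print(missing_fields(report))
```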

  5. Using data from an ethnically matched control group is crucial in every case-control study. I think it is always better to recruit controls together with patients ourselves, from exactly the same population, and to sequence them using the same method. However, we should recommend using public sequencing databases as a source of control groups when ethnically matched data are available, because the genetic data in these databases could definitely add weight to any case-control study.

    What do you think about age and gender matching between patients and controls?
