A. Variant annotation using TransVar

Access http://www.transvar.net in your web browser.

1. Forward annotate the following genomic mutations, using the GRCh37/hg19 reference and the CCDS database

17: 7577538 C T

17: 7577121 G A

Note:

Input these variants in HGVS format: chr17:g.7577538C>T and chr17:g.7577121G>A
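The conversion described in this note can also be scripted. The following is a minimal sketch and not part of TransVar: the function name to_hgvs is made up for illustration, and it assumes every input line has the form "CHROM: POS REF ALT" and describes a single-base substitution.

```shell
# Convert "CHROM: POS REF ALT" lines into genomic HGVS strings.
# to_hgvs is a hypothetical helper name, not a TransVar command.
to_hgvs() {
  # Drop the colon after the chromosome, then reassemble the four fields.
  sed 's/://' | awk 'NF == 4 { printf "chr%s:g.%s%s>%s\n", $1, $2, $3, $4 }'
}

printf '17: 7577538 C T\n17: 7577121 G A\n' | to_hgvs
```

The two printed strings are the ones to paste into the TransVar input box.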

Questions:

· Which gene is affected by these mutations?

· Do these mutations affect protein coding?

2. Reverse annotate the following protein alterations, using the GRCh37/hg19 reference and the RefSeq database

ABL1:p.E255V

EGFR:p.L747S

Questions:

· How many genomic mutations could each of the protein alterations map to?

B. Variant annotation using Annovar

1. SSH onto the bioinformatics server

ssh YOUR_ID@139.52.107.59

2. Create a directory

mkdir annovar

cd annovar

3. Copy example files into your directory

cp /course/varlab/tumor.1.vcf .

4. Convert vcf format to annovar format

/course/final_project/bin/annovar/convert2annovar.pl -format vcf4 tumor.1.vcf > tumor.avinput

Note:

1. More information about file formats and format conversion: http://annovar.openbioinformatics.org/en/latest/user-guide/input/

2. The long path at the start of the command points to the Perl script “convert2annovar.pl”, which converts the VCF file “tumor.1.vcf” into the ANNOVAR input file “tumor.avinput”.
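To see what the conversion produced, it can help to look at the first five columns of the ANNOVAR input file, which hold the chromosome, start position, end position, reference allele and alternative allele. A small sketch: preview_avinput is a hypothetical helper name, and the column labels are printed by the helper itself, they are not stored in the file.

```shell
# Show the first five columns of an ANNOVAR input file with labels.
# preview_avinput is a hypothetical helper; $1 = file, $2 = rows (default 5).
preview_avinput() {
  printf 'chr\tstart\tend\tref\talt\n'
  cut -f1-5 "$1" | head -n "${2:-5}"
}

# Usage (after step 4):
#   preview_avinput tumor.avinput
```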

5. Perform gene annotation (note: this is a single command; enter it on one line)

/course/final_project/bin/annovar/annotate_variation.pl -geneanno -buildver hg19 tumor.avinput /course/final_project/bin/annovar/humandb

Questions:

· How many variants are exonic, splicing, or intronic? (hints: examine tumor.avinput.variant_function using “more”, “grep” and “wc -l” commands)

Note:

1. More information here: http://annovar.openbioinformatics.org/en/latest/user-guide/gene/ (gene-based annotation)

2. This command calls the Perl script “annotate_variation.pl”. “-geneanno” requests gene-based annotation, “-buildver hg19” selects the hg19 genome build, “tumor.avinput” is the input file generated in step B-4, and “/course/final_project/bin/annovar/humandb” is the folder where the database is stored.

3. Use “ls -l” to check whether any new files have been created. What are they?
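As an alternative to running “grep” and “wc -l” once per category, the categories can be tallied in a single pass. This is a sketch that assumes the first tab-separated column of the .variant_function file holds the region annotation (exonic, splicing, intronic, and so on); count_by_region is a made-up helper name.

```shell
# Tally variants by the region annotation in column 1 of a
# .variant_function file. count_by_region is a hypothetical helper.
count_by_region() {
  awk -F'\t' '{ n[$1]++ } END { for (k in n) print n[k] "\t" k }' "$1" |
    sort -rn
}

# Usage (after step 5):
#   count_by_region tumor.avinput.variant_function
```

Combined labels such as “exonic;splicing”, if present, are counted as their own category here, so decide how you want to treat them before answering the question.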

6. Perform functional annotation using PolyPhen2

/course/final_project/bin/annovar/annotate_variation.pl -buildver hg19 tumor.avinput /course/final_project/bin/annovar/humandb -filter -dbtype ljb_pp2

Questions:

· How many variants received a PolyPhen2 score? (hint: how many lines are there in tumor.avinput.hg19_ljb_pp2_dropped?)

· How many variants are “probably damaging” (score >= 0.85), “possibly damaging” (0.15<=score<0.85), or “benign” (score < 0.15)? (hints: examine the scores in the 2nd column in tumor.avinput.hg19_ljb_pp2_dropped)

Note:

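1. This command has a similar structure to the previous one, but adds “-filter -dbtype ljb_pp2” after the humandb path: “-filter” requests filter-based annotation, and “-dbtype ljb_pp2” selects the PolyPhen2 score database.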
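The three score classes from the question above can be counted with one awk pass. This is only a sketch under the assumptions stated in this handout: the score sits in the 2nd column of the dropped file, and the cutoffs are 0.85 and 0.15. The helper name classify_pp2 is made up.

```shell
# Bin PolyPhen2 scores (column 2) using the cutoffs given in this handout.
# classify_pp2 is a hypothetical helper name.
classify_pp2() {
  awk '{
    s = $2 + 0                      # force a numeric comparison
    if (s >= 0.85)      n["probably_damaging"]++
    else if (s >= 0.15) n["possibly_damaging"]++
    else                n["benign"]++
  }
  END {
    print "probably_damaging", n["probably_damaging"] + 0
    print "possibly_damaging", n["possibly_damaging"] + 0
    print "benign", n["benign"] + 0
  }' "$1"
}

# Usage (after step 6):
#   classify_pp2 tumor.avinput.hg19_ljb_pp2_dropped
```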

C. Sequence assembly using Velvet Assembler

1. Copy /course/asmlab into your home directory (run “cd” first to return to your home directory)

cp -r /course/asmlab .

cd asmlab

2. Prepare hash table using kmer size 19

./velveth 19mer 19 -fastq -short TP53.hg19.mut1.r1.fq TP53.hg19.mut1.r2.fq

3. Assemble

./velvetg 19mer -exp_cov auto

Questions:

· What is the estimated coverage?

· What is the N50 size?

· What is the total assembly length?

· What is the breadth of coverage? i.e., total assembly length/reference sequence length (this is the integer in the header line of TP53.hg19.fa).

Note: Quick guide to Velvet: http://www.embnet.org/sites/default/files/quickguides/Velvet_and_Oases-QG.pdf

Github user manual: https://github.com/dzerbino/velvet/wiki/Manual

TP53.hg19.mut1.r1.fq and TP53.hg19.mut1.r2.fq are FASTQ files for the TP53 gene. Because the data are paired-end, the first file holds read 1 and the second file holds read 2 of each pair.

4. Examine the assembly result

more 19mer/contigs.fa
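Besides paging through the file, the total assembly length and the breadth of coverage can be computed directly from contigs.fa. The helpers below are a sketch with made-up names (fasta_total_length, breadth). They count bases in the FASTA file, so the result may differ from the figures velvetg prints; check the Velvet manual for the units it uses.

```shell
# Sum the lengths of all sequences in a FASTA file (header lines skipped).
# fasta_total_length and breadth are hypothetical helper names.
fasta_total_length() {
  awk '!/^>/ { gsub(/[ \t\r]/, ""); total += length($0) }
       END   { print total + 0 }' "$1"
}

# Breadth of coverage = assembly length / reference length.
breadth() {
  awk -v a="$1" -v r="$2" 'BEGIN { printf "%.3f\n", a / r }'
}

# Usage (REF_LEN is the integer in the header line of TP53.hg19.fa):
#   breadth "$(fasta_total_length 19mer/contigs.fa)" REF_LEN
```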

5. Try a different kmer size, e.g., 21

./velveth 21mer 21 -fastq -short TP53.hg19.mut1.r1.fq TP53.hg19.mut1.r2.fq

./velvetg 21mer -exp_cov auto

Questions:

· Which kmer size produced the better assembly?
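One way to compare the two assemblies on equal terms is to compute N50 from each contigs.fa with the same script. The sketch below uses the usual definition (the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the assembly length). The helper name n50 is made up, and the value is in bases, so it may not match the number velvetg prints.

```shell
# Compute N50 (in bases) from a FASTA file of contigs.
# n50 is a hypothetical helper name.
n50() {
  # First pass: print one length per contig.
  awk '/^>/ { if (len) print len; len = 0; next }
       { gsub(/[ \t\r]/, ""); len += length($0) }
       END { if (len) print len }' "$1" |
    sort -rn |
    # Second pass: walk from longest to shortest until half the total is reached.
    awk '{ l[NR] = $1; total += $1 }
         END {
           for (i = 1; i <= NR; i++) {
             run += l[i]
             if (run * 2 >= total) { print l[i]; exit }
           }
         }'
}

# Usage:
#   n50 19mer/contigs.fa
#   n50 21mer/contigs.fa
```

A larger N50 alone does not prove a better assembly; consider it together with the total length and the breadth of coverage.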

Note: What is a kmer? What is the difference between a 19mer and a 21mer? (Wikipedia: https://en.wikipedia.org/wiki/K-mer)
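As a small illustration of the idea, the sketch below lists every kmer of a short sequence. The helper name kmers and the toy sequence are made up for this example. A sequence of length L contains L - k + 1 overlapping kmers, so a larger k gives fewer, longer kmers from the same read.

```shell
# Print all overlapping kmers of a sequence.
# kmers is a hypothetical helper; $1 = sequence, $2 = k.
kmers() {
  awk -v s="$1" -v k="$2" 'BEGIN {
    for (i = 1; i + k - 1 <= length(s); i++) print substr(s, i, k)
  }'
}

kmers ACGTAC 3   # lists ACG, CGT, GTA, TAC (one per line)
```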