A.     Variant annotation using TransVar

 

Open http://www.transvar.net in your web browser.

 

1.     Forward annotate the following genomic mutations, using the GRCh37/hg19 reference and the CCDS database:

Chr   Position   Ref   Alt

17    7577538    C     T

17    7577121    G     A

 

        Note:

Input these variants in HGVS format: chr17:g.7577538C>T and chr17:g.7577121G>A
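The conversion from the table to the HGVS strings in the note is mechanical: chromosome, then "g." (genomic), then position, reference base, ">", and alternate base. A minimal sketch, assuming single-nucleotide substitutions only and the "chrN:g." shorthand used in the note (formal HGVS uses a reference-sequence accession instead of the chromosome name); the function name to_hgvs_genomic is hypothetical:

```python
def to_hgvs_genomic(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Build a shorthand HGVS genomic substitution such as chr17:g.7577538C>T.

    Only single-nucleotide substitutions are handled; a "chr" prefix is
    added if missing, to match the format used in this exercise.
    """
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError("this sketch only handles single-nucleotide substitutions")
    if not chrom.startswith("chr"):
        chrom = "chr" + chrom
    return f"{chrom}:g.{pos}{ref.upper()}>{alt.upper()}"


variants = [("17", 7577538, "C", "T"), ("17", 7577121, "G", "A")]
for variant in variants:
    print(to_hgvs_genomic(*variant))
# chr17:g.7577538C>T
# chr17:g.7577121G>A
```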

 

Questions:

•      Which gene is affected by these mutations?

•      Do these mutations alter the protein-coding sequence?

 

 

2.     Reverse annotate the following protein alterations, using the GRCh37/hg19 reference and the RefSeq database:

ABL1:p.E255V

EGFR:p.L747S

 

Questions:

•      How many genomic mutations could each of these protein alterations map to?
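One protein change can map to several genomic changes because the genetic code is degenerate. The sketch below, assuming the standard genetic code, lists for every possible reference codon of the original amino acid which codons of the new amino acid are one base away. It does not know the actual ABL1 or EGFR reference codons (TransVar reads those from the transcript) and ignores multi-base candidates beyond counting differences; all names are hypothetical:

```python
from itertools import product
from typing import Dict, List, Tuple

# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE: Dict[str, str] = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(aa: str) -> List[str]:
    """All codons that encode the one-letter amino acid `aa`."""
    return [codon for codon, a in CODE.items() if a == aa]


def candidate_changes(ref_codon: str, alt_aa: str) -> List[Tuple[str, int]]:
    """Codons for alt_aa, each paired with its number of base differences from ref_codon."""
    pairs = [(alt, sum(r != a for r, a in zip(ref_codon, alt))) for alt in codons_for(alt_aa)]
    return sorted(pairs, key=lambda pair: pair[1])


def single_base_options(ref_codon: str, alt_aa: str) -> List[str]:
    """Codons for alt_aa reachable from ref_codon by exactly one substitution."""
    return [codon for codon, diffs in candidate_changes(ref_codon, alt_aa) if diffs == 1]


for ref_aa, alt_aa in [("E", "V"), ("L", "S")]:
    for ref_codon in codons_for(ref_aa):
        print(f"{ref_aa}>{alt_aa} from {ref_codon}: {single_base_options(ref_codon, alt_aa)}")
```

For example, a Leu codon of the form CTN has no single-base route to Ser, while TTA and TTG each have one, so the answer depends on the real reference codon.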

 

B.     Variant annotation using Annovar

1.     SSH into the bioinformatics server

ssh YOUR_ID@139.52.107.59

2.     Create a directory

mkdir annovar

cd annovar

3.     Copy the example file into your directory

cp /course/varlab/tumor.1.vcf .

4.     Convert the VCF file to ANNOVAR input format

/course/final_project/bin/annovar/convert2annovar.pl -format vcf4 tumor.1.vcf > tumor.avinput

        Note:

1. More information about file formats and format conversion: http://annovar.openbioinformatics.org/en/latest/user-guide/input/ 

2. The long path at the start of the command runs the Perl script “convert2annovar.pl”, which converts “tumor.1.vcf” (“-format vcf4” means the input is VCF version 4) into ANNOVAR input format; “>” writes the result to “tumor.avinput”.
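For a single-nucleotide substitution, the first five ANNOVAR input columns are chromosome, start, end, reference allele, and alternate allele, with start and end both equal to the VCF POS. The hypothetical helper below sketches only that case; it is not a replacement for convert2annovar.pl, which also handles indels (with its own coordinate conventions) and other variant types, and may append extra columns:

```python
from typing import Optional


def vcf_snv_to_avinput(vcf_line: str) -> Optional[str]:
    """Return the first five ANNOVAR input columns for a biallelic SNV VCF record.

    Header lines, multi-allelic sites, and indels return None.
    """
    if vcf_line.startswith("#") or not vcf_line.strip():
        return None
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    if "," in alt or len(ref) != 1 or len(alt) != 1:
        return None
    # For a single-base substitution, start and end are both the VCF POS.
    return "\t".join([chrom, pos, pos, ref, alt])
```

Applied line by line to tumor.1.vcf, the SNV rows it returns should agree with the first five columns of the matching rows in tumor.avinput.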

 

5.     Perform gene-based annotation (type this as a single command on one line)

/course/final_project/bin/annovar/annotate_variation.pl -geneanno -buildver hg19 tumor.avinput /course/final_project/bin/annovar/humandb

 

Questions:

•      How many variants are exonic, splicing, or intronic? (Hint: examine tumor.avinput.variant_function with the “more”, “grep”, and “wc -l” commands.)

Note:

1. More information on gene-based annotation: http://annovar.openbioinformatics.org/en/latest/user-guide/gene/

2. This command calls the Perl script “annotate_variation.pl”: “-geneanno” selects gene-based annotation, “-buildver hg19” selects the hg19 genome build, “tumor.avinput” is the input file generated in step B-4, and “/course/final_project/bin/annovar/humandb” is the folder where the databases are stored.

3. Use “ls -l” to check whether any new files have been created. What are they?
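Counting with grep can overcount, because a pattern such as exonic also matches lines labelled ncRNA_exonic or exonic;splicing. A sketch that instead tallies the exact value of the first tab-separated column, assuming the usual ANNOVAR layout of tumor.avinput.variant_function in which that column holds the region (exonic, splicing, intronic, UTR3, intergenic, and so on); the function names are hypothetical:

```python
from collections import Counter
from typing import Iterable


def count_functions(lines: Iterable[str]) -> Counter:
    """Count variant_function lines by the exact value of their first column."""
    counts: Counter = Counter()
    for line in lines:
        if line.strip():
            counts[line.split("\t", 1)[0]] += 1
    return counts


def count_functions_in_file(path: str) -> Counter:
    """Convenience wrapper that reads the lines from a file."""
    with open(path) as handle:
        return count_functions(handle)
```

Calling count_functions_in_file("tumor.avinput.variant_function") returns a Counter; look up "exonic", "splicing" and "intronic", and decide how to treat combined labels such as "exonic;splicing", which are counted separately here.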

 

6.     Perform functional annotation using PolyPhen2

/course/final_project/bin/annovar/annotate_variation.pl -buildver hg19 tumor.avinput /course/final_project/bin/annovar/humandb -filter -dbtype ljb_pp2

 

Questions:

•      How many variants received a PolyPhen2 score? (Hint: how many lines are there in tumor.avinput.hg19_ljb_pp2_dropped?)

•      How many variants are “probably damaging” (score >= 0.85), “possibly damaging” (0.15 <= score < 0.85), or “benign” (score < 0.15)? (Hint: examine the scores in the 2nd column of tumor.avinput.hg19_ljb_pp2_dropped.)

Note:

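1. This command has a structure similar to the previous one, but adds options after the humandb folder: “-filter” switches to filter-based annotation, and “-dbtype ljb_pp2” selects the PolyPhen2 scores from the LJB database. Variants found in the database are written, with their scores, to tumor.avinput.hg19_ljb_pp2_dropped; the remaining variants go to tumor.avinput.hg19_ljb_pp2_filtered.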
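Both PolyPhen2 questions can be answered in one pass over the dropped file. A minimal sketch, assuming tab-separated lines with the score in the 2nd column (as the hint states) and using the thresholds given in the question; lines whose 2nd column is not a number are skipped, and the function names are hypothetical:

```python
from collections import Counter
from typing import Iterable


def classify_pp2(score: float) -> str:
    """Map a PolyPhen2 score to a category using the thresholds in the question."""
    if score >= 0.85:
        return "probably damaging"
    if score >= 0.15:
        return "possibly damaging"
    return "benign"


def summarize_pp2(lines: Iterable[str]) -> Counter:
    """Count scored variants per category, reading the score from the 2nd column."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        try:
            score = float(fields[1])
        except ValueError:
            continue  # no numeric score on this line
        counts[classify_pp2(score)] += 1
    return counts
```

With the file open, summarize_pp2(open("tumor.avinput.hg19_ljb_pp2_dropped")) gives the three category counts, and their sum is the number of scored variants.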

 

C.     Sequence assembly using Velvet Assembler

1.     Copy /course/asmlab into your home directory

cd

cp -r /course/asmlab .    

cd asmlab

2.     Build the hash table using a k-mer size of 19

./velveth 19mer 19 -fastq -short TP53.hg19.mut1.r1.fq TP53.hg19.mut1.r2.fq

3.     Assemble

./velvetg 19mer -exp_cov auto

 

            Questions:

•      What is the estimated coverage?

•      What is the N50 size?

•      What is the total assembly length?

•      What is the breadth of coverage, i.e., total assembly length / reference sequence length? (The reference sequence length is the integer in the header line of TP53.hg19.fa.)

Note: Quick guide to “velvet”: http://www.embnet.org/sites/default/files/quickguides/Velvet_and_Oases-QG.pdf

GitHub user manual: https://github.com/dzerbino/velvet/wiki/Manual

TP53.hg19.mut1.r1.fq and TP53.hg19.mut1.r2.fq are FASTQ files for the TP53 gene. The data are paired-end sequenced, so the first file holds read 1 and the second file holds read 2 of each pair.
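The estimated coverage, N50 and total length all appear in velvetg's terminal output. A sketch that extracts them, assuming the summary lines have the forms shown in the comments (check them against your own output); velvetg may report these lengths in k-mer units rather than base pairs. The regular expressions and function name are hypothetical:

```python
import re
from typing import Dict

# Assumed formats of the two velvetg summary lines:
#   Estimated Coverage = <float>
#   Final graph has <n> nodes and n50 of <n>, max <n>, total <n>, using <n>/<n> reads
COVERAGE_RE = re.compile(r"Estimated Coverage = ([0-9.]+)")
FINAL_RE = re.compile(
    r"Final graph has (\d+) nodes and n50 of (\d+), max (\d+), total (\d+)"
)


def parse_velvetg_log(text: str) -> Dict[str, float]:
    """Pull estimated coverage, node count, N50, max and total out of velvetg output."""
    stats: Dict[str, float] = {}
    coverages = COVERAGE_RE.findall(text)
    if coverages:
        # Use the last match in case the line is printed more than once.
        stats["estimated_coverage"] = float(coverages[-1])
    finals = FINAL_RE.findall(text)
    if finals:
        nodes, n50, longest, total = (int(g) for g in finals[-1])
        stats.update(nodes=nodes, n50=n50, max=longest, total=total)
    return stats
```

One way to capture the output is "./velvetg 19mer -exp_cov auto | tee 19mer.velvetg.log" (the log file name is arbitrary); parsing the 19mer and 21mer logs side by side makes the k-mer comparison in step 5 easier.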

 

4.     Examine the assembly result

more 19mer/contigs.fa
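The N50 and total length can also be computed directly from 19mer/contigs.fa in base pairs. N50 is the contig length at which the cumulative length, summed from the longest contig down, first reaches half of the total assembly length. A sketch under that definition (contigs.fa may leave out very short contigs, so the numbers can differ from velvetg's own summary); the function names are hypothetical, and the reference length for breadth of coverage must be read from the header of TP53.hg19.fa by hand:

```python
from typing import Iterable, List, Optional


def read_fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of each record in a FASTA file."""
    lengths: List[int] = []
    current: Optional[int] = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length of the contig at which the running total (longest first) reaches half the sum."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def breadth_of_coverage(assembly_length: int, reference_length: int) -> float:
    """Total assembly length divided by reference sequence length."""
    return assembly_length / reference_length
```

For example, lengths = read_fasta_lengths(open("19mer/contigs.fa")) followed by n50(lengths), sum(lengths), and breadth_of_coverage(sum(lengths), reference_length).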

5.     Try a different k-mer size, e.g., 21

./velveth 21mer 21 -fastq -short TP53.hg19.mut1.r1.fq TP53.hg19.mut1.r2.fq

./velvetg 21mer -exp_cov auto

 

Questions:

•      Which k-mer size produced the better assembly?

Note: What is a k-mer? What is the difference between a 19-mer and a 21-mer? (Wikipedia: https://en.wikipedia.org/wiki/K-mer)
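A k-mer is any substring of length k, and velveth splits every read into its overlapping k-mers; a read of length L yields L - k + 1 of them. A larger k therefore gives fewer k-mers per read (the Velvet manual relates k-mer coverage Ck to base coverage C as Ck = C * (L - k + 1) / L), but it can resolve repeats shorter than k. A small illustration with a made-up 21-base read; the function name is hypothetical:

```python
from typing import List


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping substrings of length k, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


read = "ACGTTGCAACGTTGCAACGTA"  # made-up 21-base read
print(len(kmers(read, 19)))  # 3  (21 - 19 + 1)
print(len(kmers(read, 21)))  # 1  (21 - 21 + 1)
```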