A. Variant annotation using TransVar
Open http://www.transvar.net in your web browser.
1. Forward annotate the following genomic mutations, using the GRCh37/hg19 reference and the CCDS database:
17: 7577538 C T
17: 7577121 G A
Note:
Input these variants in HGVS format: chr17:g.7577538C>T and chr17:g.7577121G>A
Questions:
• Which gene is affected by these mutations?
• Do these mutations affect protein coding?
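The HGVS strings in the note for step 1 follow a fixed pattern: chr, the chromosome, ":g.", the position, the reference base, ">", and the alternate base. A minimal shell sketch of that conversion, assuming the variants are typed exactly as listed in step 1 (the function name to_hgvs is made up for illustration and is not part of TransVar):

```shell
# Hypothetical helper: turn "17: 7577538 C T" into "chr17:g.7577538C>T".
to_hgvs() {
  # $1 = chromosome (a trailing colon is stripped), $2 = position,
  # $3 = reference base, $4 = alternate base
  chrom=${1%:}
  printf 'chr%s:g.%s%s>%s\n' "$chrom" "$2" "$3" "$4"
}

to_hgvs 17: 7577538 C T   # prints chr17:g.7577538C>T
to_hgvs 17: 7577121 G A   # prints chr17:g.7577121G>A
```

The output lines can be pasted directly into the TransVar input box.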
2. Reverse annotate the following protein alterations, using the GRCh37/hg19 reference and the RefSeq database:
ABL1:p.E255V
EGFR:p.L747S
Questions:
• How many genomic mutations could each of the protein alterations map to?
B. Variant annotation using Annovar
1. SSH onto the bioinformatics server
ssh YOUR_ID@139.52.107.59
2. Create a directory
mkdir annovar
cd annovar
3. Copy example files into your directory
cp /course/varlab/tumor.1.vcf .
4. Convert VCF format to ANNOVAR format
/course/final_project/bin/annovar/convert2annovar.pl -format vcf4 tumor.1.vcf > tumor.avinput
Note:
1. More information about file formats and format conversion: http://annovar.openbioinformatics.org/en/latest/user-guide/input/
2. The long path at the start of the command runs the Perl script "convert2annovar.pl", which converts the file "tumor.1.vcf" to "tumor.avinput".
5. Perform gene annotation (this is a single command; enter it on one line)
/course/final_project/bin/annovar/annotate_variation.pl -geneanno -buildver hg19 tumor.avinput /course/final_project/bin/annovar/humandb
Questions:
• How many variants are exonic, splicing, or intronic? (Hint: examine tumor.avinput.variant_function using the "more", "grep" and "wc -l" commands.)
Note:
1. More information on gene-based annotation: http://annovar.openbioinformatics.org/en/latest/user-guide/gene/
2. This command runs the Perl script "annotate_variation.pl". "-geneanno" requests gene-based annotation, "-buildver hg19" selects the hg19 genome build, "tumor.avinput" is the input file generated in step B-4, and "/course/final_project/bin/annovar/humandb" is the directory where the database is stored.
3. Use "ls -l" to check whether any new files have been created. What are they?
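One way to approach the counting question in step 5 is to tabulate the first column of the .variant_function file, which holds the region annotation (exonic, splicing, intronic, and so on). The sketch below runs on a small made-up file, toy.variant_function, so that it is self-contained. Its rows are invented and do not come from tumor.1.vcf, and the helper name count_region is hypothetical. On the server, substitute tumor.avinput.variant_function.

```shell
# Build a tiny, made-up stand-in for tumor.avinput.variant_function
# (tab-separated: region, gene, chromosome, start, end, ref, alt).
printf 'exonic\tGENE1\t1\t100\t100\tA\tG\n'    > toy.variant_function
printf 'intronic\tGENE1\t1\t200\t200\tC\tT\n' >> toy.variant_function
printf 'exonic\tGENE2\t2\t300\t300\tG\tA\n'   >> toy.variant_function
printf 'splicing\tGENE3\t3\t400\t400\tT\tC\n' >> toy.variant_function

# Tabulate every region category found in column 1.
cut -f 1 toy.variant_function | sort | uniq -c

# Hypothetical helper: count the lines whose first column equals one category exactly.
count_region() {
  # $1 = category (e.g. exonic), $2 = file
  awk -F'\t' -v r="$1" '$1 == r { n++ } END { print n + 0 }' "$2"
}

count_region exonic toy.variant_function     # prints 2
count_region intronic toy.variant_function   # prints 1
```

The exact match on column 1 is a deliberate choice: a plain "grep exonic" would also count lines where the word appears in a combined category or elsewhere on the line.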
6. Perform functional annotation using PolyPhen2
/course/final_project/bin/annovar/annotate_variation.pl -buildver hg19 tumor.avinput /course/final_project/bin/annovar/humandb -filter -dbtype ljb_pp2
Questions:
• How many variants received a PolyPhen2 score? (Hint: how many lines are there in tumor.avinput.hg19_ljb_pp2_dropped?)
• How many variants are "probably damaging" (score >= 0.85), "possibly damaging" (0.15 <= score < 0.85), or "benign" (score < 0.15)? (Hint: examine the scores in the 2nd column of tumor.avinput.hg19_ljb_pp2_dropped.)
Note:
1. This command has the same structure as the previous one, but adds the options "-filter -dbtype ljb_pp2" after the humandb path to request filter-based annotation against the ljb_pp2 (PolyPhen2) database.
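The score bins from the PolyPhen2 question can be tallied with awk. This is a sketch on a made-up file, toy_pp2_dropped, whose rows are invented for illustration. It assumes, as the hint in the question says, that the score sits in the 2nd whitespace-separated column. The helper name classify_pp2 is hypothetical. On the server, point it at tumor.avinput.hg19_ljb_pp2_dropped.

```shell
# Made-up stand-in for tumor.avinput.hg19_ljb_pp2_dropped
# (column 1: database name, column 2: score, then the variant fields).
printf 'ljb_pp2 0.999 1 100 100 A G\n'  > toy_pp2_dropped
printf 'ljb_pp2 0.85 2 300 300 G A\n'  >> toy_pp2_dropped
printf 'ljb_pp2 0.40 3 400 400 T C\n'  >> toy_pp2_dropped
printf 'ljb_pp2 0.02 4 500 500 C T\n'  >> toy_pp2_dropped

# Hypothetical helper: bin the scores with the thresholds given in the question.
classify_pp2() {
  # $1 = file with the score in column 2
  awk '{
         if ($2 >= 0.85)      probably++
         else if ($2 >= 0.15) possibly++
         else                 benign++
       }
       END {
         printf "probably damaging: %d\n", probably
         printf "possibly damaging: %d\n", possibly
         printf "benign: %d\n", benign
       }' "$1"
}

classify_pp2 toy_pp2_dropped
# prints:
#   probably damaging: 2
#   possibly damaging: 1
#   benign: 1
```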
C. Sequence assembly using Velvet Assembler
1. Copy /course/asmlab into your home directory
cd
cp -r /course/asmlab .
cd asmlab
2. Prepare hash table using kmer size 19
./velveth 19mer 19 -fastq -short TP53.hg19.mut1.r1.fq TP53.hg19.mut1.r2.fq
3. Assemble
./velvetg 19mer -exp_cov auto
Questions:
• What is the estimated coverage?
• What is the N50 size?
• What is the total assembly length?
• What is the breadth of coverage, i.e., total assembly length / reference sequence length? (The reference sequence length is the integer in the header line of TP53.hg19.fa.)
Note: Quick guide to Velvet: http://www.embnet.org/sites/default/files/quickguides/Velvet_and_Oases-QG.pdf
GitHub user manual: https://github.com/dzerbino/velvet/wiki/Manual
TP53.hg19.mut1.r1.fq and TP53.hg19.mut1.r2.fq are FASTQ files for the TP53 gene. The first file contains read 1 and the second contains read 2, because the data come from paired-end sequencing.
4. Examine the assembly result
more 19mer/contigs.fa
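The N50, total length, and breadth questions from step 3 can also be checked by hand from a FASTA file of contigs. The sketch below is one way to do that, run on a made-up file, toy_contigs.fa, rather than real Velvet output. The helper names contig_lengths, total_length and n50 are hypothetical, and REF_LEN is a placeholder, not the real length from TP53.hg19.fa. It counts bases directly from the sequence lines, so its figures need not match velvetg's own summary exactly.

```shell
# Made-up FASTA with three contigs of 10, 6 and 4 bases.
printf '>contig1\nACGTACGTAC\n>contig2\nACGTAC\n>contig3\nACGT\n' > toy_contigs.fa

# Hypothetical helper: print one sequence length per FASTA record.
contig_lengths() {
  awk '/^>/ { if (seen) print len; len = 0; seen = 1; next }
       { len += length($0) }
       END { if (seen) print len }' "$1"
}

# Hypothetical helper: sum of all contig lengths.
total_length() {
  contig_lengths "$1" | awk '{ s += $1 } END { print s + 0 }'
}

# Hypothetical helper: N50 is the length of the contig at which the running sum,
# taken from longest to shortest, first reaches half of the total length.
n50() {
  contig_lengths "$1" | sort -rn |
    awk '{ lens[NR] = $1; total += $1 }
         END {
           for (i = 1; i <= NR; i++) {
             run += lens[i]
             if (2 * run >= total) { print lens[i]; exit }
           }
         }'
}

n50 toy_contigs.fa            # prints 10
total_length toy_contigs.fa   # prints 20

# Breadth of coverage = total assembly length / reference sequence length.
REF_LEN=25   # placeholder value for the toy example only
awk -v t="$(total_length toy_contigs.fa)" -v r="$REF_LEN" 'BEGIN { printf "%.2f\n", t / r }'   # prints 0.80
```

On the server, the same helpers could be pointed at 19mer/contigs.fa, with REF_LEN set to the integer from the header line of TP53.hg19.fa.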
5. Try a different kmer size, e.g., 21, then assemble again as in step 3
./velveth 21mer 21 -fastq -short TP53.hg19.mut1.r1.fq TP53.hg19.mut1.r2.fq
./velvetg 21mer -exp_cov auto
Questions:
• Which kmer size produced the better assembly?
Note: What is a kmer? What is the difference between a 19mer and a 21mer? (Wikipedia: https://en.wikipedia.org/wiki/K-mer)
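As a concrete illustration of the kmer question, a kmer is simply a substring of length k, and a sequence of length L contains L - k + 1 of them. The sketch below lists them for a short made-up sequence. The helper name kmers is hypothetical, and this is only an illustration of the definition, not of how Velvet builds its hash table internally.

```shell
# Hypothetical helper: print every kmer (substring of length k) of a sequence.
kmers() {
  # $1 = sequence, $2 = k
  awk -v s="$1" -v k="$2" 'BEGIN {
    for (i = 1; i + k - 1 <= length(s); i++) print substr(s, i, k)
  }'
}

kmers ACGTAC 3   # prints ACG, CGT, GTA, TAC (one per line): 6 - 3 + 1 = 4 kmers
kmers ACGTAC 5   # prints ACGTA, CGTAC (one per line): 6 - 5 + 1 = 2 kmers
```

Raising k from 19 to 21 therefore yields two fewer kmers per read, each two bases longer.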