Dot-matrix methods


Overview

A dot-matrix plot provides a global picture of local similarities between two sequences. Such plots are appropriate:

  • for comparing large sequences (several thousand residues)
  • if one does not know in advance whether two sequences share detectable similarity or which parts of the sequences are related to each other.

They are useful for:

  • detection of repeats within protein sequences
  • detection of shared domains between protein sequences
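To make the idea concrete, here is a minimal sketch of how such a plot can be computed. It assumes the simplest possible scoring (a residue either matches or it does not) combined with a sliding window; plotting tools often score residue pairs with a substitution matrix instead. The names `dot_matrix`, `render`, `window` and `min_matches` are illustrative and do not come from any particular program.

```python
def dot_matrix(seq_a, seq_b, window=3, min_matches=3):
    """Return the set of (i, j) positions where the windows starting at
    seq_a[i] and seq_b[j] share at least `min_matches` identical residues.

    Identity scoring is an assumption made for simplicity.
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_matches:
                dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots):
    """Draw the plot as text: one row per residue of seq_a, one column
    per residue of seq_b, with '*' marking a dot."""
    rows = []
    for i in range(len(seq_a)):
        rows.append(
            "".join("*" if (i, j) in dots else "." for j in range(len(seq_b)))
        )
    return "\n".join(rows)


if __name__ == "__main__":
    a, b = "ABCDE", "XXBCDX"
    print(render(a, b, dot_matrix(a, b)))
```

A larger window with a lower `min_matches` tolerates mismatches inside a similar region, at the cost of more background noise. The double loop takes time proportional to the product of the two sequence lengths, which is acceptable for sequences of a few thousand residues.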

Exercise

To become familiar with dot-matrix plots and try drawing one yourself, we suggest the following free, publicly available Java applet:

http://myhits.isb-sib.ch/cgi-bin/dotlet

Once the applet has loaded, press the "Input" button and insert the protein sequences of your choice. SwissProt is a good database with a large collection of protein sequences, but for this exercise we provide three protein sequences ready to paste:

>sp|P06239|LCK_HUMAN (Name=LCK;)Proto-oncogene tyrosine-protein kinase LCK
GCGCSSHPEDDWMENIDVCENCHYPIVPLDGKGTLLIRNGSEVRDPLVTYEGSNPPASPLQDNLVIALHSYEPSHDGDLG
FEKGEQLRILEQSGEWWKAQSLTTGQEGFIPFNFVAKANSLEPEPWFFKNLSRKDAERQLLAPGNTHGSFLIRESESTAG
SFSLSVRDFDQNQGEVVKHYKIRNLDNGGFYISPRITFPGLHELVRHYTNASDGLCTRLSRPCQTQKPQKPWWEDEWEVP
RETLKLVERLGAGQFGEVWMGYYNGHTKVAVKSLKQGSMSPDAFLAEANLMKQLQHQRLVRLYAVVTQEPIYIITEYMEN
GSLVDFLKTPSGIKLTINKLLDMAAQIAEGMAFIEERNYIHRDLRAANILVSDTLSCKIADFGLARLIEDNEYTAREGAK
FPIKWTAPEAINYGTFTIKSDVWSFGILLTEIVTHGRIPYPGMTNPEVIQNLERGYRMVRPDNCPEELYQLMRLCWKERP
EDRPTFDYLRSVLEDFFTATEGQYQPQP
>sp|P16333|NCK1_HUMAN (Name=NCK1;..)Cytoplasmic protein NCK1 (NCK adaptor ...
MAEEVVVVAKFDYVAQQEQELDIKKNERLWLLDDSKSWWRVRNSMNKTGFVPSNYVERKNSARKASIVKNLKDTLGIGKV
KRKPSVPDSASPADDSFVDPGERLYDLNMPAYVKFNYMAEREDELSLIKGTKVIVMEKCSDGWWRGSYNGQVGWFPSNYV
TEEGDSPLGDHVGSLSEKLAAVVNNLNTGQVLHVVQALYPFSSSNDEELNFEKGDVMDVIEKPENDPEWWKCRKINGMVG
LVPKNYVTVMQNNPLTSGLEPSPPQCDYIRPSLTGKFAGNPWYYGKVTRHQAEMALNERGHEGDFLIRDSESSPNDFSVS
LKAQGKNKHFKVQLKETVYCIGQRKFSTMEELVEHYKKAPIFTSEQGEKLYLVKHLS
>sp|P15498|VAV_HUMAN (Name=VAV1;..)Vav proto-oncogene.[Homo sapiens]
MELWRQCTHWLIQCRVLPPSHRVTWDGAQVCELAQALRDGVLLCQLLNNLLPHAINLREVNLRPQMSQFLCLKNIRTFLS
TCCEKFGLKRSELFEAFDLFDVQDFGKVIYTLSALSWTPIAQNRGIMPFPTEEESVGDEDIYSGLSDQIDDTVEEDEDLY
DCVENEEAEGDEIYEDLMRSEPVSMPPKMTEYDKRCCCLREIQQTEEKYTDTLGSIQQHFLKPLQRFLKPQDIEIIFINI
EDLLRVHTHFLKEMKEALGTPGAANLYQVFIKYKERFLVYGRYCSQVESASKHLDRVAAAREDVQMKLEECSQRANNGRF
TLRDLLMVPMQRVLKYHLLLQELVKHTQEAMEKENLRLALDAMRDLAQCVNEVKRDNETLRQITNFQLSIENLDQSLAHY
GRPKIDGELKITSVERRSKMDRYAFLLDKALLICKRRGDSYDLKDFVNLHSFQVRDDSSGDRDNKKWSHMFLLIEDQGAQ
GYELFFKTRELKKKWMEQFEMAISNIYPENATANGHDFQMFSFEETTSCKACQMLLRGTFYQGYRCHRCRASAHKECLGR
VPPCGRHGQDFPGTMKKDKLHRRAQDKKRNELGLPKMEVFQEYYGLPPPPGAIGPFLRLNPGDIVELTKAEAEQNWWEGR
NTSTNEIGWFPCNRVKPYVHGPPQDLSVHLWYAGPMERAGAESILANRSDGTFLVRQRVKDAAEFAISIKYNVEVKHIKI
MTAEGLYRITEKKAFRGLTELVEFYQQNSLKDCFKSLDTTLQFPFKEPEKRTISRPAVGSTKYFGTAKARYDFCARDRSE
LSLKEGDIIKILNKKGQQGWWRGEIYGRVGWFPANYVEEDYSEYC

In the dialog box that appears, first enter a short name for the protein so that you can remember which sequence is which, then copy and paste one of the protein sequences given above, or one found in the SwissProt database. When copying, take care not to include the first line, which describes the protein; start your selection at the first amino-acid letter.
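If you prefer to prepare the sequences outside the browser, the header line can be stripped programmatically. The following is a small sketch, assuming the records are in the FASTA-like layout used above (a description line starting with ">" followed by residue lines); the function name `strip_fasta_header` is hypothetical. Joining the residue lines into one string is also an assumption, since this page does not say whether the applet tolerates line breaks.

```python
def strip_fasta_header(text):
    """Return only the residues of a single FASTA-style record.

    Lines starting with '>' are treated as description lines and dropped;
    the remaining lines are joined into one string without line breaks.
    """
    lines = [line.strip() for line in text.strip().splitlines()]
    residues = [line for line in lines if line and not line.startswith(">")]
    return "".join(residues)


if __name__ == "__main__":
    record = """>sp|P16333|NCK1_HUMAN (Name=NCK1;..)Cytoplasmic protein NCK1
MAEEVVVVAKFDYVAQQEQELDIKKNERLWLLDDSKSWWRVRNSMNKTGFVPSNYVERKNSARKASIVKNLKDTLGIGKV
KRKPSVPDSASPADDSFVDPGERLYDLNMPAYVKFNYMAEREDELSLIKGTKVIVMEKCSDGWWRGSYNGQVGWFPSNYV
"""
    print(strip_fasta_header(record))
```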

Now press the "OK" button; the dialog box should clear, allowing you to repeat the process with another sequence. Unless you want to compare a sequence with itself, which always produces an unbroken main diagonal and is mainly of interest for spotting internal repeats, you should load a second sequence too.

Finally, press the "Compute" button to draw the plot, and explore it.
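When exploring the plot, look for diagonal lines: a stretch of residues in one sequence that resembles a stretch in the other appears as a run of dots along a diagonal. As a rough illustration, the sketch below finds the longest run of exactly identical residues along any diagonal, which is the classic longest-common-substring computation and not the method used by the applet. Related protein regions are rarely identical over long stretches, so treat it only as a way of seeing what a diagonal means. The function name `longest_diagonal_run` is hypothetical.

```python
def longest_diagonal_run(seq_a, seq_b):
    """Return (length, start_a, start_b) for the longest run of identical
    residues along any diagonal of the seq_a x seq_b comparison.

    Only exact matches count; real similarity searches allow mismatches.
    """
    best = (0, 0, 0)
    prev = [0] * (len(seq_b) + 1)  # run lengths ending in the previous row
    for i, a in enumerate(seq_a):
        cur = [0] * (len(seq_b) + 1)
        for j, b in enumerate(seq_b):
            if a == b:
                # Extend the run that ended at (i - 1, j - 1).
                cur[j + 1] = prev[j] + 1
                if cur[j + 1] > best[0]:
                    n = cur[j + 1]
                    best = (n, i - n + 1, j - n + 1)
        prev = cur
    return best


if __name__ == "__main__":
    print(longest_diagonal_run("ABCDE", "XXBCDX"))
```

The returned start positions are 0-based offsets into each sequence, so they can be compared against the coordinates of annotated domains after adding 1.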

Questions

Do any of these proteins seem related to each other? What are the common regions? Is this consistent with the repeat and domain architectures annotated in the SwissProt database for these three proteins (links provided)?

http://www.expasy.org/uniprot/LCK_HUMAN

http://www.expasy.org/uniprot/NCK1_HUMAN

http://www.expasy.org/uniprot/VAV_HUMAN