Protein Structure Prediction


Structure Prediction of the H-1 Flagellin of Salmonella typhimurium

Protein sequence (screenshot)

>sp|P06179|FLIC_SALTY Flagellin OS=Salmonella typhimurium GN=fliC PE=1 SV=4

MAQVINTNSLSLLTQNNLNKSQSALGTAIERLSSGLRINSAKDDAAGQAIANRFTANIKG
LTQASRNANDGISIAQTTEGALNEINNNLQRVRELAVQSANSTNSQSDLDSIQAEITQRL
NEIDRVSGQTQFNGVKVLAQDNTLTIQVGANDGETIDIDLKQINSQTLGLDTLNVQQKYK
VSDTAATVTGYADTTIALDNSTFKASATGLGGTDQKIDGDLKFDDTTGKYYAKVTVTGGT
GKDGYYEVSVDKTNGEVTLAGGATSPLTGGLPATATEDVKNVQVANADLTEAKAALTAAG
VTGTASVVKMSYTDNNGKTIDGGLAVKVGDDYYSATQNKDGSISINTTKYTADDGTSKTA
LNKLGGADGKTEVVSIGGKTYAASKAEGHNFKAQPDLAEAAATTTENPLQKIDAALAQVD
TLRSDLGAVQNRFNSAITNLGNTVNNLTSARSRIEDSDYATEVSNMSRAQILQQAGTSVL
AQANQVPQNVLSLLR

Isoelectric point and relative molecular mass calculation

        10         20         30         40         50         60
MAQVINTNSL SLLTQNNLNK SQSALGTAIE RLSSGLRINS AKDDAAGQAI ANRFTANIKG

        70         80         90        100        110        120
LTQASRNAND GISIAQTTEG ALNEINNNLQ RVRELAVQSA NSTNSQSDLD SIQAEITQRL

       130        140        150        160        170        180
NEIDRVSGQT QFNGVKVLAQ DNTLTIQVGA NDGETIDIDL KQINSQTLGL DTLNVQQKYK

       190        200        210        220        230        240
VSDTAATVTG YADTTIALDN STFKASATGL GGTDQKIDGD LKFDDTTGKY YAKVTVTGGT

       250        260        270        280        290        300
GKDGYYEVSV DKTNGEVTLA GGATSPLTGG LPATATEDVK NVQVANADLT EAKAALTAAG

       310        320        330        340        350        360
VTGTASVVKM SYTDNNGKTI DGGLAVKVGD DYYSATQNKD GSISINTTKY TADDGTSKTA

       370        380        390        400        410        420
LNKLGGADGK TEVVSIGGKT YAASKAEGHN FKAQPDLAEA AATTTENPLQ KIDAALAQVD

       430        440        450        460        470        480
TLRSDLGAVQ NRFNSAITNL GNTVNNLTSA RSRIEDSDYA TEVSNMSRAQ ILQQAGTSVL

       490
AQANQVPQNV LSLLR

Protein parameter prediction

Number of amino acids: 495

Molecular weight: 51611.7

Theoretical pI: 4.79

Amino acid composition:

Ala (A)  61  12.3%
Arg (R)  14   2.8%
Asn (N)  42   8.5%
Asp (D)  37   7.5%
Cys (C)   0   0.0%
Gln (Q)  32   6.5%
Glu (E)  17   3.4%
Gly (G)  43   8.7%
His (H)   1   0.2%
Ile (I)  25   5.1%
Leu (L)  42   8.5%
Lys (K)  28   5.7%
Met (M)   3   0.6%
Phe (F)   6   1.2%
Pro (P)   5   1.0%
Ser (S)  38   7.7%
Thr (T)  57  11.5%
Trp (W)   0   0.0%
Tyr (Y)  12   2.4%
Val (V)  32   6.5%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%

(B)   0   0.0%
(Z)   0   0.0%
(X)   0   0.0%

Total number of negatively charged residues (Asp + Glu): 54

Total number of positively charged residues (Arg + Lys): 42
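The percentages and charge totals above follow directly from the residue counts. As a minimal sketch (the dictionary only transcribes the counts reported above, and the variable names are illustrative), the arithmetic can be re-checked like this:

```python
# Residue counts transcribed from the ProtParam output above.
counts = {
    "Ala": 61, "Arg": 14, "Asn": 42, "Asp": 37, "Cys": 0,
    "Gln": 32, "Glu": 17, "Gly": 43, "His": 1, "Ile": 25,
    "Leu": 42, "Lys": 28, "Met": 3, "Phe": 6, "Pro": 5,
    "Ser": 38, "Thr": 57, "Trp": 0, "Tyr": 12, "Val": 32,
}

total = sum(counts.values())                      # number of amino acids
percent = {aa: 100.0 * n / total for aa, n in counts.items()}
negative = counts["Asp"] + counts["Glu"]          # acidic residues
positive = counts["Arg"] + counts["Lys"]          # basic residues

print("residues:", total, "Asp+Glu:", negative, "Arg+Lys:", positive)
for aa, n in counts.items():
    print(f"{aa} {n:3d} {percent[aa]:5.1f}%")
```

The excess of acidic over basic residues is consistent with the acidic theoretical pI reported above.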

Atomic composition:

Carbon    C  2194
Hydrogen  H  3597
Nitrogen  N   641
Oxygen    O   785
Sulfur    S     3

Formula: C2194H3597N641O785S3

Total number of atoms: 7220
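As a rough cross-check, the atom total and the molecular weight can be recomputed from the formula. The sketch below assumes standard average atomic masses rounded to a few decimals, so the result should only be expected to agree with the reported 51611.7 to within a fraction of a dalton:

```python
# Atom counts transcribed from the formula C2194H3597N641O785S3.
atoms = {"C": 2194, "H": 3597, "N": 641, "O": 785, "S": 3}

# Rounded standard average atomic masses in g/mol (an assumption of this sketch).
avg_mass = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}

total_atoms = sum(atoms.values())
mol_weight = sum(n * avg_mass[element] for element, n in atoms.items())

print("total atoms:", total_atoms)
print(f"approximate molecular weight: {mol_weight:.1f}")
```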

Extinction coefficients:

This protein does not contain any Trp residues. Experience shows that this could result in more than 10% error in the computed extinction coefficient.

Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient    17880

Abs 0.1% (=1 g/l)   0.346
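The two values above can be reproduced from the composition. This sketch assumes the commonly cited per-residue molar extinction values at 280 nm (Tyr 1490, Trp 5500, cystine 125 M-1 cm-1) and takes the residue counts and molecular weight from the output above; since the protein has no Cys, the cystine count is zero.

```python
# Counts taken from the amino acid composition above.
n_tyr, n_trp, n_cystine = 12, 0, 0
mol_weight = 51611.7  # reported molecular weight

# Assumed per-residue extinction values at 280 nm (M-1 cm-1).
EXT_TYR, EXT_TRP, EXT_CYSTINE = 1490, 5500, 125

ext_coefficient = n_tyr * EXT_TYR + n_trp * EXT_TRP + n_cystine * EXT_CYSTINE
abs_01_percent = ext_coefficient / mol_weight  # absorbance of a 1 g/l solution

print(ext_coefficient, f"{abs_01_percent:.3f}")
```

Because only tyrosine contributes here, the estimate carries the larger uncertainty that the program warns about for Trp-free proteins.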

Estimated half-life:

The N-terminal of the sequence considered is M (Met).

The estimated half-life is: 30 hours (mammalian reticulocytes, in vitro).
                            >20 hours (yeast, in vivo).
                            >10 hours (Escherichia coli, in vivo).

Instability index:

The instability index (II) is computed to be 24.98

This classifies the protein as stable.

Aliphatic index: 83.86

Grand average of hydropathicity (GRAVY): -0.395
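Both the aliphatic index and GRAVY are simple functions of the composition. The sketch below assumes the usual definitions: aliphatic index = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)) with X in mole percent, and GRAVY as the mean Kyte-Doolittle hydropathy value over all residues. The hydropathy table is the standard Kyte-Doolittle scale; the counts are transcribed from the composition above.

```python
# Residue counts (one-letter codes) transcribed from the ProtParam output.
counts = {
    "A": 61, "R": 14, "N": 42, "D": 37, "C": 0, "Q": 32, "E": 17,
    "G": 43, "H": 1, "I": 25, "L": 42, "K": 28, "M": 3, "F": 6,
    "P": 5, "S": 38, "T": 57, "W": 0, "Y": 12, "V": 32,
}

# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}

n = sum(counts.values())


def mole_percent(aa: str) -> float:
    """Mole percent of one residue type."""
    return 100.0 * counts[aa] / n


aliphatic_index = (mole_percent("A") + 2.9 * mole_percent("V")
                   + 3.9 * (mole_percent("I") + mole_percent("L")))
gravy = sum(counts[aa] * KD[aa] for aa in counts) / n

print(f"aliphatic index: {aliphatic_index:.2f}")
print(f"GRAVY: {gravy:.3f}")
```

A negative GRAVY indicates that the protein is hydrophilic on average.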

Amino acid composition, charge distribution, hydrophobic regions, transmembrane regions, etc.

SAPS. Version of April 11, 1996. Date run: Mon Nov 22 12:41:22 2010

SAPS (Statistical Analysis of Protein Sequences) evaluates by statistical criteria a wide variety of protein sequence properties. A full description of the methods is given in the paper referred to below. The output is organized in the following sections: file name, sequence printout, compositional analysis, charge distributional analysis (charge clusters; high scoring (un)charged segments; charge runs and patterns), distribution of other amino acid types (high scoring hydrophobic and transmembrane segments; cysteine spacings), repetitive structures (in the amino acid alphabet and in a 11-letter reduced alphabet), multiplets (counts, spacings, and clusters in the amino acid and charge alphabets), periodicity analysis, spacing analysis. Each section is annotated below under its section title.

The SAPS program was developed in the group of Prof. Samuel Karlin at Stanford University. Correspondence relating to SAPS should be addressed to either Volker Brendel or Samuel Karlin at the Department of Mathematics, Stanford University, Stanford CA 94305, U.S.A.; phone: (415) 723-2209; fax: (415) 725-2040; email: volker@gnomic.stanford.edu. Users of the program should cite the following reference: Brendel, V., Bucher, P., Nourbakhsh, I., Blaisdell, B.E., Karlin, S. (1992) Methods and algorithms for statistical analysis of protein sequences. Proc. Natl. Acad. Sci. USA 89: 2002-2006.

Protein 1 (File: wwwtmp/.SAPS.19208.6283.seq)

SWISS-PROT ANNOTATION: ID unknown

DE unknown, 549 bases, 502107F4 checksum.

number of residues: 545; molecular weight: 57.1 kdal

1 SPPFLICSAL TYFLAGELLI NSSALMNELL ATYPHIMRIM GNFLICPESV MAQVINTNSL

61 SLLTQNNLNK SQSALGTAIE RLSSGLRINS AKDDAAGQAI ANRFTANIKG LTQASRNAND

121 GISIAQTTEG ALNEINNNLQ RVRELAVQSA NSTNSQSDLD SIQAEITQRL NEIDRVSGQT

181 QFNGVKVLAQ DNTLTIQVGA NDGETIDIDL KQINSQTLGL DTLNVQQKYK VSDTAATVTG

241 YADTTIALDN STFKASATGL GGTDQKIDGD LKFDDTTGKY YAKVTVTGGT GKDGYYEVSV

301 DKTNGEVTLA GGATSPLTGG LPATATEDVK NVQVANADLT EAKAALTAAG VTGTASVVKM

361 SYTDNNGKTI DGGLAVKVGD DYYSATQNKD GSISINTTKY TADDGTSKTA LNKLGGADGK

421 TEVVSIGGKT YAASKAEGHN FKAQPDLAEA AATTTENPLQ KIDAALAQVD TLRSDLGAVQ

481 NRFNSAITNL GNTVNNLTSA RSRIEDSDYA TEVSNMSRAQ ILQQAGTSVL AQANQVPQNV

541 LSLLR
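The printout above lists 545 residues, 50 more than the 495 reported by ProtParam, and flagellin itself starts only at position 51 (MAQVINTNSL...). One plausible explanation, offered here as an assumption rather than something the SAPS output states, is that the FASTA header line was pasted together with the sequence and every header character that is a valid one-letter amino acid code was read as a residue. The sketch below filters the header text this way and compares the result with the first 50 residues of the printout:

```python
# FASTA header of the entry, without the leading '>'.
header = ("sp|P06179|FLIC_SALTY Flagellin OS=Salmonella typhimurium "
          "GN=fliC PE=1 SV=4")

# The 20 standard one-letter amino acid codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# Keep only characters that would be read as residues (case-insensitive).
leaked = "".join(ch.upper() for ch in header if ch.upper() in AMINO_ACIDS)

# First 50 residues of the SAPS sequence printout, blocks joined.
saps_prefix = ("SPPFLICSAL TYFLAGELLI NSSALMNELL ATYPHIMRIM "
               "GNFLICPESV").replace(" ", "")

print("header letters match SAPS prefix:", leaked == saps_prefix)
print("extra residues:", len(leaked), "-> total:", 495 + len(leaked))
```

If this reading is right, the SAPS composition and molecular weight below describe the 545-letter string rather than the 495-residue protein. That would explain, for example, the two cysteines reported by SAPS although ProtParam finds none.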

--------------------------------------------------------------------------------

COMPOSITIONAL ANALYSIS (extremes relative to: swp23s.q)

The composition of the input sequence is evaluated relative to the residue usage quantile table specified with the `-s species' flag. Low usage in the 1% quantile is indicated by the label `--' (e.g., Y-- means that the input sequence uses tyrosine as little as the 1% least tyrosine containing proteins in the reference set); low usage in the 5% quantile is indicated by the label `-' (e.g., L-); high usage above the 95% quantile point is indicated by the label `+' (e.g., A+); and high usage above the 99% quantile point is indicated by the label `++' (e.g., LIVFM++). The usage is evaluated for all 20 amino acids, positive (KR) and negative (ED) charge, total charge (KRED), net charge (KR-ED), major hydrophobics (LVIFM), and the groupings ST, AGP (encoded by CCN, GCN, and GGN codons), and FIKMNY (encoded by AAN, AUN, UAN, and UUN codons).

A : 65(11.9%); C : 2( 0.4%); D : 37( 6.8%); E : 20( 3.7%); F : 9( 1.7%)

G : 45( 8.3%); H- : 2( 0.4%); I : 30( 5.5%); K : 28( 5.1%); L : 51( 9.4%)

M : 6( 1.1%); N+ : 45( 8.3%); P- : 9( 1.7%); Q : 32( 5.9%); R : 15( 2.8%)

S : 43( 7.9%); T+ : 59(10.8%); V : 33( 6.1%); W : 0( 0.0%); Y : 14( 2.6%)

KR : 43 ( 7.9%); ED : 57 ( 10.5%); AGP : 119 ( 21.8%);

KRED : 100 ( 18.3%); KR-ED : -14 ( -2.6%); FIKMNY : 132 ( 24.2%);

LVIFM : 129 ( 23.7%); ST + : 102 ( 18.7%).
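The group totals in the last three lines are sums of the per-residue counts listed just above them. A small sketch (counts transcribed from the SAPS table; the helper name `group` is illustrative) that recomputes them:

```python
# Per-residue counts transcribed from the SAPS compositional analysis.
saps = {
    "A": 65, "C": 2, "D": 37, "E": 20, "F": 9, "G": 45, "H": 2,
    "I": 30, "K": 28, "L": 51, "M": 6, "N": 45, "P": 9, "Q": 32,
    "R": 15, "S": 43, "T": 59, "V": 33, "W": 0, "Y": 14,
}

n = sum(saps.values())  # residues counted by SAPS


def group(letters: str) -> int:
    """Total count over a group of one-letter residue codes."""
    return sum(saps[aa] for aa in letters)


net_charge = group("KR") - group("ED")

for name in ("KR", "ED", "AGP", "FIKMNY", "LVIFM", "ST"):
    total = group(name)
    print(f"{name:7s}: {total:4d} ({100.0 * total / n:5.1f}%)")
print(f"KRED   : {group('KRED'):4d}")
print(f"KR-ED  : {net_charge:4d}")
```

The percentages are relative to the 545 residues SAPS counted, not to the 495 residues used by ProtParam.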

Enzymatic cleavage prediction

PeptideMass

The entered protein is: P06179

The selected enzyme is: Thermolysin

Maximum number of missed cleavages (MC): 0

All cysteines in reduced form.

Methionines have not been oxidized.

Displaying peptides with a mass bigger than 500 Dalton.

Using monoisotopic masses of the occurring amino acid residues and giving peptide masses as [M+H]+.

--------------------------------------------------------------------------------

You have selected P06179 (P06179) from UniProtKB/Swiss-Prot:

Flagellin (Phase 1-I flagellin)

Chain Flagellin at positions 2 - 495 [Theoretical pI: 4.79 / Mw (average mass): 51480.50 / Mw (monoisotopic mass): 51450.00]

mass position #MC modifications peptide sequence

1432.6590 236-249 0 VTGGTGKDGYYEVS

1238.5131 100-111 0 ANSTNSQSDLDS

1130.4783 310-319 0 MSYTDNNGKT

1109.4786 223-231 0 FDDTTGKYY

962.4789 250-258 0 VDKTNGEVT

844.4774 156-162 0 IDIDLKQ

840.4461 345-351 0 INTTKYT

830.4003 82-88 0 LNEINNN

820.3795 335-342 0 ATQNKDGS

818.3203 328-334 0 VGDDYYS

794.3526 352-359 0 ADDGTSKT

793.4566 175-180 0 VQQKYK

776.3784 275-281 0 ATEDVKN

770.3791 483-489 0 ANQVPQN

759.3995 120-125 0 LNEIDR

747.3883 287-293 0 ADLTEAK

741.2937 454-459 0 IEDSDY

733.3363 402-408 0 ATTTENP

719.3570 367-373 0 ADGKTEV

718.3730 210-216 0 LGGTDQK

717.3890 114-119 0 AEITQR

676.3624 18-23 0 LNKSQS

660.3675 422-427 0 LRSDLG

660.3562 217-222 0 IDGDLK

638.3508 376-381 0 IGGKTY

620.2886 460-465 0 ATEVSN

619.3046 126-131 0 VSGQTQ

606.2729 75-80 0 AQTTEG

606.2365 150-155 0 ANDGET

589.2940 13-17 0 LTQNN

562.2831 163-167 0 INSQT

557.3293 267-272 0 LTGGLP

549.2515 198-202 0 LDNST

548.2675 5-9 0 INTNS

548.2311 139-143 0 AQDNT

543.2773 393-397 0 AQPDL

527.2208 386-390 0 AEGHN

519.2409 41-45 0 AKDDA

516.3140 92-95 0 VREL

516.2889 429-432 0 VQNR
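Each mass in the list is the monoisotopic [M+H]+ value of an unmodified peptide: the sum of its residue masses plus one water and one proton. The sketch below uses standard monoisotopic residue masses rounded to five decimals (the rounding is an assumption of this example), so it should agree with the listed values only to about 0.001 Da:

```python
# Monoisotopic residue masses in Da (rounded standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added for the free N- and C-termini
PROTON = 1.00728   # charge carrier for [M+H]+


def mh_plus(peptide: str) -> float:
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


# Three peptides taken from the list above.
for peptide in ("VQNR", "VREL", "AKDDA"):
    print(peptide, f"{mh_plus(peptide):.4f}")
```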

53.8% of sequence covered (you may modify the input parameters to display also peptides < 500 Da or > 100000000000 Da):

        10         20         30         40         50         60
 aqvINTNSl slLTQNNLNK SQSalgtaie rlssglrins AKDDAagqai anrftanikg

        70         80         90        100        110        120
ltqasrnand gisiAQTTEG aLNEINNNlq rVRELavqsA NSTNSQSDLD SiqAEITQRL

       130        140        150        160        170        180
NEIDRVSGQT QfngvkvlAQ DNTltiqvgA NDGETIDIDL KQINSQTlgl dtlnVQQKYK

       190        200        210        220        230        240
vsdtaatvtg yadttiaLDN STfkasatgL GGTDQKIDGD LKFDDTTGKY YakvtVTGGT

       250        260        270        280        290        300
GKDGYYEVSV DKTNGEVTla ggatspLTGG LPatATEDVK NvqvanADLT EAKaaltaag

       310        320        330        340        350        360
vtgtasvvkM SYTDNNGKTi dgglavkVGD DYYSATQNKD GSisINTTKY TADDGTSKTa

       370        380        390        400        410        420
lnklggADGK TEVvsIGGKT YaaskAEGHN fkAQPDLaea aATTTENPlq kidaalaqvd

       430        440        450        460        470        480
tLRSDLGaVQ NRfnsaitnl gntvnnltsa rsrIEDSDYA TEVSNmsraq ilqqagtsvl

       490
aqANQVPQNv lsllr
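The coverage figure can be recomputed from the position ranges in the peptide list. The sketch below collects every covered position and divides by the length of the chain. Which denominator PeptideMass uses is an assumption here: dividing by the 494 residues of the chain at positions 2 - 495 gives about 53.8%, whereas dividing by all 495 residues gives about 53.7%.

```python
# (start, end) positions transcribed from the peptide list, in listed order.
ranges = [
    (236, 249), (100, 111), (310, 319), (223, 231), (250, 258),
    (156, 162), (345, 351), (82, 88), (335, 342), (328, 334),
    (352, 359), (175, 180), (275, 281), (483, 489), (120, 125),
    (287, 293), (454, 459), (402, 408), (367, 373), (210, 216),
    (114, 119), (18, 23), (422, 427), (217, 222), (376, 381),
    (460, 465), (126, 131), (75, 80), (150, 155), (13, 17),
    (163, 167), (267, 272), (198, 202), (5, 9), (139, 143),
    (393, 397), (386, 390), (41, 45), (92, 95), (429, 432),
]

# Union of all covered positions (a set guards against double counting).
covered = set()
for start, end in ranges:
    covered.update(range(start, end + 1))

chain_length = 495 - 2 + 1  # chain at positions 2 - 495 (assumed denominator)
coverage = 100.0 * len(covered) / chain_length

print(len(covered), "of", chain_length, f"residues covered: {coverage:.1f}%")
```

The covered positions correspond to the upper-case stretches in the map above; the remaining residues fall in peptides lighter than the 500 Da display threshold.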

Secondary structure prediction (partial results, screenshot)

Transmembrane segment prediction

Signal peptide and cleavage site prediction

SignalP-NN result:


>sp_P06179_FLIC_SALTY length = 495

# Measure  Position  Value  Cutoff  signal peptide?
  max. C   484       0.055  0.52    NO
  max. Y   15        0.094  0.33    NO
  max. S   4         0.919  0.92    NO
  mean S   1-14      0.540  0.49    YES
  D        1-14      0.317  0.44    NO

# Most likely cleavage site between pos. 14 and 15: LLT-QN
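The yes/no column can be read as a comparison of each value with its cutoff. In this version of SignalP the D score is, to the best of my knowledge, the average of the mean S and max. Y scores, which matches the numbers above. The sketch below re-derives both under that assumption, with values transcribed from the table:

```python
# (measure, value, cutoff) transcribed from the SignalP-NN table.
scores = [
    ("max. C", 0.055, 0.52),
    ("max. Y", 0.094, 0.33),
    ("max. S", 0.919, 0.92),
    ("mean S", 0.540, 0.49),
]

values = {name: value for name, value, _ in scores}

# Assumed definition: D is the mean of the "mean S" and "max. Y" scores.
d_score = (values["mean S"] + values["max. Y"]) / 2
scores.append(("D", round(d_score, 3), 0.44))

# A measure votes for a signal peptide only when it exceeds its cutoff.
calls = {name: ("YES" if value > cutoff else "NO")
         for name, value, cutoff in scores}

for name, value, cutoff in scores:
    print(f"{name:7s} {value:.3f} {cutoff:.2f} {calls[name]}")
```

Only the mean S score exceeds its cutoff. The D score is below its cutoff, which agrees with the HMM result that follows.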

SignalP-HMM result:


>sp_P06179_FLIC_SALTY

Prediction: Non-secretory protein

Signal peptide probability: 0.000

Max cleavage site probability: 0.000 between pos. -1 and 0


Coiled-coil prediction

Three-dimensional structure prediction results (series of screenshots)
