CONSENSUS SEQUENCES



Methodology

To compute consensus sequences, we used the UniProtKB reviewed (Swiss-Prot) FASTA file. We filtered protein sequences by organism to compute the background relative abundance of each amino acid in that organism. For each position from -5 to +5 around the modified residue, we then computed the observed abundance of each amino acid in O-GlcNAcylated segments and took the log2 of the ratio of observed to background abundance. At position 0 (the modified residue: S, T, or S/T, depending on the logo), the output was normalized to the plot ceiling. We used the logomaker Python library to generate the sequence logos.
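The background-versus-observed computation described above can be sketched in Python. This is a minimal sketch, not the pipeline that produced these logos. The function names, the exact species match on the `OS=` field of UniProt FASTA headers, the 0.5 pseudocount, the ceiling value of 4.0, and the reading of "normalized to the plot ceiling" as rescaling the central column so its heights sum to the ceiling are all assumptions. Input windows are assumed to be 11-residue strings centered on the modified S/T.

```python
import math
import re
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues; other characters are ignored
OS_FIELD = re.compile(r"\bOS=(.+?) OX=")  # organism name in a UniProt FASTA header


def background_frequencies(fasta_text, organism):
    """Relative abundance of each amino acid over all entries whose OS= field
    equals `organism` (an exact species-name match is an assumption)."""
    counts = Counter()
    keep = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            match = OS_FIELD.search(line)
            keep = match is not None and match.group(1) == organism
        elif keep:
            counts.update(c for c in line.strip() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"no sequences found for {organism!r}")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def enrichment_matrix(windows, background, flank=5, pseudocount=0.5, ceiling=4.0):
    """Map each position -flank..+flank to {residue: height}.

    Off-center heights are log2(observed / background); the pseudocount keeps
    residues absent at a position finite (and negative). Assumes every residue
    occurs in the background. The center column is rescaled so its heights sum
    to `ceiling` (a hypothetical reading of "normalized to the plot ceiling")."""
    width = 2 * flank + 1
    if any(len(w) != width for w in windows):
        raise ValueError(f"every window must have length {width}")
    matrix = {}
    for pos in range(-flank, flank + 1):
        column = Counter(w[pos + flank] for w in windows if w[pos + flank] in AMINO_ACIDS)
        n = sum(column.values())
        if pos == 0:
            # Central S/T: observed fractions scaled to the plot ceiling.
            matrix[pos] = {aa: ceiling * k / n for aa, k in column.items()} if n else {}
            continue
        denom = n + pseudocount * len(AMINO_ACIDS)
        matrix[pos] = {
            aa: math.log2((column[aa] + pseudocount) / denom / background[aa])
            for aa in AMINO_ACIDS
        }
    return matrix
```

To draw a logo from the result, one option is `pandas.DataFrame.from_dict(matrix, orient="index").fillna(0)`, which puts positions on the rows and residues on the columns, the layout that `logomaker.Logo` takes as input.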

Homo (n=9340)

Consensus ST Homo
Consensus S Homo
Consensus T Homo

Mus (n=1392)

Consensus ST Mus
Consensus S Mus
Consensus T Mus

Rattus (n=409)

Consensus ST Rattus
Consensus S Rattus
Consensus T Rattus

Caenorhabditis (n=66)

Consensus ST Caenorhabditis
Consensus S Caenorhabditis
Consensus T Caenorhabditis