HOW TO CITE



Please acknowledge the Olivier-Van Stichelen lab at the Medical College of Wisconsin, Department of Biochemistry, when citing this database.


For the human and other O-GlcNAcome repositories, please cite:

Wulff-Fuentes E, Berendt RR, Massman L, Danner L, Malard F, Vora J, Kahsay R, and Olivier-Van Stichelen S, The Human O-GlcNAcome Database and Meta-Analysis. Scientific Data 2021 8(1):25

For the software development of the O-GlcNAc Database and the Python package utilsovs, please cite:

Malard F, Wulff-Fuentes E, Berendt R, Didier G, and Olivier-Van Stichelen S, Automatization and self-maintenance of the O-GlcNAcome catalogue: A Smart Scientific Database. Database (Oxford). 2021 2021.


Disclaimer

We automatically check and manually validate that all O-GlcNAcylation sites in the O-GlcNAc Database are consistent with the parent protein sequence.

For this quality control, the canonical protein sequence is used first. If a mismatch is detected, which happens for less than 1% of sites in the O-GlcNAc Database, the alternative protein sequences are then tested.

When present, those sites are specifically labeled on the website with a link to the relevant isoform search result.

The terms canonical and alternative, as well as the related identifiers and sequences, were sourced from the UniProtKB database.




THE O-GLCNAC SCORE



The O-GlcNAc score (S) is a quantifier we developed to estimate how exhaustively each entry in the database is covered, as a function of the available O-GlcNAc literature.

S(x) = R(x)^{norm} + C(x)^{norm} + T(x)^{norm} + fA(x)^{norm} + lA(x)^{norm} + B(x)^{norm}

Briefly, the O-GlcNAc score of a given entry, S(x), is the sum of normalized factors, each describing a particular aspect of the literature regarding that entry.

Each factor is a function and contributes in the [0,1] interval to establish the global score S, which has a maximal theoretical value of six (scaled up to 100 in the Explore panel).
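
As a minimal sketch of the arithmetic described above, assuming the six factors have already been normalized to the [0, 1] interval (the normalization itself is not reproduced here), the score and its rescaling could be computed as follows. The function name `oglcnac_score`, the example values and the linear rescaling to 100 are illustrative assumptions, not the code used by the database.

```python
# Factor names as they appear in the formula for S(x)
FACTORS = ("R", "C", "T", "fA", "lA", "B")


def oglcnac_score(normalized, scale_to=100.0):
    """Return (S, rescaled S) from six factors already normalized to [0, 1]."""
    for name in FACTORS:
        value = normalized[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    raw = sum(normalized[name] for name in FACTORS)
    # Assumption: the theoretical maximum (6) is mapped linearly onto 100
    return raw, raw / len(FACTORS) * scale_to


# Hypothetical normalized factors for one entry
example = {"R": 1.0, "C": 0.5, "T": 0.5, "fA": 0.25, "lA": 0.25, "B": 0.5}
raw, scaled = oglcnac_score(example)
print(raw, scaled)  # 3.0 50.0
```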

Those factors represent:

More details are available in the above-mentioned papers or on request.




PEPTIDE DIGEST TOOL



We provide full and partial digestion products for all the entries in the O-GlcNAc Database and several common proteases.

This includes the average and monoisotopic masses of each peptide, in the absence or presence of O-GlcNAcylation or phosphorylation.

Below are the protease cleavage sites and the molecular weight information we use.

## Pattern matching uses regular expressions with the Python 3.7 re.finditer() method; the double look-behind preserves overlapping matches.
proteases = {
    'Trypsin': ['bovine', r'(?<=[KR])(?<=[A-Z])'],  # Cut after K or R residues
    'Chymotrypsin': ['bovine', r'(?<=[YFW])(?<=[A-Z])'],  # Cut after Y, F or W residues
    'Arg-C': ['mouse-submaxillary-gland', r'(?<=[R])(?<=[A-Z])'],  # Cut after R residues
    'Glu-C': ['Staphylococcus-aureus', r'(?<=[E])(?<=[A-Z])'],  # Cut after E residues
    'Lys-C': ['Lysobacter-enzymogenes', r'(?<=[K])(?<=[A-Z])'],  # Cut after K residues
    'Pepsin': ['porcine', r'(?<=[GAVLIPMYFW])(?<=[A-Z])'],  # Cut after G, A, V, L, I, P, M, Y, F, W residues
    'Thermolysin': ['Bacillus-thermo-proteolyticus', r'(?<=[LFIVMA])(?<=[A-Z])'],  # Cut after L, F, I, V, M, A residues
    'Elastase': ['porcine', r'(?<=[AVSGLI])(?<=[A-Z])']  # Cut after A, V, S, G, L, I residues
}

## Dictionary of monoisotopic (index 0) and average (index 1) masses (see https://web.expasy.org/findmod/findmod_masses.html#AA)
monoisotopic_average_mass = {
    'A': [71.03711, 71.0788],
    'R': [156.10111, 156.1875],
    'N': [114.04293, 114.1038],
    'D': [115.02694, 115.0886],
    'C': [103.00919, 103.1388],
    'E': [129.04259, 129.1155],
    'Q': [128.05858, 128.1307],
    'G': [57.02146, 57.0519],
    'H': [137.05891, 137.1411],
    'I': [113.08406, 113.1594],
    'L': [113.08406, 113.1594],
    'K': [128.09496, 128.1741],
    'M': [131.04049, 131.1926],
    'F': [147.06841, 147.1766],
    'P': [97.05276, 97.1167],
    'S': [87.03203, 87.0782],
    'T': [101.04768, 101.1051],
    'W': [186.07931, 186.2132],
    'Y': [163.06333, 163.1760],
    'V': [99.06841, 99.1326],
    'U': [150.953636, 150.0388],
    'O': [237.147727, 237.3018],
    'water': [18.01056, 18.01524],  # MW peptide = sum of the residue MWs + MW water
    'oglcnac': [203.0794, 203.1950],
    'phospho': [79.9663, 79.9799]
}
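
To illustrate how the mass table above can be used, here is a minimal sketch that sums residue masses, adds one water, and adds the mass of each modification. The function name `peptide_mass` and its parameters are hypothetical. The sketch also assumes that each O-GlcNAc or phosphate group simply adds its listed mass; it is not the code running on the server. Only the residues needed for the example peptide are copied from the table.

```python
# Subset of the mass table above: [monoisotopic, average]
MASS = {
    'A': [71.03711, 71.0788], 'W': [186.07931, 186.2132],
    'P': [97.05276, 97.1167], 'E': [129.04259, 129.1155],
    'T': [101.04768, 101.1051], 'I': [113.08406, 113.1594],
    'D': [115.02694, 115.0886], 'S': [87.03203, 87.0782],
    'water': [18.01056, 18.01524],
    'oglcnac': [203.0794, 203.1950],
    'phospho': [79.9663, 79.9799],
}


def peptide_mass(peptide, n_oglcnac=0, n_phospho=0):
    """Return (monoisotopic, average) mass of a peptide with optional modifications."""
    masses = []
    for idx in (0, 1):  # 0 = monoisotopic, 1 = average
        total = sum(MASS[residue][idx] for residue in peptide)
        total += MASS['water'][idx]  # one water per peptide
        # Assumption: each modification adds its listed mass
        total += n_oglcnac * MASS['oglcnac'][idx]
        total += n_phospho * MASS['phospho'][idx]
        masses.append(total)
    return tuple(masses)


mono, avg = peptide_mass('AWPEPTIDES')
print(f"{mono:.4f} {avg:.4f}")  # 1143.5084 1144.2029
mono_glc, avg_glc = peptide_mass('AWPEPTIDES', n_oglcnac=1)
print(f"{mono_glc:.4f}")  # 1346.5878
```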

The total digestion products list contains the smallest peptides that could be obtained with a given protease (i.e. 100% cleavage at all sites).

The partial digestion products list contains all possible combinations of adjacent peptides from the above-mentioned list.

## Example with the Trypsin protease and an imaginary full-length protein sequence:
                ILIKEGLCNACRAWPEPTIDES

                # Total digestion products list
                ILIK
                EGLCNACR
                AWPEPTIDES

                # Partial digestion products list
                ILIK
                EGLCNACR
                AWPEPTIDES
                ILIKEGLCNACR
                EGLCNACRAWPEPTIDES
                ILIKEGLCNACRAWPEPTIDES
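
The total digestion list in the example above can be reproduced with the Trypsin pattern and `re.finditer()`. The helper below is only a sketch: the name `full_digestion` and the way cut positions are turned into peptides are assumptions, since the server-side code for this step is not shown here.

```python
import re

# Trypsin pattern copied from the proteases dictionary
TRYPSIN = r'(?<=[KR])(?<=[A-Z])'


def full_digestion(sequence, pattern):
    """Split a sequence at every zero-width match of a cleavage pattern."""
    cuts = [match.start() for match in re.finditer(pattern, sequence)]
    # Ignore a cut at the very end of the sequence, which would give an empty peptide
    bounds = [0] + [c for c in cuts if 0 < c < len(sequence)] + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


peptides = full_digestion('ILIKEGLCNACRAWPEPTIDES', TRYPSIN)
print(peptides)  # ['ILIK', 'EGLCNACR', 'AWPEPTIDES']
```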
            

We use the code below to generate partial digestion products from the total digestion products list.

def partial_digestion_products(full_digestion_products, full_digestion_index, length_sequence):
    # Number of peptides upon full digestion
    length_full_fragment_set = len(full_digestion_products)
    # Output list for partial products
    partial_digestion_products = []
    # We start from each peptide in full_digestion_products
    for start in range(length_full_fragment_set):
        # We extend one peptide at a time
        for end in range(start + 1, length_full_fragment_set + 1):
            # We compute the residue indices to store in the output list
            window = full_digestion_index[start:end]
            if len(window) > 1:
                index_tuple = (window[0], window[-1])
            elif len(window) == 1 and window[0] == full_digestion_index[-1]:
                index_tuple = (int(full_digestion_index[-1]), length_sequence - 1)
            elif len(window) == 1 and window[0] == full_digestion_index[0]:
                index_tuple = (0, int(full_digestion_index[0]))
            # We append the partial digestion product together with the residue indices computed just above
            partial_digestion_products.append([''.join(full_digestion_products[start:end]), index_tuple])
    # We return every run of adjacent peptides with its index tuple
    return partial_digestion_products

Due to limited server resources, please note that you may face timeout issues in partial digestion mode with poorly selective proteases and large proteins.
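
To give a rough sense of why this mode can be slow: with n peptides in the total digestion list, there are n(n+1)/2 runs of adjacent peptides, so the partial list grows quadratically with the number of cleavage sites. The short sketch below, whose function names are illustrative only, compares this count with a direct enumeration of (start, end) pairs.

```python
def count_partial_products(n_peptides):
    """Number of runs of adjacent peptides among n fully digested peptides."""
    return n_peptides * (n_peptides + 1) // 2


def enumerate_runs(n_peptides):
    """List every (start, end) pair visited by a nested start/end loop."""
    return [(start, end)
            for start in range(n_peptides)
            for end in range(start + 1, n_peptides + 1)]


for n in (3, 100, 1000):
    print(n, count_partial_products(n))
# 3 6
# 100 5050
# 1000 500500
```

The value for n = 3 matches the six partial digestion products listed in the Trypsin example.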

For custom requests, the addition of a specific protease to the menu list, or any other comment, please email us.




TECHNOLOGY



The O-GlcNAc Database relies on the non-relational database management system MongoDB and is based on the Django web framework for rendering.

Backend processes were all developed using the Python programming language (v3.7) and the pymongo library for database server-client interactions.

Debian-based GNU/Linux systems with Gunicorn (Python HTTP server) and Nginx (SSL/reverse proxy) were used for the development and production of the O-GlcNAc Database.

The whole front-end HTML 5.0 code of the O-GlcNAc Database was validated with no errors or warnings using the W3C markup validation service.

Please report any bug or malfunction you find when browsing our content to admin@oglcnac.com.




CHANGE LOGS & COMPATIBILITY



Compatibility of the O-GlcNAc Database version 1.3 and known bugs

Jan-31-23: The O-GlcNAc Database version 1.3 New!
Apr-29-21: The O-GlcNAc Database version 1.2
Feb-21-21: The O-GlcNAc Database version 1.1
Dec-25-20: The O-GlcNAc Database version 1.0
Nov-16-20: Beta version
Nov-12-20: On the Web
Nov-8-20: Testing version