Identifying protein function is an ongoing task in bioinformatics because of its applications in fields such as human healthcare. Automated protein annotation methods are being developed to close the gap between annotated proteins and the new sequences obtained by Next Generation Sequencing techniques. The most widely used resource for describing protein function is the Gene Ontology (GO), which provides 40,000 terms describing gene product functions. Protein sequence is the most abundant resource available, yet predicting function from sequence alone remains a challenging problem. Here we review GO-based protein function annotation and propose models that use pretrained protein sequence embeddings to predict GO terms. Our approach places no limit on sequence length, trains quickly, and delivers near-best performance in the Cellular Component ontology, while achieving competitive scores in the Molecular Function and Biological Process ontologies compared to baselines and other deep learning-based methods.