hPP Corpus: A Tagged Biomedical Corpus for Automatic Extraction of Human Protein Phosphorylation for Understanding Cellular Functions

doi:10.23880/jes-16000140

Journal of Embryology & Stem Cell Research (JES)
ISSN: 2640-2637

Research Article

Volume 4, Issue 1

Abstract

Proteins perform their functions by interacting with other proteins. Phosphorylation is a post-translational modification of proteins and plays an important role in cellular functions. Protein interaction and phosphorylation play a critical role in biological functions and can indicate disease states, including cancer, Alzheimer’s disease and Parkinson’s disease. Mining protein phosphorylation information from the biomedical literature is a highly challenging topic of interest in biomedical text mining. Text mining researchers apply a variety of algorithms to extract such information, and a standard annotated corpus is necessary to evaluate the performance of these algorithms. However, to the best of our knowledge, no standard annotated corpus is available for evaluating approaches to extracting protein phosphorylation information specific to humans. The available corpora, namely iProLink, the PTM (Post-Translational Modification) phosphorylation extraction corpus and the protein phosphorylation corpus from the Protein Information Resource (PIR), are not specific to humans. In this paper, we present a corpus called the ‘hPP (human Protein Phosphorylation) corpus’, devoted exclusively to human protein phosphorylation information. The current version of the hPP corpus contains 2,380 sentences from 1,000 MEDLINE abstracts related to human protein phosphorylation. The corpus is annotated with named entities, event relationships and syntactic dependencies, and is freely available at http://www.biominingbu.org/hPPcorpus/hPP_corpus.xml. To the best of our knowledge, the hPP corpus is the first annotated corpus available for evaluating text mining systems that extract human protein phosphorylation information from MEDLINE abstracts.

Keywords: Cellular Function; Protein Phosphorylation; Post-Translational Modification; Text Mining; Information Extraction; Named Entity Recognition
