In many fields of molecular
biology, especially in signal transduction,
thousands of new results are published each
year, and relationships between the information
in different articles are often not immediately
apparent, even if a researcher manages to read
all relevant published articles. Medline at
NCBI contains over 10 million abstracts, and
approximately 40,000 new abstracts are added
each month. Although there are growing numbers
of sequence databases and other hand-constructed
databases, most new information is unstructured
text in Medline and full-text journals. Biological
literature mining can be useful in accomplishing
the following tasks: identification of the names
of biological entities, identification of various
relationships among biological entities, and identification
of the status of biological discoveries stated
in literature and web pages.
Currently, our groups work on biological literature
mining, including biological named entity
recognition and biological relation extraction.
We have designed two named entity recognition
systems, one dictionary-based and the other
based on machine learning. For relation
extraction, we have implemented a pattern-based
system that extracts Gene-Disease and Protein-Protein
relations. We believe that these systems can facilitate
biologists' paper-reading efforts.
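
To make the dictionary-based and pattern-based ideas above concrete,
the following is a minimal sketch only. It is not the code of the
systems described here: the dictionary entries, trigger phrases, and
the function names tag_entities and extract_relations are all
assumptions made for illustration.

```python
import re

# Toy dictionary (illustrative entries only, not an actual lexicon).
ENTITY_DICT = {
    "brca1": "GENE",
    "p53": "PROTEIN",
    "mdm2": "PROTEIN",
    "breast cancer": "DISEASE",
}

# Hypothetical trigger phrases for each pair of entity types.
PATTERNS = {
    ("GENE", "DISEASE"): re.compile(r"\b(associated with|causes|linked to)\b"),
    ("PROTEIN", "PROTEIN"): re.compile(
        r"\b(binds( to)?|interacts with|phosphorylates)\b"
    ),
}


def tag_entities(sentence):
    """Dictionary-based NER: greedy longest match over word n-grams."""
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+", sentence)]
    max_len = max(len(key.split()) for key in ENTITY_DICT)
    entities = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            surface = sentence[start:end]
            label = ENTITY_DICT.get(surface.lower())
            if label:
                entities.append((surface, label, start, end))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return entities


def extract_relations(sentence):
    """Pattern-based extraction: look for a trigger phrase between
    two adjacent entities whose types form a known pair."""
    entities = tag_entities(sentence)
    relations = []
    for left, right in zip(entities, entities[1:]):
        pattern = PATTERNS.get((left[1], right[1]))
        if pattern is None:
            continue
        between = sentence[left[3]:right[2]]
        match = pattern.search(between)
        if match:
            relations.append((left[0], match.group(1), right[0]))
    return relations


if __name__ == "__main__":
    print(extract_relations("BRCA1 is associated with breast cancer."))
    print(extract_relations("MDM2 binds to p53."))
```

A real dictionary would be far larger and would need to handle
spelling variants and ambiguous names, which is one reason a machine
learning tagger can complement the dictionary lookup.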
Demo site URL: http://bioinformatics.iis.sinica.edu.tw/BioLiteratureMining/