
Tuesday, June 24, 2008

DNA Discovery & Protein Synthesis

Structure of DNA

Molecular biologists realized that in order to learn how DNA might reproduce itself and transmit inherited information, they needed to discover the structure of the DNA molecule. They would have to work “blindfolded,” in the sense that earlier studies had provided very few clues to guide them. The researchers knew that each DNA molecule contained many copies of the four types of bases, small molecules called adenine, cytosine, guanine, and thymine. The molecule also included at least one “backbone,” a long string of identical, alternating sugar and phosphate molecules. X-ray crystallography, a technique that helped chemists analyze the shape of molecules, suggested that the backbone was shaped like a coil, or helix. Austrian-born biochemist Erwin Chargaff had shown in the late 1940s that the amount of cytosine in a DNA molecule was always the same as the amount of guanine, and the same was true of adenine and thymine. However, no one knew how many backbone strands each molecule of DNA contained or how the backbones and bases were arranged within the molecule.
James Watson and Francis Crick deduced in 1953 that each molecule of deoxyribonucleic acid (DNA) is made up of two “backbones” composed of alternating smaller molecules of phosphate (P) and deoxyribose (D), a sugar. The backbones both have the shape of a helix, or coil, and they twine around each other. Inside the backbones, like rungs on a ladder, are four kinds of smaller molecules called bases. The bases always exist in pairs, connected by hydrogen bonds. Adenine (A) always pairs with thymine (T), and cytosine (C) always pairs with guanine (G).

How DNA Replicates

DNA’s structure explains its power to duplicate itself. When a cell prepares to divide, the hydrogen bonds between the bases dissolve and the DNA molecule splits along its length like a zipper unzipping. Each half then attracts bases and backbone pieces from among the molecules in the cell, forming the same pairs of bases that had existed before. The result is two identical DNA molecules.

If DNA carried hereditary information, Crick and Watson said, DNA molecules had to be able to reproduce themselves when chromosomes duplicated during cell division. The two men believed that the key to DNA’s reproduction lay in the molecule’s mirror-image structure. Just before a cell divides, they proposed, the weak hydrogen bonds between the pairs of bases in its DNA molecules break. Each molecule then splits lengthwise, like a zipper unzipping. Each base attracts its pair mate, complete with an attached backbone segment, from among free-floating materials in the cell nucleus. An adenine molecule always attracts a thymine and vice versa, and the same for cytosine and guanine. When the process is complete, the nucleus contains two identical double-stranded DNA molecules for every one that had existed before. The cell now splits, and each of the two daughter cells receives a complete copy of the original cell’s DNA. Experiments later confirmed this theory.
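The base-pairing rules make this copying scheme easy to express in code. Below is a minimal Python sketch with an invented sequence; strand directionality and the cellular machinery are deliberately ignored.

```python
# Base-pairing rule: adenine-thymine, cytosine-guanine.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Return the strand that base pairing would assemble opposite this one."""
    return "".join(PAIR[base] for base in strand)

original = "ATGCCGTA"                          # invented sequence
strand_1, strand_2 = original, complement(original)

# After the molecule "unzips", each half templates a new partner,
# yielding two identical double-stranded copies.
copy_1 = (strand_1, complement(strand_1))
copy_2 = (complement(strand_2), strand_2)
assert copy_1 == copy_2
```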

Use of DNA to make Protein



As a first step in making a protein, part of a DNA molecule (a gene) uses itself as a pattern to form a matching stretch of messenger RNA (mRNA). When the messenger RNA moves into the cytoplasm of the cell, it attracts matching short stretches of transfer RNA (tRNA), each of which tows a single amino acid molecule. With the help of an organelle called a ribosome, the transfer RNA molecules lock onto the matching parts of the messenger RNA, and the amino acids they carry are joined, forming a protein.

Crick and Brenner suggested that DNA makes a copy of itself in the form of RNA (ribonucleic acid), which is like DNA except that it has a different kind of sugar in its backbones, and in place of thymine it has a different base, uracil. DNA normally cannot leave a cell’s nucleus, but its RNA copy, which came to be called messenger RNA, can travel into the cytoplasm, the jellylike material that makes up the outer part of the cell. In the cytoplasm, Crick and Brenner said, the messenger RNA encounters small bodies called ribosomes. A ribosome rolls along the messenger RNA molecule and attracts from the cytoplasm the amino acid represented by each three-base “letter” of the translated DNA code. Crick believed that what he called adapter molecules (later called transfer RNA) tow the amino acids to the correct spots on the messenger RNA. The amino acids then join together, forming the protein. The messenger RNA and the ribosome release the protein molecule into the cell. Brenner and other researchers in the early 1960s proved that this theory was essentially correct.
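The information flow Crick and Brenner described, from DNA through messenger RNA to protein, is easy to illustrate in code. Below is a hedged Python sketch: the codon table is a five-entry excerpt of the real 64-codon genetic code, the "gene" is invented, and the sketch uses the common shortcut of transcribing the coding strand directly (T to U) rather than modeling the template strand.

```python
# Toy sketch of the DNA -> mRNA -> protein flow described above.

CODON_TABLE = {  # mRNA codon -> amino acid (tiny excerpt of the real table)
    "AUG": "Met", "UUC": "Phe", "AAA": "Lys", "GGC": "Gly", "UAA": "STOP",
}

def transcribe(dna_coding_strand):
    """Transcription: the mRNA copy has uracil (U) in place of thymine (T)."""
    return dna_coding_strand.replace("T", "U")

def translate(mrna):
    """Translation: read three-base codons in order until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein

gene = "ATGTTCAAAGGCTAA"            # an invented 5-codon "gene"
print(translate(transcribe(gene)))  # ['Met', 'Phe', 'Lys', 'Gly']
```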
Ref: Modern Genetics: Engineering Life

Sunday, June 8, 2008

Biology And Computer Science

One of the most exciting things about being involved in computer programming and biology is that both fields are rich in new techniques and results. Of course, biology is an old science, but many of the most interesting directions in biological research are based on recent techniques and ideas. The modern science of genetics, which has earned a prominent place in modern biology, is just about 100 years old, dating from the widespread acknowledgement of Mendel's work. The elucidation of the structure of deoxyribonucleic acid (DNA) and the first protein structure are about 50 years old, and the polymerase chain reaction (PCR) technique of cloning DNA is almost 20 years old. The last decade saw the launching and completion of the Human Genome Project that revealed the totality of human genes and much more. Today, we're in a golden age of biological research—a point in human history of great medical, scientific, and philosophical importance.

Computer science is relatively new. Algorithms have been around since ancient times (Euclid), and the interest in computing machinery is also antique (Pascal's mechanical calculator, for instance, or Babbage's steam-driven inventions of the 19th century). But programming was really born about 50 years ago, at the same time as construction of the first large, programmable, digital/electronic computers (the ENIAC). Programming has grown very rapidly to the present day. The Internet is about 20 years old, as are personal computers; the Web is about 10 years old. Today, our communications, transportation, agricultural, financial, government, business, artistic, and of course, scientific endeavors are closely tied to computers and their programming. This rapid and recent growth gives the field of computer programming a certain excitement and requires that its professional practitioners keep on their toes. In a way, programming represents procedural knowledge—the knowledge of how to do things—and one way to look at the importance of computers in our society and our history is to see the enormous growth in procedural knowledge that the use of computers has occasioned. We're also seeing the concepts of computation and algorithm being adopted widely, for instance, in the arts and in the law, and of course in the sciences. The computer has become the ruling metaphor for explaining things in general. Certainly, it's tempting to think of a cell's molecular biology in terms of a special kind of computing machinery. Similarly, the remarkable discoveries in biology have found an echo in computer science. There are evolutionary programs, neural networks, simulated annealing, and more. The exchange of ideas and metaphors between the fields of biology and computer science is, in itself, a spur to discovery (although the dangers of using an improper metaphor are also real).

Monday, May 12, 2008

In silico medicine: computer simulations aid drug development and medical care

Like millions of people in the United States, Bill and Allen have asthma. They're lucky enough to take the newest therapies, sometimes even before the drugs come to market. Yet neither Bill nor Allen has ever been to a doctor's office or the hospital. After all, it's okay if they get sick or even die--a simple click of a mouse can restore these patients to perfect health. Bill and Allen are two of the newest subjects in a growing research area called in silico biology. As basic researchers, drug developers, doctors, and health-care administrators struggle with the sheer volume and complexity of scientific information that they face every day, information gathered from computer-generated patients like Bill and Allen may lead to better decisions.

Computer simulations aren't new in nonmedical research and development. Much of climatology and geology is modeled with computers, and engineers routinely design products--from airplanes to tire treads--using mathematical simulations rather than building and testing prototypes. But such computer applications have been slow to reach people studying the vagaries of the human body. Today, however, researchers are increasingly turning to computers to explore medical science.
"Computer simulation is the modus operandi for the future of biomedical research," says James B. Bassingthwaighte of the University of Washington in Seattle. "The complexity of [biological] systems is such that intuition is of no help in figuring out where an intervention might have an effect or what is the most efficient means of diagnosis and treatment."

DEVELOPING DRUGS

The company that Bill and Allen call home created its model of asthma patients by compiling information from more than 3,500 scientific studies on the disease. Next, scientists at the company--Entelos in Menlo Park, Calif.--wrote equations that account for more than 7,500 parameters of a person's health that may be important in asthma. These include the thickness of the mucous lining in the lungs and the effects of inflammation in blocking the airways.
From one to the next, virtual patients may respond differently to drugs depending on their health parameters, just as real patients do, and their diseases may take different courses. But because the researchers know the exact differences between their virtual patients, they can more easily tease out the most important factors in complicated interactions.
Virtual patients may also respond differently when the modelers use different views of how a disease develops. Bill and Allen, for example, represent two hypotheses about how asthma attacks begin. Bill reflected the idea that the immune-signaling molecule interleukin-5 overstimulates immune cells called eosinophils. Those cells then trigger inflammation and asthma attacks. Allen, in contrast, had an asthma attack when immune cells called macrophages blocked his airways.
Entelos developed Bill and Allen when the Bridgewater, N.J.-based drug company Aventis was considering tests of whether compounds that block interleukin-5 work as treatments for asthma attacks. When exposed to virtual allergens and no drugs, Bill's asthma attacks lasted 3 days, much longer than is typical for people with asthma. Simulated injections of interleukin-5 blockers prevented Bill's asthma attacks but not Allen's.
Because Bill's asthma didn't seem to reflect real life and Allen didn't respond to the interleukin-5 blockers, Aventis didn't pursue these compounds as potential asthma therapies. The Entelos model seems to have been accurate. Despite promising animal studies, when other companies recently tested interleukin-5 blockers in people, they found that the compounds have much less effect than the researchers had originally expected.
Each simulation of a disease begins by modeling the normal physiology and interaction of the organs involved. "We are striving for a whole-body approach to health and disease," says Jeff Trimmer of Entelos. "We want to use [our models] to understand how a person gets sick." Even when models don't seem to simulate what happens in real life--as in Bill--the findings can help researchers better understand physiological factors that are important in causing diseases, says Trimmer.
To date, computer simulations of basic human biology are closely intertwined with the commercial quest to develop new drugs quickly and inexpensively. Pharmaceutical companies can now take more than a decade to move a compound from the lab to the clinic, and many potential drugs fall by the wayside in that journey. Companies report that about 20 percent of the drugs that succeed in preliminary trials are abandoned once doctors test them in large, expensive clinical trials: Either they don't work as expected, or troubling side effects emerge.
The current system is so bad that anything would be an improvement, says Lewis Sheiner of the University of California, San Francisco, who models drug responses.
Drug companies hope that computer simulations will focus their efforts on the candidate drugs most likely to work in people. With biological models of cells, organs, and metabolic pathways, industry scientists can pick out genes or proteins involved in a disease and then sort through various drug candidates. Only the most promising ones would then be tested in cells and animals. Such modeling is gaining proponents, says Karin Jorga of Hoffmann-La Roche in Basel, Switzerland.

Computer simulations may also reduce the number and size of human trials that a company has to conduct on the way to a drug's approval by the Food and Drug Administration, Jorga says. The simulations can indicate the most efficient trial designs to reveal beneficial effects. Some computer models that simulate how drugs are broken down in the body can predict the most effective dose, she says. At a meeting on drug effects in the Netherlands earlier this year, Jorga reported that such modeling reduced the time Hoffmann-La Roche needed to develop and test a new formulation of a drug for preventing bone loss. With the new formulation, people could take the drug once a month rather than once a day.

Another example of how computer simulations have guided drug design is a virtual heart developed by Denis Noble of the University of Oxford in England. The model consists of a mass of virtual cells, each processing virtual sugar and oxygen. On a computer screen, this heart beats just like the real thing. The virtual heart can be programmed to develop different diseases, and scientists can see how it responds to treatment with different drugs.
Noble's model has been used, for example, to predict whether one drug is likely to cause abnormal heart rhythms, a widely recognized, potentially dangerous side effect. Computer simulations, however, aren't good at predicting unexpected side effects.
Greater advantages could result if biomedical researchers combine the models they're developing. Entelos is trying to hook together models of several different diseases to increase its chances of detecting side effects or spotting dangerous drug interactions. For example, heart disease, obesity, and diabetes tend to occur in the same patients, Trimmer says, so it would be useful to model all those ailments at the same time.
Bassingthwaighte and other researchers are combining individual computer models of genes, proteins, cells, organs, and metabolic pathways. With such a set of coordinated models, he suggests, medical scientists might see effects of disease that are hidden by the complexity of the human body.
This effort--sometimes called the virtual human or physiome project--is just beginning. Many technical and scientific hurdles remain: getting the models to communicate with one another, finding the powerful computers to run such simulations, even gathering the data needed to develop some of the models.
And not everyone is convinced that unified models will offer helpful information. "If you're an engineer, it isn't useful to use a quantum mechanical model of the world to talk about girders and bridges," says Sheiner. "Getting the scale of the model to match the scale of the question is tricky, so that working from the genes up may not be an efficient way to determine the best ways of treating particular diseases."

MODELING MEDICAL CARE

Some researchers are applying computer simulations to an even broader realm of health care. Rather than studying cells and organs, researchers associated with the Oakland, Calif.-based health-care plan Kaiser Permanente have created a computer simulation of a virtual world where people develop diseases--asthma, diabetes, and heart disease so far--and go to doctors for tests and treatments. Called the Archimedes model, it may help Kaiser Permanente set guidelines for patient care and find ways to monitor caregivers' performance, says physician David M. Eddy, a consultant to Kaiser Permanente. He and physicist Leonard Schlessinger of Kaiser developed the model.

Archimedes may also predict results from clinical trials that simply can't be carried out in the real world. It may also give physicians results more quickly than would be possible with an actual trial. Eddy cites a variety of obstacles for carrying out real-world trials: "the pace of innovation, the high cost of doing research, the long follow-up times required, the large number of options to be compared ... and the unwillingness of the world to stand still until the research is done."
All those factors "severely limit our ability to evaluate all the options [for patient care] through clinical research alone," he adds.

The model takes into account many influences on disease in people, says Eddy. For example, each virtual patient has a virtual liver and pancreas. The health of these organs affects the concentration of sugar in a virtual patient's blood. When sugar concentrations rise high enough, a patient may experience symptoms of diabetes, such as thirst or frequent urination, and go to a doctor. Although the underlying causes of diabetes aren't well understood, Archimedes uses equations that reproduce the disease's known characteristics, such as the observed incidence of the disease in different ethnic groups.
"As in reality, each patient is different," says Eddy. Some patients may take medicine or follow a doctor's recommendations more meticulously than others do. Moreover, patients may receive care of different quality. Reflecting studies of doctors' habits, a virtual primary-care physician is less likely to accurately follow treatment guidelines than is a virtual specialist. These factors, in turn, affect a number of physiological variables in the computer, as in life.
For example, different simulated diet and exercise plans reduce the cholesterol, blood pressure, and weight of the virtual patients by different amounts. Those changes then affect the likelihood that, say, an overweight patient would develop diabetes or heart disease.
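As a rough illustration of the idea (emphatically not the Archimedes model itself; every number below is invented), a virtual-patient simulation can be sketched as a population of parameterized patients, an intervention that shifts a parameter, and a risk curve that turns parameters into a probability of disease:

```python
import random

random.seed(1)  # reproducible toy run

def simulate(patients, weight_loss_kg):
    """Fraction of virtual patients who develop the disease (toy numbers)."""
    cases = 0
    for _ in range(patients):
        weight = random.gauss(95, 10) - weight_loss_kg   # invented baseline
        risk = min(1.0, max(0.0, 0.01 * (weight - 70)))  # invented risk curve
        cases += random.random() < risk
    return cases / patients

print(f"placebo arm:           {simulate(10_000, 0.0):.0%}")
print(f"diet-and-exercise arm: {simulate(10_000, 7.0):.0%}")
```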
At the annual meeting of the American Diabetes Association in June, Eddy reported results from an Archimedes simulation of a real-life, large clinical trial. The actual trial was known as the Diabetes Prevention Program (SN: 9/8/01, p. 159). Mimicking that trial, the virtual one enrolled virtual overweight people with a variety of risk factors for developing diabetes. It randomly assigned them to receive intensive counseling about diet and exercise, a drug called metformin that increases a person's sensitivity to insulin, or a placebo.
To program the model, Eddy and his colleagues used results from a variety of real-life studies about the effects of various interventions on weight and a variety of other risk factors for diabetes. The first time Archimedes simulated the actual diabetes trial, the virtual results were very close to the real results--as close as one would expect results of two actual trials to be, says Eddy. The simulation almost exactly mimicked the clinical trial when the modelers included the weight regained by participants.
"If we could not have afforded that trial, we could have predicted the results quite well using this model," says Richard Kahn of the American Diabetes Association in Alexandria, Va. "The [simulated] health-care system responded very much like the real one, and that suggests that Archimedes could help improve the way we treat patients."


SIMULATING REALITY

Despite the promise of technology, "no model is perfect," cautions Bassingthwaighte. "The limitation is our database of knowledge. If we don't understand it, we can't model it."
Thanks to the Human Genome Project, which recently decoded essentially the entire human genome, researchers now have a large database of genes and proteins, he says. A new field of research, called proteomics, is adding information about the expression of proteins and their interactions in cells. However, Bassingthwaighte cautions, relatively little is known about how most of those molecules affect cells and organs. He expects that extensive experimenting will be required to tease out that information.
Likewise, simulations of disease processes and health care can reproduce what's known but can't always identify an unexpected side effect. For instance, even today, a computerized model probably wouldn't predict the damage that the diet drug fenfluramine causes to heart valves, as revealed in studies in 1997 (SN: 10/18/97, p. 252), says Trimmer. It's for such reasons that he notes, "This [simulation] technology is not a replacement for traditional research but a complement to it."
"The fundamental idea is that if you can model a real thing on a computer, you can answer a lot of questions without experimenting with the real thing," says Sheiner. "While models are far from perfect because they depend on imperfect information, they can give you an intelligent basis for making decisions when you're faced with uncertainty."
Early progress with computerized biomedicine has modelers enthusiastic about its future. "My goal is to move medicine onto a quantitative footing," Eddy says. "There's still a long way to go, but simulations will certainly help. We've shown that this plane will fly."
Who develops diabetes?

              ACTUAL TRIAL*   COMPUTER SIMULATION
Placebo       25 percent      26 percent
Metformin     19 percent      21 percent
Counseling    12 percent      12 percent

* The Diabetes Prevention Program, as simulated (including participants' weight gain) by a computer model called Archimedes.

Sunday, May 11, 2008

Bioinformatics: Life Science Research In Silico

As late as the early 1990s, biology and related fields required very little experience with computers. Now, however, the vast amount of data on DNA sequences and proteins generated by the Human Genome Project and by labs around the world has made it clear that biologists will have to rely more on computers to organize, store, and efficiently make use of these data for analysis. Thus, from the fusion of computer science and biology, a relatively new field was born: bioinformatics, a field that holds promise for speeding up drug discovery but also faces problems as it is integrated into our society.
Bioinformatics has emerged in response to major advances in molecular biology technologies, and has been made possible by the exponential growth of computer technology. This interdisciplinary field bridging biology, math, and computer science has three major components: the organization of data generated from experiments into databases, the development of new algorithms and software, and the use of software for the interpretation and analysis of data. Each of these components contributes to the optimal use of the information generated from experiments.

Example of a database

FlyBase illustrates the usefulness of a well-built database of biological information in research. This database contains all the currently available knowledge on the fruit fly, the model experimental organism Drosophila melanogaster. The information found on FlyBase includes gene and protein sequences found in fruit flies, protein expression patterns and functions, literature references, as well as a list of researchers working with this organism. This database, which is updated regularly and accessible online, allows researchers to quickly obtain the information they need.

Database Search

Well-built databases are very useful for storing, organizing, and accessing information. However, in order to make sense of the data, researchers must be able to analyze and interpret them. This is where the bioinformatics “tools” -- algorithms developed for data analysis -- come in. When a researcher sequences a piece of DNA, for example, he or she simply gets a long string of the letters A, G, C, and T, each representing one of its constituent bases. By comparing this sequence to all of the available sequences in a given database, researchers can determine whether the sequence codes for a gene, or whether it is similar to any known sequence from another organism. However, given the large number of DNA sequences present in any database these days, this task could take months, or even years, if done manually.
Computers, on the other hand, can perform this task efficiently. One frequently used tool for finding similar sequences in a database is BLAST (Basic Local Alignment Search Tool). This program compares a user-specified DNA sequence to the sequences in a database and outputs the results, starting with the sequences that best match the input sequence according to its scoring algorithm.
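BLAST's real algorithm is far more sophisticated (it indexes short "words" and extends local alignments), but the core idea of a database similarity search can be shown naively: score the query against every stored sequence at every ungapped offset and report the best hits first. The sequence names and contents below are made up.

```python
# Naive illustration of a database similarity search -- not BLAST itself.

def best_local_match(query, subject):
    """Best count of matching bases over every ungapped offset."""
    best = 0
    for offset in range(len(subject) - len(query) + 1):
        window = subject[offset:offset + len(query)]
        best = max(best, sum(q == s for q, s in zip(query, window)))
    return best

database = {                      # invented "database" of sequences
    "seq_A": "TTACGGATTACCAGTT",
    "seq_B": "CGGATAACCA",
    "seq_C": "GGGGGGGGGGGGGGGG",
}
query = "CGGATTACCA"

for name, seq in sorted(database.items(),
                        key=lambda item: best_local_match(query, item[1]),
                        reverse=True):
    print(name, best_local_match(query, seq))   # seq_A 10, seq_B 9, seq_C 2
```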
Armed with this and many other similar data analysis software tools, biologists have been able to make many discoveries, such as the identification of gene coding regions on DNA sequences, as well as possible links of genes to certain diseases.

Bioinformatics and Drug Discovery

Major pharmaceutical companies are also keeping up with the new field of bioinformatics. Some have opened specialized bioinformatics research units to help in their drug development process. As Mark Swindells and Richard Fagan pointed out in their June 2001 Chemical Innovation article "Target Discovery Using Bioinformatics": "success in the race to discover targets in the post-genomic world will go not simply to the companies with the greatest repertoire of privately held sequences, but to the companies with the greatest ability to mine the value locked in the burgeoning data archives."
Bioinformatics can, for example, be used to determine whether a drug target in a pathogenic bacterium is also present in humans. This helps researchers predict the drug's potential side effects. It might also allow them to decide early in the drug development process whether to abandon or continue research on that drug, thus saving precious time and money. Moreover, bioinformatics can aid in the prediction of protein structures and functions (based on homology to known proteins) to determine the potential of a protein to be a drug target.
To keep up with the demands of large pharmaceutical companies, smaller companies such as Accelrys (San Diego, CA), LION Bioscience AG (Heidelberg, Germany), and Incyte Genomics (Palo Alto, CA) are offering access to information services and data analysis software. Meanwhile, other companies have opted to form strategic alliances with the large pharmaceutical firms; for example, Rosetta Inpharmatics, a developer of bioinformatics software located in Kirkland, WA, has become a subsidiary of Merck & Co., Inc.

Current Problems in Bioinformatics

The generation of biological information has increased with unprecedented speed in the past two decades. This has been matched by the rapid development of supercomputer power and has been backed with large monetary investments. But one thing has not kept up with this growth. Presently, there is a lack of trained personnel in this interdisciplinary field. This could be one of the major limitations to the future expansion of bioinformatics. There are still very few programs in bioinformatics at major universities around the world. Even those schools offering it face the difficulty of finding and keeping individuals who possess the required expertise to teach, since biotech and pharmaceutical companies can offer more attractive salaries and benefits.
According to some companies, such as Rosetta Inpharmatics and Abgenix, an ideal bioinformatician should have a solid background in biology and be very comfortable with UNIX and with programming languages such as C and Perl. However, the general shortage of bioinformaticians has forced companies to hire computer scientists or mathematicians and teach them biology, or to hire biologists who have some self-taught computer skills.

The problem with this, however, as some critics point out, is that biologists often don't have very strong statistical and/or programming training. The computer scientists, on the other hand, often don't understand what the really meaningful biological questions are when creating algorithms.
Fortunately, this problem has been taken seriously, and several universities in the United States are incorporating bioinformatics courses into their undergraduate curricula or establishing institutes of bioinformatics. The University of California at Davis, for example, is putting $95 million into a new bioinformatics program. The Virginia Polytechnic Institute will invest $100 million in its new Virginia Bioinformatics Institute, which will occupy three buildings. Other universities, such as the University of Florida at Gainesville, the University of the Sciences in Philadelphia, and George Mason University in Virginia, have also formed bioinformatics departments on their campuses. It is hoped that these programs will provide life sciences students with a solid background in experimental biology as well as a good understanding of the computer tools available for the investigation of biological questions.

Conclusions

DNA sequences generated by the Human Genome Project, protein structures, gene expression patterns, and other data carry with them an enormous amount of valuable information. Uncovering all of it, however, will still take many more years of research. Bioinformatics, as we have seen, is set to become an indispensable tool. This is not to say that wet-bench work is not required; on the contrary, bioinformatics will give researchers the tools necessary to guide their bench work and to handle computers as easily as they do the microscope.

Thursday, May 8, 2008

Biological Databases

Biological and Protein Databases

The sequencing of the human genome was completed in April 2003, and this has had an outstanding effect on how biological and biomedical research is conducted. The sequencing has given us human sequence variation data, model organism sequence data, and information on gene structure and function, all of which provide grounds for researchers to better design and interpret their experiments, fulfilling the promise of bioinformatics in advancing and accelerating biological discovery.
GenBank is the database with which most researchers are familiar. GenBank is the annotated collection of all publicly available DNA and protein sequences. This database, maintained by the National Center for Biotechnology Information (NCBI) at the National Institutes of Health, represents a collaborative effort between NCBI, the European Molecular Biology Laboratory (EMBL), and the DNA Data Bank of Japan (DDBJ).
The Human Genome Project, along with other sequencing projects, has produced a vast amount of sequence data. For example, the number of bases in GenBank doubles roughly every 14 months, and this exponential growth rate is expected to continue for some time to come.
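To make that growth rate concrete, here is what a 14-month doubling time implies over longer spans (a back-of-the-envelope extrapolation for illustration, not a forecast):

```python
# Size multiplies by 2**(months / 14) under a 14-month doubling time.
for years in (1, 5, 10):
    factor = 2 ** (years * 12 / 14)
    print(f"after {years:2d} year(s): ~{factor:.0f}x as much data")
# after  1 year(s): ~2x;  after  5 year(s): ~20x;  after 10 year(s): ~380x
```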
GenBank, or any other biological database for that matter, serves little purpose unless the data can be easily searched and entries retrieved in a usable, meaningful format. Otherwise, sequencing efforts have no useful end, since the biological community as a whole cannot make use of the information hidden within these millions of bases and amino acids.
Be that as it may, the range of publicly available biological data goes far beyond what is included in GenBank. Since the major public sequence databases need to store data in a generalized fashion, these databases often do not contain the more specialized types of information that would be of interest to specific segments within the biological community. To address this, many smaller, specialized databases have emerged. These databases, which contain information ranging from strain crosses to gene expression data, provide a valuable adjunct to the more visible public sequence databases, and the user is encouraged to make intelligent use of both types of databases in their searches.
EMBL (Europe)
  • The EMBL (European Molecular Biology Laboratory) nucleotide sequence database is maintained by the European Bioinformatics Institute (EBI) in Hinxton, Cambridge, UK.
  • It can be accessed and searched through the SRS system at EBI, or one can download the entire database as flat files.

DDBJ (Japan)

  • The DNA Data Bank of Japan began as a collaboration with EMBL and GenBank. It is run by the National Institute of Genetics.
  • One can search for entries by accession number.
PROTEIN DATABASES
PRIMARY PROTEIN SEQUENCE DATABASES
PIR (The Protein Information Resource protein sequence database)
  • PIR-PSD is maintained by the National Biomedical Research Foundation (NBRF), the International Protein Information Database of Japan (JIPID), and the Martinsried Institute for Protein Sequences (MIPS).
  • PIR-PSD data processing involves four major steps: import, merging, classification and annotation
  • The primary sources for the PSD are naturally occurring wild-type sequences, drawn from GenBank/EMBL/DDBJ translations, the published literature, and direct submissions to PIR-International.
  • One can search for entries or do sequence similarity searches at the PIR site.
  • The database can also be downloaded as a set of flat files
  • PIR also produces NRL-3D, a database of sequences extracted from the three-dimensional structures in the Protein Data Bank (PDB).
  • The database is split into four distinct sections, designated PIR1 through PIR4, which differ in terms of the quality of data and the level of annotation provided:
  • PIR1: Fully classified and annotated entries
  • PIR2: Includes preliminary entries which have not been thoroughly reviewed and may contain redundancy
  • PIR3: Unverified entries
  • PIR4: Entries that fall into one of four categories: (i) conceptual translations of artefactual sequences; (ii) conceptual translations of sequences that are not transcribed or translated; (iii) conceptual translations that are extensively genetically engineered; (iv) sequences that are not genetically encoded and not produced on ribosomes
  • Programs are provided for data retrieval and sequence searching via the NBRF-PIR database web page.

MIPS

  • The Martinsried Institute for Protein Sequences collects and processes sequence data for the tripartite PIR-International Protein Sequence Database project.
  • The database is distributed with PATCHX, a supplement of unverified protein sequences from external sources.

Access to the database is provided through a web server: results of FastA similarity searches of all proteins within PIR-International and PATCHX are stored in a dynamically maintained database, allowing instant access to FastA results.

SWISS-PROT

  • Currently maintained by the SIB (Swiss Institute of Bioinformatics) and the EBI/EMBL.
  • Provides high-level annotation, including descriptions of a protein's function, the structure of its domains, its post-translational modifications, variants, and so on.
  • These details are given in each SWISS-PROT entry.

TrEMBL

  • Computer-annotated supplement to SWISS-PROT.
  • Contains translations of all coding sequences (CDS) in EMBL.
  • SP-TrEMBL: Contains entries that will eventually be incorporated into SWISS-PROT but that have not yet been manually annotated.
  • REM-TrEMBL: Contains sequences that are not destined to be included in SWISS-PROT.

NRL-3D

  • Produced by PIR from sequences extracted from the Brookhaven Protein Data Bank (PDB).

SECONDARY DATABASES
PROSITE

  • PROSITE is a database of short protein sequence patterns and profiles that characterize biologically significant sites in proteins.
  • It is a part of SWISS-PROT and is maintained in the same way as SWISS-PROT.
  • PROSITE is based on regular expressions describing characteristic sub-sequences of specific protein families or domains.
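To show how a regular expression captures such a pattern, here is a hedged sketch that converts the core of PROSITE's pattern syntax into a Python regular expression; it handles only the basic elements (x for any residue, [..] for an allowed set, {..} for a forbidden set, and (n) repetition). The example is PS00001, the N-glycosylation site pattern N-{P}-[ST]-{P}; the peptide it is matched against is invented.

```python
import re

def prosite_to_regex(pattern):
    """Convert a simplified PROSITE pattern to a Python regex."""
    parts = []
    for element in pattern.split("-"):
        repeat = ""
        if "(" in element:                    # e.g. x(2) -> .{2}
            element, count = element.rstrip(")").split("(")
            repeat = "{%s}" % count
        if element == "x":                    # any residue
            parts.append("." + repeat)
        elif element.startswith("{"):         # forbidden residues
            parts.append("[^" + element[1:-1] + "]" + repeat)
        else:                                 # literal residue or [..] set
            parts.append(element + repeat)
    return "".join(parts)

site = prosite_to_regex("N-{P}-[ST]-{P}")     # -> 'N[^P][ST][^P]'
print(re.findall(site, "MKNGSAVNPTQ"))        # ['NGSA']
```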

Profile

  • A position-specific scoring table that encapsulates the sequence information within a complete alignment is termed a profile.
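A toy example of the idea, using an invented four-column alignment: count residue frequencies at each column, then score a candidate sequence column by column. Real profiles use log-odds scores and pseudocounts rather than raw frequencies.

```python
# Build a toy profile (position-specific scoring table) from an alignment.
alignment = ["ACGT", "ACGA", "ATGT"]          # invented aligned sequences

# profile[i][residue] = frequency of `residue` at column i
profile = [
    {res: col.count(res) / len(col) for res in set(col)}
    for col in zip(*alignment)
]

def score(seq):
    """Sum the per-column frequencies for this candidate sequence."""
    return sum(profile[i].get(res, 0.0) for i, res in enumerate(seq))

print(round(score("ACGT"), 2))   # 3.33 -- fits the alignment well
print(round(score("TTTT"), 2))   # 1.0  -- fits poorly
```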

PRINTS

  • PRINTS provides a compendium of protein fingerprints -- groups of conserved motifs that characterize a protein family.

Pfam

  • Pfam is a database of protein families defined as domains (contiguous segments of entire protein sequences). For each domain, it contains a multiple alignment of a set of defining sequences (the seeds) and of the other sequences in SWISS-PROT and TrEMBL that can be matched to that alignment.


BLOCKS

  • BLOCKS contains ungapped patterns from aligned protein families defined by PROSITE, found by pattern-searching and statistical-sampling algorithms.

Identify

  • Identify is another automatically derived tertiary resource, based on BLOCKS.
STRUCTURAL CLASSIFICATION DATABASES

SCOP (Structural Classification of Proteins)

SCOP classification levels:
  • Family
  • Superfamily
  • Fold

CATH (Class, Architecture, Topology, Homology)

CATH classification levels:
  • Class
  • Architecture
  • Topology
  • Homology

PDBsum
  • Summarizes and analyzes all the structures in the PDB.

Biological databases play a useful and informative role in the biology community, providing the first step toward performing rigorous and accurate bioinformatic analyses.

Sunday, May 4, 2008

Application of BioInformatics in various Fields


Bioinformatics is the use of information technology in biotechnology for data storage, data warehousing, and the analysis of DNA sequences. Bioinformatics requires knowledge from many branches: biology, mathematics, computer science, the laws of physics and chemistry, and of course sound IT skills for data analysis. Bioinformatics is not limited to computing over data; in reality it can be used to solve many biological problems and to find out how living things work.
It is the comprehensive application of mathematics (e.g., probability and statistics), science (e.g., biochemistry), and a core set of problem-solving methods (e.g., computer algorithms) to the understanding of living systems.

Bioinformatics is being used in following fields:
Molecular medicine
Personalised medicine
Preventative medicine
Gene therapy
Drug development
Microbial genome applications
Waste cleanup
Climate change studies
Alternative energy sources
Biotechnology
Antibiotic resistance
Forensic analysis of microbes
Bio-weapon creation
Evolutionary studies
Crop improvement
Insect resistance
Improved nutritional quality
Development of drought-resistant varieties
Veterinary science

Friday, May 2, 2008

Medical discovery of the century

The Medical Discovery of the Century!
This Miracle Substance
Protects against Diabetes...
Stops Heart Disease Before it Starts...
And Kills Cancer Cells on Contact...
So, Why Haven’t You Heard About It?

Imagine a drug that can cause cancer cells to self-destruct, while leaving healthy cells to proliferate. It can also convert tumor cells into normal cells and prevent cancer from spreading to other parts of the body.
In fact, this "drug" has been shown to help in the prevention and treatment of cancer in seven different ways!
But that's not all. It can also dramatically reduce your risk of heart disease and stroke... lower bad cholesterol... and help you maintain healthy blood pressure levels.
This "drug" also helps the pancreas to function better, while increasing the body's sensitivity to insulin. It is a godsend for diabetics and a powerful way to prevent the disease in the first place. And would you believe that there are no negative side-effects, when used properly? Sounds like a miracle, right?
The miracle substance I'm talking about is vitamin D... and it's free of charge and available right outside your front door.
Vitamin D Is FREE from the Sun - And What It Does Is Nothing Short of Amazing…
At last, the truth about sun exposure, vitamin D and your health is revealed...
The healing power of sunlight :
Sunlight and vitamin D could wipe out the four biggest risk factors for heart disease
Sunlight helps you build strong bones and protects you from osteoporosis, osteomalacia and rickets
Sunlight helps to improve your immunity
Back and joint pain could be related to your vitamin D levels
So get your sun exposure safely.

Wednesday, April 30, 2008

BioInformatics

Introduction to BioInformatics
Biological data is proliferating rapidly. Public databases such as GenBank and the Protein Data Bank have been growing exponentially for some time now. With the advent of the World Wide Web and fast Internet connections, the data contained in these databases and a great many special-purpose programs can be accessed quickly, easily, and cheaply from any location in the world. As a consequence, computer-based tools now play an increasingly critical role in the advancement of biological research.

Bioinformatics, a rapidly evolving discipline, is the application of computational tools and techniques to the management and analysis of biological data. The term bioinformatics is relatively new, and as defined here, it encroaches on such terms as "computational biology" and others. The use of computers in biology research predates the term bioinformatics by many years. For example, the determination of 3D protein structure from X-ray crystallographic data has long relied on computer analysis. It's important to be aware, however, that others may make different distinctions between the terms. In particular, bioinformatics is often the term used when referring to the data and the techniques used in large-scale sequencing and analysis of entire genomes, such as C. elegans, Arabidopsis, and Homo sapiens.

The NIH Biomedical Information Science and Technology definition
“Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.”
Michael Liebman’s definition in “Bioinformatics: An Editorial Perspective”
“Bioinformatics is the study of the information content and information flow in biological systems and processes.”

In short, bioinformatics is a management information system for molecular biology and has many practical applications:

Annotate
Store
Search/Retrieve
Analyze &
Visualize

Bioinformatics derives knowledge from computer analysis of biological data. These can consist of the information stored in the genetic code, but also experimental results from various sources, patient statistics, and scientific literature. Research in bioinformatics includes method development for storage, retrieval, and analysis of the data. Bioinformatics is a rapidly developing branch of biology and is highly interdisciplinary, using techniques and concepts from informatics, statistics, mathematics, chemistry, biochemistry, physics, and linguistics. It has many practical applications in different areas of biology and medicine.
Roughly, bioinformatics describes any use of computers to handle biological information. In practice the definition used by most people is narrower; bioinformatics to them is a synonym for "computational molecular biology" -- the use of computers to characterize the molecular components of living things.

Bioinformatics Definition - Personal view
The Tight Definition: "Classical" bioinformatics

Fredj Tekaia at the Institut Pasteur offers this definition of bioinformatics: "The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information."

The Loose definition :

There are other fields, for example medical imaging and image analysis, which might be considered part of bioinformatics. There is also a whole other discipline of biologically-inspired computation: genetic algorithms, AI, neural networks. Often these areas interact in strange ways. Neural networks, inspired by crude models of the functioning of nerve cells in the brain, are used in a program called PHD to predict, surprisingly accurately, the secondary structures of proteins from their primary sequences. What almost all bioinformatics has in common is the processing of large amounts of biologically-derived information, whether DNA sequences or breast X-rays.
" Richard Durbin", Head of Informatics at the Wellcome Trust Sanger Institute , expressed an interesting opinion : "I do not think all biological computing is bioinformatics, e.g. mathematical modelling is not bioinformatics, even when connected with biology-related problems. In my opinion, bioinformatics has to do with management and the subsequent use of biological information, particular genetic information."

Bioinformatics definition - Organization / committee
Bioinformatics definition by the Bioinformatics Definition Committee, National Institute of Mental Health, released on July 17, 2000 (source: http://www.bisti.nih.gov/CompuBioDef.pdf) (1)
The NIH Biomedical Information Science and Technology Initiative Consortium agreed on the following definitions of bioinformatics and computational biology recognizing that no definition could completely eliminate overlap with other activities or preclude variations in interpretation by different individuals and organizations.
Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
Computational Biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.
The National Center for Biotechnology Information (NCBI 2001) defines bioinformatics as
"Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline.There are three important sub-disciplines within bioinformatics: the development of new algorithms and statistics with which to assess relationships among members of large data sets; the analysis and interpretation of various types of data including nucleotide and amino acid sequences, protein domains, and protein structures; and the development and implementation of tools that enable efficient access and management of different types of information." (2)
Bioinformatics- Definition (As submitted to the Oxford English Dictionary)
(Molecular) bioinformatics: bioinformatics is conceptualising biology in terms of molecules (in the sense of physical chemistry) and applying informatics techniques (derived from disciplines such as applied maths, computer science and statistics) to understand and organise the information associated with these molecules, on a large scale. In short, bioinformatics is a management information system for molecular biology and has many practical applications. (3)
(Source: Luscombe NM, Greenbaum D, Gerstein M. What is bioinformatics? A proposed definition and overview of the field. Methods Inf Med 2001; 40: 346-58.)

Bioinformatics definition - Website / other sources
Bioinformatics or computational biology is the use of mathematical and informational techniques, including statistics, to solve biological problems, usually by creating or using computer programs, mathematical models or both. One of the main areas of bioinformatics is the data mining and analysis of the data gathered by the various genome projects. Other areas are sequence alignment, protein structure prediction, systems biology, protein-protein interactions and virtual evolution. (source: www.answers.com)
Bioinformatics is the science of developing computer databases and algorithms for the purpose of speeding up and enhancing biological research. (source: www.whatis.com)
As a discipline that builds upon computational biology, bioinformatics encompasses the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems. As a discipline that builds upon the life, health, and medical sciences, bioinformatics supports medical informatics; gene mapping in pedigrees and population studies; functional-, structural-, and pharmaco-genomics; proteomics, and dozens of other evolving 'omics'. As a discipline that builds upon the basic sciences, bioinformatics depends on a strong foundation of chemistry, biochemistry, biophysics, biology, genetics, and molecular biology which allows interpretation of biological data in a meaningful context. As a discipline whose core is mathematics and statistics, bioinformatics applies these fields in ways that provide insight to make the vast, diverse, and complex life sciences data more understandable and useful, to uncover new biological insights, and to provide new perspectives to discern unifying principles. In short, bioinformaticists bring a multidisciplinary perspective to many of the critical problems facing the health-science profession today.
"Biologists using computers, or the other way around. Bioinformatics is more of a tool than a discipline.(source: An Understandable Definition of Bioinformatics , The O'Reilly Bioinformatics Technology Conference, 2003) (4)
The application of computer technology to the management of biological information. Specifically, it is the science of developing computer databases and algorithms to facilitate and expedite biological research. (source: Webopedia)
Bioinformatics: a combination of Computer Science, Information Technology and Genetics to determine and analyze genetic information. (Definition from BitsJournal.com)
Bioinformatics is the application of computer technology to the management and analysis of biological data. The result is that computers are being used to gather, store, analyse and merge biological data. (EBI - 2can resource)
Even though the three terms bioinformatics, computational biology, and bioinformation infrastructure are often used interchangeably, broadly the three may be defined as follows:
bioinformatics refers to database-like activities, involving persistent sets of data that are maintained in a consistent state over essentially indefinite periods of time;
computational biology encompasses the use of algorithmic tools to facilitate biological analyses; while bioinformation infrastructure comprises the entire collective of information management systems, analysis tools, and communication networks supporting biology. Thus, the latter may be viewed as a computational scaffold of the former two.
Bioinformatics is currently defined as the study of information content and information flow in biological systems and processes. It has evolved to serve as the bridge between observations (data) in diverse biologically-related disciplines and the derivations of understanding (information) about how the systems or processes function, and subsequently the application (knowledge). A more pragmatic definition in the case of diseases is the understanding of dysfunction (diagnostics) and the subsequent applications of the knowledge for therapeutics and prognosis.

Definitions of Fields Related to Bioinformatics
Bioinformatics has various applications in research in medicine, biotechnology, agriculture, and other areas. The following research fields have bioinformatics as an integral component:

  • Computational Biology

The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.

  • Genomics:

Genomics is any attempt to analyze or compare the entire genetic complement of a single species or of several species. It is, of course, possible to compare genomes by comparing more-or-less representative subsets of genes within genomes.

  • Proteomics:

Proteomics is the study of proteins: their location, structure, and function. It is the identification, characterization, and quantification of all proteins involved in a particular pathway, organelle, cell, tissue, organ, or organism that can be studied in concert to provide accurate and comprehensive data about that system. Proteomics is the study of the function of all expressed proteins. The study of the proteome, called proteomics, "now evokes not only all the proteins in any given cell, but also the set of all protein isoforms and modifications, the interactions between them, the structural description of proteins and their higher-order complexes, and for that matter almost everything 'post-genomic'." (5)

  • Pharmacogenomics:

Pharmacogenomics is the application of genomic approaches and technologies to the identification of drug targets. In short, pharmacogenomics is using genetic information to predict whether a drug will help make a patient well or sick. It studies how genes influence the response of humans to drugs, from the population level to the molecular level.

  • Pharmacogenetics:

Pharmacogenetics is the study of how the actions of and reactions to drugs vary with the patient's genes. All individuals respond differently to drug treatments: some positively, others with little obvious change in their condition, and yet others with side effects or allergic reactions. Much of this variation is known to have a genetic basis. Pharmacogenetics is a subset of pharmacogenomics which uses genomic/bioinformatic methods to identify genomic correlates, for example SNPs (single nucleotide polymorphisms), that are characteristic of particular patient response profiles, and to use those markers to inform the administration and development of therapies. Strikingly, such approaches have been used to "resurrect" drugs previously thought to be ineffective but subsequently found to work in a subset of patients, and to optimize the doses of chemotherapy for particular patients.

  • Cheminformatics:

'The mixing of those information resources [information technology and information management] to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the arena of drug lead identification and optimization.' (Frank K. Brown, 'Chemoinformatics: what is it and how does it impact drug discovery.' Ann. Rep. Med. Chem. 1998, 33, 375-384.) (6)

Related terms include chemi-informatics, chemometrics, computational chemistry, chemical informatics, and chemical information management/science. We can distinguish some of these as follows:

Chemical informatics: 'Computer-assisted storage, retrieval and analysis of chemical information, from data to chemical knowledge.' (Chem. Inf. Lett. 2003, 6, 14.) This definition is distinct from 'chemoinformatics' (and the synonymous cheminformatics and chemiinformatics), which focuses on drug design.

Chemometrics: The application of statistics to the analysis of chemical data (from organic, analytical or medicinal chemistry) and the design of chemical experiments and simulations. [IUPAC Computational]

Computational chemistry: A discipline using mathematical methods for the calculation of molecular properties or for the simulation of molecular behavior. It also includes, e.g., synthesis planning, database searching, combinatorial library manipulation (Hopfinger, 1981; Ugi et al., 1990). [IUPAC Computational]

  • Structural genomics or structural bioinformatics:

It refers to the analysis of macromolecular structure, particularly of proteins, using computational tools and theoretical frameworks. One of the goals of structural genomics is the extension of the idea of genomics to obtain accurate three-dimensional structural models for all known protein families, protein domains, or protein folds. Structural alignment is a tool of structural genomics.

  • Comparative genomics:

The study of human genetics by comparison with model organisms such as mice, the fruit fly, and the bacterium E. coli.

  • Biophysics:

The British Biophysical Society defines biophysics as: "an interdisciplinary field which applies techniques from the physical sciences to understanding biological structure and function".

  • Biomedical informatics / Medical informatics:

"Biomedical Informatics is an emerging discipline that has been defined as the study, invention, and implementation of structures and algorithms to improve communication, understanding and management of medical information."

  • Mathematical Biology:

Mathematical biology also tackles biological problems, but the methods it uses to tackle them need not be numerical and need not be implemented in software or hardware. It includes things of theoretical interest which are not necessarily algorithmic, not necessarily molecular in nature, and are not necessarily useful in analyzing collected data.

  • Computational chemistry:

Computational chemistry is the branch of theoretical chemistry whose major goals are to create efficient computer programs that calculate the properties of molecules (such as total energy, dipole moment, vibrational frequencies) and to apply these programs to concrete chemical objects. It is also sometimes used to cover the areas of overlap between computer science and chemistry.

  • Functional genomics:

Functional genomics is a field of molecular biology that attempts to make use of the vast wealth of data produced by genome sequencing projects to describe genome function. Functional genomics uses high-throughput techniques like DNA microarrays, proteomics, metabolomics, and mutation analysis to describe the function and interactions of genes.

  • Pharmacoinformatics:

Pharmacoinformatics concentrates on the aspects of bioinformatics dealing with drug discovery.

  • In silico ADME-Tox Prediction:(Brief description)

Drug discovery is a complex and risky treasure hunt to find the most efficacious molecule -- one that does not have toxic effects but at the same time has the desired pharmacokinetic profile. The hunt starts when researchers look for the binding affinity of a molecule to its target. A huge amount of research is required to come up with a molecule that has a reliable binding profile. Once such molecules have been identified, under the traditional methodologies each molecule is further subjected to optimization with the aim of improving efficacy. The molecules that show better binding are then evaluated for their toxicity and pharmacokinetic profiles. It is at this stage that most candidates fail in the race to become a successful drug.

  • Agroinformatics / Agricultural informatics:

Agroinformatics concentrates on the aspects of bioinformatics dealing with plant genomes.

  • Systems biology:

Systems biology is the coordinated study of biological systems by investigating the components of cellular networks and their interactions, by applying experimental high-throughput and whole-genome techniques, and by integrating computational methods with experimental efforts.