MOLECULAR BIOLOGY AND DIAGNOSTICS INTRODUCTION TO MOLECULAR BIOLOGY AND BIOMOLECULES OUTLINE • Introduction to Molecular Biology o Classical Biochemistry and Genetics o The Merging of Biochemistry and Genetics o The Birth of Molecular Biology o The Central Dogma of Molecular Biology • The Biomolecules o The Important Biomolecules in Molecular Biology o Carbohydrates o Lipids o Proteins o Nucleic Acids • DNA Synthesis & Manipulation o DNA Structure, Function, and Synthesis o DNA Extraction and Isolation o DNA Amplification o DNA Quantitation • RNA Synthesis & Manipulation o RNA Structure, Function, and Synthesis o Reverse Transcription o RNA Quantitation o Application of RNA Technology • Protein Synthesis and Manipulation o Proteins o Amino Acids o The Evolution Significance of Cytochrome C o Protein Structure o Denaturation and Protein Folding o Function of Proteins o Transcription of DNA to RNA o The Genetic Code • LESSON 1: INTRODUCTION TO MOLECULAR BIOLOGY CLASSICAL BIOCHEMISTRY AND GENETICS BIOCHEMISTRY • Study of the chemical substances and processes that occur in living organisms like plants, animals, and microorganisms • Deals with the chemistry of life, and as such it draws on the techniques of analytical , organic, and physical chemistry, including physiology that involved in life vital processes and functions. • Enters into the investigation of chemical changes in diseases, drug action, nutrition, immunity, agriculture, and genetics. GENETICS • Important component in understanding molecular biology. • Branch of biology concerned with the study of genes, genetic variation and heredity in organisms. • Gregor Mendel o Scientist and Augustinian friar working in 19th century o First to study genetics scientifically o Studied "trait inheritance", patterns in the way traits are handed down from parents to offspring. o Observed that organisms (pea plants) inherit traits by way of discrete "units of inheritance". 
This term, still used today, is a somewhat ambiguous definition of what is referred to as a gene. • Trait inheritance and molecular inheritance mechanisms of genes are still primary principles of genetics in the 21st century, but modern genetics has expanded beyond inheritance to studying the function and behavior of genes. Genetics has given rise to a number of subfields, including molecular genetics, epigenetics and population genetics . THE MERGING OF BIOCHEMISTRY AND GENETICS • Researchers in biochemistry use specific techniques native to biochemistry, but increasingly combine these with techniques and ideas developed in the fields of genetics,molecular biology , and biophysics . There is not a defined line between these disciplines. • Biochemistry is the study of the chemical substances and vital processes occurring in live organisms. Biochemists focus heavily on the role, function, and structure of biomolecules. The study of the chemistry behind biological processes and the synthesis of biologically active molecules are examples of biochemistry. • Genetics - study of effect of genetic differences in organisms. This can often be inferred by the absence of a normal component (e.g. one gene). The study of "mutants" – organisms that lack one or more functional components with respect to so-called "wild type" or normal phenotype. • Molecular biology is the study of molecular underpinnings of the processes of replication, transcription, translation, and cell function. o Central dogma of molecular biology where genetic material is transcribed into RNA and then translated into protein, despite being oversimplified, still provides a good starting point for understanding field. o Nucleus - contains the DNA THE BIRTH OF MOLECULAR BIOLOGY • The history of molecular biology begins in the 1930s with the convergence of various, previously distinct biological and physical disciplines: biochemistry, genetics, microbiology, virology and physics. 
With the hope of understanding life at its most fundamental level, numerous physicists and chemists also took an interest in what would become molecular biology.
• In its modern sense, molecular biology attempts to explain the phenomena of life starting from the macromolecular properties that generate them. Two categories of macromolecules in particular are the focus of the molecular biologist:
  o Nucleic acids - among which the most famous is deoxyribonucleic acid (DNA), the constituent of genes, and
  o Proteins - the active agents of living organisms.
• One definition of the scope of molecular biology, therefore, is to characterize the structure, function and relationships between these two types of macromolecules. This relatively limited definition will suffice to allow us to establish a date for the so-called "molecular revolution", or at least to establish a chronology of its most fundamental developments.
HISTORY OF MOLECULAR DIAGNOSTICS: THE MOLECULAR BIOLOGY TIMELINE
• 1865 - Gregor Mendel, Law of Heredity
• 1869 - Johann Friedrich Miescher, Purification of DNA
• 1949 - Sickle Cell Anemia Mutation
• 1953 - Watson and Crick, Structure of DNA
• 1970 - Recombinant DNA Technology
• 1977 - DNA Sequencing
• 1985 - In vitro Amplification of DNA (PCR)
• 2001 - The Human Genome Project
• 2005-11 - Sequencing Technologies and Genome Sequencing
THE CENTRAL DOGMA OF MOLECULAR BIOLOGY
• The dogma is a framework for understanding the transfer of sequence information between information-carrying biopolymers, in the most common or general case, in living organisms. There are 3 major classes of such biopolymers: DNA and RNA (both nucleic acids), and protein. The dogma classes the possible transfers between them into 3 groups of 3:
• Three General Transfers (believed to occur normally in most cells)
  o Describe the normal flow of biological information: DNA can be copied to DNA (DNA replication), DNA information can be copied into mRNA (transcription), and proteins can be synthesized using the information in mRNA as a template (translation).
  o HIV - targets DNA in T cells
• Three Special Transfers (known to occur, but only under specific conditions, in the case of some viruses or in a laboratory)
  o RNA being copied from RNA (RNA replication), DNA being synthesized using an RNA template (reverse transcription), and proteins being synthesized directly from a DNA template without the use of mRNA.
• Three Unknown Transfers (believed never to occur)
  o A protein being copied from a protein, synthesis of RNA using the primary structure of a protein as a template, and DNA synthesis using the primary structure of a protein as a template.
LESSON 2: THE BIOMOLECULES
THE IMPORTANT BIOMOLECULES IN MOLECULAR BIOLOGY
• Carbohydrates
• Lipids
• Proteins
• Nucleic Acids
CARBOHYDRATES
• Carbohydrates - provide an energy source (sugars) for the cell and energy storage (starch, glycogen), and may also play a structural role (cellulose).
• The simplest subunit of a carbohydrate is a monosaccharide.
• Monosaccharides - simple sugars that are composed of 3-7 carbon atoms.
• The structure of the glucose molecule is a good representation of a carbohydrate subunit, also called a monomer.
• Monomers of Carbohydrates: Monosaccharides
  o Examples of carbohydrates: glucose and fructose (monosaccharides); sucrose and maltose (disaccharides); starch and cellulose (polysaccharides)
LIPIDS
• Naturally occurring hydrophobic molecules.
• Heterogeneous group of compounds related to fatty acids. Include fats, oils, waxes, phospholipids, etc.
• Make up about 70% of the dry weight of the nervous system.
• Crucial for the healthy functioning of the nerve cells.
• Greasy or oily organic substances; lipids are sparingly soluble in water and are soluble in organic solvents like chloroform, ether and benzene.
• Composed of long hydrocarbon chains.
• Lipid molecules hold a large amount of energy and are energy storage molecules.
• Generally esters of fatty acids, and building blocks of biological membranes.
• Most lipids have a polar head and a non-polar tail.
• Fatty acids can be saturated or unsaturated.
• Lipids present in biological membranes are of three classes based on the type of hydrophilic head present:
  o Glycolipids - lipids whose head contains oligosaccharides with 1-15 saccharide residues.
  o Phospholipids - lipids whose head contains a positively charged group linked to a negatively charged phosphate group.
  o Sterols - lipids whose head contains a steroid ring.
CHARACTERISTICS OF LIPIDS
• Lipids are relatively insoluble in water.
• They are soluble in organic solvents, like ether, chloroform and methanol.
• Lipids have a high energy content and are metabolized to release calories.
• Lipids also act as electrical insulators; they insulate nerve axons.
• Fats containing saturated fatty acids are solid at room temperature. Example: animal fats.
• Plant fats are unsaturated and are liquid at room temperature.
• Pure fats are colorless and have an extremely bland taste.
• Fats are sparingly soluble in water and hence are described as hydrophobic substances.
• They are freely soluble in organic solvents like ether, acetone and benzene.
• The melting point of fats depends on the length of the chain of the constituent fatty acid and the degree of unsaturation.
• Geometric isomerism: the presence of a double bond in the unsaturated fatty acid of the lipid molecule produces geometric, or cis-trans, isomerism.
• Fats have insulating capacity; they are bad conductors of heat.
• Emulsification is the process by which a lipid mass is converted to a number of small lipid droplets. Emulsification happens before the fats can be absorbed by the intestinal walls.
• Fats are hydrolyzed by lipase enzymes to yield fatty acids and glycerol.
• The hydrolysis of fats by alkali is called saponification. This reaction results in the formation of glycerol and salts of fatty acids, called soaps.
• Hydrolytic rancidity is caused by the growth of microorganisms which secrete enzymes like lipases. These split fats into glycerol and free fatty acids.
TYPES OF LIPIDS
• In the year 1943 Bloor proposed the following classification of lipids based on their chemical composition:
  o Simple Lipids
  o Compound Lipids
  o Derived Lipids
• Simple Lipids or Homolipids
  o Simple lipids are the esters of fatty acids with various alcohols.
  o Fats and Oils (triglycerides, also called triacylglycerols) - These are esters of fatty acids with a trihydroxy alcohol, glycerol. A fat is solid at ordinary room temperature; an oil is liquid.
  o Simple Triglycerides are those in which the three fatty acid radicals are of the same type. Example: Tristearin, Triolein.
  o Mixed Triglycerides are those in which the three fatty acid radicals are different from each other. Example: distearo-olein, dioleo-palmitin.
  o Waxes are the esters of fatty acids with high molecular weight monohydroxy alcohols. Example: Beeswax, Carnauba wax.
• Compound Lipids or Heterolipids
  o Heterolipids are esters of fatty acids with alcohol that possess additional groups.
  o Phospholipids or Phosphatides are compounds containing fatty acids and glycerol in addition to phosphoric acid, nitrogen bases and other substituents. They usually possess one hydrophilic head and two non-polar tails. They are called polar lipids and are amphipathic in nature.
    ▪ Phospholipids can be phosphoglycerides, phosphoinositides and phosphosphingosides.
  o Phosphoglycerides are the major phospholipids; they are found in membranes. They contain fatty acid molecules which are esterified to hydroxyl groups of glycerol. The glycerol group also forms an ester linkage with phosphoric acid. Example: Lecithins, Cephalins.
  o Phosphoinositides are said to occur in phospholipids of brain tissue and soybeans. They play an important role in transport processes in cells.
  o Phosphosphingosides are commonly found in nerve tissue. Example: sphingomyelins.
  o Glycolipids are compounds of fatty acids with carbohydrates; they contain nitrogen but no phosphoric acid.
The glycolipids also include certain structurally related compounds comprising the groups gangliosides, sulpholipids and sulfatides.
• Derived Lipids
  o Derived lipids are the substances derived from simple and compound lipids by hydrolysis. These include fatty acids, alcohols, monoglycerides and diglycerides, steroids, terpenes and carotenoids.
    ▪ The most common derived lipids are steroids, terpenes and carotenoids.
  o Steroids do not contain fatty acids; they are nonsaponifiable and are not hydrolyzed on heating. They are widely distributed in animals, where they are associated with physiological processes. Example: Estranes, androstanes, etc.
  o Terpenes are mostly found in plants. Example: natural rubber, geraniol, etc.
  o Carotenoids are tetraterpenes. They are widely distributed in both plants and animals, but are exclusively of plant origin. Due to the presence of many conjugated double bonds, they are colored red or yellow. Example: Lycopene, Carotenes, Xanthophylls.
• Essential fatty acids are those that cannot be constructed through any chemical pathway known to occur in humans. They must be obtained from the diet. Linoleic acid and linolenic acid are the essential fatty acids.
• Non-essential fatty acids are those which do not need to be taken in through the diet; they are synthesized through chemical pathways in the body.
• Unsaturated fatty acids have one or more double bonds between carbon atoms. The two carbon atoms joined by a double bond can occur in the cis or trans configuration.
• Saturated fatty acids are long-chain carboxylic acids that do not have double bonds. Example: Arachidic acid, Palmitic acid, etc.
STRUCTURE OF LIPIDS
• Lipids have no single common structure. The most commonly occurring lipids are triglycerides and phospholipids.
• Triglycerides are fats and oils. Triglycerides have a glycerol backbone bonded to three fatty acids. If the three fatty acids are similar, the triglyceride is known as a simple triglyceride. If the fatty acids are not similar, the triglyceride is known as a mixed triglyceride.
• The second most common class of lipids is the phospholipids. They are found in the membranes of animals and plants. Phospholipids contain glycerol and fatty acids; they also contain phosphoric acid and a low-molecular-weight alcohol. Common phospholipids are lecithins and cephalins.
FUNCTION OF LIPIDS
• Lipids perform several functions:
  o Lipids are storage compounds; triglycerides serve as the reserve energy of the body.
  o Lipids are an important component of cell membrane structure in eukaryotic cells.
  o Lipids regulate membrane permeability.
  o They are a source of fat-soluble vitamins like A, D, E and K.
  o They act as electrical insulators for the nerve fibres, where the myelin sheath contains lipids.
  o Lipids are components of some enzyme systems.
  o Some lipids, like prostaglandins and steroid hormones, act as cellular metabolic regulators.
  o Cholesterol is found in cell membranes, blood, and bile of many organisms.
  o As lipids are small molecules and are insoluble in water, they act as signalling molecules.
  o Layers of fat in the subcutaneous layer provide insulation and protection from cold. Body temperature maintenance is done by brown fat.
  o Polyunsaturated fatty acids are important constituents of phospholipids; they provide fluidity and flexibility to the cell membranes.
  o Lipoproteins, which are complexes of lipids and proteins, occur in blood as plasma lipoproteins; they enable the transport of lipids in an aqueous environment and throughout the body.
  o Cholesterol maintains the fluidity of membranes by interacting with lipids.
  o Cholesterol is the precursor of bile acids, Vitamin D and steroids.
  o Essential fatty acids like linoleic and linolenic acids are precursors of many different types of eicosanoids, including prostaglandins and thromboxanes. These play an important role in pain, fever, inflammation and blood clotting.
EXAMPLES OF LIPIDS
• A few well-known examples of lipids are as follows:
  1. Fatty acids - Oleic acid, Linoleic acid, Palmitoleic acid, Arachidonic acid.
  2. Fats and Oils - Animal fats: Butter, Lard, Human fat, Herring oil. Plant oils: Coconut, Corn, Palm, Peanut, Sunflower oil.
  3. Waxes - Spermaceti, Beeswax, Carnauba wax.
  4. Phospholipids - Lecithins, Cephalins, Plasmalogens, Phosphatidyl inositols, Sphingomyelins.
  5. Glycolipids - Kerasin, Phrenosin, Nervon, Oxynervon.
  6. Steroids - C29, C28, C27, C24, C21 steroids.
  7. Terpenes - Monoterpenes, Sesquiterpenes, Diterpenes, Triterpenes.
  8. Carotenoids - Lycopene, Carotenes, Xanthophylls.
PROTEINS
• Proteins
  o Heteropolymers made of strings of amino acids. Amino acids are joined together by the peptide bond, which is formed between the carboxyl group of one amino acid and the amino group of the next. Proteins are formed from 20 different amino acids and differ in the number and sequence of those amino acids.
• Functions of proteins
  o Structural - muscle fibers, connective tissue, collagen, keratin (hair, feathers).
  o Transport - hemoglobin, membrane channels and pumps.
  o Movement - contractile fibers in muscles (actin and myosin).
  o Immune response - antibodies, complement proteins, cell surface markers.
  o Catalysts - enzymes.
• Monomers of proteins: Amino Acids
FOUR LEVELS OF PROTEIN STRUCTURE
• Primary structure of protein - Here the protein exists as a long chain of amino acids arranged in a particular sequence. In this form the protein is non-functional.
• Secondary structure of protein - The long chain is folded, through hydrogen bonds formed between amino acids, into a helix (the alpha helix) or a pleated sheet (the beta sheet). Example of a pleated sheet: silk fibres.
• Tertiary structure of protein - Long polypeptide chains become more stable by folding and coiling, through the formation of ionic or hydrophobic bonds or disulphide bridges; this results in the tertiary structure of the protein.
• Quaternary structure of protein - When a protein is an assembly of more than one polypeptide or subunit, this is said to be the quaternary structure of the protein. Example: Haemoglobin, insulin.
NUCLEIC ACIDS
• Organic compounds with heterocyclic rings.
• Nucleic acids are polymers of nucleotides. A nucleotide consists of a nitrogenous base, a pentose sugar and a phosphate group.
• A nucleoside is made of a nitrogenous base attached to a pentose sugar.
  o The nitrogenous bases are adenine, guanine, thymine, cytosine and uracil. Polymerized nucleotides form DNA and RNA, which are the genetic material.
• Nucleic acid: a naturally occurring chemical compound that is capable of being broken down to yield phosphoric acid, sugars, and a mixture of organic bases (purines and pyrimidines). Nucleic acids are the main information-carrying molecules of the cell, and, by directing the process of protein synthesis, they determine the inherited characteristics of every living thing. The two main classes of nucleic acids are deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). DNA is the master blueprint for life and constitutes the genetic material in all free-living organisms and most viruses. RNA is the genetic material of certain viruses, but it is also found in all living cells, where it plays an important role in certain processes such as the making of proteins.
• Nucleic acids hold the genetic code (deoxyribonucleic acid and ribonucleic acid).
• Aid in protein synthesis, specifically ribonucleic acid.
• Monomers of Nucleic acids: Nucleotides
TYPES AND STRUCTURE OF NUCLEIC ACIDS
CHARACTERISTICS OF NUCLEIC ACIDS
• Nucleic acid, so named because it is found in a cell's nucleus, is a catch-all term for DNA and all types of RNA, and is the means by which an organism stores, translates and passes on its genetic information. Nucleic acids are made of chains of nucleotides, which are composed of a five-carbon sugar, a base and a phosphate group.
• Function
  o The main function is to store and transfer genetic information.
  o To use the genetic information to direct the synthesis of new protein.
  o Deoxyribonucleic acid is the storage place for genetic information in the cell.
  o DNA controls the synthesis of RNA in the cell.
  o The genetic information is transmitted from DNA to the protein synthesis machinery in the cell.
  o RNA also directs the production of new protein by transmitting genetic information to the protein-building structures.
  o The sequence of nitrogenous bases along the DNA backbone determines the proteins being synthesized.
  o The function of the double helix of DNA is to ensure that no disorders occur in the genetic information if it is lost or damaged.
  o RNA directs the synthesis of proteins.
  o m-RNA takes the genetic message from DNA.
  o t-RNA transfers activated amino acids to the site of protein synthesis.
  o r-RNA is mostly present in the ribosomes and is responsible for their stability.
DIFFERENCES OF DNA FROM RNA
DNA | RNA
Mostly found in nucleus and nucleoid | Mostly found in cytoplasm
Deoxyribonucleic acid | Ribonucleic acid
Deoxyribose is the sugar; the bases are A, T, C, G | Ribose is the sugar; the bases are A, U, C, G
Long polymer | Shorter than DNA
Base pairs: A-T, C-G | Base pairs: A-U, C-G
Double-stranded and exhibits a double-helix structure | Single-stranded, sometimes forms secondary and tertiary structures
Prefers B-form | Prefers A-form
More prone to UV damage | Less prone to UV damage
Carries the genetic information necessary for development, functioning, and reproduction | Mainly involved in protein synthesis, sometimes regulates gene expression
LESSON 3.1: DNA SYNTHESIS AND MANIPULATION
DNA STRUCTURE, FUNCTION AND SYNTHESIS
• Deoxyribonucleic acid
  o DNA is a polymer of the four nucleotides A, C, G, and T, which are joined through a backbone of alternating phosphate and deoxyribose sugar residues. These nitrogen-containing bases occur in complementary pairs as determined by their ability to form hydrogen bonds between them. A always pairs with T through two hydrogen bonds, and G always pairs with C through three hydrogen bonds. The spans of A:T and G:C hydrogen-bonded pairs are nearly identical, allowing them to bridge the sugar-phosphate chains uniformly. This structure, along with the molecule's chemical stability, makes DNA the ideal genetic material. The bonding between complementary bases also provides a mechanism for the replication of DNA and the transmission of genetic information.
STRUCTURE OF DNA
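The base-pairing rules described above (A with T through two hydrogen bonds, G with C through three) can be illustrated with a short Python sketch. This is only an illustration under simple assumptions: the function names and the example sequence are hypothetical, and strands are treated as plain strings containing only A, T, G and C.

```python
# Watson-Crick pairing rules from the notes: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds contributed by the pair each base takes part in
# (A:T pairs have two, G:C pairs have three).
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the complement read in the opposite direction,
    as the partner strand of a double helix would be written."""
    return complement(strand)[::-1]


def hydrogen_bonds(strand: str) -> int:
    """Count the hydrogen bonds holding this strand to its partner."""
    return sum(H_BONDS[base] for base in strand.upper())


if __name__ == "__main__":
    example = "ATGCGT"  # hypothetical sequence, for illustration only
    print(complement(example))          # TACGCA
    print(reverse_complement(example))  # ACGCAT
    print(hydrogen_bonds(example))      # 15
```

Because each base has exactly one partner, either strand is enough to rebuild the other, which is the replication mechanism the notes refer to.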
NUCLEIC ACID COMPONENTS
• Sugar - deoxyribose
• Base + sugar = Nucleoside (N-glycosidic bond).
• Nucleoside + phosphoric acid = Nucleotide (ester bond).
• Nucleic Acids - condensation polymers of nucleotides (nucleotide to nucleotide: phosphodiester bond).
• Watson-Crick double helical structure of DNA and the forces responsible for the stability of the helix.
• Functions of Nucleic Acids:
  o Transmission of hereditary characters (DNA).
  o Storehouse of genetic information; control protein synthesis in the cell; direct the synthesis of RNA.
• Properties of Nucleic Acids:
  o Optical Property: Absorbance in UV at 260 nm
  o Melting Temperature: Tm analysis
• Importance of Nucleic Acids:
  o DNA is the biological molecule that stores all the genetic information of the cell (in some viruses RNA may function as the molecule that stores the genetic information). Everything that the cell has to do, at what time in its life cycle, and how it has to do it is determined by the information contained in the DNA molecule. In addition, DNA functions as the molecule that carries the genetic information from parent to offspring.
DNA EXTRACTION AND ISOLATION
• DNA isolation is the purification of DNA from a sample using a combination of physical and chemical methods. The first isolation of DNA was done in 1869 by Friedrich Miescher. Currently it is a routine procedure in molecular biology and forensic analyses. For the chemical method, there are many different kits used for extraction, and selecting the correct one will save time on kit optimization and extraction procedures. PCR sensitivity detection is considered to show the variation between the commercial kits.
• There are three basic and two optional steps in a DNA extraction:
  o Cells which are to be studied need to be collected.
  o Breaking the cell membranes open to expose the DNA along with the cytoplasm within (cell lysis).
    ▪ Lipids from the cell membrane and the nucleus are broken down with detergents and surfactants.
    ▪ Breaking down proteins by adding a protease (optional).
    ▪ Breaking down RNA by adding an RNase (optional)-for elongation.
  o The solution is treated with a concentrated salt solution (saline) to make debris such as broken proteins, lipids and RNA clump together.
  o Centrifugation of the solution, which separates the clumped cellular debris from the DNA.
  o DNA purification from detergents, proteins, salts and reagents used during the cell lysis step. The most commonly used procedures are:
    ▪ Ethanol precipitation, usually by ice-cold ethanol or isopropanol. Since DNA is insoluble in these alcohols, it will aggregate together, giving a pellet upon centrifugation. Precipitation of DNA is improved by increasing the ionic strength, usually by adding sodium acetate.
    ▪ Phenol-chloroform extraction, in which phenol denatures proteins in the sample. After centrifugation of the sample, denatured proteins stay in the organic phase while the aqueous phase containing nucleic acid is mixed with chloroform to remove phenol residues from the solution.
    ▪ Minicolumn purification, which relies on the fact that nucleic acids may bind (adsorption) to the solid phase (silica or other) depending on the pH and the salt concentration of the buffer.
• Cellular and histone proteins bound to the DNA can be removed by adding a protease, by precipitating the proteins with sodium or ammonium acetate, or by extracting them with a phenol-chloroform mixture prior to the DNA precipitation.
• After isolation, the DNA is dissolved in a slightly alkaline buffer, usually TE buffer, or in ultra-pure water.
DETECTION OF DNA
• A diphenylamine (DPA) indicator will confirm the presence of DNA. This procedure involves chemical hydrolysis of DNA: when heated (e.g., ≥95 °C) in acid, the reaction requires a deoxyribose sugar and therefore is specific for DNA.
Under these conditions, the 2-deoxyribose is converted to ω-hydroxylevulinyl aldehyde, which reacts with diphenylamine to produce a blue-colored compound. DNA concentration can be determined by measuring the absorbance of the solution at 600 nm with a spectrophotometer and comparing it to a standard curve of known DNA concentrations.
• Measuring the absorbance of the DNA solution at wavelengths 260 nm and 280 nm is used as a measure of DNA purity. DNA absorbs UV light at 260 and 280 nanometres, and aromatic proteins absorb UV light at 280 nm; a pure sample of DNA has a 260/280 ratio of 1.8 and is relatively free from protein contamination. A DNA preparation that is contaminated with protein will have a 260/280 ratio lower than 1.8.
• DNA can be quantified by cutting the DNA with a restriction enzyme, running it on an agarose gel, staining with ethidium bromide (EtBr) or a different stain, and comparing the intensity of the DNA with a DNA marker of known concentration.
• Using the Southern blot technique, this quantified DNA can be isolated and examined further using PCR and RFLP analysis. These procedures allow differentiation of the repeated sequences within the genome. It is these techniques which forensic scientists use for comparison, identification, and analysis.
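The 260/280 purity check described in this section can be sketched in a few lines of Python. This is a minimal sketch, not a validated laboratory procedure: the function names are hypothetical, and the tolerance of 0.1 below the reference ratio of 1.8 is an assumed cut-off chosen only for illustration.

```python
def a260_a280_ratio(a260: float, a280: float) -> float:
    """Return the ratio of absorbance at 260 nm to absorbance at 280 nm."""
    if a280 <= 0:
        raise ValueError("A280 must be a positive absorbance reading")
    return a260 / a280


def looks_protein_contaminated(a260: float, a280: float,
                               pure_ratio: float = 1.8,
                               tolerance: float = 0.1) -> bool:
    """Flag a DNA preparation whose ratio falls clearly below ~1.8.

    pure_ratio comes from the notes; tolerance is an assumption.
    """
    return a260_a280_ratio(a260, a280) < pure_ratio - tolerance


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    print(looks_protein_contaminated(0.90, 0.50))  # ratio 1.8
    print(looks_protein_contaminated(0.75, 0.50))  # ratio 1.5
```

A ratio below the reference value only suggests contamination; the notes give 1.8 as the value expected for a pure DNA sample.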
DNA AMPLIFICATION
• Polymerase chain reaction (PCR) is a technique used to exponentially amplify a specific target DNA sequence, allowing for the isolation, sequencing, or cloning of a single sequence among many. PCR was developed in 1983 by Kary Mullis, who received a Nobel Prize in chemistry in 1993 for his invention. The polymerase chain reaction has been elaborated in many ways since its introduction and is now commonly used for a wide variety of applications including genotyping, cloning, mutation detection, sequencing, microarrays, forensics, and paternity testing.
• Typically, a PCR is a three-step reaction. The sample containing a dilute concentration of template DNA is mixed with a heat-stable DNA polymerase, such as Taq polymerase, primers, deoxynucleoside triphosphates (dNTPs), and magnesium. In the first step of PCR, the sample is heated to 95-98°C, which denatures the double-stranded DNA, splitting it into two single strands. In the second step, the temperature is decreased to approximately 55-65°C, allowing the primers to bind, or anneal, to specific sequences of DNA at each end of the target sequence, also known as the template. In the third step, the temperature is typically increased to 72°C, allowing the DNA polymerase to extend the primers by the addition of dNTPs to create a new strand of DNA, thus doubling the quantity of DNA in the reaction. This sequence of denaturation, annealing, and extension is repeated for many cycles, resulting in the exponential amplification of the template DNA. As the DNA polymerase loses activity or the dNTPs and primers are consumed, the reaction rate reaches a plateau.
DNA QUANTITATION
• There are several ways to quantitate solutions of nucleic acids. If the solution is pure, one can use a spectrophotometer to measure the amount of ultraviolet radiation absorbed by the bases. DNA can also be quantified by measuring the UV-induced emission of fluorescence from intercalated ethidium bromide.
This method is useful if there is not enough DNA to quantify with a spectrophotometer, or if the DNA solution is contaminated. The common practice for quantitating DNA or RNA is to use a spectrophotometer, which determines the average concentration of the nucleic acids (DNA or RNA) present in a mixture, as well as their purity.
• Spectrophotometric analysis is based on the principle that nucleic acids absorb ultraviolet light in a specific pattern. In the case of DNA and RNA, a sample is exposed to ultraviolet light at a wavelength of 260 nanometers (nm) and a photodetector measures the light that passes through the sample. Some of the ultraviolet light will pass through and some will be absorbed by the DNA or RNA. The more light absorbed by the sample, the higher the nucleic acid concentration in the sample. The resulting effect is that less light will strike the photodetector, and this will produce a higher optical density (OD).
• Using the Beer-Lambert Law it is possible to relate the amount of light absorbed to the concentration of the absorbing molecule. At a wavelength of 260 nm, the average extinction coefficient for double-stranded DNA is 0.020 (μg/ml)^-1 cm^-1, for single-stranded DNA it is 0.027 (μg/ml)^-1 cm^-1, for single-stranded RNA it is 0.025 (μg/ml)^-1 cm^-1, and for short single-stranded oligonucleotides it is dependent on the length and base composition. Thus, an Absorbance (A) of 1 corresponds to a concentration of 50 μg/ml for double-stranded DNA (1 / 0.020 = 50).
• This method of calculation is valid for up to an A of at least 2. A more accurate extinction coefficient may be needed for oligonucleotides; these can be predicted using the nearest-neighbor model.
QUANTITATION WITH A SPECTROPHOTOMETER (LAB)
• Use H2O or 1X TE as a solvent to suspend the nucleic acids, and place each sample in a quartz cuvette. Zero the spectrophotometer with a sample of solvent.
For more accurate readings of the nucleic acid sample of interest, dilute the sample to give readings between 0.1 and 1.0.
• For a 1-cm pathlength, the optical density at 260 nm (OD260) equals 1.0 for the following solutions:
  o a 50 μg/mL solution of dsDNA
  o a 33 μg/mL solution of ssDNA
  o a 20-30 μg/mL solution of oligonucleotide
  o a 40 μg/mL solution of RNA
• Contamination of nucleic acid solutions makes spectrophotometric quantitation inaccurate. Calculate the OD260/OD280 ratio for an indication of nucleic acid purity. Pure DNA has an OD260/OD280 ratio of ~1.8; pure RNA has an OD260/OD280 ratio of ~2.0. Low ratios could be caused by protein or phenol contamination.
• Example of Calculation: A sample of dsDNA was diluted 50X. The diluted sample gave a reading of 0.65 on a spectrophotometer at OD260. To determine the concentration of DNA in the original sample, perform the following calculation:
  o dsDNA concentration = 50 μg/mL × OD260 × dilution factor
  o dsDNA concentration = 50 μg/mL × 0.65 × 50
  o dsDNA concentration = 1,625 μg/mL ≈ 1.63 mg/mL
  o Calculations
    ▪ The optical density is generated from the equation:
    ▪ Optical Density = Log (Intensity of Incident Light / Intensity of Transmitted Light)
    ▪ In practical terms, a sample that contains no DNA or RNA should not absorb any of the ultraviolet light and should therefore produce an OD of 0:
    ▪ Optical Density = Log (100/100) = 0
    ▪ When using spectrophotometric analysis to determine the concentration of DNA or RNA, the Beer-Lambert law is used to determine unknown concentrations without the need for standard curves. In essence, the Beer-Lambert law makes it possible to relate the amount of light absorbed to the concentration of the absorbing molecule. The following conversion factors are used to convert the OD of unknown nucleic acid samples to concentration:
    ▪ A260 dsDNA = 50 µg/ml
    ▪ A260 ssDNA = 33 µg/ml
    ▪ A260 ssRNA = 40 µg/ml
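The worked example and the optical density equation above can be reproduced with a short Python sketch. It assumes a 10 mm (1 cm) path length and uses the conversion factors listed in these notes; the function and dictionary names are hypothetical and chosen only for illustration.

```python
import math

# Conversion factors from the notes: concentration (µg/mL) that gives
# an OD260 of 1.0 at a 1 cm path length.
CONVERSION_UG_PER_ML = {"dsDNA": 50.0, "ssDNA": 33.0, "ssRNA": 40.0}


def optical_density(incident: float, transmitted: float) -> float:
    """OD = log10(incident light intensity / transmitted light intensity)."""
    return math.log10(incident / transmitted)


def concentration_ug_per_ml(od260: float, dilution_factor: float = 1.0,
                            kind: str = "dsDNA") -> float:
    """Concentration of the original sample, in µg/mL.

    concentration = conversion factor x OD260 x dilution factor
    """
    return CONVERSION_UG_PER_ML[kind] * od260 * dilution_factor


if __name__ == "__main__":
    # A blank that absorbs nothing: all incident light is transmitted.
    print(optical_density(100, 100))  # 0.0

    # The example from the notes: dsDNA diluted 50X, OD260 reading of 0.65.
    conc = concentration_ug_per_ml(0.65, dilution_factor=50, kind="dsDNA")
    print(f"{conc:.0f} µg/mL = {conc / 1000:.3f} mg/mL")
```

The result, 1,625 µg/mL, rounds to the 1.63 mg/mL quoted in the example. Keeping the conversion factors in one table makes it easy to switch between dsDNA, ssDNA and ssRNA samples.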
• Conversion Factors
o When using a 10 mm path length, simply multiply the OD by the conversion factor to determine the concentration. For example, a 2.0 OD dsDNA sample corresponds to a concentration of 100 µg/ml.
o When using a path length shorter than 10 mm, the resultant OD is reduced by a factor of 10/path length (in mm). Using the example above with a 3 mm path length, the OD for the 100 µg/ml sample would be reduced to 0.6. To normalize the concentration to a 10 mm equivalent, the following is done:
o 0.6 OD × (10/3) × 50 µg/ml = 100 µg/ml
o Most spectrophotometers allow selection of the nucleic acid type and path length so that the resultant concentration is normalized to the 10 mm path length, in accordance with Beer's law.
• A260 as Quantity Measurement
o The "A260 unit" is used as a quantity measure for nucleic acids. One A260 unit is the amount of nucleic acid contained in 1 mL and producing an OD of 1. The same conversion factors apply, and therefore, in such contexts:
o 1 A260 unit dsDNA = 50 µg
o 1 A260 unit ssDNA = 33 µg
o 1 A260 unit ssRNA = 40 µg
• Sample Purity (260:280 / 260:230 Ratios)
o It is common for nucleic acid samples to be contaminated with other molecules (e.g. proteins and organic compounds). A secondary benefit of using spectrophotometric analysis for nucleic acid quantitation is the ability to determine sample purity using the 260 nm:280 nm calculation. The ratio of the absorbance at 260 and 280 nm (A260/280) is used to assess the purity of nucleic acids. For pure DNA, A260/280 is widely considered to be ~1.8, although it has been argued (on the grounds of numeric errors in the original Warburg paper) that this value actually corresponds to a mix of 60% protein and 40% DNA. The ratio for pure RNA A260/280 is ~2.0. These ratios are commonly used to assess the amount of protein contamination that is left from the nucleic acid isolation process, since proteins absorb at 280 nm.
o The ratio of absorbance at 260 nm vs 280 nm is commonly used to assess DNA contamination of protein solutions, since proteins (in particular, the aromatic amino acids) absorb light at 280 nm. The reverse, however, is not true; it takes a relatively large amount of protein contamination to significantly affect the 260:280 ratio in a nucleic acid solution.
o 260:280 ratio has high sensitivity for nucleic acid contamination in protein:

% protein   % nucleic acid   260:280 ratio
100         0                0.57
95          5                1.06
90          10               1.32
70          30               1.73

o 260:280 ratio lacks sensitivity for protein contamination in nucleic acids (table shown for RNA; 100% DNA is approximately 1.8):

% nucleic acid   % protein   260:280 ratio
100              0           2.00
95               5           1.99
90               10          1.98
70               30          1.94

• This difference is due to the much higher mass attenuation coefficient nucleic acids have at 260 nm and 280 nm, compared to that of proteins. Because of this, even for relatively high concentrations of protein, the protein contributes relatively little to the 260 and 280 absorbance. While the protein contamination cannot be reliably assessed with a 260:280 ratio, this also means that it contributes little error to DNA quantity estimation.
LESSON 3.2: RNA SYNTHESIS AND MANIPULATION
RNA STRUCTURE, FUNCTION AND SYNTHESIS
RNA STRUCTURE
• Like DNA, RNA is a linear polymer made of four different types of nucleotide subunits linked together by phosphodiester bonds. It differs from DNA chemically in two respects:
o the nucleotides in RNA are ribonucleotides; that is, they contain the sugar ribose (hence the name ribonucleic acid) rather than deoxyribose
o although RNA, like DNA, contains the bases adenine (A), guanine (G), and cytosine (C), it contains the base uracil (U) instead of the thymine (T) in DNA. In RNA, G pairs with C, and A pairs with U
o (a) Ribonucleotides contain the pentose sugar ribose instead of the deoxyribose found in deoxyribonucleotides. (b) RNA contains the pyrimidine uracil in place of thymine found in DNA.
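The base differences and pairing rules above (U in place of T, G with C, A with U) can be written out as a minimal Python sketch. The function names and the example sequence are hypothetical and chosen only for illustration.

```python
# Pairing rules for RNA as stated in the notes: G pairs with C, A pairs with U.
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def dna_to_rna(dna):
    """Rewrite a DNA sequence in RNA letters: thymine (T) becomes uracil (U)."""
    return dna.upper().replace("T", "U")


def rna_complement(rna):
    """Return the RNA strand that would base-pair with `rna`, written 5' to 3'."""
    # Paired strands run antiparallel, so the sequence is read in reverse.
    return "".join(RNA_PAIR[base] for base in reversed(rna.upper()))


rna = dna_to_rna("GATTACA")      # hypothetical DNA sequence
print(rna)                       # GAUUACA
print(rna_complement(rna))       # UGUAAUC
```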
• Whereas DNA always occurs in cells as a double-stranded helix, RNA is single-stranded. An RNA chain can therefore fold up into a particular shape, just as a polypeptide chain folds up to form the final shape of a protein.
o (a) DNA is typically double stranded, whereas RNA is typically single stranded.
o (b) RNA can fold upon itself, with the folds stabilized by short areas of complementary base pairing within the molecule, forming a three-dimensional structure.
RNA FUNCTIONS
• Cells access the information stored in DNA by creating RNA to direct the synthesis of proteins through the process of translation. The three main types of RNA directly involved in protein synthesis are messenger RNA (mRNA), ribosomal RNA (rRNA), and transfer RNA (tRNA).

Type | Structure | Function
mRNA | Short, unstable, single-stranded RNA corresponding to a gene encoded within DNA | Serves as intermediary between DNA and protein; used by the ribosome to direct synthesis of the protein it encodes
rRNA | Longer, stable RNA molecules composing 60% of the ribosome’s mass | Ensures the proper alignment of mRNA, tRNA, and ribosome during protein synthesis; catalyzes peptide bond formation between amino acids
tRNA | Short (70-90 nucleotides), stable RNA with extensive intramolecular base pairing; contains an amino acid binding site and an mRNA binding site | Carries the correct amino acid to the site of protein synthesis in the ribosome

RNA SYNTHESIS
• RNA synthesis, or transcription, is the process of transcribing DNA nucleotide sequence information into RNA sequence information. RNA synthesis is catalyzed by a large enzyme called RNA polymerase. The basic biochemistry of RNA synthesis is common to prokaryotes and eukaryotes, although its regulation is more complex in eukaryotes.
• There are three phases of transcription: initiation, elongation and termination. It is easier to understand the process by first examining elongation, then initiation and termination.
• Elongation
o RNA polymerase links ribonucleotides together in a 5' to 3' direction. The polymerase induces the 3' hydroxyl group of the nucleotide at the 3' end of the growing RNA chain to attack (as a nucleophile) the α phosphorus of the incoming ribonucleotide.
A diphosphate (pyrophosphate) is released and the 5' carbon of the incoming nucleotide is linked through a phosphodiester bond to the 3' carbon of the preceding nucleotide.
o Nucleotide incorporation is determined by base pairing with the template strand of the DNA. The template is the DNA strand, also called the antisense (noncoding) strand, that is copied by the RNA polymerase into a complementary strand of RNA called the transcript. The DNA strand that is not copied is known as the sense (coding) strand. Note that while the RNA chain grows in a 5' to 3' direction, the polymerase migrates along the template strand in a 3' to 5' direction. Thus the 5' to 3' ribonucleotide sequence of the RNA transcript is identical to the 5' to 3' sequence of the sense DNA strand, with uracil in place of thymine.
• Initiation
o The initiation of transcription is directed by DNA sequences called promoters, which tell the RNA polymerase where to begin transcription. The subunits that enable RNA polymerases to recognize and bind promoters are called initiation factors. The initiating nucleotide can be either a purine or pyrimidine. There are numerous eukaryotic promoters with multiple promoter sequence elements. Some of the elements specify where transcription is to be initiated; others determine the frequency with which transcription is initiated at a specific gene. The initiation of transcription in eukaryotes is complicated and involves numerous factors (proteins) that must interact with the DNA and with one another to initiate transcription.
o Promoters
▪ Only one strand of the DNA that encodes a promoter, a regulatory sequence, or a gene needs to be written.
▪ The strand that is written is the one that is identical to the RNA transcript; thus the sense (coding) strand of the DNA is always selected for presentation.
▪ The first base on the DNA where transcription actually starts is labeled +1.
▪ Sequences that precede the first base of the transcript (that is, upstream sequences) are labeled with negative numbers.
Sequences that follow the first base of the transcript (that is, downstream sequences) are labeled with positive numbers.
o RNA pol II promoters are quite diverse. This enables the cell to choose and regulate the expression of the tens of thousands of different genes encoded by its DNA. There are some sequence elements that are conserved and found in most RNA pol II promoters. There are three "boxes": the TATA box, usually found 25 to 35 base pairs upstream, and the CAAT box and the GC box, both located from 40 to 200 base pairs upstream.
o These three elements provide a basal level of transcription and are found in most "housekeeping" genes. Housekeeping genes encode enzymes and proteins that all cell types require for normal function and are usually expressed at steady state or basal levels. Other sequence elements, which are continually being discovered, serve as regulatory elements that enable a cell to specifically turn non-housekeeping genes on or off in response to environmental signals such as hormones, growth factors, metals and toxins. The spacing and orientation of all of the sequence elements are critical for proper functioning. There is a third type of sequence element, called an enhancer or silencer, that can be located either upstream or downstream relative to the initiation site. Enhancers and silencers affect the rate and frequency of initiation of transcription.
o RNA pol III promoters for tRNA are found downstream of the initiation point. These promoters consist of two elements, the first of which is located 8 to 30 base pairs downstream and is called Box A. The second element is 50 to 70 base pairs downstream and is called Box B.
o The RNA pol I promoter consists of a 70 base pair long core element and an upstream element that is about 100 base pairs long. The core spans a segment of DNA that includes sequences that are both upstream and downstream of the initiation site.
• Termination
o Prokaryotes use two means for terminating transcription: factor-independent and factor-dependent. Certain DNA sequences function as signals that tell the RNA polymerase to terminate transcription. The DNA of a terminator sequence encodes an inverted repeat followed by an adjacent stretch of uracils in the transcript. Factor-dependent termination involves a terminator sequence as well as a factor or protein called rho. The mechanisms by which eukaryotes terminate transcription are poorly understood. Most eukaryotic genes are transcribed for up to several thousand base pairs beyond the actual end of the gene.
The excess RNA is then cleaved from the transcript when the RNA is processed into its mature form.
REVERSE TRANSCRIPTION
• Reverse transcription is the enzyme-mediated synthesis of a DNA molecule from an RNA template. The resulting DNA, known as complementary DNA (cDNA), can be used as a template for PCR amplification. Reverse transcription followed by PCR is known as RT-PCR (reverse transcription-PCR).
• RT reactions typically include the following reagents:
• RNA template - RNA can be purified from a variety of biological sample types. RNA is inherently unstable, so strict precautions must be taken to prevent degradation during the extraction process and subsequent handling steps. The quality and purity of the RNA template are crucial to the success of RT-PCR.
• Reverse transcriptase - Reverse transcriptase enzymes are RNA-directed DNA polymerases derived from retroviruses that can synthesize single-stranded cDNA fragments complementary to an RNA template.
• Oligonucleotide primers - Three different types of primers can be used in the RT reaction:
o Oligo(dT) - Primers that selectively anneal to the poly(A) tails found on most messenger RNA (mRNA) molecules. Only polyadenylated RNA will be reverse transcribed in an oligo(dT)-primed reaction.
o Random hexamers - A mixture of random hexanucleotide primers that anneal to sequences throughout the target RNA, resulting in reverse transcription of both polyadenylated and nonpolyadenylated RNAs. Random hexamer-primed cDNA can be used in multiple PCR reactions, allowing for analysis of more than one target region.
o Sequence-specific - Primers that hybridize to a specified gene sequence and result in reverse transcription of a specific mRNA. When using sequence-specific primers, a new RT reaction must be performed for each gene of interest.
• Nucleotides - The four dNTPs (dATP, dTTP, dCTP, and dGTP) are the building blocks of the cDNA strands synthesized by reverse transcription.
• RNase inhibitor - Addition of an RNase inhibitor helps protect the RNA template from the activity of RNA-degrading ribonucleases present in the laboratory environment.
• Reaction buffer - RT reaction buffer provides optimal conditions for enzyme activity. An appropriate buffer is usually supplied with the reverse transcriptase enzyme.
RNA QUANTITATION
• Common methods used to quantitate RNA:
• UV Spectroscopy
o The traditional method for assessing RNA concentration and purity. The absorbance of a diluted RNA sample is measured at 260 and 280 nm. The nucleic acid concentration is calculated using the Beer-Lambert law, which predicts a linear change in absorbance with concentration. UV spectroscopy is the most widely used method to quantitate RNA. It is simple to perform, and UV spectrophotometers are available in most laboratories.
• Agilent 2100 Bioanalyzer
o The Agilent 2100 bioanalyzer uses a combination of microfluidics, capillary electrophoresis, and a fluorescent dye that binds to nucleic acid to evaluate both RNA concentration and integrity. The ladder and samples are loaded in designated wells on the RNA Lab Chip. Size and mass information is provided by the fluorescence of RNA molecules as they move through the channels of the chip. The instrument software automatically compares the peak areas from unknown RNA samples to the combined area of the six RNA 6000 Ladder RNA peaks to determine the concentration of the unknown samples. The RNA 6000 Nano System has a broad dynamic range and can quantitate between 25 and 500 ng/µl of RNA with a coefficient of variation of ~10%.
o Perhaps the most powerful feature of the Agilent 2100 bioanalyzer is its ability to provide information about RNA integrity. As each RNA sample is analyzed, the software generates both a gel-like image and an electropherogram. When analyzing total RNA, the areas under the 18S and 28S ribosomal RNA peaks are used to calculate the ratio of these two major
ribosomal RNA species, and these data are displayed along with quantitation data on individual electropherograms. Significant changes in the ratios of the 18S and 28S ribosomal RNA peaks are indicative of degraded RNA.
• In addition to its usefulness for analysis of total RNA, the bioanalyzer is also a superior tool for analyzing mRNA and amplified aRNA (antisense RNA) integrity. Intact mRNA and aRNA profiles consist of a broad distribution of signal, with the bulk of the RNA usually falling between 1 and 2 kb, though this will vary from tissue to tissue. A significant shift of the profile towards lower molecular weights is indicative of poor RNA integrity.
APPLICATION OF RNA TECHNOLOGY
RNAi TECHNOLOGY
• RNA interference (RNAi) is the process by which the translation of a protein is prevented by selective degradation of the mRNA that encodes it. In nature, this mechanism likely evolved for cells to eliminate unwanted foreign genes as a defense against viruses. In research, this technique is used for loss-of-function studies.
• RNAi technology has the ability to validate target genes and functionally assess relevant disease genes, thus leading to the development of effective therapeutics. The discovery of RNA interference earned Craig Mello and Andrew Fire the 2006 Nobel Prize in Physiology or Medicine, highlighting the importance of this technology for the future of disease research and drug development.
LESSON 3.3: PROTEIN SYNTHESIS AND MANIPULATION
PROTEINS
• Proteins are polymers of amino acids. Each amino acid contains a central carbon, a hydrogen, a carboxyl group, an amino group, and a variable R group. The R group specifies which class of amino acids it belongs to: electrically charged hydrophilic side chains, polar but uncharged side chains, nonpolar hydrophobic side chains, and special cases.
• Proteins have different “layers” of structure: primary, secondary, tertiary, quaternary.
• Proteins have a variety of functions in cells.
Major functions include acting as enzymes, receptors, transport molecules, regulatory proteins for gene expression, and so on. Enzymes are biological catalysts that speed up a chemical reaction without being permanently altered. They have “active sites” where the substrate/reactant binds, and they can be either activated or inhibited (competitive and/or noncompetitive inhibitors).
AMINO ACIDS
• Proteins are one of the most abundant organic molecules in living systems and have the most diverse range of functions of all macromolecules. Proteins may be structural, regulatory, contractile, or protective; they may serve in transport, storage, or membranes; or they may be toxins or enzymes. Each cell in a living system may contain thousands of proteins, each with a unique function. Their structures, like their functions, vary greatly. They are all, however, polymers of amino acids, arranged in a linear sequence.
o Amino acids have a central asymmetric carbon to which an amino group, a carboxyl group, a hydrogen atom, and a side chain (R group) are attached.
• Amino acids are the monomers that make up proteins. Each amino acid has the same fundamental structure, which consists of a central carbon atom, also known as the alpha (α) carbon, bonded to an amino group (NH2), a carboxyl group (COOH), and a hydrogen atom. Every amino acid also has another atom or group of atoms bonded to the central atom, known as the R group (Figure 1).
• The name “amino acid” is derived from the fact that these molecules contain both an amino group and a carboxylic acid group in their basic structure. As mentioned, there are 20 amino acids present in proteins. Nine of these are considered essential amino acids in humans because the human body cannot produce them and they must be obtained from the diet.
o There are 20 amino acids commonly found in proteins, each with a different R group (variant group) that determines its chemical nature.
• Which categories of amino acid would you expect to find on the surface of a soluble protein, and which would you expect to find in the interior? What distribution of amino acids would you expect to find in a protein embedded in a lipid bilayer?
• The chemical nature of the side chain determines the nature of the amino acid (that is, whether it is acidic, basic, polar, or nonpolar). For example, the amino acid glycine has a hydrogen atom as the R group. Amino acids such as valine, methionine, and alanine are nonpolar or hydrophobic in nature, while amino acids such as serine, threonine, and cysteine are polar and have hydrophilic side chains. The side chains of lysine and arginine are positively charged, and therefore these amino acids are also known as basic amino acids. Proline has an R group that is linked to the amino group, forming a ring-like structure. Proline is an exception to the standard structure of an amino acid since its amino group is not separate from the side chain.
• Amino acids are represented by a single upper case letter or a three-letter abbreviation. For example, valine is known by the letter V or the three-letter symbol Val. Just as some fatty acids are essential to a diet, some amino acids are necessary as well. They are known as essential amino acids, and in humans they include isoleucine, leucine, and lysine. Essential amino acids are those that are necessary for construction of proteins in the body but are not produced by the body; which amino acids are essential varies from organism to organism.
o Peptide bond formation is a dehydration synthesis reaction. The carboxyl group of one amino acid is linked to the amino group of the incoming amino acid. In the process, a molecule of water is released.
• The sequence and the number of amino acids ultimately determine the protein’s shape, size, and function. Each amino acid is attached to another amino acid by a covalent bond, known as a peptide bond, which is formed by a dehydration reaction. The carboxyl group of one amino acid and the amino group of the incoming amino acid combine, releasing a molecule of water.
The resulting bond is the peptide bond (Figure 3). • The products formed by such linkages are called peptides. As more amino acids join to this growing chain, the resulting chain is known as a polypeptide. Each polypeptide has a free amino group at one end. This end is called the N terminal, or the amino terminal, and the other end has a free carboxyl group, also known as the C or carboxyl terminal. While the terms polypeptide and protein are sometimes used interchangeably, a polypeptide is technically a polymer of amino acids, whereas the term protein is used for a polypeptide or polypeptides that have combined together, often have bound non-peptide prosthetic groups, have a distinct shape, and have a unique function. After protein synthesis (translation), most proteins are modified. These are known as post-translational modifications. They may undergo cleavage, phosphorylation, or may require the addition of other chemical groups. Only after these modifications is the protein completely functional. THE EVOLUTIONARY SIGNIFICANCE OF CYTOCHROME C • Cytochrome c is an important component of the electron transport chain, a part of cellular respiration, and it is normally found in the cellular organelle, the mitochondrion. This protein has a heme prosthetic group, and the central ion of the heme gets alternately reduced and oxidized during electron transfer. Because this essential protein’s role in producing cellular energy is crucial, it has changed very little over millions of years. Protein sequencing has shown that there is a considerable amount of cytochrome c amino acid sequence homology among different species; in other words, evolutionary kinship can be assessed by measuring the similarities or differences among various species’ DNA or protein sequences. • Scientists have determined that human cytochrome c contains 104 amino acids. 
For each cytochrome c molecule from different organisms that has been sequenced to date, 37 of these amino acids appear in the same position in all samples of cytochrome c. This indicates that there may have been a common ancestor. On comparing the human and chimpanzee protein sequences, no sequence difference was found. When human and rhesus monkey sequences were compared, the single difference found was in one amino acid. In another comparison, human to yeast sequencing shows a difference in the 44th position. PROTEIN STRUCTURE • As discussed earlier, the shape of a protein is critical to its function. For example, an enzyme can bind to a specific substrate at a site known as the active site. If this active site is altered because of local changes or changes in overall protein structure, the enzyme may be unable to bind to the substrate. To understand how the protein gets its final shape or conformation, we need to understand the four levels of protein structure: primary, secondary, tertiary, and quaternary. PRIMARY STRUCTURE • The unique sequence of amino acids in a polypeptide chain is its primary structure. For example, the pancreatic hormone insulin has two polypeptide chains, A and B, and they are linked together by disulfide bonds. The N terminal amino acid of the A chain is glycine, whereas the C terminal amino acid is asparagine (Figure 4). The sequences of amino acids in the A and B chains are unique to insulin. Bovine serum insulin is a protein hormone made of two peptide chains, A (21 amino acids long) and B (30 amino acids long). In each chain, primary structure is indicated by three-letter abbreviations that represent the names of the amino acids in the order they are present. The amino acid
cysteine (cys) has a sulfhydryl (SH) group as a side chain. Two sulfhydryl groups can react in the presence of oxygen to form a disulfide (S-S) bond. Two disulfide bonds connect the A and B chains together, and a third helps the A chain fold into the correct shape. Note that all disulfide bonds are the same length, but are drawn different sizes for clarity.
• The unique sequence for every protein is ultimately determined by the gene encoding the protein. A change in the nucleotide sequence of the gene’s coding region may lead to a different amino acid being added to the growing polypeptide chain, causing a change in protein structure and function. In sickle cell anemia, the hemoglobin β chain (a small portion of which is shown in Figure 5) has a single amino acid substitution, causing a change in protein structure and function.
o The beta chain of hemoglobin is 147 residues in length, yet a single amino acid substitution leads to sickle cell anemia. In normal hemoglobin, the amino acid at position seven is glutamate. In sickle cell hemoglobin, this glutamate is replaced by a valine.
• Specifically, the amino acid glutamic acid is substituted by valine in the β chain. What is most remarkable to consider is that a hemoglobin molecule is made up of two alpha chains and two beta chains that each consist of about 150 amino acids. The molecule, therefore, has about 600 amino acids. The structural difference between a normal hemoglobin molecule and a sickle cell molecule, which dramatically decreases life expectancy, is a single amino acid of the 600. What is even more remarkable is that those 600 amino acids are encoded by three nucleotides each, and the mutation is caused by a single base change (point mutation), 1 in 1800 bases.
o In this blood smear, visualized at 535x magnification using bright field microscopy, sickle cells are crescent shaped, while normal cells are disc-shaped.
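The effect of a single base change on the encoded protein can be illustrated with a short Python sketch. The function names are hypothetical, the codon table contains only the codons needed here, and the DNA string is the commonly quoted start of the β-globin coding sequence; treat it as illustrative rather than as a reference sequence. Position seven is counted as in the notes above, that is, including the initial methionine.

```python
# Partial codon table: only the codons that occur in this example.
CODONS = {
    "ATG": "Met", "GTG": "Val", "CAT": "His", "CTG": "Leu",
    "ACT": "Thr", "CCT": "Pro", "GAG": "Glu", "AAG": "Lys",
}


def translate(dna):
    """Read the coding strand three bases at a time and look up each codon."""
    usable = len(dna) - len(dna) % 3
    return [CODONS[dna[i:i + 3]] for i in range(0, usable, 3)]


def point_mutation(dna, index, new_base):
    """Replace the single base at `index` (0-based) with `new_base`."""
    return dna[:index] + new_base + dna[index + 1:]


# Illustrative opening of the beta-globin coding sequence (coding strand).
normal = "ATGGTGCATCTGACTCCTGAGGAGAAG"
# The A in the seventh codon (GAG, 0-based index 19) changes to T, giving GTG.
sickle = point_mutation(normal, 19, "T")

print(translate(normal))  # seventh residue is Glu
print(translate(sickle))  # seventh residue is Val

# About 600 amino acids at three nucleotides each is about 1800 bases.
print(600 * 3)
```

Only one list entry differs between the two translations, which mirrors the point made above: one base out of roughly 1800 changes one amino acid out of roughly 600.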
• Because of this change of one amino acid in the chain, hemoglobin molecules form long fibers that distort the biconcave, or disc-shaped, red blood cells so that they assume a crescent or “sickle” shape, which can clog blood vessels (Figure 6). This can lead to myriad serious health problems such as breathlessness, dizziness, headaches, and abdominal pain for those affected by this disease.
SECONDARY STRUCTURE
• The local folding of the polypeptide in some regions gives rise to the secondary structure of the protein. The most common are the α-helix and β-pleated sheet structures (Figure 7). Both structures are held in shape by hydrogen bonds. In the α-helix, the hydrogen bonds form between the oxygen atom in the carbonyl group of one amino acid and the amino group of another amino acid that is four amino acids farther along the chain.
o The α-helix and β-pleated sheet are secondary structures of proteins that form because of hydrogen bonding between carbonyl and amino groups in the peptide backbone. Certain amino acids have a propensity to form an α-helix, while others have a propensity to form a β-pleated sheet.
• Every helical turn in an alpha helix has 3.6 amino acid residues. The R groups (the variant groups) of the polypeptide protrude out from the α-helix chain. In the β-pleated sheet, the “pleats” are formed by hydrogen bonding between atoms on the backbone of the polypeptide chain. The R groups are attached to the carbons and extend above and below the folds of the pleat. The pleated segments align parallel or antiparallel to each other, and hydrogen bonds form between the partially positive hydrogen atom of the amino group and the partially negative oxygen atom in the carbonyl group of the peptide backbone. The α-helix and β-pleated sheet structures are found in most globular and fibrous proteins and they play an important structural role.
TERTIARY STRUCTURE
• The unique three-dimensional structure of a polypeptide is its tertiary structure (Figure 8).
This structure is in part due to chemical interactions at work on the polypeptide chain. Primarily, the interactions among R groups create the complex three-dimensional tertiary structure of a protein. The nature of the R groups found in the amino acids involved can counteract the formation of the hydrogen bonds described for standard secondary structures. For example, R groups with like charges are repelled by each other and those with unlike charges are attracted to each other (ionic bonds). When protein folding takes place, the hydrophobic R groups of nonpolar amino acids lie in the interior of the protein, whereas the hydrophilic R groups lie on the outside. The former types of interactions are also known as hydrophobic interactions. Interaction between cysteine side chains forms disulfide linkages in the presence of oxygen, the only covalent bond forming during protein folding.
o The tertiary structure of proteins is determined by a variety of chemical interactions. These include hydrophobic interactions, ionic bonding, hydrogen bonding and disulfide linkages. • All of these interactions, weak and strong, determine the final three-dimensional shape of the protein. When a protein loses its three-dimensional shape, it may no longer be functional. QUATERNARY STRUCTURE • In nature, some proteins are formed from several polypeptides, also known as subunits, and the interaction of these subunits forms the quaternary structure. Weak interactions between the subunits help to stabilize the overall structure. For example, insulin (a globular protein) has a combination of hydrogen bonds and disulfide bonds that cause it to be mostly clumped into a ball shape. Insulin starts out as a single polypeptide and loses some internal sequences in the presence of post-translational modification after the formation of the disulfide linkages that hold the remaining chains together. Silk (a fibrous protein), however, has a β-pleated sheet structure that is the result of hydrogen bonding between different chains. DENATURATION AND PROTEIN FOLDING • Each protein has its own unique sequence and shape that are held together by chemical interactions. If the protein is subject to changes in temperature, pH, or exposure to chemicals, the protein structure may change, losing its shape without losing its primary sequence in what is known as denaturation. • Denaturation is often reversible because the primary structure of the polypeptide is conserved in the process if the denaturing agent is removed, allowing the protein to resume its function. Sometimes denaturation is irreversible, leading to loss of function. One example of irreversible protein denaturation is when an egg is fried. The albumin protein in the liquid egg white is denatured when placed in a hot pan. 
Not all proteins are denatured at high temperatures; for instance, bacteria that survive in hot springs have proteins that function at temperatures close to boiling. The stomach is also very acidic, has a low pH, and denatures proteins as part of the digestion process; however, the digestive enzymes of the stomach retain their activity under these conditions.
• Protein folding is critical to its function. It was originally thought that the proteins themselves were responsible for the folding process. Only recently was it found that often they receive assistance in the folding process from protein helpers known as chaperones (or chaperonins) that associate with the target protein during the folding process. They act by preventing aggregation of polypeptides that make up the complete protein structure, and they dissociate from the protein once the target protein is folded.
FUNCTION OF PROTEINS

Type | Examples | Functions
Digestive Enzymes | Amylase, lipase, pepsin, trypsin | Help in digestion of food by catabolizing nutrients into monomeric units
Transport | Hemoglobin, albumin | Carry substances in the blood or lymph throughout the body
Structural | Actin, tubulin, keratin | Construct different structures, like cytoskeleton
Hormones | Insulin, thyroxine | Coordinate the activity of different body systems
Defense | Immunoglobulins | Protect the body from foreign pathogens
Contractile | Actin, myosin | Effect muscle contraction
Storage | Legume storage proteins, egg white (albumin) | Provide nourishment in early development of the embryo and the seedling

• Two special and common types of proteins are enzymes and hormones. Enzymes, which are produced by living cells, are catalysts in biochemical reactions (like digestion) and are usually complex or conjugated proteins. Each enzyme is specific for the substrate (a reactant that binds to an enzyme) it acts on. The enzyme may help in breakdown, rearrangement, or synthesis reactions.
Enzymes that break down their substrates are called catabolic enzymes, enzymes that build more complex molecules from their substrates are called anabolic enzymes, and enzymes that affect the rate of reaction are called catalytic enzymes. It should be noted that all enzymes increase the rate of reaction and, therefore, are considered to be organic catalysts. An example of an enzyme is salivary amylase, which hydrolyzes its substrate amylose, a component of starch.
• Hormones are chemical-signaling molecules, usually small proteins or steroids, secreted by endocrine cells that act to control or regulate specific physiological processes, including growth, development, metabolism, and reproduction. For example, insulin is a protein hormone that helps to regulate the blood glucose level.
• Proteins have different shapes and molecular weights; some proteins are globular in shape whereas others are fibrous in nature. For example, hemoglobin is a globular protein, but collagen, found in our skin, is a fibrous protein. Protein shape is critical to its function, and this shape is maintained by many different types of chemical bonds. Changes in temperature, pH, and exposure to chemicals may lead to permanent changes in the shape of the protein and so to loss of function; this is known as denaturation. All proteins are made up of different arrangements of the same 20 types of amino acids.
TRANSCRIPTION OF DNA TO RNA
• The process by which DNA is copied to RNA is called transcription, and that by which RNA is used to produce proteins is called translation.
DNA REPLICATION
• Each time a cell divides, each of its double strands of DNA splits into two single strands. Each of these single strands acts as a template for a new strand of complementary DNA. As a result, each new cell has its own complete genome. This process is known as DNA replication. Replication is controlled by the Watson-Crick pairing of the bases in the template strand with incoming deoxynucleoside triphosphates, and is directed by DNA polymerase enzymes. It is a complex process, particularly in eukaryotes, involving an array of enzymes.
  o DNA replication in bacteria
• DNA biosynthesis proceeds in the 5′- to 3′-direction. Because the two template strands are antiparallel, this makes it impossible for DNA polymerases to synthesize both new strands continuously. A portion of the double helix must first unwind, and this is mediated by helicase enzymes.
The leading strand is synthesized continuously but the opposite strand is copied in short bursts of about 1000 bases, as the lagging strand template becomes available. The resulting short strands are called Okazaki fragments (after their discoverers, Reiji and Tsuneko Okazaki). Bacteria have at least three distinct DNA polymerases: Pol I, Pol II and Pol III; it is Pol III that is largely involved in chain elongation. Strangely, DNA polymerases cannot initiate DNA synthesis de novo, but require a short primer with a free 3′-hydroxyl group. This is produced in the lagging strand by an RNA polymerase (called DNA primase) that is able to use the DNA template and synthesize a short piece of RNA around 20 bases in length. Pol III can then take over, but it eventually encounters one of the previously synthesized short RNA fragments in its path. At this point Pol I takes over, using its 5′- to 3′-exonuclease activity to digest the RNA and its polymerase activity to fill the gap with DNA until it reaches a continuous stretch of DNA. This leaves a nick (a break in the sugar-phosphate backbone) between the 3′-end of the newly synthesized DNA and the 5′-end of the DNA previously synthesized by Pol III. The nick is sealed by DNA ligase, an enzyme that makes a covalent bond between a 5′-phosphate and a 3′-hydroxyl group.
  o Simplified representation of the action of DNA polymerases in DNA replication in bacteria.
• Mistakes in DNA replication
  o DNA replication is not perfect. Errors occur when an incorrect base is incorporated into the growing DNA strand. This leads to mismatched base pairs, or mispairs. DNA polymerases have proofreading activity, and DNA repair enzymes have evolved to correct these mistakes. Occasionally, mispairs survive and are incorporated into the genome in the next round of replication.
These mutations may have no consequence; they may result in the death of the organism, in a genetic disease, or in cancer; or they may give the organism a competitive advantage over its neighbours, which leads to evolution by natural selection.
TRANSCRIPTION
• Transcription is the process by which DNA is copied (transcribed) to mRNA, which carries the information needed for protein synthesis. Transcription takes place in two broad steps. First, pre-messenger RNA is formed, with the involvement of RNA polymerase enzymes. The process relies on Watson-Crick base pairing, and the resultant single strand of RNA is the reverse-complement of the DNA strand that was copied. The pre-messenger RNA is then "edited" to produce the desired mRNA molecule in a process called RNA splicing.
• Formation of pre-messenger RNA
  o The mechanism of transcription has parallels in that of DNA replication. As with DNA replication, partial unwinding of the double helix must occur before transcription can take place, and it is RNA polymerase enzymes that catalyze this process.
  o Unlike DNA replication, in which both strands are copied, only one strand is transcribed. The strand that contains the gene is called the sense strand, while the complementary strand is the antisense strand. The mRNA produced in transcription is a copy of the sense strand (with U in place of T), but it is the antisense strand that is transcribed.
  o Ribonucleoside triphosphates (NTPs) align along the antisense DNA strand, with Watson-Crick base pairing (A pairs with U). RNA polymerase joins the ribonucleotides together to form a pre-messenger RNA molecule that is complementary to a region of the antisense DNA strand. Transcription ends when the RNA polymerase enzyme reaches a termination signal (a terminator sequence) in the DNA. The DNA molecule re-winds to re-form the double helix.
  o Simplified representation of the formation of pre-messenger RNA (orange) from double-stranded DNA (blue) in transcription.
• RNA splicing
  o The pre-messenger RNA thus formed contains introns which are not required for protein synthesis. The pre-messenger RNA is chopped up to remove the introns and create messenger RNA (mRNA) in a process called RNA splicing.
    ▪ Introns are spliced from the pre-messenger RNA to give messenger RNA (mRNA).
• Alternative splicing
  o In alternative splicing, individual exons are either spliced out or included, giving rise to several different possible mRNA products. Each mRNA product codes for a different protein isoform; these protein isoforms differ in their peptide sequence and therefore their biological activity. It is estimated that up to 60% of human gene products undergo alternative splicing. Several different mechanisms of alternative splicing are known.
    ▪ Several different mechanisms of alternative splicing exist: a cassette exon can be either included in or excluded from the final RNA (top), or two cassette exons may be mutually exclusive (bottom).
  o Alternative splicing contributes to protein diversity: a single gene transcript (RNA) can have thousands of different splicing patterns, and will therefore code for thousands of different proteins, so a diverse proteome is generated from a relatively limited genome. Splicing is important in genetic regulation (alteration of the splicing pattern in response to cellular conditions changes protein expression).
Perhaps not surprisingly, abnormal splicing patterns can lead to disease states including cancer.
• Reverse transcription
  o In reverse transcription, RNA is "reverse transcribed" into DNA. This process, catalyzed by reverse transcriptase enzymes, allows retroviruses, including the human immunodeficiency virus (HIV), to use RNA as their genetic material. Reverse transcriptase enzymes have also found applications in biotechnology, allowing scientists to convert RNA to DNA for techniques such as PCR.
TRANSLATION
• The mRNA formed in transcription is transported out of the nucleus, into the cytoplasm, to the ribosome (the cell's protein synthesis factory). Here, it directs protein synthesis. Messenger RNA does not itself carry or join the amino acids; transfer RNA (tRNA) is required for this. The process by which mRNA directs protein synthesis with the assistance of tRNA is called translation.
• The ribosome is a very large complex of RNA and protein molecules. Each three-base stretch of mRNA (triplet) is known as a codon, and one codon contains the information for a specific amino acid. As the mRNA passes through the ribosome, each codon interacts with the anticodon of a specific transfer RNA (tRNA) molecule by Watson-Crick base pairing. This tRNA molecule carries an amino acid at its 3′-terminus, which is incorporated into the growing protein chain. The tRNA is then expelled from the ribosome.
  o Translation: (a) and (b) tRNA molecules bind to the two binding sites of the ribosome, and by hydrogen bonding to the mRNA; (c) a peptide bond forms between the two amino acids to make a dipeptide, while the tRNA molecule is left uncharged; (d) the uncharged tRNA molecule leaves the ribosome, while the ribosome moves one codon to the right (the dipeptide is translocated from one binding site to the other); (e) another tRNA molecule binds; (f) a peptide bond forms between the two amino acids to make a tripeptide; (g) the uncharged tRNA molecule leaves the ribosome.
TRANSFER RNA
• Transfer RNA adopts a well-defined tertiary structure which is normally represented in two dimensions as a cloverleaf shape.
• Each amino acid has its own special tRNA (or set of tRNAs). For example, the tRNA for phenylalanine (tRNA^Phe) is different from that for histidine (tRNA^His). Each amino acid is attached to its tRNA through the 3′-OH group to form an ester which reacts with the α-amino group of the terminal amino acid of the growing protein chain to form a new amide bond (peptide bond) during protein synthesis. The reaction of esters with amines is generally favourable but the rate of reaction is increased greatly in the ribosome.
  o Reaction of the growing polypeptide chain with the 3′-end of the charged tRNA. The amino acid is transferred from the tRNA molecule to the protein.
• Each transfer RNA molecule has a well-defined tertiary structure that is recognized by the enzyme aminoacyl tRNA synthetase, which adds the correct amino acid to the 3′-end of the uncharged tRNA. The presence of modified nucleosides is important in stabilizing the tRNA structure.
  o Structures of some of the modified bases found in tRNA
THE GENETIC CODE
• The genetic code is almost universal. It is the basis of the transmission of hereditary information by nucleic acids in all organisms. There are four bases in RNA (A, G, C and U), so there are 64 possible triplet codes (4^3 = 64).
In theory only 22 codes are required: one for each of the 20 naturally occurring amino acids, with the addition of a start codon and a stop codon (to indicate the beginning and end of a protein sequence). Many amino acids have several codes (degeneracy), so that all 64 possible triplet codes are used. For example, Arg and Ser each have 6 codons whereas Trp and Met have only one. No two amino acids have the same code, but amino acids whose side-chains have similar physical or chemical properties tend to have similar codon sequences, e.g. the side-chains of Phe, Leu, Ile and Val are all hydrophobic, and Asp and Glu are both carboxylic acids. This means that if the incorrect tRNA is selected during translation (owing to mispairing of a single base at the codon-anticodon interface), the misincorporated amino acid will probably have similar properties to the intended amino acid. Although the resultant protein will have one incorrect amino acid, it stands a high probability of being functional. Organisms show "codon bias" and use certain codons for a particular amino acid more than others. For example, the codon usage in humans is different from that in bacteria; it can sometimes be difficult to express a human protein in bacteria because the relevant tRNA might be present at too low a concentration.
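The counting argument above (64 triplets, 20 amino acids plus stop signals, uneven degeneracy) can be checked with a short Python sketch. It assumes the standard genetic code written in the conventional U, C, A, G ordering with one-letter amino acid symbols; the names `CODON_TABLE`, `degeneracy` and `codons_for` are illustrative only and do not come from these notes.

```python
from collections import Counter
from itertools import product

# Standard genetic code, listed in U, C, A, G order (first base varies
# slowest, third base fastest). '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every possible triplet to its one-letter amino acid (or '*').
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons assigned to each amino acid (its degeneracy).
degeneracy = Counter(CODON_TABLE.values())


def codons_for(amino_acid: str) -> list[str]:
    """Return all codons for a one-letter amino acid symbol, sorted."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


if __name__ == "__main__":
    print(len(CODON_TABLE))                  # 4**3 possible triplets
    print(degeneracy["R"], degeneracy["S"])  # Arg and Ser
    print(degeneracy["W"], degeneracy["M"])  # Trp and Met
    print(codons_for("A"))                   # the four alanine codons
```

Listing the codons for one amino acid, for example `codons_for("A")`, shows that they differ only in the third base, which is where most of the degeneracy sits.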
• One strand of genomic DNA (strand A, taken here as the strand that is transcribed, i.e. the template) contains the following sequence reading from 5′- to 3′-:
  o TCGTCGACGATGATCATCGGCTACTCGA
• This strand will form the following duplex:
  o 5′-TCGTCGACGATGATCATCGGCTACTCGA-3′
  o 3′-AGCAGCTGCTACTAGTAGCCGATGAGCT-5′
• The sequence of bases in the other strand of DNA (strand B) written 5′- to 3′- is therefore
  o TCGAGTAGCCGATGATCATCGTCGACGA
• The sequence of bases in the mRNA transcribed from strand A of DNA written 5′- to 3′- is
  o UCGAGUAGCCGAUGAUCAUCGUCGACGA
• The amino acid sequence coded by the above mRNA (reading from the first base) is
  o Ser-Ser-Ser-Arg-STOP
• However, if DNA strand B is the strand that is transcribed, the mRNA sequence will be:
  o UCGUCGACGAUGAUCAUCGGCUACUCGA
  o and the amino acid sequence will be:
  o Ser-Ser-Thr-Met-Ile-Ile-Gly-Tyr-Ser-
THE WOBBLE HYPOTHESIS
• Close inspection of all of the available codons for a particular amino acid reveals that the variation is greatest in the third position (for example, the codons for alanine are GCU, GCC, GCA and GCG). Crick and Brenner proposed that a single tRNA molecule can recognize codons with different bases at the 3′-end owing to non-Watson-Crick base pair formation with the third base in the codon-anticodon interaction. These non-standard base pairs are different in shape from A·U and G·C and the term wobble hypothesis indicates that a certain degree of flexibility or "wobbling" is allowed at this position in the ribosome. Not all combinations are possible.
  o Structures of wobble base pairs found in RNA
• The ability of DNA bases to form wobble base pairs as well as Watson-Crick base pairs can result in base-pair mismatches occurring during DNA replication. If not repaired by DNA repair enzymes, these mismatches can lead to genetic diseases and cancer.
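The strand A / strand B example in the genetic code section can be reproduced with a minimal Python sketch. It makes two assumptions that should be read as such: the named strand is treated as the one RNA polymerase copies from (so the mRNA is its reverse complement with U in place of T), and translation simply starts at the first base of the mRNA, as the example does, rather than searching for an AUG start codon. The function names are illustrative and not part of the notes.

```python
from itertools import product

# Standard genetic code in U, C, A, G order; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# One-letter to three-letter symbols, matching the notation in the notes.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "STOP",
}


def reverse_complement(dna: str) -> str:
    """Return the complementary DNA strand, written 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def transcribe(template: str) -> str:
    """Return the mRNA (5' to 3') copied from a DNA template strand."""
    return reverse_complement(template).replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first base; stop at a stop codon.

    Assumption: no search for a start codon, and any incomplete
    trailing codon is ignored.
    """
    residues = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        residues.append(THREE_LETTER[aa])
        if aa == "*":
            break
    return "-".join(residues)


strand_a = "TCGTCGACGATGATCATCGGCTACTCGA"
strand_b = reverse_complement(strand_a)

if __name__ == "__main__":
    for name, strand in (("A", strand_a), ("B", strand_b)):
        mrna = transcribe(strand)
        print(name, mrna, translate(mrna))
```

Transcribing from strand A gives an mRNA that reads Ser-Ser-Ser-Arg-STOP, while transcribing from strand B gives Ser-Ser-Thr-Met-Ile-Ile-Gly-Tyr-Ser with one unread base left over, which illustrates how much the choice of strand matters.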