Human Genome
CONTENTS:
INTRODUCTION
MOLECULAR ORGANIZATION AND GENE CONTENT
COMPLETENESS OF THE HUMAN GENOME SEQUENCE
GENOMIC VARIATION IN HUMANS
HUMAN REFERENCE GENOME
MEASURING HUMAN GENETIC VARIATION
MAPPING HUMAN GENOMIC VARIATION
PERSONAL GENOMES
HUMAN KNOCKOUTS
CONCLUSION
INTRODUCTION:
The human genome is the complete set of nucleic acid sequences for humans,
encoded as DNA within the 23 chromosome pairs in cell nuclei and in a small DNA molecule
found within individual mitochondria.
These are usually treated separately as the nuclear genome and the mitochondrial
genome. Human genomes include both protein-coding DNA genes and noncoding DNA.
Haploid human genomes, which are contained in germ cells (the egg and sperm gamete cells
created in the meiosis phase of sexual reproduction before fertilization creates a zygote)
consist of three billion DNA base pairs, while diploid genomes (found in somatic cells) have
twice the DNA content. While there are significant differences among the genomes of human
individuals (on the order of 0.1%), these are considerably smaller than the differences
between humans and their closest living relatives, the chimpanzees (approximately 4%) and
bonobos.
The first human genome sequences were published in nearly complete draft form in
February 2001 by the Human Genome Project and Celera Corporation. Completion of the
Human Genome Project's sequencing effort was announced in 2004 with the publication of a
draft genome sequence, leaving just 341 gaps in the sequence, representing highly-repetitive
and other DNA that could not be sequenced with the technology available at the time. The
human genome was the first of all vertebrates to be sequenced to such near-completion, and
as of 2018, the diploid genomes of over a million individual humans had been determined
using next-generation sequencing. These data are used worldwide in biomedical science,
anthropology, forensics and other branches of science. Such genomic studies have led to
advances in the diagnosis and treatment of diseases, and to new insights in many fields of
biology, including human evolution.
As genome sequence quality and the methods for identifying protein-coding genes
improved, the count of recognized protein-coding genes dropped to 19,000-20,000.[9]
However, a fuller understanding of the role played by genes expressing regulatory
RNAs that do not encode proteins has raised the total number of genes to at least 46,831, plus
another 2,300 micro-RNA genes. By 2012, functional DNA elements that encode neither
RNA nor proteins had been noted, and a 2018 population survey found additional sequence
equivalent to about 10% of the human genome. Protein-coding sequences account for only a very
small fraction of the genome (approximately 1.5%), and the rest is associated with non-coding
RNA genes, regulatory DNA sequences, LINEs, SINEs, introns, and sequences for which as
yet no function has been determined.
The haploid human genome (23 chromosomes) is about 3 billion base pairs long and contains
around 30,000 genes. Since every base pair can be coded by 2 bits, this is about 750
megabytes of data. An individual somatic (diploid) cell contains twice this amount, that is,
about 6 billion base pairs. Men have fewer base pairs than women because the Y chromosome is
about 57 million base pairs long whereas the X is about 156 million, but in terms of information
men have more because the second X contains almost the same information as the first. Since
individual genomes vary in sequence by less than 1% from each other, the
variations of a given human's genome from a common reference can be losslessly
compressed to roughly 4 megabytes.
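The 750-megabyte figure follows directly from the numbers quoted above. The short Python sketch below reproduces the arithmetic; the function name is illustrative, and it assumes that one megabyte means 10**6 bytes.

```python
# Back-of-the-envelope storage estimate using the figures quoted in the text.
BASE_PAIRS = 3_000_000_000  # about 3 billion base pairs in a haploid genome
BITS_PER_BASE = 2           # four possible bases (A, C, G, T) need 2 bits each


def genome_megabytes(base_pairs: int, bits_per_base: int = BITS_PER_BASE) -> float:
    """Return the raw size in megabytes, assuming 1 MB = 10**6 bytes."""
    total_bits = base_pairs * bits_per_base
    return total_bits / 8 / 1_000_000


haploid_mb = genome_megabytes(BASE_PAIRS)       # one copy of each chromosome
diploid_mb = genome_megabytes(2 * BASE_PAIRS)   # somatic cell, two copies
print(haploid_mb, diploid_mb)
```

This counts only the raw two-bit encoding; it ignores any file-format overhead and any compression.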
The entropy rate of the genome differs significantly between coding and non-coding
sequences. It is close to the maximum of 2 bits per base pair for the coding sequences (about
45 million base pairs), but less for the non-coding parts. It ranges between 1.5 and 1.9 bits per
base pair for individual chromosomes, except for the Y-chromosome, which has an
entropy rate below 0.9 bits per base pair.
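To give a feel for what "bits per base pair" means, the sketch below computes the simplest possible estimate: the Shannon entropy of single-base frequencies in a sequence. This is an assumption-laden simplification, not the method behind the figures above. A true entropy rate also accounts for dependencies between neighbouring bases, so it is never higher than this single-base value. The sequences used are made-up toy strings.

```python
import math
from collections import Counter


def entropy_bits_per_base(sequence: str) -> float:
    """Shannon entropy of the single-base frequency distribution, in bits."""
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# All four bases equally frequent: reaches the maximum of 2 bits per base.
uniform = entropy_bits_per_base("ACGT" * 25)

# A heavily skewed composition carries far less information per base.
skewed = entropy_bits_per_base("A" * 90 + "C" * 10)

print(uniform, skewed)
```

The equal-frequency string reaches the 2-bit maximum mentioned in the text, while the skewed string falls well below 1 bit per base.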
Figure: number of base pairs on each chromosome (shown in green).
With the exception of identical twins, all humans show significant variation in genomic DNA
sequences. The human reference genome (HRG) is used as a standard sequence reference.
There are several important points concerning the human reference genome:
The HRG is a haploid sequence. Each chromosome is represented once.
The HRG is a composite sequence, and does not correspond to any actual human
individual.
The HRG is periodically updated to correct errors, ambiguities, and unknown "gaps".
The HRG in no way represents an "ideal" or "perfect" human individual. It is simply a
standardized representation or model that is used for comparative purposes.
The Genome Reference Consortium is responsible for updating the HRG. Version 38 was
released in December 2013.
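The comparative role of the reference can be illustrated with a toy example. The sketch below records a sample only as its single-base differences from a reference and then rebuilds the sample from that list, which is also the idea behind storing a genome as variations from a common reference. The sequences and function names are hypothetical, and the sketch assumes equal-length sequences with substitutions only; real comparisons must also handle insertions, deletions and alignment.

```python
def single_base_differences(reference: str, sample: str):
    """List (position, reference_base, sample_base) for each mismatch.

    Positions are 1-based. Assumes both sequences have the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("this sketch only handles equal-length sequences")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


def apply_differences(reference: str, differences) -> str:
    """Rebuild the sample sequence from the reference plus its differences."""
    bases = list(reference)
    for pos, _ref, alt in differences:
        bases[pos - 1] = alt
    return "".join(bases)


reference = "ACGTACGTAC"  # toy stand-in for a reference sequence
sample = "ACGTTCGTAA"     # toy stand-in for one individual's sequence

diffs = single_base_differences(reference, sample)
rebuilt = apply_differences(reference, diffs)
print(diffs, rebuilt == sample)
```

Because the sample is recovered exactly from the reference and the difference list, the representation is lossless.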
A genome map is less detailed than a genome sequence and aids in navigating around the
genome.
An example of a variation map is the HapMap developed by the International
HapMap Project. The HapMap is a haplotype map of the human genome, “which will
describe the common patterns of human DNA sequence variation”. It catalogs the patterns of
small-scale variations in the genome that involve single DNA letters, or bases.
Researchers published the first sequence-based map of large-scale structural variation across
the human genome in the journal Nature in May 2008. Large-scale structural variations are
differences in the genome among people that range from a few thousand to a few million
DNA bases; some are gains or losses of stretches of genome sequence and others appear as
re-arrangements of stretches of sequence. These variations include differences in the number
of copies individuals have of a particular gene, deletions, translocations and inversions.
SNP frequency across the human genome
Single-nucleotide polymorphisms (SNPs) do not occur homogeneously across the human
genome. In fact, there is enormous diversity in SNP frequency between genes, reflecting
different selective pressures on each gene as well as different mutation and recombination
rates across the genome. However, because studies on SNPs are biased towards coding regions,
the data generated from them are unlikely to reflect the overall distribution of SNPs throughout
the genome. Therefore, the SNP Consortium protocol was designed to identify SNPs with no
bias towards coding regions, and the Consortium's 100,000 SNPs generally reflect sequence
diversity across the human chromosomes. The SNP Consortium aimed to expand the number of
SNPs identified across the genome to 300,000 by the end of the first quarter of 2001.
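One simple way to see uneven SNP frequency is to count SNPs in fixed-size windows along a chromosome. The sketch below does this for a made-up list of SNP positions; the positions, the window size and the function name are all assumptions chosen for illustration, not data from the SNP Consortium.

```python
from collections import Counter


def snp_counts_per_window(positions, window_size: int):
    """Count SNPs per fixed-size window.

    Positions are 1-based; window 0 covers positions 1..window_size,
    window 1 covers the next window_size positions, and so on.
    """
    counts = Counter((pos - 1) // window_size for pos in positions)
    return dict(sorted(counts.items()))


# Hypothetical SNP positions along a short stretch of sequence.
positions = [3, 8, 15, 102, 250, 251, 260, 299]

counts = snp_counts_per_window(positions, window_size=100)
print(counts)
```

Windows with no SNPs are simply absent from the result. A survey that sampled only some windows, for example coding regions, would give a skewed picture of the overall distribution.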
PERSONAL GENOMES:
A personal genome sequence is a (nearly) complete sequence of the chemical base pairs that
make up the DNA of a single person. Because medical treatments have different effects on
different people due to genetic variations such as single-nucleotide polymorphisms (SNPs),
the analysis of personal genomes may lead to personalized medical treatment based on
individual genotypes.
The first personal genome sequence to be determined was that of Craig Venter in 2007.
Personal genomes had not been sequenced in the public Human Genome Project to protect
the identity of volunteers who provided DNA samples. The project's sequence was instead derived
from the DNA of several volunteers from a diverse population.[81] However, early in the Venter-led
Celera Genomics genome sequencing effort the decision was made to switch from
sequencing a composite sample to using DNA from a single individual, later revealed to have
been Venter himself. Thus the Celera human genome sequence released in 2000 was largely
that of one man. Subsequent replacement of the early composite-derived data and
determination of the diploid sequence, representing both sets of chromosomes, rather than a
haploid sequence originally reported, allowed the release of the first personal genome. In
April 2008, that of James Watson was also completed. Since then hundreds of personal
genome sequences have been released, including those of Desmond Tutu and of a Paleo-Eskimo.
In 2012, the whole genome sequences of two family trios among 1092 genomes were
made public. In November 2013, a Spanish family made four personal exome datasets (about
1% of the genome) publicly available under a Creative Commons public domain license.[87]
The Personal Genome Project (started in 2005) is among the few to make both genome
sequences and corresponding medical phenotypes publicly available.
The sequencing of individual genomes further unveiled levels of genetic complexity that had
not been appreciated before. Personal genomics helped reveal the significant level of
diversity in the human genome attributed not only to SNPs but structural variations as well.
However, the application of such knowledge to the treatment of disease and in the medical
field is still in its early stages. Exome sequencing has become increasingly popular as a
tool to aid in diagnosis of genetic disease because the exome contributes only 1% of the
genomic sequence but accounts for roughly 85% of mutations that contribute significantly to
disease.
HUMAN KNOCKOUTS:
Knockouts in specific genes can cause genetic diseases, potentially have beneficial effects, or
even result in no phenotypic effect at all. However, determining a knockout’s phenotypic
effect in humans can be challenging. Challenges to characterizing and clinically
interpreting knockouts include the difficulty of calling DNA variants, determining disruption of
protein function (annotation), and considering the amount of influence mosaicism has on the
phenotype.
CONCLUSION:
The Human Genome Project also produced other advances, not expected to be accomplished
until much later. These included an advanced draft of the mouse genome and an initial draft
of the rat genome.
Medical researchers did not wait to use data from the Human Genome Project. When the
project began in 1990, fewer than 100 human disease genes had been identified. At the
project's conclusion in 2003, the number of identified disease genes had risen to more than
1,400.
The Human Genome Project focused on the DNA sequence of an individual. The next step
was to analyze DNA sequences from different populations. This catalog of human genetic
variation was called the HapMap. Completed in 2005, the HapMap used single nucleotide
polymorphisms called SNPs to identify large blocks of DNA sequence called haplotypes that
tend to be inherited together. To use the data, researchers compare haplotypes between
people with and without a disease. Haplotypes shared by people with the disease are then
examined in detail to look for associated genes. Already, scientists have used its data to
identify a gene associated with age-related macular degeneration, a disease responsible for
blindness among the elderly. It is expected that the HapMap will play an important role in
identifying many more disease genes in the future.
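The haplotype comparison described above can be sketched in a few lines. The example below tallies how often each haplotype appears in a group with a disease and in a group without it, then reports the difference in frequency. The haplotype strings and group sizes are invented for illustration, and the function name is hypothetical; real association studies use far larger samples and formal statistical tests rather than a raw frequency difference.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Return the relative frequency of each haplotype in a group."""
    counts = Counter(haplotypes)
    total = len(haplotypes)
    return {hap: n / total for hap, n in counts.items()}


# Hypothetical haplotypes, each written as the alleles at three SNPs.
cases = ["ACT", "ACT", "ACT", "GCT"]      # people with the disease
controls = ["ACT", "GCT", "GCT", "GTT"]   # people without the disease

case_freq = haplotype_frequencies(cases)
control_freq = haplotype_frequencies(controls)

# Positive values mark haplotypes that are more common among cases.
differences = {
    hap: case_freq.get(hap, 0.0) - control_freq.get(hap, 0.0)
    for hap in sorted(set(case_freq) | set(control_freq))
}
print(differences)
```

In this toy data the haplotype "ACT" is enriched among cases, so the region it spans would be the one examined in detail for associated genes.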