Human Genome

CONTENTS:
 INTRODUCTION
 MOLECULAR ORGANIZATION AND GENE CONTENT
 COMPLETENESS OF THE HUMAN GENOME SEQUENCE
 GENOMIC VARIATION IN HUMANS
 HUMAN REFERENCE GENOME
 MEASURING HUMAN GENETIC VARIATION
 MAPPING HUMAN GENOMIC VARIATION
 PERSONAL GENOMES
 HUMAN KNOCKOUTS
 CONCLUSION
INTRODUCTION:
The human genome is the complete set of nucleic acid sequences for humans,
encoded as DNA within the 23 chromosome pairs in cell nuclei and in a small DNA molecule
found within individual mitochondria.
Genome size: 3,234.83 Mb (megabase pairs) ...
Number of chromosomes: 23 pairs
The nuclear DNA and the mitochondrial DNA are usually treated separately, as the nuclear
genome and the mitochondrial genome. Human genomes include both protein-coding DNA genes
and noncoding DNA.
Haploid human genomes, which are contained in germ cells (the egg and sperm gamete cells
created in the meiosis phase of sexual reproduction before fertilization creates a zygote),
consist of about three billion DNA base pairs, while diploid genomes (found in somatic cells)
have twice the DNA content. Although there are significant differences among the genomes of
individual humans (on the order of 0.1%), these are considerably smaller than the differences
between humans and their closest living relatives, the chimpanzees (approximately 4%) and
bonobos.
The first human genome sequences were published in nearly complete draft form in
February 2001 by the Human Genome Project and Celera Corporation. Completion of the
Human Genome Project's sequencing effort was announced in 2004 with the publication of a
near-complete sequence that left just 341 gaps, representing highly repetitive and other DNA
that could not be sequenced with the technology available at the time. The human genome was
the first vertebrate genome to be sequenced to such near-completion, and as of 2018, the
diploid genomes of over a million individual humans had been determined using
next-generation sequencing. These data are used worldwide in biomedical science,
anthropology, forensics and other branches of science. Such genomic studies have led to
advances in the diagnosis and treatment of diseases, and to new insights in many fields of
biology, including human evolution.
As genome sequence quality and the methods for identifying protein-coding genes improved,
the count of recognized protein-coding genes dropped to 19,000-20,000.[9] However, a fuller
understanding of the role played by genes expressing regulatory RNAs that do not encode
proteins has raised the total number of genes to at least 46,831, plus another 2,300
micro-RNA genes. By 2012, functional DNA elements that encode neither RNA nor proteins had
been noted, and a 2018 population survey found additional sequence equivalent to about 10%
of the human genome. Protein-coding sequences account for only a very small fraction of the
genome (approximately 1.5%); the rest is associated with non-coding RNA genes, regulatory
DNA sequences, LINEs, SINEs, introns, and sequences for which no function has yet been
determined.
MOLECULAR ORGANIZATION AND GENE CONTENT:

The total length of the human genome is over 3 billion base pairs. The genome is organized
into 22 paired chromosomes, plus the X chromosome (one in males, two in females) and, in
males only, one Y chromosome. These are all large linear DNA molecules contained within
the cell nucleus. The genome also includes the mitochondrial DNA, a comparatively small
circular molecule present in each mitochondrion. Basic information about these molecules
and their gene content, based on a reference genome that does not represent the sequence of
any specific individual, is provided in the following table. (Data source: Ensembl genome
browser release 87, December 2016 for most values; Ensembl genome browser release 68,
July 2012 for miRNA, rRNA, snRNA, snoRNA.)
COMPLETENESS OF THE HUMAN GENOME SEQUENCE:
Although the human genome has been completely sequenced for some practical purposes,
there are still hundreds of gaps in the sequence and an uncertainty of about 5-10% (300
million base pairs were added in 2018). A recent study noted more than 160 euchromatic gaps,
of which 50 were closed. However, numerous gaps remain in the heterochromatic parts of the
genome, which are much harder to sequence because of numerous repeats and other intractable
sequence features.
The haploid human genome (23 chromosomes) is about 3 billion base pairs long and contains
around 30,000 genes. Since every base pair can be coded by 2 bits, this is about 750
megabytes of data. An individual somatic (diploid) cell contains twice this amount, that is,
about 6 billion base pairs. Men have slightly fewer base pairs than women because the Y
chromosome is about 57 million base pairs whereas the X is about 156 million, but in terms of
information men have more, because the second X contains almost the same information as the
first.[citation needed] Since individual genomes vary in sequence by less than 1% from each
other, the variations of a given human's genome from a common reference can be losslessly
compressed to roughly 4 megabytes.
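The 750-megabyte estimate is straightforward arithmetic: 3 × 10^9 base pairs × 2 bits = 6 × 10^9 bits, or 7.5 × 10^8 bytes. The Python sketch below illustrates one possible 2-bit packing. It assumes the sequence contains only A, C, G and T, and the A=0, C=1, G=2, T=3 mapping is an arbitrary choice made for illustration; practical genome file formats also have to record unknown bases (N) and other annotations, which this ignores.

```python
# Minimal sketch of 2-bit DNA packing (four bases per byte).
# Assumption: only A, C, G, T occur; the code assignment below is arbitrary.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack bases four per byte, first base in the two highest bits."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a final partial chunk so unpack() reads it the same way.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])


def packed_size_bytes(n_bases: int) -> int:
    """Bytes needed at 2 bits per base, rounded up to whole bytes."""
    return (n_bases + 3) // 4


if __name__ == "__main__":
    seq = "GATTACA"
    packed = pack(seq)
    assert unpack(packed, len(seq)) == seq
    print(len(packed))                               # 2
    print(packed_size_bytes(3_000_000_000) / 1e6)    # 750.0
```

The same 4-bases-per-byte ratio gives the 750 MB figure for a 3-billion-base haploid genome; storing only differences from a reference, as the text describes, is a separate and much stronger form of compression.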
The entropy rate of the genome differs significantly between coding and non-coding
sequences. It is close to the maximum of 2 bits per base pair for the coding sequences (about
45 million base pairs), but lower for the non-coding parts. It ranges between 1.5 and 1.9 bits
per base pair for individual chromosomes, except for the Y chromosome, which has an entropy
rate below 0.9 bits per base pair.
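For intuition about what a figure in "bits per base pair" measures, the Python sketch below computes a simple plug-in estimate of the conditional Shannon entropy of the next base given the preceding k bases. This estimator is an illustrative choice made here, not the method behind the figures quoted above: with k = 0 it reflects only base composition, and at large k it is biased downward on short sequences. The example sequences are made up.

```python
import math
from collections import Counter, defaultdict


def entropy(counts) -> float:
    """Shannon entropy, in bits, of a distribution given as raw counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return sum((c / total) * math.log2(total / c) for c in counts)


def conditional_entropy(seq: str, order: int) -> float:
    """Plug-in estimate of H(next base | previous `order` bases), in bits per base.

    order=0 is plain base-composition entropy; higher orders use context,
    which is why repetitive sequence scores lower.
    """
    if order == 0:
        return entropy(Counter(seq).values())
    # Group each observed base by the `order` bases that precede it.
    transitions = defaultdict(Counter)
    for i in range(order, len(seq)):
        transitions[seq[i - order:i]][seq[i]] += 1
    n = len(seq) - order
    # Weight each context's entropy by how often that context occurs.
    return sum(
        (sum(nxt.values()) / n) * entropy(nxt.values())
        for nxt in transitions.values()
    )


if __name__ == "__main__":
    repeat = "AC" * 50  # made-up, perfectly periodic sequence
    print(conditional_entropy(repeat, 0))        # 1.0
    print(conditional_entropy(repeat, 1))        # 0.0
    print(conditional_entropy("ACGT" * 100, 0))  # 2.0
```

A sequence using all four bases equally reaches the 2-bit maximum at order 0, yet a strictly repeating one drops to 0 bits once context is considered, which mirrors why highly repetitive regions such as much of the Y chromosome score well below 2 bits per base pair.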
Diagram showing the number of base pairs on each chromosome in green
GENOMIC VARIATION IN HUMANS:
HUMAN REFERENCE GENOME:
With the exception of identical twins, all humans show significant variation in genomic DNA
sequences. The human reference genome (HRG) is used as a standard sequence reference.
There are several important points concerning the human reference genome:
 The HRG is a haploid sequence. Each chromosome is represented once.
 The HRG is a composite sequence, and does not correspond to any actual human
individual.
 The HRG is periodically updated to correct errors, ambiguities, and unknown "gaps".
 The HRG in no way represents an "ideal" or "perfect" human individual. It is simply a
standardized representation or model that is used for comparative purposes.
The Genome Reference Consortium is responsible for updating the HRG. Version 38 (GRCh38)
was released in December 2013.
MEASURING HUMAN GENETIC VARIATION:
Most studies of human genetic variation have focused on single-nucleotide polymorphisms
(SNPs), which are substitutions in individual bases along a chromosome. Most analyses
estimate that SNPs occur about once every 1,000 base pairs, on average, in the euchromatic
human genome, although they do not occur at a uniform density. Hence the popular statement
that "we are all, regardless of race, genetically 99.9% the same", although most geneticists
would qualify this somewhat. For example, a much larger fraction of the genome is now thought
to be involved in copy number variation. A large-scale collaborative effort to catalog SNP
variations in the human genome was undertaken by the International HapMap Project.
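The 99.9% figure is simply the complement of the average SNP density: one differing base per 1,000 means 999/1,000 = 99.9% identity, or roughly 3 million single-base differences over 3 billion base pairs. The Python sketch below makes that arithmetic explicit and counts substitutions between two sequences, assuming (unrealistically) that they are already aligned and of equal length. It ignores insertions, deletions and copy number variation, which is one reason the 99.9% statement needs qualifying.

```python
def substitution_sites(a: str, b: str) -> list:
    """0-based positions where two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must already be aligned to equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions at which the two sequences agree."""
    if not a:
        raise ValueError("sequences must be non-empty")
    return 100.0 * (len(a) - len(substitution_sites(a, b))) / len(a)


def expected_differences(genome_bp: int, bases_per_snp: int) -> int:
    """Expected single-base differences at one SNP per `bases_per_snp` bases."""
    return genome_bp // bases_per_snp


if __name__ == "__main__":
    print(substitution_sites("ACGTACGT", "ACGAACGT"))   # [3]
    print(percent_identity("ACGTACGT", "ACGAACGT"))     # 87.5
    print(expected_differences(3_000_000_000, 1_000))   # 3000000
    print(100.0 * (1_000 - 1) / 1_000)                  # 99.9
```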
The genomic loci and length of certain types of small repetitive sequences are
highly variable from person to person, which is the basis of DNA fingerprinting and DNA
paternity testing technologies. The heterochromatic portions of the human genome, which
total several hundred million base pairs, are also thought to be quite variable within the
human population (they are so repetitive and so long that they cannot be accurately
sequenced with current technology). These regions contain few genes, and it is unclear
whether any significant phenotypic effect results from typical variation in repeats or
heterochromatin.
Most gross genomic mutations in germ cells probably result in inviable embryos; however, a
number of human diseases are related to large-scale genomic abnormalities. Down syndrome,
Turner syndrome, and a number of other diseases result from nondisjunction of entire
chromosomes. Cancer cells frequently have aneuploidy of chromosomes and chromosome arms,
although a cause-and-effect relationship between aneuploidy and cancer has not been
established.
MAPPING HUMAN GENOMIC VARIATION:
A genome map is less detailed than a genome sequence and aids in navigating around the
genome.
An example of a variation map is the HapMap developed by the International HapMap Project.
The HapMap is a haplotype map of the human genome “which will describe the common patterns
of human DNA sequence variation”. It catalogs the patterns of small-scale variations in the
genome that involve single DNA letters, or bases.
Researchers published the first sequence-based map of large-scale structural variation across
the human genome in the journal Nature in May 2008. Large-scale structural variations are
differences in the genome among people that range from a few thousand to a few million
DNA bases; some are gains or losses of stretches of genome sequence and others appear as
rearrangements of stretches of sequence. These variations include differences in the number
of copies individuals have of a particular gene, deletions, translocations and inversions.
SNP frequency across the human genome
Single-nucleotide polymorphisms (SNPs) do not occur homogeneously across the human
genome. In fact, there is enormous diversity in SNP frequency between genes, reflecting
different selective pressures on each gene as well as different mutation and recombination
rates across the genome. However, because studies on SNPs have been biased towards coding
regions, the data generated from them are unlikely to reflect the overall distribution of SNPs
throughout the genome. The SNP Consortium protocol was therefore designed to identify SNPs
with no bias towards coding regions, and the Consortium's 100,000 SNPs generally reflect
sequence diversity across the human chromosomes. The SNP Consortium aimed to expand the
number of SNPs identified across the genome to 300,000 by the end of the first quarter of 2001.
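Because SNP density varies along the genome, it is commonly summarized in fixed-size windows. The Python sketch below assumes SNP positions are 1-based coordinates on a single chromosome and uses an arbitrary default window of 100 kb; the function name and defaults are illustrative. Real analyses would also account for how much of each window could actually be sequenced and for ascertainment bias of the kind the SNP Consortium tried to avoid.

```python
from collections import Counter


def snp_density(positions, chrom_length: int, window: int = 100_000) -> list:
    """SNPs per kilobase in consecutive, non-overlapping windows.

    `positions` are 1-based SNP coordinates on one chromosome. The last
    window may be shorter than `window` and is normalized by its real width.
    """
    counts = Counter()
    for pos in positions:
        if not 1 <= pos <= chrom_length:
            raise ValueError(f"position {pos} outside 1..{chrom_length}")
        counts[(pos - 1) // window] += 1
    n_windows = (chrom_length + window - 1) // window
    densities = []
    for w in range(n_windows):
        width = min(window, chrom_length - w * window)
        densities.append(counts[w] / (width / 1_000))
    return densities


if __name__ == "__main__":
    # Made-up SNP positions on a hypothetical 2,500 bp "chromosome".
    print(snp_density([5, 150, 999, 1500], chrom_length=2_500, window=1_000))
    # [3.0, 1.0, 0.0]
```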
PERSONAL GENOMES:
A personal genome sequence is a (nearly) complete sequence of the chemical base pairs that
make up the DNA of a single person. Because medical treatments have different effects on
different people due to genetic variations such as single-nucleotide polymorphisms (SNPs),
the analysis of personal genomes may lead to personalized medical treatment based on
individual genotypes.
The first personal genome sequence to be determined was that of Craig Venter in 2007.
Personal genomes had not been sequenced in the public Human Genome Project to protect
the identity of volunteers who provided DNA samples. That sequence was derived from the
DNA of several volunteers from a diverse population.[81] However, early in the Venter-led
Celera Genomics genome sequencing effort the decision was made to switch from
sequencing a composite sample to using DNA from a single individual, later revealed to have
been Venter himself. Thus the Celera human genome sequence released in 2000 was largely
that of one man. Subsequent replacement of the early composite-derived data and
determination of the diploid sequence, representing both sets of chromosomes, rather than a
haploid sequence originally reported, allowed the release of the first personal genome. In
April 2008, that of James Watson was also completed. Since then hundreds of personal
genome sequences have been released, including those of Desmond Tutu and of a Paleo-Eskimo.
In 2012, the whole genome sequences of two family trios among 1,092 genomes were made public.
In November 2013, a Spanish family made four personal exome datasets (about
1% of the genome) publicly available under a Creative Commons public domain license.[87]
The Personal Genome Project (started in 2005) is among the few to make both genome
sequences and corresponding medical phenotypes publicly available.
The sequencing of individual genomes further unveiled levels of genetic complexity that had
not been appreciated before. Personal genomics helped reveal the significant level of
diversity in the human genome attributable not only to SNPs but also to structural variations.
However, the application of such knowledge to the treatment of disease and in the medical
field is still at a very early stage. Exome sequencing has become increasingly popular as a
tool to aid in the diagnosis of genetic disease, because the exome makes up only about 1% of
the genomic sequence but accounts for roughly 85% of mutations that contribute significantly
to disease.
HUMAN KNOCKOUTS:
In humans, gene knockouts occur naturally as heterozygous or homozygous loss-of-function
variants.
A pedigree displaying a first-cousin mating (marked by a double line) between two carriers of
heterozygous knockouts, leading to offspring with a homozygous gene knockout.
Populations with a high level of parental relatedness have a larger number of homozygous
gene knockouts than outbred populations. Populations with high rates of consanguinity, such
as those of countries with high rates of first-cousin marriage, display the highest
frequencies of homozygous gene knockouts. Such populations include those of Pakistan and
Iceland, as well as the Amish. These populations have been the subjects of human knockout
research, which has helped to determine the function of specific genes in humans. By
identifying specific knockouts, researchers can use phenotypic analyses of the affected
individuals to help characterize the gene that has been knocked out.
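The effect of parental relatedness can be quantified with the standard population-genetics relation P(homozygous) = q²(1 - F) + qF, where q is the frequency of the loss-of-function allele and F is the inbreeding coefficient of the child (1/16 for the child of first cousins, 0 for an idealized randomly mating population). A minimal Python sketch, assuming Hardy-Weinberg conditions apart from the stated relatedness and using an allele frequency of 1% chosen purely for illustration:

```python
FIRST_COUSIN_F = 1 / 16  # inbreeding coefficient of a child of first cousins


def homozygous_probability(q: float, f: float = 0.0) -> float:
    """Probability that a child is homozygous for an allele of frequency q.

    q * q * (1 - f): two copies inherited independently from the population.
    q * f: both copies identical by descent from a shared ancestor.
    """
    if not (0.0 <= q <= 1.0 and 0.0 <= f <= 1.0):
        raise ValueError("q and f must lie in [0, 1]")
    return q * q * (1 - f) + q * f


if __name__ == "__main__":
    q = 0.01  # hypothetical loss-of-function allele frequency
    outbred = homozygous_probability(q)
    cousins = homozygous_probability(q, FIRST_COUSIN_F)
    print(f"outbred: {outbred:.6f}, first cousins: {cousins:.6f}, "
          f"ratio: {cousins / outbred:.2f}")
```

With these illustrative numbers, a child of first cousins is roughly seven times as likely to carry a homozygous knockout as a child of unrelated parents, and the ratio grows as the allele becomes rarer, because the qF term then dominates q².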
Knockouts in specific genes can cause genetic diseases, can potentially have beneficial
effects, or can even have no phenotypic effect at all. However, determining a knockout’s
phenotypic effect in humans can be challenging. Challenges to characterizing and clinically
interpreting knockouts include difficulty in calling DNA variants, determining whether
protein function is disrupted (annotation), and accounting for the influence of mosaicism on
the phenotype.
CONCLUSION:
The Human Genome Project also produced other advances that had not been expected until much
later. These included an advanced draft of the mouse genome and an initial draft of the rat
genome.
Medical researchers did not wait to use data from the Human Genome Project. When the
project began in 1990, fewer than 100 human disease genes had been identified. At the
project's conclusion in 2003, the number of identified disease genes had risen to more than
1,400.
The Human Genome Project focused on the DNA sequence of an individual. The next step
was to analyze DNA sequences from different populations. This catalog of human genetic
variation was called the HapMap. Completed in 2005, the HapMap used single nucleotide
polymorphisms called SNPs to identify large blocks of DNA sequence called haplotypes that
tend to be inherited together. To use the data, researchers compare haplotypes between
people with and without a disease. Haplotypes shared by people with the disease are then
examined in detail to look for associated genes. Already, scientists have used its data to
identify a gene associated with age-related macular degeneration, a disease responsible for
blindness among the elderly. It is expected that the HapMap will play an important role in
identifying many more disease genes in the future.
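For a single haplotype, the case-control comparison described above reduces to a 2×2 table of counts, which can be summarized by an odds ratio. The Python sketch below is illustrative only: it counts chromosomes rather than people, represents a haplotype as a string of alleles at a few hypothetical linked SNPs, applies a Haldane-style 0.5 correction when a cell is zero (an assumption of this sketch), and omits the significance testing and multiple-testing correction that real association studies require. The data are invented.

```python
def haplotype_counts(case_haps, control_haps, target):
    """Return (cases with, cases without, controls with, controls without).

    Each list entry is one chromosome's haplotype, written here as a string
    of alleles at a few linked SNPs.
    """
    a = sum(1 for h in case_haps if h == target)
    c = sum(1 for h in control_haps if h == target)
    return a, len(case_haps) - a, c, len(control_haps) - c


def odds_ratio(a, b, c, d, correction=0.5):
    """Odds of carrying the haplotype in cases relative to controls.

    If any cell is zero, `correction` is added to every cell.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    return (a * d) / (b * c)


if __name__ == "__main__":
    # Invented haplotypes over three linked SNPs.
    cases = ["TCA"] * 6 + ["GCA"] * 4
    controls = ["TCA"] * 3 + ["GCA"] * 7
    table = haplotype_counts(cases, controls, "TCA")
    print(table)               # (6, 4, 3, 7)
    print(odds_ratio(*table))  # 3.5
```

An odds ratio well above 1 flags a haplotype that is over-represented among cases, which is the signal that would prompt the closer examination of the region described above.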
