Explanation of species-level classification field "s__unknown_species" #36

@your-highness

Dear Temesgen,

I need some clarification on the species-level classification output of slimm (the "_profile.tsv" file). For a test sample I get the following output:

taxa_level      taxa_id linage  abundance       read_count
species 9606    k__Eukaryota|p__Chordata|c__Mammalia|o__Primates|f__Hominidae|g__Homo|s__Homo sapiens   89.1426 25602864
species 45219   k__Viruses|p__unknown_phylum|c__unknown_class|o__unknown_order|f__Arenaviridae|g__Mammarenavirus|s__Guanarito mammarenavirus    0.0178544       5128
species 1821749 k__Viruses|p__unknown_phylum|c__unknown_class|o__Picornavirales|f__Picornaviridae|g__Cardiovirus|s__Cardiovirus A       0.0136867       3931
species 0*      k__unknown_superkingdom|p__unknown_phylum|c__unknown_class|o__unknown_order|f__unknown_family|g__unknown_genus|s__unknown_species       10.8259 3109321

While most reads are classified as human (89.14% of the reads), about 10.8% of the reads are classified as unknown species. This is confusing because every read in the BAM file must have been mapped to a reference genome (bowtie2 was run with --no-unal, so unaligned reads are not written).

Does this unknown fraction correspond to a single species? Could the species not be resolved because taxonomic information is missing? Or are these reads not discriminative for a single species?
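
As a sanity check on the percentages quoted above, here is a minimal Python sketch that recomputes each row's share of reads from the read_count column. It assumes the four rows shown are the complete profile. The trimmed three-column table and the helper name `read_fractions` are only for illustration and are not part of slimm.

```python
import csv
import io

# Rows copied from the _profile.tsv excerpt above, reduced to three columns
# (taxa_id, species label, read_count) for illustration.
PROFILE = (
    "taxa_id\tspecies\tread_count\n"
    "9606\ts__Homo sapiens\t25602864\n"
    "45219\ts__Guanarito mammarenavirus\t5128\n"
    "1821749\ts__Cardiovirus A\t3931\n"
    "0*\ts__unknown_species\t3109321\n"
)


def read_fractions(tsv_text):
    """Return {species: percent of all listed reads}, based on read_count."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    total = sum(int(row["read_count"]) for row in rows)
    return {
        row["species"]: 100.0 * int(row["read_count"]) / total
        for row in rows
    }


fractions = read_fractions(PROFILE)
for species, pct in fractions.items():
    print(f"{species}\t{pct:.4f}")
```

Under that assumption, the recomputed shares agree with the abundance column, so for this sample the abundance appears to be the read_count share of each row, and the unknown row accounts for roughly 10.8% of the listed reads.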

All the best,
Johannes
