Fix missing output for nonsense and essential_splice impacts #22

Merged
FerriolCalvet merged 2 commits into dev/package_singlesample from dev/21-nonsense-n-essential_splice
Apr 22, 2024

@FerriolCalvet
Collaborator

Problem

Missing outputs for nonsense and essential_splice impacts:

gene    sample           impact            dnds       pvalue       lower       upper
ARID1A  P19_0044_BDO_01  missense          1.0398697  0.8973031    0.49979153  1.8980241
ARID1A  P19_0044_BDO_01  nonsense
ARID1A  P19_0044_BDO_01  essential_splice
EP300   P19_0044_BDO_01  missense          1.798834   0.028565407  1.0808547   3.3448544
EP300   P19_0044_BDO_01  nonsense
EP300   P19_0044_BDO_01  essential_splice

Possible reasons

There are 0s in the lambda vectors, and the TensorFlow Probability functions do not seem to handle them well.

For missense:

l equal to [0.28558427 0.32617283 0.05533479 0.13334824 0.15745339 0.09242073
 0.08529546 0.30062553 0.20218417 0.2424905  0.08098787 0.27460638
 0.13486527 0.32502422 0.00684538 0.14846061 0.01359925 0.0575599
 0.04244946 0.01481647 0.01848553 0.05727808 0.06567501 0.09264341
 0.09275755 0.06764409 0.01564826 0.02131398 0.05997038 0.06238931
 0.01891936 0.12842454 0.0938124  0.12310488 0.17774798 0.04028124
 0.14306849 0.46776822 0.30274367 0.25046015 0.08864892 0.09785648
 0.07335839 0.25764883 0.108295   0.12602973 0.11663748 0.02268062
 0.07946959 0.05692631 0.01860485 0.02654143 0.02170661 0.04168417
 0.13690487 0.03569017 0.01251115 0.05133861 0.01112057 0.07303728
 0.0157018  0.01001406 0.00796137 0.01362443 0.16251975 0.08476225
 0.3185019  0.06119071 0.03862971 0.07041591 0.07634285 0.04007868
 0.08172115 0.01557059 0.09000723 0.0718038  0.02061916 0.03482537
 0.05826616 0.03305257 0.17029199 0.01897544 0.01860485 0.00884715
 0.007079   0.01311499 0.07524361 0.01182637 0.11420705 0.01738286
 0.03394764 0.01460746 0.0127372  0.01234038 0.01358624 0.01617238]

For nonsense:

l equal to [1.2262212e-02 2.1095350e-02 4.3984400e-03 3.9292038e-03 9.4510019e-03
 3.0427321e-03 4.3038931e-03 1.4362105e-02 1.8824339e-03 5.5378429e-03
 2.5045760e-03 2.4753483e-03 7.2978713e-02 4.8018314e-02 8.4807668e-03
 3.2509621e-02 5.8391481e-04 3.7227089e-03 2.6390641e-03 4.3657818e-04
 5.9400496e-05 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 1.3675435e-02 0.0000000e+00
 0.0000000e+00 0.0000000e+00 3.1742804e-02 2.1667564e-03 2.5523536e-02
 0.0000000e+00 5.5269703e-02 7.7360771e-03 2.5631344e-02 0.0000000e+00
 5.3811915e-02 2.7419524e-03 0.0000000e+00 0.0000000e+00 3.2564253e-02
 0.0000000e+00 5.9980094e-03 0.0000000e+00 5.0469371e-03 3.4141592e-03
 2.1656503e-03 1.6842084e-03 0.0000000e+00 8.1689312e-04 5.1226473e-04
 2.5927636e-04 1.7851978e-04 1.2167569e-03 1.9530891e-04 0.0000000e+00
 7.4089942e-03 1.8911036e-03 4.8782704e-03 1.7760373e-03 0.0000000e+00
 0.0000000e+00 3.4833457e-03 7.0859439e-04 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 1.3559515e-04 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 1.0814865e-02 1.1380531e-03 2.1656503e-03 5.6140282e-04 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 1.3559515e-04
 0.0000000e+00 0.0000000e+00 1.8345994e-03 0.0000000e+00 0.0000000e+00
 0.0000000e+00]
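As a hedged illustration of how a zero rate can break a likelihood computation: the sketch below assumes a Poisson-style log-probability term `k * log(lam) - lam` evaluated with IEEE floating-point conventions. The function name `naive_log_term` is hypothetical, and this does not establish that the same thing happens inside the TensorFlow Probability functions; it only shows one mechanism by which a 0 in the lambda vector could turn a summed log-likelihood into NaN.

```python
import math


def naive_log_term(lam, k):
    """Rate-dependent part of an assumed Poisson log-probability.

    Hypothetical helper for illustration only. log(0) is taken as -inf,
    which is what IEEE floating-point array libraries typically return.
    """
    log_lam = math.log(lam) if lam > 0 else float("-inf")
    return k * log_lam - lam


# A channel with a positive rate and no observed mutation is fine.
print(naive_log_term(0.0123, 0))

# A channel with lambda = 0 and no observed mutation gives 0 * -inf = nan.
print(naive_log_term(0.0, 0))

# A single nan term makes the whole summed log-likelihood nan.
total = sum(naive_log_term(lam, k) for lam, k in [(0.0123, 0), (0.0, 0), (0.0044, 1)])
print(math.isnan(total))
```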

Explanation

Only a limited number of changes can produce a nonsense impact (stop codon creation, start codon truncation ...), and the same holds for the essential_splice variant types.

The 0s in the lambda vector are therefore real 0s, so filling them in with a placeholder value would not be correct. We would also never expect to see a mutation with that impact in a channel that has no probability (lambda = 0).

What we can do instead is remove the positions where lambda = 0 from the lambda vector and remove the same positions from the mutations vector, which changes the shape from 96 channels to however many channels remain after filtering.

This is what I implemented (see the file changes in this PR).
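The actual change lives in the PR's file diff. Independently of it, the filtering idea can be sketched in plain Python as follows. The helper names `drop_zero_channels` and `poisson_loglik`, the Poisson model per channel, and the toy numbers are all assumptions made for illustration; none of them are taken from the package.

```python
import math


def drop_zero_channels(lambdas, counts):
    """Keep only channels whose expected rate is strictly positive.

    Hypothetical helper, not the PR's code. It raises if a mutation was
    observed in a zero-rate channel, since that would contradict the
    premise that the zeros are real.
    """
    if len(lambdas) != len(counts):
        raise ValueError("lambdas and counts must have the same length")
    kept_lambdas, kept_counts = [], []
    for lam, k in zip(lambdas, counts):
        if lam > 0:
            kept_lambdas.append(lam)
            kept_counts.append(k)
        elif k != 0:
            raise ValueError("observed a mutation in a channel with lambda = 0")
    return kept_lambdas, kept_counts


def poisson_loglik(omega, lambdas, counts):
    """Log-likelihood of counts under an assumed Poisson(omega * lambda) per channel."""
    total = 0.0
    for lam, k in zip(lambdas, counts):
        rate = omega * lam
        total += k * math.log(rate) - rate - math.lgamma(k + 1)
    return total


# Toy 6-channel example with made-up numbers (not the vectors shown above).
lambdas = [0.012, 0.0, 0.0044, 0.0, 0.073, 0.0]
counts = [0, 0, 1, 0, 2, 0]

kept_lambdas, kept_counts = drop_zero_channels(lambdas, counts)
print(len(kept_lambdas))  # prints 3: the channels left after filtering
print(poisson_loglik(1.5, kept_lambdas, kept_counts))
```

Under the assumed Poisson model, a channel with lambda = 0 and no observed mutation has probability 1 and adds 0 to the log-likelihood, so dropping it should leave the maximum-likelihood estimate unchanged while avoiding any evaluation of log(0).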

After solving it

gene    sample           impact            dnds        pvalue       lower       upper
ARID1A  P19_0044_BDO_01  missense          1.0398697   0.8973031    0.49979153  1.8980241
ARID1A  P19_0044_BDO_01  nonsense          0.21724111  0.34792298   0.21724111  3.645586
ARID1A  P19_0044_BDO_01  essential_splice  0.21734934  0.5623948    0.21734934  9.230412
EP300   P19_0044_BDO_01  missense          1.798834    0.028565407  1.0808547   3.3448544
EP300   P19_0044_BDO_01  nonsense          3.609274    0.09682387   0.71474063  13.938253
EP300   P19_0044_BDO_01  essential_splice  0.21835195  0.5825151    0.21835195  10.202482

- happens for nonsense and essential_splice impacts
- fixed bug in subsetting the vectors not working
@FerriolCalvet FerriolCalvet self-assigned this Apr 22, 2024
@FerriolCalvet FerriolCalvet added the bug Something isn't working label Apr 22, 2024
@FerriolCalvet FerriolCalvet linked an issue Apr 22, 2024 that may be closed by this pull request
Collaborator

@koszulordie koszulordie left a comment

It remains an open question why the previous implementation failed to produce an output when the lambda (expectation) vectors had zero components. However, the proposed solution makes sense: in theory it should give the same behavior as intended by the original MLE implementation, and it seems to work in practice.

@FerriolCalvet FerriolCalvet merged commit ac87eca into dev/package_singlesample Apr 22, 2024
@FerriolCalvet FerriolCalvet deleted the dev/21-nonsense-n-essential_splice branch April 22, 2024 16:30
Development

Successfully merging this pull request may close these issues.

No output for nonsense and essential_splice impacts

2 participants