User contributions for Entropeneur
A user with 521 edits. Account created on 28 July 2005.
13 September 2024
- 10:50, 13 September 2024 0 m Chinese restaurant process →Expected number of tables current
- 10:47, 13 September 2024 −71 Chinese restaurant process →Expected number of tables: The formula here (for the case alpha>0) was not helpful, being expressed in terms of a 'non-standard' Gamma function, without further explanation. I replaced this with a formula from the same reference (Pitman2006), as found in exercise 3.2.3, eq. (3.13).
5 September 2024
- 13:10, 5 September 2024 −1 m Attention (machine learning) →Masked Attention
- 07:12, 5 September 2024 +1,075 Attention (machine learning) →Mathematical representation: Added subsection for masked attention.
23 August 2024
- 14:41, 23 August 2024 +75 Von Mises–Fisher distribution →Weighted Rademacher Distribution: Added expected value.
19 August 2024
- 21:06, 19 August 2024 +4 m Attention (machine learning) →Standard Scaled Dot-Product Attention
- 08:12, 19 August 2024 +27 m Attention (machine learning) →Multi-Head Attention
- 08:08, 19 August 2024 +45 m Attention (machine learning) →Standard Scaled Dot-Product Attention
- 07:43, 19 August 2024 +66 m Attention (machine learning) →Multi-Head Attention
- 07:41, 19 August 2024 +200 Attention (machine learning) →Mathematical representation: Improved formatting.
- 07:33, 19 August 2024 +351 Attention (machine learning) →Standard Scaled Dot-Product Attention: Improved formatting.
13 August 2024
- 20:15, 13 August 2024 +34 m Attention (machine learning) →Standard Scaled Dot-Product Attention
- 19:24, 13 August 2024 +403 Attention (machine learning) →Mathematical representation: added citation
- 17:15, 13 August 2024 +466 Wikipedia:Articles for creation/Redirects No edit summary
- 14:37, 13 August 2024 +71 Perceiver →Components: Added link to QKV attention section on another page. current
- 14:33, 13 August 2024 +204 Draft:QKV attention Submitting using AfC-submit-wizard
- 14:33, 13 August 2024 +23 N Draft talk:QKV attention Adding WikiProject tags using AfC-submit-wizard
- 14:30, 13 August 2024 +223 N Draft:QKV attention -- Draft creation using the WP:Article wizard -- This is a redirect to an existing section on another page.
- 14:17, 13 August 2024 +6 m Attention (machine learning) →Multi-Head Attention
- 14:16, 13 August 2024 +6 m Attention (machine learning) →Standard Scaled Dot-Product Attention
- 14:14, 13 August 2024 +19 m Attention (machine learning) →Multi-Head Attention
- 14:13, 13 August 2024 +383 Attention (machine learning) →Mathematical representation: I've added permutation properties for multi-head attention too.
- 13:42, 13 August 2024 +9 m Attention (machine learning) →Standard Scaled Dot-Product Attention
- 13:42, 13 August 2024 +16 m Attention (machine learning) →Standard Scaled Dot-Product Attention
- 13:27, 13 August 2024 +106 m Attention (machine learning) →Mathematical representation
- 13:21, 13 August 2024 +20 m Attention (machine learning) →Standard Scaled Dot-Product Attention
- 13:10, 13 August 2024 +157 Attention (machine learning) →Mathematical representation: Added detail about QKV attention output being confined to convex hull.
- 13:00, 13 August 2024 +405 Attention (machine learning) →Standard Scaled Dot-Product Attention: Added mention of permutation properties of self- and multi-head attention.
- 12:39, 13 August 2024 −5 m Attention (machine learning) →Standard Scaled Dot-Product Attention
- 12:33, 13 August 2024 +1,208 Attention (machine learning) →Standard Scaled Dot-Product Attention: I've added permutation equivariance and invariance properties.
27 July 2024
- 07:40, 27 July 2024 +1 m Generalized continued fraction →The equivalence transformation current
11 July 2024
- 20:14, 11 July 2024 −1 m Chinese restaurant process →Relashionship between Dirichlet-categorical and one-parameter CRP
- 20:13, 11 July 2024 +6 m Chinese restaurant process →The Dirichlet-categorical model
9 July 2024
- 05:51, 9 July 2024 +2 m Chinese restaurant process →Stick-breaking process
- 05:25, 9 July 2024 0 Chinese restaurant process →The Dirichlet-categorical model: corrected typo in definition of \mathbf p.
8 July 2024
- 14:38, 8 July 2024 −6 m Chinese restaurant process →The Dirichlet-categorical model
- 11:55, 8 July 2024 +7 m Chinese restaurant process →Stick-breaking process
- 11:52, 8 July 2024 +9 m Chinese restaurant process →Stick-breaking process
- 11:48, 8 July 2024 +244 m Chinese restaurant process →Stick-breaking process
- 10:48, 8 July 2024 +13 m Chinese restaurant process →Stick-breaking process
- 10:42, 8 July 2024 +396 Chinese restaurant process →Stick-breaking process: I also included the stick-breaking recipe for the case alpha<0.
- 10:25, 8 July 2024 +26 m Chinese restaurant process →Stick-breaking process
- 08:56, 8 July 2024 +10 m Chinese restaurant process →Stick-breaking process
- 08:14, 8 July 2024 +4 m Chinese restaurant process →Stick-breaking process
- 08:13, 8 July 2024 +6 m Chinese restaurant process →Stick-breaking process
- 08:12, 8 July 2024 +1,340 Chinese restaurant process →Two-parameter generalization: I added a subsection with details of the stick-breaking interpretation of the CRP.
- 07:16, 8 July 2024 +359 Chinese restaurant process →The Dirichlet-categorical model: I simplified the partition probability formula and explained what happens in this formula when the number of blocks exceeds L.