Add float32 compatibility to KMedoids #120

Merged

Conversation

TimotheeMathieu (Contributor)

I implemented the changes suggested by @rth in PR #83 for dtypes in KMedoids.
For CLARA, the dtype change can yield a considerable speedup, but for KMedoids there is, for reasons I don't understand, no speedup.
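For context, the change presumably follows the usual scikit-learn pattern for dtype handling (the diff itself isn't shown here, so this is only a sketch): passing a dtype list to check_array keeps float32 input as float32 and casts everything else to float64.

import numpy as np
from sklearn.utils import check_array

def _check_dtype(X):
    # Hypothetical helper illustrating the pattern: with a dtype list,
    # check_array keeps X's dtype if it matches an entry, and otherwise
    # casts to the first entry (float64 here).
    return check_array(X, dtype=[np.float64, np.float32])

assert _check_dtype(np.ones((5, 2), dtype=np.float32)).dtype == np.float32
assert _check_dtype(np.ones((5, 2), dtype=int)).dtype == np.float64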

Here is a small benchmark:

  • KMedoids (wall_time and cpu_time in seconds, peak_memory in MB)

                              wall_time  cpu_time  peak_memory
  ('build', 'float32')        0.178639   0.49873    1.14062
  ('build', 'float64')        0.166929   0.396058   0.0859375
  ('heuristic', 'float32')    0.0497368  0.255722   0.0117188
  ('heuristic', 'float64')    0.0783268  0.250505   60.8164
  ('k-medoids++', 'float32')  0.0602323  0.261578   0.00390625
  ('k-medoids++', 'float64')  0.0871284  0.284815   61.0117

  • CLARA (wall_time and cpu_time in seconds, peak_memory in MB)

                              wall_time  cpu_time  peak_memory
  ('build', 'float32')        0.944996   3.84636    0.160156
  ('build', 'float64')        0.794827   5.21157    1.21875
  ('heuristic', 'float32')    0.956299   4.54045    0
  ('heuristic', 'float64')    0.803466   5.22568    0
  ('k-medoids++', 'float32')  0.940615   6.23457    0
  ('k-medoids++', 'float64')  1.14663    6.11493    0.00390625

For CLARA we can design settings in which the difference between 64-bit and 32-bit is quite large. Here I used n_samples = 200,000 in dimension 100 with sampling_size = 200 for CLARA, and n_samples = 2,000 for KMedoids.

Code
import numpy as np
import neurtu
from sklearn_extra.cluster import KMedoids, CLARA

X = np.random.normal(size=(100_000, 100))
X_32 = X.astype(np.float32)


def make_experiment(dtype, init):
    X2 = X_32 if dtype == 'float32' else X
    # Swap in CLARA to benchmark it instead of KMedoids:
    # km = CLARA(init=init, sampling_size=50, n_clusters=9)
    km = KMedoids(n_clusters=9, init=init)
    km.fit(X2)


def cases():
    # One benchmark case per (init, dtype) combination, tagged for the report.
    for init in ['build', 'heuristic', 'k-medoids++']:
        for dtype in ['float32', 'float64']:
            tags = {'init': init, 'dtype': dtype}
            yield neurtu.delayed(make_experiment, tags=tags)(dtype, init)


bench = neurtu.Benchmark(wall_time=True, cpu_time=True, peak_memory=True)
df = bench(cases())
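With pandas installed, df comes back as a DataFrame indexed by the (init, dtype) tags, so printing it produces tables like the ones above.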

@rth (Contributor) left a comment:

Thanks! Could you parametrize an existing test to run on both float64 and float32 input, and maybe check that the output of transform has the same dtype? Otherwise LGTM.

Something like:

@pytest.mark.parametrize('dtype', [np.float64, np.float32])
def test_...(dtype):
    X_input = X_input.astype(dtype)
    ..
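Filled out, such a test might look like the sketch below; the test name, data, and estimator settings are illustrative rather than taken from the PR's actual diff:

import numpy as np
import pytest
from sklearn_extra.cluster import KMedoids

@pytest.mark.parametrize('dtype', [np.float64, np.float32])
def test_kmedoids_dtype(dtype):
    # Fit on input of the given dtype and check that transform()
    # returns distances of the same dtype.
    rng = np.random.RandomState(0)
    X_input = rng.rand(100, 4).astype(dtype)
    km = KMedoids(n_clusters=3, random_state=0).fit(X_input)
    assert km.transform(X_input).dtype == dtype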

@TimotheeMathieu merged commit 445aaf8 into scikit-learn-contrib:main on Jun 24, 2021.
@TimotheeMathieu deleted the clara_32bit branch on June 24, 2021 at 19:26.