ENH: argmax and argmin methods for sparse matrices #6761
Looks reasonable overall. It might be nice to follow the pattern used by _min_or_max, rather than handling the axis=None and row/column-wise cases all together.
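A minimal sketch of that split follows. It assumes the matrix arrives as COO-style arrays `rows`, `cols`, `vals` plus a `shape`; `sparse_argmax` and `_line_argmax` are hypothetical names, not SciPy's API. The axis=None path treats the whole matrix as one flattened line of length M*N. The axis=0/1 path reduces each column or row and, like `ndarray.argmax`, raises only when the reduced lines have zero length. To stay short, `_line_argmax` densifies one line at a time, which a real implementation would avoid.

```python
import numpy as np


def _line_argmax(pos, vals, length):
    """Argmax of a length-`length` vector holding `vals` at `pos`, zeros elsewhere.

    Densifies the single line for brevity; duplicate positions are summed,
    mirroring what sum_duplicates() would do.
    """
    line = np.zeros(length, dtype=vals.dtype)
    np.add.at(line, pos, vals)
    return int(np.argmax(line))  # np.argmax picks the first (smallest) index on ties


def sparse_argmax(rows, cols, vals, shape, axis=None):
    """Hypothetical argmax over a COO-style matrix, split by axis like _min_or_max."""
    m, n = shape
    if axis is None:
        # Whole-matrix case: a single flattened line in row-major order.
        if m * n == 0:
            raise ValueError("attempt to get argmax of an empty sequence")
        return _line_argmax(rows * n + cols, vals, m * n)

    if axis not in (0, 1):
        raise ValueError("axis out of range")
    # axis=0 reduces down each column; axis=1 reduces along each row.
    if axis == 0:
        pos, line_of, n_lines, line_len = rows, cols, n, m
    else:
        pos, line_of, n_lines, line_len = cols, rows, m, n
    if n_lines > 0 and line_len == 0:
        raise ValueError("attempt to get argmax of an empty sequence")

    ret = np.zeros(n_lines, dtype=np.intp)
    for i in range(n_lines):
        mask = line_of == i
        ret[i] = _line_argmax(pos[mask], vals[mask], line_len)
    return ret


# Usage: compare against the dense result.
D = np.array([[0, -1, 2],
              [0, 0, 0],
              [-3, 0, -1]])
rows, cols = np.nonzero(D)
vals = D[rows, cols]
print(sparse_argmax(rows, cols, vals, D.shape), np.argmax(D))
print(sparse_argmax(rows, cols, vals, D.shape, axis=0), D.argmax(axis=0))
```

Keeping the axis=None branch separate means the flat-index arithmetic (`rows * n + cols`) never leaks into the per-line loop.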
mat.sum_duplicates()

line_size = mat.shape[axis]
ret = np.empty(mat.shape[1 - axis], dtype=int)
ret_size, line_size = mat._swap(mat.shape)
ret = np.zeros(ret_size, dtype=int)
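Here is a small sketch of what that suggestion computes. It assumes the private `_swap` helper returns a pair unchanged for a CSR matrix and reversed for a CSC one, and that axis=1 is handled on CSR while axis=0 is handled on CSC. `swap_pair` is a hypothetical stand-in, not SciPy code. The check confirms that `(ret_size, line_size)` matches the original `mat.shape[1 - axis]` / `mat.shape[axis]` pair.

```python
import numpy as np


def swap_pair(pair, fmt):
    """Hypothetical stand-in for _swap: (major, minor) order for a compressed format."""
    if fmt == "csr":
        return pair  # rows are the compressed (major) axis
    if fmt == "csc":
        return pair[1], pair[0]  # columns are the compressed (major) axis
    raise ValueError("expected 'csr' or 'csc'")


shape = (3, 7)  # 3 rows, 7 columns
for axis, fmt in [(1, "csr"), (0, "csc")]:
    ret_size, line_size = swap_pair(shape, fmt)
    # Same sizes as the original: line_size = shape[axis], ret size = shape[1 - axis].
    assert (ret_size, line_size) == (shape[1 - axis], shape[axis])
    # np.zeros rather than np.empty: any line the loop skips already holds 0,
    # which is the argmax of a line with no stored entries.
    ret = np.zeros(ret_size, dtype=int)
    print(axis, fmt, ret_size, line_size, ret.shape)
```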
for i in range(ret.shape[0]):
I don't think we can avoid the loop entirely, but we can at least vectorize the first condition:
nz_lines, = np.nonzero(np.diff(mat.indptr))  # rows/columns with at least one stored entry
for i in nz_lines:
    p, q = mat.indptr[i:i+2]
    data = mat.data[p:q]
    # etc...
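Here is that vectorized selection on a tiny hand-built CSR matrix, given as raw `indptr`, `indices`, `data` arrays with values chosen only for illustration. `np.nonzero` returns a tuple holding one index array for 1-D input, which is what the `nz_lines, = ...` unpacking expects. Unpacking the boolean mask `np.diff(indptr) > 0` directly fails as soon as there is more than one row. The per-row reduction is left out, so the loop only gathers each non-empty row's slice.

```python
import numpy as np

# 4 x 5 CSR matrix; rows 1 and 3 have no stored entries.
indptr = np.array([0, 2, 2, 3, 3])
indices = np.array([1, 4, 0])
data = np.array([5.0, -2.0, 3.0])

# Rows with at least one stored entry; np.nonzero returns a 1-tuple here.
nz_lines, = np.nonzero(np.diff(indptr))

row_slices = {}
for i in nz_lines:
    p, q = indptr[i:i + 2]
    # Column indices and values stored in row i.
    row_slices[int(i)] = (indices[p:q], data[p:q])

print(nz_lines)
for i, (cols, vals) in row_slices.items():
    print(i, cols, vals)
```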
D2 = D1.transpose()

classes = [bsr_matrix, coo_matrix, csr_matrix, csc_matrix]
When you integrate this with test_base.py, you can add the argmin/argmax tests to the _TestMinMax class and use the existing tests as a template.
for axis in [None, 0, 1]:
    mat = spmatrix(D)
    assert_raises(ValueError, mat.argmax, axis=axis)
    assert_raises(ValueError, mat.argmin, axis=axis)
Some of these cases actually don't raise an error when mat is a numpy array:
In [1]: x = np.ones((5,0))
In [2]: x.argmax(axis=0)
Out[2]: array([], dtype=int64)
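The check below runs over both empty orientations to show which cases NumPy accepts. Reducing over a zero-length axis raises `ValueError`, while reducing the non-empty axis of an empty array returns an empty index array. A sparse implementation meant to match `ndarray.argmax` would presumably follow the same rule rather than raising for every axis. `argmax_outcome` is just a hypothetical helper for the demonstration.

```python
import numpy as np


def argmax_outcome(shape, axis):
    """Return np.ones(shape).argmax(axis=axis), or None if it raises ValueError."""
    try:
        return np.ones(shape).argmax(axis=axis)
    except ValueError:
        return None


for shape in [(5, 0), (0, 5)]:
    for axis in [None, 0, 1]:
        result = argmax_outcome(shape, axis)
        print(shape, axis, "ValueError" if result is None else result)
```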
@perimosocordiae Great advice; I think I've addressed all of it now. If you are fine with the updated state, I will move the tests to test_base.py. Another thing I forgot to mention: I decided that returning an ndarray is more convenient than any form of sparse matrix, because I expect that in the majority of situations people will want to work with an ndarray eventually. Is that the right decision?
Looks good to me. I agree that dense results are reasonable, considering that an argmin/argmax of zero doesn't necessarily indicate missing data. If we want to follow the numpy matrix convention (which spmatrix mimics), the result should be a matrix (a row matrix for axis=0, a column matrix for axis=1). On the other hand, argmax/argmin results are typically used for indexing, where a flat ndarray is the most useful. I'm leaning toward the matrix return type for now, but I could be convinced otherwise.
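For reference, this is the shape difference being weighed. It uses a plain `np.matrix`, the convention `spmatrix` mimics, next to an ndarray; `D` is an arbitrary example. Constructing the matrix emits a `PendingDeprecationWarning` in recent NumPy, so the sketch silences it. The last lines show the extra step a caller needs to turn a matrix result into a flat index array.

```python
import warnings

import numpy as np

D = np.array([[1, 5, 0],
              [7, 2, 3]])

with warnings.catch_warnings():
    # np.matrix is discouraged in modern NumPy and warns on construction.
    warnings.simplefilter("ignore", PendingDeprecationWarning)
    M = np.asmatrix(D)

col_wise = M.argmax(axis=0)  # row matrix, shape (1, 3)
row_wise = M.argmax(axis=1)  # column matrix, shape (2, 1)
flat = D.argmax(axis=0)      # ndarray, shape (3,)

# Flatten the matrix result before using it as an index array.
idx = np.asarray(col_wise).ravel()
print(col_wise.shape, row_wise.shape, flat.shape)
print(D[idx, np.arange(D.shape[1])])  # the column maxima
```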
Maybe I'm wrong about that, but it seems to me that people usually avoid using numpy matrices. Leaving consistency aside, I think getting an ndarray right away is more practical; at least for me that's certainly true. I'll leave it to you to decide.
I made sure that the minimum possible index is always returned and moved the tests to test_base.py. Could you please make the final decision about whether to return an array or a matrix?
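Below is one way to get the minimum index without densifying a row or column. It is an illustration under stated assumptions, not the PR's code. It assumes canonical format (indices sorted and unique, as after `sum_duplicates()`) and takes the first stored maximum. It then compares that maximum with the first implicit zero only when zero can win (stored maximum below zero) or tie (stored maximum equal to zero). `first_missing`, `line_argmax` and `dense` are hypothetical helpers.

```python
import numpy as np


def first_missing(idx):
    """Smallest non-negative position absent from idx (sorted, unique)."""
    # For sorted unique non-negative ints idx[k] >= k, so the first k with
    # idx[k] != k is the first gap; with no gap it is len(idx).
    gaps = np.flatnonzero(idx != np.arange(len(idx)))
    return int(gaps[0]) if gaps.size else len(idx)


def line_argmax(idx, vals, n):
    """Smallest index of the maximum of a length-n line with vals stored at idx."""
    idx = np.asarray(idx, dtype=np.intp)
    vals = np.asarray(vals)
    if n == 0:
        raise ValueError("attempt to get argmax of an empty sequence")
    if vals.size == 0:
        return 0  # every entry is an implicit zero
    k = int(np.argmax(vals))  # first stored maximum; idx sorted, so smallest such index
    best = vals[k]
    if best > 0 or idx.size == n:
        # A positive maximum beats the zeros; a full line has no implicit zeros.
        return int(idx[k])
    zero_pos = first_missing(idx)  # smallest implicit zero
    if best < 0:
        return zero_pos
    # best == 0: an explicit and an implicit zero tie, so take the smaller index.
    return min(int(idx[k]), zero_pos)


def dense(idx, vals, n):
    """Dense copy of the line, used only to check against np.argmax."""
    line = np.zeros(n)
    line[np.asarray(idx, dtype=np.intp)] = vals
    return line


cases = [
    ([1, 3], [-2.0, -1.0], 5),
    ([0, 1, 3], [-2.0, 0.0, -1.0], 5),
    ([0, 1, 2], [-1.0, -5.0, -1.0], 3),
    ([2], [7.0], 4),
    ([], [], 4),
    ([0, 1], [-1.0, -1.0], 4),
]
for idx, vals, n in cases:
    print(line_argmax(idx, vals, n), int(np.argmax(dense(idx, vals, n))))
```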
mat = self.spmatrix(D)

assert_equal(mat.argmax(), argmax)
Nitpick: I think it's clearer to have tests of the form:
assert_equal(mat.argmax(), np.argmax(D))
rather than computing all the expected results first.
OK, maybe later we can switch to arrays everywhere (e.g., for the 1.0 release). I changed it to return a matrix for now.
I want this in version 0.19, so merging now. Thanks, @nmayorov!
The methods were added in gh-6761.
Issue #5883
I think my approach is reasonable and efficient in terms of algorithmic complexity, but maybe it can be done with fewer Python loops.
For now I've put the tests in a separate file (instead of test_base.py); it's just easier to test and demonstrate what's going on that way. At the end we can move them to test_base.py.
@perimosocordiae please look when you can.