feature: new tests added for tsne to expand test coverage #2229
base: main
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅
Flags with carried forward coverage won't be shown.
/intelci: run
/intelci: run
…nt results, merge previous deleted gpu test to complex test
/intelci: run
It looks like we don't have any test here, nor in daal4py, that would be checking that the results from TSNE make sense beyond having the right shape and non-missingness. Since there's a very particular dataset here for the last test, it'd be helpful to add other assertions there along the lines of checking that the embeddings end up making some points closer than others, as would be expected given the input data.
…or parametrization names, removed extra tests
Hi David,
I think given the characteristics of the data you are passing, it could be done by selecting some hard-coded set of points by index from "Complex Dataset1" that should end up being similar, and some selected set of points that should end up being dissimilar to the earlier ones; the test would then check that the Euclidean distances in the embedding space among the points in the first set are smaller than the distances between each point in the first set and each point in the second set. Also, maybe "Complex Dataset2" is not needed.
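The assertion suggested above could be sketched roughly as follows. This is a minimal illustration, not the PR's actual test: the data here is a hypothetical stand-in for "Complex Dataset1" (two tight, well-separated clusters), and the index sets are chosen to match that synthetic data.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical stand-in for "Complex Dataset1": two tight, well-separated clusters
rng = np.random.default_rng(42)
cluster_a = rng.normal(loc=0.0, scale=0.1, size=(10, 4))
cluster_b = rng.normal(loc=10.0, scale=0.1, size=(10, 4))
X = np.vstack([cluster_a, cluster_b])

emb = TSNE(n_components=2, perplexity=5.0, random_state=42).fit_transform(X)

# Hard-coded index sets: points expected to be mutually similar vs. dissimilar
similar_idx = np.arange(10)
dissimilar_idx = np.arange(10, 20)

def pairwise_dist(a, b):
    """Euclidean distances between every row of a and every row of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

within = pairwise_dist(emb[similar_idx], emb[similar_idx])
between = pairwise_dist(emb[similar_idx], emb[dissimilar_idx])

# Points from the same cluster should end up closer together in the
# embedding space than points from different clusters
assert within.mean() < between.mean()
```

A stricter variant could compare the maximum within-cluster distance against the minimum between-cluster distance, at the cost of being more sensitive to the t-SNE hyperparameters.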
Hi David!
/intelci: run
Odd that the CI fails for sklearn 1.0 and 1.1. I see that there are many places throughout the code with conditions for sklearn<1.2 though:
Hi David!
I actually tested using sklearn and see the same behavior; here is my simple script, and I also see a non-zero embedding in the end. I removed the check for a zero embedding because we see zero and non-zero embeddings for different versions. Hope this helps!
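The script itself isn't shown in the comment; a plausible minimal version of the check being described (stock sklearn TSNE on constant input) might look like this. Note that whether the resulting values are exactly zero varies across sklearn versions, per the discussion above, so that is deliberately not asserted:

```python
import numpy as np
from sklearn.manifold import TSNE

# Constant input: every row is identical
X = np.full((20, 4), 1.0)

emb = TSNE(n_components=2, perplexity=5.0, random_state=0).fit_transform(X)

# Shape and finiteness can be checked unconditionally; exact zeros cannot,
# since zero vs. non-zero embeddings are observed on different versions
print(emb.shape)  # (20, 2)
assert np.all(np.isfinite(emb))
```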
Then please move it into a separate test. Also, I don't think the test should look for exact zeros everywhere. For constant data, it should test for a constant first dimension and a zero second dimension.
@pytest.mark.parametrize("dtype", [np.float32, np.float64])
def test_tsne_reproducibility(dataframe, queue, dtype):
    """
    Test reproducibility
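The excerpt above is cut off in the diff view. A self-contained sketch of what such a reproducibility test could look like follows; the `dataframe` and `queue` fixtures come from sklearnex's test utilities and are dropped here, with stock sklearn TSNE standing in for the sklearnex estimator:

```python
import numpy as np
import pytest
from sklearn.manifold import TSNE

@pytest.mark.parametrize("dtype", [np.float32, np.float64])
def test_tsne_reproducibility(dtype):
    """Two fits with the same random_state should give the same embedding."""
    X = np.random.default_rng(0).standard_normal((30, 5)).astype(dtype)
    emb1 = TSNE(n_components=2, perplexity=5.0, random_state=42).fit_transform(X)
    emb2 = TSNE(n_components=2, perplexity=5.0, random_state=42).fit_transform(X)
    np.testing.assert_allclose(emb1, emb2, rtol=1e-5, atol=1e-5)
```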
Let's better remove obvious or redundant comments. The same goes, for example, for error messages in assertions: you will see the line that failed in the log when that happens, and many of those cases will be pretty obvious from the code line, like "isfinite failed".
Description
Added additional tests in sklearnex/manifold/tests/test_tsne.py to expand test coverage for the t-SNE algorithm.
PR completeness and readability
Testing