Dataset shape #4

Open
RhianTaylor6 opened this issue Jan 19, 2022 · 18 comments

Comments

@RhianTaylor6

Hello,

Trying to use this to figure out the CSG of MNIST Fashion loaded from keras. No matter what pre-processing I try I keep getting a shape error. I figure it should be similar to whatever pre-processing was used on normal MNIST and was hoping you could help?

Here is the error I am getting:
boolean index did not match indexed array along dimension 1; dimension is 28 but corresponding boolean dimension is 10

@Dref360
Owner

Dref360 commented Jan 19, 2022

Hello,

Do you have some code I could look at?

We expect an array with shape [N, num_features] so you need to flatten the images.

@RhianTaylor6
Author

Hi this was my latest attempt, but I don't think I am flattening the images here...not entirely sure how haha

#from spectral_metric.estimator import CumulativeGradientEstimator
#from spectral_metric.visualize import make_graph
from keras.datasets import fashion_mnist,mnist
from keras.utils import to_categorical
import numpy as np

#(X_train, y_train), test_data = fashion_mnist.load_data()
(X_train, y_train), test_data = fashion_mnist.load_data()

# Normalize the images.

X_train = (X_train / 255) - 0.5

# Reshape dataset to have a single channel.

X_train = X_train.reshape((X_train.shape[0],28,28,1))
y_train = to_categorical(y_train)

estimator = CumulativeGradientEstimator()
estimator.fit(data=X_train, target=y_train)
csg = estimator.csg # The actual complexity values.
estimator.evals, estimator.evecs # The eigenvalues and vectors.

@Dref360
Owner

Dref360 commented Jan 19, 2022

You can flatten the images with:

X_train = X_train.reshape((X_train.shape[0], -1))

@RhianTaylor6
Author

then I get a similar error: boolean index did not match indexed array along dimension 1; dimension is 784 but corresponding boolean dimension is 10

@Dref360
Owner

Dref360 commented Jan 19, 2022

Ah I see the issue, you must not call to_categorical here.
And we expect an array with a single dimension.

y_train = y_train.reshape([-1])
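
Putting the two fixes together, here is a minimal sketch of the preprocessing. It uses synthetic arrays in place of Fashion-MNIST so it runs without keras, and `prepare_for_csg` is a hypothetical helper name, not part of spectral_metric. The last part reproduces the reported error under the assumption that the estimator indexes `data` with a boolean mask built from `target`; that is a guess based on the error message, not something confirmed from the library source.

```python
import numpy as np


def prepare_for_csg(images, labels):
    """Hypothetical helper: flatten images to [N, num_features], labels to [N]."""
    images = np.asarray(images, dtype=np.float32)
    labels = np.asarray(labels)
    # Collapse every axis after the first, e.g. (N, 28, 28, 1) -> (N, 784).
    flat = images.reshape((images.shape[0], -1))
    # If the labels were one-hot encoded (e.g. by to_categorical), undo it.
    if labels.ndim == 2 and labels.shape[1] > 1:
        labels = labels.argmax(axis=1)
    return flat, labels.reshape([-1])


# Synthetic stand-in for Fashion-MNIST: 32 random 28x28x1 images, 10 classes.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(32, 28, 28, 1))
fake_labels = rng.integers(0, 10, size=32)
one_hot = np.eye(10)[fake_labels]  # same layout as to_categorical output

X_flat, y_flat = prepare_for_csg(fake_images / 255 - 0.5, one_hot)
print(X_flat.shape, y_flat.shape)  # (32, 784) (32,)

# Assumed cause of the error: a (N, 10) boolean mask cannot index a
# (N, 784) array, because the second dimensions disagree.
try:
    X_flat[one_hot == 1]
except IndexError as err:
    print("IndexError:", err)
```

With the real data, `X_flat` and `y_flat` would then be passed as `estimator.fit(data=X_flat, target=y_flat)`, as in the snippet earlier in the thread.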

@Dref360 Dref360 closed this as completed Jan 19, 2022
@RhianTaylor6
Author

Thank you so much! That worked and MNIST Fashion has a CSG of 0.61860741 in case anyone else needs to know!

@Dref360
Owner

Dref360 commented Jan 19, 2022

Awesome! So a bit easier than notMNIST


@RhianTaylor6
Author

Yep! And that kinda makes sense doesn't it! :)

@RhianTaylor6
Author

Hello again, just had a quick question about the metrics in your paper, did you take an average for CSGs at all over a certain number of calculations?

@Dref360
Owner

Dref360 commented Jan 19, 2022

This is the average over 20 runs I think (it's been a while)? But the standard deviation was very small as you can see in Figure 2.
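
For anyone wanting to report a value the same way, a rough sketch of averaging over repeated runs follows. It assumes the run-to-run variation comes from random sampling somewhere in the pipeline; `summarize_runs` and `fake_csg` are hypothetical names, and `fake_csg` returns arbitrary placeholder numbers rather than real CSG values. In practice it would be replaced by a function that fits a fresh `CumulativeGradientEstimator` and returns a single score.

```python
import numpy as np


def summarize_runs(compute_csg, n_runs=20):
    """Call compute_csg(seed) n_runs times; return the mean and sample std."""
    scores = np.array([compute_csg(seed) for seed in range(n_runs)], dtype=float)
    return scores.mean(), scores.std(ddof=1)


def fake_csg(seed):
    """Placeholder scorer: an arbitrary constant plus small seeded noise."""
    rng = np.random.default_rng(seed)
    return 0.5 + 0.01 * rng.standard_normal()


mean, std = summarize_runs(fake_csg, n_runs=20)
print(f"score = {mean:.3f} +/- {std:.3f} over 20 runs")
```

Reporting the standard deviation next to the mean makes it easy to check whether the spread is as small as described here.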

@RhianTaylor6
Author

That's amazing, thank you so much! As you may have guessed, I am using the CSG from your paper in my own work and I just want to make sure the MNIST Fashion CSG value is in line with the others! I really appreciate your help!

@RhianTaylor6
Author

Sorry last question, do you have a reference for the github project that you would prefer I use or just reference your paper again?

@Dref360
Owner

Dref360 commented Jan 19, 2022

Referencing the CVPR paper is perfect thank you.

@Dref360
Owner

Dref360 commented Jan 21, 2022

When it is available, send me a link and I'll add it to the README :)

@RhianTaylor6
Author

RhianTaylor6 commented Jan 21, 2022 via email

@RhianTaylor6
Author

Hello again,
I have been trying to calculate the CSG of CIFAR-10 because I want the range to display on a chart. In your paper it is reported as being 1 but I am getting it as over 3 and I am not sure why, would you be so kind as to assist me again?

@Dref360
Owner

Dref360 commented Jan 26, 2022

Yeah sure.

For the paper, we got CIFAR10 embeddings using an autoencoder and ran t-SNE on it. We used MultiCoreTSNE.

CNN encoder code: https://github.com/Dref360/spectral_metric/blob/master/experiments/embedding/cnn_autoencoder.py
t-SNE code: https://github.com/Dref360/spectral_metric/blob/master/experiments/embedding/tsne.py

To compare datasets, their features need to come from similar embeddings, and in the paper we only showed scores for CNN+t-SNE, I think.

@Dref360 Dref360 reopened this Jan 28, 2022
@RhianTaylor6
Author

Thank you!
