Alternatives to PCA, such as umap #27

Hellisotherpeople · 2024-03-27T17:25:46Z

There's a whole large body of work on dimensionality reduction which handles non linearity better - i.e. UMAP. https://umap-learn.readthedocs.io/en/latest/

Is it simple to just "drop" this in place of PCA and get theoretically better results? If not, why?

what about other things, like NMF https://en.wikipedia.org/wiki/Non-negative_matrix_factorization ?

vgel · 2024-04-06T08:27:35Z

Playing with UMAP currently! I have it working but it's pretty funky, needs small coefficients. Doesn't seem to be a huge improvement over PCA currently, but it's possible the way I'm doing it isn't ideal. Might include it experimentally in the upcoming release!

(Generating a vector with UMAP is also ~30x slower than PCA currently.)

vgel · 2024-04-06T08:28:12Z

Hellisotherpeople · 2024-04-11T05:36:59Z

Very interesting!

Given the issues you describe with performance of training, there is a CuML GPU implementation of UMAP (and a lot of other dimensionality reduction algorithms which could be offered) - https://docs.rapids.ai/api/cuml/stable/api/#umap - certainly a larger dependency chain but these days everyone's accepted nvidia's stack as being mandatory so it might be good to make optional at least.

I think there is some tuning you can do with base UMAP's hyperparamaters to improve speed and possibly the quality of the generated control vectors. A UMAP expert would be able to look over that and make sure it's set "correctly" given the data - unfortunately that is not me (and likely fewer than 100 of them exist in the world).

As far as to why it requires smaller coefficients and why the performance may be hard to quantify as better - I'd love to see some analysis about this from others in the community, or even the UMAP creator himself (or at least one of the aformentioned 100)

I'm extremely appreciative that you have implemented it yourself and tried it. Very happy to see such rapid response and that it might even be made available to others. Thank you!!!

vgel · 2024-05-24T23:09:17Z

umap is now experimentally supported as an (undocumented) option in #34 — use ControlVector.train(..., method="umap"), and ensure the umap-learn package is installed.

vgel · 2024-05-24T23:10:04Z

Please feel free to use this issue to continue discussing umap and potential improvements! I'm not sure if the current method is the ideal usage of it.

thiswillbeyourgithub · 2024-06-14T11:40:32Z

Thanks @vgel for all this.

I don't have a GPU and have little free time for quite some time still but I'm still very curious as to wether nonlinear dim reduction work "better".

Here are a few thoughts:

There are tons of dimred algorithm. For example pacmap
Each algorithm has usually lots of parameters, giving a lot of room for experiment. So it might be better to have the "method" argument accept a callable, taking as input the "train" var and be stored in "directions[layer]", offering maximum flexibility at seemingly little coding cost.
I'm especially interested in the effect of gradually changing the thresholds between local and global focus of those algorithm. For example in the umap API we can tune the densmap parameter.
Or hybrid approaches: use PCA for the rough direction, then add the vector * 0.1 of the umap transformed with a focus on local relationships. Also try with global focus.
Also, maybe PCA starts working quickly (=with few examples) but the cost is a greater loss in benchmark, whereas nonlinear dimred have less sensitivity (=need more examples) but greater specificity (=reducing those directions have less side effects)
In any case, those algorithms can be greatly speedup by first taking the PCA transformation over say 50 dimensions and then applying the non linear dimension algorithm over the transform. This might not defeat the purpose entirely and retain the hypothetical gains. AFAIK it's common practice as PCA allows checking we retain enough of the variance to make sure we're not screwing up the data.
Maybe it would be a well spend effort to constitude a standardized test rig before tuning all those things. These days I'm thinking about the recent papers about abliteration, well summarized into this blog post that nicely uploaded easy to use datasets with good and bad exemples of refusals. That might be the quickest way to create our own mini benchmark.

Anyway, I won't have time for about 6-12 months but may do a PR eventually.

If anyone's interested, please share your findings, especially negative results!

thiswillbeyourgithub · 2024-06-25T13:40:32Z

Addendum to my thoughts above (I hope nobody will mind!):
8. Instead of taking all the N samples and doing a 1 dimension PCA to deduce dimension, I'm thinking of another way:

Outline:
- Take the N samples
- Do a Kmeans with n_cluster=k
- Split the N samples into the k roughly equal clusters (KMeans has the nice property to tend to make even size clusters).
- Then do the 1D PCA over each cluster.
- Now for each inference: compute the distance between the current activation, and each cluster centroid and normalize these distances so they sum to 1.
- Now apply to the activation the k directions (1 per cluster), weighted by the distances.
- All resemblance to mixture of experts is intentional: the distances is a bit like the routing network, and the idea is to optimize the tradeoff between how effective representative engineering is without being too rough (=risk of side effects).

thiswillbeyourgithub · 2024-10-29T18:44:04Z

Hello, I am back and have a tiny bit of free time to devote to explore those ideas. I plan to document things in this fork

I saw that the owner of this repo tried already with umap with some success. Can you share your remarks from testing it as in depth as your time allows before diving myself?

thiswillbeyourgithub · 2024-11-26T18:28:36Z

Btw, PaCMAP is an alternative to UMAP that does non linear dimension reduction, has somewhat less free paramters, appears much simpler to install and package, and can actually output a 1 dimension output (it was not initially possible, cf this issue). The author will probably update the package soonish.

thiswillbeyourgithub · 2024-11-28T15:12:52Z

I'm struggling to make umap work, can you tell me :

how many examples you generate in your dataset
which model you use
what layer you apply the control vector to
what umap settings you're using
what strength you're using

It's making it harder to investigate PaCMAP

thiswillbeyourgithub · 2024-12-12T07:40:55Z

I'm struggling to make umap work, can you tell me :

how many examples you generate in your dataset

which model you use

what layer you apply the control vector to

what umap settings you're using

what strength you're using

It's making it harder to investigate PaCMAP

I am still interested in the answer :)

I think I figured out that trying to preserve the scale of the train array helps a lot. See https://github.com/thiswillbeyourgithub/repeng/blob/c0722440ce5f67d8be112ebe7a2ff3fd8e97ae80/repeng/extract.py#L479

Likewise to applying a regularization norm infered from the initial data.

thiswillbeyourgithub · 2024-12-12T13:04:31Z

Addendum to my thoughts above (I hope nobody will mind!): 8. Instead of taking all the N samples and doing a 1 dimension PCA to deduce dimension, I'm thinking of another way:

Outline:

Take the N samples

Do a Kmeans with n_cluster=k

Split the N samples into the k roughly equal clusters (KMeans has the nice property to tend to make even size clusters).

Then do the 1D PCA over each cluster.

Now for each inference: compute the distance between the current activation, and each cluster centroid and normalize these distances so they sum to 1.

Now apply to the activation the k directions (1 per cluster), weighted by the distances.

All resemblance to mixture of experts is intentional: the distances is a bit like the routing network, and the idea is to optimize the tradeoff between how effective representative engineering is without being too rough (=risk of side effects).

Btw something like that works great and is present in my fork

thiswillbeyourgithub · 2024-12-12T13:16:28Z

Basically I do umap/pacmap in 3 dimensions to project the samples, then kmeans to find 2 clusters, then substract the mean of each cluster to the sample of the other clusters then apply the pca_diff on the resulting data. It seems to work great. I can push the strength to like x5 and it stays coherent. Lots more things to try!

Edit: also the directions are pretty much always orthogonal to what pca diff would do, so it seems like there's a benefit to using umap/pacmap.

vgel changed the title ~~Couldn't we do better than PCA?~~ Alternatives to PCA, such as umap May 24, 2024

thiswillbeyourgithub mentioned this issue Nov 19, 2024

Memory Issues when Attempting to Load GGUF Tensors in transformers huggingface/transformers#34417

Closed

1 task

thiswillbeyourgithub mentioned this issue Nov 26, 2024

prototype support for n_components=1 YingfanWang/PaCMAP#83

Merged

thiswillbeyourgithub mentioned this issue Dec 4, 2024

Enhancements #55

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Alternatives to PCA, such as umap #27

Alternatives to PCA, such as umap #27

Hellisotherpeople commented Mar 27, 2024

vgel commented Apr 6, 2024 •

edited

Loading

vgel commented Apr 6, 2024

Hellisotherpeople commented Apr 11, 2024 •

edited

Loading

vgel commented May 24, 2024

vgel commented May 24, 2024

thiswillbeyourgithub commented Jun 14, 2024

thiswillbeyourgithub commented Jun 25, 2024 •

edited

Loading

thiswillbeyourgithub commented Oct 29, 2024

thiswillbeyourgithub commented Nov 26, 2024

thiswillbeyourgithub commented Nov 28, 2024

thiswillbeyourgithub commented Dec 12, 2024

thiswillbeyourgithub commented Dec 12, 2024

thiswillbeyourgithub commented Dec 12, 2024 •

edited

Loading

Alternatives to PCA, such as umap #27

Alternatives to PCA, such as umap #27

Comments

Hellisotherpeople commented Mar 27, 2024

vgel commented Apr 6, 2024 • edited Loading

vgel commented Apr 6, 2024

Hellisotherpeople commented Apr 11, 2024 • edited Loading

vgel commented May 24, 2024

vgel commented May 24, 2024

thiswillbeyourgithub commented Jun 14, 2024

thiswillbeyourgithub commented Jun 25, 2024 • edited Loading

thiswillbeyourgithub commented Oct 29, 2024

thiswillbeyourgithub commented Nov 26, 2024

thiswillbeyourgithub commented Nov 28, 2024

thiswillbeyourgithub commented Dec 12, 2024

thiswillbeyourgithub commented Dec 12, 2024

thiswillbeyourgithub commented Dec 12, 2024 • edited Loading

vgel commented Apr 6, 2024 •

edited

Loading

Hellisotherpeople commented Apr 11, 2024 •

edited

Loading

thiswillbeyourgithub commented Jun 25, 2024 •

edited

Loading

thiswillbeyourgithub commented Dec 12, 2024 •

edited

Loading