question on SKATER from Discord [2024-10-15] #463

Open
jGaboardi opened this issue Oct 16, 2024 · 2 comments

@jGaboardi
Member

Hi everyone, I discovered the documentation of spopt yesterday. An extremely interesting project!

It just so happens that one of the tutorials fits perfectly with one of my business use cases, in particular this one: Spatial ‘K’luster Analysis by Tree Edge Removal: Clustering Airbnb Spots in Chicago.

In this demonstration, we attempt to cluster polygons characterised by a number of Airbnb spots into N clusters. Unless I am mistaken, I expected the distribution of the objective variable (number of Airbnb spots) to be balanced across clusters.

But that's not what we see when we add up the number of spots per cluster created. Could you clarify my understanding? And congratulations again to the whole team for the work they've done.

Many thanks in advance. Best regards

(if my question doesn't belong here, don't hesitate to delete it)

@valentincorad

Hello, to clarify my issue, here is my project. I have a geographical area divided into multiple cities (for each city I have a polygon geometry). Each city is characterised by a metric A (for instance, number of clients) and a metric B (number of prospects). I have N salesmen. I am trying to create N contiguous regions/clusters such that each cluster/salesman has a similar number of clients and a similar number of prospects. I might have other objectives in the future.

After some research I found that the "Max-p-regions" or "Skater" methods could solve my problem. I tried to understand the demonstration Spatial ‘K’luster Analysis by Tree Edge Removal: Clustering Airbnb Spots in Chicago in the spopt documentation.

At the end of this demonstration, different numbers of clusters are tested. For each test, the number of Airbnb spots in each created cluster is computed. However, I thought the number of Airbnb spots (column "num_spots") would be balanced across clusters. This is where my misunderstanding lies.

image

I used Skater on my project and tried to minimize these two metrics while keeping the regions contiguous, without success:

client_std = gdf.groupby(region_id)['number_clients'].sum().std()
prospect_std = gdf.groupby(region_id)['number_prospects'].sum().std()

where gdf is my GeoDataFrame and region_id holds the region label assigned to each city.
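
For reference, the two metrics above can be sketched in plain Python so that they run without pandas or a GeoDataFrame. This is only an illustration: the helper name `total_spread` and the toy numbers are made up, and the sample standard deviation is used because that is what pandas' `.std()` computes by default.

```python
from collections import defaultdict
from statistics import stdev


def total_spread(region_ids, values):
    """Sample standard deviation of per-region totals (lower means more balanced)."""
    totals = defaultdict(float)
    for region, value in zip(region_ids, values):
        totals[region] += value
    return stdev(list(totals.values()))


# Hypothetical data: six cities assigned to three salesmen (regions 0, 1, 2).
region_ids = [0, 0, 1, 1, 2, 2]
number_clients = [10, 20, 15, 15, 5, 25]
number_prospects = [100, 50, 30, 40, 200, 10]

client_std = total_spread(region_ids, number_clients)      # totals 30, 30, 30
prospect_std = total_spread(region_ids, number_prospects)  # totals 150, 70, 210
```

In this toy assignment the clients are perfectly balanced while the prospects are not, which shows that the two metrics can pull in different directions for the same set of regions.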

Have I misunderstood how the model should be used? Or do I need to find the right combination of parameters (floor, trace, center, ...)?

Thank you very much for your help.

@ljwolf
Member

ljwolf commented Nov 13, 2024

Hi! Thanks for your report. We will need more information about what you're inputting to skater to provide more detailed feedback, but:

SKATER tries to find geographic clusters with similar feature values. It does not attempt to balance the size of these clusters, it only ensures that clusters are always larger than floor.

So, these clusters should be internally homogeneous (that is, the features should be fairly similar within the cluster) and geographically compact (which depends on the input W object), but may be massively different sizes.

I think this is what your screenshot shows?
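
To make the distinction between homogeneity and balance concrete, here is a toy sketch in plain Python. It is not spopt's SKATER implementation: it assumes six areas along a line (so the contiguity tree is just a chain), makes a single cut, and compares a homogeneity criterion (within-group sum of squared deviations) with a balance criterion (difference of group totals). The `num_spots` values and helper names are hypothetical.

```python
def ssd(values):
    """Sum of squared deviations from the mean of a group."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_cut(values, score):
    """Cut position k (1..n-1) that minimises score(left, right) on a chain."""
    return min(range(1, len(values)), key=lambda k: score(values[:k], values[k:]))


# Hypothetical spot counts for six areas lying along a line.
num_spots = [2, 3, 2, 3, 40, 42]

# Homogeneity: keep similar values together (the SKATER-style goal).
homog_k = best_cut(num_spots, lambda left, right: ssd(left) + ssd(right))

# Balance: make the group totals as equal as possible (the salesman goal).
balance_k = best_cut(num_spots, lambda left, right: abs(sum(left) - sum(right)))

homog_totals = (sum(num_spots[:homog_k]), sum(num_spots[homog_k:]))
balance_totals = (sum(num_spots[:balance_k]), sum(num_spots[balance_k:]))
```

The homogeneity cut groups the four small areas together and the two large ones together, giving totals of 10 and 82, whereas the balance cut gives totals of 50 and 42. A method that optimises the first criterion has no reason to produce the second outcome.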
