Clarification of Classification Lookup VLR #82

Open · kjwaters opened this issue Jun 14, 2019 · 33 comments

@kjwaters

I think the classification lookup VLR could use some additional information. As it is, the specification has the table represented as a 256 x 16 byte payload, with each 16-byte record containing a 1-byte class number and a 15-byte description.

It doesn't specifically say it, but I assume that the table should include all class numbers from 0 to 255. What do I put in the descriptions for classes that aren't defined? Null? Spaces?
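For concreteness, here's roughly how I'm building the payload at the moment (a minimal Python sketch; padding unused descriptions with nulls is my assumption, and is exactly what I'd like the spec to confirm):

```python
import struct

TABLE_SIZE = 256   # the spec appears to expect all 256 entries
DESC_LEN = 15      # 15-byte description field

def build_v1_payload(lookup):
    """Build the 256 x 16 byte Classification Lookup payload.

    `lookup` maps class number (0-255) -> description string.
    Unused entries are written as all-null records (my assumption).
    """
    payload = bytearray()
    for class_number in range(TABLE_SIZE):
        desc = lookup.get(class_number, "").encode("ascii")[:DESC_LEN]
        desc = desc.ljust(DESC_LEN, b"\x00")      # null-pad to 15 bytes
        payload += struct.pack("B", class_number) + desc
    assert len(payload) == TABLE_SIZE * 16        # 4096-byte payload
    return bytes(payload)

payload = build_v1_payload({2: "Ground", 40: "Seafloor", 41: "Water surface"})
```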

@esilvia (Member) commented Jun 17, 2019

I always assumed that it would only include used classes, but on second glance you might be right. I'd imagine that you'd leave all unused fields as null. It could certainly use some clarification.

As we discussed at JALBTCX, I've never seen this VLR actually used. Maybe @lgraham-geocue or @rapidlasso can shed some light?

@rapidlasso (Member)

I have never ever seen it used in any of the many LAS / LAZ files that I have come across.

@lgraham-geocue commented Jun 17, 2019 via email

@kjwaters (Author) commented Jun 17, 2019 via email

@dpev commented Jun 17, 2019 via email

@manfred-brands

We at Fugro have implemented support for this.
We use the bathy extension classification for airborne LIDAR to tag points as WaterSurface (41) vs Seafloor (40).
We also want to use it for storing the different classifications found by our land based lidar systems.
We have used it to describe the result of cleaning: instead of marking all points as noise, we mark them as deleted by user or deleted by filterX/filterY. But this was more for the benefit of our point cloud classifier.
Internally we use the same numbers and the table would only be for the benefit of consumers.

Regarding the original question: as per the LAS standard, anything not used will be zero.

Indeed, what should one do when importing LAS files from different sources with different Classification Lookups?
One option would be to use the PointSourceId to determine the Lookup dictionary.
Alternatively, when importing, classifications could be merged and homogenised, assuming that we don't need more than 255 different classifications in total and that a user could map overlapping classifications (rough sketch at the end of this comment).

We have a team working on a LAS viewer/editor/manual classifier that could use/add this.
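A rough sketch of the merge/homogenise idea mentioned above (hypothetical helper, not our production code):

```python
def merge_lookups(base, incoming, user_range=range(64, 256)):
    """Merge two class-number -> description dicts.

    Returns (merged_lookup, remap), where `remap` maps the incoming
    file's class numbers to the numbers used in the merged table.
    Identical (number, description) pairs are kept as-is; conflicting
    user-definable codes are moved to free slots.
    """
    merged = dict(base)
    remap = {}
    free = (n for n in user_range if n not in merged)
    for number, desc in incoming.items():
        if number not in merged or merged[number] == desc:
            merged.setdefault(number, desc)
            remap[number] = number
        else:
            new_number = next(free)   # raises StopIteration if >255 classes in total
            merged[new_number] = desc
            remap[number] = new_number
    return merged, remap

merged, remap = merge_lookups(
    {64: "Distribution pole"},
    {64: "Palm fruit", 65: "Distribution pole"},
)
# remap == {64: 65, 65: 66}
```

The returned remap would then be applied to the point classifications of the incoming file before merging.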

@kjwaters (Author) commented Jun 18, 2019 via email

@esilvia (Member) commented Jul 15, 2019

@kjwaters We discussed this in today's bimonthly LWG call. We agreed that the current Classification VLR does appear to expect that all 256 records be present regardless of whether or not the strings are actually populated (e.g., there will also be a record for class 15, even if the description is all null).

However, we also agreed that 15 characters isn't nearly enough to write something useful and proposed that maybe Kirk's application would make for a good case study to design a Classification VLR v2 with the following characteristics:

  • Sparse VLR that lists only the classifications actually used.
  • 1-byte unsigned char Classification Number
  • 32-character Classification Name
  • 32(128?)-character Classification Description
  • Example VLR template that implements the ASPRS standard classifications.

At 1+32+128=161 bytes per Classification, that only puts you at 41,216 bytes if all 256 classifications are used, which means it still fits into a base VLR (max payload size 65,535).
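For concreteness, a rough sketch of packing one record with the field sizes listed above (the string encoding is still an open question):

```python
import struct

def pack_v2_record(number, name, description):
    """Pack one proposed v2 record: 1-byte class number,
    32-byte name, 128-byte description (null-padded by struct)."""
    return struct.pack(
        "<B32s128s",
        number,
        name.encode("utf-8"),
        description.encode("utf-8"),
    )

record = pack_v2_record(41, "Water Surface", "Topobathy: water surface return")
assert len(record) == 161              # 1 + 32 + 128
assert 256 * len(record) <= 65535      # worst case still fits a base VLR
```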

@kjwaters (Author) commented Jul 16, 2019 via email

@esilvia (Member) commented Jul 29, 2019

My understanding is that adding new VLRs is a minor change since it doesn't affect existing implementations.

@abellgithub

This seems like a problem. We already have well-defined classification values from 0-18. Re-assignment of those values should be prohibited.

@kjwaters (Author) commented Jul 29, 2019 via email

@abellgithub

We have a specification. Table 17 lists the classifications. What may have happened with "older data" is irrelevant to the current specification.

@lgraham-geocue commented Jul 29, 2019 via email

@abellgithub

The above suggestion breaks existing implementations.

@kjwaters (Author) commented Jul 29, 2019 via email

@abellgithub

This was permitted by R13, as values up to 63 were reserved and values greater than 18 shouldn't have been used by a conforming file prior to R14. Values 64-255 were set aside as "user-definable".

@rapidlasso (Member)

A tidbit from outside of the A(SPRS): Almost all of the 16 different German state survey departments have switched from ASCII to the LAS / LAZ formats but use some of their own classification codes which are in the process of getting harmonized across the different state surveys.
[Image: AdV classification code table]

@Deguerre

Following this ticket as we have also implemented support for user-definable classifications. (As an aside, "Transmission tower" has a standard code, but "Distribution pole" doesn't.)

I also interpreted the spec as saying that all 256 codes must be present, which makes the code field redundant.

@esilvia (Member) commented Sep 29, 2020

I just had a thought on this. Would it make sense to add a point count field for each classification to the v2 of this VLR, or is that deviating too far from the VLR's intended purpose? #39 attempts to address this use case so it might be a little redundant.

@kjwaters (Author) commented Sep 29, 2020 via email

@hobu (Contributor) commented Sep 29, 2020

Would it make sense to add a point count field for each classification to the v2 of this VLR

Propose another one for "stats".

@esilvia (Member) commented Nov 18, 2020

Thanks all for the feedback. Here's the proposal for a Classification Lookup VLR v2 as I understand it:

  1. 1-byte unsigned char for classification number.
  2. 32 characters for Classification name (expectation that it will match the existing Classification table, although perhaps translated into a non-English language)
  3. 64 characters for null-terminated Classification description (free-form)

Does this look right?

My final question is whether it makes more sense to design it to be sparse as discussed, or whether it should always have 256 records like the original Classification Lookup VLR. The argument in favor of the latter is that making the VLR a fixed size makes it possible to update/add/remove entries in the Classification Lookup VLR in-place without rewriting the entire LAS file.
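To illustrate the edit-in-place argument, here's a rough sketch assuming the fixed 256-record layout with the field sizes listed above; `vlr_payload_offset` is assumed to be known from parsing the header and the preceding VLR headers:

```python
import struct

RECORD_SIZE = 1 + 32 + 64   # class number + name + description, per the list above

def update_entry_in_place(f, vlr_payload_offset, class_number, name, description):
    """Overwrite the record for `class_number` in a fixed 256-record
    Classification Lookup VLR, leaving the rest of the file untouched."""
    record = struct.pack(
        "<B32s64s",
        class_number,
        name.encode("utf-8"),
        description.encode("utf-8"),
    )
    f.seek(vlr_payload_offset + class_number * RECORD_SIZE)
    f.write(record)

# with open("example.las", "r+b") as f:
#     update_entry_in_place(f, lookup_vlr_payload_offset, 64,
#                           "Distribution pole", "User-defined pole class")
```

With a sparse VLR the same edit would generally mean resizing the VLR and rewriting everything that follows it.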

@kjwaters (Author) commented Nov 18, 2020 via email

@lgraham-geocue commented Nov 18, 2020 via email

@esilvia (Member) commented Nov 18, 2020

@kjwaters @lgraham-geocue Thanks for confirming I was thinking along the right lines. In that case I think it makes sense to drop the first byte as redundant and only include the names and descriptions. I'll start drafting this up.

@pchilds commented Feb 23, 2021

Sorry, just noticed https://lidarmag.com/2021/01/24/las-exchange-las-to-have-new-classification-lookup-vlr/.
I disagree with the statement "A brief survey of the community failed to produce a single person or example file using it", as @manfred-brands and @Deguerre have clearly stated that they use it, and there has been interest shown by @kjwaters and @dpev in implementing it. I've implemented it, though I must admit it wasn't well received at my previous company, and we did have issues with writing data in reserved categories and getting caught out by that. That was our fault, though; we should have read the spec more carefully. It does look like others get caught out by this too, so maybe the spec could stress more strongly why those values are reserved and the consequences for the unwary developer.
Anyway, I find the classification lookup absolutely essential. Getting a LAS file without the semantics of 27, 123, 69 made explicit is about as annoying as getting a Hotine Oblique Mercator LAS file with no CRS VLR. A sidecar file at least tells you something, but unless there is some clear uniting spec it requires the overhead of negotiation and development per source/consumer.
I like the proposed v2. v1 is lacking in detail and the extended description will help. I am concerned, however, about legacy handling. As a dev, having to support a widely different v1 and v2 might be difficult. I would suggest that in #2 the proposed 32 chars revert to the 15 chars of v1. After all, if there is a longer description field to be expressive in, there should not be a problem keeping item 2 truncated to 15 chars for legacy compatibility.
I find 256 classes not enough for managing different customer needs, so I use these 15 chars like a UUID, with each customer picking and choosing their own subset of the user-definable codes. Retaining the 15 chars for #2 would make development much smoother.
I also must admit I don't conform to the spec in that I leave unused entries out of the 256. Re: "update/add/remove entries in the Classification Lookup VLR in-place without rewriting the entire LAS file": I find inserting a lookup a once-off operation, and if I ever had to add an entry it would be as a result of editing the classification of one or more points, in which case, if I'm going via laszip, rewriting the whole file is pretty much a given. I've got to say I don't have a strong opinion in this regard, as it would not be much effort to change the code implementation without breaking anything. The main thing is to have something clear and consistent going forward.

@esilvia (Member) commented Mar 8, 2021

@pchilds Thanks for the input on this. When I wrote that article, I did forget about the @manfred-brands post and interpreted the post from @Deguerre differently. That's my mistake.

I appreciate the feedback about whether or not the v2 VLR should be sparse. I think I agree that it makes the most sense for the spec to indicate that all 256 are not required, but users can choose to include them if the "edit-in-place" functionality is desired. I can add language to indicate how that should be done.

I appreciate your feedback about the classification name being more terse and wanting to maintain consistency with v1. However, I don't really understand the use case - are you trying to make it so that one can switch classification VLR versions more smoothly? In addition, when I look at the non-English examples from @rapidlasso I do note that they are all far longer than 16 characters, so extending it to 32 characters seems essential to me. We can discuss on the next LWG call in two weeks.

@pchilds commented Mar 9, 2021

The use case I can suggest is as an analytics provider. I'd classify a data set that I want to on-sell to two customers. The first might have a system where classifications 64-82 represent genera of Hirundinidae, whereas the second wants to use an overlapping range for palm fruits. Now I don't want two code bases to deal with the conflict, so I'll offload all these numbers outside of the code base and use the 15-char string internally as a unique identifier. I build the architecture of my code around a 15-char system, and then a customer comes along who says "Can I have it in v2?" Great. I don't have to dump things in the description VLR and can put it where it belongs. But do I have to change the architecture of my code base? I could hash from the 32-char strings to 15, but what do I do if there is a clash?
Personally I'm not in a place at this moment where it is disruptive. It is just that if I end up wanting to support v1 and v2 for different customers I need another layer of abstraction, whereas if the 15-char field hung around then that is an element of commonality I can capitalise on.

@pchilds commented Mar 9, 2021

Also, if the Classification Name is to be localised, then 32 chars might feel just as tight as having 15 in English. You get 3 bytes in UTF-8 for a lot of CJK characters, but at the same time those languages are represented more tersely. The trouble is more in a lot of Near Eastern languages that get stuck with 2 bytes per character but need a similar number of characters to express the same thing. We could push it longer, as there's still room to play with in keeping the VLR below 2^16, but I'd almost feel it would be better spent expanding the description to 192 or so.
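For illustration (hypothetical translations, purely to show the byte counts):

```python
names = {
    "en": "Water surface",   # 13 characters, 13 bytes in UTF-8
    "ja": "水面",             # 2 characters, 6 bytes in UTF-8
    "ar": "سطح الماء",        # 9 characters, 17 bytes in UTF-8
}
for lang, name in names.items():
    print(lang, len(name), len(name.encode("utf-8")))
```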

@esilvia (Member) commented Mar 4, 2022

I re-read through this whole thread because I feel like it has wandered around somewhat and it's been almost a year since we last had substantial discussion on it. Here's the proposed structure for Classification Lookup VLR v2 again:

  • Fixed size VLR with space for 256 classes.
  • 32 bytes for class name
  • 128 bytes for class description
  • Total 256 x (32 + 128) = 40,960 bytes, which still fits in a standard VLR

There's enough room in a standard VLR to extend the class name to 64 bytes if we want to, but I don't think we want to pretend to support UTF-16... it just feels like a can of worms.
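As a sanity check on the sizes, a rough sketch of packing that layout, assuming the class number is implicit in the record position per the earlier discussion:

```python
import struct

NAME_LEN, DESC_LEN, NUM_CLASSES = 32, 128, 256

def build_v2_payload(lookup):
    """Pack a fixed-size v2 payload: 256 records of
    32-byte name + 128-byte description, indexed by class number."""
    payload = bytearray()
    for class_number in range(NUM_CLASSES):
        name, desc = lookup.get(class_number, ("", ""))
        payload += struct.pack(
            f"<{NAME_LEN}s{DESC_LEN}s",
            name.encode("utf-8"),
            desc.encode("utf-8"),
        )
    return bytes(payload)

payload = build_v2_payload({2: ("Ground", "Bare earth"),
                            41: ("Water Surface", "Topobathy water surface")})
assert len(payload) == NUM_CLASSES * (NAME_LEN + DESC_LEN)   # 40,960 bytes
```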


Here's a current link to the column I wrote for lidar magazine on this topic: https://lidarmag.com/2020/11/20/xyz-exchange/


While trying to wrap up R16 over the past month I'm strongly trending toward thinking that v2 of the Classification Lookup VLR belongs in LAS 1.5. Doing so provides a couple of opportunities:

  1. We could make it a required VLR in LAS 1.5.
  2. We can deprecate v1 to protect implementers from having to support both VLRs.
  3. We can break backwards compatibility with the original Classification Lookup VLR by lengthening the classification name field.

What do you all think? Should we push this forward in LAS 1.4 R16, or does it belong in 1.5? I expect 1.5 to happen late this year or in the first half of next year.

@Deguerre commented Mar 24, 2022

So since I've been namechecked a couple of times... I'm doing a different job now, although it's one where the need for custom classification lookup will be important within the next year or so.

I'm inclined to agree with @esilvia here. The v1 VLR is just plain broken and should be deprecated. Given that, R16 is probably not the right place to deprecate it. I am also inclined to agree that i18n is a can of worms, but if the WG wants to handle it here, I suggest this as a compromise:

  • Extend to 64 bytes and mandate UTF-8.
  • Add an ISO 639 code to the VLR, and explicitly endorse multiple copies for multilingual applications or jurisdictions (e.g. Canada).

I assume by "required VLR", that means "compliant implementations must support it", not "compliant files must include it".
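For what it's worth, a rough sketch of that compromise layout (the 3-byte ISO 639-2 field size and its placement at the front of the payload are my own assumptions):

```python
import struct

def build_multilingual_payload(language, lookup, name_len=64, desc_len=128):
    """Hypothetical layout: a 3-byte ISO 639-2 language code at the front,
    then 256 fixed records of UTF-8 name + description.
    One VLR copy would be written per language."""
    payload = bytearray(struct.pack("<3s", language.encode("ascii")))
    for class_number in range(256):
        name, desc = lookup.get(class_number, ("", ""))
        payload += struct.pack(f"<{name_len}s{desc_len}s",
                               name.encode("utf-8"), desc.encode("utf-8"))
    assert len(payload) == 3 + 256 * (name_len + desc_len)   # 49,155 bytes, fits a base VLR
    return bytes(payload)

# e.g. one copy per language for a bilingual jurisdiction:
payload_en = build_multilingual_payload("eng", {9: ("Water", "Standard class 9")})
payload_fr = build_multilingual_payload("fra", {9: ("Eau", "Classe standard 9")})
```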

@esilvia (Member) commented Jun 1, 2023

The discussion on whether or not to implement this change as a revision of 1.4 is now moot. In today's LWG meeting we agreed it makes sense to deprecate (remove?) v1 of the Classification Lookup VLR in 1.5 and replace it with v2 of the VLR.
