Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #359: fix NPE when parsing gc log file without any events #409

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Lmh-java
Copy link

@Lmh-java Lmh-java commented Jan 6, 2025

Intended to fix #359.

About the fix:

  1. Introduced a null guard. When getClock() is initially null, it means there is no event in the file. Therefore, we throw an exception to terminate the remaining process.
public void diary(Diary diary) {
      this.diary = diary;
      this.clock = diary.getTimeOfFirstEvent();
      if (this.clock == null) {
          LOGGER.log(Level.SEVERE, "Time of first event is null, are there any events presented in the log file?");
          throw new IllegalStateException("Missing log events");
      }
  }

View Code

  1. Added 2 test cases, which reproduces this issue and validate the fix (Test data files will be provided in another PR to the test data repo, once this PR is finalized).

Discussion:

  1. Is termination through a RuntimeException the best way to end the process.
  2. Probably a null guard in GCLogParser is not the best position, but I cannot find a better position to check for null. In the current context, IMO, only parser knows if there is any events present in the file.

I am not very familar with the project. Any advice is welcome and will be highly valued.

@Lmh-java
Copy link
Author

Lmh-java commented Jan 6, 2025

@microsoft-github-policy-service agree

@Lmh-java Lmh-java marked this pull request as ready for review January 6, 2025 02:29
@kcpeppe
Copy link
Collaborator

kcpeppe commented Jan 6, 2025

I've taken a look at the problem and I think it can be discovered earlier.

    private Diarizer diarizer() {
        ServiceLoader<Diarizer> serviceLoader = load(Diarizer.class);
        if (serviceLoader.findFirst().isPresent()) {
            return serviceLoader
                    .stream()
                    .map(Provider::get)
                    .filter(p -> p.isUnified() == isUnified())
                    .findFirst()
                    .orElseThrow(() -> new ServiceConfigurationError("Unable to find a suitable provider to create a diary"));
        } else {
            try {
                String clazzName = (isUnified()) ? "com.microsoft.gctoolkit.parser.jvm.UnifiedDiarizer" : "com.microsoft.gctoolkit.parser.jvm.PreUnifiedDiarizer";
                Class clazz = Class.forName(clazzName, true, Thread.currentThread().getContextClassLoader());
                return (Diarizer) clazz.getConstructors()[0].newInstance();
            } catch (ClassNotFoundException | InstantiationException | IllegalAccessException | InvocationTargetException e) {
                throw new ServiceConfigurationError("Unable to find a suitable provider to create a diary");
            }
        }
    }

I would consider adding a new filter, something like this. That way the problem can be discovered as soon as the data source is being instantiated. I think an IllegalStateException is the appropriate exception to throw. Likely even better than the current ServiceConfigurationError. I guess that all depends on if any service providers were found instead of just a bad datasource that none of the service providers could deal with.

            return serviceLoader
                    .stream()
                    .map(Provider::get)
                    .filter(p -> p.isUnified() == isUnified())
                    .filter(p -> p.getDiary().hasTimeOfFirstEvent())
                    .findFirst()
                    .orElseThrow(() -> new ServiceConfigurationError("Unable to find a suitable provider to create a diary"));

@Lmh-java
Copy link
Author

Lmh-java commented Jan 7, 2025

Sounds good. In this way, we can avoid modifying the functions in the parser :).

@Lmh-java
Copy link
Author

Lmh-java commented Jan 8, 2025

The logic makes sense, but I found a new interesting challenge while implementing this. I am trying to seperate the exception caused by (a) none service providers were found (b) there are service providers, but none of them found the first time, so that we can throw IllegalStateException and ServiceConfigurationError separately. (I felt like this might be clearer for user to debug, rather than mix the exception message all together?) However, to do that, I might need to do the exact filtering twice (as the code shown below).

private Diarizer diarizer() {
        ServiceLoader<Diarizer> serviceLoader = load(Diarizer.class);
        if (serviceLoader.findFirst().isPresent()) {
             serviceLoader
                .stream()
                .map(Provider::get)
                .filter(p -> p.isUnified() == isUnified())
                .findFirst()
                .orElseThrow(() -> new ServiceConfigurationError("Unable to find a suitable provider to create a diary"));
           // cannot reuse the stream, because terminal operations will consume the stream.
            return serviceLoader
                    .stream()
                    .map(Provider::get)
                    .filter(p -> p.isUnified() == isUnified())
                    .filter(p -> p.getDiary().hasTimeOfFirstEvent())
                    .findFirst()
                    .orElseThrow(() -> new IllegalStateException("No event is found in the datasource"));
        } else {
            try {
                String clazzName = (isUnified()) ? "com.microsoft.gctoolkit.parser.jvm.UnifiedDiarizer" : "com.microsoft.gctoolkit.parser.jvm.PreUnifiedDiarizer";
                Class clazz = Class.forName(clazzName, true, Thread.currentThread().getContextClassLoader());
                return (Diarizer) clazz.getConstructors()[0].newInstance();
            } catch (ClassNotFoundException | InstantiationException | IllegalAccessException | InvocationTargetException e) {
                throw new ServiceConfigurationError("Unable to find a suitable provider to create a diary");
            }
        }
    }

In this case, this feels strange.


Or maybe we can do the following:

private Diarizer diarizer() {
        ServiceLoader<Diarizer> serviceLoader = load(Diarizer.class);
        if (serviceLoader.findFirst().isPresent()) {
             Optional<Diarizer> diarizer = serviceLoader
                .stream()
                .map(Provider::get)
                .filter(p -> p.isUnified() == isUnified())
                .findFirst();

             if (diarizer.isPresent()) {
                 if (diarizer.get().getDiary().hasTimeOfFirstEvent())
                     return diarizer.get();
                 else
                     throw new IllegalStateException("No event is found in the datasource");
             } else {
                 throw new ServiceConfigurationError("Unable to find a suitable provider to create a diary");
             }
        } else {
            try {
                String clazzName = (isUnified()) ? "com.microsoft.gctoolkit.parser.jvm.UnifiedDiarizer" : "com.microsoft.gctoolkit.parser.jvm.PreUnifiedDiarizer";
                Class clazz = Class.forName(clazzName, true, Thread.currentThread().getContextClassLoader());
                return (Diarizer) clazz.getConstructors()[0].newInstance();
            } catch (ClassNotFoundException | InstantiationException | IllegalAccessException | InvocationTargetException e) {
                throw new ServiceConfigurationError("Unable to find a suitable provider to create a diary");
            }
        }

In which, we find the first viable service provider, and only check if this service providers has events, under the assumption of if one diarizer finds no log, no other diarizer can find logs either. (I felt this assumption does not always hold).

I appreciate your idea.

@dsgrieve
Copy link
Member

dsgrieve commented Jan 8, 2025

I don't think your second block of code does the same thing. You want to find a provider that can read the log file's format (unified or preunified) AND is able to find a timestamp in the log.

I suggest you do both of the filters on the first pass. If the stream fails to find a provider, then you might run through the providers again to figure out which exception to throw. Or gather up a list of providers that pass the isUnified() test then do the diarizer hasTimeOfFirstEvent() test in a separate loop. If the list of providers is empty, throw the ServiceConfigurationError. If the second loop fails to find a provider, throw the IllegalStateExeception.

Personally, I'd rather see a ServiceConfigurationError for both with only the message changed.

@Lmh-java
Copy link
Author

Lmh-java commented Jan 9, 2025

I don't think your second block of code does the same thing. You want to find a provider that can read the log file's format (unified or preunified) AND is able to find a timestamp in the log.

I suggest you do both of the filters on the first pass. If the stream fails to find a provider, then you might run through the providers again to figure out which exception to throw. Or gather up a list of providers that pass the isUnified() test then do the diarizer hasTimeOfFirstEvent() test in a separate loop. If the list of providers is empty, throw the ServiceConfigurationError. If the second loop fails to find a provider, throw the IllegalStateExeception.

Personally, I'd rather see a ServiceConfigurationError for both with only the message changed.

Thank you for your suggestions. No, these two blocks do not do the same thing. I will go with the second implementation, where we collect all providers that passed the isUnified() test then manipulate providers as a list. It should be okay that we convert the stream into a list, since there won't be a significantly large number of providers.

IMO, I am down to throw a unified exception and provide different exception message. But this exception might be named as "ServiceConfigurationError". Semantically, this new exception is not related to configuration of the service, but rather a special property of the datasource.

@Lmh-java
Copy link
Author

Lmh-java commented Jan 9, 2025

hasTimeOfFirstEvent

@kcpeppe I just found out another issue with this implementation when I am testing the code. Sorry for didn't find this earlier.

The method mentioned here diarizer() is intended to get a proper diarizer, but at this moment, the diarizer has not started parsing the log file, which means the hasTimeOfFirstEvent() will always be false. The value of timeOfFirstEvent can only be set after the execution of diarize(String line) function. So is this might be too early to discover this problem?

@dsgrieve
Copy link
Member

dsgrieve commented Jan 9, 2025

IMO, I am down to throw a unified exception and provide different exception message. But this exception might be named as "ServiceConfigurationError". Semantically, this new exception is not related to configuration of the service, but rather a special property of the datasource.

Good idea. Something like an InvalidDataSourceException would be appropriate.

@dsgrieve
Copy link
Member

dsgrieve commented Jan 9, 2025

Having given this some more thought....

There are two cases here:

  1. a valid GC log with no time stamps
  2. a file that just isn't a GC log

In either case, the PreUnifiedDiarizer will be selected (see com.microsoft.gctoolkit.io.GCLogFile#discoverFormat).

If it's a valid GC log with no timestamps, then timeOfFirstEvent should default to epoch and I think (I'm not certain) the place to handle that is in com.microsoft.gctoolkit.jvm.Diary#getTimeOfFirstEvent. As noted in the bug description, com.microsoft.gctoolkit.aggregator.Aggregation#estimatedStartTime may need to account for this.

If there are no events, then the only event the Aggregator should see is the termination event and everything should exit gracefully (once the NPE is resolved).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NullPointerException when processing file without GC Events
3 participants