-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue #359: fix NPE when parsing gc log file without any events #409
base: main
Are you sure you want to change the base?
Conversation
@microsoft-github-policy-service agree |
I've taken a look at the problem and I think it can be discovered earlier.
I would consider adding a new filter, something like this. That way the problem can be discovered as soon as the data source is being instantiated. I think an IllegalStateException is the appropriate exception to throw. Likely even better than the current ServiceConfigurationError. I guess that all depends on if any service providers were found instead of just a bad datasource that none of the service providers could deal with.
|
Sounds good. In this way, we can avoid modifying the functions in the parser :). |
The logic makes sense, but I found a new interesting challenge while implementing this. I am trying to seperate the exception caused by (a) none service providers were found (b) there are service providers, but none of them found the first time, so that we can throw IllegalStateException and ServiceConfigurationError separately. (I felt like this might be clearer for user to debug, rather than mix the exception message all together?) However, to do that, I might need to do the exact filtering twice (as the code shown below). private Diarizer diarizer() {
ServiceLoader<Diarizer> serviceLoader = load(Diarizer.class);
if (serviceLoader.findFirst().isPresent()) {
serviceLoader
.stream()
.map(Provider::get)
.filter(p -> p.isUnified() == isUnified())
.findFirst()
.orElseThrow(() -> new ServiceConfigurationError("Unable to find a suitable provider to create a diary"));
// cannot reuse the stream, because terminal operations will consume the stream.
return serviceLoader
.stream()
.map(Provider::get)
.filter(p -> p.isUnified() == isUnified())
.filter(p -> p.getDiary().hasTimeOfFirstEvent())
.findFirst()
.orElseThrow(() -> new IllegalStateException("No event is found in the datasource"));
} else {
try {
String clazzName = (isUnified()) ? "com.microsoft.gctoolkit.parser.jvm.UnifiedDiarizer" : "com.microsoft.gctoolkit.parser.jvm.PreUnifiedDiarizer";
Class clazz = Class.forName(clazzName, true, Thread.currentThread().getContextClassLoader());
return (Diarizer) clazz.getConstructors()[0].newInstance();
} catch (ClassNotFoundException | InstantiationException | IllegalAccessException | InvocationTargetException e) {
throw new ServiceConfigurationError("Unable to find a suitable provider to create a diary");
}
}
} In this case, this feels strange. Or maybe we can do the following: private Diarizer diarizer() {
ServiceLoader<Diarizer> serviceLoader = load(Diarizer.class);
if (serviceLoader.findFirst().isPresent()) {
Optional<Diarizer> diarizer = serviceLoader
.stream()
.map(Provider::get)
.filter(p -> p.isUnified() == isUnified())
.findFirst();
if (diarizer.isPresent()) {
if (diarizer.get().getDiary().hasTimeOfFirstEvent())
return diarizer.get();
else
throw new IllegalStateException("No event is found in the datasource");
} else {
throw new ServiceConfigurationError("Unable to find a suitable provider to create a diary");
}
} else {
try {
String clazzName = (isUnified()) ? "com.microsoft.gctoolkit.parser.jvm.UnifiedDiarizer" : "com.microsoft.gctoolkit.parser.jvm.PreUnifiedDiarizer";
Class clazz = Class.forName(clazzName, true, Thread.currentThread().getContextClassLoader());
return (Diarizer) clazz.getConstructors()[0].newInstance();
} catch (ClassNotFoundException | InstantiationException | IllegalAccessException | InvocationTargetException e) {
throw new ServiceConfigurationError("Unable to find a suitable provider to create a diary");
}
} In which, we find the first viable service provider, and only check if this service providers has events, under the assumption of if one diarizer finds no log, no other diarizer can find logs either. (I felt this assumption does not always hold). I appreciate your idea. |
I don't think your second block of code does the same thing. You want to find a provider that can read the log file's format (unified or preunified) AND is able to find a timestamp in the log. I suggest you do both of the filters on the first pass. If the stream fails to find a provider, then you might run through the providers again to figure out which exception to throw. Or gather up a list of providers that pass the isUnified() test then do the diarizer hasTimeOfFirstEvent() test in a separate loop. If the list of providers is empty, throw the ServiceConfigurationError. If the second loop fails to find a provider, throw the IllegalStateExeception. Personally, I'd rather see a ServiceConfigurationError for both with only the message changed. |
Thank you for your suggestions. No, these two blocks do not do the same thing. I will go with the second implementation, where we collect all providers that passed the isUnified() test then manipulate providers as a list. It should be okay that we convert the stream into a list, since there won't be a significantly large number of providers. IMO, I am down to throw a unified exception and provide different exception message. But this exception might be named as "ServiceConfigurationError". Semantically, this new exception is not related to configuration of the service, but rather a special property of the datasource. |
@kcpeppe I just found out another issue with this implementation when I am testing the code. Sorry for didn't find this earlier. The method mentioned here |
Good idea. Something like an InvalidDataSourceException would be appropriate. |
Having given this some more thought.... There are two cases here:
In either case, the PreUnifiedDiarizer will be selected (see com.microsoft.gctoolkit.io.GCLogFile#discoverFormat). If it's a valid GC log with no timestamps, then timeOfFirstEvent should default to epoch and I think (I'm not certain) the place to handle that is in com.microsoft.gctoolkit.jvm.Diary#getTimeOfFirstEvent. As noted in the bug description, com.microsoft.gctoolkit.aggregator.Aggregation#estimatedStartTime may need to account for this. If there are no events, then the only event the Aggregator should see is the termination event and everything should exit gracefully (once the NPE is resolved). |
Intended to fix #359.
About the fix:
getClock()
is initially null, it means there is no event in the file. Therefore, we throw an exception to terminate the remaining process.View Code
Discussion:
I am not very familar with the project. Any advice is welcome and will be highly valued.