Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write path normalization without array allocations #60812

Merged
merged 5 commits into from
Jan 9, 2025
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
110 changes: 93 additions & 17 deletions src/compiler/path.ts
Original file line number Diff line number Diff line change
Expand Up @@ -624,27 +624,103 @@ export function getNormalizedPathComponents(path: string, currentDirectory: stri
}

/** @internal */
export function getNormalizedAbsolutePath(fileName: string, currentDirectory: string | undefined): string {
return getPathFromPathComponents(getNormalizedPathComponents(fileName, currentDirectory));
export function getNormalizedAbsolutePath(path: string, currentDirectory: string | undefined): string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if currentDirectory is almost always normalized.

let rootLength = getRootLength(path);
if (rootLength === 0 && currentDirectory) {
path = combinePaths(currentDirectory, path);
rootLength = getRootLength(path);
}
const root = path.substring(0, rootLength);
const normalizedRoot = root && normalizeSlashes(root);
// `normalized` is only initialized once `path` is determined to be non-normalized
let normalized = normalizedRoot === root ? undefined : normalizedRoot;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should just use lastIndexOf to see if altDirectorySeparator occurs. That way you can avoid creating a substring for root at all.

let index = rootLength;
let segmentStart = index;
let normalizedUpTo = index;
let seenNonDotDotSegment = rootLength !== 0;
while (index < path.length) {
andrewbranch marked this conversation as resolved.
Show resolved Hide resolved
// At beginning of segment
segmentStart = index;
let ch = path.charCodeAt(index);
while (isAnyDirectorySeparator(ch) && index + 1 < path.length) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took me a bit to see why this works, but it's because index starts immediately after the root, or after the first separator following the prior segment. Figuring out if one of those is a backslash is handled below.

index++;
ch = path.charCodeAt(index);
}
if (index > segmentStart) {
if (normalized === undefined) {
// Seen superfluous separator
normalized = path.substring(0, segmentStart - 1);
}
andrewbranch marked this conversation as resolved.
Show resolved Hide resolved
segmentStart = index;
}
// Past any superfluous separators
const sepIndex = path.indexOf(directorySeparator, index + 1);
const altSepIndex = path.indexOf(altDirectorySeparator, index + 1);
Copy link
Member

@DanielRosenwasser DanielRosenwasser Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quadratic because you have to find at least one of the other kinds of slashes over and over. You should just tight-loop on !isAnyDirectorySeparator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or use findIndex(path, isAnyDirectorySeparator, index + 1)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern with findIndex is that it might not optimize well it passed in both arrays and strings. Maybe I'm superstitious.

let segmentEnd = sepIndex === -1 ? altSepIndex : altSepIndex === -1 ? sepIndex : Math.min(sepIndex, altSepIndex);
if (segmentEnd === -1) {
segmentEnd = path.length;
}
if (segmentEnd === altSepIndex && normalized === undefined) {
// Seen backslash
normalized = path.substring(0, segmentStart);
}
andrewbranch marked this conversation as resolved.
Show resolved Hide resolved
const segmentLength = segmentEnd - segmentStart;
if (segmentLength === 1 && path.charCodeAt(index) === CharacterCodes.dot) {
// "." segment (skip)
if (normalized === undefined) {
normalized = path.substring(0, normalizedUpTo);
}
andrewbranch marked this conversation as resolved.
Show resolved Hide resolved
}
else if (segmentLength === 2 && path.charCodeAt(index) === CharacterCodes.dot && path.charCodeAt(index + 1) === CharacterCodes.dot) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: If the previous segmentLength was 1, but the segment was not ., this performs an unnecessary recheck of segmentLength === 2.

// ".." segment
if (!seenNonDotDotSegment) {
if (normalized !== undefined) {
normalized += normalized.length === rootLength ? ".." : "/..";
}
else {
normalizedUpTo = index + 2;
}
}
else if (normalized === undefined) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that in this branch it is technically possible for you to avoid extra substrings by adjusting normalizedUpTo instead of initializing normalized.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that it matters since you'd still need to allocate a substring when returning.

if (normalizedUpTo - 2 >= 0) {
normalized = path.substring(0, Math.max(rootLength, path.lastIndexOf(directorySeparator, normalizedUpTo - 2)));
}
else {
normalized = path.substring(0, normalizedUpTo);
}
}
else {
const lastSlash = normalized.lastIndexOf(directorySeparator);
if (lastSlash !== -1) {
normalized = normalized.substring(0, Math.max(rootLength, lastSlash));
}
else {
normalized = normalizedRoot;
}
if (normalized.length === rootLength) {
seenNonDotDotSegment = rootLength !== 0;
}
}
}
else if (normalized !== undefined) {
if (normalized.length !== rootLength) {
normalized += directorySeparator;
}
seenNonDotDotSegment = true;
normalized += path.substring(segmentStart, segmentEnd);
}
else {
seenNonDotDotSegment = true;
normalizedUpTo = segmentEnd;
}
index = segmentEnd + 1;
}
return normalized ?? (path.length > rootLength ? removeTrailingDirectorySeparator(path) : path);
}

/** @internal */
export function normalizePath(path: string): string {
path = normalizeSlashes(path);
// Most paths don't require normalization
if (!relativePathSegmentRegExp.test(path)) {
return path;
}
// Some paths only require cleanup of `/./` or leading `./`
const simplified = path.replace(/\/\.\//g, "/").replace(/^\.\//, "");
if (simplified !== path) {
path = simplified;
if (!relativePathSegmentRegExp.test(path)) {
return path;
}
}
// Other paths require full normalization
const normalized = getPathFromPathComponents(reducePathComponents(getPathComponents(path)));
const normalized = getNormalizedAbsolutePath(path, "");
return normalized && hasTrailingDirectorySeparator(path) ? ensureTrailingDirectorySeparator(normalized) : normalized;
}

Expand Down
15 changes: 15 additions & 0 deletions src/testRunner/unittests/paths.ts
Original file line number Diff line number Diff line change
Expand Up @@ -317,9 +317,24 @@ describe("unittests:: core paths", () => {
assert.strictEqual(ts.getNormalizedAbsolutePath("", ""), "");
assert.strictEqual(ts.getNormalizedAbsolutePath(".", ""), "");
assert.strictEqual(ts.getNormalizedAbsolutePath("./", ""), "");
assert.strictEqual(ts.getNormalizedAbsolutePath("./a", ""), "a");
// Strangely, these do not normalize to the empty string.
assert.strictEqual(ts.getNormalizedAbsolutePath("..", ""), "..");
assert.strictEqual(ts.getNormalizedAbsolutePath("../", ""), "..");
assert.strictEqual(ts.getNormalizedAbsolutePath("../..", ""), "../..");
assert.strictEqual(ts.getNormalizedAbsolutePath("../../", ""), "../..");
assert.strictEqual(ts.getNormalizedAbsolutePath("./..", ""), "..");
assert.strictEqual(ts.getNormalizedAbsolutePath("../../a/..", ""), "../..");

// More .. segments
assert.strictEqual(ts.getNormalizedAbsolutePath("src/ts/foo/../../../bar/bar.ts", ""), "bar/bar.ts");
assert.strictEqual(ts.getNormalizedAbsolutePath("src/ts/foo/../../..", ""), "");
// not a real URL root!
assert.strictEqual(ts.getNormalizedAbsolutePath("file:/Users/matb/projects/san/../../../../../../typings/@epic/Core.d.ts", ""), "../typings/@epic/Core.d.ts");
// the root is `file://Users/`
assert.strictEqual(ts.getNormalizedAbsolutePath("file://Users/matb/projects/san/../../../../../../typings/@epic/Core.d.ts", ""), "file://Users/typings/@epic/Core.d.ts");
// this is real
assert.strictEqual(ts.getNormalizedAbsolutePath("file:///Users/matb/projects/san/../../../../../../typings/@epic/Core.d.ts", ""), "file:///typings/@epic/Core.d.ts");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idk if we want to preserve specific examples in these, but that might be fine.

Maybe we should have tests on untitled:?


// Interaction between relative paths and currentDirectory.
assert.strictEqual(ts.getNormalizedAbsolutePath("", "/home"), "/home");
Expand Down
Loading