Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent persistent storage directory collisions for similiar repository URLs #72

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 23 additions & 1 deletion src/ComposerFileStorage.php
Original file line number Diff line number Diff line change
Expand Up @@ -53,8 +53,30 @@ public static function create(string $url, Config $config): self
{
$basePath = implode(DIRECTORY_SEPARATOR, [
static::basePath($config),
preg_replace('/[^[:alnum:]\.]/', '-', $url),
static::escapeUrl($url),
]);
return new static($basePath);
}

/**
* Converts a repository URL to a unique directory name.
*
* @param string $url
* A repository URL.
*
* @return string
* A version of the URL suitable to use as the name of the persistent
* storage directory.
*/
public static function escapeUrl(string $url): string
{
$escapedUrl = preg_replace('/[^[:alnum:]\.]/', '-', $url);
// Append a partial hash of the unescaped URL, to prevent URLs like
// `https://www.site.coop.info/packages` from colliding with
// `https://www.site.coop/info/packages`, whilst keeping the directory
// names easily distinguishable.
$hashedUrl = hash('sha256', $url);

return $escapedUrl . '-' . substr($hashedUrl, 0, 8);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the purpose of only using the first 8 chars? Presumably we are using sha256 because we want a long hash that is very unlikely to conflict? doesn't using only part of it defeat that purpose. Why use 1 of the hash_algos() that just produces a shorter string. I don't think we worried about attacks where people try take advantage of the possible matches, correect?

Copy link
Collaborator Author

@phenaproxima phenaproxima Jan 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My feeling is that we want just enough of the hash to virtually guarantee uniqueness, but to keep the final generated directory name understandable by humans. Maybe crc32b would suffice? (I'd certainly prefer to remove the string manipulation if possible.)

I'll wait for @davidstrauss to make the final call here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like I'll regret suggesting we can truncate the SHA-256 output substantially. It's the kind of decision that I'd want to do deliberately if we do it at all, and I'm not certain how to compute the bytes of collision prevention we need here. On the other hand, I know it's safe to use the mangled name + SHA2, as the SHA2 alone would be defensibly unique.

If we're worried about an unwieldy file name, there are other serializations of SHA-256 output that have greater density than hex. So, maybe get the hash in binary and base64_encode() it? The output should be about 34-40 characters.

Finally, I'd consider just rawurlencode() and no hash at all. That escaping is RFC-compliant to the point where you shouldn't see the implementation's output drift. Percent encoding isn't particularly obnoxious to use for files/shell purposes, as the percent character isn't as special in that context as other escaping/quoting options.

You're welcome to run forward with any of these options (full hex SHA256, base64 SHA256, or robust escaping) without waiting on further input from me. :-)

}
}
21 changes: 19 additions & 2 deletions tests/ComposerFileStorageTest.php
Original file line number Diff line number Diff line change
Expand Up @@ -50,23 +50,40 @@ public function testBasePath(): void
$this->assertSame($expectedPath, ComposerFileStorage::basePath($config));
}

/**
* @covers ::escapeUrl
*/
public function testEscapeUrl(): void
{
// Ensure that two very similar URLs are converted into unique, but
// readable, directory names.
$url1 = ComposerFileStorage::escapeUrl('https://site.coop/info/packages');
$url2 = ComposerFileStorage::escapeUrl('https://site.coop.info/packages');

$this->assertNotSame($url1, $url2);
$this->assertMatchesRegularExpression('/^https---site\.coop-info-packages-[a-z0-9]{8}$/', $url1);
$this->assertMatchesRegularExpression('/^https---site\.coop\.info-packages-[a-z0-9]{8}$/', $url2);
}

/**
* @covers ::__construct
* @covers ::create
*
* @depends testBasePath
* @depends testEscapeUrl
*/
public function testCreate(): void
{
$url = 'https://example.net/packages';
$config = new Config();

$basePath = implode(DIRECTORY_SEPARATOR, [
ComposerFileStorage::basePath($config),
'https---example.net-packages',
ComposerFileStorage::escapeUrl($url),
]);
$this->assertDirectoryDoesNotExist($basePath);

ComposerFileStorage::create('https://example.net/packages', $config);
ComposerFileStorage::create($url, $config);
$this->assertDirectoryExists($basePath);
}
}