Make http response example more robust #460
Open
wingman-jr-addon wants to merge 10 commits into mdn:main from wingman-jr-addon:make-http-response-example-more-robust
Commits
38289f2 Making http-response more robust to demonstrate real-world usage (jessetrana)
da9856e Small typos (jessetrana)
334cb10 Ensure TextDecoder flushes at the end (jessetrana)
8f5f700 Travis is complaining about the escaping of quote being useless (jessetrana)
dea141b Removing Example/example.com (jessetrana)
74be63d Removing unhelpful error handling (jessetrana)
5b58552 HTTP response code check and fixing mutatedStr regression (jessetrana)
6e374f5 Trying to clean up charset detection code (jessetrana)
d518c15 Streaming replacement improvements and a bit of cleanup (jessetrana)
6a5b6da Adding Rob W's example as a link in the README (jessetrana)
@@ -1,22 +1,118 @@
 function listener(details) {
+  // If the HTTP response code is not OK, just let it flow through normally.
+  if (details.statusCode < 200 || 300 <= details.statusCode) {
+    console.log('HTTP Status Code was '+details.statusCode+' not 2XX for '+details.url+', skipping filtering.');
+    return;
+  }
+
+  // The received data is a stream of bytes. In order to do text-based
+  // modifications, it is necessary to decode the bytes into a string
+  // using the proper character encoding, do any modifications, then
+  // encode back into a stream of bytes.
+  //
+  // In order to use the correct decoding, one needs to detect the charset.
+  // Please note that there are many complex rules to detect the charset,
+  // and no approach with scanning only the response headers will be
+  // fully accurate. The simplified approach here is to find the
+  // Content-Type and extract the charset if found.
+
+  let {responseHeaders} = details;
+
+  // Find the last Content-Type header.
+  let contentTypeHeader = responseHeaders
+    .slice().reverse()
+    .find(h => h.name.toLowerCase() == "content-type");
+
+  // If the Content-Type header is not set, the browser is going to do content-sniffing,
+  // and we should also return to avoid trouble (e.g. breaking downloads, PDFs, videos, ...).
+  if (contentTypeHeader === undefined) {
+    console.log('Content-Type header not found for '+details.url+', skipping filtering');
+    return;
+  }
+
+  // If it is not a supported content type, we will return rather than guess.
+  let baseType;
+  let contentType = contentTypeHeader.value.trim();
+  if (contentType.startsWith('text/html')) {
+    baseType = 'text/html';
+  } else if (contentType.startsWith('application/xhtml+xml')) {
+    baseType = 'application/xhtml+xml';
+  } else {
+    console.log('Content type '+contentType+' not supported for '+details.url+', skipping filtering.');
+    return;
+  }
+
+  // Set up TextDecoder
+  console.log('Initial checks passed, beginning charset detection for '+details.url);
+  let charset = detectCharset(contentType) || 'utf-8';
+  let decoder = new TextDecoder(charset);
+  console.log('The detected charset was '+charset+' for '+details.url);
+
+  // While TextDecoder supports most charset encodings, TextEncoder does NOT support
+  // anything other than 'utf-8', so it is necessary to change the Content-Type on the
+  // header to UTF-8. If modifying this block of code, ensure that the tests at
+  // https://www.w3.org/2006/11/mwbp-tests/index.xhtml
+  // pass - the current implementation only fails on #9, but this detection ensures
+  // tests #3, 4, 5, and 8 pass.
+  let encoder = new TextEncoder();
+  contentTypeHeader.value = baseType+';charset=utf-8';
+
+  // Now the actual filtering can begin!
   let filter = browser.webRequest.filterResponseData(details.requestId);
-  let decoder = new TextDecoder("utf-8");
-  let encoder = new TextEncoder();
+  let unprocessedStr = '';
+  let searchString = 'Test';
+  let leaveUnprocessedLength = searchString.length - 1;
 
-  filter.ondata = event => {
-    let str = decoder.decode(event.data, {stream: true});
-    // Just change any instance of Example in the HTTP response
-    // to WebExtension Example.
-    str = str.replace(/Example/g, 'WebExtension Example');
-    filter.write(encoder.encode(str));
-    filter.disconnect();
+  filter.ondata = e => {
+    // Note that the event's data may break in the middle of an encoded
+    // character - the stream parameter is critical for success as this
+    // method gets called multiple times.
+    unprocessedStr += decoder.decode(e.data, {stream: true});
+    // Process the received data as far as possible.
+    // Note this replacement is rather naive, but it demonstrates the idea.
+    // If the search string was contained in the replacement string,
+    // for instance, the repeated replacement like this could be bad.
+    unprocessedStr = unprocessedStr.replace(/Test/g, 'WebExtension Check');
+    if (unprocessedStr.length > leaveUnprocessedLength) {
+      let processedStr = unprocessedStr.substr(0, leaveUnprocessedLength);
+      unprocessedStr = unprocessedStr.substr(leaveUnprocessedLength);
+      filter.write(encoder.encode(processedStr));
+    }
   }
 
-  return {};
+  filter.onstop = async _ => {
+    // Flush the decoding buffer
+    unprocessedStr += decoder.decode();
+    // Flush our replacement buffer
+    let processedStr = unprocessedStr.replace(/Test/g, 'WebExtension Check');
+    filter.write(encoder.encode(processedStr));
+    filter.close();
+  }
+
+  // Because the details' response headers have been mutated, return them.
+  return details;
 }
 
-browser.webRequest.onBeforeRequest.addListener(
+// This code tries to snag the last charset indicated
+// but is still not robust to poorly formed inputs.
+function detectCharset(contentType) {
+  let charsetMarker = "charset=";
+  let foundIndex = contentType.lastIndexOf(charsetMarker);
+  if (foundIndex == -1) {
+    return undefined;
+  }
+  let charsetMaybeQuoted = contentType.substr(foundIndex+charsetMarker.length).trim().toLowerCase();
+  let charset = charsetMaybeQuoted.replace(/"/g, '');
+  return charset;
+}
+
+// Set up the actual webRequest hook
+browser.webRequest.onHeadersReceived.addListener(
   listener,
-  {urls: ["https://example.com/*"], types: ["main_frame"]},
-  ["blocking"]
+  {
+    urls: ["https://www.w3.org/*"], // Include W3 for testing charset detection.
+    types: ["main_frame"]
+  },
+  ["blocking","responseHeaders"]
 );
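
The new ondata/onstop pair buffers text because a match such as 'Test' can be split across two data events, so each chunk cannot safely be replaced on its own. A minimal standalone sketch of that buffering idea (the chunk contents and variable names here are illustrative, not taken from the PR):

// Illustration only: a match that straddles a chunk boundary.
let chunks = ['<p>Te', 'st page</p>'];

// Naive per-chunk replacement never sees the full word 'Test'.
let naive = chunks.map(c => c.replace(/Test/g, 'WebExtension Check')).join('');
console.log(naive); // '<p>Test page</p>' - nothing was replaced

// Buffering: hold back the last (searchString.length - 1) characters so a
// straddling match is still seen when the next chunk arrives, then flush
// whatever remains at the end (the role filter.onstop plays above).
let searchString = 'Test';
let keepBack = searchString.length - 1;
let buffered = '';
let written = '';
for (let chunk of chunks) {                 // stands in for filter.ondata
  buffered += chunk;
  buffered = buffered.replace(/Test/g, 'WebExtension Check');
  if (buffered.length > keepBack) {
    written += buffered.substr(0, buffered.length - keepBack);
    buffered = buffered.substr(buffered.length - keepBack);
  }
}
written += buffered.replace(/Test/g, 'WebExtension Check');   // stands in for filter.onstop
console.log(written); // '<p>WebExtension Check page</p>'

Note that the file shown here only covers the listener code; the example presumably still relies on its manifest granting the webRequest and webRequestBlocking permissions plus a host permission for the filtered URLs, so moving the match pattern from example.com to www.w3.org would also mean updating that host permission.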
The charset detection logic as implemented above is incorrect (e.g. when multiple charset parameters are set, the last one should be used; when multiple MIME types are set, the previously set charset should be forgotten). Let's keep the example simple, however, and make it more obvious that the implementation is incomplete. Here is an example of a more complete implementation that also covers some edge cases: Rob--W/open-in-browser@a6b926e
As you can see, it's quite involved - do not put the code in the PR. You can link to it if you wish though.
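
To make the incompleteness concrete, here are a few hypothetical Content-Type values (not taken from the PR or from the linked commit) passed through detectCharset exactly as written in the diff above:

// Hypothetical inputs, for illustration only.
detectCharset('text/html; charset=ISO-8859-1; charset=UTF-8');
// -> 'utf-8': the last charset= happens to win, matching the rule described above.

detectCharset('text/html; charset="Shift_JIS"');
// -> 'shift_jis': surrounding quotes are stripped as intended.

detectCharset('text/html; charset=ISO-8859-1, text/html');
// -> 'iso-8859-1, text/html': everything after the last charset= is returned, so a
//    second media type without its own charset neither resets the result nor parses
//    cleanly, and new TextDecoder(...) would reject this label.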
There's wisdom here; thanks for the link.
I've been thinking about your comment about an obviously incomplete implementation. While the idea of a complete solution here is appealing, the simplified solution will likely cover 99% of the cases, and the comment will make the careful reader aware of the problem. Lack of awareness of this problem - albeit obvious in retrospect - is why I want to contribute back a more robust example in the first place; a user reported a bug to me in a review of my addon. I'd like others to avoid the same fate, as their users may not be as considerate as mine was.
Given both the complexity of determining the charset as well as the need for it to be accurate in order to filter text properly, do you think the browser-predicted charset would ever be a candidate to expose on the details object? It just seems like right now it would be incredibly difficult to create a truly correct implementation - do you know of any extant addons that are actually correct in this regard?
What a good idea. I have thought of it before and posted a feature request at https://bugzilla.mozilla.org/show_bug.cgi?id=1425479
Unfortunately the feature request was denied. If you post a new bug that shows use cases and why alternatives aren't a good fit, the feature request can be re-evaluated.
The Open in Browser add-on that I linked before was authored by me, and modelled after Firefox's implementation (which includes non-standard behavior).