Restrict Angular SSR to paths in the sitemap #3682

Merged: 1 commit, merged on Jan 21, 2025

@alanorth (Contributor) commented Nov 22, 2024

References

Description

Only enable Angular SSR for paths in the DSpace sitemap and the home page. This is a compromise reached after analyzing high CPU usage in DSpace 7+ and discussing it with the Google Scholar team. We should not waste CPU and memory generating and caching SSR pages for request paths that are not "primary" DSpace objects, such as search and browse. Those pages contain data derived from the primary objects themselves, and bots can spend endless time crawling them.

This solution was originally proposed by @vitorsilverio in #3110 (comment).

Some notes:

  • This will require manual porting to DSpace 7
  • We should keep an eye on upstream work related to inlineCriticalCss because it improves the user experience. We disabled it in DSpace 7.6.2 and 8.1 because it made SSR perform even more poorly.

Instructions for Reviewers

Please add a more detailed description of the changes made by your PR. At a minimum, providing a bulleted list of changes in your PR is helpful to reviewers.

List of changes in this PR:

  • Restrict SSR to request paths for primary DSpace objects like bitstreams, items, entities, communities, and collections, as well as the home page

Include guidance for how to test or review your PR.
Try browsing the repository to see if all pages work as expected.

Checklist

This checklist provides a reminder of what we are going to look for when reviewing your PR. You do not need to complete this checklist prior to creating your PR (draft PRs are always welcome).
However, reviewers may request that you complete any actions in this list if you have not done so. If you are unsure about an item in the checklist, don't hesitate to ask. We're here to help!

  • My PR is created against the main branch of code (unless it is a backport or is fixing an issue specific to an older branch).
  • My PR is small in size (e.g. less than 1,000 lines of code, not including comments & specs/tests), or I have provided reasons as to why that's not possible.
  • My PR passes ESLint validation using npm run lint
  • My PR doesn't introduce circular dependencies (verified via npm run check-circ-deps)
  • My PR includes TypeDoc comments for all new (or modified) public methods and classes. It also includes TypeDoc for large or complex private methods.
  • My PR passes all specs/tests and includes new/updated specs or tests based on the Code Testing Guide.
  • My PR aligns with Accessibility guidelines if it makes changes to the user interface.
  • My PR uses i18n (internationalization) keys instead of hardcoded English text, to allow for translations.
  • My PR includes details on how to test it. I've provided clear instructions to reviewers on how to successfully test this fix or feature.
  • If my PR includes new libraries/dependencies (in package.json), I've made sure their licenses align with the DSpace BSD License based on the Licensing of Contributions documentation.
  • If my PR includes new features or configurations, I've provided basic technical documentation in the PR itself.
  • If my PR fixes an issue ticket, I've linked them together.

@alanorth added the labels bug, high priority, performance / caching, port to dspace-7_x, and port to dspace-8_x on Nov 22, 2024
@alanorth (Contributor Author) commented

Tests are failing because CI is checking for SSR on /home. We can fix this by:

  1. Adding /home to the SSR paths, or
  2. Using another path

The first option is probably the best because /home is one of the only paths that is guaranteed to work by default in DSpace. On the other hand, I just realized our list of SSR-enabled paths will include such endless tarpits like:

https://demo.dspace.org/entities/person/3b087e38-cd6b-4d85-9409-99a9f6f03425?spc.page=1&query=search

With entity search pages we have many combinations of pages depending on filters and number of items similar to /search. Bots will crawl those and get SSR pages, which is a massive waste of CPU and memory.

Perhaps this requires a re-think. What about inverting the logic and enabling SSR for everything, but disabling it on certain paths?
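To make the inverted idea concrete, here is a minimal TypeScript sketch of an exclude list. The function name useSsrForPath and the prefixes /search and /browse are hypothetical illustrations, not DSpace configuration and not the approach that was eventually merged.

```typescript
// Hypothetical exclude list: SSR everywhere except under these prefixes.
// These are illustrative values, not DSpace defaults.
const excludedSsrPrefixes: readonly string[] = ['/search', '/browse'];

function useSsrForPath(path: string, excluded: readonly string[] = excludedSsrPrefixes): boolean {
  // Render on the server unless the path begins with an excluded prefix.
  return !excluded.some((prefix) => path.startsWith(prefix));
}

console.log(useSsrForPath('/items/1234'));               // true
console.log(useSsrForPath('/search'));                   // false
console.log(useSsrForPath('/browse/title'));             // false
console.log(useSsrForPath('/entities/person/3b087e38')); // true
```

One caveat with matching on the path alone: the query-string variants of entity pages mentioned above (for example /entities/person/<uuid>?spc.page=1&query=search) all share the path /entities/person/<uuid>, so an exclude list like this would not stop them from being server-side rendered.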

@ybnd (Member) commented Nov 22, 2024

> On the other hand, I just realized our list of SSR-enabled paths will include such endless tarpits like:
>
> https://demo.dspace.org/entities/person/3b087e38-cd6b-4d85-9409-99a9f6f03425?spc.page=1&query=search
>
> With entity search pages we have many combinations of pages depending on filters and number of items similar to /search. Bots will crawl those and get SSR pages, which is a massive waste of CPU and memory.

@alanorth #3231 should cover that

@tdonohue (Member) commented Nov 22, 2024

@alanorth : Thank you so much for getting this PR created! I was just asking someone to do this in yesterday's Developers Meeting.

Regarding the failing tests, I'd recommend adding /home to the list of SSR paths, because many bots/harvesters will start at your homepage (especially if they don't use sitemaps). So, I think that the homepage should always provide SSR.

One other suggestion. I think it'd be better to make these paths configurable instead of hardcoding them in server.ts. It could look something like this:

ssr:
    paths: [ '/items/', '/entities/', '/collections/', '/communities/', '/bitstream/', '/bitstreams/' ]

(You'd have to update the existing ssr-config.interface.ts to support this new option.)

Then in the code use environment.ssr.paths.

I'd argue that there also should be a way to enable SSR for everything (to retain current behavior). Perhaps that's the default behavior if this environment.ssr.paths configuration is unspecified or empty.

Overall, I do like this PR & support adding it quickly. I just want to add more flexibility to the configuration, as there's a chance that different sites will want to add additional paths (or keep the default behavior of SSR enabled for every path).
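A minimal sketch of what the suggested option could look like, assuming a simplified config shape. SsrConfigSketch and isSsrPath are hypothetical names; the real ssr-config.interface.ts has other fields that are left out here. An undefined or empty paths list keeps the current behavior of rendering every path on the server.

```typescript
// Hypothetical, simplified shape for the proposed option; the real SSR config
// interface (ssr-config.interface.ts) has other fields omitted here.
interface SsrConfigSketch {
  enabled: boolean;
  // Path prefixes to server-side render. Undefined or empty means
  // "render every path on the server" (the pre-existing behavior).
  paths?: readonly string[];
}

function isSsrPath(path: string, ssr: SsrConfigSketch): boolean {
  if (!ssr.enabled) {
    return false;
  }
  // Keep the old behavior when no paths are configured.
  if (!ssr.paths || ssr.paths.length === 0) {
    return true;
  }
  return ssr.paths.some((prefix) => path.startsWith(prefix));
}

const exampleSsrConfig: SsrConfigSketch = {
  enabled: true,
  paths: ['/home', '/items/', '/entities/', '/collections/', '/communities/', '/bitstream/', '/bitstreams/'],
};

console.log(isSsrPath('/items/abc', exampleSsrConfig));  // true
console.log(isSsrPath('/search', exampleSsrConfig));     // false
console.log(isSsrPath('/search', { enabled: true }));    // true (no paths configured)
```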

@alanorth (Contributor Author) commented

> @alanorth : Thank you so much for getting this PR created! I was just asking someone to do this in yesterday's Developers Meeting.

You're welcome. I saw the meeting notes and was surprised that there wasn't already a PR, since I've been using versions of this patch for a few months already.

> Regarding the failing tests, I'd recommend adding /home to the list of SSR paths...

Yes, agreed.

> One other suggestion. I think it'd be better to make these paths configurable

Oh good idea, I didn't know about ssr-config.interface.ts. I will be offline for a few days but can work on this soon.

@alanorth force-pushed the angular-ssr-sitemap-3110 branch from 3d544f6 to ccd0449 on December 8, 2024 17:57
@alanorth (Contributor Author) commented Dec 8, 2024

I've updated this to use a configurable array of paths, including /home. I think I've done it correctly (my testing appears to show it works).

Duplicating the configuration of the ssr.paths array in each of the environment configurations feels strange to me. I don't know how we decide which default configurations get to go into src/config/default-app-config.ts or if there is a better way.

github-actions bot commented Dec 8, 2024

Hi @alanorth,
Conflicts have been detected against the base branch.
Please resolve these conflicts as soon as you can. Thanks!

@alanorth force-pushed the angular-ssr-sitemap-3110 branch from 34c38b6 to eeccff2 on December 8, 2024 18:22
@ybnd self-requested a review on December 10, 2024 08:52
@nwoodward (Contributor) commented

@alanorth This PR looks good. To make the paths list more easily configurable, would it make more sense to add them to the ssr section of config/config.example.yml? I'm afraid they will be harder to configure in the src/environments/*.ts files.

# Angular Server Side Rendering (SSR) settings
ssr:
  # Whether to tell Angular to inline "critical" styles into the server-side rendered HTML.
  # Determining which styles are critical is a relatively expensive operation; this option is
  # disabled (false) by default to boost server performance at the expense of loading smoothness.
  inlineCriticalCss: false

@alanorth (Contributor Author) commented Jan 8, 2025

> To make the paths list more easily configurable, would it make more sense to add them to the ssr section of config/config.example.yml

@nwoodward I wasn't sure about the interaction between these defaults. I think the ones in src/environments/*.ts are the defaults, and we can put them in the example config YAML files as well. I see others like inlineCriticalCss defined in both, so I assume the YAML values inherit from, or override, the defaults initialized in src/environments/*.ts? I will try to test this week.

@tdonohue (Member) commented Jan 8, 2025

@alanorth : To answer your question, the config.example.yml is simply for documentation purposes. It provides examples & comments of how to configure available settings. It is not used anywhere though. But in our Installation docs we recommend you create a config.prod.yml based on the existing config.example.yml.

Any settings you set in your config.*.yml will override any default values set in src/environments/*.ts or in src/config/default-app-config.ts. So, with this PR, it should already be possible to configure this setting in your config.*.yml to override the defaults.

That said, I would also recommend we add this setting to the config.example.yml with a brief explanation (in comments) along with the default value. This just makes the configuration more visible to installers...without them having to search the documentation.
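As a rough illustration of the override behavior described above, here is a hypothetical TypeScript sketch of layering a parsed config.*.yml object over built-in defaults. overrideConfig is not DSpace's actual merge utility, and the assumption that arrays such as ssr.paths replace the default list wholesale (rather than being concatenated) should be checked against the real implementation.

```typescript
// Hypothetical sketch: values parsed from a config.*.yml file override defaults.
// Assumption: arrays (e.g. ssr.paths) replace the default list wholesale.
type ConfigValue = string | number | boolean | ConfigValue[] | { [key: string]: ConfigValue };
type ConfigObject = { [key: string]: ConfigValue };

function isPlainObject(value: unknown): value is ConfigObject {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

function overrideConfig(defaults: ConfigObject, overrides: ConfigObject): ConfigObject {
  // Shallow copy so the defaults object itself is never mutated.
  const result: ConfigObject = { ...defaults };
  for (const [key, value] of Object.entries(overrides)) {
    const current = result[key];
    result[key] = isPlainObject(current) && isPlainObject(value)
      ? overrideConfig(current, value) // recurse into nested sections such as "ssr"
      : value;                          // scalars and arrays replace the default
  }
  return result;
}

const defaults: ConfigObject = { ssr: { inlineCriticalCss: false, paths: ['/home', '/items/'] } };
const fromYaml: ConfigObject = { ssr: { paths: ['/home', '/items/', '/handle/'] } };

console.log(JSON.stringify(overrideConfig(defaults, fromYaml)));
// {"ssr":{"inlineCriticalCss":false,"paths":["/home","/items/","/handle/"]}}
```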

@alanorth (Contributor Author) commented Jan 8, 2025

Great, thanks @tdonohue. I will add the SSR paths to config.example.yml too so people can easily customize for their environment. I forgot that the dev and prod YAML files are not in git.

Review comment on server.ts (outdated, resolved)
@alanorth force-pushed the angular-ssr-sitemap-3110 branch from eeccff2 to 1174068 on January 10, 2025 06:54
@alanorth (Contributor Author) commented

Thanks for the feedback 🙇. I've updated the patch to use the startsWith() method instead of includes() and added the paths to config.example.yml to help users see how to override them. I also did basic tests to make sure customizing the paths works for serving CSR on certain paths.
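To show why startsWith() is stricter than includes(), here is a small TypeScript sketch. The two-entry prefix list and the request path /statistics/items/123 are hypothetical examples, chosen only to show a path that contains a configured prefix without starting with it.

```typescript
// Hypothetical prefix list and request path, for illustration only.
const ssrPaths: readonly string[] = ['/items/', '/handle/'];

// Substring match: true if the prefix appears anywhere in the path.
const matchesWithIncludes = (path: string): boolean =>
  ssrPaths.some((prefix) => path.includes(prefix));

// Prefix match: true only if the path begins with the configured prefix.
const matchesWithStartsWith = (path: string): boolean =>
  ssrPaths.some((prefix) => path.startsWith(prefix));

// A path that contains "/items/" but does not start with it.
const nested = '/statistics/items/123';

console.log(matchesWithIncludes(nested));          // true
console.log(matchesWithStartsWith(nested));        // false
console.log(matchesWithStartsWith('/items/123'));  // true
```

With includes(), any page that happens to embed a configured segment deeper in its path would also be server-side rendered; startsWith() limits SSR to the subtrees actually listed.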

@nwoodward (Contributor) commented

Thanks @alanorth! Everything looks good. I'm running into an issue testing it that may be due to my ignorance about how SSR and CSR work. I'm trying to test it locally on http://localhost:4000 with the backend on http://localhost:8080/server, and I have config.dev.yml and config.prod.yml files in /config, both of which have the paths list copied over from config.example.yml. I've tested with npm run start:dev and npm start to try both YAML config files.

I tested all the paths on the list, and they were all rendered by SSR. Then I removed /items from the list and rebuilt Angular, but it's still rendered by SSR. I did the same test removing /communities and got the same result. However, other paths that aren't in the list, such as /search, are not rendered by SSR. So these changes appear to be working, but for some reason I can't remove a path from the list. I wonder whether it's a caching problem, even though I'm doing a hard refresh on every page. I'll look into it.

@alanorth (Contributor Author) commented Jan 10, 2025

@nwoodward I only tested in production mode with npm run start. Looking at the scripts in package.json now I think that dev mode doesn't use SSR so that might be what you are seeing.

Also, it helps to enable cache.serverSide.debug in the config so you get the log of hits and misses in the console. Try with a browser, then with curl for example.

@tdonohue (Member) commented

@nwoodward : @alanorth is correct. SSR only works if you are running in Production Mode. (See how to do that in that README link. To run Production mode on 8.x/7.x you need to use yarn instead of npm, obviously.)

The best way to test SSR is to start the UI in production mode on localhost:4000, access it, and then turn off Javascript in your browser. For example: https://developer.chrome.com/docs/devtools/javascript/disable

With Javascript disabled, you can ONLY see parts of the User Interface that have gone through SSR. Essentially, you are browsing the site like a crawler would. All links/buttons should work, and SSR pages should load properly. However, anything that requires Javascript (e.g. some animations or dropdowns) or CSR will not work.

You could test this PR by comparing it to the https://sandbox.dspace.org. The Sandbox will use SSR on every page, while this PR should not (so some pages should not load with Javascript disabled)

@nwoodward (Contributor) commented

@tdonohue @alanorth OK, thanks for the additional information. As I mentioned, I believe I was testing this in production mode with npm start, though it was with the frontend and backend running locally. I'll take another look.

@tdonohue (Member) left a review comment

@alanorth : I gave this a test today. Mostly it's working great. I can verify that paths not listed in the new ssr.paths configuration do not undergo SSR (the pages just appear blank when Javascript is disabled). Conversely, the paths listed in that configuration all undergo SSR, so those pages still load their main content with Javascript disabled.

However, I've found a small bug in the logic for the homepage when Javascript is disabled. If you access the homepage via http://localhost:4000/home, then it loads via SSR. However, if you access it via http://localhost:4000/ then it will not load (because the root path doesn't undergo SSR).

I think we may want to see if there's a way to simply hardcode that the root path (/) always undergoes SSR. I initially tried adding '/' to the list of ssr.paths, but that causes all paths to use SSR, because startsWith('/') will always pass for every path.

We may need to add an "OR" clause next to the startsWith logic in server.ts which checks if it's the root path (/), and if so, executes SSR.
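A minimal sketch of that OR clause, with the root path checked explicitly before the prefix list. shouldSsr is a hypothetical helper name and the prefix list is abbreviated; the last line shows why '/' cannot simply be added as a prefix.

```typescript
// Hypothetical helper: the root path always gets SSR; any other path gets SSR
// only if it starts with one of the configured prefixes.
function shouldSsr(path: string, prefixes: readonly string[]): boolean {
  return path === '/' || prefixes.some((prefix) => path.startsWith(prefix));
}

// Abbreviated, illustrative prefix list.
const prefixes: readonly string[] = ['/home', '/items/', '/collections/'];

console.log(shouldSsr('/', prefixes));          // true (explicit root check)
console.log(shouldSsr('/items/abc', prefixes)); // true (prefix match)
console.log(shouldSsr('/search', prefixes));    // false

// Why '/' cannot be a prefix: every request path starts with it.
console.log(['/search', '/browse/title'].every((p) => p.startsWith('/'))); // true
```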

@alanorth force-pushed the angular-ssr-sitemap-3110 branch from 1174068 to 451b262 on January 14, 2025 06:55
@alanorth (Contributor Author) commented Jan 14, 2025

Thanks @tdonohue! Good catch. I added an explicit check for requests to the root path:

  if (environment.ssr.enabled && req.method === 'GET' && (req.path === '/' || environment.ssr.paths.some(pathPrefix => req.path.startsWith(pathPrefix)))) {
...

I tested and it's working with curl and with Javascript disabled in the browser for requests to the root.

@alanorth requested a review from tdonohue on January 14, 2025 12:26
@alanorth dismissed tdonohue's stale review on January 14, 2025 17:49

Added exception for root path.

@tdonohue (Member) left a review comment

👍 Thanks @alanorth ! I've retested this today, and this is now working fully. So, the small change to add SSR on / worked. I also think it's appropriate to hardcode the / path to use SSR, as that path will often be the one used by crawlers to locate the robots.txt or Sitemap.

So, I feel this is ready to be merged & ported to both 8.x and 7.x. However, I'm going to wait until after tomorrow's Developers Meeting to merge this, just in case anyone else has final feedback to express.

@tdonohue added this to the 9.0 milestone on Jan 15, 2025
@MMilosz commented Jan 16, 2025

Sorry if it's too basic a question; I'm still getting familiar with the frontend side of things! But I don't see anyone here mentioning URL parameters in SSR, and I'm curious about what happens if I visit the same URL with different parameters, or with parameters in a different order.

To check that, I tested this on sandbox.dspace.org using the /home page in DevTools:

[Screenshot: DevTools view of /home requests on sandbox.dspace.org]

Why did /home?test=3 load twice? I mean twice for >1 second and then twice for ~200 ms. Did the SSR not finish the first time, causing a second render? I'm concerned that with multiple requests this might potentially RIP the frontend.

Oh, and also – should we include a /500 page as well?

@autavares-dev (Contributor) commented

@MMilosz, as you suggested in today's meeting, I tested the /handle/ URLs.

They do need to be included in the paths configuration to render correctly with SSR. As it stands now, the redirect to the respective DSpace object page does not occur (tested with JS disabled in the browser and with curl).

@tdonohue (Member) commented

@alanorth : Per the feedback above, could you update this to include /handle paths? It appears those need to be added here as well.

@tdonohue (Member) commented Jan 16, 2025

@MMilosz : Per your comment, this PR already will work regardless of params on the URL path. It's matching URL paths using "startsWith()", which means that /home will match with /home?test=1 as well. So, we don't need to worry about params.
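A small sketch of why query parameters do not affect the match, assuming the check runs on the path component only. It uses the standard WHATWG URL class to split off the query string; Express's req.path, which the server.ts check uses, likewise excludes it. ssrForUrl and the prefix list are hypothetical.

```typescript
// Hypothetical prefix list for illustration.
const ssrPrefixes: readonly string[] = ['/home', '/items/'];

// Decide on SSR using only the path; the query string is ignored.
function ssrForUrl(url: string): boolean {
  const { pathname } = new URL(url);
  return pathname === '/' || ssrPrefixes.some((prefix) => pathname.startsWith(prefix));
}

console.log(new URL('http://localhost:4000/home?test=3').pathname); // /home
console.log(ssrForUrl('http://localhost:4000/home?test=3'));         // true
console.log(ssrForUrl('http://localhost:4000/home?b=2&a=1'));        // true
console.log(ssrForUrl('http://localhost:4000/search?query=home'));   // false
```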

As for the /500 page. I've tested, and this does not need to be added to the list. When using this PR, I turned off Javascript, and then shutdown the backend. I verified that the "500 Service Unavailable" message will still display if you access any of the other paths (e.g. /home, or any Community, Collection or Item).

I also did verify that @autavares-dev is correct. The /handle/ path must be added to this PR. Without it, Handle redirects do not work via SSR. However, if I add /handle/ to the list of ssr.paths, then Handle redirects work properly.

Commit message: Because Angular SSR is not very efficient, after discussion with the Google Scholar team we realized a compromise would be to only use SSR for pages in the DSpace sitemap (and the home page).
@alanorth force-pushed the angular-ssr-sitemap-3110 branch from 451b262 to 5b3b3bf on January 17, 2025 12:30
@alanorth (Contributor Author) commented

Well spotted, @MMilosz! Thanks for the feedback. I've added /handle/ to the paths for SSR.

@tdonohue (Member) left a review comment

👍 Thanks @alanorth and everyone who has given advice / testing to this. As this is at +1 and has had several other testers, I'm going to merge this "as-is". That way we can also test this on sandbox.dspace.org and demo.dspace.org for any possible side effects. If we find anything, we can fix it in a follow-up PR, or choose to disable it by default.

@tdonohue merged commit b08edf1 into DSpace:main on Jan 21, 2025
15 checks passed
@dspace-bot (Contributor) commented

Backport failed for dspace-7_x, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin dspace-7_x
git worktree add -d .worktree/backport-3682-to-dspace-7_x origin/dspace-7_x
cd .worktree/backport-3682-to-dspace-7_x
git switch --create backport-3682-to-dspace-7_x
git cherry-pick -x 5b3b3bfb9c84b013b8bda81e0ff23ae2b0986a42

@dspace-bot (Contributor) commented

Successfully created backport PR for dspace-8_x:

@tdonohue (Member) commented

@alanorth : It looks like this could not be auto-ported to 7.x. Could you create a 7.x version?

@alanorth deleted the angular-ssr-sitemap-3110 branch on January 22, 2025 08:06
@alanorth removed the port to dspace-8_x label on Jan 22, 2025
Labels: bug, high priority, performance / caching, port to dspace-7_x
Projects: Status: ✅ Done
Development: successfully merging this pull request may close the issue "(Discussion) High CPU usage in DSpace frontend related to Angular Server Side Rendering (SSR)"
7 participants