Home
The Webpage Optimization repository contains the scripts used to obtain our experimental results. It also contains all the experimental results and the related bar plots for visualizing them. We performed four different experiments on four different data sets, and the results are stored in the following folders:
- Experiment on a replica of the deploy server, running in a container with the PageSpeed module (IP address 10.4.14.31)
- Experiment on deploy.virtual-labs.ac.in without the PageSpeed module
- Experiment on 5000 web pages of www.vlab.co.in
- Lab-wise experimentation on eight labs
- Finding the most prominent performance issues
- Analysis of performance optimization tools like PageSpeed
- To-do analysis and optimization
- Lab-wise analysis of performance
- Differences between PageSpeed, YSlow and WebPageTest
Virtual Labs has been implemented as a collection of web applications, where the user runs a lab from a client machine (desktop or mobile device) that invokes the various services offered on the server side. The execution of a lab therefore depends on the ability of the server and the client to process the required actions with the resources available on each side. In the early days of the world wide web, web performance meant optimizing the server side, but with the evolution of a wide variety of client-side devices, the client side has become the bottleneck. Performance bottlenecks result in high response times, leading to a poor user experience. The response time depends on the technologies used for designing and rendering the web page as well as on the network.

The critical rendering path is the chain of events that must occur to render a web page in the browser, and the resources fetched along it are called critical resources. Front-end developers often use many critical resources such as JavaScript files, CSS files and images to build a good user interface, and this becomes an overhead. Optimized web pages not only render fast but also save network bandwidth. Along with building a responsive web page, web developers should therefore focus on optimizing critical resources by minimizing their size and number. For each critical resource on a web page, the browser makes a new request to the server. CSS and JavaScript are the two kinds of critical resources that block the rendering of a web page, so the correct sequence of HTTP requests can reduce the perceived page load time: the critical resources needed to render the page should be fetched first, and the remaining resources should be fetched in the background. While optimizing a web page for speed we should therefore focus on
- Minimizing the size of critical resources
- Minimizing the number of critical resources (see the counting sketch after this list)
- Minimizing the critical rendering path
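As a rough illustration, the short Python sketch below counts the external stylesheets and scripts a single page pulls in. It uses only the standard library, and the URL is a placeholder to be replaced with any page you want to inspect.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class CriticalResourceCounter(HTMLParser):
    """Count external CSS and JavaScript references in an HTML document."""
    def __init__(self):
        super().__init__()
        self.stylesheets = 0
        self.scripts = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "stylesheet":
            self.stylesheets += 1
        elif tag == "script" and attrs.get("src"):
            self.scripts += 1

# Placeholder URL: point this at the web page you want to analyze.
html = urlopen("http://deploy.virtual-labs.ac.in/").read().decode("utf-8", "ignore")
counter = CriticalResourceCounter()
counter.feed(html)
print(counter.stylesheets, "external stylesheets,", counter.scripts, "external scripts")
```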
We used various tools on the web pages of Virtual Labs:
- To analyze the performance of thousands of web pages hosted at Virtual Labs in order to improve the student experience.
- To evaluate the utility of PageSpeed for optimizing the web pages of Virtual Labs.
- To compare the web performance of Virtual Labs web pages with and without a web optimization tool like Google PageSpeed.
When the world wide web started, most web pages were plain hypertext documents. People gradually began sharing media content such as images and audio on the web, and over the past decade web pages have transformed into web applications that depend on critical resources like CSS and JavaScript and on media content such as images, audio and video. Nowadays the rendering of a web page depends on the following three major criteria:
- Order in which critical resources are fetched
- Number of critical resources, and
- Size of critical resources.
The number of HTTP requests needed to render a web page is directly proportional to the number of critical resources: for each critical resource in the page, the browser has to make a new HTTP request to the server before that resource is loaded. Almost 80% of the response time is spent downloading all the resources. To reduce the number of HTTP requests, multiple CSS files can be combined into one, and multiple JavaScript files can likewise be combined into one. Other techniques include image spriting.
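A minimal sketch of the combining idea (not the repository's own tooling) is shown below; the file names are placeholders.

```python
from pathlib import Path

def combine(files, output):
    """Concatenate several text assets (CSS or JS) into one file."""
    parts = []
    for name in files:
        # Keep a marker comment so the combined file stays debuggable.
        parts.append(f"/* --- {name} --- */\n" + Path(name).read_text())
    Path(output).write_text("\n".join(parts))

# Placeholder file names for illustration.
combine(["reset.css", "layout.css", "theme.css"], "site.combined.css")
```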
With a content delivery network (CDN), the geographically nearest server is selected to deliver the content, which reduces network latency.
For static components, setting a far-future Expires header reduces the number of HTTP requests, because the browser can fetch resources from its cache when the same web page is requested again. For dynamic components, an appropriate Cache-Control header helps the browser with conditional requests. This avoids unnecessary HTTP requests.
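One quick way to see which caching headers a server actually sends is to inspect a response, as in this standard-library sketch; the URL is a placeholder.

```python
from urllib.request import urlopen

# Placeholder URL: replace with the resource you want to inspect.
with urlopen("http://deploy.virtual-labs.ac.in/") as resp:
    headers = resp.headers
    print("Expires:      ", headers.get("Expires"))
    print("Cache-Control:", headers.get("Cache-Control"))
    print("Last-Modified:", headers.get("Last-Modified"))
```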
Compression reduces response time by reducing the size of the HTTP response. Gzip is currently the most popular and effective compression method and can reduce response time by as much as 70%. If a web client indicates support for compression in the HTTP request header, the server sends compressed components.
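A small sketch for estimating the saving: request the same page with and without gzip support advertised and compare the transferred sizes. The URL is a placeholder, and the actual saving depends on the server's configuration.

```python
from urllib.request import Request, urlopen

URL = "http://deploy.virtual-labs.ac.in/"  # placeholder

plain = urlopen(Request(URL, headers={"Accept-Encoding": "identity"})).read()
gzipped = urlopen(Request(URL, headers={"Accept-Encoding": "gzip"})).read()

# urllib does not decompress automatically, so len(gzipped) is the wire size.
print(f"uncompressed: {len(plain)} bytes")
print(f"gzip:         {len(gzipped)} bytes "
      f"({100 * (1 - len(gzipped) / len(plain)):.0f}% smaller)")
```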
The problem with not placing links to style sheets in the head tag is that it blocks progressive rendering: until the style sheets are fetched, the user sees nothing on the screen.
Putting scripts at the top blocks parallel downloading of resources from the same host. In most cases scripts are needed once the user starts interacting with the web page, not while the page is being rendered, so they should be placed at the bottom.
If a file is very small, it should be inlined to reduce the number of HTTP requests.
CSS and JavaScript files are cached by the browser, so downloading them on every request can be avoided: they take time to download the first time, but all later requests are served from the browser cache.
Unnecessary characters should be removed from the code. This includes redundant code, white space and, where possible, comments. Removing these unnecessary characters reduces the total number of critical bytes served to the client side and improves response time.
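A deliberately naive sketch of the idea for CSS-like text; real minifiers and PageSpeed's filters are far more careful, so treat this purely as an illustration.

```python
import re

def naive_minify_css(css: str) -> str:
    """Strip comments and collapse whitespace in a CSS string."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # drop /* ... */ comments
    css = re.sub(r"\s+", " ", css)                   # collapse runs of whitespace
    css = re.sub(r"\s*([{}:;,])\s*", r"\1", css)     # trim space around punctuation
    return css.strip()

print(naive_minify_css("body {\n  color: #333;  /* base text colour */\n}\n"))
# -> body{color:#333;}
```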
Redirecting an old page to a new one adds an extra cycle of DNS lookup and TCP handshake, which increases latency and delays the response. Therefore, redirects should be avoided.
Entity tags (ETags) are a mechanism that browsers and servers use to determine whether the component in the browser's cache is the same as the one on the server.
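The sketch below walks through that handshake: fetch a resource, then repeat the request with If-None-Match and expect a 304 Not Modified if the cached copy is still valid. The URL is a placeholder, and not every server emits ETags.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

URL = "http://deploy.virtual-labs.ac.in/"  # placeholder

first = urlopen(URL)
etag = first.headers.get("ETag")
print("ETag from first response:", etag)

if etag:
    try:
        second = urlopen(Request(URL, headers={"If-None-Match": etag}))
        print("Second response status:", second.getcode())  # 200: re-downloaded
    except HTTPError as err:
        print("Second response status:", err.code)           # 304: cache still valid
```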
Flushing the output buffer allows the server to send a partially ready response to the browser. The flush should happen as early as possible in the code, preferably right after the head section. In PHP, the function flush() flushes the buffer.
HTTP requests are expensive, and getting a response like "404 Not Found" is useless and only adds latency. Therefore, requests to such missing pages should be avoided.
The favicon lives at the root of the server, and it matters because if it is missing the browser will still request it, which adds latency.

Yahoo YSlow and Google PageSpeed are well-known tools that can evaluate a web page's performance and provide suggestions for optimizing it based on the best practices listed above. PageSpeed is also capable of optimizing web pages without changing their semantics.
Yahoo's yslow.js is a JavaScript API that runs on PhantomJS, a headless browser with a JavaScript API. We used YSlow as our performance measurement tool because it not only analyzes a web page but also gives suggestions on how to improve it. It works in the following three steps:
- It crawls the DOM to find each component.
- It collects information about each component and analyzes it.
- It generates a score out of 100 for each rule, from which the overall score for the page is produced.

The grades for individual rules are computed differently depending on the rule. For example, for Rule 1, three external scripts are allowed and four points are deducted from that rule's score for each extra script. The code for grading each rule is found in rules.js. The overall grade is a weighted average of the individual grades for each rule, calculated in controller.js. The rules are applied approximately in order of importance, most important first; the specific weights are in the ruleset objects in rules.js. For details on score computation, please refer to the Rule Matrix.
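As a worked illustration of this scheme (the exact weights and per-rule penalties live in rules.js; the numbers here just mirror the Rule 1 example and the weights are made up):

```python
def rule_score(num_scripts, allowed=3, penalty=4):
    """Rule 1 style grading: 100 minus a penalty per script over the allowance."""
    extra = max(0, num_scripts - allowed)
    return max(0, 100 - penalty * extra)

def overall_score(rule_scores, weights):
    """Overall grade: weighted average of the individual rule scores."""
    total_weight = sum(weights.values())
    return sum(rule_scores[r] * weights[r] for r in rule_scores) / total_weight

# A page with 6 external scripts loses (6 - 3) * 4 = 12 points on that rule.
print(rule_score(6))                                                          # 88
print(overall_score({"rule1": 88, "rule2": 100}, {"rule1": 8, "rule2": 4}))   # 92.0
```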
Mod_pagespeed is an automated web page optimization tool developed by Google. It analyzes a web page and optimizes it: based on best practices, it applies a set of filters that rewrite the page at run time. When the server receives a request for a web page, mod_pagespeed dynamically rewrites the page using its filters and sends back a highly optimized page. The rewriting happens only on the first request for the page; all later requests are served from the server-side cache. There are more than 40 filters available for optimization, and they can be turned on or off based on our requirements.
Our work is broadly divided into four major phases: data collection, data visualization, analysis of the data, and optimization of the web pages based on that analysis. During the data collection phase we first collected all the URLs of the virtual labs hosted on www.deploy.virtual-labs.ac.in. We then collected a YSlow report for each web page using an automated script and PhantomJS. After collecting all the reports we extracted the scores for each rule and stored them in a CSV file. During the visualization phase all the data is visualized using an automated script, indicating the performance for each rule as well as the overall performance of the web pages. Finally we carried out the analysis needed to optimize the web pages of the virtual labs.
As a first step in collecting data and generating reports, we gathered the URLs of all the web pages under the domain deploy.virtual-labs.ac.in. Since we had access to the server, we extracted the links to all the HTML and PHP pages from the server and stored them in a text file using an automated bash script. To test the performance of the web pages of www.vlab.co.in, we collected 5000 URLs using an online sitemap generator, since we did not have access to that server. We then generated a YSlow report for each web page using yslow.js and PhantomJS driven by an automated bash script. The script reads the URLs line by line from the text file and generates a report for each web page, batch-processing ten PhantomJS processes in the background for ten different web pages at a time. The URLs of dangling web pages are pushed into a text file named failed-urls and their reports are deleted. These YSlow reports serve as the input for CSV file generation.
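The actual batch runner is a bash script in this repository; the Python sketch below shows the same idea (ten PhantomJS processes at a time), with file names such as urls.txt, failed-urls.txt and reports/ assumed for illustration.

```python
import subprocess
from pathlib import Path

BATCH_SIZE = 10
urls = Path("urls.txt").read_text().split()
Path("reports").mkdir(exist_ok=True)

with open("failed-urls.txt", "w") as failed:
    for start in range(0, len(urls), BATCH_SIZE):
        procs = []
        for offset, url in enumerate(urls[start:start + BATCH_SIZE]):
            out = open(f"reports/report-{start + offset}.json", "w")
            # yslow.js writes its report to stdout; capture one file per URL.
            procs.append((subprocess.Popen(
                ["phantomjs", "yslow.js", "--info", "all", "--format", "json", url],
                stdout=out), url, out))
        for proc, url, out in procs:
            proc.wait()
            out.close()
            if proc.returncode != 0:
                failed.write(url + "\n")   # dangling web page
```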
The CSV file is generated using a bash script. It contains the overall score and the score corresponding to each rule: the script extracts all the scores for each rule from the reports and writes them to the CSV file line by line. The content of the CSV file is then used to visualize the statistics.
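A rough sketch of that extraction step in Python (the repository uses a bash script); the JSON field names "u", "o" and "g" follow YSlow's JSON output but should be treated as assumptions and checked against the actual reports.

```python
import csv
import json
from pathlib import Path

rows, rule_ids = [], None
for report_file in sorted(Path("reports").glob("report-*.json")):
    report = json.loads(report_file.read_text())
    grades = report.get("g", {})                 # per-rule grades
    if rule_ids is None:
        rule_ids = sorted(grades)                # fix the column order once
    rows.append([report.get("u", report_file.name),   # URL
                 report.get("o", "")]                 # overall score
                + [grades.get(r, {}).get("score", "") for r in rule_ids])

with open("scores.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["url", "overall"] + (rule_ids or []))
    writer.writerows(rows)
```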
Data visualization is carried out using Python's matplotlib and csvkit. The statistics for each rule are plotted as bar graphs by an automated script, which takes the CSV files as input and returns a set of bar graphs, one per rule, as output.
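A minimal matplotlib sketch of one such plot, assuming the scores.csv layout sketched above (one column per rule, scores from 0 to 100); the rule column name and the 10-point bucketing are illustrative choices, not necessarily those of the repository's scripts.

```python
import csv
from collections import Counter
import matplotlib.pyplot as plt

def plot_rule(csv_path, rule, out_png):
    """Bar graph of how many pages fall into each 10-point score bucket."""
    buckets = Counter()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            value = row.get(rule)
            if not value:
                continue
            buckets[10 * int(float(value) // 10)] += 1
    xs = sorted(buckets)
    plt.bar([str(x) for x in xs], [buckets[x] for x in xs])
    plt.xlabel(f"{rule} score bucket")
    plt.ylabel("number of web pages")
    plt.savefig(out_png)
    plt.close()

plot_rule("scores.csv", "ynumreq", "ynumreq.png")  # rule column name is illustrative
```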
In order to achieve our goals we carried out experiments on three different sets of URLs:
- a set containing 5000 URLs of www.vlab.co.in without PageSpeed
- a set containing 9000 URLs of www.deploy.virtual-labs.ac.in without PageSpeed
- a set containing 9000 URLs of www.deploy.virtual-labs.ac.in with PageSpeed
For each of these sets of URLs we generated YSlow reports and accumulated the scores in three different CSV files. The statistics were then visualized as bar graphs for each CSV file. We analyze the performance of www.vlab.co.in using the resulting bar graphs in the next section. We also compared the performance of 8786 web pages of www.deploy.virtual-labs.ac.in with and without PageSpeed.