The PCA Visualizer is an interactive Shiny application designed to help users understand and visualize Principal Component Analysis (PCA) in a straightforward and engaging way. The app allows users to generate random data with a specified correlation, perform PCA on the data, and visualize the results interactively.
[Screenshot: the PCA Visualizer in action]
You can try out the PCA Visualizer Shiny app live here.
- Interactive Data Generation: Users can generate random datasets with a specified number of points and correlation between the variables.
- Dynamic PCA Visualization: Visualize the mean point, first and second principal components (PC1 and PC2), and the transformed data in the principal component space.
- Customizable Display: Options to show or hide the mean point, principal components, and transformed data.
- Variance Explained: The app calculates and displays the proportion of variance explained by each principal component.
- Shiny: Provides the interactive web framework for the application.
- ggplot2: Used to build the plots, which are then rendered interactively via plotly.
- plotly: Adds interactivity to the plots, allowing users to hover over data points and explore the visualizations more deeply.
- dplyr: Used for data manipulation within the app.
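A minimal sketch of how these packages typically fit together: a ggplot2 scatter plot converted with plotly's `ggplotly()` inside `renderPlotly()`, with a "Regenerate Data" button wired through Shiny's `eventReactive()`. The input IDs (`n_points`, `rho`, `regen`) and slider ranges are hypothetical and are not taken from `PCA_Viz.R`.

```r
library(shiny)
library(ggplot2)
library(plotly)

# Hypothetical input IDs and ranges; the real PCA_Viz.R may differ.
ui <- fluidPage(
  sliderInput("n_points", "Number of points", min = 10, max = 500, value = 100),
  sliderInput("rho", "Correlation", min = -1, max = 1, value = 0.7, step = 0.05),
  actionButton("regen", "Regenerate Data"),
  plotlyOutput("scatter")
)

server <- function(input, output, session) {
  # Rebuild the dataset only when the button is clicked;
  # ignoreNULL = FALSE also creates one dataset at startup.
  dat <- eventReactive(input$regen, {
    n <- input$n_points
    x <- rnorm(n)
    data.frame(x = x, y = input$rho * x + sqrt(1 - input$rho^2) * rnorm(n))
  }, ignoreNULL = FALSE)

  output$scatter <- renderPlotly({
    p <- ggplot(dat(), aes(x, y)) + geom_point(alpha = 0.6)
    ggplotly(p)  # turn the static ggplot2 object into an interactive plotly widget
  })
}

app <- shinyApp(ui, server)
# shiny::runApp(app) launches the app in a browser
```

Separating data generation (`eventReactive`) from rendering means toggling display options would not resample the points; whether the actual app is structured this way is an assumption.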
- Clone the repository:
  git clone https://github.com/mohammedyounes98/PCA_Viz
- Install the required packages in R:
  install.packages(c("shiny", "ggplot2", "dplyr", "plotly"))
- Run the Shiny app, with the cloned PCA_Viz folder as R's working directory:
  shiny::runApp('PCA_Viz.R')
- Interact with the app:
  - Adjust the number of points and the correlation between X and Y.
  - Click "Regenerate Data" to create a new dataset.
  - Use the checkboxes to toggle the display of the mean point, the principal components, and the transformed data.
  - Explore how the correlation affects the orientation and length of the principal components.
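The effect of correlation on the principal components can be seen in a closed-form special case. Assuming X and Y are standardized (unit variance), their covariance matrix is [1 rho; rho 1], with eigenvalues 1 + |rho| and 1 - |rho|. PC1 lies along the line y = x when rho > 0 and along y = -x when rho < 0, so as |rho| grows PC1 captures more variance and PC2 less; at rho = 0 all directions have equal variance and the PC directions are not well defined. The helper `pc_from_rho` is hypothetical, for illustration only.

```r
# Hypothetical helper: principal components of two standardized variables
# with correlation rho, via the eigen-decomposition of [1 rho; rho 1].
pc_from_rho <- function(rho) {
  S <- matrix(c(1, rho, rho, 1), nrow = 2)
  e <- eigen(S, symmetric = TRUE)  # eigenvalues come back in decreasing order
  list(variances  = e$values,      # 1 + |rho| and 1 - |rho|
       directions = e$vectors)     # column 1 = PC1, column 2 = PC2 (sign is arbitrary)
}

for (rho in c(0.2, 0.6, 0.95, -0.6)) {
  pc <- pc_from_rho(rho)
  cat(sprintf("rho = %5.2f  var(PC1) = %.2f  var(PC2) = %.2f  PC1 dir = (%.2f, %.2f)\n",
              rho, pc$variances[1], pc$variances[2],
              pc$directions[1, 1], pc$directions[2, 1]))
}
```

Because eigenvectors are only defined up to sign, the printed PC1 direction may appear as (0.71, 0.71) or (-0.71, -0.71); both describe the same line.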
- Number of Points Slider: Controls the number of points in the dataset.
- Correlation Slider: Adjusts the correlation between the X and Y variables.
- Regenerate Button: Generates a new random dataset based on the selected parameters.
- Checkboxes: Toggle the display of the mean point, PC1, PC2, and the transformed data.
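One common way to generate points with a chosen correlation, shown here as an assumption rather than the app's exact method: draw X and an independent noise term Z from a standard normal distribution and set Y = rho * X + sqrt(1 - rho^2) * Z. Then Var(Y) = 1 and Cov(X, Y) = rho, so the population correlation is exactly rho. The helper name `generate_xy` is hypothetical.

```r
# Hypothetical helper: draw n (x, y) pairs whose population correlation is rho.
generate_xy <- function(n, rho, seed = NULL) {
  stopifnot(n >= 2, rho >= -1, rho <= 1)
  if (!is.null(seed)) set.seed(seed)
  x <- rnorm(n)
  z <- rnorm(n)                       # noise independent of x
  y <- rho * x + sqrt(1 - rho^2) * z  # Var(y) = rho^2 + (1 - rho^2) = 1, Cov(x, y) = rho
  data.frame(x = x, y = y)
}

d <- generate_xy(n = 1000, rho = 0.8, seed = 42)
cor(d$x, d$y)  # close to 0.8; the sample value varies with the seed
```

At rho = 1 or rho = -1 the noise term vanishes and the points fall exactly on the line y = x or y = -x.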
- Original Data: The scatter plot shows the original data points with optional overlays for the mean point and principal components.
- Principal Components: PC1 is shown as a red line, representing the direction of maximum variance, while PC2 is shown as a green line, perpendicular to PC1.
- Transformed Data: When enabled, the plot switches to display the data in the new coordinate system defined by the principal components.
- A text output displays the percentage of total variance explained by PC1 and PC2, showing how much of the data's spread each component captures for the current dataset.
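The overlays and the variance readout described above can be reproduced with base R's `prcomp()`. This is a sketch of the standard computation, not the app's actual code: the data, the seed, and the choice to draw each PC line spanning plus or minus 2 standard deviations through the mean are illustrative assumptions.

```r
# Illustrative data: 300 points with correlation about 0.8 (not the app's defaults).
set.seed(1)
x <- rnorm(300)
d <- data.frame(x = x, y = 0.8 * x + sqrt(1 - 0.8^2) * rnorm(300))

pca <- prcomp(d, center = TRUE, scale. = FALSE)
mean_point <- pca$center  # the "mean point" overlay
rot <- pca$rotation       # orthonormal directions: column PC1, column PC2

# Transformed data: subtract the mean, then project onto the PC directions.
scores <- sweep(as.matrix(d), 2, mean_point) %*% rot
all.equal(unname(scores), unname(pca$x))  # TRUE: matches prcomp's scores

# Segment endpoints for drawing PC1 and PC2 through the mean point,
# each spanning +/- 2 standard deviations along its component (an arbitrary choice).
half_len <- 2 * pca$sdev
pc_segments <- data.frame(
  pc = colnames(rot),
  x0 = mean_point[["x"]] - half_len * rot["x", ],
  y0 = mean_point[["y"]] - half_len * rot["y", ],
  x1 = mean_point[["x"]] + half_len * rot["x", ],
  y1 = mean_point[["y"]] + half_len * rot["y", ]
)

# Proportion of total variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
sprintf("PC1: %.1f%%  PC2: %.1f%%", 100 * var_explained[1], 100 * var_explained[2])
```

Scaling the segments by `pca$sdev` makes the line lengths reflect the spread along each component, which is one way the "length of the principal components" can respond to the correlation slider.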
Contributions are welcome! Feel free to fork the repository, make improvements, and submit a pull request. If you encounter any issues or have suggestions for new features, please open an issue.
This project is licensed under the GNU General Public License v3.0 (GPL-3.0). See the LICENSE file for details.