This page describes how to install Ricgraph for a single user on Linux. If you would like to use Ricgraph in a multi-user environment on Linux, you will need to install Ricgraph differently. In case you have no idea what would be the best for your situation, please install Ricgraph for a single user on Linux, as described on this page. Or go for Ricgraph in a container.
Other Ricgraph install options are:
- Install and configure Ricgraph as a server: multi-user environment on Linux.
- Install and use Ricgraph in a container: relatively quick with limited possibilities.
Continue reading here if you would like to install Ricgraph on Windows.
On this page you can find:
- Ricgraph Makefile
- Installation instructions for a single user
- Requirements
- Steps to take
- Install Neo4j Desktop
- Install Bloom configuration
- Download Ricgraph
- Use a Python virtual environment and install Python requirements
- Ricgraph initialization file
- Using Ricgraph
- Dumping and restoring the Ricgraph database
- Ricgraph on Windows
Return to main README.md file.
A Ricgraph installation involves a number of steps. Ricgraph uses a Makefile to make installation of (parts of) Ricgraph easier. Such a Makefile automates a number of these steps. A Makefile command is executed by typing:
make [target]
To use the Ricgraph Makefile, first go to your home directory on Linux and then download it, by typing:
cd
wget https://raw.githubusercontent.com/UtrechtUniversity/ricgraph/main/Makefile
In the example above, the [target] specifies what has to be done. Assuming that you are in your home directory, you can execute one of these commands to find the possible targets:
make
make help
You can add command line parameters to the make command, e.g. to get the Ricgraph cutting edge version, or to specify an installation path. Look in the Makefile for possibilities. Any variable defined in the Makefile can be used as a make command line parameter. For an example, see the Podman Containerfile.
Most often, you do not need to install the make command, but if you get a "command not found" error message, you need to install it using your Linux package manager.
If you read the documentation below or on page Ricgraph as a server on Linux, you will notice that some sections start by mentioning a Makefile command. That means that if you execute that command, the steps in that section will be done automatically. Sometimes you will have to do some post-install steps, e.g. because you have to choose a password for the graph database.
Ricgraph can use two graph database backends: Neo4j and Memgraph.
Neo4j has several products:
- Neo4j Desktop;
- Neo4j Bloom graph visualization tool, included with Neo4j Desktop (according to Neo4j: "A beautiful and expressive data visualization tool to quickly explore and freely interact with Neo4j’s graph data platform with no coding required");
- Neo4j Community Edition, which lets you explore the graph using Cypher queries only.
Memgraph is an in-memory graph database and therefore faster than Neo4j. However, it has not been tested extensively with Ricgraph yet. Read Install and start Memgraph.
The easiest method for using Ricgraph is by using a Linux virtual machine (VM), which you can create using e.g. VirtualBox. A VM of size 25GB with 4GB memory will work. Of course, this depends on the (size of the) sources you plan to harvest and the capabilities of your computer. The more, the better. The author uses a VM of 35GB with 10GB memory and 3 vCPUs on an 11th gen Intel i7 mobile processor.
Ricgraph has been developed with Python 3.11. For some features you need at least Python 3.9. E.g., if you have Ubuntu 20.04, you can install Python 3.11 as follows:
- Log in as user root.
- Type the following commands:
add-apt-repository ppa:deadsnakes/ppa
apt install python3.11
- Exit from user root.
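To see whether the Python interpreter you plan to use meets the versions mentioned above, a small check along the following lines may help. It is a minimal sketch: the function name and the status strings are made up for this illustration, and only the version numbers 3.9 and 3.11 come from the text above.

```python
import sys

# Version numbers taken from the text above: Ricgraph has been developed
# with Python 3.11, and some features need at least Python 3.9.
MINIMUM_VERSION = (3, 9)
DEVELOPED_WITH = (3, 11)


def check_python_version(version=None):
    """Return a short status string for a (major, minor) version tuple.

    The function name and the returned strings are illustrative only.
    """
    if version is None:
        version = tuple(sys.version_info[:2])
    if version < MINIMUM_VERSION:
        return "too old"
    if version < DEVELOPED_WITH:
        return "usable, but older than the development version"
    return "ok"


if __name__ == "__main__":
    print(sys.version.split()[0], "->", check_python_version())
```

Run it with the interpreter you intend to use for Ricgraph, e.g. python3.11.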
- Install your graph database backend (choose one of these):
- Install Neo4j Desktop (recommended, since it includes Bloom). Optional: Install the Bloom configuration.
- Install and start Neo4j Community Edition.
- Install and start Memgraph.
- Download Ricgraph.
- Use a Python virtual environment and install Python requirements.
- Create and update the Ricgraph initialization file. This is also the place where you specify which graph database backend you use.
- Start
- harvesting data, see Ricgraph harvest scripts;
- writing scripts, see Ricgraph script writing.
- Execute queries and visualize the results.
Other things you might want to do, if you use Neo4j:
- Create a Neo4j Desktop database dump of Ricgraph.
- Restore a Neo4j Desktop database dump of Ricgraph in Neo4j Desktop.
- Restore a Neo4j Desktop database dump of Ricgraph in Neo4j Community Edition.
To install, you can either use the Ricgraph Makefile and execute command make install_neo4j_desktop, or follow the steps below.
- Install Neo4j Desktop Edition (it is free). To do this, go to the Neo4j Deployment Center. Go to section "Neo4j Desktop". Choose the latest version of Neo4j Desktop. Download the Linux version. It is an AppImage, so it can be installed and used without root permissions. You will be asked to fill in a form before you can download. In the following screen you will be given a "Neo4j Desktop Activation Key". Save it.
- The downloaded file is called something like neo4j-desktop-X.Y.Z-x86_64.AppImage, where X.Y.Z is a version number. Make it executable using "chmod 755 [filename]".
- Start Neo4j Desktop by clicking on the downloaded file.
- Accept the license.
- Enter your activation key in the right part of the screen. Click "Activate". If you do not have a key, fill in the left part of the screen. Click "Register with Email". Wait awhile.
- Choose whether you would like to participate in anonymous reporting.
- You may be offered updates for Neo4j Desktop components, please update.
- Move your mouse to "Example Project" in the left column. A red trash can icon appears. Click it to remove the Example Project database "Movie DBMS". Confirm. Then wait awhile.
- The text "No projects found" will appear. Create a project by clicking the button "+ New Project".
- The text "Project" appears with the text "Add a DBMS to get started". Click on the "+ Add" button next to it and select "Local DBMS". Leave the name as it is ("Graph DBMS") and fill in a password. Click "Create".
- [This step is not necessary if you use the Ricgraph Makefile.] Insert the password in field graphdb_password in the Ricgraph initialization file, see below.
- Exit Neo4j Desktop using the "File" menu and select "Quit". If your database was active, a message similar to "Your DBMS [name] is running, are you sure you want to quit" appears; choose "Stop DBMS, then quit".
- Ready.
Now we need to find the port number which Neo4j Desktop is using:
- Start Neo4j Desktop. Start the Graph DBMS.
- Click on the words "Graph DBMS". At the right (or below, depending on the width of the Neo4j Desktop window) a new screen appears. Look at the tab "Details". Note the port number next to "Bolt port" (the default value is 7687). Insert this port number in field graphdb_port in the Ricgraph initialization file, see below.
- Ready.
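Once the Graph DBMS has been started, you may want to confirm that something is actually listening on the Bolt port you noted. The following is a minimal sketch using only the Python standard library. It checks TCP reachability only, not whether the service is Neo4j or whether your password is correct. The function name is hypothetical; 7687 is the default value mentioned above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution errors.
        return False


if __name__ == "__main__":
    # 7687 is the default Bolt port; replace it with the value shown in the
    # "Details" tab if yours differs.
    print("Bolt port reachable:", port_is_open("localhost", 7687))
```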
This is only necessary if you plan to use Bloom. If you are not sure, skip this step for now; you can come back to it later.
- Start Neo4j Desktop.
- Click on the icon on the left side of Neo4j Desktop.
- Click on "Neo4j Bloom". A new window appears.
- In this window, click on the icon at the top left. A Bloom "Perspective" slides out (Neo4j has an extensive description how to use it).
- Click on "neo4j > Untitled Perspective 1".
- A new window appears. Right of the words "Untitled Perspective 1" there are three vertical dots. Click on them. Click on "Delete". The perspective "Untitled Perspective 1" is removed.
- In the same window, right of the word "Perspectives" click on the word "Import". A file open window appears. Go to directory neo4j_config that is part of Ricgraph and select file ricgraph_bloom_config.json. Click "Open". The perspective "ricgraph_bloom_config" is loaded.
- Click on the text "ricgraph_bloom_config".
- Note that the text "neo4j > Untitled Perspective 1" has changed to "neo4j > ricgraph_bloom_config".
- A few centimeters below "neo4j > ricgraph_bloom_config", just below the text "Add category", click on the oval "RicgraphNode". At the right, a new window will appear.
- In this window, below the word "Labels", check if an oval box with the text "RicgraphNode" is shown. If not, click on "Add labels", click on "RicgraphNode".
- Click on the icon to go back to the main screen of Bloom.
- Click on the cog icon below , you might want to set "Use classic search" to "on".
- Ready.
If you use the Ricgraph Makefile, this will be done automatically while creating a Python virtual environment (see the following section).
You can choose two types of downloads for Ricgraph:
- The latest released version. Go to the Release page of Ricgraph, choose the most recent version, download either the zip or tar.gz version.
- The "cutting edge" version. Go to the GitHub page of Ricgraph, click the green button "Code", choose tab "Local", choose "Download zip".
To do this, you can either use the Ricgraph Makefile and execute command make full_singleuser_install, or follow the steps below.
To be able to use Ricgraph, you will need a Python virtual environment. A virtual environment is a lightweight Python environment with its own independent set of Python packages installed in its site directory. A virtual environment is created on top of an existing Python installation. There are two ways of doing this:
- Using Python's venv module;
- Using a Python Integrated development environment (IDE).
- Using Python's venv module. Read Create a Python virtual environment and install Ricgraph in it. This documentation has been written for a multi-user installation of Ricgraph. To use it for a single-user install (as you are doing since you are on this page):
  - Suppose you are a user with login alice.
  - Suppose your home directory is /home/alice (check this by typing cd followed by pwd).
  - For every occurrence of /opt in Create a Python virtual environment and install Ricgraph in it, read /home/alice, and ignore any references to "login as user root" and chown.
  - Follow the other instructions as written.
- Using a Python Integrated development environment (IDE), such as PyCharm. An IDE will automatically generate a virtual environment, and any time you use the IDE, it will "transfer" you to that virtual environment. It will also help to execute and debug your scripts.
  - If PyCharm does not automatically generate a virtual environment, you need to go to File --> Settings --> Project: [your project name] --> Python Interpreter, and check if there is a valid interpreter in the right column next to "Python Interpreter". If not, add one, using "Add Interpreter", and choose for example "Add Local Interpreter". A venv will be generated.
  - Next, unzip or tar xf the downloaded file for Ricgraph (see previous section).
  - Install the Python requirements. Depending on the Python IDE, single or double-click on file requirements.txt. Probably, a button or text appears that asks you to install requirements. Click on it. If this does not work, type in the IDE (PyCharm) Terminal:
pip3.11 install -r requirements.txt
You may want to change 3.11 in pip3.11 for the Python version you use.
- PyAlex. PyAlex is a Python library for OpenAlex. OpenAlex is an index of hundreds of millions of interconnected scholarly papers, authors, institutions, and more. OpenAlex offers a robust, open, and free REST API to extract, aggregate, or search scholarly data. PyAlex is a lightweight and thin Python interface to this API.
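Whichever of the two ways you choose to create the virtual environment, it can be useful to verify that your scripts really run inside one. The sketch below uses only the standard library: it compares sys.prefix with sys.base_prefix (they differ inside a virtual environment) and, as an illustration of what the venv module does, creates a throw-away environment in a temporary directory. The function names and the directory name demo-venv are hypothetical and are not part of Ricgraph.

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


def create_venv(target_dir):
    """Create a bare virtual environment and return the path to its config file."""
    # with_pip=False keeps this sketch fast and independent of ensurepip;
    # for real use you would want pip available to install requirements.txt.
    venv.create(target_dir, with_pip=False)
    return Path(target_dir) / "pyvenv.cfg"


if __name__ == "__main__":
    print("Running inside a virtual environment:", in_virtualenv())
    with tempfile.TemporaryDirectory() as tmp:
        config_file = create_venv(Path(tmp) / "demo-venv")
        print("Created", config_file, "- exists:", config_file.is_file())
```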
Ricgraph requires an initialization file. A sample file is included as ricgraph.ini-sample. You need to copy this file to ricgraph.ini and modify it to include settings for your graph database backend, and API keys and/or email addresses for other systems you plan to use.
Ricgraph has a [GraphDB] section where you have to specify the graph database backend that you will be using. First, you will need to set the parameter graphdb to the graph database backend name (you can choose between neo4j and memgraph). Further down that section, you will have to fill in six parameters for hostname, port number, username, etc. The comments in the initialization file explain how to do that.
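As an illustration of the [GraphDB] section, the following minimal sketch parses a small sample with Python's configparser and checks that graphdb names one of the two supported backends. The sample content is an assumption made for this example: only the section name and the parameter names graphdb, graphdb_port and graphdb_password occur on this page, and the real ricgraph.ini-sample contains more parameters. This is not how Ricgraph itself reads the file.

```python
import configparser

# Hypothetical sample for illustration; the real ricgraph.ini-sample contains
# more parameters and explanatory comments.
SAMPLE_INI = """
[GraphDB]
graphdb = neo4j
graphdb_port = 7687
graphdb_password = choose-your-own-password
"""

SUPPORTED_BACKENDS = ("neo4j", "memgraph")


def read_graphdb_settings(ini_text):
    """Parse the [GraphDB] section and validate the chosen backend."""
    # interpolation=None so that a '%' in a password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = parser["GraphDB"]
    backend = section.get("graphdb", "").strip().lower()
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"graphdb must be one of {SUPPORTED_BACKENDS}, got {backend!r}"
        )
    return {"graphdb": backend, "graphdb_port": section.getint("graphdb_port")}


if __name__ == "__main__":
    print(read_graphdb_settings(SAMPLE_INI))
```

To check your own file, read its text with pathlib.Path("ricgraph.ini").read_text() and pass it to the function.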
Optionally, you can extend Ricgraph by adding new properties of nodes. Before you can do this, download Ricgraph.
There is a parameter RICGRAPH_NODEADD_MODE in the initialization file which influences how nodes are added to Ricgraph. Suppose we harvest a source system and that results in the following table:
| FULL_NAME | ORCID |
|---|---|
| Name-1 | 0000-0001-1111-1111 |
| Name-2 | 0000-0001-1111-2222 |
| Name-3 | 0000-0001-1111-2222 |
| Name-4 | 0000-0001-1111-3333 |
Name-2 and Name-3 have the same ORCID. This may be correct, e.g. if Name-2 is a name variant of Name-3, e.g. John Doe vs J. Doe, but it also may not be correct, e.g. if Name-2 is John and Name-3 is Peter (possibly caused by a typing mistake in a source system). There is no way for Ricgraph to know which of these two options it is.
RICGRAPH_NODEADD_MODE can be either strict or lenient:
- strict (default setting): only add nodes to Ricgraph which conform to the model described in the Implementation details. In the example above, ORCID 0000-0001-1111-2222 will not be inserted.
- lenient: add every node. In the example above, ORCID 0000-0001-1111-2222 will be inserted.
This will have the following consequences:
- strict: since ORCID 0000-0001-1111-2222 will not be inserted, a research output from a person with that ORCID may not be inserted in Ricgraph. Or the research output will be inserted, but it might not be linked to the person with this ORCID.
- lenient: as has been described in Implementation details, person-root "represents" a person. Person identifiers (such as ORCID) and research outputs are connected to the person-root node of a person. That means that the person-root node is connected to everything a person has contributed to. In the example above, ORCID 0000-0001-1111-2222 is inserted. That means that the person-roots of the two persons Name-2 and Name-3 are "merged" and that all research outputs of Name-2 and Name-3 will be connected to one person-root node. After this has been done, there is no way to know which research output belongs to Name-2 and which to Name-3.
As said, that is fine if Name-2 and Name-3 are name variants, but not fine if they are different names. (Side note: if you want to capture spelling variants, you may want to use a fuzzy string match library such as TheFuzz.)
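To give an impression of what a fuzzy string comparison does, here is a minimal sketch. It does not use TheFuzz; it uses difflib.SequenceMatcher from the Python standard library as a stand-in, so the scores will differ from what TheFuzz would report. The function name is hypothetical. A high score only suggests that two names may be variants; it does not prove that they belong to the same person.

```python
from difflib import SequenceMatcher


def name_similarity(name_a, name_b):
    """Return a similarity ratio between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()


if __name__ == "__main__":
    # A name variant is expected to score noticeably higher than an
    # unrelated name.
    for other in ("J. Doe", "Peter Smith"):
        print(f"John Doe vs {other}: {name_similarity('John Doe', other):.2f}")
```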
Lenient is advisable if the sources you harvest from do not contain errors. However, the author of Ricgraph has noticed that error-free sources are rare; therefore the default is strict.
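The difference between the two modes can be illustrated with the example table. The sketch below is a deliberately simplified model and not Ricgraph's actual implementation: it treats an ORCID as acceptable in strict mode only if it occurs with exactly one FULL_NAME, which reproduces the outcome described above for this particular table. The function name is hypothetical.

```python
from collections import defaultdict

# The harvested example table from above, as (FULL_NAME, ORCID) rows.
ROWS = [
    ("Name-1", "0000-0001-1111-1111"),
    ("Name-2", "0000-0001-1111-2222"),
    ("Name-3", "0000-0001-1111-2222"),
    ("Name-4", "0000-0001-1111-3333"),
]


def orcids_to_insert(rows, mode="strict"):
    """Return the sorted ORCIDs that would be inserted under the given mode.

    Simplified model: in strict mode an ORCID is skipped when it is linked
    to more than one name; in lenient mode every ORCID is kept.
    """
    if mode not in ("strict", "lenient"):
        raise ValueError(f"unknown mode: {mode!r}")
    names_per_orcid = defaultdict(set)
    for full_name, orcid in rows:
        names_per_orcid[orcid].add(full_name)
    if mode == "lenient":
        return sorted(names_per_orcid)
    return sorted(o for o, names in names_per_orcid.items() if len(names) == 1)


if __name__ == "__main__":
    for mode in ("strict", "lenient"):
        print(mode, orcids_to_insert(ROWS, mode))
```

In this model, strict leaves out 0000-0001-1111-2222 and lenient keeps all three ORCIDs, matching the description of the two settings.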
Before you can do anything with Ricgraph, you need to harvest sources, see Ricgraph harvest scripts. After you have harvested sources, you can execute queries and visualize the results, see Query and visualize Ricgraph.
Depending on your situation (whether you use Neo4j Desktop or Neo4j Community Edition), this section lists the methods for dumping and restoring the Ricgraph database:
- Create a Neo4j Desktop database dump of Ricgraph
- Create a Neo4j Community Edition database dump of Ricgraph
- Restore a Neo4j Desktop database dump of Ricgraph in Neo4j Desktop
- Restore a Neo4j Desktop database dump of Ricgraph in Neo4j Community Edition
- Restore a Neo4j Community Edition database dump of Ricgraph in Neo4j Community Edition
To create a Neo4j Desktop database dump of Ricgraph, follow these steps:
- Start Neo4j Desktop if it is not running, or stop the graph database if it is running.
- Hover over the name of your graph database (probably "Graph DBMS"), and click on the three horizontal dots at the right.
- Select "Dump".
- Your graph database will be dumped. This may take a while. When it is ready, a message appears.
- Ready.
To do this, you can either use the Ricgraph Makefile and execute command make dump_graphdb_neo4j_community, or follow the steps below.
To create a Neo4j Community Edition database dump of Ricgraph, follow these steps:
- Log in as user root.
- Stop Neo4j Community Edition:
systemctl stop neo4j.service
- To be able to restore a Neo4j database dump you need to set several permissions on /etc/neo4j:
chmod 640 /etc/neo4j/*
chmod 750 /etc/neo4j
- Do the database dump:
neo4j-admin database dump --expand-commands system --to-path=[path to database dump directory]
neo4j-admin database dump --expand-commands neo4j --to-path=[path to database dump directory]
- Start Neo4j Community Edition:
systemctl start neo4j.service
- Check the log for any errors, use one of:
systemctl -l status neo4j.service
journalctl -u neo4j.service
- Exit from user root.
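If you prefer to script the dump step, the two neo4j-admin commands can be generated from a single dump directory, so that both databases end up in the same place. The sketch below only builds and prints the commands, using exactly the options from the steps above; it does not execute anything. The function name and the path /path/to/dump-directory are placeholders made up for this illustration.

```python
import shlex

# The two databases dumped in the steps above.
DATABASES = ("system", "neo4j")


def build_dump_commands(dump_dir):
    """Return the neo4j-admin dump commands as argument lists (nothing is run)."""
    return [
        [
            "neo4j-admin", "database", "dump", "--expand-commands",
            database, f"--to-path={dump_dir}",
        ]
        for database in DATABASES
    ]


if __name__ == "__main__":
    # Placeholder path; replace it with your own dump directory.
    for command in build_dump_commands("/path/to/dump-directory"):
        print(shlex.join(command))
```

Printing the commands first lets you inspect them before running them as user root.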
To restore a Neo4j Desktop database dump of Ricgraph in Neo4j Desktop, follow these steps:
- Start Neo4j Desktop if it is not running, or stop the graph database if it is running.
- Click on the button "Add" on the right side of "Project" and select "File".
- Select the file "neo4j.dump" from a previous Neo4j Desktop database dump. This file will be added to the "File" section a little down the "Project" window.
- Hover over this file and click on the three horizontal dots at the right.
- Select "Create new DBMS from dump".
- Give it a name, e.g. "Graph DBMS from import file".
- When asked, enter the password you have specified in the Ricgraph initialization file (this saves you from entering a new password in that file).
- A new local graph database is being created. This may take a while.
- Hover over the newly created graph database and click "Start" to run it.
- Once it is active, install the Bloom configuration.
- Now you are ready to explore the data using Bloom or Ricgraph Explorer.
To restore a Neo4j Desktop database dump of Ricgraph in Neo4j Community Edition, follow these steps:
- Log in as user root.
- Stop Neo4j Community Edition:
systemctl stop neo4j.service
- To be able to restore a Neo4j database dump you need to set several permissions on /etc/neo4j:
chmod 640 /etc/neo4j/*
chmod 750 /etc/neo4j
- Save the old database:
cd /var/lib/neo4j
mv data/ data-old
- Go back to your working directory and restore the database dump. For [path to database dump directory], specify only the directory, not the directory plus the name of the database dump file (the name neo4j.dump is inferred automatically by the neo4j-admin command):
cd
neo4j-admin database load --expand-commands neo4j --from-path=[path to database dump directory] --overwrite-destination=true
- Set the correct permissions on /var/lib/neo4j/data:
cd /var/lib/neo4j
chown -R neo4j:neo4j data
- Start Neo4j Community Edition:
systemctl start neo4j.service
- Check the log for any errors, use one of:
systemctl -l status neo4j.service
journalctl -u neo4j.service
- In your web browser, go to http://localhost:7474/browser.
- Neo4j will ask you to log in; use username neo4j and password neo4j.
- Neo4j will ask you to change your password. For the new password, enter the password you have specified in the Ricgraph initialization file (this saves you from entering a new password in that file).
- Restart Ricgraph Explorer if you use Ricgraph in a multi-user environment:
systemctl restart ricgraph_explorer_gunicorn.service
- Check the log for any errors, use one of:
systemctl -l status ricgraph_explorer_gunicorn.service
journalctl -u ricgraph_explorer_gunicorn.service
- Done. If all works well you might want to remove your old database:
cd /var/lib/neo4j
rm -r data-old
- Exit from user root.
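A common slip in the restore step is passing the dump file itself instead of its directory to --from-path. The following minimal sketch guards against that by stripping a trailing file name ending in .dump before building the load command. It only constructs the argument list with the options shown in the steps above and does not run neo4j-admin. The function names and the /backup paths are hypothetical.

```python
import shlex
from pathlib import PurePosixPath


def from_path_argument(path):
    """Return the directory for --from-path, even if a dump file was given."""
    candidate = PurePosixPath(path)
    # neo4j-admin expects the directory; drop the file name if one was supplied.
    if candidate.suffix == ".dump":
        return str(candidate.parent)
    return str(candidate)


def build_load_command(path, database="neo4j"):
    """Return the neo4j-admin load command as an argument list (nothing is run)."""
    return [
        "neo4j-admin", "database", "load", "--expand-commands", database,
        f"--from-path={from_path_argument(path)}",
        "--overwrite-destination=true",
    ]


if __name__ == "__main__":
    # Both calls print the same command, because the file name is stripped.
    print(shlex.join(build_load_command("/backup/neo4j.dump")))
    print(shlex.join(build_load_command("/backup")))
```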
To do this, you can either use the Ricgraph Makefile and execute command make restore_graphdb_neo4j_community, or follow the steps below.
To restore a Neo4j Community Edition database dump of Ricgraph in Neo4j Community Edition, follow these steps:
- Log in as user root.
- Stop Neo4j Community Edition:
systemctl stop neo4j.service
- To be able to restore a Neo4j database dump you need to set several permissions on /etc/neo4j:
chmod 640 /etc/neo4j/*
chmod 750 /etc/neo4j
- Save the old database:
cd /var/lib
mv neo4j/ neo4j-old
mkdir /var/lib/neo4j
- Go back to your working directory and restore the database dump. For [path to database dump directory], specify only the directory, not the directory plus the name of the database dump file (the name is inferred automatically by the neo4j-admin command):
cd
neo4j-admin database load --expand-commands system --from-path=[path to database dump directory] --overwrite-destination=true
neo4j-admin database load --expand-commands neo4j --from-path=[path to database dump directory] --overwrite-destination=true
- Set the correct permissions on /var/lib/neo4j:
cd /var/lib
chown -R neo4j:neo4j neo4j
- Start Neo4j Community Edition:
systemctl start neo4j.service
- Check the log for any errors, use one of:
systemctl -l status neo4j.service
journalctl -u neo4j.service
- Restart Ricgraph Explorer if you use Ricgraph in a multi-user environment:
systemctl restart ricgraph_explorer_gunicorn.service
- Check the log for any errors, use one of:
systemctl -l status ricgraph_explorer_gunicorn.service
journalctl -u ricgraph_explorer_gunicorn.service
- Done. If all works well you might want to remove your old database:
cd /var/lib
rm -r neo4j-old
- Exit from user root.
The easiest way to go is to Install and use Ricgraph in a container. This is relatively quick but it offers limited possibilities.
If you would like to go for a "full" install of Ricgraph on Windows using either Install and configure Ricgraph for a single user or Install and configure Ricgraph as a server, you are, as far as is known, very probably the first person to do so. The creator of Ricgraph has no experience in developing software on Windows. So please let me know which steps you have taken, so I can add them to this documentation. If you are a Windows user, I recommend creating a Linux virtual machine using e.g. VirtualBox, and installing Ricgraph in that virtual machine as described above.