Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add [ros2 node kill <node_name>] and [ros2 node kill --all] (similar to [rosnode kill] from ros1) #287

Open
craigh92 opened this issue Jun 13, 2020 · 25 comments
Assignees

Comments

@craigh92
Copy link

Feature request

Feature description

ros1 had the ability to kill a node from the command line using rosnode kill <node_name>, or kill all nodes using rosnode kill -a. This would end the process running each ros node. It would be usefull to have this feature in ros2.

We currently have ros2 lifecycle set which can be used for a similar purpose, but this only works for ManagedNodes which is currently a c++ only feature. Even when this feature does come to python, not all nodes will be managed nodes so it makes sence to have a simpler kill utility that doesn't come with the complexity of the lifecycle states.

Implementation considerations

As far as I can tell, there is no way of doing this through the rclpy, rmw, or rclcpp API, so changes may need to be made in other ros2 projects to make this possible.

As a test I created new workspace ros2kill_ws with two packages, test_py_pkg and test_cpp_pkg, each containing a minimal publisher written in the corresponding language. I then created a python launch file that launches these nodes, and launched it with ros2 launch -a launch/launch.py. After investigating with htop, I found this created 3 new processes:

/usr/bin/python3 ~/ros2_foxy/install/ros2cli/bin/ros2 launch -a launch/launch.py
/usr/bin/python3 ~/ros2kill_ws/install/test_py_pkg/lib/test_py_pkg/py_node --ros-args
~/ros2kill_ws/install/test_cpp_pkg/lib/test_cpp_pkg/cpp_node --ros-args

If the nodes are instead started with ros2 run, there is no third process that hangs around, and they do not have the --ros-args argument passed to them.

So maybe it would be possible to kill the nodes with the following method:

  • Use ros2 pkg prefix <pkg_name> to get the path to where the package is
  • Use ros2 node list -t to get the currently running nodes and the name of the executable for that node
    • (the -t option doesn't exist yet, but could be added)
      • It would use a similar method to ros2 pkg executables to get the executable names
  • Use this information to generate a killall command on linux, or taskkill command on windows.
    • Or use a better more cross-platform method to kill the processes
  • ros2 node kill would then trigger this trough a sys call.
@craigh92 craigh92 changed the title Add ros2 node kill <node_name> and ros2 node kill --all (similar to rosnode kill from ros1) Add [ros2 node kill <node_name>] and [ros2 node kill --all] (similar to [rosnode kill] from ros1) Jun 13, 2020
@sloretz sloretz transferred this issue from ros2/ros2cli Jun 25, 2020
@sloretz
Copy link
Contributor

sloretz commented Jun 25, 2020

As far as I can tell, there is no way of doing this through the rclpy, rmw, or rclcpp API, so changes may need to be made in other ros2 projects to make this possible.

I moved this issue to the design repo because it seems like it would be beneficial to write a design document first. This feature could touch a lot of things.

ros1 had the ability to kill a node from the command line using rosnode kill <node_name>, or kill all nodes using rosnode kill -a. This would end the process running each ros node. It would be usefull to have this feature in ros2.

ROS 1 also only had one node per process. There are a few more cases to consider in ROS 2:

  • What to do if two nodes are manually composed in a process? It would be undesirable to kill one node and have the whole process die.
  • What if the node was dynamically composed in a process? It would be undesirable to kill the container and have every node inside of it die.
  • What if a single node is in a single process, but that process is on a different machine?

@craigh92
Copy link
Author

craigh92 commented Jul 1, 2020

I moved this issue to the design repo because it seems like it would be beneficial to write a design document first. This feature could touch a lot of things.

I'm guessing I just follow these instructions to do that? I will try and convince the powers that be at my workplace that this is worth the effort (we are using ros2 in our project), and if successful I will brainstorm with a few other ros users at my workplace and then put this together. If not I may be able to do this in my own time but I will be considerably slower.

What to do if two nodes are manually composed in a process? It would be undesirable to kill one node and have the whole process die.

How about instead of ros2 node kill, we add a new command to the CLI ros2 process, and one of the verbs for this command is ros2 process kill. If users want individual control over a node in a process with multiple nodes they will have to use the ManagedNode and lifecycle tools. As I mentioned earlier this is not yet available in the python API but I believe they will eventually come.

This utility then just serves as a simpler way to manage the processes without the boilerplate of using a ManagedNode (and for the cases where you are not using a ManagedNode).

What if the node was dynamically composed in a process? It would be undesirable to kill the container and have every node inside of it die.

Sometimes this is desirable. For example, if you encounter a bug and just need to restart the process. If using launch files this is currently difficult as you do not have an easy handle to each process. Again, if the user wants more fine-grained control they should use a ManagedNode.

What if a single node is in a single process, but that process is on a different machine?

For ros2 process kill, then the intuitive behaviour would be for it to only work on the machine running the process. Perhaps an optional argument could be added to find and kill remote processes, but this could be a separate feature and out of scope for this design.

@hidmic
Copy link
Contributor

hidmic commented Jul 1, 2020

I can see value in a simpler ros2 process kill tool, which is clearly meant to support debugging.

@craigh92
Copy link
Author

craigh92 commented Jul 2, 2020

@sloretz Do you think ros2 process kill requires a design document? Or since it is smaller scope should I just reopen a new issue in the ros2/rcli repo, and close this issue?

@gbiggs
Copy link
Member

gbiggs commented Jul 4, 2020

I think that directly managing system processes via ros2 is a dangerous game. Trying to interact with running nodes via the ros2 node command is more reasonable, but probably when you're wanting to kill a node it's already not responding to anything over the ROS interfaces, and unfortunately ros2 node doesn't know anything about the PID of the node.

There are two situations, I think:

  1. Nodes launched via ros2 run
  2. Nodes launched via the launch system

In the case of 1., you have direct access to the process that started the node via the shell. It should be trivial to force it to stop, so I don't think a new command is needed in this situation.

Number 2 is the situation that is relevant to this discussion.

I think that this situation is best resolved by making the launch system more capable for managing running systems, not just starting them. This is something I've wanted to do for quite a while: build a full-fledged system orchestrator that provides a runtime interface to managing a complete system, or even individual sub-systems, with ros2 launch becoming the interface for management commands. Most of that is beyond the scope of this issue, probably, but I can imagine that one of those commands could be to kill a container.

Currently if you use Python-based launch scripts, rather than the XML interface, you could build a way into your launch scripts to let you respond to keyboard events that kill individual nodes. It would be very custom and probably very hacky, but it would work.

@craigh92
Copy link
Author

craigh92 commented Jul 6, 2020

Number 2 is the situation that is relevant to this discussion.

Yep. This is what prompted me to make this issue. I would like to be able to start ros2 launch in the background and not have to keep hold of it's PID in my bash session. Also, sending a SIGINT to the launch process sometimes killed all of the nodes, and sometimes didn't. It just wasn't a very friendly way of process orchestration.

I think that this situation is best resolved by making the launch system more capable for managing running systems, not just starting them.

I agree. Is it possible for the launch system to add services to the nodes that it launches? If it is, one way of doing this could be that every node launched with ros2 launch has a service added to it that calls self.executor.shutdown when triggered. The launch command could also start an extra node that hosts a service that can shut down all nodes (by calling the aforementioned service of each). This behavior could be optionally added using a flag, e.g ros2 launch --killable.

@clalancette
Copy link
Contributor

I think that this situation is best resolved by making the launch system more capable for managing running systems, not just starting them. This is something I've wanted to do for quite a while: build a full-fledged system orchestrator that provides a runtime interface to managing a complete system, or even individual sub-systems, with ros2 launch becoming the interface for management commands.

While I agree with the general desire, I'll strongly recommend looking into accomplishing this by hooking into existing process monitoring mechanisms (namely systemd on Linux, launchd on macOS, and whatever Windows uses for services). There are a lot of corner cases here, and it would be a real shame to essentially reimplement those system services in launch again.

@gbiggs
Copy link
Member

gbiggs commented Jul 8, 2020

That is the way I would go about it, yes.

@suryajayaraman
Copy link

I think that directly managing system processes via ros2 is a dangerous game. Trying to interact with running nodes via the ros2 node command is more reasonable, but probably when you're wanting to kill a node it's already not responding to anything over the ROS interfaces, and unfortunately ros2 node doesn't know anything about the PID of the node.

There are two situations, I think:

  1. Nodes launched via ros2 run
  2. Nodes launched via the launch system

In the case of 1., you have direct access to the process that started the node via the shell. It should be trivial to force it to stop, so I don't think a new command is needed in this situation.

Number 2 is the situation that is relevant to this discussion.

I think that this situation is best resolved by making the launch system more capable for managing running systems, not just starting them. This is something I've wanted to do for quite a while: build a full-fledged system orchestrator that provides a runtime interface to managing a complete system, or even individual sub-systems, with ros2 launch becoming the interface for management commands. Most of that is beyond the scope of this issue, probably, but I can imagine that one of those commands could be to kill a container.

Currently if you use Python-based launch scripts, rather than the XML interface, you could build a way into your launch scripts to let you respond to keyboard events that kill individual nodes. It would be very custom and probably very hacky, but it would work.

Hi, sorry if I sound like a noob. I'm new to this stuff. For the 1st case, what is the recommended way to kill such node (started with ros2 run). Should I use subprocess popen to get pid of the initiated thread and kill it subsequently? If so, can someone give example for killing a ROS2 executable (single node) started with such method?

Thanks in advance.

@fujitatomoya
Copy link
Collaborator

the use cases that i think of,

  • kill specific node and restart. this happens during debugging.
  • kill all nodes, and fresh start.

and to support the 1st one to control each node instance, i think eventually we need to have something similar with ManagedNode to stop/start the node.

@fujitatomoya
Copy link
Collaborator

For the 1st case, what is the recommended way to kill such node (started with ros2 run). Should I use subprocess popen to get pid of the initiated thread and kill it subsequently?

i am not sure if i understand the question, how about the followng?

root@12956542b338:~/ros2_ws/colcon_ws# ros2 run demo_nodes_cpp talker
[INFO] [1627068976.974343304] [talker]: Publishing: 'Hello World: 1'
[INFO] [1627068977.974083996] [talker]: Publishing: 'Hello World: 2'
...
[ros2run]: Terminated

root@12956542b338:~/ros2_ws/colcon_ws# ps -ef | grep talker
root        5134    5066  0 12:36 pts/3    00:00:00 /usr/bin/python3 /root/ros2_ws/colcon_ws/install/ros2cli/bin/ros2 run demo_nodes_cpp talker
root        5135    5134  0 12:36 pts/3    00:00:00 /root/ros2_ws/colcon_ws/install/demo_nodes_cpp/lib/demo_nodes_cpp/talker

root@12956542b338:~/ros2_ws/colcon_ws# kill 5135

@suryajayaraman
Copy link

Sorry @fujitatomoya for the late reply. Thanks for sharing your solution. I found a similar solution which uses the built-in pidof function to get the pid of talker and then kill the process using the kill command. I was able to make this work using 'popen function'

c++ code reference : stackoverflow answer

std::string exec(const char* cmd)
 {
    std::array<char, 128> buffer;
    std::string result;

    auto pipe = popen(cmd, "r"); // get rid of shared_ptr

    if (!pipe) throw std::runtime_error("popen() failed!");

    while (!feof(pipe)) {
        if (fgets(buffer.data(), 128, pipe) != nullptr)
            result += buffer.data();
    }

    auto rc = pclose(pipe);

    if (rc == EXIT_SUCCESS) { // == 0
    } 
   else if (rc == EXIT_FAILURE) {  // EXIT_FAILURE is not used by all programs, maybe needs some adaptation.

    }
    return result;
}

I was wondering the performance comparison between this approach and a ROS2 lifecycle node. From my understanding, from efficiency perspective, the overhead of killing a process and associated resources is much greater compared to how ROS2 manages states of lifecycle node. Please correct if I'm wrong in my assumptions.

Thanks in advance.

@fujitatomoya
Copy link
Collaborator

@suryajayaraman

ROS 2 Managed Node is not to manage the process, but ROS 2 Node. Process space can be composed with multiple ROS 2 Node.

@suryajayaraman
Copy link

@suryajayaraman

ROS 2 Managed Node is not to manage the process, but ROS 2 Node. Process space can be composed with multiple ROS 2 Node.

Sorry, I jumped the gun. Now I get your point. Thanks for the clarifiation

@jonnymoshiri
Copy link

Hi, @craigh92 and @gbiggs and @suryajayaraman . I too see desire for a ros 2 process kill node for use cases. IMO lifecycle nodes are not a beginner level topic and for that reason maybe have not yet translated to some projects. Also regarding:

There are two situations, I think:

  1. Nodes launched via ros2 run
  2. Nodes launched via the launch system

I have actually found a use case for #1 . Some nodes when launched create PID's outside of the most recent terminal command. For example:

ros2 runimage_view video_recorder
echo $!

The returned PID number, is one of several processes spun up by this exe. And therefore a simple o.s. level:

kill $!

does not capture required pid's and opens up for additional complexities.

@Gray-Stone
Copy link

Gray-Stone commented Jan 16, 2024

I would like to add my two cent here.

Killing the process through ps | grep ros or other generic process searching method is not ideal. This involves the risk of killing a wrong process that happen to have ros in the middle of its name.

My argument of having the kill feature in ros2 command is :

  • If a process can be found using ros2 node list, the same method should be able to find out the PID info about it, or if the node doesn't belong to localhost.
  • With the PID, the kill feature could be easily added.

@fujitatomoya
Copy link
Collaborator

Killing the process through ps | grep ros or other generic process searching method is not ideal.

no, it is not. i just showed what we can do at this moment, #287 (comment)

With the PID, the kill feature could be easily added.

we can do this, but that comes back to the question #287 (comment)

i think at lease user understands the which nodes are going to be destroyed if we can kill the process ID.

@Gray-Stone
Copy link

i think at lease user understands the which nodes are going to be destroyed if we can kill the process ID.
Agree, it does have these question to be answered before implementing the ros2 node kill

I think a quick alternative is to provided the PID s of each node. User can using these to figure out what to do if nodes are in the same process (same PID), node on other machine: no PID or PID with hostname.

@fujitatomoya
Copy link
Collaborator

you are welcome to create PR based on that, probably we might need to add rcl and rmw interfaces for doing that.

besides that, i believe PID is not really unique identification.
for example, what if we have the container running with the different pid namespace from host system? there would be the same PID but different process space. (Similar problem actually happened before with DDS GUID, now it has been addressed with adding random number field.)

i am more inclined to consider the user experience with ros2 node kill with <node name> or/with --all, --dry-run --force?
showing PID and user is responsible to figure out where to log in, which nodes are affected... does not look useful and does not scale as distributed system.

that is just my opinion, i would like to hear more.

@clalancette
Copy link
Contributor

you are welcome to create PR based on that, probably we might need to add rcl and rmw interfaces for doing that.

I think we should come to agreement here in the design before we start opening pull requests. Otherwise we may end up with only partial solutions.

@aarsht7
Copy link

aarsht7 commented Feb 2, 2024

In the case of 1., you have direct access to the process that started the node via the shell. It should be trivial to force it to stop, so I don't think a new command is needed in this situation.

I don't see any other approach than PID to kill the node. I recently encountered the issue... for example, I use docker and don't want to open a new instance (terminal) each time I want to run the node. Instead, I prefer to use the same terminal to run multiple nodes (can't use launch because of some reasons) or even to check the other things while a node is running in the background and node doesn't have a launch file associated with it.
If I run node using a command like this ros2 run turtlesim turtlesim_node & (note the & in the end)
I can't ctrl+c it as it is no longer running in the current terminal after the starting and I am using the same terminal to observe the data from the process.

@clalancette
Copy link
Contributor

I can't ctrl+c it as it is no longer running in the current terminal after the starting and I am using the same terminal to observe the data from the process.

That shouldn't be the case. Putting an & at the end of a command just runs it in the background; it is still attached to the terminal. You should be able to run fg to bring it to the foreground, then kill it.

I don't see any other approach than PID to kill the node.

The big problem is that in ROS 2 a node does not equal a PID. You can have many nodes inside of a process, so running ros2 node kill, in my opinion, should not kill them all.

@aarsht7
Copy link

aarsht7 commented Feb 2, 2024

You should be able to run fg to bring it to the foreground, then kill it.

Ah, didn't know fg. Thank you for the info.

in my opinion, should not kill them all

Yes, it shouldn't.

But we do need ros2 node kill in some sense. Maybe it can be started with the killing PID and if some new idea comes up, it can be replaced later.

@bmagyar
Copy link

bmagyar commented Oct 17, 2024

How about ros2 node kill --all being the only option in the beginning? I think that would be the most common use case for people who had something go completely flop on them.

Example: me with a broken launch-testing setup discovering 30 minutes into another debugging session that I have 20 zombie nodes running in the background still.

@ruffsl
Copy link
Member

ruffsl commented Dec 17, 2024

Example: me with a broken launch-testing setup discovering 30 minutes into another debugging session that I have 20 zombie nodes running in the background still.

So true! Sometimes I find it simpler to restart my dev container rather than track down the zombie nodes that a crashed pytest launch leaves behind. Quite annoying.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests