- 出错时,恢复性操作是从0开始,还是从上一次执行后的recovery_index_开始?遵循这么条规则:如果执行恢复性操作后,能够把之前导致出错的原因解决掉,下次就从0开始,否则从上一次执行后的recovery_index_开始。
一、状态和出错原因
1.1 状态
move_base节点有3种状态:PLANNING、CONTROLLING和CLEARING。一旦初始化后,过程中会执行的就是收到sendGoal时的executeCB,总是在导航中。

- PLANNING。该阶段要调用makePlan规划全局路径。规划出路径后进入CONTROLLING。如果长时间规划失败,进入CLEARING(PLANNING_R)。
- CONTROLLING。已获得了全局路径,一次又一次进行着局部规划,根据得出的速度移动机器人,直到到达goal。如果出现长时间局部规划失败(CONTROLLING_R),或长时间机器人不动(OSCILLATION_R),进入CLEARING。
- CLEARING。导航卡住了,试着执行恢复性操作。操作执行完后转入PLANNING。
看下导航过程没发生问题时的状态变换。
- (主线程)接收到sendGoal请求,状态进入PLANNING。
- (planThread)执行makePlan规划全局路径。规划出路径后,状态进入CONTROLLING
- (主线程)局部代价地图算出一个最优cmd_vel,按这速度移动机器人。
- (主线程)稍做等待,用tc_->isGoalReached判断是否到goal,如果没有,转到步骤3。如果到达了,此次导航结束。
如果一切正常,不会进入CLEARING,要进入CLEARING是导航出问题了,要进行容错处理。简单来说,机器人在哪里卡住了,要根据经验试着让机器人N种恢复性操作。这N种操作不是每次出错就全做,而是只做一种。举个例子,第一次出错时,做第一种恢复操作;第二次出错时,做第二种恢复操作,如果N种做完了依旧失败,那只能中止导航。每做完一种恢复性操作后,状态一律进入PLANNING。
1.2 出错原因
move_base把出错原因归为三种:PLANNING_R、CONTROLLING_R和OSCILLATION_R,下表是这三种原因各自涉及到的设置参数。
参数 | 原因 | 目前值 | 描述 |
planner_patience_ | PLANNING_R | 2(秒)[注1] | 一旦累计已花在全局路径规划时间超过3秒,进入CLEARING状态 |
max_planning_retries_ | PLANNING_R | -1(次)[注2] | 一旦全局路径规划失败次数超过uint32_t(-1)次,进入CLEARING状态。 |
controller_patience_ | CONTROLLING_R | 3.0(秒) | 一旦累计已花在局部规划时间超过3秒,进入CLEARING状态 |
oscillation_distance_ | OSCILLATION_R | 0.2(米) | 两次位置差值>=20厘米,认为机器人出现了不是抖动的移动 |
oscillation_timeout_ | OSCILLATION_R | 10.0(秒) | 一旦机器人已有超过10秒没动过,进入CLEARING状态 |
- 注1:一旦全局路径规划失败,在短时间内,很少能变成成功。当然有这么种可能,起点和目标点之间存在走来走去的人,恰好走开关键位置了,使变得成功。结合恢复性操作trose_recovery,那里会把代价地图设为默认值,能缓解这情况。为减少溢出时间,有些厂家是5秒,这里降到2秒。
- 注2:由于uint32_t(-1)是个极大值,现实中不会出现超过这么多的失败次数,即目前对全局路径规划失败导致的中止只考虑时间因素。
执行完所有恢复性操作后,还是不行,要结束此次导航。不论什么原因,result是“因失败而中止:setAborted”。
PLANNING_R:一直规划不出全局路径
- last_valid_plan_记为最近一次规划出全局路径时刻。
- (planThread)此次makePlan规划全局路径的失败。
- (planThread)累计已花在规划全局路径时间是否超过5秒,如果没有,转到步骤1,继续规划。否则进入CLEARING状态。recovery_trigger_ = PLANNING_R。
CONTROLLING_R:全局路径是规划出来了,但规划本地路径失败
想像机器人处在这么个空间,它到goal可以有数条路径,有很很窄的路径a到gobal最短,于是全局规划时选了这条路径,因为全局规划时不考虑机器人足迹。但由于这路很窄,一到局部代价地图,考虑机器人足迹后,本地规划的失败。
- last_valid_control_记为最近一次规划出本地路径时刻。
- (主线程)tc_->computeVelocityCommands(cmd_vel),执行本地规划,失败。
- (主线程)判断本地规划累计失败时长是否>=controller_patience_秒,如果没有,进入步骤4。否则进入步骤5。
- (主线程)进入PLANNING状态,再次执行全局路径规划。规划出一条全局路径后,进入步骤2。
- (主线程)超过controller_patience_秒,本地规划都是失败,进入CLEARING状态。recovery_trigger_ = CONTROLLING_R。
OSCILLATION_R:机器人在来回摆动,长时间被困在一小片区域,包括就是不动,像轮子坏了
move_base认为,如果两次位置差值<oscillation_distance_,那就是没有移动。目前oscillation_distance_取值是0.2,即20厘米。
- last_oscillation_reset_记为最近一次机器人有过移动的时刻。
- (主线程)每次executeCycle时间片,检查机器人是否发生了移动。
- (主线程)累计已超过oscillation_timeout_秒都没有移动吗?如果没有,啥也不做。否则进入CLEARING状态。recovery_trigger_ = OSCILLATION_R。
1.3 planThread
planThread是planThread线程的处理例程。
void trose_slice::planThread(bool& exit) { ROS_DEBUG_NAMED("move_base_plan_thread","Starting planner thread..."); // ros::NodeHandle n; // n.setCallbackQueue(&cbqueue_); ros::Timer timer; while (!exit) { // check if we should run the planner (the mutex is locked) if (!runPlanner_) { SDL_Delay(10); continue; } if (exit) { break; } ros::Time start_time = ros::Time::now(); geometry_msgs::PoseStamped temp_goal; { threading::lock lock(planner_mutex_); // time to plan! get a copy of the goal and unlock the mutex temp_goal = planner_goal_; } ROS_DEBUG_NAMED("move_base_plan_thread","Planning..."); //run planner planner_plan_->clear(); bool gotPlan = makePlan(temp_goal, *planner_plan_); if (gotPlan) { ROS_DEBUG_NAMED("move_base_plan_thread","Got Plan with %zu points!", planner_plan_->size()); //pointer swap the plans under mutex (the controller will pull from latest_plan_) std::vector<geometry_msgs::PoseStamped>* temp_plan = planner_plan_; threading::lock lock(planner_mutex_); planner_plan_ = latest_plan_; latest_plan_ = temp_plan; last_valid_plan_ = ros::Time::now(); planning_retries_ = 0; new_global_plan_ = true;
{PLANNING_R[1/3]}此次规划全局路径成功,last_valid_plan_记住这个时刻。
ROS_DEBUG_NAMED("move_base_plan_thread","Generated a plan from the base_global_planner"); // make sure we only start the controller if we still haven't reached the goal if (runPlanner_) { state_ = CONTROLLING; } runPlanner_ = false; } // if we didn't get a plan and we are in the planning state (the robot isn't moving) else if (state_ == PLANNING) {
{PLANNING_R[2/3]}此次规划全局路径失败。
ROS_DEBUG_NAMED("move_base_plan_thread","No Plan..."); ros::Time attempt_end = last_valid_plan_ + ros::Duration(planner_patience_); //check if we've tried to make a plan for over our time limit or our maximum number of retries //issue #496: we stop planning when one of the conditions is true, but if max_planning_retries_ //is negative (the default), it is just ignored and we have the same behavior as ever threading::lock lock(planner_mutex_); planning_retries_++; if (runPlanner_ && (ros::Time::now() > attempt_end || planning_retries_ > uint32_t(max_planning_retries_))){ //we'll move into our obstacle clearing mode state_ = CLEARING; runPlanner_ = false; // proper solution for issue #523 publishZeroVelocity(); recovery_trigger_ = PLANNING_R;
{PLANNING_R[3/3]}累计失败次数或时长溢出。
规划全局路径失败,是否要进入CLEARING,取决于是否满足两个条件中一个。
- 根据last_valid_plan_判断累计已花在规划全局路径时间,一旦超过planner_patience_(5秒),进入CLEARING。
- 判断累计已规划全局路径失败次数planning_retries_,如果超过max_planning_retries_,进入CLEARING。
在目前设置中planner_patience_=5,planning_retries_= -1,由于-1转到uin32_t就是最大值,不可能满足这条件,是否进入CLEARING,只取消决于累计已花在规划全局路径时间是否超过5秒。
} } } }
1.4 executeCycle
executeCycle是个时间片函数。对move_base,每隔(1.0/controller_frequency_)秒会执行一次。对它的返回值,true表示此次executeCycle结束了此次导航,结果可能是成功,可能失败。
bool trose_slice::executeCycle(const geometry_msgs::PoseStamped& goal) { // boost::recursive_mutex::scoped_lock ecl(configuration_mutex_); //we need to be able to publish velocity commands geometry_msgs::Twist cmd_vel; //update feedback to correspond to our curent position geometry_msgs::PoseStamped global_pose; planner_costmap_ros_->getRobotPose(global_pose); const geometry_msgs::PoseStamped& current_position = global_pose; // push the feedback out // move_base_msgs::MoveBaseFeedback feedback; // feedback.base_position = current_position; // as_->publishFeedback(feedback); //check to see if we've moved far enough to reset our oscillation timeout if (distance(current_position, oscillation_pose_) >= oscillation_distance_) { last_oscillation_reset_ = ros::Time::now(); oscillation_pose_ = current_position;
{OSCILLATION_R[1/3]}此次检查发现机器人有过移动,last_oscillation_reset_记住这个时刻。
//if our last recovery was caused by oscillation, we want to reset the recovery index if (recovery_trigger_ == OSCILLATION_R) { recovery_index_ = 0; } } //check that the observation buffers for the costmap are current, we don't want to drive blind if (!controller_costmap_ros_->isCurrent()){ publishZeroVelocity(); return false; } // if we have a new plan then grab it and give it to the controller if (new_global_plan_) { //make sure to set the new plan flag to false new_global_plan_ = false; ROS_DEBUG_NAMED("move_base","Got a new plan...swap pointers"); //do a pointer swap under mutex std::vector<geometry_msgs::PoseStamped>* temp_plan = controller_plan_; { threading::lock lock(planner_mutex_); controller_plan_ = latest_plan_; latest_plan_ = temp_plan; } ROS_DEBUG_NAMED("move_base","pointers swapped!");
planThread成功规划出一条全局路径。把这全局路径改存到主线程能自由使用的controller_plan_。
if (!tc_->setPlan(*controller_plan_)) { // ABORT and SHUTDOWN COSTMAPS ROS_ERROR("Failed to pass global plan to the controller, aborting."); resetState(); { //disable the planner thread threading::lock lock(planner_mutex_); runPlanner_ = false; }
这里为什么会失败?
这里发生的错误认为是无法拯救,结束此次导航,result是结束之二“因失败而中止:setAborted”。
as_setAborted("Failed to pass global plan to the controller."); return true; } //make sure to reset recovery_index_ since we were able to find a valid plan if (recovery_trigger_ == PLANNING_R) { recovery_index_ = 0; } } //the move_base state machine, handles the control logic for navigation switch (state_) { // if we are in a planning state, then we'll attempt to make a plan case PLANNING: { threading::lock lock(planner_mutex_); runPlanner_ = true; // planner_cond_.notify_one(); } ROS_DEBUG_NAMED("move_base","Waiting for plan, in the planning state."); break; //if we're controlling, we'll attempt to find valid velocity commands case CONTROLLING: ROS_DEBUG_NAMED("move_base","In controlling state."); //check to see if we've reached our goal if(tc_->isGoalReached()){ ROS_DEBUG_NAMED("move_base","Goal reached!"); resetState(); { //disable the planner thread threading::lock lock(planner_mutex_); runPlanner_ = false; } as_setSucceeded("Goal reached."); return true; } //check for an oscillation condition if (oscillation_timeout_ > 0.0 && last_oscillation_reset_ + ros::Duration(oscillation_timeout_) < ros::Time::now()) { SDL_Log("%u, {leagor-verbose}check for an oscillation condition", SDL_GetTicks()); publishZeroVelocity(); state_ = CLEARING; recovery_trigger_ = OSCILLATION_R;
{OSCILLATION_R[3/3]}超过oscillation_timeout_秒都没有移动,进入CLEARING状态。。
} { boost::unique_lock<costmap_2d::Costmap2D::mutex_t> lock(*(controller_costmap_ros_->getCostmap()->getMutex())); if (tc_->computeVelocityCommands(cmd_vel)) {
{CONTROLLING_R[1/5]}此次在功规划出本地路径,last_valid_control_记住这个时刻。
ROS_DEBUG_NAMED( "move_base", "Got a valid command from the local planner: %.3lf, %.3lf, %.3lf", cmd_vel.linear.x, cmd_vel.linear.y, cmd_vel.angular.z ); last_valid_control_ = ros::Time::now(); //make sure that we send the velocity command to the base vel_pub_.publish(cmd_vel); if (recovery_trigger_ == CONTROLLING_R) { recovery_index_ = 0; } } else {
{CONTROLLING_R[2/5]}执行本地规划,失败。
ROS_DEBUG_NAMED("move_base", "The local planner could not find a valid plan."); ros::Time attempt_end = last_valid_control_ + ros::Duration(controller_patience_); //check if we've tried to find a valid control for longer than our time limit if (ros::Time::now() > attempt_end) {
{CONTROLLING_R[5/5]}本地规划失败累计时长超过controller_patience_秒,进入CLEARING状态。
//we'll move into our obstacle clearing mode publishZeroVelocity(); state_ = CLEARING; recovery_trigger_ = CONTROLLING_R; } else{
{CONTROLLING_R[4/5]}本地规划失败累计时长<=controller_patience_秒。进入PLANNING状态,再次让执行全局路径规划。
//otherwise, if we can't find a valid control, we'll go back to planning last_valid_plan_ = ros::Time::now(); planning_retries_ = 0; state_ = PLANNING; publishZeroVelocity(); { threading::lock lock(planner_mutex_); runPlanner_ = true; // planner_cond_.notify_one(); } } } } break; //we'll try to clear out space with any user-provided recovery behaviors case CLEARING: ROS_DEBUG_NAMED("move_base","In clearing/recovery state"); //we'll invoke whatever recovery behavior we're currently on if they're enabled if (recovery_behavior_enabled_ && recovery_index_ < recovery_behaviors_.size()){ move_base_msgs::RecoveryStatus msg; msg.pose_stamped = current_position; msg.current_recovery_number = recovery_index_; msg.total_number_of_recoveries = recovery_behaviors_.size(); msg.recovery_behavior_name = recovery_behavior_names_[recovery_index_]; recovery_status_pub_.publish(msg); recovery_behaviors_[recovery_index_]->runBehavior(); //we at least want to give the robot some time to stop oscillating after executing the behavior last_oscillation_reset_ = ros::Time::now(); //we'll check if the recovery behavior actually worked ROS_DEBUG_NAMED("move_base_recovery","Going back to planning state"); last_valid_plan_ = ros::Time::now(); planning_retries_ = 0; state_ = PLANNING;
尝试第recovery_index_种恢复性操作。无论执行那种恢复性操作,状态都进入PLANNING。
{PLANNING_R[1/3]}已做了恢复性操作,让此刻成为累计多久没成功规划全局路径时刻,last_valid_plan_记住这个时刻,
//update the index of the next recovery behavior that we'll try recovery_index_++; } else {
做了所有恢复性操作后,仍旧失败。结束此次导航,result是结束之二“因失败而中止:setAborted”。
{ threading::lock lock(planner_mutex_); //disable the planner thread runPlanner_ = false; } ROS_DEBUG_NAMED("move_base_recovery","Something should abort after this."); if(recovery_trigger_ == CONTROLLING_R){ as_setAborted("Failed to find a valid control. Even after executing recovery behaviors."); } else if(recovery_trigger_ == PLANNING_R){ ROS_ERROR("Aborting because a valid plan could not be found. Even after executing all recovery behaviors"); as_setAborted("Failed to find a valid plan. Even after executing recovery behaviors."); } else if(recovery_trigger_ == OSCILLATION_R){ as_setAborted("Robot is oscillating. Even after executing recovery behaviors."); } resetState(); return true; } break; default:
程序就不应该让进入这里。
ROS_ERROR("This case should never be reached, something is wrong, aborting"); resetState(); { threading::lock lock(planner_mutex_); //disable the planner thread runPlanner_ = false; } as_setAborted("Reached a case that should not be hit in move_base. This is a bug, please report it."); return true; } // we aren't done yet return false; }
二、三个全局路径:planner_plan_、latest_plan_、controller_plan_
std::vector<geometry_msgs::PoseStamped>* planner_plan_; std::vector<geometry_msgs::PoseStamped>* latest_plan_; std::vector<geometry_msgs::PoseStamped>* controller_plan_;
planner_plan_、latest_plan_、controller_plan_都用存储全局路径。为什么用指针,而不是std::vector对象?——过程中要对它们互换赋值,像planner_plan_和latest_plan_互换,用指针可以节省内存、cpu。它们生命期是和MoveBase对象一样。
在大致次序上,makePlan得到的结果赋给planner_plan_,但立即就赋给latest_plan_,latest_plan_则给了planner_plan_(我认主这个赋值可以省略)。“plan”topic发布的是makePlan得到的planner_plan_,但由于很快给了latest_plan_,可直观认为对应的是latst_plan_。在进入executeCycle时,再latest_plan_互换controller_plan_,最后进入BaseLocalPlanner的是controller_plan_。
latest_plan_表示上一次全局规划(makePkan)时规划出的路径。功能是用于同步。在planThread线程执行的makePlan需要点时间,过程中要一直接占用planner_plan_,要是主线程也要读planner_plan_的话,不方便线程同步。
- (planThread线程)调用makePlan规划路径,结果存放在planner_plan_。
- (planThread线程)获取路径锁(planner_mutex_),把planner_plan_赋值给latest_plan_,并把new_global_plan_设为true。释放路径锁,
- (主线程)发现new_global_plan_是true。获取路径锁,new_global_plan_设为false,把latest_plan_赋给值controller_plan_,释放路径锁。
有了latest_plan_后,planThread线程可自由使用planner_plan_,主线程可自由使用controller_plan_,controller_plan_就最近一次makePlan规划出的全局路径。
三、处理结束导航
结束导航包括两个操作,一是执行resetState(),二是向result话题发布此次导航结果。
resetState做下面操作。
- runPlanner_ = false。命令规划线程不再执行规划任务。
- 把一些变量复位到初始值。当中一个是状态恢复到PLANNING。
- 向机器人发布速度0,让机器人停下不动。
向result话题发布此次导航结果,状态码有三类。
result#1:导航成功(setSucceeded)
机器人成功移动到goal。
result#2:因失败而中止(setAborted)
哪里卡住了,尝试恢复后也没解决,只好中止,导航失败。
result#3:取消(setPreempted)
这指的是client发送了cancelGoal命令。
在move_base,当client发送了cancelGoal时,它进入的相关代码。
if (as_->isPreemptRequested()) { // 这段代码位在executeCb,在as_认为,上一次的sendGoal还没处理完。这次又来cancelGoal/sendGoal,认为是取代(Preempt)。 if (as_->isNewGoalAvailable()) { // sendGoal进入这里 } else { // cancelGoal进入这里 as_->setPreempted(); }
四、恢复性操作
多个恢复性操作被组织成一个数组recovery_behaviors_,从索引0开始,不断被执行。如果所有恢复性操作执行完了,仍旧出错,那此次导航失败。recovery_index_对应着这个数级的索引,用于指示一旦要发生恢复性操作,要执行的第几个。
这有个问题,下次出错时,恢复性操作是从0开始,还是从上一次执行后的recovery_index_开始?——两个都有可能。遵循这么条规则:如果执行恢复性操作后,能够把之前导致出错的原因解决掉,下次就从0开始,否则从上一次执行后的recovery_index_开始。举个例子,recovery_index_正等于1,这次出错原因是CONTROLLING_R,也就是机器人来回摆动。如果执行recovery_index_[1]这个恢复性操作后,机器人成功移出一段距离,不来回摆动了,下次再执行恢复性操作时,recovery_index_将从0开始,否则将从2开始(上次执行恢复性操作后,recovery_index_被+1到2)。
五、MoveBase析构函数
析构MoveBase时,须注意销毁内中对象次序。个人认为官方~MoveBase是有问题的,当正处于恢复性操作RotateRecovery时,极可能导致程序崩溃。
MoveBase::~MoveBase(){ // 1. make sure exit executeCb and don't execute it again. recovery_index_ = (int)recovery_behaviors_.size(); planner_patience_ = 0.0; if (as_ != NULL) { delete as_; } recovery_behaviors_.clear();
首先要保证退出executeCB,并保证不会再被执行。
- recovery_index_ = (int)recovery_behaviors_.size()。recovery_behaviors_中存储着要依次执行的恢复性操作,极可能是多个操作,只有这些操作都被执行了,executeCb才会认为导航需要中止。这个赋值保证了,executeCb执行完正在运行的恢复性操作后,只要再次进入CLEARING,就中止导航。
- planner_patience_ = 0.0。如果executeCb当前正在执行恢复性操作,那完成这个操作后,便会让状态进入PLANNING。这个从5到0的修改让planThread线程只要一次makePlan失败,就进入CLEARING,不必等以往的5秒溢出。这里不去改max_planning_retries_,是考虑到一般不会用makePlan失败次数作为进入CLEARING条件。另外,也不是runPlanner_ = false,是因为如果当前正在执行恢复性操作的话,完成后,executeCb会把runPlanner_ = true,那上面修改就没用。
- delete as_。executeCB是在as_内部的ActionServerThread线程执行。销毁它,保证executeCB不会被调用。
- recovery_behaviors_.clear()。销毁要使用的恢复性操作对象,清空recovery_behaviors_。
// 2. dynamic_reconfigure::Server delete dsrv_;
dsrv_类型是dynamic_reconfigure::Server,我对动态配置了解不多。放这里是因为官方~MoveBase也是较早销毁它。
// 3. destroy planThread. // planThread use planner_costmap_ros_ during makePlan. delete planner_thread_;
销毁planThread线程。如果planThread正在执行makePlan,那要使用planner_costmap_ros_。所以必须放在销毁planner_costmap_ros_之前。
// 4. destroy gloal-costmap and local-costmap if (planner_costmap_ros_ != NULL) { delete planner_costmap_ros_; } if (controller_costmap_ros_ != NULL) { delete controller_costmap_ros_; }
executeCB已不会执行,planThread已销毁,可安全销毁两个代价地图了。
// 5. misc object delete planner_plan_; delete latest_plan_; delete controller_plan_; planner_.reset(); tc_.reset(); }
至于~MoveBase要执行多少时间。最坏情况正在执行恢复性操作RotateRecovery,恰好又是开始的话,得等5秒溢出,这时时间5秒多点。对其它情况,时间应该很短。