move_base节点

出错时，恢复性操作是从0开始，还是从上一次执行后的recovery_index_开始？遵循这么条规则：如果执行恢复性操作后，能够把之前导致出错的原因解决掉，下次就从0开始，否则从上一次执行后的recovery_index_开始。

一、状态和出错原因

1.1 状态

move_base节点有3种状态：PLANNING、CONTROLLING和CLEARING。一旦初始化后，过程中会执行的就是收到sendGoal时的executeCB，总是在导航中。

PLANNING。该阶段要调用makePlan规划全局路径。规划出路径后进入CONTROLLING。如果长时间规划失败，进入CLEARING(PLANNING_R)。
CONTROLLING。已获得了全局路径，一次又一次进行着局部规划，根据得出的速度移动机器人，直到到达goal。如果出现长时间局部规划失败(CONTROLLING_R)，或长时间机器人不动(OSCILLATION_R)，进入CLEARING。
CLEARING。导航卡住了，试着执行恢复性操作。操作执行完后转入PLANNING。

看下导航过程没发生问题时的状态变换。

(主线程)接收到sendGoal请求，状态进入PLANNING。
(planThread)执行makePlan规划全局路径。规划出路径后，状态进入CONTROLLING
(主线程)局部代价地图算出一个最优cmd_vel，按这速度移动机器人。
(主线程)稍做等待，用tc_->isGoalReached判断是否到goal，如果没有，转到步骤3。如果到达了，此次导航结束。

如果一切正常，不会进入CLEARING，要进入CLEARING是导航出问题了，要进行容错处理。简单来说，机器人在哪里卡住了，要根据经验试着让机器人N种恢复性操作。这N种操作不是每次出错就全做，而是只做一种。举个例子，第一次出错时，做第一种恢复操作；第二次出错时，做第二种恢复操作，如果N种做完了依旧失败，那只能中止导航。每做完一种恢复性操作后，状态一律进入PLANNING。

1.2 出错原因

move_base把出错原因归为三种：PLANNING_R、CONTROLLING_R和OSCILLATION_R，下表是这三种原因各自涉及到的设置参数。

参数	原因	目前值	描述
planner_patience_	PLANNING_R	2(秒)[注1]	一旦累计已花在全局路径规划时间超过3秒，进入CLEARING状态
max_planning_retries_	PLANNING_R	-1(次)[注2]	一旦全局路径规划失败次数超过uint32_t(-1)次，进入CLEARING状态。
controller_patience_	CONTROLLING_R	3.0(秒)	一旦累计已花在局部规划时间超过3秒，进入CLEARING状态
oscillation_distance_	OSCILLATION_R	0.2(米)	两次位置差值>=20厘米，认为机器人出现了不是抖动的移动
oscillation_timeout_	OSCILLATION_R	10.0(秒)	一旦机器人已有超过10秒没动过，进入CLEARING状态

注1：一旦全局路径规划失败，在短时间内，很少能变成成功。当然有这么种可能，起点和目标点之间存在走来走去的人，恰好走开关键位置了，使变得成功。结合恢复性操作trose_recovery，那里会把代价地图设为默认值，能缓解这情况。为减少溢出时间，有些厂家是5秒，这里降到2秒。
注2：由于uint32_t(-1)是个极大值，现实中不会出现超过这么多的失败次数，即目前对全局路径规划失败导致的中止只考虑时间因素。

执行完所有恢复性操作后，还是不行，要结束此次导航。不论什么原因，result是“因失败而中止：setAborted”。

PLANNING_R：一直规划不出全局路径

last_valid_plan_记为最近一次规划出全局路径时刻。
(planThread)此次makePlan规划全局路径的失败。
(planThread)累计已花在规划全局路径时间是否超过5秒，如果没有，转到步骤1，继续规划。否则进入CLEARING状态。recovery_trigger_ = PLANNING_R。

CONTROLLING_R：全局路径是规划出来了，但规划本地路径失败

想像机器人处在这么个空间，它到goal可以有数条路径，有很很窄的路径a到gobal最短，于是全局规划时选了这条路径，因为全局规划时不考虑机器人足迹。但由于这路很窄，一到局部代价地图，考虑机器人足迹后，本地规划的失败。

last_valid_control_记为最近一次规划出本地路径时刻。
(主线程)tc_->computeVelocityCommands(cmd_vel)，执行本地规划，失败。
(主线程)判断本地规划累计失败时长是否>=controller_patience_秒，如果没有，进入步骤4。否则进入步骤5。
(主线程)进入PLANNING状态，再次执行全局路径规划。规划出一条全局路径后，进入步骤2。
(主线程)超过controller_patience_秒，本地规划都是失败，进入CLEARING状态。recovery_trigger_ = CONTROLLING_R。

OSCILLATION_R：机器人在来回摆动，长时间被困在一小片区域，包括就是不动，像轮子坏了

move_base认为，如果两次位置差值<oscillation_distance_，那就是没有移动。目前oscillation_distance_取值是0.2，即20厘米。

last_oscillation_reset_记为最近一次机器人有过移动的时刻。
(主线程)每次executeCycle时间片，检查机器人是否发生了移动。
(主线程)累计已超过oscillation_timeout_秒都没有移动吗？如果没有，啥也不做。否则进入CLEARING状态。recovery_trigger_ = OSCILLATION_R。

1.3 planThread

planThread是planThread线程的处理例程。

void trose_slice::planThread(bool& exit)
{
    ROS_DEBUG_NAMED("move_base_plan_thread","Starting planner thread...");
    // ros::NodeHandle n;
    // n.setCallbackQueue(&cbqueue_);
    ros::Timer timer;
    while (!exit) {
        // check if we should run the planner (the mutex is locked)
        if (!runPlanner_) {
            SDL_Delay(10);
            continue;
        }
        if (exit) { 
            break; 
        }
        ros::Time start_time = ros::Time::now();

        geometry_msgs::PoseStamped temp_goal;
        {
            threading::lock lock(planner_mutex_);
            // time to plan! get a copy of the goal and unlock the mutex
            temp_goal = planner_goal_;
        }
        ROS_DEBUG_NAMED("move_base_plan_thread","Planning...");

        //run planner
        planner_plan_->clear();
        bool gotPlan = makePlan(temp_goal, *planner_plan_);

        if (gotPlan) {
            ROS_DEBUG_NAMED("move_base_plan_thread","Got Plan with %zu points!", planner_plan_->size());
            //pointer swap the plans under mutex (the controller will pull from latest_plan_)
            std::vector<geometry_msgs::PoseStamped>* temp_plan = planner_plan_;

            threading::lock lock(planner_mutex_);
            planner_plan_ = latest_plan_;
            latest_plan_ = temp_plan;
            last_valid_plan_ = ros::Time::now();
            planning_retries_ = 0;
            new_global_plan_ = true;

{PLANNING_R[1/3]}此次规划全局路径成功，last_valid_plan_记住这个时刻。

            ROS_DEBUG_NAMED("move_base_plan_thread","Generated a plan from the base_global_planner");

            // make sure we only start the controller if we still haven't reached the goal
            if (runPlanner_) {
                state_ = CONTROLLING;
            }

            runPlanner_ = false;
        }
        // if we didn't get a plan and we are in the planning state (the robot isn't moving)
        else if (state_ == PLANNING) {

{PLANNING_R[2/3]}此次规划全局路径失败。

            ROS_DEBUG_NAMED("move_base_plan_thread","No Plan...");
            ros::Time attempt_end = last_valid_plan_ + ros::Duration(planner_patience_);
            //check if we've tried to make a plan for over our time limit or our maximum number of retries
            //issue #496: we stop planning when one of the conditions is true, but if max_planning_retries_
            //is negative (the default), it is just ignored and we have the same behavior as ever
            threading::lock lock(planner_mutex_);
            planning_retries_++;
            if (runPlanner_ &&
                (ros::Time::now() > attempt_end || planning_retries_ > uint32_t(max_planning_retries_))){
                //we'll move into our obstacle clearing mode
                state_ = CLEARING;
                runPlanner_ = false;  // proper solution for issue #523
                publishZeroVelocity();
                recovery_trigger_ = PLANNING_R;

{PLANNING_R[3/3]}累计失败次数或时长溢出。

规划全局路径失败，是否要进入CLEARING，取决于是否满足两个条件中一个。

根据last_valid_plan_判断累计已花在规划全局路径时间，一旦超过planner_patience_(5秒)，进入CLEARING。
判断累计已规划全局路径失败次数planning_retries_，如果超过max_planning_retries_，进入CLEARING。

在目前设置中planner_patience_=5，planning_retries_= -1，由于-1转到uin32_t就是最大值，不可能满足这条件，是否进入CLEARING，只取消决于累计已花在规划全局路径时间是否超过5秒。

1.4 executeCycle

executeCycle是个时间片函数。对move_base，每隔(1.0/controller_frequency_)秒会执行一次。对它的返回值，true表示此次executeCycle结束了此次导航，结果可能是成功，可能失败。

bool trose_slice::executeCycle(const geometry_msgs::PoseStamped& goal)
{
    // boost::recursive_mutex::scoped_lock ecl(configuration_mutex_);
    //we need to be able to publish velocity commands
    geometry_msgs::Twist cmd_vel;

    //update feedback to correspond to our curent position
    geometry_msgs::PoseStamped global_pose;
    planner_costmap_ros_->getRobotPose(global_pose);
    const geometry_msgs::PoseStamped& current_position = global_pose;

    // push the feedback out
    // move_base_msgs::MoveBaseFeedback feedback;
    // feedback.base_position = current_position;
    // as_->publishFeedback(feedback);

    //check to see if we've moved far enough to reset our oscillation timeout
    if (distance(current_position, oscillation_pose_) >= oscillation_distance_) {
        last_oscillation_reset_ = ros::Time::now();
        oscillation_pose_ = current_position;

{OSCILLATION_R[1/3]}此次检查发现机器人有过移动，last_oscillation_reset_记住这个时刻。

        //if our last recovery was caused by oscillation, we want to reset the recovery index
        if (recovery_trigger_ == OSCILLATION_R) {
            recovery_index_ = 0;
        }
    }

    //check that the observation buffers for the costmap are current, we don't want to drive blind
    if (!controller_costmap_ros_->isCurrent()){
        publishZeroVelocity();
        return false;
    }

    // if we have a new plan then grab it and give it to the controller
    if (new_global_plan_) {
        //make sure to set the new plan flag to false
        new_global_plan_ = false;

        ROS_DEBUG_NAMED("move_base","Got a new plan...swap pointers");

        //do a pointer swap under mutex
        std::vector<geometry_msgs::PoseStamped>* temp_plan = controller_plan_;

        {
            threading::lock lock(planner_mutex_);
            controller_plan_ = latest_plan_;
            latest_plan_ = temp_plan;
        }
        ROS_DEBUG_NAMED("move_base","pointers swapped!");

planThread成功规划出一条全局路径。把这全局路径改存到主线程能自由使用的controller_plan_。

        if (!tc_->setPlan(*controller_plan_)) {
            // ABORT and SHUTDOWN COSTMAPS
            ROS_ERROR("Failed to pass global plan to the controller, aborting.");
            resetState();

            {
                //disable the planner thread
                threading::lock lock(planner_mutex_);
                runPlanner_ = false;
            }

这里为什么会失败？

这里发生的错误认为是无法拯救，结束此次导航，result是结束之二“因失败而中止：setAborted”。

            as_setAborted("Failed to pass global plan to the controller.");
            return true;
        }

        //make sure to reset recovery_index_ since we were able to find a valid plan
        if (recovery_trigger_ == PLANNING_R) {
            recovery_index_ = 0;
        }
    }

    //the move_base state machine, handles the control logic for navigation
    switch (state_) {
    // if we are in a planning state, then we'll attempt to make a plan
    case PLANNING:
        {
            threading::lock lock(planner_mutex_);
            runPlanner_ = true;
            // planner_cond_.notify_one();
        }
        ROS_DEBUG_NAMED("move_base","Waiting for plan, in the planning state.");
        break;

    //if we're controlling, we'll attempt to find valid velocity commands
    case CONTROLLING:
        ROS_DEBUG_NAMED("move_base","In controlling state.");

        //check to see if we've reached our goal
        if(tc_->isGoalReached()){
            ROS_DEBUG_NAMED("move_base","Goal reached!");
            resetState();

            {
                //disable the planner thread
                threading::lock lock(planner_mutex_);
                runPlanner_ = false;
            }

            as_setSucceeded("Goal reached.");
            return true;
        }

        //check for an oscillation condition
        if (oscillation_timeout_ > 0.0 &&
            last_oscillation_reset_ + ros::Duration(oscillation_timeout_) < ros::Time::now())
        {
            SDL_Log("%u, {leagor-verbose}check for an oscillation condition", SDL_GetTicks());
            publishZeroVelocity();
            state_ = CLEARING;
            recovery_trigger_ = OSCILLATION_R;

{OSCILLATION_R[3/3]}超过oscillation_timeout_秒都没有移动，进入CLEARING状态。。

        }

        {
            boost::unique_lock<costmap_2d::Costmap2D::mutex_t> lock(*(controller_costmap_ros_->getCostmap()->getMutex()));

            if (tc_->computeVelocityCommands(cmd_vel)) {

{CONTROLLING_R[1/5]}此次在功规划出本地路径，last_valid_control_记住这个时刻。

                ROS_DEBUG_NAMED( "move_base", "Got a valid command from the local planner: %.3lf, %.3lf, %.3lf",
                                cmd_vel.linear.x, cmd_vel.linear.y, cmd_vel.angular.z );
                last_valid_control_ = ros::Time::now();
                //make sure that we send the velocity command to the base
                vel_pub_.publish(cmd_vel);
                if (recovery_trigger_ == CONTROLLING_R) {
                    recovery_index_ = 0;
                }
            } else {

{CONTROLLING_R[2/5]}执行本地规划，失败。

                ROS_DEBUG_NAMED("move_base", "The local planner could not find a valid plan.");
                ros::Time attempt_end = last_valid_control_ + ros::Duration(controller_patience_);

                //check if we've tried to find a valid control for longer than our time limit
                if (ros::Time::now() > attempt_end) {

{CONTROLLING_R[5/5]}本地规划失败累计时长超过controller_patience_秒，进入CLEARING状态。

                    //we'll move into our obstacle clearing mode
                    publishZeroVelocity();
                    state_ = CLEARING;
                    recovery_trigger_ = CONTROLLING_R;
                } else{

{CONTROLLING_R[4/5]}本地规划失败累计时长<=controller_patience_秒。进入PLANNING状态，再次让执行全局路径规划。

                    //otherwise, if we can't find a valid control, we'll go back to planning
                    last_valid_plan_ = ros::Time::now();
                    planning_retries_ = 0;
                    state_ = PLANNING;
                    publishZeroVelocity();

                    {
                        threading::lock lock(planner_mutex_);
                        runPlanner_ = true;
                        // planner_cond_.notify_one();
                    }
                }
            }
        }

        break;

    //we'll try to clear out space with any user-provided recovery behaviors
    case CLEARING:
        ROS_DEBUG_NAMED("move_base","In clearing/recovery state");
        //we'll invoke whatever recovery behavior we're currently on if they're enabled
        if (recovery_behavior_enabled_ && recovery_index_ < recovery_behaviors_.size()){
            move_base_msgs::RecoveryStatus msg;
            msg.pose_stamped = current_position;
            msg.current_recovery_number = recovery_index_;
            msg.total_number_of_recoveries = recovery_behaviors_.size();
            msg.recovery_behavior_name =  recovery_behavior_names_[recovery_index_];

            recovery_status_pub_.publish(msg);

            recovery_behaviors_[recovery_index_]->runBehavior();

            //we at least want to give the robot some time to stop oscillating after executing the behavior
            last_oscillation_reset_ = ros::Time::now();

            //we'll check if the recovery behavior actually worked
            ROS_DEBUG_NAMED("move_base_recovery","Going back to planning state");
            last_valid_plan_ = ros::Time::now();
            planning_retries_ = 0;
            state_ = PLANNING;

尝试第recovery_index_种恢复性操作。无论执行那种恢复性操作，状态都进入PLANNING。

{PLANNING_R[1/3]}已做了恢复性操作，让此刻成为累计多久没成功规划全局路径时刻，last_valid_plan_记住这个时刻，

            //update the index of the next recovery behavior that we'll try
            recovery_index_++;
        } else {

做了所有恢复性操作后，仍旧失败。结束此次导航，result是结束之二“因失败而中止：setAborted”。

            {
                threading::lock lock(planner_mutex_);
                //disable the planner thread
                runPlanner_ = false;
            }

            ROS_DEBUG_NAMED("move_base_recovery","Something should abort after this.");

            if(recovery_trigger_ == CONTROLLING_R){
                as_setAborted("Failed to find a valid control. Even after executing recovery behaviors.");
            }
            else if(recovery_trigger_ == PLANNING_R){
                ROS_ERROR("Aborting because a valid plan could not be found. Even after executing all recovery behaviors");
                as_setAborted("Failed to find a valid plan. Even after executing recovery behaviors.");
            }
            else if(recovery_trigger_ == OSCILLATION_R){
                as_setAborted("Robot is oscillating. Even after executing recovery behaviors.");
            }
            resetState();
            return true;
        }
        break;
    default:

程序就不应该让进入这里。

        ROS_ERROR("This case should never be reached, something is wrong, aborting");
        resetState();
        {
            threading::lock lock(planner_mutex_);
            //disable the planner thread
            runPlanner_ = false;
        }
        as_setAborted("Reached a case that should not be hit in move_base. This is a bug, please report it.");
        return true;
    }

    // we aren't done yet
    return false;
}

二、三个全局路径：planner_plan_、latest_plan_、controller_plan_

std::vector<geometry_msgs::PoseStamped>* planner_plan_;
std::vector<geometry_msgs::PoseStamped>* latest_plan_; 
std::vector<geometry_msgs::PoseStamped>* controller_plan_;

planner_plan_、latest_plan_、controller_plan_都用存储全局路径。为什么用指针，而不是std::vector对象？——过程中要对它们互换赋值，像planner_plan_和latest_plan_互换，用指针可以节省内存、cpu。它们生命期是和MoveBase对象一样。

在大致次序上，makePlan得到的结果赋给planner_plan_，但立即就赋给latest_plan_，latest_plan_则给了planner_plan_(我认主这个赋值可以省略)。“plan”topic发布的是makePlan得到的planner_plan_，但由于很快给了latest_plan_，可直观认为对应的是latst_plan_。在进入executeCycle时，再latest_plan_互换controller_plan_，最后进入BaseLocalPlanner的是controller_plan_。

latest_plan_表示上一次全局规划(makePkan)时规划出的路径。功能是用于同步。在planThread线程执行的makePlan需要点时间，过程中要一直接占用planner_plan_，要是主线程也要读planner_plan_的话，不方便线程同步。

(planThread线程)调用makePlan规划路径，结果存放在planner_plan_。
(planThread线程)获取路径锁(planner_mutex_)，把planner_plan_赋值给latest_plan_，并把new_global_plan_设为true。释放路径锁，
(主线程)发现new_global_plan_是true。获取路径锁，new_global_plan_设为false，把latest_plan_赋给值controller_plan_，释放路径锁。

有了latest_plan_后，planThread线程可自由使用planner_plan_，主线程可自由使用controller_plan_，controller_plan_就最近一次makePlan规划出的全局路径。

三、处理结束导航

结束导航包括两个操作，一是执行resetState()，二是向result话题发布此次导航结果。

resetState做下面操作。

runPlanner_ = false。命令规划线程不再执行规划任务。
把一些变量复位到初始值。当中一个是状态恢复到PLANNING。
向机器人发布速度0，让机器人停下不动。

向result话题发布此次导航结果，状态码有三类。

result#1：导航成功(setSucceeded)

机器人成功移动到goal。

result#2：因失败而中止(setAborted)

哪里卡住了，尝试恢复后也没解决，只好中止，导航失败。

result#3：取消(setPreempted)

这指的是client发送了cancelGoal命令。

在move_base，当client发送了cancelGoal时，它进入的相关代码。

if (as_->isPreemptRequested()) {
  // 这段代码位在executeCb，在as_认为，上一次的sendGoal还没处理完。这次又来cancelGoal/sendGoal，认为是取代(Preempt)。
  if (as_->isNewGoalAvailable()) {
    // sendGoal进入这里
   } else {
    // cancelGoal进入这里
    as_->setPreempted();
}

四、恢复性操作

多个恢复性操作被组织成一个数组recovery_behaviors_，从索引0开始，不断被执行。如果所有恢复性操作执行完了，仍旧出错，那此次导航失败。recovery_index_对应着这个数级的索引，用于指示一旦要发生恢复性操作，要执行的第几个。

这有个问题，下次出错时，恢复性操作是从0开始，还是从上一次执行后的recovery_index_开始？——两个都有可能。遵循这么条规则：如果执行恢复性操作后，能够把之前导致出错的原因解决掉，下次就从0开始，否则从上一次执行后的recovery_index_开始。举个例子，recovery_index_正等于1，这次出错原因是CONTROLLING_R，也就是机器人来回摆动。如果执行recovery_index_[1]这个恢复性操作后，机器人成功移出一段距离，不来回摆动了，下次再执行恢复性操作时，recovery_index_将从0开始，否则将从2开始（上次执行恢复性操作后，recovery_index_被+1到2）。

五、MoveBase析构函数

析构MoveBase时，须注意销毁内中对象次序。个人认为官方~MoveBase是有问题的，当正处于恢复性操作RotateRecovery时，极可能导致程序崩溃。

MoveBase::~MoveBase(){
  // 1. make sure exit executeCb and don't execute it again.
  recovery_index_ = (int)recovery_behaviors_.size();
  planner_patience_ = 0.0;
  if (as_ != NULL) {
    delete as_;
  }

  recovery_behaviors_.clear();

首先要保证退出executeCB，并保证不会再被执行。

recovery_index_ = (int)recovery_behaviors_.size()。recovery_behaviors_中存储着要依次执行的恢复性操作，极可能是多个操作，只有这些操作都被执行了，executeCb才会认为导航需要中止。这个赋值保证了，executeCb执行完正在运行的恢复性操作后，只要再次进入CLEARING，就中止导航。
planner_patience_ = 0.0。如果executeCb当前正在执行恢复性操作，那完成这个操作后，便会让状态进入PLANNING。这个从5到0的修改让planThread线程只要一次makePlan失败，就进入CLEARING，不必等以往的5秒溢出。这里不去改max_planning_retries_，是考虑到一般不会用makePlan失败次数作为进入CLEARING条件。另外，也不是runPlanner_ = false，是因为如果当前正在执行恢复性操作的话，完成后，executeCb会把runPlanner_ = true，那上面修改就没用。
delete as_。executeCB是在as_内部的ActionServerThread线程执行。销毁它，保证executeCB不会被调用。
recovery_behaviors_.clear()。销毁要使用的恢复性操作对象，清空recovery_behaviors_。

  // 2. dynamic_reconfigure::Server
  delete dsrv_;

dsrv_类型是dynamic_reconfigure::Server，我对动态配置了解不多。放这里是因为官方~MoveBase也是较早销毁它。

  // 3. destroy planThread. 
  // planThread use planner_costmap_ros_ during makePlan.
  delete planner_thread_;

销毁planThread线程。如果planThread正在执行makePlan，那要使用planner_costmap_ros_。所以必须放在销毁planner_costmap_ros_之前。

  // 4. destroy gloal-costmap and local-costmap
  if (planner_costmap_ros_ != NULL) {
    delete planner_costmap_ros_;
  }

  if (controller_costmap_ros_ != NULL) {
    delete controller_costmap_ros_;
  }

executeCB已不会执行，planThread已销毁，可安全销毁两个代价地图了。

  // 5. misc object
  delete planner_plan_;
  delete latest_plan_;
  delete controller_plan_;

  planner_.reset();
  tc_.reset();
}

至于~MoveBase要执行多少时间。最坏情况正在执行恢复性操作RotateRecovery，恰好又是开始的话，得等5秒溢出，这时时间5秒多点。对其它情况，时间应该很短。

move_base节点

全部评论: 0 条