ROS高效进阶第五章 -- 机器人语音交互之ros集成科大讯飞中文语音库,实现语音控制机器人小车,ROS高效进阶,集成科大讯飞中文语音库实现机器人小车语音交互控制

马肤

温馨提示:这篇文章已超过468天没有更新,请注意相关的内容是否还可用!

摘要:本章介绍了如何使用ROS(机器人操作系统)集成科大讯飞的中文语音库,实现机器人的语音交互功能。通过这一技术,机器人能够识别和理解人类语言,进而实现语音控制小车等应用。该进阶内容对于提高机器人的智能化水平具有重要意义,为机器人技术在实际应用中的普及和发展奠定了基础。

机器人语音交互之ros集成科大讯飞中文语音库,实现语音控制机器人小车

  • 1 背景和资料
  • 2 正文
    • 2.1 下载科大讯飞语音库
    • 2.2 robot_voice 之语音控制机器人小车移动样例
    • 3 总结

      1 背景和资料

      从本文开始,我们将用两篇文章学习机器人语音交互。本文作为第一篇,将在ros上集成科大讯飞的中文语音库,实现语音控制机器人小车运动。至于语音识别和语音合成的原理,本文并不深究,读者可以自行搜索相关的文章介绍。这里提醒,本文的测试环境是ubuntu20.04 + ros noetic。

      本文参考资料如下:

      (1)《ROS机器人开发实践》胡春旭 第8章

      (2)讯飞语音识别和唤醒开发示例

      (3)讯飞语音听写 Linux SDK 文档

      (4)ROS高效入门第二章 – 基本概念和常用命令学习,基于小乌龟样例

      (5)ROS高效进阶第三章 – 以差速轮式机器人为例,使用Gazebo构建机器人仿真平台

      2 正文

      2.1 下载科大讯飞语音库

      (1)首先登陆讯飞开放平台:讯飞开放平台 ,注册后,点击控制台进入。

      (2)然后创建应用并下载linux sdk,更具体的操作可以参考:讯飞语音识别和唤醒开发示例

      ROS高效进阶第五章 -- 机器人语音交互之ros集成科大讯飞中文语音库,实现语音控制机器人小车,ROS高效进阶,集成科大讯飞中文语音库实现机器人小车语音交互控制 第1张

      (3)最后得到自己专属的sdk,如我本人的:Linux_iat1227_tts_online1227_bb839ccf.zip,其中 bb839ccf 是专属bb839ccf。下面我们将把这套 sdk 集成到 robot_voice 样例中,这里不对这个 zip 包内容进行展开讲解。

      2.2 robot_voice 之语音控制机器人小车移动样例

      (1)robot_voice 样例,我们将实现两个应用,第一个就是本文的语音控制机器人小车移动,拓扑图如下:

      voice_detector:负责语音识别,将语音转换为文字,并作为 client,通过 human_chatter 服务,发给 robot_controller 。

      robot_controller:作为 human_chatter 服务 server,接收 voice_detector 发来的文字化的指令,并生成对应的语音播报文字和控车命令。前者通过 str2voice 服务,发给 voice_creator,后者通过 /cmd_vel topic,发给 mbot_gazebo。

      voice_creator:作为 str2voice 服务server,接收 robot_controller 发来的语音播报文字,合成语音文件并播放。

      mbot_gazebo:作为机器人小车,接收 /cmd_vel topic,并调整运动状态。

      补充:关于ros的服务机制,可以参考本人ROS高效入门博客第二章的2.6节: ROS高效入门第二章 – 基本概念和常用命令学习,基于小乌龟样例

      (2)安装环境:

      unzip Linux_iat1227_tts_online1227_bb839ccf.zip
      sudo cp libs/x64/libmsc.so /usr/lib/
      sudo apt update
      sudo apt install sox
      sudo apt install libsox-fmt-all
      

      SoX, 全称 Sound eXchange,被官方称为 “the Swiss Army knife of audio manipulation”。

      它是一个强大的用于转换和处理声音文件的库。因其操作简单且功能强大,广泛应用在音频数据的处理和分析中。

      除此之外,本人在编译讯飞样例时,遇到了:

      linuxrec.c:12:10: fatal error: alsa/asoundlib.h: No such file or directory
      

      解决方式是:

      sudo apt-get install libasound2-dev
      

      ALSA,全称Advanced Linux Sound Architecture (ALSA) 库,用于处理音频设备。

      (3)创建 robot_voice 及相关文件

      cd ~/catkin_ws/src
      catkin_create_pkg robot_voice roscpp rospy std_msgs geometry_msgs message_generation message_runtime
      cd robot_voice 
      mkdir srv launch
      touch srv/StringToVoice.srv launch/voice_control_robot.launch
      touch src/voice_detector.cpp src/robot_controller.cpp src/voice_creator.cpp
      mkdir ifly_voice include/ifly_voice
      

      请将 Linux_iat1227_tts_online1227_bb839ccf.zip 中的相关文件,分别移入相关目录,供编译使用,如下图:

      ROS高效进阶第五章 -- 机器人语音交互之ros集成科大讯飞中文语音库,实现语音控制机器人小车,ROS高效进阶,集成科大讯飞中文语音库实现机器人小车语音交互控制 第2张

      (4)voice_detector.cpp

      #include 
      #include 
      #include 
      #include 
      #include 
      #include 
      #include "ifly_voice/qisr.h"
      #include "ifly_voice/qtts.h"
      #include "ifly_voice/msp_cmn.h"
      #include "ifly_voice/formats.h"
      #include "ifly_voice/msp_errors.h"
      #include "ifly_voice/speech_recognizer.h"
      #include 
      #include 
      class Helper {
      public:
        static void SignalHandler(int signal) {
            ROS_INFO("\nCaught signal %d. Exiting gracefully...\n", signal);
            exit(0);
        }
      };
      class VoiceDetector {
      public:
        VoiceDetector() {
          ROS_INFO("VoiceDetector Constructor");
        }
        ~VoiceDetector() {
          ROS_INFO("VoiceDetector Destructor");
        }
        int Init() {
          int ret = MSP_SUCCESS;
          ret = MSPLogin(NULL, NULL, login_params_.c_str());
          if (MSP_SUCCESS != ret)	{
            ROS_ERROR("MSPLogin failed , Error code %d", ret);
            MSPLogout(); // Logout...
            return -1;
          }    
          ROS_INFO("VoiceDetector MSP Login for update, waiting for seconds...");
          return 0;
        }
        static void JoinTxt(const char *result, char is_last) {
          if (result) {
            std::string slice_txt = result;
            VoiceDetector::voice_txt_ += slice_txt;
          }
          if (is_last) {
            printf("voice txt : %s\n", VoiceDetector::voice_txt_.c_str());
          }
        }
        static void InitSpeech() {
          VoiceDetector::voice_txt_ = "";
          printf("clear cache, start Listening...\n");
        }
        static void EndSpeech(int reason) {
          if (reason == END_REASON_VAD_DETECT) {
            printf("\nSpeaking done \n");
          } else {
            printf("\nRecognizer error %d\n", reason);
          }
        }
        int SpeechOnce() {
          int ret;
          int i = 0;
          struct speech_rec iat;
          struct speech_rec_notifier recnotifier = {
            JoinTxt,
            InitSpeech,
            EndSpeech
          };
          ret = sr_init(&iat, session_begin_params_.c_str(), SR_MIC, &recnotifier);
          if (ret) {
            ROS_ERROR("speech recognizer init failed");
            return -1;
          }
          ret = sr_start_listening(&iat);
          if (ret) {
            printf("start listen failed %d\n", ret);
          }
          /* demo 15 seconds recording */
          sleep(10);
          ret = sr_stop_listening(&iat);
          if (ret) {
            printf("stop listening failed %d\n", ret);
          }
          sr_uninit(&iat);
          return 0;
        }
        static std::string get_voice_txt_() {
          return voice_txt_;
        }
      private:
      	const std::string login_params_ = "appid = bb839ccf, work_dir = .";
      	const std::string session_begin_params_ =
      		"sub = iat, domain = iat, language = zh_cn, "
      		"accent = mandarin, sample_rate = 16000, "
      		"result_type = plain, result_encoding = utf8";
        const uint32_t	BufferSize = 4096;
        uint64_t g_buffersize = BufferSize;
        static std::string voice_txt_;
      };
      std::string VoiceDetector::voice_txt_ = "";
      int main(int argc, char* argv[]) {
        int ret = 0;
        ros::init(argc, argv, "voice_detector");
        ros::NodeHandle nh;
        // 创建 human_chatter 服务client
        ros::ServiceClient client_ = nh.serviceClient("human_chatter");
        if (signal(SIGINT, Helper::SignalHandler) == SIG_ERR) {
          return -1;
        }
        VoiceDetector vd;
        ret = vd.Init();
        if (ret  
      

      (5)robot_controller.cpp

      #include 
      #include 
      #include 
      #include 
      #include 
      #include 
      #include 
      #include 
      #include 
      class RobotController {
      public:
        RobotController() {
          ROS_INFO("RobotController Constructor");
        }
        ~RobotController() {
          ROS_INFO("RobotController Destructor");
        }
        int Init(ros::NodeHandle& nh) {
          cmd_pub_ = nh.advertise("/cmd_vel", 1000);
          client_ = nh.serviceClient("str2voice");
          return 0;
        }
        void ToDownstream(const std::string& answer_txt, float linear_x, float angular_z) {
        	// 通过 str2voice 服务和 /cmd_vel topic向下游 voice_creator 和 mbot_gazebo 发送
          robot_voice::StringToVoice::Request req;
          robot_voice::StringToVoice::Response resp;
          req.data = answer_txt;
          bool ok = client_.call(req, resp);
          if (ok) {
            printf("send str2voice service success: %s, and pub cmd_vel\n", req.data.c_str());
             geometry_msgs::Twist msg;
             msg.linear.x = linear_x;
             msg.angular.z = angular_z;
             cmd_pub_.publish(msg);
          } else {
            ROS_ERROR("failed to send str2voice service");
          }
        }
        bool ChatterCallbback(robot_voice::StringToVoice::Request &req, robot_voice::StringToVoice::Response &resp) {
          printf("i received: %s\n", req.data.c_str());
          std::string voice_txt = req.data;
      	// 根据指令关键字,发送对应的语音播包文字和 cmd_vel 命令
          if (voice_txt.find("前") != std::string::npos) {
            ToDownstream("小车请向前跑", 0.3, 0);
          } else if (voice_txt.find("后") != std::string::npos) {
            ToDownstream("小车请向后倒", -0.3, 0);
          } else if (voice_txt.find("左") != std::string::npos) {
            ToDownstream("小车请向左转", 0, 0.3);
          } else if (voice_txt.find("右") != std::string::npos) {
            ToDownstream("小车请向右转", 0, -0.3);
          } else if (voice_txt.find("转") != std::string::npos) {
            ToDownstream("小车请打转", 0.3, -0.3);
          }
          resp.success = true;
          return resp.success;
        }
        void Start(ros::NodeHandle& nh) {
        	// 申明 human_chatter 服务,ChatterCallbback是回调函数
          chatter_server_ = nh.advertiseService("human_chatter", &RobotController::ChatterCallbback, this);
        }
      private:
        ros::ServiceServer chatter_server_;
        ros::Publisher cmd_pub_;
        ros::ServiceClient client_;
      };
      int main(int argc, char* argv[]) {
        int ret = 0;
        ros::init(argc, argv, "voice_controller");
        ros::NodeHandle nh;
        RobotController rc;
        rc.Init(nh);
        printf("this is a voice controller app for robot, you can say: 向前, 向后, 向左, 向右, 转圈, 结束\n");
        rc.Start(nh);
        ros::spin();
        return 0;
      }
      

      (6)voice_creator.cpp

      #include 
      #include 
      #include 
      #include 
      #include 
      #include 
      #include "ifly_voice/qisr.h"
      #include "ifly_voice/qtts.h"
      #include "ifly_voice/msp_cmn.h"
      #include "ifly_voice/formats.h"
      #include "ifly_voice/msp_errors.h"
      #include "ifly_voice/speech_recognizer.h"
      #include 
      class Helper {
      public:
        static void SignalHandler(int signal) {
            ROS_INFO("\nCaught signal %d. Exiting gracefully...\n", signal);
            exit(0);
        }
      };
      class VoiceCreator {
      public:
        VoiceCreator() {
          ROS_INFO("VoiceCreator Constructor");
        }
        ~VoiceCreator() {
          ROS_INFO("VoiceCreator Destructor ");
        }
        int Init() {
          int ret = MSP_SUCCESS;
          ret = MSPLogin(NULL, NULL, login_params_.c_str());
          if (MSP_SUCCESS != ret)	{
            ROS_ERROR("MSPLogin failed , Error code %d", ret);
            MSPLogout(); // Logout...
            return -1;
          }    
          ROS_INFO("VoiceCreator MSP Login for update, waiting for seconds...");
          return 0;
        }
        int ProcessTxt(std::string& txt) {
          int          ret          = -1;
          FILE*        fp           = NULL;
          const char*  sessionID    = NULL;
          unsigned int audio_len    = 0;
          int          synth_status = MSP_TTS_FLAG_STILL_HAVE_DATA;
          WavePcmHdr_t wav_hdr      = {
            { 'R', 'I', 'F', 'F' },
            0,
            {'W', 'A', 'V', 'E'},
            {'f', 'm', 't', ' '},
            16,
            1,
            1,
            16000,
            32000,
            2,
            16,
            {'d', 'a', 't', 'a'},
            0  
          };
          const char* src_text = txt.c_str();
          const char* des_path = filename_.c_str();
          const char* params = session_begin_params_.c_str();
          if (NULL == src_text || NULL == des_path) {
            ROS_ERROR("params is error!");
            return ret;
          }
          fp = fopen(des_path, "wb");
          if (NULL == fp) {
            ROS_ERROR("open %s error", des_path);
            return ret;
          }
          /* 开始合成 */
          sessionID = QTTSSessionBegin(params, &ret);
          if (MSP_SUCCESS != ret) {
            ROS_ERROR("QTTSSessionBegin failed, error code: %d", ret);
            fclose(fp);
            return ret;
          }
          ret = QTTSTextPut(sessionID, src_text, (unsigned int)strlen(src_text), NULL);
          if (MSP_SUCCESS != ret) {
            ROS_ERROR("QTTSTextPut failed, error code: %d",ret);
            QTTSSessionEnd(sessionID, "TextPutError");
            fclose(fp);
            return ret;
          }
          
          printf("正在合成 ...\n");
          fwrite(&wav_hdr, sizeof(wav_hdr) ,1, fp); //添加wav音频头,使用采样率为16000
          while (1)  {
            /* 获取合成音频 */
            const void* data = QTTSAudioGet(sessionID, &audio_len, &synth_status, &ret);
      		  if (MSP_SUCCESS != ret) {
              break;
            }
            if (NULL != data) {
              fwrite(data, audio_len, 1, fp);
              wav_hdr.data_size += audio_len; //计算data_size大小
            }
            if (MSP_TTS_FLAG_DATA_END == synth_status) {
              break;
            }
            printf(">");
            usleep(150*1000); //防止频繁占用CPU
          }
          printf("\n");
          if (MSP_SUCCESS != ret) {
            ROS_ERROR("QTTSAudioGet failed, error code: %d",ret);
            QTTSSessionEnd(sessionID, "AudioGetError");
            fclose(fp);
            return ret;
          }
          /* 修正wav文件头数据的大小 */
          wav_hdr.size_8 += wav_hdr.data_size + (sizeof(wav_hdr) - 8);
          
          /* 将修正过的数据写回文件头部,音频文件为wav格式 */
          fseek(fp, 4, 0);
          fwrite(&wav_hdr.size_8,sizeof(wav_hdr.size_8), 1, fp); //写入size_8的值
          fseek(fp, 40, 0); //将文件指针偏移到存储data_size值的位置
          fwrite(&wav_hdr.data_size,sizeof(wav_hdr.data_size), 1, fp); //写入data_size的值
          fclose(fp);
          fp = NULL;
          /* 合成完毕 */
          ret = QTTSSessionEnd(sessionID, "Normal");
          if (MSP_SUCCESS != ret) {
            ROS_ERROR("QTTSSessionEnd failed, error code: %d", ret);
            return ret;
          }
      	// 播放语音文件
          fp = popen(play_cmd_.c_str(),"r");
          if (fp == NULL) {
            ROS_ERROR("play /tmp/tts_sample.wav failed");
            return -1;
          }
          sleep(1);
          pclose(fp);
          return 0;
        }
        bool Speeking(robot_voice::StringToVoice::Request &req, robot_voice::StringToVoice::Response &resp) {
          int ret = -1;
          ret = ProcessTxt(req.data);
          if (MSP_SUCCESS != ret) {
            ROS_ERROR("AnswerVoice failed, error code: %d", ret);
            resp.success = false;
            return false;
          } else {
            resp.success = true;
          }
          return resp.success;
        }
        void Start(ros::NodeHandle& nh) {
        	// 申明 str2voice 服务
          server_ = nh.advertiseService("str2voice", &VoiceCreator::Speeking, this);
        }
      private:
        	ros::ServiceServer server_;
      	const std::string login_params_ = "appid = bb839ccf, work_dir = .";
      	const std::string session_begin_params_ = 
          "voice_name = xiaoyan, text_encoding = utf8, "
          "sample_rate = 16000, speed = 50, volume = 50, "
          "pitch = 50, rdn = 2";
          //合成的语音文件名称
      	const std::string filename_ = "/tmp/tts_sample.wav"; 
      	//语音播放命令
        	const std::string play_cmd_ = "play /tmp/tts_sample.wav";
        /* wav音频头部格式 */
        typedef struct WavePcmHdr {
          char            riff[4];                // = "RIFF"
          int		size_8;                 // = FileSize - 8
          char            wave[4];                // = "WAVE"
          char            fmt[4];                 // = "fmt "
          int		fmt_size;		// = 下一个结构体的大小 : 16
          short int       format_tag;             // = PCM : 1
          short int       channels;               // = 通道数 : 1
          int		samples_per_sec;        // = 采样率 : 8000 | 6000 | 11025 | 16000
          int		avg_bytes_per_sec;      // = 每秒字节数 : samples_per_sec * bits_per_sample / 8
          short int       block_align;            // = 每采样点字节数 : wBitsPerSample / 8
          short int       bits_per_sample;        // = 量化比特数: 8 | 16
          char            data[4];                // = "data";
          int		data_size;              // = 纯数据长度 : FileSize - 44 
        } WavePcmHdr_t;
      };
      int main(int argc, char ** argv) {
        int ret = 0;
        ros::init(argc, argv, "voice_creator");
        ros::NodeHandle nh;
        if (signal(SIGINT, Helper::SignalHandler) == SIG_ERR) {
          return -1;
        }
        VoiceCreator vc;
        ret = vc.Init();
        if (ret  
      

      (6)StringToVoice.srv , voice_control_robot.launch 和 CMakeLists.txt

      StringToVoice.srv

      string data
      ---
      bool success
      

      voice_control_robot.launch

        
        
        
      
      

      CMakeLists.txt

      cmake_minimum_required(VERSION 3.0.2)
      project(robot_voice)
      add_compile_options(-std=c++11)
      find_package(catkin REQUIRED COMPONENTS
        roscpp
        rospy
        std_msgs
        geometry_msgs
        message_generation
      )
      add_service_files(
        FILES
        StringToVoice.srv
      )
      generate_messages(
        DEPENDENCIES
        std_msgs
      )
      catkin_package(
        CATKIN_DEPENDS message_runtime roscpp rospy std_msgs
      )
      include_directories(
        include
        ${catkin_INCLUDE_DIRS}
      )
      add_executable(voice_detector 
        src/voice_detector.cpp
        ifly_voice/speech_recognizer.c
        ifly_voice/linuxrec.c)
      add_executable(robot_controller src/robot_controller.cpp)
      add_executable(voice_creator src/voice_creator.cpp)
      add_dependencies(voice_detector ${PROJECT_NAME}_generate_messages_cpp)
      target_link_libraries(voice_detector
        ${catkin_LIBRARIES} 
        libmsc.so -ldl -lpthread -lm -lrt -lasound
      )
      add_dependencies(robot_controller ${PROJECT_NAME}_generate_messages_cpp)
      target_link_libraries(robot_controller
        ${catkin_LIBRARIES} 
      )
      add_dependencies(voice_creator ${PROJECT_NAME}_generate_messages_cpp)
      target_link_libraries(voice_creator
        ${catkin_LIBRARIES} 
        libmsc.so -ldl -pthread
      )
      

      (7)编译并运行(运行时请注意电脑网络通畅!)

      cd ~/catkin_ws/
      catkin_make -DCATKIN_WHITELIST_PACKAGES="robot_voice;mbot_gazebo"
      source devel/setup.bash
      roslaunch mbot_gazebo view_mbot_gazebo.launch
      // 再开一个窗口
      source devel/setup.bash
      roslaunch robot_voice voice_control_robot.launch
      

      语音控制机器人

      (8)在开发调试过程中,出现了如下编译报错:

      internal compiler error: Illegal instruction
      

      不得已,更新了gcc版本,问题解决

      sudo apt-get install gcc-10
      sudo apt-get install g++-10
      cd /usr/bin
      sudo rm gcc g++
      sudo ln -s gcc-10 gcc
      sudo ln -s g++-10 g++
      

      3 总结

      本文的样例托管在本人的github上:robot_voice


0
收藏0
文章版权声明:除非注明,否则均为VPS857原创文章,转载或复制请以超链接形式并注明出处。

相关阅读

  • 【研发日记】Matlab/Simulink自动生成代码(二)——五种选择结构实现方法,Matlab/Simulink自动生成代码的五种选择结构实现方法(二),Matlab/Simulink自动生成代码的五种选择结构实现方法详解(二)
  • 超级好用的C++实用库之跨平台实用方法,跨平台实用方法的C++实用库超好用指南,C++跨平台实用库使用指南,超好用实用方法集合,C++跨平台实用库超好用指南,方法与技巧集合
  • 【动态规划】斐波那契数列模型(C++),斐波那契数列模型(C++实现与动态规划解析),斐波那契数列模型解析与C++实现(动态规划)
  • 【C++】,string类底层的模拟实现,C++中string类的模拟底层实现探究
  • uniapp 小程序实现微信授权登录(前端和后端),Uniapp小程序实现微信授权登录全流程(前端后端全攻略),Uniapp小程序微信授权登录全流程攻略,前端后端全指南
  • Vue脚手架的安装(保姆级教程),Vue脚手架保姆级安装教程,Vue脚手架保姆级安装指南,Vue脚手架保姆级安装指南,从零开始教你如何安装Vue脚手架
  • 如何在树莓派 Raspberry Pi中本地部署一个web站点并实现无公网IP远程访问,树莓派上本地部署Web站点及无公网IP远程访问指南,树莓派部署Web站点及无公网IP远程访问指南,本地部署与远程访问实践,树莓派部署Web站点及无公网IP远程访问实践指南,树莓派部署Web站点及无公网IP远程访问实践指南,本地部署与远程访问详解,树莓派部署Web站点及无公网IP远程访问实践详解,本地部署与远程访问指南,树莓派部署Web站点及无公网IP远程访问实践详解,本地部署与远程访问指南。
  • vue2技术栈实现AI问答机器人功能(流式与非流式两种接口方法),Vue2技术栈实现AI问答机器人功能,流式与非流式接口方法探究,Vue2技术栈实现AI问答机器人功能,流式与非流式接口方法详解
  • 发表评论

    快捷回复:表情:
    评论列表 (暂无评论,0人围观)

    还没有评论,来说两句吧...

    目录[+]

    取消
    微信二维码
    微信二维码
    支付宝二维码