
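After converting a model into a TensorRT engine, many developers run into obstacles at the last step: actually calling the engine to run inference. At this stage developers have to write their own code to read in the data (preprocessing), run inference, and display the results (postprocessing). This is the rightmost part of the workflow figure.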

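The difficulty is that every neural network has a different structure, so there is no "generic" code that fits most networks. The code has to be written for the specific network, and the most important requirement is to know the names and tensor shapes of the model's input (input blob) and output (output blob).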
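One way to find those names and shapes is to list the engine's bindings. The sketch below is a minimal illustration, assuming the pre-TensorRT-10 binding API that the TrtSSD class in this article also uses (iterating the engine, `get_binding_shape`, `binding_is_input`). `describe_bindings` and `FakeEngine` are hypothetical helpers, and the binding names and shapes in `FakeEngine` are made up so the example runs without a GPU.

```python
def describe_bindings(engine):
    """Return (name, kind, shape) for every binding of an engine.

    Assumes the legacy binding API: iterating the engine yields binding
    names, and get_binding_shape() / binding_is_input() accept a name.
    """
    rows = []
    for name in engine:
        kind = "input" if engine.binding_is_input(name) else "output"
        shape = tuple(engine.get_binding_shape(name))
        rows.append((name, kind, shape))
    return rows


class FakeEngine:
    """Hypothetical stand-in for a deserialized engine (made-up values)."""

    _bindings = {
        "Input": (True, (3, 300, 300)),
        "NMS": (False, (1, 200, 7)),
        "NMS_1": (False, (1, 1, 1)),
    }

    def __iter__(self):
        return iter(self._bindings)

    def binding_is_input(self, name):
        return self._bindings[name][0]

    def get_binding_shape(self, name):
        return self._bindings[name][1]


for row in describe_bindings(FakeEngine()):
    print(row)
```

With a real engine, the same function would be called on the object returned by `deserialize_cuda_engine`, provided the installed TensorRT version still exposes these calls.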

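This article uses the ssd neural network from the earlier TAO Toolkit installments as its example and provides basic pre- and postprocessing sample code for reference. The code is distilled from the open-source material released for the several "Sky Hackathon" competitions held by the NVIDIA China developer community. The key parts are as follows: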

1. Data preprocessing:

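    import cv2
    import numpy as np


    def _preprocess_trt(img, shape=(300, 300)):
        """Preprocess an image before TRT SSD inference."""
        img = cv2.resize(img, shape)  # cv2.resize takes (width, height)
        # Reorder HWC -> CHW and convert to float32
        img = img.transpose((2, 0, 1)).astype(np.float32)
        return img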

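Here "shape=(300, 300)" is the spatial size of the input tensor and must match the width and height used when the model was trained. The (2, 0, 1) passed to transpose moves the channel axis to the front (height, width, channel becomes channel, height, width); it is fixed and does not need to be adjusted.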
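The effect of the transpose can be checked on a tiny array. This is a NumPy-only sketch with a hypothetical 2x3 image, so it runs without OpenCV:

```python
import numpy as np

# Hypothetical image with height=2, width=3 and 3 channels (HWC layout),
# the layout OpenCV uses for images it loads.
img = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

# transpose((2, 0, 1)) moves the channel axis to the front: HWC -> CHW.
chw = img.transpose((2, 0, 1)).astype(np.float32)

print(img.shape)  # (2, 3, 3)
print(chw.shape)  # (3, 2, 3)

# Each CHW plane holds exactly one channel of the original image.
same_plane = np.array_equal(chw[0], img[:, :, 0])
print(same_plane)  # True
```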

2. Data postprocessing:

    def _postprocess_trt(img, output, conf_th, output_layout):
        """Postprocess the raw output of TRT SSD inference."""
        img_h, img_w, _ = img.shape
        boxes, confs, clss = [], [], []
        # Each detection occupies output_layout consecutive values
        for prefix in range(0, len(output), output_layout):
            index = int(output[prefix + 0])
            conf = float(output[prefix + 2])
            if conf < conf_th:
                continue
            # Scale the box coordinates to the size of the original image
            x1 = int(output[prefix + 3] * img_w)
            y1 = int(output[prefix + 4] * img_h)
            x2 = int(output[prefix + 5] * img_w)
            y2 = int(output[prefix + 6] * img_h)
            cls = int(output[prefix + 1])
            boxes.append((x1, y1, x2, y2))
            confs.append(conf)
            clss.append(cls)
        # Return box coordinates, confidences, and class IDs
        return boxes, confs, clss

The most important part here is the x1, y1, x2, y2 coordinates, which must be adapted to the output format defined by the SSD neural network. The rest of the code can be reused for most neural networks.
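To make the output layout concrete, the sketch below decodes a hand-made flat output with NumPy. It assumes the seven-value layout that the postprocessing code reads (index, class, confidence, x1, y1, x2, y2) and assumes the coordinates are normalized to the range 0 to 1. The detection values and the 640x480 frame size are hypothetical.

```python
import numpy as np

# Two hypothetical detections, 7 values each:
# [index, class, confidence, x1, y1, x2, y2]
output = np.array([
    0, 1, 0.9, 0.125, 0.25, 0.5, 0.75,
    0, 2, 0.2, 0.25, 0.25, 0.5, 0.5,
], dtype=np.float32)

img_h, img_w = 480, 640  # assumed size of the original frame
conf_th = 0.3

dets = output.reshape(-1, 7)            # one row per detection
keep = dets[dets[:, 2] >= conf_th]      # drop low-confidence rows
scale = np.array([img_w, img_h, img_w, img_h], dtype=np.float32)
boxes = (keep[:, 3:7] * scale).astype(int)  # pixel coordinates
classes = keep[:, 1].astype(int)

print(boxes.tolist())    # [[80, 120, 320, 360]]
print(classes.tolist())  # [1]
```

Only the first detection survives the 0.3 threshold. For a network whose output uses a different number of values per detection or a different coordinate order, the reshape width and the column slice are the parts to change.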

3. Define the TrtSSD class, which wraps everything needed to run TRT SSD:

    import ctypes

    import numpy as np
    import pycuda.autoinit  # assumed here: initializes the CUDA driver and context
    import pycuda.driver as cuda
    import tensorrt as trt


    class TrtSSD(object):
        # Load plugins. For TRT versions below 7.0, the flattenconcat
        # plugin library has to be built separately.
        def _load_plugins(self):
            # Compare the major version as an integer, not as a string
            if int(trt.__version__.split('.')[0]) < 7:
                ctypes.CDLL("ssd/libflattenconcat.so")
            trt.init_libnvinfer_plugins(self.trt_logger, '')

        # Load the inference engine generated with the Transfer Learning Toolkit
        def _load_engine(self):
            TRTbin = 'ssd/TRT_%s.bin' % self.model  # adjust to your own setup
            with open(TRTbin, 'rb') as f, trt.Runtime(self.trt_logger) as runtime:
                return runtime.deserialize_cuda_engine(f.read())

        # Create an execution context from the loaded engine
        def _create_context(self):
            for binding in self.engine:
                size = trt.volume(self.engine.get_binding_shape(binding)) * \
                       self.engine.max_batch_size
                # Note: host_mem must be page-locked (pinned) memory,
                # which the asynchronous copies in detect() rely on
                host_mem = cuda.pagelocked_empty(size, np.float32)
                cuda_mem = cuda.mem_alloc(host_mem.nbytes)
                self.bindings.append(int(cuda_mem))
                if self.engine.binding_is_input(binding):
                    self.host_inputs.append(host_mem)
                    self.cuda_inputs.append(cuda_mem)
                else:
                    self.host_outputs.append(host_mem)
                    self.cuda_outputs.append(cuda_mem)
            return self.engine.create_execution_context()

        # Initialize the engine
        def __init__(self, model, input_shape, output_layout=7):
            self.model = model
            self.input_shape = input_shape
            self.output_layout = output_layout
            self.trt_logger = trt.Logger(trt.Logger.INFO)
            self._load_plugins()
            self.engine = self._load_engine()

            self.host_inputs = []
            self.cuda_inputs = []
            self.host_outputs = []
            self.cuda_outputs = []
            self.bindings = []
            self.stream = cuda.Stream()
            self.context = self._create_context()

        # Release the CUDA stream and the GPU memory
        def __del__(self):
            del self.stream
            del self.cuda_outputs
            del self.cuda_inputs

        # Run inference with the execution context
        def detect(self, img, conf_th=0.3):
            img_resized = _preprocess_trt(img, self.input_shape)
            np.copyto(self.host_inputs[0], img_resized.ravel())
            # Copy the preprocessed image from CPU memory to GPU memory
            cuda.memcpy_htod_async(
                self.cuda_inputs[0], self.host_inputs[0], self.stream)
            # Run the inference task
            self.context.execute_async(
                batch_size=1,
                bindings=self.bindings,
                stream_handle=self.stream.handle)
            # Copy the inference results from GPU memory to CPU memory
            cuda.memcpy_dtoh_async(
                self.host_outputs[1], self.cuda_outputs[1], self.stream)
            cuda.memcpy_dtoh_async(
                self.host_outputs[0], self.cuda_outputs[0], self.stream)
            self.stream.synchronize()

            output = self.host_outputs[0]
            return _postprocess_trt(img, output, conf_th, self.output_layout)

These three parts differ from one neural network to another. For the YOLO equivalents, the open-source project https://github.com/jkjung-avt/tensorrt_demos is recommended; it covers YOLOv3 and YOLOv4 in detail.

The complete open-source code for this article, together with the accompanying tools, can be downloaded from this link:

https://pan.baidu.com/s/1fGLBnzqtnRNpfD3PbileOA (password: 99et)

Reviewing editor: 李倩
