TensorRT Installation and Usage (Python)

References:
- Official installation guide: Installation Guide :: NVIDIA Deep Learning TensorRT Documentation
- Video tutorial: TensorRT教程 | 基于8.6.1版本 | 第一部分_哔哩哔哩_bilibili
- Code samples: trt-samples-for-hackathon-cn/cookbook at master · NVIDIA/trt-samples-for-hackathon-cn (github.com)

Installing TensorRT

Official guide: Installation Guide :: NVIDIA Deep Learning TensorRT Documentation

There are three main ways to install TensorRT:
1. install with pip;
2. download a tar/zip/deb package;
3. use a Docker container: see the TensorRT Container Release Notes.

Windows

First pick a TensorRT version that matches your NVIDIA driver, CUDA version, and cuDNN version. My setup: CUDA 11.4, cuDNN 11.4. On Windows I recommend installing from the zip package; reference: windows安装tensorrt - 知乎 (zhihu.com).

Ubuntu

Again, pick a TensorRT version that matches your NVIDIA driver, CUDA version, and cuDNN version. My setup: CUDA 11.7, cuDNN 8.9.0.

1. Install with pip:

```
pip install tensorrt==8.6.1
```

This failed on my machine.

2. Install from a deb package:

```
os="ubuntuxx04"
tag="8.x.x-cuda-x.x"
sudo dpkg -i nv-tensorrt-local-repo-${os}-${tag}_1.0-1_amd64.deb
sudo cp /var/nv-tensorrt-local-repo-${os}-${tag}/*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get install tensorrt
```

This also failed for me.

3. Install from the tar package (recommended)

I recommend this method; it has the highest success rate. Download the matching version from developer.nvidia.com/tensorrt-download, then:

```
tar -xzvf TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-11.8.tar.gz  # extract the archive

# add the lib directory to the library search path
vim ~/.bashrc
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:./TensorRT-8.6.1.6/lib
source ~/.bashrc
# or copy the libraries straight into cuda/lib64
cp -r ./lib/* /usr/local/cuda/lib64/

# install the Python package
cd TensorRT-8.6.1.6/python
pip install tensorrt-xxx-none-linux_x86_64.whl
```

After downloading, verify the installation:

```
# verify that the install succeeded:
python
>>> import tensorrt
>>> print(tensorrt.__version__)
>>> assert tensorrt.Builder(tensorrt.Logger())
```

If nothing raises an error, the installation succeeded.

Usage

My workflow is pytorch -> onnx -> tensorrt, using resnet18 as the example model.

PyTorch to ONNX

Install onnx and onnxruntime (one of the two runtime builds is enough):

```
pip install onnx
pip install onnxruntime
pip install onnxruntime-gpu  # GPU build
```

Convert the PyTorch model to an ONNX model:

```python
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=False)
device = 'cuda' if torch.cuda.is_available() else 'cpu'  # is_available() must be called
dummy_input = torch.randn(1, 3, 224, 224, device=device)
model.to(device)
model.eval()
output = model(dummy_input)
print("pytorch result:", torch.argmax(output))

import torch.onnx
torch.onnx.export(
    model, dummy_input, './model.onnx',
    input_names=["input"], output_names=["output"],
    do_constant_folding=True, verbose=True,
    keep_initializers_as_inputs=True, opset_version=14,
    dynamic_axes={"input": {0: "nBatchSize"}, "output": {0: "nBatchSize"}})

# general form:
# torch.onnx.export(model, torch.randn(1, c, nHeight, nWidth, device="cuda"), './model.onnx',
#     input_names=["x"], output_names=["y", "z"],
#     do_constant_folding=True, verbose=True, keep_initializers_as_inputs=True,
#     opset_version=14, dynamic_axes={"x": {0: "nBatchSize"}, "z": {0: "nBatchSize"}})
```

Validate the exported model and run it with ONNX Runtime:

```python
import onnx
import numpy as np
import onnxruntime as ort

model_onnx_path = './model.onnx'

# check that the exported model is well-formed
onnx_model = onnx.load(model_onnx_path)
onnx.checker.check_model(onnx_model)

# create an ONNX Runtime session
ort_session = ort.InferenceSession(model_onnx_path,
                                   providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])

# prepare the input (dummy_input comes from the export script above)
input_data = {'input': dummy_input.cpu().numpy()}

# run inference
y_pred_onnx = ort_session.run(None, input_data)
print("onnx result:", np.argmax(y_pred_onnx[0]))
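```

Comparing only the argmax can mask numerical drift between the two runtimes. A minimal sketch of a stricter check, assuming `model`, `dummy_input`, and `ort_session` from the scripts above are still in scope (the tolerances are illustrative, not values from the original post):

```python
import numpy as np
import torch

# compare the full output tensors, not just the argmax
with torch.no_grad():
    y_torch = model(dummy_input).cpu().numpy()
y_onnx = ort_session.run(None, {'input': dummy_input.cpu().numpy()})[0]

print("max abs diff:", np.max(np.abs(y_torch - y_onnx)))
# loose enough to absorb FP32 graph-optimization differences
np.testing.assert_allclose(y_torch, y_onnx, rtol=1e-3, atol=1e-5)
```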
ONNX to TensorRT

On Windows (zip install), use TensorRT-8.6.1.6/bin/trtexec.exe to build the TensorRT engine file; on Ubuntu (tar install), use TensorRT-8.6.1.6/bin/trtexec:

```
./trtexec --onnx=model.onnx --saveEngine=model.trt --fp16 --workspace=16 --shapes=input:2x3x224x224
```

Parameters:
--fp16: build the engine with FP16 enabled
--shapes: the input shape. TensorRT also supports dynamic batch sizes, which are worth experimenting with; see the Python build sketch after this section.
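The same engine can also be built from Python instead of trtexec. A minimal sketch using the TensorRT Python API with an optimization profile for the dynamic batch axis; the min/opt/max shapes and the 1 GiB workspace are this sketch's assumptions, not values from the original post:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# parse the ONNX file exported above
with open('./model.onnx', 'rb') as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse model.onnx")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # same effect as trtexec --fp16
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB (assumption)

# optimization profile: min/opt/max shapes for the dynamic "nBatchSize" axis
profile = builder.create_optimization_profile()
profile.set_shape("input", (1, 3, 224, 224), (2, 3, 224, 224), (8, 3, 224, 224))
config.add_optimization_profile(profile)

engine_bytes = builder.build_serialized_network(network, config)
if engine_bytes is None:
    raise RuntimeError("engine build failed")
with open('./model.trt', 'wb') as f:
    f.write(engine_bytes)
```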
Using TensorRT

NVIDIA's official examples: trt-samples-for-hackathon-cn/cookbook at master · NVIDIA/trt-samples-for-hackathon-cn (github.com)

Print the converted engine's binding information:

```python
import tensorrt as trt

# load the TensorRT engine
logger = trt.Logger(trt.Logger.INFO)
with open('./model.trt', "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

for idx in range(engine.num_bindings):
    name = engine.get_tensor_name(idx)
    is_input = engine.get_tensor_mode(name)
    op_type = engine.get_tensor_dtype(name)
    shape = engine.get_tensor_shape(name)
    print('input id:', idx, '\tis input:', is_input, '\tbinding name:', name,
          '\tshape:', shape, '\ttype:', op_type)
```

Benchmark the converted TensorRT engine; the code is adapted from NVIDIA's cookbook/08-Advance/MultiStream/main.py:

```python
from time import time

import numpy as np
import tensorrt as trt
from cuda import cudart  # install with: pip install cuda-python

np.random.seed(31193)
nWarmUp = 10
nTest = 30
nB, nC, nH, nW = 1, 3, 224, 224
data = dummy_input.cpu().numpy()  # dummy_input comes from the export script above

def run1(engine):
    input_name = engine.get_tensor_name(0)
    output_name = engine.get_tensor_name(1)
    output_type = engine.get_tensor_dtype(output_name)
    # a dynamic engine reports -1 here; query the context after set_input_shape instead
    output_shape = engine.get_tensor_shape(output_name)
    context = engine.create_execution_context()
    context.set_input_shape(input_name, [nB, nC, nH, nW])
    _, stream = cudart.cudaStreamCreate()

    inputH0 = np.ascontiguousarray(data.reshape(-1))
    outputH0 = np.empty(output_shape, dtype=trt.nptype(output_type))
    _, inputD0 = cudart.cudaMallocAsync(inputH0.nbytes, stream)
    _, outputD0 = cudart.cudaMallocAsync(outputH0.nbytes, stream)

    # do a complete inference
    cudart.cudaMemcpyAsync(inputD0, inputH0.ctypes.data, inputH0.nbytes,
                           cudart.cudaMemcpyKind.cudaMemcpyHostToDevice, stream)
    context.execute_async_v2([int(inputD0), int(outputD0)], stream)
    cudart.cudaMemcpyAsync(outputH0.ctypes.data, outputD0, outputH0.nbytes,
                           cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost, stream)
    cudart.cudaStreamSynchronize(stream)

    # count time of memory copy from host to device
    for i in range(nWarmUp):
        cudart.cudaMemcpyAsync(inputD0, inputH0.ctypes.data, inputH0.nbytes,
                               cudart.cudaMemcpyKind.cudaMemcpyHostToDevice, stream)
    trtTimeStart = time()
    for i in range(nTest):
        cudart.cudaMemcpyAsync(inputD0, inputH0.ctypes.data, inputH0.nbytes,
                               cudart.cudaMemcpyKind.cudaMemcpyHostToDevice, stream)
    cudart.cudaStreamSynchronize(stream)
    trtTimeEnd = time()
    print("%6.3fms - 1 stream, DataCopyHtoD" % ((trtTimeEnd - trtTimeStart) / nTest * 1000))

    # count time of inference
    for i in range(nWarmUp):
        context.execute_async_v2([int(inputD0), int(outputD0)], stream)
    trtTimeStart = time()
    for i in range(nTest):
        context.execute_async_v2([int(inputD0), int(outputD0)], stream)
    cudart.cudaStreamSynchronize(stream)
    trtTimeEnd = time()
    print("%6.3fms - 1 stream, Inference" % ((trtTimeEnd - trtTimeStart) / nTest * 1000))

    # count time of memory copy from device to host
    for i in range(nWarmUp):
        cudart.cudaMemcpyAsync(outputH0.ctypes.data, outputD0, outputH0.nbytes,
                               cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost, stream)
    trtTimeStart = time()
    for i in range(nTest):
        cudart.cudaMemcpyAsync(outputH0.ctypes.data, outputD0, outputH0.nbytes,
                               cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost, stream)
    cudart.cudaStreamSynchronize(stream)
    trtTimeEnd = time()
    print("%6.3fms - 1 stream, DataCopyDtoH" % ((trtTimeEnd - trtTimeStart) / nTest * 1000))

    # count time of end to end
    for i in range(nWarmUp):
        context.execute_async_v2([int(inputD0), int(outputD0)], stream)
    trtTimeStart = time()
    for i in range(nTest):
        cudart.cudaMemcpyAsync(inputD0, inputH0.ctypes.data, inputH0.nbytes,
                               cudart.cudaMemcpyKind.cudaMemcpyHostToDevice, stream)
        context.execute_async_v2([int(inputD0), int(outputD0)], stream)
        cudart.cudaMemcpyAsync(outputH0.ctypes.data, outputD0, outputH0.nbytes,
                               cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost, stream)
    cudart.cudaStreamSynchronize(stream)
    trtTimeEnd = time()
    print("%6.3fms - 1 stream, DataCopy + Inference" % ((trtTimeEnd - trtTimeStart) / nTest * 1000))

    cudart.cudaStreamDestroy(stream)
    cudart.cudaFree(inputD0)
    cudart.cudaFree(outputD0)
    print("tensorrt result:", np.argmax(outputH0))

if __name__ == "__main__":
    cudart.cudaDeviceSynchronize()
    f = open("./model.trt", "rb")                          # read the serialized trt engine
    runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))  # create a Runtime (pass in a Logger)
    engine = runtime.deserialize_cuda_engine(f.read())     # deserialize the engine from the file
    run1(engine)                                           # do inference with a single stream
    print(dummy_input.shape, dummy_input.dtype)
```
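As a final sanity check, the engine's output can be compared against ONNX Runtime on the same input. A minimal sketch, assuming `run1` above is modified to end with `return outputH0`, and that `engine`, `ort_session`, and `dummy_input` from the earlier scripts are still in scope; the loose tolerance is an assumption to account for the FP16 engine:

```python
import numpy as np

# y_trt: host output buffer from the engine (outputH0, assuming run1 returns it)
# y_onnx: reference output from the ONNX Runtime session built earlier
y_trt = run1(engine)
y_onnx = ort_session.run(None, {'input': dummy_input.cpu().numpy()})[0]

print("max abs diff:", np.max(np.abs(y_trt - y_onnx)))
assert np.allclose(y_trt, y_onnx, rtol=1e-2, atol=1e-2), "TensorRT and ONNX outputs diverge"
print("argmax agrees:", np.argmax(y_trt) == np.argmax(y_onnx))
```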