Problem description

I want to train my own model with yolov5. I took a downloaded open-source dataset, re-annotated it to my own requirements, and now need to split it.

Problem analysis

Splitting a dataset comes down to two things: first shuffle the dataset, then divide it into a training set, a validation set and a test set by a fixed ratio. The ratio I chose is 7:1:2.

Steps

1. Shuffle the dataset

The dataset consists of images and annotation files, so the two kinds of files have to be bound together before shuffling. After reading the file names, bind them with the zip function. os.listdir does not guarantee any particular order, so both listings are sorted first; this keeps each image lined up with its annotation file only if the two share the same base name and the folders contain nothing else.

    each_class_image = []
    each_class_label = []
    # Sort both listings so that image i and label i belong together
    # (assumes matching base names, e.g. 0001.jpg and 0001.xml).
    for image in sorted(os.listdir(file_path)):
        each_class_image.append(image)
    for label in sorted(os.listdir(xml_path)):
        each_class_label.append(label)
    data = list(zip(each_class_image, each_class_label))

Then shuffle the pairs and separate them back into two sequences:

    random.shuffle(data)
    each_class_image, each_class_label = zip(*data)

2. Split the two sequences by the chosen ratio

Three lists hold the images and three hold the annotation files. Here total is the number of images (total = len(each_class_image) in the complete code). Because the same indices are used for both sequences, every image stays in the same subset as its annotation file.

    train_images = each_class_image[0:int(train_rate * total)]
    val_images = each_class_image[int(train_rate * total):int((train_rate + val_rate) * total)]
    test_images = each_class_image[int((train_rate + val_rate) * total):]
    train_labels = each_class_label[0:int(train_rate * total)]
    val_labels = each_class_label[int(train_rate * total):int((train_rate + val_rate) * total)]
    test_labels = each_class_label[int((train_rate + val_rate) * total):]

3. Create the folders locally and save each subset

Each loop builds the target folder (creating it if it does not exist) and copies the files into it. Once the loops finish, the split dataset is saved.

    for image in train_images:
        # print(image)
        old_path = file_path + '/' + image
        new_path1 = new_file_path + '/' + 'train' + '/' + 'images'
        if not os.path.exists(new_path1):
            os.makedirs(new_path1)
        new_path = new_path1 + '/' + image
        shutil.copy(old_path, new_path)

    for label in train_labels:
        # print(label)
        old_path = xml_path + '/' + label
        new_path1 = new_file_path + '/' + 'train' + '/' + 'labels'
        if not os.path.exists(new_path1):
            os.makedirs(new_path1)
        new_path = new_path1 + '/' + label
        shutil.copy(old_path, new_path)

    for image in val_images:
        old_path = file_path + '/' + image
        new_path1 = new_file_path + '/' + 'val' + '/' + 'images'
        if not os.path.exists(new_path1):
            os.makedirs(new_path1)
        new_path = new_path1 + '/' + image
        shutil.copy(old_path, new_path)

    for label in val_labels:
        old_path = xml_path + '/' + label
        new_path1 = new_file_path + '/' + 'val' + '/' + 'labels'
        if not os.path.exists(new_path1):
            os.makedirs(new_path1)
        new_path = new_path1 + '/' + label
        shutil.copy(old_path, new_path)

    for image in test_images:
        old_path = file_path + '/' + image
        new_path1 = new_file_path + '/' + 'test' + '/' + 'images'
        if not os.path.exists(new_path1):
            os.makedirs(new_path1)
        new_path = new_path1 + '/' + image
        shutil.copy(old_path, new_path)

    for label in test_labels:
        old_path = xml_path + '/' + label
        new_path1 = new_file_path + '/' + 'test' + '/' + 'labels'
        if not os.path.exists(new_path1):
            os.makedirs(new_path1)
        new_path = new_path1 + '/' + label
        shutil.copy(old_path, new_path)

Results

Just run the single Python file. When it finishes, check the output folders locally: the images and annotation files are in shuffled order and still correspond one to one.

Complete code

    import os
    import shutil
    import random

    # Fixed seed so the split is reproducible between runs.
    random.seed(0)


    def split_data(file_path, xml_path, new_file_path, train_rate, val_rate, test_rate):
        # test_rate is accepted for readability only: the test set is
        # whatever remains after the train and val slices.
        each_class_image = []
        each_class_label = []
        # Sort both listings so that image i and label i belong together
        # (assumes matching base names, e.g. 0001.jpg and 0001.xml).
        for image in sorted(os.listdir(file_path)):
            each_class_image.append(image)
        for label in sorted(os.listdir(xml_path)):
            each_class_label.append(label)
        data = list(zip(each_class_image, each_class_label))
        total = len(each_class_image)

        # Shuffle the (image, label) pairs, then unzip them again.
        random.shuffle(data)
        each_class_image, each_class_label = zip(*data)

        # Slice both sequences at the same indices.
        train_images = each_class_image[0:int(train_rate * total)]
        val_images = each_class_image[int(train_rate * total):int((train_rate + val_rate) * total)]
        test_images = each_class_image[int((train_rate + val_rate) * total):]
        train_labels = each_class_label[0:int(train_rate * total)]
        val_labels = each_class_label[int(train_rate * total):int((train_rate + val_rate) * total)]
        test_labels = each_class_label[int((train_rate + val_rate) * total):]

        for image in train_images:
            print(image)
            old_path = file_path + '/' + image
            new_path1 = new_file_path + '/' + 'train' + '/' + 'images'
            if not os.path.exists(new_path1):
                os.makedirs(new_path1)
            new_path = new_path1 + '/' + image
            shutil.copy(old_path, new_path)

        for label in train_labels:
            print(label)
            old_path = xml_path + '/' + label
            new_path1 = new_file_path + '/' + 'train' + '/' + 'labels'
            if not os.path.exists(new_path1):
                os.makedirs(new_path1)
            new_path = new_path1 + '/' + label
            shutil.copy(old_path, new_path)

        for image in val_images:
            old_path = file_path + '/' + image
            new_path1 = new_file_path + '/' + 'val' + '/' + 'images'
            if not os.path.exists(new_path1):
                os.makedirs(new_path1)
            new_path = new_path1 + '/' + image
            shutil.copy(old_path, new_path)

        for label in val_labels:
            old_path = xml_path + '/' + label
            new_path1 = new_file_path + '/' + 'val' + '/' + 'labels'
            if not os.path.exists(new_path1):
                os.makedirs(new_path1)
            new_path = new_path1 + '/' + label
            shutil.copy(old_path, new_path)

        for image in test_images:
            old_path = file_path + '/' + image
            new_path1 = new_file_path + '/' + 'test' + '/' + 'images'
            if not os.path.exists(new_path1):
                os.makedirs(new_path1)
            new_path = new_path1 + '/' + image
            shutil.copy(old_path, new_path)

        for label in test_labels:
            old_path = xml_path + '/' + label
            new_path1 = new_file_path + '/' + 'test' + '/' + 'labels'
            if not os.path.exists(new_path1):
                os.makedirs(new_path1)
            new_path = new_path1 + '/' + label
            shutil.copy(old_path, new_path)


    if __name__ == '__main__':
        file_path = "D:/Files/dataSet/drone_images"
        xml_path = 'D:/Files/dataSet/drone_labels'
        new_file_path = "D:/Files/dataSet/droneData"
        split_data(file_path, xml_path, new_file_path, train_rate=0.7, val_rate=0.1, test_rate=0.2)
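A possible variation, not part of the script above: a minimal sketch that pairs each image with its annotation file by file-name stem instead of by position in the directory listing, and uses round() rather than int() for the cut points. It assumes that an image and its annotation share the same base name (for example 0001.jpg and 0001.xml), which this post does not state. The helper names pair_by_stem, split_pairs and copy_splits are hypothetical. The round() choice is there because 0.7 + 0.1 evaluates to 0.7999999999999999 in floating point, so truncating with int() can leave the validation set one item short.

```python
import random
import shutil
from pathlib import Path


def pair_by_stem(image_dir, label_dir):
    """Return (image, label) path pairs whose file names share the same stem."""
    labels = {p.stem: p for p in Path(label_dir).iterdir() if p.is_file()}
    pairs = []
    for img in sorted(Path(image_dir).iterdir()):
        # Images without a matching annotation file are skipped.
        if img.is_file() and img.stem in labels:
            pairs.append((img, labels[img.stem]))
    return pairs


def split_pairs(pairs, train_rate=0.7, val_rate=0.1, seed=0):
    """Shuffle the pairs and cut them into train/val/test subsets."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)  # local RNG, reproducible for a given seed
    total = len(pairs)
    train_end = round(train_rate * total)
    val_end = round((train_rate + val_rate) * total)
    return {
        "train": pairs[:train_end],
        "val": pairs[train_end:val_end],
        "test": pairs[val_end:],  # everything left over
    }


def copy_splits(splits, out_dir):
    """Copy each subset into out_dir/<subset>/images and out_dir/<subset>/labels."""
    for name, items in splits.items():
        img_out = Path(out_dir) / name / "images"
        lbl_out = Path(out_dir) / name / "labels"
        img_out.mkdir(parents=True, exist_ok=True)
        lbl_out.mkdir(parents=True, exist_ok=True)
        for img, lbl in items:
            shutil.copy(img, img_out / img.name)
            shutil.copy(lbl, lbl_out / lbl.name)
```

With the paths from the post, this would be called as copy_splits(split_pairs(pair_by_stem(file_path, xml_path)), new_file_path). Because each pair travels through the shuffle as a single tuple, an image and its annotation always end up in the same subset.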