Python Cool Libraries Tour: Third-Party Library Pandas (061)

无根大树 · Posted on 2024-9-10 02:24:52

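Contents

I. Usage in Detail
    236. pandas.Series.explode: 236-1 Syntax, 236-2 Parameters, 236-3 Function, 236-4 Return value, 236-5 Notes, 236-6 Usage (236-6-1 Data preparation, 236-6-2 Code example, 236-6-3 Output)
    237. pandas.Series.searchsorted: 237-1 Syntax, 237-2 Parameters, 237-3 Function, 237-4 Return value, 237-5 Notes, 237-6 Usage (237-6-1 Data preparation, 237-6-2 Code example, 237-6-3 Output)
    238. pandas.Series.ravel: 238-1 Syntax, 238-2 Parameters, 238-3 Function, 238-4 Return value, 238-5 Notes, 238-6 Usage (238-6-1 Data preparation, 238-6-2 Code example, 238-6-3 Output)
    239. pandas.Series.repeat: 239-1 Syntax, 239-2 Parameters, 239-3 Function, 239-4 Return value, 239-5 Notes, 239-6 Usage (239-6-1 Data preparation, 239-6-2 Code example, 239-6-3 Output)
    240. pandas.Series.squeeze: 240-1 Syntax, 240-2 Parameters, 240-3 Function, 240-4 Return value, 240-5 Notes, 240-6 Usage (240-6-1 Data preparation, 240-6-2 Code example, 240-6-3 Output)
II. Recommended Reading
    1. Python Foundations Tour
    2. Python Functions Tour
    3. Python Algorithms Tour
    4. Python Magic Tour
    5. Blog home page

I. Usage in Detail

236. pandas.Series.explode

236-1. Syntax

    # 236. pandas.Series.explode
    pandas.Series.explode(ignore_index=False)
    Transform each element of a list-like to a row.

    Parameters:
    ignore_index : bool, default False
        If True, the resulting index will be labeled 0, 1, …, n - 1.

    Returns:
    Series
        Exploded lists to rows; index will be duplicated for these rows.

236-2. Parameters

236-2-1. ignore_index (optional, default False): Boolean. If False, the exploded Series keeps the original index, so each label is repeated once for every element of its list. If True, the original index is discarded and the result is labeled with a new integer index 0, 1, …, n - 1.

236-3. Function

Expands a Series whose elements are lists, tuples or other list-like objects so that every element gets its own row. In short, it turns a Series of lists into a flat Series in which each list element occupies one row.

236-4. Return value

Returns a new Series. Its index is either the original index (ignore_index=False) or a newly generated integer index (ignore_index=True). Every item of a list-like element becomes a new row; an element that is not list-like is left unchanged, and an empty list-like produces a single NaN row.

236-5. Notes

Use cases:

236-5-1. Nested list data: data imported from JSON, a database or another source often has a list nested inside a single cell. explode() expands such lists into separate rows for further analysis. Example: e-commerce order data in which each order contains several items.

236-5-2. Data cleaning and preprocessing: during cleaning, several values stored in one cell often have to be split into several rows before further processing. Example: user tag data in which each user may have several tags.

236-5-3. Text analysis: in natural language processing, text is usually split into words or phrases that are then analysed individually. explode() expands the tokenized lists into one row per token. Example: tokenized text data.

236-5-4. Time series data: a single timestamp may correspond to several events or values. explode() expands such multi-valued timestamps into one row per event. Example: several events at one point in time.

236-6. Usage

236-6-1. Data preparation

None

236-6-2. Code example

    # 236. pandas.Series.explode
    # 236-1. Nested list data
    import pandas as pd
    # Sample data
    orders = pd.Series([['item1', 'item2'], ['item3'], ['item4', 'item5', 'item6']])
    # Expand the item lists with explode
    exploded_orders = orders.explode()
    print(exploded_orders, end='\n\n')

    # 236-2. Data cleaning and preprocessing
    import pandas as pd
    # Sample data
    user_tags = pd.Series([['tag1', 'tag2'], ['tag3'], ['tag4', 'tag5', 'tag6']])
    # Expand the tag lists with explode
    exploded_tags = user_tags.explode()
    print(exploded_tags, end='\n\n')

    # 236-3. Text analysis
    import pandas as pd
    # Sample data
    texts = pd.Series([['word1', 'word2', 'word3'], ['word4'], ['word5', 'word6']])
    # Expand the tokenized lists with explode
    exploded_texts = texts.explode()
    print(exploded_texts, end='\n\n')

    # 236-4. Time series data
    import pandas as pd
    # Sample data
    time_series = pd.Series([['event1', 'event2'], ['event3'], ['event4', 'event5', 'event6']])
    # Expand the per-timestamp event lists with explode
    exploded_time_series = time_series.explode()
    print(exploded_time_series)

236-6-3. Output

    # 236. pandas.Series.explode
    # 236-1. Nested list data
    # 0    item1
    # 0    item2
    # 1    item3
    # 2    item4
    # 2    item5
    # 2    item6
    # dtype: object

    # 236-2. Data cleaning and preprocessing
    # 0    tag1
    # 0    tag2
    # 1    tag3
    # 2    tag4
    # 2    tag5
    # 2    tag6
    # dtype: object

    # 236-3. Text analysis
    # 0    word1
    # 0    word2
    # 0    word3
    # 1    word4
    # 2    word5
    # 2    word6
    # dtype: object

    # 236-4. Time series data
    # 0    event1
    # 0    event2
    # 1    event3
    # 2    event4
    # 2    event5
    # 2    event6
    # dtype: object

237. pandas.Series.searchsorted

237-1. Syntax

    # 237. pandas.Series.searchsorted
    pandas.Series.searchsorted(value, side='left', sorter=None)
    Find indices where elements should be inserted to maintain order.

    Find the indices into a sorted Series self such that, if the corresponding elements in value were inserted before the indices, the order of self would be preserved.

    Note
    The Series must be monotonically sorted, otherwise wrong locations will likely be returned. Pandas does not check this for you.

    Parameters:
    value : array-like or scalar
        Values to insert into self.
    side : {'left', 'right'}, optional
        If 'left', the index of the first suitable location found is given. If 'right', return the last such index. If there is no suitable index, return either 0 or N (where N is the length of self).
    sorter : 1-D array-like, optional
        Optional array of integer indices that sort self into ascending order. They are typically the result of np.argsort.

    Returns:
    int or array of int
        A scalar or array of insertion points with the same shape as value.

237-2. Parameters

237-2-1. value (required): Scalar or array-like; the value or values whose insertion positions are looked up.

237-2-2. side (optional, default 'left'): {'left', 'right'}. Decides where the insertion point goes when elements equal to value already exist: 'left' returns the position before the first equal element, 'right' returns the position after the last equal element.

237-2-3. sorter (optional, default None): Optional array-like of integer positions that would sort the Series into ascending order, typically the result of argsort. It allows searching a Series that is not itself sorted.

237-3. Function

Finds the positions at which a value or a set of values should be inserted into a sorted Series so that the order is preserved. The lookup is a binary search, which makes the method useful for sorted lookups, ordered insertion and locating positions.

237-4. Return value

Returns an integer or an array of integers giving the insertion positions.

237-5. Notes

None

237-6. Usage

237-6-1. Data preparation

None

237-6-2. Code example

    # 237. pandas.Series.searchsorted
    # 237-1. Basic usage
    import pandas as pd
    # Create a sorted Series
    s = pd.Series([1, 2, 3, 4, 5])
    # Find the insertion position of the value
    index = s.searchsorted(3)
    print(index, end='\n\n')

    # 237-2. Using the 'side' parameter
    import pandas as pd
    # Create a sorted Series
    s = pd.Series([1, 2, 3, 3, 4, 5])
    # Insertion position to the left of the equal elements
    index_left = s.searchsorted(3, side='left')
    print(index_left)
    # Insertion position to the right of the equal elements
    index_right = s.searchsorted(3, side='right')
    print(index_right, end='\n\n')

    # 237-3. Handling an unsorted Series
    import pandas as pd
    # Create an unsorted Series
    s = pd.Series([5, 1, 4, 2, 3])
    # Get the positions that would sort the Series
    sorter = s.argsort()
    # Find the insertion position relative to the sorted order
    index = s.searchsorted(3, sorter=sorter)
    print(index)

237-6-3. Output

    # 237. pandas.Series.searchsorted
    # 237-1. Basic usage
    # 2

    # 237-2. Using the 'side' parameter
    # 2
    # 4

    # 237-3. Handling an unsorted Series
    # 2

238. pandas.Series.ravel

238-1. Syntax

    # 238. pandas.Series.ravel
    pandas.Series.ravel(order='C')
    Return the flattened underlying data as an ndarray or ExtensionArray.

    Deprecated since version 2.2.0: Series.ravel is deprecated. The underlying array is already 1D, so ravel is not necessary. Use to_numpy() for conversion to a numpy array instead.

    Returns:
    numpy.ndarray or ExtensionArray
        Flattened data of the Series.

238-2. Parameters

238-2-1. order (optional, default 'C'): String, one of:

'C': flatten in C-style row-major order (rows are read first, then columns).
'F': flatten in Fortran-style column-major order (columns are read first, then rows).
'A': flatten in row-major order if the data is stored row-major in memory, and in column-major order if it is stored column-major.
'K': keep the order in which the elements occur in memory as far as possible.

Because the data of a Series is already one-dimensional, all four options give the same result.

238-3. Function

Returns the data of a Series as a flat one-dimensional array.

238-4. Return value

Returns a one-dimensional NumPy array (or an ExtensionArray for extension dtypes) containing all the data of the original Series.

238-5. Notes

The method still works in the version used here, but it has been deprecated since pandas 2.2.0 and emits a FutureWarning; use pandas.Series.to_numpy instead.

238-6. Usage

238-6-1. Data preparation

None

238-6-2. Code example

    # 238. pandas.Series.ravel
    import pandas as pd
    # Create a pandas Series
    data = pd.Series([1, 2, 3, 4, 5])
    # Call ravel() (emits a FutureWarning on pandas 2.2.0 and later)
    flattened_data_C = data.ravel(order='C')
    flattened_data_F = data.ravel(order='F')
    print("Flattened data (C order):", flattened_data_C)
    print("Flattened data (F order):", flattened_data_F)

238-6-3. Output

    # 238. pandas.Series.ravel
    # Flattened data (C order): [1 2 3 4 5]
    # Flattened data (F order): [1 2 3 4 5]

239. pandas.Series.repeat

239-1. Syntax

    # 239. pandas.Series.repeat
    pandas.Series.repeat(repeats, axis=None)
    Repeat elements of a Series.

    Returns a new Series where each element of the current Series is repeated consecutively a given number of times.

    Parameters:
    repeats : int or array of ints
        The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty Series.
    axis : None
        Unused. Parameter needed for compatibility with DataFrame.

    Returns:
    Series
        Newly created Series with repeated elements.

239-2. Parameters

239-2-1. repeats (required): Integer or array of integers. With a single integer, every element of the Series is repeated that many times. With an integer array of the same length as the Series, each element is repeated according to the integer at the corresponding position.

239-2-2. axis (optional, default None): Unused. A Series is one-dimensional, so the parameter has no effect and exists only for compatibility with DataFrame.

239-3. Function

Repeats each element of a Series a given number of times. This is useful for expanding data or increasing its volume.

239-4. Return value

Returns a new pandas Series in which each element is repeated the specified number of times. The index labels are repeated along with the values.

239-5. Notes

None

239-6. Usage

239-6-1. Data preparation

None

239-6-2. Code example

    # 239. pandas.Series.repeat
    import pandas as pd
    # Create a pandas Series
    data = pd.Series([1, 2, 3])
    # Repeat every element 3 times
    repeated_data_1 = data.repeat(3)
    # Repeat each element according to the given array
    repeated_data_2 = data.repeat([1, 2, 3])
    print("Repeated data (3 times):")
    print(repeated_data_1)
    print("\nRepeated data (1, 2, 3 times respectively):")
    print(repeated_data_2)

239-6-3. Output

    # 239. pandas.Series.repeat
    # Repeated data (3 times):
    # 0    1
    # 0    1
    # 0    1
    # 1    2
    # 1    2
    # 1    2
    # 2    3
    # 2    3
    # 2    3
    # dtype: int64
    #
    # Repeated data (1, 2, 3 times respectively):
    # 0    1
    # 1    2
    # 1    2
    # 2    3
    # 2    3
    # 2    3
    # dtype: int64

240. pandas.Series.squeeze

240-1. Syntax

    # 240. pandas.Series.squeeze
    pandas.Series.squeeze(axis=None)
    Squeeze 1 dimensional axis objects into scalars.

    Series or DataFrames with a single element are squeezed to a scalar. DataFrames with a single column or a single row are squeezed to a Series. Otherwise the object is unchanged.

    This method is most useful when you don't know if your object is a Series or DataFrame, but you do know it has just a single column. In that case you can safely call squeeze to ensure you have a Series.

    Parameters:
    axis : {0 or 'index', 1 or 'columns', None}, default None
        A specific axis to squeeze. By default, all length-1 axes are squeezed. For Series this parameter is unused and defaults to None.

    Returns:
    DataFrame, Series, or scalar
        The projection after squeezing axis or all the axes.

240-2. Parameters

240-2-1. axis (optional, default None): {None, 0, 1}, with the options:

None: the default; every axis of length 1 is removed.
0 or 'index': squeeze the index axis if it holds only one value.
1 or 'columns': squeeze the column axis if it holds only one value.

For a Series the parameter is unused.

240-3. Function

Removes axes of length 1. It is commonly applied to a single column or single row extracted from a DataFrame, so that the result is a plain Series or scalar instead of a one-column or one-row DataFrame.

240-4. Return value

Returns the object with its length-1 axes removed: a scalar, a Series or a DataFrame. If no axis has length 1, the object is returned unchanged.

240-5. Notes

None

240-6. Usage

240-6-1. Data preparation

None

240-6-2. Code example

    # 240. pandas.Series.squeeze
    # 240-1. Extracting a single row or column from a DataFrame
    import pandas as pd
    # Create a DataFrame
    df = pd.DataFrame({'A': [10, 20, 30], 'B': [15, 25, 35]})
    # Extract a single column (still a DataFrame) and squeeze it
    single_column = df[['A']]
    squeezed_column = single_column.squeeze()
    # Extract a single row (still a DataFrame) and squeeze it
    single_row = df.iloc[[0]]
    squeezed_row = single_row.squeeze()
    print("Original single column DataFrame:")
    print(single_column)
    print("Squeezed Series from single column:")
    print(squeezed_column)
    print("Original single row DataFrame:")
    print(single_row)
    print("Squeezed Series from single row:")
    print(squeezed_row, end='\n\n')

    # 240-2. Working with grouped data
    import pandas as pd
    # Create a DataFrame
    df = pd.DataFrame({'Category': ['A', 'A', 'B'], 'Value': [10, 20, 30]})
    # Group by 'Category' and compute the mean
    grouped = df.groupby('Category').mean()
    # Select one category (a 1x1 DataFrame) and squeeze it to a scalar
    single_category_mean = grouped.loc[['A']]
    squeezed_category_mean = single_category_mean.squeeze()
    print("Grouped mean DataFrame:")
    print(single_category_mean)
    print("Squeezed mean for single category:")
    print(squeezed_category_mean, end='\n\n')

    # 240-3. Memory usage (squeeze changes the shape, not the memory footprint)
    import pandas as pd
    # Create a large DataFrame
    large_df = pd.DataFrame({'Value': range(1000000)})
    # Extract a single column and squeeze it
    squeezed_series = large_df[['Value']].squeeze()
    # Compare memory usage; the two figures are identical
    print("Memory usage of original DataFrame:", large_df.memory_usage(deep=True).sum())
    print("Memory usage of squeezed Series:", squeezed_series.memory_usage(deep=True), end='\n\n')

    # 240-4. Interacting with functions
    import matplotlib.pyplot as plt
    # Define a plotting function that only accepts a Series
    def plot_series(series):
        series.plot(kind='line', title='Series Plot')
        plt.show()
    # Extract the data (df is the DataFrame from 240-2) and pass it to the function
    data = df[['Value']].iloc[0:3]  # single-column DataFrame
    plot_series(data.squeeze())

    # 240-5. Simplifying output
    # Compute the mean and squeeze the one-element Series to a scalar
    processed_result = df[['Value']].mean().squeeze()
    def display_result(result):
        print(f"Processed Result: {result}")
    display_result(processed_result)

    # 240-6. Data cleaning and conversion
    import pandas as pd
    # Create a DataFrame with a redundant dimension (one-element lists)
    redundant_df = pd.DataFrame({'Value': [[10], [20], [30]]})
    # Unwrap each one-element list with apply and squeeze
    cleaned_series = redundant_df['Value'].apply(lambda x: pd.Series(x).squeeze())
    print("Original DataFrame with redundant dimension:")
    print(redundant_df)
    print("Cleaned Series:")
    print(cleaned_series, end='\n\n')

    # 240-7. Mathematical and statistical calculations
    import pandas as pd
    # Create a DataFrame
    df = pd.DataFrame({'Value': [10, 20, 30]})
    # Compute the sum and squeeze it to a scalar
    total_sum = df[['Value']].sum().squeeze()
    print("Total sum of values:", total_sum)

240-6-3. Output

    # 240. pandas.Series.squeeze
    # 240-1. Extracting a single row or column from a DataFrame
    # Original single column DataFrame:
    #     A
    # 0  10
    # 1  20
    # 2  30
    # Squeezed Series from single column:
    # 0    10
    # 1    20
    # 2    30
    # Name: A, dtype: int64
    # Original single row DataFrame:
    #     A   B
    # 0  10  15
    # Squeezed Series from single row:
    # A    10
    # B    15
    # Name: 0, dtype: int64

    # 240-2. Working with grouped data
    # Grouped mean DataFrame:
    #           Value
    # Category
    # A          15.0
    # Squeezed mean for single category:
    # 15.0

    # 240-3. Memory usage
    # Memory usage of original DataFrame: 8000132
    # Memory usage of squeezed Series: 8000132

    # 240-4. Interacting with functions
    # See Figure 1

    # 240-5. Simplifying output
    # Processed Result: 20.0

    # 240-6. Data cleaning and conversion
    # Original DataFrame with redundant dimension:
    #   Value
    # 0  [10]
    # 1  [20]
    # 2  [30]
    # Cleaned Series:
    # 0    10
    # 1    20
    # 2    30
    # Name: Value, dtype: int64

    # 240-7. Mathematical and statistical calculations
    # Total sum of values: 60

Figure 1: line plot titled "Series Plot" produced by plot_series in 240-4 (image not available in this copy).

II. Recommended Reading

1. Python Foundations Tour
2. Python Functions Tour
3. Python Algorithms Tour
4. Python Magic Tour
5. Blog home page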
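Supplement to 236 (pandas.Series.explode): the parameter description in 236-2 covers ignore_index, but every example in 236-6 uses the default. The sketch below is a small illustration rather than part of the original post; the sample Series and its labels 'w' to 'z' are made up for the purpose. It compares the two index behaviours and shows what happens to a scalar and to an empty list.

```python
import pandas as pd

# Hypothetical sample data: a list, a plain scalar, an empty list and a tuple
s = pd.Series([['a', 'b'], 'c', [], ('d', 'e')], index=['w', 'x', 'y', 'z'])

# Default: the original labels are kept and repeated once per list element
kept = s.explode()

# ignore_index=True: the labels are replaced by 0, 1, ..., n - 1
reset = s.explode(ignore_index=True)

print(kept)
print(reset)
```

With the default, the label of each source row is repeated, which makes it easy to join the exploded values back to the original rows. ignore_index=True is the more convenient choice when the result is treated as a new flat sequence.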

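Supplement to 238 (pandas.Series.ravel): the deprecation notice quoted in 238-1 recommends to_numpy() as the replacement. A minimal sketch of that migration, assuming pandas 2.x or later and the same sample values as in 238-6-2; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, 2, 3, 4, 5])

# Replacement for data.ravel(): the Series is already 1-D,
# so to_numpy() returns the values as a flat NumPy array
arr = data.to_numpy()
print(type(arr).__name__, arr.ndim, arr)

# copy=True requests an independent array, so writing to it
# leaves the original Series untouched
copied = data.to_numpy(copy=True)
copied[0] = 99
print(data.iloc[0])
```

No order argument is needed, since a one-dimensional array has only one possible layout.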