Python Cool Libraries Tour: Third-Party Library Pandas (064)

通缉犯叫常大 · Posted on 2024-9-10 02:31:37

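Contents

I. Usage in Detail
251. pandas.Series.tz_localize method
251-1. Syntax
251-2. Parameters
251-3. Function
251-4. Return value
251-5. Notes
251-6. Usage
251-6-1. Data preparation
251-6-2. Code example
251-6-3. Output
252. pandas.Series.at_time method
252-1. Syntax
252-2. Parameters
252-3. Function
252-4. Return value
252-5. Notes
252-6. Usage
252-6-1. Data preparation
252-6-2. Code example
252-6-3. Output
253. pandas.Series.between_time method
253-1. Syntax
253-2. Parameters
253-3. Function
253-4. Return value
253-5. Notes
253-6. Usage
253-6-1. Data preparation
253-6-2. Code example
253-6-3. Output
254. pandas.Series.str method
254-1. Syntax
254-2. Parameters
254-3. Function
254-3-1. Changing case
254-3-2. String matching and searching
254-3-3. String replacement and stripping
254-3-4. String splitting and joining
254-3-5. Extracting and accessing substrings
254-3-6. Formatting and padding
254-3-7. Length and counting
254-4. Return value
254-5. Notes
254-6. Usage
254-6-1. Data preparation
254-6-2. Code example
254-6-3. Output
255. pandas.Series.cat method
255-1. Syntax
255-2. Parameters
255-3. Function
255-4. Return value
255-5. Notes
255-6. Usage
255-6-1. Data preparation
255-6-2. Code example
255-6-3. Output
II. Recommended Reading
1. Python Foundations Tour
2. Python Functions Tour
3. Python Algorithms Tour
4. Python Magic Tour
5. Blog home page

I. Usage in Detail

251. pandas.Series.tz_localize method

251-1. Syntax

# 251. pandas.Series.tz_localize method
pandas.Series.tz_localize(tz, axis=0, level=None, copy=None, ambiguous='raise', nonexistent='raise')

Localize tz-naive index of a Series or DataFrame to target time zone.

This operation localizes the Index. To localize the values in a timezone-naive Series, use Series.dt.tz_localize().

Parameters:
tz : str or tzinfo or None
    Time zone to localize. Passing None will remove the time zone information and preserve local time.
axis : {0 or ‘index’, 1 or ‘columns’}, default 0
    The axis to localize.
level : int, str, default None
    If axis is a MultiIndex, localize a specific level. Otherwise must be None.
copy : bool, default True
    Also make a copy of the underlying data.
    Note: The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas. You can already get the future behavior and improvements through enabling copy on write: pd.options.mode.copy_on_write = True
ambiguous : ‘infer’, bool-ndarray, ‘NaT’, default ‘raise’
    When clocks moved backward due to DST, ambiguous times may arise. For example in Central European Time (UTC+01), when going from 03:00 DST to 02:00 non-DST, 02:30:00 local time occurs both at 00:30:00 UTC and at 01:30:00 UTC. In such a situation, the ambiguous parameter dictates how ambiguous times should be handled.
    ‘infer’ will attempt to infer fall dst-transition hours based on order
    bool-ndarray where True signifies a DST time, False designates a non-DST time (note that this flag is only applicable for ambiguous times)
    ‘NaT’ will return NaT where there are ambiguous times
    ‘raise’ will raise an AmbiguousTimeError if there are ambiguous times.
nonexistent : str, default ‘raise’
    A nonexistent time does not exist in a particular timezone where clocks moved forward due to DST. Valid values are:
    ‘shift_forward’ will shift the nonexistent time forward to the closest existing time
    ‘shift_backward’ will shift the nonexistent time backward to the closest existing time
    ‘NaT’ will return NaT where there are nonexistent times
    timedelta objects will shift nonexistent times by the timedelta
    ‘raise’ will raise an NonExistentTimeError if there are nonexistent times.

Returns:
Series/DataFrame
    Same type as the input.

Raises:
TypeError
    If the TimeSeries is tz-aware and tz is not None.

251-2. Parameters

251-2-1. tz (required): a string, a tzinfo object (for example a pytz time zone), or None. The time zone to localize to, given either by name (such as 'US/Eastern') or as a time zone object. Passing None removes the time zone information and keeps the local time.

251-2-2. axis (optional, default 0): integer or string; the axis to localize. A Series has only one axis (axis 0), so this parameter is normally left at its default.

251-2-3. level (optional, default None): integer or string; when the index is a MultiIndex, the level to localize. It must stay None for an ordinary index.

251-2-4. copy (optional, default None): boolean; whether to also copy the underlying data. A new Series is returned in either case; copy=False only avoids copying the data where possible. Once Copy-on-Write is enabled (the default from pandas 3.0), the keyword is ignored.

251-2-5. ambiguous (optional, default 'raise'): how to handle ambiguous times, which occur twice when clocks move back at the end of daylight saving time. 'raise' raises an AmbiguousTimeError; 'NaT' marks such times as NaT (Not a Time); 'infer' tries to infer the DST state from the order of the timestamps; a boolean array states for each timestamp whether it is DST (True) or not (False).

251-2-6. nonexistent (optional, default 'raise'): how to handle nonexistent times, which are skipped when clocks move forward at the start of daylight saving time. 'raise' raises a NonExistentTimeError; 'NaT' marks such times as NaT; 'shift_forward' and 'shift_backward' move them to the closest existing time after or before; a timedelta object shifts them by that amount.

251-3. Function

Localizes the tz-naive DatetimeIndex of a Series to the specified time zone. It acts on the index, not on the values; to localize datetime values stored in the Series itself, use Series.dt.tz_localize().

251-4. Return value

Returns a new Series whose index is a tz-aware DatetimeIndex in the specified time zone. The values are unchanged, and the original Series is not modified.

251-5. Notes

None

251-6. Usage

251-6-1. Data preparation

None

251-6-2. Code example

# 251. pandas.Series.tz_localize method
import pandas as pd
s = pd.Series(range(7),
              index=pd.DatetimeIndex(['2024-8-2 01:30:00',
                                      '2024-8-2 02:00:00',
                                      '2024-8-2 02:30:00',
                                      '2024-8-2 03:00:00',
                                      '2024-8-2 03:30:00',
                                      '2024-8-2 04:00:00',
                                      '2024-8-2 04:30:00']))
# None of these August timestamps is ambiguous, so 'infer' has nothing to resolve here
data = s.tz_localize('CET', ambiguous='infer')
print(data)

251-6-3. Output

# 251. pandas.Series.tz_localize method
# 2024-08-02 01:30:00+02:00    0
# 2024-08-02 02:00:00+02:00    1
# 2024-08-02 02:30:00+02:00    2
# 2024-08-02 03:00:00+02:00    3
# 2024-08-02 03:30:00+02:00    4
# 2024-08-02 04:00:00+02:00    5
# 2024-08-02 04:30:00+02:00    6
# dtype: int64

252. pandas.Series.at_time method

252-1. Syntax

# 252. pandas.Series.at_time method
pandas.Series.at_time(time, asof=False, axis=None)

Select values at particular time of day (e.g., 9:30AM).

Parameters:
time : datetime.time or str
    The values to select.
axis : {0 or ‘index’, 1 or ‘columns’}, default 0
    For Series this parameter is unused and defaults to 0.

Returns:
Series or DataFrame

Raises:
TypeError
    If the index is not a DatetimeIndex

252-2. Parameters

252-2-1. time (required): string or datetime.time; the time of day to select, given as a datetime.time object or a string in a time format (such as '06:18').

252-2-2. asof (optional, default False): boolean. The official documentation does not describe this parameter, and current pandas versions do not appear to support asof=True (the underlying DatetimeIndex.indexer_at_time raises NotImplementedError), so leave it at False. Only rows whose time of day matches the given time exactly are returned.

252-2-3. axis (optional, default None): integer or string; the axis to operate on. It can normally be ignored for a Series, which has only one axis (axis 0).

252-3. Function

Selects the data at a given time of day from a Series. The method compares the given time with the time-of-day part of each index entry and returns the matching rows, ignoring the date.

252-4. Return value

Returns a new Series containing the data at the given time of day. Its index holds the timestamps that matched.

252-5. Notes

None

252-6. Usage

252-6-1. Data preparation

None

252-6-2. Code example

# 252. pandas.Series.at_time method
import pandas as pd
# Create a time series
idx = pd.date_range('2024-01-01', periods=4, freq='h')
data = pd.Series([1, 2, 3, 4], index=idx)
# Select the data at 02:00 on every day
result = data.at_time('02:00')
print("Data at the selected time:")
print(result)

252-6-3. Output

# 252. pandas.Series.at_time method
# Data at the selected time:
# 2024-01-01 02:00:00    3
# Freq: h, dtype: int64

253. pandas.Series.between_time method

253-1. Syntax

# 253. pandas.Series.between_time method
pandas.Series.between_time(start_time, end_time, inclusive='both', axis=None)

Select values between particular times of the day (e.g., 9:00-9:30 AM).

By setting start_time to be later than end_time, you can get the times that are not between the two times.

Parameters:
start_time : datetime.time or str
    Initial time as a time filter limit.
end_time : datetime.time or str
    End time as a time filter limit.
inclusive : {“both”, “neither”, “left”, “right”}, default “both”
    Include boundaries; whether to set each bound as closed or open.
axis : {0 or ‘index’, 1 or ‘columns’}, default 0
    Determine range time on index or columns value. For Series this parameter is unused and defaults to 0.

Returns:
Series or DataFrame
    Data from the original object filtered to the specified dates range.

Raises:
TypeError
    If the index is not a DatetimeIndex

253-2. Parameters

253-2-1. start_time (required): string or datetime.time; the start of the time range, given as a datetime.time object or a string in a time format (such as '06:18').

253-2-2. end_time (required): string or datetime.time; the end of the time range, given as a datetime.time object or a string in a time format (such as '17:30').

253-2-3. inclusive (optional, default 'both'): {'both', 'neither', 'left', 'right'}; which boundaries of the time range are included:
'both': includes both the start time and the end time.
'neither': excludes both the start time and the end time.
'left': includes the start time but excludes the end time.
'right': excludes the start time but includes the end time.

253-2-4. axis (optional, default None): integer or string; the axis to operate on. It can normally be ignored for a Series, which has only one axis (axis 0).

253-3. Function

Selects the data that falls within a given time-of-day range from a Series. The method compares the time-of-day part of each index entry with the start and end times and returns the rows in between, ignoring the date.

253-4. Return value

Returns a new Series containing the data within the given time range. Its index holds the timestamps that fell inside the range.

253-5. Notes

None

253-6. Usage

253-6-1. Data preparation

None

253-6-2. Code example

# 253. pandas.Series.between_time method
import pandas as pd
# Create a time series
idx = pd.date_range('2024-08-01', periods=24, freq='h')
data = pd.Series(range(24), index=idx)
# Select the data between 09:00 and 17:00 on every day (both ends included)
result = data.between_time('09:00', '17:00')
print("Data in the selected time range:")
print(result)

253-6-3. Output

# 253. pandas.Series.between_time method
# Data in the selected time range:
# 2024-08-01 09:00:00     9
# 2024-08-01 10:00:00    10
# 2024-08-01 11:00:00    11
# 2024-08-01 12:00:00    12
# 2024-08-01 13:00:00    13
# 2024-08-01 14:00:00    14
# 2024-08-01 15:00:00    15
# 2024-08-01 16:00:00    16
# 2024-08-01 17:00:00    17
# Freq: h, dtype: int64

254. pandas.Series.str method

254-1. Syntax

# 254. pandas.Series.str method
pandas.Series.str()

Vectorized string functions for Series and Index.

NAs stay NA unless handled otherwise by a particular method. Patterned after Python’s string methods, with some inspiration from R’s stringr package.

254-2. Parameters

None

254-3. Function

254-3-1. Changing case
254-3-1-1. str.lower(): converts each string to lowercase.
254-3-1-2. str.upper(): converts each string to uppercase.

254-3-2. String matching and searching
254-3-2-1. str.contains(pattern): checks whether each string contains the given pattern and returns a boolean Series.
254-3-2-2. str.startswith(prefix): checks whether each string starts with the given prefix and returns a boolean Series.
254-3-2-3. str.endswith(suffix): checks whether each string ends with the given suffix and returns a boolean Series.

254-3-3. String replacement and stripping
254-3-3-1. str.replace(old, new): replaces the given content in each string with new content.
254-3-3-2. str.strip(): removes leading and trailing whitespace from each string.

254-3-4. String splitting and joining
254-3-4-1. str.split(separator): splits each string into a list on the given separator.
254-3-4-2. str.join(sep): joins the elements of each list with the given separator.

254-3-5. Extracting and accessing substrings
254-3-5-1. str.extract(pattern): extracts the substrings matched by the capture groups of a regular expression.
254-3-5-2. str.get(i): gets the character at position i (counting from 0) of each string.

254-3-6. Formatting and padding
254-3-6-1. str.pad(width, side='left', fillchar=' '): pads each string with the given character until it reaches the given width. Padding can go on the left, on the right, or on both sides.
254-3-6-2. str.zfill(width): pads each string with zeros on the left until it reaches the given width.

254-3-7. Length and counting
254-3-7-1. str.len(): returns the length of each string.
254-3-7-2. str.count(pattern): counts the occurrences of the pattern in each string.

254-4. Return value

The return value depends on the method called. For example, str.contains() returns a boolean Series, str.len() and str.count() return an integer Series, and str.extract() with several capture groups returns a DataFrame with one column per group.

254-5. Notes

None

254-6. Usage

254-6-1. Data preparation

None

254-6-2. Code example

# 254. pandas.Series.str method
import pandas as pd
# 254-1. Changing case
# 254-1-1. str.lower(): converts each string to lowercase
s = pd.Series(['Hello', 'World', 'Pandas'])
s_lower = s.str.lower()
print(s_lower, end='\n\n')
# 254-1-2. str.upper(): converts each string to uppercase
s = pd.Series(['Hello', 'World', 'Pandas'])
s_upper = s.str.upper()
print(s_upper, end='\n\n')
# 254-2. String matching and searching
# 254-2-1. str.contains(pattern): checks whether each string contains the given pattern
s = pd.Series(['Hello', 'World', 'Pandas'])
s_contains = s.str.contains('o')
print(s_contains, end='\n\n')
# 254-2-2. str.startswith(prefix): checks whether each string starts with the given prefix
s = pd.Series(['Hello', 'World', 'Pandas'])
s_startswith = s.str.startswith('P')
print(s_startswith, end='\n\n')
# 254-2-3. str.endswith(suffix): checks whether each string ends with the given suffix
s = pd.Series(['Hello', 'World', 'Pandas'])
s_endswith = s.str.endswith('s')
print(s_endswith, end='\n\n')
# 254-3. String replacement and stripping
# 254-3-1. str.replace(old, new): replaces the given content in each string with new content
s = pd.Series(['Hello', 'World', 'Pandas'])
s_replace = s.str.replace('o', '0')
print(s_replace, end='\n\n')
# 254-3-2. str.strip(): removes leading and trailing whitespace from each string
# (the surrounding spaces were lost in the scraped text and are restored here)
s_with_spaces = pd.Series([' Hello ', ' World ', ' Pandas '])
s_strip = s_with_spaces.str.strip()
print(s_strip, end='\n\n')
# 254-4. String splitting and joining
# 254-4-1. str.split(separator): splits each string into a list on the given separator
s = pd.Series(['Hello', 'World', 'Pandas'])
s_split = s.str.split('l')
print(s_split, end='\n\n')
# 254-4-2. str.join(sep): joins the elements of each list with the given separator
s_join = pd.Series([['a', 'b', 'c'], ['d', 'e'], ['f']])
s_joined = s_join.str.join('-')
print(s_joined, end='\n\n')
# 254-5. Extracting and accessing substrings
# 254-5-1. str.extract(pattern): extracts the substrings matched by a regular expression
s_dates = pd.Series(['2022-08-01', '2023-07-06', '2024-08-03'])
s_extract = s_dates.str.extract(r'(\d{4})-(\d{2})-(\d{2})')
print(s_extract, end='\n\n')
# 254-5-2. str.get(i): gets the character at position i of each string
s = pd.Series(['Hello', 'World', 'Pandas'])
s_get = s.str.get(1)
print(s_get, end='\n\n')
# 254-6. Formatting and padding
# 254-6-1. str.pad(width, side='left', fillchar=' '): pads each string to the given width
s = pd.Series(['Hello', 'World', 'Pandas'])
s_pad = s.str.pad(10, side='right', fillchar='*')
print(s_pad, end='\n\n')
# 254-6-2. str.zfill(width): pads each string with zeros on the left to the given width
s = pd.Series(['Hello', 'World', 'Pandas'])
s_zfill = s.str.zfill(10)
print(s_zfill, end='\n\n')
# 254-7. Length and counting
# 254-7-1. str.len(): returns the length of each string
s = pd.Series(['Hello', 'World', 'Pandas'])
s_len = s.str.len()
print(s_len, end='\n\n')
# 254-7-2. str.count(pattern): counts the occurrences of the pattern in each string
s = pd.Series(['Hello', 'World', 'Pandas'])
s_count = s.str.count('l')
print(s_count, end='\n\n')
# 254-8. Combined example
# Create a Series containing mixed text data
data = pd.Series(['Myelsa', 'bob123', 'C@r0l', 'DAVID'])
# Clean and standardize the text. regex=True is required: since pandas 2.0
# str.replace treats the pattern as literal text by default, which would
# leave the digits and symbols in place.
cleaned_data = (data.str.strip()
                    .str.lower()
                    .str.replace(r'\d+', '', regex=True)
                    .str.replace(r'[^a-z]', '', regex=True))
print(cleaned_data)

254-6-3. Output

# 254. pandas.Series.str method
# 254-1. Changing case
# 254-1-1. str.lower(): converts each string to lowercase
# 0     hello
# 1     world
# 2    pandas
# dtype: object
# 254-1-2. str.upper(): converts each string to uppercase
# 0     HELLO
# 1     WORLD
# 2    PANDAS
# dtype: object
# 254-2. String matching and searching
# 254-2-1. str.contains(pattern): checks whether each string contains the given pattern
# 0     True
# 1     True
# 2    False
# dtype: bool
# 254-2-2. str.startswith(prefix): checks whether each string starts with the given prefix
# 0    False
# 1    False
# 2     True
# dtype: bool
# 254-2-3. str.endswith(suffix): checks whether each string ends with the given suffix
# 0    False
# 1    False
# 2     True
# dtype: bool
# 254-3. String replacement and stripping
# 254-3-1. str.replace(old, new): replaces the given content in each string with new content
# 0     Hell0
# 1     W0rld
# 2    Pandas
# dtype: object
# 254-3-2. str.strip(): removes leading and trailing whitespace from each string
# 0     Hello
# 1     World
# 2    Pandas
# dtype: object
# 254-4. String splitting and joining
# 254-4-1. str.split(separator): splits each string into a list on the given separator
# 0    [He, , o]
# 1     [Wor, d]
# 2     [Pandas]
# dtype: object
# 254-4-2. str.join(sep): joins the elements of each list with the given separator
# 0    a-b-c
# 1      d-e
# 2        f
# dtype: object
# 254-5. Extracting and accessing substrings
# 254-5-1. str.extract(pattern):
#       0   1   2
# 0  2022  08  01
# 1  2023  07  06
# 2  2024  08  03
# 254-5-2. str.get(i): gets the character at position i of each string
# 0    e
# 1    o
# 2    a
# dtype: object
# 254-6. Formatting and padding
# 254-6-1. str.pad(width, side='left', fillchar=' '): pads each string to the given width
# 0    Hello*****
# 1    World*****
# 2    Pandas****
# dtype: object
# 254-6-2. str.zfill(width): pads each string with zeros on the left to the given width
# 0    00000Hello
# 1    00000World
# 2    0000Pandas
# dtype: object
# 254-7. Length and counting
# 254-7-1. str.len(): returns the length of each string
# 0    5
# 1    5
# 2    6
# dtype: int64
# 254-7-2. str.count(pattern): counts the occurrences of the pattern in each string
# 0    2
# 1    1
# 2    0
# dtype: int64
# 254-8. Combined example
# 0    myelsa
# 1       bob
# 2       crl
# 3     david
# dtype: object

255. pandas.Series.cat method

255-1. Syntax

# 255. pandas.Series.cat method
pandas.Series.cat()

Accessor object for categorical properties of the Series values.

Parameters:
data : Series or CategoricalIndex

255-2. Parameters

None

255-3. Function

Provides ways to create, modify, rename, and manage the categories of categorical data, so that the properties of categorical data can be used for finer-grained operations and analysis.

255-4. Return value

Series.cat returns an accessor object, not a Categorical itself. Through it you reach the properties of the categorical data, such as the list of categories (categories), whether they are ordered (ordered), and the integer codes (codes), as well as the methods that change the categories. Those methods, such as set_categories() and add_categories(), return a new Series.

255-5. Notes

None

255-6. Usage

255-6-1. Data preparation

None

255-6-2. Code example

# 255. pandas.Series.cat method
import pandas as pd
# Create a Series containing categorical data
data = pd.Series(['apple', 'banana', 'apple', 'orange', 'banana'], dtype='category')
# Print the original data and its categories
print("Original Series:")
print(data)
print("Categories:")
print(data.cat.categories, end='\n\n')
# 255-1. Set the category order
data = data.cat.set_categories(['banana', 'apple', 'orange'], ordered=True)
print("Ordered Categories:")
print(data.cat.categories, end='\n\n')
# 255-2. Add a new category
data = data.cat.add_categories(['grape'])
print("Categories after Adding 'grape':")
print(data.cat.categories, end='\n\n')
# 255-3. Remove a category (values in the removed category become NaN)
data = data.cat.remove_categories(['orange'])
print("Categories after Removing 'orange':")
print(data.cat.categories, end='\n\n')
# 255-4. Rename categories
data = data.cat.rename_categories({'banana': 'yellow_banana', 'apple': 'green_apple'})
print("Categories after Renaming:")
print(data.cat.categories, end='\n\n')
# 255-5. Get the integer codes of the categories (-1 marks a missing value)
print("Integer Encoding of the Series:")
print(data.cat.codes, end='\n\n')
# 255-6. View the data and its categories
print("Data with Integer Encoding:")
print(data)

255-6-3. Output

# 255. pandas.Series.cat method
# Original Series:
# 0     apple
# 1    banana
# 2     apple
# 3    orange
# 4    banana
# dtype: category
# Categories (3, object): ['apple', 'banana', 'orange']
# Categories:
# Index(['apple', 'banana', 'orange'], dtype='object')
# 255-1. Set the category order
# Ordered Categories:
# Index(['banana', 'apple', 'orange'], dtype='object')
# 255-2. Add a new category
# Categories after Adding 'grape':
# Index(['banana', 'apple', 'orange', 'grape'], dtype='object')
# 255-3. Remove a category
# Categories after Removing 'orange':
# Index(['banana', 'apple', 'grape'], dtype='object')
# 255-4. Rename categories
# Categories after Renaming:
# Index(['yellow_banana', 'green_apple', 'grape'], dtype='object')
# 255-5. Get the integer codes of the categories
# Integer Encoding of the Series:
# 0    1
# 1    0
# 2    1
# 3   -1
# 4    0
# dtype: int8
# 255-6. View the data and its categories
# Data with Integer Encoding:
# 0      green_apple
# 1    yellow_banana
# 2      green_apple
# 3              NaN
# 4    yellow_banana
# dtype: category
# Categories (3, object): ['yellow_banana'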
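Supplement to section 251: the ambiguous and nonexistent parameters of tz_localize only matter for timestamps that fall on a daylight saving transition. The sketch below is a minimal illustration, not part of the original post. The 'Europe/Berlin' zone, the 2018 and 2015 transition dates, and all variable names are chosen here purely for demonstration.

```python
import numpy as np
import pandas as pd

# Autumn change in Europe/Berlin on 2018-10-28: local times from 02:00 to
# 03:00 occur twice, so the naive label 02:30 is ambiguous.
fall = pd.Series(
    range(3),
    index=pd.DatetimeIndex(
        ["2018-10-28 01:30:00", "2018-10-28 02:30:00", "2018-10-28 03:30:00"]
    ),
)

# 'NaT' replaces the ambiguous label with NaT instead of raising.
fall_nat = fall.tz_localize("Europe/Berlin", ambiguous="NaT")

# A boolean array says explicitly which labels are DST (True) or not (False);
# the flag is only consulted for the ambiguous 02:30 entry.
fall_flags = fall.tz_localize(
    "Europe/Berlin", ambiguous=np.array([True, True, False])
)

# Spring change on 2015-03-29: local times from 02:00 to 03:00 are skipped,
# so the naive label 02:30 does not exist.
spring = pd.Series(
    range(2),
    index=pd.DatetimeIndex(["2015-03-29 02:30:00", "2015-03-29 03:30:00"]),
)

# 'shift_forward' moves the nonexistent label to the next existing time.
spring_forward = spring.tz_localize("Europe/Berlin", nonexistent="shift_forward")

# 'NaT' replaces the nonexistent label with NaT.
spring_nat = spring.tz_localize("Europe/Berlin", nonexistent="NaT")

print(fall_nat.index)
print(fall_flags.index)
print(spring_forward.index)
print(spring_nat.index)
```

With the default 'raise' for both parameters, each of these two Series would raise an error instead of being localized, which is why an explicit choice is needed for data that spans a clock change.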

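Supplement to section 254: whether str.replace reads its pattern as literal text or as a regular expression depends on the regex parameter, which defaults to False since pandas 2.0. The sketch below is a minimal illustration that reuses the sample strings of the combined example. The variable names lowered, literal, and cleaned are chosen here for demonstration, and regex is passed explicitly in both calls so the behavior does not depend on the installed pandas version.

```python
import pandas as pd

data = pd.Series(["Myelsa", "bob123", "C@r0l", "DAVID"])
lowered = data.str.strip().str.lower()

# regex=False: the pattern is literal text. No string contains the literal
# characters backslash, d, plus, so nothing is replaced.
literal = lowered.str.replace(r"\d+", "", regex=False)

# regex=True: the pattern is a regular expression. Digits are removed first,
# then every character that is not a lowercase ASCII letter.
cleaned = lowered.str.replace(r"\d+", "", regex=True).str.replace(
    r"[^a-z]", "", regex=True
)

print(literal.tolist())  # ['myelsa', 'bob123', 'c@r0l', 'david']
print(cleaned.tolist())  # ['myelsa', 'bob', 'crl', 'david']
```

Passing regex explicitly is a reasonable habit for cleaning code, because a pattern that silently matches nothing produces no error, only unchanged data.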