Python Cool Library Tour: Third-Party Library Pandas (024)

Posted on 2024-9-10 01:46:57
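Contents

I. Usage in Detail
    61. pandas.to_numeric function
        61-1. Syntax
        61-2. Parameters
        61-3. Function
        61-4. Return value
        61-5. Notes
        61-6. Usage
            61-6-1. Data preparation
            61-6-2. Code example
            61-6-3. Output
    62. pandas.to_datetime function
        62-1. Syntax
        62-2. Parameters
        62-3. Function
        62-4. Return value
        62-5. Notes
        62-6. Usage
            62-6-1. Data preparation
            62-6-2. Code example
            62-6-3. Output
II. Recommended Reading
    1. Python Foundation-Building Tour
    2. Python Functions Tour
    3. Python Algorithms Tour
    4. Python Magic Tour
    5. Blog homepage

I. Usage in Detail

61. pandas.to_numeric function

61-1. Syntax

    # 61. pandas.to_numeric function
    pandas.to_numeric(arg, errors='raise', downcast=None, dtype_backend=_NoDefault.no_default)

Convert argument to a numeric type.

The default return dtype is float64 or int64 depending on the data supplied. Use the downcast parameter to obtain other dtypes.

Please note that precision loss may occur if really large numbers are passed in. Due to the internal limitations of ndarray, if numbers smaller than -9223372036854775808 (np.iinfo(np.int64).min) or larger than 18446744073709551615 (np.iinfo(np.uint64).max) are passed in, it is very likely they will be converted to float so that they can be stored in an ndarray. These warnings apply similarly to Series since it internally leverages ndarray.

Parameters:

arg : scalar, list, tuple, 1-d array, or Series
    Argument to be converted.

errors : {'ignore', 'raise', 'coerce'}, default 'raise'
    If 'raise', then invalid parsing will raise an exception.
    If 'coerce', then invalid parsing will be set as NaN.
    If 'ignore', then invalid parsing will return the input.
    Changed in version 2.2: "ignore" is deprecated. Catch exceptions explicitly instead.

downcast : str, default None
    Can be 'integer', 'signed', 'unsigned', or 'float'. If not None, and if the data has been successfully cast to a numerical dtype (or if the data was numeric to begin with), downcast that resulting data to the smallest numerical dtype possible according to the following rules:
    'integer' or 'signed': smallest signed int dtype (min.: np.int8)
    'unsigned': smallest unsigned int dtype (min.: np.uint8)
    'float': smallest float dtype (min.: np.float32)
    As this behaviour is separate from the core conversion to numeric values, any errors raised during the downcasting will be surfaced regardless of the value of the 'errors' input.
    In addition, downcasting will only occur if the size of the resulting data's dtype is strictly larger than the dtype it is to be cast to, so if none of the dtypes checked satisfy that specification, no downcasting will be performed on the data.

dtype_backend : {'numpy_nullable', 'pyarrow'}, default 'numpy_nullable'
    Back-end data type applied to the resultant DataFrame (still experimental). Behaviour is as follows:
    "numpy_nullable": returns nullable-dtype-backed DataFrame (default).
    "pyarrow": returns pyarrow-backed nullable ArrowDtype DataFrame.
    New in version 2.0.

Returns:

ret
    Numeric if parsing succeeded. Return type depends on input. Series if Series, otherwise ndarray.

61-2. Parameters

61-2-1. arg (required): the data to convert. It can be a scalar, a list, a tuple, a 1-d array, or a Series. A DataFrame is not accepted directly; convert it column by column with DataFrame.apply, as the example in 61-6-2 does.

61-2-2. errors (optional, default 'raise'): how to handle values that cannot be converted to a number. Possible values:

61-2-2-1. 'raise' (default): raise an exception when a value cannot be parsed.

61-2-2-2. 'coerce': replace values that cannot be parsed with NaN (missing value).

61-2-2-3. 'ignore': if parsing fails, return the input unchanged. Deprecated since pandas 2.2; catch the exception explicitly instead.

61-2-3. downcast (optional, default None): downcast the result to a smaller numeric dtype to reduce memory use. Possible values:

61-2-3-1. None (default): no downcasting.

61-2-3-2. 'integer': the smallest signed integer dtype that fits (at least np.int8).

61-2-3-3. 'signed': same as 'integer'.

61-2-3-4. 'unsigned': the smallest unsigned integer dtype that fits (at least np.uint8).

61-2-3-5. 'float': the smallest floating-point dtype that fits (at least np.float32).

61-2-4. dtype_backend (optional): selects the back-end of the returned data, 'numpy_nullable' or 'pyarrow'. Added in pandas 2.0 and still experimental; most users can leave it unset.

61-3. Function

        Converts the argument (a scalar, list, tuple, 1-d array, or Series) to a numeric type (integer or floating point). A DataFrame is converted one column at a time through DataFrame.apply.

61-4. Return value

        The return value depends on the type of the input:

61-4-1. Scalar: returns the converted number (integer or floating point).

61-4-2. List, tuple, or 1-d array: returns a NumPy ndarray of the converted values, not a list.

61-4-3. Series: returns a pandas Series with a numeric dtype.

61-4-4. DataFrame: passing a DataFrame directly raises a TypeError. With df.apply(pd.to_numeric, ...) the result is a DataFrame in which every column has been converted.

61-5. Notes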
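The downcast rules and return types listed above can be checked with a short script. This is a minimal sketch, assuming pandas 2.x and NumPy are installed; the variable names are illustrative only and do not come from the original post.

```python
import numpy as np
import pandas as pd

s = pd.Series(['1', '2', '3'])

# Without downcast, integer-like strings are parsed as int64.
default_result = pd.to_numeric(s)

# downcast picks the smallest dtype that can hold the values.
small_int = pd.to_numeric(s, downcast='integer')     # signed integers
small_uint = pd.to_numeric(s, downcast='unsigned')   # unsigned integers
small_float = pd.to_numeric(pd.Series(['1.5', '2.5']), downcast='float')

# A list comes back as a NumPy ndarray, not as a list.
list_result = pd.to_numeric(['1', '2', '3'])

# A DataFrame passed directly is rejected; use DataFrame.apply instead.
frame = pd.DataFrame({'A': ['1', '2']})
try:
    pd.to_numeric(frame)
    frame_error = None
except TypeError as exc:
    frame_error = exc

print(default_result.dtype, small_int.dtype, small_uint.dtype, small_float.dtype)
print(type(list_result).__name__)
print(type(frame_error).__name__)
```

Downcasting only pays off when the values really fit the smaller dtype, so it is usually worth inspecting the resulting dtype rather than assuming it.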
        With its flexible parameters, this function converts different kinds of data to numeric types and offers several error-handling options, which makes it suitable for data preprocessing and cleaning.

61-6. Usage

61-6-1. Data preparation

None

61-6-2. Code example

    # 61. pandas.to_numeric function
    # 61-1. Convert a Series
    import pandas as pd
    data = pd.Series(['1', '2', '3', 'apple', '5'])
    # Convert to numbers; values that cannot be parsed become NaN
    numeric_data = pd.to_numeric(data, errors='coerce')
    print(numeric_data, end='\n\n')

    # 61-2. Convert a DataFrame
    import pandas as pd
    df = pd.DataFrame({
        'A': ['1', '2', '3', 'apple', '5'],
        'B': ['10.5', '20.1', '30.2', '40.0', '50.5']
    })
    # Convert column by column; values that cannot be parsed become NaN
    numeric_df = df.apply(pd.to_numeric, errors='coerce')
    print(numeric_df)

61-6-3. Output

    # 61. pandas.to_numeric function
    # 61-1. Convert a Series
    # 0    1.0
    # 1    2.0
    # 2    3.0
    # 3    NaN
    # 4    5.0
    # dtype: float64

    # 61-2. Convert a DataFrame
    #      A     B
    # 0  1.0  10.5
    # 1  2.0  20.1
    # 2  3.0  30.2
    # 3  NaN  40.0
    # 4  5.0  50.5

62. pandas.to_datetime function

62-1. Syntax

    # 62. pandas.to_datetime function
    pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=False, format=None, exact=_NoDefault.no_default, unit=None, infer_datetime_format=_NoDefault.no_default, origin='unix', cache=True)

Convert argument to datetime.

This function converts a scalar, array-like, Series or DataFrame/dict-like to a pandas datetime object.

Parameters:

arg : int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like
    The object to convert to a datetime. If a DataFrame is provided, the method expects minimally the following columns: "year", "month", "day". The column "year" must be specified in 4-digit format.

errors : {'ignore', 'raise', 'coerce'}, default 'raise'
    If 'raise', then invalid parsing will raise an exception.
    If 'coerce', then invalid parsing will be set as NaT.
    If 'ignore', then invalid parsing will return the input.

dayfirst : bool, default False
    Specify a date parse order if arg is str or is list-like. If True, parses dates with the day first, e.g. "10/11/12" is parsed as 2012-11-10.
    Warning: dayfirst=True is not strict, but will prefer to parse with day first.

yearfirst : bool, default False
    Specify a date parse order if arg is str or is list-like.
    If True parses dates with the year first, e.g. "10/11/12" is parsed as 2010-11-12.
    If both dayfirst and yearfirst are True, yearfirst is preceded (same as dateutil).
    Warning: yearfirst=True is not strict, but will prefer to parse with year first.

utc : bool, default False
    Control timezone-related parsing, localization and conversion.
    If True, the function always returns a timezone-aware UTC-localized Timestamp, Series or DatetimeIndex. To do this, timezone-naive inputs are localized as UTC, while timezone-aware inputs are converted to UTC.
    If False (default), inputs will not be coerced to UTC. Timezone-naive inputs will remain naive, while timezone-aware ones will keep their time offsets. Limitations exist for mixed offsets (typically, daylight savings), see Examples section for details.
    Warning: In a future version of pandas, parsing datetimes with mixed time zones will raise an error unless utc=True. Please specify utc=True to opt in to the new behaviour and silence this warning. To create a Series with mixed offsets and object dtype, please use apply and datetime.datetime.strptime.
    See also: pandas general documentation about timezone conversion and localization.

format : str, default None
    The strftime to parse time, e.g. "%d/%m/%Y". See strftime documentation for more information on choices, though note that "%f" will parse all the way up to nanoseconds. You can also pass:
    "ISO8601", to parse any ISO8601 time string (not necessarily in exactly the same format);
    "mixed", to infer the format for each element individually. This is risky, and you should probably use it along with dayfirst.
    Note: If a DataFrame is passed, then format has no effect.

exact : bool, default True
    Control how format is used:
    If True, require an exact format match.
    If False, allow the format to match anywhere in the target string.
    Cannot be used alongside format='ISO8601' or format='mixed'.

unit : str, default 'ns'
    The unit of the arg (D, s, ms, us, ns) denote the unit, which is an integer or float number. This will be based off the origin. Example, with unit='ms' and origin='unix', this would calculate the number of milliseconds to the unix epoch start.

infer_datetime_format : bool, default False
    If True and no format is given, attempt to infer the format of the datetime strings based on the first non-NaN element, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by ~5-10x.
    Deprecated since version 2.0.0: A strict version of this argument is now the default, passing it has no effect.

origin : scalar, default 'unix'
    Define the reference date. The numeric values would be parsed as number of units (defined by unit) since this reference date.
    If 'unix' (or POSIX) time; origin is set to 1970-01-01.
    If 'julian', unit must be 'D', and origin is set to beginning of Julian Calendar. Julian day number 0 is assigned to the day starting at noon on January 1, 4713 BC.
    If Timestamp convertible (Timestamp, dt.datetime, np.datetime64 or date string), origin is set to Timestamp identified by origin.
    If a float or integer, origin is the difference (in units determined by the unit argument) relative to 1970-01-01.

cache : bool, default True
    If True, use a cache of unique, converted dates to apply the datetime conversion. May produce significant speed-up when parsing duplicate date strings, especially ones with timezone offsets. The cache is only used when there are at least 50 values. The presence of out-of-bounds values will render the cache unusable and may slow down parsing.

Returns:

datetime
    If parsing succeeded. Return type depends on input (types in parenthesis correspond to fallback in case of unsuccessful timezone or out-of-range timestamp parsing):
    scalar: Timestamp (or datetime.datetime)
    array-like: DatetimeIndex (or Series with object dtype containing datetime.datetime)
    Series: Series of datetime64 dtype (or Series of object dtype containing datetime.datetime)
    DataFrame: Series of datetime64 dtype (or Series of object dtype containing datetime.datetime)

Raises:

ParserError
    When parsing a date from string fails.

ValueError
    When another datetime conversion error happens. For example when one of 'year', 'month', 'day' columns is missing in a DataFrame, or when a Timezone-aware datetime.datetime is found in an array-like of mixed time offsets, and utc=False.

62-2. Parameters

62-2-1. arg (required): the object to convert to a datetime. It can be a single date-time string, a number, a datetime object, a list, a tuple, a 1-d array, a Series, or a DataFrame (which must contain at least the columns "year", "month" and "day").

62-2-2. errors (optional, default 'raise'): how to handle values that cannot be converted to a datetime. Possible values:

62-2-2-1. 'raise' (default): raise an exception when a value cannot be parsed.

62-2-2-2. 'coerce': replace values that cannot be parsed with NaT (the missing value for datetimes).

62-2-2-3. 'ignore': if parsing fails, return the input unchanged.

62-2-3. dayfirst (optional, default False): when True, ambiguous dates are parsed with the day first, e.g. dayfirst=True parses '10/11/2024' as 10 November 2024. The setting is a preference, not a strict rule.

62-2-4. yearfirst (optional, default False): when True, ambiguous dates are parsed with the year first, e.g. yearfirst=True parses '10/11/12' as 2010-11-12. The setting is a preference, not a strict rule.

62-2-5. utc (optional, default False): when True, the result is always timezone-aware in UTC. Timezone-naive inputs are localized as UTC and timezone-aware inputs are converted to UTC.

62-2-6. format (optional, default None): the format of the date-time strings, e.g. format='%Y-%m-%d %H:%M:%S'. 'ISO8601' and 'mixed' are also accepted.

62-2-7. exact (optional, default True): when True, the string must match the format exactly. When False, the format may match anywhere in the string.

62-2-8. unit (optional, default None, treated as 'ns'): the time unit of integer or float input, one of D (days), s (seconds), ms (milliseconds), us (microseconds), ns (nanoseconds).

62-2-9. infer_datetime_format (optional): formerly made pandas infer the format of the date-time strings to speed up parsing. Deprecated since pandas 2.0.0; a strict version of this behaviour is now the default and passing the argument has no effect.

62-2-10. origin (optional, default 'unix'): the reference date from which numeric input is counted. It can be 'unix' (1970-01-01), 'julian' (with unit='D'), a specific date string or Timestamp, or a number giving the offset from 1970-01-01 in units of unit.

62-2-11. cache (optional, default True): when True, unique converted dates are cached to speed up parsing. The cache is only used when there are at least 50 values.

62-3. Function

        Converts input in a variety of formats to pandas datetime objects (dtype datetime64[ns] in the output shown in 62-6-3), so that the data has a consistent date-time representation in later analysis.

62-4. Return value

        The return type depends on the input:

62-4-1. A single string or a single number returns a Timestamp object.

62-4-2. A list or array returns a DatetimeIndex. A Series or a DataFrame returns a Series of datetime64 dtype.

62-5. Notes

    None

62-6. Usage

62-6-1. Data preparation

None

62-6-2. Code example

    # 62. pandas.to_datetime function
    # 62-1. Convert a string to a datetime
    import pandas as pd
    date_str = '2024-07-15'
    date = pd.to_datetime(date_str)
    print(date, end='\n\n')

    # 62-2. Convert a list to datetimes
    import pandas as pd
    date_list = ['2024-07-15', '2025-07-15']
    dates = pd.to_datetime(date_list)
    print(dates, end='\n\n')

    # 62-3. Convert a Series and handle errors
    import pandas as pd
    date_series = pd.Series(['2024-07-15', '2025-07-15', 'not a date'])
    # Values that cannot be parsed become NaT
    dates = pd.to_datetime(date_series, errors='coerce')
    print(dates, end='\n\n')

    # 62-4. Parse a date with a specific format
    import pandas as pd
    date_str = '15/07/2024'
    # The explicit format already fixes the field order; dayfirst=True is kept from the original example
    date = pd.to_datetime(date_str, format='%d/%m/%Y', dayfirst=True)
    print(date, end='\n\n')

    # 62-5. Convert Unix timestamps to datetimes
    import pandas as pd
    timestamp_series = pd.Series([1626357600, 1626358200])
    # The integers are seconds since 1970-01-01
    dates = pd.to_datetime(timestamp_series, unit='s')
    print(dates)

62-6-3. Output

    # 62. pandas.to_datetime function
    # 62-1. Convert a string to a datetime
    # 2024-07-15 00:00:00

    # 62-2. Convert a list to datetimes
    # DatetimeIndex(['2024-07-15', '2025-07-15'], dtype='datetime64[ns]', freq=None)

    # 62-3. Convert a Series and handle errors
    # 0   2024-07-15
    # 1   2025-07-15
    # 2          NaT
    # dtype: datetime64[ns]

    # 62-4. Parse a date with a specific format
    # 2024-07-15 00:00:00

    # 62-5. Convert Unix timestamps to datetimes
    # 0   2021-07-15 14:00:00
    # 1   2021-07-15 14:10:00
    # dtype: datetime64[ns]

II. Recommended Reading

1. Python Foundation-Building Tour
2. Python Functions Tour
3. Python Algorithms Tour
4. Python Magic Tour
5. Blog homepage
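
As a supplement to section 62, the script below sketches three pandas.to_datetime behaviours that the parameter list describes but the examples in 62-6-2 do not exercise: assembling dates from a DataFrame with "year", "month" and "day" columns, counting from a custom origin, and normalising mixed UTC offsets with utc=True. It is a minimal sketch, assuming pandas 2.x; the sample values and variable names are illustrative and are not taken from the original post.

```python
import pandas as pd

# 1) A DataFrame with year/month/day columns is assembled into one Series.
parts = pd.DataFrame({
    'year': [2024, 2025],
    'month': [7, 7],
    'day': [15, 15],
})
assembled = pd.to_datetime(parts)

# 2) With unit='D' and a Timestamp origin, integers count days from that date.
shifted = pd.to_datetime([1, 2, 3], unit='D', origin=pd.Timestamp('1960-01-01'))

# 3) With utc=True, strings carrying different offsets are converted to UTC.
as_utc = pd.to_datetime(
    ['2018-10-26 12:00 -0530', '2018-10-26 12:00 -0500'],
    utc=True,
)

print(assembled)
print(shifted)
print(as_utc)
```

Passing utc=True is the documented way to obtain a single timezone-aware result when the input mixes offsets.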