Python Cool Library Tour: Third-Party Library Pandas (003)

Posted on 2024-9-10 01:20:53
Contents

I. Usage in Detail
4. The pandas.read_csv function
4-1. Syntax
4-2. Parameters
4-3. Purpose
4-4. Return value
4-5. Notes
4-6. Usage
4-6-1. Creating the CSV file
4-6-2. Code examples
4-6-3. Output
II. Recommended Reading
1. Python Foundations Tour
2. Python Functions Tour
3. Python Algorithms Tour
4. Python Magic Tour
5. Blog homepage

I. Usage in Detail

4. The pandas.read_csv function

4-1. Syntax

    # 4. The pandas.read_csv function
    pandas.read_csv(filepath_or_buffer, *, sep=_NoDefault.no_default, delimiter=None,
        header='infer', names=_NoDefault.no_default, index_col=None, usecols=None,
        dtype=None, engine=None, converters=None, true_values=None, false_values=None,
        skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None, na_values=None,
        keep_default_na=True, na_filter=True, verbose=_NoDefault.no_default,
        skip_blank_lines=True, parse_dates=None,
        infer_datetime_format=_NoDefault.no_default, keep_date_col=_NoDefault.no_default,
        date_parser=_NoDefault.no_default, date_format=None, dayfirst=False,
        cache_dates=True, iterator=False, chunksize=None, compression='infer',
        thousands=None, decimal='.', lineterminator=None, quotechar='"', quoting=0,
        doublequote=True, escapechar=None, comment=None, encoding=None,
        encoding_errors='strict', dialect=None, on_bad_lines='error',
        delim_whitespace=_NoDefault.no_default, low_memory=True, memory_map=False,
        float_precision=None, storage_options=None, dtype_backend=_NoDefault.no_default)

Read a comma-separated values (csv) file into DataFrame.

Also supports optionally iterating or breaking of the file into chunks.

Additional help can be found in the online docs for IO Tools.

Parameters:

filepath_or_buffer : str, path object or file-like object
    Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.csv.
    If you want to pass in a path object, pandas accepts any os.PathLike.
    By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via builtin open function) or StringIO.

sep : str, default ','
    Character or regex pattern to treat as the delimiter. If sep=None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and automatically detect the separator from only the first valid row of the file by Python's builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.

delimiter : str, optional
    Alias for sep.

header : int, Sequence of int, 'infer' or None, default 'infer'
    Row number(s) containing column labels and marking the start of the data (zero-indexed). Default behavior is to infer the column names: if no names are passed the behavior is identical to header=0 and column names are inferred from the first line of the file, if column names are passed explicitly to names then the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a MultiIndex on the columns e.g. [0, 1, 3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.

names : Sequence of Hashable, optional
    Sequence of column labels to apply. If the file contains a header row, then you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed.

index_col : Hashable, Sequence of Hashable or False, optional
    Column(s) to use as row label(s), denoted either by column labels or column indices. If a sequence of labels or indices is given, MultiIndex will be formed for the row labels.
    Note: index_col=False can be used to force pandas to not use the first column as the index, e.g., when you have a malformed file with delimiters at the end of each line.

usecols : Sequence of Hashable or Callable, optional
    Subset of columns to select, denoted either by column labels or column indices. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in names or inferred from the document header row(s). If names are given, the document header row(s) are not taken into account. For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']. Element order is ignored, so usecols=[0, 1] is the same as [1, 0]. To instantiate a DataFrame from data with element order preserved use pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']] for columns in ['foo', 'bar'] order or pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']] for ['bar', 'foo'] order.
    If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True. An example of a valid callable argument would be lambda x: x.upper() in ['AAA', 'BBB', 'DDD']. Using this parameter results in much faster parsing time and lower memory usage.

dtype : dtype or dict of {Hashable : dtype}, optional
    Data type(s) to apply to either the whole dataset or individual columns. E.g., {'a': np.float64, 'b': np.int32, 'c': 'Int64'}. Use str or object together with suitable na_values settings to preserve and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.
    New in version 1.5.0: Support for defaultdict was added. Specify a defaultdict as input where the default determines the dtype of the columns which are not explicitly listed.

engine : {'c', 'python', 'pyarrow'}, optional
    Parser engine to use. The C and pyarrow engines are faster, while the python engine is currently more feature-complete. Multithreading is currently only supported by the pyarrow engine.
    New in version 1.4.0: The 'pyarrow' engine was added as an experimental engine, and some features are unsupported, or may not work correctly, with this engine.

converters : dict of {Hashable : Callable}, optional
    Functions for converting values in specified columns. Keys can either be column labels or column indices.

true_values : list, optional
    Values to consider as True in addition to case-insensitive variants of 'True'.

false_values : list, optional
    Values to consider as False in addition to case-insensitive variants of 'False'.

skipinitialspace : bool, default False
    Skip spaces after delimiter.

skiprows : int, list of int or Callable, optional
    Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.
    If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].

skipfooter : int, default 0
    Number of lines at bottom of file to skip (Unsupported with engine='c').

nrows : int, optional
    Number of rows of file to read. Useful for reading pieces of large files.

na_values : Hashable, Iterable of Hashable or dict of {Hashable : Iterable}, optional
    Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan", "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "None", "n/a", "nan", "null".

keep_default_na : bool, default True
    Whether or not to include the default NaN values when parsing the data. Depending on whether na_values is passed in, the behavior is as follows:
    - If keep_default_na is True, and na_values are specified, na_values is appended to the default NaN values used for parsing.
    - If keep_default_na is True, and na_values are not specified, only the default NaN values are used for parsing.
    - If keep_default_na is False, and na_values are specified, only the NaN values specified na_values are used for parsing.
    - If keep_default_na is False, and na_values are not specified, no strings will be parsed as NaN.
    Note that if na_filter is passed in as False, the keep_default_na and na_values parameters will be ignored.

na_filter : bool, default True
    Detect missing value markers (empty strings and the value of na_values). In data without any NA values, passing na_filter=False can improve the performance of reading a large file.

verbose : bool, default False
    Indicate number of NA values placed in non-numeric columns.
    Deprecated since version 2.2.0.

skip_blank_lines : bool, default True
    If True, skip over blank lines rather than interpreting as NaN values.

parse_dates : bool, list of Hashable, list of lists or dict of {Hashable : list}, default False
    The behavior is as follows:
    - bool. If True -> try parsing the index. Note: Automatically set to True if date_format or date_parser arguments have been passed.
    - list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
    - list of list. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column. Values are joined with a space before parsing.
    - dict, e.g. {'foo': [1, 3]} -> parse columns 1, 3 as date and call result 'foo'. Values are joined with a space before parsing.
    If a column or index cannot be represented as an array of datetime, say because of an unparsable value or a mixture of timezones, the column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use to_datetime() after read_csv().
    Note: A fast-path exists for iso8601-formatted dates.

infer_datetime_format : bool, default False
    If True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by 5-10x.
    Deprecated since version 2.0.0: A strict version of this argument is now the default, passing it has no effect.

keep_date_col : bool, default False
    If True and parse_dates specifies combining multiple columns then keep the original columns.

date_parser : Callable, optional
    Function to use for converting a sequence of string columns to an array of datetime instances. The default uses dateutil.parser.parser to do the conversion. pandas will try to call date_parser in three different ways, advancing to the next if an exception occurs: 1) Pass one or more arrays (as defined by parse_dates) as arguments; 2) concatenate (row-wise) the string values from the columns defined by parse_dates into a single array and pass that; and 3) call date_parser once for each row using one or more strings (corresponding to the columns defined by parse_dates) as arguments.
    Deprecated since version 2.0.0: Use date_format instead, or read in as object and then apply to_datetime() as-needed.

date_format : str or dict of column -> format, optional
    Format to use for parsing dates when used in conjunction with parse_dates. The strftime to parse time, e.g. "%d/%m/%Y". See strftime documentation for more information on choices, though note that "%f" will parse all the way up to nanoseconds. You can also pass:
    - "ISO8601", to parse any ISO8601 time string (not necessarily in exactly the same format);
    - "mixed", to infer the format for each element individually. This is risky, and you should probably use it along with dayfirst.
    New in version 2.0.0.

dayfirst : bool, default False
    DD/MM format dates, international and European format.

cache_dates : bool, default True
    If True, use a cache of unique, converted dates to apply the datetime conversion. May produce significant speed-up when parsing duplicate date strings, especially ones with timezone offsets.

iterator : bool, default False
    Return TextFileReader object for iteration or getting chunks with get_chunk().

chunksize : int, optional
    Number of lines to read from the file per chunk. Passing a value will cause the function to return a TextFileReader object for iteration. See the IO Tools docs for more information on iterator and chunksize.

compression : str or dict, default 'infer'
    For on-the-fly decompression of on-disk data. If 'infer' and 'filepath_or_buffer' is path-like, then detect compression from the following extensions: '.gz', '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2' (otherwise no compression). If using 'zip' or 'tar', the ZIP file must contain only one data file to be read in. Set to None for no decompression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, zstandard.ZstdDecompressor, lzma.LZMAFile or tarfile.TarFile, respectively. As an example, the following could be passed for Zstandard decompression using a custom compression dictionary: compression={'method': 'zstd', 'dict_data': my_compression_dict}.
    New in version 1.5.0: Added support for .tar files.
    Changed in version 1.4.0: Zstandard support.

thousands : str (length 1), optional
    Character acting as the thousands separator in numerical values.

decimal : str (length 1), default '.'
    Character to recognize as decimal point (e.g., use ',' for European data).

lineterminator : str (length 1), optional
    Character used to denote a line break. Only valid with C parser.

quotechar : str (length 1), optional
    Character used to denote the start and end of a quoted item. Quoted items can include the delimiter and it will be ignored.

quoting : {0 or csv.QUOTE_MINIMAL, 1 or csv.QUOTE_ALL, 2 or csv.QUOTE_NONNUMERIC, 3 or csv.QUOTE_NONE}, default csv.QUOTE_MINIMAL
    Control field quoting behavior per csv.QUOTE_* constants. Default is csv.QUOTE_MINIMAL (i.e., 0) which implies that only fields containing special characters are quoted (e.g., characters defined in quotechar, delimiter, or lineterminator).

doublequote : bool, default True
    When quotechar is specified and quoting is not QUOTE_NONE, indicate whether or not to interpret two consecutive quotechar elements INSIDE a field as a single quotechar element.

escapechar : str (length 1), optional
    Character used to escape other characters.

comment : str (length 1), optional
    Character indicating that the remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character. Like empty lines (as long as skip_blank_lines=True), fully commented lines are ignored by the parameter header but not by skiprows. For example, if comment='#', parsing #empty\na,b,c\n1,2,3 with header=0 will result in 'a,b,c' being treated as the header.

encoding : str, optional, default 'utf-8'
    Encoding to use for UTF when reading/writing (ex. 'utf-8'). List of Python standard encodings.

encoding_errors : str, optional, default 'strict'
    How encoding errors are treated. List of possible values.
    New in version 1.3.0.

dialect : str or csv.Dialect, optional
    If provided, this parameter will override values (default or not) for the following parameters: delimiter, doublequote, escapechar, skipinitialspace, quotechar, and quoting. If it is necessary to override values, a ParserWarning will be issued. See csv.Dialect documentation for more details.

on_bad_lines : {'error', 'warn', 'skip'} or Callable, default 'error'
    Specifies what to do upon encountering a bad line (a line with too many fields). Allowed values are:
    - 'error', raise an Exception when a bad line is encountered.
    - 'warn', raise a warning when a bad line is encountered and skip that line.
    - 'skip', skip bad lines without raising or warning when they are encountered.
    New in version 1.3.0.
    New in version 1.4.0: Callable, function with signature (bad_line: list[str]) -> list[str] | None that will process a single bad line. bad_line is a list of strings split by the sep. If the function returns None, the bad line will be ignored. If the function returns a new list of strings with more elements than expected, a ParserWarning will be emitted while dropping extra elements. Only supported when engine='python'.
    Changed in version 2.2.0: Callable, function with signature as described in pyarrow documentation when engine='pyarrow'.

delim_whitespace : bool, default False
    Specifies whether or not whitespace (e.g. ' ' or '\t') will be used as the sep delimiter. Equivalent to setting sep='\s+'. If this option is set to True, nothing should be passed in for the delimiter parameter.
    Deprecated since version 2.2.0: Use sep="\s+" instead.

low_memory : bool, default True
    Internally process the file in chunks, resulting in lower memory use while parsing, but possibly mixed type inference. To ensure no mixed types either set False, or specify the type with the dtype parameter. Note that the entire file is read into a single DataFrame regardless, use the chunksize or iterator parameter to return the data in chunks. (Only valid with C parser).

memory_map : bool, default False
    If a filepath is provided for filepath_or_buffer, map the file object directly onto memory and access the data directly from there. Using this option can improve performance because there is no longer any I/O overhead.

float_precision : {'high', 'legacy', 'round_trip'}, optional
    Specifies which converter the C engine should use for floating-point values. The options are None or 'high' for the ordinary converter, 'legacy' for the original lower precision pandas converter, and 'round_trip' for the round-trip converter.

storage_options : dict, optional
    Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with "s3://", and "gcs://") the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details, and for more examples on storage options refer to the pandas user guide.

dtype_backend : {'numpy_nullable', 'pyarrow'}, default 'numpy_nullable'
    Back-end data type applied to the resultant DataFrame (still experimental). Behaviour is as follows:
    - "numpy_nullable": returns nullable-dtype-backed DataFrame (default).
    - "pyarrow": returns pyarrow-backed nullable ArrowDtype DataFrame.
    New in version 2.0.

Returns:

DataFrame or TextFileReader
    A comma-separated values (csv) file is returned as two-dimensional data structure with labeled axes.

4-2. Parameters

4-2-1. filepath_or_buffer (required): A string path (which may be a URL), a path object, or any object with a read() method, such as a file handle or another file-like object.
4-2-2. sep/delimiter (optional): The field delimiter. sep defaults to ','. The separator is detected automatically only when sep=None is passed, in which case the Python engine is used. delimiter is an alias for sep and defaults to None.
4-2-3. header (optional, default 'infer'): The row number(s), counted from 0, that hold the column names. With 'infer', the behavior matches header=0 when names is not passed and header=None when names is passed. An integer selects a single header row, and a list of integers builds a MultiIndex on the columns. With None, no row is treated as column names.
4-2-4. names (optional): The list of column names to use for the result. Supply it when the file has no header row; if the file does have one, also pass header=0 so that the existing names are replaced. Duplicates are not allowed.
4-2-5. index_col (optional, default None): The column(s) to use as the row index, given as an integer position, a column name, or a list of either (a list produces a MultiIndex). With None (the default), a 0-based integer index is normally used. index_col=False forces pandas not to use the first column as the index.
4-2-6. usecols (optional, default None): Returns a subset of the columns. By default all columns are parsed. A list of integers selects columns by position, a list of strings selects them by name, and a callable is evaluated against each column name. Element order is ignored.
4-2-7. dtype (optional, default None): The data type for the whole dataset or for individual columns. It can be a single type or a dict mapping columns to types.
4-2-8. engine (optional, default None): The parser engine: {'c', 'python', 'pyarrow'}. The 'c' and 'pyarrow' engines are faster, while the 'python' engine is more feature-complete.
4-2-9. converters (optional, default None): A dict of conversion functions for specific columns. Keys can be column names or column indices (counted from 0). If given, converters are applied instead of dtype conversion.
4-2-10. true_values/false_values (optional, default None): Lists of values to treat as True or False, in addition to the case-insensitive variants of 'True' and 'False'.
4-2-11. skipinitialspace (optional, default False): Skip spaces that follow the delimiter.
4-2-12. skiprows (optional, default None): A list of line numbers to skip (counted from 0), the number of lines to skip at the start of the file, or a callable evaluated against each row index.
4-2-13. skipfooter (optional, default 0): The number of lines to skip at the end of the file (not supported with engine='c').
4-2-14. nrows (optional, default None): The number of rows to read, counted from the start of the file.
4-2-15. na_values (optional, default None): Additional strings to recognize as NA/NaN. A dict specifies per-column NA values.
4-2-16. keep_default_na (optional, default True): Whether the default NaN values are included when parsing. If True, any na_values are added to the defaults. If False, only the values given in na_values are used, and if na_values is also missing, no strings are parsed as NaN.
4-2-17. na_filter (optional, default True): Detect missing value markers (empty strings and the values in na_values). For data known to contain no NA values, setting it to False can speed up reading a large file, and keep_default_na and na_values are then ignored.
4-2-18. verbose (optional): Report the number of NA values placed in non-numeric columns. Deprecated since version 2.2.0.
4-2-19. skip_blank_lines (optional, default True): If True, blank lines are skipped; otherwise they are interpreted as NaN values.
4-2-20. parse_dates (optional; None in the signature, documented default False): Try to parse data as dates. True parses the index, a list of column positions or names parses each of those columns, and a list of lists or a dict combines several columns into one date column.
4-2-21. infer_datetime_format (optional): If True and parse_dates is enabled, pandas tries to infer the date/time format. Deprecated since version 2.0.0; passing it has no effect.
4-2-22. keep_date_col (optional): If several columns are combined to parse a date, keep the original columns.
4-2-23. date_parser (optional): A function used to parse dates. Deprecated since version 2.0.0; use date_format instead.
4-2-24. date_format (optional, default None): A string, or a dict mapping columns to format strings, that specifies the date/time format used together with parse_dates, e.g. "%d/%m/%Y". "ISO8601" and "mixed" are also accepted. New in version 2.0.0.
4-2-25. dayfirst (optional, default False): Whether the day comes before the month when parsing dates (DD/MM format).
4-2-26. cache_dates (optional, default True): If True, a cache of unique converted dates is used for the datetime conversion, which can speed up parsing of repeated date strings.
4-2-27. iterator (optional, default False): If True, a TextFileReader object is returned for incremental iteration or for fetching chunks with get_chunk().
4-2-28. chunksize (optional, default None): The number of lines to read per chunk. Passing a value makes the function return a TextFileReader object for iteration.
4-2-29. compression (optional, default 'infer'): The compression type of the file, such as 'gzip', 'bz2', 'zip', 'xz', 'zstd' or 'tar'. With 'infer', it is detected from the file extension when a path is given.
4-2-30. thousands (optional, default None): The thousands separator.
4-2-31. decimal (optional, default '.'): The decimal point character.
4-2-32. lineterminator (optional, default None): The single character that marks a line break (only valid with the C parser).
4-2-33. quotechar (optional, default '"'): The character that marks the start and end of a quoted item. A quoted item may contain the delimiter, which is then ignored.
4-2-34. quoting (optional, default 0): Controls quote handling according to the csv.QUOTE_* constants. The default 0 is csv.QUOTE_MINIMAL.
4-2-35. doublequote (optional, default True): When quotechar is specified and quoting is not QUOTE_NONE, indicates whether two consecutive quote characters inside a field are interpreted as a single quote character.
4-2-36. escapechar (optional, default None): The single character used to escape other characters, for example a quote character that must appear inside a field.
4-2-37. comment (optional, default None): A single character that marks the start of a comment. The rest of the line after this character is not parsed, and a line that begins with it is ignored entirely. With None (the default), no lines are treated as comments.
4-2-38. encoding (optional; None in the signature, documented default 'utf-8'): The encoding used to decode the file. If the file is stored in a different encoding and none is specified, decoding errors may occur.
4-2-39. encoding_errors (optional, default 'strict'): How encoding errors are handled. Valid options are the names of Python's codec error handlers, including 'strict', 'ignore', 'replace' and 'surrogatepass'. 'strict' (the default) raises an exception, 'ignore' drops the malformed data, and 'replace' substitutes a replacement marker for it. New in version 1.3.0.
4-2-40. dialect (optional, default None): A dialect name or a csv.Dialect instance. If given, it overrides the values of delimiter, doublequote, escapechar, skipinitialspace, quotechar and quoting, and a ParserWarning is issued when a value has to be overridden.
4-2-41. on_bad_lines (optional, default 'error'): What to do when a "bad line" (a line with too many fields) is encountered. Valid options are 'error' (the default, raises an exception), 'warn' (issues a warning and skips the line) and 'skip' (skips the line silently). A callable that processes a single bad line is also accepted with engine='python'.
4-2-42. delim_whitespace (optional): If True, whitespace (such as spaces or tabs) is used as the field delimiter. This is equivalent to sep='\s+', so runs of consecutive whitespace count as one delimiter. Deprecated since version 2.2.0 in favor of sep='\s+'.
4-2-43. low_memory (optional, default True): If True (the default), the file is processed internally in chunks, which lowers memory use while parsing but may lead to mixed type inference. To avoid mixed types, set it to False or specify dtype. The whole file is still read into a single DataFrame. Only valid with the C parser.
4-2-44. memory_map (optional, default False): If True and a file path is given, the file is mapped directly onto memory and the data is accessed from there, which can improve performance because there is no longer any I/O overhead.
4-2-45. float_precision (optional, default None): Specifies which converter the C engine uses for floating-point values: None or 'high' for the ordinary converter, 'legacy' for the original lower-precision pandas converter, and 'round_trip' for the round-trip converter.
4-2-46. storage_options (optional, default None): A dict of extra options for a storage connection (such as S3). For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options; for other URLs they are forwarded to fsspec.open.
4-2-47. dtype_backend (optional): The back-end data type applied to the resulting DataFrame: 'numpy_nullable' or 'pyarrow'. Still experimental. New in version 2.0.

4-3. Purpose

Reads a CSV (comma-separated values) file and converts it into a DataFrame object.

4-4. Return value

Returns a pandas.DataFrame object containing the data read from the given file path or file object. When iterator=True or chunksize is set, a TextFileReader object is returned instead.

4-5. Notes

None.

4-6. Usage

4-6-1. Creating the CSV file

    # 4. Two ways to create the CSV file
    # 4-1. Using the csv library
    import csv
    # Data to write to the CSV file
    rows = [
        ["Name", "Age", "City"],
        ["Myelsa", 42, "New York"],
        ["Bryce", 6, "Los Angeles"],
        ["Jimmy", 15, "Chicago"]
    ]
    # Open the file for writing; it is created if it does not exist
    with open('example.csv', 'w', newline='') as file:
        writer = csv.writer(file)
        # Write all rows
        writer.writerows(rows)
    print("CSV file created with csv!")

    # 4-2. Using the pandas library
    import pandas as pd
    # Data to write to the CSV file
    data = {
        'Name': ['Myelsa', 'Bryce', 'Jimmy'],
        'Age': [42, 6, 15],
        'City': ['New York', 'Los Angeles', 'Chicago']
    }
    # Create the DataFrame
    df = pd.DataFrame(data)
    # Write the DataFrame to a CSV file
    df.to_csv('example.csv', index=False)  # index=False: do not write the row index to the file
    print("CSV file created with pandas!")

4-6-2. Code examples
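As a supplementary sketch for the parameter list in 4-2, the block below combines na_values, dtype, parse_dates and chunksize on a small in-memory CSV. The sample text, the "missing" marker, and the Joined column are assumptions made up for illustration; they are not part of the example.csv used in this post.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV text (illustration only, not example.csv)
csv_text = """Name,Age,City,Joined
Myelsa,42,New York,2024-01-15
Bryce,missing,Los Angeles,2024-02-20
Jimmy,15,Chicago,2024-03-05
"""

# na_values adds "missing" to the default NaN markers;
# dtype asks for the nullable Int64 type so Age can hold a missing value;
# parse_dates converts the Joined column to datetime values.
df = pd.read_csv(
    io.StringIO(csv_text),
    na_values=["missing"],
    dtype={"Age": "Int64"},
    parse_dates=["Joined"],
)
print(df.dtypes)

# chunksize makes read_csv return a TextFileReader that yields
# DataFrames of at most 2 rows each.
chunk_sizes = []
with pd.read_csv(io.StringIO(csv_text), chunksize=2) as reader:
    for chunk in reader:
        chunk_sizes.append(len(chunk))
print(chunk_sizes)
```

Passing a StringIO object works because filepath_or_buffer accepts any object with a read() method, so the sketch does not need a file on disk.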
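    # 4-3. Basic read
    import pandas as pd
    # Read the CSV file
    df = pd.read_csv('example.csv')
    # Show the first few rows
    print(df.head())

    # 4-4. Specifying the separator
    import pandas as pd
    # example.csv is comma-separated, so sep=';' yields a single column
    df = pd.read_csv('example.csv', sep=';')
    print(df.head())

    # 4-5. Skipping rows and selecting columns
    import pandas as pd
    # Skip the first two lines and read only the first and third columns;
    # the third line of the file then becomes the header
    df = pd.read_csv('example.csv', skiprows=2, usecols=[0, 2])
    print(df.head())

    # 4-6. Specifying column names
    import pandas as pd
    # Treat the file as having no header row and supply the column names by hand;
    # the original header line is then read as the first data row
    df = pd.read_csv('example.csv', header=None, names=['Name_1', 'Age_1', 'City_1'])
    print(df.head())

4-6-3. Output

    # 4-3. Basic read
    #      Name  Age         City
    # 0  Myelsa   42     New York
    # 1   Bryce    6  Los Angeles
    # 2   Jimmy   15      Chicago

    # 4-4. Specifying the separator
    #          Name,Age,City
    # 0   Myelsa,42,New York
    # 1  Bryce,6,Los Angeles
    # 2     Jimmy,15,Chicago

    # 4-5. Skipping rows and selecting columns
    #    Bryce  Los Angeles
    # 0  Jimmy      Chicago

    # 4-6. Specifying column names
    #    Name_1  Age_1       City_1
    # 0    Name    Age         City
    # 1  Myelsa     42     New York
    # 2   Bryce      6  Los Angeles
    # 3   Jimmy     15      Chicago

II. Recommended Reading

1. Python Foundations Tour
2. Python Functions Tour
3. Python Algorithms Tour
4. Python Magic Tour
5. Blog homepage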
心飞设计 - Copyright: 微度网络信息技术服务中心 (鲁ICP备17032091号-12)
