|
I originally wanted to extract the reference lists from arXiv papers, but PDF files are hard to parse. So I decided to use each paper's identifier to look it up in another database. Semantic Scholar lists the references of a paper, so I call its API for this purpose.

The overall flow is: first get the semanticscholar-id from the arxiv-id, then use the semanticscholar-id to get the paper's reference information (title, author, time, id).

Data preparation

I had already crawled the metadata of arXiv papers on GCN; see the post 👉 "Crawling arXiv paper metadata with Python".

For example, in the paper link https://arxiv.org/pdf/2403.02221, the final string of digits, 2403.02221, is that paper's arxiv-id.

Getting the semanticscholar-id from a paper's arxiv-id

```python
import requests

# Set the arXiv ID
arxiv_id = "2403.00825"

# Build the request URL, using the arXiv ID as the parameter
url = f"https://api.semanticscholar.org/v1/paper/arXiv:{arxiv_id}"

# Send the request
response = requests.get(url)

# Check whether the request succeeded
if response.status_code == 200:
    # Parse the JSON response
    data = response.json()
    # Get and print the Semantic Scholar ID
    semantic_scholar_id = data.get("paperId")
    print(f"Semantic Scholar ID: {semantic_scholar_id}")
else:
    print(f"Request failed, status code: {response.status_code}")
```

Output after running the code:

```
Semantic Scholar ID: f15a2d6878429c395e31d738a481fb39e98ca7e2
```

Getting a paper's reference information (title, authors, year and ID) from its semanticscholar-id

```python
import requests

semantic_scholar_id = "075f320d8e82673b51204a768d831a17f9999c02"

# Build the request URL
url = f"https://api.semanticscholar.org/v1/paper/{semantic_scholar_id}"

# Send the request
response = requests.get(url)

# Check whether the request succeeded
if response.status_code == 200:
    data = response.json()
    # Check whether the paper has any references
    if "references" in data and len(data["references"]) > 0:
        # Print each reference's title, authors, year and Semantic Scholar ID
        for reference in data["references"]:
            print(f"Title: {reference.get('title', 'No title available')}")
            # Print the reference's authors, if any
            if "authors" in reference:
                authors = ", ".join([author.get("name", "N/A") for author in reference["authors"]])
                print(f"Authors: {authors}")
            # Print the publication year
            print(f"Year: {reference.get('year', 'No year available')}")
            # Print the Semantic Scholar ID
            print(f"Semantic Scholar ID: {reference.get('paperId', 'No ID available')}")
            print("-----")
    else:
        print("No references available")
else:
    print(f"Request failed, status code: {response.status_code}")
```

Output:

```
Title: MSNet: Multi-Resolution Synergistic Networks for Adaptive Inference
Authors: Renlong Hang, Xuwei Qian, Qingshan Liu
Year: 2023
Semantic Scholar ID: 46a0dfaa98118728052b9f017940470ba79ce0f1
-----
Title: ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
Authors: Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In-So Kweon, Saining Xie
Year: 2023
Semantic Scholar ID: 2218f1713d7f721ab76801063416ec9b11c7646f
-----
Title: Dynamic Neural Networks: A Survey
Authors: Yizeng Han, Gao Huang, Shiji Song, Le Yang, Honghui Wang, Yulin Wang
Year: 2021
Semantic Scholar ID: 837ac4ed6825502f0460caec45e12e734c85b113
-----
```

The script lists every reference; for brevity only three are shown here.

Other features

The API offers more than this. A few examples follow; for everything else, see 👉 the official Semantic Scholar API documentation.

Getting a paper's information from its semanticscholar-id

```python
import requests

paper_id = "f15a2d6878429c395e31d738a481fb39e98ca7e2"

# Build the request URL
url = f"https://api.semanticscholar.org/v1/paper/{paper_id}"

# Send the request
response = requests.get(url)

# Check whether the request succeeded
if response.status_code == 200:
    paper_info = response.json()
    print(paper_info)  # Print the paper information
else:
    print(f"Request failed, status code: {response.status_code}")
```

Output (wrapped here for readability):

```
{'abstract': "Text classification is the task of assigning a document to a predefined class. However, it is expensive to acquire enough labeled documents or to label them. In this paper, we study the regularization methods' effects on various classification models when only a few labeled data are available. We compare a simple word embedding-based model, which is simple but effective, with complex models (CNN and BiLSTM). In supervised learning, adversarial training can further regularize the model. When an unlabeled dataset is available, we can regularize the model using semi-supervised learning methods such as the Pi model and virtual adversarial training. We evaluate the regularization effects on four text classification datasets (AG news, DBpedia, Yahoo! Answers, Yelp Polarity), using only 0.1% to 0.5% of the original labeled training documents. The simple model performs relatively well in fully supervised learning, but with the help of adversarial training and semi-supervised learning, both simple and complex models can be regularized, showing better results for complex models. Although the simple model is robust to overfitting, a complex model with well-designed prior beliefs can be also robust to overfitting.",
 'arxivId': '2403.00825',
 'authors': [{'authorId': '2156939179', 'name': 'Jongga Lee', 'url': 'https://www.semanticscholar.org/author/2156939179'},
             {'authorId': '2289841708', 'name': 'Jaeseung Yim', 'url': 'https://www.semanticscholar.org/author/2289841708'},
             {'authorId': '2289841978', 'name': 'Seohee Park', 'url': 'https://www.semanticscholar.org/author/2289841978'},
             {'authorId': '2290016625', 'name': 'Changwon Lim', 'url': 'https://www.semanticscholar.org/author/2290016625'}],
 'citationVelocity': 0,
 'citations': [],
 'corpusId': 268230995,
 'doi': '10.48550/arXiv.2403.00825',
 'fieldsOfStudy': ['Computer Science'],
 'influentialCitationCount': 0,
 'isOpenAccess': False,
 'isPublisherLicensed': True,
 'is_open_access': False,
 'is_publisher_licensed': True,
 'numCitedBy': 0,
 'numCiting': 0,
 'paperId': 'f15a2d6878429c395e31d738a481fb39e98ca7e2',
 'references': [],
 's2FieldsOfStudy': [{'category': 'Computer Science', 'source': 'external'},
                     {'category': 'Computer Science', 'source': 's2-fos-model'}],
 'title': 'Comparing effectiveness of regularization methods on text classification: Simple and complex model in data shortage situation',
 'topics': [],
 'url': 'https://www.semanticscholar.org/paper/f15a2d6878429c395e31d738a481fb39e98ca7e2',
 'venue': 'arXiv.org',
 'year': 2024}
```

Getting 10 recommended papers for a paper

```python
import requests
import json

# Base URL of the API
base_url = "https://api.semanticscholar.org/recommendations/v1"

# API path and parameters for the paper whose recommendations are requested
paper_id = "075f320d8e82673b51204a768d831a17f9999c02"
path = f"/papers/forpaper/{paper_id}"
params = {
    "limit": 10,  # Number of recommended papers to return
    "fields": "title,authors,year"  # Fields to return
}

# Send the GET request
response = requests.get(f"{base_url}{path}", params=params)

# Check whether the request succeeded
if response.status_code == 200:
    # Parse the response
    recommendations = response.json()
    print(json.dumps(recommendations, indent=2))
else:
    print(f"Error: {response.status_code}")
```

Output:

```
{
  "recommendedPapers": [
    {
      "paperId": "bd8ee79c28ef2eb55185c6912484847696c0773b",
      "title": "SoD2: Statically Optimizing Dynamic Deep Neural Network",
      "year": 2024,
      "authors": [
        {
          "authorId": "48643324",
          "name": "Wei Niu"
        },
        {
          "authorId": "2289611051",
          "name": "Gagan Agrawal"
        },
        {
          "authorId": "2244768705",
          "name": "Bin Ren"
        }
      ]
    },
    {
      .............. (omitted)
  ]
}
```

Getting paper information from a DOI

```python
import requests

doi = "10.1145/3292500.3330925"

# Build the request URL
url = f"https://api.semanticscholar.org/v1/paper/{doi}"

response = requests.get(url)

if response.status_code == 200:
    paper_details = response.json()
    print(f"Semantic Scholar ID: {paper_details.get('paperId')}")
    print(f"Title: {paper_details.get('title')}")
    print(f"Authors: {[author['name'] for author in paper_details.get('authors', [])]}")
    print(f"Year of Publication: {paper_details.get('year')}")
    print(f"Abstract: {paper_details.get('abstract', 'No abstract available')}")
else:
    print(f"Error: Failed to retrieve data, status code {response.status_code}")
```

Output:

```
Semantic Scholar ID: 05c4eb154ad9512a69569c18d68bc4428ee8bb83
Title: Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks
Authors: ['Wei-Lin Chiang', 'Xuanqing Liu', 'Si Si', 'Yang Li', 'Samy Bengio', 'Cho-Jui Hsieh']
Year of Publication: 2019
Abstract: Graph convolutional network (GCN) has been successfully applied to many graph-based applications; however, training a large-scale GCN remains challenging. Current SGD-based algorit.... (omitted)
```

Email: k1933211129@163.com
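The two-step flow described in this post (arxiv-id to semanticscholar-id, then semanticscholar-id to reference list) could be wrapped into a few reusable functions. What follows is only a sketch under stated assumptions: it assumes the v1 endpoint used throughout this post keeps returning `paperId` and `references` in the shape shown in the outputs; it swaps `requests` for the standard library's `urllib` so that it runs without extra installs; and the function names (`paper_url`, `fetch_paper`, `extract_references`, `references_for_arxiv_id`) are invented here for illustration.

```python
import json
import urllib.request

# Same v1 endpoint the snippets in this post use (assumed to stay available).
API_ROOT = "https://api.semanticscholar.org/v1/paper"


def paper_url(identifier, id_type=None):
    """Build the lookup URL; id_type="arXiv" yields .../paper/arXiv:<id>."""
    key = f"{id_type}:{identifier}" if id_type else identifier
    return f"{API_ROOT}/{key}"


def fetch_paper(identifier, id_type=None, timeout=10):
    """Return the paper record as a dict, or None if the request fails."""
    url = paper_url(identifier, id_type)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError) as exc:
        # OSError covers HTTP/URL errors and timeouts; ValueError covers bad JSON.
        print(f"Request failed: {exc}")
        return None


def extract_references(paper):
    """Flatten a record's "references" list into simple dicts."""
    rows = []
    for ref in (paper or {}).get("references") or []:
        rows.append({
            "title": ref.get("title"),
            "authors": [a.get("name", "N/A") for a in ref.get("authors") or []],
            "year": ref.get("year"),
            "paperId": ref.get("paperId"),
        })
    return rows


def references_for_arxiv_id(arxiv_id):
    """Step 1: arxiv-id -> semanticscholar-id. Step 2: that ID -> references."""
    record = fetch_paper(arxiv_id, id_type="arXiv")
    paper_id = (record or {}).get("paperId")
    if not paper_id:
        return []
    return extract_references(fetch_paper(paper_id))
```

Calling `references_for_arxiv_id("2403.02221")` would then return a list of dicts with `title`, `authors`, `year` and `paperId` keys, or an empty list when a request fails or the paper has no references. Keeping the parsing in `extract_references` separate from the network call means it can be checked against a saved response without hitting the API.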
|
|