PUBLICATIONS
ParsiNLU: A Suite of Language Understanding Challenges for Persian
Daniel Khashabi, Arman Cohan, Siamak Shakeri, Pedram Hosseini, Pouya Pezeshkpour, Malihe Alikhani, Moin Aminnaseri, and +21 authors
TACL  2021
Tl;DR: We introduce ParsiNLU, the first benchmark for the Persian language that includes a range of high-level tasks -- Reading Comprehension, Textual Entailment, etc. These datasets are collected in a multitude of ways, often involving manual annotation by native speakers.
CDLM: Cross Document Language Modeling
Avi Caciularu, Arman Cohan, Iz Beltagy, Matthew E. Peters, Arie Cattan, and Ido Dagan
EMNLP Findings  2021
Tl;DR: A new pretrained language model for cross document tasks
MS2: Multi-Document Summarization of Medical Studies
Jay DeYoung, Iz Beltagy, Madeleine van Zuylen, Bailey Kuehl, and Lucy Lu Wang
EMNLP  2021
Tl;DR: To assess the effectiveness of any medical intervention, researchers must conduct a time-intensive and highly manual literature review. NLP systems can help to automate or assist in parts of this expensive process. In support of this goal, we release MS^2 (Multi-Document Summarization of Medical Stu...
PAWLS: PDF Annotation With Labels and Structure
Mark Neumann, Zejiang Shen, and Sam Skjonsberg
ACL  2021
Tl;DR: PAWLS is a new annotation tool designed specifically for the PDF document format. PAWLS supports span-based textual annotation, N-ary relations and freeform, non-textual bounding boxes, all of which can be exported in convenient formats for training multi-modal machine learning models.
Searching for scientific evidence in a pandemic: An overview of TREC-COVID
Kirk Roberts, Tasmeer Alam, Steven Bedrick, Dina Demner-Fushman, Kyle Lo, Ian Soboroff, Ellen Voorhees, Lucy Lu Wang, and William R. Hersh
Journal of Biomedical Informatics  2021
Tl;DR: This paper provides a comprehensive overview of the structure and results of TREC-COVID, an information retrieval (IR) shared task to evaluate search on scientific literature related to COVID-19.
Overview and Insights from the SciVer Shared Task on Scientific Claim Verification
David Wadden and Kyle Lo
SDP  2021
Tl;DR: We present an overview of the SCIVER shared task. In addition to surveying the participating systems, we provide several insights into modeling approaches to support continued progress and future research on scientific claim verification.
FLEX: Unifying Evaluation for Few-Shot NLP
Jonathan Bragg*, Arman Cohan*, Kyle Lo, and Iz Beltagy
preprint  2021
Tl;DR: Few-shot NLP research lacks a unified, challenging-yet-realistic evaluation setup. In response, we introduce FLEX, a rigorous few-shot learning NLP benchmark and public leaderboard measuring four transfer types. We also present UniFew, a simple, competitive baseline that does not rely on heavy promp...
Simplified Data Wrangling with ir_datasets
Sean MacAvaney, Andrew Yates, Sergey Feldman, Doug Downey, Arman Cohan, and Nazli Goharian
SIGIR  2021
Tl;DR: We introduce a new robust and lightweight tool (ir_datasets) for acquiring, managing, and performing typical operations over datasets used in IR
Extracting a Knowledge Base of Mechanisms from COVID-19 Papers
Tom Hope*, Aida Amini*, David Wadden, Madeleine van Zuylen, Eric Horvitz, Roy Schwartz, and Hannaneh Hajishirzi
NAACL  2021
Tl;DR: To navigate the collection of COVID-19 papers from different domains, we present a KB of mechanisms relating to COVID-19, to support domain-agnostic search and exploration of general activities, functions, influences, and associations in these papers.
A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers
Pradeep Dasigi, Kyle Lo, Iz Beltagy, Arman Cohan, Noah A. Smith, and Matt Gardner
NAACL  2021
Tl;DR: Qasper is a dataset of 5049 questions over 1585 NLP papers designed to facilitate document-grounded, information-seeking QA. Existing models that do well on other QA tasks do not perform well on these questions.
LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis
Zejiang Shen, Ruochen Zhang, Melissa Dell, Ben Lee, Jacob Carlson, and Weining Li
preprint  2021
Tl;DR: Recent advances in document image analysis (DIA) have been primarily driven by the application of neural networks. Ideally, research outcomes could be easily deployed in production and extended for further investigation. However, various factors like loosely organized codebases and sophisticated mod...
Incorporating Visual Layout Structures for Scientific Text Classification
Zejiang Shen, Kyle Lo, Lucy Lu Wang, Bailey Kuehl, Daniel S. Weld, and Doug Downey
preprint  2021
Tl;DR: We introduce new methods for incorporating VIsual LAyout (VILA) structures, e.g., the grouping of page texts into text lines or text blocks, into language models to further improve performance on automated scientific document understanding.
What Do We Mean by "Accessibility Research"? A Literature Survey of Accessibility Papers in CHI and ASSETS from 1994 to 2019
Kelly Mack, Emma McDonnell, Dhruv Jain, Lucy Lu Wang, Jon Froehlich, and Leah Findlater
CHI  2021
Tl;DR: Accessibility research has grown substantially in the past few decades, yet there has been no literature review of the field. To understand current and historical trends, we created and analyzed a dataset of accessibility papers appearing at CHI and ASSETS since ASSETS' founding in 1994.
Augmenting Scientific Papers with Just-in-Time, Position-Sensitive Definitions of Terms and Symbols
Andrew Head, Kyle Lo, Dongyeop Kang, Raymond Fok, Sam Skjonsberg, Daniel S. Weld, and Marti A. Hearst
CHI  2021
Tl;DR: We introduce ScholarPhi, an augmented reading interface that brings definitions of technical terms and symbols to readers when and where they need them most.
Improving the Accessibility of Scientific Documents: Current State, User Needs, and a System Solution to Enhance Scientific PDF Accessibility for Blind and Low Vision Users
Lucy Lu Wang*, Isabel Cachola*, Jonathan Bragg, Evie Yu-Yen Cheng, Chelsea Haupt, Matt Latzke, Bailey Kuehl, Madeleine van Zuylen, Linda Wagner, and Daniel S Weld
preprint  2021
Tl;DR: The majority of scientific papers are distributed in PDF, a format that poses challenges for accessibility, especially for blind and low vision (BLV) readers. We characterize the scope of this problem...
Scaling Creative Inspiration with Fine-Grained Functional Facets of Product Ideas
Tom Hope, Ronen Tamari, Hyeonsu Kang, Daniel Hershcovich, Joel Chan, Aniket Kittur, and Dafna Shahaf
preprint  2021
Tl;DR: Repositories of products or scientific papers offer an opportunity for creating automated systems that assist users in discovering inspirations and solutions. We propose a novel computational representation of ideas with fine-grained functional facets and use it to help problem-solvers search for id...
Gender Trends in Computer Science Authorship
Lucy Lu Wang, Gabriel Stanovsky, Luca Weihs, and Oren Etzioni
CACM  2021
Tl;DR: An analysis of 2.87 million computer science papers reveals that, if current trends continue, parity between the number of male and female authors will not be reached in this century. With optimistic projection models, gender parity is forecast to be reached by 2100 in CS, but projected to be reache...
S2AND: A Benchmark and Evaluation System for Author Name Disambiguation
Shivashankar Subramanian, Daniel King, Doug Downey, and Sergey Feldman
preprint  2021
Tl;DR: S2AND is a new benchmark dataset and reference model for the author name disambiguation (AND) task. AND is a critical task for digital libraries, and existing work is scattered across multiple datasets, each covering a different slice of literature, and with a different feature set. In S2AND, we unify the...
GENIE: A Leaderboard for Human-in-the-Loop Evaluation of Text Generation
Daniel Khashabi, Gabriel Stanovsky, Jonathan Bragg, Nicholas Lourie, Jungo Kasai, Yejin Choi, Noah A. Smith, and Daniel S. Weld
preprint  2021
Tl;DR: This work introduces GENIE, an extensible human evaluation leaderboard, which brings the ease of leaderboards to text generation tasks. GENIE automatically posts leaderboard submissions to crowdsourcing platforms and presents both manual and automatic metrics on the leaderboard.
Text mining approaches for dealing with the rapidly expanding literature on COVID-19
Lucy Lu Wang and Kyle Lo
Briefings in Bioinformatics  2020
Tl;DR: This review discusses the corpora, modeling resources, systems and shared tasks that have been introduced for COVID-19, and lists 39 systems that provide functionality such as search, discovery, visualization and summarization over the COVID-19 literature.
On Generating Extended Summaries of Long Documents
Sajad Sotudeh, Arman Cohan, and Nazli Goharian
preprint  2020
Tl;DR: We present a new hierarchical extractive method for generating extended summaries of long papers.
ABNIRML: Analyzing the Behavior of Neural IR Models
Sean MacAvaney, Sergey Feldman, Nazli Goharian, Doug Downey, and Arman Cohan
preprint  2020
Tl;DR: We present a new comprehensive framework for Analyzing the Behavior of Neural IR ModeLs (ABNIRML), which includes new types of diagnostic tests that allow us to probe several characteristics---such as sensitivity to word order---that are not addressed by previous techniques.
MedICaT: A Dataset of Medical Images, Captions, and Textual References
Sanjay Subramanian, Lucy Lu Wang, Sachin Mehta, Ben Bogin, Madeleine van Zuylen, S. Parasa, Sameer Singh, Matt Gardner, and Hannaneh Hajishirzi
EMNLP Findings  2020
Tl;DR: Using MedICaT, a dataset of medical images in context, the task of subfigure to subcaption alignment in compound figures is introduced and the utility of inline references in image-text matching is demonstrated.
TLDR: Extreme Summarization of Scientific Documents
Isabel Cachola, Kyle Lo, Arman Cohan, and Daniel S. Weld
EMNLP Findings  2020
Tl;DR: We introduce TLDR generation for scientific papers, a new automatic summarization task with high source compression and provide a new dataset and models for effective generation of TLDRs.
SciSight: Combining faceted navigation and research group detection for COVID-19 exploratory scientific search
Tom Hope, Jason Portenoy*, Kishore Vasan*, Jonathan Borchardt*, Eric Horvitz, Daniel S. Weld, Marti A. Hearst, and Jevin D. West
EMNLP  2020
Tl;DR: SciSight is a novel framework for exploratory search of COVID-19 research that integrates two key capabilities: first, exploring interactions between biomedical facets (e.g., proteins, genes, drugs, diseases, patient characteristics); and second, discovering groups of researchers and how they are co...
Fact or Fiction: Verifying Scientific Claims
David Wadden, Shanchuan Lin, Kyle Lo, Lucy Lu Wang, Madeleine van Zuylen, Arman Cohan, and Hannaneh Hajishirzi
EMNLP  2020
Tl;DR: We construct SciFact, a dataset of 1.4K expert-written scientific claims paired with evidence-containing abstracts annotated with labels and rationales. We develop baseline models for SciFact, and demonstrate that these models benefit from combined training on a large dataset of claims about Wikiped...
Document-Level Definition Detection in Scholarly Documents: Existing Models, Error Analyses, and Future Directions
Dongyeop Kang, Andrew Head, Risham Sidhu, Kyle Lo, Daniel S. Weld, and Marti A. Hearst
EMNLP  2020
Tl;DR: The task of definition detection is important for scholarly papers, because papers often make use of technical terminology that may be unfamiliar to readers. We develop a new definition detection system, HEDDEx, that utilizes syntactic features, transformer encoders, and heuristic filters, and evalu...
Modelling kidney disease using ontology: insights from the Kidney Precision Medicine Project
Edison Ong*, Lucy Lu Wang*, J. Schaub, J. O’Toole, Becky Steck, A. Rosenberg, Frederick Dowd, J. Hansen, L. Barisoni, Sanjay Jain, I. D. de Boer, M. T. Valerius, S. Waikar, Christopher Park, and 14 more...
Nature Reviews Nephrology  2020
Tl;DR: An important need exists to better understand and stratify kidney disease according to its underlying pathophysiology in order to develop more precise and effective therapeutic agents. National collaborative efforts such as the Kidney Precision Medicine Project are working towards this goal through...
TREC-COVID: Rationale and Structure of an Information Retrieval Shared Task for COVID-19
Kirk Roberts, Tasmeer Alam, Steven Bedrick, Dina Demner-Fushman, Kyle Lo, Ian Soboroff, Ellen Voorhees, Lucy Lu Wang, and William R Hersh
JAMIA  2020
Tl;DR: This article presents a brief description of the rationale and structure of TREC-COVID, a still-ongoing IR evaluation. TREC-COVID is creating a new paradigm for search evaluation in rapidly evolving crisis scenarios.
CORD-19: The Covid-19 Open Research Dataset
Lucy Lu Wang, Kyle Lo, Yoganand Chandrasekhar, Russell Reas, Jiangjiang Yang, Darrin Eide, Kathryn Funk, Rodney Kinney, Ziyang Liu, William Merrill, Paul Mooney, Dewey Murdick, Devvret Rishi, Jerry Sheehan, and 10 more...
NLP-COVID at ACL  2020
Tl;DR: The Covid-19 Open Research Dataset (CORD-19) is a growing resource of scientific papers on Covid-19 and related historical coronavirus research. CORD-19 is designed to facilitate the development of text mining and information retrieval systems over its rich collection of metadata and structured fu...
S2ORC: The Semantic Scholar Open Research Corpus
Kyle Lo, Lucy Lu Wang, Mark Neumann, Rodney Kinney, and Daniel S. Weld
ACL  2020
Tl;DR: We introduce S2ORC, a large contextual citation graph of English-language academic papers from multiple scientific domains; the corpus consists of 81.1M papers, 380.5M citation edges, and associated paper metadata.
Stolen Probability: A Structural Weakness of Neural Language Models
David Demeter, Gregory Kimmel, and Doug Downey
ACL  2020
Tl;DR: We show that the softmax output common in neural language models leads to a limitation: some words (in particular, those with an embedding interior to the convex hull of the embedding space) can never be assigned high probability by the model, no matter what the context.
Language (Re)modelling: Towards Embodied Language Understanding
Ronen Tamari, Chen Shani, Tom Hope, Miriam R. L. Petruck, Omri Abend, and Dafna Shahaf
ACL  2020
Tl;DR: We bring together ideas from cognitive science and AI/NLU, arguing that grounding by analogical inference and executable simulation will greatly benefit NLU systems. We propose a system architecture along with a roadmap towards realizing this vision.
Don't Stop Pretraining: Adapt Language Models to Domains and Tasks
Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, and Noah A. Smith
ACL  2020
Tl;DR: We argue that textual domains comprise a spectrum of different granularities. Pretraining along this spectrum maximizes performance of language models on NLP tasks.
SUPP.AI: Finding Evidence for Supplement-Drug Interactions
Lucy Lu Wang, Oyvind Tafjord, Sarthak Jain, Arman Cohan, Sam Skjonsbert, Carissa Schoenick, Nick Botner, and Waleed Ammar
ACL  2020
Tl;DR: We extracted evidence of supplement-drug interactions from 22M scientific articles. Using transfer learning approaches, we fine-tune the BERT language model using labeled evidence of drug-drug interactions, and use the resulting model to detect supplement interaction evidence. We surface these inter...
SciREX: A Challenge Dataset for Document-Level Information Extraction
Sarthak Jain, Madeleine van Zuylen, Hannaneh Hajishirzi, and Iz Beltagy
ACL  2020
Tl;DR: We introduce a new dataset called SciREX that requires understanding of the whole document to annotate entities, and their document-level relationships that usually span beyond sentences or even sections.
SPECTER: Document-level Representation Learning using Citation-informed Transformers
Arman Cohan, Sergey Feldman, Iz Beltagy, Doug Downey, and Daniel S. Weld
ACL  2020
Tl;DR: We propose a document representation model that incorporates inter-document context into pretrained language models.
High-Precision Extraction of Emerging Concepts from Scientific Literature
Daniel King, Doug Downey, and Daniel S. Weld
SIGIR  2020
Tl;DR: A novel, unsupervised method for extracting scientific concepts from papers, based on the intuition that each scientific concept is likely to be introduced or popularized by a single paper that is disproportionately cited by subsequent papers mentioning the concept.
Longformer: The Long-Document Transformer
Iz Beltagy, Matthew E. Peters, and Arman Cohan
preprint  2020
Tl;DR: We introduce the Longformer, with an attention mechanism that scales linearly with sequence length, achieving state-of-the-art results on multiple character-level language modeling and document-level tasks.
Building a Better Search Engine for Semantic Scholar
Sergey Feldman
blog  2020
Tl;DR: 2020 is the year of search for Semantic Scholar, a free, AI-powered research tool for scientific literature, based at the Allen Institute for AI. One of our biggest endeavors this year is to improve the relevance of our search engine, and my mission beginning at the start of the year was to figure o...
TREC-COVID: Constructing a Pandemic Information Retrieval Test Collection
Ellen M. Voorhees, Tasmeer Alam, Steven Bedrick, Dina Demner-Fushman, William R. Hersh, Kyle Lo, Kirk Roberts, Ian Soboroff, and Lucy Lu Wang
SIGIR Forum  2020
Tl;DR: TREC-COVID is a community evaluation designed to build a test collection that captures the information needs of biomedical researchers using the scientific literature during a pandemic.
SLEDGE: A Simple Yet Effective Baseline for Coronavirus Scientific Knowledge Search
Sean MacAvaney, Arman Cohan, and Nazli Goharian
preprint  2020
Tl;DR: We present SLEDGE, a search system that uses SciBERT to effectively re-rank articles related to SARS-CoV-2. SLEDGE achieves state-of-the-art results on the TREC-COVID Round 1 benchmark.
Abductive Commonsense Reasoning
Chandra Bhagavatula, Ronan Le Bras, Chaitanya Malaviya, Keisuke Sakaguchi, Ari Holtzman, Hannah Rashkin, Doug Downey, Scott Wen-tau Yih, and Yejin Choi
ICLR  2020
Tl;DR: We conceptualize a new task of Abductive NLI and introduce a challenge dataset, ART, that consists of over 20k commonsense narrative contexts and 200k explanations, formulated as multiple choice questions for easy automatic evaluation.
Explanation-Based Tuning of Opaque Machine Learners with Application to Paper Recommendation
Benjamin Charles Germain Lee, Kyle Lo, Doug Downey, and Daniel S. Weld
preprint  2020
Tl;DR: We developed a general approach for actionable explanations, which you can try within our Semantic Sanity prototype. User studies of the approach have shown that it leads to higher perceived user control, trust, and satisfaction.
Citation Text Generation
Kelvin Luu, Rik Koncel-Kedziorski, Kyle Lo, Isabel Cachola, and Noah A. Smith
preprint  2020
Tl;DR: We introduce the task of citation text generation: given a pair of scientific documents, explain their relationship in natural language text in the manner of a citation from one text to the other.
Just Add Functions: A Neural-Symbolic Language Model
David Demeter and Doug Downey
AAAI  2020
Tl;DR: We introduce neural-symbolic language models, which incorporate explicit symbolic functions into a neural language model to improve predictions for words with underlying structure, such as numbers.
Pretrained Language Models for Sequential Sentence Classification
Arman Cohan, Iz Beltagy, Daniel King, Bhavana Dalvi, and Daniel S. Weld
EMNLP  2019
Tl;DR: We present a model based on pretrained language models for classifying sentences in context of other sentences. Achieves SOTA results on 4 datasets on 2 different domains. We also release a challenging dataset of 2K discourse facets in CS domain.
SciBERT: A Pretrained Language Model for Scientific Text
Iz Beltagy, Kyle Lo, and Arman Cohan
EMNLP  2019
Tl;DR: SciBERT is a pretrained language model for scientific text.
ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing
Mark Neumann, Daniel King, Iz Beltagy, and Waleed Ammar
BioNLP  2019
Tl;DR: We created a spaCy pipeline for biomedical and scientific text processing. The core models include dependency parsing, part of speech tagging, and named entity recognition models retrained on general biomedical text, and custom tokenization. We also release four specific named entity recognition mod...
Combining Distant and Direct Supervision for Neural Relation Extraction
Iz Beltagy, Kyle Lo, and Waleed Ammar
NAACL  2019
Tl;DR: We improve relation extraction models by combining the distant supervision data with an additional directly-supervised data, which we use as supervision for the attention weights. We find that joint training on both types of supervision leads to a better model because it improves the model's ability...
Structural Scaffolds for Citation Intent Classification in Scientific Publications
Arman Cohan, Waleed Ammar, Madeleine van Zuylen, and Field Cady
NAACL  2019
Tl;DR: We propose a new scaffolding model for classifying citation intents, using two auxiliary tasks to handle low-resource training data. We additionally propose SciCite, a multi-domain dataset of citation intents.
GrapAL: Querying Semantic Scholar's Literature Graph
Christine Betts, Joanna L. Power, and Waleed Ammar
NAACL  2019
Tl;DR: We introduce GrapAL (Graph database of Academic Literature), a versatile tool for exploring and investigating scientific literature which satisfies a variety of use cases and information needs requested by researchers.
China catching up to US in AI research
Field Cady and Oren Etzioni
blog  2019
Tl;DR: We analyzed over two million academic papers, and found that China has already surpassed the US in published AI papers. If current trends continue, China is poised to overtake the US in the most-cited 50% of papers this year, in the most-cited 10% of papers next year, and in the 1% of most-cited pap...
Quantifying Sex Bias in Clinical Studies at Scale With Automated Data Extraction
Sergey Feldman, Waleed Ammar, Kyle Lo, Elly Trepman, Madeleine van Zuylen, and Oren Etzioni
JAMA  2019
Tl;DR: We extracted counts of women and men from over 40k published clinical trial articles and found substantial underrepresentation of female participants in 7 of 11 disease categories, especially HIV/AIDS, chronic kidney diseases, and cardiovascular diseases.
Construction of the Literature Graph in Semantic Scholar
Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, et al.
NAACL  2018
Tl;DR: This paper introduces the Semantic Scholar literature graph, consisting of more than 280M nodes, representing papers, authors, entities and various interactions between them. [acknowledgements: TAGME entity linker (https://tagme.d4science.org/)]
Extracting Scientific Figures with Distantly Supervised Neural Networks
Noah Siegel, Nicholas Lourie, Russell Power, and Waleed Ammar
JCDL  2018
Tl;DR: In this paper, we induce high-quality training labels for the task of figure extraction in a large number of scientific documents, with no human intervention.
Content-Based Citation Recommendation
Chandra Bhagavatula, Sergey Feldman, Russell Power, and Waleed Ammar
NAACL  2018
Tl;DR: We embed a given query document into a vector space, then use its nearest neighbors as candidates, and rerank the candidates using a discriminative model trained to distinguish between observed and unobserved citations.
Ontology Alignment in the Biomedical Domain Using Entity Definitions and Context
Lucy Lu Wang, Chandra Bhagavatula, Mark Neumann, Kyle Lo, Christopher Wilhelm, and Waleed Ammar
BioNLP  2018
Tl;DR: This ontology matcher can be used to generate alignments between entities in two biomedical ontologies. The matcher uses entity definitions and usage context retrieved from the Semantic Scholar corpus to assist in entity matching.
Does ArXiv help increase citation counts?
Sergey Feldman, Kyle Lo, and Waleed Ammar
preprint  2018
Tl;DR: We explore the degree to which papers prepublished on arXiv garner more citations, in an attempt to paint a sharper picture of fairness issues related to prepublishing. We observe that papers submitted to arXiv before acceptance have, on average, 65% more citations in the following year compared to...
A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications
Dongyeop Kang, Waleed Ammar, Bhavana Dalvi, Madeleine van Zuylen, Sebastian Kohlmeier, Eduard Hovy, and Roy Schwartz
NAACL  2018
Tl;DR: We present the first public dataset of scientific peer reviews available for research purposes, containing 14.7K paper drafts and the corresponding accept/reject decisions in top-tier venues.
Semi-supervised End-to-End Entity and Relation Extraction
Waleed Ammar, Matthew E. Peters, Chandra Bhagavatula, and Russell Power
SemEval  2017
Tl;DR: Our submission to the SemEval 2017 Task 10 (ScienceIE) shared task placed 1st in end-to-end entity and relation extraction and 2nd in relation-only extraction. We find that pretraining forward and backward neural language models produces word representations that can drastically improve model performanc...
Identifying Meaningful Citations
Marco Valenzuela, Vu Ha, and Oren Etzioni
AAAI  2015
Tl;DR: We introduce the novel task of identifying important citations in scholarly literature, i.e., citations that indicate that the cited work is used or extended in the new effort. We believe this task is a crucial component in algorithms that detect and follow research topics and in methods that measur...
Semantic Scholar is a free, AI-powered research tool for scientific literature, based at the Allen Institute for AI.