Hinrich Schütze Chair of Computational Linguistics University of Munich (LMU) Oettingenstr 67 80538 Muenchen, Germany hinrich@hotmail.com
For which objects?
For which objectives?
Abstract: Natural language input in deep learning is commonly represented as embeddings. While embeddings are widely used, fundamental questions about the nature and purpose of embeddings remain. Drawing on traditional computational linguistics as well as parallels between language and vision, I will address two of these questions in this talk. (1) Which linguistic units should be represented as embeddings? (2) What are we trying to achieve using embeddings and how do we measure success?
Hinrich Schütze's Biography
Hinrich Schütze is best known for co-authoring the standard reference book on statistical natural language processing. His book Introduction to Information Retrieval, co-authored with Chris Manning and Prabhakar Raghavan, was published in 2008 and has already been adopted by many IR courses throughout the world. Dr. Schütze obtained his Ph.D. from Stanford University and has worked for a number of Silicon Valley companies, including two large search engines and several text mining startups.
Ido Dagan Natural Language Processing Lab Department of Computer Science Bar Ilan University Ramat Gan, 52900, Israel dagan@cs.biu.ac.il
Abstract:
How can we capture the information expressed in large amounts of text? And how can we allow people, as well as computer applications, to easily explore it? When comparing textual knowledge to formal knowledge representation (KR) paradigms, two prominent differences arise. First, typical KR paradigms rely on pre-specified vocabularies, which are limited in their scope, while natural language is inherently open. Second, in a formal knowledge base each fact is encoded in a single canonical manner, while in multiple texts a fact may be repeated with some redundant, complementary or even contradictory information.
In this talk I will outline a new research direction, which we term Natural Language Knowledge Graphs (NLKG), that aims to represent textual information in a consolidated manner, based on the available natural language vocabulary and structure. I will first suggest some plausible requirements that such graphs should satisfy in order to allow effective communication of the encoded knowledge. Then, I will describe our current specification for NLKG structure, motivated by a use case of representing multiple tweets describing an event. Our structure merges individual proposition extractions, created in an Open-IE flavor, into a representation of consolidated entities and propositions, adapting the spirit of formal knowledge graphs. Different mentions of entities and propositions are organized into entailment graphs, which allow tracing the inference relationships between these mentions. Finally, I will review some concrete research components, including a proposition extraction tool and lexical inference methods, and will illustrate the potential application of NLKGs for text exploration.
Ido Dagan's Biography
Ido Dagan is a Professor at the Department of Computer Science at Bar-Ilan University, Israel, and a Fellow of the Association for Computational Linguistics (ACL). His interests are in applied semantic processing, focusing on textual inference and natural-language-based knowledge representation and acquisition. Dagan and colleagues established the textual entailment recognition paradigm. He was the President of the ACL in 2010 and served on its Executive Committee during 2008-2011. In that capacity, he led the establishment of the Transactions of the Association for Computational Linguistics. Dagan received his B.A. summa cum laude and his Ph.D. (1992) in Computer Science from the Technion. He was a research fellow at the IBM Haifa Scientific Center (1991) and a Member of Technical Staff at AT&T Bell Laboratories (1992-1994). During 1998-2003 he was co-founder and CTO of FocusEngine and VP of Technology of LingoMotors.
Elmar Nöth Friedrich-Alexander-Universität Speech Processing and Understanding group Martensstraße 3 91058 Erlangen Germany elmar.noeth@fau.de
Abstract:
Alzheimer’s disease (AD) is the most common neurodegenerative disorder. It generally deteriorates memory function, then language, then executive function to the point where simple activities of daily living (ADLs) become difficult (e.g. taking medicine or turning off a stove). Parkinson’s disease (PD) is the second most common neurodegenerative disease, also primarily affecting individuals of advanced age. Its cardinal symptoms include akinesia, tremor, rigidity, and postural imbalance. Together, AD and PD afflict approximately 55 million people, and there is no cure. Currently, professional or informal caregivers look after these individuals, either at home or in long-term care facilities. Caregiving is already a great, expensive burden on the system, but things will soon become far worse. Populations of many nations are aging rapidly and, with over 12% of people above the age of 65 having either AD or PD, incidence rates are set to triple over the next few decades.
Monitoring and assessment are vital, but current models are unsustainable. Patients need to be monitored regularly (e.g. to check if medication needs to be updated), which is expensive, time-consuming, and especially difficult when travelling to the closest neurologist is unrealistic. Using non-intrusive sensors to collect speech, gait, and handwriting data from patients during ADLs can help to reduce the burden.
In this talk I will report on the results of the workshop on "Remote Monitoring of Neurodegeneration through Speech", which was part of the “Third Frederick Jelinek Memorial Summer Workshop”.