Linguistics 187: Grammar Engineering
Ron Kaplan, Tracy King, Martin Forst

Administrivia
  Schedule: office hours
  Requirements

Overview

Figure: Applications of Language Engineering. Axes: functionality (low to high) and domain coverage (narrow to broad); labels: shallow, deep. Applications shown: keyword search (Google, Yahoo, Microsoft Live); manually-tagged keyword search; post-search sifting; semantic search (Powerset, Hakia); autonomous knowledge filtering; document base management; restricted dialogue; natural dialogue; Microsoft Paperclip; useful summary; good translation; synthesis.

Grammar engineering for deep processing
  Draws on theoretical linguistics and software engineering
  Theoretical linguistics = papers: generalizations, universality, idealization (competence)
  Software engineering = programs: coverage, interface, QA, maintainability, efficiency, practicality

Grammar engineering
  Grammar : Theory = Program : Programming language
  Reflect linguistic generalizations
  Respect special cases of ordinary language
  Deal with large-scale interactions
  Theory/practice trade-offs

What is a shallow grammar?
  Often trained automatically from marked-up corpora
  Part-of-speech tagging, chunking, trees

POS tagging and chunking
  Part-of-speech tagging (two taggings of the ambiguous sentence "I saw her duck."):
    I/PRP saw/VBD her/PRP duck/VB ./PUNCT      (duck as a verb)
    I/PRP saw/VBD her/PRP$ duck/NN ./PUNCT     (duck as a noun)
  Chunking:
    General chunking: "I begin with an intuition: when I read a sentence, I read it a chunk at a time." (Abney)
    NP chunking: [NP President Obama] visited [NP the Hermitage] in [NP Leningrad]

Treebank grammars
  Phrase structure tree (c-structure)
  Annotations for heads and grammatical functions
  Example: Collins parser output

Deep grammars
  Provide detailed syntactic/semantic analyses
    LFG (ParGram), HPSG (LinGO, Matrix)
    Grammatical functions, tense, number, etc.
  Example: "Mary wants to leave."
    subj(want1, Mary3)
    comp(want1, leave2)
    subj(leave2, Mary3)
    tense(want1, present)
  Usually manually constructed: linguistically motivated rules

Why would you want one?
  Meaning-sensitive
  applications:
    overkill for many NLP applications
    crucial for others
  Applications that use shallow methods for English may not work for free-word-order languages
    In English, many grammatical functions can be read off the tree:
      SUBJ: the NP sister to VP, e.g. (S (NP Mary) (VP left))
      OBJ: the first NP sister to V, e.g. (S (NP Mary) (VP saw (NP John)))
    German, Japanese, etc. need other information

Deep analysis matters if you care about the answer
  Example: "A delegation led by Vice President Philips, head of the chemical division, flew to Chicago a week after the incident."
  Question: Who flew to Chicago?
  Candidate answers:
    division (closest noun), head (next closest), V.P. Philips (next): shallow but wrong
    delegation (furthest away, but the subject of "flew"): deep and right

Search: keywords to natural language
  Suppose you want to know who Obama criticized.
  With shallow keyword search engines:
    Keywords: "Obama criticized"
    Simple to use, but:
      Precision errors: "Hillary and John criticized Barack" is interesting (maybe) but irrelevant
      Recall errors: what about "denounce", "condemn"?
    Advanced search is more expressive, but complex and unused: "Obama" (criticize OR condemn OR ...)
    Engines compensate with the web graph and other ranking features
  Example queries: "Who did Obama criticize?" vs. "Who criticized Obama?"

Figure: semantic representation of "Sir Edward Heath died from pneumonia.": die (verb) is linked to Sir Edward Heath (name) by subj and to pneumonia (noun) by from. Related entities and relations shown: Sir Edward Heath (noun): UK Prime Minister, politician; pneumonia: disease; die from: killed (by).

Semantic search (Powerset)
  Parses each sentence on the page
  Extracts entities and semantic relationships
  Identifies and expands to similar entities, relationships and abstractions
  Indexes multiple facts for each sentence

Mapping queries to content
  Queries that map to the same content: Edward Heath's death; death of Edward Heath; disease that killed Edward Heath; diseases that killed politicians; politicians who died from disease; politicians that died from pneumonia; politicians killed by pneumonia; who died from pneumonia; what politicians died from disease; which politician died from pneumonia; what disease did Edward Heath die from; what killed Sir Edward Heath; what
  was Sir Edward Heath killed by
  Content: "Sir Edward Heath died from pneumonia at 19:30 on 17 July 2005."

Figure: semantic search architecture. Content acquisition: open-text content, XLE parse, semantic map, indexing, large-scale semantic index (content semantics). User search: NL questions, XLE parse, semantic map, query (question semantics), retrieval, ranking, result presentation. Knowledge resources: LFG grammar (acquisition: manual + ML). Example question: "Who did IBM acquire in the last 10 years?"; documents: Doc1: "IBM purchased Lotus in 1998." Doc2: "List of IBM purchases".

Traditional problems
  Time-consuming and expensive to write
  Robustness: real-world applications want output for any input
  Ambiguity
  Efficiency
  Interfaces to other application components

Why deep analysis is difficult
  Languages are hard to describe:
    Meaning depends on complex properties of words and sequences
    Different languages rely on different properties
    Errors and disfluencies
  Languages are hard to compute:
    Expensive to recognize complex patterns
    Sentences are ambiguous
    Ambiguities multiply: explosion in time a