1. MIDST and IMAGES-M
   Masao Yokota
   Fukuoka Institute of Technology

2. Background & motivation
   Intelligent systems should be more human-friendly in view of:
     Floods of multimedia information
     Increase of highly matured societies
     Development of robots for practical use
     Others
   Solution: Integrated Multimedia Understanding System IMAGES-M

3. IMAGES-M
     Speech Processing Unit (SPU)
     Action Data Processing Unit (APU)
     Text Processing Unit (TPU)
     Picture Processing Unit (PPU)
     Sensory Data Processing Unit (SDPU)
     Knowledge Base (KB)
     Inference Engine (IE)

4. Demonstration of IMAGES-M: Collaboration of TPU and PPU
   (Phase 1) Text to Picture translation
     Input: Text
     Output: Pictorial interpretation
   (Phase 2) Q-A about Picture by Text
     Input: Query Text
     Output: Answer Text

5. Input text (Japanese/English/Chinese):
     The lamp above the chair is small.
     The red pot is 1 m to the left of the chair.
     The big blue box is 3 m to the right of the chair.
   Output picture: (figure)

6. Input picture: (figure)
   Output text:
     The octagon is to the upper right of the triangle.
     The octagon is above the quadrangle.
     The triangle is to the lower left of the octagon.

7. Input sentence: Taro ga kubi wo furu (= Taro shakes his head).
   Output animation: (figure)

8. Cross-reference between picture and text
     Miwadai-dori (美和台通) meets National Route ... (国道...号線).
     They meet at the Shimowajiro intersection (下和白交差点).

9. Integrated Multimedia Understanding based on L_md
   Media: Picture, Animation, Text, Action, Speech, Sensory data
   Intermediate Representation
   Descriptive power and Computability of Meta Language L_md for Intermediate Representation

10. Mental Image Directed Semantic Theory (MIDST), proposed by Yokota, M.
   Information Processing by intelligent entities = Mental Image Processing
   Mental Images:
     Sensory Images = Sensations coded by Sensors
     Conceptual Images = Sensory Images
     processed by Brains (e.g. Word Concepts)

11. Multimedia Description Language L_md
   based on Mental Image Directed Semantic Theory (MIDST)
   Syntax: Many-sorted predicate logic with a special predicate constant L called "Atomic Locus"
   Semantics: Interpretation in association with an omnisensual mental image model called "Loci in attribute spaces"

12. Omnisensual Mental Image Model
   Sensation (= Sensory event) = Spatio-temporal distribution of stimuli
   Coded Sensations: Loci in Attribute Spaces (LOCATION, SHAPE, COLOR)

13. Atomic Locus
   L(x,y,p,q,a,g,k)
   "Matter x causes Attribute a of Matter y to keep or change its value temporally or spatially over a time interval, where the values p and q are relative to Standard k."
   g = Gt: temporal event
   g = Gs: spatial event
   (Figure: locus of Attribute a from value p to value q over the interval [ti, tj], with x and y, drawn once per event type.)

14. Terms of Atomic Locus L(1,2,3,4,5,6,7)
   Term  Type Name        Semantic Role
   1     Matter           Event Causer (EC)
   2     Matter           Attribute Carrier (AC)
   3     Attribute Value  Beginning of Locus
   4     Attribute Value  Ending of Locus
   5     Attribute        Domain of Attribute Value
   6     Event Type       Relation between AC and FAO
   7     Standard         Unit, Origin, Scale, etc. for Values

15. Event types
   (S1) The bus runs from Tokyo to Osaka.
        (∃x,y,k) L(x,y,Tokyo,Osaka,A12,Gt,k) ∧ bus(y)
   (S2) The road runs from Tokyo to Osaka.
        (∃x,y,k) L(x,y,Tokyo,Osaka,A12,Gs,k) ∧ road(y)
   A12: Physical Location
   (Figure: Temporal event and Spatial event between Tokyo and Osaka; labels: FAO, AC.)

16. Attributes
   Table 1 Attributes

17. Standards
   Table 2 Standards
   Categories of standards: Remarks
   Rigid Standard: Objective standards such as those denoted by measuring units (meter, gram, etc.).
   Species Standard: The attribute value ordinary for a species. A "short" train is ordinarily longer than a "long" pencil.
   Proportional Standard: "Oblong" means that the width is greater than the height of a physical object.
   Individual Standard: "Much" money for one person can be too "little" for
   another.
   Purposive Standard: One room large enough for a person's "sleeping" must be too small for his "jogging".
   Declarative Standard: The origin of an order such as "next" must be declared explicitly, just as in "next to him".

18. Tempo-logical connectives
   χ_1 K_i χ_2 ⇔ (χ_1 K χ_2) ∧ τ_i(χ_1, χ_2)
   K_i: tempo-logical connective
   χ_j: locus
   K: binary logical connective
   ∧: AND
   τ_i: temporal relation between loci such as "before", "during", etc.

19. Definition of τ_i
   The durations of χ_1 and χ_2 are [t_11, t_12] and [t_21, t_22], respectively.

20. Conceptualization of sensory events
   L(x,x,p,q,A12,Gt,k) Π L(x,y,p,q,A12,Gt,k) ∧ x≠y ∧ p≠q
   A12: Location
   (Figure labels: Event 1, Event N, Conceptualization, Formalization; axes: A12 (Location), Time; objects x, y.)

21. SAND and CAND
   Image of "x fetches y":
   (∃x,y,p1,p2,k) L(x,x,p1,p2,A12,Gt,k) • (L(x,x,p2,p1,A12,Gt,k) Π L(x,y,p2,p1,A12,Gt,k)) ∧ x≠y ∧ p1≠p2
   Π: Simultaneous AND (SAND)
   •: Consecutive AND (CAND)
   (Figure: x and y plotted on A12 between p1 and p2 against time t, with marks t1, t2, t3.)

A
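
The atomic locus L(x,y,p,q,a,g,k) and the "x fetches y" expression built with SAND and CAND can be illustrated with a small Python sketch. This is not the IMAGES-M implementation. The class name `AtomicLocus`, the explicit interval fields `t_start` and `t_end`, and the helpers `sand`, `cand` and `is_fetch` are hypothetical names introduced here. Reading SAND as "both loci span the same interval" and CAND as "the second locus begins when the first ends" is also an assumption, since the slides do not spell out the underlying temporal relations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicLocus:
    """One atomic locus L(x, y, p, q, a, g, k) over [t_start, t_end]."""
    x: str          # Event Causer (EC)
    y: str          # Attribute Carrier (AC)
    p: str          # attribute value at the beginning of the locus
    q: str          # attribute value at the ending of the locus
    a: str          # attribute, e.g. "A12" (physical location)
    g: str          # event type: "Gt" (temporal) or "Gs" (spatial)
    k: str          # standard the values are relative to
    t_start: float  # assumed: explicit start of the time interval
    t_end: float    # assumed: explicit end of the time interval


def sand(l1: AtomicLocus, l2: AtomicLocus) -> bool:
    """Simultaneous AND (assumed reading): identical time intervals."""
    return l1.t_start == l2.t_start and l1.t_end == l2.t_end


def cand(l1: AtomicLocus, l2: AtomicLocus) -> bool:
    """Consecutive AND (assumed reading): l2 starts exactly when l1 ends."""
    return l1.t_end == l2.t_start


def is_fetch(go: AtomicLocus, back: AtomicLocus, carry: AtomicLocus) -> bool:
    """Check go CAND (back SAND carry) with x != y and p1 != p2."""
    x, y, p1, p2 = go.x, carry.y, go.p, go.q
    loci = (go, back, carry)
    return (
        all(l.a == "A12" and l.g == "Gt" for l in loci)
        and len({l.k for l in loci}) == 1          # one shared standard k
        and go.y == x                              # x moves itself p1 -> p2
        and back.x == x and back.y == x            # x moves itself p2 -> p1
        and carry.x == x                           # x moves y p2 -> p1
        and (back.p, back.q) == (p2, p1)
        and (carry.p, carry.q) == (p2, p1)
        and cand(go, back)
        and sand(back, carry)
        and x != y
        and p1 != p2
    )


# Hypothetical data: Taro goes from home to the shop, then returns with a book.
go = AtomicLocus("Taro", "Taro", "home", "shop", "A12", "Gt", "k", 0.0, 1.0)
back = AtomicLocus("Taro", "Taro", "shop", "home", "A12", "Gt", "k", 1.0, 2.0)
carry = AtomicLocus("Taro", "book", "shop", "home", "A12", "Gt", "k", 1.0, 2.0)
```

Calling `is_fetch(go, back, carry)` on the sample data checks each conjunct of the expression in turn; swapping the last two arguments fails the check because the carried object would then coincide with the causer.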