按一下以編輯母片標題樣式,按一下以編輯母片,第二層,第三層,第四層,第五層,*,Application of Support Vector Machine to detect an association between a disease or trait and multiple SNP variations,Author:Gene Kim,MyungHo Kim,Advisor:Dr.Hsu,Graduate:Ching-Wen Hong,Application of Support Vector,1,Outline,1.Motivation,2.Objective,3.Whats SNP(single nucleotide polymorphism),4.How to find SNP variations,5.A review of Support Vector Machine,6.A representation of multiple SNP variations as a vector,7.The marks,8.Inseparable Case,9.Test results with clinical data,10.,Personal opinion,Outline1.Motivation,2,Motivation,研究每個人的單一核甘酸多型性(SNP)的差異,可以幫助了解致病基因,甚至預測藥物對個人是否具有療效,進一步設計量身訂做藥物,對新藥的開發有極大的影響。,SNP,的研究是後基因時代生技產業發展的主要趨勢。,Motivation研究每個人的單一核甘酸多型性(SNP,3,Objective,We can present a method of detecting whether there is an association between multiple SNP variations and a trait or disease.,The method exploits the Support Vector Machine(SVM)which has been attracting lots of attentions recently.,ObjectiveWe can present a meth,4,Whats SNP,何謂SNP(單一核甘酸多型性),雖然同種生物其染色體差異極小,但平均1000,個鹼基對(base pair)就有一個發生突變,這些變異稱為SNP,是造成每個人對藥物的敏感性不同,、血型不同、身高 等等的原因,。,此外,SNP,也和癌症,、心血管疾病、自體免疫等等疾病有關。目前國內賽亞基因和台大醫院合作,正從事C型肝炎SNP研究,試圖找出病患的SNP,以預測藥物是否對病人有效。,Whats SNP何謂SNP(單一核甘酸多型性),5,Whats SNP,A genetic marker is M1,M2,in the DNA,The different variants of DNA that different people have at the marker are alleles,denoted by 1,2,3.,The number of alleles per marker is small:typically less than ten(for called microsatellite marker)or exactly two(for called SNPs).,Whats SNP A genetic marker is,6,How to find SNP variations,The problem of determining whether a set of SNP variation cause a specific disease or trait could be formulated as follows.For a given disease or trait,1.For each set of SNP variations,find its representation as a vector in a Euclidean space.(haplotype data,clinical data,.we will discuss this in the page9),2.Get a systematic way of distinguishing SNP genotype of normal people from ones of people with the disease or trait.,We will use the Support Vector Machine(SVM)to separate SNP vectors into two groups(normal,sick).,How to find SNP variationsThe,7,A review of Support Vector Machine,What is a SVM?,a family of learning algorithm for classification of objects into two classes.,Input:a training set,(x,1,y,1,),(x,l,y,l,)of object x,i,E(n-dim vector space)and their known classes y,i,E-1,+1.,Output:a classifier f:,-1,+1.which predicts the class f(x)for any(new)object x E,A review of Support Vector M,8,A review of Support Vector Machine,(1).Linear SVM for separable training sets:,a training set S=,(x,1,y,1,),(x,l,y,l,),xiE,yi E-1,+1.,A review of Support Vector M,9,A review of Support Vector Machine,The optimal hyperplane is defined by the pair(w,b).,Solve the linear program problem,Min,w,st.,y,i,(,x,i,w+b)-10 ,i=1,l,This is a class quadratic(convex)program,A review of Support Vector M,10,A review of Support Vector Machine,(2).Linear SVM for non-separable training sets,Solve the linear program problem,Min,w,+C,(,i,),c is a extreme large value,S.t.,y,i,(,x,i,w+b)-1+,i,0 ,i,0,0,ic,i=1,l,A review of Support Vector M,11,A representation of multiple SNP variations as a vector,Scheme,Given each disease or trait,and a collection of SNP data which depending on genotype in a consistent way.(haplotype,clinical data):7 step,1,.,Assume that there is no environmental factor.,2.SNP locations are assumed to be know for the disease or trait.,3.Assume there is a reference SNP data.(good health records),4.By giving scores based on difference from the reference data,assign a vector to each SNP data.,A representation of multiple S,12,A representation of multiple SNP variations as a vector,The dimension of vector is the number of SNPs to the related disease or trait.,5.A training set is chosen for the disease or trait,in other words,SNP genotype data of normal and sick population.,6.By using Step 4,compute the SNP vectors of the training data set(xi,yi),xi is a SNP data,yi=1(sick)or -1(normal),7.Use the SVM to get a hyperplane dividing into two groups(sick,normal),A representation of multiple S,13,The remarks,1.The reference data can be built by collecting SNP genotypes from the healthy normal population.,2.The hyperplane obatined can be considered as acriterion,and,given a new data set,it can be used for testing whether the person of the data is susceptible to the disease or trait.,3.Representation of an object as a vector might be critical for making use the SVM.How to make domain knowledge contained in vector representations is one of the major issues.,4.The idea of difference scoring could be applied to other data sets(visual data such as X-ray or MRI image,),in particular,to haplotype data and to find out a linkage among SNP to the disease or trait.,5.Once a group of SNP patterns are identified,it can compute contribution score of each of those SNP to the disease or trait.,The remarks1.The reference da,14,Inseparable Case,For the inseparable case,the iterated use of SVM enables us to divide a collection of labe