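Chapter 3 Regression Analysis

Legal Notice: The copyright belongs to Professor Han Zhijun of Nanjing University of Science and Technology. This material may not be copied without his written consent; violations will be prosecuted.

2004.10

Contents of this chapter:
3.1 Overview of regression analysis
3.2 Simple (one-variable) regression analysis: method and case study

3.1 Overview

3.1.1 General formulation (mathematical model)

Let $x$ be the independent variable (a non-random variable), $y$ the dependent variable (a random variable), and $\varepsilon$ an unobservable random error:

$$y = f(x) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$

Here $x$ and $y$ are a pair of correlated variables. For example, $x$ is a father's height and $y$ is his son's height. Regression analysis studies the statistical law that relates such correlated variables.

3.1.2 Types of regression analysis

Linear regression: $f(x) = \alpha + \beta x$.
Nonlinear regression: converted to a linear regression by a change of variables, then studied as one.
Simple regression: one independent variable.
Multiple regression: several independent variables.

3.1.3 Methods of regression analysis

1. Obtain data $(x_i, y_i)$, $i = 1, 2, \ldots, n$, with $n \ge 20$.
2. Regression equation: estimate the parameters by the least-squares principle and build the regression equation $\hat{y} = a + bx$.
3. Significance test of the regression equation: correlation analysis or analysis of variance.
4. Application: prediction and control.

3.2 Simple linear regression

3.2.1 Data collection

Collect $(x_i, y_i)$, $i = 1, 2, \ldots, n$, assumed to follow

$$y_i = \alpha + \beta x_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$

3.2.2 Scatter plot (scatter diagram, correlation plot)

[Figure: four scatter plots of y against x showing positive correlation, negative correlation, no correlation, and curvilinear correlation]

3.2.3 Building the regression equation: the least-squares method

Choose $a$ and $b$ to minimize the sum of squared vertical distances between each observation $(x_i, y_i)$ and the line $\hat{y} = a + bx$:

$$Q(a, b) = \sum_{i=1}^{n} \left[ y_i - (a + b x_i) \right]^2 = \min$$

[Figure: an observation (x_i, y_i) and its fitted value a + b x_i on the line ŷ = a + bx]

For brevity, introduce the following notation:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

$$L_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\Big(\sum_{i=1}^{n} x_i\Big)^2$$

$$L_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\Big(\sum_{i=1}^{n} y_i\Big)^2$$

$$L_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\Big(\sum_{i=1}^{n} x_i\Big)\Big(\sum_{i=1}^{n} y_i\Big)$$

Setting $\partial Q/\partial a = 0$ and $\partial Q/\partial b = 0$ then gives:

$$b = \frac{L_{xy}}{L_{xx}}, \qquad a = \bar{y} - b\bar{x}$$

Case

An industrial psychologist recorded the IQ and labor productivity of 10 workers. The results are listed below. Compute the correlation coefficient $r$ between IQ and labor productivity and test $r$ for significance ($\alpha = 0.05$).

No.   IQ x    Productivity y (pcs/h)   x_i^2     y_i^2    x_i y_i
1     110     5.2                      12100     27.04    572.0
2     120     6.0                      14400     36.00    720.0
3     130     6.3                      16900     39.69    819.0
4     126     5.7                      15876     32.49    718.2
5     122     4.8                      14884     23.04    585.6
6     121     4.2                      14641     17.64    508.2
7     103     3.0                      10609      9.00    309.0
8      98     2.9                       9604      8.41    284.2
9      80     2.7                       6400      7.29    216.0
10     97     3.2                       9409     10.24    310.4
Sum   1107    44.0                     124823    210.84   5042.6
Mean  110.7   4.4                      -         -        -

Fitted values $\hat{y}_i = -3.95 + 0.0754\,x_i$ and residuals $e_i = y_i - \hat{y}_i$:

i     x      y      ŷ       e
1     110    5.2    4.34     0.86
2     120    6.0    5.10     0.90
3     130    6.3    5.85     0.45
4     126    5.7    5.55     0.15
5     122    4.8    5.25    -0.45
6     121    4.2    5.17    -0.97
7     103    3.0    3.82    -0.82
8      98    2.9    3.44    -0.54
9      80    2.7    2.08     0.62
10     97    3.2    3.36    -0.16

Minitab data: see "智商.mtw".

[Figure: scatter plot of labor productivity y against IQ x]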
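As a cross-check on the least-squares formulas and the residual table, the minimal Python sketch below uses only the standard library. It recomputes $L_{xx}$, $L_{xy}$, $L_{yy}$, $b$, $a$, $r$ and the residuals from the case data. The function name `least_squares` is a hypothetical helper for illustration and does not come from the original material. The script does not round the coefficients before computing fitted values, so some $\hat{y}_i$ and $e_i$ may differ from the table in the last digit.

```python
import math

# Case data: IQ x and labor productivity y (pcs/h) of 10 workers
x = [110, 120, 130, 126, 122, 121, 103, 98, 80, 97]
y = [5.2, 6.0, 6.3, 5.7, 4.8, 4.2, 3.0, 2.9, 2.7, 3.2]


def least_squares(x, y):
    """Fit y = a + b*x by least squares; return (a, b, L_xx, L_xy, L_yy)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    # Shortcut forms: sum of products minus (product of sums) / n
    lxx = sum(xi * xi for xi in x) - sx * sx / n
    lyy = sum(yi * yi for yi in y) - sy * sy / n
    lxy = sum(xi * yi for xi, yi in zip(x, y)) - sx * sy / n
    b = lxy / lxx                # slope b = L_xy / L_xx
    a = sy / n - b * sx / n      # intercept a = ybar - b * xbar
    return a, b, lxx, lxy, lyy


a, b, lxx, lxy, lyy = least_squares(x, y)
r = lxy / math.sqrt(lxx * lyy)   # sample correlation coefficient

print(f"L_xx = {lxx:.1f}, L_xy = {lxy:.1f}, L_yy = {lyy:.2f}")
print(f"b = {b:.5f}, a = {a:.3f}, r = {r:.3f}")

# Fitted values yhat_i = a + b*x_i and residuals e_i = y_i - yhat_i
for i, (xi, yi) in enumerate(zip(x, y), start=1):
    yhat = a + b * xi
    print(f"{i:2d} {xi:4d} {yi:4.1f} {yhat:5.2f} {yi - yhat:6.2f}")
```

With these data the unrounded slope and intercept print as 0.07541 and -3.948, the same coefficients Minitab reports.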
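Solution

$$L_{xy} = 5042.6 - \frac{1107 \times 44.0}{10} = 171.8$$
$$L_{xx} = 124823 - \frac{1107^2}{10} = 2278.1$$
$$L_{yy} = 210.84 - \frac{44.0^2}{10} = 17.24$$
$$b = \frac{171.8}{2278.1} \approx 0.0754, \qquad a = 4.4 - 0.0754 \times 110.7 \approx -3.95$$
$$\hat{y} = -3.95 + 0.0754\,x$$

3.2.4 Significance test of the regression

$H_0: \beta = 0, \qquad H_1: \beta \neq 0$

(1) Correlation analysis

$$r = \frac{L_{xy}}{\sqrt{L_{xx} L_{yy}}} = \frac{171.8}{\sqrt{2278.1 \times 17.24}} \approx 0.867$$

From the table of critical values for $f = n - 2 = 8$: $r_{0.01} = 0.765$, $r_{0.05} = 0.632$. Since $r = 0.867 > r_{0.01}$, $x$ and $y$ are strongly linearly correlated (highly significant).

(2) Analysis of variance

Total sum of squares: $S_T = L_{yy} = 17.24$, with $f_T = n - 1 = 9$.
Regression sum of squares: $S_R = b^2 L_{xx} = b L_{xy} \approx 0.0754 \times 171.8 \approx 12.95$, with $f_R = 1$ (simple linear regression).
Error sum of squares (residual sum of squares): $S_e = S_T - S_R = 17.24 - 12.95 = 4.29$, with $f_e = f_T - f_R = 9 - 1 = 8$.

MINITAB output:

The regression equation is
Y = -3.95 + 0.0754 X

Predictor    Coef      SE Coef    T       P
Constant     -3.948    1.713      -2.31   0.050
X            0.07541   0.01533     4.92   0.001

S = 0.731772   R-Sq = 75.2%   R-Sq(adj) = 72.0%

Analysis of Variance
Source           DF   SS       MS       F       P
Regression        1   12.956   12.956   24.19   0.001
Residual Error    8    4.284    0.535
Total             9   17.240

Since $F = 24.19 > F_{0.01}(1, 8) = 11.26$, reject the null hypothesis: the linear relationship is highly significant.

ANOVA table (hand calculation):

Source   S               f    V               F
x        S_R = 12.95     1    12.95           F = 12.95 / 0.536 = 24.2 (> F_0.01(1, 8) = 11.26)
e        S_e = 4.29      8    V_e = 0.536
T        S_T = 17.24     9

[Figure: normal probability plot of the residuals]

[Figure: residuals versus observation order, checking that the residuals vary randomly]

3.2.5 Prediction

General formulation: for a given value $x = x_0$ and confidence level $1 - \alpha = 0.95$, predict the range of values of the dependent variable $y$. This range is the prediction interval.

Computing the prediction interval:
Center: $\hat{y}_0 = a + b x_0$
Half-width (radius):

$$d = \sqrt{F_{\alpha}(1, n-2)\, V_e \left[ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{L_{xx}} \right]}$$

This is equivalent to $d = t_{\alpha/2}(n-2)\sqrt{V_e \left[ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{L_{xx}} \right]}$, because $t_{\alpha/2}^2(n-2) = F_{\alpha}(1, n-2)$.

Case

When the IQ is $x_0 = 115$, predict labor productivity $y$ at confidence level $1 - \alpha = 0.95$.

Solution: let the prediction interval be $a + b x_0 \pm d$.
Center: $a + b x_0 = -3.95 + 0.0754 \times 115 \approx 4.72$
Half-width: $d = \sqrt{5.32 \times 0.536 \times \left[ 1 + 0.1 + \frac{(115 - 110.7)^2}{2278.1} \right]} \approx 1.78$, where $F_{0.05}(1, 8) = 5.32$
Prediction interval: $4.72 \pm 1.78 = (2.94,\ 6.50)$

Notes

Do not extrapolate carelessly beyond the range of the data used to build the equation. Outside that range the prediction error becomes large.
The wider the range of the $x$ values, the more precise the equation. A larger $L_{xx}$ reduces the variance of $b$, which is $\sigma^2 / L_{xx}$.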
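The analysis of variance and the prediction interval at $x_0 = 115$ can be reproduced with the short Python sketch below, which uses only the standard library. The critical values $t_{0.025}(8) = 2.306$ and $F_{0.01}(1, 8) = 11.26$ are hard-coded from standard tables, an assumption that removes the need for SciPy. The helper names `regression_anova` and `prediction_interval` are hypothetical. Because $t_{0.025}^2(8) = F_{0.05}(1, 8) \approx 5.32$, using the t value gives the same half-width as the F form. Intermediate values are not rounded, so the results track the Minitab output and may differ from the hand calculation in the last digit.

```python
import math

# Case data: IQ x and labor productivity y (pcs/h)
x = [110, 120, 130, 126, 122, 121, 103, 98, 80, 97]
y = [5.2, 6.0, 6.3, 5.7, 4.8, 4.2, 3.0, 2.9, 2.7, 3.2]

# Critical values copied from standard tables for n - 2 = 8 (assumed inputs)
T_025_8 = 2.306    # t_{0.025}(8), two-sided 95% level
F_01_1_8 = 11.26   # F_{0.01}(1, 8)


def regression_anova(x, y):
    """Least-squares fit plus the ANOVA decomposition S_T = S_R + S_e."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    lxx = sum((xi - mx) ** 2 for xi in x)
    lxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    lyy = sum((yi - my) ** 2 for yi in y)
    b = lxy / lxx
    a = my - b * mx
    s_t = lyy              # total SS, f_T = n - 1
    s_reg = b * lxy        # regression SS, f_R = 1
    s_e = s_t - s_reg      # residual SS, f_e = n - 2
    v_e = s_e / (n - 2)    # residual mean square V_e
    f_stat = s_reg / v_e   # F = (S_R / 1) / V_e
    return {"n": n, "mx": mx, "lxx": lxx, "a": a, "b": b,
            "s_reg": s_reg, "s_e": s_e, "v_e": v_e, "f": f_stat}


def prediction_interval(fit, x0, t_crit):
    """Return (center, half_width) of the prediction interval for a new y at x0."""
    center = fit["a"] + fit["b"] * x0
    spread = 1 + 1 / fit["n"] + (x0 - fit["mx"]) ** 2 / fit["lxx"]
    d = t_crit * math.sqrt(fit["v_e"] * spread)
    return center, d


fit = regression_anova(x, y)
print(f"S_R = {fit['s_reg']:.3f}, S_e = {fit['s_e']:.3f}, "
      f"V_e = {fit['v_e']:.3f}, F = {fit['f']:.2f}")
print("reject H0" if fit["f"] > F_01_1_8 else "fail to reject H0")

center, d = prediction_interval(fit, 115, T_025_8)
print(f"yhat0 = {center:.2f}, interval = ({center - d:.2f}, {center + d:.2f})")
```

For $x_0 = 115$ the script gives a center of about 4.72 and a half-width of about 1.78, printing the interval (2.95, 6.50). The lower endpoint differs from the hand result 2.94 only through rounding.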