,11/7/2018,#,单击此处编辑母版标题样式,单击此处编辑母版文本样式,第二级,第三级,第四级,第五级,2018/11/8,#,大数据实时计算,Flink SQL,架,构介绍,技术创新,变革未来,大数据实时计算 Flink SQL 架构介绍技术创新,1,目,录,1,Background,2,Flink,SQL,基本概念,3,Flink,SQL,核心功能,4,Flink,SQL,优化,5,阿里云流计算产品,目 录1Background2Flink SQL 基本概念3,2,Background,Background,3,Alibaba,Blink,阿里巴巴,Blink,团队有,20+,flink,contributor,6,名,committer,,向社区贡,献了数百个Commit,+,=,Apache,Flink,Alibabas,Improvements,Blink,Alibaba,Blink,Blink,Runtime,+ Flink,SQL,=,Alibaba Blink阿里巴巴Blink团队有 20+,4,团队工作,主导制定,Flink,SQL,语义,DynamicTable,2016-2017,Retraction,2016-2017,完,善,Flink,SQL,功能,Agg,Join,Window,2017,跑通全部,TPCH,Query,2018,性能提升,大量的查询优化,2017-2018,资源配置自动化,2018,贡献社区,贡献社区,部分贡献社区,团队工作主导制定 Flink SQL 语义完善 Flink,5,Flink SQL,Flink SQL,6,Why,SQL?,De,c,lar,a,t,i,v,e,One,Query,One,Result,Optimized,Understandable,Stable,U,n,i,f,y,Why SQL?DeclarativeOne Query,7,SQL,不是为流设计的,没有,Retraction,批计算查询返回一个结果并结束,数据是有限的,批处理,流数据是无穷的,流上的查询不断产生结果且不会结束,有对历史数据的修改,(Retraction),流处理,SQL 不是为流设计的没有Retraction批计算查询返回,8,动态表,(,Dynamic,Table,),动态表,(,Dynamic,Table,),:,数据会随着时间变化的表,动态表(Dynamic Table)动态表(Dynamic,9,动态表,+,连续查询,连续查询,(,Continuous,Query,):持续运行的查询,Stream,Stream,连,续,查,询,Stream,Stream,连,续,查,询,连续查询,Stream,动态表 + 连续查询连续查询(Continuous Quer,10,流计算,Retraction,流计算 Retraction,11,流计算,Retraction,流计算 Retraction,12,世界上不需要所谓的,Stream,SQL,标准的,ANSI,SQL,就可以用来定义流计算,世界上不需要所谓的 Stream SQL,13,Flink,SQL,核心功能,DDL,&,DML,UDF/UDTF/UDAF,Window,Agg,Join,Group,Agg,Over,Agg,Flink SQL 核心功能DDL & DMLUDF/UDT,14,Loading,Data,-,定义数据源表,CREATE TABLE,clicks,(,VARCHAR,TI,M,ES,T,A,M,P,VARCHAR,user,c,Ti,m,e url,),WITH,(,type =,kafka,topic =,click_topic,);,SELECT,*,FROM,clicks,user,cTime,url,Mary,12:00:00,./home,Bob,12:00:00,./cart,Mary,12:00:05,./prod?id=1,Loading Data- 定义数据源表VARCHAR,15,S,a,ving,Data,-,定义数据结果表,CREATE TABLE,last_clicks,(,user,c,Ti,m,e url,VARCHAR,TI,M,ES,T,A,M,P,VARCHAR,PRIMARY KEY,(user),),WITH,(,type =,mysql,);,INSERT INTO,last_clicks,SELECT,*,FROM,clicks,Saving Data- 定义数据结果表user cT,16,Multi Output,CREATE,VIEW,taobao_clicks,AS,SELECT,*,FROM,clicks WHERE,url LIKE,%,INSERT,INTO,mysql_result SELECT,*,FROM,taobao_clicks,INSERT,INTO,hbase_result,SELECT,*,FROM,taobao_clicks,CREATE TABLE,mysql_clicks,(,user cTime url,VARCHAR,TI,M,E,S,TA,MP,VARCHAR,PRIMARY KEY,(user),),WITH,(,type,=,mysql,);,CREATE TABLE,hbase_clicks,(,user cTime url,VARCHAR,TIME,S,TAM,P,VARCHAR,PRIMARY KEY,(user),),WITH,(,type,=,hbase,);,Multi OutputCREATE VIEW taoba,17,Group,Aggregate,Mary,1,Mary,2,result,user,cnt,Mary,3,Bob,1,SELECT,user,COUNT(url) as,cnt FROM,clicks,GROUP BY,user,clicks,user,cTime,url,Mary,12:00:00,./home,Bob,12:00:00,./cart,Mary,12:00:05,./prod?id=1,Mary,12:01:45,./prod?id=7,从历史到现在每个用户点击的次数,Group AggregateMary1Mary2resul,18,Window,Aggregate,每小时每个用户点击的次数,result,user,endT,cnt,Mary,13:00:00,3,Bob,13:00:00,1,Bob,14:00:00,1,Liz,14:00:00,2,Bob,13:01:00,./prod?id=4,Liz,13:30:00,./cart,Liz,13:59:00,./home,SELECT,user,T,U,M,B,L,E,_,E,N,D,(,cTime,INTERVAL 1,HOURS),AS endT, COUNT(url) AS,cnt,FROM,clicks,GROUP,BY,user,T,U,M,B,L,E,(,cTime,INTERVAL 1,HOURS),clicks,user,cTime,url,Mary,12:00:00,./home,Bob,12:00:00,./cart,Mary,12:02:00,./prod?id=2,Mary,12:55:00,./home,Window Aggregate每小时每个用户点击的次数re,19,双,流,JOIN,:,支持,INNER,LEFT,RIGHT,FULL,SEMI,ANTI,SELECT,o.orderId, o.productId, o.orderTime,s.shipTim,FROM Order,JOIN,Shipm,ON,o.order,Orders,orderId,productId,orderTime,5,30,10:17:00,6,10,10:17:05,9,10,11:02:00,12,10,11:24:11,Shipments,orderId,shipTime,5,10:55:00,6,10:20:00,9,11:58:00,12,11:44:00,e,s AS,o,ents,AS,s,Id,=,s.orderId,result,orderId,productId,orderTime,shipTime,5,30,10:17:00,10:55:00,6,10,10:17:05,10:20:00,9,10,11:02:00,11:58:00,12,10,11:24:11,11:44:00,双流 JOIN:支持 INNER, LEFT, RIGHT,20,维,表,JOIN,:,支持,INNER,LEFT,CREATE,TABLE,Products,(,productId VARCHAR, productName VARCHAR,price,DECIMAL,PRIMARY KEY (productId),PERIOD FOR,SYSTEM_TIME,) WITH (,type,=,hbase,);,SELECT,o.*,p.*,FROM,Orders,AS,o,JOIN,Products,FOR,SYSTEM_TIME AS OF,PROCTIME(),AS,p,ON,o.productId,=,p.productId,维表 JOIN:支持 INNER, LEFTCREATE T,21,聊几个优化,聊几个优化,22,a,w,a,b,w,b,DataBase,Reduced,Throughput,Wait,for,Response,a,d,DataBase,a,b,c,b,c,d,Send,Request,Receive,Request,W,a,i,t,Concurrent,Processing,Increased,Throughput,Sync.,IO,Async.,IO,异步维,表,JOIN,awabwbDataBaseReduced Throughp,23,异步维,表,JOIN,CREATE TABLE Products,(,productId,VARCHAR, productName VARCHAR, price,DECIMAL,PRIMARY,KEY,(productId), PERIOD FOR,SYSTEM_TIME,),WITH,(,type,=,hbase,async,=,true,);,SELECT,o.*,p.* FROM Orders,AS,o,JOIN Products FOR SYSTEM_TIME,AS OF,PROCTIME(),AS,p,ON o.productId,=,p.productId,一行配置的改动,异步维表 JOINCREATE TABLE Products,24,A,g,g,A,g,g,Map,Map,Map,如何处理数据倾斜,Data-Skew,AggAggMapMapMap如何处理数据倾斜Data-Sk,25,如何处理数据倾斜,Data-Skew,A,g,g,A,g,g,Map,Map,Map,H,o,t,!,!,反压,反压,反压,如何处理数据倾斜Data-SkewAggAggMapMapM,26,如何处理数据倾斜,Data-Skew,Local-Global,Aggregation,优化,如何处理数据倾斜Data-SkewLocal-Global,27,如何处理数据倾斜,Data-Skew,Lo,c,al Agg,Lo,c,al,Agg,Lo,c,al,Agg,Global Agg,G,lobal Agg,A,g,g,A,g,g,Map,Map,Map,Map,Map,Map,Local-Global,Aggregation,Simple,Aggregation,如何处理数据倾斜Data-SkewLocal AggLoc,28,Local-Global,带来,20X,的性能提升,优化前,优化后,Local-Global优化前优化后,29,阿里云流计算产品,阿里云流计算产品,30,大数据实时计算Flink-SQL架构介绍课件,31,大数据实时计算Flink-SQL架构介绍课件,32,谢 谢!,谢 谢!,33,