OSKI: A Library of Automatically Tuned Sparse Matrix Kernels
Richard Vuduc (LLNL), James Demmel, Katherine Yelick
Berkeley Benchmarking and OPtimization (BeBOP) Project
EECS Department, University of California, Berkeley
SIAM CSE, February 12, 2005

OSKI: Optimized Sparse Kernel Interface
- Sparse kernels tuned for user's matrix & machine
  - Hides complexity of run-time tuning
  - Low-level BLAS-style functionality: sparse matrix-vector multiply (SpMV), triangular solve (TrSV)
  - Includes fast locality-aware kernels: A^T*A*x
  - Initial target: cache-based superscalar uniprocessors
- Faster than standard implementations
  - Up to 4x faster SpMV, 1.8x TrSV, 4x A^T*A*x
- For "advanced" users & solver library writers
  - Available as stand-alone open-source library (pre-release)
  - PETSc extension in progress
  - Written in C (can call from Fortran)

Motivation: The Difficulty of Tuning
- n = 21216
- nnz = 1.5 M
- Kernel: SpMV
- Source: NASA structural analysis problem
- 8x8 dense substructure

Speedups on Itanium 2: The Need for Search
[Figure: SpMV performance on raefsky3, in Mflop/s. Labeled points: Reference; Best: 4x2.]

How OSKI Tunes (Overview)
[Diagram; recovered labels:]
- Library Install-Time (offline)
  1. Build for target arch.
  2. Benchmark
  - Produces: generated code variants, benchmark data
- Application Run-Time
  - Inputs: matrix, workload from program monitoring, history, heuristic models
  1. Evaluate models
  2. Select data struct. & code
  - To user: matrix handle for kernel calls
- Extensibility: advanced users may write & dynamically add "code variants" and "heuristic models" to the system.

Cost of Tuning
- Non-trivial run-time tuning cost: up to ~40 mat-vecs
  - Dominated by conversion time
- Design point: user calls "tune" routine explicitly
  - Exposes cost
  - Tuning time limited using estimated workload
    - Provided by user or inferred by library
- User may save tuning results
  - To apply on future runs with similar matrix
  - Stored in "human-readable" format

How to Call OSKI: Basic Usage
- May gradually migrate existing apps
  - Step 1: "Wrap" existing data structures
  - Step 2: Make BLAS-like kernel calls

Existing application code:

    int *ptr = ..., *ind = ...; double *val = ...;  /* Matrix in CSR format */
    double *x = ..., *y = ...;  /* Let x and y be two dense vectors */

    /* Compute y = y + A*x, 500 times */
    for (i = 0; i < 500; i++)
        my_matmult(ptr, ind, val, x, b, y);

After Step 1 and Step 2:

    int *ptr = ..., *ind = ...; double *val = ...;  /* Matrix in CSR format */
    double *x = ..., *y = ...;  /* Let x and y be two dense vectors */

    /* Step 1: Create OSKI wrappers around this data */
    oski_matrix_t A_tunable = oski_CreateMatCSR(ptr, ind, val, num_rows, num_cols,
                                                SHARE_INPUTMAT, ...);
    oski_vecview_t x_view = oski_CreateVecView(x, num_cols, UNIT_STRIDE);
    oski_vecview_t y_view = oski_CreateVecView(y, num_rows, UNIT_STRIDE);

    /* Compute y = y + A*x, 500 times */
    for (i = 0; i < 500; i++)
        oski_MatMult(A_tunable, OP_NORMAL, x_view, y_view);  /* Step 2 */

How to Call OSKI: Tune with Explicit Hints
- User calls "tune" routine
- May provide explicit tuning hints (OPTIONAL)

    oski_matrix_t A_tunable = oski_CreateMatCSR(...);
    /* ... */

    /* Tell OSKI we will call SpMV 500 times (workload hint) */
    oski_SetHintMatMult(A_tunable, OP_NORMAL, x_view, y_view, 500);
    /* Tell OSKI we think the matrix has 8x8 blocks (structural hint) */
    oski_SetHint(A_tunable, HINT_SINGLE_BLOCKSIZE, 8, 8);

    oski_TuneMat(A_tunable);  /* Ask OSKI to tune */

    for (i = 0; i < 500; i++)
        oski_MatMult(A_tunable, OP_NORMAL, x_view, y_view);

How the User Calls OSKI: Implicit Tuning
- Ask library to infer workload
  - Library profiles all kernel calls
  - May periodically re-tune

    oski_matrix_t A_tunable = oski_CreateMatCSR(...);
    /* ... */

    for (i = 0; i < 500; i++)
        oski_MatMult(A_tunable, OP_NORMAL, x_view, y_view);
    oski_TuneMat(A_tunable);  /* Ask OSKI to tune */

Additional Features
- Embedded scripting language for selecting customized, complex transformations
  - Mechanism to save/restore transformations

    # In file "my_xform.txt"
    # Compute Afast = P*A*P^T using Pinar's reordering algorithm
    A_fast, P = reorder_TSP(InputMat);
    # Split Afast = A1 + A2, where A1 in 2x2 block format, A2 in CSR
    A1, A2 = A_fast.extract_blocks(2, 2);
    return transpose(P)*(A1+A2)*P;

    /* In "my_app.c" */
    fp = fopen("my_xform.txt", "rt");
    fgets(buffer, BUFSIZE, fp);
    oski_ApplyMatTransform(A_tunable, buffer);
    oski_MatMult(A_tunable, ...);

Additional Features
- GNU AutoTools (autoconf) based install
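
The basic-usage slides call an existing routine, my_matmult, without showing its body. As a rough illustration only, the sketch below shows what an untuned reference CSR kernel computing y = y + A*x could look like. The name csr_matmult, its argument order, and the 0-based indexing are assumptions made for this example; they are not part of OSKI or of the original slides.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical reference kernel (not an OSKI routine): y <- y + A*x,
 * where A is stored in compressed sparse row (CSR) format.
 *
 * Assumptions for this sketch:
 *   - indices are 0-based;
 *   - ptr has num_rows + 1 entries, and row i owns the nonzeros
 *     val[ptr[i]] .. val[ptr[i+1] - 1], with column indices in ind.
 */
void csr_matmult(int num_rows, const int *ptr, const int *ind,
                 const double *val, const double *x, double *y)
{
    assert(ptr != NULL && ind != NULL && val != NULL);
    assert(x != NULL && y != NULL);

    for (int i = 0; i < num_rows; i++) {
        double sum = y[i];                 /* accumulate onto existing y[i] */
        for (int k = ptr[i]; k < ptr[i + 1]; k++)
            sum += val[k] * x[ind[k]];     /* indirect access into x */
        y[i] = sum;
    }
}
```

A kernel of this shape touches x through the index array ind on every nonzero, which is the kind of access pattern that blocked formats such as the 4x2 and 8x8 cases mentioned in the slides aim to improve.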