CONTEXT-SENSITIVE POINTER ANALYSIS USING BINARY DECISION DIA
更新时间:2023-04-25 01:37:01 阅读量: 实用文档 文档下载
- contest推荐度:
- 相关推荐
CONTEXT-SENSITIVE POINTER ANALYSIS
USING BINARY DECISION DIAGRAMS
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
John Whaley
March2007
c Copyright by John Whaley2007
All Rights Reserved
ii
I certify that I have read this dissertation and that,in my opinion,it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
(David L.Dill)
I certify that I have read this dissertation and that,in my opinion,it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
iv
Abstract
This thesis shows that whole-program context-sensitive inclusion-based pointer anal-ysis,a previously intractable problem,can be e?ciently solved using binary decision diagrams.In addition,we show that it is possible to automatically translate from a high-level analysis speci?cation written in Datalog into an e?cient implementation using binary decision diagrams.
We present the?rst scalable context-sensitive,inclusion-based pointer analysis for Java programs.Our approach to context sensitivity is to create a clone of a method for every context of interest,and run a context-insensitive algorithm over the expanded call graph to get context-sensitive results.For precision,we generate a clone for every acyclic path through a program’s call graph,treating methods in a strongly connected component as a single node.Normally,this formulation is hopelessly intractable as a call graph often has1014acyclic paths or more.We show that these exponential relations can be computed e?ciently using BDDs.
We also describe bddbddb,a BDD-Based Deductive DataBase,which implements the declarative language Datalog with strati?ed negation,totally-ordered?nite do-mains and comparison operators.bddbddb uses binary decision diagrams(BDDs)to e?ciently represent large relations.BDD operations take time proportional to the size of the data structure,not the number of tuples in a relation,which leads to fast execution times.bddbddb is an e?ective tool for implementing a large class of program analyses.We show that a context-insensitive pointer analysis implemented with bddbddb is about twice as fast as a carefully hand-tuned version.
v
Preface
When I’m working on a problem,I never think about beauty.I think only
how to solve the problem.But when I have?nished,if the solution is not
beautiful,I know it is wrong.
R.Buckminster Fuller(1895–1983)
Shortly after arriving at Stanford,my adviser Monica Lam challenged me to “solve”pointer analysis.At that time,the only pointer analysis that could success-fully scale to large programs was an imprecise equivalence-based pointer analysis.I played around with various techniques to improve performance and scalability,achiev-ing some moderate success but a breakthrough remained elusive.The best I was able to achieve was selective context sensitivity,where a client analysis could selectively invoke context sensitivity to improve precision on parts of the program where it would make a di?erence.Interesting,but not totally satisfying,and the algorithm was so di?cult few people could understand it.
In2003,I read a paper by Berndl et al.about implementing a context-insensitive pointer analysis using binary decision diagrams.What made the paper interesting was not the experimental results—the BDD implementation was actually slower than a more traditional implementation—but the fact that the algorithm formulation and implementation was greatly simpli?ed by using BDDs.
I implemented Berndl’s algorithm and began to experiment with it.I soon realized the key to its performance was in its e?cient data representation.At that point,I realized I had been on the wrong path.I had been trying to come up with more and more complicated algorithms to solve the pointer analysis problem.Instead,I
vi
should have focused on how to represent the points-to data e?ciently—an e?cient algorithm would follow from the data representation.
As my experience with BDDs grew,I began to think they could be used to imple-ment context sensitivity e?ciently.The common substructure,e?cient computation on large sets,and memoization features of BDDs all seemed to be good matches for context-sensitive analysis.My?rst attempts were unsuccessful,but I persevered because I believed it could be made to work.And after many months of slow and methodical progress,I?nally had a fully context-sensitive inclusion-based pointer analysis that could analyze very large Java programs on the order of tens of minutes.
Unfortunately,after all of the optimizations and hacks that were required for good performance,the implementation had little semblance to the high-level algorithm. The simple elegance of a BDD solution was lost.Furthermore,I had worked on that one algorithm for nearly a year.There were still many other analyses I wanted to work on,and I did not relish the idea of spending a year for each one.
At that time,Monica Lam was working on a new edition of the venerated Dragon book and was struggling with the question of how to present di?cult material in an easy and understandable way.Her desire to make things simpler and more general for the Dragon book greatly in?uenced my direction and fundamentally changed my view of research.I realized that for research to have a signi?cant impact,it should be easy to understand and applicable to a wide variety of problems.
I started writing a tool that would automatically translate from a high-level algo-rithm speci?cation into an e?cient implementation.Interactions with Thomas Reps and Je?Ullman lead me to Datalog,which had a rich body of research behind it and a semantics that closely matched BDDs.I incorporated all of the tricks I had learned from my one year experience with using BDDs for program analysis.My intention was to abstract away the BDD back-end as much as possible,and allow the user to simply specify a set of inference rules and have it work e?ciently.
One of my main goals was that I wanted the system to be usable by others.Many people were intrigued by the possibility of writing program analyses in just a few lines of code and being able to automatically translate a speci?cation into an e?cient implementation.Early on,I worked closely with other Stanford students,especially
vii
Chris Unkel,Dzintars Avots,Michael Martin,Ben Livshits,Michael Carbin,Jim Zhuang,and Mayur Naik,who were using the system to implement their own program analyses.Their feedback led to great improvements in usability.
The tool’s name,bddbddb,came out of an email discussion with Monica Lam and Chris Unkel.The name was both silly and memorable,and would certainly show up ?rst on a Google search.In other words,an excellent name.I still get chuckles from the audience about the name when I do presentations about bddbddb.
My intention with this dissertation is for it to be the most complete and authorita-tive reference on the bddbddb system and program analysis using Datalog and BDDs. As with any system,bddbddb is constantly evolving and any documentation will soon be hopelessly out-of-date.However,this dissertation will attempt to describe the core principles and techniques without getting caught up in the particulars.After reading this dissertation,the reader should have enough knowledge to be able to formulate their own program analyses in Datalog and build their own BDD-based solver,if they were so inclined.
I also intended this dissertation to be readable by someone who has some knowl-edge of compilers and program analysis.Thus,I have tried to include enough back-ground for someone with the motivation and perseverance to be able to pick up this dissertation and read it without getting too lost.I don’t fancy myself a writer,but I have made an e?ort to make the text and concepts as clear as possible.
Most of the work in this dissertation is my own,with the notable exceptions of the BDD variable learning algorithm(Section4.3),which was primarily the work of Michael Carbin,the re?ection algorithm(Section5.2)which was joint work with Benjamin Livshits,and the static race detection algorithm(Section5.3),which was designed by Mayur Naik.I included these for completeness and as a demonstration that others have been able to build useful analyses using bddbddb.
I have organized this dissertation into eight chapters.Chapter1gives an intro-duction and describes why I believe program analysis,high-level speci?cation,and context sensitivity are important areas of research.Chapter2gives some background on the research areas relevant to the rest of the dissertation—namely,program anal-ysis(2.1),pointer analysis(2.2),Datalog(2.3),binary decision diagrams(2.4),and
viii
context sensitivity(2.5).Chapter3describes my approach to pointer analysis and context sensitivity using BDDs.Chapter4describes the bddbddb system in detail. Chapter5describes various applications of the bddbddb system by myself and others. Chapter6presents some experimental results from the system.Chapter7compares my work with related work,and Chapter8concludes.
Someone once said,“A thesis is never?nished,merely abandoned.”I now realize the point where one comes to fully understand and appreciate that statement is the point of enlightenment that allows one to graduate.There are many areas that I wanted to pursue and investigate more thoroughly,however after time and re?ection I realized it was better to move on and pursue other directions.
When I consider this dissertation as a whole,there was a frightening amount of work.Even more frightening perhaps is the realization that a large portion of what I worked on during my Masters and Ph.D.—in the areas of pro?ling[26,244], dynamic compilation[10,46,243,245],pointer and escape analysis[203,252,255], checkpointing[246],component interfaces[254],virtual machines[247,249],and spec-ulative parallelization[251]—does not even warrant a mention in this dissertation. But I’ve realized that a Ph.D.is not about the end result,it is about the journey.
John Whaley
March2007
ix
Acknowledgments
?First o?,I must thank my wife Miho,who gives meaning to everything I do.?I want to thank my adviser Monica Lam for teaching me so much and pushing
me to excellence.Much of the work in this thesis is directly or indirectly due to her.And never in my wildest dreams would I have imagined that my research would end up as Chapter12in the venerable Dragon book.
?I would also like to thank Martin Rinard,who was the advisor for my Masters thesis at MIT.He taught me many valuable lessons about research that I still use today.
?My Ph.D.orals committee—Juan Alonso,David Dill,Dawson Engler,Monica Lam,and Je?Ullman—did a stellar job.I was extremely lucky to have such
a superstar committee and I’ll never forget the experience.I am especially
indebted to my dissertation readers Monica,David,and Dawson,who gave me very fast turnarounds and great comments.
?I would also like to thank all of my coworkers from IBM Watson and IBM Tokyo who introduced me to compiler research and gave me many opportunities early in my career.
?I want to give thanks to all of the Stanford students I’ve worked with over the years:Dzintars Avots,Godmar Back,David Bloom,Michael Carbin,Brian Carlstrom,Ramesh Chandra,Benjamin Chelf,Andy Chou,Jim Chow,Ben D’Angelo,Michael Dalton,Seth Hallem,Sudheendra Hangal,David Heine,Ted Kremenek,Shih-Wei Liao,Amy Lim,Ben Livshits,Michael Martin,Mayur
x
Naik,Brian Murphy,Mayur Naik,Jim Norris,Je?Oplinger,Shankar Pon-nekanti,Will Robinson,Olatunji Ruwase,Joel Sandin,Costa Sapuntzakis, Patrick Sathyanathan,Brad Schumitsch,Garrett Smith,Paul Twohey,Chris Unkel,Yichen Xie,Junfeng Yang,Nickolai Zeldovich,Jim Zhuang,and the countless others that I forgot to write.
?I especially want to give thanks to my paper collaborators Michael Martin,Ben Livshits,Dzintars Avots,Michael Carbin,Chris Unkel,Christos Kozyrakis, Mayur Naik,and Alex Aiken.
?I want to thank my parents for buying that TI/99-4back in1980that started a lifetime passion with computers and programming.
?I’d also like to thank NSF and Intel for providing?nancial support for my time at Stanford.
?Finally,I would like to thank you,the reader.Most Ph.D.theses are never read after they are signed,and so the fact you are sitting down and reading this means you probably think there is some value in it.Although I cannot promise anything,I will do my best not to disappoint you.
xi
Contents
Abstract v
Preface vi Acknowledgments x
1Introduction1
1.1Motivation (1)
1.2Challenges (2)
1.2.1Pointer Analysis (4)
1.2.2Context-Sensitive Program Analysis (5)
1.3Our Solution (5)
1.3.1Binary Decision Diagrams (6)
1.3.2Cloning to Achieve Context Sensitivity (6)
1.3.3High-level Speci?cation to E?cient Implementation (7)
1.4List of Contributions (10)
1.5Organization (13)
2Background14
2.1Program Analysis (14)
2.1.1Applications of Program Analysis (15)
2.1.2Styles of Program Analysis (15)
2.2Pointer Analysis (16)
2.2.1Types of Pointer Analysis (17)
xii
2.2.2Call Graph Discovery (21)
2.3Datalog (23)
2.3.1Introduction to Datalog (23)
2.3.2Datalog Semantics (25)
2.3.3Evaluation Strategy (27)
2.3.4Magic Sets Transformation (28)
2.3.5Other Datalog variants (28)
2.4Binary Decision Diagrams (29)
2.4.1BDD Variable Ordering (30)
2.4.2Other Variants (32)
2.5Context Sensitivity (32)
2.5.1De?nitions of Context (33)
2.5.2Techniques for Solving Context-sensitive Analyses (34)
3Pointer Analysis35
3.1Context-Insensitive Pointer Analysis (35)
3.1.1Improving Pointer Analysis with Types (38)
3.2Call Graph Discovery (40)
3.3Context-Sensitive Pointer Analysis (43)
3.3.1Numbering Call Paths (44)
3.3.2Context-Sensitive Pointer Analysis with a Precomputed Call
Graph (48)
3.4Object-Sensitive Pointer Analysis (50)
4bddbddb:BDD-Based Deductive Database53
4.1From Datalog to BDD Operations (54)
4.1.1Relational Algebra (54)
4.1.2Boolean Functions (55)
4.1.3BDD Operations (56)
4.2Translating and Optimizing Datalog Programs (57)
4.2.1Datalog Source Transformations (57)
4.2.2Datalog Rule Optimization (58)
xiii
4.2.3Intermediate Representation (61)
4.2.4IR Optimizations (61)
4.2.5BDD Decision Variable Assignment (63)
4.2.6Additional Optimizations (64)
4.2.7Interpretation (65)
4.2.8Other Considerations (65)
4.3Learning BDD Variable Orderings (66)
4.3.1The Problem Space (67)
4.3.2Problem Characteristics (68)
4.3.3Integrating Learning into bddbddb (69)
4.3.4Algorithm Overview (71)
4.3.5Data Sets (72)
4.3.6Order Constructor (73)
4.3.7Uncertainty Sampler (76)
4.3.8Generating a New Order (77)
4.4Querying bddbddb (78)
4.4.1The Come-from Query (79)
5Applications for Program Analysis83
5.1Queries and Other Analyses (84)
5.1.1Debugging a Memory Leak (84)
5.1.2Heap Connectivity (84)
5.1.3Finding a Security Vulnerability (85)
5.1.4Aliased Parameters (86)
5.1.5Type Re?nement (86)
5.1.6Interprocedural Data Flow (87)
5.1.7Context-Sensitive Mod-Ref Analysis (88)
5.1.8Context-Sensitive Type Analysis (89)
5.1.9Thread Escape Analysis (90)
5.2Pointer Analysis for Re?ection (93)
5.2.1Re?ection Resolution Using Casts (98)
xiv
5.3Static Race Detection (101)
5.3.1Race Detection Algorithm Overview (102)
5.3.2Original-Pairs Computation (102)
5.3.3Reachable-Pairs Computation (104)
5.3.4Aliasing-Pairs Computation (106)
5.3.5Escaping-Pairs Computation (106)
5.3.6Unlocked-Pairs Computation (109)
5.3.7Summary of Static Race Detection Results (112)
5.4Program Analyses using bddbddb (112)
6Experimental Results117
6.1Pointer Analysis Results (117)
6.1.1Methodology (118)
6.1.2Analysis Times (120)
6.1.3Evaluation of Results (122)
6.1.4Experience (123)
6.2E?ectiveness of bddbddb Compilation (126)
6.2.1Comparing Lines of Code (127)
6.2.2Comparing Analysis Times (128)
6.2.3External Lock and SQL Injection Analyses (131)
6.3Re?ection Resolution Results (131)
6.3.1Experimental Setup (131)
6.3.2Evaluation Approach (132)
6.3.3Local Analysis for Re?ection Resolution(Local) (134)
6.3.4Points-to Information for Re?ection Resolution(Points-to).134
6.3.5Casts for Re?ection Resolution(Casts) (136)
6.3.6Achieving a Sound Call Graph Approximation(Sound) (138)
6.3.7E?ect of Re?ection Resolution on Call Graph Size (138)
6.4Results from Machine Learning BDD Variable Orders (140)
6.4.1Methodology (140)
6.4.2Results (142)
xv
7Related Work144
7.1Pointer Analysis (145)
7.1.1Scalable Pointer Analysis (145)
7.1.2Context-sensitive Pointer Analysis (146)
7.1.3BDD-based Pointer Analysis (147)
7.2BDDs for Program Analysis (147)
7.2.1BDDs for Model Checking (148)
7.2.2BDDs for Predicate Abstraction (148)
7.2.3BDDs as a Data Representation (148)
7.3High-level Languages for Program Analysis (149)
7.3.1Constraint Languages (149)
7.3.2User-speci?ed Program Queries (150)
7.3.3Program Analysis with Databases (150)
7.4Optimizing Datalog (152)
7.4.1Datalog Implementations (152)
7.4.2Logic programming with BDDs (153)
7.4.3Datalog Evaluation Strategies (153)
7.5BDD Optimizations (153)
7.5.1BDD Libraries (154)
7.5.2BDD Variable Ordering (154)
7.6Applications of Program Analysis (156)
7.6.1Call Graph Discovery (156)
7.6.2Analyzing Re?ection (158)
7.6.3Finding Program Errors (159)
8Conclusions161
8.1Future Work (162)
8.1.1Other Data Representations (162)
8.1.2Beyond Datalog (163)
8.1.3Integration in the Software Development Process (163)
Bibliography164
xvi
List of Tables
4.1The grouping of the data entries in our training set.Each episode is
tied to a particular rule,rule application,and operation within the
rule.Within each episode the order along with its run time is stored.72
6.1Descriptions of the benchmarks we used to evaluate our pointer analyses.117
6.2Information about the benchmarks we used to test our pointer analyses.118
6.3Analysis times and peak memory usages for each of the benchmarks
and analyses.Time is in seconds and memory is in megabytes (121)
6.4Results of escape analysis (124)
6.5Results of the type re?nement query.Numbers are percentages.
Columns labeled multi and re?ne refer to multi-type variables and
re?nable-type variables,respectively (125)
6.6LOC for hand-coded analyses versus lines of Datalog using bddbddb.127
6.7Comparison of context-insensitive Java pointer analysis runtimes.
Times are in seconds (129)
6.8Comparison of context-sensitive Java pointer analysis runtimes.Times
are in seconds (129)
6.9Comparison of C pointer analysis runtimes.Times are in seconds (129)
6.10External lock analysis runtimes.Times are in seconds (130)
6.11SQL injection query results.Times are in seconds.∞indicates that
the analysis did not?nish (130)
6.12Summary of information about our benchmarks.Applications are
sorted by the number of lines of code in column3 (132)
6.13Results of resolving Class.forName calls for di?erent analysis versions.133
xvii
6.14Number of classes and methods in the call graph for di?erent analysis
versions (139)
6.15Information about the analyses that we used to evaluate our BDD order
?nding algorithm.The four columns of numbers are the number of
rules,relations,and domains in the input Datalog?le,and the number
of possible domain orders (140)
6.16The results of our learning algorithm.The?rst four columns of num-
bers compare the speed of a random order,an order generated with a
sifting algorithm,our best hand-tuned order,and the order output by
the algorithm.∞means that the analysis did not complete because
it ran out of memory.The next four columns give statistics on the
performance of the learning algorithm (142)
6.17A comparison of the run times of our hand-tuned and generated or-
ders for the j pacs and j
List of Figures
2.1An example Datalog program with two minimal solutions (26)
2.2(a)Binary encoding of a relation.(b)and(c)are BDD encodings
of the relation given by(a)with decision variable orders b1,b2,b3and
b2,b1,b3,respectively (30)
3.1Example of path numbering.The graph on the left is the original
graph.Nodes M2and M3are in a cycle and therefore are placed in
one equivalence class.Each edge is marked with path numbers at the
source and target of the edge.The graph on the right is the graph with
all of the paths expanded (45)
3.2The six contexts of function M6in Example1 (45)
4.1(a)Predicate dependency graph for Algorithm1.(b)Breaking the
PDG into SCCs and?nding cycles (58)
4.2The learning algorithm embedded in Datalog resolution (70)
4.3Steps of a learning episode.This corresponds to the contents of the
Active Learner box in Figure4.2 (71)
4.4An example decision tree induced from the training set data in Ta-
ble4.1on rule2 (75)
4.5Datalog program generated by bddbddb to compute come-from query
“vP(100,200):–?”on Algorithm1.The second,third,fourth,
and?fth chunks of rules come from Rules(3.2),(3.4),(3.1),and(3.3)
respectively (82)
xix
5.1Typical use of re?ection to create new objects (94)
5.2A fragment of a speci?cation?le accepted by our system.A string
identifying a call site to Class.forName is mapped to a class name that
that call may resolve to (97)
5.3A case in freetts where our analysis is unable to determine the type
of objects instantiated on line5using casts (101)
6.1Re?ection resolution using points-to results in
javax.xml.transform.FactoryFinder in the JDK (135)
xx
Chapter1
Introduction
1.1Motivation
Program analysis is the process of analyzing the behavior of a computer program. Program analysis has many important applications.It has traditionally been used primarily for optimizing programs so that they will run faster.More recently,program analysis is increasingly being used in tools to aid software development.
Program analysis has the potential to be extremely useful in?nding bugs and security vulnerabilities in software.Research has shown that static analysis can reduce defects by up to a factor of six[136],and that60%of software faults that were found in released software products could have been detected with static analysis tools[34]. 40%of the faults that could be found through static analysis will eventually become a defect in production software[119].
Software bugs are a serious problem.They cost the U.S.economy an estimated $59.5billion per year[178].While not all errors can be removed,more than a third of the cost—an estimated$22.2billion—could be eliminated by an improved infrastructure that includes static analysis tools[178].
Static analysis brings numerous bene?ts to the software development process. Static analysis is attractive as it can catch bugs early in development,before the software is released.A common rule of thumb in software development is that a bug that costs$1to?x before release will cost$100to?x after release.Thus,it is
1
正在阅读:
CONTEXT-SENSITIVE POINTER ANALYSIS USING BINARY DECISION DIA04-25
中考专题测试二次根式及一元二次方程06-06
三一重工有限公司挖掘机营销创新毕业设计论文 - 图文05-27
提前解除房屋租赁合同通知03-03
对商业银行经营战略转型的思考06-05
微机接口技术随堂作业04-19
基于51单片机的数字温度计论文资料05-29
船长实习报告 - 图文05-10
- 1Computational analysis of shrouded wind turbine configurations using a 3-dimensional RANS solver
- 2A novel 3D mesh compression using mesh segmentation with multiple principal plane analysis
- 3A novel 3D mesh compression using mesh segmentation with multiple principal plane analysis
- 4The decision of the New York Philharmonic to hire
- 5Parenting Style as Context An Integrative Model
- 6Error Analysis and Contrastive Analysis
- 7Parenting Style as Context An Integrative Model
- 8drools Decision Table(决策表)
- 9J McCarthy-a logical AI approach to context
- 10NonClinical Dose Formulation Analysis Method Validation and Sample Analysis
- 教学能力大赛决赛获奖-教学实施报告-(完整图文版)
- 互联网+数据中心行业分析报告
- 2017上海杨浦区高三一模数学试题及答案
- 招商部差旅接待管理制度(4-25)
- 学生游玩安全注意事项
- 学生信息管理系统(文档模板供参考)
- 叉车门架有限元分析及系统设计
- 2014帮助残疾人志愿者服务情况记录
- 叶绿体中色素的提取和分离实验
- 中国食物成分表2020年最新权威完整改进版
- 推动国土资源领域生态文明建设
- 给水管道冲洗和消毒记录
- 计算机软件专业自我评价
- 高中数学必修1-5知识点归纳
- 2018-2022年中国第五代移动通信技术(5G)产业深度分析及发展前景研究报告发展趋势(目录)
- 生产车间巡查制度
- 2018版中国光热发电行业深度研究报告目录
- (通用)2019年中考数学总复习 第一章 第四节 数的开方与二次根式课件
- 2017_2018学年高中语文第二单元第4课说数课件粤教版
- 上市新药Lumateperone(卢美哌隆)合成检索总结报告
- SENSITIVE
- ANALYSIS
- DECISION
- CONTEXT
- POINTER
- BINARY
- USING
- DIA
- 高中物理第1章电磁感应3楞次定律学业分层测评教科版2
- 九年级上学期数学教学计划模板
- 儿童是天生的哲学家
- 3种网站推广方法操作流程详解
- 贵州省普通高中学生休学复学申请表
- 混凝土泵常见故障及排除技巧
- 城市轨道交通概论总结
- 最新湖北省黄冈市中考语文模拟试题(有配套答案)
- 2015年辽宁省数据分析深入
- 硫酸铜晶体里结晶水含量的测定
- 2019年天津市中考语文试卷(含答案)
- 中小学体育教案表格
- 2010 年信息技术学业水平考试试题
- 幼儿园大班语言教案《猪八戒吃西瓜》
- 二年级语文下册 我是苹果教案 沪教版
- 宿舍美化大赛策划书模板
- 管理层的认定精编版
- +买保险就是对生命的尊重
- 2018年河北师范大学法政与公共管理学院705公共管理学之公共管理
- 2018年床垫行业深度研究报告