计算机科学 ›› 2019, Vol. 46 ›› Issue (10): 141-147.doi: 10.11896/jsjkx.180801573
张凌浩1, 桂盛霖2,3, 穆逢君2, 王胜1
ZHANG Ling-hao1, GUI Sheng-lin2,3, MU Feng-jun2, WANG Sheng1
摘要: 如何发现代码克隆,是软件维护和软件侵权纠纷案件中的一个关键问题。由于商业保密等原因,在商业软件的侵权纠纷案中往往无法使用基于源代码比对的克隆检测技术。因此,针对这类无法获得源代码进行代码克隆检测的场景,文中提出一种针对二进制可执行文件分析的代码克隆检测方法。首先,通过反编译与指令类型抽象得到二进制可执行目标文件的指令类型序列;然后,对指令类型序列构建后缀树,利用后缀树的性质获取函数级的指令序列间的克隆信息,并通过消除沙砾指令进一步提高检测性能;最后,基于MIPS32指令集,使用Linux 内核和经过混淆处理的代码分别作为克隆级别0-级别2与级别1-级别4的二进制可执行文件代码克隆测试样本,并与源代码检测工具进行对比测试。结果表明,所提算法在缺少源代码的场景下同样能进行细粒度的克隆分析,且对各级代码克隆均具有较好的检测性能。
中图分类号:
[1]United States Court of Appeals for the Federal Circuit:JACOBSENV.KATZER [A/OL].(2018-03-27)[2018-07-07].http://www.cafc.uscourts.gov/sites/default/files/opinions-orders/08-1001.pdf. [2]LV H.Computer software copyright infringement judge[D]. Lanzhou:Lanzhou University,2015.(in Chinese) 吕浩.计算机软件著作权侵权的判定研究[D].兰州:兰州大学,2015. [3]DONG H,TANG Q.Review of Jacobsen V.Katzer Case:The Influence of the Differences Between Chinese and American Copyright Systems on the Nature of Open Source Agreements and Its Enlightenment[J].China Copyright,2009,2009(3):48-51.(in Chinese) 董皓,唐磬.Jacobsen V.Katzer案评述:中美版权制度差异对开源协议性质的影响及启示[J].中国版权,2009,2009(3):48-51. [4]SU Q,WU W M,LI Z L,et al.Research and Application of Chaos Opaque Predicate in Code Obfuscation[J].Computer Science,2013,40(6):155-159.(in Chinese) 苏庆,吴伟民,李忠良,李景樑,陈为德.混沌不透明谓词在代码混淆中的研究与应用[J].计算机科学,2013,40(6):155-159. [5]ROY,CHANCHAL K,CORDY,et al.Comparison and evaluation of code clone detection techniques and tools:A qualitative approach[J].Science of Computer Programming,2009,74(7):470-495. [6]RATTAN D,BHATIA R,SINGH M.Software clone detection:A systematic review[J].Information & Software Technology,2013,55(7):1165-1199. [7]SU X H,ZHANG F L.A Survey for Management Oriented Code Clone Research[J].Chinese Journal of Computers,2018,41(3):628-651.(in Chinese) 苏小红,张凡龙.面向管理的克隆代码研究综述[J].计算机学报,2018,41(3):628-651. [8]CHEN H,GUO T,CUI B J,et al.Comparison and analysis on binary file similarity detection technique[C]//Proceedings of Conference on Vulnerability Analysis and Risk.Beijing:China Information Technology Security Evalution Center,2011. [9]JOHNSON J H.Substring matching for clone detection and change tracking[C]//Proceedings of International Conference on Software Maintenance,1994.New York:IEEE,1994:120-126. [10]KAMIYA T,KUSUMOTO S,INOUE K.CCFinder:A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code[J].IEEE Transactions on Software Engineering,2002,28(7):654-670. [11]SAINI V,SAJNANI H,KIM J,et al.SourcererCC and SourcererCC-I:tools to detect clones in batch mode and during software development[C]// Proceedings of the 38th Proceedings of International Conference on Software Engineering Companion,2016.New York:IEEE,2016:597-600. [12]SUDHAMANI M,RANGARAJAN L.Code clone detection based on order and content of control statements[C]//Procee-dings of International Conference on Contemporary Computing &Informatics.New York:IEEE,2017:59-64. [13]WANG M,WANG P,XU Y.CCSharp:An Efficient Three-Phase Code Clone Detector Using Modified PDGs[C]//Procee-dings of Asia-pacific Software Engineering Conference.New York:IEEE,2018:100-109. [14]ROY C K,CORDY J R.NICAD:Accurate Detection of Near-Miss Intentional Clones Using Flexible Pretty-Printing and Code Normalization[C]//Proceedings of IEEE International Confe-rence on Program Comprehension.Washington DC:IEEE Computer Society,2008:172-181. [15]WHITE M,TUFANO M,VENDOME C,et al.Deep learning code fragments for code clone detection[C]//Proceedings of IEEE/ACM International Conference on Automated Software Engineering.NewYork:IEEE,2016:87-98. [16]RAGKHITWETSAGUL C,KRINKE J,MARNETTE B.A picture is worth a thousand words:Code clone detection based on image similarity[C]//Proceedings of International Workshop on Software Clones.NewYork:IEEE,2018:44-50. [17]SÆBJØRNSEN A,WILLCOCK J,PANAS T,et al.Detecting code clones in binary executables[C]//Proceedings of Eighteenth International Symposium on Software Testing and Ana-lysis.New York:ACM,2009:117-128. [18]DONG M,ZHUANG H,ZHANG R,et al.A New Method of Software Clone Detection Based on Binary Instruction Structure Analysis[C]//Proceedings of International Conference on Wireless Communications.NewYork:IEEE,2013:1-4. [19]HU Y K,ZHANG Y Y,LI J R,et al.Binary code clone detection across architectures and compiling configurations[C]//Proceedings of International Conference on Program Comprehension.New York:IEEE,2017:88-98. [20]TIS Committee.Tool Interface Standard (TIS) Executable and Linking Format (ELF) Specification[S/OL].http://refspecs.linuxbase.org/elf/elf.pdf. [21]MIPS32® Instruction Set Quick Reference [S/OL].https://www.mips.com/?do-download=mips32-instruction-set-quick-reference-v1-01. [22]MCCREIGHT E M.A Space-Economical Suffix Tree Construction Algorithm[J].Journal of the Acm,1976,23(2):262-272. [23]UKKONEN E.On-line construction of suffix trees[J].Algorithmica,1995,14(3):249-260. [24]rswier c4:C in four functions[CP/OL].(2017-08-22) [2018-08-22].https://github.com/rswier/c4.git. [25]C/C++ Obfuscator[EB/OL].(2018-08-09)[2018-08-09]http://stunnix.com/prod/cxxo/. [26]UDDIN M S,ROY C K,SCHNEIDER K A.SimCad:An extensible and faster clone detection tool for large scale software systems[C]//Proceedings of International Conference on Program Comprehension.NewYork:IEEE,2013:236-238. |
[1] | 陈钧吾, 余华山. 面向无尺度图的Δ-stepping算法改进策略 Strategies for Improving Δ-stepping Algorithm on Scale-free Graphs 计算机科学, 2022, 49(6A): 594-600. https://doi.org/10.11896/jsjkx.210400062 |
[2] | 鄂海红, 张田宇, 宋美娜. 基于Web的数据可视化图表渲染优化方法 Web-based Data Visualization Chart Rendering Optimization Method 计算机科学, 2021, 48(3): 119-123. https://doi.org/10.11896/jsjkx.200600038 |
[3] | 张晓, 张思蒙, 石佳, 董聪, 李战怀. Ceph分布式存储系统性能优化技术研究综述 Review on Performance Optimization of Ceph Distributed Storage System 计算机科学, 2021, 48(2): 1-12. https://doi.org/10.11896/jsjkx.201000149 |
[4] | 乐乔艺, 刘建勋, 孙晓平, 张祥平. 代码克隆检测研究进展综述 Survey of Research Progress of Code Clone Detection 计算机科学, 2021, 48(11A): 509-522. https://doi.org/10.11896/jsjkx.210300310 |
[5] | 徐江峰谭玉龙. 基于机器学习的HBase配置参数优化研究 Research on HBase Configuration Parameter Optimization Based on Machine Learning 计算机科学, 2020, 47(6A): 474-479. https://doi.org/10.11896/JsJkx.190900046 |
[6] | 张丹,罗平. 代码相似性检测方法与工具综述 Survey of Code Similarity Detection Methods and Tools 计算机科学, 2020, 47(3): 5-10. https://doi.org/10.11896/jsjkx.190500148 |
[7] | 张彭奕, 宋杰. 区块链共识算法效能优化研究进展 Research Advance on Efficiency Optimization of Blockchain Consensus Algorithms 计算机科学, 2020, 47(12): 296-303. https://doi.org/10.11896/jsjkx.200700020 |
[8] | 徐传福,王曦,刘舒,陈世钊,林玉. 基于Python的大规模高性能LBM多相流模拟 Large-scale High-performance Lattice Boltzmann Multi-phase Flow Simulations Based on Python 计算机科学, 2020, 47(1): 17-23. https://doi.org/10.11896/jsjkx.190500009 |
[9] | 徐启泽, 韩文廷, 陈俊仕, 安虹. 众核平台上广度优先搜索算法的优化 Optimization of Breadth-first Search Algorithm Based on Many-core Platform 计算机科学, 2019, 46(1): 314-319. https://doi.org/10.11896/j.issn.1002-137X.2019.01.049 |
[10] | 孙涛, 张俊星. SDN性能优化技术研究综述 Review of SDN Performance Optimization Technology 计算机科学, 2018, 45(11A): 84-91. |
[11] | 邹凌君,陈崚,戴彩艳. 基于广义后缀树的二分网络社区挖掘算法 Detecting Community from Bipartite Network Based on Generalized Suffix Tree 计算机科学, 2017, 44(7): 221-226. https://doi.org/10.11896/j.issn.1002-137X.2017.07.039 |
[12] | 孙志龙,沙行勉,诸葛晴凤,陈咸彰,吴剀劼. 面向内存文件系统的数据一致性更新机制研究 Research on Data Consistency for In-memory File Systems 计算机科学, 2017, 44(2): 222-227. https://doi.org/10.11896/j.issn.1002-137X.2017.02.036 |
[13] | 倪友聪,李松,叶鹏,杜欣. 基于随机搜索规则的软件体系结构层性能演化优化方法 Random Search Rule Based Performance Evolutionary Optimization Method at Software Architecture Level 计算机科学, 2017, 44(11): 156-163. https://doi.org/10.11896/j.issn.1002-137X.2017.11.023 |
[14] | 赵利伟,陈咸彰,诸葛晴凤. 连接操作在SIMFS和EXT4上的性能比较 Performance Comparison of Join Operations on SIMFS and EXT4 计算机科学, 2016, 43(6): 184-187. https://doi.org/10.11896/j.issn.1002-137X.2016.06.037 |
[15] | 柯叶青,马志柔,伍海江,刘 杰. 一种简历语义搜索系统的实现方法 SmartHR:A Resume Query and Management System Based on Semantic Web 计算机科学, 2015, 42(12): 56-59. |
|