Inferring Method Specifications from Natural Language API Descriptions

  • Rahul Pandita [Department of Computer Science, North Carolina State University, Raleigh, USA]
  • Xusheng Xiao [Department of Computer Science, North Carolina State University, Raleigh, USA]
  • Stephan Oney [Human-Computer Interaction Institute, Carnegie Mellon University, Pittsburgh, USA]
  • Hao Zhong [Laboratory for Internet Software Technologies, Institute of Software, Chinese Academy of Sciences, Beijing, China]
  • Tao Xie [Department of Computer Science, North Carolina State University, Raleigh, USA]
  • Amit Paradkar [IBM T. J. Watson Research Center, Hawthorne, NY, USA]

Application Program Interface (API) documents are the most common way of describing the specifications of reusable software libraries, thereby facilitating reuse. However, even with such documents, developers often overlook the information the documents provide and build software systems that misuse the reused libraries. Since API documents are written in natural language, existing tools cannot verify the method specifications described in a library's API documents against the code that uses that library. Formal specifications (such as code contracts), on the other hand, represent these specifications in a form that existing testing and verification tools can check. In practice, however, most libraries do not come with code contracts, which hinders tool-based verification. To address this issue, we propose a novel approach to infer formal specifications from the natural language text of API documents. Our evaluation results show that our approach achieves an average of 92% precision and 93% recall in identifying sentences that describe code contracts among over 2500 sentences of API documents. Furthermore, our results show that our approach achieves an average accuracy of 83.4% in inferring specifications from over 1600 sentences describing code contracts.

This work is supported in part by NSF grants CCF-0845272, CCF-0915400, CNS-0958235, and ARO grant W911NF-08-1-0443. Hao Zhong is sponsored by the National Science Foundation of China under Grant No. 61100071.

Artifacts

  • Implementation (Refer to README.txt for a list of dependencies.)
  • Samples of Noun List
    • true
    • false
    • void
    • null
    • &&
    • ||
    • empty
    • underscores
    • abstract
    • constructor
    • class
    • interface
  • Samples of Jargon List
    • min [minimum]
    • min. [minimum]
    • max [maximum]
    • max. [maximum]
    • && [and]
    • (..) [two periods]
    • (.) [single period]
    • i.e. [that is]
    • eg. [example]
  • Samples of Synonym List
    • be [=]
    • larger [greater]
    • smaller [lesser]
    • equal [=]
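The three sample lists read naturally as preprocessing dictionaries applied to each API-document sentence before parsing. The following is a minimal, hypothetical Python sketch of how such dictionaries could be applied. It assumes that jargon entries are expanded first, that synonyms are then normalized word by word, and that noun-list tokens are reported so a part-of-speech tagger could be forced to tag them as nouns. The function names (`expand_jargon`, `normalize_synonyms`, `forced_nouns`, `preprocess`) and all matching rules are assumptions for illustration, not the released implementation. The `(..)` and `(.)` jargon entries are left out because the context in which they are matched is not stated.

```python
import re

# Sample entries copied from the lists above; how they are matched is an assumption.
JARGON = {
    "min": "minimum", "min.": "minimum",
    "max": "maximum", "max.": "maximum",
    "&&": "and", "i.e.": "that is", "eg.": "example",
}
SYNONYMS = {"be": "=", "larger": "greater", "smaller": "lesser", "equal": "="}
NOUNS = {"true", "false", "void", "null", "&&", "||", "empty", "underscores",
         "abstract", "constructor", "class", "interface"}


def _jargon_pattern(keys):
    """Build one alternation regex; longer keys come first so 'min.' wins over 'min'."""
    parts = []
    for key in sorted(keys, key=len, reverse=True):
        pat = re.escape(key)
        # Require a word boundary only where the key itself starts/ends with a
        # word character, so 'min' does not fire inside 'admin' or 'minimal'.
        if key[0].isalnum():
            pat = r"(?<!\w)" + pat
        if key[-1].isalnum():
            pat += r"(?!\w)"
        parts.append(pat)
    return re.compile("|".join(parts), re.IGNORECASE)


_JARGON_RE = _jargon_pattern(JARGON)


def expand_jargon(sentence):
    """Replace each jargon occurrence with its expansion (hypothetical step 1)."""
    return _JARGON_RE.sub(lambda m: JARGON[m.group(0).lower()], sentence)


def normalize_synonyms(sentence):
    """Map each word to its canonical synonym, if any (hypothetical step 2)."""
    return re.sub(r"\w+",
                  lambda m: SYNONYMS.get(m.group(0).lower(), m.group(0)),
                  sentence)


def forced_nouns(sentence):
    """Return tokens a POS tagger would be told to treat as nouns (assumed usage)."""
    tokens = re.findall(r"&&|\|\||\w+", sentence)
    return [t for t in tokens if t.lower() in NOUNS]


def preprocess(sentence):
    """Apply jargon expansion, then synonym mapping, then collect forced nouns."""
    text = normalize_synonyms(expand_jargon(sentence))
    return text, forced_nouns(text)
```

For example, `preprocess("The index must be larger than min && not null.")` returns `("The index must = greater than minimum and not null.", ["null"])`. Expanding jargon before splitting sentences matters for entries such as `max.` and `i.e.`, whose trailing periods would otherwise be mistaken for sentence boundaries.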

Disclaimer: "The material located at this site is not endorsed, sponsored or provided by or on behalf of North Carolina State University."