Source lines of code

Source lines of code (SLOC) is a software metric used to measure the amount of code in a software program. SLOC is typically used to estimate the effort that will be required to develop a program, as well as to estimate programming productivity or effort once the software is produced.


Measuring SLOC

Many useful comparisons involve only the order of magnitude of lines of code in a project. Software projects can range from 100 to 100,000,000 lines of code. Using lines of code to compare a 10,000-line project to a 100,000-line project is far more useful than comparing a 20,000-line project with a 21,000-line project. While it is debatable exactly how to measure lines of code, two different measurements of the same project should not differ by an order of magnitude.


There are two major types of SLOC measures: physical SLOC and logical SLOC. Specific definitions of these two measures vary, but the most common definition of physical SLOC is a count of lines in the text of the program's source code, including comment lines. Blank lines are also included unless more than 25% of the lines in a section are blank; in that case, blank lines in excess of 25% are not counted toward lines of code.
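The physical SLOC definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `physical_sloc` is hypothetical, a "section" is taken to be whatever text is passed in, and the 25% cap is computed against the section's total line count and rounded down.

```python
def physical_sloc(source: str, blank_cap: float = 0.25) -> int:
    """Count physical SLOC: every non-blank line (comment lines included),
    plus blank lines up to `blank_cap` of the section's total line count."""
    lines = source.splitlines()
    blank = sum(1 for line in lines if not line.strip())
    non_blank = len(lines) - blank
    # Assumption: the cap is a fraction of all lines in the section, rounded down.
    allowed_blank = int(blank_cap * len(lines))
    return non_blank + min(blank, allowed_blank)


# 8 lines in total: 3 non-blank and 5 blank, so only 2 blank lines are counted.
section = "int x;\n\n\n\n/* comment */\nint y;\n\n\n"
print(physical_sloc(section))
```

For the sample section the function returns 5: three non-blank lines plus the two blank lines permitted by the cap.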


Logical SLOC measures attempt to measure the number of "statements", but their specific definitions are tied to specific computer languages (one simple logical SLOC measure for C-like languages is the number of statement-terminating semicolons). It is much easier to create tools that measure physical SLOC, and physical SLOC definitions are easier to explain. However, physical SLOC measures are sensitive to logically irrelevant formatting and style conventions, while logical SLOC is less sensitive to formatting and style conventions. Unfortunately, SLOC measures are often stated without giving their definition, and logical SLOC can often be significantly different from physical SLOC.


Consider this snippet of C code as an example of the ambiguity encountered when determining SLOC:

 for (i=0; i<100; ++i) printf("hello"); /* How many lines of code is this? */ 

In this example we have:

  • 1 physical line of code (LOC)
  • 2 logical lines of code (LLOC): the for statement and the printf statement
  • 1 comment line

Depending on the programmer and/or coding standards, the above "line of code" could be, and usually is, written on many separate lines:

 for (i=0; i<100; ++i)
 {
     printf("hello");
 } /* Now how many lines of code is this? */

In this example we have:

  • 4 physical lines of code (LOC) (Is placing braces work to be estimated?)
  • 2 logical lines of code (LLOC) (What about all the work writing non-statement lines?)
  • 1 comment line (Tools must account for all code and comments regardless of comment placement.)
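The counts in the two examples can be reproduced with a deliberately crude Python sketch. It is not a real C parser, and its rules are assumptions made for illustration: a logical statement is either a semicolon outside parentheses or one of a few control keywords, comments are only the `/* ... */` kind, and the name `count_c_snippet` is hypothetical. Constructs such as `do ... while` or a string literal containing `/*` would be miscounted.

```python
import re

COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)        # C block comments
STRING = re.compile(r'"(?:\\.|[^"\\])*"')             # double-quoted string literals
CONTROL = re.compile(r"\b(?:for|while|if|switch)\b")  # assumed: one statement each


def count_c_snippet(source: str) -> dict:
    """Return physical, logical, and comment counts for a small C snippet."""
    physical = sum(1 for line in source.splitlines() if line.strip())
    comments = len(COMMENT.findall(source))
    # Remove comments and empty out string literals so their text is not counted.
    code = STRING.sub('""', COMMENT.sub(" ", source))
    depth = 0
    terminators = 0
    for ch in code:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == ";" and depth == 0:
            terminators += 1  # semicolons inside a for (...) header are skipped
    logical = terminators + len(CONTROL.findall(code))
    return {"physical": physical, "logical": logical, "comments": comments}


one_line = 'for (i=0; i<100; ++i) printf("hello"); /* How many lines of code is this? */'
multi_line = "\n".join([
    "for (i=0; i<100; ++i)",
    "{",
    '    printf("hello");',
    "} /* Now how many lines of code is this? */",
])

print(count_c_snippet(one_line))
print(count_c_snippet(multi_line))
```

Both snippets yield 2 logical lines and 1 comment, while the physical count changes from 1 to 4. Note that simply counting every semicolon would report 3 for either snippet, because the for header contains two of its own.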

Even the "logical" and "physical" SLOC values can have a large number of varying definitions. Robert E. Park (while at the Software Engineering Institute) et al. developed a framework for defining SLOC values, to enable people to carefully explain and define the SLOC measure used in a project. For example, most software systems reuse code, and determining which (if any) reused code to include is important when reporting a measure.


Origins of SLOC

At the time that people began using SLOC as a metric, the most commonly used languages, such as FORTRAN and assembler, were line-oriented languages. These languages were developed at the time when punch cards were the main form of data entry for programming. One punch card usually represented one line of code. It was one discrete object that was easily counted. It was the visible output of the programmer so it made sense to managers to count lines of code as a measurement of a programmer's productivity. Today, the most commonly used computer languages allow a lot more leeway for formatting. One line of text no longer necessarily corresponds to one line of code.


Usage of SLOC measures

SLOC measures are somewhat controversial, particularly in the way that they are sometimes misused. Experiments have repeatedly confirmed that effort is highly correlated with SLOC, that is, programs with larger SLOC values take more time to develop. Thus, SLOC can be very effective in estimating effort. However, functionality is less well correlated with SLOC: skilled developers may be able to develop the same functionality with far less code, so one program with less SLOC may exhibit more functionality than another similar program. In particular, SLOC is a poor productivity measure of individuals, since a developer can develop only a few lines and yet be far more productive in terms of functionality than a developer who ends up creating more lines (and generally spending more effort). Good developers may merge multiple code modules into a single module, improving the system yet appearing to have negative productivity because they remove code. Also, especially skilled developers tend to be assigned the most difficult tasks, and thus may sometimes appear less "productive" than other developers on a task by this measure.


SLOC is particularly ineffective at comparing programs written in different languages unless adjustment factors are applied to normalize languages. Various computer languages balance brevity and clarity in different ways; as an extreme example, most assembly languages would require hundreds of lines of code to perform the same task as a few characters in APL. The following example shows a comparison of a "Hello World" program written in C and the same program written in COBOL, a language known for being particularly verbose.

C

 #include <stdio.h>

 int main(void) {
    printf("Hello World");
    return 0;
 }

Lines of code: 5 (excluding whitespace)

COBOL

000100 IDENTIFICATION DIVISION.
000200 PROGRAM-ID. HELLOWORLD.
000300
000400*
000500 ENVIRONMENT DIVISION.
000600 CONFIGURATION SECTION.
000700 SOURCE-COMPUTER. RM-COBOL.
000800 OBJECT-COMPUTER. RM-COBOL.
000900
001000 DATA DIVISION.
001100 FILE SECTION.
001200
100000 PROCEDURE DIVISION.
100100
100200 MAIN-LOGIC SECTION.
100300 BEGIN.
100400 DISPLAY " " LINE 1 POSITION 1 ERASE EOS.
100500 DISPLAY "Hello world!" LINE 15 POSITION 10.
100600 STOP RUN.
100700 MAIN-LOGIC-EXIT.
100800 EXIT.

Lines of code: 17 (excluding whitespace)
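As a purely illustrative sketch of what an adjustment factor does, the Python below derives one from the two line counts above (5 for C, 17 for COBOL) and uses it to express a COBOL count in C-equivalent lines. The function names are hypothetical, and a factor taken from a single tiny program is an assumption for demonstration only; factors used in practice are derived from much larger samples.

```python
def adjustment_factor(sloc_in_language: int, sloc_in_reference: int) -> float:
    """Lines needed in a language per line of the reference language."""
    return sloc_in_language / sloc_in_reference


def normalize(sloc: float, factor: float) -> float:
    """Express a SLOC count in reference-language-equivalent lines."""
    return sloc / factor


# Counts taken from the "Hello World" comparison: COBOL 17 lines, C 5 lines.
cobol_vs_c = adjustment_factor(17, 5)
print(round(cobol_vs_c, 2))                   # COBOL lines per C line
print(round(normalize(1700, cobol_vs_c), 1))  # a 1,700-line COBOL program in C-equivalent lines
```

With this factor, a hypothetical 1,700-line COBOL program would be treated as roughly comparable in size to a 500-line C program.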

Another increasingly common problem in comparing SLOC metrics is the difference between auto-generated and hand-written code. Modern software tools often have the capability to auto-generate enormous amounts of code with a few clicks of a mouse. For instance, GUI builders automatically generate all the source code for a GUI object simply by dragging an icon onto a workspace. The work involved in creating this code cannot reasonably be compared to the work necessary to write a device driver, for instance. By the same token, a hand-coded custom GUI class could easily be more demanding than a simple device driver; hence the shortcoming of this metric.


There are several cost, schedule, and effort estimation models which use SLOC as an input parameter, including the widely used Constructive Cost Model (COCOMO) series of models by Barry Boehm et al., PRICE Systems True S, and Galorath's SEER-SEM. While these models have shown good predictive power, they are only as good as the estimates (particularly the SLOC estimates) fed to them. Many have advocated the use of function points instead of SLOC as a measure of functionality, but since function points are highly correlated to SLOC (and cannot be automatically measured) this is not a universally held view.
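To show how a SLOC estimate feeds such a model, here is a minimal Python sketch of Basic COCOMO, the simplest member of the COCOMO family. It uses the commonly cited Basic COCOMO coefficients for the three project modes; the function name `basic_cocomo` and the dictionary layout are assumptions of this sketch, and the later, more detailed COCOMO models add cost drivers that are not represented here.

```python
# Basic COCOMO: effort = a * KSLOC**b (person-months),
#               schedule = c * effort**d (months).
BASIC_COCOMO = {
    "organic": (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded": (3.6, 1.20, 2.5, 0.32),
}


def basic_cocomo(ksloc: float, mode: str = "organic") -> tuple:
    """Return (effort in person-months, schedule in months) for a size in KSLOC."""
    a, b, c, d = BASIC_COCOMO[mode]
    effort = a * ksloc ** b
    schedule = c * effort ** d
    return effort, schedule


for size in (10, 100, 1000):
    effort, schedule = basic_cocomo(size)
    print(f"{size:>5} KSLOC: {effort:8.1f} person-months, {schedule:5.1f} months")
```

Because the exponent b is greater than 1, estimated effort grows faster than size, which is one reason an error in the SLOC estimate carries through so strongly to the result.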


According to Andrew Tanenbaum, the SLOC values for various operating systems in Microsoft's Windows NT product line are as follows:

Year  Operating System      SLOC (Million)
1993  Windows NT 3.1        6 [citation needed]
1994  Windows NT 3.5        10 [citation needed]
1996  Windows NT 4.0        16 [citation needed]
2000  Windows 2000          29 [citation needed]
2001  Windows XP            40 [citation needed]
2005  Windows Vista Beta 2  50 [citation needed]

David A. Wheeler studied the Red Hat distribution of the GNU/Linux operating system, and reported that Red Hat Linux version 7.1 (released April 2001) contained over 30 million physical SLOC. He also determined that, had it been developed by conventional proprietary means, it would have required about 8,000 person-years of development effort and would have cost over $1 billion (in year 2000 U.S. dollars).


A similar study was later made of Debian GNU/Linux version 2.2 (also known as "Potato"); this version of GNU/Linux was originally released in August 2000. This study found that Debian GNU/Linux 2.2 included over 55 million SLOC, and if developed in a conventional proprietary way would have required 14,005 person-years and cost US$1.9 billion to develop. Later runs of the tools used report that the following release of Debian had 104 million SLOC, and as of 2005, the newest release is expected to include over 213 million SLOC.


Figures for other major operating systems are given below (the various Windows versions are presented in the table above):

Operating System     SLOC (Million)
Red Hat Linux 6.2    17 [citation needed]
Red Hat Linux 7.1    30 [citation needed]
Debian 2.2           55-59 [1][2]
Debian 3.0           104 [2]
Debian 3.1           215 [2]
Sun Solaris          7.5 [citation needed]
Mac OS X 10.4        86 [3]
Linux kernel 2.6.0   6.0 [citation needed]

In comparison, below are figures for various applications.

Graphics Program   SLOC (Million)
OpenOffice.org     ~10 [citation needed]
Blender 2.42       ~1 [citation needed]
GIMP v2.3.8        0.65 [citation needed]
Paint.NET 3.0      0.13 [citation needed]


SLOC and relation to security faults

The central enemy of reliability is complexity.

Geer et al

A number of experts have claimed a relationship between the number of lines of code in a program and the number of bugs that it contains. This relationship is not simple, since the number of errors per line of code varies greatly according to the language used, the type of quality assurance processes, and level of testing, but it does appear to exist. More importantly, the number of bugs in a program has been directly related to the number of security faults that are likely to be found in the program.


This has had a number of important implications for system security, and these can be seen reflected in operating system design. Firstly, more complex systems are likely to be more insecure simply due to the greater number of lines of code needed to develop them. For this reason, security-focused systems such as OpenBSD grow much more slowly than other systems such as Windows and Linux. A second idea, taken up in both OpenBSD and many Linux variants, is that separating code into different sections which run with different security environments (with or without special privileges, for example) ensures that the most security-critical segments are small and carefully audited.


Advantages

(a) Scope for Automation of Counting: Since a line of code is a physical entity, manual counting effort can easily be eliminated by automating the counting process. Small utilities may be developed for counting the LOC in a program. However, a code counting utility developed for a specific language cannot be used for other languages due to the syntactical and structural differences among languages.


(b) An Intuitive Metric: Line of Code serves as an intuitive metric for measuring the size of software because it can be seen and its effect can be visualized. A Function Point is more of an objective metric that cannot be imagined as a physical entity; it exists only in the logical space. LOC therefore comes in handy for expressing the size of software among programmers with low levels of experience.


Disadvantages

Measuring programming progress by lines of code is like measuring aircraft building progress by weight.

Bill Gates

(a) Lack of Accountability: The lines-of-code measure suffers from some fundamental problems. Some think it isn't useful to measure the productivity of a project using only results from the coding phase, which usually accounts for only 30% to 35% of the overall effort.


(b) Lack of Cohesion with Functionality: Though experiments have repeatedly confirmed that effort is highly correlated with LOC, functionality is less well correlated with LOC. That is, skilled developers may be able to develop the same functionality with far less code, so one program with less LOC may exhibit more functionality than another similar program. In particular, LOC is a poor productivity measure of individuals, since a developer can develop only a few lines and still be more productive than a developer creating more lines of code.


(c) Adverse Impact on Estimation: As a consequence of the fact presented under point (a), estimates based on lines of code can go badly wrong.


(d) Developer's Experience: The implementation of a specific piece of logic differs with the level of experience of the developer. Hence, the number of lines of code differs from person to person. An experienced developer may implement certain functionality in fewer lines of code than a less experienced developer, even though they use the same language.


(e) Difference in Languages: Consider two applications that provide the same functionality (screens, reports, databases). One of the applications is written in C++ and the other is written in a language like COBOL. The number of function points would be exactly the same, but aspects of the application would be different. The lines of code needed to develop the application would certainly not be the same. As a consequence, the amount of effort required to develop the application would be different (hours per function point). Unlike Lines of Code, the number of Function Points will remain constant.


(f) Advent of GUI Tools: With the advent of GUI-based languages and tools such as Visual Basic, much development work is done by drag-and-drop and a few mouse clicks, and the programmer writes virtually no code most of the time. It is not possible to account for the code that is automatically generated in this case. This difference invites huge variations in productivity and other metrics across languages, making Lines of Code less and less relevant in the context of GUI-based languages and tools, which are prominent in the present software development arena.


(g) Problems with Multiple Languages: Software today is often developed in more than one language. Very often, a number of languages are employed depending on the complexity and requirements. Tracking and reporting of productivity and defect rates pose a serious problem in this case, since defects cannot be attributed to a particular language once the system has been integrated. Function Point stands out as the best measure of size in this case.


(h) Lack of Counting Standards: There is no standard definition of what a line of code is. Do comments count? Are data declarations included? What happens if a statement extends over several lines? These are the questions that often arise. Though organizations like SEI and IEEE have published some guidelines in an attempt to standardize counting, it is difficult to put these into practice, especially as new languages are introduced every year.


Related terms

  • KLOC: 1,000 lines of code
  • KDLOC: 1,000 delivered lines of code
  • KSLOC: 1,000 source lines of code
  • MLOC: 1,000,000 lines of code
  • GLOC: 1,000,000,000 lines of code
  • TLOC: 1,000,000,000,000 lines of code


Programs for counting lines of code

There are many applications available for programmatically counting lines of code within source. Requirements for a source code metric tool should include the ability to process many source code languages and be operating system independent. Companies that use one tool for C on Windows and another tool for C on UNIX and a third tool for Java on Linux do not develop a common estimation basis for their CMMI metrics.


Free

  • The simplest source line counting command in UNIX variants is wc. For example, to count the number of lines in all .cxx, .cpp, .h, and .c files in and below the current directory (using GNU find, whose default regular expression syntax requires the backslashes), use:
 find . -regex '.*\.\(c\|h\|cxx\|cpp\)' -print0 | xargs -0 cat | wc -l
  • JCounter is a free open source Java program designed to be extensible and provide many statistics of source files. It is able to support any language or change to logical line counts by implementing a simple interface. Currently it supports Java and C++. JCounter can be found at the JCounter website
  • LocMetrics is a free Windows tool for counting lines of C#, Java, or C++ code.
  • K-LOC Calculator is another free Windows tool for counting physical lines.
  • SLOCCount, a tool available under GPL for counting lines of code for more than two dozen programming languages
  • CCCC, another tool available under GPL which analyzes C++ and Java files and generates a report on various metrics of the code. Metrics supported include lines of code, McCabe's complexity and metrics proposed by Chidamber&Kemerer and Henry&Kafura.
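For readers without find and wc at hand, the same raw count can be approximated in Python using only the standard library. This is a sketch rather than a replacement for the tools listed above: the function name `count_lines` and the default extension list are assumptions, and like wc -l it counts newline characters, so it makes no distinction between code, comments, and blank lines.

```python
import os

# Extensions from the wc example above; adjust as needed.
EXTENSIONS = (".c", ".h", ".cxx", ".cpp")


def count_lines(root: str = ".", extensions: tuple = EXTENSIONS) -> int:
    """Count newline characters in all matching files in and below `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                # Read as bytes so that unusual encodings cannot raise errors.
                with open(os.path.join(dirpath, name), "rb") as handle:
                    total += handle.read().count(b"\n")
    return total


if __name__ == "__main__":
    print(count_lines("."))
```

Reading each file whole keeps the sketch short; for very large files, reading in chunks would use less memory.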


Commercial

  • EZ-Metrix is a commercial web-based source code counting utility that measures more than 75 different languages, and compares two file lists to quantify differences (i.e., new, modified, deleted, unmodified).
  • Resource Standard Metrics is a commercial tool designed to process ANSI C, ANSI C++, C#, and Java 2.0+ while operating on Windows, UNIX, Linux and Mac OS X.
  • A Windows program for counting KLOCs in source files is available at [1].
  • Code Counter Pro is another program for Windows; it counts physical KLOCs and supports languages like C, C++, C#, Java, Cobol, Delphi, VB, ASP, PHP and Fortran.


Cultural references

In the PBS documentary Triumph of the Nerds, Microsoft executive Steve Ballmer criticized the practice of counting lines of code:

In IBM there's a religion in software that says you have to count K-LOCs, and a K-LOC is a thousand lines of code. How big a project is it? Oh, it's sort of a 10K-LOC project. This is a 20K-LOCer. And this is 50K-LOCs. And IBM wanted to sort of make it the religion about how we got paid. How much money we made off OS/2, how much they did. How many K-LOCs did you do? And we kept trying to convince them - hey, if we have - a developer's got a good idea and he can get something done in 4K-LOCs instead of 20K-LOCs, should we make less money? Because he's made something smaller and faster, less K-LOC. K-LOCs, K-LOCs, that's the methodology. Ugh! Anyway, that always makes my back just crinkle up at the thought of the whole thing.

References

  1. González-Barahona, Jesús M., Miguel A. Ortuño Pérez, Pedro de las Heras Quirós, José Centeno González, and Vicente Matellán Olivera. Counting potatoes: the size of Debian 2.2. debian.org. Retrieved on 2003-08-12.
  2. Robles, Gregorio. Debian Counting. Retrieved on 2007-02-16.
  3. Jobs, Steve (August 2006). Live from WWDC 2006: Steve Jobs Keynote. Retrieved on 2007-02-16. “86 million lines of source code that was ported to run on an entirely new architecture with zero hiccups.”


The Wikipedia article included on this page is licensed under the GFDL.