An empirical analysis of economic returns to open source participation, I Hann, J Roberts, S Slaughter

Tags: contributors, software development, the Apache, participation, human capital, contributor, rank, open source projects, labor economics, developer, Apache Portable Runtime, Apache, ASF members, Apache Software Foundation, committee member, project management, Software Engineer, software source code, programmers, productive capacity, open source community, Eric Raymond, open source movement, open source project, Apache web server, development projects, source code contributions, project committee, open source programmers, committee members, ASF Board, Apache project, open source software, signaling, null hypothesis, signal, software domains, serial correlation, software defects, Senior Software Engineer, Carnegie Mellon University, Journal of Political Economy, Apache community
Content: AN EMPIRICAL ANALYSIS OF ECONOMIC RETURNS TO OPEN SOURCE PARTICIPATION* IL-HORN HANN JEFF ROBERTS SANDRA SLAUGHTER ROY FIELDING Relying on volunteer labor, open source projects like the Apache web server create commercial quality software. Why developers contribute freely without direct remuneration has been widely debated. We offer empirical evidence that such participation can be explained by existing theories in labor economics. Analyzing panel data covering a four-year period, we find that increases in human capital, measured as project contribution, do not lead to increased wages. In contrast, credentials earned through a merit-based ranking system are associated with significantly increased wages. Our results suggest that status within an open source meritocracy operates as a credible signal of productive capacity. * We thank the open source programmers who have contributed to this study. We also thank the participants of the session on "Economics of Open Source Software" at the 2004 Annual Meeting of the American Economic Association in San Diego, the participants of the conference "Open Source Software: Economics, Law and Policy" in Toulouse, France organized by Institut d'Economie Industrielle (IDEI) and the Center for Economic Policy Research (CEPR), and Rebecca Hann for their valuable comments. This research has been generously supported by a Faculty Development Grant from Carnegie Mellon University and a Doctoral Student Research Grant from the Carnegie Bosch Institute at Carnegie Mellon University.
I. INTRODUCTION Open source software, i.e., public software development projects where participants can read, modify, and redistribute the software source code [OSI 2001], is arguably one of the most exciting phenomena in the information technology industry today. Open source programmers have contributed industrial-strength software that has been recognized as a viable competitor in several product domains, including operating systems and web-server software. The open source movement has proliferated to other software domains, ranging from software development tools to office applications, and even computer games. One widely debated question is why open source programmers contribute voluntarily, thereby forgoing any direct remuneration that they could accrue while working on a commercial system. Often-quoted individual-level motivations for participating in open source development projects cover a broad spectrum, including scratching a "personal itch" with respect to software functionality, enjoyment, and the desire to be "part of a team" [Ghosh 1998; Raymond 1999a; O'Reilly 2000]. Others liken the open source community to a gift culture where the status of a participant depends on "what he gives away" [Raymond 1999b]. More recently, Lerner and Tirole [2002] opined that open source participation can, in part, be explained using economic theories. They argue that open source participation yields two types of rewards: immediate rewards that ensue from the increase in productivity (less the opportunity cost of time), and delayed rewards relating to various career concerns such as one's future marketability. For the latter, a participant motivated by career concerns has an incentive to signal his or her abilities to the labor market. This signaling incentive is likely to be stronger when performance is more visible to the relevant audience and when performance is informative about talent [Lerner and Tirole 2002].
Such an incentive is particularly relevant in the information technology industry for two reasons. First, programming is often viewed as more of an art than a skill [Weinberg 1998]. Good programming is not confined to learning the syntax and specific features of a programming language and the practice of good documentation.
Productive programmers are believed to have a certain aptitude that allows them to proceed logically from problem to solution, and in the process derive the most efficient and general software design possible. Subsequently, he or she has to take the lead in propagating the software design to co-workers and in sharing sufficient insights such that the co-workers in turn can be productive. Hence, it has been documented in the software engineering literature that the productivity of a "star" programmer is an order of magnitude greater than that of an average programmer.1 Second, the inability to formalize characteristics of highly productive programmers makes the programming process very difficult to evaluate. Not surprisingly, it often proves to be a challenge for employers to evaluate the majority of programmers [Kirsch 1996].2,3 In addition to signaling imperfectly observable abilities, participating in open source projects has the potential to increase a contributor's human capital. In open source projects, contributors can select both the problem they want to attack and the implementation approach or solution. Once implementation is complete, other contributors provide timely feedback on the solution, ranging from identification of software defects (bugs) to suggestions on how to improve the submitted software code [Raymond 1999a].4 Hence, contributing to open source projects can be seen as learning experiences that increase the programmer's knowledge. Inasmuch as this knowledge is transferable, open source participation increases the contributor's human capital. In this paper we empirically investigate whether open source participation is consistent with theories in labor economics. 
As we have noted, contributing to open source projects can potentially be beneficial to contributors in two ways: (i) participation enhances existing skills or provides opportunities to gain new experience and hence makes contributors more valuable to employers and (ii) participation signals contributors' imperfectly observable productive characteristics to their employers. In order to measure the first benefit, we proxy the knowledge gained by contributors directly from the volume of their software source code submissions. To measure the second benefit, we exploit a unique setting of a specific open source project (the Apache Software Foundation) that ranks its members based on merit.5
From a signaling perspective, we maintain that certain abilities such as software design understanding and project leadership skills are endowments that are often difficult to evaluate. However, an open source community affords a venue in which such abilities can be discerned by highly skilled peers and rewarded with a higher rank. Using panel data collected on open source contributors and a fixed-effect specification of the standard wage equation to isolate contributors' time invariant qualities, we distinguish the learning effect and the signaling effect from time invariant characteristics such as intelligence in explaining participation. We find that, in the context of the Apache open source projects, greater open source experience, as measured in contributions made, does not result in wage increases for contributors. This suggests that employers do not reward the gain in experience through open source participation as an increase in human capital. On the other hand, achieving a higher status in the merit-based ranking within the Apache open source community is associated with a 13-27% increase in wages, depending on the rank attained. Our results are consistent with the notion that a high rank within the Apache Software Foundation is a credible signal of the productive capacity of a programmer. In the next section we present the rationale for open source participation. Section 3 describes our data setting and sources, the Apache projects, as well as the organization that governs the development process, the Apache Software Foundation. In Section 4, we develop the model used to estimate the delayed returns on open source participation and present the results. We conclude in Section 5. II. EXPLAINING OPEN SOURCE PARTICIPATION Initial explanations for the motivations of programmers to contribute to open source projects have been based on either increased use value of the software or concepts deriving from social psychological or cultural motivations. 
Eric Raymond, an evangelist of the open source movement, popularized social psychological or cultural explanations of open source participation. In the cultural view, the open source community's truly valuable and protection-worthy property is the ownership of ideas or programming projects. Given the abundance of resources, i.e., computing power, bandwidth, and disk space, social
status is determined not by what one has, but by what one gives away. This leads to a gift culture in which the reputation of a programmer is primarily determined by his or her free contributions [Raymond 1999b]. As a second explanation of participation, Raymond offers a "craftsmanship" model where the artisan aspects of programming motivate developers to create works to be admired not only by themselves but also by others. In both situations developers are motivated through the recognition of their contributions by their peers. These explanations find theoretical support in social psychology [Mauss 1967; Clary et al. 1998]. Economists have offered alternative explanations based on labor economics. Two possible views can be articulated: a human capital explanation and a signaling or sorting explanation. Human capital explanations for the value of open source participation are straightforward: participation allows developers to gain marketable technical skills [Becker 1962; Blaug 1976]. An explanation for open source participation consistent with human capital theory would maintain that open source participation is an investment in training that leads to higher earnings in the future. As an investment, the choice to participate depends upon two considerations. First, the individual considers the opportunity cost associated with participation, and second, the individual considers the expected earnings in the job market after participation. Human capital theory predicts that the greater the investment, the greater the return. Therefore, higher earnings should be correlated with higher levels of open source participation. While attainment of a skill may be an important result of participation, proponents of a signaling or sorting theory of labor markets argue that participation serves as a signal of imperfectly observable productive capacities to current and future employers [Weiss 1995]. 
Given a distribution of inherent productivity among potential open source participants, the more productive developers would like to signal their superior productivity to employers [Spence 1973]. As we noted in the introduction, this is especially applicable in the context of software development productivity. One study of "star" programmers, for instance, found that the top 1 percent were 1,272 percent more efficient than the average [Goleman 1998]. At the same time, due to the nature of programming activities, it might be
difficult for a programmer to convey fully his or her productive capacities. Programming, as a task, requires significant autonomy and creativity where the behaviors that transform inputs to outputs may not be well understood by management [Kirsch 1996]. While it might be relatively easy to identify the "star" programmers, it is much more difficult to identify above average programmers who have a good understanding of the problem and often develop an efficient solution for the problem at hand. Further, the level of code contributions per se might not be the best indicator of productive capacity. Open source projects represent very large-scale, distributed development projects involving thousands of contributions from hundreds of developers [Mockus et al. 2000; O'Reilly 2000]. High ability contributors typically make many submissions to the code base, but it could be the depth of their understanding, the efficient design of the solution, and their ability to persuade, to get people "on board" with their ideas and strategies that represent the true quality of their contribution [Moon and Sproull 2002; von Krogh et al. 2003]. While possible, as a practical matter it is difficult for employers to efficiently evaluate these qualities based on individual code contributions. It seems reasonable then that employers seek a reliable proxy that is correlated with these desirable characteristics indicative of or obtained through successful open source participation. If potential employers can use open source participation as a signaling mechanism, then the existence of a "credential" or observable measure of successful participation would allow them to make inferences about a developer's productive capacity. In so far as open source participation is correlated with some desirable trait such as ability or motivation, it can be used by either employers to screen potential employees or by applicants to signal these desirable traits. 
It is important to point out that some of these economic and "non-economic" explanations overlap. For example, a desire for a higher status among peers may be a strong incentive to contribute while serving as a signal to employers. However, as observed by Lerner and Tirole [2002], explaining participation solely by social or cultural factors remains a puzzle for several reasons. First, one could expect to reap similar benefits as part of a commercial software development effort, obviating the need to participate in an open source project. Second, it is not clear why such noble behavior would
be limited to the field of software development. Moreover, a separation of these motives is, for our purposes, not necessary. As Spence [1974] states: "A signal is a manipulable attribute or activity which conveys information ... in general it is not necessary to insist that the actor, in manipulating the attribute, think of himself as signaling or conveying information." III. DATA SETTING AND SOURCES Our data set combines several primary data sources, including archival data from large open source software projects and two targeted surveys of open source participants. We briefly describe the setting of the data collection, each data source, and our measures of key variables, with supplementary details in the Appendix. A. Context: Apache Software Foundation Open Source Software Projects We investigated three major open source projects under the control of the Apache Software Foundation (ASF). The ASF projects enjoy wide acceptance both in the marketplace and in the open source software development community. The Apache web-server and its derivatives maintain a dominant 64% share of the web-server market [Netcraft 2003]. Similarly, the ASF projects consistently attract and retain the large number of participants vital for open source project success [von Krogh et al. 2003]. For example, for the years 1998 through 2002 the three ASF projects in this study incorporated over 100,000 changes from more than 1,300 different open source developers. The ASF is a not-for-profit corporation that provides the legal, organizational and financial infrastructure for the software projects gathered under the ASF open-source umbrella. Each of the ASF projects operates autonomously, including in all aspects of product development. ASF projects are characterized by a "collaborative, consensus-based development process, an open and pragmatic software license, and a desire to create high quality software that leads the way in its field" [Apache 2002].
Membership in the ASF is by invitation only and is based on a strict meritocracy. The ASF encompasses a number of subprojects related to the development of a full-featured web-server product offering. We studied the largest and most significant of these projects, including the
Apache server project, which is a freely available source code implementation of an HTTP (Web) server and is the project around which the Apache Group initially formed; the Jakarta project, which consists of more than 18 Apache-related server-side Java subprojects; and the XML project, which includes over nine Apache XML-related subprojects. The Apache context is particularly well suited for an examination of economic returns to participation in open source development. A common characteristic of open source projects is the presence of a strong project leader [Raymond 1999a]. Apache, however, is unique among open source projects in this regard. Since its inception the Apache project has operated under a model of shared leadership and responsibility. This model of shared responsibility is reflected in the principles of the meritocracy that define advancement within the ASF [Fielding 1999]. As a meritocracy, status, responsibility, and benefits are commensurate with contribution. There are five observable levels of recognition or rank within the ASF. In order of increasing status, these are developer, committer, project management committee member, ASF member, and ASF board member. As expected within a merit-based structure, the number of participants at any status level decreases with rank. In all cases, advancement is in recognition of an individual's commitment and contributions to an Apache project. This hierarchy within the ASF makes the Apache projects particularly appropriate for an evaluation of open source participation. As observed by Tyler et al. [2000], data for identifying economic returns to a variable serving as a signal in labor markets should contain exogenous variation in the signal status among individuals with similar levels of human capital. Participants in ASF projects possess such a variable or credential: their rank within the ASF.
Anecdotal evidence suggests that both open source participants and information technology labor markets value information regarding productive capacity revealed via open source participation. First, personal conversations between the authors and several participants of successful open source projects6 revealed the participants' awareness of career related impacts of open source participation. In addition, resumes of Apache contributors prominently mention declarations of contributions, specific technical
accomplishments and, most interestingly, significant Apache project management responsibilities, as well as status as an ASF member or ASF board member.7 Also listed in the Appendix are several representative job postings requiring Apache specific experience. It is important to note that software development involves skills that can often be transferred to other software projects [Basili and Caldiera 1995; Statz 1999]. If employers are aware that successful open source participation is correlated with some desirable, and somewhat fungible, trait(s) then a reliable "signal" of success such as rank within the ASF could serve as a sorting mechanism in the labor markets. Again, anecdotal evidence from recruiters suggests that this is the case (see Appendix). Individual reasons for initial participation in any Apache project vary. Typical reasons cited include reporting a problem or "bug", or fixing a problem in the software that has become a nuisance or impairs usage. Another reason is to extend existing functionality or to add new features required by the user or the user's organization. For many contributors there is a single encounter with the project. Some developers, however, choose a deeper level of involvement and continue to make contributions. If developers' contributions are significant and consistent over a period of time they may be nominated for an increase in rank from developer to "committer." This promotion is an important advancement within the Apache community; it signifies that this contributor has obtained the privilege to submit code changes directly to the source code repository. All participants of rank developer have to submit their code for review by a committer before it is accepted. An existing ASF member may nominate committers who continue their involvement in the project for ASF membership. ASF membership is largely a matter of recognition and carries with it a certain prestige in the Apache community [Fielding 1999]. 
ASF members are eligible to be nominated by the ASF Board of Directors or to serve on a project committee. Project committee members are responsible for all aspects of managing an Apache subproject including project plans and roadmaps, release schedules, etc. The ASF Board of Directors makes decisions regarding corporate governance as well as decisions regarding the addition of new projects under the ASF organizational umbrella.
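As an illustration only, the five-level hierarchy described in this subsection can be represented as an ordered type, so that "higher rank" becomes a simple comparison. The sketch below is not part of the study's tooling: the names `Rank` and `rank_on`, and the sample promotion dates, are hypothetical.

```python
from datetime import date
from enum import IntEnum


class Rank(IntEnum):
    """The five ASF ranks, in order of increasing status."""
    DEVELOPER = 1
    COMMITTER = 2
    PMC_MEMBER = 3      # project management committee member
    ASF_MEMBER = 4
    BOARD_MEMBER = 5


def rank_on(promotions, when):
    """Return the highest rank attained on or before `when`.

    `promotions` maps each Rank to the date it was attained.
    Returns None for a latent contributor (no rank attained yet).
    """
    held = [rank for rank, attained in promotions.items() if attained <= when]
    return max(held) if held else None


# Hypothetical career of one contributor (dates invented for illustration).
career = {
    Rank.DEVELOPER: date(1999, 2, 1),
    Rank.COMMITTER: date(2000, 7, 15),
    Rank.ASF_MEMBER: date(2002, 5, 1),
}

for year in range(1998, 2003):
    print(year, rank_on(career, date(year, 12, 31)))
```

Using an integer-valued enumeration keeps the ordering explicit and makes it straightforward to derive year-end rank indicators for a panel.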
B. Data Source: Archival Data on Apache Participants' Rank and Contributions One of the basic tenets of open source software is that the development process and resulting products are "open" and freely available. Fundamentally, these projects represent large-scale publicly distributed software development processes. As such, and in keeping with free and open access, all open source work products are placed in the public domain under various "free software" licensing arrangements. Apart from the source and binary codes of the actual programs, Apache products include developer web sites, change logs, documentation, and developer communications in the form of email archives. From these products, we extracted two types of information: information pertaining to each individual's progression along the Apache career path, and information about each individual's source code contributions to the project. We discuss each in turn. Our primary interest is the construction of an Apache career path for each contributor. Our objective is to capture upward progression within the five levels of the ASF meritocracy, i.e., developer, committer, project management committee member, ASF member, and ASF board member. Before actually joining an open source project, potential developers typically observe the "lay of the land" regarding both the form and substance of project membership [von Krogh et al. 2003]. In our study, a participant's first source code contribution is considered a consummation of the joining decision, signaling entry into the meritocracy and the beginning of one's Apache career. During the time period prior to making an actual contribution, any latent contributors are considered to be outside the Apache meritocracy. 
To determine each individual's Apache career progression we used archival data from three sources: the Apache developer web site, contribution meta-data from the Concurrent Versions System (CVS) revision control software8, and minutes from the Apache Board of Directors meetings. Each Apache subproject maintains a separate developer website that includes a list of contributors and project management committee members. By observing changes to these files over time we are able to construct a time line for the promotion of individuals within each project. Progression up the Apache career ladder is captured as a series of discrete transitions. The first transition is from latent contributor to an Apache developer. The second transition occurs when a developer is granted commit access to the project source code
archives and thus achieves the status of committer.9 Both of these changes in rank are derived from CVS meta-data. We observe initial contributions, and hence when developer status begins. Similarly, the transition from developer to committer occurs when contributors first exercise their newly acquired CVS commit privileges. Elevation from committer to the rank of project management committee member may occur either as a "field promotion" via a consensus among existing committee members or by appointment from the ASF Board. Attaining ASF membership requires nomination by an existing ASF member and election by secret ballot. New members are announced each year at the annual Board of Directors meeting. Thus, in our panel, the transition to either project management committee member or to ASF member occurs on the date of the announcement. Lastly, the rank of ASF board member is achieved via election by existing ASF members. Transition to this rank occurs on the date of recognition of new board members as recorded in board meeting minutes. To extract information regarding individual contributions from the data, we developed tools to mine contribution information at the level of the individual developer. A submission to an open source project is known as a "patch." Patches are analogous to modification requests in traditional software development environments. Unlike modification requests in traditional environments, however, patches result from largely random developer submissions and have no formal designation or means of tracking. Our research follows the method used by Mockus et al. [2000] to reconstruct patches from source code archives. For each patch we extracted and retained common software metrics including lines of code added and deleted, the date of submission, the names and number of source code files affected by the change, change log entries, and the list of patch authors. We constructed a longitudinal data set of participant contributions by year. 
The longitudinal contribution data encompassed contributions made and accepted into any of our three target Apache projects. Data collection was completed in January 2003 and included all contributions from January 1, 1998 through December 31, 2002.
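The aggregation of reconstructed patches into a contributor-by-year data set can be sketched as follows. This is a minimal illustration under stated assumptions, not the tools used in the study: the `Patch` record, its field names, the restriction to a single author per patch, and the sample data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Patch:
    author: str         # contributor identifier (hypothetical; one author per patch)
    submitted: date     # date of submission
    lines_added: int
    lines_deleted: int


def contributions_by_year(patches):
    """Aggregate patches into {(author, year): (n_patches, added, deleted)}."""
    totals = defaultdict(lambda: [0, 0, 0])
    for p in patches:
        cell = totals[(p.author, p.submitted.year)]
        cell[0] += 1
        cell[1] += p.lines_added
        cell[2] += p.lines_deleted
    return {key: tuple(values) for key, values in totals.items()}


def cumulative_patches(panel, author, years):
    """Running count of accepted patches for one author over `years`."""
    running, out = 0, {}
    for year in years:
        running += panel.get((author, year), (0, 0, 0))[0]
        out[year] = running
    return out


# Invented sample data for illustration.
patches = [
    Patch("dev_a", date(1999, 3, 1), 40, 5),
    Patch("dev_a", date(1999, 9, 12), 10, 2),
    Patch("dev_a", date(2001, 1, 20), 7, 7),
    Patch("dev_b", date(2002, 6, 30), 120, 0),
]
panel = contributions_by_year(patches)
print(panel[("dev_a", 1999)])
print(cumulative_patches(panel, "dev_a", range(1998, 2003)))
```

Years without any accepted patch contribute zero, so the cumulative count stays flat in those years rather than dropping out of the panel.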
C. Data Source: Survey Data on Apache Participants' Demography and Job History To augment the longitudinal contribution data set outlined above, we collected demographic and job history data in two waves. Two secure web-based surveys of Apache contributors were conducted for this purpose. Of primary interest in each survey was the respondent's wages for the current and prior year. Dr. Roy Fielding, then chairman of the ASF, introduced the first survey to 1,301 uniquely identified contributors via e-mail in November 2001. Two hundred thirty-three e-mail invitations were undeliverable. Of the remaining 1,068 contributors, 325 completed the instrument, yielding a response rate of 30%. The second wave involved the 237 respondents from the first survey who agreed to participate in another round of data collection. The second survey was introduced via e-mail in January 2003. Eleven e-mail invitations were undeliverable. Of the remaining 226 contributors, 122 completed the instrument, yielding a response rate of 54%. As discussed earlier, ASF projects are globally distributed software development projects. As such, our sample includes wages reported in 25 different currencies. Information technology labor market characteristics can vary widely between countries [Schreyer and Pilat 2002]. An econometric model that accounts for these cross-national differences would be difficult to implement, requiring a number of currency- and purchasing-power-related transformations. In addition, wage changes due to moves from one country to another pose another difficulty. Accordingly, we do not attempt to account for cross-national differences here. Rather, we limit our analysis to respondents who earned income in U.S. dollars. Lastly, a temporal relationship between open source participation and wages would prescribe a model involving lagged independent variables.
Given this functional form, we retain only those cross-sections where both the dependent and independent variables result from a common labor market experience, viz. the U.S. Applying the above constraints yields a cross-sectional time series panel of 147 cross-sections (individual respondents) each having at least two years of reported wages for a total of 360 observations for any of the years 1999 through 2002.
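The sample arithmetic reported for the two survey waves and the resulting panel can be checked with a few lines of Python. The figures are those stated in the text; the function name `response_rate` is merely illustrative.

```python
def response_rate(invited, undeliverable, completed):
    """Return (deliverable invitations, completed / deliverable)."""
    reachable = invited - undeliverable
    return reachable, completed / reachable


# Wave 1: 1,301 invitations, 233 undeliverable, 325 completed.
wave1_reachable, wave1_rate = response_rate(1301, 233, 325)
# Wave 2: 237 invitations, 11 undeliverable, 122 completed.
wave2_reachable, wave2_rate = response_rate(237, 11, 122)
# Final panel: 360 wage observations over 147 respondents.
obs_per_respondent = 360 / 147

print(wave1_reachable, round(100 * wave1_rate))
print(wave2_reachable, round(100 * wave2_rate))
print(round(obs_per_respondent, 2))
```

The last figure shows that the panel is unbalanced: respondents contribute between two and four wage observations each, with an average of about 2.45.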
Participant rank within the ASF plays a critical role in the analysis and interpretation of results that follow. To discern whether the distribution of rank in our sample is comparable to that of the population, we performed several non-parametric tests of location, scale and empirical distribution. Specifically, we employed the Kolmogorov-Smirnov and Mann-Whitney U tests to compare Apache rank in our sample to that of the Apache contributor population. Both tests evaluate the hypothesis that the rank of respondents and non-respondents are drawn from the same underlying population. The results of these tests suggest that these two groups are indeed drawn from the same underlying population, as we fail to reject this hypothesis (Mann-Whitney statistic=10,527 (p = .14); Kolmogorov-Smirnov statistic=.62 (p=.83)). In addition, we also compared the Apache career path of participants in our sample with the Apache career paths of the overall Apache population. Table I shows the observed patterns of rank progression for our respondents over the period covered by the panel. Table II shows all observed patterns of ASF advancement for the overall Apache population. Of the 147 respondents in our sample, a majority (119) reported wages for at least one time period prior to their involvement with the Apache project. Indeed, the most commonly occurring pattern (49 occurrences) is one where respondents begin their Apache careers in the final observation period. Further, it is quite common for an open source contributor to "plateau" at the rank of developer. Thus, the majority of respondents make very infrequent contributions to the project perhaps in response to issues that directly affect their work.10 The remaining roughly 20% of our respondents display contribution patterns demonstrating varying degrees of promotion within the meritocracy. Of these remaining respondents, 20 plateau at the rank of committer. The remaining 12 attain the rank of project committee member or higher. 
A similar distribution of advancement patterns can be observed from the 1,301 contributors in the overall Apache population.
*** insert here - Table I - Patterns of Rank Progression in Sample ***
*** insert here - Table II - Patterns of Rank Progression in Population ***
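For readers unfamiliar with the two nonparametric comparisons used in this section, the test statistics themselves can be sketched in plain Python. This is an illustration under stated assumptions: ranks are coded 1 (developer) through 5 (board member), the two samples are invented, and only the raw statistics are computed (no p-values, and no scaling of the Kolmogorov-Smirnov statistic), so the numbers are not comparable to those reported in the text.

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U for sample x: the number of pairs (a, b), a in x and
    b in y, with a > b, counting ties as one half."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
    between the two empirical distribution functions."""
    d = 0.0
    for t in sorted(set(x) | set(y)):
        fx = sum(1 for a in x if a <= t) / len(x)
        fy = sum(1 for b in y if b <= t) / len(y)
        d = max(d, abs(fx - fy))
    return d


# Invented rank codes (1 = developer ... 5 = board member).
respondents = [1, 1, 1, 2, 3]
non_respondents = [1, 1, 2, 2, 1, 4]

print(mann_whitney_u(respondents, non_respondents))
print(ks_statistic(respondents, non_respondents))
```

The two U statistics (one per sample) always sum to the product of the sample sizes, which gives a quick consistency check on any implementation.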
We compared the distribution of rank patterns over the period covered by our panel for our respondents and the population of Apache participants. We tested the equivalence of the location and scale of rank patterns across these two groups using the Mann-Whitney U test. In addition, we tested whether the distribution of rank patterns is the same across these groups using the Kolmogorov-Smirnov empirical distribution function statistic. In both cases, the results indicate that the rank patterns of our respondents and non-respondents are drawn from the same underlying population (Mann-Whitney statistic = 90,418, p = .27; Kolmogorov-Smirnov statistic = .76, p = .62).

Lastly, we examined whether there is selectivity bias in our sample. Inference based on balanced or unbalanced panel data may be subject to bias if the non-response within the panel is endogenously determined [Heckman 1979]. We checked for selectivity bias using the variable addition tests outlined in Verbeek and Nijman [1992]. Following Verbeek and Nijman, we introduced three variables, none of which should enter the model with significant coefficients under the null hypothesis of no selectivity bias. The first variable is the number of observation waves in which respondent i participates. The second is a dummy variable equal to 1 if respondent i has observations in all waves. The third is a time-varying dummy variable indicating whether respondent i is observed in the previous period. The results of these tests showed no indication of selectivity bias in our sample, as none of the three test variable coefficients enters the model significantly.

IV. EMPIRICAL METHODOLOGY AND RESULTS

By combining the contribution data extracted from project archives with the job history data obtained from our surveys into a single panel, we can explore the relationship between open source participation and the change in wages over time. Our approach is to employ econometric models that take advantage of our repeated measures data to control for time-invariant participant endowments.

A. Model and Measures

Because we are interested in the nature of the relationship between open source participation and the market for information technology labor, the human capital model provides a natural structure for assessing the returns to open source software participation. Accordingly, we formulate essentially Mincerian wage models traditionally used to test the impact of education on log-earnings [Mincer 1974]. The basic models specify the dependent variable as log wages, and the independent variables include observed demographic characteristics such as schooling, experience, and other variables of interest to the researcher [Weiss 1995]. The general model is of the form:
(1)
$$
\mathrm{LWAGE}_i = \beta_0 + \beta_1 S_i + \beta_2 \mathrm{EXP}_i + \beta_3 \mathrm{EXPSQ}_i + \varepsilon_i
$$

where $\mathrm{LWAGE}_i$ is the natural logarithm of wages for individual $i$, $S_i$ represents years of schooling, $\mathrm{EXP}_i$ and $\mathrm{EXPSQ}_i$ represent labor market experience, and $\varepsilon_i$ is the standard disturbance term. Subsequently, this model has been extended to explore numerous other factors hypothesized to affect the wage relationship among individuals. Such extensions include gender [Kay and Hagan 1995], race [Belman and Heywood 1991], union status [Jakubson 1991], use of technology [Benjamin et al. 2002], individual social network factors [Pfeffer and Konrad 1991], military service [Goldberg and Warner 1987], academic course work [Kang and Bishop 1986], GED attainment [Murnane et al. 1999], and volunteerism [Day and Devlin 1998].

Following the literature in this regard, we extend the human capital model to include variables related to open source participation such as the participant's project contributions, Apache rank, job history, and other background information. In our setting, total wage is a function of accumulated Apache contributions, rank within the Apache Software Foundation, accumulated work experience, programming skills, education, firm size, firm type (publicly listed or private), firm industry, and job switch. Columns one and two of Table III list our model variables and descriptions.
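As a brief worked step for equation (1), the role of the two experience terms can be made explicit. This sketch assumes, as is standard in Mincerian models but not stated explicitly above, that EXPSQ is the square of EXP; the symbol $\mathrm{EXP}^{*}$ for the turning point is introduced here only for illustration.

```latex
\frac{\partial\, \mathrm{LWAGE}_i}{\partial\, \mathrm{EXP}_i}
  = \beta_2 + 2\beta_3\, \mathrm{EXP}_i ,
\qquad
\mathrm{EXP}^{*} = -\frac{\beta_2}{2\beta_3} .
```

Under that assumption, $\beta_2 > 0$ together with $\beta_3 < 0$ implies that log wages rise with experience at a decreasing rate until $\mathrm{EXP}^{*}$.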
*** insert here - Table III - Model Variables and Descriptive Statistics ***
The dependent variable, LWAGE, is the natural logarithm of the sum of each participant's annual wages and bonuses.11 To account for inflation, each year's wages are expressed in constant 1998 U.S. dollars. CNTRB is a measure of each participant's open source experience in terms of project contributions. In the human capital model, career experience is commonly measured as the length of time, typically in years, spent participating in one's vocation. In contrast, open source participation is a voluntary activity, and can be transitory or sporadic. If the relationship between open source participation and wages has a "job training" or human capital explanation, it is important that we find a reliable proxy for the experience garnered through open source participation. Learning by doing is increasingly recognized as an important factor influencing the relationship between experience and productivity or success in software development [Orlikowski 2002; Boh et al. 2003]. With experience, developers gain familiarity with the software application domain [Banker et al. 1998] and increase their understanding of the structure and architecture of the modules, files, and code within the system [Robillard 1999]. The experience in software development (the "doing") largely consists of authoring the software; that is writing lines of software code using a particular language.12 The number of lines of software code written or changed is a commonly used productivity metric in software development organizations [Humphrey 1995; Boehm et al. 2000]. Thus, we operationalize open source experience (CNTRB) as a participant's cumulative number of lines of code contributed and accepted by the Apache project.13 If CNTRB is a good proxy for the learning experience of an open source developer, we expect CNTRB to be positively correlated with LWAGE. 
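The construction of CNTRB and LWAGE described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the yearly figures, the price index values, and the helper names (`cumulative_contributions`, `lag`, `log_real_wage`) are hypothetical and are not taken from the study's data or code.

```python
import math
from itertools import accumulate

# Hypothetical yearly records for one contributor: lines of code accepted
# by the project, and nominal wages plus bonuses in U.S. dollars.
years = [1998, 1999, 2000, 2001]
lines_accepted = [0, 120, 300, 80]
nominal_pay = [60000.0, 66000.0, 75000.0, 78000.0]

# Hypothetical price index with 1998 as the base year (not real CPI figures).
price_index = {1998: 1.00, 1999: 1.02, 2000: 1.05, 2001: 1.08}


def cumulative_contributions(lines):
    """Running total of accepted lines of code (the CNTRB measure)."""
    return list(accumulate(lines))


def lag(values):
    """Shift a series by one period; the first period has no prior value."""
    return [None] + values[:-1]


def log_real_wage(pay, year):
    """Natural log of pay expressed in constant base-year (1998) dollars."""
    return math.log(pay / price_index[year])


cntrb = cumulative_contributions(lines_accepted)
cntrb_lagged = lag(cntrb)  # the value a wage equation with t-1 regressors would use
lwage = [log_real_wage(p, y) for p, y in zip(nominal_pay, years)]
```

In a panel regression the first period, whose lagged value is missing, would typically be dropped for that contributor.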
The dichotomous variables NORANK, DEV, COM, and PMC+ (collectively referred to as RANK) operationalize the observed levels of contributor rank naturally occurring within the Apache meritocracy, that is, latent contributor, developer, committer, and project management committee member or above, respectively. Promotion within the meritocracy is awarded after a positive peer review of one's tangible and intangible contributions to the project. RANK may then, in part, reflect sought-after (yet hard to
observe) traits valued by information technology labor markets, such as the depth of developers' understanding, their efficient designs, or their ability to persuade, to get people "on board" with their ideas and strategies. If Apache RANK is a signal of productive capacity in the open source environment, we would expect our RANK variables to be positively correlated with LWAGE.

The variable PDAPC is a qualitative variable that assumes the value of 1 when a participant's paying job in time period t involves contributing to any one of the Apache projects. Increasingly, companies are sponsoring employees to participate in open source projects that are seen as an integral part of the company's information technology strategy [IBM 1998]. Expectations regarding this variable are unclear. If the act of participation in an open source project is a desirable (i.e., valuable) outcome, then participants may be willing to forgo some amount of earnings for the opportunity to work on the project. In this case, the coefficient on PDAPC would be negative and significant. If, on the other hand, open source experience represents a specialized or rare skill, then employers could be expected to pay a premium for such skills. In this case, the coefficient on PDAPC would be positive and significant.

EXPR and LEDU are the traditional human capital variables. EXPR is the total number of years of work experience of a contributor at time t-1. Consistent with the human capital literature, we expect wages to increase with work experience, but the percentage increase to decline with higher work experience. Thus we expect EXPR to be positively correlated with wages, and EXSQ to be negatively correlated with wages. LEDU represents the number of years of schooling for a participant at time t-1.
Education is typically represented as time invariant in studies of human capital accumulation; however, the presence of students in our sample makes it possible to infer accurate levels of schooling within subject by tracking a respondent's declaration of full-time student status within each observation period. Schooling is often the variable of primary interest in studies of human capital, and returns to schooling are expected to be positive. STDNT, FPUB, and FSWIN are qualitative variables that assume the value of 1 if the participant is, respectively, a full-time student, works for a publicly traded firm, or works for a firm operating in the
software or e-commerce industry in period t and is 0 otherwise. Students are frequent contributors to open source software projects [Lakhami et al. 2002] and, ceteris paribus, we expect students to earn low wages. As a result, we expect STDNT to be negatively correlated with LWAGE. Firm characteristics, such as firm size and sector, have been shown to significantly affect the earnings of software developers [Ang et al. 2002]. Following the prior research, we expect participants working in publicly traded firms and in firms engaged in the production of software to have higher wages, all else equal. Lastly, the TIMEn variables are dichotomous controls representing the observation period in which we observe the dependent variable. These time variables capture any systematic changes in our data attributable to general economic conditions.

As shown in Table III, a majority of respondents (on average 69%) classify their occupation as technical in nature, such as "Software Engineer/Developer", and are employed by firms working in the software or e-commerce industries. Overall, data regarding participants' WAGE and EXPR are comparable to published accounts of salary and experience for software developers [Computerworld 2002], with participants reporting an average annual salary of $80,000 and six years of industry experience. As is common in technology industries [Watson 2000], we observe frequent job switching behavior over the course of our panel, with a high of 36% of respondents reporting that they switched jobs in 1998. Respondents are well educated, with over 50% holding college degrees, 23% holding master's degrees, and 9% holding Ph.D.s. The number of developers at the various levels of rank remains fairly constant over the period of our panel. As expected, the largest number of participants can be found at the rank of developer, the entry level or base of the ASF meritocracy, while the higher status levels of rank (COM and PMC+) contain fewer members.
B. Estimation and Results

A common concern in the human capital literature is the potential correlation of some unobserved person-specific variable, say $u_i$, with one or more of the regressors. If we assume that $u_i$ contains some time-invariant heritable characteristic, such as intelligence, and to the extent that $u_i$ is correlated with one
of the other regressors, both OLS and GLS will yield biased and inconsistent parameter estimates [Chamberlain 1984]. In the present case, our concerns are focused on accounting for unobserved skill or quality differences across contributors. Potential quality differences include inherent programming and design capabilities, the ability to succinctly explain complex technical issues, or the ability to self-motivate and work in an unstructured, often chaotic, environment. Measures of work or programming experience may not adequately reflect such skills. Ideally, one would like to directly control for such individual effects, and indeed this is a goal of many empirical studies of human capital [Taubman and Wales 1973]. If, however, we assume that such abilities are rooted in the individual, and thus constant over time, then a fixed-effect (FE) model solves the omitted variables problem. By differencing away time-invariant variables, whether observed or unobserved, the FE model produces consistent parameter estimates, purged of heritable individual effects. Hence, we make use of our cross-sectional time-series (panel) data to fit an FE regression model to explore the relationship between open source participation and wages over time. We estimate the returns to open source participation using the following equation:
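(2)
$$
\begin{aligned}
\mathrm{LWAGE}_{i,t} = {} & \alpha_i + \beta_1 \mathrm{CNTRB}_{i,t-1} + \beta_2 \mathrm{DEV}_{i,t-1} + \beta_3 \mathrm{COM}_{i,t-1} + \beta_4 \mathrm{PMC{+}}_{i,t-1} + \beta_5 \mathrm{EXPR}_{i,t-1} \\
& + \beta_6 \mathrm{EXSQ}_{i,t-1} + \beta_7 \mathrm{LEDU}_{i,t-1} + \beta_8 \mathrm{JSWCH}_{i,t} + \beta_9 \mathrm{FPUB}_{i,t} + \beta_{10} \mathrm{FSWIN}_{i,t} + \beta_{11} \mathrm{STDNT}_{i,t} \\
& + \beta_{12} \mathrm{PDAPC}_{i,t} + \beta_{13\text{-}15} (\mathrm{TIME}n)_{i,t} + \varepsilon_{i,t}
\end{aligned}
$$
$(i = 1, \ldots, N;\; t = 1, \ldots, T)$;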
where the subscripts $i$ and $t$ denote cross-section $i$ observed at time $t$. The individual effect $\alpha_i$ is assumed to be an estimable cross-section specific constant term. Equation (2) is essentially a two-way specification along dimensions of rank and time. The results of estimating equation (2) using the FE estimator are presented in Table IV. Column 1 shows the parameter estimates, standard errors, and significance levels for the FE estimates. To test the applicability of the FE estimator in our case, we conduct an omnibus test of the null hypothesis that all $\alpha_i$ are equal to zero. This null hypothesis is easily rejected (F = 6.91, p < .001), indicating the appropriateness of the FE estimator given our data.
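The idea that the FE estimator differences away time-invariant individual effects can be illustrated with a minimal sketch of the within transformation for a single regressor. This is not the estimation code used for Table IV; the synthetic panel, the intercepts, and the function names (`within_estimator`, `pooled_slope`) are assumptions made purely for illustration.

```python
from collections import defaultdict


def within_estimator(ids, x, y):
    """Slope from regressing y on x after subtracting each unit's own means."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # per-unit sum of x, sum of y, count
    for i, xi, yi in zip(ids, x, y):
        totals[i][0] += xi
        totals[i][1] += yi
        totals[i][2] += 1
    sxy = sxx = 0.0
    for i, xi, yi in zip(ids, x, y):
        sum_x, sum_y, n = totals[i]
        dx = xi - sum_x / n  # deviation from the unit's mean of x
        dy = yi - sum_y / n  # deviation from the unit's mean of y
        sxy += dx * dy
        sxx += dx * dx
    return sxy / sxx


def pooled_slope(x, y):
    """Ordinary least squares slope that ignores the panel structure."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Synthetic panel: two respondents share a true slope of 0.5 but have
# different intercepts, and the higher intercept goes with higher x.
ids = ["a", "a", "a", "b", "b", "b"]
x = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0]
alpha = {"a": 10.0, "b": 11.0}
y = [alpha[i] + 0.5 * xi for i, xi in zip(ids, x)]

beta_fe = within_estimator(ids, x, y)
beta_pooled = pooled_slope(x, y)
```

Because the individual intercepts are correlated with the regressor in this synthetic panel, the pooled slope is pulled away from the true value, while the within estimator recovers it.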
*** insert here - Table IV - Regression Results ***

In evaluating our results, we first examine the impact of those variables in our model that are unique to our open source software setting. The coefficient for CNTRB ($\beta_1 = -.0001$, p = .07), while significant at the 10% level, is for all practical purposes zero, with a magnitude of less than .001. Alternative specifications of CNTRB, such as the percent of overall contributions submitted, the median absolute deviation from the median, the number of standard deviations from the mean, and the amount of time spent working on Apache projects, also failed to yield a substantive change in the coefficient. Our interpretation of this result is that open source project experience, expressed as cumulative contributions, is not, per se, associated with an increase in wages. Given the size and complexity of ASF projects and the use of relatively unsophisticated software development tools and methods, it is not hard to imagine that employers would find it difficult to judge, first hand, the merit of an open source job candidate's contributions.14

Our measures of Apache rank, or status within the project, however, tell a different story. The coefficient of the rank variable DEV is negative but not significant ($\beta_2 = -.055$, p = .20). Recall from Tables I and II that the careers of Apache contributors progress in different ways and at different rates. Indeed, for 3 out of 4 observation periods, the wages earned by latent contributors (i.e., those having no rank) are not statistically different from those of contributors who achieve rank DEV, the first rung of the Apache career ladder. While the Apache meritocracy, and consequently our conceptualization of rank, is certainly ordinal, it is not interval. As a practical matter, the requirements to move from NORANK to DEV are quite low in our formulation, requiring only a single contribution, regardless of the size or significance of the contribution.
In other words, the minimum threshold for attaining rank DEV, a single contribution, is not significantly different from no contributions at all and hence may not provide any additional insights into the productive capacity of the respondents at that rank.
Turning to our remaining measures of rank, we find that the coefficients for COM and PMC+ are positive and significant ($\beta_3 = .132$, p = .04; and $\beta_4 = .257$, p = .01, respectively). We can calculate the percentage change in wages associated with respondents having rank COM as $100 \cdot (e^{.1320} - 1) = 14.11\%$.15 Similarly, the percentage change in wages associated with respondents having rank PMC+ is 29.32%. That is, after controlling for open source experience, education, work experience, job switch, firm characteristics, and latent individual effects via the FE estimators, respondents having an Apache rank of COM enjoy wages that are, on average, 14.11% higher than those having no rank at all. Likewise, respondents having an Apache rank of PMC+ enjoy wages that are, on average, 29.32% higher than respondents having no rank. Also note that the difference between the COM and PMC+ coefficients is significantly different from zero ($\beta_4 - \beta_3 = .125$, p < .001). A respondent at rank PMC+ enjoys wages that are on average 13.3% greater than those of respondents having a rank of COM.

It is of interest to note that Apache participants can be quite generous with the number of contributions they make to the Apache projects. As shown in Table III, for each contribution occurrence, the average CNTRB value across all ranks is 25. More interestingly, we find that CNTRB increases significantly with each increase in rank, with highly ranked participants contributing an annual average CNTRB value of nearly 200 lines of code. Specifically, respondents at rank PMC+ made significantly more contributions (CNTRB) than respondents at rank DEV or COM. Recall from the previous discussion that moving from rank DEV to higher levels of rank is associated with significant increases in wages, while measures of contribution hold little explanatory power for observed wages, exhibiting significant, but negligible, coefficient values.
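The conversion from a dummy-variable coefficient in a log-wage equation to a percentage wage difference, 100·(e^β − 1), can be checked with a short sketch. The function name `pct_effect` is hypothetical; the coefficients are the rounded values quoted above. With the rounded PMC+ coefficient of .257 the formula gives roughly 29.3%, slightly below the reported 29.32%, presumably because the reported figure was computed from an unrounded estimate.

```python
import math


def pct_effect(coef):
    """Percentage change in the wage level implied by a dummy coefficient
    in a regression whose dependent variable is the log of wages."""
    return 100.0 * (math.exp(coef) - 1.0)


com_effect = pct_effect(0.132)              # rank COM versus no rank
pmc_effect = pct_effect(0.257)              # rank PMC+ versus no rank
pmc_vs_com = pct_effect(0.257 - 0.132)      # rank PMC+ versus rank COM

print(f"COM: {com_effect:.2f}%  PMC+: {pmc_effect:.2f}%  PMC+ vs COM: {pmc_vs_com:.1f}%")
```

Note that the PMC+ versus COM comparison uses the difference of the two coefficients, not the difference of the two percentages.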
One plausible interpretation of this finding is that teasing apart the relationship between contributions and success in an open source project is a difficult task. Credentials or proxies of success, on the other hand, appear to effectively convey information relating open source participation with desirable software engineering skills. Taken together, these results suggest that employers do not appear to reward participants for their learning experience in the open source projects operationalized as CNTRB. However, the significantly higher wages paid to contributors with higher
RANK is consistent with the notion that RANK conveys sought-after, but typically hard-to-observe, characteristics that may distinguish above average programmers.

The last variable in our model that is unique to our open source software setting is PDAPC. While respondents who were paid to develop Apache software were observed in every observation period, there is no significant relationship in our sample between being a paid Apache developer and LWAGE. Recall that there are conflicting expectations regarding the relationship between PDAPC and LWAGE. In our sample, it appears that employers pay neither a premium nor a discount on wages to employees who are compensated for their Apache participation.

We now turn our attention to those variables in our model that are commonly included in models of human capital. The coefficients of the control variables for experience are consistent in both sign and magnitude with the existing literature. Each year of EXPR significantly increases LWAGE by 7%, but with increasing work experience, increases in LWAGE grow more slowly (as EXSQ is negative and significant). Recall that in our formulation, LEDU only varies within the cross-section for students. Given our use of FE (within-subject) estimators, this limited variation in LEDU results in a negative but nonsignificant coefficient in our sample. Surprisingly, JSWCH is not significant given both the well-established positive relationship between job switching and wage increases [Schafer 2003] and the amount of job switching behavior observed in our sample. It should be noted, however, that our data collection took place during a sharp decline in the information technology job market following the "dot com" shake-out. Given the decreases in overall information technology employment and the concomitant decreases in information technology hiring during this period, lateral job moves or even moves involving a reduction in wages could logically be expected.
Substantial time period influences in our sample are also suggested by the significance of the observation period dummy variables. We find no significant evidence that firm level factors are associated with higher LWAGE. Wages of respondents working for publicly held firms and those working for private firms are not significantly different. Likewise, average wages for respondents working for firms engaged in the production of
software and for respondents working for other firms are not significantly different. Lastly, consistent with expectations, we find that being a student is significantly and negatively associated with LWAGE, as students earn 16.7% less than non-students.

C. Alternative Estimations and Considerations

As Wooldridge [2002] observes, a primary motivation for using panel data is to solve the problem of omitted variables that may be correlated with other regressors in the model. We have made use of the FE estimator to address potential issues related to unobserved individual quality traits that may be correlated with other regressors of interest such as RANK or CNTRB.16 The FE specification is no panacea, however. The FE specification treats differences between respondents as parametric shifts of the regression function and thus limits the applicability of FE estimates to out-of-sample prediction. In contrast, the random effects (RE) specification views individual-specific constant terms as randomly distributed across cross-sectional units and is appropriate if the sampled cross-sectional units are drawn from a large population. While we cannot reject the FE model using the omnibus test for the significance of all $\alpha_i$, neither can we reject the RE model based on standard tests for random effects.

To test the appropriateness of the RE model, we conducted a Hausman specification test for random effects [Hausman 1978]. Under the null hypothesis of no correlation between the latent individual effect, $u_i$, and the other regressors, failure to reject the null implies that the GLS estimates are consistent and efficient, and that the OLS estimates are consistent and inefficient. The Hausman specification test for our RE model results in a $\chi^2$ (d.f. = 15) m-statistic of 6.01 with a corresponding p-value of 0.98. Therefore we fail to reject the null hypothesis of no correlation between $u_i$ and the other regressors and thus accept the appropriateness of the RE estimator for our data.
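The p-value attached to the Hausman statistic is the upper-tail probability of a chi-square distribution with 15 degrees of freedom. A minimal standard-library sketch of that calculation follows; the helper name `chi2_sf` is hypothetical, and the recurrence it uses is a textbook identity for integer degrees of freedom rather than anything taken from the paper.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with
    integer degrees of freedom df, built up two degrees at a time."""
    if df < 1 or x < 0:
        raise ValueError("df must be a positive integer and x non-negative")
    half = x / 2.0
    if df % 2 == 1:
        nu, tail = 1, math.erfc(math.sqrt(half))  # closed form for df = 1
    else:
        nu, tail = 2, math.exp(-half)             # closed form for df = 2
    while nu < df:
        # Adding two degrees of freedom adds (x/2)^(nu/2) e^(-x/2) / Gamma(nu/2 + 1).
        tail += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return tail


hausman_stat = 6.01  # m-statistic quoted in the text
hausman_df = 15      # degrees of freedom quoted in the text
p_value = chi2_sf(hausman_stat, hausman_df)
print(f"p = {p_value:.2f}")
```

A p-value this close to one means the FE and RE coefficient vectors are far closer than chance variation alone would typically produce, which is why the null of no correlation between the individual effect and the regressors is not rejected.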
We investigated an RE specification of our model, estimating our wage equation using a maximum likelihood RE estimator. The results from our RE estimation are shown in column 2 of Table IV and can be compared with those from our FE estimation in column 1. If the RE and FE estimates differ
significantly, we may conclude the presence of a significant latent individual effect. If, on the other hand, the RE and FE estimates do not differ significantly (an empirical Hausman test of sorts), then we are justified in rejecting claims that such latent effects exist. Note that while the FE estimates are manifestly less efficient than the RE estimates, there is still significant agreement between the two methods on the sign, magnitude, and significance of each coefficient in our model. The uniformity of the RE and FE estimates is even more pronounced among our measures of open source participation, with essentially no difference between the RE and FE estimates for CNTRB and RANK.

One potential threat to our results is that Apache RANK may be endogenously determined. That is, if open source participants observe that rank is associated with higher wages, they may increase their "investment" in the project in order to attain higher rank and hence higher wages. First, we note that by linking our dependent variable to prior (lagged) values of rank we have diminished the possibility that rank is endogenously determined [Greene 2003]. Even so, we investigate the possibility that rank is endogenously determined using an instrumental variables (IV) estimation of our model. Investigations into correlations between the FE residuals and candidate instruments reveal that median CNTRB within RANK is highly and significantly correlated with the corresponding level of RANK. At the same time, this variable is not significantly correlated with the dependent variable, LWAGE, making it a suitable instrument for levels of rank. Column 4 of Table IV shows the FE estimates from a 2SLS-IV regression using the instrument just described. As expected, the IV coefficients are less precisely estimated; however, coefficient estimates remain very close to the original FE regression results.
A Hausman test comparing the two models fails to reject the null hypothesis that the difference in coefficients is not systematic and thus provides some support for the exogeneity of RANK. Davidson and MacKinnon [1993], however, note that the Hausman test is not properly interpreted as a direct test of exogeneity. Wooldridge [2002, pg 285] suggests a test for strict exogeneity using the FE regression model. Following Wooldridge, we estimate our normal wage equation with the addition of the leading values of the rank variables suspected of violating the assumption of strict exogeneity. The Wooldridge test fails to reject
the null hypothesis that all such leads of our rank variables are zero (p = 0.731), providing additional evidence that RANK is indeed exogenous.

In both the FE and RE specifications of our model, the $\varepsilon_{i,t}$ are assumed to be independently distributed across individuals with no restrictions placed on the form of the within-subject autocorrelations. As observed by Arellano [1987], this formulation allows for heteroskedasticity and serial correlation of an arbitrary form. Indeed, a common challenge in using repeated measures on individual units to elucidate economic relationships is the presence of serial correlation in the error terms [Bhargava et al. 1982]. Test statistics for a first-order autoregressive process in unbalanced panels possess complex distributional properties [Baltagi and Wu 1999]. As a result, establishing critical values is computationally burdensome. As is often the case, however, examination of the test statistic itself is revealing. With a value of 2.51, the Baltagi-Wu LBI17 test statistic for our panel would almost certainly reject the null hypothesis of no serial correlation.18 Additionally, examination of the unstructured within-subject residual correlation matrix reveals that the serial correlation of residuals decreases as the time lag increases, consistent with an AR(1) process. Indeed, when we compute the likelihood ratio comparing a fully unstructured within-subject residual covariance model to an AR(1) model, we reject the null hypothesis of the superiority of the full model at a 1% significance level (likelihood ratio = 20.3 ~ $\chi^2$, d.f. = 7)19 in favor of the more parsimonious autoregressive model. Accordingly, we re-estimate our RE model specifying an AR(1) within-subject correlation structure. Results are shown in column 5 of Table IV. Again, considering our variables of primary interest, there is little or no impact regarding the relationship between CNTRB, RANK, and LWAGE.
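The statement that residual correlation decreasing with the time lag is consistent with an AR(1) process can be made concrete: under AR(1), the correlation between errors k periods apart is rho raised to the power k. The sketch below builds that within-subject correlation matrix for a four-period panel; the value of rho and the function name `ar1_correlation` are hypothetical and are not estimates from the study.

```python
def ar1_correlation(rho, periods):
    """Within-subject correlation matrix implied by an AR(1) error process:
    the entry for periods s and t is rho ** |s - t|."""
    return [[rho ** abs(s - t) for t in range(periods)] for s in range(periods)]


rho = 0.5  # hypothetical first-order autocorrelation
corr = ar1_correlation(rho, 4)

for row in corr:
    print("  ".join(f"{value:.3f}" for value in row))
```

This structure needs a single correlation parameter regardless of the number of periods, whereas a fully unstructured matrix needs one parameter per pair of periods, which is the sense in which the autoregressive model is more parsimonious.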
While CNTRB is slightly more precisely estimated, the parameter estimate is relatively unchanged, exhibiting negligible impact on LWAGE. Likewise, the parameter estimates and standard errors for our RANK variables show slight changes; however, both COM and PMC+ retain significance at the 5% and 1% levels, respectively.

Our primary interest is the relationship between open source participation and wages. To this point our regressions have included observations for participants from time periods prior to the start of their
Apache career (i.e., for latent contributors). As a check of the robustness of our results, we constructed a new panel composed solely of observations where participants have acquired at least a RANK of DEV. Applying this constraint yields a cross-sectional time-series panel of 222 observations for the years 1999 through 2002. Table V shows the FE and RE parameter estimates for this new panel. Overall, the results are nearly identical to previous estimates. Although less precisely estimated, the coefficients on our RANK variables remain essentially unchanged.

*** insert here - Table V - Regression Results Excluding Latent Contributors ***

V. CONCLUSION

The research presented here seeks to explore one of the more puzzling aspects of the open source phenomenon: why do developers participate? Specifically, we explore whether participation is consistent with well-established theories from labor economics. From this literature, we establish two plausible theoretical bases for the existence of returns to open source participation, viz., human capital theory and signaling theory. A human capital explanation of participation suggests that open source experience serves a "job training" function. In contrast, signaling theory suggests that successful open source participation serves a signaling or sorting function for IT labor markets.

Our analysis shows that employers do not reward the accumulation of experience in open source projects per se. Rather, successful open source participation, measured as higher open source rank, is associated with higher wages, even after controlling for work and programming experience. This finding is robust: related specifications for CNTRB as well as other model specifications yield similar results. That wages do not increase with contributions to the Apache Project is consistent with the notion that employers find it difficult to assess the performance of programmers and hence changes in their human capital.
It follows that employers would have even greater difficulty evaluating open source contributions in order to assess performance even though the source code is freely available. Even so, open source
participation absent accompanying increases in rank may yet hold career advancement potential. Inasmuch as a contributor can apply his or her gained knowledge on the job, the programmer may be rewarded in the long run. Our findings suggest that in the case of the Apache Project, the open source community effectively screens programmers based on their productive capacity. Employers appear to recognize Apache's merit-based ranking as a reliable proxy that is correlated with desirable, but imperfectly observable, productive abilities.

Our research contributes by providing empirical evidence on economic incentives for open source participation. Understanding the incentive structure is a critical first step in evaluating open source as a viable model for organizations seeking to exploit the obvious competitive advantages of "costless" open source development. As Strasser [2001] observes, questions about incentives are not merely academic: understanding how open source development actually works has profound implications for the strategies that corporations and governments should pursue over time. Providing insight into the economic incentive mechanisms underlying open source participation is a significant first step in this process.
APPENDIX - ANECDOTAL EVIDENCE OF OSS LABOR MARKET SIGNALING

A. Excerpts from Selected Apache Contributor Resumes20

· Participant A11A1
... I am also a member of the Apache Software Foundation. I have contributed to the development of the Apache HTTP Server, Apache Portable Runtime, and flood. I've also been known to dabble in Subversion development.

· Participant B012C
Professional Experience
[DATE REDACTED] - present Director, The Apache Software Foundation
[DATE REDACTED] Chairman, The Apache Software Foundation
Software ... Apache httpd is the world's most-used HTTP server software, installed on over 63% of all public Web sites. Co-founder and core developer, 1995-present.

· Participant A5090
...I spend most of my time these days working on various Apache projects, including the HTTP Server Project and the Apache Portable Runtime. I've also been known to work on the Apache 2 filter for PHP, as well as being the co-author of flood (a load-testing tool for webservers). I'm also an independent consultant specializing in the same stuff, in addition to some embedded Linux work and Unix system administration work I've done.

· Participant A11D6
Senior Software Engineer (1/00-5/01)
* Added filtering and other major functional improvements to Apache 2.0
* Designed and implemented portions of the Apache Portable Runtime
The two projects that I am best known for are the Apache Web-server and the Apache Portable Run-time. I am an emeritus member of the Apache Software Foundation. I have also been the Vice President of the APR (Apache Portable Runtime) project.

· Participant A1130
... Member of the core development team for the Apache Web-server since April 1995 including developing the status and user tracking modules and on the security team.

B. Excerpts from Selected "Apache Week" Job Listings

· Sr. Apache Developer (Atlanta)
Work with the best at a great, industry-leading technology company. Must have experience MODIFYING the internal Apache code. C, Perl and FTPD skills must also be top-notch.

· Senior Software Engineer (Canada)
Workfire Technologies Corporation has an opening for a Senior Software Engineer to develop a system of modified Apache proxy servers for various Windows and Unix platforms. A sound knowledge of the Apache architecture and experience with the Apache 1.3.x code is highly desirable. There will be a strong focus on using HTTP/1.1 and other protocol advancements to make Internet communication more efficient. For a more general description of this position, see www.workfire.com/post4.htm.

· Tomcat Admin and Developers (Reading UK)
... Various jobs exist at Workplace Systems Reading for both Apache/Tomcat administrators and Java servlet Developers. For administrators, WebSphere skills also a bonus. Developers must have proven expertise.

· Software Development Engineer (USA)
... Covalent technologies seeks Software Development Engineer with strong Unix, Internet, programming (C/C++, Java, Tk, Perl), and Apache skills. Duties may include Apache server development, PKI cryptography applications, as well as product development and GUI design.

· Developer/Webmaster (San Diego, CA)
... Linux/Apache based Internet privacy service seeking C and Perl developer with web design and maintenance experience to maintain and add services to our website and develop special-purpose Apache modules to drive it. $ + options + medical/dental. Fulltime.

· Chief Technology Officer (USA)
Wanted: Chief Technology Officer for pre-IPO provider of Open Source solutions. Experience leading Open Source development efforts and Apache installations...

· Web-server Engineer (USA)
C2Net seeks an Apache expert with strong C and Unix skills to work on the Stronghold web-server. Job responsibilities would include Apache development and some tech support escalations. More information is available...

· Software Engineer (England)
C2Net Europe seeks self-motivated software engineer with strong C and Unix skills to work on the Stronghold web-server as well as get involved and contribute directly to the Apache, mod_ssl, and OpenSSL projects...

C. Technical Recruiter Strategies and Techniques

Excerpts from "How To Get Hired As An Open-Source Developer" (Levitt 2002). 21

What was Todd Cranston-Cuebas, prolific Senior Technical Recruiter for Ticketmaster, doing at the recent Apachecon technical conference in Las Vegas? Searching for open source talent, endearing himself to the Apache technical community and engaging in his own sort of "passive" recruitment. Todd has sage advice for both open-source recruiters and job seekers -- straight from the trenches. ...

Question: What's the main difference between recruiting for open source skills (e.g. Apache, Linux, MySQL, PHP, etc.) versus proprietary skills (e.g. Microsoft .NET, Novell, Oracle, etc)?

Cranston-Cuebas: Companies looking for open source skill sets are very focused on the proven abilities of the engineer in the work environment. In other words, if you can do it, you are the right candidate. With proprietary systems, like Microsoft technologies, there's a tendency to look at things like certifications as a prerequisite for hires. In the open source world, there are very few certifications that matter. Computer science degrees are helpful because they give you a solid foundation in the fundamentals of computer science, but having 5 years of proven, in the trenches, expertise is undeniably valuable. It's unfortunate that the US government requires a degree [in the field of expertise] in
order for foreigners to obtain H1B work visas because some of the best people in the industry don't have degrees.

MARSHALL SCHOOL OF BUSINESS, UNIVERSITY OF SOUTHERN CALIFORNIA
GRADUATE SCHOOL OF INDUSTRIAL ADMINISTRATION, CARNEGIE MELLON UNIVERSITY
GRADUATE SCHOOL OF INDUSTRIAL ADMINISTRATION, CARNEGIE MELLON UNIVERSITY
CO-FOUNDER, APACHE SOFTWARE FOUNDATION

REFERENCES

Ang, Soon and Cynthia Mathis Beath, "Hierarchical Elements in Software Contracts," Journal of Organizational Computing, III (1993), 329-362.
____, Sandra A. Slaughter and Kok Yee Ng, "Determinants of Pay for Information Technology Professionals: Modeling Cross-Level Interactions," Management Science, XLVIII (2002), 1427-1446.
Apache, "The Apache Software Foundation," Accessed September, 2002, (http://www.apache.org/foundation/).
Arellano, Manuel, "Computing Robust Standard Errors for Within-Groups Estimators," Oxford Bulletin of Economics and Statistics, XLIX (1987), 431-434.
Baltagi, Badi H. and Ping X. Wu, "Unequally Spaced Panel Data Regressions with AR(1) Disturbances," Econometric Theory, XV (1999), 814-823.
Banker, Rajiv D., Gordon B. Davis and Sandra A. Slaughter, "Software Development Practices, Software Complexity, and Software Maintenance Performance: A Field Study," Management Science, XLIV (1998), 433-450.
Basili, Victor R. and G. Caldiera, "Improve Software Quality by Reusing Knowledge and Experience," Sloan Management Review, XXXVII (1995), 55-64.
Becker, Gary S., "Investment in Human Capital: A Theoretical Analysis," Journal of Political Economy, LXX (1962), 9-49.
Belman, Dale and John S. Heywood, "Sheepskin Effects in the Returns to Education: An Examination of Women and Minorities," The Review of Economics and Statistics, LXXIII (1991), 720-724.
Benjamin, John D., G. Donald Jud, Kevin A. Roth and Daniel T. Winkler, "Technology and Realtor Income," Journal of Real Estate Finance and Economics, XXV (2002), 51-65.
Bhargava, A., L. Franzini and W. Narendranathan, "Serial Correlation and the Fixed Effects Model," The Review of Economic Studies, XLIX (1982), 533-549.
Blaug, Mark, "The Empirical Status of Human Capital Theory: A Slightly Jaundiced Survey," Journal of Economic Literature, XIV (1976), 827-855.
Boehm, Barry W., Ellis Horowitz, Ray Madachy, Donald Reifer, Bradford K. Clark, Bert Steece, A. Winsor Brown, Sunita Chulani and Chris Abts, Software Cost Estimation with COCOMO II, (Upper Saddle River, NJ:Prentice Hall, 2000).
Boh, Wai Fong, Sandra A. Slaughter and Jose Alberto Espinosa, "Learning from Experience in Software Development: A Multi-Level Analysis," GSIA Working Paper, (2003).
Chamberlain, Gary, "Panel Data," Handbook of Econometrics, Z. Griliches and M. D. Intriligator, Eds. (Amsterdam:North Holland, 1984), 1247-1318.
Clary, E. Gil, Mark Snyder, Robert D. Ridge, John Copeland, A.A. Stukas, J. Haugen and P. Miene, "Understanding and Assessing the Motivations of Volunteers: A Functional Approach," Journal of Personality and Social Psychology, LXXIV (1998), 1516-1530.
Computerworld, "Computerworld 2002 Salary/Skills Survey," Computerworld Inc., Accessed March, 2003, (http://www.computerworld.com/departments/surveys/skills?from=bsm).
Davidson, Russell and James G. MacKinnon, Estimation and Inference in Econometrics, (New York, NY:Oxford University Press, 1993).
Day, Kathleen M. and Rose Anne Devlin, "The Payoff to Work without Pay: Volunteer Work as an Investment in Human Capital," The Canadian Journal of Economics, XXXI (1998), 1179-1191.
Fielding, Roy, "Shared Leadership in the Apache Project," Communications of the ACM, XLII (1999), 42-43.
Ghosh, Rishab Aiyer, "Interview with Linus Torvalds: What Motivates Free Software Developers?," First Monday, III (1998).
Goldberg, Matthew S. and John T. Warner, "Military Experience, Civilian Experience, and the Earnings of Veterans," Journal of Human Resources, XXII (1987), 62-81.
Goleman, Daniel, Working with Emotional Intelligence, (New York, NY:Bantam Books, 1998).
Greene, William H., Econometric Analysis, (Upper Saddle River, NJ:Prentice Hall, 2003).
Halloran, Timothy J. and William L. Scherlis, "High Quality and Open Source Software Practices," Meeting Challenges and Surviving Success: 2nd Workshop on Open Source Software Engineering. 24th International Conference on Software Engineering, (Orlando, FL:2002).
Halvorsen, Robert and Raymond Palmquist, "The Interpretation of Dummy Variables in Semilogarithmic Equations," The American Economic Review, LXX (1980), 474-475.
Hausman, Jerry A., "Specification Tests in Econometrics," Econometrica, XLVI (1978), 1251-1271.
____ and William E. Taylor, "Panel Data and Unobservable Individual Effects," Econometrica, XLIX (1981), 1377-1398.
Heckman, James J., "Sample Selection Bias as Specification Error," Econometrica, XLVII (1979), 153-161.
Humphrey, Watts S., A Discipline for Software Engineering, (Reading, MA:Addison-Wesley, 1995).
IBM, Press Release, "IBM Helps Companies Turn Simple Web Sites into Powerful E-Business Solutions," (New York:Business Wire, 1998).
Jakubson, George, "Estimation and Testing of the Union Wage Effect Using Panel Data," The Review of Economic Studies, LVIII (1991), 971-991.
Kang, Shin and John Bishop, "Effects of Curriculum on Labor Market Success Immediately after High School," Journal of Industrial Teacher Education, XXIII (1986), 14-29.
Kay, Fiona M. and John Hagan, "The Persistent Glass Ceiling: Gendered Inequalities in the Earnings of Lawyers," British Journal of Sociology, XLVI (1995), 279-310.
Kirsch, Laurie J., "The Management of Complex Tasks in Organizations: Controlling the Systems Development Process," Organization Science, VII (1996), 1-21.
Lakhani, Karim R., Bob Wolf, Jeff Bates and Chris DiBona, "The Boston Consulting Group Hacker Survey," Boston Consulting Group, Accessed October 15, 2002, (http://www.osdn.com/bcg/BCGHACKERSURVEY-0.73.pdf).
Lancashire, David, "The Fading Altruism of Open Source Development," First Monday, VI (2001).
Lerner, Josh and Jean Tirole, "Some Simple Economics of Open Source," The Journal of Industrial Economics, L (2002), 197-234.
Levitt, Jason, "How to Get Hired as an Open-Source Developer," The Open Enterprise, Accessed Oct, 2003, (http://www.techweb.com/wire/story/TOE20021202S0001).
Mauss, Marcel, The Gift; Forms and Functions of Exchange in Archaic Societies, (New York:Norton, 1967).
Mincer, Jacob, Schooling, Experience, and Earnings, (New York:Columbia University Press, 1974).
Mockus, Audris, Roy Fielding and James D. Herbsleb, "A Case Study of Open Source Software Development: The Apache Server," Proceedings of the 22nd International Conference on Software Engineering, (Limerick, Ireland:ACM, 2000).
Moon, Jae Yun and Lee Sproull, "Essence of Distributed Work: The Case of the Linux Kernel," Distributed Work, S. Kiesler, Ed. (Cambridge:MIT Press, 2002), 381-404.
Murnane, Richard J., John B. Willett and Kathryn Parker Boudett, "Do Male Dropouts Benefit from Obtaining a GED, Postsecondary Education, and Training?," Evaluation Review, XXIII (1999), 475-504.
Netcraft, "The Netcraft Web-server Survey," Netcraft, Accessed Aug, 2003, (http://news.netcraft.com/archives/2003/08/01/august_2003_web_server_survey.html).
O'Reilly, Tim, "Open Source: The Model for Collaboration in the Age of the Internet," Proceedings of the Computers, Freedom and Privacy, (Toronto, Canada:2000).
Orlikowski, Wanda J., "Knowing in Practice: Enacting a Collective Capability in Distributed Organizing," Organization Science, XIII (2002), 249-273.
OSI, "The Open Source Definition," The Open Source Initiative, Accessed May, 2001, (http://opensource.org/docs/definition_plain.html).
Pfeffer, Jeffrey and Alison M. Konrad, "The Effects of Individual Power on Earnings," Work and Occupations, XVIII (1991), 385-414.
Raftery, Adrian E., "Bayesian Model Selection in Social Research," Sociological Methodology, XXV (1995), 111-163.
Raymond, Eric, "The Cathedral and the Bazaar," The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary, (Cambridge, MA:O'Reilly, 1999a), 19-64.
____, "Homesteading the Noosphere," The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary, (Cambridge, MA:O'Reilly, 1999b), 65-112.
Robillard, Pierre N., "The Role of Knowledge in Software Development," Communications of the ACM, XLII (1999), 87-92.
Schreyer, Paul and Dirk Pilat, "Measuring Productivity," OECD Economic Studies, XXXIII (2002), 127-170.
Schafer, Maria, "IT Salaries Hold Their Own," Computerworld Inc., Accessed Oct, 2003, (http://www.computerworld.com/careertopics/careers/story/0,10801,79574,00.html).
Shah, Sonali, "Understanding the Nature of Participation and Coordination in Open and Gated Source Software Development Communities," Working paper, MIT Sloan School of Management, (Cambridge:2003).
Spence, Michael, "Job Market Signaling," Quarterly Journal of Economics, LXXXVII (1973), 355-374.
____, Market Signaling: Information Transfer in Hiring and Related Screening Processes, (Cambridge, MA:Harvard University Press, 1974).
Statz, Joyce, "Leverage Your Lessons," IEEE Software, XVI (1999), 30-32.
Strasser, Mathias, "A New Paradigm in Intellectual Property Law? The Case against Open Sources," Stanford Technology Law Review, Accessed Oct 15, 2003, (http://stlr.stanford.edu/stlr/articles/01stlr_4).
Taubman, Paul J. and Terence J. Wales, "Higher Education, Mental Ability, and Screening," Journal of Political Economy, LXXXI (1973), 28-55.
Tyler, John H., Richard J. Murnane and John B. Willett, "Estimating the Labor Market Signaling Value of the GED," Quarterly Journal of Economics, CXV (2000), 431-468.
Verbeek, Marno and Theo Nijman, "Testing for Selectivity Bias in Panel Data Models," International Economic Review, XXXIII (1992), 681-703.
von Krogh, Georg, Sebastian Spaeth and Karim R. Lakhani, "Community, Joining, and Specialization in Open Source Software Innovation: A Case Study," Research Policy, VII (2003), 1217-1241.
Watson, Sharon, "End of Job Loyalty," Computerworld, (2000), 52-53.
Weinberg, Gerald M., The Psychology of Computer Programming. Silver Anniversary Edition, (New York:Van Nostrand Reinhold, 1998).
Weiss, Andrew, "Human-Capital Vs Signaling Explanations of Wages," Journal of Economic Perspectives, IX (1995), 133-154.
Whang, Seungjin, "Contracting for Software Development," Management Science, XXXVIII (1992), 307-324.
Wooldridge, Jeffrey M., Econometric Analysis of Cross Section and Panel Data, (Cambridge, MA:MIT Press, 2002).
NOTES

1. This view has been echoed by practitioners. For example, Jim Clark, founder of Silicon Graphics and Netscape, handpicked Pavan Nigam as the chief technology officer of WebMD [his third venture] because "...the difference between a great software guy and an O.K. software guy is huge. A great software guy is worth 10 times an O.K. software guy" [New York Times Magazine, Oct. 10, 1999].
2. The problem of observing abilities is not unique to software programmers. For example, the academic tenure process relies heavily on the approval of peers at other academic institutions, even though the home institution has observed the output of the tenure candidate for a significant period of time.
3. This difficulty with assessing software development performance has also been pointed out in the software contracting literature [Whang 1992; Ang and Beath 1993].
4. In what he dubs "Linus's Law", Eric Raymond [1999a] notes the inverse relationship between the amount of time needed to identify and correct software defects and the number of independent developers simultaneously working on the problem. Or as Raymond quips, "Given enough eyeballs, all bugs are shallow".
5. This is described in detail in Section III.
6. Sendmail and the Apache Web-server.
7. See the Appendix for excerpts from resumes of several Apache contributors.
8. CVS, or Concurrent Versioning System, is the de facto standard source control system for Internet-enabled open source software projects.
9. In the CVS model, permanent changes to the source base are "committed". Hence, contributors who are granted the authority to commit their changes without going through an intermediary are commonly called "committers".
10. Such issues are most likely related to software defects that result in a recurring pattern of lost productivity [Shah 2003].
11. Delayed returns to open source participation may include other benefits, the most important of which are stock options. However, we chose not to ask for detailed information about stock options for the following reasons. First, open source participants working at start-ups told us that they were not allowed to disclose the number of stock options that they were holding, making it impossible to estimate the value of an option. Further, a significant percentage of respondents work at privately held firms, which makes it difficult to assess the value of these options.
12. A software development language is a combination of syntax and grammar rules used to transform software designs into instructions suitable for execution by a computer.
13. To account for possible differences in the lines of code measure between programming languages, each contribution is converted to a common metric using industry standard language conversion factors [Boehm, Abts et al. 2000].
14. The commercial software industry continues to adopt software development tools adept at managing the team or group based nature of software development. In contrast, the software development tools most often used on open source projects consist of a common set of rudimentary tools selected not for their ability to manage large group projects, but rather for their wide availability, their compatibility with other open source project environments, and their support for asynchronous software development [Halloran and Scherlis 2002].
15. In a semilogarithmic regression equation, the percentage impact of a dichotomous variable coefficient, c, on the dependent variable is properly expressed as 100·(exp(c) - 1) [Halvorsen and Palmquist 1980].
16. Note that in the absence of such omitted variables, FE is consistent but not fully efficient [Hausman and Taylor 1981].
17. LBI is the locally best invariant test statistic for AR[1] processes in unbalanced panel data [Baltagi and Wu 1999].
18. In practice, published critical values for both the LBI and the Bhargava et al. modified Durbin-Watson statistics for panels similar to ours are quite small and often negative.
19. This finding is also confirmed by comparing both Akaike's Information Criterion (AIC) and Schwarz's Bayesian Criterion (BIC) of the two models [Raftery 1995].
20. Information that could reveal a participant's identity has been redacted.
21. The entire interview can be seen at http://www.theopenenterprise.com/story/toe20021202s0001.
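
Note 19 refers to a comparison of information criteria. As a rough sketch of the mechanics only, the snippet below evaluates the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of estimated parameters, n the number of observations, and lower values indicate the preferred model. The model names, log-likelihoods, and parameter counts are hypothetical placeholders and are not results from this study; only the sample size of 360 is taken from the note to Table IV.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike's Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Schwarz's Bayesian Criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


N_OBS = 360  # number of observations reported for Table IV

# Hypothetical (log-likelihood, parameter count) pairs, for illustration only.
candidates = {"model_a": (-95.0, 17), "model_b": (-101.0, 15)}

for name, (ll, k) in candidates.items():
    print(f"{name}: AIC = {aic(ll, k):.1f}, BIC = {bic(ll, k, N_OBS):.1f}")

# The model with the smaller criterion value is preferred.
best = min(candidates, key=lambda m: bic(*candidates[m], N_OBS))
print("preferred by BIC:", best)
```

Because BIC multiplies the parameter count by ln n rather than 2, it penalizes additional regressors more heavily than AIC once n exceeds about 7, so agreement between the two criteria is a somewhat stronger check than either alone.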
Table I
Patterns of Rank Progression in Sample

Period 1   Period 2          Period 3          Period 4          Frequency   %
NORNK      NORNK             NORNK             DEV               49          33.33
NORNK      NORNK             NORNK             DEV, COM          4           2.72
NORNK      NORNK             DEV               DEV               33          22.45
NORNK      NORNK             DEV               COM               5           3.40
NORNK      NORNK             DEV               COM, PMC+         1           0.68
NORNK      NORNK             DEV, COM          COM               6           4.08
NORNK      NORNK             DEV, COM          PMC+              1           0.68
NORNK      NORNK             DEV, COM, PMC+    PMC+              1           0.68
NORNK      DEV               DEV               DEV               8           5.44
NORNK      DEV               DEV               COM               1           0.68
NORNK      DEV               DEV               COM, PMC+         1           0.68
NORNK      DEV               COM, PMC+         PMC+              2           1.36
NORNK      DEV, COM          COM               COM               4           2.72
NORNK      DEV, COM          COM               PMC+              1           0.68
NORNK      DEV, COM          PMC+              PMC+              1           0.68
NORNK      DEV, COM, PMC+    PMC+              PMC+              1           0.68
DEV        DEV               DEV               DEV               25          17.01
DEV        COM, PMC+         PMC+              PMC+              2           1.36
PMC+       PMC+              PMC+              PMC+              1           0.68

Note: number of observations = 147.
Table II
Patterns of Rank Progression in Population

Period 1   Period 2          Period 3          Period 4          Frequency   %
NORNK      NORNK             NORNK             DEV               360         27.67
NORNK      NORNK             NORNK             DEV, COM          49          3.77
NORNK      NORNK             NORNK             DEV, COM, PMC+    2           0.15
NORNK      NORNK             DEV               DEV               337         25.90
NORNK      NORNK             DEV               COM               22          1.69
NORNK      NORNK             DEV               COM, PMC+         1           0.08
NORNK      NORNK             DEV, COM          COM               71          5.46
NORNK      NORNK             DEV, COM          PMC+              7           0.54
NORNK      NORNK             DEV, COM, PMC+    PMC+              8           0.61
NORNK      DEV               DEV               DEV               108         8.30
NORNK      DEV               DEV               COM               4           0.31
NORNK      DEV               DEV               COM, PMC+         3           0.23
NORNK      DEV               COM               COM               7           0.54
NORNK      DEV               COM, PMC+         PMC+              3           0.23
NORNK      DEV, COM          COM               COM               24          1.84
NORNK      DEV, COM          COM               PMC+              1           0.08
NORNK      DEV, COM          PMC+              PMC+              8           0.61
NORNK      DEV, COM, PMC+    PMC+              PMC+              5           0.38
DEV        DEV               DEV               DEV               240         18.45
DEV        DEV               COM, PMC+         PMC+              2           0.15
DEV        COM               COM               COM               1           0.08
DEV        COM               PMC+              PMC+              1           0.08
DEV        COM, PMC+         PMC+              PMC+              5           0.38
COM        COM               COM               COM               4           0.31
COM        PMC+              PMC+              PMC+              4           0.31
PMC+       PMC+              PMC+              PMC+              24          1.84

Note: number of observations = 1301.
Table III
Model Variables and Descriptive Statistics

Time varying variables - Continuous                Variable   Mean      Std. Dev.
Total wages in 1998 Dollars                        TWAGE      79.95     29.51
Average # hours worked per year                    WKHRS      2122.22   395.12
Average # hours contributed to Apache per year     APHRS      206.12    482.43

Time varying lagged variables - Continuous         Variable   Mean      Std. Dev.
Career experience                                  EXPR       5.67      3.99
Career experience squared                          EXSQ       48.31     55.09
Level of education in years                        LEDU       16.22     2.20
Project contributions                              CNTRB      24.74     138.10

No. Observations - 360

                                                              Frequencies by Observation Period
Time varying variables - Dichotomous               Variable   0           1           2           3
Employed by publicly traded firm                   FPUB       48 (36%)    42 (30%)    8 (19%)     13 (31%)
Employed in the software industry                  FSWIN      53 (40%)    64 (45%)    23 (54%)    21 (50%)
Student                                            STDNT      19 (14%)    13 (9%)     3 (7%)      3 (7%)
Paid to develop Apache software                    PDAPC      13 (10%)    20 (14%)    6 (14%)     4 (10%)

Time varying lagged variables - Dichotomous        Variable   0           1           2           3
Observed job switch                                JSWCH      48 (36%)    43 (30%)    10 (23%)    9 (21%)
No RANK (No contributions to project)              NORANK     108 (81%)   96 (68%)    13 (30%)    0 (0%)
Apache rank equal to `developer'                   DEV        24 (18%)    36 (25%)    24 (56%)    32 (76%)
Apache rank equal to `committer'                   COM        0 (0%)      6 (4%)      5 (12%)     8 (19%)
Apache rank of project management committee
  member or higher                                 PMC+       1 (1%)      4 (3%)      1 (2%)      2 (5%)

Other variables                                    Variable   0           1           2           3
No college degree                                  NOCOL      20 (15%)    23 (16%)    6 (14%)     5 (12%)
College degree                                     COLL       68 (51%)    74 (52%)    21 (49%)    21 (50%)
Masters degree                                     MAST       33 (25%)    33 (23%)    13 (30%)    14 (33%)
PhD                                                PROF       12 (9%)     13 (9%)     3 (7%)      2 (5%)
Technical job such as software engineer,
  system administrator, etc                        TECH       90 (67%)    93 (65%)    33 (72%)    27 (59%)
Management job such as project manager or CIO      MGMT       20 (15%)    31 (22%)    3 (7%)      4 (9%)
Other job title                                    OTJOB      23 (17%)    18 (13%)    7 (15%)     11 (24%)

Notes: values are grand means over full sample. Number of observations = 360.
Table IV
Regression Results: Fixed Effect, Random Effect, 2SLS, and Random Effect with AR[1]

Coefficients   (1)          (2)          (3)          (4)          (5)
               FE           ML-RE        GLS-RE       2SLS-IV      RE-AR(1)
PDAPC          0.043        0.053        0.053        0.043        0.050
               (0.049)      (0.042)      (0.043)      (0.049)      (0.043)
JSWCH          -0.002       0.004        0.003        -0.002       -0.001
               (0.027)      (0.025)      (0.025)      (0.027)      (0.025)
FPUB           0.026        0.0491*      0.049        0.026        0.052*
               (0.036)      (0.030)      (0.031)      (0.036)      (0.031)
FSWIN          0.053        0.064**      0.064**      0.052        0.064**
               (0.038)      (0.031)      (0.032)      (0.038)      (0.032)
STDNT          -0.183**     -0.259***    -0.259***    -0.184**     -0.262***
               (0.089)      (0.065)      (0.066)      (0.090)      (0.066)
CNTRB          -0.000*      -0.000**     -0.000**     -0.000*      -0.000**
               (< 0.001)    (< 0.001)    (< 0.001)    (< 0.001)    (< 0.001)
EXPR           0.073*       0.078***     0.078***     0.073*       0.078***
               (0.045)      (0.015)      (0.016)      (0.045)      (0.016)
EXSQ           -0.003**     -0.003***    -0.003***    -0.003**     -0.003***
               (0.001)      (0.001)      (0.001)      (0.001)      (0.001)
TIME1          0.085*       0.076***     0.076***     0.084*       0.076***
               (0.048)      (0.021)      (0.021)      (0.048)      (0.021)
TIME2          0.136        0.122***     0.122***     0.135        0.123***
               (0.099)      (0.039)      (0.040)      (0.100)      (0.040)
TIME3          0.096        0.070        0.070        0.093        0.068
               (0.140)      (0.048)      (0.049)      (0.142)      (0.051)
DEV            -0.055       -0.058       -0.058       -0.054       -0.057
               (0.043)      (0.036)      (0.037)      (0.045)      (0.037)
COM            0.132**      0.125**      0.125**      0.134**      0.129**
               (0.064)      (0.059)      (0.060)      (0.067)      (0.061)
PMC+           0.257***     0.261***     0.261***     0.270*       0.268***
               (0.105)      (0.092)      (0.094)      (0.152)      (0.095)
LEDU           0.017        0.016        0.016        0.017        0.016
               (0.046)      (0.012)      (0.013)      (0.046)      (0.012)
INTCPT         10.591***    10.568***    10.568***    10.592***    10.569***
               (0.878)      (0.199)      (0.204)      (0.880)      (0.199)
Pseudo R2      .400         N/A          .409         .400         .410

Note: in all regressions, the dependent variable is total wages expressed in 1998 US Dollars. Standard errors are in parentheses. *** signifies p-value < .01, ** signifies p-value < .05, * signifies p-value < .10. Number of observations = 360.
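
Note 15 gives the conversion for dichotomous regressors in a semilogarithmic equation, so the rank coefficients in Table IV can be restated as approximate percentage wage effects. The following is a minimal sketch, assuming the dependent variable enters the regression in logs as note 15 implies; the function name `percent_effect` and the dictionary layout are illustrative choices, and the coefficients are copied from the fixed-effects column (1) of Table IV.

```python
import math


def percent_effect(coef: float) -> float:
    """Percentage change in the dependent variable implied by a dummy
    coefficient c in a semilog regression: 100 * (exp(c) - 1)."""
    return 100.0 * (math.exp(coef) - 1.0)


# Fixed-effects estimates for dichotomous variables, column (1) of Table IV.
fe_coefficients = {"STDNT": -0.183, "DEV": -0.055, "COM": 0.132, "PMC+": 0.257}

for name, coef in fe_coefficients.items():
    print(f"{name:6s} coefficient {coef:+.3f} -> {percent_effect(coef):+.1f}% wages")
```

Under these assumptions the committer coefficient of 0.132 corresponds to a wage premium of roughly 14 percent and the PMC+ coefficient of 0.257 to roughly 29 percent, so reading the raw coefficients as 13.2 and 25.7 percent would understate both. The DEV coefficient is not statistically significant in any column, so its converted value should not be interpreted as an effect.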
Table V
Regression Results: Fixed Effect and Random Effect Model Excluding Latent Contributors

Coefficients   (1)          (2)
               FE           GLS-RE
PDAPC          0.071        0.086
               (0.065)      (0.056)
JSWCH          -0.007       -0.004
               (0.036)      (0.034)
FPUB           0.014        0.028
               (0.045)      (0.041)
FSWIN          0.050        0.066
               (0.052)      (0.045)
STDNT          -0.255**     -0.286***
               (0.122)      (0.095)
CNTRB          -0.000*      -0.000*
               (< 0.001)    (< 0.001)
EXPR           0.067        0.068***
               (0.054)      (0.022)
EXSQ           -0.003*      -0.002*
               (0.002)      (0.001)
TIME1          0.082        0.069**
               (0.064)      (0.034)
TIME2          0.140        0.115**
               (0.122)      (0.049)
TIME3          0.107        0.064
               (0.173)      (0.060)
DEV            -0.052       -0.050
               (0.049)      (0.044)
COM            0.134*       0.131*
               (0.073)      (0.069)
PMC+           0.269**      0.263***
               (0.119)      (0.106)
LEDU           -0.000       0.006
               (0.060)      (0.017)
INTCPT         10.934***    10.753***
               (1.147)      (0.274)
Pseudo R2      .311         .328

Note: in all regressions, the dependent variable is total wages expressed in 1998 US Dollars. Standard errors are in parentheses. *** signifies p-value < .01, ** signifies p-value < .05, * signifies p-value < .10. Number of observations = 222.

I Hann, J Roberts, S Slaughter

File: an-empirical-analysis-of-economic-returns-to-open-source-participation.pdf
Title: Microsoft Word - OSS Paper Draft-17.doc
Author: I Hann, J Roberts, S Slaughter
Published: Mon Mar 15 09:12:53 2004
Pages: 39
File size: 0.31 Mb

