How to Count What Counts: TIS the Season for Syllabi Metrics?



Jeff Colgan’s recent paper advances the study of the international relations (IR) discipline in three ways.  First, he empirically explores a series of prominent and untested claims about the direction of the field.  Second, he provides a new method for measuring the impact of published scholarship.  Finally, he generates a series of plausible and interesting claims about the field -- some of which he tests and others that remain to be explored.  This last feature is one of the most valuable parts of Colgan’s contribution.  Despite the fact that this is an empirical paper, the idea-to-word ratio is very high, and the paper produces new ideas that serve as an inviting springboard for wild speculation (and future research) for the rest of us.

The editors of ISQ have invited several response essays and we expect these will be full of empirical, conceptual, and normative critiques.  We will join the fray briefly near the end of this essay, mostly with quibbles rather than foundational critiques, but will use the bulk of our essay to present data from the Teaching, Research, and International Policy (TRIP) faculty surveys that speak to a number of Colgan’s timely questions.  We show that there is disagreement among IR scholars in the United States about whether various citation metrics are good measures of scholarly impact, or whether and how they should be used in the tenure and promotion process.  We provide evidence that suggests IR scholars assess impact differently depending upon their gender, academic rank, methodology, analytic approach, epistemology, and type of institution.  We also find that the way faculty organize the IR field seminar is broadly consistent with Colgan’s findings about the type of readings assigned in field seminar syllabi.  Like Colgan, we hope these results generate more introspection, conversation, and research on disciplinary practices within IR.

Citation Counts Don’t Count Everything

Colgan joins a growing chorus of scholars who observe that the academy is increasingly obsessed with measuring and demonstrating the impact of scholarship.  Individual scholars seek to demonstrate the impact of their work in order to achieve tenure and promotion, departments to impress administrators or prospective students, and universities to maintain rankings and thus resources.  While various citation metrics (Web of Science and Google Scholar) and surveys (TRIP and Garand and Giles) have been used to assess the impact of scholarship and ideas, there have been very few efforts to study impact by measuring which books, journals, or specific articles are included on course syllabi.  Presumably professors select research that they believe will be most useful in teaching the discipline to the next generation of IR scholars... or practitioners.

Colgan is skeptical of “various metrics based on citation counts,” and uses this as one justification for creating the “Teaching Influence Score” (TIS).  But is this skepticism (discussed here, here, and here) reflected in the views of all IR scholars?  We use data from the 2014 TRIP Faculty Survey of IR scholars based at U.S. institutions to explore what types of scholars are likely to see various citation metrics as objective and/or useful in assessing impact.  When asked whether “citation counts provide an objective measure of scholarly impact,” about half disagreed while less than a third agreed.  In a robust pattern (see Figure 1) that recurs in other questions below, we see greater enthusiasm for citation counts among quantitative scholars than those who employ qualitative methods.

Figure 1. Perception of Citation Counts by Qualitative and Quantitative Scholars

While we do not have as many observations for scholars who use other types of methods, and thus cannot be as confident in the representativeness of their responses, a similar pattern emerges when we include policy analysis, legal/ethical analysis, and formal modeling as methods.  Here we observe that quantitative scholars and formal modelers are more likely to agree with the premise that citation counts represent an objective measure of influence (see Figure 2).  This finding persists despite the fact that articles using quantitative methods are systematically less likely to be cited, and that their citation rates decay more quickly than those of articles Saideman (2015) codes as “grand theory” or non-formal IR theory articles.

Figure 2. Perception of Citation Counts by All Methods


Colgan cites Curry’s (2012) discussion of the Thomson Reuters SSCI impact factor, which claims: “if you use impact factors you are statistically illiterate.”  For those who accept Curry’s premise, it may be distressing, or mildly amusing, to learn that when you slice the data by methodology, scholars who employ quantitative methods were about twice as likely to report that they “weight publications by their SSCI impact score,” as displayed in the second column of Figure 3 below.

Figure 3. Evaluating Journal Articles with Ranking or Rating System
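For readers who have never looked under the hood of the impact factor debated above, the standard two-year journal impact factor divides citations received in a given year to a journal’s articles from the previous two years by the number of citable items the journal published in those two years.  A minimal sketch in Python; the journal figures below are invented for illustration:

```python
def two_year_impact_factor(citations_to_prev_two_years, items_prev_two_years):
    """Standard two-year journal impact factor: citations in year Y to items
    published in Y-1 and Y-2, divided by the number of citable items
    published in Y-1 and Y-2."""
    if items_prev_two_years == 0:
        raise ValueError("journal published no citable items in the window")
    return citations_to_prev_two_years / items_prev_two_years

# Hypothetical journal: 300 citations in 2014 to its 2012-2013 articles,
# of which there were 120 citable items.
print(two_year_impact_factor(300, 120))  # → 2.5
```

The statistical-illiteracy charge stems from exactly this construction: a journal-level mean over a highly skewed citation distribution says little about any individual article in that journal.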

In addition to methodology, we found that the analytic perspective one brings to the study of politics also influences one’s faith in citation counts.  Respondents who claim to employ a rational choice framework tend to believe that citation counts provide an objective measure of scholarly impact (see Figure 4), while those who “do not assume the rationality of actors” are half as likely to agree.  Similarly, rationalists are more than twice as likely as non-rationalists to report that Google Scholar citation counts or the h-index are important (46 percent compared to 19 percent, respectively).  While not shown in the figure below, we see similar results for epistemology, where self-described “positivists” are more likely to see citation counts as an objective measure of impact and to use them in their assessments for tenure and promotion.

Figure 4. Perception of Citation Counts by Analytic Framework
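Since the h-index recurs throughout these survey items, a quick refresher on its definition: a scholar has index h if h of his or her papers have been cited at least h times each.  A minimal sketch, with an invented citation record:

```python
def h_index(citation_counts):
    """Largest h such that at least h papers have >= h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank  # the rank-th most-cited paper still has >= rank citations
        else:
            break
    return h

# Hypothetical scholar with six papers: three papers have 5+ citations,
# but not four papers with 4+ citations, so h = 3.
print(h_index([25, 8, 5, 3, 3, 1]))  # → 3
```

Note how the measure flattens outliers: the 25-citation paper counts no more toward h than the 5-citation paper, which is one reason rationalists and non-rationalists can reasonably disagree about its informativeness.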


We guessed that anecdotes about the use of various citation metrics in the tenure and promotion process likely reflect emerging practices among scholars at R-1 institutions, where the pressure to publish is most intense.  However, the views of scholars at research universities are almost identical to those of their colleagues at liberal arts colleges or comprehensive four-year institutions.  Similarly, while one might expect tenure status to have a large effect on the perception of various citation metrics, the eyeball test (as illustrated in Figure 5) suggests only minor differences between scholars at different academic ranks, with full professors being slightly more positive than their more junior colleagues.

Figure 5. Perception of Citation Counts by Academic Rank

Finally, there has been much recent analysis and discussion about gender citation bias and even work in progress by Colgan about parallel underrepresentation of female scholars on syllabi.  So, readers may not be terribly surprised to learn that female IR scholars use citation metrics less frequently than their male counterparts and are more skeptical about their objectivity as measures of scholarly impact.

How to Improve TIS as a Measure of Impact

Most scholars tend to agree that citation measures do not and cannot capture everything we might want to measure; however, aside from individual judgments and external letters in the tenure and promotion process, they are among the few metrics we have to measure scholarly impact.  Colgan’s new “Teaching Influence Score” (TIS) specifies another systematic and transparent measure of scholarly impact that, like citation metrics, reputational surveys, and other new efforts, offers a partial and imperfect measure of the underlying concept.  TIS can help us triangulate -- use a variety of measures meant to capture different types of scholarly impact in conjunction with more traditional assessments of research quality and/or impact.  We provide a few “friendly amendments” that would make Colgan’s measure even more useful if implemented, and then raise one fundamental limitation of TIS.

Scholars have argued here and here that one problem with citation metrics is that they capture both positive and negative citations, and perhaps we ought not be crediting scholars with negative citations.  Colgan suggests that a similar effect could be at play in a work’s inclusion on a syllabus.  One of us (Tierney) assigns two of the discipline’s most highly cited works of the past thirty years (Huntington’s Clash of Civilizations and Mearsheimer’s “Back to the Future”) because they are so clearly written and so clearly wrong.  These are wonderful pedagogical foils for the classroom.  In the limited empirical research done on this topic, scholars have found that negative citations are actually rarer than most imagine them to be.  We currently have a project underway with Lindsay Hundley that aims to measure this and other qualities of citations to a given publication.  If you think of a “negative citation” as one that expresses a negative sentiment about the quality of the cited work, preliminary work suggests that only about 2 percent of citations to the most cited IR books and articles are “negative” in this sense.  If you broaden the definition to include any citation that disagrees with the theory, methods, or conclusions of the cited source, we find “negative” citations to be around 20 percent of all citations.

Examining the Direction of IR

In terms of specific inferences that Colgan draws from his analysis, we focus on just one issue that likely has a direct impact on the findings.  Colgan does a good job discussing the fact that IR is typically organized and taught as a subfield of political science at most U.S. universities, but that the publication outlets most valued by IR scholars are different from those valued by specialists in American and (to a lesser degree) Comparative Politics.  However, some of his conclusions are more (or less) true today than his results suggest.  Recall that TIS draws upon a convenience sample of syllabi used between 2008 and 2013, with the majority of syllabi drawn from the most recent part of the time series in 2012 or 2013.  Colgan compares various features of IR articles and journals that appear on syllabi in this later period to a cross section of all articles/journals in the TRIP Journal Article Database from 1980 to 2006 (Maliniak and Powers 2015), which were the only data to which Colgan had access at the time he published this paper.

TRIP’s most recent journal article data run through 2012 and actually reaffirm and strengthen some of Colgan’s key findings.  Specifically, the data from 2007 to 2012, which Colgan could not include in his analysis, contain a higher proportion of articles that employ quantitative methods and a lower proportion of “formal theory” and “analytic non-formal” articles (see Figure 6).  These newer data strengthen Colgan’s claim about the “gap” between what is published in top IR journals and what is taught in IR field seminars.  However, the newer data mildly weaken a second claim in the paper, where Colgan demonstrates that the types of articles published in the top-ranked IR journals (IO, ISQ, IS, and WP) are systematically different from IR articles published in the four “general political science journals” (APSR, AJPS, BJPS, and JOP).  While expanding the data’s date range strengthens the claim that “taught IR” is different from “published IR,” in this latter case the update reduces the differences between IR and general interest journals.  Since 2006, articles published in IO, ISQ, WP, and IS have become more similar to articles published in the general political science journals.  Specifically, the IR journals have published a higher proportion of quantitative and positivist articles than in the pre-2006 era and have reduced the number of descriptive, formal, and analytic non-formal theory papers.  So, by this measure, articles published in top IR journals look more like articles published in general interest journals, shrinking (but not eliminating) the purported gap between IR and political science.

Figure 6. Methods of Frequently-Taught Articles vs. All Published Articles

*Colgan’s original table amended with new column for updated TRIP Journal Article Database






[Table: rows are article types, including “Analytic & Non-Formal” and “Policy Analysis”; columns are “Frequency Taught,” “Percentage Published” (original TRIP data, 1980-2006), and “Percentage Published” (updated TRIP data through 2012).  Cell values did not survive extraction.]
Colgan targets the “IR field seminar” in the top 65 U.S. PhD programs as the most relevant source for data on impact through teaching.  This seems a good place to start, but we note several features of the sampling strategy that inhibit valid inferences and/or allow more noise than one might like around TIS estimates.  Most obviously, the sample could be improved by collecting more than just one syllabus from one core IR course taught in one semester at each university at some point over the sampling period.  The TIS sample is likely not representative of what any given department might include in its IR field seminar (much less its PhD curriculum) because the content of that syllabus almost certainly varies with the specific tastes of the instructor for the selected semester.  For example, at UCSD in the early 1990s the “core seminar” in IR was actually two courses (the “system” course and the “unit” course) taught by some combination of John Ruggie, Lisa Martin, Peter Cowhey, David Lake, and Peter Gourevitch.  Unsurprisingly, the syllabi differed considerably depending on the instructor!

Since the large variation observed in the olden days at UCSD did not seem like a unique situation, we emailed faculty members at two of the “outlier” departments (UVA and Northwestern) as measured by TIS 1.0.  For the semester he analyzed, Colgan catalogued all the readings and concluded that “one of these universities, Northwestern, does not teach a single one of these canonical readings in its core IR course.”  But had Colgan sampled the syllabus from the very same course taught the following semester, he would have found that Karen Alter teaches 9 of the 10 most popular readings found on other IR syllabi, rather than the 0 of 10 he found on the earlier syllabus.  Similarly, while Colgan accurately reports that the UVA syllabus he analyzed “assigns almost nothing from JCR or AJPS,” a second syllabus for the same course (co-taught by John Owen and Todd Sechser), from a different year within the sampling frame, assigns seven different articles from JCR or AJPS.  Thus, the current measures provided by TIS are unnecessarily narrow and likely suffer from various types of measurement error.  Colgan could improve the validity of TIS by including all IR field seminar syllabi used by a given department over any specified period of time.

Since TIS seeks to measure impact as represented by what is taught within the discipline, one could broaden the measure (or proliferate related measures) in a number of other ways, including, but not limited to: (1) measuring what readings are assigned in all PhD seminars rather than simply the field seminar; (2) expanding the date range to generate a larger sample; (3) analyzing what is taught in MA or BA courses to capture impact earlier and more broadly in the educational process; (4) increasing the number of PhD programs covered beyond the U.S. top 65, or beyond the U.S.  Incidentally, when we asked all IR scholars how they organized their PhD field seminar in IR, their responses (254 of them) were broadly consistent with Colgan’s findings -- instructors tended to organize the seminar around the big paradigmatic ideas that show up most frequently in Colgan’s TIS measure.  So, there is some additional indirect evidence that Colgan’s results from the top PhD programs are representative of the broader population. While the sample could certainly be improved, Colgan’s decision to start with top PhD programs makes good sense and helps to establish proof of concept.
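One simple way to operationalize several of the improvements listed above is to treat a work’s teaching score as the share of all sampled syllabi that assign it, pooling every syllabus a department used over the sampling window rather than selecting a single one per department.  This is our own frequency-based simplification for illustration, not Colgan’s exact TIS formula, and the syllabus data below are invented:

```python
from collections import Counter

def teaching_share(syllabi):
    """Fraction of sampled syllabi assigning each reading.

    `syllabi` is a list of sets of assigned readings.  Pooling several
    syllabi per department (rather than one) dampens the
    instructor-specific idiosyncrasies discussed above."""
    counts = Counter()
    for readings in syllabi:
        counts.update(readings)
    n = len(syllabi)
    return {work: cites / n for work, cites in counts.items()}

# Three hypothetical field-seminar syllabi:
syllabi = [
    {"Waltz 1979", "Keohane 1984", "Wendt 1992"},
    {"Waltz 1979", "Fearon 1995"},
    {"Waltz 1979", "Keohane 1984"},
]
shares = teaching_share(syllabi)
print(shares["Waltz 1979"])    # → 1.0 (assigned on every syllabus)
print(shares["Keohane 1984"])  # assigned on 2 of 3 syllabi
```

A score built this way inherits the sampling frame directly: adding MA/BA courses, more years, or more programs simply enlarges the denominator, which is why the sampling improvements above matter so much for the measure’s stability.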

The Challenge of Measuring Scholarly Impact: Triangulating What Counts

Whether citation counts should be used to evaluate scholarship is an ongoing debate, but compared to Colgan’s TIS or reputational surveys, citation counts are likely to be more widely applicable.  If one were sponsoring a competition for a lifetime achievement award in IR, or even a named professorship at a top 10 university, then a top 20 ranking on the TRIP survey or a top 20 ranking on a new TIS index might be modestly helpful.  But for the rest of us who are trying to make decisions about whether to tenure someone who received his or her PhD a few years ago, TRIP and TIS are not all that useful, as both are extremely skewed toward the “top end” of the distribution of scholars.  Even if we dramatically expanded the number of syllabi, the number of courses covered, and the number of institutions covered, assistant professors would rarely appear.  So, while TIS may be valuable for some things, including promotion to full professor at universities with very high standards, it is not all that useful as a substitute for citation metrics at tenure time, which is the decision point when such metrics likely matter most -- for good or ill.

While such metrics are unlikely ever to replace the subjective judgments of colleagues and external evaluation letters, they are systematically collected and over time we have learned more about the types of omissions and biases that are currently present in such measures.  We know that citation counts do not tell the full story about the quality or impact of any piece of scholarship.  This is why we can all benefit from the development of multiple different metrics, including Colgan’s pioneering work on the “taught discipline.”  We encourage Colgan and his fellow travelers to continue improving TIS, since no single measure that we have today is sufficient to illuminate all the types of impact we might be interested in measuring and/or encouraging.

