On December 18, 1975, in a meeting with senior staff, Secretary of State Henry Kissinger decided to ‘raise a little hell’. He was furious with what he called their ‘incomprehensible’ decision to include sensitive information in a diplomatic cable.
“I want to raise a little bit of hell about the Department's conduct in my absence. Until last week I thought we had a disciplined group; now we've gone to pieces completely. Take this cable on East Timor. You know my attitude and anyone who knows my position as you do must know that I would not have approved it. The only consequence is to put yourself on record. […] What possible explanation is there for it? I had told you to stop it quietly. I didn’t say you couldn’t make a recommendation orally. […] It is incomprehensible. It is wrong in substance and in procedure. It is a disgrace.”
Kissinger then went on to complain that the cable would undoubtedly leak: “It will go to Congress too and then we will have hearings on it. [That will] leak in three months and it will come out that Kissinger overruled his pristine bureaucrats and violated the law. […] You have a responsibility to recognize that we are living in a revolutionary situation. Everything on paper will be used against me.”
‘The secret life of Henry Kissinger; minutes of a 1975 meeting with Lawrence Eagleburger’ by Mark Hertsgaard, The Nation, October 29, 1990.
About the project
Scarcity of information is a common frustration for historians. This is especially true for researchers of antiquity, but not exclusively so. For students of twentieth- and twenty-first-century history the opposite problem is increasingly common: overwhelmed by a deluge of information and confronted by a vast field of haystacks within which they must locate the needles (and, presumably, use them to knit together a valid historical interpretation), historians have already struggled with what is now understood as ‘big data’.
Exhaustive efforts by historians to approach vast troves of information have often employed a traditional ‘close-reading’ methodology in which each author’s thesis is illustrated by hand-picked, ostensibly representative samples, presented as valid proof of the underlying argument. Ensuring that such examples are indeed representative becomes increasingly difficult as the size of the archive grows. As larger and larger archives of human cultural output are accumulated, historians are beginning to employ other tools and methods, including those developed in fields such as computational biology and linguistics, to overcome ‘information overload’ and facilitate new historical interpretations.
This paper applies ‘big data’ computational techniques like those employed by Michel and Nelson to the Digital National Security Archive (DNSA)’s recently released Kissinger Collections, comprising approximately 17,500 meeting memoranda (‘memcons’) and teleconference transcripts (‘telcons’) detailing Kissinger's correspondence during the period 1969-1977: it is a first effort at ‘Diplonomics’. The declassification of the Kissinger material by the State Department and its hosting on the DNSA’s Kissinger Collection web site present both an opportunity and a challenge for historians. While having this large volume of information online is valuable, the restriction to a web-based ‘search’ interface can render it of limited use to researchers. The application of more sophisticated computational techniques permits a comprehensive analysis of the historical records of the Kissinger collection at the DNSA, and facilitates meaningful historical interpretations. While this new way of looking at history is based on data, unlike other methods of historical analysis (e.g. 'cliometrics') it is the variations in the content of the text itself, rather than economic data, that are measured.
Topic Modeling
Topic Stream Graphs
Topic Area Graphs
Topic Force Graphs
Instead of a more traditional x/y axis graph, each memcon in the archive and its relation to the 40 topics of the topic model are represented here using a 'force-directed' diagram. More than the prior figures, this graph is off-putting at first and requires a bit of orientation to understand. Each document is represented by one of a network of small circles, connected by lines to the larger circles (the topics) and placed at a distance from them according to its association with each topic. The size of the topic circles and their textual labels reflects the total weight afforded to them by the documents in the archive, and the color of the small document circles and connecting lines reflects the classification status of each document: blue dots and lines represent documents with ‘Top Secret’ classification status, yellow dots are ‘Secret,’ and pink dots are ‘Unclassified’. The 40 topics of the topic model are displayed as grey circles with text. Documents sharing similar topic weightings are clustered together, and placed at a relative distance from those topics.
This graph elegantly demonstrates in one view the interrelated ‘clusters’ of documents by proximity, their classification status, and the complex ways in which all documents relate to their constituent topic(s) and to one another. Even more than the line/area graphs, this image synthesizes the information gathered through metadata analysis, n-gram counting, and topic modeling to present inter-relationships not always readily apparent from a tabular view of the underlying data. The placement of documents and topics related to matters of high military or national security significance in the bluish upper left region is unsurprising, as is the placement of ‘laughter’ so far on the other side of the graph.
It is also notable that this upper left-hand area of the graph contains those countries at the heart of Nixon and Kissinger’s vaunted “triangular diplomacy.” The topics concerning the Soviet Union, China, Vietnam, and related matters all occupy a close-knit area of the graph, suggesting that when those topics were mentioned they were often mentioned together. The graph also reveals another fascinating topic, one with a unique significance. The “Laughter” topic is based upon those documents in which the transcriber literally placed the phrase “[laughter],” marking jovial, lighthearted moments of Kissinger’s correspondence in which the participants had a chuckle. A historian would expect these sorts of emotional expressions to occur in inverse proportion to the gravity of their respective topics (for example, the least ‘laughter’ during those negotiations in which relations were at their most sensitive, tense and/or adversarial), and the placement of the “Laughter” topic at the furthest possible point from topics relating to the Soviet Union, China and Vietnam negotiations validates this interpretation. The placement of the ‘Cambodia’ topic outside that military arc, much closer to ‘Laughter’ than, say, ‘Vietnam’ or ‘Soviet,’ is very interesting, suggesting that the archive may contain only Cambodia documents of a less contentious or more generic nature than those of the other topics. This comparative proximity, clearly visible in the graph, reflects an uncharacteristically ‘jovial’ slant in the content of the Cambodia documents in comparison to those of other topics of similarly grave military importance.
It is an odd result that supports other findings that the archive’s “Cambodia” material on which this topic is based is likely a hand-picked, sanitized and non-representative selection of only the more congenial exchanges regarding Cambodia, specifically excluding tense and difficult situations. Memoranda detailing planning and execution of disavowed military incursions, involvement in the installation of the Lon Nol regime, and other incidents are largely absent from the archive. Computational techniques, combined here with a historian’s subjective assessment of the inapplicability of ‘laughter’ to topics like Cambodia, have thus uncovered a strong relationship between a document’s classification status and its subject matter. Further interpretations of the proximity of the 'laughter' topic (among others) to these geopolitical foci are detailed in greater depth in the written paper.
Topic Modeling performed using 'MALLET Topic Modeling Toolkit.'
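The construction described above (documents linked to topics by weight, topic circles sized by total weight, document circles colored by classification) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the function name `build_force_graph`, the `min_weight` cutoff, the input layout, and the three toy documents with their weights are all invented for the example and are not drawn from the archive or from MALLET's output format.

```python
from collections import defaultdict

# Hypothetical toy input: every identifier and weight below is invented for
# illustration. Each document carries a classification and per-topic weights
# of the kind a topic model produces.
documents = {
    "doc_a": {"classification": "Top Secret",
              "topics": {"Soviet": 0.6, "China": 0.3, "Laughter": 0.1}},
    "doc_b": {"classification": "Secret",
              "topics": {"Vietnam": 0.7, "Soviet": 0.3}},
    "doc_c": {"classification": "Unclassified",
              "topics": {"Laughter": 0.8, "Cambodia": 0.2}},
}

# Color scheme as described in the text above.
CLASSIFICATION_COLORS = {
    "Top Secret": "blue",
    "Secret": "yellow",
    "Unclassified": "pink",
}


def build_force_graph(docs, min_weight=0.05):
    """Return (nodes, links) suitable for handing to a force-directed layout.

    min_weight is an assumed cutoff that drops negligible document-topic
    links so the diagram stays readable.
    """
    nodes, links = [], []
    topic_totals = defaultdict(float)

    for doc_id, doc in docs.items():
        color = CLASSIFICATION_COLORS.get(doc["classification"], "grey")
        nodes.append({"id": doc_id, "kind": "document", "color": color})
        for topic, weight in doc["topics"].items():
            # A topic's size reflects the total weight given to it by all documents.
            topic_totals[topic] += weight
            if weight >= min_weight:
                links.append({"source": doc_id, "target": topic, "weight": weight})

    for topic, total in topic_totals.items():
        nodes.append({"id": topic, "kind": "topic", "color": "grey", "size": total})

    return nodes, links


nodes, links = build_force_graph(documents)
```

The resulting node and link lists could then be passed to any force-directed layout; documents that load heavily on the same topics are pulled toward the same topic circles, which is what produces the clusters discussed above.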
Memcons: Static Topic Model Force Graph
Memcons: Interactive Topic Model Force Graph
Telcons: Interactive Topic Model Force Graph (may take a while to load)
Individual/Organizational Influence
Word Correlation
Word Correlation Force Graphs
'Bombing'
This study has uncovered new quantitative evidence suggesting a possible absence of material regarding a number of controversial military topics in the DNSA’s Kissinger Memcons and Telcons collections. This evidence comes from comparisons between linguistic patterns in the memcons and in the telephone transcripts, which have illuminated a number of significant discrepancies. First and perhaps most strikingly, the statistical frequency of the words “Cambodia” and “Vietnam” when measured in correlation with the word “bombing” differs greatly between the two channels of communication. When Kissinger and his associates used the word 'bombing' in official meetings, it was associated much more with words related to 'Vietnam' than in the telephone conversations, in which 'bombing' was seen in greater statistical correlation with the names of other countries in Indochina (Laos, Thailand and Cambodia). Click on the image for a zoomed-in view.
'Bombing' Word Correlation Force-Directed Graph
This is an interactive 'd3' version of the force-directed word correlation analysis of the word 'Bombing'. Currently, the diagram does not take 'edge weights' into account, so the nodes within each cluster are placed inexactly. Until the 'edge weight' code is completed, the static graph above is far more accurate and 'stable'.
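To illustrate what taking 'edge weights' into account could mean, one common approach is to give strongly correlated word pairs a shorter target link length, so that they settle closer together in the layout. The sketch below is an assumption about how that might be done, not the project's planned implementation: the function name `link_distances`, the linear scaling, the pixel bounds, and the sample weights are all hypothetical.

```python
def link_distances(edges, min_dist=30.0, max_dist=200.0):
    """Map each edge weight to a target link length.

    edges is a list of (source, target, weight) tuples. The heaviest edge
    gets min_dist, the lightest gets max_dist, and the rest are scaled
    linearly in between. The distance bounds are arbitrary assumptions.
    """
    if not edges:
        return {}
    weights = [w for _, _, w in edges]
    lo, hi = min(weights), max(weights)
    span = hi - lo
    distances = {}
    for source, target, weight in edges:
        # If every weight is identical, place all links at the midpoint.
        t = 0.5 if span == 0 else (weight - lo) / span
        distances[(source, target)] = max_dist - t * (max_dist - min_dist)
    return distances


# Hypothetical correlation counts, invented for illustration only.
sample_edges = [
    ("bombing", "vietnam", 10),
    ("bombing", "laos", 2),
    ("bombing", "halt", 6),
]
distances = link_distances(sample_edges)
```

The computed distances could then be handed to whatever per-link distance setting the layout library exposes, so that node placement within each cluster reflects the strength of each correlation.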
'Bombing' Word Correlation Interactive Force-Directed Graph
'Cambodia'
In addition, words related to violence (bombing, attack, invade, etc.) were more likely to be seen in correlation with the word "Cambodia" in the Telcons than in the Memcons, which displayed a greater correlation frequency between 'Cambodia' and words related to laughter. Click on the image for a zoomed-in view.
Word Correlation Analysis performed using 'AntConc' by Laurence Anthony.
'Cambodia' Word Correlation Force-Directed Graph
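The word-correlation comparisons in this section rest on counting which words appear near a chosen word in each collection. A minimal sketch of that idea follows; it is a simplified stand-in, not AntConc's own algorithm or settings. The function names `collocates` and `compare_channels`, the tokenizer, and the window size are assumptions, and the two sample strings are invented placeholder sentences, not quotations from the archive.

```python
import re
from collections import Counter


def collocates(text, node_word, window=5):
    """Count the words found within `window` tokens of each node_word occurrence."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node_word:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def compare_channels(memcon_text, telcon_text, node_word, targets, window=5):
    """Return {target: (count_in_memcons, count_in_telcons)} for each target word."""
    memcon_counts = collocates(memcon_text, node_word, window)
    telcon_counts = collocates(telcon_text, node_word, window)
    return {t: (memcon_counts[t], telcon_counts[t]) for t in targets}


# Invented placeholder text, NOT taken from the archive.
memcon_sample = "The bombing of Vietnam was discussed. Bombing in Vietnam continued."
telcon_sample = ("The bombing of Cambodia and Laos was raised. "
                 "The bombing near Cambodia.")

result = compare_channels(memcon_sample, telcon_sample, "bombing",
                          ["vietnam", "cambodia", "laos"], window=2)
```

Raw counts like these are not normalized for corpus size; a real comparison between two collections of different lengths would need relative frequencies or an association statistic before any discrepancy could be interpreted.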
Sentiment Analysis
Sentiment Analysis Line Graphs
Sentiment Analysis performed using 'LIWC2007' by James Pennebaker, et al.
This project was generously supported in 2012-2013 and 2013-2014 by a Provost's Digital Innovation Grant from the Provost's Office of the Graduate Center of the City University of New York.