I love this comic. It so sums up dissertation writing for most PhDs. We spend years in classes, preparing for qualifying exams, submitting grant proposals, conducting preliminary research, defending our research proposal to our department, conducting the actual dissertation research, only to come back home and feel as though there is no way we can synthesize all of this data into 200 or 300 pages of text. Come to think of it, it does seem quite impossible. And dare I say… impractical? If you are like me, preparing to go from PhD to industry rather than from PhD to academia, then coming back from the field to write 250 pages of text might seem quite tiresome and not worth your time. This blog post is dedicated to demystifying the data analytics process for large volumes of anthropological data. In my next blog post, I will be questioning the importance and/or impracticality of writing a dissertation. But for now, let’s get to the nitty-gritty of ethnographic data analysis!
Demystifying Dissertation Data Analytics for Anthropologists
Not all PhDs in anthropology have been trained in data analytics. I’m actually realizing this just now as I return from the field and hear all sorts of comments around my department (and others), such as, “What do I do with all of this data?” or “I hope there is a dissertation among these pages…?” Well, of course there is a dissertation among all of that work! But without the right tools, we are perfectly capable of taking great research and producing a shoddy dissertation through poor data analysis.
Before starting my PhD, I worked for an international development contractor in Washington, DC. I was part of the Research & Evaluations team, and essentially I supported the team in analyzing their data when they came back from the field. I was a fairly young employee, so I was not yet assigned to international data collection for the USAID and private sector projects that we carried out. I did, however, get to work with some awesome data sets (mostly interviews), as well as design and administer many large-scale government surveys. While I was not working with as massive a data set as I am dealing with now (15 months of uninterrupted field research + several months of online/field research over the past 4 years… all part of the same project), I was still dealing with loads of data that all needed to tell one thoughtful and highly analytical story. I have approached organizing and analyzing my dissertation data the same way as before, just at a much larger scale. And truly, these evaluation techniques have proven quite useful for my data analysis. I hope you will find these evaluation data analysis skills applicable to your own research as well:
- Organize your data sets. What I mean by this is separate your diary notes from your field note jottings, from your survey responses (and break these down as well if you had multiple surveys), from your interviews, from your focus group recordings, and from any other data sets you have. Anthropological interviews tend to be semi-structured or even unstructured, which can make grouping them into one systematic dataset difficult or even unhelpful. If we are dealing with 100 interviews that all have different questions, then there is no way to look at them as one systematic dataset. Instead, we need to read them carefully and code them ourselves. If, however, you are dealing with multiple interviews that are 100% (or close to 100%) structured the same, then put those interviews together as one dataset.
- Choose a mixed-method data analysis software. These are often labeled simply as qualitative data analysis software. Unless you absolutely need more than one, I recommend picking a single package so that all of your data stays together in one easily accessible place. Depending on the complexity of my datasets, I use NVivo or Dedoose. Dedoose is actually quite good, although it hasn’t received much attention in the anthropology world. It is a low-cost, online platform that allows you to analyze your entire data set, from interviews to surveys, in one location. I find it quite effective, especially if you plan on cold coding your data.
- Upload your data sets. At this point, you might want to give yourself a quick tutorial in whatever analytical software package you choose. Spend an afternoon really going through it. It will save you the headache of realizing later that, for example, your diary entries needed to be uploaded one by one rather than as a single document for your analysis to be most effective. If available, use your university’s social science resource center and take advantage of any in-person tutorials that might be offered. I cannot emphasize enough how much you should have a solid understanding of how to use your software before diving into the analysis!
- Code your data. I am amazed at how few people actually code their data. I have heard this not just in academia, but in industry as well. People seem to think that patterns just magically emerge and that there is only a select group of people in the world who can “see” these patterns amidst copious amounts of data. Well, that is just plain silly! Of course, finding patterns takes some practice, and you have to be used to analyzing data for patterns to present themselves more intuitively, but I believe anyone can learn how to find patterns once they understand how to code data. It’s really quite simple. First, start to think about themes in your research. Think big. Now is not the time to be creating subcodes. You will be going back over your data a second, third, or even fourth time to create subcodes and sub-subcodes. You know your data better than anyone. By the end of your research, you should be able to create some large codes without having to read through your data. But you will find, as you do read through your data, that you will likely add more. I want to use my data as an example to help you understand how big these codes should be. My dissertation research looked at the politics of cultural heritage in Albania and how heritage is being positioned, neglected, or erased in Albania’s nation-building process towards European Union integration. I know, super wacky, right? Well, I’m a trained archaeologist, and this kind of heritage research is actually not so crazy in the archaeology world. So for codes, I had about 10-15. I spent an afternoon thinking about what these major codes would be. I started with types of heritage: 1) Museums, 2) Historic Cities, 3) Archaeological Sites, 4) Memorials and Monuments. Then I moved to heritage by time period: 5) Ottoman heritage, 6) Greco-Roman heritage, 7) Communist heritage.
Finally, I separated it by type of organization or interest group: 8) Albanian government, 9) Foreign government, 10) Intergovernmental agency, 11) NGOs and Foundations, 12) Communities. I assigned a color to each one and started to go through my data page by page, coding/highlighting my data (sometimes using multiple colors for the same excerpt). This should be done within the software, since it saves your codes and the accompanying data. This first pass took a couple of weeks, and a few new codes emerged that I hadn’t originally thought of. For example, I found that intangible heritage needed to be added to my list of types of heritage. Once you have applied the major codes, you can use your software to compile the data for each code. Now you can start really analyzing, as you see subcodes emerge within each code. For example, under Communist heritage, I had subcodes such as 1) formerly persecuted, 2) corruption, 3) nepotism, 4) prisons, 5) tunnel systems, etc. You can only find your subcodes if you continue to go through your data in organized chunks. From here, I could take the 10 pages of data I had that dealt with corruption surrounding communist heritage and see this as a clear pattern. So, my overall points about coding are that 1) it is necessary and must be done, 2) take your time with it, 3) only you can know what your codes will be, so trust in yourself and in your ownership of your data, and 4) allow yourself to be surprised by your data! Allow yourself to find codes and subcodes that you didn’t know were there. But don’t force anything to emerge if it doesn’t. Your data is what it is. It is your responsibility to analyze it as ethically and accurately as possible.
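If you are comfortable with a little scripting, it can be reassuring to sanity-check what your software is doing under the hood. The grouping and tallying of coded excerpts is conceptually very simple, and can be sketched in a few lines of Python. (The excerpts below are hypothetical stand-ins; the codes are drawn from my own list, but substitute whatever your project uses.)

```python
from collections import defaultdict

# Hypothetical coded excerpts: (excerpt text, list of codes applied).
# One excerpt may carry several codes, just like highlighting a passage
# in multiple colors.
coded_excerpts = [
    ("Interview note on a communist-era tunnel...", ["communist heritage", "tunnel systems"]),
    ("Diary entry on a neglected Roman site...", ["greco-roman heritage", "archaeological sites"]),
    ("Focus group discussion on museum funding...", ["museums", "albanian government"]),
    ("Interview on property disputes...", ["communist heritage", "corruption"]),
]

# Group excerpts under each code, so each code's data can be read together
# when you go back to look for subcodes.
by_code = defaultdict(list)
for text, codes in coded_excerpts:
    for code in codes:
        by_code[code].append(text)

# Tally how much data each code has attracted.
counts = {code: len(excerpts) for code, excerpts in by_code.items()}
print(counts["communist heritage"])  # 2
```

This is essentially all the software does when it "compiles the data for each code"; the value of the tool is that it does it for thousands of excerpts and keeps everything linked back to the source documents.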
- Map your codes and subcodes. I love this part of the data analysis process! It is the most fun (in my opinion). You can do this however feels best to you, but I find visualizing the codes all in one place the best way to start thinking through my data. I like to go to an office supply store and just buy a giant piece of paper (I’m old school like that!) You know, one of those you use for a design board? If you have a huge whiteboard, even better. Using your software, you can now see which codes are the largest. It will show you that you have 500+ excerpts dealing with “archaeological sites” and only 100 dealing with “museums,” for example. This is helpful knowledge. You can also see which codes interlink. Communist heritage, memorials, and corruption interlinked a lot in my work, for example. I placed bubbles in the center of the paper and labeled each with my most used codes. There were 3 of them. Then, I started to add branches outward with other connecting codes and major subcodes. For now, I left off subcodes (or even codes) that had very little data attached to them. This sort of “minor” data can work its way back into the mapping process later. Once it was done, I could visualize my data on a beautiful map. You can even print accompanying photos and add them to the bubbles for some additional creative analysis. This allows you to see the bigger picture of all of that data and start to identify the major trends that are emerging. This is where you can really visualize what overlaps, identify areas of conflict in your research, and see “abnormalities” that might make for a more complex and interesting data analysis. For example, I found that Greco-Roman heritage was largely neglected in Albania, but one site was heavily promoted and invested in. What did this mean for the larger narrative of my research?
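The "interlinking" your software reports is, at heart, just a co-occurrence count: how often two codes land on the same excerpt. For the curious, here is a rough sketch in Python of what that computation looks like (again, the excerpts and codes here are made-up illustrations, not real data):

```python
from collections import Counter
from itertools import combinations

# Hypothetical excerpts, each represented only by the codes applied to it.
excerpt_codes = [
    ["communist heritage", "memorials", "corruption"],
    ["communist heritage", "corruption"],
    ["museums", "albanian government"],
    ["communist heritage", "memorials"],
]

# Count every unordered pair of codes that appears on the same excerpt.
pairs = Counter()
for codes in excerpt_codes:
    for a, b in combinations(sorted(codes), 2):
        pairs[(a, b)] += 1

# The heaviest pairs are the links to draw first on your paper map.
print(pairs.most_common(2))
```

The pairs with the highest counts are exactly the thick connecting branches on the big sheet of paper: they tell you which bubbles belong near each other before you ever uncap a marker.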
- Create an outline for your data. This could actually end up being the outline of your dissertation (or report). But it doesn’t have to be. It’s just a way to start taking the map and putting it into words using creative titles. Figure out how many chapters (ideally) you want your final piece to have, and then go with that (but be flexible and allow for one or two more or fewer). Now, start finding a way to fit all of your important codes, subcodes, trends, and unusual case studies into one concrete 1-2 page outline. This is often the aha! moment where you can see how all that amazingly complex data actually fits together in one cohesive narrative. At this point, your narrative usually emerges naturally.
- Write your narrative. This is not a final piece. It is just a way for you to now articulate, in enriched prose, what is in your outline. What is the major story here? First, write a paragraph for each chapter: what it is about, how it is divided, which case studies you will be going through, etc. Then, reading all of the chapter paragraphs together, write one paragraph that sums all of this data up into one cohesive narrative. I can tell you, for example, that I didn’t know my research was about Albania’s European Union integration process until I coded my data, mapped it, created a navigable outline, and started to write about what each chapter was about. I then stopped and thought about how each chapter interlinked. And it suddenly became clear. If it doesn’t, keep working your data. Also know that anomalies in datasets exist for a reason. The world is a complex place. Your narrative might work 95% of the time, but there will always be exceptions. Don’t ignore your exceptions because they don’t fit into your narrative. Work them in. Understand them. Uncover why they are there. This will allow you to tell a more accurate version of your data and will also allow for a really cool and complex narrative at the very end.
- Rework your data and have fun. Finally, just remember to keep working your data before you start to write. In some cases, going through years of data can take months. Allow yourself (and your data) the time you both deserve. Keep working it until it all makes sense, anomalies and all! And most importantly, have fun with this process. No one else will ever get to do this with your data. You will become the utmost expert on this topic. So, enjoy the process of uncovering your treasures and visualizing the fruits of your labor! Best of luck and happy analyzing!