An interview with Kate Wittenberg and Karen Hanson of Portico
Originally published February 2024. Reposted from The Scholarly Kitchen with author permission.
Continuing our Kitchen Essentials series of interviews with leaders of infrastructure organizations, this week we are speaking with leaders of preservation initiatives. Today we’re hearing from Kate Wittenberg and Karen Hanson, the Managing Director and Lead Research Developer, respectively, of Portico, the community-supported preservation archive that safeguards access to e-journals, e-books, and digital collections. Full disclosure: Portico is a service of ITHAKA, where I work, and Kate and Karen are colleagues.
Please tell us a bit about yourselves — your roles at Portico, how you got there, and why you embarked on a career in research infrastructure?
Kate: I am the Managing Director of Portico. Before taking on this role I worked in scholarly publishing – first as Editor-in-Chief at Columbia University Press, and then as Director of the Electronic Publishing Initiative at Columbia, a collaboration between the Press and the library for developing born-digital scholarly publications. Having worked for many years at the intersection of academic libraries, scholarly publishing, and digital scholarship, I welcomed the opportunity to contribute to the preservation of the work being produced by scholars for future generations.
Karen: I’m the Lead Research Developer at Portico. I look at our workflows for processing and managing content and think about what we need to do to preserve the content for the long term. This includes investigating new and complex forms of content. It took me a few career jumps to get here. I started with a zoology degree and worked in a lab. I liked studying zoology, but didn’t enjoy lab work. Next I learned how to develop websites. I liked the continuous learning and problem solving but felt like I wanted to connect it to something. I found myself drawn to research environments because imagining what’s possible and trying to understand how things came to be or how they work keep me engaged and motivated. I eventually did a library science degree, partly because I couldn’t decide what to specialize in, and I liked the idea of supporting learning without having to teach. Once I started working in an academic library environment, I finally found my home in digital preservation research: I dig into lots of different kinds of material, it’s cross-disciplinary, and I get to do research while also supporting it.
What do you like most and least about working in research infrastructure?
Kate: While it is not the first thing that most people think about, I believe that in many ways infrastructure is the key component of the scholarly communications system. As scholarship and publishing become increasingly innovative and complex, the infrastructure that supports and enables this work must constantly evolve in response to the needs of the changing environment. For example, as scholars create work that includes elements such as video, software, datasets, and annotations, they will need publishing and preservation infrastructure that is able to handle this type of publication. This creates an exciting challenge for those of us working in the field as we must be informed about what is going on in key areas of scholarship and publishing and then translate that insight into creating infrastructure to support those developments.
Karen: I think the story of how I got here reveals some of what I like about it. In digital preservation, I’m constantly learning new things and trying to peel back layers of how different things work, while also trying to imagine what will be possible in the future. I find the work really interesting and rewarding. There is also a terrific community. Many of us have similar challenges and the best way to accomplish our goals is to work together. So I enjoy that sense of collegiality and collaboration.
On the negative side, research infrastructure work (and digital preservation in particular) tends to be invisible to most people unless something goes wrong, and so the complexity and cost of sustaining this infrastructure can be underestimated. People may assume someone is taking care of it. Digital preservation specifically can seem like less of a priority when those responsible for generating content are focused on increasing or supporting users, keeping up with innovation, and trying to create interesting new content. Meanwhile, it can be best for preservation outcomes if people act while things are thriving, but it may be hard to make the case that it should be a priority. You know that if you fail to persuade people, it will likely be too late, or more complicated, by the time they realize they need it. That can be painful when we see it happen, and unfortunately it does sometimes happen.
Based on your own experiences, what advice would you give someone starting, or thinking of starting, a career in preservation infrastructure?
Kate: I think it is important to feel comfortable and satisfied having a less visible role in the scholarly communication/preservation ecosystem. One needs to enjoy the process of listening, learning, and then building systems that support the work of others in important ways but not being front and center in terms of visibility.
Karen: I agree with Kate’s comment – I do like the sense of mission and the ability to contribute to scholarly infrastructure without having to be in the spotlight. There are a lot of different roles, so people’s experience will vary, but from my technical perspective, I’d add that it helps to be quite adaptable, curious, willing to learn and experiment, and also comfortable with quite a bit of uncertainty. As the digital content and environment evolve, you may need to keep adapting your thinking as you try to make choices that will minimize the risk of loss based on the available research, tools, and budget. That’s where having a great community comes in, so that you can learn from and support each other, and also why having the option of more than one preservation service is a good thing – each one makes different choices to manage the risks and uses different approaches to collect content, and we won’t all be right every time. That’s understood, and so collaboration over competition is critical in this field. To help navigate all of this, organizations such as the Digital Preservation Coalition and Open Preservation Foundation are incredibly valuable. They generate and share excellent resources and create a community for people working in digital preservation at all levels, including those who are just starting out. I’d recommend looking into these organizations if you’re new to the field; they’re terrific!
What sort of infrastructure does Portico provide, and who are your users?
Kate: Portico provides technical and organizational infrastructure to support digital preservation of scholarly materials in a dark archive environment. Our users (we call them participants) are libraries, academic publishers, and, increasingly, archives. When our content is made accessible in a trigger event, our users are also individual scholars and students in the academic community.
How is Portico sustained financially?
Kate: Portico is a not-for-profit community-supported service that is sustained by financial support from libraries and publishers who contribute to the costs of preserving the scholarly record.
As the leader(s) of a preservation infrastructure organization, what do you think are the biggest opportunities we’ve not yet realized as a community — and what’s stopping us?
Kate: I think that we need to raise the level of awareness about the importance of preservation throughout the academic community. So much content is at risk, but often preservation is simply not on the radar of leaders in library, publishing, and higher education organizations. The consequences of losing access to valuable content are so significant, yet because preservation is by its nature in the background it is often not understood, valued, or supported as much as it needs to be.
Karen: Building on Kate’s comment about awareness — through my research into complex content, it’s clear that the more complex the content gets, the more important it is to incorporate preservation-oriented thinking into the process as early as possible to improve the chance the content will be preserved well in the long term. Digital preservation researchers will continue to find ways to preserve things as best we can, even when there has been no thought to preservation in the creation process or tools that are used, but it would be fantastic to see preservation built into these. I think research infrastructure might be one domain where this seems somewhat plausible with the right advocacy and funding. So my dream is a much closer relationship and collaboration between preservationists, content creators, platform builders, and those who fund them.
Looking at your own organization, what are you most proud of, and what keeps you awake at night?
Kate: I am tremendously proud of the progress Portico has made in preserving a large amount of content in a secure environment using robust infrastructure and systems, and supported by an effective and long-term business model. I worry a great deal about the increasingly complex nature of born-digital scholarship and the difficulty of preserving it effectively in a scalable and cost-effective manner. We need to maintain the preservation of large amounts of more traditional published content while simultaneously learning how to handle the next generation of scholarly communication, which may look quite different from what we have done previously.
Karen: I would echo Kate on both points here. Something I’m also proud of – I’ve been here long enough to see various requests that don’t fit with our normal operations: libraries, publishers, or other organizations seeking help with a preservation challenge. If the request aligns with our mission and it seems like something positive for the community, we try to accommodate it. These are often one-off or short-term things to help avoid loss of specific collections, but there are times when members of our community highlight a broader gap that needs some examination. In these cases, if it looks like something we can help with, we will spend some time looking into what we could do to address it, or whether there might be a sustainable approach to cost recovery that would allow us to incorporate it into our preservation work. We have a motivated team that believes in what we do, and they all get involved with trying to solve challenging problems. We take the idea of being of and for the community very seriously, and I’m proud of that. What keeps me up at night? The increasing complexity of incoming content that Kate mentioned is up there on the list, though I try not to lose sleep over it. We’re involved in some research with NYU Libraries about preserving complex content, and that is helping us make informed decisions for improving our tools and workflows to help iteratively reduce the risk of loss. I see an AI question coming up next – I may have something to add there that worries me.
What impact has/does/will AI have on Portico?
Karen: On the positive side, my hope is we can semi-automate some of the routine configuration work we do for new content – have AI do a first pass and free up our experts to do more research and work on improving configurations for the increasingly complex content. The Portico team is starting to run tests to see if this could work, and if it does we may look for other cases where it could assist the process. On the negative side, I worry about the downstream effects of what publishers are going through with AI. For example, we’re seeing mass retractions related to paper-mill-generated articles, and improvements in AI will keep making these harder to detect. In our data, I note that publishers handle retractions in many different ways, making it difficult to ensure that retracted articles are appropriately labeled outside of the original platform to prevent them being cited as fact. While retractions are not a new problem, with AI the scale of the problem is growing. There is some work to standardize retraction processes, such as the NISO CREC Working Group and CrossMark, but these are still fairly new and adoption isn’t yet widespread. I’m not sure how big this problem will become in the meantime and what new processes we might need to deal with this. For example, if a journal is closed down after being overwhelmed by paper-mill papers, should we do anything different when we trigger for access? And what happens if a journal’s articles are identified as questionable after the publisher is gone? To the previous question – hosting harmful disinformation is definitely something that could keep me up at night. Anyway, this is just one example of how we have to adapt to AI as a downstream service, and I do worry that it will add to our workload when we already have a lot on our plate with figuring out how to preserve complex works. So I guess if AI can lighten our workload as an assistant, it could also add that burden right back.
What changes do you think we’ll see in terms of the overall preservation infrastructure over the next five to ten years, and how will they impact the kinds of roles you’ll be hiring for at Portico?
Kate: Portico has a small, agile, and highly experienced team that handles a very large number of digital journals and books. While preserving current scholarly content makes up the core of our work today, there are many challenges ahead, driven by the changes in scholarly communications. Content that makes up the scholarly record has become both more dynamic and less bounded. Its various components (for example, text, supporting datasets, and video) can reside in more than one repository, in more than one version. For such content to be preserved, we will need to develop new approaches and different processes. To do this we will need preservation professionals who are ready and able to adapt to a rapidly changing environment. This will require a willingness to innovate, an openness to questioning existing norms, and an ability to develop an understanding of the scholarly communications environment and how to support its changing needs over time.
Karen: I agree with Kate, and because of the complexity and networked nature of the content, we will likely need to do a better job of coordinating between archives. Some redundancy of material between archives is good for preservation because of the different approaches, but too much redundancy is bad for the environment and not a good use of the limited funding available. The fact is, no one archive can be the best at preserving every kind of content, and so we will likely need to specialize and depend on others for specific expertise such as software preservation.
Also, to Kate’s comment about making content more preservable at the time of production: perhaps not a prediction, but a hope for the future – I’d like to see a preservation mindset applied upstream as part of the publishing tools and processes to enable efficient preservation of complex materials. If that works out, new roles could emerge that involve applying digital preservation practices to the process of content creation. For the research I’m doing with NYU Libraries, we’re embedding preservation experts into publisher workflows and communicating with platform developers to see what it takes to incorporate preservation practices into these workflows and tools, and this may provide insight into what kinds of roles and training are needed to accomplish this. During this research we’ve noted similarities with accessibility improvements, in the sense that a lot of what needs to be done for preservation is about building good practices into the processes and tools.
Finally, I think our business model may evolve in the coming years given the increase in open access publishing, and the growth in variety and complexity of the content. We have been diversifying the kinds of content providers we work with. This means the content providers and their content may no longer fit into our current models, so we will need to consider how to approach that to ensure an inclusive service for providers with different kinds of business models and content.