Researchers investigating generative AI and scholarly publishing

A new study by Ithaka S+R seeks to gain insight into the technology’s potential to transform the production of academic scholarship.

The rapid rise of generative artificial intelligence (AI) has confronted the scholarly publishing world with the potential risks and benefits of using the new technology in the production of academic research and writing.

As the technology continues to gain popularity, Ithaka S+R, an education research firm, launched a new study last month to gain more insight into the implications of generative AI for scholarly publishing.

Over the next several months, researchers will interview about 15 “decision makers from the publishing sector, and others with subject expertise, on generative AI’s opportunities and risks,” according to a blog post about the project.

Generative AI “has potential for really radical transformative change, but it’s also moving very quickly,” said Tracy Bergstrom, Ithaka’s program manager for collections and infrastructure, who will help lead the new study. “Trying to not just keep up but also think about the implications has made it a very challenging space over the past year.”

Ithaka’s inquiries come as the entire higher education sector is grappling with how to approach generative AI.

Because generative AI is still so new (ChatGPT went public in late 2022 and had 100 million monthly active users within two months), there’s not much uniformity across higher education about whether or how students and faculty members should use it to aid in the completion of assignments and research papers.

Just 20 percent of universities have adopted a policy governing the use of AI in teaching and research, according to Inside Higher Ed’s latest annual provost survey. However, 63 percent of provosts said an AI policy is currently in the works at their individual campuses.

But in addition to following institutional rules, academic researchers under pressure to publish also have to make sure that if they use ChatGPT or a similar program to write a paper, doing so doesn’t violate a journal’s rules. While some journals don’t have explicit guidelines about generative AI, the Science family of journals, for instance, requires explicit permission from an editor to use any text, images, figures or data generated by AI.

In January, Ithaka published a report called “The Second Digital Transformation of Scholarly Publishing,” which was based on dozens of interviews conducted between March and June of 2023 with publishers, librarians, advocates, analysts, funders and policymakers “as the horizons of possibility and risk posed by generative AI were becoming apparent,” according to the organization’s website.

But in the nearly one year since those interviews, the capability of generative AI has grown, and it’s raising even more questions.

“We heard about generative AI as a forthcoming thing people were keeping an eye on, but we didn’t gather enough data at that point in time to say anything constructive about how generative AI tools in this space may push on these transformative processes we’re seeing at the moment,” Bergstrom said of the report Ithaka published in January.

As part of the new study, Bergstrom said she and her team have several critical questions they want to investigate about generative AI and scholarly publishing, including: How are generative AI tools going to be integrated? What are new areas in which generative AI tools might blow away or completely replace some of the older processes we see right now? How might the publishing industry be transformed because of generative AI tools? What are the most pressing ethical and market challenges around the tools?

Mohammad Hosseini, a research ethics and integrity expert and assistant professor at Northwestern University’s medical college, said that questions about the integrity of and trust in AI-generated content are important to answer before delving into the logistical aspects of integrating generative AI into the academic research process.

“I’m skeptical, but I’m not a luddite. I think we should use it for our benefit,” he said, noting that despite reproduction of research results being a hallmark of scientific integrity, it’s often difficult to replicate content produced by generative AI. “Given the kinds of mistakes and errors these systems make, every user needs to ask themselves if they can verify the accuracy of generated content and error and biases.”

And if faulty AI-generated content makes it through the publishing process undetected, it becomes part of the vast information pool generative AI programs will draw on to generate answers to future queries.

“We want research to be trustworthy. We want people to try our vaccines in the middle of a pandemic. We want people to believe our statistics about the risks of second-hand smoke,” Hosseini said. “But if we have these systems that spit out content that seems to come from scientific discovery, we may lose the trustworthiness of science over time.”

Gregory E. Kaebnick, co-editor of the Hastings Center Report, a peer-reviewed bioethics journal, said he’s encouraged by some of the questions Ithaka is asking in its new study.

“The nature of transformative technology is that you can’t anticipate what the change will be. But once it’s happened, you can’t rewind,” he said. “The thing to do is to try to ask in an iterative way how AI is being used and what kind of future uses people anticipate.”

While he has some reservations about how generative AI could infringe another scholar’s copyright, Kaebnick said it may help to diversify authorship of scholarly publications. He said large language models (LLMs), a type of generative AI trained on text-based data that produces text-based answers, can help researchers whose primary language isn’t English have a better shot at publishing their work in major journals, which are often written in English.

“They struggle to write well and organize their thoughts in a way that works well for a journal,” Kaebnick said. “I’d envision them using an LLM as an incredibly sophisticated thesaurus where they’re asking ‘What’s a better paragraph here?’ or ‘What’s a better organizational scheme?’ That strikes me as a legitimate use.”

However, “it could be challenging to figure out where the line is for a legitimate use like that and a use where you give over authorship to an LLM or use an LLM to mimic another author,” Kaebnick said.

But making that distinction will require honest input from the academic community.

“The transparency will help us think about these problems and the use of LLMs together and decide what’s working and what’s problematic,” Kaebnick said. “But we can’t have a public conversation about these things unless we know what’s going on. And that’s one of the things that’s valuable about Ithaka’s study: They’re trying to figure out what’s going on.”


Copyright © 2024 Inside Higher Ed. All rights reserved.