Text and data mining at Springer Nature

What is TDM?

TDM (Text and Data Mining) is the automated selection and analysis of large amounts of text or data for purposes such as searching, finding patterns, discovering relationships, semantic analysis, and learning how content relates to ideas and needs, in a way that yields information useful for studies and research.

Advancing Discovery

NEW TDM WHITE PAPER

Bringing Insight to Data: Info Pros’ Role in Text and Data Mining

Text and Data Mining at Springer Nature

(PDF, 2.19 MB)

Springer Nature TDM policy

Springer Nature recognizes the importance of new research techniques and aims to support innovation in this regard. As the volume of scientific publications increases and TDM software tools improve, Springer Nature appreciates the need for a more formalized process to enable TDM, and strives to make this as simple as possible for researchers.

For questions on text and data mining (TDM) licenses, contact [email protected].

TDM for researchers at subscribing academic institutions

For subscribed journals and books, Springer Nature grants researchers text and data mining rights via their institutions, provided the purpose is non-commercial research.

Individual researchers can download subscription (and open access) journal articles and books for TDM purposes directly from Springer Nature’s content platforms. They are requested to limit this to 1 request per second. Articles can be selected using existing search methods and tools, such as PubMed, Web of Science, or Springer Nature’s Metadata API, among others. Researchers who want to use Springer Nature’s TDM APIs can request an API key. Use of the API provides additional querying parameters and a higher rate limit for content requests (150 requests per minute).
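The two rate limits mentioned above can be respected with a small client-side throttle. The following Python sketch is illustrative only: the class name `Throttle` and its parameters are hypothetical and not part of any Springer Nature tool; only the numeric limits come from the policy text.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, max_requests, per_seconds,
                 clock=time.monotonic, sleep=time.sleep):
        # Spread the allowance evenly, e.g. 150 per 60 s -> one every 0.4 s.
        self.min_interval = per_seconds / max_requests
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous request was released

    def wait(self):
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


# Limits quoted in the policy text; everything else here is an assumption.
platform_throttle = Throttle(max_requests=1, per_seconds=1)   # 1 request per second
api_throttle = Throttle(max_requests=150, per_seconds=60)     # 150 requests per minute
```

A download loop would call `platform_throttle.wait()` (or `api_throttle.wait()`) immediately before each request, so that bursts are smoothed out regardless of how fast the individual responses arrive.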

Researchers are required to take reasonable measures to protect the security of downloaded content, to store it on a secure internal server that third parties cannot access, and to keep it only for the duration of the TDM project.

Researchers are requested to be considerate and limit downloads to a reasonable rate which does not impose an undue burden on Springer Nature’s systems and servers.

Implementation by academic and government institutions

Subscribing academic and government institutions may include text and data mining rights in all new and renewed journal and ebook subscription agreements under Springer Nature’s standard TDM terms (Springer Nature's specialist database products excluded). For such customers, the right to perform TDM comes at no additional cost for content that their subscription license provides access to. Existing subscribers may also add TDM rights under these terms before their agreement is up for renewal.

The use of Springer Nature’s TDM API incurs additional costs.

TDM for commercial research (Industry)

For TDM in the context of commercial research, Springer Nature offers standard TDM terms as well as the TDM API for a fee.

TDM categories

TDM for subscribers

For subscribers, Springer Nature offers a wide variety of TDM tools, such as metadata and fulltext APIs, that apply to both open access and subscribed resources. See https://api.springernature.com and contact [email protected] if you have further questions.

TDM for non-subscribers

Non-subscribers are offered a variety of TDM tools for our Open Access resources, such as our Open Access fulltext API (see https://api.springernature.com). TDM requests from non-subscribers for pay-walled content are treated on a case-by-case basis. Please contact [email protected].

TDM for Open Access content

For Open Access content, we offer a powerful Open Access fulltext API (see https://api.springernature.com). The Open Access API provides metadata and fulltext content, where available, for more than 900,000 online documents from Springer Nature open access XML, including BioMed Central and SpringerOpen journals.

What about image mining?

At the moment Springer Nature does not offer any API for image mining.

What about argumentation mining?

“Argumentation mining aims to automatically detect, classify and structure argumentation in text. Therefore, argumentation mining is an important part of a complete argumentation analysis, i.e. understanding the content of serial arguments, their linguistic structure, the relationship between the preceding and following arguments, recognizing the underlying conceptual beliefs, and understanding within the comprehensive coherence of the specific topic.” (Mochales, R. & Moens, MF. Artif Intell Law (2011) 19: 1. https://doi.org/10.1007/s10506-010-9104-x)

Argumentation mining can be considered a subset of text mining. If you plan to store non-open-access content locally during an argumentation mining project, please contact [email protected] to discuss options.

TDM tools and methods

APIs

Springer Nature offers a variety of APIs to facilitate Text and Data Mining activities.

Metadata API: Metadata and abstracts for online documents (journal articles, book chapters, protocols, etc.)
Meta API: New versioned metadata for online documents, with additional fields and links to source content
Fulltext API for Open Access content: Fulltext content where available for Springer Nature Open Access XML
Fulltext API for Open Access and pay-walled content (under license): Fulltext content where available for all Springer Nature XML
Journal header data API: "journal-level" API that provides XML based on the Journal ID
Citations API

A full overview of our API offerings, with detailed information, examples and API key sign-up, is available at https://api.springernature.com.

Metadata delivery

Springer Nature also offers direct metadata delivery options in various formats, such as JATS, Dublin Core, ONIX, or MARC records, using different protocols (ftp/ftps, sftp), including for metadata harvesting (OAI-PMH).

For direct metadata download features via the metadata downloader see metadata.springernature.com. If you have requests for metadata delivery please contact [email protected].
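For readers unfamiliar with OAI-PMH harvesting, the sketch below shows how a `ListRecords` response carrying Dublin Core metadata could be read in Python. The sample response, its identifier and the function name `parse_list_records` are made up for illustration; only the XML namespaces and the `resumptionToken` element come from the OAI-PMH 2.0 standard. No Springer Nature endpoint is assumed.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, hand-written response used only to exercise the parser.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:record-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example article title</dc:title>
          <dc:creator>Example, Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>
""".strip()


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier",
                                       default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            "creators": [c.text for c in rec.findall(".//dc:creator", NS)],
        })
    token = root.findtext("oai:ListRecords/oai:resumptionToken",
                          default="", namespaces=NS)
    # An empty or missing token means the result list is complete.
    return records, (token or None)
```

A harvester would repeat the request with the returned token until `parse_list_records` yields `None` as the second value.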

Crossref TDM services

Springer Nature participates in the Crossref TDM working group and recommends Crossref services for pan-publisher TDM.

For further information see http://tdmsupport.crossref.org/

Glossary

TDM

Argumentation mining

XML parsing

“XML parsing is the process of reading an XML document and providing an interface to the user application for accessing the document.” (Li C. (2009) XML Parsing, SAX/DOM. In: LIU L., ÖZSU M.T. (eds) Encyclopedia of Database Systems. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-39940-9)
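As a rough illustration of the two parsing styles named in the cited entry (tree-based, DOM-like, and event-based, SAX-like), the following Python sketch uses only the standard library. The sample document and the helper `count_elements` are invented for this example and do not reflect any particular Springer Nature XML schema.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature document, not a real article schema.
DOC = (b"<article><title>Example title</title>"
       b"<sec><p>First.</p><p>Second.</p></sec></article>")

# Tree style (DOM-like): the whole document is loaded into memory,
# then queried through the element tree.
root = ET.fromstring(DOC)
title = root.findtext("title")
paragraphs = [p.text for p in root.iter("p")]


# Event style (SAX-like): elements are handled as the parser reaches
# their end tags, which keeps memory use low for very large files.
def count_elements(data, tag):
    count = 0
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == tag:
            count += 1
        elem.clear()  # drop content that has already been processed
    return count
```

The tree style is usually more convenient for single articles, while the event style tends to suit bulk processing of large XML deliveries.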

API

“API stands for application programming interface. An API helps expose a business service or an enterprise asset to the developers building an application. Applications can be installed and accessed from a variety of devices, such as smartphones, tablets, kiosks, gaming consoles, connected cars, and so forth. Google Maps APIs for locating a place on a map, Facebook APIs for gaming or sharing content, and the Amazon APIs for product information are some examples of APIs. Developers use these APIs to build cool and innovative apps that can provide an enriching user experience. For example, developers can use APIs from different travel companies to build an app that compares and displays each travel companies’ price for the same hotel. A user can then make an informed decision and book the hotel through the company that is providing the best offer. This saves the user from doing the comparison on his own—thus improving his overall experience. APIs thus help provide an improved user experience.” (De B. (2017) API Management. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-1305-6)

API key

“An API key identifies the application using an API. It provides a simple mechanism to authenticate the apps. API keys allow an API to determine which applications are using it. API keys are generally long series of random characters typically passed as an HTTP query parameter or header. This makes it easy to use an API key in an API request for application authentication. API keys are also known by other names, such as app ID, client ID, app key, or consumer key.” (De B. (2017) API Management. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-1305-6)
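The two ways of passing a key mentioned in the definition (query parameter or header) could look as follows in Python. This is a generic sketch: the base URL `https://api.example.org/v1/search`, the parameter name `api_key` and the header name `X-Api-Key` are placeholders, and the names a given API actually expects should be taken from its documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request


def with_key_in_query(base_url, params, api_key, key_param="api_key"):
    """Build a request that carries the key as a query parameter."""
    query = urlencode({**params, key_param: api_key})
    return Request(f"{base_url}?{query}")


def with_key_in_header(base_url, params, api_key, header_name="X-Api-Key"):
    """Build a request that carries the key as an HTTP header."""
    url = f"{base_url}?{urlencode(params)}" if params else base_url
    return Request(url, headers={header_name: api_key})


# Placeholder endpoint and key, for illustration only.
BASE = "https://api.example.org/v1/search"
query_request = with_key_in_query(BASE, {"q": "text mining"}, "YOUR_KEY")
header_request = with_key_in_header(BASE, {"q": "text mining"}, "YOUR_KEY")
```

Neither request is sent here; passing one of them to `urllib.request.urlopen` would perform the call. Keeping the key in a header keeps it out of URLs that may end up in logs, which is one reason some APIs prefer that variant.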

XML (Extensible Markup Language)

“XML is a standard for the modeling of data using a tree structure. It consists of rules defining the structure of documents that can store data consistent with the defined structure. XML therefore is a framework for the definition of data structures without describing how an actual document can be processed.” (Lupp M. (2008) Extensible Markup Language. In: Encyclopedia of GIS. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-35973-1)

IE (Information extraction)

“Information Extraction (IE) is a task of extracting pre-specified types of facts from written texts or speech transcripts, and converting them into structured representations (e.g., databases).” (Ji H. (2009) Information Extraction. In: LIU L., ÖZSU M.T. (eds) Encyclopedia of Database Systems. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-39940-9)
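A very small instance of this idea is pulling a DOI and a publication year out of a free-text citation and returning them as a structured record. The Python sketch below uses deliberately simplified regular expressions (for example, it stops a DOI at a closing parenthesis, although DOIs may legitimately contain one); the function name and patterns are illustrative assumptions, not a production-grade extractor.

```python
import re

# Simplified patterns: a DOI starts with "10.", a registrant code and a slash;
# the year is taken from the first four-digit number in parentheses.
DOI_RE = re.compile(r"\b(10\.\d{4,9}/[^\s\"<>)]+)")
YEAR_RE = re.compile(r"\((\d{4})\)")


def extract_citation_facts(text):
    """Turn a free-text citation into a small structured record."""
    doi = DOI_RE.search(text)
    year = YEAR_RE.search(text)
    return {
        "doi": doi.group(1).rstrip(".,;") if doi else None,
        "year": int(year.group(1)) if year else None,
    }


citation = ("Mochales, R. & Moens, MF. Artif Intell Law (2011) 19: 1. "
            "https://doi.org/10.1007/s10506-010-9104-x")
record = extract_citation_facts(citation)
```

Records of this kind can then be loaded into a database or data frame, which is the "structured representation" the definition refers to.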

NLP (Natural Language Processing)

“Natural Language Processing (NLP) … aims at developing computational methods and algorithms for understanding and generating human languages.” (Rus V. (2013) Natural Language Processing. In: Runehov A.L.C., Oviedo L. (eds) Encyclopedia of Sciences and Religions. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-8265-8)