PBDB Data Service: Documentation

Data service version 1.1 (b3)

This is the current version of the data service, and should be used for all new applications.

DESCRIPTION

The function of this data service is to provide programmatic access to the information stored in the Paleobiology Database. Our goal is to make the entire database accessible by means of this service, so that anyone can write client software that interacts with it.

SCOPE

This service currently provides access to the following classes of information, by means of the indicated URLs. The following links will take you to pages which document the individual URL paths, listing the parameters accepted and the data fields returned by each.

Class | Description
Fossil Occurrences

A fossil occurrence represents the occurrence of a particular organism at a particular location in time and space. Each occurrence is a member of a single fossil collection, and has a taxonomic identification which may be more or less specific.

Fossil Collections

A fossil collection is somewhat loosely defined as a set of fossil occurrences that are co-located geographically and temporally. Each collection has a geographic location, stratigraphic context, and age estimate.

Taxonomic names

The taxonomic names stored in the database are arranged hierarchically. Our tree of life is quite complete down to the class level, and reasonably complete down to the suborder level. Below that, coverage varies. Many parts of the tree have been completely entered, while others are sparser.

Time intervals

The database lists almost every geologic time interval in current use, including the standard set established by the International Commission on Stratigraphy (v2013-1).

Geological strata

Every fossil collection in the database is categorized by the formation in which it occurs, and many by group and member.

Bibliographic references

Each fossil occurrence, fossil collection and taxonomic name in the database is associated with one or more bibliographic references, identifying the source from which this information was entered.

Client configuration

This class provides information about the structure, encoding and organization of the information in the database. It is designed to enable the easy configuration of client applications.

For now, this service provides read-only access to the publicly available parts of the data. In the future, we plan to add an authentication module which will accept login credentials and will allow access to protected data, addition of new data, and modification of existing data.

USAGE

You can access this service by making HTTP requests whose URLs conform to a simple scheme. In most cases each URL maps to a single database query, and the body of the response represents some or all of the resulting records. For a description of how this information is encoded, see the documentation for the various output formats.

For example, consider the following URL:

/data1.1/taxa/single.json?name=Dascillidae&show=attr

An HTTP GET request using this URL would return information about the taxon Dascillidae (soft-bodied plant beetles). The components of this URL are as follows:

/data1.1/taxa/single

The URL path indicates the operation to be carried out. For a GET request, it specifies the class of information to be retrieved.

json

The path suffix indicates the format in which the results will be returned. In this case, the result will be expressed in JavaScript Object Notation (JSON).

name=Dascillidae

Some of the parameters are used to construct a database query that will retrieve the desired information. This one selects a particular taxonomic name.

show=attr

Other parameters change or augment the set of information returned. This one specifies that, in addition to basic information about the taxonomic name, the result should also include the name's attribution.

Each URL path accepts its own set of parameters as well as a set of common parameters that control the form of the result.
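As a rough illustration of how the components above fit together, the following Python sketch assembles a request URL from an operation path, a format suffix, and a set of parameters, and then issues a GET request. The helper names `build_url` and `fetch` are made up for this example, and `BASE_URL` is a placeholder: this page gives only the path portion of the URL, so the real host must be supplied by the reader.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the host is not stated in this document. Replace this
# placeholder with the actual address of the data service.
BASE_URL = "https://example.org"


def build_url(path, suffix, params, base_url=""):
    """Combine an operation path, a format suffix, and query parameters."""
    query = urlencode(params)
    url = f"{base_url}{path}.{suffix}"
    return f"{url}?{query}" if query else url


def fetch(path, suffix, params, base_url=BASE_URL):
    """Send an HTTP GET request and return the raw response body as bytes."""
    with urlopen(build_url(path, suffix, params, base_url)) as response:
        return response.read()


# Rebuild the example URL discussed above (path portion only).
url = build_url("/data1.1/taxa/single", "json",
                {"name": "Dascillidae", "show": "attr"})
print(url)  # /data1.1/taxa/single.json?name=Dascillidae&show=attr
```

Keeping the suffix separate from the path makes it easy to request the same records in a different format by changing a single argument.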

For now, the only HTTP requests that are accepted are GET requests. Once we allow authentication and data modification, these operations will be carried out by means of POST, PUT and DELETE requests.

FORMATS

By using the appropriate suffix, you can choose to retrieve any of the available information in any of the available formats:

Format | Suffix | Documentation | Description
JSON | json | JSON response

JavaScript Object Notation (JSON) is the most commonly used format for data communication on the Web. Our JSON responses use short (3-character) field names in order to minimize the amount of data returned.

XML | xml | XML response

Our XML responses use Darwin Core element names. Unfortunately, many of our data fields have no counterpart in the Darwin Core element set and thus cannot be included in responses of this type.

Tab-separated text | tsv, txt | Text response

This format produces files that are very similar to the download files from the classic PBDB. The same field names are used, so that you can compare results with previous PBDB downloads and use the same analysis tools. Results in this format can be easily loaded into most spreadsheet software for further processing and analysis.

Comma-separated text | csv | Text response

This format is identical to the tab-separated format, but with fields quoted and separated by commas. It can be similarly loaded into most spreadsheet software.
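Besides spreadsheet software, the two text formats can be read with a standard delimited-text parser. The sketch below uses Python's `csv` module and picks the delimiter from the suffix. It assumes that the first line of the body is the header row, which this page does not confirm, and the sample bodies and column names are invented placeholders rather than real PBDB fields.

```python
import csv
import io

# Delimiter implied by each text suffix described above.
DELIMITERS = {"tsv": "\t", "txt": "\t", "csv": ","}


def parse_text_response(body, suffix):
    """Parse a delimited text body into a list of dicts keyed by column name.

    Assumption: the first line of the body is the header row.
    """
    reader = csv.DictReader(io.StringIO(body), delimiter=DELIMITERS[suffix])
    return list(reader)


# Made-up sample bodies; the column names are placeholders, not PBDB fields.
tsv_body = "col_a\tcol_b\n1\tfirst value\n"
csv_body = '"col_a","col_b"\n"1","first value"\n'

tsv_rows = parse_text_response(tsv_body, "tsv")
csv_rows = parse_text_response(csv_body, "csv")
```

With these inputs both calls yield the same records, which reflects the statement that the two formats differ only in quoting and separator.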

If an error occurs, the response body will be a JSON object if the URL path suffix is json, and HTML otherwise. If the URL path suffix is not recognized, an error with HTTP status 415 (Unsupported Media Type) will be returned.
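A client can apply the error rule above by looking at the suffix it requested. The following is a minimal sketch: the function name `describe_error` and the shape of the returned dict are illustrative choices, and the structure of the JSON error object is not documented on this page, so the sample body in the usage lines is hypothetical.

```python
import json


def describe_error(status, body, suffix):
    """Return a dict describing an error response.

    The body is parsed as JSON only when the request used the json suffix;
    for every other suffix it is kept as an HTML string.
    """
    if suffix == "json":
        return {"status": status, "format": "json", "detail": json.loads(body)}
    return {"status": status, "format": "html", "detail": body}


# Hypothetical bodies, for illustration only.
json_error = describe_error(400, '{"message": "bad parameter"}', "json")
html_error = describe_error(415, "<html><body>error</body></html>", "foo")
```

An unrecognized suffix such as `foo` falls into the HTML branch, since only the json suffix produces a JSON error body.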

AUTHOR

This service is provided by the Paleobiology Database, hosted by the Department of Geoscience at the University of Wisconsin-Madison.

If you have questions about this service, please contact Michael McClennen <mmcclenn@geology.wisc.edu>.