SchemaOrg Reader

This notebook shows how to read metadata from Schema.org, using a URL with embedded Schema.org JSON-LD.

Fetch metadata

from commonmeta import Metadata

# Fetch metadata from a URL pointing to the landing page of a scholarly resource
string = 'https://doi.pangaea.de/10.1594/PANGAEA.836178'
metadata = Metadata(string)

# Check that metadata was fetched successfully
print(metadata.state)
None

Inspect the metadata

Landing pages can optionally embed metadata in their HTML using the JSON-LD format, inside a <script> tag with the type attribute set to application/ld+json. These metadata are converted into the internal commonmeta format, which includes the following fields:

  • id: the persistent identifier of the resource
  • type: the type of the resource in commonmeta format, e.g. Dataset, Software or JournalArticle
  • titles: the title(s) of the resource
  • creators: the creator(s)/author(s) of the resource
  • publisher: the publisher of the resource
  • publication_year: the publication year of the resource

Many optional metadata fields are supported in addition to these. The commonmeta format is close to the metadata format used by DataCite.
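To make the embedding described above more concrete, here is a minimal sketch of how JSON-LD can be pulled out of a page using only the Python standard library. It is not how commonmeta does it internally; the class name JsonLdExtractor, the sample HTML and the example DOI are all made up for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        # Start buffering when a JSON-LD script tag opens
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so collect all of them
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Parse the buffered text once the script tag closes
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


# Hypothetical landing page, reduced to the part that matters here
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset",
 "name": "Example dataset", "identifier": "https://doi.org/10.1234/example"}
</script>
</head><body></body></html>
"""

extractor = JsonLdExtractor()
extractor.feed(SAMPLE_HTML)
record = extractor.documents[0]
print(record["@type"], record["name"])
```

A real page may contain several JSON-LD blocks or none at all, which is why the sketch collects them in a list rather than assuming exactly one.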

commonmeta = metadata.write()
print(commonmeta)
{
    "id": "https://doi.org/10.1594/pangaea.836178",
    "type": "Dataset",
    "url": "https://doi.pangaea.de/10.1594/PANGAEA.836178",
    "contributors": [
        {
            "type": "Person",
            "contributorRoles": [
                "Author"
            ],
            "givenName": "Emma",
            "familyName": "Johansson"
        },
        {
            "type": "Person",
            "contributorRoles": [
                "Author"
            ],
            "givenName": "Sten",
            "familyName": "Berglund"
        },
        {
            "type": "Person",
            "contributorRoles": [
                "Author"
            ],
            "givenName": "Tobias",
            "familyName": "Lindborg"
        },
        {
            "type": "Person",
            "contributorRoles": [
                "Author"
            ],
            "givenName": "Johannes",
            "familyName": "Petrone"
        },
        {
            "type": "Person",
            "contributorRoles": [
                "Author"
            ],
            "givenName": "Dirk",
            "familyName": "van As"
        },
        {
            "type": "Person",
            "contributorRoles": [
                "Author"
            ],
            "givenName": "Lars-G\u00f6ran",
            "familyName": "Gustafsson"
        },
        {
            "type": "Person",
            "contributorRoles": [
                "Author"
            ],
            "givenName": "Jens-Ove",
            "familyName": "N\u00e4slund"
        },
        {
            "type": "Person",
            "contributorRoles": [
                "Author"
            ],
            "givenName": "Hjalmar",
            "familyName": "Laudon"
        }
    ],
    "titles": [
        {
            "title": "Hydrological and meteorological investigations in a lake near Kangerlussuaq, west Greenland"
        }
    ],
    "publisher": {
        "name": "PANGAEA"
    },
    "date": {
        "published": "2014-09-25"
    },
    "container": {
        "type": "DataRepository",
        "title": "PANGAEA",
        "identifier": "https://www.pangaea.de/",
        "identifierType": "URL"
    },
    "language": "en",
    "license": {
        "id": "CC-BY-3.0",
        "url": "https://creativecommons.org/licenses/by/3.0/legalcode"
    },
    "descriptions": [
        {
            "description": "Few hydrological studies have been made in Greenland, other than on glacial hydrology associated with the ice sheet. Understanding permafrost hydrology and hydroclimatic change and variability, however, provides key information for understanding climate change effects and feedbacks in the Arctic landscape. This paper presents a new extensive and detailed hydrological and meteorological open access dataset, with high temporal resolution from a 1.56 km**2 permafrost catchment with a lake underlain by a through talik close to the ice sheet in the Kangerlussuaq region, western Greenland. The paper describes the hydrological site investigations and utilized equipment, as well as the data collection and processing. The investigations were performed between 2010 and 2013. The high spatial resolution, within the investigated area, of the dataset makes it highly suitable for various detailed hydrological and ecological studies on catchment scale.",
            "descriptionType": "Abstract"
        }
    ],
    "geo_locations": [
        {
            "geoLocationPoint": {
                "pointLongitude": -50.18037,
                "pointLatitude": 67.12594
            }
        }
    ],
    "provider": "DataCite"
}
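In the output above, authors appear under contributors with the Author role, and the publication date sits under date.published. Assuming the value returned by write() is a JSON string, as the printed output suggests, it can be loaded with the standard json module and queried like any dictionary. The sketch below uses a shortened excerpt of that output rather than a live request, and the variable names are illustrative only.

```python
import json

# Excerpt of the commonmeta output shown above (two of the eight contributors).
# A raw string keeps the \u escapes intact so that json.loads decodes them.
commonmeta_excerpt = r"""
{
    "id": "https://doi.org/10.1594/pangaea.836178",
    "type": "Dataset",
    "contributors": [
        {"type": "Person", "contributorRoles": ["Author"],
         "givenName": "Emma", "familyName": "Johansson"},
        {"type": "Person", "contributorRoles": ["Author"],
         "givenName": "Lars-G\u00f6ran", "familyName": "Gustafsson"}
    ],
    "date": {"published": "2014-09-25"}
}
"""

record = json.loads(commonmeta_excerpt)

# Keep only contributors that carry the Author role
authors = [
    f'{person["givenName"]} {person["familyName"]}'
    for person in record["contributors"]
    if "Author" in person.get("contributorRoles", [])
]

# Derive the publication year from the ISO 8601 date
year = record["date"]["published"][:4]

print(authors)
print(year)
```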

Enhance the metadata with HTML meta tags

If present, the following HTML meta tags are used to enhance the metadata: citation_doi, citation_author, citation_title, citation_publisher, citation_publication_date, citation_keywords, citation_language, citation_issn. These tags are recommended by Google Scholar and widely used by publishers and repositories. Below is an example of a page that embeds HTML meta tags but does not use Schema.org:

url = 'https://verfassungsblog.de/einburgerung-und-ausburgerung'
metadata = Metadata(url)
commonmeta = metadata.write()
print(commonmeta)
{
    "id": "https://doi.org/10.17176/20221210-001644-0",
    "type": "Article",
    "url": "https://verfassungsblog.de/einburgerung-und-ausburgerung",
    "contributors": [
        {
            "type": "Person",
            "contributorRoles": [
                "Author"
            ],
            "givenName": "Maria Martha",
            "familyName": "Gerdes"
        }
    ],
    "titles": [
        {
            "title": "Einb\u00fcrgerung und Ausb\u00fcrgerung: Warum die Staatsangeh\u00f6rigkeitsrechtsreform nicht ohne Ausb\u00fcrgerungsrechtsreform funktioniert"
        }
    ],
    "publisher": {
        "name": "Verfassungsblog"
    },
    "date": {
        "published": "2022-12-09"
    },
    "container": {
        "type": "Blog",
        "title": "Verfassungsblog"
    },
    "subjects": [
        {
            "subject": "staatsangeh\u00f6rigkeit"
        },
        {
            "subject": "mehrstaatigkeit"
        },
        {
            "subject": "einb\u00fcrgerung"
        },
        {
            "subject": "bundesinnenministerium"
        }
    ],
    "language": "de-DE",
    "descriptions": [
        {
            "description": "Die von der Bundesinnenministerin vorangetriebene Staatsangeh\u00f6rigkeitsrechtsreform zur Erleichterung der Einb\u00fcrgerung wirft altbekannte Fragen der Zuordnung von Personen zu Staaten und die damit verbundenen Zugeh\u00f6rigkeitsvorstellungen zu einem Staatsvolk auf. Allerdings liegt auch bei dem aktuellen Reformvorhaben die Aufmerksamkeit nur auf dem Erwerb der Staatsangeh\u00f6rigkeit. Dieser Fokus l\u00e4sst die andere Seite der Medaille unber\u00fccksichtigt: Um die M\u00f6glichkeit von Mehrstaatigkeit konsequent f\u00fcr das gesamte Staatsangeh\u00f6rigkeitsrecht umzusetzen, muss die Diskussion zus\u00e4tzlich f\u00fcr das Ausb\u00fcrgerungsrecht gef\u00fchrt werden.",
            "descriptionType": "Abstract"
        }
    ],
    "provider": "DataCite"
}
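As a rough illustration of what reading citation_* meta tags involves, the following sketch collects them from an HTML snippet with the standard library. It is an assumption-laden example, not commonmeta's implementation: the class name CitationMetaParser is hypothetical, and the sample HTML uses placeholder values rather than the content of the page above.

```python
from collections import defaultdict
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collect <meta name="citation_*" content="..."> tags; names may repeat."""

    def __init__(self):
        super().__init__()
        self.tags = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        # Ignore unrelated meta tags such as viewport or description
        if name.startswith("citation_") and content:
            self.tags[name].append(content)


# Hypothetical page head; values are placeholders, not taken from a real page
SAMPLE_HTML = """
<head>
<meta name="citation_title" content="An example post">
<meta name="citation_author" content="Doe, Jane">
<meta name="citation_author" content="Roe, Richard">
<meta name="citation_publication_date" content="2022/12/09">
<meta name="viewport" content="width=device-width">
</head>
"""

parser = CitationMetaParser()
parser.feed(SAMPLE_HTML)
print(dict(parser.tags))
```

Values are stored in lists because tags such as citation_author are typically repeated once per author.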