Title: | Create Data Frames that are Easier to Exchange and Reuse |
---|---|
Description: | The aim of the 'dataset' package is to make tidy datasets easier to release, exchange and reuse. It organizes and formats data frame 'R' objects into well-referenced, well-described, interoperable datasets in a release- and reuse-ready form. |
Authors: | Daniel Antal [aut, cre] |
Maintainer: | Daniel Antal <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.3.4018 |
Built: | 2025-03-25 20:21:49 UTC |
Source: | https://github.com/dataobservatory-eu/dataset |
Coerce to character vector
as_character(x)
## S3 method for class 'haven_labelled_defined'
as_character(x)
x |
A vector created with |
A character vector.
as_character(iris_dataset$Species)
Add metadata conforming to the DCMI Metadata Terms to datasets, i.e. structured R data.frame or list objects, for an accurate and consistent identification of a resource for citation and retrieval purposes.
as_dublincore(x, type = "bibentry", ...)
dublincore(
  title,
  creator,
  contributor = NULL,
  publisher = NULL,
  identifier = NULL,
  subject = NULL,
  type = "DCMITYPE:Dataset",
  dataset_date = NULL,
  language = NULL,
  relation = NULL,
  format = "application/r-rds",
  rights = NULL,
  datasource = NULL,
  description = NULL,
  coverage = NULL
)
is.dublincore(x)
## S3 method for class 'dublincore'
is.dublincore(x)
x |
An object to be tested for the class "dublincore". |
type |
The nature or genre of the resource. Recommended best practice is
to use a controlled vocabulary such as the DCMI Type Vocabulary
DCMITYPE.
For a dataset, the correct term is |
... |
Optional parameters to add to a |
title |
dct:title,
a name given to the resource. |
creator |
An entity primarily responsible for making the resource.
dct:creator
Corresponds to |
contributor |
An entity responsible for making contributions to the dataset. See DCMI: Contributor, and for possible contribution type, please review MARC Code List for Relators. |
publisher |
Corresponds to
dct:publisher
and Publisher in DataCite. The name of the entity that holds, archives,
publishes, prints, distributes, releases, issues, or produces the resource.
This property will be used to formulate the citation, so consider the
prominence of the role. For software, use |
identifier |
An unambiguous reference to the resource within a given
context. Recommended practice is to identify the resource by means of a
string conforming to an identification system. Examples include
International Standard Book Number (ISBN), Digital Object Identifier (DOI),
and Uniform Resource Name (URN). Select an identifier scheme from
registered
URI schemes maintained by IANA. More details:
Guidelines
for using resource identifiers in Dublin Core metadata and IEEE LOM.
Similar to |
subject |
Defaults to |
dataset_date |
Corresponds to a point or period of time associated with
an event in the lifecycle of the resource.
dct:date.
|
language |
A language of the dataset. See DCMI: Language. |
relation |
A related resource. Recommended best practice is to identify
the related resource by means of a string conforming to a formal
identification system. See:
dct:relation.
Similar to |
format |
The file format, physical medium, or dimensions of the dataset. See DCMI: Format. |
rights |
Corresponds to
dct:rights
and |
datasource |
The source of the dataset,
DCMI:
Source, which corresponds to a |
description |
An account of the resource. It may include but is not
limited to: an abstract, a table of contents, a graphical representation,
or a free-text account of the resource.
dct:description.
In |
coverage |
The spatial or temporal topic of the resource, spatial applicability of the dataset, or jurisdiction under which the dataset is relevant. See DCMI: Coverage. |
The Dublin Core, also known as the Dublin Core Metadata Element Set
(DCMES), is a set of fifteen main metadata items for describing digital or
physical resources, such as datasets or their printed versions. Dublin Core
has been formally standardized internationally as ISO 15836, as IETF RFC
5013 by the Internet Engineering Task Force (IETF), as well as in the U.S.
as ANSI/NISO Z39.85.
To provide compatibility with bibentry, we try to add the dataset_date parameter first as a publication_date metadata field, and as a year field, too. This element can be retrieved or set with publication_year.
The ResourceType property will be by definition "Dataset".
The Size attribute (e.g. bytes, pages, inches, etc.) will be automatically added to the dataset.
dublincore() creates a utils::bibentry object extended with standard Dublin Core bibliographical metadata. as_dublincore() retrieves the contents of this bibentry object of a dataset_df from its attributes, and returns the contents as a list, a dataset_df, a bibentry object, or an ntriples string.
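As a rough illustration of the kind of object involved, the sketch below builds a plain utils::bibentry with base R only. It does not call dublincore() and does not reproduce the extra Dublin Core fields that the package adds; the choice of bibtype = "Misc" and the use of the url field to hold the identifier are assumptions made for this example.

```r
# Base R sketch: a plain bibentry, not the extended object returned by dublincore().
entry <- utils::bibentry(
  bibtype = "Misc", # assumption: a generic entry type for a dataset
  title = "Growth of Orange Trees",
  author = c(
    utils::person(given = "N.R.", family = "Draper"),
    utils::person(given = "H", family = "Smith")
  ),
  year = 1998,
  publisher = "Wiley",
  # Assumption: the identifier is stored in the standard 'url' field here.
  url = "https://doi.org/10.5281/zenodo.14917851"
)

# Individual fields can be read back with `$`.
entry$title
entry$author
```

Fields such as the title or the authors are then available for formatting a citation, which is the role the bibentry plays for a dataset_df object.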
A logical value, if the bibliographic entries are listed according to the Dublin Core specification.
Other bibentry functions: datacite(), get_bibentry()
orange_bibentry <- dublincore(
  title = "Growth of Orange Trees",
  creator = c(
    person(
      given = "N.R.", family = "Draper", role = "cre",
      comment = c(VIAF = "http://viaf.org/viaf/84585260")
    ),
    person(given = "H", family = "Smith", role = "cre")
  ),
  # Add data manager
  contributor = person(given = "Antal", family = "Daniel", role = "dtm"),
  publisher = "Wiley",
  datasource = "https://isbnsearch.org/isbn/9780471170822",
  dataset_date = 1998,
  identifier = "https://doi.org/10.5281/zenodo.14917851",
  language = "en",
  description = "The Orange data frame has 35 rows and 3 columns of records of the growth of orange trees."
)
# To review the existing dataset_bibentry of a dataset_df object:
as_dublincore(orange_df, type = "list")
Coerce a defined vector to numeric
as_numeric(x)
## S3 method for class 'haven_labelled_defined'
as_numeric(x)
x |
A vector created with |
A numeric vector.
as_numeric(iris_dataset$Petal.Length)
Add rows of the y dataset to the x dataset.
bind_defined_rows(x, y, ...)
x |
A dataset created with |
y |
A dataset created with |
... |
Optional parameters: |
By default, the dataset_bibentry bibliographical data and the
title are taken from dataset x. You can give a new title with
..., title="New Title".
By default, the unique creators of dataset y, who are not present in
dataset x, are added to the creators of the new dataset.
A dataset with the combined rows.
A <- dataset_df(
  a = defined(c(11, 14, 16), label = "length", unit = "cm"),
  dataset_bibentry = dublincore(
    title = "Test",
    creator = person("Jane Doe"),
    dataset_date = Sys.Date()
  )
)
B <- dataset_df(
  a = defined(c(12, 17, 19), label = "length", unit = "cm"),
  dataset_bibentry = dublincore(
    title = "Test",
    creator = person("Jane Doe")
  )
)
bind_defined_rows(x = A, y = B)
The c() method for the haven_labelled_defined class requires a strict matching of the var_label, unit, definition, and namespace attributes (if they exist and do not have a NULL value).
## S3 method for class 'haven_labelled_defined'
c(...)
... |
objects to be concatenated. |
A haven_labelled_defined vector.
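The strict-matching rule can be sketched in base R as follows. The helper c_strict() and the attribute names it checks are hypothetical and only mirror the rule described above; this is not the package's implementation, and it assumes that an attribute set on one vector but missing on the other also counts as a mismatch.

```r
# Hypothetical helper (not part of the 'dataset' package): concatenate two
# annotated vectors only if their descriptive attributes agree.
c_strict <- function(a, b,
                     keys = c("label", "unit", "definition", "namespace")) {
  for (key in keys) {
    attr_a <- attr(a, key, exact = TRUE)
    attr_b <- attr(b, key, exact = TRUE)
    # identical(NULL, NULL) is TRUE, so attributes missing on both sides pass.
    if (!identical(attr_a, attr_b)) {
      stop("Attribute '", key, "' does not match.")
    }
  }
  # as.vector() drops the attributes; they are restored from `a` afterwards.
  out <- c(as.vector(a), as.vector(b))
  attributes(out) <- attributes(a)
  out
}

a <- structure(1:3, label = "Length", unit = "meter")
b <- structure(4:6, label = "Length", unit = "meter")
ab <- c_strict(a, b)

# A vector measured in a different unit is rejected.
mismatch <- try(
  c_strict(a, structure(4:6, label = "Length", unit = "cm")),
  silent = TRUE
)
```

Refusing to concatenate on a unit mismatch avoids silently mixing, for example, metres and centimetres in one variable.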
a <- defined(1:3, label = "Length", unit = "meter")
b <- defined(4:6, label = "Length", unit = "meter")
c(a, b)
Add the optional Creator property as an attribute to a dataset object.
creator(x)
creator(x, overwrite = TRUE) <- value
x |
A semantically rich data frame object created by
|
overwrite |
If the attributes should be overwritten. In case it is set
to |
value |
The |
The Creator corresponds to dct:creator in Dublin Core and Creator in DataCite. The name of the entity that holds, archives, publishes, prints, distributes, releases, issues, or produces the dataset. This property will be used to formulate the citation, so consider the prominence of the role.
The Creator attribute as a character of length one is added to x.
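The difference between replacing and adding a creator can be sketched with plain utils::person objects. The helper update_creators() is hypothetical and only mirrors the overwrite semantics suggested by the usage above (overwrite = TRUE replaces, overwrite = FALSE appends); it is not the package's code.

```r
# Hypothetical helper illustrating the assumed overwrite semantics.
update_creators <- function(existing, value, overwrite = TRUE) {
  if (overwrite) {
    value # replace the existing creators
  } else {
    c(existing, value) # keep the existing creators and append the new one
  }
}

jane <- utils::person("Jane", "Doe")
john <- utils::person("John", "Doe")

replaced <- update_creators(jane, john, overwrite = TRUE)
appended <- update_creators(jane, john, overwrite = FALSE)
```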
Other Bibliographic reference functions: dataset_title()
creator(orange_df)
# To change author:
creator(orange_df) <- person("Jane", "Doe")
# To add author:
creator(orange_df, overwrite = FALSE) <- person("John", "Doe")
Add metadata conforming to the DataCite Metadata Schema.
datacite(
  Title,
  Creator,
  Identifier = NULL,
  Publisher = NULL,
  PublicationYear = NULL,
  Subject = subject_create(
    term = "data sets",
    subjectScheme = "Library of Congress Subject Headings (LCSH)",
    schemeURI = "https://id.loc.gov/authorities/subjects.html",
    valueURI = "http://id.loc.gov/authorities/subjects/sh2018002256"
  ),
  Type = "Dataset",
  Contributor = NULL,
  Date = ":tba",
  DateList = NULL,
  Language = NULL,
  AlternateIdentifier = ":unas",
  RelatedIdentifier = ":unas",
  Format = ":tba",
  Version = "0.1.0",
  Rights = ":tba",
  Description = ":tba",
  Geolocation = ":unas",
  FundingReference = ":unas"
)
as_datacite(x, type = "bibentry", ...)
is.datacite(x)
## S3 method for class 'datacite'
is.datacite(x)
Title |
The name(s) or title(s) by which a resource is known. May be the
title of a dataset or the name of a piece of software. Similar to
dct:title. |
Creator |
The main researchers involved in producing the data, or the authors of the publication, in priority order. To supply multiple creators, repeat this property. |
Identifier |
The Identifier is a unique string that identifies a
resource. For software, determine whether the identifier is for a specific
version of a piece of software (per the
Force11
Software Citation Principles), or for all versions. Similar to
|
Publisher |
The name of the entity that holds, archives, publishes,
prints, distributes, releases, issues, or produces the resource. This
property will be used to formulate the citation, so consider the prominence
of the role. For software, use Publisher for the code repository. Mandatory
in DataCite, and similar to |
PublicationYear |
The year when the data was or will be made publicly
available in |
Subject |
Recommended for discovery. Subject, keyword, classification
code, or key phrase describing the resource. Similar to
dct:subject.
|
Type |
Defaults to |
Contributor |
Recommended for discovery. The institution or person responsible for collecting, managing, distributing, or otherwise contributing to the development of the resource. |
Date |
A character string in any of the following formats: |
DateList |
DataCite 4.4 allows multiple dates to be set for a resource; they should be added as a list. Currently not yet implemented. See: datacite:Date. |
Language |
The primary language of the resource. Allowed values are
taken from IETF BCP 47, ISO 639-1 language code. See |
AlternateIdentifier |
An identifier or identifiers other than the
primary Identifier applied to the resource being registered. This may be
any alphanumeric string unique within its domain of issue. It may be used
for local identifiers. |
RelatedIdentifier |
Recommended for discovery. Defaults to
|
Format |
Technical format of the resource. Use file extension or MIME type where possible, e.g., PDF, XML, MPG or application/pdf, text/xml, video/mpeg. Similar to dct:format. |
Version |
Free text. Suggested practice: track
major_version.minor_version. Defaults to |
Rights |
Any rights information for this resource. The property may be
repeated to record complex rights characteristics, but this is not yet
supported. Free text. See |
Description |
Recommended for discovery. All additional information that
does not fit in any of the other categories. It may be used for technical
information—a free text. Defaults to |
Geolocation |
Recommended for discovery. Spatial region or named place
where the data was gathered or about which the data is focused. See
|
FundingReference |
Information about financial support (funding) for the
resource being registered. Defaults to |
x |
An object to be tested for the class "datacite". |
type |
A DataCite 4.4 metadata can be returned as a |
... |
Optional parameters to add to a |
DataCite is a leading global non-profit organisation that provides persistent identifiers (DOIs) for research data and other research outputs. Organisations within the research community join DataCite as members to be able to assign DOIs to all their research outputs. This way, their outputs become discoverable, and associated metadata is made available to the community.
The ResourceType property will be by definition "Dataset".
The Size attribute (e.g. bytes, pages, inches, etc.) will be automatically added to the dataset.
datacite() creates a utils::bibentry object extended with standard Dublin Core bibliographical metadata. as_datacite() retrieves the contents of this bibentry object of a dataset_df from its attributes, and returns the contents as a list, a dataset_df, or a bibentry object.
as_datacite(x, type) returns the DataCite bibliographical metadata of x either as a list, a bibentry object, or a dataset_df object.
is.datacite(x) returns a logical value indicating whether the object x is of class datacite.
DataCite 4.3 Mandatory Properties and DataCite 4.3 Optional Properties
Other bibentry functions: as_dublincore(), get_bibentry()
datacite(
  Title = "Growth of Orange Trees",
  Creator = c(
    person(
      given = "N.R.", family = "Draper", role = "cre",
      comment = c(VIAF = "http://viaf.org/viaf/84585260")
    ),
    person(given = "H", family = "Smith", role = "cre")
  ),
  Publisher = "Wiley",
  Date = 1998,
  Language = "en"
)
as_datacite(iris_dataset)
The dataset_df constructor creates the objects of this class, which are semantically rich, modern data frames inherited from tibble::tibble.
dataset_df(
  ...,
  identifier = c(eg = "http://example.com/dataset#"),
  var_labels = NULL,
  units = NULL,
  definitions = NULL,
  dataset_bibentry = NULL,
  dataset_subject = NULL
)
as_dataset_df(
  df,
  identifier = c(eg = "http://example.com/dataset#"),
  var_labels = NULL,
  units = NULL,
  definitions = NULL,
  dataset_bibentry = NULL,
  dataset_subject = NULL,
  ...
)
is.dataset_df(x)
## S3 method for class 'dataset_df'
print(x, ...)
is_dataset_df(x)
... |
The vectors (variables) that should be included in the dataset. |
identifier |
Defaults to |
var_labels |
The long, human readable labels of each variable. |
units |
The units of measurement for the measured variables. |
definitions |
The linked definitions of the variables, attributes, or constants. |
dataset_bibentry |
A list of bibliographic references and descriptive metadata
about the dataset as a whole created with |
dataset_subject |
The subject of the dataset, see |
df |
A |
x |
A |
To check if an object has the class dataset_df, use is.dataset_df.
print is the method to print out the semantically rich data frames created with the constructor of dataset_df.
summary is the method to summarise these semantically rich data frames.
For more details, please check the vignette("dataset_df", package = "dataset") vignette.
dataset_df is the constructor of this type; it returns an object inherited from a data frame with semantically rich metadata.
is.dataset_df returns a logical value indicating whether the object is of class dataset_df.
my_dataset <- dataset_df(
  country_name = defined(
    c("AD", "LI"),
    definition = "http://data.europa.eu/bna/c_6c2bb82d",
    namespace = "https://www.geonames.org/countries/$1/"
  ),
  gdp = defined(
    c(3897, 7365),
    label = "Gross Domestic Product",
    unit = "million dollars",
    definition = "http://data.europa.eu/83i/aa/GDP"
  )
)
print(my_dataset)
is.dataset_df(my_dataset)
Get or reset the dataset's main title.
dataset_title(x)
dataset_title(x, overwrite = FALSE) <- value
x |
A dataset object created with |
overwrite |
If the attributes should be overwritten. In case it is set
to |
value |
The name(s) or title(s) by which a resource is known. See: dct:title. |
In the DataCite definition, several titles can be used; it is not yet implemented.
A string with the dataset's title; set_dataset_title returns a dataset object with the changed (main) title.
Other Bibliographic reference functions: creator()
dataset_title(orange_df)
dataset_title(orange_df, overwrite = TRUE) <- "The Growth of Orange Trees"
dataset_title(orange_df)
The dataset is converted into a three-column long format with columns s for subject, p for predicate and o for object.
dataset_to_triples(x, idcol = NULL)
x |
An R object that contains the data of the dataset (a data.frame or
inherited from |
idcol |
The identifier column. If |
The long form version of the original dataset, retaining the attributes and class.
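The reshaping into subject, predicate and object columns can be sketched in base R. The function to_triples() is a hypothetical stand-in, not the package's dataset_to_triples(); in particular, falling back to row numbers as subjects when idcol is NULL is an assumption of this sketch, and the sketch does not retain attributes or class.

```r
# Hypothetical base R sketch of a wide-to-long (s, p, o) conversion.
to_triples <- function(df, idcol = NULL) {
  # Assumption: without an identifier column, row numbers serve as subjects.
  subjects <- if (is.null(idcol)) {
    as.character(seq_len(nrow(df)))
  } else {
    as.character(df[[idcol]])
  }
  value_cols <- setdiff(names(df), idcol)
  # One block of rows per variable: the column name becomes the predicate.
  pieces <- lapply(value_cols, function(col) {
    data.frame(
      s = subjects,
      p = col,
      o = as.character(df[[col]]),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, pieces)
}

small <- data.frame(
  id = c("a", "b"),
  x = 1:2,
  y = c(3.5, 4.5),
  stringsAsFactors = FALSE
)
triples <- to_triples(small, idcol = "id")
```

Each cell of the original table becomes one row, so a table with two observations and two variables yields four statements.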
dataset_to_triples(iris_dataset)
The defined constructor creates the objects of this class, which are semantically extended vectors inherited from haven::labelled.
defined(
  x,
  labels = NULL,
  label = NULL,
  unit = NULL,
  definition = NULL,
  namespace = NULL
)
is.defined(x)
## S3 method for class 'haven_labelled_defined'
as.character(x, ...)
## S3 method for class 'haven_labelled_defined'
summary(object, ...)
x |
A vector to label. Must be either numeric (integer or double) or character. |
labels |
A named vector or |
label |
A short, human-readable description of the vector or |
unit |
A character string of length one containing the unit of measure
or |
definition |
A character string of length one containing a linked
definition or |
namespace |
A namespace for individual observations or categories or
|
... |
Further parameters for inheritance, not in use. |
object |
An R object to be summarised. |
as.character coerces a defined vector into a character vector.
summary summarises the defined vector.
For more details, please check the vignette("defined", package = "dataset") vignette.
The constructor defined returns a vector with defined value labels, a variable label, an optional unit of measurement and linked definition.
is.defined returns a logical value, stating if the object is of class defined.
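To see why a dedicated class is useful for carrying this metadata, the base R sketch below attaches a label, a unit and a definition to a plain numeric vector. The attribute names are assumptions chosen for this illustration and need not match how the package stores them internally.

```r
# Base R sketch: metadata as plain attributes on an ordinary numeric vector.
gdp <- structure(
  c(3897, 7365, 6753),
  label = "Gross Domestic Product",
  unit = "million dollars",
  definition = "http://data.europa.eu/83i/aa/GDP"
)

attr(gdp, "unit")

# Base R subsetting of an unclassed vector drops these custom attributes,
# so the metadata is lost unless a class supplies its own `[` method.
subset_attributes <- attributes(gdp[1:2])
```

A class such as the one returned by defined can provide its own subsetting and concatenation methods so that the metadata travels with the values.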
Other defined metadata methods and functions: var_label(), var_namespace(), var_unit()
gdp_vector <- defined(
  c(3897, 7365, 6753),
  label = "Gross Domestic Product",
  unit = "million dollars",
  definition = "http://data.europa.eu/83i/aa/GDP"
)
# To check the S3 class of the vector:
is.defined(gdp_vector)
# To print the defined vector:
print(gdp_vector)
# To summarise the defined vector:
summary(gdp_vector)
# Subsetting works as expected:
gdp_vector[1:2]
Describe a dataset
describe(x, con)
x |
A dataset_df object. |
con |
A connection, for example, |
The description of the dataset_df object is written to the connection in N-Triples form; nothing is returned.
temp_prov <- tempfile()
describe(iris_dataset, con = temp_prov)
readLines(temp_prov)
Get/set the optional Description property as an attribute to an R object.
description(x)
description(x, overwrite = FALSE) <- value
x |
A dataset object created with |
overwrite |
If the |
value |
The |
The Description is recommended for discovery in DataCite. It holds all additional information that does not fit in any of the other categories and may be used for technical information, as free text. Similar to dct:description.
The Description attribute as a character of length 1 is added to x.
Other Reference metadata functions: geolocation(), identifier(), language, publication_year(), publisher(), rights()
description(orange_df)
description(orange_df, overwrite = TRUE) <- "The 'orange' dataset has 35 rows and 3 columns of records of the growth of orange trees."
description(orange_df)
Get/set the optional Geolocation property as an attribute to an R object.
geolocation(x)
geolocation(x, overwrite = TRUE) <- value
x |
A semantically rich data frame object created by
|
overwrite |
If the attributes should be overwritten. In case it is set
to |
value |
The |
The Geolocation is recommended for discovery in DataCite 4.4. Spatial region or named place where the data was gathered or about which the data is focused. See: datacite:Geolocation.
The Geolocation attribute as a character of length 1 is added to x.
Other Reference metadata functions: description(), identifier(), language, publication_year(), publisher(), rights()
orange_dataset <- orange_df
geolocation(orange_df) <- "US"
geolocation(orange_df)
geolocation(orange_df, overwrite = FALSE) <- "GB"
The dataset_df objects contain among their attributes bibliographic entries which are stored in a utils::bibentry object. Upon creation, these entries are filled with default values when applicable.
To retrieve the bibentry of a dataset_df object, use get_bibentry.
To create a new bibentry, use the datacite function for an interface and default values according to the DataCite standard, or the dublincore function for the more general Dublin Core standard.
To change or add an entirely new bibliographic entry to a dataset_df object (or any data.frame-like object), use the `set_bibentry<-` function (see examples). For more details, please check the vignette("bibentry", package="dataset") vignette.
get_bibentry(dataset)
set_bibentry(dataset) <- value
dataset |
A dataset created with |
value |
A |
get_bibentry returns the bibentry object of x from its attributes; the `set_bibentry<-` assignment function sets this attribute to value and invisibly returns x with the changed attributes. To set well-formatted input value, refer to datacite or dublincore (see Details).
Other bibentry functions: as_dublincore(), datacite()
# Get the bibentry of a dataset_df object:
iris_bibentry <- get_bibentry(iris_dataset)
# Create a well-formatted bibentry object:
alternative_bibentry <- datacite(
  Creator = person("Jane Doe"),
  Title = "The Famous Iris Dataset",
  Publisher = "MyOrg"
)
# Assign the new bibentry object:
set_bibentry(iris_dataset) <- alternative_bibentry
# Print the bibentry object according to the DataCite notation:
as_datacite(iris_dataset, "list")
# Print the bibentry object according to the Dublin Core notation:
as_dublincore(iris_dataset, "list")
Add a prefixed identifier to the first column of the dataset.
id_to_column(x, prefix = "eg:", ids = NULL)
x |
A dataset created with |
prefix |
Defaults to |
ids |
Defaults to |
A dataset conforming to the original sub-class of x.
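The idea of a prefixed identifier column can be sketched in base R. The helper add_id_column() is hypothetical, not the package's id_to_column(); naming the new column rowid and using row numbers when ids is NULL are assumptions of this sketch.

```r
# Hypothetical base R sketch: put a prefixed identifier in the first column.
add_id_column <- function(df, prefix = "eg:", ids = NULL) {
  # Assumption: row numbers are used when no identifiers are supplied.
  if (is.null(ids)) {
    ids <- seq_len(nrow(df))
  }
  data.frame(
    rowid = paste0(prefix, ids),
    df,
    stringsAsFactors = FALSE,
    check.names = FALSE
  )
}

iris_with_id <- add_id_column(head(iris, 3), prefix = "eg:iris-o")
```

A prefix such as "eg:" lets each row identifier later be expanded into a full IRI under the dataset's namespace.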
# Example with a dataset_df object:
id_to_column(iris_dataset)
# Example with a data.frame object:
id_to_column(iris, prefix = "eg:iris-o")
Add the optional Identifier property as an attribute to an R object.
identifier(x)
identifier(x, overwrite = TRUE) <- value
x |
An |
overwrite |
If the attributes should be overwritten. In case it is set
to |
value |
The |
The Identifier is an unambiguous reference to the resource within a given context. Recommended practice is to identify the resource by means of a string conforming to an identification system. Examples include International Standard Book Number (ISBN), Digital Object Identifier (DOI), and Uniform Resource Name (URN). Select an identifier scheme from registered URI schemes maintained by IANA. More details: Guidelines for using resource identifiers in Dublin Core metadata and IEEE LOM. Similar to Identifier in datacite. See DataCite 4.4.
It is not part of the "core" Dublin Core terms, but we always add it to the metadata attributes of a dataset (in case you use a strict Dublin Core property sheet you can omit it). See Dublin Core metadata terms.
The Identifier attribute as a character of length 1 is added to x.
Other Reference metadata functions: description(), geolocation(), language, publication_year(), publisher(), rights()
identifier(orange_df)
orange_copy <- orange_df
identifier(orange_copy) <- "https://doi.org/99999/9999999"
This famous (Fisher's or Anderson's) iris data set gives the measurements in
centimetres of the variables sepal length and width and petal length and
width, respectively, for 50 flowers from each of 3 species of iris.
The species are Iris setosa, versicolor, and virginica.
This is a replication of datasets::iris as dataset s3 class.
iris_dataset
iris is a data frame with 150 cases (rows) and 6 variables (columns) named rowid, Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, and Species.
See datasets::iris for details.
Fisher, R. A. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, Part II, p179–188.
The data were collected by Anderson, Edgar (1935). The irises of the Gaspe Peninsula, Bulletin of the American Iris Society, 59, 2–5.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Add the optional Language property as an attribute to an R object.
language(x)
language(x, iso_639_code = "639-3") <- value
x |
A semantically rich data frame object created by
|
iso_639_code |
Defaults to |
value |
The language to be added to the object attributes, added by
name, or as a 2- or 3-character code for the language. You can add a
language code or language name, and the parameter is normalized to
|
Language is an optional property in DataCite 4.4; see: datacite:Language. It is a part of the "core" of the Dublin Core metadata terms. The language parameter is validated against the ISOcodes::ISO_639_2 table.
The attribute language is added to the object. It will be exported into DataCite applications in a capitalized Language format.
The Language is added to x as ISO 639-1, the DataCite recommendation, or ISO 639-3, used by the Zenodo data repository.
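The normalization of a language name or code can be sketched with a tiny lookup table. The function normalise_language() and its three-row table are hypothetical and stand in for the validation against the full ISO code table; this is not the package's implementation.

```r
# Hypothetical sketch: normalise a language name or code to ISO 639-1 or 639-3.
normalise_language <- function(value, iso_639_code = "639-3") {
  # Assumption: a three-language excerpt replaces the full ISO 639 table.
  lookup <- data.frame(
    name = c("English", "French", "German"),
    alpha2 = c("en", "fr", "de"),
    alpha3 = c("eng", "fra", "deu"),
    stringsAsFactors = FALSE
  )
  key <- tolower(value)
  # Match the input against the name, the 2-letter or the 3-letter code.
  hit <- which(
    tolower(lookup$name) == key | lookup$alpha2 == key | lookup$alpha3 == key
  )
  if (length(hit) != 1) {
    stop("Unknown language: ", value)
  }
  if (iso_639_code == "639-1") lookup$alpha2[hit] else lookup$alpha3[hit]
}

normalise_language("English")
normalise_language("fr", iso_639_code = "639-1")
```

Accepting a name, a 2-character code or a 3-character code and always storing one chosen form keeps the attribute consistent across datasets.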
Other Reference metadata functions: description(), geolocation(), identifier(), publication_year(), publisher(), rights()
myorange <- orange_df
language(myorange) <- "English"
language(myorange)
language(myorange) <- "fr"
language(myorange)
Create a single N-Triple triple.
n_triple(s, p, o)
s |
The subject of a triplet. |
p |
The predicate of a triplet. |
o |
The object of a triplet. |
N-Triples is an easy to parse line-based subset of Turtle to serialize RDF. An N-Triple triple is a sequence of RDF terms representing the subject, predicate and object of an RDF Triple. Use n_triples to serialize multiple statements.
A character vector containing one N-Triple string.
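The line format itself can be sketched in a few lines of base R. The function make_ntriple() is hypothetical and is not the package's n_triple(): it writes the subject and predicate as IRIs, treats an object starting with http:// or https:// as an IRI, and writes any other object as a plain string literal. Datatype and language-tag handling is deliberately left out, so the package's actual output may differ.

```r
# Hypothetical sketch of serialising one statement as an N-Triples line.
make_ntriple <- function(s, p, o) {
  # Assumption: objects that look like http(s) IRIs are written as <IRI>.
  if (grepl("^https?://", o)) {
    o_term <- paste0("<", o, ">")
  } else {
    # Escape double quotes inside the literal (backslashes are not handled).
    o_term <- paste0('"', gsub('"', '\\"', o, fixed = TRUE), '"')
  }
  paste0("<", s, "> <", p, "> ", o_term, " .")
}

statement <- make_ntriple(
  "http://example.org/show/218",
  "http://www.w3.org/2000/01/rdf-schema#label",
  "That Seventies Show"
)
cat(statement, "\n")
```

Each statement occupies exactly one line and ends with a full stop, which is what makes the format easy to parse line by line.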
s <- "http://example.org/show/218"
p <- "http://www.w3.org/2000/01/rdf-schema#label"
o <- "That Seventies Show"
n_triple(s, p, o)
Create triple statements to annotate your dataset with standard, interoperable metadata.
n_triples(triples)
triples |
Concatenated N-Triples created with |
N-Triples is an easy to parse line-based subset of Turtle to serialize RDF. See RDF 1.2 N-Triples: A line-based syntax for an RDF graph.
A character vector containing unique N-Triple strings.
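The uniqueness of the returned statements can be illustrated with plain character vectors. This is only a sketch: the three strings below are hand-written example statements, and base R's `unique()` stands in for whatever the package does internally.

```r
# Three hand-written N-Triples lines; the third repeats the first.
statements <- c(
  '<http://example.org/a> <http://example.org/p> "one" .',
  '<http://example.org/a> <http://example.org/p> "two" .',
  '<http://example.org/a> <http://example.org/p> "one" .'
)

# Repeating an RDF statement adds no information, so keep each line once.
unique_statements <- unique(statements)
length(unique_statements)
```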
triple_1 <- n_triple(
  "http://example.org/show/218",
  "http://www.w3.org/2000/01/rdf-schema#label",
  "That Seventies Show"
)
triple_2 <- n_triple(
  "http://example.org/show/218",
  "http://example.org/show/localName",
  '"Cette Série des Années Septante"@fr-be'
)
n_triples(c(triple_1, triple_2, triple_1))
The Orange data frame has 35 rows and 3 columns of records of the growth of orange trees. This is a replication of datasets::Orange as an object of the dataset_df S3 class.
orange_df
orange_df is a data frame with 35 cases (rows) and 3 variables (columns) named tree, age and circumference, plus a rowid column of row identifiers.
See datasets::Orange for details.
Draper, N. R. and Smith, H. (1998), Applied Regression Analysis (3rd ed), Wiley (exercise 24.N).
Pinheiro, J. C. and Bates, D. M. (2000) Mixed-effects Models in S and S-PLUS, Springer.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Add or update information about the history (provenance) of the dataset.
provenance(x)

provenance(x) <- value
x |
A dataset created with |
value |
Use |
provenance(x) returns the provenance attributes created by n_triples as text; provenance(x) <- value adds the new provenance attributes and returns x invisibly.
provenance(orange_df)

## add a statement:
provenance(orange_df) <- n_triple(
  "https://doi.org/10.5281/zenodo.14917851",
  "http://www.w3.org/ns/prov#wasInformedBy",
  "isbn:9780471170822"
)
Get/set the optional publication_year property as an attribute to an R object.
publication_year(x)

publication_year(x, overwrite = TRUE) <- value
x |
A semantically rich data frame object created by
|
overwrite |
If the attributes should be overwritten. In case it is set
to |
value |
The publication_year as a character string. |
The PublicationYear is the year when the data was or will be made publicly available in YYYY format. See Publication Year: DataCite Additional Guidance.
Returns the year metadata field of the DataBibentry of the dataset.
Other Reference metadata functions: description(), geolocation(), identifier(), language(), publisher(), rights()
publication_year(iris_dataset)
publication_year(iris_dataset) <- 1936
Add the optional Publisher property as an attribute to an R object.
publisher(x)

publisher(x, overwrite = TRUE) <- value
x |
A dataset object created with |
overwrite |
If the attributes should be overwritten. In case it is set
to |
value |
The |
The Publisher corresponds to dct:publisher and Publisher in DataCite. It is the name of the entity that holds, archives, publishes, prints, distributes, releases, issues, or produces the resource. This property will be used to formulate the citation, so consider the prominence of the role. For software, use Publisher for the code repository. If there is an entity other than a code repository that "holds, archives, publishes, prints, distributes, releases, issues, or produces" the code, use the property Contributor/contributorType/hostingInstitution for the code repository.
The Publisher attribute as a character of length 1 is added to x.
Other Reference metadata functions: description(), geolocation(), identifier(), language(), publication_year(), rights()
publisher(orange_df) <- "Wiley"
publisher(orange_df)
Get/set the optional Rights property as an attribute to an R object.
rights(x)

rights(x, overwrite = FALSE) <- value
x |
A semantically rich data frame object created by |
overwrite |
If the |
value |
The |
Rights corresponds to dct:rights and datacite Rights. Information about rights held in and over the resource. Typically, rights information includes a statement about various property rights associated with the resource, including intellectual property rights.
The Rights attribute as a character of length 1 is added to x.
Other Reference metadata functions: description(), geolocation(), identifier(), language(), publication_year(), publisher()
rights(iris_dataset) <- "CC-BY-SA"
rights(iris_dataset)
Create/add/retrieve a subject
subject(x)

subject_create(
  term,
  schemeURI = NULL,
  valueURI = NULL,
  prefix = NULL,
  subjectScheme = NULL,
  classificationCode = NULL
)

subject(x) <- value

is.subject(x)
x |
A dataset object created with |
term |
A subject term, for example, |
schemeURI |
The URI of the subject identifier scheme, for example
|
valueURI |
The URI of the subject term.
|
prefix |
An abbreviated prefix of a scheme URI, for example,
|
subjectScheme |
The name of the subject scheme or classification code or authority if one is used. It is a namespace. |
classificationCode |
The classificationCode subproperty may be used for subject schemes, like ANZSRC, which do not have valueURIs for each subject term. |
value |
A subject field created by |
The subject class and its functions record the subject property of the dataset. The DataCite definition allows the use of multiple subproperties; however, these cannot be added to the standard utils::bibentry object. Therefore, if the user sets the value of the subject field to a character string, it is added to the bibentry of the dataset, and also to a separate subject attribute. If the user wants to use the more detailed subproperties (see examples with subject_create), then the subject$term value is added to the bibentry as text, and the more complex subject object is added as a separate attribute to the dataset_df object.
subject(x) returns the subject attribute of the dataset_df object x; subject(x) <- value sets the same attribute to value and invisibly returns the x object with the changed attributes.
subject_create returns a named list with the subject term, the subject scheme, URIs and prefix.
is.subject returns a logical value, TRUE if the subject as a list is well-formatted by subject_create with its necessary key-value pairs.
# To set the subject of a dataset_df object:
subject(iris_dataset) <- subject_create(
  term = "Irises (plants)",
  schemeURI = "http://id.loc.gov/authorities/subjects",
  valueURI = "https://id.loc.gov/authorities/subjects/sh85068079",
  subjectScheme = "LCCH",
  prefix = "lcch:"
)

# To retrieve the subject with its subproperties:
subject(iris_dataset)
Get / set a definition for a vector or a dataset
var_definition(x, ...)

var_definition(x) <- value

definition_attribute(x)

get_definition_attribute(x)

set_definition_attribute(x, value)

definition_attribute(x) <- value
x |
a vector |
... |
Further parameters for inheritance, not in use. |
value |
a character string or |
get_variable_definitions() is identical to var_definition().
The (linked) definition of the meaning of the data contained by a vector constructed with defined.
small_country_dataset <- dataset_df(
  country_name = defined(c("Andorra", "Liechtenstein"), label = "Country"),
  gdp = defined(c(3897, 7365),
    label = "Gross Domestic Product",
    unit = "million dollars"
  )
)
var_definition(small_country_dataset$country_name) <- "http://data.europa.eu/bna/c_6c2bb82d"
var_definition(small_country_dataset$country_name)

# To remove a definition of measure
var_definition(small_country_dataset$country_name) <- NULL
Add a human-readable label as a metadata attribute to a variable or vector. The label is easier to understand than the programmatic vector object name or the column name in the data frame.
## S3 method for class 'defined'
var_label(x, ...)

## S3 method for class 'dataset_df'
var_label(
  x,
  unlist = FALSE,
  null_action = c("keep", "fill", "skip", "na", "empty"),
  recurse = FALSE,
  ...
)

label_attribute(x)

## S3 replacement method for class 'defined'
var_label(x) <- value

## S3 replacement method for class 'dataset_df'
var_label(x) <- value
x |
a vector or a data.frame |
... |
Further potential parameters reserved for inherited classes. |
unlist |
for data frames, return a named vector instead of a list |
null_action |
for data frames, by default |
recurse |
if |
value |
a character string or |
See labelled::var_label for details about variable labels.
See vignette("defined", package = "dataset") for comprehensive use with variable labels, namespaces, units of measure, and machine-independent permanent variable identifiers.
var_label() returns the label attribute as a character string.
The var_label<- assignment method allows you to add, remove, or overwrite this attribute on a vector x. The assignment function returns the x vector invisibly.
Other defined metadata methods and functions: defined(), var_namespace(), var_unit()
# Retrieve the label attribute:
var_label(orange_df$circumference)

# To (re)set the label attribute:
var_label(orange_df$circumference) <- "circumference (breast height)"
Retain the namespace part of a permanent, global variable identifier which is independent of the R instance in use.
var_namespace(x, ...)

var_namespace(x) <- value

get_variable_namespaces(x, ...)

namespace_attribute(x)

get_namespace_attribute(x)

set_namespace_attribute(x, value)

namespace_attribute(x) <- value
x |
a vector |
... |
Further potential parameters reserved for inherited classes. |
value |
a character string or |
The namespace attribute is useful when users join or concatenate data from remote, linked, and open data sources. In such cases, variable identifiers (labels or names) are often resolved with a common namespace prefix, which, together with the namespace, forms a URI or IRI permanent identifier for the variable. Retaining the namespace in such cases allows cross-validation or successful later updates of the vector (as a column of a dataset).
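As a rough illustration of how a prefix and a namespace combine with local identifiers, the following base R sketch expands two Wikidata-style identifiers. It uses plain character vectors only; the variable names are hypothetical and nothing here calls the package.

```r
# Local identifiers and a named namespace vector (prefix = namespace IRI).
ids <- c("Q275912", "Q116196078")
namespace <- c(wd = "https://www.wikidata.org/wiki/")

# Compact form: prefix plus local identifier.
prefixed_ids <- paste0(names(namespace), ":", ids)

# Full, permanent identifiers: namespace plus local identifier.
full_ids <- paste0(unname(namespace), ids)

prefixed_ids
full_ids
```

Keeping the namespace next to the values means the full identifiers can always be rebuilt after a join, even when only the short local identifiers are stored in the column.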
get_variable_namespaces() is identical to var_namespace().
See vignette("defined", package = "dataset") for comprehensive use with variable labels, namespaces, units of measure, and machine-independent permanent variable identifiers.
The namespace attribute of a vector constructed with defined.
Other defined metadata methods and functions: defined(), var_label(), var_unit()
qid <- defined(c("Q275912", "Q116196078"),
  namespace = c(wd = "https://www.wikidata.org/wiki/")
)
var_namespace(qid)

# To remove a namespace
var_namespace(qid) <- NULL
Get / Set a unit of measure
var_unit(x, ...)

var_unit(x) <- value

get_variable_units(x, ...)

unit_attribute(x)

get_unit_attribute(x)

set_unit_attribute(x, value)

unit_attribute(x) <- value
x |
A vector. |
... |
Further potential parameters reserved for inherited classes. |
value |
A character string or |
The aim of the unit attribute is to add to the R vector object its unit of measure (for example, physical units like gram and kilogram or currency units like dollars or euros), so that they are not concatenated or joined in a syntactically correct but semantically incorrect way (e.g., accidentally concatenating values quoted in dollars and euros from different subvectors). This is particularly useful when working with linked open data, i.e., when joins or concatenations are performed on data arriving from a remote source.
get_variable_units() is identical to var_unit().
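The protection that a unit attribute offers can be sketched in base R as follows. `concat_same_unit()` is a hypothetical helper written for this illustration; it is not part of the package and only assumes that the unit is stored in an attribute named `unit`.

```r
# Hypothetical helper: concatenate two vectors only if their "unit"
# attributes agree, and carry the unit over to the result.
concat_same_unit <- function(a, b) {
  unit_a <- attr(a, "unit")
  unit_b <- attr(b, "unit")
  if (!identical(unit_a, unit_b)) {
    stop("Unit mismatch: ", unit_a, " vs ", unit_b)
  }
  structure(c(a, b), unit = unit_a)
}

usd_1 <- structure(c(10, 20), unit = "USD")
usd_2 <- structure(30, unit = "USD")
eur <- structure(5, unit = "EUR")

# Same unit: the values are combined and the unit is kept.
combined <- concat_same_unit(usd_1, usd_2)

# Different units: the call fails instead of silently mixing currencies.
mismatch <- tryCatch(
  concat_same_unit(usd_1, eur),
  error = function(e) conditionMessage(e)
)
```

Plain `c()` would happily return four numbers here, which is exactly the syntactically correct but semantically wrong result that the attribute is meant to prevent.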
See vignette("defined", package = "dataset") for comprehensive use with variable labels, namespaces, units of measure, and machine-independent permanent variable identifiers.
The unit attribute of a vector constructed with defined, or any vector that is enriched with a unit attribute.
The var_unit<- assignment method allows you to add, remove, or overwrite this attribute on a vector x. The assignment function returns the x vector invisibly.
Other defined metadata methods and functions: defined(), var_label(), var_namespace()
# The defined vector class and dataset_df support units of measure attributes:
var_unit(orange_df$circumference)

# Normally columns of a data.frame do not have a unit attribute:
var_unit(mtcars$wt)

# You can add them with the assignment function:
var_unit(mtcars$wt) <- "1000 lbs"

# To remove a unit of measure assign the NULL value:
var_unit(mtcars$wt) <- NULL
Convert the numeric, boolean and Date/time columns of a dataset to xs:decimal, xs:boolean, xs:date and xs:dateTime.
xsd_convert(x, idcol, ...)

## S3 method for class 'data.frame'
xsd_convert(x, idcol = NULL, ...)

## S3 method for class 'dataset'
xsd_convert(x, idcol = NULL, ...)

## S3 method for class 'tibble'
xsd_convert(x, idcol = NULL, ...)

## S3 method for class 'character'
xsd_convert(x, idcol = NULL, ...)

## S3 method for class 'numeric'
xsd_convert(x, idcol = NULL, ...)

## S3 method for class 'haven_labelled_defined'
xsd_convert(x, idcol = NULL, ...)

## S3 method for class 'integer'
xsd_convert(x, idcol = NULL, ...)

## S3 method for class 'logical'
xsd_convert(x, idcol = NULL, ...)

## S3 method for class 'factor'
xsd_convert(x, idcol = NULL, ...)

## S3 method for class 'POSIXct'
xsd_convert(x, idcol = NULL, ...)

## S3 method for class 'Date'
xsd_convert(x, idcol = NULL, ...)
x |
An object to be coerced to an XML Schema defined string format. |
idcol |
The name or position of the column that contains the row
(observation) identifiers. If |
... |
Further optional parameters for generic method. |
A serialisation of an R vector or data frame (dataset) in XML.
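To give a feel for what a typed serialisation looks like, here is a base R sketch that maps a few R types to XML Schema datatypes. `sketch_xsd()` is a hypothetical helper and not the package's `xsd_convert()`; in particular, the literal format (full datatype IRI after `^^`) is an assumption and the real output may be formatted differently.

```r
# Hypothetical helper, not the package's xsd_convert() implementation.
sketch_xsd <- function(x) {
  xsd <- "http://www.w3.org/2001/XMLSchema#"
  # Pick an XML Schema datatype from the R type of the vector.
  type <- if (inherits(x, "Date")) {
    "date"
  } else if (is.logical(x)) {
    "boolean"
  } else if (is.integer(x)) {
    "integer"
  } else if (is.numeric(x)) {
    "decimal"
  } else {
    "string"
  }
  # XML Schema booleans are lower case: "true" / "false".
  lexical <- if (is.logical(x)) tolower(as.character(x)) else as.character(x)
  paste0('"', lexical, '"^^<', xsd, type, ">")
}

sketch_xsd(1:3)
sketch_xsd(TRUE)
sketch_xsd(as.Date("2024-01-31"))
```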
# Convert data.frame to XML Schema Definition
xsd_convert(data.frame(a = 1:3, b = c("a", "b", "c")))

# Convert dataset to XML Schema Definition
xsd_convert(head(dataset_df(orange_df)))

# Convert integers or doubles, numbers:
xsd_convert(1:3)

# Convert logical values:
xsd_convert(TRUE)