Title: | Create Data Frames that are Easier to Exchange and Reuse |
---|---|
Description: | The aim of the 'dataset' package is to make tidy datasets easier to release, exchange and reuse. It organizes and formats data frame 'R' objects into well-referenced, well-described, interoperable datasets into release and reuse ready form. |
Authors: | Daniel Antal [aut, cre] , Marcelo Perlin [rev] |
Maintainer: | Daniel Antal <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.3.4002 |
Built: | 2024-12-26 19:20:49 UTC |
Source: | https://github.com/dataobservatory-eu/dataset |
Coerce to character vector
as_character(x) ## S3 method for class 'haven_labelled_defined' as_character(x)
as_character(x) ## S3 method for class 'haven_labelled_defined' as_character(x)
x |
A vector created with |
A character vector.
as_character(iris_dataset$Species)
as_character(iris_dataset$Species)
Add metadata conforming the DCMI Metadata Terms. to datasets, i.e. structured R data.frame or list objects, for an accurate and consistent identification of a resource for citation and retrieval purposes.
as_dublincore(x, type = "bibentry", ...) dublincore( title, creator, contributor = NULL, publisher = NULL, identifier = NULL, subject = NULL, type = "DCMITYPE:Dataset", dataset_date = NULL, language = NULL, relation = NULL, format = "application/r-rds", rights = NULL, datasource = NULL, description = NULL, coverage = NULL ) is.dublincore(x) ## S3 method for class 'dublincore' is.dublincore(x)
as_dublincore(x, type = "bibentry", ...) dublincore( title, creator, contributor = NULL, publisher = NULL, identifier = NULL, subject = NULL, type = "DCMITYPE:Dataset", dataset_date = NULL, language = NULL, relation = NULL, format = "application/r-rds", rights = NULL, datasource = NULL, description = NULL, coverage = NULL ) is.dublincore(x) ## S3 method for class 'dublincore' is.dublincore(x)
x |
An object that is tested if it has a class "dublincore". |
type |
The nature or genre of the resource. Recommended best practice is to use a controlled vocabulary such as the DCMI Type Vocabulary
DCMITYPE.
For a dataset, the correct term is |
... |
Optional parameters to add to a |
title |
dct:title, a name given to the resource.
|
creator |
An entity primarily responsible for making the resource.
dct:creator
Corresponds to |
contributor |
An entity responsible for making contributions to the dataset. See DCMI: Contributor, and for possible contribution type, please review MARC Code List for Relators. |
publisher |
Corresponds to dct:publisher
and Publisher in DataCite.
The name of the entity that holds, archives, publishes prints, distributes, releases,
issues, or produces the resource. This property will be used to formulate the citation,
so consider the prominence of the role. For software, use |
identifier |
An unambiguous reference to the resource within a given context.
Recommended practice is to identify the resource by means of a string conforming to an
identification system. Examples include International Standard Book Number (ISBN),
Digital Object Identifier (DOI), and Uniform Resource Name (URN).
Select and identifier scheme from
registered URI schemes maintained by IANA.
More details: Guidelines for using resource identifiers in Dublin Core metadata and IEEE LOM.
Similar to |
subject |
Defaults to |
dataset_date |
Corresponds to a point or period of time associated with an event in the
lifecycle of the resource. dct:date.
|
language |
A language of the dataset. See DCMI: Language. |
relation |
A related resource. Recommended best practice is to identify the related
resource by means of a string conforming to a formal identification system.
See: dct:relation.
Similar to |
format |
The file format, physical medium, or dimensions of the dataset. See DCMI: Format. |
rights |
Corresponds to dct:rights and
|
datasource |
The source of the dataset,
DCMI: Source,
which corresponds to a |
description |
An account of the resource. It may include but is not limited to:
an abstract, a table of contents, a graphical representation, or a free-text account of the resource.
dct:description. In
|
coverage |
The spatial or temporal topic of the resource, spatial applicability of the dataset, or jurisdiction under which the dataset is relevant. See DCMI: Coverage. |
The Dublin Core, also known as the Dublin Core Metadata Element Set (DCMES), is a set of fifteen main metadata items for describing digital or physical resources, such as datasets or their printed versions. Dublin Core has been formally standardized internationally as ISO 15836, as IETF RFC 5013 by the Internet Engineering Task Force (IETF), as well as in the U.S. as ANSI/NISO Z39.85.
The ResourceType
property will be by definition "Dataset".
The Size
attribute (e.g. bytes, pages, inches, etc.) will automatically added to the dataset.
dublincore()
creates a utils::bibentry
object
extended with standard Dublin Core bibliographical metadata, as_dublincore()
retrieves the contents of this bibentry object of a dataset_df from its
attributes, and returns the contents as list, dataset_df, or bibentry object, or an
ntriples string.
A logical value, if the bibliographic entries are listed according to the Dublin Core specification.
Other bibentry functions:
datacite()
,
get_bibentry()
my_bibentry <- dct_iris <- dublincore( title = "Iris Dataset", creator = person("Edgar", "Anderson", role = "aut"), publisher = person("American Iris Society", role="pbl"), contributor = person("Daniel", "Antal", role="dtm"), datasource = "https://doi.org/10.1111/j.1469-1809.1936.tb02137.x", dataset_date = 1935, language = "en", description = "The famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica." ) as_dublincore(iris_dataset, type="list")
my_bibentry <- dct_iris <- dublincore( title = "Iris Dataset", creator = person("Edgar", "Anderson", role = "aut"), publisher = person("American Iris Society", role="pbl"), contributor = person("Daniel", "Antal", role="dtm"), datasource = "https://doi.org/10.1111/j.1469-1809.1936.tb02137.x", dataset_date = 1935, language = "en", description = "The famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica." ) as_dublincore(iris_dataset, type="list")
Coerce a defined vector to numeric
as_numeric(x) ## S3 method for class 'haven_labelled_defined' as_numeric(x)
as_numeric(x) ## S3 method for class 'haven_labelled_defined' as_numeric(x)
x |
A vector created with |
A numeric vector.
as_numeric(iris_dataset$Petal.Length)
as_numeric(iris_dataset$Petal.Length)
Add the optional Creator
property as an attribute to a
dataset object.
creator(x) creator(x, overwrite = TRUE) <- value
creator(x) creator(x, overwrite = TRUE) <- value
x |
A semantically rich data frame object created by |
overwrite |
If the attributes should be overwritten. In case it is set to |
value |
The |
The Creator
corresponds to dct:creator
in Dublin Core and Creator in DataCite.
The name of the entity that holds, archives, publishes prints, distributes,
releases, issues, or produces the dataset. This property will be used to
formulate the citation, so consider the prominence of the role.
The Creator attribute as a character of length one is added to x
.
Other Bibliographic reference functions:
dataset_title()
creator(iris_dataset) # To change author: creator(iris_dataset) <- person("Jane", "Doe") # To add author: creator(iris_dataset, overwrite=FALSE) <- person("John", "Doe")
creator(iris_dataset) # To change author: creator(iris_dataset) <- person("Jane", "Doe") # To add author: creator(iris_dataset, overwrite=FALSE) <- person("John", "Doe")
Add metadata conforming the DataCite Metadata Schema.
datacite( Title, Creator, Identifier = NULL, Publisher = NULL, PublicationYear = NULL, Subject = subject_create(term = "data sets", subjectScheme = "Library of Congress Subject Headings (LCSH)", schemeURI = "https://id.loc.gov/authorities/subjects.html", valueURI = "http://id.loc.gov/authorities/subjects/sh2018002256"), Type = "Dataset", Contributor = NULL, DateList = ":tba", Language = NULL, AlternateIdentifier = ":unas", RelatedIdentifier = ":unas", Format = ":tba", Version = "0.1.0", Rights = ":tba", Description = ":tba", Geolocation = ":unas", FundingReference = ":unas" ) as_datacite(x, type = "bibentry", ...) is.datacite(x) ## S3 method for class 'datacite' is.datacite(x)
datacite( Title, Creator, Identifier = NULL, Publisher = NULL, PublicationYear = NULL, Subject = subject_create(term = "data sets", subjectScheme = "Library of Congress Subject Headings (LCSH)", schemeURI = "https://id.loc.gov/authorities/subjects.html", valueURI = "http://id.loc.gov/authorities/subjects/sh2018002256"), Type = "Dataset", Contributor = NULL, DateList = ":tba", Language = NULL, AlternateIdentifier = ":unas", RelatedIdentifier = ":unas", Format = ":tba", Version = "0.1.0", Rights = ":tba", Description = ":tba", Geolocation = ":unas", FundingReference = ":unas" ) as_datacite(x, type = "bibentry", ...) is.datacite(x) ## S3 method for class 'datacite' is.datacite(x)
Title |
The name(s) or title(s) by which a resource is known. May be the
title of a dataset or the name of a piece of software.
Similar to
dct:title. |
Creator |
The main researchers involved in producing the data, or the authors of the publication, in priority order. To supply multiple creators, repeat this property. |
Identifier |
The Identifier is a unique string that identifies a resource. For software, determine
whether the identifier is for a specific version of a piece of software, (per the
Force11 Software Citation Principles,
or for all versions. Similar to |
Publisher |
The name of the entity that holds, archives, publishes prints,
distributes, releases, issues, or produces the resource. This property will be used to
formulate the citation, so consider the prominence of the role.
For software, use Publisher for the code repository. Mandatory in DataCite, and similar to
|
PublicationYear |
The year when the data was or will be made publicly available in
|
Subject |
Recommended for discovery. Subject, keyword, classification code, or key
phrase describing the resource. Similar to
dct:subject. |
Type |
Defaults to |
Contributor |
Recommended for discovery. The institution or person responsible for collecting, managing, distributing, or otherwise contributing to the development of the resource. |
DateList |
DataCite 4.4 allows to set multiple dates to a resource, they should be added as a list. See: datacite:Date. |
Language |
The primary language of the resource. Allowed values are taken from
IETF BCP 47, ISO 639-1 language code. See |
AlternateIdentifier |
An identifier or identifiers other than the primary
Identifier applied to the resource being registered. This may be any
alphanumeric string unique within its domain of issue. It may be used for
local identifiers.
|
RelatedIdentifier |
Recommended for discovery.
Defaults to |
Format |
Technical format of the resource. Use file extension or MIME type where possible, e.g., PDF, XML, MPG or application/pdf, text/xml, video/mpeg. Similar to dct:format. |
Version |
Free text. Suggested practice: track major_version.minor_version.
Defaults to |
Rights |
Any rights information for this resource. The property may be
repeated to record complex rights characteristics, but this is not yet supported.
Free text. See |
Description |
Recommended for discovery. All additional information that does not
fit in any of the other categories. It may be used for technical information—a free text.
Defaults to |
Geolocation |
Recommended for discovery. Spatial region or named place where the data was gathered
or about which the data is focused. See |
FundingReference |
Information about financial support (funding) for the resource
being registered.
Defaults to |
x |
An object that is tested if it has a class "datacite". |
type |
A DataCite 4.4 metadata can be returned as a |
... |
Optional parameters to add to a |
DataCite is a leading global non-profit organisation that provides persistent identifiers (DOIs) for research data and other research outputs. Organisations within the research community join DataCite as members to be able to assign DOIs to all their research outputs. This way, their outputs become discoverable, and associated metadata is made available to the community.
The ResourceType
property will be by definition "Dataset".
The Size
attribute (e.g. bytes, pages, inches, etc.) will automatically added to
the dataset.
datacite()
creates a utils::bibentry
object
extended with standard Dublin Core bibliographical metadata, as_datacite()
retrieves the contents of this bibentry object of a dataset_df from its
attributes, and returns the contents as list, dataset_df, or bibentry object.
as_datacite(x, type)
returns the DataCite bibliographical metadata
of x either as a list, a bibentry object, or a dataset_df object.
is.datacite(x)
returns a logical values (if the object x
is of class datacite
).
DataCite 4.3 Mandatory Properties and DataCite 4.3 Optional Properties
Other bibentry functions:
as_dublincore()
,
get_bibentry()
datacite( Title = "Iris Dataset", Creator = person(family = "Anderson", given = "Edgar", role = "aut"), Publisher = "American Iris Society", PublicationYear = 1935, Geolocation = "US", Language = "en") as_datacite(iris_dataset)
datacite( Title = "Iris Dataset", Creator = person(family = "Anderson", given = "Edgar", role = "aut"), Publisher = "American Iris Society", PublicationYear = 1935, Geolocation = "US", Language = "en") as_datacite(iris_dataset)
The dataset_df
constructor creates the objects of this class, which
are semantically rich, modern data frames inherited from
tibble::tibble
.
dataset_df( ..., identifier = c(eg = "http://example.com/dataset#"), var_labels = NULL, units = NULL, definitions = NULL, dataset_bibentry = NULL, dataset_subject = NULL ) as_dataset_df( df, identifier = c(eg = "http://example.com/dataset#"), var_labels = NULL, units = NULL, definitions = NULL, dataset_bibentry = NULL, dataset_subject = NULL, ... ) is.dataset_df(x) ## S3 method for class 'dataset_df' print(x, ...) is_dataset_df(x)
dataset_df( ..., identifier = c(eg = "http://example.com/dataset#"), var_labels = NULL, units = NULL, definitions = NULL, dataset_bibentry = NULL, dataset_subject = NULL ) as_dataset_df( df, identifier = c(eg = "http://example.com/dataset#"), var_labels = NULL, units = NULL, definitions = NULL, dataset_bibentry = NULL, dataset_subject = NULL, ... ) is.dataset_df(x) ## S3 method for class 'dataset_df' print(x, ...) is_dataset_df(x)
... |
The vectors (variables) that should be included in the dataset. |
identifier |
Defaults to |
var_labels |
The long, human readable labels of each variable. |
units |
The units of measurement for the measured variables. |
definitions |
The linked definitions of the variables, attributes, or constants. |
dataset_bibentry |
A list of bibliographic references and descriptive metadata
about the dataset as a whole created with |
dataset_subject |
The subject of the dataset, see |
df |
A |
x |
A |
To check if an object has the class dataset_df use is.dataset_df
.print
is the method to print out the semantically rich data frames
created with the constructor of dataset_df
.summary
is the method to summarise these semantically rich data frames.
For more details, please check the vignette("dataset_df", package = "dataset")
vignette.
dataset_df
is the constructor of this type, it returns an object
inherited from a data frame with semantically rich metadata.
is.dataset_df
returns a logical value
(if the object is of class dataset_df
.)
my_dataset <- dataset_df( country_name = defined( c("AD", "LI"), definition = "http://data.europa.eu/bna/c_6c2bb82d", namespace = "https://www.geonames.org/countries/$1/"), gdp = defined( c(3897, 7365), label = "Gross Domestic Product", unit = "million dollars", definition = "http://data.europa.eu/83i/aa/GDP") ) print(my_dataset) is.dataset_df(my_dataset)
my_dataset <- dataset_df( country_name = defined( c("AD", "LI"), definition = "http://data.europa.eu/bna/c_6c2bb82d", namespace = "https://www.geonames.org/countries/$1/"), gdp = defined( c(3897, 7365), label = "Gross Domestic Product", unit = "million dollars", definition = "http://data.europa.eu/83i/aa/GDP") ) print(my_dataset) is.dataset_df(my_dataset)
Get or reset the dataset's main title.
dataset_title(x) dataset_title(x, overwrite = FALSE) <- value
dataset_title(x) dataset_title(x, overwrite = FALSE) <- value
x |
A dataset object created with |
overwrite |
If the attributes should be overwritten. In case it is set
to |
value |
The name(s) or title(s) by which a resource is known. See: dct:title. |
In the DataCite definition, several titles can be used; it is not yet implemented.
A string with the dataset's title; set_dataset_title
returns
a dataset object with the changed (main) title.
Other Bibliographic reference functions:
creator()
dataset_title(iris_dataset) dataset_title(iris_dataset, overwrite = TRUE) <-"The Famous Iris Dataset" dataset_title(iris_dataset)
dataset_title(iris_dataset) dataset_title(iris_dataset, overwrite = TRUE) <-"The Famous Iris Dataset" dataset_title(iris_dataset)
The dataset is converted into a three-column long format with
columns s
for subject, p
for predicate and o
for
object.
dataset_to_triples(x, idcol = NULL)
dataset_to_triples(x, idcol = NULL)
x |
An R object that contains the data of the dataset (a data.frame or
inherited from |
idcol |
The identifier column. If |
The long form version of the original dataset, retaining the attributes and class.
dataset_to_triples(iris_dataset)
dataset_to_triples(iris_dataset)
The defined
constructor creates the objects of this class, which
are semantically extended vectors inherited from
haven::labelled
.
defined( x, labels = NULL, label = NULL, unit = NULL, definition = NULL, namespace = NULL ) is.defined(x) ## S3 method for class 'haven_labelled_defined' as.character(x, ...) ## S3 method for class 'haven_labelled_defined' summary(object, ...)
defined( x, labels = NULL, label = NULL, unit = NULL, definition = NULL, namespace = NULL ) is.defined(x) ## S3 method for class 'haven_labelled_defined' as.character(x, ...) ## S3 method for class 'haven_labelled_defined' summary(object, ...)
x |
A vector to label. Must be either numeric (integer or double) or character. |
labels |
A named vector or |
label |
A short, human-readable description of the vector or |
unit |
A character string of length one containing the
unit of measure or |
definition |
A character string of length one containing a
linked definition or |
namespace |
A namespace for individual observations or categories or |
... |
Further parameters for inheritance, not in use. |
object |
An R object to be summarised. |
as.character
coerces a defined vector into a character vector.summary
summarises the defined
vector.
For more details, please check the vignette("defined", package = "dataset")
vignette.
The constructor defined
returns a vector with defined
value labels, a variable label, an optional unit of measurement and linked
definition.is.defined
returns a logical value, stateing if the object is of
class defined
.
Other defined metadata methods and functions:
var_label()
,
var_namespace()
,
var_unit()
gdp_vector <- defined( c(3897, 7365, 6753), label = "Gross Domestic Product", unit = "million dollars", definition = "http://data.europa.eu/83i/aa/GDP" ) # To check the s3 class of the vector: is.defined(gdp_vector) # To print the defined vector: print(gdp_vector) # To summarise the defined vector: summary(gdp_vector) # Subsetting work as expected: gdp_vector[1:2]
gdp_vector <- defined( c(3897, 7365, 6753), label = "Gross Domestic Product", unit = "million dollars", definition = "http://data.europa.eu/83i/aa/GDP" ) # To check the s3 class of the vector: is.defined(gdp_vector) # To print the defined vector: print(gdp_vector) # To summarise the defined vector: summary(gdp_vector) # Subsetting work as expected: gdp_vector[1:2]
Describe a dataset
describe(x, con)
describe(x, con)
x |
A dataset_df object. |
con |
A connection, for example, |
The description of the dataset_df object is written to the connection in the n-triples form, nothing is returned.
temp_prov <- tempfile() describe(iris_dataset, con=temp_prov) readLines(temp_prov)
temp_prov <- tempfile() describe(iris_dataset, con=temp_prov) readLines(temp_prov)
Get/set the optional Description
property as an attribute to an R object.
description(x) description(x, overwrite = FALSE) <- value
description(x) description(x, overwrite = FALSE) <- value
x |
A dataset object created with |
overwrite |
If the |
value |
The |
The Description
is recommended for discovery in DataCite. All additional information that does not
fit in any of the other categories. May be used for technical information. A free text.
Similar to
dct:description.
The Description
attribute as a character of length 1 is added to x
.
Other Reference metadata functions:
geolocation()
,
identifier()
,
language
,
publication_year()
,
publisher()
,
rights()
description(iris_dataset, overwrite = TRUE) <- "The famous iris dataset used in R language examples." description(iris_dataset)
description(iris_dataset, overwrite = TRUE) <- "The famous iris dataset used in R language examples." description(iris_dataset)
Get/set the optional Geolocation
property as an attribute to an
R object.
geolocation(x) geolocation(x, overwrite = TRUE) <- value
geolocation(x) geolocation(x, overwrite = TRUE) <- value
x |
A semantically rich data frame object created by |
overwrite |
If the attributes should be overwritten. In case it is set to |
value |
The |
The Geolocation
is recommended for discovery in DataCite 4.4.
Spatial region or named place where the data was gathered
or about which the data is focused. See:
datacite:Geolocation.
The Geolocation
attribute as a character of length 1 is added to x
.
Other Reference metadata functions:
description()
,
identifier()
,
language
,
publication_year()
,
publisher()
,
rights()
iris_dataset <- iris geolocation(iris_dataset) <- "US" geolocation(iris_dataset) geolocation(iris_dataset, overwrite = FALSE) <- "GB"
iris_dataset <- iris geolocation(iris_dataset) <- "US" geolocation(iris_dataset) geolocation(iris_dataset, overwrite = FALSE) <- "GB"
The dataset_df
objects contain among their attributes bibliographic entries which
are stored in a utils::bibentry
object.
Upon creation, these entries are filled with default values when applicable.
To retrieve the bibentry of a dataset_df object, use get_bibentry
.
To create a new bibentry, use the datacite
function for an interface and
default values according to the DataCite standard, or the
dublincore
function for the more general Dublin Core standard.
To change or an entire new bibliographic entry to a dataset_df object (or any
data.frame-like object), use the `set_bibentry<-`
function (see examples.)
For more details, please check the vignette("bibentry", package="dataset")
vignette.
get_bibentry(dataset) set_bibentry(dataset) <- value
get_bibentry(dataset) set_bibentry(dataset) <- value
dataset |
A dataset created with |
value |
A |
The get_bibentry
returns from the bibentry
object of x
from its attributes; the `set_bibentry<-`
assignment
function sets this attribute to value
and invisibly returns x
with the changed attributes. To set well-formatted input value
,
refer to datacite
or
dublincore
(see Details.)
Other bibentry functions:
as_dublincore()
,
datacite()
# Get the bibentry of a dataset_df object: iris_bibentry <- get_bibentry(iris_dataset) # Create a well-formatted bibentry object: alternative_bibentry <- datacite( Creator=person("Jane Doe"), Title ="The Famous Iris Dataset", Publisher = "MyOrg") # Assign the new bibentry object: set_bibentry(iris_dataset) <- alternative_bibentry # Print the bibentry object according to the DataCite notation: as_datacite(iris_dataset, "list") # Print the bibentry object according to the Dublin Core notation: as_dublincore(iris_dataset, "list")
# Get the bibentry of a dataset_df object: iris_bibentry <- get_bibentry(iris_dataset) # Create a well-formatted bibentry object: alternative_bibentry <- datacite( Creator=person("Jane Doe"), Title ="The Famous Iris Dataset", Publisher = "MyOrg") # Assign the new bibentry object: set_bibentry(iris_dataset) <- alternative_bibentry # Print the bibentry object according to the DataCite notation: as_datacite(iris_dataset, "list") # Print the bibentry object according to the Dublin Core notation: as_dublincore(iris_dataset, "list")
Add a prefixed identifier to the first column of the dataset.
id_to_column(x, prefix = "eg:", ids = NULL)
id_to_column(x, prefix = "eg:", ids = NULL)
x |
A dataset created with |
prefix |
Defaults to |
ids |
Defaults to |
A dataset conforming the original sub-class of x
.
# Example with a dataset_df object: id_to_column(iris_dataset) # Example with a data.frame object:#' id_to_column(iris, prefix="eg:iris-o")
# Example with a dataset_df object: id_to_column(iris_dataset) # Example with a data.frame object:#' id_to_column(iris, prefix="eg:iris-o")
Add the optional Identifier property as an attribute to an R object.
identifier(x) identifier(x, overwrite = TRUE) <- value
identifier(x) identifier(x, overwrite = TRUE) <- value
x |
An |
overwrite |
If the attributes should be overwritten. In case it is set to |
value |
The |
The Identifier
is an unambiguous reference to the resource
within a given context. Recommended practice is to identify the resource by
means of a string conforming to an identification system. Examples include
International Standard Book Number (ISBN),
Digital Object Identifier (DOI), and Uniform Resource Name (URN).
Select and identifier scheme from
registered URI schemes maintained by IANA.
More details: Guidelines for using resource identifiers in Dublin Core metadata and IEEE LOM.
Similar to Identifier
in datacite
.
DataCite 4.4.
It is not part of the "core" Dublin Core terms, but we always add it to the
metadata attributes of a dataset (in case you use a strict Dublin Core
property sheet you can omit it.)
Dublin Core metadata terms.
The Identifier
attribute as a character of length 1 is added to x
.
Other Reference metadata functions:
description()
,
geolocation()
,
language
,
publication_year()
,
publisher()
,
rights()
identifier(iris_dataset) <- "https://doi.org/10.1111/j.1469-1809.1936.tb02137.x" identifier(iris_dataset)
identifier(iris_dataset) <- "https://doi.org/10.1111/j.1469-1809.1936.tb02137.x" identifier(iris_dataset)
This famous (Fisher's or Anderson's) iris data set gives the measurements in
centimetres of the variables sepal length and width and petal length and
width, respectively, for 50 flowers from each of 3 species of iris.
The species are Iris setosa, versicolor, and virginica.
This is a replication of datasets::iris
as
dataset s3 class.
iris_dataset
iris_dataset
iris is a data frame with 150 cases (rows) and 6 variables (columns) named rowid, Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, and Species.
See datasets::iris
for details.
Fisher, R. A. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, Part II, p179–188.
The data were collected by Anderson, Edgar (1935). The irises of the Gaspe Peninsula, Bulletin of the American Iris Society, 59, 2–5.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Add the optional Language property as an attribute to an R object.
language(x) language(x, iso_639_code = "639-3") <- value
language(x) language(x, iso_639_code = "639-3") <- value
x |
A semantically rich data frame object created by |
iso_639_code |
Defaults to |
value |
The language to be added to the object attributes, added by name, or
as a 2- or 3-character code for the language. You can add a language code or language name,
and the parameter is normalized to |
Language is an optional property in DataCite 4.4; see:
datacite:Language
It is a part of the "core" of the
Dublin Core metadata terms.
The language parameter is validated against the [ISOcodes]{ISO_639_2}
table.
The attribute language
is added to the object. It will be exported into DataCite
applications in a capitalized Lanugage
format.
The Language is added to the x
as
ISO 639-1
, the Datacite recommendation, or ISO 639-3
used by the
Zenodo data repository.
Other Reference metadata functions:
description()
,
geolocation()
,
identifier()
,
publication_year()
,
publisher()
,
rights()
myiris <- iris_dataset language(myiris) <- "English" language(myiris) language(myiris) <- "fr" language(myiris)
myiris <- iris_dataset language(myiris) <- "English" language(myiris) language(myiris) <- "fr" language(myiris)
Create a single N-Triple triple.
n_triple(s, p, o)
n_triple(s, p, o)
s |
The subject of a triplet. |
p |
The predicate of a triplet. |
o |
The object of a triplet. |
N-Triples is an easy to parse line-based subset of Turtle to serialize
RDF. An N-Triple triple is a sequence of RDF terms representing the subject,
predicate and object of an RDF Triple. Use n_triples
to serialize
multiple statements.
A character vector containing one N-Triple string.
s <- "http://example.org/show/218" p <- "http://www.w3.org/2000/01/rdf-schema#label" o <- "That Seventies Show" n_triple(s, p, o)
s <- "http://example.org/show/218" p <- "http://www.w3.org/2000/01/rdf-schema#label" o <- "That Seventies Show" n_triple(s, p, o)
Create triple statements to annotate your dataset with standard, interoperable metadata.
n_triples(triples)
n_triples(triples)
triples |
Concatenated N-Triples created with |
N-Triples is an easy to parse line-based subset of Turtle to serialize RDF. See RDF 1.2 N-Triples. A line-based syntax for an RDF graph.
A character vector containing unique N-Triple strings.
triple_1 <- n_triple("http://example.org/show/218", "http://www.w3.org/2000/01/rdf-schema#label", "That Seventies Show") triple_2 <- n_triple("http://example.org/show/218", "http://example.org/show/localName", '"Cette Série des Années Septante"@fr-be') n_triples(c(triple_1, triple_2, triple_1))
triple_1 <- n_triple("http://example.org/show/218", "http://www.w3.org/2000/01/rdf-schema#label", "That Seventies Show") triple_2 <- n_triple("http://example.org/show/218", "http://example.org/show/localName", '"Cette Série des Années Septante"@fr-be') n_triples(c(triple_1, triple_2, triple_1))
Add or update information about the history (provenance) of the dataset.
provenance(x) provenance(x) <- value
provenance(x) provenance(x) <- value
x |
A dataset created with |
value |
Use |
provenance(x)
returns the provenance attributes created by
n_triples
as a text; provenance(x)<-value
adds the new
provenance attributes and returns x
invisibly.
provenance(iris_dataset) ## add a statement: provenance(iris_dataset) <- n_triple( "https://doi.org/10.5281/zenodo.10396807", "http://www.w3.org/ns/prov#wasInformedBy", "http://example.com/source#1")
provenance(iris_dataset) ## add a statement: provenance(iris_dataset) <- n_triple( "https://doi.org/10.5281/zenodo.10396807", "http://www.w3.org/ns/prov#wasInformedBy", "http://example.com/source#1")
Get/set the optional publication_year
property as an attribute to an R object.
publication_year(x) publication_year(x, overwrite = TRUE) <- value
publication_year(x) publication_year(x, overwrite = TRUE) <- value
x |
A semantically rich data frame object created by |
overwrite |
If the attributes should be overwritten. In case it is set to |
value |
The publication_year as a character set. |
The PublicationYear
is the year when the data was or will be made
publicly available in YYYY
format.
See
Publication Year: DataCite Additional Guidance.
Returns the year
metadata field of the DataBibentry
of
the dataset
Other Reference metadata functions:
description()
,
geolocation()
,
identifier()
,
language
,
publisher()
,
rights()
publication_year(iris_dataset) publication_year(iris_dataset) <- 1936
publication_year(iris_dataset) publication_year(iris_dataset) <- 1936
Add the optional Publisher
property as an attribute to an R object.
publisher(x) publisher(x, overwrite = TRUE) <- value
publisher(x) publisher(x, overwrite = TRUE) <- value
x |
A dataset object created with |
overwrite |
If the attributes should be overwritten. In case it is set
to |
value |
The |
The Publisher
corresponds to dct:publisher and Publisher in DataCite.
The name of the entity that holds, archives, publishes prints, distributes, releases,
issues, or produces the resource. This property will be used to formulate the citation,
so consider the prominence of the role. For software, use Publisher for the
code repository. If there is an entity other than a code repository, that
"holds, archives, publishes, prints, distributes, releases, issues, or produces" the
code, use the property Contributor/contributorType/ hostingInstitution for the code
repository.
The Publisher attribute as a character of length 1 is added to x
.
Other Reference metadata functions:
description()
,
geolocation()
,
identifier()
,
language
,
publication_year()
,
rights()
publisher(iris_dataset) <- "American Iris Society" publisher(iris_dataset)
publisher(iris_dataset) <- "American Iris Society" publisher(iris_dataset)
Get/set the optional Rights
property as an attribute to an
R object.
rights(x) rights(x, overwrite = FALSE) <- value
rights(x) rights(x, overwrite = FALSE) <- value
x |
A semantically rich data frame object created by |
overwrite |
If the |
value |
The |
Rights
corresponds to
dct:rights and
datacite
Rights. Information about rights held in and over the resource.
Typically, rights information includes a statement about various property
rights associated with the resource, including intellectual property rights.
The Rights
attribute as a character of length 1 is added to x
.
Other Reference metadata functions:
description()
,
geolocation()
,
identifier()
,
language
,
publication_year()
,
publisher()
rights(iris_dataset) <- "CC-BY-SA" rights(iris_dataset)
rights(iris_dataset) <- "CC-BY-SA" rights(iris_dataset)
Create/add/retrieve a subject
subject(x) subject_create( term, schemeURI = NULL, valueURI = NULL, prefix = NULL, subjectScheme = NULL, classificationCode = NULL ) subject(x) <- value is.subject(x)
subject(x) subject_create( term, schemeURI = NULL, valueURI = NULL, prefix = NULL, subjectScheme = NULL, classificationCode = NULL ) subject(x) <- value is.subject(x)
x |
A dataset object created with |
term |
A subject term, for example, |
schemeURI |
The URI of the subject identifier scheme, for example
|
valueURI |
The URI of the subject term.
|
prefix |
An abbreviated prefix of a scheme URI, for example,
|
subjectScheme |
The name of the subject scheme or classification code or authority if one is used. It is a namespace. |
classificationCode |
The classificationCode subproperty may be used for subject schemes, like ANZSRC, which do not have valueURIs for each subject term. |
value |
A subject field created by |
The subject class and its function record the subject property of the dataset.
The DataCite definition allows the use of multiple subproperties, however, these
cannot be added to the standard utils::bibentry
object. Therefore, if the user sets the value of the subject field to a
character string, it is added to the bibentry of the dataset, and also to
a separate subject
attribute. If the user wants to use the more detailed
subproperties (see examples with subject_create
), then the subject$term
value is added to the bibentry as a text, and the more complex subject object
is added as a separate attribute to the dataset_df object.#'
subject(x)
returns the subject attribute of the
dataset_df
object x
, subject(x)<-value
sets
the same attribute to value
and invisibly returns the x
object with
the changed attributes.
A subject_create
returns a named list with the subject term,
the subject scheme, URIs and prefix.
is.subject
returns a logical value, TRUE
if the subject as a list
is well-formatted by subject_create
with its necessary key-value pairs.
# To set the subject of a dataset_df object: subject(iris_dataset) <- subject_create( term = "Irises (plants)", schemeURI = "http://id.loc.gov/authorities/subjects", valueURI = "https://id.loc.gov/authorities/subjects/sh85068079", subjectScheme = "LCCH", prefix = "lcch:") # To retrieve the subject with its subproperties: subject(iris_dataset)
# To set the subject of a dataset_df object: subject(iris_dataset) <- subject_create( term = "Irises (plants)", schemeURI = "http://id.loc.gov/authorities/subjects", valueURI = "https://id.loc.gov/authorities/subjects/sh85068079", subjectScheme = "LCCH", prefix = "lcch:") # To retrieve the subject with its subproperties: subject(iris_dataset)
Get / set a definition for a vector or a dataset
var_definition(x, ...) var_definition(x) <- value definition_attribute(x) get_definition_attribute(x) set_definition_attribute(x, value) definition_attribute(x) <- value
var_definition(x, ...) var_definition(x) <- value definition_attribute(x) get_definition_attribute(x) set_definition_attribute(x, value) definition_attribute(x) <- value
x |
a vector |
... |
Further parameters for inheritance, not in use. |
value |
a character string or |
get_variable_definitions()
is identical to var_definition()
.
The (linked) definition of the meaning of the data contained by a
vector constructed with defined
.
small_country_dataset <- dataset_df( country_name = defined(c("Andorra", "Lichtenstein"), label = "Country"), gdp = defined(c(3897, 7365), label = "Gross Domestic Product", unit = "million dollars") ) var_definition(small_country_dataset$country_name) <- "http://data.europa.eu/bna/c_6c2bb82d" var_definition(small_country_dataset$country_name) # To remove a definition of measure var_definition(small_country_dataset$country_name) <- NULL
small_country_dataset <- dataset_df( country_name = defined(c("Andorra", "Lichtenstein"), label = "Country"), gdp = defined(c(3897, 7365), label = "Gross Domestic Product", unit = "million dollars") ) var_definition(small_country_dataset$country_name) <- "http://data.europa.eu/bna/c_6c2bb82d" var_definition(small_country_dataset$country_name) # To remove a definition of measure var_definition(small_country_dataset$country_name) <- NULL
Add a human readable, easier to understand label as a metadata attribute to a variable or vector than the programmatic vector object name, or column name in the data frame.
## S3 method for class 'defined' var_label(x, ...) ## S3 method for class 'dataset_df' var_label( x, unlist = FALSE, null_action = c("keep", "fill", "skip", "na", "empty"), recurse = FALSE, ... ) label_attribute(x) ## S3 replacement method for class 'defined' var_label(x) <- value ## S3 replacement method for class 'dataset_df' var_label(x) <- value
## S3 method for class 'defined' var_label(x, ...) ## S3 method for class 'dataset_df' var_label( x, unlist = FALSE, null_action = c("keep", "fill", "skip", "na", "empty"), recurse = FALSE, ... ) label_attribute(x) ## S3 replacement method for class 'defined' var_label(x) <- value ## S3 replacement method for class 'dataset_df' var_label(x) <- value
x |
a vector or a data.frame |
... |
Further potential parameters reserved for inherited classes. |
unlist |
for data frames, return a named vector instead of a list |
null_action |
for data frames, by default |
recurse |
if |
value |
a character string or |
See labelled::var_label
for details about
variable labels.
See vignette("defined", package = "dataset")
to use comprehensively
with variable labels, namespaces, units of measures, and machine-independent
permanent variable identifiers.
var_label()
returns returns the label
attribute as a character
string.
The var_label<-
assignment method allows to add, remove, or overwrite this attribute on a vector
x
. The assignment function returns the x
vector invisibly.
Other defined metadata methods and functions:
defined()
,
var_namespace()
,
var_unit()
iris_dataset_2 <- iris_dataset # Retrieve the label attribute: var_label(iris_dataset_2$Sepal.Length)
iris_dataset_2 <- iris_dataset # Retrieve the label attribute: var_label(iris_dataset_2$Sepal.Length)
Retain the namespace part of a permanent, global variable identifier which is independent of the R instance in use.
var_namespace(x, ...) var_namespace(x) <- value get_variable_namespaces(x, ...) namespace_attribute(x) get_namespace_attribute(x) set_namespace_attribute(x, value) namespace_attribute(x) <- value
var_namespace(x, ...) var_namespace(x) <- value get_variable_namespaces(x, ...) namespace_attribute(x) get_namespace_attribute(x) set_namespace_attribute(x, value) namespace_attribute(x) <- value
x |
a vector |
... |
Further potential parameters reserved for inherited classes. |
value |
a character string or |
The namespace attribute is useful when users join or concatenate data from remote, linked, and open data sources. In such cases, variable identifiers (labels or names) are often resolved with a common namespace prefix, which, together with the namespace, forms a URI or IRI permanent identifier for the variable. Retaining the namespace in such cases allows cross-validation or success later updates of the vector (as a column of a dataset.)
get_variable_namespaces()
is identical to var_namespace()
.
See vignette("defined", package = "dataset")
to use comprehensively
with variable labels, namespaces, units of measures, and machine-independent
permanent variable identifiers.
The namespace attribute of a vector constructed with defined
.
Other defined metadata methods and functions:
defined()
,
var_label()
,
var_unit()
qid = defined(c("Q275912", "Q116196078"), namespace = c(wd="https://www.wikidata.org/wiki/") ) var_namespace(qid) # To remove a namespace var_namespace(qid) <- NULL
qid = defined(c("Q275912", "Q116196078"), namespace = c(wd="https://www.wikidata.org/wiki/") ) var_namespace(qid) # To remove a namespace var_namespace(qid) <- NULL
Get / Set a unit of measure
var_unit(x, ...) var_unit(x) <- value get_variable_units(x, ...) unit_attribute(x) get_unit_attribute(x) set_unit_attribute(x, value) unit_attribute(x) <- value
var_unit(x, ...) var_unit(x) <- value get_variable_units(x, ...) unit_attribute(x) get_unit_attribute(x) set_unit_attribute(x, value) unit_attribute(x) <- value
x |
A vector. |
... |
Further potential parameters reserved for inherited classes. |
value |
A character string or |
The aim of the unit
attribute is to add to the R vector object its
unit of measure (for example, physical units like gram and kilogram or
currency units like dollars or euros), so that they are not concatenated or
joined in a syntactically correct but semantically incorrect way (i.e.,
accidentally concatenating values quoted in dollars and euros from different
subvectors.) This is particularly useful when working with linked open data,
i.e., when joins or concatenations are performed on data arriving from a remote
source.get_variable_units()
is identical to var_unit()
.
See vignette("defined", package = "dataset")
to use comprehensively
with variable labels, namespaces, units of measures, and machine-independent
permanent variable identifiers.
The unit attribute of a vector constructed with defined
,
or any vector that is enriched with a unit attribute.
The var_unit<-
assignment method allows to add, remove, or overwrite this attribute on a vector
x
. The assignment function returns the x
vector invisibly.
Other defined metadata methods and functions:
defined()
,
var_label()
,
var_namespace()
# The defined vector class and dataset_df support units of measure attributes: var_unit(iris_dataset$Sepal.Length) # Normally columns of a data.frame do not have a unit attribute: var_unit(iris$Sepal.Length) # You can add them with the assignment function: var_unit(iris$Sepal.Length) <- "centimeter" # To remove a unit of measure assign the NULL value: var_unit(iris$Sepal.Length) <- NULL
# The defined vector class and dataset_df support units of measure attributes: var_unit(iris_dataset$Sepal.Length) # Normally columns of a data.frame do not have a unit attribute: var_unit(iris$Sepal.Length) # You can add them with the assignment function: var_unit(iris$Sepal.Length) <- "centimeter" # To remove a unit of measure assign the NULL value: var_unit(iris$Sepal.Length) <- NULL
Convert the numeric, boolean and Date/time columns of a dataset
xs:decimal
, xsLboolean
, xs:date
and xs:dateTime
.
xsd_convert(x, idcol, ...) ## S3 method for class 'data.frame' xsd_convert(x, idcol = NULL, ...) ## S3 method for class 'dataset' xsd_convert(x, idcol = NULL, ...) ## S3 method for class 'tibble' xsd_convert(x, idcol = NULL, ...) ## S3 method for class 'character' xsd_convert(x, idcol = NULL, ...) ## S3 method for class 'numeric' xsd_convert(x, idcol = NULL, ...) ## S3 method for class 'haven_labelled_defined' xsd_convert(x, idcol = NULL, ...) ## S3 method for class 'integer' xsd_convert(x, idcol = NULL, ...) ## S3 method for class 'logical' xsd_convert(x, idcol = NULL, ...) ## S3 method for class 'factor' xsd_convert(x, idcol = NULL, ...) ## S3 method for class 'POSIXct' xsd_convert(x, idcol = NULL, ...) ## S3 method for class 'Date' xsd_convert(x, idcol = NULL, ...)
xsd_convert(x, idcol, ...) ## S3 method for class 'data.frame' xsd_convert(x, idcol = NULL, ...) ## S3 method for class 'dataset' xsd_convert(x, idcol = NULL, ...) ## S3 method for class 'tibble' xsd_convert(x, idcol = NULL, ...) ## S3 method for class 'character' xsd_convert(x, idcol = NULL, ...) ## S3 method for class 'numeric' xsd_convert(x, idcol = NULL, ...) ## S3 method for class 'haven_labelled_defined' xsd_convert(x, idcol = NULL, ...) ## S3 method for class 'integer' xsd_convert(x, idcol = NULL, ...) ## S3 method for class 'logical' xsd_convert(x, idcol = NULL, ...) ## S3 method for class 'factor' xsd_convert(x, idcol = NULL, ...) ## S3 method for class 'POSIXct' xsd_convert(x, idcol = NULL, ...) ## S3 method for class 'Date' xsd_convert(x, idcol = NULL, ...)
x |
An object to be coerced to an XLM Schema defined string format. |
idcol |
The name or position of the column that contains the row
(observation) identifiers. If |
... |
Further optional parameters for generic method. |
A serialisation of an R vector or data frame (dataset) in XML.
# Convert data.frame to XML Schema Definition xsd_convert(head(iris)) # Convert dataset to XML Schema Definition xsd_convert(head(iris_dataset)) # Convert integers or doubles, numbers: xsd_convert(1:3) # Convert logical values: xsd_convert(TRUE)
# Convert data.frame to XML Schema Definition xsd_convert(head(iris)) # Convert dataset to XML Schema Definition xsd_convert(head(iris_dataset)) # Convert integers or doubles, numbers: xsd_convert(1:3) # Convert logical values: xsd_convert(TRUE)