Title: | Creating and Reading Data Packages |
---|---|
Description: | Open, read data from, and modify Data Packages. Data Packages are an open standard for bundling and describing data sets (<https://datapackage.org>). When data is read from a Data Package, care is taken to convert the data as much as possible to appropriate R data types. The package can be extended with plugins for additional data types. |
Authors: | Jan van der Laan [aut, cre] |
Maintainer: | Jan van der Laan <[email protected]> |
License: | GPL-3 |
Version: | 0.1.1 |
Built: | 2025-03-13 05:31:29 UTC |
Source: | https://github.com/djvanderlaan/datapackage |
Read the CSV-data for a Data Resource
csv_reader( path, resource, use_fread = FALSE, convert_categories = c("no", "to_factor"), as_connection = FALSE, ... )
path |
path to the data set. |
resource |
a Data Resource. |
use_fread |
use the |
convert_categories |
how to handle columns for which the field
descriptor has a |
as_connection |
This argument is ignored. The function will always
return a |
... |
additional arguments are passed on to |
Returns a data.frame with the data.
Generally used by calling dp_get_data.
Write data of data resource to CSV-file
csv_writer(x, resource_name, datapackage, use_fwrite = FALSE, ...)
x |
|
resource_name |
name of the data resource in the data package. |
datapackage |
the Data Package to which the file should be written. |
use_fwrite |
write the file using |
... |
ignored for now |
The function doesn't return anything. It is called for its side effect of creating CSV-files in the directory of the data package.
Add a reader function for a specific format
dp_add_reader( format, reader, mediatypes = character(0), extensions = character(0) )
format |
the data format read by the reader. Should be a length 1 character vector. |
reader |
the reader function. See details. |
mediatypes |
a character vector with the media-types that are used for the format. |
extensions |
a character vector with typical file extensions used by the format. |
Adds a reader for a given format. The reader is added to a list of readers referenced by the format. It is also possible to assign mediatypes and file extensions to the format. When the format for a given Data Resource is missing, dp_get_data will first check if a mediatype is associated with the resource and will try to look up which format belongs to the given mediatype. If that doesn't result in a valid format, dp_get_data will try the same with the extension of the file.
Note that adding a reader for an existing format will overwrite the existing reader.
Does not return anything (invisible(NULL)).
# Add a very simple reader for json
json_reader <- function(path, resource, ...) {
  lapply(path, function(fn) {
    jsonlite::read_json(fn)
  })
}
dp_add_reader("json", json_reader, c("application/json"), "json")
Add a writer function for a specific format
dp_add_writer(format, writer)
format |
the data format written by the writer. Should be a length 1 character vector. |
writer |
the writer function. See details. |
Adds a writer for a given format. The writer is added to a list of writers referenced by the format. The writer function should accept three arguments: 'data', the data to write, as its first argument; 'resource_name', the name of the resource to which the data set belongs; and 'datapackage', the Data Package to which the data should be written.
Note that adding a writer for an existing format will overwrite the existing writer.
Does not return anything (invisible(NULL)).
# Add a very simple writer for json
json_writer <- function(data, resource_name, datapackage, ...) {
  dataresource <- dp_resource(datapackage, resource_name)
  path <- dp_path(dataresource, full_path = TRUE)
  jsonlite::write_json(data, path)
}
dp_add_writer("json", json_writer)
Convert columns of data.frame to their correct types using table schema
dp_apply_schema( dta, resource, convert_categories = c("no", "to_factor", "to_code"), ... )
dta |
a |
resource |
an object with the Data Resource of the data set. |
convert_categories |
how to handle columns for which the field
descriptor has a |
... |
additional arguments are passed on to the |
Converts each column in dta to the correct R-type using the type information in the table schema. For example, if the original column type in dta is a character vector and the table schema specifies that the field is of type number, the column is converted to numeric using the decimal separator and thousands separator specified in the field descriptor (or the default values when these are not specified).
Returns a copy of the input data.frame with columns modified to match the types given in the table schema.
This function calls conversion functions for each of the columns, see dp_to_number, dp_to_boolean, dp_to_integer, dp_to_date, dp_to_datetime, dp_to_yearmonth, and dp_to_string.
Get a data.frame with the categories for a variable
dp_categorieslist(x, ...)

## Default S3 method:
dp_categorieslist(
  x,
  fielddescriptor = attr(x, "fielddescriptor"),
  datapackage = dp_get_datapackage(fielddescriptor),
  ...
)

## S3 method for class 'fielddescriptor'
dp_categorieslist(
  x,
  datapackage = dp_get_datapackage(x),
  normalised = FALSE,
  ...
)
x |
the variable for which to get the Categories List |
... |
used to pass extra arguments on to other methods. |
fielddescriptor |
the Field Descriptor associated with the variable. |
datapackage |
the Data Package where the variable is from. |
normalised |
if |
Returns a data.frame with the categories or NULL when none could be found.
Check if a data set is valid given a Data Resource
dp_check_dataresource( x, dataresource = attr(x, "resource"), constraints = TRUE, throw = FALSE, tolerance = sqrt(.Machine$double.eps) )
x |
|
dataresource |
|
constraints |
also check relevant constraints in the field descriptor. |
throw |
generate an error if the data set is not valid according to the dataresource. |
tolerance |
numerical tolerance used in some of the tests |
Returns TRUE when the data set is valid. Returns a character vector with length >= 1 if the data set is not valid. The text in the character values indicates why the data set is not valid.
When throw = TRUE the function will generate an error instead of returning a character vector. When the data set is valid the function returns TRUE invisibly.
Use isTRUE to check if the test was successful.
See dp_check_field for a function that checks a column or vector.
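The check-then-branch pattern described above can be sketched as follows. This is a minimal illustration, assuming the package is installed and that a Data Resource generated with dp_generate_dataresource can be passed directly as the dataresource argument; the example data frame is made up for the purpose.

```r
library(datapackage)

# Hypothetical example data and a Data Resource generated from it
dta <- data.frame(a = 1:3, b = c("x", "y", "z"))
resource <- dp_generate_dataresource(dta, "example")

# The result is either TRUE or a character vector describing the problems,
# so isTRUE() is the safe way to branch on it
res <- dp_check_dataresource(dta, resource)
if (isTRUE(res)) {
  message("data set is valid")
} else {
  print(res)
}
```

Using isTRUE rather than a plain if (res) avoids an error when the result is a character vector of messages.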
Check if a vector is valid given a field descriptor
dp_check_field( x, fielddescriptor, constraints = TRUE, tolerance = sqrt(.Machine$double.eps) )
x |
vector to test |
fielddescriptor |
field descriptor to test the vector against |
constraints |
also check relevant constraints in the field descriptor. |
tolerance |
numerical tolerance used in some of the tests |
Returns TRUE when the field is valid. Returns a character vector with length >= 1 if the field is not valid. The text in the character values indicates why the field is not valid.
Use isTRUE to check if the test was successful.
See dp_check_dataresource for a function that checks a complete data set.
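As a small sketch of checking a single vector, the example below generates a field descriptor from the vector itself and then validates against it. The vector and the field name "amount" are hypothetical; only functions and arguments documented in this manual are used.

```r
library(datapackage)

# Hypothetical numeric vector and a field descriptor derived from it
x <- c(1.5, 2.5, 10)
fd <- dp_generate_fielddescriptor(x, "amount")

# Check the vector against the descriptor, with and without constraints
res <- dp_check_field(x, fd)
res_no_constraints <- dp_check_field(x, fd, constraints = FALSE)

# Turn the result into a single status string
status <- if (isTRUE(res)) "valid" else paste(res, collapse = "; ")
print(status)
```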
Get the Field Descriptor associated with a certain field in a Data Resource
dp_field(x, field_name)
x |
a |
field_name |
length one character vector with the name of the field. |
An object of type fielddescriptor.
List the fields in a Data Resource
dp_field_names(x)
x |
object for which to get the field names. This can either be a Data Resource or Table Schema. |
Returns a character vector with the fields in the Data Resource.
Generate Data Resource for a dataset
dp_generate_dataresource( x, name, path = paste0(name, getextension(format)), format = "csv", mediatype = getmediatype(format), use_existing = FALSE, categories_type = c("regular", "resource"), categorieslist = iscategorieslist(x), ... )
x |
|
name |
name of the Data Resource |
path |
name of the file in which to store the dataset. This should be a path relative to the directory of the Data Package in which the Data Resource will be stored. |
format |
the data format in which the data is stored. |
mediatype |
mediatype of the data |
use_existing |
use existing field descriptors if present. |
categories_type |
how should categories be stored. See
|
categorieslist |
does the data resource function as a categories list for fields in another data resource |
... |
Currently ignored |
Returns a Data Resource object.
Note that this function does not create the file at path. The export of the Data Resource is automatically set to CSV.
# generate an example dataset
dta <- data.frame(a = 1:3, b = factor(letters[1:3]))
resource <- dp_generate_dataresource(dta, "example")
print(resource)
Generate a fielddescriptor for a given variable in a dataset
dp_generate_fielddescriptor(x, name, ...)

## Default S3 method:
dp_generate_fielddescriptor(x, name, ...)

## S3 method for class 'numeric'
dp_generate_fielddescriptor(
  x, name, use_existing = TRUE, use_categories = TRUE,
  categories_type = c("regular", "resource"), ...
)

## S3 method for class 'integer'
dp_generate_fielddescriptor(
  x, name, use_existing = TRUE, use_categories = TRUE,
  categories_type = c("regular", "resource"), ...
)

## S3 method for class 'logical'
dp_generate_fielddescriptor(
  x, name, use_existing = TRUE, use_categories = TRUE,
  categories_type = c("regular", "resource"), ...
)

## S3 method for class 'Date'
dp_generate_fielddescriptor(
  x, name, use_existing = TRUE, use_categories = TRUE,
  categories_type = c("regular", "resource"), ...
)

## S3 method for class 'character'
dp_generate_fielddescriptor(
  x, name, use_existing = TRUE, use_categories = TRUE,
  categories_type = c("regular", "resource"), ...
)

## S3 method for class 'factor'
dp_generate_fielddescriptor(
  x, name, use_existing = TRUE, use_categories = TRUE,
  categories_type = c("regular", "resource"), ...
)
x |
vector for which to generate the fielddescriptor |
name |
name of the field in the dataset. |
... |
used to pass extra arguments to methods. |
use_existing |
use existing field descriptor if present (assumes this is stored in the 'fielddescriptor' attribute). |
use_categories |
do not generate a categories field except when |
categories_type |
how should categories be stored. Note that type "resource" is not officially part of the standard. |
Returns a fielddescriptor.
Get a connection to the data belonging to a Data Resource
dp_get_connection(x, ...)
x |
Can either be a Data Resource or Data Package. |
... |
Extra arguments are passed on to |
When x is a Data Package an additional argument resource_name is needed to identify the correct Data Resource. See dp_get_data.
This function calls dp_get_data with an additional as_connection = TRUE argument.
Depending on the type of Data Resource a connection to the data is returned. Some readers will return the data set as a data.frame.
Get the data belonging to a Data Resource
dp_get_data(x, ..., as_connection = FALSE)

## S3 method for class 'dataresource'
dp_get_data(x, reader = "guess", ..., as_connection = FALSE)

## S3 method for class 'datapackage'
dp_get_data(x, resource_name, reader = "guess", ..., as_connection = FALSE)
x |
a |
... |
passed on to the |
as_connection |
Try to return a connection to the data instead of the data itself. When a reader does not support returning connections it will return the data. |
reader |
the reader to use to read the data. This should be either a function accepting the path to the data set (a character vector with possibly multiple filenames) as its first argument and the Data Resource as its second argument, or the character string |
resource_name |
the name of the |
When reader = "guess" the function will try to guess which reader to use based on the format and mediatype of the Data Resource. Currently only CSV is supported. For other data types a custom reader has to be provided unless the data is stored inside the Data Resource object.
It is also possible to assign default readers for data formats. For that see dp_add_reader.
Will return the data. This will generally be a data.frame but depending on the file type can also be other types of R-objects.
When called with the as_connection = TRUE argument, it will try to return a connection to the data. This depends on the reader. When the reader does not support returning connections it will return the data.
dp_get_connection is a wrapper around dp_get_data(..., as_connection = TRUE).
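To illustrate passing a custom reader function instead of reader = "guess", here is a minimal sketch. The reader head_reader is hypothetical and simply returns the first lines of the first file as text; it relies on the documented contract that a reader receives the path(s) as its first argument and the Data Resource as its second. The example package and the "complex" resource name are taken from the examples elsewhere in this manual.

```r
library(datapackage)

dp <- open_datapackage(system.file("examples/iris", package = "datapackage"))

# Hypothetical reader: ignores the schema and returns the first five raw
# lines of the first file; extra arguments are absorbed by ...
head_reader <- function(path, resource, ...) {
  readLines(path[1], n = 5)
}

first_lines <- dp_get_data(dp, "complex", reader = head_reader)
print(first_lines)
```

Because the reader decides what to return, the result here is a character vector rather than a data.frame.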
Get the Data Package associated with the object
dp_get_datapackage(x)
x |
the object for which to determine the associated Data Package |
This method can, of course, only determine the Data Package when this information is stored in one of the attributes of the object. This can either be a datapackage attribute or a dataresource attribute.
Returns a Data Package object, or returns NULL when none could be found.
Quickly read a dataset from a Data Package
dp_load_from_datapackage(path, resource_name, ...)
path |
the directory with the Data Package |
resource_name |
the name of the Data Resource. When omitted the Data Resource with the same name as the Data Package is read in and when no such resource exists the first Data Resource is read in. |
... |
passed on to |
This function is a wrapper around open_datapackage and dp_get_data. It offers a quick way to read in a dataset from a Data Package.
Returns a dataset. Usually a data.frame.
Return the number of resources in a Data Package
dp_nresources(dp)
dp |
A Data Package object. |
Returns an integer with the number of resources in the Data Package.
Get a list of properties defined for the object
dp_properties(x)

## S3 method for class 'readonlydatapackage'
dp_properties(x)

## S3 method for class 'editabledatapackage'
dp_properties(x)

## S3 method for class 'dataresource'
dp_properties(x)

## S3 method for class 'tableschema'
dp_properties(x)
x |
the object for which to obtain the properties |
Returns a character vector (possibly zero length) with the names of the properties.
The dp_property method can be used to get the values of the properties.
Get and set properties of Data Packages and Data Resources
dp_property(x, attribute)

## S3 method for class 'readonlydatapackage'
dp_property(x, attribute)

## S3 method for class 'editabledatapackage'
dp_property(x, attribute)

dp_property(x, attribute) <- value

## S3 replacement method for class 'readonlydatapackage'
dp_property(x, attribute) <- value

## S3 replacement method for class 'editabledatapackage'
dp_property(x, attribute) <- value

## S3 method for class 'dataresource'
dp_property(x, attribute)

## S3 replacement method for class 'dataresource'
dp_property(x, attribute) <- value

## S3 method for class 'tableschema'
dp_property(x, attribute)

## S3 replacement method for class 'tableschema'
dp_property(x, attribute) <- value

## S3 method for class 'fielddescriptor'
dp_property(x, attribute)

## S3 replacement method for class 'fielddescriptor'
dp_property(x, attribute) <- value
x |
a |
attribute |
a length 1 character vector with the name of the property. |
value |
the new value of the property. |
Either returns the property or modifies the object.
See dp_name etc. for methods for specific properties for Data Packages and dp_encoding etc. for specific properties for Data Resources. These specific methods also check if the input is valid for the given property.
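A short sketch of how dp_properties and dp_property can be combined to inspect an object: list the property names first, then fetch each value. It assumes the example Data Package shipped with the package, as used in the examples elsewhere in this manual.

```r
library(datapackage)

dp <- open_datapackage(system.file("examples/iris", package = "datapackage"))

# Names of the properties defined for the Data Package
props <- dp_properties(dp)
print(props)

# Look up the value of each property by name
for (p in props) {
  cat(p, ":\n", sep = "")
  str(dp_property(dp, p))
}
```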
Modifying the resources of a Data Package
dp_resource(x, resource_name)

## S3 method for class 'datapackage'
dp_resource(x, resource_name)

dp_resource(x, resource_name) <- value

## S3 replacement method for class 'readonlydatapackage'
dp_resource(x, resource_name) <- value

## S3 replacement method for class 'editabledatapackage'
dp_resource(x, resource_name) <- value
x |
a |
resource_name |
the name of a resource. |
value |
a |
When a resource with the name already exists this resource is overwritten. Therefore, the assignment operator can also be used to modify existing resources.
Either returns a Data Resource object or modifies the Data Package.
Get the names of the resources in a Data Package
dp_resource_names(dp)
dp |
A |
Returns a character vector with the names of the data resources in the Data Package.
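The functions for inspecting resources can be combined to get a quick overview of a Data Package. The sketch below assumes the example Data Package shipped with the package and its "complex" resource, both of which appear in the examples elsewhere in this manual.

```r
library(datapackage)

dp <- open_datapackage(system.file("examples/iris", package = "datapackage"))

# How many resources are there, and what are they called?
n <- dp_nresources(dp)
resource_names <- dp_resource_names(dp)
print(n)
print(resource_names)

# Pick one resource by name and list its fields
res <- dp_resource(dp, "complex")
fields <- dp_field_names(res)
print(fields)
```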
Modify a set of Data Resources in a Data Package
dp_resources(x) <- value
x |
a |
value |
a |
Returns a modified x.
Save a dataset as a Data Package
dp_save_as_datapackage( data, path, name, categories_type = c("regular", "resource") )
data |
the data.frame with the data to save |
path |
directory in which to create the datapackage |
name |
name of the Data Resource. When omitted a name is generated. |
categories_type |
how should categories be stored. See
|
This function is a wrapper around new_datapackage, dp_generate_dataresource and dp_write_data. These functions are called with the default arguments. This allows for a quick way to save a data set together with the metadata needed to read it back in.
Does not return anything. Called for the side effect of creating a directory and a number of files in this directory. Together these form a complete Data Package.
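A round trip with the two convenience wrappers, dp_save_as_datapackage and dp_load_from_datapackage, might look like the sketch below. The directory name, resource name and data are hypothetical, and the example assumes the target directory does not yet contain a Data Package.

```r
library(datapackage)

# Hypothetical target directory and example data
dir <- file.path(tempdir(), "roundtrip-example")
dta <- data.frame(id = 1:3, value = c(1.5, 2.5, 3.5))

# Write the data together with its metadata
dp_save_as_datapackage(dta, dir, name = "example")
print(list.files(dir))

# Read the data back in using the same resource name
dta2 <- dp_load_from_datapackage(dir, "example")
str(dta2)
```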
Convert a vector to 'boolean' using the specified field descriptor
dp_to_boolean(x, fielddescriptor = list(), ...)
x |
the vector to convert. |
fielddescriptor |
the field descriptor for the field. |
... |
passed on to other methods. |
When fielddescriptor is missing a default field descriptor is generated.
Will return a logical vector with fielddescriptor added as the 'fielddescriptor' attribute.
Recode a variable to code using the associated categories
dp_to_code(x, categorieslist = dp_categorieslist(x), ..., warn = FALSE)
x |
the variable to recode |
categorieslist |
a |
... |
passed on to |
warn |
give a warning when there is no code list. |
Uses the code method from the 'codelist' package. This package therefore needs to be installed. See the documentation of that package for how to work with 'code' objects.
Returns a 'code' object or x when no categories could be found (categorieslist = NULL).
An alternative is the dp_to_factor function to convert to a regular R factor.
fn <- system.file("examples/iris", package = "datapackage")
dp <- open_datapackage(fn)
dta <- dp |> dp_get_data("complex", convert_categories = "no")
dp_to_code(dta$factor1)
dp |> dp_get_data("complex", convert_categories = "dp_to_code")
Convert a vector to 'date' using the specified field descriptor
dp_to_date(x, fielddescriptor = list(), ...)
x |
the vector to convert. |
fielddescriptor |
the field descriptor for the field. |
... |
passed on to other methods. |
When fielddescriptor is missing a default field descriptor is generated.
Will return a Date vector with fielddescriptor added as the 'fielddescriptor' attribute.
Convert a vector to 'datetime' using the specified field descriptor
dp_to_datetime(x, fielddescriptor = list(), ...)
x |
the vector to convert. |
fielddescriptor |
the field descriptor for the field. |
... |
passed on to other methods. |
When fielddescriptor is missing a default field descriptor is generated.
For the default format 'iso8601::iso8601todatetime' is used to convert. This function allows more formats than the Data Package standard prescribes. When format equals "any" the default 'as.POSIXct' function is used.
When x is numeric or integer, it is assumed that these are seconds since the Unix time epoch (1970-01-01T00:00:00).
Will return a POSIXct vector with fielddescriptor added as the 'fielddescriptor' attribute.
Recode a variable to factor using the associated categories
dp_to_factor(x, categorieslist = dp_categorieslist(x), warn = TRUE)
x |
the variable to recode |
categorieslist |
a |
warn |
give a warning when there is no code list. |
Returns a factor vector or x when no categories could be found (categorieslist = NULL).
An alternative is the dp_to_code function to convert to a 'code' object from the 'codelist' package.
fn <- system.file("examples/iris", package = "datapackage")
dp <- open_datapackage(fn)
dta <- dp |> dp_get_data("complex", convert_categories = "no")
dp_to_factor(dta$factor1)
dp |> dp_get_data("complex", convert_categories = "to_factor")
Convert a vector to 'integer' using the specified field descriptor
dp_to_integer(x, fielddescriptor = list(), ...)
x |
the vector to convert. |
fielddescriptor |
the field descriptor for the field. |
... |
passed on to other methods. |
When fielddescriptor is missing a default field descriptor is generated.
Will return an integer vector with fielddescriptor added as the 'fielddescriptor' attribute.
Convert a vector to 'number' using the specified field descriptor
dp_to_number(x, fielddescriptor = list(), decimalChar = ".", ...)
x |
the vector to convert. |
fielddescriptor |
the field descriptor for the field. |
decimalChar |
decimal separator. Used when the field descriptor does not specify a decimal separator. |
... |
passed on to other methods. |
When fielddescriptor is missing a default field descriptor is generated.
Will return a numeric vector with fielddescriptor added as the 'fielddescriptor' attribute.
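As a sketch of the decimalChar fallback, the example below converts strings that use a comma as decimal separator without supplying a field descriptor. The input values are made up; the call relies only on the signature shown above.

```r
library(datapackage)

# Hypothetical input using a comma as decimal separator
x <- c("1,5", "2,25", "10")

# No field descriptor is given, so decimalChar supplies the separator
num <- dp_to_number(x, decimalChar = ",")
str(num)

# The (default) field descriptor is attached as an attribute
print(attr(num, "fielddescriptor"))
```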
Convert a vector to 'string' using the specified fielddescriptor
dp_to_string(x, fielddescriptor = list(), ...)
x |
the vector to convert. |
fielddescriptor |
the field descriptor for the field. |
... |
passed on to other methods. |
When fielddescriptor is missing a default field descriptor is generated.
Will return a character vector with fielddescriptor added as the 'fielddescriptor' attribute.
Convert a vector to 'time' using the specified field descriptor
dp_to_time(x, fielddescriptor = list(), ...)
x |
the vector to convert. |
fielddescriptor |
the field descriptor for the field. |
... |
passed on to other methods. |
When fielddescriptor is missing a default field descriptor is generated.
For the default format 'iso8601::iso8601totime' is used to convert. This function allows more formats than the Data Package standard prescribes. When format equals "any" the default 'as.POSIXct' function is used.
When x is numeric or integer, it is assumed that these are seconds since the Unix time epoch (1970-01-01T00:00:00Z).
Will return a Time vector (see iso8601totime) with fielddescriptor added as the 'fielddescriptor' attribute.
Convert a vector to 'year' using the specified field descriptor
dp_to_year(x, fielddescriptor = list(), ...)
x |
the vector to convert. |
fielddescriptor |
the field descriptor for the field. |
... |
passed on to other methods. |
When fielddescriptor is missing a default field descriptor is generated.
Will return an integer vector with fielddescriptor added as the 'fielddescriptor' attribute.
Convert a vector to 'yearmonth' using the specified field descriptor
dp_to_yearmonth(x, fielddescriptor = list(), ...)
x |
the vector to convert. |
fielddescriptor |
the field descriptor for the field. |
... |
passed on to other methods. |
When fielddescriptor is missing a default field descriptor is generated.
Valid formats are "YYYY-MM" or "YYYYMM". When x is numeric or integer, it is assumed that it was a yearmonth in the format "YYYYMM" that was accidentally converted to numeric format.
Will return a Date vector with fielddescriptor added as the 'fielddescriptor' attribute. The dates will be the first of the given month. Therefore, a 'yearmonth' "2024-05" is translated to a date "2024-05-01".
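The conversion can be tried out on a couple of "YYYY-MM" strings, as in the sketch below. The input values are made up and no field descriptor is supplied, so the default one is used.

```r
library(datapackage)

# Hypothetical 'yearmonth' values in the "YYYY-MM" format
ym <- dp_to_yearmonth(c("2024-05", "2024-12"))

# The result is a Date vector; per the description above each value
# corresponds to the first day of the given month
print(class(ym))
print(format(ym))
```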
Write data of resource to file
dp_write_data(x, ..., write_categories = TRUE)

## S3 method for class 'datapackage'
dp_write_data(
  x,
  resource_name,
  data,
  writer = "guess",
  ...,
  write_categories = TRUE
)

## S3 method for class 'dataresource'
dp_write_data(
  x,
  data,
  datapackage = dp_get_datapackage(x),
  writer = "guess",
  ...,
  write_categories = TRUE
)
x |
the Data Package or Data Resource to which the data is to be written. |
... |
additional arguments are passed on to the writer function. |
write_categories |
write both the data set |
resource_name |
name of the Data Resource in the Data Package to which the data needs to be written. |
data |
|
writer |
the writer to use to write the data. This should be either a
function accepting the Data Package, name of the Data Resource, the data and
the |
datapackage |
the Data Package to which the data needs to be written. |
When writer = "guess" the function will try to guess which writer to use based on the format and mediatype of the Data Resource.
The function doesn't return anything. It is called for its side effect of creating files in the directory of the Data Package.
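A step-by-step version of writing data, using only functions documented in this manual, could look like the following sketch. The directory, the package name and the resource name "measurements" are hypothetical, and it is an assumption that a Data Resource generated with dp_generate_dataresource can be assigned with the dp_resource replacement function before the data is written.

```r
library(datapackage)

# Hypothetical location and data
dir <- file.path(tempdir(), "write-example")
dta <- data.frame(id = 1:3, value = c(1.5, 2.5, 3.5))

# 1. Create an empty, editable Data Package
dp <- new_datapackage(dir, name = "write-example")

# 2. Describe the data and register the resource in the package
dp_resource(dp, "measurements") <- dp_generate_dataresource(dta, "measurements")

# 3. Write the data; the writer is guessed from the resource's format
dp_write_data(dp, resource_name = "measurements", data = dta)

written <- list.files(dir)
print(written)
```

This is roughly what dp_save_as_datapackage does in one call; the longer form leaves room to set titles, contributors or other properties between the steps.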
Read the FWF-data for a Data Resource
fwf_reader(path, resource, convert_categories = c("no", "to_factor"), ...)
path |
path to the data set. |
resource |
a Data Resource. |
convert_categories |
how to handle columns for which the field
descriptor has a |
... |
additional arguments are passed on to |
Returns a data.frame with the data.
Generally used by calling dp_get_data.
Creating and Adding Contributors to a Data Package
new_contributor(
  title,
  role = c("contributor", "author", "publisher", "maintainer", "wrangler"),
  path = NULL,
  email = NULL,
  organisation = NULL
)

dp_add_contributor(x, contributor)

dp_add_contributor(x) <- value
title |
A length 1 character vector with the full name of the contributor. |
role |
The role of the contributor |
path |
A URL to e.g. a home page of the contributor |
email |
The email address of the contributor |
organisation |
The organisation the contributor belongs to. |
x |
The Data Package to which the contributor has to be added. |
contributor |
a contributor object |
value |
a contributor object |
new_contributor returns a list with the given properties. This function is meant to assist in creating valid contributors.
dp <- open_datapackage(system.file(package = "datapackage", "examples/iris"))
dp_contributors(dp)
dp_contributors(dp) <- list(
  new_contributor("John Doe", email = "[email protected]"),
  list(title = "Jane Doe", role = "maintainer")
)
dp_add_contributor(dp) <- new_contributor("Janet Doe")
Create a new Data Package
new_datapackage(path, name = NULL, title = NULL, description = NULL, ...)
path |
The directory which will contain the Data Package or the filename in which to write the Data Package. |
name |
The name of the Data Package. |
title |
The title of the Data Package. |
description |
The description of the Data Package. |
... |
Ignored for now. |
The directory path, or the directory containing path if path is a file name, is created together with the file holding the Data Package information. When path is a directory, that file is named datapackage.json. The function returns an editable datapackage object.
dir <- tempdir()
dp <- new_datapackage(dir, name = "test-package")
dp_title(dp) <- "A Test Data Package"
dp_add_contributor(dp) <- new_contributor(title = "John Doe")
Create a new Data Resource
new_dataresource(
  name,
  title = NULL,
  description = NULL,
  path = NULL,
  format = NULL,
  mediatype = NULL,
  encoding = NULL,
  bytes = NULL,
  hash = NULL,
  ...
)
name |
The name of the Data Resource. |
title |
The title of the Data Resource. |
description |
The description of the Data Resource. |
path |
the path of the Data Resource |
format |
the format of the Data Resource |
mediatype |
the mediatype of the Data Resource |
encoding |
the encoding of the Data Resource |
bytes |
the number of bytes of the Data Resource |
hash |
the hash of the Data Resource |
... |
additional arguments are added as additional properties. These are checked for validity. |
Returns a dataresource object which is a list with the properties of the Data Resource.
dir <- tempdir()
dp <- new_datapackage(dir, name = "test-package")
res <- new_dataresource(name = "iris")
dp_title(res) <- "The Iris Data Set"
dp_encoding(res) <- "UTF-8"
dp_mediatype(res) <- "text/csv"
# resource adds a resource if it doesn't yet exist or updates
# an existing resource
dp_resource(dp, "iris") <- res
Open a data package
open_datapackage(path, readonly = TRUE)
path |
The filename of the data package description or the directory in which the data package is located. |
readonly |
Open the data package as a read-only data package or not. See 'details' |
When path is a directory name, the function looks for the files 'datapackage.json' or 'datapackage.yaml' in the directory. Otherwise, the function assumes the file contains the description of the data package.
When the data package is read with readonly = FALSE, any operation reading properties from the data package reads those properties directly from the file on disk, and setting properties changes the file on disk. This ensures the file is always consistent.
Returns a list with the contents of the data package when readonly = TRUE. Otherwise an empty list is returned. In both cases the filename of the data package description (typically 'datapackage.json') and the directory in which the data package is located are stored in attributes of the result.
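The difference between the two modes can be tried out with a throwaway package. The sketch below assumes the datapackage package is installed and that a freshly created package without resources can be reopened; the directory and titles are made up for the example.

```r
library(datapackage)

# Create a small Data Package in a fresh temporary directory so that
# there is something to open; this writes datapackage.json to 'dir'.
dir <- tempfile("dp-open-")
dir.create(dir)
dp_new <- new_datapackage(dir, name = "test-package",
  title = "A Test Data Package")

# Default mode: read-only, the contents are read into a list.
dp_ro <- open_datapackage(dir)
title_before <- dp_title(dp_ro)

# Editable mode: properties are read from and written to the file on disk.
dp_rw <- open_datapackage(dir, readonly = FALSE)
dp_title(dp_rw) <- "An Updated Title"

# Reopening the package should show the change that was stored on disk.
title_after <- dp_title(open_datapackage(dir))
```

Since `dp_ro` was read before the change, it is a snapshot of the earlier state, while `dp_rw` always reflects the file on disk.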
Getting and setting properties of Data Packages
dp_contributors(x, ...)
dp_contributors(x) <- value
## S3 method for class 'datapackage'
dp_contributors(x, ...)
## S3 replacement method for class 'datapackage'
dp_contributors(x) <- value
dp_name(x)
## S3 method for class 'datapackage'
dp_name(x)
dp_name(x) <- value
## S3 replacement method for class 'datapackage'
dp_name(x) <- value
dp_title(x)
## S3 method for class 'datapackage'
dp_title(x)
dp_title(x) <- value
## S3 replacement method for class 'datapackage'
dp_title(x) <- value
dp_description(x, ..., first_paragraph = FALSE, dots = FALSE)
## S3 method for class 'datapackage'
dp_description(x, ..., first_paragraph = FALSE, dots = FALSE)
dp_description(x) <- value
## S3 replacement method for class 'datapackage'
dp_description(x) <- value
dp_keywords(x, ...)
## S3 method for class 'datapackage'
dp_keywords(x, ...)
dp_keywords(x) <- value
## S3 replacement method for class 'datapackage'
dp_keywords(x) <- value
dp_created(x, ...)
## S3 method for class 'datapackage'
dp_created(x, ...)
dp_created(x) <- value
## S3 replacement method for class 'datapackage'
dp_created(x) <- value
dp_id(x, ...)
## S3 method for class 'datapackage'
dp_id(x, ...)
dp_id(x) <- value
## S3 replacement method for class 'datapackage'
dp_id(x) <- value
x |
a |
... |
used to pass additional arguments to other methods. |
value |
the new value of the property. |
first_paragraph |
Only return the first paragraph of the description. |
dots |
When returning only the first paragraph indicate missing
paragraphs with |
Either returns the property or modifies the object.
See dp_resource for methods for getting and setting the resources of a Data Package.
See PropertiesDataresource and PropertiesFielddescriptor for methods for Data Resources and Field Descriptors respectively. Also see dp_property for a generic method for getting and setting properties. These functions can also be used to get and set 'unofficial' properties.
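A short sketch of these getters and setters on a Data Package, assuming the datapackage package is installed. The package name, title and description text are invented for the example, and passing a description with a blank line between paragraphs is an assumption about how paragraphs are separated.

```r
library(datapackage)

# A fresh, editable Data Package in its own temporary directory.
dir <- tempfile("dp-props-")
dir.create(dir)
dp <- new_datapackage(dir, name = "test-package")

# Replacement functions set a property.
dp_title(dp) <- "A Test Data Package"
dp_description(dp) <- "A short summary.\n\nA longer second paragraph with details."

# The plain functions read the property back.
pkg_name  <- dp_name(dp)
pkg_title <- dp_title(dp)
full  <- dp_description(dp)
first <- dp_description(dp, first_paragraph = TRUE)
```

Here `first` is expected to hold only the opening paragraph of `full`, which is convenient for listings where the complete description would be too long.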
Getting and setting properties of Data Resources
## S3 method for class 'dataresource'
dp_name(x)
## S3 replacement method for class 'dataresource'
dp_name(x) <- value
## S3 method for class 'dataresource'
dp_title(x)
## S3 replacement method for class 'dataresource'
dp_title(x) <- value
## S3 method for class 'dataresource'
dp_description(x, ..., first_paragraph = FALSE, dots = FALSE)
## S3 replacement method for class 'dataresource'
dp_description(x) <- value
dp_path(x, ...)
dp_path(x) <- value
## S3 method for class 'dataresource'
dp_path(x, full_path = FALSE, ...)
## S3 replacement method for class 'dataresource'
dp_path(x) <- value
dp_format(x, ...)
dp_format(x) <- value
## S3 method for class 'dataresource'
dp_format(x, default = FALSE, ...)
## S3 replacement method for class 'dataresource'
dp_format(x) <- value
dp_mediatype(x, ...)
dp_mediatype(x) <- value
## S3 method for class 'dataresource'
dp_mediatype(x, ...)
## S3 replacement method for class 'dataresource'
dp_mediatype(x) <- value
dp_encoding(x, default = FALSE, ...)
dp_encoding(x) <- value
## S3 method for class 'dataresource'
dp_encoding(x, default = FALSE, ...)
## S3 replacement method for class 'dataresource'
dp_encoding(x) <- value
dp_bytes(x, ...)
dp_bytes(x) <- value
## S3 method for class 'dataresource'
dp_bytes(x, ...)
## S3 replacement method for class 'dataresource'
dp_bytes(x) <- value
dp_hash(x, ...)
dp_hash(x) <- value
## S3 method for class 'dataresource'
dp_hash(x, ...)
## S3 replacement method for class 'dataresource'
dp_hash(x) <- value
## S3 replacement method for class 'fielddescriptor'
dp_name(x) <- value
## S3 replacement method for class 'fielddescriptor'
dp_title(x) <- value
## S3 method for class 'fielddescriptor'
dp_description(x, ..., first_paragraph = FALSE, dots = FALSE)
## S3 replacement method for class 'fielddescriptor'
dp_format(x) <- value
dp_schema(x)
## S3 method for class 'dataresource'
dp_schema(x)
x |
a |
value |
the new value of the property. |
... |
used to pass additional arguments to other methods. |
first_paragraph |
Only return the first paragraph of the description. |
dots |
When returning only the first paragraph indicate missing
paragraphs with |
full_path |
Return the full path including the path to the Data Package and not only the path relative to the Data Package. This is only relevant for relative paths. |
default |
return the default value if the property had a default value and the property is not set. |
Either returns the property or modifies the object. If the property is not set, NULL is returned (unless default = TRUE).
See PropertiesDatapackage and PropertiesFielddescriptor for methods for Data Packages and Field Descriptors respectively. Also see dp_property for a generic method for getting and setting properties. These functions can also be used to get and set 'unofficial' properties.
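The behaviour for unset properties and the default argument can be seen on a stand-alone Data Resource. This is a sketch assuming the datapackage package is installed; the resource name and property values are illustrative, and the value returned with default = TRUE is not asserted here.

```r
library(datapackage)

# A Data Resource that is not (yet) part of a Data Package.
res <- new_dataresource(name = "iris")
dp_title(res) <- "The Iris Data Set"
dp_mediatype(res) <- "text/csv"

res_name      <- dp_name(res)
res_mediatype <- dp_mediatype(res)

# The encoding has not been set, so the getter returns NULL ...
enc_unset <- dp_encoding(res)
# ... unless the default value of the property is requested.
enc_default <- dp_encoding(res, default = TRUE)

# After setting the property explicitly, the getter returns a value.
dp_encoding(res) <- "UTF-8"
enc_set <- dp_encoding(res)
```

Asking for the default is useful when the value is passed on to a reader that needs a concrete encoding.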
Getting and setting properties of Field Descriptors
## S3 method for class 'fielddescriptor'
dp_name(x)
## S3 method for class 'fielddescriptor'
dp_title(x)
## S3 replacement method for class 'fielddescriptor'
dp_description(x) <- value
## S3 method for class 'fielddescriptor'
dp_format(x, ...)
x |
a |
value |
the new value of the property. |
... |
used to pass additional arguments to other methods. |
Either returns the property or modifies the object. If the property is not set, NULL is returned (unless default = TRUE).
See PropertiesDatapackage and PropertiesDataresource for methods for Data Packages and Data Resources respectively. Also see dp_property for a generic method for getting and setting properties. These functions can also be used to get and set 'unofficial' properties.