Releases · EdwinTh/padr · GitHub

Releases: EdwinTh/padr

Patch release tibble

07 Apr 07:09

When the tibble package moved to v3, some breaking changes were introduced. After this release, padr plays nicely with tibble again.

padr 0.5.0

11 Jun 14:07

thicken gained two arguments (drop and ties_to_earlier), and a couple of bugs were fixed.

padr 0.4.2

21 Apr 13:52

Patch release to remove the sample dependency from the fill_by_ unit tests. This was required by the CRAN maintainers, since the behaviour of sample will change in R 3.6.

padr 0.4.1

27 Jun 14:13

Patch release requested by CRAN maintainers. lintr was not called in the test suite; this is now fixed.

padr 0.4.0 introducing the custom suite

02 Dec 09:14

Improvements

  • thicken is sped up significantly:
      • get_interval is no longer applied to assess interval validity (it is slow on large variables because it converts a POSIX to character). Instead, validity is now checked after thickening by testing whether the result differs from the original. This makes the function approximately four times faster.
  • get_interval is sped up significantly:
      • format is now used instead of as.character to convert dates to character. For large vectors it is 4 to 5 times faster.

New Features

  • span_date and span_time are new functions that wrap seq.Date and seq.POSIXt respectively. Because of their default settings (minimal specification of dates and datetimes, and interval inference) they require very little input for straightforward spanning.

  • The closest_weekday function is introduced. It finds the closest requested weekday around the start of a datetime variable. This function helps to quickly find the start_val for thicken when the interval is "week".

  • Two new functions are introduced that help with visualising interval data.

  • center_interval shifts the datetime variable from either the beginning or the end of the interval, to the center of the interval. This will improve visualisations such as dot plots and bar plots, where the timestamp is still considered to be continuous.

  • format_interval takes the start_value of an interval and infers the end. It uses strftime on both the start value and the end value, to create a character vector that reflects the full interval.

  • The _cust suite allows for user-specified spanning to use in thickening and padding.
  • To create an asymmetric spanning, subset_span subsets a datetime vector to the desired date and time points. These are provided in a list.

  • span_around takes a datetime variable as input and spans a variable around it of a desired interval. This automates finding the min and the max of x manually, determining which values are needed to create a span of a desired interval, and doing the actual spanning.
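To illustrate what the spanning wrappers save, here is a minimal sketch. The base R call is the explicit form; the padr call assumes the package is installed and relies on span_date accepting a bare year and inferring the interval, as described above. The variable names are illustrative only.

```r
# Explicit base R spanning: start, end and step all have to be spelled out.
years_base <- seq(as.Date("2011-01-01"), as.Date("2015-01-01"), by = "year")

# With padr installed, the wrapper needs only a minimal specification of the dates.
# (Guarded so the sketch also runs where padr is not available.)
if (requireNamespace("padr", quietly = TRUE)) {
  years_padr <- padr::span_date(2011, 2015)
}
```

Both calls are meant to describe the same yearly span; the wrapper simply fills in the month, day and interval that were left unspecified.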

Bug Fixes / Enhancements

  • Both pad and thicken will no longer break when there are missing values in the datetime variable. Rows containing missing values will be retained in the returned data frame. In the case of thicken they will remain in the same position as in the input data frame. The added column will have a missing value as well. For pad all the rows with missing values will be moved to the end of the data frame, since there is no natural position for them in the order of padded rows.

  • When the time variable has NULL as its timezone, posix_to_date also used to break (related to #14). This made thicken break when the desired interval was "day" or higher. This is now fixed by disregarding the timezone.

  • get_interval now throws an informative error when the datetime variable has missing values (#33).

  • pad now throws an informative error when the datetime variable is used in the grouping (#38).

  • Added "ByteCompile: true" to DESCRIPTION.

Further Changes

  • pad no longer throws a message when the interval is specified (#31).

  • Spans around hours and minutes now start at the current hour and minute. This makes span_around sensible.

First version where interval had unit added to it.

15 May 19:51

padr 0.3.0

Changes

Interval no longer required to be of a single unit

For each of the eight interval sizes, the interval is no longer limited to a single unit. Every time span accepted by seq.Date or seq.POSIXt is now accepted. Since the original implementation was built entirely around single-unit intervals, some default behavior had to change. Because of this, this version is not entirely backwards compatible with earlier versions of padr. The following functions are affected:

  • thicken: the interval argument now has to be specified. In earlier versions it was optional. When it was not specified, the added variable was one interval level higher than that of the input datetime variable. With the widening of the interval definition, there is no longer a natural step up.

  • get_interval: no longer retrieves only the interval of a datetime variable, but also its unit (the step size). For instance, the following would have returned "day" in the past, but will now return "2 day":

date_var <- as.Date(c('2017-01-01', '2017-01-03', '2017-01-05'))
get_interval(date_var)

  • pad: when the interval is not specified, get_interval is applied on the datetime variable. Its outcome might now be different. When get_interval returns a different interval than it used to, pad will do the padding at this different interval. Extending the above example, the following would previously have resulted in a data frame with two padded rows:

x <- data.frame(date_var, y = 1:3)
pad(x)

Since the interval of date_var used to be "day", there were missing records for 2017-01-02 and 2017-01-04. These records were inserted, with missing values for y. However, now the interval of date_var is "2 day" and on this level there is no need for padding. To get the original result the interval argument should be specified with "day".
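The difference between the two interval levels can be sketched as follows. The base R part only imitates the effect of padding at a "day" interval with a join and is not padr's implementation; the padr call is guarded so the sketch runs without the package. Names such as step_days and padded are illustrative.

```r
date_var <- as.Date(c("2017-01-01", "2017-01-03", "2017-01-05"))
x <- data.frame(date_var, y = 1:3)

# The observations are consistently two days apart, hence the "2 day" interval.
step_days <- unique(as.numeric(diff(date_var)))

# Base R imitation of padding at the "day" level: join on the full daily sequence.
full_days <- data.frame(date_var = seq(min(date_var), max(date_var), by = "day"))
padded <- merge(full_days, x, by = "date_var", all.x = TRUE)

# With padr installed, specifying the interval restores the earlier result.
if (requireNamespace("padr", quietly = TRUE)) {
  padded_padr <- padr::pad(x, interval = "day")
}
```

The joined data frame has five rows, with y missing for 2017-01-02 and 2017-01-04.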

Changes in pad

Pad has been reimplemented

The function was slow when applied on many groups because it looped over them. The function has been reimplemented so it needs only one join to do the padding for all the groups simultaneously. dplyr functions are used for this new implementation, both for speed and coding clarity.

When applying pad to groups the interval is determined differently. It used to determine the interval separately for each of the groups. With the new interval definition this would often yield undesired results. Now, the interval is determined on the full datetime variable, ignoring the groups. If the user would like to allow for differing intervals over the groups it is advised to use dplyr::do. See also the final example of pad.
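The single-join idea can be sketched in base R. This is an illustration of the technique only, not padr's dplyr-based implementation: the complete span of every group is built first, and one join then pads all groups at once. The data frame and the column names z, d and y are made up for the example, and the interval is fixed at "day" for all groups.

```r
x <- data.frame(
  z = c("a", "a", "b", "b"),
  d = as.Date(c("2017-01-01", "2017-01-03", "2017-01-02", "2017-01-04")),
  y = 1:4
)

# Build the complete daily span for every group, from its first to its last observation.
spans <- lapply(split(x, x$z), function(g) {
  data.frame(z = g$z[1], d = seq(min(g$d), max(g$d), by = "day"))
})
span_all <- do.call(rbind, spans)

# One join pads all groups simultaneously; unmatched rows get NA for y.
padded <- merge(span_all, x, by = c("z", "d"), all.x = TRUE)
```

Each group spans three days here, so the result has six rows, two of which are padded.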

dplyr::group_by

Besides its own argument for grouping, pad now also accepts the grouping from dplyr, making the following two results equal:

x %>% dplyr::group_by(z) %>% pad
x %>% pad(group = 'z')

Moreover, both pad and thicken now maintain the grouping of the input data_frame. The return from both functions will have the exact same grouping.

break_above

This new argument to pad is a safety net for situations where the returned data frame is much larger than the user anticipated. This would happen when the datetime variable is of a lower interval than the user thought it was. Before doing the actual padding, the function estimates the number of rows in the result. If this estimate is above break_above the function will break.
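The estimate-then-break idea can be sketched in base R. The helper check_padded_size and its arguments step_secs and max_rows are hypothetical and only illustrate the principle; how pad itself estimates the row count and in which units break_above is expressed are not shown here.

```r
# Hypothetical helper: estimate the padded row count from the time span and the
# step size, and stop before any padding is done when the estimate is too large.
check_padded_size <- function(dt, step_secs, max_rows) {
  span_secs <- as.numeric(difftime(max(dt), min(dt), units = "secs"))
  n_est <- floor(span_secs / step_secs) + 1
  if (n_est > max_rows) {
    stop(sprintf("Estimated %.0f rows after padding, above the limit of %.0f.",
                 n_est, max_rows))
  }
  n_est
}

dt <- as.POSIXct(c("2017-01-01 00:00:00", "2017-01-02 00:00:00"), tz = "UTC")

# One day at minute level is fine for a limit of one million rows.
n_minutes <- check_padded_size(dt, step_secs = 60, max_rows = 1e6)

# The same day at second level breaks a limit of 1000 rows.
too_big <- try(check_padded_size(dt, step_secs = 1, max_rows = 1000), silent = TRUE)
```

The point of checking first is that the estimate is a cheap arithmetic operation, whereas building the padded data frame is not.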

Changes in thicken

  • Observations before the start_val are now removed from the dataset (with a warning). They used to be all mapped to the start_val.

fill_by functions default behavior is changed.

They used to require specification of all the column names that had to be filled. This is annoying when many columns have to be filled. The functions no longer break when no variable names are specified; instead they fill all columns in the data frame.

New features

pad_int

The new function pad_int does padding of an integer field. It works very similarly to the general pad. The by argument must always be specified, since a data.frame would almost always contain multiple numeric columns. Instead of the interval, one can specify the step size by which the integer increases.
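What padding an integer field amounts to can be sketched in base R. This imitates the effect with a join on the full integer sequence and is not pad_int's implementation; the data frame and the names id, val and step are made up for the example.

```r
x <- data.frame(id = c(1L, 2L, 5L), val = c("a", "b", "c"))

# Step size by which the integer field increases (an assumption for this example).
step <- 1L

# Join on the complete integer sequence; the missing ids 3 and 4 get NA for val.
full_ids <- data.frame(id = seq(min(x$id), max(x$id), by = step))
padded <- merge(full_ids, x, by = "id", all.x = TRUE)
```

As with datetime padding, the inserted rows carry missing values in all other columns.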

Bug fixes

  • Issue #13 When the end_val is specified in pad, it would mistakenly update the start_val with its value. This resulted in the return of only the last line of the padded data.frame, instead of the full padded data.frame.

  • Issue #14 When dt_var has NULL as timezone, to_posix (helper of round_thicken, which itself is a helper of thicken) used to break, and thereby thicken itself broke.

  • Issue #24 In pad with grouping, the function will no longer break if for one of the groups the start_val is behind its last observation, or the end_val is before its first observation. The group is omitted and a warning is thrown. If all groups are omitted, the function breaks with an informative error. The same goes when there is no grouping.

Other changes

  • For determining the interval in pad the start_val and/or the end_val are taken into account, if specified. They are concatenated to the datetime variable before the interval is determined.

  • Both pad and thicken now throw informative errors when the start_val or end_val (pad only) are of the wrong class.

Bug fix #11

24 Feb 19:37

In this patch release the major bug #11 was fixed. There are no functional changes from v0.2.0.

Added group argument

13 Feb 09:03

The pad function has gained the group argument, making it possible to do the padding within the groups. It also has three bug fixes (#8, #9, #10).

NOTE: this release contains a major bug (#11) breaking pad when using nonstandard arguments.

First release on CRAN

24 Jan 07:38

This release went to CRAN on Jan 19 2017. It is the first version of padr, containing two datasets (coffee and emergency) and six functions: pad, thicken, get_interval, fill_by_value, fill_by_function, and fill_by_prevalent.