Package 'jmotif' reference manual

Title:	Time Series Analysis Toolkit Based on Symbolic Aggregate Discretization, i.e. SAX
Description:	Implements time series z-normalization, SAX, HOT-SAX, VSM, SAX-VSM, RePair, and RRA algorithms facilitating time series motif (i.e., recurrent pattern), discord (i.e., anomaly), and characteristic pattern discovery along with interpretable time series classification.
Authors:	Pavel Senin [aut, cre]
Maintainer:	Pavel Senin <[email protected]>
License:	GPL-2
Version:	1.1.1
Built:	2025-03-12 04:44:38 UTC
Source:	https://github.com/jmotif/jmotif-r

Translates an alphabet size into the array of corresponding SAX cut-lines built using the Normal distribution.

Description

Translates an alphabet size into the array of corresponding SAX cut-lines built using the Normal distribution.

Usage

alphabet_to_cuts(a_size)

Arguments

a_size

the alphabet size, a value between 2 and 20 (inclusive).

References

Lonardi, S., Lin, J., Keogh, E., Patel, P., Finding motifs in time series. In Proc. of the 2nd Workshop on Temporal Data Mining (pp. 53-68). (2002)

Examples

alphabet_to_cuts(5)
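
For intuition, SAX cut-lines split the standard Normal distribution into a_size regions of equal probability. The base R sketch below illustrates that idea only; cuts_sketch is a hypothetical helper, not part of the package, and the vector returned by alphabet_to_cuts may be laid out differently (for example, it may carry an extra leading element).

```r
# Hypothetical helper (not part of jmotif): interior breakpoints that
# divide N(0, 1) into a_size regions of equal probability.
cuts_sketch <- function(a_size) {
  stopifnot(a_size >= 2)
  qnorm(seq_len(a_size - 1) / a_size)
}

cuts5 <- cuts_sketch(5)
# Four breakpoints, symmetric around zero
print(round(cuts5, 2))
```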

Computes TF-IDF weight vectors for a set of word bags.

Description

Computes TF-IDF weight vectors for a set of word bags.

Usage

bags_to_tfidf(data)

Arguments

data

the list containing the input word bags.

References

Senin Pavel and Malinchik Sergey, SAX-VSM: Interpretable Time Series Classification Using SAX and Vector Space Model. Data Mining (ICDM), 2013 IEEE 13th International Conference on, pp. 1175-1180, 7-10 Dec. 2013.

Salton, G., Wong, A., Yang, C., A vector space model for automatic indexing. Commun. ACM 18, 11, 613-620, 1975.

Examples

bag1 = data.frame(
   "words" = c("this", "is", "a", "sample"),
   "counts" = c(1, 1, 2, 1),
   stringsAsFactors = FALSE
   )
bag2 = data.frame(
   "words" = c("this", "is", "another", "example"),
   "counts" = c(1, 1, 2, 3),
   stringsAsFactors = FALSE
   )
ll = list("bag1" = bag1, "bag2" = bag2)
tfidf = bags_to_tfidf(ll)
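
To make the weighting concrete, the base R sketch below computes one common TF-IDF variant, tf = log(1 + count) and idf = log(N / df), for the same two bags. This is an assumption made for illustration: tfidf_sketch is a hypothetical helper, and the exact weighting and output layout of bags_to_tfidf may differ.

```r
# Hypothetical helper (not part of jmotif): TF-IDF weights for a named
# list of word bags, each a data frame with "words" and "counts".
tfidf_sketch <- function(bags) {
  words <- sort(unique(unlist(lapply(bags, function(b) b$words))))
  # Word-by-bag count matrix; absent words count as zero
  counts <- sapply(bags, function(b) {
    v <- setNames(numeric(length(words)), words)
    v[b$words] <- b$counts
    v
  })
  df <- rowSums(counts > 0)        # number of bags containing each word
  tf <- log(1 + counts)            # dampened term frequency
  idf <- log(ncol(counts) / df)    # zero for words present in every bag
  tf * idf                         # row i is scaled by idf[i]
}

bag1 <- data.frame(words = c("this", "is", "a", "sample"),
                   counts = c(1, 1, 2, 1), stringsAsFactors = FALSE)
bag2 <- data.frame(words = c("this", "is", "another", "example"),
                   counts = c(1, 1, 2, 3), stringsAsFactors = FALSE)
weights <- tfidf_sketch(list(bag1 = bag1, bag2 = bag2))
print(round(weights, 3))
```

Words that occur in every bag ("this", "is") receive zero weight under this variant, which is why only class-specific words contribute to the later cosine comparison.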

A standard UCR Cylinder-Bell-Funnel dataset from http://www.cs.ucr.edu/~eamonn/time_series_data

Description

A standard UCR Cylinder-Bell-Funnel dataset from http://www.cs.ucr.edu/~eamonn/time_series_data

Usage

CBF

Format

A four-element list containing train and test data along with their labels:

labels_train: the training data labels, correspond to data matrix rows
data_train: the training data matrix, each row is a time series instance
labels_test: the test data labels, correspond to data matrix rows
data_test: the test data matrix, each row is a time series instance

Computes the cosine similarity between numeric vectors

Description

Computes the cosine similarity between numeric vectors

Usage

cosine_dist(m)

Arguments

`m`	the data matrix

Value

Returns the cosine similarity

Examples

a <- c(2, 1, 0, 2, 0, 1, 1, 1)
b <- c(2, 1, 1, 1, 1, 0, 1, 1)
sim <- cosine_dist(rbind(a,b))
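
As a cross-check, the cosine similarity of the same two vectors can be computed by hand in base R. The helper name below is hypothetical, and whether cosine_dist reports the similarity itself or a value derived from it (such as 1 minus the similarity) is best confirmed against the package output.

```r
# Hypothetical helper (not part of jmotif): cosine similarity of two vectors.
cosine_similarity_sketch <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

a <- c(2, 1, 0, 2, 0, 1, 1, 1)
b <- c(2, 1, 1, 1, 1, 0, 1, 1)
s <- cosine_similarity_sketch(a, b)
print(s)
```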

Computes the cosine distance value between a bag of words and a set of TF-IDF weight vectors.

Description

Computes the cosine distance value between a bag of words and a set of TF-IDF weight vectors.

Usage

cosine_sim(data)

Arguments

data

the list containing a word-bag and the TF-IDF object.

References

Salton, G., Wong, A., Yang, C., A vector space model for automatic indexing. Commun. ACM 18, 11, 613-620, 1975.

Finds the Euclidean distance between points; if the distance exceeds the threshold, abandons the computation and returns NaN.

Description

Finds the Euclidean distance between points; if the distance exceeds the threshold, abandons the computation and returns NaN.

Usage

early_abandoned_dist(seq1, seq2, upper_limit)

Arguments

`seq1`	the array 1.
`seq2`	the array 2.
`upper_limit`	the maximum distance; once the running distance exceeds it, the computation stops and NaN is returned.
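
The early-abandoning idea can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: early_abandon_sketch is a hypothetical helper, and comparing the running sum of squares against the squared limit is one possible way to apply the threshold.

```r
# Hypothetical helper (not part of jmotif): Euclidean distance that gives up
# as soon as the partial sum of squares exceeds upper_limit^2.
early_abandon_sketch <- function(seq1, seq2, upper_limit) {
  stopifnot(length(seq1) == length(seq2))
  limit_sq <- upper_limit^2
  acc <- 0
  for (i in seq_along(seq1)) {
    acc <- acc + (seq1[i] - seq2[i])^2
    if (acc > limit_sq) return(NaN)   # abandon: distance already too large
  }
  sqrt(acc)
}

d_full <- early_abandon_sketch(c(0, 0, 0), c(1, 1, 1), 10)  # runs to completion
d_cut  <- early_abandon_sketch(c(0, 0, 0), c(1, 1, 1), 1)   # abandoned early
```

Abandoning early pays off in nearest-neighbor searches, where the best distance found so far serves as the limit for every later comparison.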

A PHYSIONET dataset

Description

A PHYSIONET dataset

Usage

ecg0606

Format

A vector of numeric values

Finds the Euclidean distance between points.

Description

Finds the Euclidean distance between points.

Usage

euclidean_dist(seq1, seq2)

Arguments

`seq1`	the array 1.
`seq2`	the array 2.

Finds a discord using the brute-force algorithm.

Description

Finds a discord using the brute-force algorithm.

Usage

find_discords_brute_force(ts, w_size, discords_num)

Arguments

`ts`	the input timeseries.
`w_size`	the sliding window size.
`discords_num`	the number of discords to report.

References

Keogh, E., Lin, J., Fu, A., HOT SAX: Efficiently finding the most unusual time series subsequence. In Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM '05). (2005)

Examples

discords = find_discords_brute_force(ecg0606[1:600], 100, 1)
plot(ecg0606[1:600], type = "l", col = "cornflowerblue", main = "ECG 0606")
lines(x=c(discords[1,2]:(discords[1,2]+100)),
   y=ecg0606[discords[1,2]:(discords[1,2]+100)], col="red")
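
The brute-force search itself is simple to state: for every window, find the distance to its nearest non-overlapping neighbor, then report the window whose nearest neighbor is farthest away. A minimal base R sketch of that definition is shown below, using a synthetic sine wave with an injected bump; the helper name and the returned list are hypothetical, and the package's output format and any per-window normalization are not reproduced here.

```r
# Hypothetical helper (not part of jmotif): single top discord by brute force.
brute_force_discord_sketch <- function(ts, w_size) {
  n <- length(ts) - w_size + 1
  best_dist <- -Inf
  best_pos <- NA
  for (i in seq_len(n)) {
    nn <- Inf                                  # nearest-neighbor distance of window i
    for (j in seq_len(n)) {
      if (abs(i - j) >= w_size) {              # skip overlapping (self) matches
        d <- sqrt(sum((ts[i:(i + w_size - 1)] - ts[j:(j + w_size - 1)])^2))
        if (d < nn) nn <- d
      }
    }
    if (is.finite(nn) && nn > best_dist) {
      best_dist <- nn
      best_pos <- i
    }
  }
  list(position = best_pos, nn_distance = best_dist)
}

# Periodic signal with an anomaly injected at points 200..210
x <- sin(seq(0, 20 * pi, length.out = 400))
x[200:210] <- x[200:210] + 2
res <- brute_force_discord_sketch(x, 20)
print(res)
```

The two nested loops make the cost quadratic in the series length, which is the motivation for the HOT-SAX and RRA heuristics documented next.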

Finds a discord (i.e. time series anomaly) with HOT-SAX. Usually works best with smaller discretization parameters (PAA and alphabet sizes).

Description

Finds a discord (i.e. time series anomaly) with HOT-SAX. Usually works best with smaller discretization parameters (PAA and alphabet sizes).

Usage

find_discords_hotsax(ts, w_size, paa_size, a_size, n_threshold, discords_num)

Arguments

`ts`	the input timeseries.
`w_size`	the sliding window size.
`paa_size`	the PAA size.
`a_size`	the alphabet size.
`n_threshold`	the normalization threshold.
`discords_num`	the number of discords to report.

References

Keogh, E., Lin, J., Fu, A., HOT SAX: Efficiently finding the most unusual time series subsequence. In Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM '05). (2005)

Examples

discords = find_discords_hotsax(ecg0606, 100, 3, 3, 0.01, 1)
plot(ecg0606, type = "l", col = "cornflowerblue", main = "ECG 0606")
lines(x=c(discords[1,2]:(discords[1,2]+100)),
   y=ecg0606[discords[1,2]:(discords[1,2]+100)], col="red")

Finds a discord with the RRA (Rare Rule Anomaly) algorithm. Usually works best with larger discretization parameters (i.e., PAA and alphabet sizes) than those used for HOT-SAX.

Description

Finds a discord with the RRA (Rare Rule Anomaly) algorithm. Usually works best with larger discretization parameters (i.e., PAA and alphabet sizes) than those used for HOT-SAX.

Usage

find_discords_rra(
  series,
  w_size,
  paa_size,
  a_size,
  nr_strategy,
  n_threshold,
  discords_num
)

Arguments

`series`	the input timeseries.
`w_size`	the sliding window size.
`paa_size`	the PAA size.
`a_size`	the alphabet size.
`nr_strategy`	the numerosity reduction strategy ("none", "exact", "mindist").
`n_threshold`	the normalization threshold.
`discords_num`	the number of discords to report.

References

Senin Pavel and Malinchik Sergey, SAX-VSM: Interpretable Time Series Classification Using SAX and Vector Space Model. Data Mining (ICDM), 2013 IEEE 13th International Conference on, pp. 1175-1180, 7-10 Dec. 2013.

Examples

discords = find_discords_rra(ecg0606, 100, 4, 4, "none", 0.01, 1)
plot(ecg0606, type = "l", col = "cornflowerblue", main = "ECG 0606")
lines(x=c(discords[1,2]:(discords[1,2]+100)),
   y=ecg0606[discords[1,2]:(discords[1,2]+100)], col="red")

A standard UCR Gun Point dataset from http://www.cs.ucr.edu/~eamonn/time_series_data

Description

A standard UCR Gun Point dataset from http://www.cs.ucr.edu/~eamonn/time_series_data

Usage

Gun_Point

Format

A four-element list containing train and test data along with their labels:

labels_train: the training data labels, correspond to data matrix rows
data_train: the training data matrix, each row is a time series instance
labels_test: the test data labels, correspond to data matrix rows
data_test: the test data matrix, each row is a time series instance

Get the ASCII letter by an index.

Description

Get the ASCII letter by an index.

Usage

idx_to_letter(idx)

Arguments

idx

the index.

Examples

# letter 'b'
idx_to_letter(2)

Compares two strings using mindist.

Description

Compares two strings using mindist.

Usage

is_equal_mindist(a, b)

Arguments

`a`	the string a.
`b`	the string b.

Examples

is_equal_mindist("aaa", "bbb") # true
is_equal_mindist("aaa", "ccc") # false

Compares two strings using natural letter ordering.

Description

Compares two strings using natural letter ordering.

Usage

is_equal_str(a, b)

Arguments

`a`	the string a.
`b`	the string b.

Examples

is_equal_str("aaa", "bbb")
is_equal_str("ccc", "ccc")

Get the index for an ASCII letter.

Description

Get the index for an ASCII letter.

Usage

letter_to_idx(letter)

Arguments

letter

the letter.

Examples

# letter 'b' translates to 2
letter_to_idx('b')

Get the sequence of ASCII letter indexes for a given character array.

Description

Get the sequence of ASCII letter indexes for a given character array.

Usage

letters_to_idx(str)

Arguments

str

the character array.

Examples

letters_to_idx(c('a','b','c','a'))
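
The letter-to-index mapping follows alphabetical order ('a' is 1, 'b' is 2, and so on, as in the letter_to_idx example). The same mapping can be expressed with base R's built-in letters constant; the helper name below is hypothetical and the sketch is only meant to show the correspondence.

```r
# Hypothetical helper (not part of jmotif): 1-based alphabet positions.
letter_index_sketch <- function(chars) match(chars, letters)

idx <- letter_index_sketch(c("a", "b", "c", "a"))
back <- letters[idx]   # inverse mapping, index to letter
```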

Converts a set of time-series into a single bag of words.

Description

Converts a set of time-series into a single bag of words.

Usage

manyseries_to_wordbag(data, w_size, paa_size, a_size, nr_strategy, n_threshold)

Arguments

`data`	the timeseries data, row-wise.
`w_size`	the sliding window size.
`paa_size`	the PAA size.
`a_size`	the alphabet size.
`nr_strategy`	the NR strategy.
`n_threshold`	the normalization threshold.

References

Salton, G., Wong, A., Yang, C., A vector space model for automatic indexing. Commun. ACM 18, 11, 613-620, 1975.

Computes the mindist value for two strings

Description

Computes the mindist value for two strings

Usage

min_dist(str1, str2, alphabet_size, compression_ratio = 1)

Arguments

`str1`	the first string
`str2`	the second string
`alphabet_size`	the used alphabet size
`compression_ratio`	the distance compression ratio

Value

Returns the distance between strings

References

Lonardi, S., Lin, J., Keogh, E., Patel, P., Finding motifs in time series. In Proc. of the 2nd Workshop on Temporal Data Mining (pp. 53-68). (2002)

Examples

str1 <- c('a', 'b', 'c')
str2 <- c('c', 'b', 'a')
min_dist(str1, str2, 3)
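
For reference, the SAX MINDIST between two words treats identical and adjacent letters as being at distance zero, and otherwise uses the gap between the cut-lines that separate the two letters. The base R sketch below follows that textbook definition. The helper name is hypothetical, and the scaling by sqrt(compression_ratio) assumes that the parameter stands for the ratio of series length to word length, which may not match how min_dist interprets it.

```r
# Hypothetical helper (not part of jmotif): textbook SAX MINDIST.
mindist_sketch <- function(str1, str2, alphabet_size, compression_ratio = 1) {
  stopifnot(length(str1) == length(str2))
  cuts <- qnorm(seq_len(alphabet_size - 1) / alphabet_size)
  i <- match(str1, letters)
  j <- match(str2, letters)
  lo <- pmin(i, j)
  hi <- pmax(i, j)
  # Identical or adjacent letters contribute zero; indexes are clamped so the
  # unused branch of ifelse() never reads outside the cut-line vector.
  d <- ifelse(hi - lo <= 1, 0,
              cuts[pmax(hi - 1, 1)] - cuts[pmin(lo, alphabet_size - 1)])
  sqrt(compression_ratio) * sqrt(sum(d^2))
}

str1 <- c("a", "b", "c")
str2 <- c("c", "b", "a")
md <- mindist_sketch(str1, str2, 3)
md_adjacent <- mindist_sketch(c("a", "a", "a"), c("b", "b", "b"), 3)
```

The zero distance between adjacent letters is also what makes words such as "aaa" and "bbb" compare as equal under mindist.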

Computes a Piecewise Aggregate Approximation (PAA) for a time series.

Description

Computes a Piecewise Aggregate Approximation (PAA) for a time series.

Usage

paa(ts, paa_num)

Arguments

`ts`	a timeseries to compute the PAA for.
`paa_num`	the desired PAA size.

References

Keogh, E., Chakrabarti, K., Pazzani, M., Mehrotra, S., Dimensionality reduction for fast similarity search in large time series databases. Knowledge and Information Systems, 3(3), 263-286. (2001)

Examples

x = c(-1, -2, -1, 0, 2, 1, 1, 0)
x_paa3 = paa(x, 3)
#
plot(x, type = "l", main = c("8-points time series and its PAA transform into three points",
                          "PAA shown schematically in blue"))
points(x, pch = 16, lwd = 5)
#
paa_bounds = c(1, 1+7/3, 1+7/3*2, 8)
abline(v = paa_bounds, lty = 3, lwd = 2, col = "cornflowerblue")
segments(paa_bounds[1:3], x_paa3, paa_bounds[2:4], x_paa3, col = "cornflowerblue", lwd = 2)
points(x = c(1, 1+7/3, 1+7/3*2) + (7/3)/2, y = x_paa3, pch = 15, lwd = 5, col = "cornflowerblue")
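
When the series length is not a multiple of the PAA size, as with 8 points and 3 segments above, points on a segment boundary have to be shared between neighboring segments. One common way to do this is sketched below in base R: every point is repeated paa_num times, so the stretched series splits evenly into paa_num blocks. The helper name is hypothetical and this is not necessarily how paa is implemented internally.

```r
# Hypothetical helper (not part of jmotif): PAA with fractional boundaries.
paa_sketch <- function(ts, paa_num) {
  n <- length(ts)
  # rep(..., each = paa_num) has length n * paa_num; filling a matrix with
  # n rows yields paa_num columns, one per PAA segment.
  colMeans(matrix(rep(ts, each = paa_num), nrow = n))
}

x <- c(-1, -2, -1, 0, 2, 1, 1, 0)
x_paa3 <- paa_sketch(x, 3)
print(x_paa3)
```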

Discretize a time series with SAX using chunking (no sliding window).

Description

Discretize a time series with SAX using chunking (no sliding window).

Usage

sax_by_chunking(ts, paa_size, a_size, n_threshold)

Arguments

`ts`	the input time series.
`paa_size`	the PAA size.
`a_size`	the alphabet size.
`n_threshold`	the normalization threshold.

References

Lonardi, S., Lin, J., Keogh, E., Patel, P., Finding motifs in time series. In Proc. of the 2nd Workshop on Temporal Data Mining (pp. 53-68). (2002)

Generates a SAX MinDist distance matrix (i.e. the "lookup table") for a given alphabet size.

Description

Generates a SAX MinDist distance matrix (i.e. the "lookup table") for a given alphabet size.

Usage

sax_distance_matrix(a_size)

Arguments

a_size

the desired alphabet size (a value between 2 and 20, inclusive)

Value

Returns a distance matrix (for SAX minDist) for a specified alphabet size

References

Lonardi, S., Lin, J., Keogh, E., Patel, P., Finding motifs in time series. In Proc. of the 2nd Workshop on Temporal Data Mining (pp. 53-68). (2002)

Examples

sax_distance_matrix(5)

Discretizes a time series with SAX via sliding window.

Description

Discretizes a time series with SAX via sliding window.

Usage

sax_via_window(ts, w_size, paa_size, a_size, nr_strategy, n_threshold)

Arguments

`ts`	the input timeseries.
`w_size`	the sliding window size.
`paa_size`	the PAA size.
`a_size`	the alphabet size.
`nr_strategy`	the Numerosity Reduction strategy, acceptable values are "exact" and "mindist" – any other value triggers no numerosity reduction.
`n_threshold`	the normalization threshold.

References

Lonardi, S., Lin, J., Keogh, E., Patel, P., Finding motifs in time series. In Proc. of the 2nd Workshop on Temporal Data Mining (pp. 53-68). (2002)
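
The sliding-window procedure chains the steps documented elsewhere in this manual: each window is z-normalized, reduced with PAA, and mapped to letters, and with "exact" numerosity reduction a word is kept only when it differs from the word of the previous window. The base R sketch below illustrates that flow under stated assumptions; sax_window_sketch is a hypothetical helper, positions are reported 1-based, only the "exact" strategy is shown, and the structure returned by sax_via_window may differ.

```r
# Hypothetical helper (not part of jmotif): sliding-window SAX with
# "exact" numerosity reduction.
sax_window_sketch <- function(ts, w_size, paa_size, a_size, n_threshold = 0.01) {
  cuts <- qnorm(seq_len(a_size - 1) / a_size)     # equiprobable cut-lines
  n <- length(ts) - w_size + 1
  words <- character(n)
  for (i in seq_len(n)) {
    w <- ts[i:(i + w_size - 1)]
    s <- sd(w)
    if (s >= n_threshold) w <- (w - mean(w)) / s  # z-normalize non-flat windows
    p <- colMeans(matrix(rep(w, each = paa_size), nrow = w_size))  # PAA
    words[i] <- paste(letters[findInterval(p, cuts) + 1], collapse = "")
  }
  keep <- c(TRUE, words[-1] != words[-n])         # drop consecutive repeats
  data.frame(position = which(keep), word = words[keep],
             stringsAsFactors = FALSE)
}

# Every window of a straight ramp normalizes to the same shape
res <- sax_window_sketch(1:10, 4, 2, 2)
print(res)
```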

Transforms a time series into the char array using SAX and the normal alphabet.

Description

Transforms a time series into the char array using SAX and the normal alphabet.

Usage

series_to_chars(ts, a_size)

Arguments

`ts`	the timeseries.
`a_size`	the alphabet size.

References

Lonardi, S., Lin, J., Keogh, E., Patel, P., Finding motifs in time series. In Proc. of the 2nd Workshop on Temporal Data Mining (pp. 53-68). (2002)

Examples

y = c(-1, -2, -1, 0, 2, 1, 1, 0)
y_paa3 = paa(y, 3)
series_to_chars(y_paa3, 3)
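
The letter assignment amounts to locating each value among the Normal cut-lines: values below the first cut-line map to 'a', values between the first and second to 'b', and so on. A base R sketch of that lookup follows. The helper name is hypothetical, the three input values are example PAA coefficients chosen for illustration, and the treatment of values that fall exactly on a cut-line is an assumption.

```r
# Hypothetical helper (not part of jmotif): map values to SAX letters.
chars_sketch <- function(values, a_size) {
  cuts <- qnorm(seq_len(a_size - 1) / a_size)
  # findInterval() counts how many cut-lines lie at or below each value
  letters[findInterval(values, cuts) + 1]
}

y_paa3 <- c(-1.375, 0.75, 0.625)
chars <- chars_sketch(y_paa3, 3)
word <- paste(chars, collapse = "")   # single-string form of the same result
```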

Transforms a time series into the string.

Description

Transforms a time series into the string.

Usage

series_to_string(ts, a_size)

Arguments

`ts`	the timeseries.
`a_size`	the alphabet size.

References

Lonardi, S., Lin, J., Keogh, E., Patel, P., Finding motifs in time series. In Proc. of the 2nd Workshop on Temporal Data Mining (pp. 53-68). (2002)

Examples

y = c(-1, -2, -1, 0, 2, 1, 1, 0)
y_paa3 = paa(y, 3)
series_to_string(y_paa3, 3)

Converts a single time series into a bag of words.

Description

Converts a single time series into a bag of words.

Usage

series_to_wordbag(ts, w_size, paa_size, a_size, nr_strategy, n_threshold)

Arguments

`ts`	the timeseries.
`w_size`	the sliding window size.
`paa_size`	the PAA size.
`a_size`	the alphabet size.
`nr_strategy`	the NR strategy.
`n_threshold`	the normalization threshold.

References

Salton, G., Wong, A., Yang, C., A vector space model for automatic indexing. Commun. ACM 18, 11, 613-620, 1975.

Runs the RePair algorithm on a string.

Description

Runs the RePair algorithm on a string.

Usage

str_to_repair_grammar(str)

Arguments

str

the input string.

References

N.J. Larsson and A. Moffat. Offline dictionary-based compression. In Data Compression Conference, 1999.

Examples

str_to_repair_grammar("abc abc cba cba bac xxx abc abc cba cba bac")

Extracts a subseries.

Description

Extracts a subseries.

Usage

subseries(ts, start, end)

Arguments

`ts`	the input timeseries (0-based, left inclusive).
`start`	the interval start.
`end`	the interval end.

Examples

y = c(-1, -2, -1, 0, 2, 1, 1, 0)
subseries(y, 0, 3)
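
Because R vectors are 1-based, the 0-based, left-inclusive interval used here can be surprising. Assuming the end index is exclusive, which the "left inclusive" note suggests but does not state outright, the call above corresponds to the following base R indexing; subseries_sketch is a hypothetical helper written only to show the index arithmetic.

```r
# Hypothetical helper (not part of jmotif): 0-based [start, end) slice.
subseries_sketch <- function(ts, start, end) {
  stopifnot(start >= 0, end > start, end <= length(ts))
  ts[(start + 1):end]
}

y <- c(-1, -2, -1, 0, 2, 1, 1, 0)
s <- subseries_sketch(y, 0, 3)   # first three points under this assumption
```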

Z-normalizes a time series by subtracting its mean and dividing by the standard deviation.

Description

Z-normalizes a time series by subtracting its mean and dividing by the standard deviation.

Usage

znorm(ts, threshold = 0.01)

Arguments

`ts`	the input time series.
`threshold`	the z-normalization threshold value; if the standard deviation of the input time series is below this value, normalization is not applied, so that under-threshold noise is not amplified.

References

Dina Goldin and Paris Kanellakis, On similarity queries for time-series data: Constraint specification and implementation. In Principles and Practice of Constraint Programming (CP 1995), pages 137-153. (1995)

Examples

x = seq(0, pi*4, 0.02)
y = sin(x) * 5 + rnorm(length(x))
plot(x, y, type="l", col="blue")
lines(x, znorm(y, 0.01), type="l", col="red")
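
The role of the threshold is easiest to see in code. The base R sketch below returns the input unchanged when its standard deviation is under the threshold and standardizes it otherwise. The helper name is hypothetical, and the use of R's sample standard deviation (sd) is an assumption; the package may compute the deviation differently.

```r
# Hypothetical helper (not part of jmotif): thresholded z-normalization.
znorm_sketch <- function(ts, threshold = 0.01) {
  s <- sd(ts)
  if (is.na(s) || s < threshold) return(ts)   # near-flat input: leave as is
  (ts - mean(ts)) / s
}

z <- znorm_sketch(c(1, 2, 3, 4, 5))               # mean 0, sd 1 afterwards
flat <- znorm_sketch(c(1, 1.001, 1, 1.001))       # sd below 0.01, unchanged
```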
