Package 'OutliersLearn' reference manual

Title:	Educational Outlier Package with Common Outlier Detection Algorithms
Description:	Provides implementations of some of the most important outlier detection algorithms. Includes a tutorial mode option that shows a description of each algorithm and provides a step-by-step execution explanation of how it identifies outliers from the given data with the specified input parameters. References include the works of Azzedine Boukerche, Lining Zheng, and Omar Alfandi (2020) <doi:10.1145/3381028>, Abir Smiti (2020) <doi:10.1016/j.cosrev.2020.100306>, and Xiaogang Su, Chih-Ling Tsai (2011) <doi:10.1002/widm.19>.
Authors:	Andres Missiego Manjon [aut, cre], Juan Jose Cuadrado Gallego [aut]
Maintainer:	Andres Missiego Manjon <[email protected]>
License:	MIT + file LICENSE
Version:	1.0.0
Built:	2025-02-18 05:56:54 UTC
Source:	https://github.com/missiegobeats/outlierslearn

Box And Whiskers

Description

This function implements the box & whiskers algorithm to detect outliers

Usage

boxandwhiskers(data, d, tutorialMode)
boxandwhiskers(data, d, tutorialMode)

Arguments

`data`	Input data.
`d`	Degree of outlier or distance at which an event is considered an outlier
`tutorialMode`	if TRUE the tutorial mode is activated (the algorithm will include an explanation detailing the theory behind the outlier detection algorithm and a step by step explanation of how is the data processed to obtain the outliers following the theory mentioned earlier)

Value

None, does not return any value

Author(s)

Andres Missiego Manjon

Examples

inputData = t(matrix(c(3,2,3.5,12,4.7,4.1,5.2,
4.9,7.1,6.1,6.2,5.2,14,5.3),2,7,dimnames=list(c("r","d"))))
inputData = data.frame(inputData)
boxandwhiskers(inputData,2,FALSE) # Can be set to TRUE

inputData = t(matrix(c(3,2,3.5,12,4.7,4.1,5.2,
4.9,7.1,6.1,6.2,5.2,14,5.3),2,7,dimnames=list(c("r","d"))))
inputData = data.frame(inputData)
boxandwhiskers(inputData,2,FALSE) # Can be set to TRUE

DBSCAN_method

Description

Outlier detection method using DBSCAN

Usage

DBSCAN_method(inputData, max_distance_threshold, min_pts, tutorialMode)
DBSCAN_method(inputData, max_distance_threshold, min_pts, tutorialMode)

Arguments

`inputData`	Input Data (must be a data.frame)
`max_distance_threshold`	This is used to calculate the distance between all the points and check if the euclidean distance is less than the max_distance_threshold parameter to decide if add it to the neighbors or not
`min_pts`	the minimum number of points to form a dense region
`tutorialMode`	if TRUE the tutorial mode is activated (the algorithm will include an explanation detailing the theory behind the outlier detection algorithm and a step by step explanation of how is the data processed to obtain the outliers following the theory mentioned earlier)

Value

None, does not return any value

Author(s)

Andres Missiego Manjon

Examples

inputData = t(matrix(c(3,2,3.5,12,4.7,4.1,5.2,
4.9,7.1,6.1,6.2,5.2,14,5.3),2,7,dimnames=list(c("r","d"))));
inputData = data.frame(inputData);
eps = 4;
min_pts = 3;
DBSCAN_method(inputData, eps, min_pts, FALSE); #Can be set to TRUE

inputData = t(matrix(c(3,2,3.5,12,4.7,4.1,5.2,
4.9,7.1,6.1,6.2,5.2,14,5.3),2,7,dimnames=list(c("r","d"))));
inputData = data.frame(inputData);
eps = 4;
min_pts = 3;
DBSCAN_method(inputData, eps, min_pts, FALSE); #Can be set to TRUE

euclidean_distance

Description

This function calculates the euclidean distance between 2 points. They must have the same number of dimensions

Usage

euclidean_distance(p1, p2)
euclidean_distance(p1, p2)

Arguments

`p1`	One of the points that will be used by the algorithm with N dimensions
`p2`	The other point that will be used by the algorithm with N dimensions

Value

Euclidean Distance calculated between the two N-dimensional points

Author(s)

Andres Missiego Manjon

Examples

inputData = t(matrix(c(3,2,3.5,12,4.7,4.1,5.2,
4.9,7.1,6.1,6.2,5.2,14,5.3),2,7,dimnames=list(c("r","d"))));
inputData = data.frame(inputData);
point1 = inputData[1,];
point2 = inputData[4,];
distance = euclidean_distance(point1, point2);

inputData = t(matrix(c(3,2,3.5,12,4.7,4.1,5.2,
4.9,7.1,6.1,6.2,5.2,14,5.3),2,7,dimnames=list(c("r","d"))));
inputData = data.frame(inputData);
point1 = inputData[1,];
point2 = inputData[4,];
distance = euclidean_distance(point1, point2);

knn

Description

This function implements the knn algorithm for outlier detection

Usage

knn(data, d, K, tutorialMode)
knn(data, d, K, tutorialMode)

Arguments

`data`	Input Data (must be a data.frame)
`d`	Degree of outlier or distance at which an event is considered an outlier
`K`	Nearest neighbor for which an event must have a degree of outlier to be considered an outlier
`tutorialMode`	if TRUE the tutorial mode is activated (the algorithm will include an explanation detailing the theory behind the outlier detection algorithm and a step by step explanation of how is the data processed to obtain the outliers following the theory mentioned earlier)

Value

None, does not return any value

Author(s)

Andres Missiego Manjon

Examples

inputData = t(matrix(c(3,2,3.5,12,4.7,4.1,5.2,
4.9,7.1,6.1,6.2,5.2,14,5.3),2,7,dimnames=list(c("r","d"))))
inputData = data.frame(inputData)
knn(inputData,3,2,FALSE) #Can be changed to TRUE

inputData = t(matrix(c(3,2,3.5,12,4.7,4.1,5.2,
4.9,7.1,6.1,6.2,5.2,14,5.3),2,7,dimnames=list(c("r","d"))))
inputData = data.frame(inputData)
knn(inputData,3,2,FALSE) #Can be changed to TRUE

lof

Description

Local Outlier Factor algorithm to detect outliers

Usage

lof(inputData, K, threshold, tutorialMode)
lof(inputData, K, threshold, tutorialMode)

Arguments

`inputData`	Input Data (must be a data.frame)
`K`	This number represents the nearest neighbor to use to calculate the density of each point. This value is chosen arbitrarily and is responsibility of the data scientist/user to select a number adequate to the dataset.
`threshold`	Value that is used to classify the points comparing it to the calculated ARDs of the points in the dataset. If the ARD is smaller, the point is classified as an outliers. If not, the point is classified as a normal point (inlier)
`tutorialMode`	if TRUE the tutorial mode is activated (the algorithm will include an explanation detailing the theory behind the outlier detection algorithm and a step by step explanation of how is the data processed to obtain the outliers following the theory mentioned earlier)

Value

None, does not return any value

Author(s)

Andres Missiego Manjon

Examples

inputData = t(matrix(c(3,2,3.5,12,4.7,4.1,5.2,
4.9,7.1,6.1,6.2,5.2,14,5.3),2,7,dimnames=list(c("r","d"))));
inputData = data.frame(inputData);
lof(inputData,3,0.5,FALSE) #Can be changed to TRUE

inputData = t(matrix(c(3,2,3.5,12,4.7,4.1,5.2,
4.9,7.1,6.1,6.2,5.2,14,5.3),2,7,dimnames=list(c("r","d"))));
inputData = data.frame(inputData);
lof(inputData,3,0.5,FALSE) #Can be changed to TRUE

mahalanobis_distance

Description

Calculates the mahalanobis_distance given the input data

Usage

mahalanobis_distance(value, sample_mean, sample_covariance_matrix)
mahalanobis_distance(value, sample_mean, sample_covariance_matrix)

Arguments

`value`	Point to calculate the mahalanobis_distance
`sample_mean`	Sample mean
`sample_covariance_matrix`	Sample Covariance Matrix

Value

Mahalanobis distance associated to the point

Author(s)

Andres Missiego Manjon

Examples

inputData = t(matrix(c(3,2,3.5,12,4.7,4.1,5.2,
4.9,7.1,6.1,6.2,5.2,14,5.3),2,7,dimnames=list(c("r","d"))));
inputData = data.frame(inputData);
inputData = as.matrix(inputData);
sampleMeans = c();
for(i in 1:ncol(inputData)){
  column = inputData[,i];
  calculatedMean = sum(column)/length(column);
  print(sprintf("Calculated mean for column %d: %f", i, calculatedMean))
  sampleMeans = c(sampleMeans, calculatedMean);
}
covariance_matrix = cov(inputData);
distance = mahalanobis_distance(inputData[3,], sampleMeans, covariance_matrix);

inputData = t(matrix(c(3,2,3.5,12,4.7,4.1,5.2,
4.9,7.1,6.1,6.2,5.2,14,5.3),2,7,dimnames=list(c("r","d"))));
inputData = data.frame(inputData);
inputData = as.matrix(inputData);
sampleMeans = c();
for(i in 1:ncol(inputData)){
  column = inputData[,i];
  calculatedMean = sum(column)/length(column);
  print(sprintf("Calculated mean for column %d: %f", i, calculatedMean))
  sampleMeans = c(sampleMeans, calculatedMean);
}
covariance_matrix = cov(inputData);
distance = mahalanobis_distance(inputData[3,], sampleMeans, covariance_matrix);

mahalanobis_method

Description

Detect outliers using the Mahalanobis Distance method

Usage

mahalanobis_method(inputData, alpha, tutorialMode)
mahalanobis_method(inputData, alpha, tutorialMode)

Arguments

`inputData`	Input Data dataset that will be processed (with or not the step by step explanation) to obtain the underlying outliers. It must be a data.frame type.
`alpha`	Significance level alpha. This value indicates the proportion that it is expected to be outliers out of the dataset. It has to be in the range from 0 to 1
`tutorialMode`	if TRUE the tutorial mode is activated (the algorithm will include an explanation detailing the theory behind the outlier detection algorithm and a step by step explanation of how is the data processed to obtain the outliers following the theory mentioned earlier)

Value

None, does not return any value

Author(s)

Andres Missiego Manjon

Examples

inputData = t(matrix(c(3,2,3.5,12,4.7,4.1,5.2,
4.9,7.1,6.1,6.2,5.2,14,5.3),2,7,dimnames=list(c("r","d"))));
inputData = data.frame(inputData);
mahalanobis_method(inputData, 0.7, FALSE); #Can be set to TRUE

inputData = t(matrix(c(3,2,3.5,12,4.7,4.1,5.2,
4.9,7.1,6.1,6.2,5.2,14,5.3),2,7,dimnames=list(c("r","d"))));
inputData = data.frame(inputData);
mahalanobis_method(inputData, 0.7, FALSE); #Can be set to TRUE

manhattan_dist

Description

Calculates the manhattan distance between two 2D points

Usage

manhattan_dist(A, B)
manhattan_dist(A, B)

Arguments

`A`	One of the 2D points
`B`	The other 2D point

Value

Manhattan distance calculated between point A and B

Author(s)

Andres Missiego Manjon

Examples

distance = manhattan_dist(c(1,2), c(3,4));

distance = manhattan_dist(c(1,2), c(3,4));

mean_outliersLearn

Description

Calculates the mean of the given data vector

Usage

mean_outliersLearn(data)
mean_outliersLearn(data)

Arguments

data

Input Data that will be processed to calculate the mean. It must be a vector

Value

Mean of the input data

Author(s)

Andres Missiego Manjon

Examples

mean = mean_outliersLearn(c(2,3,2.3,7.8));

mean = mean_outliersLearn(c(2,3,2.3,7.8));

quantile_outliersLearn

Description

Function that obtains the 'v' quantile

Usage

quantile_outliersLearn(data, v)
quantile_outliersLearn(data, v)

Arguments

`data`	Input Data
`v`	Goes from 0 to 1 (e.g. 0.25). Indicates the quantile that wants to be obtained

Value

Quantile v calculated

Author(s)

Andres Missiego Manjon

Examples

q = quantile_outliersLearn(c(12,2,3,4,1,13), 0.60)

q = quantile_outliersLearn(c(12,2,3,4,1,13), 0.60)

sd_outliersLearn

Description

Calculates the standard deviation of the input data given the mean.

Usage

sd_outliersLearn(data, mean)
sd_outliersLearn(data, mean)

Arguments

`data`	Input Data that will be used to calculate the standard deviation. Must be a vector
`mean`	Mean of the input data vector of the function.

Value

Standard Deviation of the input data

Author(s)

Andres Missiego Manjon

Examples

inputData = c(1,2,3,4,5,6,1);
mean = sum(inputData)/length(inputData);
sd = sd_outliersLearn(inputData, mean);

inputData = c(1,2,3,4,5,6,1);
mean = sum(inputData)/length(inputData);
sd = sd_outliersLearn(inputData, mean);

transform_to_vector

Description

Transform any type of data to a vector

Usage

transform_to_vector(data)
transform_to_vector(data)

Arguments

data

Input data that will be transformed into a vector

Value

Data formatted as a vector

Author(s)

Andres Missiego Manjon

Examples

numeric_data = c(1, 2, 3)
character_data = c("a", "b", "c")
logical_data = c(TRUE, FALSE, TRUE)
factor_data = factor(c("A", "B", "A"))
integer_data = as.integer(c(1, 2, 3))
complex_data = complex(real = c(1, 2, 3), imaginary = c(4, 5, 6))
list_data = list(1, "apple", TRUE)
data_frame_data = data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))

transformed_numeric = transform_to_vector(numeric_data)
transformed_character = transform_to_vector(character_data)
transformed_logical = transform_to_vector(logical_data)
transformed_factor = transform_to_vector(factor_data)
transformed_integer = transform_to_vector(integer_data)
transformed_complex = transform_to_vector(complex_data)
transformed_list = transform_to_vector(list_data)
transformed_data_frame = transform_to_vector(data_frame_data)

numeric_data = c(1, 2, 3)
character_data = c("a", "b", "c")
logical_data = c(TRUE, FALSE, TRUE)
factor_data = factor(c("A", "B", "A"))
integer_data = as.integer(c(1, 2, 3))
complex_data = complex(real = c(1, 2, 3), imaginary = c(4, 5, 6))
list_data = list(1, "apple", TRUE)
data_frame_data = data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))

transformed_numeric = transform_to_vector(numeric_data)
transformed_character = transform_to_vector(character_data)
transformed_logical = transform_to_vector(logical_data)
transformed_factor = transform_to_vector(factor_data)
transformed_integer = transform_to_vector(integer_data)
transformed_complex = transform_to_vector(complex_data)
transformed_list = transform_to_vector(list_data)
transformed_data_frame = transform_to_vector(data_frame_data)

z_score_method

Description

This function implements the outlier detection algorithm using standard deviation and mean

Usage

z_score_method(data, d, tutorialMode)
z_score_method(data, d, tutorialMode)

Arguments

`data`	Input Data that will be processed with or without the tutorial mode activated
`d`	Degree of outlier or distance at which an event is considered an outlier
`tutorialMode`	if TRUE the tutorial mode is activated (the algorithm will include an explanation detailing the theory behind the outlier detection algorithm and a step by step explanation of how is the data processed to obtain the outliers following the theory mentioned earlier)

Value

None, does not return any value

Author(s)

Andres Missiego Manjon

Examples

inputData = t(matrix(c(3,2,3.5,12,4.7,4.1,5.2,
4.9,7.1,6.1,6.2,5.2,14,5.3),2,7,dimnames=list(c("r","d"))))
inputData = data.frame(inputData)
z_score_method(inputData,2,FALSE) #Can be changed to TRUE

inputData = t(matrix(c(3,2,3.5,12,4.7,4.1,5.2,
4.9,7.1,6.1,6.2,5.2,14,5.3),2,7,dimnames=list(c("r","d"))))
inputData = data.frame(inputData)
z_score_method(inputData,2,FALSE) #Can be changed to TRUE

Package 'OutliersLearn'

Help Index

Box And Whiskers

Description

Usage

Arguments

Value

Author(s)

Examples

DBSCAN_method

Description

Usage

Arguments

Value

Author(s)

Examples

euclidean_distance

Description

Usage

Arguments

Value

Author(s)

Examples

knn

Description

Usage

Arguments

Value

Author(s)

Examples

lof

Description

Usage

Arguments

Value

Author(s)

Examples

mahalanobis_distance

Description

Usage

Arguments

Value

Author(s)

Examples

mahalanobis_method

Description

Usage

Arguments

Value

Author(s)

Examples

manhattan_dist

Description

Usage

Arguments

Value

Author(s)

Examples

mean_outliersLearn

Description

Usage

Arguments

Value

Author(s)

Examples

quantile_outliersLearn

Description

Usage

Arguments

Value

Author(s)

Examples

sd_outliersLearn

Description

Usage

Arguments

Value

Author(s)

Examples

transform_to_vector