OmniSketch  0.1
Oh my sketch!
OmniSketch::Data::GndTruth< key_len, T > Class Template Reference

Ground truth of the streaming data. More...

#include <data.h>

Public Member Functions

bool empty () const
 Return whether the instance is empty. More...
 
min () const
 Return the minimum value. More...
 
max () const
 Return the maximum value. More...
 
int64_t totalValue () const
 Return the sum of values of all flowkeys. More...
 
void swap (GndTruth &other)
 Swap content, note that calling histories are swapped as well. More...
 
RightConstIterator begin () const
 Return a random access iterator pointed to the first element. More...
 
RightConstIterator end () const
 Return a random access const iterator pointed to the very end. More...
 
size_t size () const
 return the number of flows More...
 
size_t count (const FlowKey< key_len > &flowkey) const
 return whether a flowkey is in the streaming data or not More...
 
at (const FlowKey< key_len > &flowkey) const
 Get the value of a certain key. More...
 
std::pair< RightConstIterator, RightConstIteratorequalRange (T value)
 Return all flowkeys that share a single value. More...
 
void getGroundTruth (typename std::vector< Record< key_len >>::const_iterator begin, typename std::vector< Record< key_len >>::const_iterator end, CntMethod cnt_method)
 Get the ground truth of the given stream. More...
 
void getHeavyHitter (const GndTruth &flow_summary, double threshold, HXMethod hh_method)
 Get heavy hitters of the given stream (from flow summary) More...
 
void getHeavyHitter (GndTruth &&flow_summary, double threshold, HXMethod hh_method)
 Get heavy hitters of the given stream (from flow summary) More...
 
void getHeavyHitter (typename std::vector< Record< key_len >>::const_iterator begin, typename std::vector< Record< key_len >>::const_iterator end, CntMethod cnt_method, double threshold, HXMethod hh_method)
 Get heavy hitters from streaming data. More...
 
void getHeavyChanger (const GndTruth &flow_summary_1, const GndTruth &flow_summary_2, double threshold, HXMethod hc_method)
 Get heavy changers of the given stream (from flow summary) More...
 
void getHeavyChanger (GndTruth &&flow_summary_1, GndTruth &&flow_summary_2, double threshold, HXMethod hc_method)
 Get heavy changers of the given stream (from flow summary) More...
 
void getHeavyChanger (typename std::vector< Record< key_len >>::const_iterator begin_1, typename std::vector< Record< key_len >>::const_iterator end_1, typename std::vector< Record< key_len >>::const_iterator begin_2, typename std::vector< Record< key_len >>::const_iterator end_2, CntMethod cnt_method, double threshold, HXMethod hc_method)
 Get heavy hitters from streaming data. More...
 

Protected Types

using BidirMap = boost::bimaps::bimap< boost::bimaps::unordered_set_of< FlowKey< key_len >, std::hash< FlowKey< key_len > >, std::equal_to< FlowKey< key_len > >>, boost::bimaps::vector_of< T > >
 
using RightVal = typename BidirMap::right_value_type
 
using LeftVal = typename BidirMap::left_value_type
 
using RightConstIterator = typename BidirMap::right_const_iterator
 

Protected Attributes

BidirMap my_map
 The internal bidirectional map. More...
 
int64_t tot_value = 0
 Sum of all counters. More...
 
int64_t called = 0
 How many times that getXXX are called on this instance. More...
 

Detailed Description

template<int32_t key_len, typename T = int64_t>
class OmniSketch::Data::GndTruth< key_len, T >

Ground truth of the streaming data.

This class encapsulates boost::bimap, a time-efficient implementation of bidirectional map. See also this page for more information. It is left signature-compatible with std::unordered_map and right signature-compatible with std::vector. Besides, values are in descending order. Sensible users should not bother with underlying data structure, since it is the black box! Moreover, the class supports range-expression. It means that you may iterate all the flowkeys in the following manner:

using namespace OmniSketch::Data;
for(const auto &kv: gnd_truth){
kv.get_left(); // FlowKey<13>, equivalent to kv.second
kv.get_right(); // int32_t, equivalent to kv.first
}
Template Parameters
Ttype of counter
key_lenlength of flowkey
Note
  • T is suggested to be int32_t when the counting method is chosen to be InPacket and be int64_t on InLength, so as to avoid overflow.
  • Rationale: It seems to be odd that so many methods, such as finding heavy hitters, heavy changers & super spreaders, are designed to be the class methods of GndTruth, in lieu of those of StreamData. Well, the design is quite intuitive once you keep in mind that GndTruth should be provided with streaming data to find the truth. Besides, GndTruth can be extracted from another GndTruth. See example below.

Example

using namespace OmniSketch::Data;
// read data
StreamData<8> data("record.bin");
if(!data.succeed()) exit(-1);
GndTruth<8, int32_t> flow_summary, heavy_hitter;
// 1. flow summary
flow_summary.getGndTruth(data.begin(), data.end(), InLength);
// 2. heavy hitter, extract from flow summary
heavy_hitter.getHeavyHitter(flow_summary, 0.1, TopK);
// 2. is equivalent to:
//
// heavy_hitter.getHeavyHitter(data.begin(), data.end(), 0.1, TopK);
// // whose time complexity is about 1.+2.
Warning
  • On each instance of GndTruth, you should call getXXX(...) method at most once. Otherwise, you should define a new instance and call the method upon the new one. See example below. The only exception happens when you swap two GndTruth instances. In that case, their calling histories are swapped as well. See swap().
  • Any instance that is about to call getHeavyChanger() has to declare T as an signed type, no matter the size, since there could be negative values half way in arithemetic operations. In any other cases, declaring T as unsigned is fine after all.

A Second Example

using namespace OmniSketch::Data;
// read data
StreamData<8> data("record.bin");
if(!data.succeed()) exit(-1);
GndTruth flow_summary, heavy_hitter;
// 1. get ground truth
flow_summary.getGndTruth(data.begin(), data.end(), InLength);
// Buggy code!!! Since getXXX(...) is called upon flow_summary a second time.
// 2. extract heavy hitter
flow_summary.getHeavyHitter(flow_summary, 0.1, Data::TopK);

Member Typedef Documentation

◆ BidirMap

template<int32_t key_len, typename T = int64_t>
using OmniSketch::Data::GndTruth< key_len, T >::BidirMap = boost::bimaps::bimap<boost::bimaps::unordered_set_of< FlowKey<key_len>, std::hash<FlowKey<key_len> >, std::equal_to<FlowKey<key_len> >>, boost::bimaps::vector_of<T> >
protected

◆ LeftVal

template<int32_t key_len, typename T = int64_t>
using OmniSketch::Data::GndTruth< key_len, T >::LeftVal = typename BidirMap::left_value_type
protected

◆ RightConstIterator

template<int32_t key_len, typename T = int64_t>
using OmniSketch::Data::GndTruth< key_len, T >::RightConstIterator = typename BidirMap::right_const_iterator
protected

◆ RightVal

template<int32_t key_len, typename T = int64_t>
using OmniSketch::Data::GndTruth< key_len, T >::RightVal = typename BidirMap::right_value_type
protected

Member Function Documentation

◆ at()

template<int32_t key_len, typename T >
T OmniSketch::Data::GndTruth< key_len, T >::at ( const FlowKey< key_len > &  flowkey) const

Get the value of a certain key.

If the key does not exist, an out-of-range exception would be thrown.

◆ begin()

template<int32_t key_len, typename T = int64_t>
RightConstIterator OmniSketch::Data::GndTruth< key_len, T >::begin ( ) const
inline

Return a random access iterator pointed to the first element.

The next two examples show how to traverse the ground truth and fetch length for flowkey:

Examples

// GndTruth<4> gnd_truth;
for(auto ptr = gnd_truth.begin(); ptr != gnd_truth.end(); ptr++){
std::cout << ptr->get_left().getIp() << " " << ptr->get_right() <<
std::endl;
// Equivalently,
std::cout << ptr->second.getIp() << " " << ptr->first << std::endl;
}
// ptr->second (same as ptr->get_left()): const FlowKey &
// ptr->first (same as ptr->get_right()): const T &
// GndTruth<4> gnd_truth;
for(const auto &kv : gnd_truth){
std::cout << kv.get_left().getIp() << " " << kv.get_right() <<
std::endl;
// Equivalently,
std::cout << kv.second.getIp() << " " << kv.first << std::endl;
}
// kv.second (same as kv.get_left()): const FlowKey &
// kv.first (same as kv.get_right()): const T &

◆ count()

template<int32_t key_len, typename T = int64_t>
size_t OmniSketch::Data::GndTruth< key_len, T >::count ( const FlowKey< key_len > &  flowkey) const
inline

return whether a flowkey is in the streaming data or not

Always 0 or 1 in this case.

◆ empty()

template<int32_t key_len, typename T = int64_t>
bool OmniSketch::Data::GndTruth< key_len, T >::empty ( ) const
inline

Return whether the instance is empty.

◆ end()

template<int32_t key_len, typename T = int64_t>
RightConstIterator OmniSketch::Data::GndTruth< key_len, T >::end ( ) const
inline

Return a random access const iterator pointed to the very end.

See also
begin()

◆ equalRange()

template<int32_t key_len, typename T >
std::pair< typename GndTruth< key_len, T >::RightConstIterator, typename GndTruth< key_len, T >::RightConstIterator > OmniSketch::Data::GndTruth< key_len, T >::equalRange ( value)

Return all flowkeys that share a single value.

Time complexity is logarithm in the number of flowkeys

Parameters
valuethe value to query
Returns
A pair of random access iterator pointed to the range [start, end). Below is a code snippet demonstrating how to iterate over the range and extract flowkeys one by one.

Example

using namespace OmniSketch::Data;
// GndTruth<4> gndtruth;
const auto pair = gndtruth.equalRange(20);
for(auto ptr = pair.first; ptr != pair.second; ptr++) {
std::cout << ptr->get_left().getIp() << " " << ptr->get_right() <<
std::endl;
// Equivalently,
std::cout << ptr->second.getIp() << " " << ptr->first << std::endl;
}
// ptr->second (same as ptr->get_left()): const flowkey &
// ptr->first (same as ptr->get_right()): const T &

◆ getGroundTruth()

template<int32_t key_len, typename T >
void OmniSketch::Data::GndTruth< key_len, T >::getGroundTruth ( typename std::vector< Record< key_len >>::const_iterator  begin,
typename std::vector< Record< key_len >>::const_iterator  end,
CntMethod  cnt_method 
)

Get the ground truth of the given stream.

Attention
On success, the left view is a hash table that maps flowkey to its value; the right view a sorted vector that maps value to flowkey in descending order.
Parameters
beginthe beginning iterator (recommended to be return value of StreamData<key_len>::begin() or of StreamData<key_len>::diff())
endthe ending iterator (recommended to be return value of StreamData<key_len>::diff() or of StreamData<key_len>::end())
cnt_methodcounting method
Note
  • The function will log flows whose length<=0 || length > 1500 if cnt_method is InLength. In this scenario, it is likely that either one of the data format or the raw data is corrupted.
  • Rationale: When cnt_method equals InPacket, even if there is length info in the data, the method will not do the range check.
  • Also, the function will complain for counter overflow or calling twice. See warning in the comment of this class for more info.

◆ getHeavyChanger() [1/3]

template<int32_t key_len, typename T >
void OmniSketch::Data::GndTruth< key_len, T >::getHeavyChanger ( const GndTruth< key_len, T > &  flow_summary_1,
const GndTruth< key_len, T > &  flow_summary_2,
double  threshold,
HXMethod  hc_method 
)

Get heavy changers of the given stream (from flow summary)

Heavy changers are with respect to the two flow summaries. If hc_method equals TopK, the heaviest threshold flows are to be counted as heavy changers. Otherwise, all the flows whose deviation contributes to the whole a fraction strictly greater than threshold are counted.

Parameters
flow_summary_1the first flow summary
flow_summary_2the second flow summary
thresholdthreshold value
hc_methoddefinition of heavy changers
Note
threshold should be in [1, infty) when hh_method is TopK, and should be in [0, 1] when hh_method is Percentile. Otherwise, an exception would be thrown.

◆ getHeavyChanger() [2/3]

template<int32_t key_len, typename T >
void OmniSketch::Data::GndTruth< key_len, T >::getHeavyChanger ( GndTruth< key_len, T > &&  flow_summary_1,
GndTruth< key_len, T > &&  flow_summary_2,
double  threshold,
HXMethod  hc_method 
)

Get heavy changers of the given stream (from flow summary)

Parameters
flow_summary_1the first flow summary
flow_summary_2the second flow summary
thresholdthreshold value
hc_methoddefinition of heavy changers
Note
This function is more time-efficient compared with its const reference version (cf. getHeavyChanger(const GndTruth &, const GndTruth &, double, HXMethod)) since it does not copy flowkeys from the first flow summary. Rather, it moves them. Hence if your flow summaries are no longer in use, move it!

◆ getHeavyChanger() [3/3]

template<int32_t key_len, typename T >
void OmniSketch::Data::GndTruth< key_len, T >::getHeavyChanger ( typename std::vector< Record< key_len >>::const_iterator  begin_1,
typename std::vector< Record< key_len >>::const_iterator  end_1,
typename std::vector< Record< key_len >>::const_iterator  begin_2,
typename std::vector< Record< key_len >>::const_iterator  end_2,
CntMethod  cnt_method,
double  threshold,
HXMethod  hc_method 
)

Get heavy hitters from streaming data.

This function is provided for the user's convenience.

Note
What differs from getGroundTruth() is that this function does not check counter overflow in the very detail. But it does check for spurious packet length if cnt_method equals InLength.

◆ getHeavyHitter() [1/3]

template<int32_t key_len, typename T >
void OmniSketch::Data::GndTruth< key_len, T >::getHeavyHitter ( const GndTruth< key_len, T > &  flow_summary,
double  threshold,
HXMethod  hh_method 
)

Get heavy hitters of the given stream (from flow summary)

If hh_method equals TopK, the heaviest threshold flows are to be counted as heavy hitters. Otherwise, all the flows who contribute a fraction strictly greater than threshold are to be counted.

Parameters
flow_summarysummary of flow, typically the one returned by getGroundTruth()
thresholdthreshold value
hh_methoddefinition of heavy hitter
Note
threshold should be in [1, infty) when hh_method equals TopK, and should be in [0, 1] when hh_method equals Percentile. Otherwise, an exception would be thrown.

◆ getHeavyHitter() [2/3]

template<int32_t key_len, typename T >
void OmniSketch::Data::GndTruth< key_len, T >::getHeavyHitter ( GndTruth< key_len, T > &&  flow_summary,
double  threshold,
HXMethod  hh_method 
)

Get heavy hitters of the given stream (from flow summary)

See also
getHeavyHitter(const GndTruth &, double, HXMethod);
Note
This function is more time-efficient compared with its const reference version (cf. getHeavyHitter(const GndTruth &, double, HXMethod)) since it does not copy any flowkeys from flow summary. Rather, it moves them. Hence if your flow_summary is no longer in use, move it!

◆ getHeavyHitter() [3/3]

template<int32_t key_len, typename T >
void OmniSketch::Data::GndTruth< key_len, T >::getHeavyHitter ( typename std::vector< Record< key_len >>::const_iterator  begin,
typename std::vector< Record< key_len >>::const_iterator  end,
CntMethod  cnt_method,
double  threshold,
HXMethod  hh_method 
)

Get heavy hitters from streaming data.

A combination of getGroundTruth() and getHeavyHitter(GndTruth<key_len, T> &&, double, HXMethod). Provided for the user's convenience.

◆ max()

template<int32_t key_len, typename T = int64_t>
T OmniSketch::Data::GndTruth< key_len, T >::max ( ) const
inline

Return the maximum value.

Calling this function on an empty instance causes undefined behavior.

◆ min()

template<int32_t key_len, typename T = int64_t>
T OmniSketch::Data::GndTruth< key_len, T >::min ( ) const
inline

Return the minimum value.

Calling this function on an empty instance causes undefined behavior.

◆ size()

template<int32_t key_len, typename T = int64_t>
size_t OmniSketch::Data::GndTruth< key_len, T >::size ( ) const
inline

return the number of flows

◆ swap()

template<int32_t key_len, typename T >
void OmniSketch::Data::GndTruth< key_len, T >::swap ( GndTruth< key_len, T > &  other)

Swap content, note that calling histories are swapped as well.

Parameters
otherthe other GndTruth instance to be swapped with
Note
It is fine if a GndTruth instance swap with itself. No effect at all.

◆ totalValue()

template<int32_t key_len, typename T = int64_t>
int64_t OmniSketch::Data::GndTruth< key_len, T >::totalValue ( ) const
inline

Return the sum of values of all flowkeys.

Member Data Documentation

◆ called

template<int32_t key_len, typename T = int64_t>
int64_t OmniSketch::Data::GndTruth< key_len, T >::called = 0
protected

How many times that getXXX are called on this instance.

◆ my_map

template<int32_t key_len, typename T = int64_t>
BidirMap OmniSketch::Data::GndTruth< key_len, T >::my_map
protected

The internal bidirectional map.

◆ tot_value

template<int32_t key_len, typename T = int64_t>
int64_t OmniSketch::Data::GndTruth< key_len, T >::tot_value = 0
protected

Sum of all counters.

OmniSketch::Data
Miscellaneous tools for processing data.
Definition: data.h:24
OmniSketch::Data::GndTruth
Ground truth of the streaming data.
Definition: data.h:315
OmniSketch::Data::InLength
@ InLength
Count for (header + payload) in bytes.
Definition: data.h:30
OmniSketch::Data::GndTruth::getHeavyHitter
void getHeavyHitter(const GndTruth &flow_summary, double threshold, HXMethod hh_method)
Get heavy hitters of the given stream (from flow summary)
Definition: data.h:945
OmniSketch::Data::StreamData
Store the formatted streaming data.
Definition: data.h:152
OmniSketch::Data::TopK
@ TopK
Top K flow(s)
Definition: data.h:38