English translation request
The text is about data warehousing and data mining; just to state that once more.
Searching and Mining Trillions of Time Series
Subsequences under Dynamic Time Warping
ABSTRACT
Most time series data mining algorithms use similarity search as a
core subroutine, and thus the time taken for similarity search is the
bottleneck for virtually all time series data mining algorithms. The
difficulty of scaling search to large datasets largely explains why
most academic work on time series data mining has plateaued at
considering a few millions of time series objects, while much of
industry and science sits on billions of time series objects waiting
to be explored. In this work we show that by using a combination
of four novel ideas we can search and mine truly massive time
series for the first time. We demonstrate the following extremely
unintuitive fact: in large datasets we can exactly search under
DTW much more quickly than the current state-of-the-art
Euclidean distance search algorithms. We demonstrate our work on
the largest set of time series experiments ever attempted. In
particular, the largest dataset we consider is larger than the
combined size of all of the time series datasets considered in all data
mining papers ever published. We show that our ideas allow us to
solve higher-level time series data mining problems such as motif
discovery and clustering at scales that would otherwise be
untenable. In addition to mining massive datasets, we will show
that our ideas also have implications for real-time monitoring of
data streams, allowing us to handle much faster arrival rates
and/or use cheaper and lower powered devices than are currently
possible.
zhifeng, 1 year ago, 1 answer received

6gem (rank: Seedling)

Answered 16 questions in total; acceptance rate: 100%

Translated by hand, so feel free to accept this answer.
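Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping
Abstract: Most time series data mining algorithms use similarity search as a core subroutine, so the time spent on similarity search is the bottleneck for almost all time series data mining algorithms. The difficulty of scaling search to large datasets largely explains why most academic research on time series data mining has stayed at the level of a few million time series objects, while much of industry and science has billions of time series objects waiting to be explored. In this paper, we show that by combining four new ideas we can, for the first time, search and mine truly massive time series. We demonstrate the following extremely unintuitive fact: in large datasets, we can search exactly under DTW (Dynamic Time Warping) much faster than the current state-of-the-art Euclidean distance search algorithms. We demonstrate our work on the largest set of time series experiments attempted to date. Specifically, the largest dataset we consider is larger than the combined size of all the time series datasets considered in all data mining papers published so far. We show that our ideas let us solve higher-level time series data mining problems, such as motif discovery and clustering, at scales that would otherwise be untenable. Beyond mining massive datasets, we will show that our ideas also have implications for real-time monitoring of data streams, allowing us to handle much faster arrival rates and/or use cheaper, lower-powered devices than is currently possible.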
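
For readers unfamiliar with the terms, here is a minimal Python sketch of what "similarity search under DTW" versus Euclidean distance means. It is a plain brute-force illustration and not the method of the paper, whose four ideas the abstract does not describe. The function names (`znorm`, `euclidean`, `dtw`, `best_match`), the warping-window parameter, and the z-normalization step are all assumptions made for this example.

```python
import math


def znorm(xs):
    """Z-normalize a sequence (assumed preprocessing, not stated in the abstract)."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in xs]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b, window):
    """DTW distance restricted to a band of half-width `window` around the diagonal."""
    n, m = len(a), len(b)
    window = max(window, abs(n - m))  # the band must be able to reach the last cell
    inf = float("inf")
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix costs nothing
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(max(1, i - window), min(m, i + window) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # extend the cheapest of the three allowed predecessor alignments
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return math.sqrt(prev[m])


def best_match(series, query, window):
    """Brute-force scan: return (start, distance) of the subsequence closest to `query`."""
    q = znorm(query)
    m = len(q)
    best_pos, best_dist = -1, float("inf")
    for start in range(len(series) - m + 1):
        candidate = znorm(series[start:start + m])
        d = dtw(q, candidate, window)
        if d < best_dist:
            best_pos, best_dist = start, d
    return best_pos, best_dist
```

With `window=0` the `dtw` function reduces to the Euclidean distance, and a wider window can only lower the distance. The scan in `best_match` costs one full DTW computation per subsequence, which is the kind of expense that makes scaling to very large datasets hard.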

1 year ago
