Analysis of Quality Factors of Online Video Streaming Based on Measurement

  • Anyi Xu
Keywords: Spark, machine learning, KMeans, data mining, video streaming media

Abstract

Spark is a low-latency distributed computing system for large data sets. It is compatible with Hadoop data sources, is reported to run up to about 100 times faster than MapReduce, and is particularly well suited to machine learning. Spark is still at an early stage and has not yet entered a period of rapid development, but the release of Spark 1.0.0 marks this Apache top-level open-source project as a rising big-data platform that is attracting growing attention from the IT industry and is expected to be widely used. This work sets up a Spark platform and applies it to analyze the factors that affect the quality of online video streaming. The paper introduces the background of the research and describes the composition and principles of Spark in detail. According to the needs of the experiment, the platform is configured, its performance is verified, and its machine learning library is studied. First, we introduce the user requirements and architecture models of distributed file systems that are widely accepted in industry. Then, we introduce the architecture of the RDD (Resilient Distributed Dataset). Finally, we use the KMeans machine learning algorithm to analyze the relationship between video viewing time and the number of buffering events, and we summarize the relationships among the streaming-media factors. The experimental platform runs Linux Ubuntu 12.04 LTS with Apache Spark; all system preparation, debugging, and testing were carried out on this platform.
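
The clustering step mentioned above can be illustrated with a minimal sketch of Lloyd's k-means iteration in plain Python. This is not the paper's Spark code: the `sessions` values are invented solely to exercise the algorithm and do not reflect any measurement, the function names `sq_dist` and `kmeans` are hypothetical, and the evenly spaced seeding is a simplification (production libraries generally use more careful initialization).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    # Naive seeding (an assumption for this sketch): pick k evenly spaced points.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels


# Hypothetical sessions: (viewing time in minutes, number of buffering events).
sessions = [
    (55.0, 1), (60.0, 0), (48.0, 2), (52.0, 1),
    (5.0, 8), (8.0, 6), (3.0, 9), (6.0, 7),
]

centroids, labels = kmeans(sessions, k=2)
for center in centroids:
    print(center)
```

Because the two features are on different scales, a real analysis would probably normalize them before clustering; the sketch skips that step to stay short.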

