Top 10 Arxiv Papers Today in Multimedia


2.16 Mikeys
#1. Tiyuntsong: A Self-Play Reinforcement Learning Approach for ABR Video Streaming
Tianchi Huang, Xin Yao, Chenglei Wu, Rui-Xiao Zhang, Lifeng Sun
Existing reinforcement learning (RL)-based adaptive bitrate (ABR) approaches outperform previous methods based on fixed control rules by improving the Quality of Experience (QoE) score, but the QoE metric can hardly provide clear guidance for optimization, resulting in unexpected strategies. In this paper, we propose Tiyuntsong, a self-play reinforcement learning approach with a generative adversarial network (GAN)-based method for ABR video streaming. Tiyuntsong learns strategies automatically by training two agents that compete against each other. Note that the competition results are evaluated with a rule rather than a numerical QoE score, and the rule has a clear optimization goal. Meanwhile, we propose a GAN Enhancement Module to extract hidden features from the past status, preserving the information without the limitations of sequence lengths. Using testbed experiments, we show that the use of GAN significantly improves Tiyuntsong's performance. By comparing the performance of ABRs, we observe that...
Tweets
BrundageBot: Tiyuntsong: A Self-Play Reinforcement Learning Approach for ABR Video Streaming. Tianchi Huang, Xin Yao, Chenglei Wu, Rui-Xiao Zhang, and Lifeng Sun https://t.co/rDaWxuGcuO
ComputerPapers: Tiyuntsong: A Self-Play Reinforcement Learning Approach for ABR Video Streaming. https://t.co/Y9huMnswLS
Github
Repository: sabre
User: UMass-LIDS
Language: Python
Stargazers: 4
Subscribers: 1
Forks: 2
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 5
Total Words: 6506
Unique Words: 2356

0.0 Mikeys
#2. From Thumbnails to Summaries - A single Deep Neural Network to Rule Them All
Hongxiang Gu, Viswanathan Swaminathan
Video summaries come in many forms, from traditional single-image thumbnails, animated thumbnails, storyboards, to trailer-like video summaries. Content creators use the summaries to display the most attractive portion of their videos; the users use them to quickly evaluate if a video is worth watching. All forms of summaries are essential to video viewers, content creators, and advertisers. Often video content management systems have to generate multiple versions of summaries that vary in duration and presentational forms. We present a framework ReconstSum that utilizes LSTM-based autoencoder architecture to extract and select a sparse subset of video frames or keyshots that optimally represent the input video in an unsupervised manner. The encoder selects a subset from the input video while the decoder seeks to reconstruct the video from the selection. The goal is to minimize the difference between the original input video and the reconstructed video. Our method is easily extendable to generate a variety of applications...
Tweets
nmfeeds: [O] https://t.co/iYT2dSF3Df From Thumbnails to Summaries - A single Deep Neural Network to Rule Them All. Video summaries ...
ComputerPapers: From Thumbnails to Summaries - A single Deep Neural Network to Rule Them All. https://t.co/JKbWHHDAjH
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 2
Total Words: 4188
Unique Words: 1517

0.0 Mikeys
#3. Few-Shot Adaptation for Multimedia Semantic Indexing
Nakamasa Inoue, Koichi Shinoda
We propose a few-shot adaptation framework, which bridges zero-shot learning and supervised many-shot learning, for semantic indexing of image and video data. Few-shot adaptation provides robust parameter estimation with few training examples, by optimizing the parameters of zero-shot learning and supervised many-shot learning simultaneously. In this method, first we build a zero-shot detector, and then update it by using the few examples. Our experiments show the effectiveness of the proposed framework on three datasets: TRECVID Semantic Indexing 2010, 2014, and ImageNET. On the ImageNET dataset, we show that our method outperforms recent few-shot learning methods. On the TRECVID 2014 dataset, we achieve 15.19% and 35.98% in Mean Average Precision under the zero-shot condition and the supervised condition, respectively. To the best of our knowledge, these are the best results on this dataset.
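The Mean Average Precision figures quoted above can be made concrete with a small worked example. The sketch below is a generic, textbook-style computation, not the TRECVID evaluation tooling; the function names are hypothetical, and it assumes every relevant item appears somewhere in the ranked list.

```python
from typing import Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average precision for one concept/query.

    ranked_relevance[k] is True when the item at rank k+1 is relevant.
    Assumption: all relevant items are present in the ranked list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision measured at the rank of each relevant item.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-concept average precision values."""
    if not runs:
        return 0.0
    return sum(average_precision(run) for run in runs) / len(runs)


if __name__ == "__main__":
    runs = [[True, False, True], [False, True]]
    print(f"MAP = {mean_average_precision(runs):.4f}")
```

For the first ranking, the relevant items sit at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2; the second ranking contributes 1/2.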
Tweets
udmrzn: RT @arxiv_cscv: Few-Shot Adaptation for Multimedia Semantic Indexing https://t.co/IJzBTe8Ixd
Github
None.
Youtube
None.
Other stats
Sample Sizes: [1, 1, 1]
Authors: 2
Total Words: 7019
Unique Words: 2000

0.0 Mikeys
#4. Listen to Dance: Music-driven choreography generation using Autoregressive Encoder-Decoder Network
Juheon Lee, Seohyun Kim, Kyogu Lee
Automatic choreography generation is a challenging task because it often requires an understanding of two abstract concepts - music and dance - which are realized in the two different modalities, namely audio and video, respectively. In this paper, we propose a music-driven choreography generation system using an auto-regressive encoder-decoder network. To this end, we first collect a set of multimedia clips that include both music and corresponding dance motion. We then extract the joint coordinates of the dancer from video and the mel-spectrogram of music from audio, and train our network using music-choreography pairs as input. Finally, a novel dance motion is generated at the inference time when only music is given as an input. We performed a user study for a qualitative evaluation of the proposed method, and the results show that the proposed model is able to generate musically meaningful and natural dance movements given an unheard song.
Tweets
ComputerPapers: Listen to Dance: Music-driven choreography generation using Autoregressive Encoder-Decoder Network. https://t.co/qWXsICXYIv
MUKULBHALLA7: https://t.co/sPkttkkNaS
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 3144
Unique Words: 1151

0.0 Mikeys
#5. Provably Secure Steganography on Generative Media
Kejiang Chen, Hang Zhou, Dongdong Hou, Hanqing Zhao, Weiming Zhang, Nenghai Yu
In this paper, we propose provably secure steganography on generative media. Firstly, we discuss the essence of the steganographic security, which is identical to behavioral security. The behavioral security implies that the generative media are suitable for information hiding as well. Based on the duality of source coding and generating discrete distribution from fair coins and the explicit probability distribution yielded by generative model, perfectly secure steganography on generative media is proposed. Instead of random sampling from the probability distribution as ordinary generative models do, we combine the source decoding into the process of generation, which can implement the sampling according to the probability distribution as well as embed the encrypted message. Adaptive Arithmetic Coding is selected as the source coding method, and it is proved theoretically that the proposed generative steganography framework using adaptive Arithmetic Coding is asymptotically perfect secure. Taking text-to-speech system based on...
Tweets
arxiv_org: Provably Secure Steganography on Generative Media. https://t.co/wlIE2zU6Zq https://t.co/T27R91NqJ3
ComputerPapers: Provably Secure Steganography on Generative Media. https://t.co/navNHsiJ4q
Rosenchild: RT @arxiv_org: Provably Secure Steganography on Generative Media. https://t.co/wlIE2zU6Zq https://t.co/T27R91NqJ3
shubh_300595: RT @arxiv_org: Provably Secure Steganography on Generative Media. https://t.co/wlIE2zU6Zq https://t.co/T27R91NqJ3
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 6
Total Words: 7780
Unique Words: 2109

0.0 Mikeys
#6. Fast Block Structure Determination in AV1-based Multiple Resolutions Video Encoding
Bichuan Guo, Yuxing Han, Jiangtao Wen
The widely used adaptive HTTP streaming requires an efficient algorithm to encode the same video at different resolutions. In this paper, we propose a fast block structure determination algorithm based on the AV1 codec that accelerates high-resolution encoding, which is the bottleneck of multi-resolution encoding. The block structure similarity across resolutions is modeled by the fineness of frame detail and the scale of object motions; this enables us to accelerate high-resolution encoding based on low-resolution encoding results. The average depth of a block's co-located neighborhood is used to decide early termination in the RDO process. Encoding results show that our proposed algorithm reduces encoding time by 30.1%-36.8%, while keeping BD-rate low at 0.71%-1.04%. Compared to the state of the art, our method halves performance loss without sacrificing time savings.
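To illustrate the idea of using the average depth of a block's co-located neighborhood for early termination, here is a rough sketch. It is not the paper's algorithm: the neighborhood radius, the `margin` threshold, and both function names are assumptions made for illustration, and the depth map stands in for the partition depths recorded during a low-resolution encode.

```python
from typing import List


def colocated_mean_depth(depth_map: List[List[int]], row: int, col: int,
                         radius: int = 1) -> float:
    """Mean partition depth around (row, col) in the low-resolution depth map.

    radius is a hypothetical parameter; the window is clipped at the borders.
    """
    rows, cols = len(depth_map), len(depth_map[0])
    total = 0
    count = 0
    for r in range(max(0, row - radius), min(rows, row + radius + 1)):
        for c in range(max(0, col - radius), min(cols, col + radius + 1)):
            total += depth_map[r][c]
            count += 1
    return total / count


def should_stop_splitting(current_depth: int, depth_map: List[List[int]],
                          row: int, col: int, margin: float = 0.5,
                          radius: int = 1) -> bool:
    """Skip deeper partition search once the current depth exceeds the
    co-located mean depth by at least `margin` (a hypothetical threshold)."""
    mean_depth = colocated_mean_depth(depth_map, row, col, radius)
    return current_depth >= mean_depth + margin


if __name__ == "__main__":
    low_res_depths = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    print(should_stop_splitting(1, low_res_depths, 1, 1))
```

In a real encoder this decision would sit inside the rate-distortion optimization loop and would need to be tuned against the BD-rate loss it introduces.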
Tweets
arxiv_org: Fast Block Structure Determination in AV1-based Multiple Resolutions Video Encoding. https://t.co/eZO0NMgcRj https://t.co/fn9wvG4uow
ComputerPapers: Fast Block Structure Determination in AV1-based Multiple Resolutions Video Encoding. https://t.co/NVQ0wf17cK
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 4555
Unique Words: 1480

0.0 Mikeys
#7. Two-pass Light Field Image Compression for Spatial Quality and Angular Consistency
Bichuan Guo, Jiangtao Wen, Yuxing Han
The quality assessment of light field images presents new challenges to conventional compression methods, as the spatial quality is affected by the optical distortion of capturing devices, and the angular consistency affects the performance of dynamic rendering applications. In this paper, we propose a two-pass encoding system for pseudo-temporal sequence based light field image compression with a novel frame-level bit allocation framework that optimizes spatial quality and angular consistency simultaneously. Frame-level rate-distortion models are estimated during the first pass, and the second pass performs the actual encoding with optimized bit allocations given by two-step convex programming. The proposed framework supports various encoder configurations. Experimental results show that, compared to the anchor HM 16.16 (HEVC reference software), the proposed two-pass encoding system on average achieves 11.2% to 11.9% BD-rate reductions for the all-intra configuration, 15.8% to 32.7% BD-rate reductions for the random-access...
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 13728
Unique Words: 3473

0.0 Mikeys
#8. SoniControl - A Mobile Ultrasonic Firewall
Matthias Zeppelzauer, Alexis Ringot, Florian Taurer
The exchange of data between mobile devices in the near-ultrasonic frequency band is a promising new technology for near field communication (NFC), but it also raises a number of privacy concerns. We present the first ultrasonic firewall that reliably detects ultrasonic communication and provides the user with effective means to prevent hidden data exchange. This demonstration showcases a new media-based communication technology ("data over audio") together with its related privacy concerns. It (i) lets users interactively test and experience ultrasonic information exchange and (ii) shows how to protect oneself against unwanted tracking.
Tweets
cynicalsecurity: M. Zeppelzauer et al., "SoniControl - A Mobile Ultrasonic Firewall" https://t.co/55Biki6TUq
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 1781
Unique Words: 900

0.0 Mikeys
#9. Streaming Video QoE Modeling and Prediction: A Long Short-Term Memory Approach
Nagabhushan Eswara, S Ashique, Anand Panchbhai, Soumen Chakraborty, Hemanth P. Sethuram, Kiran Kuchi, Abhinav Kumar, Sumohana S. Channappayya
HTTP based adaptive video streaming has become a popular choice of streaming due to the reliable transmission and the flexibility offered to adapt to varying network conditions. However, due to rate adaptation in adaptive streaming, the quality of the videos at the client keeps varying with time depending on the end-to-end network conditions. Further, varying network conditions can lead to the video client running out of playback content resulting in rebuffering events. These factors affect the user satisfaction and cause degradation of the user quality of experience (QoE). It is important to quantify the perceptual QoE of the streaming video users and monitor the same in a continuous manner so that the QoE degradation can be minimized. However, the continuous evaluation of QoE is challenging as it is determined by complex dynamic interactions among the QoE influencing factors. Towards this end, we present LSTM-QoE, a recurrent neural network based QoE prediction model using a Long Short-Term Memory (LSTM) network. The LSTM-QoE is...
Figures
None.
Tweets
arxiv_org: Streaming Video QoE Modeling and Prediction: A Long Short-Term Memory Approach. https://t.co/7ZkzpGsAwq https://t.co/JRosoxhPgf
nmfeeds: [O] https://t.co/lpt1n0veJP Streaming Video QoE Modeling and Prediction: A Long Short-Term Memory Approach. HTTP based ada...
MIn3ws: RT @arxiv_org: Streaming Video QoE Modeling and Prediction: A Long Short-Term Memory Approach. https://t.co/7ZkzpGsAwq https://t.co/JRosoxh…
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 8
Total Words: 9663
Unique Words: 2330

0.0 Mikeys
#10. A Convolutional Neural Networks Denoising Approach for Salt and Pepper Noise
Bo Fu, Xiao-Yang Zhao, Yi Li, Xiang-Hai Wang, Yong-Gong Ren
Salt and pepper noise, especially with an extremely high percentage of impulses, poses a significant challenge to image denoising. In this paper, we propose a non-local switching filter convolutional neural network denoising algorithm, named NLSF-CNN, for salt and pepper noise. As its name suggests, NLSF-CNN consists of two steps: an NLSF processing step and a CNN training step. First, we develop an NLSF pre-processing step for noisy images using non-local information. Then, the pre-processed images are divided into patches and used for CNN training, leading to a CNN denoising model for future noisy images. We conduct a number of experiments to evaluate the effectiveness of NLSF-CNN. Experimental results show that NLSF-CNN outperforms state-of-the-art denoising algorithms with only a few training images.
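For readers unfamiliar with switching filters, the sketch below shows the basic idea on a grayscale image stored as nested lists. It is a plain local switching median filter, swapped in as a simpler stand-in; it is not the non-local NLSF step described in the abstract and involves no CNN. The function name and the assumption that impulses take exactly the values 0 and 255 are illustrative.

```python
from statistics import median
from typing import List


def switching_median_filter(image: List[List[int]], low: int = 0,
                            high: int = 255) -> List[List[int]]:
    """Replace suspected impulse pixels with the median of clean neighbors.

    Assumption: a pixel equal to `low` or `high` is treated as salt and pepper
    noise; every other pixel is considered clean and left untouched.
    """
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] not in (low, high):
                continue  # the "switch": clean pixels are not filtered
            # Collect clean pixels from the 3x3 window, clipped at the borders.
            clean = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if image[rr][cc] not in (low, high)
            ]
            if clean:
                out[r][c] = int(median(clean))
            # If no clean neighbor exists, the pixel is left as-is.
    return out


if __name__ == "__main__":
    noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 0]]
    print(switching_median_filter(noisy))
```

At very high noise densities a 3x3 window often contains no clean pixels at all, which is one motivation for drawing on non-local information instead.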
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 5
Total Words: 4491
Unique Words: 1623

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 57,756 papers.
