Top 10 arXiv Papers Today in Computer Vision and Pattern Recognition


2.068 Mikeys
#1. VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai
We introduce a new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT for short). VL-BERT adopts the simple yet powerful Transformer model as the backbone, and extends it to take both visual and linguistic embedded features as input. In it, each element of the input is either a word from the input sentence or a region-of-interest (RoI) from the input image. It is designed to fit most vision-and-language downstream tasks. To better exploit the generic representation, we pre-train VL-BERT on the massive-scale Conceptual Captions dataset with three tasks: masked language modeling with visual clues, masked RoI classification with linguistic clues, and sentence-image relationship prediction. Extensive empirical analysis demonstrates that the pre-training procedure can better align the visual-linguistic clues and benefit downstream tasks such as visual question answering, visual commonsense reasoning and referring expression comprehension. It is worth noting...
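The first two pre-training tasks rely on masking elements of a joint sequence of words and RoIs. A minimal Python sketch of that masking step follows, assuming a BERT-style masking probability of 15% and hypothetical placeholders `MASK_TOKEN` and `ZERO_ROI`; the function `mask_sequence` is illustrative and not taken from the VL-BERT code.

```python
import random
from typing import List, Optional, Tuple

MASK_TOKEN = "[MASK]"  # hypothetical stand-in for a masked word
ZERO_ROI = None        # hypothetical stand-in for a masked (zeroed) RoI feature


def mask_sequence(words: List[str], rois: List[str], p: float = 0.15,
                  seed: int = 0) -> Tuple[List[Optional[str]], List[int], List[int]]:
    """Concatenate words and RoIs into one input and mask each element with probability p.

    Returns the masked sequence, the indices of masked words (targets for masked
    language modeling) and the indices of masked RoIs (targets for masked RoI
    classification).
    """
    rng = random.Random(seed)
    sequence: List[Optional[str]] = []
    masked_words: List[int] = []
    masked_rois: List[int] = []
    for i, word in enumerate(words):
        if rng.random() < p:
            sequence.append(MASK_TOKEN)
            masked_words.append(i)
        else:
            sequence.append(word)
    for j, roi in enumerate(rois):
        if rng.random() < p:
            sequence.append(ZERO_ROI)
            masked_rois.append(j)
        else:
            sequence.append(roi)
    return sequence, masked_words, masked_rois
```

Because unmasked RoIs stay in the same sequence as the masked words (and vice versa), a model trained on these targets can use visual clues to recover words and linguistic clues to recover RoIs.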
Figures
None.
Tweets
BrundageBot: VL-BERT: Pre-training of Generic Visual-Linguistic Representations. Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, and Jifeng Dai https://t.co/eeqrCgbZze
arxiv_in_review: #ICLR2020 VL-BERT: Pre-training of Generic Visual-Linguistic Representations. (arXiv:1908.08530v1 [cs.CV]) https://t.co/lfDcjcuN0B
arxivml: "VL-BERT: Pre-training of Generic Visual-Linguistic Representations", Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei… https://t.co/2PSRBL6rsn
Memoirs: VL-BERT: Pre-training of Generic Visual-Linguistic Representations. https://t.co/WDh5Mgf5Jx
arxiv_cscv: VL-BERT: Pre-training of Generic Visual-Linguistic Representations https://t.co/AKQoR6pinr
arxiv_cscv: VL-BERT: Pre-training of Generic Visual-Linguistic Representations https://t.co/AKQoR67GYR
arxiv_cscl: VL-BERT: Pre-training of Generic Visual-Linguistic Representations https://t.co/YjcG15hrIq
arxiv_cscl: VL-BERT: Pre-training of Generic Visual-Linguistic Representations https://t.co/YjcG15z2zY
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 7
Total Words: 0
Unique Words: 0

2.06 Mikeys
#2. Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning
Jyoti Aneja, Harsh Agrawal, Dhruv Batra, Alexander Schwing
Diverse and accurate vision+language modeling is an important goal to retain creative freedom and maintain user engagement. However, adequately capturing the intricacies of diversity in language models is challenging. Recent works commonly resort to latent variable models augmented with more or less supervision from object detectors or part-of-speech tags. Common to all those methods is that the latent variable either only initializes the sentence generation process or is identical across the steps of generation. Neither approach offers fine-grained control. To address this concern, we propose Seq-CVAE, which learns a latent space for every word position. We encourage this temporal latent space to capture the 'intention' about how to complete the sentence by mimicking a representation that summarizes the future. We illustrate the efficacy of the proposed approach in anticipating the sentence continuation on the challenging MSCOCO dataset, significantly improving diversity metrics compared to baselines while performing on par...
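Learning a latent space for every word position means the training objective carries one KL term per position rather than one per sentence. The sketch below, assuming diagonal Gaussian posteriors and priors stored as arrays of shape (T, D) and hypothetical function names, sums those per-step KL terms and draws reparameterized samples; it does not model the 'intention' mechanism that mimics a summary of the future.

```python
import numpy as np


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between diagonal Gaussians, summed over the last (latent) axis."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(
        logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0, axis=-1
    )


def sequential_kl(mu_q, logvar_q, mu_p, logvar_p):
    """Total KL over T word positions; every input has shape (T, D)."""
    return float(np.sum(gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)))


def sample_latents(mu, logvar, rng):
    """Reparameterized sample z_t = mu_t + sigma_t * eps_t at every word position t."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps
```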
Figures
None.
Tweets
BrundageBot: Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning. Jyoti Aneja, Harsh Agrawal, Dhruv Batra, and Alexander Schwing https://t.co/OiveCQyKXK
arxivml: "Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning", Jyoti Aneja, Harsh Agrawal, … https://t.co/2y1MO0bGqf
StatsPapers: Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning. https://t.co/POo18z8uVP
arxiv_cscv: Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning https://t.co/aIbW4Syhw5
arxiv_cscv: Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning https://t.co/aIbW4SPSnD
arxiv_cscl: Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning https://t.co/qtC2wMy9aU
arxiv_cscl: Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning https://t.co/qtC2wMPK2s
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 0
Unique Words: 0

2.06 Mikeys
#3. ViCo: Word Embeddings from Visual Co-occurrences
Tanmay Gupta, Alexander Schwing, Derek Hoiem
We propose to learn word embeddings from visual co-occurrences. Two words co-occur visually if both words apply to the same image or image region. Specifically, we extract four types of visual co-occurrences between object and attribute words from large-scale, textually-annotated visual databases like VisualGenome and ImageNet. We then train a multi-task log-bilinear model that compactly encodes word "meanings" represented by each co-occurrence type into a single visual word-vector. Through unsupervised clustering, supervised partitioning, and a zero-shot-like generalization analysis we show that our word embeddings complement text-only embeddings like GloVe by better representing similarities and differences between visual concepts that are difficult to obtain from text corpora alone. We further evaluate our embeddings on five downstream applications, four of which are vision-language tasks. Augmenting GloVe with our embeddings yields gains on all tasks. We also find that random embeddings perform comparably to learned embeddings...
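As a rough illustration of a multi-task log-bilinear objective, the sketch below fits one shared word-vector matrix to several co-occurrence count matrices, one per co-occurrence type. It assumes an unweighted GloVe-style least-squares loss and a hypothetical per-type linear projection; ViCo's exact parameterization and weighting may differ.

```python
import numpy as np


def multitask_log_bilinear_loss(W, projections, biases, cooccurrences):
    """Unweighted GloVe-style loss summed over co-occurrence types.

    W: (V, D) shared word vectors.
    projections: dict type -> (K, D) hypothetical per-type projection.
    biases: dict type -> (V,) per-type word biases.
    cooccurrences: dict type -> (V, V) co-occurrence counts.
    """
    total = 0.0
    for t, X in cooccurrences.items():
        Z = W @ projections[t].T                      # (V, K) type-specific embeddings
        scores = Z @ Z.T + biases[t][:, None] + biases[t][None, :]
        observed = X > 0                              # log(0) is undefined, skip empty pairs
        residual = scores[observed] - np.log(X[observed])
        total += float(np.sum(residual ** 2))
    return total
```

Sharing `W` across all types while giving each type its own projection is what lets a single vector per word encode several kinds of visual co-occurrence at once.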
Figures
None.
Tweets
BrundageBot: ViCo: Word Embeddings from Visual Co-occurrences. Tanmay Gupta, Alexander Schwing, and Derek Hoiem https://t.co/RFVPCPQQEy
arxivml: "ViCo: Word Embeddings from Visual Co-occurrences", Tanmay Gupta, Alexander Schwing, Derek Hoiem https://t.co/ZTor7QoRhE
tanmay2099: Need a break from BERTmania? Checkout ViCo -- multi-sense word embeddings from visual (as opposed to textual) co-occurrences. Work done in collaboration with @HoiemDerek and @alexschwing at @IllinoisCS. To be presented at ICCV 2019! https://t.co/ieT0BHTXbk
arxiv_cscv: ViCo: Word Embeddings from Visual Co-occurrences https://t.co/iY04LVqvJx
arxiv_cscv: ViCo: Word Embeddings from Visual Co-occurrences https://t.co/iY04LVI6B5
arxiv_cscl: ViCo: Word Embeddings from Visual Co-occurrences https://t.co/4h4xR1CRTn
arxiv_cscl: ViCo: Word Embeddings from Visual Co-occurrences https://t.co/4h4xR1UsKV
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 0
Unique Words: 0

2.057 Mikeys
#4. Deep Green Function Convolution for Improving Saliency in Convolutional Neural Networks
Dominique Beaini, Sofiane Achiche, Alexandre Duperré, Maxime Raison
Current saliency methods require learning large-scale regional features using small convolutional kernels, which is not possible with a simple feed-forward network. Some methods solve this problem by segmenting the image into superpixels, while others downscale the image through the network and rescale it back to its original size. The objective of this paper is to show that saliency convolutional neural networks (CNN) can be improved by using a Green's function convolution (GFC) to extrapolate edge features into salient regions. The GFC acts as a gradient integrator, allowing saliency features to be produced from thin edge-like features directly inside the CNN. Hence, we propose the gradient integration and sum (GIS) layer that combines the edge features with the saliency features. Using the HED and DSS architectures, we demonstrated that adding a GIS layer near the network's output reduces the sensitivity to parameter initialization and overfitting, thus improving the repeatability of the training. By adding a GIS layer...
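One way to read "gradient integrator" is that convolving with the Green's function of the Laplacian recovers a field from its gradients by solving a Poisson equation. The sketch below does this with NumPy FFTs under periodic boundary conditions (an assumption made for simplicity); it illustrates the integration idea only, not the GIS layer itself, and the function names are hypothetical.

```python
import numpy as np


def periodic_gradient(u):
    """Forward differences with wrap-around boundaries."""
    gx = np.roll(u, -1, axis=1) - u
    gy = np.roll(u, -1, axis=0) - u
    return gx, gy


def integrate_gradient(gx, gy):
    """Recover u (up to a constant) from its gradient by solving lap(u) = div(g)
    in the Fourier domain, i.e. convolving div(g) with the Green's function of
    the discrete periodic Laplacian."""
    # Backward-difference divergence, which pairs with the forward gradient
    # to give the 5-point Laplacian.
    div = (gx - np.roll(gx, 1, axis=1)) + (gy - np.roll(gy, 1, axis=0))
    h, w = div.shape
    ky = np.fft.fftfreq(h)[:, None]
    kx = np.fft.fftfreq(w)[None, :]
    # Eigenvalues of the periodic 5-point Laplacian for each frequency.
    denom = (2 * np.cos(2 * np.pi * kx) - 2) + (2 * np.cos(2 * np.pi * ky) - 2)
    denom[0, 0] = 1.0          # only the DC term is zero; avoid dividing by it
    u_hat = np.fft.fft2(div) / denom
    u_hat[0, 0] = 0.0          # the constant offset is unrecoverable; use zero mean
    return np.real(np.fft.ifft2(u_hat))
```

The result matches the original field up to an additive constant, which no gradient-based method can recover.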
Tweets
BrundageBot: Deep Green Function Convolution for Improving Saliency in Convolutional Neural Networks. Dominique Beaini, Sofiane Achiche, Alexandre Duperré, and Maxime Raison https://t.co/IctwizHzKt
arxivml: "Deep Green Function Convolution for Improving Saliency in Convolutional Neural Networks", Dominique Beaini, Sofian… https://t.co/wymUZvQlxO
Memoirs: Deep Green Function Convolution for Improving Saliency in Convolutional Neural Networks. https://t.co/hUyiPOdMGz
arxiv_cscv: Deep Green Function Convolution for Improving Saliency in Convolutional Neural Networks https://t.co/RPwL6XUqzN
arxiv_cscv: Deep Green Function Convolution for Improving Saliency in Convolutional Neural Networks https://t.co/RPwL6XCPbd
dirackuma: @momiji_fullmoon https://t.co/wh54b4raGw Garbled characters…
disigandalf: RT @arxiv_cscv: Deep Green Function Convolution for Improving Saliency in Convolutional Neural Networks https://t.co/RPwL6XUqzN
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 7846
Unique Words: 2066

2.052 Mikeys
#5. EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition
Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen
We focus on multi-modal fusion for egocentric action recognition, and propose a novel architecture for multi-modal temporal-binding, i.e. the combination of modalities within a range of temporal offsets. We train the architecture with three modalities (RGB, Flow and Audio) and combine them with mid-level fusion alongside sparse temporal sampling of fused representations. In contrast with previous works, modalities are fused before temporal aggregation, with shared modality and fusion weights over time. Our proposed architecture is trained end-to-end, outperforming individual modalities as well as late fusion of modalities. We demonstrate the importance of audio in egocentric vision, on a per-class basis, for identifying actions as well as interacting objects. Our method achieves state-of-the-art results on both the seen and unseen test sets of the largest egocentric dataset, EPIC-Kitchens, on all metrics using the public leaderboard.
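The ordering described above (sample sparse segments, fuse modalities within each segment, then aggregate over time, with the same weights at every segment) can be sketched in NumPy. The feature sizes, the single ReLU fusion layer and the mean over segments are assumptions chosen for brevity, not the paper's exact architecture; the function names are hypothetical.

```python
import numpy as np


def sparse_segment_indices(num_frames, num_segments, rng):
    """Split [0, num_frames) into equal segments and draw one random frame from each
    (TSN-style sparse sampling; assumes num_frames >= num_segments)."""
    edges = np.linspace(0, num_frames, num_segments + 1).astype(int)
    return np.array([rng.integers(edges[i], edges[i + 1]) for i in range(num_segments)])


def mid_level_fusion(rgb, flow, audio, W_fuse, b_fuse, W_cls, b_cls):
    """rgb, flow, audio: (S, D_m) features for S sampled segments.

    Modalities are concatenated per segment and passed through one fusion layer
    whose weights are shared across segments; only then are the per-segment
    predictions averaged over time.
    """
    fused = np.concatenate([rgb, flow, audio], axis=1)   # (S, D_rgb + D_flow + D_audio)
    hidden = np.maximum(fused @ W_fuse + b_fuse, 0.0)    # shared fusion layer + ReLU
    logits = hidden @ W_cls + b_cls                      # (S, C) per-segment scores
    return logits.mean(axis=0)                           # temporal aggregation last
```

A late-fusion baseline would instead average each modality over time first and combine the modality scores at the end.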
Figures
None.
Tweets
BrundageBot: EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition. Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, and Dima Damen https://t.co/ZwzxiXDtCY
dimadamen: Our @ICCV19 paper "EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition" now on Arxiv. With @VILaboratory @e_kazakos and @Oxford_VGG @NagraniArsha and AZ. Video: https://t.co/PM4y9bUcQG Arxiv: https://t.co/BE0GFeVpa4 Project: https://t.co/BE0GFeVpa4 https://t.co/IISKB2yaQB
arxivml: "EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition", Evangelos Kazakos, Arsha Nagrani, A… https://t.co/tAlM966R3o
arxiv_cscv: EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition https://t.co/9aR0D15hVK
arxiv_cscv: EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition https://t.co/9aR0D1mTkk
udoooom: RT @arxiv_cscv: EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition https://t.co/9aR0D1mTkk
dimadamen: RT @arxiv_cscv: EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition https://t.co/9aR0D1mTkk
NagraniArsha: RT @arxiv_cscv: EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition https://t.co/9aR0D1mTkk
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 0
Unique Words: 0

2.052 Mikeys
#6. Noise Flow: Noise Modeling with Conditional Normalizing Flows
Abdelrahman Abdelhamed, Marcus A. Brubaker, Michael S. Brown
Modeling and synthesizing image noise is an important aspect in many computer vision applications. The long-standing additive white Gaussian and heteroscedastic (signal-dependent) noise models widely used in the literature provide only a coarse approximation of real sensor noise. This paper introduces Noise Flow, a powerful and accurate noise model based on recent normalizing flow architectures. Noise Flow combines well-established basic parametric noise models (e.g., signal-dependent noise) with the flexibility and expressiveness of normalizing flow networks. The result is a single, comprehensive, compact noise model containing fewer than 2500 parameters yet able to represent multiple cameras and gain factors. Noise Flow dramatically outperforms existing noise models, with 0.42 nats/pixel improvement over the camera-calibrated noise level functions, which translates to 52% improvement in the likelihood of sampled noise. Noise Flow represents the first serious attempt to go beyond simple parametric models to one that leverages the...
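The 52% figure follows from the 0.42 nats/pixel gap: lowering the per-pixel negative log-likelihood by d nats multiplies the per-pixel likelihood by exp(d), and exp(0.42) is about 1.52. The sketch below checks that arithmetic and, for context, computes the per-pixel NLL of a heteroscedastic Gaussian baseline, assuming the common variance form beta1 * x + beta2 for a clean intensity x (the parameter and function names are hypothetical).

```python
import numpy as np


def signal_dependent_nll(noise, clean, beta1, beta2):
    """Mean negative log-likelihood in nats/pixel of `noise` under the
    heteroscedastic Gaussian model n ~ N(0, beta1 * x + beta2)."""
    var = beta1 * clean + beta2
    return float(np.mean(0.5 * np.log(2 * np.pi * var) + noise ** 2 / (2 * var)))


def likelihood_gain(nll_improvement_nats):
    """Relative likelihood improvement implied by a per-pixel NLL drop of d nats."""
    return float(np.exp(nll_improvement_nats) - 1.0)
```

`likelihood_gain(0.42)` returns roughly 0.522, consistent with the quoted 52%.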
Tweets
CSProfKGD: Abdelrahman Abdelhamed, Marcus A. Brubaker (@marcusabrubaker), and Michael S. Brown, Noise Flow: Noise Modeling with Conditional Normalizing Flows, #ICCV2019 Preprint: https://t.co/R1j1j5piPx https://t.co/30quXriOXw
arxivml: "Noise Flow: Noise Modeling with Conditional Normalizing Flows", Abdelrahman Abdelhamed, Marcus A. Brubaker, Michae… https://t.co/kVc8jQh7jo
Memoirs: Noise Flow: Noise Modeling with Conditional Normalizing Flows. https://t.co/6DIY43BLuM
arxiv_cscv: Noise Flow: Noise Modeling with Conditional Normalizing Flows https://t.co/BB43MiHzJK
arxiv_cscv: Noise Flow: Noise Modeling with Conditional Normalizing Flows https://t.co/BB43MipYla
Github
Repository: noise_flow
User: BorealisAI
Language: Python
Stargazers: 3
Subscribers: 2
Forks: 1
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 6586
Unique Words: 2107

2.052 Mikeys
#7. Compositional Video Prediction
Yufei Ye, Maneesh Singh, Abhinav Gupta, Shubham Tulsiani
We present an approach for pixel-level future prediction given an input image of a scene. We observe that a scene is composed of distinct entities that undergo motion and present an approach that operationalizes this insight. We implicitly predict future states of independent entities while reasoning about their interactions, and compose future video frames using these predicted states. We overcome the inherent multi-modality of the task using a global trajectory-level latent random variable, and show that this allows us to sample diverse and plausible futures. We empirically validate our approach against alternate representations and ways of incorporating multi-modality. We examine two datasets, one comprising stacked objects that may fall, and the other containing videos of humans performing activities in a gym, and show that our approach allows realistic stochastic video prediction across these diverse settings. See https://judyye.github.io/CVP/ for video predictions.
Tweets
BrundageBot: Compositional Video Prediction. Yufei Ye, Maneesh Singh, Abhinav Gupta, and Shubham Tulsiani https://t.co/9Y0hfjQaf3
roadrunning01: Compositional Video Prediction pdf: https://t.co/xT4gWA45ZD abs: https://t.co/OXWhOhXgoZ project page: https://t.co/USKx57TF66 github: https://t.co/QXIHuNudMf https://t.co/zkl3DW93m4
arxivml: "Compositional Video Prediction", Yufei Ye, Maneesh Singh, Abhinav Gupta, Shubham Tulsiani https://t.co/1hprBfIGKf
arxiv_cscv: Compositional Video Prediction https://t.co/AfV4sIpPLz
arxiv_cscv: Compositional Video Prediction https://t.co/AfV4sIHqD7
udoooom: RT @arxiv_cscv: Compositional Video Prediction https://t.co/AfV4sIpPLz
JoaoVictor_AC: RT @roadrunning01: Compositional Video Prediction pdf: https://t.co/xT4gWA45ZD abs: https://t.co/OXWhOhXgoZ project page: https://t.co/USKx…
aviopene: RT @roadrunning01: Compositional Video Prediction pdf: https://t.co/xT4gWA45ZD abs: https://t.co/OXWhOhXgoZ project page: https://t.co/USKx…
dkastaniotis: RT @roadrunning01: Compositional Video Prediction pdf: https://t.co/xT4gWA45ZD abs: https://t.co/OXWhOhXgoZ project page: https://t.co/USKx…
SArmadilloTank: RT @roadrunning01: Compositional Video Prediction pdf: https://t.co/xT4gWA45ZD abs: https://t.co/OXWhOhXgoZ project page: https://t.co/USKx…
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 7285
Unique Words: 2239

2.051 Mikeys
#8. Progressive Face Super-Resolution via Attention to Facial Landmark
Deokyun Kim, Minseon Kim, Gihyun Kwon, Dae-Shik Kim
Face Super-Resolution (SR) is a subfield of the SR domain that specifically targets the reconstruction of face images. The main challenge of face SR is to restore essential facial features without distortion. We propose a novel face SR method that generates photo-realistic 8x super-resolved face images with fully retained facial details. To that end, we adopt a progressive training method, which allows stable training by splitting the network into successive steps, each producing output with a progressively higher resolution. We also propose a novel facial attention loss and apply it at each step to focus on restoring facial attributes in greater detail by multiplying the pixel difference and heatmap values. Lastly, we propose a compressed version of the state-of-the-art face alignment network (FAN) for landmark heatmap extraction. With the proposed FAN, we can extract heatmaps suitable for face SR and also reduce the overall training time. Experimental results verify that our method outperforms state-of-the-art methods in...
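The facial attention loss weights the pixel difference by landmark heatmap values, so errors near facial landmarks count more. A minimal sketch, assuming an L1 difference and a single (H, W) heatmap broadcast across color channels; the function name and these choices are illustrative rather than the authors' implementation.

```python
import numpy as np


def facial_attention_loss(sr, hr, heatmap):
    """sr, hr: (H, W, C) super-resolved and ground-truth images.
    heatmap: (H, W) landmark heatmap (how per-landmark maps are merged is an assumption).

    Each per-pixel difference is multiplied by the heatmap value at that pixel,
    so regions without landmark response contribute nothing to the loss.
    """
    diff = np.abs(sr - hr)                       # L1 pixel difference (assumption)
    return float(np.mean(diff * heatmap[..., None]))
```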
Figures
None.
Tweets
jonathanfly: I tried the new 'Progressive Face Super-Resolution via Attention to Facial Landmark' Funny detail not in official samples: a whole lot of Harry-Potter-Style scars. 3 out of the first 8. Paper: https://t.co/0YdQ0D0bun Code: https://t.co/i0Ev4o3fqU (note typo in eval py line 37) https://t.co/0mQMhrl2cW
arxivml: "Progressive Face Super-Resolution via Attention to Facial Landmark", Deokyun Kim, Minseon Kim, Gihyun Kwon, Dae-Sh… https://t.co/YUSODd418G
arxiv_cscv: Progressive Face Super-Resolution via Attention to Facial Landmark https://t.co/YzG3VykXVu
arxiv_cscv: Progressive Face Super-Resolution via Attention to Facial Landmark https://t.co/YzG3Vy3mwU
yshhrknmr: RT @roadrunning01: Progressive Face Super-Resolution via Attention to Facial Landmark pdf: https://t.co/srHXUXUNnB abs: https://t.co/xvgMEn…
maggie_albrecht: RT @roadrunning01: Progressive Face Super-Resolution via Attention to Facial Landmark pdf: https://t.co/srHXUXUNnB abs: https://t.co/xvgMEn…
JoaoVictor_AC: RT @roadrunning01: Progressive Face Super-Resolution via Attention to Facial Landmark pdf: https://t.co/srHXUXUNnB abs: https://t.co/xvgMEn…
whyboris: RT @roadrunning01: Progressive Face Super-Resolution via Attention to Facial Landmark pdf: https://t.co/srHXUXUNnB abs: https://t.co/xvgMEn…
twbompo: RT @roadrunning01: Progressive Face Super-Resolution via Attention to Facial Landmark pdf: https://t.co/srHXUXUNnB abs: https://t.co/xvgMEn…
subhobrata1: RT @roadrunning01: Progressive Face Super-Resolution via Attention to Facial Landmark pdf: https://t.co/srHXUXUNnB abs: https://t.co/xvgMEn…
veydpz_public: RT @roadrunning01: Progressive Face Super-Resolution via Attention to Facial Landmark pdf: https://t.co/srHXUXUNnB abs: https://t.co/xvgMEn…
AlonGruss: RT @roadrunning01: Progressive Face Super-Resolution via Attention to Facial Landmark pdf: https://t.co/srHXUXUNnB abs: https://t.co/xvgMEn…
meverteam: RT @roadrunning01: Progressive Face Super-Resolution via Attention to Facial Landmark pdf: https://t.co/srHXUXUNnB abs: https://t.co/xvgMEn…
hey_kishore: RT @roadrunning01: Progressive Face Super-Resolution via Attention to Facial Landmark pdf: https://t.co/srHXUXUNnB abs: https://t.co/xvgMEn…
caymanlee: RT @roadrunning01: Progressive Face Super-Resolution via Attention to Facial Landmark pdf: https://t.co/srHXUXUNnB abs: https://t.co/xvgMEn…
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 0
Unique Words: 0

2.048 Mikeys
#9. Predicting Animation Skeletons for 3D Articulated Models via Volumetric Nets
Zhan Xu, Yang Zhou, Evangelos Kalogerakis, Karan Singh
We present a learning method for predicting animation skeletons for input 3D models of articulated characters. In contrast to previous approaches that fit pre-defined skeleton templates or predict fixed sets of joints, our method produces an animation skeleton tailored for the structure and geometry of the input 3D model. Our architecture is based on a stack of hourglass modules trained on a large dataset of 3D rigged characters mined from the web. It operates on the volumetric representation of the input 3D shapes augmented with geometric shape features that provide additional cues for joint and bone locations. Our method also enables intuitive user control of the level-of-detail for the output skeleton. Our evaluation demonstrates that our approach predicts animation skeletons that are much more similar to the ones created by humans compared to several alternatives and baselines.
Tweets
arxivml: "Predicting Animation Skeletons for 3D Articulated Models via Volumetric Nets", Zhan Xu, Yang Zhou, Evangelos Kalog… https://t.co/0NGrrJww6I
arxiv_cscv: Predicting Animation Skeletons for 3D Articulated Models via Volumetric Nets https://t.co/pcXDruMfLJ
arxiv_cscv: Predicting Animation Skeletons for 3D Articulated Models via Volumetric Nets https://t.co/pcXDrv3QDh
arxiv_csgr: Predicting Animation Skeletons for 3D Articulated Models via Volumetric Nets https://t.co/H2hXvwJf6K
arxiv_csgr: Predicting Animation Skeletons for 3D Articulated Models via Volumetric Nets https://t.co/H2hXvwrDIa
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 8604
Unique Words: 2544

2.042 Mikeys
#10. Trajectory Space Factorization for Deep Video-Based 3D Human Pose Estimation
Jiahao Lin, Gim Hee Lee
Existing deep learning approaches to 3D human pose estimation for videos are based on either Recurrent or Convolutional Neural Networks (RNNs or CNNs). However, RNN-based frameworks can only tackle sequences with limited frames because sequential models are sensitive to bad frames and tend to drift over long sequences. Although existing CNN-based temporal frameworks attempt to address the sensitivity and drift problems by concurrently processing all input frames in the sequence, the existing state-of-the-art CNN-based framework is limited to 3D pose estimation of a single frame from a sequential input. In this paper, we propose a deep learning-based framework that utilizes matrix factorization for sequential 3D human pose estimation. Our approach processes all input frames concurrently to avoid the sensitivity and drift problems, and yet outputs the 3D pose estimates for every frame in the input sequence. More specifically, the 3D poses in all frames are represented as a motion matrix factorized into a trajectory bases matrix and...
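The abstract is cut off mid-sentence, so the sketch below shows only the general idea of stacking F frames of 3D poses into a motion matrix and writing it as a trajectory bases matrix times a coefficient matrix. Using a truncated orthonormal DCT basis is an assumption borrowed from classical trajectory-space methods, not necessarily the basis this paper uses, and the function names are hypothetical.

```python
import numpy as np


def dct_trajectory_bases(num_frames, num_bases):
    """First num_bases orthonormal DCT-II vectors as columns, shape (F, K)."""
    t = np.arange(num_frames)
    B = np.stack([np.cos(np.pi * (2 * t + 1) * k / (2 * num_frames))
                  for k in range(num_bases)], axis=1)
    return B / np.linalg.norm(B, axis=0, keepdims=True)


def factorize_motion(poses, num_bases):
    """poses: (F, J, 3) joint positions over F frames.

    Returns bases B of shape (F, K) and coefficients C of shape (K, 3J) such that
    the motion matrix M (F, 3J) is approximated by B @ C.
    """
    F = poses.shape[0]
    M = poses.reshape(F, -1)                 # motion matrix, one row per frame
    B = dct_trajectory_bases(F, num_bases)
    C = B.T @ M                              # least-squares coefficients (B is orthonormal)
    return B, C
```

With K equal to F the factorization is exact; a smaller K keeps only the smoothest trajectory components.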
Tweets
BrundageBot: Trajectory Space Factorization for Deep Video-Based 3D Human Pose Estimation. Jiahao Lin and Gim Hee Lee https://t.co/WMSHJBqwef
arxivml: "Trajectory Space Factorization for Deep Video-Based 3D Human Pose Estimation", Jiahao Lin, Gim Hee Lee https://t.co/g9fNR5DRO4
arxiv_cscv: Trajectory Space Factorization for Deep Video-Based 3D Human Pose Estimation https://t.co/31w3APWVr6
arxiv_cscv: Trajectory Space Factorization for Deep Video-Based 3D Human Pose Estimation https://t.co/31w3AQewiE
Github

Trajectory Space Factorization for Deep Video-Based 3D Human Pose Estimation

Repository: trajectory-pose-3d
User: jiahaoLjh
Language: Python
Stargazers: 1
Subscribers: 2
Forks: 1
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 2
Total Words: 6747
Unique Words: 2201

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter: @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 177,899 papers.
