Top 10 arXiv Papers Today in Computer Vision and Pattern Recognition


2.165 Mikeys
#1. A Partially Reversible U-Net for Memory-Efficient Volumetric Image Segmentation
Robin Brügger, Christian F. Baumgartner, Ender Konukoglu
One of the key drawbacks of 3D convolutional neural networks for segmentation is their memory footprint, which necessitates compromises in the network architecture in order to fit into a given memory budget. Motivated by the RevNet for image classification, we propose a partially reversible U-Net architecture that reduces memory consumption substantially. The reversible architecture allows us to exactly recover each layer's outputs from the subsequent layer's ones, eliminating the need to store activations for backpropagation. This alleviates the biggest memory bottleneck and enables very deep (theoretically infinitely deep) 3D architectures. On the BraTS challenge dataset, we demonstrate substantial memory savings. We further show that the freed memory can be used for processing the whole field-of-view (FOV) instead of patches. Increasing network depth led to higher segmentation accuracy while growing the memory footprint only by a very small fraction, thanks to the partially reversible architecture.
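The reversibility idea in the abstract can be illustrated with a minimal sketch of an additive coupling block in the style of RevNet. This is not the authors' RevTorch code: the functions `f` and `g` are hypothetical stand-ins for learned sub-networks, and plain Python lists stand in for 3D feature maps.

```python
import math

def f(x):
    # Hypothetical stand-in for a learned sub-network F (assumption).
    return [math.tanh(v) for v in x]

def g(x):
    # Hypothetical stand-in for a learned sub-network G (assumption).
    return [0.5 * v for v in x]

def reversible_forward(x1, x2):
    # Additive coupling: y1 = x1 + F(x2), y2 = x2 + G(y1).
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2

def reversible_inverse(y1, y2):
    # Recover the inputs from the outputs alone, so the inputs
    # need not be kept in memory for the backward pass.
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2
```

Calling `reversible_inverse(*reversible_forward(x1, x2))` returns the original `x1` and `x2` up to floating-point rounding, which is why activations can be recomputed during backpropagation instead of stored.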
Tweets
c_f_baumgartner: More exciting work from @bmic_eth accepted to @miccai2019! Master student Robin Bruegger used reversible units to develop a class of extremely memory-efficient 3D segmentation networks of almost unlimited depth. Pre-print: https://t.co/YrOAYVF8UP Code: https://t.co/6ThhPGHnNK https://t.co/Nz9gQSDfiP
c_f_baumgartner: The non-PDF link is here https://t.co/n9cZe5h6qO for those who prefer this.
arxiv_cscv: A Partially Reversible U-Net for Memory-Efficient Volumetric Image Segmentation https://t.co/OZkxjfL2Se
Github

Framework for creating (partially) reversible neural networks with PyTorch

Repository: RevTorch
User: RobinBruegger
Language: Python
Stargazers: 0
Subscribers: 0
Forks: 0
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 3227
Unique Words: 1230

2.11 Mikeys
#2. Connecting Touch and Vision via Cross-Modal Prediction
Yunzhu Li, Jun-Yan Zhu, Russ Tedrake, Antonio Torralba
Humans perceive the world using multi-modal sensory inputs such as vision, audition, and touch. In this work, we investigate the cross-modal connection between vision and touch. The main challenge in this cross-domain modeling task lies in the significant scale discrepancy between the two: while our eyes perceive an entire visual scene at once, humans can only feel a small region of an object at any given moment. To connect vision and touch, we introduce new tasks of synthesizing plausible tactile signals from visual inputs as well as imagining how we interact with objects given tactile data as input. To accomplish our goals, we first equip robots with both visual and tactile sensors and collect a large-scale dataset of corresponding vision and tactile image sequences. To close the scale gap, we present a new conditional adversarial model that incorporates the scale and location information of the touch. Human perceptual studies demonstrate that our model can produce realistic visual images from tactile data and vice versa....
Tweets
BrundageBot: Connecting Touch and Vision via Cross-Modal Prediction. Yunzhu Li, Jun-Yan Zhu, Russ Tedrake, and Antonio Torralba https://t.co/hurcJL4Crx
arxivml: "Connecting Touch and Vision via Cross-Modal Prediction", Yunzhu Li, Jun-Yan Zhu, Russ Tedrake, Antonio Torralba https://t.co/wo3dmxpkTA
Memoirs: Connecting Touch and Vision via Cross-Modal Prediction. https://t.co/yfcSJJrZHa
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 7032
Unique Words: 2213

2.106 Mikeys
#3. Copy and Paste: A Simple But Effective Initialization Method for Black-Box Adversarial Attacks
Thomas Brunner, Frederik Diehl, Alois Knoll
Many optimization methods for generating black-box adversarial examples have been proposed, but the aspect of initializing said optimizers has not been considered in much detail. We show that the choice of starting points is indeed crucial, and that the performance of state-of-the-art attacks depends on it. First, we discuss desirable properties of starting points for attacking image classifiers, and how they can be chosen to increase query efficiency. Notably, we find that simply copying small patches from other images is a valid strategy. In an evaluation on ImageNet, we show that this initialization reduces the number of queries required for a state-of-the-art Boundary Attack by 81%, significantly outperforming previous results reported for targeted black-box adversarial examples.
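One way to picture the patch-copying idea is the sketch below. It is only an illustration of pasting a small patch from another image into the image under attack to form a starting point; the abstract does not specify patch size, placement, or how a start is checked to be adversarial, so the function name `copy_patch` and all of its parameters are assumptions.

```python
def copy_patch(original, donor, top, left, size):
    """Return a copy of `original` with a size x size patch taken
    from the same location in `donor` (hypothetical helper)."""
    # Copy row by row so the original image is left untouched.
    out = [row[:] for row in original]
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = donor[r][c]
    return out
```

In a real attack the candidate would still have to be passed to the black-box classifier to confirm that it is adversarial before the optimizer starts from it.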
Figures
None.
Tweets
BrundageBot: Copy and Paste: A Simple But Effective Initialization Method for Black-Box Adversarial Attacks. Thomas Brunner, Frederik Diehl, and Alois Knoll https://t.co/51H2HnJvZf
StatsPapers: Copy and Paste: A Simple But Effective Initialization Method for Black-Box Adversarial Attacks. https://t.co/fioqZehSyF
ballforest: RT @StatsPapers: Copy and Paste: A Simple But Effective Initialization Method for Black-Box Adversarial Attacks. https://t.co/fioqZehSyF
SythonUK: RT @StatsPapers: Copy and Paste: A Simple But Effective Initialization Method for Black-Box Adversarial Attacks. https://t.co/fioqZehSyF
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 0
Unique Words: 0

2.076 Mikeys
#4. Towards End-to-End Text Spotting in Natural Scenes
Hui Li, Peng Wang, Chunhua Shen
Text spotting in natural scene images is of great importance for many image understanding tasks. It includes two sub-tasks: text detection and recognition. In this work, we propose a unified network that simultaneously localizes and recognizes text with a single forward pass, avoiding intermediate processes such as image cropping and feature re-calculation, word separation, and character grouping. In contrast to existing approaches that consider text detection and recognition as two distinct tasks and tackle them one by one, the proposed framework settles these two tasks concurrently. The whole framework can be trained end-to-end and is able to handle text of arbitrary shapes. The convolutional features are calculated only once and shared by both detection and recognition modules. Through multi-task training, the learned features become more discriminative and improve the overall performance. By employing the 2D attention model in word recognition, the irregularity of text can be robustly addressed. It provides the spatial...
Tweets
BrundageBot: Towards End-to-End Text Spotting in Natural Scenes. Hui Li, Peng Wang, and Chunhua Shen https://t.co/cAxObY3VUc
arxiv_cscv: Towards End-to-End Text Spotting in Natural Scenes https://t.co/aQRujtQDj5
keylinker: RT @arxiv_cscv: Towards End-to-End Text Spotting in Natural Scenes https://t.co/aQRujtQDj5
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 11862
Unique Words: 3047

2.064 Mikeys
#5. Cross-View Policy Learning for Street Navigation
Ang Li, Huiyi Hu, Piotr Mirowski, Mehrdad Farajtabar
The ability to navigate from visual observations in unfamiliar environments is a core component of intelligent agents and an ongoing challenge for Deep Reinforcement Learning (RL). Street View can be a sensible testbed for such RL agents, because it provides real-world photographic imagery at ground level, with diverse street appearances; it has been made into an interactive environment called StreetLearn and used for research on navigation. However, goal-driven street navigation agents have not so far been able to transfer to unseen areas without extensive retraining, and relying on simulation is not a scalable solution. Since aerial images are easily and globally accessible, we propose instead to train a multi-modal policy on ground and aerial views, then transfer the ground view policy to unseen (target) parts of the city by utilizing aerial view observations. Our core idea is to pair the ground view with an aerial view and to learn a joint policy that is transferable across views. We achieve this by learning a similar...
Tweets
BrundageBot: Cross-View Policy Learning for Street Navigation. Ang Li, Huiyi Hu, Piotr Mirowski, and Mehrdad Farajtabar https://t.co/02ZgOpupOd
Memoirs: Cross-View Policy Learning for Street Navigation. https://t.co/LZNIru1AwL
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 7297
Unique Words: 2266

2.064 Mikeys
#6. Divide and Conquer the Embedding Space for Metric Learning
Artsiom Sanakoyeu, Vadim Tschernezki, Uta Büchler, Björn Ommer
Learning the embedding space, where semantically similar objects are located close together and dissimilar objects far apart, is a cornerstone of many computer vision applications. Existing approaches usually learn a single metric in the embedding space for all available data points, which may have a very complex non-uniform distribution with different notions of similarity between objects, e.g., appearance, shape, color or semantic meaning. Approaches for learning a single distance metric often struggle to encode all different types of relationships and do not generalize well. In this work, we propose a novel easy-to-implement divide and conquer approach for deep metric learning, which significantly improves the state-of-the-art performance of metric learning. Our approach utilizes the embedding space more efficiently by jointly splitting the embedding space and data into $K$ smaller sub-problems. It divides both the data and the embedding space into $K$ subsets and learns $K$ separate distance metrics in the non-overlapping...
Figures
None.
Tweets
BrundageBot: Divide and Conquer the Embedding Space for Metric Learning. Artsiom Sanakoyeu, Vadim Tschernezki, Uta Büchler, and Björn Ommer https://t.co/a6LAmDDtaw
Memoirs: Divide and Conquer the Embedding Space for Metric Learning. https://t.co/lT4u1STYvM
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 0
Unique Words: 0

2.061 Mikeys
#7. Utilizing the Instability in Weakly Supervised Object Detection
Yan Gao, Boxiao Liu, Nan Guo, Xiaochun Ye, Fang Wan, Haihang You, Dongrui Fan
Weakly supervised object detection (WSOD) focuses on training object detectors with only image-level annotations, and is challenging due to the gap between the supervision and the objective. Most existing approaches model WSOD as a multiple instance learning (MIL) problem. However, we observe that the results of MIL-based detectors are unstable, i.e., the most confident bounding boxes change significantly when using different initializations. We quantitatively demonstrate the instability by introducing a metric to measure it, and empirically analyze the reasons for the instability. Although the instability seems harmful for the detection task, we argue that it can be utilized to improve the performance by fusing the results of differently initialized detectors. To implement this idea, we propose an end-to-end framework with multiple detection branches, and introduce a simple fusion strategy. We further propose an orthogonal initialization method to increase the difference between detection branches. By utilizing the instability, we achieve...
Figures
None.
Tweets
BrundageBot: Utilizing the Instability in Weakly Supervised Object Detection. Yan Gao, Boxiao Liu, Nan Guo, Xiaochun Ye, Fang Wan, Haihang You, and Dongrui Fan https://t.co/CeDKTHRVve
arxiv_cscv: Utilizing the Instability in Weakly Supervised Object Detection https://t.co/CC3JOmBANR
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 7
Total Words: 0
Unique Words: 0

2.061 Mikeys
#8. Semantics to Space(S2S): Embedding semantics into spatial space for zero-shot verb-object query inferencing
Sungmin Eum, Heesung Kwon
We present a novel deep zero-shot learning (ZSL) model for inferencing human-object interactions with verb-object (VO) queries. While the previous ZSL approaches only feed the semantic/textual information into the query stream, we seek to incorporate and embed the semantics into the visual representation stream as well. Our approach is powered by the Semantics-to-Space (S2S) architecture, where semantics derived from the residing objects are embedded into a spatial space. This architecture allows the co-capturing of the semantic attributes of the human and the objects along with their location/size/silhouette information. As this is the first attempt to address zero-shot human-object-interaction inferencing with VO queries, we have constructed a new dataset, Verb-Transferability 60 (VT60). VT60 provides 60 different VO pairs with overlapping verbs tailored for testing ZSL approaches with VO queries. Experimental evaluations show that our approach not only outperforms the state-of-the-art, but also shows the capability of...
Figures
None.
Tweets
BrundageBot: Semantics to Space(S2S): Embedding semantics into spatial space for zero-shot verb-object query inferencing. Sungmin Eum and Heesung Kwon https://t.co/H7dbeBuxo9
arxiv_cscv: Semantics to Space(S2S): Embedding semantics into spatial space for zero-shot verb-object query inferencing https://t.co/gssktKnZTJ
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 0
Unique Words: 0

2.058 Mikeys
#9. Universal Barcode Detector via Semantic Segmentation
Andrey Zharkov, Ivan Zagaynov
Figures
None.
Tweets
arxivml: "Universal Barcode Detector via Semantic Segmentation", Andrey Zharkov, Ivan Zagaynov https://t.co/O5L1bnGU57
Memoirs: Universal Barcode Detector via Semantic Segmentation. https://t.co/IeXpA9WQpu
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 0
Unique Words: 0

2.057 Mikeys
#10. MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation
Lorenzo Bertoni, Sven Kreiss, Alexandre Alahi
We tackle the fundamentally ill-posed problem of 3D human localization from monocular RGB images. Driven by the limitation of neural networks outputting point estimates, we address the ambiguity in the task with a new neural network predicting confidence intervals through a loss function based on the Laplace distribution. Our architecture is a light-weight feed-forward neural network which predicts the 3D coordinates given 2D human pose. The design is particularly well suited for small training data and cross-dataset generalization. Our experiments show that we (i) outperform state-of-the-art results on the KITTI and nuScenes datasets, (ii) even outperform stereo for far-away pedestrians, and (iii) estimate meaningful confidence intervals. We further share insights on our model of uncertainty in case of limited observation and out-of-distribution samples.
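As a rough illustration of how a Laplace-based loss can yield confidence intervals, the sketch below uses the standard negative log-likelihood of a Laplace distribution with location `mu` and scale `b`. It is not taken from the monoloco repository; the function names and the choice of a central interval are assumptions, and the paper's exact loss may differ.

```python
import math

def laplace_nll(target, mu, b):
    # Negative log-likelihood of Laplace(mu, b): |target - mu| / b + log(2b).
    # A network trained on this predicts both mu and the spread b.
    if b <= 0:
        raise ValueError("scale b must be positive")
    return abs(target - mu) / b + math.log(2.0 * b)

def laplace_interval(mu, b, confidence=0.95):
    # Central interval from P(|X - mu| <= t) = 1 - exp(-t / b),
    # solved for t at the requested confidence level.
    half_width = -b * math.log(1.0 - confidence)
    return mu - half_width, mu + half_width
```

A larger predicted `b` lowers the penalty for a large error but pays the `log(2b)` term, so the spread is pushed to track the actual error, which is what makes the resulting interval informative.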
Github
Repository: monoloco
User: vita-epfl
Language: Python
Stargazers: 1
Subscribers: 5
Forks: 0
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 6457
Unique Words: 2084

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 143,632 papers.
