A Programmable Approach to Model Compression
Deep neural networks frequently contain far more weights, represented at higher precision, than are required for the specific task they are trained to perform. Consequently, they can often be compressed using techniques such as weight pruning and quantization that reduce both model size and inference time without appreciable loss in accuracy. Compressing models before they are deployed can therefore result in significantly more efficient systems. However, while the results are desirable, finding the best compression strategy for a given neural network, target platform, and optimization objective often requires extensive experimentation. Moreover, finding optimal hyperparameters for a given compression strategy typically results in even more expensive, frequently manual, trial-and-error exploration. In this paper, we introduce a programmable system for model compression called Condensa. Users programmatically compose simple operators, in Python, to build complex compression strategies. Given a strategy and a user-provided objective, such as minimization of running time, Condensa uses a novel sample-efficient constrained Bayesian optimization algorithm to automatically infer desirable sparsity ratios. Our experiments on three real-world image classification and language modeling tasks demonstrate memory footprint reductions of up to 65x and runtime throughput improvements of up to 2.22x using at most 10 samples per search. We have released a reference implementation of Condensa at https://github.com/NVlabs/condensa.
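The abstract does not show Condensa's actual operator API; the following sketch, with hypothetical function names in plain Python, illustrates the underlying idea of composing simple pruning and quantization operators into a single compression strategy:

```python
def prune(sparsity):
    """Operator: zero out the fraction `sparsity` of smallest-magnitude weights."""
    def op(weights):
        k = int(sparsity * len(weights))
        if k == 0:
            return list(weights)
        # Magnitude threshold below (or at) which weights are removed.
        thresh = sorted(abs(w) for w in weights)[k - 1]
        return [0.0 if abs(w) <= thresh else w for w in weights]
    return op

def quantize(bits):
    """Operator: uniform symmetric quantization to 2**(bits-1)-1 positive levels."""
    def op(weights):
        levels = 2 ** (bits - 1) - 1
        scale = max((abs(w) for w in weights), default=0.0) / levels
        if scale == 0:
            return list(weights)
        return [round(w / scale) * scale for w in weights]
    return op

def compose(*ops):
    """Chain operators left to right into one compression strategy."""
    def strategy(weights):
        for op in ops:
            weights = op(weights)
        return weights
    return strategy

# Example strategy: prune 50% of weights, then quantize the survivors to 8 bits.
strategy = compose(prune(0.5), quantize(8))
weights = [0.1, -0.02, 0.7, 0.05, -0.4, 0.9]
compressed = strategy(weights)
```

In Condensa itself the operators act on network layers rather than flat weight lists, and the sparsity ratio passed to pruning is the quantity its Bayesian optimizer searches over; this sketch only shows why operator composition makes complex strategies easy to express.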
Authors

Vinu Joseph
Saurav Muralidharan
Animesh Garg
Michael Garland
Ganesh Gopalakrishnan
Github
Repository: Programmable Model Compression (Python)
Stargazers: 44
Forks: 2
Open Issues: 1
Network: 2
Subscribers: 17
Other
Inserted: 11/06/19 06:01PM
Words Total: 9,530
Words Unique: 2,910