In fields across science and engineering, we are increasingly faced with problems where the number of variables or features we need to estimate is much larger than the number of observations. Under such high-dimensional scaling, for any hope of statistically consistent estimation, it becomes vital to leverage any potential structure in the problem such as sparsity, low-rank structure or block sparsity. However, data may deviate significantly from any one such statistical model. The motivation of this thesis is: can we simultaneously leverage more than one such statistical structural model, to obtain consistency in a larger number of problems, and with fewer samples, than can be obtained by single models? Our approach involves combining via simple linear superposition, a technique we term dirty models. The idea is very simple: while any one structure might not capture the data, a superposition of structural classes might. Dirty models thus searches for a parameter that can be decomposed into a number of simpler structures such as (a) sparse plus block-sparse, (b) sparse plus low-rank and (c) low-rank plus block-sparse. In this thesis, we propose dirty model based algorithms for different problems such as multi-task learning, graph clustering and time-series analysis with latent factors. We analyze these algorithms in terms of the number of observations we need to estimate the variables. These algorithms are based on convex optimization and sometimes they are relatively slow. We provide a class of low-complexity greedy algorithms that not only can solve these optimizations faster, but also guarantee the solution. Other than theoretical results, in each case, we provide experimental results to illustrate the power of dirty models.