A set of 116 structurally highly diverse compounds, mainly drugs, was characterized by 1630 molecular descriptors. The biological property modelled in this study was the transdermal permeability coefficient, log Kp. The main objective was to find a limited set of suitable model compounds for skin penetration studies. The classification and regression trees (CART) approach was applied, and the resulting groups were discussed in terms of their role as possible model compounds and the descriptors that determine them. A second objective was to model transdermal penetration as a function of selected descriptors in quantitative structure-property relationships (QSPR), using a boosted CART (BRT) approach and multiple linear regression (MLR) analysis, in which the regression models were obtained by stepwise selection of the best descriptors. Evaluation of the standard statistical regression quality attributes, as well as those that depend on the number of descriptors, yielded an MLR model with at most 10 descriptors. To assess their predictive power, the CART and MLR models were subjected to external validation with a test set of 12 compounds that were not included in the learning set of 104 compounds.
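The stepwise MLR workflow with external validation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic (random numbers standing in for descriptors and log Kp), only 6 hypothetical descriptors are used instead of 1630, and the selection criterion (largest training R² at each step) is an assumption, since the abstract does not state which criterion was used. Only the 104/12 learning/test split is taken from the text. All function names (`fit_ols`, `forward_stepwise`, and so on) are illustrative.

```python
import random

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(coef, X):
    """Apply a fitted model (intercept first) to rows of descriptor values."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in X]

def r_squared(y, y_hat):
    """Coefficient of determination of predictions y_hat against y."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def columns(X, idx):
    """Keep only the descriptor columns listed in idx."""
    return [[r[j] for j in idx] for r in X]

def forward_stepwise(X, y, max_terms):
    """Greedy forward selection; assumed criterion: highest training R^2."""
    selected, remaining = [], list(range(len(X[0])))
    while remaining and len(selected) < max_terms:
        def score(j):
            cols = columns(X, selected + [j])
            return r_squared(y, predict(fit_ols(cols, y), cols))
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic stand-in data: 116 "compounds", 6 hypothetical descriptors.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(116)]
# The synthetic response depends on descriptors 1 and 4 only (an assumption).
y = [-2.5 + 0.8 * r[1] - 0.5 * r[4] + rng.gauss(0, 0.1) for r in X]

# 104 learning compounds, 12 external test compounds, as in the text.
X_train, y_train = X[:104], y[:104]
X_test, y_test = X[104:], y[104:]

selected = forward_stepwise(X_train, y_train, max_terms=2)
coef = fit_ols(columns(X_train, selected), y_train)
r2_external = r_squared(y_test, predict(coef, columns(X_test, selected)))
print("selected descriptors:", sorted(selected))
print("external R^2:", round(r2_external, 3))
```

The key design point is that the 12 test compounds play no part in descriptor selection or coefficient fitting, so the external R² estimates predictive power rather than goodness of fit. In practice, the number of retained descriptors would be chosen with criteria that penalize model size, as the abstract indicates, rather than fixed in advance as in this sketch.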