Detecting heterogeneity in generalized linear modeling



Student

Hernández Potiomkin, Yaroslav

Document type

Master thesis

Date

2017

Rights

Open Access

Publisher

Universitat Politècnica de Catalunya



Abstract

In classical model-fitting techniques, such as traditional Multiple Linear Regression (MLR) or Generalized Linear Models (GLM), the individuals are assumed to come from a homogeneous population. However, this condition is not necessarily met: many factors may influence the behaviour of the individuals and thereby bias the model estimates. For instance, suppose we want to study salaries among a set of individuals from a relatively well-defined professional sector. A first approach would be to collect all available modeling variables and fit a single model. However, this may lead to inaccurate estimates, since salaries can be driven differently according to gender, region, ethnicity and other factors. Such variables are called segmentation variables, and their number can grow very quickly. This gives rise to a combinatorial problem, since there are many possible ways of grouping the individuals. Our main goal in this work is to study this kind of problem in depth and to present an automatic solution for detecting homogeneous segments within a heterogeneous population in the GLM context. The PATHMOX methodology, proposed by Gastón (2009) [19], is a powerful method for automating the search for segments. The statistical tests needed to guide the PATHMOX algorithm and to discover the constructs that differentiate those segments were proposed by Lamberti (2015) [8]. First, in section 2, we present several approaches to detecting heterogeneity: using moderating variables, as in Analysis of Covariance, or comparing coefficients with parametric or non-parametric tests. In section 4, we also present a method for characterizing classes or a continuous response using only the segmentation variables. We then focus on the Generalized Linear Modeling context to define the automatic heterogeneity detection method.
We then present in detail all the hypothesis testing procedures required, in section 3. Finally, we carry out a fairly extensive simulation study and a real-data application in sections 6 and 7, respectively.
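To make the idea of "detecting segments with a test" concrete, the sketch below searches every binary split of each segmentation variable and scores it with a Chow-type F statistic, which compares one pooled regression against separate regressions in the two candidate segments. This is a deliberately simplified illustration and not the PATHMOX implementation of Gastón (2009) nor the GLM tests of Lamberti (2015). It assumes a Gaussian linear model (the identity-link special case of a GLM) with a single covariate, computes no p-values (the statistic would be compared with an F(p, n - 2p) distribution), and grows only one level of the tree. The function names `fit_line`, `chow_f`, `binary_splits` and `best_split`, as well as the salary data, are hypothetical.

```python
from itertools import combinations


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, rss)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, rss


def chow_f(xs, ys, in_left):
    """Chow-type F statistic for H0: both segments share intercept and slope."""
    p = 2  # parameters per segment: intercept and slope
    left = [(x, y) for x, y, g in zip(xs, ys, in_left) if g]
    right = [(x, y) for x, y, g in zip(xs, ys, in_left) if not g]
    if len(left) <= p or len(right) <= p:
        return None  # too few observations to fit both segments
    _, _, rss_pooled = fit_line(xs, ys)
    _, _, rss_left = fit_line(*zip(*left))
    _, _, rss_right = fit_line(*zip(*right))
    rss_split = rss_left + rss_right
    if rss_split == 0:
        return float("inf")  # segments fit perfectly; pooling cannot do better
    df_resid = len(xs) - 2 * p
    return ((rss_pooled - rss_split) / p) / (rss_split / df_resid)


def binary_splits(values):
    """All ways to cut a nominal variable's categories into two non-empty groups."""
    cats = sorted(set(values))
    first, rest = cats[0], cats[1:]
    # Keeping the first category on the left lists each split exactly once.
    for r in range(len(rest)):
        for combo in combinations(rest, r):
            yield {first, *combo}


def best_split(xs, ys, segvars):
    """Return (F, variable name, left categories) for the strongest split."""
    best = None
    for name, values in segvars.items():
        for left in binary_splits(values):
            f = chow_f(xs, ys, [v in left for v in values])
            if f is not None and (best is None or f > best[0]):
                best = (f, name, left)
    return best


# Hypothetical data: salary grows faster with experience for one gender.
experience = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
salary = [23.1, 25.8, 29.1, 32.0, 34.9, 38.1,
          20.9, 22.2, 23.0, 23.9, 25.1, 25.9]
segvars = {
    "gender": ["M"] * 6 + ["F"] * 6,
    "region": ["N", "S", "E"] * 4,
}
best = best_split(experience, salary, segvars)
print(best[1], sorted(best[2]))
```

Running it selects `gender` with the left segment `['F']`, because separating the two genders removes almost all of the residual variation, whereas every region split mixes both salary profiles. A segmentation tree in the spirit described above would apply the same search recursively inside each new segment and stop when no split is significant; with a non-Gaussian GLM, the F statistic would be replaced by a test suited to that model.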