Monday, July 08, 2013

Who is the Biggest Loser Inducer


                So, I am going to talk about The Biggest Loser now (I really need to learn how to write intros). On The Biggest Loser, contestants work under the direction of a trainer, trying to lose as much weight as possible. In the first couple of seasons the trainers are Jillian Michaels and Bob Harper. I was curious as to which of them was more effective at getting people to lose weight, so I turned to my old friend regression analysis. If you are unfamiliar with regression analysis, I explained it in a previous post and, of course, there is a Wikipedia entry, though it does get a bit complicated.
                Caveats and Addendums: Once again I would like to remind you I am an amateur. Do not take anything I say too seriously. If you manage to get on the Biggest Loser, I claim no responsibility for any lost prize money, though if you want to share any prize money I will not object. In fact this whole thing was probably done wrong (see the end of this post for why).
                I know that for the first two seasons Bob and Jillian are the only two trainers and that the contestants stay with their trainer throughout the entire season. I was going to use data from the first two seasons, but this whole thing took a lot longer than I expected (you’d think I would have learned to expect that by now). I mainly chose to do this because I thought it would be easy. So basically I am lazy and only used data from the first season, which I got from Wikipedia. Have I mentioned regression analysis is hard?
                As I explained in my first post, a regression has a dependent variable and independent variables that are supposed to explain it. In this case I chose the weight lost each week as the dependent variable. I used height, trainer, gender, and weight the previous week as the independent variables. If you read my previous post you might be wondering how I can include the trainer as a variable. Quite simply, actually: for a categorical variable such as this you create a binary variable, meaning it is either a zero or a one. In this case I used Jillian as the variable, so if a person had Jillian their Jillian variable would be a one, and if they had Bob it would be a zero. Then the coefficient represents the change when the variable is “true”, i.e. how many more/fewer pounds one would expect to lose with Jillian as the trainer.
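                To make the zero/one idea concrete, here is a minimal sketch in Python. I did the actual work in PSPP, so this is only an illustration, and the rows below are made up (the names, trainers, and genders are hypothetical, not the real season one data).

```python
# Hypothetical rows for illustration only; not the real season one data.
contestants = [
    {"name": "Contestant A", "trainer": "Jillian", "gender": "F"},
    {"name": "Contestant B", "trainer": "Bob", "gender": "M"},
    {"name": "Contestant C", "trainer": "Jillian", "gender": "M"},
]


def encode(row):
    """Turn the categorical fields into 0/1 variables."""
    return {
        "jillian": 1 if row["trainer"] == "Jillian" else 0,  # Bob is the zero case
        "male": 1 if row["gender"] == "M" else 0,            # female is the zero case
    }


encoded = [encode(row) for row in contestants]
for row, flags in zip(contestants, encoded):
    print(row["name"], flags)
```

Whichever category gets the zero (Bob here) becomes the baseline, so the Jillian coefficient is read as "compared to Bob".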
                I was surprised to see the R squared was only .04 and the adjusted R squared was actually negative. R squared is a statistic that shows how well the regression equation predicts the dependent variable: it is the share of the variation in the dependent variable that the equation explains. So a .04 means the equation explains only about 4% of the variation in weekly weight loss, barely better than just guessing the average every time. Basically all my hard work was totally useless.
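                If you are wondering how an adjusted R squared can go negative, here is a rough sketch of the standard textbook formulas in Python. The sample size of 50 is an assumption I picked purely for illustration (it is not the real number of observations); the 4 is the number of independent variables listed above.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` explained by `predicted`."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # leftover error
    ss_tot = sum((a - mean) ** 2 for a in actual)                  # total variation
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Penalize R squared for using k predictors on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Guessing the average every time explains nothing.
print(r_squared([3, 5, 7, 9], [6, 6, 6, 6]))

# Hypothetical n = 50 with k = 4 predictors and an R squared of .04.
print(adjusted_r_squared(0.04, n=50, k=4))
```

With an R squared that low, the penalty for the extra variables is bigger than what they explain, which is why the adjusted figure dips below zero.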
                I decided that what week it was might be an important factor as well, e.g. I have heard people usually do worse on the second week. I can’t just throw in what week it is as a variable since, if you recall, the variable is multiplied by a coefficient in the regression equation. What would it mean if the coefficient were .8 and the weeks go from 1-10? It would mean every passing week changes the expected weight loss by the same .8 pounds, which could not capture something like a bad second week. So, like with the trainer, I created ten binary variables, one for each week. When I type out the equation I got, I won’t include these since they aren't what I am testing for and that would be a lot to type out. Now the adjusted R squared is .12, meaning it explains roughly 12% of the variance in weight lost. This still isn't great, but it is better than it may seem. R squared values are often low, and we aren't interested in making predictions anyway. Anyhow, here are the results:
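                For the curious, the week variables can be sketched in Python like this. The function name and the `drop_first` option are my own for illustration. One thing worth knowing: when a regression has an intercept, one week is normally left out to serve as the baseline, otherwise the week variables always add up to one and duplicate the intercept. I am not certain how PSPP handled that in my run, so treat the option as an assumption.

```python
def week_dummies(week, n_weeks=10, drop_first=False):
    """Return one 0/1 variable per week; only the matching week is set to 1.

    With drop_first=True, week 1 is omitted and acts as the baseline.
    """
    start = 2 if drop_first else 1
    return {f"week_{w}": 1 if w == week else 0 for w in range(start, n_weeks + 1)}


# An observation from week 2: only week_2 is switched on.
print(week_dummies(2))

# The same idea with week 1 as the baseline: a week 1 row is all zeros.
print(week_dummies(1, drop_first=True))
```

Each week then gets its own coefficient, so a bad second week can show up without forcing the other weeks to follow a straight line.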
Change in weight = -6.23 - 1.54(Jillian) - 1.26(Male) - .25(inches)
                Apparently Jillian is the better trainer, at least according to my results. Holding the other variables fixed, we would expect her to get people to lose about 1.5 more pounds per week than Bob. That is too bad since I like Bob more, but one should not be trying to reach a certain result.
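                Here is a small Python sketch of how to read that equation. It only uses the coefficients I typed out above, so the week variables (and anything else I did not list) are missing, which means the totals it prints are not real predictions. Only the gap between the two trainers is meaningful. The 66 inch contestant is hypothetical.

```python
def partial_change(jillian, male, inches):
    """Evaluate only the reported part of the equation (week terms omitted)."""
    return -6.23 - 1.54 * jillian - 1.26 * male - 0.25 * inches


# Same hypothetical contestant (female, 66 inches), different trainer.
with_jillian = partial_change(jillian=1, male=0, inches=66)
with_bob = partial_change(jillian=0, male=0, inches=66)

# Negative change means weight lost, so a more negative number is a bigger loss.
gap = with_jillian - with_bob
print(round(gap, 2))
```

Everything except the Jillian term cancels out in the subtraction, which is exactly what "holding the other variables fixed" means.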

Oh, one further addendum: I realized while typing up this post that there is a time component to this data, which changes things for regression. I don’t actually know how to do a regression with a time component, so my whole method for trying to solve this is probably not even legitimate. I have already spent way too much time on this, so I am not going to try to figure it out now, but perhaps sometime (relatively) soon I will try to do one with time correctly. Actually, now that I think about it, creating variables for the weeks might resolve that issue. I will update if I figure it out.

Side Note #1: Thanks for visiting my blog.

Side Note #2: I was hoping to have this post be on a website I have bought, but I have not set it up yet. And after dealing with issues with PSPP, the program I use, I do not feel like trying to figure it out. I suppose that is why I do these things, to learn how to deal with all the difficulties that inevitably arise. Anyway I will probably be launching the website sometime (relatively) soon.

Side Note #3: I consider this whole blog/future website thing to be a very long term project, so don’t be surprised if the quality and frequency of posts aren't always consistent.

Side Note #4: I try to strike a fine balance of not being too academicky, being informative, somewhat random, yet coherent, accessible and accurate. If you think I am failing at any or all of these or have any suggestions please let me know. Constructive criticism is always appreciated. The other kind, not so much.

Side Note #5: Puppies!



(Credit: Flickr user Kiwi NZ)