So, I am going to talk about the Biggest Loser now (I really need to learn how to
write intros). In the Biggest Loser contestants work under the direction of a
trainer trying to lose as much weight as possible. In the first couple seasons
the trainers are Jillian Michaels and Bob Harper. I was curious as to which of
them was more effective at getting people to lose weight, so I turned to my old
friend regression analysis. If you are unfamiliar with regression analysis I
explained it in a previous post and, of course, there is a Wikipedia entry,
though it does get a bit complicated.
Caveats and Addendums: Once again I would like to remind you I am an amateur. Do not
take anything I say too seriously. If you manage to get on the Biggest Loser, I
claim no responsibility for any lost prize money, though if you want to share
any prize money I will not object. In fact this whole thing was probably done
wrong (see the end of this post for why).
I know that for the first two seasons Bob and Jillian are the only two trainers
and that the contestants stay with their trainer throughout the entire season.
I was going to use data from the first two seasons, but this whole thing took a
lot longer than I expected (you’d think I would have learned to expect that by
now). I mainly chose to do this because I thought it would be easy. So
basically I am lazy and only used data from the first season, which I got from Wikipedia. Have I mentioned regression analysis is hard?
As I explained in my first post, a regression has a dependent variable and
independent variables that are supposed to explain it. In this case I chose
the weight lost each week as the dependent variable. I used height,
trainer, gender, and the previous week's weight as the independent variables. If
you read my previous post you might be wondering how I can include the trainer
as a variable. Quite simply, actually: for categorical variables such as this you
create a binary variable, meaning it is either a zero or a one. In this
case I used Jillian as the variable, so if a person had Jillian their Jillian variable
would be a one, and if they had Bob it would be a zero. Then the coefficient
represents the change when the variable is “true”, i.e. how many
more or fewer pounds one would expect to lose with Jillian as the trainer.
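Here is a minimal sketch of that zero/one coding in Python. The contestant labels and the helper name `encode_trainer` are made up for illustration; this is not the actual data or the steps I ran in PSPP.

```python
# Hypothetical contestants; the labels and trainer assignments are invented.
contestants = [
    {"name": "A", "trainer": "Jillian"},
    {"name": "B", "trainer": "Bob"},
    {"name": "C", "trainer": "Jillian"},
]


def encode_trainer(trainer):
    """Return 1 if the trainer is Jillian and 0 if the trainer is Bob."""
    if trainer not in ("Jillian", "Bob"):
        raise ValueError(f"unexpected trainer: {trainer!r}")
    return 1 if trainer == "Jillian" else 0


# One binary value per contestant, in the same order as the list above.
jillian = [encode_trainer(c["trainer"]) for c in contestants]
print(jillian)  # [1, 0, 1]
```

Bob never gets his own column. He is the baseline, and the Jillian coefficient is measured relative to him.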
I was surprised to see the R squared was only
.04 and the adjusted R squared was actually negative. R squared is a statistic
that shows how well the regression equation fits the dependent variable. A .04
means the equation explains only about 4% of the variance in weight lost, which
is barely better than just guessing the average every time. Basically all my
hard work was totally useless.
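For the curious, here is a rough sketch of how R squared and adjusted R squared are computed, using the standard textbook formulas. The sample size of 60 and the count of 4 predictors in the example are assumptions picked only to show how a small R squared can turn into a negative adjusted R squared; they are not the real numbers from my data.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` that the predictions account for."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # leftover error
    ss_tot = sum((a - mean) ** 2 for a in actual)  # error from guessing the mean
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalize R squared for the number of predictors used."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Guessing the mean for every observation gives an R squared of zero.
print(r_squared([1, 2, 3], [2, 2, 2]))

# Assumed, illustrative sizes: 60 observations and 4 predictors.
print(adjusted_r_squared(0.04, 60, 4) < 0)  # True
```

The penalty grows with every variable added, so a regression that barely beats the average can end up below zero once it is adjusted.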
I decided that what week it was might be an important factor as well, e.g. I have
heard people usually do worse on the second week. I can’t just throw in the
week number as a variable since, if you recall, the variable is multiplied by a coefficient
in the regression equation. What would it mean if the coefficient were .8 and
the weeks go from 1-10? It would mean every additional week changes the weight
lost by the same .8 pounds, which would not capture one particular week being
unusual. So like with the trainer I created ten binary variables,
one for each week. When I type out the equation I got I won’t include these
since they aren't what I am testing for and that would be a lot to type out.
Now the adjusted R squared is .12 meaning it explains 12% of the variance in
weight lost. This still isn't great, but it is better than it may seem. R
squareds are often low and we aren't interested in making predictions anyway.
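A small sketch of the week variables in Python, for illustration only. The function name `week_dummies` and its `drop_first` option are my own inventions. One thing the sketch assumes: when a regression has a constant term, one of the categories is normally left out as the baseline (or the software drops one), so by default it returns nine values rather than ten, with week 1 as the baseline.

```python
def week_dummies(week, n_weeks=10, drop_first=True):
    """Return zero/one indicators for `week`.

    With drop_first=True, week 1 is the baseline and gets no column,
    so the list covers weeks 2..n_weeks.
    """
    if not 1 <= week <= n_weeks:
        raise ValueError(f"week must be between 1 and {n_weeks}")
    start = 2 if drop_first else 1
    return [1 if week == w else 0 for w in range(start, n_weeks + 1)]


print(week_dummies(3))                    # [0, 1, 0, 0, 0, 0, 0, 0, 0]
print(week_dummies(1))                    # all zeros: week 1 is the baseline
print(len(week_dummies(3, drop_first=False)))  # 10
```

Each week gets its own coefficient this way, so a bad second week can show up without forcing every later week to follow the same pattern.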
Anyhow here are the results:
Change in weight = -6.23 - 1.54(Jillian) - 1.26(Male) - .25(inches)
Apparently Jillian is the better trainer according to my results. We would expect her to
get people to lose about 1.5 more pounds per week. That is too bad since I like
Bob more, but one should not be trying to reach a certain result.
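To see where the 1.5 pounds comes from, here is a small sketch that plugs numbers into the equation as I typed it out. Since the equation leaves out the week variables (and shows no term for the previous week's weight), the totals it produces are not real predictions. Only the difference between the two trainers is meaningful. The function name `partial_change` and the height of 66 inches are made up for the example.

```python
# Coefficients copied from the equation in this post.
INTERCEPT = -6.23
B_JILLIAN = -1.54
B_MALE = -1.26
B_INCHES = -0.25


def partial_change(jillian, male, inches):
    """Sum of the terms shown in the post (week terms are omitted)."""
    return INTERCEPT + B_JILLIAN * jillian + B_MALE * male + B_INCHES * inches


# Same hypothetical person (female, 66 inches) under each trainer.
with_jillian = partial_change(jillian=1, male=0, inches=66)
with_bob = partial_change(jillian=0, male=0, inches=66)

diff = with_jillian - with_bob
print(round(diff, 2))  # -1.54
```

Whatever height or gender you plug in, the gap stays at the Jillian coefficient, because everything else cancels out.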
Oh one further addendum: I realized
while typing up this post that there is a time component to this data which
changes things for regression. I don’t actually know how to do a regression
with a time component, so my whole method for trying to solve this is probably
totally not even legitimate. I have already spent way too much time on this, so
I am not going to try to figure it out now, but perhaps sometime (relatively)
soon I will try to do one with time correctly. Actually, now that I think about
it, I think creating variables for the weeks might resolve that issue. I will
update if I figure it out.
Side Note #1: Thanks for visiting
my blog.
Side Note #2: I was hoping to have
this post be on a website I have bought, but I have not set it up yet. And
after dealing with issues with PSPP, the program I use, I do not feel like
trying to figure it out. I suppose that is why I do these things, to learn how
to deal with all the difficulties that inevitably arise. Anyway I will probably
be launching the website sometime (relatively) soon.
Side Note #3: I consider this whole
blog/future website thing to be a very long term project, so don’t be surprised
if the quality and frequency of posts aren't always consistent.
Side Note #4: I try to strike a
fine balance of not being too academicky, being informative, somewhat random,
yet coherent, accessible and accurate. If you think I am failing at any or all
of these or have any suggestions please let me know. Constructive criticism is
always appreciated. The other kind, not so much.
Side Note #5: Puppies!
(Credit: Flickr user Kiwi NZ)