Adam

Empirical Software Evaluation

by Adam on January 14, 2010

in Software Engineering

The empirical context of Software Evaluation refers to evaluation based on “information gained by means of observation, experience, or experiment.” It is concerned with rigorous, scientific investigation and evaluation, in contrast to anecdotal evidence and “rules of thumb”.

The following questions are related to the empirical context.

Question 2.1

For her final year project, Samantha has been developing a layout tool that will help ‘improve’ the appearance of web pages by applying a set of rules to help position the contents of a page. To evaluate this, she has produced a set of ‘test pages’, some of them in ‘raw’ form and some have been produced from her layout tool. She plans to ask a number of other students to view these pages, and to rank each one for clarity using a scale of 1 – 10.

What is the independent variable?
In an experiment the independent variable is associated with cause. It is the variable that can be changed by the investigator in order to change the outcome of the experiment. The value of an independent variable should not be affected by other variables.

In Samantha’s investigation, the independent variable is the layout applied to each test page: a page is shown either in ‘raw’ form or as produced by the layout algorithm in her tool.

What is the dependent variable?
The dependent variable is associated with effect. It is the outcome that the investigator measures, and it changes indirectly in response to the changes the investigator makes to the independent variable.

In Samantha’s investigation, the dependent variable is the clarity ranking (on the scale of 1 to 10) that the students give each page. This measure of “improvedness” is expected to change with the independent variable, the layout applied to the page.
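
As a rough illustration, the rankings from a study like this could be tabulated by recording, for each viewing, which form of the page was shown and the clarity score given. The sketch below treats the page condition as the manipulated factor and the 1 to 10 score as the measured outcome. The participant names, page names, scores and the `mean_score_by_condition` helper are all invented for illustration and are not Samantha's data or method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ratings, invented purely for illustration:
# (participant, page, condition, clarity score on the 1-10 scale)
ratings = [
    ("p1", "page_a", "raw", 4),
    ("p1", "page_b", "tool", 7),
    ("p2", "page_a", "tool", 6),
    ("p2", "page_b", "raw", 5),
    ("p3", "page_a", "raw", 3),
    ("p3", "page_b", "tool", 8),
]


def mean_score_by_condition(rows):
    """Group clarity scores by page condition and return the mean of each group."""
    scores = defaultdict(list)
    for _participant, _page, condition, score in rows:
        scores[condition].append(score)
    return {condition: mean(values) for condition, values in scores.items()}


summary = mean_score_by_condition(ratings)
print(summary)
```

A difference between the two means would only be a starting point. Whether it reflects a real effect depends on the confounding factors and validity threats discussed in the rest of this post.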

What confounding factors might affect her evaluation process?
A confounding factor is an undesirable element in a study whose effect makes it impossible to distinguish between two or more possible causes of an outcome. For example, is it Test Driven Development that produces less buggy software, or is it that more experienced developers naturally produce less buggy software? Skill level and ability are common confounding factors in experimental studies.

In Samantha’s experiment there are a number of possible confounding factors:

  • Screen quality may affect the perceived clarity of the web pages if they are viewed on a low quality screen.
  • The student’s current default browser is likely to affect how much they feel her tool has improved the page layout. Someone using IE5 is likely to be more impressed by the tool than someone using FF3.
  • The student’s experience and level of internet usage may also affect how they rate the tool: more advanced users may not feel the benefit, or conversely may appreciate it more.


What factors might influence the external validity of her results?
There are many different threats to validity. These are factors affecting a study that may undermine the validity of the conclusions drawn from it.

Internal validity is concerned with factors that may affect the outcomes of the experiment without the investigator’s knowledge, and so put the causal relationship between the treatment and the outcome in question. Examples are participants not following the prescribed activities correctly, or the membership of groups (control and non-control) being unbalanced, even when selected at random.
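
One common way to reduce the risk of unbalanced groups is blocked (stratified) random assignment: participants are first grouped by a characteristic thought to matter, such as experience level, and then split at random within each group. The sketch below is a minimal illustration under that assumption. The `blocked_assignment` function, the participant names and the experience levels are hypothetical.

```python
import random
from collections import defaultdict


def blocked_assignment(participants, seed=None):
    """Split participants into control and treatment groups.

    participants: list of (name, experience_level) pairs.
    Each experience level is shuffled and then divided as evenly as
    possible between the two groups.
    """
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for name, level in participants:
        by_level[level].append(name)

    groups = {"control": [], "treatment": []}
    for level in sorted(by_level):
        names = by_level[level]
        rng.shuffle(names)  # random order within the block
        half = len(names) // 2
        groups["control"].extend(names[:half])
        groups["treatment"].extend(names[half:])
    return groups


# Hypothetical participants, invented for illustration.
participants = [
    ("ann", "novice"), ("bob", "novice"), ("eve", "novice"), ("fay", "novice"),
    ("cat", "expert"), ("dan", "expert"),
]
groups = blocked_assignment(participants, seed=1)
print(groups)
```

Blocking only balances the characteristics that are recorded and used to form the blocks. Unmeasured differences between participants can still leave the groups unbalanced.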

External validity is concerned with how generalisable the results may be to the wider population. The concern is that the results are only applicable in the context of the investigation, e.g. if students are used in the study then it may not be possible to extrapolate the results to the whole population.

Construct validity is concerned with how well the outcomes of the study are linked to the concepts or theory behind the study, for example, comparison between methods to determine which is “better” without a clear definition of “better”.

In Samantha’s experiment, the main factor likely to affect the external validity of her results is the group of students she selects to view the pages. It may not be possible to generalise from a group of intelligent, computer-literate students to the wider population.

Question 2.2

Ethics Committees are generally concerned with the question of whether taking part in a particular study might affect the well-being of a participant, or influence their final grades. An example would be where taking part in a study gives a student extra practice using a technique that will later appear as part of an exam question.

How might Software Engineering studies have an adverse effect on participants?

There’s a slim possibility that Software Engineering studies may have adverse physical effects on participants. If the participants were exposed to flashing or flickering displays, they may be at risk of epileptic seizures. There may be other medical conditions that could be triggered by experiments and studies.

Alternatively, an individual’s privacy or anonymity may be put at risk by taking part in a study. Preserving privacy also helps to ensure fair results, since people may change their habits or be less than truthful if their responses are not anonymous.

Question 2.3

Tim has been developing a new mail client utility using predictive text as part of his final year project, where this has different interactive options for accepting/rejecting the predictions. He has arranged for ten other students to use this as their mailer for a week, in order to provide him with feedback via a set of questionnaires. However, his software also covertly collects data about keystrokes to aid his analysis, and does so in such a way that it is potentially possible to reconstruct the individual mail messages. Tim does not mention this feature to the participants as he does not want them to restrict the use of his software – and of course, he has no intention of reconstructing any messages.

What is the ethical position that Tim is in?

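Tim is in a complex ethical position. Although he has no intention of reconstructing the messages, covert data collection without the participants’ informed consent is generally considered unethical, particularly when the data could reveal the content of private mail.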

Ideally he should alter his tool so that the data collection is anonymised (if that’s possible); or he should forgo collecting the keystroke information; or if he absolutely has to, and can’t anonymise it, he should disclose the data collection to the participants.
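
To give a sense of what a less intrusive form of data collection might look like, the sketch below records only the kind of interaction (whether a prediction was accepted, rejected or ignored) against a salted hash of the participant identifier, and never stores the characters typed. This is an assumption about what Tim's analysis needs, not a description of his tool. The names `STUDY_SALT`, `pseudonym`, `log_event` and the event types are hypothetical.

```python
import hashlib
import secrets

# A random salt generated once per study (hypothetical). Keeping it separate
# from the log makes it harder to link pseudonyms back to participants.
STUDY_SALT = secrets.token_bytes(16)

# Only these interaction types are recorded; no message text is ever logged.
ALLOWED_EVENTS = {"prediction_accepted", "prediction_rejected", "prediction_ignored"}


def pseudonym(participant_id, salt=STUDY_SALT):
    """Replace a participant identifier with the first 12 hex digits of a salted SHA-256 hash."""
    digest = hashlib.sha256(salt + participant_id.encode("utf-8")).hexdigest()
    return digest[:12]


def log_event(log, participant_id, event_type):
    """Append a record of the interaction type under the participant's pseudonym."""
    if event_type not in ALLOWED_EVENTS:
        raise ValueError(f"unexpected event type: {event_type}")
    log.append({"participant": pseudonym(participant_id), "event": event_type})


log = []
log_event(log, "student-07", "prediction_accepted")
log_event(log, "student-07", "prediction_rejected")
print(log)
```

Note that this is pseudonymisation rather than full anonymisation: anyone holding the salt and the list of participants could recompute the mapping, so it reduces the risk rather than removing it.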
