Systematic Review of the Evaluation of Foster Care Programs

[Revisión sistemática sobre evaluación de programas de acogimiento familiar]

Laura Vallejo-Slocker1, Nahia Idoiaga-Mondragon2, Inge Axpe2, Rosalind Willi3, Mercedes Guerra-Rodríguez4, Carme Montserrat5, and Jorge F. del Valle6

1SOS Children’s Villages, Madrid, Spain; 2University of the Basque Country (UPV/EHU), Leioa, Spain; 3SOS Children’s Villages International, Innsbruck, Austria; 4Spanish Society of Rheumatology, Madrid, Spain; 5University of Girona, Spain; 6University of Oviedo, Spain

Received 20 February 2023, Accepted 23 June 2023


Objective: The aim of this study was to conduct an exhaustive synthesis to determine which instruments and variables are most appropriate to evaluate foster care programs (foster, kinship, and professional families). This evaluation includes the children, their foster families, their families of origin, professionals, and foster care technicians. Method: The systematic review included randomized, quasi-randomized, longitudinal, and control group studies aimed at evaluating foster care interventions. Results: A total of 86 studies, 138 assessment instruments, 18 constructs, and 73 independent research teams were identified. Conclusions: (1) although the object of the evaluations was the children, the informants were usually the people in charge of their care; therefore, effort should be made to involve the children in a more participatory way; (2) psychosocial functioning, behavior, and parenting are transversal elements in most evaluations, while quality of life and coping are not sufficiently well incorporated; (3) practical instruments (brief and easy to apply and correct) that are widely used and carry scientific guarantees should be prioritized to ensure the comparability and reliability of the conclusions; and (4) progress should be made in the study of evaluation models for all forms of foster care, including foster, extended, and specialized families.


Objetivo: El objetivo es realizar una síntesis exhaustiva que contribuya a determinar qué instrumentos y variables son las más adecuadas para evaluar programas de acogimiento familiar (familias extensas, ajenas y profesionalizadas), incluyendo en esta evaluación a los niños, sus familias acogedoras, sus familias de origen y a los profesionales y técnicos del acogimiento familiar. Método: La revisión sistemática incluyó estudios aleatorizados, cuasialeatorizados, longitudinales y con grupo control dirigidos a evaluar intervenciones de acogimiento familiar. Resultados: Se identificaron 86 estudios, 138 instrumentos de evaluación, 18 constructos y 73 equipos de investigación independientes. Conclusiones: (1) aunque el objeto de las evaluaciones sean los niños, habitualmente los informantes son las personas a cargo de sus cuidados, con lo que se debe hacer un esfuerzo por involucrarlos de forma más participativa; (2) el funcionamiento psicosocial, el comportamiento o la parentalidad son elementos transversales en la mayor parte de evaluaciones, sin embargo la calidad de vida y el afrontamiento no están suficientemente bien incorporados; (3) deben priorizarse instrumentos prácticos (breves y fáciles de aplicar y corregir), de amplio uso y con garantías científicas para asegurar la comparabilidad y fiabilidad de las conclusiones; (4) debe avanzarse en la investigación de modelos de evaluación en todas las modalidades de acogimiento familiar, ya sea en familias ajenas, extensas o especializadas.


Keywords

Psychosocial evaluation, Program evaluation, Foster care, Foster child, Foster family

Palabras clave

Evaluación psicosocial, Evaluación de programas, Acogimiento familiar, Niños en acogimiento, Familias acogedoras

Across the European Union, it is estimated that there are 421,810 children in foster care (UNICEF & Eurochild, 2021). Many of these children have experienced problems or deficiencies in their families of origin with regard to their care that motivated their family separation to protect the children’s best interests and safety (Bald et al., 2022).

Because these are situations in which the child is in danger, it is necessary to implement protective measures such as foster care and to provide these children with a new family environment with greater security, protection, and stability (Bernedo et al., 2022). However, because of their situation before leaving their family of origin and the impact of having to leave or being removed from their family and forced to adapt to a new family, these children often have complex mental and physical health needs arising from experiences of abuse, trauma, and loss (Dickes, 2018; Font & Gershoff, 2020). These needs and conditions make them especially vulnerable and remain even into adulthood (Bald et al., 2022). The capacity of caregivers and, especially, the child protection system influence the quality, suitability, and effectiveness of foster care interventions (Dickes, 2018; Gale, 2019). Ultimately, these factors determine the success of protection measures in terms of stability, improvement in the child’s well-being, and increased chances of reintegration or self-sufficiency when leaving care. In contrast, they may lead to the interruption of the new placement in an alternative care setting.

The latter situation can cause significant additional damage and contribute to detrimental effects throughout life (e.g., Connell et al., 2006; Gypen et al., 2017; Oosterman et al., 2007). Children who have experienced changes from one alternative care modality to another often experience problems compared to their peers in key aspects of well-being throughout their lives (e.g., Pecora et al., 2005; Sacker et al., 2021), including their mental health (Engler et al., 2022). There are multiple reasons for these problems, including interruptions in schooling, physical and mental health difficulties, stigma, and negative experiences in the protection system (Harrison et al., 2022).

In this context, both researchers and professionals (Family for Every Child, 2015; Font & Gershoff, 2020; George et al., 2003) agree that it is of utmost importance to develop an integrated evaluation approach for foster care and that effective evaluations should focus on both child welfare outcomes and care processes. In other words,

[…] evaluation should be both systemic – assessing average system performance – and individual – assessing, for each child, whether the system is meeting their needs and, if not, what needs to change in their case plan or foster care environment (Font & Gershoff, 2020, p. 18).

In addition, evaluations should consider the voices of children and value their opinions and ideas in relation to their needs, strengths, and care, which can considerably improve protection services (Ager et al., 2012; Font & Gershoff, 2020; Gale, 2019; Gaskell, 2010; Randle, 2013). It is also important to consider the family of origin because it is essential to understand children’s situation and to prepare them for reunification with their biological parents (Lau et al., 2003).

Ideally, evaluations should also consider the protective factors, resilience, and strengths of children rather than focusing exclusively on their problems and deficiencies (Ager et al., 2012). Such a broad evaluation could contribute to reducing breakdown in foster care programs that assess children’s characteristics, the type of placement, families, and the interaction between these factors (Montserrat et al., 2020).

However, there is still no consensus on what outcomes and aspects of foster care to measure and how to do so in a robust way. This situation is further complicated by the different and complex stories of care, trajectories, care settings and caregivers, and other possible variables of the children in care, creating considerable methodological challenges to a rigorous evaluation (Ager et al., 2012; Dickes, 2018; Gale, 2019). There are also problems derived from limited internal and external validity and clinical heterogeneity (Dickes, 2018). Furthermore, many studies on outcomes lack a holistic view of children or adolescents and do not consider contextual circumstances and previous care settings (Gale, 2019). As a result, evidence on the effectiveness of interventions in foster care remains limited, and the results are rarely comparable.

Furthermore, the evaluation of these interventions is complex because of the multiplicity of factors and actors. Psychological factors include behavior, cognition (intelligence, language), emotion, and psychological functioning related to family and social contexts, whereas actors refer to the parents or caregivers, including foster professionals or technicians (psychologists, educators, social workers), who help implement programs to improve foster care. This area of assessment therefore involves many interdependences.

Our ultimate objective is to contribute to the development of a systematic and flexible evaluation methodology and tool to measure the effectiveness and quality of interventions in foster care. Specifically, we intend to identify which evaluation instruments are used to evaluate foster care programs, which variables or constructs are most frequently evaluated, and the characteristics of the evaluation instruments that are usually used in these programs.

To obtain a better understanding of the complexity of evaluating foster care interventions, this systematic review uses research of high methodological quality (randomized or quasi-randomized, longitudinal, or control group research designs) to identify relevant evaluation techniques.


Method

Inclusion Criteria

Randomized or quasi-randomized studies, longitudinal studies or studies with a control group that were published in peer-reviewed journals were selected for the systematic review.

The participants of the studies included children and adolescents from 0 to 18 years of age, their foster families (any type of foster family, including extended-family and professional-family foster care models), and their families of origin.

The systematic review focused on foster care programs in foster families, extended families, and professional families. In some countries, professional and foster families are considered the same; however, some countries distinguish between these two types and consider professional families to be foster families that establish a contractual relationship with the care institution that supervises the foster process.

The selected studies focused on evaluating the efficacy and effectiveness of foster care programs (including foster families, kinship families, and professional families), identifying constructs and validating evaluation tests. Studies that compared the results with those for other types of foster care, with residential care, or with a control group or those that evaluated only a specific type of foster care, were considered.

Search Methods

Because of the breadth and depth of the topic in question, a scoping review was carried out during the initial phase of the study as a first approach. This allowed for the definition of the specific objectives and research questions of the current systematic review. The entire process was carried out in a consensual manner.

The research questions were formulated in a specific way following the Patient, Intervention, Comparison, Outcome format (PICO format; Eriksen & Frandsen, 2018) together with the SPICE format (Setting, Perspective, Intervention, Comparison, Evaluation; Booth, 2006). Once the questions were formulated, a bibliographic search was conducted in Medline (through PubMed), Embase (Elsevier), Cochrane Library (Wiley Online Library), Scopus (Elsevier), PsycINFO (EBSCO Host), ERIC (EBSCO Host), PsycArticles (EBSCO Host), and PSICODOC (EBSCO Host). These databases were selected because they were the most appropriate for the content of the study and because they were expected to provide the highest yield of relevant results in the preliminary and exploratory searches.

Search strategies combined free-language terms, which were restricted in many cases to the title and abstract, with controlled language from the thesaurus of each database (MeSH, Emtree, and DeCS) to balance the sensitivity and specificity of the searches. No geographic, temporal, or language restrictions were established in the search strategy.

The last search was conducted on April 3, 2022. The search process was completed with a manual search of references, bibliographic alerts set in the databases used, and a review of congress posters and abstracts (which were excluded from the database searches) of interest to reviewers and experts. In May 2022, the systematic review protocol for the evaluation of foster care was registered in the International Prospective Register of Systematic Reviews (PROSPERO) (code CRD42022312993).

To manage the bibliographic references retrieved, the Mendeley bibliographic manager was used after duplicates from the different databases were eliminated.

Figure 1

Flow Chart.

Note. The reasons for exclusion refer to noncompliance with the following criteria: reason 1-participants, reason 2-intervention, reason 3-comparison, reason 4-study, reason 5-instrument, reason 6-results, reason 7-combination of 2 criteria, reason 8-combination of 3 criteria, reason 9-combination of 4 criteria, reason 10-combination of 5 criteria, reason 11-all criteria, and reason 12-other inclusion and/or exclusion criteria.


Screening by title and abstract was performed independently by 2 of the members of the research team. Discrepancies were resolved by consensus after detailed analysis and after consulting the opinion of a third member. The three members had PhDs in psychology and specialized in the study of children and adolescents.

In this phase, the studies that were excluded were (1) directed at populations outside the child protection system (graduates of protection programs, war veterans, and groups of elderly people); (2) evaluations of hospital and psychiatric programs focused on mental health problems; (3) assessments of exclusively residential care programs or orphanages; (4) evaluations of adoption programs; and (5) analyses of foster care programs focused on drug use, financial resources, nutritional aspects, motor development, physical health, AIDS, or other specific medical conditions.

Subsequently, full text reading was performed following the same process. In this systematic review, studies were included that (1) used validated assessment instruments; (2) empirically evaluated intervention programs using quantitative designs; and (3) assessed foster care programs with a focus on psychosocial aspects (improvement in children’s well-being, mental health, behavior, resilience, attachment, interpersonal relationship patterns, and social skills patterns; interventions focused on improving family relationships, reducing conflict of loyalties, improving parenting skills, reducing parental stress, and improving the psychological well-being of parents).

The search identified 5,334 studies. A total of 1,665 studies were removed because they were duplicates, and 3,384 studies were screened (3,369 from the databases and 15 from automatic alerts). Finally, 86 studies were included in the review, while 512 were excluded for other reasons. The complete process can be seen in Figure 1. The key terms used in the search strategy were foster home care, foster care, foster family, foster home, foster parents, foster child, kinship home care, kinship care, foster grandparents, foster grandmothers, questionnaire, checklist, instrument, outcome, process assessment, effectiveness, and program evaluation.

Data Analysis

A database was prepared with the 86 studies included in the systematic review in which the variables of interest for the analysis were collected: the authors and research teams, year of publication, country, questionnaires used, domains and subdomains evaluated, target population, informants, and intervention. Subsequently, the original data matrix was transformed into several ad hoc views that allowed the interrelation of some variables. Then, a high-level descriptive analysis was conducted using Power BI, which allows the handling of large volumes of this type of data and its visualization in a synthesized way. Different aggregation and filter functions were used in Power Query to calculate counts, nested counts, percentages, and other aggregation measures for each of the research questions. The interrater reliability calculation was performed with SPSS. The risk of bias was estimated through interrater agreement calculated using Cohen’s kappa. A degree of agreement of .762 was obtained for the title and abstract screening and .912 for the full-text screening.
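As an illustration of the agreement statistic mentioned above, the following minimal Python sketch computes Cohen's kappa for two reviewers. The authors report using SPSS; this code is not their procedure, and the function name `cohens_kappa` and the ten screening decisions are hypothetical values chosen only to show the calculation.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal counts, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical screening decisions for ten records (not data from the review).
reviewer_1 = ["include", "exclude", "exclude", "include", "exclude",
              "exclude", "include", "exclude", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "exclude",
              "exclude", "include", "exclude", "exclude", "exclude"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 3))
```

In this toy example the reviewers disagree on one record out of ten, so observed agreement is .90 while chance agreement is .62, which is why kappa is noticeably lower than the raw agreement rate.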


Results

The 86 included articles were produced by 73 different research teams, and 138 different instruments were identified across the studies. Different versions of a questionnaire were counted as a single questionnaire. For example, if the children’s and parents’ versions of the SDQ were used in a study, the study was considered to have used one questionnaire rather than two and to have applied two different versions.
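The counting rule described above can be sketched in Python as follows. The records, study identifiers, and function names are hypothetical and serve only to show how versions of the same questionnaire collapse into a single instrument; they are not taken from the review's database.

```python
# Hypothetical records: (study id, instrument, version administered).
administrations = [
    ("study_A", "SDQ", "child version"),
    ("study_A", "SDQ", "parent version"),
    ("study_A", "CBCL", "parent version"),
    ("study_B", "SDQ", "parent version"),
]

def instruments_per_study(records):
    """Map each study to its set of distinct instruments, ignoring versions."""
    by_study = {}
    for study, instrument, _version in records:
        by_study.setdefault(study, set()).add(instrument)
    return by_study

def distinct_instruments(records):
    """All distinct instruments across studies, with versions collapsed."""
    return {instrument for _study, instrument, _version in records}

per_study = instruments_per_study(administrations)
all_instruments = distinct_instruments(administrations)

# study_A applied two SDQ versions but is credited with one SDQ instrument.
print(sorted(per_study["study_A"]))
print(len(all_instruments))
```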

Most of the studies in this systematic review were conducted in the United States (57.53%) or the United Kingdom (9.59%). The remaining studies were conducted in Romania, the Netherlands, Australia, Korea, Norway, Spain, Canada, Belgium, Iraq, and Kurdistan. The included studies were conducted between 1997 and 2021.

Target Population and Informants in the Evaluation of Foster Care Programs

Regarding the target population of the included studies, 8.22% of the 73 research teams directed their studies only toward caregivers, 52.05% only toward children, and the remaining 39.73% toward both groups. In sum, the most common target population was children (73.61%), followed by their caregivers (26.39%). The percentage for children was calculated by summing the proportion of studies that were directed only toward children and the proportion of studies that included children and other groups of participants. The percentage for caregivers was calculated by summing the proportion of studies that exclusively included caregivers with the proportion of studies that included caregivers and other groups of participants. No studies were found whose target population was foster care technicians, social workers, or educators who accompanied families and children or families of origin.

A total of 58.90% of the studies focused exclusively on foster care in foster families, while only 5.48% focused on foster care in extended families; 21.92% jointly evaluated both, and 13.70% compared foster care with foster families with residential care. No data were collected on other forms of foster care, such as the foster care model in professional families.

The most common sources of data were adult caregivers (52.79%), followed by children (27.88%) and professionals (19.33%). Target populations did not always coincide with the informants, giving rise to various target population × informant combinations, as shown in Figure 2, where the proportions and types of self-reports and other-reports are presented. Other-reports (50.19%) were more common than self-reports (49.81%). The most frequent category was child-caregiver other-reports (30.86%), followed by child-child self-reports (27.88%), caregiver-caregiver self-reports (21.93%), child-professional other-reports (14.87%), and caregiver-professional other-reports (4.46%).
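The aggregate figures in this paragraph follow from the five target population-informant categories. The Python sketch below reproduces that arithmetic from the percentages reported in the text; the dictionary layout and the helper `total_where` are illustrative assumptions, not the authors' Power Query steps.

```python
from decimal import Decimal

# Percentages reported in the text, keyed by (target population, informant).
report_shares = {
    ("child", "caregiver"): Decimal("30.86"),
    ("child", "child"): Decimal("27.88"),
    ("caregiver", "caregiver"): Decimal("21.93"),
    ("child", "professional"): Decimal("14.87"),
    ("caregiver", "professional"): Decimal("4.46"),
}

def total_where(shares, predicate):
    """Sum the shares whose (target, informant) key satisfies the predicate."""
    return sum(
        (share for (target, informant), share in shares.items()
         if predicate(target, informant)),
        Decimal("0"),
    )

# A self-report is a category in which the target population is also the informant.
self_reports = total_where(report_shares, lambda target, informant: target == informant)
other_reports = total_where(report_shares, lambda target, informant: target != informant)

# Share of data contributed by each type of informant, whatever the target.
by_informant = {
    name: total_where(report_shares, lambda target, informant, name=name: informant == name)
    for name in ("caregiver", "child", "professional")
}

print(self_reports, other_reports)
print(by_informant)
```

Decimal arithmetic is used here so that the sums match the two-decimal percentages exactly rather than carrying floating-point rounding noise.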

Figure 2

Proportion of Self-reports and Other-reports.

Note. The nomenclature for the types of self-reports and other-reports is expressed using the following structure: target population-informant.

Although the main target population group was children, the main informants were not children but rather their caregivers (families of origin or foster families) in 30.86% of cases and professionals (foster care technicians, psychologists, educators, social workers, teachers, and researchers) in 14.87% of the cases. The main object of the evaluations of these adults was usually questions that concerned children and, with much less frequency, questions that referred to themselves (in 45.73% of other-reports, caregivers and professionals contributed information on the children, and in 21.93% of self-reports, caregivers reported on themselves). Children participated as informants only to talk about themselves (27.88% of child-child self-reports), and there were no studies in which children reported on aspects related to their caregivers (0% caregiver-child other-reports) (see Figure 2).

The distribution of self-reports and other-reports, as well as their typologies, varied depending on the construct evaluated (Table 1). For children, self-reports predominated over other-reports with regard to cognitive aspects (67.86% of self-reports compared to 32.14% of other-reports between caregivers and professionals), psychopathology (35.29% of self-reports compared to 23.53% of other-reports between caregivers and professionals), self-concept (70% of self-reports compared to 30% of other-reports between caregivers and professionals), coping (66.67% of self-reports compared to 33.34% of other-reports between caregivers and professionals), quality of life (60% of self-reports of children compared to 20% of other-reports between caregivers and professionals and an additional 20% of self-reports of caregivers about themselves), social support (66.67% of self-reports compared to the absence of other-reports), and other aspects such as autonomy, education, health, and facilities (80% of self-reports compared to 20% of other-reports between caregivers and professionals). Aspects such as the psychosocial functioning of the child or adolescent and his or her behavior, trauma, family relationships, attachment, and psychological well-being were more frequently reported by caregivers and professionals than by the children themselves.

Table 1

Informants, Target Population, Constructs, and Instruments in the Evaluation of Foster Care Programs

Note. The nomenclature for the types of self-reports and other-reports is expressed using the following structure: target population-informant. 1Target population coincided with the informant; 2target population was children, and the informants were their caregivers (family of origin or foster family); 3target population was children, and the informants were professionals (foster care technicians, psychologists, educators, social workers, teachers and researchers); 4target population was the caregivers (family of origin or foster family), and the informants were the children; 5target population was the caregivers (family of origin or foster family), and the informants were professionals (foster care technicians, psychologists, educators, social workers, teachers and researchers); 6domains for which data were not collected individually from more than 1 research team: autonomy, education, health and facilities; 7number of different instruments identified by construct; and 8number of research teams included in each construct in the evaluation.

For caregivers, information provided by professionals was rarely collected (4.46% of caregiver-professional other-reports), and no related information was obtained from children (caregiver-child other-reports). Constructs such as parenting (71.43% of caregiver-caregiver self-reports), family relationships (40% of caregiver-caregiver self-reports), aspects related to the intervention (100% of caregiver-caregiver self-reports), and psychopathology predominated in the assessment of caregivers. With respect to this last construct, although the psychopathological aspects of the parents were of interest (41.18% of caregiver-caregiver self-reports), more importance was given to these aspects of children (58.82% of child-child self-reports and other-reports by their caregivers and professionals). Constructs such as the psychosocial functioning of the parents, cognitive aspects, self-concept, coping, or other types of issues in the child-adolescent population were not evaluated.

Finally, the information provided by the professionals was especially relevant for child assessments (14.87% of child-professional other-reports) and was used less often for the evaluation of caregivers (4.46% of caregiver-professional other-reports). Specifically, more weight was given to the viewpoints of professionals than the viewpoints of children regarding aspects related to psychosocial functioning (26.98% of child-professional other-reports compared to 17.46% of child-child self-reports) or their psychological well-being (28.57% of child-professional other-reports compared to the absence of child-child self-reports).

Constructs in the Evaluation of Foster Care Programs

In total, 18 related constructs were identified. Figure 3 shows the frequency with which these constructs were evaluated, with psychosocial functioning being the most common in the evaluations. The graph distinguishes target populations, that is, whether the objective of the evaluation was to assess the situation of children or the situation of their main caregivers, who could be both families of origin and foster families. Some research teams focused their evaluation on both groups. No evaluations were found whose objective was to assess the situation of foster care technicians (psychologists, social workers, educators, etc.).

Figure 3

Number of Research Teams that Evaluated Each of the Identified Constructs for the Different Target Populations.

Note. CA = children and adolescents.

As shown in Figure 3, psychosocial functioning, cognitive aspects, self-concept, coping, autonomy, and educational and health aspects are constructs that were only evaluated in the child-adolescent population. Aspects related to the intervention and assessment of the foster care program were only addressed to caregivers. The other constructs were of interest in both populations, although in general the research teams tended to evaluate each of these aspects in only one of these populations, with few considering the perspective of both groups for the same issue. In this regard, only 2 teams conducted studies from the perspective of children and caregivers about behavioral and parenting issues, and only 1 did so in the case of family relationships and attachment. These constructs tended to be used in combination in evaluations. On average, 2.4 different constructs were evaluated per study, with a maximum of 7 different constructs in the same study.

Figure 4

Combinations of Constructs most Frequently Used in the Evaluation of Foster Care by Research Teams.

Note. “Psychosocial F” refers to psychosocial functioning.

In total, 55 different variations were found. The 14 combinations used by at least 2 research teams were considered relevant for the analysis and comprised the following constructs: psychosocial functioning, behavior, parenting, cognition, psychological well-being, trauma, family relationships, and psychopathology. In total, 5 of the combinations were unidimensional (they evaluated a single construct), and 9 were multidimensional (see Figure 4). Constructs such as attachment, self-concept, and coping remained outside the usual evaluation schemes of foster care programs. Although they were evaluated by more than 1 research team, they were not part of the most frequent evaluation strategies.
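The selection rule described above (keep only the combinations of constructs used by at least 2 research teams, then separate unidimensional from multidimensional ones) can be sketched in Python. The team-to-construct mapping below is hypothetical and much smaller than the review's data; the function name `frequent_combinations` is likewise an assumption.

```python
from collections import Counter

# Hypothetical mapping of research teams to the constructs they evaluated.
team_constructs = {
    "team_1": {"psychosocial functioning", "behavior"},
    "team_2": {"behavior", "psychosocial functioning"},
    "team_3": {"parenting"},
    "team_4": {"parenting"},
    "team_5": {"trauma", "psychopathology", "cognition"},
}

def frequent_combinations(mapping, min_teams=2):
    """Count each distinct set of constructs and keep those used by enough teams."""
    # frozenset makes the combination hashable and independent of construct order.
    counts = Counter(frozenset(constructs) for constructs in mapping.values())
    return {combo: n for combo, n in counts.items() if n >= min_teams}

frequent = frequent_combinations(team_constructs)
unidimensional = [combo for combo in frequent if len(combo) == 1]
multidimensional = [combo for combo in frequent if len(combo) > 1]

print(len(frequent), len(unidimensional), len(multidimensional))
```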

Considering Figures 3 and 4 together, the psychosocial functioning construct was the most frequently evaluated and was rarely evaluated in isolation (only 6 research teams used it without combining it with other constructs). It tended to be used in a generalized way in conjunction with other constructs. A similar result was observed for parenting.

Finally, in most constructs, there was high variability in the measurement instruments. For some constructs, there were almost as many different questionnaires as research teams (Table 1). In this regard, the psychosocial functioning construct was the one for which the greatest consensus was found, i.e., 11 questionnaires were used by the 44 teams that evaluated this construct.

For the evaluation of cognition, more questionnaires were used (n = 20) than the number of research teams that measured this construct (n = 17). For other constructs, each research team used a different questionnaire. This occurred, for example, in the evaluation of family relationships, where 11 different questionnaires were used by 11 different research teams.
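One way to compare the variability described above is the ratio of distinct instruments to research teams for each construct. The Python sketch below uses the counts reported in the text; the ratio itself is an illustrative summary and is not a statistic computed by the authors.

```python
# (number of different instruments, number of research teams), as reported in the text.
construct_counts = {
    "psychosocial functioning": (11, 44),
    "cognition": (20, 17),
    "family relationships": (11, 11),
}

def instruments_per_team(instruments, teams):
    """Ratio of distinct instruments to teams; lower values suggest more consensus."""
    if teams <= 0:
        raise ValueError("the number of teams must be positive")
    return instruments / teams

ratios = {
    construct: instruments_per_team(instruments, teams)
    for construct, (instruments, teams) in construct_counts.items()
}

# The construct with the lowest ratio shows the greatest convergence on instruments.
most_consensus = min(ratios, key=ratios.get)

for construct, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{construct}: {ratio:.2f}")
```

A ratio of 1 means that, on average, each team used its own instrument, and a ratio above 1 means that more instruments than teams were involved.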

Instruments Used in the Evaluation of Children and Adolescents in Foster Care Programs

The main population under study was children. In fact, in the publications analyzed from 73 different research teams, 39 teams focused their evaluations only on children, 30 evaluated children and their main caregivers, and only 7 focused on the adults in charge of care. The data for the evaluations with children as the study population were obtained mainly through other-report measures (see Figure 2).

Among the most analyzed constructs or variables in the child population in studies in which the informants were, for the most part, adults, the following stand out: psychosocial functioning (55.56% of child-caregiver other-reports and 26.98% of child-professional other-reports), behavior (44.95% of child-caregiver other-reports and 18.92% of child-professional other-reports), trauma (33.33% of child-caregiver other-reports and 11.11% of child-professional other-reports), family relationships (13.33% of child-caregiver other-reports and 13.33% of child-professional other-reports), attachment (50% of child-caregiver other-reports), and psychological well-being (57.14% of child-caregiver other-reports and 28.57% of child-professional other-reports) (Table 1).

Among the constructs in which the main informants were the children themselves, the following stand out: cognitive development (67.86% of child-child self-reports), psychopathology (35.29%), self-concept (70%), coping (66.67%), quality of life (60%), social support (66.67%), and other aspects such as autonomy, education, health, and facilities (80%) (Table 1).

For the analysis of the instruments used to evaluate each of the constructs, those that were not used by at least 2 research teams were excluded from Table 2. In the same way, constructs for which instruments with sufficient evidence were not found were excluded.

For psychosocial functioning (construct evaluated by 44 teams; see Figure 3), 11 different instruments specific to the child population were detected, of which 4 were used by at least 2 research teams. The most commonly used instruments were the Child Behavior Checklist (CBCL; Achenbach & Rescorla, 2001) and the Strengths and Difficulties Questionnaire (SDQ; Goodman, 1997) (Table 2). This construct presented less variability in the proportion of instruments used versus the number of research teams.

Table 2

Instruments Most Commonly Used for the Evaluation of Children and Adolescents

Note. Those that were not used by at least 2 research teams were excluded from the analysis. Those constructs for which instruments with sufficient evidence have not been found were also excluded.

For child behavior or conduct (a construct evaluated by 19 teams; see Figure 3), 12 different instruments were found specifically for children that were organized into 5 different categories: general behavior or conduct, antisocial behavior, assertive behavior, hyperactive or aggressive behavior, and social competence. Of these 15 instruments, only 4 were used by at least 2 research teams. The most widely used measurement instrument was the Parent’s Daily Report Checklist (PDR; Chamberlain & Reid, 1987) (Table 2).

For the measurement of aspects related to cognition (construct evaluated by 17 teams; see Figure 3), 20 different instruments were found specifically for children. In some studies, several questionnaires were combined to collect information regarding various cognitive areas. These instruments were organized into 6 subcategories: intelligence, language, general cognition, development, flexibility, and theory of mind. Of the 17 instruments, 6 were used by at least 2 research teams, including the Kaufman Brief Intelligence Test (KBIT; Kaufman & Kaufman, 1990) (Table 2).

Psychological well-being (evaluated by 5 teams; see Figure 3) was measured with 4 different questionnaires specific to children organized into 2 categories: general psychological well-being and mental and physical health. Of these 6 questionnaires, only the Health and Behavior Questionnaire (HBQ; Essex et al., 2002) (Table 2) was used by at least 2 research teams.

Trauma (evaluated by 9 teams; see Figure 3) was measured with 8 different instruments specific to children organized into 2 categories: trauma and abuse. Of these 11 questionnaires, only the Trauma Symptom Checklist (TSC; Briere, 1996; Briere et al., 2001) (Table 2) was used by at least 2 research teams.

For the evaluation of psychopathology (evaluated by 7 teams; see Figure 3), 8 different instruments specific to children were used that were organized into 4 categories: anxiety-stress, depression, psychopathology in general, and hopelessness. Of these 4 questionnaires, only 2, the Revised Children’s Manifest Anxiety Scale (Reynolds & Richmond, 1985) and the Children’s Depression Inventory (Kovacs, 1985), were used by at least 2 research teams (see Table 2).

For self-concept (evaluated by 6 teams; see Figure 3), 6 different specific questionnaires for children were used that were organized into 4 categories (self-regulation, self-concept, self-efficacy, and self-esteem), with only the Self-Perception Profile for Children (Harter, 1982, 1985, 1988) used by more than 1 research team. For coping (evaluated by 5 teams; see Figure 3), 3 different questionnaires specific to children were used, with only the Vineland Adaptive Behavior Scales (Sparrow et al., 1984; Sparrow et al., 1993; Sparrow et al., 1989) used by more than 1 research team. Finally, for attachment (evaluated by 5 teams; see Figure 3), 4 different questionnaires specific to children were used, with only the Disturbance of Attachment Interview (Smyke et al., 2002; Smyke & Zeanah, 1999) used by more than 1 research team (Table 2).

Instruments Used in the Evaluation of Caregivers in Foster Care Programs

Adults were the main informants, whether the evaluation was directed at themselves or at issues that concerned children (see Figure 1). Among the most analyzed constructs or variables of caregivers in studies in which they were the main informants, the following stand out: parenting (71.43% of self-reports and 14.29% of caregiver-professional other-reports), behavior (13.51% of self-reports and 2.70% of caregiver-professional other-reports), psychopathology (100% of self-reports), aspects related to trauma and abuse (16.67% of self-reports and 5.56% of caregiver-professional other-reports), family relationships (40% of self-reports and 13.33% of caregiver-professional other-reports), and psychological well-being, quality of life, social support, and assessment of the foster care program (intervention). Issues such as psychosocial functioning, cognitive aspects, self-concept, and coping were omitted from the evaluation (Table 1).

For the analysis of the instruments used to evaluate each of the constructs, those that were not used by at least 2 research teams were excluded from Table 3. In the same way, constructs for which instruments with sufficient evidence were not found were excluded.

For parenting (a construct evaluated by 25 teams; see Figure 2), 19 different questionnaires specific to adults were used that were organized into 5 categories: parenting competences, stress, attitudes and beliefs, efficacy and satisfaction, and coping. Of these 19 questionnaires, 4 were used by at least 2 research teams. For parental stress, the same questionnaire (Parenting Stress Index; Abidin, 1983, 1990, 1995, 1997, 2011, 2012) was used by all the research teams (n = 13) that evaluated this construct (Table 3).

Table 3

Instruments Most Used for the Evaluation of Caregivers

Note. Those that were not used by at least 2 research teams were excluded from the analysis. In the same way, those constructs for which instruments with sufficient evidence have not been found were excluded.

For caregiver behavior (construct evaluated by 6 teams; see Figure 2), 5 different questionnaires specific to adults were used that were organized into 2 categories: general behavior and psychopathological behavior. Of these 5 questionnaires, only the Carer-Defined Problems Scale (Scott et al., 2001) was used by at least 2 research teams (Table 3).

For psychopathological aspects related to caregivers (a construct evaluated by 6 teams; see Figure 2), 6 different questionnaires specific to adults were used that were organized into 3 categories: general psychopathology, anxiety, and depression. Of these 6 questionnaires, only the Beck Depression Inventory (Beck et al., 1996; Beck et al., 1961) was used by at least 2 research teams (Table 3). For family relationships (a construct evaluated by 6 teams; see Figure 2), 6 different questionnaires specific to adults were used, of which 2 were used by at least 2 research teams (Table 3). Questionnaires with sufficient evidence and consensus were not found for trauma, attachment, psychological well-being, quality of life, intervention, or social support.


The continuous need to evaluate and understand the efficiency and effectiveness of services offered to children (Portwood et al., 2022) motivated this systematic review, whose objective was to identify appropriate instruments for evaluating foster care programs in extended, foster, and professional families because these families and the vulnerable children they serve deserve programs and services of the highest quality and effectiveness (Barth et al., 2022).

The information obtained makes it possible to determine which variables are included in these evaluations, how they are related to each other, who participates in these evaluations, and what types of instruments are used. The information also indicates which aspects are not being considered, which deserve reflection to improve evaluation models of foster care.

Children and Adolescents

These models usually assume that a central aspect of the evaluation of foster care is children and adolescents’ psychosocial well-being (Wakefield & Wildeman, 2022), highlighted by the fact that the main target population is children. However, their perspective is lost because in most cases adults report in their place. This happens for aspects of a more personal nature (psychosocial functioning, behavior, trauma, psychological well-being, and attachment) and for family issues (parenting and family relationships).

For practical reasons, research on children more often evaluates adults than children because the participation of children requires adaptation of the design and methodology of these studies. Sensitive instruments are required at each evolutionary stage in addition to an adapted space, more time, accessible materials, and sometimes the presence of several evaluators. In addition, the involvement of children requires the consideration of ethical aspects and authorization from legal guardians, which is sometimes a limitation with regard to effective participation by children. However, children’s voice in these types of programs is important, as is the voice of those who are particularly excluded, such as children or adolescents with disabilities (Fox et al., 2000; Gale, 2019; George et al., 2003), although “[t]o date…youths’ perspectives are not well-integrated into data systems or performance evaluations” (Font & Gershoff, 2020, p. 18). Evaluations should allow children, through appropriate instruments, to directly report their personal and family experiences.

Foster Care Professionals and Technicians

The research also reveals the limited presence of evaluation models that regularly include technicians and professionals as informants as well as their total absence as a target population. In addition to the link or relationship that they can establish, their stability and availability are considered key in the process (Ridley et al., 2016). Although the participation of technicians and professionals is necessary, (1) it must be performed with awareness of the bias in professional reports on third parties, such as children, foster families, or families of origin, and (2) the participation of workers in evaluations should be understood within a learning culture that allows continuous improvement in the care provided to children and therefore should not be used in a punitive way to exclusively measure the performance of professionals. These types of evaluations are costly in time and resources, which is why it is important to properly balance these aspects so that evaluation does not become a “suboptimal approach for improving child well-being” (Font & Gershoff, 2020, p. 20).

Foster Families and Families of Origin

Families of origin were observed to have a limited presence in the analyzed studies. As with children, the voices of members of families of origin are insufficiently considered despite their roles as key actors in family reunification and as essential sources of knowledge regarding the needs, strengths, and protective factors of children (Slack et al., 2022). Consideration of the opinions and ideas of all agents directly involved in the foster care process (children, parents, relatives, and service professionals, including judges, politicians, and researchers) would undoubtedly allow a broader view of the aspects that may require improvement (Barth et al., 2022).

Regarding foster care modalities, foster care predominates over extended family care in studies. Less research has been found on new forms of foster care, such as professional or specialized types with exclusive dedication. Finally, despite the need to involve children to a greater extent, the objective is not to displace adults (professionals, foster families, and families of origin) from these evaluations but to balance all perspectives appropriately. Furthermore, it is necessary to distinguish between constructs for which multiple evaluations predominate and those for which they do not.

Psychological Functioning and Behavior

The constructs and areas of evaluation studied are diverse and, to a great extent, related to each other, as is typical of multi-trait analysis. In this sense, behavior, family relationships, parenting, attachment, and trauma are evaluated through multiple reports, where the perspectives of children as well as caregivers and professionals are considered. Likewise, for all the variables studied, the information provided by the children is complemented by caregivers and/or professionals.

Psychological functioning is the most frequently evaluated construct in both one-dimensional and multidimensional models. This is influenced by behavioral, cognitive, and parental aspects, among other factors. In this regard, the most frequently used questionnaire (Child Behavior Checklist) evaluates the psychopathological status of children and adolescents. Although it is true that children tend to experience greater mental health problems because of abuse in their family of origin and their separation (Engler et al., 2022), an evaluation that includes the child’s situation should be somewhat broader. Although a psychopathological (diagnostic) reference is useful to normalize the evaluation, it facilitates the stigmatization of the group. In this sense, an instrument such as the Strengths and Difficulties Questionnaire may be more appropriate because it has a broader approach and also focuses on strengths without losing the psychopathological perspective (Ortuño-Sierra et al., 2016).

Including the perspective of children and their caregivers should not exclude other people involved in the child’s activities as a complementary source of evaluation. An instrument that can be completed by technicians, teachers, or other people in contact with children is the Eyberg Child Behavior Inventory (Eyberg & Pincus, 1999; Eyberg & Ross, 1978), which has been used in various studies.

Psychological Well-Being, Quality of Life and Coping

Focusing on problems and deficits contributes to the stigmatization and labeling of children. Preferably, evaluations should also examine protective factors and resilience capacity (Ager et al., 2012). In this sense, quality of life and coping are measures that can reflect the quality of foster care and the care provided. However, the review showed that the evaluation of these aspects is not entirely adequate. Quality of life has been evaluated in a generic way by alluding to factors oriented toward physical well-being (e.g., the Health and Behavior Questionnaire) and using mostly other-reports. A more suitable alternative may be the Kidscreen (Ravens-Sieberer et al., 2010), a measure of quality of life that evaluates aspects of psychological, social, and physical well-being.

Regarding coping, the measures used (Vineland scales; Sparrow et al., 1984; Sparrow et al., 1993; Sparrow et al., 1989) are more oriented toward developmental problems and disability; therefore, measures that consider normal situations are more appropriate. For example, Kidcope (Spirito et al., 1988) is appropriate for most young people. In any case, it is highly important that assessment models include aspects related to the resilience and strengths of children and that they do not focus only on pathological aspects, deficiencies, or behavior problems (Ager et al., 2012).


Given that the care of children can be especially complex, parental competencies are a differential and highly relevant element in the well-being or stress of adults and in the psychosocial situation of children (Job et al., 2022). Thus, a measure of parenting is essential with regard to both parental competences and the (often complex) situations in which they must exercise them because these parents, due to the complex demands of the children they host, must often interact not only with the child protection system but also with the education and health systems and must learn to manage contact and visits with the child’s family of origin in an attempt to develop a positive and collaborative relationship with them (Bernedo et al., 2022). In this sense, it is important to know what parenting skills allow communication, connection, and support to be maintained between family members (Lietz et al., 2016).

Given that parenting measures are based on parents themselves, it is necessary to include children as informants. Therefore, it is necessary to emphasize the importance of choosing instruments that have versions for parents and caregivers as well as for children so that evaluations are as complete as possible. These types of measures are more common in domains such as psychosocial functioning and behavioral or psychopathological aspects, whereas they are nonexistent for parenting.

It would also be appropriate to identify other resources or capacities of the family, such as its adaptability and cohesion, which can be assessed with the Spanish version (Martínez-Pampliega et al., 2006) of the Family Adaptability and Cohesion Evaluation Scale II (FACES-II; Olson et al., 1982), as well as the stress perceived by parents in relation to care, a topic that was measured in studies included in this review through the use of the Parenting Stress Index (Abidin, 1983, 1990, 1995, 1997, 2011, 2012). Issues related to parenting skills are evaluated from the perspective of adult caregivers.

Intelligence and Self-Concept

Intelligence assessments are relevant if disabilities or special needs are detected but are less relevant for interventions in foster care if these needs are not detected. Self-concept, particularly cognitive aspects, tends to be evaluated with self-report questionnaires. Psychosocial functioning, behavioral and psychopathological aspects, and aspects related to trauma and coping tend to be evaluated using other-reports despite the existence of self-report versions for the instruments found in Table 2.

Comparability of the Results

An essential element is that the instruments applied to young people in foster care are not different from those used or capable of being used for young people who live outside the protection system. This factor has great advantages: it does not point out or stigmatize children and it is possible to reference normative values of the general population if instruments are used with adequate methodological rigor. The existence of common elements between evaluations allows comparisons of family care interventions with a view toward improving the process (Dickes, 2018). This also makes it easier to comprehensively assess factors that mediate care outcomes, including quality of care, that might otherwise go unnoticed (Font & Gershoff, 2020).

The promotion of a more solid empirical base to improve the protection of vulnerable children requires holistic and comprehensive evaluations (Suh & Holmes, 2022) that complement the most objective data with the experiences of the protagonists of the processes (Barth et al., 2022) as well as program development, context-appropriate methodologies capable of evaluating the scalability of the intervention and longitudinal designs (Job et al., 2022) to explore the trajectories of the children. In addition, future programs would benefit from system-wide data confluence and international comparisons, research that emphasizes coping and resilience mechanisms, and the participation of children in monitoring and evaluation (Ager et al., 2012).

Cost Effectiveness

In addition to methodological aspects, it is important to determine the most practical and flexible evaluation models to guarantee that an evaluation is incorporated into the intervention process. In this regard, instruments of the highest scientific quality that are standardized and widely available should be used, such as the Child Behavior Checklist (Achenbach & Rescorla, 2001), the SDQ (Goodman, 1997), and the Parenting Stress Index (Abidin, 1983, 1990, 1995, 1997, 2011, 2012). Additionally, instruments that require less time and effort in their application should be used because long-term follow-ups of foster care are needed (Job et al., 2022). This is an advantage of, for example, instruments such as the SDQ, the Parental Reflective Functioning Questionnaire, and the Parenting Scale. For constructs such as psychosocial functioning, the most commonly used questionnaire, the Child Behavior Checklist (Achenbach & Rescorla, 2001), has a larger number of items than other questionnaires in the same category, such as the SDQ (Goodman, 1997) (Table 2).

Implications for Practice

First, in foster care, as in other areas, there should be a trend toward multi-trait assessment models (a diversity of constructs), multiple methods (self-reports, other-reports, standardized questionnaires, interviews, etc.), and multiple behavior assessments (evaluation of thoughts, emotions, motivations, etc.).

Second, these evaluation models must be practical (brief and easy to apply and score) to guarantee their generalized and regular use. They should facilitate comparability by selecting widely used instruments and must have sufficient scientific guarantees to draw reliable conclusions.

Third, regarding the evaluated constructs, psychosocial functioning is a cross-cutting feature in both the evaluations and the central aspects of the care and life of children in foster care, as are behavior and parenting. For this reason, evaluation models should include at least these 3 constructs. The most practical, generalized, reliable, and multi-informant questionnaires available should be selected, such as the SDQ and the Eyberg Child Behavior Inventory. Despite the existence of good questionnaires for parenting, such as the Parenting Stress Index, it is necessary to identify or develop instruments that incorporate the perspective of children.

Fourth, attention should be given to constructs that have not been sufficiently evaluated, such as coping strategies, which are directly related to resilience and allow a focus on the strengths of everyone involved to guide interventions. Likewise, a broader evaluation of quality of life should be conducted that includes psychosocial aspects and not only medical aspects.

Fifth, every effort should be made to extend these evaluation models to all types of foster care (in the extended family, in a foster family, and in specialized care) and to all the actors involved (families of origin, foster families, children, and professionals).

Sixth, although it is appropriate for children to be the focus of evaluations (as a target population), it is necessary to make their participation effective by involving them as informants on a greater variety of issues.

Limitations and Future Research Directions

Some limitations of this study are the lack of studies that include professionals and families of origin, the lack of studies evaluating professional or specialized foster care programs, the predominance of foster care through foster families compared to other modalities, and the predominance of Anglo-Saxon studies and a relative lack of studies from other countries with different intervention models.

Another limitation is related to the assessment instruments. The choice of validated instruments with a specific study of reliability and validity excludes from this systematic review other instruments with a simple descriptive objective or “ad hoc” instruments without a psychometric study.

Although there are many questions to be answered in the field of foster care and in the evaluation of these programs, the high number of studies included in this review with robust methodologies (randomized, longitudinal, and with comparison groups) is striking. This shows that there is high-quality research in this field, laying a foundation to respond to challenges, such as the need to reduce the high dispersion of methodologies in this field and the move toward intervention models based on evidence and comparability of the results without losing the specialization of services and individualized attention to each case.

The assessment of these types of interventions should be founded on standard assessment instruments. The use of instruments with adequate scientific characteristics allows the comparison of different foster care programs and the use of normative scores for these groups. Additionally, the regular use of these instruments in the general population could allow comparisons between foster care programs and the conventional care of children and adolescents.

Conflict of Interest

The authors of this article declare no conflict of interest.

Cite this article as: Vallejo-Slocker, L., Idoiaga-Mondragon, N., Axpe, I., Willi, R., Guerra-Rodríguez, M., Montserrat, C., and del Valle, J. F. (2023). Systematic Review of the Evaluation of Foster Care Programs. Psychosocial Intervention, Ahead of print. Correspondence: (L. Vallejo-Slocker)

