Breaking Down Barriers: Examining the Potential of AI and LLMs in Making Therapy More Accessible

By Dr. Alyson Carr, LMHC & Eric Singer

The Rise of AI and LLMs and Their Implications for Therapy and Mental Healthcare

At the time of writing, at the start of Q2 ’23, artificial intelligence (AI) is rapidly changing the way we communicate, learn, work, and play. Most of the tech-adjacent public in the United States has probably heard about ChatGPT by now, whether from a local TV news segment, an article online, a favorite podcast, direct experimentation with ChatGPT themselves, or a relative who likes to use it to write ‘funny’ poems with quirky substitutions and stylistic instructions. This sudden ubiquity shouldn’t come as a surprise: according to a February report by UBS analysts, ChatGPT may very well be the fastest-growing consumer application in history, amassing a stunning 100M users in just 2 months.

ChatGPT is an example of one of the most promising and exciting developments in AI in recent years: the emergence of large language models (LLMs), deep-learning models that generate natural language responses after being trained on massive amounts of data. LLMs can perform a variety of tasks across domains and languages, such as writing, translating, summarizing, coding, and conversing, and the current zeitgeist is fixated on their potential to revolutionize how people perform their work and on how they are already challenging our long-established cultural assumptions about ways of working.

Take a look at what Dr. Jim Fan, AI research scientist at NVIDIA, had to say recently on Twitter, citing work from OpenAI and UPenn (research source):

In his thread, Dr. Fan elaborates on the figures in his first tweet (above): the authors of the paper conclude that, for some job types, “using the described LLM via ChatGPT…can reduce the time required to complete the DWA [Detailed Work Activity] or task by at least half (50%)” [emphasis ours]. See the excerpted table from the paper below.

Fig 1.1 - The table below is excerpted from the working paper of Eloundou et al. (March 27, 2023), “GPTs are GPTs,” which Dr. Fan references in his thread.

Okay, so at this point, you’re probably willing to grant us the premise: ‘AI is here, a lot of people are excited about it, and it has the potential to change how millions of people do work in some appreciable capacity.’ But what if LLMs could also revolutionize how we approach mental healthcare (MH)? What if LLMs could act as virtual mental health counselors, personality assessors, or reasoning enhancers? And what are the benefits and risks of using LLMs for mental healthcare?

If you’ve read this far and noticed the title of this post, you can probably guess our aim: to explore some of these questions and more. We discuss how LLMs could be utilized to enhance therapy and mental health care in various ways, review some of the existing studies and systems that have used or interacted with LLMs for therapy or mental health care, and finally suggest some future research directions and challenges for improving the quality and reliability of LLMs for therapeutic purposes.

A Review of Relevant Current Research and Systems on the Use of LLMs for MH

One potential application of LLMs in therapeutic domains is to serve as virtual mental health counselors. Sufficiently sophisticated LLMs could offer empathetic and supportive feedback, as well as evidence-based interventions and resources for various psychological challenges - and they are already being implemented in practice. 

For instance, a virtual mental health counselor named Serena employs a natural language processing model trained on thousands of person-centered counseling sessions from licensed psychologists. Supposedly, Serena can engage with users about their affective, cognitive, and behavioral patterns, and facilitate their emotional exploration and goal-setting. Serena can also recommend relevant articles, videos, podcasts, or apps for users to enhance their knowledge and skills regarding their psychological issues. According to their website, users have expressed positive feedback and satisfaction with Serena’s service; but she has her limitations. In the company’s FAQ they’re careful to disclaim as much, indicating that Serena is not intended for, and never will be used for, “address[ing] acute or emergent mental health issues”:

Fig 1.2 - Excerpt from serena.chat’s FAQ

Another LLM that has shown remarkable capabilities in therapeutic contexts will be familiar: ChatGPT, developed by OpenAI. ChatGPT can generate natural language responses based on user input, mood, and progress in therapy, and these kinds of responses could be beneficial and feel supportive to users who seek therapy or intervention online or offline. A study by Rao et al. (2023) evaluated ChatGPT’s effectiveness in therapy and found that it resulted in improved user satisfaction, symptom reduction, mood improvement, and progress tracking.
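To make the basic mechanics concrete, here is a minimal sketch, assuming the OpenAI Python client as it existed in early 2023, of how an application might wrap a chat model with a supportive, explicitly non-clinical system prompt. The prompt wording, model choice, and helper name are our own illustrative assumptions, not a vetted therapeutic protocol or any product’s actual implementation.

```python
# A minimal sketch: prompting a chat model with a supportive, non-clinical
# system prompt. The prompt text and model name are illustrative assumptions,
# not a vetted therapeutic protocol.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

SYSTEM_PROMPT = (
    "You are a supportive wellness assistant. Respond with empathy and "
    "reflective listening. You are not a licensed therapist; if the user "
    "mentions crisis or self-harm, direct them to emergency services."
)

def supportive_reply(user_message: str) -> str:
    """Return a single supportive response to one user message."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        temperature=0.7,
    )
    return response["choices"][0]["message"]["content"]

print(supportive_reply("I've been feeling anxious about work lately."))
```

Even in a sketch this simple, the system prompt carries the safety-critical framing; everything that follows in this post is, in one way or another, about how thin that layer currently is.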

However, while these early findings concerning ChatGPT’s effectiveness in therapeutic contexts are still very recent and available only as preprints on arXiv, researchers are already beginning to publish findings for OpenAI’s newest, most sophisticated LLM, GPT-4, which was recently released to the public via a paid tier of the otherwise-free ChatGPT interface, called ChatGPT Plus, and via waitlist access.

GPT-4 is truly a generational leap forward in capability and scale compared to the previous models that powered ChatGPT (GPT-3/3.5). There are numerous studies that illuminate the gulf between versions, but it is simply easier to see for yourself. GPT-4 was trained using an unprecedented scale of compute and data and can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology, and more, without needing any special prompting (Bubeck et al., 2023). Moreover, GPT-4 has demonstrated what some researchers describe as early signs of artificial general intelligence (AGI) (Bubeck et al., 2023), including the ability to impute unobservable mental states to others (theory of mind), which is central to human social interactions, communication, empathy, self-consciousness, and morality (Kosinski, 2023). GPT-4 also has a safety reward mechanism that reduces harmful outputs and assists in producing ethical responses. As such, the potential applications and implications for therapy and mental health care should be seriously and responsibly explored.

Going Spelunking with the Mad Hatter: Emerging LLM Capabilities and Their Implications for Future Applications in Therapy and MH

A study by Huang et al. (2022) showed that LLMs can self-improve by generating high-confidence, rationale-augmented answers to unlabeled questions using chain-of-thought prompting and self-consistency, and then training on those answers. This could imply that LLMs like GPT-4 could learn to provide more accurate and relevant responses for therapy and mental health care without extensive supervision. The authors suggested that this approach could improve the general reasoning ability of LLMs and achieve state-of-the-art performance on various tasks.

Relatedly, Wei et al. (2022) showed that chain-of-thought prompting - asking a model to generate intermediate reasoning steps before its final answer - substantially improves LLM performance on arithmetic, commonsense, and symbolic reasoning benchmarks. This could indicate that LLMs like GPT-4 could produce more transparent and better-reasoned responses for therapy and mental health care when prompted to reason step by step.

Another study, by Shinn et al. (2023), proposed Reflexion, an approach that endows an agent with dynamic memory and self-reflection capabilities to enhance its reasoning traces and task-specific action choices. This could suggest that LLMs like GPT-4 could learn to provide more optimal and goal-oriented responses for therapy and mental health care by utilizing success-detection cues to improve their behavior over long trajectories. The authors showed that this approach improves the decision-making and problem-solving abilities of LLMs, enabling them to complete decision-making tasks in AlfWorld environments and knowledge-intensive, search-based question-answering tasks in HotPotQA environments.
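To ground what “chain-of-thought prompting with self-consistency” means in practice, here is a minimal sketch of the sample-then-vote pattern. The generate() helper is a hypothetical stand-in for any LLM call; only the voting logic is the point.

```python
# A minimal sketch of chain-of-thought prompting with self-consistency voting.
# `generate` is a hypothetical stand-in for any LLM completion call.
from collections import Counter

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Hypothetical LLM call returning reasoning followed by 'Answer: ...'."""
    raise NotImplementedError("Wire this to your model of choice.")

def extract_answer(completion: str) -> str:
    """Pull the final answer out of a chain-of-thought completion."""
    return completion.rsplit("Answer:", 1)[-1].strip()

def self_consistent_answer(question: str, n_samples: int = 5) -> str:
    """Sample several reasoning chains and return the most common final answer."""
    prompt = f"{question}\nLet's think step by step, then finish with 'Answer: <answer>'."
    answers = [extract_answer(generate(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```

The self-improvement result in Huang et al. goes one step further: the majority-voted answers are used as training targets so the model learns from its own best reasoning.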

These studies demonstrate the potential of LLMs to self-improve using various techniques and to achieve impressive results on a number of tasks. However, none of them addresses the challenge of solving complex AI tasks that span different domains and modalities, which is an important step toward more general artificial intelligence. A recent paper by Shen et al. (2023) proposes HuggingGPT, a system that uses ChatGPT to connect the many AI models hosted on Hugging Face in order to solve complex tasks across domains and modalities. The authors claim that HuggingGPT can combine the language understanding, generation, interaction, and reasoning abilities of ChatGPT with the expertise of hundreds of Hugging Face models to handle tasks in language, vision, speech, video, and cross-modal settings, and they report impressive results on challenging tasks such as image captioning, text summarization, text-to-speech, and text-to-video. This suggests a new way of designing general AI systems that handle complicated tasks by combining the strengths of LLMs and expert models, and it could imply that LLMs like GPT-4 could adapt to new tasks and domains without forgetting previous knowledge or requiring retraining.
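A highly simplified sketch of the controller pattern the HuggingGPT paper describes (plan tasks, pick expert models, run them, summarize) follows. Every function name and the registry below are our own illustrative stand-ins, not the paper’s actual interfaces.

```python
# A highly simplified sketch of an LLM-as-controller pipeline in the spirit of
# HuggingGPT: plan sub-tasks, select expert models, execute, then summarize.
# All names here are illustrative stand-ins, not the paper's actual API.
from typing import Callable, Dict, List

def llm(prompt: str) -> str:
    """Hypothetical call to a controller LLM such as ChatGPT."""
    raise NotImplementedError

# Hypothetical registry of expert models, keyed by sub-task type.
EXPERT_MODELS: Dict[str, Callable[[str], str]] = {
    # "image-captioning": some_vision_model,
    # "text-to-speech": some_tts_model,
}

def solve(user_request: str) -> str:
    # 1. Task planning: ask the controller LLM to decompose the request.
    plan: List[str] = llm(
        f"List the sub-task types needed for this request, one per line: {user_request}"
    ).splitlines()

    # 2-3. Model selection and execution: route each sub-task to an expert model.
    results = []
    for task in plan:
        model = EXPERT_MODELS.get(task.strip())
        if model is not None:
            results.append(model(user_request))

    # 4. Response generation: have the controller LLM summarize the experts' outputs.
    return llm(f"Summarize these results for the user: {results}")
```

The appeal for mental health applications is the same as elsewhere: the language model handles conversation and orchestration while specialized, auditable components handle the pieces that demand domain expertise.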

Despite these impressive results, GPT-4 is still far from being a true artificial general intelligence (AGI) system and faces many limitations and challenges in generating natural language responses. For example, GPT-4 may not always be factual, precise, reliable, coherent, consistent, or sensitive in its responses (Bubeck et al., 2023), because it relies on statistical patterns learned from large-scale text corpora, which may not reflect the reality, logic, or norms of human communication. GPT-4 may also generate responses that are inaccurate or imprecise due to the limitations of natural language processing; for instance, it may struggle with ambiguity, anaphora, negation, or common-sense reasoning (Bubeck et al., 2023). GPT-4 also raises ethical, social, legal, and professional issues such as privacy, consent, confidentiality, transparency, accountability, fairness, safety, and security (Bubeck et al., 2023). These issues may arise from the data sources, methods, applications, or impacts of GPT-4 and its responses.

[Author’s Note: We must emphatically caution once more that responsible and ethical implementation of AI in (mental) healthcare contexts requires researchers, practitioners, and the affected public generally to be aware of the strengths and weaknesses of technologies like GPT-4, LLMs like it, and their responses. AI researcher Dr. Károly Zsolnai-Fehér, of the popular YouTube channel Two Minute Papers, has a saying when reviewing recent developments in AI research: “The First Law of Papers says that research is a process - do not look at where we are, look at where we will be two more papers down the line.”]

In addition to the examples discussed so far - ChatGPT, Serena, GPT-4, and HuggingGPT - other noteworthy LLMs developed by different organizations and researchers include BLOOM, by BigScience, and LaMDA, by Google. These LLMs differ from ChatGPT in their size, language coverage, training data, tasks, performance, and approach to ethics, as follows:

  • To reiterate, GPT-4 is the most capable LLM to date (OpenAI has not publicly disclosed its parameter count). It can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology, and more. It has also shown signs of artificial general intelligence (AGI), such as the ability to impute unobservable mental states to others (theory of mind), which is central to human social interactions and communication.

  • BLOOM is a multilingual LLM with 176 billion parameters that can perform various tasks across languages and domains. It is designed to be as transparent as possible, with researchers sharing details about the data it was trained on, the challenges in its development, and the way they evaluated its performance. It aims to democratize access to cutting-edge AI technology for researchers around the world.

  • LaMDA is a conversational LLM that can engage in open-ended dialogue on any topic. It can generate coherent and relevant responses that maintain the context and flow of the conversation. Like GPT-4, it also has a safety layer that reduces harmful outputs and assists in producing ethical responses.

These LLMs could also have benefits and challenges for therapy and mental health care. Examples include:

  • GPT-4 could be more empathetic and adaptive to users’ needs and preferences due to its general intelligence and theory of mind. However, it could also be more unpredictable and unreliable due to its lack of supervision and guidance.

  • BLOOM could be more accessible and trustworthy for users who need therapy in different languages due to its transparency and multilingualism. However, it could also face more difficulties and risks in ensuring data quality and privacy due to its open-access and collaborative nature.

  • LaMDA could be more engaging and interactive for users who need social support due to its conversational skills and flexibility. However, it could also be more prone to misinformation or manipulation due to its dependence on external sources and services.

The Road Ahead: Research Directions and Challenges for Enhancing LLMs for Therapeutic Purposes

An example of a relevant data source that could help overcome some of the challenges of training LLMs to be more factual, precise, reliable, coherent, consistent, and sensitive is the Psychotherapy Corpus, a collection of over 2,500 transcripts of psychotherapy sessions from different modalities, such as cognitive behavioral therapy (CBT), psychodynamic therapy, and humanistic therapy. The Psychotherapy Corpus also includes annotations of therapist and client utterances, such as speech acts, emotions, topics, and outcomes. This data source could be a useful tool for training LLMs to generate natural language responses that are appropriate for different therapeutic contexts, tasks, and domains.
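As a purely hypothetical illustration of the kind of annotated record such a corpus provides, a single labeled utterance might look something like the sketch below. The field names and values are our own and do not reflect the corpus’s actual schema.

```python
# A hypothetical annotated utterance record illustrating the kinds of labels
# described above (speech acts, emotions, topics, outcomes). The field names
# are our own, not the corpus's actual schema.
from dataclasses import dataclass

@dataclass
class AnnotatedUtterance:
    session_id: str
    speaker: str        # "therapist" or "client"
    text: str
    modality: str       # e.g., "CBT", "psychodynamic", "humanistic"
    speech_act: str     # e.g., "open question", "reflection", "disclosure"
    emotion: str        # e.g., "anxious", "hopeful", "neutral"
    topic: str          # e.g., "work stress"
    outcome_note: str   # e.g., "client identifies an automatic thought"

example = AnnotatedUtterance(
    session_id="demo-001",
    speaker="therapist",
    text="What goes through your mind right before the anxiety spikes?",
    modality="CBT",
    speech_act="open question",
    emotion="neutral",
    topic="anxiety",
    outcome_note="client identifies an automatic thought",
)
```

Structured labels like these are what would let a model learn not just what therapists say, but which kind of utterance is appropriate at which moment and within which modality.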

While the Psychotherapy Corpus provides relevant data for improving LLMs for therapeutic purposes, many evidence-based treatment approaches fall outside the CBT, psychodynamic, and humanistic frameworks it covers, so a number of theories, modalities, approaches, and interventions may not be captured by this data source. For example, using the current Psychotherapy Corpus, if a user asked ChatGPT, “What can I do to manage my anxiety?”, a possible response could be, “Exercise for 30 minutes a day.” The body of literature does indicate that exercise releases neurotransmitters that help decrease anxiety (Carek et al., 2011), and this response aligns with the theory that underpins CBT: if we change our thoughts (cognitions), we can change our behavior, and if we change our behavior, we can change our thoughts. But while “exercise for 30 minutes a day” could be facilitative for some users, it may not provide enough therapeutic engagement or accuracy for others.

The level of care required by the user may determine the level of nuance an LLM needs to apply in order to produce quality, accurate, and reliable responses, and that level of nuance could inform the progression of inquiry for an LLM. Using the example, “What can I do to manage my anxiety?”, the progression of inquiry could produce a question such as, “When did this anxiety start?” Depending on the user’s answer, ChatGPT could provide more sensitive and accurate guidance and/or ask additional questions. A user who answers, “My anxiety began 3 weeks ago when I started applying to medical schools,” could get feedback tailored to their specific presentation of situational/circumstantial anxiety symptoms, whereas a user who responds, “I've felt anxious for my entire life,” would receive different feedback, including possible questions like, “How did you see your primary caregivers respond to stress when you were growing up?” It is through this nuanced progression of inquiry that LLMs and AI may be able to deliver more meaningful and relevant therapeutic responses (a toy sketch of this branching appears below). Improving and training LLMs and AI for therapeutic purposes is analogous to training a puppy: just as puppies need consistent reinforcement and positive feedback to learn new behaviors, LLMs need consistent exposure to relevant data sources in order to generate more appropriate responses.
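Here is that toy sketch of the progression-of-inquiry idea. In a real system the routing would be learned or clinician-designed rather than keyword-matched; the chronic-path follow-up question is the example from the text, while the other responses and all the heuristics are our own illustrative placeholders.

```python
# A toy sketch of a progression of inquiry. In practice routing would be
# learned or clinician-designed, not keyword-matched. The chronic-path
# follow-up is the example from the text; other strings are placeholders.

def classify_onset(answer: str) -> str:
    """Very rough heuristic: situational vs. long-standing anxiety."""
    lowered = answer.lower()
    if any(cue in lowered for cue in ("weeks ago", "when i started", "since i started")):
        return "situational"
    if any(cue in lowered for cue in ("entire life", "always", "as long as i can remember")):
        return "chronic"
    return "unclear"

def next_question(answer_to_onset_question: str) -> str:
    onset = classify_onset(answer_to_onset_question)
    if onset == "situational":
        return "What about this situation feels most overwhelming right now?"  # placeholder
    if onset == "chronic":
        return ("How did you see your primary caregivers respond to stress "
                "when you were growing up?")
    return "Can you tell me more about when you first noticed the anxiety?"  # placeholder

print(next_question("My anxiety began 3 weeks ago when I started applying to medical schools."))
print(next_question("I've felt anxious for my entire life."))
```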

According to the National Institute for the Clinical Application of Behavioral Medicine (Buczynski, 2022), bottom-up approaches focus on raw emotions and defense systems by working with clients to regulate and be attuned with their bodies, while top-down approaches focus on shifting the way a client thinks. In terms of treatment approaches that are not CBT (a top-down approach), psychodynamic, or humanistic (the modalities captured by the Psychotherapy Corpus), a clinician certified in a bottom-up approach such as Eye Movement Desensitization and Reprocessing (EMDR) may respond to the question, “What can I do to manage my anxiety?”, with additional questions such as, “What is the negative belief you have about yourself when you feel anxiety?”, “What is the positive belief you would like to have about yourself when you feel anxiety?”, or, “If you float back… can you think of a time you noticed feeling similar to how you are feeling now?” Further, a Solution-Focused practitioner may respond with what is referred to as the “miracle question” (a common Solution-Focused Therapy intervention), i.e., “If you were to go to sleep… and in the middle of the night, a miracle happened, and you woke up tomorrow with all of your current problems removed, what would that look like?” A clinician trained in Sensorimotor Psychotherapy may ask, “Where do you feel anxiety in your body? What is happening inside that is telling you this is the feeling of being ‘anxious’? What sensations are you noticing?” And a practitioner trained in Internal Family Systems (IFS, often referred to as “parts work”) could respond to the question “What can I do to manage my anxiety?” with additional questions like, “What ‘part’ of you is online right now? How old is this part of you that is feeling anxious? How is this part trying to protect you, and what does this part of you need to feel safe?”

Challenges and Risks of Applying LLMs and AI to Therapy and Mental Health Care

While AI and LLMs are remarkable achievements in AI research, they also raise many questions and concerns about their implications and applications for human society, especially for therapy and mental health care. Some of the ethical, social, legal, and professional issues that arise from using LLMs and AI for therapy include:

  • Data quality and privacy: How can we ensure that the data used to train and evaluate LLMs are accurate, relevant, diverse, representative, and secure? How can we protect the sensitive information of users and therapists from unauthorized access or misuse? Data is the fuel that powers LLMs and AI, but it can also be the source of many problems. If the data is inaccurate, irrelevant, biased, or incomplete, it can affect the quality and reliability of the LLMs’ outputs. For example, if the data contains errors or inconsistencies, the LLMs may generate incorrect or misleading responses. If the data is skewed or unrepresentative, the LLMs may favor or exclude certain groups or individuals based on their characteristics. If the data is outdated or incomplete, the LLMs may miss or ignore important information or perspectives. Moreover, if the data is not secure or private, it can expose the users and therapists to potential harms. For example, if the data is hacked or leaked, it can reveal personal or confidential information about the users or therapists that could be used for malicious purposes. If the data is shared or sold without consent, it can violate the rights and interests of the users or therapists who provided it. Therefore, data quality and privacy are crucial issues that need to be addressed when using AI and LLMs for therapy and mental health care.

  • Toxicity and bias: How can we prevent or reduce the harmful outputs of LLMs, such as racist or sexist language? How can we detect or correct the biases of LLMs that may favor or exclude certain groups or individuals based on their characteristics? Toxicity and bias are two sides of the same coin that can undermine the trust and respect between users and therapists. Toxicity refers to the offensive or harmful language that LLMs may generate due to their exposure to negative or inappropriate data. For example, LLMs may use racist or sexist terms, insults, threats, or profanity that could hurt or offend users or therapists. Bias refers to the unfair or unequal treatment that LLMs may exhibit due to their learning from skewed or unrepresentative data. For example, LLMs may show preference or prejudice towards certain groups or individuals based on their race, gender, age, religion, etc., which could discriminate against or exclude users or therapists. Therefore, toxicity and bias are serious issues that need to be prevented or reduced when using AI and LLMs for therapy and mental health care (one simple programmatic safeguard is sketched after this list).

  • Reliability and consistency: How can we ensure that LLMs provide accurate and relevant responses that match the user’s input, mood, and progress in therapy? How can we ensure that LLMs maintain a coherent and logical flow of conversation that follows the user’s context and expectations? Reliability and consistency are essential factors that influence the effectiveness and satisfaction of users and therapists. Reliability refers to the accuracy and relevance of LLMs’ responses that reflect their understanding and interpretation of user input, mood, and progress in therapy. For example, LLMs should provide correct and helpful information or advice that align with user needs and preferences. Consistency refers to the coherence and logic of LLMs’ responses that maintain a smooth and natural flow of conversation that follows user context and expectations. For example, LLMs should provide clear and concise responses that connect to the user’s previous and current messages. Therefore, reliability and consistency are important issues that need to be ensured when using LLMs and AI for therapy and mental health care.

  • Ethical and social implications: How can we ensure that LLMs respect the values, principles, and responsibilities of ethical and professional practice? How can we ensure that LLMs enhance rather than replace the human role and relationship in therapy and mental health care? Ethical and social implications are complex and multifaceted issues that affect the outcomes and impacts of using AI and LLMs for therapy and mental health care. Ethical implications refer to the moral dilemmas or conflicts that arise from using LLMs that challenge the values, principles, and responsibilities of ethical and professional practice. For example, LLMs may pose questions such as: Should LLMs disclose their identity as non-human agents? Should LLMs obtain informed consent from users? Should LLMs report cases of abuse or harm? Should LLMs adhere to codes of ethics or standards of practice? Social implications refer to the social consequences or changes that result from these interactions. Therefore, ethical considerations and social implications are critical factors that require further evaluation.
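As a deliberately simplistic illustration of the safeguard mentioned in the toxicity bullet above, a deployment might gate every generated reply behind a safety check before it reaches the user. The blocklist and fallback message below are placeholders of our own; production systems would rely on trained safety classifiers and crisis-escalation protocols rather than keyword matching.

```python
# A deliberately simplistic sketch of gating model output behind a safety check.
# The blocklist and fallback message are placeholders; real deployments would
# use trained safety classifiers and crisis-escalation protocols.
BLOCKED_TERMS = {"blocked_term_1", "blocked_term_2"}  # placeholder terms
FALLBACK = ("I'm not able to share that response. If you're in crisis, please "
            "contact local emergency services or a crisis line.")

def is_unsafe(text: str) -> bool:
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)

def safe_reply(generate_reply, user_message: str) -> str:
    """Generate a reply, then suppress it if the safety check flags it."""
    reply = generate_reply(user_message)
    return FALLBACK if is_unsafe(reply) else reply
```

The hard part, of course, is not the gating pattern but the classifier behind it and the escalation path that follows a flag, which is precisely where clinical oversight is needed.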

In this post, we have explored the potential of AI and LLMs to transform therapy and mental health care in various ways. We have examined some of the current applications and studies that have leveraged these technologies for therapeutic purposes; we have also identified some of the key research questions and challenges that need to be addressed to improve the quality and reliability of LLMs for therapy.

We believe that AI and LLMs have the capacity to revolutionize therapy and mental health care by offering personalized and effective interventions and outcomes for patients. However, we also recognize that there are significant limitations and risks that need to be overcome before these technologies can be fully integrated into therapeutic practice. By advancing evidence-supported methodologies, collaborating with human experts, and evaluating our systems in realistic settings, we can work toward creating a therapeutic experience that is not only efficient and accessible, but also ethical and trustworthy for patients seeking mental health care services and treatment.

References

Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., ... & Zhang, Y. (2023). Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv preprint arXiv:2303.12712.

Buczynski, R. (2022). Infographic: Brain-based approaches to help clients after trauma. NICABM. https://www.nicabm.com/brain-based-approaches-to-help-clients-after-trauma/

Carek, P. J., Laibstain, S. E., & Carek, S. M. (2011). Exercise for the treatment of depression and anxiety. The International Journal of Psychiatry in Medicine, 41(1), 15–28. https://doi.org/10.2190/PM.41.1.c

Huang, J., Gu, S. S., Hou, L., Wu, Y., Wang, X., Yu, H., & Han, J. (2022). Large language models can self-improve. arXiv preprint arXiv:2210.11610.

Kosinski, M. (2023). Theory of mind may have spontaneously emerged in large language models. arXiv preprint arXiv:2302.02083.

Rao, H., Leung, C., & Miao, C. (2023). Can ChatGPT Assess Human Personalities? A General Evaluation Framework. arXiv preprint arXiv:2303.01248.

Shen, Y., Song, K., Tan, X., Li, D., Lu, W., & Zhuang, Y. (2023). HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace. arXiv preprint arXiv:2303.17580.

Shinn, N., Labash, B., & Gopinath, A. (2023). Reflexion: an autonomous agent with dynamic memory and self-reflection. arXiv preprint arXiv:2303.11366.

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Chi, E., Le, Q., & Zhou, D. (2022). Chain of thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903.
