Artificial Intelligence (AI) Learning Centre

Last Updated: October 3, 2024


A major AI revolution is here. Tools for all areas of primary care provision are available or in development, spanning clinical workflow, diagnosis, management, treatment, and beyond. We need to become literate in this new field, to distinguish truth from hype and critically evaluate products and services.

The aims of the AI Learning Centre are to: 

  • Provide trusted, practical information  to help Ontario’s primary care practitioners make informed decisions about using AI in clinical practice.
  • Promote complementary resources and partner with other trusted organizations to align messaging and reduce noise in the system. 
  • Monitor the space where AI meets primary care in Ontario, keeping practitioners updated with critical information and insights as they develop. 
  • Fill information gaps, demystify concepts, and debunk misconceptions where possible.  

Key messages

Expect change and embrace experimentation

Innovation cycles for AI tools are rapid, and machine learning tools improve both independently and with the help of developers and consumers. Capabilities, limitations, risks, and mitigation strategies continue to evolve. See AI Fundamentals for Practitioners to keep up to date with tech changes.

Take a balanced approach

Understanding the limitations and potential risks of AI tools is crucial. Not all aspects of care will benefit from AI, and traditional methods may still be preferable in many situations. See AI Tools in Primary Care to stay informed about accuracy, biases, and appropriate use cases for each AI tool you employ.

Ensure professional obligations are met

While AI is expanding into primary care in new ways, practitioner responsibilities remain the same: ensuring the accuracy and accountability of the health record, protecting the privacy and security of patient health data, and adhering to requirements for informed patient consent. See AI Tools in Primary Care and Ethical and Regulatory Landscape to keep up to date on guidance related to AI.

 

AI Fundamentals for Practitioners

For definitions of AI terms and additional context, see our AI Glossary.

Introduction

Recent advances in artificial intelligence (AI) are driving a boom in AI-assisted tools, particularly in the healthcare sector. This surge is reshaping the digital health landscape, offering new possibilities and challenges for clinicians and healthcare providers.

Technological leap
AI in the digital health spectrum

In the realm of digital health, algorithms have long been integral to tools supporting clinical decision-making. The integration of AI brings both opportunities and challenges:

  • AI tools offer advanced capabilities and adaptability, pushing the boundaries of what’s possible in healthcare technology. However, they also present unique challenges, particularly around transparency and potential biases. 
  • In contrast, traditional non-AI digital health tools provide more predictable and transparent performance but lack the flexibility and depth of AI solutions. 

As we explore the fundamentals of AI for clinicians, it’s crucial to understand these technologies, their potential applications, and their limitations in healthcare settings.

Basic AI concepts

Machine Learning

Machine Learning (ML) is a subset of artificial intelligence rooted in training machines or programs on existing data. Once training is complete, the machine or program can apply what it learned to new, unseen data to identify patterns, make predictions, or execute tasks.
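As a minimal sketch of this train-then-apply pattern (using the scikit-learn library and invented, non-clinical toy data; the feature values and labels below are illustrative only):

    # Train a model on existing labelled examples, then apply it to unseen data.
    # The numbers and labels are synthetic and illustrative only.
    from sklearn.linear_model import LogisticRegression

    X_train = [[34, 118], [61, 150], [47, 132], [72, 165], [29, 110], [55, 142]]  # toy feature rows
    y_train = [0, 1, 0, 1, 0, 1]                # toy labels (1 = flagged for follow-up)

    model = LogisticRegression()
    model.fit(X_train, y_train)                 # "training" on existing data

    print(model.predict([[68, 158]]))           # applying learned patterns to a new, unseen example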

Explainability, interpretability, and the challenge of the “black box” 

Interpretability is a central concern for applications powered by machine learning. Many machine learning models are “black boxes”, with inner workings so complex that it is impossible for algorithm designers, engineers, and users to know precisely how the model came to a specific result. 

While related, interpretability and explainability are distinct concepts in machine learning. 

  • Interpretability is the extent to which users can grasp the reasoning behind an algorithm’s decision. It measures how accurately an AI’s output can be predicted by human users.  
  • Explainability, also called explainable AI (XAI), is the collection of processes and techniques designed to help human users understand and have confidence in the results produced by machine learning algorithms. 

Not all machine learning algorithms are black boxes – for example, decision tree algorithms follow a branching sequence of interconnected decisions, which can be visually represented as a tree diagram and are more straightforward to audit.
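For instance, the learned rules of a small decision tree can be printed and audited directly (a toy sketch using scikit-learn; the feature names and values are invented for illustration):

    # Decision trees are among the more interpretable ML algorithms:
    # the learned branching rules can be exported and reviewed as plain text.
    from sklearn.tree import DecisionTreeClassifier, export_text

    X = [[5.2, 1], [7.8, 0], [6.1, 1], [9.4, 0], [4.9, 1], [8.7, 0]]   # toy feature rows
    y = [0, 1, 0, 1, 0, 1]                                             # toy labels

    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
    print(export_text(tree, feature_names=["feature_a", "feature_b"]))
    # Prints a readable series of if/then branches, e.g.:
    # |--- feature_a <= 6.95
    # |   |--- class: 0
    # |--- feature_a >  6.95
    # |   |--- class: 1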

However, even interpretable algorithms can become black boxes due to complexity and scale – for instance, deeply complex prediction models for chronic health conditions may combine so many interconnected decisions that tracing any single result is no longer practical.

Example: ChatGPT (OpenAI) is a black box

Training data has not been disclosed and there is no functionality to explain how a given result was produced.

Natural Language Processing (NLP) and Large Language Models (LLMs)

Natural Language Processing (NLP) and Large Language Models (LLMs) are two interrelated areas of artificial intelligence that are revolutionizing how computers understand and generate human language.  

NLP is a branch of AI that focuses on the interaction between computers and human language. It enables machines to understand, interpret, and generate human language in a valuable way. 

LLMs are a type of generative AI model trained on vast amounts of data. Using Natural Language Processing (NLP), they can produce original content, such as text, images, audio, or video, based on a user’s prompt or request.

Natural Language Processing (NLP)
  • Handle administrative tasks
  • Uncover hidden patterns, trends, and relationships in text data
  • Automate processing and organization of large volumes of unstructured text data
  • Build a knowledge base for swift retrieval of organizational information
  • Skew results based on biases present in training data
  • May struggle with terms, accents, dialects, or ways of speaking not part of its training data
  • May struggle with new words and changing grammar conventions
  • Often misses nuances in tone, stress, sarcasm, and body language, complicating semantic analysis

Large Language Models (LLMs)
  • Analyse and infer context across different data types  
  • Produce realistic and persuasive outputs.   
  • Generate original content.   
  • Learn from corrections of human users and trainers   
  • Improve and optimize performance over time
  • Struggle with concepts not part of their training data
  • Hallucinate and invent information
  • Parrot biases present in training data
  • Persuasive tone can invite over-trust and over-reliance
  • Not designed to know if outputs are accurate or inaccurate
  • Skill sets are generalized rather than expert
Machine learning (ML) classification 

Classification algorithms in machine learning aim to categorize data accurately. They identify patterns in existing data to determine how to label or define previously unseen examples.

Details
  • Efficiently process, analyze and classify large volumes of data
  • Identify complex, non-obvious patterns in datasets
  • Can adapt to various data types
  • Require substantial, high-quality training data to perform effectively
  • Analysis affected by biases or underrepresented categories in training data
  • Can be misled by irrelevant or “noisy” data points
  • Can focus too much on specific training examples rather than general principles
Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) systems convert human speech into text through a highly complex process involving linguistics, mathematics, and statistics. The performance of speech recognition technology is measured by its word error rate (WER) and speed. Factors like pronunciation, accent, pitch, volume, and background noise can affect ASR systems’ performance.
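A minimal sketch of how WER can be computed, using a word-level edit distance (the reference and hypothesis transcripts below are invented):

    # Word error rate (WER) = word-level edits needed to turn the ASR output
    # back into the reference transcript, divided by the number of reference words.
    def word_error_rate(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.split(), hypothesis.split()
        # d[i][j] = edit distance between the first i reference words and first j hypothesis words
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1          # substitution
                d[i][j] = min(d[i - 1][j] + 1,                       # deletion
                              d[i][j - 1] + 1,                       # insertion
                              d[i - 1][j - 1] + cost)
        return d[len(ref)][len(hyp)] / len(ref)

    # One substituted word out of six reference words -> WER of about 0.17 (17%)
    print(word_error_rate("take one tablet twice daily please",
                          "take one tablet twice weekly please"))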

Details
  • Capture, recognize, and interpret voices
  • Identify different people in conversations
  • Follow conversational patterns
  • Learn from corrections of human users and trainers
  • Improve and optimize performance over time

 

  • Susceptible to environmental conditions – placement of microphones, background noise 
  • May struggle with terms, accents, dialects, or ways of speaking not part of its training data
  • May struggle with fragmented and non-linear conversations  
Optical character recognition (OCR)

Using automated data extraction, optical character recognition (OCR) converts images of text into a format readable by a computer. It identifies individual letters in an image, assembles them into words, and then forms those words into sentences. Using both hardware and software, OCR systems transform physical, printed documents into text that a computer can read.
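As a rough sketch of how OCR is typically invoked from code (this example assumes the open-source Tesseract engine with the pytesseract and Pillow Python packages, none of which are named in this resource; the file name is hypothetical):

    # Minimal OCR sketch: convert an image of printed text into machine-readable text.
    # Requires a local Tesseract installation plus the pytesseract and Pillow packages.
    from PIL import Image
    import pytesseract

    scanned_page = Image.open("scanned_letter.png")           # hypothetical scanned document
    extracted_text = pytesseract.image_to_string(scanned_page)
    print(extracted_text)                                     # letters -> words -> sentences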

Details
  • Processes scanned documents, camera images, and image-only PDFs.
  • Recognizes and organizes letters into words and sentences.
  • Allows access to and editing of the original content, saving time and reducing redundancy of manual data entry.
  • Advanced systems can extract information in challenging conditions like unusual fonts, low resolution, poor lighting, and varied colours and backgrounds.
  • When powered by generative AI Generative AI (or GenAI) produces original content based on a user’s prompt or request, such as text, images, audio, or video. , advanced OCR systems can assist in structuring document data even more quickly.
  • Accuracy depends on the quality of the original image.
  • Can be limited by the complexity and formatting of documents.
  • Requires significant computing power for large volumes of data.
  • May need additional software to integrate with other systems.
  • May struggle with handwriting or poorly printed text. For example, difficulty in interpreting handwritten notes and medical records due to handwriting variability can cause errors or incomplete data extraction, affecting the accuracy and reliability of digitized medical information.
Retrieval Augmented Generation (RAG)

Retrieval-Augmented Generation enhances LLM-generated responses by incorporating external sources of knowledge, enriching the model’s internal understanding. This approach allows the system to access the most current and reliable data while providing users with source references to verify the accuracy of its claims. 
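A highly simplified sketch of the retrieve-then-generate pattern (toy keyword retrieval over an in-memory source list; a production system would use embeddings, a vector database, and an LLM call, none of which are specified here):

    # Toy retrieval-augmented generation (RAG) sketch:
    # 1) retrieve the most relevant source passage, 2) ground the prompt in that passage,
    # 3) pass the grounded prompt to an LLM (the model call itself is omitted here).
    sources = {
        "guideline_2023.pdf": "Annual foot exams are recommended for patients with diabetes.",
        "newsletter_2021.txt": "Clinic parking arrangements changed in March.",
    }

    def retrieve(question: str) -> tuple:
        """Return the (document_id, passage) with the most word overlap with the question."""
        q_words = set(question.lower().split())
        return max(sources.items(),
                   key=lambda item: len(q_words & set(item[1].lower().split())))

    question = "How often should patients with diabetes have foot exams?"
    doc_id, passage = retrieve(question)

    prompt = (f"Answer using only the source below and cite it.\n"
              f"Source ({doc_id}): {passage}\n"
              f"Question: {question}")
    print(prompt)   # the grounded, traceable prompt an LLM would receive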

Details
  • Improves the accuracy of responses by integrating real-time, external data sources, ensuring the information is current and reliable and mitigating limitations of training data’s fixed cutoff date. 
  • Can generate answers that are more contextually appropriate and aligned with the user’s query by grounding responses in specific, relevant documents. 
  • Provides traceable references for the information used in its responses, allowing users to verify the accuracy and reliability of the content. 
  • The quality of RAG outputs heavily depends on the reliability and relevance of the external sources it accesses. Poor-quality data can lead to inaccurate or misleading responses. 
  • Implementing RAG requires significant computational resources and can be complex to integrate with existing LLMs, potentially increasing latency and costs. 
  • While RAG enhances response accuracy by grounding in external data, it may struggle with generating insights or inferences that go beyond the explicit content of the retrieved documents. 
  • The model may rely too heavily on the retrieved documents, which could limit its ability to generate more creative or generalized responses. 
Computer vision

Computer vision is a broad discipline that employs machine learning and neural networks to enable computers and systems to extract information from digital images, videos, and other visual inputs. It allows these systems to make recommendations or take actions based on their visual analysis.

Details
  • Extracts meaningful information from pictures and videos.
  • Spots and names objects in visual content.
  • Identifies specific individuals in images or video.
  • Detects irregularities in visual inputs.
  • Needs a large dataset of correctly labeled examples to learn effectively.
  • Susceptible to environmental conditions, with performance potentially affected by changes in lighting, viewing angles, or objects that are partially hidden.
  • Has difficulty understanding broader context beyond what is visually ‘present’
  • Requires substantial computing power to process information and learn.
  • May reflect biases present in its training data, potentially leading to unfair or inaccurate results.

Putting it together

Current AI-powered tools often combine technologies to achieve high performance in specific tasks (what’s known as ensemble models).  

Improving performance for healthcare applications 

Tools designed for healthcare are strongly recommended over generic tools for use in clinical workflows. Custom pre-training and fine-tuning significantly improve performance in specialised environments such as healthcare:

How design impacts performance

Physical environment

Healthcare AI tools are tailored to operate in busy healthcare environments. 

Generalized AI tools may not function well in busy or unpredictable environments – with varied lighting, many people talking or moving, and background noise. 

Healthcare terminology

Healthcare AI tools are trained to recognize specialized healthcare terminology, non-English languages and accents.  

Generalized AI tools may struggle with medical terms, accents, dialects, or ways of speaking not part of their training data. 

Understand context

Healthcare AI tools are trained to interpret conversations and data products typical in healthcare settings. 

Generalized AI tools may struggle with fragmented or non-linear conversations and healthcare-specific data formats.

Optimize clinical accuracy

Healthcare AI tools prioritize reducing errors that impact clinical accuracy and patient safety. 

Generalized AI tools typically treat all errors equally, without considering the critical nature of healthcare information.  

Regulatory compliance

Some healthcare AI tools may be developed with healthcare regulations in mind to ensure data privacy and security. 

Generalized AI tools may not inherently comply with healthcare-specific regulatory requirements. 

Accuracy and performance 

Reliable, up-to-date information about accuracy rates for specific AI-powered tools is scarce. Often, accuracy information is included within promotional materials produced by the product developers, and a lack of transparency makes it difficult to assess the validity of advertised performance (HAI, 2024).

At this time, the best way to assess the accuracy of an AI tool in clinical use is to:  

  • Use a well-known product advertised specifically for use in clinical documentation.
  • Solicit information on accuracy from colleagues who are using the product.
  • Try the product in practice and do your own assessment of accuracy.
Current limitations in assessing performance of AI models

As artificial intelligence (AI) tools become increasingly prevalent in healthcare, practitioners should be aware of several key challenges in evaluating their true capabilities and limitations.

  • Training Bias: AI models may be trained on standardized test data, skewing test results. For example, high MCAT scores of an LLM might reflect exposure to practice tests in training, not true medical understanding.
  • Lack of Transparency: Most private companies developing foundation models do not disclose training data. It’s difficult to trust an AI tool’s recommendations without knowing what it was trained on.
  • Inconsistent Standards: There are no standards for AI training or performance in healthcare. Comparisons between tools are unreliable, which can make selection of a product challenging.
  • “Overfitting”: Overfitting refers to when an AI model is over-trained on its data set and has difficulty extrapolating to new data or information. This results in AI models that excel in test scenarios but falter with real patient complexities.
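A toy illustration of overfitting as described above (synthetic data via scikit-learn; an unconstrained decision tree memorizes its training set but generalizes poorly to held-out data):

    # Overfitting sketch: near-perfect accuracy on training data, noticeably lower on unseen data.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                               flip_y=0.2, random_state=0)   # noisy synthetic data
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    deep_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    print("training accuracy:", deep_tree.score(X_train, y_train))   # close to 1.0 (memorized)
    print("test accuracy:    ", deep_tree.score(X_test, y_test))     # noticeably lower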

    AI Tools in Primary Care: Applications and practical considerations


    Amid an ever-expanding AI tool landscape, it is critical to understand the corresponding risk levels and practical considerations when seeking to implement an AI-powered solution into clinical workflow. 

    What can I do right now?

    Starting with low-risk, non-clinical tasks is a good approach to gain understanding of and confidence in how AI-powered tools function in clinical workflow. While there are many different products available, the challenges of AI in healthcare are significant (see AI in CDSS), and clinicians should consider the risks inherent in engaging with AI tools for various tasks.

    For more detailed examples of applications, see Examples of AI applications in primary care.

    Though use of AI in primary care is evolving, practitioners have medico-legal responsibilities when it comes to the use of AI in clinical practice. For a current, detailed overview of issues related to privacy, bias, and consent, see the Navigating AI in healthcare webinar (60 min – MainPro+ certified) (CMPA, June 4, 2024), or AI Webinar—Important takeaways (CMPA, 2024).

    AI in primary care: Risk spectrum

    Tasks in primary care range from low-risk administrative functions to high-risk clinical decision support, with risk levels generally correlating to the degree of clinical involvement. As risk increases, so should the level of management, mitigation, and oversight, including stricter protocols, mandatory clinician review, regular audits, and clear escalation procedures. 

    Key Terms

    Large language models (LLMs), Natural Language Processing (NLP), Automatic Speech Recognition (ASR).
    For detailed information about these and other types of AI models and algorithms, see Fundamentals for Practitioners. For definitions of terms, see the AI Glossary.

    Varied risk levels in AI clinical assistant products

    Private clinical software companies are producing comprehensive AI-powered clinic assistant suites that execute a range of tasks with varying degrees of risk. For safe implementation using clinical assistant suites, practitioners should:

    • Understand the capabilities and limitations of each component in the suite.
    • Evaluate each component individually for risk level and appropriate use.
    • Implement tailored strategies for different risk levels.
    • Regularly reassess tool performance and risk profiles.

      Examples of AI applications in primary care and their associated risk levels

      Spotlight: AI scribes 

      • A recent small (n=38) pre-post observational study showed that use of an ambient clinical transcription tool over five weeks resulted in a significant reduction in practitioner burnout rate (from 69% to 43%), as assessed using the Stanford Professional Fulfillment Index (PFI) (medRxiv, July 19, 2024).
      • Early research showed use of medical scribes reduced after-hours documentation time by up to 50% and enabled clinicians to see 12 additional patients per month (Government of Ontario, April 2024).
      • Results from a 2023 preliminary pilot conducted by OntarioMD (OMD) and supported by the Ontario Medical Association (OMA) showed that use of an AI medical scribe product reduced cognitive load and administrative burden, increased interaction time with patients, enhanced engagement, and improved the accuracy of documentation details during patient visits (OMD, 2024).
      • In 2024, the Ontario Ministry of Health and Ontario Health funded an AI scribe evaluation study by OMD, the eHealth Centre of Excellence (eCE), and the Women’s College Hospital Institute for Health System Solutions and Virtual Care (WIHV). Including patients, clinicians, and AI scribe vendors, the study aimed to help clinicians and patients adopt AI scribes efficiently, enhancing practice without adding burden. Detailed results are expected to be released later in 2024 (OMD, 2024; eCE, 2024). 
      Examples of AI medical scribe outputs 

      Depending on the product, AI medical scribes generate valuable outputs for clinical workflow, including: 

      • SOAP notes for EMRs 
      • Referral letters and requests for consult
      • Insurance documentation
      Effective use: Practical guidance from clinicians

      Emerging evidence: Practical applications

      Demographic disparities in medical LLMs (medRxiv, Sept 9, 2024)

      Takeaway: Current and past medical LLMs continue to have pervasive issues with training data bias that affects the fairness of AI performance. While mitigation strategies and bias detection are improving, there needs to be further validation of algorithms and strategies to mitigate bias in medical LLMs. 

      • 15 of 16 studies identified gender disparities, where language used or instructions given reflected stereotypes and traditional gender roles. For example, one study found that recommendation letters produced for a male audience included “agentic” terms promoting assertiveness, independence, etc., at a significantly higher rate than for women. Meanwhile, letters produced for women used more “communal” language.
      • 10 of 11 studies exhibited racial or ethnic bias that influenced treatment recommendations, language use, and diagnostic accuracy. For example, one study’s assessment of GPT-4 found that when compared to groups of “European” descent, the model recommended advanced imaging at lower rates for patients from underrepresented racial groups. 

      Recent emerging evidence

      Evaluation of AI Solutions in Healthcare Organizations – the OPTICA tool (NEJM AI Aug 14, 2024).

      Takeaway: The OPTICA (Organizational PerspecTIve Checklist for AI solutions adoption) framework lays out a requirements list for confidence in the use of AI-powered tools in a clinical setting.

      CEP note: In the current landscape of what’s possible with AI tools, it is unlikely that any clinical AI tool available now will meet this or similar criteria. For full confidence in clinical use, confirmation and validation that input data and development processes are robust, unbiased, and well-documented is necessary, and there are significant technological barriers to overcome before that is widely possible. See AI in CDSS: Challenges and considerations for more information.

      Overview: The OPTICA (Organizational PerspecTIve Checklist for AI solutions adoption) framework is designed to address the gap between theoretical evaluation frameworks and the practical needs of healthcare organizations. OPTICA emphasizes the importance of evaluating AI solutions against the specific data, population, and workflow contexts of the implementing organization, recognizing that AI solutions may perform differently outside their original development environment. Domains include Clinical Need Specification, Data Exploration, Development & Performance Evaluation, and Deployment & Monitoring Plan.

      Evaluation and mitigation of the limitations of large language models in clinical decision-making. Nat Med (2024).

      Takeaway: In a clinical simulation, recent available LLMs including healthcare-aligned models (Llama 2, OpenAssistant, WizardLM, Clinical Camel and Meditron) performed significantly worse than human clinicians on a variety of common clinical tasks. Aggregated across all diseases and models, neither of the two healthcare-aligned models performed significantly better than generic models.

      Study overview and key results: Using de-identified patient data from MIT’s MIMIC-IV database as the basis for the clinical simulation, LLMs were evaluated on pre-identified criteria including diagnostic accuracy, guideline adherence, clinical workflow integration, and decision-making.

      • Diagnostic accuracy rates: LLMs showed poor diagnostic accuracy, ranging from 45.5% to 53.9%, compared with 87.5% to 92.5% accuracy for human clinicians.
      • Lab result interpretation: Given test results and a reference range, LLMs were asked to classify results as below, within, or above range. While a simple task for humans, the performance of the LLMs was surprisingly low and highly inconsistent. In identifying results below and above the given range, accuracy across LLMs ranged widely, from 25% to 77%.
      • Errors and hallucinations: Error and hallucination rates were high, with errors noted every 2-4 patients, and hallucinations every 2-5 patients.

        AI in CDSS: Challenges and considerations

        Use of non-explainable, non-transparent CDSS in clinical practice presents serious risks. Without insight into the underlying evidence or the ability to trace recommendation logic, use of one of these systems would compromise clinicians’ medico-legal responsibilities and jeopardize patient care. 

        Clinical decision support systems (CDSS) are typically categorized as either knowledge-based or non-knowledge based. 

        • Knowledge-based CDSS function by employing rules (if-then statements). The system pulls data to assess the rule and generates a corresponding action or outcome. These rules can be derived from literature-based, practice-based, or patient-specific evidence. 
        • Non-knowledge-based CDSS still require a data source but make decisions using AI, machine learning, or statistical pattern recognition instead of following predefined expert medical knowledge.  
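        A minimal sketch of the if-then pattern used by knowledge-based systems described above (the rule, threshold, and field names are invented placeholders, not clinical guidance):

            # Knowledge-based CDSS sketch: pull data, evaluate predefined if-then rules,
            # and generate a corresponding action (here, an alert). Entirely illustrative.
            def check_rules(patient: dict) -> list:
                alerts = []
                # Rule: IF the monitored lab value is below a defined threshold
                #       AND a flagged medication is active, THEN raise an alert.
                if patient["lab_value"] < 30 and "flagged_drug" in patient["active_meds"]:
                    alerts.append("Review flagged_drug: lab_value is below the threshold of 30.")
                return alerts

            print(check_rules({"lab_value": 24, "active_meds": ["flagged_drug", "other_drug"]}))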

        A significant challenge of non-knowledge-based CDSS is the “black box” effect. Deep learning models are highly complex, and even a model’s developers and engineers cannot trace precisely how the AI generated a response. Furthermore, many private companies are not transparent about training data, leaving users blind to even the information sources the AI is using to generate responses. 

        Assessing AI-CDSS products 

        Currently, explainable or transparent systems are not required by law. Users must rely solely on vendor-supplied information. Avoid use of non-explainable, non-interpretable AI-powered CDSS in clinical practice.

        A note on evidence:

        Many products advertise use of “best” or “highest quality” evidence. As a standalone statement, this is not sufficient to engender trust. Confidence in evidence can be related to currency, context, curation, risk of bias, generalizability, and myriad other factors.

        Does a vendor get specific about which evidence is used, how it is selected, and how often it is updated?  If the answer is NO, the risk of using such a tool in practice is high.

        Is the evidence source a “walled garden,” or are there other inputs (such as an LLM) that could add an additional and possibly inaccurate layer of interpretation? If the answer is YES, the risk of using such a tool in practice is high.

        Ethical and Regulatory Landscape

        Ethical landscape

        This section will be updated on an ongoing basis to track bias and potential mitigation strategies in AI-powered healthcare tools. 

        Biases in AI: Implications for primary care

        AI models are influenced by biases from a variety of sources:

        • Limitations of model types
        • Biases and lack of diversity in pre-training data and fine-tuning process
        • Biases and lack of diversity of developers and human users

        Consumers of AI products should expect that products do contain and perpetuate biases.

        In healthcare, AI biases have the potential to result in considerable harms. 

        Spotlight: Impact of biases in training data  

        The data that AI models draw on is not better than existing data sources, and even high-quality health population data and data from academic or scholarly literature have known biases and limitations (Oxford CEBM, 2024). Furthermore, without transparency in training data, there is no way to know what data sources are used to inform an AI model. It is safe to assume that datasets used by AI models over-represent certain demographic groups, under-represent others, use flawed proxies, and articulate patterns or trends that are in themselves flawed. 

        In primary care, this could translate to:  

        • Diagnostic Errors: Biased AI systems may lead to missed or delayed diagnoses, particularly for underrepresented patient groups.
        • Treatment Disparities: AI-driven treatment recommendations might not be equally effective across all patient populations.
        • Communication Barriers: Biased language models or speech recognition systems could impede effective communication with diverse patient populations.
        • Reinforced Inequities: Unchecked AI bias could exacerbate existing health disparities and inequities in healthcare access and outcomes.
        • Erosion of Trust: If patients perceive or experience bias in AI-assisted care, it could damage trust in their healthcare providers and the healthcare system overall.
        Movements to incorporate ethics into the design and use of AI 

        Responsible AI refers to principles guiding the design, development, deployment, and use of AI to foster trust and empower organizations and collaborators. It addresses the broader societal impacts of AI systems, ensuring they align with social values, legal standards, and ethical principles. The aim is to integrate these ethical guidelines into AI applications and workflows, thereby reducing risks and negative outcomes while enhancing positive results. 

        Explainable AI (XAI) involves methods and processes that allow human users to comprehend and trust the outcomes of machine learning algorithms. It clarifies an AI model’s anticipated impact, possible biases, accuracy, fairness, transparency, and decision-making results. XAI is crucial for fostering trust and confidence in AI models and promoting a responsible approach to AI development, especially as AI becomes more sophisticated and less understandable. 

        Regulatory landscape 

        Currently, AI is not regulated in Canada, though health regulations apply to certain AI uses.  

        Non-binding principles 
        Binding regulations (proposed) 

        AI Glossary


        AI biases result from human biases captured in training data or algorithms. They will be reflected in responses and may be overt or subtle and difficult to detect.

        AI Models are used to make decisions or predictions.

        Algorithms define the logic by which an AI model operates.

        Artificial intelligence (AI) is technology that allows computers and machines to mimic human intelligence and problem-solving abilities. It can be used to execute tasks that would otherwise require a human. Examples include GPS or self-driving cars.

        Automated data extraction uses an automated process to transform unstructured or semi-structured data into structured information.  

        Automated describes systems in which machines use predefined human instructions to perform repetitive tasks.

        Automatic speech recognition (ASR) systems convert human speech into text through a highly complex process involving linguistics, mathematics, and statistics. The performance of speech recognition technology is measured by its word error rate (WER) and speed. Factors like pronunciation, accent, pitch, volume, and background noise can affect ASR systems’ performance. 

        Black box refers to machine learning models with inner workings so complex that it is impossible for algorithm designers, engineers, and users to know precisely how the model came to a specific result.

        Classification algorithms in machine learning aim to categorize data accurately. They identify patterns in existing data to determine how to label or define previously unseen examples. 

        Computer vision is a broad discipline that employs machine learning and neural networks to enable computers and systems to extract information from digital images, videos, and other visual inputs. It allows these systems to make recommendations or take actions based on their visual analysis. 

        Deep learning arranges machine learning algorithms in layers to form neural networks. Deep learning has had significant breakthroughs in recent years, and most of the AI products consumers interact with today are powered by deep learning models.

        Ensemble models are products that combine multiple base AI models together.

        Explainability is the collection of processes and techniques designed to help human users understand and have confidence in the results produced by machine learning algorithms.

        Explainable AI (XAI) involves methods and processes that allow human users to comprehend and trust the outcomes of machine learning algorithms. It clarifies an AI model’s anticipated impact, possible biases, accuracy, fairness, transparency, and decision-making results. XAI is crucial for fostering trust and confidence in AI models and promoting a responsible approach to AI development, especially as AI becomes more sophisticated and less understandable. 

        Forced hallucination is a term for when a user attempts to use an LLM in ways contrary to how they are designed to work, which forces the model into a position where their limitations are more stark. For instance, LLMs are not designed to be accurate or to “find” information.  Asking an LLM-powered chatbot for data or statistics that are impossible to know may result in a response that it doesn’t know but could also result in the model inventing an answer.

        Foundation models are large-scale, using vast amounts of training data to facilitate use across a variety of contexts.

        Generative AI, or GenAI, can produce original content based on a user’s prompt or request, such as text, images, audio or video.

        Hallucinations are specific to LLMs and refer to when the model’s response to a prompt is nonsensical or inaccurate. Hallucinations can be difficult or impossible to detect if the subject is outside the prompter’s realm of expertise.

        Interpretability is the extent to which users can grasp the reasoning behind an algorithm’s decision. It measures how accurately an AI’s output can be predicted by human users.

        Large Language Models (LLMs) are a type of generative AI model trained on vast amounts of data. Using Natural Language Processing (NLP), they can produce original content, such as text, images, audio, or video, based on a user’s prompt or request.

        Machine Learning (ML) is a subset of artificial intelligence rooted in training machines or programs on existing data. Once training is complete, the machine or program can apply what it learned to new, unseen data to identify patterns, make predictions, or execute tasks.

        Machine learning classification (MLC) uses algorithms that aim to categorize data accurately. These algorithms identify patterns in existing data to determine how to label or define previously unseen examples.

        Multimodal models analyze and produce many different types of data. Current LLM chatbots such as ChatGPT and Gemini are multimodal – they can read and analyze many types of data (text, video, audio, images), and produce other data types as part of requested outputs, such as taking a text input and producing an infographic.

        Natural Language Processing (NLP) combines computational linguistics, statistical modeling, machine learning, and deep learning to enable computers to understand and generate realistic humanlike text and speech.

        Neural networks are complex, layered algorithms that mimic the structure of human brains. Nodes in neural networks perform calculations on numbers passed between them on connective pathways.

        Optical character recognition (OCR) is a technology using automatic data extraction and both hardware and software systems to convert images of text into a format readable by a computer. It identifies individual letters in an image, assembles them into words, and then forms those words into sentences, thus transforming physical, printed documents into text that a computer can read. 

        Overfitting in machine learning happens when an algorithm models the training data too precisely, making it ineffective at predicting or interpreting new data. This undermines the model’s utility, as the ability to generalize to new data is crucial for making accurate predictions and classifications. 

        Retrieval-augmented generation enhances LLM-generated responses by incorporating external sources of knowledge, enriching the model’s internal understanding. This approach allows the system to access the most current and reliable data while providing users with source references to verify the accuracy of its claims. 

        Responsible AI refers to principles guiding the design, development, deployment, and use of AI to foster trust and empower organizations and collaborators. It addresses the broader societal impacts of AI systems, ensuring they align with social values, legal standards, and ethical principles. The aim is to integrate these ethical guidelines into AI applications and workflows, thereby reducing risks and negative outcomes while enhancing positive results. 

        Timeboxing refers to the fact that machine learning models’ training data is only current up to a certain date. Foundation models require a massive amount of time, resources, and computing power, and can cost hundreds of millions to nearly a billion dollars to train. Thus, continuous training is not feasible. For more information on the costs to develop foundation models, see 2024 AI Index Report (Stanford) under Top Resources.

        Training data is used by machine learning systems to learn how to recognize patterns and generate writing. The quality of training data, plus additional learning, directly impacts the quality of system outputs. 

          • For ASR, training data that contains varied voices, accents, conversation styles, background noise levels, etc., will improve the model’s performance. Models that have not been trained with diverse data will struggle to accurately capture conversations that include sounds, accents, or manners of speaking they were not trained on.
          • For LLMs, training data that includes biased, discriminatory, or violent content will be reflected in model reasoning and outputs, in ways that may or may not be clear in any given conversation. Foundation models undergo extensive fine-tuning after training to attempt to surface, correct, and eradicate harmful vectors and concepts.
          • Currently, private development companies keep details of training data secret, so consumers have no way to discover what data current frontier models were trained on.

        Transformer models are advanced neural networks designed to handle data such as words or sequences of words. They work by converting input sentences into numerical values that capture their meaning. These values are processed through multiple layers, each refining the data by focusing on different aspects of the sentence. This method allows transformer models to effectively translate languages and perform other complex tasks by learning patterns and relationships within the data. 

        Unimodal models can only read and produce a single type of data, such as an LLM that can only read and produce text.

        Weight refers to how LLMs map the probability of different possible word or concept pairings. The more frequently words or concepts are paired together, the more weight the model will give them.

        Word error rate (WER) is the percentage of words inaccurately translated by the ASR system. WER counts all words equally in an exchange. For specialized fields, such as medicine, the use of specialized WERs can help fine-tune the model to ensure high performance for capturing critical health information. Healthcare WERs include Medical Concept WER (MC-WER), Doctor-Specific Word Error Rate (D-WER), or Patient-Specific Word Error Rate (P-WER).

        Word vectors are created by LLMs during training to map how words are related. They are created to capture relationships between words or concepts and extrapolate to identify analogies.
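        A toy numeric illustration of the analogy idea (the three-dimensional vectors below are invented; real word vectors have hundreds or thousands of dimensions):

            # Toy word-vector analogy: king - man + woman lands closest to "queen".
            # These tiny 3-dimensional vectors are invented for illustration only.
            import numpy as np

            vectors = {
                "king":  np.array([0.9, 0.8, 0.1]),
                "queen": np.array([0.9, 0.1, 0.8]),
                "man":   np.array([0.1, 0.9, 0.1]),
                "woman": np.array([0.1, 0.1, 0.9]),
            }

            target = vectors["king"] - vectors["man"] + vectors["woman"]
            closest = max(vectors, key=lambda w: np.dot(vectors[w], target)
                          / (np.linalg.norm(vectors[w]) * np.linalg.norm(target)))
            print(closest)   # prints "queen"; the vector arithmetic recovers the analogy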

        Top Resources

        Join the conversation