Giving Residents Feedback at Scale Without More Shadowing

Communication skill is not absorbed through osmosis. It develops through feedback on real encounters—repeated, specific, behavior-level feedback delivered close enough in time to the interaction that the resident can still connect the comment to what actually happened. Every experienced program director knows this. Most also know how rarely it actually occurs.

The gap between what good communication development requires and what residency programs can realistically deliver is not a failure of intent. It is a structural problem. The gold standard for assessing resident communication—direct observation of a clinical encounter—is faculty-time-intensive by design, difficult to schedule without disrupting patient care, and therefore sparse. Programs end up with Milestone ratings that aggregate impressionistic feedback from dozens of brief encounters, and residents whose communication development is shaped more by self-correction and osmosis than by deliberate coaching.

This article is not an argument that direct observation should be abandoned. It is an argument that direct observation alone was never a sufficient system—and that GME programs can do better without simply demanding more shadowing hours from an already stretched faculty.

The shadowing bottleneck

Direct observation with tools like the mini-CEX is well validated. Norcini and colleagues examined the mini-CEX's measurement characteristics across 388 encounters in internal medicine training programs, demonstrating that it could reliably distinguish performance levels and yield actionable feedback when used well.¹ The problem is not the tool. It is how rarely the tool gets used.

Research on observation rates paints a consistent picture. Studies have found that even when faculty and residents share a clinical session, direct observation of resident performance accounts for less than 5 percent of that shared time.² Some surveys of residents report that a substantial proportion were never directly observed performing a history and physical during their training. In internal medicine, one program reported approximately four scheduled direct observations per resident per academic year.² In a training program that spans hundreds of patient encounters, four observations is a very thin sample of a resident's work.

The ACGME Common Program Requirements require faculty to directly observe, evaluate, and frequently provide feedback on resident performance during each rotation.³ The spirit of that requirement is clear. The gap between the requirement and what most programs actually achieve is equally clear, and it is documented.

Three factors drive the bottleneck. First, there is the basic arithmetic of faculty time: a program director or attending supervising multiple residents across a busy clinical service cannot shadow each of them without compromising other responsibilities. Second, scheduling a formal direct observation encounter adds coordination overhead—reserving time, selecting an appropriate patient interaction, ensuring the faculty observer is positioned to watch without intruding. Third, there is the Hawthorne effect: being watched changes behavior. Research on direct observation in clinical settings confirms that clinicians alter their conduct when they know an observer is present, raising the question of whether a scheduled mini-CEX captures what a resident typically does or what they do when they know they are being evaluated.⁴

Why feedback ends up sparse, delayed, and generic

The observation bottleneck has downstream consequences for feedback quality that programs often underestimate.

When direct observation is rare, there are few discrete behavioral moments to reference. Faculty end up giving feedback based on chart notes, verbal summaries from other team members, and whatever impressions accumulated over a rotation. The result is feedback that tends toward the general: "your communication with patients is good" or "you need to work on your delivery of bad news." Neither of those comments gives a resident anything specific to act on. Neither is tied to a concrete moment. Neither distinguishes between a resident who struggles with structure and one who struggles with affect.

Research on feedback in GME is consistent on this point: residents want feedback, but frequently describe it as vague, untimely, and disconnected from specific encounters.⁵ The problem is not that faculty lack insight. It is that without a behavioral record tied to real encounters, specific feedback is hard to construct.

Timeliness is the other casualty. Feedback delivered weeks after an encounter—common when formal evaluations arrive at the end of a rotation—requires residents to reconstruct context that has long since dissolved. The instructional value diminishes with every day that passes between the encounter and the comment about it.

What good communication feedback looks like

Specificity is the most important feature of feedback that actually changes behavior. "You interrupted the patient three times before they finished describing their symptoms" is useful. "You could work on your listening skills" is not—even if both comments originate from the same observation.

Behavior-level specificity requires a behavioral record. Without one, feedback collapses back into generalities. Four features distinguish feedback that is likely to produce learning from feedback that produces only compliance or resentment:

Specific: tied to an observable behavior in a real encounter, not a trait or a general pattern
Timely: delivered close enough to the encounter that the learner can contextualize it
Formative: framed as information for learning, not material for ranking or evaluation
Actionable: paired with a concrete alternative or a direction to explore

These features are not controversial in the medical education literature. They are widely endorsed in frameworks for clinical feedback, including those that underpin ACGME's own faculty development resources on direct observation.³ What is harder to achieve is the structural condition that makes them possible: a feedback process with enough frequency and enough behavioral specificity to give residents something to work with across their training, not just at milestone checkpoints.

Scaling without more ride-alongs

The alternative to more direct observation is not the absence of observation. It is structured review of real encounters that have already happened.

Rather than requiring a faculty member to be physically present during a clinical interaction, this approach works from encounters the resident has already completed. The resident reflects on the encounter with structure: what moments felt effective, where the conversation shifted, what they would do differently. That structured reflection can be guided by rubrics aligned to the program's communication competency framework, generating a behavioral record that a faculty coach can review asynchronously and use as the basis for a targeted conversation.

This is not a novel concept in education. Deliberate practice theory, developed by Ericsson and colleagues, holds that improvement in complex skills depends on repeated cycles of performance, feedback, and adjustment—not simply on accumulated experience.⁶ Residency provides ample experience. It does not, by default, provide the feedback cycles that convert experience into skill development.

What makes structured encounter review work at scale is the combination of learner-generated reflection and rubric-anchored framing. A resident who self-identifies where a conversation became difficult is already doing the analytical work that coaching needs to build on. A rubric tied to the ACGME's Interpersonal and Communication Skills (ICS) milestones gives that reflection a consistent vocabulary—one the program director can aggregate across residents to see where the cohort as a whole is developing and where it is not.⁷
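For programs that want to see what "a consistent vocabulary the program director can aggregate" might look like in data terms, here is a minimal sketch. Everything in it is an assumption made for illustration: the `Reflection` fields, the 1 to 5 self-rating scale, and the use of milestone labels such as "ICS1" as rubric domains are hypothetical, not a schema from ACGME or from any particular tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Reflection:
    """One structured reflection on a completed encounter (hypothetical schema)."""
    resident_id: str
    domain: str            # rubric domain label, e.g. "ICS1" (assumed labeling)
    self_rating: int       # assumed 1-5 scale, chosen only for illustration
    difficult_moment: str  # free-text note on where the conversation shifted


def cohort_means(reflections):
    """Average self-rating per rubric domain across all residents."""
    by_domain = defaultdict(list)
    for r in reflections:
        by_domain[r.domain].append(r.self_rating)
    return {domain: mean(ratings) for domain, ratings in by_domain.items()}


# Made-up example records, not real data.
reflections = [
    Reflection("r1", "ICS1", 4, "Patient went quiet after I named the diagnosis."),
    Reflection("r2", "ICS1", 2, "I interrupted before the history was finished."),
    Reflection("r1", "ICS2", 3, "Handoff to nursing was rushed."),
]
print(cohort_means(reflections))
```

The point of the sketch is the shared `domain` field: because every reflection is filed under the same rubric vocabulary, a cohort-level view falls out of a simple grouping step rather than requiring anyone to re-read free-text comments.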

Protecting this process as explicitly formative is essential. Residents who believe their self-assessments will be used in summative evaluation will calibrate their responses accordingly. The instructional value depends on an environment where honest reflection is safe, one where identifying a difficult moment in a patient conversation is treated as evidence of reflective practice, not as an admission of deficiency.

Keeping learner trust

Feedback at scale creates a privacy question that programs need to answer directly, not assume away. Residents are aware that any systematic collection of data about their clinical encounters carries surveillance potential, and they are not wrong to notice. If structured encounter review is perceived as a monitoring program rather than a development program, participation will be guarded, self-assessments will be sanitized, and the process will generate data that reflects what residents are willing to say, not what they actually experience.

Maintaining trust requires a few structural commitments. Data should be learner-facing first: residents should have access to their own longitudinal record before anyone else does. Participation in formative coaching should be clearly decoupled from summative milestone ratings and program evaluations—residents need to know that what they share in a coaching session cannot appear verbatim in a clinical competency committee review. Aggregate program-level data should be de-identified before it reaches program leadership, so that patterns across the cohort are visible without individuals being identifiable in every summary.
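One way to picture the de-identification commitment is as a step that sits between individual records and any program-level summary. The sketch below is illustrative only: the `(resident_id, domain, rating)` record shape, the function name, and the `MIN_RESIDENTS` threshold are assumptions, and withholding small groups is a common privacy practice added here for illustration rather than something the commitments above specify.

```python
from collections import defaultdict
from statistics import mean

MIN_RESIDENTS = 5  # hypothetical suppression threshold, not a recommended value


def deidentified_summary(records, min_residents=MIN_RESIDENTS):
    """Summarize ratings per domain without exposing resident identifiers.

    `records` is an iterable of (resident_id, domain, rating) tuples.
    Domains with fewer than `min_residents` distinct residents are withheld,
    because a very small group can identify individuals even without names.
    """
    ratings = defaultdict(list)
    residents = defaultdict(set)
    for resident_id, domain, rating in records:
        ratings[domain].append(rating)
        residents[domain].add(resident_id)

    summary = {}
    for domain, values in ratings.items():
        if len(residents[domain]) < min_residents:
            summary[domain] = None  # suppressed: too few residents to report safely
        else:
            summary[domain] = {
                "n_reflections": len(values),
                "mean_rating": mean(values),
            }
    return summary


# Made-up example: six residents reflect on ICS1, only two on ICS2.
records = [(f"r{i}", "ICS1", 3) for i in range(6)]
records += [("r0", "ICS2", 2), ("r1", "ICS2", 4)]
summary = deidentified_summary(records)
print(summary)
```

Resident identifiers are used only to count distinct contributors and never appear in the output, which is the property program leadership would need to be able to explain to residents.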

None of this precludes using aggregate data to improve program curriculum or identify systemic communication development needs. It simply requires that the architecture of the system communicates to residents that coaching is for their benefit—not a data collection mechanism that happens to use their conversations as its input.

Inflect is designed with this architecture in mind. Individual coaching is learner-owned and explicitly formative. Program-level analytics draw on de-identified aggregate data, so GME leaders can see where the curriculum is and is not producing communication competency without the system functioning as individual surveillance. The goal is to give programs the insight they need to improve training without giving residents a reason to approach their own development defensively.

If your program is looking for a scalable approach to communication assessment and coaching, the Inflect GME solutions page describes how programs are using structured encounter review to give residents more frequent, specific feedback. If you'd prefer to start with a conversation, you can schedule a demo to see how the program-level analytics and resident coaching tools work together.

1. Norcini JJ, Blank LL, Arnold GK, Kimball HR. The mini-CEX (clinical evaluation exercise): a preliminary investigation. Annals of Internal Medicine. 1995;123(10):795–799. PMID 7574198. https://pubmed.ncbi.nlm.nih.gov/7574198/
2. Multiple studies on direct observation frequency in residency are summarized in: Kogan JR, Holmboe ES, Hauer KE. Tools for direct observation and assessment of clinical skills of medical trainees: a systematic review. JAMA. 2009;302(12):1316–1326. https://pubmed.ncbi.nlm.nih.gov/19773567/. The figure that direct observation accounts for less than 5% of shared faculty-resident time is reported in studies reviewed therein; the figure of approximately four scheduled observations per year in internal medicine is reported in: Holmboe ES, Fiebach NH, Galaty LA, Huot S. Effectiveness of a focused educational intervention on resident evaluations from faculty: a randomized controlled trial. J Gen Intern Med. 2001;16(7):427–434.
3. ACGME Common Program Requirements (Residency), Section V.A.1 (Formative Evaluation) and the ACGME Direct Observation Toolkit. https://www.acgme.org/programs-and-institutions/programs/common-program-requirements/
4. O'Brien MJ, Rogers C, Jamtvedt G, et al. The Hawthorne effect in direct observation research with physicians and patients. J Clin Epidemiol. 2018;93:1–8. PMC5741487. https://pmc.ncbi.nlm.nih.gov/articles/PMC5741487/
5. van der Leeuw RM, Slootweg IA. Twelve tips for making the best use of feedback. Med Teach. 2013;35(5):348–351. The characterization of resident feedback as vague and untimely is a consistent finding across GME literature; for a recent example see: Perceptions of scheduled vs. unscheduled directly observed visits in an internal medicine residency outpatient clinic. BMC Med Educ. 2020. https://pmc.ncbi.nlm.nih.gov/articles/PMC7057513/
6. Ericsson KA, Krampe RT, Tesch-Römer C. The role of deliberate practice in the acquisition of expert performance. Psychological Review. 1993;100(3):363–406. The application of deliberate practice to clinical skills development is discussed in: Ericsson KA. Deliberate practice and acquisition of expert performance: a general overview. Acad Emerg Med. 2008;15(11):988–994. https://pubmed.ncbi.nlm.nih.gov/18778378/
7. ACGME Milestones: Interpersonal and Communication Skills (ICS) harmonized milestones, including ICS1 (patient- and family-centered communication) and ICS2 (interprofessional and team communication). ACGME Milestones resources available at https://www.acgme.org/milestones/