00;00;00;00 - 00;00;30;15 Unknown Welcome to the deep dive. Today we are strapping on the next generation of computing: AI-powered smart glasses. Yeah, it feels like we're really moving past those, you know, clunky early versions. Definitely. The current landscape is pretty sophisticated. You've got these high-def displays, cameras, sensors and powerful AI assistants all packed in. Right? And they promise to transform, well, everything: how we communicate, accessibility, even how health care gets delivered. 00;00;30;16 - 00;00;53;09 Unknown It's a huge technological leap, no doubt. I mean, multimodal computer vision, real-time translation. It's all happening in these tiny devices. It is. The tech is dazzling. Absolutely. But here's the mission today, and maybe the critical question for you, the learner: what makes these things genuinely usable? Exactly. Like, what specific features are driving people to actually, you know, wear them day to day, and where do they just fall short? 00;00;53;10 - 00;01;11;12 Unknown Why do people end up taking them off and shoving them in a drawer? Yeah, it's easy to get lost in the specs, right? 46-degree field of view, 12-megapixel cameras. Sounds great, but what feature actually gives you so much value that you won't leave home without them? And to really dig into that, we kind of have to start with a foundational critique. 00;01;11;13 - 00;01;33;21 Unknown It's going to be the lens for this whole deep dive, really: Don Norman's paradox of wearable technologies. Don Norman, always bringing us back to usability. He really hammered this point home. The critical design challenge, he said, is making sure the device actually augments us, makes us better, instead of just becoming another thing stealing our attention. Exactly. 00;01;33;21 - 00;01;55;09 Unknown Another distraction. That's the paradox. So the more powerful and futuristic the tech gets, the more likely it is to just, like, bombard you with notifications, complex gestures, visual noise, and it ends up ruining the experience of actually living in the real world. Right. So if the glasses need constant fiddling, or they interrupt you with too much stuff when you're just trying to cross the street, they fail. 00;01;55;09 - 00;02;16;18 Unknown They violate that core principle. So we need to judge these things not just on potential, but on whether they enhance what you're trying to do without being annoying attention hogs. Do they help your thinking or just get in the way? Okay, good framework. Yeah. Let's start where most people encounter these things first: the consumer market. This seems like the trickiest area. 00;02;16;20 - 00;02;40;02 Unknown It really is, because success here is all about normalization, low friction. People just won't wear them if they don't look and feel like, well, normal glasses. So convenience, communication, entertainment, that's the focus. But what are people actually using the AI for right now? And that takes us straight to Meta, because they seem to have found a specific kind of high-value little interaction that people are okay with. 00;02;40;03 - 00;03;00;03 Unknown Yep. Hands-free social capture and just talking to it. Ambient voice interaction. They've definitely hit on something. The Ray-Ban Meta smart glasses, the second gen, a huge success. Yeah, you mentioned it: 210% year-over-year growth. Yeah, massive growth in the AI glasses market. Over a million sold in 2024. It proves if you nail the look, people might actually try them.
00;03;00;09 - 00;03;25;05 Unknown But the interesting usability point, you're saying, is what they left out at first. Exactly. They look like classic Ray-Bans, right? Packed with cameras, dual 12 MP, and good audio with open-ear speakers and mics, but critically no visual heads-up display, no HUD. They focus purely on taking photos and videos without using your hands, and voice commands. That simplicity, that was the genius move. 00;03;25;07 - 00;03;53;11 Unknown Minimal distraction. So dodging the complex, maybe annoying AR overlay was actually the key to getting people on board. Prioritizing capture over, like, full spatial computing? Absolutely. The device uses its basic senses, camera and mic, for quick info retrieval. You know, "Hey Meta, what am I looking at?" And the AI would describe it or translate text. Right, using multimodal computer vision on the camera feed, but only when you ask it. It gives you that hands-free usefulness without throwing a complicated visual interface in your face. 00;03;53;13 - 00;04;13;10 Unknown Kept the design minimal. Focused on simple tasks. That makes sense. Keeps the cognitive load low. But Meta is already pushing beyond that, aren't they? There's the new Ray-Ban Display model, the third gen. Yeah, they're finally adding a visual bit. It's a transparent micro display just in the right eye. But small, right? Like 600 by 600 pixels. Deliberately small. 00;04;13;10 - 00;04;34;03 Unknown Yeah, it's not for watching movies. It's designed for minimal info, like simple text alerts, arrows for walking directions, maybe short visual replies from the AI. So they're easing people into AR, making sure the visual stuff is minimal enough not to be overwhelming. Seems like it. Trying to avoid that cognitive overload Norman warned about. Okay, and what about input? 00;04;34;04 - 00;04;54;20 Unknown You mentioned Meta is also looking at this neural wristband thing, EMG signals. Yeah, sounds very sci-fi. It does. It's trying to solve that problem of subtle gesture control. The sensors on your wrist use electromyography, EMG, to pick up tiny electrical signals from your forearm muscles, so you can click or scroll just by thinking about moving your finger. 00;04;54;20 - 00;05;17;10 Unknown Almost? Sort of, yeah, by making really slight internal finger movements long before anyone else would notice. If they can make that reliable and, crucially, discoverable. The two things Norman says gesture controls often lack. Exactly. If they nail that, it could be huge for interacting discreetly in public. Right, reliability and discretion. Okay, so that's Meta's approach: lifestyle capture, subtle utility. 00;05;17;13 - 00;05;42;06 Unknown Let's pivot to Snap Spectacles. They seem like the total opposite, right? All about high-end AR. Totally different focus. Snap is all about the creators and pushing what AR can look like. They've got dual 3D waveguide displays, a wider 46-degree field of view designed specifically for their AR effects, their Lenses, and they pack serious tech like the Snap Spatial Engine for six degrees of freedom tracking. 6DoF? Yeah, 6DoF. 00;05;42;08 - 00;06;06;10 Unknown It makes the virtual stuff feel properly anchored in your real world. Tracks your head and hands so the AR feels more stable, more real. Theoretically, yes. And they boast impressive clarity, like 37 pixels per degree. That's getting close to bigger, bulkier headsets. Interaction is voice and a touchpad.
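To make that "ask only when you need it" pattern concrete, here is a minimal Python sketch of the on-demand capture-and-ask loop: grab one frame, pair it with the spoken question, and hand both to a multimodal model. The names and the stand-in model are illustrative assumptions, not Meta's actual API.

```python
import base64
from dataclasses import dataclass
from typing import Callable

@dataclass
class GlassesQuery:
    question: str      # the spoken request, e.g. "What am I looking at?"
    jpeg_frame: bytes  # a single frame grabbed only at the moment of the request

def answer_query(query: GlassesQuery, vision_model: Callable[[str, str], str]) -> str:
    """Send one frame plus the spoken question to a multimodal model and return its answer."""
    frame_b64 = base64.b64encode(query.jpeg_frame).decode("ascii")
    # The camera is sampled once, on demand, so there is no always-on overlay
    # and no continuous capture: low friction, low cognitive load.
    return vision_model(query.question, frame_b64)

if __name__ == "__main__":
    # Stand-in model so the sketch runs without any real service or hardware.
    fake_model = lambda q, img: f"(a multimodal model would describe the frame for: {q!r})"
    demo = GlassesQuery("What am I looking at?", b"\xff\xd8fake-jpeg-bytes")
    print(answer_query(demo, fake_model))
```

The same single-shot recognize-and-respond idea underlies the context-aware scan feature discussed next.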
Plus, this scan feature where the AI recognizes stuff and suggests AR Lenses, like looking at the Eiffel Tower and getting a history overlay. 00;06;06;14 - 00;06;34;04 Unknown Exactly, that kind of context-aware AR. It's powerful, but, and this is the big usability caveat our sources mentioned, these are still just developer kits. Why does that matter for the average person? Because the complexity creates friction. Sure, they show what high-end spatial computing can do, but developers often put up with stuff regular users won't, like limited battery life when you're actually creating AR content, the processing power needed for that constant 6DoF tracking, and just a learning curve for using it. 00;06;34;04 - 00;06;59;00 Unknown All right, so Snap is pushing the boundaries, definitely. But they haven't proven yet that the average Joe needs or wants that full-on, always-active AR experience every day. It raises that Norman warning again: don't let the novelty outrun solid, everyday usability. Okay, so that leads us to the third consumer model, the Xreal Air 2 Pro. They kind of sidestepped all the AR complexity, didn't they? 00;06;59;01 - 00;07;24;03 Unknown They did, found a different niche. The Xreal Air 2 Pro are basically wearable personal displays. Super light, only 72 g, but they project this huge virtual screen, like 130 inches. Yeah, 1080p per eye, 46 degrees FOV. Their whole utility is private media watching and productivity, so they skip the AR distraction entirely. The use case is turning, like, an airplane seat into your private movie theater or office. 00;07;24;04 - 00;07;51;25 Unknown Exactly. And that's the specific utility driving their adoption, especially for productivity. Think developers, analysts, even gamers. It can replace needing multiple physical monitors. You just put on the glasses and project maybe 2 or 3 virtual desktops, right? Using a companion device or software. That focused utility, high-quality, private, portable screen space, is a big win for getting work done on the go. And the tradeoff is obvious: they aren't see-through when you're using them as a screen. 00;07;51;25 - 00;08;13;26 Unknown Yeah, and they don't have outward cameras or onboard AI. No real environmental awareness. Precisely. Utility is focused inward. They rely on your phone or PC. It avoids the AR clutter, avoids the cognitive overload, and it just reinforces that main point: consumer adoption right now is driven by hands-free capture, like Meta, or private viewing, like Xreal, not by complex, 00;08;13;26 - 00;08;32;16 Unknown always-on spatial computing. Not yet anyway. Success hinges on specific, high-value little interactions over trying to do everything at once. If the glasses let you do one thing better or easier than pulling out your phone, they have a chance. Okay, that makes sense for consumer ones, but does that hold up when we move to critical human needs? 00;08;32;18 - 00;08;57;09 Unknown Let's talk accessibility. How does the usability calculation change there? Oh, it transforms completely. In section two, usability is driven by necessity. The value proposition is just different. Utility trumps everything, aesthetics, minor discomfort, because the goal is independence, quality of life. They're not gadgets anymore. They're essential tools. Exactly. Wearable cognitive or sensory prosthetics, basically. So let's start with vision assistance. 00;08;57;11 - 00;09;35;14 Unknown The Envision Ally glasses, built on Google Glass hardware, light, about 50 g. They use a camera, bone conduction audio.
What's the AI breakthrough? The breakthrough is the Ally AI assistant. It works entirely through natural voice conversation. It uses LLMs, multimodal AI, so it can read text using OCR, optical character recognition. Right? Reads text, recognizes objects, faces, colors, describes what's around you in real time. And that hands-free independence is everything for someone who can't easily use a screen or handle a device. And the user doesn't need rigid commands, they can just talk to it. 00;09;35;19 - 00;09;55;25 Unknown That's the crucial part. They can just say, describe my surroundings, or ask really nuanced follow-up questions. Like, instead of just hearing "red apple," they can ask, is that apple ripe? Or does it look bruised? Wow. And the AI analyzes the visual data more deeply, maybe taps into online info, to give a better answer. That depth of info delivered instantly and privately. 00;09;55;25 - 00;10;20;17 Unknown It preserves dignity, fosters independence. It's huge. And the sources mentioned redundancy is key here too, like fallback options. Absolutely critical. The system has robust fallback paths if the Ally AI gets stuck on something complex, maybe navigating a new subway, which can happen. Yeah, it seamlessly integrates with human assistant services like Aira or Be My Eyes. That layered approach means the user is never just left stranded. 00;10;20;24 - 00;10;50;29 Unknown That's a fundamental measure of usability for a life-critical application. Okay, now what about hearing assistance for the deaf and hard of hearing community? AR glasses offering subtitles for real life? That sounds incredible. It really is a game changer. It tackles a fundamental communication barrier head on. Solutions like the XRAI Glass app, running on consumer AR glasses, transcribe conversations in real time as floating text. Closed captions for reality. 00;10;50;29 - 00;11;08;24 Unknown Pretty much. Super helpful in noisy places where hearing aids struggle, or in group meetings where lip reading is tough. It's getting more advanced than just transcription. Oh yeah, the tech can translate dozens of languages on the fly, and some systems can even tag who's speaking by name, which really helps in group conversations. That makes a big difference. 00;11;08;24 - 00;11;30;24 Unknown Definitely. And the underlying tech, the ASR systems, automated speech recognition, have gotten so much better. Latency, the delay, is down to just a second or two, so the conversation feels much more natural. Beyond sensory help, smart glasses are also being used for cognitive support, like memory augmentation for dementia patients. Yes, CareYaya Health Technologies has done brilliant work here. 00;11;30;27 - 00;11;55;09 Unknown Using computer vision, the glasses identify familiar people. Okay. And then discreetly whisper an audio prompt in the wearer's ear, like "This is your granddaughter, Anna." Wow. That's, that's preserving dignity right there. Exactly. It fills that memory gap instantly without needing a caregiver to jump in or using some distracting visual cue. The glasses also handle medication reminders with audio prompts. So the glasses act like this 00;11;55;09 - 00;12;17;23 Unknown gentle, private cognitive coach. Amazing. And similar uses for social cues? Stanford research used something called the Superpower Glass app, on Google Glass hardware again. It gave kids with autism real-time feedback, audio or visual, on people's facial expressions, like happy, angry. And did it help? Dramatically improved social engagement.
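As a rough illustration of that "discreet cognitive coach" pattern, here is a minimal Python sketch: recognize an enrolled face, whisper one short audio prompt, then stay quiet for a while so the aid never becomes a nag. The recognizer, the speech call, and the cooldown value are placeholders for illustration, not any specific vendor's implementation.

```python
import time
from typing import Dict, Optional

# Faces enrolled by a caregiver; names here are made up for the example.
KNOWN_FACES = {"face_001": "your granddaughter, Anna"}
COOLDOWN_S = 300.0  # stay quiet about the same person for five minutes

_last_prompt: Dict[str, float] = {}

def recognize(frame: bytes) -> Optional[str]:
    """Placeholder face matcher; a real device would run an on-device model."""
    return "face_001" if frame else None

def speak(text: str) -> None:
    """Placeholder for discreet open-ear or bone-conduction text-to-speech."""
    print(f"(whispered) {text}")

def on_camera_frame(frame: bytes, now: Optional[float] = None) -> None:
    now = time.time() if now is None else now
    face_id = recognize(frame)
    if face_id is None or face_id not in KNOWN_FACES:
        return
    # Rate-limit so the aid stays a gentle prompt, not a repeating distraction.
    if now - _last_prompt.get(face_id, 0.0) < COOLDOWN_S:
        return
    _last_prompt[face_id] = now
    speak(f"This is {KNOWN_FACES[face_id]}.")

if __name__ == "__main__":
    on_camera_frame(b"frame-1")  # prompts once
    on_camera_frame(b"frame-2")  # suppressed by the cooldown
```

The cooldown is the usability point: the prompt fills the memory gap once, then gets out of the way.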
Yeah, the kids ended up seeing it as kind of a fun game. 00;12;17;23 - 00;12;39;16 Unknown Improved eye contact, got better at reading faces. So for these cognitive uses, why is the interface design so critical? Why lean on discreet audio over AR visuals? It really comes down to minimizing cognitive load. If someone's already struggling with sensory input or understanding their environment, throwing a visual AR overlay on top, even a simple one, just adds more noise. 00;12;39;19 - 00;13;08;13 Unknown It can add complexity, visual clutter. A simple, private audio whisper is less demanding on their visual processing. It lets them stay focused on the real-world interaction, delivering the info they need without overwhelming them. It's a perfect example of solving Norman's paradox through minimal, high-impact design. It really feels like, in accessibility, the utility is just so incredibly high, literally giving people back independence, that adoption isn't really a question of if, but when. 00;13;08;13 - 00;13;30;28 Unknown The value is undeniable. Absolutely. Necessity is, well, the mother of adoption here. Okay, let's move into section three, usability in critical contexts. So health care and enterprise. Here, things like cost, even the bulkiness of headsets like, say, an Apple Vision Pro or a HoloLens 2, they matter less, right? Much less critical than the hands-free advantage and the measurable precision gains. 00;13;31;01 - 00;13;52;22 Unknown Here the priorities are safety, efficiency, accuracy, period. And this is where AR really seems to be proving its worth with tangible results, like in surgery. Definitely. That's the prime example in our sources. Augmented reality in the operating room, the classic x-ray vision use case. We hear about surgeons using AR goggles. I think the xvision system was mentioned. 00;13;52;22 - 00;14;16;18 Unknown It's FDA cleared, right? They use it to overlay 3D scans, like CT or MRIs, directly onto the patient, so they can see the patient's anatomy, maybe tumors or spinal structures, in real time, right where they are. And the clinical results are there? Oh yeah. It's been used in over 10,000 surgeries now. They're seeing 97 to 100% accuracy for screw placement in spine procedures. 00;14;16;20 - 00;14;37;21 Unknown That's not just cool tech. It's a real, measurable safety improvement. And surgeons report it's less mentally taxing, lower cognitive load, less fatigue, because navigating becomes intuitive. It's all heads-up. And that heads-up part is the usability linchpin, isn't it? Especially in a sterile O.R. Totally. They never have to break the sterile field or look away from the patient to check some external screen. 00;14;37;23 - 00;15;04;14 Unknown Vital signs, checklists, ultrasound guidance, it can all be projected right into their view. Better ergonomics, lower error risk, faster workflow. The utility isn't optional. It's about safety and efficiency. And beyond the O.R., this tech is boosting telemedicine and remote expert help too. Doctors using smart glasses, like Zoom or Google Glass Enterprise, right, to pull up patient records, lab results, just using voice commands, keeping hands free. 00;15;04;14 - 00;15;31;14 Unknown Focus on the patient. And the collaboration part, remote assistance? That's huge. Platforms like Rods & Cones have supported over 40,000 remote sessions. Think about a surgeon needing help from a specialist miles away.
Yeah, the specialist sees exactly what the surgeon sees in real time through the glasses' camera, and they can draw AR annotations, circle a nerve, point to an incision spot, and those drawings appear right in the surgeon's view. Instantly? Instantly, and accurately overlaid on the real world. 00;15;31;16 - 00;15;59;09 Unknown It's a super practical blend of communication and AR tech. But this kind of high-stakes use raises big questions about usability requirements, doesn't it? What are the absolute must-haves for hardware in a clinical setting? They're totally non-negotiable, because the stakes are immense. First, ruggedness: easy to clean, maybe sterilize. Second, hands-free control, mainly voice, to maintain sterility. And, crucially, security: HIPAA compliance is mandatory. 00;15;59;12 - 00;16;23;17 Unknown You're transmitting incredibly sensitive patient data through that camera feed, those voice commands, maybe to the cloud or a remote expert. You absolutely cannot risk a data breach mid-procedure. And battery life has to be reliable, ideally all-day options. A surgeon's guidance system can't just die halfway through an operation. So it's not just about the fancy display. It's fundamentally about security, reliability, continuous operation. 00;16;23;17 - 00;16;48;05 Unknown Precisely. The cost of failure is just too high. That drives the design requirements. Looking ahead, AI's role seems set to grow beyond just displaying info, into diagnostics, maybe warnings. Absolutely. We're expecting smarter surgical glasses using computer vision, machine learning. They could warn a surgeon if they're about to nick a major blood vessel, maybe recognizing anatomy faster than the human eye. Or outside the O.R., 00;16;48;09 - 00;17;14;21 Unknown glasses that detect abnormal moles, maybe subtle signs of a stroke in a patient's face, just through real-time AI analysis. It turns the device into this indispensable heads-up second opinion. Like a tireless assistant watching for things the clinician might miss, fighting fatigue, reducing cognitive load. Okay, so let's recap the drivers: consumer market, coolness, convenience, capture; accessibility, profound independence; 00;17;14;21 - 00;17;38;19 Unknown professional, demonstrable safety, precision, efficiency. But in all cases, the utility has to be immediate and better than just using a phone or looking at a screen. Which brings us neatly to section four, the usability challenge. Because despite all this potential, these targeted successes, why aren't smart glasses everywhere yet? We need to tackle the persistent usability gaps, the hardware issues that still make people, you know, give up on them. 00;17;38;26 - 00;17;58;02 Unknown Let's start with the physical stuff, the things that just make them uncomfortable to wear all day. Yeah, the list is still pretty long based on market critiques. Battery life is a big one. Many glasses just don't last a full day when you're running the display, the AI, the sensors constantly. And weight, ergonomics, heat? Heat management is a real issue. 00;17;58;02 - 00;18;21;23 Unknown All that processing right next to your face. It generates noticeable heat that gets uncomfortable fast, a significant friction point. And then the display limitations: even advanced ones often have a limited field of view. Right? Feels like looking through a narrow window. Yeah, that tunnel-vision effect. Brightness can be an issue in daylight. But maybe the most frustrating technical problem is spatial registration drift.
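The surgical overlays and the remote annotations described above both rest on the same trick: pin content to a real-world point through the camera's pose. Here is a minimal sketch of that anchoring math, assuming known camera intrinsics, a depth estimate, and a head pose; the numbers and helper names are illustrative, not Rods & Cones' or any vendor's actual pipeline.

```python
import numpy as np

# Made-up pinhole intrinsics (fx, fy, cx, cy) for the glasses' camera.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

def backproject(u: float, v: float, depth_m: float, cam_to_world: np.ndarray) -> np.ndarray:
    """Lift an annotated pixel (u, v) at a known depth into a fixed world-space anchor."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    point_cam = np.append(ray * depth_m, 1.0)   # homogeneous point in the camera frame
    return (cam_to_world @ point_cam)[:3]

def project(point_world: np.ndarray, world_to_cam: np.ndarray) -> tuple:
    """Re-draw the anchored point in the current frame's pixel coordinates."""
    p = world_to_cam @ np.append(point_world, 1.0)
    uvw = K @ p[:3]
    return uvw[0] / uvw[2], uvw[1] / uvw[2]

if __name__ == "__main__":
    cam_to_world = np.eye(4)                       # head pose when the expert drew the circle
    anchor = backproject(400.0, 260.0, 0.5, cam_to_world)
    moved = np.eye(4); moved[0, 3] = 0.02          # later frame: head shifted 2 cm
    print("anchor re-projects to pixel:", project(anchor, np.linalg.inv(moved)))
```

Note the dependence on the pose estimate: if it slowly degrades, the re-projected pixel slides off target, which is exactly the registration drift discussed next.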
00;18;21;26 - 00;18;45;05 Unknown Drift. That sounds annoying. It's a massive usability killer. It's when the virtual content, the holograms or overlays, slowly misaligns from the real world as you move your head. Why does that happen? Technically, it's the internal SLAM algorithms, simultaneous localization and mapping, getting slightly confused by imperfect sensor data over time. Give me a concrete example of why that's so bad. 00;18;45;11 - 00;19;12;27 Unknown Okay, imagine you're that surgeon again, using the xvision system, and the virtual overlay of the patient's spine starts to slowly slide off onto their shoulder. Oh, wow. Utility gone. Instantly, instantly destroyed. You lose all trust. You have to stop, recalibrate manually. The user experience has to handle drift gracefully, maybe with self-correcting systems. Otherwise the device just becomes a burden. Right back to Norman's point about disruption. 00;19;12;27 - 00;19;32;21 Unknown And speaking of Norman, he was pretty skeptical about input methods too, especially gestures. Deeply skeptical. He specifically warned about systems relying too much on gestures, whether big mid-air hand waves or even subtle finger movements. Why? They're hard to discover. Like, how do you even know what gestures to make? They often lack clear feedback. Did it register my click or not? 00;19;32;21 - 00;19;57;00 Unknown And they can be unreliable or ambiguous depending on the situation. Noisy room, low light, user fatigue. So we need more than just gestures. We need robust, redundant inputs: voice, gaze tracking, maybe a simple touchpad. Relying only on gestures breaks fundamental UX rules like discoverability and reliability. Okay, beyond hardware and interaction, there's the elephant in the room: the social gap, the trust issue, that camera. 00;19;57;01 - 00;20;19;14 Unknown It's been a problem since day one, hasn't it? Yeah. Glasses are just perceptually invasive. Bystander discomfort, surveillance anxiety, is still a major barrier. If the glasses have a camera, and most useful AI ones do, people around you just assume they're being recorded. Yeah. Violates social norms, creates friction everywhere you go in public. So the design itself needs to signal what's happening. 00;20;19;17 - 00;20;41;19 Unknown Social affordances. Exactly. Things like clear, maybe highly visible recording indicator lights, user control over when the camera is on or off, and making them look like normal glasses, like Meta did. That helps lower the initial barrier. It minimizes that visual cue of "I'm wearing recording tech" until you actively choose to use it. Thankfully, people are working on solutions, right? 00;20;41;20 - 00;21;09;27 Unknown Experimental UX stuff to fix these problems? Yeah, lots of research aiming to tackle these usability hurdles and solve the Norman paradox for good. On the input side, people are exploring combining eye tracking, mainly for selecting things, with really subtle micro-gestures for confirming actions, and also developing contextual voice tech, trying to get systems to understand whispers or even subvocal speech. 00;21;09;29 - 00;21;34;07 Unknown So you could quietly ask your AI for directions on a crowded bus without shouting "Hey, Google!" Exactly. Discreet public interaction. And what about tackling cognitive overload, the core of Norman's paradox? There's some really fascinating work there. Systems designed for attention flow management. How does that work?
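One way to picture the self-correcting drift handling mentioned above: periodically compare where a known reference marker should re-project under the current pose estimate with where the camera actually detects it, and quietly re-register when the error grows. A minimal sketch, with an assumed pixel threshold rather than any product's real tolerance:

```python
# Illustrative drift check: re-anchor silently instead of interrupting the user.
DRIFT_LIMIT_PX = 8.0  # assumed misalignment tolerated before forcing re-registration

def pixel_error(expected_px, detected_px):
    dx = expected_px[0] - detected_px[0]
    dy = expected_px[1] - detected_px[1]
    return (dx * dx + dy * dy) ** 0.5

def check_registration(expected_px, detected_px, recalibrate):
    """Compare predicted vs. observed marker position; re-register when drift exceeds the limit."""
    if pixel_error(expected_px, detected_px) > DRIFT_LIMIT_PX:
        recalibrate()  # e.g. re-solve the overlay alignment from the marker
        return "re-registered"
    return "ok"

if __name__ == "__main__":
    print(check_registration((512, 300), (514, 302), lambda: None))                      # ok
    print(check_registration((512, 300), (530, 320), lambda: print("re-anchoring...")))  # drifted
```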
They try to learn your patterns: when you usually check messages, when you tend to ignore notifications, and then optimize when to interrupt you versus when to just stay quiet. 00;21;34;09 - 00;21;57;09 Unknown Maybe using attention budgets, where apps have to sort of compete for your limited focus, so only the really important stuff gets through. That's the idea. Other concepts too, like adaptive opacity, where content is there but fades back if you need to focus on reality, then re-emerges when relevant. And using the edges of your vision, peripheral vision notifications? Yeah, putting info at the edges of the display so it's available but not right in your face. 00;21;57;12 - 00;22;24;21 Unknown Ambient awareness, not constant distraction. Managing focus seamlessly. And finally, trying to fix that deep-seated privacy issue through design. People are experimenting with concepts like privacy bubbles, maybe visual indicators bystanders can see when recording is happening, or consent-based interaction systems that try to negotiate data sharing in real time with people nearby. It's about trying to build trust back into the design. 00;22;24;21 - 00;22;46;00 Unknown So the ultimate goal is what, ambient computing? Yeah, ambient contextual computing. The interface that basically disappears until you need it. It should feel more like, I don't know, augmented intuition rather than explicitly using a device. If they get it right, you won't even feel like you're using an interface. You just feel smarter, more capable. That's the dream. That's the ultimate definition of high usability. 00;22;46;05 - 00;23;13;10 Unknown Okay, so summing up, what does this all mean for AI glasses right now? It seems clear they're only really succeeding where they offer unambiguous hands-free utility and manage to navigate that Norman paradox. Right. And success currently looks like subtle audio and camera stuff for consumers, like Meta; profound, life-changing assistance for accessibility, like Envision's Ally; or critical precision and safety gains in professional settings, like Augmedics in surgery. 00;23;13;12 - 00;23;38;13 Unknown The industry is definitely all in: Meta, Apple, Google, tons of startups. They're pushing the hardware hard. It's getting lighter, more stylish, more powerful. The tech is getting close, but the future really hinges not just on better chips or displays, but on developers creating those killer apps, right? Using the SDKs available and, crucially, mastering the user experience challenges: calibration, non-distraction, social acceptance. 00;23;38;19 - 00;24;02;00 Unknown Yeah, the pivot happens when the glasses stop feeling like disruptive new hardware and become just seamless extensions of the digital life we already have. It feels like the most profound success might not even be the flashiest AR game or a holographic desk. Probably not. It'll likely be the tech that completely disappears, the AI assistant giving you guidance, translation, memory cues so seamlessly you just forget you're wearing it. 00;24;02;00 - 00;24;24;01 Unknown So for you, the learner, maybe the final question is this: what daily cognitive burden, what little mental friction point, would you most want your smart glasses to just invisibly eliminate? Finding your keys? Remembering someone's name? Translating a menu instantly? The winning design will be the one you put on and immediately just stop thinking about. Well put. Thanks for joining us on the deep dive. 00;24;24;01 - 00;24;25;02 Unknown We'll see you next time.
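As a closing illustration of the attention-budget idea from this last section, here is a minimal Python sketch in which each interruption spends from a limited budget, so only notifications worth their cost get through. The scores and budget size are assumptions for illustration, not a description of any shipping system.

```python
from dataclasses import dataclass, field

@dataclass
class AttentionBudget:
    remaining: float = 10.0                 # arbitrary "focus units" available per hour
    log: list = field(default_factory=list)

    def request_interrupt(self, source: str, importance: float, cost: float) -> bool:
        # Deliver only if the interruption is important enough to justify its cost
        # and there is budget left; otherwise hold it back for a quieter moment.
        if importance >= cost and self.remaining >= cost:
            self.remaining -= cost
            self.log.append((source, "delivered"))
            return True
        self.log.append((source, "deferred"))
        return False

if __name__ == "__main__":
    budget = AttentionBudget()
    budget.request_interrupt("navigation", importance=9.0, cost=2.0)   # delivered
    budget.request_interrupt("social feed", importance=1.0, cost=2.0)  # deferred
    print(budget.log, "remaining:", budget.remaining)
```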