055 - What Can Carol Smith’s Ethical AI Work at the DoD Teach Us About Designing Human-Machine Experiences?

December 29, 2020 00:39:41
Experiencing Data with Brian T. O'Neill

Show Notes

It’s not just science fiction: as AI becomes more complex and prevalent, so do the ethical implications of this new technology. But don’t just take it from me – take it from Carol Smith, a leading voice in the field of UX and AI. Carol is a senior research scientist in human-machine interaction at Carnegie Mellon University’s Emerging Tech Center, a division of the school’s Software Engineering Institute. Formerly a senior researcher for Uber’s self-driving vehicle experience, Carol, who also works as an adjunct professor at the university’s Human-Computer Interaction Institute, does research on ethical AI in her work with the US Department of Defense.

Throughout her 20 years in the UX field, Carol has studied how focusing on ethics can improve user experience with AI. On today’s episode, Carol and I talked about exactly that: the intersection of user experience and artificial intelligence, what Carol’s work with the DoD has taught her, and why design matters when using machine learning and automation. Better yet, Carol gives us some specific, actionable guidance and her four principles for designing ethical AI systems.

In total, we covered:

Quotes from Today’s Episode

“The craft of design, particularly for #analytics and #AI solutions, is figuring out who this customer is, your user, and exactly what amount of evidence do they need, and at what time do they need it, and the format they need it in.” – Brian

“From a user experience, or human-centered design aspect, just trying to learn as much as you can about the individuals who are going to use the system is really helpful … And then beyond that, as you start to think about ethics, there are a lot of activities you can do, just speculation activities that you can do on the couch, so to speak, and think through – what is the worst thing that could happen with the system?” – Carol

“[For AI, I recommend] ‘abusability testing,’ or ‘Black Mirror episode testing,’ where you’re really thinking through the absolute worst-case scenario because it really helps you to think about the people who could be the most impacted. And particularly people who are marginalized in society, we really want to be careful that we’re not adding to the already bad situations that they’re already facing.” – Carol, on ways to think about the ethical implications of an AI system

“I think people need to be more open to doing slightly slower work […] the move fast and break things time is over. It just, it doesn’t work. Too many people do get hurt, and it’s not a good way to make things. We can make them better, slightly slower.” – Carol

“The four principles of designing ethical AI systems are: accountable to humans, cognizant of speculative risks and benefits, respectful and secure, and honest and usable. And so with these four aspects, we can start to really query the systems and think about different types of protections that we want to provide.” – Carol

“Keep asking tough questions. Have these tough conversations. This is really hard work. It’s very uncomfortable work for a lot of people. They’re just not used to having these types of ethical conversations, but it’s really important that we become more comfortable with them, and keep asking those questions. Because if we’re not asking the questions, no one else may ask them.” – Carol

Links

Transcript

Brian: Welcome back to Experiencing Data. This is Brian T. O’Neill. Today we’re going to talk about design and user experience, particularly in the context of artificial intelligence. And I have Carol Smith on the line today.

Carol is formerly of Uber’s self-driving car unit, and she is now a senior research scientist in human-machine interaction and adjunct professor in the HCI Institute at Carnegie Mellon University. So, Carol, that’s a lot of words in your title. Tell us exactly what all that means, what you’re up to today, and why does UX matter for AI?

Carol: Yeah, thank you for having me. So, the work I do really crosses between the people, the problems we’re trying to solve, and these technologies that are emerging, and new, and complex, and creating new and interesting problems that we need to pay attention to and try to solve. None of this is necessarily new, but because of the ways that AI can be extended beyond where typical software programs can be extended, it becomes even more important. It’s just that more and more people are affected by the decisions that we make with these systems. And we need to be more aware of the effects of the systems on people.

Brian: Yeah, I mean, this is actually one of the reasons I don’t use ‘UX’ too much in the language when I talk about this, because now we have third parties who are not stakeholders and not users that are also impacted. And so the human-centered design framework, to me, is a better positioning of the work that we all need to be doing, I think, if we care about the systems in the world that we want to live in, and all this jazz. So, the other area of focus that I thought was really interesting here is you’re doing a lot of work in ethics, and you’re also doing work in warfighting with artificial intelligence. And I love this yin and yang here. And so I want to dive into some of that as well.

But maybe you can kind of start us out with the work that you were doing at Uber, and I’m assuming that seeded some things in the work that you’re doing today at Carnegie Mellon and with the government clients. So, tell me about that kind of process, and what did you take away from Uber, and all that?

Carol: Yeah, yeah. So, I joined Uber with the expectation that self-driving vehicles were right around the corner. I was somewhat ignorant, going in, of the status of that technology. And the vehicles driving around my neighborhood seemed to be doing it themselves, and seemed to be maturing quite quickly from the early prototypes that we saw to the more formalized frameworks that we were seeing, literally driving right down the street. So, I assumed it was near, and coming, and really was hoping my kids would be able to use self-driving cars instead of learning to drive.

But that is not going to happen, unfortunately. And so that was back in 2017. And when I joined, I was doing some really interesting work on the humans: potential passengers, and what their expectations would be, and how do you gain trust? As well as the operators, and dealing with the information that they were getting from the system, and how would they understand what the system’s needs were, and what the next movement of the vehicle would be, and just having that relationship, the human-machine teaming there. And then also the actual developers, working on that developer experience.

People who are actually making these systems work needed information about the systems in easy-to-understand formats, at the appropriate level. And so trying to keep the complexity there, but also make sure that the meaning was available to them, and that they had this high-level view. So, those were some of the problems I was looking at there. And it was just really interesting. I’ve always enjoyed working with machines and people.

When I was at Goodyear Tire, we worked with these massive vehicles that did mining. The tires that we were mostly concerned with were 15 feet high, but the people were sitting 30 feet up in these vehicles, so just understanding how they were managing these systems, how they understood what was going on, was very important then as well. And then ethics has actually been something that has kind of flowed throughout my career. The work that I do has always involved consent, and making sure that the people that I was involving in the research understood what I was doing, and why, and that they could stop the work at any time, and that sort of thing.

So, a lot of dealing with just the ethical implications of the work I was doing has always been part of that. And then as well, worrying about: am I doing work that is actually helpful to people? Am I doing work that can make society a little better, even if it is just a financial application or something like that? Ideally, I want to be doing something that’s somewhat meaningful. And so the opportunity at Carnegie Mellon was an opportunity to work, as you mentioned, to help keep our warfighters safe, and at that intersection between soldiers and artificial intelligence.

And many of them don’t have the background in computer science or in artificial intelligence to necessarily understand the systems. And so helping to break down what the system is doing for them, and helping them to be able to have the proper control. Thankfully, our Department of Defense does have a set of ethical principles, and they are very strongly pushing those for artificial intelligence: maintaining control with the humans, and making sure that these systems are within those sets of ethics. And so how do you implement that for a soldier, and make it relatively easy to use? These are very complex systems, but thinking through that process is the work that I do.

Brian: Mm-hm. And do you self-identify as a designer as well, or is your work primarily just in the research, and feeding that into the makers, so to speak, the teams that are going to produce?

Carol: Yeah, I’m definitely not a designer, in a visual sense anyway. I do wireframing, and prototyping, and things like that, early ideation and early troubleshooting mostly. And then ideally, I partner with someone. Alex, on our team, is a wonderful designer, so I partner with her frequently. Working with people who have that skill and are able to really bring things to life is really important as well; it’s just outside of my skill set.

Brian: Got it, got it. One thing I wanted to talk about, I think maybe you touched on this, and this is something that I’ve given a fair amount of thought to, and I feel like there’s a lot of young energy in the design community, and the audience on this show is primarily people coming from the data science, product management, analytics background. The young energy is treating a lot of this designing for AI and machine learning as something entirely new. I see this as an extension of what we do with human-centered design with anything. 

At the end of the day, we’re still building software, and when we talk about data products and machine learning and all this, it’s like, “Wait a second. This is still just a software application. Maybe there’s a hardware component. Let’s not turn it too much into something different,” because to me, a lot of the core material, the core activities, still need to happen. It’s just, it’s almost like a bigger problem now. You have probabilistic outcomes, so many more use cases to test for, learning systems, and all of this. Do you agree with that, or do you think, no, we need a fundamental shift in mindset about how we approach all of this?

Carol: Yeah, so I think it’s both. It is definitely building on what we’ve always done as far as best practices in design, as far as user experience, clarity, and understanding of the next steps, and things like that. Like the basic heuristics that we’ve always learned. Those are ever so important, and even more so now because when you’re working with a more complex system, you need to know even more. So, that hasn’t changed. 

But what has changed, and what I see to be more of a challenge for people coming into this type of work is the complexity of the systems, and understanding dynamic systems, and systems where the data and the information that’s presented may change over time and may change relatively quickly, and the interaction needs to adjust as well. And so if you’ve mostly worked on websites and relatively simple applications, it’s a huge change to go from those types of interactions to these. So, that’s where I see the biggest challenge. And then with autonomy, that is new work. And that is something that there just isn’t a lot of visual design efforts around. 

Most of the interfaces for people who need to understand what the autonomous vehicle is doing are either completely simplified, to the point that there’s really almost no information coming through; that’s usually the consumer view. So, if you look at some of the transportation vehicles that are used in parking lots and things like that, and even Tesla’s interactions, they’re very scaled down. They’re very basic. It’s your speed and direction and a little bit of information beyond that, and that’s about it. Whereas for the expert using a system where they really need to understand much more detail, and they need more information about what’s in the scene and what’s around them, there’s a lot of work to be done in that area to provide an expert view that is consumable visually.

Brian: Sure, sure. And I think our audience is probably always riding this line because, as people producing decision and intelligence products and things like this, it’s always a question of how much evidence do you throw at people? How many conclusions do you draw? Model interpretability is a big question. I’m seeing a lot more activity around, you know, “Any amount of interpretability is always better. More of this all the time.” And I’m kind of like, “Well, wait until that becomes a problem because you can also just overwhelm people as well.”

And this is the craft of design: figuring out who this customer is, your user, and exactly what amount of evidence do they need, and at what time do they need it, and the format they need it in. I don’t know if you can just build a recipe around that except to say, “If you’re not doing research, you can’t find the answer.” Like, [laugh] would you agree with that? It’s not a-

Carol: Definitely.

Brian: It’s not a checklist item, you know?

Carol: Yeah, yeah. And there’s no one right answer for these systems. It’s got to be for the context that the person is in, and it needs to be for that individual. So, a technician who is trying to troubleshoot a system is going to need very different information than the person who’s just using the system, just for an example. And I always think of the Bloomberg Terminal and how much data is there, and if you’re not familiar with those kinds of financial pieces of data, it just looks like a huge mess. [laugh]. “Why would anyone do that?” is usually the reaction.

And yet, the financial analysts and the others who use and appreciate that find it to be enormously helpful and very meaningful information, and the right information at the right time. And so, there’s that difference. You also find this in any complex system: airplanes, and things like that. There’s a huge amount of information that needs to be given to an expert, and they need that information, but they need the right information at the right time, and so figuring out that balance is only something you can get from working with those individuals and understanding their work deeply.

Brian: Yeah, you hammered on something I talk about a lot, and this is that context. We see sometimes it’s, “Let’s copy this template,” or, “Let’s copy this design from someone else.” And it’s like, well, unless you’re copying all the research and the customer base, you have no idea whether or not this template makes sense, and the Bloomberg Terminal is a great example. So, especially with enterprise, I think you have to take things with a grain of salt. When I go do an audit, I’m very careful about making assumptions about stuff that looks really bad on the surface. Not to mention the disruption you can cause by going in and changing things when you don’t understand the legacy of why they’re there, and all this kind of stuff.

So, you make some really good points there about the right amount of volume, and information, and all of this. So, if you don’t have designers on your team, there’s still a lot of data science groups and analytics groups that are now being tasked with coming up with machine learning and AI strategies, and that’s a different kind of work, especially when the work is not well-defined upfront by the business. Now, we’re into innovation, and discovery, and problem framing, and all this other kind of stuff. What are some things that someone that doesn’t have designers on staff, but they know, “I want to build better stuff because people aren’t using our stuff a lot of the time. We want to get more adoption, we want to drive more value.” What are some of the activities that quote “anybody” can start doing to get better at this? What would you recommend?

Carol: Yeah, yeah. So, certainly from a user experience, or human-centered design aspect, just trying to learn as much as you can about the individuals who are going to use the system is really helpful. So, even looking on LinkedIn, and websites that they might frequent, and just trying to glean as much information about those individuals as you can. Minimally understanding the terminology that’s appropriate is extremely important. And then beyond that, as you start to think about, like, ethics and things like that, there are a lot of activities you can do, just speculation activities that you can do on the couch, so to speak, and think through, what is the worst thing that could happen with the system? 

Something I’ve been working on is a [00:12:59 checklist], which we can share with your audience, to help people just kind of frame those conversations that they need to have with their team and start to think through the implications of their work. How are we controlling the system? When are we passing control from the human to the machine and vice versa? How are we going to represent the data and the source of the data? How are we going to show individuals the biases that are inherent in the data, and how do we convey that in a way that shows them the strengths of the system, as well as the weaknesses, or the limitations, of the system?

So, just really having those conversations can help you begin to understand better how to do that work. And then there are lots of resources online, certainly for user experience and human-centered design research-type activities, that people can start doing. It’s one of those things that you only get better at by doing it a lot. And so it’s tough, if you’re just doing it once in a while, to be skillful. Much like if you put me in front of a terminal and asked me to start coding, it would really not go well. [laugh].

Brian: [laugh]. I understand. You make good points there. I do want to drive into the ethics material because you’ve published a fair amount of stuff. I did see the checklist and I’ve already pre-linked it up in my [00:14:15 show notes] because I think it’s great. 

Is part of that exercise (and I feel like it should be; I don’t know if it is because I didn’t have this question in my head when I read it) asking whether we should be caring about ethics at all? And I know there are business leaders probably saying, “Okay. It’s, like, another tax on my project, and we need to kind of check the ethics box.” They’re not really excited about it, maybe the way a UX person would be, because we’re driven by empathy and all this other stuff. They want to do the right thing, but they also are going to say we don’t have time to, like, blow up the whole world and spend tons of time on this.

Is part of the work figuring out first whether there is a potential ethics issue, and then if you say, “Oh, there’s not,” you pass the level-one checks, you really can just proceed, because this is, like, some internal operations thing that’s shuffling paperwork around using machine learning, and it’s not really going to affect anybody? Is that part of it: figuring out if there actually is an ethics question, and then proceeding with a level-two diagnostic if there is? How do you frame that? Is that the wrong framing?

Carol: Yeah, no. I think that’s a reasonable way to go. It’s just initially, really being exhaustive about it to some extent, really thinking through the worst-case scenarios. So, particularly if you know that there’s going to be personally identifiable information about individuals, or if you know that you’re already dealing with a particularly risky area. So, in the US, like, with mortgages, there’s been a lot of work where people have tried to create these systems, and because of the inherent bias in the data that they’re trying to use to build the system, that data just carries on through into the system.

And so that would be a situation where clearly there are already issues. We know there are existing issues in the human system; thinking that the AI system is going to get rid of those is nonsense, really. It just can’t; it can’t take away those types of issues. So, I think the first thing is, yeah, looking just subjectively: is this an area where we need to be on high alert, or is this more of a situation where we just need to make sure that we’re building really good software, and that we’re not leaving open doors that someone could easily hack into the system, and that sort of thing. But then, when you do know that you’re dealing with the public in any way, or dealing with particularly those higher-risk areas, then there’s a lot more work that needs to be done to both protect the data, protect the people, and also to do mitigation planning and communication-type work, just to think through ahead of time: how are you going to shut off the system if it comes to that? How are you going to-

Brian: You can’t turn it off.

Carol: Right.

Brian: [laugh].

Carol: Right. That’s every sci-fi movie, right?

Brian: Right, I know. [laugh].

Carol: Ahh. You didn’t plan for that? How can you not plan for that?

Brian: Yeah. There’s no off switch. It wasn’t a requirement.

Carol: Right. Right. Right. Really? Like, yeah, just, I think doing good work is the key to preventing a lot of this. My kids were watching Cloudy With a Chance of Meatballs the other day, and he builds this machine and launches it, and it storms food items, and people are getting hurt by food items, and turning it off is unfortunately not an option.

Brian: Right. [laugh].

Carol: Yeah. [laugh].

Brian: Something that I can see from the engineers who I know and love, from all my clients in the past, super smart people, but sometimes it’s like, “Well, we can add that in later. We can provide a method to give feedback, so now it’s learning from the feedback that it gets,” or these are, like, features that get added on. That’s maybe a downside of the traditional software engineering approach, or is it? I’m sure with some of this, if you’re training a model on a bad data set, and there’s no discussion about that, then the nuts and bolts of the whole system already have a problem, right? Or at least if it’s not explained to the user that that’s what it has been trained on.

But do you see some of this as these are improvements that are added along to an MVP or a first version? Or it’s no, you just don’t ship any of this without some minimum level of all of these different kind of special requirements for AI solutions? Like, how do you think about that?

Carol: I think it depends. It depends: is that MVP literally just a click-through site to gather knowledge and interest? Then that’s probably minimally a problem. But if you’re already building and you know you’re building an AI system, a much more complex system, you must start this work super early because if you wait, you’re going to find that some of the decisions cannot be done, at least not easily. So, it really is important to do it at the inception of the project, at least from those high-level speculative-type activities.

So, some of the ones that I recommend are ‘abusability testing,’ or ‘Black Mirror episode testing,’ where you’re really thinking through the absolute worst-case scenario, because it really helps you to think about the people who could be the most impacted. And particularly people who are marginalized in society, we really want to be careful that we’re not adding to the already bad situations that they’re already facing. So, it’s really important to do it upfront. And much like accessibility, people feel like, “Oh, we’ll fix that later.” You can’t really effectively fix those types of activities later; you really do need to build it into the system.

And it’s really important to do the right thing, in this sense. And to your earlier point, businesses aren’t always interested in that, and it’s a hard sell, unfortunately. It will probably end up being lawsuits before many individuals really understand the importance of it. But I’m hoping that we can get enough people at the ground level doing this work that it will already be baked in.

Brian: Yeah, I mean, ultimately, a lot of it’s going to come down to appetite for risk, taking chances, and all that kind of stuff. So, this kind of is a good bridge, I think, to talk about some of the warfighting and military work that you’re doing. Talking about ethics in that context is really interesting, as I’m sure you can imagine. It’s kind of like I’m here to pull everyone to the left, while the technology wants to go to the right, and you find some middle ground here.

The first thing that jumped to mind when I saw this is, “We’re putting out a code of ethics about this.” Isn’t that exactly what quote “the enemy” is going to not follow in order to level up? And then you have that natural pull to, like, bend the rules and, like, “Well, we’ll just automate this too, and we’ll automate that.” And the next thing you know, it’s machines and machines. So, talk to me about this yin and yang, and finding that middle ground. What is that like? It’s a fascinating area.

Carol: It is. It is. And it’s not something, unfortunately, that’s new for the US Department of Defense. We’re constantly, unfortunately, working against organizations that will go to any lengths to make sure that they win. And that’s just not the way we do the work that we do.

That’s not the way the United States wants to present itself; at least, that’s not the United States I believe in. So, trying to figure out where that balance is, is really challenging, particularly right now in the cyber world. Now, I don’t work as much in that area (doing cybersecurity-type work) because there are so many people using various activities to get into these systems and to break through security protocols. In some cases, some of my colleagues have to think about other, less ethical ways of doing that work, too. And thinking about, do we make what they call a honeypot and attract them in order to prevent further damage?

And at what point are you crossing that line? And just really thinking through all those types of implications. And it’s the same way with the warfighters: we need to give them enough control and protection to keep them safe, and at the same time, there’s always that point of, well, if, for example, someone lost control of the vehicle and there was a crash, and people got hurt, you know, starting to think about, how do we prevent that? Is there a way to prevent that? Is there a way to put in an automatic stop, and what are the implications of that in the system?

If it automatically stops, then what risk are we potentially incurring? And just thinking through those types of things is really hard. I was teaching a workshop a few days ago and talking about how most of this isn’t about the trolley problem, the way people think about the trolley problem. The idea is that there’s a trolley person and they are managing the trolley, and there’s a fork, and the decision is to go left or to go right, but in both directions people get hurt, most likely killed, by the trolley. That’s an in-the-moment decision. What we really are doing with this work is thinking through that long, long before it ever is going to happen.

So, long before that incident, how can we prevent that? And where does the trolley stop? And what are the implications of that? Who do we need to notify that it’s stopped, because it prevented either of those tragedies, because we built a safe system? And that doesn’t necessarily mean... safety is always relative.

Another example is passing other vehicles. As human drivers, we want a good three, five feet, I don’t know what it typically is, but there’s a distance that we want between us and that yellow line, some distance, and to a self-driving vehicle, it doesn’t care, right? As long as it’s passing by the merest centimeter, it’s safe, technically. But how many people are going to feel safe in that vehicle at that time, when it’s that tight?

It’s like driving in, I’ve been in some countries where people pass each other that closely, going slower, but [laugh] it still feels very risky. And where’s that balance between protecting and doing all we can to keep people safe, and also not creating a worse problem in some way?

Brian: Some of the stuff you were talking about, I thought it’s really important to hammer this into the community of makers working on these things: for example, you can’t teach a machine justice; you can simulate justice-like decisions, but it doesn’t feel that; safety is also a feeling, right? It’s not quantitatively decided. If I’m in a jumbo jet going 900 miles an hour, then that three to five feet of spacing does not feel as good as it does if I’m in a 10-mile-per-hour tank, where it’s like, oh, we’ve got plenty of space, I could reach my hand out. There’s something going on there that’s very emotional, and it’s human, and you can’t teach that to the machine. I mean, maybe you can.

Maybe there’s an algorithm or a formula for size of craft, velocity, this much space feels right. I don’t know, maybe there’s some way to quantify that. But is that kind of the work you’re doing, is helping to quantify some of these things, and make them into requirements or parameters for the system, and say this is what we learned through research? Is that part of what you’re doing?

Carol: To some extent, yeah. I’m not doing as much of that right now, but I have in the past, and really just trying to figure out how do you get that context into the system. But the other thing is, it’s just as subjective. So, what you think is safe and what I think is safe are different. In the time of COVID, we’re learning a lot: what that person thinks is safe and what I think is safe are two very different things sometimes.

And we both use the word ‘safe,’ and we-you know. So, it is, it’s very subjective. And part of my job is also just figuring out okay, so since this is very subjective, how do we make a system that is somewhat flexible for that situation, so that for this more aggressive person, the system is appropriate and helpful to them, but also for this more conservative, more careful person, they are going to feel that the system works? Or is that two different systems? Do we need to build two different systems? 

And then how do you easily switch off? And what happens if they have to share? There are lots of those kinds of questions, really the aspects of the humans and our complexities and oddities, and the machine, and figuring out how we get that partnership really working is mostly where I focus.

Brian: I think a lot of product designers have probably felt this before at some point in their career: that we create work and we slow shit down. Like, we add tax for long-term value, you know, long-term usability, long-term investments that pay off, but in the short term, it feels like tax, more requirements, more stuff, slow it down. How does it feel when we’re doing warfighting? And in the context of work you’re doing, do you think that, yes, I know it’s a tax, and you kind of acknowledge that? Or do they not see it that way, the teams you work on? How do they see the work that you do with user experience?

Carol: Yeah. Well, I’m very lucky; I mostly just work on prototypes. So, I work for what’s called an FFRDC, so the work we’re doing is super early, just thinking through ideas and trying to help people. So, we actually don’t have that constraint. I have worked a lot in Agile, though, and in those instances, it can feel like the user experience work, the ethics work, is slowing the systems down.

And I think people need to be more open to doing slightly slower work. I do think that the move fast and break things time is over. It just, it doesn’t work. Too many people do get hurt, and it’s not a good way to make things. We can make them better, slightly slower. I’m a huge fan of getting things released, and getting feedback, and getting things out, but if it takes three weeks instead of two weeks because we spent a little bit more time thinking through, and we protected people, I think that’s a win.

Brian: Yeah, I think it’s always-the risks have to be understood. I think part of the work is asking the questions, having the scenarios that you talked about. I want to get into the abusability testing, too. That’s a great word; I hadn’t heard of that before. But before that, you had written down, I’m going to quote you here. 

I think this was on the ethics checklist, but you said (and this is talking about diversity of teams in this process, and why this is important), “Talented individuals with a variety of life experiences will be better prepared to create systems for our diverse warfighters and the range of challenges they face.” And I was like, “Wow. Okay, so I was a touring musician, went around in a van, whatever. What the heck does my life experience have to do with contributing to something like that?” I found that fascinating. So, talk to me about diversity, not just of skin color and race on the teams, but of experience. How does an artist have something to do with this? Tell me about that.

Carol: Actually, that’s an excellent example, because you have worked on cramming huge amounts of equipment into probably not large enough vehicles, and you probably were traveling with more people than you maybe should have been at the time. And that’s an excellent way to think about soldiers in a vehicle who are being transported: they usually are in very tight spaces and have a lot of equipment, and they need to make sure that the equipment is cared for and fed, but also themselves. And being able to appreciate that and understand that is actually important. So, in that example, I think it’s a great comparator. And generally, the thing that people with different backgrounds bring is just those different life experiences.

So, an example I don’t enjoy using, but that’s very illustrative, is when you think about smart thermostats and smart speakers in people’s homes, and the way that we tend to share passwords and things like that with other humans that we’re in close relationships with. If those relationships become violent, and a person then leaves, the violent offender leaves, then they still have access to the home. They still are able to potentially make the home temperature uncomfortable, they can raise and lower the volume of speakers, they can do all kinds of things to make that person’s life unbearable, even though they’ve physically, you know, left the home. So, those are the types of things I try to think about. Just how can we keep people safe and prevent those things from happening?

And abusability testing is an excellent way to do that, because if you think about how that system can be abused, you might get there. Someone who has, unfortunately, been in that type of relationship, or been stalked, or been in other situations where they felt threatened, is going to be much more imaginative about the ways a system can be abused than someone who has never felt concerned for their own safety. So, that’s where that diversity matters as well. It’s just having people who have those different life experiences and can say, “You know, I can imagine an ex of mine really misusing that, or abusing me with this. If they had access to that, I’d still be having trouble.” That type of thing. I think that just having people with those different experiences, you’re just more likely to have those kinds of conversations, or at least I hope that you’re in a safe enough organization that you can have those conversations.

Brian: Talk to me about, then, these four principles for designing ethical AI systems, and if you could go into the abusability testing thing, I think it’s an actual tactic; it sounds like a very fun activity people can actually do, but it has real purpose as well. So, can you break down these four principles?

Carol: Yeah, sure. Yeah. So, the four principles are: accountable to humans, cognizant of speculative risks and benefits, respectful and secure, and honest and usable. And so with these four aspects, we can start to really query the systems and think about different types of protections that we want to provide. So, with accountable to humans, for example, we can start to think about who is making decisions? 

How are we making sure that humans can appeal or somehow undo an action or an important decision made by the system? How are we protecting quality of life and human lives in general? How are we making sure that the system is not making decisions that we don’t want it to make? And with cognizant of speculative risks and benefits, this is where we get into abusability testing. So, this is really thinking through those worst-case scenarios.

And with the abusability testing (this was made popular by Dan Brown), the idea is that you really take the time to think about the work that you’re doing, to think about the scenario, and go through the steps of thinking about the good things and the benefits that the system can provide, but also the negative aspects and what those are, and what could happen potentially if the system was hacked, or if the system was turned off inadvertently, or if someone wants to hurt someone else with the system, using the data, using whatever aspects they might have access to. This is particularly important for systems that are using camera data, or facial recognition, or anything like that. For things where human lives or any important information, again, are at stake, we want to make sure that we’re being as speculative as possible so that we can prepare, ideally prevent, but at least mitigate, and then communicate about how we’re mitigating those issues, and make sure that people are aware of them. And then with respectful and secure, this has to do with people’s data. For example, just making sure that we’re not collecting more information than we need, and that we’re taking responsibility for all the data that we collect, and making sure that we’re keeping it as safe as possible.

Also, making sure that the system is easy enough to use, easy enough to be secure, that we don’t have to worry about people writing information down on a post-it note where it might be accessed by someone else. And then with honest and usable, that’s with regard to the system actually identifying itself: being clear about when it’s an AI system. So, particularly with smart speakers, and chatbots, and things like that, we want to make sure that humans always know when they’re dealing with another human versus a machine. And so making that really clear to them. And being honest about, again, the weaknesses, the limitations of the system, how it was built, who built it, where the data came from, why they should trust the data, or why the data might be questionable, and providing all that information upfront so that people can determine how best to use the system.

There was an example (I’m not going to remember her name), but I was at a presentation a week or two ago, and she was talking about how they found that, with certain candidates using the system that they had built, they shouldn’t use the AI system. The AI system had significant bias for certain individuals, and so in some cases, they would go ahead and use the system because it was faster for the decisions that they were making, and in some cases, they would specifically not use the system because they knew the system would not make the best decision. Even though it was going to be faster, it was going to provide a very biased answer, and so they just made that determination about when to use the system.

Brian: Got it. Got it. The second principle, I wanted to ask you about this because I thought it was really interesting, and I wrote down, “Risk to humans’ personal information and decisions that affect their life, quality of life, health, or reputation must be anticipated as much as possible, and then evaluated long before humans are using or affected by the AI system.” And so I immediately thought about your warfighting [00:36:00 unintelligible]; I was like, wow, that’s kind of at odds in some ways, right? How do you balance that? And did this actually come out of some of the work you were doing in the defense space, where you’re like, “It’s this yin and yang”? I just found that kind of fascinating because the opposite seems to be what you’d want the tech to do if it was an offensive solution, you know? [laugh].

Carol: Yeah. Yeah, yeah. Well, ideally, we will always have humans making those final types of decisions, regardless. But it’s still-yeah, it’s a really difficult area to work in. And the Department of Defense has always had standards of ethics and standards of action with regard to the soldiers. 

Unfortunately, those aren’t always followed, but for the most part, they are, and they’re very important to the Armed Forces. And so making sure that that’s also in the AI systems is really important. Making sure that we’re still standing for the things that we believe in, and protecting life as much as possible. Certainly, no soldier wants to be responsible for the death of anyone that they don’t intend to injure, so that’s part of it; it’s just making sure that the systems truly are safe, and truly are protecting life as much as possible. And where it’s up to the individual soldier to make that determination, then they’ll make that determination. 

And they’ll have that responsibility on them, not on an AI system because an AI system doesn’t have rights and responsibilities. It’s just a computer. So, making sure that that responsibility stays with the individual who is making that decision, who’s always had to make that decision.

Brian: Yeah, yeah. Good stuff. Any closing thoughts for our audience? Where do you see things going? Or a message you’d like to convey strongly about this work?

Carol: Yeah. I’d say just keep asking tough questions. Have these tough conversations. This is really hard work. It’s very uncomfortable work for a lot of people. They’re just not used to having these types of ethical conversations, but it’s really important that we become more comfortable with them, and keep asking those questions. Because if we’re not asking the questions, no one else may ask them. And we need to make sure that the work that we’re doing is protecting people, and is the right work to be doing, and is going to be helpful, and hopefully be really useful and usable, and all the wonderful things that we want our users’ experiences to be.

Brian: I think it’s a great place to finish. So, Carol Smith from Carnegie Mellon, thank you so much for coming and talking about this.

Carol: Thank you. This was a pleasure.
