Just a quick one here: I wrote about Alphabet company Jigsaw’s machine learning-based approach to online content moderation a while back. At the time, I said it was nice to see AI and machine learning being applied to humdrum everyday problems that actually needed solving, but back then this was merely a concept that Jigsaw was making available. So it’s great validation for the technology that the New York Times is actually adopting it in a modified, customized form it’s developed with Jigsaw. That should improve comment moderation on the Times website while also giving the underlying technology a boost, presumably making other news organizations more likely to try it.
It emerged over the weekend that Apple has acquired Lattice Data, a company which specializes in analyzing unstructured data like text and images to create structured data (i.e. SQL database tables) which can then be analyzed by other computer programs or human beings. TechCrunch has a single source which puts the price paid at $200 million, and Apple has issued its usual generic statement confirming the acquisition but offering no further details. It’s worth briefly comparing the acquisition to Google’s acquisition of DeepMind in 2014: that deal was said to cost $500 million and was for 75 employees including several high profile AI experts, though it was unclear to outside observers exactly what DeepMind was working on, while this one reportedly brought 20 engineers to Apple and has several existing public applications and projects to point to. Lattice is the commercialized version of Stanford’s DeepDive project, which has already been used for a number of applications involving large existing but unstructured data sets. Lattice has a technique called Distant Supervision which it claims obviates the need for human training and instead relies on existing databases to establish links between items that can be used as a model for determining additional links in new data sets. It’s not clear to me whether the leader of the DeepDive team at Stanford, Christopher Ré, is joining Apple, but he was a MacArthur Genius Grant winner in 2015 and this video from MacArthur is a great summary of the work DeepDive does (there’s also a 30-minute talk by Ré on the DeepDive tech). Seeing Apple make an acquisition of this scale in AI is an indication that, despite not making lots of noise about its AI ambitions publicly, it really is serious about the field and wants to do better at parsing the data at its disposal to create new features and capabilities in its products.
It’s entirely possible that we’ll never know exactly how this technology gets used at Apple, but it’s also possible that a year from now at WWDC we hear about some of the techniques Lattice has brought to Apple and applied to some of its products. Interestingly, the code for DeepDive and related projects is open source and available on GitHub, so I’m guessing Apple is acquiring the ability to make further advances in this area as much as the technology in its current form.
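To make the distant supervision idea above a little more concrete, here’s a minimal sketch of the general technique (the facts and sentences are invented for illustration, and this is not DeepDive’s actual code): facts already in an existing database are used to automatically label raw text, so no human annotation is needed to build a training set.

```python
# Known (subject, relation, object) facts from an existing database.
KNOWN_FACTS = {
    ("Apple", "acquired", "Lattice Data"),
    ("Google", "acquired", "DeepMind"),
}

def distant_labels(sentences):
    """Label any sentence mentioning both entities of a known fact
    as a (noisy) positive training example for that relation."""
    examples = []
    for sentence in sentences:
        for subj, relation, obj in KNOWN_FACTS:
            if subj in sentence and obj in sentence:
                examples.append((sentence, subj, relation, obj))
    return examples

corpus = [
    "Apple has reportedly paid $200 million for Lattice Data.",
    "DeepMind joined Google back in 2014.",
    "Stanford's DeepDive project has many applications.",
]

# Each match becomes a training example for a relation-extraction model,
# which can then find the same relation between new entity pairs.
training_examples = distant_labels(corpus)
```

The labels are noisy (a sentence can mention both entities without expressing the relation), which is why systems like DeepDive layer statistical inference on top, but the appeal is clear: the training data comes for free from databases you already have.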
Back in December, Microsoft announced its equivalent of Amazon’s Alexa platform for third parties in the form of its Cortana Skills Kit and Cortana Devices SDK. A week later, Harman Kardon announced it was working on a speaker that would feature Cortana, and said it would launch in 2017. Five months later, the two companies have provided a name (Invoke), pictures, and some capabilities for the device, but there’s still no specific launch date (beyond “Fall 2017”) or pricing. On paper, the Invoke looks a lot like Echo in both its design and its capabilities (it even has an Echo-like 7-mic array), and the main difference is that it will do Skype voice calls, which is something that’s been rumored for both Echo and Google Home but isn’t yet supported by either. One advantage Harman would have over Amazon or Google in this space is that it’s a speaker maker, so it may well have better audio quality in its version than those companies have in theirs, something that’s been a shortcoming in this category so far. And of course, it’s interesting given Samsung’s ownership of Harman Kardon that this speaker is running neither of the assistants Samsung itself supports – its own new Bixby assistant or the Google Assistant – though this partnership obviously began before the Samsung acquisition closed. Pricing is an interesting question: whereas Google and Amazon both have broader ecosystems which benefit from such a device and therefore justify subsidizing or selling it at cost, Harman obviously needs to make money on it, so it may end up being priced higher (as Apple’s version likely will be too). Lastly, we might see other ecosystem devices using Cortana announced at Microsoft’s Build developer conference this week.
★ Apple Siri Speaker Could Debut at WWDC in June (May 1, 2017)
KGI, which as I’ve noted before has a decent track record on future Apple products, says there’s a 50/50 chance that Apple’s entry in the connected home speaker market could debut at WWDC next month. There’s scant detail in the report other than that Apple’s speaker will have better audio hardware than the Echo, which has been criticized as being sub-par as a speaker despite its effectiveness as a voice-activated assistant device. I would certainly expect such a device to combine Siri, AirPlay, HomeKit device control, and possibly some kind of WiFi connectivity, but it’s very unlikely Apple could do all that well and still make its usual margin at the $130-180 price point that the full Echo and Home devices sell for. It’s more likely this would be sold in the range of the larger Sonos speakers (which Apple has been selling in its stores for the last little while), which would mean $300-500. That puts it in a different category from what’s out there today, which wouldn’t be unusual for Apple but would put it well out of impulse buy territory for most people and limit sales quite a bit. One big question is whether Siri is yet good enough for such a speaker, and what upgrades Apple might have in store for Siri at WWDC this year to help it get there. As I’ve suggested in the past, Siri’s shortcomings are at least in part hardware-based: more often than not, the problem is wrongly interpreting what’s said because of the tiny mics being used for voice recognition, and a big device should help a great deal with that. But Siri can also be frustrating even when it does understand what you say, and its more conversational elements are still pretty limited, which could be a big shortcoming on a device without an alternative input mechanism. I’m sure Apple will have some other special sauce in mind so this isn’t just another Echo or Home but something a bit different. 
But there’s a good chance this ends up being yet another new product category for Apple which sells a few million a year and which critics therefore contend is a flop, while it quietly generates a decent amount of revenue and profit for Apple (see also the Apple Watch and AirPods).
Amazon is giving developers of Skills (apps) for Alexa new speech tools which should help them create interactions where the assistant sounds more human through the use of pauses, different intonation, and so forth. Amazon already uses these for Alexa’s first party capabilities, but third party developers haven’t had much control over how Alexa intones the responses in their Skills. This should be a useful additional developer tool for adding a bit more personality and value, but I wonder how many developers will bother – new platform tools like this are always a great test of how engaged developers are and how committed they are to creating the best possible experience rather than just testing something out. I’ve argued from the beginning that the absolute number of Skills available for Alexa (now at 12,000) is far less meaningful than the quality of those apps, and many of them are very basic or sub-par, likely from developers trying something out as a hobby without any meaningful commitment to sustaining or improving their apps. On the other hand, the smaller number of really serious apps for Alexa should benefit from these new tools.
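The mechanism behind these speech tools is SSML (Speech Synthesis Markup Language), which a Skill can return in place of plain text. A minimal sketch, assuming a hypothetical Skill (this is not Amazon’s sample code, though the SSML response envelope shown is the standard one):

```python
def build_ssml_response(ssml_body):
    """Wrap SSML markup in the JSON envelope an Alexa Skill returns."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {
                # "SSML" instead of "PlainText" tells Alexa to interpret
                # the markup rather than read it literally.
                "type": "SSML",
                "ssml": "<speak>" + ssml_body + "</speak>",
            },
            "shouldEndSession": True,
        },
    }

# A pause and an emphasized word make the response sound less robotic.
response = build_ssml_response(
    'Your order shipped. <break time="500ms"/> '
    'It should arrive <emphasis level="strong">tomorrow</emphasis>.'
)
```

Tags like `<break>` and `<emphasis>` are exactly the sort of pause-and-intonation control described above; the question remains how many third party developers will take the time to use them.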
Amazon has announced a new device in its Echo family called the Echo Look, which assumes a different form factor, adding a still and video camera to the features of the standard device for $20 more. For now, the focus is fashion advice: the camera can take full-length photos or videos of the user, acting like a full-length mirror at a basic level but also offering fashion advice through machine learning tools trained by fashion experts. I say for now, because once you have a camera in an Echo device it could be used for many other things too – indeed, when reports and pictures of this device first surfaced people assumed it was a security camera, and there’s really no reason why it couldn’t be. And several of these devices together could be very useful for motion sensing and other tasks as part of a smart home system over time too. But Amazon’s also smart to start specializing the Echo a little, with a particular focus on women, as I would guess a majority of sales of Echo devices to date have gone to men. I’d bet we’ll see other more specialized devices in time, but also other uses for this camera as it gets software updates. And this also starts to get at a real business model for Echo, which so far hasn’t done much to boost e-commerce sales but could now drive clothing revenue through sales of both third party apparel and Amazon’s own growing line. And what Amazon learns from the Look and its associated app can be fed back into the core Amazon.com clothes shopping experience too, improving recommendations in the process. But of course all this comes with downsides: not only do you have a device in your home that’s always listening, but you now have a device with a camera, which could feasibly be hacked remotely to take pictures or video of you. And Amazon will store the images it captures indefinitely, creating a further possible source of problems down the line.
via The Verge
★ Google Home Now Recognizes Multiple Users by Voice (Apr 20, 2017)
This has been a long time coming – in fact, in just a few weeks it’ll be a year since Google debuted Home at its I/O developer conference and implied that it would have multi-user support, though of course it was missing when the device actually launched in the fall. And that’s been a big limitation of a device that’s supposed to get to know you as an individual. So the fact that Google Home now recognizes distinct users by voice is a big deal, and an important differentiator over Amazon Echo. I’ve just tried it with my unit and although it set up accounts for me and my daughter without problems the app conked out when I tried to add my wife, so the results are mixed (I suspect it may be because my wife’s account is a Google Apps account). It does recognize the two voices we set up and will now serve us up different responses, which is great. One big limitation, though, is that each user has to have a Google account and has to download the Google Home app onto their phone, which means it won’t recognize little kids who don’t have Google accounts. And given that it’s using voice recognition rather than, say, different trigger phrases, I can’t set up separate personal and work accounts. But for those who can use it, the Home will now be a much more useful device, serving up calendar information, music preferences, and so on, on an individualized basis rather than treating everyone in a home as the same person.
★ Amazon Scales Alexa Back-End by Opening Lex Voice and Text Service to All Developers (Apr 19, 2017)
So much of the focus of coverage of voice assistants and interfaces is on the dedicated consumer products which use them, and that’s natural: these are the most visible and measurable signs of a company’s success or failure in this space. And yet the scale of those dedicated voice products is still very small relative to smartphones, which carry their own voice assistants. And scale is vital if these products are to improve, because they require lots and lots of training to get better, and so the more users there are training them, the better they become. As such, I suspect the next phase of competition in this space is going to be about developer voice platforms at least as much as it is about first-party hardware and software, and we’re starting to see signs of this from the big companies in the space, including Google and Amazon. Today, Amazon announced that Lex, which is a back-end service that combines many of the technologies behind Alexa, is opening up to all developers. But critically, this isn’t just a voice platform – it supports text and voice processing, which means that many developers might use it in chat bots or other similar environments that have nothing to do with voice but still help train Amazon’s natural language processing tools. Google is doing similar things with its own voice processing technology, but it’s doubtful whether Apple will ever open its voice tools up in the same way. That’s not a huge deal, because it has massive scale in voice on smartphones alone, but it may make a bigger difference over time as these other platforms benefit not only from growing first party scale but increasing third party adoption and use too.
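To give a sense of how lightweight the text side of Lex is for a developer, here’s a minimal sketch. The bot name, alias, and user ID are hypothetical, and an actual call needs AWS credentials and a deployed bot, so this only assembles the parameters that would be passed to `boto3.client("lex-runtime").post_text(**params)`:

```python
def build_lex_text_request(bot_name, bot_alias, user_id, text):
    """Assemble the parameters for a Lex PostText call."""
    return {
        "botName": bot_name,
        "botAlias": bot_alias,
        "userId": user_id,     # lets Lex keep per-user conversation state
        "inputText": text,     # plain text; voice goes through PostContent
    }

params = build_lex_text_request(
    bot_name="OrderFlowers",   # hypothetical example bot
    bot_alias="prod",
    user_id="user-1234",
    text="I'd like to order some roses",
)
```

The same bot definition handles both the text path shown here and spoken audio, which is exactly why chat bot traffic that never touches voice still feeds Amazon’s natural language processing tools.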