
Training an AI using human instruction...

Discussion in 'General Discussion' started by Arowx, Apr 25, 2017.

  1. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
  2. dogzerx2

    dogzerx2

    Joined:
    Dec 27, 2009
    Posts:
    3,967
    English, and human language in general, is not very ... precise. There are tons of things we just assume when we talk.
    In normal circumstances we interpret what others say correctly, but there's potential for misunderstanding.

    Sometimes it's the hearer's fault, but sometimes a phrase is just too ambiguous, so even a great AI machine could get it wrong. And let's face it, sometimes people are lazy when speaking: "Hey, hand me that thing on top of that thing." You could fix this by being more precise when using English, though really you may just end up speaking in a similar way to programming languages.

    Still, AI or robots taking orders from non-programmer humans seems like a possible future.
     
    GibTreaty likes this.
  3. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    Or could a 'programming AI' take simple descriptive natural language and convert it to code?!
     
  4. dogzerx2

    dogzerx2

    Joined:
    Dec 27, 2009
    Posts:
    3,967
    It definitely could; it's a trade-off between 'descriptive' and 'natural'.

    Natural = You give less information to the computer, and the computer figures out the rest using context and "common sense" that has been extensively programmed beforehand.

    Descriptive = You need to be more specific and precise ... much like your typical programming language.
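
    To make the trade-off concrete, here is a toy sketch (the context table and all names are invented for the example): the 'descriptive' call spells everything out, while the 'natural' one fills the gaps from pre-programmed context:

    Code (Python):
        # Context the machine is assumed to have built up beforehand.
        context = {"last_object": "wrench", "last_location": "workbench"}

        def descriptive(action, obj, location):
            # Every argument must be spelled out, like a programming language.
            return f"{action}({obj}, {location})"

        def natural(action, obj=None, location=None):
            # Missing arguments are filled in from context ("common sense").
            obj = obj or context["last_object"]
            location = location or context["last_location"]
            return f"{action}({obj}, {location})"

        print(descriptive("fetch", "wrench", "workbench"))  # fetch(wrench, workbench)
        print(natural("fetch"))                             # same call, inferred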
     
  5. Billy4184

    Billy4184

    Joined:
    Jul 7, 2014
    Posts:
    6,013
    This is where I would put my money. I think that there is no better bridge between an AI and a human being than spoken language.

    However, and this is the important part, the language used would not be some new kind of coding language or anything like that. It would be a case of you telling the AI what you want in layman's terms, and it knowing enough about programming to implement it.

    I think people who are trying to reinvent coding through some other interface are missing the mark. Coding is already about as efficient as it can be for a human being without turning it into some kind of esoteric sport. The next step is not to reinvent the means of communication, but to refine the contents of the communication itself so that a person can communicate less, and get a computer to do more.

    In my view, the biggest hurdle is going to be context awareness for an AI interface. If an AI can understand the 'common sense' unspoken boundaries of what you're telling it to do, that's when it becomes really useful. But there's no way to teach common sense with if/else statements, you need fast adaptive learning and a very deep, very refined, and very navigable repository of contextual information such as a human being develops through experience.
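
    To make the if/else point concrete, a toy hard-coded intent handler (the phrasings are invented) shows why enumeration can't substitute for common sense:

    Code (Python):
        # Every phrasing must be anticipated in advance; anything else falls
        # through, because there is no common sense to generalize from.
        def handle(utterance):
            if utterance == "make the player jump":
                return "player.jump()"
            elif utterance == "the player should jump":
                return "player.jump()"
            return "unknown command"

        print(handle("make the player jump"))    # player.jump()
        print(handle("have the player hop up"))  # unknown command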
     
  6. dogzerx2

    dogzerx2

    Joined:
    Dec 27, 2009
    Posts:
    3,967
    I think it's possible. Today's tech is pretty impressive as it is, like OK Google: I like the way it switches homophone words in order to make more sense out of what I'm trying to say, and it's surprisingly accurate, even if I speak sloppily.

    In addition, some tasks may not require a super high level of understanding. If you're telling an automated taxi where to go, I'm sure speaking English normally is more than enough for the AI to get it right, requiring no more confirmation than a human driver would. I bet you could even give the AI voice different personalities ... just to make it cool.
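
    As a toy illustration of that homophone switching (the bigram counts here are invented; this is not how Google actually does it), a recognizer can pick whichever candidate is more plausible after the preceding word:

    Code (Python):
        # Score each homophone by how often it follows the previous word.
        bigram_counts = {
            ("turn", "right"): 50, ("turn", "write"): 0,
            ("to", "write"): 40, ("to", "right"): 1,
        }

        def pick(prev_word, candidates):
            return max(candidates, key=lambda w: bigram_counts.get((prev_word, w), 0))

        print(pick("turn", ["right", "write"]))   # right
        print(pick("to", ["right", "write"]))     # write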
     
    Billy4184 likes this.
  7. Kiwasi

    Kiwasi

    Joined:
    Dec 5, 2013
    Posts:
    16,860
    The project used a 'fixed set of instructions'. That's not natural language at all. Natural language was simply a style choice, not a fundamental part of the process.
     
  8. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    The game had a longer time horizon than the other games that AI technology had mastered. So the fixed instructions were guiding patterns that allowed the AI to navigate a longer time/goal problem.

    And the solution was to allow an AI to be guided by a human/domain expert.

    So could this open the way for game designers to use AI systems to make games?
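
    For what it's worth, here's a minimal sketch of that guiding pattern as instruction-conditioned reward shaping; the instruction list and event names are stand-ins, not the actual project's code:

    Code (Python):
        # Human-supplied instructions become intermediate rewards, so the
        # agent gets feedback long before the distant final goal.
        instructions = ["pick up key", "open door", "reach exit"]

        def shaped_reward(event, progress):
            # progress = index of the next instruction to satisfy
            if progress < len(instructions) and event == instructions[progress]:
                return 1.0, progress + 1   # small reward for following guidance
            if event == "reach exit":
                return 10.0, progress      # large terminal reward
            return 0.0, progress

        reward, progress = shaped_reward("pick up key", 0)
        print(reward, progress)   # 1.0 1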
     
  9. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,566
    Spoken language is very ambiguous, so it is not a good bridge by itself. You'll need to teach AI body language and intonation as well. Unless AI speaks Lojban, that is.
     
  10. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    21,147
    We haven't been shown a video of the game in action, but the example phrases they give are completely within the realm of a text adventure game from the late 70s and early 80s. There is already at least one game on Steam that is controlled via voice.

    http://store.steampowered.com/app/319740/
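
    For comparison, the entire command grammar of those early text adventures fits in a few lines; a toy two-word parser (the word lists are invented):

    Code (Python):
        # A late-70s-style verb-noun parser: two known words or nothing.
        VERBS = {"go", "take", "open", "look"}
        NOUNS = {"north", "lamp", "door", "room"}

        def parse(command):
            words = command.lower().split()
            if len(words) == 2 and words[0] in VERBS and words[1] in NOUNS:
                return (words[0], words[1])
            return None   # everything else: "I don't understand that."

        print(parse("open door"))                    # ('open', 'door')
        print(parse("please open the door for me"))  # None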
     
  11. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    Watson Jeopardy, Siri... I think spoken language is good enough.

    I think the problem is knowledge domains, or jargon. The aim here is to guide an AI to solve a problem, not obstruct it, e.g. telling an AI about your game design or where you need to drive to.
     
  12. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
    Before looking at the article:

    My first reaction was: that's Siri, Alexa and OK Google ...

    My second reaction was: what's the input data and what's the output data? Given what I know and what has been established, an AI (DNN) can look at an image and output what is happening in the image. Given a NN trained on image/instruction pairs, I see this as just a different application of something well documented by now (caption generation): instead of forming a phrase (i.e. a sequence of tokens), it turns it into actions (commands instead of tokens; in terms of application that's the same).

    Given sufficient training, DNNs excel at parsing ambiguous context on a level equivalent to a human's.

    I think the proposed model can be improved and generalized in multiple ways. First, run the sentence through a syntactic parser that tokenizes it, in order to expose underlying information like word category, class and function, and to give the model generic knowledge. Then make it access specific words (like names) through a tokenized memory, to keep the model agnostic of specific words with a similar function. Train the model with those data on top of the current ones, and use a reverse parser to handle the memory as tokens: the NN would simply see and manipulate the tokenized memory, and the parser would substitute the specific words under similar circumstances, as in the sketch below.
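
    A minimal sketch of that tokenized-memory (delexicalization) step; the name list and slot format are invented for the example:

    Code (Python):
        # Replace specific words (here, names) with generic slot tokens
        # before the model sees the sentence, and map them back afterwards.
        known_names = {"Alice", "Bob"}

        def delexicalize(sentence):
            memory, tokens = {}, []
            for word in sentence.split():
                if word in known_names:
                    slot = f"<NAME_{len(memory)}>"
                    memory[slot] = word
                    tokens.append(slot)
                else:
                    tokens.append(word)
            return tokens, memory

        def relexicalize(tokens, memory):
            return " ".join(memory.get(t, t) for t in tokens)

        tokens, memory = delexicalize("tell Alice to follow Bob")
        # tokens: ['tell', '<NAME_0>', 'to', 'follow', '<NAME_1>']
        print(relexicalize(tokens, memory))  # original sentence restored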
     
  13. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,566
    Jeopardy is not actually maintaining a dialogue, and speaking of Siri:
    [attached screenshots]

    AI voice recognition still has a long way to go.

    In the game @Ryiah mentioned, there are people complaining that the voice recognition usually fails.

    It is not a very good interface, unless a language is designed specifically for talking with robots.
     
  14. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
    Well, voice recognition and semantic analysis are quite different problems; it's unfair to lump them together.
     
  15. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,566
    I think it is fair, if the idea was to use speech as a "communication bridge with AI". For this to work, both voice recognition and semantic analysis need to be perfect. Otherwise you'll need to talk with the AI in either lawyer-speak or in Lojban. Neither of those is a good idea.
     
    zombiegorilla and Ryiah like this.
  16. Billy4184

    Billy4184

    Joined:
    Jul 7, 2014
    Posts:
    6,013
    In the context of getting the AI to program something you want, body language and intonation are probably not terribly relevant.

    A reasonably precise technical language would be necessary to describe the high-level attributes of what you want, but what the AI would mainly need to fill in is taking a few high-level attributes and producing a comprehensive set of low-level attributes that are very likely not to contradict something you didn't explicitly say but that is nonetheless important. Basically something equating to a human worldview, and experience in the field.

    I'm not talking about "speaking code with AI", I'm talking about telling the AI what you would like in the same terms that you would describe it to a human programmer.
     
  17. Billy4184

    Billy4184

    Joined:
    Jul 7, 2014
    Posts:
    6,013
    Yeah, that's what I mean: the idea is for the AI to already know 95% of what it has to do for any given task, so you only have to trigger it, and only replace its understanding of what you want with something explicit if it's something you would have to specify to a human being as well.
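
    Something like a task template full of sensible defaults, where the user only overrides what they would have had to tell a human anyway. A toy sketch (the field names are invented):

    Code (Python):
        # 'The AI already knows 95%': defaults supply most of the task,
        # and the explicit spec is only the part you actually said.
        defaults = {"genre": "platformer", "gravity": -9.81,
                    "jump_height": 2.0, "lives": 3}

        def build_task(explicit_spec):
            task = dict(defaults)        # the 95% filled in automatically
            task.update(explicit_spec)   # the 5% you had to specify
            return task

        print(build_task({"jump_height": 4.5}))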
     
    dogzerx2 likes this.
  18. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
    I would say it depends on the scope of the problem domain: it works for data queries (Siri and co), but if you want to discuss why your last romance failed, that sounds like a big scope. IMHO we are past the data-query level already, but not quite at a strong conversational AI yet (very obviously).

    But the reason I make the distinction is that voice recognition is currently decent, while the semantic aspects are explored by voiceless chatbots. Chatbots don't work very well in a strong conversational sense yet (only after a clever human anticipates and scripts a lot of context, and even then that's still not enough), and they get the literal words straight from your keyboard. So my reaction is that voice recognition is close to keyboard input for natural-language conversation (not code-like input though, but I haven't thought that one through).

    I haven't seen anything convincing from any AI on semantics past the context scope of a single sentence, anyway. AI would need long semantic dependencies, and I haven't seen any architecture that supports this yet. It only works on limited sequences and then forgets the data (for NNs; other methods have exponential growth of the syntactic tree and its permutations, as I understand it).

    The problem with Siri and co is that it's basically a chatbot with a voice interface, and as such it suffers from the limitations of chatbots: mostly that their domains are quite limited and the authors didn't anticipate domains like suicide prevention or domestic abuse (why would a tech nerd have thought of that anyway; time to hire non-techy people with an eye for people). It also has the failings and limitations of humans, i.e. it doesn't have all the culture and vernacular of everywhere, so for an AI to register as human-like it must actually be superhuman-like.

    Now for programming with voice, I'd need to know what the requirements would be in the first place; I don't think we have such a high-level framework in any format anyway.
     
  19. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,566
    Properly describing a problem to a human programmer very often involves a huge design document and several hours of talking. And even after that misunderstandings can arise when things aren't written down properly.

    Which brings us back to lojban and lawyerspeak.

    While I can agree that it has improved, I don't think it is decent. It'll be decent when software can accurately write down casual speech without people dictating it. My Android tablet has voice recognition enabled for 3 languages, and it has a very hard time recognizing voice input in any of them reliably. Basically, it tries to find the most similar-sounding word, and often misses by a mile, sometimes incorrectly switching language in the process. The idea is cool, but I haven't yet seen a system that can easily catch a user's speech without the user putting in extra effort to make sure he/she is understood properly.

    I do agree that current voice services are pretty much chatbots, though.
     
  20. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
    The problem with casual speech is not that they can't do it, it's that they must do it for all forms of human casual speech, lol. Even humans have trouble with this: as long as you are only listening it's fine, but when you put your mind to transcribing it into regular text, things suddenly get complicated. Casual speech is usually about the general vague idea rather than the precise wording (a text translation would be more like a paraphrase). I don't think it's impossible, but you need the resources to chase this particular problem.

    So lawyer-speak is inevitable in the end, whether it's AI or people. Either you have the same background and can fill in the blanks perfectly, and that only works with people you have worked with for a very long time (or an AI trained sufficiently long on your work, which means no out-of-the-box magic solution), or you need a superhuman AI that has spied on humanity long enough to be omniscient :eek:

    That said, a small version can still work in a very small domain. The next real breakthrough will be when semantic parsing works on the scope of an entire book or movie rather than one sentence or image.

    We don't have to aim for perfect; good enough will already be great, and we can adapt and learn to work around the limitations, like we do for all interfaces. After all, even text itself isn't a correct translation of the mind.
     
  21. yoonitee

    yoonitee

    Joined:
    Jun 27, 2013
    Posts:
    2,363
    Your guess is as good as mine, Mr Arowx.

    I would say that in order to teach an AI you might also require:
    • A pointing device (e.g. a finger)
    • Two facial expressions: Happy = good :) Sad = bad :mad:
    • Two eye expressions: o_O expectant eyes, as in "your turn to do something", and eyes looking at the pointing device.
    With these few things a surprising amount of information may be conveyed. The AI's goal would simply be to maximise the time it sees a happy facial expression.
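
    As a sketch, that objective is just a reward counter over observed expressions; the face classifier below is a random stand-in, not a real detector:

    Code (Python):
        # Toy version of 'maximise happy-face time': the agent's return is
        # simply how often it observed a happy expression.
        import random

        def observe_face():
            # Stand-in for a real facial-expression classifier.
            return random.choice(["happy", "sad"])

        def episode(steps=100):
            reward = 0
            for _ in range(steps):
                if observe_face() == "happy":
                    reward += 1   # +1 for every tick of a happy face
            return reward

        print(episode())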
     
  22. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493

    How to Convert Text to Images - Intro to Deep Learning #16

    While it's an example with images, it shows that you can turn text (speech) into complex contextual data patterns. And artistically challenged people can look forward to a time when they will simply tell the computer what they want and the computer will synthesize it ... It's not even science fiction, as it's possible now with quite good output. If you think the examples provided in this video are blurry and unfit for production, that's already solved in papers: they have another NN trained to deblur images into very good reconstructions, and then pass that result to an upsampling NN that gives it details at a good resolution. The only thing missing is a good interface and a good design of the input data to get something truly production-ready. And you can do it with a bit of reading and a few hundred lines of code; less than it takes to make a game, and less complex than some gameplay mechanics. Also, you can use Google services now:

    Adding Machine Learning to your applications

    There was also a paper that turns simple instructions into a series of steps, so this idea might be closer than we think, since what remains looks like incremental improvement. It won't be long before you input a movie pitch and it generates an entire movie proposal.
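
    To be concrete about the chaining described above, here's a toy sketch of the pipeline; all three stages are hypothetical placeholders standing in for trained networks, not real models or APIs:

    Code (Python):
        # Sketch: text-to-image generator -> deblurring NN -> upsampling NN.
        # Each stage is a fake placeholder so the chain itself is visible.

        def generate(text):
            return f"blurry_image({text})"           # stand-in text-to-image model

        def deblur(image):
            return image.replace("blurry", "sharp")  # stand-in deblurring network

        def upsample(image):
            return image + "@4x"                     # stand-in super-resolution step

        def text_to_picture(prompt):
            return upsample(deblur(generate(prompt)))

        print(text_to_picture("a red bird on a branch"))
        # sharp_image(a red bird on a branch)@4x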
     
  23. hippocoder

    hippocoder

    Digital Ape

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    At some point, AI will be used to improve AI. That's the point where things really get interesting.
     
    Ryiah likes this.
  24. Billy4184

    Billy4184

    Joined:
    Jul 7, 2014
    Posts:
    6,013
    The real question though is, will we be part of that? I think it would be worth trying.
     
  25. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
    There are already attempts, though; let me drop this here :rolleyes:
    https://github.com/lordjesus/UnityNEAT

    The other thing I would like to try, if I had time to indulge in all of this, is having a neural network look at another one as if it were an image (to pick a metaphor) and optimize the learning of the observed network: setting initial parameters, controlling hyperparameters (learning rates, dropout selection and co), basically piloting it instead of having a human make wild guesses before running a training.

    Also, my pet hypothesis, as an armchair neural engineer who has never implemented a deep version, is that neural network training is basically a search in N dimensions. Dropout (randomly dropping layers or neurons during training to speed up learning) is less counter-intuitive from this perspective: by constantly moving all the parameters (neurons) you introduce diagonal biases, while dropout lets the search move in small nudges (side steps) along a smaller number of axes (neurons), making it easier to home in on the goal by avoiding obstacles. I wonder if A* would work on NN learning, lol (I think it's a stupid idea that would have been attempted by now, lol).
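
    For anyone who hasn't seen it written down, a minimal inverted-dropout sketch (the keep probability and layer shape are invented for the demo):

    Code (Python):
        # During training, randomly zero some activations; each update then
        # moves only a subset of the parameter axes, per the intuition above.
        import numpy as np

        def dropout(activations, keep_prob=0.8, training=True):
            if not training:
                return activations
            mask = np.random.rand(*activations.shape) < keep_prob
            return activations * mask / keep_prob  # rescale to preserve the expected value

        layer = np.random.randn(4, 8)   # a fake layer of activations
        print(dropout(layer))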
     
  26. hippocoder

    hippocoder

    Digital Ape

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    I don't doubt that with endless processing you can eventually find the answer to every possible question that can be asked. But this isn't true reasoning, it's just the path of least resistance. For true reasoning you need to figure out what flaws to introduce, and why.

    Otherwise, given a set of inputs, the human brain would always faithfully produce the same outputs. It can't do that, because it is flawed.* So the flaws are interesting, and people just want machines to make better guesses.


    * The time taken to perform mental arithmetic will always vary. Imagining a picture will always vary, and so on.
     
    Ryiah likes this.
  27. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,566
    Erm.

    How are you planning to train the controller network? A neural network is a mathematical function that maps input to output. You'll need a training dataset, and tuning another network is a MUCH bigger problem than classifying several million images into classes.

    Also... deep learning is insanely time consuming. If I remember correctly, the image set used for ImageNet took a week or so on a GPU cluster.
     
  28. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
    So they wiped out all the answers I made yesterday, sad!