… when A.I.+ exceeds human intelligence
There are risks with superintelligent machines, but AI+ will also be able to solve many problems. Besides helping with everyday tasks, it could possibly reduce diseases, accelerate scientific knowledge and compute solutions to the world's problems.
By reading your e-mails, chats and social profiles and listening to your talks and speeches, your personal digital twin will know you so well that she/he/it could handle many less important decisions for you!
Artificial Intelligence – Claudio R&D
Claudio & co. do Research & Development in Artificial Intelligence and Machine Learning, focusing on Deep Learning with aspects from Neuroscience for analysing Big Data towards AGI, as well as Natural Language Processing for improved, people-centric Human-Computer Interaction. We like to bridge the gap between world-knowledge databases (symbolic) and statistical machine learning!
www.ai-claudio.com is engaged in research, authoring articles, blogging and connecting the AI community.
Please feel free to browse through the posts or to get in touch through the contact form below.
MHP – A Porsche Company
MHP is a leading consulting firm in Digitalization and Automotive and a subsidiary of Porsche.
Nonetheless, MHP also consults many other companies, including 6 automotive manufacturers, 90% of the German Top 25 automotive suppliers and a third of all German Top 100 commercial enterprises (by turnover) …
For inquiries about Consulting for Artificial Intelligence and Machine Learning for Big Data Analytics, Connected Car or Industry 4.0 – feel free to contact Claudio through MHP. Phone: +49 151 4066 7937
For all my international colleagues, let’s meet in Berlin on June 1st or the days before.
After the keynote panel talk and the break I will give a speech about these topics:
Give me a call or e-mail in case you are in town!
As the person responsible for this topic and for the team of Data Scientists in our business area, I can say we have moved to using Deep Learning more and more instead of the common classic statistical algorithms.
Of course there are tons of different frameworks, but I ♥ keras
I came to love keras because this meta-language works on top of TensorFlow and Theano, and support for Microsoft CNTK is in development.
Even though the pure frameworks are more powerful for some use cases (I believe only a few), in most cases keras makes development, optimisation, collaboration and teaching amazingly efficient.
Read about keras: https://keras.io
Lasagne is another good meta-language approach. As I haven't used it in any project yet, I can't really compare the two.
I will give deeper insights in my upcoming talks and webinar.
Recently, a lot of great core AI software has been released.
Google's TensorFlow integrated DeepMind's code and is strong in image recognition as well as natural language processing.
Besides, the OpenAI group is working on an open AI platform for the masses that is independent of the big players. Of course Microsoft is also strong with Machine Learning on Azure, while Amazon's DSSTNE is still living a shadow life.
In Automotive there is news every day about new accomplishments, from Audi, Mercedes, Tesla and all the others, but also from IT giants like Google, Apple (an open secret) and the often criticised Uber.
Even though many restrictions are often discussed, mostly in the areas of ethics and law, there is a clear trend towards autonomous driving, with more and more driver assistance systems.
For me, as a digital native, the software features running in a car already trump the design, interior, etc.
Working on machine learning and artificial intelligence in the Automotive area means great changes lie ahead.
So more will follow 🙂
If you see a button labelled 'auto' on a machine, you know you don't have to do much: the machine will work it out itself. Soon the same will be true for cars.
In Germany we already call a car 'Auto', which is short for 'Automobil', derived from the Greek αὐτός ('self') and the Latin 'mobilis' ('movable'), hence "self-moving". So the name already says it all, and this year there has been so much news about self-driving cars.
Daimler's concept car F015 has some nice elements for people inside and outside the car.
Will there be no more traffic jams if all cars drive steadily, keeping a safe but short distance to each other without any surprising behaviour? No traffic lights needed when cars are connected? Order a car with a desk during the day or a comfy couch in the evening. No need for an all-purpose car: enjoy a nice sporty one if you feel like it, or a super energy-efficient one if you prefer.
As soon as most traffic is handled by autonomous cars (or, as I like to call them, Auto Autos), safety will increase incredibly, and small cars will be nearly as safe as huge ones.
Big Data was a buzzword, as well as a mostly undefined "fuzzword", of the last few years; now we have "Data Science".
Whereas Big Data combines the 4 Vs (Volume, Variety, Velocity and Value = Analytics), Data Science is the science of working with big data. Here are some definitions:
Data Scientists are either statisticians with outstanding programming skills or software developers with outstanding knowledge of statistics.
while others say
Data Science is just a fancy word for advanced linear regression
and a widely acknowledged definition is:
Data Scientists know the Big Data tools as well as statistical analytics, combined with domain knowledge of the field they work in.
IDC calculated that in 2012 we created 2.8 zettabytes of data; by 2020 it is projected to be 40 zettabytes (Tera, Peta, Exa, Zetta, Yotta …; 40 × 10^21 bytes). Heureka! It's the new, more eco-friendly crude oil 🙂
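If we take the IDC figures at face value, a short calculation shows what they imply. This is a sketch only. The assumption of a constant yearly growth rate between 2012 and 2020 is ours, not IDC's.

```python
ZETTA = 10**21  # SI prefix "zetta"

def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1

data_2012_zb = 2.8   # IDC figure for 2012, in zettabytes
data_2020_zb = 40.0  # IDC projection for 2020, in zettabytes

# Assumption: growth is spread evenly over the 8 years (constant yearly rate)
rate = cagr(data_2012_zb, data_2020_zb, 2020 - 2012)
print(f"40 ZB = {data_2020_zb * ZETTA:.1e} bytes")                        # 40 ZB = 4.0e+22 bytes
print(f"Implied growth: {rate:.1%} per year")                              # Implied growth: 39.4% per year
print(f"Growth factor 2012 -> 2020: {data_2020_zb / data_2012_zb:.1f}x")  # 14.3x
```

Under that assumption the data volume would grow by roughly 39% every year, a factor of about 14 over the whole period.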
Carsten Bange from BARC said: Big Data is not only about large amounts of data. It's also about the processes and methods for scalable retrieval and analysis of information that is present in diverse, often unpredictable structures.
Times have changed, and earlier database paradigms are becoming outdated. For example, the rule that there should be no redundant data in SQL databases has become somewhat obsolete: Hadoop requires redundancy at its core, and since raw data is stored as-is, it usually contains redundant data anyway.
The NoSQL world comes with a wide variety of techniques. With BASE (Basically Available, Soft state, Eventually consistent), ACID (Atomicity, Consistency, Isolation, Durability) is no longer the de facto standard. Some systems use key-value, graph or document-oriented databases (like InfiniteGraph, Neo4j, CouchDB, MongoDB), others column-oriented tables (Amazon SimpleDB, Hadoop, SAP HANA).
Considering the CAP theorem, you can't have it all: of Consistency, Availability and Partition Tolerance, you have to choose two.
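Here is a toy sketch of BASE and the CAP trade-off. It is not modelled on any of the databases named above. Two replicas stay available during a network partition, so one of them serves a stale read. After the partition heals, they converge through last-write-wins. The `Replica` class, the `anti_entropy` function and the keys are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    """One node of a hypothetical key-value store using last-write-wins (LWW)."""
    name: str
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def put(self, key, value, ts):
        # Accept writes locally even if other replicas are unreachable (availability)
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

def anti_entropy(a: Replica, b: Replica):
    """Exchange all entries in both directions; the newer timestamp wins."""
    for key, (ts, value) in list(a.data.items()):
        b.put(key, value, ts)
    for key, (ts, value) in list(b.data.items()):
        a.put(key, value, ts)

berlin, munich = Replica("berlin"), Replica("munich")
berlin.put("car:42:status", "parked", ts=1)
anti_entropy(berlin, munich)              # link is up: both replicas agree
# --- network partition: writes only reach berlin ---
berlin.put("car:42:status", "driving", ts=2)
print(munich.get("car:42:status"))        # parked  (available, but not consistent)
# --- partition heals ---
anti_entropy(berlin, munich)
print(munich.get("car:42:status"))        # driving (eventually consistent)
```

This system chooses Availability and Partition Tolerance and gives up strong Consistency. A CP system would instead refuse the read on `munich` during the partition.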
Processes of Data Seas
Big Data also means partly unstructured or incomplete data. Therefore it is not like common databases (such as SQL tables), where the schema of the data is always defined beforehand; instead it has to be more like a sea of data into which all data flows, whatever shape it has. Formatting, preparation and transformation are then done afterwards, just before processing.
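The "sea of data" idea is often called schema-on-read. This minimal sketch assumes records arrive as JSON lines. They are stored exactly as received, and a schema is applied only when they are read. The field names (`vin`, `speed_kmh`, `ts`) and the `read_with_schema` function are hypothetical.

```python
import json

# Raw "sea": records are kept exactly as they arrive, no schema enforced on write
raw_sea = [
    '{"vin": "A1", "speed_kmh": 88, "ts": "2016-05-01T10:00:00"}',
    '{"vin": "B2", "speed": "54"}',                # other field name, speed as string
    '{"vin": "C3", "ts": "2016-05-01T10:02:00"}',  # speed missing entirely
    'not even json',                               # broken record, stored anyway
]

def read_with_schema(lines):
    """Apply a schema at read time: normalise names and types, skip unparsable rows."""
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # a real pipeline might route this to a quarantine area
        speed = rec.get("speed_kmh", rec.get("speed"))
        yield {
            "vin": rec.get("vin"),
            "speed_kmh": float(speed) if speed is not None else None,
            "ts": rec.get("ts"),
        }

rows = list(read_with_schema(raw_sea))
for row in rows:
    print(row)
```

The transformation rules live in the reader, not in the storage. A different analysis can therefore read the same raw data with a different schema later.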
Interview Extract (tbc)
Hadoop, Apache Projects & other Tools
At its core, Hadoop implements HDFS, the Hadoop Distributed File System. This is a distributed filesystem which is extremely scalable, stores data redundantly, knows the network topology for improved error handling and much more. Most applications within the Hadoop ecosystem are able to access HDFS.
YARN (Yet Another Resource Negotiator) handles distributed jobs and tasks. Instead of transferring data to the machines that process it, the jobs/tasks are sent to the machines where the data already is, or close to it. Usually a job is shipped as a single Java JAR file, and YARN takes care of distribution as well as error handling. Most, but not all, tools on top of Hadoop use YARN.
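The principle of "send the job to the data" can be sketched as a simple preference order. A task runs on a node holding the data if one is free, otherwise on a node in the same rack, and only as a last resort elsewhere. This is a simplified illustration, not YARN's actual scheduler or API. The cluster layout, block names and `pick_node` function are hypothetical.

```python
# Hypothetical cluster layout: node -> rack
RACK_OF = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2", "n5": "r3"}

# Hypothetical block placement: block -> nodes holding a replica
BLOCK_REPLICAS = {"blk_1": ["n1", "n3"], "blk_2": ["n2", "n4"], "blk_3": ["n4"]}

def pick_node(block, free_nodes):
    """Choose where to run a task: node-local, then rack-local, then off-rack."""
    replicas = BLOCK_REPLICAS[block]
    for node in free_nodes:                      # 1) data is on this very node
        if node in replicas:
            return node, "node-local"
    replica_racks = {RACK_OF[n] for n in replicas}
    for node in free_nodes:                      # 2) data is in the same rack
        if RACK_OF[node] in replica_racks:
            return node, "rack-local"
    return free_nodes[0], "off-rack"             # 3) data must cross racks

print(pick_node("blk_1", ["n2", "n3"]))  # ('n3', 'node-local')
print(pick_node("blk_3", ["n1", "n3"]))  # ('n3', 'rack-local')
print(pick_node("blk_3", ["n5"]))        # ('n5', 'off-rack')
```

Only a small job description has to travel over the network. The large data blocks stay where they are whenever possible.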
MapReduce is Hadoop's programming model for processing large quantities of distributed data. In the Map phase, the tasks are sent to the nodes, where they are computed locally and produce intermediate key-value pairs. Then, in the shuffle and Reduce phase, these intermediate results are grouped by key across the machines and aggregated into the final result.
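The classic word-count example shows the three steps in plain Python. This is a single-process sketch of the programming model, not Hadoop's Java API. Each string in `chunks` stands in for a data block stored on a different node.

```python
from collections import defaultdict
from itertools import chain

def map_phase(chunk):
    """Runs locally on each node: emit (word, 1) for every word in its chunk."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(mapped):
    """Group intermediate pairs by key (in Hadoop this moves data between nodes)."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Aggregate all values collected for one key."""
    return key, sum(values)

chunks = ["Auto Autos drive", "autos drive safely", "data drive"]
mapped = chain.from_iterable(map_phase(c) for c in chunks)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'auto': 1, 'autos': 2, 'drive': 3, 'safely': 1, 'data': 1}
```

In a real cluster the map calls run in parallel on the nodes, and the shuffle is the expensive network step. Hadoop's MapReduce also writes intermediate results to disk at this point.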
MapReduce was a big success factor for Hadoop in the beginning. As it relies heavily on writing intermediate results to disk, it is not suitable for real-time computing. Therefore this 'batch technology' is used less and less by other tools in the Hadoop world; several tools implement their own processing model, using either disk-based or in-memory approaches.
Apache Spark: In-Memory cluster computing engine
Apache Storm: Stream processing for Input & Output
Apache Hive: Data warehouse providing data summarization, query, and analysis
Apache Hive Stream: the Storm alternative for Hive
Apache Flink: Extra fast system developed at TU Berlin
Apache Drill: SQL for ad-hoc reading from Hadoop, based on Google Dremel
Apache Pig: Script-based, SQL-like Hadoop task creator
Apache ZooKeeper: Coordination service (configuration, naming, synchronisation) for the Hadoop world
Apache Hue: Web Administration of Hadoop ecosystem
Cloudera Impala: In-Memory SQL engine, an alternative to MapReduce that works directly on Hadoop's HDFS
Business Intelligence solutions: BI software vendors like JasperSoft, Tableau, Pentaho and Qlik, as well as the giants Oracle, Microsoft, SAP and SAS, support Hadoop.
SQL engines reading Hadoop HDFS: Cloudera Impala and IBM Big SQL / InfoSphere BigInsights
Queries combining data from RDBMS and Hadoop: Microsoft Analytics Platform System (formerly Parallel Data Warehouse) and Oracle (Big Data SQL)
In-Memory Spark (tbc)
In-Memory SAP HANA
SAP HANA is an extremely fast in-memory SQL database that stores data by columns rather than rows. The system is as fast as promised and fairly similar to other SQL or T-SQL databases. It not only supports Hadoop's HDFS but also offers simple integration with SAP's systems, such as R/3. Data in HANA is compressed, which is advertised to fit 7 times the amount of (CSV) data per gigabyte; real-world examples are even better, compressed to one fifteenth of the original size. HANA SQL is pretty similar to T-SQL, with some differences in syntax (a reference is available as PDF).
What took hours on regular databases now runs in minutes: just let HANA optimise itself (no manual indices needed) and, with the new version, optionally limit the amount of RAM a statement may use, in case there is an error in the code.
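Column stores compress well because each column holds many repeated values of one type. This generic sketch shows two textbook techniques on a hypothetical sales table: dictionary encoding and run-length encoding. It does not describe HANA's actual internals, and it does not reproduce the compression ratios quoted above.

```python
# A few rows of a hypothetical sales table: (date, city, model, units)
rows = [
    ("2016-05-01", "Berlin", "Cayenne", 1),
    ("2016-05-01", "Berlin", "Macan", 2),
    ("2016-05-01", "Munich", "Macan", 1),
    ("2016-05-02", "Munich", "Macan", 3),
]

# Column layout: each attribute stored as its own contiguous sequence
columns = list(zip(*rows))

def dictionary_encode(column):
    """Replace each value by a small integer ID into a sorted list of distinct values."""
    dictionary = sorted(set(column))
    index = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [index[v] for v in column]

def run_length_encode(ids):
    """Collapse runs of equal IDs into (id, run_length) pairs."""
    runs = []
    for i in ids:
        if runs and runs[-1][0] == i:
            runs[-1][1] += 1
        else:
            runs.append([i, 1])
    return [tuple(r) for r in runs]

city_dict, city_ids = dictionary_encode(columns[1])
print(city_dict, city_ids)            # ['Berlin', 'Munich'] [0, 0, 1, 1]
print(run_length_encode(city_ids))    # [(0, 2), (1, 2)]
```

In a row layout, neighbouring values have different types and rarely repeat. That is why column-oriented databases usually achieve much higher compression than row stores.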
A nice use case for HANA is Process Mining. Instead of relying on people's subjective opinions and answers, the real-world processes are reconstructed through data-driven analytics. Mainly by checking timestamps and joining tables of log files and database changes, you can find out what percentage of processes run in an unusual order. You get objective information, so you can improve the processes.
So the migration from traditional SQL databases to SAP HANA is happily unspectacular, as Sebastian Walters states in the German publication Big Data iX by heise.de.
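The process-mining idea can be sketched with an event log of (case, activity, timestamp) records. Sorting each case by timestamp recovers the order in which things actually happened. Comparing that order with a reference process then shows the share of unusual cases. The event log and the `EXPECTED` reference process below are hypothetical. In practice, the log would come from joining log files and change tables.

```python
from collections import Counter

# Hypothetical event log: (case_id, activity, timestamp)
event_log = [
    ("PO-1", "create order", "2016-03-01 09:00"),
    ("PO-1", "approve", "2016-03-01 11:00"),
    ("PO-1", "pay invoice", "2016-03-05 10:00"),
    ("PO-2", "create order", "2016-03-02 08:30"),
    ("PO-2", "pay invoice", "2016-03-02 09:00"),   # paid before approval!
    ("PO-2", "approve", "2016-03-03 14:00"),
    ("PO-3", "create order", "2016-03-04 10:00"),
    ("PO-3", "approve", "2016-03-04 12:00"),
    ("PO-3", "pay invoice", "2016-03-08 16:00"),
]
EXPECTED = ("create order", "approve", "pay invoice")  # assumed reference process

def variants(log):
    """Order each case's events by timestamp and return its activity sequence."""
    cases = {}
    for case_id, activity, ts in sorted(log, key=lambda e: (e[0], e[2])):
        cases.setdefault(case_id, []).append(activity)
    return {case: tuple(seq) for case, seq in cases.items()}

traces = variants(event_log)
unusual = [case for case, seq in traces.items() if seq != EXPECTED]
print(Counter(traces.values()).most_common())
print(f"{len(unusual) / len(traces):.0%} of cases deviate: {unusual}")  # 33% of cases deviate: ['PO-2']
```

The ISO-style timestamps sort correctly as strings, which keeps the sketch free of date parsing.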
Some say Predictive Data Science is just fancy linear regression. Of course it's not that simple, but statistics, and above all regression, is at its core.
Linear Regression and other statistical methods
The most common method in linear regression is the method of least squares by Carl Friedrich Gauß from Göttingen. Here a set of discrete, incomplete data points is given, and we look for a function that has the smallest average distance to all points. It shouldn't matter whether a distance is positive or negative, therefore the squares of the distances are summed and minimised. To create such a function, one chooses how complex it may be (linear, any polynomial, sin, cos, log, …) and how many parameters it should have (e.g. y = ax^2 + bx + c). Finally, a continuous function is built from the discrete values.
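Here is a minimal least-squares sketch for the quadratic example y = ax^2 + bx + c. It builds the normal equations (XᵀX)·coeffs = Xᵀy and solves them with Gaussian elimination in plain Python. The sample data are made up for illustration. For real work, a library routine such as NumPy's `polyfit` is the usual choice.

```python
def polyfit_least_squares(xs, ys, degree):
    """Fit y = c0 + c1*x + ... + cd*x^d by minimising the sum of squared residuals.

    Solves the normal equations (X^T X) c = X^T y with Gaussian elimination.
    Returns the coefficients with the lowest power first.
    """
    n = degree + 1
    # X^T X and X^T y built directly from power sums
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Forward elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= f * A[col][k]
            b[r] -= f * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - s) / A[i][i]
    return coeffs

xs = [0, 1, 2, 3, 4]
ys = [1.1, 1.9, 5.2, 9.8, 17.1]          # made-up noisy samples, roughly y = x^2 + 1
c, b, a = polyfit_least_squares(xs, ys, degree=2)
print(f"y = {a:.2f}x^2 {b:+.2f}x {c:+.2f}")  # y = 1.02x^2 -0.10x +1.08
```

Squaring the residuals makes positive and negative deviations count equally. It also gives a smooth objective whose minimum satisfies exactly these linear normal equations.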
With modern computers it's easier to make predictions with non-linear models, fitted for example with the Gauß-Newton method.
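The Gauß-Newton method repeatedly linearises the model around the current guess and solves a small least-squares problem for the update. This sketch fits y = a·exp(b·x), an exponential model chosen here purely for illustration, to synthetic data generated with a = 2 and b = 0.8. The starting guess is an assumption. Gauß-Newton can diverge if it starts too far from the solution.

```python
import math

def gauss_newton_exp(xs, ys, a, b, iterations=20):
    """Fit y = a * exp(b * x) with Gauss-Newton, starting from the guess (a, b)."""
    for _ in range(iterations):
        # Residuals r_i = y_i - f(x_i) and Jacobian rows [df/da, df/db]
        r = [y - a * math.exp(b * x) for x, y in zip(xs, ys)]
        J = [(math.exp(b * x), a * x * math.exp(b * x)) for x in xs]
        # Normal equations (J^T J) delta = J^T r, solved in closed form (2x2)
        jtj00 = sum(j0 * j0 for j0, _ in J)
        jtj01 = sum(j0 * j1 for j0, j1 in J)
        jtj11 = sum(j1 * j1 for _, j1 in J)
        g0 = sum(j0 * ri for (j0, _), ri in zip(J, r))
        g1 = sum(j1 * ri for (_, j1), ri in zip(J, r))
        det = jtj00 * jtj11 - jtj01 ** 2
        da = (jtj11 * g0 - jtj01 * g1) / det
        db = (jtj00 * g1 - jtj01 * g0) / det
        a, b = a + da, b + db
    return a, b

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.8 * x) for x in xs]      # synthetic data: a = 2, b = 0.8
a, b = gauss_newton_exp(xs, ys, a=1.8, b=0.7)   # assumed starting guess
print(f"a = {a:.3f}, b = {b:.3f}")              # a = 2.000, b = 0.800
```

Each iteration is essentially a linear least-squares fit, like the one Gauß used, applied to the local linear approximation of the model.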
Other simple statistical methods are maximum-likelihood estimation and the confidence interval, where an average is compared with the average of a training sample set.
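Both methods fit in a few lines for normally distributed data. The maximum-likelihood estimates are the sample mean and the variance divided by n. The 95% confidence interval around the training mean then shows whether the average of new data fits. The sample values are made up. The interval uses the normal quantile, an approximation that suits larger samples; a t-quantile would be more accurate for small ones.

```python
import math
from statistics import NormalDist

def gaussian_mle(sample):
    """Maximum-likelihood estimates for a normal distribution: mean and variance."""
    n = len(sample)
    mu = sum(sample) / n
    var = sum((x - mu) ** 2 for x in sample) / n   # MLE divides by n, not n - 1
    return mu, var

def mean_confidence_interval(sample, level=0.95):
    """Approximate confidence interval for the mean, using the normal quantile."""
    n = len(sample)
    mu = sum(sample) / n
    s = math.sqrt(sum((x - mu) ** 2 for x in sample) / (n - 1))  # sample std dev
    z = NormalDist().inv_cdf(0.5 + level / 2)                    # about 1.96 for 95%
    half_width = z * s / math.sqrt(n)
    return mu - half_width, mu + half_width

training = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3]   # hypothetical training sample
new_mean = 5.6                                        # hypothetical average of new data
mu, var = gaussian_mle(training)
low, high = mean_confidence_interval(training)
print(f"MLE: mu = {mu:.3f}, sigma^2 = {var:.4f}")                # MLE: mu = 5.000, sigma^2 = 0.0350
print(f"95% CI for the training mean: [{low:.3f}, {high:.3f}]")  # [4.861, 5.139]
print("new mean inside CI" if low <= new_mean <= high else "new mean outside CI")
```

Here the new average of 5.6 lies well outside the interval. That is a hint the new data may not come from the same distribution as the training set.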
Talking about machine learning today means also talking about distributed, fault-tolerant data storage and query systems.
Hadoop, an open-source Apache project that emerged from Yahoo! in 2006 and is based on papers published by Google in 2003, is such a widely used system. Even though it is basically 'mostly just' a filesystem, its three biggest advantages are
At the core of Hadoop is the filesystem HDFS (Hadoop Distributed File System), which stores its data in blocks across all DataNode machines. Each block is usually replicated to one or more machines in the same rack as well as to a machine in another rack. Clients that want to read or write first ask the NameNode, which tells them which DataNodes to access.
The current Hadoop 2.x versions rely on YARN (Yet Another Resource Negotiator) for job scheduling and resource management, and with data and server replication there is no longer a single point of failure. On top of that, Hadoop uses MapReduce, a processing model based on key/value pairs. Therefore Hadoop is great for large-scale data retrieval and querying.
Its drawbacks are: data in HDFS cannot be edited, only appended; it takes a lot of configuration work; and without any additions it is not suited for real-time queries.
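HDFS splits files into blocks and replicates them in a rack-aware way. The NameNode keeps the block-to-DataNode map that clients consult. This sketch shows a simplified version of that. It assumes the Hadoop 2.x default block size of 128 MB and a replication factor of 3. It places the first replica on the writer's node and the other two on different nodes of one remote rack. The topology, file name and functions are hypothetical, and this is not the HDFS API.

```python
import math
import random

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the default HDFS block size in Hadoop 2.x

# Hypothetical cluster topology: rack -> DataNodes
TOPOLOGY = {"rack1": ["dn1", "dn2", "dn3"], "rack2": ["dn4", "dn5", "dn6"]}
RACK_OF = {node: rack for rack, nodes in TOPOLOGY.items() for node in nodes}

def place_replicas(writer, rng):
    """Simplified rack-aware placement with replication factor 3:
    1st replica on the writer's node, 2nd and 3rd on two different
    nodes of one other rack (survives the loss of a whole rack)."""
    remote_rack = rng.choice([r for r in TOPOLOGY if r != RACK_OF[writer]])
    second, third = rng.sample(TOPOLOGY[remote_rack], 2)
    return [writer, second, third]

def write_file(name, size_bytes, writer, rng):
    """Return the NameNode's view of a new file: block id -> DataNodes holding it."""
    n_blocks = math.ceil(size_bytes / BLOCK_SIZE)
    return {f"{name}#blk{i}": place_replicas(writer, rng) for i in range(n_blocks)}

rng = random.Random(42)  # fixed seed so the placement is reproducible
namenode = write_file("logs.csv", 300 * 1024 * 1024, writer="dn2", rng=rng)
for block, nodes in namenode.items():
    # A client would ask the NameNode for this list, then read from a nearby DataNode
    print(block, nodes)
```

A 300 MB file becomes three blocks, and every block survives the loss of a single node or even a whole rack.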
Artificial Intelligence is a huge topic at the ongoing Google I/O 2015.
Live from the keynote: most products discussed have new features heavily dependent on A.I.!
The all-new Google Photos uses improved face and pattern recognition. Face recognition, which was already available in Google Picasa and other apps, has been improved; additionally, and new to consumer photo album software, it will also use pattern recognition to automatically add searchable text tags.
In addition, the automatically cut and edited videos known as "Auto Awesome" will be improved as the Google Photos "Assistant", using A.I. internally to figure out which video sequences and photos to use.
Google Now has more than 1 billion pieces of information it can show users to assist them with what they are currently doing. Simply via (A.I.-powered) speech recognition, related information is shown.
They mentioned that the speech recognition word error rate has dropped from about 23% in 2013 to 8% today, so we can expect this number to go down even further.
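Speech recognition quality is commonly reported as word error rate (WER). WER counts the substitutions, deletions and insertions needed to turn the recognised text into the reference, divided by the number of reference words. This sketch computes it with a word-level edit distance. The example sentences are made up. Google's exact evaluation setup behind the 8% figure is not described in the source.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein distance via dynamic programming."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i                                  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j                                  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
    return dist[len(ref)][len(hyp)] / len(ref)

ref = "show me the way to the nearest charging station"
hyp = "show me the way to nearest charging stations"
print(f"WER = {word_error_rate(ref, hyp):.0%}")  # WER = 22%
```

One dropped "the" and one wrong word out of nine give 2/9, about 22%. At an 8% WER, roughly one word in twelve would be wrong.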
In "A Vision Of A Driverless Future" (TechCrunch), Len Epp writes about the consequences autonomous cars could have. Not only will drivers' habits change, but the whole industry: shops on wheels, local services, no more need to own a car, a huge variety of sizes, …
So AI-powered self-driving cars will bring huge changes to our everyday lives and to many businesses, with great potential for startups too.
Bill Gates agrees that we should be worried about artificial super intelligence!
This comes shortly after the 'Future of Life Institute' published an open letter on Research Priorities for Robust and Beneficial Artificial Intelligence, which lists many risks and examples along with many research ideas for avoiding those risks. You can read more about it in our article "Demonic A.I.! What are we afraid of? Solutions to the risks!".
Bill Gates: I am in the camp that is concerned about super intelligence. First the machines will do a lot of jobs for us and not be super intelligent. That should be positive if we manage it well. A few decades after that though the intelligence is strong enough to be a concern. I agree with Elon Musk and some others on this and don’t understand why some people are not concerned
Source: thisisbillgates @ Reddit
But he is also very excited and enthusiastic about A.I.:
Artificial intelligence is rightly perceived as a significant opportunity but also as an existential threat. A while ago Elon Musk urged us to beware of A.I., as it might be like summoning demons we may not be able to control. Cosmologist Stephen Hawking also warned that AI could possibly bring about the end of civilisation. The dangers of A.I. are very diverse.
Much has been written lately in various media about the dangers and far-reaching consequences. As most readers have probably already read about most of the potential problems, here are solutions instead, from the brightest minds in A.I. and from me.