… when A.I.+ exceeds human intelligence
There are risks with superintelligent machines, but AI+ will also be able to solve many problems. Besides helping with everyday tasks, it could reduce diseases, accelerate our scientific knowledge and compute solutions to world problems.
… when you can have a digital twin
By reading your mails, chats and social profiles and listening to your talks and speeches, your personal digital twin will know you so well that she/he/it could handle many of the less important decisions for you!
Claudio Weck & Co. does research and development in Artificial Intelligence and Machine Learning. We focus on Deep Learning with aspects from neuroscience for analysing big data towards AGI, as well as Natural Language Processing for improved, human-centric human-computer interaction. We like to work on the connection between (symbolic) world-knowledge databases and Deep Learning!
ai-claudio.com is quite new, but we are already relentlessly engaged in research, writing articles, blogging, IT consulting and project management. Please feel free to browse through our posts and contact us via the form below.
For consulting requests please approach Claudio Weck’s new employer MHP
We are strong in Big Data & Analytics, and our innovation topics are: Connected Car, Real-time Business, Sustainable Mobility, Industry 4.0 and Social Media.
If you see a button labelled ‘auto’ on a machine, you know you don’t have to do much: the machine will work it out itself. That will soon be the same for cars.
In Germany we already call a car an ‘Auto’, which is short for ‘Automobil’, derived from the Greek αὐτός and the Latin ‘mobilis’: “self-moving”. The name already says it all, and this year there has been so much news about self-driving cars.
Daimler’s concept car F015 has some nice elements for people inside and outside the car.
Will there be no traffic jams if all cars drive steadily, with a safe but short distance to each other and without any surprising behaviour? No traffic lights needed when cars are connected? Order a car with a desk during the daytime or a comfy couch in the evening. No need for an all-purpose car: enjoy a nice sporty one if you feel like it, or a super energy-efficient one if you prefer.
As soon as most traffic is handled by autonomous cars (or, as I like to call them, Auto Autos), safety will increase incredibly, and small cars will be nearly as safe as huge ones.
Big Data was a buzzword as well as a mostly undefined fuzzword of the last few years. Now we have “Data Science”.
Whereas Big Data combines the 4 Vs (Volume, Variety, Velocity, Value = Analytics), Data Science is the science of working with big data. Here are some definitions:
Data Scientists are either statisticians with outstanding programming skills or software developers with outstanding knowledge of statistics.
while others say
Data Science is just a fancy word for advanced linear regression
and a widely acknowledged definition is:
Data Scientists know the Big Data tools as well as statistical analytics, combined with domain knowledge of the field they work in.
IDC calculated that in 2012 we created 2.8 zettabytes of data, and by 2020 this is supposed to reach 40 zettabytes (Tera, Peta, Exa, Zetta, Yotta … 40 * 10^21 bytes). Eureka! Data is the new, more eco-friendly crude oil.
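To make the orders of magnitude concrete, here is a small sketch that converts the two figures quoted above into bytes and derives the implied growth. The steady compound growth between 2012 and 2020 is my own simplifying assumption, not part of the IDC figures, and the function name is purely illustrative.

```python
# Decimal (SI) prefixes expressed in bytes.
SI_PREFIXES = {
    "tera": 10**12,
    "peta": 10**15,
    "exa": 10**18,
    "zetta": 10**21,
    "yotta": 10**24,
}

def to_bytes(value, prefix):
    """Convert a value with an SI prefix (e.g. 40, 'zetta') into bytes."""
    return value * SI_PREFIXES[prefix]

created_2012 = to_bytes(2.8, "zetta")   # figure quoted for 2012
forecast_2020 = to_bytes(40, "zetta")   # figure quoted for 2020

# How many times more data in 2020 than in 2012?
growth_factor = forecast_2020 / created_2012

# Assumption: the same percentage growth every year over the 8 years.
annual_growth = growth_factor ** (1 / 8) - 1

print(f"2020 forecast: {forecast_2020:.1e} bytes")
print(f"growth factor: {growth_factor:.1f}x")
print(f"implied yearly growth: {annual_growth:.0%}")
```

Under that assumption the data volume grows roughly fourteenfold, which corresponds to a bit less than 40% more data every year.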
Carsten Bange from BARC said: Big Data is not only about big amounts of data. It’s also about the processes and methods for scalable retrieval and analytics of information that is present in diverse, often unpredictable structures.
Times have changed and earlier database paradigms have become outdated. For example, the rule that there should be no redundant data in SQL databases has become somewhat obsolete: in Hadoop, redundancy is required at its core, and because raw data is saved, it usually comes with redundant data.
The NoSQL area comes with a wide variety of techniques. With BASE (Basically Available, Soft State, Eventually Consistent), the days of ACID (Atomic, Consistent, Isolated, Durable) as the de facto standard are over. Some systems use key-value, graph or document-oriented databases (like InfiniteGraph, Neo4j, CouchDB, MongoDB), others column-oriented tables (Amazon SimpleDB, Hadoop, SAP HANA).
Considering the CAP theorem, you can’t have it all: of Consistency, Availability and Partition Tolerance, you have to choose two.
Processes of Data Seas
Big Data also means partly unstructured or incomplete data. It is therefore not like common databases (such as SQL tables), where a schema of the data is always defined beforehand. Instead it has to be more like a sea of data into which all data flows, whatever shape it has. Formatting, preparation and transformation are then done afterwards, before the processing.
Interview Extract (tbc)
Hadoop, Apache Projects & other Tools
At its core, Hadoop implements HDFS, the Hadoop Filesystem. This is a distributed filesystem which is extremely scalable, stores data redundantly, knows the network topology for improved error handling, and much more. Most of the applications within the Hadoop ecosystem are able to access HDFS.
Yet Another Resource Negotiator (YARN) handles distributed jobs and tasks. Instead of transferring data to the machines that process it, the jobs/tasks are sent to the machines where the data already is, or close to it. Usually a job is sent as a single Java JAR file, and YARN takes care of distribution as well as error handling. Most but not all tools on top of Hadoop use YARN.
MapReduce is Hadoop’s programming model for processing large quantities of distributed data. Within the Map phase the tasks are sent to the nodes, where they are computed locally and key-value result sets are created. Then, within the Reduce phase, these intermediate results are grouped by key across the machines and combined, so that only the aggregated information remains.
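The two phases can be imitated on a single machine in a few lines. This is only a sketch of the programming model, not Hadoop’s Java API: the function names and the two text chunks (standing in for blocks stored on two different nodes) are made up for illustration.

```python
from collections import defaultdict

def map_phase(chunk):
    """Runs where the chunk is stored; emits (key, value) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs_per_node):
    """Groups all emitted values by key across the nodes."""
    grouped = defaultdict(list)
    for pairs in pairs_per_node:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Collapses the values of every key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}

# Two chunks standing in for data blocks on two different nodes.
chunks = ["big data is big", "data science is data"]

word_counts = reduce_phase(shuffle(map_phase(chunk) for chunk in chunks))
print(word_counts)
```

In a real cluster the map calls run in parallel on the nodes holding the data, and the grouping step moves the intermediate pairs over the network before the reducers run.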
MapReduce was a big success factor for Hadoop at the beginning. As it relies heavily on writing results to disk, it’s not suitable for real-time computing. Therefore this ‘batch technology’ is used less and less by other tools within the Hadoop world. Several tools implement their own processing model, using either disks or in-memory approaches.
Apache Spark: in-memory data processing engine
Apache Storm: Stream processing for Input & Output
Apache Hive: Data warehouse providing data summarization, query, and analysis
Apache Hive Stream: the Storm alternative for Hive
Apache Flink: extra fast system developed at TU Berlin
Apache Drill: SQL for ad-hoc reading from Hadoop, based on Google Dremel
Apache Pig: script-based, SQL-like creator of Hadoop tasks
Apache ZooKeeper: coordination and configuration service for the Hadoop world
Apache Hue: web administration of the Hadoop ecosystem
Cloudera Impala: in-memory alternative to Hadoop that can also be used on top of Hadoop’s HDFS
The Business Intelligence solutions: BI software vendors such as JasperSoft, Tableau, Pentaho and Qlik, as well as the giants Oracle, Microsoft, SAP and SAS, support Hadoop.
SQL engines reading Hadoop HDFS: Cloudera Impala and IBM BigSQL / InfoSphere BigInsights
Queries combining data from an RDBMS and Hadoop: Microsoft Analytics Platform System (previously Parallel Data Warehouse) and Oracle (Big Data SQL)
In-Memory Spark (tbc)
In-Memory SAP HANA
SAP HANA is an extremely fast in-memory SQL database which is organised by columns, not rows. The system is as fast as promised and fairly similar to SQL or T-SQL databases. It not only supports Hadoop’s HDFS but also offers simple integration with SAP’s systems, such as R/3. Data in HANA is compressed, which is advertised to fit 7 times the amount of (CSV) data per gigabyte. Real-world examples are even better, compressing to one fifteenth of the size. HANA SQL is pretty similar to T-SQL with some differences in syntax (found here as PDF).
What took hours on regular databases will now run in minutes: just let HANA self-optimize (no manual indices needed) and, with the new version, possibly limit the amount of RAM a command may use in case there is an error in the code.
A nice example for HANA is process mining. Instead of relying on people’s subjective opinions and answers, the real-world processes are drawn by data-driven analytics. Mostly by checking timestamps and connecting tables of log files and changes in the database, you can find out what percentage of processes run in an unusual order. You get objective information so you can improve the processes.
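The idea can be sketched without HANA at all. The following toy example uses an invented event log and an invented expected order (order, payment, shipment); in a real project both would come from the log and change tables in the database.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical "normal" order of activities within one process instance.
EXPECTED_ORDER = ("order", "payment", "shipment")

# Invented log rows: (case id, activity, timestamp), deliberately unsorted.
event_log = [
    ("case-1", "payment", "2015-03-01 10:00"),
    ("case-1", "order", "2015-03-01 09:00"),
    ("case-1", "shipment", "2015-03-02 08:00"),
    ("case-2", "order", "2015-03-01 11:00"),
    ("case-2", "shipment", "2015-03-01 12:00"),
    ("case-2", "payment", "2015-03-05 09:30"),
    ("case-3", "order", "2015-03-02 09:00"),
    ("case-3", "payment", "2015-03-02 09:05"),
    ("case-3", "shipment", "2015-03-03 14:00"),
    ("case-4", "order", "2015-03-04 16:00"),
    ("case-4", "payment", "2015-03-04 16:10"),
    ("case-4", "shipment", "2015-03-06 10:00"),
]

def trace_per_case(events):
    """Orders the activities of every case by timestamp."""
    by_case = defaultdict(list)
    for case_id, activity, stamp in events:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        by_case[case_id].append((when, activity))
    return {
        case_id: tuple(activity for _, activity in sorted(entries))
        for case_id, entries in by_case.items()
    }

def unusual_share(traces, expected):
    """Returns the share of cases deviating from the expected order."""
    unusual = sorted(case for case, trace in traces.items() if trace != expected)
    return len(unusual) / len(traces), unusual

traces = trace_per_case(event_log)
share, unusual_cases = unusual_share(traces, EXPECTED_ORDER)
print(f"{share:.0%} of processes run in an unusual order: {unusual_cases}")
```

Here one of four cases ships before it is paid, so a quarter of the processes deviate from the expected order.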
So the conversion from traditional SQL databases to SAP HANA is happily unspectacular, as Sebastian Walters states in the German source Big Data iX by heise.de.
Some say Predictive Data Science is a fancier linear regression. Of course it’s not that simple, but statistics, and mostly regression, is at its core.
Linear Regression and other statistical methods
The most common method in linear regression is the method of least squares by Carl Friedrich Gauß from Göttingen. Given discrete and incomplete data points, we look for a function that has the smallest distance to all points on average. It shouldn’t matter whether the distances are positive or negative, so the squares of the distances are used. To create such a function one can choose how complicated it may be (linear, any polynomial, sin, cos, log, …) and how many parameters it should have (e.g. y = ax^2 + bx + c). Finally, a continuous function is built from discrete values.
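For the simplest case, a straight line y = slope * x + intercept, the least-squares solution can be written down directly. The sketch below uses the textbook closed-form formulas and a handful of invented data points; the function names are only illustrative.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_of_squares(xs, ys, slope, intercept):
    """The quantity that least squares minimises."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Invented, slightly noisy measurements.
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

slope, intercept = fit_line(xs, ys)
best_error = sum_of_squares(xs, ys, slope, intercept)
print(f"y = {slope:.2f} * x + {intercept:.2f}, squared error {best_error:.3f}")
```

Any other line, for example one with a slightly different slope, produces a larger sum of squared distances for the same points.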
With modern computers it’s easier to create predictions with non-linear methods, such as the Gauß-Newton method.
Other simple statistical methods are maximum likelihood and the confidence interval, where the average is compared with the average of a training sample set.
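As a rough sketch of how such an iteration works, the following code fits the non-linear model y = a * exp(b * x) with the Gauß-Newton method. The model, the starting values and the data (generated from a = 2, b = 0.5) are assumptions chosen for illustration; there is no step-size control, so a poor starting point may not converge.

```python
import math

def gauss_newton_exp(xs, ys, a, b, iterations=20):
    """Fits y = a * exp(b * x) by repeatedly solving the 2x2 normal equations."""
    for _ in range(iterations):
        jtj_aa = jtj_ab = jtj_bb = 0.0   # entries of J^T J
        jtr_a = jtr_b = 0.0              # entries of J^T r
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            residual = y - a * e
            da = e            # derivative of the model w.r.t. a
            db = a * x * e    # derivative of the model w.r.t. b
            jtj_aa += da * da
            jtj_ab += da * db
            jtj_bb += db * db
            jtr_a += da * residual
            jtr_b += db * residual
        # Solve (J^T J) * step = J^T r with the explicit 2x2 inverse.
        det = jtj_aa * jtj_bb - jtj_ab * jtj_ab
        step_a = (jtj_bb * jtr_a - jtj_ab * jtr_b) / det
        step_b = (jtj_aa * jtr_b - jtj_ab * jtr_a) / det
        a, b = a + step_a, b + step_b
    return a, b

# Noise-free data generated from a = 2, b = 0.5.
xs = [0, 1, 2, 3, 4]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

fitted_a, fitted_b = gauss_newton_exp(xs, ys, a=1.5, b=0.4)
print(f"a = {fitted_a:.4f}, b = {fitted_b:.4f}")
```

Each round linearises the model around the current guess and solves an ordinary least-squares problem for the correction, which is why the method builds directly on the linear case.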
Contact the Author
Talking about machine learning today doesn’t go without talking about distributed, fault-tolerant data storage and query systems.
Hadoop, an open-source Apache project that was spun out of Yahoo! in 2006 and is based on papers published by Google in 2003, is such a widely used system. Even though it is basically ‘mostly just’ a filesystem, its three biggest advantages are
At the core of Hadoop is the filesystem HDFS (Hadoop Distributed File System), which stores its data in blocks across all DataNode machines. The data is usually replicated on one (or more) machines in the same rack, as well as on another rack. Clients connecting to Hadoop to read or write will first ask the NameNode, which tells them which DataNodes they can access.
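The interplay of blocks, racks and the NameNode can be pictured with a toy model. This is not Hadoop code and not the real HDFS placement policy: the class, the round-robin placement rule, the tiny block size and the node names are all invented to mirror the description above.

```python
class ToyNameNode:
    """Keeps track of which DataNodes hold which block of which file."""

    def __init__(self, racks, block_size=4):
        self.racks = racks            # rack name -> list of DataNode names
        self.block_size = block_size  # tiny on purpose; HDFS uses many megabytes
        self.block_map = {}           # (filename, block index) -> DataNodes

    def place_replicas(self, block_index):
        """Two replicas in one rack, a third one in another rack."""
        rack_names = sorted(self.racks)
        local = self.racks[rack_names[block_index % len(rack_names)]]
        remote = self.racks[rack_names[(block_index + 1) % len(rack_names)]]
        first = local[block_index % len(local)]
        second = local[(block_index + 1) % len(local)]
        third = remote[block_index % len(remote)]
        return [first, second, third]

    def write(self, filename, data):
        """Splits the data into blocks and records where each one goes."""
        starts = range(0, len(data), self.block_size)
        for index, _ in enumerate(starts):
            self.block_map[(filename, index)] = self.place_replicas(index)
        return len(starts)

    def locate(self, filename, block_index):
        """What a client asks before it talks to any DataNode."""
        return self.block_map[(filename, block_index)]

name_node = ToyNameNode({"rack-a": ["dn1", "dn2"], "rack-b": ["dn3", "dn4"]})
block_count = name_node.write("log.txt", "abcdefghij")
print(block_count, name_node.locate("log.txt", 0))
```

The NameNode in this sketch only stores metadata; the actual bytes would travel directly between the client and the DataNodes it names.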
The current Hadoop 2.x versions rely on YARN (Yet Another Resource Negotiator) for that, and with data and server replication there is no single point of failure anymore. On top of that, Hadoop uses MapReduce to process data as key/value pairs. Therefore Hadoop is great for lots of data retrievals and querying.
Its drawbacks are: the data in HDFS is not editable, only appendable; it takes a lot of configuration work; and, without any additions, it is not suited for real-time queries.
Artificial Intelligence is a huge topic at the ongoing Google I/O 2015.
Live from the keynote: most products discussed have new features heavily dependent on A.I.!
The all-new Google Photos uses improved face and pattern recognition. Face recognition, which was already available in Google Picasa and other apps, has been improved. In addition, and new to consumer photo album software, it will use pattern recognition to automatically add searchable text tags.
Additionally, the automatically cut and edited videos known as “Auto Awesome” will be improved as the Google Photos “Assistant”, which uses A.I. internally to figure out which video sequences and photos to use.
Google Now has more than 1 billion pieces of information that it can show users to assist them with what they are currently doing. Related information is shown simply via (A.I.-powered) speech recognition.
They mentioned that speech recognition had an error rate of about 23% in 2013 and that it is down to 8% today. So we can expect this number to go down even further.
In A Vision Of A Driverless Future | TechCrunch, Len Epp writes about the consequences of autonomous cars. Not only will drivers’ habits change but the whole industry: shops on wheels, local services, no need to own a car anymore, a huge variation in sizes, …
So AI-powered self-driving cars will bring a huge change to our everyday life and to many businesses, with great potential for many startups too.
Bill Gates agrees that we should be worried about artificial super intelligence!
This comes just a little after the ‘Future of Life Institute’ published an open letter on Research Priorities for Robust and Beneficial Artificial Intelligence, which lists many risks and examples as well as many research ideas to avoid those risks. You can read about that in our article Demonic A.I.! What are we afraid of? Solutions to the risks!
Bill Gates: I am in the camp that is concerned about super intelligence. First the machines will do a lot of jobs for us and not be super intelligent. That should be positive if we manage it well. A few decades after that though the intelligence is strong enough to be a concern. I agree with Elon Musk and some others on this and don’t understand why some people are not concerned
Source: thisisbillgates @ Reddit
But he is also very excited and enthusiastic about A.I.:
Artificial intelligence is correctly perceived as a significant opportunity but also as an existential threat. A while ago Elon Musk urged us to be wary of A.I., as it might be like summoning demons we might not be able to control. Cosmologist Stephen Hawking also warned of the possible end of civilisation through AI. The dangers of A.I. are very diverse.
Much has been written lately in various media about the dangers and far-reaching consequences. As most readers have probably already read about most of the potential problems, here are solutions instead, from the brightest minds of A.I. and me.
(Starting soon) UC Berkeley’s introductory course on Artificial Intelligence (upper division course CS188) will begin and be available to everyone online. You can join the classes for free or get certified for a minimal fee.
The lecturers will be Pieter Abbeel and Dan Klein, who both have masses of very positive student reviews and previous experience.
Also stay tuned for further information as well as lecture notes and possible discussions about this course on ai-claudio.com. If you can’t wait, have a look at UDACITY Machine Learning – 1 Supervised.
As young researchers we are extremely interested in research and publications in Artificial Intelligence, especially neuroscience for Machine Learning. Here are interesting conferences and events in 2015.
If you cannot attend, you will still be able to get some of their papers and further resources on their linked event pages.
Please feel free to send me further events or comments, they are as always appreciated!
02-05 January 2015, The Future of AI: Opportunities and Challenges in San Juan, Puerto Rico
21-23 January 2015, International Symposium on Artificial Life and Robotics (AROB 2015), in Beppu, Japan
25-30 January 2015, AAAI Conference on Artificial Intelligence (AAAI-15) in Austin, Texas, USA.
05-07 February 2015, Australasian Conference on Artificial Life and Computational Intelligence (ACALCI 2015), in Newcastle, Australia
25-28 March 2015, SMART Cognitive Science International Conference in Amsterdam, Netherlands
07-11 April 2015, International Neuroscience Winter Conference (17th) in Soelden, Austria
08-10 April 2015, Evostar 2015 in Copenhagen, Denmark
19-22 April 2015, Spring Brain Conference, Bridging Neural Mechanisms and Cognition by FENS in Copenhagen, Denmark
22-24 April 2015, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2015) (23rd) in Bruges, Belgium
19-20 May 2015, ICT Spring Europe 2015 with an AI&ROBOT AREA special in Luxembourg City, Luxembourg
26-30 May 2015, IEEE Robotics & Automation Society’s Conference (ICRA 2015) in Seattle, USA
25-28 May 2015, IEEE Congress on Evolutionary Computation (IEEE CEC 2015) in Sendai, Japan