Wednesday, December 8, 2021

About AI and professional focus

While studying AI (both ML and AGI), I've made the following notes, which may be of interest to others.


  1. By "AI," we'll denote what is currently called "artificial intelligence," but that's not intellect. It is just automation of certain tasks using specific machine decision-making algorithms - decision trees, neural networks, and some other mechanisms. All those algorithms are based on data and require vast amounts of homogeneous data for their operation and improvement. It would be more appropriate to call this Data Science, yet the talk is not about science but about applications of this science. Hence AI. BUT this is not intellect as humans understand it.
  2. Why is Python so popular in AI? The answer is quite unexpected. Python is not suitable as a programming language for large products, but its benefit is that it can be mastered fairly rapidly. A BASIC of modernity, of a kind. So, ML is not for programmers but for data scientists, who may or may not be able to program. For them, Python is easier to master. The fact that everyone says "Python has more libraries" is not a problem, because .NET also has ML.NET and wrappers for TensorFlow, Keras, etc. That is, if someone needs to implement ML in .NET - no problem. Is it possible to access Python from .NET? You can try using IronPython or approach the task via pythonnet. Thus, if you want libraries in Python, there are solutions.
  3. Various ML courses still teach primitive algorithms, "bricks." The nuance is that all these bricks have long been implemented in the above-mentioned libraries, and it is not the algorithms that should be studied but when to use which algorithm. A modern data scientist will not write their own regression or some other primitive function - it's not just extra work (everything has already been written for us); a hand-rolled implementation also won't plug into the ready-made bricks. It's like writing your own SHA256 or AES128 - well, you wrote it, but where would you apply it, since all encryption libraries have and use their own implementations.
  4. A consequence of (3): a modern data scientist doesn't do programming at all but composes blocks (low-code and no-code technologies) and builds and optimizes data pipelines. That's what is now called MLOps.
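To make "composition of blocks" concrete, here is a toy sketch (all names are illustrative, not a real MLOps framework): each block is a plain function, and the pipeline just runs them in order over the data.

```python
# Each "block" takes rows in and returns rows out.
def drop_missing(rows):
    # Remove rows with missing values.
    return [r for r in rows if None not in r.values()]

def normalize(rows):
    # Scale "x" into the [0, 1] range.
    values = [r["x"] for r in rows]
    lo, hi = min(values), max(values)
    return [{**r, "x": (r["x"] - lo) / (hi - lo)} for r in rows]

def run_pipeline(rows, steps):
    # The pipeline itself is nothing but composition of blocks.
    for step in steps:
        rows = step(rows)
    return rows

data = [{"x": 10}, {"x": None}, {"x": 20}, {"x": 30}]
clean = run_pipeline(data, [drop_missing, normalize])
print(clean)  # [{'x': 0.0}, {'x': 0.5}, {'x': 1.0}]
```

Real MLOps tooling adds scheduling, monitoring, and versioning on top, but the underlying idea is exactly this: wiring ready-made steps together rather than writing algorithms.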
  5. There exist two directions of work and development for an AI specialist and an AI business. The first one is the building of data pipelines, described in (4). That's a great job, well-paid and in demand on the market. But it is not programming. If you love "data" as an entity, then you'd like this job too. So, this work and this business are about data: the more data, the better. There are no genius ideas, disruptive technologies, sophisticated inventions, or similar breakthroughs. The foundations of ML are not rocket science: they are simple math and trivial data transformations. You will succeed in building a business here if you see data and know how to collect it and apply it to optimize certain business processes. To reiterate, you need to possess huge amounts of data; only this gets you ahead of the competition. The second direction is the application of the ideas and capabilities of ML to solving vertical-market tasks and problems. AI activities and AI businesses in this direction grow from narrow and specific scientific and technological needs. Here, you may have little data (or no data at all), but you may make a breakthrough when solving a scientific or technological problem. For this direction, you first of all need to be a specialist in some subject - chemistry, physics, biology (or you should have such a specialist) - and then look for which ideas and principles of ML and AI you can apply to this specialist's work. Everything between those two directions ("we can write code, and we want to apply this knowledge in AI") most often has no practical application or business benefit. Consequently, the risks of such a business are not justified by the possible commercial results.
  6. Could one use ML and AI for solving tasks X, Y, Z? Most likely not. Modern AI, based on large datasets and data streams, is hardly applicable when you have little or no data. For this, you need other technologies, like AGI. Those are, unfortunately, sidelined because they don't bring quick money to the large players capable of investing in them.
(This post was originally published on my personal page.)


Monday, February 20, 2017

What really breaks SSL?

An article about how SSL is misused (or not used at all).

The point is that SSL itself is secure, and it's people whose mistakes and misunderstandings make SSL-protected resources vulnerable.

Why are companies bigger than they seem to need to be?

When you look at the average IT company, you see the front end (the product it offers) and that's all. Many people wonder why the headcount is so large for a seemingly simple product. Well, there are several main reasons, which I list below.

1. Customer service and sales
2. Scalable Backend development
3. Research and optimization
4. Permanent security evaluation
5. Integration with others

1. Customer service and sales. This is the department (in IT companies, it's often the same personnel for both tasks) that communicates with customers, prospects, and partners. It can be infinitely large, especially for a mass-market product.

2. Scalable Backend development. Most (if not all) IT products (software) have a part, called the back end, which contains most of the business logic. For mass-market products, this part must not just work properly but also be scalable, i.e. able to handle many requests concurrently. To do so, the backend runs on several computer systems simultaneously, and they must play nice with each other. Such a scheme of operation is called a "distributed system". Designing and maintaining distributed systems is one of the most complicated tasks in the IT industry, and there's always room for improvement in any distributed system.

3. Research and optimization. An efficiently designed and implemented system can perform much better than humans, and human labor is nowadays much more expensive than computer systems. Thus it makes sense to put most of the burden on computers. But their resources also have certain limits, and the more operations you can "fit" into a single system, the more commercially efficient and profitable (let's hope so :) your system is.

4. Permanent security evaluation. New ways to penetrate the protection of computer systems are found and presented literally every day. This can lead to financial loss or to even more serious, even deadly, threats. Obviously, this is not what any business would like to experience, so security evaluation, patching, and further evaluation form the never-ending loop of work for IT security specialists.

5. Integration with others. IT systems rarely work in an isolated environment. Even when a software application runs on a local computer, it runs on top of the operating system layer, and playing nice with the OS is a kind of integration as well. Then, data must enter and leave the software system. Making this possible requires following certain standards, data exchange formats, etc. All of this is also integration. Finally, software rarely delivers the complete solution to a business problem. More often, it is part of a larger workflow, and integrating into as many workflows as possible is important for delivering a better experience to customers and growing the user base.
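A minimal illustration of the "standards and data exchange formats" point: two systems that agree on a shared format (here, JSON) can exchange data without knowing anything about each other's internals. The order record below is made up for the example.

```python
import json

# One system serializes its data into an agreed standard format...
order = {"id": 42, "items": ["keyboard", "mouse"], "total": 59.90}
wire = json.dumps(order)

# ...and the other side parses it back, getting the same structure.
received = json.loads(wire)
print(received == order)  # True
```

Real integrations pile on transport protocols, schemas, and versioning rules, and keeping all of that working is a job in itself.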

Wednesday, January 11, 2017

Some more manipulations with the cat's fur

As I wrote a number of times, PKI's reliance on Certificate Authorities, which might have worked 30 years ago when it was originally invented, doesn't work well now, when the number of CAs has grown significantly and it has become possible for a wide range of shady entities, both governmental and private, to become CAs or to issue CA certificates and make them trusted on customer computers. This includes hardware vendors (remember the Superfish issue), antivirus software (check Kaspersky's root certificate issues), corporate proxies, and more.

There's a need to check in some way that the certificates presented during the connection are the ones the original server intended to send. It turns out that there's no standard way to do this, and the browser vendors are not looking for efficient ways to address the problem either. There's a great article on Computerworld which lists some of the possible solutions to the problem, with descriptions and references. It contains a couple of simple, easy-to-implement approaches, which beg to be put into RFCs and similar standards. Instead, Google goes for implementing hard-to-enforce and hard-to-use approaches like the Certificate Transparency initiative. My vote goes to the Certificate Validation Framework. Also, you can choose your favorite approach :).
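One simple approach in this spirit is certificate "pinning": compare the fingerprint of the certificate the server actually presents against a fingerprint obtained earlier through a trusted channel. A sketch using Python's standard library (the expected fingerprint is a placeholder you would distribute out of band):

```python
import hashlib
import socket
import ssl

def fingerprint(der_cert: bytes) -> str:
    # SHA-256 fingerprint of a certificate in DER form.
    return hashlib.sha256(der_cert).hexdigest()

def get_server_fingerprint(host: str, port: int = 443) -> str:
    # Connect over TLS and fingerprint whatever certificate
    # the server presents.
    context = ssl.create_default_context()
    with socket.create_connection((host, port)) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            der_cert = tls.getpeercert(binary_form=True)
            return fingerprint(der_cert)

# expected = "...fingerprint obtained out of band..."
# assert get_server_fingerprint("example.com") == expected
```

This catches the case where an interception proxy or a rogue CA substitutes its own certificate, since the substitute's fingerprint won't match the pin. The hard part, of course, is distributing and rotating the pins, which is exactly why this needs standardization.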

Sunday, October 9, 2016

Ignorance as a key to malware

The very disturbing study, described in detail here, indicates that security warnings are not effective, because in the worst case people simply don't "trust" the warnings. I.e., they trust a phishing website more than the respected developer of their web browser. Ok ...

I think that's the same reason people go for Trump in these elections.