The Future of Computer Science


It's a great pleasure for me to be here today and have this opportunity to talk to you about my view of the future of computer science, because I think this is a very important time for those of you who are students. I would like to start by telling you a very quick story about the early part of my career. I graduated from Stanford in 1964 from the electrical engineering department. This was before computer science got started. I was hired, coincidentally, into an electrical engineering department, and what I was asked to do was to teach a new course in computer science. I did not realize at the time what this meant: I was perhaps one of the world's first computer scientists. What it did was give me a lot of opportunities that I would not normally have had, because there were no senior faculty ahead of me whom I had to wait for to retire. So I had many opportunities that, if I had been in electrical engineering or physics, I would still be waiting for today, waiting for the senior faculty ahead of me to retire. The moral of that story is that I think you are at a similar time, because I think computer science is changing in a fundamental way. And if you position yourself for the future, you will have a dramatically fantastic career.

So the story is simply to make the point that there is a fundamental change taking place. It is going to impact us in many ways, and those of you who recognize this and position yourselves for it will benefit from it. So let me talk a little bit about the early years of computer science. For the last 30 or 40 years, we have been concerned with how to create programming languages, compilers, and operating systems. We worried about algorithms. We worried about what functions can be computed, and things of that type. But that era, I think, has come to an end. The future is going to have more to do with how computers are used, and some of the topics that have become important are the structure of large networks, large data sets, information, and search. The drivers of this change are partly the computers themselves: you will find them in your automobile and in your home; they are becoming ubiquitous, and their speed is sufficient for most of the things that you do. More important is the merging of computing and communications, the fact that information is now available in digital form, and the fact that sensor networks are being created. What I am going to do in my talk is talk a little bit about what this means for the discipline of computer science. I will give you some examples so that you can begin to see what my vision of the future is, and then I will give you an example of the science that is needed to support this future.

So if you look at CS departments around the world, you will begin to notice that courses are changing. In one particular area, you will see courses having to do with random graphs, phase transitions, giant components, spectral analysis, and small-world phenomena. As I go through, I will give you a small example of each of these. What is the theory that is going to be needed to support this future? We are going to have to learn how to handle large amounts of data. That means the future may be more concerned with probability and statistics than with discrete mathematics. You are going to see data which is very noisy, and you will have to extract the signal from that data. You are going to deal with high-dimensional data, and this is quite different from the CS of the last 30 years. Just to give you some examples: today, if you want to buy an automobile, you might type "autos" into Google and get back hundreds of web pages that you have to search. Similarly, if you want to find out about graph theory, you type in the query "graph theory". And if you want to know what university to go to, you type in "colleges and universities". In each case you get back hundreds of web pages, and you have to sort through them for the information that you want. But tomorrow the queries are likely to change. What you are going to do is have the computer sort through those pages for you. Instead of typing in the word "autos", you will say "which car should I buy", and you would like to get the answer. Or "construct an annotated bibliography on graph theory", or "where should I go to college". Let me show you what kind of answer one should expect from the computer. If you ask "what kind of car should I buy", what the search engine will do is search many web pages and look for various parameters of automobiles. It will give you back a list of things like cost, reliability, fuel economy, and crash safety. It will ask you which of these are important in your decision to buy an automobile. Once you have done that, it will recommend two or three automobiles to you. What you would like is for it to give you a table with the answers to all these questions: it tells you the cost, reliability, and so forth. But it would also give you links to websites with independent reviews of the automobiles it is recommending. In the US, for example, there are publications such as Consumer Reports and Car and Driver, which are independent of the automobile industry, and it would link to their reviews of the specific cars it is recommending to you.

If you ask where you should go to college, once again it will give you a set of criteria, and it will ask you which ones are important. Is cost important? Geographical location, size, type of institution? When you tell the computer which of these matter to you, it will give you a ranking of institutions according to your criteria. It will tell you things like student-faculty ratios. And it might also give you the email addresses of graduates from your high school who have gone to that particular college, so you could ask them what the institution is really like.

If you ask how some scientific field developed, you would expect the computer to go, perhaps, to the ISI database to find the important papers, who the authors of those important papers were, what institutions they were at, and how ideas flowed through the development of that field. Just to show you what a possible outcome might look like: if I ask "show me how theoretical computer science developed", it could first search and find out which papers were the key papers, and it would discover there is one by Hartmanis and Stearns. Maybe ten years later, there is a paper by Cook and another by Blum that are very important. Then one by Karp, whose paper looks at many different problems. And other theoretically important papers, one of them by Andrew Yao, who is here in Beijing at Tsinghua University. What it would do is give you a list of these key papers, who they are by, and links to them, so you could pull up those papers and see very quickly how the field developed. This is quite different from today, where you have to search through hundreds of publications to try to figure out which ones are important. Just to show you that this is actually feasible, here is a graph developed by a couple of faculty members and a couple of graduate students. What they did is they went to the NIPS conference, which has been in existence for 13 years and all of whose papers are in digital form. They took this set of papers and clustered them. Then they took 11 or 12 of the clusters and plotted the size of each cluster over time. You can see from this that there is a yellow cluster here, corresponding to Bayesian networks, mixtures, posteriors, likelihood, and you can see that it is a growing area, so that is an area that maybe, as a Ph.D. student, you would want to get into. You can see some other areas here that are becoming less important. This is just a simple example to show you what you can do to figure out which areas are growing and which are not.
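
To make this concrete, here is a minimal sketch of that kind of cluster-over-time analysis, assuming the papers are available as (year, text) pairs. The tiny corpus, the cluster count, and the library choices below are illustrative; this is not the original authors' code.

```python
# Sketch: cluster papers by their text, then count how large each cluster is
# in each year to see which topics are growing. The corpus below is made up;
# the study described in the talk used the full set of NIPS papers.
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

papers = [  # (year, abstract) pairs -- placeholder data
    (1997, "bayesian network posterior likelihood mixture model inference"),
    (1998, "bayesian posterior likelihood graphical model variational inference"),
    (1997, "neural network backpropagation hidden units training"),
    (1999, "support vector machine kernel margin classification"),
    (2000, "bayesian mixture model markov chain monte carlo posterior"),
    (2000, "kernel methods support vectors regression"),
]
years = [y for y, _ in papers]
texts = [t for _, t in papers]

X = TfidfVectorizer().fit_transform(texts)
# The talk used 11 or 12 clusters on the full corpus; 2 suffices for this toy data.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Cluster size per year: a count that grows over the years suggests a growing area.
sizes = Counter(zip(years, labels))
for (year, cluster), count in sorted(sizes.items()):
    print(f"year {year}  cluster {cluster}  papers {count}")
```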

What I am going to do now is walk through some of the things that are changing, just to give you a feel for how the world is changing. When I first started doing research, the way we published was this: you make a discovery, write a paper, submit it to a journal, it is sent out for refereeing, you revise the article, it is copy-edited, it appears in print a couple of years later, and then the results are available worldwide in major research libraries. Well, what I would like is for all of these publications to be available to everybody without charge. But for commercial reasons, that was not true. So one of the things we did was apply pressure through young faculty. At first, publishers would not let Google search their publications. Senior faculty told junior faculty: do not publish in a journal that Google cannot search. What this did was force the publishers to let Google search their tables of contents. But then what would happen is that you would search and find the article, but when you clicked on it, you would be asked for $10 or so to get a copy. So we then told faculty who were publishing: do not publish in a journal unless it allows you to put a copy of the paper on your webpage, where it can be accessed by everybody. And today that is the standard procedure. So you can see that the internet is forcing changes on companies even when they do not want to make those changes.

I suspect most of you are familiar with Wikipedia, the online encyclopedia where anybody can write an article. When I first heard about this, I wondered how you could have any quality control if anybody, even a high school student, can go in and write an article. The answer is that so many people read it that if some part of an article is not quite right, somebody else will come along and correct it. It turns out there was a recent comparison between the quality of Wikipedia and the quality of the Encyclopædia Britannica, which is sort of the gold standard. And it turns out that Wikipedia has about the same quality as Britannica. The difference is that there are millions of articles in Wikipedia. I use it frequently when preparing a lecture. Let's say I cannot remember how something like Cayley's formula is derived. I simply go to Wikipedia, find it, work out the details before class, and it works great.

I am not sure whether this slide is visible to you, so I will just describe it. Everything today is tracked. When I purchase something online, I can watch it from the time it leaves the store, to when it is put on an airplane, onto a flight to Ithaca, until it is put on a truck to be delivered to my door. And it is not only packages you can track. This next slide, again I apologize if it is hard to see, is a map of North America, and on it is every commercial aircraft at the moment I took this snapshot of the website. There are 3,000 planes in the air, and you can see where they are. If you know the flight number, you can track it. What you may not realize is that embedded in this database is a lot of information people do not realize is there. For example, if a hurricane were coming close to Florida, you would be able to see it on this website, because the aircraft would be routed around the weather. So the weather is embedded in this website. One of the things you will be able to do in the future is extract all kinds of information from websites where you did not even know that information was there. Just to show you how this is used: suppose I am flying somewhere and someone is going to pick me up at the airport. Ten years ago, they would probably call the airport to ask whether the flight would be on time. They do not do that anymore. They simply type in the flight number and see where the flight is. This one is going from New York to San Francisco, and you can see it is still about two hours away. So you would stay home another hour before you drive to the airport.

This is another snapshot I will show you, about sensor networks. I was in Seattle around Christmas time, and I was going to drive up to Snoqualmie Pass. My wife was concerned about the conditions at Snoqualmie Pass and whether there would be snow on the roads. In the past, I would have contacted the state highway department and asked what the weather conditions were. Instead, I googled Snoqualmie Pass and found a web camera pointed at the road surface. I looked at the surface and could see it was completely covered with snow. I showed it to my wife and said, let's go. On that same trip, I was north of the city of Seattle, and the airport is south of the city. I asked someone how long it would take to get to the airport. They said anywhere from 30 minutes to 2 hours, depending on the traffic. So once again, I googled I-5, which is the road to the airport, and saw that the traffic going in the direction I wanted to go was pretty light. So I waited another hour and then went to the airport.

Just to mention another thing: I walk to and from work. It is about a 30-minute walk. One of the things I do not like is to get caught in a rainstorm, so what I do before I head home is bring up a weather station website, where there is a radar scanning the area where I am. From the radar returns you can see how much moisture is in the air, over a range of about 100 miles. What I do is click through the last five minutes on this website, so I can see if a rainstorm is coming, how it is moving, and how it is changing. Then I can estimate where it will be in the next 30 minutes and know whether I can leave for home and stay dry, or whether I had better wait a while for it to pass.

One of the things you will not have to do in the future is go looking for information. There will be systems in place where this information is presented to you automatically. Now I am going to go through a couple of examples. A few years ago there was a concern in the US about the spread of bird flu from Southeast Asia to Europe, and then to the US, by migrating birds, and we wanted to estimate the probability of this happening. It turns out there are about 6,000 species of migrating birds. To figure out the probability, you have to know the flight paths and the timing of migration for those 6,000 species of birds. The problem is that we do not have that data. So we asked a question: could we determine the flight paths from sightings of the birds? We picked one particular bird, the Ruby-throated Hummingbird. I am not sure if you can see, but this is the US, and these blue dots here are sightings of these birds at various sites. What we did is quantize the US geographically into small squares, and we also quantized time into days. Underneath we created a model which says: if a bird is at one location, what is the probability it will be at another location a day later? From the sightings, we then ran a program to find the most likely flight paths over the birds' yearly cycle. The mathematics is underneath; let me just conclude by showing you the results. We were able to recover the flight paths: these birds start from Mexico and fly north in the spring, and you will notice that half of them cross the Gulf of Mexico directly and the other half take the land route around it. Then they spread out through North America. In the fall, when they come back, what is interesting is that, for some reason, they all take the land route; none of them fly back across the Gulf to Mexico. We took these results to the ornithologists and asked: is this right? They said, we do not know, but this is our best guess of what the flight paths are. This is the kind of information people are going to extract from various websites and databases.
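
As one way to make that concrete, here is a hedged sketch of the general idea: quantize space into cells, give day-to-day transition probabilities between cells, and recover a most likely path consistent with sighting data using a Viterbi-style dynamic program. The grid, transition matrix, and sighting likelihoods below are made up for illustration; this is not the actual model or data from the study.

```python
# Toy most-likely-path reconstruction on a 1-D "map" of grid cells (south to north).
import numpy as np

n_cells = 5
# Transition model (assumed): a bird tends to stay put or move one cell north per day.
T = np.zeros((n_cells, n_cells))
for i in range(n_cells):
    T[i, i] = 0.4
    if i + 1 < n_cells:
        T[i, i + 1] = 0.6
T /= T.sum(axis=1, keepdims=True)

# Sighting likelihoods: probability of each day's observations given the bird is in cell j.
obs = np.array([
    [0.70, 0.20, 0.05, 0.03, 0.02],   # day 0: mostly seen in the south
    [0.20, 0.50, 0.20, 0.07, 0.03],   # day 1
    [0.05, 0.20, 0.50, 0.20, 0.05],   # day 2
    [0.02, 0.05, 0.20, 0.50, 0.23],   # day 3: sightings moving north
])

# Viterbi dynamic program: most likely sequence of cells given transitions and sightings.
n_days = obs.shape[0]
logT = np.log(T + 1e-12)
score = np.log(obs[0] + 1e-12)
back = np.zeros((n_days, n_cells), dtype=int)
for t in range(1, n_days):
    cand = score[:, None] + logT              # score of moving from cell i to cell j
    back[t] = cand.argmax(axis=0)
    score = cand.max(axis=0) + np.log(obs[t] + 1e-12)

path = [int(score.argmax())]
for t in range(n_days - 1, 0, -1):
    path.append(int(back[t][path[-1]]))
print("most likely cell per day:", path[::-1])
```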

What I am going to do now is talk a little bit about the theory needed to support these kinds of things. We are going to deal with random graphs, spectral analysis, and high-dimensional data. As I go through, I will give you some examples and a little bit of the theory, so you will see how it is developed. To start with, let me talk about graph theory as it was studied in the 1950s. Graphs usually had a small number of vertices: 10, 20. Maybe somebody looked at a graph with a thousand vertices, but that was as large as they got. The theorems were about things like planarity, subgraphs, and intersections. Think about a theorem of this type: add a few edges to a graph and it changes from planar to non-planar. Today you are going to deal with graphs which have billions of vertices. The exact number of edges present is not critical: if I take your graph and add or remove a few edges at random, it is not going to change any interesting property of the graph. What we have to do is develop a new graph theory, some theoretical basis for dealing with large graphs. That theory might be a theory of graph generation, and we want it to have two properties. First, we want it to be invariant to small changes in the definition. Second, we want to be able to prove some basic theorems in the theory. The simplest example is the classical random graph model. In this model you have some number of vertices, say n vertices. Then, for each pair of vertices, you decide whether there is an edge or not: you flip a coin which comes up heads with some probability p. If it comes up heads, you put the edge in; if it comes up tails, you leave the edge out. You can then prove a number of theorems about the graphs this generates. One of them concerns the degree distribution: all the vertices will have degrees tightly concentrated around the expected value, which is determined by the parameters n and p. A lot of theory was worked out for this model, the Erdős–Rényi random graph, but there is a problem with it. The problem shows up if you look at a real graph, say the graph of US air routes. A city like Chicago has flights to many cities: New York, San Francisco, Los Angeles. These are very high degree vertices. There are other vertices which are small cities; a route like Ithaca–New York may only give a vertex degree 3. You will notice that the degree distribution of this graph is not concentrated around a single value. This degree distribution is called a power law distribution. So we need a new model for a random graph. What is different about this model is that it is a generative model. You start with a small number of vertices and a small number of edges. At each time step, you add a vertex and you add some edges. The question is where to place those edges, and there are two rules you could explore. One is to pick two vertices uniformly at random. The other is preferential attachment: you select a vertex with probability proportional to its degree. So the rich get richer: a vertex with high degree is more likely to be picked to receive another edge. If you use preferential attachment, you get a graph with a power law degree distribution, which is close to the degree distribution of real graphs. The power law looks like this: the number of vertices of a given degree falls off as one over the cube of the degree, something like that.
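
Here is a minimal sketch of the two models just described: the classical G(n, p) coin-flipping model and a preferential-attachment generator. The implementation details, such as attaching each new vertex with two edges, are my own choices for illustration.

```python
# Two random-graph generators: Erdos-Renyi G(n, p) and preferential attachment.
import random
from collections import Counter

def gnp(n, p):
    """Erdos-Renyi graph: each of the n*(n-1)/2 possible edges appears with probability p."""
    return [(i, j) for i in range(n) for j in range(i + 1, n) if random.random() < p]

def preferential_attachment(n, m=2):
    """Start small; each new vertex attaches to up to m existing vertices chosen
    with probability proportional to their current degree (the rich get richer)."""
    targets = [0, 1]            # degree-weighted list: vertex v appears deg(v) times
    edges = [(0, 1)]
    for v in range(2, n):
        chosen = set(random.choice(targets) for _ in range(m))
        for u in chosen:
            edges.append((u, v))
            targets += [u, v]
    return edges

def degree_counts(edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return Counter(deg.values())    # how many vertices have each degree

# G(n,p) degrees cluster tightly; preferential attachment gives a heavy (power-law) tail.
print("G(n,p) degrees:        ", sorted(degree_counts(gnp(2000, 0.002)).items())[:10])
print("pref. attach. degrees: ", sorted(degree_counts(preferential_attachment(2000)).items())[:10])
```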

Let me show you something else about random graphs. If you take any database (just go out on the web and find one), you can convert it to a graph in some reasonable way. As an example, I went to Science magazine and found an article on protein interactions. At that time, they had 2,730 proteins in their database, with 3,602 interactions. I created an interaction graph: each vertex in the graph represents a protein, and an edge means two proteins interact. I then wrote a computer program to find the connected components in this graph and count how many there are of each size. There were 48 components of size 1 and 179 of size 2. Size 1 means a protein that does not interact with any other protein; size 2 means a pair of proteins that interact only with each other. There were 50 of size 3, and so on, down to 1 of size 11 and 1 of size 16, and nothing larger showed up in that list. But if you add up the number of vertices in these components, you will find 1,851 proteins missing. Where are they? If you go on, you will find there is one single component containing all 1,851 proteins. We call that the giant component. You might say, well, that is strange. But it turns out that if you take almost any database and do this, you will almost surely find a giant component. It suggests there is something mathematical going on with giant components that we should understand. Let me give you a demonstration of how a giant component emerges. You can write a program: put a thousand vertices down and then gradually put in edges. Each time you put in an edge, calculate the number of components of each size and see what happens. When I start out, I have a thousand components of size 1, because I have no edges. Then I add an edge: the number of components of size 1 drops to 998, and there is one component of size 2. As I put in more and more edges, the number of components of size 1 keeps dropping and I start to get larger and larger components. At some point I have components of size 11. So far, nothing seems interesting. But if you continue, you will see components of size 20, 55, and 101. Some sort of transition is happening. What happens is that once you have some medium-sized components, when you put in an edge, it is very likely to connect two of those intermediate-sized components into a giant component. The giant component then swallows up the smaller components, and you gradually get a connected graph. One question I asked is: how many edges do you have to put in before that giant component forms? It turns out that mathematically you can predict it very precisely: we can tell you at what number of edges you should expect the giant component to form. We call that point a phase transition. It is very similar to what happens in physics when you cool water down until it freezes. One of the things you should be aware of is that phase transitions occur in combinatorial structures; in fact, it turns out that essentially every monotone property of such a structure has a phase transition. Let me point out that the giant component shows up in other places as well. I got this slide from IBM. They crawled the world wide web and got 200,000,000 web pages. This is a directed graph, with web pages as vertices: if there is a link from one web page to another, there is an edge.
Then, instead of finding connected components, we are interested in finding strongly connected components: a set of vertices such that you can get from any vertex to any other along directed edges and back again. It turns out there is a giant strongly connected component, a large set of web pages that can all reach one another. In addition, there are a lot of pages from which you can reach this giant component but which you cannot get back to, and there are pages that the giant component can reach but from which you cannot get back into it.

I mentioned the phase transition. There is a mathematical theorem which says that if you take these kinds of combinatorial structures, and you take essentially any monotone property of the structure, there is a value of the underlying parameter at which that property undergoes a phase transition.
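
To see the phase transition numerically, here is a small simulation of the experiment described a moment ago: start with isolated vertices, add random edges one at a time, and watch the largest component. The union-find bookkeeping is just one convenient way to track components; the giant component appears abruptly near n/2 edges.

```python
# Watch a giant component emerge as random edges are added to n isolated vertices.
import random

n = 1000
parent = list(range(n))            # union-find structure tracking connected components

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving
        x = parent[x]
    return x

sizes = {i: 1 for i in range(n)}   # size of the component rooted at each vertex
largest = 1
for step in range(1, 2 * n + 1):
    u, v = random.randrange(n), random.randrange(n)
    ru, rv = find(u), find(v)
    if ru != rv:                   # edge joins two different components
        parent[ru] = rv
        sizes[rv] += sizes.pop(ru)
        largest = max(largest, sizes[rv])
    if step % 100 == 0:
        print(f"{step:5d} edges: largest component has {largest} vertices")
```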

Let me move on to some of the technology behind how search engines work. If you have a document, what you can do is take all the words that occur in the document, sort them, and count the number of occurrences of each word. In this example document, the word "CEO" occurs 17 times and "Microsoft" 61 times. You can probably guess what this document is about just from that word vector. So we represent documents by vectors. Two documents are similar if the dot product of their vectors, normalized to length 1, is close to 1; if it is close to 0, they are dissimilar. When you type a query into a search engine, you can also form a vector from that query, and the way to find matching documents is to take the dot product of the query vector with the document vectors.
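
A minimal sketch of that word-vector idea, with made-up documents: count words, normalize to unit length, and compare by dot product.

```python
# Represent documents as normalized word-count vectors and compare by dot product.
import math
from collections import Counter

def vector(text):
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()}

def similarity(a, b):
    return sum(a[w] * b.get(w, 0.0) for w in a)

doc1 = vector("microsoft ceo announces new software strategy microsoft windows")
doc2 = vector("microsoft windows software release announced by the ceo")
doc3 = vector("hummingbird migration routes across north america")
query = vector("microsoft software")

print(similarity(doc1, doc2))   # close to 1: similar subject matter
print(similarity(doc1, doc3))   # close to 0: unrelated documents
print(similarity(query, doc1))  # a query vector is matched the same way
```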

I am not going to go into the science of this. I will just tell you that not only can you represent the subject matter of a document by these vectors, there are also ways to compress a document down to a small number of bits. I can take a book and reduce it to a thousand bits, in such a way that I can still answer the questions I want to answer about that book from those thousand bits. One of the things this allows people to do is detect duplication. If you submit a paper to the physics archive, one of the things they will do is tell you what percentage of your paper is copied from other papers already in the archive. So if you write a paper and submit two versions of the same paper, they will tell you that the second version is 95 percent the same as something already there, and maybe you do not want to submit it. If it comes back with 10 percent overlap, that is not so bad, because if you write a paper and send it somewhere else, you may reuse the same definitions or the same terminology; a little overlap is okay. These kinds of things are computed today.
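
Here is one standard way such overlap can be measured, sketched with made-up text; I am not claiming this is the archive's actual method. It breaks documents into short word sequences ("shingles") and reports the fraction shared.

```python
# Overlap detection via word shingles (Jaccard similarity of k-word sequences).
def shingles(text, k=5):
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def overlap(a, b, k=5):
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / max(1, len(sa | sb))

paper_v1 = ("we prove that the giant component emerges when the number of edges "
            "exceeds half the number of vertices ") * 3
paper_v2 = paper_v1 + " in this revision we add a tighter constant and new experiments"
other    = ("a study of hummingbird migration paths across north america "
            "using sighting data ") * 3

print(overlap(paper_v1, paper_v2))  # high overlap: essentially the same paper
print(overlap(paper_v1, other))     # near zero: unrelated papers
```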

Let me now talk about clustering documents. Suppose I type the word "Gates" into a search engine. What I want is articles on Boolean gates, the ones that make up the switching circuits of computers. What I got back was two million web pages: half a million about Bill Gates, 177,000 on Gates County, and way down the list the 19,000 pages that I actually wanted. No matter how you order those pages, it will take me a long time to find the page I want. What would I prefer to get back? Clusters. Then I can quickly say I do not want Bill Gates, I want Boolean gates, click on that cluster, and go in and look at it. This is the kind of software we are going to use in the future.

And it is not clear how we should cluster things. Do I cluster documents by their subject matter, like politics or economics? Or do I cluster by whether something is a children's book, a textbook, a reference book, or a book for the general population? You can see that the distance function you put between documents is important, and it determines what clusters you get.

I want to give you an example of the mathematics the future is going to need. For this I want to show you just a little bit of mathematics: I will take one topic and develop the mathematics of that topic, the notion of high dimension. One point you should walk away with is that your intuition comes from your experience in low dimensions, and it can be very misleading when you start to deal with high dimensions. What I have here is a picture of a cube. The cube has three dimensions, and its volume is 1. And if I go to a 4-dimensional cube, a 5-dimensional cube, or a 6-dimensional cube, its volume is always 1. But what happens if I take the unit sphere and start to increase its dimension? What happens to its volume? The answer is that the volume goes to 0. When I first heard that, I was surprised, so I suspect you are probably surprised too. Let me show you why it happens. Here, in 2 dimensions, I draw a unit-radius sphere and a unit cube centered inside it. You can see the unit cube is completely contained inside the sphere, so the sphere's volume is greater than 1. If I increase the dimension to 4, the sphere's volume is still greater than 1, but now the distance from the center out to a vertex of the cube is 1/2 in each of the 4 coordinates, so the distance is the square root of (1/2)² + (1/2)² + (1/2)² + (1/2)², which is 1. What happens when I go up to d dimensions? The vertex of the cube is out at a distance of the square root of d, divided by 2. So in 10,000 dimensions (and the data you are going to deal with can have dimension that high) that vertex is 50 units out, while the sphere only has radius 1. So there is something wrong with the first picture. What should the picture really look like? It should look like this: here is the unit sphere, and way out here is a vertex of the cube. You can see that the volume of the sphere must be very small compared to the volume of the cube. In fact, if you compute the integral, you will find that above about dimension 5, the volume of the unit sphere heads off toward 0.
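
Those numbers are easy to check. Here is a short sketch computing the cube's corner distance and the volume of the unit-radius ball in d dimensions, using the standard closed-form volume formula; the particular dimensions printed are arbitrary.

```python
# Unit cube vs. unit-radius ball as the dimension grows.
import math

def ball_volume(d, r=1.0):
    """Volume of a d-dimensional ball of radius r: pi^(d/2) * r^d / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) * r ** d / math.gamma(d / 2 + 1)

for d in [2, 4, 5, 10, 20, 100, 1000]:
    corner = math.sqrt(d) / 2        # distance from the cube's center to a corner
    print(f"d={d:5d}  cube volume = 1  corner distance = {corner:7.2f}  "
          f"ball volume = {ball_volume(d):.3e}")
```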

Let me show you something else that is different between low and high dimensions. If I put points down at random in low dimension and find the two points which are closest together and the two points which are farthest apart, the largest distance may be many times the smallest distance. What happens if I put points down randomly in high dimension and calculate the distances between them? It turns out that the distance between every pair of points is essentially the same. At first you might be surprised. But if I want to calculate the distance between two points x and y, what I do is take the difference of their coordinates in each dimension, square it, and sum over all the dimensions. If the coordinates xi and yi are random variables, then each difference is a random variable and so is its square. And if I add up a large number of independent random variables drawn from a distribution with finite variance, the sum will be highly concentrated about its expected value. That is why, with high-dimensional data, the distances between all pairs of points are nearly the same, and anything that tries to work directly with those distances is going to be statistically unstable.
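
A quick numerical illustration of this concentration, with points drawn uniformly from the unit cube; the choice of 50 points and of the particular dimensions is arbitrary.

```python
# Ratio of largest to smallest pairwise distance shrinks toward 1 as dimension grows.
import itertools
import numpy as np

rng = np.random.default_rng(0)
for d in [2, 10, 100, 1000]:
    pts = rng.random((50, d))                               # 50 random points in [0,1]^d
    dists = [np.linalg.norm(a - b) for a, b in itertools.combinations(pts, 2)]
    print(f"d={d:5d}  min={min(dists):6.3f}  max={max(dists):6.3f}  "
          f"max/min={max(dists) / min(dists):5.2f}")
```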

Next, I am going to calculate the expected distance between points drawn from two Gaussians whose centers are a distance δ apart. I generate one point from the first Gaussian and one point from the second. For a unit-variance Gaussian in d dimensions, a random point lies at a distance of about the square root of d from its center, and the directions of the two points from their centers are essentially at right angles to each other and to the line between the centers. If you draw the little right triangles and work it out, you will find that the distance between the two points is about the square root of δ² + 2d.

So, how large does δ have to be for me to separate the points and tell you which Gaussian generated which point? The value above is the true distance plus a little noise, because of some approximations I made: I approximated the thin annulus where the points lie by a sphere, and a few other small approximations. But as long as that holds, I can just calculate the distance between each pair of points: pairs from the same Gaussian are about √(2d) apart and pairs from different Gaussians are about √(δ² + 2d) apart. As long as those two numbers are far enough apart, you will be okay; if you expand the square roots and work through it, that requires δ to grow roughly like the fourth root of d.
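
Here is a small numerical check of those two distances, using unit-variance Gaussians and a δ chosen for illustration.

```python
# Pairwise distances for two unit-variance Gaussians in d dimensions, centers delta apart:
# same-Gaussian pairs concentrate near sqrt(2d), cross pairs near sqrt(delta**2 + 2d).
import numpy as np

rng = np.random.default_rng(1)
d, delta = 1000, 10.0
a = rng.normal(0.0, 1.0, (200, d))          # points from the Gaussian centered at the origin
b = rng.normal(0.0, 1.0, (200, d))
b[:, 0] += delta                            # second Gaussian, shifted by delta along one axis

same = np.linalg.norm(a[:100] - a[100:], axis=1)   # pairs drawn from the same Gaussian
diff = np.linalg.norm(a[:100] - b[:100], axis=1)   # pairs drawn from different Gaussians

print("same-Gaussian distance :", same.mean(), " predicted", np.sqrt(2 * d))
print("cross-Gaussian distance:", diff.mean(), " predicted", np.sqrt(delta**2 + 2 * d))
```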

So let me bring up the last part of my talk: dimension reduction. If you have to deal with high-dimensional data, one of the things you would like to do is project it into a lower-dimensional space. I will take this problem of the two Gaussians and project the points down to a lower-dimensional space.

What I would like to do is find a line through the centers of the two Gaussians. Now, the first question you might ask me is: how do you find the line through the centers of the two Gaussians when you do not yet know which Gaussian generated which points? The answer is that if you take the singular value decomposition of the data and take the first singular vector, it gives you this line. That should tell you students that you had better learn about the singular value decomposition, because it is something that is going to be very important during your lifetime.

Now, if I take all of the points and project them onto this line, what happens? One thing that happens is that I reduce the distances between points. But notice that I do not reduce the distance between the two centers. So this projection helps me significantly in figuring out which Gaussian generated which points. Also notice that the distance of each point away from the line is all noise; it is random, and the projection throws it away. This is the fundamental problem you will face in your careers: you will be given noisy, high-dimensional data, and you want the signal. How do you extract the signal and get rid of the noise? Here, once you project, the within-cluster distances become short, and the mathematics confirms exactly that.

Working out the distances after the projection, the distance between points from different Gaussians is about the square root of δ² plus two times the number of Gaussians, while points from the same Gaussian are about √(2k) apart, where k is the number of Gaussians. So as long as √(δ² + 2k) is larger than √(2k) plus the error of the approximation, you can figure out which Gaussian generated which points. Notice that this condition is now independent of the dimension. So you can deal with ten-thousand-dimensional data, and that ten thousand no longer appears in the equations. That is how the problem gets solved. And by the way, all the things I am telling you really work: when you go home, if you program them up, this is really what happens.
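
A sketch of the whole procedure, with made-up parameters: generate a mixture of two high-dimensional Gaussians, take the top singular vector of the centered data, project onto it, and split the projection.

```python
# Separate a mixture of two high-dimensional Gaussians by projecting onto the
# top singular vector, which lines up with the direction between the centers.
import numpy as np

rng = np.random.default_rng(2)
d, delta, n = 1000, 6.0, 500
centers = np.zeros((2, d))
centers[1, 0] = delta                                  # centers delta apart along one axis
labels = rng.integers(0, 2, n)                         # which Gaussian each point comes from
points = centers[labels] + rng.normal(0.0, 1.0, (n, d))

# Top right-singular vector of the centered data matrix gives the separating direction.
X = points - points.mean(axis=0)
_, _, vt = np.linalg.svd(X, full_matrices=False)
projection = X @ vt[0]                                 # project every point onto that line

# Splitting the 1-D projection at 0 recovers the two Gaussians almost perfectly.
guess = (projection > 0).astype(int)
accuracy = max((guess == labels).mean(), ((1 - guess) == labels).mean())
print("fraction of points assigned to the correct Gaussian:", accuracy)
```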

So, I have talked about queries and the kinds of problems you will have to deal with. One of them, ranking, is very important: ranking restaurants, movies, web pages. It is a multi-billion-dollar industry, and when you investigate the techniques involved, people are trying to understand their limits. Let me very quickly tell you how one of the ranking systems works, the one Google uses, called PageRank. PageRank is calculated by taking a random walk on the web graph: you calculate the stationary probability of the walk, and the probability that the walk is at a given vertex is that page's rank. There are a couple of techniques you have to use to make that work. Notice that if there is a vertex with no outgoing edges, once the walk gets there it can never leave; and if there is a set of vertices with no edges leaving it, the walk will get stuck there. What you have to do is make the walk restart: at every vertex, you make a choice. Either you take a step along an edge of the graph, or you restart at a random vertex. Usually, when you do this, you restart with about 15% probability and take a step in the graph the rest of the time. When you restart…
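
As a minimal sketch of that idea (a tiny made-up link graph, the standard 15% restart), here is the power-iteration version of the walk's stationary distribution.

```python
# PageRank as a random walk with restarts, computed by power iteration.
import numpy as np

# A tiny made-up link graph: links[i] is the list of pages that page i links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
n = len(links)

restart = 0.15                              # probability of restarting at a random page
rank = np.full(n, 1.0 / n)
for _ in range(100):                        # iterate the walk's probability distribution
    new = np.full(n, restart / n)           # mass contributed by restarts
    for page, outgoing in links.items():
        if outgoing:
            share = (1 - restart) * rank[page] / len(outgoing)
            for target in outgoing:
                new[target] += share
        else:                               # dangling page: treat as a restart from anywhere
            new += (1 - restart) * rank[page] / n
    rank = new

print("page ranks:", dict(enumerate(np.round(rank, 3))))
```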

