Advene - Tim Berners-Lee at TED: The next Web of open, linked data

Transcription of the talk

Origin of the Web: frustration

Time flies. It's actually almost twenty years ago when I wanted to reframe the way we use information, the way we work together: I (??) it the World Wide Web. Now, twenty years on, TED, I want to ask your help in a new reframing. So going back to 1989, I wrote a memo suggesting a global hypertext system. Nobody really did anything with it very much. But eighteen months later (this is, you know, how innovation happens), my boss said I could do it on the side, as a sort of play project, (??) on the computer we'd got. And so he gave me the time to code it up. So I basically roughed out what HTML looks like, the hypertext protocol, HTTP, the idea of URLs, these names for things which started with HTTP. I wrote the code and put it out there.
Why did I do it? Well, it was basically frustration. I was working as a software engineer in this huge, very exciting lab, with lots of people coming from all over the world. They brought all sorts of different computers with them, they had all sorts of different data formats, all sorts of kinds of documentation systems. So, in all that diversity, if I wanted to figure out how to build something out of a little bit of this and a bit of that, everything I looked into, I had to connect to some new machine, I had to learn to run some new program. I would find the information I wanted, maybe, in some new data format, and they were all incompatible. It was just very frustrating. The frustration was about all this unlocked potential. In fact, on all these disks, there were documents. So if you just imagined them all being part of some big virtual documentation system in the sky, say on the internet, then life would be so much easier.
Well, once you have an idea like that, it kind of gets under your skin, even if people don't read your memo (actually he did: it was found after he died, his copy, and he'd written "vague but exciting" in pencil in the corner).

A grassroots movement

But in general, it's really difficult to explain what the Web was like. It's difficult to explain to people now that it was difficult then. But then, OK, when TED started, there was no Web, so things like "click" didn't have the same meaning. I could show somebody a piece of hypertext, a page which has got some links, and we click on a link and *bing*, there will be another hypertext page. Not impressive, you know, we've seen that, we've got things on hypertext on CD-ROMs. What was difficult was to get them to imagine. So imagine that that link could have gone to virtually any document you could imagine. All right? That is the leap that was very difficult for people to make. Well, some people did. So yes, it was difficult to explain, but it was a grassroots movement. And that is what has made it most fun. That was the most exciting thing: not the technology, not the things people had done with it, but actually the community, the spirit of all these people getting together, sending e-mails. That's what it was like then.
Do you know what? It's funny, but right now it's kind of like that again. I asked everybody, more or less, to put their documents up; I said, "Could you put your documents on this Web thing?" And you did. Thanks. It's been a blast, hasn't it? I mean, it's been quite interesting, because we found out that the things that happened with the Web really blew us away. They're much more than we'd originally imagined when we put together the little initial website that we started off with.

The importance of data

Now, I want you to put your data on the Web. It turns out that there is still huge unlocked potential. There is still a huge frustration that people have because we haven't got data on the Web as data. What do you mean, "data"? What's the difference, documents, data? Well, documents you read, OK? More or less, you can read them, you can put a link from them, and that's it. Data, you can do all kinds of stuff with a computer.
Who was here for, or, I don't know, has seen, Hans Rosling's talk? Yes, a lot of people have seen it, because it was one of the greatest TED talks. Hans put up this presentation in which he showed, for various different countries in various different colours, income level on one axis and he showed infant mortality, and he showed this thing animated through time. So he'd taken this data and made a presentation which just shattered a lot of myths that people had about the economics of the developing world. He put up a slide a little bit like this. It had, underground, all the data. OK, data is brown and boxy and boring, that's what we think of it, isn't it? Because data you can't naturally use by itself. But in fact data drives a huge amount of what happens in our lives, and it happens because somebody takes that data and does something with it. In this case Hans had put together the data he found from all kinds of United Nations websites and things. He put it together, combined it into something more interesting than the original pieces, and then he put it into this software, which I think his son developed originally, and which produces this wonderful presentation. And Hans made a point of saying it's really important to have a lot of data, and I was happy to see, at the party last night, that he was still saying, very forcibly, that it's really important to have a lot of data.

The principles of Linked Data

So I want us now to think about not just two pieces of data being connected, or six like he did, but I want to think about a world where everybody has put data on the Web, and so virtually anything you could imagine is on the Web, and I'm calling that Linked Data. The technology is Linked Data, and it's extremely simple. If you want to put something on the Web, there are three rules.
First thing is that those HTTP names, those things that start with "http:", we're using them not just for documents; now we're using them for things that the documents are about. We're using them for people, we're using them for places. We're using them for your products. We're using them for events. All kinds of conceptual things, they have names now that start with "http".
Second rule: if I take one of these "http" names and I look it up, I go and do the Web thing with it, I fetch the data using the HTTP protocol from the Web, I will get back some data in a standard format, which is kind of useful data somebody might like to know about that thing, about that event: who's at the event, whatever it is about that person, where they were born, things like that. So the second rule is: I get important information back.
Third rule is that when I get back this information, it's not just got somebody's height and weight and when they were born, it's got relationships. Data is relationships. Interestingly, data is relationships. This person was born in Berlin, Berlin is in Germany. And when it has relationships, whenever it expresses a relationship, then the other thing that it's related to is given one of those names that starts with "http". So I can go ahead and look that thing up. So I look up a person, I can then look up the city where they were born, then I can look up the region it's in, and the town it's in, and the population of it, and so on, so I can browse this stuff. So that's it, really. That is Linked Data.
I wrote an article entitled "Linked Data" a couple of years ago, and soon after that, things started to happen. The idea of Linked Data is that we get lots and lots and lots of these boxes that Hans had, and we get lots and lots and lots of things sprouting. It's not just a whole lot of other plants, it's not just a root supplying a plant. But for each of those plants, whatever it is (a presentation, an analysis, somebody looking for patterns in the data), they get to look at all the data and they get it connected together, and the really important thing about data is that the more things you have to connect together, the more powerful it is.
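Illustration (not from the talk): the three rules described in this section can be sketched in a few lines of Python. This is a minimal sketch under loud assumptions: the "Web" is replaced by an in-memory dictionary, every `example.org` name and every property name (`bornIn`, `locatedIn`, `name`) is invented for the example, and a real client would fetch the data over HTTP in a standard format rather than read a dictionary.

```python
# Illustrative only: a tiny in-memory stand-in for the Web.
# Every example.org name below is invented for this sketch.
VOCAB = "http://example.org/vocab/"

WEB = {
    "http://example.org/person/alice": {
        VOCAB + "name": "Alice",
        VOCAB + "bornIn": "http://example.org/place/berlin",
    },
    "http://example.org/place/berlin": {
        VOCAB + "name": "Berlin",
        VOCAB + "locatedIn": "http://example.org/place/germany",
    },
    "http://example.org/place/germany": {
        VOCAB + "name": "Germany",
    },
}


def is_http_name(value):
    """Rule 1: things (not just documents) get names that start with http."""
    return isinstance(value, str) and value.startswith(("http://", "https://"))


def look_up(name):
    """Rule 2: looking up a name returns useful data about that thing.

    A real client would fetch this over HTTP; here it is a dictionary lookup.
    """
    return WEB.get(name, {})


def follow(start, *properties):
    """Rule 3: relationships point at other http names, so they can be followed."""
    current = start
    for prop in properties:
        if not is_http_name(current):
            return None  # a plain value has nothing further to look up
        current = look_up(current).get(VOCAB + prop)
        if current is None:
            return None
    return current


# Person -> city of birth -> country -> its name
print(follow("http://example.org/person/alice", "bornIn", "locatedIn", "name"))
```

The point of the sketch is the last call: because each relationship's value is itself an "http" name, browsing from a person to a city to a country needs no knowledge beyond the names themselves.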

It's working: DBpedia

So, Linked Data: the meme went out there. And pretty soon Chris Bizer at the Freie Universität in Berlin was one of the first people to put interesting things up. He noticed that Wikipedia (you know Wikipedia, the online encyclopedia with lots and lots of interesting documents in it), well, in those documents there are little squares, little boxes, and in those information boxes there's data. So he wrote a program to take the data, extract it from Wikipedia and put it into a blob of linked data on the Web, which he called DBpedia. DBpedia is represented by the blue blob in the middle of this slide. And if you actually go and look up Berlin, you'll find that there are other blobs of data which also have stuff about Berlin, and they are linked together. So if you pull the data from DBpedia about Berlin, you'll end up pulling up these other things as well. And the exciting thing is: it's starting to grow. This is just grassroots stuff again, OK?
Now let's think about data (??). Data comes, in fact, in lots and lots of different forms. Think of the diversity of the Web. It's a really important thing that the Web allows you to put all kinds of data up there. So it is with data. I can talk about all kinds of data. We can talk about government data; enterprise data is really important. There's scientific data, there's personal data. There's weather data, there's data about events. There's data about talks, and there's news, and there's all kinds of stuff. I'm just going to mention a few of them, so that you get the idea of the diversity of it, and so that you also see how much unlocked potential there is.
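Illustration (not from the talk): the idea of taking the data out of an information box and republishing it as linked data can be sketched as below. This is not the actual DBpedia extraction program. It assumes a hypothetical, already-parsed infobox given as a Python dictionary, treats values written in wiki link syntax (`[[...]]`) as references to other articles, and uses invented `example.org` base names for resources and properties.

```python
import re

# Matches a value that is exactly one wiki link, e.g. "[[Germany]]" or
# "[[Federal Republic of Germany|Germany]]"; group 1 is the link target.
WIKI_LINK = re.compile(r"^\[\[([^\]|]+)(?:\|[^\]]*)?\]\]$")

# Invented base names for this sketch.
RESOURCE_BASE = "http://example.org/resource/"
PROPERTY_BASE = "http://example.org/property/"


def to_name(title):
    """Turn an article title into an http name for the thing it describes."""
    return RESOURCE_BASE + title.strip().replace(" ", "_")


def infobox_to_triples(title, infobox):
    """Convert infobox fields into (subject, property, value) triples.

    Values that are wiki links become http names, so they can be looked up
    in turn; everything else is kept as plain text.
    """
    subject = to_name(title)
    triples = []
    for key, raw in infobox.items():
        match = WIKI_LINK.match(raw.strip())
        value = to_name(match.group(1)) if match else raw.strip()
        triples.append((subject, PROPERTY_BASE + key, value))
    return triples


# Made-up sample infobox, for illustration only.
sample = {"country": "[[Germany]]", "motto": "some text"}
for triple in infobox_to_triples("Berlin", sample):
    print(triple)
```

The design choice worth noticing is that a linked value such as `[[Germany]]` becomes another "http" name rather than a string, which is what lets one blob of data connect to the others.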

Government data

Let's start with government data. Barack Obama said in a speech that American government data would be available on the internet in accessible formats. And I hope that they will put it out as linked data. That's important. Why is it important? Not just for transparency. Yes, transparency in government is important. But that data, this is the data from all the government departments. Think about how much of that data is about how life is lived in America. It's actually useful, it's got value. I can use it in my company. I could use it as a kid to do my homework. So we're talking about making the place, making the world run better by making this data available.

Demand raw data now

In fact, if you're responsible, if you know about some data in a government department, often you find that these people are very tempted to keep it, to (??) in database hugging. You hug your database; you don't want to let it go until you've made a beautiful website for it. Well, I'd like to suggest that, yes, make a beautiful website (who am I to say "don't make a beautiful website"?), but first, give us the unadulterated data. We want the data. We want unadulterated data. OK. We have to ask for raw data now, and I'm going to ask you to practice that, OK? Can you say "raw"? Can you say "data"? Can you say "now"? Right: "raw data now". Practice that. It's important, because you have no idea the number of excuses people come up with to hang on to their data and not give it to you, even though you've paid for it as a taxpayer. And it's not just America, it's all over the world. And it's not just governments, of course; it's enterprises as well.

Scientific data

So I'm just going to mention a few other sources of data. Well, here we are at TED, and all the time we are very conscious of the huge challenges that human society has right now. Curing cancer. Understanding the brain for Alzheimer's. Understanding economics, making it a little more stable. Understanding how the world works. The people who are going to solve those are scientists. They have half-formed ideas in their heads. They try to communicate those over the Web, but a lot of the state of knowledge of the human race at the moment is in databases, often sitting in their computers, and actually commonly not shared.
In fact, I'll just go into one area: if you're looking at Alzheimer's, for example, drug discovery, there is a whole lot of linked data which is just coming out, because scientists in that field realize this is a great way of getting out of those silos. Because they had their genomic data in one database, in one building, and they had their protein data in another. Now they are sticking it onto Linked Data. And now they can ask a question, a question that you probably wouldn't ask, I wouldn't ask, but they would: "What proteins are involved in signal transduction and are also related to pyramidal neurons?" Well, you take that (??) and you put it into Google, and of course there is no page on the Web which would answer that question, because nobody has asked that question before. You get 223,000 hits: no result you can use. You ask the Linked Data which they've now put together: 32 hits, each of which is a protein which has these properties, and which you can look at.
The power of being able to ask those questions as a scientist, those questions which actually bridge across different disciplines, is really a complete sea change. It's very, very important. Scientists have totally (??) at the moment there(?). The power of the data that other scientists have collected is locked up, and we need to get it unlocked so we can tackle those huge problems.
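Illustration (not from the talk): the kind of question that bridges two formerly separate databases can be sketched as a join over shared names. Everything here is assumed for the example: the two datasets, the protein identifiers `P1` to `P3`, and the property names are invented, and a real Linked Data query would normally be written in a query language against published data rather than over Python lists.

```python
# Invented names for this sketch; none of this is real scientific data.
BASE = "http://example.org/"
INVOLVED_IN = BASE + "vocab/involvedIn"
RELATED_TO = BASE + "vocab/relatedTo"

# Dataset that used to live in one database: protein -> biological process.
PROCESS_DATA = [
    (BASE + "protein/P1", INVOLVED_IN, BASE + "process/signal_transduction"),
    (BASE + "protein/P2", INVOLVED_IN, BASE + "process/signal_transduction"),
    (BASE + "protein/P3", INVOLVED_IN, BASE + "process/dna_repair"),
]

# Dataset that used to live in another building: protein -> cell type.
CELL_DATA = [
    (BASE + "protein/P1", RELATED_TO, BASE + "cell/glial_cell"),
    (BASE + "protein/P2", RELATED_TO, BASE + "cell/pyramidal_neuron"),
    (BASE + "protein/P3", RELATED_TO, BASE + "cell/pyramidal_neuron"),
]


def subjects(triples, prop, value):
    """Return the set of subjects that have the given property and value."""
    return {s for (s, p, o) in triples if p == prop and o == value}


def proteins_matching(process, cell_type):
    """Proteins involved in `process` AND related to `cell_type`.

    The join works only because both datasets use the same http name
    for the same protein.
    """
    in_process = subjects(PROCESS_DATA, INVOLVED_IN, process)
    with_cell = subjects(CELL_DATA, RELATED_TO, cell_type)
    return sorted(in_process & with_cell)


print(proteins_matching(BASE + "process/signal_transduction",
                        BASE + "cell/pyramidal_neuron"))
```

Neither dataset alone can answer the question; the answer appears only once the two are connected through common names, which is the sense in which the data is unlocked.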

Personal data

Now, if I go on like this you'll think that all the data comes from huge institutions, and it has nothing to do with you. But that's not true. In fact, data is about our lives. You log on to your social networking site, you pick your favourite one, you say "this is my friend": *bing*, relationship, data. You say "this photograph, oh, it depicts this person": *bing*, that's data. Data, data, data. Every time you do things in a social networking site, the social networking site is taking data and using it, repurposing it, and using it to make other people's lives more interesting on the site. But when you go to another Linked Data site, let's say this one is about travel, and you say "I want to send this photo to all the people in that group", you can't get over the walls. The Economist wrote an article about it, lots of people blogged about it: tremendous frustration. The way to break down the silos is to get interoperability between social networking sites. We need to do that with Linked Data.

OpenStreetMap

One last type of data I will talk about, maybe it's the most exciting. Before I came down here I looked it up on OpenStreetMap. OpenStreetMap is a map, but it's also a wiki. Zoom in, and that square thing is the theatre which we're in right now, the Terrace Theatre. It didn't have a name on it. So I could go into Edit mode, I could select the theatre, I could add the name down at the bottom, and then I could save it back. And now if you go back to openstreetmap.org and you find this place, you will find that the Terrace Theatre has got a name. I did that, me. I did that on the map. I just did that, I put that up on there, and you know what? OpenStreetMap is all about everybody doing their bit, and this creates an incredible resource, because everybody else does theirs.

What it's all about

And that is what Linked Data is all about. It's about people doing their bit to produce a little bit, and it all connecting. That's how Linked Data works. You do your bit, everybody else does theirs. You may not have lots of data yourself to put on there, but you know to demand it, and we've practiced that. So, Linked Data, it's huge. I've only told you about a very small number of things. There are data in every aspect of our lives, every aspect of work and pleasure, OK? And it's not just about the number of places where data comes from. It's about connecting it together, and when you connect data together, you get power in a way that doesn't happen just with the Web, with documents. You get this really huge power out of it.
So we're at a stage now where we have to do this, the people who think it's a great idea. And all the people, and I think there are a lot of people at TED, who do things even though there's not an immediate return on investment, because it will only really pay off when everybody else has done it: they'll do it, because they're the sort of person who just does things which would be good if everybody else did them. OK? So it's called Linked Data. I want you to make it. I want you to demand it. And I think it's an idea worth spreading. Thanks.
