Archive for April 2012
Machine learning for identification of cars
Posted by dakoller in data science on 23 April 2012
This is a handy getting-started guide to computer vision with R, e.g. on images from surveillance cameras. As always with R-bloggers, it contains the needed source code.
…just answered: Where are the Semantic web incubators? Any thoughts on building an economic ecosystem for Semantic web to keep momentum enough to attract si
Posted by dakoller in Uncategorized on 23 April 2012
My basic message is: as long as your startup wants to use semantic tools/infrastructures (vs. providing tools and infrastructures), you likely do not need a specific incubator, as the semantic web only influences the tech part of your startup.
Semantic Web: Where are the Semantic web incubators? Any thoughts on building an economic ecosystem for Semantic web to keep momentum enough to attract sizable investment? 1 answer on Quora
How to work with Google n-gram data sets in R using MySQL
Posted by dakoller in data science, google on 12 April 2012
How to work with Google n-gram data sets in R using MySQL, via R-Bloggers: what I really like about this blog is its focus on interesting things, along with code examples to try out on your own.
N-gram datasets can also be created from your own texts using NLTK functions (see http://nltk.googlecode.com/svn/trunk/doc/howto/collocations.html ): in analytical use cases, n-grams give you a better basis for having a machine ‘understand’ the meaning of a text (compared to looking at the words individually).
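As a rough illustration of what an n-gram dataset built from your own text looks like, here is a minimal sketch in plain Python. It does not use NLTK or the Google data; the helper name `ngrams` and the sample sentence are made up for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams (as tuples) over a sequence of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Made-up sample text; a real corpus would need proper tokenization.
text = "new york is big and new york is loud"
tokens = text.split()

# Count how often each pair of neighbouring words occurs.
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(2))
```

The frequent pair ("new", "york") carries a meaning that the single words "new" and "york" do not, which is the advantage over looking at words individually.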
Update from 2012-04-10:
Stefan Keller ( http://twitter.com/sfkeller ) pointed me to a blog entry about how to use n-grams in a PostgreSQL-based setting to optimize search functionality.
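The linked entry is not reproduced here. One common way n-grams help search is character-trigram similarity, which PostgreSQL offers through its pg_trgm extension. The following sketch only imitates that idea in plain Python; the function names are made up, and the word splitting is simpler than what the extension does.

```python
def trigrams(text):
    """Character trigrams per word, padded with two leading and one trailing
    blank (loosely modeled on how pg_trgm pads words)."""
    grams = set()
    for word in text.lower().split():
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def search(query, candidates, threshold=0.3):
    """Return candidates at or above the threshold, best match first."""
    scored = [(similarity(query, c), c) for c in candidates]
    return [c for score, c in sorted(scored, reverse=True) if score >= threshold]


print(search("postgre", ["postgres", "mysql"]))
```

Because the comparison works on fragments of words, a misspelled or truncated query can still find the intended entry.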
knitR: report generation with R
Posted by dakoller in data science on 12 April 2012
In data analysis projects it is often important to be able to document results promptly and in an appealing form. A good way to do this is to track the progress of the experiments and their results in a dynamically generated document: knitR can help with that.
knitR irons out some weaknesses of the Sweave solution ( http://www.statistik.lmu.de/~leisch/Sweave/ ): in particular, good-looking graphics can be embedded more easily.
Further information on the tool can be found at http://www.inside-r.org/howto/knitr-elegant-flexible-and-fast-dynamic-report-generation-r , and a sample output can be seen at http://cloud.github.com/downloads/yihui/knitr/Stat615-Report1-Yihui-Xie.pdf .
Will web 3.0 be all about the use of a single taxonomy?
Posted by dakoller in Semantic Web on 11 April 2012
The Wikidata project ( http://meta.wikimedia.org/wiki/W… ) to some degree follows the path of the DBpedia project ( dbpedia.org ) in connecting and collecting the facts described in Wikipedia. The extent to which it will follow the semantic web and ontology standards is still open (the current status can be seen at http://meta.wikimedia.org/wiki/W… ).
I support Jorn Barger's statement above: you can't build a universal ontology. I would like to add that, for most of the use cases I have seen until now, THE one correct and complete ontology is not needed.
(and yes pictures may help people to identify things)
To my understanding it is much more important to be able to work on your own ontology subset and to link it to more general, documented knowledge (e.g. linking your own terms to DBpedia concepts using the SKOS vocabulary, see http://www.w3.org/TR/skos-primer/ ).
(My experience from this kind of standardization project is that you might be able to manage the technical side of it, but the organizational and managerial aspects get very complicated once you target a single taxonomy.)
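Linking a term from your own vocabulary to a DBpedia concept with SKOS could look roughly like the sketch below. It just renders Turtle text with plain Python. The base URI `http://example.org/terms/`, the term `passenger-car` and the choice of `skos:closeMatch` are assumptions made for this illustration.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def skos_links(base_uri, mappings):
    """Render Turtle that links local concepts to external ones via SKOS.

    mappings: local term name -> (SKOS mapping property, target URI)
    """
    lines = [f"@prefix skos: <{SKOS}> .", ""]
    for local_name, (relation, target) in sorted(mappings.items()):
        lines.append(f"<{base_uri}{local_name}> a skos:Concept ;")
        lines.append(f"    skos:{relation} <{target}> .")
    return "\n".join(lines)


# Hypothetical local term mapped to a DBpedia resource.
mappings = {
    "passenger-car": ("closeMatch", "http://dbpedia.org/resource/Car"),
}
turtle = skos_links("http://example.org/terms/", mappings)
print(turtle)
```

Your own subset stays under your control, while the mapping property states how closely each term corresponds to the more general concept.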
Silicon Valley, London, NYC: Startup Genome Data Reveals How The World's Top Tech Hubs Stack Up
Posted by dakoller in business ideas on 11 April 2012
Last year, we covered an ambitious collaborative R&D project called "Startup Genome," created by three young entrepreneurs, Bjoern Herrmann, Max Marmer, and Ertan Dogrultan. The goal of the ongoing project was (and is) to take a comprehensive, data-driven dive into what makes tech startups successful -- and not so successful.
Out of its research came, among other things, Startup Compass…
A response to “The race for speed at the data layer” re. SAP HANA
Posted by dakoller in data science, sap on 7 April 2012
I just wanted to post some remarks on the very interesting blog post by David Smith, as I was able to take a deeper look at the HANA appliance:
- I agree with the statement that tool providers today focus on ‘high-performance analytics’. But the most important steps in the SAP/ERP world are still to be done: too much analytics domain information is still deeply buried in application code, and the BI tools of the past were seen merely as inspection tools for this information.
- SAP is about to place more application logic on the database layer, which in the long run enables more of David’s “more than just basic analytics”: the use of (optimizable) prediction models could then become possible.
- (I especially remember a very interesting use case for “more than just basic analytics” from an SAP discussion: appliances like HANA with specific application functionality enable a production company/facility to evaluate the ‘best’ scenario for fulfilling orders, taking into account the bills of material and facts such as the availability of parts when supply is limited.)
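To make the order fulfillment use case from the last point concrete, here is a toy sketch in plain Python. It has nothing to do with HANA's actual functionality: the products, stock levels and order values are invented, and it simply tries every combination of orders, which only works for a handful of them.

```python
from itertools import combinations


def parts_needed(orders, bom):
    """Sum up the parts required for a set of (product, quantity) orders."""
    total = {}
    for product, qty in orders:
        for part, per_unit in bom[product].items():
            total[part] = total.get(part, 0) + per_unit * qty
    return total


def best_scenario(orders, bom, stock, value):
    """Pick the set of orders with the highest value that the stock can cover."""
    best, best_value = (), 0
    for r in range(1, len(orders) + 1):
        for subset in combinations(orders, r):
            needed = parts_needed(subset, bom)
            if all(stock.get(part, 0) >= n for part, n in needed.items()):
                v = sum(value[product] * qty for product, qty in subset)
                if v > best_value:
                    best, best_value = subset, v
    return list(best), best_value


# Invented example data: bills of material, available parts, value per unit.
bom = {"bike": {"frame": 1, "wheel": 2}, "trike": {"frame": 1, "wheel": 3}}
stock = {"frame": 3, "wheel": 7}
value = {"bike": 100, "trike": 130}
orders = [("bike", 2), ("trike", 1), ("trike", 2)]

chosen, total_value = best_scenario(orders, bom, stock, value)
print(chosen, total_value)
```

With many orders and parts the number of combinations explodes, which is where running such an evaluation close to the data becomes interesting.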
SAP in fact announced formal R integration:
- The so-called “HANA pocketbook” ( at https://www.experiencesaphana.com/servlet/JiveServlet/previewBody/1436-102-1-1946/SAP%20HANA%20Pocketbook-DRAFT.pdf ) describes the high-level picture of R integration (starting on p. 59).
- Alvaro Tejada ( @blag ) posted a number of blog posts on R integration with HANA: http://scn.sap.com/people/alvaro.tejadagalindo3 . I consider him to be the R mastermind inside SAP.
The complete R integration was not present in the previews of HANA I have seen. The key to the success of R in the SAP world is how tightly R is constrained there, e.g. whether all the nice machine learning/Hadoop enablers for R can be used (small-scale R language support alone would not be sufficient for these use cases).
Business idea: Put touristic activities in personal travel planning
Posted by dakoller in business ideas, data science, nlp, Semantic Web on 2 April 2012
With this blog entry I’ll start a series of business ideas that come up around me, which I cannot pursue at the moment but which may be interesting for others.
Tagline: Offer people who are planning a trip leisure activities that fit their interests and their personal schedule.
Technology: Mashup of APIs from travel planning tools (like tripit.com or dopplr.com ) with crawled/stored information about events, tourist activities etc., matched via user profiling, e.g. from Facebook Likes.
Business models: mainly affiliate model (bringing guests to organizers of events/tour organizers)
Martin Hepp ( @mfhepp ), the author of the GoodRelations vocabulary for eBusiness, just posted a cookbook entry to show how business entities offering travel activities (outdoor, concerts etc.) can publish this information in a machine-readable way.
I think this is why he defined the Ticket Ontology to describe events, activities and their business impact.
But for the time being (as long as few travel organizers make their activities machine-readable), a crucial technical part is collecting travel activities and establishing and keeping connections with the business entities offering them.
This idea, too, can make use of BigData analysis techniques: you can initially optimize and later predict which kind of activity is attractive to which group of users (a use case of customer segmentation).