Alex Gorelik

San Francisco, California, United States
2K followers 500+ connections

Experience

  • Meta

    Menlo Park, California, United States

  • -

    San Francisco Bay Area

  • -

    Mountain View, CA

  • -

  • -

    Mountain View, CA

  • -

    Menlo Park CA

  • -

    Mountain View, CA

  • -

    Rancho Bernardo, CA

  • -

    Redwood City, CA

  • -

    Redwood City, CA

  • -

    Silicon Valley Labs, San Jose, CA

  • -

    Santa Clara, CA

  • -

  • -

    Palo Alto, CA

  • -

  • -

    Sunnyvale, CA

  • -

Education

Publications

  • The Enterprise Big Data Lake: Delivering the Promise of Big Data and Data Science

    O'Reilly

    The data lake is a daring new approach for harnessing the power of big data technology and providing convenient self-service capabilities. But is it right for your company? This book is based on discussions with practitioners and executives from more than a hundred organizations, ranging from data-driven companies such as Google, LinkedIn, and Facebook, to governments and traditional corporate enterprises. You’ll learn what a data lake is, why enterprises need one, and how to build one successfully with the best practices in this book.

    Alex Gorelik, CTO and founder of Waterline Data, explains why old systems and processes can no longer support data needs in the enterprise. Then, in a collection of essays about data lake implementation, you’ll examine data lake initiatives, analytic projects, experiences, and best practices from data experts working in various industries.

    • Get a succinct introduction to data warehousing, big data, and data science
    • Learn various paths enterprises take to build a data lake
    • Explore how to build a self-service model and best practices for providing analysts access to the data
    • Use different methods for architecting your data lake
    • Discover ways to implement a data lake from experts in different industries

    You'll explore various approaches to starting and growing a Data Lake, including Data Warehouse off-loading, analytical sandboxes, and "Data Puddles." Author Alex Gorelik shows you methods for setting up different tiers of data, from raw untreated landing areas to carefully managed and summarized data. You'll learn how to enable self-service to help users find, understand, and provision data; how to provide different interfaces to users with different skill levels; and how to do all of that in compliance with enterprise data governance policies.

Patents

  • Generating composite key relationships between database objects based on sampling

    Issued US 9336246

    According to one embodiment of the present invention, a system determines key relationships between database tables and includes a computer system including at least one processor. The system determines a sampling range for one or more matching columns between first and second database tables. The matching columns satisfy one or more matching criteria and the sampling range is based on quantities of distinct values within the matching columns. Data is sampled from the first and second database tables in accordance with the sampling ranges to determine a sample set. Keys between the first and second database tables are determined based on matching between columns within the sample set. Embodiments of the present invention further include a method and computer program product for determining key relationships between database tables in substantially the same manner described above.
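
    As a rough illustration of the idea in this abstract (not the patented method itself), the sketch below samples two small tables on a shared set of distinct values and then looks for the smallest combination of column pairs that is unique in the first table and covers every sampled row of the second. The table contents, the column names, the `every` stride standing in for the sampling range, and the helper names `sample_values` and `find_keys` are all assumptions made for this example.

```python
from itertools import combinations


def sample_values(rows, column, every=2):
    """Pick every `every`-th distinct value of `column` (hypothetical sampling rule)."""
    distinct = sorted({row[column] for row in rows})
    return set(distinct[::every])


def find_keys(parent, child, pairs):
    """Return the smallest combinations of (parent_col, child_col) pairs that are
    unique in `parent` and contain every corresponding tuple found in `child`."""
    for size in range(1, len(pairs) + 1):
        found = []
        for combo in combinations(pairs, size):
            p_cols = [p for p, _ in combo]
            c_cols = [c for _, c in combo]
            parent_keys = [tuple(row[c] for c in p_cols) for row in parent]
            key_set = set(parent_keys)
            unique = len(key_set) == len(parent_keys)
            covered = all(tuple(row[c] for c in c_cols) in key_set for row in child)
            if unique and covered:
                found.append(combo)
        if found:
            return found
    return []


# Illustrative data only: order lines (first table) and shipments (second table).
parent = [
    {"order_id": 1, "line_no": 1, "sku": "A"},
    {"order_id": 1, "line_no": 2, "sku": "B"},
    {"order_id": 2, "line_no": 1, "sku": "A"},
    {"order_id": 3, "line_no": 1, "sku": "C"},
    {"order_id": 3, "line_no": 2, "sku": "A"},
]
child = [
    {"oid": 1, "ln": 2, "qty": 5},
    {"oid": 2, "ln": 1, "qty": 1},
    {"oid": 3, "ln": 2, "qty": 7},
]

# Column pairs assumed to have already passed some matching criterion.
pairs = [("order_id", "oid"), ("line_no", "ln")]

# Sample both tables on the same set of order ids, then search the sample.
chosen = sample_values(parent, "order_id", every=2)
parent_sample = [row for row in parent if row["order_id"] in chosen]
child_sample = [row for row in child if row["oid"] in chosen]
keys = find_keys(parent_sample, child_sample, pairs)
```

    In this toy data neither column pair is unique on its own within the sample, so the search only succeeds once both pairs are combined into a composite key.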

  • Semantic discovery and mapping between data sources

    Issued US 9336253

    An apparatus and method are described for the discovery of semantics, relationships and mappings between data in different software applications, databases, files, reports, messages, or systems. In one aspect, semantics and relationships and mappings are identified between a first and a second data source. A binding condition is discovered between portions of data in the first and the second data source. The binding condition is used to discover correlations between portions of data in the first and the second data source. The binding condition and the correlations are used to discover a transformation function between portions of data in the first and the second data source.

  • Creation of change-based data integration jobs

    Issued US 9305067

    A computer software implemented method for transforming a first extract transform load (ETL) job having at least some unload transform load (UTL) portions. The method includes the following steps: (i) decomposing the first ETL job into an intermediate set of one or more jobs; and (ii) for each job of the intermediate set, transforming the job into a transactionally equivalent job to yield a final set of one or more jobs. The decomposing is performed so that each job of the intermediate jobs set is a Simple UTL job. The transforming is performed so that each job of the final set includes no UTL portions.

  • Automatic consistent sampling for data analysis

    Issued US 9239853

    A method, computer program product, and system for analyzing data within one or more databases, comprising selecting one or more databases for analysis, each database comprising one or more database objects comprising one or more data values, applying a function to each data value in each database object within the one or more databases, where the function produces function values limited to a predetermined range, identifying for analysis the data values producing a certain function value within the predetermined range to form a sampled data set, and analyzing the sampled data set to determine relationships between the database objects within and across the one or more databases.
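
    The abstract describes applying a function with a bounded output range to every data value and keeping the values that produce a chosen output. A minimal sketch of that idea follows, assuming a SHA-256 hash reduced modulo a bucket count as the function. The names `bucket` and `consistent_sample`, the bucket count, and the sample data are hypothetical and not taken from the patent.

```python
import hashlib


def bucket(value, buckets=100):
    """Map a value to a stable bucket in [0, buckets) using a hash (assumed function)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def consistent_sample(values, keep=range(10), buckets=100):
    """Keep only the values whose bucket is in `keep` (roughly 10% by default)."""
    return [v for v in values if bucket(v, buckets) in keep]


# Illustrative data only: customer ids as they appear in two different tables.
order_customer_ids = list(range(0, 1000))
customer_ids = list(range(500, 1500))

sampled_orders = consistent_sample(order_customer_ids)
sampled_customers = consistent_sample(customer_ids)

# Values present in both samples; relationships between the tables can be
# analysed on this overlap without reading the full columns.
shared = set(sampled_orders) & set(sampled_customers)
```

    Because the keep-or-drop decision depends only on the value itself, a value sampled from one column is sampled from every column that contains it, so matches between columns survive the sampling.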

  • Discovering pivot type relationships between database objects

    Issued US 8930303

    According to a present invention embodiment, a system determines a relationship between source and target database tables, and includes a computer system including at least one processor. Potential pivot keys of the target database table are determined, and maps are created for each potential pivot key between the database tables based on distinct values. Transformations for each map are generated that enable target data to be produced from source data. The transformations for each potential pivot key are analyzed and the potential pivot key with the transformations that generate the greatest amount of matching data is selected as the resulting pivot key. The database table columns corresponding to the resulting pivot key are determined to be associated by the relationship. Embodiments of the present invention further include a method and computer program product for determining a relationship between source and target database tables in substantially the same manner described above.

  • Searching and displaying data objects residing in data management systems

    Issued US 8898194

    A data source is accessed to provide information. The data source is accessed by defining a plurality of data objects each associated with data within the data source, where the data objects include search information that facilitates searching of the data objects, and the data objects further include display information that pertains to a format in which data obtained from a search of the data objects is displayed, defining one or more relationships linking at least one data object with at least one other data object so as to establish associated data objects with linking relationships, receiving a query to search for data within the data source, retrieving data within the data source satisfying the query in accordance with the search information, where the retrieved data comprises data from at least two associated data objects, organizing and displaying the retrieved data in accordance with the display information, and displaying one or more links associated with the retrieved data so as to enable navigation between associated data objects.

  • Automatic consistent sampling for data analysis

    Issued US 8892525

    A method, computer program product, and system for analyzing data within one or more databases, comprising selecting one or more databases for analysis, each database comprising one or more database objects comprising one or more data values, applying a function to each data value in each database object within the one or more databases, where the function produces function values limited to a predetermined range, identifying for analysis the data values producing a certain function value within the predetermined range to form a sampled data set, and analyzing the sampled data set to determine relationships between the database objects within and across the one or more databases.

  • Semantic discovery and mapping between data sources

    Issued US 8874613

    An apparatus and method are described for the discovery of semantics, relationships and mappings between data in different software applications, databases, files, reports, messages, or systems. In one aspect, semantics and relationships and mappings are identified between a first and a second data source. A binding condition is discovered between portions of data in the first and the second data source. The binding condition is used to discover correlations between portions of data in the first and the second data source. The binding condition and the correlations are used to discover a transformation function between portions of data in the first and the second data source.

  • Automatic consistent sampling for data analysis

    Issued US 8856085

    A method, computer program product, and system for analyzing data within one or more databases, comprising selecting one or more databases for analysis, each database comprising one or more database objects comprising one or more data values, applying a function to each data value in each database object within the one or more databases, where the function produces function values limited to a predetermined range, identifying for analysis the data values producing a certain function value within the predetermined range to form a sampled data set, and analyzing the sampled data set to determine relationships between the database objects within and across the one or more databases.

  • Semantic discovery and mapping between data sources

    Issued US 8442999

    An apparatus and method are described for the discovery of semantics, relationships and mappings between data in different software applications, databases, files, reports, messages, or systems. In one aspect, semantics and relationships and mappings are identified between a first and a second data source. A binding condition is discovered between portions of data in the first and the second data source. The binding condition is used to discover correlations between portions of data in the first and the second data source. The binding condition and the correlations are used to discover a transformation function between portions of data in the first and the second data source.

  • Searching and displaying data objects residing in data management systems

    Issued US 8380750

    A data source is accessed to provide information. The data source is accessed by defining a plurality of data objects each associated with data within the data source, where the data objects include search information that facilitates searching of the data objects, and the data objects further include display information that pertains to a format in which data obtained from a search of the data objects is displayed, defining one or more relationships linking at least one data object with at least one other data object so as to establish associated data objects with linking relationships, receiving a query to search for data within the data source, retrieving data within the data source satisfying the query in accordance with the search information, where the retrieved data comprises data from at least two associated data objects, organizing and displaying the retrieved data in accordance with the display information, and displaying one or more links associated with the retrieved data so as to enable navigation between associated data objects.

  • Specification to ABAP code converter

    Issued US 8250529

  • Method and system for facilitating data retrieval from a plurality of data sources

    Issued US 7680828

    A method and a system for facilitating data retrieval from a plurality of data sources are provided. A plurality of ‘Local Data Objects’ (LDOs) corresponding to the plurality of data sources are generated. Further, the plurality of LDOs are mapped onto a ‘Global Data Object’ (GDO). The GDO consolidates the plurality of LDOs into a single integrated model. The mapping of the LDOs onto the GDO includes a plurality of ‘binding conditions’ that relate LDO attributes to GDO attributes. The mapping also includes a plurality of ‘transformation functions’ for transforming the LDO attributes to the GDO attributes.
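
    The relationship between Local Data Objects and a Global Data Object can be pictured with a small sketch. It is only an illustration under stated assumptions: the class names `AttributeMapping` and `LocalDataObject`, the helper `query_gdo`, and the sample sources are hypothetical, and each mapping here stands in for both a binding condition (which LDO attribute feeds which GDO attribute) and its transformation function.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AttributeMapping:
    """Relates one LDO attribute to one GDO attribute, with an optional transform."""
    ldo_attr: str
    gdo_attr: str
    transform: Callable = lambda value: value


@dataclass
class LocalDataObject:
    """One data source plus its mappings onto the global model."""
    name: str
    rows: List[dict]
    mappings: List[AttributeMapping]

    def to_global(self):
        # Rewrite every local row in terms of the global attributes.
        return [
            {m.gdo_attr: m.transform(row[m.ldo_attr]) for m in self.mappings}
            for row in self.rows
        ]


def query_gdo(ldos, predicate):
    """Answer a query phrased against the global model by consulting every LDO."""
    results = []
    for ldo in ldos:
        results.extend(row for row in ldo.to_global() if predicate(row))
    return results


# Illustrative sources only.
crm = LocalDataObject(
    name="crm",
    rows=[{"cust_name": "ada lovelace"}],
    mappings=[AttributeMapping("cust_name", "name", str.title)],
)
billing = LocalDataObject(
    name="billing",
    rows=[{"NAME": "Alan Turing", "amount_cents": 1250}],
    mappings=[
        AttributeMapping("NAME", "name"),
        AttributeMapping("amount_cents", "amount", lambda cents: cents / 100),
    ],
)

matches = query_gdo([crm, billing], lambda row: row["name"].startswith("A"))
names = [row["name"] for row in matches]
```

    The caller queries a single integrated model and never needs to know the attribute names or formats used by the individual sources.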

  • Semantic discovery and mapping between data sources

    Issued US 8082243

    In one aspect, semantics and relationships and mappings are identified between a first and a second data source. Data between the first and second data source is compared. A binding condition is discovered between portions of data in the first and the second data source based upon the comparison, wherein the binding condition identifies data within the first and second data sources that map to each other. The binding condition is used to discover correlations between portions of data in the first and the second data source, wherein the correlations identify data in the first data source that correspond to values in the second data source. The binding condition and the correlations are used to discover a transformation function between portions of data in the first and the second data source, wherein the transformation function generates data in the second data source from data in the first data source.

  • Method and apparatus for semantic discovery and mapping between data sources

    Issued US 7426520

    An apparatus and method are described for the discovery of semantics, relationships and mappings between data in different software applications, databases, files, reports, messages, or systems. In one aspect, semantics and relationships and mappings are identified between a first and a second data source. A binding condition is discovered between portions of data in the first and the second data source. The binding condition is used to discover correlations between portions of data in the first and the second data source. The binding condition and the correlations are used to discover a transformation function between portions of data in the first and the second data source.

  • Specification to ABAP code converter

    Issued US 7320122

    A method of generating procedural language code for extracting data from a data warehouse comprising the steps of accepting a declarative specification and generating procedural language code to execute the declarative specification.


Organizations

  • Tau Beta Pi

    -

    - Present

    Engineering honor society

  • VPE/CTO Community of Practice

    -

    -

    http://www.communityofpractice.net/
