Insight Mind Sdn Bhd

Information Technology and Services

Bandar Puchong Jaya, Selangor · 1,203 followers

Empowering Actionable Insights

About us

Our mission: to improve the quality of life of the communities we serve by leveraging the power of technology and data to maximize productivity.

Our core values:

  • Innovation: we demonstrate our strength in the technology we create for our clients, and we value innovative efforts, ideas, and methods that continually improve our business.
  • Self-responsibility: we are responsible and accountable for our actions.
  • Integrity: we exhibit honesty, openness, and fairness at all times when dealing with our team, partners, and clients.
  • Growth: we celebrate failure as an opportunity for growth; we admit mistakes and learn from them.
  • Heart: we seek to have a positive impact by always putting our partners, clients, and team first.
  • Trustworthiness: we believe that being trustworthy and showing integrity in our daily lives is the key to long-standing client relationships and loyalty within our team.

Website
https://www.insightmind.com.my
Industry
Information Technology and Services
Company size
2-10 employees
Headquarters
Bandar Puchong Jaya, Selangor
Type
Privately held
Founded
2017
Specialties
Information and Communication Technology, Internet of Things, Data Science, Innovation, Design Thinking, Project Management, ICT Strategic Planning, ICT Training, Leadership, Agriculture Insights, HR Insights, Data Driven, Business Analytics

Locations

  • Primary

    SS-02-20, Skypod Square,

    Persiaran Puchong Jaya Selatan

    Bandar Puchong Jaya, Selangor 47100, MY


Employees at Insight Mind Sdn Bhd

Updates

  • Senior vs Junior Data Engineer #dataengineering #modernstack

    View the profile of Chad Sanderson

    CEO @ Gable.ai (Shift Left Data Platform)

    The first time I was hired to work in data, it felt so exciting: I was going to be creating business value, working on AI/ML pipelines, and implementing cutting-edge technology. Then reality quickly set in. The data infrastructure had been completely neglected for years. No one had established clear data domains. There was so much data debt that I legitimately could not understand what the most important tables in the company were supposed to be doing, or why they were even important.

    OK, not great... but then things started breaking. When the alarms began going off, people came to yell at ME. They got annoyed when I asked them to file a ticket, and even more annoyed when I didn't get to their ticket in time. I would try to explain that these data quality issues weren't my fault or my team's fault, but no one cared to listen; they just wanted their dashboard to look right so the CMO would stop hounding them about the numbers being wrong. It dawned on me that my role wasn't about building cool features or pushing the boundary on data management; I was a glorified customer support agent for SQL.

    The next data role I had wasn't much different. Everything I had seen in the first position reared its head again, but this time my mission was to change the culture and the people, NOT to spend my entire day languishing in a dark room writing JIRA comments asking for more information. Now, don't get me wrong: on-call is a part of life. But when it feels like it is the ONLY thing your time is spent on, the mental burden can be worse than the paycheck you get every two weeks. And the absolute worst part is that you become non-essential in the eyes of leadership. You aren't adding to the bottom line like a software engineer working in a product org... you're a cost center. And cost centers are the first to get cut.

    If you are a data leader, I believe your number one goal is to make data engineering MANAGEABLE for your organization. Buying an analytical database and an ELT/ETL tool is just the mechanics. How do your people use those tools? What are the standard operating procedures? How do you deal with quality issues? Who is accountable? How do you decide what is a P0 and what is a P1/P2? Where do you NEED governance and quality, and where is it a nice-to-have? Does the rest of the business know its role in data generation and incident management? What VALUE is your team adding, and how do you prove that value?

    Unless you can confidently answer those questions (and more), your team is going to spend more cycles hating their jobs than doing useful work. Good luck!

  • Apache Arrow Data Stack Ecosystem #pyarrow #pyiceberg #datastack #spark #trino

    View the profile of Dipankar Mazumdar, M.Sc

    Staff Data Engineer Advocate @Onehouse.ai | Apache Hudi, Iceberg Contributor | Author of "Engineering Lakehouses"

    Apache Arrow use cases. It's been really exciting over the past couple of years to see Apache Arrow's tremendous growth. Today Arrow is the de facto standard for efficient in-memory columnar analytics, providing high performance when processing and transporting large volumes of data. Beyond its role as the in-memory columnar format, Arrow has evolved into a full software development platform. Here are some more use cases where Arrow is utilized.

    ✅ Reading/writing columnar formats: many Arrow libraries provide methods to read and write data in columnar formats like #Parquet.
    ✅ Sharing memory locally: Arrow IPC files can be memory-mapped for data handling beyond memory limits and across languages, while the C data interface enables zero-copy sharing within a single process, free from build-time or link-time dependencies.
    ✅ Data transfer over the network: Arrow enables serialization and transfer of columnar data across networks.
    ✅ In-memory data structure: query engines like DataFusion and Polars leverage Arrow's in-memory format. Lakehouse table formats such as Apache Hudi, Apache Iceberg, and Delta Lake also use Arrow for in-memory data operations.

    I wrote about the growth of Arrow (link in comments), and the number of projects adopting it is only increasing. #dataengineering #softwareengineering

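    The sketch below (not part of the post) illustrates two of the use cases above with PyArrow: reading a Parquet file into Arrow memory and sharing it through a memory-mapped Arrow IPC file. The file names are hypothetical placeholders.

      # Minimal PyArrow sketch: Parquet -> Arrow memory -> memory-mapped IPC file.
      # "events.parquet" / "events.arrow" are hypothetical paths.
      import pyarrow as pa
      import pyarrow.ipc as ipc
      import pyarrow.parquet as pq

      table = pq.read_table("events.parquet")        # columnar read into Arrow memory

      with pa.OSFile("events.arrow", "wb") as sink:  # write an Arrow IPC file
          with ipc.new_file(sink, table.schema) as writer:
              writer.write_table(table)

      # Memory-map the IPC file so another process can read it with zero copies.
      with pa.memory_map("events.arrow", "r") as source:
          shared = ipc.open_file(source).read_all()
      print(shared.num_rows, shared.schema)
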
  • Postgres Enhancement for OLAP Database #olap #postgres #duckdb

    View the profile of Alireza Sadeghi

    Consultant @ ZyeLabs | Big Data Engineering | Data Architecture

    PostgreSQL for Unified Data Analytics!? ⚡ It's remarkable how PostgreSQL has evolved from a traditional transactional OLTP database into a hybrid OLTP/OLAP engine. While powerful PostgreSQL extensions like TimescaleDB and Citus have been around for quite some time, we are now witnessing a new wave of modern extensions.

    👉 New extensions, like pg_duckdb from DuckDB and pg_analytics by ParadeDB, use embedded DuckDB to enhance PostgreSQL's OLAP capabilities. These extensions enable PostgreSQL to function as a hybrid database engine, allowing federated queries over Parquet files in object stores (S3) and data lakehouses (Iceberg, etc.).
    👉 Additionally, the pg_mooncake extension from Crunchy Data provides native column-store capabilities in open table formats like Iceberg and Delta.
    👉 pg_mooncake leverages native transaction support for performing CRUD while using DuckDB as the execution engine.

    ⚡ When used on a production PostgreSQL system, these extensions still rely on the primary database's resources, which can impact performance.
    👉 The new open-source BemiDB, built by Bemi, tries to address this challenge by automatically managing and using a read replica of the production database for analytical workloads.
    👉 This is similar to the Zero-ETL capabilities offered by managed transactional database systems like Amazon Aurora and RDS, with seamless data integration with Amazon Redshift. Like the other extensions, BemiDB leverages DuckDB for its analytical processing.

    💡 I anticipate that in 2025 we will hear much more about PostgreSQL for Unified Analytics! (Image by Evgeny Li, who has shared further insights about the Bemi product on the official blog.)

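    As a rough illustration of the idea (not an official example), the sketch below runs an analytical query over a Parquet file in S3 from a PostgreSQL session where the pg_duckdb extension is installed; read_parquet() comes from that extension, and the DSN, bucket path, and column names are hypothetical. S3 credentials are assumed to be configured on the server side.

      # Hypothetical sketch: federated OLAP query through pg_duckdb from Python.
      import psycopg2

      conn = psycopg2.connect("dbname=analytics user=report host=localhost")
      with conn, conn.cursor() as cur:
          cur.execute("""
              SELECT customer_id, SUM(amount) AS total_spent
              FROM read_parquet('s3://my-bucket/orders/part-0.parquet')
              GROUP BY customer_id
              ORDER BY total_spent DESC
              LIMIT 10
          """)
          for row in cur.fetchall():
              print(row)
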
  • New MongoDB Release Comes with Notebooks on AI Agents #rag #aiagents #mongodb

    View the profile of Eric Vyacheslav

    AI/ML Engineer | Ex-Google | Ex-MIT

    MongoDB just released a HUGE repo with 50+ step-by-step notebooks on RAG pipelines, AI agents, and vector search. It's full of practical code samples and pre-built resources to help you:

    🤖 Build AI agents that actually reason and plan
    ⚡ Speed up retrieval with optimized RAG pipelines
    🔍 Combine vector search with keyword search for better accuracy
    🛠️ Work with LangChain, LlamaIndex, Haystack, and more
    📂 Store and retrieve embeddings at scale with MongoDB Atlas

    Check it out: https://fnf.dev/4aMGznn

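    For orientation only (this is not taken from the MongoDB repo), the sketch below shows the general shape of a vector search against MongoDB Atlas with pymongo; the connection string, database, collection, index name, and the embedding stub are hypothetical placeholders.

      # Hypothetical sketch: Atlas Vector Search via an aggregation pipeline.
      from pymongo import MongoClient

      client = MongoClient("mongodb+srv://user:pass@cluster.example.mongodb.net")
      docs = client["kb"]["documents"]

      def get_embedding(text: str) -> list[float]:
          # Placeholder: call your real embedding model here.
          return [0.0] * 1536

      results = docs.aggregate([
          {"$vectorSearch": {
              "index": "embedding_index",        # Atlas Vector Search index on "embedding"
              "path": "embedding",
              "queryVector": get_embedding("How do I rotate API keys?"),
              "numCandidates": 100,
              "limit": 5,
          }},
          {"$project": {"text": 1, "score": {"$meta": "vectorSearchScore"}}},
      ])
      for doc in results:
          print(doc["score"], doc["text"][:80])
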
  • Data DevOps with Data Assets and Observability in Mind #dataDevOps #dataAssets #dataObservability

    View the profile of Chad Sanderson

    CEO @ Gable.ai (Shift Left Data Platform)

    Data DevOps is a missing practice within most data organizations. Without it, an entire class of quality, governance, and compliance issues becomes incredibly challenging to detect or prevent.

    First, what is Data DevOps? Data DevOps is the practice of embedding metadata management (for data quality, governance, and compliance enforcement) directly into the software development lifecycle to ensure data integrity before application code is deployed. Simply put, it brings the same systems software engineers use to manage their applications to the code that produces data.

    Data DevOps has a few qualities that make it a perfect place to start with the adoption of data contracts:
    1. It puts data management in the software engineering workflow.
    2. It treats data quality/governance as a form of compliance from step one.
    3. It allows software engineering teams to start simple and expand to more complex DQ rules over time.
    4. It prevents data outages at the moment code changes happen, stopping issues in advance.

    The key difference between Data DevOps and traditional data observability or data cataloging is that Data DevOps focuses on code, not data. This is critical because by the time bad data is produced, it's already too late: the code change has shipped, and it can be impossible to identify exactly what caused it and why. Like DevOps, Data DevOps allows data engineering and platform teams to embed policies and workflows as code so the entire organization manages the evolution of its data in a healthy way, directly from the source.

    I believe Data DevOps is the next major step forward in data management. It bridges the gap between data and software engineering teams. It focuses on the intersection between every service, AI agent, and data warehouse table: the code written to produce, transform, or otherwise modify the data. The best way to solve data quality and governance is to solve a bigger problem that every engineer can empathize with.

    Gable was designed to fill this gap in the data management industry. It can not only trace code from one end of the application code base to another (that means tens of millions of lines of code) and map how data is transformed across environments and technologies, but also run as another step in a CI/CD workflow, detecting changes, evaluating data contracts, and looping data consumers into the change communication workflow before issues happen. That means no migration to a brand new tool, and no change to the processes that already exist. Good luck!

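    To make the idea concrete, here is a minimal, generic sketch of a "contract check in CI" step; it is not Gable's product or API, and the contract, file format, and column names are invented for illustration.

      # Hypothetical CI step: fail the build if a proposed schema breaks the contract.
      import json
      import sys

      CONTRACT = {  # column types agreed with downstream consumers
          "orders": {"order_id": "int", "amount": "float", "currency": "str"},
      }

      def check_schema(table: str, proposed: dict) -> list[str]:
          """Return a list of contract violations for the proposed schema."""
          expected = CONTRACT.get(table, {})
          errors = [f"missing required column '{c}'" for c in expected if c not in proposed]
          errors += [
              f"column '{c}' changed type {expected[c]} -> {t}"
              for c, t in proposed.items()
              if c in expected and expected[c] != t
          ]
          return errors

      if __name__ == "__main__":
          proposed = json.load(open(sys.argv[1]))   # schema emitted by the build step
          violations = check_schema("orders", proposed)
          if violations:
              print("Data contract violations:\n - " + "\n - ".join(violations))
              sys.exit(1)                            # block the change before it ships
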
  • ETL and Data Preparation Using SQL and PySpark #etl #pyspark #datapreparation #sql

    View the profile of Sachin Chandrashekhar 🇮🇳

    30K Fam - Follow me for Your daily Data Engineering Dose | 300+ Member ‘Data Engg Hub’ Community Leader | 100 days AWS Data Engg Program | Sr Data Engineer @ World’s #1 Airline | AWS Data Engineering Trainer & Mentor

    This one's mainly for the #sql heroes: data analysts, DBAs, ETL testers, and ETL developers!

    Let's get to the point: SQL is so popular that Spark also has a flavor of SQL, Spark SQL. Knowing SQL and then Spark opens up tremendous opportunities. Start with SQL and, along the way, add Spark to your skillset. Spark is so popular that even AWS #glue is based on Spark!

    Let Spark be in your "roadmap" for sure! But remember: it all starts with SQL. Data Engineering is all about SQL! You can survive with just SQL; to earn more, add Spark. Here's an amazing document on the PySpark equivalents of SQL!

    If you've read this far, do LIKE the post 👍

    P.S.: Learn no-nonsense AWS Data Engineering with 275+ others: https://aws.sachin.cloud & WhatsApp group: https://w.sachin.cloud
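
    The attached document isn't reproduced here, but as a small illustration of the point, the sketch below expresses the same aggregation in Spark SQL and in the PySpark DataFrame API; the file path and column names are hypothetical.

      # Hypothetical sketch: one aggregation, two equivalent Spark flavours.
      from pyspark.sql import SparkSession
      from pyspark.sql import functions as F

      spark = SparkSession.builder.appName("sql-vs-pyspark").getOrCreate()
      orders = spark.read.parquet("orders.parquet")
      orders.createOrReplaceTempView("orders")

      # Spark SQL
      sql_result = spark.sql("""
          SELECT country, SUM(amount) AS revenue
          FROM orders
          WHERE status = 'PAID'
          GROUP BY country
      """)

      # Equivalent DataFrame API
      df_result = (orders
                   .filter(F.col("status") == "PAID")
                   .groupBy("country")
                   .agg(F.sum("amount").alias("revenue")))

      sql_result.show()
      df_result.show()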

  • Data Preparation with Pandas #datacleaning #pandas

    View the profile of Lovee Kumar

    Data Engineer at Digivate Labs 🌀| 36k+ LinkedIn | Databricks Certified | Mentor | Helping Freshers to Get a Job in Data World 🌍|

    Hello, connections! DATA CLEANING IN PYTHON 📑

    Data cleaning is a crucial step in data analysis and machine learning, as it ensures the quality and reliability of the dataset. The process typically involves identifying and correcting errors, inconsistencies, and missing values.

    🔴 This PDF covers:
    ◼ Dealing with missing data
    ◼ Dealing with duplicates
    ◼ Outlier detection
    ◼ Encoding categorical features
    ◼ Transformation

    ⏩ Join my Telegram channel to learn Data Science & Analytics: https://t.me/LK_Data_world
    ⏩ If you found this post informative, save and repost it 🔁.
    ⏩ Follow Lovee Kumar for more 🛎

    #data #transformation #datacleaning #datascientist #dataengineers #python #interviewprep #spark #pyspark #sql #linkedinconnection
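
    The PDF itself isn't included here; the short sketch below (with a made-up DataFrame and hypothetical column names) walks through the listed steps: missing values, duplicates, outliers, and categorical encoding.

      # Hypothetical sketch of common pandas cleaning steps.
      import pandas as pd

      df = pd.DataFrame({
          "age":    [25, None, 40, 40, 120],
          "city":   ["KL", "Penang", None, "Penang", "KL"],
          "salary": [3500, 4200, 4200, 4200, 99000],
      })

      df["age"] = df["age"].fillna(df["age"].median())    # fill missing numeric values
      df["city"] = df["city"].fillna("Unknown")           # fill missing categories
      df = df.drop_duplicates()                           # drop exact duplicates

      q1, q3 = df["salary"].quantile([0.25, 0.75])        # IQR-based outlier filter
      iqr = q3 - q1
      df = df[df["salary"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

      df = pd.get_dummies(df, columns=["city"])           # encode categorical features
      print(df)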

  • Useful Reference for Building Realtime Data Assets and Analytics #realtimedata #datapipeline #kafka #debezium #modernstack

    View the profile of Sachin Chandrashekhar 🇮🇳

    30K Fam - Follow me for Your daily Data Engineering Dose | 300+ Member ‘Data Engg Hub’ Community Leader | 100 days AWS Data Engg Program | Sr Data Engineer @ World’s #1 Airline | AWS Data Engineering Trainer & Mentor

    ETL tools like Informatica, Ab Initio, SSIS, Talend, DataStage, etc. are no longer in demand. They are still being used, as are mainframe-based ETL jobs, but their days of shine and lustre are long gone!

    Let's look at modern data pipelines. The modern data landscape demands agility, scalability, and real-time insights. But how do we tame the ever-growing data and transform it into valuable business intelligence? Imagine:

    Real-time analytics: no more waiting days for insights. Get notified of critical trends as they happen.
    Unified data lakes: ditch data silos and fragmented information. Access all your data in one centralized location.
    Cloud-native agility: spin up pipelines in minutes, not months. Scale seamlessly to meet demand.

    Here are some key ingredients for your modern data pipeline recipe:

    Cloud platforms: leverage the power of AWS, Azure, or GCP for elastic, scalable infrastructure. Choose AWS; it has the highest market share.
    Streaming technologies: embrace Apache Kafka, #aws Kinesis, or Apache Flink for real-time data ingestion.
    Microservices architecture: break down monolithic pipelines into modular, manageable components. #aws Lambda is a game changer here; use #aws SNS and SQS as well.
    Containerization: package your pipeline elements for portability and consistency. #aws ECS Fargate is an excellent container-based service.
    Orchestration tools: use Airflow to automate and schedule tasks seamlessly. #aws MWAA shines here.

    Want to learn modern Data Engineering on AWS?
    P.S.: Learn no-nonsense AWS Data Engineering with 275+ others: https://aws.sachin.cloud & WhatsApp group: https://w.sachin.cloud
    If you've read this far, do LIKE the post 👍
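
    As a rough sketch of the orchestration piece (not a production pipeline), the DAG below shows the minimal Airflow shape that would also run on MWAA; the task bodies, names, and schedule are hypothetical, and Airflow versions before 2.4 use schedule_interval instead of schedule.

      # Hypothetical minimal Airflow DAG: extract then load, once a day.
      from datetime import datetime

      from airflow import DAG
      from airflow.operators.python import PythonOperator

      def extract(**_):
          print("pull raw events, e.g. from Kinesis or S3")

      def load(**_):
          print("write curated data to the lake or warehouse")

      with DAG(
          dag_id="daily_events_pipeline",
          start_date=datetime(2024, 1, 1),
          schedule="@daily",
          catchup=False,
      ) as dag:
          t_extract = PythonOperator(task_id="extract", python_callable=extract)
          t_load = PythonOperator(task_id="load", python_callable=load)
          t_extract >> t_load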

Similar pages